Training: 2022-04-12 19:42:25,049-rank_id: 0
Training: 2022-04-12 19:42:55,671-: margin_list [1.0, 0.0, 0.4]
Training: 2022-04-12 19:42:55,672-: network r100
Training: 2022-04-12 19:42:55,672-: resume False
Training: 2022-04-12 19:42:55,672-: output work_dirs/wf42m_pfc02_r100
Training: 2022-04-12 19:42:55,672-: embedding_size 512
Training: 2022-04-12 19:42:55,672-: sample_rate 0.2
Training: 2022-04-12 19:42:55,672-: interclass_filtering_threshold 0
Training: 2022-04-12 19:42:55,672-: fp16 True
Training: 2022-04-12 19:42:55,672-: batch_size 128
Training: 2022-04-12 19:42:55,672-: optimizer sgd
Training: 2022-04-12 19:42:55,672-: lr 0.1
Training: 2022-04-12 19:42:55,672-: momentum 0.9
Training: 2022-04-12 19:42:55,673-: weight_decay 0.0005
Training: 2022-04-12 19:42:55,673-: verbose 10000
Training: 2022-04-12 19:42:55,673-: frequent 10
Training: 2022-04-12 19:42:55,673-: dali True
Training: 2022-04-12 19:42:55,673-: rec /train_tmp/WebFace42M
Training: 2022-04-12 19:42:55,673-: num_classes 2059906
Training: 2022-04-12 19:42:55,673-: num_image 42474557
Training: 2022-04-12 19:42:55,673-: num_epoch 20
Training: 2022-04-12 19:42:55,673-: warmup_epoch 0
Training: 2022-04-12 19:42:55,673-: val_targets ['lfw', 'cfp_fp', 'agedb_30']
Training: 2022-04-12 19:42:55,673-: total_batch_size 1024
Training: 2022-04-12 19:42:55,673-: warmup_step 0
Training: 2022-04-12 19:42:55,673-: total_step 829580
Training: 2022-04-12 19:43:57,091-Reducer buckets have been rebuilt in this iteration.
Training: 2022-04-12 19:44:04,153-Speed 2621.92 samples/sec Loss 42.5304 LearningRate 0.1000 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 16384 Required: 109 hours
Training: 2022-04-12 19:44:08,052-Speed 2627.37 samples/sec Loss 42.6028 LearningRate 0.1000 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 16384 Required: 103 hours
Training: 2022-04-12 19:44:11,898-Speed 2663.44 samples/sec Loss 42.8713 LearningRate 0.1000 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 16384 Required: 99 hours
Training: 2022-04-12 19:44:15,772-Speed 2643.31 samples/sec Loss 43.3095 LearningRate 0.1000 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 16384 Required: 97 hours
Training: 2022-04-12 19:44:19,641-Speed 2647.58 samples/sec Loss 43.3497 LearningRate 0.1000 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 16384 Required: 96 hours
Training: 2022-04-12 19:44:23,558-Speed 2615.48 samples/sec Loss 43.4846 LearningRate 0.1000 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 16384 Required: 95 hours
Training: 2022-04-12 19:44:27,482-Speed 2610.28 samples/sec Loss 43.5972 LearningRate 0.1000 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 16384 Required: 95 hours
Training: 2022-04-12 19:44:31,325-Speed 2664.88 samples/sec Loss 43.0671 LearningRate 0.1000 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 16384 Required: 94 hours
Training: 2022-04-12 19:44:35,209-Speed 2637.76 samples/sec Loss 43.2108 LearningRate 0.1000 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 16384 Required: 94 hours
Training: 2022-04-12 19:44:39,129-Speed 2613.09 samples/sec Loss 43.3634 LearningRate 0.1000 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-04-12 19:44:42,990-Speed 2653.32 samples/sec Loss 43.2378 LearningRate 0.1000 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-04-12 19:44:46,847-Speed 2655.40 samples/sec Loss 43.3276 LearningRate 0.1000 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 32768 Required: 93 hours
Training: 2022-04-12 19:44:50,701-Speed 2657.17 samples/sec Loss 43.5597 LearningRate 0.1000 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-04-12 19:44:54,548-Speed 2662.05 samples/sec Loss 43.1397 LearningRate 0.1000 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-04-12 19:44:58,395-Speed 2662.99 samples/sec Loss 43.3202 LearningRate 0.1000 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-04-12 19:45:02,240-Speed 2664.75 samples/sec Loss 43.0240 LearningRate 0.1000 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-04-12 19:45:06,166-Speed 2608.34 samples/sec Loss 43.0135 LearningRate 0.1000 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-04-12 19:45:10,080-Speed 2617.94 samples/sec Loss 43.3390 LearningRate 0.1000 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 32768 Required: 92 hours
Training: 2022-04-12 19:45:13,959-Speed 2640.07 samples/sec Loss 43.0878 LearningRate 0.1000 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 32768 Required: 91 hours
Training: 2022-04-12 19:45:17,827-Speed 2648.24 samples/sec Loss 42.9079 LearningRate 0.0999 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 19:45:21,675-Speed 2661.79 samples/sec Loss 43.0201 LearningRate 0.0999 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 19:45:25,537-Speed 2652.53 samples/sec Loss 42.9868 LearningRate 0.0999 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 19:45:29,390-Speed 2658.00 samples/sec Loss 42.9571 LearningRate 0.0999 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 19:45:33,275-Speed 2636.77 samples/sec Loss 42.9252 LearningRate 0.0999 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 19:45:37,132-Speed 2655.06 samples/sec Loss 43.1064 LearningRate 0.0999 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 19:45:41,031-Speed 2627.43 samples/sec Loss 42.9818 LearningRate 0.0999 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 19:45:44,887-Speed 2656.52 samples/sec Loss 42.8356 LearningRate 0.0999 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 19:45:48,739-Speed 2659.10 samples/sec Loss 42.7623 LearningRate 0.0999 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 19:45:52,591-Speed 2658.33 samples/sec Loss 42.7866 LearningRate 0.0999 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 19:45:56,476-Speed 2636.73 samples/sec Loss 42.8538 LearningRate 0.0999 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 19:46:00,355-Speed 2640.70 samples/sec Loss 42.7730 LearningRate 0.0999 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 19:46:04,213-Speed 2655.39 samples/sec Loss 42.9514 LearningRate 0.0999 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 19:46:08,063-Speed 2660.34 samples/sec Loss 42.7928 LearningRate 0.0999 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 19:46:11,924-Speed 2652.76 samples/sec Loss 42.7317 LearningRate 0.0999 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 19:46:15,778-Speed 2658.14 samples/sec Loss 42.6753 LearningRate 0.0999 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 19:46:19,705-Speed 2607.79 samples/sec Loss 42.6768 LearningRate 0.0999 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 19:46:23,568-Speed 2651.61 samples/sec Loss 42.6679 LearningRate 0.0999 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 19:46:27,490-Speed 2611.66 samples/sec Loss 42.6630 LearningRate 0.0999 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 19:46:31,398-Speed 2620.97 samples/sec Loss 42.6817 LearningRate 0.0999 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 19:46:35,261-Speed 2651.37 samples/sec Loss 42.6596 LearningRate 0.0999 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 19:46:39,146-Speed 2636.36 samples/sec Loss 42.6045 LearningRate 0.0999 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 19:46:43,019-Speed 2645.17 samples/sec Loss 42.6643 LearningRate 0.0999 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 19:46:46,859-Speed 2667.30 samples/sec Loss 42.6625 LearningRate 0.0999 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 19:46:50,717-Speed 2655.27 samples/sec Loss 42.5804 LearningRate 0.0999 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 19:46:54,623-Speed 2622.11 samples/sec Loss 42.5641 LearningRate 0.0999 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 19:46:58,491-Speed 2648.52 samples/sec Loss 42.6173 LearningRate 0.0999 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 19:47:02,389-Speed 2627.68 samples/sec Loss 42.5451 LearningRate 0.0999 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 19:47:06,272-Speed 2637.37 samples/sec Loss 42.4640 LearningRate 0.0999 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 19:47:10,137-Speed 2650.07 samples/sec Loss 42.5158 LearningRate 0.0999 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 19:47:14,010-Speed 2644.60 samples/sec Loss 42.4684 LearningRate 0.0999 Epoch: 0 Global Step: 510 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:47:17,896-Speed 2636.06 samples/sec Loss 42.5050 LearningRate 0.0999 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:47:21,787-Speed 2631.79 samples/sec Loss 42.4539 LearningRate 0.0999 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:47:25,747-Speed 2586.84 samples/sec Loss 42.4475 LearningRate 0.0999 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:47:29,673-Speed 2608.46 samples/sec Loss 42.4660 LearningRate 0.0999 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:47:33,531-Speed 2654.96 samples/sec Loss 42.5184 LearningRate 0.0999 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:47:37,423-Speed 2631.80 samples/sec Loss 42.4954 LearningRate 0.0999 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:47:41,275-Speed 2659.25 samples/sec Loss 42.4437 LearningRate 0.0999 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:47:45,136-Speed 2652.63 samples/sec Loss 42.3833 LearningRate 0.0999 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:47:48,997-Speed 2652.83 samples/sec Loss 42.4676 LearningRate 0.0999 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:47:52,844-Speed 2662.35 samples/sec Loss 42.3712 LearningRate 0.0999 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:47:56,708-Speed 2651.03 samples/sec Loss 42.3637 LearningRate 0.0999 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:48:00,556-Speed 2661.27 samples/sec Loss 42.4162 LearningRate 0.0998 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:48:04,407-Speed 2660.02 samples/sec Loss 42.4495 LearningRate 0.0998 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:48:08,254-Speed 2662.37 samples/sec Loss 42.3778 LearningRate 0.0998 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:48:12,116-Speed 2651.73 samples/sec Loss 42.4134 LearningRate 0.0998 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:48:15,962-Speed 2663.13 samples/sec Loss 42.3165 LearningRate 0.0998 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:48:19,834-Speed 2644.94 samples/sec Loss 42.3406 LearningRate 0.0998 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:48:23,699-Speed 2650.05 samples/sec Loss 42.3699 LearningRate 0.0998 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:48:27,556-Speed 2655.89 samples/sec Loss 42.3046 LearningRate 0.0998 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:48:32,307-Speed 2156.08 samples/sec Loss 42.3396 LearningRate 0.0998 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:48:36,159-Speed 2658.60 samples/sec Loss 42.3294 LearningRate 0.0998 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:48:40,010-Speed 2659.81 samples/sec Loss 42.2768 LearningRate 0.0998 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:48:43,859-Speed 2660.99 samples/sec Loss 42.2089 LearningRate 0.0998 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:48:47,765-Speed 2621.76 samples/sec Loss 42.2807 LearningRate 0.0998 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:48:51,630-Speed 2650.14 samples/sec Loss 42.2119 LearningRate 0.0998 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:48:55,522-Speed 2632.03 samples/sec Loss 42.2720 LearningRate 0.0998 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:48:59,427-Speed 2622.71 samples/sec Loss 42.1816 LearningRate 0.0998 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:49:03,321-Speed 2630.07 samples/sec Loss 42.2307 LearningRate 0.0998 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:49:07,168-Speed 2662.65 samples/sec Loss 42.2073 LearningRate 0.0998 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:49:10,995-Speed 2676.44 samples/sec Loss 42.1903 LearningRate 0.0998 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:49:14,881-Speed 2635.49 samples/sec Loss 42.2138 LearningRate 0.0998 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:49:18,853-Speed 2578.55 samples/sec Loss 42.1714 LearningRate 0.0998 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:49:22,708-Speed 2657.42 samples/sec Loss 42.1144 LearningRate 0.0998 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:49:26,584-Speed 2642.83 samples/sec Loss 42.1573 LearningRate 0.0998 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:49:30,442-Speed 2654.56 samples/sec Loss 42.1422 LearningRate 0.0998 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:49:34,300-Speed 2655.08 samples/sec Loss 42.0638 LearningRate 0.0998 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:49:38,178-Speed 2640.91 samples/sec Loss 42.1135 LearningRate 0.0998 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:49:42,179-Speed 2560.79 samples/sec Loss 41.9945 LearningRate 0.0998 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:49:46,043-Speed 2650.88 samples/sec Loss 42.0112 LearningRate 0.0998 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:49:49,876-Speed 2672.14 samples/sec Loss 41.9996 LearningRate 0.0998 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:49:53,745-Speed 2647.32 samples/sec Loss 42.0212 LearningRate 0.0998 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:49:57,598-Speed 2658.22 samples/sec Loss 41.9687 LearningRate 0.0998 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:50:01,455-Speed 2655.86 samples/sec Loss 41.9489 LearningRate 0.0998 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:50:05,360-Speed 2623.09 samples/sec Loss 41.9953 LearningRate 0.0998 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:50:09,249-Speed 2633.17 samples/sec Loss 41.9818 LearningRate 0.0998 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:50:13,097-Speed 2662.03 samples/sec Loss 41.9204 LearningRate 0.0998 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:50:16,941-Speed 2664.45 samples/sec Loss 41.9794 LearningRate 0.0998 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:50:20,809-Speed 2648.52 samples/sec Loss 41.9502 LearningRate 0.0998 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:50:24,680-Speed 2645.88 samples/sec Loss 41.9432 LearningRate 0.0998 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:50:28,587-Speed 2621.94 samples/sec Loss 42.0069 LearningRate 0.0998 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:50:32,469-Speed 2638.68 samples/sec Loss 41.9488 LearningRate 0.0998 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:50:36,315-Speed 2662.77 samples/sec Loss 41.8047 LearningRate 0.0998 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:50:40,171-Speed 2656.05 samples/sec Loss 41.8269 LearningRate 0.0997 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:50:44,031-Speed 2653.71 samples/sec Loss 41.7951 LearningRate 0.0997 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:50:47,898-Speed 2648.92 samples/sec Loss 41.8016 LearningRate 0.0997 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:50:51,752-Speed 2657.72 samples/sec Loss 41.8751 LearningRate 0.0997 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:50:55,603-Speed 2659.90 samples/sec Loss 41.7278 LearningRate 0.0997 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:50:59,456-Speed 2657.99 samples/sec Loss 41.7916 LearningRate 0.0997 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:51:03,313-Speed 2655.84 samples/sec Loss 41.8093 LearningRate 0.0997 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:51:07,150-Speed 2668.96 samples/sec Loss 41.7445 LearningRate 0.0997 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:51:11,125-Speed 2576.68 samples/sec Loss 41.8262 LearningRate 0.0997 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:51:14,973-Speed 2662.14 samples/sec Loss 41.7468 LearningRate 0.0997 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:51:18,892-Speed 2613.71 samples/sec Loss 41.7342 LearningRate 0.0997 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:51:22,804-Speed 2618.05 samples/sec Loss 41.7011 LearningRate 0.0997 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:51:26,659-Speed 2657.61 samples/sec Loss 41.6749 LearningRate 0.0997 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:51:30,519-Speed 2653.65 samples/sec Loss 41.6210 LearningRate 0.0997 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:51:34,378-Speed 2653.61 samples/sec Loss 41.6678 LearningRate 0.0997 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:51:38,236-Speed 2655.23 samples/sec Loss 41.6911 LearningRate 0.0997 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:51:42,094-Speed 2654.99 samples/sec Loss 41.5984 LearningRate 0.0997 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:51:45,915-Speed 2679.99 samples/sec Loss 41.6814 LearningRate 0.0997 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:51:49,776-Speed 2653.27 samples/sec Loss 41.5634 LearningRate 0.0997 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:51:53,653-Speed 2641.81 samples/sec Loss 41.5432 LearningRate 0.0997 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:51:57,498-Speed 2664.47 samples/sec Loss 41.5780 LearningRate 0.0997 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:52:01,459-Speed 2585.18 samples/sec Loss 41.5293 LearningRate 0.0997 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:52:05,340-Speed 2639.19 samples/sec Loss 41.5144 LearningRate 0.0997 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:52:09,202-Speed 2652.16 samples/sec Loss 41.5784 LearningRate 0.0997 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:52:13,077-Speed 2643.26 samples/sec Loss 41.5274 LearningRate 0.0997 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:52:16,924-Speed 2662.96 samples/sec Loss 41.5047 LearningRate 0.0997 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:52:20,783-Speed 2653.88 samples/sec Loss 41.5687 LearningRate 0.0997 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:52:24,625-Speed 2665.77 samples/sec Loss 41.5304 LearningRate 0.0997 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:52:28,484-Speed 2654.95 samples/sec Loss 41.4842 LearningRate 0.0997 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:52:32,409-Speed 2609.53 samples/sec Loss 41.3958 LearningRate 0.0997 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:52:36,356-Speed 2594.58 samples/sec Loss 41.5194 LearningRate 0.0997 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:52:40,215-Speed 2654.11 samples/sec Loss 41.4695 LearningRate 0.0997 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:52:44,058-Speed 2665.23 samples/sec Loss 41.4078 LearningRate 0.0997 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:52:47,900-Speed 2666.61 samples/sec Loss 41.4669 LearningRate 0.0997 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:52:51,749-Speed 2661.33 samples/sec Loss 41.4162 LearningRate 0.0997 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:52:55,619-Speed 2646.64 samples/sec Loss 41.3238 LearningRate 0.0997 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:52:59,459-Speed 2666.77 samples/sec Loss 41.3471 LearningRate 0.0997 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:53:03,312-Speed 2658.46 samples/sec Loss 41.3947 LearningRate 0.0997 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 1048576 Required: 90 hours
Training: 2022-04-12 19:53:07,272-Speed 2586.19 samples/sec Loss 41.2457 LearningRate 0.0997 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:53:11,284-Speed 2552.77 samples/sec Loss 41.3320 LearningRate 0.0997 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:53:15,330-Speed 2531.34 samples/sec Loss 41.3041 LearningRate 0.0997 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:53:19,275-Speed 2596.85 samples/sec Loss 41.3373 LearningRate 0.0997 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:53:23,120-Speed 2664.67 samples/sec Loss 41.3215 LearningRate 0.0996 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:53:26,973-Speed 2658.03 samples/sec Loss 41.2658 LearningRate 0.0996 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:53:30,836-Speed 2651.20 samples/sec Loss 41.3112 LearningRate 0.0996 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:53:34,696-Speed 2653.50 samples/sec Loss 41.2044 LearningRate 0.0996 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:53:38,553-Speed 2655.16 samples/sec Loss 41.2451 LearningRate 0.0996 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:53:42,451-Speed 2627.48 samples/sec Loss 41.2717 LearningRate 0.0996 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:53:46,281-Speed 2674.71 samples/sec Loss 41.2494 LearningRate 0.0996 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:53:50,130-Speed 2661.14 samples/sec Loss 41.1898 LearningRate 0.0996 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:53:53,981-Speed 2660.22 samples/sec Loss 41.1657 LearningRate 0.0996 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:53:57,830-Speed 2660.60 samples/sec Loss 41.1613 LearningRate 0.0996 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:54:01,679-Speed 2661.10 samples/sec Loss 41.2220 LearningRate 0.0996 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:54:05,551-Speed 2645.58 samples/sec Loss 41.1442 LearningRate 0.0996 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 19:54:09,398-Speed 2662.92 samples/sec Loss 41.1496 LearningRate 0.0996 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:54:13,245-Speed 2662.11 samples/sec Loss 41.1264 LearningRate 0.0996 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:54:17,090-Speed 2663.60 samples/sec Loss 41.1509 LearningRate 0.0996 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:54:20,937-Speed 2662.34 samples/sec Loss 41.1268 LearningRate 0.0996 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:54:24,775-Speed 2669.26 samples/sec Loss 41.0994 LearningRate 0.0996 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:54:28,625-Speed 2660.64 samples/sec Loss 41.0465 LearningRate 0.0996 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:54:32,473-Speed 2661.68 samples/sec Loss 41.0645 LearningRate 0.0996 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:54:36,350-Speed 2641.53 samples/sec Loss 41.0397 LearningRate 0.0996 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:54:40,205-Speed 2656.96 samples/sec Loss 40.9561 LearningRate 0.0996 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:54:44,178-Speed 2578.01 samples/sec Loss 41.0124 LearningRate 0.0996 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:54:48,033-Speed 2656.75 samples/sec Loss 41.1072 LearningRate 0.0996 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:54:51,890-Speed 2656.08 samples/sec Loss 41.0189 LearningRate 0.0996 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:54:55,737-Speed 2662.45 samples/sec Loss 40.9806 LearningRate 0.0996 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:54:59,604-Speed 2648.42 samples/sec Loss 40.9770 LearningRate 0.0996 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:55:03,447-Speed 2665.06 samples/sec Loss 40.9253 LearningRate 0.0996 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:55:07,294-Speed 2662.69 samples/sec Loss 40.9440 LearningRate 0.0996 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:55:11,141-Speed 2662.59 samples/sec Loss 40.9179 LearningRate 0.0996 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:55:14,990-Speed 2661.01 samples/sec Loss 40.9538 LearningRate 0.0996 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:55:18,861-Speed 2645.92 samples/sec Loss 40.9404 LearningRate 0.0996 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:55:22,765-Speed 2623.95 samples/sec Loss 40.8540 LearningRate 0.0996 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:55:26,642-Speed 2642.26 samples/sec Loss 40.9988 LearningRate 0.0996 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:55:30,511-Speed 2647.02 samples/sec Loss 40.8583 LearningRate 0.0996 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:55:34,368-Speed 2655.24 samples/sec Loss 40.8915 LearningRate 0.0996 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:55:38,211-Speed 2665.10 samples/sec Loss 40.8874 LearningRate 0.0996 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:55:42,037-Speed 2677.72 samples/sec Loss 40.8494 LearningRate 0.0996 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:55:45,879-Speed 2665.71 samples/sec Loss 40.8729 LearningRate 0.0996 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:55:49,748-Speed 2647.75 samples/sec Loss 40.8416 LearningRate 0.0996 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:55:53,607-Speed 2654.09 samples/sec Loss 40.7785 LearningRate 0.0996 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:55:57,455-Speed 2661.85 samples/sec Loss 40.7893 LearningRate 0.0996 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:56:01,332-Speed 2642.02 samples/sec Loss 40.7354 LearningRate 0.0995 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:56:05,185-Speed 2658.03 samples/sec Loss 40.7616 LearningRate 0.0995 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:56:09,041-Speed 2656.06 samples/sec Loss 40.7822 LearningRate 0.0995 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:56:12,890-Speed 2662.34 samples/sec Loss 40.7469 LearningRate 0.0995 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:56:16,739-Speed 2660.58 samples/sec Loss 40.7567 LearningRate 0.0995 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:56:20,562-Speed 2679.77 samples/sec Loss 40.7845 LearningRate 0.0995 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:56:24,445-Speed 2637.57 samples/sec Loss 40.7217 LearningRate 0.0995 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:56:28,296-Speed 2660.14 samples/sec Loss 40.6904 LearningRate 0.0995 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:56:32,132-Speed 2669.91 samples/sec Loss 40.6544 LearningRate 0.0995 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:56:35,986-Speed 2657.45 samples/sec Loss 40.7414 LearningRate 0.0995 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:56:39,843-Speed 2655.55 samples/sec Loss 40.6627 LearningRate 0.0995 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:56:43,703-Speed 2653.55 samples/sec Loss 40.6364 LearningRate 0.0995 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:56:47,594-Speed 2631.97 samples/sec Loss 40.5832 LearningRate 0.0995 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:56:51,463-Speed 2647.71 samples/sec Loss 40.5860 LearningRate 0.0995 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:56:55,319-Speed 2656.54 samples/sec Loss 40.5484 LearningRate 0.0995 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:56:59,176-Speed 2655.86 samples/sec Loss 40.5930 LearningRate 0.0995 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:57:03,027-Speed 2659.17 samples/sec Loss 40.6221 LearningRate 0.0995 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:57:06,969-Speed 2598.30 samples/sec Loss 40.5851 LearningRate 0.0995 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:57:11,002-Speed 2539.24 samples/sec Loss 40.5833 LearningRate 0.0995 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:57:14,906-Speed 2624.02 samples/sec Loss 40.5884 LearningRate 0.0995 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:57:18,760-Speed 2658.39 samples/sec Loss 40.4775 LearningRate 0.0995 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:57:22,611-Speed 2659.98 samples/sec Loss 40.5093 LearningRate 0.0995 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:57:26,457-Speed 2663.22 samples/sec Loss 40.4569 LearningRate 0.0995 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:57:30,316-Speed 2654.39 samples/sec Loss 40.4250 LearningRate 0.0995 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:57:34,175-Speed 2654.05 samples/sec Loss 40.5026 LearningRate 0.0995 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:57:38,031-Speed 2656.02 samples/sec Loss 40.4758 LearningRate 0.0995 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:57:42,048-Speed 2549.70 samples/sec Loss 40.4176 LearningRate 0.0995 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:57:46,000-Speed 2591.78 samples/sec Loss 40.4618 LearningRate 0.0995 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:57:49,820-Speed 2681.69 samples/sec Loss 40.4291 LearningRate 0.0995 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:57:53,682-Speed 2651.67 samples/sec Loss 40.3924 LearningRate 0.0995 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:57:57,532-Speed 2660.94 samples/sec Loss 40.3789 LearningRate 0.0995 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:58:01,377-Speed 2663.76 samples/sec Loss 40.4080 LearningRate 0.0995 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:58:05,221-Speed 2663.80 samples/sec Loss 40.3101 LearningRate 0.0995 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:58:09,076-Speed 2657.04 samples/sec Loss 40.3845 LearningRate 0.0995 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:58:13,050-Speed 2577.61 samples/sec Loss 40.3328 LearningRate 0.0995 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:58:16,897-Speed 2662.42 samples/sec Loss 40.2691 LearningRate 0.0995 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:58:20,749-Speed 2659.33 samples/sec Loss 40.2989 LearningRate 0.0995 Epoch: 0 Global Step: 2230 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:58:24,598-Speed 2661.11 samples/sec Loss 40.3804 LearningRate 0.0995 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:58:28,422-Speed 2678.36 samples/sec Loss 40.2580 LearningRate 0.0995 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:58:32,312-Speed 2634.00 samples/sec Loss 40.1867 LearningRate 0.0995 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:58:36,148-Speed 2669.90 samples/sec Loss 40.3195 LearningRate 0.0995 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:58:40,032-Speed 2636.81 samples/sec Loss 40.2760 LearningRate 0.0995 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:58:43,896-Speed 2651.08 samples/sec Loss 40.1930 LearningRate 0.0994 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:58:47,745-Speed 2661.48 samples/sec Loss 40.2465 LearningRate 0.0994 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:58:51,613-Speed 2647.95 samples/sec Loss 40.1891 LearningRate 0.0994 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:58:55,635-Speed 2546.81 samples/sec Loss 40.1783 LearningRate 0.0994 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:58:59,534-Speed 2626.79 samples/sec Loss 40.0880 LearningRate 0.0994 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:59:03,398-Speed 2650.80 samples/sec Loss 40.1053 LearningRate 0.0994 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:59:07,258-Speed 2653.66 samples/sec Loss 40.0922 LearningRate 0.0994 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:59:11,129-Speed 2645.40 samples/sec Loss 40.2074 LearningRate 0.0994 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 19:59:15,006-Speed 2642.27 samples/sec Loss 40.1522 LearningRate 0.0994 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:59:18,853-Speed 2662.49 samples/sec Loss 39.9927 LearningRate 0.0994 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:59:22,707-Speed 2657.87 samples/sec Loss 40.0784 LearningRate 0.0994 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:59:26,564-Speed 2655.41 samples/sec Loss 40.0351 LearningRate 0.0994 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:59:30,420-Speed 2656.05 samples/sec Loss 39.9989 LearningRate 0.0994 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:59:34,279-Speed 2654.41 samples/sec Loss 40.1087 LearningRate 0.0994 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:59:38,129-Speed 2660.57 samples/sec Loss 39.9762 LearningRate 0.0994 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:59:41,975-Speed 2662.91 samples/sec Loss 40.0479 LearningRate 0.0994 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:59:45,838-Speed 2651.70 samples/sec Loss 40.0027 LearningRate 0.0994 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:59:49,751-Speed 2617.56 samples/sec Loss 39.9511 LearningRate 0.0994 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:59:53,585-Speed 2671.46 samples/sec Loss 39.9656 LearningRate 0.0994 Epoch: 0 Global Step: 2470 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 19:59:57,424-Speed 2668.39 samples/sec Loss 39.9624 LearningRate 0.0994 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:00:01,275-Speed 2659.68 samples/sec Loss 39.9551 LearningRate 0.0994 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:00:05,119-Speed 2664.19 samples/sec Loss 39.9514 LearningRate 0.0994 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:00:08,975-Speed 2656.07 samples/sec Loss 39.9417 LearningRate 0.0994 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:00:12,825-Speed 2660.53 samples/sec Loss 39.8884 LearningRate 0.0994 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:00:16,671-Speed 2663.08 samples/sec Loss 39.9012 LearningRate 0.0994 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:00:20,519-Speed 2661.99 samples/sec Loss 39.9391 LearningRate 0.0994 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:00:24,365-Speed 2663.24 samples/sec Loss 39.8745 LearningRate 0.0994 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:00:28,228-Speed 2651.45 samples/sec Loss 39.8240 LearningRate 0.0994 Epoch: 0 Global Step: 2560 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:00:32,169-Speed 2598.79 samples/sec Loss 39.8180 LearningRate 0.0994 Epoch: 0 Global Step: 2570 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:00:36,140-Speed 2578.82 samples/sec Loss 39.8768 LearningRate 0.0994 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:00:40,027-Speed 2635.04 samples/sec Loss 39.7742 LearningRate 0.0994 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:00:44,017-Speed 2567.75 samples/sec Loss 39.7691 LearningRate 0.0994 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:00:47,862-Speed 2663.45 samples/sec Loss 39.7529 LearningRate 0.0994 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:00:51,708-Speed 2663.37 samples/sec Loss 39.7866 LearningRate 0.0994 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:00:55,566-Speed 2654.61 samples/sec Loss 39.7313 LearningRate 0.0994 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:00:59,417-Speed 2659.26 samples/sec Loss 39.7998 LearningRate 0.0994 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:01:03,279-Speed 2652.41 samples/sec Loss 39.7164 LearningRate 0.0994 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:01:07,131-Speed 2658.75 samples/sec Loss 39.6912 LearningRate 0.0994 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:01:10,981-Speed 2660.29 samples/sec Loss 39.7123 LearningRate 0.0994 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:01:14,798-Speed 2683.82 samples/sec Loss 39.7934 LearningRate 0.0994 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:01:18,650-Speed 2659.18 samples/sec Loss 39.7308 LearningRate 0.0994 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:01:22,497-Speed 2662.28 samples/sec Loss 39.6438 LearningRate 0.0994 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:01:26,343-Speed 2663.31 samples/sec Loss 39.7132 LearningRate 0.0993 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:01:30,691-Speed 2356.47 samples/sec Loss 39.6244 LearningRate 0.0993 Epoch: 0 Global Step: 2720 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:01:34,601-Speed 2619.36 samples/sec Loss 39.6483 LearningRate 0.0993 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:01:38,459-Speed 2654.56 samples/sec Loss 39.6867 LearningRate 0.0993 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:01:42,311-Speed 2659.76 samples/sec Loss 39.5841 LearningRate 0.0993 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:01:46,159-Speed 2661.83 samples/sec Loss 39.6679 LearningRate 0.0993 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:01:50,014-Speed 2656.78 samples/sec Loss 39.5977 LearningRate 0.0993 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:01:53,880-Speed 2649.34 samples/sec Loss 39.5822 LearningRate 0.0993 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:01:57,729-Speed 2661.24 samples/sec Loss 39.6654 LearningRate 0.0993 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:02:01,650-Speed 2611.86 samples/sec Loss 39.4988 LearningRate 0.0993 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:02:06,211-Speed 2245.87 samples/sec Loss 39.5830 LearningRate 0.0993 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:02:10,063-Speed 2658.93 samples/sec Loss 39.4532 LearningRate 0.0993 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:02:13,917-Speed 2657.57 samples/sec Loss 39.5505 LearningRate 0.0993 Epoch: 0 Global Step: 2830 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:02:17,874-Speed 2588.72 samples/sec Loss 39.5254 LearningRate 0.0993 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:02:21,784-Speed 2619.65 samples/sec Loss 39.5191 LearningRate 0.0993 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:02:25,627-Speed 2664.75 samples/sec Loss 39.4855 LearningRate 0.0993 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:02:29,931-Speed 2380.19 samples/sec Loss 39.4702 LearningRate 0.0993 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:02:33,796-Speed 2649.61 samples/sec Loss 39.4657 LearningRate 0.0993 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:02:38,062-Speed 2400.72 samples/sec Loss 39.4489 LearningRate 0.0993 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:02:41,911-Speed 2661.35 samples/sec Loss 39.4566 LearningRate 0.0993 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:02:45,756-Speed 2663.90 samples/sec Loss 39.4355 LearningRate 0.0993 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:02:49,624-Speed 2648.38 samples/sec Loss 39.3976 LearningRate 0.0993 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:02:53,497-Speed 2644.47 samples/sec Loss 39.3934 LearningRate 0.0993 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:02:57,343-Speed 2663.67 samples/sec Loss 39.4137 LearningRate 0.0993 Epoch: 0 Global Step: 2940 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:03:01,193-Speed 2659.80 samples/sec Loss 39.3886 LearningRate 0.0993 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:03:05,099-Speed 2622.61 samples/sec Loss 39.3333 LearningRate 0.0993 Epoch: 0 Global Step: 2960 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:03:08,995-Speed 2628.46 samples/sec Loss 39.2035 LearningRate 0.0993 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:03:12,856-Speed 2653.29 samples/sec Loss 39.2837 LearningRate 0.0993 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:03:16,725-Speed 2647.38 samples/sec Loss 39.3784 LearningRate 0.0993 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:03:20,610-Speed 2636.69 samples/sec Loss 39.3632 LearningRate 0.0993 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:03:24,467-Speed 2655.53 samples/sec Loss 39.3235 LearningRate 0.0993 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:03:28,323-Speed 2657.14 samples/sec Loss 39.1619 LearningRate 0.0993 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:03:32,174-Speed 2658.86 samples/sec Loss 39.3767 LearningRate 0.0993 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:03:36,029-Speed 2656.97 samples/sec Loss 39.2234 LearningRate 0.0993 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:03:39,882-Speed 2658.38 samples/sec Loss 39.2403 LearningRate 0.0993 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:03:43,721-Speed 2668.74 samples/sec Loss 39.1903 LearningRate 0.0993 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:03:47,574-Speed 2657.89 samples/sec Loss 39.2098 LearningRate 0.0993 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:03:51,437-Speed 2651.51 samples/sec Loss 39.1761 LearningRate 0.0993 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:03:55,289-Speed 2658.62 samples/sec Loss 39.1466 LearningRate 0.0993 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:03:59,144-Speed 2657.19 samples/sec Loss 39.1346 LearningRate 0.0993 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:04:03,004-Speed 2653.38 samples/sec Loss 39.1762 LearningRate 0.0993 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:04:06,854-Speed 2660.45 samples/sec Loss 39.1302 LearningRate 0.0992 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:04:10,732-Speed 2641.02 samples/sec Loss 39.1963 LearningRate 0.0992 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:04:14,586-Speed 2657.74 samples/sec Loss 39.0633 LearningRate 0.0992 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:04:18,442-Speed 2656.33 samples/sec Loss 39.1371 LearningRate 0.0992 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:04:22,310-Speed 2648.08 samples/sec Loss 39.1250 LearningRate 0.0992 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:04:26,340-Speed 2541.95 samples/sec Loss 38.9594 LearningRate 0.0992 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:04:30,194-Speed 2657.04 samples/sec Loss 39.0327 LearningRate 0.0992 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:04:34,129-Speed 2603.37 samples/sec Loss 39.1091 LearningRate 0.0992 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:04:37,986-Speed 2655.87 samples/sec Loss 38.9936 LearningRate 0.0992 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:04:41,841-Speed 2656.60 samples/sec Loss 38.9710 LearningRate 0.0992 Epoch: 0 Global Step: 3210 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:04:45,733-Speed 2631.76 samples/sec Loss 38.9302 LearningRate 0.0992 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:04:49,775-Speed 2534.31 samples/sec Loss 39.0973 LearningRate 0.0992 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:04:53,704-Speed 2607.40 samples/sec Loss 39.0165 LearningRate 0.0992 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:04:57,557-Speed 2658.29 samples/sec Loss 39.0115 LearningRate 0.0992 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:05:01,447-Speed 2633.42 samples/sec Loss 39.0031 LearningRate 0.0992 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:05:05,467-Speed 2548.05 samples/sec Loss 38.9980 LearningRate 0.0992 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:05:09,507-Speed 2535.46 samples/sec Loss 38.8823 LearningRate 0.0992 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:05:13,522-Speed 2550.68 samples/sec Loss 38.8606 LearningRate 0.0992 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:05:17,410-Speed 2634.71 samples/sec Loss 38.9217 LearningRate 0.0992 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:05:21,262-Speed 2658.77 samples/sec Loss 38.8542 LearningRate 0.0992 Epoch: 0 Global Step: 3310 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:05:25,167-Speed 2623.21 samples/sec Loss 38.7892 LearningRate 0.0992 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:05:29,031-Speed 2650.59 samples/sec Loss 38.8637 LearningRate 0.0992 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:05:32,918-Speed 2635.39 samples/sec Loss 38.8750 LearningRate 0.0992 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:05:36,783-Speed 2649.88 samples/sec Loss 38.7469 LearningRate 0.0992 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:05:40,681-Speed 2627.60 samples/sec Loss 38.7871 LearningRate 0.0992 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:05:44,541-Speed 2653.45 samples/sec Loss 38.6208 LearningRate 0.0992 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:05:48,416-Speed 2643.59 samples/sec Loss 38.7532 LearningRate 0.0992 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:05:52,262-Speed 2662.76 samples/sec Loss 38.6912 LearningRate 0.0992 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:05:56,140-Speed 2641.60 samples/sec Loss 38.7821 LearningRate 0.0992 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:05:59,993-Speed 2658.14 samples/sec Loss 38.6954 LearningRate 0.0992 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:06:03,849-Speed 2656.94 samples/sec Loss 38.7402 LearningRate 0.0992 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:06:07,714-Speed 2649.46 samples/sec Loss 38.7240 LearningRate 0.0992 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:06:11,582-Speed 2647.70 samples/sec Loss 38.6733 LearningRate 0.0992 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:06:15,433-Speed 2659.60 samples/sec Loss 38.6158 LearningRate 0.0992 Epoch: 0 Global Step: 3450 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:06:19,291-Speed 2654.91 samples/sec Loss 38.5968 LearningRate 0.0992 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:06:23,154-Speed 2651.40 samples/sec Loss 38.6091 LearningRate 0.0992 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:06:27,027-Speed 2644.53 samples/sec Loss 38.6587 LearningRate 0.0992 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:06:30,881-Speed 2658.19 samples/sec Loss 38.5811 LearningRate 0.0992 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:06:34,734-Speed 2658.26 samples/sec Loss 38.7139 LearningRate 0.0992 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:06:38,600-Speed 2649.52 samples/sec Loss 38.4925 LearningRate 0.0992 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:06:42,463-Speed 2651.14 samples/sec Loss 38.5850 LearningRate 0.0992 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:06:46,341-Speed 2640.93 samples/sec Loss 38.5757 LearningRate 0.0992 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:06:50,222-Speed 2639.58 samples/sec Loss 38.5402 LearningRate 0.0991 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:06:54,084-Speed 2651.94 samples/sec Loss 38.5598 LearningRate 0.0991 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:06:57,949-Speed 2650.23 samples/sec Loss 38.5989 LearningRate 0.0991 Epoch: 0 Global Step: 3560 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:07:01,826-Speed 2642.10 samples/sec Loss 38.5215 LearningRate 0.0991 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:07:05,690-Speed 2650.22 samples/sec Loss 38.4559 LearningRate 0.0991 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:07:09,584-Speed 2630.38 samples/sec Loss 38.4935 LearningRate 0.0991 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:07:13,462-Speed 2641.16 samples/sec Loss 38.5077 LearningRate 0.0991 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:07:17,345-Speed 2637.89 samples/sec Loss 38.4518 LearningRate 0.0991 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:07:21,213-Speed 2647.97 samples/sec Loss 38.4694 LearningRate 0.0991 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:07:25,099-Speed 2635.68 samples/sec Loss 38.3846 LearningRate 0.0991 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:07:28,988-Speed 2633.55 samples/sec Loss 38.3976 LearningRate 0.0991 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:07:32,877-Speed 2633.72 samples/sec Loss 38.4048 LearningRate 0.0991 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:07:36,753-Speed 2642.75 samples/sec Loss 38.3564 LearningRate 0.0991 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:07:40,625-Speed 2645.27 samples/sec Loss 38.2873 LearningRate 0.0991 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:07:44,522-Speed 2628.28 samples/sec Loss 38.3042 LearningRate 0.0991 Epoch: 0 Global Step: 3680 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:07:48,558-Speed 2537.31 samples/sec Loss 38.3016 LearningRate 0.0991 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:07:52,474-Speed 2615.40 samples/sec Loss 38.3053 LearningRate 0.0991 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:07:56,362-Speed 2634.51 samples/sec Loss 38.3925 LearningRate 0.0991 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:08:00,248-Speed 2635.85 samples/sec Loss 38.1772 LearningRate 0.0991 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:08:04,125-Speed 2641.81 samples/sec Loss 38.0769 LearningRate 0.0991 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:08:07,995-Speed 2646.69 samples/sec Loss 38.1563 LearningRate 0.0991 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:08:11,870-Speed 2642.82 samples/sec Loss 38.4046 LearningRate 0.0991 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:08:15,736-Speed 2649.26 samples/sec Loss 38.1527 LearningRate 0.0991 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:08:19,609-Speed 2644.66 samples/sec Loss 38.2230 LearningRate 0.0991 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:08:23,490-Speed 2639.61 samples/sec Loss 38.1597 LearningRate 0.0991 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:08:27,385-Speed 2629.22 samples/sec Loss 38.1225 LearningRate 0.0991 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:08:31,289-Speed 2623.33 samples/sec Loss 38.1904 LearningRate 0.0991 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:08:35,174-Speed 2636.40 samples/sec Loss 38.0662 LearningRate 0.0991 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:08:39,057-Speed 2637.45 samples/sec Loss 38.1134 LearningRate 0.0991 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:08:42,955-Speed 2628.18 samples/sec Loss 38.1362 LearningRate 0.0991 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:08:46,839-Speed 2637.44 samples/sec Loss 38.0315 LearningRate 0.0991 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:08:50,713-Speed 2643.78 samples/sec Loss 37.9904 LearningRate 0.0991 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:08:54,598-Speed 2636.24 samples/sec Loss 38.0017 LearningRate 0.0991 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:08:58,479-Speed 2639.57 samples/sec Loss 37.9783 LearningRate 0.0991 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:09:02,353-Speed 2643.23 samples/sec Loss 38.0460 LearningRate 0.0991 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:09:06,231-Speed 2640.96 samples/sec Loss 38.0099 LearningRate 0.0991 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:09:10,129-Speed 2627.99 samples/sec Loss 38.0569 LearningRate 0.0991 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:09:14,024-Speed 2629.17 samples/sec Loss 38.0828 LearningRate 0.0991 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:09:17,912-Speed 2634.62 samples/sec Loss 37.8833 LearningRate 0.0991 Epoch: 0 Global Step: 3920 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:09:21,791-Speed 2640.97 samples/sec Loss 37.8297 LearningRate 0.0991 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:09:25,696-Speed 2622.35 samples/sec Loss 37.9622 LearningRate 0.0991 Epoch: 0 Global Step: 3940 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:09:29,599-Speed 2624.91 samples/sec Loss 37.8013 LearningRate 0.0990 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:09:33,500-Speed 2625.09 samples/sec Loss 37.9154 LearningRate 0.0990 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:09:37,411-Speed 2618.57 samples/sec Loss 37.8259 LearningRate 0.0990 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:09:41,293-Speed 2638.71 samples/sec Loss 37.8161 LearningRate 0.0990 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:09:45,181-Speed 2634.48 samples/sec Loss 37.7841 LearningRate 0.0990 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:09:49,083-Speed 2624.26 samples/sec Loss 37.8513 LearningRate 0.0990 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:09:52,978-Speed 2630.00 samples/sec Loss 37.8044 LearningRate 0.0990 Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:09:56,868-Speed 2632.71 samples/sec Loss 37.7173 LearningRate 0.0990 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:10:00,785-Speed 2615.13 samples/sec Loss 37.7698 LearningRate 0.0990 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:10:04,845-Speed 2523.20 samples/sec Loss 37.8085 LearningRate 0.0990 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:10:08,841-Speed 2562.36 samples/sec Loss 37.8604 LearningRate 0.0990 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:10:12,751-Speed 2619.61 samples/sec Loss 37.7562 LearningRate 0.0990 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:10:16,634-Speed 2637.72 samples/sec Loss 37.6119 LearningRate 0.0990 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:10:20,538-Speed 2623.40 samples/sec Loss 37.6041 LearningRate 0.0990 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:10:24,437-Speed 2626.81 samples/sec Loss 37.6087 LearningRate 0.0990 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:10:28,334-Speed 2628.91 samples/sec Loss 37.6244 LearningRate 0.0990 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:10:32,226-Speed 2631.49 samples/sec Loss 37.6001 LearningRate 0.0990 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:10:36,117-Speed 2632.39 samples/sec Loss 37.6001 LearningRate 0.0990 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:10:40,044-Speed 2608.18 samples/sec Loss 37.5621 LearningRate 0.0990 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:10:43,953-Speed 2619.91 samples/sec Loss 37.6180 LearningRate 0.0990 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:10:47,853-Speed 2626.20 samples/sec Loss 37.6152 LearningRate 0.0990 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:10:51,755-Speed 2624.90 samples/sec Loss 37.5811 LearningRate 0.0990 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:10:55,783-Speed 2543.41 samples/sec Loss 37.6891 LearningRate 0.0990 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:10:59,689-Speed 2622.01 samples/sec Loss 37.4972 LearningRate 0.0990 Epoch: 0 Global Step: 4180 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:11:03,583-Speed 2630.65 samples/sec Loss 37.4964 LearningRate 0.0990 Epoch: 0 Global Step: 4190 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:11:07,527-Speed 2596.67 samples/sec Loss 37.5289 LearningRate 0.0990 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:11:11,548-Speed 2547.33 samples/sec Loss 37.4582 LearningRate 0.0990 Epoch: 0 Global Step: 4210 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:11:15,604-Speed 2525.06 samples/sec Loss 37.4443 LearningRate 0.0990 Epoch: 0 Global Step: 4220 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:11:19,597-Speed 2564.85 samples/sec Loss 37.4963 LearningRate 0.0990 Epoch: 0 Global Step: 4230 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:11:23,497-Speed 2626.22 samples/sec Loss 37.5297 LearningRate 0.0990 Epoch: 0 Global Step: 4240 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:11:27,404-Speed 2622.11 samples/sec Loss 37.3847 LearningRate 0.0990 Epoch: 0 Global Step: 4250 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:11:31,306-Speed 2624.32 samples/sec Loss 37.3837 LearningRate 0.0990 Epoch: 0 Global Step: 4260 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:11:35,243-Speed 2602.34 samples/sec Loss 37.4402 LearningRate 0.0990 Epoch: 0 Global Step: 4270 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:11:39,150-Speed 2621.20 samples/sec Loss 37.4219 LearningRate 0.0990 Epoch: 0 Global Step: 4280 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:11:43,181-Speed 2540.99 samples/sec Loss 37.2878 LearningRate 0.0990 Epoch: 0 Global Step: 4290 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:11:47,230-Speed 2529.57 samples/sec Loss 37.3528 LearningRate 0.0990 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:11:51,133-Speed 2624.40 samples/sec Loss 37.3855 LearningRate 0.0990 Epoch: 0 Global Step: 4310 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:11:55,034-Speed 2626.00 samples/sec Loss 37.3495 LearningRate 0.0990 Epoch: 0 Global Step: 4320 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:11:58,937-Speed 2624.23 samples/sec Loss 37.2557 LearningRate 0.0990 Epoch: 0 Global Step: 4330 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:12:02,844-Speed 2621.67 samples/sec Loss 37.1253 LearningRate 0.0990 Epoch: 0 Global Step: 4340 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:12:06,746-Speed 2625.23 samples/sec Loss 37.2704 LearningRate 0.0990 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:12:10,635-Speed 2633.25 samples/sec Loss 37.2503 LearningRate 0.0990 Epoch: 0 Global Step: 4360 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:12:14,536-Speed 2626.04 samples/sec Loss 37.1716 LearningRate 0.0989 Epoch: 0 Global Step: 4370 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:12:18,483-Speed 2594.93 samples/sec Loss 37.2437 LearningRate 0.0989 Epoch: 0 Global Step: 4380 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:12:22,405-Speed 2611.59 samples/sec Loss 37.2620 LearningRate 0.0989 Epoch: 0 Global Step: 4390 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:12:26,387-Speed 2572.53 samples/sec Loss 37.2134 LearningRate 0.0989 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:12:30,489-Speed 2497.17 samples/sec Loss 37.1718 LearningRate 0.0989 Epoch: 0 Global Step: 4410 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:12:34,486-Speed 2562.14 samples/sec Loss 37.1826 LearningRate 0.0989 Epoch: 0 Global Step: 4420 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:12:38,427-Speed 2598.72 samples/sec Loss 37.1326 LearningRate 0.0989 Epoch: 0 Global Step: 4430 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:12:42,335-Speed 2621.24 samples/sec Loss 37.1341 LearningRate 0.0989 Epoch: 0 Global Step: 4440 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:12:46,436-Speed 2497.60 samples/sec Loss 36.9572 LearningRate 0.0989 Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:12:50,539-Speed 2496.87 samples/sec Loss 37.0367 LearningRate 0.0989 Epoch: 0 Global Step: 4460 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:12:54,536-Speed 2562.60 samples/sec Loss 37.0501 LearningRate 0.0989 Epoch: 0 Global Step: 4470 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:12:58,442-Speed 2621.82 samples/sec Loss 37.0338 LearningRate 0.0989 Epoch: 0 Global Step: 4480 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:13:02,342-Speed 2626.37 samples/sec Loss 37.0853 LearningRate 0.0989 Epoch: 0 Global Step: 4490 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:13:06,251-Speed 2620.26 samples/sec Loss 37.0500 LearningRate 0.0989 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:13:10,159-Speed 2621.48 samples/sec Loss 36.9836 LearningRate 0.0989 Epoch: 0 Global Step: 4510 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:13:14,059-Speed 2626.10 samples/sec Loss 36.8891 LearningRate 0.0989 Epoch: 0 Global Step: 4520 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:13:17,957-Speed 2627.56 samples/sec Loss 36.9875 LearningRate 0.0989 Epoch: 0 Global Step: 4530 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:13:21,877-Speed 2613.38 samples/sec Loss 36.9237 LearningRate 0.0989 Epoch: 0 Global Step: 4540 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:13:25,792-Speed 2615.94 samples/sec Loss 36.9314 LearningRate 0.0989 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:13:29,712-Speed 2613.13 samples/sec Loss 36.8525 LearningRate 0.0989 Epoch: 0 Global Step: 4560 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:13:33,595-Speed 2638.24 samples/sec Loss 36.8027 LearningRate 0.0989 Epoch: 0 Global Step: 4570 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:13:37,497-Speed 2624.63 samples/sec Loss 36.8146 LearningRate 0.0989 Epoch: 0 Global Step: 4580 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:13:41,399-Speed 2624.82 samples/sec Loss 36.9001 LearningRate 0.0989 Epoch: 0 Global Step: 4590 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:13:45,300-Speed 2626.12 samples/sec Loss 36.8108 LearningRate 0.0989 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:13:49,199-Speed 2626.57 samples/sec Loss 36.7389 LearningRate 0.0989 Epoch: 0 Global Step: 4610 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:13:53,119-Speed 2612.91 samples/sec Loss 36.8937 LearningRate 0.0989 Epoch: 0 Global Step: 4620 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:13:57,065-Speed 2596.25 samples/sec Loss 36.6866 LearningRate 0.0989 Epoch: 0 Global Step: 4630 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:14:00,976-Speed 2619.18 samples/sec Loss 36.8824 LearningRate 0.0989 Epoch: 0 Global Step: 4640 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:14:04,982-Speed 2556.22 samples/sec Loss 36.6646 LearningRate 0.0989 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:14:08,889-Speed 2621.42 samples/sec Loss 36.7035 LearningRate 0.0989 Epoch: 0 Global Step: 4660 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:14:12,803-Speed 2616.62 samples/sec Loss 36.6914 LearningRate 0.0989 Epoch: 0 Global Step: 4670 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:14:16,710-Speed 2622.02 samples/sec Loss 36.6477 LearningRate 0.0989 Epoch: 0 Global Step: 4680 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:14:20,599-Speed 2634.62 samples/sec Loss 36.7906 LearningRate 0.0989 Epoch: 0 Global Step: 4690 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:14:24,563-Speed 2583.15 samples/sec Loss 36.7113 LearningRate 0.0989 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:14:28,673-Speed 2493.30 samples/sec Loss 36.7178 LearningRate 0.0989 Epoch: 0 Global Step: 4710 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:14:32,628-Speed 2589.31 samples/sec Loss 36.6456 LearningRate 0.0989 Epoch: 0 Global Step: 4720 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:14:36,536-Speed 2620.96 samples/sec Loss 36.5202 LearningRate 0.0989 Epoch: 0 Global Step: 4730 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:14:40,448-Speed 2618.00 samples/sec Loss 36.5608 LearningRate 0.0989 Epoch: 0 Global Step: 4740 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:14:44,350-Speed 2625.15 samples/sec Loss 36.5546 LearningRate 0.0989 Epoch: 0 Global Step: 4750 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:14:48,248-Speed 2627.50 samples/sec Loss 36.4731 LearningRate 0.0989 Epoch: 0 Global Step: 4760 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:14:52,149-Speed 2626.10 samples/sec Loss 36.4947 LearningRate 0.0989 Epoch: 0 Global Step: 4770 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:14:56,049-Speed 2626.00 samples/sec Loss 36.5854 LearningRate 0.0989 Epoch: 0 Global Step: 4780 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:14:59,971-Speed 2612.20 samples/sec Loss 36.4984 LearningRate 0.0988 Epoch: 0 Global Step: 4790 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:15:03,859-Speed 2633.87 samples/sec Loss 36.5313 LearningRate 0.0988 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:15:07,788-Speed 2606.82 samples/sec Loss 36.4358 LearningRate 0.0988 Epoch: 0 Global Step: 4810 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:15:11,706-Speed 2613.68 samples/sec Loss 36.4918 LearningRate 0.0988 Epoch: 0 Global Step: 4820 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:15:15,629-Speed 2611.60 samples/sec Loss 36.3662 LearningRate 0.0988 Epoch: 0 Global Step: 4830 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:15:19,566-Speed 2601.59 samples/sec Loss 36.3526 LearningRate 0.0988 Epoch: 0 Global Step: 4840 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:15:23,471-Speed 2622.82 samples/sec Loss 36.4004 LearningRate 0.0988 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:15:27,393-Speed 2611.69 samples/sec Loss 36.3576 LearningRate 0.0988 Epoch: 0 Global Step: 4860 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:15:31,318-Speed 2609.86 samples/sec Loss 36.3057 LearningRate 0.0988 Epoch: 0 Global Step: 4870 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:15:35,218-Speed 2625.69 samples/sec Loss 36.3566 LearningRate 0.0988 Epoch: 0 Global Step: 4880 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:15:39,141-Speed 2611.06 samples/sec Loss 36.3466 LearningRate 0.0988 Epoch: 0 Global Step: 4890 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:15:43,082-Speed 2598.53 samples/sec Loss 36.4149 LearningRate 0.0988 Epoch: 0 Global Step: 4900 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:15:46,969-Speed 2635.61 samples/sec Loss 36.2610 LearningRate 0.0988 Epoch: 0 Global Step: 4910 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:15:50,867-Speed 2627.59 samples/sec Loss 36.1962 LearningRate 0.0988 Epoch: 0 Global Step: 4920 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:15:54,762-Speed 2629.88 samples/sec Loss 36.1179 LearningRate 0.0988 Epoch: 0 Global Step: 4930 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:15:58,664-Speed 2624.90 samples/sec Loss 36.1487 LearningRate 0.0988 Epoch: 0 Global Step: 4940 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:16:02,604-Speed 2599.55 samples/sec Loss 36.2884 LearningRate 0.0988 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:16:06,511-Speed 2621.90 samples/sec Loss 36.2084 LearningRate 0.0988 Epoch: 0 Global Step: 4960 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:16:10,422-Speed 2618.48 samples/sec Loss 36.1738 LearningRate 0.0988 Epoch: 0 Global Step: 4970 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:16:14,322-Speed 2627.03 samples/sec Loss 36.1480 LearningRate 0.0988 Epoch: 0 Global Step: 4980 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:16:18,234-Speed 2617.94 samples/sec Loss 36.2383 LearningRate 0.0988 Epoch: 0 Global Step: 4990 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:16:22,143-Speed 2620.02 samples/sec Loss 36.0457 LearningRate 0.0988 Epoch: 0 Global Step: 5000 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:16:26,044-Speed 2626.01 samples/sec Loss 36.0366 LearningRate 0.0988 Epoch: 0 Global Step: 5010 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:16:29,946-Speed 2625.04 samples/sec Loss 36.0548 LearningRate 0.0988 Epoch: 0 Global Step: 5020 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:16:33,847-Speed 2626.04 samples/sec Loss 36.1658 LearningRate 0.0988 Epoch: 0 Global Step: 5030 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:16:37,748-Speed 2625.44 samples/sec Loss 36.0212 LearningRate 0.0988 Epoch: 0 Global Step: 5040 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:16:41,656-Speed 2621.33 samples/sec Loss 36.0153 LearningRate 0.0988 Epoch: 0 Global Step: 5050 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:16:45,560-Speed 2623.03 samples/sec Loss 35.9839 LearningRate 0.0988 Epoch: 0 Global Step: 5060 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:16:49,469-Speed 2620.27 samples/sec Loss 36.0517 LearningRate 0.0988 Epoch: 0 Global Step: 5070 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:16:53,384-Speed 2615.96 samples/sec Loss 35.9245 LearningRate 0.0988 Epoch: 0 Global Step: 5080 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:16:57,285-Speed 2625.59 samples/sec Loss 35.9891 LearningRate 0.0988 Epoch: 0 Global Step: 5090 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:17:01,192-Speed 2621.64 samples/sec Loss 35.9411 LearningRate 0.0988 Epoch: 0 Global Step: 5100 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:17:05,098-Speed 2622.53 samples/sec Loss 35.8126 LearningRate 0.0988 Epoch: 0 Global Step: 5110 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:17:09,014-Speed 2615.56 samples/sec Loss 35.7562 LearningRate 0.0988 Epoch: 0 Global Step: 5120 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:17:12,916-Speed 2624.82 samples/sec Loss 35.8749 LearningRate 0.0988 Epoch: 0 Global Step: 5130 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:17:16,818-Speed 2624.95 samples/sec Loss 35.7547 LearningRate 0.0988 Epoch: 0 Global Step: 5140 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:17:20,892-Speed 2513.59 samples/sec Loss 35.8858 LearningRate 0.0988 Epoch: 0 Global Step: 5150 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:17:24,967-Speed 2513.85 samples/sec Loss 35.7020 LearningRate 0.0988 Epoch: 0 Global Step: 5160 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:17:28,992-Speed 2544.77 samples/sec Loss 35.7964 LearningRate 0.0988 Epoch: 0 Global Step: 5170 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:17:32,901-Speed 2620.35 samples/sec Loss 35.8057 LearningRate 0.0988 Epoch: 0 Global Step: 5180 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:17:36,817-Speed 2615.37 samples/sec Loss 35.7299 LearningRate 0.0988 Epoch: 0 Global Step: 5190 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:17:40,749-Speed 2604.83 samples/sec Loss 35.7588 LearningRate 0.0988 Epoch: 0 Global Step: 5200 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:17:44,654-Speed 2623.16 samples/sec Loss 35.7959 LearningRate 0.0987 Epoch: 0 Global Step: 5210 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:17:48,563-Speed 2619.76 samples/sec Loss 35.7541 LearningRate 0.0987 Epoch: 0 Global Step: 5220 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:17:52,470-Speed 2621.54 samples/sec Loss 35.6726 LearningRate 0.0987 Epoch: 0 Global Step: 5230 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:17:56,390-Speed 2612.65 samples/sec Loss 35.6798 LearningRate 0.0987 Epoch: 0 Global Step: 5240 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:18:00,289-Speed 2626.95 samples/sec Loss 35.8385 LearningRate 0.0987 Epoch: 0 Global Step: 5250 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:18:04,188-Speed 2627.20 samples/sec Loss 35.6452 LearningRate 0.0987 Epoch: 0 Global Step: 5260 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:18:08,096-Speed 2620.91 samples/sec Loss 35.7032 LearningRate 0.0987 Epoch: 0 Global Step: 5270 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:18:11,985-Speed 2633.90 samples/sec Loss 35.6526 LearningRate 0.0987 Epoch: 0 Global Step: 5280 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:18:15,987-Speed 2559.06 samples/sec Loss 35.5451 LearningRate 0.0987 Epoch: 0 Global Step: 5290 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:18:19,886-Speed 2626.71 samples/sec Loss 35.5302 LearningRate 0.0987 Epoch: 0 Global Step: 5300 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:18:23,806-Speed 2612.76 samples/sec Loss 35.5257 LearningRate 0.0987 Epoch: 0 Global Step: 5310 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:18:27,843-Speed 2537.24 samples/sec Loss 35.4904 LearningRate 0.0987 Epoch: 0 Global Step: 5320 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:18:31,822-Speed 2574.63 samples/sec Loss 35.4634 LearningRate 0.0987 Epoch: 0 Global Step: 5330 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:18:35,742-Speed 2613.11 samples/sec Loss 35.4056 LearningRate 0.0987 Epoch: 0 Global Step: 5340 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:18:39,640-Speed 2627.32 samples/sec Loss 35.3940 LearningRate 0.0987 Epoch: 0 Global Step: 5350 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:18:43,540-Speed 2626.20 samples/sec Loss 35.4564 LearningRate 0.0987 Epoch: 0 Global Step: 5360 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:18:47,466-Speed 2608.81 samples/sec Loss 35.3916 LearningRate 0.0987 Epoch: 0 Global Step: 5370 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:18:51,406-Speed 2599.56 samples/sec Loss 35.4319 LearningRate 0.0987 Epoch: 0 Global Step: 5380 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:18:55,317-Speed 2619.08 samples/sec Loss 35.2466 LearningRate 0.0987 Epoch: 0 Global Step: 5390 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:18:59,200-Speed 2637.69 samples/sec Loss 35.4244 LearningRate 0.0987 Epoch: 0 Global Step: 5400 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:19:03,102-Speed 2624.98 samples/sec Loss 35.4258 LearningRate 0.0987 Epoch: 0 Global Step: 5410 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:19:07,006-Speed 2624.07 samples/sec Loss 35.4317 LearningRate 0.0987 Epoch: 0 Global Step: 5420 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:19:10,906-Speed 2626.55 samples/sec Loss 35.2272 LearningRate 0.0987 Epoch: 0 Global Step: 5430 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:19:14,805-Speed 2626.73 samples/sec Loss 35.3857 LearningRate 0.0987 Epoch: 0 Global Step: 5440 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:19:18,715-Speed 2619.22 samples/sec Loss 35.2411 LearningRate 0.0987 Epoch: 0 Global Step: 5450 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:19:22,642-Speed 2608.48 samples/sec Loss 35.2241 LearningRate 0.0987 Epoch: 0 Global Step: 5460 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:19:26,552-Speed 2619.95 samples/sec Loss 35.2367 LearningRate 0.0987 Epoch: 0 Global Step: 5470 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:19:30,455-Speed 2624.37 samples/sec Loss 35.3222 LearningRate 0.0987 Epoch: 0 Global Step: 5480 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:19:34,361-Speed 2621.92 samples/sec Loss 35.3202 LearningRate 0.0987 Epoch: 0 Global Step: 5490 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:19:38,265-Speed 2623.56 samples/sec Loss 35.1747 LearningRate 0.0987 Epoch: 0 Global Step: 5500 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:19:42,178-Speed 2617.31 samples/sec Loss 35.2025 LearningRate 0.0987 Epoch: 0 Global Step: 5510 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:19:46,107-Speed 2607.10 samples/sec Loss 35.0848 LearningRate 0.0987 Epoch: 0 Global Step: 5520 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:19:50,007-Speed 2626.03 samples/sec Loss 35.0078 LearningRate 0.0987 Epoch: 0 Global Step: 5530 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:19:53,920-Speed 2618.55 samples/sec Loss 35.0596 LearningRate 0.0987 Epoch: 0 Global Step: 5540 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:19:57,821-Speed 2625.17 samples/sec Loss 35.0623 LearningRate 0.0987 Epoch: 0 Global Step: 5550 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:20:01,786-Speed 2583.85 samples/sec Loss 35.0539 LearningRate 0.0987 Epoch: 0 Global Step: 5560 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:20:05,728-Speed 2598.53 samples/sec Loss 35.1178 LearningRate 0.0987 Epoch: 0 Global Step: 5570 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:20:09,637-Speed 2619.54 samples/sec Loss 34.9651 LearningRate 0.0987 Epoch: 0 Global Step: 5580 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:20:13,539-Speed 2625.08 samples/sec Loss 34.9361 LearningRate 0.0987 Epoch: 0 Global Step: 5590 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:20:17,441-Speed 2625.02 samples/sec Loss 34.8686 LearningRate 0.0987 Epoch: 0 Global Step: 5600 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:20:21,348-Speed 2621.97 samples/sec Loss 35.0181 LearningRate 0.0987 Epoch: 0 Global Step: 5610 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:20:25,463-Speed 2488.89 samples/sec Loss 34.8929 LearningRate 0.0986 Epoch: 0 Global Step: 5620 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:20:29,518-Speed 2526.24 samples/sec Loss 35.0111 LearningRate 0.0986 Epoch: 0 Global Step: 5630 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:20:33,422-Speed 2624.05 samples/sec Loss 35.0492 LearningRate 0.0986 Epoch: 0 Global Step: 5640 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:20:37,450-Speed 2542.99 samples/sec Loss 34.8013 LearningRate 0.0986 Epoch: 0 Global Step: 5650 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:20:41,347-Speed 2628.03 samples/sec Loss 34.9848 LearningRate 0.0986 Epoch: 0 Global Step: 5660 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:20:45,246-Speed 2626.94 samples/sec Loss 34.8505 LearningRate 0.0986 Epoch: 0 Global Step: 5670 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:20:49,142-Speed 2628.83 samples/sec Loss 34.9822 LearningRate 0.0986 Epoch: 0 Global Step: 5680 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:20:53,045-Speed 2624.67 samples/sec Loss 34.7693 LearningRate 0.0986 Epoch: 0 Global Step: 5690 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:20:57,050-Speed 2557.70 samples/sec Loss 34.8846 LearningRate 0.0986 Epoch: 0 Global Step: 5700 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:21:00,950-Speed 2628.25 samples/sec Loss 34.8628 LearningRate 0.0986 Epoch: 0 Global Step: 5710 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:21:04,856-Speed 2622.13 samples/sec Loss 34.7246 LearningRate 0.0986 Epoch: 0 Global Step: 5720 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:21:08,772-Speed 2615.34 samples/sec Loss 34.8403 LearningRate 0.0986 Epoch: 0 Global Step: 5730 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:21:12,673-Speed 2625.43 samples/sec Loss 34.5997 LearningRate 0.0986 Epoch: 0 Global Step: 5740 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:21:16,577-Speed 2623.67 samples/sec Loss 34.6872 LearningRate 0.0986 Epoch: 0 Global Step: 5750 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:21:20,630-Speed 2527.47 samples/sec Loss 34.5773 LearningRate 0.0986 Epoch: 0 Global Step: 5760 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:21:24,676-Speed 2531.66 samples/sec Loss 34.6448 LearningRate 0.0986 Epoch: 0 Global Step: 5770 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:21:28,563-Speed 2635.00 samples/sec Loss 34.7026 LearningRate 0.0986 Epoch: 0 Global Step: 5780 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:21:32,538-Speed 2576.75 samples/sec Loss 34.4643 LearningRate 0.0986 Epoch: 0 Global Step: 5790 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:21:36,437-Speed 2627.53 samples/sec Loss 34.7998 LearningRate 0.0986 Epoch: 0 Global Step: 5800 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:21:40,337-Speed 2625.92 samples/sec Loss 34.5995 LearningRate 0.0986 Epoch: 0 Global Step: 5810 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:21:44,371-Speed 2538.67 samples/sec Loss 34.5848 LearningRate 0.0986 Epoch: 0 Global Step: 5820 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:21:48,364-Speed 2565.42 samples/sec Loss 34.4339 LearningRate 0.0986 Epoch: 0 Global Step: 5830 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:21:52,266-Speed 2624.94 samples/sec Loss 34.5216 LearningRate 0.0986 Epoch: 0 Global Step: 5840 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:21:56,187-Speed 2612.77 samples/sec Loss 34.4249 LearningRate 0.0986 Epoch: 0 Global Step: 5850 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:22:00,120-Speed 2604.08 samples/sec Loss 34.4100 LearningRate 0.0986 Epoch: 0 Global Step: 5860 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:22:04,036-Speed 2615.87 samples/sec Loss 34.3533 LearningRate 0.0986 Epoch: 0 Global Step: 5870 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:22:07,958-Speed 2611.17 samples/sec Loss 34.5426 LearningRate 0.0986 Epoch: 0 Global Step: 5880 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:22:11,867-Speed 2620.47 samples/sec Loss 34.3541 LearningRate 0.0986 Epoch: 0 Global Step: 5890 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:22:15,788-Speed 2612.66 samples/sec Loss 34.5030 LearningRate 0.0986 Epoch: 0 Global Step: 5900 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:22:19,694-Speed 2622.12 samples/sec Loss 34.3219 LearningRate 0.0986 Epoch: 0 Global Step: 5910 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:22:23,601-Speed 2621.35 samples/sec Loss 34.4173 LearningRate 0.0986 Epoch: 0 Global Step: 5920 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:22:27,501-Speed 2626.25 samples/sec Loss 34.3737 LearningRate 0.0986 Epoch: 0 Global Step: 5930 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:22:31,401-Speed 2627.05 samples/sec Loss 34.3680 LearningRate 0.0986 Epoch: 0 Global Step: 5940 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:22:35,306-Speed 2622.70 samples/sec Loss 34.1544 LearningRate 0.0986 Epoch: 0 Global Step: 5950 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:22:39,212-Speed 2622.28 samples/sec Loss 34.2713 LearningRate 0.0986 Epoch: 0 Global Step: 5960 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:22:43,122-Speed 2619.93 samples/sec Loss 34.3173 LearningRate 0.0986 Epoch: 0 Global Step: 5970 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:22:47,057-Speed 2602.52 samples/sec Loss 34.3726 LearningRate 0.0986 Epoch: 0 Global Step: 5980 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:22:51,032-Speed 2579.25 samples/sec Loss 34.0460 LearningRate 0.0986 Epoch: 0 Global Step: 5990 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:22:55,105-Speed 2514.66 samples/sec Loss 34.1723 LearningRate 0.0986 Epoch: 0 Global Step: 6000 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:22:59,115-Speed 2553.79 samples/sec Loss 34.1425 LearningRate 0.0986 Epoch: 0 Global Step: 6010 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:23:03,068-Speed 2591.53 samples/sec Loss 33.9713 LearningRate 0.0986 Epoch: 0 Global Step: 6020 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:23:06,971-Speed 2624.52 samples/sec Loss 34.1654 LearningRate 0.0986 Epoch: 0 Global Step: 6030 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:23:10,881-Speed 2620.12 samples/sec Loss 34.1128 LearningRate 0.0985 Epoch: 0 Global Step: 6040 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:23:14,785-Speed 2623.55 samples/sec Loss 34.1258 LearningRate 0.0985 Epoch: 0 Global Step: 6050 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:23:18,708-Speed 2610.87 samples/sec Loss 34.0147 LearningRate 0.0985 Epoch: 0 Global Step: 6060 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:23:22,616-Speed 2620.94 samples/sec Loss 34.0261 LearningRate 0.0985 Epoch: 0 Global Step: 6070 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:23:26,530-Speed 2617.11 samples/sec Loss 34.1214 LearningRate 0.0985 Epoch: 0 Global Step: 6080 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:23:30,452-Speed 2611.59 samples/sec Loss 34.0655 LearningRate 0.0985 Epoch: 0 Global Step: 6090 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:23:34,354-Speed 2625.16 samples/sec Loss 34.0430 LearningRate 0.0985 Epoch: 0 Global Step: 6100 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:23:38,267-Speed 2617.18 samples/sec Loss 33.9728 LearningRate 0.0985 Epoch: 0 Global Step: 6110 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:23:42,202-Speed 2603.39 samples/sec Loss 33.8394 LearningRate 0.0985 Epoch: 0 Global Step: 6120 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:23:46,108-Speed 2622.32 samples/sec Loss 33.9915 LearningRate 0.0985 Epoch: 0 Global Step: 6130 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:23:50,016-Speed 2621.10 samples/sec Loss 33.9473 LearningRate 0.0985 Epoch: 0 Global Step: 6140 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:23:53,934-Speed 2614.13 samples/sec Loss 33.8250 LearningRate 0.0985 Epoch: 0 Global Step: 6150 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:23:57,839-Speed 2623.07 samples/sec Loss 33.8547 LearningRate 0.0985 Epoch: 0 Global Step: 6160 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:24:01,742-Speed 2624.20 samples/sec Loss 33.7300 LearningRate 0.0985 Epoch: 0 Global Step: 6170 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:24:05,645-Speed 2624.49 samples/sec Loss 33.8557 LearningRate 0.0985 Epoch: 0 Global Step: 6180 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:24:09,550-Speed 2622.47 samples/sec Loss 33.7938 LearningRate 0.0985 Epoch: 0 Global Step: 6190 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:24:13,463-Speed 2617.85 samples/sec Loss 33.8555 LearningRate 0.0985 Epoch: 0 Global Step: 6200 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:24:17,382-Speed 2613.41 samples/sec Loss 33.8633 LearningRate 0.0985 Epoch: 0 Global Step: 6210 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:24:21,285-Speed 2624.88 samples/sec Loss 33.6847 LearningRate 0.0985 Epoch: 0 Global Step: 6220 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:24:25,198-Speed 2616.90 samples/sec Loss 33.6890 LearningRate 0.0985 Epoch: 0 Global Step: 6230 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:24:29,102-Speed 2624.00 samples/sec Loss 33.5697 LearningRate 0.0985 Epoch: 0 Global Step: 6240 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:24:33,004-Speed 2624.77 samples/sec Loss 33.6544 LearningRate 0.0985 Epoch: 0 Global Step: 6250 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:24:36,938-Speed 2603.73 samples/sec Loss 33.7582 LearningRate 0.0985 Epoch: 0 Global Step: 6260 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:24:40,841-Speed 2624.44 samples/sec Loss 33.7559 LearningRate 0.0985 Epoch: 0 Global Step: 6270 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:24:44,766-Speed 2609.38 samples/sec Loss 33.6103 LearningRate 0.0985 Epoch: 0 Global Step: 6280 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:24:48,685-Speed 2614.13 samples/sec Loss 33.5045 LearningRate 0.0985 Epoch: 0 Global Step: 6290 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:24:52,588-Speed 2624.63 samples/sec Loss 33.5311 LearningRate 0.0985 Epoch: 0 Global Step: 6300 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:24:56,511-Speed 2611.14 samples/sec Loss 33.6326 LearningRate 0.0985 Epoch: 0 Global Step: 6310 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:25:00,414-Speed 2624.04 samples/sec Loss 33.4163 LearningRate 0.0985 Epoch: 0 Global Step: 6320 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:25:04,317-Speed 2623.95 samples/sec Loss 33.4364 LearningRate 0.0985 Epoch: 0 Global Step: 6330 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:25:08,217-Speed 2626.03 samples/sec Loss 33.4079 LearningRate 0.0985 Epoch: 0 Global Step: 6340 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:25:12,122-Speed 2623.45 samples/sec Loss 33.4634 LearningRate 0.0985 Epoch: 0 Global Step: 6350 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:25:16,024-Speed 2624.90 samples/sec Loss 33.6069 LearningRate 0.0985 Epoch: 0 Global Step: 6360 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:25:19,935-Speed 2618.80 samples/sec Loss 33.2852 LearningRate 0.0985 Epoch: 0 Global Step: 6370 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:25:23,865-Speed 2606.24 samples/sec Loss 33.5009 LearningRate 0.0985 Epoch: 0 Global Step: 6380 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:25:27,763-Speed 2628.02 samples/sec Loss 33.3382 LearningRate 0.0985 Epoch: 0 Global Step: 6390 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:25:31,680-Speed 2615.35 samples/sec Loss 33.3261 LearningRate 0.0985 Epoch: 0 Global Step: 6400 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:25:35,564-Speed 2637.06 samples/sec Loss 33.3406 LearningRate 0.0985 Epoch: 0 Global Step: 6410 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:25:39,471-Speed 2621.36 samples/sec Loss 33.2412 LearningRate 0.0985 Epoch: 0 Global Step: 6420 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:25:43,386-Speed 2616.24 samples/sec Loss 33.4366 LearningRate 0.0985 Epoch: 0 Global Step: 6430 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:25:47,303-Speed 2615.01 samples/sec Loss 33.2903 LearningRate 0.0985 Epoch: 0 Global Step: 6440 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:25:51,204-Speed 2625.83 samples/sec Loss 33.3708 LearningRate 0.0985 Epoch: 0 Global Step: 6450 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:25:55,104-Speed 2626.09 samples/sec Loss 33.1003 LearningRate 0.0984 Epoch: 0 Global Step: 6460 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:25:59,009-Speed 2623.40 samples/sec Loss 33.1680 LearningRate 0.0984 Epoch: 0 Global Step: 6470 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:26:02,910-Speed 2625.47 samples/sec Loss 33.1572 LearningRate 0.0984 Epoch: 0 Global Step: 6480 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:26:06,807-Speed 2628.14 samples/sec Loss 33.0730 LearningRate 0.0984 Epoch: 0 Global Step: 6490 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:26:10,707-Speed 2625.83 samples/sec Loss 33.1681 LearningRate 0.0984 Epoch: 0 Global Step: 6500 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:26:14,586-Speed 2640.53 samples/sec Loss 33.0487 LearningRate 0.0984 Epoch: 0 Global Step: 6510 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:26:18,491-Speed 2622.92 samples/sec Loss 33.1290 LearningRate 0.0984 Epoch: 0 Global Step: 6520 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:26:22,396-Speed 2623.36 samples/sec Loss 33.1780 LearningRate 0.0984 Epoch: 0 Global Step: 6530 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:26:26,297-Speed 2625.51 samples/sec Loss 33.0306 LearningRate 0.0984 Epoch: 0 Global Step: 6540 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:26:30,201-Speed 2623.49 samples/sec Loss 33.1446 LearningRate 0.0984 Epoch: 0 Global Step: 6550 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:26:34,105-Speed 2623.54 samples/sec Loss 32.9486 LearningRate 0.0984 Epoch: 0 Global Step: 6560 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:26:38,028-Speed 2610.81 samples/sec Loss 32.9195 LearningRate 0.0984 Epoch: 0 Global Step: 6570 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:26:41,935-Speed 2621.10 samples/sec Loss 32.9230 LearningRate 0.0984 Epoch: 0 Global Step: 6580 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:26:45,848-Speed 2617.60 samples/sec Loss 32.9465 LearningRate 0.0984 Epoch: 0 Global Step: 6590 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:26:49,751-Speed 2624.25 samples/sec Loss 32.8549 LearningRate 0.0984 Epoch: 0 Global Step: 6600 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:26:53,734-Speed 2571.45 samples/sec Loss 32.9468 LearningRate 0.0984 Epoch: 0 Global Step: 6610 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:26:57,805-Speed 2515.96 samples/sec Loss 32.9403 LearningRate 0.0984 Epoch: 0 Global Step: 6620 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:27:01,905-Speed 2497.86 samples/sec Loss 32.8940 LearningRate 0.0984 Epoch: 0 Global Step: 6630 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:27:05,834-Speed 2607.36 samples/sec Loss 32.8663 LearningRate 0.0984 Epoch: 0 Global Step: 6640 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:27:09,718-Speed 2636.67 samples/sec Loss 32.8623 LearningRate 0.0984 Epoch: 0 Global Step: 6650 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:27:13,614-Speed 2629.42 samples/sec Loss 32.6287 LearningRate 0.0984 Epoch: 0 Global Step: 6660 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:27:17,517-Speed 2624.06 samples/sec Loss 32.6822 LearningRate 0.0984 Epoch: 0 Global Step: 6670 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:27:21,420-Speed 2624.17 samples/sec Loss 32.8240 LearningRate 0.0984 Epoch: 0 Global Step: 6680 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:27:25,331-Speed 2619.00 samples/sec Loss 32.7200 LearningRate 0.0984 Epoch: 0 Global Step: 6690 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:27:29,267-Speed 2602.62 samples/sec Loss 32.6346 LearningRate 0.0984 Epoch: 0 Global Step: 6700 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:27:33,173-Speed 2621.88 samples/sec Loss 32.6765 LearningRate 0.0984 Epoch: 0 Global Step: 6710 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:27:37,084-Speed 2619.00 samples/sec Loss 32.7087 LearningRate 0.0984 Epoch: 0 Global Step: 6720 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:27:41,001-Speed 2614.30 samples/sec Loss 32.6512 LearningRate 0.0984 Epoch: 0 Global Step: 6730 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:27:44,917-Speed 2615.87 samples/sec Loss 32.5247 LearningRate 0.0984 Epoch: 0 Global Step: 6740 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:27:48,827-Speed 2619.53 samples/sec Loss 32.6963 LearningRate 0.0984 Epoch: 0 Global Step: 6750 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:27:52,739-Speed 2618.82 samples/sec Loss 32.5356 LearningRate 0.0984 Epoch: 0 Global Step: 6760 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:27:56,652-Speed 2616.86 samples/sec Loss 32.5219 LearningRate 0.0984 Epoch: 0 Global Step: 6770 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:28:00,542-Speed 2633.06 samples/sec Loss 32.5763 LearningRate 0.0984 Epoch: 0 Global Step: 6780 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:28:04,450-Speed 2621.17 samples/sec Loss 32.4855 LearningRate 0.0984 Epoch: 0 Global Step: 6790 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:28:08,357-Speed 2620.85 samples/sec Loss 32.5242 LearningRate 0.0984 Epoch: 0 Global Step: 6800 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:28:12,267-Speed 2619.36 samples/sec Loss 32.6748 LearningRate 0.0984 Epoch: 0 Global Step: 6810 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:28:16,176-Speed 2620.70 samples/sec Loss 32.4196 LearningRate 0.0984 Epoch: 0 Global Step: 6820 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:28:20,090-Speed 2616.62 samples/sec Loss 32.6146 LearningRate 0.0984 Epoch: 0 Global Step: 6830 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:28:23,988-Speed 2627.55 samples/sec Loss 32.3633 LearningRate 0.0984 Epoch: 0 Global Step: 6840 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:28:27,908-Speed 2613.52 samples/sec Loss 32.3764 LearningRate 0.0984 Epoch: 0 Global Step: 6850 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:28:31,888-Speed 2573.39 samples/sec Loss 32.3350 LearningRate 0.0984 Epoch: 0 Global Step: 6860 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:28:35,790-Speed 2624.50 samples/sec Loss 32.5603 LearningRate 0.0984 Epoch: 0 Global Step: 6870 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:28:39,687-Speed 2628.05 samples/sec Loss 32.3605 LearningRate 0.0983 Epoch: 0 Global Step: 6880 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:28:43,588-Speed 2626.02 samples/sec Loss 32.2889 LearningRate 0.0983 Epoch: 0 Global Step: 6890 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:28:47,465-Speed 2641.56 samples/sec Loss 32.3160 LearningRate 0.0983 Epoch: 0 Global Step: 6900 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:28:51,419-Speed 2590.54 samples/sec Loss 32.3068 LearningRate 0.0983 Epoch: 0 Global Step: 6910 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:28:55,402-Speed 2571.66 samples/sec Loss 32.2107 LearningRate 0.0983 Epoch: 0 Global Step: 6920 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:28:59,300-Speed 2627.35 samples/sec Loss 32.2756 LearningRate 0.0983 Epoch: 0 Global Step: 6930 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:29:03,202-Speed 2624.60 samples/sec Loss 32.1582 LearningRate 0.0983 Epoch: 0 Global Step: 6940 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:29:07,104-Speed 2625.32 samples/sec Loss 32.3757 LearningRate 0.0983 Epoch: 0 Global Step: 6950 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:29:11,160-Speed 2524.91 samples/sec Loss 32.1219 LearningRate 0.0983 Epoch: 0 Global Step: 6960 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:29:15,258-Speed 2499.59 samples/sec Loss 32.1021 LearningRate 0.0983 Epoch: 0 Global Step: 6970 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:29:19,361-Speed 2496.11 samples/sec Loss 32.0510 LearningRate 0.0983 Epoch: 0 Global Step: 6980 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:29:23,456-Speed 2500.80 samples/sec Loss 32.0516 LearningRate 0.0983 Epoch: 0 Global Step: 6990 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:29:27,428-Speed 2578.78 samples/sec Loss 32.2105 LearningRate 0.0983 Epoch: 0 Global Step: 7000 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:29:31,331-Speed 2624.55 samples/sec Loss 31.9394 LearningRate 0.0983 Epoch: 0 Global Step: 7010 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:29:35,229-Speed 2627.78 samples/sec Loss 32.1908 LearningRate 0.0983 Epoch: 0 Global Step: 7020 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:29:39,124-Speed 2629.29 samples/sec Loss 32.0051 LearningRate 0.0983 Epoch: 0 Global Step: 7030 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:29:43,024-Speed 2626.64 samples/sec Loss 32.0673 LearningRate 0.0983 Epoch: 0 Global Step: 7040 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:29:46,923-Speed 2627.00 samples/sec Loss 32.0858 LearningRate 0.0983 Epoch: 0 Global Step: 7050 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:29:50,819-Speed 2628.65 samples/sec Loss 31.8388 LearningRate 0.0983 Epoch: 0 Global Step: 7060 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:29:54,717-Speed 2627.59 samples/sec Loss 31.9085 LearningRate 0.0983 Epoch: 0 Global Step: 7070 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:29:58,627-Speed 2619.20 samples/sec Loss 31.9363 LearningRate 0.0983 Epoch: 0 Global Step: 7080 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:30:02,529-Speed 2625.16 samples/sec Loss 31.9298 LearningRate 0.0983 Epoch: 0 Global Step: 7090 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:30:06,433-Speed 2623.54 samples/sec Loss 31.8660 LearningRate 0.0983 Epoch: 0 Global Step: 7100 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:30:10,325-Speed 2631.90 samples/sec Loss 31.8789 LearningRate 0.0983 Epoch: 0 Global Step: 7110 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:30:14,227-Speed 2624.65 samples/sec Loss 31.6086 LearningRate 0.0983 Epoch: 0 Global Step: 7120 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:30:18,238-Speed 2554.05 samples/sec Loss 32.0320 LearningRate 0.0983 Epoch: 0 Global Step: 7130 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:30:22,144-Speed 2621.48 samples/sec Loss 31.8754 LearningRate 0.0983 Epoch: 0 Global Step: 7140 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:30:26,045-Speed 2625.88 samples/sec Loss 31.8391 LearningRate 0.0983 Epoch: 0 Global Step: 7150 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:30:29,944-Speed 2627.59 samples/sec Loss 31.8138 LearningRate 0.0983 Epoch: 0 Global Step: 7160 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:30:33,852-Speed 2620.33 samples/sec Loss 31.6954 LearningRate 0.0983 Epoch: 0 Global Step: 7170 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:30:37,776-Speed 2610.02 samples/sec Loss 31.6759 LearningRate 0.0983 Epoch: 0 Global Step: 7180 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:30:41,679-Speed 2625.01 samples/sec Loss 31.5674 LearningRate 0.0983 Epoch: 0 Global Step: 7190 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:30:45,647-Speed 2580.75 samples/sec Loss 31.8369 LearningRate 0.0983 Epoch: 0 Global Step: 7200 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:30:49,600-Speed 2591.37 samples/sec Loss 31.8668 LearningRate 0.0983 Epoch: 0 Global Step: 7210 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:30:53,588-Speed 2568.18 samples/sec Loss 31.5941 LearningRate 0.0983 Epoch: 0 Global Step: 7220 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:30:57,490-Speed 2625.21 samples/sec Loss 31.6417 LearningRate 0.0983 Epoch: 0 Global Step: 7230 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:31:01,527-Speed 2536.88 samples/sec Loss 31.4814 LearningRate 0.0983 Epoch: 0 Global Step: 7240 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:31:05,566-Speed 2536.11 samples/sec Loss 31.4879 LearningRate 0.0983 Epoch: 0 Global Step: 7250 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:31:09,470-Speed 2623.33 samples/sec Loss 31.4460 LearningRate 0.0983 Epoch: 0 Global Step: 7260 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:31:13,371-Speed 2625.52 samples/sec Loss 31.4880 LearningRate 0.0983 Epoch: 0 Global Step: 7270 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:31:17,349-Speed 2575.54 samples/sec Loss 31.4714 LearningRate 0.0983 Epoch: 0 Global Step: 7280 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:31:21,452-Speed 2495.98 samples/sec Loss 31.3622 LearningRate 0.0983 Epoch: 0 Global Step: 7290 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:31:25,361-Speed 2620.54 samples/sec Loss 31.4135 LearningRate 0.0982 Epoch: 0 Global Step: 7300 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:31:29,260-Speed 2627.19 samples/sec Loss 31.1969 LearningRate 0.0982 Epoch: 0 Global Step: 7310 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:31:33,157-Speed 2628.02 samples/sec Loss 31.3781 LearningRate 0.0982 Epoch: 0 Global Step: 7320 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:31:37,117-Speed 2586.38 samples/sec Loss 31.5173 LearningRate 0.0982 Epoch: 0 Global Step: 7330 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:31:41,125-Speed 2555.57 samples/sec Loss 31.3308 LearningRate 0.0982 Epoch: 0 Global Step: 7340 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:31:45,088-Speed 2584.67 samples/sec Loss 31.2794 LearningRate 0.0982 Epoch: 0 Global Step: 7350 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:31:48,996-Speed 2621.07 samples/sec Loss 31.2997 LearningRate 0.0982 Epoch: 0 Global Step: 7360 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:31:52,902-Speed 2622.31 samples/sec Loss 31.3638 LearningRate 0.0982 Epoch: 0 Global Step: 7370 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:31:56,800-Speed 2628.06 samples/sec Loss 31.1534 LearningRate 0.0982 Epoch: 0 Global Step: 7380 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:32:00,698-Speed 2627.12 samples/sec Loss 31.3053 LearningRate 0.0982 Epoch: 0 Global Step: 7390 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:32:04,601-Speed 2624.34 samples/sec Loss 31.1291 LearningRate 0.0982 Epoch: 0 Global Step: 7400 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:32:08,506-Speed 2622.63 samples/sec Loss 31.1307 LearningRate 0.0982 Epoch: 0 Global Step: 7410 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:32:12,431-Speed 2609.82 samples/sec Loss 31.1277 LearningRate 0.0982 Epoch: 0 Global Step: 7420 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:32:16,339-Speed 2621.18 samples/sec Loss 31.1308 LearningRate 0.0982 Epoch: 0 Global Step: 7430 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:32:20,240-Speed 2625.90 samples/sec Loss 31.1896 LearningRate 0.0982 Epoch: 0 Global Step: 7440 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:32:24,146-Speed 2621.85 samples/sec Loss 31.0132 LearningRate 0.0982 Epoch: 0 Global Step: 7450 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:32:28,072-Speed 2609.54 samples/sec Loss 30.8965 LearningRate 0.0982 Epoch: 0 Global Step: 7460 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:32:31,976-Speed 2623.39 samples/sec Loss 31.1256 LearningRate 0.0982 Epoch: 0 Global Step: 7470 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:32:35,888-Speed 2618.19 samples/sec Loss 31.2007 LearningRate 0.0982 Epoch: 0 Global Step: 7480 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:32:39,788-Speed 2626.37 samples/sec Loss 30.9814 LearningRate 0.0982 Epoch: 0 Global Step: 7490 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:32:43,692-Speed 2623.88 samples/sec Loss 30.8926 LearningRate 0.0982 Epoch: 0 Global Step: 7500 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:32:47,696-Speed 2558.26 samples/sec Loss 30.9746 LearningRate 0.0982 Epoch: 0 Global Step: 7510 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:32:51,805-Speed 2492.68 samples/sec Loss 30.9087 LearningRate 0.0982 Epoch: 0 Global Step: 7520 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:32:55,720-Speed 2616.49 samples/sec Loss 30.9580 LearningRate 0.0982 Epoch: 0 Global Step: 7530 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:32:59,623-Speed 2624.23 samples/sec Loss 31.0432 LearningRate 0.0982 Epoch: 0 Global Step: 7540 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:33:03,529-Speed 2621.94 samples/sec Loss 30.8776 LearningRate 0.0982 Epoch: 0 Global Step: 7550 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:33:07,430-Speed 2625.54 samples/sec Loss 30.9190 LearningRate 0.0982 Epoch: 0 Global Step: 7560 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:33:11,396-Speed 2583.30 samples/sec Loss 31.0235 LearningRate 0.0982 Epoch: 0 Global Step: 7570 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:33:15,504-Speed 2493.06 samples/sec Loss 30.7353 LearningRate 0.0982 Epoch: 0 Global Step: 7580 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:33:19,617-Speed 2490.48 samples/sec Loss 30.9403 LearningRate 0.0982 Epoch: 0 Global Step: 7590 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:33:23,686-Speed 2516.91 samples/sec Loss 30.8062 LearningRate 0.0982 Epoch: 0 Global Step: 7600 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:33:27,614-Speed 2607.67 samples/sec Loss 30.6895 LearningRate 0.0982 Epoch: 0 Global Step: 7610 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:33:31,536-Speed 2612.32 samples/sec Loss 30.7420 LearningRate 0.0982 Epoch: 0 Global Step: 7620 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:33:35,456-Speed 2612.68 samples/sec Loss 30.7051 LearningRate 0.0982 Epoch: 0 Global Step: 7630 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:33:39,358-Speed 2624.40 samples/sec Loss 30.5601 LearningRate 0.0982 Epoch: 0 Global Step: 7640 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:33:43,260-Speed 2624.92 samples/sec Loss 30.7237 LearningRate 0.0982 Epoch: 0 Global Step: 7650 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:33:47,199-Speed 2600.65 samples/sec Loss 30.6493 LearningRate 0.0982 Epoch: 0 Global Step: 7660 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:33:51,116-Speed 2615.34 samples/sec Loss 30.5775 LearningRate 0.0982 Epoch: 0 Global Step: 7670 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:33:55,016-Speed 2626.23 samples/sec Loss 30.6444 LearningRate 0.0982 Epoch: 0 Global Step: 7680 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:33:58,916-Speed 2626.40 samples/sec Loss 30.4838 LearningRate 0.0982 Epoch: 0 Global Step: 7690 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:34:02,821-Speed 2622.23 samples/sec Loss 30.6008 LearningRate 0.0982 Epoch: 0 Global Step: 7700 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:34:06,745-Speed 2609.91 samples/sec Loss 30.4543 LearningRate 0.0981 Epoch: 0 Global Step: 7710 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:34:10,830-Speed 2507.87 samples/sec Loss 30.7298 LearningRate 0.0981 Epoch: 0 Global Step: 7720 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:34:14,897-Speed 2518.15 samples/sec Loss 30.4743 LearningRate 0.0981 Epoch: 0 Global Step: 7730 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:34:19,010-Speed 2490.39 samples/sec Loss 30.4403 LearningRate 0.0981 Epoch: 0 Global Step: 7740 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:34:23,079-Speed 2517.56 samples/sec Loss 30.4858 LearningRate 0.0981 Epoch: 0 Global Step: 7750 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:34:27,005-Speed 2608.88 samples/sec Loss 30.7023 LearningRate 0.0981 Epoch: 0 Global Step: 7760 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:34:30,919-Speed 2617.44 samples/sec Loss 30.5233 LearningRate 0.0981 Epoch: 0 Global Step: 7770 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:34:34,799-Speed 2639.58 samples/sec Loss 30.4448 LearningRate 0.0981 Epoch: 0 Global Step: 7780 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:34:38,725-Speed 2609.42 samples/sec Loss 30.5592 LearningRate 0.0981 Epoch: 0 Global Step: 7790 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:34:42,622-Speed 2628.42 samples/sec Loss 30.2927 LearningRate 0.0981 Epoch: 0 Global Step: 7800 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:34:46,520-Speed 2627.69 samples/sec Loss 30.3314 LearningRate 0.0981 Epoch: 0 Global Step: 7810 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:34:50,415-Speed 2629.33 samples/sec Loss 30.4409 LearningRate 0.0981 Epoch: 0 Global Step: 7820 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:34:54,321-Speed 2622.80 samples/sec Loss 30.4759 LearningRate 0.0981 Epoch: 0 Global Step: 7830 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:34:58,219-Speed 2627.51 samples/sec Loss 30.3092 LearningRate 0.0981 Epoch: 0 Global Step: 7840 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:35:02,117-Speed 2627.03 samples/sec Loss 30.3522 LearningRate 0.0981 Epoch: 0 Global Step: 7850 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:35:06,024-Speed 2625.72 samples/sec Loss 30.3515 LearningRate 0.0981 Epoch: 0 Global Step: 7860 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:35:09,953-Speed 2606.94 samples/sec Loss 30.2874 LearningRate 0.0981 Epoch: 0 Global Step: 7870 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:35:13,857-Speed 2623.78 samples/sec Loss 30.2703 LearningRate 0.0981 Epoch: 0 Global Step: 7880 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:35:17,773-Speed 2616.25 samples/sec Loss 30.1768 LearningRate 0.0981 Epoch: 0 Global Step: 7890 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:35:21,679-Speed 2622.42 samples/sec Loss 30.3648 LearningRate 0.0981 Epoch: 0 Global Step: 7900 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:35:25,579-Speed 2626.48 samples/sec Loss 30.1958 LearningRate 0.0981 Epoch: 0 Global Step: 7910 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:35:29,481-Speed 2624.89 samples/sec Loss 30.1501 LearningRate 0.0981 Epoch: 0 Global Step: 7920 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:35:33,433-Speed 2591.48 samples/sec Loss 30.1606 LearningRate 0.0981 Epoch: 0 Global Step: 7930 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:35:37,339-Speed 2621.93 samples/sec Loss 30.1301 LearningRate 0.0981 Epoch: 0 Global Step: 7940 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:35:41,244-Speed 2623.47 samples/sec Loss 30.1067 LearningRate 0.0981 Epoch: 0 Global Step: 7950 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:35:45,146-Speed 2624.98 samples/sec Loss 30.0879 LearningRate 0.0981 Epoch: 0 Global Step: 7960 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:35:49,055-Speed 2620.15 samples/sec Loss 30.0037 LearningRate 0.0981 Epoch: 0 Global Step: 7970 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:35:52,961-Speed 2622.27 samples/sec Loss 30.0166 LearningRate 0.0981 Epoch: 0 Global Step: 7980 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:35:57,009-Speed 2531.20 samples/sec Loss 30.1903 LearningRate 0.0981 Epoch: 0 Global Step: 7990 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:36:01,079-Speed 2516.54 samples/sec Loss 30.1629 LearningRate 0.0981 Epoch: 0 Global Step: 8000 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:36:05,158-Speed 2511.02 samples/sec Loss 30.1590 LearningRate 0.0981 Epoch: 0 Global Step: 8010 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:36:09,093-Speed 2603.32 samples/sec Loss 29.9725 LearningRate 0.0981 Epoch: 0 Global Step: 8020 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:36:12,993-Speed 2625.73 samples/sec Loss 29.9926 LearningRate 0.0981 Epoch: 0 Global Step: 8030 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:36:16,900-Speed 2621.55 samples/sec Loss 29.7415 LearningRate 0.0981 Epoch: 0 Global Step: 8040 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:36:20,801-Speed 2626.20 samples/sec Loss 29.8282 LearningRate 0.0981 Epoch: 0 Global Step: 8050 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:36:24,699-Speed 2627.19 samples/sec Loss 29.8789 LearningRate 0.0981 Epoch: 0 Global Step: 8060 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:36:28,626-Speed 2608.38 samples/sec Loss 29.7601 LearningRate 0.0981 Epoch: 0 Global Step: 8070 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:36:32,525-Speed 2626.98 samples/sec Loss 29.6902 LearningRate 0.0981 Epoch: 0 Global Step: 8080 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:36:36,412-Speed 2635.22 samples/sec Loss 29.8888 LearningRate 0.0981 Epoch: 0 Global Step: 8090 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:36:40,333-Speed 2612.24 samples/sec Loss 29.6038 LearningRate 0.0981 Epoch: 0 Global Step: 8100 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:36:44,239-Speed 2622.56 samples/sec Loss 29.6613 LearningRate 0.0981 Epoch: 0 Global Step: 8110 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:36:48,155-Speed 2615.64 samples/sec Loss 29.8589 LearningRate 0.0981 Epoch: 0 Global Step: 8120 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:36:52,068-Speed 2617.57 samples/sec Loss 29.7129 LearningRate 0.0980 Epoch: 0 Global Step: 8130 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:36:55,980-Speed 2618.04 samples/sec Loss 29.7486 LearningRate 0.0980 Epoch: 0 Global Step: 8140 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:36:59,901-Speed 2612.08 samples/sec Loss 29.5960 LearningRate 0.0980 Epoch: 0 Global Step: 8150 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:37:03,798-Speed 2628.04 samples/sec Loss 29.6836 LearningRate 0.0980 Epoch: 0 Global Step: 8160 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:37:07,701-Speed 2624.75 samples/sec Loss 29.6433 LearningRate 0.0980 Epoch: 0 Global Step: 8170 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:37:11,601-Speed 2626.06 samples/sec Loss 29.6254 LearningRate 0.0980 Epoch: 0 Global Step: 8180 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:37:15,510-Speed 2620.27 samples/sec Loss 29.4742 LearningRate 0.0980 Epoch: 0 Global Step: 8190 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:37:19,490-Speed 2573.45 samples/sec Loss 29.5948 LearningRate 0.0980 Epoch: 0 Global Step: 8200 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:37:23,594-Speed 2495.46 samples/sec Loss 29.5731 LearningRate 0.0980 Epoch: 0 Global Step: 8210 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:37:27,491-Speed 2628.90 samples/sec Loss 29.4678 LearningRate 0.0980 Epoch: 0 Global Step: 8220 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:37:31,390-Speed 2626.69 samples/sec Loss 29.4725 LearningRate 0.0980 Epoch: 0 Global Step: 8230 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:37:35,324-Speed 2603.87 samples/sec Loss 29.3436 LearningRate 0.0980 Epoch: 0 Global Step: 8240 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:37:39,225-Speed 2625.72 samples/sec Loss 29.5214 LearningRate 0.0980 Epoch: 0 Global Step: 8250 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:37:43,122-Speed 2628.10 samples/sec Loss 29.2837 LearningRate 0.0980 Epoch: 0 Global Step: 8260 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:37:47,025-Speed 2623.99 samples/sec Loss 29.4372 LearningRate 0.0980 Epoch: 0 Global Step: 8270 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:37:50,934-Speed 2621.04 samples/sec Loss 29.4086 LearningRate 0.0980 Epoch: 0 Global Step: 8280 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:37:54,848-Speed 2616.38 samples/sec Loss 29.3765 LearningRate 0.0980 Epoch: 0 Global Step: 8290 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:37:58,829-Speed 2574.04 samples/sec Loss 29.3001 LearningRate 0.0980 Epoch: 0 Global Step: 8300 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:38:02,722-Speed 2630.99 samples/sec Loss 29.3908 LearningRate 0.0980 Epoch: 0 Global Step: 8310 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:38:06,631-Speed 2619.82 samples/sec Loss 29.3652 LearningRate 0.0980 Epoch: 0 Global Step: 8320 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:38:10,538-Speed 2620.98 samples/sec Loss 29.4736 LearningRate 0.0980 Epoch: 0 Global Step: 8330 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:38:14,618-Speed 2510.74 samples/sec Loss 29.4221 LearningRate 0.0980 Epoch: 0 Global Step: 8340 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:38:18,587-Speed 2580.51 samples/sec Loss 29.2012 LearningRate 0.0980 Epoch: 0 Global Step: 8350 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:38:22,486-Speed 2627.50 samples/sec Loss 29.0432 LearningRate 0.0980 Epoch: 0 Global Step: 8360 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:38:26,407-Speed 2612.04 samples/sec Loss 29.1645 LearningRate 0.0980 Epoch: 0 Global Step: 8370 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:38:30,293-Speed 2635.93 samples/sec Loss 29.0904 LearningRate 0.0980 Epoch: 0 Global Step: 8380 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:38:34,191-Speed 2627.53 samples/sec Loss 29.1964 LearningRate 0.0980 Epoch: 0 Global Step: 8390 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:38:38,098-Speed 2621.74 samples/sec Loss 29.2414 LearningRate 0.0980 Epoch: 0 Global Step: 8400 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:38:42,004-Speed 2621.85 samples/sec Loss 29.2019 LearningRate 0.0980 Epoch: 0 Global Step: 8410 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:38:45,895-Speed 2633.14 samples/sec Loss 29.0513 LearningRate 0.0980 Epoch: 0 Global Step: 8420 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:38:49,799-Speed 2623.38 samples/sec Loss 29.0234 LearningRate 0.0980 Epoch: 0 Global Step: 8430 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:38:53,709-Speed 2619.56 samples/sec Loss 29.2007 LearningRate 0.0980 Epoch: 0 Global Step: 8440 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:38:57,616-Speed 2621.61 samples/sec Loss 29.0230 LearningRate 0.0980 Epoch: 0 Global Step: 8450 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:39:01,516-Speed 2626.54 samples/sec Loss 28.9956 LearningRate 0.0980 Epoch: 0 Global Step: 8460 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:39:05,418-Speed 2624.36 samples/sec Loss 28.9609 LearningRate 0.0980 Epoch: 0 Global Step: 8470 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:39:09,341-Speed 2611.19 samples/sec Loss 28.8818 LearningRate 0.0980 Epoch: 0 Global Step: 8480 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:39:13,271-Speed 2606.08 samples/sec Loss 28.7655 LearningRate 0.0980 Epoch: 0 Global Step: 8490 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:39:17,171-Speed 2626.16 samples/sec Loss 28.9609 LearningRate 0.0980 Epoch: 0 Global Step: 8500 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:39:21,233-Speed 2521.67 samples/sec Loss 28.7315 LearningRate 0.0980 Epoch: 0 Global Step: 8510 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:39:25,332-Speed 2498.95 samples/sec Loss 28.7060 LearningRate 0.0980 Epoch: 0 Global Step: 8520 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:39:29,439-Speed 2493.73 samples/sec Loss 28.7558 LearningRate 0.0980 Epoch: 0 Global Step: 8530 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:39:33,554-Speed 2488.91 samples/sec Loss 28.6553 LearningRate 0.0980 Epoch: 0 Global Step: 8540 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:39:37,583-Speed 2542.45 samples/sec Loss 28.8282 LearningRate 0.0979 Epoch: 0 Global Step: 8550 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:39:41,488-Speed 2622.43 samples/sec Loss 28.8563 LearningRate 0.0979 Epoch: 0 Global Step: 8560 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:39:45,371-Speed 2638.22 samples/sec Loss 28.7507 LearningRate 0.0979 Epoch: 0 Global Step: 8570 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:39:49,279-Speed 2620.69 samples/sec Loss 28.8108 LearningRate 0.0979 Epoch: 0 Global Step: 8580 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:39:53,201-Speed 2612.13 samples/sec Loss 28.7837 LearningRate 0.0979 Epoch: 0 Global Step: 8590 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:39:57,103-Speed 2624.92 samples/sec Loss 28.7941 LearningRate 0.0979 Epoch: 0 Global Step: 8600 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:40:01,025-Speed 2612.05 samples/sec Loss 28.7554 LearningRate 0.0979 Epoch: 0 Global Step: 8610 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:40:05,007-Speed 2572.01 samples/sec Loss 28.7215 LearningRate 0.0979 Epoch: 0 Global Step: 8620 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:40:08,915-Speed 2620.54 samples/sec Loss 28.7366 LearningRate 0.0979 Epoch: 0 Global Step: 8630 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:40:12,822-Speed 2621.72 samples/sec Loss 28.5774 LearningRate 0.0979 Epoch: 0 Global Step: 8640 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:40:16,725-Speed 2624.79 samples/sec Loss 28.6153 LearningRate 0.0979 Epoch: 0 Global Step: 8650 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:40:20,645-Speed 2612.10 samples/sec Loss 28.5789 LearningRate 0.0979 Epoch: 0 Global Step: 8660 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:40:24,546-Speed 2625.80 samples/sec Loss 28.6659 LearningRate 0.0979 Epoch: 0 Global Step: 8670 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:40:28,451-Speed 2623.55 samples/sec Loss 28.6232 LearningRate 0.0979 Epoch: 0 Global Step: 8680 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:40:32,352-Speed 2625.40 samples/sec Loss 28.6456 LearningRate 0.0979 Epoch: 0 Global Step: 8690 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:40:36,251-Speed 2626.37 samples/sec Loss 28.6229 LearningRate 0.0979 Epoch: 0 Global Step: 8700 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:40:40,320-Speed 2516.94 samples/sec Loss 28.4283 LearningRate 0.0979 Epoch: 0 Global Step: 8710 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:40:44,429-Speed 2492.83 samples/sec Loss 28.4668 LearningRate 0.0979 Epoch: 0 Global Step: 8720 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:40:48,366-Speed 2601.76 samples/sec Loss 28.4218 LearningRate 0.0979 Epoch: 0 Global Step: 8730 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:40:52,263-Speed 2628.56 samples/sec Loss 28.4679 LearningRate 0.0979 Epoch: 0 Global Step: 8740 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:40:56,176-Speed 2617.83 samples/sec Loss 28.3855 LearningRate 0.0979 Epoch: 0 Global Step: 8750 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:41:00,082-Speed 2622.00 samples/sec Loss 28.4518 LearningRate 0.0979 Epoch: 0 Global Step: 8760 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:41:03,970-Speed 2634.32 samples/sec Loss 28.3854 LearningRate 0.0979 Epoch: 0 Global Step: 8770 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:41:07,872-Speed 2624.98 samples/sec Loss 28.4194 LearningRate 0.0979 Epoch: 0 Global Step: 8780 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:41:11,787-Speed 2616.49 samples/sec Loss 28.4204 LearningRate 0.0979 Epoch: 0 Global Step: 8790 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:41:15,711-Speed 2610.13 samples/sec Loss 28.2436 LearningRate 0.0979 Epoch: 0 Global Step: 8800 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:41:19,610-Speed 2626.94 samples/sec Loss 28.5098 LearningRate 0.0979 Epoch: 0 Global Step: 8810 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:41:23,544-Speed 2603.67 samples/sec Loss 28.2190 LearningRate 0.0979 Epoch: 0 Global Step: 8820 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:41:27,445-Speed 2626.16 samples/sec Loss 28.3208 LearningRate 0.0979 Epoch: 0 Global Step: 8830 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:41:31,346-Speed 2625.18 samples/sec Loss 28.1830 LearningRate 0.0979 Epoch: 0 Global Step: 8840 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:41:35,244-Speed 2627.75 samples/sec Loss 28.2027 LearningRate 0.0979 Epoch: 0 Global Step: 8850 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:41:39,150-Speed 2621.96 samples/sec Loss 28.1714 LearningRate 0.0979 Epoch: 0 Global Step: 8860 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:41:43,054-Speed 2624.02 samples/sec Loss 28.2329 LearningRate 0.0979 Epoch: 0 Global Step: 8870 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:41:46,956-Speed 2625.13 samples/sec Loss 28.1270 LearningRate 0.0979 Epoch: 0 Global Step: 8880 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:41:50,873-Speed 2614.89 samples/sec Loss 28.2216 LearningRate 0.0979 Epoch: 0 Global Step: 8890 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:41:54,790-Speed 2614.75 samples/sec Loss 28.1825 LearningRate 0.0979 Epoch: 0 Global Step: 8900 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:41:58,693-Speed 2624.76 samples/sec Loss 28.0849 LearningRate 0.0979 Epoch: 0 Global Step: 8910 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:42:02,711-Speed 2548.99 samples/sec Loss 28.2432 LearningRate 0.0979 Epoch: 0 Global Step: 8920 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:42:06,821-Speed 2492.02 samples/sec Loss 28.1227 LearningRate 0.0979 Epoch: 0 Global Step: 8930 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:42:10,802-Speed 2573.31 samples/sec Loss 28.0397 LearningRate 0.0979 Epoch: 0 Global Step: 8940 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:42:14,707-Speed 2622.64 samples/sec Loss 28.2350 LearningRate 0.0979 Epoch: 0 Global Step: 8950 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:42:18,606-Speed 2627.52 samples/sec Loss 28.0076 LearningRate 0.0979 Epoch: 0 Global Step: 8960 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:42:22,486-Speed 2639.05 samples/sec Loss 28.0850 LearningRate 0.0978 Epoch: 0 Global Step: 8970 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:42:26,491-Speed 2558.19 samples/sec Loss 27.9504 LearningRate 0.0978 Epoch: 0 Global Step: 8980 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:42:30,399-Speed 2620.91 samples/sec Loss 28.0034 LearningRate 0.0978 Epoch: 0 Global Step: 8990 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:42:34,300-Speed 2625.26 samples/sec Loss 27.8420 LearningRate 0.0978 Epoch: 0 Global Step: 9000 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:42:38,212-Speed 2618.57 samples/sec Loss 27.8597 LearningRate 0.0978 Epoch: 0 Global Step: 9010 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:42:42,109-Speed 2628.11 samples/sec Loss 27.8935 LearningRate 0.0978 Epoch: 0 Global Step: 9020 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:42:46,022-Speed 2617.98 samples/sec Loss 27.7350 LearningRate 0.0978 Epoch: 0 Global Step: 9030 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:42:49,939-Speed 2614.82 samples/sec Loss 27.9296 LearningRate 0.0978 Epoch: 0 Global Step: 9040 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:42:53,851-Speed 2617.62 samples/sec Loss 27.8657 LearningRate 0.0978 Epoch: 0 Global Step: 9050 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:42:57,753-Speed 2625.35 samples/sec Loss 27.8021 LearningRate 0.0978 Epoch: 0 Global Step: 9060 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:43:01,639-Speed 2636.02 samples/sec Loss 27.9774 LearningRate 0.0978 Epoch: 0 Global Step: 9070 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:43:05,541-Speed 2624.64 samples/sec Loss 27.8316 LearningRate 0.0978 Epoch: 0 Global Step: 9080 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:43:09,581-Speed 2535.06 samples/sec Loss 27.8182 LearningRate 0.0978 Epoch: 0 Global Step: 9090 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:43:13,648-Speed 2519.01 samples/sec Loss 27.7763 LearningRate 0.0978 Epoch: 0 Global Step: 9100 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:43:17,548-Speed 2626.37 samples/sec Loss 27.7766 LearningRate 0.0978 Epoch: 0 Global Step: 9110 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:43:21,446-Speed 2627.64 samples/sec Loss 27.4138 LearningRate 0.0978 Epoch: 0 Global Step: 9120 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:43:25,356-Speed 2619.29 samples/sec Loss 27.4933 LearningRate 0.0978 Epoch: 0 Global Step: 9130 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:43:29,254-Speed 2627.95 samples/sec Loss 27.5888 LearningRate 0.0978 Epoch: 0 Global Step: 9140 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:43:33,154-Speed 2626.18 samples/sec Loss 27.7330 LearningRate 0.0978 Epoch: 0 Global Step: 9150 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:43:37,059-Speed 2623.01 samples/sec Loss 27.6962 LearningRate 0.0978 Epoch: 0 Global Step: 9160 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:43:40,939-Speed 2639.77 samples/sec Loss 27.4825 LearningRate 0.0978 Epoch: 0 Global Step: 9170 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:43:44,836-Speed 2628.62 samples/sec Loss 27.3066 LearningRate 0.0978 Epoch: 0 Global Step: 9180 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:43:48,978-Speed 2472.53 samples/sec Loss 27.7119 LearningRate 0.0978 Epoch: 0 Global Step: 9190 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:43:52,879-Speed 2626.05 samples/sec Loss 27.5522 LearningRate 0.0978 Epoch: 0 Global Step: 9200 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:43:56,809-Speed 2606.55 samples/sec Loss 27.5434 LearningRate 0.0978 Epoch: 0 Global Step: 9210 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:44:00,709-Speed 2626.56 samples/sec Loss 27.7230 LearningRate 0.0978 Epoch: 0 Global Step: 9220 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:44:04,609-Speed 2626.18 samples/sec Loss 27.2470 LearningRate 0.0978 Epoch: 0 Global Step: 9230 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:44:08,516-Speed 2621.60 samples/sec Loss 27.3241 LearningRate 0.0978 Epoch: 0 Global Step: 9240 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:44:12,414-Speed 2627.60 samples/sec Loss 27.5802 LearningRate 0.0978 Epoch: 0 Global Step: 9250 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:44:16,511-Speed 2500.27 samples/sec Loss 27.4420 LearningRate 0.0978 Epoch: 0 Global Step: 9260 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:44:20,457-Speed 2595.02 samples/sec Loss 27.4551 LearningRate 0.0978 Epoch: 0 Global Step: 9270 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 20:44:24,349-Speed 2632.66 samples/sec Loss 27.5119 LearningRate 0.0978 Epoch: 0 Global Step: 9280 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:44:28,245-Speed 2628.61 samples/sec Loss 27.5700 LearningRate 0.0978 Epoch: 0 Global Step: 9290 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:44:32,149-Speed 2624.12 samples/sec Loss 27.4008 LearningRate 0.0978 Epoch: 0 Global Step: 9300 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:44:36,048-Speed 2626.45 samples/sec Loss 27.3973 LearningRate 0.0978 Epoch: 0 Global Step: 9310 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:44:40,013-Speed 2583.47 samples/sec Loss 27.5546 LearningRate 0.0978 Epoch: 0 Global Step: 9320 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:44:43,958-Speed 2596.11 samples/sec Loss 27.1830 LearningRate 0.0978 Epoch: 0 Global Step: 9330 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:44:47,877-Speed 2613.80 samples/sec Loss 27.4072 LearningRate 0.0978 Epoch: 0 Global Step: 9340 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:44:51,775-Speed 2627.01 samples/sec Loss 27.3724 LearningRate 0.0978 Epoch: 0 Global Step: 9350 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:44:55,676-Speed 2626.37 samples/sec Loss 27.2869 LearningRate 0.0978 Epoch: 0 Global Step: 9360 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:44:59,557-Speed 2639.17 samples/sec Loss 27.3143 LearningRate 0.0978 Epoch: 0 Global Step: 9370 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:45:03,527-Speed 2580.17 samples/sec Loss 27.2176 LearningRate 0.0978 Epoch: 0 Global Step: 9380 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:45:07,421-Speed 2630.22 samples/sec Loss 27.3812 LearningRate 0.0977 Epoch: 0 Global Step: 9390 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:45:11,451-Speed 2541.08 samples/sec Loss 27.1825 LearningRate 0.0977 Epoch: 0 Global Step: 9400 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:45:15,565-Speed 2489.60 samples/sec Loss 27.1027 LearningRate 0.0977 Epoch: 0 Global Step: 9410 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:45:19,462-Speed 2628.85 samples/sec Loss 27.3380 LearningRate 0.0977 Epoch: 0 Global Step: 9420 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:45:23,381-Speed 2614.08 samples/sec Loss 27.1832 LearningRate 0.0977 Epoch: 0 Global Step: 9430 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:45:27,285-Speed 2623.27 samples/sec Loss 27.4080 LearningRate 0.0977 Epoch: 0 Global Step: 9440 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:45:31,242-Speed 2589.49 samples/sec Loss 27.1572 LearningRate 0.0977 Epoch: 0 Global Step: 9450 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:45:35,146-Speed 2623.32 samples/sec Loss 26.9004 LearningRate 0.0977 Epoch: 0 Global Step: 9460 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:45:39,064-Speed 2614.21 samples/sec Loss 26.9150 LearningRate 0.0977 Epoch: 0 Global Step: 9470 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:45:42,966-Speed 2625.14 samples/sec Loss 27.0863 LearningRate 0.0977 Epoch: 0 Global Step: 9480 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:45:46,864-Speed 2627.48 samples/sec Loss 26.8527 LearningRate 0.0977 Epoch: 0 Global Step: 9490 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:45:50,795-Speed 2605.47 samples/sec Loss 27.0035 LearningRate 0.0977 Epoch: 0 Global Step: 9500 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:45:54,697-Speed 2625.25 samples/sec Loss 26.8741 LearningRate 0.0977 Epoch: 0 Global Step: 9510 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:45:58,609-Speed 2618.26 samples/sec Loss 26.9509 LearningRate 0.0977 Epoch: 0 Global Step: 9520 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:46:02,515-Speed 2623.19 samples/sec Loss 26.9166 LearningRate 0.0977 Epoch: 0 Global Step: 9530 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:46:06,474-Speed 2587.05 samples/sec Loss 27.0350 LearningRate 0.0977 Epoch: 0 Global Step: 9540 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:46:10,403-Speed 2606.67 samples/sec Loss 27.0351 LearningRate 0.0977 Epoch: 0 Global Step: 9550 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:46:14,334-Speed 2606.03 samples/sec Loss 26.9478 LearningRate 0.0977 Epoch: 0 Global Step: 9560 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:46:18,313-Speed 2573.90 samples/sec Loss 27.0402 LearningRate 0.0977 Epoch: 0 Global Step: 9570 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:46:22,222-Speed 2620.26 samples/sec Loss 26.9473 LearningRate 0.0977 Epoch: 0 Global Step: 9580 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:46:26,132-Speed 2619.42 samples/sec Loss 26.8317 LearningRate 0.0977 Epoch: 0 Global Step: 9590 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:46:30,050-Speed 2614.61 samples/sec Loss 26.7432 LearningRate 0.0977 Epoch: 0 Global Step: 9600 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:46:33,946-Speed 2629.01 samples/sec Loss 26.9352 LearningRate 0.0977 Epoch: 0 Global Step: 9610 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:46:38,014-Speed 2517.59 samples/sec Loss 26.9532 LearningRate 0.0977 Epoch: 0 Global Step: 9620 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:46:42,041-Speed 2543.77 samples/sec Loss 26.7934 LearningRate 0.0977 Epoch: 0 Global Step: 9630 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:46:45,952-Speed 2619.08 samples/sec Loss 26.7332 LearningRate 0.0977 Epoch: 0 Global Step: 9640 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:46:49,853-Speed 2625.16 samples/sec Loss 26.6772 LearningRate 0.0977 Epoch: 0 Global Step: 9650 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:46:53,766-Speed 2618.47 samples/sec Loss 26.4110 LearningRate 0.0977 Epoch: 0 Global Step: 9660 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:46:57,686-Speed 2612.34 samples/sec Loss 27.0927 LearningRate 0.0977 Epoch: 0 Global Step: 9670 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:47:01,594-Speed 2621.02 samples/sec Loss 26.6755 LearningRate 0.0977 Epoch: 0 Global Step: 9680 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:47:05,494-Speed 2626.50 samples/sec Loss 26.9161 LearningRate 0.0977 Epoch: 0 Global Step: 9690 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:47:09,392-Speed 2627.40 samples/sec Loss 26.8106 LearningRate 0.0977 Epoch: 0 Global Step: 9700 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:47:13,321-Speed 2607.54 samples/sec Loss 26.6780 LearningRate 0.0977 Epoch: 0 Global Step: 9710 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:47:17,222-Speed 2625.21 samples/sec Loss 26.7642 LearningRate 0.0977 Epoch: 0 Global Step: 9720 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:47:21,151-Speed 2607.64 samples/sec Loss 26.5720 LearningRate 0.0977 Epoch: 0 Global Step: 9730 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:47:25,045-Speed 2630.21 samples/sec Loss 26.7784 LearningRate 0.0977 Epoch: 0 Global Step: 9740 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:47:28,946-Speed 2625.72 samples/sec Loss 26.5854 LearningRate 0.0977 Epoch: 0 Global Step: 9750 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:47:32,857-Speed 2619.01 samples/sec Loss 26.6253 LearningRate 0.0977 Epoch: 0 Global Step: 9760 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:47:36,763-Speed 2621.67 samples/sec Loss 26.5464 LearningRate 0.0977 Epoch: 0 Global Step: 9770 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:47:40,669-Speed 2622.12 samples/sec Loss 26.5488 LearningRate 0.0977 Epoch: 0 Global Step: 9780 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:47:44,589-Speed 2613.70 samples/sec Loss 26.7020 LearningRate 0.0977 Epoch: 0 Global Step: 9790 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:47:48,484-Speed 2629.34 samples/sec Loss 26.6182 LearningRate 0.0977 Epoch: 0 Global Step: 9800 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 20:47:52,393-Speed 2620.27 samples/sec Loss 26.6087 LearningRate 0.0976 Epoch: 0 Global Step: 9810 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 20:47:56,311-Speed 2614.27 samples/sec Loss 26.4546 LearningRate 0.0976 Epoch: 0 Global Step: 9820 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 20:48:00,222-Speed 2619.34 samples/sec Loss 26.5255 LearningRate 0.0976 Epoch: 0 Global Step: 9830 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 20:48:04,122-Speed 2626.12 samples/sec Loss 26.2516 LearningRate 0.0976 Epoch: 0 Global Step: 9840 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 20:48:08,061-Speed 2599.98 samples/sec Loss 26.4670 LearningRate 0.0976 Epoch: 0 Global Step: 9850 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 20:48:11,964-Speed 2624.60 samples/sec Loss 26.4238 LearningRate 0.0976 Epoch: 0 Global Step: 9860 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 20:48:15,864-Speed 2626.50 samples/sec Loss 26.3077 LearningRate 0.0976 Epoch: 0 Global Step: 9870 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 20:48:19,771-Speed 2621.33 samples/sec Loss 26.4509 LearningRate 0.0976 Epoch: 0 Global Step: 9880 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 20:48:23,669-Speed 2628.03 samples/sec Loss 26.3677 LearningRate 0.0976 Epoch: 0 Global Step: 9890 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 20:48:27,571-Speed 2625.04 samples/sec Loss 26.3974 LearningRate 0.0976 Epoch: 0 Global Step: 9900 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:48:31,482-Speed 2619.32 samples/sec Loss 26.2149 LearningRate 0.0976 Epoch: 0 Global Step: 9910 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:48:35,381-Speed 2626.92 samples/sec Loss 26.2522 LearningRate 0.0976 Epoch: 0 Global Step: 9920 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:48:39,291-Speed 2619.03 samples/sec Loss 26.3349 LearningRate 0.0976 Epoch: 0 Global Step: 9930 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:48:43,195-Speed 2623.75 samples/sec Loss 26.0603 LearningRate 0.0976 Epoch: 0 Global Step: 9940 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:48:47,163-Speed 2581.75 samples/sec Loss 26.1304 LearningRate 0.0976 Epoch: 0 Global Step: 9950 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:48:51,220-Speed 2524.70 samples/sec Loss 26.3008 LearningRate 0.0976 Epoch: 0 Global Step: 9960 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:48:55,318-Speed 2499.64 samples/sec Loss 26.1454 LearningRate 0.0976 Epoch: 0 Global Step: 9970 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:48:59,253-Speed 2604.85 samples/sec Loss 26.1659 LearningRate 0.0976 Epoch: 0 Global Step: 9980 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:49:03,157-Speed 2623.28 samples/sec Loss 26.1125 LearningRate 0.0976 Epoch: 0 Global Step: 9990 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 20:49:07,066-Speed 2620.23 samples/sec Loss 26.0881 LearningRate 0.0976 Epoch: 0 Global Step: 10000 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 20:49:49,878-[lfw][10000]XNorm: 24.128440
Training: 2022-04-12 20:49:49,879-[lfw][10000]Accuracy-Flip: 0.98383+-0.00548
Training: 2022-04-12 20:49:49,880-[lfw][10000]Accuracy-Highest: 0.98383
Training: 2022-04-12 20:50:39,805-[cfp_fp][10000]XNorm: 21.212615
Training: 2022-04-12 20:50:39,806-[cfp_fp][10000]Accuracy-Flip: 0.90471+-0.01300
Training: 2022-04-12 20:50:39,807-[cfp_fp][10000]Accuracy-Highest: 0.90471
Training: 2022-04-12 20:51:22,999-[agedb_30][10000]XNorm: 23.680319
Training: 2022-04-12 20:51:23,000-[agedb_30][10000]Accuracy-Flip: 0.87767+-0.02199
Training: 2022-04-12 20:51:23,001-[agedb_30][10000]Accuracy-Highest: 0.87767
Training: 2022-04-12 20:51:26,904-Speed 73.23 samples/sec Loss 26.3416 LearningRate 0.0976 Epoch: 0 Global Step: 10010 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:51:30,775-Speed 2645.46 samples/sec Loss 26.0953 LearningRate 0.0976 Epoch: 0 Global Step: 10020 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:51:34,688-Speed 2618.33 samples/sec Loss 26.2818 LearningRate 0.0976 Epoch: 0 Global Step: 10030 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:51:38,563-Speed 2642.87 samples/sec Loss 26.0735 LearningRate 0.0976 Epoch: 0 Global Step: 10040 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:51:42,444-Speed 2638.94 samples/sec Loss 26.0839 LearningRate 0.0976 Epoch: 0 Global Step: 10050 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:51:46,339-Speed 2630.36 samples/sec Loss 25.9140 LearningRate 0.0976 Epoch: 0 Global Step: 10060 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:51:50,233-Speed 2630.60 samples/sec Loss 26.2683 LearningRate 0.0976 Epoch: 0 Global Step: 10070 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:51:54,131-Speed 2627.88 samples/sec Loss 26.0169 LearningRate 0.0976 Epoch: 0 Global Step: 10080 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:51:58,043-Speed 2618.52 samples/sec Loss 25.8959 LearningRate 0.0976 Epoch: 0 Global Step: 10090 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:52:01,918-Speed 2643.15 samples/sec Loss 26.0521 LearningRate 0.0976 Epoch: 0 Global Step: 10100 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:52:05,825-Speed 2621.14 samples/sec Loss 25.9822 LearningRate 0.0976 Epoch: 0 Global Step: 10110 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:52:09,717-Speed 2632.27 samples/sec Loss 25.8842 LearningRate 0.0976 Epoch: 0 Global Step: 10120 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:52:13,630-Speed 2617.83 samples/sec Loss 25.8582 LearningRate 0.0976 Epoch: 0 Global Step: 10130 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:52:17,531-Speed 2625.73 samples/sec Loss 25.7265 LearningRate 0.0976 Epoch: 0 Global Step: 10140 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:52:21,435-Speed 2624.24 samples/sec Loss 25.7682 LearningRate 0.0976 Epoch: 0 Global Step: 10150 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:52:25,338-Speed 2623.82 samples/sec Loss 25.9368 LearningRate 0.0976 Epoch: 0 Global Step: 10160 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:52:29,238-Speed 2626.65 samples/sec Loss 25.7861 LearningRate 0.0976 Epoch: 0 Global Step: 10170 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:52:33,140-Speed 2624.75 samples/sec Loss 25.9549 LearningRate 0.0976 Epoch: 0 Global Step: 10180 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:52:37,099-Speed 2586.99 samples/sec Loss 25.8198 LearningRate 0.0976 Epoch: 0 Global Step: 10190 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:52:41,125-Speed 2544.28 samples/sec Loss 25.7778 LearningRate 0.0976 Epoch: 0 Global Step: 10200 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:52:45,037-Speed 2618.01 samples/sec Loss 26.0180 LearningRate 0.0976 Epoch: 0 Global Step: 10210 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:52:48,938-Speed 2626.01 samples/sec Loss 25.7231 LearningRate 0.0976 Epoch: 0 Global Step: 10220 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:52:52,929-Speed 2566.31 samples/sec Loss 25.9952 LearningRate 0.0975 Epoch: 0 Global Step: 10230 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:52:56,917-Speed 2568.36 samples/sec Loss 25.6923 LearningRate 0.0975 Epoch: 0 Global Step: 10240 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:53:00,821-Speed 2624.03 samples/sec Loss 25.8228 LearningRate 0.0975 Epoch: 0 Global Step: 10250 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:53:04,905-Speed 2507.84 samples/sec Loss 25.7174 LearningRate 0.0975 Epoch: 0 Global Step: 10260 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:53:09,011-Speed 2494.09 samples/sec Loss 25.5992 LearningRate 0.0975 Epoch: 0 Global Step: 10270 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:53:13,027-Speed 2550.87 samples/sec Loss 25.4028 LearningRate 0.0975 Epoch: 0 Global Step: 10280 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:53:16,926-Speed 2626.79 samples/sec Loss 25.5055 LearningRate 0.0975 Epoch: 0 Global Step: 10290 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:53:20,830-Speed 2624.05 samples/sec Loss 25.7221 LearningRate 0.0975 Epoch: 0 Global Step: 10300 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:53:24,753-Speed 2610.98 samples/sec Loss 25.4007 LearningRate 0.0975 Epoch: 0 Global Step: 10310 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:53:28,649-Speed 2629.10 samples/sec Loss 25.6807 LearningRate 0.0975 Epoch: 0 Global Step: 10320 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:53:32,544-Speed 2629.63 samples/sec Loss 25.8160 LearningRate 0.0975 Epoch: 0 Global Step: 10330 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:53:36,437-Speed 2630.91 samples/sec Loss 25.7485 LearningRate 0.0975 Epoch: 0 Global Step: 10340 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:53:40,334-Speed 2628.21 samples/sec Loss 25.5572 LearningRate 0.0975 Epoch: 0 Global Step: 10350 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:53:44,258-Speed 2612.81 samples/sec Loss 25.5285 LearningRate 0.0975 Epoch: 0 Global Step: 10360 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:53:48,155-Speed 2628.61 samples/sec Loss 25.5055 LearningRate 0.0975 Epoch: 0 Global Step: 10370 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:53:52,062-Speed 2622.03 samples/sec Loss 25.4029 LearningRate 0.0975 Epoch: 0 Global Step: 10380 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:53:55,965-Speed 2624.27 samples/sec Loss 25.5362 LearningRate 0.0975 Epoch: 0 Global Step: 10390 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:53:59,897-Speed 2605.09 samples/sec Loss 25.5295 LearningRate 0.0975 Epoch: 0 Global Step: 10400 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:54:03,811-Speed 2617.25 samples/sec Loss 25.5614 LearningRate 0.0975 Epoch: 0 Global Step: 10410 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:54:07,713-Speed 2624.61 samples/sec Loss 25.3439 LearningRate 0.0975 Epoch: 0 Global Step: 10420 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:54:11,607-Speed 2630.55 samples/sec Loss 25.4497 LearningRate 0.0975 Epoch: 0 Global Step: 10430 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:54:15,495-Speed 2634.40 samples/sec Loss 25.4713 LearningRate 0.0975 Epoch: 0 Global Step: 10440 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:54:19,386-Speed 2632.68 samples/sec Loss 25.3453 LearningRate 0.0975 Epoch: 0 Global Step: 10450 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:54:23,284-Speed 2627.55 samples/sec Loss 25.5343 LearningRate 0.0975 Epoch: 0 Global Step: 10460 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:54:27,190-Speed 2622.65 samples/sec Loss 25.1323 LearningRate 0.0975 Epoch: 0 Global Step: 10470 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:54:31,125-Speed 2602.81 samples/sec Loss 25.3013 LearningRate 0.0975 Epoch: 0 Global Step: 10480 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:54:35,041-Speed 2615.58 samples/sec Loss 25.2745 LearningRate 0.0975 Epoch: 0 Global Step: 10490 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:54:38,954-Speed 2617.30 samples/sec Loss 25.2008 LearningRate 0.0975 Epoch: 0 Global Step: 10500 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:54:42,839-Speed 2636.15 samples/sec Loss 25.0343 LearningRate 0.0975 Epoch: 0 Global Step: 10510 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:54:46,732-Speed 2631.08 samples/sec Loss 25.1451 LearningRate 0.0975 Epoch: 0 Global Step: 10520 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:54:50,639-Speed 2622.34 samples/sec Loss 25.2910 LearningRate 0.0975 Epoch: 0 Global Step: 10530 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:54:54,536-Speed 2627.77 samples/sec Loss 25.3129 LearningRate 0.0975 Epoch: 0 Global Step: 10540 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:54:58,446-Speed 2619.87 samples/sec Loss 25.1349 LearningRate 0.0975 Epoch: 0 Global Step: 10550 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:55:02,340-Speed 2630.16 samples/sec Loss 25.1342 LearningRate 0.0975 Epoch: 0 Global Step: 10560 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:55:06,242-Speed 2625.40 samples/sec Loss 25.1804 LearningRate 0.0975 Epoch: 0 Global Step: 10570 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:55:10,153-Speed 2618.88 samples/sec Loss 25.3840 LearningRate 0.0975 Epoch: 0 Global Step: 10580 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:55:14,073-Speed 2612.99 samples/sec Loss 25.2804 LearningRate 0.0975 Epoch: 0 Global Step: 10590 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:55:17,971-Speed 2627.84 samples/sec Loss 25.3338 LearningRate 0.0975 Epoch: 0 Global Step: 10600 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:55:21,871-Speed 2626.48 samples/sec Loss 25.1724 LearningRate 0.0975 Epoch: 0 Global Step: 10610 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:55:25,791-Speed 2612.54 samples/sec Loss 25.0033 LearningRate 0.0975 Epoch: 0 Global Step: 10620 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:55:29,697-Speed 2623.07 samples/sec Loss 25.0583 LearningRate 0.0975 Epoch: 0 Global Step: 10630 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:55:33,600-Speed 2624.21 samples/sec Loss 25.0010 LearningRate 0.0975 Epoch: 0 Global Step: 10640 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:55:37,496-Speed 2628.57 samples/sec Loss 25.2046 LearningRate 0.0974 Epoch: 0 Global Step: 10650 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:55:41,387-Speed 2632.44 samples/sec Loss 25.0156 LearningRate 0.0974 Epoch: 0 Global Step: 10660 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:55:45,273-Speed 2635.23 samples/sec Loss 24.9292 LearningRate 0.0974 Epoch: 0 Global Step: 10670 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:55:49,172-Speed 2627.36 samples/sec Loss 24.9341 LearningRate 0.0974 Epoch: 0 Global Step: 10680 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:55:53,072-Speed 2626.26 samples/sec Loss 24.9616 LearningRate 0.0974 Epoch: 0 Global Step: 10690 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:55:56,970-Speed 2627.47 samples/sec Loss 24.8965 LearningRate 0.0974 Epoch: 0 Global Step: 10700 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:56:00,870-Speed 2626.26 samples/sec Loss 24.8104 LearningRate 0.0974 Epoch: 0 Global Step: 10710 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:56:04,770-Speed 2626.36 samples/sec Loss 25.0927 LearningRate 0.0974 Epoch: 0 Global Step: 10720 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:56:08,673-Speed 2624.14 samples/sec Loss 24.7839 LearningRate 0.0974 Epoch: 0 Global Step: 10730 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:56:12,603-Speed 2605.98 samples/sec Loss 24.8693 LearningRate 0.0974 Epoch: 0 Global Step: 10740 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:56:16,523-Speed 2612.93 samples/sec Loss 24.7595 LearningRate 0.0974 Epoch: 0 Global Step: 10750 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:56:20,440-Speed 2615.13 samples/sec Loss 24.9841 LearningRate 0.0974 Epoch: 0 Global Step: 10760 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:56:24,345-Speed 2623.21 samples/sec Loss 24.8591 LearningRate 0.0974 Epoch: 0 Global Step: 10770 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:56:28,242-Speed 2628.47 samples/sec Loss 24.7366 LearningRate 0.0974 Epoch: 0 Global Step: 10780 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:56:32,120-Speed 2640.99 samples/sec Loss 24.6284 LearningRate 0.0974 Epoch: 0 Global Step: 10790 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:56:36,142-Speed 2546.67 samples/sec Loss 24.9517 LearningRate 0.0974 Epoch: 0 Global Step: 10800 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:56:40,245-Speed 2496.62 samples/sec Loss 24.9556 LearningRate 0.0974 Epoch: 0 Global Step: 10810 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:56:44,303-Speed 2524.01 samples/sec Loss 24.7013 LearningRate 0.0974 Epoch: 0 Global Step: 10820 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:56:48,219-Speed 2615.33 samples/sec Loss 24.7794 LearningRate 0.0974 Epoch: 0 Global Step: 10830 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:56:52,110-Speed 2633.39 samples/sec Loss 24.7543 LearningRate 0.0974 Epoch: 0 Global Step: 10840 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:56:56,005-Speed 2629.56 samples/sec Loss 24.7747 LearningRate 0.0974 Epoch: 0 Global Step: 10850 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:56:59,904-Speed 2627.28 samples/sec Loss 24.7925 LearningRate 0.0974 Epoch: 0 Global Step: 10860 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:57:03,805-Speed 2625.06 samples/sec Loss 24.5589 LearningRate 0.0974 Epoch: 0 Global Step: 10870 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:57:07,848-Speed 2533.47 samples/sec Loss 24.7518 LearningRate 0.0974 Epoch: 0 Global Step: 10880 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 20:57:11,953-Speed 2495.12 samples/sec Loss 24.7633 LearningRate 0.0974 Epoch: 0 Global Step: 10890 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:57:15,867-Speed 2617.15 samples/sec Loss 24.7814 LearningRate 0.0974 Epoch: 0 Global Step: 10900 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:57:19,770-Speed 2623.86 samples/sec Loss 24.7277 LearningRate 0.0974 Epoch: 0 Global Step: 10910 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:57:23,688-Speed 2614.25 samples/sec Loss 24.8398 LearningRate 0.0974 Epoch: 0 Global Step: 10920 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:57:27,591-Speed 2624.81 samples/sec Loss 24.7911 LearningRate 0.0974 Epoch: 0 Global Step: 10930 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:57:31,500-Speed 2619.61 samples/sec Loss 24.7152 LearningRate 0.0974 Epoch: 0 Global Step: 10940 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:57:35,411-Speed 2618.98 samples/sec Loss 24.6755 LearningRate 0.0974 Epoch: 0 Global Step: 10950 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:57:39,310-Speed 2626.99 samples/sec Loss 24.6336 LearningRate 0.0974 Epoch: 0 Global Step: 10960 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:57:43,212-Speed 2624.68 samples/sec Loss 24.5345 LearningRate 0.0974 Epoch: 0 Global Step: 10970 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:57:47,109-Speed 2635.54 samples/sec Loss 24.6742 LearningRate 0.0974 Epoch: 0 Global Step: 10980 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:57:50,994-Speed 2636.25 samples/sec Loss 24.4949 LearningRate 0.0974 Epoch: 0 Global Step: 10990 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:57:54,893-Speed 2627.26 samples/sec Loss 24.4961 LearningRate 0.0974 Epoch: 0 Global Step: 11000 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:57:58,937-Speed 2532.62 samples/sec Loss 24.3708 LearningRate 0.0974 Epoch: 0 Global Step: 11010 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:58:02,849-Speed 2618.35 samples/sec Loss 24.5412 LearningRate 0.0974 Epoch: 0 Global Step: 11020 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:58:06,828-Speed 2573.88 samples/sec Loss 24.5040 LearningRate 0.0974 Epoch: 0 Global Step: 11030 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:58:10,779-Speed 2592.79 samples/sec Loss 24.4288 LearningRate 0.0974 Epoch: 0 Global Step: 11040 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:58:14,705-Speed 2609.35 samples/sec Loss 24.5361 LearningRate 0.0974 Epoch: 0 Global Step: 11050 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:58:18,619-Speed 2616.87 samples/sec Loss 24.6164 LearningRate 0.0974 Epoch: 0 Global Step: 11060 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:58:22,551-Speed 2605.30 samples/sec Loss 24.3951 LearningRate 0.0973 Epoch: 0 Global Step: 11070 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:58:26,447-Speed 2628.43 samples/sec Loss 24.2871 LearningRate 0.0973 Epoch: 0 Global Step: 11080 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:58:30,337-Speed 2633.44 samples/sec Loss 24.3536 LearningRate 0.0973 Epoch: 0 Global Step: 11090 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:58:34,237-Speed 2626.20 samples/sec Loss 24.3756 LearningRate 0.0973 Epoch: 0 Global Step: 11100 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:58:38,142-Speed 2623.13 samples/sec Loss 24.3116 LearningRate 0.0973 Epoch: 0 Global Step: 11110 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:58:42,038-Speed 2628.67 samples/sec Loss 24.3662 LearningRate 0.0973 Epoch: 0 Global Step: 11120 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:58:45,937-Speed 2627.08 samples/sec Loss 24.1740 LearningRate 0.0973 Epoch: 0 Global Step: 11130 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:58:49,839-Speed 2625.20 samples/sec Loss 24.1972 LearningRate 0.0973 Epoch: 0 Global Step: 11140 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:58:53,734-Speed 2630.24 samples/sec Loss 24.3213 LearningRate 0.0973 Epoch: 0 Global Step: 11150 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:58:57,731-Speed 2562.08 samples/sec Loss 24.2747 LearningRate 0.0973 Epoch: 0 Global Step: 11160 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:59:01,627-Speed 2629.06 samples/sec Loss 24.1988 LearningRate 0.0973 Epoch: 0 Global Step: 11170 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:59:05,521-Speed 2630.00 samples/sec Loss 24.4360 LearningRate 0.0973 Epoch: 0 Global Step: 11180 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:59:09,405-Speed 2637.29 samples/sec Loss 24.2739 LearningRate 0.0973 Epoch: 0 Global Step: 11190 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:59:13,305-Speed 2626.56 samples/sec Loss 24.3445 LearningRate 0.0973 Epoch: 0 Global Step: 11200 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:59:17,227-Speed 2611.31 samples/sec Loss 24.2993 LearningRate 0.0973 Epoch: 0 Global Step: 11210 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:59:21,124-Speed 2632.87 samples/sec Loss 24.3260 LearningRate 0.0973 Epoch: 0 Global Step: 11220 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:59:25,035-Speed 2618.98 samples/sec Loss 24.4078 LearningRate 0.0973 Epoch: 0 Global Step: 11230 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:59:29,130-Speed 2501.32 samples/sec Loss 24.0584 LearningRate 0.0973 Epoch: 0 Global Step: 11240 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:59:33,155-Speed 2544.87 samples/sec Loss 24.2061 LearningRate 0.0973 Epoch: 0 Global Step: 11250 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:59:37,064-Speed 2620.31 samples/sec Loss 24.2413 LearningRate 0.0973 Epoch: 0 Global Step: 11260 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:59:40,970-Speed 2622.08 samples/sec Loss 24.0774 LearningRate 0.0973 Epoch: 0 Global Step: 11270 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:59:44,883-Speed 2617.61 samples/sec Loss 24.0927 LearningRate 0.0973 Epoch: 0 Global Step: 11280 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:59:48,781-Speed 2627.23 samples/sec Loss 24.2251 LearningRate 0.0973 Epoch: 0 Global Step: 11290 Fp16 Grad Scale: 524288 Required: 92 hours
Training: 2022-04-12 20:59:52,663-Speed 2639.25 samples/sec Loss 24.0165 LearningRate 0.0973 Epoch: 0 Global Step: 11300 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 20:59:56,557-Speed 2629.81 samples/sec Loss 23.9387 LearningRate 0.0973 Epoch: 0 Global Step: 11310 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:00:00,455-Speed 2627.80 samples/sec Loss 23.9461 LearningRate 0.0973 Epoch: 0 Global Step: 11320 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:00:04,353-Speed 2627.79 samples/sec Loss 24.0221 LearningRate 0.0973 Epoch: 0 Global Step: 11330 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:00:08,265-Speed 2617.76 samples/sec Loss 24.1475 LearningRate 0.0973 Epoch: 0 Global Step: 11340 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:00:12,166-Speed 2625.22 samples/sec Loss 24.0531 LearningRate 0.0973 Epoch: 0 Global Step: 11350 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:00:16,057-Speed 2632.60 samples/sec Loss 23.8007 LearningRate 0.0973 Epoch: 0 Global Step: 11360 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 21:00:19,964-Speed 2622.10 samples/sec Loss 23.8685 LearningRate 0.0973 Epoch: 0 Global Step: 11370 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 21:00:23,873-Speed 2620.21 samples/sec Loss 24.0787 LearningRate 0.0973 Epoch: 0 Global Step: 11380 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 21:00:27,774-Speed 2626.12 samples/sec Loss 23.9378 LearningRate 0.0973 Epoch: 0 Global Step: 11390 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 21:00:31,668-Speed 2630.01 samples/sec Loss 24.0796 LearningRate 0.0973 Epoch: 0 Global Step: 11400 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 21:00:35,568-Speed 2626.22 samples/sec Loss 23.8185 LearningRate 0.0973 Epoch: 0 Global Step: 11410 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 21:00:39,464-Speed 2629.05 samples/sec Loss 23.9420 LearningRate 0.0973 Epoch: 0 Global Step: 11420 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 21:00:43,366-Speed 2625.29 samples/sec Loss 24.0507 LearningRate 0.0973 Epoch: 0 Global Step: 11430 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 21:00:47,265-Speed 2626.65 samples/sec Loss 23.8362 LearningRate 0.0973 Epoch: 0 Global Step: 11440 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 21:00:51,165-Speed 2626.58 samples/sec Loss 23.6750 LearningRate 0.0973 Epoch: 0 Global Step: 11450 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 21:00:55,068-Speed 2624.08 samples/sec Loss 23.7539 LearningRate 0.0973 Epoch: 0 Global Step: 11460 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:00:58,966-Speed 2627.83 samples/sec Loss 24.0846 LearningRate 0.0973 Epoch: 0 Global Step: 11470 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:01:02,947-Speed 2572.71 samples/sec Loss 23.8444 LearningRate 0.0973 Epoch: 0 Global Step: 11480 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:01:06,869-Speed 2611.62 samples/sec Loss 23.8200 LearningRate 0.0972 Epoch: 0 Global Step: 11490 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:01:10,781-Speed 2618.25 samples/sec Loss 23.7068 LearningRate 0.0972 Epoch: 0 Global Step: 11500 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:01:14,691-Speed 2619.98 samples/sec Loss 23.9309 LearningRate 0.0972 Epoch: 0 Global Step: 11510 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:01:18,594-Speed 2623.71 samples/sec Loss 23.7136 LearningRate 0.0972 Epoch: 0 Global Step: 11520 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:01:22,508-Speed 2617.90 samples/sec Loss 23.6356 LearningRate 0.0972 Epoch: 0 Global Step: 11530 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:01:26,488-Speed 2573.29 samples/sec Loss 23.6270 LearningRate 0.0972 Epoch: 0 Global Step: 11540 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:01:30,421-Speed 2604.54 samples/sec Loss 23.7791 LearningRate 0.0972 Epoch: 0 Global Step: 11550 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:01:34,304-Speed 2638.06 samples/sec Loss 23.5974 LearningRate 0.0972 Epoch: 0 Global Step: 11560 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:01:38,201-Speed 2627.93 samples/sec Loss 23.9312 LearningRate 0.0972 Epoch: 0 Global Step: 11570 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:01:42,130-Speed 2606.91 samples/sec Loss 23.8163 LearningRate 0.0972 Epoch: 0 Global Step: 11580 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:01:46,025-Speed 2629.94 samples/sec Loss 23.8748 LearningRate 0.0972 Epoch: 0 Global Step: 11590 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:01:49,923-Speed 2628.13 samples/sec Loss 23.6850 LearningRate 0.0972 Epoch: 0 Global Step: 11600 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:01:53,831-Speed 2621.12 samples/sec Loss 23.7128 LearningRate 0.0972 Epoch: 0 Global Step: 11610 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:01:57,748-Speed 2615.18 samples/sec Loss 23.6204 LearningRate 0.0972 Epoch: 0 Global Step: 11620 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:02:01,651-Speed 2623.96 samples/sec Loss 23.5729 LearningRate 0.0972 Epoch: 0 Global Step: 11630 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:02:05,571-Speed 2612.86 samples/sec Loss 23.6758 LearningRate 0.0972 Epoch: 0 Global Step: 11640 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:02:09,466-Speed 2629.41 samples/sec Loss 23.7170 LearningRate 0.0972 Epoch: 0 Global Step: 11650 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:02:13,353-Speed 2635.54 samples/sec Loss 23.4402 LearningRate 0.0972 Epoch: 0 Global Step: 11660 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:02:17,243-Speed 2632.79 samples/sec Loss 23.5444 LearningRate 0.0972 Epoch: 0 Global Step: 11670 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:02:21,155-Speed 2618.43 samples/sec Loss 23.5445 LearningRate 0.0972 Epoch: 0 Global Step: 11680 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:02:25,128-Speed 2578.68 samples/sec Loss 23.4500 LearningRate 0.0972 Epoch: 0 Global Step: 11690 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:02:29,133-Speed 2557.15 samples/sec Loss 23.6127 LearningRate 0.0972 Epoch: 0 Global Step: 11700 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:02:33,080-Speed 2594.92 samples/sec Loss 23.6015 LearningRate 0.0972 Epoch: 0 Global Step: 11710 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:02:37,096-Speed 2550.61 samples/sec Loss 23.3289 LearningRate 0.0972 Epoch: 0 Global Step: 11720 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:02:41,008-Speed 2618.61 samples/sec Loss 23.6072 LearningRate 0.0972 Epoch: 0 Global Step: 11730 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:02:44,909-Speed 2625.56 samples/sec Loss 23.6353 LearningRate 0.0972 Epoch: 0 Global Step: 11740 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:02:48,819-Speed 2619.44 samples/sec Loss 23.3112 LearningRate 0.0972 Epoch: 0 Global Step: 11750 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:02:52,813-Speed 2564.43 samples/sec Loss 23.4358 LearningRate 0.0972 Epoch: 0 Global Step: 11760 Fp16 Grad Scale: 524288 Required: 92 hours
Training: 2022-04-12 21:02:56,698-Speed 2636.85 samples/sec Loss 23.5063 LearningRate 0.0972 Epoch: 0 Global Step: 11770 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:03:00,603-Speed 2622.35 samples/sec Loss 23.5662 LearningRate 0.0972 Epoch: 0 Global Step: 11780 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:03:04,528-Speed 2609.88 samples/sec Loss 23.2884 LearningRate 0.0972 Epoch: 0 Global Step: 11790 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:03:08,429-Speed 2626.08 samples/sec Loss 23.4348 LearningRate 0.0972 Epoch: 0 Global Step: 11800 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:03:12,539-Speed 2491.67 samples/sec Loss 23.3854 LearningRate 0.0972 Epoch: 0 Global Step: 11810 Fp16 Grad Scale: 262144 Required: 92 hours
Training: 2022-04-12 21:03:16,619-Speed 2511.36 samples/sec Loss 23.2861 LearningRate 0.0972 Epoch: 0 Global Step: 11820 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 21:03:20,566-Speed 2595.98 samples/sec Loss 23.3866 LearningRate 0.0972 Epoch: 0 Global Step: 11830 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 21:03:24,466-Speed 2625.85 samples/sec Loss 23.2487 LearningRate 0.0972 Epoch: 0 Global Step: 11840 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 21:03:28,378-Speed 2618.60 samples/sec Loss 23.2853 LearningRate 0.0972 Epoch: 0 Global Step: 11850 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 21:03:32,296-Speed 2614.29 samples/sec Loss 23.4276 LearningRate 0.0972 Epoch: 0 Global Step: 11860 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 21:03:36,325-Speed 2541.96 samples/sec Loss 23.3802 LearningRate 0.0972 Epoch: 0 Global Step: 11870 Fp16 Grad Scale: 131072 Required: 92 hours
Training: 2022-04-12 21:03:40,230-Speed 2623.39 samples/sec Loss 23.3754 LearningRate 0.0972 Epoch: 0 Global Step: 11880 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:03:44,141-Speed 2618.50 samples/sec Loss 23.2311 LearningRate 0.0972 Epoch: 0 Global Step: 11890 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:03:48,039-Speed 2627.92 samples/sec Loss 23.3196 LearningRate 0.0972 Epoch: 0 Global Step: 11900 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:03:51,952-Speed 2617.62 samples/sec Loss 23.3323 LearningRate 0.0971 Epoch: 0 Global Step: 11910 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:03:55,852-Speed 2626.55 samples/sec Loss 23.1215 LearningRate 0.0971 Epoch: 0 Global Step: 11920 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:03:59,760-Speed 2621.11 samples/sec Loss 23.3211 LearningRate 0.0971 Epoch: 0 Global Step: 11930 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:04:03,663-Speed 2624.34 samples/sec Loss 23.0525 LearningRate 0.0971 Epoch: 0 Global Step: 11940 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:04:07,563-Speed 2626.41 samples/sec Loss 23.3195 LearningRate 0.0971 Epoch: 0 Global Step: 11950 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:04:11,466-Speed 2624.18 samples/sec Loss 23.2789 LearningRate 0.0971 Epoch: 0 Global Step: 11960 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:04:15,363-Speed 2628.72 samples/sec Loss 23.2794 LearningRate 0.0971 Epoch: 0 Global Step: 11970 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:04:19,260-Speed 2628.24 samples/sec Loss 23.2121 LearningRate 0.0971 Epoch: 0 Global Step: 11980 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:04:23,163-Speed 2623.97 samples/sec Loss 23.1384 LearningRate 0.0971 Epoch: 0 Global Step: 11990 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:04:27,065-Speed 2625.56 samples/sec Loss 23.2496 LearningRate 0.0971 Epoch: 0 Global Step: 12000 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:04:30,967-Speed 2624.79 samples/sec Loss 22.9744 LearningRate 0.0971 Epoch: 0 Global Step: 12010 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:04:34,863-Speed 2629.29 samples/sec Loss 23.1442 LearningRate 0.0971 Epoch: 0 Global Step: 12020 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:04:38,777-Speed 2616.91 samples/sec Loss 23.1529 LearningRate 0.0971 Epoch: 0 Global Step: 12030 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:04:42,675-Speed 2627.17 samples/sec Loss 23.1291 LearningRate 0.0971 Epoch: 0 Global Step: 12040 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:04:46,574-Speed 2627.43 samples/sec Loss 23.1541 LearningRate 0.0971 Epoch: 0 Global Step: 12050 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:04:50,482-Speed 2621.24 samples/sec Loss 23.1384 LearningRate 0.0971 Epoch: 0 Global Step: 12060 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:04:54,387-Speed 2622.67 samples/sec Loss 22.9940 LearningRate 0.0971 Epoch: 0 Global Step: 12070 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:04:58,286-Speed 2626.95 samples/sec Loss 23.1300 LearningRate 0.0971 Epoch: 0 Global Step: 12080 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:05:02,185-Speed 2627.04 samples/sec Loss 23.1124 LearningRate 0.0971 Epoch: 0 Global Step: 12090 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:05:06,085-Speed 2625.98 samples/sec Loss 23.0423 LearningRate 0.0971 Epoch: 0 Global Step: 12100 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:05:09,982-Speed 2628.53 samples/sec Loss 22.9773 LearningRate 0.0971 Epoch: 0 Global Step: 12110 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:05:13,881-Speed 2627.07 samples/sec Loss 23.0201 LearningRate 0.0971 Epoch: 0 Global Step: 12120 Fp16 Grad Scale: 524288 Required: 91 hours
Training: 2022-04-12 21:05:17,767-Speed 2635.20 samples/sec Loss 23.1005 LearningRate 0.0971 Epoch: 0 Global Step: 12130 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:05:21,664-Speed 2628.93 samples/sec Loss 23.0081 LearningRate 0.0971 Epoch: 0 Global Step: 12140 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:05:25,562-Speed 2627.32 samples/sec Loss 23.1205 LearningRate 0.0971 Epoch: 0 Global Step: 12150 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:05:29,580-Speed 2549.37 samples/sec Loss 22.7805 LearningRate 0.0971 Epoch: 0 Global Step: 12160 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:05:33,673-Speed 2502.19 samples/sec Loss 22.7256 LearningRate 0.0971 Epoch: 0 Global Step: 12170 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:05:37,707-Speed 2539.19 samples/sec Loss 23.0662 LearningRate 0.0971 Epoch: 0 Global Step: 12180 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:05:41,745-Speed 2536.80 samples/sec Loss 22.8063 LearningRate 0.0971 Epoch: 0 Global Step: 12190 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:05:45,820-Speed 2513.22 samples/sec Loss 22.7127 LearningRate 0.0971 Epoch: 0 Global Step: 12200 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:05:49,743-Speed 2610.96 samples/sec Loss 22.7708 LearningRate 0.0971 Epoch: 0 Global Step: 12210 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:05:53,672-Speed 2612.60 samples/sec Loss 22.8297 LearningRate 0.0971 Epoch: 0 Global Step: 12220 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:05:57,563-Speed 2632.11 samples/sec Loss 22.9926 LearningRate 0.0971 Epoch: 0 Global Step: 12230 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:06:01,473-Speed 2619.69 samples/sec Loss 22.7414 LearningRate 0.0971 Epoch: 0 Global Step: 12240 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:06:05,373-Speed 2626.44 samples/sec Loss 22.6848 LearningRate 0.0971 Epoch: 0 Global Step: 12250 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:06:09,281-Speed 2620.67 samples/sec Loss 22.8649 LearningRate 0.0971 Epoch: 0 Global Step: 12260 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:06:13,184-Speed 2624.59 samples/sec Loss 22.9567 LearningRate 0.0971 Epoch: 0 Global Step: 12270 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:06:17,207-Speed 2545.87 samples/sec Loss 22.8172 LearningRate 0.0971 Epoch: 0 Global Step: 12280 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:06:21,113-Speed 2622.57 samples/sec Loss 22.7182 LearningRate 0.0971 Epoch: 0 Global Step: 12290 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:06:25,010-Speed 2628.23 samples/sec Loss 22.7642 LearningRate 0.0971 Epoch: 0 Global Step: 12300 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:06:28,911-Speed 2625.46 samples/sec Loss 22.6296 LearningRate 0.0971 Epoch: 0 Global Step: 12310 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:06:32,812-Speed 2625.65 samples/sec Loss 22.6737 LearningRate 0.0971 Epoch: 0 Global Step: 12320 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:06:36,772-Speed 2586.39 samples/sec Loss 22.7985 LearningRate 0.0970 Epoch: 0 Global Step: 12330 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:06:40,677-Speed 2622.82 samples/sec Loss 22.8695 LearningRate 0.0970 Epoch: 0 Global Step: 12340 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:06:44,641-Speed 2584.17 samples/sec Loss 22.4713 LearningRate 0.0970 Epoch: 0 Global Step: 12350 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:06:48,564-Speed 2610.66 samples/sec Loss 22.6371 LearningRate 0.0970 Epoch: 0 Global Step: 12360 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:06:52,476-Speed 2618.34 samples/sec Loss 22.5962 LearningRate 0.0970 Epoch: 0 Global Step: 12370 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:06:56,428-Speed 2591.95 samples/sec Loss 22.7849 LearningRate 0.0970 Epoch: 0 Global Step: 12380 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:07:00,548-Speed 2486.05 samples/sec Loss 22.6879 LearningRate 0.0970 Epoch: 0 Global Step: 12390 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:07:04,662-Speed 2489.76 samples/sec Loss 22.5807 LearningRate 0.0970 Epoch: 0 Global Step: 12400 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:07:08,769-Speed 2493.86 samples/sec Loss 22.5824 LearningRate 0.0970 Epoch: 0 Global Step: 12410 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:07:12,754-Speed 2570.23 samples/sec Loss 22.4593 LearningRate 0.0970 Epoch: 0 Global Step: 12420 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:07:16,859-Speed 2495.08 samples/sec Loss 22.6251 LearningRate 0.0970 Epoch: 0 Global Step: 12430 Fp16 Grad Scale: 524288 Required: 91 hours
Training: 2022-04-12 21:07:20,898-Speed 2536.37 samples/sec Loss 22.6533 LearningRate 0.0970 Epoch: 0 Global Step: 12440 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:07:24,806-Speed 2620.56 samples/sec Loss 22.5777 LearningRate 0.0970 Epoch: 0 Global Step: 12450 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:07:28,715-Speed 2620.87 samples/sec Loss 22.6462 LearningRate 0.0970 Epoch: 0 Global Step: 12460 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:07:32,600-Speed 2636.06 samples/sec Loss 22.5146 LearningRate 0.0970 Epoch: 0 Global Step: 12470 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:07:36,514-Speed 2616.86 samples/sec Loss 22.6128 LearningRate 0.0970 Epoch: 0 Global Step: 12480 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:07:40,420-Speed 2621.82 samples/sec Loss 22.7122 LearningRate 0.0970 Epoch: 0 Global Step: 12490 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:07:44,330-Speed 2619.95 samples/sec Loss 22.6672 LearningRate 0.0970 Epoch: 0 Global Step: 12500 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:07:48,234-Speed 2623.59 samples/sec Loss 22.5040 LearningRate 0.0970 Epoch: 0 Global Step: 12510 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:07:52,134-Speed 2626.27 samples/sec Loss 22.3887 LearningRate 0.0970 Epoch: 0 Global Step: 12520 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:07:56,049-Speed 2616.30 samples/sec Loss 22.5132 LearningRate 0.0970 Epoch: 0 Global Step: 12530 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:07:59,944-Speed 2630.21 samples/sec Loss 22.4858 LearningRate 0.0970 Epoch: 0 Global Step: 12540 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:08:03,859-Speed 2616.13 samples/sec Loss 22.5814 LearningRate 0.0970 Epoch: 0 Global Step: 12550 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:08:07,761-Speed 2625.18 samples/sec Loss 22.4180 LearningRate 0.0970 Epoch: 0 Global Step: 12560 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:08:11,666-Speed 2622.49 samples/sec Loss 22.5058 LearningRate 0.0970 Epoch: 0 Global Step: 12570 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:08:15,572-Speed 2622.53 samples/sec Loss 22.3390 LearningRate 0.0970 Epoch: 0 Global Step: 12580 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:08:19,489-Speed 2614.53 samples/sec Loss 22.3738 LearningRate 0.0970 Epoch: 0 Global Step: 12590 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:08:23,391-Speed 2625.10 samples/sec Loss 22.3143 LearningRate 0.0970 Epoch: 0 Global Step: 12600 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:08:27,299-Speed 2620.64 samples/sec Loss 22.5547 LearningRate 0.0970 Epoch: 0 Global Step: 12610 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:08:31,222-Speed 2611.18 samples/sec Loss 22.3073 LearningRate 0.0970 Epoch: 0 Global Step: 12620 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:08:35,126-Speed 2623.67 samples/sec Loss 22.5420 LearningRate 0.0970 Epoch: 0 Global Step: 12630 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:08:39,031-Speed 2622.36 samples/sec Loss 22.3726 LearningRate 0.0970 Epoch: 0 Global Step: 12640 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:08:43,004-Speed 2578.21 samples/sec Loss 22.3295 LearningRate 0.0970 Epoch: 0 Global Step: 12650 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:08:46,904-Speed 2626.04 samples/sec Loss 22.4101 LearningRate 0.0970 Epoch: 0 Global Step: 12660 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:08:50,786-Speed 2638.54 samples/sec Loss 22.2679 LearningRate 0.0970 Epoch: 0 Global Step: 12670 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:08:54,693-Speed 2621.48 samples/sec Loss 22.1937 LearningRate 0.0970 Epoch: 0 Global Step: 12680 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:08:58,596-Speed 2624.35 samples/sec Loss 22.1994 LearningRate 0.0970 Epoch: 0 Global Step: 12690 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:09:02,501-Speed 2622.88 samples/sec Loss 22.3682 LearningRate 0.0970 Epoch: 0 Global Step: 12700 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:09:06,405-Speed 2623.62 samples/sec Loss 22.1289 LearningRate 0.0970 Epoch: 0 Global Step: 12710 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:09:10,340-Speed 2603.37 samples/sec Loss 22.0947 LearningRate 0.0970 Epoch: 0 Global Step: 12720 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:09:14,238-Speed 2627.76 samples/sec Loss 22.3172 LearningRate 0.0970 Epoch: 0 Global Step: 12730 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:09:18,142-Speed 2623.42 samples/sec Loss 22.2262 LearningRate 0.0970 Epoch: 0 Global Step: 12740 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:09:22,055-Speed 2617.62 samples/sec Loss 21.9881 LearningRate 0.0969 Epoch: 0 Global Step: 12750 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:09:25,969-Speed 2616.69 samples/sec Loss 22.3067 LearningRate 0.0969 Epoch: 0 Global Step: 12760 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:09:29,863-Speed 2631.10 samples/sec Loss 22.2658 LearningRate 0.0969 Epoch: 0 Global Step: 12770 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:09:33,837-Speed 2576.94 samples/sec Loss 21.8652 LearningRate 0.0969 Epoch: 0 Global Step: 12780 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:09:37,792-Speed 2590.34 samples/sec Loss 22.3276 LearningRate 0.0969 Epoch: 0 Global Step: 12790 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:09:41,695-Speed 2624.09 samples/sec Loss 22.3600 LearningRate 0.0969 Epoch: 0 Global Step: 12800 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:09:45,614-Speed 2613.58 samples/sec Loss 22.0661 LearningRate 0.0969 Epoch: 0 Global Step: 12810 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:09:49,510-Speed 2629.14 samples/sec Loss 22.1113 LearningRate 0.0969 Epoch: 0 Global Step: 12820 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:09:53,415-Speed 2623.05 samples/sec Loss 21.9496 LearningRate 0.0969 Epoch: 0 Global Step: 12830 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:09:57,504-Speed 2504.65 samples/sec Loss 22.1286 LearningRate 0.0969 Epoch: 0 Global Step: 12840 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:10:01,474-Speed 2580.23 samples/sec Loss 22.0993 LearningRate 0.0969 Epoch: 0 Global Step: 12850 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:10:05,457-Speed 2571.40 samples/sec Loss 22.0234 LearningRate 0.0969 Epoch: 0 Global Step: 12860 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:10:09,556-Speed 2498.66 samples/sec Loss 21.9928 LearningRate 0.0969 Epoch: 0 Global Step: 12870 Fp16 Grad Scale: 524288 Required: 91 hours
Training: 2022-04-12 21:10:13,436-Speed 2640.06 samples/sec Loss 22.1007 LearningRate 0.0969 Epoch: 0 Global Step: 12880 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:10:17,338-Speed 2624.65 samples/sec Loss 21.8888 LearningRate 0.0969 Epoch: 0 Global Step: 12890 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:10:21,237-Speed 2627.66 samples/sec Loss 22.0910 LearningRate 0.0969 Epoch: 0 Global Step: 12900 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:10:25,138-Speed 2625.06 samples/sec Loss 21.9996 LearningRate 0.0969 Epoch: 0 Global Step: 12910 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:10:29,083-Speed 2597.13 samples/sec Loss 22.0999 LearningRate 0.0969 Epoch: 0 Global Step: 12920 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:10:32,985-Speed 2624.62 samples/sec Loss 21.8988 LearningRate 0.0969 Epoch: 0 Global Step: 12930 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:10:36,919-Speed 2603.34 samples/sec Loss 21.9651 LearningRate 0.0969 Epoch: 0 Global Step: 12940 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:10:40,828-Speed 2621.02 samples/sec Loss 22.0520 LearningRate 0.0969 Epoch: 0 Global Step: 12950 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:10:44,734-Speed 2622.31 samples/sec Loss 21.9549 LearningRate 0.0969 Epoch: 0 Global Step: 12960 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:10:48,633-Speed 2626.95 samples/sec Loss 21.9126 LearningRate 0.0969 Epoch: 0 Global Step: 12970 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:10:52,519-Speed 2635.68 samples/sec Loss 22.0018 LearningRate 0.0969 Epoch: 0 Global Step: 12980 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:10:56,425-Speed 2621.93 samples/sec Loss 21.8344 LearningRate 0.0969 Epoch: 0 Global Step: 12990 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:11:00,326-Speed 2625.91 samples/sec Loss 22.0886 LearningRate 0.0969 Epoch: 0 Global Step: 13000 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:11:04,223-Speed 2628.17 samples/sec Loss 22.0893 LearningRate 0.0969 Epoch: 0 Global Step: 13010 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:11:08,124-Speed 2625.65 samples/sec Loss 21.9768 LearningRate 0.0969 Epoch: 0 Global Step: 13020 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:11:12,028-Speed 2623.19 samples/sec Loss 21.9919 LearningRate 0.0969 Epoch: 0 Global Step: 13030 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:11:15,948-Speed 2613.15 samples/sec Loss 21.8396 LearningRate 0.0969 Epoch: 0 Global Step: 13040 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:11:19,862-Speed 2616.87 samples/sec Loss 21.8320 LearningRate 0.0969 Epoch: 0 Global Step: 13050 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:11:23,769-Speed 2621.88 samples/sec Loss 21.9053 LearningRate 0.0969 Epoch: 0 Global Step: 13060 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:11:27,683-Speed 2616.63 samples/sec Loss 21.9638 LearningRate 0.0969 Epoch: 0 Global Step: 13070 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:11:31,686-Speed 2558.87 samples/sec Loss 21.9854 LearningRate 0.0969 Epoch: 0 Global Step: 13080 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:11:35,594-Speed 2620.68 samples/sec Loss 21.8539 LearningRate 0.0969 Epoch: 0 Global Step: 13090 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:11:39,497-Speed 2624.01 samples/sec Loss 21.8529 LearningRate 0.0969 Epoch: 0 Global Step: 13100 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:11:43,444-Speed 2595.12 samples/sec Loss 21.8813 LearningRate 0.0969 Epoch: 0 Global Step: 13110 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:11:47,460-Speed 2550.17 samples/sec Loss 21.9317 LearningRate 0.0969 Epoch: 0 Global Step: 13120 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:11:51,376-Speed 2615.83 samples/sec Loss 21.7083 LearningRate 0.0969 Epoch: 0 Global Step: 13130 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:11:55,275-Speed 2626.67 samples/sec Loss 22.1024 LearningRate 0.0969 Epoch: 0 Global Step: 13140 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:11:59,190-Speed 2616.70 samples/sec Loss 22.0134 LearningRate 0.0969 Epoch: 0 Global Step: 13150 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:12:03,095-Speed 2622.52 samples/sec Loss 21.8493 LearningRate 0.0969 Epoch: 0 Global Step: 13160 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:12:06,995-Speed 2626.47 samples/sec Loss 21.7454 LearningRate 0.0969 Epoch: 0 Global Step: 13170 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:12:10,907-Speed 2618.25 samples/sec Loss 21.7387 LearningRate 0.0968 Epoch: 0 Global Step: 13180 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:12:14,808-Speed 2625.76 samples/sec Loss 21.7976 LearningRate 0.0968 Epoch: 0 Global Step: 13190 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:12:18,720-Speed 2617.67 samples/sec Loss 21.9894 LearningRate 0.0968 Epoch: 0 Global Step: 13200 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:12:22,634-Speed 2616.94 samples/sec Loss 21.7291 LearningRate 0.0968 Epoch: 0 Global Step: 13210 Fp16 Grad Scale: 524288 Required: 91 hours
Training: 2022-04-12 21:12:26,519-Speed 2636.18 samples/sec Loss 21.7679 LearningRate 0.0968 Epoch: 0 Global Step: 13220 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:12:30,428-Speed 2620.50 samples/sec Loss 21.7536 LearningRate 0.0968 Epoch: 0 Global Step: 13230 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:12:34,339-Speed 2619.39 samples/sec Loss 21.7988 LearningRate 0.0968 Epoch: 0 Global Step: 13240 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:12:38,236-Speed 2628.02 samples/sec Loss 21.6645 LearningRate 0.0968 Epoch: 0 Global Step: 13250 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:12:42,137-Speed 2625.62 samples/sec Loss 21.6848 LearningRate 0.0968 Epoch: 0 Global Step: 13260 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:12:46,037-Speed 2626.17 samples/sec Loss 21.6457 LearningRate 0.0968 Epoch: 0 Global Step: 13270 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:12:49,942-Speed 2622.94 samples/sec Loss 21.5899 LearningRate 0.0968 Epoch: 0 Global Step: 13280 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:12:53,846-Speed 2623.57 samples/sec Loss 21.6562 LearningRate 0.0968 Epoch: 0 Global Step: 13290 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:12:57,743-Speed 2628.40 samples/sec Loss 21.5651 LearningRate 0.0968 Epoch: 0 Global Step: 13300 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:13:01,646-Speed 2624.23 samples/sec Loss 21.7268 LearningRate 0.0968 Epoch: 0 Global Step: 13310 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:13:05,550-Speed 2624.12 samples/sec Loss 21.6673 LearningRate 0.0968 Epoch: 0 Global Step: 13320 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:13:09,454-Speed 2623.87 samples/sec Loss 21.5848 LearningRate 0.0968 Epoch: 0 Global Step: 13330 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:13:13,352-Speed 2627.39 samples/sec Loss 21.5243 LearningRate 0.0968 Epoch: 0 Global Step: 13340 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:13:17,252-Speed 2625.71 samples/sec Loss 21.6052 LearningRate 0.0968 Epoch: 0 Global Step: 13350 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:13:21,154-Speed 2625.63 samples/sec Loss 21.5295 LearningRate 0.0968 Epoch: 0 Global Step: 13360 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:13:25,051-Speed 2627.81 samples/sec Loss 21.7733 LearningRate 0.0968 Epoch: 0 Global Step: 13370 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:13:28,960-Speed 2620.28 samples/sec Loss 21.6417 LearningRate 0.0968 Epoch: 0 Global Step: 13380 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:13:32,864-Speed 2623.66 samples/sec Loss 21.4377 LearningRate 0.0968 Epoch: 0 Global Step: 13390 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:13:36,796-Speed 2605.06 samples/sec Loss 21.6088 LearningRate 0.0968 Epoch: 0 Global Step: 13400 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:13:40,758-Speed 2585.31 samples/sec Loss 21.3130 LearningRate 0.0968 Epoch: 0 Global Step: 13410 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:13:44,865-Speed 2493.53 samples/sec Loss 21.5228 LearningRate 0.0968 Epoch: 0 Global Step: 13420 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:13:48,835-Speed 2580.29 samples/sec Loss 21.6012 LearningRate 0.0968 Epoch: 0 Global Step: 13430 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:13:52,733-Speed 2627.45 samples/sec Loss 21.5291 LearningRate 0.0968 Epoch: 0 Global Step: 13440 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:13:56,628-Speed 2630.47 samples/sec Loss 21.3219 LearningRate 0.0968 Epoch: 0 Global Step: 13450 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:14:00,522-Speed 2630.06 samples/sec Loss 21.6586 LearningRate 0.0968 Epoch: 0 Global Step: 13460 Fp16 Grad Scale: 524288 Required: 91 hours
Training: 2022-04-12 21:14:04,417-Speed 2629.90 samples/sec Loss 21.4278 LearningRate 0.0968 Epoch: 0 Global Step: 13470 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:14:08,316-Speed 2626.88 samples/sec Loss 21.4503 LearningRate 0.0968 Epoch: 0 Global Step: 13480 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:14:12,222-Speed 2622.04 samples/sec Loss 21.4006 LearningRate 0.0968 Epoch: 0 Global Step: 13490 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:14:16,126-Speed 2624.05 samples/sec Loss 21.5551 LearningRate 0.0968 Epoch: 0 Global Step: 13500 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:14:20,037-Speed 2619.11 samples/sec Loss 21.4444 LearningRate 0.0968 Epoch: 0 Global Step: 13510 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:14:23,942-Speed 2622.77 samples/sec Loss 21.3833 LearningRate 0.0968 Epoch: 0 Global Step: 13520 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:14:27,844-Speed 2624.72 samples/sec Loss 21.4697 LearningRate 0.0968 Epoch: 0 Global Step: 13530 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:14:31,744-Speed 2626.18 samples/sec Loss 21.5747 LearningRate 0.0968 Epoch: 0 Global Step: 13540 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:14:35,645-Speed 2625.57 samples/sec Loss 21.3708 LearningRate 0.0968 Epoch: 0 Global Step: 13550 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:14:39,557-Speed 2618.32 samples/sec Loss 21.5991 LearningRate 0.0968 Epoch: 0 Global Step: 13560 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:14:43,467-Speed 2619.58 samples/sec Loss 21.3451 LearningRate 0.0968 Epoch: 0 Global Step: 13570 Fp16 Grad Scale: 524288 Required: 91 hours
Training: 2022-04-12 21:14:47,348-Speed 2639.04 samples/sec Loss 21.3215 LearningRate 0.0968 Epoch: 0 Global Step: 13580 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:14:51,267-Speed 2613.54 samples/sec Loss 21.3097 LearningRate 0.0968 Epoch: 0 Global Step: 13590 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:14:55,163-Speed 2629.26 samples/sec Loss 21.2851 LearningRate 0.0967 Epoch: 0 Global Step: 13600 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:14:59,068-Speed 2622.84 samples/sec Loss 21.1045 LearningRate 0.0967 Epoch: 0 Global Step: 13610 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:15:02,967-Speed 2626.21 samples/sec Loss 21.3155 LearningRate 0.0967 Epoch: 0 Global Step: 13620 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:15:06,867-Speed 2626.82 samples/sec Loss 21.1027 LearningRate 0.0967 Epoch: 0 Global Step: 13630 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:15:10,784-Speed 2615.37 samples/sec Loss 21.0412 LearningRate 0.0967 Epoch: 0 Global Step: 13640 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:15:14,703-Speed 2613.52 samples/sec Loss 21.2593 LearningRate 0.0967 Epoch: 0 Global Step: 13650 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:15:18,602-Speed 2626.81 samples/sec Loss 21.2689 LearningRate 0.0967 Epoch: 0 Global Step: 13660 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:15:22,512-Speed 2619.74 samples/sec Loss 21.1240 LearningRate 0.0967 Epoch: 0 Global Step: 13670 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:15:26,564-Speed 2527.58 samples/sec Loss 21.2263 LearningRate 0.0967 Epoch: 0 Global Step: 13680 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:15:30,478-Speed 2617.36 samples/sec Loss 21.3292 LearningRate 0.0967 Epoch: 0 Global Step: 13690 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:15:34,401-Speed 2610.83 samples/sec Loss 21.2554 LearningRate 0.0967 Epoch: 0 Global Step: 13700 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:15:38,410-Speed 2554.67 samples/sec Loss 21.1902 LearningRate 0.0967 Epoch: 0 Global Step: 13710 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:15:42,329-Speed 2613.51 samples/sec Loss 21.2321 LearningRate 0.0967 Epoch: 0 Global Step: 13720 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:15:46,259-Speed 2606.14 samples/sec Loss 21.0333 LearningRate 0.0967 Epoch: 0 Global Step: 13730 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:15:50,169-Speed 2619.75 samples/sec Loss 21.1267 LearningRate 0.0967 Epoch: 0 Global Step: 13740 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:15:54,069-Speed 2626.67 samples/sec Loss 21.2469 LearningRate 0.0967 Epoch: 0 Global Step: 13750 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:15:57,976-Speed 2621.49 samples/sec Loss 21.1254 LearningRate 0.0967 Epoch: 0 Global Step: 13760 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:16:01,882-Speed 2622.66 samples/sec Loss 21.1211 LearningRate 0.0967 Epoch: 0 Global Step: 13770 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:16:05,791-Speed 2620.58 samples/sec Loss 21.2554 LearningRate 0.0967 Epoch: 0 Global Step: 13780 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:16:09,710-Speed 2613.15 samples/sec Loss 21.1452 LearningRate 0.0967 Epoch: 0 Global Step: 13790 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:16:13,625-Speed 2616.02 samples/sec Loss 20.9004 LearningRate 0.0967 Epoch: 0 Global Step: 13800 Fp16 Grad Scale: 524288 Required: 91 hours
Training: 2022-04-12 21:16:17,514-Speed 2634.44 samples/sec Loss 21.0467 LearningRate 0.0967 Epoch: 0 Global Step: 13810 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:16:21,434-Speed 2612.86 samples/sec Loss 21.2549 LearningRate 0.0967 Epoch: 0 Global Step: 13820 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:16:25,369-Speed 2603.09 samples/sec Loss 21.2285 LearningRate 0.0967 Epoch: 0 Global Step: 13830 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:16:29,296-Speed 2608.23 samples/sec Loss 21.0404 LearningRate 0.0967 Epoch: 0 Global Step: 13840 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:16:33,199-Speed 2624.61 samples/sec Loss 21.2160 LearningRate 0.0967 Epoch: 0 Global Step: 13850 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:16:37,114-Speed 2616.13 samples/sec Loss 21.0615 LearningRate 0.0967 Epoch: 0 Global Step: 13860 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:16:41,019-Speed 2622.62 samples/sec Loss 20.9693 LearningRate 0.0967 Epoch: 0 Global Step: 13870 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:16:44,921-Speed 2624.70 samples/sec Loss 21.1094 LearningRate 0.0967 Epoch: 0 Global Step: 13880 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:16:48,807-Speed 2636.41 samples/sec Loss 20.9659 LearningRate 0.0967 Epoch: 0 Global Step: 13890 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:16:52,705-Speed 2627.65 samples/sec Loss 21.0756 LearningRate 0.0967 Epoch: 0 Global Step: 13900 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:16:56,617-Speed 2618.13 samples/sec Loss 20.9197 LearningRate 0.0967 Epoch: 0 Global Step: 13910 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:17:00,557-Speed 2599.78 samples/sec Loss 21.1058 LearningRate 0.0967 Epoch: 0 Global Step: 13920 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:17:04,463-Speed 2622.30 samples/sec Loss 21.1606 LearningRate 0.0967 Epoch: 0 Global Step: 13930 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:17:08,427-Speed 2583.48 samples/sec Loss 21.0476 LearningRate 0.0967 Epoch: 0 Global Step: 13940 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:17:12,332-Speed 2623.24 samples/sec Loss 20.9954 LearningRate 0.0967 Epoch: 0 Global Step: 13950 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:17:16,330-Speed 2562.13 samples/sec Loss 21.0270 LearningRate 0.0967 Epoch: 0 Global Step: 13960 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:17:20,235-Speed 2622.79 samples/sec Loss 21.0123 LearningRate 0.0967 Epoch: 0 Global Step: 13970 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:17:24,138-Speed 2624.51 samples/sec Loss 20.9538 LearningRate 0.0967 Epoch: 0 Global Step: 13980 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:17:28,045-Speed 2621.61 samples/sec Loss 20.8936 LearningRate 0.0967 Epoch: 0 Global Step: 13990 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:17:32,082-Speed 2537.59 samples/sec Loss 20.7091 LearningRate 0.0967 Epoch: 0 Global Step: 14000 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:17:36,186-Speed 2495.83 samples/sec Loss 20.8389 LearningRate 0.0967 Epoch: 0 Global Step: 14010 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:17:40,290-Speed 2495.17 samples/sec Loss 21.0306 LearningRate 0.0966 Epoch: 0 Global Step: 14020 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:17:44,240-Speed 2593.52 samples/sec Loss 20.9286 LearningRate 0.0966 Epoch: 0 Global Step: 14030 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:17:48,137-Speed 2628.60 samples/sec Loss 20.9496 LearningRate 0.0966 Epoch: 0 Global Step: 14040 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:17:52,058-Speed 2612.36 samples/sec Loss 21.0849 LearningRate 0.0966 Epoch: 0 Global Step: 14050 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:17:55,960-Speed 2624.65 samples/sec Loss 20.9803 LearningRate 0.0966 Epoch: 0 Global Step: 14060 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:17:59,862-Speed 2625.04 samples/sec Loss 20.7408 LearningRate 0.0966 Epoch: 0 Global Step: 14070 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:18:03,764-Speed 2625.42 samples/sec Loss 20.8444 LearningRate 0.0966 Epoch: 0 Global Step: 14080 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:18:07,645-Speed 2638.65 samples/sec Loss 20.5915 LearningRate 0.0966 Epoch: 0 Global Step: 14090 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:18:11,553-Speed 2621.16 samples/sec Loss 20.9767 LearningRate 0.0966 Epoch: 0 Global Step: 14100 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:18:15,455-Speed 2624.60 samples/sec Loss 20.8806 LearningRate 0.0966 Epoch: 0 Global Step: 14110 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:18:19,358-Speed 2624.31 samples/sec Loss 20.9393 LearningRate 0.0966 Epoch: 0 Global Step: 14120 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:18:23,265-Speed 2621.46 samples/sec Loss 20.7980 LearningRate 0.0966 Epoch: 0 Global Step: 14130 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:18:27,191-Speed 2609.02 samples/sec Loss 20.7596 LearningRate 0.0966 Epoch: 0 Global Step: 14140 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:18:31,216-Speed 2544.75 samples/sec Loss 20.7570 LearningRate 0.0966 Epoch: 0 Global Step: 14150 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:18:35,321-Speed 2495.55 samples/sec Loss 20.8807 LearningRate 0.0966 Epoch: 0 Global Step: 14160 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:18:39,278-Speed 2587.87 samples/sec Loss 20.8961 LearningRate 0.0966 Epoch: 0 Global Step: 14170 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:18:43,176-Speed 2628.02 samples/sec Loss 20.9016 LearningRate 0.0966 Epoch: 0 Global Step: 14180 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:18:47,056-Speed 2639.79 samples/sec Loss 20.7540 LearningRate 0.0966 Epoch: 0 Global Step: 14190 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:18:50,979-Speed 2610.98 samples/sec Loss 20.7130 LearningRate 0.0966 Epoch: 0 Global Step: 14200 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:18:54,915-Speed 2602.84 samples/sec Loss 20.7855 LearningRate 0.0966 Epoch: 0 Global Step: 14210 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:18:58,836-Speed 2612.38 samples/sec Loss 20.9002 LearningRate 0.0966 Epoch: 0 Global Step: 14220 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:19:02,763-Speed 2607.94 samples/sec Loss 20.6541 LearningRate 0.0966 Epoch: 0 Global Step: 14230 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:19:06,672-Speed 2620.31 samples/sec Loss 20.6625 LearningRate 0.0966 Epoch: 0 Global Step: 14240 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:19:10,573-Speed 2625.57 samples/sec Loss 20.6501 LearningRate 0.0966 Epoch: 0 Global Step: 14250 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:19:14,549-Speed 2576.43 samples/sec Loss 20.5439 LearningRate 0.0966 Epoch: 0 Global Step: 14260 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:19:18,463-Speed 2616.41 samples/sec Loss 20.6955 LearningRate 0.0966 Epoch: 0 Global Step: 14270 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:19:22,400-Speed 2602.24 samples/sec Loss 20.6172 LearningRate 0.0966 Epoch: 0 Global Step: 14280 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:19:26,355-Speed 2590.10 samples/sec Loss 20.6038 LearningRate 0.0966 Epoch: 0 Global Step: 14290 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:19:30,264-Speed 2619.67 samples/sec Loss 20.7531 LearningRate 0.0966 Epoch: 0 Global Step: 14300 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:19:34,182-Speed 2614.56 samples/sec Loss 20.8339 LearningRate 0.0966 Epoch: 0 Global Step: 14310 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:19:38,086-Speed 2623.36 samples/sec Loss 20.7419 LearningRate 0.0966 Epoch: 0 Global Step: 14320 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:19:41,995-Speed 2620.77 samples/sec Loss 20.7064 LearningRate 0.0966 Epoch: 0 Global Step: 14330 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:19:45,892-Speed 2628.06 samples/sec Loss 20.6198 LearningRate 0.0966 Epoch: 0 Global Step: 14340 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:19:49,795-Speed 2624.30 samples/sec Loss 20.5753 LearningRate 0.0966 Epoch: 0 Global Step: 14350 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:19:53,706-Speed 2618.41 samples/sec Loss 20.6853 LearningRate 0.0966 Epoch: 0 Global Step: 14360 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:19:57,686-Speed 2574.11 samples/sec Loss 20.6354 LearningRate 0.0966 Epoch: 0 Global Step: 14370 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:20:01,594-Speed 2620.16 samples/sec Loss 20.5403 LearningRate 0.0966 Epoch: 0 Global Step: 14380 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:20:05,504-Speed 2620.60 samples/sec Loss 20.6340 LearningRate 0.0966 Epoch: 0 Global Step: 14390 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:20:09,431-Speed 2608.09 samples/sec Loss 20.7312 LearningRate 0.0966 Epoch: 0 Global Step: 14400 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:20:13,329-Speed 2627.82 samples/sec Loss 20.4011 LearningRate 0.0966 Epoch: 0 Global Step: 14410 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:20:17,231-Speed 2624.76 samples/sec Loss 20.5333 LearningRate 0.0966 Epoch: 0 Global Step: 14420 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:20:21,151-Speed 2613.22 samples/sec Loss 20.5136 LearningRate 0.0966 Epoch: 0 Global Step: 14430 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:20:25,045-Speed 2630.28 samples/sec Loss 20.6507 LearningRate 0.0965 Epoch: 0 Global Step: 14440 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:20:28,943-Speed 2627.45 samples/sec Loss 20.5765 LearningRate 0.0965 Epoch: 0 Global Step: 14450 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:20:32,892-Speed 2593.62 samples/sec Loss 20.4315 LearningRate 0.0965 Epoch: 0 Global Step: 14460 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:20:36,795-Speed 2624.61 samples/sec Loss 20.4861 LearningRate 0.0965 Epoch: 0 Global Step: 14470 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:20:40,693-Speed 2627.51 samples/sec Loss 20.4021 LearningRate 0.0965 Epoch: 0 Global Step: 14480 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:20:44,593-Speed 2626.18 samples/sec Loss 20.6437 LearningRate 0.0965 Epoch: 0 Global Step: 14490 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:20:48,496-Speed 2625.11 samples/sec Loss 20.4150 LearningRate 0.0965 Epoch: 0 Global Step: 14500 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:20:52,456-Speed 2586.36 samples/sec Loss 20.7459 LearningRate 0.0965 Epoch: 0 Global Step: 14510 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:20:56,392-Speed 2603.00 samples/sec Loss 20.3151 LearningRate 0.0965 Epoch: 0 Global Step: 14520 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:21:00,302-Speed 2619.95 samples/sec Loss 20.5424 LearningRate 0.0965 Epoch: 0 Global Step: 14530 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:21:04,239-Speed 2601.15 samples/sec Loss 20.6996 LearningRate 0.0965 Epoch: 0 Global Step: 14540 Fp16 Grad Scale: 524288 Required: 91 hours
Training: 2022-04-12 21:21:08,104-Speed 2650.27 samples/sec Loss 20.4683 LearningRate 0.0965 Epoch: 0 Global Step: 14550 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:21:12,021-Speed 2614.93 samples/sec Loss 20.4152 LearningRate 0.0965 Epoch: 0 Global Step: 14560 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:21:15,921-Speed 2626.15 samples/sec Loss 20.3713 LearningRate 0.0965 Epoch: 0 Global Step: 14570 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:21:19,860-Speed 2600.33 samples/sec Loss 20.3828 LearningRate 0.0965 Epoch: 0 Global Step: 14580 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:21:23,971-Speed 2491.81 samples/sec Loss 20.4944 LearningRate 0.0965 Epoch: 0 Global Step: 14590 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:21:28,047-Speed 2512.82 samples/sec Loss 20.3514 LearningRate 0.0965 Epoch: 0 Global Step: 14600 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:21:31,948-Speed 2625.59 samples/sec Loss 20.5247 LearningRate 0.0965 Epoch: 0 Global Step: 14610 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:21:35,847-Speed 2627.72 samples/sec Loss 20.5284 LearningRate 0.0965 Epoch: 0 Global Step: 14620 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:21:39,750-Speed 2623.74 samples/sec Loss 20.4351 LearningRate 0.0965 Epoch: 0 Global Step: 14630 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:21:43,665-Speed 2615.84 samples/sec Loss 20.2618 LearningRate 0.0965 Epoch: 0 Global Step: 14640 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:21:47,541-Speed 2642.62 samples/sec Loss 20.2751 LearningRate 0.0965 Epoch: 0 Global Step: 14650 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:21:51,442-Speed 2626.01 samples/sec Loss 20.4466 LearningRate 0.0965 Epoch: 0 Global Step: 14660 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:21:55,337-Speed 2630.09 samples/sec Loss 20.4509 LearningRate 0.0965 Epoch: 0 Global Step: 14670 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:21:59,238-Speed 2625.09 samples/sec Loss 20.3595 LearningRate 0.0965 Epoch: 0 Global Step: 14680 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:22:03,139-Speed 2625.64 samples/sec Loss 20.2840 LearningRate 0.0965 Epoch: 0 Global Step: 14690 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:22:07,044-Speed 2622.92 samples/sec Loss 20.3608 LearningRate 0.0965 Epoch: 0 Global Step: 14700 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:22:10,947-Speed 2623.92 samples/sec Loss 20.2411 LearningRate 0.0965 Epoch: 0 Global Step: 14710 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:22:14,836-Speed 2633.46 samples/sec Loss 20.2789 LearningRate 0.0965 Epoch: 0 Global Step: 14720 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:22:18,731-Speed 2629.84 samples/sec Loss 20.5326 LearningRate 0.0965 Epoch: 0 Global Step: 14730 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:22:22,636-Speed 2623.19 samples/sec Loss 20.3216 LearningRate 0.0965 Epoch: 0 Global Step: 14740 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:22:26,676-Speed 2535.21 samples/sec Loss 20.3277 LearningRate 0.0965 Epoch: 0 Global Step: 14750 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:22:30,585-Speed 2620.68 samples/sec Loss 20.3620 LearningRate 0.0965 Epoch: 0 Global Step: 14760 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:22:34,494-Speed 2620.01 samples/sec Loss 20.3141 LearningRate 0.0965 Epoch: 0 Global Step: 14770 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:22:38,409-Speed 2616.51 samples/sec Loss 20.2870 LearningRate 0.0965 Epoch: 0 Global Step: 14780 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:22:42,327-Speed 2613.91 samples/sec Loss 20.3425 LearningRate 0.0965 Epoch: 0 Global Step: 14790 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:22:46,248-Speed 2612.50 samples/sec Loss 20.2899 LearningRate 0.0965 Epoch: 0 Global Step: 14800 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:22:50,175-Speed 2608.11 samples/sec Loss 20.2103 LearningRate 0.0965 Epoch: 0 Global Step: 14810 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:22:54,072-Speed 2628.61 samples/sec Loss 20.1464 LearningRate 0.0965 Epoch: 0 Global Step: 14820 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:22:57,969-Speed 2628.09 samples/sec Loss 20.1599 LearningRate 0.0965 Epoch: 0 Global Step: 14830 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:23:01,866-Speed 2628.54 samples/sec Loss 20.3299 LearningRate 0.0965 Epoch: 0 Global Step: 14840 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:23:05,763-Speed 2628.07 samples/sec Loss 20.1703 LearningRate 0.0965 Epoch: 0 Global Step: 14850 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:23:09,659-Speed 2628.77 samples/sec Loss 20.3464 LearningRate 0.0964 Epoch: 0 Global Step: 14860 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:23:13,679-Speed 2547.70 samples/sec Loss 20.2353 LearningRate 0.0964 Epoch: 0 Global Step: 14870 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:23:17,725-Speed 2532.15 samples/sec Loss 20.2014 LearningRate 0.0964 Epoch: 0 Global Step: 14880 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:23:21,628-Speed 2624.32 samples/sec Loss 20.2372 LearningRate 0.0964 Epoch: 0 Global Step: 14890 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:23:25,523-Speed 2629.85 samples/sec Loss 20.1777 LearningRate 0.0964 Epoch: 0 Global Step: 14900 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:23:29,423-Speed 2626.48 samples/sec Loss 20.2687 LearningRate 0.0964 Epoch: 0 Global Step: 14910 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:23:33,310-Speed 2634.97 samples/sec Loss 20.2008 LearningRate 0.0964 Epoch: 0 Global Step: 14920 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:23:37,214-Speed 2623.72 samples/sec Loss 20.0360 LearningRate 0.0964 Epoch: 0 Global Step: 14930 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:23:41,120-Speed 2622.16 samples/sec Loss 20.2671 LearningRate 0.0964 Epoch: 0 Global Step: 14940 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:23:45,024-Speed 2624.18 samples/sec Loss 20.0740 LearningRate 0.0964 Epoch: 0 Global Step: 14950 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:23:48,913-Speed 2633.68 samples/sec Loss 20.0570 LearningRate 0.0964 Epoch: 0 Global Step: 14960 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:23:52,846-Speed 2604.66 samples/sec Loss 20.2165 LearningRate 0.0964 Epoch: 0 Global Step: 14970 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:23:56,744-Speed 2627.41 samples/sec Loss 19.9507 LearningRate 0.0964 Epoch: 0 Global Step: 14980 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:24:00,642-Speed 2628.00 samples/sec Loss 20.2019 LearningRate 0.0964 Epoch: 0 Global Step: 14990 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:24:04,576-Speed 2603.87 samples/sec Loss 20.0819 LearningRate 0.0964 Epoch: 0 Global Step: 15000 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:24:08,497-Speed 2611.75 samples/sec Loss 20.0790 LearningRate 0.0964 Epoch: 0 Global Step: 15010 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:24:12,397-Speed 2626.44 samples/sec Loss 20.1781 LearningRate 0.0964 Epoch: 0 Global Step: 15020 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:24:16,294-Speed 2628.44 samples/sec Loss 20.1237 LearningRate 0.0964 Epoch: 0 Global Step: 15030 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:24:20,206-Speed 2617.94 samples/sec Loss 19.9617 LearningRate 0.0964 Epoch: 0 Global Step: 15040 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:24:24,105-Speed 2627.20 samples/sec Loss 20.0262 LearningRate 0.0964 Epoch: 0 Global Step: 15050 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:24:28,010-Speed 2623.27 samples/sec Loss 19.8639 LearningRate 0.0964 Epoch: 0 Global Step: 15060 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:24:32,114-Speed 2495.46 samples/sec Loss 20.0847 LearningRate 0.0964 Epoch: 0 Global Step: 15070 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:24:36,226-Speed 2491.19 samples/sec Loss 20.0900 LearningRate 0.0964 Epoch: 0 Global Step: 15080 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:24:40,257-Speed 2541.26 samples/sec Loss 20.1431 LearningRate 0.0964 Epoch: 0 Global Step: 15090 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:24:44,159-Speed 2625.11 samples/sec Loss 19.9474 LearningRate 0.0964 Epoch: 0 Global Step: 15100 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:24:48,067-Speed 2620.83 samples/sec Loss 20.0551 LearningRate 0.0964 Epoch: 0 Global Step: 15110 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:24:51,970-Speed 2624.34 samples/sec Loss 20.0255 LearningRate 0.0964 Epoch: 0 Global Step: 15120 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:24:55,875-Speed 2623.18 samples/sec Loss 20.0312 LearningRate 0.0964 Epoch: 0 Global Step: 15130 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:24:59,781-Speed 2622.01 samples/sec Loss 20.1780 LearningRate 0.0964 Epoch: 0 Global Step: 15140 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:25:03,826-Speed 2531.94 samples/sec Loss 19.8765 LearningRate 0.0964 Epoch: 0 Global Step: 15150 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:25:07,876-Speed 2529.30 samples/sec Loss 19.9500 LearningRate 0.0964 Epoch: 0 Global Step: 15160 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:25:11,809-Speed 2603.96 samples/sec Loss 20.0620 LearningRate 0.0964 Epoch: 0 Global Step: 15170 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:25:15,704-Speed 2630.01 samples/sec Loss 19.8389 LearningRate 0.0964 Epoch: 0 Global Step: 15180 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:25:19,605-Speed 2625.19 samples/sec Loss 19.9978 LearningRate 0.0964 Epoch: 0 Global Step: 15190 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:25:23,539-Speed 2604.41 samples/sec Loss 19.9619 LearningRate 0.0964 Epoch: 0 Global Step: 15200 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:25:27,435-Speed 2629.23 samples/sec Loss 19.9275 LearningRate 0.0964 Epoch: 0 Global Step: 15210 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:25:31,351-Speed 2615.60 samples/sec Loss 19.9222 LearningRate 0.0964 Epoch: 0 Global Step: 15220 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:25:35,429-Speed 2515.76 samples/sec Loss 19.8817 LearningRate 0.0964 Epoch: 0 Global Step: 15230 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:25:39,412-Speed 2571.15 samples/sec Loss 19.8325 LearningRate 0.0964 Epoch: 0 Global Step: 15240 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:25:43,312-Speed 2626.31 samples/sec Loss 19.8355 LearningRate 0.0964 Epoch: 0 Global Step: 15250 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:25:47,218-Speed 2621.92 samples/sec Loss 19.8318 LearningRate 0.0964 Epoch: 0 Global Step: 15260 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:25:51,116-Speed 2628.25 samples/sec Loss 19.8232 LearningRate 0.0964 Epoch: 0 Global Step: 15270 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:25:55,019-Speed 2624.53 samples/sec Loss 19.7669 LearningRate 0.0964 Epoch: 0 Global Step: 15280 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:25:58,906-Speed 2635.65 samples/sec Loss 19.8215 LearningRate 0.0963 Epoch: 0 Global Step: 15290 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:26:02,818-Speed 2617.61 samples/sec Loss 19.8195 LearningRate 0.0963 Epoch: 0 Global Step: 15300 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:26:06,722-Speed 2624.19 samples/sec Loss 20.0283 LearningRate 0.0963 Epoch: 0 Global Step: 15310 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:26:10,631-Speed 2620.25 samples/sec Loss 19.8638 LearningRate 0.0963 Epoch: 0 Global Step: 15320 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:26:14,531-Speed 2626.09 samples/sec Loss 19.8123 LearningRate 0.0963 Epoch: 0 Global Step: 15330 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:26:18,449-Speed 2614.22 samples/sec Loss 19.8732 LearningRate 0.0963 Epoch: 0 Global Step: 15340 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:26:22,352-Speed 2624.56 samples/sec Loss 19.9749 LearningRate 0.0963 Epoch: 0 Global Step: 15350 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:26:26,259-Speed 2621.64 samples/sec Loss 19.6293 LearningRate 0.0963 Epoch: 0 Global Step: 15360 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:26:30,184-Speed 2609.17 samples/sec Loss 19.7119 LearningRate 0.0963 Epoch: 0 Global Step: 15370 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:26:34,099-Speed 2616.59 samples/sec Loss 19.5980 LearningRate 0.0963 Epoch: 0 Global Step: 15380 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:26:38,010-Speed 2619.08 samples/sec Loss 19.8416 LearningRate 0.0963 Epoch: 0 Global Step: 15390 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:26:41,913-Speed 2623.92 samples/sec Loss 19.9768 LearningRate 0.0963 Epoch: 0 Global Step: 15400 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:26:45,797-Speed 2637.59 samples/sec Loss 19.8493 LearningRate 0.0963 Epoch: 0 Global Step: 15410 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:26:49,701-Speed 2623.40 samples/sec Loss 19.8868 LearningRate 0.0963 Epoch: 0 Global Step: 15420 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:26:53,602-Speed 2625.66 samples/sec Loss 19.9059 LearningRate 0.0963 Epoch: 0 Global Step: 15430 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:26:57,500-Speed 2627.99 samples/sec Loss 19.8015 LearningRate 0.0963 Epoch: 0 Global Step: 15440 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:27:01,402-Speed 2624.59 samples/sec Loss 19.7324 LearningRate 0.0963 Epoch: 0 Global Step: 15450 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:27:05,303-Speed 2625.09 samples/sec Loss 19.7264 LearningRate 0.0963 Epoch: 0 Global Step: 15460 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:27:09,203-Speed 2626.54 samples/sec Loss 19.7913 LearningRate 0.0963 Epoch: 0 Global Step: 15470 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:27:13,106-Speed 2624.42 samples/sec Loss 19.7186 LearningRate 0.0963 Epoch: 0 Global Step: 15480 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:27:17,014-Speed 2621.46 samples/sec Loss 19.6280 LearningRate 0.0963 Epoch: 0 Global Step: 15490 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:27:20,916-Speed 2624.41 samples/sec Loss 19.8520 LearningRate 0.0963 Epoch: 0 Global Step: 15500 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:27:24,847-Speed 2605.69 samples/sec Loss 19.7613 LearningRate 0.0963 Epoch: 0 Global Step: 15510 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:27:28,751-Speed 2623.64 samples/sec Loss 19.6829 LearningRate 0.0963 Epoch: 0 Global Step: 15520 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:27:32,655-Speed 2623.67 samples/sec Loss 19.8028 LearningRate 0.0963 Epoch: 0 Global Step: 15530 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:27:36,616-Speed 2586.10 samples/sec Loss 19.6120 LearningRate 0.0963 Epoch: 0 Global Step: 15540 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:27:40,550-Speed 2603.45 samples/sec Loss 19.6141 LearningRate 0.0963 Epoch: 0 Global Step: 15550 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:27:44,459-Speed 2620.43 samples/sec Loss 19.6628 LearningRate 0.0963 Epoch: 0 Global Step: 15560 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:27:48,368-Speed 2620.20 samples/sec Loss 19.8381 LearningRate 0.0963 Epoch: 0 Global Step: 15570 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:27:52,280-Speed 2618.38 samples/sec Loss 19.7307 LearningRate 0.0963 Epoch: 0 Global Step: 15580 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:27:56,198-Speed 2614.13 samples/sec Loss 19.4143 LearningRate 0.0963 Epoch: 0 Global Step: 15590 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 21:28:00,081-Speed 2637.92 samples/sec Loss 19.7376 LearningRate 0.0963 Epoch: 0 Global Step: 15600 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:28:03,995-Speed 2616.84 samples/sec Loss 19.6846 LearningRate 0.0963 Epoch: 0 Global Step: 15610 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:28:07,912-Speed 2614.75 samples/sec Loss 19.6380 LearningRate 0.0963 Epoch: 0 Global Step: 15620 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:28:11,823-Speed 2619.39 samples/sec Loss 19.5017 LearningRate 0.0963 Epoch: 0 Global Step: 15630 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:28:15,738-Speed 2615.60 samples/sec Loss 19.7234 LearningRate 0.0963 Epoch: 0 Global Step: 15640 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:28:19,706-Speed 2581.59 samples/sec Loss 19.5795 LearningRate 0.0963 Epoch: 0 Global Step: 15650 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:28:23,608-Speed 2624.84 samples/sec Loss 19.6088 LearningRate 0.0963 Epoch: 0 Global Step: 15660 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:28:27,643-Speed 2538.50 samples/sec Loss 19.5246 LearningRate 0.0963 Epoch: 0 Global Step: 15670 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:28:31,548-Speed 2622.92 samples/sec Loss 19.5502 LearningRate 0.0963 Epoch: 0 Global Step: 15680 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:28:35,464-Speed 2615.55 samples/sec Loss 19.3298 LearningRate 0.0963 Epoch: 0 Global Step: 15690 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:28:39,377-Speed 2617.61 samples/sec Loss 19.3995 LearningRate 0.0963 Epoch: 0 Global Step: 15700 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:28:43,301-Speed 2609.89 samples/sec Loss 19.7105 LearningRate 0.0962 Epoch: 0 Global Step: 15710 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:28:47,216-Speed 2618.04 samples/sec Loss 19.8256 LearningRate 0.0962 Epoch: 0 Global Step: 15720 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:28:51,122-Speed 2622.49 samples/sec Loss 19.8013 LearningRate 0.0962 Epoch: 0 Global Step: 15730 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:28:55,019-Speed 2628.16 samples/sec Loss 19.6269 LearningRate 0.0962 Epoch: 0 Global Step: 15740 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:28:58,916-Speed 2628.66 samples/sec Loss 19.5726 LearningRate 0.0962 Epoch: 0 Global Step: 15750 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:29:02,821-Speed 2622.90 samples/sec Loss 19.4912 LearningRate 0.0962 Epoch: 0 Global Step: 15760 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:29:06,720-Speed 2626.40 samples/sec Loss 19.5454 LearningRate 0.0962 Epoch: 0 Global Step: 15770 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:29:10,618-Speed 2628.19 samples/sec Loss 19.4650 LearningRate 0.0962 Epoch: 0 Global Step: 15780 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:29:14,524-Speed 2622.00 samples/sec Loss 19.5618 LearningRate 0.0962 Epoch: 0 Global Step: 15790 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:29:18,442-Speed 2614.49 samples/sec Loss 19.4869 LearningRate 0.0962 Epoch: 0 Global Step: 15800 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:29:22,338-Speed 2628.46 samples/sec Loss 19.5134 LearningRate 0.0962 Epoch: 0 Global Step: 15810 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:29:26,246-Speed 2621.38 samples/sec Loss 19.3377 LearningRate 0.0962 Epoch: 0 Global Step: 15820 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:29:30,255-Speed 2554.37 samples/sec Loss 19.5641 LearningRate 0.0962 Epoch: 0 Global Step: 15830 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:29:34,161-Speed 2622.39 samples/sec Loss 19.5162 LearningRate 0.0962 Epoch: 0 Global Step: 15840 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:29:38,062-Speed 2625.20 samples/sec Loss 19.3122 LearningRate 0.0962 Epoch: 0 Global Step: 15850 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:29:42,058-Speed 2563.68 samples/sec Loss 19.5205 LearningRate 0.0962 Epoch: 0 Global Step: 15860 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:29:46,094-Speed 2537.50 samples/sec Loss 19.4667 LearningRate 0.0962 Epoch: 0 Global Step: 15870 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:29:50,129-Speed 2538.84 samples/sec Loss 19.4186 LearningRate 0.0962 Epoch: 0 Global Step: 15880 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:29:54,031-Speed 2624.49 samples/sec Loss 19.5073 LearningRate 0.0962 Epoch: 0 Global Step: 15890 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:29:57,992-Speed 2586.81 samples/sec Loss 19.3129 LearningRate 0.0962 Epoch: 0 Global Step: 15900 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:30:02,058-Speed 2518.70 samples/sec Loss 19.4443 LearningRate 0.0962 Epoch: 0 Global Step: 15910 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:30:05,978-Speed 2613.18 samples/sec Loss 19.4529 LearningRate 0.0962 Epoch: 0 Global Step: 15920 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:30:09,877-Speed 2626.37 samples/sec Loss 19.4327 LearningRate 0.0962 Epoch: 0 Global Step: 15930 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:30:13,827-Speed 2593.45 samples/sec Loss 19.6399 LearningRate 0.0962 Epoch: 0 Global Step: 15940 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:30:17,816-Speed 2568.25 samples/sec Loss 19.3149 LearningRate 0.0962 Epoch: 0 Global Step: 15950 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:30:21,722-Speed 2622.01 samples/sec Loss 19.3308 LearningRate 0.0962 Epoch: 0 Global Step: 15960 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:30:25,632-Speed 2619.29 samples/sec Loss 19.4301 LearningRate 0.0962 Epoch: 0 Global Step: 15970 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:30:29,546-Speed 2617.28 samples/sec Loss 19.2976 LearningRate 0.0962 Epoch: 0 Global Step: 15980 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:30:33,455-Speed 2620.86 samples/sec Loss 19.3772 LearningRate 0.0962 Epoch: 0 Global Step: 15990 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:30:37,368-Speed 2616.92 samples/sec Loss 19.3519 LearningRate 0.0962 Epoch: 0 Global Step: 16000 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:30:41,275-Speed 2621.23 samples/sec Loss 19.4913 LearningRate 0.0962 Epoch: 0 Global Step: 16010 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:30:45,187-Speed 2618.16 samples/sec Loss 19.4049 LearningRate 0.0962 Epoch: 0 Global Step: 16020 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:30:49,211-Speed 2545.81 samples/sec Loss 19.5115 LearningRate 0.0962 Epoch: 0 Global Step: 16030 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:30:53,309-Speed 2499.64 samples/sec Loss 19.4723 LearningRate 0.0962 Epoch: 0 Global Step: 16040 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:30:57,390-Speed 2510.02 samples/sec Loss 19.3057 LearningRate 0.0962 Epoch: 0 Global Step: 16050 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:31:01,287-Speed 2628.43 samples/sec Loss 19.2148 LearningRate 0.0962 Epoch: 0 Global Step: 16060 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:31:05,208-Speed 2612.27 samples/sec Loss 19.4634 LearningRate 0.0962 Epoch: 0 Global Step: 16070 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:31:09,131-Speed 2610.96 samples/sec Loss 19.4095 LearningRate 0.0962 Epoch: 0 Global Step: 16080 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:31:13,044-Speed 2617.73 samples/sec Loss 19.4027 LearningRate 0.0962 Epoch: 0 Global Step: 16090 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:31:16,959-Speed 2616.31 samples/sec Loss 19.2628 LearningRate 0.0962 Epoch: 0 Global Step: 16100 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:31:20,862-Speed 2624.19 samples/sec Loss 19.2431 LearningRate 0.0962 Epoch: 0 Global Step: 16110 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:31:24,762-Speed 2626.03 samples/sec Loss 19.2754 LearningRate 0.0962 Epoch: 0 Global Step: 16120 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:31:28,662-Speed 2626.99 samples/sec Loss 19.2482 LearningRate 0.0961 Epoch: 0 Global Step: 16130 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:31:32,557-Speed 2629.66 samples/sec Loss 19.3709 LearningRate 0.0961 Epoch: 0 Global Step: 16140 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:31:36,457-Speed 2626.32 samples/sec Loss 19.3035 LearningRate 0.0961 Epoch: 0 Global Step: 16150 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:31:40,368-Speed 2618.69 samples/sec Loss 19.3423 LearningRate 0.0961 Epoch: 0 Global Step: 16160 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:31:44,272-Speed 2623.22 samples/sec Loss 19.1420 LearningRate 0.0961 Epoch: 0 Global Step: 16170 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:31:48,254-Speed 2572.32 samples/sec Loss 19.2104 LearningRate 0.0961 Epoch: 0 Global Step: 16180 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:31:52,155-Speed 2626.31 samples/sec Loss 19.4840 LearningRate 0.0961 Epoch: 0 Global Step: 16190 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:31:56,053-Speed 2627.84 samples/sec Loss 19.4578 LearningRate 0.0961 Epoch: 0 Global Step: 16200 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:31:59,956-Speed 2624.16 samples/sec Loss 19.3557 LearningRate 0.0961 Epoch: 0 Global Step: 16210 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:32:03,854-Speed 2627.85 samples/sec Loss 19.2880 LearningRate 0.0961 Epoch: 0 Global Step: 16220 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:32:07,749-Speed 2629.44 samples/sec Loss 19.3535 LearningRate 0.0961 Epoch: 0 Global Step: 16230 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:32:11,650-Speed 2625.58 samples/sec Loss 19.1594 LearningRate 0.0961 Epoch: 0 Global Step: 16240 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:32:15,550-Speed 2626.13 samples/sec Loss 19.2143 LearningRate 0.0961 Epoch: 0 Global Step: 16250 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:32:19,458-Speed 2620.97 samples/sec Loss 19.3322 LearningRate 0.0961 Epoch: 0 Global Step: 16260 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:32:23,356-Speed 2627.85 samples/sec Loss 19.2327 LearningRate 0.0961 Epoch: 0 Global Step: 16270 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:32:27,258-Speed 2625.15 samples/sec Loss 19.4268 LearningRate 0.0961 Epoch: 0 Global Step: 16280 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:32:31,161-Speed 2623.75 samples/sec Loss 19.2140 LearningRate 0.0961 Epoch: 0 Global Step: 16290 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:32:35,060-Speed 2627.29 samples/sec Loss 19.1399 LearningRate 0.0961 Epoch: 0 Global Step: 16300 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:32:38,990-Speed 2606.70 samples/sec Loss 19.2792 LearningRate 0.0961 Epoch: 0 Global Step: 16310 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:32:42,888-Speed 2627.41 samples/sec Loss 19.2350 LearningRate 0.0961 Epoch: 0 Global Step: 16320 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:32:46,810-Speed 2611.75 samples/sec Loss 19.1820 LearningRate 0.0961 Epoch: 0 Global Step: 16330 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:32:50,723-Speed 2617.77 samples/sec Loss 19.1477 LearningRate 0.0961 Epoch: 0 Global Step: 16340 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:32:54,621-Speed 2627.40 samples/sec Loss 19.0761 LearningRate 0.0961 Epoch: 0 Global Step: 16350 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:32:58,503-Speed 2638.06 samples/sec Loss 19.1807 LearningRate 0.0961 Epoch: 0 Global Step: 16360 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:33:02,418-Speed 2616.28 samples/sec Loss 19.1691 LearningRate 0.0961 Epoch: 0 Global Step: 16370 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:33:06,328-Speed 2620.27 samples/sec Loss 19.0885 LearningRate 0.0961 Epoch: 0 Global Step: 16380 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:33:10,225-Speed 2628.04 samples/sec Loss 19.0631 LearningRate 0.0961 Epoch: 0 Global Step: 16390 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:33:14,132-Speed 2621.97 samples/sec Loss 19.1683 LearningRate 0.0961 Epoch: 0 Global Step: 16400 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:33:18,037-Speed 2623.37 samples/sec Loss 19.2033 LearningRate 0.0961 Epoch: 0 Global Step: 16410 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:33:21,950-Speed 2617.32 samples/sec Loss 19.1899 LearningRate 0.0961 Epoch: 0 Global Step: 16420 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:33:25,850-Speed 2626.84 samples/sec Loss 19.1508 LearningRate 0.0961 Epoch: 0 Global Step: 16430 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:33:29,753-Speed 2623.98 samples/sec Loss 18.9315 LearningRate 0.0961 Epoch: 0 Global Step: 16440 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:33:33,660-Speed 2621.45 samples/sec Loss 19.1554 LearningRate 0.0961 Epoch: 0 Global Step: 16450 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:33:37,571-Speed 2618.63 samples/sec Loss 19.0474 LearningRate 0.0961 Epoch: 0 Global Step: 16460 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:33:41,474-Speed 2624.75 samples/sec Loss 19.1450 LearningRate 0.0961 Epoch: 0 Global Step: 16470 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:33:45,386-Speed 2618.02 samples/sec Loss 19.0001 LearningRate 0.0961 Epoch: 0 Global Step: 16480 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:33:49,282-Speed 2632.65 samples/sec Loss 19.1274 LearningRate 0.0961 Epoch: 0 Global Step: 16490 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:33:53,173-Speed 2632.32 samples/sec Loss 19.0434 LearningRate 0.0961 Epoch: 0 Global Step: 16500 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:33:57,075-Speed 2625.07 samples/sec Loss 19.0653 LearningRate 0.0961 Epoch: 0 Global Step: 16510 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:34:00,971-Speed 2629.12 samples/sec Loss 18.8936 LearningRate 0.0961 Epoch: 0 Global Step: 16520 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:34:04,944-Speed 2577.77 samples/sec Loss 19.0752 LearningRate 0.0961 Epoch: 0 Global Step: 16530 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:34:08,976-Speed 2539.86 samples/sec Loss 18.9252 LearningRate 0.0961 Epoch: 0 Global Step: 16540 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:34:12,970-Speed 2565.12 samples/sec Loss 19.3036 LearningRate 0.0960 Epoch: 0 Global Step: 16550 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:34:16,883-Speed 2617.36 samples/sec Loss 18.9670 LearningRate 0.0960 Epoch: 0 Global Step: 16560 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:34:20,795-Speed 2618.62 samples/sec Loss 19.1126 LearningRate 0.0960 Epoch: 0 Global Step: 16570 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:34:24,700-Speed 2622.63 samples/sec Loss 18.8779 LearningRate 0.0960 Epoch: 0 Global Step: 16580 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:34:28,599-Speed 2627.26 samples/sec Loss 18.9406 LearningRate 0.0960 Epoch: 0 Global Step: 16590 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:34:32,497-Speed 2627.48 samples/sec Loss 19.0408 LearningRate 0.0960 Epoch: 0 Global Step: 16600 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:34:36,408-Speed 2619.08 samples/sec Loss 19.0475 LearningRate 0.0960 Epoch: 0 Global Step: 16610 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:34:40,338-Speed 2606.12 samples/sec Loss 19.0731 LearningRate 0.0960 Epoch: 0 Global Step: 16620 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:34:44,236-Speed 2627.67 samples/sec Loss 18.8711 LearningRate 0.0960 Epoch: 0 Global Step: 16630 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:34:48,140-Speed 2623.90 samples/sec Loss 18.9269 LearningRate 0.0960 Epoch: 0 Global Step: 16640 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:34:52,041-Speed 2625.67 samples/sec Loss 18.8855 LearningRate 0.0960 Epoch: 0 Global Step: 16650 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:34:55,978-Speed 2601.64 samples/sec Loss 18.8269 LearningRate 0.0960 Epoch: 0 Global Step: 16660 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:34:59,882-Speed 2623.51 samples/sec Loss 18.9991 LearningRate 0.0960 Epoch: 0 Global Step: 16670 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:35:03,795-Speed 2617.78 samples/sec Loss 18.9296 LearningRate 0.0960 Epoch: 0 Global Step: 16680 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:35:07,696-Speed 2625.22 samples/sec Loss 19.0012 LearningRate 0.0960 Epoch: 0 Global Step: 16690 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:35:11,593-Speed 2628.29 samples/sec Loss 19.0163 LearningRate 0.0960 Epoch: 0 Global Step: 16700 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:35:15,493-Speed 2626.50 samples/sec Loss 18.9328 LearningRate 0.0960 Epoch: 0 Global Step: 16710 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:35:19,408-Speed 2616.36 samples/sec Loss 18.8871 LearningRate 0.0960 Epoch: 0 Global Step: 16720 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:35:23,348-Speed 2599.57 samples/sec Loss 18.9915 LearningRate 0.0960 Epoch: 0 Global Step: 16730 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:35:27,264-Speed 2615.75 samples/sec Loss 18.8782 LearningRate 0.0960 Epoch: 0 Global Step: 16740 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:35:31,169-Speed 2623.09 samples/sec Loss 18.8398 LearningRate 0.0960 Epoch: 0 Global Step: 16750 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:35:35,066-Speed 2628.27 samples/sec Loss 18.8749 LearningRate 0.0960 Epoch: 0 Global Step: 16760 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:35:38,974-Speed 2620.58 samples/sec Loss 18.8228 LearningRate 0.0960 Epoch: 0 Global Step: 16770 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:35:42,885-Speed 2619.51 samples/sec Loss 18.7446 LearningRate 0.0960 Epoch: 0 Global Step: 16780 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:35:46,796-Speed 2618.18 samples/sec Loss 18.9940 LearningRate 0.0960 Epoch: 0 Global Step: 16790 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:35:50,810-Speed 2551.89 samples/sec Loss 18.6806 LearningRate 0.0960 Epoch: 0 Global Step: 16800 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:35:54,899-Speed 2505.24 samples/sec Loss 18.9573 LearningRate 0.0960 Epoch: 0 Global Step: 16810 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 21:35:58,992-Speed 2502.66 samples/sec Loss 18.9461 LearningRate 0.0960 Epoch: 0 Global Step: 16820 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 21:36:02,948-Speed 2588.99 samples/sec Loss 19.0081 LearningRate 0.0960 Epoch: 0 Global Step: 16830 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:36:06,846-Speed 2627.72 samples/sec Loss 18.9221 LearningRate 0.0960 Epoch: 0 Global Step: 16840 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:36:10,769-Speed 2610.94 samples/sec Loss 18.8652 LearningRate 0.0960 Epoch: 0 Global Step: 16850 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:36:14,674-Speed 2623.04 samples/sec Loss 18.9036 LearningRate 0.0960 Epoch: 0 Global Step: 16860 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:36:18,601-Speed 2608.72 samples/sec Loss 18.9129 LearningRate 0.0960 Epoch: 0 Global Step: 16870 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:36:22,498-Speed 2628.05 samples/sec Loss 18.8909 LearningRate 0.0960 Epoch: 0 Global Step: 16880 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:36:26,420-Speed 2611.75 samples/sec Loss 18.7915 LearningRate 0.0960 Epoch: 0 Global Step: 16890 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:36:30,323-Speed 2624.79 samples/sec Loss 18.9109 LearningRate 0.0960 Epoch: 0 Global Step: 16900 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:36:34,222-Speed 2626.87 samples/sec Loss 18.8309 LearningRate 0.0960 Epoch: 0 Global Step: 16910 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:36:38,227-Speed 2557.58 samples/sec Loss 18.8111 LearningRate 0.0960 Epoch: 0 Global Step: 16920 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:36:42,154-Speed 2607.83 samples/sec Loss 18.8319 LearningRate 0.0960 Epoch: 0 Global Step: 16930 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:36:46,054-Speed 2626.83 samples/sec Loss 18.9484 LearningRate 0.0960 Epoch: 0 Global Step: 16940 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:36:49,943-Speed 2633.54 samples/sec Loss 18.8288 LearningRate 0.0960 Epoch: 0 Global Step: 16950 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:36:53,850-Speed 2621.58 samples/sec Loss 18.7720 LearningRate 0.0960 Epoch: 0 Global Step: 16960 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:36:57,774-Speed 2610.18 samples/sec Loss 18.7448 LearningRate 0.0960 Epoch: 0 Global Step: 16970 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:37:01,680-Speed 2622.60 samples/sec Loss 18.7248 LearningRate 0.0959 Epoch: 0 Global Step: 16980 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:37:05,581-Speed 2625.12 samples/sec Loss 18.8466 LearningRate 0.0959 Epoch: 0 Global Step: 16990 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:37:09,550-Speed 2580.51 samples/sec Loss 18.6689 LearningRate 0.0959 Epoch: 0 Global Step: 17000 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:37:13,456-Speed 2622.25 samples/sec Loss 18.7883 LearningRate 0.0959 Epoch: 0 Global Step: 17010 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:37:17,368-Speed 2618.40 samples/sec Loss 18.6970 LearningRate 0.0959 Epoch: 0 Global Step: 17020 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:37:21,272-Speed 2623.20 samples/sec Loss 18.6128 LearningRate 0.0959 Epoch: 0 Global Step: 17030 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:37:25,177-Speed 2623.79 samples/sec Loss 18.6981 LearningRate 0.0959 Epoch: 0 Global Step: 17040 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:37:29,078-Speed 2625.17 samples/sec Loss 18.7014 LearningRate 0.0959 Epoch: 0 Global Step: 17050 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:37:32,975-Speed 2628.66 samples/sec Loss 18.7558 LearningRate 0.0959 Epoch: 0 Global Step: 17060 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:37:37,007-Speed 2540.36 samples/sec Loss 18.5585 LearningRate 0.0959 Epoch: 0 Global Step: 17070 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:37:41,093-Speed 2506.68 samples/sec Loss 18.6075 LearningRate 0.0959 Epoch: 0 Global Step: 17080 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:37:45,111-Speed 2549.05 samples/sec Loss 18.6098 LearningRate 0.0959 Epoch: 0 Global Step: 17090 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:37:49,005-Speed 2630.43 samples/sec Loss 18.8486 LearningRate 0.0959 Epoch: 0 Global Step: 17100 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:37:52,910-Speed 2622.49 samples/sec Loss 18.6959 LearningRate 0.0959 Epoch: 0 Global Step: 17110 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:37:56,882-Speed 2579.44 samples/sec Loss 18.6890 LearningRate 0.0959 Epoch: 0 Global Step: 17120 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:38:00,787-Speed 2622.99 samples/sec Loss 18.8730 LearningRate 0.0959 Epoch: 0 Global Step: 17130 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:38:04,690-Speed 2624.20 samples/sec Loss 18.7591 LearningRate 0.0959 Epoch: 0 Global Step: 17140 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:38:08,586-Speed 2628.62 samples/sec Loss 18.8309 LearningRate 0.0959 Epoch: 0 Global Step: 17150 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:38:12,483-Speed 2628.53 samples/sec Loss 18.5447 LearningRate 0.0959 Epoch: 0 Global Step: 17160 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:38:16,377-Speed 2630.17 samples/sec Loss 18.6145 LearningRate 0.0959 Epoch: 0 Global Step: 17170 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:38:20,280-Speed 2623.93 samples/sec Loss 18.7063 LearningRate 0.0959 Epoch: 0 Global Step: 17180 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:38:24,184-Speed 2623.50 samples/sec Loss 18.5919 LearningRate 0.0959 Epoch: 0 Global Step: 17190 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:38:28,080-Speed 2629.96 samples/sec Loss 18.6246 LearningRate 0.0959 Epoch: 0 Global Step: 17200 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:38:31,958-Speed 2640.86 samples/sec Loss 18.3443 LearningRate 0.0959 Epoch: 0 Global Step: 17210 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:38:35,854-Speed 2628.88 samples/sec Loss 18.6411 LearningRate 0.0959 Epoch: 0 Global Step: 17220 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:38:39,774-Speed 2612.85 samples/sec Loss 18.7173 LearningRate 0.0959 Epoch: 0 Global Step: 17230 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:38:43,691-Speed 2615.36 samples/sec Loss 18.6190 LearningRate 0.0959 Epoch: 0 Global Step: 17240 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:38:47,625-Speed 2603.90 samples/sec Loss 18.5798 LearningRate 0.0959 Epoch: 0 Global Step: 17250 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:38:51,531-Speed 2622.72 samples/sec Loss 18.5093 LearningRate 0.0959 Epoch: 0 Global Step: 17260 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:38:55,444-Speed 2616.85 samples/sec Loss 18.6254 LearningRate 0.0959 Epoch: 0 Global Step: 17270 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:38:59,348-Speed 2623.56 samples/sec Loss 18.7235 LearningRate 0.0959 Epoch: 0 Global Step: 17280 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:39:03,255-Speed 2621.71 samples/sec Loss 18.6812 LearningRate 0.0959 Epoch: 0 Global Step: 17290 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:39:07,164-Speed 2620.70 samples/sec Loss 18.5658 LearningRate 0.0959 Epoch: 0 Global Step: 17300 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:39:11,066-Speed 2624.93 samples/sec Loss 18.5562 LearningRate 0.0959 Epoch: 0 Global Step: 17310 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:39:14,973-Speed 2621.72 samples/sec Loss 18.6058 LearningRate 0.0959 Epoch: 0 Global Step: 17320 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:39:18,908-Speed 2603.17 samples/sec Loss 18.5384 LearningRate 0.0959 Epoch: 0 Global Step: 17330 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:39:22,818-Speed 2619.81 samples/sec Loss 18.6768 LearningRate 0.0959 Epoch: 0 Global Step: 17340 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:39:26,718-Speed 2626.08 samples/sec Loss 18.5012 LearningRate 0.0959 Epoch: 0 Global Step: 17350 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:39:30,620-Speed 2625.14 samples/sec Loss 18.6699 LearningRate 0.0959 Epoch: 0 Global Step: 17360 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:39:34,491-Speed 2645.98 samples/sec Loss 18.6062 LearningRate 0.0959 Epoch: 0 Global Step: 17370 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:39:38,382-Speed 2631.92 samples/sec Loss 18.5708 LearningRate 0.0959 Epoch: 0 Global Step: 17380 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:39:42,288-Speed 2622.43 samples/sec Loss 18.6713 LearningRate 0.0959 Epoch: 0 Global Step: 17390 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:39:46,216-Speed 2607.55 samples/sec Loss 18.4591 LearningRate 0.0958 Epoch: 0 Global Step: 17400 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:39:50,126-Speed 2620.42 samples/sec Loss 18.2639 LearningRate 0.0958 Epoch: 0 Global Step: 17410 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:39:54,032-Speed 2621.56 samples/sec Loss 18.4446 LearningRate 0.0958 Epoch: 0 Global Step: 17420 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:39:57,943-Speed 2619.19 samples/sec Loss 18.3904 LearningRate 0.0958 Epoch: 0 Global Step: 17430 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:40:01,847-Speed 2623.29 samples/sec Loss 18.4460 LearningRate 0.0958 Epoch: 0 Global Step: 17440 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:40:05,762-Speed 2616.52 samples/sec Loss 18.4449 LearningRate 0.0958 Epoch: 0 Global Step: 17450 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:40:09,666-Speed 2623.24 samples/sec Loss 18.5072 LearningRate 0.0958 Epoch: 0 Global Step: 17460 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:40:13,566-Speed 2626.64 samples/sec Loss 18.5321 LearningRate 0.0958 Epoch: 0 Global Step: 17470 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:40:17,468-Speed 2625.06 samples/sec Loss 18.5781 LearningRate 0.0958 Epoch: 0 Global Step: 17480 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:40:21,396-Speed 2607.82 samples/sec Loss 18.3992 LearningRate 0.0958 Epoch: 0 Global Step: 17490 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:40:25,297-Speed 2625.03 samples/sec Loss 18.6779 LearningRate 0.0958 Epoch: 0 Global Step: 17500 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:40:29,249-Speed 2591.94 samples/sec Loss 18.5081 LearningRate 0.0958 Epoch: 0 Global Step: 17510 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:40:33,157-Speed 2621.47 samples/sec Loss 18.3493 LearningRate 0.0958 Epoch: 0 Global Step: 17520 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:40:37,064-Speed 2621.40 samples/sec Loss 18.3773 LearningRate 0.0958 Epoch: 0 Global Step: 17530 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:40:40,974-Speed 2619.63 samples/sec Loss 18.5439 LearningRate 0.0958 Epoch: 0 Global Step: 17540 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:40:44,882-Speed 2621.15 samples/sec Loss 18.2566 LearningRate 0.0958 Epoch: 0 Global Step: 17550 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:40:48,796-Speed 2617.36 samples/sec Loss 18.5245 LearningRate 0.0958 Epoch: 0 Global Step: 17560 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:40:52,707-Speed 2618.18 samples/sec Loss 18.4863 LearningRate 0.0958 Epoch: 0 Global Step: 17570 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:40:56,680-Speed 2578.99 samples/sec Loss 18.6396 LearningRate 0.0958 Epoch: 0 Global Step: 17580 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:41:00,779-Speed 2498.66 samples/sec Loss 18.4179 LearningRate 0.0958 Epoch: 0 Global Step: 17590 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:41:04,783-Speed 2557.97 samples/sec Loss 18.4413 LearningRate 0.0958 Epoch: 0 Global Step: 17600 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:41:08,685-Speed 2624.85 samples/sec Loss 18.4886 LearningRate 0.0958 Epoch: 0 Global Step: 17610 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:41:12,591-Speed 2622.69 samples/sec Loss 18.3128 LearningRate 0.0958 Epoch: 0 Global Step: 17620 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:41:16,496-Speed 2622.35 samples/sec Loss 18.3152 LearningRate 0.0958 Epoch: 0 Global Step: 17630 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:41:20,462-Speed 2582.73 samples/sec Loss 18.5744 LearningRate 0.0958 Epoch: 0 Global Step: 17640 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:41:24,361-Speed 2627.34 samples/sec Loss 18.5958 LearningRate 0.0958 Epoch: 0 Global Step: 17650 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:41:28,264-Speed 2624.66 samples/sec Loss 18.5716 LearningRate 0.0958 Epoch: 0 Global Step: 17660 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:41:32,168-Speed 2623.61 samples/sec Loss 18.5943 LearningRate 0.0958 Epoch: 0 Global Step: 17670 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:41:36,071-Speed 2624.32 samples/sec Loss 18.2299 LearningRate 0.0958 Epoch: 0 Global Step: 17680 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:41:39,979-Speed 2620.62 samples/sec Loss 18.5051 LearningRate 0.0958 Epoch: 0 Global Step: 17690 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:41:43,886-Speed 2621.81 samples/sec Loss 18.2257 LearningRate 0.0958 Epoch: 0 Global Step: 17700 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:41:47,796-Speed 2619.64 samples/sec Loss 18.4533 LearningRate 0.0958 Epoch: 0 Global Step: 17710 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:41:51,701-Speed 2622.97 samples/sec Loss 18.4208 LearningRate 0.0958 Epoch: 0 Global Step: 17720 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:41:55,601-Speed 2626.00 samples/sec Loss 18.3075 LearningRate 0.0958 Epoch: 0 Global Step: 17730 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:41:59,501-Speed 2627.20 samples/sec Loss 18.4998 LearningRate 0.0958 Epoch: 0 Global Step: 17740 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:42:03,414-Speed 2617.17 samples/sec Loss 18.4351 LearningRate 0.0958 Epoch: 0 Global Step: 17750 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:42:07,314-Speed 2626.30 samples/sec Loss 18.4997 LearningRate 0.0958 Epoch: 0 Global Step: 17760 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:42:11,228-Speed 2616.37 samples/sec Loss 18.3818 LearningRate 0.0958 Epoch: 0 Global Step: 17770 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:42:15,169-Speed 2599.73 samples/sec Loss 18.3415 LearningRate 0.0958 Epoch: 0 Global Step: 17780 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:42:19,051-Speed 2639.00 samples/sec Loss 18.3828 LearningRate 0.0958 Epoch: 0 Global Step: 17790 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:42:22,954-Speed 2624.15 samples/sec Loss 18.2549 LearningRate 0.0958 Epoch: 0 Global Step: 17800 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:42:26,850-Speed 2629.19 samples/sec Loss 18.3265 LearningRate 0.0958 Epoch: 0 Global Step: 17810 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:42:30,752-Speed 2625.02 samples/sec Loss 18.4371 LearningRate 0.0957 Epoch: 0 Global Step: 17820 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:42:34,779-Speed 2543.70 samples/sec Loss 18.3358 LearningRate 0.0957 Epoch: 0 Global Step: 17830 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:42:38,758-Speed 2574.11 samples/sec Loss 18.3665 LearningRate 0.0957 Epoch: 0 Global Step: 17840 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:42:42,668-Speed 2619.68 samples/sec Loss 18.4066 LearningRate 0.0957 Epoch: 0 Global Step: 17850 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:42:46,573-Speed 2622.98 samples/sec Loss 18.3307 LearningRate 0.0957 Epoch: 0 Global Step: 17860 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:42:50,483-Speed 2619.34 samples/sec Loss 18.3592 LearningRate 0.0957 Epoch: 0 Global Step: 17870 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:42:54,386-Speed 2624.52 samples/sec Loss 18.4958 LearningRate 0.0957 Epoch: 0 Global Step: 17880 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:42:58,284-Speed 2627.60 samples/sec Loss 18.2821 LearningRate 0.0957 Epoch: 0 Global Step: 17890 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:43:02,197-Speed 2617.83 samples/sec Loss 18.3501 LearningRate 0.0957 Epoch: 0 Global Step: 17900 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:43:06,115-Speed 2614.22 samples/sec Loss 18.3933 LearningRate 0.0957 Epoch: 0 Global Step: 17910 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:43:10,028-Speed 2618.31 samples/sec Loss 18.4611 LearningRate 0.0957 Epoch: 0 Global Step: 17920 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:43:13,925-Speed 2628.15 samples/sec Loss 18.2330 LearningRate 0.0957 Epoch: 0 Global Step: 17930 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:43:17,909-Speed 2570.66 samples/sec Loss 18.2776 LearningRate 0.0957 Epoch: 0 Global Step: 17940 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:43:21,809-Speed 2625.87 samples/sec Loss 18.4361 LearningRate 0.0957 Epoch: 0 Global Step: 17950 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:43:25,709-Speed 2625.96 samples/sec Loss 18.1383 LearningRate 0.0957 Epoch: 0 Global Step: 17960 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:43:29,614-Speed 2623.40 samples/sec Loss 18.1212 LearningRate 0.0957 Epoch: 0 Global Step: 17970 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:43:33,525-Speed 2619.10 samples/sec Loss 18.3211 LearningRate 0.0957 Epoch: 0 Global Step: 17980 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:43:37,429-Speed 2623.54 samples/sec Loss 18.0621 LearningRate 0.0957 Epoch: 0 Global Step: 17990 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:43:41,315-Speed 2636.31 samples/sec Loss 18.2717 LearningRate 0.0957 Epoch: 0 Global Step: 18000 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:43:45,220-Speed 2622.67 samples/sec Loss 18.3641 LearningRate 0.0957 Epoch: 0 Global Step: 18010 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:43:49,131-Speed 2618.23 samples/sec Loss 18.3980 LearningRate 0.0957 Epoch: 0 Global Step: 18020 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:43:53,043-Speed 2618.43 samples/sec Loss 18.2029 LearningRate 0.0957 Epoch: 0 Global Step: 18030 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:43:56,952-Speed 2620.34 samples/sec Loss 18.0442 LearningRate 0.0957 Epoch: 0 Global Step: 18040 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:44:00,863-Speed 2619.11 samples/sec Loss 18.2281 LearningRate 0.0957 Epoch: 0 Global Step: 18050 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:44:04,780-Speed 2614.85 samples/sec Loss 18.3425 LearningRate 0.0957 Epoch: 0 Global Step: 18060 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:44:08,690-Speed 2619.99 samples/sec Loss 18.3728 LearningRate 0.0957 Epoch: 0 Global Step: 18070 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:44:12,587-Speed 2628.06 samples/sec Loss 18.1922 LearningRate 0.0957 Epoch: 0 Global Step: 18080 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:44:16,488-Speed 2625.31 samples/sec Loss 18.2622 LearningRate 0.0957 Epoch: 0 Global Step: 18090 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:44:20,408-Speed 2612.75 samples/sec Loss 18.0871 LearningRate 0.0957 Epoch: 0 Global Step: 18100 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:44:24,307-Speed 2627.30 samples/sec Loss 18.1889 LearningRate 0.0957 Epoch: 0 Global Step: 18110 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:44:28,228-Speed 2611.77 samples/sec Loss 18.1746 LearningRate 0.0957 Epoch: 0 Global Step: 18120 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:44:32,133-Speed 2622.73 samples/sec Loss 18.1558 LearningRate 0.0957 Epoch: 0 Global Step: 18130 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:44:36,039-Speed 2622.34 samples/sec Loss 18.1852 LearningRate 0.0957 Epoch: 0 Global Step: 18140 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:44:39,944-Speed 2623.39 samples/sec Loss 18.2432 LearningRate 0.0957 Epoch: 0 Global Step: 18150 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:44:43,846-Speed 2625.36 samples/sec Loss 18.1496 LearningRate 0.0957 Epoch: 0 Global Step: 18160 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:44:47,768-Speed 2611.47 samples/sec Loss 18.1871 LearningRate 0.0957 Epoch: 0 Global Step: 18170 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:44:51,680-Speed 2618.25 samples/sec Loss 18.2074 LearningRate 0.0957 Epoch: 0 Global Step: 18180 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:44:55,582-Speed 2624.91 samples/sec Loss 18.1602 LearningRate 0.0957 Epoch: 0 Global Step: 18190 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:44:59,660-Speed 2511.82 samples/sec Loss 18.1146 LearningRate 0.0957 Epoch: 0 Global Step: 18200 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:45:03,780-Speed 2485.94 samples/sec Loss 18.3924 LearningRate 0.0957 Epoch: 0 Global Step: 18210 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:45:07,806-Speed 2544.04 samples/sec Loss 18.3129 LearningRate 0.0957 Epoch: 0 Global Step: 18220 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:45:11,708-Speed 2625.20 samples/sec Loss 18.3000 LearningRate 0.0957 Epoch: 0 Global Step: 18230 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:45:15,643-Speed 2603.09 samples/sec Loss 18.0136 LearningRate 0.0957 Epoch: 0 Global Step: 18240 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:45:19,550-Speed 2621.96 samples/sec Loss 18.1801 LearningRate 0.0956 Epoch: 0 Global Step: 18250 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:45:23,457-Speed 2621.54 samples/sec Loss 18.0841 LearningRate 0.0956 Epoch: 0 Global Step: 18260 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:45:27,358-Speed 2625.86 samples/sec Loss 18.1076 LearningRate 0.0956 Epoch: 0 Global Step: 18270 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:45:31,256-Speed 2627.70 samples/sec Loss 18.1984 LearningRate 0.0956 Epoch: 0 Global Step: 18280 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:45:35,171-Speed 2615.92 samples/sec Loss 18.1256 LearningRate 0.0956 Epoch: 0 Global Step: 18290 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:45:39,067-Speed 2628.68 samples/sec Loss 18.0985 LearningRate 0.0956 Epoch: 0 Global Step: 18300 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:45:42,989-Speed 2612.02 samples/sec Loss 18.0364 LearningRate 0.0956 Epoch: 0 Global Step: 18310 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:45:46,885-Speed 2628.70 samples/sec Loss 18.1798 LearningRate 0.0956 Epoch: 0 Global Step: 18320 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:45:50,785-Speed 2626.68 samples/sec Loss 18.0837 LearningRate 0.0956 Epoch: 0 Global Step: 18330 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:45:54,689-Speed 2623.63 samples/sec Loss 18.1257 LearningRate 0.0956 Epoch: 0 Global Step: 18340 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:45:58,585-Speed 2628.71 samples/sec Loss 18.0332 LearningRate 0.0956 Epoch: 0 Global Step: 18350 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:46:02,485-Speed 2626.25 samples/sec Loss 18.3143 LearningRate 0.0956 Epoch: 0 Global Step: 18360 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:46:06,385-Speed 2626.08 samples/sec Loss 18.0062 LearningRate 0.0956 Epoch: 0 Global Step: 18370 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:46:10,288-Speed 2624.24 samples/sec Loss 18.1183 LearningRate 0.0956 Epoch: 0 Global Step: 18380 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:46:14,193-Speed 2623.28 samples/sec Loss 18.1862 LearningRate 0.0956 Epoch: 0 Global Step: 18390 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:46:18,072-Speed 2640.72 samples/sec Loss 18.1182 LearningRate 0.0956 Epoch: 0 Global Step: 18400 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:46:21,969-Speed 2628.18 samples/sec Loss 18.0740 LearningRate 0.0956 Epoch: 0 Global Step: 18410 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:46:25,869-Speed 2625.96 samples/sec Loss 17.8848 LearningRate 0.0956 Epoch: 0 Global Step: 18420 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:46:29,778-Speed 2620.79 samples/sec Loss 18.1338 LearningRate 0.0956 Epoch: 0 Global Step: 18430 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:46:33,674-Speed 2628.59 samples/sec Loss 18.2178 LearningRate 0.0956 Epoch: 0 Global Step: 18440 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:46:37,609-Speed 2602.63 samples/sec Loss 18.0238 LearningRate 0.0956 Epoch: 0 Global Step: 18450 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:46:41,519-Speed 2619.21 samples/sec Loss 18.0538 LearningRate 0.0956 Epoch: 0 Global Step: 18460 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:46:45,419-Speed 2627.02 samples/sec Loss 18.0097 LearningRate 0.0956 Epoch: 0 Global Step: 18470 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:46:49,320-Speed 2625.89 samples/sec Loss 18.1874 LearningRate 0.0956 Epoch: 0 Global Step: 18480 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:46:53,384-Speed 2520.29 samples/sec Loss 18.2285 LearningRate 0.0956 Epoch: 0 Global Step: 18490 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:46:57,368-Speed 2571.00 samples/sec Loss 18.1414 LearningRate 0.0956 Epoch: 0 Global Step: 18500 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:47:01,272-Speed 2623.65 samples/sec Loss 18.1673 LearningRate 0.0956 Epoch: 0 Global Step: 18510 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:47:05,187-Speed 2616.02 samples/sec Loss 18.0343 LearningRate 0.0956 Epoch: 0 Global Step: 18520 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:47:09,088-Speed 2625.21 samples/sec Loss 18.0002 LearningRate 0.0956 Epoch: 0 Global Step: 18530 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:47:12,986-Speed 2628.00 samples/sec Loss 18.1295 LearningRate 0.0956 Epoch: 0 Global Step: 18540 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:47:16,887-Speed 2625.47 samples/sec Loss 17.8831 LearningRate 0.0956 Epoch: 0 Global Step: 18550 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:47:20,797-Speed 2620.03 samples/sec Loss 18.0493 LearningRate 0.0956 Epoch: 0 Global Step: 18560 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:47:24,682-Speed 2636.40 samples/sec Loss 17.9456 LearningRate 0.0956 Epoch: 0 Global Step: 18570 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:47:28,591-Speed 2620.00 samples/sec Loss 17.9657 LearningRate 0.0956 Epoch: 0 Global Step: 18580 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:47:32,528-Speed 2601.54 samples/sec Loss 17.9078 LearningRate 0.0956 Epoch: 0 Global Step: 18590 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:47:36,435-Speed 2621.10 samples/sec Loss 17.9802 LearningRate 0.0956 Epoch: 0 Global Step: 18600 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:47:40,352-Speed 2614.72 samples/sec Loss 18.0268 LearningRate 0.0956 Epoch: 0 Global Step: 18610 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:47:44,262-Speed 2620.22 samples/sec Loss 17.9153 LearningRate 0.0956 Epoch: 0 Global Step: 18620 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:47:48,167-Speed 2622.47 samples/sec Loss 18.0416 LearningRate 0.0956 Epoch: 0 Global Step: 18630 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:47:52,074-Speed 2621.78 samples/sec Loss 17.9440 LearningRate 0.0956 Epoch: 0 Global Step: 18640 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:47:55,976-Speed 2625.47 samples/sec Loss 17.9009 LearningRate 0.0956 Epoch: 0 Global Step: 18650 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:47:59,875-Speed 2626.94 samples/sec Loss 18.0552 LearningRate 0.0956 Epoch: 0 Global Step: 18660 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:48:03,794-Speed 2613.23 samples/sec Loss 17.9787 LearningRate 0.0955 Epoch: 0 Global Step: 18670 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:48:07,693-Speed 2626.96 samples/sec Loss 17.8204 LearningRate 0.0955 Epoch: 0 Global Step: 18680 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:48:11,717-Speed 2544.77 samples/sec Loss 18.0202 LearningRate 0.0955 Epoch: 0 Global Step: 18690 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:48:15,625-Speed 2621.53 samples/sec Loss 18.0259 LearningRate 0.0955 Epoch: 0 Global Step: 18700 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:48:19,636-Speed 2553.73 samples/sec Loss 17.9317 LearningRate 0.0955 Epoch: 0 Global Step: 18710 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:48:23,757-Speed 2485.69 samples/sec Loss 18.1433 LearningRate 0.0955 Epoch: 0 Global Step: 18720 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:48:27,970-Speed 2430.75 samples/sec Loss 17.7669 LearningRate 0.0955 Epoch: 0 Global Step: 18730 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:48:31,992-Speed 2547.13 samples/sec Loss 17.9332 LearningRate 0.0955 Epoch: 0 Global Step: 18740 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:48:35,888-Speed 2628.85 samples/sec Loss 18.0121 LearningRate 0.0955 Epoch: 0 Global Step: 18750 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:48:39,786-Speed 2627.57 samples/sec Loss 17.8698 LearningRate 0.0955 Epoch: 0 Global Step: 18760 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:48:43,687-Speed 2625.54 samples/sec Loss 17.8436 LearningRate 0.0955 Epoch: 0 Global Step: 18770 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:48:47,578-Speed 2632.33 samples/sec Loss 17.8947 LearningRate 0.0955 Epoch: 0 Global Step: 18780 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:48:51,474-Speed 2629.45 samples/sec Loss 17.6847 LearningRate 0.0955 Epoch: 0 Global Step: 18790 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:48:55,369-Speed 2629.32 samples/sec Loss 17.9630 LearningRate 0.0955 Epoch: 0 Global Step: 18800 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:48:59,270-Speed 2625.31 samples/sec Loss 17.9661 LearningRate 0.0955 Epoch: 0 Global Step: 18810 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:49:03,182-Speed 2618.59 samples/sec Loss 17.9470 LearningRate 0.0955 Epoch: 0 Global Step: 18820 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:49:07,069-Speed 2634.99 samples/sec Loss 17.8972 LearningRate 0.0955 Epoch: 0 Global Step: 18830 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:49:10,972-Speed 2624.39 samples/sec Loss 17.7020 LearningRate 0.0955 Epoch: 0 Global Step: 18840 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:49:14,869-Speed 2628.41 samples/sec Loss 17.9604 LearningRate 0.0955 Epoch: 0 Global Step: 18850 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:49:18,774-Speed 2622.25 samples/sec Loss 17.9261 LearningRate 0.0955 Epoch: 0 Global Step: 18860 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:49:22,672-Speed 2628.01 samples/sec Loss 17.8128 LearningRate 0.0955 Epoch: 0 Global Step: 18870 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:49:26,575-Speed 2624.40 samples/sec Loss 18.0030 LearningRate 0.0955 Epoch: 0 Global Step: 18880 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:49:30,479-Speed 2623.52 samples/sec Loss 18.1322 LearningRate 0.0955 Epoch: 0 Global Step: 18890 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:49:34,383-Speed 2623.78 samples/sec Loss 17.7548 LearningRate 0.0955 Epoch: 0 Global Step: 18900 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:49:38,284-Speed 2625.87 samples/sec Loss 17.7556 LearningRate 0.0955 Epoch: 0 Global Step: 18910 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:49:42,179-Speed 2630.13 samples/sec Loss 17.7762 LearningRate 0.0955 Epoch: 0 Global Step: 18920 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:49:46,081-Speed 2624.42 samples/sec Loss 17.7876 LearningRate 0.0955 Epoch: 0 Global Step: 18930 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:49:49,992-Speed 2619.21 samples/sec Loss 17.8212 LearningRate 0.0955 Epoch: 0 Global Step: 18940 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:49:53,873-Speed 2639.01 samples/sec Loss 17.7351 LearningRate 0.0955 Epoch: 0 Global Step: 18950 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:49:57,782-Speed 2620.85 samples/sec Loss 17.8961 LearningRate 0.0955 Epoch: 0 Global Step: 18960 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:50:01,679-Speed 2627.75 samples/sec Loss 17.8273 LearningRate 0.0955 Epoch: 0 Global Step: 18970 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:50:05,591-Speed 2618.62 samples/sec Loss 17.8659 LearningRate 0.0955 Epoch: 0 Global Step: 18980 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:50:09,490-Speed 2627.11 samples/sec Loss 17.7508 LearningRate 0.0955 Epoch: 0 Global Step: 18990 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:50:13,410-Speed 2613.66 samples/sec Loss 18.0121 LearningRate 0.0955 Epoch: 0 Global Step: 19000 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:50:17,450-Speed 2534.59 samples/sec Loss 17.9239 LearningRate 0.0955 Epoch: 0 Global Step: 19010 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:50:21,365-Speed 2616.62 samples/sec Loss 17.9537 LearningRate 0.0955 Epoch: 0 Global Step: 19020 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:50:25,260-Speed 2629.08 samples/sec Loss 18.0341 LearningRate 0.0955 Epoch: 0 Global Step: 19030 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:50:29,159-Speed 2628.21 samples/sec Loss 17.8589 LearningRate 0.0955 Epoch: 0 Global Step: 19040 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:50:33,100-Speed 2598.76 samples/sec Loss 17.8529 LearningRate 0.0955 Epoch: 0 Global Step: 19050 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:50:37,121-Speed 2547.27 samples/sec Loss 17.5309 LearningRate 0.0955 Epoch: 0 Global Step: 19060 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:50:41,020-Speed 2627.08 samples/sec Loss 17.6666 LearningRate 0.0955 Epoch: 0 Global Step: 19070 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:50:44,923-Speed 2624.30 samples/sec Loss 17.7971 LearningRate 0.0955 Epoch: 0 Global Step: 19080 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:50:48,827-Speed 2623.43 samples/sec Loss 17.7781 LearningRate 0.0955 Epoch: 0 Global Step: 19090 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:50:52,723-Speed 2629.48 samples/sec Loss 17.9023 LearningRate 0.0954 Epoch: 0 Global Step: 19100 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:50:56,625-Speed 2624.61 samples/sec Loss 17.6611 LearningRate 0.0954 Epoch: 0 Global Step: 19110 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:51:00,523-Speed 2627.62 samples/sec Loss 17.6982 LearningRate 0.0954 Epoch: 0 Global Step: 19120 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:51:04,424-Speed 2625.37 samples/sec Loss 17.7216 LearningRate 0.0954 Epoch: 0 Global Step: 19130 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:51:08,328-Speed 2623.87 samples/sec Loss 17.7848 LearningRate 0.0954 Epoch: 0 Global Step: 19140 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:51:12,240-Speed 2618.68 samples/sec Loss 17.8644 LearningRate 0.0954 Epoch: 0 Global Step: 19150 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:51:16,159-Speed 2613.41 samples/sec Loss 17.6474 LearningRate 0.0954 Epoch: 0 Global Step: 19160 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:51:20,102-Speed 2598.13 samples/sec Loss 17.7398 LearningRate 0.0954 Epoch: 0 Global Step: 19170 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:51:24,008-Speed 2621.96 samples/sec Loss 17.7747 LearningRate 0.0954 Epoch: 0 Global Step: 19180 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:51:27,918-Speed 2619.94 samples/sec Loss 17.8897 LearningRate 0.0954 Epoch: 0 Global Step: 19190 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:51:31,831-Speed 2617.67 samples/sec Loss 17.6622 LearningRate 0.0954 Epoch: 0 Global Step: 19200 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:51:35,734-Speed 2624.45 samples/sec Loss 17.8350 LearningRate 0.0954 Epoch: 0 Global Step: 19210 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:51:39,662-Speed 2607.28 samples/sec Loss 17.7188 LearningRate 0.0954 Epoch: 0 Global Step: 19220 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:51:43,745-Speed 2508.75 samples/sec Loss 17.7541 LearningRate 0.0954 Epoch: 0 Global Step: 19230 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:51:47,652-Speed 2621.02 samples/sec Loss 17.6151 LearningRate 0.0954 Epoch: 0 Global Step: 19240 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:51:51,561-Speed 2620.62 samples/sec Loss 17.7492 LearningRate 0.0954 Epoch: 0 Global Step: 19250 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:51:55,462-Speed 2625.28 samples/sec Loss 17.7059 LearningRate 0.0954 Epoch: 0 Global Step: 19260 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:51:59,367-Speed 2622.99 samples/sec Loss 17.7024 LearningRate 0.0954 Epoch: 0 Global Step: 19270 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:52:03,281-Speed 2617.18 samples/sec Loss 17.6729 LearningRate 0.0954 Epoch: 0 Global Step: 19280 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:52:07,195-Speed 2616.36 samples/sec Loss 17.6817 LearningRate 0.0954 Epoch: 0 Global Step: 19290 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:52:11,114-Speed 2613.77 samples/sec Loss 17.6995 LearningRate 0.0954 Epoch: 0 Global Step: 19300 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:52:15,021-Speed 2621.58 samples/sec Loss 17.8199 LearningRate 0.0954 Epoch: 0 Global Step: 19310 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:52:18,929-Speed 2620.64 samples/sec Loss 17.5816 LearningRate 0.0954 Epoch: 0 Global Step: 19320 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:52:22,839-Speed 2620.18 samples/sec Loss 17.8101 LearningRate 0.0954 Epoch: 0 Global Step: 19330 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:52:26,748-Speed 2619.99 samples/sec Loss 17.3588 LearningRate 0.0954 Epoch: 0 Global Step: 19340 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:52:30,656-Speed 2621.09 samples/sec Loss 17.7315 LearningRate 0.0954 Epoch: 0 Global Step: 19350 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:52:34,622-Speed 2582.21 samples/sec Loss 17.7278 LearningRate 0.0954 Epoch: 0 Global Step: 19360 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:52:38,734-Speed 2491.31 samples/sec Loss 17.6213 LearningRate 0.0954 Epoch: 0 Global Step: 19370 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:52:42,776-Speed 2534.08 samples/sec Loss 17.6644 LearningRate 0.0954 Epoch: 0 Global Step: 19380 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:52:46,685-Speed 2619.72 samples/sec Loss 17.6863 LearningRate 0.0954 Epoch: 0 Global Step: 19390 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:52:50,597-Speed 2618.32 samples/sec Loss 17.8517 LearningRate 0.0954 Epoch: 0 Global Step: 19400 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:52:54,505-Speed 2620.59 samples/sec Loss 17.7554 LearningRate 0.0954 Epoch: 0 Global Step: 19410 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:52:58,417-Speed 2618.48 samples/sec Loss 17.6788 LearningRate 0.0954 Epoch: 0 Global Step: 19420 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:53:02,328-Speed 2618.93 samples/sec Loss 17.5842 LearningRate 0.0954 Epoch: 0 Global Step: 19430 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:53:06,306-Speed 2574.71 samples/sec Loss 17.7680 LearningRate 0.0954 Epoch: 0 Global Step: 19440 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:53:10,261-Speed 2589.43 samples/sec Loss 17.6630 LearningRate 0.0954 Epoch: 0 Global Step: 19450 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:53:14,171-Speed 2620.29 samples/sec Loss 17.5794 LearningRate 0.0954 Epoch: 0 Global Step: 19460 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:53:18,076-Speed 2622.60 samples/sec Loss 17.6071 LearningRate 0.0954 Epoch: 0 Global Step: 19470 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:53:21,973-Speed 2628.32 samples/sec Loss 17.6385 LearningRate 0.0954 Epoch: 0 Global Step: 19480 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:53:25,874-Speed 2625.89 samples/sec Loss 17.7544 LearningRate 0.0954 Epoch: 0 Global Step: 19490 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:53:29,769-Speed 2629.87 samples/sec Loss 17.5695 LearningRate 0.0954 Epoch: 0 Global Step: 19500 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:53:33,668-Speed 2626.87 samples/sec Loss 17.6947 LearningRate 0.0954 Epoch: 0 Global Step: 19510 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:53:37,563-Speed 2629.08 samples/sec Loss 17.5368 LearningRate 0.0953 Epoch: 0 Global Step: 19520 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:53:41,460-Speed 2628.73 samples/sec Loss 17.6336 LearningRate 0.0953 Epoch: 0 Global Step: 19530 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:53:45,370-Speed 2619.69 samples/sec Loss 17.6664 LearningRate 0.0953 Epoch: 0 Global Step: 19540 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:53:49,270-Speed 2626.26 samples/sec Loss 17.5681 LearningRate 0.0953 Epoch: 0 Global Step: 19550 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:53:53,192-Speed 2612.07 samples/sec Loss 17.5985 LearningRate 0.0953 Epoch: 0 Global Step: 19560 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 21:53:57,078-Speed 2635.35 samples/sec Loss 17.6396 LearningRate 0.0953 Epoch: 0 Global Step: 19570 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:54:00,981-Speed 2625.07 samples/sec Loss 17.5347 LearningRate 0.0953 Epoch: 0 Global Step: 19580 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:54:04,888-Speed 2621.52 samples/sec Loss 17.4657 LearningRate 0.0953 Epoch: 0 Global Step: 19590 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:54:08,790-Speed 2625.36 samples/sec Loss 17.5218 LearningRate 0.0953 Epoch: 0 Global Step: 19600 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:54:12,684-Speed 2629.80 samples/sec Loss 17.4519 LearningRate 0.0953 Epoch: 0 Global Step: 19610 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:54:16,597-Speed 2617.98 samples/sec Loss 17.5497 LearningRate 0.0953 Epoch: 0 Global Step: 19620 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:54:20,578-Speed 2573.00 samples/sec Loss 17.6564 LearningRate 0.0953 Epoch: 0 Global Step: 19630 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:54:24,534-Speed 2589.55 samples/sec Loss 17.5050 LearningRate 0.0953 Epoch: 0 Global Step: 19640 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:54:28,469-Speed 2602.79 samples/sec Loss 17.5466 LearningRate 0.0953 Epoch: 0 Global Step: 19650 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:54:32,350-Speed 2638.79 samples/sec Loss 17.4392 LearningRate 0.0953 Epoch: 0 Global Step: 19660 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:54:36,431-Speed 2510.25 samples/sec Loss 17.5418 LearningRate 0.0953 Epoch: 0 Global Step: 19670 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:54:40,331-Speed 2626.47 samples/sec Loss 17.6341 LearningRate 0.0953 Epoch: 0 Global Step: 19680 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:54:44,238-Speed 2621.94 samples/sec Loss 17.5715 LearningRate 0.0953 Epoch: 0 Global Step: 19690 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:54:48,134-Speed 2629.32 samples/sec Loss 17.4866 LearningRate 0.0953 Epoch: 0 Global Step: 19700 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:54:52,032-Speed 2627.58 samples/sec Loss 17.3626 LearningRate 0.0953 Epoch: 0 Global Step: 19710 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:54:55,932-Speed 2625.83 samples/sec Loss 17.5903 LearningRate 0.0953 Epoch: 0 Global Step: 19720 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:54:59,838-Speed 2622.18 samples/sec Loss 17.6314 LearningRate 0.0953 Epoch: 0 Global Step: 19730 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:55:03,755-Speed 2615.56 samples/sec Loss 17.4813 LearningRate 0.0953 Epoch: 0 Global Step: 19740 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:55:07,653-Speed 2627.15 samples/sec Loss 17.4221 LearningRate 0.0953 Epoch: 0 Global Step: 19750 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:55:11,645-Speed 2565.97 samples/sec Loss 17.5287 LearningRate 0.0953 Epoch: 0 Global Step: 19760 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:55:15,549-Speed 2624.07 samples/sec Loss 17.5506 LearningRate 0.0953 Epoch: 0 Global Step: 19770 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:55:19,451-Speed 2624.91 samples/sec Loss 17.6147 LearningRate 0.0953 Epoch: 0 Global Step: 19780 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:55:23,351-Speed 2626.28 samples/sec Loss 17.3949 LearningRate 0.0953 Epoch: 0 Global Step: 19790 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:55:27,248-Speed 2628.36 samples/sec Loss 17.4337 LearningRate 0.0953 Epoch: 0 Global Step: 19800 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:55:31,154-Speed 2622.18 samples/sec Loss 17.6503 LearningRate 0.0953 Epoch: 0 Global Step: 19810 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:55:35,054-Speed 2626.20 samples/sec Loss 17.5286 LearningRate 0.0953 Epoch: 0 Global Step: 19820 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:55:38,955-Speed 2625.47 samples/sec Loss 17.5874 LearningRate 0.0953 Epoch: 0 Global Step: 19830 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:55:42,852-Speed 2628.72 samples/sec Loss 17.4944 LearningRate 0.0953 Epoch: 0 Global Step: 19840 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 21:55:46,740-Speed 2634.01 samples/sec Loss 17.5396 LearningRate 0.0953 Epoch: 0 Global Step: 19850 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:55:50,789-Speed 2529.97 samples/sec Loss 17.6308 LearningRate 0.0953 Epoch: 0 Global Step: 19860 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 21:55:54,684-Speed 2630.17 samples/sec Loss 17.4186 LearningRate 0.0953 Epoch: 0 Global Step: 19870 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 21:55:58,576-Speed 2630.97 samples/sec Loss 17.4360 LearningRate 0.0953 Epoch: 0 Global Step: 19880 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 21:56:02,471-Speed 2629.74 samples/sec Loss 17.6369 LearningRate 0.0953 Epoch: 0 Global Step: 19890 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 21:56:06,393-Speed 2612.40 samples/sec Loss 17.5765 LearningRate 0.0953 Epoch: 0 Global Step: 19900 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 21:56:10,286-Speed 2631.59 samples/sec Loss 17.7087 LearningRate 0.0953 Epoch: 0 Global Step: 19910 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 21:56:14,181-Speed 2629.56 samples/sec Loss 17.5543 LearningRate 0.0953 Epoch: 0 Global Step: 19920 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 21:56:18,082-Speed 2626.01 samples/sec Loss 17.5969 LearningRate 0.0953 Epoch: 0 Global Step: 19930 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 21:56:21,992-Speed 2619.36 samples/sec Loss 17.6018 LearningRate 0.0953 Epoch: 0 Global Step: 19940 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 21:56:25,903-Speed 2618.75 samples/sec Loss 17.4949 LearningRate 0.0952 Epoch: 0 Global Step: 19950 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 21:56:29,811-Speed 2621.25 samples/sec Loss 17.6592 LearningRate 0.0952 Epoch: 0 Global Step: 19960 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:56:33,706-Speed 2629.67 samples/sec Loss 17.3061 LearningRate 0.0952 Epoch: 0 Global Step: 19970 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:56:37,604-Speed 2627.38 samples/sec Loss 17.2772 LearningRate 0.0952 Epoch: 0 Global Step: 19980 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:56:41,533-Speed 2607.05 samples/sec Loss 17.5025 LearningRate 0.0952 Epoch: 0 Global Step: 19990 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:56:45,442-Speed 2620.85 samples/sec Loss 17.6272 LearningRate 0.0952 Epoch: 0 Global Step: 20000 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 21:57:28,604-[lfw][20000]XNorm: 22.849141
Training: 2022-04-12 21:57:28,605-[lfw][20000]Accuracy-Flip: 0.99500+-0.00279
Training: 2022-04-12 21:57:28,605-[lfw][20000]Accuracy-Highest: 0.99500
Training: 2022-04-12 21:58:18,988-[cfp_fp][20000]XNorm: 20.668791
Training: 2022-04-12 21:58:18,989-[cfp_fp][20000]Accuracy-Flip: 0.96043+-0.00869
Training: 2022-04-12 21:58:18,991-[cfp_fp][20000]Accuracy-Highest: 0.96043
Training: 2022-04-12 21:59:02,372-[agedb_30][20000]XNorm: 22.528649
Training: 2022-04-12 21:59:02,373-[agedb_30][20000]Accuracy-Flip: 0.94267+-0.01138
Training: 2022-04-12 21:59:02,373-[agedb_30][20000]Accuracy-Highest: 0.94267
Training: 2022-04-12 21:59:06,244-Speed 72.73 samples/sec Loss 17.2786 LearningRate 0.0952 Epoch: 0 Global Step: 20010 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 21:59:10,116-Speed 2645.17 samples/sec Loss 17.3368 LearningRate 0.0952 Epoch: 0 Global Step: 20020 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 21:59:14,026-Speed 2619.95 samples/sec Loss 17.5399 LearningRate 0.0952 Epoch: 0 Global Step: 20030 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 21:59:17,971-Speed 2596.09 samples/sec Loss 17.5037 LearningRate 0.0952 Epoch: 0 Global Step: 20040 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 21:59:21,854-Speed 2637.86 samples/sec Loss 17.3660 LearningRate 0.0952 Epoch: 0 Global Step: 20050 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 21:59:25,817-Speed 2584.21 samples/sec Loss 17.3713 LearningRate 0.0952 Epoch: 0 Global Step: 20060 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:59:29,729-Speed 2619.65 samples/sec Loss 17.3161 LearningRate 0.0952 Epoch: 0 Global Step: 20070 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:59:33,618-Speed 2633.45 samples/sec Loss 17.3355 LearningRate 0.0952 Epoch: 0 Global Step: 20080 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:59:37,507-Speed 2633.78 samples/sec Loss 17.5217 LearningRate 0.0952 Epoch: 0 Global Step: 20090 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:59:41,378-Speed 2646.37 samples/sec Loss 17.2970 LearningRate 0.0952 Epoch: 0 Global Step: 20100 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:59:45,263-Speed 2636.41 samples/sec Loss 17.1568 LearningRate 0.0952 Epoch: 0 Global Step: 20110 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:59:49,148-Speed 2636.23 samples/sec Loss 17.5031 LearningRate 0.0952 Epoch: 0 Global Step: 20120 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:59:53,037-Speed 2633.46 samples/sec Loss 17.2312 LearningRate 0.0952 Epoch: 0 Global Step: 20130 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 21:59:56,926-Speed 2634.14 samples/sec Loss 17.2506 LearningRate 0.0952 Epoch: 0 Global Step: 20140 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:00:00,818-Speed 2631.03 samples/sec Loss 17.3349 LearningRate 0.0952 Epoch: 0 Global Step: 20150 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:00:04,695-Speed 2642.21 samples/sec Loss 17.4012 LearningRate 0.0952 Epoch: 0 Global Step: 20160 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:00:08,589-Speed 2630.30 samples/sec Loss 17.3337 LearningRate 0.0952 Epoch: 0 Global Step: 20170 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:00:12,485-Speed 2628.84 samples/sec Loss 17.3130 LearningRate 0.0952 Epoch: 0 Global Step: 20180 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:00:16,506-Speed 2547.22 samples/sec Loss 17.4086 LearningRate 0.0952 Epoch: 0 Global Step: 20190 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:00:20,405-Speed 2627.49 samples/sec Loss 17.3173 LearningRate 0.0952 Epoch: 0 Global Step: 20200 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:00:24,299-Speed 2630.34 samples/sec Loss 17.3467 LearningRate 0.0952 Epoch: 0 Global Step: 20210 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:00:28,194-Speed 2629.25 samples/sec Loss 17.5611 LearningRate 0.0952 Epoch: 0 Global Step: 20220 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:00:32,105-Speed 2619.41 samples/sec Loss 17.3273 LearningRate 0.0952 Epoch: 0 Global Step: 20230 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:00:36,005-Speed 2626.51 samples/sec Loss 17.2458 LearningRate 0.0952 Epoch: 0 Global Step: 20240 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:00:39,904-Speed 2626.93 samples/sec Loss 17.3941 LearningRate 0.0952 Epoch: 0 Global Step: 20250 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:00:43,799-Speed 2629.04 samples/sec Loss 17.2698 LearningRate 0.0952 Epoch: 0 Global Step: 20260 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:00:47,702-Speed 2624.49 samples/sec Loss 17.0955 LearningRate 0.0952 Epoch: 0 Global Step: 20270 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:00:51,598-Speed 2628.98 samples/sec Loss 17.3007 LearningRate 0.0952 Epoch: 0 Global Step: 20280 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:00:55,516-Speed 2614.79 samples/sec Loss 17.2960 LearningRate 0.0952 Epoch: 0 Global Step: 20290 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:00:59,411-Speed 2629.61 samples/sec Loss 17.2550 LearningRate 0.0952 Epoch: 0 Global Step: 20300 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:01:03,328-Speed 2614.76 samples/sec Loss 17.1290 LearningRate 0.0952 Epoch: 0 Global Step: 20310 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:01:07,333-Speed 2557.54 samples/sec Loss 17.1881 LearningRate 0.0952 Epoch: 0 Global Step: 20320 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:01:11,232-Speed 2627.17 samples/sec Loss 17.2640 LearningRate 0.0952 Epoch: 0 Global Step: 20330 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:01:15,132-Speed 2626.31 samples/sec Loss 17.3689 LearningRate 0.0952 Epoch: 0 Global Step: 20340 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:01:19,033-Speed 2625.12 samples/sec Loss 17.4201 LearningRate 0.0952 Epoch: 0 Global Step: 20350 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:01:22,959-Speed 2609.07 samples/sec Loss 17.2519 LearningRate 0.0952 Epoch: 0 Global Step: 20360 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:01:26,879-Speed 2612.71 samples/sec Loss 17.3264 LearningRate 0.0951 Epoch: 0 Global Step: 20370 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:01:30,794-Speed 2616.76 samples/sec Loss 17.2888 LearningRate 0.0951 Epoch: 0 Global Step: 20380 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:01:34,716-Speed 2611.82 samples/sec Loss 17.3595 LearningRate 0.0951 Epoch: 0 Global Step: 20390 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:01:38,760-Speed 2532.26 samples/sec Loss 17.0819 LearningRate 0.0951 Epoch: 0 Global Step: 20400 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:01:42,665-Speed 2623.10 samples/sec Loss 17.2258 LearningRate 0.0951 Epoch: 0 Global Step: 20410 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:01:46,580-Speed 2616.95 samples/sec Loss 17.2909 LearningRate 0.0951 Epoch: 0 Global Step: 20420 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:01:50,468-Speed 2634.20 samples/sec Loss 17.3200 LearningRate 0.0951 Epoch: 0 Global Step: 20430 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:01:54,372-Speed 2623.38 samples/sec Loss 17.3095 LearningRate 0.0951 Epoch: 0 Global Step: 20440 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:01:58,274-Speed 2625.10 samples/sec Loss 17.3333 LearningRate 0.0951 Epoch: 0 Global Step: 20450 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:02:02,179-Speed 2623.05 samples/sec Loss 17.3007 LearningRate 0.0951 Epoch: 0 Global Step: 20460 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:02:06,098-Speed 2613.26 samples/sec Loss 17.2143 LearningRate 0.0951 Epoch: 0 Global Step: 20470 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:02:10,022-Speed 2610.53 samples/sec Loss 17.2889 LearningRate 0.0951 Epoch: 0 Global Step: 20480 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:02:14,007-Speed 2569.67 samples/sec Loss 17.2423 LearningRate 0.0951 Epoch: 0 Global Step: 20490 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:02:17,917-Speed 2619.81 samples/sec Loss 17.2116 LearningRate 0.0951 Epoch: 0 Global Step: 20500 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:02:21,872-Speed 2590.17 samples/sec Loss 17.1046 LearningRate 0.0951 Epoch: 0 Global Step: 20510 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:02:25,775-Speed 2624.26 samples/sec Loss 17.3211 LearningRate 0.0951 Epoch: 0 Global Step: 20520 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:02:29,679-Speed 2624.26 samples/sec Loss 17.0901 LearningRate 0.0951 Epoch: 0 Global Step: 20530 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:02:33,582-Speed 2624.40 samples/sec Loss 17.3014 LearningRate 0.0951 Epoch: 0 Global Step: 20540 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:02:37,484-Speed 2624.70 samples/sec Loss 17.2327 LearningRate 0.0951 Epoch: 0 Global Step: 20550 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:02:41,387-Speed 2624.20 samples/sec Loss 17.2847 LearningRate 0.0951 Epoch: 0 Global Step: 20560 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:02:45,293-Speed 2622.84 samples/sec Loss 17.3132 LearningRate 0.0951 Epoch: 0 Global Step: 20570 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:02:49,196-Speed 2623.80 samples/sec Loss 17.2234 LearningRate 0.0951 Epoch: 0 Global Step: 20580 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:02:53,155-Speed 2587.31 samples/sec Loss 17.1704 LearningRate 0.0951 Epoch: 0 Global Step: 20590 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:02:57,071-Speed 2615.71 samples/sec Loss 17.3309 LearningRate 0.0951 Epoch: 0 Global Step: 20600 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:03:00,971-Speed 2626.36 samples/sec Loss 17.2529 LearningRate 0.0951 Epoch: 0 Global Step: 20610 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:03:04,966-Speed 2563.97 samples/sec Loss 17.1898 LearningRate 0.0951 Epoch: 0 Global Step: 20620 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:03:08,873-Speed 2621.28 samples/sec Loss 17.0936 LearningRate 0.0951 Epoch: 0 Global Step: 20630 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:03:12,779-Speed 2622.35 samples/sec Loss 17.3306 LearningRate 0.0951 Epoch: 0 Global Step: 20640 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:03:16,696-Speed 2614.78 samples/sec Loss 17.2487 LearningRate 0.0951 Epoch: 0 Global Step: 20650 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:03:20,598-Speed 2625.24 samples/sec Loss 17.3869 LearningRate 0.0951 Epoch: 0 Global Step: 20660 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:03:24,501-Speed 2624.44 samples/sec Loss 17.2078 LearningRate 0.0951 Epoch: 0 Global Step: 20670 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:03:28,441-Speed 2599.46 samples/sec Loss 17.3298 LearningRate 0.0951 Epoch: 0 Global Step: 20680 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:03:32,345-Speed 2623.55 samples/sec Loss 17.3589 LearningRate 0.0951 Epoch: 0 Global Step: 20690 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:03:36,250-Speed 2623.27 samples/sec Loss 17.2275 LearningRate 0.0951 Epoch: 0 Global Step: 20700 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:03:40,163-Speed 2617.44 samples/sec Loss 17.2090 LearningRate 0.0951 Epoch: 0 Global Step: 20710 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:03:44,073-Speed 2619.14 samples/sec Loss 17.2965 LearningRate 0.0951 Epoch: 0 Global Step: 20720 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:03:47,986-Speed 2617.86 samples/sec Loss 17.0071 LearningRate 0.0951 Epoch: 0 Global Step: 20730 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:03:51,928-Speed 2599.01 samples/sec Loss 17.2234 LearningRate 0.0951 Epoch: 0 Global Step: 20740 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:03:55,831-Speed 2623.75 samples/sec Loss 17.0716 LearningRate 0.0951 Epoch: 0 Global Step: 20750 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:03:59,754-Speed 2611.99 samples/sec Loss 17.0878 LearningRate 0.0951 Epoch: 0 Global Step: 20760 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:04:03,668-Speed 2616.28 samples/sec Loss 17.1337 LearningRate 0.0951 Epoch: 0 Global Step: 20770 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:04:07,590-Speed 2611.28 samples/sec Loss 17.1259 LearningRate 0.0951 Epoch: 0 Global Step: 20780 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:04:11,508-Speed 2614.83 samples/sec Loss 17.1120 LearningRate 0.0951 Epoch: 0 Global Step: 20790 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:04:15,428-Speed 2612.67 samples/sec Loss 17.3119 LearningRate 0.0950 Epoch: 0 Global Step: 20800 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:04:19,394-Speed 2582.57 samples/sec Loss 17.1137 LearningRate 0.0950 Epoch: 0 Global Step: 20810 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:04:23,338-Speed 2597.75 samples/sec Loss 17.1623 LearningRate 0.0950 Epoch: 0 Global Step: 20820 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:04:27,245-Speed 2621.71 samples/sec Loss 17.0761 LearningRate 0.0950 Epoch: 0 Global Step: 20830 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:04:31,153-Speed 2620.55 samples/sec Loss 17.1821 LearningRate 0.0950 Epoch: 0 Global Step: 20840 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:04:35,058-Speed 2623.38 samples/sec Loss 17.2244 LearningRate 0.0950 Epoch: 0 Global Step: 20850 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:04:38,968-Speed 2619.20 samples/sec Loss 17.1397 LearningRate 0.0950 Epoch: 0 Global Step: 20860 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:04:42,870-Speed 2625.17 samples/sec Loss 17.1464 LearningRate 0.0950 Epoch: 0 Global Step: 20870 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:04:46,779-Speed 2620.01 samples/sec Loss 17.1666 LearningRate 0.0950 Epoch: 0 Global Step: 20880 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:04:50,686-Speed 2621.86 samples/sec Loss 17.1096 LearningRate 0.0950 Epoch: 0 Global Step: 20890 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:04:54,591-Speed 2623.27 samples/sec Loss 17.0485 LearningRate 0.0950 Epoch: 0 Global Step: 20900 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:04:58,483-Speed 2631.52 samples/sec Loss 17.0844 LearningRate 0.0950 Epoch: 0 Global Step: 20910 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:05:02,533-Speed 2528.68 samples/sec Loss 17.2007 LearningRate 0.0950 Epoch: 0 Global Step: 20920 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:05:06,655-Speed 2485.53 samples/sec Loss 17.2306 LearningRate 0.0950 Epoch: 0 Global Step: 20930 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:05:10,770-Speed 2488.79 samples/sec Loss 17.1636 LearningRate 0.0950 Epoch: 0 Global Step: 20940 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:05:14,706-Speed 2602.08 samples/sec Loss 17.1559 LearningRate 0.0950 Epoch: 0 Global Step: 20950 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:05:18,612-Speed 2622.92 samples/sec Loss 17.0436 LearningRate 0.0950 Epoch: 0 Global Step: 20960 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:05:22,544-Speed 2604.43 samples/sec Loss 17.0375 LearningRate 0.0950 Epoch: 0 Global Step: 20970 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:05:26,479-Speed 2603.26 samples/sec Loss 16.9903 LearningRate 0.0950 Epoch: 0 Global Step: 20980 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:05:30,412-Speed 2604.61 samples/sec Loss 16.9601 LearningRate 0.0950 Epoch: 0 Global Step: 20990 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:05:34,503-Speed 2503.14 samples/sec Loss 17.0820 LearningRate 0.0950 Epoch: 0 Global Step: 21000 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:05:38,411-Speed 2621.28 samples/sec Loss 16.8805 LearningRate 0.0950 Epoch: 0 Global Step: 21010 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:05:42,327-Speed 2615.57 samples/sec Loss 17.1483 LearningRate 0.0950 Epoch: 0 Global Step: 21020 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:05:46,240-Speed 2617.60 samples/sec Loss 17.0557 LearningRate 0.0950 Epoch: 0 Global Step: 21030 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:05:50,147-Speed 2621.65 samples/sec Loss 17.0277 LearningRate 0.0950 Epoch: 0 Global Step: 21040 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:05:54,054-Speed 2620.99 samples/sec Loss 17.2500 LearningRate 0.0950 Epoch: 0 Global Step: 21050 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:05:57,971-Speed 2615.30 samples/sec Loss 17.0240 LearningRate 0.0950 Epoch: 0 Global Step: 21060 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:06:01,887-Speed 2615.59 samples/sec Loss 17.0192 LearningRate 0.0950 Epoch: 0 Global Step: 21070 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:06:05,799-Speed 2618.49 samples/sec Loss 16.9565 LearningRate 0.0950 Epoch: 0 Global Step: 21080 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:06:09,784-Speed 2570.29 samples/sec Loss 17.0669 LearningRate 0.0950 Epoch: 0 Global Step: 21090 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:06:13,813-Speed 2541.92 samples/sec Loss 17.0735 LearningRate 0.0950 Epoch: 0 Global Step: 21100 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:06:17,760-Speed 2594.95 samples/sec Loss 17.1467 LearningRate 0.0950 Epoch: 0 Global Step: 21110 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:06:21,681-Speed 2612.49 samples/sec Loss 17.0775 LearningRate 0.0950 Epoch: 0 Global Step: 21120 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:06:25,593-Speed 2618.34 samples/sec Loss 16.9033 LearningRate 0.0950 Epoch: 0 Global Step: 21130 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:06:29,502-Speed 2620.14 samples/sec Loss 17.0271 LearningRate 0.0950 Epoch: 0 Global Step: 21140 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:06:33,418-Speed 2615.98 samples/sec Loss 17.0637 LearningRate 0.0950 Epoch: 0 Global Step: 21150 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:06:37,332-Speed 2617.11 samples/sec Loss 17.0440 LearningRate 0.0950 Epoch: 0 Global Step: 21160 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:06:41,248-Speed 2615.42 samples/sec Loss 17.0535 LearningRate 0.0950 Epoch: 0 Global Step: 21170 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:06:45,152-Speed 2623.95 samples/sec Loss 17.0259 LearningRate 0.0950 Epoch: 0 Global Step: 21180 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:06:49,058-Speed 2622.33 samples/sec Loss 17.0544 LearningRate 0.0950 Epoch: 0 Global Step: 21190 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:06:53,020-Speed 2585.33 samples/sec Loss 16.9845 LearningRate 0.0950 Epoch: 0 Global Step: 21200 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:06:57,086-Speed 2518.83 samples/sec Loss 17.0774 LearningRate 0.0950 Epoch: 0 Global Step: 21210 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:07:00,992-Speed 2622.23 samples/sec Loss 16.9532 LearningRate 0.0949 Epoch: 0 Global Step: 21220 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:07:04,902-Speed 2619.66 samples/sec Loss 17.2552 LearningRate 0.0949 Epoch: 0 Global Step: 21230 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:07:08,806-Speed 2623.45 samples/sec Loss 16.9449 LearningRate 0.0949 Epoch: 0 Global Step: 21240 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:07:12,713-Speed 2621.95 samples/sec Loss 17.0939 LearningRate 0.0949 Epoch: 0 Global Step: 21250 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:07:16,625-Speed 2617.76 samples/sec Loss 16.9523 LearningRate 0.0949 Epoch: 0 Global Step: 21260 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:07:20,688-Speed 2521.28 samples/sec Loss 16.9467 LearningRate 0.0949 Epoch: 0 Global Step: 21270 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:07:24,610-Speed 2611.70 samples/sec Loss 17.0630 LearningRate 0.0949 Epoch: 0 Global Step: 21280 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:07:28,535-Speed 2609.81 samples/sec Loss 16.9539 LearningRate 0.0949 Epoch: 0 Global Step: 21290 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:07:32,446-Speed 2618.62 samples/sec Loss 17.1433 LearningRate 0.0949 Epoch: 0 Global Step: 21300 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:07:36,357-Speed 2619.07 samples/sec Loss 16.9352 LearningRate 0.0949 Epoch: 0 Global Step: 21310 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:07:40,266-Speed 2619.77 samples/sec Loss 17.0704 LearningRate 0.0949 Epoch: 0 Global Step: 21320 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:07:44,177-Speed 2618.56 samples/sec Loss 17.0987 LearningRate 0.0949 Epoch: 0 Global Step: 21330 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:07:48,105-Speed 2608.56 samples/sec Loss 16.9881 LearningRate 0.0949 Epoch: 0 Global Step: 21340 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:07:52,095-Speed 2566.40 samples/sec Loss 17.0044 LearningRate 0.0949 Epoch: 0 Global Step: 21350 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:07:56,214-Speed 2487.17 samples/sec Loss 16.9794 LearningRate 0.0949 Epoch: 0 Global Step: 21360 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:08:00,121-Speed 2621.43 samples/sec Loss 17.1177 LearningRate 0.0949 Epoch: 0 Global Step: 21370 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:08:04,035-Speed 2616.77 samples/sec Loss 16.9850 LearningRate 0.0949 Epoch: 0 Global Step: 21380 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:08:07,946-Speed 2619.09 samples/sec Loss 17.1058 LearningRate 0.0949 Epoch: 0 Global Step: 21390 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:08:11,853-Speed 2621.67 samples/sec Loss 17.0647 LearningRate 0.0949 Epoch: 0 Global Step: 21400 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:08:15,782-Speed 2606.68 samples/sec Loss 17.1170 LearningRate 0.0949 Epoch: 0 Global Step: 21410 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:08:19,718-Speed 2602.54 samples/sec Loss 16.9096 LearningRate 0.0949 Epoch: 0 Global Step: 21420 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:08:23,627-Speed 2620.51 samples/sec Loss 17.0543 LearningRate 0.0949 Epoch: 0 Global Step: 21430 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:08:27,534-Speed 2622.20 samples/sec Loss 16.9177 LearningRate 0.0949 Epoch: 0 Global Step: 21440 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:08:31,447-Speed 2617.52 samples/sec Loss 17.0270 LearningRate 0.0949 Epoch: 0 Global Step: 21450 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:08:35,385-Speed 2600.88 samples/sec Loss 16.7937 LearningRate 0.0949 Epoch: 0 Global Step: 21460 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:08:39,438-Speed 2526.96 samples/sec Loss 16.9385 LearningRate 0.0949 Epoch: 0 Global Step: 21470 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:08:43,531-Speed 2502.09 samples/sec Loss 16.9112 LearningRate 0.0949 Epoch: 0 Global Step: 21480 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:08:47,453-Speed 2612.05 samples/sec Loss 16.9142 LearningRate 0.0949 Epoch: 0 Global Step: 21490 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:08:51,360-Speed 2621.14 samples/sec Loss 16.9864 LearningRate 0.0949 Epoch: 0 Global Step: 21500 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:08:55,274-Speed 2617.18 samples/sec Loss 16.8475 LearningRate 0.0949 Epoch: 0 Global Step: 21510 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:08:59,184-Speed 2619.61 samples/sec Loss 16.9216 LearningRate 0.0949 Epoch: 0 Global Step: 21520 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:09:03,095-Speed 2618.96 samples/sec Loss 16.7584 LearningRate 0.0949 Epoch: 0 Global Step: 21530 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:09:07,004-Speed 2619.82 samples/sec Loss 16.7814 LearningRate 0.0949 Epoch: 0 Global Step: 21540 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:09:10,924-Speed 2612.80 samples/sec Loss 16.8829 LearningRate 0.0949 Epoch: 0 Global Step: 21550 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:09:14,834-Speed 2619.37 samples/sec Loss 16.9803 LearningRate 0.0949 Epoch: 0 Global Step: 21560 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:09:18,936-Speed 2496.91 samples/sec Loss 17.0373 LearningRate 0.0949 Epoch: 0 Global Step: 21570 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:09:23,001-Speed 2519.82 samples/sec Loss 17.0240 LearningRate 0.0949 Epoch: 0 Global Step: 21580 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:09:26,906-Speed 2622.91 samples/sec Loss 16.9210 LearningRate 0.0949 Epoch: 0 Global Step: 21590 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:09:30,821-Speed 2616.40 samples/sec Loss 16.7600 LearningRate 0.0949 Epoch: 0 Global Step: 21600 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:09:34,734-Speed 2617.21 samples/sec Loss 16.9929 LearningRate 0.0949 Epoch: 0 Global Step: 21610 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:09:38,622-Speed 2633.96 samples/sec Loss 16.8408 LearningRate 0.0949 Epoch: 0 Global Step: 21620 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:09:42,532-Speed 2620.03 samples/sec Loss 16.8212 LearningRate 0.0949 Epoch: 0 Global Step: 21630 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:09:46,452-Speed 2612.30 samples/sec Loss 17.0384 LearningRate 0.0949 Epoch: 0 Global Step: 21640 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:09:50,384-Speed 2605.38 samples/sec Loss 16.9902 LearningRate 0.0948 Epoch: 0 Global Step: 21650 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:09:54,386-Speed 2559.39 samples/sec Loss 16.8431 LearningRate 0.0948 Epoch: 0 Global Step: 21660 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:09:58,296-Speed 2620.02 samples/sec Loss 16.9397 LearningRate 0.0948 Epoch: 0 Global Step: 21670 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:10:02,350-Speed 2526.19 samples/sec Loss 16.9069 LearningRate 0.0948 Epoch: 0 Global Step: 21680 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:10:06,325-Speed 2577.07 samples/sec Loss 17.0303 LearningRate 0.0948 Epoch: 0 Global Step: 21690 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:10:10,236-Speed 2618.34 samples/sec Loss 16.9323 LearningRate 0.0948 Epoch: 0 Global Step: 21700 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:10:14,150-Speed 2616.83 samples/sec Loss 16.9864 LearningRate 0.0948 Epoch: 0 Global Step: 21710 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:10:18,076-Speed 2609.58 samples/sec Loss 16.9566 LearningRate 0.0948 Epoch: 0 Global Step: 21720 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:10:21,993-Speed 2615.01 samples/sec Loss 16.9099 LearningRate 0.0948 Epoch: 0 Global Step: 21730 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:10:25,888-Speed 2629.01 samples/sec Loss 16.8735 LearningRate 0.0948 Epoch: 0 Global Step: 21740 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:10:29,804-Speed 2616.11 samples/sec Loss 16.9736 LearningRate 0.0948 Epoch: 0 Global Step: 21750 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:10:33,721-Speed 2615.02 samples/sec Loss 16.9530 LearningRate 0.0948 Epoch: 0 Global Step: 21760 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:10:37,654-Speed 2604.32 samples/sec Loss 16.8648 LearningRate 0.0948 Epoch: 0 Global Step: 21770 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:10:41,622-Speed 2581.73 samples/sec Loss 16.8348 LearningRate 0.0948 Epoch: 0 Global Step: 21780 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:10:45,720-Speed 2498.83 samples/sec Loss 16.9737 LearningRate 0.0948 Epoch: 0 Global Step: 21790 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:10:49,726-Speed 2557.74 samples/sec Loss 16.7927 LearningRate 0.0948 Epoch: 0 Global Step: 21800 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:10:53,639-Speed 2617.46 samples/sec Loss 16.8731 LearningRate 0.0948 Epoch: 0 Global Step: 21810 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:10:57,548-Speed 2619.91 samples/sec Loss 16.8285 LearningRate 0.0948 Epoch: 0 Global Step: 21820 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:11:01,454-Speed 2622.10 samples/sec Loss 16.8650 LearningRate 0.0948 Epoch: 0 Global Step: 21830 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:11:05,389-Speed 2603.47 samples/sec Loss 16.7568 LearningRate 0.0948 Epoch: 0 Global Step: 21840 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:11:09,297-Speed 2620.51 samples/sec Loss 16.9007 LearningRate 0.0948 Epoch: 0 Global Step: 21850 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:11:13,205-Speed 2620.84 samples/sec Loss 16.9331 LearningRate 0.0948 Epoch: 0 Global Step: 21860 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:11:17,127-Speed 2611.48 samples/sec Loss 16.9770 LearningRate 0.0948 Epoch: 0 Global Step: 21870 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:11:21,060-Speed 2604.47 samples/sec Loss 16.8032 LearningRate 0.0948 Epoch: 0 Global Step: 21880 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:11:24,991-Speed 2605.66 samples/sec Loss 16.7248 LearningRate 0.0948 Epoch: 0 Global Step: 21890 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:11:28,906-Speed 2616.40 samples/sec Loss 16.8836 LearningRate 0.0948 Epoch: 0 Global Step: 21900 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:11:32,827-Speed 2611.68 samples/sec Loss 16.7817 LearningRate 0.0948 Epoch: 0 Global Step: 21910 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:11:36,742-Speed 2617.18 samples/sec Loss 16.9030 LearningRate 0.0948 Epoch: 0 Global Step: 21920 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:11:40,663-Speed 2611.68 samples/sec Loss 16.9036 LearningRate 0.0948 Epoch: 0 Global Step: 21930 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:11:44,575-Speed 2618.55 samples/sec Loss 16.7863 LearningRate 0.0948 Epoch: 0 Global Step: 21940 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:11:48,486-Speed 2618.33 samples/sec Loss 16.8626 LearningRate 0.0948 Epoch: 0 Global Step: 21950 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:11:52,393-Speed 2621.81 samples/sec Loss 16.7639 LearningRate 0.0948 Epoch: 0 Global Step: 21960 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:11:56,292-Speed 2626.90 samples/sec Loss 16.9486 LearningRate 0.0948 Epoch: 0 Global Step: 21970 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:12:00,202-Speed 2619.63 samples/sec Loss 16.8055 LearningRate 0.0948 Epoch: 0 Global Step: 21980 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:12:04,118-Speed 2615.54 samples/sec Loss 16.7627 LearningRate 0.0948 Epoch: 0 Global Step: 21990 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:12:08,026-Speed 2621.09 samples/sec Loss 16.9939 LearningRate 0.0948 Epoch: 0 Global Step: 22000 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:12:11,938-Speed 2618.33 samples/sec Loss 16.8963 LearningRate 0.0948 Epoch: 0 Global Step: 22010 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:12:15,881-Speed 2597.13 samples/sec Loss 16.7661 LearningRate 0.0948 Epoch: 0 Global Step: 22020 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:12:19,819-Speed 2601.39 samples/sec Loss 16.7928 LearningRate 0.0948 Epoch: 0 Global Step: 22030 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:12:23,769-Speed 2592.85 samples/sec Loss 16.8972 LearningRate 0.0948 Epoch: 0 Global Step: 22040 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:12:27,717-Speed 2594.96 samples/sec Loss 16.7560 LearningRate 0.0948 Epoch: 0 Global Step: 22050 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:12:31,629-Speed 2618.08 samples/sec Loss 16.7642 LearningRate 0.0948 Epoch: 0 Global Step: 22060 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:12:35,535-Speed 2622.57 samples/sec Loss 16.8549 LearningRate 0.0948 Epoch: 0 Global Step: 22070 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:12:39,477-Speed 2598.30 samples/sec Loss 16.8242 LearningRate 0.0947 Epoch: 0 Global Step: 22080 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:12:43,390-Speed 2617.46 samples/sec Loss 16.8511 LearningRate 0.0947 Epoch: 0 Global Step: 22090 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:12:47,330-Speed 2599.45 samples/sec Loss 16.8887 LearningRate 0.0947 Epoch: 0 Global Step: 22100 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:12:51,273-Speed 2598.49 samples/sec Loss 16.8176 LearningRate 0.0947 Epoch: 0 Global Step: 22110 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:12:55,178-Speed 2622.58 samples/sec Loss 16.8183 LearningRate 0.0947 Epoch: 0 Global Step: 22120 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:12:59,086-Speed 2620.71 samples/sec Loss 16.8515 LearningRate 0.0947 Epoch: 0 Global Step: 22130 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:13:03,005-Speed 2613.39 samples/sec Loss 16.7055 LearningRate 0.0947 Epoch: 0 Global Step: 22140 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:13:06,927-Speed 2611.99 samples/sec Loss 16.7682 LearningRate 0.0947 Epoch: 0 Global Step: 22150 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:13:10,834-Speed 2621.64 samples/sec Loss 16.7296 LearningRate 0.0947 Epoch: 0 Global Step: 22160 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:13:14,742-Speed 2621.10 samples/sec Loss 16.7022 LearningRate 0.0947 Epoch: 0 Global Step: 22170 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:13:18,738-Speed 2563.10 samples/sec Loss 16.9250 LearningRate 0.0947 Epoch: 0 Global Step: 22180 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:13:22,650-Speed 2618.31 samples/sec Loss 16.8334 LearningRate 0.0947 Epoch: 0 Global Step: 22190 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:13:26,596-Speed 2595.90 samples/sec Loss 16.6100 LearningRate 0.0947 Epoch: 0 Global Step: 22200 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:13:30,535-Speed 2600.19 samples/sec Loss 16.6690 LearningRate 0.0947 Epoch: 0 Global Step: 22210 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:13:34,461-Speed 2608.77 samples/sec Loss 16.5633 LearningRate 0.0947 Epoch: 0 Global Step: 22220 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:13:38,380-Speed 2613.66 samples/sec Loss 16.8058 LearningRate 0.0947 Epoch: 0 Global Step: 22230 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:13:42,295-Speed 2616.63 samples/sec Loss 16.6796 LearningRate 0.0947 Epoch: 0 Global Step: 22240 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:13:46,211-Speed 2614.96 samples/sec Loss 16.8363 LearningRate 0.0947 Epoch: 0 Global Step: 22250 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:13:50,138-Speed 2608.60 samples/sec Loss 16.9602 LearningRate 0.0947 Epoch: 0 Global Step: 22260 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:13:54,033-Speed 2629.69 samples/sec Loss 16.8557 LearningRate 0.0947 Epoch: 0 Global Step: 22270 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:13:57,946-Speed 2617.99 samples/sec Loss 16.6426 LearningRate 0.0947 Epoch: 0 Global Step: 22280 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:14:01,883-Speed 2601.68 samples/sec Loss 16.7326 LearningRate 0.0947 Epoch: 0 Global Step: 22290 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:14:05,975-Speed 2502.78 samples/sec Loss 16.8904 LearningRate 0.0947 Epoch: 0 Global Step: 22300 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:14:10,072-Speed 2499.86 samples/sec Loss 16.8618 LearningRate 0.0947 Epoch: 0 Global Step: 22310 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:14:14,118-Speed 2531.59 samples/sec Loss 16.8052 LearningRate 0.0947 Epoch: 0 Global Step: 22320 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:14:18,029-Speed 2619.23 samples/sec Loss 16.7486 LearningRate 0.0947 Epoch: 0 Global Step: 22330 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:14:21,944-Speed 2616.50 samples/sec Loss 16.6979 LearningRate 0.0947 Epoch: 0 Global Step: 22340 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:14:25,848-Speed 2623.91 samples/sec Loss 16.7916 LearningRate 0.0947 Epoch: 0 Global Step: 22350 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:14:29,775-Speed 2607.96 samples/sec Loss 16.8267 LearningRate 0.0947 Epoch: 0 Global Step: 22360 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:14:33,704-Speed 2606.30 samples/sec Loss 16.7835 LearningRate 0.0947 Epoch: 0 Global Step: 22370 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:14:37,624-Speed 2613.20 samples/sec Loss 16.6536 LearningRate 0.0947 Epoch: 0 Global Step: 22380 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:14:41,522-Speed 2627.66 samples/sec Loss 16.7005 LearningRate 0.0947 Epoch: 0 Global Step: 22390 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:14:45,438-Speed 2615.60 samples/sec Loss 16.8289 LearningRate 0.0947 Epoch: 0 Global Step: 22400 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:14:49,347-Speed 2620.10 samples/sec Loss 16.9206 LearningRate 0.0947 Epoch: 0 Global Step: 22410 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:14:53,250-Speed 2624.09 samples/sec Loss 16.8509 LearningRate 0.0947 Epoch: 0 Global Step: 22420 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:14:57,156-Speed 2622.52 samples/sec Loss 16.6571 LearningRate 0.0947 Epoch: 0 Global Step: 22430 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:15:01,058-Speed 2625.50 samples/sec Loss 16.5993 LearningRate 0.0947 Epoch: 0 Global Step: 22440 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:15:04,962-Speed 2622.96 samples/sec Loss 16.6175 LearningRate 0.0947 Epoch: 0 Global Step: 22450 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:15:08,864-Speed 2624.70 samples/sec Loss 16.7212 LearningRate 0.0947 Epoch: 0 Global Step: 22460 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:15:12,769-Speed 2623.42 samples/sec Loss 16.6576 LearningRate 0.0947 Epoch: 0 Global Step: 22470 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:15:16,677-Speed 2620.98 samples/sec Loss 16.8581 LearningRate 0.0947 Epoch: 0 Global Step: 22480 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:15:20,604-Speed 2608.60 samples/sec Loss 16.7419 LearningRate 0.0947 Epoch: 0 Global Step: 22490 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:15:24,517-Speed 2617.85 samples/sec Loss 16.6385 LearningRate 0.0946 Epoch: 0 Global Step: 22500 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:15:28,434-Speed 2614.90 samples/sec Loss 16.5988 LearningRate 0.0946 Epoch: 0 Global Step: 22510 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:15:32,340-Speed 2622.25 samples/sec Loss 16.5852 LearningRate 0.0946 Epoch: 0 Global Step: 22520 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:15:36,276-Speed 2602.04 samples/sec Loss 16.8079 LearningRate 0.0946 Epoch: 0 Global Step: 22530 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:15:40,185-Speed 2619.94 samples/sec Loss 16.6693 LearningRate 0.0946 Epoch: 0 Global Step: 22540 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:15:44,102-Speed 2615.59 samples/sec Loss 16.6322 LearningRate 0.0946 Epoch: 0 Global Step: 22550 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:15:48,006-Speed 2623.30 samples/sec Loss 16.7721 LearningRate 0.0946 Epoch: 0 Global Step: 22560 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:15:51,932-Speed 2609.38 samples/sec Loss 16.7160 LearningRate 0.0946 Epoch: 0 Global Step: 22570 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:15:55,837-Speed 2623.05 samples/sec Loss 16.7050 LearningRate 0.0946 Epoch: 0 Global Step: 22580 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:15:59,740-Speed 2624.35 samples/sec Loss 16.5683 LearningRate 0.0946 Epoch: 0 Global Step: 22590 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:16:03,642-Speed 2624.51 samples/sec Loss 16.7109 LearningRate 0.0946 Epoch: 0 Global Step: 22600 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:16:07,553-Speed 2618.98 samples/sec Loss 16.6937 LearningRate 0.0946 Epoch: 0 Global Step: 22610 Fp16 Grad Scale: 262144 Required: 91 hours
Training: 2022-04-12 22:16:11,429-Speed 2642.63 samples/sec Loss 16.6583 LearningRate 0.0946 Epoch: 0 Global Step: 22620 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:16:15,357-Speed 2607.57 samples/sec Loss 16.5752 LearningRate 0.0946 Epoch: 0 Global Step: 22630 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:16:19,277-Speed 2613.26 samples/sec Loss 16.5559 LearningRate 0.0946 Epoch: 0 Global Step: 22640 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:16:23,178-Speed 2625.43 samples/sec Loss 16.7394 LearningRate 0.0946 Epoch: 0 Global Step: 22650 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:16:27,083-Speed 2623.17 samples/sec Loss 16.7260 LearningRate 0.0946 Epoch: 0 Global Step: 22660 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:16:30,999-Speed 2615.94 samples/sec Loss 16.7089 LearningRate 0.0946 Epoch: 0 Global Step: 22670 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:16:34,896-Speed 2627.95 samples/sec Loss 16.6027 LearningRate 0.0946 Epoch: 0 Global Step: 22680 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:16:38,804-Speed 2621.01 samples/sec Loss 16.6966 LearningRate 0.0946 Epoch: 0 Global Step: 22690 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:16:42,720-Speed 2616.22 samples/sec Loss 16.5720 LearningRate 0.0946 Epoch: 0 Global Step: 22700 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:16:46,648-Speed 2607.15 samples/sec Loss 16.7235 LearningRate 0.0946 Epoch: 0 Global Step: 22710 Fp16 Grad Scale: 65536 Required: 91 hours
Training: 2022-04-12 22:16:50,553-Speed 2622.97 samples/sec Loss 16.4758 LearningRate 0.0946 Epoch: 0 Global Step: 22720 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:16:54,458-Speed 2623.05 samples/sec Loss 16.6120 LearningRate 0.0946 Epoch: 0 Global Step: 22730 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:16:58,410-Speed 2592.07 samples/sec Loss 16.5066 LearningRate 0.0946 Epoch: 0 Global Step: 22740 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:17:02,340-Speed 2606.30 samples/sec Loss 16.6481 LearningRate 0.0946 Epoch: 0 Global Step: 22750 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:17:06,253-Speed 2617.86 samples/sec Loss 16.6615 LearningRate 0.0946 Epoch: 0 Global Step: 22760 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:17:10,163-Speed 2619.12 samples/sec Loss 16.5848 LearningRate 0.0946 Epoch: 0 Global Step: 22770 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:17:14,071-Speed 2620.94 samples/sec Loss 16.6244 LearningRate 0.0946 Epoch: 0 Global Step: 22780 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:17:17,987-Speed 2614.99 samples/sec Loss 16.4830 LearningRate 0.0946 Epoch: 0 Global Step: 22790 Fp16 Grad Scale: 131072 Required: 91 hours
Training: 2022-04-12 22:17:21,898-Speed 2619.21 samples/sec Loss 16.6417 LearningRate 0.0946 Epoch: 0 Global Step: 22800 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:17:25,800-Speed 2625.39 samples/sec Loss 16.7599 LearningRate 0.0946 Epoch: 0 Global Step: 22810 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:17:29,704-Speed 2623.33 samples/sec Loss 16.6405 LearningRate 0.0946 Epoch: 0 Global Step: 22820 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:17:33,613-Speed 2620.37 samples/sec Loss 16.4435 LearningRate 0.0946 Epoch: 0 Global Step: 22830 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:17:37,507-Speed 2632.65 samples/sec Loss 16.6291 LearningRate 0.0946 Epoch: 0 Global Step: 22840 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:17:41,405-Speed 2627.74 samples/sec Loss 16.5831 LearningRate 0.0946 Epoch: 0 Global Step: 22850 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:17:45,311-Speed 2621.99 samples/sec Loss 16.5760 LearningRate 0.0946 Epoch: 0 Global Step: 22860 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:17:49,209-Speed 2627.10 samples/sec Loss 16.7358 LearningRate 0.0946 Epoch: 0 Global Step: 22870 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:17:53,116-Speed 2621.97 samples/sec Loss 16.4928 LearningRate 0.0946 Epoch: 0 Global Step: 22880 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:17:57,018-Speed 2625.29 samples/sec Loss 16.5705 LearningRate 0.0946 Epoch: 0 Global Step: 22890 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:18:00,922-Speed 2623.82 samples/sec Loss 16.5105 LearningRate 0.0946 Epoch: 0 Global Step: 22900 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:18:04,823-Speed 2625.17 samples/sec Loss 16.4597 LearningRate 0.0946 Epoch: 0 Global Step: 22910 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:18:08,722-Speed 2626.89 samples/sec Loss 16.5756 LearningRate 0.0946 Epoch: 0 Global Step: 22920 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:18:12,625-Speed 2624.10 samples/sec Loss 16.5930 LearningRate 0.0945 Epoch: 0 Global Step: 22930 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:18:16,530-Speed 2623.16 samples/sec Loss 16.6126 LearningRate 0.0945 Epoch: 0 Global Step: 22940 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:18:20,433-Speed 2623.86 samples/sec Loss 16.6224 LearningRate 0.0945 Epoch: 0 Global Step: 22950 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:18:24,340-Speed 2621.65 samples/sec Loss 16.7100 LearningRate 0.0945 Epoch: 0 Global Step: 22960 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:18:28,243-Speed 2624.88 samples/sec Loss 16.6416 LearningRate 0.0945 Epoch: 0 Global Step: 22970 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:18:32,142-Speed 2627.17 samples/sec Loss 16.7317 LearningRate 0.0945 Epoch: 0 Global Step: 22980 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:18:36,042-Speed 2625.56 samples/sec Loss 16.3526 LearningRate 0.0945 Epoch: 0 Global Step: 22990 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:18:39,952-Speed 2619.62 samples/sec Loss 16.6539 LearningRate 0.0945 Epoch: 0 Global Step: 23000 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:18:43,890-Speed 2601.56 samples/sec Loss 16.6327 LearningRate 0.0945 Epoch: 0 Global Step: 23010 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:18:47,791-Speed 2625.24 samples/sec Loss 16.6432 LearningRate 0.0945 Epoch: 0 Global Step: 23020 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:18:51,691-Speed 2626.20 samples/sec Loss 16.4949 LearningRate 0.0945 Epoch: 0 Global Step: 23030 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:18:55,607-Speed 2615.79 samples/sec Loss 16.7332 LearningRate 0.0945 Epoch: 0 Global Step: 23040 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 22:18:59,496-Speed 2633.89 samples/sec Loss 16.5109 LearningRate 0.0945 Epoch: 0 Global Step: 23050 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:19:03,494-Speed 2561.36 samples/sec Loss 16.5240 LearningRate 0.0945 Epoch: 0 Global Step: 23060 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:19:07,402-Speed 2621.51 samples/sec Loss 16.6508 LearningRate 0.0945 Epoch: 0 Global Step: 23070 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:19:11,313-Speed 2618.40 samples/sec Loss 16.5687 LearningRate 0.0945 Epoch: 0 Global Step: 23080 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:19:15,222-Speed 2620.33 samples/sec Loss 16.7593 LearningRate 0.0945 Epoch: 0 Global Step: 23090 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:19:19,133-Speed 2619.34 samples/sec Loss 16.5569 LearningRate 0.0945 Epoch: 0 Global Step: 23100 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:19:23,056-Speed 2611.00 samples/sec Loss 16.3881 LearningRate 0.0945 Epoch: 0 Global Step: 23110 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:19:26,956-Speed 2626.09 samples/sec Loss 16.4379 LearningRate 0.0945 Epoch: 0 Global Step: 23120 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:19:30,866-Speed 2619.21 samples/sec Loss 16.4483 LearningRate 0.0945 Epoch: 0 Global Step: 23130 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:19:34,783-Speed 2614.79 samples/sec Loss 16.3667 LearningRate 0.0945 Epoch: 0 Global Step: 23140 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:19:38,679-Speed 2629.04 samples/sec Loss 16.5144 LearningRate 0.0945 Epoch: 0 Global Step: 23150 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:19:42,591-Speed 2618.07 samples/sec Loss 16.4234 LearningRate 0.0945 Epoch: 0 Global Step: 23160 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:19:46,484-Speed 2631.34 samples/sec Loss 16.5534 LearningRate 0.0945 Epoch: 0 Global Step: 23170 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:19:50,393-Speed 2620.47 samples/sec Loss 16.6336 LearningRate 0.0945 Epoch: 0 Global Step: 23180 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:19:54,293-Speed 2626.27 samples/sec Loss 16.4652 LearningRate 0.0945 Epoch: 0 Global Step: 23190 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:19:58,194-Speed 2625.49 samples/sec Loss 16.4841 LearningRate 0.0945 Epoch: 0 Global Step: 23200 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:20:02,094-Speed 2626.39 samples/sec Loss 16.5268 LearningRate 0.0945 Epoch: 0 Global Step: 23210 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:20:05,992-Speed 2627.54 samples/sec Loss 16.4997 LearningRate 0.0945 Epoch: 0 Global Step: 23220 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:20:09,893-Speed 2625.50 samples/sec Loss 16.6351 LearningRate 0.0945 Epoch: 0 Global Step: 23230 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:20:13,799-Speed 2622.11 samples/sec Loss 16.5465 LearningRate 0.0945 Epoch: 0 Global Step: 23240 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:20:17,710-Speed 2619.43 samples/sec Loss 16.5233 LearningRate 0.0945 Epoch: 0 Global Step: 23250 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:20:21,613-Speed 2624.16 samples/sec Loss 16.4842 LearningRate 0.0945 Epoch: 0 Global Step: 23260 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:20:25,520-Speed 2621.74 samples/sec Loss 16.6100 LearningRate 0.0945 Epoch: 0 Global Step: 23270 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:20:29,502-Speed 2572.28 samples/sec Loss 16.6170 LearningRate 0.0945 Epoch: 0 Global Step: 23280 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:20:33,413-Speed 2618.65 samples/sec Loss 16.4927 LearningRate 0.0945 Epoch: 0 Global Step: 23290 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:20:37,312-Speed 2626.64 samples/sec Loss 16.5420 LearningRate 0.0945 Epoch: 0 Global Step: 23300 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:20:41,226-Speed 2616.99 samples/sec Loss 16.3600 LearningRate 0.0945 Epoch: 0 Global Step: 23310 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:20:45,131-Speed 2622.89 samples/sec Loss 16.5680 LearningRate 0.0945 Epoch: 0 Global Step: 23320 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:20:49,037-Speed 2623.15 samples/sec Loss 16.5730 LearningRate 0.0945 Epoch: 0 Global Step: 23330 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:20:52,942-Speed 2622.58 samples/sec Loss 16.3344 LearningRate 0.0945 Epoch: 0 Global Step: 23340 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:20:56,843-Speed 2625.64 samples/sec Loss 16.2890 LearningRate 0.0944 Epoch: 0 Global Step: 23350 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:21:00,751-Speed 2620.70 samples/sec Loss 16.4035 LearningRate 0.0944 Epoch: 0 Global Step: 23360 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:21:04,666-Speed 2616.14 samples/sec Loss 16.3842 LearningRate 0.0944 Epoch: 0 Global Step: 23370 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 22:21:08,553-Speed 2634.50 samples/sec Loss 16.3626 LearningRate 0.0944 Epoch: 0 Global Step: 23380 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:21:12,460-Speed 2622.28 samples/sec Loss 16.5520 LearningRate 0.0944 Epoch: 0 Global Step: 23390 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:21:16,368-Speed 2620.94 samples/sec Loss 16.4575 LearningRate 0.0944 Epoch: 0 Global Step: 23400 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:21:20,254-Speed 2635.74 samples/sec Loss 16.4034 LearningRate 0.0944 Epoch: 0 Global Step: 23410 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:21:24,156-Speed 2625.16 samples/sec Loss 16.5067 LearningRate 0.0944 Epoch: 0 Global Step: 23420 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:21:28,058-Speed 2624.56 samples/sec Loss 16.2884 LearningRate 0.0944 Epoch: 0 Global Step: 23430 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:21:31,957-Speed 2627.04 samples/sec Loss 16.4724 LearningRate 0.0944 Epoch: 0 Global Step: 23440 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:21:35,856-Speed 2626.90 samples/sec Loss 16.5573 LearningRate 0.0944 Epoch: 0 Global Step: 23450 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:21:39,760-Speed 2623.16 samples/sec Loss 16.6909 LearningRate 0.0944 Epoch: 0 Global Step: 23460 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:21:43,668-Speed 2621.60 samples/sec Loss 16.2647 LearningRate 0.0944 Epoch: 0 Global Step: 23470 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:21:47,568-Speed 2626.33 samples/sec Loss 16.4521 LearningRate 0.0944 Epoch: 0 Global Step: 23480 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:21:51,470-Speed 2625.03 samples/sec Loss 16.3547 LearningRate 0.0944 Epoch: 0 Global Step: 23490 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:21:55,378-Speed 2620.37 samples/sec Loss 16.2704 LearningRate 0.0944 Epoch: 0 Global Step: 23500 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:21:59,365-Speed 2569.45 samples/sec Loss 16.5173 LearningRate 0.0944 Epoch: 0 Global Step: 23510 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:22:03,294-Speed 2606.54 samples/sec Loss 16.2632 LearningRate 0.0944 Epoch: 0 Global Step: 23520 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:22:07,191-Speed 2627.75 samples/sec Loss 16.3426 LearningRate 0.0944 Epoch: 0 Global Step: 23530 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:22:11,101-Speed 2619.71 samples/sec Loss 16.4848 LearningRate 0.0944 Epoch: 0 Global Step: 23540 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:22:15,004-Speed 2624.37 samples/sec Loss 16.3858 LearningRate 0.0944 Epoch: 0 Global Step: 23550 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:22:18,908-Speed 2624.05 samples/sec Loss 16.5043 LearningRate 0.0944 Epoch: 0 Global Step: 23560 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:22:22,808-Speed 2625.84 samples/sec Loss 16.3076 LearningRate 0.0944 Epoch: 0 Global Step: 23570 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:22:26,706-Speed 2627.62 samples/sec Loss 16.4921 LearningRate 0.0944 Epoch: 0 Global Step: 23580 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:22:30,624-Speed 2613.99 samples/sec Loss 16.3556 LearningRate 0.0944 Epoch: 0 Global Step: 23590 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:22:34,528-Speed 2623.77 samples/sec Loss 16.2611 LearningRate 0.0944 Epoch: 0 Global Step: 23600 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:22:38,415-Speed 2634.82 samples/sec Loss 16.4782 LearningRate 0.0944 Epoch: 0 Global Step: 23610 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:22:42,300-Speed 2636.80 samples/sec Loss 16.4489 LearningRate 0.0944 Epoch: 0 Global Step: 23620 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:22:46,206-Speed 2622.12 samples/sec Loss 16.4776 LearningRate 0.0944 Epoch: 0 Global Step: 23630 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:22:50,149-Speed 2597.39 samples/sec Loss 16.3204 LearningRate 0.0944 Epoch: 0 Global Step: 23640 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:22:54,045-Speed 2628.81 samples/sec Loss 16.3845 LearningRate 0.0944 Epoch: 0 Global Step: 23650 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:22:57,942-Speed 2629.22 samples/sec Loss 16.6311 LearningRate 0.0944 Epoch: 0 Global Step: 23660 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:23:01,843-Speed 2625.51 samples/sec Loss 16.4749 LearningRate 0.0944 Epoch: 0 Global Step: 23670 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:23:05,745-Speed 2624.69 samples/sec Loss 16.4679 LearningRate 0.0944 Epoch: 0 Global Step: 23680 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:23:09,649-Speed 2623.48 samples/sec Loss 16.3462 LearningRate 0.0944 Epoch: 0 Global Step: 23690 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:23:13,555-Speed 2622.18 samples/sec Loss 16.2171 LearningRate 0.0944 Epoch: 0 Global Step: 23700 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:23:17,460-Speed 2622.92 samples/sec Loss 16.2067 LearningRate 0.0944 Epoch: 0 Global Step: 23710 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:23:21,358-Speed 2627.41 samples/sec Loss 16.1076 LearningRate 0.0944 Epoch: 0 Global Step: 23720 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:23:25,263-Speed 2623.40 samples/sec Loss 16.2866 LearningRate 0.0944 Epoch: 0 Global Step: 23730 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:23:29,171-Speed 2621.14 samples/sec Loss 16.1951 LearningRate 0.0944 Epoch: 0 Global Step: 23740 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:23:33,066-Speed 2629.19 samples/sec Loss 16.4544 LearningRate 0.0944 Epoch: 0 Global Step: 23750 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:23:36,972-Speed 2622.31 samples/sec Loss 16.5291 LearningRate 0.0944 Epoch: 0 Global Step: 23760 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:23:40,868-Speed 2628.43 samples/sec Loss 16.4718 LearningRate 0.0944 Epoch: 0 Global Step: 23770 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:23:44,758-Speed 2633.27 samples/sec Loss 16.4226 LearningRate 0.0943 Epoch: 0 Global Step: 23780 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:23:48,662-Speed 2623.21 samples/sec Loss 16.4711 LearningRate 0.0943 Epoch: 0 Global Step: 23790 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:23:52,565-Speed 2624.08 samples/sec Loss 16.2945 LearningRate 0.0943 Epoch: 0 Global Step: 23800 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:23:56,463-Speed 2627.41 samples/sec Loss 16.4693 LearningRate 0.0943 Epoch: 0 Global Step: 23810 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:24:00,362-Speed 2627.66 samples/sec Loss 16.4158 LearningRate 0.0943 Epoch: 0 Global Step: 23820 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:24:04,258-Speed 2629.14 samples/sec Loss 16.2077 LearningRate 0.0943 Epoch: 0 Global Step: 23830 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:24:08,157-Speed 2626.19 samples/sec Loss 16.5161 LearningRate 0.0943 Epoch: 0 Global Step: 23840 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:24:12,056-Speed 2627.22 samples/sec Loss 16.3597 LearningRate 0.0943 Epoch: 0 Global Step: 23850 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:24:15,957-Speed 2625.75 samples/sec Loss 16.4629 LearningRate 0.0943 Epoch: 0 Global Step: 23860 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:24:19,857-Speed 2626.13 samples/sec Loss 16.5167 LearningRate 0.0943 Epoch: 0 Global Step: 23870 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:24:23,750-Speed 2630.28 samples/sec Loss 16.3669 LearningRate 0.0943 Epoch: 0 Global Step: 23880 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:24:27,656-Speed 2622.60 samples/sec Loss 16.3555 LearningRate 0.0943 Epoch: 0 Global Step: 23890 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:24:31,627-Speed 2579.33 samples/sec Loss 16.3344 LearningRate 0.0943 Epoch: 0 Global Step: 23900 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:24:35,537-Speed 2619.21 samples/sec Loss 16.4348 LearningRate 0.0943 Epoch: 0 Global Step: 23910 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:24:39,427-Speed 2633.43 samples/sec Loss 16.3685 LearningRate 0.0943 Epoch: 0 Global Step: 23920 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:24:43,463-Speed 2537.66 samples/sec Loss 16.4127 LearningRate 0.0943 Epoch: 0 Global Step: 23930 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:24:47,362-Speed 2627.31 samples/sec Loss 16.5432 LearningRate 0.0943 Epoch: 0 Global Step: 23940 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:24:51,263-Speed 2625.45 samples/sec Loss 16.3310 LearningRate 0.0943 Epoch: 0 Global Step: 23950 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:24:55,168-Speed 2622.80 samples/sec Loss 16.2016 LearningRate 0.0943 Epoch: 0 Global Step: 23960 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:24:59,070-Speed 2624.54 samples/sec Loss 16.4113 LearningRate 0.0943 Epoch: 0 Global Step: 23970 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:25:02,974-Speed 2623.26 samples/sec Loss 16.1036 LearningRate 0.0943 Epoch: 0 Global Step: 23980 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:25:06,873-Speed 2627.50 samples/sec Loss 16.3123 LearningRate 0.0943 Epoch: 0 Global Step: 23990 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:25:10,765-Speed 2631.56 samples/sec Loss 16.3975 LearningRate 0.0943 Epoch: 0 Global Step: 24000 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:25:14,685-Speed 2613.41 samples/sec Loss 16.3409 LearningRate 0.0943 Epoch: 0 Global Step: 24010 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:25:18,578-Speed 2630.86 samples/sec Loss 16.3821 LearningRate 0.0943 Epoch: 0 Global Step: 24020 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:25:22,486-Speed 2621.13 samples/sec Loss 16.5595 LearningRate 0.0943 Epoch: 0 Global Step: 24030 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:25:26,380-Speed 2630.43 samples/sec Loss 16.2933 LearningRate 0.0943 Epoch: 0 Global Step: 24040 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:25:30,277-Speed 2628.80 samples/sec Loss 16.2740 LearningRate 0.0943 Epoch: 0 Global Step: 24050 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:25:34,180-Speed 2624.00 samples/sec Loss 16.3091 LearningRate 0.0943 Epoch: 0 Global Step: 24060 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:25:38,084-Speed 2623.46 samples/sec Loss 16.3472 LearningRate 0.0943 Epoch: 0 Global Step: 24070 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:25:41,982-Speed 2627.46 samples/sec Loss 16.3115 LearningRate 0.0943 Epoch: 0 Global Step: 24080 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:25:45,894-Speed 2618.39 samples/sec Loss 16.0769 LearningRate 0.0943 Epoch: 0 Global Step: 24090 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:25:49,792-Speed 2627.88 samples/sec Loss 16.1843 LearningRate 0.0943 Epoch: 0 Global Step: 24100 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:25:53,690-Speed 2627.88 samples/sec Loss 16.3162 LearningRate 0.0943 Epoch: 0 Global Step: 24110 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:25:57,588-Speed 2627.45 samples/sec Loss 16.4758 LearningRate 0.0943 Epoch: 0 Global Step: 24120 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:26:01,495-Speed 2621.36 samples/sec Loss 16.3187 LearningRate 0.0943 Epoch: 0 Global Step: 24130 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:26:05,398-Speed 2624.04 samples/sec Loss 16.3519 LearningRate 0.0943 Epoch: 0 Global Step: 24140 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:26:09,298-Speed 2626.05 samples/sec Loss 16.2054 LearningRate 0.0943 Epoch: 0 Global Step: 24150 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:26:13,197-Speed 2627.52 samples/sec Loss 16.3593 LearningRate 0.0943 Epoch: 0 Global Step: 24160 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:26:17,099-Speed 2624.77 samples/sec Loss 16.2533 LearningRate 0.0943 Epoch: 0 Global Step: 24170 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:26:20,996-Speed 2628.62 samples/sec Loss 16.3028 LearningRate 0.0943 Epoch: 0 Global Step: 24180 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:26:24,897-Speed 2626.13 samples/sec Loss 16.4042 LearningRate 0.0943 Epoch: 0 Global Step: 24190 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:26:28,795-Speed 2627.28 samples/sec Loss 16.2423 LearningRate 0.0943 Epoch: 0 Global Step: 24200 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:26:32,677-Speed 2638.64 samples/sec Loss 16.3283 LearningRate 0.0942 Epoch: 0 Global Step: 24210 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:26:36,574-Speed 2628.36 samples/sec Loss 16.2913 LearningRate 0.0942 Epoch: 0 Global Step: 24220 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:26:40,475-Speed 2625.25 samples/sec Loss 16.3129 LearningRate 0.0942 Epoch: 0 Global Step: 24230 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:26:44,451-Speed 2576.47 samples/sec Loss 16.3265 LearningRate 0.0942 Epoch: 0 Global Step: 24240 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:26:48,363-Speed 2618.12 samples/sec Loss 16.1471 LearningRate 0.0942 Epoch: 0 Global Step: 24250 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:26:52,257-Speed 2630.53 samples/sec Loss 16.3599 LearningRate 0.0942 Epoch: 0 Global Step: 24260 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:26:56,154-Speed 2628.13 samples/sec Loss 16.3434 LearningRate 0.0942 Epoch: 0 Global Step: 24270 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:27:00,054-Speed 2626.86 samples/sec Loss 16.2524 LearningRate 0.0942 Epoch: 0 Global Step: 24280 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:27:03,955-Speed 2625.57 samples/sec Loss 16.3845 LearningRate 0.0942 Epoch: 0 Global Step: 24290 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:27:07,855-Speed 2625.92 samples/sec Loss 16.2886 LearningRate 0.0942 Epoch: 0 Global Step: 24300 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:27:11,752-Speed 2628.49 samples/sec Loss 16.2064 LearningRate 0.0942 Epoch: 0 Global Step: 24310 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:27:15,668-Speed 2615.59 samples/sec Loss 16.3439 LearningRate 0.0942 Epoch: 0 Global Step: 24320 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:27:19,570-Speed 2625.11 samples/sec Loss 16.3769 LearningRate 0.0942 Epoch: 0 Global Step: 24330 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:27:23,471-Speed 2625.75 samples/sec Loss 16.2354 LearningRate 0.0942 Epoch: 0 Global Step: 24340 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:27:27,376-Speed 2622.94 samples/sec Loss 16.3487 LearningRate 0.0942 Epoch: 0 Global Step: 24350 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:27:31,277-Speed 2625.78 samples/sec Loss 16.1938 LearningRate 0.0942 Epoch: 0 Global Step: 24360 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:27:35,177-Speed 2626.51 samples/sec Loss 16.3039 LearningRate 0.0942 Epoch: 0 Global Step: 24370 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:27:39,089-Speed 2617.52 samples/sec Loss 16.3368 LearningRate 0.0942 Epoch: 0 Global Step: 24380 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:27:43,028-Speed 2601.06 samples/sec Loss 16.1631 LearningRate 0.0942 Epoch: 0 Global Step: 24390 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:27:46,933-Speed 2622.74 samples/sec Loss 16.3247 LearningRate 0.0942 Epoch: 0 Global Step: 24400 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:27:50,817-Speed 2637.27 samples/sec Loss 16.1291 LearningRate 0.0942 Epoch: 0 Global Step: 24410 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:27:54,714-Speed 2627.86 samples/sec Loss 16.1441 LearningRate 0.0942 Epoch: 0 Global Step: 24420 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:27:58,615-Speed 2626.33 samples/sec Loss 16.1917 LearningRate 0.0942 Epoch: 0 Global Step: 24430 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:28:02,520-Speed 2623.15 samples/sec Loss 16.2824 LearningRate 0.0942 Epoch: 0 Global Step: 24440 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:28:06,433-Speed 2617.28 samples/sec Loss 15.9994 LearningRate 0.0942 Epoch: 0 Global Step: 24450 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:28:10,331-Speed 2627.06 samples/sec Loss 15.9221 LearningRate 0.0942 Epoch: 0 Global Step: 24460 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:28:14,242-Speed 2619.87 samples/sec Loss 16.2837 LearningRate 0.0942 Epoch: 0 Global Step: 24470 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:28:18,144-Speed 2624.88 samples/sec Loss 16.0933 LearningRate 0.0942 Epoch: 0 Global Step: 24480 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:28:22,043-Speed 2626.55 samples/sec Loss 16.1981 LearningRate 0.0942 Epoch: 0 Global Step: 24490 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:28:25,945-Speed 2624.86 samples/sec Loss 16.2517 LearningRate 0.0942 Epoch: 0 Global Step: 24500 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:28:29,845-Speed 2626.80 samples/sec Loss 16.1236 LearningRate 0.0942 Epoch: 0 Global Step: 24510 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:28:33,748-Speed 2623.95 samples/sec Loss 16.2278 LearningRate 0.0942 Epoch: 0 Global Step: 24520 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:28:37,669-Speed 2611.91 samples/sec Loss 16.2078 LearningRate 0.0942 Epoch: 0 Global Step: 24530 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:28:41,571-Speed 2625.33 samples/sec Loss 16.2233 LearningRate 0.0942 Epoch: 0 Global Step: 24540 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:28:45,493-Speed 2611.62 samples/sec Loss 16.2463 LearningRate 0.0942 Epoch: 0 Global Step: 24550 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:28:49,401-Speed 2621.57 samples/sec Loss 16.2629 LearningRate 0.0942 Epoch: 0 Global Step: 24560 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:28:53,324-Speed 2610.46 samples/sec Loss 16.1297 LearningRate 0.0942 Epoch: 0 Global Step: 24570 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:28:57,224-Speed 2626.51 samples/sec Loss 16.3627 LearningRate 0.0942 Epoch: 0 Global Step: 24580 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:29:01,133-Speed 2619.81 samples/sec Loss 16.2449 LearningRate 0.0942 Epoch: 0 Global Step: 24590 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:29:05,035-Speed 2625.45 samples/sec Loss 16.2264 LearningRate 0.0942 Epoch: 0 Global Step: 24600 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:29:08,923-Speed 2634.53 samples/sec Loss 16.0792 LearningRate 0.0942 Epoch: 0 Global Step: 24610 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:29:12,824-Speed 2625.66 samples/sec Loss 16.1326 LearningRate 0.0942 Epoch: 0 Global Step: 24620 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:29:16,719-Speed 2629.96 samples/sec Loss 16.1789 LearningRate 0.0942 Epoch: 0 Global Step: 24630 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:29:20,617-Speed 2627.70 samples/sec Loss 16.0838 LearningRate 0.0941 Epoch: 0 Global Step: 24640 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:29:24,519-Speed 2624.98 samples/sec Loss 16.1601 LearningRate 0.0941 Epoch: 0 Global Step: 24650 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:29:28,419-Speed 2626.29 samples/sec Loss 16.2356 LearningRate 0.0941 Epoch: 0 Global Step: 24660 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:29:32,320-Speed 2625.65 samples/sec Loss 16.2092 LearningRate 0.0941 Epoch: 0 Global Step: 24670 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:29:36,228-Speed 2621.01 samples/sec Loss 16.1104 LearningRate 0.0941 Epoch: 0 Global Step: 24680 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:29:40,131-Speed 2624.25 samples/sec Loss 16.2075 LearningRate 0.0941 Epoch: 0 Global Step: 24690 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:29:44,026-Speed 2629.55 samples/sec Loss 16.2501 LearningRate 0.0941 Epoch: 0 Global Step: 24700 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:29:47,940-Speed 2617.12 samples/sec Loss 16.1106 LearningRate 0.0941 Epoch: 0 Global Step: 24710 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 22:29:51,825-Speed 2636.64 samples/sec Loss 16.0720 LearningRate 0.0941 Epoch: 0 Global Step: 24720 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:29:55,726-Speed 2625.51 samples/sec Loss 16.1187 LearningRate 0.0941 Epoch: 0 Global Step: 24730 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:29:59,632-Speed 2621.64 samples/sec Loss 15.9715 LearningRate 0.0941 Epoch: 0 Global Step: 24740 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:30:03,530-Speed 2627.82 samples/sec Loss 16.1906 LearningRate 0.0941 Epoch: 0 Global Step: 24750 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:30:07,430-Speed 2626.81 samples/sec Loss 16.1093 LearningRate 0.0941 Epoch: 0 Global Step: 24760 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:30:11,324-Speed 2630.61 samples/sec Loss 16.2786 LearningRate 0.0941 Epoch: 0 Global Step: 24770 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:30:15,220-Speed 2628.60 samples/sec Loss 16.2142 LearningRate 0.0941 Epoch: 0 Global Step: 24780 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:30:19,116-Speed 2629.32 samples/sec Loss 16.0429 LearningRate 0.0941 Epoch: 0 Global Step: 24790 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:30:23,016-Speed 2625.47 samples/sec Loss 16.2146 LearningRate 0.0941 Epoch: 0 Global Step: 24800 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:30:26,914-Speed 2627.96 samples/sec Loss 16.1350 LearningRate 0.0941 Epoch: 0 Global Step: 24810 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:30:30,793-Speed 2639.83 samples/sec Loss 16.0570 LearningRate 0.0941 Epoch: 0 Global Step: 24820 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:30:34,693-Speed 2627.07 samples/sec Loss 16.2219 LearningRate 0.0941 Epoch: 0 Global Step: 24830 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:30:38,578-Speed 2636.70 samples/sec Loss 16.0566 LearningRate 0.0941 Epoch: 0 Global Step: 24840 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:30:42,473-Speed 2629.86 samples/sec Loss 15.9370 LearningRate 0.0941 Epoch: 0 Global Step: 24850 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:30:46,369-Speed 2628.73 samples/sec Loss 16.0747 LearningRate 0.0941 Epoch: 0 Global Step: 24860 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:30:50,270-Speed 2625.69 samples/sec Loss 16.2535 LearningRate 0.0941 Epoch: 0 Global Step: 24870 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:30:54,168-Speed 2627.39 samples/sec Loss 16.1590 LearningRate 0.0941 Epoch: 0 Global Step: 24880 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:30:58,074-Speed 2622.70 samples/sec Loss 16.2205 LearningRate 0.0941 Epoch: 0 Global Step: 24890 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:31:01,973-Speed 2626.20 samples/sec Loss 16.1926 LearningRate 0.0941 Epoch: 0 Global Step: 24900 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:31:05,874-Speed 2625.66 samples/sec Loss 15.9412 LearningRate 0.0941 Epoch: 0 Global Step: 24910 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:31:09,777-Speed 2624.70 samples/sec Loss 16.1645 LearningRate 0.0941 Epoch: 0 Global Step: 24920 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:31:13,681-Speed 2623.20 samples/sec Loss 15.9605 LearningRate 0.0941 Epoch: 0 Global Step: 24930 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:31:17,589-Speed 2621.30 samples/sec Loss 16.1188 LearningRate 0.0941 Epoch: 0 Global Step: 24940 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:31:21,484-Speed 2629.62 samples/sec Loss 16.0613 LearningRate 0.0941 Epoch: 0 Global Step: 24950 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:31:25,378-Speed 2630.28 samples/sec Loss 16.1395 LearningRate 0.0941 Epoch: 0 Global Step: 24960 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:31:29,280-Speed 2624.57 samples/sec Loss 16.0491 LearningRate 0.0941 Epoch: 0 Global Step: 24970 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:31:33,177-Speed 2628.25 samples/sec Loss 16.0942 LearningRate 0.0941 Epoch: 0 Global Step: 24980 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:31:37,073-Speed 2628.94 samples/sec Loss 15.9214 LearningRate 0.0941 Epoch: 0 Global Step: 24990 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:31:40,974-Speed 2625.68 samples/sec Loss 16.2773 LearningRate 0.0941 Epoch: 0 Global Step: 25000 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:31:44,886-Speed 2618.74 samples/sec Loss 16.1864 LearningRate 0.0941 Epoch: 0 Global Step: 25010 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:31:48,780-Speed 2629.72 samples/sec Loss 16.0812 LearningRate 0.0941 Epoch: 0 Global Step: 25020 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:31:52,682-Speed 2625.59 samples/sec Loss 16.0169 LearningRate 0.0941 Epoch: 0 Global Step: 25030 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:31:56,582-Speed 2626.30 samples/sec Loss 16.0744 LearningRate 0.0941 Epoch: 0 Global Step: 25040 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 22:32:00,459-Speed 2642.06 samples/sec Loss 16.1355 LearningRate 0.0941 Epoch: 0 Global Step: 25050 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:32:04,360-Speed 2625.39 samples/sec Loss 15.9694 LearningRate 0.0940 Epoch: 0 Global Step: 25060 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:32:08,282-Speed 2611.09 samples/sec Loss 15.9633 LearningRate 0.0940 Epoch: 0 Global Step: 25070 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:32:12,181-Speed 2627.21 samples/sec Loss 16.0465 LearningRate 0.0940 Epoch: 0 Global Step: 25080 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:32:16,077-Speed 2629.47 samples/sec Loss 16.1060 LearningRate 0.0940 Epoch: 0 Global Step: 25090 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:32:19,977-Speed 2626.99 samples/sec Loss 16.2154 LearningRate 0.0940 Epoch: 0 Global Step: 25100 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:32:23,879-Speed 2624.85 samples/sec Loss 16.0709 LearningRate 0.0940 Epoch: 0 Global Step: 25110 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:32:27,778-Speed 2626.73 samples/sec Loss 16.1173 LearningRate 0.0940 Epoch: 0 Global Step: 25120 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:32:31,678-Speed 2626.67 samples/sec Loss 16.2152 LearningRate 0.0940 Epoch: 0 Global Step: 25130 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:32:35,576-Speed 2627.39 samples/sec Loss 16.0528 LearningRate 0.0940 Epoch: 0 Global Step: 25140 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:32:39,457-Speed 2639.00 samples/sec Loss 16.0490 LearningRate 0.0940 Epoch: 0 Global Step: 25150 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:32:43,355-Speed 2627.90 samples/sec Loss 16.1691 LearningRate 0.0940 Epoch: 0 Global Step: 25160 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:32:47,252-Speed 2628.53 samples/sec Loss 16.1384 LearningRate 0.0940 Epoch: 0 Global Step: 25170 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:32:51,192-Speed 2599.24 samples/sec Loss 15.9610 LearningRate 0.0940 Epoch: 0 Global Step: 25180 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:32:55,096-Speed 2624.11 samples/sec Loss 15.9938 LearningRate 0.0940 Epoch: 0 Global Step: 25190 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:32:58,992-Speed 2629.03 samples/sec Loss 16.1451 LearningRate 0.0940 Epoch: 0 Global Step: 25200 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:33:02,902-Speed 2619.53 samples/sec Loss 16.0753 LearningRate 0.0940 Epoch: 0 Global Step: 25210 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:33:06,801-Speed 2626.82 samples/sec Loss 16.0884 LearningRate 0.0940 Epoch: 0 Global Step: 25220 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:33:10,702-Speed 2625.77 samples/sec Loss 16.1005 LearningRate 0.0940 Epoch: 0 Global Step: 25230 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:33:14,614-Speed 2618.52 samples/sec Loss 16.0414 LearningRate 0.0940 Epoch: 0 Global Step: 25240 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:33:18,514-Speed 2626.58 samples/sec Loss 16.1543 LearningRate 0.0940 Epoch: 0 Global Step: 25250 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 22:33:22,580-Speed 2519.01 samples/sec Loss 15.9576 LearningRate 0.0940 Epoch: 0 Global Step: 25260 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:33:26,499-Speed 2613.55 samples/sec Loss 16.2170 LearningRate 0.0940 Epoch: 0 Global Step: 25270 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:33:30,430-Speed 2605.80 samples/sec Loss 16.1568 LearningRate 0.0940 Epoch: 0 Global Step: 25280 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:33:34,329-Speed 2627.12 samples/sec Loss 16.0414 LearningRate 0.0940 Epoch: 0 Global Step: 25290 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:33:38,227-Speed 2627.73 samples/sec Loss 15.9548 LearningRate 0.0940 Epoch: 0 Global Step: 25300 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:33:42,124-Speed 2627.66 samples/sec Loss 16.1961 LearningRate 0.0940 Epoch: 0 Global Step: 25310 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:33:46,020-Speed 2629.04 samples/sec Loss 16.0834 LearningRate 0.0940 Epoch: 0 Global Step: 25320 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:33:49,927-Speed 2622.13 samples/sec Loss 15.8710 LearningRate 0.0940 Epoch: 0 Global Step: 25330 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:33:53,968-Speed 2534.76 samples/sec Loss 15.9740 LearningRate 0.0940 Epoch: 0 Global Step: 25340 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:33:57,981-Speed 2552.36 samples/sec Loss 16.1969 LearningRate 0.0940 Epoch: 0 Global Step: 25350 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:34:01,863-Speed 2638.57 samples/sec Loss 16.1169 LearningRate 0.0940 Epoch: 0 Global Step: 25360 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:34:05,763-Speed 2626.07 samples/sec Loss 16.1198 LearningRate 0.0940 Epoch: 0 Global Step: 25370 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:34:09,686-Speed 2610.73 samples/sec Loss 16.2013 LearningRate 0.0940 Epoch: 0 Global Step: 25380 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:34:13,582-Speed 2633.26 samples/sec Loss 16.0836 LearningRate 0.0940 Epoch: 0 Global Step: 25390 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:34:17,484-Speed 2624.38 samples/sec Loss 16.0941 LearningRate 0.0940 Epoch: 0 Global Step: 25400 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:34:21,398-Speed 2616.97 samples/sec Loss 16.0220 LearningRate 0.0940 Epoch: 0 Global Step: 25410 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:34:25,294-Speed 2629.26 samples/sec Loss 16.0441 LearningRate 0.0940 Epoch: 0 Global Step: 25420 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:34:29,178-Speed 2636.95 samples/sec Loss 15.8189 LearningRate 0.0940 Epoch: 0 Global Step: 25430 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:34:33,085-Speed 2621.59 samples/sec Loss 16.0026 LearningRate 0.0940 Epoch: 0 Global Step: 25440 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:34:36,982-Speed 2628.35 samples/sec Loss 15.8957 LearningRate 0.0940 Epoch: 0 Global Step: 25450 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:34:40,879-Speed 2627.98 samples/sec Loss 16.0913 LearningRate 0.0940 Epoch: 0 Global Step: 25460 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:34:44,775-Speed 2629.44 samples/sec Loss 16.0890 LearningRate 0.0940 Epoch: 0 Global Step: 25470 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:34:48,671-Speed 2628.85 samples/sec Loss 16.0451 LearningRate 0.0940 Epoch: 0 Global Step: 25480 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:34:52,567-Speed 2628.93 samples/sec Loss 16.0027 LearningRate 0.0939 Epoch: 0 Global Step: 25490 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:34:56,463-Speed 2629.27 samples/sec Loss 16.0866 LearningRate 0.0939 Epoch: 0 Global Step: 25500 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:35:00,456-Speed 2564.80 samples/sec Loss 16.2367 LearningRate 0.0939 Epoch: 0 Global Step: 25510 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:35:04,354-Speed 2627.48 samples/sec Loss 15.9935 LearningRate 0.0939 Epoch: 0 Global Step: 25520 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:35:08,271-Speed 2615.72 samples/sec Loss 16.0045 LearningRate 0.0939 Epoch: 0 Global Step: 25530 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:35:12,180-Speed 2620.19 samples/sec Loss 16.0332 LearningRate 0.0939 Epoch: 0 Global Step: 25540 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:35:16,086-Speed 2622.22 samples/sec Loss 15.8460 LearningRate 0.0939 Epoch: 0 Global Step: 25550 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:35:19,990-Speed 2623.10 samples/sec Loss 16.1433 LearningRate 0.0939 Epoch: 0 Global Step: 25560 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:35:23,897-Speed 2622.08 samples/sec Loss 15.9658 LearningRate 0.0939 Epoch: 0 Global Step: 25570 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:35:27,818-Speed 2612.09 samples/sec Loss 16.1088 LearningRate 0.0939 Epoch: 0 Global Step: 25580 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:35:31,716-Speed 2627.31 samples/sec Loss 16.0262 LearningRate 0.0939 Epoch: 0 Global Step: 25590 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:35:35,616-Speed 2625.85 samples/sec Loss 16.0518 LearningRate 0.0939 Epoch: 0 Global Step: 25600 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:35:39,518-Speed 2625.68 samples/sec Loss 16.0651 LearningRate 0.0939 Epoch: 0 Global Step: 25610 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:35:43,421-Speed 2624.11 samples/sec Loss 16.0537 LearningRate 0.0939 Epoch: 0 Global Step: 25620 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:35:47,319-Speed 2627.78 samples/sec Loss 16.0581 LearningRate 0.0939 Epoch: 0 Global Step: 25630 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 22:35:51,205-Speed 2636.32 samples/sec Loss 16.0348 LearningRate 0.0939 Epoch: 0 Global Step: 25640 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:35:55,116-Speed 2619.33 samples/sec Loss 16.0158 LearningRate 0.0939 Epoch: 0 Global Step: 25650 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:35:59,012-Speed 2628.42 samples/sec Loss 16.0493 LearningRate 0.0939 Epoch: 0 Global Step: 25660 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:36:02,911-Speed 2626.80 samples/sec Loss 15.9929 LearningRate 0.0939 Epoch: 0 Global Step: 25670 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:36:06,822-Speed 2618.44 samples/sec Loss 16.0816 LearningRate 0.0939 Epoch: 0 Global Step: 25680 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:36:10,724-Speed 2625.87 samples/sec Loss 15.9443 LearningRate 0.0939 Epoch: 0 Global Step: 25690 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:36:14,623-Speed 2626.93 samples/sec Loss 16.1148 LearningRate 0.0939 Epoch: 0 Global Step: 25700 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:36:18,521-Speed 2627.81 samples/sec Loss 15.8212 LearningRate 0.0939 Epoch: 0 Global Step: 25710 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:36:22,417-Speed 2628.93 samples/sec Loss 15.9731 LearningRate 0.0939 Epoch: 0 Global Step: 25720 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:36:26,313-Speed 2629.14 samples/sec Loss 16.0559 LearningRate 0.0939 Epoch: 0 Global Step: 25730 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:36:30,207-Speed 2629.64 samples/sec Loss 15.8981 LearningRate 0.0939 Epoch: 0 Global Step: 25740 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 22:36:34,101-Speed 2630.65 samples/sec Loss 15.9739 LearningRate 0.0939 Epoch: 0 Global Step: 25750 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:36:37,998-Speed 2628.64 samples/sec Loss 16.2543 LearningRate 0.0939 Epoch: 0 Global Step: 25760 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:36:41,905-Speed 2621.43 samples/sec Loss 16.0802 LearningRate 0.0939 Epoch: 0 Global Step: 25770 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:36:45,806-Speed 2625.75 samples/sec Loss 16.0102 LearningRate 0.0939 Epoch: 0 Global Step: 25780 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:36:49,711-Speed 2622.64 samples/sec Loss 15.9279 LearningRate 0.0939 Epoch: 0 Global Step: 25790 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:36:53,610-Speed 2627.81 samples/sec Loss 15.8773 LearningRate 0.0939 Epoch: 0 Global Step: 25800 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:36:57,510-Speed 2626.14 samples/sec Loss 15.7401 LearningRate 0.0939 Epoch: 0 Global Step: 25810 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:37:01,412-Speed 2624.84 samples/sec Loss 15.9810 LearningRate 0.0939 Epoch: 0 Global Step: 25820 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:37:05,310-Speed 2626.90 samples/sec Loss 15.9846 LearningRate 0.0939 Epoch: 0 Global Step: 25830 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:37:09,210-Speed 2627.11 samples/sec Loss 16.0951 LearningRate 0.0939 Epoch: 0 Global Step: 25840 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:37:13,089-Speed 2640.08 samples/sec Loss 16.0267 LearningRate 0.0939 Epoch: 0 Global Step: 25850 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:37:16,992-Speed 2624.15 samples/sec Loss 16.0138 LearningRate 0.0939 Epoch: 0 Global Step: 25860 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:37:20,895-Speed 2624.42 samples/sec Loss 16.1036 LearningRate 0.0939 Epoch: 0 Global Step: 25870 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:37:24,796-Speed 2625.42 samples/sec Loss 16.0055 LearningRate 0.0939 Epoch: 0 Global Step: 25880 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:37:28,695-Speed 2627.59 samples/sec Loss 16.1422 LearningRate 0.0939 Epoch: 0 Global Step: 25890 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:37:32,597-Speed 2624.80 samples/sec Loss 15.8156 LearningRate 0.0939 Epoch: 0 Global Step: 25900 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:37:36,496-Speed 2626.24 samples/sec Loss 15.9555 LearningRate 0.0939 Epoch: 0 Global Step: 25910 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:37:40,397-Speed 2626.16 samples/sec Loss 16.0049 LearningRate 0.0938 Epoch: 0 Global Step: 25920 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:37:44,295-Speed 2627.86 samples/sec Loss 15.9690 LearningRate 0.0938 Epoch: 0 Global Step: 25930 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:37:48,193-Speed 2627.58 samples/sec Loss 16.0190 LearningRate 0.0938 Epoch: 0 Global Step: 25940 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:37:52,080-Speed 2635.42 samples/sec Loss 15.9247 LearningRate 0.0938 Epoch: 0 Global Step: 25950 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:37:55,979-Speed 2626.87 samples/sec Loss 15.8363 LearningRate 0.0938 Epoch: 0 Global Step: 25960 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:37:59,876-Speed 2628.01 samples/sec Loss 16.0192 LearningRate 0.0938 Epoch: 0 Global Step: 25970 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:38:03,777-Speed 2625.42 samples/sec Loss 16.0301 LearningRate 0.0938 Epoch: 0 Global Step: 25980 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:38:07,669-Speed 2631.34 samples/sec Loss 16.0161 LearningRate 0.0938 Epoch: 0 Global Step: 25990 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:38:11,563-Speed 2630.77 samples/sec Loss 15.8810 LearningRate 0.0938 Epoch: 0 Global Step: 26000 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:38:15,466-Speed 2624.67 samples/sec Loss 15.9354 LearningRate 0.0938 Epoch: 0 Global Step: 26010 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:38:19,368-Speed 2624.47 samples/sec Loss 16.0381 LearningRate 0.0938 Epoch: 0 Global Step: 26020 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:38:23,280-Speed 2618.68 samples/sec Loss 15.8604 LearningRate 0.0938 Epoch: 0 Global Step: 26030 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:38:27,181-Speed 2625.98 samples/sec Loss 15.9873 LearningRate 0.0938 Epoch: 0 Global Step: 26040 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:38:31,091-Speed 2619.44 samples/sec Loss 15.8689 LearningRate 0.0938 Epoch: 0 Global Step: 26050 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:38:34,988-Speed 2628.04 samples/sec Loss 15.9226 LearningRate 0.0938 Epoch: 0 Global Step: 26060 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:38:38,887-Speed 2626.91 samples/sec Loss 15.6784 LearningRate 0.0938 Epoch: 0 Global Step: 26070 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:38:42,788-Speed 2625.87 samples/sec Loss 15.8579 LearningRate 0.0938 Epoch: 0 Global Step: 26080 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:38:46,691-Speed 2624.75 samples/sec Loss 16.0332 LearningRate 0.0938 Epoch: 0 Global Step: 26090 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:38:50,597-Speed 2622.28 samples/sec Loss 15.8994 LearningRate 0.0938 Epoch: 0 Global Step: 26100 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:38:54,504-Speed 2621.41 samples/sec Loss 15.9975 LearningRate 0.0938 Epoch: 0 Global Step: 26110 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:38:58,403-Speed 2627.47 samples/sec Loss 15.9908 LearningRate 0.0938 Epoch: 0 Global Step: 26120 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:39:02,298-Speed 2629.35 samples/sec Loss 15.8118 LearningRate 0.0938 Epoch: 0 Global Step: 26130 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:39:06,264-Speed 2582.37 samples/sec Loss 16.0199 LearningRate 0.0938 Epoch: 0 Global Step: 26140 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:39:10,201-Speed 2601.69 samples/sec Loss 16.0106 LearningRate 0.0938 Epoch: 0 Global Step: 26150 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:39:14,101-Speed 2626.22 samples/sec Loss 15.8954 LearningRate 0.0938 Epoch: 0 Global Step: 26160 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:39:18,010-Speed 2620.18 samples/sec Loss 15.9299 LearningRate 0.0938 Epoch: 0 Global Step: 26170 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:39:21,919-Speed 2620.49 samples/sec Loss 15.9896 LearningRate 0.0938 Epoch: 0 Global Step: 26180 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:39:25,813-Speed 2629.67 samples/sec Loss 16.0644 LearningRate 0.0938 Epoch: 0 Global Step: 26190 Fp16 Grad Scale: 524288 Required: 90 hours
Training: 2022-04-12 22:39:29,696-Speed 2638.67 samples/sec Loss 15.9947 LearningRate 0.0938 Epoch: 0 Global Step: 26200 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:39:33,570-Speed 2643.31 samples/sec Loss 15.8799 LearningRate 0.0938 Epoch: 0 Global Step: 26210 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:39:37,472-Speed 2624.89 samples/sec Loss 15.8969 LearningRate 0.0938 Epoch: 0 Global Step: 26220 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:39:41,373-Speed 2625.30 samples/sec Loss 15.9086 LearningRate 0.0938 Epoch: 0 Global Step: 26230 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:39:45,277-Speed 2624.10 samples/sec Loss 15.9509 LearningRate 0.0938 Epoch: 0 Global Step: 26240 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:39:49,178-Speed 2625.28 samples/sec Loss 15.7858 LearningRate 0.0938 Epoch: 0 Global Step: 26250 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:39:53,080-Speed 2625.22 samples/sec Loss 15.9277 LearningRate 0.0938 Epoch: 0 Global Step: 26260 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:39:56,981-Speed 2625.39 samples/sec Loss 15.7109 LearningRate 0.0938 Epoch: 0 Global Step: 26270 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:40:00,885-Speed 2623.73 samples/sec Loss 15.9135 LearningRate 0.0938 Epoch: 0 Global Step: 26280 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:40:04,793-Speed 2620.63 samples/sec Loss 15.9336 LearningRate 0.0938 Epoch: 0 Global Step: 26290 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:40:08,692-Speed 2626.56 samples/sec Loss 15.8089 LearningRate 0.0938 Epoch: 0 Global Step: 26300 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:40:12,593-Speed 2625.69 samples/sec Loss 15.7900 LearningRate 0.0938 Epoch: 0 Global Step: 26310 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:40:16,501-Speed 2620.83 samples/sec Loss 15.7935 LearningRate 0.0938 Epoch: 0 Global Step: 26320 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:40:20,422-Speed 2612.73 samples/sec Loss 15.9531 LearningRate 0.0938 Epoch: 0 Global Step: 26330 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:40:24,322-Speed 2626.60 samples/sec Loss 15.9588 LearningRate 0.0938 Epoch: 0 Global Step: 26340 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:40:28,219-Speed 2627.92 samples/sec Loss 15.8179 LearningRate 0.0937 Epoch: 0 Global Step: 26350 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:40:32,145-Speed 2609.07 samples/sec Loss 15.8838 LearningRate 0.0937 Epoch: 0 Global Step: 26360 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:40:36,046-Speed 2625.62 samples/sec Loss 15.9577 LearningRate 0.0937 Epoch: 0 Global Step: 26370 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:40:39,945-Speed 2627.02 samples/sec Loss 16.0394 LearningRate 0.0937 Epoch: 0 Global Step: 26380 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:40:43,857-Speed 2618.35 samples/sec Loss 15.8536 LearningRate 0.0937 Epoch: 0 Global Step: 26390 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:40:47,765-Speed 2621.63 samples/sec Loss 15.8087 LearningRate 0.0937 Epoch: 0 Global Step: 26400 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:40:51,661-Speed 2628.61 samples/sec Loss 15.7733 LearningRate 0.0937 Epoch: 0 Global Step: 26410 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:40:55,545-Speed 2637.30 samples/sec Loss 15.7793 LearningRate 0.0937 Epoch: 0 Global Step: 26420 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:40:59,445-Speed 2626.22 samples/sec Loss 15.9591 LearningRate 0.0937 Epoch: 0 Global Step: 26430 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:41:03,346-Speed 2625.45 samples/sec Loss 15.8481 LearningRate 0.0937 Epoch: 0 Global Step: 26440 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:41:07,240-Speed 2630.47 samples/sec Loss 15.8866 LearningRate 0.0937 Epoch: 0 Global Step: 26450 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:41:11,133-Speed 2631.04 samples/sec Loss 15.8636 LearningRate 0.0937 Epoch: 0 Global Step: 26460 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:41:15,029-Speed 2628.87 samples/sec Loss 15.7671 LearningRate 0.0937 Epoch: 0 Global Step: 26470 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:41:18,922-Speed 2631.44 samples/sec Loss 15.9520 LearningRate 0.0937 Epoch: 0 Global Step: 26480 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:41:22,816-Speed 2630.58 samples/sec Loss 15.9080 LearningRate 0.0937 Epoch: 0 Global Step: 26490 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:41:26,716-Speed 2626.05 samples/sec Loss 15.8573 LearningRate 0.0937 Epoch: 0 Global Step: 26500 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:41:30,611-Speed 2629.92 samples/sec Loss 15.8011 LearningRate 0.0937 Epoch: 0 Global Step: 26510 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:41:34,510-Speed 2626.15 samples/sec Loss 15.8501 LearningRate 0.0937 Epoch: 0 Global Step: 26520 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:41:38,406-Speed 2629.26 samples/sec Loss 15.7513 LearningRate 0.0937 Epoch: 0 Global Step: 26530 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:41:42,333-Speed 2608.53 samples/sec Loss 15.7431 LearningRate 0.0937 Epoch: 0 Global Step: 26540 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:41:46,229-Speed 2628.65 samples/sec Loss 15.6443 LearningRate 0.0937 Epoch: 0 Global Step: 26550 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:41:50,124-Speed 2629.53 samples/sec Loss 15.8854 LearningRate 0.0937 Epoch: 0 Global Step: 26560 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:41:54,013-Speed 2634.17 samples/sec Loss 15.9721 LearningRate 0.0937 Epoch: 0 Global Step: 26570 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:41:57,920-Speed 2621.56 samples/sec Loss 15.7988 LearningRate 0.0937 Epoch: 0 Global Step: 26580 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:42:01,815-Speed 2629.31 samples/sec Loss 15.9153 LearningRate 0.0937 Epoch: 0 Global Step: 26590 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:42:05,711-Speed 2628.92 samples/sec Loss 15.7916 LearningRate 0.0937 Epoch: 0 Global Step: 26600 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:42:09,614-Speed 2624.60 samples/sec Loss 15.7534 LearningRate 0.0937 Epoch: 0 Global Step: 26610 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:42:13,509-Speed 2629.59 samples/sec Loss 15.7672 LearningRate 0.0937 Epoch: 0 Global Step: 26620 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:42:17,396-Speed 2635.07 samples/sec Loss 15.8038 LearningRate 0.0937 Epoch: 0 Global Step: 26630 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:42:21,290-Speed 2629.84 samples/sec Loss 15.9071 LearningRate 0.0937 Epoch: 0 Global Step: 26640 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:42:25,186-Speed 2629.78 samples/sec Loss 15.7947 LearningRate 0.0937 Epoch: 0 Global Step: 26650 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:42:29,079-Speed 2630.75 samples/sec Loss 15.8731 LearningRate 0.0937 Epoch: 0 Global Step: 26660 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:42:32,973-Speed 2630.50 samples/sec Loss 15.8794 LearningRate 0.0937 Epoch: 0 Global Step: 26670 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:42:36,873-Speed 2625.82 samples/sec Loss 15.7602 LearningRate 0.0937 Epoch: 0 Global Step: 26680 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:42:40,768-Speed 2629.85 samples/sec Loss 15.8061 LearningRate 0.0937 Epoch: 0 Global Step: 26690 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:42:44,663-Speed 2629.67 samples/sec Loss 15.7881 LearningRate 0.0937 Epoch: 0 Global Step: 26700 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:42:48,561-Speed 2627.21 samples/sec Loss 15.8726 LearningRate 0.0937 Epoch: 0 Global Step: 26710 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:42:52,462-Speed 2625.94 samples/sec Loss 15.9314 LearningRate 0.0937 Epoch: 0 Global Step: 26720 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:42:56,357-Speed 2629.52 samples/sec Loss 15.9190 LearningRate 0.0937 Epoch: 0 Global Step: 26730 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:43:00,254-Speed 2628.08 samples/sec Loss 15.8098 LearningRate 0.0937 Epoch: 0 Global Step: 26740 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:43:04,154-Speed 2626.48 samples/sec Loss 15.8906 LearningRate 0.0937 Epoch: 0 Global Step: 26750 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:43:08,057-Speed 2624.48 samples/sec Loss 15.8146 LearningRate 0.0937 Epoch: 0 Global Step: 26760 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:43:11,955-Speed 2627.68 samples/sec Loss 15.9297 LearningRate 0.0937 Epoch: 0 Global Step: 26770 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:43:15,856-Speed 2625.38 samples/sec Loss 15.8084 LearningRate 0.0936 Epoch: 0 Global Step: 26780 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:43:19,757-Speed 2625.09 samples/sec Loss 15.9453 LearningRate 0.0936 Epoch: 0 Global Step: 26790 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:43:23,664-Speed 2622.19 samples/sec Loss 15.7521 LearningRate 0.0936 Epoch: 0 Global Step: 26800 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:43:27,580-Speed 2615.26 samples/sec Loss 15.8158 LearningRate 0.0936 Epoch: 0 Global Step: 26810 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:43:31,489-Speed 2620.12 samples/sec Loss 15.6547 LearningRate 0.0936 Epoch: 0 Global Step: 26820 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:43:35,369-Speed 2639.75 samples/sec Loss 15.8239 LearningRate 0.0936 Epoch: 0 Global Step: 26830 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:43:39,268-Speed 2627.27 samples/sec Loss 15.6381 LearningRate 0.0936 Epoch: 0 Global Step: 26840 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:43:43,166-Speed 2627.49 samples/sec Loss 15.7380 LearningRate 0.0936 Epoch: 0 Global Step: 26850 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:43:47,071-Speed 2622.66 samples/sec Loss 15.7523 LearningRate 0.0936 Epoch: 0 Global Step: 26860 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:43:50,980-Speed 2620.92 samples/sec Loss 15.7960 LearningRate 0.0936 Epoch: 0 Global Step: 26870 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:43:54,880-Speed 2625.69 samples/sec Loss 15.6837 LearningRate 0.0936 Epoch: 0 Global Step: 26880 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:43:58,787-Speed 2621.89 samples/sec Loss 15.8028 LearningRate 0.0936 Epoch: 0 Global Step: 26890 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:44:02,668-Speed 2639.26 samples/sec Loss 15.7885 LearningRate 0.0936 Epoch: 0 Global Step: 26900 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:44:06,568-Speed 2625.65 samples/sec Loss 15.8761 LearningRate 0.0936 Epoch: 0 Global Step: 26910 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:44:10,472-Speed 2623.30 samples/sec Loss 15.6912 LearningRate 0.0936 Epoch: 0 Global Step: 26920 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:44:14,375-Speed 2624.58 samples/sec Loss 15.7536 LearningRate 0.0936 Epoch: 0 Global Step: 26930 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:44:18,274-Speed 2626.90 samples/sec Loss 15.5660 LearningRate 0.0936 Epoch: 0 Global Step: 26940 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:44:22,177-Speed 2624.42 samples/sec Loss 15.7890 LearningRate 0.0936 Epoch: 0 Global Step: 26950 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:44:26,075-Speed 2627.88 samples/sec Loss 15.7575 LearningRate 0.0936 Epoch: 0 Global Step: 26960 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:44:29,982-Speed 2621.42 samples/sec Loss 15.8769 LearningRate 0.0936 Epoch: 0 Global Step: 26970 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:44:33,883-Speed 2625.41 samples/sec Loss 15.8677 LearningRate 0.0936 Epoch: 0 Global Step: 26980 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:44:37,763-Speed 2639.45 samples/sec Loss 15.7182 LearningRate 0.0936 Epoch: 0 Global Step: 26990 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:44:41,660-Speed 2628.41 samples/sec Loss 15.7360 LearningRate 0.0936 Epoch: 0 Global Step: 27000 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:44:45,557-Speed 2628.44 samples/sec Loss 15.7131 LearningRate 0.0936 Epoch: 0 Global Step: 27010 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:44:49,455-Speed 2627.33 samples/sec Loss 15.7758 LearningRate 0.0936 Epoch: 0 Global Step: 27020 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:44:53,353-Speed 2627.90 samples/sec Loss 15.7726 LearningRate 0.0936 Epoch: 0 Global Step: 27030 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:44:57,249-Speed 2629.21 samples/sec Loss 15.6922 LearningRate 0.0936 Epoch: 0 Global Step: 27040 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:45:01,147-Speed 2627.64 samples/sec Loss 15.6480 LearningRate 0.0936 Epoch: 0 Global Step: 27050 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:45:05,048-Speed 2625.41 samples/sec Loss 15.7402 LearningRate 0.0936 Epoch: 0 Global Step: 27060 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:45:08,944-Speed 2628.44 samples/sec Loss 15.7959 LearningRate 0.0936 Epoch: 0 Global Step: 27070 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:45:12,850-Speed 2622.68 samples/sec Loss 15.6262 LearningRate 0.0936 Epoch: 0 Global Step: 27080 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 22:45:16,750-Speed 2626.00 samples/sec Loss 15.8067 LearningRate 0.0936 Epoch: 0 Global Step: 27090 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:45:20,657-Speed 2621.83 samples/sec Loss 15.6307 LearningRate 0.0936 Epoch: 0 Global Step: 27100 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:45:24,566-Speed 2620.12 samples/sec Loss 15.9060 LearningRate 0.0936 Epoch: 0 Global Step: 27110 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:45:28,472-Speed 2622.27 samples/sec Loss 15.7579 LearningRate 0.0936 Epoch: 0 Global Step: 27120 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:45:32,374-Speed 2625.35 samples/sec Loss 15.7189 LearningRate 0.0936 Epoch: 0 Global Step: 27130 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:45:36,272-Speed 2627.16 samples/sec Loss 15.7445 LearningRate 0.0936 Epoch: 0 Global Step: 27140 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:45:40,169-Speed 2627.91 samples/sec Loss 15.6464 LearningRate 0.0936 Epoch: 0 Global Step: 27150 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:45:44,075-Speed 2622.65 samples/sec Loss 15.6733 LearningRate 0.0936 Epoch: 0 Global Step: 27160 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:45:47,977-Speed 2624.55 samples/sec Loss 15.7206 LearningRate 0.0936 Epoch: 0 Global Step: 27170 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:45:51,886-Speed 2620.34 samples/sec Loss 15.7684 LearningRate 0.0936 Epoch: 0 Global Step: 27180 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 22:45:55,787-Speed 2625.30 samples/sec Loss 15.6763 LearningRate 0.0936 Epoch: 0 Global Step: 27190 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:45:59,684-Speed 2629.12 samples/sec Loss 15.8681 LearningRate 0.0935 Epoch: 0 Global Step: 27200 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:46:03,600-Speed 2615.02 samples/sec Loss 15.6145 LearningRate 0.0935 Epoch: 0 Global Step: 27210 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:46:07,497-Speed 2628.56 samples/sec Loss 15.7306 LearningRate 0.0935 Epoch: 0 Global Step: 27220 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:46:11,441-Speed 2596.45 samples/sec Loss 15.6863 LearningRate 0.0935 Epoch: 0 Global Step: 27230 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:46:15,395-Speed 2590.59 samples/sec Loss 15.8283 LearningRate 0.0935 Epoch: 0 Global Step: 27240 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:46:19,300-Speed 2622.71 samples/sec Loss 15.6004 LearningRate 0.0935 Epoch: 0 Global Step: 27250 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:46:23,194-Speed 2630.40 samples/sec Loss 15.7107 LearningRate 0.0935 Epoch: 0 Global Step: 27260 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:46:27,092-Speed 2627.22 samples/sec Loss 15.6688 LearningRate 0.0935 Epoch: 0 Global Step: 27270 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:46:30,992-Speed 2627.09 samples/sec Loss 15.8228 LearningRate 0.0935 Epoch: 0 Global Step: 27280 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:46:34,912-Speed 2612.92 samples/sec Loss 15.6429 LearningRate 0.0935 Epoch: 0 Global Step: 27290 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:46:38,813-Speed 2625.41 samples/sec Loss 15.7834 LearningRate 0.0935 Epoch: 0 Global Step: 27300 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:46:42,711-Speed 2626.98 samples/sec Loss 15.7189 LearningRate 0.0935 Epoch: 0 Global Step: 27310 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 22:46:46,607-Speed 2629.39 samples/sec Loss 15.7105 LearningRate 0.0935 Epoch: 0 Global Step: 27320 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:46:50,500-Speed 2630.57 samples/sec Loss 15.7370 LearningRate 0.0935 Epoch: 0 Global Step: 27330 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:46:54,414-Speed 2617.63 samples/sec Loss 15.8142 LearningRate 0.0935 Epoch: 0 Global Step: 27340 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:46:58,328-Speed 2616.26 samples/sec Loss 15.6896 LearningRate 0.0935 Epoch: 0 Global Step: 27350 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:47:02,237-Speed 2620.28 samples/sec Loss 15.6826 LearningRate 0.0935 Epoch: 0 Global Step: 27360 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:47:06,136-Speed 2627.16 samples/sec Loss 15.8023 LearningRate 0.0935 Epoch: 0 Global Step: 27370 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:47:10,042-Speed 2622.66 samples/sec Loss 15.6528 LearningRate 0.0935 Epoch: 0 Global Step: 27380 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:47:13,920-Speed 2640.73 samples/sec Loss 15.5321 LearningRate 0.0935 Epoch: 0 Global Step: 27390 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:47:17,817-Speed 2628.18 samples/sec Loss 15.7662 LearningRate 0.0935 Epoch: 0 Global Step: 27400 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:47:21,712-Speed 2629.52 samples/sec Loss 15.6120 LearningRate 0.0935 Epoch: 0 Global Step: 27410 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:47:25,615-Speed 2624.50 samples/sec Loss 15.8130 LearningRate 0.0935 Epoch: 0 Global Step: 27420 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:47:29,518-Speed 2623.76 samples/sec Loss 15.8125 LearningRate 0.0935 Epoch: 0 Global Step: 27430 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:47:33,415-Speed 2628.20 samples/sec Loss 15.8209 LearningRate 0.0935 Epoch: 0 Global Step: 27440 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:47:37,337-Speed 2611.84 samples/sec Loss 15.8051 LearningRate 0.0935 Epoch: 0 Global Step: 27450 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:47:41,242-Speed 2622.65 samples/sec Loss 15.7231 LearningRate 0.0935 Epoch: 0 Global Step: 27460 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:47:45,132-Speed 2633.41 samples/sec Loss 15.7691 LearningRate 0.0935 Epoch: 0 Global Step: 27470 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:47:49,028-Speed 2628.84 samples/sec Loss 15.8549 LearningRate 0.0935 Epoch: 0 Global Step: 27480 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:47:52,942-Speed 2616.72 samples/sec Loss 15.7284 LearningRate 0.0935 Epoch: 0 Global Step: 27490 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:47:56,836-Speed 2630.33 samples/sec Loss 15.8668 LearningRate 0.0935 Epoch: 0 Global Step: 27500 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:48:00,732-Speed 2628.92 samples/sec Loss 15.6933 LearningRate 0.0935 Epoch: 0 Global Step: 27510 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:48:04,628-Speed 2628.55 samples/sec Loss 15.7541 LearningRate 0.0935 Epoch: 0 Global Step: 27520 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:48:08,526-Speed 2627.67 samples/sec Loss 15.8415 LearningRate 0.0935 Epoch: 0 Global Step: 27530 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:48:12,431-Speed 2623.33 samples/sec Loss 15.6719 LearningRate 0.0935 Epoch: 0 Global Step: 27540 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:48:16,326-Speed 2629.38 samples/sec Loss 15.5611 LearningRate 0.0935 Epoch: 0 Global Step: 27550 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:48:20,222-Speed 2629.29 samples/sec Loss 15.7372 LearningRate 0.0935 Epoch: 0 Global Step: 27560 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:48:24,120-Speed 2627.96 samples/sec Loss 15.7730 LearningRate 0.0935 Epoch: 0 Global Step: 27570 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:48:28,014-Speed 2630.03 samples/sec Loss 15.6396 LearningRate 0.0935 Epoch: 0 Global Step: 27580 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:48:31,911-Speed 2628.05 samples/sec Loss 15.6432 LearningRate 0.0935 Epoch: 0 Global Step: 27590 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:48:35,795-Speed 2636.66 samples/sec Loss 15.6027 LearningRate 0.0935 Epoch: 0 Global Step: 27600 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:48:39,692-Speed 2628.66 samples/sec Loss 15.6611 LearningRate 0.0935 Epoch: 0 Global Step: 27610 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:48:43,589-Speed 2628.21 samples/sec Loss 15.6398 LearningRate 0.0935 Epoch: 0 Global Step: 27620 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:48:47,485-Speed 2628.92 samples/sec Loss 15.5020 LearningRate 0.0934 Epoch: 0 Global Step: 27630 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:48:51,386-Speed 2625.62 samples/sec Loss 15.7427 LearningRate 0.0934 Epoch: 0 Global Step: 27640 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:48:55,282-Speed 2629.50 samples/sec Loss 15.4917 LearningRate 0.0934 Epoch: 0 Global Step: 27650 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:48:59,176-Speed 2630.01 samples/sec Loss 15.5851 LearningRate 0.0934 Epoch: 0 Global Step: 27660 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:49:03,070-Speed 2630.11 samples/sec Loss 15.5919 LearningRate 0.0934 Epoch: 0 Global Step: 27670 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:49:06,964-Speed 2630.19 samples/sec Loss 15.7966 LearningRate 0.0934 Epoch: 0 Global Step: 27680 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:49:10,860-Speed 2628.93 samples/sec Loss 15.7716 LearningRate 0.0934 Epoch: 0 Global Step: 27690 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:49:14,757-Speed 2628.06 samples/sec Loss 15.6210 LearningRate 0.0934 Epoch: 0 Global Step: 27700 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:49:18,661-Speed 2623.26 samples/sec Loss 15.6178 LearningRate 0.0934 Epoch: 0 Global Step: 27710 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:49:22,542-Speed 2639.63 samples/sec Loss 15.6384 LearningRate 0.0934 Epoch: 0 Global Step: 27720 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:49:26,441-Speed 2626.21 samples/sec Loss 15.6938 LearningRate 0.0934 Epoch: 0 Global Step: 27730 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:49:30,338-Speed 2629.10 samples/sec Loss 15.5822 LearningRate 0.0934 Epoch: 0 Global Step: 27740 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:49:34,234-Speed 2628.59 samples/sec Loss 15.6810 LearningRate 0.0934 Epoch: 0 Global Step: 27750 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:49:38,131-Speed 2628.39 samples/sec Loss 15.5193 LearningRate 0.0934 Epoch: 0 Global Step: 27760 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:49:42,027-Speed 2629.01 samples/sec Loss 15.7653 LearningRate 0.0934 Epoch: 0 Global Step: 27770 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:49:45,926-Speed 2626.47 samples/sec Loss 15.6139 LearningRate 0.0934 Epoch: 0 Global Step: 27780 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:49:49,822-Speed 2628.82 samples/sec Loss 15.6619 LearningRate 0.0934 Epoch: 0 Global Step: 27790 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:49:53,718-Speed 2629.03 samples/sec Loss 15.6230 LearningRate 0.0934 Epoch: 0 Global Step: 27800 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:49:57,620-Speed 2625.03 samples/sec Loss 15.7504 LearningRate 0.0934 Epoch: 0 Global Step: 27810 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:50:01,519-Speed 2627.05 samples/sec Loss 15.6266 LearningRate 0.0934 Epoch: 0 Global Step: 27820 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:50:05,418-Speed 2626.58 samples/sec Loss 15.5318 LearningRate 0.0934 Epoch: 0 Global Step: 27830 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:50:09,310-Speed 2631.63 samples/sec Loss 15.6411 LearningRate 0.0934 Epoch: 0 Global Step: 27840 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:50:13,219-Speed 2619.95 samples/sec Loss 15.6292 LearningRate 0.0934 Epoch: 0 Global Step: 27850 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:50:17,118-Speed 2627.19 samples/sec Loss 15.6022 LearningRate 0.0934 Epoch: 0 Global Step: 27860 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:50:21,016-Speed 2627.75 samples/sec Loss 15.5194 LearningRate 0.0934 Epoch: 0 Global Step: 27870 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:50:24,920-Speed 2623.57 samples/sec Loss 15.5999 LearningRate 0.0934 Epoch: 0 Global Step: 27880 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:50:28,830-Speed 2619.49 samples/sec Loss 15.6782 LearningRate 0.0934 Epoch: 0 Global Step: 27890 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:50:32,728-Speed 2627.61 samples/sec Loss 15.7811 LearningRate 0.0934 Epoch: 0 Global Step: 27900 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:50:36,628-Speed 2625.88 samples/sec Loss 15.6365 LearningRate 0.0934 Epoch: 0 Global Step: 27910 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:50:40,518-Speed 2633.10 samples/sec Loss 15.4624 LearningRate 0.0934 Epoch: 0 Global Step: 27920 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:50:44,398-Speed 2640.10 samples/sec Loss 15.7480 LearningRate 0.0934 Epoch: 0 Global Step: 27930 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:50:48,292-Speed 2630.37 samples/sec Loss 15.4463 LearningRate 0.0934 Epoch: 0 Global Step: 27940 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:50:52,187-Speed 2629.52 samples/sec Loss 15.6351 LearningRate 0.0934 Epoch: 0 Global Step: 27950 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:50:56,082-Speed 2629.54 samples/sec Loss 15.5479 LearningRate 0.0934 Epoch: 0 Global Step: 27960 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:50:59,995-Speed 2617.24 samples/sec Loss 15.4177 LearningRate 0.0934 Epoch: 0 Global Step: 27970 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:51:03,895-Speed 2626.13 samples/sec Loss 15.6432 LearningRate 0.0934 Epoch: 0 Global Step: 27980 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:51:07,805-Speed 2619.69 samples/sec Loss 15.5599 LearningRate 0.0934 Epoch: 0 Global Step: 27990 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:51:11,702-Speed 2627.81 samples/sec Loss 15.5866 LearningRate 0.0934 Epoch: 0 Global Step: 28000 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:51:15,597-Speed 2630.03 samples/sec Loss 15.8174 LearningRate 0.0934 Epoch: 0 Global Step: 28010 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:51:19,495-Speed 2627.52 samples/sec Loss 15.6177 LearningRate 0.0934 Epoch: 0 Global Step: 28020 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 22:51:23,382-Speed 2636.76 samples/sec Loss 15.6416 LearningRate 0.0934 Epoch: 0 Global Step: 28030 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 22:51:27,293-Speed 2618.63 samples/sec Loss 15.5092 LearningRate 0.0934 Epoch: 0 Global Step: 28040 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 22:51:31,193-Speed 2626.15 samples/sec Loss 15.6387 LearningRate 0.0934 Epoch: 0 Global Step: 28050 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 22:51:35,092-Speed 2626.42 samples/sec Loss 15.7264 LearningRate 0.0933 Epoch: 0 Global Step: 28060 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 22:51:39,020-Speed 2607.99 samples/sec Loss 15.7134 LearningRate 0.0933 Epoch: 0 Global Step: 28070 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 22:51:42,924-Speed 2623.29 samples/sec Loss 15.7915 LearningRate 0.0933 Epoch: 0 Global Step: 28080 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 22:51:46,821-Speed 2628.23 samples/sec Loss 15.7562 LearningRate 0.0933 Epoch: 0 Global Step: 28090 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 22:51:50,719-Speed 2627.63 samples/sec Loss 15.7205 LearningRate 0.0933 Epoch: 0 Global Step: 28100 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 22:51:54,617-Speed 2627.71 samples/sec Loss 15.6410 LearningRate 0.0933 Epoch: 0 Global Step: 28110 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 22:51:58,515-Speed 2627.57 samples/sec Loss 15.5937 LearningRate 0.0933 Epoch: 0 Global Step: 28120 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:52:02,417-Speed 2625.20 samples/sec Loss 15.6453 LearningRate 0.0933 Epoch: 0 Global Step: 28130 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:52:06,320-Speed 2624.25 samples/sec Loss 15.5976 LearningRate 0.0933 Epoch: 0 Global Step: 28140 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:52:10,219-Speed 2626.90 samples/sec Loss 15.6914 LearningRate 0.0933 Epoch: 0 Global Step: 28150 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:52:14,117-Speed 2627.31 samples/sec Loss 15.5587 LearningRate 0.0933 Epoch: 0 Global Step: 28160 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:52:18,018-Speed 2625.44 samples/sec Loss 15.4887 LearningRate 0.0933 Epoch: 0 Global Step: 28170 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:52:21,920-Speed 2625.25 samples/sec Loss 15.5053 LearningRate 0.0933 Epoch: 0 Global Step: 28180 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:52:25,833-Speed 2617.60 samples/sec Loss 15.6146 LearningRate 0.0933 Epoch: 0 Global Step: 28190 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:52:29,731-Speed 2627.42 samples/sec Loss 15.6995 LearningRate 0.0933 Epoch: 0 Global Step: 28200 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:52:33,653-Speed 2611.33 samples/sec Loss 15.4886 LearningRate 0.0933 Epoch: 0 Global Step: 28210 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:52:37,551-Speed 2627.96 samples/sec Loss 15.5989 LearningRate 0.0933 Epoch: 0 Global Step: 28220 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:52:41,551-Speed 2561.60 samples/sec Loss 15.5684 LearningRate 0.0933 Epoch: 0 Global Step: 28230 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:52:45,448-Speed 2628.25 samples/sec Loss 15.5675 LearningRate 0.0933 Epoch: 0 Global Step: 28240 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:52:49,349-Speed 2625.71 samples/sec Loss 15.5178 LearningRate 0.0933 Epoch: 0 Global Step: 28250 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:52:53,250-Speed 2625.04 samples/sec Loss 15.5954 LearningRate 0.0933 Epoch: 0 Global Step: 28260 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:52:57,152-Speed 2624.91 samples/sec Loss 15.4865 LearningRate 0.0933 Epoch: 0 Global Step: 28270 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:53:01,143-Speed 2566.21 samples/sec Loss 15.5872 LearningRate 0.0933 Epoch: 0 Global Step: 28280 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:53:05,058-Speed 2616.22 samples/sec Loss 15.6020 LearningRate 0.0933 Epoch: 0 Global Step: 28290 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:53:08,942-Speed 2637.56 samples/sec Loss 15.7003 LearningRate 0.0933 Epoch: 0 Global Step: 28300 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:53:12,850-Speed 2621.04 samples/sec Loss 15.5854 LearningRate 0.0933 Epoch: 0 Global Step: 28310 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:53:16,769-Speed 2614.19 samples/sec Loss 15.5867 LearningRate 0.0933 Epoch: 0 Global Step: 28320 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:53:20,676-Speed 2621.39 samples/sec Loss 15.6318 LearningRate 0.0933 Epoch: 0 Global Step: 28330 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:53:24,572-Speed 2628.62 samples/sec Loss 15.6978 LearningRate 0.0933 Epoch: 0 Global Step: 28340 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:53:28,468-Speed 2628.78 samples/sec Loss 15.5390 LearningRate 0.0933 Epoch: 0 Global Step: 28350 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:53:32,374-Speed 2622.55 samples/sec Loss 15.6317 LearningRate 0.0933 Epoch: 0 Global Step: 28360 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:53:36,274-Speed 2625.79 samples/sec Loss 15.5215 LearningRate 0.0933 Epoch: 0 Global Step: 28370 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:53:40,173-Speed 2626.86 samples/sec Loss 15.6088 LearningRate 0.0933 Epoch: 0 Global Step: 28380 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:53:44,071-Speed 2627.89 samples/sec Loss 15.5735 LearningRate 0.0933 Epoch: 0 Global Step: 28390 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:53:47,963-Speed 2632.06 samples/sec Loss 15.6043 LearningRate 0.0933 Epoch: 0 Global Step: 28400 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:53:51,862-Speed 2626.58 samples/sec Loss 15.6804 LearningRate 0.0933 Epoch: 0 Global Step: 28410 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:53:55,750-Speed 2634.10 samples/sec Loss 15.5479 LearningRate 0.0933 Epoch: 0 Global Step: 28420 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:53:59,636-Speed 2636.02 samples/sec Loss 15.5970 LearningRate 0.0933 Epoch: 0 Global Step: 28430 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:54:03,531-Speed 2629.52 samples/sec Loss 15.5086 LearningRate 0.0933 Epoch: 0 Global Step: 28440 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:54:07,424-Speed 2630.55 samples/sec Loss 15.5698 LearningRate 0.0933 Epoch: 0 Global Step: 28450 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:54:11,319-Speed 2629.80 samples/sec Loss 15.5355 LearningRate 0.0933 Epoch: 0 Global Step: 28460 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:54:15,214-Speed 2629.22 samples/sec Loss 15.6019 LearningRate 0.0933 Epoch: 0 Global Step: 28470 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:54:19,107-Speed 2631.55 samples/sec Loss 15.6936 LearningRate 0.0933 Epoch: 0 Global Step: 28480 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:54:23,004-Speed 2628.30 samples/sec Loss 15.4856 LearningRate 0.0932 Epoch: 0 Global Step: 28490 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:54:26,939-Speed 2602.73 samples/sec Loss 15.5190 LearningRate 0.0932 Epoch: 0 Global Step: 28500 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:54:30,836-Speed 2628.44 samples/sec Loss 15.4081 LearningRate 0.0932 Epoch: 0 Global Step: 28510 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:54:34,733-Speed 2628.18 samples/sec Loss 15.4662 LearningRate 0.0932 Epoch: 0 Global Step: 28520 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:54:38,634-Speed 2625.19 samples/sec Loss 15.5298 LearningRate 0.0932 Epoch: 0 Global Step: 28530 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:54:42,530-Speed 2629.26 samples/sec Loss 15.6014 LearningRate 0.0932 Epoch: 0 Global Step: 28540 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:54:46,432-Speed 2624.98 samples/sec Loss 15.5077 LearningRate 0.0932 Epoch: 0 Global Step: 28550 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:54:50,335-Speed 2624.11 samples/sec Loss 15.4260 LearningRate 0.0932 Epoch: 0 Global Step: 28560 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:54:54,228-Speed 2630.93 samples/sec Loss 15.5417 LearningRate 0.0932 Epoch: 0 Global Step: 28570 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:54:58,125-Speed 2628.80 samples/sec Loss 15.6918 LearningRate 0.0932 Epoch: 0 Global Step: 28580 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:55:02,021-Speed 2628.47 samples/sec Loss 15.5092 LearningRate 0.0932 Epoch: 0 Global Step: 28590 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:55:05,918-Speed 2628.41 samples/sec Loss 15.5344 LearningRate 0.0932 Epoch: 0 Global Step: 28600 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:55:09,815-Speed 2627.92 samples/sec Loss 15.5622 LearningRate 0.0932 Epoch: 0 Global Step: 28610 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:55:13,725-Speed 2619.60 samples/sec Loss 15.3843 LearningRate 0.0932 Epoch: 0 Global Step: 28620 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:55:17,619-Speed 2630.93 samples/sec Loss 15.4148 LearningRate 0.0932 Epoch: 0 Global Step: 28630 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:55:21,517-Speed 2627.73 samples/sec Loss 15.5293 LearningRate 0.0932 Epoch: 0 Global Step: 28640 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:55:25,436-Speed 2613.30 samples/sec Loss 15.4542 LearningRate 0.0932 Epoch: 0 Global Step: 28650 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:55:29,336-Speed 2627.07 samples/sec Loss 15.5589 LearningRate 0.0932 Epoch: 0 Global Step: 28660 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:55:33,238-Speed 2624.71 samples/sec Loss 15.5551 LearningRate 0.0932 Epoch: 0 Global Step: 28670 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:55:37,136-Speed 2626.95 samples/sec Loss 15.4308 LearningRate 0.0932 Epoch: 0 Global Step: 28680 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:55:41,036-Speed 2626.25 samples/sec Loss 15.4785 LearningRate 0.0932 Epoch: 0 Global Step: 28690 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:55:44,934-Speed 2628.07 samples/sec Loss 15.5770 LearningRate 0.0932 Epoch: 0 Global Step: 28700 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:55:48,834-Speed 2625.64 samples/sec Loss 15.4747 LearningRate 0.0932 Epoch: 0 Global Step: 28710 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:55:52,732-Speed 2627.97 samples/sec Loss 15.5432 LearningRate 0.0932 Epoch: 0 Global Step: 28720 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:55:56,627-Speed 2629.41 samples/sec Loss 15.6154 LearningRate 0.0932 Epoch: 0 Global Step: 28730 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:56:00,529-Speed 2625.51 samples/sec Loss 15.4635 LearningRate 0.0932 Epoch: 0 Global Step: 28740 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:56:04,498-Speed 2580.21 samples/sec Loss 15.4088 LearningRate 0.0932 Epoch: 0 Global Step: 28750 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:56:08,402-Speed 2623.51 samples/sec Loss 15.4477 LearningRate 0.0932 Epoch: 0 Global Step: 28760 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:56:12,306-Speed 2623.38 samples/sec Loss 15.4056 LearningRate 0.0932 Epoch: 0 Global Step: 28770 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:56:16,208-Speed 2625.06 samples/sec Loss 15.4145 LearningRate 0.0932 Epoch: 0 Global Step: 28780 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:56:20,099-Speed 2632.15 samples/sec Loss 15.4765 LearningRate 0.0932 Epoch: 0 Global Step: 28790 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:56:23,991-Speed 2631.50 samples/sec Loss 15.4205 LearningRate 0.0932 Epoch: 0 Global Step: 28800 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:56:27,886-Speed 2630.04 samples/sec Loss 15.4941 LearningRate 0.0932 Epoch: 0 Global Step: 28810 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:56:31,780-Speed 2629.89 samples/sec Loss 15.5083 LearningRate 0.0932 Epoch: 0 Global Step: 28820 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:56:35,678-Speed 2628.27 samples/sec Loss 15.5745 LearningRate 0.0932 Epoch: 0 Global Step: 28830 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:56:39,574-Speed 2628.44 samples/sec Loss 15.4659 LearningRate 0.0932 Epoch: 0 Global Step: 28840 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:56:43,479-Speed 2622.92 samples/sec Loss 15.4623 LearningRate 0.0932 Epoch: 0 Global Step: 28850 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:56:47,386-Speed 2621.60 samples/sec Loss 15.4373 LearningRate 0.0932 Epoch: 0 Global Step: 28860 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:56:51,280-Speed 2630.01 samples/sec Loss 15.4659 LearningRate 0.0932 Epoch: 0 Global Step: 28870 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:56:55,177-Speed 2628.70 samples/sec Loss 15.4722 LearningRate 0.0932 Epoch: 0 Global Step: 28880 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:56:59,079-Speed 2624.81 samples/sec Loss 15.5162 LearningRate 0.0932 Epoch: 0 Global Step: 28890 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:57:02,993-Speed 2616.93 samples/sec Loss 15.5624 LearningRate 0.0932 Epoch: 0 Global Step: 28900 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:57:06,889-Speed 2628.49 samples/sec Loss 15.2146 LearningRate 0.0932 Epoch: 0 Global Step: 28910 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:57:10,784-Speed 2629.46 samples/sec Loss 15.3921 LearningRate 0.0931 Epoch: 0 Global Step: 28920 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:57:14,682-Speed 2627.62 samples/sec Loss 15.5105 LearningRate 0.0931 Epoch: 0 Global Step: 28930 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:57:18,579-Speed 2628.47 samples/sec Loss 15.4881 LearningRate 0.0931 Epoch: 0 Global Step: 28940 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:57:22,460-Speed 2639.41 samples/sec Loss 15.6349 LearningRate 0.0931 Epoch: 0 Global Step: 28950 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:57:26,351-Speed 2632.06 samples/sec Loss 15.3961 LearningRate 0.0931 Epoch: 0 Global Step: 28960 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:57:30,249-Speed 2627.90 samples/sec Loss 15.5544 LearningRate 0.0931 Epoch: 0 Global Step: 28970 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:57:34,149-Speed 2625.67 samples/sec Loss 15.4695 LearningRate 0.0931 Epoch: 0 Global Step: 28980 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:57:38,050-Speed 2625.52 samples/sec Loss 15.3767 LearningRate 0.0931 Epoch: 0 Global Step: 28990 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:57:41,952-Speed 2624.93 samples/sec Loss 15.5016 LearningRate 0.0931 Epoch: 0 Global Step: 29000 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:57:45,860-Speed 2620.75 samples/sec Loss 15.4516 LearningRate 0.0931 Epoch: 0 Global Step: 29010 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:57:49,762-Speed 2625.43 samples/sec Loss 15.5558 LearningRate 0.0931 Epoch: 0 Global Step: 29020 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:57:53,660-Speed 2627.29 samples/sec Loss 15.4194 LearningRate 0.0931 Epoch: 0 Global Step: 29030 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:57:57,556-Speed 2629.15 samples/sec Loss 15.3907 LearningRate 0.0931 Epoch: 0 Global Step: 29040 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:58:01,452-Speed 2628.93 samples/sec Loss 15.4479 LearningRate 0.0931 Epoch: 0 Global Step: 29050 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:58:05,347-Speed 2629.28 samples/sec Loss 15.4278 LearningRate 0.0931 Epoch: 0 Global Step: 29060 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:58:09,246-Speed 2627.22 samples/sec Loss 15.4615 LearningRate 0.0931 Epoch: 0 Global Step: 29070 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:58:13,143-Speed 2627.66 samples/sec Loss 15.4766 LearningRate 0.0931 Epoch: 0 Global Step: 29080 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:58:17,041-Speed 2628.13 samples/sec Loss 15.4305 LearningRate 0.0931 Epoch: 0 Global Step: 29090 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:58:20,937-Speed 2628.97 samples/sec Loss 15.5048 LearningRate 0.0931 Epoch: 0 Global Step: 29100 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:58:24,834-Speed 2628.18 samples/sec Loss 15.4571 LearningRate 0.0931 Epoch: 0 Global Step: 29110 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:58:28,715-Speed 2638.75 samples/sec Loss 15.4655 LearningRate 0.0931 Epoch: 0 Global Step: 29120 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:58:32,612-Speed 2628.99 samples/sec Loss 15.4816 LearningRate 0.0931 Epoch: 0 Global Step: 29130 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:58:36,521-Speed 2619.93 samples/sec Loss 15.4550 LearningRate 0.0931 Epoch: 0 Global Step: 29140 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:58:40,425-Speed 2623.03 samples/sec Loss 15.4211 LearningRate 0.0931 Epoch: 0 Global Step: 29150 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:58:44,324-Speed 2627.54 samples/sec Loss 15.5096 LearningRate 0.0931 Epoch: 0 Global Step: 29160 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:58:48,223-Speed 2626.57 samples/sec Loss 15.3157 LearningRate 0.0931 Epoch: 0 Global Step: 29170 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:58:52,126-Speed 2624.27 samples/sec Loss 15.4267 LearningRate 0.0931 Epoch: 0 Global Step: 29180 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:58:56,038-Speed 2618.04 samples/sec Loss 15.5290 LearningRate 0.0931 Epoch: 0 Global Step: 29190 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:58:59,931-Speed 2631.00 samples/sec Loss 15.4160 LearningRate 0.0931 Epoch: 0 Global Step: 29200 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:59:03,839-Speed 2620.74 samples/sec Loss 15.2990 LearningRate 0.0931 Epoch: 0 Global Step: 29210 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:59:07,738-Speed 2627.69 samples/sec Loss 15.4273 LearningRate 0.0931 Epoch: 0 Global Step: 29220 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:59:11,636-Speed 2626.95 samples/sec Loss 15.1885 LearningRate 0.0931 Epoch: 0 Global Step: 29230 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:59:15,537-Speed 2625.47 samples/sec Loss 15.4428 LearningRate 0.0931 Epoch: 0 Global Step: 29240 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:59:19,439-Speed 2625.40 samples/sec Loss 15.3481 LearningRate 0.0931 Epoch: 0 Global Step: 29250 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:59:23,336-Speed 2627.98 samples/sec Loss 15.5034 LearningRate 0.0931 Epoch: 0 Global Step: 29260 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:59:27,242-Speed 2622.24 samples/sec Loss 15.4509 LearningRate 0.0931 Epoch: 0 Global Step: 29270 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 22:59:31,133-Speed 2632.19 samples/sec Loss 15.3011 LearningRate 0.0931 Epoch: 0 Global Step: 29280 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:59:35,046-Speed 2617.53 samples/sec Loss 15.4907 LearningRate 0.0931 Epoch: 0 Global Step: 29290 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:59:38,946-Speed 2626.56 samples/sec Loss 15.4755 LearningRate 0.0931 Epoch: 0 Global Step: 29300 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:59:42,846-Speed 2626.04 samples/sec Loss 15.4361 LearningRate 0.0931 Epoch: 0 Global Step: 29310 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:59:46,743-Speed 2628.58 samples/sec Loss 15.4552 LearningRate 0.0931 Epoch: 0 Global Step: 29320 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:59:50,636-Speed 2630.94 samples/sec Loss 15.5564 LearningRate 0.0931 Epoch: 0 Global Step: 29330 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:59:54,536-Speed 2625.93 samples/sec Loss 15.4305 LearningRate 0.0931 Epoch: 0 Global Step: 29340 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 22:59:58,438-Speed 2625.25 samples/sec Loss 15.4119 LearningRate 0.0930 Epoch: 0 Global Step: 29350 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:00:02,362-Speed 2610.20 samples/sec Loss 15.4456 LearningRate 0.0930 Epoch: 0 Global Step: 29360 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:00:06,273-Speed 2618.82 samples/sec Loss 15.3740 LearningRate 0.0930 Epoch: 0 Global Step: 29370 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:00:10,178-Speed 2622.66 samples/sec Loss 15.3141 LearningRate 0.0930 Epoch: 0 Global Step: 29380 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:00:14,081-Speed 2623.80 samples/sec Loss 15.3895 LearningRate 0.0930 Epoch: 0 Global Step: 29390 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:00:17,979-Speed 2628.07 samples/sec Loss 15.6122 LearningRate 0.0930 Epoch: 0 Global Step: 29400 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:00:21,883-Speed 2623.73 samples/sec Loss 15.4547 LearningRate 0.0930 Epoch: 0 Global Step: 29410 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:00:25,788-Speed 2622.93 samples/sec Loss 15.4311 LearningRate 0.0930 Epoch: 0 Global Step: 29420 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:00:29,695-Speed 2621.00 samples/sec Loss 15.4582 LearningRate 0.0930 Epoch: 0 Global Step: 29430 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:00:33,596-Speed 2625.78 samples/sec Loss 15.4415 LearningRate 0.0930 Epoch: 0 Global Step: 29440 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:00:37,495-Speed 2626.93 samples/sec Loss 15.3698 LearningRate 0.0930 Epoch: 0 Global Step: 29450 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:00:41,400-Speed 2622.59 samples/sec Loss 15.4415 LearningRate 0.0930 Epoch: 0 Global Step: 29460 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:00:45,303-Speed 2624.52 samples/sec Loss 15.3407 LearningRate 0.0930 Epoch: 0 Global Step: 29470 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:00:49,198-Speed 2629.26 samples/sec Loss 15.5126 LearningRate 0.0930 Epoch: 0 Global Step: 29480 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 23:00:53,081-Speed 2638.39 samples/sec Loss 15.4865 LearningRate 0.0930 Epoch: 0 Global Step: 29490 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:00:56,958-Speed 2641.55 samples/sec Loss 15.2246 LearningRate 0.0930 Epoch: 0 Global Step: 29500 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:01:00,866-Speed 2620.75 samples/sec Loss 15.4759 LearningRate 0.0930 Epoch: 0 Global Step: 29510 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:01:04,768-Speed 2625.25 samples/sec Loss 15.3405 LearningRate 0.0930 Epoch: 0 Global Step: 29520 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:01:08,666-Speed 2626.94 samples/sec Loss 15.2382 LearningRate 0.0930 Epoch: 0 Global Step: 29530 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:01:12,564-Speed 2628.19 samples/sec Loss 15.5200 LearningRate 0.0930 Epoch: 0 Global Step: 29540 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:01:16,462-Speed 2627.23 samples/sec Loss 15.4535 LearningRate 0.0930 Epoch: 0 Global Step: 29550 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:01:20,357-Speed 2629.96 samples/sec Loss 15.2507 LearningRate 0.0930 Epoch: 0 Global Step: 29560 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:01:24,274-Speed 2614.46 samples/sec Loss 15.4491 LearningRate 0.0930 Epoch: 0 Global Step: 29570 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:01:28,180-Speed 2622.23 samples/sec Loss 15.4066 LearningRate 0.0930 Epoch: 0 Global Step: 29580 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:01:32,079-Speed 2627.40 samples/sec Loss 15.3866 LearningRate 0.0930 Epoch: 0 Global Step: 29590 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:01:35,975-Speed 2628.56 samples/sec Loss 15.3749 LearningRate 0.0930 Epoch: 0 Global Step: 29600 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:01:39,873-Speed 2627.89 samples/sec Loss 15.3999 LearningRate 0.0930 Epoch: 0 Global Step: 29610 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:01:43,772-Speed 2627.13 samples/sec Loss 15.5479 LearningRate 0.0930 Epoch: 0 Global Step: 29620 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:01:47,666-Speed 2629.98 samples/sec Loss 15.3121 LearningRate 0.0930 Epoch: 0 Global Step: 29630 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:01:51,570-Speed 2623.65 samples/sec Loss 15.3837 LearningRate 0.0930 Epoch: 0 Global Step: 29640 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:01:55,471-Speed 2625.79 samples/sec Loss 15.2103 LearningRate 0.0930 Epoch: 0 Global Step: 29650 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:01:59,374-Speed 2624.14 samples/sec Loss 15.2528 LearningRate 0.0930 Epoch: 0 Global Step: 29660 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:02:03,293-Speed 2613.08 samples/sec Loss 15.4521 LearningRate 0.0930 Epoch: 0 Global Step: 29670 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:02:07,196-Speed 2624.50 samples/sec Loss 15.4783 LearningRate 0.0930 Epoch: 0 Global Step: 29680 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:02:11,095-Speed 2626.87 samples/sec Loss 15.4479 LearningRate 0.0930 Epoch: 0 Global Step: 29690 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:02:14,979-Speed 2636.92 samples/sec Loss 15.4876 LearningRate 0.0930 Epoch: 0 Global Step: 29700 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:02:18,880-Speed 2626.11 samples/sec Loss 15.3668 LearningRate 0.0930 Epoch: 0 Global Step: 29710 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:02:22,760-Speed 2639.72 samples/sec Loss 15.3512 LearningRate 0.0930 Epoch: 0 Global Step: 29720 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:02:26,834-Speed 2513.69 samples/sec Loss 15.3042 LearningRate 0.0930 Epoch: 0 Global Step: 29730 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:02:30,733-Speed 2627.57 samples/sec Loss 15.3324 LearningRate 0.0930 Epoch: 0 Global Step: 29740 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:02:34,635-Speed 2624.62 samples/sec Loss 15.4684 LearningRate 0.0930 Epoch: 0 Global Step: 29750 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:02:38,534-Speed 2626.33 samples/sec Loss 15.4069 LearningRate 0.0930 Epoch: 0 Global Step: 29760 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:02:42,436-Speed 2625.38 samples/sec Loss 15.2420 LearningRate 0.0930 Epoch: 0 Global Step: 29770 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:02:46,331-Speed 2629.69 samples/sec Loss 15.3919 LearningRate 0.0929 Epoch: 0 Global Step: 29780 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:02:50,223-Speed 2631.40 samples/sec Loss 15.4150 LearningRate 0.0929 Epoch: 0 Global Step: 29790 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:02:54,127-Speed 2624.12 samples/sec Loss 15.2252 LearningRate 0.0929 Epoch: 0 Global Step: 29800 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:02:58,026-Speed 2626.55 samples/sec Loss 15.3654 LearningRate 0.0929 Epoch: 0 Global Step: 29810 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:03:01,931-Speed 2623.15 samples/sec Loss 15.3148 LearningRate 0.0929 Epoch: 0 Global Step: 29820 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:03:05,808-Speed 2641.87 samples/sec Loss 15.3863 LearningRate 0.0929 Epoch: 0 Global Step: 29830 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:03:09,709-Speed 2625.03 samples/sec Loss 15.5462 LearningRate 0.0929 Epoch: 0 Global Step: 29840 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:03:13,607-Speed 2627.48 samples/sec Loss 15.5164 LearningRate 0.0929 Epoch: 0 Global Step: 29850 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:03:17,502-Speed 2630.01 samples/sec Loss 15.4188 LearningRate 0.0929 Epoch: 0 Global Step: 29860 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:03:21,397-Speed 2629.32 samples/sec Loss 15.4878 LearningRate 0.0929 Epoch: 0 Global Step: 29870 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:03:25,295-Speed 2628.28 samples/sec Loss 15.3790 LearningRate 0.0929 Epoch: 0 Global Step: 29880 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:03:29,190-Speed 2629.13 samples/sec Loss 15.3072 LearningRate 0.0929 Epoch: 0 Global Step: 29890 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:03:33,086-Speed 2629.30 samples/sec Loss 15.2424 LearningRate 0.0929 Epoch: 0 Global Step: 29900 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:03:36,979-Speed 2630.75 samples/sec Loss 15.3303 LearningRate 0.0929 Epoch: 0 Global Step: 29910 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:03:40,872-Speed 2630.73 samples/sec Loss 15.3693 LearningRate 0.0929 Epoch: 0 Global Step: 29920 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:03:44,770-Speed 2627.52 samples/sec Loss 15.4416 LearningRate 0.0929 Epoch: 0 Global Step: 29930 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:03:48,747-Speed 2575.63 samples/sec Loss 15.5118 LearningRate 0.0929 Epoch: 0 Global Step: 29940 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:03:52,651-Speed 2623.62 samples/sec Loss 15.4025 LearningRate 0.0929 Epoch: 0 Global Step: 29950 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:03:56,557-Speed 2622.47 samples/sec Loss 15.4134 LearningRate 0.0929 Epoch: 0 Global Step: 29960 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:04:00,467-Speed 2620.15 samples/sec Loss 15.4168 LearningRate 0.0929 Epoch: 0 Global Step: 29970 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:04:04,374-Speed 2621.32 samples/sec Loss 15.4043 LearningRate 0.0929 Epoch: 0 Global Step: 29980 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:04:08,276-Speed 2624.70 samples/sec Loss 15.4428 LearningRate 0.0929 Epoch: 0 Global Step: 29990 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:04:12,152-Speed 2642.73 samples/sec Loss 15.2710 LearningRate 0.0929 Epoch: 0 Global Step: 30000 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:04:55,646-[lfw][30000]XNorm: 22.802505
Training: 2022-04-12 23:04:55,647-[lfw][30000]Accuracy-Flip: 0.99483+-0.00320
Training: 2022-04-12 23:04:55,648-[lfw][30000]Accuracy-Highest: 0.99500
Training: 2022-04-12 23:05:46,179-[cfp_fp][30000]XNorm: 20.030670
Training: 2022-04-12 23:05:46,180-[cfp_fp][30000]Accuracy-Flip: 0.96657+-0.00969
Training: 2022-04-12 23:05:46,181-[cfp_fp][30000]Accuracy-Highest: 0.96657
Training: 2022-04-12 23:06:29,647-[agedb_30][30000]XNorm: 22.689600
Training: 2022-04-12 23:06:29,648-[agedb_30][30000]Accuracy-Flip: 0.94833+-0.00872
Training: 2022-04-12 23:06:29,648-[agedb_30][30000]Accuracy-Highest: 0.94833
Training: 2022-04-12 23:06:33,516-Speed 72.44 samples/sec Loss 15.4600 LearningRate 0.0929 Epoch: 0 Global Step: 30010 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:06:37,394-Speed 2641.88 samples/sec Loss 15.4214 LearningRate 0.0929 Epoch: 0 Global Step: 30020 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:06:41,268-Speed 2643.70 samples/sec Loss 15.4069 LearningRate 0.0929 Epoch: 0 Global Step: 30030 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:06:45,143-Speed 2643.34 samples/sec Loss 15.5326 LearningRate 0.0929 Epoch: 0 Global Step: 30040 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:06:49,020-Speed 2641.86 samples/sec Loss 15.4620 LearningRate 0.0929 Epoch: 0 Global Step: 30050 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:06:52,901-Speed 2639.47 samples/sec Loss 15.2599 LearningRate 0.0929 Epoch: 0 Global Step: 30060 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:06:56,787-Speed 2635.70 samples/sec Loss 15.3573 LearningRate 0.0929 Epoch: 0 Global Step: 30070 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:07:00,670-Speed 2638.60 samples/sec Loss 15.4528 LearningRate 0.0929 Epoch: 0 Global Step: 30080 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:07:04,556-Speed 2635.34 samples/sec Loss 15.3624 LearningRate 0.0929 Epoch: 0 Global Step: 30090 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:07:08,463-Speed 2622.17 samples/sec Loss 15.3936 LearningRate 0.0929 Epoch: 0 Global Step: 30100 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:07:12,353-Speed 2633.40 samples/sec Loss 15.3039 LearningRate 0.0929 Epoch: 0 Global Step: 30110 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:07:16,252-Speed 2626.76 samples/sec Loss 15.3198 LearningRate 0.0929 Epoch: 0 Global Step: 30120 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:07:20,139-Speed 2635.06 samples/sec Loss 15.3563 LearningRate 0.0929 Epoch: 0 Global Step: 30130 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:07:24,038-Speed 2626.98 samples/sec Loss 15.3412 LearningRate 0.0929 Epoch: 0 Global Step: 30140 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:07:28,038-Speed 2560.96 samples/sec Loss 15.3991 LearningRate 0.0929 Epoch: 0 Global Step: 30150 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:07:31,978-Speed 2599.41 samples/sec Loss 15.2092 LearningRate 0.0929 Epoch: 0 Global Step: 30160 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:07:35,875-Speed 2628.49 samples/sec Loss 15.3585 LearningRate 0.0929 Epoch: 0 Global Step: 30170 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:07:39,761-Speed 2636.26 samples/sec Loss 15.3319 LearningRate 0.0929 Epoch: 0 Global Step: 30180 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:07:43,718-Speed 2588.26 samples/sec Loss 15.2173 LearningRate 0.0929 Epoch: 0 Global Step: 30190 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:07:47,622-Speed 2623.58 samples/sec Loss 15.1729 LearningRate 0.0929 Epoch: 0 Global Step: 30200 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:07:51,520-Speed 2627.31 samples/sec Loss 15.3680 LearningRate 0.0928 Epoch: 0 Global Step: 30210 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:07:55,425-Speed 2623.41 samples/sec Loss 15.2765 LearningRate 0.0928 Epoch: 0 Global Step: 30220 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:07:59,316-Speed 2632.38 samples/sec Loss 15.1989 LearningRate 0.0928 Epoch: 0 Global Step: 30230 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:08:03,212-Speed 2628.53 samples/sec Loss 15.3482 LearningRate 0.0928 Epoch: 0 Global Step: 30240 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:08:07,118-Speed 2622.66 samples/sec Loss 15.4385 LearningRate 0.0928 Epoch: 0 Global Step: 30250 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:08:11,020-Speed 2625.07 samples/sec Loss 15.2930 LearningRate 0.0928 Epoch: 0 Global Step: 30260 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:08:14,920-Speed 2626.82 samples/sec Loss 15.2451 LearningRate 0.0928 Epoch: 0 Global Step: 30270 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:08:18,816-Speed 2628.48 samples/sec Loss 15.1729 LearningRate 0.0928 Epoch: 0 Global Step: 30280 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:08:22,720-Speed 2623.85 samples/sec Loss 15.3417 LearningRate 0.0928 Epoch: 0 Global Step: 30290 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:08:26,619-Speed 2626.57 samples/sec Loss 15.2945 LearningRate 0.0928 Epoch: 0 Global Step: 30300 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:08:30,516-Speed 2628.75 samples/sec Loss 15.1997 LearningRate 0.0928 Epoch: 0 Global Step: 30310 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:08:34,416-Speed 2626.31 samples/sec Loss 15.4567 LearningRate 0.0928 Epoch: 0 Global Step: 30320 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:08:38,321-Speed 2622.90 samples/sec Loss 15.0129 LearningRate 0.0928 Epoch: 0 Global Step: 30330 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:08:42,232-Speed 2619.38 samples/sec Loss 15.3813 LearningRate 0.0928 Epoch: 0 Global Step: 30340 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:08:46,127-Speed 2629.60 samples/sec Loss 15.4082 LearningRate 0.0928 Epoch: 0 Global Step: 30350 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:08:50,022-Speed 2629.77 samples/sec Loss 15.2575 LearningRate 0.0928 Epoch: 0 Global Step: 30360 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:08:53,918-Speed 2628.94 samples/sec Loss 15.4343 LearningRate 0.0928 Epoch: 0 Global Step: 30370 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:08:57,785-Speed 2648.68 samples/sec Loss 15.2777 LearningRate 0.0928 Epoch: 0 Global Step: 30380 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:09:01,683-Speed 2627.43 samples/sec Loss 15.3148 LearningRate 0.0928 Epoch: 0 Global Step: 30390 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:09:05,577-Speed 2631.10 samples/sec Loss 15.3331 LearningRate 0.0928 Epoch: 0 Global Step: 30400 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:09:09,475-Speed 2627.20 samples/sec Loss 15.2628 LearningRate 0.0928 Epoch: 0 Global Step: 30410 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:09:13,397-Speed 2611.48 samples/sec Loss 15.3089 LearningRate 0.0928 Epoch: 0 Global Step: 30420 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:09:17,289-Speed 2631.78 samples/sec Loss 15.2634 LearningRate 0.0928 Epoch: 0 Global Step: 30430 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:09:21,186-Speed 2628.80 samples/sec Loss 15.3617 LearningRate 0.0928 Epoch: 0 Global Step: 30440 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:09:25,164-Speed 2575.05 samples/sec Loss 15.3516 LearningRate 0.0928 Epoch: 0 Global Step: 30450 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:09:29,059-Speed 2629.31 samples/sec Loss 15.2135 LearningRate 0.0928 Epoch: 0 Global Step: 30460 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:09:32,962-Speed 2624.13 samples/sec Loss 15.4451 LearningRate 0.0928 Epoch: 0 Global Step: 30470 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:09:37,004-Speed 2534.59 samples/sec Loss 15.3713 LearningRate 0.0928 Epoch: 0 Global Step: 30480 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:09:40,912-Speed 2621.10 samples/sec Loss 15.2705 LearningRate 0.0928 Epoch: 0 Global Step: 30490 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:09:44,800-Speed 2634.34 samples/sec Loss 15.4849 LearningRate 0.0928 Epoch: 0 Global Step: 30500 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:09:48,720-Speed 2612.67 samples/sec Loss 15.1649 LearningRate 0.0928 Epoch: 0 Global Step: 30510 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:09:52,670-Speed 2593.38 samples/sec Loss 15.3227 LearningRate 0.0928 Epoch: 0 Global Step: 30520 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:09:56,571-Speed 2625.38 samples/sec Loss 15.2025 LearningRate 0.0928 Epoch: 0 Global Step: 30530 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:10:00,468-Speed 2628.39 samples/sec Loss 15.3668 LearningRate 0.0928 Epoch: 0 Global Step: 30540 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:10:04,369-Speed 2625.84 samples/sec Loss 15.1701 LearningRate 0.0928 Epoch: 0 Global Step: 30550 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:10:08,288-Speed 2613.87 samples/sec Loss 15.2991 LearningRate 0.0928 Epoch: 0 Global Step: 30560 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:10:12,186-Speed 2627.82 samples/sec Loss 15.3890 LearningRate 0.0928 Epoch: 0 Global Step: 30570 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:10:16,092-Speed 2622.11 samples/sec Loss 15.3023 LearningRate 0.0928 Epoch: 0 Global Step: 30580 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:10:19,992-Speed 2627.12 samples/sec Loss 15.3633 LearningRate 0.0928 Epoch: 0 Global Step: 30590 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:10:23,896-Speed 2623.10 samples/sec Loss 15.3790 LearningRate 0.0928 Epoch: 0 Global Step: 30600 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:10:27,776-Speed 2640.09 samples/sec Loss 15.1684 LearningRate 0.0928 Epoch: 0 Global Step: 30610 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:10:31,674-Speed 2627.17 samples/sec Loss 15.1718 LearningRate 0.0928 Epoch: 0 Global Step: 30620 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:10:35,580-Speed 2622.81 samples/sec Loss 15.2849 LearningRate 0.0928 Epoch: 0 Global Step: 30630 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:10:39,503-Speed 2610.78 samples/sec Loss 15.3432 LearningRate 0.0927 Epoch: 0 Global Step: 30640 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:10:43,421-Speed 2614.63 samples/sec Loss 15.2112 LearningRate 0.0927 Epoch: 0 Global Step: 30650 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:10:47,334-Speed 2617.45 samples/sec Loss 15.3223 LearningRate 0.0927 Epoch: 0 Global Step: 30660 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:10:51,318-Speed 2571.12 samples/sec Loss 15.1722 LearningRate 0.0927 Epoch: 0 Global Step: 30670 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:10:55,308-Speed 2566.99 samples/sec Loss 15.1646 LearningRate 0.0927 Epoch: 0 Global Step: 30680 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:10:59,284-Speed 2575.85 samples/sec Loss 15.0905 LearningRate 0.0927 Epoch: 0 Global Step: 30690 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:11:03,197-Speed 2617.94 samples/sec Loss 15.2728 LearningRate 0.0927 Epoch: 0 Global Step: 30700 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:11:07,090-Speed 2631.33 samples/sec Loss 15.2241 LearningRate 0.0927 Epoch: 0 Global Step: 30710 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:11:10,990-Speed 2626.47 samples/sec Loss 15.3202 LearningRate 0.0927 Epoch: 0 Global Step: 30720 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:11:14,921-Speed 2605.04 samples/sec Loss 15.4775 LearningRate 0.0927 Epoch: 0 Global Step: 30730 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:11:18,938-Speed 2550.20 samples/sec Loss 15.3660 LearningRate 0.0927 Epoch: 0 Global Step: 30740 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:11:22,890-Speed 2592.04 samples/sec Loss 15.2107 LearningRate 0.0927 Epoch: 0 Global Step: 30750 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:11:26,786-Speed 2629.27 samples/sec Loss 15.2427 LearningRate 0.0927 Epoch: 0 Global Step: 30760 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:11:30,683-Speed 2628.07 samples/sec Loss 15.3072 LearningRate 0.0927 Epoch: 0 Global Step: 30770 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:11:34,581-Speed 2627.80 samples/sec Loss 15.1932 LearningRate 0.0927 Epoch: 0 Global Step: 30780 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:11:38,471-Speed 2632.53 samples/sec Loss 15.2095 LearningRate 0.0927 Epoch: 0 Global Step: 30790 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:11:42,364-Speed 2631.43 samples/sec Loss 15.1837 LearningRate 0.0927 Epoch: 0 Global Step: 30800 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:11:46,265-Speed 2625.54 samples/sec Loss 15.3095 LearningRate 0.0927 Epoch: 0 Global Step: 30810 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:11:50,162-Speed 2628.38 samples/sec Loss 15.1372 LearningRate 0.0927 Epoch: 0 Global Step: 30820 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:11:54,070-Speed 2621.00 samples/sec Loss 15.3268 LearningRate 0.0927 Epoch: 0 Global Step: 30830 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:11:58,005-Speed 2603.10 samples/sec Loss 15.1801 LearningRate 0.0927 Epoch: 0 Global Step: 30840 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:12:01,907-Speed 2624.98 samples/sec Loss 15.2899 LearningRate 0.0927 Epoch: 0 Global Step: 30850 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:12:05,830-Speed 2611.03 samples/sec Loss 15.2849 LearningRate 0.0927 Epoch: 0 Global Step: 30860 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:12:09,738-Speed 2620.52 samples/sec Loss 15.1843 LearningRate 0.0927 Epoch: 0 Global Step: 30870 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:12:13,635-Speed 2628.69 samples/sec Loss 15.0807 LearningRate 0.0927 Epoch: 0 Global Step: 30880 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:12:17,538-Speed 2624.45 samples/sec Loss 15.1297 LearningRate 0.0927 Epoch: 0 Global Step: 30890 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:12:21,439-Speed 2625.76 samples/sec Loss 15.0727 LearningRate 0.0927 Epoch: 0 Global Step: 30900 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:12:25,335-Speed 2628.68 samples/sec Loss 15.4031 LearningRate 0.0927 Epoch: 0 Global Step: 30910 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:12:29,225-Speed 2633.28 samples/sec Loss 15.2761 LearningRate 0.0927 Epoch: 0 Global Step: 30920 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:12:33,127-Speed 2625.24 samples/sec Loss 15.1777 LearningRate 0.0927 Epoch: 0 Global Step: 30930 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:12:37,024-Speed 2628.26 samples/sec Loss 15.2814 LearningRate 0.0927 Epoch: 0 Global Step: 30940 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:12:40,920-Speed 2628.66 samples/sec Loss 15.2009 LearningRate 0.0927 Epoch: 0 Global Step: 30950 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:12:44,816-Speed 2628.96 samples/sec Loss 15.2467 LearningRate 0.0927 Epoch: 0 Global Step: 30960 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:12:48,726-Speed 2620.25 samples/sec Loss 15.3001 LearningRate 0.0927 Epoch: 0 Global Step: 30970 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:12:52,622-Speed 2628.72 samples/sec Loss 15.1717 LearningRate 0.0927 Epoch: 0 Global Step: 30980 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:12:56,537-Speed 2616.09 samples/sec Loss 15.1135 LearningRate 0.0927 Epoch: 0 Global Step: 30990 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:13:00,436-Speed 2627.31 samples/sec Loss 15.2318 LearningRate 0.0927 Epoch: 0 Global Step: 31000 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:13:04,342-Speed 2622.14 samples/sec Loss 15.1569 LearningRate 0.0927 Epoch: 0 Global Step: 31010 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:13:08,235-Speed 2631.03 samples/sec Loss 15.4059 LearningRate 0.0927 Epoch: 0 Global Step: 31020 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:13:12,135-Speed 2626.56 samples/sec Loss 15.2960 LearningRate 0.0927 Epoch: 0 Global Step: 31030 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:13:16,033-Speed 2627.55 samples/sec Loss 15.2751 LearningRate 0.0927 Epoch: 0 Global Step: 31040 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:13:19,929-Speed 2628.96 samples/sec Loss 15.1084 LearningRate 0.0927 Epoch: 0 Global Step: 31050 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:13:23,824-Speed 2629.58 samples/sec Loss 15.2263 LearningRate 0.0927 Epoch: 0 Global Step: 31060 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:13:27,722-Speed 2630.58 samples/sec Loss 15.2765 LearningRate 0.0926 Epoch: 0 Global Step: 31070 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:13:31,621-Speed 2626.66 samples/sec Loss 15.2160 LearningRate 0.0926 Epoch: 0 Global Step: 31080 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:13:35,518-Speed 2628.62 samples/sec Loss 15.0275 LearningRate 0.0926 Epoch: 0 Global Step: 31090 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:13:39,428-Speed 2619.26 samples/sec Loss 15.1419 LearningRate 0.0926 Epoch: 0 Global Step: 31100 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:13:43,323-Speed 2629.53 samples/sec Loss 15.2500 LearningRate 0.0926 Epoch: 0 Global Step: 31110 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:13:47,201-Speed 2640.99 samples/sec Loss 15.1450 LearningRate 0.0926 Epoch: 0 Global Step: 31120 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:13:51,098-Speed 2628.94 samples/sec Loss 15.1663 LearningRate 0.0926 Epoch: 0 Global Step: 31130 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:13:54,997-Speed 2626.21 samples/sec Loss 15.0845 LearningRate 0.0926 Epoch: 0 Global Step: 31140 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:13:58,898-Speed 2626.54 samples/sec Loss 15.1876 LearningRate 0.0926 Epoch: 0 Global Step: 31150 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:14:02,797-Speed 2626.78 samples/sec Loss 15.1230 LearningRate 0.0926 Epoch: 0 Global Step: 31160 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:14:06,690-Speed 2630.71 samples/sec Loss 15.2710 LearningRate 0.0926 Epoch: 0 Global Step: 31170 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:14:10,586-Speed 2629.02 samples/sec Loss 15.1805 LearningRate 0.0926 Epoch: 0 Global Step: 31180 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:14:14,484-Speed 2627.95 samples/sec Loss 15.1374 LearningRate 0.0926 Epoch: 0 Global Step: 31190 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:14:18,383-Speed 2627.77 samples/sec Loss 15.1181 LearningRate 0.0926 Epoch: 0 Global Step: 31200 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:14:22,293-Speed 2619.25 samples/sec Loss 15.0659 LearningRate 0.0926 Epoch: 0 Global Step: 31210 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:14:26,188-Speed 2629.96 samples/sec Loss 15.1274 LearningRate 0.0926 Epoch: 0 Global Step: 31220 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:14:30,104-Speed 2615.67 samples/sec Loss 15.2871 LearningRate 0.0926 Epoch: 0 Global Step: 31230 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:14:33,987-Speed 2637.50 samples/sec Loss 15.1045 LearningRate 0.0926 Epoch: 0 Global Step: 31240 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:14:37,895-Speed 2621.23 samples/sec Loss 15.2066 LearningRate 0.0926 Epoch: 0 Global Step: 31250 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:14:41,795-Speed 2626.54 samples/sec Loss 15.2253 LearningRate 0.0926 Epoch: 0 Global Step: 31260 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:14:45,693-Speed 2628.30 samples/sec Loss 15.0967 LearningRate 0.0926 Epoch: 0 Global Step: 31270 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:14:49,594-Speed 2625.43 samples/sec Loss 15.1408 LearningRate 0.0926 Epoch: 0 Global Step: 31280 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:14:53,498-Speed 2623.03 samples/sec Loss 15.2443 LearningRate 0.0926 Epoch: 0 Global Step: 31290 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:14:57,405-Speed 2621.81 samples/sec Loss 15.1954 LearningRate 0.0926 Epoch: 0 Global Step: 31300 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:15:01,311-Speed 2622.65 samples/sec Loss 15.1398 LearningRate 0.0926 Epoch: 0 Global Step: 31310 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:15:05,212-Speed 2625.72 samples/sec Loss 15.1392 LearningRate 0.0926 Epoch: 0 Global Step: 31320 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:15:09,109-Speed 2628.25 samples/sec Loss 15.0534 LearningRate 0.0926 Epoch: 0 Global Step: 31330 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:15:13,016-Speed 2621.82 samples/sec Loss 15.0004 LearningRate 0.0926 Epoch: 0 Global Step: 31340 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:15:16,920-Speed 2623.94 samples/sec Loss 15.2576 LearningRate 0.0926 Epoch: 0 Global Step: 31350 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:15:20,817-Speed 2628.33 samples/sec Loss 15.1326 LearningRate 0.0926 Epoch: 0 Global Step: 31360 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:15:24,722-Speed 2622.93 samples/sec Loss 15.2704 LearningRate 0.0926 Epoch: 0 Global Step: 31370 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:15:28,622-Speed 2625.62 samples/sec Loss 15.0404 LearningRate 0.0926 Epoch: 0 Global Step: 31380 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:15:32,517-Speed 2630.30 samples/sec Loss 15.0210 LearningRate 0.0926 Epoch: 0 Global Step: 31390 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:15:36,410-Speed 2630.58 samples/sec Loss 15.3116 LearningRate 0.0926 Epoch: 0 Global Step: 31400 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:15:40,311-Speed 2625.84 samples/sec Loss 15.3132 LearningRate 0.0926 Epoch: 0 Global Step: 31410 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:15:44,187-Speed 2642.90 samples/sec Loss 15.2467 LearningRate 0.0926 Epoch: 0 Global Step: 31420 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:15:48,084-Speed 2627.87 samples/sec Loss 15.1816 LearningRate 0.0926 Epoch: 0 Global Step: 31430 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:15:52,001-Speed 2614.76 samples/sec Loss 15.2909 LearningRate 0.0926 Epoch: 0 Global Step: 31440 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:15:55,895-Speed 2630.47 samples/sec Loss 15.1075 LearningRate 0.0926 Epoch: 0 Global Step: 31450 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:15:59,793-Speed 2627.69 samples/sec Loss 15.2493 LearningRate 0.0926 Epoch: 0 Global Step: 31460 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:16:03,693-Speed 2625.98 samples/sec Loss 15.1527 LearningRate 0.0926 Epoch: 0 Global Step: 31470 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:16:07,657-Speed 2583.87 samples/sec Loss 15.0387 LearningRate 0.0926 Epoch: 0 Global Step: 31480 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:16:11,559-Speed 2625.37 samples/sec Loss 15.0926 LearningRate 0.0926 Epoch: 0 Global Step: 31490 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:16:15,461-Speed 2624.39 samples/sec Loss 15.1002 LearningRate 0.0925 Epoch: 0 Global Step: 31500 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:16:19,355-Speed 2630.95 samples/sec Loss 15.0739 LearningRate 0.0925 Epoch: 0 Global Step: 31510 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:16:23,258-Speed 2623.59 samples/sec Loss 15.2498 LearningRate 0.0925 Epoch: 0 Global Step: 31520 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:16:27,165-Speed 2621.39 samples/sec Loss 15.1407 LearningRate 0.0925 Epoch: 0 Global Step: 31530 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:16:31,065-Speed 2626.51 samples/sec Loss 15.1343 LearningRate 0.0925 Epoch: 0 Global Step: 31540 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:16:34,980-Speed 2615.95 samples/sec Loss 15.2236 LearningRate 0.0925 Epoch: 0 Global Step: 31550 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:16:38,877-Speed 2628.37 samples/sec Loss 15.2513 LearningRate 0.0925 Epoch: 0 Global Step: 31560 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:16:42,771-Speed 2630.98 samples/sec Loss 15.1752 LearningRate 0.0925 Epoch: 0 Global Step: 31570 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:16:46,672-Speed 2625.28 samples/sec Loss 15.1379 LearningRate 0.0925 Epoch: 0 Global Step: 31580 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:16:50,573-Speed 2626.02 samples/sec Loss 15.2599 LearningRate 0.0925 Epoch: 0 Global Step: 31590 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:16:54,467-Speed 2629.95 samples/sec Loss 15.1742 LearningRate 0.0925 Epoch: 0 Global Step: 31600 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:16:58,360-Speed 2630.73 samples/sec Loss 15.2665 LearningRate 0.0925 Epoch: 0 Global Step: 31610 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:17:02,255-Speed 2629.69 samples/sec Loss 15.2395 LearningRate 0.0925 Epoch: 0 Global Step: 31620 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:17:06,149-Speed 2630.54 samples/sec Loss 15.1894 LearningRate 0.0925 Epoch: 0 Global Step: 31630 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:17:10,042-Speed 2630.54 samples/sec Loss 15.0339 LearningRate 0.0925 Epoch: 0 Global Step: 31640 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:17:13,938-Speed 2629.69 samples/sec Loss 14.8609 LearningRate 0.0925 Epoch: 0 Global Step: 31650 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:17:17,831-Speed 2631.39 samples/sec Loss 15.1177 LearningRate 0.0925 Epoch: 0 Global Step: 31660 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:17:21,873-Speed 2534.05 samples/sec Loss 15.0956 LearningRate 0.0925 Epoch: 0 Global Step: 31670 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:17:25,934-Speed 2521.85 samples/sec Loss 14.9889 LearningRate 0.0925 Epoch: 0 Global Step: 31680 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:17:29,823-Speed 2633.36 samples/sec Loss 15.1898 LearningRate 0.0925 Epoch: 0 Global Step: 31690 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:17:33,721-Speed 2627.99 samples/sec Loss 15.0605 LearningRate 0.0925 Epoch: 0 Global Step: 31700 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:17:37,614-Speed 2630.59 samples/sec Loss 15.1752 LearningRate 0.0925 Epoch: 0 Global Step: 31710 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:17:41,520-Speed 2622.32 samples/sec Loss 15.2398 LearningRate 0.0925 Epoch: 0 Global Step: 31720 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:17:45,419-Speed 2626.91 samples/sec Loss 15.2719 LearningRate 0.0925 Epoch: 0 Global Step: 31730 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:17:49,315-Speed 2629.22 samples/sec Loss 15.0218 LearningRate 0.0925 Epoch: 0 Global Step: 31740 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:17:53,232-Speed 2614.60 samples/sec Loss 15.0721 LearningRate 0.0925 Epoch: 0 Global Step: 31750 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:17:57,131-Speed 2627.62 samples/sec Loss 15.2409 LearningRate 0.0925 Epoch: 0 Global Step: 31760 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:18:01,026-Speed 2629.62 samples/sec Loss 15.0964 LearningRate 0.0925 Epoch: 0 Global Step: 31770 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:18:04,921-Speed 2629.66 samples/sec Loss 15.0314 LearningRate 0.0925 Epoch: 0 Global Step: 31780 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:18:08,816-Speed 2629.76 samples/sec Loss 14.9716 LearningRate 0.0925 Epoch: 0 Global Step: 31790 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:18:12,712-Speed 2628.41 samples/sec Loss 15.2479 LearningRate 0.0925 Epoch: 0 Global Step: 31800 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:18:16,622-Speed 2619.69 samples/sec Loss 15.2167 LearningRate 0.0925 Epoch: 0 Global Step: 31810 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:18:20,515-Speed 2631.24 samples/sec Loss 15.2066 LearningRate 0.0925 Epoch: 0 Global Step: 31820 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:18:24,408-Speed 2631.31 samples/sec Loss 15.0883 LearningRate 0.0925 Epoch: 0 Global Step: 31830 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:18:28,299-Speed 2632.14 samples/sec Loss 15.1858 LearningRate 0.0925 Epoch: 0 Global Step: 31840 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:18:32,190-Speed 2632.44 samples/sec Loss 15.2351 LearningRate 0.0925 Epoch: 0 Global Step: 31850 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:18:36,083-Speed 2630.83 samples/sec Loss 15.1992 LearningRate 0.0925 Epoch: 0 Global Step: 31860 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:18:39,959-Speed 2642.49 samples/sec Loss 15.0604 LearningRate 0.0925 Epoch: 0 Global Step: 31870 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:18:43,849-Speed 2633.30 samples/sec Loss 15.1086 LearningRate 0.0925 Epoch: 0 Global Step: 31880 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:18:47,742-Speed 2631.41 samples/sec Loss 14.9740 LearningRate 0.0925 Epoch: 0 Global Step: 31890 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:18:51,634-Speed 2631.52 samples/sec Loss 15.0621 LearningRate 0.0925 Epoch: 0 Global Step: 31900 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:18:55,494-Speed 2653.81 samples/sec Loss 15.2894 LearningRate 0.0925 Epoch: 0 Global Step: 31910 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 23:18:59,390-Speed 2629.85 samples/sec Loss 15.0140 LearningRate 0.0925 Epoch: 0 Global Step: 31920 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 23:19:03,286-Speed 2628.97 samples/sec Loss 15.0976 LearningRate 0.0925 Epoch: 0 Global Step: 31930 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 23:19:07,203-Speed 2614.93 samples/sec Loss 14.9914 LearningRate 0.0924 Epoch: 0 Global Step: 31940 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 23:19:11,109-Speed 2622.22 samples/sec Loss 15.1169 LearningRate 0.0924 Epoch: 0 Global Step: 31950 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 23:19:15,000-Speed 2632.11 samples/sec Loss 15.2001 LearningRate 0.0924 Epoch: 0 Global Step: 31960 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 23:19:18,890-Speed 2633.21 samples/sec Loss 14.9280 LearningRate 0.0924 Epoch: 0 Global Step: 31970 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 23:19:22,783-Speed 2630.79 samples/sec Loss 14.9523 LearningRate 0.0924 Epoch: 0 Global Step: 31980 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 23:19:26,678-Speed 2629.74 samples/sec Loss 15.1638 LearningRate 0.0924 Epoch: 0 Global Step: 31990 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 23:19:30,569-Speed 2632.38 samples/sec Loss 15.0917 LearningRate 0.0924 Epoch: 0 Global Step: 32000 Fp16 Grad Scale: 32768 Required: 90 hours
Training: 2022-04-12 23:19:34,461-Speed 2631.49 samples/sec Loss 15.0577 LearningRate 0.0924 Epoch: 0 Global Step: 32010 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:19:38,353-Speed 2631.38 samples/sec Loss 15.1076 LearningRate 0.0924 Epoch: 0 Global Step: 32020 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:19:42,247-Speed 2630.76 samples/sec Loss 15.2836 LearningRate 0.0924 Epoch: 0 Global Step: 32030 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:19:46,141-Speed 2629.64 samples/sec Loss 15.1205 LearningRate 0.0924 Epoch: 0 Global Step: 32040 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:19:50,037-Speed 2629.23 samples/sec Loss 15.2558 LearningRate 0.0924 Epoch: 0 Global Step: 32050 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:19:53,929-Speed 2631.53 samples/sec Loss 15.2292 LearningRate 0.0924 Epoch: 0 Global Step: 32060 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:19:57,838-Speed 2620.52 samples/sec Loss 15.1677 LearningRate 0.0924 Epoch: 0 Global Step: 32070 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:20:01,729-Speed 2632.00 samples/sec Loss 15.0306 LearningRate 0.0924 Epoch: 0 Global Step: 32080 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:20:05,631-Speed 2624.97 samples/sec Loss 15.2382 LearningRate 0.0924 Epoch: 0 Global Step: 32090 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:20:09,529-Speed 2627.54 samples/sec Loss 15.0871 LearningRate 0.0924 Epoch: 0 Global Step: 32100 Fp16 Grad Scale: 65536 Required: 90 hours
Training: 2022-04-12 23:20:13,434-Speed 2623.49 samples/sec Loss 15.1452 LearningRate 0.0924 Epoch: 0 Global Step: 32110 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:20:17,339-Speed 2622.80 samples/sec Loss 15.1425 LearningRate 0.0924 Epoch: 0 Global Step: 32120 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:20:21,234-Speed 2629.48 samples/sec Loss 15.2374 LearningRate 0.0924 Epoch: 0 Global Step: 32130 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:20:25,128-Speed 2630.08 samples/sec Loss 15.0669 LearningRate 0.0924 Epoch: 0 Global Step: 32140 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:20:29,037-Speed 2620.21 samples/sec Loss 15.2390 LearningRate 0.0924 Epoch: 0 Global Step: 32150 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:20:32,945-Speed 2621.04 samples/sec Loss 15.0548 LearningRate 0.0924 Epoch: 0 Global Step: 32160 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:20:36,839-Speed 2630.21 samples/sec Loss 15.1811 LearningRate 0.0924 Epoch: 0 Global Step: 32170 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:20:40,734-Speed 2629.73 samples/sec Loss 15.1026 LearningRate 0.0924 Epoch: 0 Global Step: 32180 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:20:44,631-Speed 2628.52 samples/sec Loss 15.1439 LearningRate 0.0924 Epoch: 0 Global Step: 32190 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:20:48,525-Speed 2630.12 samples/sec Loss 15.2164 LearningRate 0.0924 Epoch: 0 Global Step: 32200 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:20:52,418-Speed 2630.73 samples/sec Loss 15.1415 LearningRate 0.0924 Epoch: 0 Global Step: 32210 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:20:56,317-Speed 2627.22 samples/sec Loss 15.2364 LearningRate 0.0924 Epoch: 0 Global Step: 32220 Fp16 Grad Scale: 262144 Required: 90 hours
Training: 2022-04-12 23:21:00,204-Speed 2634.90 samples/sec Loss 15.0374 LearningRate 0.0924 Epoch: 0 Global Step: 32230 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:21:04,097-Speed 2631.37 samples/sec Loss 15.0771 LearningRate 0.0924 Epoch: 0 Global Step: 32240 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:21:07,988-Speed 2631.88 samples/sec Loss 15.1525 LearningRate 0.0924 Epoch: 0 Global Step: 32250 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:21:11,884-Speed 2629.21 samples/sec Loss 15.2461 LearningRate 0.0924 Epoch: 0 Global Step: 32260 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:21:15,791-Speed 2622.13 samples/sec Loss 15.0351 LearningRate 0.0924 Epoch: 0 Global Step: 32270 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:21:19,691-Speed 2626.26 samples/sec Loss 15.1281 LearningRate 0.0924 Epoch: 0 Global Step: 32280 Fp16 Grad Scale: 131072 Required: 90 hours
Training: 2022-04-12 23:21:23,586-Speed 2629.40 samples/sec Loss 14.9424 LearningRate 0.0924 Epoch: 0 Global Step: 32290 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:21:27,512-Speed 2608.91 samples/sec Loss 15.0432 LearningRate 0.0924 Epoch: 0 Global Step: 32300 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:21:31,416-Speed 2623.21 samples/sec Loss 14.9389 LearningRate 0.0924 Epoch: 0 Global Step: 32310 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:21:35,490-Speed 2514.33 samples/sec Loss 15.0658 LearningRate 0.0924 Epoch: 0 Global Step: 32320 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:21:39,388-Speed 2627.81 samples/sec Loss 15.0022 LearningRate 0.0924 Epoch: 0 Global Step: 32330 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:21:43,412-Speed 2544.99 samples/sec Loss 15.0791 LearningRate 0.0924 Epoch: 0 Global Step: 32340 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:21:47,315-Speed 2624.61 samples/sec Loss 15.0936 LearningRate 0.0924 Epoch: 0 Global Step: 32350 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:21:51,205-Speed 2632.89 samples/sec Loss 14.9402 LearningRate 0.0924 Epoch: 0 Global Step: 32360 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:21:55,104-Speed 2626.94 samples/sec Loss 15.2202 LearningRate 0.0923 Epoch: 0 Global Step: 32370 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:21:59,005-Speed 2625.73 samples/sec Loss 15.0449 LearningRate 0.0923 Epoch: 0 Global Step: 32380 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:22:02,903-Speed 2627.62 samples/sec Loss 15.1204 LearningRate 0.0923 Epoch: 0 Global Step: 32390 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:22:06,807-Speed 2623.49 samples/sec Loss 15.0649 LearningRate 0.0923 Epoch: 0 Global Step: 32400 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:22:10,702-Speed 2630.23 samples/sec Loss 15.1236 LearningRate 0.0923 Epoch: 0 Global Step: 32410 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:22:14,598-Speed 2628.39 samples/sec Loss 15.2007 LearningRate 0.0923 Epoch: 0 Global Step: 32420 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:22:18,506-Speed 2621.31 samples/sec Loss 15.0954 LearningRate 0.0923 Epoch: 0 Global Step: 32430 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:22:22,399-Speed 2631.01 samples/sec Loss 15.1484 LearningRate 0.0923 Epoch: 0 Global Step: 32440 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:22:26,296-Speed 2628.12 samples/sec Loss 15.2139 LearningRate 0.0923 Epoch: 0 Global Step: 32450 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:22:30,191-Speed 2629.71 samples/sec Loss 14.9543 LearningRate 0.0923 Epoch: 0 Global Step: 32460 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:22:34,085-Speed 2630.85 samples/sec Loss 15.2057 LearningRate 0.0923 Epoch: 0 Global Step: 32470 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:22:37,983-Speed 2627.49 samples/sec Loss 15.1228 LearningRate 0.0923 Epoch: 0 Global Step: 32480 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:22:41,915-Speed 2604.82 samples/sec Loss 15.1871 LearningRate 0.0923 Epoch: 0 Global Step: 32490 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:22:45,810-Speed 2629.49 samples/sec Loss 15.0647 LearningRate 0.0923 Epoch: 0 Global Step: 32500 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:22:49,711-Speed 2625.87 samples/sec Loss 15.0827 LearningRate 0.0923 Epoch: 0 Global Step: 32510 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:22:53,609-Speed 2627.47 samples/sec Loss 15.1505 LearningRate 0.0923 Epoch: 0 Global Step: 32520 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:22:57,509-Speed 2626.68 samples/sec Loss 14.9900 LearningRate 0.0923 Epoch: 0 Global Step: 32530 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 23:23:01,408-Speed 2626.79 samples/sec Loss 15.0144 LearningRate 0.0923 Epoch: 0 Global Step: 32540 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:23:05,322-Speed 2616.75 samples/sec Loss 15.1522 LearningRate 0.0923 Epoch: 0 Global Step: 32550 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:23:09,220-Speed 2627.76 samples/sec Loss 15.1154 LearningRate 0.0923 Epoch: 0 Global Step: 32560 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:23:13,115-Speed 2629.32 samples/sec Loss 14.9483 LearningRate 0.0923 Epoch: 0 Global Step: 32570 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:23:17,020-Speed 2623.13 samples/sec Loss 14.8703 LearningRate 0.0923 Epoch: 0 Global Step: 32580 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:23:20,918-Speed 2627.83 samples/sec Loss 15.0316 LearningRate 0.0923 Epoch: 0 Global Step: 32590 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:23:24,813-Speed 2629.51 samples/sec Loss 14.9651 LearningRate 0.0923 Epoch: 0 Global Step: 32600 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:23:28,712-Speed 2627.36 samples/sec Loss 15.1530 LearningRate 0.0923 Epoch: 0 Global Step: 32610 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:23:32,620-Speed 2620.11 samples/sec Loss 15.0747 LearningRate 0.0923 Epoch: 0 Global Step: 32620 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:23:36,524-Speed 2623.81 samples/sec Loss 14.9736 LearningRate 0.0923 Epoch: 0 Global Step: 32630 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:23:40,407-Speed 2637.81 samples/sec Loss 15.1331 LearningRate 0.0923 Epoch: 0 Global Step: 32640 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:23:44,302-Speed 2629.99 samples/sec Loss 15.1768 LearningRate 0.0923 Epoch: 0 Global Step: 32650 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:23:48,198-Speed 2629.04 samples/sec Loss 15.1104 LearningRate 0.0923 Epoch: 0 Global Step: 32660 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:23:52,092-Speed 2629.93 samples/sec Loss 14.9886 LearningRate 0.0923 Epoch: 0 Global Step: 32670 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:23:55,987-Speed 2630.02 samples/sec Loss 14.9102 LearningRate 0.0923 Epoch: 0 Global Step: 32680 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:23:59,881-Speed 2630.75 samples/sec Loss 15.1068 LearningRate 0.0923 Epoch: 0 Global Step: 32690 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:24:03,775-Speed 2629.87 samples/sec Loss 14.9611 LearningRate 0.0923 Epoch: 0 Global Step: 32700 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:24:07,671-Speed 2628.70 samples/sec Loss 15.0035 LearningRate 0.0923 Epoch: 0 Global Step: 32710 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:24:11,563-Speed 2631.47 samples/sec Loss 14.9610 LearningRate 0.0923 Epoch: 0 Global Step: 32720 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:24:15,464-Speed 2625.85 samples/sec Loss 14.9801 LearningRate 0.0923 Epoch: 0 Global Step: 32730 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:24:19,347-Speed 2638.03 samples/sec Loss 15.0331 LearningRate 0.0923 Epoch: 0 Global Step: 32740 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:24:23,244-Speed 2628.04 samples/sec Loss 15.0283 LearningRate 0.0923 Epoch: 0 Global Step: 32750 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:24:27,140-Speed 2629.10 samples/sec Loss 15.0928 LearningRate 0.0923 Epoch: 0 Global Step: 32760 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:24:31,035-Speed 2629.85 samples/sec Loss 15.0573 LearningRate 0.0923 Epoch: 0 Global Step: 32770 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:24:34,930-Speed 2629.97 samples/sec Loss 15.1280 LearningRate 0.0923 Epoch: 0 Global Step: 32780 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:24:38,832-Speed 2624.68 samples/sec Loss 15.1040 LearningRate 0.0923 Epoch: 0 Global Step: 32790 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:24:42,732-Speed 2626.32 samples/sec Loss 15.0953 LearningRate 0.0922 Epoch: 0 Global Step: 32800 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:24:46,628-Speed 2628.69 samples/sec Loss 15.0316 LearningRate 0.0922 Epoch: 0 Global Step: 32810 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:24:50,524-Speed 2628.96 samples/sec Loss 15.0691 LearningRate 0.0922 Epoch: 0 Global Step: 32820 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:24:54,420-Speed 2628.66 samples/sec Loss 15.0287 LearningRate 0.0922 Epoch: 0 Global Step: 32830 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:24:58,305-Speed 2636.89 samples/sec Loss 15.0160 LearningRate 0.0922 Epoch: 0 Global Step: 32840 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:25:02,193-Speed 2634.45 samples/sec Loss 15.0196 LearningRate 0.0922 Epoch: 0 Global Step: 32850 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:25:06,089-Speed 2628.96 samples/sec Loss 14.9673 LearningRate 0.0922 Epoch: 0 Global Step: 32860 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:25:09,987-Speed 2627.70 samples/sec Loss 14.9052 LearningRate 0.0922 Epoch: 0 Global Step: 32870 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:25:13,884-Speed 2627.99 samples/sec Loss 15.0256 LearningRate 0.0922 Epoch: 0 Global Step: 32880 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:25:17,782-Speed 2627.88 samples/sec Loss 15.0500 LearningRate 0.0922 Epoch: 0 Global Step: 32890 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:25:21,679-Speed 2627.86 samples/sec Loss 14.9480 LearningRate 0.0922 Epoch: 0 Global Step: 32900 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:25:25,573-Speed 2630.14 samples/sec Loss 14.9896 LearningRate 0.0922 Epoch: 0 Global Step: 32910 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:25:29,476-Speed 2624.57 samples/sec Loss 15.2157 LearningRate 0.0922 Epoch: 0 Global Step: 32920 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:25:33,379-Speed 2623.99 samples/sec Loss 15.0759 LearningRate 0.0922 Epoch: 0 Global Step: 32930 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:25:37,274-Speed 2629.78 samples/sec Loss 14.9450 LearningRate 0.0922 Epoch: 0 Global Step: 32940 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 23:25:41,153-Speed 2640.41 samples/sec Loss 15.0808 LearningRate 0.0922 Epoch: 0 Global Step: 32950 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:25:45,054-Speed 2626.59 samples/sec Loss 15.2113 LearningRate 0.0922 Epoch: 0 Global Step: 32960 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:25:48,959-Speed 2623.10 samples/sec Loss 14.9184 LearningRate 0.0922 Epoch: 0 Global Step: 32970 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:25:52,852-Speed 2630.87 samples/sec Loss 15.0355 LearningRate 0.0922 Epoch: 0 Global Step: 32980 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:25:56,746-Speed 2630.32 samples/sec Loss 15.0409 LearningRate 0.0922 Epoch: 0 Global Step: 32990 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:26:00,645-Speed 2626.83 samples/sec Loss 15.0993 LearningRate 0.0922 Epoch: 0 Global Step: 33000 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:26:04,542-Speed 2628.00 samples/sec Loss 14.9712 LearningRate 0.0922 Epoch: 0 Global Step: 33010 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:26:08,443-Speed 2625.16 samples/sec Loss 14.9424 LearningRate 0.0922 Epoch: 0 Global Step: 33020 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:26:12,337-Speed 2630.95 samples/sec Loss 15.0449 LearningRate 0.0922 Epoch: 0 Global Step: 33030 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:26:16,231-Speed 2630.54 samples/sec Loss 14.9980 LearningRate 0.0922 Epoch: 0 Global Step: 33040 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:26:20,255-Speed 2545.26 samples/sec Loss 14.9854 LearningRate 0.0922 Epoch: 0 Global Step: 33050 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:26:24,168-Speed 2618.11 samples/sec Loss 14.9975 LearningRate 0.0922 Epoch: 0 Global Step: 33060 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:26:28,073-Speed 2622.45 samples/sec Loss 14.9518 LearningRate 0.0922 Epoch: 0 Global Step: 33070 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:26:31,963-Speed 2633.08 samples/sec Loss 14.9722 LearningRate 0.0922 Epoch: 0 Global Step: 33080 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:26:35,855-Speed 2631.14 samples/sec Loss 15.0667 LearningRate 0.0922 Epoch: 0 Global Step: 33090 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:26:39,751-Speed 2629.18 samples/sec Loss 14.9969 LearningRate 0.0922 Epoch: 0 Global Step: 33100 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:26:43,672-Speed 2611.94 samples/sec Loss 15.1204 LearningRate 0.0922 Epoch: 0 Global Step: 33110 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:26:47,574-Speed 2624.87 samples/sec Loss 15.1921 LearningRate 0.0922 Epoch: 0 Global Step: 33120 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:26:51,498-Speed 2610.89 samples/sec Loss 15.0241 LearningRate 0.0922 Epoch: 0 Global Step: 33130 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:26:55,403-Speed 2622.71 samples/sec Loss 15.0004 LearningRate 0.0922 Epoch: 0 Global Step: 33140 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:26:59,328-Speed 2609.99 samples/sec Loss 15.0180 LearningRate 0.0922 Epoch: 0 Global Step: 33150 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:27:03,246-Speed 2614.45 samples/sec Loss 14.9298 LearningRate 0.0922 Epoch: 0 Global Step: 33160 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:27:07,189-Speed 2597.07 samples/sec Loss 14.9469 LearningRate 0.0922 Epoch: 0 Global Step: 33170 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:27:11,098-Speed 2620.04 samples/sec Loss 15.0842 LearningRate 0.0922 Epoch: 0 Global Step: 33180 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:27:14,996-Speed 2628.53 samples/sec Loss 14.8448 LearningRate 0.0922 Epoch: 0 Global Step: 33190 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:27:18,890-Speed 2630.02 samples/sec Loss 14.9606 LearningRate 0.0922 Epoch: 0 Global Step: 33200 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:27:22,818-Speed 2607.33 samples/sec Loss 14.9833 LearningRate 0.0922 Epoch: 0 Global Step: 33210 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:27:26,713-Speed 2630.12 samples/sec Loss 15.0877 LearningRate 0.0922 Epoch: 0 Global Step: 33220 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:27:30,607-Speed 2630.20 samples/sec Loss 15.0145 LearningRate 0.0921 Epoch: 0 Global Step: 33230 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:27:34,506-Speed 2626.57 samples/sec Loss 14.7710 LearningRate 0.0921 Epoch: 0 Global Step: 33240 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:27:38,432-Speed 2609.27 samples/sec Loss 14.9417 LearningRate 0.0921 Epoch: 0 Global Step: 33250 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:27:42,328-Speed 2629.06 samples/sec Loss 14.9885 LearningRate 0.0921 Epoch: 0 Global Step: 33260 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 23:27:46,215-Speed 2635.32 samples/sec Loss 14.8693 LearningRate 0.0921 Epoch: 0 Global Step: 33270 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:27:50,098-Speed 2638.05 samples/sec Loss 15.0154 LearningRate 0.0921 Epoch: 0 Global Step: 33280 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:27:53,998-Speed 2626.42 samples/sec Loss 14.8720 LearningRate 0.0921 Epoch: 0 Global Step: 33290 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:27:57,894-Speed 2628.90 samples/sec Loss 14.9415 LearningRate 0.0921 Epoch: 0 Global Step: 33300 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:28:01,790-Speed 2629.13 samples/sec Loss 14.9695 LearningRate 0.0921 Epoch: 0 Global Step: 33310 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:28:05,685-Speed 2629.04 samples/sec Loss 15.0288 LearningRate 0.0921 Epoch: 0 Global Step: 33320 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:28:09,581-Speed 2629.02 samples/sec Loss 14.9848 LearningRate 0.0921 Epoch: 0 Global Step: 33330 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:28:13,478-Speed 2628.56 samples/sec Loss 14.8323 LearningRate 0.0921 Epoch: 0 Global Step: 33340 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:28:17,395-Speed 2614.76 samples/sec Loss 14.9693 LearningRate 0.0921 Epoch: 0 Global Step: 33350 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:28:21,294-Speed 2627.49 samples/sec Loss 14.9799 LearningRate 0.0921 Epoch: 0 Global Step: 33360 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:28:25,200-Speed 2622.14 samples/sec Loss 14.8973 LearningRate 0.0921 Epoch: 0 Global Step: 33370 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:28:29,093-Speed 2630.95 samples/sec Loss 15.0072 LearningRate 0.0921 Epoch: 0 Global Step: 33380 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:28:32,992-Speed 2627.33 samples/sec Loss 14.9369 LearningRate 0.0921 Epoch: 0 Global Step: 33390 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:28:36,891-Speed 2626.35 samples/sec Loss 14.9119 LearningRate 0.0921 Epoch: 0 Global Step: 33400 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:28:40,792-Speed 2626.59 samples/sec Loss 14.7909 LearningRate 0.0921 Epoch: 0 Global Step: 33410 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:28:44,697-Speed 2622.74 samples/sec Loss 14.9903 LearningRate 0.0921 Epoch: 0 Global Step: 33420 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:28:48,605-Speed 2621.06 samples/sec Loss 14.9954 LearningRate 0.0921 Epoch: 0 Global Step: 33430 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:28:52,503-Speed 2628.25 samples/sec Loss 15.0026 LearningRate 0.0921 Epoch: 0 Global Step: 33440 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:28:56,409-Speed 2622.13 samples/sec Loss 14.9368 LearningRate 0.0921 Epoch: 0 Global Step: 33450 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:29:00,306-Speed 2627.88 samples/sec Loss 14.9846 LearningRate 0.0921 Epoch: 0 Global Step: 33460 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:29:04,220-Speed 2616.57 samples/sec Loss 14.8786 LearningRate 0.0921 Epoch: 0 Global Step: 33470 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:29:08,111-Speed 2632.74 samples/sec Loss 15.0684 LearningRate 0.0921 Epoch: 0 Global Step: 33480 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:29:12,010-Speed 2626.59 samples/sec Loss 14.8097 LearningRate 0.0921 Epoch: 0 Global Step: 33490 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:29:15,903-Speed 2631.48 samples/sec Loss 14.9985 LearningRate 0.0921 Epoch: 0 Global Step: 33500 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:29:19,778-Speed 2643.91 samples/sec Loss 14.8545 LearningRate 0.0921 Epoch: 0 Global Step: 33510 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:29:23,674-Speed 2628.88 samples/sec Loss 15.0837 LearningRate 0.0921 Epoch: 0 Global Step: 33520 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:29:27,572-Speed 2627.25 samples/sec Loss 14.9759 LearningRate 0.0921 Epoch: 0 Global Step: 33530 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:29:31,467-Speed 2629.62 samples/sec Loss 14.9340 LearningRate 0.0921 Epoch: 0 Global Step: 33540 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:29:35,361-Speed 2630.26 samples/sec Loss 15.0802 LearningRate 0.0921 Epoch: 0 Global Step: 33550 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:29:39,270-Speed 2620.29 samples/sec Loss 15.0002 LearningRate 0.0921 Epoch: 0 Global Step: 33560 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:29:43,171-Speed 2625.82 samples/sec Loss 15.0520 LearningRate 0.0921 Epoch: 0 Global Step: 33570 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:29:47,073-Speed 2624.67 samples/sec Loss 14.9454 LearningRate 0.0921 Epoch: 0 Global Step: 33580 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:29:50,981-Speed 2620.96 samples/sec Loss 14.8876 LearningRate 0.0921 Epoch: 0 Global Step: 33590 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:29:54,878-Speed 2628.51 samples/sec Loss 14.9521 LearningRate 0.0921 Epoch: 0 Global Step: 33600 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:29:58,775-Speed 2628.32 samples/sec Loss 14.9875 LearningRate 0.0921 Epoch: 0 Global Step: 33610 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:30:02,673-Speed 2627.45 samples/sec Loss 14.8608 LearningRate 0.0921 Epoch: 0 Global Step: 33620 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:30:06,579-Speed 2621.77 samples/sec Loss 14.8653 LearningRate 0.0921 Epoch: 0 Global Step: 33630 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:30:10,477-Speed 2627.83 samples/sec Loss 14.8481 LearningRate 0.0921 Epoch: 0 Global Step: 33640 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:30:14,387-Speed 2619.80 samples/sec Loss 14.9294 LearningRate 0.0921 Epoch: 0 Global Step: 33650 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:30:18,287-Speed 2626.28 samples/sec Loss 14.9609 LearningRate 0.0920 Epoch: 0 Global Step: 33660 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:30:22,148-Speed 2653.03 samples/sec Loss 14.8815 LearningRate 0.0920 Epoch: 0 Global Step: 33670 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:30:26,053-Speed 2622.84 samples/sec Loss 15.0118 LearningRate 0.0920 Epoch: 0 Global Step: 33680 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:30:29,963-Speed 2619.60 samples/sec Loss 14.9455 LearningRate 0.0920 Epoch: 0 Global Step: 33690 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:30:33,861-Speed 2627.73 samples/sec Loss 14.9618 LearningRate 0.0920 Epoch: 0 Global Step: 33700 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:30:37,763-Speed 2624.49 samples/sec Loss 14.9164 LearningRate 0.0920 Epoch: 0 Global Step: 33710 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:30:41,661-Speed 2627.64 samples/sec Loss 14.9317 LearningRate 0.0920 Epoch: 0 Global Step: 33720 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:30:45,558-Speed 2628.10 samples/sec Loss 15.0242 LearningRate 0.0920 Epoch: 0 Global Step: 33730 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:30:49,455-Speed 2629.02 samples/sec Loss 14.9043 LearningRate 0.0920 Epoch: 0 Global Step: 33740 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:30:53,361-Speed 2622.59 samples/sec Loss 14.9802 LearningRate 0.0920 Epoch: 0 Global Step: 33750 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:30:57,260-Speed 2627.10 samples/sec Loss 14.9454 LearningRate 0.0920 Epoch: 0 Global Step: 33760 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:31:01,156-Speed 2628.91 samples/sec Loss 14.9144 LearningRate 0.0920 Epoch: 0 Global Step: 33770 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:31:05,053-Speed 2628.22 samples/sec Loss 14.9417 LearningRate 0.0920 Epoch: 0 Global Step: 33780 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:31:08,960-Speed 2621.43 samples/sec Loss 15.0037 LearningRate 0.0920 Epoch: 0 Global Step: 33790 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:31:12,864-Speed 2623.97 samples/sec Loss 14.8254 LearningRate 0.0920 Epoch: 0 Global Step: 33800 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:31:16,761-Speed 2628.38 samples/sec Loss 14.7838 LearningRate 0.0920 Epoch: 0 Global Step: 33810 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:31:20,668-Speed 2621.60 samples/sec Loss 14.9449 LearningRate 0.0920 Epoch: 0 Global Step: 33820 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:31:24,568-Speed 2626.13 samples/sec Loss 14.8676 LearningRate 0.0920 Epoch: 0 Global Step: 33830 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:31:28,488-Speed 2613.13 samples/sec Loss 14.9192 LearningRate 0.0920 Epoch: 0 Global Step: 33840 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:31:32,380-Speed 2631.86 samples/sec Loss 14.9929 LearningRate 0.0920 Epoch: 0 Global Step: 33850 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:31:36,282-Speed 2624.48 samples/sec Loss 14.9312 LearningRate 0.0920 Epoch: 0 Global Step: 33860 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:31:40,179-Speed 2628.33 samples/sec Loss 14.7776 LearningRate 0.0920 Epoch: 0 Global Step: 33870 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:31:44,129-Speed 2593.55 samples/sec Loss 14.8612 LearningRate 0.0920 Epoch: 0 Global Step: 33880 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:31:48,019-Speed 2632.67 samples/sec Loss 14.9181 LearningRate 0.0920 Epoch: 0 Global Step: 33890 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:31:51,919-Speed 2627.06 samples/sec Loss 14.8650 LearningRate 0.0920 Epoch: 0 Global Step: 33900 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:31:55,811-Speed 2631.14 samples/sec Loss 14.9133 LearningRate 0.0920 Epoch: 0 Global Step: 33910 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:31:59,687-Speed 2643.29 samples/sec Loss 14.7749 LearningRate 0.0920 Epoch: 0 Global Step: 33920 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:32:03,585-Speed 2627.99 samples/sec Loss 14.9152 LearningRate 0.0920 Epoch: 0 Global Step: 33930 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:32:07,486-Speed 2625.11 samples/sec Loss 14.9519 LearningRate 0.0920 Epoch: 0 Global Step: 33940 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:32:11,403-Speed 2614.86 samples/sec Loss 15.0259 LearningRate 0.0920 Epoch: 0 Global Step: 33950 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:32:15,296-Speed 2631.39 samples/sec Loss 14.9772 LearningRate 0.0920 Epoch: 0 Global Step: 33960 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:32:19,201-Speed 2622.70 samples/sec Loss 14.9416 LearningRate 0.0920 Epoch: 0 Global Step: 33970 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:32:23,094-Speed 2631.02 samples/sec Loss 14.8620 LearningRate 0.0920 Epoch: 0 Global Step: 33980 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:32:26,993-Speed 2626.95 samples/sec Loss 14.8288 LearningRate 0.0920 Epoch: 0 Global Step: 33990 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:32:30,890-Speed 2628.36 samples/sec Loss 14.8550 LearningRate 0.0920 Epoch: 0 Global Step: 34000 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:32:34,784-Speed 2630.46 samples/sec Loss 14.8793 LearningRate 0.0920 Epoch: 0 Global Step: 34010 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:32:38,673-Speed 2633.29 samples/sec Loss 14.9172 LearningRate 0.0920 Epoch: 0 Global Step: 34020 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:32:42,548-Speed 2643.53 samples/sec Loss 14.9520 LearningRate 0.0920 Epoch: 0 Global Step: 34030 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:32:46,438-Speed 2633.58 samples/sec Loss 14.8393 LearningRate 0.0920 Epoch: 0 Global Step: 34040 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:32:50,336-Speed 2627.43 samples/sec Loss 14.7983 LearningRate 0.0920 Epoch: 0 Global Step: 34050 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:32:54,248-Speed 2618.56 samples/sec Loss 14.8458 LearningRate 0.0920 Epoch: 0 Global Step: 34060 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:32:58,135-Speed 2634.90 samples/sec Loss 14.8353 LearningRate 0.0920 Epoch: 0 Global Step: 34070 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:33:02,033-Speed 2627.51 samples/sec Loss 14.9244 LearningRate 0.0920 Epoch: 0 Global Step: 34080 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:33:05,931-Speed 2627.82 samples/sec Loss 14.8203 LearningRate 0.0920 Epoch: 0 Global Step: 34090 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:33:09,824-Speed 2630.97 samples/sec Loss 14.9738 LearningRate 0.0919 Epoch: 0 Global Step: 34100 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:33:13,714-Speed 2633.12 samples/sec Loss 14.8653 LearningRate 0.0919 Epoch: 0 Global Step: 34110 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:33:17,607-Speed 2631.40 samples/sec Loss 14.9405 LearningRate 0.0919 Epoch: 0 Global Step: 34120 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:33:21,508-Speed 2625.10 samples/sec Loss 14.9299 LearningRate 0.0919 Epoch: 0 Global Step: 34130 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:33:25,411-Speed 2624.83 samples/sec Loss 14.9675 LearningRate 0.0919 Epoch: 0 Global Step: 34140 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:33:29,308-Speed 2628.06 samples/sec Loss 14.7837 LearningRate 0.0919 Epoch: 0 Global Step: 34150 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:33:33,205-Speed 2628.15 samples/sec Loss 14.8411 LearningRate 0.0919 Epoch: 0 Global Step: 34160 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:33:37,269-Speed 2520.17 samples/sec Loss 14.8631 LearningRate 0.0919 Epoch: 0 Global Step: 34170 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:33:41,286-Speed 2550.33 samples/sec Loss 14.8403 LearningRate 0.0919 Epoch: 0 Global Step: 34180 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:33:45,185-Speed 2627.25 samples/sec Loss 14.9761 LearningRate 0.0919 Epoch: 0 Global Step: 34190 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:33:49,082-Speed 2627.90 samples/sec Loss 14.9360 LearningRate 0.0919 Epoch: 0 Global Step: 34200 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:33:52,985-Speed 2624.60 samples/sec Loss 14.8138 LearningRate 0.0919 Epoch: 0 Global Step: 34210 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:33:56,884-Speed 2627.51 samples/sec Loss 14.8540 LearningRate 0.0919 Epoch: 0 Global Step: 34220 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:34:00,743-Speed 2653.81 samples/sec Loss 14.9829 LearningRate 0.0919 Epoch: 0 Global Step: 34230 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:34:04,655-Speed 2618.23 samples/sec Loss 14.8847 LearningRate 0.0919 Epoch: 0 Global Step: 34240 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:34:08,547-Speed 2631.73 samples/sec Loss 14.8845 LearningRate 0.0919 Epoch: 0 Global Step: 34250 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:34:12,443-Speed 2629.36 samples/sec Loss 14.8790 LearningRate 0.0919 Epoch: 0 Global Step: 34260 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:34:16,335-Speed 2631.35 samples/sec Loss 15.0544 LearningRate 0.0919 Epoch: 0 Global Step: 34270 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:34:20,234-Speed 2627.18 samples/sec Loss 14.7535 LearningRate 0.0919 Epoch: 0 Global Step: 34280 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:34:24,137-Speed 2624.33 samples/sec Loss 14.9140 LearningRate 0.0919 Epoch: 0 Global Step: 34290 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:34:28,033-Speed 2629.21 samples/sec Loss 14.8558 LearningRate 0.0919 Epoch: 0 Global Step: 34300 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:34:31,946-Speed 2617.44 samples/sec Loss 14.7232 LearningRate 0.0919 Epoch: 0 Global Step: 34310 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:34:35,843-Speed 2628.12 samples/sec Loss 14.8525 LearningRate 0.0919 Epoch: 0 Global Step: 34320 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:34:39,755-Speed 2618.50 samples/sec Loss 14.8888 LearningRate 0.0919 Epoch: 0 Global Step: 34330 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:34:43,653-Speed 2627.49 samples/sec Loss 14.7720 LearningRate 0.0919 Epoch: 0 Global Step: 34340 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:34:47,545-Speed 2631.88 samples/sec Loss 14.9586 LearningRate 0.0919 Epoch: 0 Global Step: 34350 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:34:51,437-Speed 2631.92 samples/sec Loss 14.8251 LearningRate 0.0919 Epoch: 0 Global Step: 34360 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:34:55,329-Speed 2631.61 samples/sec Loss 14.8306 LearningRate 0.0919 Epoch: 0 Global Step: 34370 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:34:59,223-Speed 2630.45 samples/sec Loss 14.9223 LearningRate 0.0919 Epoch: 0 Global Step: 34380 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:35:03,115-Speed 2631.49 samples/sec Loss 14.9731 LearningRate 0.0919 Epoch: 0 Global Step: 34390 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:35:07,002-Speed 2635.10 samples/sec Loss 14.9957 LearningRate 0.0919 Epoch: 0 Global Step: 34400 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:35:10,887-Speed 2637.05 samples/sec Loss 15.0298 LearningRate 0.0919 Epoch: 0 Global Step: 34410 Fp16 Grad Scale: 32768 Required: 89 hours
Training: 2022-04-12 23:35:14,755-Speed 2647.93 samples/sec Loss 15.0642 LearningRate 0.0919 Epoch: 0 Global Step: 34420 Fp16 Grad Scale: 16384 Required: 89 hours
Training: 2022-04-12 23:35:18,661-Speed 2622.19 samples/sec Loss 14.6983 LearningRate 0.0919 Epoch: 0 Global Step: 34430 Fp16 Grad Scale: 16384 Required: 89 hours
Training: 2022-04-12 23:35:22,558-Speed 2628.73 samples/sec Loss 14.7714 LearningRate 0.0919 Epoch: 0 Global Step: 34440 Fp16 Grad Scale: 16384 Required: 89 hours
Training: 2022-04-12 23:35:26,482-Speed 2609.91 samples/sec Loss 14.8592 LearningRate 0.0919 Epoch: 0 Global Step: 34450 Fp16 Grad Scale: 16384 Required: 89 hours
Training: 2022-04-12 23:35:30,398-Speed 2616.69 samples/sec Loss 14.9750 LearningRate 0.0919 Epoch: 0 Global Step: 34460 Fp16 Grad Scale: 16384 Required: 89 hours
Training: 2022-04-12 23:35:34,299-Speed 2625.21 samples/sec Loss 14.9046 LearningRate 0.0919 Epoch: 0 Global Step: 34470 Fp16 Grad Scale: 16384 Required: 89 hours
Training: 2022-04-12 23:35:38,202-Speed 2624.88 samples/sec Loss 14.9236 LearningRate 0.0919 Epoch: 0 Global Step: 34480 Fp16 Grad Scale: 16384 Required: 89 hours
Training: 2022-04-12 23:35:42,149-Speed 2594.39 samples/sec Loss 14.9773 LearningRate 0.0919 Epoch: 0 Global Step: 34490 Fp16 Grad Scale: 16384 Required: 89 hours
Training: 2022-04-12 23:35:46,098-Speed 2594.47 samples/sec Loss 14.7090 LearningRate 0.0919 Epoch: 0 Global Step: 34500 Fp16 Grad Scale: 16384 Required: 89 hours
Training: 2022-04-12 23:35:49,999-Speed 2626.04 samples/sec Loss 14.7299 LearningRate 0.0919 Epoch: 0 Global Step: 34510 Fp16 Grad Scale: 16384 Required: 89 hours
Training: 2022-04-12 23:35:53,935-Speed 2601.74 samples/sec Loss 15.0252 LearningRate 0.0919 Epoch: 0 Global Step: 34520 Fp16 Grad Scale: 32768 Required: 89 hours
Training: 2022-04-12 23:35:57,828-Speed 2631.30 samples/sec Loss 14.7789 LearningRate 0.0918 Epoch: 0 Global Step: 34530 Fp16 Grad Scale: 32768 Required: 89 hours
Training: 2022-04-12 23:36:01,723-Speed 2629.86 samples/sec Loss 14.9176 LearningRate 0.0918 Epoch: 0 Global Step: 34540 Fp16 Grad Scale: 32768 Required: 89 hours
Training: 2022-04-12 23:36:05,625-Speed 2624.53 samples/sec Loss 14.8577 LearningRate 0.0918 Epoch: 0 Global Step: 34550 Fp16 Grad Scale: 32768 Required: 89 hours
Training: 2022-04-12 23:36:09,523-Speed 2627.70 samples/sec Loss 14.8301 LearningRate 0.0918 Epoch: 0 Global Step: 34560 Fp16 Grad Scale: 32768 Required: 89 hours
Training: 2022-04-12 23:36:13,417-Speed 2630.86 samples/sec Loss 14.9119 LearningRate 0.0918 Epoch: 0 Global Step: 34570 Fp16 Grad Scale: 32768 Required: 89 hours
Training: 2022-04-12 23:36:17,316-Speed 2626.67 samples/sec Loss 14.7277 LearningRate 0.0918 Epoch: 0 Global Step: 34580 Fp16 Grad Scale: 32768 Required: 89 hours
Training: 2022-04-12 23:36:21,211-Speed 2629.62 samples/sec Loss 14.8982 LearningRate 0.0918 Epoch: 0 Global Step: 34590 Fp16 Grad Scale: 32768 Required: 89 hours
Training: 2022-04-12 23:36:25,115-Speed 2623.71 samples/sec Loss 14.8364 LearningRate 0.0918 Epoch: 0 Global Step: 34600 Fp16 Grad Scale: 32768 Required: 89 hours
Training: 2022-04-12 23:36:29,009-Speed 2630.38 samples/sec Loss 14.8483 LearningRate 0.0918 Epoch: 0 Global Step: 34610 Fp16 Grad Scale: 32768 Required: 89 hours
Training: 2022-04-12 23:36:32,908-Speed 2627.42 samples/sec Loss 14.8890 LearningRate 0.0918 Epoch: 0 Global Step: 34620 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:36:36,800-Speed 2631.33 samples/sec Loss 14.8503 LearningRate 0.0918 Epoch: 0 Global Step: 34630 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:36:40,694-Speed 2629.77 samples/sec Loss 14.7516 LearningRate 0.0918 Epoch: 0 Global Step: 34640 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:36:44,590-Speed 2630.19 samples/sec Loss 14.8353 LearningRate 0.0918 Epoch: 0 Global Step: 34650 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:36:48,495-Speed 2622.19 samples/sec Loss 14.9055 LearningRate 0.0918 Epoch: 0 Global Step: 34660 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:36:52,402-Speed 2621.93 samples/sec Loss 14.8726 LearningRate 0.0918 Epoch: 0 Global Step: 34670 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:36:56,301-Speed 2626.65 samples/sec Loss 14.8647 LearningRate 0.0918 Epoch: 0 Global Step: 34680 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:37:00,214-Speed 2617.99 samples/sec Loss 14.7552 LearningRate 0.0918 Epoch: 0 Global Step: 34690 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:37:04,109-Speed 2629.49 samples/sec Loss 14.8017 LearningRate 0.0918 Epoch: 0 Global Step: 34700 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:37:08,001-Speed 2631.73 samples/sec Loss 14.7827 LearningRate 0.0918 Epoch: 0 Global Step: 34710 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:37:11,896-Speed 2629.68 samples/sec Loss 14.8726 LearningRate 0.0918 Epoch: 0 Global Step: 34720 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:37:15,790-Speed 2629.95 samples/sec Loss 14.9195 LearningRate 0.0918 Epoch: 0 Global Step: 34730 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:37:19,690-Speed 2625.88 samples/sec Loss 14.9369 LearningRate 0.0918 Epoch: 0 Global Step: 34740 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:37:23,583-Speed 2631.64 samples/sec Loss 14.8872 LearningRate 0.0918 Epoch: 0 Global Step: 34750 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:37:27,477-Speed 2629.97 samples/sec Loss 14.6997 LearningRate 0.0918 Epoch: 0 Global Step: 34760 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:37:31,373-Speed 2629.24 samples/sec Loss 14.7657 LearningRate 0.0918 Epoch: 0 Global Step: 34770 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:37:35,269-Speed 2629.07 samples/sec Loss 14.8521 LearningRate 0.0918 Epoch: 0 Global Step: 34780 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:37:39,161-Speed 2631.78 samples/sec Loss 14.7675 LearningRate 0.0918 Epoch: 0 Global Step: 34790 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:37:43,189-Speed 2542.68 samples/sec Loss 14.8349 LearningRate 0.0918 Epoch: 0 Global Step: 34800 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:37:47,087-Speed 2626.93 samples/sec Loss 14.9033 LearningRate 0.0918 Epoch: 0 Global Step: 34810 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:37:50,989-Speed 2625.01 samples/sec Loss 14.8078 LearningRate 0.0918 Epoch: 0 Global Step: 34820 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:37:54,889-Speed 2626.49 samples/sec Loss 14.7687 LearningRate 0.0918 Epoch: 0 Global Step: 34830 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:37:58,794-Speed 2622.81 samples/sec Loss 14.9143 LearningRate 0.0918 Epoch: 0 Global Step: 34840 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:38:02,697-Speed 2624.04 samples/sec Loss 14.7909 LearningRate 0.0918 Epoch: 0 Global Step: 34850 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:38:06,597-Speed 2626.17 samples/sec Loss 14.8241 LearningRate 0.0918 Epoch: 0 Global Step: 34860 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:38:10,492-Speed 2629.37 samples/sec Loss 14.8256 LearningRate 0.0918 Epoch: 0 Global Step: 34870 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:38:14,406-Speed 2617.80 samples/sec Loss 14.8818 LearningRate 0.0918 Epoch: 0 Global Step: 34880 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:38:18,323-Speed 2614.88 samples/sec Loss 14.8138 LearningRate 0.0918 Epoch: 0 Global Step: 34890 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:38:22,232-Speed 2620.08 samples/sec Loss 14.8169 LearningRate 0.0918 Epoch: 0 Global Step: 34900 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:38:26,180-Speed 2594.25 samples/sec Loss 14.7869 LearningRate 0.0918 Epoch: 0 Global Step: 34910 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:38:30,064-Speed 2637.60 samples/sec Loss 14.8165 LearningRate 0.0918 Epoch: 0 Global Step: 34920 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:38:33,968-Speed 2623.22 samples/sec Loss 14.8856 LearningRate 0.0918 Epoch: 0 Global Step: 34930 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:38:37,869-Speed 2626.02 samples/sec Loss 14.8995 LearningRate 0.0918 Epoch: 0 Global Step: 34940 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:38:41,763-Speed 2630.74 samples/sec Loss 14.8872 LearningRate 0.0918 Epoch: 0 Global Step: 34950 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:38:45,665-Speed 2624.80 samples/sec Loss 14.8836 LearningRate 0.0917 Epoch: 0 Global Step: 34960 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:38:49,562-Speed 2628.03 samples/sec Loss 14.8023 LearningRate 0.0917 Epoch: 0 Global Step: 34970 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:38:53,457-Speed 2629.97 samples/sec Loss 14.6967 LearningRate 0.0917 Epoch: 0 Global Step: 34980 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:38:57,364-Speed 2621.70 samples/sec Loss 14.9274 LearningRate 0.0917 Epoch: 0 Global Step: 34990 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:39:01,267-Speed 2623.95 samples/sec Loss 14.8785 LearningRate 0.0917 Epoch: 0 Global Step: 35000 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:39:05,169-Speed 2624.91 samples/sec Loss 14.7803 LearningRate 0.0917 Epoch: 0 Global Step: 35010 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:39:09,066-Speed 2628.62 samples/sec Loss 14.7869 LearningRate 0.0917 Epoch: 0 Global Step: 35020 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:39:12,981-Speed 2616.25 samples/sec Loss 14.9270 LearningRate 0.0917 Epoch: 0 Global Step: 35030 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:39:16,902-Speed 2612.28 samples/sec Loss 14.8249 LearningRate 0.0917 Epoch: 0 Global Step: 35040 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:39:20,828-Speed 2608.51 samples/sec Loss 14.7531 LearningRate 0.0917 Epoch: 0 Global Step: 35050 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:39:24,725-Speed 2629.00 samples/sec Loss 14.8089 LearningRate 0.0917 Epoch: 0 Global Step: 35060 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:39:28,627-Speed 2625.21 samples/sec Loss 14.7624 LearningRate 0.0917 Epoch: 0 Global Step: 35070 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:39:32,541-Speed 2616.12 samples/sec Loss 14.5531 LearningRate 0.0917 Epoch: 0 Global Step: 35080 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:39:36,435-Speed 2630.54 samples/sec Loss 14.8272 LearningRate 0.0917 Epoch: 0 Global Step: 35090 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:39:40,341-Speed 2622.50 samples/sec Loss 14.7491 LearningRate 0.0917 Epoch: 0 Global Step: 35100 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:39:44,246-Speed 2622.98 samples/sec Loss 14.7429 LearningRate 0.0917 Epoch: 0 Global Step: 35110 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:39:48,197-Speed 2592.44 samples/sec Loss 14.7747 LearningRate 0.0917 Epoch: 0 Global Step: 35120 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:39:52,129-Speed 2604.79 samples/sec Loss 14.8959 LearningRate 0.0917 Epoch: 0 Global Step: 35130 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:39:56,023-Speed 2630.39 samples/sec Loss 14.9822 LearningRate 0.0917 Epoch: 0 Global Step: 35140 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:39:59,921-Speed 2627.62 samples/sec Loss 14.8307 LearningRate 0.0917 Epoch: 0 Global Step: 35150 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:40:03,821-Speed 2626.09 samples/sec Loss 14.7799 LearningRate 0.0917 Epoch: 0 Global Step: 35160 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:40:07,730-Speed 2620.29 samples/sec Loss 14.8384 LearningRate 0.0917 Epoch: 0 Global Step: 35170 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:40:11,629-Speed 2627.60 samples/sec Loss 14.8196 LearningRate 0.0917 Epoch: 0 Global Step: 35180 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:40:15,529-Speed 2626.43 samples/sec Loss 14.6865 LearningRate 0.0917 Epoch: 0 Global Step: 35190 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:40:19,429-Speed 2625.82 samples/sec Loss 14.6684 LearningRate 0.0917 Epoch: 0 Global Step: 35200 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:40:23,326-Speed 2629.06 samples/sec Loss 14.5941 LearningRate 0.0917 Epoch: 0 Global Step: 35210 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:40:27,206-Speed 2639.19 samples/sec Loss 14.6704 LearningRate 0.0917 Epoch: 0 Global Step: 35220 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:40:31,104-Speed 2627.76 samples/sec Loss 14.9183 LearningRate 0.0917 Epoch: 0 Global Step: 35230 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:40:34,999-Speed 2629.82 samples/sec Loss 14.8450 LearningRate 0.0917 Epoch: 0 Global Step: 35240 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:40:38,894-Speed 2629.72 samples/sec Loss 14.7977 LearningRate 0.0917 Epoch: 0 Global Step: 35250 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:40:42,787-Speed 2631.16 samples/sec Loss 14.7694 LearningRate 0.0917 Epoch: 0 Global Step: 35260 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:40:46,679-Speed 2632.52 samples/sec Loss 14.8131 LearningRate 0.0917 Epoch: 0 Global Step: 35270 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:40:50,578-Speed 2626.46 samples/sec Loss 14.7898 LearningRate 0.0917 Epoch: 0 Global Step: 35280 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:40:54,472-Speed 2630.36 samples/sec Loss 14.8016 LearningRate 0.0917 Epoch: 0 Global Step: 35290 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:40:58,367-Speed 2629.63 samples/sec Loss 14.7053 LearningRate 0.0917 Epoch: 0 Global Step: 35300 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:41:02,264-Speed 2628.55 samples/sec Loss 14.7876 LearningRate 0.0917 Epoch: 0 Global Step: 35310 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:41:06,163-Speed 2626.86 samples/sec Loss 14.7540 LearningRate 0.0917 Epoch: 0 Global Step: 35320 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:41:10,060-Speed 2628.17 samples/sec Loss 14.8242 LearningRate 0.0917 Epoch: 0 Global Step: 35330 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:41:13,959-Speed 2627.07 samples/sec Loss 14.7867 LearningRate 0.0917 Epoch: 0 Global Step: 35340 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:41:17,863-Speed 2623.27 samples/sec Loss 14.8764 LearningRate 0.0917 Epoch: 0 Global Step: 35350 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:41:21,766-Speed 2624.86 samples/sec Loss 14.6531 LearningRate 0.0917 Epoch: 0 Global Step: 35360 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:41:25,664-Speed 2627.77 samples/sec Loss 14.8244 LearningRate 0.0917 Epoch: 0 Global Step: 35370 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:41:29,588-Speed 2610.13 samples/sec Loss 14.8147 LearningRate 0.0917 Epoch: 0 Global Step: 35380 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:41:33,532-Speed 2597.20 samples/sec Loss 14.6142 LearningRate 0.0916 Epoch: 0 Global Step: 35390 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:41:37,440-Speed 2620.79 samples/sec Loss 14.7889 LearningRate 0.0916 Epoch: 0 Global Step: 35400 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:41:41,351-Speed 2618.71 samples/sec Loss 14.7567 LearningRate 0.0916 Epoch: 0 Global Step: 35410 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:41:45,264-Speed 2617.70 samples/sec Loss 14.6779 LearningRate 0.0916 Epoch: 0 Global Step: 35420 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-12 23:41:49,160-Speed 2629.14 samples/sec Loss 14.5487 LearningRate 0.0916 Epoch: 0 Global Step: 35430 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:41:53,072-Speed 2618.31 samples/sec Loss 14.7968 LearningRate 0.0916 Epoch: 0 Global Step: 35440 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:41:56,991-Speed 2613.52 samples/sec Loss 14.7568 LearningRate 0.0916 Epoch: 0 Global Step: 35450 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:42:00,918-Speed 2608.84 samples/sec Loss 14.7958 LearningRate 0.0916 Epoch: 0 Global Step: 35460 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:42:04,828-Speed 2619.47 samples/sec Loss 14.7047 LearningRate 0.0916 Epoch: 0 Global Step: 35470 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:42:08,765-Speed 2601.09 samples/sec Loss 14.7779 LearningRate 0.0916 Epoch: 0 Global Step: 35480 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:42:12,687-Speed 2611.80 samples/sec Loss 14.6305 LearningRate 0.0916 Epoch: 0 Global Step: 35490 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:42:16,606-Speed 2614.41 samples/sec Loss 14.7546 LearningRate 0.0916 Epoch: 0 Global Step: 35500 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:42:20,513-Speed 2621.29 samples/sec Loss 14.7602 LearningRate 0.0916 Epoch: 0 Global Step: 35510 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:42:24,427-Speed 2617.08 samples/sec Loss 15.0261 LearningRate 0.0916 Epoch: 0 Global Step: 35520 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:42:28,322-Speed 2629.78 samples/sec Loss 14.8546 LearningRate 0.0916 Epoch: 0 Global Step: 35530 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:42:32,217-Speed 2629.86 samples/sec Loss 14.6530 LearningRate 0.0916 Epoch: 0 Global Step: 35540 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:42:36,121-Speed 2623.57 samples/sec Loss 14.9561 LearningRate 0.0916 Epoch: 0 Global Step: 35550 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:42:40,015-Speed 2630.26 samples/sec Loss 14.8330 LearningRate 0.0916 Epoch: 0 Global Step: 35560 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:42:43,910-Speed 2629.69 samples/sec Loss 14.8982 LearningRate 0.0916 Epoch: 0 Global Step: 35570 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:42:47,804-Speed 2630.09 samples/sec Loss 14.6497 LearningRate 0.0916 Epoch: 0 Global Step: 35580 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:42:51,704-Speed 2626.80 samples/sec Loss 14.8185 LearningRate 0.0916 Epoch: 0 Global Step: 35590 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:42:55,600-Speed 2628.63 samples/sec Loss 14.8001 LearningRate 0.0916 Epoch: 0 Global Step: 35600 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:42:59,500-Speed 2626.59 samples/sec Loss 14.7349 LearningRate 0.0916 Epoch: 0 Global Step: 35610 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:43:03,378-Speed 2641.09 samples/sec Loss 14.7020 LearningRate 0.0916 Epoch: 0 Global Step: 35620 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:43:07,279-Speed 2625.24 samples/sec Loss 14.6863 LearningRate 0.0916 Epoch: 0 Global Step: 35630 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:43:11,175-Speed 2629.23 samples/sec Loss 14.8074 LearningRate 0.0916 Epoch: 0 Global Step: 35640 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:43:15,099-Speed 2610.23 samples/sec Loss 14.8169 LearningRate 0.0916 Epoch: 0 Global Step: 35650 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:43:19,103-Speed 2558.38 samples/sec Loss 14.8387 LearningRate 0.0916 Epoch: 0 Global Step: 35660 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:43:23,008-Speed 2623.18 samples/sec Loss 14.7620 LearningRate 0.0916 Epoch: 0 Global Step: 35670 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:43:26,909-Speed 2625.10 samples/sec Loss 14.6783 LearningRate 0.0916 Epoch: 0 Global Step: 35680 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:43:30,807-Speed 2627.77 samples/sec Loss 14.5051 LearningRate 0.0916 Epoch: 0 Global Step: 35690 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:43:34,706-Speed 2627.11 samples/sec Loss 14.8211 LearningRate 0.0916 Epoch: 0 Global Step: 35700 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:43:38,608-Speed 2625.19 samples/sec Loss 14.7765 LearningRate 0.0916 Epoch: 0 Global Step: 35710 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:43:42,516-Speed 2620.49 samples/sec Loss 14.7954 LearningRate 0.0916 Epoch: 0 Global Step: 35720 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:43:46,418-Speed 2625.03 samples/sec Loss 14.6334 LearningRate 0.0916 Epoch: 0 Global Step: 35730 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:43:50,313-Speed 2629.51 samples/sec Loss 14.6375 LearningRate 0.0916 Epoch: 0 Global Step: 35740 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:43:54,213-Speed 2626.55 samples/sec Loss 14.9458 LearningRate 0.0916 Epoch: 0 Global Step: 35750 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:43:58,110-Speed 2628.60 samples/sec Loss 14.7191 LearningRate 0.0916 Epoch: 0 Global Step: 35760 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:44:02,031-Speed 2612.32 samples/sec Loss 14.9241 LearningRate 0.0916 Epoch: 0 Global Step: 35770 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:44:05,954-Speed 2610.90 samples/sec Loss 14.6070 LearningRate 0.0916 Epoch: 0 Global Step: 35780 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:44:09,851-Speed 2627.61 samples/sec Loss 14.5872 LearningRate 0.0916 Epoch: 0 Global Step: 35790 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:44:13,731-Speed 2639.67 samples/sec Loss 14.7449 LearningRate 0.0916 Epoch: 0 Global Step: 35800 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:44:17,633-Speed 2625.11 samples/sec Loss 14.7854 LearningRate 0.0916 Epoch: 0 Global Step: 35810 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:44:21,529-Speed 2629.40 samples/sec Loss 14.5033 LearningRate 0.0916 Epoch: 0 Global Step: 35820 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:44:25,424-Speed 2630.12 samples/sec Loss 14.8006 LearningRate 0.0915 Epoch: 0 Global Step: 35830 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:44:29,320-Speed 2628.46 samples/sec Loss 14.7711 LearningRate 0.0915 Epoch: 0 Global Step: 35840 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:44:33,217-Speed 2628.58 samples/sec Loss 14.7637 LearningRate 0.0915 Epoch: 0 Global Step: 35850 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:44:37,111-Speed 2629.91 samples/sec Loss 14.7950 LearningRate 0.0915 Epoch: 0 Global Step: 35860 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:44:41,006-Speed 2630.37 samples/sec Loss 14.7557 LearningRate 0.0915 Epoch: 0 Global Step: 35870 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:44:44,904-Speed 2627.35 samples/sec Loss 14.8373 LearningRate 0.0915 Epoch: 0 Global Step: 35880 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:44:48,798-Speed 2630.65 samples/sec Loss 14.4993 LearningRate 0.0915 Epoch: 0 Global Step: 35890 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:44:52,676-Speed 2640.97 samples/sec Loss 14.7549 LearningRate 0.0915 Epoch: 0 Global Step: 35900 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:44:56,574-Speed 2627.67 samples/sec Loss 14.7415 LearningRate 0.0915 Epoch: 0 Global Step: 35910 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:45:00,467-Speed 2631.41 samples/sec Loss 14.8551 LearningRate 0.0915 Epoch: 0 Global Step: 35920 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:45:04,360-Speed 2630.31 samples/sec Loss 14.6481 LearningRate 0.0915 Epoch: 0 Global Step: 35930 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:45:08,253-Speed 2630.81 samples/sec Loss 14.8105 LearningRate 0.0915 Epoch: 0 Global Step: 35940 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:45:12,148-Speed 2630.00 samples/sec Loss 14.7318 LearningRate 0.0915 Epoch: 0 Global Step: 35950 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:45:16,046-Speed 2628.06 samples/sec Loss 14.7945 LearningRate 0.0915 Epoch: 0 Global Step: 35960 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:45:19,941-Speed 2629.31 samples/sec Loss 14.7660 LearningRate 0.0915 Epoch: 0 Global Step: 35970 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:45:23,845-Speed 2623.73 samples/sec Loss 14.6821 LearningRate 0.0915 Epoch: 0 Global Step: 35980 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:45:27,748-Speed 2624.40 samples/sec Loss 14.6974 LearningRate 0.0915 Epoch: 0 Global Step: 35990 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:45:31,650-Speed 2625.01 samples/sec Loss 14.6140 LearningRate 0.0915 Epoch: 0 Global Step: 36000 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:45:35,510-Speed 2653.71 samples/sec Loss 14.6313 LearningRate 0.0915 Epoch: 0 Global Step: 36010 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:45:39,402-Speed 2631.65 samples/sec Loss 14.5851 LearningRate 0.0915 Epoch: 0 Global Step: 36020 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:45:43,296-Speed 2630.02 samples/sec Loss 14.7099 LearningRate 0.0915 Epoch: 0 Global Step: 36030 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:45:47,189-Speed 2630.81 samples/sec Loss 14.7882 LearningRate 0.0915 Epoch: 0 Global Step: 36040 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:45:51,086-Speed 2628.58 samples/sec Loss 14.8086 LearningRate 0.0915 Epoch: 0 Global Step: 36050 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:45:54,984-Speed 2627.45 samples/sec Loss 14.8540 LearningRate 0.0915 Epoch: 0 Global Step: 36060 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:45:58,879-Speed 2630.04 samples/sec Loss 14.6625 LearningRate 0.0915 Epoch: 0 Global Step: 36070 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:46:02,772-Speed 2630.79 samples/sec Loss 14.8676 LearningRate 0.0915 Epoch: 0 Global Step: 36080 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:46:06,663-Speed 2632.15 samples/sec Loss 14.6883 LearningRate 0.0915 Epoch: 0 Global Step: 36090 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:46:10,555-Speed 2631.65 samples/sec Loss 14.8369 LearningRate 0.0915 Epoch: 0 Global Step: 36100 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:46:14,447-Speed 2632.15 samples/sec Loss 14.7159 LearningRate 0.0915 Epoch: 0 Global Step: 36110 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:46:18,346-Speed 2626.85 samples/sec Loss 14.7730 LearningRate 0.0915 Epoch: 0 Global Step: 36120 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:46:22,242-Speed 2628.88 samples/sec Loss 14.6378 LearningRate 0.0915 Epoch: 0 Global Step: 36130 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:46:26,130-Speed 2634.58 samples/sec Loss 14.7299 LearningRate 0.0915 Epoch: 0 Global Step: 36140 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:46:30,050-Speed 2612.71 samples/sec Loss 14.7922 LearningRate 0.0915 Epoch: 0 Global Step: 36150 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:46:33,961-Speed 2619.36 samples/sec Loss 14.7076 LearningRate 0.0915 Epoch: 0 Global Step: 36160 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:46:37,859-Speed 2627.38 samples/sec Loss 14.7891 LearningRate 0.0915 Epoch: 0 Global Step: 36170 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:46:41,751-Speed 2631.59 samples/sec Loss 14.7107 LearningRate 0.0915 Epoch: 0 Global Step: 36180 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:46:45,644-Speed 2631.03 samples/sec Loss 14.6643 LearningRate 0.0915 Epoch: 0 Global Step: 36190 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:46:49,537-Speed 2630.43 samples/sec Loss 14.7183 LearningRate 0.0915 Epoch: 0 Global Step: 36200 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:46:53,439-Speed 2625.19 samples/sec Loss 14.7547 LearningRate 0.0915 Epoch: 0 Global Step: 36210 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:46:57,344-Speed 2622.89 samples/sec Loss 14.7637 LearningRate 0.0915 Epoch: 0 Global Step: 36220 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:47:01,244-Speed 2626.23 samples/sec Loss 14.5793 LearningRate 0.0915 Epoch: 0 Global Step: 36230 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-12 23:47:05,141-Speed 2628.58 samples/sec Loss 14.6168 LearningRate 0.0915 Epoch: 0 Global Step: 36240 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:47:09,033-Speed 2631.26 samples/sec Loss 14.6032 LearningRate 0.0915 Epoch: 0 Global Step: 36250 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:47:12,927-Speed 2630.26 samples/sec Loss 14.8354 LearningRate 0.0914 Epoch: 0 Global Step: 36260 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:47:16,821-Speed 2630.73 samples/sec Loss 14.7463 LearningRate 0.0914 Epoch: 0 Global Step: 36270 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:47:20,716-Speed 2629.46 samples/sec Loss 14.7268 LearningRate 0.0914 Epoch: 0 Global Step: 36280 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:47:24,614-Speed 2627.68 samples/sec Loss 14.5731 LearningRate 0.0914 Epoch: 0 Global Step: 36290 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:47:28,514-Speed 2626.12 samples/sec Loss 14.7121 LearningRate 0.0914 Epoch: 0 Global Step: 36300 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:47:32,411-Speed 2628.31 samples/sec Loss 14.7462 LearningRate 0.0914 Epoch: 0 Global Step: 36310 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:47:36,309-Speed 2627.63 samples/sec Loss 14.7273 LearningRate 0.0914 Epoch: 0 Global Step: 36320 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:47:40,205-Speed 2628.99 samples/sec Loss 14.8089 LearningRate 0.0914 Epoch: 0 Global Step: 36330 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:47:44,100-Speed 2629.69 samples/sec Loss 14.6989 LearningRate 0.0914 Epoch: 0 Global Step: 36340 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:47:47,997-Speed 2628.26 samples/sec Loss 14.6655 LearningRate 0.0914 Epoch: 0 Global Step: 36350 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:47:51,899-Speed 2625.55 samples/sec Loss 14.6791 LearningRate 0.0914 Epoch: 0 Global Step: 36360 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:47:55,800-Speed 2625.26 samples/sec Loss 14.7695 LearningRate 0.0914 Epoch: 0 Global Step: 36370 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:47:59,705-Speed 2623.06 samples/sec Loss 14.6149 LearningRate 0.0914 Epoch: 0 Global Step: 36380 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:48:03,636-Speed 2605.51 samples/sec Loss 14.6726 LearningRate 0.0914 Epoch: 0 Global Step: 36390 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:48:07,562-Speed 2608.70 samples/sec Loss 14.5970 LearningRate 0.0914 Epoch: 0 Global Step: 36400 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:48:11,478-Speed 2615.68 samples/sec Loss 14.3835 LearningRate 0.0914 Epoch: 0 Global Step: 36410 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:48:15,387-Speed 2620.75 samples/sec Loss 14.6430 LearningRate 0.0914 Epoch: 0 Global Step: 36420 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:48:19,289-Speed 2624.49 samples/sec Loss 14.7175 LearningRate 0.0914 Epoch: 0 Global Step: 36430 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:48:23,168-Speed 2640.37 samples/sec Loss 14.6186 LearningRate 0.0914 Epoch: 0 Global Step: 36440 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:48:27,083-Speed 2616.78 samples/sec Loss 14.8224 LearningRate 0.0914 Epoch: 0 Global Step: 36450 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:48:30,976-Speed 2631.05 samples/sec Loss 14.6203 LearningRate 0.0914 Epoch: 0 Global Step: 36460 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:48:34,877-Speed 2626.31 samples/sec Loss 14.7239 LearningRate 0.0914 Epoch: 0 Global Step: 36470 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:48:38,773-Speed 2628.74 samples/sec Loss 14.5686 LearningRate 0.0914 Epoch: 0 Global Step: 36480 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:48:42,668-Speed 2629.56 samples/sec Loss 14.6102 LearningRate 0.0914 Epoch: 0 Global Step: 36490 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:48:46,567-Speed 2627.08 samples/sec Loss 14.7176 LearningRate 0.0914 Epoch: 0 Global Step: 36500 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:48:50,470-Speed 2624.21 samples/sec Loss 14.7307 LearningRate 0.0914 Epoch: 0 Global Step: 36510 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:48:54,364-Speed 2629.94 samples/sec Loss 14.7969 LearningRate 0.0914 Epoch: 0 Global Step: 36520 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:48:58,259-Speed 2629.57 samples/sec Loss 14.5990 LearningRate 0.0914 Epoch: 0 Global Step: 36530 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:49:02,153-Speed 2630.79 samples/sec Loss 14.6973 LearningRate 0.0914 Epoch: 0 Global Step: 36540 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:49:06,048-Speed 2630.13 samples/sec Loss 14.7184 LearningRate 0.0914 Epoch: 0 Global Step: 36550 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:49:09,944-Speed 2628.44 samples/sec Loss 14.7486 LearningRate 0.0914 Epoch: 0 Global Step: 36560 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:49:13,827-Speed 2637.43 samples/sec Loss 14.6947 LearningRate 0.0914 Epoch: 0 Global Step: 36570 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:49:17,721-Speed 2630.30 samples/sec Loss 14.6734 LearningRate 0.0914 Epoch: 0 Global Step: 36580 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:49:21,628-Speed 2621.94 samples/sec Loss 14.6767 LearningRate 0.0914 Epoch: 0 Global Step: 36590 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:49:25,523-Speed 2629.52 samples/sec Loss 14.6934 LearningRate 0.0914 Epoch: 0 Global Step: 36600 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:49:29,416-Speed 2631.52 samples/sec Loss 14.6441 LearningRate 0.0914 Epoch: 0 Global Step: 36610 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:49:33,312-Speed 2628.97 samples/sec Loss 14.6468 LearningRate 0.0914 Epoch: 0 Global Step: 36620 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:49:37,209-Speed 2628.28 samples/sec Loss 14.7152 LearningRate 0.0914 Epoch: 0 Global Step: 36630 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:49:41,112-Speed 2624.33 samples/sec Loss 14.5896 LearningRate 0.0914 Epoch: 0 Global Step: 36640 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:49:45,001-Speed 2633.27 samples/sec Loss 14.5754 LearningRate 0.0914 Epoch: 0 Global Step: 36650 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:49:48,904-Speed 2624.15 samples/sec Loss 14.7323 LearningRate 0.0914 Epoch: 0 Global Step: 36660 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:49:52,798-Speed 2631.16 samples/sec Loss 14.7414 LearningRate 0.0914 Epoch: 0 Global Step: 36670 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:49:56,697-Speed 2627.28 samples/sec Loss 14.6607 LearningRate 0.0914 Epoch: 0 Global Step: 36680 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:50:00,606-Speed 2620.10 samples/sec Loss 14.6406 LearningRate 0.0914 Epoch: 0 Global Step: 36690 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:50:04,504-Speed 2627.54 samples/sec Loss 14.5548 LearningRate 0.0913 Epoch: 0 Global Step: 36700 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:50:08,407-Speed 2624.55 samples/sec Loss 14.7192 LearningRate 0.0913 Epoch: 0 Global Step: 36710 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:50:12,289-Speed 2638.08 samples/sec Loss 14.7613 LearningRate 0.0913 Epoch: 0 Global Step: 36720 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:50:16,182-Speed 2631.57 samples/sec Loss 14.5456 LearningRate 0.0913 Epoch: 0 Global Step: 36730 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:50:20,076-Speed 2630.26 samples/sec Loss 14.7717 LearningRate 0.0913 Epoch: 0 Global Step: 36740 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:50:23,975-Speed 2627.26 samples/sec Loss 14.7567 LearningRate 0.0913 Epoch: 0 Global Step: 36750 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:50:27,870-Speed 2629.91 samples/sec Loss 14.4593 LearningRate 0.0913 Epoch: 0 Global Step: 36760 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:50:31,777-Speed 2621.18 samples/sec Loss 14.6775 LearningRate 0.0913 Epoch: 0 Global Step: 36770 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:50:35,690-Speed 2618.06 samples/sec Loss 14.6185 LearningRate 0.0913 Epoch: 0 Global Step: 36780 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:50:39,584-Speed 2629.83 samples/sec Loss 14.6031 LearningRate 0.0913 Epoch: 0 Global Step: 36790 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:50:43,485-Speed 2625.90 samples/sec Loss 14.7775 LearningRate 0.0913 Epoch: 0 Global Step: 36800 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:50:47,393-Speed 2621.23 samples/sec Loss 14.5796 LearningRate 0.0913 Epoch: 0 Global Step: 36810 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:50:51,275-Speed 2638.15 samples/sec Loss 14.6611 LearningRate 0.0913 Epoch: 0 Global Step: 36820 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:50:55,172-Speed 2628.94 samples/sec Loss 14.6382 LearningRate 0.0913 Epoch: 0 Global Step: 36830 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:50:59,070-Speed 2627.44 samples/sec Loss 14.6532 LearningRate 0.0913 Epoch: 0 Global Step: 36840 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:51:02,978-Speed 2620.43 samples/sec Loss 14.5723 LearningRate 0.0913 Epoch: 0 Global Step: 36850 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:51:06,865-Speed 2634.72 samples/sec Loss 14.7697 LearningRate 0.0913 Epoch: 0 Global Step: 36860 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:51:10,772-Speed 2622.37 samples/sec Loss 14.5479 LearningRate 0.0913 Epoch: 0 Global Step: 36870 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:51:14,672-Speed 2625.91 samples/sec Loss 14.6375 LearningRate 0.0913 Epoch: 0 Global Step: 36880 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:51:18,580-Speed 2620.91 samples/sec Loss 14.7395 LearningRate 0.0913 Epoch: 0 Global Step: 36890 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:51:22,477-Speed 2628.20 samples/sec Loss 14.7585 LearningRate 0.0913 Epoch: 0 Global Step: 36900 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:51:26,376-Speed 2627.25 samples/sec Loss 14.6799 LearningRate 0.0913 Epoch: 0 Global Step: 36910 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:51:30,277-Speed 2625.35 samples/sec Loss 14.8267 LearningRate 0.0913 Epoch: 0 Global Step: 36920 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:51:34,178-Speed 2625.52 samples/sec Loss 14.7566 LearningRate 0.0913 Epoch: 0 Global Step: 36930 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:51:38,117-Speed 2600.49 samples/sec Loss 14.4812 LearningRate 0.0913 Epoch: 0 Global Step: 36940 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:51:42,052-Speed 2602.25 samples/sec Loss 14.4902 LearningRate 0.0913 Epoch: 0 Global Step: 36950 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:51:45,950-Speed 2628.25 samples/sec Loss 14.6866 LearningRate 0.0913 Epoch: 0 Global Step: 36960 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:51:49,848-Speed 2627.51 samples/sec Loss 14.6051 LearningRate 0.0913 Epoch: 0 Global Step: 36970 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:51:53,749-Speed 2626.09 samples/sec Loss 14.5708 LearningRate 0.0913 Epoch: 0 Global Step: 36980 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:51:57,659-Speed 2619.68 samples/sec Loss 14.6013 LearningRate 0.0913 Epoch: 0 Global Step: 36990 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:52:01,558-Speed 2627.34 samples/sec Loss 14.6940 LearningRate 0.0913 Epoch: 0 Global Step: 37000 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:52:05,458-Speed 2626.03 samples/sec Loss 14.5649 LearningRate 0.0913 Epoch: 0 Global Step: 37010 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:52:09,354-Speed 2628.63 samples/sec Loss 14.6485 LearningRate 0.0913 Epoch: 0 Global Step: 37020 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:52:13,278-Speed 2610.24 samples/sec Loss 14.6138 LearningRate 0.0913 Epoch: 0 Global Step: 37030 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:52:17,180-Speed 2625.25 samples/sec Loss 14.7111 LearningRate 0.0913 Epoch: 0 Global Step: 37040 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:52:21,063-Speed 2638.01 samples/sec Loss 14.5790 LearningRate 0.0913 Epoch: 0 Global Step: 37050 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:52:24,963-Speed 2626.50 samples/sec Loss 14.5320 LearningRate 0.0913 Epoch: 0 Global Step: 37060 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:52:28,873-Speed 2619.17 samples/sec Loss 14.6542 LearningRate 0.0913 Epoch: 0 Global Step: 37070 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:52:32,772-Speed 2627.34 samples/sec Loss 14.6862 LearningRate 0.0913 Epoch: 0 Global Step: 37080 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:52:36,682-Speed 2619.30 samples/sec Loss 14.6376 LearningRate 0.0913 Epoch: 0 Global Step: 37090 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:52:40,584-Speed 2625.19 samples/sec Loss 14.6539 LearningRate 0.0913 Epoch: 0 Global Step: 37100 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:52:44,495-Speed 2618.88 samples/sec Loss 14.6058 LearningRate 0.0913 Epoch: 0 Global Step: 37110 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:52:48,389-Speed 2629.97 samples/sec Loss 14.7842 LearningRate 0.0913 Epoch: 0 Global Step: 37120 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:52:52,293-Speed 2623.60 samples/sec Loss 14.6923 LearningRate 0.0912 Epoch: 0 Global Step: 37130 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:52:56,193-Speed 2626.01 samples/sec Loss 14.5732 LearningRate 0.0912 Epoch: 0 Global Step: 37140 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:53:00,093-Speed 2626.85 samples/sec Loss 14.5779 LearningRate 0.0912 Epoch: 0 Global Step: 37150 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:53:04,000-Speed 2621.24 samples/sec Loss 14.5998 LearningRate 0.0912 Epoch: 0 Global Step: 37160 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:53:07,888-Speed 2634.55 samples/sec Loss 14.7481 LearningRate 0.0912 Epoch: 0 Global Step: 37170 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:53:11,788-Speed 2625.87 samples/sec Loss 14.5585 LearningRate 0.0912 Epoch: 0 Global Step: 37180 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:53:15,707-Speed 2614.28 samples/sec Loss 14.7085 LearningRate 0.0912 Epoch: 0 Global Step: 37190 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:53:19,604-Speed 2627.86 samples/sec Loss 14.5995 LearningRate 0.0912 Epoch: 0 Global Step: 37200 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:53:23,510-Speed 2622.91 samples/sec Loss 14.6238 LearningRate 0.0912 Epoch: 0 Global Step: 37210 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:53:27,403-Speed 2631.02 samples/sec Loss 14.4981 LearningRate 0.0912 Epoch: 0 Global Step: 37220 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:53:31,300-Speed 2628.26 samples/sec Loss 14.9110 LearningRate 0.0912 Epoch: 0 Global Step: 37230 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:53:35,207-Speed 2621.99 samples/sec Loss 14.5906 LearningRate 0.0912 Epoch: 0 Global Step: 37240 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:53:39,103-Speed 2628.37 samples/sec Loss 14.6834 LearningRate 0.0912 Epoch: 0 Global Step: 37250 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:53:43,002-Speed 2626.98 samples/sec Loss 14.5264 LearningRate 0.0912 Epoch: 0 Global Step: 37260 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:53:46,912-Speed 2620.26 samples/sec Loss 14.5402 LearningRate 0.0912 Epoch: 0 Global Step: 37270 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:53:50,807-Speed 2629.69 samples/sec Loss 14.6014 LearningRate 0.0912 Epoch: 0 Global Step: 37280 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:53:54,716-Speed 2620.11 samples/sec Loss 14.6357 LearningRate 0.0912 Epoch: 0 Global Step: 37290 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:53:58,614-Speed 2628.06 samples/sec Loss 14.6206 LearningRate 0.0912 Epoch: 0 Global Step: 37300 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:54:02,529-Speed 2616.47 samples/sec Loss 14.7906 LearningRate 0.0912 Epoch: 0 Global Step: 37310 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:54:06,401-Speed 2645.26 samples/sec Loss 14.5249 LearningRate 0.0912 Epoch: 0 Global Step: 37320 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:54:10,300-Speed 2626.44 samples/sec Loss 14.6109 LearningRate 0.0912 Epoch: 0 Global Step: 37330 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:54:14,210-Speed 2619.66 samples/sec Loss 14.3367 LearningRate 0.0912 Epoch: 0 Global Step: 37340 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:54:18,103-Speed 2631.25 samples/sec Loss 14.4624 LearningRate 0.0912 Epoch: 0 Global Step: 37350 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:54:22,010-Speed 2622.16 samples/sec Loss 14.6390 LearningRate 0.0912 Epoch: 0 Global Step: 37360 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:54:25,906-Speed 2628.77 samples/sec Loss 14.5135 LearningRate 0.0912 Epoch: 0 Global Step: 37370 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:54:29,802-Speed 2629.18 samples/sec Loss 14.6779 LearningRate 0.0912 Epoch: 0 Global Step: 37380 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:54:33,762-Speed 2586.26 samples/sec Loss 14.6406 LearningRate 0.0912 Epoch: 0 Global Step: 37390 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:54:37,650-Speed 2635.22 samples/sec Loss 14.7093 LearningRate 0.0912 Epoch: 0 Global Step: 37400 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:54:41,547-Speed 2628.37 samples/sec Loss 14.5384 LearningRate 0.0912 Epoch: 0 Global Step: 37410 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-12 23:54:45,442-Speed 2629.03 samples/sec Loss 14.7316 LearningRate 0.0912 Epoch: 0 Global Step: 37420 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:54:49,348-Speed 2622.71 samples/sec Loss 14.7594 LearningRate 0.0912 Epoch: 0 Global Step: 37430 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:54:53,244-Speed 2628.90 samples/sec Loss 14.6003 LearningRate 0.0912 Epoch: 0 Global Step: 37440 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-12 23:54:57,141-Speed 2628.94 samples/sec Loss 14.6144 LearningRate 0.0912 Epoch: 0 Global Step: 37450 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:55:01,045-Speed 2623.56 samples/sec Loss 14.5957 LearningRate 0.0912 Epoch: 0 Global Step: 37460 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:55:04,943-Speed 2627.98 samples/sec Loss 14.5355 LearningRate 0.0912 Epoch: 0 Global Step: 37470 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:55:08,839-Speed 2628.47 samples/sec Loss 14.6150 LearningRate 0.0912 Epoch: 0 Global Step: 37480 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:55:12,719-Speed 2640.14 samples/sec Loss 14.6696 LearningRate 0.0912 Epoch: 0 Global Step: 37490 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:55:16,614-Speed 2630.01 samples/sec Loss 14.6516 LearningRate 0.0912 Epoch: 0 Global Step: 37500 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:55:20,572-Speed 2587.90 samples/sec Loss 14.3109 LearningRate 0.0912 Epoch: 0 Global Step: 37510 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:55:24,472-Speed 2626.49 samples/sec Loss 14.5574 LearningRate 0.0912 Epoch: 0 Global Step: 37520 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:55:28,370-Speed 2627.30 samples/sec Loss 14.5195 LearningRate 0.0912 Epoch: 0 Global Step: 37530 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:55:32,298-Speed 2607.84 samples/sec Loss 14.6510 LearningRate 0.0912 Epoch: 0 Global Step: 37540 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:55:36,428-Speed 2480.28 samples/sec Loss 14.7007 LearningRate 0.0912 Epoch: 0 Global Step: 37550 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:55:40,324-Speed 2628.90 samples/sec Loss 14.7182 LearningRate 0.0911 Epoch: 0 Global Step: 37560 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:55:44,222-Speed 2627.09 samples/sec Loss 14.5937 LearningRate 0.0911 Epoch: 0 Global Step: 37570 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:55:48,121-Speed 2627.07 samples/sec Loss 14.5972 LearningRate 0.0911 Epoch: 0 Global Step: 37580 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:55:52,017-Speed 2629.08 samples/sec Loss 14.4536 LearningRate 0.0911 Epoch: 0 Global Step: 37590 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:55:55,914-Speed 2628.37 samples/sec Loss 14.5798 LearningRate 0.0911 Epoch: 0 Global Step: 37600 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:55:59,815-Speed 2625.91 samples/sec Loss 14.6914 LearningRate 0.0911 Epoch: 0 Global Step: 37610 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:56:03,710-Speed 2629.22 samples/sec Loss 14.5050 LearningRate 0.0911 Epoch: 0 Global Step: 37620 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:56:07,615-Speed 2623.16 samples/sec Loss 14.4456 LearningRate 0.0911 Epoch: 0 Global Step: 37630 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:56:11,515-Speed 2626.40 samples/sec Loss 14.5271 LearningRate 0.0911 Epoch: 0 Global Step: 37640 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:56:15,420-Speed 2622.72 samples/sec Loss 14.5463 LearningRate 0.0911 Epoch: 0 Global Step: 37650 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:56:19,314-Speed 2629.97 samples/sec Loss 14.6110 LearningRate 0.0911 Epoch: 0 Global Step: 37660 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:56:23,208-Speed 2630.59 samples/sec Loss 14.5985 LearningRate 0.0911 Epoch: 0 Global Step: 37670 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:56:27,102-Speed 2630.40 samples/sec Loss 14.6080 LearningRate 0.0911 Epoch: 0 Global Step: 37680 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:56:30,988-Speed 2635.78 samples/sec Loss 14.3575 LearningRate 0.0911 Epoch: 0 Global Step: 37690 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:56:34,892-Speed 2623.56 samples/sec Loss 14.5668 LearningRate 0.0911 Epoch: 0 Global Step: 37700 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:56:38,785-Speed 2630.46 samples/sec Loss 14.5414 LearningRate 0.0911 Epoch: 0 Global Step: 37710 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:56:42,699-Speed 2616.66 samples/sec Loss 14.6540 LearningRate 0.0911 Epoch: 0 Global Step: 37720 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:56:46,594-Speed 2629.83 samples/sec Loss 14.6837 LearningRate 0.0911 Epoch: 0 Global Step: 37730 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:56:50,500-Speed 2621.98 samples/sec Loss 14.6026 LearningRate 0.0911 Epoch: 0 Global Step: 37740 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:56:54,400-Speed 2626.84 samples/sec Loss 14.6130 LearningRate 0.0911 Epoch: 0 Global Step: 37750 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:56:58,308-Speed 2620.64 samples/sec Loss 14.6321 LearningRate 0.0911 Epoch: 0 Global Step: 37760 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:57:02,206-Speed 2627.35 samples/sec Loss 14.5486 LearningRate 0.0911 Epoch: 0 Global Step: 37770 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:57:06,108-Speed 2625.11 samples/sec Loss 14.6059 LearningRate 0.0911 Epoch: 0 Global Step: 37780 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:57:10,004-Speed 2631.61 samples/sec Loss 14.5734 LearningRate 0.0911 Epoch: 0 Global Step: 37790 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:57:13,905-Speed 2624.96 samples/sec Loss 14.5741 LearningRate 0.0911 Epoch: 0 Global Step: 37800 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:57:17,804-Speed 2627.27 samples/sec Loss 14.5903 LearningRate 0.0911 Epoch: 0 Global Step: 37810 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:57:21,698-Speed 2629.90 samples/sec Loss 14.5958 LearningRate 0.0911 Epoch: 0 Global Step: 37820 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:57:25,681-Speed 2572.03 samples/sec Loss 14.5700 LearningRate 0.0911 Epoch: 0 Global Step: 37830 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:57:29,578-Speed 2627.80 samples/sec Loss 14.6260 LearningRate 0.0911 Epoch: 0 Global Step: 37840 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:57:33,473-Speed 2629.58 samples/sec Loss 14.7083 LearningRate 0.0911 Epoch: 0 Global Step: 37850 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:57:37,383-Speed 2620.01 samples/sec Loss 14.5624 LearningRate 0.0911 Epoch: 0 Global Step: 37860 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:57:41,271-Speed 2634.27 samples/sec Loss 14.6123 LearningRate 0.0911 Epoch: 0 Global Step: 37870 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:57:45,176-Speed 2622.95 samples/sec Loss 14.5660 LearningRate 0.0911 Epoch: 0 Global Step: 37880 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:57:49,052-Speed 2643.07 samples/sec Loss 14.5354 LearningRate 0.0911 Epoch: 0 Global Step: 37890 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:57:52,944-Speed 2631.12 samples/sec Loss 14.5691 LearningRate 0.0911 Epoch: 0 Global Step: 37900 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:57:56,840-Speed 2629.04 samples/sec Loss 14.6529 LearningRate 0.0911 Epoch: 0 Global Step: 37910 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:58:00,717-Speed 2641.60 samples/sec Loss 14.7694 LearningRate 0.0911 Epoch: 0 Global Step: 37920 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:58:04,673-Speed 2588.84 samples/sec Loss 14.5460 LearningRate 0.0911 Epoch: 0 Global Step: 37930 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:58:08,567-Speed 2630.64 samples/sec Loss 14.6174 LearningRate 0.0911 Epoch: 0 Global Step: 37940 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:58:12,464-Speed 2628.70 samples/sec Loss 14.4837 LearningRate 0.0911 Epoch: 0 Global Step: 37950 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:58:16,359-Speed 2629.75 samples/sec Loss 14.6868 LearningRate 0.0911 Epoch: 0 Global Step: 37960 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:58:20,260-Speed 2625.71 samples/sec Loss 14.7234 LearningRate 0.0911 Epoch: 0 Global Step: 37970 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:58:24,165-Speed 2622.53 samples/sec Loss 14.4276 LearningRate 0.0911 Epoch: 0 Global Step: 37980 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:58:28,059-Speed 2630.73 samples/sec Loss 14.3793 LearningRate 0.0911 Epoch: 0 Global Step: 37990 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:58:31,961-Speed 2624.45 samples/sec Loss 14.5891 LearningRate 0.0910 Epoch: 0 Global Step: 38000 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:58:35,862-Speed 2625.61 samples/sec Loss 14.4546 LearningRate 0.0910 Epoch: 0 Global Step: 38010 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:58:39,764-Speed 2624.39 samples/sec Loss 14.5751 LearningRate 0.0910 Epoch: 0 Global Step: 38020 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:58:43,662-Speed 2627.89 samples/sec Loss 14.5560 LearningRate 0.0910 Epoch: 0 Global Step: 38030 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:58:47,562-Speed 2626.77 samples/sec Loss 14.6786 LearningRate 0.0910 Epoch: 0 Global Step: 38040 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:58:51,464-Speed 2624.89 samples/sec Loss 14.6189 LearningRate 0.0910 Epoch: 0 Global Step: 38050 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:58:55,465-Speed 2560.06 samples/sec Loss 14.5625 LearningRate 0.0910 Epoch: 0 Global Step: 38060 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:58:59,352-Speed 2635.05 samples/sec Loss 14.5922 LearningRate 0.0910 Epoch: 0 Global Step: 38070 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:59:03,248-Speed 2628.92 samples/sec Loss 14.6392 LearningRate 0.0910 Epoch: 0 Global Step: 38080 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:59:07,159-Speed 2619.39 samples/sec Loss 14.6878 LearningRate 0.0910 Epoch: 0 Global Step: 38090 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:59:11,073-Speed 2616.92 samples/sec Loss 14.4574 LearningRate 0.0910 Epoch: 0 Global Step: 38100 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:59:14,973-Speed 2626.38 samples/sec Loss 14.4297 LearningRate 0.0910 Epoch: 0 Global Step: 38110 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:59:18,889-Speed 2615.90 samples/sec Loss 14.4269 LearningRate 0.0910 Epoch: 0 Global Step: 38120 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:59:22,787-Speed 2627.67 samples/sec Loss 14.4103 LearningRate 0.0910 Epoch: 0 Global Step: 38130 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:59:26,677-Speed 2633.01 samples/sec Loss 14.5331 LearningRate 0.0910 Epoch: 0 Global Step: 38140 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:59:30,572-Speed 2629.52 samples/sec Loss 14.6839 LearningRate 0.0910 Epoch: 0 Global Step: 38150 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:59:34,464-Speed 2631.87 samples/sec Loss 14.5781 LearningRate 0.0910 Epoch: 0 Global Step: 38160 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-12 23:59:38,355-Speed 2631.61 samples/sec Loss 14.6131 LearningRate 0.0910 Epoch: 0 Global Step: 38170 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:59:42,263-Speed 2621.66 samples/sec Loss 14.6273 LearningRate 0.0910 Epoch: 0 Global Step: 38180 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:59:46,149-Speed 2635.93 samples/sec Loss 14.5144 LearningRate 0.0910 Epoch: 0 Global Step: 38190 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:59:50,049-Speed 2626.37 samples/sec Loss 14.5808 LearningRate 0.0910 Epoch: 0 Global Step: 38200 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:59:53,942-Speed 2630.66 samples/sec Loss 14.4287 LearningRate 0.0910 Epoch: 0 Global Step: 38210 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-12 23:59:57,837-Speed 2629.89 samples/sec Loss 14.4768 LearningRate 0.0910 Epoch: 0 Global Step: 38220 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:00:01,734-Speed 2628.58 samples/sec Loss 14.4742 LearningRate 0.0910 Epoch: 0 Global Step: 38230 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:00:05,632-Speed 2627.36 samples/sec Loss 14.5339 LearningRate 0.0910 Epoch: 0 Global Step: 38240 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:00:09,528-Speed 2628.92 samples/sec Loss 14.5489 LearningRate 0.0910 Epoch: 0 Global Step: 38250 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:00:13,429-Speed 2626.21 samples/sec Loss 14.3949 LearningRate 0.0910 Epoch: 0 Global Step: 38260 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:00:17,309-Speed 2639.68 samples/sec Loss 14.3513 LearningRate 0.0910 Epoch: 0 Global Step: 38270 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:00:21,202-Speed 2630.79 samples/sec Loss 14.3503 LearningRate 0.0910 Epoch: 0 Global Step: 38280 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:00:25,108-Speed 2622.55 samples/sec Loss 14.4942 LearningRate 0.0910 Epoch: 0 Global Step: 38290 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:00:29,003-Speed 2629.69 samples/sec Loss 14.5723 LearningRate 0.0910 Epoch: 0 Global Step: 38300 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:00:32,906-Speed 2624.98 samples/sec Loss 14.5121 LearningRate 0.0910 Epoch: 0 Global Step: 38310 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:00:36,794-Speed 2634.04 samples/sec Loss 14.5716 LearningRate 0.0910 Epoch: 0 Global Step: 38320 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:00:40,717-Speed 2611.30 samples/sec Loss 14.5415 LearningRate 0.0910 Epoch: 0 Global Step: 38330 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:00:44,615-Speed 2627.50 samples/sec Loss 14.4656 LearningRate 0.0910 Epoch: 0 Global Step: 38340 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:00:48,523-Speed 2620.59 samples/sec Loss 14.6288 LearningRate 0.0910 Epoch: 0 Global Step: 38350 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:00:52,423-Speed 2626.83 samples/sec Loss 14.6679 LearningRate 0.0910 Epoch: 0 Global Step: 38360 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:00:56,336-Speed 2617.09 samples/sec Loss 14.6328 LearningRate 0.0910 Epoch: 0 Global Step: 38370 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:01:00,236-Speed 2627.12 samples/sec Loss 14.6576 LearningRate 0.0910 Epoch: 0 Global Step: 38380 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:01:04,139-Speed 2624.26 samples/sec Loss 14.5149 LearningRate 0.0910 Epoch: 0 Global Step: 38390 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:01:08,035-Speed 2628.65 samples/sec Loss 14.4666 LearningRate 0.0910 Epoch: 0 Global Step: 38400 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:01:11,931-Speed 2628.83 samples/sec Loss 14.4523 LearningRate 0.0910 Epoch: 0 Global Step: 38410 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:01:15,836-Speed 2622.79 samples/sec Loss 14.3832 LearningRate 0.0910 Epoch: 0 Global Step: 38420 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:01:19,731-Speed 2629.45 samples/sec Loss 14.6406 LearningRate 0.0909 Epoch: 0 Global Step: 38430 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:01:23,643-Speed 2618.38 samples/sec Loss 14.5173 LearningRate 0.0909 Epoch: 0 Global Step: 38440 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:01:27,561-Speed 2614.72 samples/sec Loss 14.5003 LearningRate 0.0909 Epoch: 0 Global Step: 38450 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:01:31,586-Speed 2544.96 samples/sec Loss 14.7222 LearningRate 0.0909 Epoch: 0 Global Step: 38460 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:01:35,494-Speed 2620.23 samples/sec Loss 14.6355 LearningRate 0.0909 Epoch: 0 Global Step: 38470 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:01:39,391-Speed 2628.07 samples/sec Loss 14.7344 LearningRate 0.0909 Epoch: 0 Global Step: 38480 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:01:43,285-Speed 2630.55 samples/sec Loss 14.6244 LearningRate 0.0909 Epoch: 0 Global Step: 38490 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:01:47,186-Speed 2625.44 samples/sec Loss 14.4593 LearningRate 0.0909 Epoch: 0 Global Step: 38500 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:01:51,065-Speed 2640.74 samples/sec Loss 14.4064 LearningRate 0.0909 Epoch: 0 Global Step: 38510 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:01:54,957-Speed 2631.74 samples/sec Loss 14.5695 LearningRate 0.0909 Epoch: 0 Global Step: 38520 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:01:58,852-Speed 2629.55 samples/sec Loss 14.4803 LearningRate 0.0909 Epoch: 0 Global Step: 38530 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:02:02,743-Speed 2632.91 samples/sec Loss 14.5885 LearningRate 0.0909 Epoch: 0 Global Step: 38540 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:02:06,637-Speed 2630.28 samples/sec Loss 14.4833 LearningRate 0.0909 Epoch: 0 Global Step: 38550 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:02:10,532-Speed 2629.38 samples/sec Loss 14.4344 LearningRate 0.0909 Epoch: 0 Global Step: 38560 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:02:14,432-Speed 2626.34 samples/sec Loss 14.4763 LearningRate 0.0909 Epoch: 0 Global Step: 38570 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:02:18,324-Speed 2631.70 samples/sec Loss 14.5535 LearningRate 0.0909 Epoch: 0 Global Step: 38580 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:02:22,218-Speed 2630.49 samples/sec Loss 14.6660 LearningRate 0.0909 Epoch: 0 Global Step: 38590 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:02:26,110-Speed 2631.24 samples/sec Loss 14.4841 LearningRate 0.0909 Epoch: 0 Global Step: 38600 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:02:30,002-Speed 2631.52 samples/sec Loss 14.4978 LearningRate 0.0909 Epoch: 0 Global Step: 38610 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:02:33,907-Speed 2623.81 samples/sec Loss 14.6872 LearningRate 0.0909 Epoch: 0 Global Step: 38620 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:02:37,821-Speed 2616.62 samples/sec Loss 14.4720 LearningRate 0.0909 Epoch: 0 Global Step: 38630 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:02:41,715-Speed 2630.42 samples/sec Loss 14.5654 LearningRate 0.0909 Epoch: 0 Global Step: 38640 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:02:45,611-Speed 2629.37 samples/sec Loss 14.5658 LearningRate 0.0909 Epoch: 0 Global Step: 38650 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:02:49,512-Speed 2625.25 samples/sec Loss 14.4887 LearningRate 0.0909 Epoch: 0 Global Step: 38660 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:02:53,403-Speed 2632.15 samples/sec Loss 14.6290 LearningRate 0.0909 Epoch: 0 Global Step: 38670 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:02:57,296-Speed 2631.04 samples/sec Loss 14.2452 LearningRate 0.0909 Epoch: 0 Global Step: 38680 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:03:01,190-Speed 2630.40 samples/sec Loss 14.4722 LearningRate 0.0909 Epoch: 0 Global Step: 38690 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:03:05,090-Speed 2626.03 samples/sec Loss 14.5182 LearningRate 0.0909 Epoch: 0 Global Step: 38700 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:03:09,015-Speed 2609.77 samples/sec Loss 14.6681 LearningRate 0.0909 Epoch: 0 Global Step: 38710 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:03:12,906-Speed 2633.06 samples/sec Loss 14.4559 LearningRate 0.0909 Epoch: 0 Global Step: 38720 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:03:16,801-Speed 2629.37 samples/sec Loss 14.4817 LearningRate 0.0909 Epoch: 0 Global Step: 38730 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:03:20,693-Speed 2632.26 samples/sec Loss 14.2665 LearningRate 0.0909 Epoch: 0 Global Step: 38740 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:03:24,587-Speed 2630.33 samples/sec Loss 14.5045 LearningRate 0.0909 Epoch: 0 Global Step: 38750 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:03:28,479-Speed 2631.85 samples/sec Loss 14.5103 LearningRate 0.0909 Epoch: 0 Global Step: 38760 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:03:32,503-Speed 2544.92 samples/sec Loss 14.3618 LearningRate 0.0909 Epoch: 0 Global Step: 38770 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:03:36,396-Speed 2631.30 samples/sec Loss 14.4675 LearningRate 0.0909 Epoch: 0 Global Step: 38780 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:03:40,302-Speed 2622.39 samples/sec Loss 14.4506 LearningRate 0.0909 Epoch: 0 Global Step: 38790 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:03:44,198-Speed 2629.06 samples/sec Loss 14.6283 LearningRate 0.0909 Epoch: 0 Global Step: 38800 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:03:48,111-Speed 2617.64 samples/sec Loss 14.6175 LearningRate 0.0909 Epoch: 0 Global Step: 38810 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:03:51,986-Speed 2643.23 samples/sec Loss 14.5796 LearningRate 0.0909 Epoch: 0 Global Step: 38820 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:03:55,895-Speed 2620.69 samples/sec Loss 14.5228 LearningRate 0.0909 Epoch: 0 Global Step: 38830 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:03:59,784-Speed 2634.06 samples/sec Loss 14.4580 LearningRate 0.0909 Epoch: 0 Global Step: 38840 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:04:03,677-Speed 2630.61 samples/sec Loss 14.3088 LearningRate 0.0909 Epoch: 0 Global Step: 38850 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:04:07,700-Speed 2545.75 samples/sec Loss 14.4400 LearningRate 0.0909 Epoch: 0 Global Step: 38860 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:04:11,678-Speed 2574.82 samples/sec Loss 14.4454 LearningRate 0.0908 Epoch: 0 Global Step: 38870 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:04:15,642-Speed 2583.80 samples/sec Loss 14.6639 LearningRate 0.0908 Epoch: 0 Global Step: 38880 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:04:19,552-Speed 2620.15 samples/sec Loss 14.5088 LearningRate 0.0908 Epoch: 0 Global Step: 38890 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:04:23,474-Speed 2611.62 samples/sec Loss 14.4998 LearningRate 0.0908 Epoch: 0 Global Step: 38900 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:04:27,378-Speed 2624.23 samples/sec Loss 14.4532 LearningRate 0.0908 Epoch: 0 Global Step: 38910 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:04:31,289-Speed 2618.18 samples/sec Loss 14.5737 LearningRate 0.0908 Epoch: 0 Global Step: 38920 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:04:35,197-Speed 2621.38 samples/sec Loss 14.5462 LearningRate 0.0908 Epoch: 0 Global Step: 38930 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:04:39,103-Speed 2622.28 samples/sec Loss 14.4037 LearningRate 0.0908 Epoch: 0 Global Step: 38940 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:04:43,016-Speed 2619.21 samples/sec Loss 14.4933 LearningRate 0.0908 Epoch: 0 Global Step: 38950 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:04:46,915-Speed 2626.68 samples/sec Loss 14.3376 LearningRate 0.0908 Epoch: 0 Global Step: 38960 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:04:50,816-Speed 2625.60 samples/sec Loss 14.6815 LearningRate 0.0908 Epoch: 0 Global Step: 38970 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:04:54,720-Speed 2624.00 samples/sec Loss 14.6873 LearningRate 0.0908 Epoch: 0 Global Step: 38980 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:04:58,612-Speed 2631.41 samples/sec Loss 14.5099 LearningRate 0.0908 Epoch: 0 Global Step: 38990 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:05:02,519-Speed 2621.27 samples/sec Loss 14.5013 LearningRate 0.0908 Epoch: 0 Global Step: 39000 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:05:06,423-Speed 2623.60 samples/sec Loss 14.5885 LearningRate 0.0908 Epoch: 0 Global Step: 39010 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:05:10,302-Speed 2640.37 samples/sec Loss 14.5510 LearningRate 0.0908 Epoch: 0 Global Step: 39020 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:05:14,212-Speed 2620.18 samples/sec Loss 14.5893 LearningRate 0.0908 Epoch: 0 Global Step: 39030 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:05:18,106-Speed 2630.70 samples/sec Loss 14.4378 LearningRate 0.0908 Epoch: 0 Global Step: 39040 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:05:22,013-Speed 2621.17 samples/sec Loss 14.4666 LearningRate 0.0908 Epoch: 0 Global Step: 39050 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:05:25,907-Speed 2630.64 samples/sec Loss 14.4792 LearningRate 0.0908 Epoch: 0 Global Step: 39060 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:05:29,809-Speed 2625.02 samples/sec Loss 14.3808 LearningRate 0.0908 Epoch: 0 Global Step: 39070 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:05:33,725-Speed 2615.37 samples/sec Loss 14.4430 LearningRate 0.0908 Epoch: 0 Global Step: 39080 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:05:37,629-Speed 2623.61 samples/sec Loss 14.5064 LearningRate 0.0908 Epoch: 0 Global Step: 39090 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:05:41,524-Speed 2630.15 samples/sec Loss 14.4537 LearningRate 0.0908 Epoch: 0 Global Step: 39100 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:05:45,420-Speed 2628.93 samples/sec Loss 14.3894 LearningRate 0.0908 Epoch: 0 Global Step: 39110 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:05:49,297-Speed 2642.19 samples/sec Loss 14.5430 LearningRate 0.0908 Epoch: 0 Global Step: 39120 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:05:53,175-Speed 2640.57 samples/sec Loss 14.4420 LearningRate 0.0908 Epoch: 0 Global Step: 39130 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:05:57,072-Speed 2628.34 samples/sec Loss 14.6421 LearningRate 0.0908 Epoch: 0 Global Step: 39140 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:06:00,970-Speed 2627.96 samples/sec Loss 14.5500 LearningRate 0.0908 Epoch: 0 Global Step: 39150 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:06:04,869-Speed 2626.57 samples/sec Loss 14.3012 LearningRate 0.0908 Epoch: 0 Global Step: 39160 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:06:08,762-Speed 2630.74 samples/sec Loss 14.5864 LearningRate 0.0908 Epoch: 0 Global Step: 39170 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:06:12,665-Speed 2624.40 samples/sec Loss 14.4410 LearningRate 0.0908 Epoch: 0 Global Step: 39180 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:06:16,560-Speed 2629.73 samples/sec Loss 14.2759 LearningRate 0.0908 Epoch: 0 Global Step: 39190 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:06:20,458-Speed 2627.64 samples/sec Loss 14.4985 LearningRate 0.0908 Epoch: 0 Global Step: 39200 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:06:24,350-Speed 2631.62 samples/sec Loss 14.4717 LearningRate 0.0908 Epoch: 0 Global Step: 39210 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:06:28,244-Speed 2630.72 samples/sec Loss 14.5911 LearningRate 0.0908 Epoch: 0 Global Step: 39220 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:06:32,141-Speed 2628.07 samples/sec Loss 14.4373 LearningRate 0.0908 Epoch: 0 Global Step: 39230 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:06:36,051-Speed 2619.41 samples/sec Loss 14.4922 LearningRate 0.0908 Epoch: 0 Global Step: 39240 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:06:39,944-Speed 2630.56 samples/sec Loss 14.3087 LearningRate 0.0908 Epoch: 0 Global Step: 39250 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:06:43,841-Speed 2628.34 samples/sec Loss 14.5641 LearningRate 0.0908 Epoch: 0 Global Step: 39260 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:06:47,761-Speed 2613.32 samples/sec Loss 14.5388 LearningRate 0.0908 Epoch: 0 Global Step: 39270 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:06:51,658-Speed 2628.11 samples/sec Loss 14.4114 LearningRate 0.0908 Epoch: 0 Global Step: 39280 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:06:55,576-Speed 2614.54 samples/sec Loss 14.3530 LearningRate 0.0908 Epoch: 0 Global Step: 39290 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:06:59,454-Speed 2641.09 samples/sec Loss 14.4184 LearningRate 0.0907 Epoch: 0 Global Step: 39300 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:07:03,348-Speed 2630.30 samples/sec Loss 14.3447 LearningRate 0.0907 Epoch: 0 Global Step: 39310 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:07:07,331-Speed 2571.22 samples/sec Loss 14.4688 LearningRate 0.0907 Epoch: 0 Global Step: 39320 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:07:11,284-Speed 2591.10 samples/sec Loss 14.5117 LearningRate 0.0907 Epoch: 0 Global Step: 39330 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:07:15,290-Speed 2556.83 samples/sec Loss 14.5131 LearningRate 0.0907 Epoch: 0 Global Step: 39340 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:07:19,183-Speed 2630.95 samples/sec Loss 14.4638 LearningRate 0.0907 Epoch: 0 Global Step: 39350 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:07:23,079-Speed 2629.37 samples/sec Loss 14.4296 LearningRate 0.0907 Epoch: 0 Global Step: 39360 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:07:26,972-Speed 2630.84 samples/sec Loss 14.4389 LearningRate 0.0907 Epoch: 0 Global Step: 39370 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:07:30,885-Speed 2618.11 samples/sec Loss 14.5675 LearningRate 0.0907 Epoch: 0 Global Step: 39380 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:07:34,783-Speed 2627.43 samples/sec Loss 14.5320 LearningRate 0.0907 Epoch: 0 Global Step: 39390 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:07:38,676-Speed 2630.50 samples/sec Loss 14.3626 LearningRate 0.0907 Epoch: 0 Global Step: 39400 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:07:42,664-Speed 2568.20 samples/sec Loss 14.4095 LearningRate 0.0907 Epoch: 0 Global Step: 39410 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:07:46,578-Speed 2617.20 samples/sec Loss 14.4885 LearningRate 0.0907 Epoch: 0 Global Step: 39420 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:07:50,492-Speed 2617.00 samples/sec Loss 14.5304 LearningRate 0.0907 Epoch: 0 Global Step: 39430 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:07:54,384-Speed 2631.87 samples/sec Loss 14.4325 LearningRate 0.0907 Epoch: 0 Global Step: 39440 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:07:58,286-Speed 2624.76 samples/sec Loss 14.4452 LearningRate 0.0907 Epoch: 0 Global Step: 39450 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:08:02,162-Speed 2642.91 samples/sec Loss 14.4297 LearningRate 0.0907 Epoch: 0 Global Step: 39460 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:08:06,060-Speed 2627.74 samples/sec Loss 14.4627 LearningRate 0.0907 Epoch: 0 Global Step: 39470 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:08:09,995-Speed 2602.99 samples/sec Loss 14.3747 LearningRate 0.0907 Epoch: 0 Global Step: 39480 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:08:13,893-Speed 2627.42 samples/sec Loss 14.4385 LearningRate 0.0907 Epoch: 0 Global Step: 39490 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:08:17,793-Speed 2626.21 samples/sec Loss 14.3847 LearningRate 0.0907 Epoch: 0 Global Step: 39500 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:08:21,691-Speed 2627.55 samples/sec Loss 14.3480 LearningRate 0.0907 Epoch: 0 Global Step: 39510 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:08:25,592-Speed 2625.45 samples/sec Loss 14.4111 LearningRate 0.0907 Epoch: 0 Global Step: 39520 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:08:29,491-Speed 2627.60 samples/sec Loss 14.4441 LearningRate 0.0907 Epoch: 0 Global Step: 39530 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:08:33,391-Speed 2626.13 samples/sec Loss 14.4280 LearningRate 0.0907 Epoch: 0 Global Step: 39540 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:08:37,285-Speed 2630.16 samples/sec Loss 14.5098 LearningRate 0.0907 Epoch: 0 Global Step: 39550 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:08:41,186-Speed 2626.02 samples/sec Loss 14.6340 LearningRate 0.0907 Epoch: 0 Global Step: 39560 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:08:45,096-Speed 2619.57 samples/sec Loss 14.5355 LearningRate 0.0907 Epoch: 0 Global Step: 39570 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:08:48,992-Speed 2629.10 samples/sec Loss 14.4651 LearningRate 0.0907 Epoch: 0 Global Step: 39580 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:08:52,903-Speed 2619.08 samples/sec Loss 14.3658 LearningRate 0.0907 Epoch: 0 Global Step: 39590 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:08:56,791-Speed 2634.56 samples/sec Loss 14.4110 LearningRate 0.0907 Epoch: 0 Global Step: 39600 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:09:00,689-Speed 2627.41 samples/sec Loss 14.5075 LearningRate 0.0907 Epoch: 0 Global Step: 39610 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:09:04,572-Speed 2637.59 samples/sec Loss 14.4290 LearningRate 0.0907 Epoch: 0 Global Step: 39620 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:09:08,469-Speed 2628.54 samples/sec Loss 14.5028 LearningRate 0.0907 Epoch: 0 Global Step: 39630 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:09:12,368-Speed 2627.12 samples/sec Loss 14.5417 LearningRate 0.0907 Epoch: 0 Global Step: 39640 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:09:16,259-Speed 2632.22 samples/sec Loss 14.3410 LearningRate 0.0907 Epoch: 0 Global Step: 39650 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:09:20,155-Speed 2629.48 samples/sec Loss 14.4885 LearningRate 0.0907 Epoch: 0 Global Step: 39660 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:09:24,056-Speed 2625.18 samples/sec Loss 14.4570 LearningRate 0.0907 Epoch: 0 Global Step: 39670 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:09:27,954-Speed 2628.06 samples/sec Loss 14.4440 LearningRate 0.0907 Epoch: 0 Global Step: 39680 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:09:31,848-Speed 2629.62 samples/sec Loss 14.4952 LearningRate 0.0907 Epoch: 0 Global Step: 39690 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:09:35,743-Speed 2629.41 samples/sec Loss 14.3892 LearningRate 0.0907 Epoch: 0 Global Step: 39700 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:09:39,635-Speed 2631.63 samples/sec Loss 14.4647 LearningRate 0.0907 Epoch: 0 Global Step: 39710 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:09:43,531-Speed 2629.44 samples/sec Loss 14.5089 LearningRate 0.0907 Epoch: 0 Global Step: 39720 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:09:47,407-Speed 2642.70 samples/sec Loss 14.4944 LearningRate 0.0907 Epoch: 0 Global Step: 39730 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:09:51,331-Speed 2610.58 samples/sec Loss 14.5031 LearningRate 0.0906 Epoch: 0 Global Step: 39740 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:09:55,222-Speed 2631.65 samples/sec Loss 14.5341 LearningRate 0.0906 Epoch: 0 Global Step: 39750 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:09:59,119-Speed 2628.84 samples/sec Loss 14.4938 LearningRate 0.0906 Epoch: 0 Global Step: 39760 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:10:03,013-Speed 2629.69 samples/sec Loss 14.3825 LearningRate 0.0906 Epoch: 0 Global Step: 39770 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:10:06,965-Speed 2591.72 samples/sec Loss 14.5049 LearningRate 0.0906 Epoch: 0 Global Step: 39780 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:10:10,862-Speed 2628.07 samples/sec Loss 14.3531 LearningRate 0.0906 Epoch: 0 Global Step: 39790 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:10:14,755-Speed 2631.45 samples/sec Loss 14.5291 LearningRate 0.0906 Epoch: 0 Global Step: 39800 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:10:18,648-Speed 2631.38 samples/sec Loss 14.2880 LearningRate 0.0906 Epoch: 0 Global Step: 39810 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:10:22,543-Speed 2630.02 samples/sec Loss 14.4829 LearningRate 0.0906 Epoch: 0 Global Step: 39820 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:10:26,435-Speed 2631.35 samples/sec Loss 14.3325 LearningRate 0.0906 Epoch: 0 Global Step: 39830 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:10:30,332-Speed 2628.35 samples/sec Loss 14.5166 LearningRate 0.0906 Epoch: 0 Global Step: 39840 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:10:34,224-Speed 2631.43 samples/sec Loss 14.4866 LearningRate 0.0906 Epoch: 0 Global Step: 39850 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:10:38,118-Speed 2630.56 samples/sec Loss 14.4474 LearningRate 0.0906 Epoch: 0 Global Step: 39860 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:10:42,010-Speed 2631.61 samples/sec Loss 14.3005 LearningRate 0.0906 Epoch: 0 Global Step: 39870 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:10:45,904-Speed 2630.28 samples/sec Loss 14.5405 LearningRate 0.0906 Epoch: 0 Global Step: 39880 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:10:49,812-Speed 2621.33 samples/sec Loss 14.4777 LearningRate 0.0906 Epoch: 0 Global Step: 39890 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:10:53,757-Speed 2596.04 samples/sec Loss 14.5826 LearningRate 0.0906 Epoch: 0 Global Step: 39900 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:10:57,650-Speed 2631.52 samples/sec Loss 14.2765 LearningRate 0.0906 Epoch: 0 Global Step: 39910 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:11:01,527-Speed 2641.63 samples/sec Loss 14.3667 LearningRate 0.0906 Epoch: 0 Global Step: 39920 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:11:05,421-Speed 2630.20 samples/sec Loss 14.3403 LearningRate 0.0906 Epoch: 0 Global Step: 39930 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:11:09,317-Speed 2628.89 samples/sec Loss 14.4814 LearningRate 0.0906 Epoch: 0 Global Step: 39940 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:11:13,217-Speed 2626.07 samples/sec Loss 14.4274 LearningRate 0.0906 Epoch: 0 Global Step: 39950 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:11:17,114-Speed 2628.46 samples/sec Loss 14.2084 LearningRate 0.0906 Epoch: 0 Global Step: 39960 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:11:21,011-Speed 2628.56 samples/sec Loss 14.3751 LearningRate 0.0906 Epoch: 0 Global Step: 39970 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:11:24,907-Speed 2628.89 samples/sec Loss 14.4620 LearningRate 0.0906 Epoch: 0 Global Step: 39980 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:11:28,817-Speed 2619.84 samples/sec Loss 14.5253 LearningRate 0.0906 Epoch: 0 Global Step: 39990 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:11:32,713-Speed 2628.89 samples/sec Loss 14.3157 LearningRate 0.0906 Epoch: 0 Global Step: 40000 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:12:16,122-[lfw][40000]XNorm: 22.849399
Training: 2022-04-13 00:12:16,123-[lfw][40000]Accuracy-Flip: 0.99517+-0.00311
Training: 2022-04-13 00:12:16,124-[lfw][40000]Accuracy-Highest: 0.99517
Training: 2022-04-13 00:13:06,650-[cfp_fp][40000]XNorm: 20.437435
Training: 2022-04-13 00:13:06,651-[cfp_fp][40000]Accuracy-Flip: 0.97057+-0.00788
Training: 2022-04-13 00:13:06,652-[cfp_fp][40000]Accuracy-Highest: 0.97057
Training: 2022-04-13 00:13:50,149-[agedb_30][40000]XNorm: 22.183955
Training: 2022-04-13 00:13:50,150-[agedb_30][40000]Accuracy-Flip: 0.95567+-0.00844
Training: 2022-04-13 00:13:50,150-[agedb_30][40000]Accuracy-Highest: 0.95567
Training: 2022-04-13 00:13:54,029-Speed 72.46 samples/sec Loss 14.4160 LearningRate 0.0906 Epoch: 0 Global Step: 40010 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:13:57,903-Speed 2643.96 samples/sec Loss 14.3571 LearningRate 0.0906 Epoch: 0 Global Step: 40020 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:14:01,782-Speed 2640.98 samples/sec Loss 14.2670 LearningRate 0.0906 Epoch: 0 Global Step: 40030 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:14:05,661-Speed 2640.02 samples/sec Loss 14.3108 LearningRate 0.0906 Epoch: 0 Global Step: 40040 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:14:09,586-Speed 2610.04 samples/sec Loss 14.4073 LearningRate 0.0906 Epoch: 0 Global Step: 40050 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:14:13,467-Speed 2639.05 samples/sec Loss 14.4426 LearningRate 0.0906 Epoch: 0 Global Step: 40060 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:14:17,355-Speed 2635.10 samples/sec Loss 14.4410 LearningRate 0.0906 Epoch: 0 Global Step: 40070 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:14:21,254-Speed 2633.72 samples/sec Loss 14.4021 LearningRate 0.0906 Epoch: 0 Global Step: 40080 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:14:25,127-Speed 2644.20 samples/sec Loss 14.3593 LearningRate 0.0906 Epoch: 0 Global Step: 40090 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:14:29,020-Speed 2631.72 samples/sec Loss 14.3853 LearningRate 0.0906 Epoch: 0 Global Step: 40100 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:14:32,913-Speed 2631.17 samples/sec Loss 14.4130 LearningRate 0.0906 Epoch: 0 Global Step: 40110 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:14:36,789-Speed 2642.24 samples/sec Loss 14.4990 LearningRate 0.0906 Epoch: 0 Global Step: 40120 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:14:40,686-Speed 2628.48 samples/sec Loss 14.4107 LearningRate 0.0906 Epoch: 0 Global Step: 40130 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:14:44,590-Speed 2623.11 samples/sec Loss 14.3697 LearningRate 0.0906 Epoch: 0 Global Step: 40140 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:14:48,497-Speed 2622.70 samples/sec Loss 14.4244 LearningRate 0.0906 Epoch: 0 Global Step: 40150 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:14:52,401-Speed 2623.86 samples/sec Loss 14.2394 LearningRate 0.0906 Epoch: 0 Global Step: 40160 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:14:56,316-Speed 2616.42 samples/sec Loss 14.4623 LearningRate 0.0906 Epoch: 0 Global Step: 40170 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:15:00,220-Speed 2624.01 samples/sec Loss 14.6685 LearningRate 0.0905 Epoch: 0 Global Step: 40180 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:15:04,124-Speed 2623.58 samples/sec Loss 14.4091 LearningRate 0.0905 Epoch: 0 Global Step: 40190 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:15:08,033-Speed 2619.93 samples/sec Loss 14.4530 LearningRate 0.0905 Epoch: 0 Global Step: 40200 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:15:11,943-Speed 2619.76 samples/sec Loss 14.3183 LearningRate 0.0905 Epoch: 0 Global Step: 40210 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:15:15,837-Speed 2630.39 samples/sec Loss 14.2451 LearningRate 0.0905 Epoch: 0 Global Step: 40220 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:15:19,743-Speed 2622.05 samples/sec Loss 14.3724 LearningRate 0.0905 Epoch: 0 Global Step: 40230 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:15:23,650-Speed 2622.12 samples/sec Loss 14.4806 LearningRate 0.0905 Epoch: 0 Global Step: 40240 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:15:27,554-Speed 2623.14 samples/sec Loss 14.3699 LearningRate 0.0905 Epoch: 0 Global Step: 40250 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:15:31,467-Speed 2618.08 samples/sec Loss 14.2691 LearningRate 0.0905 Epoch: 0 Global Step: 40260 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:15:35,375-Speed 2620.20 samples/sec Loss 14.5893 LearningRate 0.0905 Epoch: 0 Global Step: 40270 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:15:39,386-Speed 2553.72 samples/sec Loss 14.3208 LearningRate 0.0905 Epoch: 0 Global Step: 40280 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:15:43,486-Speed 2497.96 samples/sec Loss 14.4296 LearningRate 0.0905 Epoch: 0 Global Step: 40290 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:15:47,558-Speed 2515.43 samples/sec Loss 14.3983 LearningRate 0.0905 Epoch: 0 Global Step: 40300 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:15:51,522-Speed 2584.04 samples/sec Loss 14.3922 LearningRate 0.0905 Epoch: 0 Global Step: 40310 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:15:55,563-Speed 2535.26 samples/sec Loss 14.4048 LearningRate 0.0905 Epoch: 0 Global Step: 40320 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:15:59,465-Speed 2624.63 samples/sec Loss 14.3445 LearningRate 0.0905 Epoch: 0 Global Step: 40330 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:16:03,368-Speed 2624.62 samples/sec Loss 14.3235 LearningRate 0.0905 Epoch: 0 Global Step: 40340 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:16:07,270-Speed 2624.32 samples/sec Loss 14.3898 LearningRate 0.0905 Epoch: 0 Global Step: 40350 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:16:11,170-Speed 2626.43 samples/sec Loss 14.4048 LearningRate 0.0905 Epoch: 0 Global Step: 40360 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:16:15,077-Speed 2621.54 samples/sec Loss 14.3138 LearningRate 0.0905 Epoch: 0 Global Step: 40370 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:16:18,977-Speed 2626.83 samples/sec Loss 14.4085 LearningRate 0.0905 Epoch: 0 Global Step: 40380 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:16:22,929-Speed 2591.29 samples/sec Loss 14.4584 LearningRate 0.0905 Epoch: 0 Global Step: 40390 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:16:26,827-Speed 2628.29 samples/sec Loss 14.4268 LearningRate 0.0905 Epoch: 0 Global Step: 40400 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:16:30,736-Speed 2620.15 samples/sec Loss 14.4096 LearningRate 0.0905 Epoch: 0 Global Step: 40410 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:16:34,638-Speed 2624.96 samples/sec Loss 14.3719 LearningRate 0.0905 Epoch: 0 Global Step: 40420 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:16:38,550-Speed 2618.26 samples/sec Loss 14.4587 LearningRate 0.0905 Epoch: 0 Global Step: 40430 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:16:42,448-Speed 2627.67 samples/sec Loss 14.4505 LearningRate 0.0905 Epoch: 0 Global Step: 40440 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:16:46,471-Speed 2545.75 samples/sec Loss 14.3618 LearningRate 0.0905 Epoch: 0 Global Step: 40450 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:16:50,374-Speed 2624.44 samples/sec Loss 14.3038 LearningRate 0.0905 Epoch: 0 Global Step: 40460 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:16:54,270-Speed 2629.47 samples/sec Loss 14.3873 LearningRate 0.0905 Epoch: 0 Global Step: 40470 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:16:58,159-Speed 2633.44 samples/sec Loss 14.5302 LearningRate 0.0905 Epoch: 0 Global Step: 40480 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:17:02,217-Speed 2524.37 samples/sec Loss 14.3686 LearningRate 0.0905 Epoch: 0 Global Step: 40490 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:17:06,353-Speed 2476.86 samples/sec Loss 14.3802 LearningRate 0.0905 Epoch: 0 Global Step: 40500 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:17:10,385-Speed 2540.08 samples/sec Loss 14.4505 LearningRate 0.0905 Epoch: 0 Global Step: 40510 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:17:14,311-Speed 2608.92 samples/sec Loss 14.3811 LearningRate 0.0905 Epoch: 0 Global Step: 40520 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:17:18,208-Speed 2628.35 samples/sec Loss 14.3702 LearningRate 0.0905 Epoch: 0 Global Step: 40530 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:17:22,119-Speed 2619.04 samples/sec Loss 14.2603 LearningRate 0.0905 Epoch: 0 Global Step: 40540 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:17:26,019-Speed 2626.82 samples/sec Loss 14.3331 LearningRate 0.0905 Epoch: 0 Global Step: 40550 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:17:29,922-Speed 2624.07 samples/sec Loss 14.4168 LearningRate 0.0905 Epoch: 0 Global Step: 40560 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:17:33,839-Speed 2615.19 samples/sec Loss 14.3637 LearningRate 0.0905 Epoch: 0 Global Step: 40570 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:17:37,717-Speed 2640.56 samples/sec Loss 14.3943 LearningRate 0.0905 Epoch: 0 Global Step: 40580 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:17:41,625-Speed 2620.75 samples/sec Loss 14.4141 LearningRate 0.0905 Epoch: 0 Global Step: 40590 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:17:45,519-Speed 2630.48 samples/sec Loss 14.3372 LearningRate 0.0905 Epoch: 0 Global Step: 40600 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:17:49,439-Speed 2613.20 samples/sec Loss 14.4195 LearningRate 0.0904 Epoch: 0 Global Step: 40610 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:17:53,338-Speed 2626.84 samples/sec Loss 14.3010 LearningRate 0.0904 Epoch: 0 Global Step: 40620 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:17:57,242-Speed 2624.23 samples/sec Loss 14.3949 LearningRate 0.0904 Epoch: 0 Global Step: 40630 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:18:01,138-Speed 2629.03 samples/sec Loss 14.4383 LearningRate 0.0904 Epoch: 0 Global Step: 40640 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:18:05,036-Speed 2628.05 samples/sec Loss 14.4077 LearningRate 0.0904 Epoch: 0 Global Step: 40650 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:18:08,931-Speed 2629.01 samples/sec Loss 14.3621 LearningRate 0.0904 Epoch: 0 Global Step: 40660 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:18:12,832-Speed 2625.94 samples/sec Loss 14.5978 LearningRate 0.0904 Epoch: 0 Global Step: 40670 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:18:16,731-Speed 2626.71 samples/sec Loss 14.3121 LearningRate 0.0904 Epoch: 0 Global Step: 40680 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:18:20,638-Speed 2621.71 samples/sec Loss 14.2439 LearningRate 0.0904 Epoch: 0 Global Step: 40690 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:18:24,537-Speed 2626.75 samples/sec Loss 14.4216 LearningRate 0.0904 Epoch: 0 Global Step: 40700 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:18:28,438-Speed 2625.47 samples/sec Loss 14.3291 LearningRate 0.0904 Epoch: 0 Global Step: 40710 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:18:32,335-Speed 2628.08 samples/sec Loss 14.4439 LearningRate 0.0904 Epoch: 0 Global Step: 40720 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:18:36,247-Speed 2618.29 samples/sec Loss 14.2937 LearningRate 0.0904 Epoch: 0 Global Step: 40730 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:18:40,148-Speed 2626.41 samples/sec Loss 14.4887 LearningRate 0.0904 Epoch: 0 Global Step: 40740 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:18:44,046-Speed 2627.26 samples/sec Loss 14.4063 LearningRate 0.0904 Epoch: 0 Global Step: 40750 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:18:47,950-Speed 2623.85 samples/sec Loss 14.3119 LearningRate 0.0904 Epoch: 0 Global Step: 40760 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:18:51,870-Speed 2612.95 samples/sec Loss 14.3207 LearningRate 0.0904 Epoch: 0 Global Step: 40770 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:18:55,775-Speed 2622.93 samples/sec Loss 14.3311 LearningRate 0.0904 Epoch: 0 Global Step: 40780 Fp16 Grad Scale: 524288 Required: 89 hours
Training: 2022-04-13 00:18:59,663-Speed 2634.26 samples/sec Loss 14.3186 LearningRate 0.0904 Epoch: 0 Global Step: 40790 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:19:03,566-Speed 2624.43 samples/sec Loss 14.3875 LearningRate 0.0904 Epoch: 0 Global Step: 40800 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:19:07,486-Speed 2613.69 samples/sec Loss 14.4066 LearningRate 0.0904 Epoch: 0 Global Step: 40810 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:19:11,378-Speed 2631.43 samples/sec Loss 14.4206 LearningRate 0.0904 Epoch: 0 Global Step: 40820 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:19:15,322-Speed 2597.47 samples/sec Loss 14.3415 LearningRate 0.0904 Epoch: 0 Global Step: 40830 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:19:19,234-Speed 2618.24 samples/sec Loss 14.2580 LearningRate 0.0904 Epoch: 0 Global Step: 40840 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:19:23,206-Speed 2578.95 samples/sec Loss 14.3586 LearningRate 0.0904 Epoch: 0 Global Step: 40850 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:19:27,143-Speed 2601.22 samples/sec Loss 14.3963 LearningRate 0.0904 Epoch: 0 Global Step: 40860 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:19:31,106-Speed 2584.59 samples/sec Loss 14.0970 LearningRate 0.0904 Epoch: 0 Global Step: 40870 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:19:34,999-Speed 2630.99 samples/sec Loss 14.5044 LearningRate 0.0904 Epoch: 0 Global Step: 40880 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:19:38,895-Speed 2629.16 samples/sec Loss 14.3887 LearningRate 0.0904 Epoch: 0 Global Step: 40890 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:19:42,795-Speed 2626.54 samples/sec Loss 14.1790 LearningRate 0.0904 Epoch: 0 Global Step: 40900 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:19:46,695-Speed 2626.00 samples/sec Loss 14.1966 LearningRate 0.0904 Epoch: 0 Global Step: 40910 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:19:50,597-Speed 2625.25 samples/sec Loss 14.3270 LearningRate 0.0904 Epoch: 0 Global Step: 40920 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:19:54,505-Speed 2621.10 samples/sec Loss 14.2808 LearningRate 0.0904 Epoch: 0 Global Step: 40930 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:19:58,403-Speed 2627.47 samples/sec Loss 14.3307 LearningRate 0.0904 Epoch: 0 Global Step: 40940 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:20:02,303-Speed 2626.04 samples/sec Loss 14.4088 LearningRate 0.0904 Epoch: 0 Global Step: 40950 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:20:06,211-Speed 2620.82 samples/sec Loss 14.3217 LearningRate 0.0904 Epoch: 0 Global Step: 40960 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:20:10,120-Speed 2620.54 samples/sec Loss 14.2433 LearningRate 0.0904 Epoch: 0 Global Step: 40970 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:20:14,037-Speed 2614.87 samples/sec Loss 14.4350 LearningRate 0.0904 Epoch: 0 Global Step: 40980 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:20:17,927-Speed 2633.16 samples/sec Loss 14.4631 LearningRate 0.0904 Epoch: 0 Global Step: 40990 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:20:21,823-Speed 2629.43 samples/sec Loss 14.2802 LearningRate 0.0904 Epoch: 0 Global Step: 41000 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:20:25,720-Speed 2627.92 samples/sec Loss 14.3993 LearningRate 0.0904 Epoch: 0 Global Step: 41010 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:20:29,630-Speed 2619.25 samples/sec Loss 14.2862 LearningRate 0.0904 Epoch: 0 Global Step: 41020 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:20:33,534-Speed 2623.59 samples/sec Loss 14.2515 LearningRate 0.0904 Epoch: 0 Global Step: 41030 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:20:37,434-Speed 2626.60 samples/sec Loss 14.3849 LearningRate 0.0904 Epoch: 0 Global Step: 41040 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:20:41,334-Speed 2626.66 samples/sec Loss 14.4949 LearningRate 0.0903 Epoch: 0 Global Step: 41050 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:20:45,344-Speed 2554.10 samples/sec Loss 14.2658 LearningRate 0.0903 Epoch: 0 Global Step: 41060 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:20:49,341-Speed 2562.86 samples/sec Loss 14.2763 LearningRate 0.0903 Epoch: 0 Global Step: 41070 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:20:53,239-Speed 2627.67 samples/sec Loss 14.4264 LearningRate 0.0903 Epoch: 0 Global Step: 41080 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:20:57,138-Speed 2626.48 samples/sec Loss 14.2924 LearningRate 0.0903 Epoch: 0 Global Step: 41090 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:21:01,046-Speed 2621.12 samples/sec Loss 14.4275 LearningRate 0.0903 Epoch: 0 Global Step: 41100 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:21:04,946-Speed 2626.17 samples/sec Loss 14.3715 LearningRate 0.0903 Epoch: 0 Global Step: 41110 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:21:08,845-Speed 2627.42 samples/sec Loss 14.3199 LearningRate 0.0903 Epoch: 0 Global Step: 41120 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:21:12,746-Speed 2625.00 samples/sec Loss 14.5221 LearningRate 0.0903 Epoch: 0 Global Step: 41130 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:21:16,649-Speed 2624.65 samples/sec Loss 14.4579 LearningRate 0.0903 Epoch: 0 Global Step: 41140 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:21:20,548-Speed 2626.59 samples/sec Loss 14.4631 LearningRate 0.0903 Epoch: 0 Global Step: 41150 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:21:24,432-Speed 2636.85 samples/sec Loss 14.3470 LearningRate 0.0903 Epoch: 0 Global Step: 41160 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:21:28,331-Speed 2627.14 samples/sec Loss 14.3688 LearningRate 0.0903 Epoch: 0 Global Step: 41170 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:21:32,232-Speed 2625.83 samples/sec Loss 14.2903 LearningRate 0.0903 Epoch: 0 Global Step: 41180 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:21:36,144-Speed 2618.36 samples/sec Loss 14.2611 LearningRate 0.0903 Epoch: 0 Global Step: 41190 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:21:40,046-Speed 2624.67 samples/sec Loss 14.3035 LearningRate 0.0903 Epoch: 0 Global Step: 41200 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:21:43,965-Speed 2613.83 samples/sec Loss 14.3316 LearningRate 0.0903 Epoch: 0 Global Step: 41210 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:21:47,866-Speed 2625.67 samples/sec Loss 14.2818 LearningRate 0.0903 Epoch: 0 Global Step: 41220 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:21:51,782-Speed 2615.75 samples/sec Loss 14.2971 LearningRate 0.0903 Epoch: 0 Global Step: 41230 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:21:55,684-Speed 2624.75 samples/sec Loss 14.3189 LearningRate 0.0903 Epoch: 0 Global Step: 41240 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:21:59,583-Speed 2627.27 samples/sec Loss 14.2130 LearningRate 0.0903 Epoch: 0 Global Step: 41250 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:22:03,483-Speed 2626.63 samples/sec Loss 14.3296 LearningRate 0.0903 Epoch: 0 Global Step: 41260 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:22:07,377-Speed 2629.78 samples/sec Loss 14.2872 LearningRate 0.0903 Epoch: 0 Global Step: 41270 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:22:11,328-Speed 2592.26 samples/sec Loss 14.3626 LearningRate 0.0903 Epoch: 0 Global Step: 41280 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:22:15,226-Speed 2628.12 samples/sec Loss 14.5368 LearningRate 0.0903 Epoch: 0 Global Step: 41290 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:22:19,165-Speed 2601.00 samples/sec Loss 14.3500 LearningRate 0.0903 Epoch: 0 Global Step: 41300 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:22:23,067-Speed 2624.50 samples/sec Loss 14.5093 LearningRate 0.0903 Epoch: 0 Global Step: 41310 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:22:26,977-Speed 2620.11 samples/sec Loss 14.3048 LearningRate 0.0903 Epoch: 0 Global Step: 41320 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:22:30,872-Speed 2629.19 samples/sec Loss 14.4767 LearningRate 0.0903 Epoch: 0 Global Step: 41330 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:22:34,779-Speed 2621.79 samples/sec Loss 14.4808 LearningRate 0.0903 Epoch: 0 Global Step: 41340 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:22:38,679-Speed 2626.35 samples/sec Loss 14.3007 LearningRate 0.0903 Epoch: 0 Global Step: 41350 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:22:42,587-Speed 2620.84 samples/sec Loss 14.3806 LearningRate 0.0903 Epoch: 0 Global Step: 41360 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:22:46,486-Speed 2626.91 samples/sec Loss 14.3273 LearningRate 0.0903 Epoch: 0 Global Step: 41370 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:22:50,372-Speed 2636.31 samples/sec Loss 14.5200 LearningRate 0.0903 Epoch: 0 Global Step: 41380 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:22:54,270-Speed 2627.05 samples/sec Loss 14.4785 LearningRate 0.0903 Epoch: 0 Global Step: 41390 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:22:58,167-Speed 2628.70 samples/sec Loss 14.2920 LearningRate 0.0903 Epoch: 0 Global Step: 41400 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:23:02,068-Speed 2626.08 samples/sec Loss 14.3577 LearningRate 0.0903 Epoch: 0 Global Step: 41410 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:23:05,964-Speed 2628.41 samples/sec Loss 14.4150 LearningRate 0.0903 Epoch: 0 Global Step: 41420 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:23:09,860-Speed 2628.91 samples/sec Loss 14.1782 LearningRate 0.0903 Epoch: 0 Global Step: 41430 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:23:13,760-Speed 2626.82 samples/sec Loss 14.3385 LearningRate 0.0903 Epoch: 0 Global Step: 41440 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:23:17,661-Speed 2625.27 samples/sec Loss 14.3022 LearningRate 0.0903 Epoch: 0 Global Step: 41450 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:23:21,560-Speed 2627.32 samples/sec Loss 14.1887 LearningRate 0.0903 Epoch: 0 Global Step: 41460 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:23:25,473-Speed 2617.99 samples/sec Loss 14.3989 LearningRate 0.0903 Epoch: 0 Global Step: 41470 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:23:29,368-Speed 2629.79 samples/sec Loss 14.2305 LearningRate 0.0902 Epoch: 0 Global Step: 41480 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:23:50,664-Speed 480.87 samples/sec Loss 14.3810 LearningRate 0.0902 Epoch: 1 Global Step: 41490 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:23:54,541-Speed 2642.46 samples/sec Loss 14.2984 LearningRate 0.0902 Epoch: 1 Global Step: 41500 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:23:58,425-Speed 2636.78 samples/sec Loss 14.2921 LearningRate 0.0902 Epoch: 1 Global Step: 41510 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:24:02,342-Speed 2615.01 samples/sec Loss 14.3595 LearningRate 0.0902 Epoch: 1 Global Step: 41520 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:24:06,229-Speed 2634.92 samples/sec Loss 14.4849 LearningRate 0.0902 Epoch: 1 Global Step: 41530 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:24:10,114-Speed 2636.77 samples/sec Loss 14.5274 LearningRate 0.0902 Epoch: 1 Global Step: 41540 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:24:14,156-Speed 2533.99 samples/sec Loss 14.3063 LearningRate 0.0902 Epoch: 1 Global Step: 41550 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:24:18,162-Speed 2556.54 samples/sec Loss 14.4472 LearningRate 0.0902 Epoch: 1 Global Step: 41560 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:24:22,050-Speed 2634.92 samples/sec Loss 14.1614 LearningRate 0.0902 Epoch: 1 Global Step: 41570 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:24:25,935-Speed 2636.48 samples/sec Loss 14.1693 LearningRate 0.0902 Epoch: 1 Global Step: 41580 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:24:29,823-Speed 2634.30 samples/sec Loss 14.4487 LearningRate 0.0902 Epoch: 1 Global Step: 41590 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:24:33,713-Speed 2632.54 samples/sec Loss 14.3075 LearningRate 0.0902 Epoch: 1 Global Step: 41600 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:24:37,584-Speed 2645.97 samples/sec Loss 14.2440 LearningRate 0.0902 Epoch: 1 Global Step: 41610 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:24:41,474-Speed 2633.09 samples/sec Loss 14.3963 LearningRate 0.0902 Epoch: 1 Global Step: 41620 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:24:45,399-Speed 2609.53 samples/sec Loss 14.4321 LearningRate 0.0902 Epoch: 1 Global Step: 41630 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:24:49,446-Speed 2531.39 samples/sec Loss 14.3627 LearningRate 0.0902 Epoch: 1 Global Step: 41640 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:24:53,496-Speed 2528.73 samples/sec Loss 14.1761 LearningRate 0.0902 Epoch: 1 Global Step: 41650 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:24:57,429-Speed 2604.41 samples/sec Loss 14.3239 LearningRate 0.0902 Epoch: 1 Global Step: 41660 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:25:01,330-Speed 2625.97 samples/sec Loss 14.3191 LearningRate 0.0902 Epoch: 1 Global Step: 41670 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:25:05,229-Speed 2626.26 samples/sec Loss 14.2773 LearningRate 0.0902 Epoch: 1 Global Step: 41680 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:25:09,136-Speed 2621.27 samples/sec Loss 14.2910 LearningRate 0.0902 Epoch: 1 Global Step: 41690 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:25:13,042-Speed 2622.56 samples/sec Loss 14.3059 LearningRate 0.0902 Epoch: 1 Global Step: 41700 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:25:16,943-Speed 2625.26 samples/sec Loss 14.2715 LearningRate 0.0902 Epoch: 1 Global Step: 41710 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:25:20,844-Speed 2626.17 samples/sec Loss 14.3366 LearningRate 0.0902 Epoch: 1 Global Step: 41720 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:25:24,743-Speed 2626.87 samples/sec Loss 14.3922 LearningRate 0.0902 Epoch: 1 Global Step: 41730 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:25:28,641-Speed 2627.89 samples/sec Loss 14.2569 LearningRate 0.0902 Epoch: 1 Global Step: 41740 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:25:32,541-Speed 2626.04 samples/sec Loss 14.4079 LearningRate 0.0902 Epoch: 1 Global Step: 41750 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:25:36,443-Speed 2624.75 samples/sec Loss 14.4067 LearningRate 0.0902 Epoch: 1 Global Step: 41760 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:25:40,321-Speed 2641.37 samples/sec Loss 14.1733 LearningRate 0.0902 Epoch: 1 Global Step: 41770 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:25:44,218-Speed 2628.14 samples/sec Loss 14.3336 LearningRate 0.0902 Epoch: 1 Global Step: 41780 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:25:48,117-Speed 2627.07 samples/sec Loss 14.3612 LearningRate 0.0902 Epoch: 1 Global Step: 41790 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:25:52,015-Speed 2627.81 samples/sec Loss 14.4239 LearningRate 0.0902 Epoch: 1 Global Step: 41800 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:25:55,912-Speed 2628.19 samples/sec Loss 14.2460 LearningRate 0.0902 Epoch: 1 Global Step: 41810 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:25:59,812-Speed 2626.63 samples/sec Loss 14.3600 LearningRate 0.0902 Epoch: 1 Global Step: 41820 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:26:03,711-Speed 2626.92 samples/sec Loss 14.4317 LearningRate 0.0902 Epoch: 1 Global Step: 41830 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:26:07,607-Speed 2628.76 samples/sec Loss 14.2358 LearningRate 0.0902 Epoch: 1 Global Step: 41840 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:26:11,508-Speed 2625.12 samples/sec Loss 14.2559 LearningRate 0.0902 Epoch: 1 Global Step: 41850 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:26:15,524-Speed 2550.44 samples/sec Loss 14.3077 LearningRate 0.0902 Epoch: 1 Global Step: 41860 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:26:19,631-Speed 2494.48 samples/sec Loss 14.2971 LearningRate 0.0902 Epoch: 1 Global Step: 41870 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:26:23,701-Speed 2516.34 samples/sec Loss 14.3131 LearningRate 0.0902 Epoch: 1 Global Step: 41880 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:26:27,615-Speed 2617.03 samples/sec Loss 14.2526 LearningRate 0.0902 Epoch: 1 Global Step: 41890 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:26:31,509-Speed 2630.18 samples/sec Loss 14.1290 LearningRate 0.0902 Epoch: 1 Global Step: 41900 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:26:35,403-Speed 2630.01 samples/sec Loss 14.3302 LearningRate 0.0902 Epoch: 1 Global Step: 41910 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:26:39,297-Speed 2630.90 samples/sec Loss 14.2852 LearningRate 0.0901 Epoch: 1 Global Step: 41920 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:26:43,203-Speed 2622.58 samples/sec Loss 14.2651 LearningRate 0.0901 Epoch: 1 Global Step: 41930 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:26:47,095-Speed 2631.54 samples/sec Loss 14.2773 LearningRate 0.0901 Epoch: 1 Global Step: 41940 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:26:51,129-Speed 2539.15 samples/sec Loss 14.2468 LearningRate 0.0901 Epoch: 1 Global Step: 41950 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:26:55,014-Speed 2636.61 samples/sec Loss 14.1832 LearningRate 0.0901 Epoch: 1 Global Step: 41960 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-13 00:26:58,910-Speed 2629.28 samples/sec Loss 14.2395 LearningRate 0.0901 Epoch: 1 Global Step: 41970 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-13 00:27:02,817-Speed 2621.58 samples/sec Loss 14.3010 LearningRate 0.0901 Epoch: 1 Global Step: 41980 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-13 00:27:06,715-Speed 2627.31 samples/sec Loss 14.1904 LearningRate 0.0901 Epoch: 1 Global Step: 41990 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-13 00:27:10,627-Speed 2618.76 samples/sec Loss 14.2608 LearningRate 0.0901 Epoch: 1 Global Step: 42000 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-13 00:27:14,530-Speed 2624.38 samples/sec Loss 14.2974 LearningRate 0.0901 Epoch: 1 Global Step: 42010 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-13 00:27:18,430-Speed 2626.03 samples/sec Loss 14.2564 LearningRate 0.0901 Epoch: 1 Global Step: 42020 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-13 00:27:22,330-Speed 2626.23 samples/sec Loss 14.1622 LearningRate 0.0901 Epoch: 1 Global Step: 42030 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-13 00:27:26,232-Speed 2625.14 samples/sec Loss 14.0508 LearningRate 0.0901 Epoch: 1 Global Step: 42040 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-13 00:27:30,160-Speed 2607.83 samples/sec Loss 14.3261 LearningRate 0.0901 Epoch: 1 Global Step: 42050 Fp16 Grad Scale: 65536 Required: 89 hours
Training: 2022-04-13 00:27:34,062-Speed 2624.66 samples/sec Loss 14.1446 LearningRate 0.0901 Epoch: 1 Global Step: 42060 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:27:37,990-Speed 2607.45 samples/sec Loss 14.1923 LearningRate 0.0901 Epoch: 1 Global Step: 42070 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:27:41,883-Speed 2630.89 samples/sec Loss 14.4141 LearningRate 0.0901 Epoch: 1 Global Step: 42080 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:27:45,813-Speed 2606.97 samples/sec Loss 14.2172 LearningRate 0.0901 Epoch: 1 Global Step: 42090 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:27:49,706-Speed 2631.28 samples/sec Loss 14.3808 LearningRate 0.0901 Epoch: 1 Global Step: 42100 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:27:53,614-Speed 2620.82 samples/sec Loss 14.1770 LearningRate 0.0901 Epoch: 1 Global Step: 42110 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:27:57,530-Speed 2616.25 samples/sec Loss 14.0886 LearningRate 0.0901 Epoch: 1 Global Step: 42120 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:28:01,434-Speed 2623.64 samples/sec Loss 14.1710 LearningRate 0.0901 Epoch: 1 Global Step: 42130 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:28:05,331-Speed 2627.72 samples/sec Loss 14.2679 LearningRate 0.0901 Epoch: 1 Global Step: 42140 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:28:09,229-Speed 2627.43 samples/sec Loss 14.2072 LearningRate 0.0901 Epoch: 1 Global Step: 42150 Fp16 Grad Scale: 131072 Required: 89 hours
Training: 2022-04-13 00:28:13,129-Speed 2627.05 samples/sec Loss 14.2125 LearningRate 0.0901 Epoch: 1 Global Step: 42160 Fp16 Grad Scale: 262144 Required: 89 hours
Training: 2022-04-13 00:28:17,027-Speed 2627.30 samples/sec Loss 14.2393 LearningRate 0.0901 Epoch: 1 Global Step: 42170 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:28:20,934-Speed 2621.71 samples/sec Loss 14.2534 LearningRate 0.0901 Epoch: 1 Global Step: 42180 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:28:24,854-Speed 2613.15 samples/sec Loss 14.3046 LearningRate 0.0901 Epoch: 1 Global Step: 42190 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:28:28,754-Speed 2626.20 samples/sec Loss 14.3178 LearningRate 0.0901 Epoch: 1 Global Step: 42200 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:28:32,642-Speed 2634.89 samples/sec Loss 14.1644 LearningRate 0.0901 Epoch: 1 Global Step: 42210 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:28:36,564-Speed 2611.52 samples/sec Loss 14.2444 LearningRate 0.0901 Epoch: 1 Global Step: 42220 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:28:40,468-Speed 2623.70 samples/sec Loss 14.2661 LearningRate 0.0901 Epoch: 1 Global Step: 42230 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:28:44,393-Speed 2609.18 samples/sec Loss 14.4298 LearningRate 0.0901 Epoch: 1 Global Step: 42240 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:28:48,435-Speed 2534.55 samples/sec Loss 14.2135 LearningRate 0.0901 Epoch: 1 Global Step: 42250 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:28:52,335-Speed 2625.80 samples/sec Loss 14.3638 LearningRate 0.0901 Epoch: 1 Global Step: 42260 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:28:56,230-Speed 2630.44 samples/sec Loss 14.1470 LearningRate 0.0901 Epoch: 1 Global Step: 42270 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:29:00,126-Speed 2628.43 samples/sec Loss 14.3219 LearningRate 0.0901 Epoch: 1 Global Step: 42280 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:29:04,034-Speed 2621.30 samples/sec Loss 14.4338 LearningRate 0.0901 Epoch: 1 Global Step: 42290 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:29:07,933-Speed 2626.81 samples/sec Loss 14.3233 LearningRate 0.0901 Epoch: 1 Global Step: 42300 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:29:11,837-Speed 2623.26 samples/sec Loss 14.1940 LearningRate 0.0901 Epoch: 1 Global Step: 42310 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:29:15,718-Speed 2639.14 samples/sec Loss 14.2667 LearningRate 0.0901 Epoch: 1 Global Step: 42320 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:29:19,617-Speed 2626.81 samples/sec Loss 14.2569 LearningRate 0.0901 Epoch: 1 Global Step: 42330 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:29:23,520-Speed 2624.85 samples/sec Loss 14.2578 LearningRate 0.0901 Epoch: 1 Global Step: 42340 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:29:27,420-Speed 2626.55 samples/sec Loss 14.3254 LearningRate 0.0901 Epoch: 1 Global Step: 42350 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:29:31,371-Speed 2592.26 samples/sec Loss 14.2271 LearningRate 0.0900 Epoch: 1 Global Step: 42360 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:29:35,278-Speed 2621.53 samples/sec Loss 14.4185 LearningRate 0.0900 Epoch: 1 Global Step: 42370 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:29:39,178-Speed 2626.46 samples/sec Loss 14.3556 LearningRate 0.0900 Epoch: 1 Global Step: 42380 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:29:43,079-Speed 2625.32 samples/sec Loss 14.2325 LearningRate 0.0900 Epoch: 1 Global Step: 42390 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:29:46,979-Speed 2626.07 samples/sec Loss 14.3450 LearningRate 0.0900 Epoch: 1 Global Step: 42400 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:29:50,880-Speed 2626.20 samples/sec Loss 14.2357 LearningRate 0.0900 Epoch: 1 Global Step: 42410 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:29:54,779-Speed 2627.20 samples/sec Loss 14.3928 LearningRate 0.0900 Epoch: 1 Global Step: 42420 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:29:58,695-Speed 2615.38 samples/sec Loss 14.2541 LearningRate 0.0900 Epoch: 1 Global Step: 42430 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:30:02,713-Speed 2549.56 samples/sec Loss 14.2552 LearningRate 0.0900 Epoch: 1 Global Step: 42440 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:30:06,593-Speed 2639.31 samples/sec Loss 14.2709 LearningRate 0.0900 Epoch: 1 Global Step: 42450 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:30:10,517-Speed 2610.29 samples/sec Loss 14.1042 LearningRate 0.0900 Epoch: 1 Global Step: 42460 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:30:14,407-Speed 2633.00 samples/sec Loss 14.2386 LearningRate 0.0900 Epoch: 1 Global Step: 42470 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:30:18,299-Speed 2632.35 samples/sec Loss 14.1692 LearningRate 0.0900 Epoch: 1 Global Step: 42480 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:30:22,191-Speed 2631.12 samples/sec Loss 14.2307 LearningRate 0.0900 Epoch: 1 Global Step: 42490 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:30:26,092-Speed 2626.01 samples/sec Loss 14.2770 LearningRate 0.0900 Epoch: 1 Global Step: 42500 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:30:29,986-Speed 2630.49 samples/sec Loss 14.2275 LearningRate 0.0900 Epoch: 1 Global Step: 42510 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:30:33,882-Speed 2629.07 samples/sec Loss 14.2858 LearningRate 0.0900 Epoch: 1 Global Step: 42520 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:30:37,781-Speed 2626.86 samples/sec Loss 14.3700 LearningRate 0.0900 Epoch: 1 Global Step: 42530 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:30:41,676-Speed 2630.07 samples/sec Loss 14.2068 LearningRate 0.0900 Epoch: 1 Global Step: 42540 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:30:45,552-Speed 2642.49 samples/sec Loss 14.3204 LearningRate 0.0900 Epoch: 1 Global Step: 42550 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:30:49,453-Speed 2625.40 samples/sec Loss 14.1017 LearningRate 0.0900 Epoch: 1 Global Step: 42560 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:30:53,441-Speed 2568.63 samples/sec Loss 14.3419 LearningRate 0.0900 Epoch: 1 Global Step: 42570 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:30:57,344-Speed 2623.97 samples/sec Loss 14.3108 LearningRate 0.0900 Epoch: 1 Global Step: 42580 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:31:01,243-Speed 2626.95 samples/sec Loss 14.2347 LearningRate 0.0900 Epoch: 1 Global Step: 42590 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:31:05,140-Speed 2628.43 samples/sec Loss 14.3068 LearningRate 0.0900 Epoch: 1 Global Step: 42600 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:31:09,040-Speed 2626.13 samples/sec Loss 14.1695 LearningRate 0.0900 Epoch: 1 Global Step: 42610 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:31:12,951-Speed 2618.58 samples/sec Loss 14.3744 LearningRate 0.0900 Epoch: 1 Global Step: 42620 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:31:16,846-Speed 2629.73 samples/sec Loss 14.2672 LearningRate 0.0900 Epoch: 1 Global Step: 42630 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:31:20,763-Speed 2615.02 samples/sec Loss 14.1949 LearningRate 0.0900 Epoch: 1 Global Step: 42640 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:31:24,663-Speed 2626.48 samples/sec Loss 14.1628 LearningRate 0.0900 Epoch: 1 Global Step: 42650 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:31:28,560-Speed 2628.39 samples/sec Loss 14.2407 LearningRate 0.0900 Epoch: 1 Global Step: 42660 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:31:32,458-Speed 2627.90 samples/sec Loss 14.2568 LearningRate 0.0900 Epoch: 1 Global Step: 42670 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:31:36,364-Speed 2622.08 samples/sec Loss 14.3877 LearningRate 0.0900 Epoch: 1 Global Step: 42680 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:31:40,264-Speed 2626.02 samples/sec Loss 14.3788 LearningRate 0.0900 Epoch: 1 Global Step: 42690 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:31:44,155-Speed 2632.50 samples/sec Loss 14.3367 LearningRate 0.0900 Epoch: 1 Global Step: 42700 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:31:48,060-Speed 2623.16 samples/sec Loss 14.2055 LearningRate 0.0900 Epoch: 1 Global Step: 42710 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:31:51,956-Speed 2628.89 samples/sec Loss 14.3255 LearningRate 0.0900 Epoch: 1 Global Step: 42720 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:31:55,861-Speed 2622.49 samples/sec Loss 14.4456 LearningRate 0.0900 Epoch: 1 Global Step: 42730 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:31:59,776-Speed 2616.79 samples/sec Loss 14.2181 LearningRate 0.0900 Epoch: 1 Global Step: 42740 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:32:03,688-Speed 2618.01 samples/sec Loss 14.2413 LearningRate 0.0900 Epoch: 1 Global Step: 42750 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:32:07,599-Speed 2619.29 samples/sec Loss 14.0956 LearningRate 0.0900 Epoch: 1 Global Step: 42760 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:32:11,506-Speed 2621.37 samples/sec Loss 14.2223 LearningRate 0.0900 Epoch: 1 Global Step: 42770 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:32:15,420-Speed 2617.09 samples/sec Loss 14.2207 LearningRate 0.0900 Epoch: 1 Global Step: 42780 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:32:19,332-Speed 2617.60 samples/sec Loss 14.3320 LearningRate 0.0899 Epoch: 1 Global Step: 42790 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:32:23,230-Speed 2628.09 samples/sec Loss 14.1894 LearningRate 0.0899 Epoch: 1 Global Step: 42800 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:32:27,132-Speed 2625.28 samples/sec Loss 14.3554 LearningRate 0.0899 Epoch: 1 Global Step: 42810 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:32:31,032-Speed 2626.19 samples/sec Loss 14.3123 LearningRate 0.0899 Epoch: 1 Global Step: 42820 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:32:34,946-Speed 2617.13 samples/sec Loss 14.1706 LearningRate 0.0899 Epoch: 1 Global Step: 42830 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:32:38,846-Speed 2626.00 samples/sec Loss 14.0526 LearningRate 0.0899 Epoch: 1 Global Step: 42840 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:32:42,746-Speed 2626.14 samples/sec Loss 14.4412 LearningRate 0.0899 Epoch: 1 Global Step: 42850 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:32:46,644-Speed 2628.20 samples/sec Loss 14.3272 LearningRate 0.0899 Epoch: 1 Global Step: 42860 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:32:50,548-Speed 2623.50 samples/sec Loss 14.3067 LearningRate 0.0899 Epoch: 1 Global Step: 42870 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:32:54,451-Speed 2623.93 samples/sec Loss 14.1374 LearningRate 0.0899 Epoch: 1 Global Step: 42880 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:32:58,346-Speed 2630.00 samples/sec Loss 14.2535 LearningRate 0.0899 Epoch: 1 Global Step: 42890 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:33:02,247-Speed 2625.94 samples/sec Loss 14.3303 LearningRate 0.0899 Epoch: 1 Global Step: 42900 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:33:06,155-Speed 2620.49 samples/sec Loss 14.3252 LearningRate 0.0899 Epoch: 1 Global Step: 42910 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:33:10,098-Speed 2597.72 samples/sec Loss 14.1189 LearningRate 0.0899 Epoch: 1 Global Step: 42920 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:33:14,167-Speed 2517.16 samples/sec Loss 14.2810 LearningRate 0.0899 Epoch: 1 Global Step: 42930 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:33:18,216-Speed 2529.30 samples/sec Loss 14.1850 LearningRate 0.0899 Epoch: 1 Global Step: 42940 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:33:22,150-Speed 2604.38 samples/sec Loss 14.3394 LearningRate 0.0899 Epoch: 1 Global Step: 42950 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:33:26,046-Speed 2628.96 samples/sec Loss 14.2599 LearningRate 0.0899 Epoch: 1 Global Step: 42960 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:33:29,963-Speed 2614.65 samples/sec Loss 14.2217 LearningRate 0.0899 Epoch: 1 Global Step: 42970 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:33:33,866-Speed 2624.95 samples/sec Loss 14.1144 LearningRate 0.0899 Epoch: 1 Global Step: 42980 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:33:37,760-Speed 2629.93 samples/sec Loss 14.3485 LearningRate 0.0899 Epoch: 1 Global Step: 42990 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:33:41,665-Speed 2623.09 samples/sec Loss 14.1954 LearningRate 0.0899 Epoch: 1 Global Step: 43000 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:33:45,563-Speed 2627.26 samples/sec Loss 14.1295 LearningRate 0.0899 Epoch: 1 Global Step: 43010 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:33:49,456-Speed 2630.75 samples/sec Loss 14.3135 LearningRate 0.0899 Epoch: 1 Global Step: 43020 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:33:53,352-Speed 2630.48 samples/sec Loss 14.3028 LearningRate 0.0899 Epoch: 1 Global Step: 43030 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:33:57,271-Speed 2613.45 samples/sec Loss 14.3439 LearningRate 0.0899 Epoch: 1 Global Step: 43040 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:34:01,173-Speed 2624.99 samples/sec Loss 14.1320 LearningRate 0.0899 Epoch: 1 Global Step: 43050 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:34:05,094-Speed 2611.61 samples/sec Loss 14.2306 LearningRate 0.0899 Epoch: 1 Global Step: 43060 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:34:08,993-Speed 2627.28 samples/sec Loss 14.4273 LearningRate 0.0899 Epoch: 1 Global Step: 43070 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:34:12,889-Speed 2629.28 samples/sec Loss 14.1769 LearningRate 0.0899 Epoch: 1 Global Step: 43080 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:34:16,793-Speed 2623.53 samples/sec Loss 14.1576 LearningRate 0.0899 Epoch: 1 Global Step: 43090 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:34:20,698-Speed 2623.63 samples/sec Loss 14.3131 LearningRate 0.0899 Epoch: 1 Global Step: 43100 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:34:24,603-Speed 2623.10 samples/sec Loss 14.1590 LearningRate 0.0899 Epoch: 1 Global Step: 43110 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:34:28,497-Speed 2630.38 samples/sec Loss 14.2344 LearningRate 0.0899 Epoch: 1 Global Step: 43120 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:34:32,395-Speed 2627.48 samples/sec Loss 14.3280 LearningRate 0.0899 Epoch: 1 Global Step: 43130 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:34:36,294-Speed 2627.22 samples/sec Loss 14.1777 LearningRate 0.0899 Epoch: 1 Global Step: 43140 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:34:40,193-Speed 2627.20 samples/sec Loss 14.2386 LearningRate 0.0899 Epoch: 1 Global Step: 43150 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:34:44,083-Speed 2632.83 samples/sec Loss 14.3029 LearningRate 0.0899 Epoch: 1 Global Step: 43160 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:34:47,976-Speed 2630.21 samples/sec Loss 14.3219 LearningRate 0.0899 Epoch: 1 Global Step: 43170 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:34:51,872-Speed 2629.61 samples/sec Loss 14.0983 LearningRate 0.0899 Epoch: 1 Global Step: 43180 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:34:55,775-Speed 2625.05 samples/sec Loss 14.2041 LearningRate 0.0899 Epoch: 1 Global Step: 43190 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:34:59,674-Speed 2626.28 samples/sec Loss 14.2866 LearningRate 0.0899 Epoch: 1 Global Step: 43200 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:35:03,581-Speed 2621.91 samples/sec Loss 14.2421 LearningRate 0.0899 Epoch: 1 Global Step: 43210 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:35:07,487-Speed 2622.33 samples/sec Loss 14.1469 LearningRate 0.0899 Epoch: 1 Global Step: 43220 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:35:11,397-Speed 2619.72 samples/sec Loss 14.4397 LearningRate 0.0898 Epoch: 1 Global Step: 43230 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:35:15,298-Speed 2625.50 samples/sec Loss 14.1559 LearningRate 0.0898 Epoch: 1 Global Step: 43240 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:35:19,201-Speed 2624.65 samples/sec Loss 14.2467 LearningRate 0.0898 Epoch: 1 Global Step: 43250 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:35:23,094-Speed 2630.64 samples/sec Loss 14.2880 LearningRate 0.0898 Epoch: 1 Global Step: 43260 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:35:27,027-Speed 2605.11 samples/sec Loss 14.3539 LearningRate 0.0898 Epoch: 1 Global Step: 43270 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:35:30,906-Speed 2640.05 samples/sec Loss 14.1918 LearningRate 0.0898 Epoch: 1 Global Step: 43280 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:35:34,800-Speed 2630.87 samples/sec Loss 14.2220 LearningRate 0.0898 Epoch: 1 Global Step: 43290 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:35:38,695-Speed 2629.57 samples/sec Loss 14.2277 LearningRate 0.0898 Epoch: 1 Global Step: 43300 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:35:42,588-Speed 2630.71 samples/sec Loss 14.1834 LearningRate 0.0898 Epoch: 1 Global Step: 43310 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:35:46,490-Speed 2624.79 samples/sec Loss 14.2169 LearningRate 0.0898 Epoch: 1 Global Step: 43320 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:35:50,399-Speed 2620.49 samples/sec Loss 14.1616 LearningRate 0.0898 Epoch: 1 Global Step: 43330 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:35:54,299-Speed 2626.24 samples/sec Loss 14.3153 LearningRate 0.0898 Epoch: 1 Global Step: 43340 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:35:58,220-Speed 2612.45 samples/sec Loss 14.2782 LearningRate 0.0898 Epoch: 1 Global Step: 43350 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:36:02,114-Speed 2630.22 samples/sec Loss 14.3177 LearningRate 0.0898 Epoch: 1 Global Step: 43360 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:36:06,014-Speed 2626.92 samples/sec Loss 14.1046 LearningRate 0.0898 Epoch: 1 Global Step: 43370 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:36:09,912-Speed 2627.54 samples/sec Loss 14.3640 LearningRate 0.0898 Epoch: 1 Global Step: 43380 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:36:13,797-Speed 2635.70 samples/sec Loss 14.2367 LearningRate 0.0898 Epoch: 1 Global Step: 43390 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:36:17,698-Speed 2625.76 samples/sec Loss 14.3066 LearningRate 0.0898 Epoch: 1 Global Step: 43400 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:36:21,608-Speed 2619.74 samples/sec Loss 14.1429 LearningRate 0.0898 Epoch: 1 Global Step: 43410 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:36:25,503-Speed 2630.40 samples/sec Loss 14.1471 LearningRate 0.0898 Epoch: 1 Global Step: 43420 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:36:29,426-Speed 2610.70 samples/sec Loss 14.2751 LearningRate 0.0898 Epoch: 1 Global Step: 43430 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:36:33,334-Speed 2620.97 samples/sec Loss 14.1867 LearningRate 0.0898 Epoch: 1 Global Step: 43440 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:36:37,236-Speed 2625.30 samples/sec Loss 14.3910 LearningRate 0.0898 Epoch: 1 Global Step: 43450 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:36:41,174-Speed 2601.28 samples/sec Loss 14.2419 LearningRate 0.0898 Epoch: 1 Global Step: 43460 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:36:45,070-Speed 2628.47 samples/sec Loss 14.3381 LearningRate 0.0898 Epoch: 1 Global Step: 43470 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:36:48,998-Speed 2608.18 samples/sec Loss 14.2744 LearningRate 0.0898 Epoch: 1 Global Step: 43480 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:36:52,897-Speed 2627.37 samples/sec Loss 14.1531 LearningRate 0.0898 Epoch: 1 Global Step: 43490 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:36:56,808-Speed 2618.65 samples/sec Loss 14.3452 LearningRate 0.0898 Epoch: 1 Global Step: 43500 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:37:00,720-Speed 2618.20 samples/sec Loss 14.2471 LearningRate 0.0898 Epoch: 1 Global Step: 43510 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:37:04,629-Speed 2620.32 samples/sec Loss 14.2006 LearningRate 0.0898 Epoch: 1 Global Step: 43520 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:37:08,531-Speed 2624.87 samples/sec Loss 14.1348 LearningRate 0.0898 Epoch: 1 Global Step: 43530 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:37:12,439-Speed 2620.48 samples/sec Loss 14.2685 LearningRate 0.0898 Epoch: 1 Global Step: 43540 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:37:16,346-Speed 2621.84 samples/sec Loss 14.1817 LearningRate 0.0898 Epoch: 1 Global Step: 43550 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:37:20,248-Speed 2624.87 samples/sec Loss 14.1899 LearningRate 0.0898 Epoch: 1 Global Step: 43560 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:37:24,148-Speed 2626.86 samples/sec Loss 14.1109 LearningRate 0.0898 Epoch: 1 Global Step: 43570 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:37:28,045-Speed 2628.15 samples/sec Loss 14.1927 LearningRate 0.0898 Epoch: 1 Global Step: 43580 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:37:31,921-Speed 2642.42 samples/sec Loss 14.0843 LearningRate 0.0898 Epoch: 1 Global Step: 43590 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:37:35,826-Speed 2623.05 samples/sec Loss 14.0538 LearningRate 0.0898 Epoch: 1 Global Step: 43600 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:37:39,724-Speed 2627.64 samples/sec Loss 14.3613 LearningRate 0.0898 Epoch: 1 Global Step: 43610 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:37:43,616-Speed 2631.48 samples/sec Loss 14.1764 LearningRate 0.0898 Epoch: 1 Global Step: 43620 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:37:47,511-Speed 2629.40 samples/sec Loss 14.2278 LearningRate 0.0898 Epoch: 1 Global Step: 43630 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:37:51,404-Speed 2631.09 samples/sec Loss 14.2897 LearningRate 0.0898 Epoch: 1 Global Step: 43640 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:37:55,288-Speed 2636.84 samples/sec Loss 14.1984 LearningRate 0.0898 Epoch: 1 Global Step: 43650 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:37:59,188-Speed 2626.49 samples/sec Loss 14.1597 LearningRate 0.0898 Epoch: 1 Global Step: 43660 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:38:03,102-Speed 2616.87 samples/sec Loss 14.1144 LearningRate 0.0897 Epoch: 1 Global Step: 43670 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:38:06,998-Speed 2628.66 samples/sec Loss 14.1629 LearningRate 0.0897 Epoch: 1 Global Step: 43680 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:38:10,892-Speed 2630.18 samples/sec Loss 14.1525 LearningRate 0.0897 Epoch: 1 Global Step: 43690 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:38:14,790-Speed 2628.34 samples/sec Loss 14.0507 LearningRate 0.0897 Epoch: 1 Global Step: 43700 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:38:18,690-Speed 2626.27 samples/sec Loss 14.2418 LearningRate 0.0897 Epoch: 1 Global Step: 43710 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:38:22,588-Speed 2627.42 samples/sec Loss 14.1694 LearningRate 0.0897 Epoch: 1 Global Step: 43720 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:38:26,492-Speed 2623.40 samples/sec Loss 14.2737 LearningRate 0.0897 Epoch: 1 Global Step: 43730 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:38:30,392-Speed 2627.21 samples/sec Loss 14.1572 LearningRate 0.0897 Epoch: 1 Global Step: 43740 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:38:34,302-Speed 2619.06 samples/sec Loss 14.0977 LearningRate 0.0897 Epoch: 1 Global Step: 43750 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:38:38,201-Speed 2626.63 samples/sec Loss 14.2009 LearningRate 0.0897 Epoch: 1 Global Step: 43760 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:38:42,101-Speed 2626.27 samples/sec Loss 14.2327 LearningRate 0.0897 Epoch: 1 Global Step: 43770 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:38:46,021-Speed 2613.53 samples/sec Loss 14.1303 LearningRate 0.0897 Epoch: 1 Global Step: 43780 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:38:49,897-Speed 2642.27 samples/sec Loss 14.2101 LearningRate 0.0897 Epoch: 1 Global Step: 43790 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:38:53,793-Speed 2629.05 samples/sec Loss 14.3241 LearningRate 0.0897 Epoch: 1 Global Step: 43800 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:38:57,689-Speed 2628.83 samples/sec Loss 14.1126 LearningRate 0.0897 Epoch: 1 Global Step: 43810 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:39:01,597-Speed 2621.23 samples/sec Loss 14.3491 LearningRate 0.0897 Epoch: 1 Global Step: 43820 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:39:05,495-Speed 2627.30 samples/sec Loss 14.4129 LearningRate 0.0897 Epoch: 1 Global Step: 43830 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:39:09,395-Speed 2625.74 samples/sec Loss 14.1026 LearningRate 0.0897 Epoch: 1 Global Step: 43840 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:39:13,298-Speed 2624.10 samples/sec Loss 14.0921 LearningRate 0.0897 Epoch: 1 Global Step: 43850 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:39:17,208-Speed 2619.92 samples/sec Loss 14.1134 LearningRate 0.0897 Epoch: 1 Global Step: 43860 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:39:21,109-Speed 2626.30 samples/sec Loss 14.2727 LearningRate 0.0897 Epoch: 1 Global Step: 43870 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:39:25,004-Speed 2629.43 samples/sec Loss 14.2783 LearningRate 0.0897 Epoch: 1 Global Step: 43880 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:39:28,886-Speed 2638.30 samples/sec Loss 14.1542 LearningRate 0.0897 Epoch: 1 Global Step: 43890 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:39:32,790-Speed 2623.94 samples/sec Loss 14.2273 LearningRate 0.0897 Epoch: 1 Global Step: 43900 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:39:36,690-Speed 2626.16 samples/sec Loss 14.0842 LearningRate 0.0897 Epoch: 1 Global Step: 43910 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:39:40,588-Speed 2626.99 samples/sec Loss 14.0363 LearningRate 0.0897 Epoch: 1 Global Step: 43920 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:39:44,480-Speed 2632.35 samples/sec Loss 14.3011 LearningRate 0.0897 Epoch: 1 Global Step: 43930 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:39:48,378-Speed 2627.28 samples/sec Loss 14.3026 LearningRate 0.0897 Epoch: 1 Global Step: 43940 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:39:52,284-Speed 2623.15 samples/sec Loss 14.1103 LearningRate 0.0897 Epoch: 1 Global Step: 43950 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:39:56,179-Speed 2629.99 samples/sec Loss 14.2152 LearningRate 0.0897 Epoch: 1 Global Step: 43960 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:40:00,075-Speed 2628.86 samples/sec Loss 14.0698 LearningRate 0.0897 Epoch: 1 Global Step: 43970 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:40:03,987-Speed 2618.26 samples/sec Loss 14.1183 LearningRate 0.0897 Epoch: 1 Global Step: 43980 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:40:07,884-Speed 2628.17 samples/sec Loss 14.1342 LearningRate 0.0897 Epoch: 1 Global Step: 43990 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:40:11,764-Speed 2639.61 samples/sec Loss 14.2186 LearningRate 0.0897 Epoch: 1 Global Step: 44000 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:40:15,661-Speed 2628.35 samples/sec Loss 14.1071 LearningRate 0.0897 Epoch: 1 Global Step: 44010 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:40:19,558-Speed 2627.81 samples/sec Loss 14.2240 LearningRate 0.0897 Epoch: 1 Global Step: 44020 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:40:23,465-Speed 2621.64 samples/sec Loss 14.1691 LearningRate 0.0897 Epoch: 1 Global Step: 44030 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:40:27,361-Speed 2629.32 samples/sec Loss 14.0080 LearningRate 0.0897 Epoch: 1 Global Step: 44040 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:40:31,258-Speed 2628.33 samples/sec Loss 14.2975 LearningRate 0.0897 Epoch: 1 Global Step: 44050 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:40:35,153-Speed 2629.37 samples/sec Loss 14.1106 LearningRate 0.0897 Epoch: 1 Global Step: 44060 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:40:39,052-Speed 2627.41 samples/sec Loss 14.0278 LearningRate 0.0897 Epoch: 1 Global Step: 44070 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:40:42,968-Speed 2614.86 samples/sec Loss 14.2041 LearningRate 0.0897 Epoch: 1 Global Step: 44080 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:40:46,870-Speed 2625.08 samples/sec Loss 14.1301 LearningRate 0.0897 Epoch: 1 Global Step: 44090 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:40:50,775-Speed 2622.82 samples/sec Loss 14.1585 LearningRate 0.0897 Epoch: 1 Global Step: 44100 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:40:54,681-Speed 2621.99 samples/sec Loss 14.1038 LearningRate 0.0896 Epoch: 1 Global Step: 44110 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:40:58,592-Speed 2619.32 samples/sec Loss 14.2062 LearningRate 0.0896 Epoch: 1 Global Step: 44120 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:41:02,498-Speed 2621.52 samples/sec Loss 14.0994 LearningRate 0.0896 Epoch: 1 Global Step: 44130 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:41:06,398-Speed 2626.63 samples/sec Loss 13.9660 LearningRate 0.0896 Epoch: 1 Global Step: 44140 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:41:10,297-Speed 2627.43 samples/sec Loss 14.2189 LearningRate 0.0896 Epoch: 1 Global Step: 44150 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:41:14,192-Speed 2629.32 samples/sec Loss 14.1323 LearningRate 0.0896 Epoch: 1 Global Step: 44160 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:41:18,088-Speed 2629.09 samples/sec Loss 14.1202 LearningRate 0.0896 Epoch: 1 Global Step: 44170 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:41:21,984-Speed 2628.38 samples/sec Loss 14.1871 LearningRate 0.0896 Epoch: 1 Global Step: 44180 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:41:25,885-Speed 2626.21 samples/sec Loss 14.1415 LearningRate 0.0896 Epoch: 1 Global Step: 44190 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:41:29,792-Speed 2621.23 samples/sec Loss 14.1060 LearningRate 0.0896 Epoch: 1 Global Step: 44200 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:41:33,685-Speed 2630.62 samples/sec Loss 14.2056 LearningRate 0.0896 Epoch: 1 Global Step: 44210 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:41:37,583-Speed 2627.49 samples/sec Loss 14.1757 LearningRate 0.0896 Epoch: 1 Global Step: 44220 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:41:41,483-Speed 2626.40 samples/sec Loss 14.1911 LearningRate 0.0896 Epoch: 1 Global Step: 44230 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:41:45,373-Speed 2633.17 samples/sec Loss 14.1268 LearningRate 0.0896 Epoch: 1 Global Step: 44240 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:41:49,267-Speed 2630.61 samples/sec Loss 14.2284 LearningRate 0.0896 Epoch: 1 Global Step: 44250 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:41:53,161-Speed 2630.00 samples/sec Loss 14.1504 LearningRate 0.0896 Epoch: 1 Global Step: 44260 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:41:57,049-Speed 2634.41 samples/sec Loss 14.1640 LearningRate 0.0896 Epoch: 1 Global Step: 44270 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:42:00,945-Speed 2628.71 samples/sec Loss 14.2144 LearningRate 0.0896 Epoch: 1 Global Step: 44280 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:42:04,842-Speed 2628.52 samples/sec Loss 14.3070 LearningRate 0.0896 Epoch: 1 Global Step: 44290 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:42:08,735-Speed 2631.01 samples/sec Loss 14.2162 LearningRate 0.0896 Epoch: 1 Global Step: 44300 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:42:12,631-Speed 2628.35 samples/sec Loss 14.2101 LearningRate 0.0896 Epoch: 1 Global Step: 44310 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:42:16,530-Speed 2627.69 samples/sec Loss 14.0682 LearningRate 0.0896 Epoch: 1 Global Step: 44320 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:42:20,428-Speed 2627.65 samples/sec Loss 14.2800 LearningRate 0.0896 Epoch: 1 Global Step: 44330 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:42:24,327-Speed 2626.79 samples/sec Loss 14.0063 LearningRate 0.0896 Epoch: 1 Global Step: 44340 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:42:28,223-Speed 2628.68 samples/sec Loss 14.2435 LearningRate 0.0896 Epoch: 1 Global Step: 44350 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:42:32,119-Speed 2629.16 samples/sec Loss 14.1065 LearningRate 0.0896 Epoch: 1 Global Step: 44360 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:42:36,014-Speed 2629.63 samples/sec Loss 14.1362 LearningRate 0.0896 Epoch: 1 Global Step: 44370 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:42:40,050-Speed 2537.40 samples/sec Loss 14.1402 LearningRate 0.0896 Epoch: 1 Global Step: 44380 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:42:44,047-Speed 2562.77 samples/sec Loss 14.1172 LearningRate 0.0896 Epoch: 1 Global Step: 44390 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:42:47,942-Speed 2629.75 samples/sec Loss 14.2123 LearningRate 0.0896 Epoch: 1 Global Step: 44400 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:42:51,843-Speed 2625.72 samples/sec Loss 14.1357 LearningRate 0.0896 Epoch: 1 Global Step: 44410 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:42:55,795-Speed 2591.74 samples/sec Loss 14.0941 LearningRate 0.0896 Epoch: 1 Global Step: 44420 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:42:59,697-Speed 2624.86 samples/sec Loss 14.4016 LearningRate 0.0896 Epoch: 1 Global Step: 44430 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:43:03,593-Speed 2628.87 samples/sec Loss 13.9794 LearningRate 0.0896 Epoch: 1 Global Step: 44440 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:43:07,488-Speed 2629.28 samples/sec Loss 14.1265 LearningRate 0.0896 Epoch: 1 Global Step: 44450 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:43:11,382-Speed 2630.19 samples/sec Loss 14.2311 LearningRate 0.0896 Epoch: 1 Global Step: 44460 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:43:15,281-Speed 2627.04 samples/sec Loss 14.2226 LearningRate 0.0896 Epoch: 1 Global Step: 44470 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:43:19,156-Speed 2642.68 samples/sec Loss 14.1379 LearningRate 0.0896 Epoch: 1 Global Step: 44480 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:43:23,064-Speed 2621.48 samples/sec Loss 14.2903 LearningRate 0.0896 Epoch: 1 Global Step: 44490 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:43:26,959-Speed 2628.98 samples/sec Loss 14.0069 LearningRate 0.0896 Epoch: 1 Global Step: 44500 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:43:30,857-Speed 2628.58 samples/sec Loss 14.1477 LearningRate 0.0896 Epoch: 1 Global Step: 44510 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:43:34,751-Speed 2629.96 samples/sec Loss 14.0001 LearningRate 0.0896 Epoch: 1 Global Step: 44520 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:43:38,646-Speed 2629.63 samples/sec Loss 14.1308 LearningRate 0.0896 Epoch: 1 Global Step: 44530 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:43:42,542-Speed 2628.46 samples/sec Loss 14.1853 LearningRate 0.0896 Epoch: 1 Global Step: 44540 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:43:46,442-Speed 2626.82 samples/sec Loss 14.1651 LearningRate 0.0895 Epoch: 1 Global Step: 44550 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:43:50,360-Speed 2613.66 samples/sec Loss 14.3643 LearningRate 0.0895 Epoch: 1 Global Step: 44560 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:43:54,259-Speed 2627.06 samples/sec Loss 14.2065 LearningRate 0.0895 Epoch: 1 Global Step: 44570 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:43:58,155-Speed 2628.62 samples/sec Loss 14.0347 LearningRate 0.0895 Epoch: 1 Global Step: 44580 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:44:02,054-Speed 2627.81 samples/sec Loss 14.2155 LearningRate 0.0895 Epoch: 1 Global Step: 44590 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:44:05,953-Speed 2626.55 samples/sec Loss 14.2216 LearningRate 0.0895 Epoch: 1 Global Step: 44600 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:44:09,855-Speed 2624.65 samples/sec Loss 14.0560 LearningRate 0.0895 Epoch: 1 Global Step: 44610 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:44:13,750-Speed 2629.61 samples/sec Loss 14.0690 LearningRate 0.0895 Epoch: 1 Global Step: 44620 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:44:17,651-Speed 2625.69 samples/sec Loss 14.1374 LearningRate 0.0895 Epoch: 1 Global Step: 44630 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:44:21,573-Speed 2612.00 samples/sec Loss 14.0100 LearningRate 0.0895 Epoch: 1 Global Step: 44640 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:44:25,503-Speed 2606.19 samples/sec Loss 14.0642 LearningRate 0.0895 Epoch: 1 Global Step: 44650 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:44:29,399-Speed 2628.84 samples/sec Loss 14.1200 LearningRate 0.0895 Epoch: 1 Global Step: 44660 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:44:33,314-Speed 2616.50 samples/sec Loss 14.1109 LearningRate 0.0895 Epoch: 1 Global Step: 44670 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:44:37,287-Speed 2577.71 samples/sec Loss 14.2006 LearningRate 0.0895 Epoch: 1 Global Step: 44680 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:44:41,172-Speed 2636.48 samples/sec Loss 14.1218 LearningRate 0.0895 Epoch: 1 Global Step: 44690 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:44:45,071-Speed 2626.93 samples/sec Loss 14.0673 LearningRate 0.0895 Epoch: 1 Global Step: 44700 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:44:48,966-Speed 2629.37 samples/sec Loss 14.1888 LearningRate 0.0895 Epoch: 1 Global Step: 44710 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:44:52,861-Speed 2630.03 samples/sec Loss 14.0840 LearningRate 0.0895 Epoch: 1 Global Step: 44720 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:44:56,773-Speed 2617.79 samples/sec Loss 14.1620 LearningRate 0.0895 Epoch: 1 Global Step: 44730 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:45:00,672-Speed 2627.66 samples/sec Loss 14.1174 LearningRate 0.0895 Epoch: 1 Global Step: 44740 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:45:04,592-Speed 2612.46 samples/sec Loss 14.1008 LearningRate 0.0895 Epoch: 1 Global Step: 44750 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:45:08,501-Speed 2620.16 samples/sec Loss 14.2388 LearningRate 0.0895 Epoch: 1 Global Step: 44760 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:45:12,400-Speed 2626.85 samples/sec Loss 14.1629 LearningRate 0.0895 Epoch: 1 Global Step: 44770 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:45:16,293-Speed 2631.22 samples/sec Loss 14.1898 LearningRate 0.0895 Epoch: 1 Global Step: 44780 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:45:20,185-Speed 2631.44 samples/sec Loss 13.9806 LearningRate 0.0895 Epoch: 1 Global Step: 44790 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:45:24,081-Speed 2629.28 samples/sec Loss 14.1676 LearningRate 0.0895 Epoch: 1 Global Step: 44800 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:45:27,983-Speed 2624.73 samples/sec Loss 14.2688 LearningRate 0.0895 Epoch: 1 Global Step: 44810 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:45:31,879-Speed 2629.71 samples/sec Loss 14.2666 LearningRate 0.0895 Epoch: 1 Global Step: 44820 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:45:35,779-Speed 2626.03 samples/sec Loss 14.0374 LearningRate 0.0895 Epoch: 1 Global Step: 44830 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:45:39,691-Speed 2617.69 samples/sec Loss 14.1138 LearningRate 0.0895 Epoch: 1 Global Step: 44840 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:45:43,594-Speed 2624.38 samples/sec Loss 14.1661 LearningRate 0.0895 Epoch: 1 Global Step: 44850 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:45:47,496-Speed 2625.19 samples/sec Loss 14.1152 LearningRate 0.0895 Epoch: 1 Global Step: 44860 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:45:51,398-Speed 2625.26 samples/sec Loss 14.0037 LearningRate 0.0895 Epoch: 1 Global Step: 44870 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:45:55,293-Speed 2629.79 samples/sec Loss 14.2573 LearningRate 0.0895 Epoch: 1 Global Step: 44880 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:45:59,191-Speed 2627.66 samples/sec Loss 14.0662 LearningRate 0.0895 Epoch: 1 Global Step: 44890 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:46:03,089-Speed 2626.94 samples/sec Loss 14.3061 LearningRate 0.0895 Epoch: 1 Global Step: 44900 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:46:06,987-Speed 2627.94 samples/sec Loss 14.0926 LearningRate 0.0895 Epoch: 1 Global Step: 44910 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:46:10,882-Speed 2629.07 samples/sec Loss 13.9796 LearningRate 0.0895 Epoch: 1 Global Step: 44920 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:46:14,964-Speed 2509.65 samples/sec Loss 14.1463 LearningRate 0.0895 Epoch: 1 Global Step: 44930 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:46:18,858-Speed 2629.54 samples/sec Loss 14.1568 LearningRate 0.0895 Epoch: 1 Global Step: 44940 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:46:22,766-Speed 2621.78 samples/sec Loss 14.2225 LearningRate 0.0895 Epoch: 1 Global Step: 44950 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:46:26,656-Speed 2633.10 samples/sec Loss 14.1137 LearningRate 0.0895 Epoch: 1 Global Step: 44960 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:46:30,548-Speed 2631.76 samples/sec Loss 14.0639 LearningRate 0.0895 Epoch: 1 Global Step: 44970 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:46:34,440-Speed 2631.36 samples/sec Loss 14.1617 LearningRate 0.0894 Epoch: 1 Global Step: 44980 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:46:38,334-Speed 2629.98 samples/sec Loss 14.1871 LearningRate 0.0894 Epoch: 1 Global Step: 44990 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:46:42,239-Speed 2622.83 samples/sec Loss 14.1104 LearningRate 0.0894 Epoch: 1 Global Step: 45000 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:46:46,145-Speed 2622.02 samples/sec Loss 14.2141 LearningRate 0.0894 Epoch: 1 Global Step: 45010 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:46:50,020-Speed 2643.29 samples/sec Loss 14.0346 LearningRate 0.0894 Epoch: 1 Global Step: 45020 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:46:53,932-Speed 2618.57 samples/sec Loss 14.1897 LearningRate 0.0894 Epoch: 1 Global Step: 45030 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:46:57,832-Speed 2626.64 samples/sec Loss 14.0096 LearningRate 0.0894 Epoch: 1 Global Step: 45040 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:47:01,726-Speed 2630.22 samples/sec Loss 14.0848 LearningRate 0.0894 Epoch: 1 Global Step: 45050 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:47:05,615-Speed 2633.17 samples/sec Loss 14.1921 LearningRate 0.0894 Epoch: 1 Global Step: 45060 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:47:09,511-Speed 2628.92 samples/sec Loss 14.1493 LearningRate 0.0894 Epoch: 1 Global Step: 45070 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:47:13,410-Speed 2627.26 samples/sec Loss 14.1778 LearningRate 0.0894 Epoch: 1 Global Step: 45080 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:47:17,312-Speed 2624.24 samples/sec Loss 14.0868 LearningRate 0.0894 Epoch: 1 Global Step: 45090 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:47:21,208-Speed 2629.33 samples/sec Loss 14.1701 LearningRate 0.0894 Epoch: 1 Global Step: 45100 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:47:25,110-Speed 2624.83 samples/sec Loss 13.9975 LearningRate 0.0894 Epoch: 1 Global Step: 45110 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:47:29,001-Speed 2632.57 samples/sec Loss 14.2355 LearningRate 0.0894 Epoch: 1 Global Step: 45120 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:47:32,906-Speed 2622.66 samples/sec Loss 14.0604 LearningRate 0.0894 Epoch: 1 Global Step: 45130 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:47:36,812-Speed 2622.33 samples/sec Loss 14.0266 LearningRate 0.0894 Epoch: 1 Global Step: 45140 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:47:40,705-Speed 2630.80 samples/sec Loss 14.1487 LearningRate 0.0894 Epoch: 1 Global Step: 45150 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:47:44,613-Speed 2620.73 samples/sec Loss 14.0613 LearningRate 0.0894 Epoch: 1 Global Step: 45160 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:47:48,468-Speed 2656.54 samples/sec Loss 14.1369 LearningRate 0.0894 Epoch: 1 Global Step: 45170 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 00:47:52,366-Speed 2627.85 samples/sec Loss 14.1352 LearningRate 0.0894 Epoch: 1 Global Step: 45180 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 00:47:56,257-Speed 2632.38 samples/sec Loss 14.1681 LearningRate 0.0894 Epoch: 1 Global Step: 45190 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 00:48:00,159-Speed 2625.14 samples/sec Loss 14.0315 LearningRate 0.0894 Epoch: 1 Global Step: 45200 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 00:48:04,059-Speed 2626.32 samples/sec Loss 14.1681 LearningRate 0.0894 Epoch: 1 Global Step: 45210 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 00:48:07,964-Speed 2622.60 samples/sec Loss 14.1420 LearningRate 0.0894 Epoch: 1 Global Step: 45220 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 00:48:11,875-Speed 2618.85 samples/sec Loss 14.1291 LearningRate 0.0894 Epoch: 1 Global Step: 45230 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 00:48:15,767-Speed 2631.97 samples/sec Loss 14.2556 LearningRate 0.0894 Epoch: 1 Global Step: 45240 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 00:48:19,659-Speed 2631.24 samples/sec Loss 14.1658 LearningRate 0.0894 Epoch: 1 Global Step: 45250 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 00:48:23,558-Speed 2626.70 samples/sec Loss 14.2150 LearningRate 0.0894 Epoch: 1 Global Step: 45260 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 00:48:27,459-Speed 2626.04 samples/sec Loss 14.1816 LearningRate 0.0894 Epoch: 1 Global Step: 45270 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:48:31,367-Speed 2620.61 samples/sec Loss 14.1012 LearningRate 0.0894 Epoch: 1 Global Step: 45280 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:48:35,270-Speed 2624.21 samples/sec Loss 14.1768 LearningRate 0.0894 Epoch: 1 Global Step: 45290 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:48:39,165-Speed 2629.58 samples/sec Loss 14.1414 LearningRate 0.0894 Epoch: 1 Global Step: 45300 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:48:43,061-Speed 2628.97 samples/sec Loss 14.1421 LearningRate 0.0894 Epoch: 1 Global Step: 45310 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:48:46,960-Speed 2627.08 samples/sec Loss 14.0852 LearningRate 0.0894 Epoch: 1 Global Step: 45320 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:48:50,856-Speed 2628.89 samples/sec Loss 14.0499 LearningRate 0.0894 Epoch: 1 Global Step: 45330 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:48:54,759-Speed 2624.40 samples/sec Loss 14.0381 LearningRate 0.0894 Epoch: 1 Global Step: 45340 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:48:58,669-Speed 2619.34 samples/sec Loss 14.0576 LearningRate 0.0894 Epoch: 1 Global Step: 45350 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:49:02,568-Speed 2626.77 samples/sec Loss 13.9480 LearningRate 0.0894 Epoch: 1 Global Step: 45360 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:49:06,463-Speed 2629.43 samples/sec Loss 14.0403 LearningRate 0.0894 Epoch: 1 Global Step: 45370 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:49:10,357-Speed 2630.63 samples/sec Loss 14.0540 LearningRate 0.0894 Epoch: 1 Global Step: 45380 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:49:14,263-Speed 2621.85 samples/sec Loss 14.0864 LearningRate 0.0894 Epoch: 1 Global Step: 45390 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:49:18,160-Speed 2629.09 samples/sec Loss 14.1846 LearningRate 0.0894 Epoch: 1 Global Step: 45400 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:49:22,054-Speed 2629.76 samples/sec Loss 14.1092 LearningRate 0.0894 Epoch: 1 Global Step: 45410 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:49:25,951-Speed 2628.64 samples/sec Loss 14.1147 LearningRate 0.0893 Epoch: 1 Global Step: 45420 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:49:29,846-Speed 2629.29 samples/sec Loss 14.0603 LearningRate 0.0893 Epoch: 1 Global Step: 45430 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:49:33,744-Speed 2627.72 samples/sec Loss 14.1027 LearningRate 0.0893 Epoch: 1 Global Step: 45440 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:49:37,643-Speed 2627.06 samples/sec Loss 14.0583 LearningRate 0.0893 Epoch: 1 Global Step: 45450 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:49:41,540-Speed 2628.10 samples/sec Loss 14.1352 LearningRate 0.0893 Epoch: 1 Global Step: 45460 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:49:45,435-Speed 2629.56 samples/sec Loss 14.0965 LearningRate 0.0893 Epoch: 1 Global Step: 45470 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:49:49,351-Speed 2615.70 samples/sec Loss 13.9943 LearningRate 0.0893 Epoch: 1 Global Step: 45480 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:49:53,251-Speed 2626.40 samples/sec Loss 13.8107 LearningRate 0.0893 Epoch: 1 Global Step: 45490 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:49:57,152-Speed 2625.08 samples/sec Loss 14.0745 LearningRate 0.0893 Epoch: 1 Global Step: 45500 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:50:01,050-Speed 2628.17 samples/sec Loss 14.1416 LearningRate 0.0893 Epoch: 1 Global Step: 45510 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:50:04,950-Speed 2625.60 samples/sec Loss 14.1367 LearningRate 0.0893 Epoch: 1 Global Step: 45520 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:50:08,848-Speed 2627.41 samples/sec Loss 14.1543 LearningRate 0.0893 Epoch: 1 Global Step: 45530 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:50:12,754-Speed 2622.44 samples/sec Loss 14.0332 LearningRate 0.0893 Epoch: 1 Global Step: 45540 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:50:16,651-Speed 2628.54 samples/sec Loss 13.9214 LearningRate 0.0893 Epoch: 1 Global Step: 45550 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:50:20,554-Speed 2623.91 samples/sec Loss 14.2636 LearningRate 0.0893 Epoch: 1 Global Step: 45560 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:50:24,440-Speed 2636.15 samples/sec Loss 14.0578 LearningRate 0.0893 Epoch: 1 Global Step: 45570 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:50:28,350-Speed 2619.62 samples/sec Loss 14.2429 LearningRate 0.0893 Epoch: 1 Global Step: 45580 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:50:32,248-Speed 2627.72 samples/sec Loss 14.1496 LearningRate 0.0893 Epoch: 1 Global Step: 45590 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:50:36,155-Speed 2621.11 samples/sec Loss 14.0227 LearningRate 0.0893 Epoch: 1 Global Step: 45600 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:50:40,038-Speed 2637.87 samples/sec Loss 14.1263 LearningRate 0.0893 Epoch: 1 Global Step: 45610 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:50:43,936-Speed 2627.32 samples/sec Loss 13.9710 LearningRate 0.0893 Epoch: 1 Global Step: 45620 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:50:47,838-Speed 2625.25 samples/sec Loss 14.1025 LearningRate 0.0893 Epoch: 1 Global Step: 45630 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:50:51,712-Speed 2643.66 samples/sec Loss 14.0409 LearningRate 0.0893 Epoch: 1 Global Step: 45640 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:50:55,606-Speed 2630.59 samples/sec Loss 14.0986 LearningRate 0.0893 Epoch: 1 Global Step: 45650 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:50:59,509-Speed 2624.35 samples/sec Loss 14.0470 LearningRate 0.0893 Epoch: 1 Global Step: 45660 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:51:03,401-Speed 2631.58 samples/sec Loss 14.1240 LearningRate 0.0893 Epoch: 1 Global Step: 45670 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:51:07,294-Speed 2630.80 samples/sec Loss 14.1124 LearningRate 0.0893 Epoch: 1 Global Step: 45680 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:51:11,185-Speed 2632.45 samples/sec Loss 14.1045 LearningRate 0.0893 Epoch: 1 Global Step: 45690 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:51:15,079-Speed 2630.20 samples/sec Loss 14.0644 LearningRate 0.0893 Epoch: 1 Global Step: 45700 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:51:18,978-Speed 2627.01 samples/sec Loss 14.0734 LearningRate 0.0893 Epoch: 1 Global Step: 45710 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:51:22,873-Speed 2629.24 samples/sec Loss 14.1221 LearningRate 0.0893 Epoch: 1 Global Step: 45720 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:51:26,777-Speed 2623.68 samples/sec Loss 14.1241 LearningRate 0.0893 Epoch: 1 Global Step: 45730 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 00:51:30,672-Speed 2630.15 samples/sec Loss 14.0398 LearningRate 0.0893 Epoch: 1 Global Step: 45740 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:51:34,567-Speed 2629.47 samples/sec Loss 14.2249 LearningRate 0.0893 Epoch: 1 Global Step: 45750 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:51:38,487-Speed 2612.71 samples/sec Loss 14.0696 LearningRate 0.0893 Epoch: 1 Global Step: 45760 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:51:42,389-Speed 2624.52 samples/sec Loss 14.1912 LearningRate 0.0893 Epoch: 1 Global Step: 45770 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:51:46,289-Speed 2626.49 samples/sec Loss 14.2325 LearningRate 0.0893 Epoch: 1 Global Step: 45780 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:51:50,208-Speed 2613.71 samples/sec Loss 14.0920 LearningRate 0.0893 Epoch: 1 Global Step: 45790 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:51:54,108-Speed 2625.76 samples/sec Loss 14.0176 LearningRate 0.0893 Epoch: 1 Global Step: 45800 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:51:58,004-Speed 2629.59 samples/sec Loss 14.2615 LearningRate 0.0893 Epoch: 1 Global Step: 45810 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:52:01,896-Speed 2631.85 samples/sec Loss 13.8963 LearningRate 0.0893 Epoch: 1 Global Step: 45820 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:52:05,794-Speed 2627.23 samples/sec Loss 14.1163 LearningRate 0.0893 Epoch: 1 Global Step: 45830 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:52:09,680-Speed 2635.66 samples/sec Loss 14.1390 LearningRate 0.0893 Epoch: 1 Global Step: 45840 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:52:13,583-Speed 2623.83 samples/sec Loss 14.0631 LearningRate 0.0893 Epoch: 1 Global Step: 45850 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:52:17,491-Speed 2621.55 samples/sec Loss 14.1759 LearningRate 0.0892 Epoch: 1 Global Step: 45860 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:52:21,396-Speed 2622.64 samples/sec Loss 14.2022 LearningRate 0.0892 Epoch: 1 Global Step: 45870 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:52:25,315-Speed 2612.92 samples/sec Loss 13.9860 LearningRate 0.0892 Epoch: 1 Global Step: 45880 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:52:29,216-Speed 2626.45 samples/sec Loss 14.0944 LearningRate 0.0892 Epoch: 1 Global Step: 45890 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:52:33,111-Speed 2629.30 samples/sec Loss 14.1284 LearningRate 0.0892 Epoch: 1 Global Step: 45900 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:52:37,005-Speed 2630.86 samples/sec Loss 13.9665 LearningRate 0.0892 Epoch: 1 Global Step: 45910 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:52:40,896-Speed 2632.62 samples/sec Loss 14.0922 LearningRate 0.0892 Epoch: 1 Global Step: 45920 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:52:44,788-Speed 2631.43 samples/sec Loss 14.1191 LearningRate 0.0892 Epoch: 1 Global Step: 45930 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:52:48,688-Speed 2626.16 samples/sec Loss 14.0047 LearningRate 0.0892 Epoch: 1 Global Step: 45940 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:52:52,561-Speed 2645.10 samples/sec Loss 14.1549 LearningRate 0.0892 Epoch: 1 Global Step: 45950 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:52:56,457-Speed 2628.53 samples/sec Loss 14.1149 LearningRate 0.0892 Epoch: 1 Global Step: 45960 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:53:00,358-Speed 2626.05 samples/sec Loss 14.1204 LearningRate 0.0892 Epoch: 1 Global Step: 45970 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:53:04,253-Speed 2629.83 samples/sec Loss 13.9719 LearningRate 0.0892 Epoch: 1 Global Step: 45980 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:53:08,171-Speed 2614.83 samples/sec Loss 14.0672 LearningRate 0.0892 Epoch: 1 Global Step: 45990 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:53:12,077-Speed 2622.32 samples/sec Loss 14.0166 LearningRate 0.0892 Epoch: 1 Global Step: 46000 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:53:15,967-Speed 2632.56 samples/sec Loss 14.0662 LearningRate 0.0892 Epoch: 1 Global Step: 46010 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:53:19,861-Speed 2630.49 samples/sec Loss 14.0586 LearningRate 0.0892 Epoch: 1 Global Step: 46020 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:53:23,755-Speed 2630.39 samples/sec Loss 13.9991 LearningRate 0.0892 Epoch: 1 Global Step: 46030 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:53:27,650-Speed 2630.04 samples/sec Loss 14.1188 LearningRate 0.0892 Epoch: 1 Global Step: 46040 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:53:31,541-Speed 2632.14 samples/sec Loss 14.0903 LearningRate 0.0892 Epoch: 1 Global Step: 46050 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:53:35,436-Speed 2630.44 samples/sec Loss 14.1598 LearningRate 0.0892 Epoch: 1 Global Step: 46060 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:53:39,344-Speed 2620.97 samples/sec Loss 13.9657 LearningRate 0.0892 Epoch: 1 Global Step: 46070 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:53:43,236-Speed 2630.94 samples/sec Loss 14.1872 LearningRate 0.0892 Epoch: 1 Global Step: 46080 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:53:47,136-Speed 2626.07 samples/sec Loss 13.9540 LearningRate 0.0892 Epoch: 1 Global Step: 46090 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:53:51,032-Speed 2629.81 samples/sec Loss 14.1776 LearningRate 0.0892 Epoch: 1 Global Step: 46100 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:53:54,930-Speed 2627.70 samples/sec Loss 14.0270 LearningRate 0.0892 Epoch: 1 Global Step: 46110 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:53:58,822-Speed 2631.44 samples/sec Loss 14.0939 LearningRate 0.0892 Epoch: 1 Global Step: 46120 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:54:02,769-Speed 2595.19 samples/sec Loss 14.2652 LearningRate 0.0892 Epoch: 1 Global Step: 46130 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:54:06,684-Speed 2616.36 samples/sec Loss 14.0976 LearningRate 0.0892 Epoch: 1 Global Step: 46140 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:54:10,579-Speed 2629.42 samples/sec Loss 14.0329 LearningRate 0.0892 Epoch: 1 Global Step: 46150 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:54:14,470-Speed 2632.24 samples/sec Loss 14.0925 LearningRate 0.0892 Epoch: 1 Global Step: 46160 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:54:18,366-Speed 2629.12 samples/sec Loss 14.2096 LearningRate 0.0892 Epoch: 1 Global Step: 46170 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:54:22,264-Speed 2627.36 samples/sec Loss 14.1454 LearningRate 0.0892 Epoch: 1 Global Step: 46180 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:54:26,162-Speed 2627.35 samples/sec Loss 14.1105 LearningRate 0.0892 Epoch: 1 Global Step: 46190 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:54:30,058-Speed 2629.19 samples/sec Loss 14.2009 LearningRate 0.0892 Epoch: 1 Global Step: 46200 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:54:33,976-Speed 2614.67 samples/sec Loss 13.9652 LearningRate 0.0892 Epoch: 1 Global Step: 46210 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:54:37,881-Speed 2623.33 samples/sec Loss 13.8590 LearningRate 0.0892 Epoch: 1 Global Step: 46220 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:54:41,779-Speed 2627.02 samples/sec Loss 14.1109 LearningRate 0.0892 Epoch: 1 Global Step: 46230 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:54:45,676-Speed 2629.19 samples/sec Loss 14.0444 LearningRate 0.0892 Epoch: 1 Global Step: 46240 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:54:49,587-Speed 2618.60 samples/sec Loss 13.9875 LearningRate 0.0892 Epoch: 1 Global Step: 46250 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:54:53,465-Speed 2641.30 samples/sec Loss 14.0055 LearningRate 0.0892 Epoch: 1 Global Step: 46260 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:54:57,370-Speed 2622.80 samples/sec Loss 13.9348 LearningRate 0.0892 Epoch: 1 Global Step: 46270 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:55:01,265-Speed 2629.79 samples/sec Loss 14.1896 LearningRate 0.0892 Epoch: 1 Global Step: 46280 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:55:05,163-Speed 2627.75 samples/sec Loss 14.0572 LearningRate 0.0892 Epoch: 1 Global Step: 46290 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:55:09,061-Speed 2628.29 samples/sec Loss 14.0886 LearningRate 0.0891 Epoch: 1 Global Step: 46300 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:55:12,961-Speed 2626.36 samples/sec Loss 14.0615 LearningRate 0.0891 Epoch: 1 Global Step: 46310 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:55:16,868-Speed 2621.11 samples/sec Loss 14.1074 LearningRate 0.0891 Epoch: 1 Global Step: 46320 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:55:20,777-Speed 2620.98 samples/sec Loss 14.2077 LearningRate 0.0891 Epoch: 1 Global Step: 46330 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:55:24,676-Speed 2627.09 samples/sec Loss 13.9765 LearningRate 0.0891 Epoch: 1 Global Step: 46340 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:55:28,568-Speed 2631.74 samples/sec Loss 13.9056 LearningRate 0.0891 Epoch: 1 Global Step: 46350 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:55:32,466-Speed 2627.38 samples/sec Loss 14.1456 LearningRate 0.0891 Epoch: 1 Global Step: 46360 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:55:36,363-Speed 2628.41 samples/sec Loss 13.9660 LearningRate 0.0891 Epoch: 1 Global Step: 46370 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:55:40,262-Speed 2627.11 samples/sec Loss 14.0246 LearningRate 0.0891 Epoch: 1 Global Step: 46380 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:55:44,168-Speed 2621.89 samples/sec Loss 14.1257 LearningRate 0.0891 Epoch: 1 Global Step: 46390 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:55:48,078-Speed 2619.65 samples/sec Loss 14.0475 LearningRate 0.0891 Epoch: 1 Global Step: 46400 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:55:51,959-Speed 2639.00 samples/sec Loss 13.9778 LearningRate 0.0891 Epoch: 1 Global Step: 46410 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:55:55,853-Speed 2631.11 samples/sec Loss 13.9922 LearningRate 0.0891 Epoch: 1 Global Step: 46420 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:55:59,748-Speed 2629.77 samples/sec Loss 14.0584 LearningRate 0.0891 Epoch: 1 Global Step: 46430 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:56:03,645-Speed 2627.75 samples/sec Loss 14.0800 LearningRate 0.0891 Epoch: 1 Global Step: 46440 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:56:07,541-Speed 2628.76 samples/sec Loss 14.1045 LearningRate 0.0891 Epoch: 1 Global Step: 46450 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:56:11,449-Speed 2621.04 samples/sec Loss 13.9987 LearningRate 0.0891 Epoch: 1 Global Step: 46460 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:56:15,346-Speed 2628.03 samples/sec Loss 13.9602 LearningRate 0.0891 Epoch: 1 Global Step: 46470 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:56:19,243-Speed 2628.75 samples/sec Loss 14.1135 LearningRate 0.0891 Epoch: 1 Global Step: 46480 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:56:23,146-Speed 2624.45 samples/sec Loss 14.0351 LearningRate 0.0891 Epoch: 1 Global Step: 46490 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:56:27,038-Speed 2631.65 samples/sec Loss 14.0830 LearningRate 0.0891 Epoch: 1 Global Step: 46500 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:56:30,908-Speed 2646.89 samples/sec Loss 13.9116 LearningRate 0.0891 Epoch: 1 Global Step: 46510 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:56:34,801-Speed 2630.31 samples/sec Loss 14.1873 LearningRate 0.0891 Epoch: 1 Global Step: 46520 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:56:38,698-Speed 2628.55 samples/sec Loss 13.9257 LearningRate 0.0891 Epoch: 1 Global Step: 46530 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:56:42,605-Speed 2621.53 samples/sec Loss 14.0440 LearningRate 0.0891 Epoch: 1 Global Step: 46540 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:56:46,511-Speed 2622.04 samples/sec Loss 14.0303 LearningRate 0.0891 Epoch: 1 Global Step: 46550 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:56:50,413-Speed 2625.36 samples/sec Loss 13.9334 LearningRate 0.0891 Epoch: 1 Global Step: 46560 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:56:54,317-Speed 2624.02 samples/sec Loss 14.1532 LearningRate 0.0891 Epoch: 1 Global Step: 46570 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:56:58,220-Speed 2624.30 samples/sec Loss 13.9848 LearningRate 0.0891 Epoch: 1 Global Step: 46580 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:57:02,125-Speed 2623.25 samples/sec Loss 13.9481 LearningRate 0.0891 Epoch: 1 Global Step: 46590 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:57:06,033-Speed 2620.43 samples/sec Loss 13.8698 LearningRate 0.0891 Epoch: 1 Global Step: 46600 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:57:09,933-Speed 2626.11 samples/sec Loss 13.9386 LearningRate 0.0891 Epoch: 1 Global Step: 46610 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:57:13,804-Speed 2646.36 samples/sec Loss 14.1742 LearningRate 0.0891 Epoch: 1 Global Step: 46620 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:57:17,709-Speed 2622.87 samples/sec Loss 14.0324 LearningRate 0.0891 Epoch: 1 Global Step: 46630 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:57:21,596-Speed 2634.86 samples/sec Loss 14.1631 LearningRate 0.0891 Epoch: 1 Global Step: 46640 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:57:25,497-Speed 2626.05 samples/sec Loss 13.9820 LearningRate 0.0891 Epoch: 1 Global Step: 46650 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:57:29,393-Speed 2628.76 samples/sec Loss 14.0774 LearningRate 0.0891 Epoch: 1 Global Step: 46660 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:57:33,285-Speed 2631.50 samples/sec Loss 14.0087 LearningRate 0.0891 Epoch: 1 Global Step: 46670 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:57:37,181-Speed 2628.72 samples/sec Loss 14.1115 LearningRate 0.0891 Epoch: 1 Global Step: 46680 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:57:41,080-Speed 2627.12 samples/sec Loss 14.1064 LearningRate 0.0891 Epoch: 1 Global Step: 46690 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:57:45,001-Speed 2611.98 samples/sec Loss 13.9232 LearningRate 0.0891 Epoch: 1 Global Step: 46700 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:57:48,929-Speed 2608.21 samples/sec Loss 14.1075 LearningRate 0.0891 Epoch: 1 Global Step: 46710 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:57:52,823-Speed 2630.48 samples/sec Loss 14.0363 LearningRate 0.0891 Epoch: 1 Global Step: 46720 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:57:56,717-Speed 2630.30 samples/sec Loss 14.0397 LearningRate 0.0891 Epoch: 1 Global Step: 46730 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:58:00,613-Speed 2628.93 samples/sec Loss 13.9274 LearningRate 0.0890 Epoch: 1 Global Step: 46740 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:58:04,511-Speed 2627.31 samples/sec Loss 14.0236 LearningRate 0.0890 Epoch: 1 Global Step: 46750 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:58:08,410-Speed 2627.28 samples/sec Loss 14.0729 LearningRate 0.0890 Epoch: 1 Global Step: 46760 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:58:12,331-Speed 2612.44 samples/sec Loss 14.1084 LearningRate 0.0890 Epoch: 1 Global Step: 46770 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:58:16,226-Speed 2629.67 samples/sec Loss 14.1688 LearningRate 0.0890 Epoch: 1 Global Step: 46780 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:58:20,141-Speed 2615.98 samples/sec Loss 14.0929 LearningRate 0.0890 Epoch: 1 Global Step: 46790 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:58:24,038-Speed 2628.57 samples/sec Loss 14.0357 LearningRate 0.0890 Epoch: 1 Global Step: 46800 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:58:27,945-Speed 2622.38 samples/sec Loss 14.1040 LearningRate 0.0890 Epoch: 1 Global Step: 46810 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:58:31,841-Speed 2628.42 samples/sec Loss 14.1659 LearningRate 0.0890 Epoch: 1 Global Step: 46820 Fp16 Grad Scale: 524288 Required: 88 hours
Training: 2022-04-13 00:58:35,727-Speed 2636.45 samples/sec Loss 14.0848 LearningRate 0.0890 Epoch: 1 Global Step: 46830 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:58:39,628-Speed 2625.43 samples/sec Loss 14.0703 LearningRate 0.0890 Epoch: 1 Global Step: 46840 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:58:43,538-Speed 2619.96 samples/sec Loss 14.0725 LearningRate 0.0890 Epoch: 1 Global Step: 46850 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:58:47,446-Speed 2620.61 samples/sec Loss 14.0433 LearningRate 0.0890 Epoch: 1 Global Step: 46860 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:58:51,434-Speed 2568.13 samples/sec Loss 14.0467 LearningRate 0.0890 Epoch: 1 Global Step: 46870 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:58:55,332-Speed 2627.95 samples/sec Loss 14.0425 LearningRate 0.0890 Epoch: 1 Global Step: 46880 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:58:59,228-Speed 2629.53 samples/sec Loss 14.0595 LearningRate 0.0890 Epoch: 1 Global Step: 46890 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:59:03,110-Speed 2638.62 samples/sec Loss 13.9843 LearningRate 0.0890 Epoch: 1 Global Step: 46900 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:59:07,006-Speed 2628.72 samples/sec Loss 13.9413 LearningRate 0.0890 Epoch: 1 Global Step: 46910 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:59:10,898-Speed 2631.40 samples/sec Loss 14.0834 LearningRate 0.0890 Epoch: 1 Global Step: 46920 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:59:14,800-Speed 2625.40 samples/sec Loss 14.0411 LearningRate 0.0890 Epoch: 1 Global Step: 46930 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:59:18,704-Speed 2623.55 samples/sec Loss 14.0996 LearningRate 0.0890 Epoch: 1 Global Step: 46940 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:59:22,611-Speed 2621.35 samples/sec Loss 14.0704 LearningRate 0.0890 Epoch: 1 Global Step: 46950 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:59:26,509-Speed 2627.98 samples/sec Loss 13.9780 LearningRate 0.0890 Epoch: 1 Global Step: 46960 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:59:30,408-Speed 2627.18 samples/sec Loss 14.0297 LearningRate 0.0890 Epoch: 1 Global Step: 46970 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:59:34,308-Speed 2626.56 samples/sec Loss 14.0058 LearningRate 0.0890 Epoch: 1 Global Step: 46980 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:59:38,203-Speed 2629.19 samples/sec Loss 13.9394 LearningRate 0.0890 Epoch: 1 Global Step: 46990 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 00:59:42,114-Speed 2618.96 samples/sec Loss 14.0169 LearningRate 0.0890 Epoch: 1 Global Step: 47000 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:59:46,019-Speed 2622.92 samples/sec Loss 13.9880 LearningRate 0.0890 Epoch: 1 Global Step: 47010 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:59:49,917-Speed 2628.01 samples/sec Loss 14.1415 LearningRate 0.0890 Epoch: 1 Global Step: 47020 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:59:53,821-Speed 2623.23 samples/sec Loss 14.0180 LearningRate 0.0890 Epoch: 1 Global Step: 47030 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 00:59:57,720-Speed 2627.12 samples/sec Loss 14.0589 LearningRate 0.0890 Epoch: 1 Global Step: 47040 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:00:01,620-Speed 2626.49 samples/sec Loss 13.9915 LearningRate 0.0890 Epoch: 1 Global Step: 47050 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:00:05,519-Speed 2627.08 samples/sec Loss 14.0115 LearningRate 0.0890 Epoch: 1 Global Step: 47060 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:00:09,422-Speed 2624.46 samples/sec Loss 13.9414 LearningRate 0.0890 Epoch: 1 Global Step: 47070 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:00:13,334-Speed 2617.93 samples/sec Loss 14.0575 LearningRate 0.0890 Epoch: 1 Global Step: 47080 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:00:17,229-Speed 2629.64 samples/sec Loss 14.0745 LearningRate 0.0890 Epoch: 1 Global Step: 47090 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:00:21,133-Speed 2623.68 samples/sec Loss 14.1058 LearningRate 0.0890 Epoch: 1 Global Step: 47100 Fp16 Grad Scale: 524288 Required: 88 hours
Training: 2022-04-13 01:00:25,012-Speed 2639.69 samples/sec Loss 14.1804 LearningRate 0.0890 Epoch: 1 Global Step: 47110 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:00:28,909-Speed 2629.03 samples/sec Loss 13.9613 LearningRate 0.0890 Epoch: 1 Global Step: 47120 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:00:32,803-Speed 2630.37 samples/sec Loss 14.1554 LearningRate 0.0890 Epoch: 1 Global Step: 47130 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:00:36,719-Speed 2615.49 samples/sec Loss 14.0511 LearningRate 0.0890 Epoch: 1 Global Step: 47140 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:00:40,631-Speed 2618.12 samples/sec Loss 14.0705 LearningRate 0.0890 Epoch: 1 Global Step: 47150 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:00:44,526-Speed 2630.05 samples/sec Loss 14.0123 LearningRate 0.0890 Epoch: 1 Global Step: 47160 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:00:48,427-Speed 2625.40 samples/sec Loss 14.1057 LearningRate 0.0890 Epoch: 1 Global Step: 47170 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:00:52,323-Speed 2629.67 samples/sec Loss 14.1853 LearningRate 0.0889 Epoch: 1 Global Step: 47180 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:00:56,267-Speed 2596.30 samples/sec Loss 14.1176 LearningRate 0.0889 Epoch: 1 Global Step: 47190 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:01:00,348-Speed 2510.43 samples/sec Loss 14.0983 LearningRate 0.0889 Epoch: 1 Global Step: 47200 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:01:04,494-Speed 2470.07 samples/sec Loss 13.9371 LearningRate 0.0889 Epoch: 1 Global Step: 47210 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:01:08,444-Speed 2593.20 samples/sec Loss 13.9930 LearningRate 0.0889 Epoch: 1 Global Step: 47220 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:01:12,335-Speed 2632.14 samples/sec Loss 13.9025 LearningRate 0.0889 Epoch: 1 Global Step: 47230 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:01:16,232-Speed 2628.33 samples/sec Loss 13.9310 LearningRate 0.0889 Epoch: 1 Global Step: 47240 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:01:20,147-Speed 2616.29 samples/sec Loss 13.8693 LearningRate 0.0889 Epoch: 1 Global Step: 47250 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:01:24,047-Speed 2625.92 samples/sec Loss 13.9085 LearningRate 0.0889 Epoch: 1 Global Step: 47260 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:01:27,944-Speed 2628.55 samples/sec Loss 13.9549 LearningRate 0.0889 Epoch: 1 Global Step: 47270 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:01:31,841-Speed 2628.46 samples/sec Loss 13.8592 LearningRate 0.0889 Epoch: 1 Global Step: 47280 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:01:35,737-Speed 2628.92 samples/sec Loss 14.1855 LearningRate 0.0889 Epoch: 1 Global Step: 47290 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:01:39,632-Speed 2629.39 samples/sec Loss 14.0821 LearningRate 0.0889 Epoch: 1 Global Step: 47300 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:01:43,528-Speed 2628.95 samples/sec Loss 13.8551 LearningRate 0.0889 Epoch: 1 Global Step: 47310 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:01:47,422-Speed 2630.14 samples/sec Loss 14.0245 LearningRate 0.0889 Epoch: 1 Global Step: 47320 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:01:51,316-Speed 2630.41 samples/sec Loss 13.9601 LearningRate 0.0889 Epoch: 1 Global Step: 47330 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:01:55,213-Speed 2628.65 samples/sec Loss 13.9619 LearningRate 0.0889 Epoch: 1 Global Step: 47340 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:01:59,116-Speed 2624.69 samples/sec Loss 13.8893 LearningRate 0.0889 Epoch: 1 Global Step: 47350 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:02:03,011-Speed 2629.51 samples/sec Loss 13.9776 LearningRate 0.0889 Epoch: 1 Global Step: 47360 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:02:06,910-Speed 2626.93 samples/sec Loss 14.2012 LearningRate 0.0889 Epoch: 1 Global Step: 47370 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:02:10,803-Speed 2630.69 samples/sec Loss 13.9836 LearningRate 0.0889 Epoch: 1 Global Step: 47380 Fp16 Grad Scale: 524288 Required: 88 hours
Training: 2022-04-13 01:02:14,681-Speed 2641.64 samples/sec Loss 13.8946 LearningRate 0.0889 Epoch: 1 Global Step: 47390 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:02:18,576-Speed 2629.85 samples/sec Loss 14.1580 LearningRate 0.0889 Epoch: 1 Global Step: 47400 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:02:22,470-Speed 2630.07 samples/sec Loss 13.8634 LearningRate 0.0889 Epoch: 1 Global Step: 47410 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:02:26,364-Speed 2630.45 samples/sec Loss 14.0722 LearningRate 0.0889 Epoch: 1 Global Step: 47420 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:02:30,258-Speed 2630.21 samples/sec Loss 14.1432 LearningRate 0.0889 Epoch: 1 Global Step: 47430 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:02:34,153-Speed 2630.01 samples/sec Loss 14.1043 LearningRate 0.0889 Epoch: 1 Global Step: 47440 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:02:38,070-Speed 2614.78 samples/sec Loss 14.0073 LearningRate 0.0889 Epoch: 1 Global Step: 47450 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:02:42,021-Speed 2592.17 samples/sec Loss 14.0114 LearningRate 0.0889 Epoch: 1 Global Step: 47460 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:02:45,917-Speed 2629.15 samples/sec Loss 14.0830 LearningRate 0.0889 Epoch: 1 Global Step: 47470 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:02:49,811-Speed 2631.02 samples/sec Loss 14.0494 LearningRate 0.0889 Epoch: 1 Global Step: 47480 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:02:53,689-Speed 2640.58 samples/sec Loss 14.1061 LearningRate 0.0889 Epoch: 1 Global Step: 47490 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:02:57,598-Speed 2620.51 samples/sec Loss 13.9315 LearningRate 0.0889 Epoch: 1 Global Step: 47500 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:03:01,490-Speed 2632.04 samples/sec Loss 14.1343 LearningRate 0.0889 Epoch: 1 Global Step: 47510 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:03:05,394-Speed 2623.74 samples/sec Loss 14.0984 LearningRate 0.0889 Epoch: 1 Global Step: 47520 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:03:09,301-Speed 2621.08 samples/sec Loss 14.0531 LearningRate 0.0889 Epoch: 1 Global Step: 47530 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:03:13,198-Speed 2628.17 samples/sec Loss 13.9783 LearningRate 0.0889 Epoch: 1 Global Step: 47540 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:03:17,101-Speed 2624.15 samples/sec Loss 13.9908 LearningRate 0.0889 Epoch: 1 Global Step: 47550 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:03:20,994-Speed 2631.71 samples/sec Loss 13.9802 LearningRate 0.0889 Epoch: 1 Global Step: 47560 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:03:24,889-Speed 2629.82 samples/sec Loss 13.9813 LearningRate 0.0889 Epoch: 1 Global Step: 47570 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:03:28,767-Speed 2641.29 samples/sec Loss 13.9460 LearningRate 0.0889 Epoch: 1 Global Step: 47580 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:03:32,662-Speed 2629.36 samples/sec Loss 14.0129 LearningRate 0.0889 Epoch: 1 Global Step: 47590 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:03:36,557-Speed 2629.26 samples/sec Loss 13.8518 LearningRate 0.0889 Epoch: 1 Global Step: 47600 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:03:40,466-Speed 2620.35 samples/sec Loss 14.1152 LearningRate 0.0889 Epoch: 1 Global Step: 47610 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:03:44,367-Speed 2625.87 samples/sec Loss 14.0001 LearningRate 0.0888 Epoch: 1 Global Step: 47620 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:03:48,266-Speed 2627.36 samples/sec Loss 13.9550 LearningRate 0.0888 Epoch: 1 Global Step: 47630 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:03:52,164-Speed 2627.11 samples/sec Loss 14.0831 LearningRate 0.0888 Epoch: 1 Global Step: 47640 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:03:56,065-Speed 2626.22 samples/sec Loss 14.0234 LearningRate 0.0888 Epoch: 1 Global Step: 47650 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:03:59,962-Speed 2628.04 samples/sec Loss 14.1259 LearningRate 0.0888 Epoch: 1 Global Step: 47660 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:04:03,860-Speed 2627.52 samples/sec Loss 13.9169 LearningRate 0.0888 Epoch: 1 Global Step: 47670 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:04:07,767-Speed 2621.81 samples/sec Loss 14.0504 LearningRate 0.0888 Epoch: 1 Global Step: 47680 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:04:11,665-Speed 2627.72 samples/sec Loss 14.0988 LearningRate 0.0888 Epoch: 1 Global Step: 47690 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:04:15,556-Speed 2632.29 samples/sec Loss 13.9813 LearningRate 0.0888 Epoch: 1 Global Step: 47700 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:04:19,450-Speed 2630.81 samples/sec Loss 14.0064 LearningRate 0.0888 Epoch: 1 Global Step: 47710 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:04:23,341-Speed 2632.43 samples/sec Loss 14.0737 LearningRate 0.0888 Epoch: 1 Global Step: 47720 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:04:27,216-Speed 2643.06 samples/sec Loss 14.1581 LearningRate 0.0888 Epoch: 1 Global Step: 47730 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:04:31,116-Speed 2626.34 samples/sec Loss 14.0724 LearningRate 0.0888 Epoch: 1 Global Step: 47740 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:04:35,019-Speed 2624.22 samples/sec Loss 14.0124 LearningRate 0.0888 Epoch: 1 Global Step: 47750 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:04:38,925-Speed 2622.29 samples/sec Loss 13.9537 LearningRate 0.0888 Epoch: 1 Global Step: 47760 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:04:42,837-Speed 2618.81 samples/sec Loss 14.1085 LearningRate 0.0888 Epoch: 1 Global Step: 47770 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:04:46,746-Speed 2619.90 samples/sec Loss 13.9697 LearningRate 0.0888 Epoch: 1 Global Step: 47780 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:04:50,641-Speed 2629.65 samples/sec Loss 13.8287 LearningRate 0.0888 Epoch: 1 Global Step: 47790 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:04:54,544-Speed 2623.69 samples/sec Loss 14.1607 LearningRate 0.0888 Epoch: 1 Global Step: 47800 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:04:58,473-Speed 2607.55 samples/sec Loss 14.0219 LearningRate 0.0888 Epoch: 1 Global Step: 47810 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:05:02,368-Speed 2629.35 samples/sec Loss 14.1159 LearningRate 0.0888 Epoch: 1 Global Step: 47820 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:05:06,267-Speed 2627.80 samples/sec Loss 13.8682 LearningRate 0.0888 Epoch: 1 Global Step: 47830 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:05:10,163-Speed 2628.53 samples/sec Loss 13.9447 LearningRate 0.0888 Epoch: 1 Global Step: 47840 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:05:14,074-Speed 2619.28 samples/sec Loss 14.0487 LearningRate 0.0888 Epoch: 1 Global Step: 47850 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:05:17,990-Speed 2614.97 samples/sec Loss 14.1053 LearningRate 0.0888 Epoch: 1 Global Step: 47860 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:05:21,877-Speed 2635.29 samples/sec Loss 13.9061 LearningRate 0.0888 Epoch: 1 Global Step: 47870 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:05:25,796-Speed 2613.11 samples/sec Loss 13.9164 LearningRate 0.0888 Epoch: 1 Global Step: 47880 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:05:29,712-Speed 2616.33 samples/sec Loss 14.1337 LearningRate 0.0888 Epoch: 1 Global Step: 47890 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:05:33,610-Speed 2627.61 samples/sec Loss 14.1231 LearningRate 0.0888 Epoch: 1 Global Step: 47900 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:05:37,509-Speed 2626.70 samples/sec Loss 13.9146 LearningRate 0.0888 Epoch: 1 Global Step: 47910 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:05:41,484-Speed 2576.74 samples/sec Loss 14.1513 LearningRate 0.0888 Epoch: 1 Global Step: 47920 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:05:45,444-Speed 2586.54 samples/sec Loss 14.0097 LearningRate 0.0888 Epoch: 1 Global Step: 47930 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:05:49,346-Speed 2624.97 samples/sec Loss 14.1089 LearningRate 0.0888 Epoch: 1 Global Step: 47940 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:05:53,240-Speed 2630.13 samples/sec Loss 14.0176 LearningRate 0.0888 Epoch: 1 Global Step: 47950 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:05:57,130-Speed 2633.78 samples/sec Loss 13.8081 LearningRate 0.0888 Epoch: 1 Global Step: 47960 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:06:01,047-Speed 2615.01 samples/sec Loss 13.7974 LearningRate 0.0888 Epoch: 1 Global Step: 47970 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:06:04,978-Speed 2605.39 samples/sec Loss 13.8842 LearningRate 0.0888 Epoch: 1 Global Step: 47980 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:06:08,884-Speed 2622.95 samples/sec Loss 14.0031 LearningRate 0.0888 Epoch: 1 Global Step: 47990 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:06:12,780-Speed 2629.24 samples/sec Loss 14.0957 LearningRate 0.0888 Epoch: 1 Global Step: 48000 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:06:16,681-Speed 2625.48 samples/sec Loss 14.0100 LearningRate 0.0888 Epoch: 1 Global Step: 48010 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:06:20,589-Speed 2620.82 samples/sec Loss 14.0644 LearningRate 0.0888 Epoch: 1 Global Step: 48020 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:06:24,499-Speed 2619.23 samples/sec Loss 14.0149 LearningRate 0.0888 Epoch: 1 Global Step: 48030 Fp16 Grad Scale: 524288 Required: 87 hours
Training: 2022-04-13 01:06:28,393-Speed 2630.76 samples/sec Loss 14.0791 LearningRate 0.0888 Epoch: 1 Global Step: 48040 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:06:32,286-Speed 2631.51 samples/sec Loss 14.0448 LearningRate 0.0888 Epoch: 1 Global Step: 48050 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:06:36,180-Speed 2629.72 samples/sec Loss 13.9905 LearningRate 0.0887 Epoch: 1 Global Step: 48060 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:06:40,108-Speed 2608.53 samples/sec Loss 13.9830 LearningRate 0.0887 Epoch: 1 Global Step: 48070 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:06:44,000-Speed 2631.56 samples/sec Loss 13.9327 LearningRate 0.0887 Epoch: 1 Global Step: 48080 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:06:47,899-Speed 2626.67 samples/sec Loss 14.0759 LearningRate 0.0887 Epoch: 1 Global Step: 48090 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:06:51,802-Speed 2625.05 samples/sec Loss 14.0242 LearningRate 0.0887 Epoch: 1 Global Step: 48100 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:06:55,702-Speed 2626.21 samples/sec Loss 13.9910 LearningRate 0.0887 Epoch: 1 Global Step: 48110 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:06:59,598-Speed 2628.40 samples/sec Loss 14.0603 LearningRate 0.0887 Epoch: 1 Global Step: 48120 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:07:03,512-Speed 2616.59 samples/sec Loss 13.9527 LearningRate 0.0887 Epoch: 1 Global Step: 48130 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:07:07,391-Speed 2641.11 samples/sec Loss 14.0234 LearningRate 0.0887 Epoch: 1 Global Step: 48140 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:07:11,283-Speed 2631.75 samples/sec Loss 13.9331 LearningRate 0.0887 Epoch: 1 Global Step: 48150 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:07:15,178-Speed 2629.82 samples/sec Loss 13.9978 LearningRate 0.0887 Epoch: 1 Global Step: 48160 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:07:19,080-Speed 2625.19 samples/sec Loss 13.9133 LearningRate 0.0887 Epoch: 1 Global Step: 48170 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:07:22,991-Speed 2619.09 samples/sec Loss 13.8184 LearningRate 0.0887 Epoch: 1 Global Step: 48180 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:07:26,886-Speed 2629.66 samples/sec Loss 14.0358 LearningRate 0.0887 Epoch: 1 Global Step: 48190 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:07:30,786-Speed 2626.56 samples/sec Loss 13.9957 LearningRate 0.0887 Epoch: 1 Global Step: 48200 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:07:34,690-Speed 2623.64 samples/sec Loss 13.9878 LearningRate 0.0887 Epoch: 1 Global Step: 48210 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:07:38,599-Speed 2620.07 samples/sec Loss 13.9356 LearningRate 0.0887 Epoch: 1 Global Step: 48220 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:07:42,507-Speed 2621.20 samples/sec Loss 13.9723 LearningRate 0.0887 Epoch: 1 Global Step: 48230 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:07:46,394-Speed 2635.24 samples/sec Loss 13.8785 LearningRate 0.0887 Epoch: 1 Global Step: 48240 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:07:50,296-Speed 2625.10 samples/sec Loss 13.8746 LearningRate 0.0887 Epoch: 1 Global Step: 48250 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:07:54,187-Speed 2631.80 samples/sec Loss 13.9103 LearningRate 0.0887 Epoch: 1 Global Step: 48260 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:07:58,086-Speed 2627.39 samples/sec Loss 14.0131 LearningRate 0.0887 Epoch: 1 Global Step: 48270 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:08:01,986-Speed 2626.59 samples/sec Loss 13.9173 LearningRate 0.0887 Epoch: 1 Global Step: 48280 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:08:05,882-Speed 2628.51 samples/sec Loss 13.9823 LearningRate 0.0887 Epoch: 1 Global Step: 48290 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:08:09,784-Speed 2625.06 samples/sec Loss 13.9896 LearningRate 0.0887 Epoch: 1 Global Step: 48300 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:08:13,681-Speed 2628.59 samples/sec Loss 13.8061 LearningRate 0.0887 Epoch: 1 Global Step: 48310 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:08:17,576-Speed 2629.59 samples/sec Loss 13.9223 LearningRate 0.0887 Epoch: 1 Global Step: 48320 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:08:21,469-Speed 2631.30 samples/sec Loss 14.1121 LearningRate 0.0887 Epoch: 1 Global Step: 48330 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:08:25,363-Speed 2630.11 samples/sec Loss 13.8848 LearningRate 0.0887 Epoch: 1 Global Step: 48340 Fp16 Grad Scale: 524288 Required: 87 hours
Training: 2022-04-13 01:08:29,284-Speed 2612.27 samples/sec Loss 14.0361 LearningRate 0.0887 Epoch: 1 Global Step: 48350 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:08:33,182-Speed 2627.16 samples/sec Loss 13.9791 LearningRate 0.0887 Epoch: 1 Global Step: 48360 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:08:37,084-Speed 2624.90 samples/sec Loss 14.0200 LearningRate 0.0887 Epoch: 1 Global Step: 48370 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:08:40,986-Speed 2624.47 samples/sec Loss 13.8592 LearningRate 0.0887 Epoch: 1 Global Step: 48380 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:08:44,881-Speed 2630.15 samples/sec Loss 13.8208 LearningRate 0.0887 Epoch: 1 Global Step: 48390 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:08:48,777-Speed 2629.46 samples/sec Loss 13.9552 LearningRate 0.0887 Epoch: 1 Global Step: 48400 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:08:52,674-Speed 2627.96 samples/sec Loss 13.8824 LearningRate 0.0887 Epoch: 1 Global Step: 48410 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:08:56,573-Speed 2627.74 samples/sec Loss 14.0383 LearningRate 0.0887 Epoch: 1 Global Step: 48420 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:09:00,466-Speed 2630.83 samples/sec Loss 13.9175 LearningRate 0.0887 Epoch: 1 Global Step: 48430 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:09:04,362-Speed 2628.65 samples/sec Loss 14.0994 LearningRate 0.0887 Epoch: 1 Global Step: 48440 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:09:08,243-Speed 2638.85 samples/sec Loss 13.8768 LearningRate 0.0887 Epoch: 1 Global Step: 48450 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:09:12,142-Speed 2627.39 samples/sec Loss 13.9107 LearningRate 0.0887 Epoch: 1 Global Step: 48460 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:09:16,035-Speed 2630.84 samples/sec Loss 13.9418 LearningRate 0.0887 Epoch: 1 Global Step: 48470 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:09:19,942-Speed 2621.97 samples/sec Loss 13.9891 LearningRate 0.0887 Epoch: 1 Global Step: 48480 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:09:23,860-Speed 2613.85 samples/sec Loss 14.0205 LearningRate 0.0887 Epoch: 1 Global Step: 48490 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:09:27,755-Speed 2630.38 samples/sec Loss 13.9867 LearningRate 0.0886 Epoch: 1 Global Step: 48500 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:09:31,664-Speed 2620.20 samples/sec Loss 14.0574 LearningRate 0.0886 Epoch: 1 Global Step: 48510 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:09:35,560-Speed 2628.55 samples/sec Loss 14.0468 LearningRate 0.0886 Epoch: 1 Global Step: 48520 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:09:39,472-Speed 2617.91 samples/sec Loss 14.0045 LearningRate 0.0886 Epoch: 1 Global Step: 48530 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:09:43,386-Speed 2617.33 samples/sec Loss 13.9163 LearningRate 0.0886 Epoch: 1 Global Step: 48540 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:09:47,268-Speed 2638.16 samples/sec Loss 13.9816 LearningRate 0.0886 Epoch: 1 Global Step: 48550 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:09:51,163-Speed 2629.79 samples/sec Loss 13.9027 LearningRate 0.0886 Epoch: 1 Global Step: 48560 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:09:55,043-Speed 2639.89 samples/sec Loss 13.9392 LearningRate 0.0886 Epoch: 1 Global Step: 48570 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:09:58,940-Speed 2628.36 samples/sec Loss 13.8843 LearningRate 0.0886 Epoch: 1 Global Step: 48580 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:10:02,836-Speed 2629.00 samples/sec Loss 14.1398 LearningRate 0.0886 Epoch: 1 Global Step: 48590 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:10:06,731-Speed 2629.08 samples/sec Loss 13.9670 LearningRate 0.0886 Epoch: 1 Global Step: 48600 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:10:10,627-Speed 2628.78 samples/sec Loss 13.9192 LearningRate 0.0886 Epoch: 1 Global Step: 48610 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:10:14,522-Speed 2629.77 samples/sec Loss 14.0041 LearningRate 0.0886 Epoch: 1 Global Step: 48620 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:10:18,417-Speed 2629.39 samples/sec Loss 14.0588 LearningRate 0.0886 Epoch: 1 Global Step: 48630 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:10:22,313-Speed 2629.24 samples/sec Loss 14.0464 LearningRate 0.0886 Epoch: 1 Global Step: 48640 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:10:26,206-Speed 2631.21 samples/sec Loss 13.9018 LearningRate 0.0886 Epoch: 1 Global Step: 48650 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:10:30,100-Speed 2630.36 samples/sec Loss 14.1461 LearningRate 0.0886 Epoch: 1 Global Step: 48660 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:10:33,991-Speed 2632.16 samples/sec Loss 13.9866 LearningRate 0.0886 Epoch: 1 Global Step: 48670 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:10:37,897-Speed 2622.12 samples/sec Loss 13.9029 LearningRate 0.0886 Epoch: 1 Global Step: 48680 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:10:41,786-Speed 2633.99 samples/sec Loss 14.0135 LearningRate 0.0886 Epoch: 1 Global Step: 48690 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:10:45,693-Speed 2621.16 samples/sec Loss 14.0377 LearningRate 0.0886 Epoch: 1 Global Step: 48700 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:10:49,602-Speed 2620.58 samples/sec Loss 13.9016 LearningRate 0.0886 Epoch: 1 Global Step: 48710 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:10:53,507-Speed 2622.60 samples/sec Loss 13.9755 LearningRate 0.0886 Epoch: 1 Global Step: 48720 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:10:57,407-Speed 2626.03 samples/sec Loss 14.0099 LearningRate 0.0886 Epoch: 1 Global Step: 48730 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:11:01,306-Speed 2627.44 samples/sec Loss 13.7711 LearningRate 0.0886 Epoch: 1 Global Step: 48740 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:11:05,204-Speed 2627.23 samples/sec Loss 13.9880 LearningRate 0.0886 Epoch: 1 Global Step: 48750 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:11:09,105-Speed 2625.78 samples/sec Loss 13.8887 LearningRate 0.0886 Epoch: 1 Global Step: 48760 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:11:12,989-Speed 2637.02 samples/sec Loss 14.0647 LearningRate 0.0886 Epoch: 1 Global Step: 48770 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:11:16,943-Speed 2590.69 samples/sec Loss 14.0639 LearningRate 0.0886 Epoch: 1 Global Step: 48780 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:11:20,878-Speed 2602.27 samples/sec Loss 13.9664 LearningRate 0.0886 Epoch: 1 Global Step: 48790 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:11:24,775-Speed 2628.64 samples/sec Loss 14.1556 LearningRate 0.0886 Epoch: 1 Global Step: 48800 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:11:28,668-Speed 2630.59 samples/sec Loss 14.0453 LearningRate 0.0886 Epoch: 1 Global Step: 48810 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:11:32,570-Speed 2624.92 samples/sec Loss 13.9453 LearningRate 0.0886 Epoch: 1 Global Step: 48820 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:11:36,488-Speed 2614.47 samples/sec Loss 14.0618 LearningRate 0.0886 Epoch: 1 Global Step: 48830 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:11:40,389-Speed 2625.83 samples/sec Loss 14.0737 LearningRate 0.0886 Epoch: 1 Global Step: 48840 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:11:44,288-Speed 2627.08 samples/sec Loss 13.9652 LearningRate 0.0886 Epoch: 1 Global Step: 48850 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:11:48,192-Speed 2623.16 samples/sec Loss 13.8942 LearningRate 0.0886 Epoch: 1 Global Step: 48860 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:11:52,087-Speed 2629.79 samples/sec Loss 13.9053 LearningRate 0.0886 Epoch: 1 Global Step: 48870 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:11:55,982-Speed 2629.54 samples/sec Loss 13.8821 LearningRate 0.0886 Epoch: 1 Global Step: 48880 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:11:59,879-Speed 2627.91 samples/sec Loss 13.9839 LearningRate 0.0886 Epoch: 1 Global Step: 48890 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:12:03,782-Speed 2624.40 samples/sec Loss 13.8919 LearningRate 0.0886 Epoch: 1 Global Step: 48900 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:12:07,687-Speed 2622.85 samples/sec Loss 14.0463 LearningRate 0.0886 Epoch: 1 Global Step: 48910 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:12:11,587-Speed 2626.79 samples/sec Loss 13.9540 LearningRate 0.0886 Epoch: 1 Global Step: 48920 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:12:15,482-Speed 2629.08 samples/sec Loss 13.9421 LearningRate 0.0886 Epoch: 1 Global Step: 48930 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:12:19,379-Speed 2628.65 samples/sec Loss 14.1469 LearningRate 0.0885 Epoch: 1 Global Step: 48940 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:12:23,273-Speed 2630.50 samples/sec Loss 13.9099 LearningRate 0.0885 Epoch: 1 Global Step: 48950 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:12:27,186-Speed 2617.26 samples/sec Loss 13.9456 LearningRate 0.0885 Epoch: 1 Global Step: 48960 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:12:31,090-Speed 2623.30 samples/sec Loss 13.9732 LearningRate 0.0885 Epoch: 1 Global Step: 48970 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:12:35,062-Speed 2578.53 samples/sec Loss 13.8757 LearningRate 0.0885 Epoch: 1 Global Step: 48980 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:12:38,959-Speed 2628.35 samples/sec Loss 14.0201 LearningRate 0.0885 Epoch: 1 Global Step: 48990 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:12:42,857-Speed 2627.40 samples/sec Loss 14.0431 LearningRate 0.0885 Epoch: 1 Global Step: 49000 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:12:46,742-Speed 2636.76 samples/sec Loss 13.9349 LearningRate 0.0885 Epoch: 1 Global Step: 49010 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:12:50,641-Speed 2627.12 samples/sec Loss 13.9699 LearningRate 0.0885 Epoch: 1 Global Step: 49020 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:12:54,535-Speed 2630.20 samples/sec Loss 13.9458 LearningRate 0.0885 Epoch: 1 Global Step: 49030 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:12:58,429-Speed 2630.69 samples/sec Loss 13.9015 LearningRate 0.0885 Epoch: 1 Global Step: 49040 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:13:02,325-Speed 2628.60 samples/sec Loss 14.0679 LearningRate 0.0885 Epoch: 1 Global Step: 49050 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:13:06,237-Speed 2618.06 samples/sec Loss 13.9068 LearningRate 0.0885 Epoch: 1 Global Step: 49060 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:13:10,138-Speed 2625.68 samples/sec Loss 13.9280 LearningRate 0.0885 Epoch: 1 Global Step: 49070 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:13:14,046-Speed 2621.77 samples/sec Loss 14.0824 LearningRate 0.0885 Epoch: 1 Global Step: 49080 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:13:17,946-Speed 2625.59 samples/sec Loss 13.9710 LearningRate 0.0885 Epoch: 1 Global Step: 49090 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:13:21,896-Speed 2593.42 samples/sec Loss 14.0631 LearningRate 0.0885 Epoch: 1 Global Step: 49100 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:13:25,960-Speed 2520.78 samples/sec Loss 13.9501 LearningRate 0.0885 Epoch: 1 Global Step: 49110 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:13:29,931-Speed 2579.15 samples/sec Loss 13.9582 LearningRate 0.0885 Epoch: 1 Global Step: 49120 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:13:33,832-Speed 2625.44 samples/sec Loss 14.0019 LearningRate 0.0885 Epoch: 1 Global Step: 49130 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:13:37,733-Speed 2625.46 samples/sec Loss 13.8035 LearningRate 0.0885 Epoch: 1 Global Step: 49140 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:13:41,637-Speed 2623.68 samples/sec Loss 13.9635 LearningRate 0.0885 Epoch: 1 Global Step: 49150 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:13:45,541-Speed 2623.58 samples/sec Loss 13.7972 LearningRate 0.0885 Epoch: 1 Global Step: 49160 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:13:49,450-Speed 2620.30 samples/sec Loss 13.9927 LearningRate 0.0885 Epoch: 1 Global Step: 49170 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:13:53,367-Speed 2614.43 samples/sec Loss 14.0363 LearningRate 0.0885 Epoch: 1 Global Step: 49180 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:13:57,263-Speed 2628.94 samples/sec Loss 13.7618 LearningRate 0.0885 Epoch: 1 Global Step: 49190 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:14:01,161-Speed 2627.69 samples/sec Loss 14.0434 LearningRate 0.0885 Epoch: 1 Global Step: 49200 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:14:05,068-Speed 2621.89 samples/sec Loss 14.0735 LearningRate 0.0885 Epoch: 1 Global Step: 49210 Fp16 Grad Scale: 524288 Required: 87 hours
Training: 2022-04-13 01:14:08,952-Speed 2637.11 samples/sec Loss 13.9577 LearningRate 0.0885 Epoch: 1 Global Step: 49220 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:14:12,856-Speed 2623.33 samples/sec Loss 13.9234 LearningRate 0.0885 Epoch: 1 Global Step: 49230 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:14:16,742-Speed 2635.45 samples/sec Loss 13.8222 LearningRate 0.0885 Epoch: 1 Global Step: 49240 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:14:20,639-Speed 2628.88 samples/sec Loss 13.8946 LearningRate 0.0885 Epoch: 1 Global Step: 49250 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:14:24,537-Speed 2626.77 samples/sec Loss 13.9583 LearningRate 0.0885 Epoch: 1 Global Step: 49260 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:14:28,439-Speed 2625.06 samples/sec Loss 13.8262 LearningRate 0.0885 Epoch: 1 Global Step: 49270 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:14:32,337-Speed 2627.54 samples/sec Loss 14.0905 LearningRate 0.0885 Epoch: 1 Global Step: 49280 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:14:36,234-Speed 2627.92 samples/sec Loss 13.9592 LearningRate 0.0885 Epoch: 1 Global Step: 49290 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:14:40,127-Speed 2631.33 samples/sec Loss 14.0301 LearningRate 0.0885 Epoch: 1 Global Step: 49300 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:14:44,021-Speed 2630.91 samples/sec Loss 13.8695 LearningRate 0.0885 Epoch: 1 Global Step: 49310 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:14:47,920-Speed 2626.61 samples/sec Loss 14.0518 LearningRate 0.0885 Epoch: 1 Global Step: 49320 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:14:51,816-Speed 2628.98 samples/sec Loss 13.8207 LearningRate 0.0885 Epoch: 1 Global Step: 49330 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:14:55,711-Speed 2629.69 samples/sec Loss 14.1276 LearningRate 0.0885 Epoch: 1 Global Step: 49340 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:14:59,592-Speed 2639.29 samples/sec Loss 13.8537 LearningRate 0.0885 Epoch: 1 Global Step: 49350 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:15:03,497-Speed 2622.69 samples/sec Loss 13.9304 LearningRate 0.0885 Epoch: 1 Global Step: 49360 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:15:07,395-Speed 2626.87 samples/sec Loss 13.8455 LearningRate 0.0885 Epoch: 1 Global Step: 49370 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:15:11,290-Speed 2629.73 samples/sec Loss 14.0827 LearningRate 0.0884 Epoch: 1 Global Step: 49380 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:15:15,188-Speed 2627.76 samples/sec Loss 13.8766 LearningRate 0.0884 Epoch: 1 Global Step: 49390 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:15:19,083-Speed 2630.08 samples/sec Loss 13.8512 LearningRate 0.0884 Epoch: 1 Global Step: 49400 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:15:22,984-Speed 2625.64 samples/sec Loss 13.9173 LearningRate 0.0884 Epoch: 1 Global Step: 49410 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:15:26,879-Speed 2630.06 samples/sec Loss 14.0429 LearningRate 0.0884 Epoch: 1 Global Step: 49420 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:15:30,773-Speed 2630.37 samples/sec Loss 13.8448 LearningRate 0.0884 Epoch: 1 Global Step: 49430 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:15:34,668-Speed 2629.21 samples/sec Loss 14.0156 LearningRate 0.0884 Epoch: 1 Global Step: 49440 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:15:38,583-Speed 2615.86 samples/sec Loss 13.9404 LearningRate 0.0884 Epoch: 1 Global Step: 49450 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:15:42,481-Speed 2628.61 samples/sec Loss 13.8314 LearningRate 0.0884 Epoch: 1 Global Step: 49460 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:15:46,393-Speed 2618.85 samples/sec Loss 13.9549 LearningRate 0.0884 Epoch: 1 Global Step: 49470 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:15:50,315-Speed 2611.82 samples/sec Loss 13.8279 LearningRate 0.0884 Epoch: 1 Global Step: 49480 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:15:54,210-Speed 2629.61 samples/sec Loss 13.8484 LearningRate 0.0884 Epoch: 1 Global Step: 49490 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:15:58,123-Speed 2618.11 samples/sec Loss 14.0316 LearningRate 0.0884 Epoch: 1 Global Step: 49500 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:16:02,022-Speed 2627.16 samples/sec Loss 13.9481 LearningRate 0.0884 Epoch: 1 Global Step: 49510 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:16:05,916-Speed 2630.12 samples/sec Loss 13.9999 LearningRate 0.0884 Epoch: 1 Global Step: 49520 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:16:09,812-Speed 2628.53 samples/sec Loss 13.8414 LearningRate 0.0884 Epoch: 1 Global Step: 49530 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:16:13,705-Speed 2631.49 samples/sec Loss 13.9733 LearningRate 0.0884 Epoch: 1 Global Step: 49540 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:16:17,596-Speed 2632.15 samples/sec Loss 14.0726 LearningRate 0.0884 Epoch: 1 Global Step: 49550 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:16:21,481-Speed 2636.33 samples/sec Loss 14.0962 LearningRate 0.0884 Epoch: 1 Global Step: 49560 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:16:25,383-Speed 2626.47 samples/sec Loss 13.9987 LearningRate 0.0884 Epoch: 1 Global Step: 49570 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:16:29,284-Speed 2625.90 samples/sec Loss 13.8806 LearningRate 0.0884 Epoch: 1 Global Step: 49580 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:16:33,180-Speed 2628.69 samples/sec Loss 13.9832 LearningRate 0.0884 Epoch: 1 Global Step: 49590 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:16:37,076-Speed 2628.79 samples/sec Loss 13.9747 LearningRate 0.0884 Epoch: 1 Global Step: 49600 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:16:40,970-Speed 2630.41 samples/sec Loss 13.9705 LearningRate 0.0884 Epoch: 1 Global Step: 49610 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:16:44,866-Speed 2628.93 samples/sec Loss 13.8942 LearningRate 0.0884 Epoch: 1 Global Step: 49620 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:16:48,815-Speed 2594.02 samples/sec Loss 14.0441 LearningRate 0.0884 Epoch: 1 Global Step: 49630 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:16:52,713-Speed 2627.58 samples/sec Loss 13.7869 LearningRate 0.0884 Epoch: 1 Global Step: 49640 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:16:56,607-Speed 2630.33 samples/sec Loss 13.7766 LearningRate 0.0884 Epoch: 1 Global Step: 49650 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:17:00,502-Speed 2629.49 samples/sec Loss 13.8981 LearningRate 0.0884 Epoch: 1 Global Step: 49660 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:17:04,411-Speed 2620.12 samples/sec Loss 14.0257 LearningRate 0.0884 Epoch: 1 Global Step: 49670 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:17:08,311-Speed 2626.05 samples/sec Loss 13.9578 LearningRate 0.0884 Epoch: 1 Global Step: 49680 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:17:12,205-Speed 2630.90 samples/sec Loss 13.8833 LearningRate 0.0884 Epoch: 1 Global Step: 49690 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:17:16,106-Speed 2625.31 samples/sec Loss 13.9454 LearningRate 0.0884 Epoch: 1 Global Step: 49700 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:17:20,006-Speed 2626.55 samples/sec Loss 13.9366 LearningRate 0.0884 Epoch: 1 Global Step: 49710 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:17:23,902-Speed 2628.88 samples/sec Loss 13.9270 LearningRate 0.0884 Epoch: 1 Global Step: 49720 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:17:27,813-Speed 2619.05 samples/sec Loss 13.7845 LearningRate 0.0884 Epoch: 1 Global Step: 49730 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:17:31,708-Speed 2629.96 samples/sec Loss 13.8957 LearningRate 0.0884 Epoch: 1 Global Step: 49740 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:17:35,611-Speed 2624.42 samples/sec Loss 14.0160 LearningRate 0.0884 Epoch: 1 Global Step: 49750 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:17:39,520-Speed 2620.44 samples/sec Loss 13.8984 LearningRate 0.0884 Epoch: 1 Global Step: 49760 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:17:43,422-Speed 2625.20 samples/sec Loss 13.8062 LearningRate 0.0884 Epoch: 1 Global Step: 49770 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:17:47,317-Speed 2629.94 samples/sec Loss 13.7931 LearningRate 0.0884 Epoch: 1 Global Step: 49780 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:17:51,215-Speed 2627.46 samples/sec Loss 13.9275 LearningRate 0.0884 Epoch: 1 Global Step: 49790 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:17:55,109-Speed 2629.87 samples/sec Loss 13.8347 LearningRate 0.0884 Epoch: 1 Global Step: 49800 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:17:59,022-Speed 2617.92 samples/sec Loss 13.8798 LearningRate 0.0884 Epoch: 1 Global Step: 49810 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:18:02,938-Speed 2615.74 samples/sec Loss 13.8102 LearningRate 0.0883 Epoch: 1 Global Step: 49820 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:18:06,838-Speed 2626.05 samples/sec Loss 13.8071 LearningRate 0.0883 Epoch: 1 Global Step: 49830 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:18:10,720-Speed 2638.77 samples/sec Loss 13.9118 LearningRate 0.0883 Epoch: 1 Global Step: 49840 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:18:14,619-Speed 2627.07 samples/sec Loss 13.8530 LearningRate 0.0883 Epoch: 1 Global Step: 49850 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:18:18,518-Speed 2627.36 samples/sec Loss 13.8131 LearningRate 0.0883 Epoch: 1 Global Step: 49860 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:18:22,412-Speed 2630.53 samples/sec Loss 13.8207 LearningRate 0.0883 Epoch: 1 Global Step: 49870 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:18:26,314-Speed 2624.36 samples/sec Loss 13.8751 LearningRate 0.0883 Epoch: 1 Global Step: 49880 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:18:30,310-Speed 2563.17 samples/sec Loss 13.9348 LearningRate 0.0883 Epoch: 1 Global Step: 49890 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:18:34,204-Speed 2631.03 samples/sec Loss 13.9997 LearningRate 0.0883 Epoch: 1 Global Step: 49900 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:18:38,098-Speed 2630.16 samples/sec Loss 13.8451 LearningRate 0.0883 Epoch: 1 Global Step: 49910 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:18:42,000-Speed 2625.01 samples/sec Loss 13.7688 LearningRate 0.0883 Epoch: 1 Global Step: 49920 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:18:45,914-Speed 2616.59 samples/sec Loss 13.9467 LearningRate 0.0883 Epoch: 1 Global Step: 49930 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:18:49,824-Speed 2619.46 samples/sec Loss 13.8931 LearningRate 0.0883 Epoch: 1 Global Step: 49940 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:18:53,719-Speed 2630.35 samples/sec Loss 13.8883 LearningRate 0.0883 Epoch: 1 Global Step: 49950 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:18:57,615-Speed 2628.64 samples/sec Loss 13.9709 LearningRate 0.0883 Epoch: 1 Global Step: 49960 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:19:01,517-Speed 2624.98 samples/sec Loss 13.8682 LearningRate 0.0883 Epoch: 1 Global Step: 49970 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:19:05,415-Speed 2627.14 samples/sec Loss 13.8674 LearningRate 0.0883 Epoch: 1 Global Step: 49980 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:19:09,321-Speed 2622.68 samples/sec Loss 13.9311 LearningRate 0.0883 Epoch: 1 Global Step: 49990 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:19:13,216-Speed 2629.10 samples/sec Loss 13.8255 LearningRate 0.0883 Epoch: 1 Global Step: 50000 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:19:55,881-[lfw][50000]XNorm: 21.741130
Training: 2022-04-13 01:19:55,882-[lfw][50000]Accuracy-Flip: 0.99600+-0.00335
Training: 2022-04-13 01:19:55,882-[lfw][50000]Accuracy-Highest: 0.99600
Training: 2022-04-13 01:20:46,216-[cfp_fp][50000]XNorm: 19.481108
Training: 2022-04-13 01:20:46,217-[cfp_fp][50000]Accuracy-Flip: 0.97500+-0.00935
Training: 2022-04-13 01:20:46,218-[cfp_fp][50000]Accuracy-Highest: 0.97500
Training: 2022-04-13 01:21:29,214-[agedb_30][50000]XNorm: 21.220101
Training: 2022-04-13 01:21:29,215-[agedb_30][50000]Accuracy-Flip: 0.96100+-0.01081
Training: 2022-04-13 01:21:29,216-[agedb_30][50000]Accuracy-Highest: 0.96100
Training: 2022-04-13 01:21:33,076-Speed 73.22 samples/sec Loss 13.7808 LearningRate 0.0883 Epoch: 1 Global Step: 50010 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:21:36,943-Speed 2648.86 samples/sec Loss 13.9753 LearningRate 0.0883 Epoch: 1 Global Step: 50020 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:21:40,814-Speed 2646.19 samples/sec Loss 13.7504 LearningRate 0.0883 Epoch: 1 Global Step: 50030 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:21:44,707-Speed 2631.01 samples/sec Loss 13.8417 LearningRate 0.0883 Epoch: 1 Global Step: 50040 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:21:48,617-Speed 2619.86 samples/sec Loss 13.8234 LearningRate 0.0883 Epoch: 1 Global Step: 50050 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:21:52,514-Speed 2628.24 samples/sec Loss 13.8655 LearningRate 0.0883 Epoch: 1 Global Step: 50060 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:21:56,405-Speed 2632.49 samples/sec Loss 13.8623 LearningRate 0.0883 Epoch: 1 Global Step: 50070 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:22:00,314-Speed 2619.81 samples/sec Loss 13.8250 LearningRate 0.0883 Epoch: 1 Global Step: 50080 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:22:04,203-Speed 2634.14 samples/sec Loss 13.8472 LearningRate 0.0883 Epoch: 1 Global Step: 50090 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:22:08,094-Speed 2633.07 samples/sec Loss 13.8505 LearningRate 0.0883 Epoch: 1 Global Step: 50100 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:22:11,984-Speed 2633.17 samples/sec Loss 13.9334 LearningRate 0.0883 Epoch: 1 Global Step: 50110 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:22:15,872-Speed 2634.11 samples/sec Loss 13.7816 LearningRate 0.0883 Epoch: 1 Global Step: 50120 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:22:19,764-Speed 2631.01 samples/sec Loss 13.8765 LearningRate 0.0883 Epoch: 1 Global Step: 50130 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:22:23,653-Speed 2634.09 samples/sec Loss 13.9630 LearningRate 0.0883 Epoch: 1 Global Step: 50140 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:22:27,548-Speed 2630.40 samples/sec Loss 13.9984 LearningRate 0.0883 Epoch: 1 Global Step: 50150 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:22:31,432-Speed 2636.66 samples/sec Loss 13.8003 LearningRate 0.0883 Epoch: 1 Global Step: 50160 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:22:35,330-Speed 2627.61 samples/sec Loss 13.9886 LearningRate 0.0883 Epoch: 1 Global Step: 50170 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:22:39,237-Speed 2621.84 samples/sec Loss 13.8831 LearningRate 0.0883 Epoch: 1 Global Step: 50180 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:22:43,149-Speed 2618.59 samples/sec Loss 13.8809 LearningRate 0.0883 Epoch: 1 Global Step: 50190 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:22:47,236-Speed 2505.84 samples/sec Loss 13.8762 LearningRate 0.0883 Epoch: 1 Global Step: 50200 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:22:51,336-Speed 2498.65 samples/sec Loss 13.8197 LearningRate 0.0883 Epoch: 1 Global Step: 50210 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:22:55,301-Speed 2583.17 samples/sec Loss 13.8583 LearningRate 0.0883 Epoch: 1 Global Step: 50220 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:22:59,201-Speed 2626.04 samples/sec Loss 13.9605 LearningRate 0.0883 Epoch: 1 Global Step: 50230 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:23:03,094-Speed 2631.49 samples/sec Loss 13.9997 LearningRate 0.0883 Epoch: 1 Global Step: 50240 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:23:06,987-Speed 2630.54 samples/sec Loss 13.9226 LearningRate 0.0883 Epoch: 1 Global Step: 50250 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:23:10,890-Speed 2624.65 samples/sec Loss 13.7769 LearningRate 0.0883 Epoch: 1 Global Step: 50260 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:23:14,788-Speed 2627.64 samples/sec Loss 13.9304 LearningRate 0.0882 Epoch: 1 Global Step: 50270 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:23:18,747-Speed 2587.34 samples/sec Loss 13.8998 LearningRate 0.0882 Epoch: 1 Global Step: 50280 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:23:22,677-Speed 2606.58 samples/sec Loss 13.8686 LearningRate 0.0882 Epoch: 1 Global Step: 50290 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:23:26,555-Speed 2641.27 samples/sec Loss 13.7471 LearningRate 0.0882 Epoch: 1 Global Step: 50300 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:23:30,451-Speed 2628.86 samples/sec Loss 13.7989 LearningRate 0.0882 Epoch: 1 Global Step: 50310 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:23:34,345-Speed 2630.31 samples/sec Loss 13.8626 LearningRate 0.0882 Epoch: 1 Global Step: 50320 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:23:38,247-Speed 2624.92 samples/sec Loss 13.9787 LearningRate 0.0882 Epoch: 1 Global Step: 50330 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:23:42,145-Speed 2631.32 samples/sec Loss 13.8855 LearningRate 0.0882 Epoch: 1 Global Step: 50340 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:23:46,041-Speed 2629.15 samples/sec Loss 13.9081 LearningRate 0.0882 Epoch: 1 Global Step: 50350 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:23:49,956-Speed 2616.08 samples/sec Loss 13.9144 LearningRate 0.0882 Epoch: 1 Global Step: 50360 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:23:53,848-Speed 2631.91 samples/sec Loss 13.9868 LearningRate 0.0882 Epoch: 1 Global Step: 50370 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:23:57,750-Speed 2625.21 samples/sec Loss 13.8303 LearningRate 0.0882 Epoch: 1 Global Step: 50380 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:24:01,656-Speed 2622.11 samples/sec Loss 13.8282 LearningRate 0.0882 Epoch: 1 Global Step: 50390 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:24:05,573-Speed 2614.86 samples/sec Loss 13.7785 LearningRate 0.0882 Epoch: 1 Global Step: 50400 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:24:09,463-Speed 2632.51 samples/sec Loss 13.8023 LearningRate 0.0882 Epoch: 1 Global Step: 50410 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:24:13,354-Speed 2633.11 samples/sec Loss 13.9475 LearningRate 0.0882 Epoch: 1 Global Step: 50420 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:24:17,248-Speed 2630.62 samples/sec Loss 13.7294 LearningRate 0.0882 Epoch: 1 Global Step: 50430 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:24:21,151-Speed 2624.22 samples/sec Loss 13.8229 LearningRate 0.0882 Epoch: 1 Global Step: 50440 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:24:25,060-Speed 2620.33 samples/sec Loss 13.9044 LearningRate 0.0882 Epoch: 1 Global Step: 50450 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:24:28,953-Speed 2630.73 samples/sec Loss 13.9809 LearningRate 0.0882 Epoch: 1 Global Step: 50460 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:24:32,858-Speed 2623.25 samples/sec Loss 13.9650 LearningRate 0.0882 Epoch: 1 Global Step: 50470 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:24:36,755-Speed 2628.19 samples/sec Loss 13.9701 LearningRate 0.0882 Epoch: 1 Global Step: 50480 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:24:40,650-Speed 2629.86 samples/sec Loss 13.8376 LearningRate 0.0882 Epoch: 1 Global Step: 50490 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:24:44,605-Speed 2589.86 samples/sec Loss 13.8691 LearningRate 0.0882 Epoch: 1 Global Step: 50500 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:24:48,491-Speed 2635.80 samples/sec Loss 13.9073 LearningRate 0.0882 Epoch: 1 Global Step: 50510 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:24:52,387-Speed 2629.13 samples/sec Loss 13.9352 LearningRate 0.0882 Epoch: 1 Global Step: 50520 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:24:56,302-Speed 2616.85 samples/sec Loss 13.7322 LearningRate 0.0882 Epoch: 1 Global Step: 50530 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:25:00,209-Speed 2621.67 samples/sec Loss 13.9864 LearningRate 0.0882 Epoch: 1 Global Step: 50540 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:25:04,128-Speed 2613.41 samples/sec Loss 13.9039 LearningRate 0.0882 Epoch: 1 Global Step: 50550 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:25:08,024-Speed 2628.67 samples/sec Loss 13.7583 LearningRate 0.0882 Epoch: 1 Global Step: 50560 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:25:11,922-Speed 2627.70 samples/sec Loss 13.6593 LearningRate 0.0882 Epoch: 1 Global Step: 50570 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:25:15,813-Speed 2632.35 samples/sec Loss 14.0868 LearningRate 0.0882 Epoch: 1 Global Step: 50580 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:25:19,706-Speed 2631.87 samples/sec Loss 13.8017 LearningRate 0.0882 Epoch: 1 Global Step: 50590 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:25:23,600-Speed 2630.11 samples/sec Loss 13.8525 LearningRate 0.0882 Epoch: 1 Global Step: 50600 Fp16 Grad Scale: 262144 Required: 88 hours
Training: 2022-04-13 01:25:27,482-Speed 2638.65 samples/sec Loss 13.7864 LearningRate 0.0882 Epoch: 1 Global Step: 50610 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:25:31,377-Speed 2629.79 samples/sec Loss 13.8000 LearningRate 0.0882 Epoch: 1 Global Step: 50620 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:25:35,276-Speed 2626.29 samples/sec Loss 13.7956 LearningRate 0.0882 Epoch: 1 Global Step: 50630 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:25:39,175-Speed 2627.34 samples/sec Loss 14.0180 LearningRate 0.0882 Epoch: 1 Global Step: 50640 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:25:43,076-Speed 2625.87 samples/sec Loss 13.8380 LearningRate 0.0882 Epoch: 1 Global Step: 50650 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:25:46,973-Speed 2628.45 samples/sec Loss 13.8372 LearningRate 0.0882 Epoch: 1 Global Step: 50660 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:25:50,872-Speed 2627.04 samples/sec Loss 13.8552 LearningRate 0.0882 Epoch: 1 Global Step: 50670 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:25:54,767-Speed 2629.03 samples/sec Loss 13.8331 LearningRate 0.0882 Epoch: 1 Global Step: 50680 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:25:58,664-Speed 2628.53 samples/sec Loss 13.9342 LearningRate 0.0882 Epoch: 1 Global Step: 50690 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:26:02,541-Speed 2642.16 samples/sec Loss 13.9531 LearningRate 0.0882 Epoch: 1 Global Step: 50700 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:26:06,512-Speed 2579.07 samples/sec Loss 13.7797 LearningRate 0.0881 Epoch: 1 Global Step: 50710 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:26:10,421-Speed 2620.46 samples/sec Loss 13.8090 LearningRate 0.0881 Epoch: 1 Global Step: 50720 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:26:14,314-Speed 2630.53 samples/sec Loss 13.8742 LearningRate 0.0881 Epoch: 1 Global Step: 50730 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:26:18,207-Speed 2631.58 samples/sec Loss 13.8570 LearningRate 0.0881 Epoch: 1 Global Step: 50740 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:26:22,104-Speed 2628.16 samples/sec Loss 13.8462 LearningRate 0.0881 Epoch: 1 Global Step: 50750 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:26:25,999-Speed 2630.16 samples/sec Loss 13.9757 LearningRate 0.0881 Epoch: 1 Global Step: 50760 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:26:29,899-Speed 2626.16 samples/sec Loss 13.7303 LearningRate 0.0881 Epoch: 1 Global Step: 50770 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:26:33,824-Speed 2610.31 samples/sec Loss 13.8633 LearningRate 0.0881 Epoch: 1 Global Step: 50780 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:26:37,880-Speed 2525.09 samples/sec Loss 13.7710 LearningRate 0.0881 Epoch: 1 Global Step: 50790 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:26:41,776-Speed 2628.58 samples/sec Loss 13.9808 LearningRate 0.0881 Epoch: 1 Global Step: 50800 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:26:45,669-Speed 2631.33 samples/sec Loss 13.9466 LearningRate 0.0881 Epoch: 1 Global Step: 50810 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:26:49,564-Speed 2629.93 samples/sec Loss 13.8886 LearningRate 0.0881 Epoch: 1 Global Step: 50820 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:26:53,458-Speed 2630.26 samples/sec Loss 13.8001 LearningRate 0.0881 Epoch: 1 Global Step: 50830 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:26:57,379-Speed 2612.30 samples/sec Loss 13.8264 LearningRate 0.0881 Epoch: 1 Global Step: 50840 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:27:01,278-Speed 2627.52 samples/sec Loss 13.7634 LearningRate 0.0881 Epoch: 1 Global Step: 50850 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:27:05,204-Speed 2608.88 samples/sec Loss 13.7432 LearningRate 0.0881 Epoch: 1 Global Step: 50860 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:27:09,108-Speed 2623.59 samples/sec Loss 13.9414 LearningRate 0.0881 Epoch: 1 Global Step: 50870 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:27:13,056-Speed 2593.92 samples/sec Loss 14.0218 LearningRate 0.0881 Epoch: 1 Global Step: 50880 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:27:16,953-Speed 2628.96 samples/sec Loss 13.9274 LearningRate 0.0881 Epoch: 1 Global Step: 50890 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:27:20,854-Speed 2625.64 samples/sec Loss 13.9075 LearningRate 0.0881 Epoch: 1 Global Step: 50900 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:27:24,749-Speed 2629.74 samples/sec Loss 13.7872 LearningRate 0.0881 Epoch: 1 Global Step: 50910 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:27:28,647-Speed 2627.63 samples/sec Loss 13.9030 LearningRate 0.0881 Epoch: 1 Global Step: 50920 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:27:32,545-Speed 2627.73 samples/sec Loss 13.7849 LearningRate 0.0881 Epoch: 1 Global Step: 50930 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:27:36,444-Speed 2627.04 samples/sec Loss 13.7835 LearningRate 0.0881 Epoch: 1 Global Step: 50940 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:27:40,361-Speed 2614.71 samples/sec Loss 13.9766 LearningRate 0.0881 Epoch: 1 Global Step: 50950 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:27:44,254-Speed 2630.56 samples/sec Loss 13.8344 LearningRate 0.0881 Epoch: 1 Global Step: 50960 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:27:48,154-Speed 2626.59 samples/sec Loss 13.8646 LearningRate 0.0881 Epoch: 1 Global Step: 50970 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:27:52,066-Speed 2618.21 samples/sec Loss 13.8155 LearningRate 0.0881 Epoch: 1 Global Step: 50980 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:27:56,268-Speed 2437.69 samples/sec Loss 13.9233 LearningRate 0.0881 Epoch: 1 Global Step: 50990 Fp16 Grad Scale: 131072 Required: 88 hours
Training: 2022-04-13 01:28:00,133-Speed 2649.52 samples/sec Loss 13.8188 LearningRate 0.0881 Epoch: 1 Global Step: 51000 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:28:04,035-Speed 2625.34 samples/sec Loss 13.8439 LearningRate 0.0881 Epoch: 1 Global Step: 51010 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:28:07,944-Speed 2620.23 samples/sec Loss 13.8452 LearningRate 0.0881 Epoch: 1 Global Step: 51020 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:28:11,849-Speed 2623.10 samples/sec Loss 13.8689 LearningRate 0.0881 Epoch: 1 Global Step: 51030 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:28:15,753-Speed 2623.52 samples/sec Loss 14.0367 LearningRate 0.0881 Epoch: 1 Global Step: 51040 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:28:19,621-Speed 2648.09 samples/sec Loss 13.9063 LearningRate 0.0881 Epoch: 1 Global Step: 51050 Fp16 Grad Scale: 16384 Required: 88 hours
Training: 2022-04-13 01:28:23,575-Speed 2590.45 samples/sec Loss 14.0058 LearningRate 0.0881 Epoch: 1 Global Step: 51060 Fp16 Grad Scale: 16384 Required: 88 hours
Training: 2022-04-13 01:28:27,481-Speed 2629.70 samples/sec Loss 13.7980 LearningRate 0.0881 Epoch: 1 Global Step: 51070 Fp16 Grad Scale: 16384 Required: 88 hours
Training: 2022-04-13 01:28:31,405-Speed 2610.01 samples/sec Loss 13.6780 LearningRate 0.0881 Epoch: 1 Global Step: 51080 Fp16 Grad Scale: 16384 Required: 88 hours
Training: 2022-04-13 01:28:35,313-Speed 2620.81 samples/sec Loss 13.9825 LearningRate 0.0881 Epoch: 1 Global Step: 51090 Fp16 Grad Scale: 16384 Required: 88 hours
Training: 2022-04-13 01:28:39,208-Speed 2630.07 samples/sec Loss 13.9629 LearningRate 0.0881 Epoch: 1 Global Step: 51100 Fp16 Grad Scale: 16384 Required: 88 hours
Training: 2022-04-13 01:28:43,110-Speed 2625.46 samples/sec Loss 13.9012 LearningRate 0.0881 Epoch: 1 Global Step: 51110 Fp16 Grad Scale: 16384 Required: 88 hours
Training: 2022-04-13 01:28:47,003-Speed 2630.57 samples/sec Loss 13.8993 LearningRate 0.0881 Epoch: 1 Global Step: 51120 Fp16 Grad Scale: 16384 Required: 88 hours
Training: 2022-04-13 01:28:50,897-Speed 2630.56 samples/sec Loss 13.8998 LearningRate 0.0881 Epoch: 1 Global Step: 51130 Fp16 Grad Scale: 16384 Required: 88 hours
Training: 2022-04-13 01:28:54,796-Speed 2626.23 samples/sec Loss 13.9163 LearningRate 0.0881 Epoch: 1 Global Step: 51140 Fp16 Grad Scale: 16384 Required: 88 hours
Training: 2022-04-13 01:28:58,703-Speed 2621.84 samples/sec Loss 13.8366 LearningRate 0.0880 Epoch: 1 Global Step: 51150 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:29:02,609-Speed 2622.04 samples/sec Loss 13.9206 LearningRate 0.0880 Epoch: 1 Global Step: 51160 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:29:06,507-Speed 2628.04 samples/sec Loss 13.8270 LearningRate 0.0880 Epoch: 1 Global Step: 51170 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:29:10,408-Speed 2625.72 samples/sec Loss 13.7650 LearningRate 0.0880 Epoch: 1 Global Step: 51180 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:29:14,301-Speed 2631.14 samples/sec Loss 14.0210 LearningRate 0.0880 Epoch: 1 Global Step: 51190 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:29:18,384-Speed 2509.27 samples/sec Loss 13.8866 LearningRate 0.0880 Epoch: 1 Global Step: 51200 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:29:22,280-Speed 2628.35 samples/sec Loss 13.9307 LearningRate 0.0880 Epoch: 1 Global Step: 51210 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:29:26,175-Speed 2630.06 samples/sec Loss 13.9387 LearningRate 0.0880 Epoch: 1 Global Step: 51220 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:29:30,072-Speed 2628.55 samples/sec Loss 13.8742 LearningRate 0.0880 Epoch: 1 Global Step: 51230 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:29:33,974-Speed 2624.45 samples/sec Loss 13.8937 LearningRate 0.0880 Epoch: 1 Global Step: 51240 Fp16 Grad Scale: 32768 Required: 88 hours
Training: 2022-04-13 01:29:37,883-Speed 2620.66 samples/sec Loss 13.8816 LearningRate 0.0880 Epoch: 1 Global Step: 51250 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:29:41,795-Speed 2618.21 samples/sec Loss 13.8762 LearningRate 0.0880 Epoch: 1 Global Step: 51260 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:29:45,701-Speed 2622.12 samples/sec Loss 13.8245 LearningRate 0.0880 Epoch: 1 Global Step: 51270 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:29:49,651-Speed 2593.74 samples/sec Loss 13.8381 LearningRate 0.0880 Epoch: 1 Global Step: 51280 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:29:53,554-Speed 2624.22 samples/sec Loss 13.8560 LearningRate 0.0880 Epoch: 1 Global Step: 51290 Fp16 Grad Scale: 65536 Required: 88 hours
Training: 2022-04-13 01:29:57,447-Speed 2631.18 samples/sec Loss 13.7658 LearningRate 0.0880 Epoch: 1 Global Step: 51300 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:30:01,327-Speed 2639.94 samples/sec Loss 13.9093 LearningRate 0.0880 Epoch: 1 Global Step: 51310 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:30:05,221-Speed 2630.58 samples/sec Loss 13.9092 LearningRate 0.0880 Epoch: 1 Global Step: 51320 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:30:09,114-Speed 2630.80 samples/sec Loss 13.8230 LearningRate 0.0880 Epoch: 1 Global Step: 51330 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:30:13,013-Speed 2627.45 samples/sec Loss 13.8431 LearningRate 0.0880 Epoch: 1 Global Step: 51340 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:30:16,911-Speed 2627.59 samples/sec Loss 13.7753 LearningRate 0.0880 Epoch: 1 Global Step: 51350 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:30:20,812-Speed 2627.15 samples/sec Loss 13.8373 LearningRate 0.0880 Epoch: 1 Global Step: 51360 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:30:24,704-Speed 2631.50 samples/sec Loss 14.0575 LearningRate 0.0880 Epoch: 1 Global Step: 51370 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:30:28,639-Speed 2603.00 samples/sec Loss 13.6554 LearningRate 0.0880 Epoch: 1 Global Step: 51380 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:30:32,531-Speed 2631.71 samples/sec Loss 13.9141 LearningRate 0.0880 Epoch: 1 Global Step: 51390 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:30:36,430-Speed 2627.13 samples/sec Loss 13.9383 LearningRate 0.0880 Epoch: 1 Global Step: 51400 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:30:40,331-Speed 2625.56 samples/sec Loss 13.9343 LearningRate 0.0880 Epoch: 1 Global Step: 51410 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:30:44,235-Speed 2623.46 samples/sec Loss 13.8223 LearningRate 0.0880 Epoch: 1 Global Step: 51420 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:30:48,141-Speed 2622.92 samples/sec Loss 13.8065 LearningRate 0.0880 Epoch: 1 Global Step: 51430 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:30:52,045-Speed 2623.22 samples/sec Loss 13.8876 LearningRate 0.0880 Epoch: 1 Global Step: 51440 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:30:55,942-Speed 2628.40 samples/sec Loss 13.9137 LearningRate 0.0880 Epoch: 1 Global Step: 51450 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:30:59,872-Speed 2606.10 samples/sec Loss 13.8548 LearningRate 0.0880 Epoch: 1 Global Step: 51460 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:31:03,768-Speed 2629.51 samples/sec Loss 13.9020 LearningRate 0.0880 Epoch: 1 Global Step: 51470 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:31:07,666-Speed 2627.58 samples/sec Loss 13.8995 LearningRate 0.0880 Epoch: 1 Global Step: 51480 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:31:11,563-Speed 2628.48 samples/sec Loss 13.9566 LearningRate 0.0880 Epoch: 1 Global Step: 51490 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:31:15,454-Speed 2631.97 samples/sec Loss 13.7810 LearningRate 0.0880 Epoch: 1 Global Step: 51500 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:31:19,354-Speed 2626.89 samples/sec Loss 13.7588 LearningRate 0.0880 Epoch: 1 Global Step: 51510 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:31:23,249-Speed 2629.28 samples/sec Loss 13.8007 LearningRate 0.0880 Epoch: 1 Global Step: 51520 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:31:27,149-Speed 2626.88 samples/sec Loss 13.6503 LearningRate 0.0880 Epoch: 1 Global Step: 51530 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:31:31,055-Speed 2622.39 samples/sec Loss 13.8341 LearningRate 0.0880 Epoch: 1 Global Step: 51540 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:31:34,929-Speed 2643.39 samples/sec Loss 13.8951 LearningRate 0.0880 Epoch: 1 Global Step: 51550 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:31:38,837-Speed 2620.72 samples/sec Loss 13.8749 LearningRate 0.0880 Epoch: 1 Global Step: 51560 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:31:42,785-Speed 2594.51 samples/sec Loss 13.7493 LearningRate 0.0880 Epoch: 1 Global Step: 51570 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:31:46,679-Speed 2630.55 samples/sec Loss 13.7701 LearningRate 0.0880 Epoch: 1 Global Step: 51580 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:31:50,576-Speed 2628.65 samples/sec Loss 14.0100 LearningRate 0.0879 Epoch: 1 Global Step: 51590 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:31:54,473-Speed 2628.06 samples/sec Loss 13.7731 LearningRate 0.0879 Epoch: 1 Global Step: 51600 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:31:58,368-Speed 2630.27 samples/sec Loss 13.7664 LearningRate 0.0879 Epoch: 1 Global Step: 51610 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:32:02,268-Speed 2626.21 samples/sec Loss 13.8699 LearningRate 0.0879 Epoch: 1 Global Step: 51620 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:32:06,162-Speed 2630.13 samples/sec Loss 13.7181 LearningRate 0.0879 Epoch: 1 Global Step: 51630 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:32:10,061-Speed 2627.12 samples/sec Loss 13.8222 LearningRate 0.0879 Epoch: 1 Global Step: 51640 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:32:13,954-Speed 2630.69 samples/sec Loss 13.9005 LearningRate 0.0879 Epoch: 1 Global Step: 51650 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:32:17,849-Speed 2629.97 samples/sec Loss 14.0495 LearningRate 0.0879 Epoch: 1 Global Step: 51660 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:32:21,759-Speed 2619.98 samples/sec Loss 13.9642 LearningRate 0.0879 Epoch: 1 Global Step: 51670 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:32:25,655-Speed 2629.34 samples/sec Loss 13.7457 LearningRate 0.0879 Epoch: 1 Global Step: 51680 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:32:29,549-Speed 2629.97 samples/sec Loss 13.6999 LearningRate 0.0879 Epoch: 1 Global Step: 51690 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:32:33,440-Speed 2632.34 samples/sec Loss 13.6262 LearningRate 0.0879 Epoch: 1 Global Step: 51700 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:32:37,334-Speed 2630.13 samples/sec Loss 13.8640 LearningRate 0.0879 Epoch: 1 Global Step: 51710 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:32:41,239-Speed 2622.98 samples/sec Loss 13.7795 LearningRate 0.0879 Epoch: 1 Global Step: 51720 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:32:45,137-Speed 2627.18 samples/sec Loss 13.9495 LearningRate 0.0879 Epoch: 1 Global Step: 51730 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:32:49,036-Speed 2627.36 samples/sec Loss 13.9739 LearningRate 0.0879 Epoch: 1 Global Step: 51740 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:32:52,928-Speed 2631.79 samples/sec Loss 13.9666 LearningRate 0.0879 Epoch: 1 Global Step: 51750 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:32:56,887-Speed 2587.07 samples/sec Loss 13.8479 LearningRate 0.0879 Epoch: 1 Global Step: 51760 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:33:00,784-Speed 2628.81 samples/sec Loss 13.7147 LearningRate 0.0879 Epoch: 1 Global Step: 51770 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:33:04,689-Speed 2623.41 samples/sec Loss 13.7207 LearningRate 0.0879 Epoch: 1 Global Step: 51780 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:33:08,586-Speed 2628.04 samples/sec Loss 13.7309 LearningRate 0.0879 Epoch: 1 Global Step: 51790 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:33:12,480-Speed 2629.65 samples/sec Loss 13.6651 LearningRate 0.0879 Epoch: 1 Global Step: 51800 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:33:16,378-Speed 2628.49 samples/sec Loss 13.9341 LearningRate 0.0879 Epoch: 1 Global Step: 51810 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:33:20,273-Speed 2629.63 samples/sec Loss 13.8515 LearningRate 0.0879 Epoch: 1 Global Step: 51820 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:33:24,149-Speed 2642.89 samples/sec Loss 13.8884 LearningRate 0.0879 Epoch: 1 Global Step: 51830 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:33:28,043-Speed 2630.06 samples/sec Loss 13.8851 LearningRate 0.0879 Epoch: 1 Global Step: 51840 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:33:31,934-Speed 2632.55 samples/sec Loss 13.7361 LearningRate 0.0879 Epoch: 1 Global Step: 51850 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:33:35,841-Speed 2621.35 samples/sec Loss 13.7388 LearningRate 0.0879 Epoch: 1 Global Step: 51860 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:33:39,737-Speed 2628.74 samples/sec Loss 13.8687 LearningRate 0.0879 Epoch: 1 Global Step: 51870 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:33:43,639-Speed 2624.98 samples/sec Loss 13.7600 LearningRate 0.0879 Epoch: 1 Global Step: 51880 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:33:47,527-Speed 2634.33 samples/sec Loss 13.8134 LearningRate 0.0879 Epoch: 1 Global Step: 51890 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:33:51,422-Speed 2630.00 samples/sec Loss 13.8457 LearningRate 0.0879 Epoch: 1 Global Step: 51900 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:33:55,317-Speed 2629.90 samples/sec Loss 13.7654 LearningRate 0.0879 Epoch: 1 Global Step: 51910 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:33:59,213-Speed 2628.83 samples/sec Loss 13.7988 LearningRate 0.0879 Epoch: 1 Global Step: 51920 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:34:03,115-Speed 2625.04 samples/sec Loss 13.6845 LearningRate 0.0879 Epoch: 1 Global Step: 51930 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:34:07,014-Speed 2626.40 samples/sec Loss 13.9971 LearningRate 0.0879 Epoch: 1 Global Step: 51940 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:34:10,924-Speed 2619.97 samples/sec Loss 13.9475 LearningRate 0.0879 Epoch: 1 Global Step: 51950 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:34:14,841-Speed 2614.75 samples/sec Loss 13.8696 LearningRate 0.0879 Epoch: 1 Global Step: 51960 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:34:18,740-Speed 2627.11 samples/sec Loss 13.7657 LearningRate 0.0879 Epoch: 1 Global Step: 51970 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:34:22,647-Speed 2621.64 samples/sec Loss 13.8788 LearningRate 0.0879 Epoch: 1 Global Step: 51980 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:34:26,659-Speed 2553.02 samples/sec Loss 13.6067 LearningRate 0.0879 Epoch: 1 Global Step: 51990 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:34:30,559-Speed 2625.97 samples/sec Loss 13.9771 LearningRate 0.0879 Epoch: 1 Global Step: 52000 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:34:34,459-Speed 2626.75 samples/sec Loss 13.8020 LearningRate 0.0879 Epoch: 1 Global Step: 52010 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:34:38,370-Speed 2619.21 samples/sec Loss 13.7722 LearningRate 0.0879 Epoch: 1 Global Step: 52020 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:34:42,283-Speed 2617.29 samples/sec Loss 13.8503 LearningRate 0.0878 Epoch: 1 Global Step: 52030 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:34:46,192-Speed 2620.12 samples/sec Loss 13.8325 LearningRate 0.0878 Epoch: 1 Global Step: 52040 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:34:50,083-Speed 2632.09 samples/sec Loss 13.8765 LearningRate 0.0878 Epoch: 1 Global Step: 52050 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:34:53,989-Speed 2623.03 samples/sec Loss 13.7307 LearningRate 0.0878 Epoch: 1 Global Step: 52060 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:34:57,892-Speed 2624.40 samples/sec Loss 13.7546 LearningRate 0.0878 Epoch: 1 Global Step: 52070 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:35:01,887-Speed 2563.11 samples/sec Loss 13.7472 LearningRate 0.0878 Epoch: 1 Global Step: 52080 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:35:05,805-Speed 2614.57 samples/sec Loss 13.7750 LearningRate 0.0878 Epoch: 1 Global Step: 52090 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:35:09,709-Speed 2623.21 samples/sec Loss 13.6906 LearningRate 0.0878 Epoch: 1 Global Step: 52100 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:35:13,609-Speed 2626.70 samples/sec Loss 13.9099 LearningRate 0.0878 Epoch: 1 Global Step: 52110 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:35:17,513-Speed 2623.65 samples/sec Loss 13.8067 LearningRate 0.0878 Epoch: 1 Global Step: 52120 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:35:21,524-Speed 2553.44 samples/sec Loss 13.6802 LearningRate 0.0878 Epoch: 1 Global Step: 52130 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:35:25,430-Speed 2622.66 samples/sec Loss 13.6334 LearningRate 0.0878 Epoch: 1 Global Step: 52140 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:35:29,332-Speed 2625.66 samples/sec Loss 13.7045 LearningRate 0.0878 Epoch: 1 Global Step: 52150 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:35:33,243-Speed 2619.13 samples/sec Loss 13.8946 LearningRate 0.0878 Epoch: 1 Global Step: 52160 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:35:37,130-Speed 2634.76 samples/sec Loss 13.9137 LearningRate 0.0878 Epoch: 1 Global Step: 52170 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:35:41,041-Speed 2618.76 samples/sec Loss 13.6525 LearningRate 0.0878 Epoch: 1 Global Step: 52180 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:35:44,929-Speed 2634.29 samples/sec Loss 13.8903 LearningRate 0.0878 Epoch: 1 Global Step: 52190 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:35:48,840-Speed 2619.69 samples/sec Loss 13.9025 LearningRate 0.0878 Epoch: 1 Global Step: 52200 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:35:52,760-Speed 2612.82 samples/sec Loss 13.7155 LearningRate 0.0878 Epoch: 1 Global Step: 52210 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:35:56,755-Speed 2564.36 samples/sec Loss 13.7172 LearningRate 0.0878 Epoch: 1 Global Step: 52220 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:36:00,655-Speed 2625.75 samples/sec Loss 13.7431 LearningRate 0.0878 Epoch: 1 Global Step: 52230 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:36:04,563-Speed 2621.22 samples/sec Loss 13.6200 LearningRate 0.0878 Epoch: 1 Global Step: 52240 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:36:08,531-Speed 2580.66 samples/sec Loss 13.8220 LearningRate 0.0878 Epoch: 1 Global Step: 52250 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:36:12,441-Speed 2620.14 samples/sec Loss 13.8414 LearningRate 0.0878 Epoch: 1 Global Step: 52260 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:36:16,351-Speed 2619.28 samples/sec Loss 13.8715 LearningRate 0.0878 Epoch: 1 Global Step: 52270 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:36:20,269-Speed 2614.34 samples/sec Loss 13.8708 LearningRate 0.0878 Epoch: 1 Global Step: 52280 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:36:24,186-Speed 2614.98 samples/sec Loss 13.8384 LearningRate 0.0878 Epoch: 1 Global Step: 52290 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:36:28,092-Speed 2622.87 samples/sec Loss 13.7355 LearningRate 0.0878 Epoch: 1 Global Step: 52300 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:36:32,001-Speed 2620.22 samples/sec Loss 13.8302 LearningRate 0.0878 Epoch: 1 Global Step: 52310 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:36:35,920-Speed 2613.31 samples/sec Loss 13.7040 LearningRate 0.0878 Epoch: 1 Global Step: 52320 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:36:39,831-Speed 2618.85 samples/sec Loss 13.6456 LearningRate 0.0878 Epoch: 1 Global Step: 52330 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:36:43,724-Speed 2630.66 samples/sec Loss 13.7763 LearningRate 0.0878 Epoch: 1 Global Step: 52340 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:36:47,630-Speed 2622.53 samples/sec Loss 13.8707 LearningRate 0.0878 Epoch: 1 Global Step: 52350 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:36:51,524-Speed 2630.60 samples/sec Loss 13.7260 LearningRate 0.0878 Epoch: 1 Global Step: 52360 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:36:55,418-Speed 2630.24 samples/sec Loss 13.8242 LearningRate 0.0878 Epoch: 1 Global Step: 52370 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:36:59,269-Speed 2659.31 samples/sec Loss 13.7883 LearningRate 0.0878 Epoch: 1 Global Step: 52380 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 01:37:03,162-Speed 2631.80 samples/sec Loss 13.9432 LearningRate 0.0878 Epoch: 1 Global Step: 52390 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 01:37:07,270-Speed 2492.68 samples/sec Loss 13.7609 LearningRate 0.0878 Epoch: 1 Global Step: 52400 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 01:37:11,188-Speed 2614.55 samples/sec Loss 13.8524 LearningRate 0.0878 Epoch: 1 Global Step: 52410 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 01:37:15,090-Speed 2625.31 samples/sec Loss 13.8315 LearningRate 0.0878 Epoch: 1 Global Step: 52420 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 01:37:18,983-Speed 2631.06 samples/sec Loss 13.9785 LearningRate 0.0878 Epoch: 1 Global Step: 52430 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 01:37:22,881-Speed 2627.87 samples/sec Loss 13.8113 LearningRate 0.0878 Epoch: 1 Global Step: 52440 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 01:37:26,779-Speed 2627.35 samples/sec Loss 13.8287 LearningRate 0.0878 Epoch: 1 Global Step: 52450 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 01:37:30,695-Speed 2615.21 samples/sec Loss 13.7797 LearningRate 0.0878 Epoch: 1 Global Step: 52460 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 01:37:34,594-Speed 2627.20 samples/sec Loss 13.9839 LearningRate 0.0878 Epoch: 1 Global Step: 52470 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 01:37:38,491-Speed 2628.57 samples/sec Loss 13.7543 LearningRate 0.0877 Epoch: 1 Global Step: 52480 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:37:42,390-Speed 2626.76 samples/sec Loss 13.9084 LearningRate 0.0877 Epoch: 1 Global Step: 52490 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:37:46,288-Speed 2627.62 samples/sec Loss 13.7483 LearningRate 0.0877 Epoch: 1 Global Step: 52500 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:37:50,181-Speed 2631.00 samples/sec Loss 13.8041 LearningRate 0.0877 Epoch: 1 Global Step: 52510 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:37:54,096-Speed 2616.15 samples/sec Loss 13.8797 LearningRate 0.0877 Epoch: 1 Global Step: 52520 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:37:58,004-Speed 2620.86 samples/sec Loss 13.7482 LearningRate 0.0877 Epoch: 1 Global Step: 52530 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:38:01,903-Speed 2627.35 samples/sec Loss 13.8932 LearningRate 0.0877 Epoch: 1 Global Step: 52540 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:38:05,802-Speed 2626.95 samples/sec Loss 13.7424 LearningRate 0.0877 Epoch: 1 Global Step: 52550 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:38:09,697-Speed 2629.34 samples/sec Loss 13.7106 LearningRate 0.0877 Epoch: 1 Global Step: 52560 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:38:13,626-Speed 2607.33 samples/sec Loss 13.8361 LearningRate 0.0877 Epoch: 1 Global Step: 52570 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:38:17,521-Speed 2629.34 samples/sec Loss 13.7000 LearningRate 0.0877 Epoch: 1 Global Step: 52580 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:38:21,420-Speed 2627.45 samples/sec Loss 13.8035 LearningRate 0.0877 Epoch: 1 Global Step: 52590 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:38:25,316-Speed 2629.16 samples/sec Loss 13.8408 LearningRate 0.0877 Epoch: 1 Global Step: 52600 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:38:29,220-Speed 2623.30 samples/sec Loss 13.7362 LearningRate 0.0877 Epoch: 1 Global Step: 52610 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:38:33,120-Speed 2626.46 samples/sec Loss 13.8655 LearningRate 0.0877 Epoch: 1 Global Step: 52620 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:38:37,011-Speed 2632.08 samples/sec Loss 13.7003 LearningRate 0.0877 Epoch: 1 Global Step: 52630 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:38:40,907-Speed 2629.15 samples/sec Loss 13.7286 LearningRate 0.0877 Epoch: 1 Global Step: 52640 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:38:44,810-Speed 2624.32 samples/sec Loss 13.7823 LearningRate 0.0877 Epoch: 1 Global Step: 52650 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:38:48,722-Speed 2618.74 samples/sec Loss 13.7864 LearningRate 0.0877 Epoch: 1 Global Step: 52660 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:38:52,621-Speed 2626.84 samples/sec Loss 13.9610 LearningRate 0.0877 Epoch: 1 Global Step: 52670 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:38:56,518-Speed 2636.43 samples/sec Loss 13.8252 LearningRate 0.0877 Epoch: 1 Global Step: 52680 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:39:00,500-Speed 2572.18 samples/sec Loss 13.8174 LearningRate 0.0877 Epoch: 1 Global Step: 52690 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:39:04,421-Speed 2612.48 samples/sec Loss 13.9417 LearningRate 0.0877 Epoch: 1 Global Step: 52700 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:39:08,314-Speed 2630.60 samples/sec Loss 13.8732 LearningRate 0.0877 Epoch: 1 Global Step: 52710 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:39:12,228-Speed 2616.69 samples/sec Loss 13.6958 LearningRate 0.0877 Epoch: 1 Global Step: 52720 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:39:16,128-Speed 2626.50 samples/sec Loss 13.7830 LearningRate 0.0877 Epoch: 1 Global Step: 52730 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:39:20,037-Speed 2620.56 samples/sec Loss 13.9567 LearningRate 0.0877 Epoch: 1 Global Step: 52740 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:39:23,928-Speed 2632.78 samples/sec Loss 13.7712 LearningRate 0.0877 Epoch: 1 Global Step: 52750 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:39:27,824-Speed 2629.00 samples/sec Loss 13.6801 LearningRate 0.0877 Epoch: 1 Global Step: 52760 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:39:31,717-Speed 2631.23 samples/sec Loss 13.8046 LearningRate 0.0877 Epoch: 1 Global Step: 52770 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:39:35,648-Speed 2605.16 samples/sec Loss 13.8376 LearningRate 0.0877 Epoch: 1 Global Step: 52780 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:39:39,550-Speed 2625.03 samples/sec Loss 13.8615 LearningRate 0.0877 Epoch: 1 Global Step: 52790 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:39:43,486-Speed 2602.45 samples/sec Loss 13.8372 LearningRate 0.0877 Epoch: 1 Global Step: 52800 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:39:47,366-Speed 2639.84 samples/sec Loss 14.0038 LearningRate 0.0877 Epoch: 1 Global Step: 52810 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:39:51,251-Speed 2636.36 samples/sec Loss 13.7857 LearningRate 0.0877 Epoch: 1 Global Step: 52820 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:39:55,154-Speed 2624.26 samples/sec Loss 13.7883 LearningRate 0.0877 Epoch: 1 Global Step: 52830 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:39:59,052-Speed 2627.61 samples/sec Loss 13.7197 LearningRate 0.0877 Epoch: 1 Global Step: 52840 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:40:02,957-Speed 2623.52 samples/sec Loss 13.6750 LearningRate 0.0877 Epoch: 1 Global Step: 52850 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:40:06,850-Speed 2631.07 samples/sec Loss 13.7757 LearningRate 0.0877 Epoch: 1 Global Step: 52860 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:40:10,742-Speed 2631.81 samples/sec Loss 13.9032 LearningRate 0.0877 Epoch: 1 Global Step: 52870 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:40:14,635-Speed 2630.32 samples/sec Loss 13.9782 LearningRate 0.0877 Epoch: 1 Global Step: 52880 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:40:18,525-Speed 2636.36 samples/sec Loss 13.7587 LearningRate 0.0877 Epoch: 1 Global Step: 52890 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:40:22,433-Speed 2620.90 samples/sec Loss 13.7088 LearningRate 0.0877 Epoch: 1 Global Step: 52900 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:40:26,329-Speed 2628.55 samples/sec Loss 13.7000 LearningRate 0.0877 Epoch: 1 Global Step: 52910 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:40:30,210-Speed 2639.58 samples/sec Loss 13.7819 LearningRate 0.0876 Epoch: 1 Global Step: 52920 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:40:34,101-Speed 2632.71 samples/sec Loss 13.8919 LearningRate 0.0876 Epoch: 1 Global Step: 52930 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:40:37,994-Speed 2631.34 samples/sec Loss 13.6959 LearningRate 0.0876 Epoch: 1 Global Step: 52940 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:40:41,893-Speed 2626.81 samples/sec Loss 13.8269 LearningRate 0.0876 Epoch: 1 Global Step: 52950 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:40:45,785-Speed 2631.32 samples/sec Loss 13.8447 LearningRate 0.0876 Epoch: 1 Global Step: 52960 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:40:49,675-Speed 2633.29 samples/sec Loss 13.7280 LearningRate 0.0876 Epoch: 1 Global Step: 52970 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:40:53,569-Speed 2630.48 samples/sec Loss 13.7311 LearningRate 0.0876 Epoch: 1 Global Step: 52980 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:40:57,482-Speed 2617.25 samples/sec Loss 13.7358 LearningRate 0.0876 Epoch: 1 Global Step: 52990 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:41:01,379-Speed 2627.80 samples/sec Loss 13.7297 LearningRate 0.0876 Epoch: 1 Global Step: 53000 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:41:05,277-Speed 2627.51 samples/sec Loss 13.6960 LearningRate 0.0876 Epoch: 1 Global Step: 53010 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:41:09,189-Speed 2618.78 samples/sec Loss 13.7665 LearningRate 0.0876 Epoch: 1 Global Step: 53020 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:41:13,082-Speed 2631.07 samples/sec Loss 13.8479 LearningRate 0.0876 Epoch: 1 Global Step: 53030 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:41:16,973-Speed 2632.24 samples/sec Loss 13.8072 LearningRate 0.0876 Epoch: 1 Global Step: 53040 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:41:20,860-Speed 2634.96 samples/sec Loss 13.6556 LearningRate 0.0876 Epoch: 1 Global Step: 53050 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:41:24,751-Speed 2631.88 samples/sec Loss 13.6085 LearningRate 0.0876 Epoch: 1 Global Step: 53060 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:41:28,644-Speed 2631.97 samples/sec Loss 13.6570 LearningRate 0.0876 Epoch: 1 Global Step: 53070 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:41:32,533-Speed 2633.42 samples/sec Loss 13.6987 LearningRate 0.0876 Epoch: 1 Global Step: 53080 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:41:36,430-Speed 2627.83 samples/sec Loss 13.7573 LearningRate 0.0876 Epoch: 1 Global Step: 53090 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:41:40,322-Speed 2631.81 samples/sec Loss 13.6566 LearningRate 0.0876 Epoch: 1 Global Step: 53100 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:41:44,224-Speed 2625.59 samples/sec Loss 13.7994 LearningRate 0.0876 Epoch: 1 Global Step: 53110 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:41:48,118-Speed 2630.77 samples/sec Loss 13.8518 LearningRate 0.0876 Epoch: 1 Global Step: 53120 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:41:52,010-Speed 2631.98 samples/sec Loss 13.7930 LearningRate 0.0876 Epoch: 1 Global Step: 53130 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:41:55,898-Speed 2633.98 samples/sec Loss 13.8020 LearningRate 0.0876 Epoch: 1 Global Step: 53140 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:41:59,801-Speed 2624.49 samples/sec Loss 13.7022 LearningRate 0.0876 Epoch: 1 Global Step: 53150 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:42:03,706-Speed 2623.49 samples/sec Loss 13.8159 LearningRate 0.0876 Epoch: 1 Global Step: 53160 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:42:07,597-Speed 2631.74 samples/sec Loss 13.8742 LearningRate 0.0876 Epoch: 1 Global Step: 53170 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:42:11,505-Speed 2620.84 samples/sec Loss 13.6655 LearningRate 0.0876 Epoch: 1 Global Step: 53180 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:42:15,410-Speed 2623.02 samples/sec Loss 13.6698 LearningRate 0.0876 Epoch: 1 Global Step: 53190 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:42:19,311-Speed 2626.07 samples/sec Loss 13.8304 LearningRate 0.0876 Epoch: 1 Global Step: 53200 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:42:23,211-Speed 2626.19 samples/sec Loss 13.6162 LearningRate 0.0876 Epoch: 1 Global Step: 53210 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:42:27,113-Speed 2625.30 samples/sec Loss 13.7340 LearningRate 0.0876 Epoch: 1 Global Step: 53220 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:42:31,010-Speed 2628.47 samples/sec Loss 13.9467 LearningRate 0.0876 Epoch: 1 Global Step: 53230 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:42:34,911-Speed 2625.19 samples/sec Loss 13.7384 LearningRate 0.0876 Epoch: 1 Global Step: 53240 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:42:38,821-Speed 2619.48 samples/sec Loss 13.7431 LearningRate 0.0876 Epoch: 1 Global Step: 53250 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:42:42,721-Speed 2626.11 samples/sec Loss 13.7665 LearningRate 0.0876 Epoch: 1 Global Step: 53260 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:42:46,620-Speed 2626.90 samples/sec Loss 13.8963 LearningRate 0.0876 Epoch: 1 Global Step: 53270 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:42:50,521-Speed 2625.72 samples/sec Loss 13.7227 LearningRate 0.0876 Epoch: 1 Global Step: 53280 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:42:54,419-Speed 2627.97 samples/sec Loss 13.7004 LearningRate 0.0876 Epoch: 1 Global Step: 53290 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:42:58,376-Speed 2588.81 samples/sec Loss 13.7706 LearningRate 0.0876 Epoch: 1 Global Step: 53300 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:43:02,315-Speed 2600.18 samples/sec Loss 13.7965 LearningRate 0.0876 Epoch: 1 Global Step: 53310 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:43:06,208-Speed 2630.80 samples/sec Loss 13.5808 LearningRate 0.0876 Epoch: 1 Global Step: 53320 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:43:10,108-Speed 2626.04 samples/sec Loss 13.7853 LearningRate 0.0876 Epoch: 1 Global Step: 53330 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:43:13,995-Speed 2635.58 samples/sec Loss 13.7246 LearningRate 0.0876 Epoch: 1 Global Step: 53340 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:43:17,904-Speed 2620.16 samples/sec Loss 13.8359 LearningRate 0.0876 Epoch: 1 Global Step: 53350 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:43:21,808-Speed 2624.20 samples/sec Loss 13.7876 LearningRate 0.0875 Epoch: 1 Global Step: 53360 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:43:25,706-Speed 2627.30 samples/sec Loss 13.6676 LearningRate 0.0875 Epoch: 1 Global Step: 53370 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:43:29,604-Speed 2627.76 samples/sec Loss 13.8325 LearningRate 0.0875 Epoch: 1 Global Step: 53380 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:43:33,503-Speed 2627.02 samples/sec Loss 13.8690 LearningRate 0.0875 Epoch: 1 Global Step: 53390 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:43:37,432-Speed 2606.93 samples/sec Loss 13.6935 LearningRate 0.0875 Epoch: 1 Global Step: 53400 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:43:41,332-Speed 2626.36 samples/sec Loss 13.7283 LearningRate 0.0875 Epoch: 1 Global Step: 53410 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:43:45,321-Speed 2568.15 samples/sec Loss 13.7908 LearningRate 0.0875 Epoch: 1 Global Step: 53420 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:43:49,338-Speed 2549.73 samples/sec Loss 13.9527 LearningRate 0.0875 Epoch: 1 Global Step: 53430 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:43:53,233-Speed 2629.97 samples/sec Loss 13.7750 LearningRate 0.0875 Epoch: 1 Global Step: 53440 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:43:57,150-Speed 2615.10 samples/sec Loss 13.7626 LearningRate 0.0875 Epoch: 1 Global Step: 53450 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:44:01,050-Speed 2626.25 samples/sec Loss 13.7850 LearningRate 0.0875 Epoch: 1 Global Step: 53460 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:44:04,962-Speed 2618.31 samples/sec Loss 13.7220 LearningRate 0.0875 Epoch: 1 Global Step: 53470 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:44:08,880-Speed 2613.80 samples/sec Loss 13.8328 LearningRate 0.0875 Epoch: 1 Global Step: 53480 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:44:12,789-Speed 2620.33 samples/sec Loss 13.7666 LearningRate 0.0875 Epoch: 1 Global Step: 53490 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:44:16,714-Speed 2609.10 samples/sec Loss 13.6406 LearningRate 0.0875 Epoch: 1 Global Step: 53500 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:44:20,626-Speed 2618.98 samples/sec Loss 13.7550 LearningRate 0.0875 Epoch: 1 Global Step: 53510 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:44:24,539-Speed 2617.74 samples/sec Loss 13.6552 LearningRate 0.0875 Epoch: 1 Global Step: 53520 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:44:28,461-Speed 2611.57 samples/sec Loss 13.7670 LearningRate 0.0875 Epoch: 1 Global Step: 53530 Fp16 Grad Scale: 524288 Required: 87 hours
Training: 2022-04-13 01:44:32,379-Speed 2614.37 samples/sec Loss 13.5855 LearningRate 0.0875 Epoch: 1 Global Step: 53540 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:44:36,274-Speed 2629.50 samples/sec Loss 13.8258 LearningRate 0.0875 Epoch: 1 Global Step: 53550 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:44:40,171-Speed 2628.55 samples/sec Loss 13.7005 LearningRate 0.0875 Epoch: 1 Global Step: 53560 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:44:44,066-Speed 2629.04 samples/sec Loss 13.6413 LearningRate 0.0875 Epoch: 1 Global Step: 53570 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:44:47,965-Speed 2626.76 samples/sec Loss 13.8424 LearningRate 0.0875 Epoch: 1 Global Step: 53580 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:44:51,844-Speed 2640.65 samples/sec Loss 13.6979 LearningRate 0.0875 Epoch: 1 Global Step: 53590 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:44:55,740-Speed 2629.36 samples/sec Loss 13.6775 LearningRate 0.0875 Epoch: 1 Global Step: 53600 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:44:59,635-Speed 2630.11 samples/sec Loss 13.7978 LearningRate 0.0875 Epoch: 1 Global Step: 53610 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:45:03,535-Speed 2626.15 samples/sec Loss 13.6526 LearningRate 0.0875 Epoch: 1 Global Step: 53620 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:45:07,437-Speed 2624.96 samples/sec Loss 13.7715 LearningRate 0.0875 Epoch: 1 Global Step: 53630 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:45:11,342-Speed 2622.91 samples/sec Loss 13.7469 LearningRate 0.0875 Epoch: 1 Global Step: 53640 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:45:15,247-Speed 2622.97 samples/sec Loss 13.7602 LearningRate 0.0875 Epoch: 1 Global Step: 53650 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:45:19,142-Speed 2629.33 samples/sec Loss 13.8489 LearningRate 0.0875 Epoch: 1 Global Step: 53660 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:45:23,039-Speed 2628.48 samples/sec Loss 13.7725 LearningRate 0.0875 Epoch: 1 Global Step: 53670 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:45:26,929-Speed 2632.63 samples/sec Loss 13.6763 LearningRate 0.0875 Epoch: 1 Global Step: 53680 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:45:30,863-Speed 2604.45 samples/sec Loss 13.7939 LearningRate 0.0875 Epoch: 1 Global Step: 53690 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:45:34,763-Speed 2626.42 samples/sec Loss 13.7383 LearningRate 0.0875 Epoch: 1 Global Step: 53700 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:45:38,660-Speed 2627.96 samples/sec Loss 13.6189 LearningRate 0.0875 Epoch: 1 Global Step: 53710 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:45:42,569-Speed 2620.78 samples/sec Loss 13.7159 LearningRate 0.0875 Epoch: 1 Global Step: 53720 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:45:46,469-Speed 2625.74 samples/sec Loss 13.7808 LearningRate 0.0875 Epoch: 1 Global Step: 53730 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:45:50,378-Speed 2620.16 samples/sec Loss 13.6068 LearningRate 0.0875 Epoch: 1 Global Step: 53740 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:45:54,281-Speed 2624.28 samples/sec Loss 13.7634 LearningRate 0.0875 Epoch: 1 Global Step: 53750 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:45:58,177-Speed 2629.42 samples/sec Loss 13.8529 LearningRate 0.0875 Epoch: 1 Global Step: 53760 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:46:02,143-Speed 2582.93 samples/sec Loss 13.7174 LearningRate 0.0875 Epoch: 1 Global Step: 53770 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:46:06,042-Speed 2626.92 samples/sec Loss 13.7227 LearningRate 0.0875 Epoch: 1 Global Step: 53780 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:46:09,956-Speed 2616.89 samples/sec Loss 13.8368 LearningRate 0.0875 Epoch: 1 Global Step: 53790 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:46:13,854-Speed 2627.35 samples/sec Loss 13.6515 LearningRate 0.0875 Epoch: 1 Global Step: 53800 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:46:17,764-Speed 2619.56 samples/sec Loss 13.8034 LearningRate 0.0874 Epoch: 1 Global Step: 53810 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:46:21,676-Speed 2617.80 samples/sec Loss 13.7027 LearningRate 0.0874 Epoch: 1 Global Step: 53820 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:46:25,592-Speed 2615.63 samples/sec Loss 13.5605 LearningRate 0.0874 Epoch: 1 Global Step: 53830 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:46:29,487-Speed 2629.77 samples/sec Loss 13.8104 LearningRate 0.0874 Epoch: 1 Global Step: 53840 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:46:33,370-Speed 2638.29 samples/sec Loss 13.7037 LearningRate 0.0874 Epoch: 1 Global Step: 53850 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:46:37,281-Speed 2618.80 samples/sec Loss 13.7385 LearningRate 0.0874 Epoch: 1 Global Step: 53860 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:46:41,196-Speed 2615.91 samples/sec Loss 13.8441 LearningRate 0.0874 Epoch: 1 Global Step: 53870 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:46:45,162-Speed 2582.39 samples/sec Loss 13.7214 LearningRate 0.0874 Epoch: 1 Global Step: 53880 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:46:49,067-Speed 2623.47 samples/sec Loss 13.8635 LearningRate 0.0874 Epoch: 1 Global Step: 53890 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:46:52,979-Speed 2618.07 samples/sec Loss 13.6978 LearningRate 0.0874 Epoch: 1 Global Step: 53900 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:46:56,876-Speed 2628.23 samples/sec Loss 13.8205 LearningRate 0.0874 Epoch: 1 Global Step: 53910 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:47:00,772-Speed 2629.17 samples/sec Loss 13.6522 LearningRate 0.0874 Epoch: 1 Global Step: 53920 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:47:04,672-Speed 2625.98 samples/sec Loss 13.6237 LearningRate 0.0874 Epoch: 1 Global Step: 53930 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:47:08,575-Speed 2625.28 samples/sec Loss 13.8033 LearningRate 0.0874 Epoch: 1 Global Step: 53940 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:47:12,469-Speed 2630.10 samples/sec Loss 13.5626 LearningRate 0.0874 Epoch: 1 Global Step: 53950 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:47:16,365-Speed 2628.55 samples/sec Loss 13.7314 LearningRate 0.0874 Epoch: 1 Global Step: 53960 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:47:20,269-Speed 2624.32 samples/sec Loss 13.7197 LearningRate 0.0874 Epoch: 1 Global Step: 53970 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:47:24,174-Speed 2622.55 samples/sec Loss 13.7034 LearningRate 0.0874 Epoch: 1 Global Step: 53980 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:47:28,066-Speed 2632.00 samples/sec Loss 13.8059 LearningRate 0.0874 Epoch: 1 Global Step: 53990 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:47:31,961-Speed 2629.10 samples/sec Loss 13.6739 LearningRate 0.0874 Epoch: 1 Global Step: 54000 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:47:35,864-Speed 2624.13 samples/sec Loss 13.6514 LearningRate 0.0874 Epoch: 1 Global Step: 54010 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:47:39,757-Speed 2630.82 samples/sec Loss 13.6138 LearningRate 0.0874 Epoch: 1 Global Step: 54020 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:47:43,692-Speed 2603.38 samples/sec Loss 13.7545 LearningRate 0.0874 Epoch: 1 Global Step: 54030 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:47:47,608-Speed 2615.55 samples/sec Loss 13.6865 LearningRate 0.0874 Epoch: 1 Global Step: 54040 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:47:51,490-Speed 2638.89 samples/sec Loss 13.6893 LearningRate 0.0874 Epoch: 1 Global Step: 54050 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:47:55,385-Speed 2629.77 samples/sec Loss 13.7684 LearningRate 0.0874 Epoch: 1 Global Step: 54060 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:47:59,278-Speed 2631.51 samples/sec Loss 13.7073 LearningRate 0.0874 Epoch: 1 Global Step: 54070 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:48:03,173-Speed 2629.43 samples/sec Loss 13.7484 LearningRate 0.0874 Epoch: 1 Global Step: 54080 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:48:07,047-Speed 2643.48 samples/sec Loss 13.8017 LearningRate 0.0874 Epoch: 1 Global Step: 54090 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:48:10,943-Speed 2629.25 samples/sec Loss 13.7154 LearningRate 0.0874 Epoch: 1 Global Step: 54100 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:48:14,834-Speed 2632.49 samples/sec Loss 13.7939 LearningRate 0.0874 Epoch: 1 Global Step: 54110 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:48:18,728-Speed 2630.23 samples/sec Loss 13.7372 LearningRate 0.0874 Epoch: 1 Global Step: 54120 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:48:22,626-Speed 2628.03 samples/sec Loss 13.6756 LearningRate 0.0874 Epoch: 1 Global Step: 54130 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:48:26,518-Speed 2631.30 samples/sec Loss 13.5967 LearningRate 0.0874 Epoch: 1 Global Step: 54140 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:48:30,433-Speed 2617.05 samples/sec Loss 13.7462 LearningRate 0.0874 Epoch: 1 Global Step: 54150 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:48:34,324-Speed 2631.67 samples/sec Loss 13.8271 LearningRate 0.0874 Epoch: 1 Global Step: 54160 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:48:38,220-Speed 2628.99 samples/sec Loss 13.7006 LearningRate 0.0874 Epoch: 1 Global Step: 54170 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:48:42,114-Speed 2630.47 samples/sec Loss 13.6841 LearningRate 0.0874 Epoch: 1 Global Step: 54180 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:48:46,006-Speed 2631.57 samples/sec Loss 13.5321 LearningRate 0.0874 Epoch: 1 Global Step: 54190 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:48:49,900-Speed 2630.62 samples/sec Loss 13.6558 LearningRate 0.0874 Epoch: 1 Global Step: 54200 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:48:53,797-Speed 2627.99 samples/sec Loss 13.6046 LearningRate 0.0874 Epoch: 1 Global Step: 54210 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:48:57,698-Speed 2626.16 samples/sec Loss 13.7698 LearningRate 0.0874 Epoch: 1 Global Step: 54220 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:49:01,580-Speed 2638.28 samples/sec Loss 13.6126 LearningRate 0.0874 Epoch: 1 Global Step: 54230 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:49:05,480-Speed 2626.50 samples/sec Loss 13.8215 LearningRate 0.0874 Epoch: 1 Global Step: 54240 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:49:09,370-Speed 2632.83 samples/sec Loss 13.5269 LearningRate 0.0873 Epoch: 1 Global Step: 54250 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:49:13,266-Speed 2628.74 samples/sec Loss 13.7729 LearningRate 0.0873 Epoch: 1 Global Step: 54260 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:49:17,164-Speed 2627.83 samples/sec Loss 13.5676 LearningRate 0.0873 Epoch: 1 Global Step: 54270 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:49:21,065-Speed 2625.82 samples/sec Loss 13.6328 LearningRate 0.0873 Epoch: 1 Global Step: 54280 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:49:24,965-Speed 2625.89 samples/sec Loss 13.7019 LearningRate 0.0873 Epoch: 1 Global Step: 54290 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:49:28,864-Speed 2627.55 samples/sec Loss 13.5531 LearningRate 0.0873 Epoch: 1 Global Step: 54300 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:49:32,762-Speed 2627.84 samples/sec Loss 13.6156 LearningRate 0.0873 Epoch: 1 Global Step: 54310 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:49:36,659-Speed 2628.34 samples/sec Loss 13.6757 LearningRate 0.0873 Epoch: 1 Global Step: 54320 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:49:40,555-Speed 2628.24 samples/sec Loss 13.7351 LearningRate 0.0873 Epoch: 1 Global Step: 54330 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:49:44,451-Speed 2629.63 samples/sec Loss 13.6473 LearningRate 0.0873 Epoch: 1 Global Step: 54340 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:49:48,372-Speed 2611.78 samples/sec Loss 13.5441 LearningRate 0.0873 Epoch: 1 Global Step: 54350 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:49:52,265-Speed 2631.43 samples/sec Loss 13.7120 LearningRate 0.0873 Epoch: 1 Global Step: 54360 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:49:56,161-Speed 2628.94 samples/sec Loss 13.5107 LearningRate 0.0873 Epoch: 1 Global Step: 54370 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:50:00,061-Speed 2626.43 samples/sec Loss 13.7773 LearningRate 0.0873 Epoch: 1 Global Step: 54380 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:50:03,961-Speed 2626.61 samples/sec Loss 13.6130 LearningRate 0.0873 Epoch: 1 Global Step: 54390 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:50:07,855-Speed 2630.00 samples/sec Loss 13.6883 LearningRate 0.0873 Epoch: 1 Global Step: 54400 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:50:11,747-Speed 2631.63 samples/sec Loss 13.6674 LearningRate 0.0873 Epoch: 1 Global Step: 54410 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:50:15,640-Speed 2631.15 samples/sec Loss 13.6139 LearningRate 0.0873 Epoch: 1 Global Step: 54420 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:50:19,542-Speed 2625.33 samples/sec Loss 13.6513 LearningRate 0.0873 Epoch: 1 Global Step: 54430 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:50:23,450-Speed 2620.80 samples/sec Loss 13.5922 LearningRate 0.0873 Epoch: 1 Global Step: 54440 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:50:27,371-Speed 2612.55 samples/sec Loss 13.6419 LearningRate 0.0873 Epoch: 1 Global Step: 54450 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:50:31,260-Speed 2633.80 samples/sec Loss 13.8162 LearningRate 0.0873 Epoch: 1 Global Step: 54460 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:50:35,152-Speed 2631.94 samples/sec Loss 13.8087 LearningRate 0.0873 Epoch: 1 Global Step: 54470 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:50:39,048-Speed 2629.24 samples/sec Loss 13.6126 LearningRate 0.0873 Epoch: 1 Global Step: 54480 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:50:42,933-Speed 2636.61 samples/sec Loss 13.7174 LearningRate 0.0873 Epoch: 1 Global Step: 54490 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:50:46,834-Speed 2625.85 samples/sec Loss 13.6303 LearningRate 0.0873 Epoch: 1 Global Step: 54500 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:50:50,748-Speed 2616.13 samples/sec Loss 13.7373 LearningRate 0.0873 Epoch: 1 Global Step: 54510 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:50:54,669-Speed 2612.67 samples/sec Loss 13.7268 LearningRate 0.0873 Epoch: 1 Global Step: 54520 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:50:58,615-Speed 2595.74 samples/sec Loss 13.7055 LearningRate 0.0873 Epoch: 1 Global Step: 54530 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:51:02,516-Speed 2625.55 samples/sec Loss 13.7393 LearningRate 0.0873 Epoch: 1 Global Step: 54540 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:51:06,416-Speed 2626.80 samples/sec Loss 13.7177 LearningRate 0.0873 Epoch: 1 Global Step: 54550 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:51:10,312-Speed 2628.35 samples/sec Loss 13.6632 LearningRate 0.0873 Epoch: 1 Global Step: 54560 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:51:14,206-Speed 2630.79 samples/sec Loss 13.6552 LearningRate 0.0873 Epoch: 1 Global Step: 54570 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:51:18,100-Speed 2630.29 samples/sec Loss 13.7417 LearningRate 0.0873 Epoch: 1 Global Step: 54580 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:51:21,990-Speed 2632.86 samples/sec Loss 13.6880 LearningRate 0.0873 Epoch: 1 Global Step: 54590 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:51:25,883-Speed 2630.66 samples/sec Loss 13.6952 LearningRate 0.0873 Epoch: 1 Global Step: 54600 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:51:29,756-Speed 2645.11 samples/sec Loss 13.5968 LearningRate 0.0873 Epoch: 1 Global Step: 54610 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:51:33,649-Speed 2630.91 samples/sec Loss 13.6486 LearningRate 0.0873 Epoch: 1 Global Step: 54620 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:51:37,543-Speed 2630.26 samples/sec Loss 13.6180 LearningRate 0.0873 Epoch: 1 Global Step: 54630 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:51:41,440-Speed 2628.91 samples/sec Loss 13.7042 LearningRate 0.0873 Epoch: 1 Global Step: 54640 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:51:45,349-Speed 2620.34 samples/sec Loss 13.5928 LearningRate 0.0873 Epoch: 1 Global Step: 54650 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:51:49,248-Speed 2626.32 samples/sec Loss 13.6315 LearningRate 0.0873 Epoch: 1 Global Step: 54660 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:51:53,140-Speed 2631.49 samples/sec Loss 13.6213 LearningRate 0.0873 Epoch: 1 Global Step: 54670 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:51:57,034-Speed 2630.65 samples/sec Loss 13.7388 LearningRate 0.0873 Epoch: 1 Global Step: 54680 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:52:00,961-Speed 2608.55 samples/sec Loss 13.7087 LearningRate 0.0872 Epoch: 1 Global Step: 54690 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:52:04,850-Speed 2633.95 samples/sec Loss 13.6386 LearningRate 0.0872 Epoch: 1 Global Step: 54700 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:52:08,750-Speed 2626.20 samples/sec Loss 13.5663 LearningRate 0.0872 Epoch: 1 Global Step: 54710 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:52:12,645-Speed 2629.86 samples/sec Loss 13.8129 LearningRate 0.0872 Epoch: 1 Global Step: 54720 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:52:16,538-Speed 2631.24 samples/sec Loss 13.6373 LearningRate 0.0872 Epoch: 1 Global Step: 54730 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:52:20,433-Speed 2629.28 samples/sec Loss 13.4661 LearningRate 0.0872 Epoch: 1 Global Step: 54740 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:52:24,323-Speed 2633.07 samples/sec Loss 13.6804 LearningRate 0.0872 Epoch: 1 Global Step: 54750 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:52:28,222-Speed 2627.21 samples/sec Loss 13.8524 LearningRate 0.0872 Epoch: 1 Global Step: 54760 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:52:32,112-Speed 2632.83 samples/sec Loss 13.6612 LearningRate 0.0872 Epoch: 1 Global Step: 54770 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:52:36,018-Speed 2622.64 samples/sec Loss 13.7980 LearningRate 0.0872 Epoch: 1 Global Step: 54780 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:52:39,908-Speed 2633.42 samples/sec Loss 13.7978 LearningRate 0.0872 Epoch: 1 Global Step: 54790 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:52:43,827-Speed 2613.61 samples/sec Loss 13.7536 LearningRate 0.0872 Epoch: 1 Global Step: 54800 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:52:47,720-Speed 2631.61 samples/sec Loss 13.5556 LearningRate 0.0872 Epoch: 1 Global Step: 54810 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:52:51,609-Speed 2633.23 samples/sec Loss 13.7827 LearningRate 0.0872 Epoch: 1 Global Step: 54820 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 01:52:55,508-Speed 2627.12 samples/sec Loss 13.5899 LearningRate 0.0872 Epoch: 1 Global Step: 54830 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:52:59,401-Speed 2630.90 samples/sec Loss 13.6300 LearningRate 0.0872 Epoch: 1 Global Step: 54840 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:53:03,294-Speed 2631.47 samples/sec Loss 13.7222 LearningRate 0.0872 Epoch: 1 Global Step: 54850 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:53:07,187-Speed 2630.48 samples/sec Loss 13.6537 LearningRate 0.0872 Epoch: 1 Global Step: 54860 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:53:11,080-Speed 2631.66 samples/sec Loss 13.6930 LearningRate 0.0872 Epoch: 1 Global Step: 54870 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:53:14,984-Speed 2623.59 samples/sec Loss 13.8576 LearningRate 0.0872 Epoch: 1 Global Step: 54880 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:53:18,876-Speed 2631.55 samples/sec Loss 13.7161 LearningRate 0.0872 Epoch: 1 Global Step: 54890 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:53:22,777-Speed 2626.38 samples/sec Loss 13.6541 LearningRate 0.0872 Epoch: 1 Global Step: 54900 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:53:26,670-Speed 2630.83 samples/sec Loss 13.5367 LearningRate 0.0872 Epoch: 1 Global Step: 54910 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:53:30,562-Speed 2632.24 samples/sec Loss 13.7038 LearningRate 0.0872 Epoch: 1 Global Step: 54920 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:53:34,468-Speed 2621.64 samples/sec Loss 13.7391 LearningRate 0.0872 Epoch: 1 Global Step: 54930 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:53:38,361-Speed 2630.61 samples/sec Loss 13.7398 LearningRate 0.0872 Epoch: 1 Global Step: 54940 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:53:42,280-Speed 2613.89 samples/sec Loss 13.6437 LearningRate 0.0872 Epoch: 1 Global Step: 54950 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:53:46,184-Speed 2623.46 samples/sec Loss 13.6778 LearningRate 0.0872 Epoch: 1 Global Step: 54960 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:53:50,083-Speed 2627.52 samples/sec Loss 13.5749 LearningRate 0.0872 Epoch: 1 Global Step: 54970 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:53:53,976-Speed 2631.01 samples/sec Loss 13.6232 LearningRate 0.0872 Epoch: 1 Global Step: 54980 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:53:57,875-Speed 2627.08 samples/sec Loss 13.4947 LearningRate 0.0872 Epoch: 1 Global Step: 54990 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:54:01,767-Speed 2631.29 samples/sec Loss 13.7350 LearningRate 0.0872 Epoch: 1 Global Step: 55000 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:54:05,661-Speed 2630.32 samples/sec Loss 13.6247 LearningRate 0.0872 Epoch: 1 Global Step: 55010 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:54:09,571-Speed 2619.51 samples/sec Loss 13.7300 LearningRate 0.0872 Epoch: 1 Global Step: 55020 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:54:13,465-Speed 2630.54 samples/sec Loss 13.5364 LearningRate 0.0872 Epoch: 1 Global Step: 55030 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:54:17,357-Speed 2631.51 samples/sec Loss 13.7922 LearningRate 0.0872 Epoch: 1 Global Step: 55040 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:54:21,262-Speed 2623.10 samples/sec Loss 13.7569 LearningRate 0.0872 Epoch: 1 Global Step: 55050 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:54:25,169-Speed 2621.78 samples/sec Loss 13.6421 LearningRate 0.0872 Epoch: 1 Global Step: 55060 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:54:29,081-Speed 2618.51 samples/sec Loss 13.6648 LearningRate 0.0872 Epoch: 1 Global Step: 55070 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:54:32,983-Speed 2624.28 samples/sec Loss 13.5414 LearningRate 0.0872 Epoch: 1 Global Step: 55080 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:54:36,893-Speed 2619.58 samples/sec Loss 13.4401 LearningRate 0.0872 Epoch: 1 Global Step: 55090 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:54:40,792-Speed 2626.92 samples/sec Loss 13.6351 LearningRate 0.0872 Epoch: 1 Global Step: 55100 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:54:44,694-Speed 2625.40 samples/sec Loss 13.7467 LearningRate 0.0872 Epoch: 1 Global Step: 55110 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:54:48,588-Speed 2630.15 samples/sec Loss 13.5481 LearningRate 0.0872 Epoch: 1 Global Step: 55120 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:54:52,481-Speed 2631.26 samples/sec Loss 13.6540 LearningRate 0.0872 Epoch: 1 Global Step: 55130 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:54:56,372-Speed 2632.40 samples/sec Loss 13.7258 LearningRate 0.0871 Epoch: 1 Global Step: 55140 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:55:00,273-Speed 2625.37 samples/sec Loss 13.7318 LearningRate 0.0871 Epoch: 1 Global Step: 55150 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:55:04,167-Speed 2630.58 samples/sec Loss 13.6682 LearningRate 0.0871 Epoch: 1 Global Step: 55160 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:55:08,060-Speed 2630.70 samples/sec Loss 13.6021 LearningRate 0.0871 Epoch: 1 Global Step: 55170 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:55:11,956-Speed 2629.04 samples/sec Loss 13.6352 LearningRate 0.0871 Epoch: 1 Global Step: 55180 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:55:15,847-Speed 2632.17 samples/sec Loss 13.5620 LearningRate 0.0871 Epoch: 1 Global Step: 55190 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:55:19,741-Speed 2630.87 samples/sec Loss 13.5537 LearningRate 0.0871 Epoch: 1 Global Step: 55200 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:55:23,636-Speed 2630.06 samples/sec Loss 13.7027 LearningRate 0.0871 Epoch: 1 Global Step: 55210 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:55:27,525-Speed 2633.01 samples/sec Loss 13.6170 LearningRate 0.0871 Epoch: 1 Global Step: 55220 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:55:31,415-Speed 2633.72 samples/sec Loss 13.6214 LearningRate 0.0871 Epoch: 1 Global Step: 55230 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:55:35,308-Speed 2630.39 samples/sec Loss 13.6862 LearningRate 0.0871 Epoch: 1 Global Step: 55240 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:55:39,212-Speed 2623.36 samples/sec Loss 13.5828 LearningRate 0.0871 Epoch: 1 Global Step: 55250 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:55:43,104-Speed 2631.34 samples/sec Loss 13.6443 LearningRate 0.0871 Epoch: 1 Global Step: 55260 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:55:47,001-Speed 2628.67 samples/sec Loss 13.6968 LearningRate 0.0871 Epoch: 1 Global Step: 55270 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:55:50,894-Speed 2631.32 samples/sec Loss 13.6200 LearningRate 0.0871 Epoch: 1 Global Step: 55280 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:55:54,787-Speed 2630.57 samples/sec Loss 13.5623 LearningRate 0.0871 Epoch: 1 Global Step: 55290 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:55:58,682-Speed 2630.35 samples/sec Loss 13.6691 LearningRate 0.0871 Epoch: 1 Global Step: 55300 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:56:02,587-Speed 2622.73 samples/sec Loss 13.6071 LearningRate 0.0871 Epoch: 1 Global Step: 55310 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:56:06,467-Speed 2639.58 samples/sec Loss 13.6782 LearningRate 0.0871 Epoch: 1 Global Step: 55320 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:56:10,349-Speed 2637.88 samples/sec Loss 13.6381 LearningRate 0.0871 Epoch: 1 Global Step: 55330 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:56:14,253-Speed 2624.33 samples/sec Loss 13.7209 LearningRate 0.0871 Epoch: 1 Global Step: 55340 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:56:18,151-Speed 2627.40 samples/sec Loss 13.7201 LearningRate 0.0871 Epoch: 1 Global Step: 55350 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:56:22,044-Speed 2630.60 samples/sec Loss 13.6608 LearningRate 0.0871 Epoch: 1 Global Step: 55360 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:56:25,941-Speed 2628.74 samples/sec Loss 13.6239 LearningRate 0.0871 Epoch: 1 Global Step: 55370 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:56:29,836-Speed 2629.81 samples/sec Loss 13.5730 LearningRate 0.0871 Epoch: 1 Global Step: 55380 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:56:33,729-Speed 2630.61 samples/sec Loss 13.7705 LearningRate 0.0871 Epoch: 1 Global Step: 55390 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:56:37,637-Speed 2621.57 samples/sec Loss 13.6062 LearningRate 0.0871 Epoch: 1 Global Step: 55400 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:56:41,530-Speed 2630.77 samples/sec Loss 13.7033 LearningRate 0.0871 Epoch: 1 Global Step: 55410 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:56:45,441-Speed 2619.12 samples/sec Loss 13.5057 LearningRate 0.0871 Epoch: 1 Global Step: 55420 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:56:49,346-Speed 2622.77 samples/sec Loss 13.6385 LearningRate 0.0871 Epoch: 1 Global Step: 55430 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:56:53,240-Speed 2630.85 samples/sec Loss 13.7025 LearningRate 0.0871 Epoch: 1 Global Step: 55440 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:56:57,135-Speed 2629.62 samples/sec Loss 13.6025 LearningRate 0.0871 Epoch: 1 Global Step: 55450 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:57:01,039-Speed 2623.80 samples/sec Loss 13.7085 LearningRate 0.0871 Epoch: 1 Global Step: 55460 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:57:04,950-Speed 2619.12 samples/sec Loss 13.7756 LearningRate 0.0871 Epoch: 1 Global Step: 55470 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:57:08,854-Speed 2623.90 samples/sec Loss 13.5099 LearningRate 0.0871 Epoch: 1 Global Step: 55480 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:57:12,748-Speed 2630.34 samples/sec Loss 13.6364 LearningRate 0.0871 Epoch: 1 Global Step: 55490 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:57:16,651-Speed 2625.09 samples/sec Loss 13.7140 LearningRate 0.0871 Epoch: 1 Global Step: 55500 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:57:20,577-Speed 2608.58 samples/sec Loss 13.7257 LearningRate 0.0871 Epoch: 1 Global Step: 55510 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:57:24,472-Speed 2629.67 samples/sec Loss 13.6715 LearningRate 0.0871 Epoch: 1 Global Step: 55520 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:57:28,350-Speed 2641.23 samples/sec Loss 13.7073 LearningRate 0.0871 Epoch: 1 Global Step: 55530 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:57:32,263-Speed 2617.24 samples/sec Loss 13.6573 LearningRate 0.0871 Epoch: 1 Global Step: 55540 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:57:36,158-Speed 2629.53 samples/sec Loss 13.7104 LearningRate 0.0871 Epoch: 1 Global Step: 55550 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:57:40,065-Speed 2622.32 samples/sec Loss 13.4431 LearningRate 0.0871 Epoch: 1 Global Step: 55560 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:57:43,966-Speed 2625.90 samples/sec Loss 13.7545 LearningRate 0.0871 Epoch: 1 Global Step: 55570 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:57:47,875-Speed 2619.32 samples/sec Loss 13.5607 LearningRate 0.0870 Epoch: 1 Global Step: 55580 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:57:51,784-Speed 2620.62 samples/sec Loss 13.6100 LearningRate 0.0870 Epoch: 1 Global Step: 55590 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:57:55,697-Speed 2617.00 samples/sec Loss 13.7607 LearningRate 0.0870 Epoch: 1 Global Step: 55600 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:57:59,610-Speed 2618.17 samples/sec Loss 13.7278 LearningRate 0.0870 Epoch: 1 Global Step: 55610 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:58:03,506-Speed 2629.02 samples/sec Loss 13.5872 LearningRate 0.0870 Epoch: 1 Global Step: 55620 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:58:07,403-Speed 2628.65 samples/sec Loss 13.5043 LearningRate 0.0870 Epoch: 1 Global Step: 55630 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:58:11,286-Speed 2637.60 samples/sec Loss 13.7508 LearningRate 0.0870 Epoch: 1 Global Step: 55640 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:58:15,262-Speed 2576.19 samples/sec Loss 13.7510 LearningRate 0.0870 Epoch: 1 Global Step: 55650 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:58:19,193-Speed 2605.75 samples/sec Loss 13.4634 LearningRate 0.0870 Epoch: 1 Global Step: 55660 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:58:23,113-Speed 2612.56 samples/sec Loss 13.5692 LearningRate 0.0870 Epoch: 1 Global Step: 55670 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:58:27,012-Speed 2627.31 samples/sec Loss 13.5239 LearningRate 0.0870 Epoch: 1 Global Step: 55680 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:58:30,894-Speed 2638.36 samples/sec Loss 13.3937 LearningRate 0.0870 Epoch: 1 Global Step: 55690 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:58:34,791-Speed 2628.28 samples/sec Loss 13.5272 LearningRate 0.0870 Epoch: 1 Global Step: 55700 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:58:38,685-Speed 2630.35 samples/sec Loss 13.6138 LearningRate 0.0870 Epoch: 1 Global Step: 55710 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:58:42,581-Speed 2629.08 samples/sec Loss 13.6961 LearningRate 0.0870 Epoch: 1 Global Step: 55720 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:58:46,477-Speed 2628.97 samples/sec Loss 13.7348 LearningRate 0.0870 Epoch: 1 Global Step: 55730 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:58:50,372-Speed 2629.84 samples/sec Loss 13.5868 LearningRate 0.0870 Epoch: 1 Global Step: 55740 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:58:54,266-Speed 2630.15 samples/sec Loss 13.5894 LearningRate 0.0870 Epoch: 1 Global Step: 55750 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:58:58,169-Speed 2624.18 samples/sec Loss 13.6475 LearningRate 0.0870 Epoch: 1 Global Step: 55760 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:59:02,070-Speed 2625.50 samples/sec Loss 13.5032 LearningRate 0.0870 Epoch: 1 Global Step: 55770 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:59:05,968-Speed 2627.50 samples/sec Loss 13.5709 LearningRate 0.0870 Epoch: 1 Global Step: 55780 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 01:59:09,871-Speed 2624.67 samples/sec Loss 13.6148 LearningRate 0.0870 Epoch: 1 Global Step: 55790 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:59:13,905-Speed 2539.07 samples/sec Loss 13.4833 LearningRate 0.0870 Epoch: 1 Global Step: 55800 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:59:17,963-Speed 2523.76 samples/sec Loss 13.6007 LearningRate 0.0870 Epoch: 1 Global Step: 55810 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:59:21,863-Speed 2627.16 samples/sec Loss 13.6440 LearningRate 0.0870 Epoch: 1 Global Step: 55820 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:59:25,769-Speed 2622.06 samples/sec Loss 13.6493 LearningRate 0.0870 Epoch: 1 Global Step: 55830 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:59:29,669-Speed 2626.29 samples/sec Loss 13.5988 LearningRate 0.0870 Epoch: 1 Global Step: 55840 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:59:33,569-Speed 2626.12 samples/sec Loss 13.5851 LearningRate 0.0870 Epoch: 1 Global Step: 55850 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:59:37,477-Speed 2620.96 samples/sec Loss 13.6108 LearningRate 0.0870 Epoch: 1 Global Step: 55860 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:59:41,370-Speed 2630.32 samples/sec Loss 13.5935 LearningRate 0.0870 Epoch: 1 Global Step: 55870 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:59:45,267-Speed 2629.02 samples/sec Loss 13.5929 LearningRate 0.0870 Epoch: 1 Global Step: 55880 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 01:59:49,161-Speed 2630.22 samples/sec Loss 13.5587 LearningRate 0.0870 Epoch: 1 Global Step: 55890 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:59:53,090-Speed 2607.39 samples/sec Loss 13.6429 LearningRate 0.0870 Epoch: 1 Global Step: 55900 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 01:59:56,996-Speed 2622.12 samples/sec Loss 13.5280 LearningRate 0.0870 Epoch: 1 Global Step: 55910 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:00:00,904-Speed 2621.50 samples/sec Loss 13.7312 LearningRate 0.0870 Epoch: 1 Global Step: 55920 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:00:04,812-Speed 2620.16 samples/sec Loss 13.8084 LearningRate 0.0870 Epoch: 1 Global Step: 55930 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:00:08,721-Speed 2620.74 samples/sec Loss 13.6175 LearningRate 0.0870 Epoch: 1 Global Step: 55940 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:00:12,633-Speed 2618.27 samples/sec Loss 13.5936 LearningRate 0.0870 Epoch: 1 Global Step: 55950 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:00:16,539-Speed 2622.37 samples/sec Loss 13.7198 LearningRate 0.0870 Epoch: 1 Global Step: 55960 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:00:20,445-Speed 2622.39 samples/sec Loss 13.6341 LearningRate 0.0870 Epoch: 1 Global Step: 55970 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:00:24,352-Speed 2621.25 samples/sec Loss 13.7886 LearningRate 0.0870 Epoch: 1 Global Step: 55980 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:00:28,256-Speed 2624.37 samples/sec Loss 13.5844 LearningRate 0.0870 Epoch: 1 Global Step: 55990 Fp16 Grad Scale: 524288 Required: 87 hours
Training: 2022-04-13 02:00:32,164-Speed 2620.43 samples/sec Loss 13.5642 LearningRate 0.0870 Epoch: 1 Global Step: 56000 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:00:36,071-Speed 2621.81 samples/sec Loss 13.5436 LearningRate 0.0870 Epoch: 1 Global Step: 56010 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:00:39,984-Speed 2617.59 samples/sec Loss 13.6115 LearningRate 0.0870 Epoch: 1 Global Step: 56020 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:00:43,894-Speed 2619.88 samples/sec Loss 13.6458 LearningRate 0.0869 Epoch: 1 Global Step: 56030 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:00:47,795-Speed 2625.45 samples/sec Loss 13.6635 LearningRate 0.0869 Epoch: 1 Global Step: 56040 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:00:51,724-Speed 2607.26 samples/sec Loss 13.7354 LearningRate 0.0869 Epoch: 1 Global Step: 56050 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:00:55,618-Speed 2630.61 samples/sec Loss 13.6860 LearningRate 0.0869 Epoch: 1 Global Step: 56060 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:00:59,512-Speed 2630.46 samples/sec Loss 13.6433 LearningRate 0.0869 Epoch: 1 Global Step: 56070 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:01:03,402-Speed 2632.81 samples/sec Loss 13.7211 LearningRate 0.0869 Epoch: 1 Global Step: 56080 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:01:07,317-Speed 2616.19 samples/sec Loss 13.6352 LearningRate 0.0869 Epoch: 1 Global Step: 56090 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:01:11,216-Speed 2627.15 samples/sec Loss 13.6445 LearningRate 0.0869 Epoch: 1 Global Step: 56100 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:01:15,123-Speed 2621.74 samples/sec Loss 13.6010 LearningRate 0.0869 Epoch: 1 Global Step: 56110 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:01:19,022-Speed 2627.12 samples/sec Loss 13.5306 LearningRate 0.0869 Epoch: 1 Global Step: 56120 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:01:22,951-Speed 2607.04 samples/sec Loss 13.5703 LearningRate 0.0869 Epoch: 1 Global Step: 56130 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:01:26,843-Speed 2632.26 samples/sec Loss 13.6676 LearningRate 0.0869 Epoch: 1 Global Step: 56140 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:01:30,691-Speed 2661.74 samples/sec Loss 13.8708 LearningRate 0.0869 Epoch: 1 Global Step: 56150 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:01:34,600-Speed 2619.97 samples/sec Loss 13.4749 LearningRate 0.0869 Epoch: 1 Global Step: 56160 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:01:38,528-Speed 2608.25 samples/sec Loss 13.5467 LearningRate 0.0869 Epoch: 1 Global Step: 56170 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:01:42,420-Speed 2630.98 samples/sec Loss 13.5801 LearningRate 0.0869 Epoch: 1 Global Step: 56180 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:01:46,325-Speed 2623.25 samples/sec Loss 13.6105 LearningRate 0.0869 Epoch: 1 Global Step: 56190 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:01:50,217-Speed 2631.93 samples/sec Loss 13.6953 LearningRate 0.0869 Epoch: 1 Global Step: 56200 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:01:54,113-Speed 2629.20 samples/sec Loss 13.6315 LearningRate 0.0869 Epoch: 1 Global Step: 56210 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:01:58,009-Speed 2628.81 samples/sec Loss 13.4678 LearningRate 0.0869 Epoch: 1 Global Step: 56220 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:02:01,902-Speed 2631.40 samples/sec Loss 13.6619 LearningRate 0.0869 Epoch: 1 Global Step: 56230 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:02:05,799-Speed 2628.25 samples/sec Loss 13.7096 LearningRate 0.0869 Epoch: 1 Global Step: 56240 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:02:09,695-Speed 2628.75 samples/sec Loss 13.4775 LearningRate 0.0869 Epoch: 1 Global Step: 56250 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:02:13,589-Speed 2629.99 samples/sec Loss 13.6840 LearningRate 0.0869 Epoch: 1 Global Step: 56260 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:02:17,508-Speed 2613.94 samples/sec Loss 13.6166 LearningRate 0.0869 Epoch: 1 Global Step: 56270 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:02:21,404-Speed 2629.47 samples/sec Loss 13.5647 LearningRate 0.0869 Epoch: 1 Global Step: 56280 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:02:25,297-Speed 2630.86 samples/sec Loss 13.3841 LearningRate 0.0869 Epoch: 1 Global Step: 56290 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:02:29,206-Speed 2619.89 samples/sec Loss 13.5239 LearningRate 0.0869 Epoch: 1 Global Step: 56300 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:02:33,102-Speed 2629.02 samples/sec Loss 13.4743 LearningRate 0.0869 Epoch: 1 Global Step: 56310 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:02:36,992-Speed 2633.68 samples/sec Loss 13.5226 LearningRate 0.0869 Epoch: 1 Global Step: 56320 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:02:40,883-Speed 2632.20 samples/sec Loss 13.5473 LearningRate 0.0869 Epoch: 1 Global Step: 56330 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:02:44,792-Speed 2620.15 samples/sec Loss 13.8219 LearningRate 0.0869 Epoch: 1 Global Step: 56340 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:02:48,689-Speed 2628.70 samples/sec Loss 13.6197 LearningRate 0.0869 Epoch: 1 Global Step: 56350 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:02:52,581-Speed 2631.15 samples/sec Loss 13.5089 LearningRate 0.0869 Epoch: 1 Global Step: 56360 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:02:56,510-Speed 2606.97 samples/sec Loss 13.6546 LearningRate 0.0869 Epoch: 1 Global Step: 56370 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:03:00,415-Speed 2622.80 samples/sec Loss 13.6639 LearningRate 0.0869 Epoch: 1 Global Step: 56380 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:03:04,335-Speed 2613.43 samples/sec Loss 13.6135 LearningRate 0.0869 Epoch: 1 Global Step: 56390 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:03:08,233-Speed 2627.38 samples/sec Loss 13.6706 LearningRate 0.0869 Epoch: 1 Global Step: 56400 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:03:12,162-Speed 2607.47 samples/sec Loss 13.5648 LearningRate 0.0869 Epoch: 1 Global Step: 56410 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:03:16,104-Speed 2598.03 samples/sec Loss 13.4222 LearningRate 0.0869 Epoch: 1 Global Step: 56420 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:03:20,021-Speed 2615.09 samples/sec Loss 13.5667 LearningRate 0.0869 Epoch: 1 Global Step: 56430 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:03:23,953-Speed 2605.07 samples/sec Loss 13.6794 LearningRate 0.0869 Epoch: 1 Global Step: 56440 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:03:27,922-Speed 2580.91 samples/sec Loss 13.4559 LearningRate 0.0869 Epoch: 1 Global Step: 56450 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:03:31,816-Speed 2629.63 samples/sec Loss 13.5121 LearningRate 0.0869 Epoch: 1 Global Step: 56460 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:03:35,709-Speed 2631.57 samples/sec Loss 13.5443 LearningRate 0.0868 Epoch: 1 Global Step: 56470 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:03:39,608-Speed 2627.03 samples/sec Loss 13.4822 LearningRate 0.0868 Epoch: 1 Global Step: 56480 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:03:43,521-Speed 2618.08 samples/sec Loss 13.5658 LearningRate 0.0868 Epoch: 1 Global Step: 56490 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:03:47,420-Speed 2626.83 samples/sec Loss 13.4995 LearningRate 0.0868 Epoch: 1 Global Step: 56500 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:03:51,317-Speed 2629.36 samples/sec Loss 13.6272 LearningRate 0.0868 Epoch: 1 Global Step: 56510 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:03:55,210-Speed 2631.49 samples/sec Loss 13.6287 LearningRate 0.0868 Epoch: 1 Global Step: 56520 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:03:59,109-Speed 2626.13 samples/sec Loss 13.6108 LearningRate 0.0868 Epoch: 1 Global Step: 56530 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:04:03,014-Speed 2623.42 samples/sec Loss 13.6372 LearningRate 0.0868 Epoch: 1 Global Step: 56540 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:04:06,914-Speed 2626.21 samples/sec Loss 13.4574 LearningRate 0.0868 Epoch: 1 Global Step: 56550 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:04:10,829-Speed 2616.75 samples/sec Loss 13.5913 LearningRate 0.0868 Epoch: 1 Global Step: 56560 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:04:14,738-Speed 2619.96 samples/sec Loss 13.5125 LearningRate 0.0868 Epoch: 1 Global Step: 56570 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:04:18,640-Speed 2625.54 samples/sec Loss 13.6148 LearningRate 0.0868 Epoch: 1 Global Step: 56580 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:04:22,547-Speed 2621.47 samples/sec Loss 13.6042 LearningRate 0.0868 Epoch: 1 Global Step: 56590 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:04:26,417-Speed 2646.77 samples/sec Loss 13.4964 LearningRate 0.0868 Epoch: 1 Global Step: 56600 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:04:30,312-Speed 2629.46 samples/sec Loss 13.5973 LearningRate 0.0868 Epoch: 1 Global Step: 56610 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:04:34,243-Speed 2606.04 samples/sec Loss 13.6153 LearningRate 0.0868 Epoch: 1 Global Step: 56620 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:04:38,145-Speed 2624.60 samples/sec Loss 13.4679 LearningRate 0.0868 Epoch: 1 Global Step: 56630 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:04:42,040-Speed 2630.10 samples/sec Loss 13.5334 LearningRate 0.0868 Epoch: 1 Global Step: 56640 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:04:45,947-Speed 2621.43 samples/sec Loss 13.4576 LearningRate 0.0868 Epoch: 1 Global Step: 56650 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:04:49,841-Speed 2630.29 samples/sec Loss 13.6149 LearningRate 0.0868 Epoch: 1 Global Step: 56660 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:04:53,737-Speed 2629.81 samples/sec Loss 13.5063 LearningRate 0.0868 Epoch: 1 Global Step: 56670 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:04:57,653-Speed 2615.37 samples/sec Loss 13.5752 LearningRate 0.0868 Epoch: 1 Global Step: 56680 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:05:01,544-Speed 2631.68 samples/sec Loss 13.5491 LearningRate 0.0868 Epoch: 1 Global Step: 56690 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:05:05,441-Speed 2628.66 samples/sec Loss 13.5419 LearningRate 0.0868 Epoch: 1 Global Step: 56700 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:05:09,338-Speed 2628.99 samples/sec Loss 13.4853 LearningRate 0.0868 Epoch: 1 Global Step: 56710 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:05:13,219-Speed 2639.07 samples/sec Loss 13.5478 LearningRate 0.0868 Epoch: 1 Global Step: 56720 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:05:17,113-Speed 2630.44 samples/sec Loss 13.4878 LearningRate 0.0868 Epoch: 1 Global Step: 56730 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:05:21,005-Speed 2632.26 samples/sec Loss 13.4938 LearningRate 0.0868 Epoch: 1 Global Step: 56740 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:05:24,898-Speed 2630.36 samples/sec Loss 13.7358 LearningRate 0.0868 Epoch: 1 Global Step: 56750 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:05:28,884-Speed 2570.22 samples/sec Loss 13.5157 LearningRate 0.0868 Epoch: 1 Global Step: 56760 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:05:32,776-Speed 2631.18 samples/sec Loss 13.4473 LearningRate 0.0868 Epoch: 1 Global Step: 56770 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:05:36,676-Speed 2626.34 samples/sec Loss 13.5737 LearningRate 0.0868 Epoch: 1 Global Step: 56780 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:05:40,582-Speed 2622.27 samples/sec Loss 13.5044 LearningRate 0.0868 Epoch: 1 Global Step: 56790 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:05:44,476-Speed 2630.35 samples/sec Loss 13.4981 LearningRate 0.0868 Epoch: 1 Global Step: 56800 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:05:48,371-Speed 2629.54 samples/sec Loss 13.6274 LearningRate 0.0868 Epoch: 1 Global Step: 56810 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:05:52,272-Speed 2625.74 samples/sec Loss 13.3686 LearningRate 0.0868 Epoch: 1 Global Step: 56820 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:05:56,165-Speed 2631.18 samples/sec Loss 13.5358 LearningRate 0.0868 Epoch: 1 Global Step: 56830 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:06:00,106-Speed 2599.01 samples/sec Loss 13.5897 LearningRate 0.0868 Epoch: 1 Global Step: 56840 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:06:04,059-Speed 2590.59 samples/sec Loss 13.5801 LearningRate 0.0868 Epoch: 1 Global Step: 56850 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:06:07,947-Speed 2634.56 samples/sec Loss 13.5855 LearningRate 0.0868 Epoch: 1 Global Step: 56860 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:06:11,808-Speed 2652.56 samples/sec Loss 13.5777 LearningRate 0.0868 Epoch: 1 Global Step: 56870 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:06:15,698-Speed 2633.48 samples/sec Loss 13.5956 LearningRate 0.0868 Epoch: 1 Global Step: 56880 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:06:19,572-Speed 2643.76 samples/sec Loss 13.7270 LearningRate 0.0868 Epoch: 1 Global Step: 56890 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 02:06:23,473-Speed 2625.60 samples/sec Loss 13.5636 LearningRate 0.0868 Epoch: 1 Global Step: 56900 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 02:06:27,365-Speed 2631.92 samples/sec Loss 13.6558 LearningRate 0.0868 Epoch: 1 Global Step: 56910 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 02:06:31,258-Speed 2631.51 samples/sec Loss 13.5559 LearningRate 0.0867 Epoch: 1 Global Step: 56920 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 02:06:35,149-Speed 2631.67 samples/sec Loss 13.5136 LearningRate 0.0867 Epoch: 1 Global Step: 56930 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 02:06:39,051-Speed 2625.29 samples/sec Loss 13.5626 LearningRate 0.0867 Epoch: 1 Global Step: 56940 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 02:06:42,947-Speed 2629.20 samples/sec Loss 13.5813 LearningRate 0.0867 Epoch: 1 Global Step: 56950 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 02:06:46,859-Speed 2618.74 samples/sec Loss 13.5720 LearningRate 0.0867 Epoch: 1 Global Step: 56960 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 02:06:50,758-Speed 2626.42 samples/sec Loss 13.5851 LearningRate 0.0867 Epoch: 1 Global Step: 56970 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 02:06:54,653-Speed 2634.09 samples/sec Loss 13.6387 LearningRate 0.0867 Epoch: 1 Global Step: 56980 Fp16 Grad Scale: 16384 Required: 87 hours
Training: 2022-04-13 02:06:58,546-Speed 2630.79 samples/sec Loss 13.5925 LearningRate 0.0867 Epoch: 1 Global Step: 56990 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:07:02,446-Speed 2627.04 samples/sec Loss 13.5534 LearningRate 0.0867 Epoch: 1 Global Step: 57000 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:07:06,337-Speed 2632.32 samples/sec Loss 13.5837 LearningRate 0.0867 Epoch: 1 Global Step: 57010 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:07:10,231-Speed 2630.04 samples/sec Loss 13.6869 LearningRate 0.0867 Epoch: 1 Global Step: 57020 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:07:14,145-Speed 2616.71 samples/sec Loss 13.5643 LearningRate 0.0867 Epoch: 1 Global Step: 57030 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:07:18,040-Speed 2630.08 samples/sec Loss 13.7014 LearningRate 0.0867 Epoch: 1 Global Step: 57040 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:07:21,932-Speed 2632.02 samples/sec Loss 13.5634 LearningRate 0.0867 Epoch: 1 Global Step: 57050 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:07:25,831-Speed 2626.63 samples/sec Loss 13.5576 LearningRate 0.0867 Epoch: 1 Global Step: 57060 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:07:29,735-Speed 2623.94 samples/sec Loss 13.5630 LearningRate 0.0867 Epoch: 1 Global Step: 57070 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:07:33,635-Speed 2625.88 samples/sec Loss 13.5553 LearningRate 0.0867 Epoch: 1 Global Step: 57080 Fp16 Grad Scale: 32768 Required: 87 hours
Training: 2022-04-13 02:07:37,527-Speed 2631.60 samples/sec Loss 13.6148 LearningRate 0.0867 Epoch: 1 Global Step: 57090 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:07:41,421-Speed 2629.67 samples/sec Loss 13.4327 LearningRate 0.0867 Epoch: 1 Global Step: 57100 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:07:45,316-Speed 2630.36 samples/sec Loss 13.4959 LearningRate 0.0867 Epoch: 1 Global Step: 57110 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:07:49,208-Speed 2631.77 samples/sec Loss 13.5697 LearningRate 0.0867 Epoch: 1 Global Step: 57120 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:07:53,109-Speed 2625.34 samples/sec Loss 13.3425 LearningRate 0.0867 Epoch: 1 Global Step: 57130 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:07:57,005-Speed 2629.19 samples/sec Loss 13.4949 LearningRate 0.0867 Epoch: 1 Global Step: 57140 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:08:00,912-Speed 2621.70 samples/sec Loss 13.4547 LearningRate 0.0867 Epoch: 1 Global Step: 57150 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:08:04,813-Speed 2625.34 samples/sec Loss 13.4140 LearningRate 0.0867 Epoch: 1 Global Step: 57160 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:08:08,706-Speed 2630.93 samples/sec Loss 13.4620 LearningRate 0.0867 Epoch: 1 Global Step: 57170 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:08:12,619-Speed 2617.54 samples/sec Loss 13.5399 LearningRate 0.0867 Epoch: 1 Global Step: 57180 Fp16 Grad Scale: 65536 Required: 87 hours
Training: 2022-04-13 02:08:16,513-Speed 2630.63 samples/sec Loss 13.4724 LearningRate 0.0867 Epoch: 1 Global Step: 57190 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:08:20,408-Speed 2629.83 samples/sec Loss 13.6005 LearningRate 0.0867 Epoch: 1 Global Step: 57200 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:08:24,328-Speed 2613.24 samples/sec Loss 13.4485 LearningRate 0.0867 Epoch: 1 Global Step: 57210 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:08:28,220-Speed 2631.44 samples/sec Loss 13.6112 LearningRate 0.0867 Epoch: 1 Global Step: 57220 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:08:32,113-Speed 2631.21 samples/sec Loss 13.5018 LearningRate 0.0867 Epoch: 1 Global Step: 57230 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:08:36,027-Speed 2616.42 samples/sec Loss 13.5537 LearningRate 0.0867 Epoch: 1 Global Step: 57240 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:08:39,919-Speed 2631.81 samples/sec Loss 13.5487 LearningRate 0.0867 Epoch: 1 Global Step: 57250 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:08:43,812-Speed 2631.15 samples/sec Loss 13.7290 LearningRate 0.0867 Epoch: 1 Global Step: 57260 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:08:47,707-Speed 2629.90 samples/sec Loss 13.5362 LearningRate 0.0867 Epoch: 1 Global Step: 57270 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:08:51,618-Speed 2619.08 samples/sec Loss 13.5994 LearningRate 0.0867 Epoch: 1 Global Step: 57280 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:08:55,510-Speed 2631.26 samples/sec Loss 13.5277 LearningRate 0.0867 Epoch: 1 Global Step: 57290 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:08:59,407-Speed 2628.59 samples/sec Loss 13.4746 LearningRate 0.0867 Epoch: 1 Global Step: 57300 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:09:03,308-Speed 2625.59 samples/sec Loss 13.5715 LearningRate 0.0867 Epoch: 1 Global Step: 57310 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:09:07,226-Speed 2613.49 samples/sec Loss 13.4822 LearningRate 0.0867 Epoch: 1 Global Step: 57320 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:09:11,122-Speed 2630.19 samples/sec Loss 13.5055 LearningRate 0.0867 Epoch: 1 Global Step: 57330 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:09:15,021-Speed 2626.81 samples/sec Loss 13.5675 LearningRate 0.0867 Epoch: 1 Global Step: 57340 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:09:18,920-Speed 2627.06 samples/sec Loss 13.4048 LearningRate 0.0867 Epoch: 1 Global Step: 57350 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:09:22,816-Speed 2629.73 samples/sec Loss 13.4848 LearningRate 0.0866 Epoch: 1 Global Step: 57360 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:09:26,713-Speed 2628.31 samples/sec Loss 13.5374 LearningRate 0.0866 Epoch: 1 Global Step: 57370 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:09:30,618-Speed 2622.80 samples/sec Loss 13.4661 LearningRate 0.0866 Epoch: 1 Global Step: 57380 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:09:34,487-Speed 2646.72 samples/sec Loss 13.5338 LearningRate 0.0866 Epoch: 1 Global Step: 57390 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:09:38,390-Speed 2624.36 samples/sec Loss 13.4015 LearningRate 0.0866 Epoch: 1 Global Step: 57400 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:09:42,295-Speed 2623.46 samples/sec Loss 13.5631 LearningRate 0.0866 Epoch: 1 Global Step: 57410 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:09:46,189-Speed 2630.01 samples/sec Loss 13.5812 LearningRate 0.0866 Epoch: 1 Global Step: 57420 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:09:50,085-Speed 2629.41 samples/sec Loss 13.5886 LearningRate 0.0866 Epoch: 1 Global Step: 57430 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:09:53,981-Speed 2629.25 samples/sec Loss 13.5502 LearningRate 0.0866 Epoch: 1 Global Step: 57440 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:09:57,877-Speed 2628.76 samples/sec Loss 13.5116 LearningRate 0.0866 Epoch: 1 Global Step: 57450 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:10:01,777-Speed 2626.71 samples/sec Loss 13.6344 LearningRate 0.0866 Epoch: 1 Global Step: 57460 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:10:05,674-Speed 2628.09 samples/sec Loss 13.4127 LearningRate 0.0866 Epoch: 1 Global Step: 57470 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:10:09,579-Speed 2622.90 samples/sec Loss 13.5374 LearningRate 0.0866 Epoch: 1 Global Step: 57480 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:10:13,475-Speed 2629.11 samples/sec Loss 13.4211 LearningRate 0.0866 Epoch: 1 Global Step: 57490 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:10:17,376-Speed 2626.19 samples/sec Loss 13.4748 LearningRate 0.0866 Epoch: 1 Global Step: 57500 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:10:21,275-Speed 2626.59 samples/sec Loss 13.5352 LearningRate 0.0866 Epoch: 1 Global Step: 57510 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:10:25,172-Speed 2628.27 samples/sec Loss 13.5936 LearningRate 0.0866 Epoch: 1 Global Step: 57520 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:10:29,096-Speed 2610.64 samples/sec Loss 13.4219 LearningRate 0.0866 Epoch: 1 Global Step: 57530 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:10:33,003-Speed 2621.75 samples/sec Loss 13.4279 LearningRate 0.0866 Epoch: 1 Global Step: 57540 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:10:36,940-Speed 2601.16 samples/sec Loss 13.5916 LearningRate 0.0866 Epoch: 1 Global Step: 57550 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:10:40,840-Speed 2626.94 samples/sec Loss 13.6022 LearningRate 0.0866 Epoch: 1 Global Step: 57560 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:10:44,745-Speed 2622.48 samples/sec Loss 13.5343 LearningRate 0.0866 Epoch: 1 Global Step: 57570 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:10:48,688-Speed 2598.05 samples/sec Loss 13.5629 LearningRate 0.0866 Epoch: 1 Global Step: 57580 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:10:52,602-Speed 2616.84 samples/sec Loss 13.5686 LearningRate 0.0866 Epoch: 1 Global Step: 57590 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:10:56,521-Speed 2614.37 samples/sec Loss 13.4835 LearningRate 0.0866 Epoch: 1 Global Step: 57600 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:11:00,439-Speed 2614.03 samples/sec Loss 13.5398 LearningRate 0.0866 Epoch: 1 Global Step: 57610 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:11:04,336-Speed 2627.98 samples/sec Loss 13.3575 LearningRate 0.0866 Epoch: 1 Global Step: 57620 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:11:08,232-Speed 2629.27 samples/sec Loss 13.5605 LearningRate 0.0866 Epoch: 1 Global Step: 57630 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:11:12,110-Speed 2641.31 samples/sec Loss 13.4669 LearningRate 0.0866 Epoch: 1 Global Step: 57640 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:11:16,010-Speed 2626.08 samples/sec Loss 13.4919 LearningRate 0.0866 Epoch: 1 Global Step: 57650 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:11:19,915-Speed 2623.03 samples/sec Loss 13.4568 LearningRate 0.0866 Epoch: 1 Global Step: 57660 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:11:23,809-Speed 2630.25 samples/sec Loss 13.5606 LearningRate 0.0866 Epoch: 1 Global Step: 57670 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:11:27,704-Speed 2629.59 samples/sec Loss 13.6215 LearningRate 0.0866 Epoch: 1 Global Step: 57680 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:11:31,600-Speed 2629.81 samples/sec Loss 13.4978 LearningRate 0.0866 Epoch: 1 Global Step: 57690 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:11:35,499-Speed 2626.70 samples/sec Loss 13.4318 LearningRate 0.0866 Epoch: 1 Global Step: 57700 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:11:39,401-Speed 2625.21 samples/sec Loss 13.5358 LearningRate 0.0866 Epoch: 1 Global Step: 57710 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:11:43,328-Speed 2608.10 samples/sec Loss 13.4118 LearningRate 0.0866 Epoch: 1 Global Step: 57720 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:11:47,232-Speed 2623.64 samples/sec Loss 13.5183 LearningRate 0.0866 Epoch: 1 Global Step: 57730 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:11:51,131-Speed 2626.82 samples/sec Loss 13.6636 LearningRate 0.0866 Epoch: 1 Global Step: 57740 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:11:55,028-Speed 2628.81 samples/sec Loss 13.6271 LearningRate 0.0866 Epoch: 1 Global Step: 57750 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:11:58,924-Speed 2628.56 samples/sec Loss 13.6913 LearningRate 0.0866 Epoch: 1 Global Step: 57760 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:12:02,813-Speed 2634.26 samples/sec Loss 13.6077 LearningRate 0.0866 Epoch: 1 Global Step: 57770 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:12:06,711-Speed 2627.79 samples/sec Loss 13.3972 LearningRate 0.0866 Epoch: 1 Global Step: 57780 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:12:10,607-Speed 2628.95 samples/sec Loss 13.4413 LearningRate 0.0866 Epoch: 1 Global Step: 57790 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:12:14,500-Speed 2630.35 samples/sec Loss 13.5896 LearningRate 0.0866 Epoch: 1 Global Step: 57800 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:12:18,392-Speed 2632.25 samples/sec Loss 13.6630 LearningRate 0.0865 Epoch: 1 Global Step: 57810 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:12:22,293-Speed 2625.18 samples/sec Loss 13.4933 LearningRate 0.0865 Epoch: 1 Global Step: 57820 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:12:26,205-Speed 2618.03 samples/sec Loss 13.5069 LearningRate 0.0865 Epoch: 1 Global Step: 57830 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:12:30,127-Speed 2612.45 samples/sec Loss 13.4942 LearningRate 0.0865 Epoch: 1 Global Step: 57840 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:12:34,018-Speed 2632.31 samples/sec Loss 13.4483 LearningRate 0.0865 Epoch: 1 Global Step: 57850 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:12:37,911-Speed 2630.55 samples/sec Loss 13.6162 LearningRate 0.0865 Epoch: 1 Global Step: 57860 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:12:41,819-Speed 2621.19 samples/sec Loss 13.4954 LearningRate 0.0865 Epoch: 1 Global Step: 57870 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:12:45,728-Speed 2620.66 samples/sec Loss 13.6014 LearningRate 0.0865 Epoch: 1 Global Step: 57880 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:12:49,605-Speed 2641.70 samples/sec Loss 13.6322 LearningRate 0.0865 Epoch: 1 Global Step: 57890 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:12:53,500-Speed 2629.92 samples/sec Loss 13.5366 LearningRate 0.0865 Epoch: 1 Global Step: 57900 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:12:57,396-Speed 2629.01 samples/sec Loss 13.6129 LearningRate 0.0865 Epoch: 1 Global Step: 57910 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:13:01,289-Speed 2630.94 samples/sec Loss 13.3949 LearningRate 0.0865 Epoch: 1 Global Step: 57920 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:13:05,190-Speed 2625.76 samples/sec Loss 13.5361 LearningRate 0.0865 Epoch: 1 Global Step: 57930 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:13:09,088-Speed 2627.73 samples/sec Loss 13.4492 LearningRate 0.0865 Epoch: 1 Global Step: 57940 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:13:12,986-Speed 2627.87 samples/sec Loss 13.5324 LearningRate 0.0865 Epoch: 1 Global Step: 57950 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:13:16,884-Speed 2627.43 samples/sec Loss 13.4489 LearningRate 0.0865 Epoch: 1 Global Step: 57960 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:13:20,780-Speed 2629.00 samples/sec Loss 13.5784 LearningRate 0.0865 Epoch: 1 Global Step: 57970 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:13:24,673-Speed 2631.76 samples/sec Loss 13.3652 LearningRate 0.0865 Epoch: 1 Global Step: 57980 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:13:28,553-Speed 2639.61 samples/sec Loss 13.4860 LearningRate 0.0865 Epoch: 1 Global Step: 57990 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:13:32,470-Speed 2615.28 samples/sec Loss 13.5294 LearningRate 0.0865 Epoch: 1 Global Step: 58000 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:13:36,365-Speed 2629.75 samples/sec Loss 13.3977 LearningRate 0.0865 Epoch: 1 Global Step: 58010 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:13:40,261-Speed 2628.68 samples/sec Loss 13.5670 LearningRate 0.0865 Epoch: 1 Global Step: 58020 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:13:44,152-Speed 2632.19 samples/sec Loss 13.4687 LearningRate 0.0865 Epoch: 1 Global Step: 58030 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:13:48,044-Speed 2632.22 samples/sec Loss 13.5349 LearningRate 0.0865 Epoch: 1 Global Step: 58040 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:13:51,938-Speed 2630.46 samples/sec Loss 13.4411 LearningRate 0.0865 Epoch: 1 Global Step: 58050 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:13:55,828-Speed 2633.05 samples/sec Loss 13.4730 LearningRate 0.0865 Epoch: 1 Global Step: 58060 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:13:59,719-Speed 2632.34 samples/sec Loss 13.4127 LearningRate 0.0865 Epoch: 1 Global Step: 58070 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:14:03,611-Speed 2631.75 samples/sec Loss 13.2897 LearningRate 0.0865 Epoch: 1 Global Step: 58080 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:14:07,506-Speed 2629.26 samples/sec Loss 13.6155 LearningRate 0.0865 Epoch: 1 Global Step: 58090 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:14:11,400-Speed 2630.36 samples/sec Loss 13.4978 LearningRate 0.0865 Epoch: 1 Global Step: 58100 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:14:15,292-Speed 2631.56 samples/sec Loss 13.3840 LearningRate 0.0865 Epoch: 1 Global Step: 58110 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:14:19,188-Speed 2629.68 samples/sec Loss 13.5428 LearningRate 0.0865 Epoch: 1 Global Step: 58120 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:14:23,081-Speed 2631.23 samples/sec Loss 13.4410 LearningRate 0.0865 Epoch: 1 Global Step: 58130 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:14:26,980-Speed 2627.05 samples/sec Loss 13.4840 LearningRate 0.0865 Epoch: 1 Global Step: 58140 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:14:30,858-Speed 2641.24 samples/sec Loss 13.3795 LearningRate 0.0865 Epoch: 1 Global Step: 58150 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:14:34,794-Speed 2601.88 samples/sec Loss 13.4914 LearningRate 0.0865 Epoch: 1 Global Step: 58160 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:14:38,735-Speed 2605.18 samples/sec Loss 13.5077 LearningRate 0.0865 Epoch: 1 Global Step: 58170 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:14:42,645-Speed 2619.78 samples/sec Loss 13.5250 LearningRate 0.0865 Epoch: 1 Global Step: 58180 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:14:46,540-Speed 2629.53 samples/sec Loss 13.6529 LearningRate 0.0865 Epoch: 1 Global Step: 58190 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:14:50,458-Speed 2614.47 samples/sec Loss 13.4589 LearningRate 0.0865 Epoch: 1 Global Step: 58200 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:14:54,349-Speed 2632.83 samples/sec Loss 13.4544 LearningRate 0.0865 Epoch: 1 Global Step: 58210 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:14:58,238-Speed 2633.35 samples/sec Loss 13.3107 LearningRate 0.0865 Epoch: 1 Global Step: 58220 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:15:02,131-Speed 2630.76 samples/sec Loss 13.4272 LearningRate 0.0865 Epoch: 1 Global Step: 58230 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:15:06,026-Speed 2630.33 samples/sec Loss 13.3445 LearningRate 0.0865 Epoch: 1 Global Step: 58240 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:15:10,063-Speed 2537.24 samples/sec Loss 13.3312 LearningRate 0.0864 Epoch: 1 Global Step: 58250 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:15:13,959-Speed 2629.25 samples/sec Loss 13.4102 LearningRate 0.0864 Epoch: 1 Global Step: 58260 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:15:17,857-Speed 2627.15 samples/sec Loss 13.4309 LearningRate 0.0864 Epoch: 1 Global Step: 58270 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:15:21,749-Speed 2631.94 samples/sec Loss 13.6459 LearningRate 0.0864 Epoch: 1 Global Step: 58280 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:15:25,647-Speed 2627.78 samples/sec Loss 13.4125 LearningRate 0.0864 Epoch: 1 Global Step: 58290 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:15:29,553-Speed 2622.21 samples/sec Loss 13.5030 LearningRate 0.0864 Epoch: 1 Global Step: 58300 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:15:33,458-Speed 2622.41 samples/sec Loss 13.5603 LearningRate 0.0864 Epoch: 1 Global Step: 58310 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:15:37,352-Speed 2631.12 samples/sec Loss 13.5221 LearningRate 0.0864 Epoch: 1 Global Step: 58320 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:15:41,244-Speed 2631.35 samples/sec Loss 13.5279 LearningRate 0.0864 Epoch: 1 Global Step: 58330 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:15:45,156-Speed 2618.15 samples/sec Loss 13.4446 LearningRate 0.0864 Epoch: 1 Global Step: 58340 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:15:49,058-Speed 2625.44 samples/sec Loss 13.5851 LearningRate 0.0864 Epoch: 1 Global Step: 58350 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:15:52,951-Speed 2631.46 samples/sec Loss 13.5339 LearningRate 0.0864 Epoch: 1 Global Step: 58360 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:15:56,855-Speed 2622.85 samples/sec Loss 13.4321 LearningRate 0.0864 Epoch: 1 Global Step: 58370 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:16:00,753-Speed 2627.69 samples/sec Loss 13.3338 LearningRate 0.0864 Epoch: 1 Global Step: 58380 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:16:04,671-Speed 2614.67 samples/sec Loss 13.5982 LearningRate 0.0864 Epoch: 1 Global Step: 58390 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:16:08,565-Speed 2630.63 samples/sec Loss 13.5981 LearningRate 0.0864 Epoch: 1 Global Step: 58400 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:16:12,467-Speed 2624.92 samples/sec Loss 13.4477 LearningRate 0.0864 Epoch: 1 Global Step: 58410 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:16:16,359-Speed 2631.93 samples/sec Loss 13.5213 LearningRate 0.0864 Epoch: 1 Global Step: 58420 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:16:20,260-Speed 2625.75 samples/sec Loss 13.2251 LearningRate 0.0864 Epoch: 1 Global Step: 58430 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:16:24,170-Speed 2619.64 samples/sec Loss 13.4779 LearningRate 0.0864 Epoch: 1 Global Step: 58440 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:16:28,086-Speed 2614.73 samples/sec Loss 13.4904 LearningRate 0.0864 Epoch: 1 Global Step: 58450 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:16:31,989-Speed 2624.39 samples/sec Loss 13.3611 LearningRate 0.0864 Epoch: 1 Global Step: 58460 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:16:35,896-Speed 2621.54 samples/sec Loss 13.3473 LearningRate 0.0864 Epoch: 1 Global Step: 58470 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:16:39,797-Speed 2626.24 samples/sec Loss 13.5411 LearningRate 0.0864 Epoch: 1 Global Step: 58480 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:16:43,691-Speed 2630.18 samples/sec Loss 13.4287 LearningRate 0.0864 Epoch: 1 Global Step: 58490 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:16:47,583-Speed 2631.67 samples/sec Loss 13.3937 LearningRate 0.0864 Epoch: 1 Global Step: 58500 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:16:51,485-Speed 2625.47 samples/sec Loss 13.4984 LearningRate 0.0864 Epoch: 1 Global Step: 58510 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:16:55,381-Speed 2629.03 samples/sec Loss 13.5108 LearningRate 0.0864 Epoch: 1 Global Step: 58520 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:16:59,278-Speed 2628.01 samples/sec Loss 13.4025 LearningRate 0.0864 Epoch: 1 Global Step: 58530 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:17:03,176-Speed 2627.64 samples/sec Loss 13.6143 LearningRate 0.0864 Epoch: 1 Global Step: 58540 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:17:07,079-Speed 2623.88 samples/sec Loss 13.4004 LearningRate 0.0864 Epoch: 1 Global Step: 58550 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:17:10,972-Speed 2630.98 samples/sec Loss 13.3323 LearningRate 0.0864 Epoch: 1 Global Step: 58560 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:17:14,852-Speed 2640.30 samples/sec Loss 13.5716 LearningRate 0.0864 Epoch: 1 Global Step: 58570 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:17:18,744-Speed 2631.65 samples/sec Loss 13.3335 LearningRate 0.0864 Epoch: 1 Global Step: 58580 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:17:22,639-Speed 2629.32 samples/sec Loss 13.4631 LearningRate 0.0864 Epoch: 1 Global Step: 58590 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:17:26,535-Speed 2629.80 samples/sec Loss 13.4486 LearningRate 0.0864 Epoch: 1 Global Step: 58600 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:17:30,413-Speed 2641.04 samples/sec Loss 13.4037 LearningRate 0.0864 Epoch: 1 Global Step: 58610 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:17:34,311-Speed 2627.59 samples/sec Loss 13.4823 LearningRate 0.0864 Epoch: 1 Global Step: 58620 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:17:38,202-Speed 2631.91 samples/sec Loss 13.3748 LearningRate 0.0864 Epoch: 1 Global Step: 58630 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:17:42,102-Speed 2626.65 samples/sec Loss 13.3719 LearningRate 0.0864 Epoch: 1 Global Step: 58640 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:17:46,006-Speed 2623.40 samples/sec Loss 13.5332 LearningRate 0.0864 Epoch: 1 Global Step: 58650 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:17:49,905-Speed 2627.48 samples/sec Loss 13.2553 LearningRate 0.0864 Epoch: 1 Global Step: 58660 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:17:53,801-Speed 2628.93 samples/sec Loss 13.5175 LearningRate 0.0864 Epoch: 1 Global Step: 58670 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:17:57,697-Speed 2629.15 samples/sec Loss 13.5177 LearningRate 0.0864 Epoch: 1 Global Step: 58680 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:18:01,601-Speed 2623.41 samples/sec Loss 13.3956 LearningRate 0.0864 Epoch: 1 Global Step: 58690 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:18:05,499-Speed 2627.29 samples/sec Loss 13.3928 LearningRate 0.0863 Epoch: 1 Global Step: 58700 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:18:09,398-Speed 2626.66 samples/sec Loss 13.4832 LearningRate 0.0863 Epoch: 1 Global Step: 58710 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:18:13,304-Speed 2622.94 samples/sec Loss 13.3690 LearningRate 0.0863 Epoch: 1 Global Step: 58720 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:18:17,218-Speed 2616.93 samples/sec Loss 13.6595 LearningRate 0.0863 Epoch: 1 Global Step: 58730 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:18:21,121-Speed 2624.65 samples/sec Loss 13.3871 LearningRate 0.0863 Epoch: 1 Global Step: 58740 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:18:25,022-Speed 2625.56 samples/sec Loss 13.6077 LearningRate 0.0863 Epoch: 1 Global Step: 58750 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:18:28,895-Speed 2644.68 samples/sec Loss 13.5757 LearningRate 0.0863 Epoch: 1 Global Step: 58760 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:18:32,800-Speed 2623.28 samples/sec Loss 13.4431 LearningRate 0.0863 Epoch: 1 Global Step: 58770 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:18:36,702-Speed 2624.15 samples/sec Loss 13.5152 LearningRate 0.0863 Epoch: 1 Global Step: 58780 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:18:40,619-Speed 2615.16 samples/sec Loss 13.4202 LearningRate 0.0863 Epoch: 1 Global Step: 58790 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:18:44,508-Speed 2633.81 samples/sec Loss 13.5571 LearningRate 0.0863 Epoch: 1 Global Step: 58800 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:18:48,408-Speed 2626.79 samples/sec Loss 13.3802 LearningRate 0.0863 Epoch: 1 Global Step: 58810 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:18:52,306-Speed 2627.58 samples/sec Loss 13.4088 LearningRate 0.0863 Epoch: 1 Global Step: 58820 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:18:56,241-Speed 2603.33 samples/sec Loss 13.3164 LearningRate 0.0863 Epoch: 1 Global Step: 58830 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:19:00,165-Speed 2610.61 samples/sec Loss 13.5122 LearningRate 0.0863 Epoch: 1 Global Step: 58840 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:19:04,072-Speed 2621.34 samples/sec Loss 13.4387 LearningRate 0.0863 Epoch: 1 Global Step: 58850 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:19:07,967-Speed 2629.19 samples/sec Loss 13.3977 LearningRate 0.0863 Epoch: 1 Global Step: 58860 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:19:11,865-Speed 2628.42 samples/sec Loss 13.5194 LearningRate 0.0863 Epoch: 1 Global Step: 58870 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:19:15,817-Speed 2591.45 samples/sec Loss 13.4072 LearningRate 0.0863 Epoch: 1 Global Step: 58880 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:19:19,703-Speed 2639.89 samples/sec Loss 13.4925 LearningRate 0.0863 Epoch: 1 Global Step: 58890 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:19:23,591-Speed 2633.73 samples/sec Loss 13.3579 LearningRate 0.0863 Epoch: 1 Global Step: 58900 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:19:27,564-Speed 2578.86 samples/sec Loss 13.6435 LearningRate 0.0863 Epoch: 1 Global Step: 58910 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:19:31,459-Speed 2629.50 samples/sec Loss 13.6969 LearningRate 0.0863 Epoch: 1 Global Step: 58920 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:19:35,365-Speed 2622.12 samples/sec Loss 13.3869 LearningRate 0.0863 Epoch: 1 Global Step: 58930 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:19:39,275-Speed 2619.61 samples/sec Loss 13.2862 LearningRate 0.0863 Epoch: 1 Global Step: 58940 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:19:43,171-Speed 2629.43 samples/sec Loss 13.4351 LearningRate 0.0863 Epoch: 1 Global Step: 58950 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:19:47,067-Speed 2628.36 samples/sec Loss 13.4383 LearningRate 0.0863 Epoch: 1 Global Step: 58960 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:19:50,970-Speed 2624.44 samples/sec Loss 13.3802 LearningRate 0.0863 Epoch: 1 Global Step: 58970 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:19:54,871-Speed 2625.61 samples/sec Loss 13.6661 LearningRate 0.0863 Epoch: 1 Global Step: 58980 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:19:58,775-Speed 2624.24 samples/sec Loss 13.4829 LearningRate 0.0863 Epoch: 1 Global Step: 58990 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:20:02,695-Speed 2612.30 samples/sec Loss 13.5499 LearningRate 0.0863 Epoch: 1 Global Step: 59000 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:20:06,602-Speed 2621.71 samples/sec Loss 13.4954 LearningRate 0.0863 Epoch: 1 Global Step: 59010 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:20:10,499-Speed 2628.26 samples/sec Loss 13.3011 LearningRate 0.0863 Epoch: 1 Global Step: 59020 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:20:14,398-Speed 2626.89 samples/sec Loss 13.4180 LearningRate 0.0863 Epoch: 1 Global Step: 59030 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:20:18,293-Speed 2630.20 samples/sec Loss 13.5223 LearningRate 0.0863 Epoch: 1 Global Step: 59040 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:20:22,189-Speed 2628.57 samples/sec Loss 13.4605 LearningRate 0.0863 Epoch: 1 Global Step: 59050 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:20:26,085-Speed 2628.93 samples/sec Loss 13.4482 LearningRate 0.0863 Epoch: 1 Global Step: 59060 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:20:29,996-Speed 2619.02 samples/sec Loss 13.5287 LearningRate 0.0863 Epoch: 1 Global Step: 59070 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:20:33,889-Speed 2631.34 samples/sec Loss 13.4470 LearningRate 0.0863 Epoch: 1 Global Step: 59080 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:20:37,778-Speed 2633.47 samples/sec Loss 13.3394 LearningRate 0.0863 Epoch: 1 Global Step: 59090 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:20:41,655-Speed 2641.86 samples/sec Loss 13.5424 LearningRate 0.0863 Epoch: 1 Global Step: 59100 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:20:45,563-Speed 2621.08 samples/sec Loss 13.3415 LearningRate 0.0863 Epoch: 1 Global Step: 59110 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:20:49,465-Speed 2624.61 samples/sec Loss 13.4607 LearningRate 0.0863 Epoch: 1 Global Step: 59120 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:20:53,365-Speed 2626.85 samples/sec Loss 13.4115 LearningRate 0.0863 Epoch: 1 Global Step: 59130 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:20:57,258-Speed 2630.92 samples/sec Loss 13.4289 LearningRate 0.0863 Epoch: 1 Global Step: 59140 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:21:01,157-Speed 2626.65 samples/sec Loss 13.4171 LearningRate 0.0862 Epoch: 1 Global Step: 59150 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:21:05,053-Speed 2629.06 samples/sec Loss 13.3222 LearningRate 0.0862 Epoch: 1 Global Step: 59160 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:21:08,948-Speed 2629.68 samples/sec Loss 13.2742 LearningRate 0.0862 Epoch: 1 Global Step: 59170 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:21:12,847-Speed 2627.29 samples/sec Loss 13.3948 LearningRate 0.0862 Epoch: 1 Global Step: 59180 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:21:16,744-Speed 2627.95 samples/sec Loss 13.4123 LearningRate 0.0862 Epoch: 1 Global Step: 59190 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:21:20,662-Speed 2615.17 samples/sec Loss 13.2488 LearningRate 0.0862 Epoch: 1 Global Step: 59200 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:21:24,565-Speed 2623.88 samples/sec Loss 13.3361 LearningRate 0.0862 Epoch: 1 Global Step: 59210 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:21:28,463-Speed 2628.26 samples/sec Loss 13.6657 LearningRate 0.0862 Epoch: 1 Global Step: 59220 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:21:32,364-Speed 2625.62 samples/sec Loss 13.4222 LearningRate 0.0862 Epoch: 1 Global Step: 59230 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:21:36,268-Speed 2623.63 samples/sec Loss 13.6328 LearningRate 0.0862 Epoch: 1 Global Step: 59240 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:21:40,162-Speed 2630.04 samples/sec Loss 13.5630 LearningRate 0.0862 Epoch: 1 Global Step: 59250 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:21:44,246-Speed 2507.80 samples/sec Loss 13.5261 LearningRate 0.0862 Epoch: 1 Global Step: 59260 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:21:48,180-Speed 2604.32 samples/sec Loss 13.3557 LearningRate 0.0862 Epoch: 1 Global Step: 59270 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:21:52,110-Speed 2605.99 samples/sec Loss 13.6120 LearningRate 0.0862 Epoch: 1 Global Step: 59280 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:21:56,016-Speed 2623.17 samples/sec Loss 13.2486 LearningRate 0.0862 Epoch: 1 Global Step: 59290 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:21:59,897-Speed 2639.69 samples/sec Loss 13.4088 LearningRate 0.0862 Epoch: 1 Global Step: 59300 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:22:03,804-Speed 2621.90 samples/sec Loss 13.4545 LearningRate 0.0862 Epoch: 1 Global Step: 59310 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:22:07,694-Speed 2633.05 samples/sec Loss 13.3708 LearningRate 0.0862 Epoch: 1 Global Step: 59320 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:22:11,584-Speed 2632.85 samples/sec Loss 13.4204 LearningRate 0.0862 Epoch: 1 Global Step: 59330 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:22:15,460-Speed 2643.05 samples/sec Loss 13.6270 LearningRate 0.0862 Epoch: 1 Global Step: 59340 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:22:19,357-Speed 2628.44 samples/sec Loss 13.4775 LearningRate 0.0862 Epoch: 1 Global Step: 59350 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:22:23,250-Speed 2630.89 samples/sec Loss 13.5826 LearningRate 0.0862 Epoch: 1 Global Step: 59360 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:22:27,163-Speed 2617.76 samples/sec Loss 13.3522 LearningRate 0.0862 Epoch: 1 Global Step: 59370 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:22:31,083-Speed 2612.77 samples/sec Loss 13.3934 LearningRate 0.0862 Epoch: 1 Global Step: 59380 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:22:35,011-Speed 2607.61 samples/sec Loss 13.4320 LearningRate 0.0862 Epoch: 1 Global Step: 59390 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:22:38,926-Speed 2616.07 samples/sec Loss 13.4267 LearningRate 0.0862 Epoch: 1 Global Step: 59400 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:22:42,867-Speed 2599.55 samples/sec Loss 13.4418 LearningRate 0.0862 Epoch: 1 Global Step: 59410 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:22:46,802-Speed 2603.07 samples/sec Loss 13.3074 LearningRate 0.0862 Epoch: 1 Global Step: 59420 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:22:50,706-Speed 2623.29 samples/sec Loss 13.4494 LearningRate 0.0862 Epoch: 1 Global Step: 59430 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:22:54,610-Speed 2624.02 samples/sec Loss 13.4540 LearningRate 0.0862 Epoch: 1 Global Step: 59440 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:22:58,509-Speed 2626.97 samples/sec Loss 13.4906 LearningRate 0.0862 Epoch: 1 Global Step: 59450 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:23:02,406-Speed 2628.43 samples/sec Loss 13.4389 LearningRate 0.0862 Epoch: 1 Global Step: 59460 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:23:06,304-Speed 2627.37 samples/sec Loss 13.6186 LearningRate 0.0862 Epoch: 1 Global Step: 59470 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:23:10,222-Speed 2614.40 samples/sec Loss 13.5371 LearningRate 0.0862 Epoch: 1 Global Step: 59480 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:23:14,121-Speed 2626.80 samples/sec Loss 13.4750 LearningRate 0.0862 Epoch: 1 Global Step: 59490 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:23:18,038-Speed 2615.11 samples/sec Loss 13.6004 LearningRate 0.0862 Epoch: 1 Global Step: 59500 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:23:21,949-Speed 2619.00 samples/sec Loss 13.3483 LearningRate 0.0862 Epoch: 1 Global Step: 59510 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:23:25,850-Speed 2625.52 samples/sec Loss 13.5324 LearningRate 0.0862 Epoch: 1 Global Step: 59520 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:23:29,757-Speed 2621.66 samples/sec Loss 13.4073 LearningRate 0.0862 Epoch: 1 Global Step: 59530 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:23:33,675-Speed 2614.55 samples/sec Loss 13.5416 LearningRate 0.0862 Epoch: 1 Global Step: 59540 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:23:37,611-Speed 2601.87 samples/sec Loss 13.4896 LearningRate 0.0862 Epoch: 1 Global Step: 59550 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:23:41,509-Speed 2628.02 samples/sec Loss 13.5163 LearningRate 0.0862 Epoch: 1 Global Step: 59560 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:23:45,412-Speed 2624.08 samples/sec Loss 13.4615 LearningRate 0.0862 Epoch: 1 Global Step: 59570 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:23:49,294-Speed 2638.79 samples/sec Loss 13.4131 LearningRate 0.0862 Epoch: 1 Global Step: 59580 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:23:53,196-Speed 2625.56 samples/sec Loss 13.3029 LearningRate 0.0861 Epoch: 1 Global Step: 59590 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:23:57,168-Speed 2578.68 samples/sec Loss 13.3419 LearningRate 0.0861 Epoch: 1 Global Step: 59600 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:24:01,076-Speed 2621.11 samples/sec Loss 13.3079 LearningRate 0.0861 Epoch: 1 Global Step: 59610 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:24:04,974-Speed 2627.67 samples/sec Loss 13.3818 LearningRate 0.0861 Epoch: 1 Global Step: 59620 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:24:08,870-Speed 2628.50 samples/sec Loss 13.2506 LearningRate 0.0861 Epoch: 1 Global Step: 59630 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:24:12,766-Speed 2629.55 samples/sec Loss 13.3014 LearningRate 0.0861 Epoch: 1 Global Step: 59640 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:24:16,670-Speed 2623.35 samples/sec Loss 13.4051 LearningRate 0.0861 Epoch: 1 Global Step: 59650 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:24:20,564-Speed 2629.81 samples/sec Loss 13.3981 LearningRate 0.0861 Epoch: 1 Global Step: 59660 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:24:24,457-Speed 2631.53 samples/sec Loss 13.4373 LearningRate 0.0861 Epoch: 1 Global Step: 59670 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:24:28,354-Speed 2628.42 samples/sec Loss 13.2999 LearningRate 0.0861 Epoch: 1 Global Step: 59680 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:24:32,232-Speed 2640.98 samples/sec Loss 13.3404 LearningRate 0.0861 Epoch: 1 Global Step: 59690 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:24:36,126-Speed 2630.49 samples/sec Loss 13.3889 LearningRate 0.0861 Epoch: 1 Global Step: 59700 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:24:40,024-Speed 2627.13 samples/sec Loss 13.4326 LearningRate 0.0861 Epoch: 1 Global Step: 59710 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:24:43,932-Speed 2620.89 samples/sec Loss 13.4482 LearningRate 0.0861 Epoch: 1 Global Step: 59720 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:24:47,894-Speed 2585.31 samples/sec Loss 13.4204 LearningRate 0.0861 Epoch: 1 Global Step: 59730 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:24:51,784-Speed 2633.43 samples/sec Loss 13.3773 LearningRate 0.0861 Epoch: 1 Global Step: 59740 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:24:55,675-Speed 2632.96 samples/sec Loss 13.4507 LearningRate 0.0861 Epoch: 1 Global Step: 59750 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:24:59,603-Speed 2607.59 samples/sec Loss 13.4949 LearningRate 0.0861 Epoch: 1 Global Step: 59760 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:25:03,508-Speed 2623.23 samples/sec Loss 13.4653 LearningRate 0.0861 Epoch: 1 Global Step: 59770 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:25:07,403-Speed 2629.52 samples/sec Loss 13.2933 LearningRate 0.0861 Epoch: 1 Global Step: 59780 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:25:11,303-Speed 2625.91 samples/sec Loss 13.4803 LearningRate 0.0861 Epoch: 1 Global Step: 59790 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:25:15,222-Speed 2613.52 samples/sec Loss 13.3634 LearningRate 0.0861 Epoch: 1 Global Step: 59800 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:25:19,136-Speed 2617.58 samples/sec Loss 13.5002 LearningRate 0.0861 Epoch: 1 Global Step: 59810 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:25:23,013-Speed 2642.03 samples/sec Loss 13.5322 LearningRate 0.0861 Epoch: 1 Global Step: 59820 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:25:26,906-Speed 2630.86 samples/sec Loss 13.3879 LearningRate 0.0861 Epoch: 1 Global Step: 59830 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:25:30,795-Speed 2633.79 samples/sec Loss 13.3910 LearningRate 0.0861 Epoch: 1 Global Step: 59840 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:25:34,691-Speed 2629.71 samples/sec Loss 13.4373 LearningRate 0.0861 Epoch: 1 Global Step: 59850 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:25:38,588-Speed 2628.08 samples/sec Loss 13.4896 LearningRate 0.0861 Epoch: 1 Global Step: 59860 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:25:42,482-Speed 2629.96 samples/sec Loss 13.4348 LearningRate 0.0861 Epoch: 1 Global Step: 59870 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:25:46,376-Speed 2630.27 samples/sec Loss 13.3985 LearningRate 0.0861 Epoch: 1 Global Step: 59880 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:25:50,283-Speed 2621.66 samples/sec Loss 13.3430 LearningRate 0.0861 Epoch: 1 Global Step: 59890 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:25:54,179-Speed 2629.21 samples/sec Loss 13.4003 LearningRate 0.0861 Epoch: 1 Global Step: 59900 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:25:58,081-Speed 2624.66 samples/sec Loss 13.3472 LearningRate 0.0861 Epoch: 1 Global Step: 59910 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:26:01,979-Speed 2627.88 samples/sec Loss 13.4812 LearningRate 0.0861 Epoch: 1 Global Step: 59920 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:26:05,860-Speed 2639.29 samples/sec Loss 13.4081 LearningRate 0.0861 Epoch: 1 Global Step: 59930 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:26:09,762-Speed 2624.59 samples/sec Loss 13.4377 LearningRate 0.0861 Epoch: 1 Global Step: 59940 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:26:13,667-Speed 2622.91 samples/sec Loss 13.4747 LearningRate 0.0861 Epoch: 1 Global Step: 59950 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:26:17,570-Speed 2624.13 samples/sec Loss 13.1648 LearningRate 0.0861 Epoch: 1 Global Step: 59960 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:26:21,558-Speed 2568.37 samples/sec Loss 13.6281 LearningRate 0.0861 Epoch: 1 Global Step: 59970 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:26:25,501-Speed 2597.77 samples/sec Loss 13.3216 LearningRate 0.0861 Epoch: 1 Global Step: 59980 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:26:29,402-Speed 2626.03 samples/sec Loss 13.5359 LearningRate 0.0861 Epoch: 1 Global Step: 59990 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:26:33,305-Speed 2624.35 samples/sec Loss 13.4007 LearningRate 0.0861 Epoch: 1 Global Step: 60000 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:27:16,321-[lfw][60000]XNorm: 24.636387
Training: 2022-04-13 02:27:16,322-[lfw][60000]Accuracy-Flip: 0.99600+-0.00271
Training: 2022-04-13 02:27:16,322-[lfw][60000]Accuracy-Highest: 0.99600
Training: 2022-04-13 02:28:06,789-[cfp_fp][60000]XNorm: 21.984522
Training: 2022-04-13 02:28:06,790-[cfp_fp][60000]Accuracy-Flip: 0.97100+-0.00975
Training: 2022-04-13 02:28:06,791-[cfp_fp][60000]Accuracy-Highest: 0.97500
Training: 2022-04-13 02:28:50,209-[agedb_30][60000]XNorm: 24.158313
Training: 2022-04-13 02:28:50,210-[agedb_30][60000]Accuracy-Flip: 0.96283+-0.01019
Training: 2022-04-13 02:28:50,210-[agedb_30][60000]Accuracy-Highest: 0.96283
Training: 2022-04-13 02:28:54,089-Speed 72.74 samples/sec Loss 13.3686 LearningRate 0.0861 Epoch: 1 Global Step: 60010 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:28:57,964-Speed 2643.02 samples/sec Loss 13.5044 LearningRate 0.0861 Epoch: 1 Global Step: 60020 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:29:01,843-Speed 2640.67 samples/sec Loss 13.4097 LearningRate 0.0861 Epoch: 1 Global Step: 60030 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:29:05,734-Speed 2632.31 samples/sec Loss 13.3074 LearningRate 0.0860 Epoch: 1 Global Step: 60040 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:29:09,619-Speed 2636.45 samples/sec Loss 13.2636 LearningRate 0.0860 Epoch: 1 Global Step: 60050 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:29:13,501-Speed 2638.60 samples/sec Loss 13.3799 LearningRate 0.0860 Epoch: 1 Global Step: 60060 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:29:17,385-Speed 2637.26 samples/sec Loss 13.4819 LearningRate 0.0860 Epoch: 1 Global Step: 60070 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:29:21,270-Speed 2636.71 samples/sec Loss 13.4840 LearningRate 0.0860 Epoch: 1 Global Step: 60080 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:29:25,156-Speed 2635.71 samples/sec Loss 13.5570 LearningRate 0.0860 Epoch: 1 Global Step: 60090 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:29:29,042-Speed 2635.97 samples/sec Loss 13.3409 LearningRate 0.0860 Epoch: 1 Global Step: 60100 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:29:32,930-Speed 2634.35 samples/sec Loss 13.4807 LearningRate 0.0860 Epoch: 1 Global Step: 60110 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:29:36,828-Speed 2627.64 samples/sec Loss 13.5210 LearningRate 0.0860 Epoch: 1 Global Step: 60120 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:29:40,721-Speed 2632.16 samples/sec Loss 13.4134 LearningRate 0.0860 Epoch: 1 Global Step: 60130 Fp16 Grad Scale: 524288 Required: 87 hours
Training: 2022-04-13 02:29:44,593-Speed 2644.72 samples/sec Loss 13.4895 LearningRate 0.0860 Epoch: 1 Global Step: 60140 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:29:48,491-Speed 2628.01 samples/sec Loss 13.4127 LearningRate 0.0860 Epoch: 1 Global Step: 60150 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:29:52,399-Speed 2620.78 samples/sec Loss 13.4360 LearningRate 0.0860 Epoch: 1 Global Step: 60160 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:29:56,296-Speed 2628.82 samples/sec Loss 13.4454 LearningRate 0.0860 Epoch: 1 Global Step: 60170 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:30:00,193-Speed 2628.12 samples/sec Loss 13.3948 LearningRate 0.0860 Epoch: 1 Global Step: 60180 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:30:04,117-Speed 2610.20 samples/sec Loss 13.4077 LearningRate 0.0860 Epoch: 1 Global Step: 60190 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:30:08,016-Speed 2627.57 samples/sec Loss 13.3438 LearningRate 0.0860 Epoch: 1 Global Step: 60200 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:30:11,898-Speed 2638.24 samples/sec Loss 13.5000 LearningRate 0.0860 Epoch: 1 Global Step: 60210 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:30:15,857-Speed 2587.06 samples/sec Loss 13.4455 LearningRate 0.0860 Epoch: 1 Global Step: 60220 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:30:19,765-Speed 2620.83 samples/sec Loss 13.4670 LearningRate 0.0860 Epoch: 1 Global Step: 60230 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:30:23,671-Speed 2622.96 samples/sec Loss 13.4130 LearningRate 0.0860 Epoch: 1 Global Step: 60240 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:30:27,569-Speed 2627.30 samples/sec Loss 13.5316 LearningRate 0.0860 Epoch: 1 Global Step: 60250 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:30:31,493-Speed 2610.12 samples/sec Loss 13.2303 LearningRate 0.0860 Epoch: 1 Global Step: 60260 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:30:35,398-Speed 2623.50 samples/sec Loss 13.4632 LearningRate 0.0860 Epoch: 1 Global Step: 60270 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:30:39,296-Speed 2627.31 samples/sec Loss 13.3631 LearningRate 0.0860 Epoch: 1 Global Step: 60280 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:30:43,198-Speed 2625.44 samples/sec Loss 13.4040 LearningRate 0.0860 Epoch: 1 Global Step: 60290 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:30:47,100-Speed 2624.19 samples/sec Loss 13.4517 LearningRate 0.0860 Epoch: 1 Global Step: 60300 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:30:51,005-Speed 2622.85 samples/sec Loss 13.5857 LearningRate 0.0860 Epoch: 1 Global Step: 60310 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:30:54,908-Speed 2624.39 samples/sec Loss 13.2713 LearningRate 0.0860 Epoch: 1 Global Step: 60320 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:30:58,811-Speed 2624.46 samples/sec Loss 13.3738 LearningRate 0.0860 Epoch: 1 Global Step: 60330 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:31:02,732-Speed 2612.52 samples/sec Loss 13.4013 LearningRate 0.0860 Epoch: 1 Global Step: 60340 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:31:06,630-Speed 2627.54 samples/sec Loss 13.4905 LearningRate 0.0860 Epoch: 1 Global Step: 60350 Fp16 Grad Scale: 262144 Required: 87 hours
Training: 2022-04-13 02:31:10,512-Speed 2638.14 samples/sec Loss 13.4481 LearningRate 0.0860 Epoch: 1 Global Step: 60360 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:31:14,423-Speed 2619.33 samples/sec Loss 13.4137 LearningRate 0.0860 Epoch: 1 Global Step: 60370 Fp16 Grad Scale: 131072 Required: 87 hours
Training: 2022-04-13 02:31:18,321-Speed 2627.47 samples/sec Loss 13.4428 LearningRate 0.0860 Epoch: 1 Global Step: 60380 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:31:22,229-Speed 2620.47 samples/sec Loss 13.3709 LearningRate 0.0860 Epoch: 1 Global Step: 60390 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:31:26,130-Speed 2625.71 samples/sec Loss 13.3598 LearningRate 0.0860 Epoch: 1 Global Step: 60400 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:31:30,024-Speed 2630.61 samples/sec Loss 13.3353 LearningRate 0.0860 Epoch: 1 Global Step: 60410 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:31:33,922-Speed 2627.74 samples/sec Loss 13.3912 LearningRate 0.0860 Epoch: 1 Global Step: 60420 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:31:37,827-Speed 2623.11 samples/sec Loss 13.3210 LearningRate 0.0860 Epoch: 1 Global Step: 60430 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:31:41,735-Speed 2621.39 samples/sec Loss 13.3658 LearningRate 0.0860 Epoch: 1 Global Step: 60440 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:31:45,634-Speed 2626.91 samples/sec Loss 13.4134 LearningRate 0.0860 Epoch: 1 Global Step: 60450 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:31:49,551-Speed 2614.45 samples/sec Loss 13.2492 LearningRate 0.0860 Epoch: 1 Global Step: 60460 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:31:53,454-Speed 2624.62 samples/sec Loss 13.3354 LearningRate 0.0860 Epoch: 1 Global Step: 60470 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:31:57,366-Speed 2618.60 samples/sec Loss 13.4490 LearningRate 0.0860 Epoch: 1 Global Step: 60480 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:32:01,248-Speed 2638.29 samples/sec Loss 13.1897 LearningRate 0.0859 Epoch: 1 Global Step: 60490 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:32:05,144-Speed 2628.74 samples/sec Loss 13.4364 LearningRate 0.0859 Epoch: 1 Global Step: 60500 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:32:09,184-Speed 2535.25 samples/sec Loss 13.5558 LearningRate 0.0859 Epoch: 1 Global Step: 60510 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:32:13,080-Speed 2629.31 samples/sec Loss 13.3614 LearningRate 0.0859 Epoch: 1 Global Step: 60520 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:32:16,978-Speed 2627.37 samples/sec Loss 13.3889 LearningRate 0.0859 Epoch: 1 Global Step: 60530 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:32:20,874-Speed 2629.09 samples/sec Loss 13.3883 LearningRate 0.0859 Epoch: 1 Global Step: 60540 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:32:24,768-Speed 2629.79 samples/sec Loss 13.3135 LearningRate 0.0859 Epoch: 1 Global Step: 60550 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:32:28,669-Speed 2625.98 samples/sec Loss 13.3341 LearningRate 0.0859 Epoch: 1 Global Step: 60560 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:32:32,569-Speed 2626.42 samples/sec Loss 13.3909 LearningRate 0.0859 Epoch: 1 Global Step: 60570 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:32:36,473-Speed 2623.47 samples/sec Loss 13.4652 LearningRate 0.0859 Epoch: 1 Global Step: 60580 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:32:40,370-Speed 2628.70 samples/sec Loss 13.5854 LearningRate 0.0859 Epoch: 1 Global Step: 60590 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:32:44,249-Speed 2640.36 samples/sec Loss 13.3261 LearningRate 0.0859 Epoch: 1 Global Step: 60600 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:32:48,146-Speed 2628.00 samples/sec Loss 13.5367 LearningRate 0.0859 Epoch: 1 Global Step: 60610 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:32:52,051-Speed 2623.70 samples/sec Loss 13.4666 LearningRate 0.0859 Epoch: 1 Global Step: 60620 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:32:55,945-Speed 2630.02 samples/sec Loss 13.4715 LearningRate 0.0859 Epoch: 1 Global Step: 60630 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:32:59,848-Speed 2624.10 samples/sec Loss 13.3357 LearningRate 0.0859 Epoch: 1 Global Step: 60640 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:33:03,748-Speed 2626.45 samples/sec Loss 13.4441 LearningRate 0.0859 Epoch: 1 Global Step: 60650 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:33:07,648-Speed 2626.82 samples/sec Loss 13.4040 LearningRate 0.0859 Epoch: 1 Global Step: 60660 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:33:11,547-Speed 2626.42 samples/sec Loss 13.3515 LearningRate 0.0859 Epoch: 1 Global Step: 60670 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:33:15,477-Speed 2606.83 samples/sec Loss 13.3390 LearningRate 0.0859 Epoch: 1 Global Step: 60680 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:33:19,372-Speed 2630.43 samples/sec Loss 13.1728 LearningRate 0.0859 Epoch: 1 Global Step: 60690 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:33:23,278-Speed 2622.08 samples/sec Loss 13.3420 LearningRate 0.0859 Epoch: 1 Global Step: 60700 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:33:27,177-Speed 2627.13 samples/sec Loss 13.3897 LearningRate 0.0859 Epoch: 1 Global Step: 60710 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:33:31,085-Speed 2620.50 samples/sec Loss 13.1462 LearningRate 0.0859 Epoch: 1 Global Step: 60720 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:33:34,981-Speed 2629.22 samples/sec Loss 13.1307 LearningRate 0.0859 Epoch: 1 Global Step: 60730 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:33:38,885-Speed 2623.47 samples/sec Loss 13.5348 LearningRate 0.0859 Epoch: 1 Global Step: 60740 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:33:42,783-Speed 2632.23 samples/sec Loss 13.3663 LearningRate 0.0859 Epoch: 1 Global Step: 60750 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:33:46,686-Speed 2624.34 samples/sec Loss 13.3437 LearningRate 0.0859 Epoch: 1 Global Step: 60760 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:33:50,589-Speed 2624.52 samples/sec Loss 13.3736 LearningRate 0.0859 Epoch: 1 Global Step: 60770 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:33:54,487-Speed 2627.73 samples/sec Loss 13.2780 LearningRate 0.0859 Epoch: 1 Global Step: 60780 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:33:58,396-Speed 2620.63 samples/sec Loss 13.4348 LearningRate 0.0859 Epoch: 1 Global Step: 60790 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:34:02,308-Speed 2617.76 samples/sec Loss 13.3453 LearningRate 0.0859 Epoch: 1 Global Step: 60800 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:34:06,220-Speed 2618.70 samples/sec Loss 13.4067 LearningRate 0.0859 Epoch: 1 Global Step: 60810 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:34:10,127-Speed 2621.43 samples/sec Loss 13.5028 LearningRate 0.0859 Epoch: 1 Global Step: 60820 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:34:14,035-Speed 2620.79 samples/sec Loss 13.3939 LearningRate 0.0859 Epoch: 1 Global Step: 60830 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:34:17,941-Speed 2622.48 samples/sec Loss 13.3443 LearningRate 0.0859 Epoch: 1 Global Step: 60840 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:34:21,844-Speed 2624.22 samples/sec Loss 13.2769 LearningRate 0.0859 Epoch: 1 Global Step: 60850 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:34:25,745-Speed 2625.57 samples/sec Loss 13.2108 LearningRate 0.0859 Epoch: 1 Global Step: 60860 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:34:29,649-Speed 2623.62 samples/sec Loss 13.4311 LearningRate 0.0859 Epoch: 1 Global Step: 60870 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:34:33,550-Speed 2625.44 samples/sec Loss 13.3580 LearningRate 0.0859 Epoch: 1 Global Step: 60880 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:34:37,455-Speed 2622.71 samples/sec Loss 13.4248 LearningRate 0.0859 Epoch: 1 Global Step: 60890 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:34:41,372-Speed 2615.11 samples/sec Loss 13.3460 LearningRate 0.0859 Epoch: 1 Global Step: 60900 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:34:45,243-Speed 2646.09 samples/sec Loss 13.3323 LearningRate 0.0859 Epoch: 1 Global Step: 60910 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:34:49,215-Speed 2578.56 samples/sec Loss 13.3221 LearningRate 0.0859 Epoch: 1 Global Step: 60920 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:34:53,133-Speed 2613.69 samples/sec Loss 13.4355 LearningRate 0.0859 Epoch: 1 Global Step: 60930 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:34:57,053-Speed 2613.30 samples/sec Loss 13.3742 LearningRate 0.0858 Epoch: 1 Global Step: 60940 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:35:01,152-Speed 2498.99 samples/sec Loss 13.3105 LearningRate 0.0858 Epoch: 1 Global Step: 60950 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:35:05,092-Speed 2599.18 samples/sec Loss 13.4415 LearningRate 0.0858 Epoch: 1 Global Step: 60960 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:35:09,059-Speed 2582.20 samples/sec Loss 13.2890 LearningRate 0.0858 Epoch: 1 Global Step: 60970 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:35:12,989-Speed 2605.72 samples/sec Loss 13.5780 LearningRate 0.0858 Epoch: 1 Global Step: 60980 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:35:16,892-Speed 2624.28 samples/sec Loss 13.4584 LearningRate 0.0858 Epoch: 1 Global Step: 60990 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:35:20,790-Speed 2634.24 samples/sec Loss 13.4455 LearningRate 0.0858 Epoch: 1 Global Step: 61000 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:35:24,687-Speed 2628.08 samples/sec Loss 13.2003 LearningRate 0.0858 Epoch: 1 Global Step: 61010 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:35:28,583-Speed 2628.55 samples/sec Loss 13.3959 LearningRate 0.0858 Epoch: 1 Global Step: 61020 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:35:32,479-Speed 2629.21 samples/sec Loss 13.4610 LearningRate 0.0858 Epoch: 1 Global Step: 61030 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:35:36,373-Speed 2630.36 samples/sec Loss 13.4941 LearningRate 0.0858 Epoch: 1 Global Step: 61040 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:35:40,276-Speed 2623.93 samples/sec Loss 13.3398 LearningRate 0.0858 Epoch: 1 Global Step: 61050 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:35:44,169-Speed 2630.99 samples/sec Loss 13.2861 LearningRate 0.0858 Epoch: 1 Global Step: 61060 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:35:48,065-Speed 2629.33 samples/sec Loss 13.4260 LearningRate 0.0858 Epoch: 1 Global Step: 61070 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:35:51,968-Speed 2624.07 samples/sec Loss 13.2344 LearningRate 0.0858 Epoch: 1 Global Step: 61080 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:35:55,864-Speed 2629.03 samples/sec Loss 13.3442 LearningRate 0.0858 Epoch: 1 Global Step: 61090 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:35:59,764-Speed 2626.66 samples/sec Loss 13.4423 LearningRate 0.0858 Epoch: 1 Global Step: 61100 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:36:03,646-Speed 2638.14 samples/sec Loss 13.3847 LearningRate 0.0858 Epoch: 1 Global Step: 61110 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:36:07,609-Speed 2584.26 samples/sec Loss 13.3590 LearningRate 0.0858 Epoch: 1 Global Step: 61120 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:36:11,505-Speed 2629.57 samples/sec Loss 13.2631 LearningRate 0.0858 Epoch: 1 Global Step: 61130 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:36:15,399-Speed 2630.28 samples/sec Loss 13.3754 LearningRate 0.0858 Epoch: 1 Global Step: 61140 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:36:19,292-Speed 2631.00 samples/sec Loss 13.4830 LearningRate 0.0858 Epoch: 1 Global Step: 61150 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:36:23,184-Speed 2631.43 samples/sec Loss 13.3317 LearningRate 0.0858 Epoch: 1 Global Step: 61160 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:36:27,078-Speed 2630.56 samples/sec Loss 13.4516 LearningRate 0.0858 Epoch: 1 Global Step: 61170 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:36:30,970-Speed 2631.36 samples/sec Loss 13.5294 LearningRate 0.0858 Epoch: 1 Global Step: 61180 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:36:34,865-Speed 2629.49 samples/sec Loss 13.5837 LearningRate 0.0858 Epoch: 1 Global Step: 61190 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:36:38,773-Speed 2620.96 samples/sec Loss 13.4258 LearningRate 0.0858 Epoch: 1 Global Step: 61200 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:36:42,665-Speed 2631.48 samples/sec Loss 13.4132 LearningRate 0.0858 Epoch: 1 Global Step: 61210 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:36:46,563-Speed 2627.83 samples/sec Loss 13.2993 LearningRate 0.0858 Epoch: 1 Global Step: 61220 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:36:50,456-Speed 2630.87 samples/sec Loss 13.2357 LearningRate 0.0858 Epoch: 1 Global Step: 61230 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:36:54,333-Speed 2641.89 samples/sec Loss 13.5318 LearningRate 0.0858 Epoch: 1 Global Step: 61240 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:36:58,233-Speed 2626.39 samples/sec Loss 13.4669 LearningRate 0.0858 Epoch: 1 Global Step: 61250 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:37:02,132-Speed 2627.01 samples/sec Loss 13.3258 LearningRate 0.0858 Epoch: 1 Global Step: 61260 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:37:06,026-Speed 2629.88 samples/sec Loss 13.4120 LearningRate 0.0858 Epoch: 1 Global Step: 61270 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:37:09,973-Speed 2594.79 samples/sec Loss 13.4763 LearningRate 0.0858 Epoch: 1 Global Step: 61280 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:37:13,883-Speed 2619.74 samples/sec Loss 13.4160 LearningRate 0.0858 Epoch: 1 Global Step: 61290 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:37:17,785-Speed 2625.31 samples/sec Loss 13.3762 LearningRate 0.0858 Epoch: 1 Global Step: 61300 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:37:21,682-Speed 2628.28 samples/sec Loss 13.4431 LearningRate 0.0858 Epoch: 1 Global Step: 61310 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:37:25,581-Speed 2626.77 samples/sec Loss 13.4457 LearningRate 0.0858 Epoch: 1 Global Step: 61320 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:37:29,476-Speed 2629.73 samples/sec Loss 13.3921 LearningRate 0.0858 Epoch: 1 Global Step: 61330 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:37:33,371-Speed 2629.34 samples/sec Loss 13.3167 LearningRate 0.0858 Epoch: 1 Global Step: 61340 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:37:37,266-Speed 2630.09 samples/sec Loss 13.3279 LearningRate 0.0858 Epoch: 1 Global Step: 61350 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:37:41,173-Speed 2621.34 samples/sec Loss 13.4472 LearningRate 0.0858 Epoch: 1 Global Step: 61360 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:37:45,043-Speed 2646.56 samples/sec Loss 13.3391 LearningRate 0.0858 Epoch: 1 Global Step: 61370 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:37:48,933-Speed 2632.72 samples/sec Loss 13.2756 LearningRate 0.0857 Epoch: 1 Global Step: 61380 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:37:52,887-Speed 2590.26 samples/sec Loss 13.4548 LearningRate 0.0857 Epoch: 1 Global Step: 61390 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:37:56,802-Speed 2616.29 samples/sec Loss 13.4563 LearningRate 0.0857 Epoch: 1 Global Step: 61400 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:38:00,695-Speed 2630.92 samples/sec Loss 13.3884 LearningRate 0.0857 Epoch: 1 Global Step: 61410 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:38:04,595-Speed 2626.88 samples/sec Loss 13.2946 LearningRate 0.0857 Epoch: 1 Global Step: 61420 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:38:08,495-Speed 2625.90 samples/sec Loss 13.3411 LearningRate 0.0857 Epoch: 1 Global Step: 61430 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:38:12,388-Speed 2631.13 samples/sec Loss 13.3635 LearningRate 0.0857 Epoch: 1 Global Step: 61440 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:38:16,282-Speed 2630.25 samples/sec Loss 13.3811 LearningRate 0.0857 Epoch: 1 Global Step: 61450 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:38:20,176-Speed 2629.68 samples/sec Loss 13.3082 LearningRate 0.0857 Epoch: 1 Global Step: 61460 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:38:24,040-Speed 2651.13 samples/sec Loss 13.4326 LearningRate 0.0857 Epoch: 1 Global Step: 61470 Fp16 Grad Scale: 8192 Required: 86 hours
Training: 2022-04-13 02:38:27,935-Speed 2629.45 samples/sec Loss 13.6310 LearningRate 0.0857 Epoch: 1 Global Step: 61480 Fp16 Grad Scale: 8192 Required: 86 hours
Training: 2022-04-13 02:38:31,827-Speed 2632.01 samples/sec Loss 13.3421 LearningRate 0.0857 Epoch: 1 Global Step: 61490 Fp16 Grad Scale: 8192 Required: 86 hours
Training: 2022-04-13 02:38:35,720-Speed 2631.63 samples/sec Loss 13.2527 LearningRate 0.0857 Epoch: 1 Global Step: 61500 Fp16 Grad Scale: 8192 Required: 86 hours
Training: 2022-04-13 02:38:39,614-Speed 2630.57 samples/sec Loss 13.3856 LearningRate 0.0857 Epoch: 1 Global Step: 61510 Fp16 Grad Scale: 8192 Required: 86 hours
Training: 2022-04-13 02:38:43,513-Speed 2626.88 samples/sec Loss 13.4653 LearningRate 0.0857 Epoch: 1 Global Step: 61520 Fp16 Grad Scale: 8192 Required: 86 hours
Training: 2022-04-13 02:38:47,504-Speed 2566.97 samples/sec Loss 13.3371 LearningRate 0.0857 Epoch: 1 Global Step: 61530 Fp16 Grad Scale: 8192 Required: 86 hours
Training: 2022-04-13 02:38:51,398-Speed 2630.06 samples/sec Loss 13.2684 LearningRate 0.0857 Epoch: 1 Global Step: 61540 Fp16 Grad Scale: 8192 Required: 86 hours
Training: 2022-04-13 02:38:55,294-Speed 2629.80 samples/sec Loss 13.3700 LearningRate 0.0857 Epoch: 1 Global Step: 61550 Fp16 Grad Scale: 8192 Required: 86 hours
Training: 2022-04-13 02:38:59,198-Speed 2623.22 samples/sec Loss 13.3935 LearningRate 0.0857 Epoch: 1 Global Step: 61560 Fp16 Grad Scale: 8192 Required: 86 hours
Training: 2022-04-13 02:39:03,100-Speed 2624.88 samples/sec Loss 13.3841 LearningRate 0.0857 Epoch: 1 Global Step: 61570 Fp16 Grad Scale: 16384 Required: 86 hours
Training: 2022-04-13 02:39:06,994-Speed 2630.69 samples/sec Loss 13.4505 LearningRate 0.0857 Epoch: 1 Global Step: 61580 Fp16 Grad Scale: 16384 Required: 86 hours
Training: 2022-04-13 02:39:10,889-Speed 2629.78 samples/sec Loss 13.4625 LearningRate 0.0857 Epoch: 1 Global Step: 61590 Fp16 Grad Scale: 16384 Required: 86 hours
Training: 2022-04-13 02:39:14,787-Speed 2628.11 samples/sec Loss 13.2806 LearningRate 0.0857 Epoch: 1 Global Step: 61600 Fp16 Grad Scale: 16384 Required: 86 hours
Training: 2022-04-13 02:39:18,682-Speed 2629.48 samples/sec Loss 13.3103 LearningRate 0.0857 Epoch: 1 Global Step: 61610 Fp16 Grad Scale: 16384 Required: 86 hours
Training: 2022-04-13 02:39:22,578-Speed 2629.80 samples/sec Loss 13.1977 LearningRate 0.0857 Epoch: 1 Global Step: 61620 Fp16 Grad Scale: 16384 Required: 86 hours
Training: 2022-04-13 02:39:26,487-Speed 2619.92 samples/sec Loss 13.3371 LearningRate 0.0857 Epoch: 1 Global Step: 61630 Fp16 Grad Scale: 16384 Required: 86 hours
Training: 2022-04-13 02:39:30,388-Speed 2626.07 samples/sec Loss 13.2303 LearningRate 0.0857 Epoch: 1 Global Step: 61640 Fp16 Grad Scale: 16384 Required: 86 hours
Training: 2022-04-13 02:39:34,281-Speed 2631.17 samples/sec Loss 13.2883 LearningRate 0.0857 Epoch: 1 Global Step: 61650 Fp16 Grad Scale: 16384 Required: 86 hours
Training: 2022-04-13 02:39:38,177-Speed 2628.42 samples/sec Loss 13.2554 LearningRate 0.0857 Epoch: 1 Global Step: 61660 Fp16 Grad Scale: 16384 Required: 86 hours
Training: 2022-04-13 02:39:42,100-Speed 2610.63 samples/sec Loss 13.4100 LearningRate 0.0857 Epoch: 1 Global Step: 61670 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:39:45,996-Speed 2629.29 samples/sec Loss 13.5120 LearningRate 0.0857 Epoch: 1 Global Step: 61680 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:39:49,892-Speed 2629.27 samples/sec Loss 13.3272 LearningRate 0.0857 Epoch: 1 Global Step: 61690 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:39:53,791-Speed 2627.48 samples/sec Loss 13.3083 LearningRate 0.0857 Epoch: 1 Global Step: 61700 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:39:57,700-Speed 2619.96 samples/sec Loss 13.4825 LearningRate 0.0857 Epoch: 1 Global Step: 61710 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:40:01,595-Speed 2629.79 samples/sec Loss 13.2977 LearningRate 0.0857 Epoch: 1 Global Step: 61720 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:40:05,491-Speed 2628.68 samples/sec Loss 13.3586 LearningRate 0.0857 Epoch: 1 Global Step: 61730 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:40:09,403-Speed 2618.05 samples/sec Loss 13.2901 LearningRate 0.0857 Epoch: 1 Global Step: 61740 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:40:13,302-Speed 2626.97 samples/sec Loss 13.3651 LearningRate 0.0857 Epoch: 1 Global Step: 61750 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:40:17,197-Speed 2629.81 samples/sec Loss 13.2974 LearningRate 0.0857 Epoch: 1 Global Step: 61760 Fp16 Grad Scale: 32768 Required: 86 hours
Training: 2022-04-13 02:40:21,099-Speed 2625.08 samples/sec Loss 13.1946 LearningRate 0.0857 Epoch: 1 Global Step: 61770 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:40:25,000-Speed 2625.49 samples/sec Loss 13.2518 LearningRate 0.0857 Epoch: 1 Global Step: 61780 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:40:28,900-Speed 2626.20 samples/sec Loss 13.0861 LearningRate 0.0857 Epoch: 1 Global Step: 61790 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:40:32,802-Speed 2625.35 samples/sec Loss 13.3495 LearningRate 0.0857 Epoch: 1 Global Step: 61800 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:40:36,706-Speed 2623.30 samples/sec Loss 13.3345 LearningRate 0.0857 Epoch: 1 Global Step: 61810 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:40:40,613-Speed 2621.56 samples/sec Loss 13.3551 LearningRate 0.0857 Epoch: 1 Global Step: 61820 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:40:44,522-Speed 2620.55 samples/sec Loss 13.4505 LearningRate 0.0856 Epoch: 1 Global Step: 61830 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:40:48,423-Speed 2625.50 samples/sec Loss 13.3488 LearningRate 0.0856 Epoch: 1 Global Step: 61840 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:40:52,319-Speed 2629.06 samples/sec Loss 13.3414 LearningRate 0.0856 Epoch: 1 Global Step: 61850 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:40:56,216-Speed 2628.56 samples/sec Loss 13.3535 LearningRate 0.0856 Epoch: 1 Global Step: 61860 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:41:00,116-Speed 2626.21 samples/sec Loss 13.4096 LearningRate 0.0856 Epoch: 1 Global Step: 61870 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:41:04,013-Speed 2628.08 samples/sec Loss 13.4504 LearningRate 0.0856 Epoch: 1 Global Step: 61880 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:41:07,906-Speed 2630.85 samples/sec Loss 13.3691 LearningRate 0.0856 Epoch: 1 Global Step: 61890 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:41:11,806-Speed 2626.25 samples/sec Loss 13.3927 LearningRate 0.0856 Epoch: 1 Global Step: 61900 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:41:15,707-Speed 2626.00 samples/sec Loss 13.3726 LearningRate 0.0856 Epoch: 1 Global Step: 61910 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:41:19,602-Speed 2629.11 samples/sec Loss 13.3054 LearningRate 0.0856 Epoch: 1 Global Step: 61920 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:41:23,504-Speed 2625.38 samples/sec Loss 13.3516 LearningRate 0.0856 Epoch: 1 Global Step: 61930 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:41:27,407-Speed 2624.05 samples/sec Loss 13.3812 LearningRate 0.0856 Epoch: 1 Global Step: 61940 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:41:31,310-Speed 2624.47 samples/sec Loss 13.2315 LearningRate 0.0856 Epoch: 1 Global Step: 61950 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:41:35,205-Speed 2629.64 samples/sec Loss 13.3376 LearningRate 0.0856 Epoch: 1 Global Step: 61960 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:41:39,084-Speed 2640.31 samples/sec Loss 13.2351 LearningRate 0.0856 Epoch: 1 Global Step: 61970 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:41:42,980-Speed 2628.81 samples/sec Loss 13.2577 LearningRate 0.0856 Epoch: 1 Global Step: 61980 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:41:46,883-Speed 2623.87 samples/sec Loss 13.3860 LearningRate 0.0856 Epoch: 1 Global Step: 61990 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:41:50,790-Speed 2622.27 samples/sec Loss 13.4083 LearningRate 0.0856 Epoch: 1 Global Step: 62000 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:41:54,689-Speed 2626.31 samples/sec Loss 13.3286 LearningRate 0.0856 Epoch: 1 Global Step: 62010 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:41:58,582-Speed 2631.36 samples/sec Loss 13.2716 LearningRate 0.0856 Epoch: 1 Global Step: 62020 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:42:02,475-Speed 2630.73 samples/sec Loss 13.5264 LearningRate 0.0856 Epoch: 1 Global Step: 62030 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:42:06,372-Speed 2628.54 samples/sec Loss 13.4329 LearningRate 0.0856 Epoch: 1 Global Step: 62040 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:42:10,278-Speed 2622.41 samples/sec Loss 13.3321 LearningRate 0.0856 Epoch: 1 Global Step: 62050 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:42:14,170-Speed 2631.87 samples/sec Loss 13.3566 LearningRate 0.0856 Epoch: 1 Global Step: 62060 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:42:18,066-Speed 2628.56 samples/sec Loss 13.2477 LearningRate 0.0856 Epoch: 1 Global Step: 62070 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:42:21,962-Speed 2628.55 samples/sec Loss 13.3575 LearningRate 0.0856 Epoch: 1 Global Step: 62080 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:42:25,860-Speed 2627.64 samples/sec Loss 13.1080 LearningRate 0.0856 Epoch: 1 Global Step: 62090 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:42:29,755-Speed 2630.02 samples/sec Loss 13.2441 LearningRate 0.0856 Epoch: 1 Global Step: 62100 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:42:33,694-Speed 2600.20 samples/sec Loss 13.3547 LearningRate 0.0856 Epoch: 1 Global Step: 62110 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:42:37,591-Speed 2628.45 samples/sec Loss 13.3373 LearningRate 0.0856 Epoch: 1 Global Step: 62120 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:42:41,495-Speed 2623.49 samples/sec Loss 13.3848 LearningRate 0.0856 Epoch: 1 Global Step: 62130 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:42:45,390-Speed 2630.21 samples/sec Loss 13.4506 LearningRate 0.0856 Epoch: 1 Global Step: 62140 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:42:49,306-Speed 2615.20 samples/sec Loss 13.3077 LearningRate 0.0856 Epoch: 1 Global Step: 62150 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:42:53,200-Speed 2630.41 samples/sec Loss 13.2621 LearningRate 0.0856 Epoch: 1 Global Step: 62160 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:42:57,081-Speed 2638.97 samples/sec Loss 13.3578 LearningRate 0.0856 Epoch: 1 Global Step: 62170 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:43:00,968-Speed 2635.03 samples/sec Loss 13.1638 LearningRate 0.0856 Epoch: 1 Global Step: 62180 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:43:04,885-Speed 2614.64 samples/sec Loss 13.1644 LearningRate 0.0856 Epoch: 1 Global Step: 62190 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:43:08,997-Speed 2490.67 samples/sec Loss 13.4858 LearningRate 0.0856 Epoch: 1 Global Step: 62200 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:43:13,075-Speed 2512.15 samples/sec Loss 13.4663 LearningRate 0.0856 Epoch: 1 Global Step: 62210 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:43:17,147-Speed 2514.89 samples/sec Loss 13.4263 LearningRate 0.0856 Epoch: 1 Global Step: 62220 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:43:21,218-Speed 2515.94 samples/sec Loss 13.3769 LearningRate 0.0856 Epoch: 1 Global Step: 62230 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:43:25,302-Speed 2508.14 samples/sec Loss 13.4045 LearningRate 0.0856 Epoch: 1 Global Step: 62240 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:43:29,403-Speed 2497.74 samples/sec Loss 13.2615 LearningRate 0.0856 Epoch: 1 Global Step: 62250 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:43:33,500-Speed 2499.68 samples/sec Loss 13.3436 LearningRate 0.0856 Epoch: 1 Global Step: 62260 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:43:37,475-Speed 2576.80 samples/sec Loss 13.4121 LearningRate 0.0856 Epoch: 1 Global Step: 62270 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:43:41,393-Speed 2613.97 samples/sec Loss 13.2240 LearningRate 0.0855 Epoch: 1 Global Step: 62280 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:43:45,281-Speed 2634.32 samples/sec Loss 13.3269 LearningRate 0.0855 Epoch: 1 Global Step: 62290 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:43:49,169-Speed 2633.88 samples/sec Loss 13.1644 LearningRate 0.0855 Epoch: 1 Global Step: 62300 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:43:53,067-Speed 2627.80 samples/sec Loss 13.4990 LearningRate 0.0855 Epoch: 1 Global Step: 62310 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:43:56,962-Speed 2629.95 samples/sec Loss 13.3470 LearningRate 0.0855 Epoch: 1 Global Step: 62320 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:44:00,854-Speed 2632.00 samples/sec Loss 13.3298 LearningRate 0.0855 Epoch: 1 Global Step: 62330 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:44:04,930-Speed 2512.80 samples/sec Loss 13.3373 LearningRate 0.0855 Epoch: 1 Global Step: 62340 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:44:09,034-Speed 2495.45 samples/sec Loss 13.3387 LearningRate 0.0855 Epoch: 1 Global Step: 62350 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:44:13,129-Speed 2500.93 samples/sec Loss 13.3433 LearningRate 0.0855 Epoch: 1 Global Step: 62360 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:44:17,224-Speed 2501.43 samples/sec Loss 13.3681 LearningRate 0.0855 Epoch: 1 Global Step: 62370 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:44:21,243-Speed 2548.31 samples/sec Loss 13.3819 LearningRate 0.0855 Epoch: 1 Global Step: 62380 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:44:25,164-Speed 2611.82 samples/sec Loss 13.2493 LearningRate 0.0855 Epoch: 1 Global Step: 62390 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:44:29,065-Speed 2625.37 samples/sec Loss 13.2407 LearningRate 0.0855 Epoch: 1 Global Step: 62400 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:44:32,970-Speed 2623.09 samples/sec Loss 13.3635 LearningRate 0.0855 Epoch: 1 Global Step: 62410 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:44:36,850-Speed 2639.86 samples/sec Loss 13.2918 LearningRate 0.0855 Epoch: 1 Global Step: 62420 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:44:40,761-Speed 2619.19 samples/sec Loss 13.2567 LearningRate 0.0855 Epoch: 1 Global Step: 62430 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:44:44,658-Speed 2628.03 samples/sec Loss 13.2014 LearningRate 0.0855 Epoch: 1 Global Step: 62440 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:44:48,564-Speed 2622.42 samples/sec Loss 13.1810 LearningRate 0.0855 Epoch: 1 Global Step: 62450 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:44:52,475-Speed 2618.90 samples/sec Loss 13.2759 LearningRate 0.0855 Epoch: 1 Global Step: 62460 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:44:56,373-Speed 2627.55 samples/sec Loss 13.5381 LearningRate 0.0855 Epoch: 1 Global Step: 62470 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:45:00,266-Speed 2630.65 samples/sec Loss 13.2649 LearningRate 0.0855 Epoch: 1 Global Step: 62480 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:45:04,158-Speed 2631.56 samples/sec Loss 13.2111 LearningRate 0.0855 Epoch: 1 Global Step: 62490 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:45:08,053-Speed 2629.44 samples/sec Loss 13.0992 LearningRate 0.0855 Epoch: 1 Global Step: 62500 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:45:11,946-Speed 2631.38 samples/sec Loss 13.2741 LearningRate 0.0855 Epoch: 1 Global Step: 62510 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:45:15,822-Speed 2643.16 samples/sec Loss 13.4609 LearningRate 0.0855 Epoch: 1 Global Step: 62520 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:45:19,716-Speed 2630.07 samples/sec Loss 13.4005 LearningRate 0.0855 Epoch: 1 Global Step: 62530 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:45:23,607-Speed 2632.27 samples/sec Loss 13.3163 LearningRate 0.0855 Epoch: 1 Global Step: 62540 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:45:27,502-Speed 2629.84 samples/sec Loss 13.3730 LearningRate 0.0855 Epoch: 1 Global Step: 62550 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:45:31,395-Speed 2630.25 samples/sec Loss 13.3402 LearningRate 0.0855 Epoch: 1 Global Step: 62560 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:45:35,293-Speed 2627.43 samples/sec Loss 13.2794 LearningRate 0.0855 Epoch: 1 Global Step: 62570 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:45:39,201-Speed 2621.02 samples/sec Loss 13.1263 LearningRate 0.0855 Epoch: 1 Global Step: 62580 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:45:43,107-Speed 2622.94 samples/sec Loss 13.3723 LearningRate 0.0855 Epoch: 1 Global Step: 62590 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:45:47,007-Speed 2626.71 samples/sec Loss 13.3692 LearningRate 0.0855 Epoch: 1 Global Step: 62600 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:45:50,944-Speed 2601.78 samples/sec Loss 13.4483 LearningRate 0.0855 Epoch: 1 Global Step: 62610 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:45:54,869-Speed 2609.41 samples/sec Loss 13.3519 LearningRate 0.0855 Epoch: 1 Global Step: 62620 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:45:58,771-Speed 2624.93 samples/sec Loss 13.2437 LearningRate 0.0855 Epoch: 1 Global Step: 62630 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:46:02,659-Speed 2633.98 samples/sec Loss 13.3831 LearningRate 0.0855 Epoch: 1 Global Step: 62640 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:46:06,561-Speed 2625.07 samples/sec Loss 13.3844 LearningRate 0.0855 Epoch: 1 Global Step: 62650 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:46:10,464-Speed 2623.86 samples/sec Loss 13.3966 LearningRate 0.0855 Epoch: 1 Global Step: 62660 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:46:14,373-Speed 2620.83 samples/sec Loss 13.3071 LearningRate 0.0855 Epoch: 1 Global Step: 62670 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:46:18,271-Speed 2627.15 samples/sec Loss 13.3330 LearningRate 0.0855 Epoch: 1 Global Step: 62680 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:46:22,166-Speed 2630.83 samples/sec Loss 13.2805 LearningRate 0.0855 Epoch: 1 Global Step: 62690 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:46:26,061-Speed 2629.43 samples/sec Loss 13.1379 LearningRate 0.0855 Epoch: 1 Global Step: 62700 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:46:29,971-Speed 2619.41 samples/sec Loss 13.3976 LearningRate 0.0855 Epoch: 1 Global Step: 62710 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:46:33,867-Speed 2628.94 samples/sec Loss 13.3504 LearningRate 0.0855 Epoch: 1 Global Step: 62720 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:46:37,765-Speed 2627.23 samples/sec Loss 13.2137 LearningRate 0.0854 Epoch: 1 Global Step: 62730 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:46:41,653-Speed 2634.13 samples/sec Loss 13.2830 LearningRate 0.0854 Epoch: 1 Global Step: 62740 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:46:45,554-Speed 2625.70 samples/sec Loss 13.3661 LearningRate 0.0854 Epoch: 1 Global Step: 62750 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:46:49,452-Speed 2628.11 samples/sec Loss 13.2664 LearningRate 0.0854 Epoch: 1 Global Step: 62760 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:46:53,358-Speed 2622.48 samples/sec Loss 13.3071 LearningRate 0.0854 Epoch: 1 Global Step: 62770 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:46:57,250-Speed 2631.47 samples/sec Loss 13.2854 LearningRate 0.0854 Epoch: 1 Global Step: 62780 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:47:01,147-Speed 2628.41 samples/sec Loss 13.2111 LearningRate 0.0854 Epoch: 1 Global Step: 62790 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:47:05,046-Speed 2626.89 samples/sec Loss 13.1254 LearningRate 0.0854 Epoch: 1 Global Step: 62800 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:47:08,943-Speed 2627.80 samples/sec Loss 13.2592 LearningRate 0.0854 Epoch: 1 Global Step: 62810 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:47:12,843-Speed 2626.10 samples/sec Loss 13.2485 LearningRate 0.0854 Epoch: 1 Global Step: 62820 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:47:16,745-Speed 2624.95 samples/sec Loss 13.3103 LearningRate 0.0854 Epoch: 1 Global Step: 62830 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:47:20,642-Speed 2628.60 samples/sec Loss 13.2756 LearningRate 0.0854 Epoch: 1 Global Step: 62840 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:47:24,522-Speed 2639.34 samples/sec Loss 13.2002 LearningRate 0.0854 Epoch: 1 Global Step: 62850 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:47:28,443-Speed 2612.66 samples/sec Loss 13.3378 LearningRate 0.0854 Epoch: 1 Global Step: 62860 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:47:32,359-Speed 2615.20 samples/sec Loss 13.3392 LearningRate 0.0854 Epoch: 1 Global Step: 62870 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:47:36,255-Speed 2628.90 samples/sec Loss 13.3517 LearningRate 0.0854 Epoch: 1 Global Step: 62880 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:47:40,153-Speed 2627.83 samples/sec Loss 13.3786 LearningRate 0.0854 Epoch: 1 Global Step: 62890 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:47:44,063-Speed 2619.19 samples/sec Loss 13.3157 LearningRate 0.0854 Epoch: 1 Global Step: 62900 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:47:47,960-Speed 2628.26 samples/sec Loss 13.3324 LearningRate 0.0854 Epoch: 1 Global Step: 62910 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:47:51,859-Speed 2627.28 samples/sec Loss 13.3813 LearningRate 0.0854 Epoch: 1 Global Step: 62920 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:47:55,757-Speed 2627.00 samples/sec Loss 13.2804 LearningRate 0.0854 Epoch: 1 Global Step: 62930 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:47:59,656-Speed 2627.92 samples/sec Loss 13.3636 LearningRate 0.0854 Epoch: 1 Global Step: 62940 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:48:03,560-Speed 2623.25 samples/sec Loss 13.2214 LearningRate 0.0854 Epoch: 1 Global Step: 62950 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:48:07,463-Speed 2624.59 samples/sec Loss 13.3250 LearningRate 0.0854 Epoch: 1 Global Step: 62960 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:48:11,369-Speed 2622.00 samples/sec Loss 13.3327 LearningRate 0.0854 Epoch: 1 Global Step: 62970 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:48:15,266-Speed 2629.07 samples/sec Loss 13.3397 LearningRate 0.0854 Epoch: 1 Global Step: 62980 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:48:19,164-Speed 2627.62 samples/sec Loss 13.4734 LearningRate 0.0854 Epoch: 1 Global Step: 62990 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:48:23,062-Speed 2627.26 samples/sec Loss 13.1917 LearningRate 0.0854 Epoch: 1 Global Step: 63000 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:48:26,943-Speed 2639.42 samples/sec Loss 13.3711 LearningRate 0.0854 Epoch: 1 Global Step: 63010 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:48:30,843-Speed 2626.02 samples/sec Loss 13.2464 LearningRate 0.0854 Epoch: 1 Global Step: 63020 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:48:34,746-Speed 2624.41 samples/sec Loss 13.2740 LearningRate 0.0854 Epoch: 1 Global Step: 63030 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:48:38,643-Speed 2628.15 samples/sec Loss 13.3408 LearningRate 0.0854 Epoch: 1 Global Step: 63040 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:48:42,664-Speed 2547.69 samples/sec Loss 13.3869 LearningRate 0.0854 Epoch: 1 Global Step: 63050 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:48:46,757-Speed 2502.07 samples/sec Loss 13.1529 LearningRate 0.0854 Epoch: 1 Global Step: 63060 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:48:50,694-Speed 2601.46 samples/sec Loss 13.2538 LearningRate 0.0854 Epoch: 1 Global Step: 63070 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:48:54,601-Speed 2621.70 samples/sec Loss 13.6472 LearningRate 0.0854 Epoch: 1 Global Step: 63080 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:48:58,514-Speed 2617.16 samples/sec Loss 13.3653 LearningRate 0.0854 Epoch: 1 Global Step: 63090 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:49:02,416-Speed 2625.06 samples/sec Loss 13.2928 LearningRate 0.0854 Epoch: 1 Global Step: 63100 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:49:06,317-Speed 2625.80 samples/sec Loss 13.3797 LearningRate 0.0854 Epoch: 1 Global Step: 63110 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:49:10,196-Speed 2640.48 samples/sec Loss 13.3263 LearningRate 0.0854 Epoch: 1 Global Step: 63120 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:49:14,098-Speed 2624.94 samples/sec Loss 13.1899 LearningRate 0.0854 Epoch: 1 Global Step: 63130 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:49:18,000-Speed 2624.95 samples/sec Loss 13.3389 LearningRate 0.0854 Epoch: 1 Global Step: 63140 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:49:21,897-Speed 2628.20 samples/sec Loss 13.4370 LearningRate 0.0854 Epoch: 1 Global Step: 63150 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:49:25,806-Speed 2620.39 samples/sec Loss 13.3514 LearningRate 0.0854 Epoch: 1 Global Step: 63160 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:49:29,705-Speed 2626.57 samples/sec Loss 13.2515 LearningRate 0.0854 Epoch: 1 Global Step: 63170 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:49:33,601-Speed 2628.97 samples/sec Loss 13.2063 LearningRate 0.0853 Epoch: 1 Global Step: 63180 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:49:37,497-Speed 2628.66 samples/sec Loss 13.3091 LearningRate 0.0853 Epoch: 1 Global Step: 63190 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:49:41,399-Speed 2624.99 samples/sec Loss 13.2220 LearningRate 0.0853 Epoch: 1 Global Step: 63200 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:49:45,318-Speed 2613.49 samples/sec Loss 13.2963 LearningRate 0.0853 Epoch: 1 Global Step: 63210 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:49:49,240-Speed 2611.51 samples/sec Loss 13.2540 LearningRate 0.0853 Epoch: 1 Global Step: 63220 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:49:53,127-Speed 2635.36 samples/sec Loss 13.3683 LearningRate 0.0853 Epoch: 1 Global Step: 63230 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:49:57,024-Speed 2627.83 samples/sec Loss 13.3567 LearningRate 0.0853 Epoch: 1 Global Step: 63240 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:50:00,925-Speed 2626.01 samples/sec Loss 13.2165 LearningRate 0.0853 Epoch: 1 Global Step: 63250 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:50:04,822-Speed 2627.90 samples/sec Loss 13.2742 LearningRate 0.0853 Epoch: 1 Global Step: 63260 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:50:08,718-Speed 2628.80 samples/sec Loss 13.3632 LearningRate 0.0853 Epoch: 1 Global Step: 63270 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:50:12,618-Speed 2626.35 samples/sec Loss 13.3971 LearningRate 0.0853 Epoch: 1 Global Step: 63280 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:50:16,516-Speed 2627.80 samples/sec Loss 13.2626 LearningRate 0.0853 Epoch: 1 Global Step: 63290 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:50:20,417-Speed 2625.37 samples/sec Loss 13.4275 LearningRate 0.0853 Epoch: 1 Global Step: 63300 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:50:24,319-Speed 2625.62 samples/sec Loss 13.2726 LearningRate 0.0853 Epoch: 1 Global Step: 63310 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:50:28,228-Speed 2619.92 samples/sec Loss 13.2992 LearningRate 0.0853 Epoch: 1 Global Step: 63320 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:50:32,129-Speed 2625.52 samples/sec Loss 13.1742 LearningRate 0.0853 Epoch: 1 Global Step: 63330 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:50:36,026-Speed 2628.16 samples/sec Loss 13.4018 LearningRate 0.0853 Epoch: 1 Global Step: 63340 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:50:39,908-Speed 2638.05 samples/sec Loss 13.2481 LearningRate 0.0853 Epoch: 1 Global Step: 63350 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:50:43,812-Speed 2623.92 samples/sec Loss 13.2596 LearningRate 0.0853 Epoch: 1 Global Step: 63360 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:50:47,713-Speed 2626.41 samples/sec Loss 13.3129 LearningRate 0.0853 Epoch: 1 Global Step: 63370 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:50:51,619-Speed 2621.71 samples/sec Loss 13.3258 LearningRate 0.0853 Epoch: 1 Global Step: 63380 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:50:55,527-Speed 2621.27 samples/sec Loss 13.3430 LearningRate 0.0853 Epoch: 1 Global Step: 63390 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:50:59,421-Speed 2629.93 samples/sec Loss 13.1516 LearningRate 0.0853 Epoch: 1 Global Step: 63400 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:51:03,340-Speed 2613.95 samples/sec Loss 13.3329 LearningRate 0.0853 Epoch: 1 Global Step: 63410 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:51:07,250-Speed 2619.06 samples/sec Loss 13.3731 LearningRate 0.0853 Epoch: 1 Global Step: 63420 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:51:11,154-Speed 2623.97 samples/sec Loss 13.2253 LearningRate 0.0853 Epoch: 1 Global Step: 63430 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:51:15,065-Speed 2618.71 samples/sec Loss 13.3533 LearningRate 0.0853 Epoch: 1 Global Step: 63440 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:51:18,960-Speed 2629.04 samples/sec Loss 13.1676 LearningRate 0.0853 Epoch: 1 Global Step: 63450 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:51:22,857-Speed 2628.53 samples/sec Loss 13.3289 LearningRate 0.0853 Epoch: 1 Global Step: 63460 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:51:26,754-Speed 2628.31 samples/sec Loss 13.2518 LearningRate 0.0853 Epoch: 1 Global Step: 63470 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:51:30,663-Speed 2620.22 samples/sec Loss 13.4082 LearningRate 0.0853 Epoch: 1 Global Step: 63480 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:51:34,571-Speed 2620.90 samples/sec Loss 13.2805 LearningRate 0.0853 Epoch: 1 Global Step: 63490 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:51:38,473-Speed 2625.25 samples/sec Loss 13.2677 LearningRate 0.0853 Epoch: 1 Global Step: 63500 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:51:42,378-Speed 2622.73 samples/sec Loss 13.3747 LearningRate 0.0853 Epoch: 1 Global Step: 63510 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:51:46,292-Speed 2616.62 samples/sec Loss 13.3463 LearningRate 0.0853 Epoch: 1 Global Step: 63520 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:51:50,198-Speed 2622.06 samples/sec Loss 13.3993 LearningRate 0.0853 Epoch: 1 Global Step: 63530 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:51:54,090-Speed 2631.97 samples/sec Loss 13.2181 LearningRate 0.0853 Epoch: 1 Global Step: 63540 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:51:57,991-Speed 2624.97 samples/sec Loss 13.3999 LearningRate 0.0853 Epoch: 1 Global Step: 63550 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:52:01,879-Speed 2635.09 samples/sec Loss 13.4173 LearningRate 0.0853 Epoch: 1 Global Step: 63560 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:52:05,855-Speed 2575.44 samples/sec Loss 13.3583 LearningRate 0.0853 Epoch: 1 Global Step: 63570 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:52:09,753-Speed 2627.70 samples/sec Loss 13.3676 LearningRate 0.0853 Epoch: 1 Global Step: 63580 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:52:13,649-Speed 2628.90 samples/sec Loss 13.2815 LearningRate 0.0853 Epoch: 1 Global Step: 63590 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:52:17,547-Speed 2627.58 samples/sec Loss 13.2495 LearningRate 0.0853 Epoch: 1 Global Step: 63600 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:52:21,440-Speed 2631.45 samples/sec Loss 13.1462 LearningRate 0.0853 Epoch: 1 Global Step: 63610 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:52:25,333-Speed 2630.60 samples/sec Loss 13.3134 LearningRate 0.0853 Epoch: 1 Global Step: 63620 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:52:29,230-Speed 2628.59 samples/sec Loss 13.2686 LearningRate 0.0852 Epoch: 1 Global Step: 63630 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:52:33,147-Speed 2614.78 samples/sec Loss 13.2918 LearningRate 0.0852 Epoch: 1 Global Step: 63640 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:52:37,079-Speed 2604.16 samples/sec Loss 13.1984 LearningRate 0.0852 Epoch: 1 Global Step: 63650 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:52:40,983-Speed 2624.00 samples/sec Loss 13.2432 LearningRate 0.0852 Epoch: 1 Global Step: 63660 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:52:44,895-Speed 2618.04 samples/sec Loss 13.3672 LearningRate 0.0852 Epoch: 1 Global Step: 63670 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:52:48,791-Speed 2629.02 samples/sec Loss 13.3607 LearningRate 0.0852 Epoch: 1 Global Step: 63680 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:52:52,689-Speed 2628.22 samples/sec Loss 13.3508 LearningRate 0.0852 Epoch: 1 Global Step: 63690 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:52:56,580-Speed 2631.68 samples/sec Loss 13.2638 LearningRate 0.0852 Epoch: 1 Global Step: 63700 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:53:00,476-Speed 2629.04 samples/sec Loss 13.2811 LearningRate 0.0852 Epoch: 1 Global Step: 63710 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:53:04,376-Speed 2626.41 samples/sec Loss 13.3088 LearningRate 0.0852 Epoch: 1 Global Step: 63720 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:53:08,272-Speed 2628.93 samples/sec Loss 13.0835 LearningRate 0.0852 Epoch: 1 Global Step: 63730 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:53:12,185-Speed 2617.39 samples/sec Loss 13.1615 LearningRate 0.0852 Epoch: 1 Global Step: 63740 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:53:16,069-Speed 2636.90 samples/sec Loss 13.2394 LearningRate 0.0852 Epoch: 1 Global Step: 63750 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:53:19,962-Speed 2630.79 samples/sec Loss 13.3251 LearningRate 0.0852 Epoch: 1 Global Step: 63760 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:53:23,857-Speed 2629.84 samples/sec Loss 13.3367 LearningRate 0.0852 Epoch: 1 Global Step: 63770 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:53:27,752-Speed 2629.50 samples/sec Loss 13.3794 LearningRate 0.0852 Epoch: 1 Global Step: 63780 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:53:31,652-Speed 2626.86 samples/sec Loss 13.3238 LearningRate 0.0852 Epoch: 1 Global Step: 63790 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:53:35,549-Speed 2627.84 samples/sec Loss 13.2127 LearningRate 0.0852 Epoch: 1 Global Step: 63800 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:53:39,447-Speed 2627.85 samples/sec Loss 13.3512 LearningRate 0.0852 Epoch: 1 Global Step: 63810 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:53:43,341-Speed 2630.15 samples/sec Loss 13.2691 LearningRate 0.0852 Epoch: 1 Global Step: 63820 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:53:47,241-Speed 2626.50 samples/sec Loss 13.2478 LearningRate 0.0852 Epoch: 1 Global Step: 63830 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:53:51,120-Speed 2640.03 samples/sec Loss 13.2521 LearningRate 0.0852 Epoch: 1 Global Step: 63840 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:53:55,018-Speed 2627.33 samples/sec Loss 13.1363 LearningRate 0.0852 Epoch: 1 Global Step: 63850 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:53:58,927-Speed 2620.85 samples/sec Loss 13.3038 LearningRate 0.0852 Epoch: 1 Global Step: 63860 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:54:02,843-Speed 2615.44 samples/sec Loss 13.3626 LearningRate 0.0852 Epoch: 1 Global Step: 63870 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:54:06,743-Speed 2626.04 samples/sec Loss 13.2959 LearningRate 0.0852 Epoch: 1 Global Step: 63880 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:54:10,645-Speed 2625.09 samples/sec Loss 13.3655 LearningRate 0.0852 Epoch: 1 Global Step: 63890 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:54:14,544-Speed 2626.66 samples/sec Loss 13.2218 LearningRate 0.0852 Epoch: 1 Global Step: 63900 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:54:18,437-Speed 2631.01 samples/sec Loss 13.3349 LearningRate 0.0852 Epoch: 1 Global Step: 63910 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:54:22,335-Speed 2627.47 samples/sec Loss 13.2666 LearningRate 0.0852 Epoch: 1 Global Step: 63920 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:54:26,236-Speed 2626.12 samples/sec Loss 13.2355 LearningRate 0.0852 Epoch: 1 Global Step: 63930 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:54:30,131-Speed 2628.90 samples/sec Loss 13.3930 LearningRate 0.0852 Epoch: 1 Global Step: 63940 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:54:34,018-Speed 2635.20 samples/sec Loss 13.1191 LearningRate 0.0852 Epoch: 1 Global Step: 63950 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:54:37,916-Speed 2627.29 samples/sec Loss 13.1780 LearningRate 0.0852 Epoch: 1 Global Step: 63960 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:54:41,809-Speed 2631.40 samples/sec Loss 13.2333 LearningRate 0.0852 Epoch: 1 Global Step: 63970 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:54:45,691-Speed 2638.80 samples/sec Loss 13.4094 LearningRate 0.0852 Epoch: 1 Global Step: 63980 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:54:49,590-Speed 2626.66 samples/sec Loss 13.3402 LearningRate 0.0852 Epoch: 1 Global Step: 63990 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:54:53,492-Speed 2625.24 samples/sec Loss 13.4194 LearningRate 0.0852 Epoch: 1 Global Step: 64000 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:54:57,398-Speed 2621.68 samples/sec Loss 13.3158 LearningRate 0.0852 Epoch: 1 Global Step: 64010 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:55:01,292-Speed 2630.00 samples/sec Loss 13.3464 LearningRate 0.0852 Epoch: 1 Global Step: 64020 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:55:05,189-Speed 2628.31 samples/sec Loss 13.2994 LearningRate 0.0852 Epoch: 1 Global Step: 64030 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:55:09,083-Speed 2630.68 samples/sec Loss 13.1751 LearningRate 0.0852 Epoch: 1 Global Step: 64040 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:55:12,979-Speed 2628.37 samples/sec Loss 13.1872 LearningRate 0.0852 Epoch: 1 Global Step: 64050 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:55:16,876-Speed 2635.68 samples/sec Loss 13.3201 LearningRate 0.0852 Epoch: 1 Global Step: 64060 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:55:20,772-Speed 2628.76 samples/sec Loss 13.2712 LearningRate 0.0852 Epoch: 1 Global Step: 64070 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:55:24,668-Speed 2629.27 samples/sec Loss 13.3779 LearningRate 0.0851 Epoch: 1 Global Step: 64080 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:55:28,605-Speed 2601.35 samples/sec Loss 13.2116 LearningRate 0.0851 Epoch: 1 Global Step: 64090 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:55:32,539-Speed 2603.48 samples/sec Loss 13.2942 LearningRate 0.0851 Epoch: 1 Global Step: 64100 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:55:36,437-Speed 2627.76 samples/sec Loss 13.3646 LearningRate 0.0851 Epoch: 1 Global Step: 64110 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:55:40,331-Speed 2630.42 samples/sec Loss 13.3101 LearningRate 0.0851 Epoch: 1 Global Step: 64120 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:55:44,231-Speed 2626.06 samples/sec Loss 13.2955 LearningRate 0.0851 Epoch: 1 Global Step: 64130 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:55:48,127-Speed 2628.66 samples/sec Loss 13.1928 LearningRate 0.0851 Epoch: 1 Global Step: 64140 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:55:52,027-Speed 2626.15 samples/sec Loss 13.1853 LearningRate 0.0851 Epoch: 1 Global Step: 64150 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:55:55,924-Speed 2629.26 samples/sec Loss 13.4107 LearningRate 0.0851 Epoch: 1 Global Step: 64160 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:55:59,824-Speed 2626.13 samples/sec Loss 13.2075 LearningRate 0.0851 Epoch: 1 Global Step: 64170 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:56:03,718-Speed 2630.08 samples/sec Loss 13.1581 LearningRate 0.0851 Epoch: 1 Global Step: 64180 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 02:56:07,612-Speed 2630.09 samples/sec Loss 13.3761 LearningRate 0.0851 Epoch: 1 Global Step: 64190 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:56:11,505-Speed 2630.60 samples/sec Loss 13.1109 LearningRate 0.0851 Epoch: 1 Global Step: 64200 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:56:15,399-Speed 2630.13 samples/sec Loss 13.1551 LearningRate 0.0851 Epoch: 1 Global Step: 64210 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:56:19,297-Speed 2627.82 samples/sec Loss 13.2896 LearningRate 0.0851 Epoch: 1 Global Step: 64220 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:56:23,191-Speed 2630.35 samples/sec Loss 13.3608 LearningRate 0.0851 Epoch: 1 Global Step: 64230 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:56:27,082-Speed 2633.08 samples/sec Loss 13.2728 LearningRate 0.0851 Epoch: 1 Global Step: 64240 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:56:30,974-Speed 2631.75 samples/sec Loss 13.2110 LearningRate 0.0851 Epoch: 1 Global Step: 64250 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:56:34,868-Speed 2629.97 samples/sec Loss 13.2587 LearningRate 0.0851 Epoch: 1 Global Step: 64260 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:56:38,761-Speed 2631.02 samples/sec Loss 13.2071 LearningRate 0.0851 Epoch: 1 Global Step: 64270 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:56:42,661-Speed 2626.12 samples/sec Loss 13.2521 LearningRate 0.0851 Epoch: 1 Global Step: 64280 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:56:46,559-Speed 2628.15 samples/sec Loss 13.3925 LearningRate 0.0851 Epoch: 1 Global Step: 64290 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:56:50,439-Speed 2639.28 samples/sec Loss 13.1689 LearningRate 0.0851 Epoch: 1 Global Step: 64300 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:56:54,332-Speed 2631.24 samples/sec Loss 13.2020 LearningRate 0.0851 Epoch: 1 Global Step: 64310 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:56:58,243-Speed 2618.50 samples/sec Loss 13.2640 LearningRate 0.0851 Epoch: 1 Global Step: 64320 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:57:02,157-Speed 2616.98 samples/sec Loss 13.2457 LearningRate 0.0851 Epoch: 1 Global Step: 64330 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:57:06,065-Speed 2620.75 samples/sec Loss 13.2606 LearningRate 0.0851 Epoch: 1 Global Step: 64340 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:57:09,965-Speed 2626.85 samples/sec Loss 13.2554 LearningRate 0.0851 Epoch: 1 Global Step: 64350 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:57:13,861-Speed 2628.70 samples/sec Loss 13.1444 LearningRate 0.0851 Epoch: 1 Global Step: 64360 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:57:17,761-Speed 2626.32 samples/sec Loss 13.2717 LearningRate 0.0851 Epoch: 1 Global Step: 64370 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:57:21,659-Speed 2627.99 samples/sec Loss 13.2657 LearningRate 0.0851 Epoch: 1 Global Step: 64380 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:57:25,559-Speed 2625.79 samples/sec Loss 13.2508 LearningRate 0.0851 Epoch: 1 Global Step: 64390 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:57:29,459-Speed 2626.77 samples/sec Loss 13.1680 LearningRate 0.0851 Epoch: 1 Global Step: 64400 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:57:33,354-Speed 2629.36 samples/sec Loss 13.3435 LearningRate 0.0851 Epoch: 1 Global Step: 64410 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:57:37,248-Speed 2630.27 samples/sec Loss 13.2497 LearningRate 0.0851 Epoch: 1 Global Step: 64420 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:57:41,145-Speed 2628.29 samples/sec Loss 13.1946 LearningRate 0.0851 Epoch: 1 Global Step: 64430 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:57:45,024-Speed 2641.11 samples/sec Loss 13.2616 LearningRate 0.0851 Epoch: 1 Global Step: 64440 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:57:48,917-Speed 2630.30 samples/sec Loss 13.3695 LearningRate 0.0851 Epoch: 1 Global Step: 64450 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:57:52,810-Speed 2631.20 samples/sec Loss 13.4091 LearningRate 0.0851 Epoch: 1 Global Step: 64460 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:57:56,707-Speed 2628.29 samples/sec Loss 13.2849 LearningRate 0.0851 Epoch: 1 Global Step: 64470 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:58:00,628-Speed 2612.63 samples/sec Loss 13.3470 LearningRate 0.0851 Epoch: 1 Global Step: 64480 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:58:04,524-Speed 2628.87 samples/sec Loss 13.3042 LearningRate 0.0851 Epoch: 1 Global Step: 64490 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:58:08,427-Speed 2624.62 samples/sec Loss 13.2841 LearningRate 0.0851 Epoch: 1 Global Step: 64500 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:58:12,335-Speed 2620.71 samples/sec Loss 13.4183 LearningRate 0.0851 Epoch: 1 Global Step: 64510 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:58:16,230-Speed 2629.68 samples/sec Loss 13.2285 LearningRate 0.0851 Epoch: 1 Global Step: 64520 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:58:20,125-Speed 2629.88 samples/sec Loss 13.1408 LearningRate 0.0850 Epoch: 1 Global Step: 64530 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:58:24,019-Speed 2630.30 samples/sec Loss 13.2651 LearningRate 0.0850 Epoch: 1 Global Step: 64540 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:58:27,918-Speed 2626.62 samples/sec Loss 13.1190 LearningRate 0.0850 Epoch: 1 Global Step: 64550 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:58:31,828-Speed 2620.08 samples/sec Loss 13.1849 LearningRate 0.0850 Epoch: 1 Global Step: 64560 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:58:35,724-Speed 2629.03 samples/sec Loss 13.3745 LearningRate 0.0850 Epoch: 1 Global Step: 64570 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:58:39,627-Speed 2623.91 samples/sec Loss 13.3602 LearningRate 0.0850 Epoch: 1 Global Step: 64580 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:58:43,522-Speed 2629.84 samples/sec Loss 13.3493 LearningRate 0.0850 Epoch: 1 Global Step: 64590 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:58:47,418-Speed 2629.12 samples/sec Loss 13.2922 LearningRate 0.0850 Epoch: 1 Global Step: 64600 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:58:51,341-Speed 2610.75 samples/sec Loss 13.2899 LearningRate 0.0850 Epoch: 1 Global Step: 64610 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:58:55,239-Speed 2627.77 samples/sec Loss 13.3618 LearningRate 0.0850 Epoch: 1 Global Step: 64620 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:58:59,142-Speed 2624.28 samples/sec Loss 13.2760 LearningRate 0.0850 Epoch: 1 Global Step: 64630 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:59:03,052-Speed 2619.06 samples/sec Loss 13.1646 LearningRate 0.0850 Epoch: 1 Global Step: 64640 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:59:06,951-Speed 2626.80 samples/sec Loss 13.1048 LearningRate 0.0850 Epoch: 1 Global Step: 64650 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:59:10,851-Speed 2626.65 samples/sec Loss 13.2173 LearningRate 0.0850 Epoch: 1 Global Step: 64660 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:59:14,758-Speed 2622.14 samples/sec Loss 13.2687 LearningRate 0.0850 Epoch: 1 Global Step: 64670 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:59:18,653-Speed 2629.03 samples/sec Loss 13.1877 LearningRate 0.0850 Epoch: 1 Global Step: 64680 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:59:22,553-Speed 2626.71 samples/sec Loss 13.1900 LearningRate 0.0850 Epoch: 1 Global Step: 64690 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:59:26,446-Speed 2630.66 samples/sec Loss 13.2228 LearningRate 0.0850 Epoch: 1 Global Step: 64700 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 02:59:30,324-Speed 2641.41 samples/sec Loss 13.1976 LearningRate 0.0850 Epoch: 1 Global Step: 64710 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:59:34,220-Speed 2629.05 samples/sec Loss 13.2596 LearningRate 0.0850 Epoch: 1 Global Step: 64720 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:59:38,116-Speed 2628.55 samples/sec Loss 13.2388 LearningRate 0.0850 Epoch: 1 Global Step: 64730 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:59:42,016-Speed 2626.42 samples/sec Loss 13.2333 LearningRate 0.0850 Epoch: 1 Global Step: 64740 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:59:45,912-Speed 2629.32 samples/sec Loss 13.3068 LearningRate 0.0850 Epoch: 1 Global Step: 64750 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:59:49,807-Speed 2629.11 samples/sec Loss 13.4075 LearningRate 0.0850 Epoch: 1 Global Step: 64760 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:59:53,701-Speed 2630.68 samples/sec Loss 13.1788 LearningRate 0.0850 Epoch: 1 Global Step: 64770 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 02:59:57,597-Speed 2628.77 samples/sec Loss 13.3232 LearningRate 0.0850 Epoch: 1 Global Step: 64780 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:00:01,523-Speed 2609.10 samples/sec Loss 13.3398 LearningRate 0.0850 Epoch: 1 Global Step: 64790 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:00:05,417-Speed 2630.41 samples/sec Loss 13.2695 LearningRate 0.0850 Epoch: 1 Global Step: 64800 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:00:09,312-Speed 2629.46 samples/sec Loss 13.2503 LearningRate 0.0850 Epoch: 1 Global Step: 64810 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:00:13,209-Speed 2628.17 samples/sec Loss 13.2499 LearningRate 0.0850 Epoch: 1 Global Step: 64820 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:00:17,090-Speed 2639.64 samples/sec Loss 13.1672 LearningRate 0.0850 Epoch: 1 Global Step: 64830 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:00:20,993-Speed 2624.55 samples/sec Loss 13.4367 LearningRate 0.0850 Epoch: 1 Global Step: 64840 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:00:24,892-Speed 2626.46 samples/sec Loss 13.1942 LearningRate 0.0850 Epoch: 1 Global Step: 64850 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:00:28,797-Speed 2622.81 samples/sec Loss 13.3154 LearningRate 0.0850 Epoch: 1 Global Step: 64860 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:00:32,698-Speed 2625.65 samples/sec Loss 13.2392 LearningRate 0.0850 Epoch: 1 Global Step: 64870 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:00:36,601-Speed 2625.06 samples/sec Loss 13.1243 LearningRate 0.0850 Epoch: 1 Global Step: 64880 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:00:40,498-Speed 2628.24 samples/sec Loss 13.1938 LearningRate 0.0850 Epoch: 1 Global Step: 64890 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:00:44,399-Speed 2625.01 samples/sec Loss 13.2727 LearningRate 0.0850 Epoch: 1 Global Step: 64900 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:00:48,296-Speed 2628.86 samples/sec Loss 13.2447 LearningRate 0.0850 Epoch: 1 Global Step: 64910 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:00:52,246-Speed 2593.12 samples/sec Loss 13.2131 LearningRate 0.0850 Epoch: 1 Global Step: 64920 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:00:56,303-Speed 2524.53 samples/sec Loss 13.2230 LearningRate 0.0850 Epoch: 1 Global Step: 64930 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:01:00,181-Speed 2641.30 samples/sec Loss 13.1195 LearningRate 0.0850 Epoch: 1 Global Step: 64940 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:01:04,083-Speed 2624.79 samples/sec Loss 13.2710 LearningRate 0.0850 Epoch: 1 Global Step: 64950 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:01:07,980-Speed 2628.71 samples/sec Loss 13.2584 LearningRate 0.0850 Epoch: 1 Global Step: 64960 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:01:11,920-Speed 2627.14 samples/sec Loss 13.2530 LearningRate 0.0850 Epoch: 1 Global Step: 64970 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:01:15,832-Speed 2617.95 samples/sec Loss 13.0314 LearningRate 0.0849 Epoch: 1 Global Step: 64980 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:01:19,754-Speed 2648.66 samples/sec Loss 13.1973 LearningRate 0.0849 Epoch: 1 Global Step: 64990 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:01:23,647-Speed 2631.33 samples/sec Loss 13.2046 LearningRate 0.0849 Epoch: 1 Global Step: 65000 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:01:28,264-Speed 2636.16 samples/sec Loss 13.1232 LearningRate 0.0849 Epoch: 1 Global Step: 65010 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:01:32,180-Speed 2615.23 samples/sec Loss 13.0974 LearningRate 0.0849 Epoch: 1 Global Step: 65020 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:01:36,077-Speed 2628.49 samples/sec Loss 13.2505 LearningRate 0.0849 Epoch: 1 Global Step: 65030 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:01:39,968-Speed 2633.02 samples/sec Loss 13.2375 LearningRate 0.0849 Epoch: 1 Global Step: 65040 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:01:43,861-Speed 2630.98 samples/sec Loss 13.3354 LearningRate 0.0849 Epoch: 1 Global Step: 65050 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:01:49,353-Speed 2639.93 samples/sec Loss 13.1588 LearningRate 0.0849 Epoch: 1 Global Step: 65060 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:01:53,263-Speed 2619.65 samples/sec Loss 13.2779 LearningRate 0.0849 Epoch: 1 Global Step: 65070 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:01:57,155-Speed 2632.29 samples/sec Loss 13.1475 LearningRate 0.0849 Epoch: 1 Global Step: 65080 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:02:01,060-Speed 2622.87 samples/sec Loss 13.0752 LearningRate 0.0849 Epoch: 1 Global Step: 65090 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:02:05,000-Speed 2599.42 samples/sec Loss 13.0956 LearningRate 0.0849 Epoch: 1 Global Step: 65100 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:02:08,977-Speed 2575.46 samples/sec Loss 13.2936 LearningRate 0.0849 Epoch: 1 Global Step: 65110 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:02:12,875-Speed 2627.86 samples/sec Loss 13.3806 LearningRate 0.0849 Epoch: 1 Global Step: 65120 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:02:16,794-Speed 2613.72 samples/sec Loss 13.0535 LearningRate 0.0849 Epoch: 1 Global Step: 65130 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:02:20,699-Speed 2622.48 samples/sec Loss 13.2026 LearningRate 0.0849 Epoch: 1 Global Step: 65140 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:02:24,607-Speed 2621.78 samples/sec Loss 13.1727 LearningRate 0.0849 Epoch: 1 Global Step: 65150 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:02:28,505-Speed 2627.15 samples/sec Loss 13.2420 LearningRate 0.0849 Epoch: 1 Global Step: 65160 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:02:32,437-Speed 2604.97 samples/sec Loss 13.3828 LearningRate 0.0849 Epoch: 1 Global Step: 65170 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:02:36,345-Speed 2621.11 samples/sec Loss 13.1567 LearningRate 0.0849 Epoch: 1 Global Step: 65180 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:02:40,243-Speed 2627.45 samples/sec Loss 13.1781 LearningRate 0.0849 Epoch: 1 Global Step: 65190 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:02:44,147-Speed 2623.50 samples/sec Loss 13.1654 LearningRate 0.0849 Epoch: 1 Global Step: 65200 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:02:48,047-Speed 2626.22 samples/sec Loss 13.1107 LearningRate 0.0849 Epoch: 1 Global Step: 65210 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:02:51,944-Speed 2628.36 samples/sec Loss 13.2766 LearningRate 0.0849 Epoch: 1 Global Step: 65220 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:02:55,843-Speed 2627.20 samples/sec Loss 13.2787 LearningRate 0.0849 Epoch: 1 Global Step: 65230 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:02:59,743-Speed 2626.33 samples/sec Loss 13.1520 LearningRate 0.0849 Epoch: 1 Global Step: 65240 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:03:03,639-Speed 2628.77 samples/sec Loss 13.0162 LearningRate 0.0849 Epoch: 1 Global Step: 65250 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:03:07,542-Speed 2624.37 samples/sec Loss 13.3421 LearningRate 0.0849 Epoch: 1 Global Step: 65260 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:03:11,436-Speed 2633.65 samples/sec Loss 13.1105 LearningRate 0.0849 Epoch: 1 Global Step: 65270 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:03:15,319-Speed 2637.54 samples/sec Loss 13.3958 LearningRate 0.0849 Epoch: 1 Global Step: 65280 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:03:19,231-Speed 2617.88 samples/sec Loss 13.3622 LearningRate 0.0849 Epoch: 1 Global Step: 65290 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:03:23,134-Speed 2624.99 samples/sec Loss 13.1664 LearningRate 0.0849 Epoch: 1 Global Step: 65300 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:03:27,028-Speed 2630.39 samples/sec Loss 13.4229 LearningRate 0.0849 Epoch: 1 Global Step: 65310 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:03:30,948-Speed 2618.11 samples/sec Loss 13.2035 LearningRate 0.0849 Epoch: 1 Global Step: 65320 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:03:34,846-Speed 2627.59 samples/sec Loss 13.2717 LearningRate 0.0849 Epoch: 1 Global Step: 65330 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:03:38,742-Speed 2629.20 samples/sec Loss 13.2312 LearningRate 0.0849 Epoch: 1 Global Step: 65340 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:03:42,639-Speed 2628.18 samples/sec Loss 13.4032 LearningRate 0.0849 Epoch: 1 Global Step: 65350 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:03:46,565-Speed 2609.00 samples/sec Loss 13.2408 LearningRate 0.0849 Epoch: 1 Global Step: 65360 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:03:50,477-Speed 2618.27 samples/sec Loss 13.1927 LearningRate 0.0849 Epoch: 1 Global Step: 65370 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:03:54,365-Speed 2634.51 samples/sec Loss 13.0360 LearningRate 0.0849 Epoch: 1 Global Step: 65380 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:03:58,257-Speed 2631.38 samples/sec Loss 13.2781 LearningRate 0.0849 Epoch: 1 Global Step: 65390 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:04:02,146-Speed 2633.79 samples/sec Loss 13.2476 LearningRate 0.0849 Epoch: 1 Global Step: 65400 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:04:06,026-Speed 2640.02 samples/sec Loss 13.1613 LearningRate 0.0849 Epoch: 1 Global Step: 65410 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:04:09,926-Speed 2626.55 samples/sec Loss 13.1496 LearningRate 0.0849 Epoch: 1 Global Step: 65420 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:04:13,822-Speed 2628.66 samples/sec Loss 13.1243 LearningRate 0.0848 Epoch: 1 Global Step: 65430 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:04:17,729-Speed 2621.53 samples/sec Loss 13.1820 LearningRate 0.0848 Epoch: 1 Global Step: 65440 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:04:21,622-Speed 2631.44 samples/sec Loss 13.0934 LearningRate 0.0848 Epoch: 1 Global Step: 65450 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:04:25,513-Speed 2631.92 samples/sec Loss 13.3097 LearningRate 0.0848 Epoch: 1 Global Step: 65460 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:04:29,384-Speed 2646.54 samples/sec Loss 13.2349 LearningRate 0.0848 Epoch: 1 Global Step: 65470 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:04:33,284-Speed 2625.50 samples/sec Loss 13.2358 LearningRate 0.0848 Epoch: 1 Global Step: 65480 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:04:37,174-Speed 2633.14 samples/sec Loss 13.3214 LearningRate 0.0848 Epoch: 1 Global Step: 65490 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:04:41,066-Speed 2632.02 samples/sec Loss 13.2518 LearningRate 0.0848 Epoch: 1 Global Step: 65500 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:04:44,960-Speed 2630.49 samples/sec Loss 13.3331 LearningRate 0.0848 Epoch: 1 Global Step: 65510 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:04:48,860-Speed 2626.39 samples/sec Loss 13.4197 LearningRate 0.0848 Epoch: 1 Global Step: 65520 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:04:52,750-Speed 2633.02 samples/sec Loss 13.3722 LearningRate 0.0848 Epoch: 1 Global Step: 65530 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:04:56,663-Speed 2617.42 samples/sec Loss 13.2497 LearningRate 0.0848 Epoch: 1 Global Step: 65540 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:05:00,560-Speed 2629.14 samples/sec Loss 13.3285 LearningRate 0.0848 Epoch: 1 Global Step: 65550 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:05:04,480-Speed 2612.31 samples/sec Loss 13.2202 LearningRate 0.0848 Epoch: 1 Global Step: 65560 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:05:08,380-Speed 2625.89 samples/sec Loss 13.0786 LearningRate 0.0848 Epoch: 1 Global Step: 65570 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:05:12,275-Speed 2629.82 samples/sec Loss 13.1755 LearningRate 0.0848 Epoch: 1 Global Step: 65580 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:05:16,174-Speed 2627.14 samples/sec Loss 13.2810 LearningRate 0.0848 Epoch: 1 Global Step: 65590 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:05:20,065-Speed 2632.70 samples/sec Loss 13.2268 LearningRate 0.0848 Epoch: 1 Global Step: 65600 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:05:23,973-Speed 2620.90 samples/sec Loss 13.1894 LearningRate 0.0848 Epoch: 1 Global Step: 65610 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:05:27,864-Speed 2632.86 samples/sec Loss 13.2611 LearningRate 0.0848 Epoch: 1 Global Step: 65620 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:05:31,751-Speed 2635.24 samples/sec Loss 13.1354 LearningRate 0.0848 Epoch: 1 Global Step: 65630 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:05:35,641-Speed 2633.22 samples/sec Loss 13.4953 LearningRate 0.0848 Epoch: 1 Global Step: 65640 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:05:39,533-Speed 2631.50 samples/sec Loss 13.0618 LearningRate 0.0848 Epoch: 1 Global Step: 65650 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:05:43,425-Speed 2631.20 samples/sec Loss 13.3888 LearningRate 0.0848 Epoch: 1 Global Step: 65660 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:05:47,313-Speed 2634.70 samples/sec Loss 13.3300 LearningRate 0.0848 Epoch: 1 Global Step: 65670 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:05:51,216-Speed 2623.98 samples/sec Loss 13.1951 LearningRate 0.0848 Epoch: 1 Global Step: 65680 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:05:55,110-Speed 2630.73 samples/sec Loss 13.1743 LearningRate 0.0848 Epoch: 1 Global Step: 65690 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:05:59,005-Speed 2629.92 samples/sec Loss 13.2145 LearningRate 0.0848 Epoch: 1 Global Step: 65700 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:06:02,905-Speed 2626.38 samples/sec Loss 13.0858 LearningRate 0.0848 Epoch: 1 Global Step: 65710 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:06:06,799-Speed 2630.85 samples/sec Loss 13.1818 LearningRate 0.0848 Epoch: 1 Global Step: 65720 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:06:10,704-Speed 2622.86 samples/sec Loss 13.3455 LearningRate 0.0848 Epoch: 1 Global Step: 65730 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:06:14,595-Speed 2632.23 samples/sec Loss 13.2073 LearningRate 0.0848 Epoch: 1 Global Step: 65740 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:06:18,492-Speed 2628.61 samples/sec Loss 13.1120 LearningRate 0.0848 Epoch: 1 Global Step: 65750 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:06:22,397-Speed 2623.39 samples/sec Loss 13.4083 LearningRate 0.0848 Epoch: 1 Global Step: 65760 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:06:26,288-Speed 2631.89 samples/sec Loss 13.3361 LearningRate 0.0848 Epoch: 1 Global Step: 65770 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:06:30,199-Speed 2619.06 samples/sec Loss 13.3089 LearningRate 0.0848 Epoch: 1 Global Step: 65780 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:06:34,094-Speed 2629.85 samples/sec Loss 13.0672 LearningRate 0.0848 Epoch: 1 Global Step: 65790 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:06:37,996-Speed 2625.46 samples/sec Loss 13.1924 LearningRate 0.0848 Epoch: 1 Global Step: 65800 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:06:41,893-Speed 2628.31 samples/sec Loss 13.1029 LearningRate 0.0848 Epoch: 1 Global Step: 65810 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:06:45,784-Speed 2631.99 samples/sec Loss 13.3658 LearningRate 0.0848 Epoch: 1 Global Step: 65820 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:06:49,683-Speed 2626.95 samples/sec Loss 13.1760 LearningRate 0.0848 Epoch: 1 Global Step: 65830 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:06:53,579-Speed 2629.13 samples/sec Loss 13.3934 LearningRate 0.0848 Epoch: 1 Global Step: 65840 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:06:57,476-Speed 2628.08 samples/sec Loss 13.1680 LearningRate 0.0848 Epoch: 1 Global Step: 65850 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:07:01,374-Speed 2627.39 samples/sec Loss 13.1587 LearningRate 0.0848 Epoch: 1 Global Step: 65860 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:07:05,264-Speed 2633.68 samples/sec Loss 13.0995 LearningRate 0.0848 Epoch: 1 Global Step: 65870 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:07:09,154-Speed 2632.51 samples/sec Loss 13.1336 LearningRate 0.0847 Epoch: 1 Global Step: 65880 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:07:13,051-Speed 2629.01 samples/sec Loss 13.1072 LearningRate 0.0847 Epoch: 1 Global Step: 65890 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:07:16,945-Speed 2631.14 samples/sec Loss 13.3096 LearningRate 0.0847 Epoch: 1 Global Step: 65900 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:07:20,837-Speed 2631.44 samples/sec Loss 13.2200 LearningRate 0.0847 Epoch: 1 Global Step: 65910 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:07:24,737-Speed 2626.33 samples/sec Loss 13.1808 LearningRate 0.0847 Epoch: 1 Global Step: 65920 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:07:28,653-Speed 2615.13 samples/sec Loss 13.3155 LearningRate 0.0847 Epoch: 1 Global Step: 65930 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:07:32,564-Speed 2619.02 samples/sec Loss 13.2000 LearningRate 0.0847 Epoch: 1 Global Step: 65940 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:07:36,454-Speed 2633.18 samples/sec Loss 13.1434 LearningRate 0.0847 Epoch: 1 Global Step: 65950 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:07:40,346-Speed 2631.77 samples/sec Loss 13.2376 LearningRate 0.0847 Epoch: 1 Global Step: 65960 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:07:44,240-Speed 2630.36 samples/sec Loss 13.1803 LearningRate 0.0847 Epoch: 1 Global Step: 65970 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:07:48,177-Speed 2601.72 samples/sec Loss 13.0175 LearningRate 0.0847 Epoch: 1 Global Step: 65980 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:07:52,177-Speed 2560.43 samples/sec Loss 13.1538 LearningRate 0.0847 Epoch: 1 Global Step: 65990 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:07:56,057-Speed 2639.61 samples/sec Loss 13.2638 LearningRate 0.0847 Epoch: 1 Global Step: 66000 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:07:59,953-Speed 2628.85 samples/sec Loss 13.0831 LearningRate 0.0847 Epoch: 1 Global Step: 66010 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:08:03,847-Speed 2630.16 samples/sec Loss 13.1211 LearningRate 0.0847 Epoch: 1 Global Step: 66020 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:08:07,744-Speed 2628.71 samples/sec Loss 13.0967 LearningRate 0.0847 Epoch: 1 Global Step: 66030 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:08:11,640-Speed 2629.07 samples/sec Loss 13.1995 LearningRate 0.0847 Epoch: 1 Global Step: 66040 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:08:15,534-Speed 2630.54 samples/sec Loss 13.0462 LearningRate 0.0847 Epoch: 1 Global Step: 66050 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:08:19,440-Speed 2622.48 samples/sec Loss 12.9790 LearningRate 0.0847 Epoch: 1 Global Step: 66060 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:08:23,341-Speed 2624.94 samples/sec Loss 13.1543 LearningRate 0.0847 Epoch: 1 Global Step: 66070 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:08:27,234-Speed 2631.20 samples/sec Loss 13.1697 LearningRate 0.0847 Epoch: 1 Global Step: 66080 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:08:31,143-Speed 2620.28 samples/sec Loss 13.0959 LearningRate 0.0847 Epoch: 1 Global Step: 66090 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:08:35,029-Speed 2635.69 samples/sec Loss 13.0844 LearningRate 0.0847 Epoch: 1 Global Step: 66100 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:08:38,928-Speed 2626.31 samples/sec Loss 13.1158 LearningRate 0.0847 Epoch: 1 Global Step: 66110 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:08:42,830-Speed 2625.53 samples/sec Loss 13.1025 LearningRate 0.0847 Epoch: 1 Global Step: 66120 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:08:46,726-Speed 2629.04 samples/sec Loss 13.1047 LearningRate 0.0847 Epoch: 1 Global Step: 66130 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:08:50,624-Speed 2627.50 samples/sec Loss 13.1765 LearningRate 0.0847 Epoch: 1 Global Step: 66140 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:08:54,519-Speed 2630.00 samples/sec Loss 13.1062 LearningRate 0.0847 Epoch: 1 Global Step: 66150 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:08:58,412-Speed 2631.12 samples/sec Loss 13.0686 LearningRate 0.0847 Epoch: 1 Global Step: 66160 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:09:02,306-Speed 2630.17 samples/sec Loss 13.0564 LearningRate 0.0847 Epoch: 1 Global Step: 66170 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:09:06,204-Speed 2626.85 samples/sec Loss 13.2017 LearningRate 0.0847 Epoch: 1 Global Step: 66180 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:09:10,095-Speed 2632.38 samples/sec Loss 13.2267 LearningRate 0.0847 Epoch: 1 Global Step: 66190 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:09:13,991-Speed 2629.48 samples/sec Loss 13.3136 LearningRate 0.0847 Epoch: 1 Global Step: 66200 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:09:17,901-Speed 2619.91 samples/sec Loss 13.1573 LearningRate 0.0847 Epoch: 1 Global Step: 66210 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:09:21,795-Speed 2630.37 samples/sec Loss 13.0054 LearningRate 0.0847 Epoch: 1 Global Step: 66220 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:09:25,693-Speed 2627.75 samples/sec Loss 13.1942 LearningRate 0.0847 Epoch: 1 Global Step: 66230 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:09:29,605-Speed 2618.20 samples/sec Loss 13.1211 LearningRate 0.0847 Epoch: 1 Global Step: 66240 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:09:33,490-Speed 2636.13 samples/sec Loss 13.2019 LearningRate 0.0847 Epoch: 1 Global Step: 66250 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:09:37,380-Speed 2633.13 samples/sec Loss 13.0895 LearningRate 0.0847 Epoch: 1 Global Step: 66260 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:09:41,270-Speed 2633.00 samples/sec Loss 13.1128 LearningRate 0.0847 Epoch: 1 Global Step: 66270 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:09:45,189-Speed 2613.39 samples/sec Loss 13.3736 LearningRate 0.0847 Epoch: 1 Global Step: 66280 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:09:49,106-Speed 2615.05 samples/sec Loss 13.2475 LearningRate 0.0847 Epoch: 1 Global Step: 66290 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:09:53,110-Speed 2557.89 samples/sec Loss 13.1949 LearningRate 0.0847 Epoch: 1 Global Step: 66300 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:09:57,099-Speed 2567.79 samples/sec Loss 13.2226 LearningRate 0.0847 Epoch: 1 Global Step: 66310 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:10:01,222-Speed 2484.26 samples/sec Loss 13.1080 LearningRate 0.0847 Epoch: 1 Global Step: 66320 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:10:05,140-Speed 2614.48 samples/sec Loss 13.1755 LearningRate 0.0846 Epoch: 1 Global Step: 66330 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:10:09,033-Speed 2630.92 samples/sec Loss 13.2472 LearningRate 0.0846 Epoch: 1 Global Step: 66340 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:10:12,935-Speed 2624.86 samples/sec Loss 13.1614 LearningRate 0.0846 Epoch: 1 Global Step: 66350 Fp16 Grad Scale: 262144 Required: 86 hours
Training: 2022-04-13 03:10:16,816-Speed 2638.45 samples/sec Loss 13.2133 LearningRate 0.0846 Epoch: 1 Global Step: 66360 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:10:20,718-Speed 2624.94 samples/sec Loss 12.9383 LearningRate 0.0846 Epoch: 1 Global Step: 66370 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:10:24,616-Speed 2627.51 samples/sec Loss 13.1503 LearningRate 0.0846 Epoch: 1 Global Step: 66380 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:10:28,513-Speed 2628.85 samples/sec Loss 13.3614 LearningRate 0.0846 Epoch: 1 Global Step: 66390 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:10:32,413-Speed 2626.48 samples/sec Loss 13.2911 LearningRate 0.0846 Epoch: 1 Global Step: 66400 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:10:36,314-Speed 2625.01 samples/sec Loss 13.1195 LearningRate 0.0846 Epoch: 1 Global Step: 66410 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:10:40,208-Speed 2630.08 samples/sec Loss 13.1293 LearningRate 0.0846 Epoch: 1 Global Step: 66420 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:10:44,093-Speed 2636.41 samples/sec Loss 13.2052 LearningRate 0.0846 Epoch: 1 Global Step: 66430 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:10:47,987-Speed 2630.52 samples/sec Loss 13.1790 LearningRate 0.0846 Epoch: 1 Global Step: 66440 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:10:51,887-Speed 2626.40 samples/sec Loss 13.1746 LearningRate 0.0846 Epoch: 1 Global Step: 66450 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:10:55,879-Speed 2565.77 samples/sec Loss 13.0447 LearningRate 0.0846 Epoch: 1 Global Step: 66460 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:10:59,782-Speed 2624.21 samples/sec Loss 13.1974 LearningRate 0.0846 Epoch: 1 Global Step: 66470 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:11:03,690-Speed 2621.65 samples/sec Loss 13.1820 LearningRate 0.0846 Epoch: 1 Global Step: 66480 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:11:07,584-Speed 2629.67 samples/sec Loss 13.2310 LearningRate 0.0846 Epoch: 1 Global Step: 66490 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:11:11,482-Speed 2627.79 samples/sec Loss 13.1913 LearningRate 0.0846 Epoch: 1 Global Step: 66500 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:11:15,402-Speed 2612.26 samples/sec Loss 13.1184 LearningRate 0.0846 Epoch: 1 Global Step: 66510 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:11:19,297-Speed 2629.41 samples/sec Loss 13.0054 LearningRate 0.0846 Epoch: 1 Global Step: 66520 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:11:23,260-Speed 2585.47 samples/sec Loss 13.1773 LearningRate 0.0846 Epoch: 1 Global Step: 66530 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:11:27,177-Speed 2614.69 samples/sec Loss 13.3053 LearningRate 0.0846 Epoch: 1 Global Step: 66540 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:11:31,074-Speed 2628.29 samples/sec Loss 13.1476 LearningRate 0.0846 Epoch: 1 Global Step: 66550 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:11:35,036-Speed 2585.20 samples/sec Loss 13.2276 LearningRate 0.0846 Epoch: 1 Global Step: 66560 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:11:38,939-Speed 2624.57 samples/sec Loss 13.1176 LearningRate 0.0846 Epoch: 1 Global Step: 66570 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:11:42,818-Speed 2640.50 samples/sec Loss 13.0472 LearningRate 0.0846 Epoch: 1 Global Step: 66580 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:11:46,713-Speed 2629.38 samples/sec Loss 13.0986 LearningRate 0.0846 Epoch: 1 Global Step: 66590 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:11:50,632-Speed 2613.77 samples/sec Loss 13.1401 LearningRate 0.0846 Epoch: 1 Global Step: 66600 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:11:54,537-Speed 2622.79 samples/sec Loss 13.1029 LearningRate 0.0846 Epoch: 1 Global Step: 66610 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:11:58,431-Speed 2630.89 samples/sec Loss 13.0777 LearningRate 0.0846 Epoch: 1 Global Step: 66620 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:12:02,342-Speed 2618.65 samples/sec Loss 13.1769 LearningRate 0.0846 Epoch: 1 Global Step: 66630 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:12:06,244-Speed 2624.63 samples/sec Loss 13.1071 LearningRate 0.0846 Epoch: 1 Global Step: 66640 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:12:10,159-Speed 2616.32 samples/sec Loss 13.2194 LearningRate 0.0846 Epoch: 1 Global Step: 66650 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:12:14,062-Speed 2623.80 samples/sec Loss 13.1530 LearningRate 0.0846 Epoch: 1 Global Step: 66660 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:12:17,968-Speed 2630.50 samples/sec Loss 13.2080 LearningRate 0.0846 Epoch: 1 Global Step: 66670 Fp16 Grad Scale: 65536 Required: 86 hours
Training: 2022-04-13 03:12:21,873-Speed 2622.98 samples/sec Loss 13.0534 LearningRate 0.0846 Epoch: 1 Global Step: 66680 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:12:25,778-Speed 2622.56 samples/sec Loss 13.1035 LearningRate 0.0846 Epoch: 1 Global Step: 66690 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:12:29,680-Speed 2625.38 samples/sec Loss 13.1057 LearningRate 0.0846 Epoch: 1 Global Step: 66700 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:12:33,581-Speed 2625.80 samples/sec Loss 13.1757 LearningRate 0.0846 Epoch: 1 Global Step: 66710 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:12:37,484-Speed 2624.30 samples/sec Loss 13.1324 LearningRate 0.0846 Epoch: 1 Global Step: 66720 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:12:41,387-Speed 2623.85 samples/sec Loss 13.3083 LearningRate 0.0846 Epoch: 1 Global Step: 66730 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:12:45,285-Speed 2628.10 samples/sec Loss 13.2944 LearningRate 0.0846 Epoch: 1 Global Step: 66740 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:12:49,186-Speed 2625.32 samples/sec Loss 13.2566 LearningRate 0.0846 Epoch: 1 Global Step: 66750 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:12:53,085-Speed 2627.02 samples/sec Loss 13.2630 LearningRate 0.0846 Epoch: 1 Global Step: 66760 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:12:56,984-Speed 2626.96 samples/sec Loss 13.0976 LearningRate 0.0846 Epoch: 1 Global Step: 66770 Fp16 Grad Scale: 131072 Required: 86 hours
Training: 2022-04-13 03:13:00,865-Speed 2639.04 samples/sec Loss 13.0757 LearningRate 0.0845 Epoch: 1 Global Step: 66780 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:13:04,764-Speed 2626.91 samples/sec Loss 13.2636 LearningRate 0.0845 Epoch: 1 Global Step: 66790 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:13:08,664-Speed 2626.68 samples/sec Loss 13.3068 LearningRate 0.0845 Epoch: 1 Global Step: 66800 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:13:12,562-Speed 2627.68 samples/sec Loss 13.2350 LearningRate 0.0845 Epoch: 1 Global Step: 66810 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:13:16,464-Speed 2625.50 samples/sec Loss 13.1200 LearningRate 0.0845 Epoch: 1 Global Step: 66820 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:13:20,362-Speed 2627.44 samples/sec Loss 13.2493 LearningRate 0.0845 Epoch: 1 Global Step: 66830 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:13:24,284-Speed 2611.62 samples/sec Loss 13.1430 LearningRate 0.0845 Epoch: 1 Global Step: 66840 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:13:28,183-Speed 2627.46 samples/sec Loss 13.2126 LearningRate 0.0845 Epoch: 1 Global Step: 66850 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:13:32,081-Speed 2627.67 samples/sec Loss 13.2522 LearningRate 0.0845 Epoch: 1 Global Step: 66860 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:13:35,984-Speed 2623.90 samples/sec Loss 13.2498 LearningRate 0.0845 Epoch: 1 Global Step: 66870 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:13:39,861-Speed 2641.81 samples/sec Loss 13.0770 LearningRate 0.0845 Epoch: 1 Global Step: 66880 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:13:43,756-Speed 2629.89 samples/sec Loss 13.2297 LearningRate 0.0845 Epoch: 1 Global Step: 66890 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:13:47,654-Speed 2628.07 samples/sec Loss 13.2150 LearningRate 0.0845 Epoch: 1 Global Step: 66900 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:13:51,559-Speed 2622.93 samples/sec Loss 13.2312 LearningRate 0.0845 Epoch: 1 Global Step: 66910 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:13:55,478-Speed 2613.44 samples/sec Loss 13.2701 LearningRate 0.0845 Epoch: 1 Global Step: 66920 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:13:59,375-Speed 2628.60 samples/sec Loss 13.2239 LearningRate 0.0845 Epoch: 1 Global Step: 66930 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:14:03,274-Speed 2626.70 samples/sec Loss 13.1048 LearningRate 0.0845 Epoch: 1 Global Step: 66940 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:14:07,179-Speed 2622.70 samples/sec Loss 13.2573 LearningRate 0.0845 Epoch: 1 Global Step: 66950 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:14:11,078-Speed 2627.27 samples/sec Loss 13.1020 LearningRate 0.0845 Epoch: 1 Global Step: 66960 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:14:14,975-Speed 2628.53 samples/sec Loss 13.1586 LearningRate 0.0845 Epoch: 1 Global Step: 66970 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:14:18,917-Speed 2598.59 samples/sec Loss 13.0615 LearningRate 0.0845 Epoch: 1 Global Step: 66980 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:14:22,789-Speed 2645.53 samples/sec Loss 13.2300 LearningRate 0.0845 Epoch: 1 Global Step: 66990 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:14:26,684-Speed 2629.76 samples/sec Loss 13.3256 LearningRate 0.0845 Epoch: 1 Global Step: 67000 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:14:30,584-Speed 2626.17 samples/sec Loss 13.3267 LearningRate 0.0845 Epoch: 1 Global Step: 67010 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:14:34,504-Speed 2612.50 samples/sec Loss 13.2256 LearningRate 0.0845 Epoch: 1 Global Step: 67020 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:14:38,445-Speed 2598.80 samples/sec Loss 13.2631 LearningRate 0.0845 Epoch: 1 Global Step: 67030 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:14:42,347-Speed 2625.17 samples/sec Loss 13.1660 LearningRate 0.0845 Epoch: 1 Global Step: 67040 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:14:46,248-Speed 2625.78 samples/sec Loss 13.1768 LearningRate 0.0845 Epoch: 1 Global Step: 67050 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:14:50,155-Speed 2621.76 samples/sec Loss 13.2562 LearningRate 0.0845 Epoch: 1 Global Step: 67060 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:14:54,058-Speed 2623.81 samples/sec Loss 12.9899 LearningRate 0.0845 Epoch: 1 Global Step: 67070 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:14:57,965-Speed 2621.56 samples/sec Loss 13.0506 LearningRate 0.0845 Epoch: 1 Global Step: 67080 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:15:01,871-Speed 2622.41 samples/sec Loss 13.1423 LearningRate 0.0845 Epoch: 1 Global Step: 67090 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:15:05,771-Speed 2626.29 samples/sec Loss 13.1482 LearningRate 0.0845 Epoch: 1 Global Step: 67100 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:15:09,676-Speed 2622.28 samples/sec Loss 13.2009 LearningRate 0.0845 Epoch: 1 Global Step: 67110 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:15:13,574-Speed 2628.31 samples/sec Loss 13.2907 LearningRate 0.0845 Epoch: 1 Global Step: 67120 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:15:17,474-Speed 2625.92 samples/sec Loss 13.2296 LearningRate 0.0845 Epoch: 1 Global Step: 67130 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:15:21,369-Speed 2630.50 samples/sec Loss 13.1927 LearningRate 0.0845 Epoch: 1 Global Step: 67140 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:15:25,267-Speed 2627.11 samples/sec Loss 13.1098 LearningRate 0.0845 Epoch: 1 Global Step: 67150 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:15:29,163-Speed 2628.90 samples/sec Loss 13.2518 LearningRate 0.0845 Epoch: 1 Global Step: 67160 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:15:33,061-Speed 2627.81 samples/sec Loss 13.2367 LearningRate 0.0845 Epoch: 1 Global Step: 67170 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:15:36,960-Speed 2626.71 samples/sec Loss 13.2514 LearningRate 0.0845 Epoch: 1 Global Step: 67180 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:15:40,883-Speed 2610.59 samples/sec Loss 13.0858 LearningRate 0.0845 Epoch: 1 Global Step: 67190 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:15:44,769-Speed 2635.49 samples/sec Loss 13.2226 LearningRate 0.0845 Epoch: 1 Global Step: 67200 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:15:48,673-Speed 2623.99 samples/sec Loss 13.0898 LearningRate 0.0845 Epoch: 1 Global Step: 67210 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:15:52,582-Speed 2620.28 samples/sec Loss 13.0349 LearningRate 0.0845 Epoch: 1 Global Step: 67220 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:15:56,477-Speed 2630.15 samples/sec Loss 13.1290 LearningRate 0.0844 Epoch: 1 Global Step: 67230 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:16:00,373-Speed 2628.48 samples/sec Loss 13.2341 LearningRate 0.0844 Epoch: 1 Global Step: 67240 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:16:04,274-Speed 2625.80 samples/sec Loss 13.1589 LearningRate 0.0844 Epoch: 1 Global Step: 67250 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:16:08,174-Speed 2625.74 samples/sec Loss 13.1740 LearningRate 0.0844 Epoch: 1 Global Step: 67260 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:16:12,062-Speed 2634.82 samples/sec Loss 13.1363 LearningRate 0.0844 Epoch: 1 Global Step: 67270 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:16:15,976-Speed 2616.38 samples/sec Loss 13.1764 LearningRate 0.0844 Epoch: 1 Global Step: 67280 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:16:19,879-Speed 2624.09 samples/sec Loss 13.1522 LearningRate 0.0844 Epoch: 1 Global Step: 67290 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:16:23,782-Speed 2625.40 samples/sec Loss 13.1491 LearningRate 0.0844 Epoch: 1 Global Step: 67300 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:16:27,694-Speed 2617.71 samples/sec Loss 13.1132 LearningRate 0.0844 Epoch: 1 Global Step: 67310 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:16:31,606-Speed 2618.95 samples/sec Loss 13.1188 LearningRate 0.0844 Epoch: 1 Global Step: 67320 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:16:35,509-Speed 2624.18 samples/sec Loss 13.2671 LearningRate 0.0844 Epoch: 1 Global Step: 67330 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:16:39,407-Speed 2627.60 samples/sec Loss 13.3159 LearningRate 0.0844 Epoch: 1 Global Step: 67340 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:16:43,309-Speed 2624.87 samples/sec Loss 13.2172 LearningRate 0.0844 Epoch: 1 Global Step: 67350 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:16:47,219-Speed 2619.36 samples/sec Loss 13.1095 LearningRate 0.0844 Epoch: 1 Global Step: 67360 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:16:51,075-Speed 2656.19 samples/sec Loss 13.1994 LearningRate 0.0844 Epoch: 1 Global Step: 67370 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:16:54,975-Speed 2627.23 samples/sec Loss 13.3090 LearningRate 0.0844 Epoch: 1 Global Step: 67380 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:16:58,876-Speed 2625.45 samples/sec Loss 12.9944 LearningRate 0.0844 Epoch: 1 Global Step: 67390 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:17:02,782-Speed 2622.01 samples/sec Loss 13.1100 LearningRate 0.0844 Epoch: 1 Global Step: 67400 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:17:06,681-Speed 2627.01 samples/sec Loss 13.0688 LearningRate 0.0844 Epoch: 1 Global Step: 67410 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:17:10,577-Speed 2628.68 samples/sec Loss 13.1818 LearningRate 0.0844 Epoch: 1 Global Step: 67420 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:17:14,484-Speed 2621.31 samples/sec Loss 13.1839 LearningRate 0.0844 Epoch: 1 Global Step: 67430 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:17:18,398-Speed 2617.46 samples/sec Loss 13.1162 LearningRate 0.0844 Epoch: 1 Global Step: 67440 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:17:22,312-Speed 2616.95 samples/sec Loss 13.2472 LearningRate 0.0844 Epoch: 1 Global Step: 67450 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:17:26,230-Speed 2613.93 samples/sec Loss 13.0948 LearningRate 0.0844 Epoch: 1 Global Step: 67460 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:17:30,151-Speed 2612.76 samples/sec Loss 13.2391 LearningRate 0.0844 Epoch: 1 Global Step: 67470 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:17:34,052-Speed 2625.78 samples/sec Loss 13.1826 LearningRate 0.0844 Epoch: 1 Global Step: 67480 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:17:37,949-Speed 2627.81 samples/sec Loss 13.2824 LearningRate 0.0844 Epoch: 1 Global Step: 67490 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:17:41,851-Speed 2624.80 samples/sec Loss 13.0881 LearningRate 0.0844 Epoch: 1 Global Step: 67500 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:17:45,756-Speed 2622.80 samples/sec Loss 13.2354 LearningRate 0.0844 Epoch: 1 Global Step: 67510 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:17:49,655-Speed 2627.13 samples/sec Loss 13.1957 LearningRate 0.0844 Epoch: 1 Global Step: 67520 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:17:53,558-Speed 2625.36 samples/sec Loss 13.0965 LearningRate 0.0844 Epoch: 1 Global Step: 67530 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:17:57,458-Speed 2625.59 samples/sec Loss 13.2561 LearningRate 0.0844 Epoch: 1 Global Step: 67540 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:18:01,372-Speed 2617.60 samples/sec Loss 13.2547 LearningRate 0.0844 Epoch: 1 Global Step: 67550 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:18:05,267-Speed 2629.43 samples/sec Loss 13.3247 LearningRate 0.0844 Epoch: 1 Global Step: 67560 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:18:09,166-Speed 2627.00 samples/sec Loss 13.2157 LearningRate 0.0844 Epoch: 1 Global Step: 67570 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:18:13,063-Speed 2628.29 samples/sec Loss 13.2210 LearningRate 0.0844 Epoch: 1 Global Step: 67580 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:18:16,959-Speed 2628.86 samples/sec Loss 13.2148 LearningRate 0.0844 Epoch: 1 Global Step: 67590 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:18:20,859-Speed 2626.40 samples/sec Loss 13.0497 LearningRate 0.0844 Epoch: 1 Global Step: 67600 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:18:24,764-Speed 2622.75 samples/sec Loss 13.2458 LearningRate 0.0844 Epoch: 1 Global Step: 67610 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:18:28,661-Speed 2628.54 samples/sec Loss 13.1620 LearningRate 0.0844 Epoch: 1 Global Step: 67620 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:18:32,560-Speed 2627.33 samples/sec Loss 13.0330 LearningRate 0.0844 Epoch: 1 Global Step: 67630 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:18:36,460-Speed 2626.16 samples/sec Loss 13.1491 LearningRate 0.0844 Epoch: 1 Global Step: 67640 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:18:40,365-Speed 2622.24 samples/sec Loss 13.1678 LearningRate 0.0844 Epoch: 1 Global Step: 67650 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:18:44,264-Speed 2627.03 samples/sec Loss 13.2005 LearningRate 0.0844 Epoch: 1 Global Step: 67660 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:18:48,220-Speed 2589.16 samples/sec Loss 13.1003 LearningRate 0.0844 Epoch: 1 Global Step: 67670 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:18:52,117-Speed 2628.37 samples/sec Loss 13.1051 LearningRate 0.0843 Epoch: 1 Global Step: 67680 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:18:56,016-Speed 2627.05 samples/sec Loss 13.0784 LearningRate 0.0843 Epoch: 1 Global Step: 67690 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:18:59,914-Speed 2627.61 samples/sec Loss 13.1140 LearningRate 0.0843 Epoch: 1 Global Step: 67700 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:19:03,815-Speed 2625.29 samples/sec Loss 13.1237 LearningRate 0.0843 Epoch: 1 Global Step: 67710 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:19:07,706-Speed 2632.44 samples/sec Loss 13.2440 LearningRate 0.0843 Epoch: 1 Global Step: 67720 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:19:11,613-Speed 2621.37 samples/sec Loss 13.0762 LearningRate 0.0843 Epoch: 1 Global Step: 67730 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:19:15,518-Speed 2622.96 samples/sec Loss 13.2404 LearningRate 0.0843 Epoch: 1 Global Step: 67740 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:19:19,426-Speed 2621.22 samples/sec Loss 13.1261 LearningRate 0.0843 Epoch: 1 Global Step: 67750 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:19:23,333-Speed 2620.97 samples/sec Loss 13.0577 LearningRate 0.0843 Epoch: 1 Global Step: 67760 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:19:27,244-Speed 2619.13 samples/sec Loss 13.1565 LearningRate 0.0843 Epoch: 1 Global Step: 67770 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:19:31,157-Speed 2617.19 samples/sec Loss 13.1190 LearningRate 0.0843 Epoch: 1 Global Step: 67780 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:19:35,063-Speed 2622.06 samples/sec Loss 13.1709 LearningRate 0.0843 Epoch: 1 Global Step: 67790 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:19:38,969-Speed 2622.41 samples/sec Loss 13.2268 LearningRate 0.0843 Epoch: 1 Global Step: 67800 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:19:42,872-Speed 2624.89 samples/sec Loss 13.1004 LearningRate 0.0843 Epoch: 1 Global Step: 67810 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:19:46,781-Speed 2620.07 samples/sec Loss 13.0839 LearningRate 0.0843 Epoch: 1 Global Step: 67820 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:19:50,726-Speed 2595.78 samples/sec Loss 13.0654 LearningRate 0.0843 Epoch: 1 Global Step: 67830 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:19:54,611-Speed 2636.21 samples/sec Loss 13.2686 LearningRate 0.0843 Epoch: 1 Global Step: 67840 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:19:58,517-Speed 2622.54 samples/sec Loss 13.1470 LearningRate 0.0843 Epoch: 1 Global Step: 67850 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:20:02,414-Speed 2628.33 samples/sec Loss 13.1090 LearningRate 0.0843 Epoch: 1 Global Step: 67860 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:20:06,312-Speed 2627.66 samples/sec Loss 13.1766 LearningRate 0.0843 Epoch: 1 Global Step: 67870 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:20:10,212-Speed 2625.68 samples/sec Loss 13.1550 LearningRate 0.0843 Epoch: 1 Global Step: 67880 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:20:14,109-Speed 2628.53 samples/sec Loss 13.0809 LearningRate 0.0843 Epoch: 1 Global Step: 67890 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:20:18,018-Speed 2620.32 samples/sec Loss 12.9716 LearningRate 0.0843 Epoch: 1 Global Step: 67900 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:20:21,919-Speed 2626.00 samples/sec Loss 13.2628 LearningRate 0.0843 Epoch: 1 Global Step: 67910 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:20:25,829-Speed 2619.15 samples/sec Loss 13.2789 LearningRate 0.0843 Epoch: 1 Global Step: 67920 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:20:29,734-Speed 2623.27 samples/sec Loss 13.1466 LearningRate 0.0843 Epoch: 1 Global Step: 67930 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:20:33,619-Speed 2636.21 samples/sec Loss 13.2963 LearningRate 0.0843 Epoch: 1 Global Step: 67940 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:20:37,518-Speed 2626.48 samples/sec Loss 13.1635 LearningRate 0.0843 Epoch: 1 Global Step: 67950 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:20:41,402-Speed 2636.73 samples/sec Loss 13.2954 LearningRate 0.0843 Epoch: 1 Global Step: 67960 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:20:45,298-Speed 2628.77 samples/sec Loss 13.1718 LearningRate 0.0843 Epoch: 1 Global Step: 67970 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:20:49,218-Speed 2613.13 samples/sec Loss 13.1301 LearningRate 0.0843 Epoch: 1 Global Step: 67980 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:20:53,119-Speed 2626.05 samples/sec Loss 13.2058 LearningRate 0.0843 Epoch: 1 Global Step: 67990 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:20:57,020-Speed 2625.48 samples/sec Loss 13.2753 LearningRate 0.0843 Epoch: 1 Global Step: 68000 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:21:00,917-Speed 2628.41 samples/sec Loss 13.2015 LearningRate 0.0843 Epoch: 1 Global Step: 68010 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:21:04,812-Speed 2628.83 samples/sec Loss 13.1133 LearningRate 0.0843 Epoch: 1 Global Step: 68020 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:21:08,707-Speed 2629.74 samples/sec Loss 13.0572 LearningRate 0.0843 Epoch: 1 Global Step: 68030 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:21:12,607-Speed 2626.40 samples/sec Loss 13.1965 LearningRate 0.0843 Epoch: 1 Global Step: 68040 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:21:16,503-Speed 2628.52 samples/sec Loss 13.2194 LearningRate 0.0843 Epoch: 1 Global Step: 68050 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:21:20,403-Speed 2626.62 samples/sec Loss 13.2277 LearningRate 0.0843 Epoch: 1 Global Step: 68060 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:21:24,298-Speed 2629.27 samples/sec Loss 13.0651 LearningRate 0.0843 Epoch: 1 Global Step: 68070 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:21:28,197-Speed 2627.78 samples/sec Loss 13.1251 LearningRate 0.0843 Epoch: 1 Global Step: 68080 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:21:32,094-Speed 2627.77 samples/sec Loss 13.1110 LearningRate 0.0843 Epoch: 1 Global Step: 68090 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:21:35,988-Speed 2630.00 samples/sec Loss 13.0518 LearningRate 0.0843 Epoch: 1 Global Step: 68100 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:21:39,886-Speed 2627.99 samples/sec Loss 13.1603 LearningRate 0.0843 Epoch: 1 Global Step: 68110 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:21:43,792-Speed 2621.91 samples/sec Loss 13.2489 LearningRate 0.0843 Epoch: 1 Global Step: 68120 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:21:47,780-Speed 2568.22 samples/sec Loss 13.2781 LearningRate 0.0842 Epoch: 1 Global Step: 68130 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:21:51,697-Speed 2615.16 samples/sec Loss 13.2469 LearningRate 0.0842 Epoch: 1 Global Step: 68140 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:21:55,596-Speed 2626.88 samples/sec Loss 13.2267 LearningRate 0.0842 Epoch: 1 Global Step: 68150 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:21:59,500-Speed 2623.66 samples/sec Loss 13.1503 LearningRate 0.0842 Epoch: 1 Global Step: 68160 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:22:03,398-Speed 2627.32 samples/sec Loss 13.1789 LearningRate 0.0842 Epoch: 1 Global Step: 68170 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:22:07,301-Speed 2624.32 samples/sec Loss 13.2234 LearningRate 0.0842 Epoch: 1 Global Step: 68180 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:22:11,203-Speed 2625.20 samples/sec Loss 13.1464 LearningRate 0.0842 Epoch: 1 Global Step: 68190 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:22:15,096-Speed 2630.87 samples/sec Loss 13.0952 LearningRate 0.0842 Epoch: 1 Global Step: 68200 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:22:19,004-Speed 2621.06 samples/sec Loss 13.1935 LearningRate 0.0842 Epoch: 1 Global Step: 68210 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:22:22,899-Speed 2629.85 samples/sec Loss 13.2705 LearningRate 0.0842 Epoch: 1 Global Step: 68220 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:22:26,802-Speed 2624.23 samples/sec Loss 13.0348 LearningRate 0.0842 Epoch: 1 Global Step: 68230 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:22:30,703-Speed 2625.00 samples/sec Loss 13.0893 LearningRate 0.0842 Epoch: 1 Global Step: 68240 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:22:34,611-Speed 2621.37 samples/sec Loss 13.1987 LearningRate 0.0842 Epoch: 1 Global Step: 68250 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:22:38,491-Speed 2639.53 samples/sec Loss 13.0775 LearningRate 0.0842 Epoch: 1 Global Step: 68260 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:22:42,390-Speed 2626.99 samples/sec Loss 13.1950 LearningRate 0.0842 Epoch: 1 Global Step: 68270 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:22:46,288-Speed 2627.29 samples/sec Loss 13.1731 LearningRate 0.0842 Epoch: 1 Global Step: 68280 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:22:50,194-Speed 2622.66 samples/sec Loss 13.1447 LearningRate 0.0842 Epoch: 1 Global Step: 68290 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:22:54,100-Speed 2622.04 samples/sec Loss 13.0945 LearningRate 0.0842 Epoch: 1 Global Step: 68300 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:22:58,014-Speed 2617.19 samples/sec Loss 13.1383 LearningRate 0.0842 Epoch: 1 Global Step: 68310 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:23:01,978-Speed 2583.40 samples/sec Loss 13.0784 LearningRate 0.0842 Epoch: 1 Global Step: 68320 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:23:05,879-Speed 2625.96 samples/sec Loss 13.0261 LearningRate 0.0842 Epoch: 1 Global Step: 68330 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:23:09,777-Speed 2626.97 samples/sec Loss 13.0815 LearningRate 0.0842 Epoch: 1 Global Step: 68340 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:23:13,677-Speed 2626.46 samples/sec Loss 13.0264 LearningRate 0.0842 Epoch: 1 Global Step: 68350 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:23:17,562-Speed 2637.17 samples/sec Loss 13.1374 LearningRate 0.0842 Epoch: 1 Global Step: 68360 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:23:21,460-Speed 2627.13 samples/sec Loss 13.2232 LearningRate 0.0842 Epoch: 1 Global Step: 68370 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:23:25,360-Speed 2626.71 samples/sec Loss 13.0529 LearningRate 0.0842 Epoch: 1 Global Step: 68380 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:23:29,257-Speed 2627.79 samples/sec Loss 13.0623 LearningRate 0.0842 Epoch: 1 Global Step: 68390 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:23:33,154-Speed 2628.57 samples/sec Loss 13.0619 LearningRate 0.0842 Epoch: 1 Global Step: 68400 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:23:37,050-Speed 2628.54 samples/sec Loss 13.2266 LearningRate 0.0842 Epoch: 1 Global Step: 68410 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:23:40,972-Speed 2611.23 samples/sec Loss 13.0784 LearningRate 0.0842 Epoch: 1 Global Step: 68420 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:23:44,869-Speed 2628.33 samples/sec Loss 13.1806 LearningRate 0.0842 Epoch: 1 Global Step: 68430 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:23:48,923-Speed 2526.81 samples/sec Loss 13.0629 LearningRate 0.0842 Epoch: 1 Global Step: 68440 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:23:52,827-Speed 2623.56 samples/sec Loss 13.1804 LearningRate 0.0842 Epoch: 1 Global Step: 68450 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:23:56,726-Speed 2627.15 samples/sec Loss 13.1544 LearningRate 0.0842 Epoch: 1 Global Step: 68460 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:24:00,631-Speed 2622.54 samples/sec Loss 13.1399 LearningRate 0.0842 Epoch: 1 Global Step: 68470 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:24:04,529-Speed 2627.55 samples/sec Loss 13.1466 LearningRate 0.0842 Epoch: 1 Global Step: 68480 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:24:08,428-Speed 2626.76 samples/sec Loss 12.9848 LearningRate 0.0842 Epoch: 1 Global Step: 68490 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:24:12,327-Speed 2627.09 samples/sec Loss 13.0588 LearningRate 0.0842 Epoch: 1 Global Step: 68500 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:24:16,226-Speed 2626.75 samples/sec Loss 12.8754 LearningRate 0.0842 Epoch: 1 Global Step: 68510 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:24:20,145-Speed 2613.51 samples/sec Loss 13.1025 LearningRate 0.0842 Epoch: 1 Global Step: 68520 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:24:24,052-Speed 2621.95 samples/sec Loss 12.8723 LearningRate 0.0842 Epoch: 1 Global Step: 68530 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:24:27,958-Speed 2622.03 samples/sec Loss 13.1687 LearningRate 0.0842 Epoch: 1 Global Step: 68540 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:24:31,848-Speed 2633.26 samples/sec Loss 13.0393 LearningRate 0.0842 Epoch: 1 Global Step: 68550 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:24:35,747-Speed 2626.61 samples/sec Loss 13.1494 LearningRate 0.0842 Epoch: 1 Global Step: 68560 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:24:39,644-Speed 2628.39 samples/sec Loss 13.0633 LearningRate 0.0842 Epoch: 1 Global Step: 68570 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:24:43,541-Speed 2628.04 samples/sec Loss 13.1134 LearningRate 0.0841 Epoch: 1 Global Step: 68580 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:24:47,444-Speed 2624.10 samples/sec Loss 12.9771 LearningRate 0.0841 Epoch: 1 Global Step: 68590 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:24:51,348-Speed 2623.95 samples/sec Loss 12.9951 LearningRate 0.0841 Epoch: 1 Global Step: 68600 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:24:55,250-Speed 2624.65 samples/sec Loss 13.2086 LearningRate 0.0841 Epoch: 1 Global Step: 68610 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:24:59,309-Speed 2523.34 samples/sec Loss 13.2181 LearningRate 0.0841 Epoch: 1 Global Step: 68620 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:25:03,275-Speed 2582.69 samples/sec Loss 13.1526 LearningRate 0.0841 Epoch: 1 Global Step: 68630 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:25:07,177-Speed 2625.04 samples/sec Loss 13.0539 LearningRate 0.0841 Epoch: 1 Global Step: 68640 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:25:11,070-Speed 2630.52 samples/sec Loss 13.1106 LearningRate 0.0841 Epoch: 1 Global Step: 68650 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:25:14,965-Speed 2630.03 samples/sec Loss 13.1352 LearningRate 0.0841 Epoch: 1 Global Step: 68660 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:25:18,863-Speed 2627.99 samples/sec Loss 13.1936 LearningRate 0.0841 Epoch: 1 Global Step: 68670 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:25:22,787-Speed 2609.82 samples/sec Loss 13.1308 LearningRate 0.0841 Epoch: 1 Global Step: 68680 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:25:26,665-Speed 2641.15 samples/sec Loss 13.2287 LearningRate 0.0841 Epoch: 1 Global Step: 68690 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:25:30,562-Speed 2628.06 samples/sec Loss 12.9638 LearningRate 0.0841 Epoch: 1 Global Step: 68700 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:25:34,456-Speed 2630.38 samples/sec Loss 13.0639 LearningRate 0.0841 Epoch: 1 Global Step: 68710 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:25:38,357-Speed 2625.38 samples/sec Loss 13.1829 LearningRate 0.0841 Epoch: 1 Global Step: 68720 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:25:42,259-Speed 2625.09 samples/sec Loss 13.0422 LearningRate 0.0841 Epoch: 1 Global Step: 68730 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:25:46,152-Speed 2631.01 samples/sec Loss 13.1191 LearningRate 0.0841 Epoch: 1 Global Step: 68740 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:25:50,057-Speed 2623.32 samples/sec Loss 13.1568 LearningRate 0.0841 Epoch: 1 Global Step: 68750 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:25:53,962-Speed 2622.40 samples/sec Loss 13.1525 LearningRate 0.0841 Epoch: 1 Global Step: 68760 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:25:57,858-Speed 2628.92 samples/sec Loss 13.0552 LearningRate 0.0841 Epoch: 1 Global Step: 68770 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:26:01,753-Speed 2629.94 samples/sec Loss 12.9889 LearningRate 0.0841 Epoch: 1 Global Step: 68780 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:26:05,645-Speed 2631.00 samples/sec Loss 13.2197 LearningRate 0.0841 Epoch: 1 Global Step: 68790 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:26:09,542-Speed 2628.35 samples/sec Loss 13.0801 LearningRate 0.0841 Epoch: 1 Global Step: 68800 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:26:13,442-Speed 2626.69 samples/sec Loss 13.1503 LearningRate 0.0841 Epoch: 1 Global Step: 68810 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:26:17,419-Speed 2575.03 samples/sec Loss 13.1796 LearningRate 0.0841 Epoch: 1 Global Step: 68820 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:26:21,330-Speed 2619.42 samples/sec Loss 13.2273 LearningRate 0.0841 Epoch: 1 Global Step: 68830 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:26:25,223-Speed 2630.68 samples/sec Loss 13.2332 LearningRate 0.0841 Epoch: 1 Global Step: 68840 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:26:29,119-Speed 2629.21 samples/sec Loss 13.2602 LearningRate 0.0841 Epoch: 1 Global Step: 68850 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:26:33,019-Speed 2626.02 samples/sec Loss 12.9673 LearningRate 0.0841 Epoch: 1 Global Step: 68860 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:26:36,910-Speed 2632.48 samples/sec Loss 13.0360 LearningRate 0.0841 Epoch: 1 Global Step: 68870 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:26:40,803-Speed 2630.70 samples/sec Loss 13.0535 LearningRate 0.0841 Epoch: 1 Global Step: 68880 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:26:44,702-Speed 2626.98 samples/sec Loss 13.0337 LearningRate 0.0841 Epoch: 1 Global Step: 68890 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:26:48,601-Speed 2626.55 samples/sec Loss 13.0499 LearningRate 0.0841 Epoch: 1 Global Step: 68900 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:26:52,499-Speed 2628.52 samples/sec Loss 13.1130 LearningRate 0.0841 Epoch: 1 Global Step: 68910 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:26:56,388-Speed 2633.75 samples/sec Loss 13.0736 LearningRate 0.0841 Epoch: 1 Global Step: 68920 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:27:00,277-Speed 2633.13 samples/sec Loss 13.0897 LearningRate 0.0841 Epoch: 1 Global Step: 68930 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:27:04,151-Speed 2643.99 samples/sec Loss 13.1454 LearningRate 0.0841 Epoch: 1 Global Step: 68940 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:27:08,042-Speed 2632.01 samples/sec Loss 12.9187 LearningRate 0.0841 Epoch: 1 Global Step: 68950 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:27:11,959-Speed 2614.99 samples/sec Loss 12.9641 LearningRate 0.0841 Epoch: 1 Global Step: 68960 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:27:15,850-Speed 2631.85 samples/sec Loss 13.1578 LearningRate 0.0841 Epoch: 1 Global Step: 68970 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:27:19,751-Speed 2625.72 samples/sec Loss 13.1978 LearningRate 0.0841 Epoch: 1 Global Step: 68980 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:27:23,648-Speed 2628.34 samples/sec Loss 13.1343 LearningRate 0.0841 Epoch: 1 Global Step: 68990 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:27:27,544-Speed 2629.28 samples/sec Loss 13.1308 LearningRate 0.0841 Epoch: 1 Global Step: 69000 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:27:31,438-Speed 2630.08 samples/sec Loss 13.2036 LearningRate 0.0841 Epoch: 1 Global Step: 69010 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:27:35,336-Speed 2627.64 samples/sec Loss 13.1125 LearningRate 0.0841 Epoch: 1 Global Step: 69020 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:27:39,232-Speed 2628.82 samples/sec Loss 13.2319 LearningRate 0.0841 Epoch: 1 Global Step: 69030 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:27:43,090-Speed 2655.01 samples/sec Loss 13.0706 LearningRate 0.0840 Epoch: 1 Global Step: 69040 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:27:46,999-Speed 2619.67 samples/sec Loss 13.2377 LearningRate 0.0840 Epoch: 1 Global Step: 69050 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:27:50,934-Speed 2603.01 samples/sec Loss 13.1373 LearningRate 0.0840 Epoch: 1 Global Step: 69060 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:27:54,826-Speed 2632.06 samples/sec Loss 13.2216 LearningRate 0.0840 Epoch: 1 Global Step: 69070 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:27:58,717-Speed 2632.31 samples/sec Loss 13.0916 LearningRate 0.0840 Epoch: 1 Global Step: 69080 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:28:02,615-Speed 2627.29 samples/sec Loss 13.1271 LearningRate 0.0840 Epoch: 1 Global Step: 69090 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:28:06,508-Speed 2631.28 samples/sec Loss 13.2377 LearningRate 0.0840 Epoch: 1 Global Step: 69100 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:28:10,418-Speed 2619.11 samples/sec Loss 13.0389 LearningRate 0.0840 Epoch: 1 Global Step: 69110 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:28:14,317-Speed 2626.91 samples/sec Loss 13.1528 LearningRate 0.0840 Epoch: 1 Global Step: 69120 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:28:18,215-Speed 2627.92 samples/sec Loss 13.0519 LearningRate 0.0840 Epoch: 1 Global Step: 69130 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:28:22,111-Speed 2628.63 samples/sec Loss 13.1948 LearningRate 0.0840 Epoch: 1 Global Step: 69140 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:28:26,007-Speed 2628.84 samples/sec Loss 13.1795 LearningRate 0.0840 Epoch: 1 Global Step: 69150 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:28:29,900-Speed 2630.65 samples/sec Loss 13.2693 LearningRate 0.0840 Epoch: 1 Global Step: 69160 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:28:33,798-Speed 2627.82 samples/sec Loss 13.0089 LearningRate 0.0840 Epoch: 1 Global Step: 69170 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:28:37,691-Speed 2631.20 samples/sec Loss 13.1094 LearningRate 0.0840 Epoch: 1 Global Step: 69180 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:28:41,585-Speed 2629.97 samples/sec Loss 13.0420 LearningRate 0.0840 Epoch: 1 Global Step: 69190 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:28:45,482-Speed 2628.80 samples/sec Loss 13.0020 LearningRate 0.0840 Epoch: 1 Global Step: 69200 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:28:49,375-Speed 2631.04 samples/sec Loss 13.0281 LearningRate 0.0840 Epoch: 1 Global Step: 69210 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:28:53,268-Speed 2630.37 samples/sec Loss 13.1458 LearningRate 0.0840 Epoch: 1 Global Step: 69220 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:28:57,159-Speed 2632.27 samples/sec Loss 13.0451 LearningRate 0.0840 Epoch: 1 Global Step: 69230 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:29:01,052-Speed 2631.36 samples/sec Loss 13.0268 LearningRate 0.0840 Epoch: 1 Global Step: 69240 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:29:04,936-Speed 2636.65 samples/sec Loss 13.0794 LearningRate 0.0840 Epoch: 1 Global Step: 69250 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:29:08,826-Speed 2633.29 samples/sec Loss 13.0733 LearningRate 0.0840 Epoch: 1 Global Step: 69260 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:29:12,715-Speed 2633.78 samples/sec Loss 12.9822 LearningRate 0.0840 Epoch: 1 Global Step: 69270 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:29:16,609-Speed 2630.17 samples/sec Loss 12.9287 LearningRate 0.0840 Epoch: 1 Global Step: 69280 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:29:20,524-Speed 2616.13 samples/sec Loss 13.1434 LearningRate 0.0840 Epoch: 1 Global Step: 69290 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:29:24,416-Speed 2632.08 samples/sec Loss 13.1199 LearningRate 0.0840 Epoch: 1 Global Step: 69300 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:29:28,312-Speed 2628.88 samples/sec Loss 13.0324 LearningRate 0.0840 Epoch: 1 Global Step: 69310 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:29:32,213-Speed 2625.48 samples/sec Loss 13.0068 LearningRate 0.0840 Epoch: 1 Global Step: 69320 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:29:36,107-Speed 2630.79 samples/sec Loss 13.1684 LearningRate 0.0840 Epoch: 1 Global Step: 69330 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:29:40,008-Speed 2625.32 samples/sec Loss 13.0795 LearningRate 0.0840 Epoch: 1 Global Step: 69340 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:29:43,905-Speed 2628.48 samples/sec Loss 13.2221 LearningRate 0.0840 Epoch: 1 Global Step: 69350 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:29:47,801-Speed 2629.46 samples/sec Loss 12.9967 LearningRate 0.0840 Epoch: 1 Global Step: 69360 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:29:51,695-Speed 2629.98 samples/sec Loss 13.1796 LearningRate 0.0840 Epoch: 1 Global Step: 69370 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:29:55,591-Speed 2629.45 samples/sec Loss 12.9912 LearningRate 0.0840 Epoch: 1 Global Step: 69380 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:29:59,492-Speed 2625.59 samples/sec Loss 13.0330 LearningRate 0.0840 Epoch: 1 Global Step: 69390 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:30:03,394-Speed 2624.68 samples/sec Loss 13.1486 LearningRate 0.0840 Epoch: 1 Global Step: 69400 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:30:07,292-Speed 2627.63 samples/sec Loss 13.0922 LearningRate 0.0840 Epoch: 1 Global Step: 69410 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:30:11,186-Speed 2630.91 samples/sec Loss 13.1388 LearningRate 0.0840 Epoch: 1 Global Step: 69420 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:30:15,099-Speed 2617.24 samples/sec Loss 13.0740 LearningRate 0.0840 Epoch: 1 Global Step: 69430 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:30:19,010-Speed 2618.65 samples/sec Loss 12.9446 LearningRate 0.0840 Epoch: 1 Global Step: 69440 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:30:22,890-Speed 2639.83 samples/sec Loss 13.2820 LearningRate 0.0840 Epoch: 1 Global Step: 69450 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:30:26,766-Speed 2642.55 samples/sec Loss 13.2058 LearningRate 0.0840 Epoch: 1 Global Step: 69460 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:30:30,816-Speed 2529.00 samples/sec Loss 13.0807 LearningRate 0.0840 Epoch: 1 Global Step: 69470 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:30:34,711-Speed 2630.29 samples/sec Loss 13.0930 LearningRate 0.0840 Epoch: 1 Global Step: 69480 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:30:38,603-Speed 2631.42 samples/sec Loss 12.9164 LearningRate 0.0839 Epoch: 1 Global Step: 69490 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:30:42,497-Speed 2629.88 samples/sec Loss 13.0401 LearningRate 0.0839 Epoch: 1 Global Step: 69500 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:30:46,394-Speed 2628.55 samples/sec Loss 13.1350 LearningRate 0.0839 Epoch: 1 Global Step: 69510 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:30:50,289-Speed 2629.91 samples/sec Loss 13.0345 LearningRate 0.0839 Epoch: 1 Global Step: 69520 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:30:54,179-Speed 2632.61 samples/sec Loss 13.1684 LearningRate 0.0839 Epoch: 1 Global Step: 69530 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:30:58,074-Speed 2630.04 samples/sec Loss 13.1541 LearningRate 0.0839 Epoch: 1 Global Step: 69540 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:31:01,976-Speed 2624.40 samples/sec Loss 13.0594 LearningRate 0.0839 Epoch: 1 Global Step: 69550 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:31:05,858-Speed 2638.52 samples/sec Loss 13.1130 LearningRate 0.0839 Epoch: 1 Global Step: 69560 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:31:09,751-Speed 2630.87 samples/sec Loss 13.0789 LearningRate 0.0839 Epoch: 1 Global Step: 69570 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:31:13,645-Speed 2630.72 samples/sec Loss 13.1564 LearningRate 0.0839 Epoch: 1 Global Step: 69580 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:31:17,536-Speed 2631.84 samples/sec Loss 13.1993 LearningRate 0.0839 Epoch: 1 Global Step: 69590 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:31:21,438-Speed 2625.25 samples/sec Loss 13.1641 LearningRate 0.0839 Epoch: 1 Global Step: 69600 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:31:25,341-Speed 2624.10 samples/sec Loss 13.2423 LearningRate 0.0839 Epoch: 1 Global Step: 69610 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:31:29,243-Speed 2625.20 samples/sec Loss 13.1429 LearningRate 0.0839 Epoch: 1 Global Step: 69620 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:31:33,151-Speed 2620.93 samples/sec Loss 13.2138 LearningRate 0.0839 Epoch: 1 Global Step: 69630 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:31:37,054-Speed 2624.13 samples/sec Loss 13.0646 LearningRate 0.0839 Epoch: 1 Global Step: 69640 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:31:40,961-Speed 2621.67 samples/sec Loss 13.1430 LearningRate 0.0839 Epoch: 1 Global Step: 69650 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:31:44,874-Speed 2617.47 samples/sec Loss 13.1270 LearningRate 0.0839 Epoch: 1 Global Step: 69660 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:31:48,863-Speed 2567.52 samples/sec Loss 13.1866 LearningRate 0.0839 Epoch: 1 Global Step: 69670 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:31:52,768-Speed 2622.89 samples/sec Loss 13.1768 LearningRate 0.0839 Epoch: 1 Global Step: 69680 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:31:56,674-Speed 2622.55 samples/sec Loss 12.9814 LearningRate 0.0839 Epoch: 1 Global Step: 69690 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:32:00,557-Speed 2637.84 samples/sec Loss 13.2031 LearningRate 0.0839 Epoch: 1 Global Step: 69700 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:32:04,459-Speed 2624.52 samples/sec Loss 13.2039 LearningRate 0.0839 Epoch: 1 Global Step: 69710 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:32:08,374-Speed 2616.16 samples/sec Loss 13.0219 LearningRate 0.0839 Epoch: 1 Global Step: 69720 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:32:12,305-Speed 2605.37 samples/sec Loss 13.1924 LearningRate 0.0839 Epoch: 1 Global Step: 69730 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:32:16,205-Speed 2627.13 samples/sec Loss 13.0110 LearningRate 0.0839 Epoch: 1 Global Step: 69740 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:32:20,113-Speed 2621.03 samples/sec Loss 13.0741 LearningRate 0.0839 Epoch: 1 Global Step: 69750 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:32:24,005-Speed 2631.61 samples/sec Loss 13.1269 LearningRate 0.0839 Epoch: 1 Global Step: 69760 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:32:27,907-Speed 2624.81 samples/sec Loss 13.1891 LearningRate 0.0839 Epoch: 1 Global Step: 69770 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:32:31,805-Speed 2627.64 samples/sec Loss 13.0757 LearningRate 0.0839 Epoch: 1 Global Step: 69780 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:32:35,690-Speed 2635.83 samples/sec Loss 13.0770 LearningRate 0.0839 Epoch: 1 Global Step: 69790 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:32:39,565-Speed 2643.40 samples/sec Loss 13.2658 LearningRate 0.0839 Epoch: 1 Global Step: 69800 Fp16 Grad Scale: 16384 Required: 85 hours
Training: 2022-04-13 03:32:43,459-Speed 2630.80 samples/sec Loss 13.1124 LearningRate 0.0839 Epoch: 1 Global Step: 69810 Fp16 Grad Scale: 16384 Required: 85 hours
Training: 2022-04-13 03:32:47,353-Speed 2630.07 samples/sec Loss 13.1887 LearningRate 0.0839 Epoch: 1 Global Step: 69820 Fp16 Grad Scale: 16384 Required: 85 hours
Training: 2022-04-13 03:32:51,254-Speed 2625.84 samples/sec Loss 13.2342 LearningRate 0.0839 Epoch: 1 Global Step: 69830 Fp16 Grad Scale: 16384 Required: 85 hours
Training: 2022-04-13 03:32:55,153-Speed 2626.96 samples/sec Loss 13.0379 LearningRate 0.0839 Epoch: 1 Global Step: 69840 Fp16 Grad Scale: 16384 Required: 85 hours
Training: 2022-04-13 03:32:59,049-Speed 2629.59 samples/sec Loss 13.0370 LearningRate 0.0839 Epoch: 1 Global Step: 69850 Fp16 Grad Scale: 16384 Required: 85 hours
Training: 2022-04-13 03:33:02,945-Speed 2628.63 samples/sec Loss 13.1880 LearningRate 0.0839 Epoch: 1 Global Step: 69860 Fp16 Grad Scale: 16384 Required: 85 hours
Training: 2022-04-13 03:33:06,842-Speed 2628.24 samples/sec Loss 13.2113 LearningRate 0.0839 Epoch: 1 Global Step: 69870 Fp16 Grad Scale: 16384 Required: 85 hours
Training: 2022-04-13 03:33:10,733-Speed 2632.43 samples/sec Loss 13.1769 LearningRate 0.0839 Epoch: 1 Global Step: 69880 Fp16 Grad Scale: 16384 Required: 85 hours
Training: 2022-04-13 03:33:14,626-Speed 2631.25 samples/sec Loss 12.8798 LearningRate 0.0839 Epoch: 1 Global Step: 69890 Fp16 Grad Scale: 16384 Required: 85 hours
Training: 2022-04-13 03:33:18,584-Speed 2587.40 samples/sec Loss 13.0601 LearningRate 0.0839 Epoch: 1 Global Step: 69900 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:33:22,496-Speed 2618.91 samples/sec Loss 13.1885 LearningRate 0.0839 Epoch: 1 Global Step: 69910 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:33:26,400-Speed 2623.30 samples/sec Loss 13.1153 LearningRate 0.0839 Epoch: 1 Global Step: 69920 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:33:30,298-Speed 2628.27 samples/sec Loss 13.0620 LearningRate 0.0839 Epoch: 1 Global Step: 69930 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:33:34,190-Speed 2631.24 samples/sec Loss 13.0577 LearningRate 0.0838 Epoch: 1 Global Step: 69940 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:33:38,084-Speed 2630.49 samples/sec Loss 13.2528 LearningRate 0.0838 Epoch: 1 Global Step: 69950 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:33:41,976-Speed 2631.33 samples/sec Loss 13.0593 LearningRate 0.0838 Epoch: 1 Global Step: 69960 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:33:45,880-Speed 2623.79 samples/sec Loss 13.0641 LearningRate 0.0838 Epoch: 1 Global Step: 69970 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:33:49,777-Speed 2628.65 samples/sec Loss 12.8933 LearningRate 0.0838 Epoch: 1 Global Step: 69980 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:33:53,686-Speed 2619.86 samples/sec Loss 13.0793 LearningRate 0.0838 Epoch: 1 Global Step: 69990 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:33:57,586-Speed 2627.07 samples/sec Loss 13.1502 LearningRate 0.0838 Epoch: 1 Global Step: 70000 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:34:40,954-[lfw][70000]XNorm: 23.249918
Training: 2022-04-13 03:34:40,955-[lfw][70000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-04-13 03:34:40,956-[lfw][70000]Accuracy-Highest: 0.99783
Training: 2022-04-13 03:35:31,044-[cfp_fp][70000]XNorm: 20.953000
Training: 2022-04-13 03:35:31,045-[cfp_fp][70000]Accuracy-Flip: 0.97500+-0.00680
Training: 2022-04-13 03:35:31,046-[cfp_fp][70000]Accuracy-Highest: 0.97500
Training: 2022-04-13 03:36:14,566-[agedb_30][70000]XNorm: 22.650397
Training: 2022-04-13 03:36:14,567-[agedb_30][70000]Accuracy-Flip: 0.96083+-0.00597
Training: 2022-04-13 03:36:14,568-[agedb_30][70000]Accuracy-Highest: 0.96283
Training: 2022-04-13 03:36:18,455-Speed 72.69 samples/sec Loss 13.2279 LearningRate 0.0838 Epoch: 1 Global Step: 70010 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:36:22,338-Speed 2638.38 samples/sec Loss 13.1059 LearningRate 0.0838 Epoch: 1 Global Step: 70020 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:36:26,213-Speed 2643.15 samples/sec Loss 13.0600 LearningRate 0.0838 Epoch: 1 Global Step: 70030 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:36:30,086-Speed 2645.04 samples/sec Loss 13.0094 LearningRate 0.0838 Epoch: 1 Global Step: 70040 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:36:33,953-Speed 2648.49 samples/sec Loss 13.1365 LearningRate 0.0838 Epoch: 1 Global Step: 70050 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:36:37,835-Speed 2639.22 samples/sec Loss 13.1191 LearningRate 0.0838 Epoch: 1 Global Step: 70060 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:36:41,709-Speed 2644.24 samples/sec Loss 12.9019 LearningRate 0.0838 Epoch: 1 Global Step: 70070 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:36:45,596-Speed 2634.85 samples/sec Loss 12.8726 LearningRate 0.0838 Epoch: 1 Global Step: 70080 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:36:49,475-Speed 2640.53 samples/sec Loss 12.9454 LearningRate 0.0838 Epoch: 1 Global Step: 70090 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:36:53,351-Speed 2643.41 samples/sec Loss 12.8893 LearningRate 0.0838 Epoch: 1 Global Step: 70100 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:36:57,247-Speed 2628.45 samples/sec Loss 12.9735 LearningRate 0.0838 Epoch: 1 Global Step: 70110 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:37:01,186-Speed 2600.97 samples/sec Loss 13.1731 LearningRate 0.0838 Epoch: 1 Global Step: 70120 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:37:05,073-Speed 2634.56 samples/sec Loss 13.0358 LearningRate 0.0838 Epoch: 1 Global Step: 70130 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:37:08,957-Speed 2637.14 samples/sec Loss 13.0908 LearningRate 0.0838 Epoch: 1 Global Step: 70140 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:37:12,846-Speed 2633.77 samples/sec Loss 13.0904 LearningRate 0.0838 Epoch: 1 Global Step: 70150 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:37:16,742-Speed 2629.39 samples/sec Loss 13.0712 LearningRate 0.0838 Epoch: 1 Global Step: 70160 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:37:20,645-Speed 2624.36 samples/sec Loss 12.9279 LearningRate 0.0838 Epoch: 1 Global Step: 70170 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:37:24,551-Speed 2622.06 samples/sec Loss 12.8615 LearningRate 0.0838 Epoch: 1 Global Step: 70180 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:37:28,473-Speed 2611.77 samples/sec Loss 12.9951 LearningRate 0.0838 Epoch: 1 Global Step: 70190 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:37:32,362-Speed 2634.18 samples/sec Loss 13.0238 LearningRate 0.0838 Epoch: 1 Global Step: 70200 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:37:36,251-Speed 2633.59 samples/sec Loss 13.0233 LearningRate 0.0838 Epoch: 1 Global Step: 70210 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:37:40,139-Speed 2634.71 samples/sec Loss 13.0223 LearningRate 0.0838 Epoch: 1 Global Step: 70220 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:37:44,029-Speed 2632.83 samples/sec Loss 13.0407 LearningRate 0.0838 Epoch: 1 Global Step: 70230 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:37:47,903-Speed 2644.07 samples/sec Loss 12.9335 LearningRate 0.0838 Epoch: 1 Global Step: 70240 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:37:51,790-Speed 2634.83 samples/sec Loss 13.0166 LearningRate 0.0838 Epoch: 1 Global Step: 70250 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:37:55,682-Speed 2632.36 samples/sec Loss 13.0152 LearningRate 0.0838 Epoch: 1 Global Step: 70260 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:37:59,590-Speed 2620.29 samples/sec Loss 13.0846 LearningRate 0.0838 Epoch: 1 Global Step: 70270 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:38:03,524-Speed 2604.04 samples/sec Loss 12.9803 LearningRate 0.0838 Epoch: 1 Global Step: 70280 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:38:07,416-Speed 2631.62 samples/sec Loss 12.9957 LearningRate 0.0838 Epoch: 1 Global Step: 70290 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:38:11,310-Speed 2630.71 samples/sec Loss 13.0694 LearningRate 0.0838 Epoch: 1 Global Step: 70300 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:38:15,209-Speed 2626.81 samples/sec Loss 13.0671 LearningRate 0.0838 Epoch: 1 Global Step: 70310 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:38:19,096-Speed 2634.61 samples/sec Loss 13.0453 LearningRate 0.0838 Epoch: 1 Global Step: 70320 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:38:22,987-Speed 2633.16 samples/sec Loss 12.9655 LearningRate 0.0838 Epoch: 1 Global Step: 70330 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:38:26,877-Speed 2632.26 samples/sec Loss 12.9856 LearningRate 0.0838 Epoch: 1 Global Step: 70340 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:38:30,781-Speed 2624.08 samples/sec Loss 13.0703 LearningRate 0.0838 Epoch: 1 Global Step: 70350 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:38:34,629-Speed 2661.70 samples/sec Loss 13.1417 LearningRate 0.0838 Epoch: 1 Global Step: 70360 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:38:38,519-Speed 2633.16 samples/sec Loss 12.9224 LearningRate 0.0838 Epoch: 1 Global Step: 70370 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:38:42,410-Speed 2633.08 samples/sec Loss 12.8842 LearningRate 0.0838 Epoch: 1 Global Step: 70380 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:38:46,301-Speed 2632.26 samples/sec Loss 13.0163 LearningRate 0.0837 Epoch: 1 Global Step: 70390 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:38:50,199-Speed 2627.67 samples/sec Loss 13.1638 LearningRate 0.0837 Epoch: 1 Global Step: 70400 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:38:54,114-Speed 2616.32 samples/sec Loss 12.9810 LearningRate 0.0837 Epoch: 1 Global Step: 70410 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:38:58,004-Speed 2632.63 samples/sec Loss 13.2294 LearningRate 0.0837 Epoch: 1 Global Step: 70420 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:39:01,901-Speed 2627.90 samples/sec Loss 13.0680 LearningRate 0.0837 Epoch: 1 Global Step: 70430 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:39:05,804-Speed 2624.75 samples/sec Loss 12.9926 LearningRate 0.0837 Epoch: 1 Global Step: 70440 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:39:09,702-Speed 2627.78 samples/sec Loss 13.1561 LearningRate 0.0837 Epoch: 1 Global Step: 70450 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 03:39:13,594-Speed 2632.02 samples/sec Loss 13.0510 LearningRate 0.0837 Epoch: 1 Global Step: 70460 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:39:17,511-Speed 2615.00 samples/sec Loss 13.0147 LearningRate 0.0837 Epoch: 1 Global Step: 70470 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:39:21,404-Speed 2630.85 samples/sec Loss 12.9931 LearningRate 0.0837 Epoch: 1 Global Step: 70480 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:39:25,300-Speed 2629.17 samples/sec Loss 13.0666 LearningRate 0.0837 Epoch: 1 Global Step: 70490 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:39:29,212-Speed 2618.40 samples/sec Loss 12.9322 LearningRate 0.0837 Epoch: 1 Global Step: 70500 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:39:33,121-Speed 2619.81 samples/sec Loss 13.0512 LearningRate 0.0837 Epoch: 1 Global Step: 70510 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:39:37,018-Speed 2628.17 samples/sec Loss 13.0916 LearningRate 0.0837 Epoch: 1 Global Step: 70520 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:39:40,912-Speed 2630.78 samples/sec Loss 12.9814 LearningRate 0.0837 Epoch: 1 Global Step: 70530 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:39:44,806-Speed 2629.93 samples/sec Loss 13.0398 LearningRate 0.0837 Epoch: 1 Global Step: 70540 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:39:48,705-Speed 2627.19 samples/sec Loss 12.9953 LearningRate 0.0837 Epoch: 1 Global Step: 70550 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:39:52,597-Speed 2632.06 samples/sec Loss 13.0149 LearningRate 0.0837 Epoch: 1 Global Step: 70560 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:39:56,503-Speed 2622.30 samples/sec Loss 13.0542 LearningRate 0.0837 Epoch: 1 Global Step: 70570 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:40:00,397-Speed 2630.26 samples/sec Loss 12.9529 LearningRate 0.0837 Epoch: 1 Global Step: 70580 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:40:04,296-Speed 2627.08 samples/sec Loss 13.0035 LearningRate 0.0837 Epoch: 1 Global Step: 70590 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:40:08,199-Speed 2624.29 samples/sec Loss 13.0594 LearningRate 0.0837 Epoch: 1 Global Step: 70600 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:40:12,184-Speed 2570.65 samples/sec Loss 13.0260 LearningRate 0.0837 Epoch: 1 Global Step: 70610 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:40:16,082-Speed 2627.16 samples/sec Loss 13.0000 LearningRate 0.0837 Epoch: 1 Global Step: 70620 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:40:19,990-Speed 2621.50 samples/sec Loss 12.9963 LearningRate 0.0837 Epoch: 1 Global Step: 70630 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:40:23,893-Speed 2623.75 samples/sec Loss 13.0691 LearningRate 0.0837 Epoch: 1 Global Step: 70640 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:40:27,792-Speed 2627.48 samples/sec Loss 13.0223 LearningRate 0.0837 Epoch: 1 Global Step: 70650 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:40:31,692-Speed 2625.95 samples/sec Loss 13.0435 LearningRate 0.0837 Epoch: 1 Global Step: 70660 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:40:35,595-Speed 2623.90 samples/sec Loss 13.0966 LearningRate 0.0837 Epoch: 1 Global Step: 70670 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:40:39,493-Speed 2627.87 samples/sec Loss 12.8385 LearningRate 0.0837 Epoch: 1 Global Step: 70680 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:40:43,401-Speed 2621.05 samples/sec Loss 12.8656 LearningRate 0.0837 Epoch: 1 Global Step: 70690 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:40:47,304-Speed 2624.72 samples/sec Loss 13.1644 LearningRate 0.0837 Epoch: 1 Global Step: 70700 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:40:51,206-Speed 2624.29 samples/sec Loss 13.0650 LearningRate 0.0837 Epoch: 1 Global Step: 70710 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:40:55,105-Speed 2628.89 samples/sec Loss 12.9958 LearningRate 0.0837 Epoch: 1 Global Step: 70720 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:40:59,008-Speed 2624.43 samples/sec Loss 13.0987 LearningRate 0.0837 Epoch: 1 Global Step: 70730 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:41:03,037-Speed 2541.98 samples/sec Loss 12.8701 LearningRate 0.0837 Epoch: 1 Global Step: 70740 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:41:07,120-Speed 2508.05 samples/sec Loss 12.9916 LearningRate 0.0837 Epoch: 1 Global Step: 70750 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:41:11,197-Speed 2512.60 samples/sec Loss 13.0115 LearningRate 0.0837 Epoch: 1 Global Step: 70760 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:41:15,281-Speed 2507.98 samples/sec Loss 13.0354 LearningRate 0.0837 Epoch: 1 Global Step: 70770 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:41:19,288-Speed 2556.70 samples/sec Loss 12.9512 LearningRate 0.0837 Epoch: 1 Global Step: 70780 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:41:23,188-Speed 2626.14 samples/sec Loss 13.1699 LearningRate 0.0837 Epoch: 1 Global Step: 70790 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:41:27,086-Speed 2627.61 samples/sec Loss 13.0100 LearningRate 0.0837 Epoch: 1 Global Step: 70800 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:41:30,975-Speed 2634.15 samples/sec Loss 13.1320 LearningRate 0.0837 Epoch: 1 Global Step: 70810 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:41:34,873-Speed 2626.94 samples/sec Loss 13.0608 LearningRate 0.0837 Epoch: 1 Global Step: 70820 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:41:38,788-Speed 2615.91 samples/sec Loss 13.2146 LearningRate 0.0837 Epoch: 1 Global Step: 70830 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:41:42,700-Speed 2618.41 samples/sec Loss 13.0937 LearningRate 0.0837 Epoch: 1 Global Step: 70840 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:41:46,625-Speed 2610.14 samples/sec Loss 13.0994 LearningRate 0.0836 Epoch: 1 Global Step: 70850 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:41:50,554-Speed 2607.18 samples/sec Loss 13.0412 LearningRate 0.0836 Epoch: 1 Global Step: 70860 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:41:54,473-Speed 2612.78 samples/sec Loss 13.0408 LearningRate 0.0836 Epoch: 1 Global Step: 70870 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:41:58,389-Speed 2616.15 samples/sec Loss 13.1484 LearningRate 0.0836 Epoch: 1 Global Step: 70880 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:42:02,298-Speed 2619.77 samples/sec Loss 13.0086 LearningRate 0.0836 Epoch: 1 Global Step: 70890 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:42:06,203-Speed 2622.47 samples/sec Loss 13.1016 LearningRate 0.0836 Epoch: 1 Global Step: 70900 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:42:10,107-Speed 2623.70 samples/sec Loss 13.0312 LearningRate 0.0836 Epoch: 1 Global Step: 70910 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:42:14,015-Speed 2621.53 samples/sec Loss 13.1348 LearningRate 0.0836 Epoch: 1 Global Step: 70920 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:42:17,918-Speed 2624.67 samples/sec Loss 13.0517 LearningRate 0.0836 Epoch: 1 Global Step: 70930 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:42:21,809-Speed 2631.83 samples/sec Loss 13.0204 LearningRate 0.0836 Epoch: 1 Global Step: 70940 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:42:25,725-Speed 2615.94 samples/sec Loss 13.0463 LearningRate 0.0836 Epoch: 1 Global Step: 70950 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:42:29,633-Speed 2620.66 samples/sec Loss 13.0076 LearningRate 0.0836 Epoch: 1 Global Step: 70960 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:42:33,550-Speed 2615.28 samples/sec Loss 13.1701 LearningRate 0.0836 Epoch: 1 Global Step: 70970 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:42:37,460-Speed 2619.65 samples/sec Loss 13.0061 LearningRate 0.0836 Epoch: 1 Global Step: 70980 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:42:41,390-Speed 2605.78 samples/sec Loss 13.0004 LearningRate 0.0836 Epoch: 1 Global Step: 70990 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:42:45,302-Speed 2618.16 samples/sec Loss 13.1339 LearningRate 0.0836 Epoch: 1 Global Step: 71000 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:42:49,346-Speed 2533.19 samples/sec Loss 13.0388 LearningRate 0.0836 Epoch: 1 Global Step: 71010 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:42:53,359-Speed 2552.89 samples/sec Loss 13.0169 LearningRate 0.0836 Epoch: 1 Global Step: 71020 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:42:57,273-Speed 2616.30 samples/sec Loss 13.2049 LearningRate 0.0836 Epoch: 1 Global Step: 71030 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:43:01,187-Speed 2617.61 samples/sec Loss 13.0395 LearningRate 0.0836 Epoch: 1 Global Step: 71040 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:43:05,089-Speed 2624.53 samples/sec Loss 13.1257 LearningRate 0.0836 Epoch: 1 Global Step: 71050 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:43:08,997-Speed 2620.76 samples/sec Loss 13.1184 LearningRate 0.0836 Epoch: 1 Global Step: 71060 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:43:12,901-Speed 2623.38 samples/sec Loss 13.1019 LearningRate 0.0836 Epoch: 1 Global Step: 71070 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:43:16,915-Speed 2552.18 samples/sec Loss 12.9770 LearningRate 0.0836 Epoch: 1 Global Step: 71080 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:43:20,819-Speed 2623.82 samples/sec Loss 12.9232 LearningRate 0.0836 Epoch: 1 Global Step: 71090 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:43:24,724-Speed 2623.51 samples/sec Loss 12.9958 LearningRate 0.0836 Epoch: 1 Global Step: 71100 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:43:28,636-Speed 2618.32 samples/sec Loss 12.9966 LearningRate 0.0836 Epoch: 1 Global Step: 71110 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:43:32,522-Speed 2636.05 samples/sec Loss 13.1885 LearningRate 0.0836 Epoch: 1 Global Step: 71120 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:43:36,426-Speed 2623.31 samples/sec Loss 13.0638 LearningRate 0.0836 Epoch: 1 Global Step: 71130 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:43:40,337-Speed 2618.60 samples/sec Loss 12.9834 LearningRate 0.0836 Epoch: 1 Global Step: 71140 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:43:44,240-Speed 2623.92 samples/sec Loss 13.1558 LearningRate 0.0836 Epoch: 1 Global Step: 71150 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:43:48,145-Speed 2623.07 samples/sec Loss 13.1424 LearningRate 0.0836 Epoch: 1 Global Step: 71160 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:43:52,036-Speed 2632.50 samples/sec Loss 13.2758 LearningRate 0.0836 Epoch: 1 Global Step: 71170 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:43:55,946-Speed 2619.80 samples/sec Loss 12.9558 LearningRate 0.0836 Epoch: 1 Global Step: 71180 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:43:59,859-Speed 2617.50 samples/sec Loss 13.0449 LearningRate 0.0836 Epoch: 1 Global Step: 71190 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:44:03,769-Speed 2619.60 samples/sec Loss 12.9706 LearningRate 0.0836 Epoch: 1 Global Step: 71200 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:44:07,693-Speed 2610.39 samples/sec Loss 13.0721 LearningRate 0.0836 Epoch: 1 Global Step: 71210 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:44:11,603-Speed 2619.49 samples/sec Loss 13.0186 LearningRate 0.0836 Epoch: 1 Global Step: 71220 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:44:15,507-Speed 2623.06 samples/sec Loss 13.0724 LearningRate 0.0836 Epoch: 1 Global Step: 71230 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:44:19,421-Speed 2617.56 samples/sec Loss 13.0119 LearningRate 0.0836 Epoch: 1 Global Step: 71240 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:44:23,334-Speed 2617.71 samples/sec Loss 12.8705 LearningRate 0.0836 Epoch: 1 Global Step: 71250 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:44:27,256-Speed 2611.36 samples/sec Loss 13.0042 LearningRate 0.0836 Epoch: 1 Global Step: 71260 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:44:31,162-Speed 2622.73 samples/sec Loss 13.1145 LearningRate 0.0836 Epoch: 1 Global Step: 71270 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:44:35,074-Speed 2618.17 samples/sec Loss 13.0057 LearningRate 0.0836 Epoch: 1 Global Step: 71280 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:44:39,005-Speed 2605.34 samples/sec Loss 12.9667 LearningRate 0.0836 Epoch: 1 Global Step: 71290 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:44:42,912-Speed 2621.75 samples/sec Loss 12.9779 LearningRate 0.0835 Epoch: 1 Global Step: 71300 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:44:46,820-Speed 2620.95 samples/sec Loss 12.9623 LearningRate 0.0835 Epoch: 1 Global Step: 71310 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:44:50,730-Speed 2619.48 samples/sec Loss 12.9912 LearningRate 0.0835 Epoch: 1 Global Step: 71320 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:44:54,648-Speed 2614.21 samples/sec Loss 13.1502 LearningRate 0.0835 Epoch: 1 Global Step: 71330 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:44:58,549-Speed 2625.44 samples/sec Loss 13.1277 LearningRate 0.0835 Epoch: 1 Global Step: 71340 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:45:02,457-Speed 2621.20 samples/sec Loss 12.9808 LearningRate 0.0835 Epoch: 1 Global Step: 71350 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:45:06,374-Speed 2614.69 samples/sec Loss 13.0900 LearningRate 0.0835 Epoch: 1 Global Step: 71360 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:45:10,295-Speed 2612.10 samples/sec Loss 13.0626 LearningRate 0.0835 Epoch: 1 Global Step: 71370 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:45:14,231-Speed 2602.46 samples/sec Loss 13.0187 LearningRate 0.0835 Epoch: 1 Global Step: 71380 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:45:18,158-Speed 2612.69 samples/sec Loss 13.1483 LearningRate 0.0835 Epoch: 1 Global Step: 71390 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:45:22,076-Speed 2614.03 samples/sec Loss 12.7533 LearningRate 0.0835 Epoch: 1 Global Step: 71400 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:45:25,993-Speed 2615.24 samples/sec Loss 13.0100 LearningRate 0.0835 Epoch: 1 Global Step: 71410 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:45:29,894-Speed 2626.08 samples/sec Loss 13.1437 LearningRate 0.0835 Epoch: 1 Global Step: 71420 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:45:33,812-Speed 2613.76 samples/sec Loss 13.0637 LearningRate 0.0835 Epoch: 1 Global Step: 71430 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:45:37,725-Speed 2617.46 samples/sec Loss 13.0682 LearningRate 0.0835 Epoch: 1 Global Step: 71440 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:45:41,634-Speed 2620.43 samples/sec Loss 13.1262 LearningRate 0.0835 Epoch: 1 Global Step: 71450 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:45:45,544-Speed 2619.17 samples/sec Loss 12.9519 LearningRate 0.0835 Epoch: 1 Global Step: 71460 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:45:49,466-Speed 2611.85 samples/sec Loss 13.0486 LearningRate 0.0835 Epoch: 1 Global Step: 71470 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:45:53,388-Speed 2611.43 samples/sec Loss 12.8853 LearningRate 0.0835 Epoch: 1 Global Step: 71480 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:45:57,296-Speed 2621.37 samples/sec Loss 13.1104 LearningRate 0.0835 Epoch: 1 Global Step: 71490 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:46:01,208-Speed 2618.60 samples/sec Loss 13.1394 LearningRate 0.0835 Epoch: 1 Global Step: 71500 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:46:05,122-Speed 2616.52 samples/sec Loss 13.0769 LearningRate 0.0835 Epoch: 1 Global Step: 71510 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:46:09,035-Speed 2617.47 samples/sec Loss 13.0393 LearningRate 0.0835 Epoch: 1 Global Step: 71520 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:46:12,951-Speed 2615.92 samples/sec Loss 13.0329 LearningRate 0.0835 Epoch: 1 Global Step: 71530 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:46:16,860-Speed 2620.45 samples/sec Loss 13.0766 LearningRate 0.0835 Epoch: 1 Global Step: 71540 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:46:20,784-Speed 2609.87 samples/sec Loss 12.8302 LearningRate 0.0835 Epoch: 1 Global Step: 71550 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:46:24,695-Speed 2619.57 samples/sec Loss 13.0960 LearningRate 0.0835 Epoch: 1 Global Step: 71560 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:46:28,633-Speed 2601.01 samples/sec Loss 13.1691 LearningRate 0.0835 Epoch: 1 Global Step: 71570 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:46:32,546-Speed 2617.84 samples/sec Loss 12.9368 LearningRate 0.0835 Epoch: 1 Global Step: 71580 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:46:36,463-Speed 2614.77 samples/sec Loss 13.0293 LearningRate 0.0835 Epoch: 1 Global Step: 71590 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:46:40,377-Speed 2616.58 samples/sec Loss 13.1966 LearningRate 0.0835 Epoch: 1 Global Step: 71600 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:46:44,288-Speed 2619.17 samples/sec Loss 13.0440 LearningRate 0.0835 Epoch: 1 Global Step: 71610 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:46:48,188-Speed 2626.96 samples/sec Loss 13.0663 LearningRate 0.0835 Epoch: 1 Global Step: 71620 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:46:52,089-Speed 2625.30 samples/sec Loss 13.0106 LearningRate 0.0835 Epoch: 1 Global Step: 71630 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:46:56,017-Speed 2607.44 samples/sec Loss 12.8748 LearningRate 0.0835 Epoch: 1 Global Step: 71640 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:46:59,906-Speed 2633.68 samples/sec Loss 13.0524 LearningRate 0.0835 Epoch: 1 Global Step: 71650 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:47:03,815-Speed 2620.76 samples/sec Loss 13.0083 LearningRate 0.0835 Epoch: 1 Global Step: 71660 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:47:07,722-Speed 2621.30 samples/sec Loss 12.9992 LearningRate 0.0835 Epoch: 1 Global Step: 71670 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:47:11,632-Speed 2619.45 samples/sec Loss 13.1196 LearningRate 0.0835 Epoch: 1 Global Step: 71680 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:47:15,538-Speed 2622.35 samples/sec Loss 12.8769 LearningRate 0.0835 Epoch: 1 Global Step: 71690 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:47:19,464-Speed 2608.92 samples/sec Loss 13.0032 LearningRate 0.0835 Epoch: 1 Global Step: 71700 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:47:23,371-Speed 2621.56 samples/sec Loss 13.0109 LearningRate 0.0835 Epoch: 1 Global Step: 71710 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:47:27,283-Speed 2618.21 samples/sec Loss 12.8153 LearningRate 0.0835 Epoch: 1 Global Step: 71720 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:47:31,194-Speed 2618.95 samples/sec Loss 12.9902 LearningRate 0.0835 Epoch: 1 Global Step: 71730 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:47:35,104-Speed 2619.32 samples/sec Loss 12.9769 LearningRate 0.0835 Epoch: 1 Global Step: 71740 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:47:39,016-Speed 2617.86 samples/sec Loss 12.9053 LearningRate 0.0835 Epoch: 1 Global Step: 71750 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:47:42,908-Speed 2632.48 samples/sec Loss 13.2114 LearningRate 0.0834 Epoch: 1 Global Step: 71760 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:47:46,814-Speed 2622.13 samples/sec Loss 12.9456 LearningRate 0.0834 Epoch: 1 Global Step: 71770 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:47:50,720-Speed 2622.37 samples/sec Loss 12.9738 LearningRate 0.0834 Epoch: 1 Global Step: 71780 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:47:54,632-Speed 2617.68 samples/sec Loss 13.1288 LearningRate 0.0834 Epoch: 1 Global Step: 71790 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:47:58,536-Speed 2624.09 samples/sec Loss 12.9503 LearningRate 0.0834 Epoch: 1 Global Step: 71800 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:48:02,437-Speed 2625.76 samples/sec Loss 12.9352 LearningRate 0.0834 Epoch: 1 Global Step: 71810 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:48:06,343-Speed 2622.23 samples/sec Loss 13.1079 LearningRate 0.0834 Epoch: 1 Global Step: 71820 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:48:10,248-Speed 2622.72 samples/sec Loss 13.2199 LearningRate 0.0834 Epoch: 1 Global Step: 71830 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:48:14,170-Speed 2612.04 samples/sec Loss 12.9951 LearningRate 0.0834 Epoch: 1 Global Step: 71840 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:48:18,079-Speed 2620.17 samples/sec Loss 13.0810 LearningRate 0.0834 Epoch: 1 Global Step: 71850 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:48:21,979-Speed 2626.06 samples/sec Loss 13.2017 LearningRate 0.0834 Epoch: 1 Global Step: 71860 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:48:25,882-Speed 2624.86 samples/sec Loss 12.9884 LearningRate 0.0834 Epoch: 1 Global Step: 71870 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:48:29,797-Speed 2615.90 samples/sec Loss 13.0071 LearningRate 0.0834 Epoch: 1 Global Step: 71880 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:48:33,692-Speed 2629.91 samples/sec Loss 13.0966 LearningRate 0.0834 Epoch: 1 Global Step: 71890 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:48:37,605-Speed 2617.27 samples/sec Loss 13.0127 LearningRate 0.0834 Epoch: 1 Global Step: 71900 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:48:41,520-Speed 2616.62 samples/sec Loss 13.2090 LearningRate 0.0834 Epoch: 1 Global Step: 71910 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:48:45,442-Speed 2610.92 samples/sec Loss 12.9306 LearningRate 0.0834 Epoch: 1 Global Step: 71920 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:48:49,358-Speed 2615.51 samples/sec Loss 13.0771 LearningRate 0.0834 Epoch: 1 Global Step: 71930 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:48:53,279-Speed 2612.62 samples/sec Loss 13.0023 LearningRate 0.0834 Epoch: 1 Global Step: 71940 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:48:57,189-Speed 2619.00 samples/sec Loss 12.9288 LearningRate 0.0834 Epoch: 1 Global Step: 71950 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:49:01,092-Speed 2624.31 samples/sec Loss 12.9227 LearningRate 0.0834 Epoch: 1 Global Step: 71960 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:49:04,997-Speed 2623.11 samples/sec Loss 13.1241 LearningRate 0.0834 Epoch: 1 Global Step: 71970 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:49:08,924-Speed 2608.22 samples/sec Loss 12.9917 LearningRate 0.0834 Epoch: 1 Global Step: 71980 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:49:12,894-Speed 2579.65 samples/sec Loss 12.9038 LearningRate 0.0834 Epoch: 1 Global Step: 71990 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:49:16,814-Speed 2613.26 samples/sec Loss 13.0172 LearningRate 0.0834 Epoch: 1 Global Step: 72000 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:49:20,723-Speed 2619.86 samples/sec Loss 12.9815 LearningRate 0.0834 Epoch: 1 Global Step: 72010 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:49:24,631-Speed 2620.94 samples/sec Loss 12.9939 LearningRate 0.0834 Epoch: 1 Global Step: 72020 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:49:28,550-Speed 2613.56 samples/sec Loss 12.9063 LearningRate 0.0834 Epoch: 1 Global Step: 72030 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:49:32,457-Speed 2621.38 samples/sec Loss 13.0332 LearningRate 0.0834 Epoch: 1 Global Step: 72040 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:49:36,368-Speed 2618.96 samples/sec Loss 12.9374 LearningRate 0.0834 Epoch: 1 Global Step: 72050 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:49:40,280-Speed 2617.84 samples/sec Loss 12.9209 LearningRate 0.0834 Epoch: 1 Global Step: 72060 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:49:44,187-Speed 2621.43 samples/sec Loss 12.9468 LearningRate 0.0834 Epoch: 1 Global Step: 72070 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:49:48,081-Speed 2630.83 samples/sec Loss 12.9884 LearningRate 0.0834 Epoch: 1 Global Step: 72080 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:49:51,987-Speed 2622.55 samples/sec Loss 12.8867 LearningRate 0.0834 Epoch: 1 Global Step: 72090 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:49:55,894-Speed 2621.18 samples/sec Loss 12.9805 LearningRate 0.0834 Epoch: 1 Global Step: 72100 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:49:59,799-Speed 2622.65 samples/sec Loss 13.0478 LearningRate 0.0834 Epoch: 1 Global Step: 72110 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:50:03,706-Speed 2622.16 samples/sec Loss 12.8653 LearningRate 0.0834 Epoch: 1 Global Step: 72120 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:50:07,620-Speed 2616.71 samples/sec Loss 13.0628 LearningRate 0.0834 Epoch: 1 Global Step: 72130 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:50:11,528-Speed 2620.60 samples/sec Loss 12.9504 LearningRate 0.0834 Epoch: 1 Global Step: 72140 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:50:15,454-Speed 2608.29 samples/sec Loss 12.9661 LearningRate 0.0834 Epoch: 1 Global Step: 72150 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:50:19,362-Speed 2621.89 samples/sec Loss 13.0323 LearningRate 0.0834 Epoch: 1 Global Step: 72160 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:50:23,272-Speed 2619.98 samples/sec Loss 13.0328 LearningRate 0.0834 Epoch: 1 Global Step: 72170 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:50:27,222-Speed 2593.31 samples/sec Loss 13.0700 LearningRate 0.0834 Epoch: 1 Global Step: 72180 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:50:31,130-Speed 2620.43 samples/sec Loss 12.9633 LearningRate 0.0834 Epoch: 1 Global Step: 72190 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:50:35,046-Speed 2615.73 samples/sec Loss 13.0256 LearningRate 0.0834 Epoch: 1 Global Step: 72200 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:50:38,962-Speed 2615.91 samples/sec Loss 13.0423 LearningRate 0.0833 Epoch: 1 Global Step: 72210 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:50:42,875-Speed 2617.20 samples/sec Loss 12.9897 LearningRate 0.0833 Epoch: 1 Global Step: 72220 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:50:46,799-Speed 2610.19 samples/sec Loss 13.1702 LearningRate 0.0833 Epoch: 1 Global Step: 72230 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:50:50,716-Speed 2615.01 samples/sec Loss 13.0477 LearningRate 0.0833 Epoch: 1 Global Step: 72240 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:50:54,623-Speed 2621.34 samples/sec Loss 12.9828 LearningRate 0.0833 Epoch: 1 Global Step: 72250 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:50:58,533-Speed 2619.82 samples/sec Loss 12.9833 LearningRate 0.0833 Epoch: 1 Global Step: 72260 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:51:02,447-Speed 2616.45 samples/sec Loss 12.8792 LearningRate 0.0833 Epoch: 1 Global Step: 72270 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:51:06,357-Speed 2619.83 samples/sec Loss 12.8899 LearningRate 0.0833 Epoch: 1 Global Step: 72280 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:51:10,267-Speed 2619.45 samples/sec Loss 13.0773 LearningRate 0.0833 Epoch: 1 Global Step: 72290 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:51:14,173-Speed 2622.24 samples/sec Loss 13.0357 LearningRate 0.0833 Epoch: 1 Global Step: 72300 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:51:18,061-Speed 2633.97 samples/sec Loss 13.0064 LearningRate 0.0833 Epoch: 1 Global Step: 72310 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:51:22,028-Speed 2582.44 samples/sec Loss 13.0352 LearningRate 0.0833 Epoch: 1 Global Step: 72320 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:51:25,944-Speed 2615.12 samples/sec Loss 13.0906 LearningRate 0.0833 Epoch: 1 Global Step: 72330 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:51:29,861-Speed 2614.83 samples/sec Loss 13.0043 LearningRate 0.0833 Epoch: 1 Global Step: 72340 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:51:33,778-Speed 2615.11 samples/sec Loss 12.9256 LearningRate 0.0833 Epoch: 1 Global Step: 72350 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:51:37,697-Speed 2613.53 samples/sec Loss 12.9989 LearningRate 0.0833 Epoch: 1 Global Step: 72360 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:51:41,615-Speed 2614.25 samples/sec Loss 13.0581 LearningRate 0.0833 Epoch: 1 Global Step: 72370 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:51:45,541-Speed 2608.68 samples/sec Loss 12.9405 LearningRate 0.0833 Epoch: 1 Global Step: 72380 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:51:49,458-Speed 2615.16 samples/sec Loss 13.0349 LearningRate 0.0833 Epoch: 1 Global Step: 72390 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:51:53,371-Speed 2616.86 samples/sec Loss 13.0696 LearningRate 0.0833 Epoch: 1 Global Step: 72400 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:51:57,279-Speed 2621.57 samples/sec Loss 13.0077 LearningRate 0.0833 Epoch: 1 Global Step: 72410 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:52:01,203-Speed 2610.29 samples/sec Loss 12.9117 LearningRate 0.0833 Epoch: 1 Global Step: 72420 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:52:05,137-Speed 2603.70 samples/sec Loss 12.9897 LearningRate 0.0833 Epoch: 1 Global Step: 72430 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:52:09,044-Speed 2621.38 samples/sec Loss 12.9914 LearningRate 0.0833 Epoch: 1 Global Step: 72440 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:52:12,949-Speed 2623.71 samples/sec Loss 13.0902 LearningRate 0.0833 Epoch: 1 Global Step: 72450 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:52:16,851-Speed 2624.36 samples/sec Loss 12.9021 LearningRate 0.0833 Epoch: 1 Global Step: 72460 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:52:20,802-Speed 2592.93 samples/sec Loss 12.9663 LearningRate 0.0833 Epoch: 1 Global Step: 72470 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:52:24,706-Speed 2623.18 samples/sec Loss 12.8441 LearningRate 0.0833 Epoch: 1 Global Step: 72480 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:52:28,612-Speed 2622.57 samples/sec Loss 13.1670 LearningRate 0.0833 Epoch: 1 Global Step: 72490 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:52:32,580-Speed 2581.66 samples/sec Loss 12.9061 LearningRate 0.0833 Epoch: 1 Global Step: 72500 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:52:36,478-Speed 2627.37 samples/sec Loss 13.0163 LearningRate 0.0833 Epoch: 1 Global Step: 72510 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:52:40,386-Speed 2621.08 samples/sec Loss 12.9440 LearningRate 0.0833 Epoch: 1 Global Step: 72520 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:52:44,289-Speed 2624.01 samples/sec Loss 13.0597 LearningRate 0.0833 Epoch: 1 Global Step: 72530 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:52:48,215-Speed 2608.91 samples/sec Loss 12.9976 LearningRate 0.0833 Epoch: 1 Global Step: 72540 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:52:52,116-Speed 2625.85 samples/sec Loss 13.0762 LearningRate 0.0833 Epoch: 1 Global Step: 72550 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:52:56,040-Speed 2610.20 samples/sec Loss 12.9498 LearningRate 0.0833 Epoch: 1 Global Step: 72560 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:52:59,946-Speed 2622.26 samples/sec Loss 12.9823 LearningRate 0.0833 Epoch: 1 Global Step: 72570 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:53:03,875-Speed 2606.74 samples/sec Loss 12.9565 LearningRate 0.0833 Epoch: 1 Global Step: 72580 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:53:07,781-Speed 2622.36 samples/sec Loss 12.7562 LearningRate 0.0833 Epoch: 1 Global Step: 72590 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:53:11,683-Speed 2624.81 samples/sec Loss 12.9274 LearningRate 0.0833 Epoch: 1 Global Step: 72600 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:53:15,588-Speed 2623.34 samples/sec Loss 13.0075 LearningRate 0.0833 Epoch: 1 Global Step: 72610 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:53:19,489-Speed 2625.70 samples/sec Loss 13.0361 LearningRate 0.0833 Epoch: 1 Global Step: 72620 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:53:23,400-Speed 2618.28 samples/sec Loss 12.9731 LearningRate 0.0833 Epoch: 1 Global Step: 72630 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:53:27,407-Speed 2556.81 samples/sec Loss 13.0541 LearningRate 0.0833 Epoch: 1 Global Step: 72640 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:53:31,384-Speed 2575.20 samples/sec Loss 13.1481 LearningRate 0.0833 Epoch: 1 Global Step: 72650 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:53:35,295-Speed 2618.93 samples/sec Loss 13.0443 LearningRate 0.0832 Epoch: 1 Global Step: 72660 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:53:39,209-Speed 2617.22 samples/sec Loss 12.9733 LearningRate 0.0832 Epoch: 1 Global Step: 72670 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:53:43,117-Speed 2621.16 samples/sec Loss 12.8914 LearningRate 0.0832 Epoch: 1 Global Step: 72680 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:53:47,015-Speed 2627.14 samples/sec Loss 12.8985 LearningRate 0.0832 Epoch: 1 Global Step: 72690 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:53:50,951-Speed 2602.25 samples/sec Loss 13.0475 LearningRate 0.0832 Epoch: 1 Global Step: 72700 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:53:54,858-Speed 2621.90 samples/sec Loss 13.1769 LearningRate 0.0832 Epoch: 1 Global Step: 72710 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:53:58,755-Speed 2628.90 samples/sec Loss 13.0852 LearningRate 0.0832 Epoch: 1 Global Step: 72720 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:54:02,656-Speed 2625.59 samples/sec Loss 12.9573 LearningRate 0.0832 Epoch: 1 Global Step: 72730 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:54:06,559-Speed 2623.48 samples/sec Loss 13.0303 LearningRate 0.0832 Epoch: 1 Global Step: 72740 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:54:10,554-Speed 2564.05 samples/sec Loss 12.9100 LearningRate 0.0832 Epoch: 1 Global Step: 72750 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:54:14,467-Speed 2618.32 samples/sec Loss 13.1001 LearningRate 0.0832 Epoch: 1 Global Step: 72760 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:54:18,372-Speed 2622.96 samples/sec Loss 12.9049 LearningRate 0.0832 Epoch: 1 Global Step: 72770 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:54:22,285-Speed 2617.57 samples/sec Loss 12.9840 LearningRate 0.0832 Epoch: 1 Global Step: 72780 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:54:26,186-Speed 2625.30 samples/sec Loss 12.9818 LearningRate 0.0832 Epoch: 1 Global Step: 72790 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:54:30,085-Speed 2627.55 samples/sec Loss 12.9589 LearningRate 0.0832 Epoch: 1 Global Step: 72800 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:54:33,996-Speed 2618.79 samples/sec Loss 12.9186 LearningRate 0.0832 Epoch: 1 Global Step: 72810 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:54:37,903-Speed 2621.84 samples/sec Loss 13.0291 LearningRate 0.0832 Epoch: 1 Global Step: 72820 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:54:41,816-Speed 2616.98 samples/sec Loss 12.8769 LearningRate 0.0832 Epoch: 1 Global Step: 72830 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:54:45,724-Speed 2621.45 samples/sec Loss 13.0056 LearningRate 0.0832 Epoch: 1 Global Step: 72840 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:54:49,638-Speed 2616.26 samples/sec Loss 12.9126 LearningRate 0.0832 Epoch: 1 Global Step: 72850 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:54:53,541-Speed 2624.50 samples/sec Loss 13.0389 LearningRate 0.0832 Epoch: 1 Global Step: 72860 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:54:57,430-Speed 2633.91 samples/sec Loss 13.1488 LearningRate 0.0832 Epoch: 1 Global Step: 72870 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:55:01,330-Speed 2626.28 samples/sec Loss 13.1695 LearningRate 0.0832 Epoch: 1 Global Step: 72880 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:55:05,213-Speed 2637.87 samples/sec Loss 12.9935 LearningRate 0.0832 Epoch: 1 Global Step: 72890 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:55:09,139-Speed 2609.34 samples/sec Loss 12.9451 LearningRate 0.0832 Epoch: 1 Global Step: 72900 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:55:13,043-Speed 2623.64 samples/sec Loss 12.8959 LearningRate 0.0832 Epoch: 1 Global Step: 72910 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:55:16,945-Speed 2624.89 samples/sec Loss 12.9202 LearningRate 0.0832 Epoch: 1 Global Step: 72920 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:55:20,849-Speed 2623.03 samples/sec Loss 12.9402 LearningRate 0.0832 Epoch: 1 Global Step: 72930 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:55:24,738-Speed 2634.12 samples/sec Loss 13.1346 LearningRate 0.0832 Epoch: 1 Global Step: 72940 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:55:28,660-Speed 2611.62 samples/sec Loss 12.9315 LearningRate 0.0832 Epoch: 1 Global Step: 72950 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:55:32,563-Speed 2624.35 samples/sec Loss 12.9767 LearningRate 0.0832 Epoch: 1 Global Step: 72960 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:55:36,522-Speed 2587.39 samples/sec Loss 12.9755 LearningRate 0.0832 Epoch: 1 Global Step: 72970 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:55:40,439-Speed 2614.79 samples/sec Loss 12.7887 LearningRate 0.0832 Epoch: 1 Global Step: 72980 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:55:44,358-Speed 2613.41 samples/sec Loss 12.8928 LearningRate 0.0832 Epoch: 1 Global Step: 72990 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:55:48,269-Speed 2619.27 samples/sec Loss 13.1790 LearningRate 0.0832 Epoch: 1 Global Step: 73000 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:55:52,178-Speed 2620.01 samples/sec Loss 12.9242 LearningRate 0.0832 Epoch: 1 Global Step: 73010 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:55:56,118-Speed 2599.33 samples/sec Loss 12.9067 LearningRate 0.0832 Epoch: 1 Global Step: 73020 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:56:00,026-Speed 2622.18 samples/sec Loss 13.0404 LearningRate 0.0832 Epoch: 1 Global Step: 73030 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:56:03,936-Speed 2619.83 samples/sec Loss 12.8179 LearningRate 0.0832 Epoch: 1 Global Step: 73040 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:56:07,831-Speed 2629.53 samples/sec Loss 13.0079 LearningRate 0.0832 Epoch: 1 Global Step: 73050 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:56:11,729-Speed 2627.45 samples/sec Loss 12.7579 LearningRate 0.0832 Epoch: 1 Global Step: 73060 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:56:15,630-Speed 2625.60 samples/sec Loss 12.9366 LearningRate 0.0832 Epoch: 1 Global Step: 73070 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:56:19,538-Speed 2620.53 samples/sec Loss 13.0977 LearningRate 0.0832 Epoch: 1 Global Step: 73080 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:56:23,449-Speed 2618.99 samples/sec Loss 12.8744 LearningRate 0.0832 Epoch: 1 Global Step: 73090 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:56:27,347-Speed 2627.71 samples/sec Loss 12.8252 LearningRate 0.0832 Epoch: 1 Global Step: 73100 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:56:31,250-Speed 2624.42 samples/sec Loss 12.9396 LearningRate 0.0832 Epoch: 1 Global Step: 73110 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:56:35,147-Speed 2628.47 samples/sec Loss 12.8883 LearningRate 0.0831 Epoch: 1 Global Step: 73120 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:56:39,046-Speed 2626.74 samples/sec Loss 12.8298 LearningRate 0.0831 Epoch: 1 Global Step: 73130 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:56:42,948-Speed 2625.42 samples/sec Loss 12.8543 LearningRate 0.0831 Epoch: 1 Global Step: 73140 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:56:46,847-Speed 2627.22 samples/sec Loss 12.7675 LearningRate 0.0831 Epoch: 1 Global Step: 73150 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:56:50,757-Speed 2619.48 samples/sec Loss 12.9283 LearningRate 0.0831 Epoch: 1 Global Step: 73160 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:56:54,654-Speed 2628.14 samples/sec Loss 12.9378 LearningRate 0.0831 Epoch: 1 Global Step: 73170 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:56:58,558-Speed 2623.59 samples/sec Loss 12.9207 LearningRate 0.0831 Epoch: 1 Global Step: 73180 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:57:02,457-Speed 2626.96 samples/sec Loss 13.0012 LearningRate 0.0831 Epoch: 1 Global Step: 73190 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:57:06,336-Speed 2640.18 samples/sec Loss 12.9415 LearningRate 0.0831 Epoch: 1 Global Step: 73200 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:57:10,234-Speed 2628.23 samples/sec Loss 13.0178 LearningRate 0.0831 Epoch: 1 Global Step: 73210 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:57:14,133-Speed 2627.57 samples/sec Loss 13.0148 LearningRate 0.0831 Epoch: 1 Global Step: 73220 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:57:18,030-Speed 2628.33 samples/sec Loss 12.7996 LearningRate 0.0831 Epoch: 1 Global Step: 73230 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:57:21,928-Speed 2627.73 samples/sec Loss 12.9385 LearningRate 0.0831 Epoch: 1 Global Step: 73240 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:57:25,829-Speed 2625.28 samples/sec Loss 12.9480 LearningRate 0.0831 Epoch: 1 Global Step: 73250 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:57:29,732-Speed 2625.12 samples/sec Loss 12.9736 LearningRate 0.0831 Epoch: 1 Global Step: 73260 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:57:33,629-Speed 2628.09 samples/sec Loss 13.0886 LearningRate 0.0831 Epoch: 1 Global Step: 73270 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:57:37,524-Speed 2629.30 samples/sec Loss 12.8796 LearningRate 0.0831 Epoch: 1 Global Step: 73280 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:57:41,422-Speed 2627.57 samples/sec Loss 12.9515 LearningRate 0.0831 Epoch: 1 Global Step: 73290 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:57:45,320-Speed 2627.90 samples/sec Loss 13.1038 LearningRate 0.0831 Epoch: 1 Global Step: 73300 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:57:49,224-Speed 2623.72 samples/sec Loss 13.0709 LearningRate 0.0831 Epoch: 1 Global Step: 73310 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:57:53,125-Speed 2625.61 samples/sec Loss 13.0397 LearningRate 0.0831 Epoch: 1 Global Step: 73320 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:57:57,026-Speed 2625.83 samples/sec Loss 13.0076 LearningRate 0.0831 Epoch: 1 Global Step: 73330 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:58:00,928-Speed 2624.74 samples/sec Loss 13.1364 LearningRate 0.0831 Epoch: 1 Global Step: 73340 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:58:04,842-Speed 2617.02 samples/sec Loss 13.0791 LearningRate 0.0831 Epoch: 1 Global Step: 73350 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:58:08,738-Speed 2628.32 samples/sec Loss 12.8108 LearningRate 0.0831 Epoch: 1 Global Step: 73360 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:58:12,644-Speed 2622.76 samples/sec Loss 13.1194 LearningRate 0.0831 Epoch: 1 Global Step: 73370 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:58:16,550-Speed 2621.94 samples/sec Loss 12.9665 LearningRate 0.0831 Epoch: 1 Global Step: 73380 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:58:20,462-Speed 2618.65 samples/sec Loss 13.0720 LearningRate 0.0831 Epoch: 1 Global Step: 73390 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:58:24,361-Speed 2627.10 samples/sec Loss 13.0093 LearningRate 0.0831 Epoch: 1 Global Step: 73400 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:58:28,242-Speed 2639.35 samples/sec Loss 12.9860 LearningRate 0.0831 Epoch: 1 Global Step: 73410 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:58:32,138-Speed 2628.63 samples/sec Loss 12.8663 LearningRate 0.0831 Epoch: 1 Global Step: 73420 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:58:36,039-Speed 2625.35 samples/sec Loss 12.9357 LearningRate 0.0831 Epoch: 1 Global Step: 73430 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:58:39,942-Speed 2624.16 samples/sec Loss 13.0083 LearningRate 0.0831 Epoch: 1 Global Step: 73440 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:58:43,857-Speed 2616.11 samples/sec Loss 12.9607 LearningRate 0.0831 Epoch: 1 Global Step: 73450 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:58:47,769-Speed 2618.54 samples/sec Loss 13.0027 LearningRate 0.0831 Epoch: 1 Global Step: 73460 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:58:51,664-Speed 2629.86 samples/sec Loss 12.9760 LearningRate 0.0831 Epoch: 1 Global Step: 73470 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:58:55,570-Speed 2622.46 samples/sec Loss 13.0089 LearningRate 0.0831 Epoch: 1 Global Step: 73480 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:58:59,483-Speed 2617.52 samples/sec Loss 12.9755 LearningRate 0.0831 Epoch: 1 Global Step: 73490 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:59:03,383-Speed 2626.00 samples/sec Loss 12.8638 LearningRate 0.0831 Epoch: 1 Global Step: 73500 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:59:07,293-Speed 2619.81 samples/sec Loss 13.0106 LearningRate 0.0831 Epoch: 1 Global Step: 73510 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:59:11,193-Speed 2625.84 samples/sec Loss 13.0236 LearningRate 0.0831 Epoch: 1 Global Step: 73520 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:59:15,093-Speed 2626.59 samples/sec Loss 12.8997 LearningRate 0.0831 Epoch: 1 Global Step: 73530 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:59:18,995-Speed 2624.88 samples/sec Loss 13.0566 LearningRate 0.0831 Epoch: 1 Global Step: 73540 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:59:22,892-Speed 2628.68 samples/sec Loss 13.0165 LearningRate 0.0831 Epoch: 1 Global Step: 73550 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:59:26,792-Speed 2626.35 samples/sec Loss 13.0678 LearningRate 0.0831 Epoch: 1 Global Step: 73560 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:59:30,695-Speed 2623.88 samples/sec Loss 12.9029 LearningRate 0.0830 Epoch: 1 Global Step: 73570 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:59:34,616-Speed 2612.23 samples/sec Loss 13.1736 LearningRate 0.0830 Epoch: 1 Global Step: 73580 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:59:38,516-Speed 2626.32 samples/sec Loss 13.0127 LearningRate 0.0830 Epoch: 1 Global Step: 73590 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 03:59:42,411-Speed 2629.54 samples/sec Loss 13.0338 LearningRate 0.0830 Epoch: 1 Global Step: 73600 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 03:59:46,292-Speed 2639.82 samples/sec Loss 12.9498 LearningRate 0.0830 Epoch: 1 Global Step: 73610 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:59:50,205-Speed 2617.56 samples/sec Loss 12.9664 LearningRate 0.0830 Epoch: 1 Global Step: 73620 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:59:54,101-Speed 2629.00 samples/sec Loss 13.0551 LearningRate 0.0830 Epoch: 1 Global Step: 73630 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 03:59:58,027-Speed 2609.29 samples/sec Loss 12.8057 LearningRate 0.0830 Epoch: 1 Global Step: 73640 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:00:01,922-Speed 2629.72 samples/sec Loss 13.0089 LearningRate 0.0830 Epoch: 1 Global Step: 73650 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:00:05,882-Speed 2586.27 samples/sec Loss 13.0439 LearningRate 0.0830 Epoch: 1 Global Step: 73660 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:00:09,796-Speed 2616.43 samples/sec Loss 13.0522 LearningRate 0.0830 Epoch: 1 Global Step: 73670 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:00:13,693-Speed 2628.49 samples/sec Loss 13.0239 LearningRate 0.0830 Epoch: 1 Global Step: 73680 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:00:17,593-Speed 2626.55 samples/sec Loss 12.9865 LearningRate 0.0830 Epoch: 1 Global Step: 73690 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:00:21,490-Speed 2627.85 samples/sec Loss 12.8370 LearningRate 0.0830 Epoch: 1 Global Step: 73700 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:00:25,404-Speed 2617.43 samples/sec Loss 12.8400 LearningRate 0.0830 Epoch: 1 Global Step: 73710 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:00:29,307-Speed 2624.82 samples/sec Loss 12.9362 LearningRate 0.0830 Epoch: 1 Global Step: 73720 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:00:33,203-Speed 2628.85 samples/sec Loss 12.8997 LearningRate 0.0830 Epoch: 1 Global Step: 73730 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:00:37,105-Speed 2624.76 samples/sec Loss 12.9473 LearningRate 0.0830 Epoch: 1 Global Step: 73740 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:00:41,009-Speed 2623.80 samples/sec Loss 13.0901 LearningRate 0.0830 Epoch: 1 Global Step: 73750 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:00:44,906-Speed 2628.48 samples/sec Loss 13.0820 LearningRate 0.0830 Epoch: 1 Global Step: 73760 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:00:48,808-Speed 2625.26 samples/sec Loss 12.9228 LearningRate 0.0830 Epoch: 1 Global Step: 73770 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:00:52,720-Speed 2618.24 samples/sec Loss 12.9271 LearningRate 0.0830 Epoch: 1 Global Step: 73780 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:00:56,616-Speed 2628.82 samples/sec Loss 12.8876 LearningRate 0.0830 Epoch: 1 Global Step: 73790 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:01:00,523-Speed 2621.79 samples/sec Loss 13.0310 LearningRate 0.0830 Epoch: 1 Global Step: 73800 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:01:04,421-Speed 2628.07 samples/sec Loss 13.0298 LearningRate 0.0830 Epoch: 1 Global Step: 73810 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:01:08,322-Speed 2625.31 samples/sec Loss 12.8349 LearningRate 0.0830 Epoch: 1 Global Step: 73820 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:01:12,221-Speed 2627.27 samples/sec Loss 12.9809 LearningRate 0.0830 Epoch: 1 Global Step: 73830 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:01:16,124-Speed 2624.23 samples/sec Loss 12.8596 LearningRate 0.0830 Epoch: 1 Global Step: 73840 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:01:20,042-Speed 2614.64 samples/sec Loss 12.9964 LearningRate 0.0830 Epoch: 1 Global Step: 73850 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:01:23,933-Speed 2631.88 samples/sec Loss 12.8064 LearningRate 0.0830 Epoch: 1 Global Step: 73860 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:01:27,819-Speed 2635.85 samples/sec Loss 12.9793 LearningRate 0.0830 Epoch: 1 Global Step: 73870 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:01:31,718-Speed 2627.08 samples/sec Loss 12.9107 LearningRate 0.0830 Epoch: 1 Global Step: 73880 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:01:35,616-Speed 2627.74 samples/sec Loss 13.0090 LearningRate 0.0830 Epoch: 1 Global Step: 73890 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:01:39,516-Speed 2626.48 samples/sec Loss 12.9262 LearningRate 0.0830 Epoch: 1 Global Step: 73900 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:01:43,441-Speed 2609.69 samples/sec Loss 12.9090 LearningRate 0.0830 Epoch: 1 Global Step: 73910 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:01:47,336-Speed 2630.11 samples/sec Loss 12.9740 LearningRate 0.0830 Epoch: 1 Global Step: 73920 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:01:51,241-Speed 2622.61 samples/sec Loss 13.0182 LearningRate 0.0830 Epoch: 1 Global Step: 73930 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:01:55,139-Speed 2627.52 samples/sec Loss 13.0015 LearningRate 0.0830 Epoch: 1 Global Step: 73940 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:01:59,063-Speed 2610.57 samples/sec Loss 13.0284 LearningRate 0.0830 Epoch: 1 Global Step: 73950 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:02:02,974-Speed 2618.94 samples/sec Loss 12.9505 LearningRate 0.0830 Epoch: 1 Global Step: 73960 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:02:06,887-Speed 2617.67 samples/sec Loss 12.6597 LearningRate 0.0830 Epoch: 1 Global Step: 73970 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:02:10,792-Speed 2623.38 samples/sec Loss 13.0241 LearningRate 0.0830 Epoch: 1 Global Step: 73980 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:02:14,692-Speed 2626.40 samples/sec Loss 13.0943 LearningRate 0.0830 Epoch: 1 Global Step: 73990 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:02:18,624-Speed 2605.02 samples/sec Loss 12.9721 LearningRate 0.0830 Epoch: 1 Global Step: 74000 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:02:22,540-Speed 2615.24 samples/sec Loss 12.9078 LearningRate 0.0830 Epoch: 1 Global Step: 74010 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:02:26,459-Speed 2613.70 samples/sec Loss 12.8438 LearningRate 0.0830 Epoch: 1 Global Step: 74020 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:02:30,394-Speed 2603.30 samples/sec Loss 12.8935 LearningRate 0.0829 Epoch: 1 Global Step: 74030 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:02:34,311-Speed 2615.14 samples/sec Loss 12.9146 LearningRate 0.0829 Epoch: 1 Global Step: 74040 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:02:38,224-Speed 2617.09 samples/sec Loss 12.7887 LearningRate 0.0829 Epoch: 1 Global Step: 74050 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:02:42,135-Speed 2619.00 samples/sec Loss 12.8489 LearningRate 0.0829 Epoch: 1 Global Step: 74060 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:02:46,018-Speed 2638.21 samples/sec Loss 12.9093 LearningRate 0.0829 Epoch: 1 Global Step: 74070 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:02:49,917-Speed 2626.82 samples/sec Loss 12.9467 LearningRate 0.0829 Epoch: 1 Global Step: 74080 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:02:53,813-Speed 2629.20 samples/sec Loss 12.9416 LearningRate 0.0829 Epoch: 1 Global Step: 74090 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:02:57,719-Speed 2622.05 samples/sec Loss 12.9548 LearningRate 0.0829 Epoch: 1 Global Step: 74100 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:03:01,622-Speed 2624.69 samples/sec Loss 12.9958 LearningRate 0.0829 Epoch: 1 Global Step: 74110 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:03:05,547-Speed 2609.49 samples/sec Loss 13.0601 LearningRate 0.0829 Epoch: 1 Global Step: 74120 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:03:09,449-Speed 2625.46 samples/sec Loss 12.8954 LearningRate 0.0829 Epoch: 1 Global Step: 74130 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:03:13,348-Speed 2626.80 samples/sec Loss 13.0228 LearningRate 0.0829 Epoch: 1 Global Step: 74140 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:03:17,248-Speed 2626.23 samples/sec Loss 13.0373 LearningRate 0.0829 Epoch: 1 Global Step: 74150 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:03:21,156-Speed 2620.53 samples/sec Loss 13.0071 LearningRate 0.0829 Epoch: 1 Global Step: 74160 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:03:25,060-Speed 2623.96 samples/sec Loss 12.9844 LearningRate 0.0829 Epoch: 1 Global Step: 74170 Fp16 Grad Scale: 524288 Required: 85 hours
Training: 2022-04-13 04:03:28,948-Speed 2634.95 samples/sec Loss 13.0483 LearningRate 0.0829 Epoch: 1 Global Step: 74180 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:03:32,844-Speed 2628.47 samples/sec Loss 13.0756 LearningRate 0.0829 Epoch: 1 Global Step: 74190 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:03:36,745-Speed 2626.07 samples/sec Loss 12.9799 LearningRate 0.0829 Epoch: 1 Global Step: 74200 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:03:40,648-Speed 2624.32 samples/sec Loss 13.0124 LearningRate 0.0829 Epoch: 1 Global Step: 74210 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:03:44,556-Speed 2620.93 samples/sec Loss 12.9472 LearningRate 0.0829 Epoch: 1 Global Step: 74220 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:03:48,501-Speed 2596.75 samples/sec Loss 12.8661 LearningRate 0.0829 Epoch: 1 Global Step: 74230 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:03:52,549-Speed 2529.94 samples/sec Loss 12.9263 LearningRate 0.0829 Epoch: 1 Global Step: 74240 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:03:56,652-Speed 2496.40 samples/sec Loss 13.0516 LearningRate 0.0829 Epoch: 1 Global Step: 74250 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:04:00,754-Speed 2497.48 samples/sec Loss 12.9259 LearningRate 0.0829 Epoch: 1 Global Step: 74260 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:04:04,714-Speed 2586.33 samples/sec Loss 12.9627 LearningRate 0.0829 Epoch: 1 Global Step: 74270 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:04:08,603-Speed 2633.92 samples/sec Loss 12.8720 LearningRate 0.0829 Epoch: 1 Global Step: 74280 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:04:12,630-Speed 2543.17 samples/sec Loss 12.9356 LearningRate 0.0829 Epoch: 1 Global Step: 74290 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:04:16,532-Speed 2625.42 samples/sec Loss 12.9097 LearningRate 0.0829 Epoch: 1 Global Step: 74300 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:04:20,428-Speed 2629.12 samples/sec Loss 12.9764 LearningRate 0.0829 Epoch: 1 Global Step: 74310 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:04:24,323-Speed 2629.68 samples/sec Loss 12.8967 LearningRate 0.0829 Epoch: 1 Global Step: 74320 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:04:28,219-Speed 2628.58 samples/sec Loss 13.0942 LearningRate 0.0829 Epoch: 1 Global Step: 74330 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:04:32,139-Speed 2612.76 samples/sec Loss 12.9420 LearningRate 0.0829 Epoch: 1 Global Step: 74340 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:04:36,031-Speed 2631.97 samples/sec Loss 12.9298 LearningRate 0.0829 Epoch: 1 Global Step: 74350 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:04:39,935-Speed 2623.78 samples/sec Loss 13.0775 LearningRate 0.0829 Epoch: 1 Global Step: 74360 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:04:43,840-Speed 2622.90 samples/sec Loss 12.9078 LearningRate 0.0829 Epoch: 1 Global Step: 74370 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:04:47,738-Speed 2627.81 samples/sec Loss 12.8495 LearningRate 0.0829 Epoch: 1 Global Step: 74380 Fp16 Grad Scale: 524288 Required: 85 hours
Training: 2022-04-13 04:04:51,630-Speed 2631.34 samples/sec Loss 12.8456 LearningRate 0.0829 Epoch: 1 Global Step: 74390 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:04:55,540-Speed 2619.30 samples/sec Loss 12.9169 LearningRate 0.0829 Epoch: 1 Global Step: 74400 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:04:59,443-Speed 2625.08 samples/sec Loss 12.9363 LearningRate 0.0829 Epoch: 1 Global Step: 74410 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:05:03,343-Speed 2626.05 samples/sec Loss 12.8916 LearningRate 0.0829 Epoch: 1 Global Step: 74420 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:05:07,240-Speed 2628.06 samples/sec Loss 13.1244 LearningRate 0.0829 Epoch: 1 Global Step: 74430 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:05:11,136-Speed 2629.32 samples/sec Loss 12.8522 LearningRate 0.0829 Epoch: 1 Global Step: 74440 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:05:15,031-Speed 2629.98 samples/sec Loss 12.9073 LearningRate 0.0829 Epoch: 1 Global Step: 74450 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:05:18,930-Speed 2626.88 samples/sec Loss 13.0176 LearningRate 0.0829 Epoch: 1 Global Step: 74460 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:05:22,828-Speed 2627.82 samples/sec Loss 12.8387 LearningRate 0.0829 Epoch: 1 Global Step: 74470 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:05:26,735-Speed 2620.95 samples/sec Loss 13.0146 LearningRate 0.0828 Epoch: 1 Global Step: 74480 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:05:30,612-Speed 2642.87 samples/sec Loss 12.9990 LearningRate 0.0828 Epoch: 1 Global Step: 74490 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:05:34,505-Speed 2630.91 samples/sec Loss 12.9340 LearningRate 0.0828 Epoch: 1 Global Step: 74500 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:05:38,400-Speed 2629.46 samples/sec Loss 13.0015 LearningRate 0.0828 Epoch: 1 Global Step: 74510 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:05:42,317-Speed 2614.73 samples/sec Loss 12.7979 LearningRate 0.0828 Epoch: 1 Global Step: 74520 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:05:46,215-Speed 2627.72 samples/sec Loss 13.0600 LearningRate 0.0828 Epoch: 1 Global Step: 74530 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:05:50,116-Speed 2625.93 samples/sec Loss 12.9314 LearningRate 0.0828 Epoch: 1 Global Step: 74540 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:05:54,013-Speed 2628.05 samples/sec Loss 12.7550 LearningRate 0.0828 Epoch: 1 Global Step: 74550 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:05:57,912-Speed 2627.09 samples/sec Loss 12.8127 LearningRate 0.0828 Epoch: 1 Global Step: 74560 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:06:01,823-Speed 2618.83 samples/sec Loss 12.9908 LearningRate 0.0828 Epoch: 1 Global Step: 74570 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:06:05,725-Speed 2625.07 samples/sec Loss 12.9836 LearningRate 0.0828 Epoch: 1 Global Step: 74580 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:06:09,620-Speed 2629.44 samples/sec Loss 12.9281 LearningRate 0.0828 Epoch: 1 Global Step: 74590 Fp16 Grad Scale: 524288 Required: 85 hours
Training: 2022-04-13 04:06:13,503-Speed 2638.43 samples/sec Loss 12.8689 LearningRate 0.0828 Epoch: 1 Global Step: 74600 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:06:17,399-Speed 2628.66 samples/sec Loss 12.9446 LearningRate 0.0828 Epoch: 1 Global Step: 74610 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:06:21,368-Speed 2580.88 samples/sec Loss 12.9299 LearningRate 0.0828 Epoch: 1 Global Step: 74620 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:06:25,283-Speed 2615.86 samples/sec Loss 13.0423 LearningRate 0.0828 Epoch: 1 Global Step: 74630 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:06:29,378-Speed 2501.45 samples/sec Loss 12.8103 LearningRate 0.0828 Epoch: 1 Global Step: 74640 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:06:33,344-Speed 2582.07 samples/sec Loss 12.9621 LearningRate 0.0828 Epoch: 1 Global Step: 74650 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:06:37,241-Speed 2628.25 samples/sec Loss 12.7751 LearningRate 0.0828 Epoch: 1 Global Step: 74660 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:06:41,153-Speed 2618.59 samples/sec Loss 12.9596 LearningRate 0.0828 Epoch: 1 Global Step: 74670 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:06:45,019-Speed 2649.44 samples/sec Loss 12.6932 LearningRate 0.0828 Epoch: 1 Global Step: 74680 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:06:48,915-Speed 2629.05 samples/sec Loss 13.1589 LearningRate 0.0828 Epoch: 1 Global Step: 74690 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:06:52,810-Speed 2629.80 samples/sec Loss 12.8589 LearningRate 0.0828 Epoch: 1 Global Step: 74700 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:06:56,711-Speed 2625.49 samples/sec Loss 13.0248 LearningRate 0.0828 Epoch: 1 Global Step: 74710 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:07:00,607-Speed 2628.47 samples/sec Loss 12.8406 LearningRate 0.0828 Epoch: 1 Global Step: 74720 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:07:04,503-Speed 2629.17 samples/sec Loss 12.8256 LearningRate 0.0828 Epoch: 1 Global Step: 74730 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:07:08,400-Speed 2628.43 samples/sec Loss 13.0190 LearningRate 0.0828 Epoch: 1 Global Step: 74740 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:07:12,318-Speed 2614.13 samples/sec Loss 12.7577 LearningRate 0.0828 Epoch: 1 Global Step: 74750 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:07:16,229-Speed 2619.09 samples/sec Loss 12.9010 LearningRate 0.0828 Epoch: 1 Global Step: 74760 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:07:20,144-Speed 2616.18 samples/sec Loss 12.9723 LearningRate 0.0828 Epoch: 1 Global Step: 74770 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:07:24,061-Speed 2615.45 samples/sec Loss 12.8087 LearningRate 0.0828 Epoch: 1 Global Step: 74780 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:07:27,974-Speed 2617.61 samples/sec Loss 12.9431 LearningRate 0.0828 Epoch: 1 Global Step: 74790 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:07:31,883-Speed 2619.57 samples/sec Loss 13.1382 LearningRate 0.0828 Epoch: 1 Global Step: 74800 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:07:35,791-Speed 2621.81 samples/sec Loss 12.8724 LearningRate 0.0828 Epoch: 1 Global Step: 74810 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:07:39,697-Speed 2621.61 samples/sec Loss 12.9721 LearningRate 0.0828 Epoch: 1 Global Step: 74820 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:07:43,619-Speed 2611.97 samples/sec Loss 13.0499 LearningRate 0.0828 Epoch: 1 Global Step: 74830 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:07:47,534-Speed 2616.16 samples/sec Loss 12.9794 LearningRate 0.0828 Epoch: 1 Global Step: 74840 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:07:51,434-Speed 2626.73 samples/sec Loss 12.8713 LearningRate 0.0828 Epoch: 1 Global Step: 74850 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:07:55,351-Speed 2614.78 samples/sec Loss 12.9171 LearningRate 0.0828 Epoch: 1 Global Step: 74860 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:07:59,260-Speed 2619.91 samples/sec Loss 12.9061 LearningRate 0.0828 Epoch: 1 Global Step: 74870 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:08:03,168-Speed 2620.82 samples/sec Loss 12.9007 LearningRate 0.0828 Epoch: 1 Global Step: 74880 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:08:07,066-Speed 2627.90 samples/sec Loss 12.9699 LearningRate 0.0828 Epoch: 1 Global Step: 74890 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:08:11,107-Speed 2535.02 samples/sec Loss 12.9163 LearningRate 0.0828 Epoch: 1 Global Step: 74900 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:08:15,016-Speed 2620.02 samples/sec Loss 12.9622 LearningRate 0.0828 Epoch: 1 Global Step: 74910 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:08:18,913-Speed 2628.67 samples/sec Loss 12.9503 LearningRate 0.0828 Epoch: 1 Global Step: 74920 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:08:22,813-Speed 2626.02 samples/sec Loss 13.1415 LearningRate 0.0828 Epoch: 1 Global Step: 74930 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:08:26,704-Speed 2631.90 samples/sec Loss 12.9330 LearningRate 0.0827 Epoch: 1 Global Step: 74940 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:08:30,599-Speed 2629.75 samples/sec Loss 12.9876 LearningRate 0.0827 Epoch: 1 Global Step: 74950 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:08:34,495-Speed 2629.09 samples/sec Loss 13.0135 LearningRate 0.0827 Epoch: 1 Global Step: 74960 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:08:38,406-Speed 2618.89 samples/sec Loss 12.8817 LearningRate 0.0827 Epoch: 1 Global Step: 74970 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:08:42,340-Speed 2603.55 samples/sec Loss 12.8650 LearningRate 0.0827 Epoch: 1 Global Step: 74980 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:08:46,259-Speed 2615.51 samples/sec Loss 12.8951 LearningRate 0.0827 Epoch: 1 Global Step: 74990 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:08:50,169-Speed 2619.53 samples/sec Loss 12.8843 LearningRate 0.0827 Epoch: 1 Global Step: 75000 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:08:54,064-Speed 2629.46 samples/sec Loss 12.8575 LearningRate 0.0827 Epoch: 1 Global Step: 75010 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:08:57,962-Speed 2627.52 samples/sec Loss 12.9021 LearningRate 0.0827 Epoch: 1 Global Step: 75020 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:09:01,861-Speed 2626.60 samples/sec Loss 12.9454 LearningRate 0.0827 Epoch: 1 Global Step: 75030 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:09:05,770-Speed 2620.23 samples/sec Loss 12.8399 LearningRate 0.0827 Epoch: 1 Global Step: 75040 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:09:09,667-Speed 2629.59 samples/sec Loss 12.9840 LearningRate 0.0827 Epoch: 1 Global Step: 75050 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:09:13,561-Speed 2630.19 samples/sec Loss 12.8546 LearningRate 0.0827 Epoch: 1 Global Step: 75060 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:09:17,458-Speed 2628.23 samples/sec Loss 12.7977 LearningRate 0.0827 Epoch: 1 Global Step: 75070 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:09:21,355-Speed 2628.63 samples/sec Loss 12.8885 LearningRate 0.0827 Epoch: 1 Global Step: 75080 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:09:25,252-Speed 2628.15 samples/sec Loss 13.0399 LearningRate 0.0827 Epoch: 1 Global Step: 75090 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:09:29,182-Speed 2606.59 samples/sec Loss 12.9009 LearningRate 0.0827 Epoch: 1 Global Step: 75100 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:09:33,061-Speed 2640.51 samples/sec Loss 12.9559 LearningRate 0.0827 Epoch: 1 Global Step: 75110 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:09:36,945-Speed 2636.95 samples/sec Loss 12.8602 LearningRate 0.0827 Epoch: 1 Global Step: 75120 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:09:40,843-Speed 2627.29 samples/sec Loss 13.0446 LearningRate 0.0827 Epoch: 1 Global Step: 75130 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:09:44,776-Speed 2605.15 samples/sec Loss 12.8849 LearningRate 0.0827 Epoch: 1 Global Step: 75140 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:09:48,678-Speed 2624.91 samples/sec Loss 12.8681 LearningRate 0.0827 Epoch: 1 Global Step: 75150 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:09:52,588-Speed 2619.36 samples/sec Loss 13.0149 LearningRate 0.0827 Epoch: 1 Global Step: 75160 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:09:56,500-Speed 2617.99 samples/sec Loss 12.7904 LearningRate 0.0827 Epoch: 1 Global Step: 75170 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:10:00,409-Speed 2620.58 samples/sec Loss 12.9949 LearningRate 0.0827 Epoch: 1 Global Step: 75180 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:10:04,318-Speed 2620.33 samples/sec Loss 13.1000 LearningRate 0.0827 Epoch: 1 Global Step: 75190 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:10:08,239-Speed 2611.97 samples/sec Loss 12.8884 LearningRate 0.0827 Epoch: 1 Global Step: 75200 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:10:12,144-Speed 2623.30 samples/sec Loss 12.7336 LearningRate 0.0827 Epoch: 1 Global Step: 75210 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:10:16,042-Speed 2627.95 samples/sec Loss 12.9436 LearningRate 0.0827 Epoch: 1 Global Step: 75220 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:10:19,942-Speed 2625.85 samples/sec Loss 12.9606 LearningRate 0.0827 Epoch: 1 Global Step: 75230 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:10:23,870-Speed 2607.41 samples/sec Loss 12.8078 LearningRate 0.0827 Epoch: 1 Global Step: 75240 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:10:27,797-Speed 2608.25 samples/sec Loss 12.9223 LearningRate 0.0827 Epoch: 1 Global Step: 75250 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:10:31,693-Speed 2629.59 samples/sec Loss 12.9038 LearningRate 0.0827 Epoch: 1 Global Step: 75260 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:10:35,590-Speed 2628.11 samples/sec Loss 12.7720 LearningRate 0.0827 Epoch: 1 Global Step: 75270 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:10:39,486-Speed 2628.71 samples/sec Loss 12.9828 LearningRate 0.0827 Epoch: 1 Global Step: 75280 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:10:43,385-Speed 2626.79 samples/sec Loss 12.7638 LearningRate 0.0827 Epoch: 1 Global Step: 75290 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:10:47,283-Speed 2627.66 samples/sec Loss 12.7618 LearningRate 0.0827 Epoch: 1 Global Step: 75300 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:10:51,192-Speed 2620.07 samples/sec Loss 12.7347 LearningRate 0.0827 Epoch: 1 Global Step: 75310 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:10:55,091-Speed 2627.24 samples/sec Loss 12.8552 LearningRate 0.0827 Epoch: 1 Global Step: 75320 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:10:58,990-Speed 2626.80 samples/sec Loss 12.8754 LearningRate 0.0827 Epoch: 1 Global Step: 75330 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:11:02,889-Speed 2627.05 samples/sec Loss 12.8701 LearningRate 0.0827 Epoch: 1 Global Step: 75340 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:11:06,783-Speed 2630.68 samples/sec Loss 12.9938 LearningRate 0.0827 Epoch: 1 Global Step: 75350 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:11:10,681-Speed 2627.49 samples/sec Loss 13.0116 LearningRate 0.0827 Epoch: 1 Global Step: 75360 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:11:14,580-Speed 2627.11 samples/sec Loss 12.7713 LearningRate 0.0827 Epoch: 1 Global Step: 75370 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:11:18,475-Speed 2629.18 samples/sec Loss 12.8234 LearningRate 0.0827 Epoch: 1 Global Step: 75380 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:11:22,373-Speed 2628.11 samples/sec Loss 12.9131 LearningRate 0.0827 Epoch: 1 Global Step: 75390 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:11:26,270-Speed 2627.98 samples/sec Loss 12.9170 LearningRate 0.0826 Epoch: 1 Global Step: 75400 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:11:30,172-Speed 2625.50 samples/sec Loss 12.9558 LearningRate 0.0826 Epoch: 1 Global Step: 75410 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:11:34,062-Speed 2632.86 samples/sec Loss 12.9823 LearningRate 0.0826 Epoch: 1 Global Step: 75420 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:11:37,968-Speed 2621.74 samples/sec Loss 12.9421 LearningRate 0.0826 Epoch: 1 Global Step: 75430 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:11:41,867-Speed 2627.27 samples/sec Loss 12.9147 LearningRate 0.0826 Epoch: 1 Global Step: 75440 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:11:45,774-Speed 2621.90 samples/sec Loss 12.9664 LearningRate 0.0826 Epoch: 1 Global Step: 75450 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:11:49,673-Speed 2627.09 samples/sec Loss 12.9832 LearningRate 0.0826 Epoch: 1 Global Step: 75460 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:11:53,575-Speed 2625.10 samples/sec Loss 12.8034 LearningRate 0.0826 Epoch: 1 Global Step: 75470 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:11:57,479-Speed 2623.44 samples/sec Loss 12.8575 LearningRate 0.0826 Epoch: 1 Global Step: 75480 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:12:01,392-Speed 2617.78 samples/sec Loss 12.7460 LearningRate 0.0826 Epoch: 1 Global Step: 75490 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:12:05,297-Speed 2622.72 samples/sec Loss 12.9629 LearningRate 0.0826 Epoch: 1 Global Step: 75500 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:12:09,198-Speed 2625.81 samples/sec Loss 12.8824 LearningRate 0.0826 Epoch: 1 Global Step: 75510 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:12:13,092-Speed 2629.50 samples/sec Loss 12.8473 LearningRate 0.0826 Epoch: 1 Global Step: 75520 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:12:16,987-Speed 2630.02 samples/sec Loss 12.8996 LearningRate 0.0826 Epoch: 1 Global Step: 75530 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:12:20,887-Speed 2626.56 samples/sec Loss 12.8158 LearningRate 0.0826 Epoch: 1 Global Step: 75540 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:12:24,791-Speed 2623.65 samples/sec Loss 12.6540 LearningRate 0.0826 Epoch: 1 Global Step: 75550 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:12:28,665-Speed 2644.18 samples/sec Loss 12.9650 LearningRate 0.0826 Epoch: 1 Global Step: 75560 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:12:32,553-Speed 2633.84 samples/sec Loss 12.8646 LearningRate 0.0826 Epoch: 1 Global Step: 75570 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:12:36,467-Speed 2617.87 samples/sec Loss 12.8810 LearningRate 0.0826 Epoch: 1 Global Step: 75580 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:12:40,355-Speed 2634.09 samples/sec Loss 12.9245 LearningRate 0.0826 Epoch: 1 Global Step: 75590 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 04:12:44,264-Speed 2620.30 samples/sec Loss 12.9760 LearningRate 0.0826 Epoch: 1 Global Step: 75600 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 04:12:48,166-Speed 2625.19 samples/sec Loss 12.8153 LearningRate 0.0826 Epoch: 1 Global Step: 75610 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 04:12:52,149-Speed 2571.71 samples/sec Loss 12.8386 LearningRate 0.0826 Epoch: 1 Global Step: 75620 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 04:12:56,052-Speed 2624.28 samples/sec Loss 12.7636 LearningRate 0.0826 Epoch: 1 Global Step: 75630 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 04:12:59,957-Speed 2622.75 samples/sec Loss 12.9743 LearningRate 0.0826 Epoch: 1 Global Step: 75640 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 04:13:03,866-Speed 2619.83 samples/sec Loss 12.9530 LearningRate 0.0826 Epoch: 1 Global Step: 75650 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 04:13:07,763-Speed 2628.52 samples/sec Loss 12.9755 LearningRate 0.0826 Epoch: 1 Global Step: 75660 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 04:13:11,670-Speed 2621.96 samples/sec Loss 12.8388 LearningRate 0.0826 Epoch: 1 Global Step: 75670 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 04:13:15,570-Speed 2626.42 samples/sec Loss 12.9348 LearningRate 0.0826 Epoch: 1 Global Step: 75680 Fp16 Grad Scale: 32768 Required: 85 hours
Training: 2022-04-13 04:13:19,477-Speed 2621.66 samples/sec Loss 13.0608 LearningRate 0.0826 Epoch: 1 Global Step: 75690 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:13:23,530-Speed 2527.61 samples/sec Loss 13.0184 LearningRate 0.0826 Epoch: 1 Global Step: 75700 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:13:27,631-Speed 2497.24 samples/sec Loss 12.8515 LearningRate 0.0826 Epoch: 1 Global Step: 75710 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:13:31,587-Speed 2588.70 samples/sec Loss 12.8785 LearningRate 0.0826 Epoch: 1 Global Step: 75720 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:13:35,493-Speed 2622.89 samples/sec Loss 12.9483 LearningRate 0.0826 Epoch: 1 Global Step: 75730 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:13:39,393-Speed 2626.11 samples/sec Loss 12.9854 LearningRate 0.0826 Epoch: 1 Global Step: 75740 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:13:43,289-Speed 2628.93 samples/sec Loss 12.8884 LearningRate 0.0826 Epoch: 1 Global Step: 75750 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:13:47,186-Speed 2628.74 samples/sec Loss 12.9768 LearningRate 0.0826 Epoch: 1 Global Step: 75760 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:13:51,083-Speed 2627.64 samples/sec Loss 12.8643 LearningRate 0.0826 Epoch: 1 Global Step: 75770 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:13:54,984-Speed 2625.72 samples/sec Loss 12.7522 LearningRate 0.0826 Epoch: 1 Global Step: 75780 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:13:58,888-Speed 2623.59 samples/sec Loss 12.9255 LearningRate 0.0826 Epoch: 1 Global Step: 75790 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:14:02,781-Speed 2631.13 samples/sec Loss 12.9566 LearningRate 0.0826 Epoch: 1 Global Step: 75800 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:14:06,680-Speed 2626.76 samples/sec Loss 13.0069 LearningRate 0.0826 Epoch: 1 Global Step: 75810 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:14:10,573-Speed 2630.91 samples/sec Loss 12.7178 LearningRate 0.0826 Epoch: 1 Global Step: 75820 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:14:14,468-Speed 2630.19 samples/sec Loss 13.0350 LearningRate 0.0826 Epoch: 1 Global Step: 75830 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:14:18,365-Speed 2627.86 samples/sec Loss 13.0042 LearningRate 0.0826 Epoch: 1 Global Step: 75840 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:14:22,345-Speed 2573.31 samples/sec Loss 12.9807 LearningRate 0.0825 Epoch: 1 Global Step: 75850 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:14:26,240-Speed 2629.72 samples/sec Loss 12.8545 LearningRate 0.0825 Epoch: 1 Global Step: 75860 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:14:30,140-Speed 2626.22 samples/sec Loss 12.9664 LearningRate 0.0825 Epoch: 1 Global Step: 75870 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:14:34,039-Speed 2627.08 samples/sec Loss 13.0463 LearningRate 0.0825 Epoch: 1 Global Step: 75880 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:14:37,939-Speed 2626.44 samples/sec Loss 12.8681 LearningRate 0.0825 Epoch: 1 Global Step: 75890 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:14:41,832-Speed 2631.03 samples/sec Loss 12.8513 LearningRate 0.0825 Epoch: 1 Global Step: 75900 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:14:45,734-Speed 2625.08 samples/sec Loss 12.8755 LearningRate 0.0825 Epoch: 1 Global Step: 75910 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:14:49,611-Speed 2641.67 samples/sec Loss 12.9560 LearningRate 0.0825 Epoch: 1 Global Step: 75920 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:14:53,508-Speed 2628.44 samples/sec Loss 12.9600 LearningRate 0.0825 Epoch: 1 Global Step: 75930 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:14:57,401-Speed 2631.41 samples/sec Loss 12.9705 LearningRate 0.0825 Epoch: 1 Global Step: 75940 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:15:01,294-Speed 2630.99 samples/sec Loss 12.9180 LearningRate 0.0825 Epoch: 1 Global Step: 75950 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:15:05,186-Speed 2631.42 samples/sec Loss 12.9838 LearningRate 0.0825 Epoch: 1 Global Step: 75960 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:15:09,117-Speed 2605.69 samples/sec Loss 12.8034 LearningRate 0.0825 Epoch: 1 Global Step: 75970 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:15:13,012-Speed 2629.55 samples/sec Loss 12.8530 LearningRate 0.0825 Epoch: 1 Global Step: 75980 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:15:16,921-Speed 2620.94 samples/sec Loss 12.8398 LearningRate 0.0825 Epoch: 1 Global Step: 75990 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:15:20,827-Speed 2622.37 samples/sec Loss 13.1242 LearningRate 0.0825 Epoch: 1 Global Step: 76000 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:15:24,735-Speed 2620.89 samples/sec Loss 12.8308 LearningRate 0.0825 Epoch: 1 Global Step: 76010 Fp16 Grad Scale: 131072 Required: 85 hours
Training: 2022-04-13 04:15:28,636-Speed 2625.45 samples/sec Loss 12.8559 LearningRate 0.0825 Epoch: 1 Global Step: 76020 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:15:32,542-Speed 2622.63 samples/sec Loss 12.8610 LearningRate 0.0825 Epoch: 1 Global Step: 76030 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:15:36,438-Speed 2628.30 samples/sec Loss 12.8159 LearningRate 0.0825 Epoch: 1 Global Step: 76040 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:15:40,343-Speed 2623.04 samples/sec Loss 12.8949 LearningRate 0.0825 Epoch: 1 Global Step: 76050 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:15:44,236-Speed 2631.19 samples/sec Loss 12.8845 LearningRate 0.0825 Epoch: 1 Global Step: 76060 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:15:48,144-Speed 2621.33 samples/sec Loss 12.7816 LearningRate 0.0825 Epoch: 1 Global Step: 76070 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:15:52,045-Speed 2626.08 samples/sec Loss 12.8531 LearningRate 0.0825 Epoch: 1 Global Step: 76080 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:15:55,944-Speed 2627.03 samples/sec Loss 12.8096 LearningRate 0.0825 Epoch: 1 Global Step: 76090 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:15:59,840-Speed 2628.87 samples/sec Loss 13.0085 LearningRate 0.0825 Epoch: 1 Global Step: 76100 Fp16 Grad Scale: 262144 Required: 85 hours
Training: 2022-04-13 04:16:03,703-Speed 2651.66 samples/sec Loss 12.7997 LearningRate 0.0825 Epoch: 1 Global Step: 76110 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:16:07,598-Speed 2629.26 samples/sec Loss 12.9275 LearningRate 0.0825 Epoch: 1 Global Step: 76120 Fp16 Grad Scale: 65536 Required: 85 hours
Training: 2022-04-13 04:16:11,500-Speed 2624.83 samples/sec Loss 12.8621 LearningRate 0.0825 Epoch: 1 Global Step: 76130 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:16:15,391-Speed 2633.20 samples/sec Loss 12.9177 LearningRate 0.0825 Epoch: 1 Global Step: 76140 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:16:19,294-Speed 2623.96 samples/sec Loss 12.7896 LearningRate 0.0825 Epoch: 1 Global Step: 76150 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:16:23,192-Speed 2627.80 samples/sec Loss 12.8313 LearningRate 0.0825 Epoch: 1 Global Step: 76160 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:16:27,093-Speed 2625.57 samples/sec Loss 12.8207 LearningRate 0.0825 Epoch: 1 Global Step: 76170 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:16:31,002-Speed 2620.22 samples/sec Loss 12.8452 LearningRate 0.0825 Epoch: 1 Global Step: 76180 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:16:34,905-Speed 2624.46 samples/sec Loss 12.9270 LearningRate 0.0825 Epoch: 1 Global Step: 76190 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:16:38,825-Speed 2612.87 samples/sec Loss 12.8673 LearningRate 0.0825 Epoch: 1 Global Step: 76200 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:16:42,726-Speed 2625.31 samples/sec Loss 12.8434 LearningRate 0.0825 Epoch: 1 Global Step: 76210 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:16:46,626-Speed 2626.51 samples/sec Loss 12.8429 LearningRate 0.0825 Epoch: 1 Global Step: 76220 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:16:50,531-Speed 2623.25 samples/sec Loss 12.9010 LearningRate 0.0825 Epoch: 1 Global Step: 76230 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:16:54,425-Speed 2629.94 samples/sec Loss 12.9610 LearningRate 0.0825 Epoch: 1 Global Step: 76240 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:16:58,337-Speed 2618.44 samples/sec Loss 12.8454 LearningRate 0.0825 Epoch: 1 Global Step: 76250 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:17:02,237-Speed 2626.81 samples/sec Loss 12.9570 LearningRate 0.0825 Epoch: 1 Global Step: 76260 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:17:06,129-Speed 2631.50 samples/sec Loss 13.0286 LearningRate 0.0825 Epoch: 1 Global Step: 76270 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:17:10,013-Speed 2636.67 samples/sec Loss 12.8662 LearningRate 0.0825 Epoch: 1 Global Step: 76280 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:17:13,864-Speed 2659.49 samples/sec Loss 13.0483 LearningRate 0.0825 Epoch: 1 Global Step: 76290 Fp16 Grad Scale: 8192 Required: 84 hours
Training: 2022-04-13 04:17:17,760-Speed 2628.68 samples/sec Loss 13.1365 LearningRate 0.0825 Epoch: 1 Global Step: 76300 Fp16 Grad Scale: 8192 Required: 84 hours
Training: 2022-04-13 04:17:21,653-Speed 2631.29 samples/sec Loss 12.8535 LearningRate 0.0824 Epoch: 1 Global Step: 76310 Fp16 Grad Scale: 8192 Required: 84 hours
Training: 2022-04-13 04:17:25,552-Speed 2627.26 samples/sec Loss 13.0596 LearningRate 0.0824 Epoch: 1 Global Step: 76320 Fp16 Grad Scale: 8192 Required: 84 hours
Training: 2022-04-13 04:17:29,442-Speed 2633.11 samples/sec Loss 12.8599 LearningRate 0.0824 Epoch: 1 Global Step: 76330 Fp16 Grad Scale: 8192 Required: 84 hours
Training: 2022-04-13 04:17:33,368-Speed 2609.10 samples/sec Loss 13.0249 LearningRate 0.0824 Epoch: 1 Global Step: 76340 Fp16 Grad Scale: 8192 Required: 84 hours
Training: 2022-04-13 04:17:37,267-Speed 2626.80 samples/sec Loss 12.8561 LearningRate 0.0824 Epoch: 1 Global Step: 76350 Fp16 Grad Scale: 8192 Required: 84 hours
Training: 2022-04-13 04:17:41,162-Speed 2629.45 samples/sec Loss 12.8247 LearningRate 0.0824 Epoch: 1 Global Step: 76360 Fp16 Grad Scale: 8192 Required: 84 hours
Training: 2022-04-13 04:17:45,058-Speed 2629.56 samples/sec Loss 12.8251 LearningRate 0.0824 Epoch: 1 Global Step: 76370 Fp16 Grad Scale: 8192 Required: 84 hours
Training: 2022-04-13 04:17:48,947-Speed 2633.09 samples/sec Loss 12.8243 LearningRate 0.0824 Epoch: 1 Global Step: 76380 Fp16 Grad Scale: 8192 Required: 84 hours
Training: 2022-04-13 04:17:52,840-Speed 2631.74 samples/sec Loss 12.9461 LearningRate 0.0824 Epoch: 1 Global Step: 76390 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:17:56,732-Speed 2631.76 samples/sec Loss 12.8400 LearningRate 0.0824 Epoch: 1 Global Step: 76400 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:18:00,621-Speed 2633.35 samples/sec Loss 12.7910 LearningRate 0.0824 Epoch: 1 Global Step: 76410 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:18:04,537-Speed 2616.00 samples/sec Loss 12.8787 LearningRate 0.0824 Epoch: 1 Global Step: 76420 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:18:08,427-Speed 2632.57 samples/sec Loss 13.0382 LearningRate 0.0824 Epoch: 1 Global Step: 76430 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:18:12,321-Speed 2630.45 samples/sec Loss 12.8805 LearningRate 0.0824 Epoch: 1 Global Step: 76440 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:18:16,214-Speed 2631.12 samples/sec Loss 12.8592 LearningRate 0.0824 Epoch: 1 Global Step: 76450 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:18:20,178-Speed 2583.54 samples/sec Loss 12.9168 LearningRate 0.0824 Epoch: 1 Global Step: 76460 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:18:24,077-Speed 2626.89 samples/sec Loss 12.6906 LearningRate 0.0824 Epoch: 1 Global Step: 76470 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:18:27,969-Speed 2632.28 samples/sec Loss 12.8532 LearningRate 0.0824 Epoch: 1 Global Step: 76480 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:18:31,860-Speed 2632.57 samples/sec Loss 12.8765 LearningRate 0.0824 Epoch: 1 Global Step: 76490 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:18:35,751-Speed 2632.23 samples/sec Loss 12.8165 LearningRate 0.0824 Epoch: 1 Global Step: 76500 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:18:39,644-Speed 2630.69 samples/sec Loss 12.8683 LearningRate 0.0824 Epoch: 1 Global Step: 76510 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:18:43,550-Speed 2621.96 samples/sec Loss 12.7770 LearningRate 0.0824 Epoch: 1 Global Step: 76520 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:18:47,445-Speed 2629.77 samples/sec Loss 12.7843 LearningRate 0.0824 Epoch: 1 Global Step: 76530 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:18:51,339-Speed 2630.31 samples/sec Loss 12.8270 LearningRate 0.0824 Epoch: 1 Global Step: 76540 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:18:55,233-Speed 2630.45 samples/sec Loss 12.8096 LearningRate 0.0824 Epoch: 1 Global Step: 76550 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:18:59,124-Speed 2632.29 samples/sec Loss 12.8907 LearningRate 0.0824 Epoch: 1 Global Step: 76560 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:19:03,017-Speed 2630.60 samples/sec Loss 12.7809 LearningRate 0.0824 Epoch: 1 Global Step: 76570 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:19:06,912-Speed 2630.02 samples/sec Loss 12.9110 LearningRate 0.0824 Epoch: 1 Global Step: 76580 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:19:10,805-Speed 2631.02 samples/sec Loss 12.8213 LearningRate 0.0824 Epoch: 1 Global Step: 76590 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:19:14,697-Speed 2631.22 samples/sec Loss 12.7568 LearningRate 0.0824 Epoch: 1 Global Step: 76600 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:19:18,606-Speed 2620.84 samples/sec Loss 12.9231 LearningRate 0.0824 Epoch: 1 Global Step: 76610 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:19:22,502-Speed 2628.86 samples/sec Loss 12.8652 LearningRate 0.0824 Epoch: 1 Global Step: 76620 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:19:26,418-Speed 2616.09 samples/sec Loss 12.8131 LearningRate 0.0824 Epoch: 1 Global Step: 76630 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:19:30,317-Speed 2627.10 samples/sec Loss 12.8367 LearningRate 0.0824 Epoch: 1 Global Step: 76640 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:19:34,218-Speed 2625.24 samples/sec Loss 12.9512 LearningRate 0.0824 Epoch: 1 Global Step: 76650 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:19:38,116-Speed 2627.68 samples/sec Loss 12.8886 LearningRate 0.0824 Epoch: 1 Global Step: 76660 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:19:42,013-Speed 2628.50 samples/sec Loss 12.9765 LearningRate 0.0824 Epoch: 1 Global Step: 76670 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:19:45,979-Speed 2582.32 samples/sec Loss 12.8343 LearningRate 0.0824 Epoch: 1 Global Step: 76680 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:19:49,900-Speed 2612.83 samples/sec Loss 12.8498 LearningRate 0.0824 Epoch: 1 Global Step: 76690 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:19:53,835-Speed 2602.54 samples/sec Loss 12.8105 LearningRate 0.0824 Epoch: 1 Global Step: 76700 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:19:57,728-Speed 2630.91 samples/sec Loss 12.8211 LearningRate 0.0824 Epoch: 1 Global Step: 76710 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:20:01,636-Speed 2620.72 samples/sec Loss 12.7983 LearningRate 0.0824 Epoch: 1 Global Step: 76720 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:20:05,530-Speed 2630.36 samples/sec Loss 13.0267 LearningRate 0.0824 Epoch: 1 Global Step: 76730 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:20:09,425-Speed 2629.50 samples/sec Loss 12.9563 LearningRate 0.0824 Epoch: 1 Global Step: 76740 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:20:13,320-Speed 2629.37 samples/sec Loss 12.9033 LearningRate 0.0824 Epoch: 1 Global Step: 76750 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:20:17,215-Speed 2629.95 samples/sec Loss 12.9190 LearningRate 0.0824 Epoch: 1 Global Step: 76760 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:20:21,110-Speed 2630.10 samples/sec Loss 12.7641 LearningRate 0.0823 Epoch: 1 Global Step: 76770 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:20:25,006-Speed 2629.17 samples/sec Loss 12.9380 LearningRate 0.0823 Epoch: 1 Global Step: 76780 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:20:28,902-Speed 2628.45 samples/sec Loss 12.8382 LearningRate 0.0823 Epoch: 1 Global Step: 76790 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:20:32,765-Speed 2651.21 samples/sec Loss 12.8308 LearningRate 0.0823 Epoch: 1 Global Step: 76800 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:20:36,663-Speed 2627.32 samples/sec Loss 12.8401 LearningRate 0.0823 Epoch: 1 Global Step: 76810 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:20:40,567-Speed 2624.57 samples/sec Loss 12.7394 LearningRate 0.0823 Epoch: 1 Global Step: 76820 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:20:44,473-Speed 2621.72 samples/sec Loss 12.7547 LearningRate 0.0823 Epoch: 1 Global Step: 76830 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:20:48,494-Speed 2547.54 samples/sec Loss 13.0120 LearningRate 0.0823 Epoch: 1 Global Step: 76840 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:20:52,518-Speed 2545.43 samples/sec Loss 12.7756 LearningRate 0.0823 Epoch: 1 Global Step: 76850 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:20:56,452-Speed 2604.14 samples/sec Loss 12.9502 LearningRate 0.0823 Epoch: 1 Global Step: 76860 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:21:00,364-Speed 2617.87 samples/sec Loss 12.7326 LearningRate 0.0823 Epoch: 1 Global Step: 76870 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:21:04,275-Speed 2619.11 samples/sec Loss 12.9402 LearningRate 0.0823 Epoch: 1 Global Step: 76880 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:21:08,179-Speed 2623.45 samples/sec Loss 12.7660 LearningRate 0.0823 Epoch: 1 Global Step: 76890 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:21:12,076-Speed 2628.87 samples/sec Loss 12.9224 LearningRate 0.0823 Epoch: 1 Global Step: 76900 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:21:15,984-Speed 2620.57 samples/sec Loss 12.7335 LearningRate 0.0823 Epoch: 1 Global Step: 76910 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:21:19,893-Speed 2620.37 samples/sec Loss 12.8386 LearningRate 0.0823 Epoch: 1 Global Step: 76920 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:21:23,788-Speed 2629.53 samples/sec Loss 12.8562 LearningRate 0.0823 Epoch: 1 Global Step: 76930 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:21:27,690-Speed 2624.98 samples/sec Loss 12.7255 LearningRate 0.0823 Epoch: 1 Global Step: 76940 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:21:31,589-Speed 2626.86 samples/sec Loss 12.9439 LearningRate 0.0823 Epoch: 1 Global Step: 76950 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:21:35,488-Speed 2627.35 samples/sec Loss 12.7875 LearningRate 0.0823 Epoch: 1 Global Step: 76960 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:21:39,383-Speed 2629.07 samples/sec Loss 12.9289 LearningRate 0.0823 Epoch: 1 Global Step: 76970 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:21:43,284-Speed 2625.65 samples/sec Loss 12.8144 LearningRate 0.0823 Epoch: 1 Global Step: 76980 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:21:47,181-Speed 2628.86 samples/sec Loss 12.8171 LearningRate 0.0823 Epoch: 1 Global Step: 76990 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:21:51,057-Speed 2642.62 samples/sec Loss 12.7596 LearningRate 0.0823 Epoch: 1 Global Step: 77000 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:21:54,960-Speed 2624.45 samples/sec Loss 12.8936 LearningRate 0.0823 Epoch: 1 Global Step: 77010 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:21:58,875-Speed 2616.09 samples/sec Loss 12.8092 LearningRate 0.0823 Epoch: 1 Global Step: 77020 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:22:02,790-Speed 2616.33 samples/sec Loss 12.8935 LearningRate 0.0823 Epoch: 1 Global Step: 77030 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:22:06,693-Speed 2624.00 samples/sec Loss 12.7110 LearningRate 0.0823 Epoch: 1 Global Step: 77040 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:22:10,615-Speed 2612.32 samples/sec Loss 12.9158 LearningRate 0.0823 Epoch: 1 Global Step: 77050 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:22:14,510-Speed 2628.96 samples/sec Loss 12.8176 LearningRate 0.0823 Epoch: 1 Global Step: 77060 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:22:18,418-Speed 2621.06 samples/sec Loss 12.9017 LearningRate 0.0823 Epoch: 1 Global Step: 77070 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:22:22,312-Speed 2630.32 samples/sec Loss 12.7619 LearningRate 0.0823 Epoch: 1 Global Step: 77080 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:22:26,205-Speed 2630.61 samples/sec Loss 12.8640 LearningRate 0.0823 Epoch: 1 Global Step: 77090 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:22:30,100-Speed 2630.57 samples/sec Loss 12.8729 LearningRate 0.0823 Epoch: 1 Global Step: 77100 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:22:33,994-Speed 2630.29 samples/sec Loss 12.8663 LearningRate 0.0823 Epoch: 1 Global Step: 77110 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:22:37,877-Speed 2637.66 samples/sec Loss 12.8858 LearningRate 0.0823 Epoch: 1 Global Step: 77120 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:22:41,766-Speed 2633.63 samples/sec Loss 12.9015 LearningRate 0.0823 Epoch: 1 Global Step: 77130 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:22:45,657-Speed 2632.40 samples/sec Loss 12.8278 LearningRate 0.0823 Epoch: 1 Global Step: 77140 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:22:49,552-Speed 2629.97 samples/sec Loss 12.8830 LearningRate 0.0823 Epoch: 1 Global Step: 77150 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:22:53,445-Speed 2630.64 samples/sec Loss 12.7697 LearningRate 0.0823 Epoch: 1 Global Step: 77160 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:22:57,346-Speed 2625.93 samples/sec Loss 12.8758 LearningRate 0.0823 Epoch: 1 Global Step: 77170 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:23:01,251-Speed 2623.54 samples/sec Loss 12.9532 LearningRate 0.0823 Epoch: 1 Global Step: 77180 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:23:05,148-Speed 2627.78 samples/sec Loss 12.9283 LearningRate 0.0823 Epoch: 1 Global Step: 77190 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:23:09,043-Speed 2629.53 samples/sec Loss 12.9016 LearningRate 0.0823 Epoch: 1 Global Step: 77200 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:23:12,945-Speed 2624.71 samples/sec Loss 12.8940 LearningRate 0.0823 Epoch: 1 Global Step: 77210 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:23:16,843-Speed 2627.99 samples/sec Loss 12.8562 LearningRate 0.0822 Epoch: 1 Global Step: 77220 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:23:20,738-Speed 2629.25 samples/sec Loss 12.8258 LearningRate 0.0822 Epoch: 1 Global Step: 77230 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:23:24,635-Speed 2628.54 samples/sec Loss 12.8596 LearningRate 0.0822 Epoch: 1 Global Step: 77240 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:23:28,528-Speed 2631.35 samples/sec Loss 13.0255 LearningRate 0.0822 Epoch: 1 Global Step: 77250 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:23:32,422-Speed 2630.35 samples/sec Loss 12.8263 LearningRate 0.0822 Epoch: 1 Global Step: 77260 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:23:36,301-Speed 2640.77 samples/sec Loss 12.8347 LearningRate 0.0822 Epoch: 1 Global Step: 77270 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:23:40,199-Speed 2626.91 samples/sec Loss 12.9929 LearningRate 0.0822 Epoch: 1 Global Step: 77280 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:23:44,094-Speed 2630.06 samples/sec Loss 12.8637 LearningRate 0.0822 Epoch: 1 Global Step: 77290 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:23:48,089-Speed 2563.76 samples/sec Loss 12.9543 LearningRate 0.0822 Epoch: 1 Global Step: 77300 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:23:51,985-Speed 2629.26 samples/sec Loss 12.8956 LearningRate 0.0822 Epoch: 1 Global Step: 77310 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:23:55,878-Speed 2631.35 samples/sec Loss 12.7875 LearningRate 0.0822 Epoch: 1 Global Step: 77320 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:23:59,770-Speed 2631.40 samples/sec Loss 12.8469 LearningRate 0.0822 Epoch: 1 Global Step: 77330 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:24:03,660-Speed 2632.99 samples/sec Loss 12.6390 LearningRate 0.0822 Epoch: 1 Global Step: 77340 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:24:07,566-Speed 2622.36 samples/sec Loss 12.8411 LearningRate 0.0822 Epoch: 1 Global Step: 77350 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:24:11,459-Speed 2630.79 samples/sec Loss 12.7705 LearningRate 0.0822 Epoch: 1 Global Step: 77360 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:24:15,388-Speed 2607.13 samples/sec Loss 12.8773 LearningRate 0.0822 Epoch: 1 Global Step: 77370 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:24:19,295-Speed 2621.79 samples/sec Loss 12.9283 LearningRate 0.0822 Epoch: 1 Global Step: 77380 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:24:23,191-Speed 2629.10 samples/sec Loss 12.7600 LearningRate 0.0822 Epoch: 1 Global Step: 77390 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:24:27,111-Speed 2612.92 samples/sec Loss 12.8039 LearningRate 0.0822 Epoch: 1 Global Step: 77400 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:24:31,007-Speed 2629.13 samples/sec Loss 12.7412 LearningRate 0.0822 Epoch: 1 Global Step: 77410 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:24:34,922-Speed 2616.33 samples/sec Loss 12.8358 LearningRate 0.0822 Epoch: 1 Global Step: 77420 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:24:38,819-Speed 2627.88 samples/sec Loss 12.7361 LearningRate 0.0822 Epoch: 1 Global Step: 77430 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:24:42,713-Speed 2630.79 samples/sec Loss 12.9287 LearningRate 0.0822 Epoch: 1 Global Step: 77440 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:24:46,623-Speed 2619.96 samples/sec Loss 12.8598 LearningRate 0.0822 Epoch: 1 Global Step: 77450 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:24:50,519-Speed 2628.49 samples/sec Loss 12.7753 LearningRate 0.0822 Epoch: 1 Global Step: 77460 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:24:54,415-Speed 2629.32 samples/sec Loss 12.7552 LearningRate 0.0822 Epoch: 1 Global Step: 77470 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:24:58,321-Speed 2622.27 samples/sec Loss 12.8872 LearningRate 0.0822 Epoch: 1 Global Step: 77480 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:25:02,226-Speed 2622.93 samples/sec Loss 12.6866 LearningRate 0.0822 Epoch: 1 Global Step: 77490 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:25:06,120-Speed 2630.76 samples/sec Loss 12.8269 LearningRate 0.0822 Epoch: 1 Global Step: 77500 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:25:10,030-Speed 2619.29 samples/sec Loss 12.8765 LearningRate 0.0822 Epoch: 1 Global Step: 77510 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:25:13,930-Speed 2626.56 samples/sec Loss 12.7963 LearningRate 0.0822 Epoch: 1 Global Step: 77520 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:25:17,825-Speed 2629.74 samples/sec Loss 12.7566 LearningRate 0.0822 Epoch: 1 Global Step: 77530 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:25:21,716-Speed 2632.29 samples/sec Loss 12.7257 LearningRate 0.0822 Epoch: 1 Global Step: 77540 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:25:25,621-Speed 2623.38 samples/sec Loss 12.7211 LearningRate 0.0822 Epoch: 1 Global Step: 77550 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:25:29,526-Speed 2622.74 samples/sec Loss 12.7575 LearningRate 0.0822 Epoch: 1 Global Step: 77560 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:25:33,420-Speed 2630.54 samples/sec Loss 12.9111 LearningRate 0.0822 Epoch: 1 Global Step: 77570 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:25:37,320-Speed 2625.67 samples/sec Loss 12.7212 LearningRate 0.0822 Epoch: 1 Global Step: 77580 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:25:41,228-Speed 2621.59 samples/sec Loss 12.7457 LearningRate 0.0822 Epoch: 1 Global Step: 77590 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:25:45,138-Speed 2619.26 samples/sec Loss 12.8922 LearningRate 0.0822 Epoch: 1 Global Step: 77600 Fp16 Grad Scale: 524288 Required: 84 hours
Training: 2022-04-13 04:25:49,023-Speed 2637.45 samples/sec Loss 12.7435 LearningRate 0.0822 Epoch: 1 Global Step: 77610 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:25:52,905-Speed 2638.43 samples/sec Loss 12.8578 LearningRate 0.0822 Epoch: 1 Global Step: 77620 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:25:56,808-Speed 2624.23 samples/sec Loss 12.7040 LearningRate 0.0822 Epoch: 1 Global Step: 77630 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:26:00,704-Speed 2629.13 samples/sec Loss 13.0481 LearningRate 0.0822 Epoch: 1 Global Step: 77640 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:26:04,614-Speed 2619.65 samples/sec Loss 12.8703 LearningRate 0.0822 Epoch: 1 Global Step: 77650 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:26:08,549-Speed 2602.81 samples/sec Loss 12.8510 LearningRate 0.0822 Epoch: 1 Global Step: 77660 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:26:12,441-Speed 2631.44 samples/sec Loss 12.8718 LearningRate 0.0822 Epoch: 1 Global Step: 77670 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:26:16,336-Speed 2629.81 samples/sec Loss 12.8150 LearningRate 0.0821 Epoch: 1 Global Step: 77680 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:26:20,271-Speed 2603.42 samples/sec Loss 12.8008 LearningRate 0.0821 Epoch: 1 Global Step: 77690 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:26:24,176-Speed 2622.73 samples/sec Loss 12.7927 LearningRate 0.0821 Epoch: 1 Global Step: 77700 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:26:28,080-Speed 2624.24 samples/sec Loss 12.8690 LearningRate 0.0821 Epoch: 1 Global Step: 77710 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:26:31,978-Speed 2627.44 samples/sec Loss 12.9090 LearningRate 0.0821 Epoch: 1 Global Step: 77720 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:26:35,866-Speed 2634.04 samples/sec Loss 12.7505 LearningRate 0.0821 Epoch: 1 Global Step: 77730 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:26:39,759-Speed 2631.37 samples/sec Loss 12.7080 LearningRate 0.0821 Epoch: 1 Global Step: 77740 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:26:43,652-Speed 2630.97 samples/sec Loss 12.9159 LearningRate 0.0821 Epoch: 1 Global Step: 77750 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:26:47,543-Speed 2632.76 samples/sec Loss 12.8624 LearningRate 0.0821 Epoch: 1 Global Step: 77760 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:26:51,439-Speed 2628.54 samples/sec Loss 12.7939 LearningRate 0.0821 Epoch: 1 Global Step: 77770 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:26:55,341-Speed 2625.45 samples/sec Loss 12.7761 LearningRate 0.0821 Epoch: 1 Global Step: 77780 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:26:59,254-Speed 2616.83 samples/sec Loss 12.6788 LearningRate 0.0821 Epoch: 1 Global Step: 77790 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:27:03,161-Speed 2622.01 samples/sec Loss 12.7208 LearningRate 0.0821 Epoch: 1 Global Step: 77800 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:27:07,055-Speed 2630.08 samples/sec Loss 12.7039 LearningRate 0.0821 Epoch: 1 Global Step: 77810 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:27:10,949-Speed 2630.40 samples/sec Loss 12.9194 LearningRate 0.0821 Epoch: 1 Global Step: 77820 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:27:14,847-Speed 2627.51 samples/sec Loss 12.7473 LearningRate 0.0821 Epoch: 1 Global Step: 77830 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:27:18,747-Speed 2626.47 samples/sec Loss 12.8490 LearningRate 0.0821 Epoch: 1 Global Step: 77840 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:27:22,645-Speed 2627.88 samples/sec Loss 13.0079 LearningRate 0.0821 Epoch: 1 Global Step: 77850 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:27:26,545-Speed 2626.93 samples/sec Loss 12.8253 LearningRate 0.0821 Epoch: 1 Global Step: 77860 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:27:30,439-Speed 2630.07 samples/sec Loss 12.7887 LearningRate 0.0821 Epoch: 1 Global Step: 77870 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:27:34,341-Speed 2624.32 samples/sec Loss 12.8818 LearningRate 0.0821 Epoch: 1 Global Step: 77880 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:27:38,234-Speed 2631.17 samples/sec Loss 12.7712 LearningRate 0.0821 Epoch: 1 Global Step: 77890 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:27:42,131-Speed 2628.50 samples/sec Loss 12.8965 LearningRate 0.0821 Epoch: 1 Global Step: 77900 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:27:46,025-Speed 2630.04 samples/sec Loss 12.7564 LearningRate 0.0821 Epoch: 1 Global Step: 77910 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:27:49,921-Speed 2629.64 samples/sec Loss 12.8997 LearningRate 0.0821 Epoch: 1 Global Step: 77920 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:27:53,818-Speed 2627.60 samples/sec Loss 12.8622 LearningRate 0.0821 Epoch: 1 Global Step: 77930 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:27:57,754-Speed 2603.14 samples/sec Loss 12.8552 LearningRate 0.0821 Epoch: 1 Global Step: 77940 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:28:01,654-Speed 2626.16 samples/sec Loss 12.7746 LearningRate 0.0821 Epoch: 1 Global Step: 77950 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:28:05,549-Speed 2629.21 samples/sec Loss 12.7134 LearningRate 0.0821 Epoch: 1 Global Step: 77960 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:28:09,433-Speed 2637.60 samples/sec Loss 12.8033 LearningRate 0.0821 Epoch: 1 Global Step: 77970 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:28:13,333-Speed 2626.61 samples/sec Loss 12.7087 LearningRate 0.0821 Epoch: 1 Global Step: 77980 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:28:17,230-Speed 2628.36 samples/sec Loss 12.6412 LearningRate 0.0821 Epoch: 1 Global Step: 77990 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:28:21,125-Speed 2629.53 samples/sec Loss 12.9300 LearningRate 0.0821 Epoch: 1 Global Step: 78000 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:28:25,019-Speed 2630.23 samples/sec Loss 12.8192 LearningRate 0.0821 Epoch: 1 Global Step: 78010 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:28:28,927-Speed 2621.19 samples/sec Loss 12.8672 LearningRate 0.0821 Epoch: 1 Global Step: 78020 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:28:32,827-Speed 2626.15 samples/sec Loss 12.8814 LearningRate 0.0821 Epoch: 1 Global Step: 78030 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:28:36,729-Speed 2624.98 samples/sec Loss 12.7293 LearningRate 0.0821 Epoch: 1 Global Step: 78040 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:28:40,627-Speed 2627.91 samples/sec Loss 13.0511 LearningRate 0.0821 Epoch: 1 Global Step: 78050 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:28:44,524-Speed 2627.89 samples/sec Loss 12.8501 LearningRate 0.0821 Epoch: 1 Global Step: 78060 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:28:48,436-Speed 2618.50 samples/sec Loss 12.8240 LearningRate 0.0821 Epoch: 1 Global Step: 78070 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:28:52,346-Speed 2619.13 samples/sec Loss 12.8910 LearningRate 0.0821 Epoch: 1 Global Step: 78080 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:28:56,238-Speed 2632.12 samples/sec Loss 12.9451 LearningRate 0.0821 Epoch: 1 Global Step: 78090 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:29:00,123-Speed 2636.74 samples/sec Loss 12.9203 LearningRate 0.0821 Epoch: 1 Global Step: 78100 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:29:04,024-Speed 2625.45 samples/sec Loss 12.9362 LearningRate 0.0821 Epoch: 1 Global Step: 78110 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:29:07,927-Speed 2624.14 samples/sec Loss 12.8824 LearningRate 0.0821 Epoch: 1 Global Step: 78120 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:29:11,830-Speed 2624.05 samples/sec Loss 12.9384 LearningRate 0.0821 Epoch: 1 Global Step: 78130 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:29:15,725-Speed 2629.94 samples/sec Loss 12.7650 LearningRate 0.0820 Epoch: 1 Global Step: 78140 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:29:19,620-Speed 2629.36 samples/sec Loss 12.7823 LearningRate 0.0820 Epoch: 1 Global Step: 78150 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:29:23,520-Speed 2626.44 samples/sec Loss 12.8208 LearningRate 0.0820 Epoch: 1 Global Step: 78160 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:29:27,421-Speed 2625.43 samples/sec Loss 12.8645 LearningRate 0.0820 Epoch: 1 Global Step: 78170 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:29:31,323-Speed 2625.17 samples/sec Loss 12.7678 LearningRate 0.0820 Epoch: 1 Global Step: 78180 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:29:35,218-Speed 2629.44 samples/sec Loss 12.8584 LearningRate 0.0820 Epoch: 1 Global Step: 78190 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:29:39,121-Speed 2624.41 samples/sec Loss 12.7438 LearningRate 0.0820 Epoch: 1 Global Step: 78200 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:29:43,018-Speed 2628.31 samples/sec Loss 12.7168 LearningRate 0.0820 Epoch: 1 Global Step: 78210 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:29:46,917-Speed 2626.98 samples/sec Loss 12.9208 LearningRate 0.0820 Epoch: 1 Global Step: 78220 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:29:50,854-Speed 2601.62 samples/sec Loss 12.7785 LearningRate 0.0820 Epoch: 1 Global Step: 78230 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:29:54,802-Speed 2594.84 samples/sec Loss 12.6715 LearningRate 0.0820 Epoch: 1 Global Step: 78240 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:29:58,682-Speed 2639.59 samples/sec Loss 12.6967 LearningRate 0.0820 Epoch: 1 Global Step: 78250 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:30:02,584-Speed 2625.11 samples/sec Loss 12.9039 LearningRate 0.0820 Epoch: 1 Global Step: 78260 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:30:06,476-Speed 2631.55 samples/sec Loss 12.8114 LearningRate 0.0820 Epoch: 1 Global Step: 78270 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:30:10,368-Speed 2631.74 samples/sec Loss 12.7737 LearningRate 0.0820 Epoch: 1 Global Step: 78280 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:30:14,261-Speed 2630.88 samples/sec Loss 12.7110 LearningRate 0.0820 Epoch: 1 Global Step: 78290 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:30:18,164-Speed 2624.70 samples/sec Loss 12.7869 LearningRate 0.0820 Epoch: 1 Global Step: 78300 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:30:22,041-Speed 2642.51 samples/sec Loss 12.8020 LearningRate 0.0820 Epoch: 1 Global Step: 78310 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:30:25,936-Speed 2628.87 samples/sec Loss 12.9103 LearningRate 0.0820 Epoch: 1 Global Step: 78320 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:30:29,834-Speed 2628.31 samples/sec Loss 12.7381 LearningRate 0.0820 Epoch: 1 Global Step: 78330 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:30:33,729-Speed 2629.37 samples/sec Loss 12.7006 LearningRate 0.0820 Epoch: 1 Global Step: 78340 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:30:37,624-Speed 2630.57 samples/sec Loss 12.8465 LearningRate 0.0820 Epoch: 1 Global Step: 78350 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:30:41,520-Speed 2629.07 samples/sec Loss 12.9006 LearningRate 0.0820 Epoch: 1 Global Step: 78360 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:30:45,418-Speed 2627.59 samples/sec Loss 12.6969 LearningRate 0.0820 Epoch: 1 Global Step: 78370 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:30:49,314-Speed 2629.08 samples/sec Loss 12.7265 LearningRate 0.0820 Epoch: 1 Global Step: 78380 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:30:53,212-Speed 2627.89 samples/sec Loss 12.8976 LearningRate 0.0820 Epoch: 1 Global Step: 78390 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:30:57,104-Speed 2631.42 samples/sec Loss 12.6753 LearningRate 0.0820 Epoch: 1 Global Step: 78400 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:31:01,001-Speed 2627.61 samples/sec Loss 12.8461 LearningRate 0.0820 Epoch: 1 Global Step: 78410 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:31:04,918-Speed 2614.72 samples/sec Loss 12.8387 LearningRate 0.0820 Epoch: 1 Global Step: 78420 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:31:08,811-Speed 2631.30 samples/sec Loss 12.6760 LearningRate 0.0820 Epoch: 1 Global Step: 78430 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:31:12,720-Speed 2620.46 samples/sec Loss 12.8697 LearningRate 0.0820 Epoch: 1 Global Step: 78440 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:31:16,622-Speed 2624.76 samples/sec Loss 12.7270 LearningRate 0.0820 Epoch: 1 Global Step: 78450 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:31:20,520-Speed 2627.78 samples/sec Loss 12.7657 LearningRate 0.0820 Epoch: 1 Global Step: 78460 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:31:24,416-Speed 2629.41 samples/sec Loss 12.9697 LearningRate 0.0820 Epoch: 1 Global Step: 78470 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:31:28,331-Speed 2616.09 samples/sec Loss 12.8208 LearningRate 0.0820 Epoch: 1 Global Step: 78480 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:31:32,228-Speed 2628.16 samples/sec Loss 12.7919 LearningRate 0.0820 Epoch: 1 Global Step: 78490 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:31:36,134-Speed 2621.76 samples/sec Loss 12.8770 LearningRate 0.0820 Epoch: 1 Global Step: 78500 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:31:40,033-Speed 2627.06 samples/sec Loss 12.7877 LearningRate 0.0820 Epoch: 1 Global Step: 78510 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:31:43,930-Speed 2628.64 samples/sec Loss 12.9402 LearningRate 0.0820 Epoch: 1 Global Step: 78520 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:31:47,823-Speed 2630.46 samples/sec Loss 12.8662 LearningRate 0.0820 Epoch: 1 Global Step: 78530 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:31:51,720-Speed 2628.27 samples/sec Loss 12.9167 LearningRate 0.0820 Epoch: 1 Global Step: 78540 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:31:55,619-Speed 2627.36 samples/sec Loss 12.8010 LearningRate 0.0820 Epoch: 1 Global Step: 78550 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:31:59,518-Speed 2626.98 samples/sec Loss 12.8306 LearningRate 0.0820 Epoch: 1 Global Step: 78560 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:32:03,414-Speed 2628.88 samples/sec Loss 12.7414 LearningRate 0.0820 Epoch: 1 Global Step: 78570 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:32:07,306-Speed 2631.06 samples/sec Loss 12.8925 LearningRate 0.0820 Epoch: 1 Global Step: 78580 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:32:11,202-Speed 2629.07 samples/sec Loss 12.7802 LearningRate 0.0820 Epoch: 1 Global Step: 78590 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:32:15,104-Speed 2625.10 samples/sec Loss 12.6732 LearningRate 0.0819 Epoch: 1 Global Step: 78600 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:32:19,014-Speed 2619.90 samples/sec Loss 12.8059 LearningRate 0.0819 Epoch: 1 Global Step: 78610 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:32:22,924-Speed 2619.57 samples/sec Loss 12.7461 LearningRate 0.0819 Epoch: 1 Global Step: 78620 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:32:26,820-Speed 2629.27 samples/sec Loss 12.8218 LearningRate 0.0819 Epoch: 1 Global Step: 78630 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:32:30,714-Speed 2630.54 samples/sec Loss 12.8370 LearningRate 0.0819 Epoch: 1 Global Step: 78640 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:32:34,615-Speed 2625.31 samples/sec Loss 12.8568 LearningRate 0.0819 Epoch: 1 Global Step: 78650 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:32:38,533-Speed 2614.15 samples/sec Loss 12.6604 LearningRate 0.0819 Epoch: 1 Global Step: 78660 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:32:42,428-Speed 2629.73 samples/sec Loss 12.7272 LearningRate 0.0819 Epoch: 1 Global Step: 78670 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:32:46,321-Speed 2631.21 samples/sec Loss 12.7554 LearningRate 0.0819 Epoch: 1 Global Step: 78680 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:32:50,214-Speed 2630.97 samples/sec Loss 12.7779 LearningRate 0.0819 Epoch: 1 Global Step: 78690 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:32:54,100-Speed 2635.72 samples/sec Loss 12.9035 LearningRate 0.0819 Epoch: 1 Global Step: 78700 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:32:57,993-Speed 2630.49 samples/sec Loss 12.8566 LearningRate 0.0819 Epoch: 1 Global Step: 78710 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:33:01,889-Speed 2628.97 samples/sec Loss 12.5797 LearningRate 0.0819 Epoch: 1 Global Step: 78720 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:33:05,796-Speed 2621.38 samples/sec Loss 12.7827 LearningRate 0.0819 Epoch: 1 Global Step: 78730 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:33:09,687-Speed 2632.34 samples/sec Loss 12.5908 LearningRate 0.0819 Epoch: 1 Global Step: 78740 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:33:13,581-Speed 2630.71 samples/sec Loss 12.7201 LearningRate 0.0819 Epoch: 1 Global Step: 78750 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:33:17,447-Speed 2649.90 samples/sec Loss 12.8506 LearningRate 0.0819 Epoch: 1 Global Step: 78760 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:33:21,323-Speed 2642.01 samples/sec Loss 12.9083 LearningRate 0.0819 Epoch: 1 Global Step: 78770 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:33:25,215-Speed 2632.11 samples/sec Loss 13.0466 LearningRate 0.0819 Epoch: 1 Global Step: 78780 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:33:29,105-Speed 2632.40 samples/sec Loss 13.0360 LearningRate 0.0819 Epoch: 1 Global Step: 78790 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:33:32,998-Speed 2630.83 samples/sec Loss 13.1003 LearningRate 0.0819 Epoch: 1 Global Step: 78800 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:33:36,890-Speed 2631.63 samples/sec Loss 12.7936 LearningRate 0.0819 Epoch: 1 Global Step: 78810 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:33:40,789-Speed 2626.97 samples/sec Loss 12.8726 LearningRate 0.0819 Epoch: 1 Global Step: 78820 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:33:44,680-Speed 2632.32 samples/sec Loss 12.9907 LearningRate 0.0819 Epoch: 1 Global Step: 78830 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:33:48,572-Speed 2632.46 samples/sec Loss 12.9715 LearningRate 0.0819 Epoch: 1 Global Step: 78840 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:33:52,465-Speed 2630.85 samples/sec Loss 12.7721 LearningRate 0.0819 Epoch: 1 Global Step: 78850 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:33:56,356-Speed 2632.38 samples/sec Loss 12.8491 LearningRate 0.0819 Epoch: 1 Global Step: 78860 Fp16 Grad Scale: 16384 Required: 84 hours
Training: 2022-04-13 04:34:00,248-Speed 2631.13 samples/sec Loss 13.0084 LearningRate 0.0819 Epoch: 1 Global Step: 78870 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:34:04,144-Speed 2628.87 samples/sec Loss 12.9840 LearningRate 0.0819 Epoch: 1 Global Step: 78880 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:34:08,040-Speed 2628.92 samples/sec Loss 12.9162 LearningRate 0.0819 Epoch: 1 Global Step: 78890 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:34:11,931-Speed 2632.57 samples/sec Loss 12.8324 LearningRate 0.0819 Epoch: 1 Global Step: 78900 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:34:15,825-Speed 2630.07 samples/sec Loss 12.6981 LearningRate 0.0819 Epoch: 1 Global Step: 78910 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:34:19,724-Speed 2626.77 samples/sec Loss 12.8609 LearningRate 0.0819 Epoch: 1 Global Step: 78920 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:34:23,622-Speed 2627.84 samples/sec Loss 12.7898 LearningRate 0.0819 Epoch: 1 Global Step: 78930 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:34:27,519-Speed 2628.63 samples/sec Loss 12.7558 LearningRate 0.0819 Epoch: 1 Global Step: 78940 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:34:31,413-Speed 2630.44 samples/sec Loss 12.7609 LearningRate 0.0819 Epoch: 1 Global Step: 78950 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:34:35,306-Speed 2630.23 samples/sec Loss 12.8650 LearningRate 0.0819 Epoch: 1 Global Step: 78960 Fp16 Grad Scale: 32768 Required: 84 hours
Training: 2022-04-13 04:34:39,199-Speed 2631.15 samples/sec Loss 12.7998 LearningRate 0.0819 Epoch: 1 Global Step: 78970 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:34:43,092-Speed 2630.94 samples/sec Loss 12.7965 LearningRate 0.0819 Epoch: 1 Global Step: 78980 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:34:46,998-Speed 2622.15 samples/sec Loss 12.8599 LearningRate 0.0819 Epoch: 1 Global Step: 78990 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:34:50,917-Speed 2613.18 samples/sec Loss 12.6153 LearningRate 0.0819 Epoch: 1 Global Step: 79000 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:34:54,812-Speed 2629.87 samples/sec Loss 12.7665 LearningRate 0.0819 Epoch: 1 Global Step: 79010 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:34:58,708-Speed 2629.01 samples/sec Loss 12.8888 LearningRate 0.0819 Epoch: 1 Global Step: 79020 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:35:02,607-Speed 2626.90 samples/sec Loss 12.6751 LearningRate 0.0819 Epoch: 1 Global Step: 79030 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:35:06,501-Speed 2630.36 samples/sec Loss 12.7594 LearningRate 0.0819 Epoch: 1 Global Step: 79040 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:35:10,404-Speed 2624.21 samples/sec Loss 12.7454 LearningRate 0.0819 Epoch: 1 Global Step: 79050 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:35:14,294-Speed 2632.98 samples/sec Loss 12.7362 LearningRate 0.0818 Epoch: 1 Global Step: 79060 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:35:18,188-Speed 2630.35 samples/sec Loss 12.9595 LearningRate 0.0818 Epoch: 1 Global Step: 79070 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:35:22,088-Speed 2626.09 samples/sec Loss 12.8832 LearningRate 0.0818 Epoch: 1 Global Step: 79080 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:35:25,983-Speed 2629.60 samples/sec Loss 12.7141 LearningRate 0.0818 Epoch: 1 Global Step: 79090 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:35:29,883-Speed 2626.06 samples/sec Loss 12.6687 LearningRate 0.0818 Epoch: 1 Global Step: 79100 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:35:33,788-Speed 2622.99 samples/sec Loss 12.9121 LearningRate 0.0818 Epoch: 1 Global Step: 79110 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:35:37,684-Speed 2629.46 samples/sec Loss 12.9523 LearningRate 0.0818 Epoch: 1 Global Step: 79120 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:35:41,590-Speed 2622.13 samples/sec Loss 12.8702 LearningRate 0.0818 Epoch: 1 Global Step: 79130 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:35:45,499-Speed 2620.22 samples/sec Loss 12.8484 LearningRate 0.0818 Epoch: 1 Global Step: 79140 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:35:49,402-Speed 2624.37 samples/sec Loss 12.8190 LearningRate 0.0818 Epoch: 1 Global Step: 79150 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:35:53,317-Speed 2616.22 samples/sec Loss 12.7038 LearningRate 0.0818 Epoch: 1 Global Step: 79160 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:35:57,223-Speed 2621.80 samples/sec Loss 12.9511 LearningRate 0.0818 Epoch: 1 Global Step: 79170 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:36:01,136-Speed 2617.79 samples/sec Loss 12.8236 LearningRate 0.0818 Epoch: 1 Global Step: 79180 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:36:05,051-Speed 2616.44 samples/sec Loss 12.8169 LearningRate 0.0818 Epoch: 1 Global Step: 79190 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:36:08,957-Speed 2622.00 samples/sec Loss 12.8891 LearningRate 0.0818 Epoch: 1 Global Step: 79200 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:36:12,855-Speed 2627.34 samples/sec Loss 12.8398 LearningRate 0.0818 Epoch: 1 Global Step: 79210 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:36:16,764-Speed 2620.43 samples/sec Loss 12.7741 LearningRate 0.0818 Epoch: 1 Global Step: 79220 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:36:20,660-Speed 2628.98 samples/sec Loss 12.8372 LearningRate 0.0818 Epoch: 1 Global Step: 79230 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:36:24,561-Speed 2625.05 samples/sec Loss 12.8558 LearningRate 0.0818 Epoch: 1 Global Step: 79240 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:36:28,463-Speed 2625.58 samples/sec Loss 12.9408 LearningRate 0.0818 Epoch: 1 Global Step: 79250 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:36:32,359-Speed 2628.84 samples/sec Loss 12.8133 LearningRate 0.0818 Epoch: 1 Global Step: 79260 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:36:36,261-Speed 2624.12 samples/sec Loss 12.8987 LearningRate 0.0818 Epoch: 1 Global Step: 79270 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:36:40,177-Speed 2615.69 samples/sec Loss 12.8755 LearningRate 0.0818 Epoch: 1 Global Step: 79280 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:36:44,072-Speed 2630.48 samples/sec Loss 12.7173 LearningRate 0.0818 Epoch: 1 Global Step: 79290 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:36:47,964-Speed 2631.48 samples/sec Loss 12.7431 LearningRate 0.0818 Epoch: 1 Global Step: 79300 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:36:51,857-Speed 2630.66 samples/sec Loss 12.7852 LearningRate 0.0818 Epoch: 1 Global Step: 79310 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:36:55,752-Speed 2630.00 samples/sec Loss 12.7898 LearningRate 0.0818 Epoch: 1 Global Step: 79320 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:36:59,646-Speed 2630.51 samples/sec Loss 12.9319 LearningRate 0.0818 Epoch: 1 Global Step: 79330 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:37:03,541-Speed 2629.44 samples/sec Loss 12.7597 LearningRate 0.0818 Epoch: 1 Global Step: 79340 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:37:07,435-Speed 2630.54 samples/sec Loss 12.8926 LearningRate 0.0818 Epoch: 1 Global Step: 79350 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:37:11,323-Speed 2634.00 samples/sec Loss 12.6816 LearningRate 0.0818 Epoch: 1 Global Step: 79360 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:37:15,224-Speed 2625.13 samples/sec Loss 12.6626 LearningRate 0.0818 Epoch: 1 Global Step: 79370 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:37:19,119-Speed 2630.18 samples/sec Loss 12.8122 LearningRate 0.0818 Epoch: 1 Global Step: 79380 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:37:23,016-Speed 2628.69 samples/sec Loss 12.9114 LearningRate 0.0818 Epoch: 1 Global Step: 79390 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:37:26,915-Speed 2627.03 samples/sec Loss 12.7084 LearningRate 0.0818 Epoch: 1 Global Step: 79400 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:37:30,810-Speed 2629.08 samples/sec Loss 12.6976 LearningRate 0.0818 Epoch: 1 Global Step: 79410 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:37:34,705-Speed 2629.81 samples/sec Loss 12.8348 LearningRate 0.0818 Epoch: 1 Global Step: 79420 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:37:38,599-Speed 2629.98 samples/sec Loss 12.9684 LearningRate 0.0818 Epoch: 1 Global Step: 79430 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:37:42,504-Speed 2623.15 samples/sec Loss 13.0040 LearningRate 0.0818 Epoch: 1 Global Step: 79440 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:37:46,404-Speed 2626.45 samples/sec Loss 12.8818 LearningRate 0.0818 Epoch: 1 Global Step: 79450 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:37:50,311-Speed 2621.70 samples/sec Loss 12.6068 LearningRate 0.0818 Epoch: 1 Global Step: 79460 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:37:54,216-Speed 2623.12 samples/sec Loss 12.7484 LearningRate 0.0818 Epoch: 1 Global Step: 79470 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:37:58,119-Speed 2624.02 samples/sec Loss 12.8204 LearningRate 0.0818 Epoch: 1 Global Step: 79480 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:38:02,004-Speed 2636.57 samples/sec Loss 12.8082 LearningRate 0.0818 Epoch: 1 Global Step: 79490 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:38:05,908-Speed 2623.01 samples/sec Loss 12.6333 LearningRate 0.0818 Epoch: 1 Global Step: 79500 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:38:09,822-Speed 2616.87 samples/sec Loss 12.7522 LearningRate 0.0817 Epoch: 1 Global Step: 79510 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:38:13,727-Speed 2622.89 samples/sec Loss 12.8615 LearningRate 0.0817 Epoch: 1 Global Step: 79520 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:38:17,632-Speed 2623.25 samples/sec Loss 12.7101 LearningRate 0.0817 Epoch: 1 Global Step: 79530 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:38:21,538-Speed 2622.43 samples/sec Loss 12.8005 LearningRate 0.0817 Epoch: 1 Global Step: 79540 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:38:25,446-Speed 2620.61 samples/sec Loss 12.6562 LearningRate 0.0817 Epoch: 1 Global Step: 79550 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:38:29,353-Speed 2621.79 samples/sec Loss 12.9067 LearningRate 0.0817 Epoch: 1 Global Step: 79560 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:38:33,256-Speed 2624.42 samples/sec Loss 12.7459 LearningRate 0.0817 Epoch: 1 Global Step: 79570 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:38:37,153-Speed 2627.85 samples/sec Loss 12.8979 LearningRate 0.0817 Epoch: 1 Global Step: 79580 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:38:41,047-Speed 2630.15 samples/sec Loss 12.8001 LearningRate 0.0817 Epoch: 1 Global Step: 79590 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:38:44,947-Speed 2625.79 samples/sec Loss 12.8898 LearningRate 0.0817 Epoch: 1 Global Step: 79600 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:38:48,841-Speed 2631.03 samples/sec Loss 12.6969 LearningRate 0.0817 Epoch: 1 Global Step: 79610 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:38:52,736-Speed 2629.41 samples/sec Loss 12.6629 LearningRate 0.0817 Epoch: 1 Global Step: 79620 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:38:56,615-Speed 2640.80 samples/sec Loss 12.6578 LearningRate 0.0817 Epoch: 1 Global Step: 79630 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:39:00,524-Speed 2620.38 samples/sec Loss 12.9250 LearningRate 0.0817 Epoch: 1 Global Step: 79640 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:39:04,423-Speed 2627.02 samples/sec Loss 12.7343 LearningRate 0.0817 Epoch: 1 Global Step: 79650 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:39:08,329-Speed 2621.85 samples/sec Loss 12.6895 LearningRate 0.0817 Epoch: 1 Global Step: 79660 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:39:12,243-Speed 2617.53 samples/sec Loss 12.6784 LearningRate 0.0817 Epoch: 1 Global Step: 79670 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:39:16,151-Speed 2620.77 samples/sec Loss 12.8997 LearningRate 0.0817 Epoch: 1 Global Step: 79680 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:39:20,050-Speed 2626.65 samples/sec Loss 12.8189 LearningRate 0.0817 Epoch: 1 Global Step: 79690 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:39:23,946-Speed 2629.58 samples/sec Loss 12.7580 LearningRate 0.0817 Epoch: 1 Global Step: 79700 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:39:27,861-Speed 2615.97 samples/sec Loss 12.9302 LearningRate 0.0817 Epoch: 1 Global Step: 79710 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:39:31,761-Speed 2625.93 samples/sec Loss 12.7935 LearningRate 0.0817 Epoch: 1 Global Step: 79720 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:39:35,660-Speed 2627.27 samples/sec Loss 12.9045 LearningRate 0.0817 Epoch: 1 Global Step: 79730 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:39:39,556-Speed 2628.77 samples/sec Loss 12.9638 LearningRate 0.0817 Epoch: 1 Global Step: 79740 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:39:43,442-Speed 2635.39 samples/sec Loss 12.6972 LearningRate 0.0817 Epoch: 1 Global Step: 79750 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:39:47,331-Speed 2633.71 samples/sec Loss 12.7133 LearningRate 0.0817 Epoch: 1 Global Step: 79760 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:39:51,225-Speed 2630.82 samples/sec Loss 12.9490 LearningRate 0.0817 Epoch: 1 Global Step: 79770 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:39:55,120-Speed 2629.52 samples/sec Loss 12.7519 LearningRate 0.0817 Epoch: 1 Global Step: 79780 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:39:59,026-Speed 2622.76 samples/sec Loss 12.7407 LearningRate 0.0817 Epoch: 1 Global Step: 79790 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:40:02,926-Speed 2625.82 samples/sec Loss 12.6279 LearningRate 0.0817 Epoch: 1 Global Step: 79800 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:40:06,825-Speed 2626.91 samples/sec Loss 12.6480 LearningRate 0.0817 Epoch: 1 Global Step: 79810 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:40:10,725-Speed 2626.11 samples/sec Loss 12.6798 LearningRate 0.0817 Epoch: 1 Global Step: 79820 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:40:14,627-Speed 2625.38 samples/sec Loss 12.9296 LearningRate 0.0817 Epoch: 1 Global Step: 79830 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:40:18,524-Speed 2628.28 samples/sec Loss 12.8823 LearningRate 0.0817 Epoch: 1 Global Step: 79840 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:40:22,423-Speed 2626.46 samples/sec Loss 12.7759 LearningRate 0.0817 Epoch: 1 Global Step: 79850 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:40:26,304-Speed 2639.53 samples/sec Loss 12.6561 LearningRate 0.0817 Epoch: 1 Global Step: 79860 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:40:30,199-Speed 2629.60 samples/sec Loss 12.7642 LearningRate 0.0817 Epoch: 1 Global Step: 79870 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:40:34,095-Speed 2629.55 samples/sec Loss 12.7361 LearningRate 0.0817 Epoch: 1 Global Step: 79880 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:40:37,987-Speed 2631.08 samples/sec Loss 12.6978 LearningRate 0.0817 Epoch: 1 Global Step: 79890 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:40:41,892-Speed 2623.25 samples/sec Loss 12.7561 LearningRate 0.0817 Epoch: 1 Global Step: 79900 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:40:45,814-Speed 2611.35 samples/sec Loss 12.7193 LearningRate 0.0817 Epoch: 1 Global Step: 79910 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:40:49,744-Speed 2606.50 samples/sec Loss 12.7288 LearningRate 0.0817 Epoch: 1 Global Step: 79920 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:40:53,641-Speed 2628.63 samples/sec Loss 12.9034 LearningRate 0.0817 Epoch: 1 Global Step: 79930 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:40:57,555-Speed 2616.82 samples/sec Loss 12.8345 LearningRate 0.0817 Epoch: 1 Global Step: 79940 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:41:01,472-Speed 2615.02 samples/sec Loss 12.7232 LearningRate 0.0817 Epoch: 1 Global Step: 79950 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:41:05,414-Speed 2598.04 samples/sec Loss 12.8026 LearningRate 0.0817 Epoch: 1 Global Step: 79960 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:41:09,337-Speed 2610.76 samples/sec Loss 12.6180 LearningRate 0.0816 Epoch: 1 Global Step: 79970 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:41:13,242-Speed 2622.81 samples/sec Loss 12.8061 LearningRate 0.0816 Epoch: 1 Global Step: 79980 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:41:17,140-Speed 2627.97 samples/sec Loss 12.6761 LearningRate 0.0816 Epoch: 1 Global Step: 79990 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:41:21,037-Speed 2628.73 samples/sec Loss 12.7602 LearningRate 0.0816 Epoch: 1 Global Step: 80000 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:42:04,299-[lfw][80000]XNorm: 22.462203
Training: 2022-04-13 04:42:04,300-[lfw][80000]Accuracy-Flip: 0.99717+-0.00269
Training: 2022-04-13 04:42:04,301-[lfw][80000]Accuracy-Highest: 0.99783
Training: 2022-04-13 04:42:54,464-[cfp_fp][80000]XNorm: 20.692995
Training: 2022-04-13 04:42:54,465-[cfp_fp][80000]Accuracy-Flip: 0.97586+-0.00640
Training: 2022-04-13 04:42:54,466-[cfp_fp][80000]Accuracy-Highest: 0.97586
Training: 2022-04-13 04:43:37,644-[agedb_30][80000]XNorm: 22.361619
Training: 2022-04-13 04:43:37,645-[agedb_30][80000]Accuracy-Flip: 0.96600+-0.00779
Training: 2022-04-13 04:43:37,646-[agedb_30][80000]Accuracy-Highest: 0.96600
Training: 2022-04-13 04:43:41,512-Speed 72.90 samples/sec Loss 12.8134 LearningRate 0.0816 Epoch: 1 Global Step: 80010 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:43:45,383-Speed 2646.66 samples/sec Loss 12.7882 LearningRate 0.0816 Epoch: 1 Global Step: 80020 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:43:49,268-Speed 2636.10 samples/sec Loss 12.6886 LearningRate 0.0816 Epoch: 1 Global Step: 80030 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:43:53,137-Speed 2647.22 samples/sec Loss 12.9031 LearningRate 0.0816 Epoch: 1 Global Step: 80040 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:43:57,015-Speed 2641.92 samples/sec Loss 12.8988 LearningRate 0.0816 Epoch: 1 Global Step: 80050 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:44:00,892-Speed 2641.43 samples/sec Loss 12.8086 LearningRate 0.0816 Epoch: 1 Global Step: 80060 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:44:04,770-Speed 2641.46 samples/sec Loss 12.6649 LearningRate 0.0816 Epoch: 1 Global Step: 80070 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:44:08,653-Speed 2638.40 samples/sec Loss 12.7231 LearningRate 0.0816 Epoch: 1 Global Step: 80080 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:44:12,536-Speed 2638.82 samples/sec Loss 12.6675 LearningRate 0.0816 Epoch: 1 Global Step: 80090 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:44:16,400-Speed 2650.16 samples/sec Loss 12.8448 LearningRate 0.0816 Epoch: 1 Global Step: 80100 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:44:20,282-Speed 2638.29 samples/sec Loss 12.7490 LearningRate 0.0816 Epoch: 1 Global Step: 80110 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:44:24,169-Speed 2634.78 samples/sec Loss 12.7155 LearningRate 0.0816 Epoch: 1 Global Step: 80120 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:44:28,062-Speed 2631.81 samples/sec Loss 12.8777 LearningRate 0.0816 Epoch: 1 Global Step: 80130 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:44:31,953-Speed 2633.04 samples/sec Loss 12.8058 LearningRate 0.0816 Epoch: 1 Global Step: 80140 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:44:35,855-Speed 2624.43 samples/sec Loss 12.7501 LearningRate 0.0816 Epoch: 1 Global Step: 80150 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:44:39,751-Speed 2629.24 samples/sec Loss 12.7743 LearningRate 0.0816 Epoch: 1 Global Step: 80160 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:44:43,644-Speed 2631.21 samples/sec Loss 12.6797 LearningRate 0.0816 Epoch: 1 Global Step: 80170 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:44:47,530-Speed 2635.37 samples/sec Loss 12.8394 LearningRate 0.0816 Epoch: 1 Global Step: 80180 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:44:51,421-Speed 2631.71 samples/sec Loss 12.7971 LearningRate 0.0816 Epoch: 1 Global Step: 80190 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:44:55,307-Speed 2636.24 samples/sec Loss 12.7350 LearningRate 0.0816 Epoch: 1 Global Step: 80200 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:44:59,216-Speed 2620.57 samples/sec Loss 12.7464 LearningRate 0.0816 Epoch: 1 Global Step: 80210 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:45:03,134-Speed 2614.41 samples/sec Loss 12.8488 LearningRate 0.0816 Epoch: 1 Global Step: 80220 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:45:07,025-Speed 2632.07 samples/sec Loss 12.6119 LearningRate 0.0816 Epoch: 1 Global Step: 80230 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:45:10,918-Speed 2631.36 samples/sec Loss 12.7329 LearningRate 0.0816 Epoch: 1 Global Step: 80240 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:45:14,792-Speed 2643.32 samples/sec Loss 12.6955 LearningRate 0.0816 Epoch: 1 Global Step: 80250 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:45:18,690-Speed 2627.51 samples/sec Loss 12.6549 LearningRate 0.0816 Epoch: 1 Global Step: 80260 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:45:22,585-Speed 2629.43 samples/sec Loss 12.6659 LearningRate 0.0816 Epoch: 1 Global Step: 80270 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:45:26,491-Speed 2622.54 samples/sec Loss 12.5762 LearningRate 0.0816 Epoch: 1 Global Step: 80280 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:45:30,396-Speed 2622.53 samples/sec Loss 12.6130 LearningRate 0.0816 Epoch: 1 Global Step: 80290 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:45:34,303-Speed 2622.53 samples/sec Loss 12.6348 LearningRate 0.0816 Epoch: 1 Global Step: 80300 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:45:38,202-Speed 2626.46 samples/sec Loss 12.8579 LearningRate 0.0816 Epoch: 1 Global Step: 80310 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:45:42,094-Speed 2631.84 samples/sec Loss 12.8343 LearningRate 0.0816 Epoch: 1 Global Step: 80320 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:45:45,997-Speed 2624.56 samples/sec Loss 12.6599 LearningRate 0.0816 Epoch: 1 Global Step: 80330 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:45:49,890-Speed 2631.01 samples/sec Loss 12.8302 LearningRate 0.0816 Epoch: 1 Global Step: 80340 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:45:53,782-Speed 2631.30 samples/sec Loss 12.7807 LearningRate 0.0816 Epoch: 1 Global Step: 80350 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:45:57,678-Speed 2629.50 samples/sec Loss 12.7705 LearningRate 0.0816 Epoch: 1 Global Step: 80360 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:46:01,570-Speed 2631.37 samples/sec Loss 12.7796 LearningRate 0.0816 Epoch: 1 Global Step: 80370 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:46:05,641-Speed 2515.90 samples/sec Loss 12.8321 LearningRate 0.0816 Epoch: 1 Global Step: 80380 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:46:09,676-Speed 2538.71 samples/sec Loss 12.9455 LearningRate 0.0816 Epoch: 1 Global Step: 80390 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:46:13,577-Speed 2625.94 samples/sec Loss 12.7840 LearningRate 0.0816 Epoch: 1 Global Step: 80400 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:46:17,480-Speed 2624.12 samples/sec Loss 12.6987 LearningRate 0.0816 Epoch: 1 Global Step: 80410 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:46:21,376-Speed 2628.83 samples/sec Loss 12.7805 LearningRate 0.0816 Epoch: 1 Global Step: 80420 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:46:25,274-Speed 2627.66 samples/sec Loss 12.6222 LearningRate 0.0815 Epoch: 1 Global Step: 80430 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:46:29,176-Speed 2624.43 samples/sec Loss 12.8477 LearningRate 0.0815 Epoch: 1 Global Step: 80440 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:46:33,063-Speed 2634.98 samples/sec Loss 12.8288 LearningRate 0.0815 Epoch: 1 Global Step: 80450 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:46:36,974-Speed 2619.04 samples/sec Loss 12.7998 LearningRate 0.0815 Epoch: 1 Global Step: 80460 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:46:40,886-Speed 2618.84 samples/sec Loss 12.5066 LearningRate 0.0815 Epoch: 1 Global Step: 80470 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:46:44,784-Speed 2627.32 samples/sec Loss 12.7464 LearningRate 0.0815 Epoch: 1 Global Step: 80480 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:46:48,679-Speed 2630.11 samples/sec Loss 12.7376 LearningRate 0.0815 Epoch: 1 Global Step: 80490 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:46:52,571-Speed 2631.64 samples/sec Loss 12.6713 LearningRate 0.0815 Epoch: 1 Global Step: 80500 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:46:56,466-Speed 2629.56 samples/sec Loss 12.7699 LearningRate 0.0815 Epoch: 1 Global Step: 80510 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:47:00,364-Speed 2627.43 samples/sec Loss 12.8512 LearningRate 0.0815 Epoch: 1 Global Step: 80520 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:47:04,260-Speed 2628.88 samples/sec Loss 12.6571 LearningRate 0.0815 Epoch: 1 Global Step: 80530 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:47:08,161-Speed 2625.91 samples/sec Loss 12.7599 LearningRate 0.0815 Epoch: 1 Global Step: 80540 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:47:12,060-Speed 2627.43 samples/sec Loss 12.6809 LearningRate 0.0815 Epoch: 1 Global Step: 80550 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:47:15,951-Speed 2632.24 samples/sec Loss 12.6743 LearningRate 0.0815 Epoch: 1 Global Step: 80560 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:47:19,854-Speed 2624.20 samples/sec Loss 12.9173 LearningRate 0.0815 Epoch: 1 Global Step: 80570 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:47:23,737-Speed 2637.48 samples/sec Loss 12.7960 LearningRate 0.0815 Epoch: 1 Global Step: 80580 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:47:27,649-Speed 2618.07 samples/sec Loss 12.8844 LearningRate 0.0815 Epoch: 1 Global Step: 80590 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:47:31,552-Speed 2624.01 samples/sec Loss 12.8958 LearningRate 0.0815 Epoch: 1 Global Step: 80600 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:47:35,470-Speed 2614.38 samples/sec Loss 12.7100 LearningRate 0.0815 Epoch: 1 Global Step: 80610 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:47:39,454-Speed 2570.65 samples/sec Loss 12.6080 LearningRate 0.0815 Epoch: 1 Global Step: 80620 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:47:43,350-Speed 2628.94 samples/sec Loss 12.8326 LearningRate 0.0815 Epoch: 1 Global Step: 80630 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:47:47,288-Speed 2601.28 samples/sec Loss 12.7482 LearningRate 0.0815 Epoch: 1 Global Step: 80640 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:47:51,189-Speed 2625.48 samples/sec Loss 12.7633 LearningRate 0.0815 Epoch: 1 Global Step: 80650 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:47:55,084-Speed 2629.68 samples/sec Loss 12.6248 LearningRate 0.0815 Epoch: 1 Global Step: 80660 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:47:58,978-Speed 2629.60 samples/sec Loss 12.8583 LearningRate 0.0815 Epoch: 1 Global Step: 80670 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:48:02,883-Speed 2623.18 samples/sec Loss 12.7128 LearningRate 0.0815 Epoch: 1 Global Step: 80680 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:48:06,779-Speed 2629.12 samples/sec Loss 12.8196 LearningRate 0.0815 Epoch: 1 Global Step: 80690 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:48:10,674-Speed 2629.30 samples/sec Loss 12.6130 LearningRate 0.0815 Epoch: 1 Global Step: 80700 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:48:14,570-Speed 2629.25 samples/sec Loss 12.6965 LearningRate 0.0815 Epoch: 1 Global Step: 80710 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:48:18,469-Speed 2627.33 samples/sec Loss 12.6029 LearningRate 0.0815 Epoch: 1 Global Step: 80720 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:48:22,361-Speed 2631.61 samples/sec Loss 12.7464 LearningRate 0.0815 Epoch: 1 Global Step: 80730 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:48:26,261-Speed 2625.92 samples/sec Loss 12.8016 LearningRate 0.0815 Epoch: 1 Global Step: 80740 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:48:30,158-Speed 2627.82 samples/sec Loss 12.7421 LearningRate 0.0815 Epoch: 1 Global Step: 80750 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:48:34,054-Speed 2629.55 samples/sec Loss 12.6184 LearningRate 0.0815 Epoch: 1 Global Step: 80760 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:48:37,951-Speed 2627.99 samples/sec Loss 12.8556 LearningRate 0.0815 Epoch: 1 Global Step: 80770 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:48:41,841-Speed 2632.98 samples/sec Loss 12.6286 LearningRate 0.0815 Epoch: 1 Global Step: 80780 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:48:45,742-Speed 2625.94 samples/sec Loss 12.7446 LearningRate 0.0815 Epoch: 1 Global Step: 80790 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:48:49,640-Speed 2627.16 samples/sec Loss 12.6728 LearningRate 0.0815 Epoch: 1 Global Step: 80800 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:48:53,533-Speed 2631.14 samples/sec Loss 12.6960 LearningRate 0.0815 Epoch: 1 Global Step: 80810 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:48:57,435-Speed 2625.27 samples/sec Loss 12.6455 LearningRate 0.0815 Epoch: 1 Global Step: 80820 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:49:01,329-Speed 2629.92 samples/sec Loss 12.7649 LearningRate 0.0815 Epoch: 1 Global Step: 80830 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:49:05,224-Speed 2629.28 samples/sec Loss 12.6688 LearningRate 0.0815 Epoch: 1 Global Step: 80840 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:49:09,104-Speed 2640.04 samples/sec Loss 12.7638 LearningRate 0.0815 Epoch: 1 Global Step: 80850 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:49:12,999-Speed 2629.82 samples/sec Loss 12.7499 LearningRate 0.0815 Epoch: 1 Global Step: 80860 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:49:16,895-Speed 2629.04 samples/sec Loss 12.5910 LearningRate 0.0815 Epoch: 1 Global Step: 80870 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:49:20,806-Speed 2618.99 samples/sec Loss 12.7298 LearningRate 0.0815 Epoch: 1 Global Step: 80880 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:49:24,697-Speed 2632.09 samples/sec Loss 12.7774 LearningRate 0.0814 Epoch: 1 Global Step: 80890 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:49:28,576-Speed 2640.08 samples/sec Loss 12.8067 LearningRate 0.0814 Epoch: 1 Global Step: 80900 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:49:32,484-Speed 2620.99 samples/sec Loss 12.6970 LearningRate 0.0814 Epoch: 1 Global Step: 80910 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:49:36,380-Speed 2629.38 samples/sec Loss 12.8965 LearningRate 0.0814 Epoch: 1 Global Step: 80920 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:49:40,272-Speed 2631.42 samples/sec Loss 12.8138 LearningRate 0.0814 Epoch: 1 Global Step: 80930 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:49:44,166-Speed 2630.01 samples/sec Loss 12.6446 LearningRate 0.0814 Epoch: 1 Global Step: 80940 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:49:48,063-Speed 2628.49 samples/sec Loss 12.8336 LearningRate 0.0814 Epoch: 1 Global Step: 80950 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:49:51,958-Speed 2629.39 samples/sec Loss 12.6968 LearningRate 0.0814 Epoch: 1 Global Step: 80960 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:49:55,858-Speed 2626.45 samples/sec Loss 12.6231 LearningRate 0.0814 Epoch: 1 Global Step: 80970 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:49:59,756-Speed 2627.38 samples/sec Loss 12.6524 LearningRate 0.0814 Epoch: 1 Global Step: 80980 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:50:03,669-Speed 2617.80 samples/sec Loss 12.8564 LearningRate 0.0814 Epoch: 1 Global Step: 80990 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 04:50:07,590-Speed 2611.91 samples/sec Loss 12.6948 LearningRate 0.0814 Epoch: 1 Global Step: 81000 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:50:11,484-Speed 2630.79 samples/sec Loss 12.6512 LearningRate 0.0814 Epoch: 1 Global Step: 81010 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:50:15,382-Speed 2627.41 samples/sec Loss 12.5896 LearningRate 0.0814 Epoch: 1 Global Step: 81020 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:50:19,279-Speed 2628.25 samples/sec Loss 12.7791 LearningRate 0.0814 Epoch: 1 Global Step: 81030 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:50:23,172-Speed 2630.50 samples/sec Loss 12.8260 LearningRate 0.0814 Epoch: 1 Global Step: 81040 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:50:27,066-Speed 2630.43 samples/sec Loss 12.7332 LearningRate 0.0814 Epoch: 1 Global Step: 81050 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:50:30,964-Speed 2627.96 samples/sec Loss 12.7378 LearningRate 0.0814 Epoch: 1 Global Step: 81060 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:50:34,868-Speed 2623.21 samples/sec Loss 12.9162 LearningRate 0.0814 Epoch: 1 Global Step: 81070 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:50:38,783-Speed 2615.89 samples/sec Loss 12.7844 LearningRate 0.0814 Epoch: 1 Global Step: 81080 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:50:42,693-Speed 2619.86 samples/sec Loss 12.7199 LearningRate 0.0814 Epoch: 1 Global Step: 81090 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:50:46,591-Speed 2627.60 samples/sec Loss 12.8298 LearningRate 0.0814 Epoch: 1 Global Step: 81100 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:50:50,476-Speed 2636.66 samples/sec Loss 12.7015 LearningRate 0.0814 Epoch: 1 Global Step: 81110 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:50:54,365-Speed 2633.26 samples/sec Loss 12.7069 LearningRate 0.0814 Epoch: 1 Global Step: 81120 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:50:58,261-Speed 2629.22 samples/sec Loss 12.6679 LearningRate 0.0814 Epoch: 1 Global Step: 81130 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:51:02,152-Speed 2631.96 samples/sec Loss 12.7390 LearningRate 0.0814 Epoch: 1 Global Step: 81140 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:51:06,042-Speed 2632.85 samples/sec Loss 12.7599 LearningRate 0.0814 Epoch: 1 Global Step: 81150 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:51:09,940-Speed 2627.90 samples/sec Loss 12.7039 LearningRate 0.0814 Epoch: 1 Global Step: 81160 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:51:13,837-Speed 2628.30 samples/sec Loss 12.8003 LearningRate 0.0814 Epoch: 1 Global Step: 81170 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:51:17,744-Speed 2621.62 samples/sec Loss 12.5774 LearningRate 0.0814 Epoch: 1 Global Step: 81180 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:51:21,638-Speed 2630.41 samples/sec Loss 12.7879 LearningRate 0.0814 Epoch: 1 Global Step: 81190 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:51:25,549-Speed 2619.12 samples/sec Loss 12.8007 LearningRate 0.0814 Epoch: 1 Global Step: 81200 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:51:29,442-Speed 2630.30 samples/sec Loss 12.8140 LearningRate 0.0814 Epoch: 1 Global Step: 81210 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:51:33,506-Speed 2520.32 samples/sec Loss 12.7617 LearningRate 0.0814 Epoch: 1 Global Step: 81220 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:51:37,527-Speed 2547.29 samples/sec Loss 12.8809 LearningRate 0.0814 Epoch: 1 Global Step: 81230 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:51:41,420-Speed 2632.25 samples/sec Loss 12.7595 LearningRate 0.0814 Epoch: 1 Global Step: 81240 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:51:45,324-Speed 2622.99 samples/sec Loss 12.9105 LearningRate 0.0814 Epoch: 1 Global Step: 81250 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:51:49,218-Speed 2630.57 samples/sec Loss 12.7549 LearningRate 0.0814 Epoch: 1 Global Step: 81260 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:51:53,114-Speed 2629.28 samples/sec Loss 12.6899 LearningRate 0.0814 Epoch: 1 Global Step: 81270 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:51:57,030-Speed 2615.76 samples/sec Loss 12.7518 LearningRate 0.0814 Epoch: 1 Global Step: 81280 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:52:00,920-Speed 2632.87 samples/sec Loss 12.8466 LearningRate 0.0814 Epoch: 1 Global Step: 81290 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:52:04,817-Speed 2628.56 samples/sec Loss 12.6669 LearningRate 0.0814 Epoch: 1 Global Step: 81300 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:52:08,708-Speed 2631.90 samples/sec Loss 12.6150 LearningRate 0.0814 Epoch: 1 Global Step: 81310 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:52:12,600-Speed 2631.44 samples/sec Loss 12.6824 LearningRate 0.0814 Epoch: 1 Global Step: 81320 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:52:16,494-Speed 2630.18 samples/sec Loss 12.8216 LearningRate 0.0814 Epoch: 1 Global Step: 81330 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:52:20,400-Speed 2622.23 samples/sec Loss 12.7252 LearningRate 0.0814 Epoch: 1 Global Step: 81340 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:52:24,305-Speed 2623.19 samples/sec Loss 12.5465 LearningRate 0.0813 Epoch: 1 Global Step: 81350 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:52:28,195-Speed 2632.86 samples/sec Loss 12.7207 LearningRate 0.0813 Epoch: 1 Global Step: 81360 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:52:32,090-Speed 2630.17 samples/sec Loss 12.7792 LearningRate 0.0813 Epoch: 1 Global Step: 81370 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:52:35,963-Speed 2644.19 samples/sec Loss 12.5230 LearningRate 0.0813 Epoch: 1 Global Step: 81380 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:52:39,853-Speed 2633.45 samples/sec Loss 12.8004 LearningRate 0.0813 Epoch: 1 Global Step: 81390 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:52:43,748-Speed 2629.59 samples/sec Loss 12.6007 LearningRate 0.0813 Epoch: 1 Global Step: 81400 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:52:47,682-Speed 2603.57 samples/sec Loss 12.6748 LearningRate 0.0813 Epoch: 1 Global Step: 81410 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:52:51,576-Speed 2630.02 samples/sec Loss 12.6747 LearningRate 0.0813 Epoch: 1 Global Step: 81420 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:52:55,467-Speed 2632.10 samples/sec Loss 12.7286 LearningRate 0.0813 Epoch: 1 Global Step: 81430 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:52:59,365-Speed 2627.25 samples/sec Loss 12.6791 LearningRate 0.0813 Epoch: 1 Global Step: 81440 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:53:03,259-Speed 2630.74 samples/sec Loss 12.7724 LearningRate 0.0813 Epoch: 1 Global Step: 81450 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:53:07,155-Speed 2629.03 samples/sec Loss 12.6047 LearningRate 0.0813 Epoch: 1 Global Step: 81460 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:53:11,053-Speed 2627.82 samples/sec Loss 12.7933 LearningRate 0.0813 Epoch: 1 Global Step: 81470 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:53:14,949-Speed 2628.43 samples/sec Loss 12.6709 LearningRate 0.0813 Epoch: 1 Global Step: 81480 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:53:18,832-Speed 2638.04 samples/sec Loss 12.6596 LearningRate 0.0813 Epoch: 1 Global Step: 81490 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:53:22,732-Speed 2625.76 samples/sec Loss 12.6927 LearningRate 0.0813 Epoch: 1 Global Step: 81500 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:53:26,634-Speed 2625.18 samples/sec Loss 12.6432 LearningRate 0.0813 Epoch: 1 Global Step: 81510 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:53:30,533-Speed 2626.28 samples/sec Loss 12.5319 LearningRate 0.0813 Epoch: 1 Global Step: 81520 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:53:34,435-Speed 2625.54 samples/sec Loss 12.5782 LearningRate 0.0813 Epoch: 1 Global Step: 81530 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:53:38,336-Speed 2625.32 samples/sec Loss 12.5581 LearningRate 0.0813 Epoch: 1 Global Step: 81540 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:53:42,237-Speed 2625.67 samples/sec Loss 12.7249 LearningRate 0.0813 Epoch: 1 Global Step: 81550 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:53:46,138-Speed 2626.03 samples/sec Loss 12.5270 LearningRate 0.0813 Epoch: 1 Global Step: 81560 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:53:50,036-Speed 2627.74 samples/sec Loss 12.8708 LearningRate 0.0813 Epoch: 1 Global Step: 81570 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:53:53,931-Speed 2629.30 samples/sec Loss 12.7725 LearningRate 0.0813 Epoch: 1 Global Step: 81580 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:53:57,823-Speed 2631.45 samples/sec Loss 12.6493 LearningRate 0.0813 Epoch: 1 Global Step: 81590 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:54:01,716-Speed 2630.73 samples/sec Loss 12.7764 LearningRate 0.0813 Epoch: 1 Global Step: 81600 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:54:05,616-Speed 2626.25 samples/sec Loss 12.6475 LearningRate 0.0813 Epoch: 1 Global Step: 81610 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:54:09,524-Speed 2621.22 samples/sec Loss 12.6867 LearningRate 0.0813 Epoch: 1 Global Step: 81620 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:54:13,432-Speed 2620.30 samples/sec Loss 12.4883 LearningRate 0.0813 Epoch: 1 Global Step: 81630 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:54:17,334-Speed 2625.61 samples/sec Loss 12.7440 LearningRate 0.0813 Epoch: 1 Global Step: 81640 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:54:21,216-Speed 2638.55 samples/sec Loss 12.7219 LearningRate 0.0813 Epoch: 1 Global Step: 81650 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:54:25,114-Speed 2627.47 samples/sec Loss 12.8117 LearningRate 0.0813 Epoch: 1 Global Step: 81660 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:54:29,016-Speed 2624.30 samples/sec Loss 12.6689 LearningRate 0.0813 Epoch: 1 Global Step: 81670 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:54:32,915-Speed 2627.15 samples/sec Loss 12.7536 LearningRate 0.0813 Epoch: 1 Global Step: 81680 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:54:36,812-Speed 2628.26 samples/sec Loss 12.5726 LearningRate 0.0813 Epoch: 1 Global Step: 81690 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:54:40,705-Speed 2630.60 samples/sec Loss 12.6654 LearningRate 0.0813 Epoch: 1 Global Step: 81700 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:54:44,606-Speed 2625.56 samples/sec Loss 12.5321 LearningRate 0.0813 Epoch: 1 Global Step: 81710 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:54:48,502-Speed 2629.01 samples/sec Loss 12.7687 LearningRate 0.0813 Epoch: 1 Global Step: 81720 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:54:52,397-Speed 2629.96 samples/sec Loss 12.7468 LearningRate 0.0813 Epoch: 1 Global Step: 81730 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:54:56,296-Speed 2627.01 samples/sec Loss 12.6623 LearningRate 0.0813 Epoch: 1 Global Step: 81740 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:55:00,193-Speed 2628.21 samples/sec Loss 12.6982 LearningRate 0.0813 Epoch: 1 Global Step: 81750 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:55:04,095-Speed 2624.97 samples/sec Loss 12.7564 LearningRate 0.0813 Epoch: 1 Global Step: 81760 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:55:07,989-Speed 2630.11 samples/sec Loss 12.7046 LearningRate 0.0813 Epoch: 1 Global Step: 81770 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:55:11,885-Speed 2628.48 samples/sec Loss 12.7803 LearningRate 0.0813 Epoch: 1 Global Step: 81780 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:55:15,780-Speed 2629.46 samples/sec Loss 12.6532 LearningRate 0.0813 Epoch: 1 Global Step: 81790 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:55:19,674-Speed 2630.46 samples/sec Loss 12.6093 LearningRate 0.0813 Epoch: 1 Global Step: 81800 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:55:23,574-Speed 2626.53 samples/sec Loss 12.8642 LearningRate 0.0812 Epoch: 1 Global Step: 81810 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:55:27,473-Speed 2632.76 samples/sec Loss 12.5971 LearningRate 0.0812 Epoch: 1 Global Step: 81820 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:55:31,369-Speed 2628.81 samples/sec Loss 12.7297 LearningRate 0.0812 Epoch: 1 Global Step: 81830 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:55:35,264-Speed 2629.51 samples/sec Loss 12.8105 LearningRate 0.0812 Epoch: 1 Global Step: 81840 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:55:39,152-Speed 2634.10 samples/sec Loss 12.6360 LearningRate 0.0812 Epoch: 1 Global Step: 81850 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:55:43,044-Speed 2632.25 samples/sec Loss 12.5466 LearningRate 0.0812 Epoch: 1 Global Step: 81860 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:55:46,943-Speed 2626.87 samples/sec Loss 12.6833 LearningRate 0.0812 Epoch: 1 Global Step: 81870 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:55:50,839-Speed 2628.56 samples/sec Loss 12.7764 LearningRate 0.0812 Epoch: 1 Global Step: 81880 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:55:54,744-Speed 2623.19 samples/sec Loss 12.5993 LearningRate 0.0812 Epoch: 1 Global Step: 81890 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:55:58,642-Speed 2627.66 samples/sec Loss 12.7398 LearningRate 0.0812 Epoch: 1 Global Step: 81900 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:56:02,535-Speed 2630.44 samples/sec Loss 12.7492 LearningRate 0.0812 Epoch: 1 Global Step: 81910 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:56:06,430-Speed 2629.54 samples/sec Loss 12.8317 LearningRate 0.0812 Epoch: 1 Global Step: 81920 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:56:10,335-Speed 2622.99 samples/sec Loss 12.7210 LearningRate 0.0812 Epoch: 1 Global Step: 81930 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:56:14,232-Speed 2628.84 samples/sec Loss 12.6599 LearningRate 0.0812 Epoch: 1 Global Step: 81940 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:56:18,135-Speed 2624.50 samples/sec Loss 12.7609 LearningRate 0.0812 Epoch: 1 Global Step: 81950 Fp16 Grad Scale: 524288 Required: 84 hours
Training: 2022-04-13 04:56:22,018-Speed 2637.28 samples/sec Loss 12.7326 LearningRate 0.0812 Epoch: 1 Global Step: 81960 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:56:25,923-Speed 2623.31 samples/sec Loss 12.5790 LearningRate 0.0812 Epoch: 1 Global Step: 81970 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:56:29,825-Speed 2624.71 samples/sec Loss 12.6488 LearningRate 0.0812 Epoch: 1 Global Step: 81980 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:56:33,919-Speed 2501.46 samples/sec Loss 12.7640 LearningRate 0.0812 Epoch: 1 Global Step: 81990 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:56:37,931-Speed 2553.15 samples/sec Loss 12.6293 LearningRate 0.0812 Epoch: 1 Global Step: 82000 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:56:41,827-Speed 2628.54 samples/sec Loss 12.6754 LearningRate 0.0812 Epoch: 1 Global Step: 82010 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:56:45,741-Speed 2617.55 samples/sec Loss 12.6118 LearningRate 0.0812 Epoch: 1 Global Step: 82020 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:56:49,651-Speed 2619.58 samples/sec Loss 12.6666 LearningRate 0.0812 Epoch: 1 Global Step: 82030 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:56:53,542-Speed 2632.03 samples/sec Loss 12.5728 LearningRate 0.0812 Epoch: 1 Global Step: 82040 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:56:57,447-Speed 2623.33 samples/sec Loss 12.5581 LearningRate 0.0812 Epoch: 1 Global Step: 82050 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:57:01,340-Speed 2630.39 samples/sec Loss 12.6067 LearningRate 0.0812 Epoch: 1 Global Step: 82060 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:57:05,246-Speed 2621.94 samples/sec Loss 12.6107 LearningRate 0.0812 Epoch: 1 Global Step: 82070 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:57:09,146-Speed 2626.67 samples/sec Loss 12.7986 LearningRate 0.0812 Epoch: 1 Global Step: 82080 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:57:13,069-Speed 2611.07 samples/sec Loss 12.5893 LearningRate 0.0812 Epoch: 1 Global Step: 82090 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:57:16,967-Speed 2626.87 samples/sec Loss 12.7543 LearningRate 0.0812 Epoch: 1 Global Step: 82100 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:57:20,866-Speed 2627.57 samples/sec Loss 12.7022 LearningRate 0.0812 Epoch: 1 Global Step: 82110 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:57:24,761-Speed 2629.33 samples/sec Loss 12.7836 LearningRate 0.0812 Epoch: 1 Global Step: 82120 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 04:57:28,701-Speed 2600.15 samples/sec Loss 12.7135 LearningRate 0.0812 Epoch: 1 Global Step: 82130 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:57:32,607-Speed 2621.80 samples/sec Loss 12.5842 LearningRate 0.0812 Epoch: 1 Global Step: 82140 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:57:36,503-Speed 2629.17 samples/sec Loss 12.7908 LearningRate 0.0812 Epoch: 1 Global Step: 82150 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:57:40,399-Speed 2628.36 samples/sec Loss 12.7755 LearningRate 0.0812 Epoch: 1 Global Step: 82160 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:57:44,296-Speed 2628.48 samples/sec Loss 12.6497 LearningRate 0.0812 Epoch: 1 Global Step: 82170 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:57:48,196-Speed 2626.47 samples/sec Loss 12.5428 LearningRate 0.0812 Epoch: 1 Global Step: 82180 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:57:52,101-Speed 2622.59 samples/sec Loss 12.6256 LearningRate 0.0812 Epoch: 1 Global Step: 82190 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:57:56,005-Speed 2624.00 samples/sec Loss 12.6583 LearningRate 0.0812 Epoch: 1 Global Step: 82200 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:57:59,914-Speed 2620.24 samples/sec Loss 12.7950 LearningRate 0.0812 Epoch: 1 Global Step: 82210 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:58:03,816-Speed 2624.84 samples/sec Loss 12.7580 LearningRate 0.0812 Epoch: 1 Global Step: 82220 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:58:07,710-Speed 2629.86 samples/sec Loss 12.5997 LearningRate 0.0812 Epoch: 1 Global Step: 82230 Fp16 Grad Scale: 524288 Required: 84 hours
Training: 2022-04-13 04:58:11,593-Speed 2638.00 samples/sec Loss 12.6582 LearningRate 0.0812 Epoch: 1 Global Step: 82240 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:58:15,496-Speed 2624.32 samples/sec Loss 12.8005 LearningRate 0.0812 Epoch: 1 Global Step: 82250 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:58:19,408-Speed 2617.95 samples/sec Loss 12.6448 LearningRate 0.0812 Epoch: 1 Global Step: 82260 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:58:23,316-Speed 2620.74 samples/sec Loss 12.7128 LearningRate 0.0811 Epoch: 1 Global Step: 82270 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:58:27,215-Speed 2627.33 samples/sec Loss 12.6916 LearningRate 0.0811 Epoch: 1 Global Step: 82280 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:58:31,111-Speed 2628.88 samples/sec Loss 12.7688 LearningRate 0.0811 Epoch: 1 Global Step: 82290 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:58:35,006-Speed 2629.86 samples/sec Loss 12.6644 LearningRate 0.0811 Epoch: 1 Global Step: 82300 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:58:38,899-Speed 2630.83 samples/sec Loss 12.8269 LearningRate 0.0811 Epoch: 1 Global Step: 82310 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:58:42,800-Speed 2625.70 samples/sec Loss 12.5897 LearningRate 0.0811 Epoch: 1 Global Step: 82320 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:58:46,696-Speed 2629.08 samples/sec Loss 12.6441 LearningRate 0.0811 Epoch: 1 Global Step: 82330 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:58:50,576-Speed 2639.57 samples/sec Loss 12.7058 LearningRate 0.0811 Epoch: 1 Global Step: 82340 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:58:54,475-Speed 2627.77 samples/sec Loss 12.7321 LearningRate 0.0811 Epoch: 1 Global Step: 82350 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:58:58,373-Speed 2627.21 samples/sec Loss 12.9011 LearningRate 0.0811 Epoch: 1 Global Step: 82360 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:59:02,272-Speed 2626.70 samples/sec Loss 12.7435 LearningRate 0.0811 Epoch: 1 Global Step: 82370 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:59:06,175-Speed 2624.55 samples/sec Loss 12.6077 LearningRate 0.0811 Epoch: 1 Global Step: 82380 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:59:10,071-Speed 2629.04 samples/sec Loss 12.6641 LearningRate 0.0811 Epoch: 1 Global Step: 82390 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:59:13,965-Speed 2630.74 samples/sec Loss 12.6721 LearningRate 0.0811 Epoch: 1 Global Step: 82400 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:59:17,860-Speed 2629.90 samples/sec Loss 12.7596 LearningRate 0.0811 Epoch: 1 Global Step: 82410 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:59:21,754-Speed 2630.35 samples/sec Loss 12.7226 LearningRate 0.0811 Epoch: 1 Global Step: 82420 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:59:25,646-Speed 2631.05 samples/sec Loss 12.7139 LearningRate 0.0811 Epoch: 1 Global Step: 82430 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:59:29,528-Speed 2638.37 samples/sec Loss 12.6202 LearningRate 0.0811 Epoch: 1 Global Step: 82440 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:59:33,423-Speed 2629.60 samples/sec Loss 12.6714 LearningRate 0.0811 Epoch: 1 Global Step: 82450 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:59:37,315-Speed 2631.71 samples/sec Loss 12.6482 LearningRate 0.0811 Epoch: 1 Global Step: 82460 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:59:41,209-Speed 2629.96 samples/sec Loss 12.6327 LearningRate 0.0811 Epoch: 1 Global Step: 82470 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:59:45,105-Speed 2629.25 samples/sec Loss 12.6468 LearningRate 0.0811 Epoch: 1 Global Step: 82480 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:59:49,010-Speed 2623.20 samples/sec Loss 12.6882 LearningRate 0.0811 Epoch: 1 Global Step: 82490 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:59:52,907-Speed 2628.60 samples/sec Loss 12.6799 LearningRate 0.0811 Epoch: 1 Global Step: 82500 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 04:59:56,807-Speed 2626.17 samples/sec Loss 12.8499 LearningRate 0.0811 Epoch: 1 Global Step: 82510 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:00:00,709-Speed 2624.63 samples/sec Loss 12.7245 LearningRate 0.0811 Epoch: 1 Global Step: 82520 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:00:04,621-Speed 2617.64 samples/sec Loss 12.7288 LearningRate 0.0811 Epoch: 1 Global Step: 82530 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:00:08,507-Speed 2636.30 samples/sec Loss 12.6739 LearningRate 0.0811 Epoch: 1 Global Step: 82540 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:00:12,413-Speed 2621.67 samples/sec Loss 12.7611 LearningRate 0.0811 Epoch: 1 Global Step: 82550 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:00:16,309-Speed 2628.92 samples/sec Loss 12.7189 LearningRate 0.0811 Epoch: 1 Global Step: 82560 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:00:20,205-Speed 2629.60 samples/sec Loss 12.5895 LearningRate 0.0811 Epoch: 1 Global Step: 82570 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:00:24,101-Speed 2628.86 samples/sec Loss 12.7070 LearningRate 0.0811 Epoch: 1 Global Step: 82580 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:00:27,998-Speed 2628.10 samples/sec Loss 12.6487 LearningRate 0.0811 Epoch: 1 Global Step: 82590 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:00:31,890-Speed 2631.67 samples/sec Loss 12.7384 LearningRate 0.0811 Epoch: 1 Global Step: 82600 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:00:35,785-Speed 2629.36 samples/sec Loss 12.7496 LearningRate 0.0811 Epoch: 1 Global Step: 82610 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:00:39,695-Speed 2619.55 samples/sec Loss 12.7746 LearningRate 0.0811 Epoch: 1 Global Step: 82620 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:00:43,734-Speed 2536.12 samples/sec Loss 12.6215 LearningRate 0.0811 Epoch: 1 Global Step: 82630 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:00:47,766-Speed 2540.17 samples/sec Loss 12.7360 LearningRate 0.0811 Epoch: 1 Global Step: 82640 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:00:51,666-Speed 2626.85 samples/sec Loss 12.7079 LearningRate 0.0811 Epoch: 1 Global Step: 82650 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:00:55,563-Speed 2628.36 samples/sec Loss 12.7221 LearningRate 0.0811 Epoch: 1 Global Step: 82660 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:00:59,460-Speed 2627.54 samples/sec Loss 12.7136 LearningRate 0.0811 Epoch: 1 Global Step: 82670 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:01:03,356-Speed 2628.82 samples/sec Loss 12.6344 LearningRate 0.0811 Epoch: 1 Global Step: 82680 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:01:07,236-Speed 2639.94 samples/sec Loss 12.7343 LearningRate 0.0811 Epoch: 1 Global Step: 82690 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:01:11,134-Speed 2627.79 samples/sec Loss 12.6650 LearningRate 0.0811 Epoch: 1 Global Step: 82700 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:01:15,058-Speed 2609.82 samples/sec Loss 12.6163 LearningRate 0.0811 Epoch: 1 Global Step: 82710 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:01:18,955-Speed 2628.47 samples/sec Loss 12.6202 LearningRate 0.0811 Epoch: 1 Global Step: 82720 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:01:22,855-Speed 2626.19 samples/sec Loss 12.6998 LearningRate 0.0810 Epoch: 1 Global Step: 82730 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:01:26,756-Speed 2626.05 samples/sec Loss 12.7759 LearningRate 0.0810 Epoch: 1 Global Step: 82740 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:01:30,658-Speed 2624.96 samples/sec Loss 12.7015 LearningRate 0.0810 Epoch: 1 Global Step: 82750 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:01:34,558-Speed 2626.05 samples/sec Loss 12.7171 LearningRate 0.0810 Epoch: 1 Global Step: 82760 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:01:38,451-Speed 2630.89 samples/sec Loss 12.7009 LearningRate 0.0810 Epoch: 1 Global Step: 82770 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:01:42,340-Speed 2633.38 samples/sec Loss 12.6431 LearningRate 0.0810 Epoch: 1 Global Step: 82780 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:01:46,234-Speed 2630.27 samples/sec Loss 12.7664 LearningRate 0.0810 Epoch: 1 Global Step: 82790 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:01:50,129-Speed 2629.83 samples/sec Loss 12.6279 LearningRate 0.0810 Epoch: 1 Global Step: 82800 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:01:54,032-Speed 2623.75 samples/sec Loss 12.8490 LearningRate 0.0810 Epoch: 1 Global Step: 82810 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:01:57,936-Speed 2624.14 samples/sec Loss 12.6811 LearningRate 0.0810 Epoch: 1 Global Step: 82820 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:02:01,864-Speed 2607.11 samples/sec Loss 12.6864 LearningRate 0.0810 Epoch: 1 Global Step: 82830 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:02:05,771-Speed 2622.16 samples/sec Loss 12.8650 LearningRate 0.0810 Epoch: 1 Global Step: 82840 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:02:09,668-Speed 2627.87 samples/sec Loss 12.6440 LearningRate 0.0810 Epoch: 1 Global Step: 82850 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:02:13,567-Speed 2627.22 samples/sec Loss 12.6982 LearningRate 0.0810 Epoch: 1 Global Step: 82860 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:02:17,464-Speed 2628.07 samples/sec Loss 12.8198 LearningRate 0.0810 Epoch: 1 Global Step: 82870 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:02:21,325-Speed 2652.76 samples/sec Loss 12.8725 LearningRate 0.0810 Epoch: 1 Global Step: 82880 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:02:25,218-Speed 2630.76 samples/sec Loss 12.7274 LearningRate 0.0810 Epoch: 1 Global Step: 82890 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:02:29,111-Speed 2630.76 samples/sec Loss 12.9040 LearningRate 0.0810 Epoch: 1 Global Step: 82900 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:02:33,004-Speed 2631.09 samples/sec Loss 12.7294 LearningRate 0.0810 Epoch: 1 Global Step: 82910 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:02:36,906-Speed 2624.73 samples/sec Loss 12.6230 LearningRate 0.0810 Epoch: 1 Global Step: 82920 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:02:40,817-Speed 2619.28 samples/sec Loss 12.9445 LearningRate 0.0810 Epoch: 1 Global Step: 82930 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:02:44,716-Speed 2627.21 samples/sec Loss 12.8571 LearningRate 0.0810 Epoch: 1 Global Step: 82940 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:02:48,635-Speed 2613.38 samples/sec Loss 12.7275 LearningRate 0.0810 Epoch: 1 Global Step: 82950 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:03:09,768-Speed 484.57 samples/sec Loss 12.6181 LearningRate 0.0810 Epoch: 2 Global Step: 82960 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:03:13,643-Speed 2643.91 samples/sec Loss 12.6812 LearningRate 0.0810 Epoch: 2 Global Step: 82970 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:03:17,526-Speed 2637.38 samples/sec Loss 12.7368 LearningRate 0.0810 Epoch: 2 Global Step: 82980 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:03:21,408-Speed 2639.01 samples/sec Loss 12.7815 LearningRate 0.0810 Epoch: 2 Global Step: 82990 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:03:25,289-Speed 2639.04 samples/sec Loss 12.7604 LearningRate 0.0810 Epoch: 2 Global Step: 83000 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:03:29,185-Speed 2628.80 samples/sec Loss 12.7710 LearningRate 0.0810 Epoch: 2 Global Step: 83010 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:03:33,090-Speed 2623.52 samples/sec Loss 12.8139 LearningRate 0.0810 Epoch: 2 Global Step: 83020 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:03:36,984-Speed 2630.33 samples/sec Loss 12.7055 LearningRate 0.0810 Epoch: 2 Global Step: 83030 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:03:40,870-Speed 2635.55 samples/sec Loss 12.6682 LearningRate 0.0810 Epoch: 2 Global Step: 83040 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:03:44,762-Speed 2631.98 samples/sec Loss 12.6308 LearningRate 0.0810 Epoch: 2 Global Step: 83050 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:03:48,666-Speed 2623.54 samples/sec Loss 12.5435 LearningRate 0.0810 Epoch: 2 Global Step: 83060 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:03:52,555-Speed 2633.51 samples/sec Loss 12.7190 LearningRate 0.0810 Epoch: 2 Global Step: 83070 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:03:56,447-Speed 2632.64 samples/sec Loss 12.8639 LearningRate 0.0810 Epoch: 2 Global Step: 83080 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:04:00,337-Speed 2632.53 samples/sec Loss 12.5883 LearningRate 0.0810 Epoch: 2 Global Step: 83090 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:04:04,229-Speed 2632.03 samples/sec Loss 12.7701 LearningRate 0.0810 Epoch: 2 Global Step: 83100 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:04:08,124-Speed 2629.20 samples/sec Loss 12.6821 LearningRate 0.0810 Epoch: 2 Global Step: 83110 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:04:12,020-Speed 2629.61 samples/sec Loss 12.8174 LearningRate 0.0810 Epoch: 2 Global Step: 83120 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:04:15,914-Speed 2629.95 samples/sec Loss 12.4704 LearningRate 0.0810 Epoch: 2 Global Step: 83130 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:04:19,806-Speed 2631.90 samples/sec Loss 12.7375 LearningRate 0.0810 Epoch: 2 Global Step: 83140 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:04:23,698-Speed 2632.07 samples/sec Loss 12.5765 LearningRate 0.0810 Epoch: 2 Global Step: 83150 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:04:27,597-Speed 2626.99 samples/sec Loss 12.6542 LearningRate 0.0810 Epoch: 2 Global Step: 83160 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:04:31,489-Speed 2631.48 samples/sec Loss 12.5551 LearningRate 0.0810 Epoch: 2 Global Step: 83170 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:04:35,367-Speed 2641.72 samples/sec Loss 12.5882 LearningRate 0.0810 Epoch: 2 Global Step: 83180 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:04:39,261-Speed 2630.77 samples/sec Loss 12.7456 LearningRate 0.0809 Epoch: 2 Global Step: 83190 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:04:43,153-Speed 2631.41 samples/sec Loss 12.6461 LearningRate 0.0809 Epoch: 2 Global Step: 83200 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:04:47,051-Speed 2627.69 samples/sec Loss 12.7515 LearningRate 0.0809 Epoch: 2 Global Step: 83210 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:04:50,949-Speed 2627.60 samples/sec Loss 12.6849 LearningRate 0.0809 Epoch: 2 Global Step: 83220 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:04:54,844-Speed 2629.94 samples/sec Loss 12.7803 LearningRate 0.0809 Epoch: 2 Global Step: 83230 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:04:58,747-Speed 2623.87 samples/sec Loss 12.6311 LearningRate 0.0809 Epoch: 2 Global Step: 83240 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:05:02,655-Speed 2620.68 samples/sec Loss 12.6180 LearningRate 0.0809 Epoch: 2 Global Step: 83250 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:05:06,549-Speed 2630.42 samples/sec Loss 12.5331 LearningRate 0.0809 Epoch: 2 Global Step: 83260 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:05:10,457-Speed 2621.34 samples/sec Loss 12.8018 LearningRate 0.0809 Epoch: 2 Global Step: 83270 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:05:14,335-Speed 2641.28 samples/sec Loss 12.7061 LearningRate 0.0809 Epoch: 2 Global Step: 83280 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:05:18,233-Speed 2627.44 samples/sec Loss 12.6106 LearningRate 0.0809 Epoch: 2 Global Step: 83290 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:05:22,127-Speed 2630.78 samples/sec Loss 12.7963 LearningRate 0.0809 Epoch: 2 Global Step: 83300 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:05:26,023-Speed 2628.65 samples/sec Loss 12.7176 LearningRate 0.0809 Epoch: 2 Global Step: 83310 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:05:29,921-Speed 2628.12 samples/sec Loss 12.6952 LearningRate 0.0809 Epoch: 2 Global Step: 83320 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:05:33,820-Speed 2626.39 samples/sec Loss 12.6567 LearningRate 0.0809 Epoch: 2 Global Step: 83330 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:05:37,717-Speed 2628.40 samples/sec Loss 12.6643 LearningRate 0.0809 Epoch: 2 Global Step: 83340 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:05:41,616-Speed 2626.94 samples/sec Loss 12.6514 LearningRate 0.0809 Epoch: 2 Global Step: 83350 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:05:45,513-Speed 2628.53 samples/sec Loss 12.5549 LearningRate 0.0809 Epoch: 2 Global Step: 83360 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:05:49,408-Speed 2631.72 samples/sec Loss 12.8500 LearningRate 0.0809 Epoch: 2 Global Step: 83370 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:05:53,303-Speed 2629.03 samples/sec Loss 12.7266 LearningRate 0.0809 Epoch: 2 Global Step: 83380 Fp16 Grad Scale: 524288 Required: 84 hours
Training: 2022-04-13 05:05:57,180-Speed 2641.95 samples/sec Loss 12.5868 LearningRate 0.0809 Epoch: 2 Global Step: 83390 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:06:01,074-Speed 2630.60 samples/sec Loss 12.5005 LearningRate 0.0809 Epoch: 2 Global Step: 83400 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:06:04,978-Speed 2623.57 samples/sec Loss 12.5689 LearningRate 0.0809 Epoch: 2 Global Step: 83410 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:06:08,879-Speed 2625.33 samples/sec Loss 12.6257 LearningRate 0.0809 Epoch: 2 Global Step: 83420 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:06:12,774-Speed 2629.43 samples/sec Loss 12.5673 LearningRate 0.0809 Epoch: 2 Global Step: 83430 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:06:16,674-Speed 2626.86 samples/sec Loss 12.7097 LearningRate 0.0809 Epoch: 2 Global Step: 83440 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:06:20,571-Speed 2628.38 samples/sec Loss 12.7615 LearningRate 0.0809 Epoch: 2 Global Step: 83450 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:06:24,470-Speed 2626.85 samples/sec Loss 12.5454 LearningRate 0.0809 Epoch: 2 Global Step: 83460 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:06:28,451-Speed 2572.77 samples/sec Loss 12.5263 LearningRate 0.0809 Epoch: 2 Global Step: 83470 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:06:32,349-Speed 2627.87 samples/sec Loss 12.6705 LearningRate 0.0809 Epoch: 2 Global Step: 83480 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:06:36,236-Speed 2634.75 samples/sec Loss 12.7527 LearningRate 0.0809 Epoch: 2 Global Step: 83490 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:06:40,206-Speed 2579.60 samples/sec Loss 12.6085 LearningRate 0.0809 Epoch: 2 Global Step: 83500 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:06:44,227-Speed 2547.63 samples/sec Loss 12.6121 LearningRate 0.0809 Epoch: 2 Global Step: 83510 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:06:48,147-Speed 2612.67 samples/sec Loss 12.4387 LearningRate 0.0809 Epoch: 2 Global Step: 83520 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:06:52,048-Speed 2626.02 samples/sec Loss 12.6112 LearningRate 0.0809 Epoch: 2 Global Step: 83530 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:06:55,945-Speed 2628.19 samples/sec Loss 12.6596 LearningRate 0.0809 Epoch: 2 Global Step: 83540 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:06:59,840-Speed 2630.18 samples/sec Loss 12.8412 LearningRate 0.0809 Epoch: 2 Global Step: 83550 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:07:03,734-Speed 2629.76 samples/sec Loss 12.8221 LearningRate 0.0809 Epoch: 2 Global Step: 83560 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:07:07,626-Speed 2631.66 samples/sec Loss 12.8442 LearningRate 0.0809 Epoch: 2 Global Step: 83570 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:07:11,518-Speed 2631.36 samples/sec Loss 12.6725 LearningRate 0.0809 Epoch: 2 Global Step: 83580 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:07:15,418-Speed 2626.40 samples/sec Loss 12.5799 LearningRate 0.0809 Epoch: 2 Global Step: 83590 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:07:19,358-Speed 2599.68 samples/sec Loss 12.6651 LearningRate 0.0809 Epoch: 2 Global Step: 83600 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:07:23,263-Speed 2623.06 samples/sec Loss 12.6659 LearningRate 0.0809 Epoch: 2 Global Step: 83610 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:07:27,166-Speed 2624.60 samples/sec Loss 12.6874 LearningRate 0.0809 Epoch: 2 Global Step: 83620 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:07:31,083-Speed 2614.61 samples/sec Loss 12.7385 LearningRate 0.0809 Epoch: 2 Global Step: 83630 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:07:34,972-Speed 2633.62 samples/sec Loss 12.8003 LearningRate 0.0809 Epoch: 2 Global Step: 83640 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:07:38,878-Speed 2622.44 samples/sec Loss 12.8965 LearningRate 0.0808 Epoch: 2 Global Step: 83650 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:07:42,770-Speed 2631.47 samples/sec Loss 12.6835 LearningRate 0.0808 Epoch: 2 Global Step: 83660 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:07:46,662-Speed 2631.55 samples/sec Loss 12.8774 LearningRate 0.0808 Epoch: 2 Global Step: 83670 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:07:50,556-Speed 2630.44 samples/sec Loss 12.7550 LearningRate 0.0808 Epoch: 2 Global Step: 83680 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:07:54,450-Speed 2630.22 samples/sec Loss 12.6474 LearningRate 0.0808 Epoch: 2 Global Step: 83690 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:07:58,351-Speed 2626.01 samples/sec Loss 12.7320 LearningRate 0.0808 Epoch: 2 Global Step: 83700 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:08:02,245-Speed 2629.79 samples/sec Loss 12.6085 LearningRate 0.0808 Epoch: 2 Global Step: 83710 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:08:06,136-Speed 2632.21 samples/sec Loss 12.7492 LearningRate 0.0808 Epoch: 2 Global Step: 83720 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:08:10,031-Speed 2630.24 samples/sec Loss 12.7555 LearningRate 0.0808 Epoch: 2 Global Step: 83730 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:08:13,929-Speed 2627.42 samples/sec Loss 12.6233 LearningRate 0.0808 Epoch: 2 Global Step: 83740 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:08:17,828-Speed 2626.86 samples/sec Loss 12.7859 LearningRate 0.0808 Epoch: 2 Global Step: 83750 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:08:21,721-Speed 2630.69 samples/sec Loss 12.6010 LearningRate 0.0808 Epoch: 2 Global Step: 83760 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:08:25,621-Speed 2626.42 samples/sec Loss 12.6841 LearningRate 0.0808 Epoch: 2 Global Step: 83770 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:08:29,533-Speed 2618.63 samples/sec Loss 12.7767 LearningRate 0.0808 Epoch: 2 Global Step: 83780 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:08:33,443-Speed 2619.35 samples/sec Loss 12.7096 LearningRate 0.0808 Epoch: 2 Global Step: 83790 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:08:37,354-Speed 2619.33 samples/sec Loss 12.6757 LearningRate 0.0808 Epoch: 2 Global Step: 83800 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:08:41,264-Speed 2619.20 samples/sec Loss 12.5644 LearningRate 0.0808 Epoch: 2 Global Step: 83810 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:08:45,199-Speed 2603.31 samples/sec Loss 12.6744 LearningRate 0.0808 Epoch: 2 Global Step: 83820 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:08:49,109-Speed 2619.69 samples/sec Loss 12.7836 LearningRate 0.0808 Epoch: 2 Global Step: 83830 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:08:53,024-Speed 2616.57 samples/sec Loss 12.7465 LearningRate 0.0808 Epoch: 2 Global Step: 83840 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:08:56,928-Speed 2623.74 samples/sec Loss 12.6630 LearningRate 0.0808 Epoch: 2 Global Step: 83850 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:09:00,835-Speed 2621.33 samples/sec Loss 12.7791 LearningRate 0.0808 Epoch: 2 Global Step: 83860 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:09:04,736-Speed 2626.25 samples/sec Loss 12.6649 LearningRate 0.0808 Epoch: 2 Global Step: 83870 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:09:08,637-Speed 2625.10 samples/sec Loss 12.7156 LearningRate 0.0808 Epoch: 2 Global Step: 83880 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:09:12,545-Speed 2621.07 samples/sec Loss 12.7347 LearningRate 0.0808 Epoch: 2 Global Step: 83890 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:09:16,460-Speed 2616.14 samples/sec Loss 12.7525 LearningRate 0.0808 Epoch: 2 Global Step: 83900 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:09:20,360-Speed 2626.32 samples/sec Loss 12.8300 LearningRate 0.0808 Epoch: 2 Global Step: 83910 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:09:24,264-Speed 2624.45 samples/sec Loss 12.6140 LearningRate 0.0808 Epoch: 2 Global Step: 83920 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:09:28,167-Speed 2623.71 samples/sec Loss 12.7353 LearningRate 0.0808 Epoch: 2 Global Step: 83930 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:09:32,093-Speed 2609.51 samples/sec Loss 12.7445 LearningRate 0.0808 Epoch: 2 Global Step: 83940 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:09:36,007-Speed 2617.00 samples/sec Loss 12.7494 LearningRate 0.0808 Epoch: 2 Global Step: 83950 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:09:39,913-Speed 2621.60 samples/sec Loss 12.7077 LearningRate 0.0808 Epoch: 2 Global Step: 83960 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:09:43,823-Speed 2619.99 samples/sec Loss 12.8054 LearningRate 0.0808 Epoch: 2 Global Step: 83970 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:09:47,733-Speed 2619.50 samples/sec Loss 12.6227 LearningRate 0.0808 Epoch: 2 Global Step: 83980 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:09:51,634-Speed 2625.57 samples/sec Loss 12.6517 LearningRate 0.0808 Epoch: 2 Global Step: 83990 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:09:55,538-Speed 2624.28 samples/sec Loss 12.7417 LearningRate 0.0808 Epoch: 2 Global Step: 84000 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:09:59,433-Speed 2629.44 samples/sec Loss 12.8201 LearningRate 0.0808 Epoch: 2 Global Step: 84010 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:10:03,366-Speed 2604.58 samples/sec Loss 12.6396 LearningRate 0.0808 Epoch: 2 Global Step: 84020 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:10:07,286-Speed 2612.67 samples/sec Loss 12.7039 LearningRate 0.0808 Epoch: 2 Global Step: 84030 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:10:11,212-Speed 2608.81 samples/sec Loss 12.5985 LearningRate 0.0808 Epoch: 2 Global Step: 84040 Fp16 Grad Scale: 524288 Required: 84 hours
Training: 2022-04-13 05:10:15,095-Speed 2638.10 samples/sec Loss 12.5920 LearningRate 0.0808 Epoch: 2 Global Step: 84050 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:10:18,998-Speed 2624.05 samples/sec Loss 12.6899 LearningRate 0.0808 Epoch: 2 Global Step: 84060 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:10:22,905-Speed 2621.61 samples/sec Loss 12.7865 LearningRate 0.0808 Epoch: 2 Global Step: 84070 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:10:26,811-Speed 2622.04 samples/sec Loss 12.6616 LearningRate 0.0808 Epoch: 2 Global Step: 84080 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:10:30,712-Speed 2625.93 samples/sec Loss 12.7313 LearningRate 0.0808 Epoch: 2 Global Step: 84090 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:10:34,626-Speed 2617.09 samples/sec Loss 12.7410 LearningRate 0.0808 Epoch: 2 Global Step: 84100 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:10:38,529-Speed 2624.52 samples/sec Loss 12.6205 LearningRate 0.0808 Epoch: 2 Global Step: 84110 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:10:42,450-Speed 2611.63 samples/sec Loss 12.8531 LearningRate 0.0807 Epoch: 2 Global Step: 84120 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:10:46,390-Speed 2599.63 samples/sec Loss 12.6965 LearningRate 0.0807 Epoch: 2 Global Step: 84130 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:10:50,298-Speed 2620.72 samples/sec Loss 12.7952 LearningRate 0.0807 Epoch: 2 Global Step: 84140 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:10:54,235-Speed 2602.61 samples/sec Loss 12.5031 LearningRate 0.0807 Epoch: 2 Global Step: 84150 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:10:58,136-Speed 2625.89 samples/sec Loss 12.7377 LearningRate 0.0807 Epoch: 2 Global Step: 84160 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:11:02,034-Speed 2627.66 samples/sec Loss 12.5362 LearningRate 0.0807 Epoch: 2 Global Step: 84170 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:11:05,939-Speed 2622.34 samples/sec Loss 12.6135 LearningRate 0.0807 Epoch: 2 Global Step: 84180 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:11:09,845-Speed 2622.87 samples/sec Loss 12.6470 LearningRate 0.0807 Epoch: 2 Global Step: 84190 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:11:13,743-Speed 2627.37 samples/sec Loss 12.6101 LearningRate 0.0807 Epoch: 2 Global Step: 84200 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:11:17,642-Speed 2626.77 samples/sec Loss 12.7002 LearningRate 0.0807 Epoch: 2 Global Step: 84210 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:11:21,537-Speed 2629.42 samples/sec Loss 12.5859 LearningRate 0.0807 Epoch: 2 Global Step: 84220 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:11:25,433-Speed 2628.86 samples/sec Loss 12.4878 LearningRate 0.0807 Epoch: 2 Global Step: 84230 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:11:29,335-Speed 2626.00 samples/sec Loss 12.6438 LearningRate 0.0807 Epoch: 2 Global Step: 84240 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:11:33,238-Speed 2623.78 samples/sec Loss 12.6012 LearningRate 0.0807 Epoch: 2 Global Step: 84250 Fp16 Grad Scale: 524288 Required: 84 hours
Training: 2022-04-13 05:11:37,113-Speed 2642.99 samples/sec Loss 12.5769 LearningRate 0.0807 Epoch: 2 Global Step: 84260 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:11:41,017-Speed 2623.68 samples/sec Loss 12.6687 LearningRate 0.0807 Epoch: 2 Global Step: 84270 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:11:44,923-Speed 2622.62 samples/sec Loss 12.5435 LearningRate 0.0807 Epoch: 2 Global Step: 84280 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:11:48,827-Speed 2623.21 samples/sec Loss 12.8105 LearningRate 0.0807 Epoch: 2 Global Step: 84290 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:11:52,727-Speed 2627.04 samples/sec Loss 12.7847 LearningRate 0.0807 Epoch: 2 Global Step: 84300 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:11:56,629-Speed 2624.89 samples/sec Loss 12.7399 LearningRate 0.0807 Epoch: 2 Global Step: 84310 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:12:00,527-Speed 2627.30 samples/sec Loss 12.7072 LearningRate 0.0807 Epoch: 2 Global Step: 84320 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:12:04,433-Speed 2622.70 samples/sec Loss 12.6864 LearningRate 0.0807 Epoch: 2 Global Step: 84330 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:12:08,333-Speed 2626.34 samples/sec Loss 12.6342 LearningRate 0.0807 Epoch: 2 Global Step: 84340 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:12:12,232-Speed 2626.73 samples/sec Loss 12.7220 LearningRate 0.0807 Epoch: 2 Global Step: 84350 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:12:16,124-Speed 2631.95 samples/sec Loss 12.5929 LearningRate 0.0807 Epoch: 2 Global Step: 84360 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:12:20,018-Speed 2630.20 samples/sec Loss 12.6315 LearningRate 0.0807 Epoch: 2 Global Step: 84370 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:12:23,914-Speed 2629.13 samples/sec Loss 12.6572 LearningRate 0.0807 Epoch: 2 Global Step: 84380 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:12:27,810-Speed 2629.48 samples/sec Loss 12.5986 LearningRate 0.0807 Epoch: 2 Global Step: 84390 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:12:31,713-Speed 2623.92 samples/sec Loss 12.6473 LearningRate 0.0807 Epoch: 2 Global Step: 84400 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:12:35,756-Speed 2533.22 samples/sec Loss 12.8007 LearningRate 0.0807 Epoch: 2 Global Step: 84410 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:12:39,647-Speed 2632.27 samples/sec Loss 12.7784 LearningRate 0.0807 Epoch: 2 Global Step: 84420 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:12:43,543-Speed 2629.66 samples/sec Loss 12.6485 LearningRate 0.0807 Epoch: 2 Global Step: 84430 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:12:47,439-Speed 2628.56 samples/sec Loss 12.8082 LearningRate 0.0807 Epoch: 2 Global Step: 84440 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:12:51,335-Speed 2629.17 samples/sec Loss 12.5653 LearningRate 0.0807 Epoch: 2 Global Step: 84450 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:12:55,254-Speed 2613.73 samples/sec Loss 12.7548 LearningRate 0.0807 Epoch: 2 Global Step: 84460 Fp16 Grad Scale: 524288 Required: 84 hours
Training: 2022-04-13 05:12:59,149-Speed 2629.50 samples/sec Loss 12.5432 LearningRate 0.0807 Epoch: 2 Global Step: 84470 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:13:03,045-Speed 2629.03 samples/sec Loss 12.6680 LearningRate 0.0807 Epoch: 2 Global Step: 84480 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:13:06,938-Speed 2631.14 samples/sec Loss 12.6283 LearningRate 0.0807 Epoch: 2 Global Step: 84490 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:13:10,892-Speed 2589.90 samples/sec Loss 12.6699 LearningRate 0.0807 Epoch: 2 Global Step: 84500 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:13:14,800-Speed 2621.75 samples/sec Loss 12.6858 LearningRate 0.0807 Epoch: 2 Global Step: 84510 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:13:18,691-Speed 2632.27 samples/sec Loss 12.7933 LearningRate 0.0807 Epoch: 2 Global Step: 84520 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:13:22,585-Speed 2630.34 samples/sec Loss 12.6295 LearningRate 0.0807 Epoch: 2 Global Step: 84530 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:13:26,478-Speed 2631.31 samples/sec Loss 12.6732 LearningRate 0.0807 Epoch: 2 Global Step: 84540 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:13:30,372-Speed 2630.30 samples/sec Loss 12.5617 LearningRate 0.0807 Epoch: 2 Global Step: 84550 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:13:34,254-Speed 2638.54 samples/sec Loss 12.6293 LearningRate 0.0807 Epoch: 2 Global Step: 84560 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:13:38,151-Speed 2628.34 samples/sec Loss 12.5032 LearningRate 0.0807 Epoch: 2 Global Step: 84570 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:13:42,110-Speed 2586.56 samples/sec Loss 12.7094 LearningRate 0.0806 Epoch: 2 Global Step: 84580 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:13:46,012-Speed 2626.45 samples/sec Loss 12.5143 LearningRate 0.0806 Epoch: 2 Global Step: 84590 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:13:49,901-Speed 2633.37 samples/sec Loss 12.5269 LearningRate 0.0806 Epoch: 2 Global Step: 84600 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:13:53,795-Speed 2630.54 samples/sec Loss 12.6594 LearningRate 0.0806 Epoch: 2 Global Step: 84610 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:13:57,698-Speed 2624.30 samples/sec Loss 12.8048 LearningRate 0.0806 Epoch: 2 Global Step: 84620 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:14:01,591-Speed 2630.52 samples/sec Loss 12.5336 LearningRate 0.0806 Epoch: 2 Global Step: 84630 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:14:05,491-Speed 2626.15 samples/sec Loss 12.5849 LearningRate 0.0806 Epoch: 2 Global Step: 84640 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:14:09,405-Speed 2616.87 samples/sec Loss 12.7306 LearningRate 0.0806 Epoch: 2 Global Step: 84650 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:14:13,509-Speed 2495.98 samples/sec Loss 12.6487 LearningRate 0.0806 Epoch: 2 Global Step: 84660 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:14:17,403-Speed 2630.48 samples/sec Loss 12.7807 LearningRate 0.0806 Epoch: 2 Global Step: 84670 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:14:21,308-Speed 2622.83 samples/sec Loss 12.4878 LearningRate 0.0806 Epoch: 2 Global Step: 84680 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:14:25,206-Speed 2628.53 samples/sec Loss 12.6558 LearningRate 0.0806 Epoch: 2 Global Step: 84690 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:14:29,103-Speed 2627.59 samples/sec Loss 12.7988 LearningRate 0.0806 Epoch: 2 Global Step: 84700 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:14:32,997-Speed 2630.94 samples/sec Loss 12.6838 LearningRate 0.0806 Epoch: 2 Global Step: 84710 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:14:36,900-Speed 2623.62 samples/sec Loss 12.7378 LearningRate 0.0806 Epoch: 2 Global Step: 84720 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:14:40,808-Speed 2621.10 samples/sec Loss 12.7757 LearningRate 0.0806 Epoch: 2 Global Step: 84730 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:14:44,703-Speed 2629.16 samples/sec Loss 12.5519 LearningRate 0.0806 Epoch: 2 Global Step: 84740 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:14:48,601-Speed 2628.57 samples/sec Loss 12.5955 LearningRate 0.0806 Epoch: 2 Global Step: 84750 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:14:52,479-Speed 2641.15 samples/sec Loss 12.7336 LearningRate 0.0806 Epoch: 2 Global Step: 84760 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:14:56,355-Speed 2642.61 samples/sec Loss 12.5958 LearningRate 0.0806 Epoch: 2 Global Step: 84770 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:15:00,252-Speed 2628.19 samples/sec Loss 12.7699 LearningRate 0.0806 Epoch: 2 Global Step: 84780 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:15:04,216-Speed 2584.08 samples/sec Loss 12.7556 LearningRate 0.0806 Epoch: 2 Global Step: 84790 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:15:08,139-Speed 2610.22 samples/sec Loss 12.5051 LearningRate 0.0806 Epoch: 2 Global Step: 84800 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:15:12,169-Speed 2541.62 samples/sec Loss 12.7456 LearningRate 0.0806 Epoch: 2 Global Step: 84810 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:15:16,165-Speed 2562.81 samples/sec Loss 12.7446 LearningRate 0.0806 Epoch: 2 Global Step: 84820 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:15:20,070-Speed 2623.69 samples/sec Loss 12.6422 LearningRate 0.0806 Epoch: 2 Global Step: 84830 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:15:23,968-Speed 2627.58 samples/sec Loss 12.5929 LearningRate 0.0806 Epoch: 2 Global Step: 84840 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:15:27,862-Speed 2630.34 samples/sec Loss 12.5913 LearningRate 0.0806 Epoch: 2 Global Step: 84850 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:15:31,756-Speed 2630.39 samples/sec Loss 12.5731 LearningRate 0.0806 Epoch: 2 Global Step: 84860 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:15:35,653-Speed 2628.07 samples/sec Loss 12.7372 LearningRate 0.0806 Epoch: 2 Global Step: 84870 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:15:39,558-Speed 2622.62 samples/sec Loss 12.6132 LearningRate 0.0806 Epoch: 2 Global Step: 84880 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:15:43,530-Speed 2578.98 samples/sec Loss 12.6684 LearningRate 0.0806 Epoch: 2 Global Step: 84890 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:15:47,608-Speed 2511.31 samples/sec Loss 12.5944 LearningRate 0.0806 Epoch: 2 Global Step: 84900 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:15:51,511-Speed 2624.42 samples/sec Loss 12.5465 LearningRate 0.0806 Epoch: 2 Global Step: 84910 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:15:55,416-Speed 2623.37 samples/sec Loss 12.7334 LearningRate 0.0806 Epoch: 2 Global Step: 84920 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:15:59,295-Speed 2640.72 samples/sec Loss 12.5934 LearningRate 0.0806 Epoch: 2 Global Step: 84930 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:16:03,311-Speed 2550.40 samples/sec Loss 12.7536 LearningRate 0.0806 Epoch: 2 Global Step: 84940 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:16:07,358-Speed 2530.83 samples/sec Loss 12.5569 LearningRate 0.0806 Epoch: 2 Global Step: 84950 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:16:11,369-Speed 2553.66 samples/sec Loss 12.5957 LearningRate 0.0806 Epoch: 2 Global Step: 84960 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:16:15,272-Speed 2624.48 samples/sec Loss 12.5676 LearningRate 0.0806 Epoch: 2 Global Step: 84970 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:16:19,167-Speed 2629.20 samples/sec Loss 12.6192 LearningRate 0.0806 Epoch: 2 Global Step: 84980 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:16:23,078-Speed 2618.68 samples/sec Loss 12.5580 LearningRate 0.0806 Epoch: 2 Global Step: 84990 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:16:26,990-Speed 2618.43 samples/sec Loss 12.5390 LearningRate 0.0806 Epoch: 2 Global Step: 85000 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:16:30,902-Speed 2618.61 samples/sec Loss 12.6126 LearningRate 0.0806 Epoch: 2 Global Step: 85010 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:16:34,806-Speed 2623.31 samples/sec Loss 12.6206 LearningRate 0.0806 Epoch: 2 Global Step: 85020 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:16:38,712-Speed 2622.33 samples/sec Loss 12.6252 LearningRate 0.0806 Epoch: 2 Global Step: 85030 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:16:42,616-Speed 2623.66 samples/sec Loss 12.6244 LearningRate 0.0805 Epoch: 2 Global Step: 85040 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:16:46,529-Speed 2617.33 samples/sec Loss 12.6847 LearningRate 0.0805 Epoch: 2 Global Step: 85050 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:16:50,459-Speed 2606.38 samples/sec Loss 12.6144 LearningRate 0.0805 Epoch: 2 Global Step: 85060 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:16:54,365-Speed 2622.53 samples/sec Loss 12.5373 LearningRate 0.0805 Epoch: 2 Global Step: 85070 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:16:58,272-Speed 2621.79 samples/sec Loss 12.5378 LearningRate 0.0805 Epoch: 2 Global Step: 85080 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:17:02,238-Speed 2582.40 samples/sec Loss 12.8040 LearningRate 0.0805 Epoch: 2 Global Step: 85090 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:17:06,131-Speed 2630.78 samples/sec Loss 12.6422 LearningRate 0.0805 Epoch: 2 Global Step: 85100 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:17:10,016-Speed 2637.23 samples/sec Loss 12.6515 LearningRate 0.0805 Epoch: 2 Global Step: 85110 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:17:13,927-Speed 2618.56 samples/sec Loss 12.7767 LearningRate 0.0805 Epoch: 2 Global Step: 85120 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:17:17,839-Speed 2618.14 samples/sec Loss 12.6138 LearningRate 0.0805 Epoch: 2 Global Step: 85130 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:17:21,733-Speed 2630.83 samples/sec Loss 12.6180 LearningRate 0.0805 Epoch: 2 Global Step: 85140 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:17:25,622-Speed 2633.35 samples/sec Loss 12.6384 LearningRate 0.0805 Epoch: 2 Global Step: 85150 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:17:29,514-Speed 2631.60 samples/sec Loss 12.4843 LearningRate 0.0805 Epoch: 2 Global Step: 85160 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:17:33,405-Speed 2632.77 samples/sec Loss 12.5864 LearningRate 0.0805 Epoch: 2 Global Step: 85170 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:17:37,299-Speed 2630.30 samples/sec Loss 12.5278 LearningRate 0.0805 Epoch: 2 Global Step: 85180 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:17:41,191-Speed 2631.47 samples/sec Loss 12.7651 LearningRate 0.0805 Epoch: 2 Global Step: 85190 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:17:45,089-Speed 2628.03 samples/sec Loss 12.6728 LearningRate 0.0805 Epoch: 2 Global Step: 85200 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:17:48,982-Speed 2630.88 samples/sec Loss 12.6963 LearningRate 0.0805 Epoch: 2 Global Step: 85210 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:17:52,875-Speed 2631.10 samples/sec Loss 12.6352 LearningRate 0.0805 Epoch: 2 Global Step: 85220 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:17:56,772-Speed 2628.18 samples/sec Loss 12.5739 LearningRate 0.0805 Epoch: 2 Global Step: 85230 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:18:00,755-Speed 2571.61 samples/sec Loss 12.5854 LearningRate 0.0805 Epoch: 2 Global Step: 85240 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:18:04,651-Speed 2629.15 samples/sec Loss 12.4850 LearningRate 0.0805 Epoch: 2 Global Step: 85250 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:18:08,546-Speed 2629.10 samples/sec Loss 12.6020 LearningRate 0.0805 Epoch: 2 Global Step: 85260 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:18:12,442-Speed 2629.22 samples/sec Loss 12.6308 LearningRate 0.0805 Epoch: 2 Global Step: 85270 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:18:16,346-Speed 2623.24 samples/sec Loss 12.5984 LearningRate 0.0805 Epoch: 2 Global Step: 85280 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:18:20,226-Speed 2640.69 samples/sec Loss 12.6451 LearningRate 0.0805 Epoch: 2 Global Step: 85290 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:18:24,144-Speed 2613.92 samples/sec Loss 12.6425 LearningRate 0.0805 Epoch: 2 Global Step: 85300 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:18:28,146-Speed 2559.39 samples/sec Loss 12.6161 LearningRate 0.0805 Epoch: 2 Global Step: 85310 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:18:32,194-Speed 2530.52 samples/sec Loss 12.6923 LearningRate 0.0805 Epoch: 2 Global Step: 85320 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:18:36,089-Speed 2629.07 samples/sec Loss 12.8329 LearningRate 0.0805 Epoch: 2 Global Step: 85330 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:18:39,982-Speed 2630.66 samples/sec Loss 12.6846 LearningRate 0.0805 Epoch: 2 Global Step: 85340 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:18:43,894-Speed 2618.64 samples/sec Loss 12.5376 LearningRate 0.0805 Epoch: 2 Global Step: 85350 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:18:47,800-Speed 2622.72 samples/sec Loss 12.4275 LearningRate 0.0805 Epoch: 2 Global Step: 85360 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:18:51,716-Speed 2615.79 samples/sec Loss 12.7769 LearningRate 0.0805 Epoch: 2 Global Step: 85370 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:18:55,614-Speed 2627.83 samples/sec Loss 12.6142 LearningRate 0.0805 Epoch: 2 Global Step: 85380 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:18:59,520-Speed 2622.57 samples/sec Loss 12.6423 LearningRate 0.0805 Epoch: 2 Global Step: 85390 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:19:03,427-Speed 2621.69 samples/sec Loss 12.5875 LearningRate 0.0805 Epoch: 2 Global Step: 85400 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:19:07,345-Speed 2613.80 samples/sec Loss 12.6910 LearningRate 0.0805 Epoch: 2 Global Step: 85410 Fp16 Grad Scale: 65536 Required: 84 hours
Training: 2022-04-13 05:19:11,242-Speed 2628.46 samples/sec Loss 12.6411 LearningRate 0.0805 Epoch: 2 Global Step: 85420 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:19:15,151-Speed 2620.42 samples/sec Loss 12.6325 LearningRate 0.0805 Epoch: 2 Global Step: 85430 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:19:19,057-Speed 2621.70 samples/sec Loss 12.5757 LearningRate 0.0805 Epoch: 2 Global Step: 85440 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:19:22,967-Speed 2620.44 samples/sec Loss 12.5994 LearningRate 0.0805 Epoch: 2 Global Step: 85450 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:19:26,879-Speed 2618.11 samples/sec Loss 12.5990 LearningRate 0.0805 Epoch: 2 Global Step: 85460 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:19:30,793-Speed 2617.02 samples/sec Loss 12.5426 LearningRate 0.0805 Epoch: 2 Global Step: 85470 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:19:34,721-Speed 2607.87 samples/sec Loss 12.6589 LearningRate 0.0805 Epoch: 2 Global Step: 85480 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:19:38,664-Speed 2597.06 samples/sec Loss 12.6778 LearningRate 0.0805 Epoch: 2 Global Step: 85490 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:19:42,563-Speed 2627.04 samples/sec Loss 12.7073 LearningRate 0.0804 Epoch: 2 Global Step: 85500 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:19:46,624-Speed 2522.41 samples/sec Loss 12.5322 LearningRate 0.0804 Epoch: 2 Global Step: 85510 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:19:50,569-Speed 2596.56 samples/sec Loss 12.7801 LearningRate 0.0804 Epoch: 2 Global Step: 85520 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:19:54,473-Speed 2623.19 samples/sec Loss 12.7782 LearningRate 0.0804 Epoch: 2 Global Step: 85530 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:19:58,377-Speed 2624.12 samples/sec Loss 12.7357 LearningRate 0.0804 Epoch: 2 Global Step: 85540 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:20:02,277-Speed 2626.65 samples/sec Loss 12.7583 LearningRate 0.0804 Epoch: 2 Global Step: 85550 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:20:06,172-Speed 2629.10 samples/sec Loss 12.6217 LearningRate 0.0804 Epoch: 2 Global Step: 85560 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:20:10,070-Speed 2627.68 samples/sec Loss 12.6493 LearningRate 0.0804 Epoch: 2 Global Step: 85570 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:20:13,976-Speed 2622.09 samples/sec Loss 12.7011 LearningRate 0.0804 Epoch: 2 Global Step: 85580 Fp16 Grad Scale: 262144 Required: 84 hours
Training: 2022-04-13 05:20:17,848-Speed 2645.28 samples/sec Loss 12.6061 LearningRate 0.0804 Epoch: 2 Global Step: 85590 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:20:21,748-Speed 2626.79 samples/sec Loss 12.6005 LearningRate 0.0804 Epoch: 2 Global Step: 85600 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:20:25,644-Speed 2629.58 samples/sec Loss 12.5137 LearningRate 0.0804 Epoch: 2 Global Step: 85610 Fp16 Grad Scale: 131072 Required: 84 hours
Training: 2022-04-13 05:20:29,541-Speed 2627.72 samples/sec Loss 12.5526 LearningRate 0.0804 Epoch: 2 Global Step: 85620 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:20:33,453-Speed 2618.67 samples/sec Loss 12.4882 LearningRate 0.0804 Epoch: 2 Global Step: 85630 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:20:37,354-Speed 2625.17 samples/sec Loss 12.4238 LearningRate 0.0804 Epoch: 2 Global Step: 85640 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:20:41,317-Speed 2584.65 samples/sec Loss 12.5519 LearningRate 0.0804 Epoch: 2 Global Step: 85650 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:20:45,409-Speed 2502.69 samples/sec Loss 12.5808 LearningRate 0.0804 Epoch: 2 Global Step: 85660 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:20:49,345-Speed 2603.49 samples/sec Loss 12.6116 LearningRate 0.0804 Epoch: 2 Global Step: 85670 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:20:53,243-Speed 2627.19 samples/sec Loss 12.5540 LearningRate 0.0804 Epoch: 2 Global Step: 85680 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:20:57,139-Speed 2628.91 samples/sec Loss 12.6394 LearningRate 0.0804 Epoch: 2 Global Step: 85690 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:21:01,039-Speed 2626.87 samples/sec Loss 12.5670 LearningRate 0.0804 Epoch: 2 Global Step: 85700 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:21:04,938-Speed 2627.67 samples/sec Loss 12.5786 LearningRate 0.0804 Epoch: 2 Global Step: 85710 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:21:08,836-Speed 2627.55 samples/sec Loss 12.6309 LearningRate 0.0804 Epoch: 2 Global Step: 85720 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:21:12,731-Speed 2628.93 samples/sec Loss 12.5673 LearningRate 0.0804 Epoch: 2 Global Step: 85730 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:21:16,640-Speed 2620.32 samples/sec Loss 12.6501 LearningRate 0.0804 Epoch: 2 Global Step: 85740 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:21:20,544-Speed 2624.14 samples/sec Loss 12.5459 LearningRate 0.0804 Epoch: 2 Global Step: 85750 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:21:24,440-Speed 2628.69 samples/sec Loss 12.6845 LearningRate 0.0804 Epoch: 2 Global Step: 85760 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:21:28,339-Speed 2626.87 samples/sec Loss 12.6532 LearningRate 0.0804 Epoch: 2 Global Step: 85770 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:21:32,250-Speed 2619.35 samples/sec Loss 12.6822 LearningRate 0.0804 Epoch: 2 Global Step: 85780 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:21:36,156-Speed 2621.88 samples/sec Loss 12.5162 LearningRate 0.0804 Epoch: 2 Global Step: 85790 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:21:40,067-Speed 2618.70 samples/sec Loss 12.5610 LearningRate 0.0804 Epoch: 2 Global Step: 85800 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:21:43,963-Speed 2629.55 samples/sec Loss 12.6083 LearningRate 0.0804 Epoch: 2 Global Step: 85810 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:21:47,855-Speed 2631.68 samples/sec Loss 12.5728 LearningRate 0.0804 Epoch: 2 Global Step: 85820 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:21:51,732-Speed 2641.42 samples/sec Loss 12.5581 LearningRate 0.0804 Epoch: 2 Global Step: 85830 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:21:55,635-Speed 2624.73 samples/sec Loss 12.6421 LearningRate 0.0804 Epoch: 2 Global Step: 85840 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:21:59,530-Speed 2629.22 samples/sec Loss 12.5954 LearningRate 0.0804 Epoch: 2 Global Step: 85850 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:22:03,425-Speed 2630.04 samples/sec Loss 12.5197 LearningRate 0.0804 Epoch: 2 Global Step: 85860 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:22:07,319-Speed 2630.68 samples/sec Loss 12.5688 LearningRate 0.0804 Epoch: 2 Global Step: 85870 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:22:11,255-Speed 2602.11 samples/sec Loss 12.5571 LearningRate 0.0804 Epoch: 2 Global Step: 85880 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:22:15,143-Speed 2634.60 samples/sec Loss 12.4978 LearningRate 0.0804 Epoch: 2 Global Step: 85890 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:22:19,040-Speed 2628.53 samples/sec Loss 12.4238 LearningRate 0.0804 Epoch: 2 Global Step: 85900 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:22:22,943-Speed 2624.27 samples/sec Loss 12.5431 LearningRate 0.0804 Epoch: 2 Global Step: 85910 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:22:26,897-Speed 2590.54 samples/sec Loss 12.7123 LearningRate 0.0804 Epoch: 2 Global Step: 85920 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:22:30,780-Speed 2638.09 samples/sec Loss 12.5622 LearningRate 0.0804 Epoch: 2 Global Step: 85930 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:22:34,705-Speed 2609.18 samples/sec Loss 12.4802 LearningRate 0.0804 Epoch: 2 Global Step: 85940 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:22:38,610-Speed 2623.39 samples/sec Loss 12.4552 LearningRate 0.0804 Epoch: 2 Global Step: 85950 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:22:42,503-Speed 2630.74 samples/sec Loss 12.5740 LearningRate 0.0803 Epoch: 2 Global Step: 85960 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:22:46,407-Speed 2624.01 samples/sec Loss 12.6258 LearningRate 0.0803 Epoch: 2 Global Step: 85970 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:22:50,311-Speed 2623.29 samples/sec Loss 12.6021 LearningRate 0.0803 Epoch: 2 Global Step: 85980 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:22:54,230-Speed 2614.05 samples/sec Loss 12.5940 LearningRate 0.0803 Epoch: 2 Global Step: 85990 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:22:58,121-Speed 2632.42 samples/sec Loss 12.6143 LearningRate 0.0803 Epoch: 2 Global Step: 86000 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:23:02,016-Speed 2629.63 samples/sec Loss 12.6218 LearningRate 0.0803 Epoch: 2 Global Step: 86010 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:23:05,913-Speed 2628.15 samples/sec Loss 12.5078 LearningRate 0.0803 Epoch: 2 Global Step: 86020 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:23:09,806-Speed 2631.09 samples/sec Loss 12.7434 LearningRate 0.0803 Epoch: 2 Global Step: 86030 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:23:13,684-Speed 2641.54 samples/sec Loss 12.5888 LearningRate 0.0803 Epoch: 2 Global Step: 86040 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:23:17,577-Speed 2630.34 samples/sec Loss 12.6355 LearningRate 0.0803 Epoch: 2 Global Step: 86050 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:23:21,470-Speed 2631.65 samples/sec Loss 12.7385 LearningRate 0.0803 Epoch: 2 Global Step: 86060 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:23:25,365-Speed 2629.69 samples/sec Loss 12.5696 LearningRate 0.0803 Epoch: 2 Global Step: 86070 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:23:29,257-Speed 2631.62 samples/sec Loss 12.5925 LearningRate 0.0803 Epoch: 2 Global Step: 86080 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:23:33,150-Speed 2630.72 samples/sec Loss 12.5885 LearningRate 0.0803 Epoch: 2 Global Step: 86090 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:23:37,050-Speed 2626.48 samples/sec Loss 12.6926 LearningRate 0.0803 Epoch: 2 Global Step: 86100 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:23:40,947-Speed 2628.51 samples/sec Loss 12.5802 LearningRate 0.0803 Epoch: 2 Global Step: 86110 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:23:44,840-Speed 2631.10 samples/sec Loss 12.6867 LearningRate 0.0803 Epoch: 2 Global Step: 86120 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:23:48,739-Speed 2626.46 samples/sec Loss 12.5719 LearningRate 0.0803 Epoch: 2 Global Step: 86130 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:23:52,646-Speed 2622.24 samples/sec Loss 12.5666 LearningRate 0.0803 Epoch: 2 Global Step: 86140 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:23:56,540-Speed 2630.38 samples/sec Loss 12.6406 LearningRate 0.0803 Epoch: 2 Global Step: 86150 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:24:00,434-Speed 2630.23 samples/sec Loss 12.5468 LearningRate 0.0803 Epoch: 2 Global Step: 86160 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:24:04,332-Speed 2627.84 samples/sec Loss 12.5976 LearningRate 0.0803 Epoch: 2 Global Step: 86170 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:24:08,224-Speed 2631.02 samples/sec Loss 12.5573 LearningRate 0.0803 Epoch: 2 Global Step: 86180 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:24:12,119-Speed 2629.63 samples/sec Loss 12.5549 LearningRate 0.0803 Epoch: 2 Global Step: 86190 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:24:16,017-Speed 2627.91 samples/sec Loss 12.5582 LearningRate 0.0803 Epoch: 2 Global Step: 86200 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:24:19,918-Speed 2625.45 samples/sec Loss 12.7513 LearningRate 0.0803 Epoch: 2 Global Step: 86210 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:24:23,814-Speed 2629.48 samples/sec Loss 12.4987 LearningRate 0.0803 Epoch: 2 Global Step: 86220 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:24:27,710-Speed 2628.95 samples/sec Loss 12.5670 LearningRate 0.0803 Epoch: 2 Global Step: 86230 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:24:31,617-Speed 2622.12 samples/sec Loss 12.5370 LearningRate 0.0803 Epoch: 2 Global Step: 86240 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:24:35,507-Speed 2632.45 samples/sec Loss 12.5307 LearningRate 0.0803 Epoch: 2 Global Step: 86250 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:24:39,404-Speed 2628.72 samples/sec Loss 12.6861 LearningRate 0.0803 Epoch: 2 Global Step: 86260 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:24:43,303-Speed 2626.57 samples/sec Loss 12.6169 LearningRate 0.0803 Epoch: 2 Global Step: 86270 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:24:47,204-Speed 2625.87 samples/sec Loss 12.6654 LearningRate 0.0803 Epoch: 2 Global Step: 86280 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:24:51,119-Speed 2616.50 samples/sec Loss 12.6449 LearningRate 0.0803 Epoch: 2 Global Step: 86290 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:24:55,018-Speed 2626.80 samples/sec Loss 12.7166 LearningRate 0.0803 Epoch: 2 Global Step: 86300 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:24:58,903-Speed 2636.95 samples/sec Loss 12.6607 LearningRate 0.0803 Epoch: 2 Global Step: 86310 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:25:02,823-Speed 2612.53 samples/sec Loss 12.5981 LearningRate 0.0803 Epoch: 2 Global Step: 86320 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:25:06,718-Speed 2629.93 samples/sec Loss 12.7260 LearningRate 0.0803 Epoch: 2 Global Step: 86330 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:25:10,592-Speed 2643.50 samples/sec Loss 12.5671 LearningRate 0.0803 Epoch: 2 Global Step: 86340 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:25:14,485-Speed 2631.60 samples/sec Loss 12.6017 LearningRate 0.0803 Epoch: 2 Global Step: 86350 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:25:18,381-Speed 2628.94 samples/sec Loss 12.5866 LearningRate 0.0803 Epoch: 2 Global Step: 86360 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:25:22,274-Speed 2630.99 samples/sec Loss 12.5736 LearningRate 0.0803 Epoch: 2 Global Step: 86370 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:25:26,165-Speed 2632.60 samples/sec Loss 12.6028 LearningRate 0.0803 Epoch: 2 Global Step: 86380 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:25:30,055-Speed 2633.43 samples/sec Loss 12.6121 LearningRate 0.0803 Epoch: 2 Global Step: 86390 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:25:33,950-Speed 2629.37 samples/sec Loss 12.6952 LearningRate 0.0803 Epoch: 2 Global Step: 86400 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:25:37,843-Speed 2630.54 samples/sec Loss 12.6402 LearningRate 0.0803 Epoch: 2 Global Step: 86410 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:25:41,737-Speed 2630.25 samples/sec Loss 12.5248 LearningRate 0.0803 Epoch: 2 Global Step: 86420 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:25:45,632-Speed 2632.70 samples/sec Loss 12.6268 LearningRate 0.0802 Epoch: 2 Global Step: 86430 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:25:49,531-Speed 2626.57 samples/sec Loss 12.7035 LearningRate 0.0802 Epoch: 2 Global Step: 86440 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:25:53,435-Speed 2623.60 samples/sec Loss 12.6852 LearningRate 0.0802 Epoch: 2 Global Step: 86450 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:25:57,328-Speed 2631.63 samples/sec Loss 12.5687 LearningRate 0.0802 Epoch: 2 Global Step: 86460 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:26:01,225-Speed 2628.33 samples/sec Loss 12.5066 LearningRate 0.0802 Epoch: 2 Global Step: 86470 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:26:05,120-Speed 2629.44 samples/sec Loss 12.5660 LearningRate 0.0802 Epoch: 2 Global Step: 86480 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:26:09,013-Speed 2630.76 samples/sec Loss 12.6333 LearningRate 0.0802 Epoch: 2 Global Step: 86490 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:26:12,907-Speed 2630.21 samples/sec Loss 12.6075 LearningRate 0.0802 Epoch: 2 Global Step: 86500 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:26:16,802-Speed 2630.31 samples/sec Loss 12.5996 LearningRate 0.0802 Epoch: 2 Global Step: 86510 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:26:20,700-Speed 2627.66 samples/sec Loss 12.5307 LearningRate 0.0802 Epoch: 2 Global Step: 86520 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:26:24,599-Speed 2626.71 samples/sec Loss 12.5058 LearningRate 0.0802 Epoch: 2 Global Step: 86530 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:26:28,502-Speed 2624.42 samples/sec Loss 12.5556 LearningRate 0.0802 Epoch: 2 Global Step: 86540 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:26:32,409-Speed 2621.55 samples/sec Loss 12.6668 LearningRate 0.0802 Epoch: 2 Global Step: 86550 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:26:36,319-Speed 2620.23 samples/sec Loss 12.6053 LearningRate 0.0802 Epoch: 2 Global Step: 86560 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:26:40,229-Speed 2619.48 samples/sec Loss 12.5508 LearningRate 0.0802 Epoch: 2 Global Step: 86570 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:26:44,125-Speed 2628.65 samples/sec Loss 12.5460 LearningRate 0.0802 Epoch: 2 Global Step: 86580 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:26:48,027-Speed 2625.09 samples/sec Loss 12.6109 LearningRate 0.0802 Epoch: 2 Global Step: 86590 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:26:51,940-Speed 2617.81 samples/sec Loss 12.6103 LearningRate 0.0802 Epoch: 2 Global Step: 86600 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:26:55,857-Speed 2614.95 samples/sec Loss 12.5916 LearningRate 0.0802 Epoch: 2 Global Step: 86610 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:26:59,763-Speed 2622.45 samples/sec Loss 12.6642 LearningRate 0.0802 Epoch: 2 Global Step: 86620 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:27:03,668-Speed 2622.62 samples/sec Loss 12.5872 LearningRate 0.0802 Epoch: 2 Global Step: 86630 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:27:07,576-Speed 2621.44 samples/sec Loss 12.5799 LearningRate 0.0802 Epoch: 2 Global Step: 86640 Fp16 Grad Scale: 524288 Required: 83 hours
Training: 2022-04-13 05:27:11,463-Speed 2635.03 samples/sec Loss 12.4176 LearningRate 0.0802 Epoch: 2 Global Step: 86650 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:27:15,376-Speed 2617.31 samples/sec Loss 12.4862 LearningRate 0.0802 Epoch: 2 Global Step: 86660 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:27:19,277-Speed 2625.84 samples/sec Loss 12.6208 LearningRate 0.0802 Epoch: 2 Global Step: 86670 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:27:23,180-Speed 2624.05 samples/sec Loss 12.5103 LearningRate 0.0802 Epoch: 2 Global Step: 86680 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:27:27,072-Speed 2631.74 samples/sec Loss 12.5556 LearningRate 0.0802 Epoch: 2 Global Step: 86690 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:27:30,970-Speed 2627.61 samples/sec Loss 12.5487 LearningRate 0.0802 Epoch: 2 Global Step: 86700 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:27:34,870-Speed 2626.85 samples/sec Loss 12.4357 LearningRate 0.0802 Epoch: 2 Global Step: 86710 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:27:38,773-Speed 2624.08 samples/sec Loss 12.4331 LearningRate 0.0802 Epoch: 2 Global Step: 86720 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:27:42,681-Speed 2620.54 samples/sec Loss 12.3852 LearningRate 0.0802 Epoch: 2 Global Step: 86730 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:27:46,588-Speed 2621.42 samples/sec Loss 12.5660 LearningRate 0.0802 Epoch: 2 Global Step: 86740 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:27:50,480-Speed 2632.11 samples/sec Loss 12.5761 LearningRate 0.0802 Epoch: 2 Global Step: 86750 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:27:54,376-Speed 2629.14 samples/sec Loss 12.4185 LearningRate 0.0802 Epoch: 2 Global Step: 86760 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:27:58,303-Speed 2608.30 samples/sec Loss 12.5966 LearningRate 0.0802 Epoch: 2 Global Step: 86770 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:28:02,204-Speed 2625.75 samples/sec Loss 12.5480 LearningRate 0.0802 Epoch: 2 Global Step: 86780 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:28:06,104-Speed 2626.47 samples/sec Loss 12.5770 LearningRate 0.0802 Epoch: 2 Global Step: 86790 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:28:10,005-Speed 2625.43 samples/sec Loss 12.5102 LearningRate 0.0802 Epoch: 2 Global Step: 86800 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:28:13,901-Speed 2628.66 samples/sec Loss 12.6977 LearningRate 0.0802 Epoch: 2 Global Step: 86810 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:28:17,797-Speed 2628.89 samples/sec Loss 12.4139 LearningRate 0.0802 Epoch: 2 Global Step: 86820 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:28:21,701-Speed 2623.83 samples/sec Loss 12.5774 LearningRate 0.0802 Epoch: 2 Global Step: 86830 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:28:25,604-Speed 2624.21 samples/sec Loss 12.4652 LearningRate 0.0802 Epoch: 2 Global Step: 86840 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:28:29,491-Speed 2635.11 samples/sec Loss 12.5266 LearningRate 0.0802 Epoch: 2 Global Step: 86850 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:28:33,403-Speed 2617.88 samples/sec Loss 12.6480 LearningRate 0.0802 Epoch: 2 Global Step: 86860 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:28:37,319-Speed 2616.08 samples/sec Loss 12.5682 LearningRate 0.0802 Epoch: 2 Global Step: 86870 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:28:41,218-Speed 2627.26 samples/sec Loss 12.4467 LearningRate 0.0802 Epoch: 2 Global Step: 86880 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:28:45,114-Speed 2628.63 samples/sec Loss 12.4918 LearningRate 0.0801 Epoch: 2 Global Step: 86890 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:28:49,010-Speed 2629.08 samples/sec Loss 12.4338 LearningRate 0.0801 Epoch: 2 Global Step: 86900 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:28:52,909-Speed 2627.60 samples/sec Loss 12.6246 LearningRate 0.0801 Epoch: 2 Global Step: 86910 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:28:56,807-Speed 2627.80 samples/sec Loss 12.5798 LearningRate 0.0801 Epoch: 2 Global Step: 86920 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:29:00,699-Speed 2631.35 samples/sec Loss 12.6099 LearningRate 0.0801 Epoch: 2 Global Step: 86930 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:29:04,578-Speed 2640.84 samples/sec Loss 12.5608 LearningRate 0.0801 Epoch: 2 Global Step: 86940 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:29:08,474-Speed 2628.69 samples/sec Loss 12.5880 LearningRate 0.0801 Epoch: 2 Global Step: 86950 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:29:12,376-Speed 2624.91 samples/sec Loss 12.6427 LearningRate 0.0801 Epoch: 2 Global Step: 86960 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:29:16,284-Speed 2621.45 samples/sec Loss 12.5669 LearningRate 0.0801 Epoch: 2 Global Step: 86970 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:29:20,194-Speed 2619.25 samples/sec Loss 12.5546 LearningRate 0.0801 Epoch: 2 Global Step: 86980 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:29:24,089-Speed 2630.12 samples/sec Loss 12.6188 LearningRate 0.0801 Epoch: 2 Global Step: 86990 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:29:27,999-Speed 2619.76 samples/sec Loss 12.5125 LearningRate 0.0801 Epoch: 2 Global Step: 87000 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:29:31,894-Speed 2629.29 samples/sec Loss 12.6235 LearningRate 0.0801 Epoch: 2 Global Step: 87010 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:29:35,802-Speed 2621.05 samples/sec Loss 12.5091 LearningRate 0.0801 Epoch: 2 Global Step: 87020 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:29:39,699-Speed 2628.67 samples/sec Loss 12.6974 LearningRate 0.0801 Epoch: 2 Global Step: 87030 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:29:43,631-Speed 2604.39 samples/sec Loss 12.4327 LearningRate 0.0801 Epoch: 2 Global Step: 87040 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:29:47,532-Speed 2626.06 samples/sec Loss 12.5347 LearningRate 0.0801 Epoch: 2 Global Step: 87050 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:29:51,428-Speed 2629.17 samples/sec Loss 12.4707 LearningRate 0.0801 Epoch: 2 Global Step: 87060 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:29:55,332-Speed 2623.75 samples/sec Loss 12.4642 LearningRate 0.0801 Epoch: 2 Global Step: 87070 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:29:59,225-Speed 2631.13 samples/sec Loss 12.5715 LearningRate 0.0801 Epoch: 2 Global Step: 87080 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:30:03,132-Speed 2621.70 samples/sec Loss 12.7133 LearningRate 0.0801 Epoch: 2 Global Step: 87090 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:30:07,032-Speed 2625.86 samples/sec Loss 12.3934 LearningRate 0.0801 Epoch: 2 Global Step: 87100 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:30:10,926-Speed 2630.13 samples/sec Loss 12.5151 LearningRate 0.0801 Epoch: 2 Global Step: 87110 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:30:14,826-Speed 2626.83 samples/sec Loss 12.6218 LearningRate 0.0801 Epoch: 2 Global Step: 87120 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:30:18,725-Speed 2626.85 samples/sec Loss 12.6602 LearningRate 0.0801 Epoch: 2 Global Step: 87130 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:30:22,615-Speed 2633.64 samples/sec Loss 12.6137 LearningRate 0.0801 Epoch: 2 Global Step: 87140 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:30:26,521-Speed 2622.37 samples/sec Loss 12.5555 LearningRate 0.0801 Epoch: 2 Global Step: 87150 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:30:30,426-Speed 2622.89 samples/sec Loss 12.5492 LearningRate 0.0801 Epoch: 2 Global Step: 87160 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:30:34,316-Speed 2632.85 samples/sec Loss 12.5929 LearningRate 0.0801 Epoch: 2 Global Step: 87170 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:30:38,221-Speed 2622.61 samples/sec Loss 12.4248 LearningRate 0.0801 Epoch: 2 Global Step: 87180 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:30:42,189-Speed 2581.27 samples/sec Loss 12.6545 LearningRate 0.0801 Epoch: 2 Global Step: 87190 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:30:46,089-Speed 2626.77 samples/sec Loss 12.5891 LearningRate 0.0801 Epoch: 2 Global Step: 87200 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:30:49,985-Speed 2628.99 samples/sec Loss 12.6150 LearningRate 0.0801 Epoch: 2 Global Step: 87210 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:30:53,882-Speed 2628.72 samples/sec Loss 12.5778 LearningRate 0.0801 Epoch: 2 Global Step: 87220 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:30:57,786-Speed 2623.36 samples/sec Loss 12.4118 LearningRate 0.0801 Epoch: 2 Global Step: 87230 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:31:01,680-Speed 2630.53 samples/sec Loss 12.4444 LearningRate 0.0801 Epoch: 2 Global Step: 87240 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:31:05,575-Speed 2629.86 samples/sec Loss 12.5291 LearningRate 0.0801 Epoch: 2 Global Step: 87250 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:31:09,468-Speed 2630.66 samples/sec Loss 12.6604 LearningRate 0.0801 Epoch: 2 Global Step: 87260 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:31:13,361-Speed 2630.64 samples/sec Loss 12.6222 LearningRate 0.0801 Epoch: 2 Global Step: 87270 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:31:17,257-Speed 2629.47 samples/sec Loss 12.5832 LearningRate 0.0801 Epoch: 2 Global Step: 87280 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:31:21,149-Speed 2632.06 samples/sec Loss 12.4772 LearningRate 0.0801 Epoch: 2 Global Step: 87290 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:31:25,043-Speed 2630.24 samples/sec Loss 12.5263 LearningRate 0.0801 Epoch: 2 Global Step: 87300 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:31:28,926-Speed 2637.54 samples/sec Loss 12.6298 LearningRate 0.0801 Epoch: 2 Global Step: 87310 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:31:32,829-Speed 2624.68 samples/sec Loss 12.5972 LearningRate 0.0801 Epoch: 2 Global Step: 87320 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:31:36,723-Speed 2630.40 samples/sec Loss 12.4155 LearningRate 0.0801 Epoch: 2 Global Step: 87330 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:31:40,615-Speed 2631.29 samples/sec Loss 12.7000 LearningRate 0.0801 Epoch: 2 Global Step: 87340 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:31:44,514-Speed 2626.92 samples/sec Loss 12.3933 LearningRate 0.0800 Epoch: 2 Global Step: 87350 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:31:48,410-Speed 2628.89 samples/sec Loss 12.6117 LearningRate 0.0800 Epoch: 2 Global Step: 87360 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:31:52,312-Speed 2625.04 samples/sec Loss 12.5403 LearningRate 0.0800 Epoch: 2 Global Step: 87370 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:31:56,239-Speed 2608.34 samples/sec Loss 12.5401 LearningRate 0.0800 Epoch: 2 Global Step: 87380 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:32:00,143-Speed 2623.65 samples/sec Loss 12.6504 LearningRate 0.0800 Epoch: 2 Global Step: 87390 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:32:04,036-Speed 2630.54 samples/sec Loss 12.6262 LearningRate 0.0800 Epoch: 2 Global Step: 87400 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:32:07,923-Speed 2635.64 samples/sec Loss 12.7488 LearningRate 0.0800 Epoch: 2 Global Step: 87410 Fp16 Grad Scale: 32768 Required: 83 hours
Training: 2022-04-13 05:32:11,816-Speed 2630.84 samples/sec Loss 12.8811 LearningRate 0.0800 Epoch: 2 Global Step: 87420 Fp16 Grad Scale: 32768 Required: 83 hours
Training: 2022-04-13 05:32:15,715-Speed 2626.87 samples/sec Loss 12.9303 LearningRate 0.0800 Epoch: 2 Global Step: 87430 Fp16 Grad Scale: 32768 Required: 83 hours
Training: 2022-04-13 05:32:19,638-Speed 2610.84 samples/sec Loss 12.6039 LearningRate 0.0800 Epoch: 2 Global Step: 87440 Fp16 Grad Scale: 32768 Required: 83 hours
Training: 2022-04-13 05:32:23,545-Speed 2621.16 samples/sec Loss 12.7249 LearningRate 0.0800 Epoch: 2 Global Step: 87450 Fp16 Grad Scale: 32768 Required: 83 hours
Training: 2022-04-13 05:32:27,450-Speed 2623.16 samples/sec Loss 12.6173 LearningRate 0.0800 Epoch: 2 Global Step: 87460 Fp16 Grad Scale: 32768 Required: 83 hours
Training: 2022-04-13 05:32:31,345-Speed 2629.23 samples/sec Loss 12.6143 LearningRate 0.0800 Epoch: 2 Global Step: 87470 Fp16 Grad Scale: 32768 Required: 83 hours
Training: 2022-04-13 05:32:35,277-Speed 2605.04 samples/sec Loss 12.5483 LearningRate 0.0800 Epoch: 2 Global Step: 87480 Fp16 Grad Scale: 32768 Required: 83 hours
Training: 2022-04-13 05:32:39,173-Speed 2629.45 samples/sec Loss 12.4567 LearningRate 0.0800 Epoch: 2 Global Step: 87490 Fp16 Grad Scale: 32768 Required: 83 hours
Training: 2022-04-13 05:32:43,069-Speed 2628.88 samples/sec Loss 12.7219 LearningRate 0.0800 Epoch: 2 Global Step: 87500 Fp16 Grad Scale: 32768 Required: 83 hours
Training: 2022-04-13 05:32:46,977-Speed 2620.31 samples/sec Loss 12.7702 LearningRate 0.0800 Epoch: 2 Global Step: 87510 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:32:50,879-Speed 2625.15 samples/sec Loss 12.5715 LearningRate 0.0800 Epoch: 2 Global Step: 87520 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:32:54,777-Speed 2627.66 samples/sec Loss 12.5338 LearningRate 0.0800 Epoch: 2 Global Step: 87530 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:32:58,764-Speed 2570.83 samples/sec Loss 12.5566 LearningRate 0.0800 Epoch: 2 Global Step: 87540 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:33:02,833-Speed 2517.28 samples/sec Loss 12.6085 LearningRate 0.0800 Epoch: 2 Global Step: 87550 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:33:06,867-Speed 2539.06 samples/sec Loss 12.5691 LearningRate 0.0800 Epoch: 2 Global Step: 87560 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:33:10,765-Speed 2626.94 samples/sec Loss 12.4803 LearningRate 0.0800 Epoch: 2 Global Step: 87570 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:33:14,662-Speed 2628.42 samples/sec Loss 12.5810 LearningRate 0.0800 Epoch: 2 Global Step: 87580 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:33:18,561-Speed 2627.47 samples/sec Loss 12.5741 LearningRate 0.0800 Epoch: 2 Global Step: 87590 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:33:22,458-Speed 2628.31 samples/sec Loss 12.7871 LearningRate 0.0800 Epoch: 2 Global Step: 87600 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:33:26,355-Speed 2627.95 samples/sec Loss 12.7247 LearningRate 0.0800 Epoch: 2 Global Step: 87610 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:33:30,252-Speed 2628.80 samples/sec Loss 12.5408 LearningRate 0.0800 Epoch: 2 Global Step: 87620 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:33:34,152-Speed 2626.10 samples/sec Loss 12.4941 LearningRate 0.0800 Epoch: 2 Global Step: 87630 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:33:38,050-Speed 2627.92 samples/sec Loss 12.5701 LearningRate 0.0800 Epoch: 2 Global Step: 87640 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:33:41,953-Speed 2624.51 samples/sec Loss 12.3667 LearningRate 0.0800 Epoch: 2 Global Step: 87650 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:33:45,849-Speed 2628.59 samples/sec Loss 12.6751 LearningRate 0.0800 Epoch: 2 Global Step: 87660 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:33:49,747-Speed 2627.65 samples/sec Loss 12.5827 LearningRate 0.0800 Epoch: 2 Global Step: 87670 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:33:53,646-Speed 2626.38 samples/sec Loss 12.6558 LearningRate 0.0800 Epoch: 2 Global Step: 87680 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:33:57,572-Speed 2609.31 samples/sec Loss 12.5299 LearningRate 0.0800 Epoch: 2 Global Step: 87690 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:34:01,467-Speed 2630.17 samples/sec Loss 12.5867 LearningRate 0.0800 Epoch: 2 Global Step: 87700 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:34:05,363-Speed 2629.08 samples/sec Loss 12.4694 LearningRate 0.0800 Epoch: 2 Global Step: 87710 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:34:09,282-Speed 2613.45 samples/sec Loss 12.6562 LearningRate 0.0800 Epoch: 2 Global Step: 87720 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:34:13,185-Speed 2624.59 samples/sec Loss 12.6967 LearningRate 0.0800 Epoch: 2 Global Step: 87730 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:34:17,095-Speed 2619.28 samples/sec Loss 12.5893 LearningRate 0.0800 Epoch: 2 Global Step: 87740 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:34:20,996-Speed 2625.81 samples/sec Loss 12.6661 LearningRate 0.0800 Epoch: 2 Global Step: 87750 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:34:24,897-Speed 2625.48 samples/sec Loss 12.5430 LearningRate 0.0800 Epoch: 2 Global Step: 87760 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:34:28,803-Speed 2621.61 samples/sec Loss 12.5884 LearningRate 0.0800 Epoch: 2 Global Step: 87770 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:34:32,710-Speed 2622.35 samples/sec Loss 12.4089 LearningRate 0.0800 Epoch: 2 Global Step: 87780 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:34:36,607-Speed 2628.35 samples/sec Loss 12.4172 LearningRate 0.0800 Epoch: 2 Global Step: 87790 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:34:40,507-Speed 2626.26 samples/sec Loss 12.4615 LearningRate 0.0800 Epoch: 2 Global Step: 87800 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:34:44,392-Speed 2636.69 samples/sec Loss 12.5799 LearningRate 0.0800 Epoch: 2 Global Step: 87810 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:34:48,292-Speed 2626.00 samples/sec Loss 12.6235 LearningRate 0.0799 Epoch: 2 Global Step: 87820 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:34:52,194-Speed 2624.92 samples/sec Loss 12.6553 LearningRate 0.0799 Epoch: 2 Global Step: 87830 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:34:56,055-Speed 2652.61 samples/sec Loss 12.4768 LearningRate 0.0799 Epoch: 2 Global Step: 87840 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:34:59,957-Speed 2624.95 samples/sec Loss 12.5340 LearningRate 0.0799 Epoch: 2 Global Step: 87850 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:35:03,865-Speed 2620.77 samples/sec Loss 12.5442 LearningRate 0.0799 Epoch: 2 Global Step: 87860 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:35:07,770-Speed 2623.57 samples/sec Loss 12.5715 LearningRate 0.0799 Epoch: 2 Global Step: 87870 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:35:11,673-Speed 2624.33 samples/sec Loss 12.5348 LearningRate 0.0799 Epoch: 2 Global Step: 87880 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:35:15,568-Speed 2629.61 samples/sec Loss 12.3355 LearningRate 0.0799 Epoch: 2 Global Step: 87890 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:35:19,460-Speed 2631.42 samples/sec Loss 12.5946 LearningRate 0.0799 Epoch: 2 Global Step: 87900 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:35:23,353-Speed 2630.67 samples/sec Loss 12.4507 LearningRate 0.0799 Epoch: 2 Global Step: 87910 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:35:27,263-Speed 2619.98 samples/sec Loss 12.5993 LearningRate 0.0799 Epoch: 2 Global Step: 87920 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:35:31,155-Speed 2632.07 samples/sec Loss 12.6165 LearningRate 0.0799 Epoch: 2 Global Step: 87930 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:35:35,049-Speed 2630.29 samples/sec Loss 12.4224 LearningRate 0.0799 Epoch: 2 Global Step: 87940 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:35:38,943-Speed 2630.43 samples/sec Loss 12.5546 LearningRate 0.0799 Epoch: 2 Global Step: 87950 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:35:42,833-Speed 2632.89 samples/sec Loss 12.6466 LearningRate 0.0799 Epoch: 2 Global Step: 87960 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:35:46,727-Speed 2630.70 samples/sec Loss 12.5660 LearningRate 0.0799 Epoch: 2 Global Step: 87970 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:35:50,631-Speed 2623.45 samples/sec Loss 12.5411 LearningRate 0.0799 Epoch: 2 Global Step: 87980 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:35:54,540-Speed 2620.25 samples/sec Loss 12.6598 LearningRate 0.0799 Epoch: 2 Global Step: 87990 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:35:58,446-Speed 2622.30 samples/sec Loss 12.6939 LearningRate 0.0799 Epoch: 2 Global Step: 88000 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:36:02,347-Speed 2625.82 samples/sec Loss 12.6619 LearningRate 0.0799 Epoch: 2 Global Step: 88010 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:36:06,245-Speed 2627.11 samples/sec Loss 12.5813 LearningRate 0.0799 Epoch: 2 Global Step: 88020 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:36:10,140-Speed 2630.05 samples/sec Loss 12.4665 LearningRate 0.0799 Epoch: 2 Global Step: 88030 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:36:14,046-Speed 2622.37 samples/sec Loss 12.6313 LearningRate 0.0799 Epoch: 2 Global Step: 88040 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:36:17,946-Speed 2626.16 samples/sec Loss 12.4833 LearningRate 0.0799 Epoch: 2 Global Step: 88050 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:36:21,860-Speed 2616.47 samples/sec Loss 12.6219 LearningRate 0.0799 Epoch: 2 Global Step: 88060 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:36:25,766-Speed 2623.06 samples/sec Loss 12.5271 LearningRate 0.0799 Epoch: 2 Global Step: 88070 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:36:29,677-Speed 2618.83 samples/sec Loss 12.5810 LearningRate 0.0799 Epoch: 2 Global Step: 88080 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:36:33,582-Speed 2622.45 samples/sec Loss 12.3858 LearningRate 0.0799 Epoch: 2 Global Step: 88090 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:36:37,477-Speed 2630.03 samples/sec Loss 12.4228 LearningRate 0.0799 Epoch: 2 Global Step: 88100 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:36:41,388-Speed 2619.29 samples/sec Loss 12.6281 LearningRate 0.0799 Epoch: 2 Global Step: 88110 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:36:45,310-Speed 2611.08 samples/sec Loss 12.4282 LearningRate 0.0799 Epoch: 2 Global Step: 88120 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:36:49,227-Speed 2615.06 samples/sec Loss 12.5433 LearningRate 0.0799 Epoch: 2 Global Step: 88130 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:36:53,141-Speed 2616.95 samples/sec Loss 12.6036 LearningRate 0.0799 Epoch: 2 Global Step: 88140 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:36:57,036-Speed 2630.55 samples/sec Loss 12.5320 LearningRate 0.0799 Epoch: 2 Global Step: 88150 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:37:00,938-Speed 2624.87 samples/sec Loss 12.5085 LearningRate 0.0799 Epoch: 2 Global Step: 88160 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:37:04,837-Speed 2626.92 samples/sec Loss 12.4027 LearningRate 0.0799 Epoch: 2 Global Step: 88170 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:37:08,734-Speed 2627.70 samples/sec Loss 12.5555 LearningRate 0.0799 Epoch: 2 Global Step: 88180 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:37:12,649-Speed 2616.60 samples/sec Loss 12.7041 LearningRate 0.0799 Epoch: 2 Global Step: 88190 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:37:16,537-Speed 2634.09 samples/sec Loss 12.5259 LearningRate 0.0799 Epoch: 2 Global Step: 88200 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:37:20,468-Speed 2606.28 samples/sec Loss 12.6637 LearningRate 0.0799 Epoch: 2 Global Step: 88210 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:37:24,377-Speed 2620.30 samples/sec Loss 12.4989 LearningRate 0.0799 Epoch: 2 Global Step: 88220 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:37:28,289-Speed 2618.40 samples/sec Loss 12.5286 LearningRate 0.0799 Epoch: 2 Global Step: 88230 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:37:32,201-Speed 2617.94 samples/sec Loss 12.7541 LearningRate 0.0799 Epoch: 2 Global Step: 88240 Fp16 Grad Scale: 524288 Required: 83 hours
Training: 2022-04-13 05:37:36,099-Speed 2627.77 samples/sec Loss 12.4556 LearningRate 0.0799 Epoch: 2 Global Step: 88250 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:37:40,114-Speed 2550.68 samples/sec Loss 12.5891 LearningRate 0.0799 Epoch: 2 Global Step: 88260 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:37:44,021-Speed 2621.60 samples/sec Loss 12.5489 LearningRate 0.0799 Epoch: 2 Global Step: 88270 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:37:47,929-Speed 2621.05 samples/sec Loss 12.5367 LearningRate 0.0798 Epoch: 2 Global Step: 88280 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:37:51,833-Speed 2623.64 samples/sec Loss 12.4487 LearningRate 0.0798 Epoch: 2 Global Step: 88290 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:37:55,732-Speed 2627.08 samples/sec Loss 12.5531 LearningRate 0.0798 Epoch: 2 Global Step: 88300 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:37:59,630-Speed 2627.41 samples/sec Loss 12.6015 LearningRate 0.0798 Epoch: 2 Global Step: 88310 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:38:03,531-Speed 2625.77 samples/sec Loss 12.5824 LearningRate 0.0798 Epoch: 2 Global Step: 88320 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:38:07,449-Speed 2614.06 samples/sec Loss 12.4877 LearningRate 0.0798 Epoch: 2 Global Step: 88330 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:38:11,360-Speed 2618.94 samples/sec Loss 12.5974 LearningRate 0.0798 Epoch: 2 Global Step: 88340 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:38:15,246-Speed 2635.44 samples/sec Loss 12.6286 LearningRate 0.0798 Epoch: 2 Global Step: 88350 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:38:19,151-Speed 2622.82 samples/sec Loss 12.5666 LearningRate 0.0798 Epoch: 2 Global Step: 88360 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:38:23,063-Speed 2618.55 samples/sec Loss 12.5198 LearningRate 0.0798 Epoch: 2 Global Step: 88370 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:38:26,968-Speed 2622.43 samples/sec Loss 12.6472 LearningRate 0.0798 Epoch: 2 Global Step: 88380 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:38:30,869-Speed 2627.43 samples/sec Loss 12.6900 LearningRate 0.0798 Epoch: 2 Global Step: 88390 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:38:34,750-Speed 2639.45 samples/sec Loss 12.4047 LearningRate 0.0798 Epoch: 2 Global Step: 88400 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:38:38,645-Speed 2629.78 samples/sec Loss 12.6196 LearningRate 0.0798 Epoch: 2 Global Step: 88410 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:38:42,540-Speed 2628.94 samples/sec Loss 12.5943 LearningRate 0.0798 Epoch: 2 Global Step: 88420 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:38:46,433-Speed 2631.43 samples/sec Loss 12.4698 LearningRate 0.0798 Epoch: 2 Global Step: 88430 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:38:50,333-Speed 2625.91 samples/sec Loss 12.5187 LearningRate 0.0798 Epoch: 2 Global Step: 88440 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:38:54,243-Speed 2619.78 samples/sec Loss 12.6257 LearningRate 0.0798 Epoch: 2 Global Step: 88450 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:38:58,139-Speed 2629.20 samples/sec Loss 12.3885 LearningRate 0.0798 Epoch: 2 Global Step: 88460 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:39:02,039-Speed 2626.60 samples/sec Loss 12.3993 LearningRate 0.0798 Epoch: 2 Global Step: 88470 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:39:05,937-Speed 2627.34 samples/sec Loss 12.5382 LearningRate 0.0798 Epoch: 2 Global Step: 88480 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:39:09,831-Speed 2630.21 samples/sec Loss 12.5221 LearningRate 0.0798 Epoch: 2 Global Step: 88490 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:39:13,724-Speed 2630.75 samples/sec Loss 12.5484 LearningRate 0.0798 Epoch: 2 Global Step: 88500 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:39:17,628-Speed 2624.04 samples/sec Loss 12.6300 LearningRate 0.0798 Epoch: 2 Global Step: 88510 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:39:21,535-Speed 2621.93 samples/sec Loss 12.4948 LearningRate 0.0798 Epoch: 2 Global Step: 88520 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:39:25,445-Speed 2619.61 samples/sec Loss 12.4756 LearningRate 0.0798 Epoch: 2 Global Step: 88530 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:39:29,357-Speed 2618.18 samples/sec Loss 12.5561 LearningRate 0.0798 Epoch: 2 Global Step: 88540 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:39:33,274-Speed 2615.11 samples/sec Loss 12.4968 LearningRate 0.0798 Epoch: 2 Global Step: 88550 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:39:37,268-Speed 2564.40 samples/sec Loss 12.6971 LearningRate 0.0798 Epoch: 2 Global Step: 88560 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:39:41,345-Speed 2512.23 samples/sec Loss 12.5720 LearningRate 0.0798 Epoch: 2 Global Step: 88570 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:39:45,433-Speed 2505.30 samples/sec Loss 12.6396 LearningRate 0.0798 Epoch: 2 Global Step: 88580 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:39:49,357-Speed 2610.38 samples/sec Loss 12.5659 LearningRate 0.0798 Epoch: 2 Global Step: 88590 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:39:53,262-Speed 2622.77 samples/sec Loss 12.4098 LearningRate 0.0798 Epoch: 2 Global Step: 88600 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:39:57,183-Speed 2612.09 samples/sec Loss 12.4489 LearningRate 0.0798 Epoch: 2 Global Step: 88610 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:40:01,085-Speed 2625.06 samples/sec Loss 12.5019 LearningRate 0.0798 Epoch: 2 Global Step: 88620 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:40:04,989-Speed 2623.74 samples/sec Loss 12.6385 LearningRate 0.0798 Epoch: 2 Global Step: 88630 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:40:08,900-Speed 2618.80 samples/sec Loss 12.4512 LearningRate 0.0798 Epoch: 2 Global Step: 88640 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:40:12,779-Speed 2640.03 samples/sec Loss 12.4608 LearningRate 0.0798 Epoch: 2 Global Step: 88650 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:40:16,675-Speed 2629.85 samples/sec Loss 12.5847 LearningRate 0.0798 Epoch: 2 Global Step: 88660 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:40:20,570-Speed 2630.23 samples/sec Loss 12.5540 LearningRate 0.0798 Epoch: 2 Global Step: 88670 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:40:24,473-Speed 2623.96 samples/sec Loss 12.4999 LearningRate 0.0798 Epoch: 2 Global Step: 88680 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:40:28,374-Speed 2625.85 samples/sec Loss 12.5879 LearningRate 0.0798 Epoch: 2 Global Step: 88690 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:40:32,271-Speed 2628.76 samples/sec Loss 12.5325 LearningRate 0.0798 Epoch: 2 Global Step: 88700 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:40:36,171-Speed 2626.25 samples/sec Loss 12.4908 LearningRate 0.0798 Epoch: 2 Global Step: 88710 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:40:40,090-Speed 2613.53 samples/sec Loss 12.6923 LearningRate 0.0798 Epoch: 2 Global Step: 88720 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:40:44,006-Speed 2615.60 samples/sec Loss 12.7709 LearningRate 0.0798 Epoch: 2 Global Step: 88730 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:40:47,908-Speed 2625.07 samples/sec Loss 12.7714 LearningRate 0.0798 Epoch: 2 Global Step: 88740 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:40:51,805-Speed 2629.14 samples/sec Loss 12.4912 LearningRate 0.0797 Epoch: 2 Global Step: 88750 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:40:55,703-Speed 2627.60 samples/sec Loss 12.6758 LearningRate 0.0797 Epoch: 2 Global Step: 88760 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:40:59,596-Speed 2630.71 samples/sec Loss 12.5809 LearningRate 0.0797 Epoch: 2 Global Step: 88770 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:41:03,492-Speed 2628.53 samples/sec Loss 12.5217 LearningRate 0.0797 Epoch: 2 Global Step: 88780 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:41:07,394-Speed 2625.49 samples/sec Loss 12.5642 LearningRate 0.0797 Epoch: 2 Global Step: 88790 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:41:11,291-Speed 2628.53 samples/sec Loss 12.5697 LearningRate 0.0797 Epoch: 2 Global Step: 88800 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:41:15,230-Speed 2599.89 samples/sec Loss 12.4660 LearningRate 0.0797 Epoch: 2 Global Step: 88810 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:41:19,136-Speed 2622.38 samples/sec Loss 12.3638 LearningRate 0.0797 Epoch: 2 Global Step: 88820 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:41:23,037-Speed 2625.79 samples/sec Loss 12.5579 LearningRate 0.0797 Epoch: 2 Global Step: 88830 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:41:26,944-Speed 2622.24 samples/sec Loss 12.7039 LearningRate 0.0797 Epoch: 2 Global Step: 88840 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:41:30,826-Speed 2637.86 samples/sec Loss 12.3771 LearningRate 0.0797 Epoch: 2 Global Step: 88850 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:41:34,824-Speed 2561.83 samples/sec Loss 12.6995 LearningRate 0.0797 Epoch: 2 Global Step: 88860 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:41:38,719-Speed 2630.10 samples/sec Loss 12.6446 LearningRate 0.0797 Epoch: 2 Global Step: 88870 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:41:42,617-Speed 2627.77 samples/sec Loss 12.5640 LearningRate 0.0797 Epoch: 2 Global Step: 88880 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:41:46,498-Speed 2638.67 samples/sec Loss 12.5405 LearningRate 0.0797 Epoch: 2 Global Step: 88890 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:41:50,390-Speed 2631.96 samples/sec Loss 12.4575 LearningRate 0.0797 Epoch: 2 Global Step: 88900 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:41:54,287-Speed 2627.85 samples/sec Loss 12.4945 LearningRate 0.0797 Epoch: 2 Global Step: 88910 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:41:58,180-Speed 2631.16 samples/sec Loss 12.6424 LearningRate 0.0797 Epoch: 2 Global Step: 88920 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:42:02,077-Speed 2628.35 samples/sec Loss 12.5299 LearningRate 0.0797 Epoch: 2 Global Step: 88930 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:42:05,975-Speed 2627.27 samples/sec Loss 12.5500 LearningRate 0.0797 Epoch: 2 Global Step: 88940 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:42:09,872-Speed 2628.32 samples/sec Loss 12.5444 LearningRate 0.0797 Epoch: 2 Global Step: 88950 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:42:13,773-Speed 2626.20 samples/sec Loss 12.4651 LearningRate 0.0797 Epoch: 2 Global Step: 88960 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:42:17,672-Speed 2626.91 samples/sec Loss 12.5775 LearningRate 0.0797 Epoch: 2 Global Step: 88970 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:42:21,577-Speed 2623.00 samples/sec Loss 12.6244 LearningRate 0.0797 Epoch: 2 Global Step: 88980 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:42:25,495-Speed 2614.05 samples/sec Loss 12.4654 LearningRate 0.0797 Epoch: 2 Global Step: 88990 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:42:29,394-Speed 2627.56 samples/sec Loss 12.4709 LearningRate 0.0797 Epoch: 2 Global Step: 89000 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:42:33,288-Speed 2630.08 samples/sec Loss 12.6773 LearningRate 0.0797 Epoch: 2 Global Step: 89010 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:42:37,189-Speed 2625.34 samples/sec Loss 12.6118 LearningRate 0.0797 Epoch: 2 Global Step: 89020 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:42:41,093-Speed 2623.91 samples/sec Loss 12.3752 LearningRate 0.0797 Epoch: 2 Global Step: 89030 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:42:44,988-Speed 2630.08 samples/sec Loss 12.4470 LearningRate 0.0797 Epoch: 2 Global Step: 89040 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:42:48,895-Speed 2621.38 samples/sec Loss 12.6189 LearningRate 0.0797 Epoch: 2 Global Step: 89050 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:42:52,796-Speed 2625.33 samples/sec Loss 12.5110 LearningRate 0.0797 Epoch: 2 Global Step: 89060 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:42:56,691-Speed 2630.20 samples/sec Loss 12.5289 LearningRate 0.0797 Epoch: 2 Global Step: 89070 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:43:00,589-Speed 2627.57 samples/sec Loss 12.5542 LearningRate 0.0797 Epoch: 2 Global Step: 89080 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:43:04,490-Speed 2625.83 samples/sec Loss 12.5828 LearningRate 0.0797 Epoch: 2 Global Step: 89090 Fp16 Grad Scale: 524288 Required: 83 hours
Training: 2022-04-13 05:43:08,374-Speed 2636.73 samples/sec Loss 12.5192 LearningRate 0.0797 Epoch: 2 Global Step: 89100 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:43:12,406-Speed 2540.58 samples/sec Loss 12.6093 LearningRate 0.0797 Epoch: 2 Global Step: 89110 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:43:16,314-Speed 2620.63 samples/sec Loss 12.6257 LearningRate 0.0797 Epoch: 2 Global Step: 89120 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:43:20,217-Speed 2624.30 samples/sec Loss 12.5453 LearningRate 0.0797 Epoch: 2 Global Step: 89130 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:43:24,101-Speed 2637.03 samples/sec Loss 12.4752 LearningRate 0.0797 Epoch: 2 Global Step: 89140 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:43:28,001-Speed 2626.58 samples/sec Loss 12.5401 LearningRate 0.0797 Epoch: 2 Global Step: 89150 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:43:31,924-Speed 2610.77 samples/sec Loss 12.4685 LearningRate 0.0797 Epoch: 2 Global Step: 89160 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:43:35,826-Speed 2625.05 samples/sec Loss 12.5484 LearningRate 0.0797 Epoch: 2 Global Step: 89170 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:43:39,775-Speed 2593.21 samples/sec Loss 12.4803 LearningRate 0.0797 Epoch: 2 Global Step: 89180 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:43:43,681-Speed 2622.63 samples/sec Loss 12.4603 LearningRate 0.0797 Epoch: 2 Global Step: 89190 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:43:47,577-Speed 2629.19 samples/sec Loss 12.5667 LearningRate 0.0797 Epoch: 2 Global Step: 89200 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:43:51,489-Speed 2618.02 samples/sec Loss 12.5252 LearningRate 0.0796 Epoch: 2 Global Step: 89210 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:43:55,394-Speed 2623.16 samples/sec Loss 12.6617 LearningRate 0.0796 Epoch: 2 Global Step: 89220 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:43:59,299-Speed 2623.09 samples/sec Loss 12.3754 LearningRate 0.0796 Epoch: 2 Global Step: 89230 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:44:03,195-Speed 2628.59 samples/sec Loss 12.6311 LearningRate 0.0796 Epoch: 2 Global Step: 89240 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:44:07,092-Speed 2627.92 samples/sec Loss 12.4950 LearningRate 0.0796 Epoch: 2 Global Step: 89250 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:44:10,990-Speed 2628.36 samples/sec Loss 12.5885 LearningRate 0.0796 Epoch: 2 Global Step: 89260 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:44:14,892-Speed 2624.85 samples/sec Loss 12.4984 LearningRate 0.0796 Epoch: 2 Global Step: 89270 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:44:18,790-Speed 2627.88 samples/sec Loss 12.5011 LearningRate 0.0796 Epoch: 2 Global Step: 89280 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:44:22,693-Speed 2624.40 samples/sec Loss 12.4283 LearningRate 0.0796 Epoch: 2 Global Step: 89290 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:44:26,594-Speed 2625.55 samples/sec Loss 12.6771 LearningRate 0.0796 Epoch: 2 Global Step: 89300 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:44:30,488-Speed 2631.78 samples/sec Loss 12.4459 LearningRate 0.0796 Epoch: 2 Global Step: 89310 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:44:34,390-Speed 2624.61 samples/sec Loss 12.3607 LearningRate 0.0796 Epoch: 2 Global Step: 89320 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:44:38,297-Speed 2621.74 samples/sec Loss 12.2911 LearningRate 0.0796 Epoch: 2 Global Step: 89330 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:44:42,175-Speed 2640.89 samples/sec Loss 12.5887 LearningRate 0.0796 Epoch: 2 Global Step: 89340 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:44:46,081-Speed 2622.33 samples/sec Loss 12.4955 LearningRate 0.0796 Epoch: 2 Global Step: 89350 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:44:49,978-Speed 2628.12 samples/sec Loss 12.6191 LearningRate 0.0796 Epoch: 2 Global Step: 89360 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:44:53,873-Speed 2629.64 samples/sec Loss 12.5140 LearningRate 0.0796 Epoch: 2 Global Step: 89370 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:44:57,766-Speed 2630.90 samples/sec Loss 12.4930 LearningRate 0.0796 Epoch: 2 Global Step: 89380 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:45:01,661-Speed 2629.56 samples/sec Loss 12.4289 LearningRate 0.0796 Epoch: 2 Global Step: 89390 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:45:05,560-Speed 2627.72 samples/sec Loss 12.4442 LearningRate 0.0796 Epoch: 2 Global Step: 89400 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:45:09,450-Speed 2633.05 samples/sec Loss 12.5553 LearningRate 0.0796 Epoch: 2 Global Step: 89410 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:45:13,346-Speed 2628.63 samples/sec Loss 12.4912 LearningRate 0.0796 Epoch: 2 Global Step: 89420 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:45:17,246-Speed 2626.15 samples/sec Loss 12.4156 LearningRate 0.0796 Epoch: 2 Global Step: 89430 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:45:21,136-Speed 2633.11 samples/sec Loss 12.3253 LearningRate 0.0796 Epoch: 2 Global Step: 89440 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:45:25,032-Speed 2628.86 samples/sec Loss 12.5020 LearningRate 0.0796 Epoch: 2 Global Step: 89450 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:45:28,932-Speed 2626.21 samples/sec Loss 12.3129 LearningRate 0.0796 Epoch: 2 Global Step: 89460 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:45:32,825-Speed 2631.45 samples/sec Loss 12.6258 LearningRate 0.0796 Epoch: 2 Global Step: 89470 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:45:36,721-Speed 2628.92 samples/sec Loss 12.5448 LearningRate 0.0796 Epoch: 2 Global Step: 89480 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:45:40,614-Speed 2630.71 samples/sec Loss 12.5847 LearningRate 0.0796 Epoch: 2 Global Step: 89490 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:45:44,512-Speed 2627.15 samples/sec Loss 12.4487 LearningRate 0.0796 Epoch: 2 Global Step: 89500 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:45:48,419-Speed 2621.72 samples/sec Loss 12.6035 LearningRate 0.0796 Epoch: 2 Global Step: 89510 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:45:52,321-Speed 2624.93 samples/sec Loss 12.5525 LearningRate 0.0796 Epoch: 2 Global Step: 89520 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:45:56,215-Speed 2630.21 samples/sec Loss 12.4599 LearningRate 0.0796 Epoch: 2 Global Step: 89530 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:46:00,137-Speed 2611.50 samples/sec Loss 12.7046 LearningRate 0.0796 Epoch: 2 Global Step: 89540 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:46:04,068-Speed 2605.58 samples/sec Loss 12.5206 LearningRate 0.0796 Epoch: 2 Global Step: 89550 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:46:07,965-Speed 2628.10 samples/sec Loss 12.3967 LearningRate 0.0796 Epoch: 2 Global Step: 89560 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:46:11,864-Speed 2627.12 samples/sec Loss 12.4953 LearningRate 0.0796 Epoch: 2 Global Step: 89570 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:46:15,759-Speed 2629.79 samples/sec Loss 12.4986 LearningRate 0.0796 Epoch: 2 Global Step: 89580 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:46:19,658-Speed 2627.10 samples/sec Loss 12.4868 LearningRate 0.0796 Epoch: 2 Global Step: 89590 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:46:23,553-Speed 2629.42 samples/sec Loss 12.5976 LearningRate 0.0796 Epoch: 2 Global Step: 89600 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:46:27,446-Speed 2630.94 samples/sec Loss 12.2993 LearningRate 0.0796 Epoch: 2 Global Step: 89610 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:46:31,343-Speed 2628.28 samples/sec Loss 12.6788 LearningRate 0.0796 Epoch: 2 Global Step: 89620 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:46:35,248-Speed 2622.35 samples/sec Loss 12.6442 LearningRate 0.0796 Epoch: 2 Global Step: 89630 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:46:39,135-Speed 2635.53 samples/sec Loss 12.6129 LearningRate 0.0796 Epoch: 2 Global Step: 89640 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:46:43,031-Speed 2628.96 samples/sec Loss 12.4339 LearningRate 0.0796 Epoch: 2 Global Step: 89650 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:46:46,925-Speed 2630.92 samples/sec Loss 12.5659 LearningRate 0.0796 Epoch: 2 Global Step: 89660 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:46:50,826-Speed 2626.21 samples/sec Loss 12.3877 LearningRate 0.0796 Epoch: 2 Global Step: 89670 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:46:54,722-Speed 2628.81 samples/sec Loss 12.4884 LearningRate 0.0795 Epoch: 2 Global Step: 89680 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:46:58,622-Speed 2626.00 samples/sec Loss 12.3869 LearningRate 0.0795 Epoch: 2 Global Step: 89690 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:47:02,522-Speed 2625.78 samples/sec Loss 12.3565 LearningRate 0.0795 Epoch: 2 Global Step: 89700 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:47:06,431-Speed 2620.72 samples/sec Loss 12.5650 LearningRate 0.0795 Epoch: 2 Global Step: 89710 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:47:10,335-Speed 2623.23 samples/sec Loss 12.4623 LearningRate 0.0795 Epoch: 2 Global Step: 89720 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:47:14,230-Speed 2630.53 samples/sec Loss 12.4193 LearningRate 0.0795 Epoch: 2 Global Step: 89730 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:47:18,127-Speed 2629.00 samples/sec Loss 12.4052 LearningRate 0.0795 Epoch: 2 Global Step: 89740 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:47:22,023-Speed 2628.70 samples/sec Loss 12.4174 LearningRate 0.0795 Epoch: 2 Global Step: 89750 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:47:25,920-Speed 2628.29 samples/sec Loss 12.3856 LearningRate 0.0795 Epoch: 2 Global Step: 89760 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:47:29,814-Speed 2629.80 samples/sec Loss 12.3968 LearningRate 0.0795 Epoch: 2 Global Step: 89770 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:47:33,713-Speed 2627.29 samples/sec Loss 12.4098 LearningRate 0.0795 Epoch: 2 Global Step: 89780 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:47:37,625-Speed 2618.12 samples/sec Loss 12.3804 LearningRate 0.0795 Epoch: 2 Global Step: 89790 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:47:41,529-Speed 2623.27 samples/sec Loss 12.5352 LearningRate 0.0795 Epoch: 2 Global Step: 89800 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:47:45,457-Speed 2607.95 samples/sec Loss 12.3776 LearningRate 0.0795 Epoch: 2 Global Step: 89810 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:47:49,377-Speed 2612.86 samples/sec Loss 12.4081 LearningRate 0.0795 Epoch: 2 Global Step: 89820 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:47:53,310-Speed 2604.18 samples/sec Loss 12.6405 LearningRate 0.0795 Epoch: 2 Global Step: 89830 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:47:57,218-Speed 2621.71 samples/sec Loss 12.5063 LearningRate 0.0795 Epoch: 2 Global Step: 89840 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:48:01,123-Speed 2622.42 samples/sec Loss 12.4423 LearningRate 0.0795 Epoch: 2 Global Step: 89850 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:48:05,023-Speed 2626.43 samples/sec Loss 12.5655 LearningRate 0.0795 Epoch: 2 Global Step: 89860 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:48:09,021-Speed 2561.79 samples/sec Loss 12.4024 LearningRate 0.0795 Epoch: 2 Global Step: 89870 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:48:12,940-Speed 2614.08 samples/sec Loss 12.4850 LearningRate 0.0795 Epoch: 2 Global Step: 89880 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:48:16,840-Speed 2626.27 samples/sec Loss 12.5925 LearningRate 0.0795 Epoch: 2 Global Step: 89890 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:48:20,742-Speed 2625.22 samples/sec Loss 12.5711 LearningRate 0.0795 Epoch: 2 Global Step: 89900 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:48:24,637-Speed 2629.67 samples/sec Loss 12.6117 LearningRate 0.0795 Epoch: 2 Global Step: 89910 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:48:28,546-Speed 2620.37 samples/sec Loss 12.4748 LearningRate 0.0795 Epoch: 2 Global Step: 89920 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:48:32,430-Speed 2637.11 samples/sec Loss 12.5805 LearningRate 0.0795 Epoch: 2 Global Step: 89930 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:48:36,333-Speed 2624.41 samples/sec Loss 12.4757 LearningRate 0.0795 Epoch: 2 Global Step: 89940 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:48:40,227-Speed 2630.10 samples/sec Loss 12.4222 LearningRate 0.0795 Epoch: 2 Global Step: 89950 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:48:44,128-Speed 2625.76 samples/sec Loss 12.3750 LearningRate 0.0795 Epoch: 2 Global Step: 89960 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:48:48,026-Speed 2627.32 samples/sec Loss 12.4639 LearningRate 0.0795 Epoch: 2 Global Step: 89970 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:48:51,925-Speed 2626.73 samples/sec Loss 12.5041 LearningRate 0.0795 Epoch: 2 Global Step: 89980 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:48:55,830-Speed 2623.67 samples/sec Loss 12.4757 LearningRate 0.0795 Epoch: 2 Global Step: 89990 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:48:59,736-Speed 2622.09 samples/sec Loss 12.4954 LearningRate 0.0795 Epoch: 2 Global Step: 90000 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:49:42,926-[lfw][90000]XNorm: 23.094169
Training: 2022-04-13 05:49:42,927-[lfw][90000]Accuracy-Flip: 0.99667+-0.00258
Training: 2022-04-13 05:49:42,927-[lfw][90000]Accuracy-Highest: 0.99783
Training: 2022-04-13 05:50:33,196-[cfp_fp][90000]XNorm: 21.114501
Training: 2022-04-13 05:50:33,197-[cfp_fp][90000]Accuracy-Flip: 0.97986+-0.00488
Training: 2022-04-13 05:50:33,197-[cfp_fp][90000]Accuracy-Highest: 0.97986
Training: 2022-04-13 05:51:16,475-[agedb_30][90000]XNorm: 23.006428
Training: 2022-04-13 05:51:16,476-[agedb_30][90000]Accuracy-Flip: 0.96333+-0.00650
Training: 2022-04-13 05:51:16,476-[agedb_30][90000]Accuracy-Highest: 0.96600
Training: 2022-04-13 05:51:20,437-Speed 72.78 samples/sec Loss 12.6087 LearningRate 0.0795 Epoch: 2 Global Step: 90010 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:51:24,313-Speed 2642.72 samples/sec Loss 12.4703 LearningRate 0.0795 Epoch: 2 Global Step: 90020 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:51:28,175-Speed 2652.19 samples/sec Loss 12.5309 LearningRate 0.0795 Epoch: 2 Global Step: 90030 Fp16 Grad Scale: 524288 Required: 83 hours
Training: 2022-04-13 05:51:32,033-Speed 2654.34 samples/sec Loss 12.5832 LearningRate 0.0795 Epoch: 2 Global Step: 90040 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:51:35,901-Speed 2647.82 samples/sec Loss 12.5604 LearningRate 0.0795 Epoch: 2 Global Step: 90050 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:51:39,769-Speed 2648.56 samples/sec Loss 12.4957 LearningRate 0.0795 Epoch: 2 Global Step: 90060 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:51:43,644-Speed 2643.48 samples/sec Loss 12.4759 LearningRate 0.0795 Epoch: 2 Global Step: 90070 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:51:47,512-Speed 2647.61 samples/sec Loss 12.5796 LearningRate 0.0795 Epoch: 2 Global Step: 90080 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:51:51,390-Speed 2641.22 samples/sec Loss 12.5081 LearningRate 0.0795 Epoch: 2 Global Step: 90090 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:51:55,268-Speed 2641.37 samples/sec Loss 12.4691 LearningRate 0.0795 Epoch: 2 Global Step: 90100 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:51:59,155-Speed 2634.70 samples/sec Loss 12.6911 LearningRate 0.0795 Epoch: 2 Global Step: 90110 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:52:03,041-Speed 2635.82 samples/sec Loss 12.5042 LearningRate 0.0795 Epoch: 2 Global Step: 90120 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:52:06,927-Speed 2635.58 samples/sec Loss 12.5590 LearningRate 0.0795 Epoch: 2 Global Step: 90130 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:52:10,813-Speed 2635.31 samples/sec Loss 12.5113 LearningRate 0.0794 Epoch: 2 Global Step: 90140 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:52:14,700-Speed 2635.79 samples/sec Loss 12.4965 LearningRate 0.0794 Epoch: 2 Global Step: 90150 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:52:18,585-Speed 2636.14 samples/sec Loss 12.5488 LearningRate 0.0794 Epoch: 2 Global Step: 90160 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:52:22,478-Speed 2630.92 samples/sec Loss 12.6433 LearningRate 0.0794 Epoch: 2 Global Step: 90170 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:52:26,378-Speed 2626.29 samples/sec Loss 12.4645 LearningRate 0.0794 Epoch: 2 Global Step: 90180 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:52:30,283-Speed 2623.09 samples/sec Loss 12.5801 LearningRate 0.0794 Epoch: 2 Global Step: 90190 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:52:34,194-Speed 2618.51 samples/sec Loss 12.5438 LearningRate 0.0794 Epoch: 2 Global Step: 90200 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:52:38,093-Speed 2626.92 samples/sec Loss 12.3670 LearningRate 0.0794 Epoch: 2 Global Step: 90210 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:52:42,006-Speed 2617.90 samples/sec Loss 12.5421 LearningRate 0.0794 Epoch: 2 Global Step: 90220 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:52:45,887-Speed 2638.83 samples/sec Loss 12.6265 LearningRate 0.0794 Epoch: 2 Global Step: 90230 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:52:49,783-Speed 2629.41 samples/sec Loss 12.5942 LearningRate 0.0794 Epoch: 2 Global Step: 90240 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:52:53,676-Speed 2630.92 samples/sec Loss 12.2895 LearningRate 0.0794 Epoch: 2 Global Step: 90250 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:52:57,569-Speed 2631.12 samples/sec Loss 12.3536 LearningRate 0.0794 Epoch: 2 Global Step: 90260 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:53:01,459-Speed 2632.98 samples/sec Loss 12.3689 LearningRate 0.0794 Epoch: 2 Global Step: 90270 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:53:05,353-Speed 2629.62 samples/sec Loss 12.5258 LearningRate 0.0794 Epoch: 2 Global Step: 90280 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:53:09,263-Speed 2619.35 samples/sec Loss 12.3166 LearningRate 0.0794 Epoch: 2 Global Step: 90290 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:53:13,163-Speed 2626.98 samples/sec Loss 12.2435 LearningRate 0.0794 Epoch: 2 Global Step: 90300 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:53:17,061-Speed 2627.25 samples/sec Loss 12.4989 LearningRate 0.0794 Epoch: 2 Global Step: 90310 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:53:20,960-Speed 2626.83 samples/sec Loss 12.3757 LearningRate 0.0794 Epoch: 2 Global Step: 90320 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:53:24,852-Speed 2631.48 samples/sec Loss 12.2589 LearningRate 0.0794 Epoch: 2 Global Step: 90330 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:53:28,746-Speed 2630.83 samples/sec Loss 12.5611 LearningRate 0.0794 Epoch: 2 Global Step: 90340 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:53:32,660-Speed 2616.87 samples/sec Loss 12.5392 LearningRate 0.0794 Epoch: 2 Global Step: 90350 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:53:36,552-Speed 2631.79 samples/sec Loss 12.4399 LearningRate 0.0794 Epoch: 2 Global Step: 90360 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:53:40,446-Speed 2630.14 samples/sec Loss 12.4126 LearningRate 0.0794 Epoch: 2 Global Step: 90370 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:53:44,353-Speed 2621.18 samples/sec Loss 12.6504 LearningRate 0.0794 Epoch: 2 Global Step: 90380 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:53:48,246-Speed 2631.18 samples/sec Loss 12.4746 LearningRate 0.0794 Epoch: 2 Global Step: 90390 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:53:52,140-Speed 2630.33 samples/sec Loss 12.5007 LearningRate 0.0794 Epoch: 2 Global Step: 90400 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:53:56,031-Speed 2631.76 samples/sec Loss 12.6204 LearningRate 0.0794 Epoch: 2 Global Step: 90410 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:53:59,930-Speed 2627.74 samples/sec Loss 12.6546 LearningRate 0.0794 Epoch: 2 Global Step: 90420 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:54:03,828-Speed 2627.76 samples/sec Loss 12.3924 LearningRate 0.0794 Epoch: 2 Global Step: 90430 Fp16 Grad Scale: 524288 Required: 83 hours
Training: 2022-04-13 05:54:07,687-Speed 2653.58 samples/sec Loss 12.4004 LearningRate 0.0794 Epoch: 2 Global Step: 90440 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:54:11,609-Speed 2611.61 samples/sec Loss 12.5268 LearningRate 0.0794 Epoch: 2 Global Step: 90450 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:54:15,599-Speed 2566.65 samples/sec Loss 12.5390 LearningRate 0.0794 Epoch: 2 Global Step: 90460 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:54:19,476-Speed 2642.24 samples/sec Loss 12.6056 LearningRate 0.0794 Epoch: 2 Global Step: 90470 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:54:23,368-Speed 2631.61 samples/sec Loss 12.4067 LearningRate 0.0794 Epoch: 2 Global Step: 90480 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:54:27,255-Speed 2635.11 samples/sec Loss 12.5806 LearningRate 0.0794 Epoch: 2 Global Step: 90490 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:54:31,149-Speed 2630.16 samples/sec Loss 12.4007 LearningRate 0.0794 Epoch: 2 Global Step: 90500 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:54:35,043-Speed 2630.65 samples/sec Loss 12.6431 LearningRate 0.0794 Epoch: 2 Global Step: 90510 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:54:38,934-Speed 2632.37 samples/sec Loss 12.3786 LearningRate 0.0794 Epoch: 2 Global Step: 90520 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:54:42,822-Speed 2634.19 samples/sec Loss 12.4187 LearningRate 0.0794 Epoch: 2 Global Step: 90530 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:54:46,713-Speed 2631.98 samples/sec Loss 12.4650 LearningRate 0.0794 Epoch: 2 Global Step: 90540 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:54:50,609-Speed 2629.16 samples/sec Loss 12.5458 LearningRate 0.0794 Epoch: 2 Global Step: 90550 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:54:54,501-Speed 2632.17 samples/sec Loss 12.4139 LearningRate 0.0794 Epoch: 2 Global Step: 90560 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:54:58,401-Speed 2625.67 samples/sec Loss 12.3967 LearningRate 0.0794 Epoch: 2 Global Step: 90570 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:55:02,293-Speed 2631.63 samples/sec Loss 12.5470 LearningRate 0.0794 Epoch: 2 Global Step: 90580 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:55:06,185-Speed 2631.52 samples/sec Loss 12.3688 LearningRate 0.0794 Epoch: 2 Global Step: 90590 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:55:10,079-Speed 2630.20 samples/sec Loss 12.5596 LearningRate 0.0794 Epoch: 2 Global Step: 90600 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:55:13,975-Speed 2628.73 samples/sec Loss 12.5415 LearningRate 0.0793 Epoch: 2 Global Step: 90610 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:55:17,870-Speed 2630.61 samples/sec Loss 12.4552 LearningRate 0.0793 Epoch: 2 Global Step: 90620 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:55:21,763-Speed 2630.98 samples/sec Loss 12.3131 LearningRate 0.0793 Epoch: 2 Global Step: 90630 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:55:25,660-Speed 2627.80 samples/sec Loss 12.4707 LearningRate 0.0793 Epoch: 2 Global Step: 90640 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:55:29,559-Speed 2627.05 samples/sec Loss 12.4925 LearningRate 0.0793 Epoch: 2 Global Step: 90650 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:55:33,465-Speed 2622.52 samples/sec Loss 12.5258 LearningRate 0.0793 Epoch: 2 Global Step: 90660 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:55:37,361-Speed 2628.47 samples/sec Loss 12.4574 LearningRate 0.0793 Epoch: 2 Global Step: 90670 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:55:41,254-Speed 2630.73 samples/sec Loss 12.5098 LearningRate 0.0793 Epoch: 2 Global Step: 90680 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:55:45,149-Speed 2629.78 samples/sec Loss 12.5191 LearningRate 0.0793 Epoch: 2 Global Step: 90690 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:55:49,042-Speed 2630.86 samples/sec Loss 12.4256 LearningRate 0.0793 Epoch: 2 Global Step: 90700 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:55:52,938-Speed 2629.34 samples/sec Loss 12.5212 LearningRate 0.0793 Epoch: 2 Global Step: 90710 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:55:56,833-Speed 2629.98 samples/sec Loss 12.3401 LearningRate 0.0793 Epoch: 2 Global Step: 90720 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:56:00,732-Speed 2626.35 samples/sec Loss 12.4726 LearningRate 0.0793 Epoch: 2 Global Step: 90730 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:56:04,630-Speed 2627.81 samples/sec Loss 12.3998 LearningRate 0.0793 Epoch: 2 Global Step: 90740 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:56:08,523-Speed 2630.95 samples/sec Loss 12.3885 LearningRate 0.0793 Epoch: 2 Global Step: 90750 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:56:12,415-Speed 2631.19 samples/sec Loss 12.4149 LearningRate 0.0793 Epoch: 2 Global Step: 90760 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:56:16,319-Speed 2623.74 samples/sec Loss 12.4627 LearningRate 0.0793 Epoch: 2 Global Step: 90770 Fp16 Grad Scale: 524288 Required: 83 hours
Training: 2022-04-13 05:56:20,209-Speed 2632.77 samples/sec Loss 12.5551 LearningRate 0.0793 Epoch: 2 Global Step: 90780 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:56:24,106-Speed 2628.26 samples/sec Loss 12.5272 LearningRate 0.0793 Epoch: 2 Global Step: 90790 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:56:27,998-Speed 2631.76 samples/sec Loss 12.3950 LearningRate 0.0793 Epoch: 2 Global Step: 90800 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:56:31,900-Speed 2625.28 samples/sec Loss 12.4772 LearningRate 0.0793 Epoch: 2 Global Step: 90810 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:56:35,818-Speed 2614.24 samples/sec Loss 12.6655 LearningRate 0.0793 Epoch: 2 Global Step: 90820 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:56:39,715-Speed 2628.01 samples/sec Loss 12.3999 LearningRate 0.0793 Epoch: 2 Global Step: 90830 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:56:43,609-Speed 2630.13 samples/sec Loss 12.2097 LearningRate 0.0793 Epoch: 2 Global Step: 90840 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:56:47,517-Speed 2621.09 samples/sec Loss 12.4997 LearningRate 0.0793 Epoch: 2 Global Step: 90850 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:56:51,409-Speed 2631.29 samples/sec Loss 12.4486 LearningRate 0.0793 Epoch: 2 Global Step: 90860 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:56:55,305-Speed 2629.39 samples/sec Loss 12.4566 LearningRate 0.0793 Epoch: 2 Global Step: 90870 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:56:59,186-Speed 2638.71 samples/sec Loss 12.4814 LearningRate 0.0793 Epoch: 2 Global Step: 90880 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:57:03,103-Speed 2614.56 samples/sec Loss 12.6137 LearningRate 0.0793 Epoch: 2 Global Step: 90890 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:57:07,007-Speed 2623.86 samples/sec Loss 12.4833 LearningRate 0.0793 Epoch: 2 Global Step: 90900 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:57:10,908-Speed 2625.60 samples/sec Loss 12.4288 LearningRate 0.0793 Epoch: 2 Global Step: 90910 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:57:14,808-Speed 2628.64 samples/sec Loss 12.6207 LearningRate 0.0793 Epoch: 2 Global Step: 90920 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:57:18,705-Speed 2627.87 samples/sec Loss 12.4292 LearningRate 0.0793 Epoch: 2 Global Step: 90930 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:57:22,590-Speed 2636.78 samples/sec Loss 12.4654 LearningRate 0.0793 Epoch: 2 Global Step: 90940 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:57:26,476-Speed 2635.31 samples/sec Loss 12.5246 LearningRate 0.0793 Epoch: 2 Global Step: 90950 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:57:30,374-Speed 2627.95 samples/sec Loss 12.4688 LearningRate 0.0793 Epoch: 2 Global Step: 90960 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:57:34,269-Speed 2629.20 samples/sec Loss 12.5058 LearningRate 0.0793 Epoch: 2 Global Step: 90970 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:57:38,165-Speed 2629.00 samples/sec Loss 12.4112 LearningRate 0.0793 Epoch: 2 Global Step: 90980 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:57:42,060-Speed 2629.83 samples/sec Loss 12.3955 LearningRate 0.0793 Epoch: 2 Global Step: 90990 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:57:45,953-Speed 2630.87 samples/sec Loss 12.5482 LearningRate 0.0793 Epoch: 2 Global Step: 91000 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:57:49,848-Speed 2629.93 samples/sec Loss 12.3594 LearningRate 0.0793 Epoch: 2 Global Step: 91010 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:57:53,743-Speed 2629.04 samples/sec Loss 12.4187 LearningRate 0.0793 Epoch: 2 Global Step: 91020 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:57:57,640-Speed 2628.37 samples/sec Loss 12.4988 LearningRate 0.0793 Epoch: 2 Global Step: 91030 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:58:01,536-Speed 2629.11 samples/sec Loss 12.4263 LearningRate 0.0793 Epoch: 2 Global Step: 91040 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:58:05,452-Speed 2615.35 samples/sec Loss 12.3978 LearningRate 0.0793 Epoch: 2 Global Step: 91050 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:58:09,333-Speed 2638.77 samples/sec Loss 12.5106 LearningRate 0.0793 Epoch: 2 Global Step: 91060 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:58:13,229-Speed 2629.43 samples/sec Loss 12.3828 LearningRate 0.0792 Epoch: 2 Global Step: 91070 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:58:17,124-Speed 2628.93 samples/sec Loss 12.3700 LearningRate 0.0792 Epoch: 2 Global Step: 91080 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:58:21,019-Speed 2630.55 samples/sec Loss 12.5268 LearningRate 0.0792 Epoch: 2 Global Step: 91090 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:58:24,914-Speed 2629.64 samples/sec Loss 12.4918 LearningRate 0.0792 Epoch: 2 Global Step: 91100 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:58:28,820-Speed 2623.76 samples/sec Loss 12.6271 LearningRate 0.0792 Epoch: 2 Global Step: 91110 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:58:32,724-Speed 2623.76 samples/sec Loss 12.3964 LearningRate 0.0792 Epoch: 2 Global Step: 91120 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:58:36,620-Speed 2628.82 samples/sec Loss 12.3566 LearningRate 0.0792 Epoch: 2 Global Step: 91130 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:58:40,512-Speed 2631.00 samples/sec Loss 12.4972 LearningRate 0.0792 Epoch: 2 Global Step: 91140 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:58:44,425-Speed 2618.07 samples/sec Loss 12.4289 LearningRate 0.0792 Epoch: 2 Global Step: 91150 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:58:48,320-Speed 2629.48 samples/sec Loss 12.3939 LearningRate 0.0792 Epoch: 2 Global Step: 91160 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:58:52,252-Speed 2605.11 samples/sec Loss 12.5667 LearningRate 0.0792 Epoch: 2 Global Step: 91170 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:58:56,149-Speed 2627.86 samples/sec Loss 12.4631 LearningRate 0.0792 Epoch: 2 Global Step: 91180 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:59:00,040-Speed 2632.56 samples/sec Loss 12.4002 LearningRate 0.0792 Epoch: 2 Global Step: 91190 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:59:03,937-Speed 2628.58 samples/sec Loss 12.4478 LearningRate 0.0792 Epoch: 2 Global Step: 91200 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:59:07,841-Speed 2623.57 samples/sec Loss 12.5689 LearningRate 0.0792 Epoch: 2 Global Step: 91210 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:59:11,741-Speed 2625.95 samples/sec Loss 12.4817 LearningRate 0.0792 Epoch: 2 Global Step: 91220 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:59:15,639-Speed 2627.99 samples/sec Loss 12.4373 LearningRate 0.0792 Epoch: 2 Global Step: 91230 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:59:19,557-Speed 2614.64 samples/sec Loss 12.4574 LearningRate 0.0792 Epoch: 2 Global Step: 91240 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:59:23,457-Speed 2626.21 samples/sec Loss 12.5578 LearningRate 0.0792 Epoch: 2 Global Step: 91250 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:59:27,355-Speed 2627.86 samples/sec Loss 12.4258 LearningRate 0.0792 Epoch: 2 Global Step: 91260 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:59:31,259-Speed 2623.47 samples/sec Loss 12.4576 LearningRate 0.0792 Epoch: 2 Global Step: 91270 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:59:35,154-Speed 2629.44 samples/sec Loss 12.5542 LearningRate 0.0792 Epoch: 2 Global Step: 91280 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:59:39,059-Speed 2623.12 samples/sec Loss 12.4592 LearningRate 0.0792 Epoch: 2 Global Step: 91290 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:59:42,964-Speed 2623.38 samples/sec Loss 12.3915 LearningRate 0.0792 Epoch: 2 Global Step: 91300 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 05:59:46,842-Speed 2640.80 samples/sec Loss 12.5056 LearningRate 0.0792 Epoch: 2 Global Step: 91310 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:59:50,746-Speed 2623.91 samples/sec Loss 12.4138 LearningRate 0.0792 Epoch: 2 Global Step: 91320 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 05:59:54,624-Speed 2641.30 samples/sec Loss 12.4868 LearningRate 0.0792 Epoch: 2 Global Step: 91330 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 05:59:58,522-Speed 2627.96 samples/sec Loss 12.2947 LearningRate 0.0792 Epoch: 2 Global Step: 91340 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:00:02,417-Speed 2629.72 samples/sec Loss 12.4918 LearningRate 0.0792 Epoch: 2 Global Step: 91350 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:00:06,315-Speed 2627.54 samples/sec Loss 12.4688 LearningRate 0.0792 Epoch: 2 Global Step: 91360 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:00:10,223-Speed 2620.88 samples/sec Loss 12.4252 LearningRate 0.0792 Epoch: 2 Global Step: 91370 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:00:14,120-Speed 2627.97 samples/sec Loss 12.3921 LearningRate 0.0792 Epoch: 2 Global Step: 91380 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:00:18,017-Speed 2628.35 samples/sec Loss 12.2655 LearningRate 0.0792 Epoch: 2 Global Step: 91390 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:00:21,913-Speed 2629.29 samples/sec Loss 12.3958 LearningRate 0.0792 Epoch: 2 Global Step: 91400 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:00:25,809-Speed 2629.32 samples/sec Loss 12.3479 LearningRate 0.0792 Epoch: 2 Global Step: 91410 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:00:29,712-Speed 2624.14 samples/sec Loss 12.5960 LearningRate 0.0792 Epoch: 2 Global Step: 91420 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:00:33,608-Speed 2629.16 samples/sec Loss 12.4549 LearningRate 0.0792 Epoch: 2 Global Step: 91430 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:00:37,510-Speed 2624.52 samples/sec Loss 12.4075 LearningRate 0.0792 Epoch: 2 Global Step: 91440 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:00:41,435-Speed 2609.49 samples/sec Loss 12.4451 LearningRate 0.0792 Epoch: 2 Global Step: 91450 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:00:45,334-Speed 2626.88 samples/sec Loss 12.3916 LearningRate 0.0792 Epoch: 2 Global Step: 91460 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:00:49,231-Speed 2628.61 samples/sec Loss 12.3923 LearningRate 0.0792 Epoch: 2 Global Step: 91470 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:00:53,127-Speed 2628.77 samples/sec Loss 12.3342 LearningRate 0.0792 Epoch: 2 Global Step: 91480 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:00:57,034-Speed 2621.61 samples/sec Loss 12.4448 LearningRate 0.0792 Epoch: 2 Global Step: 91490 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:01:00,929-Speed 2629.57 samples/sec Loss 12.4216 LearningRate 0.0792 Epoch: 2 Global Step: 91500 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:01:04,841-Speed 2621.30 samples/sec Loss 12.6142 LearningRate 0.0792 Epoch: 2 Global Step: 91510 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:01:08,740-Speed 2627.00 samples/sec Loss 12.3942 LearningRate 0.0792 Epoch: 2 Global Step: 91520 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:01:12,633-Speed 2630.88 samples/sec Loss 12.4394 LearningRate 0.0792 Epoch: 2 Global Step: 91530 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:01:16,530-Speed 2628.50 samples/sec Loss 12.4907 LearningRate 0.0791 Epoch: 2 Global Step: 91540 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:01:20,428-Speed 2628.03 samples/sec Loss 12.3239 LearningRate 0.0791 Epoch: 2 Global Step: 91550 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:01:24,310-Speed 2638.78 samples/sec Loss 12.7019 LearningRate 0.0791 Epoch: 2 Global Step: 91560 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:01:28,371-Speed 2522.13 samples/sec Loss 12.4993 LearningRate 0.0791 Epoch: 2 Global Step: 91570 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:01:32,418-Speed 2533.28 samples/sec Loss 12.4543 LearningRate 0.0791 Epoch: 2 Global Step: 91580 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:01:36,320-Speed 2625.21 samples/sec Loss 12.5234 LearningRate 0.0791 Epoch: 2 Global Step: 91590 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:01:40,222-Speed 2624.75 samples/sec Loss 12.4793 LearningRate 0.0791 Epoch: 2 Global Step: 91600 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:01:44,134-Speed 2618.42 samples/sec Loss 12.5522 LearningRate 0.0791 Epoch: 2 Global Step: 91610 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:01:48,040-Speed 2622.10 samples/sec Loss 12.5634 LearningRate 0.0791 Epoch: 2 Global Step: 91620 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:01:51,937-Speed 2628.63 samples/sec Loss 12.4529 LearningRate 0.0791 Epoch: 2 Global Step: 91630 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:01:55,846-Speed 2620.71 samples/sec Loss 12.5310 LearningRate 0.0791 Epoch: 2 Global Step: 91640 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:01:59,744-Speed 2627.32 samples/sec Loss 12.3468 LearningRate 0.0791 Epoch: 2 Global Step: 91650 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:02:03,641-Speed 2628.52 samples/sec Loss 12.4632 LearningRate 0.0791 Epoch: 2 Global Step: 91660 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:02:07,545-Speed 2623.66 samples/sec Loss 12.4375 LearningRate 0.0791 Epoch: 2 Global Step: 91670 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:02:11,431-Speed 2635.40 samples/sec Loss 12.4488 LearningRate 0.0791 Epoch: 2 Global Step: 91680 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:02:15,342-Speed 2619.22 samples/sec Loss 12.3730 LearningRate 0.0791 Epoch: 2 Global Step: 91690 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:02:19,245-Speed 2623.90 samples/sec Loss 12.4682 LearningRate 0.0791 Epoch: 2 Global Step: 91700 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:02:23,143-Speed 2627.47 samples/sec Loss 12.3608 LearningRate 0.0791 Epoch: 2 Global Step: 91710 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:02:27,044-Speed 2625.49 samples/sec Loss 12.3585 LearningRate 0.0791 Epoch: 2 Global Step: 91720 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:02:30,941-Speed 2628.88 samples/sec Loss 12.4800 LearningRate 0.0791 Epoch: 2 Global Step: 91730 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:02:34,839-Speed 2627.55 samples/sec Loss 12.5400 LearningRate 0.0791 Epoch: 2 Global Step: 91740 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:02:38,738-Speed 2626.92 samples/sec Loss 12.5367 LearningRate 0.0791 Epoch: 2 Global Step: 91750 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:02:42,638-Speed 2626.83 samples/sec Loss 12.3744 LearningRate 0.0791 Epoch: 2 Global Step: 91760 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:02:46,570-Speed 2604.72 samples/sec Loss 12.4668 LearningRate 0.0791 Epoch: 2 Global Step: 91770 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:02:50,457-Speed 2635.14 samples/sec Loss 12.4695 LearningRate 0.0791 Epoch: 2 Global Step: 91780 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:02:54,363-Speed 2622.42 samples/sec Loss 12.5335 LearningRate 0.0791 Epoch: 2 Global Step: 91790 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:02:58,270-Speed 2621.47 samples/sec Loss 12.4039 LearningRate 0.0791 Epoch: 2 Global Step: 91800 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:03:02,178-Speed 2620.97 samples/sec Loss 12.5271 LearningRate 0.0791 Epoch: 2 Global Step: 91810 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:03:06,076-Speed 2627.30 samples/sec Loss 12.4724 LearningRate 0.0791 Epoch: 2 Global Step: 91820 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:03:09,980-Speed 2623.77 samples/sec Loss 12.4296 LearningRate 0.0791 Epoch: 2 Global Step: 91830 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:03:13,891-Speed 2618.90 samples/sec Loss 12.4129 LearningRate 0.0791 Epoch: 2 Global Step: 91840 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:03:17,807-Speed 2615.59 samples/sec Loss 12.2908 LearningRate 0.0791 Epoch: 2 Global Step: 91850 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:03:21,721-Speed 2617.21 samples/sec Loss 12.3117 LearningRate 0.0791 Epoch: 2 Global Step: 91860 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:03:25,598-Speed 2641.66 samples/sec Loss 12.5092 LearningRate 0.0791 Epoch: 2 Global Step: 91870 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:03:29,503-Speed 2623.10 samples/sec Loss 12.4458 LearningRate 0.0791 Epoch: 2 Global Step: 91880 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:03:33,399-Speed 2629.15 samples/sec Loss 12.4717 LearningRate 0.0791 Epoch: 2 Global Step: 91890 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:03:37,294-Speed 2629.37 samples/sec Loss 12.5793 LearningRate 0.0791 Epoch: 2 Global Step: 91900 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:03:41,192-Speed 2627.52 samples/sec Loss 12.4189 LearningRate 0.0791 Epoch: 2 Global Step: 91910 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:03:45,091-Speed 2627.36 samples/sec Loss 12.5457 LearningRate 0.0791 Epoch: 2 Global Step: 91920 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:03:48,999-Speed 2620.09 samples/sec Loss 12.3322 LearningRate 0.0791 Epoch: 2 Global Step: 91930 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:03:52,913-Speed 2617.34 samples/sec Loss 12.4618 LearningRate 0.0791 Epoch: 2 Global Step: 91940 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:03:56,818-Speed 2622.45 samples/sec Loss 12.4108 LearningRate 0.0791 Epoch: 2 Global Step: 91950 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:04:00,720-Speed 2625.70 samples/sec Loss 12.4333 LearningRate 0.0791 Epoch: 2 Global Step: 91960 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:04:04,621-Speed 2625.28 samples/sec Loss 12.3345 LearningRate 0.0791 Epoch: 2 Global Step: 91970 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:04:08,520-Speed 2626.99 samples/sec Loss 12.3374 LearningRate 0.0791 Epoch: 2 Global Step: 91980 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:04:12,417-Speed 2628.02 samples/sec Loss 12.3876 LearningRate 0.0791 Epoch: 2 Global Step: 91990 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:04:16,315-Speed 2627.94 samples/sec Loss 12.4860 LearningRate 0.0790 Epoch: 2 Global Step: 92000 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:04:20,251-Speed 2602.22 samples/sec Loss 12.3281 LearningRate 0.0790 Epoch: 2 Global Step: 92010 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:04:24,148-Speed 2628.94 samples/sec Loss 12.4746 LearningRate 0.0790 Epoch: 2 Global Step: 92020 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:04:28,059-Speed 2619.12 samples/sec Loss 12.4352 LearningRate 0.0790 Epoch: 2 Global Step: 92030 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:04:31,969-Speed 2619.30 samples/sec Loss 12.5712 LearningRate 0.0790 Epoch: 2 Global Step: 92040 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:04:35,886-Speed 2615.11 samples/sec Loss 12.4460 LearningRate 0.0790 Epoch: 2 Global Step: 92050 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:04:39,793-Speed 2621.58 samples/sec Loss 12.3491 LearningRate 0.0790 Epoch: 2 Global Step: 92060 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:04:43,698-Speed 2622.78 samples/sec Loss 12.4635 LearningRate 0.0790 Epoch: 2 Global Step: 92070 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:04:47,609-Speed 2618.43 samples/sec Loss 12.4257 LearningRate 0.0790 Epoch: 2 Global Step: 92080 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:04:51,514-Speed 2623.18 samples/sec Loss 12.3188 LearningRate 0.0790 Epoch: 2 Global Step: 92090 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:04:55,414-Speed 2626.97 samples/sec Loss 12.4254 LearningRate 0.0790 Epoch: 2 Global Step: 92100 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:04:59,315-Speed 2625.30 samples/sec Loss 12.3608 LearningRate 0.0790 Epoch: 2 Global Step: 92110 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:05:03,213-Speed 2627.65 samples/sec Loss 12.4065 LearningRate 0.0790 Epoch: 2 Global Step: 92120 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:05:07,110-Speed 2628.13 samples/sec Loss 12.4485 LearningRate 0.0790 Epoch: 2 Global Step: 92130 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:05:11,008-Speed 2627.72 samples/sec Loss 12.3711 LearningRate 0.0790 Epoch: 2 Global Step: 92140 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:05:14,925-Speed 2614.76 samples/sec Loss 12.5643 LearningRate 0.0790 Epoch: 2 Global Step: 92150 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:05:18,883-Speed 2587.42 samples/sec Loss 12.5110 LearningRate 0.0790 Epoch: 2 Global Step: 92160 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:05:22,826-Speed 2598.31 samples/sec Loss 12.2933 LearningRate 0.0790 Epoch: 2 Global Step: 92170 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:05:26,738-Speed 2618.07 samples/sec Loss 12.4446 LearningRate 0.0790 Epoch: 2 Global Step: 92180 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:05:30,643-Speed 2622.91 samples/sec Loss 12.3303 LearningRate 0.0790 Epoch: 2 Global Step: 92190 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:05:34,545-Speed 2625.19 samples/sec Loss 12.3154 LearningRate 0.0790 Epoch: 2 Global Step: 92200 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:05:38,447-Speed 2624.70 samples/sec Loss 12.5075 LearningRate 0.0790 Epoch: 2 Global Step: 92210 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:05:42,348-Speed 2625.43 samples/sec Loss 12.4245 LearningRate 0.0790 Epoch: 2 Global Step: 92220 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:05:46,259-Speed 2619.00 samples/sec Loss 12.3449 LearningRate 0.0790 Epoch: 2 Global Step: 92230 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:05:50,160-Speed 2625.55 samples/sec Loss 12.4063 LearningRate 0.0790 Epoch: 2 Global Step: 92240 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:05:54,062-Speed 2624.75 samples/sec Loss 12.5487 LearningRate 0.0790 Epoch: 2 Global Step: 92250 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:05:57,961-Speed 2627.54 samples/sec Loss 12.3984 LearningRate 0.0790 Epoch: 2 Global Step: 92260 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:06:01,844-Speed 2637.24 samples/sec Loss 12.3857 LearningRate 0.0790 Epoch: 2 Global Step: 92270 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:06:05,750-Speed 2622.48 samples/sec Loss 12.4617 LearningRate 0.0790 Epoch: 2 Global Step: 92280 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:06:09,689-Speed 2599.99 samples/sec Loss 12.3088 LearningRate 0.0790 Epoch: 2 Global Step: 92290 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:06:13,589-Speed 2626.82 samples/sec Loss 12.4180 LearningRate 0.0790 Epoch: 2 Global Step: 92300 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:06:17,511-Speed 2611.17 samples/sec Loss 12.4669 LearningRate 0.0790 Epoch: 2 Global Step: 92310 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:06:21,413-Speed 2625.24 samples/sec Loss 12.3219 LearningRate 0.0790 Epoch: 2 Global Step: 92320 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:06:25,316-Speed 2624.39 samples/sec Loss 12.3936 LearningRate 0.0790 Epoch: 2 Global Step: 92330 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:06:29,252-Speed 2601.94 samples/sec Loss 12.4169 LearningRate 0.0790 Epoch: 2 Global Step: 92340 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:06:33,168-Speed 2616.47 samples/sec Loss 12.4345 LearningRate 0.0790 Epoch: 2 Global Step: 92350 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:06:37,068-Speed 2626.41 samples/sec Loss 12.5265 LearningRate 0.0790 Epoch: 2 Global Step: 92360 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:06:40,970-Speed 2624.75 samples/sec Loss 12.4710 LearningRate 0.0790 Epoch: 2 Global Step: 92370 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:06:44,879-Speed 2619.93 samples/sec Loss 12.5328 LearningRate 0.0790 Epoch: 2 Global Step: 92380 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:06:48,781-Speed 2625.25 samples/sec Loss 12.4016 LearningRate 0.0790 Epoch: 2 Global Step: 92390 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:06:52,695-Speed 2616.49 samples/sec Loss 12.5488 LearningRate 0.0790 Epoch: 2 Global Step: 92400 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:06:56,601-Speed 2622.72 samples/sec Loss 12.4961 LearningRate 0.0790 Epoch: 2 Global Step: 92410 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:07:00,507-Speed 2622.31 samples/sec Loss 12.4559 LearningRate 0.0790 Epoch: 2 Global Step: 92420 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:07:04,410-Speed 2624.26 samples/sec Loss 12.5178 LearningRate 0.0790 Epoch: 2 Global Step: 92430 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:07:08,397-Speed 2568.93 samples/sec Loss 12.5564 LearningRate 0.0790 Epoch: 2 Global Step: 92440 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:07:12,503-Speed 2494.55 samples/sec Loss 12.5293 LearningRate 0.0790 Epoch: 2 Global Step: 92450 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:07:16,467-Speed 2583.16 samples/sec Loss 12.4619 LearningRate 0.0790 Epoch: 2 Global Step: 92460 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:07:20,374-Speed 2622.18 samples/sec Loss 12.3521 LearningRate 0.0789 Epoch: 2 Global Step: 92470 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:07:24,284-Speed 2619.37 samples/sec Loss 12.3378 LearningRate 0.0789 Epoch: 2 Global Step: 92480 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:07:28,206-Speed 2611.84 samples/sec Loss 12.3694 LearningRate 0.0789 Epoch: 2 Global Step: 92490 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:07:32,111-Speed 2622.97 samples/sec Loss 12.3951 LearningRate 0.0789 Epoch: 2 Global Step: 92500 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:07:36,026-Speed 2616.26 samples/sec Loss 12.3580 LearningRate 0.0789 Epoch: 2 Global Step: 92510 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:07:39,932-Speed 2622.14 samples/sec Loss 12.5053 LearningRate 0.0789 Epoch: 2 Global Step: 92520 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:07:43,835-Speed 2624.45 samples/sec Loss 12.4401 LearningRate 0.0789 Epoch: 2 Global Step: 92530 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:07:47,734-Speed 2626.67 samples/sec Loss 12.3168 LearningRate 0.0789 Epoch: 2 Global Step: 92540 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:07:51,635-Speed 2626.28 samples/sec Loss 12.3673 LearningRate 0.0789 Epoch: 2 Global Step: 92550 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:07:55,540-Speed 2623.45 samples/sec Loss 12.4599 LearningRate 0.0789 Epoch: 2 Global Step: 92560 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:07:59,442-Speed 2624.73 samples/sec Loss 12.4471 LearningRate 0.0789 Epoch: 2 Global Step: 92570 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:08:03,348-Speed 2622.12 samples/sec Loss 12.2734 LearningRate 0.0789 Epoch: 2 Global Step: 92580 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:08:07,260-Speed 2618.40 samples/sec Loss 12.3225 LearningRate 0.0789 Epoch: 2 Global Step: 92590 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:08:11,170-Speed 2619.77 samples/sec Loss 12.4281 LearningRate 0.0789 Epoch: 2 Global Step: 92600 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:08:15,069-Speed 2626.82 samples/sec Loss 12.4027 LearningRate 0.0789 Epoch: 2 Global Step: 92610 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:08:18,996-Speed 2608.62 samples/sec Loss 12.3715 LearningRate 0.0789 Epoch: 2 Global Step: 92620 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:08:22,892-Speed 2628.25 samples/sec Loss 12.5354 LearningRate 0.0789 Epoch: 2 Global Step: 92630 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:08:26,797-Speed 2623.86 samples/sec Loss 12.4225 LearningRate 0.0789 Epoch: 2 Global Step: 92640 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:08:30,710-Speed 2617.83 samples/sec Loss 12.4115 LearningRate 0.0789 Epoch: 2 Global Step: 92650 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:08:34,617-Speed 2621.28 samples/sec Loss 12.5527 LearningRate 0.0789 Epoch: 2 Global Step: 92660 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:08:38,519-Speed 2624.47 samples/sec Loss 12.3298 LearningRate 0.0789 Epoch: 2 Global Step: 92670 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:08:42,428-Speed 2620.13 samples/sec Loss 12.3476 LearningRate 0.0789 Epoch: 2 Global Step: 92680 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:08:46,337-Speed 2620.37 samples/sec Loss 12.4444 LearningRate 0.0789 Epoch: 2 Global Step: 92690 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:08:50,239-Speed 2625.05 samples/sec Loss 12.3834 LearningRate 0.0789 Epoch: 2 Global Step: 92700 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:08:54,147-Speed 2620.95 samples/sec Loss 12.4194 LearningRate 0.0789 Epoch: 2 Global Step: 92710 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:08:58,050-Speed 2624.57 samples/sec Loss 12.3069 LearningRate 0.0789 Epoch: 2 Global Step: 92720 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:09:01,957-Speed 2621.15 samples/sec Loss 12.4295 LearningRate 0.0789 Epoch: 2 Global Step: 92730 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:09:05,875-Speed 2614.53 samples/sec Loss 12.5686 LearningRate 0.0789 Epoch: 2 Global Step: 92740 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:09:09,783-Speed 2621.28 samples/sec Loss 12.3455 LearningRate 0.0789 Epoch: 2 Global Step: 92750 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:09:13,688-Speed 2622.80 samples/sec Loss 12.4333 LearningRate 0.0789 Epoch: 2 Global Step: 92760 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:09:17,587-Speed 2626.50 samples/sec Loss 12.2531 LearningRate 0.0789 Epoch: 2 Global Step: 92770 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:09:21,494-Speed 2622.43 samples/sec Loss 12.5302 LearningRate 0.0789 Epoch: 2 Global Step: 92780 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:09:25,405-Speed 2618.52 samples/sec Loss 12.4305 LearningRate 0.0789 Epoch: 2 Global Step: 92790 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:09:29,326-Speed 2612.75 samples/sec Loss 12.4207 LearningRate 0.0789 Epoch: 2 Global Step: 92800 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:09:33,247-Speed 2612.14 samples/sec Loss 12.6008 LearningRate 0.0789 Epoch: 2 Global Step: 92810 Fp16 Grad Scale: 524288 Required: 83 hours
Training: 2022-04-13 06:09:37,131-Speed 2636.73 samples/sec Loss 12.6143 LearningRate 0.0789 Epoch: 2 Global Step: 92820 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:09:41,015-Speed 2637.50 samples/sec Loss 12.4221 LearningRate 0.0789 Epoch: 2 Global Step: 92830 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:09:44,919-Speed 2623.94 samples/sec Loss 12.3819 LearningRate 0.0789 Epoch: 2 Global Step: 92840 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:09:48,821-Speed 2624.45 samples/sec Loss 12.4188 LearningRate 0.0789 Epoch: 2 Global Step: 92850 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:09:52,724-Speed 2624.72 samples/sec Loss 12.4187 LearningRate 0.0789 Epoch: 2 Global Step: 92860 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:09:56,621-Speed 2628.30 samples/sec Loss 12.3679 LearningRate 0.0789 Epoch: 2 Global Step: 92870 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:10:00,519-Speed 2628.03 samples/sec Loss 12.4832 LearningRate 0.0789 Epoch: 2 Global Step: 92880 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:10:04,418-Speed 2626.48 samples/sec Loss 12.3896 LearningRate 0.0789 Epoch: 2 Global Step: 92890 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:10:08,320-Speed 2625.10 samples/sec Loss 12.2901 LearningRate 0.0789 Epoch: 2 Global Step: 92900 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:10:12,219-Speed 2626.59 samples/sec Loss 12.5101 LearningRate 0.0789 Epoch: 2 Global Step: 92910 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:10:16,118-Speed 2626.88 samples/sec Loss 12.3521 LearningRate 0.0789 Epoch: 2 Global Step: 92920 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:10:20,018-Speed 2626.40 samples/sec Loss 12.4205 LearningRate 0.0789 Epoch: 2 Global Step: 92930 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:10:23,922-Speed 2623.22 samples/sec Loss 12.3975 LearningRate 0.0788 Epoch: 2 Global Step: 92940 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:10:27,828-Speed 2622.64 samples/sec Loss 12.3854 LearningRate 0.0788 Epoch: 2 Global Step: 92950 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:10:31,851-Speed 2546.27 samples/sec Loss 12.5221 LearningRate 0.0788 Epoch: 2 Global Step: 92960 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:10:35,756-Speed 2622.57 samples/sec Loss 12.4377 LearningRate 0.0788 Epoch: 2 Global Step: 92970 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:10:39,658-Speed 2624.62 samples/sec Loss 12.3526 LearningRate 0.0788 Epoch: 2 Global Step: 92980 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:10:43,562-Speed 2623.49 samples/sec Loss 12.3768 LearningRate 0.0788 Epoch: 2 Global Step: 92990 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:10:47,478-Speed 2615.43 samples/sec Loss 12.5218 LearningRate 0.0788 Epoch: 2 Global Step: 93000 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:10:51,408-Speed 2606.71 samples/sec Loss 12.4254 LearningRate 0.0788 Epoch: 2 Global Step: 93010 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:10:55,344-Speed 2602.54 samples/sec Loss 12.2678 LearningRate 0.0788 Epoch: 2 Global Step: 93020 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:10:59,237-Speed 2630.73 samples/sec Loss 12.5705 LearningRate 0.0788 Epoch: 2 Global Step: 93030 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:11:03,162-Speed 2609.39 samples/sec Loss 12.2680 LearningRate 0.0788 Epoch: 2 Global Step: 93040 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:11:07,074-Speed 2618.94 samples/sec Loss 12.4655 LearningRate 0.0788 Epoch: 2 Global Step: 93050 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:11:10,976-Speed 2624.61 samples/sec Loss 12.3664 LearningRate 0.0788 Epoch: 2 Global Step: 93060 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:11:14,881-Speed 2629.25 samples/sec Loss 12.4601 LearningRate 0.0788 Epoch: 2 Global Step: 93070 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:11:18,783-Speed 2624.44 samples/sec Loss 12.4148 LearningRate 0.0788 Epoch: 2 Global Step: 93080 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:11:22,689-Speed 2622.78 samples/sec Loss 12.4475 LearningRate 0.0788 Epoch: 2 Global Step: 93090 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:11:26,592-Speed 2624.29 samples/sec Loss 12.2638 LearningRate 0.0788 Epoch: 2 Global Step: 93100 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:11:30,506-Speed 2617.05 samples/sec Loss 12.2595 LearningRate 0.0788 Epoch: 2 Global Step: 93110 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:11:34,404-Speed 2627.81 samples/sec Loss 12.2703 LearningRate 0.0788 Epoch: 2 Global Step: 93120 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:11:38,306-Speed 2624.93 samples/sec Loss 12.3557 LearningRate 0.0788 Epoch: 2 Global Step: 93130 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:11:42,212-Speed 2622.46 samples/sec Loss 12.5449 LearningRate 0.0788 Epoch: 2 Global Step: 93140 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:11:46,258-Speed 2531.92 samples/sec Loss 12.5243 LearningRate 0.0788 Epoch: 2 Global Step: 93150 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:11:50,269-Speed 2553.54 samples/sec Loss 12.4441 LearningRate 0.0788 Epoch: 2 Global Step: 93160 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:11:54,180-Speed 2618.93 samples/sec Loss 12.3306 LearningRate 0.0788 Epoch: 2 Global Step: 93170 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:11:58,071-Speed 2632.83 samples/sec Loss 12.3233 LearningRate 0.0788 Epoch: 2 Global Step: 93180 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:12:01,975-Speed 2623.67 samples/sec Loss 12.4181 LearningRate 0.0788 Epoch: 2 Global Step: 93190 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:12:05,888-Speed 2617.80 samples/sec Loss 12.4432 LearningRate 0.0788 Epoch: 2 Global Step: 93200 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:12:09,791-Speed 2623.82 samples/sec Loss 12.4207 LearningRate 0.0788 Epoch: 2 Global Step: 93210 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:12:13,681-Speed 2632.85 samples/sec Loss 12.4727 LearningRate 0.0788 Epoch: 2 Global Step: 93220 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:12:17,577-Speed 2629.09 samples/sec Loss 12.4071 LearningRate 0.0788 Epoch: 2 Global Step: 93230 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:12:21,473-Speed 2629.47 samples/sec Loss 12.4357 LearningRate 0.0788 Epoch: 2 Global Step: 93240 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:12:25,372-Speed 2626.90 samples/sec Loss 12.4518 LearningRate 0.0788 Epoch: 2 Global Step: 93250 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:12:29,278-Speed 2622.35 samples/sec Loss 12.4049 LearningRate 0.0788 Epoch: 2 Global Step: 93260 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:12:33,209-Speed 2605.81 samples/sec Loss 12.5645 LearningRate 0.0788 Epoch: 2 Global Step: 93270 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:12:37,119-Speed 2619.50 samples/sec Loss 12.3919 LearningRate 0.0788 Epoch: 2 Global Step: 93280 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:12:41,015-Speed 2628.61 samples/sec Loss 12.4464 LearningRate 0.0788 Epoch: 2 Global Step: 93290 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:12:44,924-Speed 2620.35 samples/sec Loss 12.3727 LearningRate 0.0788 Epoch: 2 Global Step: 93300 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:12:48,840-Speed 2615.79 samples/sec Loss 12.5044 LearningRate 0.0788 Epoch: 2 Global Step: 93310 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:12:52,744-Speed 2623.39 samples/sec Loss 12.4332 LearningRate 0.0788 Epoch: 2 Global Step: 93320 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:12:56,653-Speed 2620.87 samples/sec Loss 12.5418 LearningRate 0.0788 Epoch: 2 Global Step: 93330 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:13:00,554-Speed 2625.18 samples/sec Loss 12.2786 LearningRate 0.0788 Epoch: 2 Global Step: 93340 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:13:04,439-Speed 2637.31 samples/sec Loss 12.3841 LearningRate 0.0788 Epoch: 2 Global Step: 93350 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:13:08,342-Speed 2623.96 samples/sec Loss 12.3935 LearningRate 0.0788 Epoch: 2 Global Step: 93360 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:13:12,251-Speed 2620.01 samples/sec Loss 12.3972 LearningRate 0.0788 Epoch: 2 Global Step: 93370 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:13:16,153-Speed 2624.55 samples/sec Loss 12.3968 LearningRate 0.0788 Epoch: 2 Global Step: 93380 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:13:20,059-Speed 2622.64 samples/sec Loss 12.3372 LearningRate 0.0788 Epoch: 2 Global Step: 93390 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:13:23,959-Speed 2626.55 samples/sec Loss 12.2385 LearningRate 0.0788 Epoch: 2 Global Step: 93400 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:13:27,867-Speed 2621.16 samples/sec Loss 12.4934 LearningRate 0.0787 Epoch: 2 Global Step: 93410 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:13:31,771-Speed 2623.41 samples/sec Loss 12.3868 LearningRate 0.0787 Epoch: 2 Global Step: 93420 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:13:35,671-Speed 2627.24 samples/sec Loss 12.2568 LearningRate 0.0787 Epoch: 2 Global Step: 93430 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:13:39,572-Speed 2625.26 samples/sec Loss 12.3896 LearningRate 0.0787 Epoch: 2 Global Step: 93440 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:13:43,483-Speed 2618.61 samples/sec Loss 12.4257 LearningRate 0.0787 Epoch: 2 Global Step: 93450 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:13:47,505-Speed 2546.42 samples/sec Loss 12.4695 LearningRate 0.0787 Epoch: 2 Global Step: 93460 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:13:51,417-Speed 2618.76 samples/sec Loss 12.3422 LearningRate 0.0787 Epoch: 2 Global Step: 93470 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:13:55,316-Speed 2626.73 samples/sec Loss 12.3470 LearningRate 0.0787 Epoch: 2 Global Step: 93480 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:13:59,229-Speed 2617.45 samples/sec Loss 12.2831 LearningRate 0.0787 Epoch: 2 Global Step: 93490 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:14:03,142-Speed 2617.85 samples/sec Loss 12.3306 LearningRate 0.0787 Epoch: 2 Global Step: 93500 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:14:07,055-Speed 2617.24 samples/sec Loss 12.4846 LearningRate 0.0787 Epoch: 2 Global Step: 93510 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:14:10,967-Speed 2618.66 samples/sec Loss 12.3989 LearningRate 0.0787 Epoch: 2 Global Step: 93520 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:14:14,877-Speed 2619.20 samples/sec Loss 12.4299 LearningRate 0.0787 Epoch: 2 Global Step: 93530 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:14:18,791-Speed 2616.25 samples/sec Loss 12.3631 LearningRate 0.0787 Epoch: 2 Global Step: 93540 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:14:22,705-Speed 2617.37 samples/sec Loss 12.3723 LearningRate 0.0787 Epoch: 2 Global Step: 93550 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:14:26,615-Speed 2619.55 samples/sec Loss 12.3333 LearningRate 0.0787 Epoch: 2 Global Step: 93560 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:14:30,527-Speed 2618.28 samples/sec Loss 12.3776 LearningRate 0.0787 Epoch: 2 Global Step: 93570 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:14:34,437-Speed 2619.41 samples/sec Loss 12.3287 LearningRate 0.0787 Epoch: 2 Global Step: 93580 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:14:38,337-Speed 2626.27 samples/sec Loss 12.3492 LearningRate 0.0787 Epoch: 2 Global Step: 93590 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:14:42,244-Speed 2621.77 samples/sec Loss 12.5750 LearningRate 0.0787 Epoch: 2 Global Step: 93600 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:14:46,149-Speed 2623.41 samples/sec Loss 12.4599 LearningRate 0.0787 Epoch: 2 Global Step: 93610 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:14:50,057-Speed 2621.17 samples/sec Loss 12.4883 LearningRate 0.0787 Epoch: 2 Global Step: 93620 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:14:53,974-Speed 2614.49 samples/sec Loss 12.5040 LearningRate 0.0787 Epoch: 2 Global Step: 93630 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:14:57,888-Speed 2617.20 samples/sec Loss 12.5360 LearningRate 0.0787 Epoch: 2 Global Step: 93640 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:15:01,797-Speed 2620.56 samples/sec Loss 12.4002 LearningRate 0.0787 Epoch: 2 Global Step: 93650 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:15:05,702-Speed 2622.71 samples/sec Loss 12.3310 LearningRate 0.0787 Epoch: 2 Global Step: 93660 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:15:09,606-Speed 2623.22 samples/sec Loss 12.3921 LearningRate 0.0787 Epoch: 2 Global Step: 93670 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:15:13,519-Speed 2617.45 samples/sec Loss 12.2709 LearningRate 0.0787 Epoch: 2 Global Step: 93680 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:15:17,423-Speed 2624.48 samples/sec Loss 12.5061 LearningRate 0.0787 Epoch: 2 Global Step: 93690 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:15:21,323-Speed 2626.40 samples/sec Loss 12.3873 LearningRate 0.0787 Epoch: 2 Global Step: 93700 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:15:25,226-Speed 2624.28 samples/sec Loss 12.3583 LearningRate 0.0787 Epoch: 2 Global Step: 93710 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:15:29,126-Speed 2625.90 samples/sec Loss 12.4231 LearningRate 0.0787 Epoch: 2 Global Step: 93720 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:15:33,041-Speed 2616.66 samples/sec Loss 12.4079 LearningRate 0.0787 Epoch: 2 Global Step: 93730 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:15:36,933-Speed 2631.32 samples/sec Loss 12.4921 LearningRate 0.0787 Epoch: 2 Global Step: 93740 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:15:40,848-Speed 2616.14 samples/sec Loss 12.4686 LearningRate 0.0787 Epoch: 2 Global Step: 93750 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:15:44,751-Speed 2625.15 samples/sec Loss 12.3115 LearningRate 0.0787 Epoch: 2 Global Step: 93760 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:15:48,649-Speed 2627.28 samples/sec Loss 12.3932 LearningRate 0.0787 Epoch: 2 Global Step: 93770 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:15:52,558-Speed 2620.32 samples/sec Loss 12.2341 LearningRate 0.0787 Epoch: 2 Global Step: 93780 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:15:56,461-Speed 2624.16 samples/sec Loss 12.4485 LearningRate 0.0787 Epoch: 2 Global Step: 93790 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:16:00,364-Speed 2624.56 samples/sec Loss 12.3942 LearningRate 0.0787 Epoch: 2 Global Step: 93800 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:16:04,264-Speed 2626.29 samples/sec Loss 12.3259 LearningRate 0.0787 Epoch: 2 Global Step: 93810 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:16:08,169-Speed 2622.59 samples/sec Loss 12.3153 LearningRate 0.0787 Epoch: 2 Global Step: 93820 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:16:12,075-Speed 2622.51 samples/sec Loss 12.2968 LearningRate 0.0787 Epoch: 2 Global Step: 93830 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:16:15,980-Speed 2622.73 samples/sec Loss 12.3469 LearningRate 0.0787 Epoch: 2 Global Step: 93840 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:16:19,886-Speed 2622.46 samples/sec Loss 12.3895 LearningRate 0.0787 Epoch: 2 Global Step: 93850 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:16:23,789-Speed 2624.20 samples/sec Loss 12.2863 LearningRate 0.0787 Epoch: 2 Global Step: 93860 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:16:27,690-Speed 2625.06 samples/sec Loss 12.4112 LearningRate 0.0786 Epoch: 2 Global Step: 93870 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:16:31,580-Speed 2633.97 samples/sec Loss 12.4231 LearningRate 0.0786 Epoch: 2 Global Step: 93880 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:16:35,479-Speed 2626.68 samples/sec Loss 12.4350 LearningRate 0.0786 Epoch: 2 Global Step: 93890 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:16:39,382-Speed 2623.86 samples/sec Loss 12.3679 LearningRate 0.0786 Epoch: 2 Global Step: 93900 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:16:43,285-Speed 2624.47 samples/sec Loss 12.3763 LearningRate 0.0786 Epoch: 2 Global Step: 93910 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:16:47,187-Speed 2625.19 samples/sec Loss 12.4318 LearningRate 0.0786 Epoch: 2 Global Step: 93920 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:16:51,087-Speed 2626.89 samples/sec Loss 12.3540 LearningRate 0.0786 Epoch: 2 Global Step: 93930 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:16:54,983-Speed 2628.38 samples/sec Loss 12.3429 LearningRate 0.0786 Epoch: 2 Global Step: 93940 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:16:58,888-Speed 2623.65 samples/sec Loss 12.5026 LearningRate 0.0786 Epoch: 2 Global Step: 93950 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:17:02,791-Speed 2623.83 samples/sec Loss 12.4750 LearningRate 0.0786 Epoch: 2 Global Step: 93960 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:17:06,692-Speed 2625.60 samples/sec Loss 12.4858 LearningRate 0.0786 Epoch: 2 Global Step: 93970 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:17:10,598-Speed 2622.15 samples/sec Loss 12.3342 LearningRate 0.0786 Epoch: 2 Global Step: 93980 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:17:14,515-Speed 2615.21 samples/sec Loss 12.3048 LearningRate 0.0786 Epoch: 2 Global Step: 93990 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:17:18,434-Speed 2613.35 samples/sec Loss 12.3014 LearningRate 0.0786 Epoch: 2 Global Step: 94000 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:17:22,348-Speed 2616.82 samples/sec Loss 12.3041 LearningRate 0.0786 Epoch: 2 Global Step: 94010 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:17:26,247-Speed 2627.46 samples/sec Loss 12.4018 LearningRate 0.0786 Epoch: 2 Global Step: 94020 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:17:30,159-Speed 2618.39 samples/sec Loss 12.4283 LearningRate 0.0786 Epoch: 2 Global Step: 94030 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:17:34,059-Speed 2625.90 samples/sec Loss 12.3740 LearningRate 0.0786 Epoch: 2 Global Step: 94040 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:17:37,960-Speed 2625.66 samples/sec Loss 12.4738 LearningRate 0.0786 Epoch: 2 Global Step: 94050 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:17:41,859-Speed 2627.15 samples/sec Loss 12.2985 LearningRate 0.0786 Epoch: 2 Global Step: 94060 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:17:45,771-Speed 2618.45 samples/sec Loss 12.5016 LearningRate 0.0786 Epoch: 2 Global Step: 94070 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:17:49,682-Speed 2618.40 samples/sec Loss 12.3559 LearningRate 0.0786 Epoch: 2 Global Step: 94080 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:17:53,593-Speed 2619.26 samples/sec Loss 12.3603 LearningRate 0.0786 Epoch: 2 Global Step: 94090 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:17:57,478-Speed 2636.55 samples/sec Loss 12.3667 LearningRate 0.0786 Epoch: 2 Global Step: 94100 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:18:01,390-Speed 2618.24 samples/sec Loss 12.4346 LearningRate 0.0786 Epoch: 2 Global Step: 94110 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:18:05,296-Speed 2622.47 samples/sec Loss 12.4374 LearningRate 0.0786 Epoch: 2 Global Step: 94120 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:18:09,197-Speed 2625.61 samples/sec Loss 12.3677 LearningRate 0.0786 Epoch: 2 Global Step: 94130 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:18:13,097-Speed 2626.54 samples/sec Loss 12.3658 LearningRate 0.0786 Epoch: 2 Global Step: 94140 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:18:17,003-Speed 2622.09 samples/sec Loss 12.2916 LearningRate 0.0786 Epoch: 2 Global Step: 94150 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:18:20,932-Speed 2608.21 samples/sec Loss 12.5557 LearningRate 0.0786 Epoch: 2 Global Step: 94160 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:18:24,819-Speed 2635.14 samples/sec Loss 12.3815 LearningRate 0.0786 Epoch: 2 Global Step: 94170 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:18:28,727-Speed 2620.94 samples/sec Loss 12.3773 LearningRate 0.0786 Epoch: 2 Global Step: 94180 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:18:32,627-Speed 2625.95 samples/sec Loss 12.4166 LearningRate 0.0786 Epoch: 2 Global Step: 94190 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:18:36,584-Speed 2589.42 samples/sec Loss 12.4102 LearningRate 0.0786 Epoch: 2 Global Step: 94200 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:18:40,482-Speed 2627.38 samples/sec Loss 12.5074 LearningRate 0.0786 Epoch: 2 Global Step: 94210 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:18:44,383-Speed 2625.53 samples/sec Loss 12.5507 LearningRate 0.0786 Epoch: 2 Global Step: 94220 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:18:48,285-Speed 2625.00 samples/sec Loss 12.4041 LearningRate 0.0786 Epoch: 2 Global Step: 94230 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:18:52,185-Speed 2626.07 samples/sec Loss 12.5660 LearningRate 0.0786 Epoch: 2 Global Step: 94240 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:18:56,088-Speed 2624.57 samples/sec Loss 12.3250 LearningRate 0.0786 Epoch: 2 Global Step: 94250 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:18:59,988-Speed 2625.99 samples/sec Loss 12.3720 LearningRate 0.0786 Epoch: 2 Global Step: 94260 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:19:03,954-Speed 2583.33 samples/sec Loss 12.1959 LearningRate 0.0786 Epoch: 2 Global Step: 94270 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:19:07,853-Speed 2626.94 samples/sec Loss 12.4748 LearningRate 0.0786 Epoch: 2 Global Step: 94280 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:19:11,759-Speed 2622.10 samples/sec Loss 12.2754 LearningRate 0.0786 Epoch: 2 Global Step: 94290 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:19:15,673-Speed 2616.36 samples/sec Loss 12.4161 LearningRate 0.0786 Epoch: 2 Global Step: 94300 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:19:19,584-Speed 2619.27 samples/sec Loss 12.4346 LearningRate 0.0786 Epoch: 2 Global Step: 94310 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:19:23,486-Speed 2625.01 samples/sec Loss 12.3549 LearningRate 0.0786 Epoch: 2 Global Step: 94320 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:19:27,416-Speed 2606.65 samples/sec Loss 12.3873 LearningRate 0.0786 Epoch: 2 Global Step: 94330 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:19:31,452-Speed 2537.93 samples/sec Loss 12.4361 LearningRate 0.0785 Epoch: 2 Global Step: 94340 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:19:35,493-Speed 2535.03 samples/sec Loss 12.2632 LearningRate 0.0785 Epoch: 2 Global Step: 94350 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:19:39,385-Speed 2631.24 samples/sec Loss 12.4849 LearningRate 0.0785 Epoch: 2 Global Step: 94360 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:19:43,368-Speed 2571.60 samples/sec Loss 12.3731 LearningRate 0.0785 Epoch: 2 Global Step: 94370 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:19:47,270-Speed 2624.47 samples/sec Loss 12.4836 LearningRate 0.0785 Epoch: 2 Global Step: 94380 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:19:51,181-Speed 2619.44 samples/sec Loss 12.2414 LearningRate 0.0785 Epoch: 2 Global Step: 94390 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:19:55,076-Speed 2629.88 samples/sec Loss 12.2957 LearningRate 0.0785 Epoch: 2 Global Step: 94400 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:19:58,972-Speed 2628.93 samples/sec Loss 12.3698 LearningRate 0.0785 Epoch: 2 Global Step: 94410 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:20:02,875-Speed 2624.10 samples/sec Loss 12.2326 LearningRate 0.0785 Epoch: 2 Global Step: 94420 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:20:06,777-Speed 2624.97 samples/sec Loss 12.4585 LearningRate 0.0785 Epoch: 2 Global Step: 94430 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:20:10,676-Speed 2626.70 samples/sec Loss 12.4076 LearningRate 0.0785 Epoch: 2 Global Step: 94440 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:20:14,590-Speed 2616.55 samples/sec Loss 12.3624 LearningRate 0.0785 Epoch: 2 Global Step: 94450 Fp16 Grad Scale: 65536 Required: 83 hours
Training: 2022-04-13 06:20:18,494-Speed 2624.13 samples/sec Loss 12.3479 LearningRate 0.0785 Epoch: 2 Global Step: 94460 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:20:22,398-Speed 2623.42 samples/sec Loss 12.4136 LearningRate 0.0785 Epoch: 2 Global Step: 94470 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:20:26,304-Speed 2622.91 samples/sec Loss 12.2839 LearningRate 0.0785 Epoch: 2 Global Step: 94480 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:20:30,206-Speed 2624.99 samples/sec Loss 12.1890 LearningRate 0.0785 Epoch: 2 Global Step: 94490 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:20:34,104-Speed 2627.74 samples/sec Loss 12.3879 LearningRate 0.0785 Epoch: 2 Global Step: 94500 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:20:38,010-Speed 2622.23 samples/sec Loss 12.5266 LearningRate 0.0785 Epoch: 2 Global Step: 94510 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:20:41,912-Speed 2625.76 samples/sec Loss 12.5609 LearningRate 0.0785 Epoch: 2 Global Step: 94520 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:20:45,823-Speed 2618.26 samples/sec Loss 12.4626 LearningRate 0.0785 Epoch: 2 Global Step: 94530 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:20:49,723-Speed 2626.80 samples/sec Loss 12.2228 LearningRate 0.0785 Epoch: 2 Global Step: 94540 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:20:53,623-Speed 2626.33 samples/sec Loss 12.2862 LearningRate 0.0785 Epoch: 2 Global Step: 94550 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:20:57,538-Speed 2616.52 samples/sec Loss 12.4248 LearningRate 0.0785 Epoch: 2 Global Step: 94560 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:21:01,455-Speed 2614.80 samples/sec Loss 12.4841 LearningRate 0.0785 Epoch: 2 Global Step: 94570 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:21:05,355-Speed 2626.12 samples/sec Loss 12.2809 LearningRate 0.0785 Epoch: 2 Global Step: 94580 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:21:09,259-Speed 2623.63 samples/sec Loss 12.2275 LearningRate 0.0785 Epoch: 2 Global Step: 94590 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:21:13,171-Speed 2617.82 samples/sec Loss 12.3622 LearningRate 0.0785 Epoch: 2 Global Step: 94600 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:21:17,099-Speed 2608.54 samples/sec Loss 12.4808 LearningRate 0.0785 Epoch: 2 Global Step: 94610 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:21:21,007-Speed 2620.92 samples/sec Loss 12.3472 LearningRate 0.0785 Epoch: 2 Global Step: 94620 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:21:24,887-Speed 2639.60 samples/sec Loss 12.4973 LearningRate 0.0785 Epoch: 2 Global Step: 94630 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:21:28,784-Speed 2628.37 samples/sec Loss 12.2362 LearningRate 0.0785 Epoch: 2 Global Step: 94640 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:21:32,685-Speed 2626.19 samples/sec Loss 12.3176 LearningRate 0.0785 Epoch: 2 Global Step: 94650 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:21:36,587-Speed 2624.76 samples/sec Loss 12.5733 LearningRate 0.0785 Epoch: 2 Global Step: 94660 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:21:40,484-Speed 2628.24 samples/sec Loss 12.4492 LearningRate 0.0785 Epoch: 2 Global Step: 94670 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:21:44,388-Speed 2623.37 samples/sec Loss 12.4918 LearningRate 0.0785 Epoch: 2 Global Step: 94680 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:21:48,292-Speed 2623.59 samples/sec Loss 12.5127 LearningRate 0.0785 Epoch: 2 Global Step: 94690 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:21:52,218-Speed 2608.94 samples/sec Loss 12.2132 LearningRate 0.0785 Epoch: 2 Global Step: 94700 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:21:56,119-Speed 2625.69 samples/sec Loss 12.2472 LearningRate 0.0785 Epoch: 2 Global Step: 94710 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:22:00,019-Speed 2626.32 samples/sec Loss 12.1816 LearningRate 0.0785 Epoch: 2 Global Step: 94720 Fp16 Grad Scale: 131072 Required: 83 hours
Training: 2022-04-13 06:22:03,918-Speed 2627.05 samples/sec Loss 12.3433 LearningRate 0.0785 Epoch: 2 Global Step: 94730 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:22:07,820-Speed 2624.78 samples/sec Loss 12.3686 LearningRate 0.0785 Epoch: 2 Global Step: 94740 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:22:11,731-Speed 2619.17 samples/sec Loss 12.4657 LearningRate 0.0785 Epoch: 2 Global Step: 94750 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:22:15,627-Speed 2629.15 samples/sec Loss 12.5813 LearningRate 0.0785 Epoch: 2 Global Step: 94760 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:22:19,530-Speed 2624.54 samples/sec Loss 12.3781 LearningRate 0.0785 Epoch: 2 Global Step: 94770 Fp16 Grad Scale: 262144 Required: 83 hours
Training: 2022-04-13 06:22:23,560-Speed 2541.65 samples/sec Loss 12.4180 LearningRate 0.0785 Epoch: 2 Global Step: 94780 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:22:27,463-Speed 2623.99 samples/sec Loss 12.5558 LearningRate 0.0785 Epoch: 2 Global Step: 94790 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:22:31,348-Speed 2636.90 samples/sec Loss 12.4274 LearningRate 0.0785 Epoch: 2 Global Step: 94800 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:22:35,268-Speed 2612.57 samples/sec Loss 12.4152 LearningRate 0.0784 Epoch: 2 Global Step: 94810 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:22:39,166-Speed 2627.54 samples/sec Loss 12.3513 LearningRate 0.0784 Epoch: 2 Global Step: 94820 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:22:43,065-Speed 2626.61 samples/sec Loss 12.3886 LearningRate 0.0784 Epoch: 2 Global Step: 94830 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:22:46,966-Speed 2626.27 samples/sec Loss 12.3181 LearningRate 0.0784 Epoch: 2 Global Step: 94840 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:22:50,868-Speed 2625.04 samples/sec Loss 12.4398 LearningRate 0.0784 Epoch: 2 Global Step: 94850 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:22:54,772-Speed 2623.53 samples/sec Loss 12.2982 LearningRate 0.0784 Epoch: 2 Global Step: 94860 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:22:58,676-Speed 2623.74 samples/sec Loss 12.3589 LearningRate 0.0784 Epoch: 2 Global Step: 94870 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:23:02,603-Speed 2608.35 samples/sec Loss 12.2600 LearningRate 0.0784 Epoch: 2 Global Step: 94880 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:23:06,504-Speed 2625.14 samples/sec Loss 12.2818 LearningRate 0.0784 Epoch: 2 Global Step: 94890 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:23:10,405-Speed 2625.47 samples/sec Loss 12.2442 LearningRate 0.0784 Epoch: 2 Global Step: 94900 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:23:14,303-Speed 2627.44 samples/sec Loss 12.5001 LearningRate 0.0784 Epoch: 2 Global Step: 94910 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:23:18,117-Speed 2686.07 samples/sec Loss 12.4129 LearningRate 0.0784 Epoch: 2 Global Step: 94920 Fp16 Grad Scale: 8192 Required: 82 hours
Training: 2022-04-13 06:23:22,013-Speed 2629.09 samples/sec Loss 12.5417 LearningRate 0.0784 Epoch: 2 Global Step: 94930 Fp16 Grad Scale: 8192 Required: 82 hours
Training: 2022-04-13 06:23:25,909-Speed 2629.02 samples/sec Loss 12.4410 LearningRate 0.0784 Epoch: 2 Global Step: 94940 Fp16 Grad Scale: 8192 Required: 82 hours
Training: 2022-04-13 06:23:29,847-Speed 2601.34 samples/sec Loss 12.3869 LearningRate 0.0784 Epoch: 2 Global Step: 94950 Fp16 Grad Scale: 8192 Required: 82 hours
Training: 2022-04-13 06:23:33,742-Speed 2629.53 samples/sec Loss 12.4389 LearningRate 0.0784 Epoch: 2 Global Step: 94960 Fp16 Grad Scale: 8192 Required: 82 hours
Training: 2022-04-13 06:23:37,642-Speed 2625.53 samples/sec Loss 12.3814 LearningRate 0.0784 Epoch: 2 Global Step: 94970 Fp16 Grad Scale: 8192 Required: 82 hours
Training: 2022-04-13 06:23:41,540-Speed 2627.79 samples/sec Loss 12.4704 LearningRate 0.0784 Epoch: 2 Global Step: 94980 Fp16 Grad Scale: 8192 Required: 82 hours
Training: 2022-04-13 06:23:45,437-Speed 2628.60 samples/sec Loss 12.3706 LearningRate 0.0784 Epoch: 2 Global Step: 94990 Fp16 Grad Scale: 8192 Required: 82 hours
Training: 2022-04-13 06:23:49,346-Speed 2620.24 samples/sec Loss 12.3828 LearningRate 0.0784 Epoch: 2 Global Step: 95000 Fp16 Grad Scale: 8192 Required: 82 hours
Training: 2022-04-13 06:23:53,258-Speed 2618.59 samples/sec Loss 12.4051 LearningRate 0.0784 Epoch: 2 Global Step: 95010 Fp16 Grad Scale: 8192 Required: 82 hours
Training: 2022-04-13 06:23:57,161-Speed 2624.01 samples/sec Loss 12.5102 LearningRate 0.0784 Epoch: 2 Global Step: 95020 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 06:24:01,068-Speed 2621.91 samples/sec Loss 12.3514 LearningRate 0.0784 Epoch: 2 Global Step: 95030 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 06:24:04,969-Speed 2625.21 samples/sec Loss 12.3649 LearningRate 0.0784 Epoch: 2 Global Step: 95040 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 06:24:08,879-Speed 2619.64 samples/sec Loss 12.5036 LearningRate 0.0784 Epoch: 2 Global Step: 95050 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 06:24:12,770-Speed 2632.25 samples/sec Loss 12.2806 LearningRate 0.0784 Epoch: 2 Global Step: 95060 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 06:24:16,669-Speed 2626.94 samples/sec Loss 12.4772 LearningRate 0.0784 Epoch: 2 Global Step: 95070 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 06:24:20,571-Speed 2625.04 samples/sec Loss 12.3377 LearningRate 0.0784 Epoch: 2 Global Step: 95080 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 06:24:24,470-Speed 2627.03 samples/sec Loss 12.4531 LearningRate 0.0784 Epoch: 2 Global Step: 95090 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 06:24:28,391-Speed 2612.58 samples/sec Loss 12.4295 LearningRate 0.0784 Epoch: 2 Global Step: 95100 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 06:24:32,317-Speed 2608.45 samples/sec Loss 12.1799 LearningRate 0.0784 Epoch: 2 Global Step: 95110 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 06:24:36,286-Speed 2581.01 samples/sec Loss 12.2615 LearningRate 0.0784 Epoch: 2 Global Step: 95120 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 06:24:40,201-Speed 2615.93 samples/sec Loss 12.3562 LearningRate 0.0784 Epoch: 2 Global Step: 95130 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 06:24:44,134-Speed 2604.67 samples/sec Loss 12.3918 LearningRate 0.0784 Epoch: 2 Global Step: 95140 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 06:24:48,048-Speed 2616.41 samples/sec Loss 12.2960 LearningRate 0.0784 Epoch: 2 Global Step: 95150 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 06:24:51,978-Speed 2606.75 samples/sec Loss 12.4647 LearningRate 0.0784 Epoch: 2 Global Step: 95160 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 06:24:55,875-Speed 2628.40 samples/sec Loss 12.3674 LearningRate 0.0784 Epoch: 2 Global Step: 95170 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 06:24:59,789-Speed 2617.34 samples/sec Loss 12.3257 LearningRate 0.0784 Epoch: 2 Global Step: 95180 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 06:25:03,687-Speed 2627.72 samples/sec Loss 12.3591 LearningRate 0.0784 Epoch: 2 Global Step: 95190 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 06:25:07,590-Speed 2624.22 samples/sec Loss 12.4246 LearningRate 0.0784 Epoch: 2 Global Step: 95200 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 06:25:11,493-Speed 2623.85 samples/sec Loss 12.3298 LearningRate 0.0784 Epoch: 2 Global Step: 95210 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 06:25:15,410-Speed 2615.40 samples/sec Loss 12.3856 LearningRate 0.0784 Epoch: 2 Global Step: 95220 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:25:19,320-Speed 2619.69 samples/sec Loss 12.3294 LearningRate 0.0784 Epoch: 2 Global Step: 95230 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:25:23,226-Speed 2622.31 samples/sec Loss 12.4524 LearningRate 0.0784 Epoch: 2 Global Step: 95240 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:25:27,139-Speed 2618.08 samples/sec Loss 12.2790 LearningRate 0.0784 Epoch: 2 Global Step: 95250 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:25:31,042-Speed 2624.16 samples/sec Loss 12.3578 LearningRate 0.0784 Epoch: 2 Global Step: 95260 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:25:34,939-Speed 2628.29 samples/sec Loss 12.2463 LearningRate 0.0784 Epoch: 2 Global Step: 95270 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:25:38,844-Speed 2622.90 samples/sec Loss 12.2554 LearningRate 0.0783 Epoch: 2 Global Step: 95280 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:25:42,749-Speed 2622.92 samples/sec Loss 12.4621 LearningRate 0.0783 Epoch: 2 Global Step: 95290 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:25:46,654-Speed 2622.89 samples/sec Loss 12.5071 LearningRate 0.0783 Epoch: 2 Global Step: 95300 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:25:50,582-Speed 2607.85 samples/sec Loss 12.1942 LearningRate 0.0783 Epoch: 2 Global Step: 95310 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:25:54,547-Speed 2583.39 samples/sec Loss 12.3049 LearningRate 0.0783 Epoch: 2 Global Step: 95320 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:25:58,449-Speed 2625.35 samples/sec Loss 12.3074 LearningRate 0.0783 Epoch: 2 Global Step: 95330 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:26:02,350-Speed 2625.38 samples/sec Loss 12.3310 LearningRate 0.0783 Epoch: 2 Global Step: 95340 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:26:06,246-Speed 2629.47 samples/sec Loss 12.2558 LearningRate 0.0783 Epoch: 2 Global Step: 95350 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:26:10,156-Speed 2619.71 samples/sec Loss 12.3984 LearningRate 0.0783 Epoch: 2 Global Step: 95360 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:26:14,057-Speed 2625.37 samples/sec Loss 12.3498 LearningRate 0.0783 Epoch: 2 Global Step: 95370 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:26:17,954-Speed 2627.89 samples/sec Loss 12.3842 LearningRate 0.0783 Epoch: 2 Global Step: 95380 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:26:21,862-Speed 2621.54 samples/sec Loss 12.1875 LearningRate 0.0783 Epoch: 2 Global Step: 95390 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:26:25,780-Speed 2613.71 samples/sec Loss 12.4236 LearningRate 0.0783 Epoch: 2 Global Step: 95400 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:26:29,695-Speed 2616.41 samples/sec Loss 12.2960 LearningRate 0.0783 Epoch: 2 Global Step: 95410 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:26:33,589-Speed 2630.38 samples/sec Loss 12.4678 LearningRate 0.0783 Epoch: 2 Global Step: 95420 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:26:37,489-Speed 2626.44 samples/sec Loss 12.3078 LearningRate 0.0783 Epoch: 2 Global Step: 95430 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:26:41,397-Speed 2620.41 samples/sec Loss 12.3673 LearningRate 0.0783 Epoch: 2 Global Step: 95440 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:26:45,281-Speed 2636.98 samples/sec Loss 12.2580 LearningRate 0.0783 Epoch: 2 Global Step: 95450 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:26:49,189-Speed 2621.36 samples/sec Loss 12.5007 LearningRate 0.0783 Epoch: 2 Global Step: 95460 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:26:53,089-Speed 2626.05 samples/sec Loss 12.5400 LearningRate 0.0783 Epoch: 2 Global Step: 95470 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:26:56,988-Speed 2627.71 samples/sec Loss 12.2965 LearningRate 0.0783 Epoch: 2 Global Step: 95480 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:27:00,887-Speed 2626.61 samples/sec Loss 12.3505 LearningRate 0.0783 Epoch: 2 Global Step: 95490 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:27:04,791-Speed 2623.41 samples/sec Loss 12.4877 LearningRate 0.0783 Epoch: 2 Global Step: 95500 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:27:08,694-Speed 2624.23 samples/sec Loss 12.6136 LearningRate 0.0783 Epoch: 2 Global Step: 95510 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:27:12,591-Speed 2628.86 samples/sec Loss 12.3638 LearningRate 0.0783 Epoch: 2 Global Step: 95520 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:27:16,506-Speed 2615.86 samples/sec Loss 12.3673 LearningRate 0.0783 Epoch: 2 Global Step: 95530 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:27:20,406-Speed 2626.68 samples/sec Loss 12.3588 LearningRate 0.0783 Epoch: 2 Global Step: 95540 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:27:24,334-Speed 2607.86 samples/sec Loss 12.4866 LearningRate 0.0783 Epoch: 2 Global Step: 95550 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:27:28,231-Speed 2628.80 samples/sec Loss 12.2813 LearningRate 0.0783 Epoch: 2 Global Step: 95560 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:27:32,138-Speed 2620.98 samples/sec Loss 12.2886 LearningRate 0.0783 Epoch: 2 Global Step: 95570 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:27:36,042-Speed 2624.09 samples/sec Loss 12.2301 LearningRate 0.0783 Epoch: 2 Global Step: 95580 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:27:39,941-Speed 2626.74 samples/sec Loss 12.3765 LearningRate 0.0783 Epoch: 2 Global Step: 95590 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:27:43,866-Speed 2609.21 samples/sec Loss 12.2687 LearningRate 0.0783 Epoch: 2 Global Step: 95600 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:27:47,768-Speed 2625.54 samples/sec Loss 12.4446 LearningRate 0.0783 Epoch: 2 Global Step: 95610 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:27:51,664-Speed 2628.81 samples/sec Loss 12.3033 LearningRate 0.0783 Epoch: 2 Global Step: 95620 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:27:55,611-Speed 2595.29 samples/sec Loss 12.3672 LearningRate 0.0783 Epoch: 2 Global Step: 95630 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:27:59,510-Speed 2626.92 samples/sec Loss 12.9900 LearningRate 0.0783 Epoch: 2 Global Step: 95640 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:28:03,409-Speed 2626.98 samples/sec Loss 12.7384 LearningRate 0.0783 Epoch: 2 Global Step: 95650 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:28:07,336-Speed 2608.49 samples/sec Loss 12.4522 LearningRate 0.0783 Epoch: 2 Global Step: 95660 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:28:11,389-Speed 2527.54 samples/sec Loss 12.5440 LearningRate 0.0783 Epoch: 2 Global Step: 95670 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:28:15,312-Speed 2610.53 samples/sec Loss 12.5117 LearningRate 0.0783 Epoch: 2 Global Step: 95680 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:28:19,203-Speed 2633.18 samples/sec Loss 12.5504 LearningRate 0.0783 Epoch: 2 Global Step: 95690 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:28:23,094-Speed 2632.13 samples/sec Loss 12.3565 LearningRate 0.0783 Epoch: 2 Global Step: 95700 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:28:26,992-Speed 2627.92 samples/sec Loss 12.3879 LearningRate 0.0783 Epoch: 2 Global Step: 95710 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:28:30,893-Speed 2625.54 samples/sec Loss 12.3992 LearningRate 0.0783 Epoch: 2 Global Step: 95720 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:28:34,788-Speed 2629.41 samples/sec Loss 12.3691 LearningRate 0.0783 Epoch: 2 Global Step: 95730 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:28:38,680-Speed 2631.82 samples/sec Loss 12.6148 LearningRate 0.0783 Epoch: 2 Global Step: 95740 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:28:42,579-Speed 2627.13 samples/sec Loss 12.4007 LearningRate 0.0782 Epoch: 2 Global Step: 95750 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:28:46,475-Speed 2628.51 samples/sec Loss 12.4408 LearningRate 0.0782 Epoch: 2 Global Step: 95760 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:28:50,388-Speed 2618.17 samples/sec Loss 12.4624 LearningRate 0.0782 Epoch: 2 Global Step: 95770 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:28:54,308-Speed 2612.97 samples/sec Loss 12.5665 LearningRate 0.0782 Epoch: 2 Global Step: 95780 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:28:58,206-Speed 2627.55 samples/sec Loss 12.3666 LearningRate 0.0782 Epoch: 2 Global Step: 95790 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:29:02,108-Speed 2624.81 samples/sec Loss 12.4307 LearningRate 0.0782 Epoch: 2 Global Step: 95800 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:29:06,005-Speed 2627.87 samples/sec Loss 12.5336 LearningRate 0.0782 Epoch: 2 Global Step: 95810 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:29:09,907-Speed 2625.27 samples/sec Loss 12.4109 LearningRate 0.0782 Epoch: 2 Global Step: 95820 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:29:13,805-Speed 2627.82 samples/sec Loss 12.4817 LearningRate 0.0782 Epoch: 2 Global Step: 95830 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:29:17,707-Speed 2624.68 samples/sec Loss 12.4335 LearningRate 0.0782 Epoch: 2 Global Step: 95840 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:29:21,618-Speed 2619.13 samples/sec Loss 12.1688 LearningRate 0.0782 Epoch: 2 Global Step: 95850 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:29:25,526-Speed 2620.92 samples/sec Loss 12.4334 LearningRate 0.0782 Epoch: 2 Global Step: 95860 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:29:29,437-Speed 2618.98 samples/sec Loss 12.3509 LearningRate 0.0782 Epoch: 2 Global Step: 95870 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:29:33,360-Speed 2610.34 samples/sec Loss 12.5021 LearningRate 0.0782 Epoch: 2 Global Step: 95880 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:29:37,279-Speed 2613.97 samples/sec Loss 12.4403 LearningRate 0.0782 Epoch: 2 Global Step: 95890 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:29:41,176-Speed 2628.07 samples/sec Loss 12.3506 LearningRate 0.0782 Epoch: 2 Global Step: 95900 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:29:45,069-Speed 2630.91 samples/sec Loss 12.3862 LearningRate 0.0782 Epoch: 2 Global Step: 95910 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:29:48,967-Speed 2629.35 samples/sec Loss 12.4708 LearningRate 0.0782 Epoch: 2 Global Step: 95920 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:29:52,845-Speed 2641.13 samples/sec Loss 12.2250 LearningRate 0.0782 Epoch: 2 Global Step: 95930 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:29:56,775-Speed 2606.17 samples/sec Loss 12.2520 LearningRate 0.0782 Epoch: 2 Global Step: 95940 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:30:00,671-Speed 2628.90 samples/sec Loss 12.2116 LearningRate 0.0782 Epoch: 2 Global Step: 95950 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:30:04,564-Speed 2630.95 samples/sec Loss 12.3021 LearningRate 0.0782 Epoch: 2 Global Step: 95960 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:30:08,459-Speed 2629.67 samples/sec Loss 12.1989 LearningRate 0.0782 Epoch: 2 Global Step: 95970 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:30:12,360-Speed 2625.80 samples/sec Loss 12.3298 LearningRate 0.0782 Epoch: 2 Global Step: 95980 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:30:16,256-Speed 2628.86 samples/sec Loss 12.2920 LearningRate 0.0782 Epoch: 2 Global Step: 95990 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:30:20,221-Speed 2583.04 samples/sec Loss 12.4009 LearningRate 0.0782 Epoch: 2 Global Step: 96000 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:30:24,155-Speed 2603.09 samples/sec Loss 12.4395 LearningRate 0.0782 Epoch: 2 Global Step: 96010 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:30:28,051-Speed 2629.62 samples/sec Loss 12.5021 LearningRate 0.0782 Epoch: 2 Global Step: 96020 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:30:31,960-Speed 2620.13 samples/sec Loss 12.3987 LearningRate 0.0782 Epoch: 2 Global Step: 96030 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:30:35,869-Speed 2620.48 samples/sec Loss 12.5961 LearningRate 0.0782 Epoch: 2 Global Step: 96040 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:30:39,773-Speed 2623.25 samples/sec Loss 12.3366 LearningRate 0.0782 Epoch: 2 Global Step: 96050 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:30:43,674-Speed 2625.61 samples/sec Loss 12.3436 LearningRate 0.0782 Epoch: 2 Global Step: 96060 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:30:47,573-Speed 2627.41 samples/sec Loss 12.4092 LearningRate 0.0782 Epoch: 2 Global Step: 96070 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:30:51,480-Speed 2621.70 samples/sec Loss 12.5041 LearningRate 0.0782 Epoch: 2 Global Step: 96080 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:30:55,379-Speed 2626.88 samples/sec Loss 12.5216 LearningRate 0.0782 Epoch: 2 Global Step: 96090 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:30:59,296-Speed 2615.20 samples/sec Loss 12.3386 LearningRate 0.0782 Epoch: 2 Global Step: 96100 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:31:03,216-Speed 2612.58 samples/sec Loss 12.4276 LearningRate 0.0782 Epoch: 2 Global Step: 96110 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:31:07,114-Speed 2628.29 samples/sec Loss 12.2136 LearningRate 0.0782 Epoch: 2 Global Step: 96120 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:31:11,022-Speed 2620.57 samples/sec Loss 12.3971 LearningRate 0.0782 Epoch: 2 Global Step: 96130 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:31:14,923-Speed 2625.39 samples/sec Loss 12.3342 LearningRate 0.0782 Epoch: 2 Global Step: 96140 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:31:18,825-Speed 2625.13 samples/sec Loss 12.4786 LearningRate 0.0782 Epoch: 2 Global Step: 96150 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:31:22,719-Speed 2630.58 samples/sec Loss 12.4734 LearningRate 0.0782 Epoch: 2 Global Step: 96160 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:31:26,623-Speed 2623.80 samples/sec Loss 12.3421 LearningRate 0.0782 Epoch: 2 Global Step: 96170 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:31:30,515-Speed 2632.48 samples/sec Loss 12.3205 LearningRate 0.0782 Epoch: 2 Global Step: 96180 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:31:34,411-Speed 2628.38 samples/sec Loss 12.6137 LearningRate 0.0782 Epoch: 2 Global Step: 96190 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:31:38,325-Speed 2617.30 samples/sec Loss 12.2979 LearningRate 0.0782 Epoch: 2 Global Step: 96200 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:31:42,218-Speed 2631.43 samples/sec Loss 12.1817 LearningRate 0.0782 Epoch: 2 Global Step: 96210 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:31:46,115-Speed 2628.35 samples/sec Loss 12.1643 LearningRate 0.0781 Epoch: 2 Global Step: 96220 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:31:50,038-Speed 2610.56 samples/sec Loss 12.2645 LearningRate 0.0781 Epoch: 2 Global Step: 96230 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:31:54,058-Speed 2548.10 samples/sec Loss 12.2766 LearningRate 0.0781 Epoch: 2 Global Step: 96240 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:31:58,131-Speed 2514.86 samples/sec Loss 12.3582 LearningRate 0.0781 Epoch: 2 Global Step: 96250 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:32:02,202-Speed 2515.93 samples/sec Loss 12.3579 LearningRate 0.0781 Epoch: 2 Global Step: 96260 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:32:06,130-Speed 2607.96 samples/sec Loss 12.2371 LearningRate 0.0781 Epoch: 2 Global Step: 96270 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:32:10,032-Speed 2624.37 samples/sec Loss 12.3373 LearningRate 0.0781 Epoch: 2 Global Step: 96280 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:32:13,938-Speed 2622.36 samples/sec Loss 12.3841 LearningRate 0.0781 Epoch: 2 Global Step: 96290 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:32:17,829-Speed 2631.71 samples/sec Loss 12.3542 LearningRate 0.0781 Epoch: 2 Global Step: 96300 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:32:21,721-Speed 2632.13 samples/sec Loss 12.2693 LearningRate 0.0781 Epoch: 2 Global Step: 96310 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:32:25,619-Speed 2627.37 samples/sec Loss 12.2868 LearningRate 0.0781 Epoch: 2 Global Step: 96320 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:32:29,500-Speed 2639.11 samples/sec Loss 12.2014 LearningRate 0.0781 Epoch: 2 Global Step: 96330 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:32:33,394-Speed 2630.70 samples/sec Loss 12.3983 LearningRate 0.0781 Epoch: 2 Global Step: 96340 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:32:37,300-Speed 2622.34 samples/sec Loss 12.3680 LearningRate 0.0781 Epoch: 2 Global Step: 96350 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:32:41,191-Speed 2632.32 samples/sec Loss 12.3394 LearningRate 0.0781 Epoch: 2 Global Step: 96360 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:32:45,084-Speed 2630.88 samples/sec Loss 12.3035 LearningRate 0.0781 Epoch: 2 Global Step: 96370 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:32:48,978-Speed 2630.74 samples/sec Loss 12.4094 LearningRate 0.0781 Epoch: 2 Global Step: 96380 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:32:52,853-Speed 2642.88 samples/sec Loss 12.2119 LearningRate 0.0781 Epoch: 2 Global Step: 96390 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:32:56,763-Speed 2619.61 samples/sec Loss 12.3041 LearningRate 0.0781 Epoch: 2 Global Step: 96400 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:33:00,657-Speed 2630.01 samples/sec Loss 12.3963 LearningRate 0.0781 Epoch: 2 Global Step: 96410 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:33:04,551-Speed 2630.13 samples/sec Loss 12.3643 LearningRate 0.0781 Epoch: 2 Global Step: 96420 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:33:08,458-Speed 2621.95 samples/sec Loss 12.3685 LearningRate 0.0781 Epoch: 2 Global Step: 96430 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:33:12,389-Speed 2606.15 samples/sec Loss 12.2676 LearningRate 0.0781 Epoch: 2 Global Step: 96440 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:33:16,279-Speed 2632.46 samples/sec Loss 12.2976 LearningRate 0.0781 Epoch: 2 Global Step: 96450 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:33:20,173-Speed 2630.82 samples/sec Loss 12.2655 LearningRate 0.0781 Epoch: 2 Global Step: 96460 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:33:24,064-Speed 2631.93 samples/sec Loss 12.2929 LearningRate 0.0781 Epoch: 2 Global Step: 96470 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:33:27,956-Speed 2632.15 samples/sec Loss 12.3740 LearningRate 0.0781 Epoch: 2 Global Step: 96480 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:33:31,852-Speed 2628.71 samples/sec Loss 12.3753 LearningRate 0.0781 Epoch: 2 Global Step: 96490 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:33:35,752-Speed 2625.73 samples/sec Loss 12.4954 LearningRate 0.0781 Epoch: 2 Global Step: 96500 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:33:39,658-Speed 2622.21 samples/sec Loss 12.3497 LearningRate 0.0781 Epoch: 2 Global Step: 96510 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:33:43,553-Speed 2629.98 samples/sec Loss 12.2957 LearningRate 0.0781 Epoch: 2 Global Step: 96520 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:33:47,446-Speed 2631.39 samples/sec Loss 12.3652 LearningRate 0.0781 Epoch: 2 Global Step: 96530 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:33:51,361-Speed 2616.37 samples/sec Loss 12.3492 LearningRate 0.0781 Epoch: 2 Global Step: 96540 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:33:55,262-Speed 2625.72 samples/sec Loss 12.3450 LearningRate 0.0781 Epoch: 2 Global Step: 96550 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:33:59,163-Speed 2625.91 samples/sec Loss 12.3397 LearningRate 0.0781 Epoch: 2 Global Step: 96560 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:34:03,048-Speed 2635.92 samples/sec Loss 12.1974 LearningRate 0.0781 Epoch: 2 Global Step: 96570 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:34:06,952-Speed 2623.95 samples/sec Loss 12.5146 LearningRate 0.0781 Epoch: 2 Global Step: 96580 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:34:10,850-Speed 2627.43 samples/sec Loss 12.4264 LearningRate 0.0781 Epoch: 2 Global Step: 96590 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:34:14,755-Speed 2623.36 samples/sec Loss 12.4100 LearningRate 0.0781 Epoch: 2 Global Step: 96600 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:34:18,656-Speed 2625.83 samples/sec Loss 12.2056 LearningRate 0.0781 Epoch: 2 Global Step: 96610 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:34:22,550-Speed 2630.01 samples/sec Loss 12.2610 LearningRate 0.0781 Epoch: 2 Global Step: 96620 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:34:26,441-Speed 2632.13 samples/sec Loss 12.3153 LearningRate 0.0781 Epoch: 2 Global Step: 96630 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:34:30,335-Speed 2630.55 samples/sec Loss 12.4366 LearningRate 0.0781 Epoch: 2 Global Step: 96640 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:34:34,227-Speed 2631.44 samples/sec Loss 12.3978 LearningRate 0.0781 Epoch: 2 Global Step: 96650 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:34:38,142-Speed 2616.01 samples/sec Loss 12.3955 LearningRate 0.0781 Epoch: 2 Global Step: 96660 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:34:42,036-Speed 2630.75 samples/sec Loss 12.5405 LearningRate 0.0781 Epoch: 2 Global Step: 96670 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:34:45,940-Speed 2623.15 samples/sec Loss 12.2334 LearningRate 0.0780 Epoch: 2 Global Step: 96680 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:34:49,834-Speed 2630.97 samples/sec Loss 12.4347 LearningRate 0.0780 Epoch: 2 Global Step: 96690 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:34:53,733-Speed 2626.43 samples/sec Loss 12.3697 LearningRate 0.0780 Epoch: 2 Global Step: 96700 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:34:57,664-Speed 2606.07 samples/sec Loss 12.3712 LearningRate 0.0780 Epoch: 2 Global Step: 96710 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:35:01,550-Speed 2635.75 samples/sec Loss 12.2798 LearningRate 0.0780 Epoch: 2 Global Step: 96720 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:35:05,459-Speed 2619.68 samples/sec Loss 12.2087 LearningRate 0.0780 Epoch: 2 Global Step: 96730 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:35:09,363-Speed 2623.67 samples/sec Loss 12.4304 LearningRate 0.0780 Epoch: 2 Global Step: 96740 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:35:13,268-Speed 2623.66 samples/sec Loss 12.2959 LearningRate 0.0780 Epoch: 2 Global Step: 96750 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:35:17,185-Speed 2614.46 samples/sec Loss 12.2907 LearningRate 0.0780 Epoch: 2 Global Step: 96760 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:35:21,166-Speed 2572.77 samples/sec Loss 12.3801 LearningRate 0.0780 Epoch: 2 Global Step: 96770 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:35:25,265-Speed 2498.97 samples/sec Loss 12.2830 LearningRate 0.0780 Epoch: 2 Global Step: 96780 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:35:29,170-Speed 2623.34 samples/sec Loss 12.2197 LearningRate 0.0780 Epoch: 2 Global Step: 96790 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:35:33,068-Speed 2627.67 samples/sec Loss 12.3357 LearningRate 0.0780 Epoch: 2 Global Step: 96800 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:35:36,977-Speed 2619.77 samples/sec Loss 12.3319 LearningRate 0.0780 Epoch: 2 Global Step: 96810 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:35:40,885-Speed 2620.67 samples/sec Loss 12.4075 LearningRate 0.0780 Epoch: 2 Global Step: 96820 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:35:44,788-Speed 2624.80 samples/sec Loss 12.2945 LearningRate 0.0780 Epoch: 2 Global Step: 96830 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:35:48,684-Speed 2629.40 samples/sec Loss 12.2732 LearningRate 0.0780 Epoch: 2 Global Step: 96840 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:35:52,578-Speed 2629.96 samples/sec Loss 12.3866 LearningRate 0.0780 Epoch: 2 Global Step: 96850 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:35:56,474-Speed 2629.22 samples/sec Loss 12.1310 LearningRate 0.0780 Epoch: 2 Global Step: 96860 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:36:00,369-Speed 2629.33 samples/sec Loss 12.2598 LearningRate 0.0780 Epoch: 2 Global Step: 96870 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:36:04,270-Speed 2625.58 samples/sec Loss 12.4297 LearningRate 0.0780 Epoch: 2 Global Step: 96880 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:36:08,172-Speed 2624.83 samples/sec Loss 12.3366 LearningRate 0.0780 Epoch: 2 Global Step: 96890 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:36:12,069-Speed 2628.37 samples/sec Loss 12.3643 LearningRate 0.0780 Epoch: 2 Global Step: 96900 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:36:15,967-Speed 2627.42 samples/sec Loss 12.4281 LearningRate 0.0780 Epoch: 2 Global Step: 96910 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:36:19,850-Speed 2637.87 samples/sec Loss 12.2273 LearningRate 0.0780 Epoch: 2 Global Step: 96920 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:36:23,740-Speed 2633.24 samples/sec Loss 12.1373 LearningRate 0.0780 Epoch: 2 Global Step: 96930 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:36:27,630-Speed 2632.86 samples/sec Loss 12.1727 LearningRate 0.0780 Epoch: 2 Global Step: 96940 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:36:31,524-Speed 2630.46 samples/sec Loss 12.3128 LearningRate 0.0780 Epoch: 2 Global Step: 96950 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:36:35,421-Speed 2628.34 samples/sec Loss 12.4147 LearningRate 0.0780 Epoch: 2 Global Step: 96960 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:36:39,321-Speed 2626.16 samples/sec Loss 12.3024 LearningRate 0.0780 Epoch: 2 Global Step: 96970 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:36:43,216-Speed 2629.66 samples/sec Loss 12.1948 LearningRate 0.0780 Epoch: 2 Global Step: 96980 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:36:47,111-Speed 2629.45 samples/sec Loss 12.2093 LearningRate 0.0780 Epoch: 2 Global Step: 96990 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:36:51,002-Speed 2632.24 samples/sec Loss 12.3577 LearningRate 0.0780 Epoch: 2 Global Step: 97000 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:36:54,910-Speed 2620.60 samples/sec Loss 12.2757 LearningRate 0.0780 Epoch: 2 Global Step: 97010 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:36:58,795-Speed 2636.92 samples/sec Loss 12.4586 LearningRate 0.0780 Epoch: 2 Global Step: 97020 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:37:02,691-Speed 2629.32 samples/sec Loss 12.4469 LearningRate 0.0780 Epoch: 2 Global Step: 97030 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:37:06,585-Speed 2630.33 samples/sec Loss 12.3337 LearningRate 0.0780 Epoch: 2 Global Step: 97040 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:37:10,480-Speed 2630.19 samples/sec Loss 12.2611 LearningRate 0.0780 Epoch: 2 Global Step: 97050 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:37:14,386-Speed 2621.58 samples/sec Loss 12.3325 LearningRate 0.0780 Epoch: 2 Global Step: 97060 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:37:18,281-Speed 2629.90 samples/sec Loss 12.2781 LearningRate 0.0780 Epoch: 2 Global Step: 97070 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:37:22,178-Speed 2627.72 samples/sec Loss 12.3842 LearningRate 0.0780 Epoch: 2 Global Step: 97080 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:37:26,101-Speed 2610.96 samples/sec Loss 12.4398 LearningRate 0.0780 Epoch: 2 Global Step: 97090 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:37:30,005-Speed 2624.01 samples/sec Loss 12.1393 LearningRate 0.0780 Epoch: 2 Global Step: 97100 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:37:33,885-Speed 2639.84 samples/sec Loss 12.1844 LearningRate 0.0780 Epoch: 2 Global Step: 97110 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:37:37,788-Speed 2624.58 samples/sec Loss 12.2365 LearningRate 0.0780 Epoch: 2 Global Step: 97120 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:37:41,683-Speed 2629.81 samples/sec Loss 12.4282 LearningRate 0.0780 Epoch: 2 Global Step: 97130 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:37:45,582-Speed 2627.14 samples/sec Loss 12.4310 LearningRate 0.0780 Epoch: 2 Global Step: 97140 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:37:49,478-Speed 2628.79 samples/sec Loss 12.3243 LearningRate 0.0779 Epoch: 2 Global Step: 97150 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:37:53,377-Speed 2626.45 samples/sec Loss 12.4180 LearningRate 0.0779 Epoch: 2 Global Step: 97160 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:37:57,275-Speed 2627.68 samples/sec Loss 12.3597 LearningRate 0.0779 Epoch: 2 Global Step: 97170 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:38:01,171-Speed 2629.17 samples/sec Loss 12.2422 LearningRate 0.0779 Epoch: 2 Global Step: 97180 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:38:05,065-Speed 2630.85 samples/sec Loss 12.3515 LearningRate 0.0779 Epoch: 2 Global Step: 97190 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:38:08,964-Speed 2626.62 samples/sec Loss 12.3795 LearningRate 0.0779 Epoch: 2 Global Step: 97200 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:38:12,865-Speed 2626.03 samples/sec Loss 12.2042 LearningRate 0.0779 Epoch: 2 Global Step: 97210 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:38:16,765-Speed 2626.15 samples/sec Loss 12.2763 LearningRate 0.0779 Epoch: 2 Global Step: 97220 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:38:20,661-Speed 2628.42 samples/sec Loss 12.2958 LearningRate 0.0779 Epoch: 2 Global Step: 97230 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:38:24,543-Speed 2638.23 samples/sec Loss 12.3865 LearningRate 0.0779 Epoch: 2 Global Step: 97240 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:38:28,439-Speed 2629.30 samples/sec Loss 12.4271 LearningRate 0.0779 Epoch: 2 Global Step: 97250 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:38:32,340-Speed 2625.91 samples/sec Loss 12.2510 LearningRate 0.0779 Epoch: 2 Global Step: 97260 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:38:36,235-Speed 2629.21 samples/sec Loss 12.1196 LearningRate 0.0779 Epoch: 2 Global Step: 97270 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:38:40,149-Speed 2617.40 samples/sec Loss 12.1788 LearningRate 0.0779 Epoch: 2 Global Step: 97280 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:38:44,043-Speed 2630.39 samples/sec Loss 12.4076 LearningRate 0.0779 Epoch: 2 Global Step: 97290 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:38:47,935-Speed 2631.07 samples/sec Loss 12.4209 LearningRate 0.0779 Epoch: 2 Global Step: 97300 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:38:51,826-Speed 2632.58 samples/sec Loss 12.4034 LearningRate 0.0779 Epoch: 2 Global Step: 97310 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:38:55,723-Speed 2628.08 samples/sec Loss 12.4033 LearningRate 0.0779 Epoch: 2 Global Step: 97320 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:38:59,618-Speed 2629.77 samples/sec Loss 12.2769 LearningRate 0.0779 Epoch: 2 Global Step: 97330 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:39:03,519-Speed 2625.25 samples/sec Loss 12.2340 LearningRate 0.0779 Epoch: 2 Global Step: 97340 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:39:07,420-Speed 2625.99 samples/sec Loss 12.2644 LearningRate 0.0779 Epoch: 2 Global Step: 97350 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:39:11,315-Speed 2629.28 samples/sec Loss 12.3639 LearningRate 0.0779 Epoch: 2 Global Step: 97360 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:39:15,213-Speed 2627.98 samples/sec Loss 12.3600 LearningRate 0.0779 Epoch: 2 Global Step: 97370 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:39:19,115-Speed 2625.14 samples/sec Loss 12.2585 LearningRate 0.0779 Epoch: 2 Global Step: 97380 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:39:22,999-Speed 2636.66 samples/sec Loss 12.3830 LearningRate 0.0779 Epoch: 2 Global Step: 97390 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:39:26,895-Speed 2629.43 samples/sec Loss 12.2077 LearningRate 0.0779 Epoch: 2 Global Step: 97400 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:39:30,803-Speed 2620.35 samples/sec Loss 12.3076 LearningRate 0.0779 Epoch: 2 Global Step: 97410 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:39:34,694-Speed 2632.08 samples/sec Loss 12.2287 LearningRate 0.0779 Epoch: 2 Global Step: 97420 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:39:38,590-Speed 2629.40 samples/sec Loss 12.3457 LearningRate 0.0779 Epoch: 2 Global Step: 97430 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:39:42,488-Speed 2627.45 samples/sec Loss 12.3181 LearningRate 0.0779 Epoch: 2 Global Step: 97440 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:39:46,384-Speed 2629.06 samples/sec Loss 12.2099 LearningRate 0.0779 Epoch: 2 Global Step: 97450 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:39:50,278-Speed 2630.66 samples/sec Loss 12.4412 LearningRate 0.0779 Epoch: 2 Global Step: 97460 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:39:54,175-Speed 2628.27 samples/sec Loss 12.3456 LearningRate 0.0779 Epoch: 2 Global Step: 97470 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:39:58,073-Speed 2627.91 samples/sec Loss 12.1849 LearningRate 0.0779 Epoch: 2 Global Step: 97480 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:40:01,968-Speed 2629.29 samples/sec Loss 12.1210 LearningRate 0.0779 Epoch: 2 Global Step: 97490 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:40:05,865-Speed 2628.33 samples/sec Loss 12.3235 LearningRate 0.0779 Epoch: 2 Global Step: 97500 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:40:09,762-Speed 2627.60 samples/sec Loss 12.3248 LearningRate 0.0779 Epoch: 2 Global Step: 97510 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:40:13,680-Speed 2615.13 samples/sec Loss 12.2207 LearningRate 0.0779 Epoch: 2 Global Step: 97520 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:40:17,574-Speed 2630.29 samples/sec Loss 12.4907 LearningRate 0.0779 Epoch: 2 Global Step: 97530 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:40:21,452-Speed 2640.87 samples/sec Loss 12.4269 LearningRate 0.0779 Epoch: 2 Global Step: 97540 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:40:25,354-Speed 2625.17 samples/sec Loss 12.2611 LearningRate 0.0779 Epoch: 2 Global Step: 97550 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:40:29,249-Speed 2630.09 samples/sec Loss 12.2763 LearningRate 0.0779 Epoch: 2 Global Step: 97560 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:40:33,146-Speed 2628.22 samples/sec Loss 12.4356 LearningRate 0.0779 Epoch: 2 Global Step: 97570 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:40:37,070-Speed 2610.30 samples/sec Loss 12.1865 LearningRate 0.0779 Epoch: 2 Global Step: 97580 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:40:40,997-Speed 2607.70 samples/sec Loss 12.3766 LearningRate 0.0779 Epoch: 2 Global Step: 97590 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:40:44,894-Speed 2629.05 samples/sec Loss 12.3209 LearningRate 0.0779 Epoch: 2 Global Step: 97600 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:40:48,813-Speed 2613.75 samples/sec Loss 12.3284 LearningRate 0.0779 Epoch: 2 Global Step: 97610 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:40:52,729-Speed 2615.78 samples/sec Loss 12.4818 LearningRate 0.0778 Epoch: 2 Global Step: 97620 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:40:56,620-Speed 2632.02 samples/sec Loss 12.4138 LearningRate 0.0778 Epoch: 2 Global Step: 97630 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:41:00,523-Speed 2624.43 samples/sec Loss 12.1869 LearningRate 0.0778 Epoch: 2 Global Step: 97640 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:41:04,403-Speed 2640.19 samples/sec Loss 12.3037 LearningRate 0.0778 Epoch: 2 Global Step: 97650 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:41:08,300-Speed 2628.19 samples/sec Loss 12.3170 LearningRate 0.0778 Epoch: 2 Global Step: 97660 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:41:12,198-Speed 2627.51 samples/sec Loss 12.2405 LearningRate 0.0778 Epoch: 2 Global Step: 97670 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:41:16,104-Speed 2622.15 samples/sec Loss 12.2920 LearningRate 0.0778 Epoch: 2 Global Step: 97680 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:41:20,015-Speed 2618.93 samples/sec Loss 12.2653 LearningRate 0.0778 Epoch: 2 Global Step: 97690 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:41:23,923-Speed 2620.92 samples/sec Loss 12.2847 LearningRate 0.0778 Epoch: 2 Global Step: 97700 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:41:27,816-Speed 2631.48 samples/sec Loss 12.3061 LearningRate 0.0778 Epoch: 2 Global Step: 97710 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:41:31,712-Speed 2628.72 samples/sec Loss 12.3854 LearningRate 0.0778 Epoch: 2 Global Step: 97720 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:41:35,607-Speed 2629.62 samples/sec Loss 12.7009 LearningRate 0.0778 Epoch: 2 Global Step: 97730 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:41:39,507-Speed 2625.95 samples/sec Loss 12.4542 LearningRate 0.0778 Epoch: 2 Global Step: 97740 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:41:43,405-Speed 2628.26 samples/sec Loss 12.3482 LearningRate 0.0778 Epoch: 2 Global Step: 97750 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:41:47,314-Speed 2619.81 samples/sec Loss 12.3821 LearningRate 0.0778 Epoch: 2 Global Step: 97760 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:41:51,212-Speed 2628.21 samples/sec Loss 12.3500 LearningRate 0.0778 Epoch: 2 Global Step: 97770 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:41:55,115-Speed 2624.00 samples/sec Loss 12.2425 LearningRate 0.0778 Epoch: 2 Global Step: 97780 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:41:59,019-Speed 2624.02 samples/sec Loss 12.3848 LearningRate 0.0778 Epoch: 2 Global Step: 97790 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:42:02,897-Speed 2640.74 samples/sec Loss 12.3483 LearningRate 0.0778 Epoch: 2 Global Step: 97800 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:42:06,791-Speed 2630.45 samples/sec Loss 12.2266 LearningRate 0.0778 Epoch: 2 Global Step: 97810 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:42:10,683-Speed 2631.39 samples/sec Loss 12.3804 LearningRate 0.0778 Epoch: 2 Global Step: 97820 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:42:14,574-Speed 2632.56 samples/sec Loss 12.0463 LearningRate 0.0778 Epoch: 2 Global Step: 97830 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:42:18,476-Speed 2625.19 samples/sec Loss 12.3594 LearningRate 0.0778 Epoch: 2 Global Step: 97840 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:42:22,388-Speed 2618.42 samples/sec Loss 12.2352 LearningRate 0.0778 Epoch: 2 Global Step: 97850 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:42:26,290-Speed 2624.86 samples/sec Loss 12.3976 LearningRate 0.0778 Epoch: 2 Global Step: 97860 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:42:30,185-Speed 2628.87 samples/sec Loss 12.2179 LearningRate 0.0778 Epoch: 2 Global Step: 97870 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:42:34,080-Speed 2630.38 samples/sec Loss 12.3030 LearningRate 0.0778 Epoch: 2 Global Step: 97880 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:42:37,972-Speed 2631.81 samples/sec Loss 12.3086 LearningRate 0.0778 Epoch: 2 Global Step: 97890 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:42:41,901-Speed 2606.98 samples/sec Loss 12.3963 LearningRate 0.0778 Epoch: 2 Global Step: 97900 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:42:45,813-Speed 2618.57 samples/sec Loss 12.2959 LearningRate 0.0778 Epoch: 2 Global Step: 97910 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:42:49,693-Speed 2639.76 samples/sec Loss 12.3165 LearningRate 0.0778 Epoch: 2 Global Step: 97920 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:42:53,591-Speed 2627.94 samples/sec Loss 12.3054 LearningRate 0.0778 Epoch: 2 Global Step: 97930 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:42:57,483-Speed 2631.60 samples/sec Loss 12.1860 LearningRate 0.0778 Epoch: 2 Global Step: 97940 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:43:01,385-Speed 2624.84 samples/sec Loss 12.4365 LearningRate 0.0778 Epoch: 2 Global Step: 97950 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:43:05,285-Speed 2626.51 samples/sec Loss 12.3350 LearningRate 0.0778 Epoch: 2 Global Step: 97960 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:43:09,185-Speed 2626.53 samples/sec Loss 12.2568 LearningRate 0.0778 Epoch: 2 Global Step: 97970 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:43:13,082-Speed 2627.97 samples/sec Loss 12.3683 LearningRate 0.0778 Epoch: 2 Global Step: 97980 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:43:16,990-Speed 2621.55 samples/sec Loss 12.3395 LearningRate 0.0778 Epoch: 2 Global Step: 97990 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:43:20,894-Speed 2623.31 samples/sec Loss 12.1997 LearningRate 0.0778 Epoch: 2 Global Step: 98000 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:43:24,811-Speed 2614.59 samples/sec Loss 12.2019 LearningRate 0.0778 Epoch: 2 Global Step: 98010 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:43:28,714-Speed 2624.45 samples/sec Loss 12.2424 LearningRate 0.0778 Epoch: 2 Global Step: 98020 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:43:32,605-Speed 2632.44 samples/sec Loss 12.1498 LearningRate 0.0778 Epoch: 2 Global Step: 98030 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:43:36,517-Speed 2618.06 samples/sec Loss 12.2320 LearningRate 0.0778 Epoch: 2 Global Step: 98040 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:43:40,436-Speed 2613.71 samples/sec Loss 12.2985 LearningRate 0.0778 Epoch: 2 Global Step: 98050 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:43:44,331-Speed 2629.81 samples/sec Loss 12.1855 LearningRate 0.0778 Epoch: 2 Global Step: 98060 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:43:48,260-Speed 2607.26 samples/sec Loss 12.2905 LearningRate 0.0778 Epoch: 2 Global Step: 98070 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:43:52,157-Speed 2628.05 samples/sec Loss 12.3484 LearningRate 0.0778 Epoch: 2 Global Step: 98080 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:43:56,051-Speed 2630.71 samples/sec Loss 12.2651 LearningRate 0.0777 Epoch: 2 Global Step: 98090 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:43:59,947-Speed 2628.68 samples/sec Loss 12.2646 LearningRate 0.0777 Epoch: 2 Global Step: 98100 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:44:03,877-Speed 2606.37 samples/sec Loss 12.3203 LearningRate 0.0777 Epoch: 2 Global Step: 98110 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:44:07,785-Speed 2621.06 samples/sec Loss 12.3751 LearningRate 0.0777 Epoch: 2 Global Step: 98120 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:44:11,681-Speed 2629.53 samples/sec Loss 12.3964 LearningRate 0.0777 Epoch: 2 Global Step: 98130 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:44:15,576-Speed 2629.54 samples/sec Loss 12.3090 LearningRate 0.0777 Epoch: 2 Global Step: 98140 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:44:19,481-Speed 2622.94 samples/sec Loss 12.4678 LearningRate 0.0777 Epoch: 2 Global Step: 98150 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:44:23,374-Speed 2631.12 samples/sec Loss 12.2377 LearningRate 0.0777 Epoch: 2 Global Step: 98160 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:44:27,274-Speed 2625.72 samples/sec Loss 12.4625 LearningRate 0.0777 Epoch: 2 Global Step: 98170 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:44:31,149-Speed 2643.01 samples/sec Loss 12.4256 LearningRate 0.0777 Epoch: 2 Global Step: 98180 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:44:35,041-Speed 2632.33 samples/sec Loss 12.3008 LearningRate 0.0777 Epoch: 2 Global Step: 98190 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:44:38,933-Speed 2631.69 samples/sec Loss 12.0643 LearningRate 0.0777 Epoch: 2 Global Step: 98200 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:44:42,827-Speed 2630.27 samples/sec Loss 12.3365 LearningRate 0.0777 Epoch: 2 Global Step: 98210 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:44:46,741-Speed 2617.10 samples/sec Loss 12.2994 LearningRate 0.0777 Epoch: 2 Global Step: 98220 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:44:50,633-Speed 2631.59 samples/sec Loss 12.2434 LearningRate 0.0777 Epoch: 2 Global Step: 98230 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:44:54,528-Speed 2629.81 samples/sec Loss 12.3444 LearningRate 0.0777 Epoch: 2 Global Step: 98240 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:44:58,423-Speed 2629.75 samples/sec Loss 12.3586 LearningRate 0.0777 Epoch: 2 Global Step: 98250 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:45:02,317-Speed 2629.93 samples/sec Loss 12.3026 LearningRate 0.0777 Epoch: 2 Global Step: 98260 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:45:06,214-Speed 2628.61 samples/sec Loss 12.3159 LearningRate 0.0777 Epoch: 2 Global Step: 98270 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:45:10,107-Speed 2630.89 samples/sec Loss 12.4043 LearningRate 0.0777 Epoch: 2 Global Step: 98280 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:45:13,990-Speed 2638.03 samples/sec Loss 12.1773 LearningRate 0.0777 Epoch: 2 Global Step: 98290 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:45:17,888-Speed 2627.82 samples/sec Loss 11.9870 LearningRate 0.0777 Epoch: 2 Global Step: 98300 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:45:21,787-Speed 2626.70 samples/sec Loss 12.2542 LearningRate 0.0777 Epoch: 2 Global Step: 98310 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:45:25,683-Speed 2629.18 samples/sec Loss 12.2346 LearningRate 0.0777 Epoch: 2 Global Step: 98320 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:45:29,577-Speed 2630.47 samples/sec Loss 12.0580 LearningRate 0.0777 Epoch: 2 Global Step: 98330 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:45:33,483-Speed 2621.95 samples/sec Loss 12.2221 LearningRate 0.0777 Epoch: 2 Global Step: 98340 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:45:37,374-Speed 2631.95 samples/sec Loss 12.2959 LearningRate 0.0777 Epoch: 2 Global Step: 98350 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:45:41,260-Speed 2636.17 samples/sec Loss 12.2468 LearningRate 0.0777 Epoch: 2 Global Step: 98360 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:45:45,156-Speed 2629.25 samples/sec Loss 12.3013 LearningRate 0.0777 Epoch: 2 Global Step: 98370 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:45:49,071-Speed 2616.04 samples/sec Loss 12.3426 LearningRate 0.0777 Epoch: 2 Global Step: 98380 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:45:52,990-Speed 2613.69 samples/sec Loss 12.4195 LearningRate 0.0777 Epoch: 2 Global Step: 98390 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:45:56,903-Speed 2617.69 samples/sec Loss 12.3072 LearningRate 0.0777 Epoch: 2 Global Step: 98400 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:46:00,802-Speed 2626.24 samples/sec Loss 12.4059 LearningRate 0.0777 Epoch: 2 Global Step: 98410 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:46:04,715-Speed 2617.45 samples/sec Loss 12.2283 LearningRate 0.0777 Epoch: 2 Global Step: 98420 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:46:08,629-Speed 2616.60 samples/sec Loss 12.2024 LearningRate 0.0777 Epoch: 2 Global Step: 98430 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:46:12,525-Speed 2629.73 samples/sec Loss 12.2876 LearningRate 0.0777 Epoch: 2 Global Step: 98440 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:46:16,425-Speed 2626.34 samples/sec Loss 12.0745 LearningRate 0.0777 Epoch: 2 Global Step: 98450 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:46:20,335-Speed 2619.52 samples/sec Loss 12.2155 LearningRate 0.0777 Epoch: 2 Global Step: 98460 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:46:24,236-Speed 2625.49 samples/sec Loss 12.2746 LearningRate 0.0777 Epoch: 2 Global Step: 98470 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:46:28,133-Speed 2628.33 samples/sec Loss 12.2056 LearningRate 0.0777 Epoch: 2 Global Step: 98480 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:46:32,035-Speed 2624.79 samples/sec Loss 12.3377 LearningRate 0.0777 Epoch: 2 Global Step: 98490 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:46:35,929-Speed 2629.96 samples/sec Loss 12.3427 LearningRate 0.0777 Epoch: 2 Global Step: 98500 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:46:39,827-Speed 2627.64 samples/sec Loss 12.2990 LearningRate 0.0777 Epoch: 2 Global Step: 98510 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:46:43,724-Speed 2628.66 samples/sec Loss 12.2589 LearningRate 0.0777 Epoch: 2 Global Step: 98520 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:46:47,619-Speed 2629.91 samples/sec Loss 12.3811 LearningRate 0.0777 Epoch: 2 Global Step: 98530 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:46:51,512-Speed 2630.99 samples/sec Loss 12.1863 LearningRate 0.0777 Epoch: 2 Global Step: 98540 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:46:55,424-Speed 2617.84 samples/sec Loss 12.2202 LearningRate 0.0777 Epoch: 2 Global Step: 98550 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:46:59,335-Speed 2619.40 samples/sec Loss 12.2662 LearningRate 0.0777 Epoch: 2 Global Step: 98560 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:47:03,209-Speed 2643.91 samples/sec Loss 12.3252 LearningRate 0.0776 Epoch: 2 Global Step: 98570 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:47:07,106-Speed 2628.42 samples/sec Loss 12.3137 LearningRate 0.0776 Epoch: 2 Global Step: 98580 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:47:11,002-Speed 2629.34 samples/sec Loss 12.2972 LearningRate 0.0776 Epoch: 2 Global Step: 98590 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:47:14,910-Speed 2620.81 samples/sec Loss 12.3709 LearningRate 0.0776 Epoch: 2 Global Step: 98600 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:47:18,813-Speed 2625.08 samples/sec Loss 12.3277 LearningRate 0.0776 Epoch: 2 Global Step: 98610 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:47:22,709-Speed 2628.50 samples/sec Loss 12.2514 LearningRate 0.0776 Epoch: 2 Global Step: 98620 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:47:26,586-Speed 2642.28 samples/sec Loss 12.2494 LearningRate 0.0776 Epoch: 2 Global Step: 98630 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:47:30,521-Speed 2602.45 samples/sec Loss 12.2572 LearningRate 0.0776 Epoch: 2 Global Step: 98640 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:47:34,418-Speed 2628.59 samples/sec Loss 12.1931 LearningRate 0.0776 Epoch: 2 Global Step: 98650 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:47:38,312-Speed 2630.42 samples/sec Loss 12.3268 LearningRate 0.0776 Epoch: 2 Global Step: 98660 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:47:42,210-Speed 2627.86 samples/sec Loss 12.2915 LearningRate 0.0776 Epoch: 2 Global Step: 98670 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:47:46,114-Speed 2623.43 samples/sec Loss 12.2557 LearningRate 0.0776 Epoch: 2 Global Step: 98680 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:47:50,010-Speed 2628.94 samples/sec Loss 12.3179 LearningRate 0.0776 Epoch: 2 Global Step: 98690 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:47:53,908-Speed 2627.54 samples/sec Loss 12.1940 LearningRate 0.0776 Epoch: 2 Global Step: 98700 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:47:57,855-Speed 2595.54 samples/sec Loss 12.2895 LearningRate 0.0776 Epoch: 2 Global Step: 98710 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:48:01,749-Speed 2630.26 samples/sec Loss 12.3345 LearningRate 0.0776 Epoch: 2 Global Step: 98720 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:48:05,645-Speed 2628.61 samples/sec Loss 12.3096 LearningRate 0.0776 Epoch: 2 Global Step: 98730 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:48:09,544-Speed 2626.53 samples/sec Loss 12.3228 LearningRate 0.0776 Epoch: 2 Global Step: 98740 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:48:13,470-Speed 2609.30 samples/sec Loss 12.3506 LearningRate 0.0776 Epoch: 2 Global Step: 98750 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:48:17,510-Speed 2535.46 samples/sec Loss 12.2290 LearningRate 0.0776 Epoch: 2 Global Step: 98760 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:48:21,436-Speed 2608.69 samples/sec Loss 12.2514 LearningRate 0.0776 Epoch: 2 Global Step: 98770 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:48:25,369-Speed 2605.00 samples/sec Loss 12.2289 LearningRate 0.0776 Epoch: 2 Global Step: 98780 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:48:29,263-Speed 2630.22 samples/sec Loss 12.3239 LearningRate 0.0776 Epoch: 2 Global Step: 98790 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:48:33,183-Speed 2613.02 samples/sec Loss 12.3932 LearningRate 0.0776 Epoch: 2 Global Step: 98800 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:48:37,103-Speed 2612.62 samples/sec Loss 12.2764 LearningRate 0.0776 Epoch: 2 Global Step: 98810 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:48:41,079-Speed 2576.30 samples/sec Loss 12.3404 LearningRate 0.0776 Epoch: 2 Global Step: 98820 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:48:44,994-Speed 2616.40 samples/sec Loss 12.3094 LearningRate 0.0776 Epoch: 2 Global Step: 98830 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:48:48,880-Speed 2635.78 samples/sec Loss 12.1403 LearningRate 0.0776 Epoch: 2 Global Step: 98840 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:48:52,883-Speed 2558.41 samples/sec Loss 12.2174 LearningRate 0.0776 Epoch: 2 Global Step: 98850 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:48:56,782-Speed 2626.69 samples/sec Loss 12.1715 LearningRate 0.0776 Epoch: 2 Global Step: 98860 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:49:00,679-Speed 2628.78 samples/sec Loss 12.1519 LearningRate 0.0776 Epoch: 2 Global Step: 98870 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:49:04,578-Speed 2627.20 samples/sec Loss 12.3221 LearningRate 0.0776 Epoch: 2 Global Step: 98880 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:49:08,473-Speed 2629.23 samples/sec Loss 12.2405 LearningRate 0.0776 Epoch: 2 Global Step: 98890 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:49:12,370-Speed 2628.58 samples/sec Loss 12.3036 LearningRate 0.0776 Epoch: 2 Global Step: 98900 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:49:16,276-Speed 2622.42 samples/sec Loss 12.2763 LearningRate 0.0776 Epoch: 2 Global Step: 98910 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:49:20,175-Speed 2627.21 samples/sec Loss 12.2060 LearningRate 0.0776 Epoch: 2 Global Step: 98920 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:49:24,074-Speed 2626.69 samples/sec Loss 12.2739 LearningRate 0.0776 Epoch: 2 Global Step: 98930 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:49:27,969-Speed 2629.58 samples/sec Loss 12.3316 LearningRate 0.0776 Epoch: 2 Global Step: 98940 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:49:31,877-Speed 2620.64 samples/sec Loss 12.3206 LearningRate 0.0776 Epoch: 2 Global Step: 98950 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:49:35,777-Speed 2626.51 samples/sec Loss 12.2627 LearningRate 0.0776 Epoch: 2 Global Step: 98960 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:49:39,653-Speed 2642.64 samples/sec Loss 12.3404 LearningRate 0.0776 Epoch: 2 Global Step: 98970 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:49:43,553-Speed 2626.28 samples/sec Loss 12.2721 LearningRate 0.0776 Epoch: 2 Global Step: 98980 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:49:47,455-Speed 2625.21 samples/sec Loss 12.1025 LearningRate 0.0776 Epoch: 2 Global Step: 98990 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:49:51,344-Speed 2633.18 samples/sec Loss 12.4266 LearningRate 0.0776 Epoch: 2 Global Step: 99000 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:49:55,241-Speed 2628.32 samples/sec Loss 12.1279 LearningRate 0.0776 Epoch: 2 Global Step: 99010 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:49:59,138-Speed 2628.28 samples/sec Loss 12.2729 LearningRate 0.0776 Epoch: 2 Global Step: 99020 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:50:03,028-Speed 2633.67 samples/sec Loss 12.2735 LearningRate 0.0776 Epoch: 2 Global Step: 99030 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:50:06,924-Speed 2629.02 samples/sec Loss 12.3430 LearningRate 0.0775 Epoch: 2 Global Step: 99040 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:50:10,816-Speed 2631.30 samples/sec Loss 12.3024 LearningRate 0.0775 Epoch: 2 Global Step: 99050 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:50:14,711-Speed 2629.69 samples/sec Loss 12.2179 LearningRate 0.0775 Epoch: 2 Global Step: 99060 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:50:18,602-Speed 2632.31 samples/sec Loss 12.3877 LearningRate 0.0775 Epoch: 2 Global Step: 99070 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:50:22,513-Speed 2618.67 samples/sec Loss 12.3577 LearningRate 0.0775 Epoch: 2 Global Step: 99080 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:50:26,406-Speed 2630.94 samples/sec Loss 12.2985 LearningRate 0.0775 Epoch: 2 Global Step: 99090 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:50:30,480-Speed 2513.96 samples/sec Loss 12.3446 LearningRate 0.0775 Epoch: 2 Global Step: 99100 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:50:34,375-Speed 2630.38 samples/sec Loss 12.2550 LearningRate 0.0775 Epoch: 2 Global Step: 99110 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:50:38,268-Speed 2630.93 samples/sec Loss 12.2807 LearningRate 0.0775 Epoch: 2 Global Step: 99120 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:50:42,188-Speed 2612.63 samples/sec Loss 12.3745 LearningRate 0.0775 Epoch: 2 Global Step: 99130 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:50:46,101-Speed 2617.96 samples/sec Loss 12.4002 LearningRate 0.0775 Epoch: 2 Global Step: 99140 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:50:50,003-Speed 2625.32 samples/sec Loss 12.4008 LearningRate 0.0775 Epoch: 2 Global Step: 99150 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:50:53,916-Speed 2617.34 samples/sec Loss 12.2546 LearningRate 0.0775 Epoch: 2 Global Step: 99160 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:50:57,792-Speed 2643.07 samples/sec Loss 12.3300 LearningRate 0.0775 Epoch: 2 Global Step: 99170 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:51:01,695-Speed 2624.04 samples/sec Loss 12.0957 LearningRate 0.0775 Epoch: 2 Global Step: 99180 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:51:05,590-Speed 2629.99 samples/sec Loss 12.3216 LearningRate 0.0775 Epoch: 2 Global Step: 99190 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:51:09,482-Speed 2631.93 samples/sec Loss 12.2596 LearningRate 0.0775 Epoch: 2 Global Step: 99200 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:51:13,384-Speed 2624.44 samples/sec Loss 12.4100 LearningRate 0.0775 Epoch: 2 Global Step: 99210 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:51:17,268-Speed 2637.32 samples/sec Loss 12.3623 LearningRate 0.0775 Epoch: 2 Global Step: 99220 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:51:21,160-Speed 2631.32 samples/sec Loss 12.2558 LearningRate 0.0775 Epoch: 2 Global Step: 99230 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:51:25,057-Speed 2628.90 samples/sec Loss 12.2057 LearningRate 0.0775 Epoch: 2 Global Step: 99240 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:51:28,952-Speed 2630.29 samples/sec Loss 12.1656 LearningRate 0.0775 Epoch: 2 Global Step: 99250 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:51:32,857-Speed 2622.81 samples/sec Loss 12.2123 LearningRate 0.0775 Epoch: 2 Global Step: 99260 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:51:36,753-Speed 2628.98 samples/sec Loss 12.3698 LearningRate 0.0775 Epoch: 2 Global Step: 99270 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:51:40,648-Speed 2629.44 samples/sec Loss 12.2953 LearningRate 0.0775 Epoch: 2 Global Step: 99280 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:51:44,551-Speed 2624.00 samples/sec Loss 12.2690 LearningRate 0.0775 Epoch: 2 Global Step: 99290 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:51:48,462-Speed 2618.97 samples/sec Loss 12.2229 LearningRate 0.0775 Epoch: 2 Global Step: 99300 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:51:52,363-Speed 2625.74 samples/sec Loss 12.2691 LearningRate 0.0775 Epoch: 2 Global Step: 99310 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:51:56,260-Speed 2628.13 samples/sec Loss 12.3495 LearningRate 0.0775 Epoch: 2 Global Step: 99320 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:52:00,158-Speed 2627.76 samples/sec Loss 12.2490 LearningRate 0.0775 Epoch: 2 Global Step: 99330 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:52:04,053-Speed 2629.51 samples/sec Loss 12.2586 LearningRate 0.0775 Epoch: 2 Global Step: 99340 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:52:07,933-Speed 2640.15 samples/sec Loss 12.4508 LearningRate 0.0775 Epoch: 2 Global Step: 99350 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:52:11,840-Speed 2620.92 samples/sec Loss 12.1196 LearningRate 0.0775 Epoch: 2 Global Step: 99360 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:52:15,746-Speed 2622.39 samples/sec Loss 12.2132 LearningRate 0.0775 Epoch: 2 Global Step: 99370 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:52:19,642-Speed 2628.93 samples/sec Loss 12.2086 LearningRate 0.0775 Epoch: 2 Global Step: 99380 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:52:23,547-Speed 2622.98 samples/sec Loss 12.3027 LearningRate 0.0775 Epoch: 2 Global Step: 99390 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:52:27,449-Speed 2624.80 samples/sec Loss 12.3828 LearningRate 0.0775 Epoch: 2 Global Step: 99400 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:52:31,340-Speed 2632.46 samples/sec Loss 12.0772 LearningRate 0.0775 Epoch: 2 Global Step: 99410 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:52:35,220-Speed 2640.05 samples/sec Loss 12.2261 LearningRate 0.0775 Epoch: 2 Global Step: 99420 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:52:39,119-Speed 2626.68 samples/sec Loss 12.2769 LearningRate 0.0775 Epoch: 2 Global Step: 99430 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:52:43,023-Speed 2623.63 samples/sec Loss 12.2801 LearningRate 0.0775 Epoch: 2 Global Step: 99440 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:52:46,920-Speed 2628.51 samples/sec Loss 12.1979 LearningRate 0.0775 Epoch: 2 Global Step: 99450 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:52:50,836-Speed 2615.52 samples/sec Loss 12.1422 LearningRate 0.0775 Epoch: 2 Global Step: 99460 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:52:54,734-Speed 2627.30 samples/sec Loss 12.2356 LearningRate 0.0775 Epoch: 2 Global Step: 99470 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:52:58,641-Speed 2621.86 samples/sec Loss 12.2459 LearningRate 0.0775 Epoch: 2 Global Step: 99480 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:53:02,535-Speed 2629.95 samples/sec Loss 12.2970 LearningRate 0.0775 Epoch: 2 Global Step: 99490 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:53:06,431-Speed 2628.43 samples/sec Loss 12.3458 LearningRate 0.0775 Epoch: 2 Global Step: 99500 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:53:10,344-Speed 2617.61 samples/sec Loss 12.3646 LearningRate 0.0774 Epoch: 2 Global Step: 99510 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 06:53:14,300-Speed 2589.69 samples/sec Loss 12.3010 LearningRate 0.0774 Epoch: 2 Global Step: 99520 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:53:18,192-Speed 2631.28 samples/sec Loss 12.2785 LearningRate 0.0774 Epoch: 2 Global Step: 99530 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:53:22,086-Speed 2632.66 samples/sec Loss 12.2767 LearningRate 0.0774 Epoch: 2 Global Step: 99540 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:53:25,975-Speed 2633.38 samples/sec Loss 12.0599 LearningRate 0.0774 Epoch: 2 Global Step: 99550 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:53:29,958-Speed 2571.81 samples/sec Loss 12.3426 LearningRate 0.0774 Epoch: 2 Global Step: 99560 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:53:33,855-Speed 2627.95 samples/sec Loss 12.3186 LearningRate 0.0774 Epoch: 2 Global Step: 99570 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:53:37,912-Speed 2524.35 samples/sec Loss 12.1964 LearningRate 0.0774 Epoch: 2 Global Step: 99580 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:53:41,993-Speed 2509.79 samples/sec Loss 12.3181 LearningRate 0.0774 Epoch: 2 Global Step: 99590 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:53:45,930-Speed 2601.69 samples/sec Loss 12.1606 LearningRate 0.0774 Epoch: 2 Global Step: 99600 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:53:50,002-Speed 2515.32 samples/sec Loss 12.0989 LearningRate 0.0774 Epoch: 2 Global Step: 99610 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:53:54,073-Speed 2516.09 samples/sec Loss 12.2968 LearningRate 0.0774 Epoch: 2 Global Step: 99620 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:53:57,996-Speed 2610.85 samples/sec Loss 12.1463 LearningRate 0.0774 Epoch: 2 Global Step: 99630 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:54:01,899-Speed 2623.85 samples/sec Loss 12.2792 LearningRate 0.0774 Epoch: 2 Global Step: 99640 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:54:05,794-Speed 2629.79 samples/sec Loss 12.1388 LearningRate 0.0774 Epoch: 2 Global Step: 99650 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:54:09,679-Speed 2636.06 samples/sec Loss 12.3330 LearningRate 0.0774 Epoch: 2 Global Step: 99660 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:54:13,577-Speed 2627.79 samples/sec Loss 12.1135 LearningRate 0.0774 Epoch: 2 Global Step: 99670 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:54:17,474-Speed 2628.32 samples/sec Loss 12.2458 LearningRate 0.0774 Epoch: 2 Global Step: 99680 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:54:21,370-Speed 2628.46 samples/sec Loss 12.2339 LearningRate 0.0774 Epoch: 2 Global Step: 99690 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:54:25,266-Speed 2629.76 samples/sec Loss 12.1387 LearningRate 0.0774 Epoch: 2 Global Step: 99700 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:54:29,162-Speed 2629.20 samples/sec Loss 12.3997 LearningRate 0.0774 Epoch: 2 Global Step: 99710 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:54:33,079-Speed 2614.15 samples/sec Loss 12.3275 LearningRate 0.0774 Epoch: 2 Global Step: 99720 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:54:36,986-Speed 2621.78 samples/sec Loss 12.0762 LearningRate 0.0774 Epoch: 2 Global Step: 99730 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:54:40,886-Speed 2626.40 samples/sec Loss 12.2722 LearningRate 0.0774 Epoch: 2 Global Step: 99740 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:54:44,802-Speed 2615.76 samples/sec Loss 12.1985 LearningRate 0.0774 Epoch: 2 Global Step: 99750 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:54:48,705-Speed 2623.60 samples/sec Loss 12.2206 LearningRate 0.0774 Epoch: 2 Global Step: 99760 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:54:52,617-Speed 2619.27 samples/sec Loss 12.2315 LearningRate 0.0774 Epoch: 2 Global Step: 99770 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:54:56,515-Speed 2627.25 samples/sec Loss 12.2358 LearningRate 0.0774 Epoch: 2 Global Step: 99780 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:55:00,417-Speed 2624.56 samples/sec Loss 12.2619 LearningRate 0.0774 Epoch: 2 Global Step: 99790 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:55:04,315-Speed 2627.79 samples/sec Loss 12.1700 LearningRate 0.0774 Epoch: 2 Global Step: 99800 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:55:08,191-Speed 2642.92 samples/sec Loss 12.4104 LearningRate 0.0774 Epoch: 2 Global Step: 99810 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:55:12,099-Speed 2621.19 samples/sec Loss 12.2275 LearningRate 0.0774 Epoch: 2 Global Step: 99820 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:55:16,000-Speed 2625.69 samples/sec Loss 12.2997 LearningRate 0.0774 Epoch: 2 Global Step: 99830 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:55:19,899-Speed 2626.58 samples/sec Loss 12.2014 LearningRate 0.0774 Epoch: 2 Global Step: 99840 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:55:23,798-Speed 2627.50 samples/sec Loss 12.2107 LearningRate 0.0774 Epoch: 2 Global Step: 99850 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:55:27,722-Speed 2609.91 samples/sec Loss 12.2719 LearningRate 0.0774 Epoch: 2 Global Step: 99860 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:55:31,619-Speed 2628.46 samples/sec Loss 12.2622 LearningRate 0.0774 Epoch: 2 Global Step: 99870 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:55:35,515-Speed 2629.07 samples/sec Loss 12.2528 LearningRate 0.0774 Epoch: 2 Global Step: 99880 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:55:39,449-Speed 2603.40 samples/sec Loss 12.2114 LearningRate 0.0774 Epoch: 2 Global Step: 99890 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:55:43,528-Speed 2510.77 samples/sec Loss 12.2258 LearningRate 0.0774 Epoch: 2 Global Step: 99900 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:55:47,542-Speed 2552.00 samples/sec Loss 12.2826 LearningRate 0.0774 Epoch: 2 Global Step: 99910 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:55:51,434-Speed 2631.93 samples/sec Loss 12.3184 LearningRate 0.0774 Epoch: 2 Global Step: 99920 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:55:55,309-Speed 2643.37 samples/sec Loss 12.4011 LearningRate 0.0774 Epoch: 2 Global Step: 99930 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:55:59,200-Speed 2632.57 samples/sec Loss 12.3258 LearningRate 0.0774 Epoch: 2 Global Step: 99940 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:56:03,100-Speed 2626.30 samples/sec Loss 12.2245 LearningRate 0.0774 Epoch: 2 Global Step: 99950 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:56:07,010-Speed 2619.24 samples/sec Loss 12.3686 LearningRate 0.0774 Epoch: 2 Global Step: 99960 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:56:11,008-Speed 2562.01 samples/sec Loss 12.1977 LearningRate 0.0774 Epoch: 2 Global Step: 99970 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:56:14,956-Speed 2594.89 samples/sec Loss 12.1902 LearningRate 0.0773 Epoch: 2 Global Step: 99980 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:56:18,862-Speed 2621.78 samples/sec Loss 12.1553 LearningRate 0.0773 Epoch: 2 Global Step: 99990 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:56:22,766-Speed 2623.89 samples/sec Loss 12.2024 LearningRate 0.0773 Epoch: 2 Global Step: 100000 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:57:05,822-[lfw][100000]XNorm: 24.093140
Training: 2022-04-13 06:57:05,823-[lfw][100000]Accuracy-Flip: 0.99667+-0.00298
Training: 2022-04-13 06:57:05,823-[lfw][100000]Accuracy-Highest: 0.99783
Training: 2022-04-13 06:57:56,593-[cfp_fp][100000]XNorm: 22.382464
Training: 2022-04-13 06:57:56,594-[cfp_fp][100000]Accuracy-Flip: 0.97614+-0.00658
Training: 2022-04-13 06:57:56,594-[cfp_fp][100000]Accuracy-Highest: 0.97986
Training: 2022-04-13 06:58:39,974-[agedb_30][100000]XNorm: 23.680389
Training: 2022-04-13 06:58:39,975-[agedb_30][100000]Accuracy-Flip: 0.96750+-0.00834
Training: 2022-04-13 06:58:39,975-[agedb_30][100000]Accuracy-Highest: 0.96750
Training: 2022-04-13 06:58:43,869-Speed 72.57 samples/sec Loss 12.2919 LearningRate 0.0773 Epoch: 2 Global Step: 100010 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:58:47,742-Speed 2644.54 samples/sec Loss 12.2197 LearningRate 0.0773 Epoch: 2 Global Step: 100020 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:58:51,615-Speed 2644.91 samples/sec Loss 12.2164 LearningRate 0.0773 Epoch: 2 Global Step: 100030 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:58:55,496-Speed 2638.49 samples/sec Loss 12.1436 LearningRate 0.0773 Epoch: 2 Global Step: 100040 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:58:59,390-Speed 2630.45 samples/sec Loss 12.2636 LearningRate 0.0773 Epoch: 2 Global Step: 100050 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:59:03,257-Speed 2648.58 samples/sec Loss 12.1178 LearningRate 0.0773 Epoch: 2 Global Step: 100060 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:59:07,138-Speed 2639.49 samples/sec Loss 12.2830 LearningRate 0.0773 Epoch: 2 Global Step: 100070 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:59:11,038-Speed 2626.09 samples/sec Loss 12.2513 LearningRate 0.0773 Epoch: 2 Global Step: 100080 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:59:14,984-Speed 2595.95 samples/sec Loss 12.3199 LearningRate 0.0773 Epoch: 2 Global Step: 100090 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:59:18,870-Speed 2636.21 samples/sec Loss 12.2728 LearningRate 0.0773 Epoch: 2 Global Step: 100100 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:59:22,755-Speed 2635.82 samples/sec Loss 12.2451 LearningRate 0.0773 Epoch: 2 Global Step: 100110 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:59:26,656-Speed 2626.01 samples/sec Loss 12.1791 LearningRate 0.0773 Epoch: 2 Global Step: 100120 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:59:30,551-Speed 2630.20 samples/sec Loss 12.1800 LearningRate 0.0773 Epoch: 2 Global Step: 100130 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:59:34,439-Speed 2634.20 samples/sec Loss 12.3156 LearningRate 0.0773 Epoch: 2 Global Step: 100140 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:59:38,328-Speed 2633.79 samples/sec Loss 12.0245 LearningRate 0.0773 Epoch: 2 Global Step: 100150 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 06:59:42,245-Speed 2615.25 samples/sec Loss 12.2319 LearningRate 0.0773 Epoch: 2 Global Step: 100160 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:59:46,142-Speed 2628.60 samples/sec Loss 12.0774 LearningRate 0.0773 Epoch: 2 Global Step: 100170 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:59:50,044-Speed 2624.28 samples/sec Loss 12.4209 LearningRate 0.0773 Epoch: 2 Global Step: 100180 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:59:53,938-Speed 2630.47 samples/sec Loss 12.2400 LearningRate 0.0773 Epoch: 2 Global Step: 100190 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 06:59:57,836-Speed 2627.57 samples/sec Loss 12.2784 LearningRate 0.0773 Epoch: 2 Global Step: 100200 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:00:01,736-Speed 2626.55 samples/sec Loss 12.2090 LearningRate 0.0773 Epoch: 2 Global Step: 100210 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:00:05,635-Speed 2627.48 samples/sec Loss 12.2386 LearningRate 0.0773 Epoch: 2 Global Step: 100220 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:00:09,531-Speed 2628.83 samples/sec Loss 12.1740 LearningRate 0.0773 Epoch: 2 Global Step: 100230 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:00:13,431-Speed 2626.09 samples/sec Loss 12.2549 LearningRate 0.0773 Epoch: 2 Global Step: 100240 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:00:17,333-Speed 2625.15 samples/sec Loss 12.2152 LearningRate 0.0773 Epoch: 2 Global Step: 100250 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:00:21,230-Speed 2628.23 samples/sec Loss 12.2302 LearningRate 0.0773 Epoch: 2 Global Step: 100260 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:00:25,121-Speed 2632.15 samples/sec Loss 12.1100 LearningRate 0.0773 Epoch: 2 Global Step: 100270 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:00:29,025-Speed 2624.12 samples/sec Loss 12.3089 LearningRate 0.0773 Epoch: 2 Global Step: 100280 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:00:32,922-Speed 2628.11 samples/sec Loss 12.1838 LearningRate 0.0773 Epoch: 2 Global Step: 100290 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:00:36,801-Speed 2640.56 samples/sec Loss 12.2294 LearningRate 0.0773 Epoch: 2 Global Step: 100300 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:00:40,701-Speed 2626.87 samples/sec Loss 12.2548 LearningRate 0.0773 Epoch: 2 Global Step: 100310 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:00:44,607-Speed 2621.75 samples/sec Loss 12.1533 LearningRate 0.0773 Epoch: 2 Global Step: 100320 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:00:48,508-Speed 2625.87 samples/sec Loss 12.2825 LearningRate 0.0773 Epoch: 2 Global Step: 100330 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:00:52,406-Speed 2627.53 samples/sec Loss 12.1042 LearningRate 0.0773 Epoch: 2 Global Step: 100340 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:00:56,344-Speed 2601.65 samples/sec Loss 12.2951 LearningRate 0.0773 Epoch: 2 Global Step: 100350 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:01:00,256-Speed 2618.36 samples/sec Loss 12.3940 LearningRate 0.0773 Epoch: 2 Global Step: 100360 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:01:04,150-Speed 2630.28 samples/sec Loss 12.2223 LearningRate 0.0773 Epoch: 2 Global Step: 100370 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:01:08,043-Speed 2630.85 samples/sec Loss 12.1884 LearningRate 0.0773 Epoch: 2 Global Step: 100380 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:01:11,947-Speed 2623.70 samples/sec Loss 12.2507 LearningRate 0.0773 Epoch: 2 Global Step: 100390 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:01:15,859-Speed 2618.32 samples/sec Loss 12.1768 LearningRate 0.0773 Epoch: 2 Global Step: 100400 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:01:19,735-Speed 2642.10 samples/sec Loss 12.1700 LearningRate 0.0773 Epoch: 2 Global Step: 100410 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:01:23,637-Speed 2625.67 samples/sec Loss 12.1008 LearningRate 0.0773 Epoch: 2 Global Step: 100420 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:01:27,534-Speed 2628.08 samples/sec Loss 12.2809 LearningRate 0.0773 Epoch: 2 Global Step: 100430 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:01:31,437-Speed 2624.70 samples/sec Loss 12.1084 LearningRate 0.0773 Epoch: 2 Global Step: 100440 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:01:35,375-Speed 2600.83 samples/sec Loss 12.2661 LearningRate 0.0772 Epoch: 2 Global Step: 100450 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:01:39,264-Speed 2633.57 samples/sec Loss 12.1561 LearningRate 0.0772 Epoch: 2 Global Step: 100460 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:01:43,163-Speed 2627.28 samples/sec Loss 12.2241 LearningRate 0.0772 Epoch: 2 Global Step: 100470 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:01:47,058-Speed 2629.03 samples/sec Loss 12.2098 LearningRate 0.0772 Epoch: 2 Global Step: 100480 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:01:50,980-Speed 2612.34 samples/sec Loss 12.2855 LearningRate 0.0772 Epoch: 2 Global Step: 100490 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:01:54,993-Speed 2552.29 samples/sec Loss 12.4344 LearningRate 0.0772 Epoch: 2 Global Step: 100500 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:01:58,909-Speed 2615.72 samples/sec Loss 12.0474 LearningRate 0.0772 Epoch: 2 Global Step: 100510 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:02:02,808-Speed 2626.97 samples/sec Loss 12.2450 LearningRate 0.0772 Epoch: 2 Global Step: 100520 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:02:06,702-Speed 2630.27 samples/sec Loss 12.0580 LearningRate 0.0772 Epoch: 2 Global Step: 100530 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:02:10,594-Speed 2631.35 samples/sec Loss 12.3551 LearningRate 0.0772 Epoch: 2 Global Step: 100540 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:02:14,521-Speed 2608.45 samples/sec Loss 12.2549 LearningRate 0.0772 Epoch: 2 Global Step: 100550 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:02:18,398-Speed 2641.66 samples/sec Loss 12.2078 LearningRate 0.0772 Epoch: 2 Global Step: 100560 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:02:22,289-Speed 2632.51 samples/sec Loss 12.2944 LearningRate 0.0772 Epoch: 2 Global Step: 100570 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:02:26,182-Speed 2631.04 samples/sec Loss 12.3503 LearningRate 0.0772 Epoch: 2 Global Step: 100580 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:02:30,076-Speed 2630.82 samples/sec Loss 12.2689 LearningRate 0.0772 Epoch: 2 Global Step: 100590 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:02:33,973-Speed 2628.37 samples/sec Loss 12.3087 LearningRate 0.0772 Epoch: 2 Global Step: 100600 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:02:37,875-Speed 2624.82 samples/sec Loss 12.2033 LearningRate 0.0772 Epoch: 2 Global Step: 100610 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:02:41,775-Speed 2625.79 samples/sec Loss 12.2495 LearningRate 0.0772 Epoch: 2 Global Step: 100620 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:02:45,671-Speed 2629.38 samples/sec Loss 12.3401 LearningRate 0.0772 Epoch: 2 Global Step: 100630 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:02:49,569-Speed 2626.86 samples/sec Loss 12.1728 LearningRate 0.0772 Epoch: 2 Global Step: 100640 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:02:53,474-Speed 2623.28 samples/sec Loss 12.1578 LearningRate 0.0772 Epoch: 2 Global Step: 100650 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:02:57,370-Speed 2629.12 samples/sec Loss 12.1944 LearningRate 0.0772 Epoch: 2 Global Step: 100660 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:03:01,295-Speed 2610.16 samples/sec Loss 12.2984 LearningRate 0.0772 Epoch: 2 Global Step: 100670 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:03:05,200-Speed 2622.52 samples/sec Loss 12.2219 LearningRate 0.0772 Epoch: 2 Global Step: 100680 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:03:09,103-Speed 2624.59 samples/sec Loss 12.1074 LearningRate 0.0772 Epoch: 2 Global Step: 100690 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:03:12,994-Speed 2631.69 samples/sec Loss 12.0849 LearningRate 0.0772 Epoch: 2 Global Step: 100700 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:03:16,896-Speed 2625.10 samples/sec Loss 12.1407 LearningRate 0.0772 Epoch: 2 Global Step: 100710 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:03:20,807-Speed 2619.75 samples/sec Loss 12.1448 LearningRate 0.0772 Epoch: 2 Global Step: 100720 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:03:24,702-Speed 2629.32 samples/sec Loss 12.3900 LearningRate 0.0772 Epoch: 2 Global Step: 100730 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:03:28,639-Speed 2602.22 samples/sec Loss 12.2513 LearningRate 0.0772 Epoch: 2 Global Step: 100740 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:03:32,535-Speed 2628.68 samples/sec Loss 12.3497 LearningRate 0.0772 Epoch: 2 Global Step: 100750 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:03:36,425-Speed 2633.33 samples/sec Loss 12.3160 LearningRate 0.0772 Epoch: 2 Global Step: 100760 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:03:40,324-Speed 2626.96 samples/sec Loss 12.1851 LearningRate 0.0772 Epoch: 2 Global Step: 100770 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:03:44,222-Speed 2627.90 samples/sec Loss 12.2673 LearningRate 0.0772 Epoch: 2 Global Step: 100780 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:03:48,114-Speed 2631.51 samples/sec Loss 12.2167 LearningRate 0.0772 Epoch: 2 Global Step: 100790 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:03:52,035-Speed 2612.47 samples/sec Loss 12.0929 LearningRate 0.0772 Epoch: 2 Global Step: 100800 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:03:55,937-Speed 2624.97 samples/sec Loss 12.2086 LearningRate 0.0772 Epoch: 2 Global Step: 100810 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:03:59,833-Speed 2628.79 samples/sec Loss 12.3403 LearningRate 0.0772 Epoch: 2 Global Step: 100820 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:04:03,733-Speed 2626.71 samples/sec Loss 12.1891 LearningRate 0.0772 Epoch: 2 Global Step: 100830 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:04:07,614-Speed 2639.03 samples/sec Loss 12.3343 LearningRate 0.0772 Epoch: 2 Global Step: 100840 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:04:11,513-Speed 2626.88 samples/sec Loss 12.3359 LearningRate 0.0772 Epoch: 2 Global Step: 100850 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:04:15,417-Speed 2623.66 samples/sec Loss 12.2650 LearningRate 0.0772 Epoch: 2 Global Step: 100860 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:04:19,318-Speed 2625.50 samples/sec Loss 12.0746 LearningRate 0.0772 Epoch: 2 Global Step: 100870 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:04:23,232-Speed 2616.71 samples/sec Loss 12.1503 LearningRate 0.0772 Epoch: 2 Global Step: 100880 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:04:27,129-Speed 2628.41 samples/sec Loss 12.1591 LearningRate 0.0772 Epoch: 2 Global Step: 100890 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:04:31,026-Speed 2628.34 samples/sec Loss 12.2153 LearningRate 0.0772 Epoch: 2 Global Step: 100900 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:04:34,970-Speed 2597.27 samples/sec Loss 12.1915 LearningRate 0.0772 Epoch: 2 Global Step: 100910 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:04:38,874-Speed 2623.41 samples/sec Loss 12.2254 LearningRate 0.0771 Epoch: 2 Global Step: 100920 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:04:42,769-Speed 2629.79 samples/sec Loss 12.1230 LearningRate 0.0771 Epoch: 2 Global Step: 100930 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:04:46,681-Speed 2617.89 samples/sec Loss 12.4187 LearningRate 0.0771 Epoch: 2 Global Step: 100940 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:04:50,589-Speed 2621.22 samples/sec Loss 12.2386 LearningRate 0.0771 Epoch: 2 Global Step: 100950 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:04:54,485-Speed 2629.25 samples/sec Loss 12.1654 LearningRate 0.0771 Epoch: 2 Global Step: 100960 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:04:58,378-Speed 2631.61 samples/sec Loss 12.1288 LearningRate 0.0771 Epoch: 2 Global Step: 100970 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:05:02,272-Speed 2629.83 samples/sec Loss 12.2003 LearningRate 0.0771 Epoch: 2 Global Step: 100980 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:05:06,166-Speed 2630.69 samples/sec Loss 12.1408 LearningRate 0.0771 Epoch: 2 Global Step: 100990 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:05:10,058-Speed 2632.00 samples/sec Loss 12.0539 LearningRate 0.0771 Epoch: 2 Global Step: 101000 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:05:13,973-Speed 2615.54 samples/sec Loss 12.1113 LearningRate 0.0771 Epoch: 2 Global Step: 101010 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:05:17,874-Speed 2626.22 samples/sec Loss 12.2934 LearningRate 0.0771 Epoch: 2 Global Step: 101020 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:05:21,778-Speed 2623.70 samples/sec Loss 12.1977 LearningRate 0.0771 Epoch: 2 Global Step: 101030 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:05:25,669-Speed 2632.22 samples/sec Loss 12.3255 LearningRate 0.0771 Epoch: 2 Global Step: 101040 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:05:29,577-Speed 2621.07 samples/sec Loss 12.2311 LearningRate 0.0771 Epoch: 2 Global Step: 101050 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:05:33,473-Speed 2628.65 samples/sec Loss 12.1915 LearningRate 0.0771 Epoch: 2 Global Step: 101060 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:05:37,372-Speed 2627.64 samples/sec Loss 12.3571 LearningRate 0.0771 Epoch: 2 Global Step: 101070 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:05:41,269-Speed 2628.31 samples/sec Loss 12.2085 LearningRate 0.0771 Epoch: 2 Global Step: 101080 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:05:45,160-Speed 2631.86 samples/sec Loss 12.2268 LearningRate 0.0771 Epoch: 2 Global Step: 101090 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:05:49,053-Speed 2631.16 samples/sec Loss 12.3231 LearningRate 0.0771 Epoch: 2 Global Step: 101100 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:05:52,947-Speed 2630.05 samples/sec Loss 12.4115 LearningRate 0.0771 Epoch: 2 Global Step: 101110 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:05:56,845-Speed 2627.91 samples/sec Loss 12.3528 LearningRate 0.0771 Epoch: 2 Global Step: 101120 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:06:00,748-Speed 2623.95 samples/sec Loss 12.1512 LearningRate 0.0771 Epoch: 2 Global Step: 101130 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:06:04,653-Speed 2622.80 samples/sec Loss 12.2121 LearningRate 0.0771 Epoch: 2 Global Step: 101140 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:06:08,561-Speed 2620.92 samples/sec Loss 12.2276 LearningRate 0.0771 Epoch: 2 Global Step: 101150 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:06:12,503-Speed 2598.19 samples/sec Loss 12.2998 LearningRate 0.0771 Epoch: 2 Global Step: 101160 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:06:16,417-Speed 2616.85 samples/sec Loss 12.1156 LearningRate 0.0771 Epoch: 2 Global Step: 101170 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:06:20,313-Speed 2629.30 samples/sec Loss 12.2375 LearningRate 0.0771 Epoch: 2 Global Step: 101180 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:06:24,208-Speed 2629.60 samples/sec Loss 12.0986 LearningRate 0.0771 Epoch: 2 Global Step: 101190 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:06:28,106-Speed 2627.12 samples/sec Loss 12.2704 LearningRate 0.0771 Epoch: 2 Global Step: 101200 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:06:32,006-Speed 2626.72 samples/sec Loss 12.1323 LearningRate 0.0771 Epoch: 2 Global Step: 101210 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:06:35,901-Speed 2629.57 samples/sec Loss 12.2261 LearningRate 0.0771 Epoch: 2 Global Step: 101220 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:06:39,808-Speed 2621.38 samples/sec Loss 12.2239 LearningRate 0.0771 Epoch: 2 Global Step: 101230 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:06:43,695-Speed 2635.07 samples/sec Loss 12.2985 LearningRate 0.0771 Epoch: 2 Global Step: 101240 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:06:47,574-Speed 2640.48 samples/sec Loss 12.3178 LearningRate 0.0771 Epoch: 2 Global Step: 101250 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:06:51,468-Speed 2630.71 samples/sec Loss 12.2327 LearningRate 0.0771 Epoch: 2 Global Step: 101260 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:06:55,362-Speed 2630.28 samples/sec Loss 12.1734 LearningRate 0.0771 Epoch: 2 Global Step: 101270 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:06:59,273-Speed 2618.55 samples/sec Loss 12.2979 LearningRate 0.0771 Epoch: 2 Global Step: 101280 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:07:03,171-Speed 2627.38 samples/sec Loss 12.3307 LearningRate 0.0771 Epoch: 2 Global Step: 101290 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:07:07,063-Speed 2631.92 samples/sec Loss 12.2384 LearningRate 0.0771 Epoch: 2 Global Step: 101300 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:07:10,961-Speed 2627.50 samples/sec Loss 12.1941 LearningRate 0.0771 Epoch: 2 Global Step: 101310 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:07:14,864-Speed 2624.12 samples/sec Loss 12.1392 LearningRate 0.0771 Epoch: 2 Global Step: 101320 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:07:18,758-Speed 2629.83 samples/sec Loss 12.2062 LearningRate 0.0771 Epoch: 2 Global Step: 101330 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:07:22,654-Speed 2629.65 samples/sec Loss 12.1602 LearningRate 0.0771 Epoch: 2 Global Step: 101340 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:07:26,548-Speed 2630.26 samples/sec Loss 12.1277 LearningRate 0.0771 Epoch: 2 Global Step: 101350 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:07:30,423-Speed 2643.27 samples/sec Loss 12.3019 LearningRate 0.0771 Epoch: 2 Global Step: 101360 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:07:34,321-Speed 2627.38 samples/sec Loss 12.2482 LearningRate 0.0771 Epoch: 2 Global Step: 101370 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:07:38,218-Speed 2628.39 samples/sec Loss 12.2814 LearningRate 0.0771 Epoch: 2 Global Step: 101380 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:07:42,111-Speed 2630.83 samples/sec Loss 12.2769 LearningRate 0.0771 Epoch: 2 Global Step: 101390 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:07:46,005-Speed 2630.41 samples/sec Loss 12.2905 LearningRate 0.0770 Epoch: 2 Global Step: 101400 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:07:49,897-Speed 2631.15 samples/sec Loss 12.1402 LearningRate 0.0770 Epoch: 2 Global Step: 101410 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:07:53,790-Speed 2631.85 samples/sec Loss 12.2120 LearningRate 0.0770 Epoch: 2 Global Step: 101420 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:07:57,684-Speed 2630.36 samples/sec Loss 12.2517 LearningRate 0.0770 Epoch: 2 Global Step: 101430 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:08:01,576-Speed 2631.48 samples/sec Loss 12.0966 LearningRate 0.0770 Epoch: 2 Global Step: 101440 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:08:05,471-Speed 2630.17 samples/sec Loss 12.2362 LearningRate 0.0770 Epoch: 2 Global Step: 101450 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:08:09,384-Speed 2616.88 samples/sec Loss 12.1906 LearningRate 0.0770 Epoch: 2 Global Step: 101460 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:08:13,283-Speed 2627.31 samples/sec Loss 12.1464 LearningRate 0.0770 Epoch: 2 Global Step: 101470 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:08:17,159-Speed 2642.46 samples/sec Loss 12.3691 LearningRate 0.0770 Epoch: 2 Global Step: 101480 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:08:21,051-Speed 2632.15 samples/sec Loss 12.1966 LearningRate 0.0770 Epoch: 2 Global Step: 101490 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:08:24,947-Speed 2628.66 samples/sec Loss 12.1006 LearningRate 0.0770 Epoch: 2 Global Step: 101500 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:08:28,841-Speed 2630.39 samples/sec Loss 12.2043 LearningRate 0.0770 Epoch: 2 Global Step: 101510 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:08:32,743-Speed 2625.22 samples/sec Loss 12.2626 LearningRate 0.0770 Epoch: 2 Global Step: 101520 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:08:36,666-Speed 2610.42 samples/sec Loss 12.3191 LearningRate 0.0770 Epoch: 2 Global Step: 101530 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:08:40,584-Speed 2614.74 samples/sec Loss 12.3106 LearningRate 0.0770 Epoch: 2 Global Step: 101540 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:08:44,482-Speed 2627.51 samples/sec Loss 12.2297 LearningRate 0.0770 Epoch: 2 Global Step: 101550 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:08:48,374-Speed 2631.71 samples/sec Loss 12.1820 LearningRate 0.0770 Epoch: 2 Global Step: 101560 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:08:52,268-Speed 2630.19 samples/sec Loss 12.3121 LearningRate 0.0770 Epoch: 2 Global Step: 101570 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:08:56,163-Speed 2629.56 samples/sec Loss 12.1909 LearningRate 0.0770 Epoch: 2 Global Step: 101580 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:09:00,062-Speed 2626.78 samples/sec Loss 12.3520 LearningRate 0.0770 Epoch: 2 Global Step: 101590 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:09:03,978-Speed 2615.09 samples/sec Loss 12.3107 LearningRate 0.0770 Epoch: 2 Global Step: 101600 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:09:07,875-Speed 2628.48 samples/sec Loss 12.1437 LearningRate 0.0770 Epoch: 2 Global Step: 101610 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:09:11,751-Speed 2642.29 samples/sec Loss 12.2231 LearningRate 0.0770 Epoch: 2 Global Step: 101620 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:09:15,670-Speed 2614.07 samples/sec Loss 12.1652 LearningRate 0.0770 Epoch: 2 Global Step: 101630 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:09:19,625-Speed 2590.23 samples/sec Loss 12.1588 LearningRate 0.0770 Epoch: 2 Global Step: 101640 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:09:23,574-Speed 2593.66 samples/sec Loss 12.2917 LearningRate 0.0770 Epoch: 2 Global Step: 101650 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:09:27,470-Speed 2628.92 samples/sec Loss 12.1052 LearningRate 0.0770 Epoch: 2 Global Step: 101660 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:09:31,359-Speed 2633.48 samples/sec Loss 12.3780 LearningRate 0.0770 Epoch: 2 Global Step: 101670 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:09:35,277-Speed 2614.44 samples/sec Loss 12.2479 LearningRate 0.0770 Epoch: 2 Global Step: 101680 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:09:39,168-Speed 2632.05 samples/sec Loss 12.1207 LearningRate 0.0770 Epoch: 2 Global Step: 101690 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:09:43,061-Speed 2631.12 samples/sec Loss 12.2198 LearningRate 0.0770 Epoch: 2 Global Step: 101700 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:09:46,957-Speed 2629.26 samples/sec Loss 12.2900 LearningRate 0.0770 Epoch: 2 Global Step: 101710 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:09:50,859-Speed 2625.03 samples/sec Loss 12.1635 LearningRate 0.0770 Epoch: 2 Global Step: 101720 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:09:54,759-Speed 2625.94 samples/sec Loss 12.0671 LearningRate 0.0770 Epoch: 2 Global Step: 101730 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:09:58,639-Speed 2640.25 samples/sec Loss 12.1690 LearningRate 0.0770 Epoch: 2 Global Step: 101740 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:10:02,511-Speed 2645.22 samples/sec Loss 12.1375 LearningRate 0.0770 Epoch: 2 Global Step: 101750 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:10:06,445-Speed 2603.57 samples/sec Loss 12.3203 LearningRate 0.0770 Epoch: 2 Global Step: 101760 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:10:10,337-Speed 2630.88 samples/sec Loss 12.1361 LearningRate 0.0770 Epoch: 2 Global Step: 101770 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:10:14,229-Speed 2631.93 samples/sec Loss 12.2872 LearningRate 0.0770 Epoch: 2 Global Step: 101780 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:10:18,121-Speed 2632.26 samples/sec Loss 12.3180 LearningRate 0.0770 Epoch: 2 Global Step: 101790 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:10:22,012-Speed 2632.27 samples/sec Loss 12.2593 LearningRate 0.0770 Epoch: 2 Global Step: 101800 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 07:10:25,900-Speed 2634.94 samples/sec Loss 12.4949 LearningRate 0.0770 Epoch: 2 Global Step: 101810 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 07:10:29,905-Speed 2557.42 samples/sec Loss 12.2437 LearningRate 0.0770 Epoch: 2 Global Step: 101820 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 07:10:33,903-Speed 2562.29 samples/sec Loss 12.3102 LearningRate 0.0770 Epoch: 2 Global Step: 101830 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 07:10:37,798-Speed 2629.39 samples/sec Loss 12.3919 LearningRate 0.0770 Epoch: 2 Global Step: 101840 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 07:10:41,699-Speed 2625.39 samples/sec Loss 12.3819 LearningRate 0.0770 Epoch: 2 Global Step: 101850 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 07:10:45,596-Speed 2628.47 samples/sec Loss 12.6346 LearningRate 0.0770 Epoch: 2 Global Step: 101860 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 07:10:49,497-Speed 2625.93 samples/sec Loss 12.3651 LearningRate 0.0769 Epoch: 2 Global Step: 101870 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 07:10:53,399-Speed 2624.45 samples/sec Loss 12.3805 LearningRate 0.0769 Epoch: 2 Global Step: 101880 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 07:10:57,301-Speed 2624.91 samples/sec Loss 12.2019 LearningRate 0.0769 Epoch: 2 Global Step: 101890 Fp16 Grad Scale: 16384 Required: 82 hours
Training: 2022-04-13 07:11:01,210-Speed 2620.43 samples/sec Loss 12.3155 LearningRate 0.0769 Epoch: 2 Global Step: 101900 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:11:05,113-Speed 2624.14 samples/sec Loss 12.0658 LearningRate 0.0769 Epoch: 2 Global Step: 101910 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:11:09,003-Speed 2632.96 samples/sec Loss 12.2669 LearningRate 0.0769 Epoch: 2 Global Step: 101920 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:11:12,895-Speed 2631.60 samples/sec Loss 12.3025 LearningRate 0.0769 Epoch: 2 Global Step: 101930 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:11:16,789-Speed 2630.26 samples/sec Loss 12.2500 LearningRate 0.0769 Epoch: 2 Global Step: 101940 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:11:20,680-Speed 2632.81 samples/sec Loss 12.2388 LearningRate 0.0769 Epoch: 2 Global Step: 101950 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:11:24,573-Speed 2631.09 samples/sec Loss 12.3048 LearningRate 0.0769 Epoch: 2 Global Step: 101960 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:11:28,471-Speed 2627.80 samples/sec Loss 12.1552 LearningRate 0.0769 Epoch: 2 Global Step: 101970 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:11:32,365-Speed 2630.02 samples/sec Loss 12.1765 LearningRate 0.0769 Epoch: 2 Global Step: 101980 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:11:36,259-Speed 2630.47 samples/sec Loss 12.2038 LearningRate 0.0769 Epoch: 2 Global Step: 101990 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:11:40,162-Speed 2623.94 samples/sec Loss 12.2109 LearningRate 0.0769 Epoch: 2 Global Step: 102000 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:11:44,066-Speed 2623.47 samples/sec Loss 12.3778 LearningRate 0.0769 Epoch: 2 Global Step: 102010 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:11:47,960-Speed 2629.93 samples/sec Loss 12.2938 LearningRate 0.0769 Epoch: 2 Global Step: 102020 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:11:51,858-Speed 2628.12 samples/sec Loss 12.2434 LearningRate 0.0769 Epoch: 2 Global Step: 102030 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:11:55,757-Speed 2627.07 samples/sec Loss 12.2913 LearningRate 0.0769 Epoch: 2 Global Step: 102040 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:11:59,649-Speed 2631.59 samples/sec Loss 12.3403 LearningRate 0.0769 Epoch: 2 Global Step: 102050 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:12:03,542-Speed 2630.70 samples/sec Loss 12.1955 LearningRate 0.0769 Epoch: 2 Global Step: 102060 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:12:07,438-Speed 2629.47 samples/sec Loss 12.1081 LearningRate 0.0769 Epoch: 2 Global Step: 102070 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:12:11,330-Speed 2631.29 samples/sec Loss 12.2871 LearningRate 0.0769 Epoch: 2 Global Step: 102080 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:12:15,223-Speed 2630.79 samples/sec Loss 12.1583 LearningRate 0.0769 Epoch: 2 Global Step: 102090 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:12:19,119-Speed 2628.57 samples/sec Loss 12.2729 LearningRate 0.0769 Epoch: 2 Global Step: 102100 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:12:23,015-Speed 2628.75 samples/sec Loss 12.2197 LearningRate 0.0769 Epoch: 2 Global Step: 102110 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:12:26,911-Speed 2630.04 samples/sec Loss 12.2424 LearningRate 0.0769 Epoch: 2 Global Step: 102120 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:12:30,804-Speed 2630.40 samples/sec Loss 12.2355 LearningRate 0.0769 Epoch: 2 Global Step: 102130 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:12:34,709-Speed 2623.17 samples/sec Loss 12.2040 LearningRate 0.0769 Epoch: 2 Global Step: 102140 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:12:38,611-Speed 2624.88 samples/sec Loss 12.2032 LearningRate 0.0769 Epoch: 2 Global Step: 102150 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:12:42,508-Speed 2628.90 samples/sec Loss 12.3380 LearningRate 0.0769 Epoch: 2 Global Step: 102160 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:12:46,401-Speed 2630.66 samples/sec Loss 12.2054 LearningRate 0.0769 Epoch: 2 Global Step: 102170 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:12:50,294-Speed 2631.12 samples/sec Loss 12.1262 LearningRate 0.0769 Epoch: 2 Global Step: 102180 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:12:54,186-Speed 2632.00 samples/sec Loss 12.2094 LearningRate 0.0769 Epoch: 2 Global Step: 102190 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:12:58,089-Speed 2623.84 samples/sec Loss 12.1615 LearningRate 0.0769 Epoch: 2 Global Step: 102200 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:13:01,986-Speed 2628.19 samples/sec Loss 12.3645 LearningRate 0.0769 Epoch: 2 Global Step: 102210 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:13:05,881-Speed 2630.15 samples/sec Loss 12.2009 LearningRate 0.0769 Epoch: 2 Global Step: 102220 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:13:09,777-Speed 2628.79 samples/sec Loss 12.2743 LearningRate 0.0769 Epoch: 2 Global Step: 102230 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:13:13,669-Speed 2631.23 samples/sec Loss 12.2560 LearningRate 0.0769 Epoch: 2 Global Step: 102240 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:13:17,565-Speed 2629.39 samples/sec Loss 12.2688 LearningRate 0.0769 Epoch: 2 Global Step: 102250 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:13:21,475-Speed 2619.59 samples/sec Loss 12.2014 LearningRate 0.0769 Epoch: 2 Global Step: 102260 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:13:25,370-Speed 2629.38 samples/sec Loss 12.0947 LearningRate 0.0769 Epoch: 2 Global Step: 102270 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:13:29,265-Speed 2630.57 samples/sec Loss 12.2565 LearningRate 0.0769 Epoch: 2 Global Step: 102280 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:13:33,162-Speed 2627.81 samples/sec Loss 12.3066 LearningRate 0.0769 Epoch: 2 Global Step: 102290 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:13:37,072-Speed 2619.50 samples/sec Loss 12.1499 LearningRate 0.0769 Epoch: 2 Global Step: 102300 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:13:40,967-Speed 2629.38 samples/sec Loss 12.0755 LearningRate 0.0769 Epoch: 2 Global Step: 102310 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:13:44,862-Speed 2629.69 samples/sec Loss 12.1873 LearningRate 0.0769 Epoch: 2 Global Step: 102320 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:13:48,756-Speed 2630.44 samples/sec Loss 12.2116 LearningRate 0.0769 Epoch: 2 Global Step: 102330 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:13:52,653-Speed 2628.48 samples/sec Loss 12.2640 LearningRate 0.0768 Epoch: 2 Global Step: 102340 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:13:56,532-Speed 2639.97 samples/sec Loss 12.1860 LearningRate 0.0768 Epoch: 2 Global Step: 102350 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:14:00,514-Speed 2572.52 samples/sec Loss 12.3142 LearningRate 0.0768 Epoch: 2 Global Step: 102360 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:14:04,591-Speed 2512.54 samples/sec Loss 12.1283 LearningRate 0.0768 Epoch: 2 Global Step: 102370 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:14:08,589-Speed 2561.62 samples/sec Loss 12.0942 LearningRate 0.0768 Epoch: 2 Global Step: 102380 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:14:12,484-Speed 2629.77 samples/sec Loss 12.2319 LearningRate 0.0768 Epoch: 2 Global Step: 102390 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:14:16,379-Speed 2629.29 samples/sec Loss 12.1836 LearningRate 0.0768 Epoch: 2 Global Step: 102400 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:14:20,284-Speed 2622.88 samples/sec Loss 12.2646 LearningRate 0.0768 Epoch: 2 Global Step: 102410 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:14:24,166-Speed 2643.38 samples/sec Loss 12.2556 LearningRate 0.0768 Epoch: 2 Global Step: 102420 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:14:28,069-Speed 2623.82 samples/sec Loss 12.0787 LearningRate 0.0768 Epoch: 2 Global Step: 102430 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:14:31,973-Speed 2624.15 samples/sec Loss 12.1703 LearningRate 0.0768 Epoch: 2 Global Step: 102440 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:14:35,867-Speed 2630.16 samples/sec Loss 12.2292 LearningRate 0.0768 Epoch: 2 Global Step: 102450 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:14:39,764-Speed 2628.24 samples/sec Loss 12.3302 LearningRate 0.0768 Epoch: 2 Global Step: 102460 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:14:43,655-Speed 2632.02 samples/sec Loss 12.2017 LearningRate 0.0768 Epoch: 2 Global Step: 102470 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:14:47,555-Speed 2626.58 samples/sec Loss 12.2753 LearningRate 0.0768 Epoch: 2 Global Step: 102480 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:14:51,454-Speed 2627.45 samples/sec Loss 12.3237 LearningRate 0.0768 Epoch: 2 Global Step: 102490 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:14:55,346-Speed 2631.05 samples/sec Loss 12.2631 LearningRate 0.0768 Epoch: 2 Global Step: 102500 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:14:59,253-Speed 2621.45 samples/sec Loss 12.2867 LearningRate 0.0768 Epoch: 2 Global Step: 102510 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:15:03,148-Speed 2629.94 samples/sec Loss 12.3139 LearningRate 0.0768 Epoch: 2 Global Step: 102520 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:15:07,043-Speed 2629.37 samples/sec Loss 12.3185 LearningRate 0.0768 Epoch: 2 Global Step: 102530 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:15:10,945-Speed 2625.30 samples/sec Loss 12.1234 LearningRate 0.0768 Epoch: 2 Global Step: 102540 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:15:14,842-Speed 2628.18 samples/sec Loss 12.3464 LearningRate 0.0768 Epoch: 2 Global Step: 102550 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:15:18,741-Speed 2626.55 samples/sec Loss 12.1836 LearningRate 0.0768 Epoch: 2 Global Step: 102560 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:15:22,645-Speed 2624.29 samples/sec Loss 12.3583 LearningRate 0.0768 Epoch: 2 Global Step: 102570 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:15:26,556-Speed 2618.15 samples/sec Loss 12.3493 LearningRate 0.0768 Epoch: 2 Global Step: 102580 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:15:30,457-Speed 2625.87 samples/sec Loss 12.3093 LearningRate 0.0768 Epoch: 2 Global Step: 102590 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:15:34,334-Speed 2642.06 samples/sec Loss 12.4760 LearningRate 0.0768 Epoch: 2 Global Step: 102600 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:15:38,225-Speed 2631.57 samples/sec Loss 12.5732 LearningRate 0.0768 Epoch: 2 Global Step: 102610 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:15:42,122-Speed 2628.65 samples/sec Loss 12.4080 LearningRate 0.0768 Epoch: 2 Global Step: 102620 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:15:46,018-Speed 2628.87 samples/sec Loss 12.2528 LearningRate 0.0768 Epoch: 2 Global Step: 102630 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:15:49,913-Speed 2629.89 samples/sec Loss 12.2990 LearningRate 0.0768 Epoch: 2 Global Step: 102640 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:15:53,807-Speed 2630.29 samples/sec Loss 12.2819 LearningRate 0.0768 Epoch: 2 Global Step: 102650 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:15:57,713-Speed 2622.42 samples/sec Loss 12.4035 LearningRate 0.0768 Epoch: 2 Global Step: 102660 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:16:01,605-Speed 2631.60 samples/sec Loss 12.2369 LearningRate 0.0768 Epoch: 2 Global Step: 102670 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:16:05,508-Speed 2624.17 samples/sec Loss 12.1342 LearningRate 0.0768 Epoch: 2 Global Step: 102680 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:16:09,415-Speed 2621.28 samples/sec Loss 12.1245 LearningRate 0.0768 Epoch: 2 Global Step: 102690 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:16:13,326-Speed 2618.89 samples/sec Loss 12.1826 LearningRate 0.0768 Epoch: 2 Global Step: 102700 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:16:17,232-Speed 2622.76 samples/sec Loss 12.2153 LearningRate 0.0768 Epoch: 2 Global Step: 102710 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:16:21,133-Speed 2625.42 samples/sec Loss 12.2048 LearningRate 0.0768 Epoch: 2 Global Step: 102720 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:16:25,045-Speed 2618.16 samples/sec Loss 12.2632 LearningRate 0.0768 Epoch: 2 Global Step: 102730 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:16:28,937-Speed 2631.98 samples/sec Loss 12.4143 LearningRate 0.0768 Epoch: 2 Global Step: 102740 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:16:32,841-Speed 2623.24 samples/sec Loss 12.1846 LearningRate 0.0768 Epoch: 2 Global Step: 102750 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:16:36,736-Speed 2630.12 samples/sec Loss 12.3480 LearningRate 0.0768 Epoch: 2 Global Step: 102760 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:16:40,629-Speed 2630.88 samples/sec Loss 12.2049 LearningRate 0.0768 Epoch: 2 Global Step: 102770 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:16:44,521-Speed 2631.43 samples/sec Loss 12.1873 LearningRate 0.0768 Epoch: 2 Global Step: 102780 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:16:48,417-Speed 2628.97 samples/sec Loss 12.1127 LearningRate 0.0768 Epoch: 2 Global Step: 102790 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:16:52,310-Speed 2631.44 samples/sec Loss 12.3044 LearningRate 0.0768 Epoch: 2 Global Step: 102800 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:16:56,221-Speed 2618.61 samples/sec Loss 12.1643 LearningRate 0.0767 Epoch: 2 Global Step: 102810 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:17:00,141-Speed 2613.28 samples/sec Loss 12.2840 LearningRate 0.0767 Epoch: 2 Global Step: 102820 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:17:04,024-Speed 2637.82 samples/sec Loss 12.1622 LearningRate 0.0767 Epoch: 2 Global Step: 102830 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:17:07,924-Speed 2626.42 samples/sec Loss 11.9802 LearningRate 0.0767 Epoch: 2 Global Step: 102840 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:17:11,823-Speed 2627.37 samples/sec Loss 12.3053 LearningRate 0.0767 Epoch: 2 Global Step: 102850 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:17:15,723-Speed 2626.05 samples/sec Loss 12.2704 LearningRate 0.0767 Epoch: 2 Global Step: 102860 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:17:19,622-Speed 2626.55 samples/sec Loss 12.1794 LearningRate 0.0767 Epoch: 2 Global Step: 102870 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:17:23,516-Speed 2630.22 samples/sec Loss 12.2029 LearningRate 0.0767 Epoch: 2 Global Step: 102880 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:17:27,447-Speed 2606.38 samples/sec Loss 12.1963 LearningRate 0.0767 Epoch: 2 Global Step: 102890 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:17:31,337-Speed 2633.10 samples/sec Loss 12.1750 LearningRate 0.0767 Epoch: 2 Global Step: 102900 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:17:35,213-Speed 2642.17 samples/sec Loss 12.1784 LearningRate 0.0767 Epoch: 2 Global Step: 102910 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:17:39,123-Speed 2619.52 samples/sec Loss 12.1686 LearningRate 0.0767 Epoch: 2 Global Step: 102920 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:17:43,020-Speed 2628.35 samples/sec Loss 12.1427 LearningRate 0.0767 Epoch: 2 Global Step: 102930 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:17:46,916-Speed 2628.93 samples/sec Loss 12.2312 LearningRate 0.0767 Epoch: 2 Global Step: 102940 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:17:50,808-Speed 2631.98 samples/sec Loss 12.2392 LearningRate 0.0767 Epoch: 2 Global Step: 102950 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:17:54,696-Speed 2634.80 samples/sec Loss 12.1929 LearningRate 0.0767 Epoch: 2 Global Step: 102960 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:17:58,598-Speed 2624.71 samples/sec Loss 12.0552 LearningRate 0.0767 Epoch: 2 Global Step: 102970 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:18:02,490-Speed 2631.42 samples/sec Loss 12.1724 LearningRate 0.0767 Epoch: 2 Global Step: 102980 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:18:06,381-Speed 2633.23 samples/sec Loss 12.3818 LearningRate 0.0767 Epoch: 2 Global Step: 102990 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:18:10,270-Speed 2633.41 samples/sec Loss 12.2809 LearningRate 0.0767 Epoch: 2 Global Step: 103000 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:18:14,161-Speed 2632.24 samples/sec Loss 12.2900 LearningRate 0.0767 Epoch: 2 Global Step: 103010 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:18:18,045-Speed 2637.08 samples/sec Loss 12.0736 LearningRate 0.0767 Epoch: 2 Global Step: 103020 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:18:21,951-Speed 2623.12 samples/sec Loss 12.2291 LearningRate 0.0767 Epoch: 2 Global Step: 103030 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:18:25,850-Speed 2626.97 samples/sec Loss 12.1129 LearningRate 0.0767 Epoch: 2 Global Step: 103040 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:18:29,771-Speed 2611.86 samples/sec Loss 12.1548 LearningRate 0.0767 Epoch: 2 Global Step: 103050 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:18:33,676-Speed 2622.85 samples/sec Loss 12.2434 LearningRate 0.0767 Epoch: 2 Global Step: 103060 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:18:37,571-Speed 2630.17 samples/sec Loss 12.1950 LearningRate 0.0767 Epoch: 2 Global Step: 103070 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:18:41,460-Speed 2633.32 samples/sec Loss 12.2404 LearningRate 0.0767 Epoch: 2 Global Step: 103080 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:18:45,387-Speed 2608.39 samples/sec Loss 12.1179 LearningRate 0.0767 Epoch: 2 Global Step: 103090 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:18:49,297-Speed 2619.82 samples/sec Loss 12.1448 LearningRate 0.0767 Epoch: 2 Global Step: 103100 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:18:53,192-Speed 2629.50 samples/sec Loss 12.1400 LearningRate 0.0767 Epoch: 2 Global Step: 103110 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:18:57,089-Speed 2628.82 samples/sec Loss 12.2128 LearningRate 0.0767 Epoch: 2 Global Step: 103120 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:19:00,980-Speed 2631.69 samples/sec Loss 12.3559 LearningRate 0.0767 Epoch: 2 Global Step: 103130 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:19:04,871-Speed 2632.53 samples/sec Loss 12.2608 LearningRate 0.0767 Epoch: 2 Global Step: 103140 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:19:08,763-Speed 2631.34 samples/sec Loss 12.1999 LearningRate 0.0767 Epoch: 2 Global Step: 103150 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:19:12,654-Speed 2632.93 samples/sec Loss 12.2596 LearningRate 0.0767 Epoch: 2 Global Step: 103160 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:19:16,558-Speed 2623.66 samples/sec Loss 12.3314 LearningRate 0.0767 Epoch: 2 Global Step: 103170 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:19:20,470-Speed 2617.76 samples/sec Loss 12.2267 LearningRate 0.0767 Epoch: 2 Global Step: 103180 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:19:24,379-Speed 2620.94 samples/sec Loss 12.2213 LearningRate 0.0767 Epoch: 2 Global Step: 103190 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:19:28,294-Speed 2615.75 samples/sec Loss 12.2289 LearningRate 0.0767 Epoch: 2 Global Step: 103200 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:19:32,208-Speed 2617.05 samples/sec Loss 12.0932 LearningRate 0.0767 Epoch: 2 Global Step: 103210 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:19:36,120-Speed 2618.81 samples/sec Loss 12.2105 LearningRate 0.0767 Epoch: 2 Global Step: 103220 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:19:40,022-Speed 2625.02 samples/sec Loss 12.1696 LearningRate 0.0767 Epoch: 2 Global Step: 103230 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:19:43,913-Speed 2631.66 samples/sec Loss 12.2234 LearningRate 0.0767 Epoch: 2 Global Step: 103240 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:19:47,809-Speed 2629.42 samples/sec Loss 12.2353 LearningRate 0.0767 Epoch: 2 Global Step: 103250 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:19:51,685-Speed 2643.25 samples/sec Loss 12.2068 LearningRate 0.0767 Epoch: 2 Global Step: 103260 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:19:55,592-Speed 2621.63 samples/sec Loss 12.1431 LearningRate 0.0767 Epoch: 2 Global Step: 103270 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:19:59,488-Speed 2628.91 samples/sec Loss 12.2695 LearningRate 0.0767 Epoch: 2 Global Step: 103280 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:20:03,381-Speed 2631.40 samples/sec Loss 12.2344 LearningRate 0.0766 Epoch: 2 Global Step: 103290 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:20:07,275-Speed 2630.29 samples/sec Loss 12.1218 LearningRate 0.0766 Epoch: 2 Global Step: 103300 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:20:11,173-Speed 2627.16 samples/sec Loss 12.2609 LearningRate 0.0766 Epoch: 2 Global Step: 103310 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:20:15,087-Speed 2617.11 samples/sec Loss 12.2086 LearningRate 0.0766 Epoch: 2 Global Step: 103320 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:20:18,953-Speed 2649.08 samples/sec Loss 12.3539 LearningRate 0.0766 Epoch: 2 Global Step: 103330 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:20:22,856-Speed 2624.97 samples/sec Loss 12.5259 LearningRate 0.0766 Epoch: 2 Global Step: 103340 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:20:26,766-Speed 2619.86 samples/sec Loss 12.2670 LearningRate 0.0766 Epoch: 2 Global Step: 103350 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:20:30,658-Speed 2631.94 samples/sec Loss 12.2427 LearningRate 0.0766 Epoch: 2 Global Step: 103360 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:20:34,552-Speed 2630.59 samples/sec Loss 12.3601 LearningRate 0.0766 Epoch: 2 Global Step: 103370 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:20:38,458-Speed 2621.82 samples/sec Loss 12.0875 LearningRate 0.0766 Epoch: 2 Global Step: 103380 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:20:42,350-Speed 2631.55 samples/sec Loss 12.1752 LearningRate 0.0766 Epoch: 2 Global Step: 103390 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:20:46,242-Speed 2631.96 samples/sec Loss 12.1180 LearningRate 0.0766 Epoch: 2 Global Step: 103400 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:20:50,138-Speed 2629.21 samples/sec Loss 12.2851 LearningRate 0.0766 Epoch: 2 Global Step: 103410 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:20:54,044-Speed 2621.98 samples/sec Loss 12.1039 LearningRate 0.0766 Epoch: 2 Global Step: 103420 Fp16 Grad Scale: 32768 Required: 82 hours
Training: 2022-04-13 07:20:57,947-Speed 2624.25 samples/sec Loss 12.3190 LearningRate 0.0766 Epoch: 2 Global Step: 103430 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:21:01,863-Speed 2615.91 samples/sec Loss 12.1148 LearningRate 0.0766 Epoch: 2 Global Step: 103440 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:21:05,763-Speed 2626.66 samples/sec Loss 12.0321 LearningRate 0.0766 Epoch: 2 Global Step: 103450 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:21:09,655-Speed 2631.07 samples/sec Loss 12.1902 LearningRate 0.0766 Epoch: 2 Global Step: 103460 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:21:13,549-Speed 2630.20 samples/sec Loss 12.0855 LearningRate 0.0766 Epoch: 2 Global Step: 103470 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:21:17,465-Speed 2616.09 samples/sec Loss 12.1602 LearningRate 0.0766 Epoch: 2 Global Step: 103480 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:21:21,358-Speed 2630.99 samples/sec Loss 12.1656 LearningRate 0.0766 Epoch: 2 Global Step: 103490 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:21:25,274-Speed 2615.65 samples/sec Loss 12.2833 LearningRate 0.0766 Epoch: 2 Global Step: 103500 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:21:29,174-Speed 2626.53 samples/sec Loss 12.3899 LearningRate 0.0766 Epoch: 2 Global Step: 103510 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:21:33,070-Speed 2629.40 samples/sec Loss 12.1381 LearningRate 0.0766 Epoch: 2 Global Step: 103520 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:21:36,969-Speed 2626.77 samples/sec Loss 11.9863 LearningRate 0.0766 Epoch: 2 Global Step: 103530 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:21:40,863-Speed 2630.26 samples/sec Loss 12.2981 LearningRate 0.0766 Epoch: 2 Global Step: 103540 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:21:44,760-Speed 2628.47 samples/sec Loss 12.1568 LearningRate 0.0766 Epoch: 2 Global Step: 103550 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:21:48,656-Speed 2629.16 samples/sec Loss 12.2095 LearningRate 0.0766 Epoch: 2 Global Step: 103560 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:21:52,549-Speed 2631.31 samples/sec Loss 12.0828 LearningRate 0.0766 Epoch: 2 Global Step: 103570 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:21:56,455-Speed 2622.33 samples/sec Loss 12.1764 LearningRate 0.0766 Epoch: 2 Global Step: 103580 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:22:00,357-Speed 2625.21 samples/sec Loss 12.3042 LearningRate 0.0766 Epoch: 2 Global Step: 103590 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:22:04,253-Speed 2628.62 samples/sec Loss 12.0015 LearningRate 0.0766 Epoch: 2 Global Step: 103600 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:22:08,157-Speed 2623.28 samples/sec Loss 12.3096 LearningRate 0.0766 Epoch: 2 Global Step: 103610 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:22:12,058-Speed 2625.67 samples/sec Loss 12.3122 LearningRate 0.0766 Epoch: 2 Global Step: 103620 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:22:15,952-Speed 2629.97 samples/sec Loss 12.1485 LearningRate 0.0766 Epoch: 2 Global Step: 103630 Fp16 Grad Scale: 262144 Required: 82 hours
Training: 2022-04-13 07:22:19,836-Speed 2637.29 samples/sec Loss 12.2795 LearningRate 0.0766 Epoch: 2 Global Step: 103640 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:22:23,735-Speed 2627.38 samples/sec Loss 12.2563 LearningRate 0.0766 Epoch: 2 Global Step: 103650 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:22:27,639-Speed 2624.36 samples/sec Loss 12.1888 LearningRate 0.0766 Epoch: 2 Global Step: 103660 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:22:31,537-Speed 2627.74 samples/sec Loss 12.1773 LearningRate 0.0766 Epoch: 2 Global Step: 103670 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:22:35,437-Speed 2626.12 samples/sec Loss 12.1839 LearningRate 0.0766 Epoch: 2 Global Step: 103680 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:22:39,340-Speed 2624.17 samples/sec Loss 12.1108 LearningRate 0.0766 Epoch: 2 Global Step: 103690 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:22:43,239-Speed 2627.01 samples/sec Loss 12.1221 LearningRate 0.0766 Epoch: 2 Global Step: 103700 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:22:47,132-Speed 2630.29 samples/sec Loss 12.1737 LearningRate 0.0766 Epoch: 2 Global Step: 103710 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:22:51,014-Speed 2639.31 samples/sec Loss 12.2011 LearningRate 0.0766 Epoch: 2 Global Step: 103720 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:22:54,904-Speed 2632.61 samples/sec Loss 12.2444 LearningRate 0.0766 Epoch: 2 Global Step: 103730 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:22:58,797-Speed 2631.31 samples/sec Loss 12.1434 LearningRate 0.0766 Epoch: 2 Global Step: 103740 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:23:02,709-Speed 2618.17 samples/sec Loss 12.2640 LearningRate 0.0766 Epoch: 2 Global Step: 103750 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:23:06,601-Speed 2632.32 samples/sec Loss 12.2150 LearningRate 0.0765 Epoch: 2 Global Step: 103760 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:23:10,494-Speed 2630.54 samples/sec Loss 12.1398 LearningRate 0.0765 Epoch: 2 Global Step: 103770 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:23:14,392-Speed 2627.96 samples/sec Loss 12.0545 LearningRate 0.0765 Epoch: 2 Global Step: 103780 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:23:18,292-Speed 2625.86 samples/sec Loss 12.3392 LearningRate 0.0765 Epoch: 2 Global Step: 103790 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:23:22,186-Speed 2630.62 samples/sec Loss 12.1687 LearningRate 0.0765 Epoch: 2 Global Step: 103800 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:23:26,078-Speed 2631.94 samples/sec Loss 12.2499 LearningRate 0.0765 Epoch: 2 Global Step: 103810 Fp16 Grad Scale: 65536 Required: 82 hours
Training: 2022-04-13 07:23:29,993-Speed 2616.59 samples/sec Loss 12.1643 LearningRate 0.0765 Epoch: 2 Global Step: 103820 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:23:33,887-Speed 2630.14 samples/sec Loss 12.2311 LearningRate 0.0765 Epoch: 2 Global Step: 103830 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:23:37,795-Speed 2621.30 samples/sec Loss 12.2044 LearningRate 0.0765 Epoch: 2 Global Step: 103840 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:23:41,713-Speed 2614.32 samples/sec Loss 12.0922 LearningRate 0.0765 Epoch: 2 Global Step: 103850 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:23:45,608-Speed 2629.66 samples/sec Loss 12.1947 LearningRate 0.0765 Epoch: 2 Global Step: 103860 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:23:49,500-Speed 2631.38 samples/sec Loss 12.1682 LearningRate 0.0765 Epoch: 2 Global Step: 103870 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:23:53,401-Speed 2625.84 samples/sec Loss 12.1183 LearningRate 0.0765 Epoch: 2 Global Step: 103880 Fp16 Grad Scale: 131072 Required: 82 hours
Training: 2022-04-13 07:23:57,301-Speed 2626.94 samples/sec Loss 12.1906 LearningRate 0.0765 Epoch: 2 Global Step: 103890 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:24:01,196-Speed 2629.68 samples/sec Loss 12.1330 LearningRate 0.0765 Epoch: 2 Global Step: 103900 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:24:05,091-Speed 2630.13 samples/sec Loss 12.1191 LearningRate 0.0765 Epoch: 2 Global Step: 103910 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:24:09,000-Speed 2620.37 samples/sec Loss 12.3123 LearningRate 0.0765 Epoch: 2 Global Step: 103920 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:24:12,896-Speed 2628.79 samples/sec Loss 12.1130 LearningRate 0.0765 Epoch: 2 Global Step: 103930 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:24:16,788-Speed 2631.50 samples/sec Loss 12.1960 LearningRate 0.0765 Epoch: 2 Global Step: 103940 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:24:20,692-Speed 2623.91 samples/sec Loss 12.1054 LearningRate 0.0765 Epoch: 2 Global Step: 103950 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:24:24,595-Speed 2624.04 samples/sec Loss 12.2250 LearningRate 0.0765 Epoch: 2 Global Step: 103960 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:24:28,511-Speed 2616.10 samples/sec Loss 12.1282 LearningRate 0.0765 Epoch: 2 Global Step: 103970 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:24:32,410-Speed 2626.80 samples/sec Loss 12.2532 LearningRate 0.0765 Epoch: 2 Global Step: 103980 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:24:36,311-Speed 2626.20 samples/sec Loss 12.1321 LearningRate 0.0765 Epoch: 2 Global Step: 103990 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:24:40,206-Speed 2629.30 samples/sec Loss 12.0551 LearningRate 0.0765 Epoch: 2 Global Step: 104000 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:24:44,121-Speed 2616.61 samples/sec Loss 12.0718 LearningRate 0.0765 Epoch: 2 Global Step: 104010 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:24:48,031-Speed 2619.44 samples/sec Loss 12.1762 LearningRate 0.0765 Epoch: 2 Global Step: 104020 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:24:51,928-Speed 2629.13 samples/sec Loss 12.1915 LearningRate 0.0765 Epoch: 2 Global Step: 104030 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:24:55,825-Speed 2627.77 samples/sec Loss 12.1825 LearningRate 0.0765 Epoch: 2 Global Step: 104040 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:24:59,829-Speed 2558.34 samples/sec Loss 12.2123 LearningRate 0.0765 Epoch: 2 Global Step: 104050 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:25:03,730-Speed 2625.03 samples/sec Loss 12.2001 LearningRate 0.0765 Epoch: 2 Global Step: 104060 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:25:07,625-Speed 2630.23 samples/sec Loss 12.3199 LearningRate 0.0765 Epoch: 2 Global Step: 104070 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:25:11,616-Speed 2566.57 samples/sec Loss 12.2402 LearningRate 0.0765 Epoch: 2 Global Step: 104080 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:25:15,673-Speed 2524.23 samples/sec Loss 12.0971 LearningRate 0.0765 Epoch: 2 Global Step: 104090 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:25:19,746-Speed 2514.73 samples/sec Loss 12.1841 LearningRate 0.0765 Epoch: 2 Global Step: 104100 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:25:23,816-Speed 2516.68 samples/sec Loss 12.1156 LearningRate 0.0765 Epoch: 2 Global Step: 104110 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:25:27,897-Speed 2510.07 samples/sec Loss 12.1042 LearningRate 0.0765 Epoch: 2 Global Step: 104120 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:25:31,938-Speed 2534.42 samples/sec Loss 12.1305 LearningRate 0.0765 Epoch: 2 Global Step: 104130 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:25:35,834-Speed 2629.21 samples/sec Loss 12.1200 LearningRate 0.0765 Epoch: 2 Global Step: 104140 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:25:39,730-Speed 2628.81 samples/sec Loss 12.1821 LearningRate 0.0765 Epoch: 2 Global Step: 104150 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:25:43,631-Speed 2625.77 samples/sec Loss 12.2137 LearningRate 0.0765 Epoch: 2 Global Step: 104160 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:25:47,527-Speed 2628.82 samples/sec Loss 12.2472 LearningRate 0.0765 Epoch: 2 Global Step: 104170 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:25:51,421-Speed 2630.08 samples/sec Loss 12.2565 LearningRate 0.0765 Epoch: 2 Global Step: 104180 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:25:55,289-Speed 2649.11 samples/sec Loss 12.1614 LearningRate 0.0765 Epoch: 2 Global Step: 104190 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:25:59,182-Speed 2630.82 samples/sec Loss 12.3095 LearningRate 0.0765 Epoch: 2 Global Step: 104200 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:26:03,078-Speed 2628.89 samples/sec Loss 12.1699 LearningRate 0.0765 Epoch: 2 Global Step: 104210 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:26:06,987-Speed 2620.05 samples/sec Loss 12.2803 LearningRate 0.0765 Epoch: 2 Global Step: 104220 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:26:10,894-Speed 2622.17 samples/sec Loss 12.0826 LearningRate 0.0765 Epoch: 2 Global Step: 104230 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:26:14,792-Speed 2627.59 samples/sec Loss 12.0162 LearningRate 0.0764 Epoch: 2 Global Step: 104240 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:26:18,685-Speed 2630.88 samples/sec Loss 12.2482 LearningRate 0.0764 Epoch: 2 Global Step: 104250 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:26:22,582-Speed 2628.28 samples/sec Loss 12.0227 LearningRate 0.0764 Epoch: 2 Global Step: 104260 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:26:26,477-Speed 2630.04 samples/sec Loss 12.1014 LearningRate 0.0764 Epoch: 2 Global Step: 104270 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:26:30,397-Speed 2612.43 samples/sec Loss 12.1500 LearningRate 0.0764 Epoch: 2 Global Step: 104280 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:26:34,291-Speed 2630.57 samples/sec Loss 12.0578 LearningRate 0.0764 Epoch: 2 Global Step: 104290 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:26:38,193-Speed 2625.30 samples/sec Loss 12.1094 LearningRate 0.0764 Epoch: 2 Global Step: 104300 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:26:42,091-Speed 2627.34 samples/sec Loss 12.1258 LearningRate 0.0764 Epoch: 2 Global Step: 104310 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:26:45,988-Speed 2628.26 samples/sec Loss 12.1136 LearningRate 0.0764 Epoch: 2 Global Step: 104320 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:26:49,965-Speed 2575.25 samples/sec Loss 12.2484 LearningRate 0.0764 Epoch: 2 Global Step: 104330 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:26:53,874-Speed 2620.77 samples/sec Loss 12.3049 LearningRate 0.0764 Epoch: 2 Global Step: 104340 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:26:57,770-Speed 2629.15 samples/sec Loss 12.2111 LearningRate 0.0764 Epoch: 2 Global Step: 104350 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:27:01,663-Speed 2630.58 samples/sec Loss 12.0884 LearningRate 0.0764 Epoch: 2 Global Step: 104360 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:27:05,539-Speed 2642.82 samples/sec Loss 12.3076 LearningRate 0.0764 Epoch: 2 Global Step: 104370 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:27:09,432-Speed 2630.60 samples/sec Loss 12.1819 LearningRate 0.0764 Epoch: 2 Global Step: 104380 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:27:13,326-Speed 2630.84 samples/sec Loss 12.0106 LearningRate 0.0764 Epoch: 2 Global Step: 104390 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:27:17,218-Speed 2631.29 samples/sec Loss 12.2035 LearningRate 0.0764 Epoch: 2 Global Step: 104400 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:27:21,107-Speed 2633.77 samples/sec Loss 12.1374 LearningRate 0.0764 Epoch: 2 Global Step: 104410 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:27:25,012-Speed 2622.87 samples/sec Loss 12.2912 LearningRate 0.0764 Epoch: 2 Global Step: 104420 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:27:28,913-Speed 2626.02 samples/sec Loss 12.1247 LearningRate 0.0764 Epoch: 2 Global Step: 104430 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:27:32,807-Speed 2630.51 samples/sec Loss 12.1036 LearningRate 0.0764 Epoch: 2 Global Step: 104440 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:27:36,705-Speed 2627.30 samples/sec Loss 12.3275 LearningRate 0.0764 Epoch: 2 Global Step: 104450 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:27:40,602-Speed 2627.92 samples/sec Loss 12.2157 LearningRate 0.0764 Epoch: 2 Global Step: 104460 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:27:44,503-Speed 2625.66 samples/sec Loss 12.1639 LearningRate 0.0764 Epoch: 2 Global Step: 104470 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:27:48,411-Speed 2621.11 samples/sec Loss 12.0892 LearningRate 0.0764 Epoch: 2 Global Step: 104480 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:27:52,308-Speed 2628.03 samples/sec Loss 12.1159 LearningRate 0.0764 Epoch: 2 Global Step: 104490 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:27:56,204-Speed 2628.80 samples/sec Loss 12.1715 LearningRate 0.0764 Epoch: 2 Global Step: 104500 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:28:00,102-Speed 2627.82 samples/sec Loss 12.1546 LearningRate 0.0764 Epoch: 2 Global Step: 104510 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:28:03,995-Speed 2631.20 samples/sec Loss 12.1585 LearningRate 0.0764 Epoch: 2 Global Step: 104520 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:28:07,887-Speed 2631.25 samples/sec Loss 12.0926 LearningRate 0.0764 Epoch: 2 Global Step: 104530 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:28:11,782-Speed 2629.91 samples/sec Loss 12.1943 LearningRate 0.0764 Epoch: 2 Global Step: 104540 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:28:15,676-Speed 2630.00 samples/sec Loss 12.2820 LearningRate 0.0764 Epoch: 2 Global Step: 104550 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:28:19,578-Speed 2624.66 samples/sec Loss 12.2065 LearningRate 0.0764 Epoch: 2 Global Step: 104560 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:28:23,508-Speed 2606.50 samples/sec Loss 12.1263 LearningRate 0.0764 Epoch: 2 Global Step: 104570 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:28:27,407-Speed 2627.29 samples/sec Loss 12.1626 LearningRate 0.0764 Epoch: 2 Global Step: 104580 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:28:31,313-Speed 2622.46 samples/sec Loss 12.1979 LearningRate 0.0764 Epoch: 2 Global Step: 104590 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:28:35,198-Speed 2635.92 samples/sec Loss 12.0360 LearningRate 0.0764 Epoch: 2 Global Step: 104600 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:28:39,095-Speed 2629.25 samples/sec Loss 12.2204 LearningRate 0.0764 Epoch: 2 Global Step: 104610 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:28:43,202-Speed 2493.25 samples/sec Loss 12.1786 LearningRate 0.0764 Epoch: 2 Global Step: 104620 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:28:47,157-Speed 2589.76 samples/sec Loss 12.1608 LearningRate 0.0764 Epoch: 2 Global Step: 104630 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:28:51,067-Speed 2619.80 samples/sec Loss 12.2400 LearningRate 0.0764 Epoch: 2 Global Step: 104640 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:28:54,965-Speed 2627.28 samples/sec Loss 12.2995 LearningRate 0.0764 Epoch: 2 Global Step: 104650 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:28:58,860-Speed 2630.15 samples/sec Loss 12.2631 LearningRate 0.0764 Epoch: 2 Global Step: 104660 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:29:02,763-Speed 2623.66 samples/sec Loss 12.2951 LearningRate 0.0764 Epoch: 2 Global Step: 104670 Fp16 Grad Scale: 524288 Required: 81 hours
Training: 2022-04-13 07:29:06,640-Speed 2642.28 samples/sec Loss 12.1110 LearningRate 0.0764 Epoch: 2 Global Step: 104680 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:29:10,541-Speed 2626.05 samples/sec Loss 12.1165 LearningRate 0.0764 Epoch: 2 Global Step: 104690 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:29:14,435-Speed 2629.67 samples/sec Loss 12.0330 LearningRate 0.0764 Epoch: 2 Global Step: 104700 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:29:18,328-Speed 2631.34 samples/sec Loss 12.3109 LearningRate 0.0763 Epoch: 2 Global Step: 104710 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:29:22,231-Speed 2624.82 samples/sec Loss 12.1562 LearningRate 0.0763 Epoch: 2 Global Step: 104720 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:29:26,130-Speed 2627.08 samples/sec Loss 12.1233 LearningRate 0.0763 Epoch: 2 Global Step: 104730 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:29:30,028-Speed 2627.89 samples/sec Loss 12.2826 LearningRate 0.0763 Epoch: 2 Global Step: 104740 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:29:33,933-Speed 2622.66 samples/sec Loss 12.3044 LearningRate 0.0763 Epoch: 2 Global Step: 104750 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:29:37,807-Speed 2644.06 samples/sec Loss 12.3629 LearningRate 0.0763 Epoch: 2 Global Step: 104760 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:29:41,699-Speed 2631.81 samples/sec Loss 12.2540 LearningRate 0.0763 Epoch: 2 Global Step: 104770 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:29:45,590-Speed 2632.15 samples/sec Loss 12.1929 LearningRate 0.0763 Epoch: 2 Global Step: 104780 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:29:49,485-Speed 2629.83 samples/sec Loss 12.0655 LearningRate 0.0763 Epoch: 2 Global Step: 104790 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:29:53,410-Speed 2610.11 samples/sec Loss 12.1439 LearningRate 0.0763 Epoch: 2 Global Step: 104800 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:29:57,314-Speed 2623.27 samples/sec Loss 12.0365 LearningRate 0.0763 Epoch: 2 Global Step: 104810 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:30:01,226-Speed 2618.66 samples/sec Loss 12.2332 LearningRate 0.0763 Epoch: 2 Global Step: 104820 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:30:05,124-Speed 2628.25 samples/sec Loss 12.3663 LearningRate 0.0763 Epoch: 2 Global Step: 104830 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:30:09,016-Speed 2631.49 samples/sec Loss 12.0917 LearningRate 0.0763 Epoch: 2 Global Step: 104840 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:30:12,915-Speed 2626.75 samples/sec Loss 12.1779 LearningRate 0.0763 Epoch: 2 Global Step: 104850 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:30:16,821-Speed 2622.56 samples/sec Loss 12.2960 LearningRate 0.0763 Epoch: 2 Global Step: 104860 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:30:20,713-Speed 2631.33 samples/sec Loss 12.1904 LearningRate 0.0763 Epoch: 2 Global Step: 104870 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:30:24,613-Speed 2626.89 samples/sec Loss 12.3483 LearningRate 0.0763 Epoch: 2 Global Step: 104880 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:30:28,506-Speed 2631.19 samples/sec Loss 12.3147 LearningRate 0.0763 Epoch: 2 Global Step: 104890 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:30:32,398-Speed 2631.73 samples/sec Loss 12.0905 LearningRate 0.0763 Epoch: 2 Global Step: 104900 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:30:36,293-Speed 2629.43 samples/sec Loss 12.1552 LearningRate 0.0763 Epoch: 2 Global Step: 104910 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:30:40,186-Speed 2630.45 samples/sec Loss 12.1943 LearningRate 0.0763 Epoch: 2 Global Step: 104920 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:30:44,090-Speed 2623.95 samples/sec Loss 12.2414 LearningRate 0.0763 Epoch: 2 Global Step: 104930 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:30:47,975-Speed 2636.28 samples/sec Loss 12.2108 LearningRate 0.0763 Epoch: 2 Global Step: 104940 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:30:51,872-Speed 2628.88 samples/sec Loss 12.1823 LearningRate 0.0763 Epoch: 2 Global Step: 104950 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:30:55,764-Speed 2631.58 samples/sec Loss 12.0669 LearningRate 0.0763 Epoch: 2 Global Step: 104960 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:30:59,662-Speed 2627.49 samples/sec Loss 12.2513 LearningRate 0.0763 Epoch: 2 Global Step: 104970 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:31:03,562-Speed 2626.26 samples/sec Loss 12.2454 LearningRate 0.0763 Epoch: 2 Global Step: 104980 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:31:07,461-Speed 2626.63 samples/sec Loss 12.2952 LearningRate 0.0763 Epoch: 2 Global Step: 104990 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:31:11,368-Speed 2622.35 samples/sec Loss 12.1808 LearningRate 0.0763 Epoch: 2 Global Step: 105000 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:31:15,264-Speed 2628.74 samples/sec Loss 12.2372 LearningRate 0.0763 Epoch: 2 Global Step: 105010 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:31:19,184-Speed 2612.93 samples/sec Loss 12.0052 LearningRate 0.0763 Epoch: 2 Global Step: 105020 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:31:23,082-Speed 2628.00 samples/sec Loss 12.1660 LearningRate 0.0763 Epoch: 2 Global Step: 105030 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:31:26,977-Speed 2630.00 samples/sec Loss 12.0542 LearningRate 0.0763 Epoch: 2 Global Step: 105040 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:31:30,890-Speed 2617.72 samples/sec Loss 12.1448 LearningRate 0.0763 Epoch: 2 Global Step: 105050 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:31:34,783-Speed 2630.69 samples/sec Loss 11.9839 LearningRate 0.0763 Epoch: 2 Global Step: 105060 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:31:38,680-Speed 2627.99 samples/sec Loss 12.1629 LearningRate 0.0763 Epoch: 2 Global Step: 105070 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:31:42,574-Speed 2630.65 samples/sec Loss 12.1680 LearningRate 0.0763 Epoch: 2 Global Step: 105080 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:31:46,467-Speed 2631.16 samples/sec Loss 12.2777 LearningRate 0.0763 Epoch: 2 Global Step: 105090 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:31:50,371-Speed 2623.92 samples/sec Loss 12.1626 LearningRate 0.0763 Epoch: 2 Global Step: 105100 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:31:54,260-Speed 2633.54 samples/sec Loss 12.2306 LearningRate 0.0763 Epoch: 2 Global Step: 105110 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:31:58,150-Speed 2632.85 samples/sec Loss 12.1406 LearningRate 0.0763 Epoch: 2 Global Step: 105120 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:32:02,042-Speed 2631.91 samples/sec Loss 12.2782 LearningRate 0.0763 Epoch: 2 Global Step: 105130 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:32:05,932-Speed 2632.45 samples/sec Loss 12.2338 LearningRate 0.0763 Epoch: 2 Global Step: 105140 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:32:09,830-Speed 2627.75 samples/sec Loss 12.2786 LearningRate 0.0763 Epoch: 2 Global Step: 105150 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:32:13,729-Speed 2626.63 samples/sec Loss 12.2385 LearningRate 0.0763 Epoch: 2 Global Step: 105160 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:32:17,623-Speed 2630.71 samples/sec Loss 12.2826 LearningRate 0.0763 Epoch: 2 Global Step: 105170 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:32:21,516-Speed 2631.55 samples/sec Loss 12.0359 LearningRate 0.0763 Epoch: 2 Global Step: 105180 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:32:25,415-Speed 2626.57 samples/sec Loss 12.1738 LearningRate 0.0762 Epoch: 2 Global Step: 105190 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:32:29,310-Speed 2629.81 samples/sec Loss 12.1550 LearningRate 0.0762 Epoch: 2 Global Step: 105200 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:32:33,211-Speed 2625.55 samples/sec Loss 12.1621 LearningRate 0.0762 Epoch: 2 Global Step: 105210 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:32:37,107-Speed 2628.50 samples/sec Loss 12.2652 LearningRate 0.0762 Epoch: 2 Global Step: 105220 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:32:40,999-Speed 2632.00 samples/sec Loss 12.0388 LearningRate 0.0762 Epoch: 2 Global Step: 105230 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:32:44,894-Speed 2629.74 samples/sec Loss 12.1801 LearningRate 0.0762 Epoch: 2 Global Step: 105240 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:32:48,787-Speed 2631.11 samples/sec Loss 12.1633 LearningRate 0.0762 Epoch: 2 Global Step: 105250 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:32:52,683-Speed 2628.78 samples/sec Loss 12.2276 LearningRate 0.0762 Epoch: 2 Global Step: 105260 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:32:56,580-Speed 2628.98 samples/sec Loss 12.0434 LearningRate 0.0762 Epoch: 2 Global Step: 105270 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:33:00,448-Speed 2647.92 samples/sec Loss 11.9990 LearningRate 0.0762 Epoch: 2 Global Step: 105280 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:33:04,340-Speed 2631.39 samples/sec Loss 11.9901 LearningRate 0.0762 Epoch: 2 Global Step: 105290 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:33:08,233-Speed 2631.18 samples/sec Loss 12.2099 LearningRate 0.0762 Epoch: 2 Global Step: 105300 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:33:12,123-Speed 2633.19 samples/sec Loss 12.0475 LearningRate 0.0762 Epoch: 2 Global Step: 105310 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:33:16,015-Speed 2631.90 samples/sec Loss 12.0915 LearningRate 0.0762 Epoch: 2 Global Step: 105320 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:33:19,911-Speed 2629.12 samples/sec Loss 12.1920 LearningRate 0.0762 Epoch: 2 Global Step: 105330 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:33:23,850-Speed 2600.23 samples/sec Loss 12.2253 LearningRate 0.0762 Epoch: 2 Global Step: 105340 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:33:27,744-Speed 2631.00 samples/sec Loss 12.2892 LearningRate 0.0762 Epoch: 2 Global Step: 105350 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:33:31,658-Speed 2616.54 samples/sec Loss 12.3795 LearningRate 0.0762 Epoch: 2 Global Step: 105360 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:33:35,572-Speed 2616.73 samples/sec Loss 12.1746 LearningRate 0.0762 Epoch: 2 Global Step: 105370 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:33:39,489-Speed 2614.25 samples/sec Loss 12.1443 LearningRate 0.0762 Epoch: 2 Global Step: 105380 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:33:43,401-Speed 2619.08 samples/sec Loss 11.9622 LearningRate 0.0762 Epoch: 2 Global Step: 105390 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:33:47,320-Speed 2613.10 samples/sec Loss 12.3554 LearningRate 0.0762 Epoch: 2 Global Step: 105400 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:33:51,219-Speed 2627.44 samples/sec Loss 12.2863 LearningRate 0.0762 Epoch: 2 Global Step: 105410 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:33:55,119-Speed 2626.02 samples/sec Loss 12.1485 LearningRate 0.0762 Epoch: 2 Global Step: 105420 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:33:59,028-Speed 2620.85 samples/sec Loss 12.3475 LearningRate 0.0762 Epoch: 2 Global Step: 105430 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:34:02,941-Speed 2617.07 samples/sec Loss 12.2081 LearningRate 0.0762 Epoch: 2 Global Step: 105440 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:34:06,895-Speed 2590.25 samples/sec Loss 12.2380 LearningRate 0.0762 Epoch: 2 Global Step: 105450 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:34:10,881-Speed 2570.00 samples/sec Loss 12.3191 LearningRate 0.0762 Epoch: 2 Global Step: 105460 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:34:14,773-Speed 2631.87 samples/sec Loss 12.2630 LearningRate 0.0762 Epoch: 2 Global Step: 105470 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:34:18,662-Speed 2634.30 samples/sec Loss 12.3122 LearningRate 0.0762 Epoch: 2 Global Step: 105480 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:34:22,562-Speed 2626.07 samples/sec Loss 12.6681 LearningRate 0.0762 Epoch: 2 Global Step: 105490 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:34:26,463-Speed 2625.68 samples/sec Loss 12.3751 LearningRate 0.0762 Epoch: 2 Global Step: 105500 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:34:30,357-Speed 2630.44 samples/sec Loss 12.2226 LearningRate 0.0762 Epoch: 2 Global Step: 105510 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:34:34,274-Speed 2615.33 samples/sec Loss 12.2084 LearningRate 0.0762 Epoch: 2 Global Step: 105520 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:34:38,243-Speed 2581.13 samples/sec Loss 12.2194 LearningRate 0.0762 Epoch: 2 Global Step: 105530 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:34:42,145-Speed 2624.97 samples/sec Loss 12.3660 LearningRate 0.0762 Epoch: 2 Global Step: 105540 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:34:46,036-Speed 2632.42 samples/sec Loss 12.1954 LearningRate 0.0762 Epoch: 2 Global Step: 105550 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:34:49,989-Speed 2590.74 samples/sec Loss 12.0879 LearningRate 0.0762 Epoch: 2 Global Step: 105560 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:34:53,890-Speed 2625.82 samples/sec Loss 12.1549 LearningRate 0.0762 Epoch: 2 Global Step: 105570 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:34:57,780-Speed 2632.78 samples/sec Loss 12.2021 LearningRate 0.0762 Epoch: 2 Global Step: 105580 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:35:01,673-Speed 2631.17 samples/sec Loss 12.1441 LearningRate 0.0762 Epoch: 2 Global Step: 105590 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:35:05,569-Speed 2629.11 samples/sec Loss 12.1587 LearningRate 0.0762 Epoch: 2 Global Step: 105600 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:35:09,474-Speed 2623.19 samples/sec Loss 12.2875 LearningRate 0.0762 Epoch: 2 Global Step: 105610 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:35:13,376-Speed 2625.07 samples/sec Loss 12.2957 LearningRate 0.0762 Epoch: 2 Global Step: 105620 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:35:17,274-Speed 2627.26 samples/sec Loss 12.2303 LearningRate 0.0762 Epoch: 2 Global Step: 105630 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:35:21,177-Speed 2624.44 samples/sec Loss 12.2017 LearningRate 0.0762 Epoch: 2 Global Step: 105640 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:35:25,087-Speed 2619.36 samples/sec Loss 12.0352 LearningRate 0.0762 Epoch: 2 Global Step: 105650 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:35:28,993-Speed 2622.31 samples/sec Loss 12.1423 LearningRate 0.0761 Epoch: 2 Global Step: 105660 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:35:32,894-Speed 2625.35 samples/sec Loss 12.2646 LearningRate 0.0761 Epoch: 2 Global Step: 105670 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:35:36,800-Speed 2623.14 samples/sec Loss 12.1476 LearningRate 0.0761 Epoch: 2 Global Step: 105680 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:35:40,697-Speed 2628.33 samples/sec Loss 12.0637 LearningRate 0.0761 Epoch: 2 Global Step: 105690 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:35:44,598-Speed 2625.35 samples/sec Loss 12.1976 LearningRate 0.0761 Epoch: 2 Global Step: 105700 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:35:48,499-Speed 2625.35 samples/sec Loss 12.2499 LearningRate 0.0761 Epoch: 2 Global Step: 105710 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:35:52,381-Speed 2638.82 samples/sec Loss 12.2408 LearningRate 0.0761 Epoch: 2 Global Step: 105720 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:35:56,279-Speed 2627.46 samples/sec Loss 12.2115 LearningRate 0.0761 Epoch: 2 Global Step: 105730 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:36:00,172-Speed 2631.24 samples/sec Loss 12.2932 LearningRate 0.0761 Epoch: 2 Global Step: 105740 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:36:04,079-Speed 2621.25 samples/sec Loss 12.1140 LearningRate 0.0761 Epoch: 2 Global Step: 105750 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:36:07,974-Speed 2630.25 samples/sec Loss 12.0954 LearningRate 0.0761 Epoch: 2 Global Step: 105760 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:36:11,875-Speed 2625.43 samples/sec Loss 12.2772 LearningRate 0.0761 Epoch: 2 Global Step: 105770 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:36:15,850-Speed 2577.10 samples/sec Loss 12.1174 LearningRate 0.0761 Epoch: 2 Global Step: 105780 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:36:19,920-Speed 2516.04 samples/sec Loss 12.2266 LearningRate 0.0761 Epoch: 2 Global Step: 105790 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:36:24,001-Speed 2509.65 samples/sec Loss 12.2279 LearningRate 0.0761 Epoch: 2 Global Step: 105800 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:36:28,045-Speed 2532.90 samples/sec Loss 12.2610 LearningRate 0.0761 Epoch: 2 Global Step: 105810 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:36:31,937-Speed 2631.48 samples/sec Loss 12.0572 LearningRate 0.0761 Epoch: 2 Global Step: 105820 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:36:35,845-Speed 2620.97 samples/sec Loss 12.1448 LearningRate 0.0761 Epoch: 2 Global Step: 105830 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:36:39,747-Speed 2625.10 samples/sec Loss 12.0173 LearningRate 0.0761 Epoch: 2 Global Step: 105840 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:36:43,649-Speed 2625.03 samples/sec Loss 12.1755 LearningRate 0.0761 Epoch: 2 Global Step: 105850 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:36:47,718-Speed 2516.94 samples/sec Loss 12.1783 LearningRate 0.0761 Epoch: 2 Global Step: 105860 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:36:51,739-Speed 2547.37 samples/sec Loss 12.0636 LearningRate 0.0761 Epoch: 2 Global Step: 105870 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:36:55,635-Speed 2629.42 samples/sec Loss 12.1300 LearningRate 0.0761 Epoch: 2 Global Step: 105880 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:36:59,529-Speed 2630.14 samples/sec Loss 12.0425 LearningRate 0.0761 Epoch: 2 Global Step: 105890 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:37:03,431-Speed 2624.48 samples/sec Loss 12.2914 LearningRate 0.0761 Epoch: 2 Global Step: 105900 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:37:07,325-Speed 2630.52 samples/sec Loss 12.1333 LearningRate 0.0761 Epoch: 2 Global Step: 105910 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:37:11,226-Speed 2625.64 samples/sec Loss 12.0797 LearningRate 0.0761 Epoch: 2 Global Step: 105920 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:37:15,124-Speed 2627.82 samples/sec Loss 12.1944 LearningRate 0.0761 Epoch: 2 Global Step: 105930 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:37:19,017-Speed 2631.87 samples/sec Loss 12.2819 LearningRate 0.0761 Epoch: 2 Global Step: 105940 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:37:22,918-Speed 2625.61 samples/sec Loss 12.1094 LearningRate 0.0761 Epoch: 2 Global Step: 105950 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:37:26,828-Speed 2619.73 samples/sec Loss 12.1186 LearningRate 0.0761 Epoch: 2 Global Step: 105960 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:37:30,714-Speed 2635.60 samples/sec Loss 12.2050 LearningRate 0.0761 Epoch: 2 Global Step: 105970 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:37:34,587-Speed 2644.86 samples/sec Loss 12.2162 LearningRate 0.0761 Epoch: 2 Global Step: 105980 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:37:38,479-Speed 2631.30 samples/sec Loss 12.1719 LearningRate 0.0761 Epoch: 2 Global Step: 105990 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:37:42,379-Speed 2626.75 samples/sec Loss 12.1747 LearningRate 0.0761 Epoch: 2 Global Step: 106000 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:37:46,272-Speed 2631.19 samples/sec Loss 12.2167 LearningRate 0.0761 Epoch: 2 Global Step: 106010 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:37:50,174-Speed 2624.87 samples/sec Loss 12.1952 LearningRate 0.0761 Epoch: 2 Global Step: 106020 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:37:54,066-Speed 2631.50 samples/sec Loss 12.2484 LearningRate 0.0761 Epoch: 2 Global Step: 106030 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:37:57,954-Speed 2634.67 samples/sec Loss 12.0159 LearningRate 0.0761 Epoch: 2 Global Step: 106040 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:38:01,848-Speed 2630.33 samples/sec Loss 12.0201 LearningRate 0.0761 Epoch: 2 Global Step: 106050 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:38:05,742-Speed 2630.78 samples/sec Loss 12.1652 LearningRate 0.0761 Epoch: 2 Global Step: 106060 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:38:09,637-Speed 2628.99 samples/sec Loss 12.1575 LearningRate 0.0761 Epoch: 2 Global Step: 106070 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:38:13,537-Speed 2626.51 samples/sec Loss 12.1539 LearningRate 0.0761 Epoch: 2 Global Step: 106080 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:38:17,430-Speed 2631.08 samples/sec Loss 12.1061 LearningRate 0.0761 Epoch: 2 Global Step: 106090 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:38:21,323-Speed 2631.34 samples/sec Loss 12.1317 LearningRate 0.0761 Epoch: 2 Global Step: 106100 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:38:25,215-Speed 2631.93 samples/sec Loss 11.9791 LearningRate 0.0761 Epoch: 2 Global Step: 106110 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:38:29,119-Speed 2623.51 samples/sec Loss 12.0179 LearningRate 0.0761 Epoch: 2 Global Step: 106120 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:38:33,015-Speed 2629.60 samples/sec Loss 12.0772 LearningRate 0.0761 Epoch: 2 Global Step: 106130 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:38:36,907-Speed 2631.54 samples/sec Loss 12.1443 LearningRate 0.0760 Epoch: 2 Global Step: 106140 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:38:40,798-Speed 2632.02 samples/sec Loss 12.2644 LearningRate 0.0760 Epoch: 2 Global Step: 106150 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:38:44,742-Speed 2597.04 samples/sec Loss 12.1205 LearningRate 0.0760 Epoch: 2 Global Step: 106160 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:38:48,636-Speed 2630.68 samples/sec Loss 12.1859 LearningRate 0.0760 Epoch: 2 Global Step: 106170 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:38:52,528-Speed 2631.32 samples/sec Loss 12.1485 LearningRate 0.0760 Epoch: 2 Global Step: 106180 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:38:56,476-Speed 2594.46 samples/sec Loss 12.0924 LearningRate 0.0760 Epoch: 2 Global Step: 106190 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:39:00,475-Speed 2561.40 samples/sec Loss 12.1310 LearningRate 0.0760 Epoch: 2 Global Step: 106200 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:39:04,394-Speed 2613.53 samples/sec Loss 12.1635 LearningRate 0.0760 Epoch: 2 Global Step: 106210 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:39:08,290-Speed 2628.47 samples/sec Loss 12.2093 LearningRate 0.0760 Epoch: 2 Global Step: 106220 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:39:12,192-Speed 2625.50 samples/sec Loss 12.1155 LearningRate 0.0760 Epoch: 2 Global Step: 106230 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:39:16,093-Speed 2624.95 samples/sec Loss 12.2428 LearningRate 0.0760 Epoch: 2 Global Step: 106240 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:39:20,002-Speed 2620.76 samples/sec Loss 12.0527 LearningRate 0.0760 Epoch: 2 Global Step: 106250 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:39:23,919-Speed 2614.65 samples/sec Loss 12.0019 LearningRate 0.0760 Epoch: 2 Global Step: 106260 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:39:27,817-Speed 2627.57 samples/sec Loss 12.0585 LearningRate 0.0760 Epoch: 2 Global Step: 106270 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:39:31,717-Speed 2626.23 samples/sec Loss 12.0607 LearningRate 0.0760 Epoch: 2 Global Step: 106280 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:39:35,623-Speed 2622.24 samples/sec Loss 12.0955 LearningRate 0.0760 Epoch: 2 Global Step: 106290 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:39:39,526-Speed 2624.51 samples/sec Loss 12.2181 LearningRate 0.0760 Epoch: 2 Global Step: 106300 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:39:43,423-Speed 2628.18 samples/sec Loss 12.2101 LearningRate 0.0760 Epoch: 2 Global Step: 106310 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:39:47,307-Speed 2637.43 samples/sec Loss 12.0158 LearningRate 0.0760 Epoch: 2 Global Step: 106320 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:39:51,200-Speed 2630.78 samples/sec Loss 12.0113 LearningRate 0.0760 Epoch: 2 Global Step: 106330 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:39:55,088-Speed 2634.18 samples/sec Loss 12.1309 LearningRate 0.0760 Epoch: 2 Global Step: 106340 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:39:58,981-Speed 2631.31 samples/sec Loss 12.4171 LearningRate 0.0760 Epoch: 2 Global Step: 106350 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:40:02,878-Speed 2628.12 samples/sec Loss 12.2583 LearningRate 0.0760 Epoch: 2 Global Step: 106360 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:40:06,785-Speed 2621.57 samples/sec Loss 12.2547 LearningRate 0.0760 Epoch: 2 Global Step: 106370 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:40:10,679-Speed 2630.36 samples/sec Loss 12.2514 LearningRate 0.0760 Epoch: 2 Global Step: 106380 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:40:14,579-Speed 2626.92 samples/sec Loss 12.0663 LearningRate 0.0760 Epoch: 2 Global Step: 106390 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:40:18,482-Speed 2623.53 samples/sec Loss 12.2094 LearningRate 0.0760 Epoch: 2 Global Step: 106400 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:40:22,384-Speed 2624.77 samples/sec Loss 12.0792 LearningRate 0.0760 Epoch: 2 Global Step: 106410 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:40:26,288-Speed 2623.77 samples/sec Loss 12.2405 LearningRate 0.0760 Epoch: 2 Global Step: 106420 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:40:30,192-Speed 2624.08 samples/sec Loss 11.9958 LearningRate 0.0760 Epoch: 2 Global Step: 106430 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:40:34,094-Speed 2624.84 samples/sec Loss 12.0058 LearningRate 0.0760 Epoch: 2 Global Step: 106440 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:40:37,991-Speed 2628.33 samples/sec Loss 12.1716 LearningRate 0.0760 Epoch: 2 Global Step: 106450 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:40:41,892-Speed 2626.66 samples/sec Loss 12.1279 LearningRate 0.0760 Epoch: 2 Global Step: 106460 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:40:45,806-Speed 2616.57 samples/sec Loss 12.1383 LearningRate 0.0760 Epoch: 2 Global Step: 106470 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:40:49,701-Speed 2629.62 samples/sec Loss 12.0240 LearningRate 0.0760 Epoch: 2 Global Step: 106480 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:40:53,594-Speed 2630.94 samples/sec Loss 12.1409 LearningRate 0.0760 Epoch: 2 Global Step: 106490 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:40:57,500-Speed 2622.46 samples/sec Loss 12.2282 LearningRate 0.0760 Epoch: 2 Global Step: 106500 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:41:01,397-Speed 2627.94 samples/sec Loss 12.1945 LearningRate 0.0760 Epoch: 2 Global Step: 106510 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:41:05,289-Speed 2631.99 samples/sec Loss 12.1384 LearningRate 0.0760 Epoch: 2 Global Step: 106520 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:41:09,183-Speed 2630.37 samples/sec Loss 12.1647 LearningRate 0.0760 Epoch: 2 Global Step: 106530 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:41:13,077-Speed 2630.40 samples/sec Loss 12.2515 LearningRate 0.0760 Epoch: 2 Global Step: 106540 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:41:16,970-Speed 2631.23 samples/sec Loss 12.0462 LearningRate 0.0760 Epoch: 2 Global Step: 106550 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:41:20,859-Speed 2633.14 samples/sec Loss 12.0276 LearningRate 0.0760 Epoch: 2 Global Step: 106560 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:41:24,755-Speed 2628.91 samples/sec Loss 12.0869 LearningRate 0.0760 Epoch: 2 Global Step: 106570 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:41:28,792-Speed 2537.61 samples/sec Loss 12.1080 LearningRate 0.0760 Epoch: 2 Global Step: 106580 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:41:32,724-Speed 2604.44 samples/sec Loss 12.1479 LearningRate 0.0760 Epoch: 2 Global Step: 106590 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:41:36,613-Speed 2633.95 samples/sec Loss 12.2318 LearningRate 0.0760 Epoch: 2 Global Step: 106600 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:41:40,501-Speed 2635.00 samples/sec Loss 12.1395 LearningRate 0.0759 Epoch: 2 Global Step: 106610 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:41:44,394-Speed 2630.91 samples/sec Loss 12.1407 LearningRate 0.0759 Epoch: 2 Global Step: 106620 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:41:48,285-Speed 2632.55 samples/sec Loss 12.0895 LearningRate 0.0759 Epoch: 2 Global Step: 106630 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:41:52,182-Speed 2627.71 samples/sec Loss 12.1133 LearningRate 0.0759 Epoch: 2 Global Step: 106640 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:41:56,080-Speed 2627.61 samples/sec Loss 12.0923 LearningRate 0.0759 Epoch: 2 Global Step: 106650 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:41:59,986-Speed 2622.72 samples/sec Loss 11.9902 LearningRate 0.0759 Epoch: 2 Global Step: 106660 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:42:03,879-Speed 2631.18 samples/sec Loss 12.1205 LearningRate 0.0759 Epoch: 2 Global Step: 106670 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:42:07,772-Speed 2631.15 samples/sec Loss 12.0455 LearningRate 0.0759 Epoch: 2 Global Step: 106680 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:42:11,665-Speed 2631.26 samples/sec Loss 12.2767 LearningRate 0.0759 Epoch: 2 Global Step: 106690 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:42:15,566-Speed 2625.46 samples/sec Loss 12.1606 LearningRate 0.0759 Epoch: 2 Global Step: 106700 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:42:19,509-Speed 2597.40 samples/sec Loss 12.1155 LearningRate 0.0759 Epoch: 2 Global Step: 106710 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:42:23,404-Speed 2629.24 samples/sec Loss 12.2608 LearningRate 0.0759 Epoch: 2 Global Step: 106720 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:42:27,295-Speed 2632.67 samples/sec Loss 12.1381 LearningRate 0.0759 Epoch: 2 Global Step: 106730 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:42:31,201-Speed 2622.08 samples/sec Loss 12.2722 LearningRate 0.0759 Epoch: 2 Global Step: 106740 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:42:35,097-Speed 2629.70 samples/sec Loss 12.1097 LearningRate 0.0759 Epoch: 2 Global Step: 106750 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:42:38,997-Speed 2626.29 samples/sec Loss 12.0831 LearningRate 0.0759 Epoch: 2 Global Step: 106760 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:42:42,876-Speed 2640.03 samples/sec Loss 12.1141 LearningRate 0.0759 Epoch: 2 Global Step: 106770 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:42:46,784-Speed 2620.84 samples/sec Loss 12.0299 LearningRate 0.0759 Epoch: 2 Global Step: 106780 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:42:50,679-Speed 2630.10 samples/sec Loss 12.0378 LearningRate 0.0759 Epoch: 2 Global Step: 106790 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:42:54,617-Speed 2601.25 samples/sec Loss 12.2373 LearningRate 0.0759 Epoch: 2 Global Step: 106800 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:42:58,510-Speed 2630.64 samples/sec Loss 12.1734 LearningRate 0.0759 Epoch: 2 Global Step: 106810 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:43:02,406-Speed 2629.15 samples/sec Loss 12.0204 LearningRate 0.0759 Epoch: 2 Global Step: 106820 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:43:06,302-Speed 2628.70 samples/sec Loss 12.2945 LearningRate 0.0759 Epoch: 2 Global Step: 106830 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:43:10,201-Speed 2627.71 samples/sec Loss 12.0841 LearningRate 0.0759 Epoch: 2 Global Step: 106840 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:43:14,134-Speed 2603.87 samples/sec Loss 12.1454 LearningRate 0.0759 Epoch: 2 Global Step: 106850 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:43:18,148-Speed 2551.56 samples/sec Loss 12.1036 LearningRate 0.0759 Epoch: 2 Global Step: 106860 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:43:22,211-Speed 2521.52 samples/sec Loss 12.1728 LearningRate 0.0759 Epoch: 2 Global Step: 106870 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:43:26,106-Speed 2629.39 samples/sec Loss 12.1347 LearningRate 0.0759 Epoch: 2 Global Step: 106880 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:43:29,997-Speed 2632.78 samples/sec Loss 12.0829 LearningRate 0.0759 Epoch: 2 Global Step: 106890 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:43:33,890-Speed 2630.73 samples/sec Loss 12.1284 LearningRate 0.0759 Epoch: 2 Global Step: 106900 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:43:37,784-Speed 2630.03 samples/sec Loss 12.2723 LearningRate 0.0759 Epoch: 2 Global Step: 106910 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:43:41,682-Speed 2627.44 samples/sec Loss 11.9663 LearningRate 0.0759 Epoch: 2 Global Step: 106920 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:43:45,566-Speed 2637.75 samples/sec Loss 12.2550 LearningRate 0.0759 Epoch: 2 Global Step: 106930 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:43:49,459-Speed 2631.57 samples/sec Loss 12.0306 LearningRate 0.0759 Epoch: 2 Global Step: 106940 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:43:53,372-Speed 2617.51 samples/sec Loss 12.0112 LearningRate 0.0759 Epoch: 2 Global Step: 106950 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:43:57,283-Speed 2619.07 samples/sec Loss 12.0126 LearningRate 0.0759 Epoch: 2 Global Step: 106960 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:44:01,177-Speed 2629.85 samples/sec Loss 12.1024 LearningRate 0.0759 Epoch: 2 Global Step: 106970 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:44:05,071-Speed 2630.32 samples/sec Loss 12.0718 LearningRate 0.0759 Epoch: 2 Global Step: 106980 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:44:08,969-Speed 2627.47 samples/sec Loss 12.2120 LearningRate 0.0759 Epoch: 2 Global Step: 106990 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:44:12,864-Speed 2630.15 samples/sec Loss 12.2502 LearningRate 0.0759 Epoch: 2 Global Step: 107000 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:44:16,757-Speed 2630.82 samples/sec Loss 12.1826 LearningRate 0.0759 Epoch: 2 Global Step: 107010 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:44:20,664-Speed 2621.81 samples/sec Loss 12.2013 LearningRate 0.0759 Epoch: 2 Global Step: 107020 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:44:24,544-Speed 2640.39 samples/sec Loss 12.2019 LearningRate 0.0759 Epoch: 2 Global Step: 107030 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:44:28,442-Speed 2627.55 samples/sec Loss 12.2088 LearningRate 0.0759 Epoch: 2 Global Step: 107040 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:44:32,346-Speed 2623.73 samples/sec Loss 12.1201 LearningRate 0.0759 Epoch: 2 Global Step: 107050 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:44:36,245-Speed 2626.75 samples/sec Loss 12.2045 LearningRate 0.0759 Epoch: 2 Global Step: 107060 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:44:40,145-Speed 2626.54 samples/sec Loss 12.1599 LearningRate 0.0759 Epoch: 2 Global Step: 107070 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:44:44,039-Speed 2630.18 samples/sec Loss 12.0279 LearningRate 0.0759 Epoch: 2 Global Step: 107080 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:44:47,942-Speed 2624.42 samples/sec Loss 12.0992 LearningRate 0.0758 Epoch: 2 Global Step: 107090 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:44:51,837-Speed 2629.71 samples/sec Loss 12.2826 LearningRate 0.0758 Epoch: 2 Global Step: 107100 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:44:55,740-Speed 2624.26 samples/sec Loss 12.0651 LearningRate 0.0758 Epoch: 2 Global Step: 107110 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:44:59,638-Speed 2628.01 samples/sec Loss 12.0807 LearningRate 0.0758 Epoch: 2 Global Step: 107120 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:45:03,543-Speed 2622.53 samples/sec Loss 11.9453 LearningRate 0.0758 Epoch: 2 Global Step: 107130 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:45:07,437-Speed 2630.19 samples/sec Loss 12.1075 LearningRate 0.0758 Epoch: 2 Global Step: 107140 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:45:11,345-Speed 2620.59 samples/sec Loss 12.0284 LearningRate 0.0758 Epoch: 2 Global Step: 107150 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:45:15,252-Speed 2621.84 samples/sec Loss 12.1348 LearningRate 0.0758 Epoch: 2 Global Step: 107160 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:45:19,159-Speed 2621.61 samples/sec Loss 12.1620 LearningRate 0.0758 Epoch: 2 Global Step: 107170 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:45:23,058-Speed 2627.29 samples/sec Loss 12.2358 LearningRate 0.0758 Epoch: 2 Global Step: 107180 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:45:26,958-Speed 2626.71 samples/sec Loss 12.1587 LearningRate 0.0758 Epoch: 2 Global Step: 107190 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:45:30,865-Speed 2621.44 samples/sec Loss 12.2354 LearningRate 0.0758 Epoch: 2 Global Step: 107200 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:45:34,793-Speed 2607.67 samples/sec Loss 12.0734 LearningRate 0.0758 Epoch: 2 Global Step: 107210 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:45:38,689-Speed 2628.79 samples/sec Loss 12.0510 LearningRate 0.0758 Epoch: 2 Global Step: 107220 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:45:42,618-Speed 2606.82 samples/sec Loss 12.1518 LearningRate 0.0758 Epoch: 2 Global Step: 107230 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:45:46,517-Speed 2627.40 samples/sec Loss 12.2477 LearningRate 0.0758 Epoch: 2 Global Step: 107240 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:45:50,419-Speed 2624.94 samples/sec Loss 12.0920 LearningRate 0.0758 Epoch: 2 Global Step: 107250 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:45:54,328-Speed 2620.53 samples/sec Loss 12.2334 LearningRate 0.0758 Epoch: 2 Global Step: 107260 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:45:58,226-Speed 2627.55 samples/sec Loss 12.1161 LearningRate 0.0758 Epoch: 2 Global Step: 107270 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:46:02,130-Speed 2623.21 samples/sec Loss 12.1774 LearningRate 0.0758 Epoch: 2 Global Step: 107280 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:46:06,053-Speed 2611.04 samples/sec Loss 12.0556 LearningRate 0.0758 Epoch: 2 Global Step: 107290 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:46:09,950-Speed 2628.57 samples/sec Loss 12.0803 LearningRate 0.0758 Epoch: 2 Global Step: 107300 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:46:13,851-Speed 2626.06 samples/sec Loss 12.1002 LearningRate 0.0758 Epoch: 2 Global Step: 107310 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:46:17,728-Speed 2641.62 samples/sec Loss 12.0647 LearningRate 0.0758 Epoch: 2 Global Step: 107320 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:46:21,626-Speed 2627.66 samples/sec Loss 12.0520 LearningRate 0.0758 Epoch: 2 Global Step: 107330 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:46:25,525-Speed 2627.06 samples/sec Loss 12.0692 LearningRate 0.0758 Epoch: 2 Global Step: 107340 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:46:29,419-Speed 2630.97 samples/sec Loss 12.1605 LearningRate 0.0758 Epoch: 2 Global Step: 107350 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:46:33,316-Speed 2628.15 samples/sec Loss 11.9945 LearningRate 0.0758 Epoch: 2 Global Step: 107360 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:46:37,228-Speed 2618.73 samples/sec Loss 12.0780 LearningRate 0.0758 Epoch: 2 Global Step: 107370 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:46:41,126-Speed 2627.43 samples/sec Loss 12.1782 LearningRate 0.0758 Epoch: 2 Global Step: 107380 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:46:45,034-Speed 2621.10 samples/sec Loss 12.0654 LearningRate 0.0758 Epoch: 2 Global Step: 107390 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:46:48,943-Speed 2620.38 samples/sec Loss 12.0674 LearningRate 0.0758 Epoch: 2 Global Step: 107400 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:46:52,834-Speed 2631.87 samples/sec Loss 12.1024 LearningRate 0.0758 Epoch: 2 Global Step: 107410 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:46:56,714-Speed 2640.12 samples/sec Loss 12.0788 LearningRate 0.0758 Epoch: 2 Global Step: 107420 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:47:00,638-Speed 2609.93 samples/sec Loss 11.9677 LearningRate 0.0758 Epoch: 2 Global Step: 107430 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:47:04,535-Speed 2628.49 samples/sec Loss 12.1818 LearningRate 0.0758 Epoch: 2 Global Step: 107440 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:47:08,469-Speed 2603.76 samples/sec Loss 12.1219 LearningRate 0.0758 Epoch: 2 Global Step: 107450 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:47:12,363-Speed 2630.92 samples/sec Loss 12.0534 LearningRate 0.0758 Epoch: 2 Global Step: 107460 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:47:16,285-Speed 2610.87 samples/sec Loss 12.3211 LearningRate 0.0758 Epoch: 2 Global Step: 107470 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:47:20,177-Speed 2632.09 samples/sec Loss 12.2885 LearningRate 0.0758 Epoch: 2 Global Step: 107480 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:47:24,054-Speed 2641.71 samples/sec Loss 12.2851 LearningRate 0.0758 Epoch: 2 Global Step: 107490 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:47:27,944-Speed 2633.50 samples/sec Loss 12.1051 LearningRate 0.0758 Epoch: 2 Global Step: 107500 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:47:31,846-Speed 2624.57 samples/sec Loss 11.9399 LearningRate 0.0758 Epoch: 2 Global Step: 107510 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:47:35,745-Speed 2627.09 samples/sec Loss 11.8788 LearningRate 0.0758 Epoch: 2 Global Step: 107520 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:47:39,641-Speed 2629.38 samples/sec Loss 11.9618 LearningRate 0.0758 Epoch: 2 Global Step: 107530 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:47:43,532-Speed 2632.46 samples/sec Loss 12.1550 LearningRate 0.0758 Epoch: 2 Global Step: 107540 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:47:47,449-Speed 2614.75 samples/sec Loss 12.1620 LearningRate 0.0758 Epoch: 2 Global Step: 107550 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:47:51,351-Speed 2624.71 samples/sec Loss 11.9934 LearningRate 0.0757 Epoch: 2 Global Step: 107560 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:47:55,246-Speed 2630.07 samples/sec Loss 12.0206 LearningRate 0.0757 Epoch: 2 Global Step: 107570 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:47:59,135-Speed 2634.21 samples/sec Loss 11.9787 LearningRate 0.0757 Epoch: 2 Global Step: 107580 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:48:03,025-Speed 2632.37 samples/sec Loss 12.1328 LearningRate 0.0757 Epoch: 2 Global Step: 107590 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:48:07,022-Speed 2563.49 samples/sec Loss 12.2072 LearningRate 0.0757 Epoch: 2 Global Step: 107600 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:48:10,913-Speed 2631.81 samples/sec Loss 12.1734 LearningRate 0.0757 Epoch: 2 Global Step: 107610 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:48:14,806-Speed 2631.79 samples/sec Loss 12.0023 LearningRate 0.0757 Epoch: 2 Global Step: 107620 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:48:18,707-Speed 2625.15 samples/sec Loss 12.1927 LearningRate 0.0757 Epoch: 2 Global Step: 107630 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:48:22,620-Speed 2617.99 samples/sec Loss 12.0313 LearningRate 0.0757 Epoch: 2 Global Step: 107640 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:48:26,513-Speed 2630.46 samples/sec Loss 12.0624 LearningRate 0.0757 Epoch: 2 Global Step: 107650 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:48:30,409-Speed 2629.13 samples/sec Loss 12.1299 LearningRate 0.0757 Epoch: 2 Global Step: 107660 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:48:34,313-Speed 2623.86 samples/sec Loss 12.1157 LearningRate 0.0757 Epoch: 2 Global Step: 107670 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:48:38,205-Speed 2631.18 samples/sec Loss 12.0022 LearningRate 0.0757 Epoch: 2 Global Step: 107680 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:48:42,090-Speed 2636.48 samples/sec Loss 12.0722 LearningRate 0.0757 Epoch: 2 Global Step: 107690 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:48:45,977-Speed 2635.83 samples/sec Loss 12.1946 LearningRate 0.0757 Epoch: 2 Global Step: 107700 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:48:49,871-Speed 2630.17 samples/sec Loss 11.9895 LearningRate 0.0757 Epoch: 2 Global Step: 107710 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:48:53,786-Speed 2615.71 samples/sec Loss 12.1514 LearningRate 0.0757 Epoch: 2 Global Step: 107720 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:48:57,697-Speed 2619.05 samples/sec Loss 12.0017 LearningRate 0.0757 Epoch: 2 Global Step: 107730 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:49:01,597-Speed 2626.44 samples/sec Loss 12.0900 LearningRate 0.0757 Epoch: 2 Global Step: 107740 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:49:05,492-Speed 2629.34 samples/sec Loss 12.0690 LearningRate 0.0757 Epoch: 2 Global Step: 107750 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:49:09,426-Speed 2604.01 samples/sec Loss 12.1163 LearningRate 0.0757 Epoch: 2 Global Step: 107760 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:49:13,320-Speed 2630.51 samples/sec Loss 12.1115 LearningRate 0.0757 Epoch: 2 Global Step: 107770 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:49:17,213-Speed 2630.71 samples/sec Loss 12.1700 LearningRate 0.0757 Epoch: 2 Global Step: 107780 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:49:21,112-Speed 2627.37 samples/sec Loss 12.1181 LearningRate 0.0757 Epoch: 2 Global Step: 107790 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:49:24,989-Speed 2642.20 samples/sec Loss 12.1057 LearningRate 0.0757 Epoch: 2 Global Step: 107800 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:49:28,906-Speed 2615.05 samples/sec Loss 12.0184 LearningRate 0.0757 Epoch: 2 Global Step: 107810 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:49:32,812-Speed 2622.96 samples/sec Loss 12.0238 LearningRate 0.0757 Epoch: 2 Global Step: 107820 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:49:36,718-Speed 2622.03 samples/sec Loss 12.1087 LearningRate 0.0757 Epoch: 2 Global Step: 107830 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:49:40,649-Speed 2605.39 samples/sec Loss 12.0065 LearningRate 0.0757 Epoch: 2 Global Step: 107840 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:49:44,544-Speed 2629.95 samples/sec Loss 12.0919 LearningRate 0.0757 Epoch: 2 Global Step: 107850 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:49:48,441-Speed 2627.99 samples/sec Loss 11.9539 LearningRate 0.0757 Epoch: 2 Global Step: 107860 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:49:52,340-Speed 2627.70 samples/sec Loss 12.0886 LearningRate 0.0757 Epoch: 2 Global Step: 107870 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:49:56,236-Speed 2628.40 samples/sec Loss 12.0829 LearningRate 0.0757 Epoch: 2 Global Step: 107880 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:50:00,131-Speed 2629.98 samples/sec Loss 12.1296 LearningRate 0.0757 Epoch: 2 Global Step: 107890 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:50:04,036-Speed 2623.19 samples/sec Loss 11.9621 LearningRate 0.0757 Epoch: 2 Global Step: 107900 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:50:07,933-Speed 2627.87 samples/sec Loss 12.0196 LearningRate 0.0757 Epoch: 2 Global Step: 107910 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:50:11,830-Speed 2628.71 samples/sec Loss 12.1181 LearningRate 0.0757 Epoch: 2 Global Step: 107920 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:50:15,718-Speed 2634.81 samples/sec Loss 12.0675 LearningRate 0.0757 Epoch: 2 Global Step: 107930 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:50:19,614-Speed 2629.11 samples/sec Loss 12.0756 LearningRate 0.0757 Epoch: 2 Global Step: 107940 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:50:23,507-Speed 2631.19 samples/sec Loss 12.1412 LearningRate 0.0757 Epoch: 2 Global Step: 107950 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:50:27,402-Speed 2629.49 samples/sec Loss 12.2661 LearningRate 0.0757 Epoch: 2 Global Step: 107960 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:50:31,304-Speed 2625.18 samples/sec Loss 12.1781 LearningRate 0.0757 Epoch: 2 Global Step: 107970 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:50:35,201-Speed 2627.83 samples/sec Loss 11.9912 LearningRate 0.0757 Epoch: 2 Global Step: 107980 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:50:39,092-Speed 2632.51 samples/sec Loss 12.1391 LearningRate 0.0757 Epoch: 2 Global Step: 107990 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:50:42,990-Speed 2628.22 samples/sec Loss 12.0429 LearningRate 0.0757 Epoch: 2 Global Step: 108000 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:50:46,883-Speed 2630.60 samples/sec Loss 12.0831 LearningRate 0.0757 Epoch: 2 Global Step: 108010 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:50:50,783-Speed 2626.66 samples/sec Loss 12.1805 LearningRate 0.0757 Epoch: 2 Global Step: 108020 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:50:54,679-Speed 2628.64 samples/sec Loss 12.0231 LearningRate 0.0757 Epoch: 2 Global Step: 108030 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:50:58,573-Speed 2630.80 samples/sec Loss 12.0066 LearningRate 0.0756 Epoch: 2 Global Step: 108040 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:51:02,462-Speed 2633.43 samples/sec Loss 12.2031 LearningRate 0.0756 Epoch: 2 Global Step: 108050 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:51:06,362-Speed 2626.44 samples/sec Loss 11.9959 LearningRate 0.0756 Epoch: 2 Global Step: 108060 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:51:10,374-Speed 2552.74 samples/sec Loss 12.0708 LearningRate 0.0756 Epoch: 2 Global Step: 108070 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:51:14,281-Speed 2621.23 samples/sec Loss 12.1786 LearningRate 0.0756 Epoch: 2 Global Step: 108080 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:51:18,181-Speed 2626.63 samples/sec Loss 12.0631 LearningRate 0.0756 Epoch: 2 Global Step: 108090 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:51:22,078-Speed 2627.94 samples/sec Loss 12.0129 LearningRate 0.0756 Epoch: 2 Global Step: 108100 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:51:25,974-Speed 2629.65 samples/sec Loss 12.2320 LearningRate 0.0756 Epoch: 2 Global Step: 108110 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:51:29,856-Speed 2638.64 samples/sec Loss 12.2700 LearningRate 0.0756 Epoch: 2 Global Step: 108120 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:51:33,751-Speed 2629.40 samples/sec Loss 12.1500 LearningRate 0.0756 Epoch: 2 Global Step: 108130 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:51:37,649-Speed 2627.19 samples/sec Loss 12.1854 LearningRate 0.0756 Epoch: 2 Global Step: 108140 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:51:41,538-Speed 2633.94 samples/sec Loss 12.0555 LearningRate 0.0756 Epoch: 2 Global Step: 108150 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:51:45,433-Speed 2629.23 samples/sec Loss 12.0629 LearningRate 0.0756 Epoch: 2 Global Step: 108160 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:51:49,331-Speed 2627.84 samples/sec Loss 12.0794 LearningRate 0.0756 Epoch: 2 Global Step: 108170 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:51:53,227-Speed 2629.12 samples/sec Loss 11.9393 LearningRate 0.0756 Epoch: 2 Global Step: 108180 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:51:57,125-Speed 2627.85 samples/sec Loss 12.0834 LearningRate 0.0756 Epoch: 2 Global Step: 108190 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:52:01,021-Speed 2629.09 samples/sec Loss 12.0955 LearningRate 0.0756 Epoch: 2 Global Step: 108200 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:52:04,913-Speed 2631.70 samples/sec Loss 12.2419 LearningRate 0.0756 Epoch: 2 Global Step: 108210 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:52:08,810-Speed 2628.88 samples/sec Loss 12.0509 LearningRate 0.0756 Epoch: 2 Global Step: 108220 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:52:12,712-Speed 2624.63 samples/sec Loss 12.1429 LearningRate 0.0756 Epoch: 2 Global Step: 108230 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:52:16,640-Speed 2608.01 samples/sec Loss 11.9966 LearningRate 0.0756 Epoch: 2 Global Step: 108240 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:52:20,536-Speed 2628.91 samples/sec Loss 12.1471 LearningRate 0.0756 Epoch: 2 Global Step: 108250 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:52:24,456-Speed 2613.19 samples/sec Loss 12.0613 LearningRate 0.0756 Epoch: 2 Global Step: 108260 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:52:28,455-Speed 2560.88 samples/sec Loss 12.0975 LearningRate 0.0756 Epoch: 2 Global Step: 108270 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:52:32,347-Speed 2631.57 samples/sec Loss 12.1424 LearningRate 0.0756 Epoch: 2 Global Step: 108280 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:52:36,265-Speed 2614.63 samples/sec Loss 12.0658 LearningRate 0.0756 Epoch: 2 Global Step: 108290 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:52:40,228-Speed 2584.55 samples/sec Loss 12.1081 LearningRate 0.0756 Epoch: 2 Global Step: 108300 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:52:44,123-Speed 2630.11 samples/sec Loss 12.0102 LearningRate 0.0756 Epoch: 2 Global Step: 108310 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:52:48,019-Speed 2628.37 samples/sec Loss 12.1857 LearningRate 0.0756 Epoch: 2 Global Step: 108320 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:52:51,893-Speed 2644.21 samples/sec Loss 12.1094 LearningRate 0.0756 Epoch: 2 Global Step: 108330 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:52:55,788-Speed 2629.63 samples/sec Loss 12.0892 LearningRate 0.0756 Epoch: 2 Global Step: 108340 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:52:59,694-Speed 2622.81 samples/sec Loss 12.0654 LearningRate 0.0756 Epoch: 2 Global Step: 108350 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:53:03,593-Speed 2626.65 samples/sec Loss 12.0869 LearningRate 0.0756 Epoch: 2 Global Step: 108360 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:53:07,489-Speed 2629.64 samples/sec Loss 12.0914 LearningRate 0.0756 Epoch: 2 Global Step: 108370 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:53:11,386-Speed 2627.87 samples/sec Loss 12.2122 LearningRate 0.0756 Epoch: 2 Global Step: 108380 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:53:15,285-Speed 2627.48 samples/sec Loss 12.1212 LearningRate 0.0756 Epoch: 2 Global Step: 108390 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:53:19,183-Speed 2627.51 samples/sec Loss 12.0516 LearningRate 0.0756 Epoch: 2 Global Step: 108400 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:53:23,079-Speed 2628.71 samples/sec Loss 12.0799 LearningRate 0.0756 Epoch: 2 Global Step: 108410 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:53:26,998-Speed 2613.28 samples/sec Loss 11.9285 LearningRate 0.0756 Epoch: 2 Global Step: 108420 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:53:30,894-Speed 2629.31 samples/sec Loss 12.2517 LearningRate 0.0756 Epoch: 2 Global Step: 108430 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:53:34,799-Speed 2623.07 samples/sec Loss 12.2082 LearningRate 0.0756 Epoch: 2 Global Step: 108440 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:53:38,720-Speed 2612.66 samples/sec Loss 12.0538 LearningRate 0.0756 Epoch: 2 Global Step: 108450 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:53:42,624-Speed 2623.71 samples/sec Loss 11.9938 LearningRate 0.0756 Epoch: 2 Global Step: 108460 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:53:46,529-Speed 2623.03 samples/sec Loss 12.2881 LearningRate 0.0756 Epoch: 2 Global Step: 108470 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:53:50,431-Speed 2624.73 samples/sec Loss 12.0087 LearningRate 0.0756 Epoch: 2 Global Step: 108480 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:53:54,354-Speed 2610.45 samples/sec Loss 12.1033 LearningRate 0.0756 Epoch: 2 Global Step: 108490 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:53:58,256-Speed 2625.04 samples/sec Loss 12.2493 LearningRate 0.0756 Epoch: 2 Global Step: 108500 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:54:02,157-Speed 2625.71 samples/sec Loss 12.0941 LearningRate 0.0756 Epoch: 2 Global Step: 108510 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:54:06,062-Speed 2623.24 samples/sec Loss 12.0865 LearningRate 0.0755 Epoch: 2 Global Step: 108520 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:54:09,961-Speed 2626.30 samples/sec Loss 12.0467 LearningRate 0.0755 Epoch: 2 Global Step: 108530 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:54:13,863-Speed 2625.83 samples/sec Loss 12.0775 LearningRate 0.0755 Epoch: 2 Global Step: 108540 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:54:17,757-Speed 2630.36 samples/sec Loss 12.0388 LearningRate 0.0755 Epoch: 2 Global Step: 108550 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:54:21,668-Speed 2618.25 samples/sec Loss 12.1829 LearningRate 0.0755 Epoch: 2 Global Step: 108560 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:54:25,566-Speed 2627.96 samples/sec Loss 12.0848 LearningRate 0.0755 Epoch: 2 Global Step: 108570 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:54:29,456-Speed 2633.29 samples/sec Loss 12.0913 LearningRate 0.0755 Epoch: 2 Global Step: 108580 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:54:33,351-Speed 2629.56 samples/sec Loss 12.0622 LearningRate 0.0755 Epoch: 2 Global Step: 108590 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:54:37,260-Speed 2619.74 samples/sec Loss 12.2147 LearningRate 0.0755 Epoch: 2 Global Step: 108600 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:54:41,144-Speed 2637.60 samples/sec Loss 12.1002 LearningRate 0.0755 Epoch: 2 Global Step: 108610 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:54:45,039-Speed 2630.03 samples/sec Loss 12.2254 LearningRate 0.0755 Epoch: 2 Global Step: 108620 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:54:48,931-Speed 2631.71 samples/sec Loss 12.1150 LearningRate 0.0755 Epoch: 2 Global Step: 108630 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:54:52,822-Speed 2632.48 samples/sec Loss 11.9998 LearningRate 0.0755 Epoch: 2 Global Step: 108640 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:54:56,716-Speed 2630.11 samples/sec Loss 12.2454 LearningRate 0.0755 Epoch: 2 Global Step: 108650 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:55:00,611-Speed 2629.39 samples/sec Loss 12.0263 LearningRate 0.0755 Epoch: 2 Global Step: 108660 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:55:04,558-Speed 2595.09 samples/sec Loss 12.1181 LearningRate 0.0755 Epoch: 2 Global Step: 108670 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:55:08,450-Speed 2631.56 samples/sec Loss 12.0422 LearningRate 0.0755 Epoch: 2 Global Step: 108680 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:55:12,354-Speed 2623.48 samples/sec Loss 12.0632 LearningRate 0.0755 Epoch: 2 Global Step: 108690 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:55:16,248-Speed 2631.06 samples/sec Loss 11.9831 LearningRate 0.0755 Epoch: 2 Global Step: 108700 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:55:20,155-Speed 2621.26 samples/sec Loss 11.9876 LearningRate 0.0755 Epoch: 2 Global Step: 108710 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:55:24,052-Speed 2628.98 samples/sec Loss 12.2634 LearningRate 0.0755 Epoch: 2 Global Step: 108720 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:55:28,002-Speed 2593.19 samples/sec Loss 12.2988 LearningRate 0.0755 Epoch: 2 Global Step: 108730 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:55:31,904-Speed 2624.76 samples/sec Loss 12.0849 LearningRate 0.0755 Epoch: 2 Global Step: 108740 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:55:35,783-Speed 2640.85 samples/sec Loss 12.0204 LearningRate 0.0755 Epoch: 2 Global Step: 108750 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:55:39,672-Speed 2633.22 samples/sec Loss 12.2000 LearningRate 0.0755 Epoch: 2 Global Step: 108760 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:55:43,578-Speed 2622.91 samples/sec Loss 12.0983 LearningRate 0.0755 Epoch: 2 Global Step: 108770 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:55:47,479-Speed 2625.38 samples/sec Loss 12.2004 LearningRate 0.0755 Epoch: 2 Global Step: 108780 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:55:51,380-Speed 2625.57 samples/sec Loss 12.0197 LearningRate 0.0755 Epoch: 2 Global Step: 108790 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:55:55,286-Speed 2622.48 samples/sec Loss 12.2006 LearningRate 0.0755 Epoch: 2 Global Step: 108800 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:55:59,192-Speed 2622.27 samples/sec Loss 12.0625 LearningRate 0.0755 Epoch: 2 Global Step: 108810 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:56:03,096-Speed 2623.33 samples/sec Loss 12.1204 LearningRate 0.0755 Epoch: 2 Global Step: 108820 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:56:07,000-Speed 2623.55 samples/sec Loss 12.0633 LearningRate 0.0755 Epoch: 2 Global Step: 108830 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:56:10,902-Speed 2624.81 samples/sec Loss 12.1104 LearningRate 0.0755 Epoch: 2 Global Step: 108840 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:56:14,804-Speed 2624.82 samples/sec Loss 12.0619 LearningRate 0.0755 Epoch: 2 Global Step: 108850 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:56:18,706-Speed 2624.85 samples/sec Loss 12.0789 LearningRate 0.0755 Epoch: 2 Global Step: 108860 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:56:22,609-Speed 2624.47 samples/sec Loss 12.1355 LearningRate 0.0755 Epoch: 2 Global Step: 108870 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:56:26,510-Speed 2625.84 samples/sec Loss 11.9669 LearningRate 0.0755 Epoch: 2 Global Step: 108880 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:56:30,414-Speed 2623.65 samples/sec Loss 12.0681 LearningRate 0.0755 Epoch: 2 Global Step: 108890 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:56:34,316-Speed 2624.88 samples/sec Loss 12.2126 LearningRate 0.0755 Epoch: 2 Global Step: 108900 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:56:38,193-Speed 2641.90 samples/sec Loss 12.1523 LearningRate 0.0755 Epoch: 2 Global Step: 108910 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:56:42,091-Speed 2627.40 samples/sec Loss 12.2089 LearningRate 0.0755 Epoch: 2 Global Step: 108920 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:56:45,993-Speed 2625.00 samples/sec Loss 12.0243 LearningRate 0.0755 Epoch: 2 Global Step: 108930 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:56:49,899-Speed 2622.32 samples/sec Loss 12.1898 LearningRate 0.0755 Epoch: 2 Global Step: 108940 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:56:53,792-Speed 2630.67 samples/sec Loss 12.2838 LearningRate 0.0755 Epoch: 2 Global Step: 108950 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:56:57,688-Speed 2628.93 samples/sec Loss 11.9669 LearningRate 0.0755 Epoch: 2 Global Step: 108960 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:57:01,590-Speed 2625.39 samples/sec Loss 12.1485 LearningRate 0.0755 Epoch: 2 Global Step: 108970 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:57:05,494-Speed 2623.68 samples/sec Loss 12.1333 LearningRate 0.0755 Epoch: 2 Global Step: 108980 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:57:09,383-Speed 2633.30 samples/sec Loss 12.1019 LearningRate 0.0755 Epoch: 2 Global Step: 108990 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:57:13,275-Speed 2631.93 samples/sec Loss 11.9807 LearningRate 0.0754 Epoch: 2 Global Step: 109000 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:57:17,167-Speed 2631.67 samples/sec Loss 11.9796 LearningRate 0.0754 Epoch: 2 Global Step: 109010 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:57:21,081-Speed 2617.10 samples/sec Loss 12.0874 LearningRate 0.0754 Epoch: 2 Global Step: 109020 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:57:24,973-Speed 2631.60 samples/sec Loss 12.1108 LearningRate 0.0754 Epoch: 2 Global Step: 109030 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:57:28,866-Speed 2630.89 samples/sec Loss 11.9577 LearningRate 0.0754 Epoch: 2 Global Step: 109040 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:57:32,758-Speed 2631.26 samples/sec Loss 12.1592 LearningRate 0.0754 Epoch: 2 Global Step: 109050 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:57:36,655-Speed 2628.26 samples/sec Loss 12.1800 LearningRate 0.0754 Epoch: 2 Global Step: 109060 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:57:40,562-Speed 2621.72 samples/sec Loss 12.0902 LearningRate 0.0754 Epoch: 2 Global Step: 109070 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:57:44,468-Speed 2622.27 samples/sec Loss 12.0846 LearningRate 0.0754 Epoch: 2 Global Step: 109080 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:57:48,357-Speed 2633.64 samples/sec Loss 12.1254 LearningRate 0.0754 Epoch: 2 Global Step: 109090 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:57:52,392-Speed 2538.82 samples/sec Loss 12.1212 LearningRate 0.0754 Epoch: 2 Global Step: 109100 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:57:56,300-Speed 2621.11 samples/sec Loss 12.1308 LearningRate 0.0754 Epoch: 2 Global Step: 109110 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:58:00,203-Speed 2624.49 samples/sec Loss 12.1590 LearningRate 0.0754 Epoch: 2 Global Step: 109120 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:58:04,107-Speed 2623.64 samples/sec Loss 12.1334 LearningRate 0.0754 Epoch: 2 Global Step: 109130 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:58:08,007-Speed 2625.80 samples/sec Loss 11.9664 LearningRate 0.0754 Epoch: 2 Global Step: 109140 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:58:11,912-Speed 2623.05 samples/sec Loss 12.0657 LearningRate 0.0754 Epoch: 2 Global Step: 109150 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:58:15,851-Speed 2600.59 samples/sec Loss 12.0159 LearningRate 0.0754 Epoch: 2 Global Step: 109160 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:58:19,749-Speed 2627.42 samples/sec Loss 12.2505 LearningRate 0.0754 Epoch: 2 Global Step: 109170 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:58:23,647-Speed 2627.52 samples/sec Loss 12.2474 LearningRate 0.0754 Epoch: 2 Global Step: 109180 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:58:27,553-Speed 2622.80 samples/sec Loss 12.0146 LearningRate 0.0754 Epoch: 2 Global Step: 109190 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:58:31,456-Speed 2624.16 samples/sec Loss 12.1351 LearningRate 0.0754 Epoch: 2 Global Step: 109200 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:58:35,361-Speed 2622.66 samples/sec Loss 12.2021 LearningRate 0.0754 Epoch: 2 Global Step: 109210 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:58:39,264-Speed 2624.06 samples/sec Loss 12.0420 LearningRate 0.0754 Epoch: 2 Global Step: 109220 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 07:58:43,132-Speed 2648.59 samples/sec Loss 12.1947 LearningRate 0.0754 Epoch: 2 Global Step: 109230 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:58:47,039-Speed 2621.71 samples/sec Loss 11.9755 LearningRate 0.0754 Epoch: 2 Global Step: 109240 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:58:50,938-Speed 2627.05 samples/sec Loss 12.0470 LearningRate 0.0754 Epoch: 2 Global Step: 109250 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:58:54,841-Speed 2624.07 samples/sec Loss 12.0770 LearningRate 0.0754 Epoch: 2 Global Step: 109260 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:58:58,747-Speed 2621.84 samples/sec Loss 12.0428 LearningRate 0.0754 Epoch: 2 Global Step: 109270 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:59:02,655-Speed 2621.20 samples/sec Loss 12.0817 LearningRate 0.0754 Epoch: 2 Global Step: 109280 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:59:06,562-Speed 2621.44 samples/sec Loss 12.0904 LearningRate 0.0754 Epoch: 2 Global Step: 109290 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:59:10,483-Speed 2611.90 samples/sec Loss 12.0870 LearningRate 0.0754 Epoch: 2 Global Step: 109300 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:59:14,386-Speed 2624.53 samples/sec Loss 12.1005 LearningRate 0.0754 Epoch: 2 Global Step: 109310 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:59:18,289-Speed 2624.41 samples/sec Loss 11.9498 LearningRate 0.0754 Epoch: 2 Global Step: 109320 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:59:22,190-Speed 2625.65 samples/sec Loss 12.1892 LearningRate 0.0754 Epoch: 2 Global Step: 109330 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 07:59:26,070-Speed 2640.12 samples/sec Loss 12.0754 LearningRate 0.0754 Epoch: 2 Global Step: 109340 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:59:29,962-Speed 2631.35 samples/sec Loss 12.9077 LearningRate 0.0754 Epoch: 2 Global Step: 109350 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 07:59:33,825-Speed 2651.20 samples/sec Loss 12.7647 LearningRate 0.0754 Epoch: 2 Global Step: 109360 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 07:59:37,728-Speed 2624.37 samples/sec Loss 12.5082 LearningRate 0.0754 Epoch: 2 Global Step: 109370 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 07:59:41,626-Speed 2628.04 samples/sec Loss 12.2818 LearningRate 0.0754 Epoch: 2 Global Step: 109380 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 07:59:45,530-Speed 2623.67 samples/sec Loss 12.2389 LearningRate 0.0754 Epoch: 2 Global Step: 109390 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 07:59:49,424-Speed 2630.68 samples/sec Loss 12.1722 LearningRate 0.0754 Epoch: 2 Global Step: 109400 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 07:59:53,321-Speed 2628.39 samples/sec Loss 12.2351 LearningRate 0.0754 Epoch: 2 Global Step: 109410 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 07:59:57,223-Speed 2624.73 samples/sec Loss 12.2997 LearningRate 0.0754 Epoch: 2 Global Step: 109420 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 08:00:01,185-Speed 2584.70 samples/sec Loss 12.1977 LearningRate 0.0754 Epoch: 2 Global Step: 109430 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 08:00:05,202-Speed 2550.40 samples/sec Loss 12.1046 LearningRate 0.0754 Epoch: 2 Global Step: 109440 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 08:00:09,094-Speed 2631.25 samples/sec Loss 12.0615 LearningRate 0.0754 Epoch: 2 Global Step: 109450 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 08:00:12,986-Speed 2632.22 samples/sec Loss 12.2449 LearningRate 0.0754 Epoch: 2 Global Step: 109460 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:00:16,881-Speed 2629.54 samples/sec Loss 12.1527 LearningRate 0.0753 Epoch: 2 Global Step: 109470 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:00:20,784-Speed 2625.02 samples/sec Loss 12.1119 LearningRate 0.0753 Epoch: 2 Global Step: 109480 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:00:24,677-Speed 2630.42 samples/sec Loss 12.2176 LearningRate 0.0753 Epoch: 2 Global Step: 109490 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:00:28,608-Speed 2605.86 samples/sec Loss 12.0391 LearningRate 0.0753 Epoch: 2 Global Step: 109500 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:00:32,503-Speed 2629.26 samples/sec Loss 12.0630 LearningRate 0.0753 Epoch: 2 Global Step: 109510 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:00:36,401-Speed 2628.32 samples/sec Loss 11.9907 LearningRate 0.0753 Epoch: 2 Global Step: 109520 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:00:40,295-Speed 2629.70 samples/sec Loss 12.1819 LearningRate 0.0753 Epoch: 2 Global Step: 109530 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:00:44,202-Speed 2621.34 samples/sec Loss 12.0584 LearningRate 0.0753 Epoch: 2 Global Step: 109540 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:00:48,092-Speed 2633.39 samples/sec Loss 12.0864 LearningRate 0.0753 Epoch: 2 Global Step: 109550 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:00:51,988-Speed 2629.12 samples/sec Loss 12.0153 LearningRate 0.0753 Epoch: 2 Global Step: 109560 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:00:55,881-Speed 2631.28 samples/sec Loss 12.1187 LearningRate 0.0753 Epoch: 2 Global Step: 109570 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:00:59,788-Speed 2621.53 samples/sec Loss 12.1644 LearningRate 0.0753 Epoch: 2 Global Step: 109580 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:01:03,684-Speed 2628.49 samples/sec Loss 12.1966 LearningRate 0.0753 Epoch: 2 Global Step: 109590 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:01:07,575-Speed 2632.76 samples/sec Loss 12.1865 LearningRate 0.0753 Epoch: 2 Global Step: 109600 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:01:11,463-Speed 2634.37 samples/sec Loss 12.1732 LearningRate 0.0753 Epoch: 2 Global Step: 109610 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:01:15,362-Speed 2626.92 samples/sec Loss 12.0320 LearningRate 0.0753 Epoch: 2 Global Step: 109620 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:01:19,267-Speed 2622.44 samples/sec Loss 11.9451 LearningRate 0.0753 Epoch: 2 Global Step: 109630 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:01:23,178-Speed 2619.39 samples/sec Loss 12.0914 LearningRate 0.0753 Epoch: 2 Global Step: 109640 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:01:27,067-Speed 2633.54 samples/sec Loss 12.0586 LearningRate 0.0753 Epoch: 2 Global Step: 109650 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:01:30,956-Speed 2633.73 samples/sec Loss 12.0617 LearningRate 0.0753 Epoch: 2 Global Step: 109660 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:01:34,848-Speed 2631.59 samples/sec Loss 12.0576 LearningRate 0.0753 Epoch: 2 Global Step: 109670 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:01:38,741-Speed 2630.73 samples/sec Loss 12.1214 LearningRate 0.0753 Epoch: 2 Global Step: 109680 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:01:42,638-Speed 2628.35 samples/sec Loss 12.0047 LearningRate 0.0753 Epoch: 2 Global Step: 109690 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:01:46,543-Speed 2623.68 samples/sec Loss 12.1174 LearningRate 0.0753 Epoch: 2 Global Step: 109700 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:01:50,452-Speed 2620.13 samples/sec Loss 12.1868 LearningRate 0.0753 Epoch: 2 Global Step: 109710 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:01:54,368-Speed 2615.97 samples/sec Loss 12.2262 LearningRate 0.0753 Epoch: 2 Global Step: 109720 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:01:58,271-Speed 2624.03 samples/sec Loss 12.2735 LearningRate 0.0753 Epoch: 2 Global Step: 109730 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:02:02,196-Speed 2609.58 samples/sec Loss 12.1244 LearningRate 0.0753 Epoch: 2 Global Step: 109740 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:02:06,089-Speed 2631.30 samples/sec Loss 12.0442 LearningRate 0.0753 Epoch: 2 Global Step: 109750 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:02:09,968-Speed 2640.62 samples/sec Loss 12.0241 LearningRate 0.0753 Epoch: 2 Global Step: 109760 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:02:13,869-Speed 2625.44 samples/sec Loss 12.1831 LearningRate 0.0753 Epoch: 2 Global Step: 109770 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:02:17,767-Speed 2627.78 samples/sec Loss 12.0441 LearningRate 0.0753 Epoch: 2 Global Step: 109780 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:02:21,827-Speed 2522.94 samples/sec Loss 11.9970 LearningRate 0.0753 Epoch: 2 Global Step: 109790 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:02:25,925-Speed 2499.01 samples/sec Loss 12.0530 LearningRate 0.0753 Epoch: 2 Global Step: 109800 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:02:29,897-Speed 2578.74 samples/sec Loss 12.0382 LearningRate 0.0753 Epoch: 2 Global Step: 109810 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:02:33,790-Speed 2631.20 samples/sec Loss 11.9907 LearningRate 0.0753 Epoch: 2 Global Step: 109820 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:02:37,700-Speed 2619.74 samples/sec Loss 12.0735 LearningRate 0.0753 Epoch: 2 Global Step: 109830 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:02:41,599-Speed 2626.43 samples/sec Loss 12.0928 LearningRate 0.0753 Epoch: 2 Global Step: 109840 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:02:45,538-Speed 2600.45 samples/sec Loss 11.9639 LearningRate 0.0753 Epoch: 2 Global Step: 109850 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:02:49,430-Speed 2631.88 samples/sec Loss 12.1759 LearningRate 0.0753 Epoch: 2 Global Step: 109860 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:02:53,324-Speed 2630.81 samples/sec Loss 12.1047 LearningRate 0.0753 Epoch: 2 Global Step: 109870 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:02:57,217-Speed 2631.20 samples/sec Loss 12.0891 LearningRate 0.0753 Epoch: 2 Global Step: 109880 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:03:01,111-Speed 2630.38 samples/sec Loss 12.1207 LearningRate 0.0753 Epoch: 2 Global Step: 109890 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:03:05,031-Speed 2613.07 samples/sec Loss 11.9241 LearningRate 0.0753 Epoch: 2 Global Step: 109900 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:03:08,949-Speed 2614.05 samples/sec Loss 12.1597 LearningRate 0.0753 Epoch: 2 Global Step: 109910 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:03:12,879-Speed 2606.25 samples/sec Loss 11.9903 LearningRate 0.0753 Epoch: 2 Global Step: 109920 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:03:16,776-Speed 2628.77 samples/sec Loss 12.1304 LearningRate 0.0753 Epoch: 2 Global Step: 109930 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:03:20,683-Speed 2621.47 samples/sec Loss 12.0935 LearningRate 0.0753 Epoch: 2 Global Step: 109940 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:03:24,591-Speed 2620.92 samples/sec Loss 12.1041 LearningRate 0.0752 Epoch: 2 Global Step: 109950 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:03:28,484-Speed 2631.21 samples/sec Loss 12.1361 LearningRate 0.0752 Epoch: 2 Global Step: 109960 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:03:32,383-Speed 2626.77 samples/sec Loss 12.0959 LearningRate 0.0752 Epoch: 2 Global Step: 109970 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:03:36,277-Speed 2630.69 samples/sec Loss 12.2194 LearningRate 0.0752 Epoch: 2 Global Step: 109980 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:03:40,189-Speed 2618.32 samples/sec Loss 12.1815 LearningRate 0.0752 Epoch: 2 Global Step: 109990 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:03:44,102-Speed 2617.02 samples/sec Loss 11.9682 LearningRate 0.0752 Epoch: 2 Global Step: 110000 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:04:27,269-[lfw][110000]XNorm: 22.173644
Training: 2022-04-13 08:04:27,270-[lfw][110000]Accuracy-Flip: 0.99650+-0.00189
Training: 2022-04-13 08:04:27,270-[lfw][110000]Accuracy-Highest: 0.99783
Training: 2022-04-13 08:05:17,321-[cfp_fp][110000]XNorm: 20.136800
Training: 2022-04-13 08:05:17,322-[cfp_fp][110000]Accuracy-Flip: 0.97729+-0.00721
Training: 2022-04-13 08:05:17,323-[cfp_fp][110000]Accuracy-Highest: 0.97986
Training: 2022-04-13 08:06:00,469-[agedb_30][110000]XNorm: 21.814672
Training: 2022-04-13 08:06:00,470-[agedb_30][110000]Accuracy-Flip: 0.96733+-0.00782
Training: 2022-04-13 08:06:00,470-[agedb_30][110000]Accuracy-Highest: 0.96750
Training: 2022-04-13 08:06:04,344-Speed 73.02 samples/sec Loss 12.1267 LearningRate 0.0752 Epoch: 2 Global Step: 110010 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:06:08,220-Speed 2642.51 samples/sec Loss 12.1057 LearningRate 0.0752 Epoch: 2 Global Step: 110020 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:06:12,106-Speed 2636.06 samples/sec Loss 11.7171 LearningRate 0.0752 Epoch: 2 Global Step: 110030 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:06:15,979-Speed 2644.79 samples/sec Loss 11.9920 LearningRate 0.0752 Epoch: 2 Global Step: 110040 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:06:19,865-Speed 2635.59 samples/sec Loss 12.0496 LearningRate 0.0752 Epoch: 2 Global Step: 110050 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:06:23,744-Speed 2640.50 samples/sec Loss 12.1492 LearningRate 0.0752 Epoch: 2 Global Step: 110060 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:06:27,627-Speed 2638.76 samples/sec Loss 12.0734 LearningRate 0.0752 Epoch: 2 Global Step: 110070 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:06:31,510-Speed 2638.05 samples/sec Loss 11.9498 LearningRate 0.0752 Epoch: 2 Global Step: 110080 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:06:35,395-Speed 2636.61 samples/sec Loss 12.0721 LearningRate 0.0752 Epoch: 2 Global Step: 110090 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:06:39,289-Speed 2629.74 samples/sec Loss 12.1161 LearningRate 0.0752 Epoch: 2 Global Step: 110100 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:06:43,177-Speed 2634.75 samples/sec Loss 12.2796 LearningRate 0.0752 Epoch: 2 Global Step: 110110 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:06:47,070-Speed 2631.18 samples/sec Loss 12.0230 LearningRate 0.0752 Epoch: 2 Global Step: 110120 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:06:50,969-Speed 2627.09 samples/sec Loss 12.1282 LearningRate 0.0752 Epoch: 2 Global Step: 110130 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:06:54,874-Speed 2622.89 samples/sec Loss 12.1854 LearningRate 0.0752 Epoch: 2 Global Step: 110140 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:06:58,753-Speed 2640.46 samples/sec Loss 12.0519 LearningRate 0.0752 Epoch: 2 Global Step: 110150 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:07:02,651-Speed 2627.70 samples/sec Loss 12.0336 LearningRate 0.0752 Epoch: 2 Global Step: 110160 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:07:06,552-Speed 2626.15 samples/sec Loss 12.0500 LearningRate 0.0752 Epoch: 2 Global Step: 110170 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:07:10,450-Speed 2627.05 samples/sec Loss 12.0462 LearningRate 0.0752 Epoch: 2 Global Step: 110180 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:07:14,356-Speed 2622.61 samples/sec Loss 11.9131 LearningRate 0.0752 Epoch: 2 Global Step: 110190 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:07:18,283-Speed 2608.21 samples/sec Loss 11.9506 LearningRate 0.0752 Epoch: 2 Global Step: 110200 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:07:22,189-Speed 2622.53 samples/sec Loss 12.1381 LearningRate 0.0752 Epoch: 2 Global Step: 110210 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:07:26,100-Speed 2618.51 samples/sec Loss 12.2486 LearningRate 0.0752 Epoch: 2 Global Step: 110220 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:07:30,008-Speed 2621.40 samples/sec Loss 11.9714 LearningRate 0.0752 Epoch: 2 Global Step: 110230 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:07:33,915-Speed 2621.18 samples/sec Loss 12.1201 LearningRate 0.0752 Epoch: 2 Global Step: 110240 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:07:37,802-Speed 2635.35 samples/sec Loss 12.1608 LearningRate 0.0752 Epoch: 2 Global Step: 110250 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:07:41,713-Speed 2619.32 samples/sec Loss 11.9358 LearningRate 0.0752 Epoch: 2 Global Step: 110260 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:07:45,625-Speed 2618.42 samples/sec Loss 11.9078 LearningRate 0.0752 Epoch: 2 Global Step: 110270 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:07:49,539-Speed 2617.07 samples/sec Loss 12.0952 LearningRate 0.0752 Epoch: 2 Global Step: 110280 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:07:53,444-Speed 2622.70 samples/sec Loss 12.0628 LearningRate 0.0752 Epoch: 2 Global Step: 110290 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:07:57,345-Speed 2625.85 samples/sec Loss 11.9827 LearningRate 0.0752 Epoch: 2 Global Step: 110300 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:08:01,382-Speed 2537.32 samples/sec Loss 12.2289 LearningRate 0.0752 Epoch: 2 Global Step: 110310 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:08:05,280-Speed 2627.76 samples/sec Loss 12.0559 LearningRate 0.0752 Epoch: 2 Global Step: 110320 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:08:09,195-Speed 2615.73 samples/sec Loss 11.9937 LearningRate 0.0752 Epoch: 2 Global Step: 110330 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:08:13,086-Speed 2632.62 samples/sec Loss 12.2329 LearningRate 0.0752 Epoch: 2 Global Step: 110340 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:08:16,990-Speed 2624.17 samples/sec Loss 12.1646 LearningRate 0.0752 Epoch: 2 Global Step: 110350 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:08:20,911-Speed 2612.23 samples/sec Loss 12.2336 LearningRate 0.0752 Epoch: 2 Global Step: 110360 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:08:24,809-Speed 2627.16 samples/sec Loss 12.0953 LearningRate 0.0752 Epoch: 2 Global Step: 110370 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:08:28,740-Speed 2606.05 samples/sec Loss 11.9805 LearningRate 0.0752 Epoch: 2 Global Step: 110380 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:08:32,643-Speed 2623.87 samples/sec Loss 11.9999 LearningRate 0.0752 Epoch: 2 Global Step: 110390 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:08:36,543-Speed 2626.83 samples/sec Loss 12.0766 LearningRate 0.0752 Epoch: 2 Global Step: 110400 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:08:40,447-Speed 2623.42 samples/sec Loss 12.1347 LearningRate 0.0752 Epoch: 2 Global Step: 110410 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:08:44,345-Speed 2628.15 samples/sec Loss 12.0173 LearningRate 0.0752 Epoch: 2 Global Step: 110420 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:08:48,242-Speed 2628.26 samples/sec Loss 12.0979 LearningRate 0.0751 Epoch: 2 Global Step: 110430 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:08:52,143-Speed 2625.26 samples/sec Loss 12.1335 LearningRate 0.0751 Epoch: 2 Global Step: 110440 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:08:56,039-Speed 2628.63 samples/sec Loss 12.0513 LearningRate 0.0751 Epoch: 2 Global Step: 110450 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:08:59,942-Speed 2624.78 samples/sec Loss 12.0830 LearningRate 0.0751 Epoch: 2 Global Step: 110460 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:09:03,840-Speed 2627.82 samples/sec Loss 12.0301 LearningRate 0.0751 Epoch: 2 Global Step: 110470 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:09:07,740-Speed 2626.40 samples/sec Loss 12.0721 LearningRate 0.0751 Epoch: 2 Global Step: 110480 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:09:11,642-Speed 2624.95 samples/sec Loss 12.1823 LearningRate 0.0751 Epoch: 2 Global Step: 110490 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:09:15,540-Speed 2628.24 samples/sec Loss 12.0432 LearningRate 0.0751 Epoch: 2 Global Step: 110500 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:09:19,462-Speed 2611.41 samples/sec Loss 11.9884 LearningRate 0.0751 Epoch: 2 Global Step: 110510 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:09:23,370-Speed 2620.58 samples/sec Loss 12.0531 LearningRate 0.0751 Epoch: 2 Global Step: 110520 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:09:27,281-Speed 2619.01 samples/sec Loss 11.9284 LearningRate 0.0751 Epoch: 2 Global Step: 110530 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:09:31,195-Speed 2617.11 samples/sec Loss 12.0669 LearningRate 0.0751 Epoch: 2 Global Step: 110540 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:09:35,107-Speed 2618.12 samples/sec Loss 12.0893 LearningRate 0.0751 Epoch: 2 Global Step: 110550 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:09:39,020-Speed 2617.47 samples/sec Loss 12.2299 LearningRate 0.0751 Epoch: 2 Global Step: 110560 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:09:42,917-Speed 2628.83 samples/sec Loss 12.0055 LearningRate 0.0751 Epoch: 2 Global Step: 110570 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:09:46,818-Speed 2625.75 samples/sec Loss 12.2020 LearningRate 0.0751 Epoch: 2 Global Step: 110580 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:09:50,717-Speed 2626.64 samples/sec Loss 11.9720 LearningRate 0.0751 Epoch: 2 Global Step: 110590 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:09:54,616-Speed 2626.89 samples/sec Loss 12.0128 LearningRate 0.0751 Epoch: 2 Global Step: 110600 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:09:58,510-Speed 2630.75 samples/sec Loss 12.0750 LearningRate 0.0751 Epoch: 2 Global Step: 110610 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:10:02,410-Speed 2625.98 samples/sec Loss 12.0755 LearningRate 0.0751 Epoch: 2 Global Step: 110620 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:10:06,310-Speed 2627.02 samples/sec Loss 12.0976 LearningRate 0.0751 Epoch: 2 Global Step: 110630 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:10:10,233-Speed 2610.42 samples/sec Loss 11.9559 LearningRate 0.0751 Epoch: 2 Global Step: 110640 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:10:14,138-Speed 2623.19 samples/sec Loss 12.0633 LearningRate 0.0751 Epoch: 2 Global Step: 110650 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:10:18,041-Speed 2624.07 samples/sec Loss 12.0996 LearningRate 0.0751 Epoch: 2 Global Step: 110660 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:10:21,979-Speed 2600.72 samples/sec Loss 12.0363 LearningRate 0.0751 Epoch: 2 Global Step: 110670 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:10:25,898-Speed 2613.90 samples/sec Loss 12.1118 LearningRate 0.0751 Epoch: 2 Global Step: 110680 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:10:29,823-Speed 2609.84 samples/sec Loss 11.9896 LearningRate 0.0751 Epoch: 2 Global Step: 110690 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:10:33,720-Speed 2628.74 samples/sec Loss 11.9183 LearningRate 0.0751 Epoch: 2 Global Step: 110700 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:10:37,617-Speed 2628.07 samples/sec Loss 11.9805 LearningRate 0.0751 Epoch: 2 Global Step: 110710 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:10:41,501-Speed 2637.46 samples/sec Loss 12.0580 LearningRate 0.0751 Epoch: 2 Global Step: 110720 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:10:45,452-Speed 2592.32 samples/sec Loss 12.1490 LearningRate 0.0751 Epoch: 2 Global Step: 110730 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:10:49,346-Speed 2630.11 samples/sec Loss 12.0912 LearningRate 0.0751 Epoch: 2 Global Step: 110740 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:10:53,251-Speed 2623.50 samples/sec Loss 12.0314 LearningRate 0.0751 Epoch: 2 Global Step: 110750 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:10:57,150-Speed 2626.81 samples/sec Loss 12.0158 LearningRate 0.0751 Epoch: 2 Global Step: 110760 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:11:01,049-Speed 2626.92 samples/sec Loss 12.1455 LearningRate 0.0751 Epoch: 2 Global Step: 110770 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:11:04,968-Speed 2613.49 samples/sec Loss 12.2227 LearningRate 0.0751 Epoch: 2 Global Step: 110780 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:11:08,877-Speed 2620.65 samples/sec Loss 12.2660 LearningRate 0.0751 Epoch: 2 Global Step: 110790 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:11:12,777-Speed 2626.50 samples/sec Loss 12.0587 LearningRate 0.0751 Epoch: 2 Global Step: 110800 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:11:16,686-Speed 2620.43 samples/sec Loss 12.0454 LearningRate 0.0751 Epoch: 2 Global Step: 110810 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:11:20,583-Speed 2628.19 samples/sec Loss 12.0303 LearningRate 0.0751 Epoch: 2 Global Step: 110820 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:11:24,473-Speed 2632.88 samples/sec Loss 12.0510 LearningRate 0.0751 Epoch: 2 Global Step: 110830 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:11:28,375-Speed 2625.53 samples/sec Loss 11.9338 LearningRate 0.0751 Epoch: 2 Global Step: 110840 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:11:32,276-Speed 2625.93 samples/sec Loss 12.1023 LearningRate 0.0751 Epoch: 2 Global Step: 110850 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:11:36,180-Speed 2623.02 samples/sec Loss 11.9962 LearningRate 0.0751 Epoch: 2 Global Step: 110860 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:11:40,077-Speed 2627.91 samples/sec Loss 12.0230 LearningRate 0.0751 Epoch: 2 Global Step: 110870 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:11:43,975-Speed 2628.37 samples/sec Loss 11.8651 LearningRate 0.0751 Epoch: 2 Global Step: 110880 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:11:47,876-Speed 2625.07 samples/sec Loss 12.1158 LearningRate 0.0751 Epoch: 2 Global Step: 110890 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:11:51,782-Speed 2623.05 samples/sec Loss 12.2190 LearningRate 0.0751 Epoch: 2 Global Step: 110900 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:11:55,683-Speed 2625.10 samples/sec Loss 12.0310 LearningRate 0.0750 Epoch: 2 Global Step: 110910 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:11:59,588-Speed 2623.23 samples/sec Loss 12.0406 LearningRate 0.0750 Epoch: 2 Global Step: 110920 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:12:03,476-Speed 2634.24 samples/sec Loss 12.0697 LearningRate 0.0750 Epoch: 2 Global Step: 110930 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:12:07,371-Speed 2630.00 samples/sec Loss 12.0359 LearningRate 0.0750 Epoch: 2 Global Step: 110940 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:12:11,273-Speed 2624.91 samples/sec Loss 11.9402 LearningRate 0.0750 Epoch: 2 Global Step: 110950 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:12:15,174-Speed 2625.62 samples/sec Loss 12.2030 LearningRate 0.0750 Epoch: 2 Global Step: 110960 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:12:19,075-Speed 2625.42 samples/sec Loss 12.0478 LearningRate 0.0750 Epoch: 2 Global Step: 110970 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:12:22,978-Speed 2624.56 samples/sec Loss 11.9715 LearningRate 0.0750 Epoch: 2 Global Step: 110980 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:12:26,876-Speed 2627.48 samples/sec Loss 12.0454 LearningRate 0.0750 Epoch: 2 Global Step: 110990 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:12:30,788-Speed 2618.87 samples/sec Loss 12.1329 LearningRate 0.0750 Epoch: 2 Global Step: 111000 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:12:34,693-Speed 2622.58 samples/sec Loss 12.1216 LearningRate 0.0750 Epoch: 2 Global Step: 111010 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:12:38,595-Speed 2625.14 samples/sec Loss 12.0061 LearningRate 0.0750 Epoch: 2 Global Step: 111020 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:12:42,494-Speed 2626.64 samples/sec Loss 12.1361 LearningRate 0.0750 Epoch: 2 Global Step: 111030 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:12:46,391-Speed 2628.73 samples/sec Loss 12.0415 LearningRate 0.0750 Epoch: 2 Global Step: 111040 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:12:50,275-Speed 2636.84 samples/sec Loss 11.9773 LearningRate 0.0750 Epoch: 2 Global Step: 111050 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:12:54,172-Speed 2628.39 samples/sec Loss 12.1923 LearningRate 0.0750 Epoch: 2 Global Step: 111060 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:12:58,065-Speed 2631.40 samples/sec Loss 12.0014 LearningRate 0.0750 Epoch: 2 Global Step: 111070 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:13:01,960-Speed 2629.70 samples/sec Loss 11.9596 LearningRate 0.0750 Epoch: 2 Global Step: 111080 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:13:05,866-Speed 2622.09 samples/sec Loss 12.0078 LearningRate 0.0750 Epoch: 2 Global Step: 111090 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:13:09,768-Speed 2624.96 samples/sec Loss 12.1258 LearningRate 0.0750 Epoch: 2 Global Step: 111100 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:13:13,663-Speed 2629.48 samples/sec Loss 12.0981 LearningRate 0.0750 Epoch: 2 Global Step: 111110 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:13:17,568-Speed 2623.15 samples/sec Loss 11.9268 LearningRate 0.0750 Epoch: 2 Global Step: 111120 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:13:21,465-Speed 2628.61 samples/sec Loss 12.0996 LearningRate 0.0750 Epoch: 2 Global Step: 111130 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:13:25,361-Speed 2628.95 samples/sec Loss 12.0050 LearningRate 0.0750 Epoch: 2 Global Step: 111140 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:13:29,277-Speed 2616.06 samples/sec Loss 12.0250 LearningRate 0.0750 Epoch: 2 Global Step: 111150 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:13:33,238-Speed 2585.54 samples/sec Loss 12.0573 LearningRate 0.0750 Epoch: 2 Global Step: 111160 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:13:37,139-Speed 2625.77 samples/sec Loss 12.0115 LearningRate 0.0750 Epoch: 2 Global Step: 111170 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:13:41,049-Speed 2618.98 samples/sec Loss 11.9367 LearningRate 0.0750 Epoch: 2 Global Step: 111180 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:13:44,930-Speed 2639.59 samples/sec Loss 12.0700 LearningRate 0.0750 Epoch: 2 Global Step: 111190 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:13:48,813-Speed 2637.89 samples/sec Loss 12.0464 LearningRate 0.0750 Epoch: 2 Global Step: 111200 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:13:52,719-Speed 2622.27 samples/sec Loss 12.0130 LearningRate 0.0750 Epoch: 2 Global Step: 111210 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:13:56,614-Speed 2629.69 samples/sec Loss 12.0982 LearningRate 0.0750 Epoch: 2 Global Step: 111220 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:14:00,508-Speed 2630.29 samples/sec Loss 12.0670 LearningRate 0.0750 Epoch: 2 Global Step: 111230 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:14:04,406-Speed 2627.44 samples/sec Loss 12.0746 LearningRate 0.0750 Epoch: 2 Global Step: 111240 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:14:08,298-Speed 2631.51 samples/sec Loss 12.0986 LearningRate 0.0750 Epoch: 2 Global Step: 111250 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:14:12,192-Speed 2630.28 samples/sec Loss 12.1874 LearningRate 0.0750 Epoch: 2 Global Step: 111260 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:14:16,098-Speed 2622.20 samples/sec Loss 12.1905 LearningRate 0.0750 Epoch: 2 Global Step: 111270 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:14:20,000-Speed 2624.79 samples/sec Loss 12.1586 LearningRate 0.0750 Epoch: 2 Global Step: 111280 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:14:23,897-Speed 2628.13 samples/sec Loss 12.0679 LearningRate 0.0750 Epoch: 2 Global Step: 111290 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:14:27,800-Speed 2624.41 samples/sec Loss 12.1273 LearningRate 0.0750 Epoch: 2 Global Step: 111300 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:14:31,701-Speed 2625.33 samples/sec Loss 12.1775 LearningRate 0.0750 Epoch: 2 Global Step: 111310 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:14:35,600-Speed 2626.95 samples/sec Loss 11.9247 LearningRate 0.0750 Epoch: 2 Global Step: 111320 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:14:39,494-Speed 2630.00 samples/sec Loss 12.0734 LearningRate 0.0750 Epoch: 2 Global Step: 111330 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:14:43,396-Speed 2625.24 samples/sec Loss 12.1506 LearningRate 0.0750 Epoch: 2 Global Step: 111340 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:14:47,308-Speed 2618.11 samples/sec Loss 12.1163 LearningRate 0.0750 Epoch: 2 Global Step: 111350 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:14:51,202-Speed 2630.15 samples/sec Loss 12.0673 LearningRate 0.0750 Epoch: 2 Global Step: 111360 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:14:55,100-Speed 2627.52 samples/sec Loss 12.1007 LearningRate 0.0750 Epoch: 2 Global Step: 111370 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:14:58,995-Speed 2629.99 samples/sec Loss 12.1421 LearningRate 0.0750 Epoch: 2 Global Step: 111380 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:15:02,891-Speed 2628.86 samples/sec Loss 12.1346 LearningRate 0.0749 Epoch: 2 Global Step: 111390 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:15:06,964-Speed 2514.57 samples/sec Loss 11.9070 LearningRate 0.0749 Epoch: 2 Global Step: 111400 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:15:10,876-Speed 2618.49 samples/sec Loss 12.0569 LearningRate 0.0749 Epoch: 2 Global Step: 111410 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:15:14,771-Speed 2629.51 samples/sec Loss 12.1600 LearningRate 0.0749 Epoch: 2 Global Step: 111420 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:15:18,651-Speed 2639.31 samples/sec Loss 11.9232 LearningRate 0.0749 Epoch: 2 Global Step: 111430 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:15:22,557-Speed 2629.72 samples/sec Loss 12.0288 LearningRate 0.0749 Epoch: 2 Global Step: 111440 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:15:26,477-Speed 2613.05 samples/sec Loss 12.0736 LearningRate 0.0749 Epoch: 2 Global Step: 111450 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:15:30,373-Speed 2628.85 samples/sec Loss 11.9205 LearningRate 0.0749 Epoch: 2 Global Step: 111460 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:15:34,272-Speed 2627.46 samples/sec Loss 12.1197 LearningRate 0.0749 Epoch: 2 Global Step: 111470 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:15:38,169-Speed 2627.92 samples/sec Loss 11.9801 LearningRate 0.0749 Epoch: 2 Global Step: 111480 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:15:42,068-Speed 2626.89 samples/sec Loss 11.8774 LearningRate 0.0749 Epoch: 2 Global Step: 111490 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:15:45,966-Speed 2627.41 samples/sec Loss 12.0871 LearningRate 0.0749 Epoch: 2 Global Step: 111500 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:15:49,869-Speed 2624.28 samples/sec Loss 11.9174 LearningRate 0.0749 Epoch: 2 Global Step: 111510 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:15:53,766-Speed 2628.43 samples/sec Loss 11.9154 LearningRate 0.0749 Epoch: 2 Global Step: 111520 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:15:57,671-Speed 2622.82 samples/sec Loss 11.9617 LearningRate 0.0749 Epoch: 2 Global Step: 111530 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:16:01,564-Speed 2630.62 samples/sec Loss 11.9442 LearningRate 0.0749 Epoch: 2 Global Step: 111540 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:16:05,467-Speed 2624.52 samples/sec Loss 11.8987 LearningRate 0.0749 Epoch: 2 Global Step: 111550 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:16:09,372-Speed 2623.04 samples/sec Loss 11.8983 LearningRate 0.0749 Epoch: 2 Global Step: 111560 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:16:13,273-Speed 2625.94 samples/sec Loss 11.9609 LearningRate 0.0749 Epoch: 2 Global Step: 111570 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:16:17,178-Speed 2622.50 samples/sec Loss 11.9902 LearningRate 0.0749 Epoch: 2 Global Step: 111580 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:16:21,079-Speed 2626.02 samples/sec Loss 12.0311 LearningRate 0.0749 Epoch: 2 Global Step: 111590 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:16:24,980-Speed 2625.09 samples/sec Loss 11.8949 LearningRate 0.0749 Epoch: 2 Global Step: 111600 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:16:28,882-Speed 2624.62 samples/sec Loss 12.0549 LearningRate 0.0749 Epoch: 2 Global Step: 111610 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:16:32,784-Speed 2625.25 samples/sec Loss 12.0790 LearningRate 0.0749 Epoch: 2 Global Step: 111620 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:16:36,673-Speed 2633.45 samples/sec Loss 11.9914 LearningRate 0.0749 Epoch: 2 Global Step: 111630 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:16:40,554-Speed 2640.62 samples/sec Loss 11.9892 LearningRate 0.0749 Epoch: 2 Global Step: 111640 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:16:44,446-Speed 2631.39 samples/sec Loss 11.9645 LearningRate 0.0749 Epoch: 2 Global Step: 111650 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:16:48,345-Speed 2627.10 samples/sec Loss 12.0487 LearningRate 0.0749 Epoch: 2 Global Step: 111660 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:16:52,244-Speed 2627.15 samples/sec Loss 11.8452 LearningRate 0.0749 Epoch: 2 Global Step: 111670 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:16:56,142-Speed 2627.62 samples/sec Loss 12.0021 LearningRate 0.0749 Epoch: 2 Global Step: 111680 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:17:00,043-Speed 2625.40 samples/sec Loss 11.9385 LearningRate 0.0749 Epoch: 2 Global Step: 111690 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:17:03,947-Speed 2623.33 samples/sec Loss 11.9288 LearningRate 0.0749 Epoch: 2 Global Step: 111700 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:17:07,841-Speed 2629.95 samples/sec Loss 12.0591 LearningRate 0.0749 Epoch: 2 Global Step: 111710 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:17:11,724-Speed 2638.48 samples/sec Loss 11.9210 LearningRate 0.0749 Epoch: 2 Global Step: 111720 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:17:15,619-Speed 2629.90 samples/sec Loss 12.0594 LearningRate 0.0749 Epoch: 2 Global Step: 111730 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:17:19,512-Speed 2630.78 samples/sec Loss 12.0731 LearningRate 0.0749 Epoch: 2 Global Step: 111740 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:17:23,423-Speed 2619.48 samples/sec Loss 12.0032 LearningRate 0.0749 Epoch: 2 Global Step: 111750 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:17:27,326-Speed 2624.00 samples/sec Loss 12.0037 LearningRate 0.0749 Epoch: 2 Global Step: 111760 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:17:31,242-Speed 2615.74 samples/sec Loss 12.0209 LearningRate 0.0749 Epoch: 2 Global Step: 111770 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:17:35,145-Speed 2624.02 samples/sec Loss 12.0444 LearningRate 0.0749 Epoch: 2 Global Step: 111780 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:17:39,058-Speed 2617.04 samples/sec Loss 12.0102 LearningRate 0.0749 Epoch: 2 Global Step: 111790 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:17:42,962-Speed 2623.59 samples/sec Loss 12.0331 LearningRate 0.0749 Epoch: 2 Global Step: 111800 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:17:46,870-Speed 2621.20 samples/sec Loss 11.9762 LearningRate 0.0749 Epoch: 2 Global Step: 111810 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:17:50,767-Speed 2628.80 samples/sec Loss 12.0957 LearningRate 0.0749 Epoch: 2 Global Step: 111820 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:17:54,672-Speed 2622.55 samples/sec Loss 11.9502 LearningRate 0.0749 Epoch: 2 Global Step: 111830 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:17:58,568-Speed 2629.04 samples/sec Loss 12.0217 LearningRate 0.0749 Epoch: 2 Global Step: 111840 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:18:02,468-Speed 2626.09 samples/sec Loss 11.9913 LearningRate 0.0749 Epoch: 2 Global Step: 111850 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:18:06,367-Speed 2626.88 samples/sec Loss 12.0263 LearningRate 0.0749 Epoch: 2 Global Step: 111860 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:18:10,262-Speed 2629.51 samples/sec Loss 11.9425 LearningRate 0.0748 Epoch: 2 Global Step: 111870 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:18:14,177-Speed 2616.55 samples/sec Loss 11.9433 LearningRate 0.0748 Epoch: 2 Global Step: 111880 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:18:18,126-Speed 2593.59 samples/sec Loss 11.9493 LearningRate 0.0748 Epoch: 2 Global Step: 111890 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:18:22,019-Speed 2631.50 samples/sec Loss 11.9988 LearningRate 0.0748 Epoch: 2 Global Step: 111900 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:18:25,920-Speed 2625.92 samples/sec Loss 11.9539 LearningRate 0.0748 Epoch: 2 Global Step: 111910 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:18:29,818-Speed 2627.90 samples/sec Loss 11.9408 LearningRate 0.0748 Epoch: 2 Global Step: 111920 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:18:33,700-Speed 2638.59 samples/sec Loss 12.1026 LearningRate 0.0748 Epoch: 2 Global Step: 111930 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:18:37,590-Speed 2632.79 samples/sec Loss 12.0664 LearningRate 0.0748 Epoch: 2 Global Step: 111940 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:18:41,485-Speed 2629.02 samples/sec Loss 12.0809 LearningRate 0.0748 Epoch: 2 Global Step: 111950 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:18:45,379-Speed 2631.22 samples/sec Loss 12.1411 LearningRate 0.0748 Epoch: 2 Global Step: 111960 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:18:49,274-Speed 2629.36 samples/sec Loss 12.0386 LearningRate 0.0748 Epoch: 2 Global Step: 111970 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:18:53,168-Speed 2631.02 samples/sec Loss 12.2088 LearningRate 0.0748 Epoch: 2 Global Step: 111980 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:18:57,061-Speed 2631.01 samples/sec Loss 12.0630 LearningRate 0.0748 Epoch: 2 Global Step: 111990 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:19:00,953-Speed 2631.97 samples/sec Loss 12.0293 LearningRate 0.0748 Epoch: 2 Global Step: 112000 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:19:04,855-Speed 2624.50 samples/sec Loss 11.9446 LearningRate 0.0748 Epoch: 2 Global Step: 112010 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:19:08,760-Speed 2622.90 samples/sec Loss 11.9822 LearningRate 0.0748 Epoch: 2 Global Step: 112020 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:19:12,668-Speed 2620.80 samples/sec Loss 12.1547 LearningRate 0.0748 Epoch: 2 Global Step: 112030 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:19:16,637-Speed 2580.86 samples/sec Loss 12.0148 LearningRate 0.0748 Epoch: 2 Global Step: 112040 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:19:20,619-Speed 2572.68 samples/sec Loss 12.0033 LearningRate 0.0748 Epoch: 2 Global Step: 112050 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:19:24,524-Speed 2622.88 samples/sec Loss 12.0635 LearningRate 0.0748 Epoch: 2 Global Step: 112060 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:19:28,418-Speed 2629.65 samples/sec Loss 11.9274 LearningRate 0.0748 Epoch: 2 Global Step: 112070 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:19:32,293-Speed 2644.20 samples/sec Loss 12.0352 LearningRate 0.0748 Epoch: 2 Global Step: 112080 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:19:36,218-Speed 2609.38 samples/sec Loss 12.0042 LearningRate 0.0748 Epoch: 2 Global Step: 112090 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:19:40,121-Speed 2624.24 samples/sec Loss 11.9203 LearningRate 0.0748 Epoch: 2 Global Step: 112100 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:19:44,027-Speed 2621.75 samples/sec Loss 11.8764 LearningRate 0.0748 Epoch: 2 Global Step: 112110 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:19:47,925-Speed 2627.78 samples/sec Loss 11.9597 LearningRate 0.0748 Epoch: 2 Global Step: 112120 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:19:51,824-Speed 2627.12 samples/sec Loss 11.9333 LearningRate 0.0748 Epoch: 2 Global Step: 112130 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:19:55,724-Speed 2626.53 samples/sec Loss 12.0401 LearningRate 0.0748 Epoch: 2 Global Step: 112140 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:19:59,619-Speed 2629.36 samples/sec Loss 11.9055 LearningRate 0.0748 Epoch: 2 Global Step: 112150 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:20:03,524-Speed 2622.67 samples/sec Loss 12.1210 LearningRate 0.0748 Epoch: 2 Global Step: 112160 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:20:07,429-Speed 2622.77 samples/sec Loss 11.8737 LearningRate 0.0748 Epoch: 2 Global Step: 112170 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:20:11,316-Speed 2635.16 samples/sec Loss 12.0191 LearningRate 0.0748 Epoch: 2 Global Step: 112180 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:20:15,229-Speed 2617.90 samples/sec Loss 12.1105 LearningRate 0.0748 Epoch: 2 Global Step: 112190 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:20:19,135-Speed 2621.84 samples/sec Loss 12.0320 LearningRate 0.0748 Epoch: 2 Global Step: 112200 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:20:23,030-Speed 2629.85 samples/sec Loss 12.0288 LearningRate 0.0748 Epoch: 2 Global Step: 112210 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:20:26,925-Speed 2629.89 samples/sec Loss 11.9828 LearningRate 0.0748 Epoch: 2 Global Step: 112220 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:20:30,821-Speed 2628.71 samples/sec Loss 12.1044 LearningRate 0.0748 Epoch: 2 Global Step: 112230 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:20:34,720-Speed 2627.60 samples/sec Loss 11.9982 LearningRate 0.0748 Epoch: 2 Global Step: 112240 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:20:38,648-Speed 2607.57 samples/sec Loss 11.9456 LearningRate 0.0748 Epoch: 2 Global Step: 112250 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:20:42,547-Speed 2626.75 samples/sec Loss 12.1191 LearningRate 0.0748 Epoch: 2 Global Step: 112260 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:20:46,442-Speed 2629.77 samples/sec Loss 12.1059 LearningRate 0.0748 Epoch: 2 Global Step: 112270 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:20:50,343-Speed 2625.79 samples/sec Loss 11.8242 LearningRate 0.0748 Epoch: 2 Global Step: 112280 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:20:54,241-Speed 2627.90 samples/sec Loss 12.0107 LearningRate 0.0748 Epoch: 2 Global Step: 112290 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:20:58,134-Speed 2631.25 samples/sec Loss 12.0785 LearningRate 0.0748 Epoch: 2 Global Step: 112300 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:21:02,028-Speed 2630.04 samples/sec Loss 11.9353 LearningRate 0.0748 Epoch: 2 Global Step: 112310 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:21:05,929-Speed 2626.49 samples/sec Loss 11.9822 LearningRate 0.0748 Epoch: 2 Global Step: 112320 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:21:09,836-Speed 2621.69 samples/sec Loss 11.8550 LearningRate 0.0748 Epoch: 2 Global Step: 112330 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:21:13,735-Speed 2626.46 samples/sec Loss 12.0751 LearningRate 0.0748 Epoch: 2 Global Step: 112340 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:21:17,634-Speed 2626.72 samples/sec Loss 12.0883 LearningRate 0.0747 Epoch: 2 Global Step: 112350 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:21:21,538-Speed 2624.18 samples/sec Loss 12.1154 LearningRate 0.0747 Epoch: 2 Global Step: 112360 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:21:25,439-Speed 2625.78 samples/sec Loss 11.9502 LearningRate 0.0747 Epoch: 2 Global Step: 112370 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:21:29,340-Speed 2625.69 samples/sec Loss 12.1603 LearningRate 0.0747 Epoch: 2 Global Step: 112380 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:21:33,240-Speed 2626.39 samples/sec Loss 12.0122 LearningRate 0.0747 Epoch: 2 Global Step: 112390 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:21:37,117-Speed 2641.97 samples/sec Loss 11.8769 LearningRate 0.0747 Epoch: 2 Global Step: 112400 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:21:41,011-Speed 2629.94 samples/sec Loss 12.0427 LearningRate 0.0747 Epoch: 2 Global Step: 112410 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:21:44,930-Speed 2613.79 samples/sec Loss 12.1626 LearningRate 0.0747 Epoch: 2 Global Step: 112420 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:21:48,827-Speed 2627.66 samples/sec Loss 12.0736 LearningRate 0.0747 Epoch: 2 Global Step: 112430 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:21:52,724-Speed 2628.65 samples/sec Loss 12.0678 LearningRate 0.0747 Epoch: 2 Global Step: 112440 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:21:56,618-Speed 2630.79 samples/sec Loss 12.0033 LearningRate 0.0747 Epoch: 2 Global Step: 112450 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:22:00,513-Speed 2629.99 samples/sec Loss 12.2211 LearningRate 0.0747 Epoch: 2 Global Step: 112460 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:22:04,406-Speed 2631.07 samples/sec Loss 12.2069 LearningRate 0.0747 Epoch: 2 Global Step: 112470 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:22:08,302-Speed 2628.36 samples/sec Loss 12.0117 LearningRate 0.0747 Epoch: 2 Global Step: 112480 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:22:12,203-Speed 2625.66 samples/sec Loss 11.8441 LearningRate 0.0747 Epoch: 2 Global Step: 112490 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:22:16,113-Speed 2619.67 samples/sec Loss 12.0361 LearningRate 0.0747 Epoch: 2 Global Step: 112500 Fp16 Grad Scale: 262144 Required: 81 hours
Training: 2022-04-13 08:22:19,989-Speed 2642.85 samples/sec Loss 12.1023 LearningRate 0.0747 Epoch: 2 Global Step: 112510 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:22:23,885-Speed 2629.12 samples/sec Loss 12.2041 LearningRate 0.0747 Epoch: 2 Global Step: 112520 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:22:27,787-Speed 2624.73 samples/sec Loss 11.9451 LearningRate 0.0747 Epoch: 2 Global Step: 112530 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:22:31,670-Speed 2638.69 samples/sec Loss 11.8222 LearningRate 0.0747 Epoch: 2 Global Step: 112540 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:22:35,565-Speed 2629.62 samples/sec Loss 12.0565 LearningRate 0.0747 Epoch: 2 Global Step: 112550 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:22:39,465-Speed 2625.90 samples/sec Loss 11.8290 LearningRate 0.0747 Epoch: 2 Global Step: 112560 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:22:43,373-Speed 2620.97 samples/sec Loss 11.7630 LearningRate 0.0747 Epoch: 2 Global Step: 112570 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:22:47,272-Speed 2627.61 samples/sec Loss 11.9883 LearningRate 0.0747 Epoch: 2 Global Step: 112580 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:22:51,167-Speed 2629.00 samples/sec Loss 11.9720 LearningRate 0.0747 Epoch: 2 Global Step: 112590 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:22:55,091-Speed 2610.76 samples/sec Loss 12.0850 LearningRate 0.0747 Epoch: 2 Global Step: 112600 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:22:58,983-Speed 2631.75 samples/sec Loss 12.0889 LearningRate 0.0747 Epoch: 2 Global Step: 112610 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:23:02,878-Speed 2629.16 samples/sec Loss 12.1169 LearningRate 0.0747 Epoch: 2 Global Step: 112620 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:23:06,773-Speed 2630.08 samples/sec Loss 11.9978 LearningRate 0.0747 Epoch: 2 Global Step: 112630 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:23:10,666-Speed 2631.18 samples/sec Loss 12.0756 LearningRate 0.0747 Epoch: 2 Global Step: 112640 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:23:14,559-Speed 2631.07 samples/sec Loss 12.0119 LearningRate 0.0747 Epoch: 2 Global Step: 112650 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:23:18,455-Speed 2629.14 samples/sec Loss 11.9005 LearningRate 0.0747 Epoch: 2 Global Step: 112660 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:23:22,360-Speed 2623.11 samples/sec Loss 12.1449 LearningRate 0.0747 Epoch: 2 Global Step: 112670 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:23:26,252-Speed 2631.50 samples/sec Loss 12.0196 LearningRate 0.0747 Epoch: 2 Global Step: 112680 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:23:30,148-Speed 2628.83 samples/sec Loss 12.0416 LearningRate 0.0747 Epoch: 2 Global Step: 112690 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:23:34,044-Speed 2629.00 samples/sec Loss 12.0107 LearningRate 0.0747 Epoch: 2 Global Step: 112700 Fp16 Grad Scale: 131072 Required: 81 hours
Training: 2022-04-13 08:23:37,882-Speed 2668.43 samples/sec Loss 12.0006 LearningRate 0.0747 Epoch: 2 Global Step: 112710 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 08:23:41,780-Speed 2628.26 samples/sec Loss 12.1892 LearningRate 0.0747 Epoch: 2 Global Step: 112720 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 08:23:45,670-Speed 2632.70 samples/sec Loss 11.8504 LearningRate 0.0747 Epoch: 2 Global Step: 112730 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 08:23:49,560-Speed 2632.87 samples/sec Loss 11.9023 LearningRate 0.0747 Epoch: 2 Global Step: 112740 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 08:23:53,451-Speed 2632.48 samples/sec Loss 12.1347 LearningRate 0.0747 Epoch: 2 Global Step: 112750 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 08:23:57,347-Speed 2628.98 samples/sec Loss 12.1371 LearningRate 0.0747 Epoch: 2 Global Step: 112760 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 08:24:01,240-Speed 2631.43 samples/sec Loss 11.9363 LearningRate 0.0747 Epoch: 2 Global Step: 112770 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 08:24:05,156-Speed 2615.51 samples/sec Loss 12.0157 LearningRate 0.0747 Epoch: 2 Global Step: 112780 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 08:24:09,050-Speed 2630.36 samples/sec Loss 11.8309 LearningRate 0.0747 Epoch: 2 Global Step: 112790 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 08:24:12,948-Speed 2627.63 samples/sec Loss 12.1599 LearningRate 0.0747 Epoch: 2 Global Step: 112800 Fp16 Grad Scale: 16384 Required: 81 hours
Training: 2022-04-13 08:24:16,844-Speed 2628.86 samples/sec Loss 11.9169 LearningRate 0.0747 Epoch: 2 Global Step: 112810 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:24:20,743-Speed 2626.67 samples/sec Loss 12.1383 LearningRate 0.0747 Epoch: 2 Global Step: 112820 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:24:24,648-Speed 2623.03 samples/sec Loss 12.0775 LearningRate 0.0746 Epoch: 2 Global Step: 112830 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:24:28,556-Speed 2621.12 samples/sec Loss 11.9599 LearningRate 0.0746 Epoch: 2 Global Step: 112840 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:24:32,453-Speed 2628.49 samples/sec Loss 12.0101 LearningRate 0.0746 Epoch: 2 Global Step: 112850 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:24:36,348-Speed 2629.43 samples/sec Loss 11.9501 LearningRate 0.0746 Epoch: 2 Global Step: 112860 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:24:40,243-Speed 2629.81 samples/sec Loss 11.9731 LearningRate 0.0746 Epoch: 2 Global Step: 112870 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:24:44,135-Speed 2631.67 samples/sec Loss 12.0303 LearningRate 0.0746 Epoch: 2 Global Step: 112880 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:24:48,031-Speed 2628.87 samples/sec Loss 12.0651 LearningRate 0.0746 Epoch: 2 Global Step: 112890 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:24:51,922-Speed 2632.33 samples/sec Loss 12.0153 LearningRate 0.0746 Epoch: 2 Global Step: 112900 Fp16 Grad Scale: 32768 Required: 81 hours
Training: 2022-04-13 08:24:55,860-Speed 2600.97 samples/sec Loss 12.0787 LearningRate 0.0746 Epoch: 2 Global Step: 112910 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:24:59,775-Speed 2616.16 samples/sec Loss 12.3927 LearningRate 0.0746 Epoch: 2 Global Step: 112920 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:25:03,672-Speed 2628.47 samples/sec Loss 12.2308 LearningRate 0.0746 Epoch: 2 Global Step: 112930 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:25:07,571-Speed 2626.37 samples/sec Loss 12.1560 LearningRate 0.0746 Epoch: 2 Global Step: 112940 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:25:11,464-Speed 2631.56 samples/sec Loss 12.1164 LearningRate 0.0746 Epoch: 2 Global Step: 112950 Fp16 Grad Scale: 65536 Required: 81 hours
Training: 2022-04-13 08:25:15,358-Speed 2630.42 samples/sec Loss 12.1217 LearningRate 0.0746 Epoch: 2 Global Step: 112960 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:25:19,256-Speed 2627.37 samples/sec Loss 12.2234 LearningRate 0.0746 Epoch: 2 Global Step: 112970 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:25:23,149-Speed 2631.61 samples/sec Loss 12.1140 LearningRate 0.0746 Epoch: 2 Global Step: 112980 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:25:27,044-Speed 2629.36 samples/sec Loss 12.1619 LearningRate 0.0746 Epoch: 2 Global Step: 112990 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:25:30,980-Speed 2602.71 samples/sec Loss 12.0638 LearningRate 0.0746 Epoch: 2 Global Step: 113000 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:25:34,875-Speed 2629.69 samples/sec Loss 12.0059 LearningRate 0.0746 Epoch: 2 Global Step: 113010 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:25:38,766-Speed 2632.26 samples/sec Loss 11.8868 LearningRate 0.0746 Epoch: 2 Global Step: 113020 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:25:42,663-Speed 2628.48 samples/sec Loss 12.0018 LearningRate 0.0746 Epoch: 2 Global Step: 113030 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:25:46,555-Speed 2632.46 samples/sec Loss 12.1525 LearningRate 0.0746 Epoch: 2 Global Step: 113040 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:25:50,446-Speed 2632.34 samples/sec Loss 12.2091 LearningRate 0.0746 Epoch: 2 Global Step: 113050 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:25:54,357-Speed 2619.56 samples/sec Loss 12.1360 LearningRate 0.0746 Epoch: 2 Global Step: 113060 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:25:58,276-Speed 2613.65 samples/sec Loss 12.0051 LearningRate 0.0746 Epoch: 2 Global Step: 113070 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:26:02,168-Speed 2631.46 samples/sec Loss 12.0198 LearningRate 0.0746 Epoch: 2 Global Step: 113080 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:26:06,059-Speed 2632.51 samples/sec Loss 11.9788 LearningRate 0.0746 Epoch: 2 Global Step: 113090 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:26:09,952-Speed 2630.64 samples/sec Loss 11.9409 LearningRate 0.0746 Epoch: 2 Global Step: 113100 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:26:13,859-Speed 2621.33 samples/sec Loss 11.9200 LearningRate 0.0746 Epoch: 2 Global Step: 113110 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:26:17,758-Speed 2627.50 samples/sec Loss 12.0594 LearningRate 0.0746 Epoch: 2 Global Step: 113120 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:26:21,644-Speed 2636.47 samples/sec Loss 12.1731 LearningRate 0.0746 Epoch: 2 Global Step: 113130 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:26:25,542-Speed 2627.34 samples/sec Loss 12.0195 LearningRate 0.0746 Epoch: 2 Global Step: 113140 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:26:29,439-Speed 2628.36 samples/sec Loss 12.0515 LearningRate 0.0746 Epoch: 2 Global Step: 113150 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:26:33,331-Speed 2631.44 samples/sec Loss 12.1742 LearningRate 0.0746 Epoch: 2 Global Step: 113160 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:26:37,226-Speed 2629.50 samples/sec Loss 11.9841 LearningRate 0.0746 Epoch: 2 Global Step: 113170 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:26:41,131-Speed 2623.13 samples/sec Loss 11.9449 LearningRate 0.0746 Epoch: 2 Global Step: 113180 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:26:45,035-Speed 2623.12 samples/sec Loss 12.0150 LearningRate 0.0746 Epoch: 2 Global Step: 113190 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:26:48,924-Speed 2633.63 samples/sec Loss 12.1008 LearningRate 0.0746 Epoch: 2 Global Step: 113200 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:26:52,823-Speed 2627.22 samples/sec Loss 12.0444 LearningRate 0.0746 Epoch: 2 Global Step: 113210 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:26:56,737-Speed 2617.14 samples/sec Loss 11.9431 LearningRate 0.0746 Epoch: 2 Global Step: 113220 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:27:00,636-Speed 2627.10 samples/sec Loss 12.0265 LearningRate 0.0746 Epoch: 2 Global Step: 113230 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:27:04,583-Speed 2603.43 samples/sec Loss 12.0393 LearningRate 0.0746 Epoch: 2 Global Step: 113240 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:27:08,457-Speed 2643.71 samples/sec Loss 12.0966 LearningRate 0.0746 Epoch: 2 Global Step: 113250 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:27:12,359-Speed 2624.76 samples/sec Loss 12.0331 LearningRate 0.0746 Epoch: 2 Global Step: 113260 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:27:16,258-Speed 2626.74 samples/sec Loss 12.1442 LearningRate 0.0746 Epoch: 2 Global Step: 113270 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:27:20,153-Speed 2629.33 samples/sec Loss 12.0276 LearningRate 0.0746 Epoch: 2 Global Step: 113280 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:27:24,048-Speed 2629.92 samples/sec Loss 12.0091 LearningRate 0.0746 Epoch: 2 Global Step: 113290 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:27:27,940-Speed 2632.25 samples/sec Loss 12.1560 LearningRate 0.0746 Epoch: 2 Global Step: 113300 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:27:31,831-Speed 2631.94 samples/sec Loss 11.9394 LearningRate 0.0745 Epoch: 2 Global Step: 113310 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:27:35,732-Speed 2625.55 samples/sec Loss 12.2046 LearningRate 0.0745 Epoch: 2 Global Step: 113320 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:27:39,631-Speed 2627.02 samples/sec Loss 11.9135 LearningRate 0.0745 Epoch: 2 Global Step: 113330 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:27:43,527-Speed 2628.98 samples/sec Loss 12.1166 LearningRate 0.0745 Epoch: 2 Global Step: 113340 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:27:47,403-Speed 2642.37 samples/sec Loss 12.1063 LearningRate 0.0745 Epoch: 2 Global Step: 113350 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:27:51,296-Speed 2631.41 samples/sec Loss 11.8695 LearningRate 0.0745 Epoch: 2 Global Step: 113360 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:27:55,190-Speed 2630.78 samples/sec Loss 11.9594 LearningRate 0.0745 Epoch: 2 Global Step: 113370 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:27:59,082-Speed 2631.73 samples/sec Loss 12.2204 LearningRate 0.0745 Epoch: 2 Global Step: 113380 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:28:02,973-Speed 2632.30 samples/sec Loss 12.0029 LearningRate 0.0745 Epoch: 2 Global Step: 113390 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:28:06,874-Speed 2625.44 samples/sec Loss 11.9117 LearningRate 0.0745 Epoch: 2 Global Step: 113400 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:28:10,770-Speed 2629.52 samples/sec Loss 12.0550 LearningRate 0.0745 Epoch: 2 Global Step: 113410 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:28:14,674-Speed 2623.48 samples/sec Loss 12.1403 LearningRate 0.0745 Epoch: 2 Global Step: 113420 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:28:18,586-Speed 2617.53 samples/sec Loss 11.8616 LearningRate 0.0745 Epoch: 2 Global Step: 113430 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:28:22,478-Speed 2632.19 samples/sec Loss 11.9114 LearningRate 0.0745 Epoch: 2 Global Step: 113440 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:28:26,373-Speed 2630.01 samples/sec Loss 11.9408 LearningRate 0.0745 Epoch: 2 Global Step: 113450 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:28:30,270-Speed 2627.51 samples/sec Loss 11.8746 LearningRate 0.0745 Epoch: 2 Global Step: 113460 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:28:34,166-Speed 2629.33 samples/sec Loss 12.0750 LearningRate 0.0745 Epoch: 2 Global Step: 113470 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:28:38,058-Speed 2631.73 samples/sec Loss 12.0738 LearningRate 0.0745 Epoch: 2 Global Step: 113480 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:28:41,951-Speed 2631.43 samples/sec Loss 12.0993 LearningRate 0.0745 Epoch: 2 Global Step: 113490 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:28:45,860-Speed 2620.38 samples/sec Loss 12.0828 LearningRate 0.0745 Epoch: 2 Global Step: 113500 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:28:49,774-Speed 2616.84 samples/sec Loss 12.0595 LearningRate 0.0745 Epoch: 2 Global Step: 113510 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:28:53,667-Speed 2631.13 samples/sec Loss 12.0032 LearningRate 0.0745 Epoch: 2 Global Step: 113520 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:28:57,557-Speed 2632.96 samples/sec Loss 12.0009 LearningRate 0.0745 Epoch: 2 Global Step: 113530 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:29:01,449-Speed 2631.65 samples/sec Loss 12.1176 LearningRate 0.0745 Epoch: 2 Global Step: 113540 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:29:05,330-Speed 2638.84 samples/sec Loss 11.9068 LearningRate 0.0745 Epoch: 2 Global Step: 113550 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:29:09,216-Speed 2635.65 samples/sec Loss 12.0724 LearningRate 0.0745 Epoch: 2 Global Step: 113560 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:29:13,110-Speed 2631.00 samples/sec Loss 12.0748 LearningRate 0.0745 Epoch: 2 Global Step: 113570 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:29:17,004-Speed 2630.87 samples/sec Loss 11.9872 LearningRate 0.0745 Epoch: 2 Global Step: 113580 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:29:20,895-Speed 2631.85 samples/sec Loss 12.0834 LearningRate 0.0745 Epoch: 2 Global Step: 113590 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:29:24,790-Speed 2630.30 samples/sec Loss 12.0336 LearningRate 0.0745 Epoch: 2 Global Step: 113600 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:29:28,705-Speed 2616.10 samples/sec Loss 12.0215 LearningRate 0.0745 Epoch: 2 Global Step: 113610 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:29:32,599-Speed 2630.39 samples/sec Loss 12.0407 LearningRate 0.0745 Epoch: 2 Global Step: 113620 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:29:36,510-Speed 2618.68 samples/sec Loss 11.8106 LearningRate 0.0745 Epoch: 2 Global Step: 113630 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:29:40,414-Speed 2623.84 samples/sec Loss 12.0480 LearningRate 0.0745 Epoch: 2 Global Step: 113640 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:29:44,308-Speed 2630.41 samples/sec Loss 11.9142 LearningRate 0.0745 Epoch: 2 Global Step: 113650 Fp16 Grad Scale: 524288 Required: 80 hours
Training: 2022-04-13 08:29:48,186-Speed 2641.69 samples/sec Loss 11.9417 LearningRate 0.0745 Epoch: 2 Global Step: 113660 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:29:52,063-Speed 2641.96 samples/sec Loss 11.8973 LearningRate 0.0745 Epoch: 2 Global Step: 113670 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:29:55,960-Speed 2628.08 samples/sec Loss 11.9566 LearningRate 0.0745 Epoch: 2 Global Step: 113680 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:29:59,857-Speed 2628.41 samples/sec Loss 11.8970 LearningRate 0.0745 Epoch: 2 Global Step: 113690 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:30:03,757-Speed 2625.91 samples/sec Loss 12.0579 LearningRate 0.0745 Epoch: 2 Global Step: 113700 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:30:07,650-Speed 2630.70 samples/sec Loss 12.0125 LearningRate 0.0745 Epoch: 2 Global Step: 113710 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:30:11,548-Speed 2627.39 samples/sec Loss 11.8716 LearningRate 0.0745 Epoch: 2 Global Step: 113720 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:30:15,452-Speed 2624.32 samples/sec Loss 12.0256 LearningRate 0.0745 Epoch: 2 Global Step: 113730 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:30:19,370-Speed 2614.37 samples/sec Loss 11.9438 LearningRate 0.0745 Epoch: 2 Global Step: 113740 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:30:23,274-Speed 2623.86 samples/sec Loss 11.8747 LearningRate 0.0745 Epoch: 2 Global Step: 113750 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:30:27,168-Speed 2629.52 samples/sec Loss 11.9960 LearningRate 0.0745 Epoch: 2 Global Step: 113760 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:30:31,081-Speed 2618.46 samples/sec Loss 12.0666 LearningRate 0.0745 Epoch: 2 Global Step: 113770 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:30:34,957-Speed 2642.45 samples/sec Loss 11.9229 LearningRate 0.0745 Epoch: 2 Global Step: 113780 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:30:38,961-Speed 2557.89 samples/sec Loss 11.9780 LearningRate 0.0744 Epoch: 2 Global Step: 113790 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:30:43,024-Speed 2520.59 samples/sec Loss 11.9591 LearningRate 0.0744 Epoch: 2 Global Step: 113800 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:30:46,921-Speed 2628.54 samples/sec Loss 12.0863 LearningRate 0.0744 Epoch: 2 Global Step: 113810 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:30:50,942-Speed 2547.27 samples/sec Loss 12.1645 LearningRate 0.0744 Epoch: 2 Global Step: 113820 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:30:54,840-Speed 2628.19 samples/sec Loss 11.9533 LearningRate 0.0744 Epoch: 2 Global Step: 113830 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:30:58,735-Speed 2629.76 samples/sec Loss 12.0593 LearningRate 0.0744 Epoch: 2 Global Step: 113840 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:31:02,630-Speed 2629.65 samples/sec Loss 12.0969 LearningRate 0.0744 Epoch: 2 Global Step: 113850 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:31:06,551-Speed 2612.05 samples/sec Loss 12.1927 LearningRate 0.0744 Epoch: 2 Global Step: 113860 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:31:10,448-Speed 2628.13 samples/sec Loss 12.0592 LearningRate 0.0744 Epoch: 2 Global Step: 113870 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:31:14,420-Speed 2578.84 samples/sec Loss 12.1510 LearningRate 0.0744 Epoch: 2 Global Step: 113880 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:31:18,316-Speed 2629.16 samples/sec Loss 12.0039 LearningRate 0.0744 Epoch: 2 Global Step: 113890 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:31:22,213-Speed 2628.40 samples/sec Loss 11.9059 LearningRate 0.0744 Epoch: 2 Global Step: 113900 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:31:26,106-Speed 2630.72 samples/sec Loss 11.9005 LearningRate 0.0744 Epoch: 2 Global Step: 113910 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:31:30,005-Speed 2627.26 samples/sec Loss 12.0191 LearningRate 0.0744 Epoch: 2 Global Step: 113920 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:31:33,901-Speed 2629.13 samples/sec Loss 11.9930 LearningRate 0.0744 Epoch: 2 Global Step: 113930 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:31:37,788-Speed 2635.29 samples/sec Loss 11.9705 LearningRate 0.0744 Epoch: 2 Global Step: 113940 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:31:41,687-Speed 2626.68 samples/sec Loss 11.9495 LearningRate 0.0744 Epoch: 2 Global Step: 113950 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:31:45,591-Speed 2623.53 samples/sec Loss 12.0699 LearningRate 0.0744 Epoch: 2 Global Step: 113960 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:31:49,485-Speed 2630.45 samples/sec Loss 12.0650 LearningRate 0.0744 Epoch: 2 Global Step: 113970 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:31:53,433-Speed 2594.23 samples/sec Loss 11.9260 LearningRate 0.0744 Epoch: 2 Global Step: 113980 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:31:57,334-Speed 2625.31 samples/sec Loss 11.9731 LearningRate 0.0744 Epoch: 2 Global Step: 113990 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:32:01,216-Speed 2638.61 samples/sec Loss 11.9786 LearningRate 0.0744 Epoch: 2 Global Step: 114000 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:32:05,110-Speed 2630.76 samples/sec Loss 12.0235 LearningRate 0.0744 Epoch: 2 Global Step: 114010 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:32:09,014-Speed 2623.14 samples/sec Loss 12.0379 LearningRate 0.0744 Epoch: 2 Global Step: 114020 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:32:12,909-Speed 2629.78 samples/sec Loss 12.0443 LearningRate 0.0744 Epoch: 2 Global Step: 114030 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:32:16,801-Speed 2631.83 samples/sec Loss 11.8751 LearningRate 0.0744 Epoch: 2 Global Step: 114040 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:32:20,694-Speed 2631.07 samples/sec Loss 11.8452 LearningRate 0.0744 Epoch: 2 Global Step: 114050 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:32:24,599-Speed 2622.96 samples/sec Loss 11.8834 LearningRate 0.0744 Epoch: 2 Global Step: 114060 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:32:28,529-Speed 2606.32 samples/sec Loss 11.9960 LearningRate 0.0744 Epoch: 2 Global Step: 114070 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:32:32,434-Speed 2622.99 samples/sec Loss 11.9713 LearningRate 0.0744 Epoch: 2 Global Step: 114080 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:32:36,346-Speed 2618.79 samples/sec Loss 12.0899 LearningRate 0.0744 Epoch: 2 Global Step: 114090 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:32:40,258-Speed 2618.20 samples/sec Loss 12.0163 LearningRate 0.0744 Epoch: 2 Global Step: 114100 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:32:44,167-Speed 2619.73 samples/sec Loss 12.0542 LearningRate 0.0744 Epoch: 2 Global Step: 114110 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:32:48,079-Speed 2618.48 samples/sec Loss 12.0158 LearningRate 0.0744 Epoch: 2 Global Step: 114120 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:32:52,018-Speed 2600.17 samples/sec Loss 11.9240 LearningRate 0.0744 Epoch: 2 Global Step: 114130 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:32:55,925-Speed 2621.94 samples/sec Loss 12.1170 LearningRate 0.0744 Epoch: 2 Global Step: 114140 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:32:59,818-Speed 2631.23 samples/sec Loss 12.0347 LearningRate 0.0744 Epoch: 2 Global Step: 114150 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:33:03,714-Speed 2628.68 samples/sec Loss 11.9245 LearningRate 0.0744 Epoch: 2 Global Step: 114160 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:33:07,613-Speed 2627.43 samples/sec Loss 12.0317 LearningRate 0.0744 Epoch: 2 Global Step: 114170 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:33:11,509-Speed 2628.78 samples/sec Loss 11.9635 LearningRate 0.0744 Epoch: 2 Global Step: 114180 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:33:15,411-Speed 2624.85 samples/sec Loss 11.9462 LearningRate 0.0744 Epoch: 2 Global Step: 114190 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:33:19,370-Speed 2586.93 samples/sec Loss 11.9873 LearningRate 0.0744 Epoch: 2 Global Step: 114200 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:33:23,403-Speed 2539.98 samples/sec Loss 11.9778 LearningRate 0.0744 Epoch: 2 Global Step: 114210 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:33:27,300-Speed 2628.68 samples/sec Loss 12.0504 LearningRate 0.0744 Epoch: 2 Global Step: 114220 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:33:31,196-Speed 2628.68 samples/sec Loss 11.9754 LearningRate 0.0744 Epoch: 2 Global Step: 114230 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:33:35,096-Speed 2625.99 samples/sec Loss 12.0063 LearningRate 0.0744 Epoch: 2 Global Step: 114240 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:33:38,992-Speed 2629.13 samples/sec Loss 11.9695 LearningRate 0.0744 Epoch: 2 Global Step: 114250 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:33:42,908-Speed 2614.93 samples/sec Loss 11.7187 LearningRate 0.0744 Epoch: 2 Global Step: 114260 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:33:46,805-Speed 2628.35 samples/sec Loss 11.8735 LearningRate 0.0743 Epoch: 2 Global Step: 114270 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:33:50,696-Speed 2632.91 samples/sec Loss 12.0454 LearningRate 0.0743 Epoch: 2 Global Step: 114280 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:33:54,591-Speed 2629.56 samples/sec Loss 11.8948 LearningRate 0.0743 Epoch: 2 Global Step: 114290 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:33:58,480-Speed 2633.91 samples/sec Loss 11.8688 LearningRate 0.0743 Epoch: 2 Global Step: 114300 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:34:02,376-Speed 2629.04 samples/sec Loss 12.1054 LearningRate 0.0743 Epoch: 2 Global Step: 114310 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:34:06,256-Speed 2639.71 samples/sec Loss 12.1213 LearningRate 0.0743 Epoch: 2 Global Step: 114320 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:34:10,150-Speed 2630.20 samples/sec Loss 11.9852 LearningRate 0.0743 Epoch: 2 Global Step: 114330 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:34:14,040-Speed 2633.01 samples/sec Loss 11.9762 LearningRate 0.0743 Epoch: 2 Global Step: 114340 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:34:17,934-Speed 2629.85 samples/sec Loss 11.9954 LearningRate 0.0743 Epoch: 2 Global Step: 114350 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:34:21,831-Speed 2628.72 samples/sec Loss 12.0518 LearningRate 0.0743 Epoch: 2 Global Step: 114360 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:34:25,725-Speed 2630.04 samples/sec Loss 11.9083 LearningRate 0.0743 Epoch: 2 Global Step: 114370 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:34:29,643-Speed 2614.50 samples/sec Loss 12.0162 LearningRate 0.0743 Epoch: 2 Global Step: 114380 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:34:33,551-Speed 2621.02 samples/sec Loss 12.0142 LearningRate 0.0743 Epoch: 2 Global Step: 114390 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:34:37,523-Speed 2579.33 samples/sec Loss 11.9710 LearningRate 0.0743 Epoch: 2 Global Step: 114400 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:34:41,423-Speed 2626.23 samples/sec Loss 11.9018 LearningRate 0.0743 Epoch: 2 Global Step: 114410 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:34:45,322-Speed 2627.42 samples/sec Loss 12.0350 LearningRate 0.0743 Epoch: 2 Global Step: 114420 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:34:49,231-Speed 2619.61 samples/sec Loss 12.1293 LearningRate 0.0743 Epoch: 2 Global Step: 114430 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:34:53,135-Speed 2623.88 samples/sec Loss 11.9815 LearningRate 0.0743 Epoch: 2 Global Step: 114440 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:34:57,026-Speed 2632.61 samples/sec Loss 11.9595 LearningRate 0.0743 Epoch: 2 Global Step: 114450 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:35:00,916-Speed 2633.32 samples/sec Loss 11.9234 LearningRate 0.0743 Epoch: 2 Global Step: 114460 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:35:04,790-Speed 2643.55 samples/sec Loss 11.8120 LearningRate 0.0743 Epoch: 2 Global Step: 114470 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:35:08,771-Speed 2573.01 samples/sec Loss 11.8705 LearningRate 0.0743 Epoch: 2 Global Step: 114480 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:35:12,676-Speed 2623.35 samples/sec Loss 12.0225 LearningRate 0.0743 Epoch: 2 Global Step: 114490 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:35:16,566-Speed 2633.39 samples/sec Loss 11.9153 LearningRate 0.0743 Epoch: 2 Global Step: 114500 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:35:20,454-Speed 2634.23 samples/sec Loss 12.0544 LearningRate 0.0743 Epoch: 2 Global Step: 114510 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:35:24,354-Speed 2626.49 samples/sec Loss 11.9379 LearningRate 0.0743 Epoch: 2 Global Step: 114520 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:35:28,244-Speed 2632.86 samples/sec Loss 12.0191 LearningRate 0.0743 Epoch: 2 Global Step: 114530 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:35:32,137-Speed 2631.24 samples/sec Loss 12.0199 LearningRate 0.0743 Epoch: 2 Global Step: 114540 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:35:36,026-Speed 2633.62 samples/sec Loss 12.1558 LearningRate 0.0743 Epoch: 2 Global Step: 114550 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:35:39,915-Speed 2633.30 samples/sec Loss 12.1010 LearningRate 0.0743 Epoch: 2 Global Step: 114560 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:35:43,808-Speed 2631.26 samples/sec Loss 12.2688 LearningRate 0.0743 Epoch: 2 Global Step: 114570 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:35:47,688-Speed 2639.65 samples/sec Loss 12.0816 LearningRate 0.0743 Epoch: 2 Global Step: 114580 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:35:51,579-Speed 2632.66 samples/sec Loss 12.0359 LearningRate 0.0743 Epoch: 2 Global Step: 114590 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:35:55,475-Speed 2629.54 samples/sec Loss 11.9791 LearningRate 0.0743 Epoch: 2 Global Step: 114600 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:35:59,370-Speed 2629.05 samples/sec Loss 12.0302 LearningRate 0.0743 Epoch: 2 Global Step: 114610 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:36:03,278-Speed 2620.84 samples/sec Loss 12.0294 LearningRate 0.0743 Epoch: 2 Global Step: 114620 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:36:07,168-Speed 2633.17 samples/sec Loss 12.1009 LearningRate 0.0743 Epoch: 2 Global Step: 114630 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:36:11,070-Speed 2625.08 samples/sec Loss 12.0950 LearningRate 0.0743 Epoch: 2 Global Step: 114640 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:36:14,986-Speed 2615.80 samples/sec Loss 12.1627 LearningRate 0.0743 Epoch: 2 Global Step: 114650 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:36:18,881-Speed 2629.79 samples/sec Loss 12.1475 LearningRate 0.0743 Epoch: 2 Global Step: 114660 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:36:22,801-Speed 2612.63 samples/sec Loss 12.0242 LearningRate 0.0743 Epoch: 2 Global Step: 114670 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:36:26,719-Speed 2614.34 samples/sec Loss 12.0991 LearningRate 0.0743 Epoch: 2 Global Step: 114680 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:36:30,622-Speed 2624.12 samples/sec Loss 12.0551 LearningRate 0.0743 Epoch: 2 Global Step: 114690 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:36:34,530-Speed 2620.92 samples/sec Loss 12.0762 LearningRate 0.0743 Epoch: 2 Global Step: 114700 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:36:38,429-Speed 2626.99 samples/sec Loss 12.1232 LearningRate 0.0743 Epoch: 2 Global Step: 114710 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:36:42,330-Speed 2625.64 samples/sec Loss 11.9574 LearningRate 0.0743 Epoch: 2 Global Step: 114720 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:36:46,228-Speed 2627.34 samples/sec Loss 12.0588 LearningRate 0.0743 Epoch: 2 Global Step: 114730 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:36:50,127-Speed 2627.79 samples/sec Loss 12.0740 LearningRate 0.0743 Epoch: 2 Global Step: 114740 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:36:54,001-Speed 2643.93 samples/sec Loss 12.0796 LearningRate 0.0742 Epoch: 2 Global Step: 114750 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:36:57,891-Speed 2633.10 samples/sec Loss 12.0184 LearningRate 0.0742 Epoch: 2 Global Step: 114760 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:37:01,783-Speed 2631.16 samples/sec Loss 12.0376 LearningRate 0.0742 Epoch: 2 Global Step: 114770 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:37:05,679-Speed 2628.75 samples/sec Loss 11.8753 LearningRate 0.0742 Epoch: 2 Global Step: 114780 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:37:09,573-Speed 2630.65 samples/sec Loss 12.0686 LearningRate 0.0742 Epoch: 2 Global Step: 114790 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:37:13,463-Speed 2633.44 samples/sec Loss 12.0181 LearningRate 0.0742 Epoch: 2 Global Step: 114800 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:37:17,354-Speed 2632.72 samples/sec Loss 11.8307 LearningRate 0.0742 Epoch: 2 Global Step: 114810 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:37:21,248-Speed 2630.70 samples/sec Loss 12.0137 LearningRate 0.0742 Epoch: 2 Global Step: 114820 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:37:25,172-Speed 2610.38 samples/sec Loss 11.9836 LearningRate 0.0742 Epoch: 2 Global Step: 114830 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:37:29,071-Speed 2626.37 samples/sec Loss 11.9255 LearningRate 0.0742 Epoch: 2 Global Step: 114840 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:37:32,964-Speed 2631.36 samples/sec Loss 12.0046 LearningRate 0.0742 Epoch: 2 Global Step: 114850 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:37:36,854-Speed 2632.60 samples/sec Loss 11.9565 LearningRate 0.0742 Epoch: 2 Global Step: 114860 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:37:40,743-Speed 2634.21 samples/sec Loss 12.0128 LearningRate 0.0742 Epoch: 2 Global Step: 114870 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:37:44,636-Speed 2630.77 samples/sec Loss 12.0423 LearningRate 0.0742 Epoch: 2 Global Step: 114880 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:37:48,531-Speed 2629.75 samples/sec Loss 11.9277 LearningRate 0.0742 Epoch: 2 Global Step: 114890 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:37:52,425-Speed 2630.56 samples/sec Loss 11.9826 LearningRate 0.0742 Epoch: 2 Global Step: 114900 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:37:56,320-Speed 2630.03 samples/sec Loss 11.9586 LearningRate 0.0742 Epoch: 2 Global Step: 114910 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:38:00,211-Speed 2632.03 samples/sec Loss 12.0306 LearningRate 0.0742 Epoch: 2 Global Step: 114920 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:38:04,106-Speed 2629.39 samples/sec Loss 11.9454 LearningRate 0.0742 Epoch: 2 Global Step: 114930 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:38:08,018-Speed 2617.79 samples/sec Loss 12.0190 LearningRate 0.0742 Epoch: 2 Global Step: 114940 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:38:11,896-Speed 2641.84 samples/sec Loss 11.9858 LearningRate 0.0742 Epoch: 2 Global Step: 114950 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:38:15,789-Speed 2630.88 samples/sec Loss 12.0955 LearningRate 0.0742 Epoch: 2 Global Step: 114960 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:38:19,682-Speed 2630.79 samples/sec Loss 12.0120 LearningRate 0.0742 Epoch: 2 Global Step: 114970 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:38:23,584-Speed 2625.04 samples/sec Loss 12.0063 LearningRate 0.0742 Epoch: 2 Global Step: 114980 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:38:27,476-Speed 2631.76 samples/sec Loss 11.9452 LearningRate 0.0742 Epoch: 2 Global Step: 114990 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:38:31,371-Speed 2630.00 samples/sec Loss 11.9826 LearningRate 0.0742 Epoch: 2 Global Step: 115000 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:38:35,266-Speed 2629.64 samples/sec Loss 12.0461 LearningRate 0.0742 Epoch: 2 Global Step: 115010 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:38:39,160-Speed 2629.67 samples/sec Loss 11.9784 LearningRate 0.0742 Epoch: 2 Global Step: 115020 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:38:43,062-Speed 2624.88 samples/sec Loss 12.0209 LearningRate 0.0742 Epoch: 2 Global Step: 115030 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:38:46,956-Speed 2630.47 samples/sec Loss 11.9906 LearningRate 0.0742 Epoch: 2 Global Step: 115040 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:38:50,853-Speed 2627.88 samples/sec Loss 11.9933 LearningRate 0.0742 Epoch: 2 Global Step: 115050 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:38:54,749-Speed 2629.30 samples/sec Loss 12.0385 LearningRate 0.0742 Epoch: 2 Global Step: 115060 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:38:58,647-Speed 2628.14 samples/sec Loss 11.9482 LearningRate 0.0742 Epoch: 2 Global Step: 115070 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:39:02,541-Speed 2630.32 samples/sec Loss 12.0451 LearningRate 0.0742 Epoch: 2 Global Step: 115080 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:39:06,437-Speed 2629.17 samples/sec Loss 11.9456 LearningRate 0.0742 Epoch: 2 Global Step: 115090 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:39:10,332-Speed 2629.15 samples/sec Loss 11.8063 LearningRate 0.0742 Epoch: 2 Global Step: 115100 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:39:14,211-Speed 2640.57 samples/sec Loss 11.8489 LearningRate 0.0742 Epoch: 2 Global Step: 115110 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:39:18,113-Speed 2624.75 samples/sec Loss 12.0178 LearningRate 0.0742 Epoch: 2 Global Step: 115120 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:39:22,208-Speed 2501.21 samples/sec Loss 11.9213 LearningRate 0.0742 Epoch: 2 Global Step: 115130 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:39:26,285-Speed 2512.44 samples/sec Loss 11.8318 LearningRate 0.0742 Epoch: 2 Global Step: 115140 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:39:30,362-Speed 2512.45 samples/sec Loss 11.8547 LearningRate 0.0742 Epoch: 2 Global Step: 115150 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:39:34,428-Speed 2518.99 samples/sec Loss 12.0327 LearningRate 0.0742 Epoch: 2 Global Step: 115160 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:39:38,337-Speed 2620.27 samples/sec Loss 12.0176 LearningRate 0.0742 Epoch: 2 Global Step: 115170 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:39:42,232-Speed 2629.70 samples/sec Loss 11.7844 LearningRate 0.0742 Epoch: 2 Global Step: 115180 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:39:46,131-Speed 2626.70 samples/sec Loss 12.0369 LearningRate 0.0742 Epoch: 2 Global Step: 115190 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:39:50,029-Speed 2627.29 samples/sec Loss 11.9647 LearningRate 0.0742 Epoch: 2 Global Step: 115200 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:39:53,927-Speed 2628.28 samples/sec Loss 12.1206 LearningRate 0.0742 Epoch: 2 Global Step: 115210 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:39:57,820-Speed 2630.76 samples/sec Loss 11.9750 LearningRate 0.0742 Epoch: 2 Global Step: 115220 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:40:01,724-Speed 2623.61 samples/sec Loss 12.0770 LearningRate 0.0741 Epoch: 2 Global Step: 115230 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:40:05,620-Speed 2629.71 samples/sec Loss 11.9368 LearningRate 0.0741 Epoch: 2 Global Step: 115240 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:40:09,524-Speed 2623.54 samples/sec Loss 12.0570 LearningRate 0.0741 Epoch: 2 Global Step: 115250 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:40:13,423-Speed 2626.49 samples/sec Loss 11.9952 LearningRate 0.0741 Epoch: 2 Global Step: 115260 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:40:17,350-Speed 2608.51 samples/sec Loss 11.9866 LearningRate 0.0741 Epoch: 2 Global Step: 115270 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:40:21,249-Speed 2626.88 samples/sec Loss 11.9910 LearningRate 0.0741 Epoch: 2 Global Step: 115280 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:40:25,167-Speed 2614.47 samples/sec Loss 11.9483 LearningRate 0.0741 Epoch: 2 Global Step: 115290 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:40:29,077-Speed 2619.58 samples/sec Loss 11.9838 LearningRate 0.0741 Epoch: 2 Global Step: 115300 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:40:32,954-Speed 2642.27 samples/sec Loss 12.1201 LearningRate 0.0741 Epoch: 2 Global Step: 115310 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:40:36,843-Speed 2633.62 samples/sec Loss 11.9510 LearningRate 0.0741 Epoch: 2 Global Step: 115320 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:40:40,748-Speed 2622.61 samples/sec Loss 12.0418 LearningRate 0.0741 Epoch: 2 Global Step: 115330 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:40:44,644-Speed 2629.70 samples/sec Loss 11.9906 LearningRate 0.0741 Epoch: 2 Global Step: 115340 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:40:48,538-Speed 2630.61 samples/sec Loss 11.9791 LearningRate 0.0741 Epoch: 2 Global Step: 115350 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:40:52,440-Speed 2624.86 samples/sec Loss 12.0769 LearningRate 0.0741 Epoch: 2 Global Step: 115360 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:40:56,332-Speed 2631.38 samples/sec Loss 11.9445 LearningRate 0.0741 Epoch: 2 Global Step: 115370 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:41:00,231-Speed 2626.71 samples/sec Loss 11.9826 LearningRate 0.0741 Epoch: 2 Global Step: 115380 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:41:04,124-Speed 2631.18 samples/sec Loss 11.9924 LearningRate 0.0741 Epoch: 2 Global Step: 115390 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:41:08,013-Speed 2633.52 samples/sec Loss 12.1279 LearningRate 0.0741 Epoch: 2 Global Step: 115400 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:41:11,890-Speed 2642.22 samples/sec Loss 12.0731 LearningRate 0.0741 Epoch: 2 Global Step: 115410 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:41:15,786-Speed 2628.96 samples/sec Loss 12.1671 LearningRate 0.0741 Epoch: 2 Global Step: 115420 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:41:19,660-Speed 2643.61 samples/sec Loss 12.0916 LearningRate 0.0741 Epoch: 2 Global Step: 115430 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:41:23,561-Speed 2625.64 samples/sec Loss 12.1140 LearningRate 0.0741 Epoch: 2 Global Step: 115440 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:41:27,462-Speed 2625.49 samples/sec Loss 12.1444 LearningRate 0.0741 Epoch: 2 Global Step: 115450 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:41:31,346-Speed 2637.19 samples/sec Loss 11.8297 LearningRate 0.0741 Epoch: 2 Global Step: 115460 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:41:35,251-Speed 2622.22 samples/sec Loss 12.0184 LearningRate 0.0741 Epoch: 2 Global Step: 115470 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:41:39,157-Speed 2622.56 samples/sec Loss 12.0274 LearningRate 0.0741 Epoch: 2 Global Step: 115480 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:41:43,067-Speed 2620.05 samples/sec Loss 12.0270 LearningRate 0.0741 Epoch: 2 Global Step: 115490 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:41:46,961-Speed 2630.10 samples/sec Loss 11.9350 LearningRate 0.0741 Epoch: 2 Global Step: 115500 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:41:50,854-Speed 2630.96 samples/sec Loss 11.9493 LearningRate 0.0741 Epoch: 2 Global Step: 115510 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:41:54,752-Speed 2627.35 samples/sec Loss 11.9926 LearningRate 0.0741 Epoch: 2 Global Step: 115520 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:41:58,668-Speed 2615.59 samples/sec Loss 12.0215 LearningRate 0.0741 Epoch: 2 Global Step: 115530 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:42:02,566-Speed 2627.58 samples/sec Loss 11.9333 LearningRate 0.0741 Epoch: 2 Global Step: 115540 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:42:06,459-Speed 2630.85 samples/sec Loss 11.9869 LearningRate 0.0741 Epoch: 2 Global Step: 115550 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:42:10,353-Speed 2630.29 samples/sec Loss 12.0923 LearningRate 0.0741 Epoch: 2 Global Step: 115560 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:42:14,253-Speed 2626.75 samples/sec Loss 11.9746 LearningRate 0.0741 Epoch: 2 Global Step: 115570 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:42:18,146-Speed 2631.42 samples/sec Loss 12.0114 LearningRate 0.0741 Epoch: 2 Global Step: 115580 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:42:22,036-Speed 2632.64 samples/sec Loss 11.9876 LearningRate 0.0741 Epoch: 2 Global Step: 115590 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:42:25,927-Speed 2632.60 samples/sec Loss 12.1042 LearningRate 0.0741 Epoch: 2 Global Step: 115600 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:42:29,820-Speed 2630.97 samples/sec Loss 11.9436 LearningRate 0.0741 Epoch: 2 Global Step: 115610 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:42:33,714-Speed 2629.96 samples/sec Loss 11.9793 LearningRate 0.0741 Epoch: 2 Global Step: 115620 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:42:37,616-Speed 2624.58 samples/sec Loss 12.0453 LearningRate 0.0741 Epoch: 2 Global Step: 115630 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:42:41,512-Speed 2629.28 samples/sec Loss 11.9692 LearningRate 0.0741 Epoch: 2 Global Step: 115640 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:42:45,402-Speed 2632.65 samples/sec Loss 11.8873 LearningRate 0.0741 Epoch: 2 Global Step: 115650 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:42:49,296-Speed 2630.73 samples/sec Loss 11.9873 LearningRate 0.0741 Epoch: 2 Global Step: 115660 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:42:53,189-Speed 2630.89 samples/sec Loss 11.9720 LearningRate 0.0741 Epoch: 2 Global Step: 115670 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:42:57,088-Speed 2627.46 samples/sec Loss 11.9718 LearningRate 0.0741 Epoch: 2 Global Step: 115680 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:43:00,998-Speed 2618.95 samples/sec Loss 12.0111 LearningRate 0.0741 Epoch: 2 Global Step: 115690 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:43:04,894-Speed 2628.62 samples/sec Loss 12.0040 LearningRate 0.0741 Epoch: 2 Global Step: 115700 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:43:08,795-Speed 2626.02 samples/sec Loss 12.0012 LearningRate 0.0740 Epoch: 2 Global Step: 115710 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:43:12,687-Speed 2631.68 samples/sec Loss 11.8066 LearningRate 0.0740 Epoch: 2 Global Step: 115720 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:43:16,581-Speed 2629.52 samples/sec Loss 11.8829 LearningRate 0.0740 Epoch: 2 Global Step: 115730 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:43:20,486-Speed 2623.68 samples/sec Loss 11.8487 LearningRate 0.0740 Epoch: 2 Global Step: 115740 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:43:24,380-Speed 2629.97 samples/sec Loss 11.9769 LearningRate 0.0740 Epoch: 2 Global Step: 115750 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:43:28,285-Speed 2623.50 samples/sec Loss 11.7989 LearningRate 0.0740 Epoch: 2 Global Step: 115760 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:43:32,191-Speed 2622.09 samples/sec Loss 11.9213 LearningRate 0.0740 Epoch: 2 Global Step: 115770 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:43:36,097-Speed 2621.87 samples/sec Loss 11.8863 LearningRate 0.0740 Epoch: 2 Global Step: 115780 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:43:39,970-Speed 2644.11 samples/sec Loss 12.0436 LearningRate 0.0740 Epoch: 2 Global Step: 115790 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:43:43,888-Speed 2614.86 samples/sec Loss 11.8619 LearningRate 0.0740 Epoch: 2 Global Step: 115800 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:43:47,796-Speed 2620.71 samples/sec Loss 12.0508 LearningRate 0.0740 Epoch: 2 Global Step: 115810 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:43:51,848-Speed 2528.33 samples/sec Loss 12.0022 LearningRate 0.0740 Epoch: 2 Global Step: 115820 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:43:55,896-Speed 2529.99 samples/sec Loss 11.9319 LearningRate 0.0740 Epoch: 2 Global Step: 115830 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:43:59,791-Speed 2629.66 samples/sec Loss 12.0178 LearningRate 0.0740 Epoch: 2 Global Step: 115840 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:44:03,689-Speed 2627.60 samples/sec Loss 11.9688 LearningRate 0.0740 Epoch: 2 Global Step: 115850 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:44:07,596-Speed 2621.78 samples/sec Loss 11.9701 LearningRate 0.0740 Epoch: 2 Global Step: 115860 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:44:11,491-Speed 2629.39 samples/sec Loss 12.0084 LearningRate 0.0740 Epoch: 2 Global Step: 115870 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:44:15,386-Speed 2629.32 samples/sec Loss 11.8252 LearningRate 0.0740 Epoch: 2 Global Step: 115880 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:44:19,295-Speed 2620.90 samples/sec Loss 11.8796 LearningRate 0.0740 Epoch: 2 Global Step: 115890 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:44:23,190-Speed 2629.73 samples/sec Loss 12.1267 LearningRate 0.0740 Epoch: 2 Global Step: 115900 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:44:27,083-Speed 2631.03 samples/sec Loss 11.8837 LearningRate 0.0740 Epoch: 2 Global Step: 115910 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:44:30,980-Speed 2628.70 samples/sec Loss 11.9304 LearningRate 0.0740 Epoch: 2 Global Step: 115920 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:44:34,878-Speed 2627.52 samples/sec Loss 11.9426 LearningRate 0.0740 Epoch: 2 Global Step: 115930 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:44:38,776-Speed 2627.39 samples/sec Loss 11.9354 LearningRate 0.0740 Epoch: 2 Global Step: 115940 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:44:42,671-Speed 2629.47 samples/sec Loss 12.0092 LearningRate 0.0740 Epoch: 2 Global Step: 115950 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:44:46,568-Speed 2628.84 samples/sec Loss 11.9550 LearningRate 0.0740 Epoch: 2 Global Step: 115960 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:44:50,468-Speed 2625.78 samples/sec Loss 11.9854 LearningRate 0.0740 Epoch: 2 Global Step: 115970 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:44:54,367-Speed 2627.48 samples/sec Loss 12.0069 LearningRate 0.0740 Epoch: 2 Global Step: 115980 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:44:58,266-Speed 2627.06 samples/sec Loss 11.9493 LearningRate 0.0740 Epoch: 2 Global Step: 115990 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:45:02,164-Speed 2628.07 samples/sec Loss 11.9261 LearningRate 0.0740 Epoch: 2 Global Step: 116000 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:45:06,061-Speed 2627.59 samples/sec Loss 12.0076 LearningRate 0.0740 Epoch: 2 Global Step: 116010 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:45:09,959-Speed 2627.88 samples/sec Loss 11.9577 LearningRate 0.0740 Epoch: 2 Global Step: 116020 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:45:13,880-Speed 2612.12 samples/sec Loss 11.9480 LearningRate 0.0740 Epoch: 2 Global Step: 116030 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:45:17,780-Speed 2626.20 samples/sec Loss 11.9741 LearningRate 0.0740 Epoch: 2 Global Step: 116040 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:45:21,673-Speed 2631.33 samples/sec Loss 12.0362 LearningRate 0.0740 Epoch: 2 Global Step: 116050 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:45:25,574-Speed 2625.50 samples/sec Loss 11.7914 LearningRate 0.0740 Epoch: 2 Global Step: 116060 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:45:29,468-Speed 2630.37 samples/sec Loss 11.9448 LearningRate 0.0740 Epoch: 2 Global Step: 116070 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:45:33,360-Speed 2632.30 samples/sec Loss 12.0130 LearningRate 0.0740 Epoch: 2 Global Step: 116080 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:45:37,259-Speed 2626.94 samples/sec Loss 11.9946 LearningRate 0.0740 Epoch: 2 Global Step: 116090 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:45:41,160-Speed 2625.38 samples/sec Loss 11.8986 LearningRate 0.0740 Epoch: 2 Global Step: 116100 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:45:45,056-Speed 2629.15 samples/sec Loss 12.0030 LearningRate 0.0740 Epoch: 2 Global Step: 116110 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:45:48,954-Speed 2627.86 samples/sec Loss 11.9666 LearningRate 0.0740 Epoch: 2 Global Step: 116120 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:45:52,876-Speed 2611.34 samples/sec Loss 12.0139 LearningRate 0.0740 Epoch: 2 Global Step: 116130 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:45:56,809-Speed 2604.32 samples/sec Loss 12.0147 LearningRate 0.0740 Epoch: 2 Global Step: 116140 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:46:00,729-Speed 2613.32 samples/sec Loss 12.1951 LearningRate 0.0740 Epoch: 2 Global Step: 116150 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:46:04,629-Speed 2626.34 samples/sec Loss 12.0257 LearningRate 0.0740 Epoch: 2 Global Step: 116160 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:46:08,526-Speed 2628.08 samples/sec Loss 12.0600 LearningRate 0.0740 Epoch: 2 Global Step: 116170 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:46:12,426-Speed 2626.23 samples/sec Loss 11.9802 LearningRate 0.0740 Epoch: 2 Global Step: 116180 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:46:16,331-Speed 2622.52 samples/sec Loss 12.0393 LearningRate 0.0739 Epoch: 2 Global Step: 116190 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:46:20,238-Speed 2621.23 samples/sec Loss 12.0589 LearningRate 0.0739 Epoch: 2 Global Step: 116200 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:46:24,162-Speed 2610.91 samples/sec Loss 11.9029 LearningRate 0.0739 Epoch: 2 Global Step: 116210 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:46:28,066-Speed 2623.55 samples/sec Loss 11.8582 LearningRate 0.0739 Epoch: 2 Global Step: 116220 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:46:31,965-Speed 2627.00 samples/sec Loss 11.7847 LearningRate 0.0739 Epoch: 2 Global Step: 116230 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:46:35,864-Speed 2627.26 samples/sec Loss 11.8334 LearningRate 0.0739 Epoch: 2 Global Step: 116240 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:46:39,758-Speed 2630.33 samples/sec Loss 11.9196 LearningRate 0.0739 Epoch: 2 Global Step: 116250 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:46:43,633-Speed 2645.08 samples/sec Loss 11.9892 LearningRate 0.0739 Epoch: 2 Global Step: 116260 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:46:47,541-Speed 2621.32 samples/sec Loss 11.9075 LearningRate 0.0739 Epoch: 2 Global Step: 116270 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:46:51,444-Speed 2624.13 samples/sec Loss 11.9103 LearningRate 0.0739 Epoch: 2 Global Step: 116280 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:46:55,346-Speed 2625.33 samples/sec Loss 11.8987 LearningRate 0.0739 Epoch: 2 Global Step: 116290 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:46:59,244-Speed 2627.20 samples/sec Loss 11.9408 LearningRate 0.0739 Epoch: 2 Global Step: 116300 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:47:03,145-Speed 2625.70 samples/sec Loss 11.8878 LearningRate 0.0739 Epoch: 2 Global Step: 116310 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:47:07,048-Speed 2624.38 samples/sec Loss 11.8787 LearningRate 0.0739 Epoch: 2 Global Step: 116320 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:47:10,949-Speed 2625.58 samples/sec Loss 11.9597 LearningRate 0.0739 Epoch: 2 Global Step: 116330 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:47:14,839-Speed 2633.02 samples/sec Loss 11.9081 LearningRate 0.0739 Epoch: 2 Global Step: 116340 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:47:18,741-Speed 2625.12 samples/sec Loss 12.0817 LearningRate 0.0739 Epoch: 2 Global Step: 116350 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:47:22,638-Speed 2628.37 samples/sec Loss 12.0767 LearningRate 0.0739 Epoch: 2 Global Step: 116360 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:47:26,540-Speed 2625.23 samples/sec Loss 11.9980 LearningRate 0.0739 Epoch: 2 Global Step: 116370 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:47:30,429-Speed 2634.27 samples/sec Loss 11.9136 LearningRate 0.0739 Epoch: 2 Global Step: 116380 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:47:34,329-Speed 2625.66 samples/sec Loss 12.0453 LearningRate 0.0739 Epoch: 2 Global Step: 116390 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:47:38,224-Speed 2629.48 samples/sec Loss 11.9465 LearningRate 0.0739 Epoch: 2 Global Step: 116400 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:47:42,117-Speed 2631.54 samples/sec Loss 11.9256 LearningRate 0.0739 Epoch: 2 Global Step: 116410 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:47:46,008-Speed 2631.64 samples/sec Loss 12.0195 LearningRate 0.0739 Epoch: 2 Global Step: 116420 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:47:49,902-Speed 2630.56 samples/sec Loss 11.9428 LearningRate 0.0739 Epoch: 2 Global Step: 116430 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:47:53,797-Speed 2629.60 samples/sec Loss 11.8956 LearningRate 0.0739 Epoch: 2 Global Step: 116440 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:47:57,696-Speed 2626.95 samples/sec Loss 11.9867 LearningRate 0.0739 Epoch: 2 Global Step: 116450 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:48:01,590-Speed 2630.41 samples/sec Loss 11.8672 LearningRate 0.0739 Epoch: 2 Global Step: 116460 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:48:05,482-Speed 2631.69 samples/sec Loss 11.7444 LearningRate 0.0739 Epoch: 2 Global Step: 116470 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:48:09,378-Speed 2629.07 samples/sec Loss 12.0091 LearningRate 0.0739 Epoch: 2 Global Step: 116480 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:48:13,265-Speed 2635.16 samples/sec Loss 11.9117 LearningRate 0.0739 Epoch: 2 Global Step: 116490 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:48:17,163-Speed 2627.49 samples/sec Loss 12.0481 LearningRate 0.0739 Epoch: 2 Global Step: 116500 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:48:21,055-Speed 2631.78 samples/sec Loss 12.0089 LearningRate 0.0739 Epoch: 2 Global Step: 116510 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:48:24,949-Speed 2630.24 samples/sec Loss 11.9381 LearningRate 0.0739 Epoch: 2 Global Step: 116520 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:48:28,847-Speed 2627.31 samples/sec Loss 11.9905 LearningRate 0.0739 Epoch: 2 Global Step: 116530 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:48:32,757-Speed 2619.46 samples/sec Loss 12.0694 LearningRate 0.0739 Epoch: 2 Global Step: 116540 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:48:36,653-Speed 2629.19 samples/sec Loss 12.0641 LearningRate 0.0739 Epoch: 2 Global Step: 116550 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:48:40,546-Speed 2630.74 samples/sec Loss 11.9085 LearningRate 0.0739 Epoch: 2 Global Step: 116560 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:48:44,454-Speed 2621.31 samples/sec Loss 11.8661 LearningRate 0.0739 Epoch: 2 Global Step: 116570 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:48:48,365-Speed 2618.99 samples/sec Loss 11.9924 LearningRate 0.0739 Epoch: 2 Global Step: 116580 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:48:52,262-Speed 2628.15 samples/sec Loss 11.9982 LearningRate 0.0739 Epoch: 2 Global Step: 116590 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:48:56,174-Speed 2617.97 samples/sec Loss 11.8722 LearningRate 0.0739 Epoch: 2 Global Step: 116600 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:49:00,127-Speed 2591.25 samples/sec Loss 11.8385 LearningRate 0.0739 Epoch: 2 Global Step: 116610 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:49:04,020-Speed 2631.01 samples/sec Loss 11.8685 LearningRate 0.0739 Epoch: 2 Global Step: 116620 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:49:07,916-Speed 2629.02 samples/sec Loss 12.0318 LearningRate 0.0739 Epoch: 2 Global Step: 116630 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:49:11,815-Speed 2627.33 samples/sec Loss 11.9657 LearningRate 0.0739 Epoch: 2 Global Step: 116640 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:49:15,711-Speed 2628.89 samples/sec Loss 11.8466 LearningRate 0.0739 Epoch: 2 Global Step: 116650 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:49:19,617-Speed 2622.34 samples/sec Loss 11.7905 LearningRate 0.0739 Epoch: 2 Global Step: 116660 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:49:23,521-Speed 2622.95 samples/sec Loss 11.9550 LearningRate 0.0739 Epoch: 2 Global Step: 116670 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:49:27,429-Speed 2621.48 samples/sec Loss 12.0749 LearningRate 0.0738 Epoch: 2 Global Step: 116680 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:49:31,323-Speed 2630.22 samples/sec Loss 11.8952 LearningRate 0.0738 Epoch: 2 Global Step: 116690 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:49:35,217-Speed 2629.99 samples/sec Loss 11.9168 LearningRate 0.0738 Epoch: 2 Global Step: 116700 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:49:39,115-Speed 2627.80 samples/sec Loss 11.9441 LearningRate 0.0738 Epoch: 2 Global Step: 116710 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:49:43,009-Speed 2630.67 samples/sec Loss 12.0480 LearningRate 0.0738 Epoch: 2 Global Step: 116720 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:49:46,914-Speed 2622.33 samples/sec Loss 11.8963 LearningRate 0.0738 Epoch: 2 Global Step: 116730 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:49:50,808-Speed 2630.34 samples/sec Loss 11.8685 LearningRate 0.0738 Epoch: 2 Global Step: 116740 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:49:54,704-Speed 2628.87 samples/sec Loss 11.8547 LearningRate 0.0738 Epoch: 2 Global Step: 116750 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:49:58,570-Speed 2649.52 samples/sec Loss 11.8233 LearningRate 0.0738 Epoch: 2 Global Step: 116760 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:50:02,457-Speed 2634.90 samples/sec Loss 11.9946 LearningRate 0.0738 Epoch: 2 Global Step: 116770 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:50:06,353-Speed 2629.11 samples/sec Loss 11.9737 LearningRate 0.0738 Epoch: 2 Global Step: 116780 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:50:10,247-Speed 2630.48 samples/sec Loss 12.0038 LearningRate 0.0738 Epoch: 2 Global Step: 116790 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:50:14,142-Speed 2629.69 samples/sec Loss 11.9991 LearningRate 0.0738 Epoch: 2 Global Step: 116800 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:50:18,038-Speed 2628.85 samples/sec Loss 12.0577 LearningRate 0.0738 Epoch: 2 Global Step: 116810 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:50:21,930-Speed 2631.31 samples/sec Loss 11.8173 LearningRate 0.0738 Epoch: 2 Global Step: 116820 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:50:25,840-Speed 2619.98 samples/sec Loss 11.9864 LearningRate 0.0738 Epoch: 2 Global Step: 116830 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:50:29,742-Speed 2624.41 samples/sec Loss 11.9161 LearningRate 0.0738 Epoch: 2 Global Step: 116840 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:50:33,635-Speed 2631.19 samples/sec Loss 11.9112 LearningRate 0.0738 Epoch: 2 Global Step: 116850 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:50:37,529-Speed 2630.22 samples/sec Loss 11.8549 LearningRate 0.0738 Epoch: 2 Global Step: 116860 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:50:41,423-Speed 2630.31 samples/sec Loss 11.9584 LearningRate 0.0738 Epoch: 2 Global Step: 116870 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:50:45,331-Speed 2620.96 samples/sec Loss 11.9555 LearningRate 0.0738 Epoch: 2 Global Step: 116880 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:50:49,223-Speed 2631.65 samples/sec Loss 11.9068 LearningRate 0.0738 Epoch: 2 Global Step: 116890 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:50:53,116-Speed 2631.42 samples/sec Loss 11.9706 LearningRate 0.0738 Epoch: 2 Global Step: 116900 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:50:57,010-Speed 2630.58 samples/sec Loss 11.8385 LearningRate 0.0738 Epoch: 2 Global Step: 116910 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:51:00,901-Speed 2631.61 samples/sec Loss 11.9940 LearningRate 0.0738 Epoch: 2 Global Step: 116920 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:51:04,802-Speed 2625.92 samples/sec Loss 11.9851 LearningRate 0.0738 Epoch: 2 Global Step: 116930 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:51:08,695-Speed 2631.09 samples/sec Loss 11.8600 LearningRate 0.0738 Epoch: 2 Global Step: 116940 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:51:12,588-Speed 2630.74 samples/sec Loss 11.9452 LearningRate 0.0738 Epoch: 2 Global Step: 116950 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:51:16,482-Speed 2630.26 samples/sec Loss 11.9690 LearningRate 0.0738 Epoch: 2 Global Step: 116960 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:51:20,390-Speed 2621.33 samples/sec Loss 11.9535 LearningRate 0.0738 Epoch: 2 Global Step: 116970 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:51:24,239-Speed 2661.06 samples/sec Loss 11.8362 LearningRate 0.0738 Epoch: 2 Global Step: 116980 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:51:28,106-Speed 2648.69 samples/sec Loss 12.0698 LearningRate 0.0738 Epoch: 2 Global Step: 116990 Fp16 Grad Scale: 8192 Required: 80 hours
Training: 2022-04-13 08:51:32,009-Speed 2624.63 samples/sec Loss 11.9321 LearningRate 0.0738 Epoch: 2 Global Step: 117000 Fp16 Grad Scale: 8192 Required: 80 hours
Training: 2022-04-13 08:51:35,902-Speed 2631.21 samples/sec Loss 12.0638 LearningRate 0.0738 Epoch: 2 Global Step: 117010 Fp16 Grad Scale: 8192 Required: 80 hours
Training: 2022-04-13 08:51:39,797-Speed 2629.32 samples/sec Loss 11.8074 LearningRate 0.0738 Epoch: 2 Global Step: 117020 Fp16 Grad Scale: 8192 Required: 80 hours
Training: 2022-04-13 08:51:43,692-Speed 2629.67 samples/sec Loss 12.0089 LearningRate 0.0738 Epoch: 2 Global Step: 117030 Fp16 Grad Scale: 8192 Required: 80 hours
Training: 2022-04-13 08:51:47,582-Speed 2632.82 samples/sec Loss 12.0535 LearningRate 0.0738 Epoch: 2 Global Step: 117040 Fp16 Grad Scale: 8192 Required: 80 hours
Training: 2022-04-13 08:51:51,471-Speed 2633.83 samples/sec Loss 12.1024 LearningRate 0.0738 Epoch: 2 Global Step: 117050 Fp16 Grad Scale: 8192 Required: 80 hours
Training: 2022-04-13 08:51:55,362-Speed 2632.75 samples/sec Loss 12.0172 LearningRate 0.0738 Epoch: 2 Global Step: 117060 Fp16 Grad Scale: 8192 Required: 80 hours
Training: 2022-04-13 08:51:59,260-Speed 2627.95 samples/sec Loss 11.9415 LearningRate 0.0738 Epoch: 2 Global Step: 117070 Fp16 Grad Scale: 8192 Required: 80 hours
Training: 2022-04-13 08:52:03,152-Speed 2631.69 samples/sec Loss 12.0597 LearningRate 0.0738 Epoch: 2 Global Step: 117080 Fp16 Grad Scale: 8192 Required: 80 hours
Training: 2022-04-13 08:52:07,046-Speed 2630.11 samples/sec Loss 11.9577 LearningRate 0.0738 Epoch: 2 Global Step: 117090 Fp16 Grad Scale: 16384 Required: 80 hours
Training: 2022-04-13 08:52:10,958-Speed 2618.23 samples/sec Loss 11.9168 LearningRate 0.0738 Epoch: 2 Global Step: 117100 Fp16 Grad Scale: 16384 Required: 80 hours
Training: 2022-04-13 08:52:14,846-Speed 2634.06 samples/sec Loss 11.9171 LearningRate 0.0738 Epoch: 2 Global Step: 117110 Fp16 Grad Scale: 16384 Required: 80 hours
Training: 2022-04-13 08:52:18,739-Speed 2631.38 samples/sec Loss 11.8976 LearningRate 0.0738 Epoch: 2 Global Step: 117120 Fp16 Grad Scale: 16384 Required: 80 hours
Training: 2022-04-13 08:52:22,636-Speed 2628.35 samples/sec Loss 11.9674 LearningRate 0.0738 Epoch: 2 Global Step: 117130 Fp16 Grad Scale: 16384 Required: 80 hours
Training: 2022-04-13 08:52:26,543-Speed 2621.48 samples/sec Loss 12.1788 LearningRate 0.0738 Epoch: 2 Global Step: 117140 Fp16 Grad Scale: 16384 Required: 80 hours
Training: 2022-04-13 08:52:30,435-Speed 2631.80 samples/sec Loss 11.9135 LearningRate 0.0738 Epoch: 2 Global Step: 117150 Fp16 Grad Scale: 16384 Required: 80 hours
Training: 2022-04-13 08:52:34,326-Speed 2632.31 samples/sec Loss 11.9309 LearningRate 0.0737 Epoch: 2 Global Step: 117160 Fp16 Grad Scale: 16384 Required: 80 hours
Training: 2022-04-13 08:52:38,222-Speed 2629.31 samples/sec Loss 12.0753 LearningRate 0.0737 Epoch: 2 Global Step: 117170 Fp16 Grad Scale: 16384 Required: 80 hours
Training: 2022-04-13 08:52:42,115-Speed 2631.03 samples/sec Loss 12.4284 LearningRate 0.0737 Epoch: 2 Global Step: 117180 Fp16 Grad Scale: 16384 Required: 80 hours
Training: 2022-04-13 08:52:46,032-Speed 2614.65 samples/sec Loss 12.1309 LearningRate 0.0737 Epoch: 2 Global Step: 117190 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:52:49,928-Speed 2629.03 samples/sec Loss 12.0262 LearningRate 0.0737 Epoch: 2 Global Step: 117200 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:52:53,820-Speed 2631.92 samples/sec Loss 12.1582 LearningRate 0.0737 Epoch: 2 Global Step: 117210 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:52:57,714-Speed 2630.54 samples/sec Loss 11.9538 LearningRate 0.0737 Epoch: 2 Global Step: 117220 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:53:01,611-Speed 2628.17 samples/sec Loss 11.9390 LearningRate 0.0737 Epoch: 2 Global Step: 117230 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:53:05,508-Speed 2628.04 samples/sec Loss 11.9337 LearningRate 0.0737 Epoch: 2 Global Step: 117240 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:53:09,398-Speed 2633.22 samples/sec Loss 11.9973 LearningRate 0.0737 Epoch: 2 Global Step: 117250 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:53:13,290-Speed 2631.51 samples/sec Loss 12.0182 LearningRate 0.0737 Epoch: 2 Global Step: 117260 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:53:17,189-Speed 2626.80 samples/sec Loss 11.8973 LearningRate 0.0737 Epoch: 2 Global Step: 117270 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:53:21,086-Speed 2628.71 samples/sec Loss 11.8872 LearningRate 0.0737 Epoch: 2 Global Step: 117280 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 08:53:24,979-Speed 2630.74 samples/sec Loss 12.1040 LearningRate 0.0737 Epoch: 2 Global Step: 117290 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:53:28,868-Speed 2634.01 samples/sec Loss 11.8440 LearningRate 0.0737 Epoch: 2 Global Step: 117300 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:53:32,762-Speed 2629.80 samples/sec Loss 12.0067 LearningRate 0.0737 Epoch: 2 Global Step: 117310 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:53:36,654-Speed 2632.11 samples/sec Loss 11.9581 LearningRate 0.0737 Epoch: 2 Global Step: 117320 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:53:40,550-Speed 2628.50 samples/sec Loss 11.8815 LearningRate 0.0737 Epoch: 2 Global Step: 117330 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:53:44,441-Speed 2632.82 samples/sec Loss 11.9745 LearningRate 0.0737 Epoch: 2 Global Step: 117340 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:53:48,334-Speed 2630.83 samples/sec Loss 12.0768 LearningRate 0.0737 Epoch: 2 Global Step: 117350 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:53:52,229-Speed 2629.76 samples/sec Loss 11.9299 LearningRate 0.0737 Epoch: 2 Global Step: 117360 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:53:56,120-Speed 2632.53 samples/sec Loss 12.1296 LearningRate 0.0737 Epoch: 2 Global Step: 117370 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:54:00,013-Speed 2630.82 samples/sec Loss 12.0249 LearningRate 0.0737 Epoch: 2 Global Step: 117380 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:54:03,910-Speed 2628.46 samples/sec Loss 11.8572 LearningRate 0.0737 Epoch: 2 Global Step: 117390 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:54:07,802-Speed 2631.49 samples/sec Loss 12.0591 LearningRate 0.0737 Epoch: 2 Global Step: 117400 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:54:11,690-Speed 2634.22 samples/sec Loss 11.8585 LearningRate 0.0737 Epoch: 2 Global Step: 117410 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:54:15,580-Speed 2632.94 samples/sec Loss 11.9499 LearningRate 0.0737 Epoch: 2 Global Step: 117420 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:54:19,472-Speed 2632.36 samples/sec Loss 12.0459 LearningRate 0.0737 Epoch: 2 Global Step: 117430 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:54:23,371-Speed 2626.80 samples/sec Loss 11.9391 LearningRate 0.0737 Epoch: 2 Global Step: 117440 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:54:27,268-Speed 2628.82 samples/sec Loss 11.6792 LearningRate 0.0737 Epoch: 2 Global Step: 117450 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:54:31,162-Speed 2630.40 samples/sec Loss 11.9228 LearningRate 0.0737 Epoch: 2 Global Step: 117460 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:54:35,069-Speed 2621.04 samples/sec Loss 11.9331 LearningRate 0.0737 Epoch: 2 Global Step: 117470 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:54:38,972-Speed 2624.17 samples/sec Loss 11.9872 LearningRate 0.0737 Epoch: 2 Global Step: 117480 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:54:42,847-Speed 2643.57 samples/sec Loss 11.9365 LearningRate 0.0737 Epoch: 2 Global Step: 117490 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:54:46,740-Speed 2631.19 samples/sec Loss 11.8601 LearningRate 0.0737 Epoch: 2 Global Step: 117500 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:54:50,657-Speed 2614.83 samples/sec Loss 11.9074 LearningRate 0.0737 Epoch: 2 Global Step: 117510 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:54:54,550-Speed 2631.02 samples/sec Loss 11.9360 LearningRate 0.0737 Epoch: 2 Global Step: 117520 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:54:58,463-Speed 2617.76 samples/sec Loss 11.8626 LearningRate 0.0737 Epoch: 2 Global Step: 117530 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:55:02,344-Speed 2639.48 samples/sec Loss 11.9055 LearningRate 0.0737 Epoch: 2 Global Step: 117540 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:55:06,237-Speed 2631.32 samples/sec Loss 11.9528 LearningRate 0.0737 Epoch: 2 Global Step: 117550 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:55:10,139-Speed 2624.54 samples/sec Loss 11.8604 LearningRate 0.0737 Epoch: 2 Global Step: 117560 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:55:14,039-Speed 2626.69 samples/sec Loss 11.9347 LearningRate 0.0737 Epoch: 2 Global Step: 117570 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:55:17,944-Speed 2622.29 samples/sec Loss 11.8489 LearningRate 0.0737 Epoch: 2 Global Step: 117580 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:55:21,852-Speed 2620.95 samples/sec Loss 11.9981 LearningRate 0.0737 Epoch: 2 Global Step: 117590 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:55:25,744-Speed 2631.95 samples/sec Loss 11.9963 LearningRate 0.0737 Epoch: 2 Global Step: 117600 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:55:29,634-Speed 2633.87 samples/sec Loss 11.8162 LearningRate 0.0737 Epoch: 2 Global Step: 117610 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:55:33,523-Speed 2633.34 samples/sec Loss 11.9221 LearningRate 0.0737 Epoch: 2 Global Step: 117620 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:55:37,414-Speed 2631.87 samples/sec Loss 12.0157 LearningRate 0.0737 Epoch: 2 Global Step: 117630 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 08:55:41,321-Speed 2621.40 samples/sec Loss 11.9540 LearningRate 0.0736 Epoch: 2 Global Step: 117640 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:55:45,251-Speed 2606.51 samples/sec Loss 12.0406 LearningRate 0.0736 Epoch: 2 Global Step: 117650 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:55:49,152-Speed 2625.71 samples/sec Loss 11.9340 LearningRate 0.0736 Epoch: 2 Global Step: 117660 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:55:53,062-Speed 2619.81 samples/sec Loss 11.9973 LearningRate 0.0736 Epoch: 2 Global Step: 117670 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:55:56,972-Speed 2619.58 samples/sec Loss 12.0556 LearningRate 0.0736 Epoch: 2 Global Step: 117680 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:56:00,877-Speed 2622.95 samples/sec Loss 11.8589 LearningRate 0.0736 Epoch: 2 Global Step: 117690 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:56:04,790-Speed 2617.44 samples/sec Loss 11.8397 LearningRate 0.0736 Epoch: 2 Global Step: 117700 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:56:08,705-Speed 2615.99 samples/sec Loss 11.9817 LearningRate 0.0736 Epoch: 2 Global Step: 117710 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:56:12,601-Speed 2629.14 samples/sec Loss 11.9664 LearningRate 0.0736 Epoch: 2 Global Step: 117720 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:56:16,490-Speed 2633.62 samples/sec Loss 11.9719 LearningRate 0.0736 Epoch: 2 Global Step: 117730 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:56:20,385-Speed 2629.50 samples/sec Loss 11.9412 LearningRate 0.0736 Epoch: 2 Global Step: 117740 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:56:24,279-Speed 2630.41 samples/sec Loss 11.8853 LearningRate 0.0736 Epoch: 2 Global Step: 117750 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:56:28,173-Speed 2630.50 samples/sec Loss 12.0130 LearningRate 0.0736 Epoch: 2 Global Step: 117760 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:56:32,066-Speed 2631.44 samples/sec Loss 11.9533 LearningRate 0.0736 Epoch: 2 Global Step: 117770 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:56:35,945-Speed 2640.07 samples/sec Loss 11.8916 LearningRate 0.0736 Epoch: 2 Global Step: 117780 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:56:39,909-Speed 2583.72 samples/sec Loss 11.8579 LearningRate 0.0736 Epoch: 2 Global Step: 117790 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:56:43,864-Speed 2589.82 samples/sec Loss 11.9144 LearningRate 0.0736 Epoch: 2 Global Step: 117800 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:56:47,766-Speed 2624.94 samples/sec Loss 11.9713 LearningRate 0.0736 Epoch: 2 Global Step: 117810 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:56:51,679-Speed 2617.11 samples/sec Loss 11.9801 LearningRate 0.0736 Epoch: 2 Global Step: 117820 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:56:55,585-Speed 2623.05 samples/sec Loss 12.0006 LearningRate 0.0736 Epoch: 2 Global Step: 117830 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:56:59,486-Speed 2625.91 samples/sec Loss 12.0635 LearningRate 0.0736 Epoch: 2 Global Step: 117840 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:57:03,386-Speed 2626.06 samples/sec Loss 11.9006 LearningRate 0.0736 Epoch: 2 Global Step: 117850 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:57:07,287-Speed 2625.35 samples/sec Loss 11.9145 LearningRate 0.0736 Epoch: 2 Global Step: 117860 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:57:11,197-Speed 2619.27 samples/sec Loss 11.9371 LearningRate 0.0736 Epoch: 2 Global Step: 117870 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:57:15,104-Speed 2621.74 samples/sec Loss 11.8372 LearningRate 0.0736 Epoch: 2 Global Step: 117880 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:57:19,033-Speed 2607.37 samples/sec Loss 11.8798 LearningRate 0.0736 Epoch: 2 Global Step: 117890 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:57:22,945-Speed 2618.10 samples/sec Loss 11.9389 LearningRate 0.0736 Epoch: 2 Global Step: 117900 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:57:26,848-Speed 2624.48 samples/sec Loss 11.8592 LearningRate 0.0736 Epoch: 2 Global Step: 117910 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:57:30,755-Speed 2621.52 samples/sec Loss 11.9575 LearningRate 0.0736 Epoch: 2 Global Step: 117920 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:57:34,672-Speed 2615.14 samples/sec Loss 11.9312 LearningRate 0.0736 Epoch: 2 Global Step: 117930 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:57:38,581-Speed 2620.06 samples/sec Loss 12.0170 LearningRate 0.0736 Epoch: 2 Global Step: 117940 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:57:42,490-Speed 2620.11 samples/sec Loss 12.0212 LearningRate 0.0736 Epoch: 2 Global Step: 117950 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:57:46,384-Speed 2629.97 samples/sec Loss 11.8646 LearningRate 0.0736 Epoch: 2 Global Step: 117960 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:57:50,282-Speed 2628.31 samples/sec Loss 11.9074 LearningRate 0.0736 Epoch: 2 Global Step: 117970 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:57:54,160-Speed 2640.91 samples/sec Loss 11.9334 LearningRate 0.0736 Epoch: 2 Global Step: 117980 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:57:58,075-Speed 2616.60 samples/sec Loss 11.7371 LearningRate 0.0736 Epoch: 2 Global Step: 117990 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:58:01,993-Speed 2613.95 samples/sec Loss 11.8595 LearningRate 0.0736 Epoch: 2 Global Step: 118000 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:58:05,870-Speed 2642.32 samples/sec Loss 11.9039 LearningRate 0.0736 Epoch: 2 Global Step: 118010 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:58:09,763-Speed 2631.05 samples/sec Loss 11.9608 LearningRate 0.0736 Epoch: 2 Global Step: 118020 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:58:13,660-Speed 2628.28 samples/sec Loss 11.9086 LearningRate 0.0736 Epoch: 2 Global Step: 118030 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:58:17,553-Speed 2630.96 samples/sec Loss 11.7136 LearningRate 0.0736 Epoch: 2 Global Step: 118040 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:58:21,449-Speed 2628.29 samples/sec Loss 11.8635 LearningRate 0.0736 Epoch: 2 Global Step: 118050 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:58:25,344-Speed 2630.23 samples/sec Loss 12.0012 LearningRate 0.0736 Epoch: 2 Global Step: 118060 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:58:29,236-Speed 2632.10 samples/sec Loss 12.0541 LearningRate 0.0736 Epoch: 2 Global Step: 118070 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:58:33,141-Speed 2622.27 samples/sec Loss 11.8097 LearningRate 0.0736 Epoch: 2 Global Step: 118080 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:58:37,035-Speed 2630.57 samples/sec Loss 11.7451 LearningRate 0.0736 Epoch: 2 Global Step: 118090 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:58:40,929-Speed 2630.81 samples/sec Loss 11.7566 LearningRate 0.0736 Epoch: 2 Global Step: 118100 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:58:44,822-Speed 2631.40 samples/sec Loss 11.8925 LearningRate 0.0736 Epoch: 2 Global Step: 118110 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:58:48,713-Speed 2631.74 samples/sec Loss 11.8493 LearningRate 0.0736 Epoch: 2 Global Step: 118120 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:58:52,624-Speed 2619.00 samples/sec Loss 11.7599 LearningRate 0.0735 Epoch: 2 Global Step: 118130 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:58:56,521-Speed 2628.86 samples/sec Loss 11.7724 LearningRate 0.0735 Epoch: 2 Global Step: 118140 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:59:00,425-Speed 2623.26 samples/sec Loss 11.9398 LearningRate 0.0735 Epoch: 2 Global Step: 118150 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:59:04,322-Speed 2627.89 samples/sec Loss 12.0056 LearningRate 0.0735 Epoch: 2 Global Step: 118160 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:59:08,219-Speed 2629.43 samples/sec Loss 11.9017 LearningRate 0.0735 Epoch: 2 Global Step: 118170 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:59:12,098-Speed 2640.38 samples/sec Loss 11.9225 LearningRate 0.0735 Epoch: 2 Global Step: 118180 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:59:15,997-Speed 2627.74 samples/sec Loss 11.9236 LearningRate 0.0735 Epoch: 2 Global Step: 118190 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:59:19,914-Speed 2615.10 samples/sec Loss 11.7372 LearningRate 0.0735 Epoch: 2 Global Step: 118200 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:59:23,810-Speed 2628.87 samples/sec Loss 11.8968 LearningRate 0.0735 Epoch: 2 Global Step: 118210 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:59:27,705-Speed 2630.33 samples/sec Loss 11.8301 LearningRate 0.0735 Epoch: 2 Global Step: 118220 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:59:31,599-Speed 2629.55 samples/sec Loss 11.8247 LearningRate 0.0735 Epoch: 2 Global Step: 118230 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:59:35,493-Speed 2630.56 samples/sec Loss 11.9002 LearningRate 0.0735 Epoch: 2 Global Step: 118240 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:59:39,412-Speed 2612.86 samples/sec Loss 11.8573 LearningRate 0.0735 Epoch: 2 Global Step: 118250 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:59:43,313-Speed 2626.35 samples/sec Loss 11.9724 LearningRate 0.0735 Epoch: 2 Global Step: 118260 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:59:47,204-Speed 2632.15 samples/sec Loss 11.7616 LearningRate 0.0735 Epoch: 2 Global Step: 118270 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 08:59:51,097-Speed 2631.61 samples/sec Loss 11.9315 LearningRate 0.0735 Epoch: 2 Global Step: 118280 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:59:54,989-Speed 2631.54 samples/sec Loss 11.7485 LearningRate 0.0735 Epoch: 2 Global Step: 118290 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 08:59:58,881-Speed 2631.76 samples/sec Loss 11.9002 LearningRate 0.0735 Epoch: 2 Global Step: 118300 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:00:02,775-Speed 2630.17 samples/sec Loss 11.9119 LearningRate 0.0735 Epoch: 2 Global Step: 118310 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:00:06,671-Speed 2628.37 samples/sec Loss 11.9136 LearningRate 0.0735 Epoch: 2 Global Step: 118320 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:00:10,568-Speed 2628.09 samples/sec Loss 11.9895 LearningRate 0.0735 Epoch: 2 Global Step: 118330 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:00:14,617-Speed 2530.09 samples/sec Loss 11.8178 LearningRate 0.0735 Epoch: 2 Global Step: 118340 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:00:18,683-Speed 2519.03 samples/sec Loss 11.7766 LearningRate 0.0735 Epoch: 2 Global Step: 118350 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:00:22,721-Speed 2536.93 samples/sec Loss 12.0801 LearningRate 0.0735 Epoch: 2 Global Step: 118360 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:00:26,684-Speed 2583.85 samples/sec Loss 11.9333 LearningRate 0.0735 Epoch: 2 Global Step: 118370 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:00:30,748-Speed 2520.74 samples/sec Loss 11.8618 LearningRate 0.0735 Epoch: 2 Global Step: 118380 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:00:34,826-Speed 2511.24 samples/sec Loss 11.9710 LearningRate 0.0735 Epoch: 2 Global Step: 118390 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:00:38,865-Speed 2536.05 samples/sec Loss 11.8974 LearningRate 0.0735 Epoch: 2 Global Step: 118400 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:00:42,758-Speed 2631.08 samples/sec Loss 11.9326 LearningRate 0.0735 Epoch: 2 Global Step: 118410 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:00:46,746-Speed 2568.34 samples/sec Loss 11.9914 LearningRate 0.0735 Epoch: 2 Global Step: 118420 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:00:50,641-Speed 2629.74 samples/sec Loss 11.8481 LearningRate 0.0735 Epoch: 2 Global Step: 118430 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:00:54,565-Speed 2610.75 samples/sec Loss 11.6769 LearningRate 0.0735 Epoch: 2 Global Step: 118440 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:00:58,455-Speed 2633.05 samples/sec Loss 11.8882 LearningRate 0.0735 Epoch: 2 Global Step: 118450 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:01:02,355-Speed 2626.51 samples/sec Loss 12.0606 LearningRate 0.0735 Epoch: 2 Global Step: 118460 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:01:06,248-Speed 2630.83 samples/sec Loss 11.8521 LearningRate 0.0735 Epoch: 2 Global Step: 118470 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:01:10,139-Speed 2631.64 samples/sec Loss 11.9110 LearningRate 0.0735 Epoch: 2 Global Step: 118480 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:01:14,016-Speed 2642.67 samples/sec Loss 12.0019 LearningRate 0.0735 Epoch: 2 Global Step: 118490 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:01:17,903-Speed 2634.77 samples/sec Loss 11.8310 LearningRate 0.0735 Epoch: 2 Global Step: 118500 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:01:21,803-Speed 2626.78 samples/sec Loss 11.8743 LearningRate 0.0735 Epoch: 2 Global Step: 118510 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:01:25,700-Speed 2627.79 samples/sec Loss 12.0285 LearningRate 0.0735 Epoch: 2 Global Step: 118520 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:01:29,598-Speed 2627.64 samples/sec Loss 11.9841 LearningRate 0.0735 Epoch: 2 Global Step: 118530 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:01:33,517-Speed 2613.32 samples/sec Loss 11.9995 LearningRate 0.0735 Epoch: 2 Global Step: 118540 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:01:37,409-Speed 2631.76 samples/sec Loss 12.0445 LearningRate 0.0735 Epoch: 2 Global Step: 118550 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:01:41,309-Speed 2626.37 samples/sec Loss 11.9895 LearningRate 0.0735 Epoch: 2 Global Step: 118560 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:01:45,205-Speed 2629.03 samples/sec Loss 11.8650 LearningRate 0.0735 Epoch: 2 Global Step: 118570 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:01:49,097-Speed 2631.99 samples/sec Loss 11.9302 LearningRate 0.0735 Epoch: 2 Global Step: 118580 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:01:52,991-Speed 2630.18 samples/sec Loss 11.9371 LearningRate 0.0735 Epoch: 2 Global Step: 118590 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:01:56,889-Speed 2627.83 samples/sec Loss 12.1032 LearningRate 0.0735 Epoch: 2 Global Step: 118600 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:02:00,803-Speed 2617.35 samples/sec Loss 11.9373 LearningRate 0.0734 Epoch: 2 Global Step: 118610 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:02:04,701-Speed 2627.46 samples/sec Loss 11.9241 LearningRate 0.0734 Epoch: 2 Global Step: 118620 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:02:08,584-Speed 2637.81 samples/sec Loss 11.8163 LearningRate 0.0734 Epoch: 2 Global Step: 118630 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:02:12,478-Speed 2629.89 samples/sec Loss 11.7069 LearningRate 0.0734 Epoch: 2 Global Step: 118640 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:02:16,475-Speed 2563.16 samples/sec Loss 11.7726 LearningRate 0.0734 Epoch: 2 Global Step: 118650 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:02:20,371-Speed 2629.30 samples/sec Loss 11.9193 LearningRate 0.0734 Epoch: 2 Global Step: 118660 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:02:24,266-Speed 2629.33 samples/sec Loss 11.7560 LearningRate 0.0734 Epoch: 2 Global Step: 118670 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:02:28,156-Speed 2633.22 samples/sec Loss 11.9829 LearningRate 0.0734 Epoch: 2 Global Step: 118680 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:02:32,063-Speed 2621.25 samples/sec Loss 12.0131 LearningRate 0.0734 Epoch: 2 Global Step: 118690 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:02:35,953-Speed 2634.00 samples/sec Loss 11.9088 LearningRate 0.0734 Epoch: 2 Global Step: 118700 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:02:39,844-Speed 2631.61 samples/sec Loss 11.9151 LearningRate 0.0734 Epoch: 2 Global Step: 118710 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:02:43,740-Speed 2628.93 samples/sec Loss 11.9617 LearningRate 0.0734 Epoch: 2 Global Step: 118720 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:02:47,632-Speed 2631.86 samples/sec Loss 11.8606 LearningRate 0.0734 Epoch: 2 Global Step: 118730 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:02:51,537-Speed 2623.08 samples/sec Loss 12.0077 LearningRate 0.0734 Epoch: 2 Global Step: 118740 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:02:55,438-Speed 2625.88 samples/sec Loss 11.7717 LearningRate 0.0734 Epoch: 2 Global Step: 118750 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:02:59,330-Speed 2631.55 samples/sec Loss 11.9851 LearningRate 0.0734 Epoch: 2 Global Step: 118760 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:03:03,239-Speed 2620.35 samples/sec Loss 11.8885 LearningRate 0.0734 Epoch: 2 Global Step: 118770 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:03:07,131-Speed 2631.58 samples/sec Loss 11.9190 LearningRate 0.0734 Epoch: 2 Global Step: 118780 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:03:11,025-Speed 2630.49 samples/sec Loss 11.9659 LearningRate 0.0734 Epoch: 2 Global Step: 118790 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:03:14,920-Speed 2629.72 samples/sec Loss 11.8391 LearningRate 0.0734 Epoch: 2 Global Step: 118800 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:03:18,814-Speed 2630.34 samples/sec Loss 11.8987 LearningRate 0.0734 Epoch: 2 Global Step: 118810 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:03:22,711-Speed 2628.60 samples/sec Loss 11.7759 LearningRate 0.0734 Epoch: 2 Global Step: 118820 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:03:26,609-Speed 2627.76 samples/sec Loss 11.8881 LearningRate 0.0734 Epoch: 2 Global Step: 118830 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:03:30,512-Speed 2623.76 samples/sec Loss 11.7901 LearningRate 0.0734 Epoch: 2 Global Step: 118840 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:03:34,396-Speed 2637.71 samples/sec Loss 12.0241 LearningRate 0.0734 Epoch: 2 Global Step: 118850 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:03:38,297-Speed 2625.69 samples/sec Loss 11.9551 LearningRate 0.0734 Epoch: 2 Global Step: 118860 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:03:42,222-Speed 2609.71 samples/sec Loss 11.9040 LearningRate 0.0734 Epoch: 2 Global Step: 118870 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:03:46,124-Speed 2624.49 samples/sec Loss 11.8104 LearningRate 0.0734 Epoch: 2 Global Step: 118880 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:03:49,990-Speed 2650.05 samples/sec Loss 11.9411 LearningRate 0.0734 Epoch: 2 Global Step: 118890 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:03:53,893-Speed 2623.56 samples/sec Loss 11.8937 LearningRate 0.0734 Epoch: 2 Global Step: 118900 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:03:57,802-Speed 2620.34 samples/sec Loss 11.9929 LearningRate 0.0734 Epoch: 2 Global Step: 118910 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:04:01,708-Speed 2622.59 samples/sec Loss 11.8118 LearningRate 0.0734 Epoch: 2 Global Step: 118920 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:04:05,609-Speed 2625.95 samples/sec Loss 12.0314 LearningRate 0.0734 Epoch: 2 Global Step: 118930 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:04:09,507-Speed 2627.12 samples/sec Loss 11.9818 LearningRate 0.0734 Epoch: 2 Global Step: 118940 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:04:13,411-Speed 2623.51 samples/sec Loss 11.8860 LearningRate 0.0734 Epoch: 2 Global Step: 118950 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:04:17,308-Speed 2628.53 samples/sec Loss 11.8929 LearningRate 0.0734 Epoch: 2 Global Step: 118960 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:04:21,203-Speed 2629.89 samples/sec Loss 11.7208 LearningRate 0.0734 Epoch: 2 Global Step: 118970 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:04:25,098-Speed 2629.68 samples/sec Loss 11.8515 LearningRate 0.0734 Epoch: 2 Global Step: 118980 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:04:28,998-Speed 2626.35 samples/sec Loss 11.9491 LearningRate 0.0734 Epoch: 2 Global Step: 118990 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:04:32,888-Speed 2633.44 samples/sec Loss 11.8419 LearningRate 0.0734 Epoch: 2 Global Step: 119000 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:04:36,805-Speed 2614.99 samples/sec Loss 11.8090 LearningRate 0.0734 Epoch: 2 Global Step: 119010 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:04:40,721-Speed 2615.21 samples/sec Loss 11.8018 LearningRate 0.0734 Epoch: 2 Global Step: 119020 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:04:44,644-Speed 2610.72 samples/sec Loss 11.8772 LearningRate 0.0734 Epoch: 2 Global Step: 119030 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:04:48,539-Speed 2630.31 samples/sec Loss 11.8381 LearningRate 0.0734 Epoch: 2 Global Step: 119040 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:04:52,433-Speed 2630.55 samples/sec Loss 11.7580 LearningRate 0.0734 Epoch: 2 Global Step: 119050 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:04:56,349-Speed 2615.48 samples/sec Loss 11.7459 LearningRate 0.0734 Epoch: 2 Global Step: 119060 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:05:00,258-Speed 2620.71 samples/sec Loss 11.9167 LearningRate 0.0734 Epoch: 2 Global Step: 119070 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:05:04,157-Speed 2626.73 samples/sec Loss 11.9433 LearningRate 0.0734 Epoch: 2 Global Step: 119080 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:05:08,048-Speed 2632.30 samples/sec Loss 11.9873 LearningRate 0.0733 Epoch: 2 Global Step: 119090 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:05:11,940-Speed 2632.25 samples/sec Loss 11.8174 LearningRate 0.0733 Epoch: 2 Global Step: 119100 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:05:15,835-Speed 2630.13 samples/sec Loss 11.8452 LearningRate 0.0733 Epoch: 2 Global Step: 119110 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:05:19,730-Speed 2629.73 samples/sec Loss 11.8233 LearningRate 0.0733 Epoch: 2 Global Step: 119120 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:05:23,636-Speed 2622.46 samples/sec Loss 11.7752 LearningRate 0.0733 Epoch: 2 Global Step: 119130 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:05:27,534-Speed 2627.39 samples/sec Loss 11.8988 LearningRate 0.0733 Epoch: 2 Global Step: 119140 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:05:31,426-Speed 2631.83 samples/sec Loss 11.8180 LearningRate 0.0733 Epoch: 2 Global Step: 119150 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:05:35,327-Speed 2625.64 samples/sec Loss 11.7974 LearningRate 0.0733 Epoch: 2 Global Step: 119160 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:05:39,223-Speed 2628.88 samples/sec Loss 11.9197 LearningRate 0.0733 Epoch: 2 Global Step: 119170 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:05:43,122-Speed 2626.59 samples/sec Loss 11.9297 LearningRate 0.0733 Epoch: 2 Global Step: 119180 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:05:47,029-Speed 2621.68 samples/sec Loss 11.8786 LearningRate 0.0733 Epoch: 2 Global Step: 119190 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:05:50,920-Speed 2632.95 samples/sec Loss 11.8557 LearningRate 0.0733 Epoch: 2 Global Step: 119200 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:05:54,805-Speed 2636.19 samples/sec Loss 11.8640 LearningRate 0.0733 Epoch: 2 Global Step: 119210 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:05:58,694-Speed 2634.34 samples/sec Loss 11.9833 LearningRate 0.0733 Epoch: 2 Global Step: 119220 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:06:02,574-Speed 2639.18 samples/sec Loss 11.8615 LearningRate 0.0733 Epoch: 2 Global Step: 119230 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:06:06,469-Speed 2629.71 samples/sec Loss 11.8564 LearningRate 0.0733 Epoch: 2 Global Step: 119240 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:06:10,365-Speed 2628.65 samples/sec Loss 11.9028 LearningRate 0.0733 Epoch: 2 Global Step: 119250 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:06:14,276-Speed 2619.00 samples/sec Loss 11.9499 LearningRate 0.0733 Epoch: 2 Global Step: 119260 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:06:18,169-Speed 2631.00 samples/sec Loss 11.9006 LearningRate 0.0733 Epoch: 2 Global Step: 119270 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:06:22,062-Speed 2631.25 samples/sec Loss 11.9585 LearningRate 0.0733 Epoch: 2 Global Step: 119280 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:06:25,961-Speed 2627.18 samples/sec Loss 11.8451 LearningRate 0.0733 Epoch: 2 Global Step: 119290 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:06:29,853-Speed 2632.01 samples/sec Loss 11.9082 LearningRate 0.0733 Epoch: 2 Global Step: 119300 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:06:33,746-Speed 2631.08 samples/sec Loss 11.7541 LearningRate 0.0733 Epoch: 2 Global Step: 119310 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:06:37,637-Speed 2632.16 samples/sec Loss 11.9750 LearningRate 0.0733 Epoch: 2 Global Step: 119320 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:06:41,560-Speed 2610.82 samples/sec Loss 11.9140 LearningRate 0.0733 Epoch: 2 Global Step: 119330 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:06:45,458-Speed 2627.38 samples/sec Loss 11.8632 LearningRate 0.0733 Epoch: 2 Global Step: 119340 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:06:49,384-Speed 2608.77 samples/sec Loss 11.9346 LearningRate 0.0733 Epoch: 2 Global Step: 119350 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:06:53,330-Speed 2596.52 samples/sec Loss 11.7785 LearningRate 0.0733 Epoch: 2 Global Step: 119360 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:06:57,220-Speed 2632.88 samples/sec Loss 11.7927 LearningRate 0.0733 Epoch: 2 Global Step: 119370 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:07:01,137-Speed 2616.11 samples/sec Loss 11.9336 LearningRate 0.0733 Epoch: 2 Global Step: 119380 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:07:05,035-Speed 2627.98 samples/sec Loss 12.0248 LearningRate 0.0733 Epoch: 2 Global Step: 119390 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:07:08,953-Speed 2613.98 samples/sec Loss 11.8378 LearningRate 0.0733 Epoch: 2 Global Step: 119400 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:07:12,845-Speed 2631.32 samples/sec Loss 11.8458 LearningRate 0.0733 Epoch: 2 Global Step: 119410 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:07:16,739-Speed 2630.78 samples/sec Loss 11.8719 LearningRate 0.0733 Epoch: 2 Global Step: 119420 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:07:20,637-Speed 2628.07 samples/sec Loss 11.8802 LearningRate 0.0733 Epoch: 2 Global Step: 119430 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:07:24,537-Speed 2626.08 samples/sec Loss 11.7986 LearningRate 0.0733 Epoch: 2 Global Step: 119440 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:07:28,432-Speed 2629.76 samples/sec Loss 12.0575 LearningRate 0.0733 Epoch: 2 Global Step: 119450 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:07:32,330-Speed 2627.91 samples/sec Loss 11.8551 LearningRate 0.0733 Epoch: 2 Global Step: 119460 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:07:36,224-Speed 2630.02 samples/sec Loss 11.9343 LearningRate 0.0733 Epoch: 2 Global Step: 119470 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:07:40,103-Speed 2640.50 samples/sec Loss 11.8946 LearningRate 0.0733 Epoch: 2 Global Step: 119480 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:07:44,043-Speed 2599.49 samples/sec Loss 11.9431 LearningRate 0.0733 Epoch: 2 Global Step: 119490 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:07:47,952-Speed 2620.56 samples/sec Loss 11.8454 LearningRate 0.0733 Epoch: 2 Global Step: 119500 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:07:51,846-Speed 2630.82 samples/sec Loss 11.8195 LearningRate 0.0733 Epoch: 2 Global Step: 119510 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:07:55,743-Speed 2627.69 samples/sec Loss 11.9112 LearningRate 0.0733 Epoch: 2 Global Step: 119520 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:07:59,638-Speed 2630.53 samples/sec Loss 11.9521 LearningRate 0.0733 Epoch: 2 Global Step: 119530 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:08:03,530-Speed 2631.48 samples/sec Loss 11.9301 LearningRate 0.0733 Epoch: 2 Global Step: 119540 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:08:07,421-Speed 2632.40 samples/sec Loss 11.9364 LearningRate 0.0733 Epoch: 2 Global Step: 119550 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:08:11,314-Speed 2630.42 samples/sec Loss 11.9972 LearningRate 0.0733 Epoch: 2 Global Step: 119560 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:08:15,220-Speed 2622.76 samples/sec Loss 11.9433 LearningRate 0.0733 Epoch: 2 Global Step: 119570 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:08:19,115-Speed 2629.09 samples/sec Loss 11.9360 LearningRate 0.0732 Epoch: 2 Global Step: 119580 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:08:22,973-Speed 2654.95 samples/sec Loss 11.8682 LearningRate 0.0732 Epoch: 2 Global Step: 119590 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:08:26,871-Speed 2627.89 samples/sec Loss 11.8578 LearningRate 0.0732 Epoch: 2 Global Step: 119600 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:08:30,763-Speed 2632.17 samples/sec Loss 11.8526 LearningRate 0.0732 Epoch: 2 Global Step: 119610 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:08:34,652-Speed 2633.14 samples/sec Loss 11.9640 LearningRate 0.0732 Epoch: 2 Global Step: 119620 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:08:38,544-Speed 2631.83 samples/sec Loss 11.9566 LearningRate 0.0732 Epoch: 2 Global Step: 119630 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:08:42,438-Speed 2630.07 samples/sec Loss 11.8657 LearningRate 0.0732 Epoch: 2 Global Step: 119640 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:08:46,328-Speed 2632.85 samples/sec Loss 11.9772 LearningRate 0.0732 Epoch: 2 Global Step: 119650 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:08:50,221-Speed 2630.97 samples/sec Loss 11.9010 LearningRate 0.0732 Epoch: 2 Global Step: 119660 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:08:54,114-Speed 2631.86 samples/sec Loss 11.8531 LearningRate 0.0732 Epoch: 2 Global Step: 119670 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:08:58,005-Speed 2631.91 samples/sec Loss 11.9425 LearningRate 0.0732 Epoch: 2 Global Step: 119680 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:09:01,901-Speed 2629.44 samples/sec Loss 12.0005 LearningRate 0.0732 Epoch: 2 Global Step: 119690 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:09:05,793-Speed 2631.30 samples/sec Loss 11.8622 LearningRate 0.0732 Epoch: 2 Global Step: 119700 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:09:09,686-Speed 2631.54 samples/sec Loss 11.9400 LearningRate 0.0732 Epoch: 2 Global Step: 119710 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:09:13,585-Speed 2626.79 samples/sec Loss 11.7860 LearningRate 0.0732 Epoch: 2 Global Step: 119720 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:09:17,478-Speed 2630.20 samples/sec Loss 11.8984 LearningRate 0.0732 Epoch: 2 Global Step: 119730 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:09:21,384-Speed 2622.50 samples/sec Loss 11.8313 LearningRate 0.0732 Epoch: 2 Global Step: 119740 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:09:25,270-Speed 2636.04 samples/sec Loss 12.0126 LearningRate 0.0732 Epoch: 2 Global Step: 119750 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:09:29,168-Speed 2627.21 samples/sec Loss 11.8969 LearningRate 0.0732 Epoch: 2 Global Step: 119760 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:09:33,067-Speed 2627.31 samples/sec Loss 11.7784 LearningRate 0.0732 Epoch: 2 Global Step: 119770 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:09:36,962-Speed 2629.53 samples/sec Loss 11.8658 LearningRate 0.0732 Epoch: 2 Global Step: 119780 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:09:40,854-Speed 2632.07 samples/sec Loss 11.9293 LearningRate 0.0732 Epoch: 2 Global Step: 119790 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:09:44,747-Speed 2631.04 samples/sec Loss 11.8130 LearningRate 0.0732 Epoch: 2 Global Step: 119800 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:09:48,640-Speed 2630.66 samples/sec Loss 12.0266 LearningRate 0.0732 Epoch: 2 Global Step: 119810 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:09:52,531-Speed 2631.80 samples/sec Loss 11.9329 LearningRate 0.0732 Epoch: 2 Global Step: 119820 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:09:56,473-Speed 2598.60 samples/sec Loss 11.9035 LearningRate 0.0732 Epoch: 2 Global Step: 119830 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:10:00,373-Speed 2626.83 samples/sec Loss 11.8449 LearningRate 0.0732 Epoch: 2 Global Step: 119840 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:10:04,267-Speed 2630.42 samples/sec Loss 11.8094 LearningRate 0.0732 Epoch: 2 Global Step: 119850 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:10:08,170-Speed 2624.07 samples/sec Loss 11.8808 LearningRate 0.0732 Epoch: 2 Global Step: 119860 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:10:12,069-Speed 2627.55 samples/sec Loss 11.8251 LearningRate 0.0732 Epoch: 2 Global Step: 119870 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:10:15,968-Speed 2626.49 samples/sec Loss 11.8827 LearningRate 0.0732 Epoch: 2 Global Step: 119880 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:10:19,868-Speed 2626.21 samples/sec Loss 11.9792 LearningRate 0.0732 Epoch: 2 Global Step: 119890 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:10:23,768-Speed 2626.45 samples/sec Loss 11.8391 LearningRate 0.0732 Epoch: 2 Global Step: 119900 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:10:27,675-Speed 2621.39 samples/sec Loss 11.7490 LearningRate 0.0732 Epoch: 2 Global Step: 119910 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:10:31,586-Speed 2618.90 samples/sec Loss 11.9006 LearningRate 0.0732 Epoch: 2 Global Step: 119920 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:10:35,489-Speed 2623.93 samples/sec Loss 11.8699 LearningRate 0.0732 Epoch: 2 Global Step: 119930 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:10:39,386-Speed 2628.67 samples/sec Loss 11.8096 LearningRate 0.0732 Epoch: 2 Global Step: 119940 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:10:43,279-Speed 2630.44 samples/sec Loss 11.7952 LearningRate 0.0732 Epoch: 2 Global Step: 119950 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:10:47,180-Speed 2625.66 samples/sec Loss 11.9398 LearningRate 0.0732 Epoch: 2 Global Step: 119960 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:10:51,072-Speed 2631.42 samples/sec Loss 11.7875 LearningRate 0.0732 Epoch: 2 Global Step: 119970 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:10:54,969-Speed 2628.57 samples/sec Loss 11.7599 LearningRate 0.0732 Epoch: 2 Global Step: 119980 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:10:58,863-Speed 2630.14 samples/sec Loss 11.8931 LearningRate 0.0732 Epoch: 2 Global Step: 119990 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:11:02,757-Speed 2630.58 samples/sec Loss 11.8223 LearningRate 0.0732 Epoch: 2 Global Step: 120000 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:11:45,583-[lfw][120000]XNorm: 23.355378
Training: 2022-04-13 09:11:45,584-[lfw][120000]Accuracy-Flip: 0.99750+-0.00250
Training: 2022-04-13 09:11:45,585-[lfw][120000]Accuracy-Highest: 0.99783
Training: 2022-04-13 09:12:35,723-[cfp_fp][120000]XNorm: 21.516966
Training: 2022-04-13 09:12:35,724-[cfp_fp][120000]Accuracy-Flip: 0.97971+-0.00790
Training: 2022-04-13 09:12:35,724-[cfp_fp][120000]Accuracy-Highest: 0.97986
Training: 2022-04-13 09:13:18,693-[agedb_30][120000]XNorm: 23.355863
Training: 2022-04-13 09:13:18,694-[agedb_30][120000]Accuracy-Flip: 0.96800+-0.00852
Training: 2022-04-13 09:13:18,694-[agedb_30][120000]Accuracy-Highest: 0.96800
Training: 2022-04-13 09:13:22,572-Speed 73.24 samples/sec Loss 11.7953 LearningRate 0.0732 Epoch: 2 Global Step: 120010 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:13:26,437-Speed 2650.30 samples/sec Loss 11.9058 LearningRate 0.0732 Epoch: 2 Global Step: 120020 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:13:30,304-Speed 2648.67 samples/sec Loss 11.8279 LearningRate 0.0732 Epoch: 2 Global Step: 120030 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:13:34,162-Speed 2654.41 samples/sec Loss 11.8315 LearningRate 0.0732 Epoch: 2 Global Step: 120040 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:13:38,042-Speed 2640.03 samples/sec Loss 11.9734 LearningRate 0.0732 Epoch: 2 Global Step: 120050 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:13:41,923-Speed 2638.97 samples/sec Loss 11.8406 LearningRate 0.0731 Epoch: 2 Global Step: 120060 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:13:45,809-Speed 2636.01 samples/sec Loss 11.8424 LearningRate 0.0731 Epoch: 2 Global Step: 120070 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:13:49,710-Speed 2625.62 samples/sec Loss 11.8074 LearningRate 0.0731 Epoch: 2 Global Step: 120080 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:13:53,585-Speed 2643.00 samples/sec Loss 11.9036 LearningRate 0.0731 Epoch: 2 Global Step: 120090 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:13:57,466-Speed 2638.65 samples/sec Loss 11.8632 LearningRate 0.0731 Epoch: 2 Global Step: 120100 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:14:01,408-Speed 2598.70 samples/sec Loss 11.8836 LearningRate 0.0731 Epoch: 2 Global Step: 120110 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:14:05,362-Speed 2590.47 samples/sec Loss 11.8211 LearningRate 0.0731 Epoch: 2 Global Step: 120120 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:14:09,263-Speed 2625.53 samples/sec Loss 11.7182 LearningRate 0.0731 Epoch: 2 Global Step: 120130 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:14:13,150-Speed 2634.57 samples/sec Loss 11.7325 LearningRate 0.0731 Epoch: 2 Global Step: 120140 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:14:17,090-Speed 2599.95 samples/sec Loss 11.9489 LearningRate 0.0731 Epoch: 2 Global Step: 120150 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:14:20,995-Speed 2623.18 samples/sec Loss 11.8308 LearningRate 0.0731 Epoch: 2 Global Step: 120160 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:14:24,893-Speed 2627.45 samples/sec Loss 11.9262 LearningRate 0.0731 Epoch: 2 Global Step: 120170 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:14:28,792-Speed 2626.58 samples/sec Loss 11.7641 LearningRate 0.0731 Epoch: 2 Global Step: 120180 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:14:32,681-Speed 2633.72 samples/sec Loss 11.8209 LearningRate 0.0731 Epoch: 2 Global Step: 120190 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:14:36,555-Speed 2644.21 samples/sec Loss 11.8369 LearningRate 0.0731 Epoch: 2 Global Step: 120200 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:14:40,464-Speed 2619.43 samples/sec Loss 11.9501 LearningRate 0.0731 Epoch: 2 Global Step: 120210 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:14:44,359-Speed 2630.00 samples/sec Loss 11.9429 LearningRate 0.0731 Epoch: 2 Global Step: 120220 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:14:48,256-Speed 2627.96 samples/sec Loss 11.9547 LearningRate 0.0731 Epoch: 2 Global Step: 120230 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:14:52,157-Speed 2626.06 samples/sec Loss 11.8869 LearningRate 0.0731 Epoch: 2 Global Step: 120240 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:14:56,043-Speed 2635.88 samples/sec Loss 12.0201 LearningRate 0.0731 Epoch: 2 Global Step: 120250 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:14:59,928-Speed 2636.25 samples/sec Loss 11.8795 LearningRate 0.0731 Epoch: 2 Global Step: 120260 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:15:03,818-Speed 2632.43 samples/sec Loss 11.9430 LearningRate 0.0731 Epoch: 2 Global Step: 120270 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:15:07,711-Speed 2631.45 samples/sec Loss 11.9478 LearningRate 0.0731 Epoch: 2 Global Step: 120280 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:15:11,630-Speed 2612.97 samples/sec Loss 11.8194 LearningRate 0.0731 Epoch: 2 Global Step: 120290 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:15:15,528-Speed 2627.94 samples/sec Loss 11.8980 LearningRate 0.0731 Epoch: 2 Global Step: 120300 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:15:19,418-Speed 2632.57 samples/sec Loss 11.7210 LearningRate 0.0731 Epoch: 2 Global Step: 120310 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:15:23,327-Speed 2620.71 samples/sec Loss 11.9406 LearningRate 0.0731 Epoch: 2 Global Step: 120320 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:15:27,236-Speed 2620.00 samples/sec Loss 11.9140 LearningRate 0.0731 Epoch: 2 Global Step: 120330 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:15:31,119-Speed 2638.23 samples/sec Loss 11.8058 LearningRate 0.0731 Epoch: 2 Global Step: 120340 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:15:35,009-Speed 2632.76 samples/sec Loss 11.9221 LearningRate 0.0731 Epoch: 2 Global Step: 120350 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:15:38,902-Speed 2630.81 samples/sec Loss 11.8298 LearningRate 0.0731 Epoch: 2 Global Step: 120360 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:15:42,791-Speed 2633.33 samples/sec Loss 11.9415 LearningRate 0.0731 Epoch: 2 Global Step: 120370 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:15:46,686-Speed 2629.80 samples/sec Loss 11.7900 LearningRate 0.0731 Epoch: 2 Global Step: 120380 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:15:50,576-Speed 2632.53 samples/sec Loss 11.9282 LearningRate 0.0731 Epoch: 2 Global Step: 120390 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:15:54,466-Speed 2633.46 samples/sec Loss 11.9167 LearningRate 0.0731 Epoch: 2 Global Step: 120400 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:15:58,346-Speed 2639.45 samples/sec Loss 11.7739 LearningRate 0.0731 Epoch: 2 Global Step: 120410 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:16:02,240-Speed 2630.37 samples/sec Loss 11.9026 LearningRate 0.0731 Epoch: 2 Global Step: 120420 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:16:06,132-Speed 2631.88 samples/sec Loss 11.9568 LearningRate 0.0731 Epoch: 2 Global Step: 120430 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:16:10,023-Speed 2632.47 samples/sec Loss 11.8828 LearningRate 0.0731 Epoch: 2 Global Step: 120440 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:16:13,924-Speed 2625.10 samples/sec Loss 11.8330 LearningRate 0.0731 Epoch: 2 Global Step: 120450 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:16:17,816-Speed 2631.77 samples/sec Loss 11.6569 LearningRate 0.0731 Epoch: 2 Global Step: 120460 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:16:21,703-Speed 2635.72 samples/sec Loss 11.8438 LearningRate 0.0731 Epoch: 2 Global Step: 120470 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:16:25,605-Speed 2624.80 samples/sec Loss 11.7752 LearningRate 0.0731 Epoch: 2 Global Step: 120480 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:16:29,499-Speed 2629.77 samples/sec Loss 11.9160 LearningRate 0.0731 Epoch: 2 Global Step: 120490 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:16:33,398-Speed 2626.96 samples/sec Loss 11.7556 LearningRate 0.0731 Epoch: 2 Global Step: 120500 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:16:37,291-Speed 2630.75 samples/sec Loss 11.8437 LearningRate 0.0731 Epoch: 2 Global Step: 120510 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:16:41,161-Speed 2646.88 samples/sec Loss 11.8332 LearningRate 0.0731 Epoch: 2 Global Step: 120520 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:16:45,031-Speed 2647.07 samples/sec Loss 11.9421 LearningRate 0.0731 Epoch: 2 Global Step: 120530 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:16:48,903-Speed 2645.14 samples/sec Loss 11.8893 LearningRate 0.0731 Epoch: 2 Global Step: 120540 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:16:52,924-Speed 2555.98 samples/sec Loss 11.9597 LearningRate 0.0730 Epoch: 2 Global Step: 120550 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:16:56,811-Speed 2634.84 samples/sec Loss 11.9002 LearningRate 0.0730 Epoch: 2 Global Step: 120560 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:17:00,699-Speed 2634.40 samples/sec Loss 11.9216 LearningRate 0.0730 Epoch: 2 Global Step: 120570 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:17:04,595-Speed 2628.68 samples/sec Loss 11.7957 LearningRate 0.0730 Epoch: 2 Global Step: 120580 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:17:08,506-Speed 2618.49 samples/sec Loss 11.9622 LearningRate 0.0730 Epoch: 2 Global Step: 120590 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:17:12,410-Speed 2623.87 samples/sec Loss 11.7891 LearningRate 0.0730 Epoch: 2 Global Step: 120600 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:17:16,309-Speed 2638.96 samples/sec Loss 11.9462 LearningRate 0.0730 Epoch: 2 Global Step: 120610 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:17:20,204-Speed 2629.66 samples/sec Loss 11.7573 LearningRate 0.0730 Epoch: 2 Global Step: 120620 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:17:24,145-Speed 2630.56 samples/sec Loss 11.8937 LearningRate 0.0730 Epoch: 2 Global Step: 120630 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:17:28,050-Speed 2622.42 samples/sec Loss 11.9023 LearningRate 0.0730 Epoch: 2 Global Step: 120640 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:17:31,943-Speed 2638.99 samples/sec Loss 11.7641 LearningRate 0.0730 Epoch: 2 Global Step: 120650 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:17:35,830-Speed 2634.59 samples/sec Loss 11.7250 LearningRate 0.0730 Epoch: 2 Global Step: 120660 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:17:39,797-Speed 2581.77 samples/sec Loss 11.9324 LearningRate 0.0730 Epoch: 2 Global Step: 120670 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:17:43,743-Speed 2595.64 samples/sec Loss 11.8826 LearningRate 0.0730 Epoch: 2 Global Step: 120680 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:17:47,740-Speed 2640.96 samples/sec Loss 11.8526 LearningRate 0.0730 Epoch: 2 Global Step: 120690 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:17:51,889-Speed 2641.89 samples/sec Loss 11.8919 LearningRate 0.0730 Epoch: 2 Global Step: 120700 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:17:55,807-Speed 2613.88 samples/sec Loss 11.8648 LearningRate 0.0730 Epoch: 2 Global Step: 120710 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:18:00,106-Speed 2634.30 samples/sec Loss 11.8405 LearningRate 0.0730 Epoch: 2 Global Step: 120720 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:18:04,005-Speed 2626.65 samples/sec Loss 11.8168 LearningRate 0.0730 Epoch: 2 Global Step: 120730 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:18:07,936-Speed 2605.87 samples/sec Loss 11.9654 LearningRate 0.0730 Epoch: 2 Global Step: 120740 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:18:11,843-Speed 2622.16 samples/sec Loss 11.8105 LearningRate 0.0730 Epoch: 2 Global Step: 120750 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:18:15,742-Speed 2626.35 samples/sec Loss 11.9260 LearningRate 0.0730 Epoch: 2 Global Step: 120760 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:18:19,620-Speed 2641.13 samples/sec Loss 11.7581 LearningRate 0.0730 Epoch: 2 Global Step: 120770 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:18:23,521-Speed 2625.39 samples/sec Loss 12.0620 LearningRate 0.0730 Epoch: 2 Global Step: 120780 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:18:27,453-Speed 2605.17 samples/sec Loss 12.3688 LearningRate 0.0730 Epoch: 2 Global Step: 120790 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:18:31,359-Speed 2622.48 samples/sec Loss 11.9747 LearningRate 0.0730 Epoch: 2 Global Step: 120800 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:18:35,253-Speed 2630.41 samples/sec Loss 12.0341 LearningRate 0.0730 Epoch: 2 Global Step: 120810 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:18:39,154-Speed 2625.92 samples/sec Loss 11.9469 LearningRate 0.0730 Epoch: 2 Global Step: 120820 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:18:43,057-Speed 2624.10 samples/sec Loss 11.8490 LearningRate 0.0730 Epoch: 2 Global Step: 120830 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:18:46,960-Speed 2624.25 samples/sec Loss 11.6752 LearningRate 0.0730 Epoch: 2 Global Step: 120840 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:18:50,863-Speed 2623.84 samples/sec Loss 11.9498 LearningRate 0.0730 Epoch: 2 Global Step: 120850 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:18:54,769-Speed 2622.38 samples/sec Loss 11.7887 LearningRate 0.0730 Epoch: 2 Global Step: 120860 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:18:58,673-Speed 2624.23 samples/sec Loss 11.9175 LearningRate 0.0730 Epoch: 2 Global Step: 120870 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:19:02,579-Speed 2621.65 samples/sec Loss 11.8170 LearningRate 0.0730 Epoch: 2 Global Step: 120880 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:19:06,484-Speed 2623.69 samples/sec Loss 11.7956 LearningRate 0.0730 Epoch: 2 Global Step: 120890 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:19:10,387-Speed 2623.74 samples/sec Loss 11.8179 LearningRate 0.0730 Epoch: 2 Global Step: 120900 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:19:14,287-Speed 2626.28 samples/sec Loss 11.9583 LearningRate 0.0730 Epoch: 2 Global Step: 120910 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:19:18,181-Speed 2630.30 samples/sec Loss 11.8592 LearningRate 0.0730 Epoch: 2 Global Step: 120920 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:19:22,081-Speed 2626.36 samples/sec Loss 11.9992 LearningRate 0.0730 Epoch: 2 Global Step: 120930 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:19:25,977-Speed 2629.35 samples/sec Loss 11.8931 LearningRate 0.0730 Epoch: 2 Global Step: 120940 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:19:29,874-Speed 2628.24 samples/sec Loss 12.0012 LearningRate 0.0730 Epoch: 2 Global Step: 120950 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:19:33,777-Speed 2624.03 samples/sec Loss 11.9725 LearningRate 0.0730 Epoch: 2 Global Step: 120960 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:19:37,676-Speed 2627.70 samples/sec Loss 11.8555 LearningRate 0.0730 Epoch: 2 Global Step: 120970 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:19:41,571-Speed 2629.21 samples/sec Loss 11.7684 LearningRate 0.0730 Epoch: 2 Global Step: 120980 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:19:45,470-Speed 2626.50 samples/sec Loss 11.9957 LearningRate 0.0730 Epoch: 2 Global Step: 120990 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:19:49,371-Speed 2625.95 samples/sec Loss 11.8924 LearningRate 0.0730 Epoch: 2 Global Step: 121000 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:19:53,249-Speed 2641.28 samples/sec Loss 11.7337 LearningRate 0.0730 Epoch: 2 Global Step: 121010 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:19:57,144-Speed 2630.26 samples/sec Loss 11.7660 LearningRate 0.0730 Epoch: 2 Global Step: 121020 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:20:01,058-Speed 2616.82 samples/sec Loss 11.9353 LearningRate 0.0729 Epoch: 2 Global Step: 121030 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:20:04,967-Speed 2620.13 samples/sec Loss 11.7850 LearningRate 0.0729 Epoch: 2 Global Step: 121040 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:20:08,863-Speed 2629.29 samples/sec Loss 11.9033 LearningRate 0.0729 Epoch: 2 Global Step: 121050 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:20:12,760-Speed 2628.49 samples/sec Loss 11.7507 LearningRate 0.0729 Epoch: 2 Global Step: 121060 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:20:16,657-Speed 2627.87 samples/sec Loss 11.8643 LearningRate 0.0729 Epoch: 2 Global Step: 121070 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:20:20,556-Speed 2627.09 samples/sec Loss 11.6592 LearningRate 0.0729 Epoch: 2 Global Step: 121080 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:20:24,451-Speed 2629.39 samples/sec Loss 11.8951 LearningRate 0.0729 Epoch: 2 Global Step: 121090 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:20:28,331-Speed 2640.42 samples/sec Loss 11.8588 LearningRate 0.0729 Epoch: 2 Global Step: 121100 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:20:32,216-Speed 2636.50 samples/sec Loss 11.9981 LearningRate 0.0729 Epoch: 2 Global Step: 121110 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:20:36,110-Speed 2629.98 samples/sec Loss 11.7821 LearningRate 0.0729 Epoch: 2 Global Step: 121120 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:20:40,005-Speed 2629.81 samples/sec Loss 11.9489 LearningRate 0.0729 Epoch: 2 Global Step: 121130 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:20:43,912-Speed 2620.98 samples/sec Loss 11.8071 LearningRate 0.0729 Epoch: 2 Global Step: 121140 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:20:47,823-Speed 2619.38 samples/sec Loss 11.9687 LearningRate 0.0729 Epoch: 2 Global Step: 121150 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:20:51,731-Speed 2620.81 samples/sec Loss 11.9156 LearningRate 0.0729 Epoch: 2 Global Step: 121160 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:20:55,638-Speed 2621.78 samples/sec Loss 11.6326 LearningRate 0.0729 Epoch: 2 Global Step: 121170 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:20:59,543-Speed 2622.60 samples/sec Loss 11.9023 LearningRate 0.0729 Epoch: 2 Global Step: 121180 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:21:03,440-Speed 2628.58 samples/sec Loss 11.8131 LearningRate 0.0729 Epoch: 2 Global Step: 121190 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:21:07,335-Speed 2629.59 samples/sec Loss 11.7590 LearningRate 0.0729 Epoch: 2 Global Step: 121200 Fp16 Grad Scale: 32768 Required: 80 hours
Training: 2022-04-13 09:21:11,228-Speed 2630.88 samples/sec Loss 11.7416 LearningRate 0.0729 Epoch: 2 Global Step: 121210 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:21:15,135-Speed 2621.82 samples/sec Loss 11.8528 LearningRate 0.0729 Epoch: 2 Global Step: 121220 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:21:19,027-Speed 2631.43 samples/sec Loss 11.7822 LearningRate 0.0729 Epoch: 2 Global Step: 121230 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:21:22,935-Speed 2621.34 samples/sec Loss 11.7380 LearningRate 0.0729 Epoch: 2 Global Step: 121240 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:21:26,836-Speed 2625.41 samples/sec Loss 11.8334 LearningRate 0.0729 Epoch: 2 Global Step: 121250 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:21:30,733-Speed 2629.01 samples/sec Loss 11.7743 LearningRate 0.0729 Epoch: 2 Global Step: 121260 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:21:34,628-Speed 2629.48 samples/sec Loss 11.7383 LearningRate 0.0729 Epoch: 2 Global Step: 121270 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:21:38,525-Speed 2627.57 samples/sec Loss 11.6860 LearningRate 0.0729 Epoch: 2 Global Step: 121280 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:21:42,420-Speed 2629.50 samples/sec Loss 11.7783 LearningRate 0.0729 Epoch: 2 Global Step: 121290 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:21:46,327-Speed 2621.76 samples/sec Loss 11.9261 LearningRate 0.0729 Epoch: 2 Global Step: 121300 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:21:50,239-Speed 2618.52 samples/sec Loss 11.7621 LearningRate 0.0729 Epoch: 2 Global Step: 121310 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:21:54,141-Speed 2625.35 samples/sec Loss 12.0549 LearningRate 0.0729 Epoch: 2 Global Step: 121320 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:21:58,039-Speed 2627.71 samples/sec Loss 11.7720 LearningRate 0.0729 Epoch: 2 Global Step: 121330 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:22:01,937-Speed 2627.63 samples/sec Loss 11.8362 LearningRate 0.0729 Epoch: 2 Global Step: 121340 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:22:05,834-Speed 2628.38 samples/sec Loss 11.8056 LearningRate 0.0729 Epoch: 2 Global Step: 121350 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:22:09,734-Speed 2625.47 samples/sec Loss 12.0086 LearningRate 0.0729 Epoch: 2 Global Step: 121360 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:22:13,635-Speed 2625.91 samples/sec Loss 11.8193 LearningRate 0.0729 Epoch: 2 Global Step: 121370 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:22:17,532-Speed 2628.24 samples/sec Loss 11.7585 LearningRate 0.0729 Epoch: 2 Global Step: 121380 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:22:21,436-Speed 2623.59 samples/sec Loss 11.6961 LearningRate 0.0729 Epoch: 2 Global Step: 121390 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:22:25,337-Speed 2625.78 samples/sec Loss 11.8371 LearningRate 0.0729 Epoch: 2 Global Step: 121400 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:22:29,234-Speed 2628.41 samples/sec Loss 11.8788 LearningRate 0.0729 Epoch: 2 Global Step: 121410 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:22:33,131-Speed 2628.22 samples/sec Loss 11.7448 LearningRate 0.0729 Epoch: 2 Global Step: 121420 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:22:37,012-Speed 2639.38 samples/sec Loss 11.9600 LearningRate 0.0729 Epoch: 2 Global Step: 121430 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:22:40,913-Speed 2624.97 samples/sec Loss 11.9530 LearningRate 0.0729 Epoch: 2 Global Step: 121440 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:22:44,822-Speed 2620.16 samples/sec Loss 11.9358 LearningRate 0.0729 Epoch: 2 Global Step: 121450 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:22:48,728-Speed 2621.82 samples/sec Loss 11.9734 LearningRate 0.0729 Epoch: 2 Global Step: 121460 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:22:52,644-Speed 2616.32 samples/sec Loss 11.9264 LearningRate 0.0729 Epoch: 2 Global Step: 121470 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:22:56,555-Speed 2618.65 samples/sec Loss 11.8084 LearningRate 0.0729 Epoch: 2 Global Step: 121480 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:23:00,468-Speed 2618.33 samples/sec Loss 11.9569 LearningRate 0.0729 Epoch: 2 Global Step: 121490 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:23:04,352-Speed 2637.09 samples/sec Loss 11.8706 LearningRate 0.0729 Epoch: 2 Global Step: 121500 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:23:08,260-Speed 2620.89 samples/sec Loss 11.9519 LearningRate 0.0729 Epoch: 2 Global Step: 121510 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:23:12,173-Speed 2617.11 samples/sec Loss 11.8341 LearningRate 0.0728 Epoch: 2 Global Step: 121520 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:23:16,078-Speed 2623.26 samples/sec Loss 11.6919 LearningRate 0.0728 Epoch: 2 Global Step: 121530 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:23:19,977-Speed 2626.64 samples/sec Loss 11.8432 LearningRate 0.0728 Epoch: 2 Global Step: 121540 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:23:23,878-Speed 2626.22 samples/sec Loss 11.9429 LearningRate 0.0728 Epoch: 2 Global Step: 121550 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:23:27,781-Speed 2623.70 samples/sec Loss 11.8419 LearningRate 0.0728 Epoch: 2 Global Step: 121560 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:23:31,687-Speed 2622.91 samples/sec Loss 11.8540 LearningRate 0.0728 Epoch: 2 Global Step: 121570 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:23:35,588-Speed 2625.05 samples/sec Loss 11.8310 LearningRate 0.0728 Epoch: 2 Global Step: 121580 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:23:39,491-Speed 2624.39 samples/sec Loss 11.8564 LearningRate 0.0728 Epoch: 2 Global Step: 121590 Fp16 Grad Scale: 65536 Required: 80 hours
Training: 2022-04-13 09:23:43,393-Speed 2624.68 samples/sec Loss 11.8116 LearningRate 0.0728 Epoch: 2 Global Step: 121600 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:23:47,291-Speed 2627.93 samples/sec Loss 11.9172 LearningRate 0.0728 Epoch: 2 Global Step: 121610 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:23:51,190-Speed 2626.88 samples/sec Loss 11.7626 LearningRate 0.0728 Epoch: 2 Global Step: 121620 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:23:55,091-Speed 2625.60 samples/sec Loss 11.8741 LearningRate 0.0728 Epoch: 2 Global Step: 121630 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:23:59,021-Speed 2606.47 samples/sec Loss 11.8300 LearningRate 0.0728 Epoch: 2 Global Step: 121640 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:24:02,922-Speed 2625.97 samples/sec Loss 11.8681 LearningRate 0.0728 Epoch: 2 Global Step: 121650 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:24:06,950-Speed 2542.97 samples/sec Loss 11.7689 LearningRate 0.0728 Epoch: 2 Global Step: 121660 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:24:10,843-Speed 2630.70 samples/sec Loss 11.9191 LearningRate 0.0728 Epoch: 2 Global Step: 121670 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:24:14,746-Speed 2624.40 samples/sec Loss 11.7271 LearningRate 0.0728 Epoch: 2 Global Step: 121680 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:24:18,648-Speed 2624.98 samples/sec Loss 11.9771 LearningRate 0.0728 Epoch: 2 Global Step: 121690 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:24:22,669-Speed 2548.10 samples/sec Loss 11.7928 LearningRate 0.0728 Epoch: 2 Global Step: 121700 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:24:26,648-Speed 2573.64 samples/sec Loss 11.7094 LearningRate 0.0728 Epoch: 2 Global Step: 121710 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:24:30,542-Speed 2630.45 samples/sec Loss 11.7722 LearningRate 0.0728 Epoch: 2 Global Step: 121720 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:24:34,449-Speed 2621.77 samples/sec Loss 11.7590 LearningRate 0.0728 Epoch: 2 Global Step: 121730 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:24:38,344-Speed 2629.85 samples/sec Loss 11.9857 LearningRate 0.0728 Epoch: 2 Global Step: 121740 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:24:42,239-Speed 2629.10 samples/sec Loss 11.9674 LearningRate 0.0728 Epoch: 2 Global Step: 121750 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:24:46,137-Speed 2627.44 samples/sec Loss 11.9523 LearningRate 0.0728 Epoch: 2 Global Step: 121760 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:24:50,039-Speed 2625.31 samples/sec Loss 11.9558 LearningRate 0.0728 Epoch: 2 Global Step: 121770 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:24:53,936-Speed 2628.04 samples/sec Loss 11.8151 LearningRate 0.0728 Epoch: 2 Global Step: 121780 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:24:57,834-Speed 2627.47 samples/sec Loss 11.9651 LearningRate 0.0728 Epoch: 2 Global Step: 121790 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:25:01,714-Speed 2639.71 samples/sec Loss 11.6776 LearningRate 0.0728 Epoch: 2 Global Step: 121800 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:25:05,612-Speed 2627.87 samples/sec Loss 11.8682 LearningRate 0.0728 Epoch: 2 Global Step: 121810 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:25:09,512-Speed 2625.96 samples/sec Loss 11.8300 LearningRate 0.0728 Epoch: 2 Global Step: 121820 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:25:13,415-Speed 2624.54 samples/sec Loss 11.8248 LearningRate 0.0728 Epoch: 2 Global Step: 121830 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:25:17,319-Speed 2623.13 samples/sec Loss 11.8811 LearningRate 0.0728 Epoch: 2 Global Step: 121840 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:25:21,228-Speed 2620.67 samples/sec Loss 11.8216 LearningRate 0.0728 Epoch: 2 Global Step: 121850 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:25:25,127-Speed 2626.70 samples/sec Loss 11.7984 LearningRate 0.0728 Epoch: 2 Global Step: 121860 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:25:29,024-Speed 2628.50 samples/sec Loss 11.9762 LearningRate 0.0728 Epoch: 2 Global Step: 121870 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:25:32,923-Speed 2626.88 samples/sec Loss 11.8984 LearningRate 0.0728 Epoch: 2 Global Step: 121880 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:25:36,825-Speed 2625.25 samples/sec Loss 11.9986 LearningRate 0.0728 Epoch: 2 Global Step: 121890 Fp16 Grad Scale: 262144 Required: 80 hours
Training: 2022-04-13 09:25:40,705-Speed 2639.70 samples/sec Loss 11.8537 LearningRate 0.0728 Epoch: 2 Global Step: 121900 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:25:44,602-Speed 2628.07 samples/sec Loss 11.8113 LearningRate 0.0728 Epoch: 2 Global Step: 121910 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:25:48,502-Speed 2626.00 samples/sec Loss 11.7939 LearningRate 0.0728 Epoch: 2 Global Step: 121920 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:25:52,402-Speed 2627.24 samples/sec Loss 11.7835 LearningRate 0.0728 Epoch: 2 Global Step: 121930 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:25:56,298-Speed 2628.36 samples/sec Loss 11.7996 LearningRate 0.0728 Epoch: 2 Global Step: 121940 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:26:00,201-Speed 2624.63 samples/sec Loss 11.9093 LearningRate 0.0728 Epoch: 2 Global Step: 121950 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:26:04,102-Speed 2625.32 samples/sec Loss 11.8187 LearningRate 0.0728 Epoch: 2 Global Step: 121960 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:26:07,999-Speed 2628.33 samples/sec Loss 11.6305 LearningRate 0.0728 Epoch: 2 Global Step: 121970 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:26:11,898-Speed 2626.85 samples/sec Loss 11.6242 LearningRate 0.0728 Epoch: 2 Global Step: 121980 Fp16 Grad Scale: 131072 Required: 80 hours
Training: 2022-04-13 09:26:15,797-Speed 2626.63 samples/sec Loss 11.8262 LearningRate 0.0728 Epoch: 2 Global Step: 121990 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:26:19,662-Speed 2649.99 samples/sec Loss 12.2163 LearningRate 0.0728 Epoch: 2 Global Step: 122000 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:26:23,561-Speed 2627.48 samples/sec Loss 12.4100 LearningRate 0.0727 Epoch: 2 Global Step: 122010 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:26:27,470-Speed 2620.31 samples/sec Loss 12.0318 LearningRate 0.0727 Epoch: 2 Global Step: 122020 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:26:31,371-Speed 2625.42 samples/sec Loss 11.9977 LearningRate 0.0727 Epoch: 2 Global Step: 122030 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:26:35,290-Speed 2613.86 samples/sec Loss 12.0182 LearningRate 0.0727 Epoch: 2 Global Step: 122040 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:26:39,186-Speed 2628.88 samples/sec Loss 11.9878 LearningRate 0.0727 Epoch: 2 Global Step: 122050 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:26:43,087-Speed 2625.94 samples/sec Loss 11.8040 LearningRate 0.0727 Epoch: 2 Global Step: 122060 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:26:46,987-Speed 2626.17 samples/sec Loss 11.7951 LearningRate 0.0727 Epoch: 2 Global Step: 122070 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:26:50,887-Speed 2626.12 samples/sec Loss 11.9277 LearningRate 0.0727 Epoch: 2 Global Step: 122080 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:26:54,789-Speed 2625.36 samples/sec Loss 11.8413 LearningRate 0.0727 Epoch: 2 Global Step: 122090 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:26:58,690-Speed 2625.64 samples/sec Loss 11.7738 LearningRate 0.0727 Epoch: 2 Global Step: 122100 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:27:02,600-Speed 2619.60 samples/sec Loss 11.8651 LearningRate 0.0727 Epoch: 2 Global Step: 122110 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:27:06,500-Speed 2626.36 samples/sec Loss 11.9163 LearningRate 0.0727 Epoch: 2 Global Step: 122120 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:27:10,404-Speed 2622.97 samples/sec Loss 11.8260 LearningRate 0.0727 Epoch: 2 Global Step: 122130 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:27:14,317-Speed 2617.25 samples/sec Loss 11.8428 LearningRate 0.0727 Epoch: 2 Global Step: 122140 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:27:18,217-Speed 2626.45 samples/sec Loss 11.9809 LearningRate 0.0727 Epoch: 2 Global Step: 122150 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:27:22,118-Speed 2625.59 samples/sec Loss 11.9449 LearningRate 0.0727 Epoch: 2 Global Step: 122160 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:27:26,025-Speed 2622.53 samples/sec Loss 11.9919 LearningRate 0.0727 Epoch: 2 Global Step: 122170 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:27:29,954-Speed 2606.78 samples/sec Loss 11.9123 LearningRate 0.0727 Epoch: 2 Global Step: 122180 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:27:33,852-Speed 2627.25 samples/sec Loss 11.8406 LearningRate 0.0727 Epoch: 2 Global Step: 122190 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:27:37,750-Speed 2627.21 samples/sec Loss 11.8255 LearningRate 0.0727 Epoch: 2 Global Step: 122200 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:27:41,647-Speed 2628.64 samples/sec Loss 11.8377 LearningRate 0.0727 Epoch: 2 Global Step: 122210 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:27:45,550-Speed 2623.85 samples/sec Loss 11.7523 LearningRate 0.0727 Epoch: 2 Global Step: 122220 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:27:49,447-Speed 2628.66 samples/sec Loss 11.7737 LearningRate 0.0727 Epoch: 2 Global Step: 122230 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:27:53,347-Speed 2626.59 samples/sec Loss 11.9187 LearningRate 0.0727 Epoch: 2 Global Step: 122240 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:27:57,245-Speed 2627.85 samples/sec Loss 11.9339 LearningRate 0.0727 Epoch: 2 Global Step: 122250 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:28:01,152-Speed 2621.39 samples/sec Loss 11.8834 LearningRate 0.0727 Epoch: 2 Global Step: 122260 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:28:05,052-Speed 2626.37 samples/sec Loss 11.9217 LearningRate 0.0727 Epoch: 2 Global Step: 122270 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:28:08,964-Speed 2618.13 samples/sec Loss 11.5762 LearningRate 0.0727 Epoch: 2 Global Step: 122280 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:28:12,862-Speed 2627.55 samples/sec Loss 11.8152 LearningRate 0.0727 Epoch: 2 Global Step: 122290 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:28:16,760-Speed 2627.35 samples/sec Loss 11.9525 LearningRate 0.0727 Epoch: 2 Global Step: 122300 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:28:20,666-Speed 2622.38 samples/sec Loss 11.9825 LearningRate 0.0727 Epoch: 2 Global Step: 122310 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:28:24,568-Speed 2625.42 samples/sec Loss 11.8499 LearningRate 0.0727 Epoch: 2 Global Step: 122320 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:28:28,457-Speed 2634.17 samples/sec Loss 11.8926 LearningRate 0.0727 Epoch: 2 Global Step: 122330 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:28:32,370-Speed 2617.32 samples/sec Loss 11.7684 LearningRate 0.0727 Epoch: 2 Global Step: 122340 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:28:36,265-Speed 2629.31 samples/sec Loss 11.8037 LearningRate 0.0727 Epoch: 2 Global Step: 122350 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:28:40,170-Speed 2622.79 samples/sec Loss 11.8498 LearningRate 0.0727 Epoch: 2 Global Step: 122360 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:28:44,076-Speed 2622.50 samples/sec Loss 12.0399 LearningRate 0.0727 Epoch: 2 Global Step: 122370 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:28:47,980-Speed 2623.42 samples/sec Loss 11.8729 LearningRate 0.0727 Epoch: 2 Global Step: 122380 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:28:51,896-Speed 2615.24 samples/sec Loss 11.9809 LearningRate 0.0727 Epoch: 2 Global Step: 122390 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:28:55,798-Speed 2625.67 samples/sec Loss 11.7854 LearningRate 0.0727 Epoch: 2 Global Step: 122400 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:28:59,700-Speed 2624.61 samples/sec Loss 11.8766 LearningRate 0.0727 Epoch: 2 Global Step: 122410 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:29:03,607-Speed 2621.44 samples/sec Loss 11.7952 LearningRate 0.0727 Epoch: 2 Global Step: 122420 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:29:07,513-Speed 2622.62 samples/sec Loss 11.8860 LearningRate 0.0727 Epoch: 2 Global Step: 122430 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:29:11,414-Speed 2625.70 samples/sec Loss 11.9470 LearningRate 0.0727 Epoch: 2 Global Step: 122440 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:29:15,354-Speed 2599.76 samples/sec Loss 11.8821 LearningRate 0.0727 Epoch: 2 Global Step: 122450 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:29:19,247-Speed 2631.05 samples/sec Loss 11.6876 LearningRate 0.0727 Epoch: 2 Global Step: 122460 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:29:23,128-Speed 2638.65 samples/sec Loss 11.8168 LearningRate 0.0727 Epoch: 2 Global Step: 122470 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:29:27,051-Speed 2611.13 samples/sec Loss 11.9437 LearningRate 0.0727 Epoch: 2 Global Step: 122480 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:29:30,961-Speed 2619.50 samples/sec Loss 11.9653 LearningRate 0.0726 Epoch: 2 Global Step: 122490 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:29:34,857-Speed 2628.98 samples/sec Loss 11.8267 LearningRate 0.0726 Epoch: 2 Global Step: 122500 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:29:38,758-Speed 2625.58 samples/sec Loss 11.7818 LearningRate 0.0726 Epoch: 2 Global Step: 122510 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:29:42,655-Speed 2628.56 samples/sec Loss 11.8563 LearningRate 0.0726 Epoch: 2 Global Step: 122520 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:29:46,552-Speed 2628.38 samples/sec Loss 11.8464 LearningRate 0.0726 Epoch: 2 Global Step: 122530 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:29:50,454-Speed 2624.43 samples/sec Loss 11.7533 LearningRate 0.0726 Epoch: 2 Global Step: 122540 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:29:54,351-Speed 2628.34 samples/sec Loss 11.7482 LearningRate 0.0726 Epoch: 2 Global Step: 122550 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:29:58,253-Speed 2624.90 samples/sec Loss 11.7288 LearningRate 0.0726 Epoch: 2 Global Step: 122560 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:30:02,152-Speed 2627.02 samples/sec Loss 11.9754 LearningRate 0.0726 Epoch: 2 Global Step: 122570 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:30:06,052-Speed 2626.71 samples/sec Loss 11.8080 LearningRate 0.0726 Epoch: 2 Global Step: 122580 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:30:09,960-Speed 2620.62 samples/sec Loss 11.7659 LearningRate 0.0726 Epoch: 2 Global Step: 122590 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:30:13,898-Speed 2601.17 samples/sec Loss 11.7456 LearningRate 0.0726 Epoch: 2 Global Step: 122600 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:30:17,820-Speed 2611.47 samples/sec Loss 11.8179 LearningRate 0.0726 Epoch: 2 Global Step: 122610 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:30:21,731-Speed 2619.08 samples/sec Loss 11.8097 LearningRate 0.0726 Epoch: 2 Global Step: 122620 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:30:25,658-Speed 2607.88 samples/sec Loss 11.7864 LearningRate 0.0726 Epoch: 2 Global Step: 122630 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:30:29,557-Speed 2626.77 samples/sec Loss 11.7339 LearningRate 0.0726 Epoch: 2 Global Step: 122640 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:30:33,457-Speed 2626.71 samples/sec Loss 11.8004 LearningRate 0.0726 Epoch: 2 Global Step: 122650 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:30:37,357-Speed 2626.04 samples/sec Loss 11.9122 LearningRate 0.0726 Epoch: 2 Global Step: 122660 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:30:41,256-Speed 2626.99 samples/sec Loss 11.9685 LearningRate 0.0726 Epoch: 2 Global Step: 122670 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:30:45,160-Speed 2623.48 samples/sec Loss 11.8268 LearningRate 0.0726 Epoch: 2 Global Step: 122680 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:30:49,063-Speed 2624.58 samples/sec Loss 11.7625 LearningRate 0.0726 Epoch: 2 Global Step: 122690 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:30:52,963-Speed 2626.03 samples/sec Loss 11.8637 LearningRate 0.0726 Epoch: 2 Global Step: 122700 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:30:56,847-Speed 2637.02 samples/sec Loss 11.8649 LearningRate 0.0726 Epoch: 2 Global Step: 122710 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:31:00,747-Speed 2626.88 samples/sec Loss 11.7112 LearningRate 0.0726 Epoch: 2 Global Step: 122720 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:31:04,663-Speed 2615.60 samples/sec Loss 11.8120 LearningRate 0.0726 Epoch: 2 Global Step: 122730 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:31:08,576-Speed 2617.03 samples/sec Loss 11.8884 LearningRate 0.0726 Epoch: 2 Global Step: 122740 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:31:12,475-Speed 2627.32 samples/sec Loss 11.6631 LearningRate 0.0726 Epoch: 2 Global Step: 122750 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:31:16,376-Speed 2625.73 samples/sec Loss 11.7905 LearningRate 0.0726 Epoch: 2 Global Step: 122760 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:31:20,272-Speed 2628.63 samples/sec Loss 11.9228 LearningRate 0.0726 Epoch: 2 Global Step: 122770 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:31:24,205-Speed 2604.32 samples/sec Loss 11.7029 LearningRate 0.0726 Epoch: 2 Global Step: 122780 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:31:28,104-Speed 2626.94 samples/sec Loss 11.7505 LearningRate 0.0726 Epoch: 2 Global Step: 122790 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:31:32,033-Speed 2607.41 samples/sec Loss 11.9277 LearningRate 0.0726 Epoch: 2 Global Step: 122800 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:31:35,937-Speed 2623.77 samples/sec Loss 11.6777 LearningRate 0.0726 Epoch: 2 Global Step: 122810 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:31:39,851-Speed 2616.69 samples/sec Loss 11.8578 LearningRate 0.0726 Epoch: 2 Global Step: 122820 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:31:43,737-Speed 2635.45 samples/sec Loss 11.8592 LearningRate 0.0726 Epoch: 2 Global Step: 122830 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:31:47,716-Speed 2574.29 samples/sec Loss 11.7583 LearningRate 0.0726 Epoch: 2 Global Step: 122840 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:31:51,620-Speed 2623.39 samples/sec Loss 11.9020 LearningRate 0.0726 Epoch: 2 Global Step: 122850 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:31:55,524-Speed 2624.17 samples/sec Loss 11.7362 LearningRate 0.0726 Epoch: 2 Global Step: 122860 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:31:59,424-Speed 2626.17 samples/sec Loss 11.9008 LearningRate 0.0726 Epoch: 2 Global Step: 122870 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:32:03,325-Speed 2626.15 samples/sec Loss 11.8599 LearningRate 0.0726 Epoch: 2 Global Step: 122880 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:32:07,222-Speed 2627.58 samples/sec Loss 11.7925 LearningRate 0.0726 Epoch: 2 Global Step: 122890 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:32:11,125-Speed 2624.30 samples/sec Loss 11.7926 LearningRate 0.0726 Epoch: 2 Global Step: 122900 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:32:15,052-Speed 2608.20 samples/sec Loss 11.7769 LearningRate 0.0726 Epoch: 2 Global Step: 122910 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:32:18,951-Speed 2627.12 samples/sec Loss 11.8210 LearningRate 0.0726 Epoch: 2 Global Step: 122920 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:32:22,841-Speed 2633.20 samples/sec Loss 11.5902 LearningRate 0.0726 Epoch: 2 Global Step: 122930 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:32:26,738-Speed 2628.20 samples/sec Loss 11.8646 LearningRate 0.0726 Epoch: 2 Global Step: 122940 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:32:30,635-Speed 2628.45 samples/sec Loss 11.7951 LearningRate 0.0726 Epoch: 2 Global Step: 122950 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:32:34,532-Speed 2627.78 samples/sec Loss 11.8079 LearningRate 0.0726 Epoch: 2 Global Step: 122960 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:32:38,433-Speed 2625.88 samples/sec Loss 11.7106 LearningRate 0.0726 Epoch: 2 Global Step: 122970 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:32:42,336-Speed 2624.30 samples/sec Loss 11.7364 LearningRate 0.0725 Epoch: 2 Global Step: 122980 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:32:46,244-Speed 2621.17 samples/sec Loss 11.7981 LearningRate 0.0725 Epoch: 2 Global Step: 122990 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:32:50,154-Speed 2619.56 samples/sec Loss 11.8677 LearningRate 0.0725 Epoch: 2 Global Step: 123000 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:32:54,076-Speed 2611.66 samples/sec Loss 11.8484 LearningRate 0.0725 Epoch: 2 Global Step: 123010 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:32:57,976-Speed 2626.19 samples/sec Loss 11.7385 LearningRate 0.0725 Epoch: 2 Global Step: 123020 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:33:01,882-Speed 2622.23 samples/sec Loss 11.8070 LearningRate 0.0725 Epoch: 2 Global Step: 123030 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:33:05,804-Speed 2611.40 samples/sec Loss 11.8015 LearningRate 0.0725 Epoch: 2 Global Step: 123040 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:33:09,693-Speed 2633.80 samples/sec Loss 11.7360 LearningRate 0.0725 Epoch: 2 Global Step: 123050 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:33:13,596-Speed 2624.31 samples/sec Loss 11.8730 LearningRate 0.0725 Epoch: 2 Global Step: 123060 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:33:17,513-Speed 2614.64 samples/sec Loss 11.7912 LearningRate 0.0725 Epoch: 2 Global Step: 123070 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:33:21,435-Speed 2611.85 samples/sec Loss 11.8747 LearningRate 0.0725 Epoch: 2 Global Step: 123080 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:33:25,336-Speed 2626.08 samples/sec Loss 11.8608 LearningRate 0.0725 Epoch: 2 Global Step: 123090 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:33:29,252-Speed 2615.66 samples/sec Loss 11.8621 LearningRate 0.0725 Epoch: 2 Global Step: 123100 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:33:33,146-Speed 2630.04 samples/sec Loss 11.7494 LearningRate 0.0725 Epoch: 2 Global Step: 123110 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:33:37,049-Speed 2624.51 samples/sec Loss 11.7124 LearningRate 0.0725 Epoch: 2 Global Step: 123120 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:33:40,950-Speed 2625.73 samples/sec Loss 11.8833 LearningRate 0.0725 Epoch: 2 Global Step: 123130 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:33:44,845-Speed 2629.38 samples/sec Loss 11.8988 LearningRate 0.0725 Epoch: 2 Global Step: 123140 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:33:48,745-Speed 2626.20 samples/sec Loss 11.7250 LearningRate 0.0725 Epoch: 2 Global Step: 123150 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:33:52,646-Speed 2625.18 samples/sec Loss 11.8196 LearningRate 0.0725 Epoch: 2 Global Step: 123160 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:33:56,526-Speed 2640.69 samples/sec Loss 11.7819 LearningRate 0.0725 Epoch: 2 Global Step: 123170 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:34:00,448-Speed 2611.62 samples/sec Loss 11.7879 LearningRate 0.0725 Epoch: 2 Global Step: 123180 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:34:04,351-Speed 2624.07 samples/sec Loss 11.8508 LearningRate 0.0725 Epoch: 2 Global Step: 123190 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:34:08,254-Speed 2624.12 samples/sec Loss 11.6372 LearningRate 0.0725 Epoch: 2 Global Step: 123200 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:34:12,160-Speed 2622.39 samples/sec Loss 11.7380 LearningRate 0.0725 Epoch: 2 Global Step: 123210 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:34:16,080-Speed 2613.04 samples/sec Loss 11.7770 LearningRate 0.0725 Epoch: 2 Global Step: 123220 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:34:19,993-Speed 2618.04 samples/sec Loss 11.6381 LearningRate 0.0725 Epoch: 2 Global Step: 123230 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:34:23,923-Speed 2606.25 samples/sec Loss 11.7807 LearningRate 0.0725 Epoch: 2 Global Step: 123240 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:34:27,826-Speed 2624.13 samples/sec Loss 11.6850 LearningRate 0.0725 Epoch: 2 Global Step: 123250 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:34:31,754-Speed 2607.99 samples/sec Loss 11.7742 LearningRate 0.0725 Epoch: 2 Global Step: 123260 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:34:35,643-Speed 2633.24 samples/sec Loss 11.8477 LearningRate 0.0725 Epoch: 2 Global Step: 123270 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:34:39,543-Speed 2626.55 samples/sec Loss 11.7729 LearningRate 0.0725 Epoch: 2 Global Step: 123280 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:34:43,446-Speed 2624.08 samples/sec Loss 11.8566 LearningRate 0.0725 Epoch: 2 Global Step: 123290 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:34:47,352-Speed 2622.24 samples/sec Loss 11.6844 LearningRate 0.0725 Epoch: 2 Global Step: 123300 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:34:51,251-Speed 2627.31 samples/sec Loss 11.9179 LearningRate 0.0725 Epoch: 2 Global Step: 123310 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:34:55,150-Speed 2626.45 samples/sec Loss 11.8574 LearningRate 0.0725 Epoch: 2 Global Step: 123320 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:34:59,049-Speed 2627.78 samples/sec Loss 11.7602 LearningRate 0.0725 Epoch: 2 Global Step: 123330 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:35:02,946-Speed 2627.95 samples/sec Loss 11.7008 LearningRate 0.0725 Epoch: 2 Global Step: 123340 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:35:06,846-Speed 2626.21 samples/sec Loss 11.8657 LearningRate 0.0725 Epoch: 2 Global Step: 123350 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:35:10,750-Speed 2623.42 samples/sec Loss 11.8312 LearningRate 0.0725 Epoch: 2 Global Step: 123360 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:35:14,654-Speed 2623.58 samples/sec Loss 11.9368 LearningRate 0.0725 Epoch: 2 Global Step: 123370 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:35:18,552-Speed 2628.02 samples/sec Loss 11.8768 LearningRate 0.0725 Epoch: 2 Global Step: 123380 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:35:22,438-Speed 2635.84 samples/sec Loss 11.7038 LearningRate 0.0725 Epoch: 2 Global Step: 123390 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:35:26,349-Speed 2619.74 samples/sec Loss 11.7516 LearningRate 0.0725 Epoch: 2 Global Step: 123400 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:35:30,257-Speed 2620.31 samples/sec Loss 11.8085 LearningRate 0.0725 Epoch: 2 Global Step: 123410 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:35:34,251-Speed 2564.48 samples/sec Loss 11.8257 LearningRate 0.0725 Epoch: 2 Global Step: 123420 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:35:38,172-Speed 2612.15 samples/sec Loss 11.8632 LearningRate 0.0725 Epoch: 2 Global Step: 123430 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:35:42,076-Speed 2623.71 samples/sec Loss 11.7448 LearningRate 0.0725 Epoch: 2 Global Step: 123440 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:35:45,979-Speed 2624.46 samples/sec Loss 11.7232 LearningRate 0.0725 Epoch: 2 Global Step: 123450 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:35:49,877-Speed 2627.95 samples/sec Loss 11.8808 LearningRate 0.0725 Epoch: 2 Global Step: 123460 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:35:53,777-Speed 2625.88 samples/sec Loss 11.6846 LearningRate 0.0724 Epoch: 2 Global Step: 123470 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:35:57,677-Speed 2626.38 samples/sec Loss 11.6540 LearningRate 0.0724 Epoch: 2 Global Step: 123480 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:36:01,577-Speed 2626.00 samples/sec Loss 11.8238 LearningRate 0.0724 Epoch: 2 Global Step: 123490 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:36:05,491-Speed 2616.88 samples/sec Loss 11.9202 LearningRate 0.0724 Epoch: 2 Global Step: 123500 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:36:09,394-Speed 2623.95 samples/sec Loss 11.7743 LearningRate 0.0724 Epoch: 2 Global Step: 123510 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:36:13,328-Speed 2603.98 samples/sec Loss 11.6460 LearningRate 0.0724 Epoch: 2 Global Step: 123520 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:36:17,207-Speed 2640.97 samples/sec Loss 11.7637 LearningRate 0.0724 Epoch: 2 Global Step: 123530 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:36:21,120-Speed 2617.51 samples/sec Loss 11.8375 LearningRate 0.0724 Epoch: 2 Global Step: 123540 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:36:25,024-Speed 2623.60 samples/sec Loss 11.7961 LearningRate 0.0724 Epoch: 2 Global Step: 123550 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:36:28,927-Speed 2624.41 samples/sec Loss 11.8193 LearningRate 0.0724 Epoch: 2 Global Step: 123560 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:36:32,831-Speed 2623.49 samples/sec Loss 11.6527 LearningRate 0.0724 Epoch: 2 Global Step: 123570 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:36:36,735-Speed 2623.66 samples/sec Loss 11.8526 LearningRate 0.0724 Epoch: 2 Global Step: 123580 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:36:40,647-Speed 2617.75 samples/sec Loss 11.8455 LearningRate 0.0724 Epoch: 2 Global Step: 123590 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:36:44,544-Speed 2628.92 samples/sec Loss 11.6886 LearningRate 0.0724 Epoch: 2 Global Step: 123600 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:36:48,446-Speed 2624.88 samples/sec Loss 11.7370 LearningRate 0.0724 Epoch: 2 Global Step: 123610 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:36:52,363-Speed 2614.84 samples/sec Loss 11.7370 LearningRate 0.0724 Epoch: 2 Global Step: 123620 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:36:56,249-Speed 2636.26 samples/sec Loss 11.7274 LearningRate 0.0724 Epoch: 2 Global Step: 123630 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:37:00,148-Speed 2626.71 samples/sec Loss 11.8203 LearningRate 0.0724 Epoch: 2 Global Step: 123640 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:37:04,047-Speed 2627.15 samples/sec Loss 11.8788 LearningRate 0.0724 Epoch: 2 Global Step: 123650 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:37:07,949-Speed 2625.06 samples/sec Loss 11.7873 LearningRate 0.0724 Epoch: 2 Global Step: 123660 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:37:11,848-Speed 2626.97 samples/sec Loss 11.8868 LearningRate 0.0724 Epoch: 2 Global Step: 123670 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:37:15,748-Speed 2626.11 samples/sec Loss 11.7978 LearningRate 0.0724 Epoch: 2 Global Step: 123680 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:37:19,653-Speed 2623.05 samples/sec Loss 11.8075 LearningRate 0.0724 Epoch: 2 Global Step: 123690 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:37:23,555-Speed 2625.73 samples/sec Loss 11.7920 LearningRate 0.0724 Epoch: 2 Global Step: 123700 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:37:27,471-Speed 2615.58 samples/sec Loss 11.8945 LearningRate 0.0724 Epoch: 2 Global Step: 123710 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:37:31,391-Speed 2612.84 samples/sec Loss 11.7737 LearningRate 0.0724 Epoch: 2 Global Step: 123720 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:37:35,296-Speed 2623.36 samples/sec Loss 11.8205 LearningRate 0.0724 Epoch: 2 Global Step: 123730 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:37:39,205-Speed 2620.08 samples/sec Loss 11.8773 LearningRate 0.0724 Epoch: 2 Global Step: 123740 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:37:43,110-Speed 2622.79 samples/sec Loss 11.7602 LearningRate 0.0724 Epoch: 2 Global Step: 123750 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:37:46,998-Speed 2633.99 samples/sec Loss 11.8038 LearningRate 0.0724 Epoch: 2 Global Step: 123760 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:37:50,907-Speed 2620.50 samples/sec Loss 11.8458 LearningRate 0.0724 Epoch: 2 Global Step: 123770 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:37:54,816-Speed 2620.31 samples/sec Loss 11.8431 LearningRate 0.0724 Epoch: 2 Global Step: 123780 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:37:58,725-Speed 2620.16 samples/sec Loss 11.7515 LearningRate 0.0724 Epoch: 2 Global Step: 123790 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:38:02,670-Speed 2596.66 samples/sec Loss 11.9315 LearningRate 0.0724 Epoch: 2 Global Step: 123800 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:38:06,571-Speed 2625.68 samples/sec Loss 11.7631 LearningRate 0.0724 Epoch: 2 Global Step: 123810 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:38:10,473-Speed 2624.71 samples/sec Loss 11.6866 LearningRate 0.0724 Epoch: 2 Global Step: 123820 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:38:14,384-Speed 2618.94 samples/sec Loss 11.8883 LearningRate 0.0724 Epoch: 2 Global Step: 123830 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:38:18,426-Speed 2534.70 samples/sec Loss 11.7535 LearningRate 0.0724 Epoch: 2 Global Step: 123840 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:38:22,447-Speed 2547.06 samples/sec Loss 11.7352 LearningRate 0.0724 Epoch: 2 Global Step: 123850 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:38:26,354-Speed 2621.16 samples/sec Loss 11.6854 LearningRate 0.0724 Epoch: 2 Global Step: 123860 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:38:30,270-Speed 2615.41 samples/sec Loss 11.9382 LearningRate 0.0724 Epoch: 2 Global Step: 123870 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:38:34,157-Speed 2635.38 samples/sec Loss 11.8460 LearningRate 0.0724 Epoch: 2 Global Step: 123880 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:38:38,059-Speed 2624.53 samples/sec Loss 11.6859 LearningRate 0.0724 Epoch: 2 Global Step: 123890 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:38:41,968-Speed 2620.33 samples/sec Loss 11.8456 LearningRate 0.0724 Epoch: 2 Global Step: 123900 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:38:45,863-Speed 2629.71 samples/sec Loss 11.7904 LearningRate 0.0724 Epoch: 2 Global Step: 123910 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:38:49,768-Speed 2623.02 samples/sec Loss 11.8729 LearningRate 0.0724 Epoch: 2 Global Step: 123920 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:38:53,670-Speed 2624.88 samples/sec Loss 11.9724 LearningRate 0.0724 Epoch: 2 Global Step: 123930 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:38:57,598-Speed 2607.24 samples/sec Loss 11.6316 LearningRate 0.0724 Epoch: 2 Global Step: 123940 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:39:01,502-Speed 2623.53 samples/sec Loss 11.8404 LearningRate 0.0723 Epoch: 2 Global Step: 123950 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:39:05,402-Speed 2626.44 samples/sec Loss 11.7150 LearningRate 0.0723 Epoch: 2 Global Step: 123960 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:39:09,306-Speed 2623.85 samples/sec Loss 11.7722 LearningRate 0.0723 Epoch: 2 Global Step: 123970 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:39:13,208-Speed 2625.17 samples/sec Loss 11.7193 LearningRate 0.0723 Epoch: 2 Global Step: 123980 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:39:17,098-Speed 2632.60 samples/sec Loss 11.7745 LearningRate 0.0723 Epoch: 2 Global Step: 123990 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:39:20,990-Speed 2631.98 samples/sec Loss 11.8023 LearningRate 0.0723 Epoch: 2 Global Step: 124000 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:39:24,902-Speed 2618.04 samples/sec Loss 11.9746 LearningRate 0.0723 Epoch: 2 Global Step: 124010 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:39:28,817-Speed 2616.83 samples/sec Loss 11.7262 LearningRate 0.0723 Epoch: 2 Global Step: 124020 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:39:32,734-Speed 2614.87 samples/sec Loss 11.9272 LearningRate 0.0723 Epoch: 2 Global Step: 124030 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:39:36,631-Speed 2627.91 samples/sec Loss 11.8676 LearningRate 0.0723 Epoch: 2 Global Step: 124040 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:39:40,538-Speed 2621.93 samples/sec Loss 11.7195 LearningRate 0.0723 Epoch: 2 Global Step: 124050 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:39:44,447-Speed 2620.61 samples/sec Loss 11.7235 LearningRate 0.0723 Epoch: 2 Global Step: 124060 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:39:48,351-Speed 2623.39 samples/sec Loss 11.7162 LearningRate 0.0723 Epoch: 2 Global Step: 124070 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:39:52,271-Speed 2613.55 samples/sec Loss 11.9289 LearningRate 0.0723 Epoch: 2 Global Step: 124080 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:39:56,180-Speed 2620.36 samples/sec Loss 11.9181 LearningRate 0.0723 Epoch: 2 Global Step: 124090 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:40:00,095-Speed 2616.25 samples/sec Loss 11.7750 LearningRate 0.0723 Epoch: 2 Global Step: 124100 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:40:03,989-Speed 2630.33 samples/sec Loss 11.7366 LearningRate 0.0723 Epoch: 2 Global Step: 124110 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:40:07,902-Speed 2617.18 samples/sec Loss 11.9672 LearningRate 0.0723 Epoch: 2 Global Step: 124120 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:40:11,811-Speed 2620.40 samples/sec Loss 11.8273 LearningRate 0.0723 Epoch: 2 Global Step: 124130 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:40:15,720-Speed 2620.42 samples/sec Loss 11.8792 LearningRate 0.0723 Epoch: 2 Global Step: 124140 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:40:19,635-Speed 2616.14 samples/sec Loss 11.7589 LearningRate 0.0723 Epoch: 2 Global Step: 124150 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:40:23,539-Speed 2623.24 samples/sec Loss 11.7384 LearningRate 0.0723 Epoch: 2 Global Step: 124160 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:40:27,462-Speed 2611.28 samples/sec Loss 11.7829 LearningRate 0.0723 Epoch: 2 Global Step: 124170 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:40:31,368-Speed 2622.06 samples/sec Loss 11.9226 LearningRate 0.0723 Epoch: 2 Global Step: 124180 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:40:35,279-Speed 2618.78 samples/sec Loss 11.9701 LearningRate 0.0723 Epoch: 2 Global Step: 124190 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:40:39,193-Speed 2616.63 samples/sec Loss 11.7124 LearningRate 0.0723 Epoch: 2 Global Step: 124200 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:40:43,086-Speed 2631.61 samples/sec Loss 11.7927 LearningRate 0.0723 Epoch: 2 Global Step: 124210 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:40:46,993-Speed 2621.42 samples/sec Loss 11.6731 LearningRate 0.0723 Epoch: 2 Global Step: 124220 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:40:50,951-Speed 2588.26 samples/sec Loss 11.7843 LearningRate 0.0723 Epoch: 2 Global Step: 124230 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:40:54,855-Speed 2623.91 samples/sec Loss 11.5960 LearningRate 0.0723 Epoch: 2 Global Step: 124240 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:40:58,749-Speed 2630.36 samples/sec Loss 11.7524 LearningRate 0.0723 Epoch: 2 Global Step: 124250 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:41:02,649-Speed 2626.20 samples/sec Loss 11.7790 LearningRate 0.0723 Epoch: 2 Global Step: 124260 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:41:06,556-Speed 2621.61 samples/sec Loss 11.8606 LearningRate 0.0723 Epoch: 2 Global Step: 124270 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:41:10,459-Speed 2623.44 samples/sec Loss 11.7290 LearningRate 0.0723 Epoch: 2 Global Step: 124280 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:41:14,364-Speed 2623.59 samples/sec Loss 11.6771 LearningRate 0.0723 Epoch: 2 Global Step: 124290 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:41:18,276-Speed 2618.28 samples/sec Loss 11.8401 LearningRate 0.0723 Epoch: 2 Global Step: 124300 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:41:22,204-Speed 2608.00 samples/sec Loss 11.7549 LearningRate 0.0723 Epoch: 2 Global Step: 124310 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:41:26,118-Speed 2617.28 samples/sec Loss 11.8570 LearningRate 0.0723 Epoch: 2 Global Step: 124320 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:41:30,028-Speed 2619.60 samples/sec Loss 11.7917 LearningRate 0.0723 Epoch: 2 Global Step: 124330 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:41:33,948-Speed 2612.75 samples/sec Loss 11.7748 LearningRate 0.0723 Epoch: 2 Global Step: 124340 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:41:37,855-Speed 2621.33 samples/sec Loss 11.9142 LearningRate 0.0723 Epoch: 2 Global Step: 124350 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:41:41,761-Speed 2621.97 samples/sec Loss 11.5553 LearningRate 0.0723 Epoch: 2 Global Step: 124360 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:41:45,701-Speed 2600.36 samples/sec Loss 11.7386 LearningRate 0.0723 Epoch: 2 Global Step: 124370 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:41:49,598-Speed 2628.58 samples/sec Loss 11.8113 LearningRate 0.0723 Epoch: 2 Global Step: 124380 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:41:53,496-Speed 2627.03 samples/sec Loss 11.7655 LearningRate 0.0723 Epoch: 2 Global Step: 124390 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:41:57,427-Speed 2606.27 samples/sec Loss 11.9761 LearningRate 0.0723 Epoch: 2 Global Step: 124400 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:42:01,330-Speed 2623.94 samples/sec Loss 11.8139 LearningRate 0.0723 Epoch: 2 Global Step: 124410 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:42:05,304-Speed 2577.41 samples/sec Loss 11.7064 LearningRate 0.0723 Epoch: 2 Global Step: 124420 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:42:09,201-Speed 2628.63 samples/sec Loss 11.7881 LearningRate 0.0723 Epoch: 2 Global Step: 124430 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:42:30,304-Speed 485.27 samples/sec Loss 11.9024 LearningRate 0.0722 Epoch: 3 Global Step: 124440 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:42:34,183-Speed 2641.13 samples/sec Loss 11.7427 LearningRate 0.0722 Epoch: 3 Global Step: 124450 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:42:38,075-Speed 2631.38 samples/sec Loss 11.7811 LearningRate 0.0722 Epoch: 3 Global Step: 124460 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:42:41,989-Speed 2617.14 samples/sec Loss 11.8198 LearningRate 0.0722 Epoch: 3 Global Step: 124470 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:42:45,877-Speed 2634.52 samples/sec Loss 11.7142 LearningRate 0.0722 Epoch: 3 Global Step: 124480 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:42:49,762-Speed 2636.12 samples/sec Loss 11.7939 LearningRate 0.0722 Epoch: 3 Global Step: 124490 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:42:53,652-Speed 2633.23 samples/sec Loss 11.8602 LearningRate 0.0722 Epoch: 3 Global Step: 124500 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:42:57,542-Speed 2633.44 samples/sec Loss 11.7628 LearningRate 0.0722 Epoch: 3 Global Step: 124510 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:43:01,443-Speed 2625.93 samples/sec Loss 11.6352 LearningRate 0.0722 Epoch: 3 Global Step: 124520 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:43:05,340-Speed 2627.60 samples/sec Loss 11.8756 LearningRate 0.0722 Epoch: 3 Global Step: 124530 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:43:09,238-Speed 2627.35 samples/sec Loss 11.8486 LearningRate 0.0722 Epoch: 3 Global Step: 124540 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:43:13,113-Speed 2643.28 samples/sec Loss 11.8849 LearningRate 0.0722 Epoch: 3 Global Step: 124550 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:43:17,038-Speed 2609.86 samples/sec Loss 11.7464 LearningRate 0.0722 Epoch: 3 Global Step: 124560 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:43:20,930-Speed 2631.63 samples/sec Loss 11.8035 LearningRate 0.0722 Epoch: 3 Global Step: 124570 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:43:24,857-Speed 2608.74 samples/sec Loss 11.7912 LearningRate 0.0722 Epoch: 3 Global Step: 124580 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:43:28,866-Speed 2554.90 samples/sec Loss 11.8408 LearningRate 0.0722 Epoch: 3 Global Step: 124590 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:43:32,761-Speed 2629.83 samples/sec Loss 11.8870 LearningRate 0.0722 Epoch: 3 Global Step: 124600 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:43:36,683-Speed 2611.53 samples/sec Loss 11.8761 LearningRate 0.0722 Epoch: 3 Global Step: 124610 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:43:40,577-Speed 2630.70 samples/sec Loss 11.8428 LearningRate 0.0722 Epoch: 3 Global Step: 124620 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:43:44,472-Speed 2628.89 samples/sec Loss 11.8663 LearningRate 0.0722 Epoch: 3 Global Step: 124630 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:43:48,369-Speed 2628.51 samples/sec Loss 11.8508 LearningRate 0.0722 Epoch: 3 Global Step: 124640 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:43:52,252-Speed 2638.05 samples/sec Loss 11.7146 LearningRate 0.0722 Epoch: 3 Global Step: 124650 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:43:56,147-Speed 2629.70 samples/sec Loss 11.6892 LearningRate 0.0722 Epoch: 3 Global Step: 124660 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:44:00,042-Speed 2629.76 samples/sec Loss 11.7921 LearningRate 0.0722 Epoch: 3 Global Step: 124670 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:44:03,939-Speed 2628.46 samples/sec Loss 11.8371 LearningRate 0.0722 Epoch: 3 Global Step: 124680 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:44:07,855-Speed 2615.32 samples/sec Loss 11.7702 LearningRate 0.0722 Epoch: 3 Global Step: 124690 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:44:11,764-Speed 2620.36 samples/sec Loss 11.6421 LearningRate 0.0722 Epoch: 3 Global Step: 124700 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:44:15,662-Speed 2627.64 samples/sec Loss 11.7817 LearningRate 0.0722 Epoch: 3 Global Step: 124710 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:44:19,562-Speed 2626.54 samples/sec Loss 11.7138 LearningRate 0.0722 Epoch: 3 Global Step: 124720 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:44:23,462-Speed 2626.46 samples/sec Loss 11.8039 LearningRate 0.0722 Epoch: 3 Global Step: 124730 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:44:27,360-Speed 2627.56 samples/sec Loss 11.7614 LearningRate 0.0722 Epoch: 3 Global Step: 124740 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:44:31,260-Speed 2625.79 samples/sec Loss 11.7497 LearningRate 0.0722 Epoch: 3 Global Step: 124750 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:44:35,167-Speed 2622.06 samples/sec Loss 11.8649 LearningRate 0.0722 Epoch: 3 Global Step: 124760 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:44:39,073-Speed 2622.58 samples/sec Loss 11.8077 LearningRate 0.0722 Epoch: 3 Global Step: 124770 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:44:42,975-Speed 2624.79 samples/sec Loss 11.7902 LearningRate 0.0722 Epoch: 3 Global Step: 124780 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:44:46,878-Speed 2624.63 samples/sec Loss 11.7090 LearningRate 0.0722 Epoch: 3 Global Step: 124790 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:44:50,786-Speed 2621.03 samples/sec Loss 11.6608 LearningRate 0.0722 Epoch: 3 Global Step: 124800 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:44:54,703-Speed 2615.51 samples/sec Loss 11.8120 LearningRate 0.0722 Epoch: 3 Global Step: 124810 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:44:58,590-Speed 2634.64 samples/sec Loss 11.7161 LearningRate 0.0722 Epoch: 3 Global Step: 124820 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:45:02,469-Speed 2641.03 samples/sec Loss 12.0170 LearningRate 0.0722 Epoch: 3 Global Step: 124830 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:45:06,369-Speed 2625.95 samples/sec Loss 11.8668 LearningRate 0.0722 Epoch: 3 Global Step: 124840 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:45:10,285-Speed 2615.81 samples/sec Loss 11.6913 LearningRate 0.0722 Epoch: 3 Global Step: 124850 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:45:14,203-Speed 2614.56 samples/sec Loss 11.7715 LearningRate 0.0722 Epoch: 3 Global Step: 124860 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:45:18,117-Speed 2616.72 samples/sec Loss 11.8018 LearningRate 0.0722 Epoch: 3 Global Step: 124870 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:45:22,021-Speed 2623.94 samples/sec Loss 11.7117 LearningRate 0.0722 Epoch: 3 Global Step: 124880 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:45:25,921-Speed 2626.53 samples/sec Loss 11.8762 LearningRate 0.0722 Epoch: 3 Global Step: 124890 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:45:29,856-Speed 2603.07 samples/sec Loss 11.8324 LearningRate 0.0722 Epoch: 3 Global Step: 124900 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:45:33,757-Speed 2625.55 samples/sec Loss 11.6956 LearningRate 0.0722 Epoch: 3 Global Step: 124910 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:45:37,658-Speed 2625.88 samples/sec Loss 11.7471 LearningRate 0.0722 Epoch: 3 Global Step: 124920 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:45:41,561-Speed 2623.95 samples/sec Loss 11.7149 LearningRate 0.0721 Epoch: 3 Global Step: 124930 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:45:45,482-Speed 2612.83 samples/sec Loss 11.6047 LearningRate 0.0721 Epoch: 3 Global Step: 124940 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:45:49,363-Speed 2638.93 samples/sec Loss 11.8329 LearningRate 0.0721 Epoch: 3 Global Step: 124950 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:45:53,286-Speed 2611.50 samples/sec Loss 11.6939 LearningRate 0.0721 Epoch: 3 Global Step: 124960 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:45:57,193-Speed 2621.95 samples/sec Loss 11.8994 LearningRate 0.0721 Epoch: 3 Global Step: 124970 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:46:01,109-Speed 2615.07 samples/sec Loss 11.8009 LearningRate 0.0721 Epoch: 3 Global Step: 124980 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:46:05,000-Speed 2632.95 samples/sec Loss 11.7105 LearningRate 0.0721 Epoch: 3 Global Step: 124990 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:46:08,900-Speed 2625.89 samples/sec Loss 11.9212 LearningRate 0.0721 Epoch: 3 Global Step: 125000 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:46:12,904-Speed 2558.51 samples/sec Loss 11.7223 LearningRate 0.0721 Epoch: 3 Global Step: 125010 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:46:16,815-Speed 2618.69 samples/sec Loss 11.8019 LearningRate 0.0721 Epoch: 3 Global Step: 125020 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:46:20,717-Speed 2625.49 samples/sec Loss 11.8101 LearningRate 0.0721 Epoch: 3 Global Step: 125030 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:46:24,623-Speed 2622.07 samples/sec Loss 11.8491 LearningRate 0.0721 Epoch: 3 Global Step: 125040 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:46:28,530-Speed 2621.85 samples/sec Loss 11.8313 LearningRate 0.0721 Epoch: 3 Global Step: 125050 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:46:32,433-Speed 2624.24 samples/sec Loss 11.7796 LearningRate 0.0721 Epoch: 3 Global Step: 125060 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:46:36,331-Speed 2627.25 samples/sec Loss 11.6037 LearningRate 0.0721 Epoch: 3 Global Step: 125070 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:46:40,238-Speed 2621.61 samples/sec Loss 11.7677 LearningRate 0.0721 Epoch: 3 Global Step: 125080 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:46:44,124-Speed 2635.74 samples/sec Loss 11.7357 LearningRate 0.0721 Epoch: 3 Global Step: 125090 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:46:48,028-Speed 2623.99 samples/sec Loss 11.6632 LearningRate 0.0721 Epoch: 3 Global Step: 125100 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:46:51,933-Speed 2622.72 samples/sec Loss 11.8857 LearningRate 0.0721 Epoch: 3 Global Step: 125110 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:46:55,844-Speed 2619.01 samples/sec Loss 11.6855 LearningRate 0.0721 Epoch: 3 Global Step: 125120 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:46:59,742-Speed 2627.91 samples/sec Loss 11.7927 LearningRate 0.0721 Epoch: 3 Global Step: 125130 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:47:03,621-Speed 2639.95 samples/sec Loss 11.7749 LearningRate 0.0721 Epoch: 3 Global Step: 125140 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:47:07,518-Speed 2628.55 samples/sec Loss 11.6872 LearningRate 0.0721 Epoch: 3 Global Step: 125150 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:47:11,421-Speed 2624.15 samples/sec Loss 11.7602 LearningRate 0.0721 Epoch: 3 Global Step: 125160 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:47:15,323-Speed 2624.65 samples/sec Loss 11.7404 LearningRate 0.0721 Epoch: 3 Global Step: 125170 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:47:19,254-Speed 2606.35 samples/sec Loss 11.7002 LearningRate 0.0721 Epoch: 3 Global Step: 125180 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:47:23,190-Speed 2601.92 samples/sec Loss 11.8565 LearningRate 0.0721 Epoch: 3 Global Step: 125190 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:47:27,110-Speed 2613.56 samples/sec Loss 11.7151 LearningRate 0.0721 Epoch: 3 Global Step: 125200 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:47:31,013-Speed 2624.09 samples/sec Loss 11.8245 LearningRate 0.0721 Epoch: 3 Global Step: 125210 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:47:34,934-Speed 2612.15 samples/sec Loss 11.5812 LearningRate 0.0721 Epoch: 3 Global Step: 125220 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:47:38,818-Speed 2636.64 samples/sec Loss 12.1883 LearningRate 0.0721 Epoch: 3 Global Step: 125230 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:47:42,718-Speed 2626.76 samples/sec Loss 12.0810 LearningRate 0.0721 Epoch: 3 Global Step: 125240 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:47:46,612-Speed 2630.68 samples/sec Loss 11.8369 LearningRate 0.0721 Epoch: 3 Global Step: 125250 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:47:50,508-Speed 2629.03 samples/sec Loss 11.8549 LearningRate 0.0721 Epoch: 3 Global Step: 125260 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:47:54,417-Speed 2620.00 samples/sec Loss 11.7561 LearningRate 0.0721 Epoch: 3 Global Step: 125270 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:47:58,314-Speed 2628.25 samples/sec Loss 11.6946 LearningRate 0.0721 Epoch: 3 Global Step: 125280 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:48:02,209-Speed 2630.14 samples/sec Loss 11.8778 LearningRate 0.0721 Epoch: 3 Global Step: 125290 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:48:06,110-Speed 2625.16 samples/sec Loss 11.7200 LearningRate 0.0721 Epoch: 3 Global Step: 125300 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:48:10,018-Speed 2620.40 samples/sec Loss 11.7535 LearningRate 0.0721 Epoch: 3 Global Step: 125310 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:48:13,906-Speed 2634.65 samples/sec Loss 11.6515 LearningRate 0.0721 Epoch: 3 Global Step: 125320 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:48:17,809-Speed 2624.04 samples/sec Loss 11.9309 LearningRate 0.0721 Epoch: 3 Global Step: 125330 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:48:21,719-Speed 2619.56 samples/sec Loss 11.7230 LearningRate 0.0721 Epoch: 3 Global Step: 125340 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:48:25,624-Speed 2622.76 samples/sec Loss 11.7816 LearningRate 0.0721 Epoch: 3 Global Step: 125350 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:48:29,520-Speed 2629.26 samples/sec Loss 11.8508 LearningRate 0.0721 Epoch: 3 Global Step: 125360 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:48:33,422-Speed 2625.46 samples/sec Loss 11.7080 LearningRate 0.0721 Epoch: 3 Global Step: 125370 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:48:37,320-Speed 2627.62 samples/sec Loss 11.7730 LearningRate 0.0721 Epoch: 3 Global Step: 125380 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:48:41,221-Speed 2625.31 samples/sec Loss 11.7817 LearningRate 0.0721 Epoch: 3 Global Step: 125390 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:48:45,117-Speed 2629.42 samples/sec Loss 11.6860 LearningRate 0.0721 Epoch: 3 Global Step: 125400 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:48:49,012-Speed 2629.47 samples/sec Loss 11.8136 LearningRate 0.0721 Epoch: 3 Global Step: 125410 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:48:52,908-Speed 2628.93 samples/sec Loss 11.7977 LearningRate 0.0720 Epoch: 3 Global Step: 125420 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:48:56,804-Speed 2628.41 samples/sec Loss 11.7171 LearningRate 0.0720 Epoch: 3 Global Step: 125430 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:49:00,704-Speed 2626.97 samples/sec Loss 11.8609 LearningRate 0.0720 Epoch: 3 Global Step: 125440 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:49:04,602-Speed 2627.16 samples/sec Loss 11.8207 LearningRate 0.0720 Epoch: 3 Global Step: 125450 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:49:08,496-Speed 2630.32 samples/sec Loss 11.7031 LearningRate 0.0720 Epoch: 3 Global Step: 125460 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:49:12,402-Speed 2622.35 samples/sec Loss 11.7806 LearningRate 0.0720 Epoch: 3 Global Step: 125470 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:49:16,296-Speed 2630.23 samples/sec Loss 11.7989 LearningRate 0.0720 Epoch: 3 Global Step: 125480 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:49:20,196-Speed 2626.12 samples/sec Loss 11.6625 LearningRate 0.0720 Epoch: 3 Global Step: 125490 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:49:24,115-Speed 2613.50 samples/sec Loss 11.7329 LearningRate 0.0720 Epoch: 3 Global Step: 125500 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:49:28,009-Speed 2630.48 samples/sec Loss 11.6270 LearningRate 0.0720 Epoch: 3 Global Step: 125510 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:49:31,906-Speed 2629.19 samples/sec Loss 11.8414 LearningRate 0.0720 Epoch: 3 Global Step: 125520 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:49:35,782-Speed 2641.79 samples/sec Loss 11.7079 LearningRate 0.0720 Epoch: 3 Global Step: 125530 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:49:39,678-Speed 2629.41 samples/sec Loss 11.7534 LearningRate 0.0720 Epoch: 3 Global Step: 125540 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:49:43,577-Speed 2627.41 samples/sec Loss 11.8694 LearningRate 0.0720 Epoch: 3 Global Step: 125550 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:49:47,479-Speed 2624.20 samples/sec Loss 11.6584 LearningRate 0.0720 Epoch: 3 Global Step: 125560 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:49:51,377-Speed 2627.78 samples/sec Loss 11.6447 LearningRate 0.0720 Epoch: 3 Global Step: 125570 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:49:55,308-Speed 2606.03 samples/sec Loss 11.7634 LearningRate 0.0720 Epoch: 3 Global Step: 125580 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:49:59,212-Speed 2623.34 samples/sec Loss 11.7685 LearningRate 0.0720 Epoch: 3 Global Step: 125590 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:50:03,138-Speed 2608.98 samples/sec Loss 11.5462 LearningRate 0.0720 Epoch: 3 Global Step: 125600 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:50:07,031-Speed 2631.18 samples/sec Loss 11.7244 LearningRate 0.0720 Epoch: 3 Global Step: 125610 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:50:10,948-Speed 2615.34 samples/sec Loss 11.8987 LearningRate 0.0720 Epoch: 3 Global Step: 125620 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:50:14,870-Speed 2610.94 samples/sec Loss 11.6117 LearningRate 0.0720 Epoch: 3 Global Step: 125630 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:50:18,770-Speed 2626.68 samples/sec Loss 11.7343 LearningRate 0.0720 Epoch: 3 Global Step: 125640 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:50:22,640-Speed 2646.14 samples/sec Loss 11.7820 LearningRate 0.0720 Epoch: 3 Global Step: 125650 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:50:26,534-Speed 2630.76 samples/sec Loss 11.9704 LearningRate 0.0720 Epoch: 3 Global Step: 125660 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:50:30,427-Speed 2631.04 samples/sec Loss 11.6898 LearningRate 0.0720 Epoch: 3 Global Step: 125670 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:50:34,321-Speed 2630.22 samples/sec Loss 11.5668 LearningRate 0.0720 Epoch: 3 Global Step: 125680 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:50:38,215-Speed 2630.28 samples/sec Loss 11.8325 LearningRate 0.0720 Epoch: 3 Global Step: 125690 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:50:42,108-Speed 2631.18 samples/sec Loss 11.8294 LearningRate 0.0720 Epoch: 3 Global Step: 125700 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:50:46,015-Speed 2621.30 samples/sec Loss 11.8587 LearningRate 0.0720 Epoch: 3 Global Step: 125710 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:50:49,911-Speed 2628.83 samples/sec Loss 11.7516 LearningRate 0.0720 Epoch: 3 Global Step: 125720 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:50:53,816-Speed 2623.11 samples/sec Loss 11.8139 LearningRate 0.0720 Epoch: 3 Global Step: 125730 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:50:57,709-Speed 2631.07 samples/sec Loss 11.7749 LearningRate 0.0720 Epoch: 3 Global Step: 125740 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:51:01,603-Speed 2630.00 samples/sec Loss 11.8247 LearningRate 0.0720 Epoch: 3 Global Step: 125750 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:51:05,505-Speed 2625.29 samples/sec Loss 11.6999 LearningRate 0.0720 Epoch: 3 Global Step: 125760 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:51:09,411-Speed 2622.26 samples/sec Loss 11.7603 LearningRate 0.0720 Epoch: 3 Global Step: 125770 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:51:13,311-Speed 2626.32 samples/sec Loss 11.8147 LearningRate 0.0720 Epoch: 3 Global Step: 125780 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:51:17,229-Speed 2614.00 samples/sec Loss 11.7187 LearningRate 0.0720 Epoch: 3 Global Step: 125790 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:51:21,125-Speed 2628.89 samples/sec Loss 11.8011 LearningRate 0.0720 Epoch: 3 Global Step: 125800 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:51:25,020-Speed 2630.30 samples/sec Loss 11.7634 LearningRate 0.0720 Epoch: 3 Global Step: 125810 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:51:28,990-Speed 2579.58 samples/sec Loss 11.7813 LearningRate 0.0720 Epoch: 3 Global Step: 125820 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:51:32,905-Speed 2615.99 samples/sec Loss 11.8500 LearningRate 0.0720 Epoch: 3 Global Step: 125830 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:51:36,827-Speed 2611.73 samples/sec Loss 11.6418 LearningRate 0.0720 Epoch: 3 Global Step: 125840 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:51:40,746-Speed 2613.80 samples/sec Loss 11.7762 LearningRate 0.0720 Epoch: 3 Global Step: 125850 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:51:44,647-Speed 2625.90 samples/sec Loss 11.8140 LearningRate 0.0720 Epoch: 3 Global Step: 125860 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:51:48,554-Speed 2621.55 samples/sec Loss 11.8122 LearningRate 0.0720 Epoch: 3 Global Step: 125870 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:51:52,452-Speed 2627.54 samples/sec Loss 11.7132 LearningRate 0.0720 Epoch: 3 Global Step: 125880 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:51:56,361-Speed 2620.36 samples/sec Loss 11.6106 LearningRate 0.0720 Epoch: 3 Global Step: 125890 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:52:00,272-Speed 2619.02 samples/sec Loss 11.8679 LearningRate 0.0720 Epoch: 3 Global Step: 125900 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:52:04,179-Speed 2621.57 samples/sec Loss 11.9712 LearningRate 0.0719 Epoch: 3 Global Step: 125910 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:52:08,083-Speed 2623.05 samples/sec Loss 11.8165 LearningRate 0.0719 Epoch: 3 Global Step: 125920 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:52:11,945-Speed 2651.86 samples/sec Loss 11.7136 LearningRate 0.0719 Epoch: 3 Global Step: 125930 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:52:15,853-Speed 2621.51 samples/sec Loss 11.6964 LearningRate 0.0719 Epoch: 3 Global Step: 125940 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:52:19,749-Speed 2629.13 samples/sec Loss 11.6822 LearningRate 0.0719 Epoch: 3 Global Step: 125950 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:52:23,654-Speed 2622.58 samples/sec Loss 11.6891 LearningRate 0.0719 Epoch: 3 Global Step: 125960 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:52:27,559-Speed 2622.89 samples/sec Loss 11.8130 LearningRate 0.0719 Epoch: 3 Global Step: 125970 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:52:31,464-Speed 2623.02 samples/sec Loss 11.6893 LearningRate 0.0719 Epoch: 3 Global Step: 125980 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:52:35,370-Speed 2622.42 samples/sec Loss 11.5974 LearningRate 0.0719 Epoch: 3 Global Step: 125990 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:52:39,264-Speed 2629.89 samples/sec Loss 11.6534 LearningRate 0.0719 Epoch: 3 Global Step: 126000 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:52:43,187-Speed 2611.46 samples/sec Loss 11.8859 LearningRate 0.0719 Epoch: 3 Global Step: 126010 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:52:47,083-Speed 2628.86 samples/sec Loss 11.6930 LearningRate 0.0719 Epoch: 3 Global Step: 126020 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 09:52:50,997-Speed 2625.36 samples/sec Loss 11.8475 LearningRate 0.0719 Epoch: 3 Global Step: 126030 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:52:54,896-Speed 2627.06 samples/sec Loss 11.8469 LearningRate 0.0719 Epoch: 3 Global Step: 126040 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:52:58,791-Speed 2629.86 samples/sec Loss 11.7350 LearningRate 0.0719 Epoch: 3 Global Step: 126050 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:53:02,690-Speed 2626.98 samples/sec Loss 11.8048 LearningRate 0.0719 Epoch: 3 Global Step: 126060 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:53:06,588-Speed 2627.62 samples/sec Loss 11.8054 LearningRate 0.0719 Epoch: 3 Global Step: 126070 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:53:10,488-Speed 2626.46 samples/sec Loss 11.8105 LearningRate 0.0719 Epoch: 3 Global Step: 126080 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:53:14,383-Speed 2630.17 samples/sec Loss 11.7707 LearningRate 0.0719 Epoch: 3 Global Step: 126090 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:53:18,275-Speed 2630.99 samples/sec Loss 11.8958 LearningRate 0.0719 Epoch: 3 Global Step: 126100 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:53:22,175-Speed 2626.97 samples/sec Loss 11.7294 LearningRate 0.0719 Epoch: 3 Global Step: 126110 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:53:26,075-Speed 2625.63 samples/sec Loss 12.0591 LearningRate 0.0719 Epoch: 3 Global Step: 126120 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:53:29,968-Speed 2631.54 samples/sec Loss 11.7605 LearningRate 0.0719 Epoch: 3 Global Step: 126130 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:53:33,860-Speed 2631.74 samples/sec Loss 11.7266 LearningRate 0.0719 Epoch: 3 Global Step: 126140 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:53:37,760-Speed 2625.71 samples/sec Loss 11.7707 LearningRate 0.0719 Epoch: 3 Global Step: 126150 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:53:41,655-Speed 2629.66 samples/sec Loss 11.8367 LearningRate 0.0719 Epoch: 3 Global Step: 126160 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:53:45,554-Speed 2627.35 samples/sec Loss 11.8123 LearningRate 0.0719 Epoch: 3 Global Step: 126170 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:53:49,454-Speed 2626.27 samples/sec Loss 11.7837 LearningRate 0.0719 Epoch: 3 Global Step: 126180 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:53:53,351-Speed 2628.20 samples/sec Loss 11.6929 LearningRate 0.0719 Epoch: 3 Global Step: 126190 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:53:57,287-Speed 2602.28 samples/sec Loss 11.8090 LearningRate 0.0719 Epoch: 3 Global Step: 126200 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:54:01,182-Speed 2630.07 samples/sec Loss 11.6795 LearningRate 0.0719 Epoch: 3 Global Step: 126210 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:54:05,086-Speed 2623.83 samples/sec Loss 11.7475 LearningRate 0.0719 Epoch: 3 Global Step: 126220 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:54:08,987-Speed 2625.50 samples/sec Loss 11.6720 LearningRate 0.0719 Epoch: 3 Global Step: 126230 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:54:12,884-Speed 2628.63 samples/sec Loss 11.8032 LearningRate 0.0719 Epoch: 3 Global Step: 126240 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:54:16,784-Speed 2625.95 samples/sec Loss 11.7044 LearningRate 0.0719 Epoch: 3 Global Step: 126250 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:54:20,679-Speed 2629.47 samples/sec Loss 12.0247 LearningRate 0.0719 Epoch: 3 Global Step: 126260 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:54:24,574-Speed 2629.94 samples/sec Loss 11.6536 LearningRate 0.0719 Epoch: 3 Global Step: 126270 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:54:28,508-Speed 2603.21 samples/sec Loss 11.8452 LearningRate 0.0719 Epoch: 3 Global Step: 126280 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:54:32,422-Speed 2617.26 samples/sec Loss 11.8246 LearningRate 0.0719 Epoch: 3 Global Step: 126290 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:54:36,319-Speed 2628.65 samples/sec Loss 11.7651 LearningRate 0.0719 Epoch: 3 Global Step: 126300 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:54:40,215-Speed 2628.87 samples/sec Loss 11.7735 LearningRate 0.0719 Epoch: 3 Global Step: 126310 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:54:44,109-Speed 2630.30 samples/sec Loss 11.7611 LearningRate 0.0719 Epoch: 3 Global Step: 126320 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:54:48,003-Speed 2630.59 samples/sec Loss 11.6454 LearningRate 0.0719 Epoch: 3 Global Step: 126330 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:54:51,902-Speed 2626.72 samples/sec Loss 11.8674 LearningRate 0.0719 Epoch: 3 Global Step: 126340 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:54:55,800-Speed 2628.04 samples/sec Loss 11.8534 LearningRate 0.0719 Epoch: 3 Global Step: 126350 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:54:59,718-Speed 2614.30 samples/sec Loss 11.8707 LearningRate 0.0719 Epoch: 3 Global Step: 126360 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:55:03,681-Speed 2585.00 samples/sec Loss 11.8062 LearningRate 0.0719 Epoch: 3 Global Step: 126370 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:55:07,584-Speed 2624.40 samples/sec Loss 11.7972 LearningRate 0.0719 Epoch: 3 Global Step: 126380 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:55:11,493-Speed 2620.04 samples/sec Loss 11.6972 LearningRate 0.0719 Epoch: 3 Global Step: 126390 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:55:15,397-Speed 2623.47 samples/sec Loss 11.8872 LearningRate 0.0718 Epoch: 3 Global Step: 126400 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:55:19,308-Speed 2619.26 samples/sec Loss 11.8070 LearningRate 0.0718 Epoch: 3 Global Step: 126410 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:55:23,210-Speed 2624.87 samples/sec Loss 11.7108 LearningRate 0.0718 Epoch: 3 Global Step: 126420 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:55:27,107-Speed 2627.52 samples/sec Loss 11.7121 LearningRate 0.0718 Epoch: 3 Global Step: 126430 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:55:31,005-Speed 2630.76 samples/sec Loss 11.7034 LearningRate 0.0718 Epoch: 3 Global Step: 126440 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:55:34,927-Speed 2612.02 samples/sec Loss 11.6629 LearningRate 0.0718 Epoch: 3 Global Step: 126450 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:55:38,820-Speed 2630.66 samples/sec Loss 11.7394 LearningRate 0.0718 Epoch: 3 Global Step: 126460 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:55:42,715-Speed 2629.61 samples/sec Loss 11.7016 LearningRate 0.0718 Epoch: 3 Global Step: 126470 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:55:46,620-Speed 2622.87 samples/sec Loss 11.7009 LearningRate 0.0718 Epoch: 3 Global Step: 126480 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:55:50,502-Speed 2638.03 samples/sec Loss 11.8752 LearningRate 0.0718 Epoch: 3 Global Step: 126490 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:55:54,403-Speed 2626.55 samples/sec Loss 11.4873 LearningRate 0.0718 Epoch: 3 Global Step: 126500 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:55:58,302-Speed 2626.62 samples/sec Loss 11.8502 LearningRate 0.0718 Epoch: 3 Global Step: 126510 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:56:02,199-Speed 2628.18 samples/sec Loss 11.6859 LearningRate 0.0718 Epoch: 3 Global Step: 126520 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:56:06,201-Speed 2559.88 samples/sec Loss 11.6523 LearningRate 0.0718 Epoch: 3 Global Step: 126530 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:56:10,136-Speed 2602.76 samples/sec Loss 11.8235 LearningRate 0.0718 Epoch: 3 Global Step: 126540 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:56:14,032-Speed 2628.52 samples/sec Loss 11.6859 LearningRate 0.0718 Epoch: 3 Global Step: 126550 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:56:17,927-Speed 2629.45 samples/sec Loss 11.7031 LearningRate 0.0718 Epoch: 3 Global Step: 126560 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:56:21,832-Speed 2623.05 samples/sec Loss 11.8364 LearningRate 0.0718 Epoch: 3 Global Step: 126570 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:56:25,729-Speed 2628.46 samples/sec Loss 11.7766 LearningRate 0.0718 Epoch: 3 Global Step: 126580 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:56:29,639-Speed 2619.30 samples/sec Loss 11.8224 LearningRate 0.0718 Epoch: 3 Global Step: 126590 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:56:33,531-Speed 2631.15 samples/sec Loss 11.6581 LearningRate 0.0718 Epoch: 3 Global Step: 126600 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:56:37,432-Speed 2625.86 samples/sec Loss 11.7993 LearningRate 0.0718 Epoch: 3 Global Step: 126610 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:56:41,338-Speed 2622.30 samples/sec Loss 11.8217 LearningRate 0.0718 Epoch: 3 Global Step: 126620 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:56:45,232-Speed 2630.62 samples/sec Loss 11.7364 LearningRate 0.0718 Epoch: 3 Global Step: 126630 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:56:49,134-Speed 2624.93 samples/sec Loss 11.8398 LearningRate 0.0718 Epoch: 3 Global Step: 126640 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:56:53,027-Speed 2630.67 samples/sec Loss 11.6745 LearningRate 0.0718 Epoch: 3 Global Step: 126650 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:56:56,925-Speed 2627.76 samples/sec Loss 11.7806 LearningRate 0.0718 Epoch: 3 Global Step: 126660 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:57:00,827-Speed 2624.83 samples/sec Loss 11.7038 LearningRate 0.0718 Epoch: 3 Global Step: 126670 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:57:04,733-Speed 2621.72 samples/sec Loss 11.7896 LearningRate 0.0718 Epoch: 3 Global Step: 126680 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:57:08,636-Speed 2624.65 samples/sec Loss 11.5782 LearningRate 0.0718 Epoch: 3 Global Step: 126690 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:57:12,537-Speed 2625.34 samples/sec Loss 11.7581 LearningRate 0.0718 Epoch: 3 Global Step: 126700 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:57:16,442-Speed 2623.60 samples/sec Loss 11.8433 LearningRate 0.0718 Epoch: 3 Global Step: 126710 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:57:20,344-Speed 2624.90 samples/sec Loss 11.8178 LearningRate 0.0718 Epoch: 3 Global Step: 126720 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:57:24,244-Speed 2626.13 samples/sec Loss 11.5851 LearningRate 0.0718 Epoch: 3 Global Step: 126730 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:57:28,148-Speed 2623.77 samples/sec Loss 11.7596 LearningRate 0.0718 Epoch: 3 Global Step: 126740 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:57:32,051-Speed 2623.88 samples/sec Loss 11.6734 LearningRate 0.0718 Epoch: 3 Global Step: 126750 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:57:35,957-Speed 2622.08 samples/sec Loss 11.8245 LearningRate 0.0718 Epoch: 3 Global Step: 126760 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:57:39,840-Speed 2637.38 samples/sec Loss 11.7989 LearningRate 0.0718 Epoch: 3 Global Step: 126770 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:57:43,739-Speed 2626.98 samples/sec Loss 11.7142 LearningRate 0.0718 Epoch: 3 Global Step: 126780 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:57:47,639-Speed 2626.72 samples/sec Loss 11.8023 LearningRate 0.0718 Epoch: 3 Global Step: 126790 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:57:51,550-Speed 2618.87 samples/sec Loss 11.8689 LearningRate 0.0718 Epoch: 3 Global Step: 126800 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:57:55,626-Speed 2513.15 samples/sec Loss 11.5002 LearningRate 0.0718 Epoch: 3 Global Step: 126810 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:57:59,631-Speed 2557.27 samples/sec Loss 11.6105 LearningRate 0.0718 Epoch: 3 Global Step: 126820 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:58:03,527-Speed 2628.77 samples/sec Loss 11.8101 LearningRate 0.0718 Epoch: 3 Global Step: 126830 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:58:07,424-Speed 2628.31 samples/sec Loss 11.7724 LearningRate 0.0718 Epoch: 3 Global Step: 126840 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:58:11,319-Speed 2629.24 samples/sec Loss 11.6755 LearningRate 0.0718 Epoch: 3 Global Step: 126850 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:58:15,224-Speed 2623.48 samples/sec Loss 11.7169 LearningRate 0.0718 Epoch: 3 Global Step: 126860 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:58:19,129-Speed 2622.86 samples/sec Loss 11.7280 LearningRate 0.0718 Epoch: 3 Global Step: 126870 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:58:23,025-Speed 2628.88 samples/sec Loss 11.6532 LearningRate 0.0718 Epoch: 3 Global Step: 126880 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 09:58:26,908-Speed 2638.08 samples/sec Loss 11.7903 LearningRate 0.0717 Epoch: 3 Global Step: 126890 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:58:30,803-Speed 2630.08 samples/sec Loss 11.8497 LearningRate 0.0717 Epoch: 3 Global Step: 126900 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:58:34,704-Speed 2625.23 samples/sec Loss 11.6460 LearningRate 0.0717 Epoch: 3 Global Step: 126910 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:58:38,602-Speed 2627.10 samples/sec Loss 11.5945 LearningRate 0.0717 Epoch: 3 Global Step: 126920 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:58:42,485-Speed 2638.06 samples/sec Loss 11.5061 LearningRate 0.0717 Epoch: 3 Global Step: 126930 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:58:46,385-Speed 2626.19 samples/sec Loss 11.7154 LearningRate 0.0717 Epoch: 3 Global Step: 126940 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:58:50,283-Speed 2628.11 samples/sec Loss 11.6601 LearningRate 0.0717 Epoch: 3 Global Step: 126950 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:58:54,181-Speed 2627.50 samples/sec Loss 11.9510 LearningRate 0.0717 Epoch: 3 Global Step: 126960 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:58:58,073-Speed 2631.64 samples/sec Loss 11.7020 LearningRate 0.0717 Epoch: 3 Global Step: 126970 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:59:01,971-Speed 2627.93 samples/sec Loss 12.1375 LearningRate 0.0717 Epoch: 3 Global Step: 126980 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:59:05,864-Speed 2630.61 samples/sec Loss 12.0162 LearningRate 0.0717 Epoch: 3 Global Step: 126990 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:59:09,755-Speed 2632.42 samples/sec Loss 11.8112 LearningRate 0.0717 Epoch: 3 Global Step: 127000 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:59:13,650-Speed 2629.69 samples/sec Loss 11.9964 LearningRate 0.0717 Epoch: 3 Global Step: 127010 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:59:17,578-Speed 2607.22 samples/sec Loss 11.9440 LearningRate 0.0717 Epoch: 3 Global Step: 127020 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:59:21,476-Speed 2627.83 samples/sec Loss 11.6849 LearningRate 0.0717 Epoch: 3 Global Step: 127030 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 09:59:25,366-Speed 2633.16 samples/sec Loss 11.8174 LearningRate 0.0717 Epoch: 3 Global Step: 127040 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:59:29,262-Speed 2628.90 samples/sec Loss 11.6445 LearningRate 0.0717 Epoch: 3 Global Step: 127050 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:59:33,161-Speed 2627.23 samples/sec Loss 11.7393 LearningRate 0.0717 Epoch: 3 Global Step: 127060 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:59:37,053-Speed 2631.70 samples/sec Loss 11.7862 LearningRate 0.0717 Epoch: 3 Global Step: 127070 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:59:40,949-Speed 2628.29 samples/sec Loss 11.8749 LearningRate 0.0717 Epoch: 3 Global Step: 127080 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:59:44,842-Speed 2631.62 samples/sec Loss 11.7609 LearningRate 0.0717 Epoch: 3 Global Step: 127090 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:59:48,735-Speed 2630.40 samples/sec Loss 11.8234 LearningRate 0.0717 Epoch: 3 Global Step: 127100 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:59:52,634-Speed 2627.05 samples/sec Loss 11.7725 LearningRate 0.0717 Epoch: 3 Global Step: 127110 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 09:59:56,528-Speed 2630.43 samples/sec Loss 11.8166 LearningRate 0.0717 Epoch: 3 Global Step: 127120 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:00:00,421-Speed 2631.32 samples/sec Loss 11.7567 LearningRate 0.0717 Epoch: 3 Global Step: 127130 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:00:04,316-Speed 2629.66 samples/sec Loss 11.7742 LearningRate 0.0717 Epoch: 3 Global Step: 127140 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:00:08,231-Speed 2616.31 samples/sec Loss 11.7316 LearningRate 0.0717 Epoch: 3 Global Step: 127150 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:00:12,138-Speed 2621.06 samples/sec Loss 11.5890 LearningRate 0.0717 Epoch: 3 Global Step: 127160 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:00:16,036-Speed 2628.51 samples/sec Loss 11.7564 LearningRate 0.0717 Epoch: 3 Global Step: 127170 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:00:19,935-Speed 2626.88 samples/sec Loss 11.5854 LearningRate 0.0717 Epoch: 3 Global Step: 127180 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:00:23,836-Speed 2624.99 samples/sec Loss 11.6478 LearningRate 0.0717 Epoch: 3 Global Step: 127190 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:00:27,734-Speed 2628.06 samples/sec Loss 11.5889 LearningRate 0.0717 Epoch: 3 Global Step: 127200 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:00:31,628-Speed 2630.75 samples/sec Loss 11.7099 LearningRate 0.0717 Epoch: 3 Global Step: 127210 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:00:35,572-Speed 2596.83 samples/sec Loss 11.6834 LearningRate 0.0717 Epoch: 3 Global Step: 127220 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:00:39,465-Speed 2630.99 samples/sec Loss 11.7758 LearningRate 0.0717 Epoch: 3 Global Step: 127230 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:00:43,357-Speed 2631.77 samples/sec Loss 11.8183 LearningRate 0.0717 Epoch: 3 Global Step: 127240 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:00:47,254-Speed 2628.43 samples/sec Loss 11.7385 LearningRate 0.0717 Epoch: 3 Global Step: 127250 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:00:51,148-Speed 2630.31 samples/sec Loss 11.9327 LearningRate 0.0717 Epoch: 3 Global Step: 127260 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:00:55,032-Speed 2636.64 samples/sec Loss 11.4623 LearningRate 0.0717 Epoch: 3 Global Step: 127270 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:00:58,938-Speed 2622.77 samples/sec Loss 11.8957 LearningRate 0.0717 Epoch: 3 Global Step: 127280 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:01:02,839-Speed 2625.26 samples/sec Loss 11.6629 LearningRate 0.0717 Epoch: 3 Global Step: 127290 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:01:06,734-Speed 2629.97 samples/sec Loss 11.7334 LearningRate 0.0717 Epoch: 3 Global Step: 127300 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:01:10,629-Speed 2629.66 samples/sec Loss 11.8341 LearningRate 0.0717 Epoch: 3 Global Step: 127310 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:01:14,532-Speed 2624.54 samples/sec Loss 11.7315 LearningRate 0.0717 Epoch: 3 Global Step: 127320 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:01:18,440-Speed 2620.86 samples/sec Loss 11.7153 LearningRate 0.0717 Epoch: 3 Global Step: 127330 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:01:22,335-Speed 2629.08 samples/sec Loss 11.7529 LearningRate 0.0717 Epoch: 3 Global Step: 127340 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:01:26,242-Speed 2621.98 samples/sec Loss 11.7672 LearningRate 0.0717 Epoch: 3 Global Step: 127350 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:01:30,144-Speed 2624.69 samples/sec Loss 11.7032 LearningRate 0.0717 Epoch: 3 Global Step: 127360 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:01:34,048-Speed 2624.10 samples/sec Loss 11.8327 LearningRate 0.0717 Epoch: 3 Global Step: 127370 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:01:37,953-Speed 2623.16 samples/sec Loss 11.6655 LearningRate 0.0716 Epoch: 3 Global Step: 127380 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:01:41,857-Speed 2623.68 samples/sec Loss 11.6925 LearningRate 0.0716 Epoch: 3 Global Step: 127390 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:01:45,766-Speed 2620.78 samples/sec Loss 11.7120 LearningRate 0.0716 Epoch: 3 Global Step: 127400 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:01:49,705-Speed 2599.77 samples/sec Loss 11.8277 LearningRate 0.0716 Epoch: 3 Global Step: 127410 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:01:53,581-Speed 2642.89 samples/sec Loss 11.6146 LearningRate 0.0716 Epoch: 3 Global Step: 127420 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:01:57,483-Speed 2624.62 samples/sec Loss 11.8467 LearningRate 0.0716 Epoch: 3 Global Step: 127430 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:02:01,393-Speed 2619.86 samples/sec Loss 11.7672 LearningRate 0.0716 Epoch: 3 Global Step: 127440 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:02:05,296-Speed 2623.71 samples/sec Loss 11.7897 LearningRate 0.0716 Epoch: 3 Global Step: 127450 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:02:09,207-Speed 2619.52 samples/sec Loss 11.8402 LearningRate 0.0716 Epoch: 3 Global Step: 127460 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:02:13,162-Speed 2590.25 samples/sec Loss 11.9202 LearningRate 0.0716 Epoch: 3 Global Step: 127470 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:02:17,067-Speed 2622.91 samples/sec Loss 11.6238 LearningRate 0.0716 Epoch: 3 Global Step: 127480 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:02:20,960-Speed 2631.29 samples/sec Loss 11.6669 LearningRate 0.0716 Epoch: 3 Global Step: 127490 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:02:24,848-Speed 2634.37 samples/sec Loss 11.7893 LearningRate 0.0716 Epoch: 3 Global Step: 127500 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:02:28,740-Speed 2630.97 samples/sec Loss 11.7150 LearningRate 0.0716 Epoch: 3 Global Step: 127510 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:02:32,630-Speed 2633.26 samples/sec Loss 11.7563 LearningRate 0.0716 Epoch: 3 Global Step: 127520 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:02:36,519-Speed 2633.82 samples/sec Loss 11.7666 LearningRate 0.0716 Epoch: 3 Global Step: 127530 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:02:40,419-Speed 2626.10 samples/sec Loss 11.7183 LearningRate 0.0716 Epoch: 3 Global Step: 127540 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:02:44,320-Speed 2625.54 samples/sec Loss 11.5827 LearningRate 0.0716 Epoch: 3 Global Step: 127550 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:02:48,228-Speed 2620.94 samples/sec Loss 11.6888 LearningRate 0.0716 Epoch: 3 Global Step: 127560 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:02:52,131-Speed 2624.77 samples/sec Loss 11.6174 LearningRate 0.0716 Epoch: 3 Global Step: 127570 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:02:56,031-Speed 2626.40 samples/sec Loss 11.7408 LearningRate 0.0716 Epoch: 3 Global Step: 127580 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:02:59,933-Speed 2624.72 samples/sec Loss 11.5449 LearningRate 0.0716 Epoch: 3 Global Step: 127590 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:03:03,837-Speed 2623.42 samples/sec Loss 11.6841 LearningRate 0.0716 Epoch: 3 Global Step: 127600 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:03:07,748-Speed 2618.81 samples/sec Loss 11.7059 LearningRate 0.0716 Epoch: 3 Global Step: 127610 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:03:11,661-Speed 2617.75 samples/sec Loss 11.5596 LearningRate 0.0716 Epoch: 3 Global Step: 127620 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:03:15,560-Speed 2626.99 samples/sec Loss 11.8608 LearningRate 0.0716 Epoch: 3 Global Step: 127630 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:03:19,443-Speed 2637.63 samples/sec Loss 11.7442 LearningRate 0.0716 Epoch: 3 Global Step: 127640 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:03:23,345-Speed 2625.33 samples/sec Loss 11.6519 LearningRate 0.0716 Epoch: 3 Global Step: 127650 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:03:27,244-Speed 2626.55 samples/sec Loss 11.8642 LearningRate 0.0716 Epoch: 3 Global Step: 127660 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:03:31,156-Speed 2618.85 samples/sec Loss 11.7843 LearningRate 0.0716 Epoch: 3 Global Step: 127670 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:03:35,070-Speed 2616.27 samples/sec Loss 11.7061 LearningRate 0.0716 Epoch: 3 Global Step: 127680 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:03:38,956-Speed 2635.86 samples/sec Loss 11.7380 LearningRate 0.0716 Epoch: 3 Global Step: 127690 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:03:42,830-Speed 2643.56 samples/sec Loss 11.9166 LearningRate 0.0716 Epoch: 3 Global Step: 127700 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:03:46,725-Speed 2630.45 samples/sec Loss 11.8284 LearningRate 0.0716 Epoch: 3 Global Step: 127710 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:03:50,615-Speed 2633.25 samples/sec Loss 11.8292 LearningRate 0.0716 Epoch: 3 Global Step: 127720 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:03:54,518-Speed 2623.53 samples/sec Loss 11.6588 LearningRate 0.0716 Epoch: 3 Global Step: 127730 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:03:58,421-Speed 2624.48 samples/sec Loss 11.6777 LearningRate 0.0716 Epoch: 3 Global Step: 127740 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:04:02,329-Speed 2620.97 samples/sec Loss 11.8488 LearningRate 0.0716 Epoch: 3 Global Step: 127750 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:04:06,219-Speed 2632.58 samples/sec Loss 11.8841 LearningRate 0.0716 Epoch: 3 Global Step: 127760 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:04:10,127-Speed 2621.24 samples/sec Loss 11.7802 LearningRate 0.0716 Epoch: 3 Global Step: 127770 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:04:14,024-Speed 2628.45 samples/sec Loss 11.6980 LearningRate 0.0716 Epoch: 3 Global Step: 127780 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:04:17,928-Speed 2623.39 samples/sec Loss 11.8324 LearningRate 0.0716 Epoch: 3 Global Step: 127790 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:04:21,837-Speed 2620.73 samples/sec Loss 11.7195 LearningRate 0.0716 Epoch: 3 Global Step: 127800 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:04:25,741-Speed 2623.04 samples/sec Loss 11.9143 LearningRate 0.0716 Epoch: 3 Global Step: 127810 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:04:29,639-Speed 2628.14 samples/sec Loss 11.7653 LearningRate 0.0716 Epoch: 3 Global Step: 127820 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:04:33,545-Speed 2622.22 samples/sec Loss 11.7578 LearningRate 0.0716 Epoch: 3 Global Step: 127830 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:04:37,448-Speed 2624.08 samples/sec Loss 11.6173 LearningRate 0.0716 Epoch: 3 Global Step: 127840 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:04:41,353-Speed 2622.52 samples/sec Loss 11.8387 LearningRate 0.0716 Epoch: 3 Global Step: 127850 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:04:45,255-Speed 2624.65 samples/sec Loss 11.7561 LearningRate 0.0716 Epoch: 3 Global Step: 127860 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:04:49,159-Speed 2623.91 samples/sec Loss 11.6417 LearningRate 0.0715 Epoch: 3 Global Step: 127870 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:04:53,059-Speed 2626.90 samples/sec Loss 11.7431 LearningRate 0.0715 Epoch: 3 Global Step: 127880 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:04:57,087-Speed 2542.68 samples/sec Loss 11.6699 LearningRate 0.0715 Epoch: 3 Global Step: 127890 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:05:00,982-Speed 2629.69 samples/sec Loss 11.7977 LearningRate 0.0715 Epoch: 3 Global Step: 127900 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:05:04,889-Speed 2621.46 samples/sec Loss 11.6704 LearningRate 0.0715 Epoch: 3 Global Step: 127910 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:05:08,784-Speed 2629.14 samples/sec Loss 11.7678 LearningRate 0.0715 Epoch: 3 Global Step: 127920 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:05:12,701-Speed 2615.06 samples/sec Loss 11.6242 LearningRate 0.0715 Epoch: 3 Global Step: 127930 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:05:16,606-Speed 2622.44 samples/sec Loss 11.8245 LearningRate 0.0715 Epoch: 3 Global Step: 127940 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:05:20,511-Speed 2623.63 samples/sec Loss 11.5857 LearningRate 0.0715 Epoch: 3 Global Step: 127950 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:05:24,416-Speed 2623.00 samples/sec Loss 11.6356 LearningRate 0.0715 Epoch: 3 Global Step: 127960 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:05:28,321-Speed 2622.51 samples/sec Loss 11.7350 LearningRate 0.0715 Epoch: 3 Global Step: 127970 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:05:32,232-Speed 2619.33 samples/sec Loss 11.5740 LearningRate 0.0715 Epoch: 3 Global Step: 127980 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:05:36,204-Speed 2578.64 samples/sec Loss 11.7084 LearningRate 0.0715 Epoch: 3 Global Step: 127990 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:05:40,111-Speed 2621.32 samples/sec Loss 11.8476 LearningRate 0.0715 Epoch: 3 Global Step: 128000 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:05:43,996-Speed 2636.33 samples/sec Loss 11.6992 LearningRate 0.0715 Epoch: 3 Global Step: 128010 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:05:47,903-Speed 2620.87 samples/sec Loss 11.6715 LearningRate 0.0715 Epoch: 3 Global Step: 128020 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:05:51,801-Speed 2627.98 samples/sec Loss 11.7136 LearningRate 0.0715 Epoch: 3 Global Step: 128030 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:05:55,781-Speed 2573.21 samples/sec Loss 11.7036 LearningRate 0.0715 Epoch: 3 Global Step: 128040 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:05:59,686-Speed 2623.16 samples/sec Loss 11.5263 LearningRate 0.0715 Epoch: 3 Global Step: 128050 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:06:03,590-Speed 2623.48 samples/sec Loss 11.6887 LearningRate 0.0715 Epoch: 3 Global Step: 128060 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:06:07,490-Speed 2625.93 samples/sec Loss 11.7474 LearningRate 0.0715 Epoch: 3 Global Step: 128070 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:06:11,385-Speed 2629.44 samples/sec Loss 11.7938 LearningRate 0.0715 Epoch: 3 Global Step: 128080 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:06:15,275-Speed 2632.98 samples/sec Loss 11.7374 LearningRate 0.0715 Epoch: 3 Global Step: 128090 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:06:19,213-Speed 2600.60 samples/sec Loss 11.7177 LearningRate 0.0715 Epoch: 3 Global Step: 128100 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:06:23,313-Speed 2498.02 samples/sec Loss 11.6393 LearningRate 0.0715 Epoch: 3 Global Step: 128110 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:06:27,213-Speed 2626.69 samples/sec Loss 11.7182 LearningRate 0.0715 Epoch: 3 Global Step: 128120 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:06:31,110-Speed 2629.18 samples/sec Loss 11.7207 LearningRate 0.0715 Epoch: 3 Global Step: 128130 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:06:35,007-Speed 2628.14 samples/sec Loss 11.8235 LearningRate 0.0715 Epoch: 3 Global Step: 128140 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:06:38,909-Speed 2624.99 samples/sec Loss 11.6127 LearningRate 0.0715 Epoch: 3 Global Step: 128150 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:06:42,821-Speed 2617.96 samples/sec Loss 11.7297 LearningRate 0.0715 Epoch: 3 Global Step: 128160 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:06:46,728-Speed 2621.19 samples/sec Loss 11.6254 LearningRate 0.0715 Epoch: 3 Global Step: 128170 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:06:50,631-Speed 2624.60 samples/sec Loss 11.6357 LearningRate 0.0715 Epoch: 3 Global Step: 128180 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:06:54,540-Speed 2619.85 samples/sec Loss 11.6868 LearningRate 0.0715 Epoch: 3 Global Step: 128190 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:06:58,463-Speed 2610.34 samples/sec Loss 11.5698 LearningRate 0.0715 Epoch: 3 Global Step: 128200 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:07:02,338-Speed 2643.46 samples/sec Loss 11.6836 LearningRate 0.0715 Epoch: 3 Global Step: 128210 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:07:06,233-Speed 2630.47 samples/sec Loss 11.6300 LearningRate 0.0715 Epoch: 3 Global Step: 128220 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:07:10,134-Speed 2625.16 samples/sec Loss 11.7347 LearningRate 0.0715 Epoch: 3 Global Step: 128230 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:07:14,034-Speed 2626.38 samples/sec Loss 11.7834 LearningRate 0.0715 Epoch: 3 Global Step: 128240 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:07:17,936-Speed 2624.83 samples/sec Loss 11.8168 LearningRate 0.0715 Epoch: 3 Global Step: 128250 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:07:21,842-Speed 2621.63 samples/sec Loss 11.6936 LearningRate 0.0715 Epoch: 3 Global Step: 128260 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:07:25,742-Speed 2626.21 samples/sec Loss 11.7518 LearningRate 0.0715 Epoch: 3 Global Step: 128270 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:07:29,650-Speed 2621.04 samples/sec Loss 11.7885 LearningRate 0.0715 Epoch: 3 Global Step: 128280 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:07:33,554-Speed 2623.58 samples/sec Loss 11.7560 LearningRate 0.0715 Epoch: 3 Global Step: 128290 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:07:37,454-Speed 2626.55 samples/sec Loss 11.8401 LearningRate 0.0715 Epoch: 3 Global Step: 128300 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:07:41,363-Speed 2619.75 samples/sec Loss 11.5934 LearningRate 0.0715 Epoch: 3 Global Step: 128310 Fp16 Grad Scale: 524288 Required: 79 hours
Training: 2022-04-13 10:07:45,238-Speed 2644.07 samples/sec Loss 11.6903 LearningRate 0.0715 Epoch: 3 Global Step: 128320 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:07:49,132-Speed 2629.83 samples/sec Loss 11.6651 LearningRate 0.0715 Epoch: 3 Global Step: 128330 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:07:53,020-Speed 2634.19 samples/sec Loss 11.6715 LearningRate 0.0715 Epoch: 3 Global Step: 128340 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:07:56,913-Speed 2630.54 samples/sec Loss 11.7267 LearningRate 0.0715 Epoch: 3 Global Step: 128350 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:08:00,810-Speed 2628.53 samples/sec Loss 11.5234 LearningRate 0.0714 Epoch: 3 Global Step: 128360 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:08:04,702-Speed 2631.54 samples/sec Loss 11.7411 LearningRate 0.0714 Epoch: 3 Global Step: 128370 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:08:08,598-Speed 2629.05 samples/sec Loss 11.5286 LearningRate 0.0714 Epoch: 3 Global Step: 128380 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:08:12,471-Speed 2644.22 samples/sec Loss 11.6463 LearningRate 0.0714 Epoch: 3 Global Step: 128390 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:08:16,362-Speed 2632.90 samples/sec Loss 11.6588 LearningRate 0.0714 Epoch: 3 Global Step: 128400 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:08:20,255-Speed 2631.02 samples/sec Loss 11.8049 LearningRate 0.0714 Epoch: 3 Global Step: 128410 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:08:24,142-Speed 2635.07 samples/sec Loss 11.8943 LearningRate 0.0714 Epoch: 3 Global Step: 128420 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:08:28,036-Speed 2629.72 samples/sec Loss 11.7710 LearningRate 0.0714 Epoch: 3 Global Step: 128430 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:08:31,939-Speed 2624.57 samples/sec Loss 11.7064 LearningRate 0.0714 Epoch: 3 Global Step: 128440 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:08:35,828-Speed 2633.66 samples/sec Loss 11.6428 LearningRate 0.0714 Epoch: 3 Global Step: 128450 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:08:39,720-Speed 2631.49 samples/sec Loss 11.8221 LearningRate 0.0714 Epoch: 3 Global Step: 128460 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:08:43,612-Speed 2631.06 samples/sec Loss 11.5941 LearningRate 0.0714 Epoch: 3 Global Step: 128470 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:08:47,517-Speed 2622.99 samples/sec Loss 11.7650 LearningRate 0.0714 Epoch: 3 Global Step: 128480 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:08:51,414-Speed 2628.99 samples/sec Loss 11.6997 LearningRate 0.0714 Epoch: 3 Global Step: 128490 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:08:55,305-Speed 2632.69 samples/sec Loss 11.6834 LearningRate 0.0714 Epoch: 3 Global Step: 128500 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:08:59,183-Speed 2640.65 samples/sec Loss 11.5863 LearningRate 0.0714 Epoch: 3 Global Step: 128510 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:09:03,080-Speed 2627.85 samples/sec Loss 11.6975 LearningRate 0.0714 Epoch: 3 Global Step: 128520 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:09:06,973-Speed 2631.08 samples/sec Loss 11.7245 LearningRate 0.0714 Epoch: 3 Global Step: 128530 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:09:10,879-Speed 2622.44 samples/sec Loss 11.7807 LearningRate 0.0714 Epoch: 3 Global Step: 128540 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:09:14,772-Speed 2630.76 samples/sec Loss 11.8172 LearningRate 0.0714 Epoch: 3 Global Step: 128550 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:09:18,672-Speed 2625.93 samples/sec Loss 11.6944 LearningRate 0.0714 Epoch: 3 Global Step: 128560 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:09:22,563-Speed 2632.96 samples/sec Loss 11.6172 LearningRate 0.0714 Epoch: 3 Global Step: 128570 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:09:26,455-Speed 2631.68 samples/sec Loss 11.6357 LearningRate 0.0714 Epoch: 3 Global Step: 128580 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:09:30,350-Speed 2629.51 samples/sec Loss 11.5783 LearningRate 0.0714 Epoch: 3 Global Step: 128590 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:09:34,242-Speed 2631.52 samples/sec Loss 11.7839 LearningRate 0.0714 Epoch: 3 Global Step: 128600 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:09:38,134-Speed 2631.43 samples/sec Loss 11.6835 LearningRate 0.0714 Epoch: 3 Global Step: 128610 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:09:42,027-Speed 2631.43 samples/sec Loss 11.6715 LearningRate 0.0714 Epoch: 3 Global Step: 128620 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:09:45,926-Speed 2627.18 samples/sec Loss 11.5259 LearningRate 0.0714 Epoch: 3 Global Step: 128630 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:09:49,817-Speed 2632.17 samples/sec Loss 11.6515 LearningRate 0.0714 Epoch: 3 Global Step: 128640 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:09:53,726-Speed 2620.73 samples/sec Loss 11.7352 LearningRate 0.0714 Epoch: 3 Global Step: 128650 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:09:57,620-Speed 2630.37 samples/sec Loss 11.6599 LearningRate 0.0714 Epoch: 3 Global Step: 128660 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:10:01,522-Speed 2625.51 samples/sec Loss 11.5428 LearningRate 0.0714 Epoch: 3 Global Step: 128670 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:10:05,416-Speed 2630.11 samples/sec Loss 11.8359 LearningRate 0.0714 Epoch: 3 Global Step: 128680 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:10:09,308-Speed 2631.49 samples/sec Loss 11.5571 LearningRate 0.0714 Epoch: 3 Global Step: 128690 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:10:13,202-Speed 2630.52 samples/sec Loss 11.6692 LearningRate 0.0714 Epoch: 3 Global Step: 128700 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:10:17,100-Speed 2627.39 samples/sec Loss 11.6835 LearningRate 0.0714 Epoch: 3 Global Step: 128710 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:10:20,998-Speed 2627.49 samples/sec Loss 11.7466 LearningRate 0.0714 Epoch: 3 Global Step: 128720 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:10:24,949-Speed 2593.05 samples/sec Loss 11.7119 LearningRate 0.0714 Epoch: 3 Global Step: 128730 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:10:28,838-Speed 2633.37 samples/sec Loss 11.7225 LearningRate 0.0714 Epoch: 3 Global Step: 128740 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:10:32,732-Speed 2630.90 samples/sec Loss 11.6952 LearningRate 0.0714 Epoch: 3 Global Step: 128750 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:10:36,624-Speed 2630.97 samples/sec Loss 11.8177 LearningRate 0.0714 Epoch: 3 Global Step: 128760 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:10:40,498-Speed 2643.66 samples/sec Loss 11.6017 LearningRate 0.0714 Epoch: 3 Global Step: 128770 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:10:44,425-Speed 2608.05 samples/sec Loss 11.6138 LearningRate 0.0714 Epoch: 3 Global Step: 128780 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:10:48,324-Speed 2626.82 samples/sec Loss 11.6755 LearningRate 0.0714 Epoch: 3 Global Step: 128790 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:10:52,229-Speed 2623.13 samples/sec Loss 11.5965 LearningRate 0.0714 Epoch: 3 Global Step: 128800 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:10:56,128-Speed 2627.15 samples/sec Loss 11.5430 LearningRate 0.0714 Epoch: 3 Global Step: 128810 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:11:00,041-Speed 2617.85 samples/sec Loss 11.7462 LearningRate 0.0714 Epoch: 3 Global Step: 128820 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:11:03,937-Speed 2628.49 samples/sec Loss 11.6722 LearningRate 0.0714 Epoch: 3 Global Step: 128830 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:11:07,833-Speed 2629.24 samples/sec Loss 11.6413 LearningRate 0.0714 Epoch: 3 Global Step: 128840 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:11:11,730-Speed 2628.24 samples/sec Loss 11.7569 LearningRate 0.0713 Epoch: 3 Global Step: 128850 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:11:15,623-Speed 2631.08 samples/sec Loss 11.6295 LearningRate 0.0713 Epoch: 3 Global Step: 128860 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:11:19,509-Speed 2635.49 samples/sec Loss 11.8304 LearningRate 0.0713 Epoch: 3 Global Step: 128870 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:11:23,396-Speed 2635.60 samples/sec Loss 11.6842 LearningRate 0.0713 Epoch: 3 Global Step: 128880 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:11:27,291-Speed 2629.74 samples/sec Loss 11.5885 LearningRate 0.0713 Epoch: 3 Global Step: 128890 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:11:31,184-Speed 2630.92 samples/sec Loss 11.5692 LearningRate 0.0713 Epoch: 3 Global Step: 128900 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:11:35,079-Speed 2629.45 samples/sec Loss 11.6548 LearningRate 0.0713 Epoch: 3 Global Step: 128910 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:11:38,974-Speed 2629.10 samples/sec Loss 11.5265 LearningRate 0.0713 Epoch: 3 Global Step: 128920 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:11:42,867-Speed 2631.06 samples/sec Loss 11.5897 LearningRate 0.0713 Epoch: 3 Global Step: 128930 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:11:46,760-Speed 2631.53 samples/sec Loss 11.7354 LearningRate 0.0713 Epoch: 3 Global Step: 128940 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:11:50,662-Speed 2624.76 samples/sec Loss 11.7769 LearningRate 0.0713 Epoch: 3 Global Step: 128950 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:11:54,564-Speed 2625.64 samples/sec Loss 11.6207 LearningRate 0.0713 Epoch: 3 Global Step: 128960 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:11:58,459-Speed 2629.38 samples/sec Loss 11.8681 LearningRate 0.0713 Epoch: 3 Global Step: 128970 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:12:02,335-Speed 2643.08 samples/sec Loss 11.7734 LearningRate 0.0713 Epoch: 3 Global Step: 128980 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:12:06,241-Speed 2621.53 samples/sec Loss 11.7628 LearningRate 0.0713 Epoch: 3 Global Step: 128990 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:12:10,140-Speed 2627.32 samples/sec Loss 11.5741 LearningRate 0.0713 Epoch: 3 Global Step: 129000 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:12:14,035-Speed 2629.45 samples/sec Loss 11.7280 LearningRate 0.0713 Epoch: 3 Global Step: 129010 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:12:17,930-Speed 2629.79 samples/sec Loss 11.7230 LearningRate 0.0713 Epoch: 3 Global Step: 129020 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:12:21,851-Speed 2612.42 samples/sec Loss 11.6406 LearningRate 0.0713 Epoch: 3 Global Step: 129030 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:12:25,832-Speed 2573.05 samples/sec Loss 11.6540 LearningRate 0.0713 Epoch: 3 Global Step: 129040 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:12:29,758-Speed 2609.05 samples/sec Loss 11.6916 LearningRate 0.0713 Epoch: 3 Global Step: 129050 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:12:33,650-Speed 2631.06 samples/sec Loss 11.7591 LearningRate 0.0713 Epoch: 3 Global Step: 129060 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:12:37,543-Speed 2632.23 samples/sec Loss 11.6781 LearningRate 0.0713 Epoch: 3 Global Step: 129070 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:12:41,442-Speed 2627.08 samples/sec Loss 11.6315 LearningRate 0.0713 Epoch: 3 Global Step: 129080 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:12:45,341-Speed 2627.23 samples/sec Loss 11.5289 LearningRate 0.0713 Epoch: 3 Global Step: 129090 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:12:49,268-Speed 2607.48 samples/sec Loss 11.7113 LearningRate 0.0713 Epoch: 3 Global Step: 129100 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:12:53,154-Speed 2636.36 samples/sec Loss 11.7425 LearningRate 0.0713 Epoch: 3 Global Step: 129110 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:12:57,047-Speed 2630.48 samples/sec Loss 11.6616 LearningRate 0.0713 Epoch: 3 Global Step: 129120 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:13:00,940-Speed 2631.51 samples/sec Loss 11.7736 LearningRate 0.0713 Epoch: 3 Global Step: 129130 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:13:04,836-Speed 2629.08 samples/sec Loss 11.8007 LearningRate 0.0713 Epoch: 3 Global Step: 129140 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:13:08,733-Speed 2628.55 samples/sec Loss 11.5360 LearningRate 0.0713 Epoch: 3 Global Step: 129150 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:13:12,719-Speed 2569.84 samples/sec Loss 11.7113 LearningRate 0.0713 Epoch: 3 Global Step: 129160 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:13:16,614-Speed 2629.86 samples/sec Loss 11.5717 LearningRate 0.0713 Epoch: 3 Global Step: 129170 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:13:20,518-Speed 2623.80 samples/sec Loss 11.6460 LearningRate 0.0713 Epoch: 3 Global Step: 129180 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:13:24,413-Speed 2629.24 samples/sec Loss 11.5806 LearningRate 0.0713 Epoch: 3 Global Step: 129190 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:13:28,308-Speed 2629.40 samples/sec Loss 11.6107 LearningRate 0.0713 Epoch: 3 Global Step: 129200 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:13:32,205-Speed 2629.22 samples/sec Loss 11.6507 LearningRate 0.0713 Epoch: 3 Global Step: 129210 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:13:36,084-Speed 2640.22 samples/sec Loss 11.7114 LearningRate 0.0713 Epoch: 3 Global Step: 129220 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:13:39,983-Speed 2626.83 samples/sec Loss 11.7732 LearningRate 0.0713 Epoch: 3 Global Step: 129230 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:13:43,881-Speed 2627.67 samples/sec Loss 11.6165 LearningRate 0.0713 Epoch: 3 Global Step: 129240 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:13:47,780-Speed 2627.42 samples/sec Loss 11.6731 LearningRate 0.0713 Epoch: 3 Global Step: 129250 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:13:51,676-Speed 2629.17 samples/sec Loss 11.8101 LearningRate 0.0713 Epoch: 3 Global Step: 129260 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:13:55,571-Speed 2629.40 samples/sec Loss 11.5246 LearningRate 0.0713 Epoch: 3 Global Step: 129270 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:13:59,466-Speed 2629.15 samples/sec Loss 11.7880 LearningRate 0.0713 Epoch: 3 Global Step: 129280 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:14:03,361-Speed 2630.05 samples/sec Loss 11.6739 LearningRate 0.0713 Epoch: 3 Global Step: 129290 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:14:07,254-Speed 2631.25 samples/sec Loss 11.5421 LearningRate 0.0713 Epoch: 3 Global Step: 129300 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:14:11,150-Speed 2629.05 samples/sec Loss 11.6635 LearningRate 0.0713 Epoch: 3 Global Step: 129310 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:14:15,055-Speed 2623.31 samples/sec Loss 11.5858 LearningRate 0.0713 Epoch: 3 Global Step: 129320 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:14:18,977-Speed 2611.52 samples/sec Loss 11.6494 LearningRate 0.0713 Epoch: 3 Global Step: 129330 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:14:22,880-Speed 2624.11 samples/sec Loss 11.7222 LearningRate 0.0712 Epoch: 3 Global Step: 129340 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:14:26,784-Speed 2622.90 samples/sec Loss 11.7358 LearningRate 0.0712 Epoch: 3 Global Step: 129350 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:14:30,686-Speed 2625.15 samples/sec Loss 11.4577 LearningRate 0.0712 Epoch: 3 Global Step: 129360 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:14:34,582-Speed 2628.96 samples/sec Loss 11.7443 LearningRate 0.0712 Epoch: 3 Global Step: 129370 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:14:38,479-Speed 2628.02 samples/sec Loss 11.8184 LearningRate 0.0712 Epoch: 3 Global Step: 129380 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:14:42,360-Speed 2639.54 samples/sec Loss 11.6264 LearningRate 0.0712 Epoch: 3 Global Step: 129390 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:14:46,257-Speed 2628.86 samples/sec Loss 11.6301 LearningRate 0.0712 Epoch: 3 Global Step: 129400 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:14:50,152-Speed 2629.63 samples/sec Loss 11.7189 LearningRate 0.0712 Epoch: 3 Global Step: 129410 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:14:54,063-Speed 2618.58 samples/sec Loss 11.5964 LearningRate 0.0712 Epoch: 3 Global Step: 129420 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:14:57,959-Speed 2628.89 samples/sec Loss 11.4963 LearningRate 0.0712 Epoch: 3 Global Step: 129430 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:15:01,855-Speed 2629.48 samples/sec Loss 11.5767 LearningRate 0.0712 Epoch: 3 Global Step: 129440 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:15:05,749-Speed 2630.06 samples/sec Loss 11.6359 LearningRate 0.0712 Epoch: 3 Global Step: 129450 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:15:09,643-Speed 2630.48 samples/sec Loss 11.6951 LearningRate 0.0712 Epoch: 3 Global Step: 129460 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:15:13,534-Speed 2632.97 samples/sec Loss 11.7909 LearningRate 0.0712 Epoch: 3 Global Step: 129470 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:15:17,429-Speed 2629.56 samples/sec Loss 11.8139 LearningRate 0.0712 Epoch: 3 Global Step: 129480 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:15:21,351-Speed 2611.46 samples/sec Loss 11.7944 LearningRate 0.0712 Epoch: 3 Global Step: 129490 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:15:25,241-Speed 2633.19 samples/sec Loss 11.7548 LearningRate 0.0712 Epoch: 3 Global Step: 129500 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:15:29,141-Speed 2626.07 samples/sec Loss 11.5775 LearningRate 0.0712 Epoch: 3 Global Step: 129510 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:15:33,048-Speed 2621.93 samples/sec Loss 11.7562 LearningRate 0.0712 Epoch: 3 Global Step: 129520 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:15:36,940-Speed 2631.53 samples/sec Loss 11.7417 LearningRate 0.0712 Epoch: 3 Global Step: 129530 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:15:40,844-Speed 2623.99 samples/sec Loss 11.7458 LearningRate 0.0712 Epoch: 3 Global Step: 129540 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:15:44,738-Speed 2629.99 samples/sec Loss 11.7864 LearningRate 0.0712 Epoch: 3 Global Step: 129550 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:15:48,632-Speed 2630.98 samples/sec Loss 11.7403 LearningRate 0.0712 Epoch: 3 Global Step: 129560 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:15:52,520-Speed 2634.59 samples/sec Loss 11.8427 LearningRate 0.0712 Epoch: 3 Global Step: 129570 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:15:56,414-Speed 2630.08 samples/sec Loss 11.6789 LearningRate 0.0712 Epoch: 3 Global Step: 129580 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:16:00,311-Speed 2628.32 samples/sec Loss 11.7211 LearningRate 0.0712 Epoch: 3 Global Step: 129590 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:16:04,172-Speed 2652.40 samples/sec Loss 11.8143 LearningRate 0.0712 Epoch: 3 Global Step: 129600 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:16:08,072-Speed 2626.36 samples/sec Loss 11.6665 LearningRate 0.0712 Epoch: 3 Global Step: 129610 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:16:11,970-Speed 2628.31 samples/sec Loss 11.6816 LearningRate 0.0712 Epoch: 3 Global Step: 129620 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:16:15,858-Speed 2633.98 samples/sec Loss 11.5975 LearningRate 0.0712 Epoch: 3 Global Step: 129630 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:16:19,765-Speed 2621.82 samples/sec Loss 11.7367 LearningRate 0.0712 Epoch: 3 Global Step: 129640 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:16:23,688-Speed 2610.72 samples/sec Loss 11.7312 LearningRate 0.0712 Epoch: 3 Global Step: 129650 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:16:27,583-Speed 2629.81 samples/sec Loss 11.7814 LearningRate 0.0712 Epoch: 3 Global Step: 129660 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:16:31,483-Speed 2625.87 samples/sec Loss 11.5874 LearningRate 0.0712 Epoch: 3 Global Step: 129670 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:16:35,385-Speed 2624.98 samples/sec Loss 11.7559 LearningRate 0.0712 Epoch: 3 Global Step: 129680 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:16:39,286-Speed 2625.18 samples/sec Loss 11.5554 LearningRate 0.0712 Epoch: 3 Global Step: 129690 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:16:43,182-Speed 2629.84 samples/sec Loss 11.7087 LearningRate 0.0712 Epoch: 3 Global Step: 129700 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:16:47,079-Speed 2629.02 samples/sec Loss 11.6165 LearningRate 0.0712 Epoch: 3 Global Step: 129710 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:16:50,972-Speed 2630.38 samples/sec Loss 11.5548 LearningRate 0.0712 Epoch: 3 Global Step: 129720 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:16:54,883-Speed 2619.61 samples/sec Loss 11.6000 LearningRate 0.0712 Epoch: 3 Global Step: 129730 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:16:58,778-Speed 2629.61 samples/sec Loss 11.5385 LearningRate 0.0712 Epoch: 3 Global Step: 129740 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:17:02,680-Speed 2625.16 samples/sec Loss 11.6961 LearningRate 0.0712 Epoch: 3 Global Step: 129750 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:17:06,594-Speed 2617.06 samples/sec Loss 11.7657 LearningRate 0.0712 Epoch: 3 Global Step: 129760 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:17:10,482-Speed 2634.65 samples/sec Loss 11.7706 LearningRate 0.0712 Epoch: 3 Global Step: 129770 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:17:14,376-Speed 2630.34 samples/sec Loss 11.6970 LearningRate 0.0712 Epoch: 3 Global Step: 129780 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:17:18,279-Speed 2623.98 samples/sec Loss 11.7285 LearningRate 0.0712 Epoch: 3 Global Step: 129790 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:17:22,174-Speed 2629.63 samples/sec Loss 11.7934 LearningRate 0.0712 Epoch: 3 Global Step: 129800 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:17:26,068-Speed 2630.59 samples/sec Loss 11.6423 LearningRate 0.0712 Epoch: 3 Global Step: 129810 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:17:29,962-Speed 2630.25 samples/sec Loss 11.5774 LearningRate 0.0712 Epoch: 3 Global Step: 129820 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:17:33,841-Speed 2641.04 samples/sec Loss 11.6797 LearningRate 0.0711 Epoch: 3 Global Step: 129830 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:17:37,735-Speed 2630.35 samples/sec Loss 11.7330 LearningRate 0.0711 Epoch: 3 Global Step: 129840 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:17:41,630-Speed 2629.35 samples/sec Loss 11.6626 LearningRate 0.0711 Epoch: 3 Global Step: 129850 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:17:45,530-Speed 2626.10 samples/sec Loss 12.0724 LearningRate 0.0711 Epoch: 3 Global Step: 129860 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:17:49,425-Speed 2629.77 samples/sec Loss 11.8813 LearningRate 0.0711 Epoch: 3 Global Step: 129870 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:17:53,309-Speed 2637.74 samples/sec Loss 11.6924 LearningRate 0.0711 Epoch: 3 Global Step: 129880 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:17:57,203-Speed 2629.95 samples/sec Loss 11.8102 LearningRate 0.0711 Epoch: 3 Global Step: 129890 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:18:01,111-Speed 2621.15 samples/sec Loss 11.7022 LearningRate 0.0711 Epoch: 3 Global Step: 129900 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:18:05,024-Speed 2617.00 samples/sec Loss 11.6343 LearningRate 0.0711 Epoch: 3 Global Step: 129910 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:18:08,939-Speed 2616.18 samples/sec Loss 11.6581 LearningRate 0.0711 Epoch: 3 Global Step: 129920 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:18:12,882-Speed 2597.93 samples/sec Loss 11.7582 LearningRate 0.0711 Epoch: 3 Global Step: 129930 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:18:16,786-Speed 2623.27 samples/sec Loss 11.8050 LearningRate 0.0711 Epoch: 3 Global Step: 129940 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:18:20,686-Speed 2626.38 samples/sec Loss 11.7228 LearningRate 0.0711 Epoch: 3 Global Step: 129950 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:18:24,575-Speed 2633.72 samples/sec Loss 11.7239 LearningRate 0.0711 Epoch: 3 Global Step: 129960 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:18:28,466-Speed 2632.33 samples/sec Loss 11.7294 LearningRate 0.0711 Epoch: 3 Global Step: 129970 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:18:32,316-Speed 2660.54 samples/sec Loss 11.7065 LearningRate 0.0711 Epoch: 3 Global Step: 129980 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:18:36,202-Speed 2635.92 samples/sec Loss 11.8400 LearningRate 0.0711 Epoch: 3 Global Step: 129990 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:18:40,097-Speed 2629.50 samples/sec Loss 11.8558 LearningRate 0.0711 Epoch: 3 Global Step: 130000 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:19:22,789-[lfw][130000]XNorm: 22.624960
Training: 2022-04-13 10:19:22,790-[lfw][130000]Accuracy-Flip: 0.99750+-0.00281
Training: 2022-04-13 10:19:22,790-[lfw][130000]Accuracy-Highest: 0.99783
Training: 2022-04-13 10:20:12,922-[cfp_fp][130000]XNorm: 20.408098
Training: 2022-04-13 10:20:12,923-[cfp_fp][130000]Accuracy-Flip: 0.97771+-0.00698
Training: 2022-04-13 10:20:12,924-[cfp_fp][130000]Accuracy-Highest: 0.97986
Training: 2022-04-13 10:20:56,101-[agedb_30][130000]XNorm: 22.158565
Training: 2022-04-13 10:20:56,102-[agedb_30][130000]Accuracy-Flip: 0.96717+-0.00646
Training: 2022-04-13 10:20:56,103-[agedb_30][130000]Accuracy-Highest: 0.96800
Training: 2022-04-13 10:20:59,971-Speed 73.21 samples/sec Loss 11.8310 LearningRate 0.0711 Epoch: 3 Global Step: 130010 Fp16 Grad Scale: 16384 Required: 79 hours
Training: 2022-04-13 10:21:03,837-Speed 2649.01 samples/sec Loss 11.4757 LearningRate 0.0711 Epoch: 3 Global Step: 130020 Fp16 Grad Scale: 16384 Required: 79 hours
Training: 2022-04-13 10:21:07,707-Speed 2646.98 samples/sec Loss 11.7318 LearningRate 0.0711 Epoch: 3 Global Step: 130030 Fp16 Grad Scale: 16384 Required: 79 hours
Training: 2022-04-13 10:21:11,609-Speed 2625.49 samples/sec Loss 11.6377 LearningRate 0.0711 Epoch: 3 Global Step: 130040 Fp16 Grad Scale: 16384 Required: 79 hours
Training: 2022-04-13 10:21:15,577-Speed 2581.55 samples/sec Loss 11.6190 LearningRate 0.0711 Epoch: 3 Global Step: 130050 Fp16 Grad Scale: 16384 Required: 79 hours
Training: 2022-04-13 10:21:19,451-Speed 2643.17 samples/sec Loss 11.6699 LearningRate 0.0711 Epoch: 3 Global Step: 130060 Fp16 Grad Scale: 16384 Required: 79 hours
Training: 2022-04-13 10:21:23,332-Speed 2639.56 samples/sec Loss 11.7546 LearningRate 0.0711 Epoch: 3 Global Step: 130070 Fp16 Grad Scale: 16384 Required: 79 hours
Training: 2022-04-13 10:21:27,210-Speed 2641.15 samples/sec Loss 11.7167 LearningRate 0.0711 Epoch: 3 Global Step: 130080 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:21:31,090-Speed 2640.36 samples/sec Loss 11.6134 LearningRate 0.0711 Epoch: 3 Global Step: 130090 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:21:34,989-Speed 2626.39 samples/sec Loss 11.5885 LearningRate 0.0711 Epoch: 3 Global Step: 130100 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:21:38,875-Speed 2636.31 samples/sec Loss 11.7321 LearningRate 0.0711 Epoch: 3 Global Step: 130110 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:21:42,754-Speed 2640.26 samples/sec Loss 11.4579 LearningRate 0.0711 Epoch: 3 Global Step: 130120 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:21:46,651-Speed 2629.03 samples/sec Loss 11.7277 LearningRate 0.0711 Epoch: 3 Global Step: 130130 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:21:50,535-Speed 2636.50 samples/sec Loss 11.7479 LearningRate 0.0711 Epoch: 3 Global Step: 130140 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:21:54,423-Speed 2634.47 samples/sec Loss 11.6941 LearningRate 0.0711 Epoch: 3 Global Step: 130150 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:21:58,358-Speed 2602.93 samples/sec Loss 11.7482 LearningRate 0.0711 Epoch: 3 Global Step: 130160 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:22:02,255-Speed 2628.36 samples/sec Loss 11.6573 LearningRate 0.0711 Epoch: 3 Global Step: 130170 Fp16 Grad Scale: 32768 Required: 79 hours
Training: 2022-04-13 10:22:06,149-Speed 2630.54 samples/sec Loss 11.6893 LearningRate 0.0711 Epoch: 3 Global Step: 130180 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:22:10,042-Speed 2631.12 samples/sec Loss 11.6086 LearningRate 0.0711 Epoch: 3 Global Step: 130190 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:22:13,944-Speed 2624.66 samples/sec Loss 11.6355 LearningRate 0.0711 Epoch: 3 Global Step: 130200 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:22:17,840-Speed 2629.26 samples/sec Loss 11.7123 LearningRate 0.0711 Epoch: 3 Global Step: 130210 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:22:21,732-Speed 2631.79 samples/sec Loss 11.5457 LearningRate 0.0711 Epoch: 3 Global Step: 130220 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:22:25,627-Speed 2629.44 samples/sec Loss 11.7810 LearningRate 0.0711 Epoch: 3 Global Step: 130230 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:22:29,523-Speed 2628.49 samples/sec Loss 11.6746 LearningRate 0.0711 Epoch: 3 Global Step: 130240 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:22:33,437-Speed 2617.22 samples/sec Loss 11.5519 LearningRate 0.0711 Epoch: 3 Global Step: 130250 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:22:37,445-Speed 2555.01 samples/sec Loss 11.6709 LearningRate 0.0711 Epoch: 3 Global Step: 130260 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:22:41,432-Speed 2569.40 samples/sec Loss 11.7373 LearningRate 0.0711 Epoch: 3 Global Step: 130270 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:22:45,345-Speed 2617.60 samples/sec Loss 11.6232 LearningRate 0.0711 Epoch: 3 Global Step: 130280 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:22:49,241-Speed 2628.67 samples/sec Loss 11.5235 LearningRate 0.0711 Epoch: 3 Global Step: 130290 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:22:53,139-Speed 2627.74 samples/sec Loss 11.7606 LearningRate 0.0711 Epoch: 3 Global Step: 130300 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:22:57,036-Speed 2628.09 samples/sec Loss 11.7700 LearningRate 0.0711 Epoch: 3 Global Step: 130310 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:23:00,931-Speed 2629.12 samples/sec Loss 11.6757 LearningRate 0.0710 Epoch: 3 Global Step: 130320 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:23:04,842-Speed 2618.98 samples/sec Loss 11.6078 LearningRate 0.0710 Epoch: 3 Global Step: 130330 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:23:08,740-Speed 2627.87 samples/sec Loss 11.7829 LearningRate 0.0710 Epoch: 3 Global Step: 130340 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:23:12,635-Speed 2629.50 samples/sec Loss 11.7930 LearningRate 0.0710 Epoch: 3 Global Step: 130350 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:23:16,543-Speed 2620.60 samples/sec Loss 11.6241 LearningRate 0.0710 Epoch: 3 Global Step: 130360 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:23:20,463-Speed 2613.47 samples/sec Loss 11.7131 LearningRate 0.0710 Epoch: 3 Global Step: 130370 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:23:24,399-Speed 2602.42 samples/sec Loss 11.7312 LearningRate 0.0710 Epoch: 3 Global Step: 130380 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:23:28,330-Speed 2605.35 samples/sec Loss 11.7498 LearningRate 0.0710 Epoch: 3 Global Step: 130390 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:23:32,251-Speed 2611.85 samples/sec Loss 11.5767 LearningRate 0.0710 Epoch: 3 Global Step: 130400 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:23:36,147-Speed 2628.89 samples/sec Loss 11.8385 LearningRate 0.0710 Epoch: 3 Global Step: 130410 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:23:40,047-Speed 2626.32 samples/sec Loss 11.5604 LearningRate 0.0710 Epoch: 3 Global Step: 130420 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:23:43,951-Speed 2623.84 samples/sec Loss 11.6179 LearningRate 0.0710 Epoch: 3 Global Step: 130430 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:23:47,846-Speed 2629.33 samples/sec Loss 11.4821 LearningRate 0.0710 Epoch: 3 Global Step: 130440 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:23:51,742-Speed 2629.25 samples/sec Loss 11.6301 LearningRate 0.0710 Epoch: 3 Global Step: 130450 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:23:55,663-Speed 2612.45 samples/sec Loss 11.6790 LearningRate 0.0710 Epoch: 3 Global Step: 130460 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:23:59,563-Speed 2626.03 samples/sec Loss 11.5938 LearningRate 0.0710 Epoch: 3 Global Step: 130470 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:24:03,458-Speed 2629.75 samples/sec Loss 11.6501 LearningRate 0.0710 Epoch: 3 Global Step: 130480 Fp16 Grad Scale: 524288 Required: 79 hours
Training: 2022-04-13 10:24:07,337-Speed 2640.21 samples/sec Loss 11.5515 LearningRate 0.0710 Epoch: 3 Global Step: 130490 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:24:11,232-Speed 2629.59 samples/sec Loss 11.5735 LearningRate 0.0710 Epoch: 3 Global Step: 130500 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:24:15,126-Speed 2629.88 samples/sec Loss 11.7582 LearningRate 0.0710 Epoch: 3 Global Step: 130510 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:24:19,021-Speed 2629.61 samples/sec Loss 11.7323 LearningRate 0.0710 Epoch: 3 Global Step: 130520 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:24:22,921-Speed 2626.44 samples/sec Loss 11.6834 LearningRate 0.0710 Epoch: 3 Global Step: 130530 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:24:26,831-Speed 2619.67 samples/sec Loss 11.6225 LearningRate 0.0710 Epoch: 3 Global Step: 130540 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:24:30,730-Speed 2627.19 samples/sec Loss 11.5860 LearningRate 0.0710 Epoch: 3 Global Step: 130550 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:24:34,636-Speed 2621.64 samples/sec Loss 11.6876 LearningRate 0.0710 Epoch: 3 Global Step: 130560 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:24:38,538-Speed 2625.02 samples/sec Loss 11.5007 LearningRate 0.0710 Epoch: 3 Global Step: 130570 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:24:42,434-Speed 2629.26 samples/sec Loss 11.7150 LearningRate 0.0710 Epoch: 3 Global Step: 130580 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:24:46,323-Speed 2633.28 samples/sec Loss 11.6496 LearningRate 0.0710 Epoch: 3 Global Step: 130590 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:24:50,219-Speed 2628.76 samples/sec Loss 11.6718 LearningRate 0.0710 Epoch: 3 Global Step: 130600 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:24:54,121-Speed 2624.99 samples/sec Loss 11.7986 LearningRate 0.0710 Epoch: 3 Global Step: 130610 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:24:58,029-Speed 2621.19 samples/sec Loss 11.8100 LearningRate 0.0710 Epoch: 3 Global Step: 130620 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:25:01,941-Speed 2617.69 samples/sec Loss 11.6844 LearningRate 0.0710 Epoch: 3 Global Step: 130630 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:25:05,837-Speed 2628.87 samples/sec Loss 11.6988 LearningRate 0.0710 Epoch: 3 Global Step: 130640 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:25:09,734-Speed 2628.24 samples/sec Loss 11.6518 LearningRate 0.0710 Epoch: 3 Global Step: 130650 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:25:13,606-Speed 2645.66 samples/sec Loss 11.6971 LearningRate 0.0710 Epoch: 3 Global Step: 130660 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:25:17,501-Speed 2629.27 samples/sec Loss 11.6767 LearningRate 0.0710 Epoch: 3 Global Step: 130670 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:25:21,402-Speed 2625.81 samples/sec Loss 11.6809 LearningRate 0.0710 Epoch: 3 Global Step: 130680 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:25:25,316-Speed 2616.86 samples/sec Loss 11.5681 LearningRate 0.0710 Epoch: 3 Global Step: 130690 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:25:29,308-Speed 2565.44 samples/sec Loss 11.6237 LearningRate 0.0710 Epoch: 3 Global Step: 130700 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:25:33,257-Speed 2593.86 samples/sec Loss 11.7184 LearningRate 0.0710 Epoch: 3 Global Step: 130710 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:25:37,157-Speed 2626.09 samples/sec Loss 11.7849 LearningRate 0.0710 Epoch: 3 Global Step: 130720 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:25:41,051-Speed 2629.83 samples/sec Loss 11.6852 LearningRate 0.0710 Epoch: 3 Global Step: 130730 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:25:44,953-Speed 2625.30 samples/sec Loss 11.7048 LearningRate 0.0710 Epoch: 3 Global Step: 130740 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:25:48,853-Speed 2626.05 samples/sec Loss 11.6292 LearningRate 0.0710 Epoch: 3 Global Step: 130750 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:25:52,756-Speed 2624.36 samples/sec Loss 11.7477 LearningRate 0.0710 Epoch: 3 Global Step: 130760 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:25:56,635-Speed 2640.81 samples/sec Loss 11.6020 LearningRate 0.0710 Epoch: 3 Global Step: 130770 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:26:00,534-Speed 2626.30 samples/sec Loss 11.6360 LearningRate 0.0710 Epoch: 3 Global Step: 130780 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:26:04,432-Speed 2627.46 samples/sec Loss 11.7221 LearningRate 0.0710 Epoch: 3 Global Step: 130790 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:26:08,332-Speed 2626.39 samples/sec Loss 11.5646 LearningRate 0.0710 Epoch: 3 Global Step: 130800 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:26:12,227-Speed 2629.89 samples/sec Loss 11.5655 LearningRate 0.0709 Epoch: 3 Global Step: 130810 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:26:16,123-Speed 2628.78 samples/sec Loss 11.6771 LearningRate 0.0709 Epoch: 3 Global Step: 130820 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:26:20,019-Speed 2628.98 samples/sec Loss 11.6224 LearningRate 0.0709 Epoch: 3 Global Step: 130830 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:26:24,035-Speed 2550.32 samples/sec Loss 11.6163 LearningRate 0.0709 Epoch: 3 Global Step: 130840 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:26:27,948-Speed 2617.89 samples/sec Loss 11.4852 LearningRate 0.0709 Epoch: 3 Global Step: 130850 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:26:31,855-Speed 2621.42 samples/sec Loss 11.6118 LearningRate 0.0709 Epoch: 3 Global Step: 130860 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:26:35,740-Speed 2636.20 samples/sec Loss 11.5589 LearningRate 0.0709 Epoch: 3 Global Step: 130870 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:26:39,643-Speed 2624.56 samples/sec Loss 11.5994 LearningRate 0.0709 Epoch: 3 Global Step: 130880 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:26:43,546-Speed 2624.28 samples/sec Loss 11.7393 LearningRate 0.0709 Epoch: 3 Global Step: 130890 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:26:47,448-Speed 2624.99 samples/sec Loss 11.7128 LearningRate 0.0709 Epoch: 3 Global Step: 130900 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:26:51,353-Speed 2622.76 samples/sec Loss 11.8304 LearningRate 0.0709 Epoch: 3 Global Step: 130910 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:26:55,253-Speed 2626.27 samples/sec Loss 11.4677 LearningRate 0.0709 Epoch: 3 Global Step: 130920 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:26:59,176-Speed 2611.08 samples/sec Loss 11.7953 LearningRate 0.0709 Epoch: 3 Global Step: 130930 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:27:03,082-Speed 2621.92 samples/sec Loss 11.5820 LearningRate 0.0709 Epoch: 3 Global Step: 130940 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:27:06,982-Speed 2626.06 samples/sec Loss 11.6757 LearningRate 0.0709 Epoch: 3 Global Step: 130950 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:27:10,880-Speed 2627.46 samples/sec Loss 11.5301 LearningRate 0.0709 Epoch: 3 Global Step: 130960 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:27:14,861-Speed 2573.13 samples/sec Loss 11.6684 LearningRate 0.0709 Epoch: 3 Global Step: 130970 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:27:18,768-Speed 2621.30 samples/sec Loss 11.5921 LearningRate 0.0709 Epoch: 3 Global Step: 130980 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:27:22,680-Speed 2618.31 samples/sec Loss 11.7501 LearningRate 0.0709 Epoch: 3 Global Step: 130990 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:27:26,582-Speed 2624.79 samples/sec Loss 11.5596 LearningRate 0.0709 Epoch: 3 Global Step: 131000 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:27:30,478-Speed 2629.39 samples/sec Loss 11.6178 LearningRate 0.0709 Epoch: 3 Global Step: 131010 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:27:34,376-Speed 2627.44 samples/sec Loss 11.6126 LearningRate 0.0709 Epoch: 3 Global Step: 131020 Fp16 Grad Scale: 262144 Required: 79 hours
Training: 2022-04-13 10:27:38,257-Speed 2639.21 samples/sec Loss 11.8022 LearningRate 0.0709 Epoch: 3 Global Step: 131030 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:27:42,158-Speed 2625.49 samples/sec Loss 11.6823 LearningRate 0.0709 Epoch: 3 Global Step: 131040 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:27:46,057-Speed 2627.02 samples/sec Loss 11.6356 LearningRate 0.0709 Epoch: 3 Global Step: 131050 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:27:49,959-Speed 2624.74 samples/sec Loss 11.6858 LearningRate 0.0709 Epoch: 3 Global Step: 131060 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:27:53,859-Speed 2626.22 samples/sec Loss 11.6387 LearningRate 0.0709 Epoch: 3 Global Step: 131070 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:27:57,735-Speed 2642.41 samples/sec Loss 11.6363 LearningRate 0.0709 Epoch: 3 Global Step: 131080 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:28:01,647-Speed 2617.85 samples/sec Loss 11.5730 LearningRate 0.0709 Epoch: 3 Global Step: 131090 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:28:05,542-Speed 2630.13 samples/sec Loss 11.5276 LearningRate 0.0709 Epoch: 3 Global Step: 131100 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:28:09,440-Speed 2627.77 samples/sec Loss 11.7688 LearningRate 0.0709 Epoch: 3 Global Step: 131110 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:28:13,337-Speed 2628.65 samples/sec Loss 11.5457 LearningRate 0.0709 Epoch: 3 Global Step: 131120 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:28:17,234-Speed 2627.83 samples/sec Loss 11.6374 LearningRate 0.0709 Epoch: 3 Global Step: 131130 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:28:21,138-Speed 2623.62 samples/sec Loss 11.7521 LearningRate 0.0709 Epoch: 3 Global Step: 131140 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:28:25,042-Speed 2623.31 samples/sec Loss 11.6679 LearningRate 0.0709 Epoch: 3 Global Step: 131150 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:28:28,950-Speed 2620.84 samples/sec Loss 11.6260 LearningRate 0.0709 Epoch: 3 Global Step: 131160 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:28:32,842-Speed 2631.34 samples/sec Loss 11.7089 LearningRate 0.0709 Epoch: 3 Global Step: 131170 Fp16 Grad Scale: 65536 Required: 79 hours
Training: 2022-04-13 10:28:36,736-Speed 2630.52 samples/sec Loss 11.6426 LearningRate 0.0709 Epoch: 3 Global Step: 131180 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:28:40,636-Speed 2626.84 samples/sec Loss 11.8865 LearningRate 0.0709 Epoch: 3 Global Step: 131190 Fp16 Grad Scale: 131072 Required: 79 hours
Training: 2022-04-13 10:28:44,531-Speed 2629.61 samples/sec Loss 11.6168 LearningRate 0.0709 Epoch: 3 Global Step: 131200 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:28:48,423-Speed 2631.49 samples/sec Loss 11.7150 LearningRate 0.0709 Epoch: 3 Global Step: 131210 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:28:52,317-Speed 2630.82 samples/sec Loss 11.7038 LearningRate 0.0709 Epoch: 3 Global Step: 131220 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:28:56,261-Speed 2596.80 samples/sec Loss 11.6625 LearningRate 0.0709 Epoch: 3 Global Step: 131230 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:29:00,165-Speed 2623.12 samples/sec Loss 11.6725 LearningRate 0.0709 Epoch: 3 Global Step: 131240 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:29:04,139-Speed 2577.73 samples/sec Loss 11.5363 LearningRate 0.0709 Epoch: 3 Global Step: 131250 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:29:08,056-Speed 2614.92 samples/sec Loss 11.4468 LearningRate 0.0709 Epoch: 3 Global Step: 131260 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:29:11,958-Speed 2625.00 samples/sec Loss 11.7225 LearningRate 0.0709 Epoch: 3 Global Step: 131270 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:29:15,868-Speed 2619.41 samples/sec Loss 11.6620 LearningRate 0.0709 Epoch: 3 Global Step: 131280 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:29:19,773-Speed 2622.73 samples/sec Loss 11.5147 LearningRate 0.0709 Epoch: 3 Global Step: 131290 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:29:23,675-Speed 2624.58 samples/sec Loss 11.5812 LearningRate 0.0709 Epoch: 3 Global Step: 131300 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:29:27,576-Speed 2625.95 samples/sec Loss 11.6308 LearningRate 0.0708 Epoch: 3 Global Step: 131310 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:29:31,483-Speed 2621.63 samples/sec Loss 11.6793 LearningRate 0.0708 Epoch: 3 Global Step: 131320 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:29:35,393-Speed 2619.46 samples/sec Loss 11.5187 LearningRate 0.0708 Epoch: 3 Global Step: 131330 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:29:39,298-Speed 2622.70 samples/sec Loss 11.6699 LearningRate 0.0708 Epoch: 3 Global Step: 131340 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:29:43,197-Speed 2627.50 samples/sec Loss 11.6702 LearningRate 0.0708 Epoch: 3 Global Step: 131350 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:29:47,081-Speed 2636.68 samples/sec Loss 11.5974 LearningRate 0.0708 Epoch: 3 Global Step: 131360 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:29:50,985-Speed 2623.84 samples/sec Loss 11.7054 LearningRate 0.0708 Epoch: 3 Global Step: 131370 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:29:54,902-Speed 2614.28 samples/sec Loss 11.6298 LearningRate 0.0708 Epoch: 3 Global Step: 131380 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:29:58,823-Speed 2613.05 samples/sec Loss 11.7362 LearningRate 0.0708 Epoch: 3 Global Step: 131390 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:30:02,730-Speed 2621.52 samples/sec Loss 11.6145 LearningRate 0.0708 Epoch: 3 Global Step: 131400 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:30:06,630-Speed 2625.94 samples/sec Loss 11.6530 LearningRate 0.0708 Epoch: 3 Global Step: 131410 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:30:10,535-Speed 2622.43 samples/sec Loss 11.5070 LearningRate 0.0708 Epoch: 3 Global Step: 131420 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:30:14,434-Speed 2627.51 samples/sec Loss 11.6405 LearningRate 0.0708 Epoch: 3 Global Step: 131430 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:30:18,332-Speed 2628.21 samples/sec Loss 11.6565 LearningRate 0.0708 Epoch: 3 Global Step: 131440 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:30:22,269-Speed 2601.45 samples/sec Loss 11.5151 LearningRate 0.0708 Epoch: 3 Global Step: 131450 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:30:26,314-Speed 2531.80 samples/sec Loss 11.6151 LearningRate 0.0708 Epoch: 3 Global Step: 131460 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:30:30,205-Speed 2632.28 samples/sec Loss 11.6601 LearningRate 0.0708 Epoch: 3 Global Step: 131470 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:30:34,102-Speed 2628.41 samples/sec Loss 11.6007 LearningRate 0.0708 Epoch: 3 Global Step: 131480 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:30:37,996-Speed 2630.41 samples/sec Loss 11.5821 LearningRate 0.0708 Epoch: 3 Global Step: 131490 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:30:41,894-Speed 2627.64 samples/sec Loss 11.5112 LearningRate 0.0708 Epoch: 3 Global Step: 131500 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:30:45,790-Speed 2628.82 samples/sec Loss 11.6430 LearningRate 0.0708 Epoch: 3 Global Step: 131510 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:30:49,692-Speed 2624.97 samples/sec Loss 11.3972 LearningRate 0.0708 Epoch: 3 Global Step: 131520 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:30:53,595-Speed 2624.43 samples/sec Loss 11.5338 LearningRate 0.0708 Epoch: 3 Global Step: 131530 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:30:57,532-Speed 2601.99 samples/sec Loss 11.7182 LearningRate 0.0708 Epoch: 3 Global Step: 131540 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:31:01,424-Speed 2631.08 samples/sec Loss 11.5581 LearningRate 0.0708 Epoch: 3 Global Step: 131550 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:31:05,354-Speed 2606.44 samples/sec Loss 11.5886 LearningRate 0.0708 Epoch: 3 Global Step: 131560 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:31:09,263-Speed 2620.24 samples/sec Loss 11.7704 LearningRate 0.0708 Epoch: 3 Global Step: 131570 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:31:13,199-Speed 2602.22 samples/sec Loss 11.5889 LearningRate 0.0708 Epoch: 3 Global Step: 131580 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:31:17,092-Speed 2631.66 samples/sec Loss 11.6828 LearningRate 0.0708 Epoch: 3 Global Step: 131590 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:31:20,985-Speed 2630.92 samples/sec Loss 11.6327 LearningRate 0.0708 Epoch: 3 Global Step: 131600 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:31:24,880-Speed 2629.45 samples/sec Loss 11.5536 LearningRate 0.0708 Epoch: 3 Global Step: 131610 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:31:28,786-Speed 2622.54 samples/sec Loss 11.5667 LearningRate 0.0708 Epoch: 3 Global Step: 131620 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:31:32,681-Speed 2629.54 samples/sec Loss 11.5548 LearningRate 0.0708 Epoch: 3 Global Step: 131630 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:31:36,574-Speed 2630.74 samples/sec Loss 11.5324 LearningRate 0.0708 Epoch: 3 Global Step: 131640 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:31:40,476-Speed 2625.62 samples/sec Loss 11.6559 LearningRate 0.0708 Epoch: 3 Global Step: 131650 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:31:44,352-Speed 2642.43 samples/sec Loss 11.5339 LearningRate 0.0708 Epoch: 3 Global Step: 131660 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:31:48,230-Speed 2641.30 samples/sec Loss 11.6160 LearningRate 0.0708 Epoch: 3 Global Step: 131670 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:31:52,123-Speed 2631.27 samples/sec Loss 11.6815 LearningRate 0.0708 Epoch: 3 Global Step: 131680 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:31:56,023-Speed 2625.79 samples/sec Loss 11.6855 LearningRate 0.0708 Epoch: 3 Global Step: 131690 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:31:59,911-Speed 2634.49 samples/sec Loss 11.6885 LearningRate 0.0708 Epoch: 3 Global Step: 131700 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:32:03,806-Speed 2630.00 samples/sec Loss 11.6953 LearningRate 0.0708 Epoch: 3 Global Step: 131710 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:32:07,704-Speed 2627.35 samples/sec Loss 11.6884 LearningRate 0.0708 Epoch: 3 Global Step: 131720 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:32:11,601-Speed 2628.47 samples/sec Loss 11.7547 LearningRate 0.0708 Epoch: 3 Global Step: 131730 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:32:15,505-Speed 2622.89 samples/sec Loss 11.5665 LearningRate 0.0708 Epoch: 3 Global Step: 131740 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:32:19,398-Speed 2631.48 samples/sec Loss 11.6925 LearningRate 0.0708 Epoch: 3 Global Step: 131750 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:32:23,293-Speed 2629.64 samples/sec Loss 11.5755 LearningRate 0.0708 Epoch: 3 Global Step: 131760 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:32:27,189-Speed 2628.33 samples/sec Loss 11.5907 LearningRate 0.0708 Epoch: 3 Global Step: 131770 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:32:31,091-Speed 2624.85 samples/sec Loss 11.6843 LearningRate 0.0708 Epoch: 3 Global Step: 131780 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:32:34,992-Speed 2625.99 samples/sec Loss 11.5815 LearningRate 0.0708 Epoch: 3 Global Step: 131790 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:32:38,889-Speed 2628.49 samples/sec Loss 11.6162 LearningRate 0.0707 Epoch: 3 Global Step: 131800 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:32:42,783-Speed 2629.91 samples/sec Loss 11.6352 LearningRate 0.0707 Epoch: 3 Global Step: 131810 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:32:46,677-Speed 2630.56 samples/sec Loss 11.6210 LearningRate 0.0707 Epoch: 3 Global Step: 131820 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:32:50,556-Speed 2640.22 samples/sec Loss 11.7731 LearningRate 0.0707 Epoch: 3 Global Step: 131830 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:32:54,447-Speed 2632.59 samples/sec Loss 11.5961 LearningRate 0.0707 Epoch: 3 Global Step: 131840 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:32:58,351-Speed 2623.46 samples/sec Loss 11.5371 LearningRate 0.0707 Epoch: 3 Global Step: 131850 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:33:02,335-Speed 2570.56 samples/sec Loss 11.6905 LearningRate 0.0707 Epoch: 3 Global Step: 131860 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:33:06,238-Speed 2624.27 samples/sec Loss 11.6398 LearningRate 0.0707 Epoch: 3 Global Step: 131870 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:33:10,131-Speed 2631.55 samples/sec Loss 11.6965 LearningRate 0.0707 Epoch: 3 Global Step: 131880 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:33:14,029-Speed 2628.07 samples/sec Loss 11.6484 LearningRate 0.0707 Epoch: 3 Global Step: 131890 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:33:17,925-Speed 2628.82 samples/sec Loss 11.6522 LearningRate 0.0707 Epoch: 3 Global Step: 131900 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:33:21,820-Speed 2629.23 samples/sec Loss 11.6208 LearningRate 0.0707 Epoch: 3 Global Step: 131910 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:33:25,717-Speed 2628.64 samples/sec Loss 11.6191 LearningRate 0.0707 Epoch: 3 Global Step: 131920 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:33:29,612-Speed 2629.26 samples/sec Loss 11.6349 LearningRate 0.0707 Epoch: 3 Global Step: 131930 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:33:33,490-Speed 2641.11 samples/sec Loss 11.6536 LearningRate 0.0707 Epoch: 3 Global Step: 131940 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:33:37,389-Speed 2627.40 samples/sec Loss 11.6102 LearningRate 0.0707 Epoch: 3 Global Step: 131950 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:33:41,289-Speed 2625.87 samples/sec Loss 11.5882 LearningRate 0.0707 Epoch: 3 Global Step: 131960 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:33:45,199-Speed 2629.96 samples/sec Loss 11.5079 LearningRate 0.0707 Epoch: 3 Global Step: 131970 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:33:49,104-Speed 2622.31 samples/sec Loss 11.4768 LearningRate 0.0707 Epoch: 3 Global Step: 131980 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:33:53,003-Speed 2627.17 samples/sec Loss 11.6120 LearningRate 0.0707 Epoch: 3 Global Step: 131990 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:33:56,901-Speed 2627.97 samples/sec Loss 11.7379 LearningRate 0.0707 Epoch: 3 Global Step: 132000 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:34:00,798-Speed 2628.59 samples/sec Loss 11.5009 LearningRate 0.0707 Epoch: 3 Global Step: 132010 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:34:04,695-Speed 2627.76 samples/sec Loss 11.4716 LearningRate 0.0707 Epoch: 3 Global Step: 132020 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:34:08,588-Speed 2630.99 samples/sec Loss 11.5330 LearningRate 0.0707 Epoch: 3 Global Step: 132030 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:34:12,492-Speed 2623.79 samples/sec Loss 11.6266 LearningRate 0.0707 Epoch: 3 Global Step: 132040 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:34:16,387-Speed 2629.71 samples/sec Loss 11.6154 LearningRate 0.0707 Epoch: 3 Global Step: 132050 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:34:20,278-Speed 2632.33 samples/sec Loss 11.7653 LearningRate 0.0707 Epoch: 3 Global Step: 132060 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:34:24,162-Speed 2637.21 samples/sec Loss 11.4799 LearningRate 0.0707 Epoch: 3 Global Step: 132070 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:34:28,060-Speed 2627.47 samples/sec Loss 11.6062 LearningRate 0.0707 Epoch: 3 Global Step: 132080 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:34:31,962-Speed 2625.04 samples/sec Loss 11.4549 LearningRate 0.0707 Epoch: 3 Global Step: 132090 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:34:35,861-Speed 2626.56 samples/sec Loss 11.5004 LearningRate 0.0707 Epoch: 3 Global Step: 132100 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:34:39,775-Speed 2617.31 samples/sec Loss 11.6063 LearningRate 0.0707 Epoch: 3 Global Step: 132110 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:34:43,679-Speed 2623.56 samples/sec Loss 11.5965 LearningRate 0.0707 Epoch: 3 Global Step: 132120 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:34:47,573-Speed 2630.89 samples/sec Loss 11.6719 LearningRate 0.0707 Epoch: 3 Global Step: 132130 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:34:51,463-Speed 2633.26 samples/sec Loss 11.5924 LearningRate 0.0707 Epoch: 3 Global Step: 132140 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:34:55,359-Speed 2628.63 samples/sec Loss 11.6518 LearningRate 0.0707 Epoch: 3 Global Step: 132150 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:34:59,256-Speed 2628.36 samples/sec Loss 11.6391 LearningRate 0.0707 Epoch: 3 Global Step: 132160 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:35:03,142-Speed 2635.89 samples/sec Loss 11.7046 LearningRate 0.0707 Epoch: 3 Global Step: 132170 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:35:07,083-Speed 2599.07 samples/sec Loss 11.6648 LearningRate 0.0707 Epoch: 3 Global Step: 132180 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:35:10,979-Speed 2628.51 samples/sec Loss 11.6728 LearningRate 0.0707 Epoch: 3 Global Step: 132190 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:35:14,869-Speed 2633.42 samples/sec Loss 11.5634 LearningRate 0.0707 Epoch: 3 Global Step: 132200 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:35:18,761-Speed 2631.87 samples/sec Loss 11.6220 LearningRate 0.0707 Epoch: 3 Global Step: 132210 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:35:22,655-Speed 2630.12 samples/sec Loss 11.7314 LearningRate 0.0707 Epoch: 3 Global Step: 132220 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:35:26,555-Speed 2626.81 samples/sec Loss 11.5099 LearningRate 0.0707 Epoch: 3 Global Step: 132230 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:35:30,459-Speed 2623.30 samples/sec Loss 11.5264 LearningRate 0.0707 Epoch: 3 Global Step: 132240 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:35:34,359-Speed 2626.62 samples/sec Loss 11.6168 LearningRate 0.0707 Epoch: 3 Global Step: 132250 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:35:38,262-Speed 2624.20 samples/sec Loss 11.6399 LearningRate 0.0707 Epoch: 3 Global Step: 132260 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:35:42,153-Speed 2632.31 samples/sec Loss 11.4820 LearningRate 0.0707 Epoch: 3 Global Step: 132270 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:35:46,046-Speed 2631.00 samples/sec Loss 11.8038 LearningRate 0.0707 Epoch: 3 Global Step: 132280 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:35:49,939-Speed 2631.02 samples/sec Loss 11.5656 LearningRate 0.0706 Epoch: 3 Global Step: 132290 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:35:53,839-Speed 2626.62 samples/sec Loss 11.7122 LearningRate 0.0706 Epoch: 3 Global Step: 132300 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:35:57,755-Speed 2615.24 samples/sec Loss 11.7507 LearningRate 0.0706 Epoch: 3 Global Step: 132310 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:36:01,633-Speed 2641.06 samples/sec Loss 11.4210 LearningRate 0.0706 Epoch: 3 Global Step: 132320 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:36:05,533-Speed 2626.24 samples/sec Loss 11.4761 LearningRate 0.0706 Epoch: 3 Global Step: 132330 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:36:09,430-Speed 2628.46 samples/sec Loss 11.7087 LearningRate 0.0706 Epoch: 3 Global Step: 132340 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:36:13,322-Speed 2631.86 samples/sec Loss 11.6741 LearningRate 0.0706 Epoch: 3 Global Step: 132350 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:36:17,223-Speed 2625.60 samples/sec Loss 11.5716 LearningRate 0.0706 Epoch: 3 Global Step: 132360 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:36:21,116-Speed 2630.98 samples/sec Loss 11.4780 LearningRate 0.0706 Epoch: 3 Global Step: 132370 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:36:25,015-Speed 2627.27 samples/sec Loss 11.6841 LearningRate 0.0706 Epoch: 3 Global Step: 132380 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:36:28,909-Speed 2630.70 samples/sec Loss 11.7462 LearningRate 0.0706 Epoch: 3 Global Step: 132390 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:36:32,805-Speed 2628.76 samples/sec Loss 11.6635 LearningRate 0.0706 Epoch: 3 Global Step: 132400 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:36:36,697-Speed 2631.35 samples/sec Loss 11.5555 LearningRate 0.0706 Epoch: 3 Global Step: 132410 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:36:40,604-Speed 2621.54 samples/sec Loss 11.6731 LearningRate 0.0706 Epoch: 3 Global Step: 132420 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:36:44,500-Speed 2628.64 samples/sec Loss 11.7145 LearningRate 0.0706 Epoch: 3 Global Step: 132430 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:36:48,395-Speed 2629.77 samples/sec Loss 11.5566 LearningRate 0.0706 Epoch: 3 Global Step: 132440 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:36:52,326-Speed 2605.82 samples/sec Loss 11.5639 LearningRate 0.0706 Epoch: 3 Global Step: 132450 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:36:56,221-Speed 2629.82 samples/sec Loss 11.6114 LearningRate 0.0706 Epoch: 3 Global Step: 132460 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:37:00,124-Speed 2624.32 samples/sec Loss 11.5906 LearningRate 0.0706 Epoch: 3 Global Step: 132470 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:37:04,026-Speed 2624.69 samples/sec Loss 11.5713 LearningRate 0.0706 Epoch: 3 Global Step: 132480 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:37:07,933-Speed 2621.19 samples/sec Loss 11.5921 LearningRate 0.0706 Epoch: 3 Global Step: 132490 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:37:11,827-Speed 2630.16 samples/sec Loss 11.6905 LearningRate 0.0706 Epoch: 3 Global Step: 132500 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:37:15,722-Speed 2630.09 samples/sec Loss 11.5197 LearningRate 0.0706 Epoch: 3 Global Step: 132510 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:37:19,626-Speed 2623.88 samples/sec Loss 11.5684 LearningRate 0.0706 Epoch: 3 Global Step: 132520 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:37:23,521-Speed 2629.32 samples/sec Loss 11.5887 LearningRate 0.0706 Epoch: 3 Global Step: 132530 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:37:27,421-Speed 2626.78 samples/sec Loss 11.5885 LearningRate 0.0706 Epoch: 3 Global Step: 132540 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:37:31,303-Speed 2637.95 samples/sec Loss 11.5138 LearningRate 0.0706 Epoch: 3 Global Step: 132550 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:37:35,200-Speed 2628.20 samples/sec Loss 11.7519 LearningRate 0.0706 Epoch: 3 Global Step: 132560 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:37:39,095-Speed 2629.84 samples/sec Loss 11.6478 LearningRate 0.0706 Epoch: 3 Global Step: 132570 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:37:42,993-Speed 2627.92 samples/sec Loss 11.6120 LearningRate 0.0706 Epoch: 3 Global Step: 132580 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:37:46,884-Speed 2631.95 samples/sec Loss 11.4982 LearningRate 0.0706 Epoch: 3 Global Step: 132590 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:37:50,767-Speed 2638.48 samples/sec Loss 11.6666 LearningRate 0.0706 Epoch: 3 Global Step: 132600 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:37:54,673-Speed 2622.21 samples/sec Loss 11.4874 LearningRate 0.0706 Epoch: 3 Global Step: 132610 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:37:58,567-Speed 2630.54 samples/sec Loss 11.7601 LearningRate 0.0706 Epoch: 3 Global Step: 132620 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:38:02,460-Speed 2630.43 samples/sec Loss 11.6875 LearningRate 0.0706 Epoch: 3 Global Step: 132630 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:38:06,356-Speed 2628.79 samples/sec Loss 11.7452 LearningRate 0.0706 Epoch: 3 Global Step: 132640 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:38:10,248-Speed 2631.63 samples/sec Loss 11.5797 LearningRate 0.0706 Epoch: 3 Global Step: 132650 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:38:14,162-Speed 2617.08 samples/sec Loss 11.5994 LearningRate 0.0706 Epoch: 3 Global Step: 132660 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:38:18,052-Speed 2633.11 samples/sec Loss 11.4833 LearningRate 0.0706 Epoch: 3 Global Step: 132670 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:38:21,949-Speed 2628.09 samples/sec Loss 11.7313 LearningRate 0.0706 Epoch: 3 Global Step: 132680 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:38:25,849-Speed 2626.59 samples/sec Loss 11.5778 LearningRate 0.0706 Epoch: 3 Global Step: 132690 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:38:29,753-Speed 2623.57 samples/sec Loss 11.6587 LearningRate 0.0706 Epoch: 3 Global Step: 132700 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:38:33,653-Speed 2626.63 samples/sec Loss 11.5357 LearningRate 0.0706 Epoch: 3 Global Step: 132710 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:38:37,548-Speed 2629.52 samples/sec Loss 11.7466 LearningRate 0.0706 Epoch: 3 Global Step: 132720 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:38:41,444-Speed 2628.52 samples/sec Loss 11.6107 LearningRate 0.0706 Epoch: 3 Global Step: 132730 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:38:45,343-Speed 2627.32 samples/sec Loss 11.6313 LearningRate 0.0706 Epoch: 3 Global Step: 132740 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:38:49,236-Speed 2630.93 samples/sec Loss 11.6628 LearningRate 0.0706 Epoch: 3 Global Step: 132750 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:38:53,130-Speed 2630.75 samples/sec Loss 11.5045 LearningRate 0.0706 Epoch: 3 Global Step: 132760 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:38:57,029-Speed 2626.84 samples/sec Loss 11.5759 LearningRate 0.0706 Epoch: 3 Global Step: 132770 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:39:00,925-Speed 2628.79 samples/sec Loss 11.5485 LearningRate 0.0706 Epoch: 3 Global Step: 132780 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:39:04,818-Speed 2630.45 samples/sec Loss 11.4600 LearningRate 0.0705 Epoch: 3 Global Step: 132790 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:39:08,715-Speed 2628.77 samples/sec Loss 11.5293 LearningRate 0.0705 Epoch: 3 Global Step: 132800 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:39:12,756-Speed 2534.45 samples/sec Loss 11.5582 LearningRate 0.0705 Epoch: 3 Global Step: 132810 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:39:16,650-Speed 2630.18 samples/sec Loss 11.5753 LearningRate 0.0705 Epoch: 3 Global Step: 132820 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:39:20,544-Speed 2630.11 samples/sec Loss 11.7071 LearningRate 0.0705 Epoch: 3 Global Step: 132830 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:39:24,451-Speed 2621.97 samples/sec Loss 11.4428 LearningRate 0.0705 Epoch: 3 Global Step: 132840 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:39:28,347-Speed 2629.23 samples/sec Loss 11.7813 LearningRate 0.0705 Epoch: 3 Global Step: 132850 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:39:32,244-Speed 2628.25 samples/sec Loss 11.6460 LearningRate 0.0705 Epoch: 3 Global Step: 132860 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:39:36,141-Speed 2628.01 samples/sec Loss 11.5843 LearningRate 0.0705 Epoch: 3 Global Step: 132870 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:39:40,037-Speed 2629.22 samples/sec Loss 11.6366 LearningRate 0.0705 Epoch: 3 Global Step: 132880 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:39:43,927-Speed 2632.20 samples/sec Loss 11.5944 LearningRate 0.0705 Epoch: 3 Global Step: 132890 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:39:47,824-Speed 2628.89 samples/sec Loss 11.7066 LearningRate 0.0705 Epoch: 3 Global Step: 132900 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:39:51,720-Speed 2628.72 samples/sec Loss 11.5481 LearningRate 0.0705 Epoch: 3 Global Step: 132910 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:39:55,616-Speed 2629.19 samples/sec Loss 11.5261 LearningRate 0.0705 Epoch: 3 Global Step: 132920 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:39:59,512-Speed 2628.58 samples/sec Loss 11.4973 LearningRate 0.0705 Epoch: 3 Global Step: 132930 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:40:03,413-Speed 2625.62 samples/sec Loss 11.6371 LearningRate 0.0705 Epoch: 3 Global Step: 132940 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:40:07,311-Speed 2627.46 samples/sec Loss 11.4838 LearningRate 0.0705 Epoch: 3 Global Step: 132950 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:40:11,207-Speed 2629.34 samples/sec Loss 11.6856 LearningRate 0.0705 Epoch: 3 Global Step: 132960 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:40:15,105-Speed 2627.62 samples/sec Loss 11.7552 LearningRate 0.0705 Epoch: 3 Global Step: 132970 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:40:19,021-Speed 2615.35 samples/sec Loss 11.7819 LearningRate 0.0705 Epoch: 3 Global Step: 132980 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:40:23,012-Speed 2566.70 samples/sec Loss 11.4676 LearningRate 0.0705 Epoch: 3 Global Step: 132990 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:40:26,885-Speed 2644.81 samples/sec Loss 11.6046 LearningRate 0.0705 Epoch: 3 Global Step: 133000 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:40:30,783-Speed 2627.07 samples/sec Loss 11.6062 LearningRate 0.0705 Epoch: 3 Global Step: 133010 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:40:34,727-Speed 2597.21 samples/sec Loss 11.5378 LearningRate 0.0705 Epoch: 3 Global Step: 133020 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:40:38,620-Speed 2631.40 samples/sec Loss 11.5841 LearningRate 0.0705 Epoch: 3 Global Step: 133030 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:40:42,510-Speed 2633.19 samples/sec Loss 11.6309 LearningRate 0.0705 Epoch: 3 Global Step: 133040 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:40:46,401-Speed 2632.26 samples/sec Loss 11.5073 LearningRate 0.0705 Epoch: 3 Global Step: 133050 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:40:50,294-Speed 2631.46 samples/sec Loss 11.5607 LearningRate 0.0705 Epoch: 3 Global Step: 133060 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:40:54,185-Speed 2632.02 samples/sec Loss 11.6157 LearningRate 0.0705 Epoch: 3 Global Step: 133070 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:40:58,077-Speed 2631.81 samples/sec Loss 11.4837 LearningRate 0.0705 Epoch: 3 Global Step: 133080 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:41:01,972-Speed 2629.65 samples/sec Loss 11.4255 LearningRate 0.0705 Epoch: 3 Global Step: 133090 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:41:05,851-Speed 2640.20 samples/sec Loss 11.5988 LearningRate 0.0705 Epoch: 3 Global Step: 133100 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:41:09,801-Speed 2594.21 samples/sec Loss 11.6191 LearningRate 0.0705 Epoch: 3 Global Step: 133110 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:41:13,699-Speed 2626.99 samples/sec Loss 11.5812 LearningRate 0.0705 Epoch: 3 Global Step: 133120 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:41:17,600-Speed 2625.50 samples/sec Loss 11.6169 LearningRate 0.0705 Epoch: 3 Global Step: 133130 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:41:21,529-Speed 2607.04 samples/sec Loss 11.5939 LearningRate 0.0705 Epoch: 3 Global Step: 133140 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:41:25,429-Speed 2626.52 samples/sec Loss 11.7920 LearningRate 0.0705 Epoch: 3 Global Step: 133150 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:41:29,331-Speed 2625.29 samples/sec Loss 11.6077 LearningRate 0.0705 Epoch: 3 Global Step: 133160 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:41:33,231-Speed 2626.29 samples/sec Loss 11.5559 LearningRate 0.0705 Epoch: 3 Global Step: 133170 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:41:37,124-Speed 2630.29 samples/sec Loss 11.6271 LearningRate 0.0705 Epoch: 3 Global Step: 133180 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:41:41,021-Speed 2628.41 samples/sec Loss 11.5988 LearningRate 0.0705 Epoch: 3 Global Step: 133190 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:41:44,898-Speed 2642.42 samples/sec Loss 11.4847 LearningRate 0.0705 Epoch: 3 Global Step: 133200 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:41:48,770-Speed 2644.87 samples/sec Loss 11.6153 LearningRate 0.0705 Epoch: 3 Global Step: 133210 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:41:52,667-Speed 2628.87 samples/sec Loss 11.5202 LearningRate 0.0705 Epoch: 3 Global Step: 133220 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:41:56,571-Speed 2623.79 samples/sec Loss 11.6682 LearningRate 0.0705 Epoch: 3 Global Step: 133230 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:42:00,473-Speed 2624.96 samples/sec Loss 11.6488 LearningRate 0.0705 Epoch: 3 Global Step: 133240 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:42:04,381-Speed 2620.98 samples/sec Loss 11.4974 LearningRate 0.0705 Epoch: 3 Global Step: 133250 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:42:08,273-Speed 2631.77 samples/sec Loss 11.6921 LearningRate 0.0705 Epoch: 3 Global Step: 133260 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:42:12,164-Speed 2631.92 samples/sec Loss 11.5991 LearningRate 0.0705 Epoch: 3 Global Step: 133270 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:42:16,056-Speed 2632.12 samples/sec Loss 11.5603 LearningRate 0.0704 Epoch: 3 Global Step: 133280 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:42:19,956-Speed 2626.16 samples/sec Loss 11.4816 LearningRate 0.0704 Epoch: 3 Global Step: 133290 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:42:23,851-Speed 2630.26 samples/sec Loss 11.6036 LearningRate 0.0704 Epoch: 3 Global Step: 133300 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:42:27,727-Speed 2642.64 samples/sec Loss 11.6086 LearningRate 0.0704 Epoch: 3 Global Step: 133310 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:42:31,650-Speed 2610.75 samples/sec Loss 11.5653 LearningRate 0.0704 Epoch: 3 Global Step: 133320 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:42:35,542-Speed 2632.01 samples/sec Loss 11.6005 LearningRate 0.0704 Epoch: 3 Global Step: 133330 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:42:39,419-Speed 2641.68 samples/sec Loss 11.5294 LearningRate 0.0704 Epoch: 3 Global Step: 133340 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:42:43,313-Speed 2630.12 samples/sec Loss 11.5948 LearningRate 0.0704 Epoch: 3 Global Step: 133350 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:42:47,208-Speed 2629.40 samples/sec Loss 11.7172 LearningRate 0.0704 Epoch: 3 Global Step: 133360 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:42:51,102-Speed 2630.83 samples/sec Loss 11.7959 LearningRate 0.0704 Epoch: 3 Global Step: 133370 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:42:54,998-Speed 2628.96 samples/sec Loss 11.5854 LearningRate 0.0704 Epoch: 3 Global Step: 133380 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:42:58,894-Speed 2629.49 samples/sec Loss 11.5753 LearningRate 0.0704 Epoch: 3 Global Step: 133390 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:43:02,796-Speed 2624.72 samples/sec Loss 11.6310 LearningRate 0.0704 Epoch: 3 Global Step: 133400 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:43:06,689-Speed 2630.92 samples/sec Loss 11.5749 LearningRate 0.0704 Epoch: 3 Global Step: 133410 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:43:10,581-Speed 2631.01 samples/sec Loss 11.5319 LearningRate 0.0704 Epoch: 3 Global Step: 133420 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:43:14,473-Speed 2631.82 samples/sec Loss 11.5469 LearningRate 0.0704 Epoch: 3 Global Step: 133430 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:43:18,380-Speed 2621.75 samples/sec Loss 11.5000 LearningRate 0.0704 Epoch: 3 Global Step: 133440 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:43:22,276-Speed 2628.78 samples/sec Loss 11.6647 LearningRate 0.0704 Epoch: 3 Global Step: 133450 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:43:26,173-Speed 2628.70 samples/sec Loss 11.6541 LearningRate 0.0704 Epoch: 3 Global Step: 133460 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:43:30,071-Speed 2627.86 samples/sec Loss 11.4877 LearningRate 0.0704 Epoch: 3 Global Step: 133470 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:43:33,963-Speed 2631.33 samples/sec Loss 11.5041 LearningRate 0.0704 Epoch: 3 Global Step: 133480 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:43:37,859-Speed 2629.38 samples/sec Loss 11.6541 LearningRate 0.0704 Epoch: 3 Global Step: 133490 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:43:41,815-Speed 2588.97 samples/sec Loss 11.5325 LearningRate 0.0704 Epoch: 3 Global Step: 133500 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:43:45,709-Speed 2630.17 samples/sec Loss 11.5030 LearningRate 0.0704 Epoch: 3 Global Step: 133510 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:43:49,608-Speed 2627.23 samples/sec Loss 11.5876 LearningRate 0.0704 Epoch: 3 Global Step: 133520 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:43:53,505-Speed 2628.30 samples/sec Loss 11.6236 LearningRate 0.0704 Epoch: 3 Global Step: 133530 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:43:57,399-Speed 2630.64 samples/sec Loss 11.3722 LearningRate 0.0704 Epoch: 3 Global Step: 133540 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:44:01,294-Speed 2629.18 samples/sec Loss 11.5298 LearningRate 0.0704 Epoch: 3 Global Step: 133550 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:44:05,172-Speed 2641.38 samples/sec Loss 11.5872 LearningRate 0.0704 Epoch: 3 Global Step: 133560 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:44:09,065-Speed 2630.59 samples/sec Loss 11.6993 LearningRate 0.0704 Epoch: 3 Global Step: 133570 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:44:12,962-Speed 2628.51 samples/sec Loss 11.5076 LearningRate 0.0704 Epoch: 3 Global Step: 133580 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:44:16,866-Speed 2623.58 samples/sec Loss 11.4944 LearningRate 0.0704 Epoch: 3 Global Step: 133590 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:44:20,764-Speed 2627.53 samples/sec Loss 11.6442 LearningRate 0.0704 Epoch: 3 Global Step: 133600 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:44:24,667-Speed 2624.41 samples/sec Loss 11.5588 LearningRate 0.0704 Epoch: 3 Global Step: 133610 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:44:28,562-Speed 2629.91 samples/sec Loss 11.6546 LearningRate 0.0704 Epoch: 3 Global Step: 133620 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:44:32,459-Speed 2628.19 samples/sec Loss 11.6194 LearningRate 0.0704 Epoch: 3 Global Step: 133630 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:44:36,373-Speed 2616.94 samples/sec Loss 11.6447 LearningRate 0.0704 Epoch: 3 Global Step: 133640 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:44:40,390-Speed 2549.87 samples/sec Loss 11.5231 LearningRate 0.0704 Epoch: 3 Global Step: 133650 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:44:44,345-Speed 2590.26 samples/sec Loss 11.5545 LearningRate 0.0704 Epoch: 3 Global Step: 133660 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:44:48,268-Speed 2611.35 samples/sec Loss 11.5400 LearningRate 0.0704 Epoch: 3 Global Step: 133670 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:44:52,158-Speed 2632.59 samples/sec Loss 11.5890 LearningRate 0.0704 Epoch: 3 Global Step: 133680 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:44:56,025-Speed 2649.28 samples/sec Loss 11.7191 LearningRate 0.0704 Epoch: 3 Global Step: 133690 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:44:59,881-Speed 2656.33 samples/sec Loss 12.3524 LearningRate 0.0704 Epoch: 3 Global Step: 133700 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:45:03,769-Speed 2633.99 samples/sec Loss 12.0529 LearningRate 0.0704 Epoch: 3 Global Step: 133710 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:45:07,664-Speed 2629.83 samples/sec Loss 11.8304 LearningRate 0.0704 Epoch: 3 Global Step: 133720 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:45:11,553-Speed 2633.48 samples/sec Loss 11.8141 LearningRate 0.0704 Epoch: 3 Global Step: 133730 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:45:15,483-Speed 2606.80 samples/sec Loss 11.6434 LearningRate 0.0704 Epoch: 3 Global Step: 133740 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:45:19,376-Speed 2630.76 samples/sec Loss 11.7892 LearningRate 0.0704 Epoch: 3 Global Step: 133750 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:45:23,286-Speed 2619.75 samples/sec Loss 11.7020 LearningRate 0.0704 Epoch: 3 Global Step: 133760 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:45:27,178-Speed 2632.09 samples/sec Loss 11.8907 LearningRate 0.0704 Epoch: 3 Global Step: 133770 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:45:31,075-Speed 2628.18 samples/sec Loss 11.6437 LearningRate 0.0703 Epoch: 3 Global Step: 133780 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:45:34,965-Speed 2633.14 samples/sec Loss 11.6913 LearningRate 0.0703 Epoch: 3 Global Step: 133790 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:45:38,857-Speed 2631.87 samples/sec Loss 11.5976 LearningRate 0.0703 Epoch: 3 Global Step: 133800 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:45:42,756-Speed 2626.22 samples/sec Loss 11.6673 LearningRate 0.0703 Epoch: 3 Global Step: 133810 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:45:46,652-Speed 2629.52 samples/sec Loss 11.6888 LearningRate 0.0703 Epoch: 3 Global Step: 133820 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:45:50,544-Speed 2631.81 samples/sec Loss 11.5885 LearningRate 0.0703 Epoch: 3 Global Step: 133830 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:45:54,436-Speed 2631.77 samples/sec Loss 11.6038 LearningRate 0.0703 Epoch: 3 Global Step: 133840 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:45:58,340-Speed 2623.61 samples/sec Loss 11.8245 LearningRate 0.0703 Epoch: 3 Global Step: 133850 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:46:02,243-Speed 2624.11 samples/sec Loss 11.5630 LearningRate 0.0703 Epoch: 3 Global Step: 133860 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:46:06,138-Speed 2629.24 samples/sec Loss 11.7120 LearningRate 0.0703 Epoch: 3 Global Step: 133870 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:46:10,031-Speed 2630.67 samples/sec Loss 11.5564 LearningRate 0.0703 Epoch: 3 Global Step: 133880 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:46:13,929-Speed 2627.78 samples/sec Loss 11.5643 LearningRate 0.0703 Epoch: 3 Global Step: 133890 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:46:17,822-Speed 2631.38 samples/sec Loss 11.4765 LearningRate 0.0703 Epoch: 3 Global Step: 133900 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:46:21,716-Speed 2630.01 samples/sec Loss 11.8559 LearningRate 0.0703 Epoch: 3 Global Step: 133910 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:46:25,608-Speed 2632.37 samples/sec Loss 11.5534 LearningRate 0.0703 Epoch: 3 Global Step: 133920 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:46:29,503-Speed 2629.33 samples/sec Loss 11.6680 LearningRate 0.0703 Epoch: 3 Global Step: 133930 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:46:33,408-Speed 2622.61 samples/sec Loss 11.7201 LearningRate 0.0703 Epoch: 3 Global Step: 133940 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:46:37,301-Speed 2630.86 samples/sec Loss 11.5994 LearningRate 0.0703 Epoch: 3 Global Step: 133950 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:46:41,196-Speed 2630.07 samples/sec Loss 11.5128 LearningRate 0.0703 Epoch: 3 Global Step: 133960 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:46:45,101-Speed 2622.93 samples/sec Loss 11.6230 LearningRate 0.0703 Epoch: 3 Global Step: 133970 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:46:48,996-Speed 2629.63 samples/sec Loss 11.6423 LearningRate 0.0703 Epoch: 3 Global Step: 133980 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:46:52,885-Speed 2633.99 samples/sec Loss 11.7190 LearningRate 0.0703 Epoch: 3 Global Step: 133990 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:46:56,788-Speed 2624.23 samples/sec Loss 11.7071 LearningRate 0.0703 Epoch: 3 Global Step: 134000 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:47:00,687-Speed 2627.18 samples/sec Loss 11.6817 LearningRate 0.0703 Epoch: 3 Global Step: 134010 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:47:04,588-Speed 2625.44 samples/sec Loss 11.6253 LearningRate 0.0703 Epoch: 3 Global Step: 134020 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:47:08,479-Speed 2632.15 samples/sec Loss 11.6674 LearningRate 0.0703 Epoch: 3 Global Step: 134030 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:47:12,395-Speed 2615.98 samples/sec Loss 11.5434 LearningRate 0.0703 Epoch: 3 Global Step: 134040 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:47:16,295-Speed 2626.17 samples/sec Loss 11.4943 LearningRate 0.0703 Epoch: 3 Global Step: 134050 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:47:20,190-Speed 2630.30 samples/sec Loss 11.5192 LearningRate 0.0703 Epoch: 3 Global Step: 134060 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:47:24,082-Speed 2631.13 samples/sec Loss 11.5370 LearningRate 0.0703 Epoch: 3 Global Step: 134070 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:47:27,972-Speed 2634.07 samples/sec Loss 11.5983 LearningRate 0.0703 Epoch: 3 Global Step: 134080 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:47:31,866-Speed 2629.99 samples/sec Loss 11.5277 LearningRate 0.0703 Epoch: 3 Global Step: 134090 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:47:35,767-Speed 2625.52 samples/sec Loss 11.6680 LearningRate 0.0703 Epoch: 3 Global Step: 134100 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:47:39,827-Speed 2522.72 samples/sec Loss 11.4629 LearningRate 0.0703 Epoch: 3 Global Step: 134110 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:47:43,743-Speed 2615.61 samples/sec Loss 11.6495 LearningRate 0.0703 Epoch: 3 Global Step: 134120 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:47:47,634-Speed 2632.63 samples/sec Loss 11.6043 LearningRate 0.0703 Epoch: 3 Global Step: 134130 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:47:51,540-Speed 2622.38 samples/sec Loss 11.4870 LearningRate 0.0703 Epoch: 3 Global Step: 134140 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:47:55,430-Speed 2633.23 samples/sec Loss 11.6495 LearningRate 0.0703 Epoch: 3 Global Step: 134150 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:47:59,446-Speed 2550.63 samples/sec Loss 11.6193 LearningRate 0.0703 Epoch: 3 Global Step: 134160 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:48:03,341-Speed 2629.10 samples/sec Loss 11.6491 LearningRate 0.0703 Epoch: 3 Global Step: 134170 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:48:07,237-Speed 2629.29 samples/sec Loss 11.6879 LearningRate 0.0703 Epoch: 3 Global Step: 134180 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:48:11,132-Speed 2629.30 samples/sec Loss 11.5835 LearningRate 0.0703 Epoch: 3 Global Step: 134190 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:48:15,029-Speed 2628.91 samples/sec Loss 11.6910 LearningRate 0.0703 Epoch: 3 Global Step: 134200 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:48:18,928-Speed 2626.77 samples/sec Loss 11.6770 LearningRate 0.0703 Epoch: 3 Global Step: 134210 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:48:22,828-Speed 2626.56 samples/sec Loss 11.4125 LearningRate 0.0703 Epoch: 3 Global Step: 134220 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:48:26,727-Speed 2626.71 samples/sec Loss 11.5451 LearningRate 0.0703 Epoch: 3 Global Step: 134230 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:48:30,624-Speed 2628.30 samples/sec Loss 11.6972 LearningRate 0.0703 Epoch: 3 Global Step: 134240 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:48:34,520-Speed 2628.90 samples/sec Loss 11.5195 LearningRate 0.0703 Epoch: 3 Global Step: 134250 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:48:38,416-Speed 2628.95 samples/sec Loss 11.6375 LearningRate 0.0703 Epoch: 3 Global Step: 134260 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:48:42,310-Speed 2629.54 samples/sec Loss 11.6423 LearningRate 0.0702 Epoch: 3 Global Step: 134270 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:48:46,201-Speed 2633.11 samples/sec Loss 11.4529 LearningRate 0.0702 Epoch: 3 Global Step: 134280 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:48:50,089-Speed 2634.53 samples/sec Loss 11.6529 LearningRate 0.0702 Epoch: 3 Global Step: 134290 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:48:53,988-Speed 2626.71 samples/sec Loss 11.4976 LearningRate 0.0702 Epoch: 3 Global Step: 134300 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:48:57,890-Speed 2625.47 samples/sec Loss 11.6144 LearningRate 0.0702 Epoch: 3 Global Step: 134310 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:49:01,795-Speed 2622.77 samples/sec Loss 11.5566 LearningRate 0.0702 Epoch: 3 Global Step: 134320 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:49:05,708-Speed 2617.83 samples/sec Loss 11.6062 LearningRate 0.0702 Epoch: 3 Global Step: 134330 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:49:09,695-Speed 2568.88 samples/sec Loss 11.4839 LearningRate 0.0702 Epoch: 3 Global Step: 134340 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:49:13,649-Speed 2590.47 samples/sec Loss 11.6883 LearningRate 0.0702 Epoch: 3 Global Step: 134350 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:49:17,541-Speed 2631.50 samples/sec Loss 11.6113 LearningRate 0.0702 Epoch: 3 Global Step: 134360 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:49:21,434-Speed 2631.02 samples/sec Loss 11.5007 LearningRate 0.0702 Epoch: 3 Global Step: 134370 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:49:25,335-Speed 2625.66 samples/sec Loss 11.5615 LearningRate 0.0702 Epoch: 3 Global Step: 134380 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:49:29,217-Speed 2638.54 samples/sec Loss 11.7182 LearningRate 0.0702 Epoch: 3 Global Step: 134390 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:49:33,132-Speed 2616.19 samples/sec Loss 11.5894 LearningRate 0.0702 Epoch: 3 Global Step: 134400 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:49:37,043-Speed 2619.16 samples/sec Loss 11.5624 LearningRate 0.0702 Epoch: 3 Global Step: 134410 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:49:40,946-Speed 2624.46 samples/sec Loss 11.7549 LearningRate 0.0702 Epoch: 3 Global Step: 134420 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:49:44,838-Speed 2631.80 samples/sec Loss 11.4084 LearningRate 0.0702 Epoch: 3 Global Step: 134430 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:49:48,728-Speed 2632.47 samples/sec Loss 11.4616 LearningRate 0.0702 Epoch: 3 Global Step: 134440 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:49:52,625-Speed 2628.97 samples/sec Loss 11.5842 LearningRate 0.0702 Epoch: 3 Global Step: 134450 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:49:56,520-Speed 2629.21 samples/sec Loss 11.5309 LearningRate 0.0702 Epoch: 3 Global Step: 134460 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:50:00,413-Speed 2631.51 samples/sec Loss 11.5509 LearningRate 0.0702 Epoch: 3 Global Step: 134470 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:50:04,307-Speed 2630.55 samples/sec Loss 11.4827 LearningRate 0.0702 Epoch: 3 Global Step: 134480 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:50:08,207-Speed 2625.56 samples/sec Loss 11.6408 LearningRate 0.0702 Epoch: 3 Global Step: 134490 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:50:12,105-Speed 2627.84 samples/sec Loss 11.5171 LearningRate 0.0702 Epoch: 3 Global Step: 134500 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:50:15,985-Speed 2639.83 samples/sec Loss 11.6606 LearningRate 0.0702 Epoch: 3 Global Step: 134510 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:50:19,883-Speed 2628.08 samples/sec Loss 11.5656 LearningRate 0.0702 Epoch: 3 Global Step: 134520 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:50:23,774-Speed 2632.02 samples/sec Loss 11.7297 LearningRate 0.0702 Epoch: 3 Global Step: 134530 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:50:27,681-Speed 2621.53 samples/sec Loss 11.5550 LearningRate 0.0702 Epoch: 3 Global Step: 134540 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:50:31,576-Speed 2630.04 samples/sec Loss 11.5822 LearningRate 0.0702 Epoch: 3 Global Step: 134550 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:50:35,470-Speed 2630.64 samples/sec Loss 11.5515 LearningRate 0.0702 Epoch: 3 Global Step: 134560 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:50:39,367-Speed 2628.39 samples/sec Loss 11.6242 LearningRate 0.0702 Epoch: 3 Global Step: 134570 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:50:43,257-Speed 2632.49 samples/sec Loss 11.4957 LearningRate 0.0702 Epoch: 3 Global Step: 134580 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:50:47,152-Speed 2630.29 samples/sec Loss 11.5745 LearningRate 0.0702 Epoch: 3 Global Step: 134590 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:50:51,045-Speed 2631.10 samples/sec Loss 11.5453 LearningRate 0.0702 Epoch: 3 Global Step: 134600 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:50:54,944-Speed 2627.17 samples/sec Loss 11.5343 LearningRate 0.0702 Epoch: 3 Global Step: 134610 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:50:58,842-Speed 2627.85 samples/sec Loss 11.6593 LearningRate 0.0702 Epoch: 3 Global Step: 134620 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:51:02,734-Speed 2631.29 samples/sec Loss 11.4863 LearningRate 0.0702 Epoch: 3 Global Step: 134630 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:51:06,629-Speed 2629.24 samples/sec Loss 11.5836 LearningRate 0.0702 Epoch: 3 Global Step: 134640 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:51:10,525-Speed 2629.63 samples/sec Loss 11.4042 LearningRate 0.0702 Epoch: 3 Global Step: 134650 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:51:14,461-Speed 2602.08 samples/sec Loss 11.4465 LearningRate 0.0702 Epoch: 3 Global Step: 134660 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:51:18,359-Speed 2628.03 samples/sec Loss 11.6025 LearningRate 0.0702 Epoch: 3 Global Step: 134670 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:51:22,254-Speed 2629.14 samples/sec Loss 11.6955 LearningRate 0.0702 Epoch: 3 Global Step: 134680 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:51:26,138-Speed 2637.56 samples/sec Loss 11.6315 LearningRate 0.0702 Epoch: 3 Global Step: 134690 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:51:29,991-Speed 2658.86 samples/sec Loss 11.5960 LearningRate 0.0702 Epoch: 3 Global Step: 134700 Fp16 Grad Scale: 8192 Required: 78 hours
Training: 2022-04-13 10:51:33,882-Speed 2631.83 samples/sec Loss 11.7744 LearningRate 0.0702 Epoch: 3 Global Step: 134710 Fp16 Grad Scale: 8192 Required: 78 hours
Training: 2022-04-13 10:51:37,776-Speed 2630.21 samples/sec Loss 11.8374 LearningRate 0.0702 Epoch: 3 Global Step: 134720 Fp16 Grad Scale: 8192 Required: 78 hours
Training: 2022-04-13 10:51:41,671-Speed 2630.00 samples/sec Loss 11.7264 LearningRate 0.0702 Epoch: 3 Global Step: 134730 Fp16 Grad Scale: 8192 Required: 78 hours
Training: 2022-04-13 10:51:45,582-Speed 2619.35 samples/sec Loss 11.4783 LearningRate 0.0702 Epoch: 3 Global Step: 134740 Fp16 Grad Scale: 8192 Required: 78 hours
Training: 2022-04-13 10:51:49,499-Speed 2614.98 samples/sec Loss 11.5012 LearningRate 0.0702 Epoch: 3 Global Step: 134750 Fp16 Grad Scale: 8192 Required: 78 hours
Training: 2022-04-13 10:51:53,390-Speed 2632.79 samples/sec Loss 11.6680 LearningRate 0.0702 Epoch: 3 Global Step: 134760 Fp16 Grad Scale: 8192 Required: 78 hours
Training: 2022-04-13 10:51:57,285-Speed 2629.43 samples/sec Loss 11.6351 LearningRate 0.0701 Epoch: 3 Global Step: 134770 Fp16 Grad Scale: 8192 Required: 78 hours
Training: 2022-04-13 10:52:01,182-Speed 2628.38 samples/sec Loss 11.6531 LearningRate 0.0701 Epoch: 3 Global Step: 134780 Fp16 Grad Scale: 8192 Required: 78 hours
Training: 2022-04-13 10:52:05,076-Speed 2630.35 samples/sec Loss 11.5570 LearningRate 0.0701 Epoch: 3 Global Step: 134790 Fp16 Grad Scale: 8192 Required: 78 hours
Training: 2022-04-13 10:52:08,966-Speed 2633.38 samples/sec Loss 11.5497 LearningRate 0.0701 Epoch: 3 Global Step: 134800 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:52:12,877-Speed 2619.02 samples/sec Loss 11.6832 LearningRate 0.0701 Epoch: 3 Global Step: 134810 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:52:16,774-Speed 2628.60 samples/sec Loss 11.6206 LearningRate 0.0701 Epoch: 3 Global Step: 134820 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:52:20,670-Speed 2628.75 samples/sec Loss 11.7719 LearningRate 0.0701 Epoch: 3 Global Step: 134830 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:52:24,564-Speed 2631.04 samples/sec Loss 11.6579 LearningRate 0.0701 Epoch: 3 Global Step: 134840 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:52:28,458-Speed 2630.37 samples/sec Loss 11.5301 LearningRate 0.0701 Epoch: 3 Global Step: 134850 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:52:32,361-Speed 2624.10 samples/sec Loss 11.7231 LearningRate 0.0701 Epoch: 3 Global Step: 134860 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:52:36,261-Speed 2626.21 samples/sec Loss 11.7121 LearningRate 0.0701 Epoch: 3 Global Step: 134870 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:52:40,156-Speed 2629.83 samples/sec Loss 11.6189 LearningRate 0.0701 Epoch: 3 Global Step: 134880 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:52:44,056-Speed 2626.38 samples/sec Loss 11.5075 LearningRate 0.0701 Epoch: 3 Global Step: 134890 Fp16 Grad Scale: 16384 Required: 78 hours
Training: 2022-04-13 10:52:47,955-Speed 2626.72 samples/sec Loss 11.6423 LearningRate 0.0701 Epoch: 3 Global Step: 134900 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:52:51,856-Speed 2625.75 samples/sec Loss 11.5796 LearningRate 0.0701 Epoch: 3 Global Step: 134910 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:52:55,758-Speed 2624.96 samples/sec Loss 11.6600 LearningRate 0.0701 Epoch: 3 Global Step: 134920 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:52:59,704-Speed 2595.67 samples/sec Loss 11.4675 LearningRate 0.0701 Epoch: 3 Global Step: 134930 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:53:03,625-Speed 2611.80 samples/sec Loss 11.5948 LearningRate 0.0701 Epoch: 3 Global Step: 134940 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:53:07,526-Speed 2625.61 samples/sec Loss 11.4812 LearningRate 0.0701 Epoch: 3 Global Step: 134950 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:53:11,425-Speed 2627.17 samples/sec Loss 11.5275 LearningRate 0.0701 Epoch: 3 Global Step: 134960 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:53:15,325-Speed 2626.56 samples/sec Loss 11.5222 LearningRate 0.0701 Epoch: 3 Global Step: 134970 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:53:19,221-Speed 2628.66 samples/sec Loss 11.6119 LearningRate 0.0701 Epoch: 3 Global Step: 134980 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:53:23,127-Speed 2622.30 samples/sec Loss 11.5870 LearningRate 0.0701 Epoch: 3 Global Step: 134990 Fp16 Grad Scale: 32768 Required: 78 hours
Training: 2022-04-13 10:53:27,039-Speed 2617.73 samples/sec Loss 11.6308 LearningRate 0.0701 Epoch: 3 Global Step: 135000 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:53:30,949-Speed 2620.13 samples/sec Loss 11.5785 LearningRate 0.0701 Epoch: 3 Global Step: 135010 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:53:34,841-Speed 2632.03 samples/sec Loss 11.4670 LearningRate 0.0701 Epoch: 3 Global Step: 135020 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:53:38,731-Speed 2632.54 samples/sec Loss 11.5507 LearningRate 0.0701 Epoch: 3 Global Step: 135030 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:53:42,655-Speed 2610.31 samples/sec Loss 11.6952 LearningRate 0.0701 Epoch: 3 Global Step: 135040 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:53:46,554-Speed 2627.62 samples/sec Loss 11.5926 LearningRate 0.0701 Epoch: 3 Global Step: 135050 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:53:50,445-Speed 2632.70 samples/sec Loss 11.6392 LearningRate 0.0701 Epoch: 3 Global Step: 135060 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:53:54,360-Speed 2615.57 samples/sec Loss 11.6038 LearningRate 0.0701 Epoch: 3 Global Step: 135070 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:53:58,260-Speed 2626.47 samples/sec Loss 11.5251 LearningRate 0.0701 Epoch: 3 Global Step: 135080 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:54:02,156-Speed 2629.27 samples/sec Loss 11.5236 LearningRate 0.0701 Epoch: 3 Global Step: 135090 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:54:06,062-Speed 2622.01 samples/sec Loss 11.6134 LearningRate 0.0701 Epoch: 3 Global Step: 135100 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:54:09,959-Speed 2628.10 samples/sec Loss 11.5675 LearningRate 0.0701 Epoch: 3 Global Step: 135110 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:54:13,855-Speed 2629.54 samples/sec Loss 11.5390 LearningRate 0.0701 Epoch: 3 Global Step: 135120 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:54:17,749-Speed 2630.23 samples/sec Loss 11.5648 LearningRate 0.0701 Epoch: 3 Global Step: 135130 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:54:21,641-Speed 2631.57 samples/sec Loss 11.5660 LearningRate 0.0701 Epoch: 3 Global Step: 135140 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:54:25,535-Speed 2630.62 samples/sec Loss 11.5827 LearningRate 0.0701 Epoch: 3 Global Step: 135150 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:54:29,444-Speed 2620.71 samples/sec Loss 11.4991 LearningRate 0.0701 Epoch: 3 Global Step: 135160 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:54:33,363-Speed 2613.26 samples/sec Loss 11.6794 LearningRate 0.0701 Epoch: 3 Global Step: 135170 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:54:37,280-Speed 2614.69 samples/sec Loss 11.5801 LearningRate 0.0701 Epoch: 3 Global Step: 135180 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:54:41,198-Speed 2614.25 samples/sec Loss 11.7379 LearningRate 0.0701 Epoch: 3 Global Step: 135190 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:54:45,096-Speed 2627.81 samples/sec Loss 11.5638 LearningRate 0.0701 Epoch: 3 Global Step: 135200 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:54:49,013-Speed 2614.60 samples/sec Loss 11.6269 LearningRate 0.0701 Epoch: 3 Global Step: 135210 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:54:52,911-Speed 2627.93 samples/sec Loss 11.4920 LearningRate 0.0701 Epoch: 3 Global Step: 135220 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:54:56,808-Speed 2628.60 samples/sec Loss 11.4208 LearningRate 0.0701 Epoch: 3 Global Step: 135230 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:55:00,707-Speed 2627.32 samples/sec Loss 11.5028 LearningRate 0.0701 Epoch: 3 Global Step: 135240 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:55:04,604-Speed 2627.75 samples/sec Loss 11.5117 LearningRate 0.0701 Epoch: 3 Global Step: 135250 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:55:08,501-Speed 2628.47 samples/sec Loss 11.5344 LearningRate 0.0700 Epoch: 3 Global Step: 135260 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:55:12,396-Speed 2629.32 samples/sec Loss 11.4725 LearningRate 0.0700 Epoch: 3 Global Step: 135270 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:55:16,274-Speed 2646.27 samples/sec Loss 11.5936 LearningRate 0.0700 Epoch: 3 Global Step: 135280 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:55:20,170-Speed 2628.95 samples/sec Loss 11.4799 LearningRate 0.0700 Epoch: 3 Global Step: 135290 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:55:24,076-Speed 2622.44 samples/sec Loss 11.5111 LearningRate 0.0700 Epoch: 3 Global Step: 135300 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:55:27,976-Speed 2626.38 samples/sec Loss 11.7210 LearningRate 0.0700 Epoch: 3 Global Step: 135310 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:55:31,867-Speed 2632.86 samples/sec Loss 11.5341 LearningRate 0.0700 Epoch: 3 Global Step: 135320 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:55:35,740-Speed 2644.23 samples/sec Loss 11.5810 LearningRate 0.0700 Epoch: 3 Global Step: 135330 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:55:39,630-Speed 2632.87 samples/sec Loss 11.5383 LearningRate 0.0700 Epoch: 3 Global Step: 135340 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:55:43,523-Speed 2631.02 samples/sec Loss 11.6884 LearningRate 0.0700 Epoch: 3 Global Step: 135350 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:55:47,422-Speed 2627.36 samples/sec Loss 11.4249 LearningRate 0.0700 Epoch: 3 Global Step: 135360 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:55:51,318-Speed 2629.21 samples/sec Loss 11.6263 LearningRate 0.0700 Epoch: 3 Global Step: 135370 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:55:55,227-Speed 2619.94 samples/sec Loss 11.5308 LearningRate 0.0700 Epoch: 3 Global Step: 135380 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:55:59,122-Speed 2630.35 samples/sec Loss 11.6522 LearningRate 0.0700 Epoch: 3 Global Step: 135390 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:56:03,012-Speed 2632.65 samples/sec Loss 11.5984 LearningRate 0.0700 Epoch: 3 Global Step: 135400 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:56:06,903-Speed 2632.18 samples/sec Loss 11.5609 LearningRate 0.0700 Epoch: 3 Global Step: 135410 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:56:10,799-Speed 2628.77 samples/sec Loss 11.6644 LearningRate 0.0700 Epoch: 3 Global Step: 135420 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:56:14,701-Speed 2625.38 samples/sec Loss 11.4908 LearningRate 0.0700 Epoch: 3 Global Step: 135430 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:56:18,604-Speed 2624.61 samples/sec Loss 11.6816 LearningRate 0.0700 Epoch: 3 Global Step: 135440 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:56:22,538-Speed 2603.45 samples/sec Loss 11.2966 LearningRate 0.0700 Epoch: 3 Global Step: 135450 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:56:26,446-Speed 2622.04 samples/sec Loss 11.5621 LearningRate 0.0700 Epoch: 3 Global Step: 135460 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:56:30,340-Speed 2630.03 samples/sec Loss 11.6559 LearningRate 0.0700 Epoch: 3 Global Step: 135470 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:56:34,236-Speed 2628.60 samples/sec Loss 11.6468 LearningRate 0.0700 Epoch: 3 Global Step: 135480 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:56:38,138-Speed 2624.44 samples/sec Loss 11.5372 LearningRate 0.0700 Epoch: 3 Global Step: 135490 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:56:42,081-Speed 2598.37 samples/sec Loss 11.4963 LearningRate 0.0700 Epoch: 3 Global Step: 135500 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:56:45,970-Speed 2634.12 samples/sec Loss 11.4736 LearningRate 0.0700 Epoch: 3 Global Step: 135510 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:56:49,878-Speed 2620.68 samples/sec Loss 11.5390 LearningRate 0.0700 Epoch: 3 Global Step: 135520 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:56:53,749-Speed 2646.57 samples/sec Loss 11.5734 LearningRate 0.0700 Epoch: 3 Global Step: 135530 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:56:57,682-Speed 2604.22 samples/sec Loss 11.4712 LearningRate 0.0700 Epoch: 3 Global Step: 135540 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:57:01,573-Speed 2632.01 samples/sec Loss 11.5340 LearningRate 0.0700 Epoch: 3 Global Step: 135550 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:57:05,471-Speed 2627.58 samples/sec Loss 11.5158 LearningRate 0.0700 Epoch: 3 Global Step: 135560 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:57:09,363-Speed 2631.85 samples/sec Loss 11.5992 LearningRate 0.0700 Epoch: 3 Global Step: 135570 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:57:13,261-Speed 2627.65 samples/sec Loss 11.5302 LearningRate 0.0700 Epoch: 3 Global Step: 135580 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:57:17,166-Speed 2623.57 samples/sec Loss 11.4441 LearningRate 0.0700 Epoch: 3 Global Step: 135590 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:57:21,061-Speed 2629.05 samples/sec Loss 11.6616 LearningRate 0.0700 Epoch: 3 Global Step: 135600 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:57:24,953-Speed 2632.38 samples/sec Loss 11.7223 LearningRate 0.0700 Epoch: 3 Global Step: 135610 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:57:28,846-Speed 2631.24 samples/sec Loss 11.5864 LearningRate 0.0700 Epoch: 3 Global Step: 135620 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:57:32,747-Speed 2625.07 samples/sec Loss 11.5286 LearningRate 0.0700 Epoch: 3 Global Step: 135630 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:57:36,641-Speed 2630.64 samples/sec Loss 11.4317 LearningRate 0.0700 Epoch: 3 Global Step: 135640 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:57:40,539-Speed 2627.69 samples/sec Loss 11.4492 LearningRate 0.0700 Epoch: 3 Global Step: 135650 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:57:44,430-Speed 2631.93 samples/sec Loss 11.4834 LearningRate 0.0700 Epoch: 3 Global Step: 135660 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:57:48,323-Speed 2631.83 samples/sec Loss 11.6194 LearningRate 0.0700 Epoch: 3 Global Step: 135670 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:57:52,218-Speed 2628.89 samples/sec Loss 11.5839 LearningRate 0.0700 Epoch: 3 Global Step: 135680 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:57:56,124-Speed 2622.98 samples/sec Loss 11.5447 LearningRate 0.0700 Epoch: 3 Global Step: 135690 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:58:00,034-Speed 2619.12 samples/sec Loss 11.4437 LearningRate 0.0700 Epoch: 3 Global Step: 135700 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:58:03,938-Speed 2623.14 samples/sec Loss 11.5566 LearningRate 0.0700 Epoch: 3 Global Step: 135710 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:58:07,837-Speed 2626.81 samples/sec Loss 11.5878 LearningRate 0.0700 Epoch: 3 Global Step: 135720 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:58:11,731-Speed 2630.53 samples/sec Loss 11.6635 LearningRate 0.0700 Epoch: 3 Global Step: 135730 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:58:15,613-Speed 2638.93 samples/sec Loss 11.6345 LearningRate 0.0700 Epoch: 3 Global Step: 135740 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:58:19,509-Speed 2629.06 samples/sec Loss 11.6030 LearningRate 0.0700 Epoch: 3 Global Step: 135750 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:58:23,399-Speed 2632.65 samples/sec Loss 11.6567 LearningRate 0.0699 Epoch: 3 Global Step: 135760 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:58:27,292-Speed 2631.04 samples/sec Loss 11.5543 LearningRate 0.0699 Epoch: 3 Global Step: 135770 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:58:31,190-Speed 2627.42 samples/sec Loss 11.5095 LearningRate 0.0699 Epoch: 3 Global Step: 135780 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:58:35,081-Speed 2632.10 samples/sec Loss 11.4151 LearningRate 0.0699 Epoch: 3 Global Step: 135790 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:58:38,974-Speed 2630.74 samples/sec Loss 11.5531 LearningRate 0.0699 Epoch: 3 Global Step: 135800 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:58:42,875-Speed 2625.64 samples/sec Loss 11.4896 LearningRate 0.0699 Epoch: 3 Global Step: 135810 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:58:46,779-Speed 2623.28 samples/sec Loss 11.5587 LearningRate 0.0699 Epoch: 3 Global Step: 135820 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:58:50,683-Speed 2624.05 samples/sec Loss 11.3565 LearningRate 0.0699 Epoch: 3 Global Step: 135830 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:58:54,581-Speed 2627.82 samples/sec Loss 11.5510 LearningRate 0.0699 Epoch: 3 Global Step: 135840 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 10:58:58,461-Speed 2639.33 samples/sec Loss 11.5008 LearningRate 0.0699 Epoch: 3 Global Step: 135850 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:59:02,356-Speed 2629.48 samples/sec Loss 11.4029 LearningRate 0.0699 Epoch: 3 Global Step: 135860 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:59:06,253-Speed 2628.71 samples/sec Loss 11.5612 LearningRate 0.0699 Epoch: 3 Global Step: 135870 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:59:10,154-Speed 2625.07 samples/sec Loss 11.4969 LearningRate 0.0699 Epoch: 3 Global Step: 135880 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:59:14,037-Speed 2637.61 samples/sec Loss 11.5893 LearningRate 0.0699 Epoch: 3 Global Step: 135890 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:59:17,938-Speed 2625.74 samples/sec Loss 11.5370 LearningRate 0.0699 Epoch: 3 Global Step: 135900 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:59:21,834-Speed 2628.82 samples/sec Loss 11.4928 LearningRate 0.0699 Epoch: 3 Global Step: 135910 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:59:25,728-Speed 2630.61 samples/sec Loss 11.4630 LearningRate 0.0699 Epoch: 3 Global Step: 135920 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:59:29,624-Speed 2628.88 samples/sec Loss 11.3223 LearningRate 0.0699 Epoch: 3 Global Step: 135930 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:59:33,525-Speed 2625.84 samples/sec Loss 11.5889 LearningRate 0.0699 Epoch: 3 Global Step: 135940 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:59:37,419-Speed 2630.36 samples/sec Loss 11.6363 LearningRate 0.0699 Epoch: 3 Global Step: 135950 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:59:41,311-Speed 2631.03 samples/sec Loss 11.5993 LearningRate 0.0699 Epoch: 3 Global Step: 135960 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:59:45,206-Speed 2629.70 samples/sec Loss 11.6287 LearningRate 0.0699 Epoch: 3 Global Step: 135970 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:59:49,103-Speed 2628.29 samples/sec Loss 11.3723 LearningRate 0.0699 Epoch: 3 Global Step: 135980 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 10:59:53,000-Speed 2628.20 samples/sec Loss 11.5798 LearningRate 0.0699 Epoch: 3 Global Step: 135990 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 10:59:56,901-Speed 2625.72 samples/sec Loss 11.3053 LearningRate 0.0699 Epoch: 3 Global Step: 136000 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:00:00,796-Speed 2629.75 samples/sec Loss 11.7188 LearningRate 0.0699 Epoch: 3 Global Step: 136010 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:00:04,692-Speed 2628.76 samples/sec Loss 11.5153 LearningRate 0.0699 Epoch: 3 Global Step: 136020 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:00:08,591-Speed 2626.99 samples/sec Loss 11.6386 LearningRate 0.0699 Epoch: 3 Global Step: 136030 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:00:12,495-Speed 2623.49 samples/sec Loss 11.5624 LearningRate 0.0699 Epoch: 3 Global Step: 136040 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:00:16,394-Speed 2627.13 samples/sec Loss 11.4880 LearningRate 0.0699 Epoch: 3 Global Step: 136050 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:00:20,288-Speed 2629.91 samples/sec Loss 11.5004 LearningRate 0.0699 Epoch: 3 Global Step: 136060 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:00:24,188-Speed 2626.60 samples/sec Loss 11.6447 LearningRate 0.0699 Epoch: 3 Global Step: 136070 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:00:28,089-Speed 2625.32 samples/sec Loss 11.6392 LearningRate 0.0699 Epoch: 3 Global Step: 136080 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:00:31,984-Speed 2630.03 samples/sec Loss 11.5706 LearningRate 0.0699 Epoch: 3 Global Step: 136090 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:00:35,881-Speed 2627.83 samples/sec Loss 11.6929 LearningRate 0.0699 Epoch: 3 Global Step: 136100 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:00:39,777-Speed 2629.14 samples/sec Loss 11.6570 LearningRate 0.0699 Epoch: 3 Global Step: 136110 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:00:43,670-Speed 2631.09 samples/sec Loss 11.5705 LearningRate 0.0699 Epoch: 3 Global Step: 136120 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:00:47,566-Speed 2629.24 samples/sec Loss 11.6216 LearningRate 0.0699 Epoch: 3 Global Step: 136130 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:00:51,501-Speed 2603.33 samples/sec Loss 11.4529 LearningRate 0.0699 Epoch: 3 Global Step: 136140 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:00:55,397-Speed 2628.69 samples/sec Loss 11.6085 LearningRate 0.0699 Epoch: 3 Global Step: 136150 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:00:59,272-Speed 2643.51 samples/sec Loss 11.4014 LearningRate 0.0699 Epoch: 3 Global Step: 136160 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:01:03,164-Speed 2631.97 samples/sec Loss 11.3992 LearningRate 0.0699 Epoch: 3 Global Step: 136170 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:01:07,059-Speed 2629.48 samples/sec Loss 11.6215 LearningRate 0.0699 Epoch: 3 Global Step: 136180 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:01:10,954-Speed 2629.48 samples/sec Loss 11.5997 LearningRate 0.0699 Epoch: 3 Global Step: 136190 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:01:14,870-Speed 2615.93 samples/sec Loss 11.5497 LearningRate 0.0699 Epoch: 3 Global Step: 136200 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:01:18,768-Speed 2627.85 samples/sec Loss 11.5974 LearningRate 0.0699 Epoch: 3 Global Step: 136210 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:01:22,665-Speed 2628.70 samples/sec Loss 11.7044 LearningRate 0.0699 Epoch: 3 Global Step: 136220 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:01:26,596-Speed 2605.19 samples/sec Loss 11.5245 LearningRate 0.0699 Epoch: 3 Global Step: 136230 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:01:30,496-Speed 2626.13 samples/sec Loss 11.5560 LearningRate 0.0699 Epoch: 3 Global Step: 136240 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:01:34,393-Speed 2628.72 samples/sec Loss 11.5659 LearningRate 0.0698 Epoch: 3 Global Step: 136250 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:01:38,289-Speed 2628.82 samples/sec Loss 11.5584 LearningRate 0.0698 Epoch: 3 Global Step: 136260 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:01:42,184-Speed 2629.52 samples/sec Loss 11.5701 LearningRate 0.0698 Epoch: 3 Global Step: 136270 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:01:46,080-Speed 2629.24 samples/sec Loss 11.5391 LearningRate 0.0698 Epoch: 3 Global Step: 136280 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:01:49,979-Speed 2627.19 samples/sec Loss 11.5964 LearningRate 0.0698 Epoch: 3 Global Step: 136290 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:01:53,877-Speed 2627.82 samples/sec Loss 11.4918 LearningRate 0.0698 Epoch: 3 Global Step: 136300 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:01:57,773-Speed 2628.85 samples/sec Loss 11.4593 LearningRate 0.0698 Epoch: 3 Global Step: 136310 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:02:01,669-Speed 2628.98 samples/sec Loss 11.6281 LearningRate 0.0698 Epoch: 3 Global Step: 136320 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:02:05,569-Speed 2626.52 samples/sec Loss 11.5384 LearningRate 0.0698 Epoch: 3 Global Step: 136330 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:02:09,457-Speed 2633.78 samples/sec Loss 11.6494 LearningRate 0.0698 Epoch: 3 Global Step: 136340 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:02:13,361-Speed 2623.45 samples/sec Loss 11.5969 LearningRate 0.0698 Epoch: 3 Global Step: 136350 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:02:17,262-Speed 2626.14 samples/sec Loss 11.5706 LearningRate 0.0698 Epoch: 3 Global Step: 136360 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:02:21,160-Speed 2627.62 samples/sec Loss 11.4490 LearningRate 0.0698 Epoch: 3 Global Step: 136370 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:02:25,058-Speed 2627.97 samples/sec Loss 11.4407 LearningRate 0.0698 Epoch: 3 Global Step: 136380 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:02:28,959-Speed 2625.04 samples/sec Loss 11.5465 LearningRate 0.0698 Epoch: 3 Global Step: 136390 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:02:32,855-Speed 2629.60 samples/sec Loss 11.5955 LearningRate 0.0698 Epoch: 3 Global Step: 136400 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:02:36,749-Speed 2630.30 samples/sec Loss 11.5668 LearningRate 0.0698 Epoch: 3 Global Step: 136410 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:02:40,643-Speed 2630.49 samples/sec Loss 11.4540 LearningRate 0.0698 Epoch: 3 Global Step: 136420 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:02:44,537-Speed 2629.73 samples/sec Loss 11.6208 LearningRate 0.0698 Epoch: 3 Global Step: 136430 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:02:48,456-Speed 2613.57 samples/sec Loss 11.5230 LearningRate 0.0698 Epoch: 3 Global Step: 136440 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:02:52,352-Speed 2629.02 samples/sec Loss 11.4845 LearningRate 0.0698 Epoch: 3 Global Step: 136450 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:02:56,247-Speed 2630.14 samples/sec Loss 11.4230 LearningRate 0.0698 Epoch: 3 Global Step: 136460 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:03:00,143-Speed 2629.32 samples/sec Loss 11.4886 LearningRate 0.0698 Epoch: 3 Global Step: 136470 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:03:04,037-Speed 2629.62 samples/sec Loss 11.5187 LearningRate 0.0698 Epoch: 3 Global Step: 136480 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:03:07,946-Speed 2620.44 samples/sec Loss 11.5798 LearningRate 0.0698 Epoch: 3 Global Step: 136490 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:03:11,834-Speed 2634.43 samples/sec Loss 11.6183 LearningRate 0.0698 Epoch: 3 Global Step: 136500 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:03:15,741-Speed 2621.33 samples/sec Loss 11.6505 LearningRate 0.0698 Epoch: 3 Global Step: 136510 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:03:19,639-Speed 2627.60 samples/sec Loss 11.4346 LearningRate 0.0698 Epoch: 3 Global Step: 136520 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:03:23,530-Speed 2632.06 samples/sec Loss 11.4490 LearningRate 0.0698 Epoch: 3 Global Step: 136530 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:03:27,431-Speed 2626.13 samples/sec Loss 11.4682 LearningRate 0.0698 Epoch: 3 Global Step: 136540 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:03:31,334-Speed 2623.78 samples/sec Loss 11.5408 LearningRate 0.0698 Epoch: 3 Global Step: 136550 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:03:35,241-Speed 2621.57 samples/sec Loss 11.6310 LearningRate 0.0698 Epoch: 3 Global Step: 136560 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:03:39,142-Speed 2626.10 samples/sec Loss 11.6072 LearningRate 0.0698 Epoch: 3 Global Step: 136570 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:03:43,048-Speed 2622.63 samples/sec Loss 11.5219 LearningRate 0.0698 Epoch: 3 Global Step: 136580 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:03:46,940-Speed 2631.50 samples/sec Loss 11.5831 LearningRate 0.0698 Epoch: 3 Global Step: 136590 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:03:50,837-Speed 2628.41 samples/sec Loss 11.5559 LearningRate 0.0698 Epoch: 3 Global Step: 136600 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:03:54,734-Speed 2629.04 samples/sec Loss 11.4516 LearningRate 0.0698 Epoch: 3 Global Step: 136610 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:03:58,611-Speed 2641.50 samples/sec Loss 11.5520 LearningRate 0.0698 Epoch: 3 Global Step: 136620 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:04:02,507-Speed 2629.07 samples/sec Loss 11.4462 LearningRate 0.0698 Epoch: 3 Global Step: 136630 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:04:06,403-Speed 2628.65 samples/sec Loss 11.6671 LearningRate 0.0698 Epoch: 3 Global Step: 136640 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:04:10,298-Speed 2630.27 samples/sec Loss 11.4686 LearningRate 0.0698 Epoch: 3 Global Step: 136650 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:04:14,191-Speed 2630.42 samples/sec Loss 11.5738 LearningRate 0.0698 Epoch: 3 Global Step: 136660 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:04:18,102-Speed 2619.67 samples/sec Loss 11.5205 LearningRate 0.0698 Epoch: 3 Global Step: 136670 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:04:22,033-Speed 2605.39 samples/sec Loss 11.5965 LearningRate 0.0698 Epoch: 3 Global Step: 136680 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:04:25,937-Speed 2623.91 samples/sec Loss 11.5343 LearningRate 0.0698 Epoch: 3 Global Step: 136690 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:04:29,838-Speed 2625.06 samples/sec Loss 11.4087 LearningRate 0.0698 Epoch: 3 Global Step: 136700 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:04:33,735-Speed 2628.29 samples/sec Loss 11.6903 LearningRate 0.0698 Epoch: 3 Global Step: 136710 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:04:37,636-Speed 2625.91 samples/sec Loss 11.4423 LearningRate 0.0698 Epoch: 3 Global Step: 136720 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:04:41,546-Speed 2619.64 samples/sec Loss 11.4867 LearningRate 0.0698 Epoch: 3 Global Step: 136730 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:04:45,465-Speed 2613.21 samples/sec Loss 11.4910 LearningRate 0.0698 Epoch: 3 Global Step: 136740 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:04:49,376-Speed 2619.39 samples/sec Loss 11.6742 LearningRate 0.0697 Epoch: 3 Global Step: 136750 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:04:53,430-Speed 2526.69 samples/sec Loss 11.5582 LearningRate 0.0697 Epoch: 3 Global Step: 136760 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:04:57,348-Speed 2614.60 samples/sec Loss 11.5719 LearningRate 0.0697 Epoch: 3 Global Step: 136770 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:05:01,226-Speed 2641.05 samples/sec Loss 11.6627 LearningRate 0.0697 Epoch: 3 Global Step: 136780 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:05:05,123-Speed 2628.35 samples/sec Loss 11.4207 LearningRate 0.0697 Epoch: 3 Global Step: 136790 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:05:09,016-Speed 2631.29 samples/sec Loss 11.4194 LearningRate 0.0697 Epoch: 3 Global Step: 136800 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:05:12,918-Speed 2625.07 samples/sec Loss 11.4676 LearningRate 0.0697 Epoch: 3 Global Step: 136810 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:05:16,839-Speed 2612.54 samples/sec Loss 11.5828 LearningRate 0.0697 Epoch: 3 Global Step: 136820 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:05:20,732-Speed 2631.32 samples/sec Loss 11.4887 LearningRate 0.0697 Epoch: 3 Global Step: 136830 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:05:24,623-Speed 2632.23 samples/sec Loss 11.4415 LearningRate 0.0697 Epoch: 3 Global Step: 136840 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:05:28,523-Speed 2626.53 samples/sec Loss 11.5950 LearningRate 0.0697 Epoch: 3 Global Step: 136850 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:05:32,411-Speed 2634.62 samples/sec Loss 11.4731 LearningRate 0.0697 Epoch: 3 Global Step: 136860 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:05:36,312-Speed 2625.68 samples/sec Loss 11.4553 LearningRate 0.0697 Epoch: 3 Global Step: 136870 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:05:40,209-Speed 2628.13 samples/sec Loss 11.5747 LearningRate 0.0697 Epoch: 3 Global Step: 136880 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:05:44,113-Speed 2623.69 samples/sec Loss 11.4742 LearningRate 0.0697 Epoch: 3 Global Step: 136890 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:05:48,018-Speed 2623.35 samples/sec Loss 11.6188 LearningRate 0.0697 Epoch: 3 Global Step: 136900 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:05:51,917-Speed 2627.02 samples/sec Loss 11.6199 LearningRate 0.0697 Epoch: 3 Global Step: 136910 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:05:55,818-Speed 2625.67 samples/sec Loss 11.5780 LearningRate 0.0697 Epoch: 3 Global Step: 136920 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:05:59,710-Speed 2631.57 samples/sec Loss 11.6326 LearningRate 0.0697 Epoch: 3 Global Step: 136930 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:06:03,603-Speed 2630.78 samples/sec Loss 11.5799 LearningRate 0.0697 Epoch: 3 Global Step: 136940 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:06:07,497-Speed 2630.60 samples/sec Loss 11.5812 LearningRate 0.0697 Epoch: 3 Global Step: 136950 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:06:11,389-Speed 2631.39 samples/sec Loss 11.4543 LearningRate 0.0697 Epoch: 3 Global Step: 136960 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:06:15,292-Speed 2624.37 samples/sec Loss 11.5845 LearningRate 0.0697 Epoch: 3 Global Step: 136970 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:06:19,201-Speed 2619.97 samples/sec Loss 11.6585 LearningRate 0.0697 Epoch: 3 Global Step: 136980 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:06:23,101-Speed 2626.79 samples/sec Loss 11.5073 LearningRate 0.0697 Epoch: 3 Global Step: 136990 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:06:27,022-Speed 2611.81 samples/sec Loss 11.6532 LearningRate 0.0697 Epoch: 3 Global Step: 137000 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:06:30,909-Speed 2635.62 samples/sec Loss 11.5124 LearningRate 0.0697 Epoch: 3 Global Step: 137010 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:06:34,803-Speed 2630.64 samples/sec Loss 11.5828 LearningRate 0.0697 Epoch: 3 Global Step: 137020 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:06:38,697-Speed 2630.18 samples/sec Loss 11.4998 LearningRate 0.0697 Epoch: 3 Global Step: 137030 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:06:42,575-Speed 2641.30 samples/sec Loss 11.4175 LearningRate 0.0697 Epoch: 3 Global Step: 137040 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:06:46,468-Speed 2630.51 samples/sec Loss 11.4476 LearningRate 0.0697 Epoch: 3 Global Step: 137050 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:06:50,359-Speed 2632.35 samples/sec Loss 11.4649 LearningRate 0.0697 Epoch: 3 Global Step: 137060 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:06:54,257-Speed 2627.94 samples/sec Loss 11.4745 LearningRate 0.0697 Epoch: 3 Global Step: 137070 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:06:58,155-Speed 2627.24 samples/sec Loss 11.5695 LearningRate 0.0697 Epoch: 3 Global Step: 137080 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:07:02,053-Speed 2628.09 samples/sec Loss 11.5158 LearningRate 0.0697 Epoch: 3 Global Step: 137090 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:07:05,957-Speed 2623.89 samples/sec Loss 11.5173 LearningRate 0.0697 Epoch: 3 Global Step: 137100 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:07:09,865-Speed 2620.70 samples/sec Loss 11.5287 LearningRate 0.0697 Epoch: 3 Global Step: 137110 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:07:13,768-Speed 2623.79 samples/sec Loss 11.4965 LearningRate 0.0697 Epoch: 3 Global Step: 137120 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:07:17,666-Speed 2628.39 samples/sec Loss 11.4810 LearningRate 0.0697 Epoch: 3 Global Step: 137130 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:07:21,574-Speed 2620.34 samples/sec Loss 11.4991 LearningRate 0.0697 Epoch: 3 Global Step: 137140 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:07:25,468-Speed 2630.52 samples/sec Loss 11.4694 LearningRate 0.0697 Epoch: 3 Global Step: 137150 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:07:29,372-Speed 2623.44 samples/sec Loss 11.4497 LearningRate 0.0697 Epoch: 3 Global Step: 137160 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:07:33,282-Speed 2620.26 samples/sec Loss 11.4464 LearningRate 0.0697 Epoch: 3 Global Step: 137170 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:07:37,188-Speed 2622.02 samples/sec Loss 11.4037 LearningRate 0.0697 Epoch: 3 Global Step: 137180 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:07:41,089-Speed 2625.21 samples/sec Loss 11.3076 LearningRate 0.0697 Epoch: 3 Global Step: 137190 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:07:44,995-Speed 2622.12 samples/sec Loss 11.5710 LearningRate 0.0697 Epoch: 3 Global Step: 137200 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:07:48,938-Speed 2597.77 samples/sec Loss 11.5136 LearningRate 0.0697 Epoch: 3 Global Step: 137210 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:07:52,837-Speed 2627.11 samples/sec Loss 11.4959 LearningRate 0.0697 Epoch: 3 Global Step: 137220 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:07:56,737-Speed 2626.08 samples/sec Loss 11.4146 LearningRate 0.0697 Epoch: 3 Global Step: 137230 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:08:00,642-Speed 2623.43 samples/sec Loss 11.5058 LearningRate 0.0697 Epoch: 3 Global Step: 137240 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:08:04,670-Speed 2542.58 samples/sec Loss 11.5080 LearningRate 0.0696 Epoch: 3 Global Step: 137250 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:08:08,579-Speed 2619.69 samples/sec Loss 11.4937 LearningRate 0.0696 Epoch: 3 Global Step: 137260 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:08:12,481-Speed 2625.11 samples/sec Loss 11.5564 LearningRate 0.0696 Epoch: 3 Global Step: 137270 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:08:16,385-Speed 2623.47 samples/sec Loss 11.4429 LearningRate 0.0696 Epoch: 3 Global Step: 137280 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:08:20,280-Speed 2630.37 samples/sec Loss 11.6270 LearningRate 0.0696 Epoch: 3 Global Step: 137290 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:08:24,196-Speed 2614.96 samples/sec Loss 11.6202 LearningRate 0.0696 Epoch: 3 Global Step: 137300 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:08:28,092-Speed 2629.25 samples/sec Loss 11.5218 LearningRate 0.0696 Epoch: 3 Global Step: 137310 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:08:31,985-Speed 2631.20 samples/sec Loss 11.4935 LearningRate 0.0696 Epoch: 3 Global Step: 137320 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:08:35,885-Speed 2626.19 samples/sec Loss 11.6729 LearningRate 0.0696 Epoch: 3 Global Step: 137330 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:08:39,804-Speed 2613.35 samples/sec Loss 11.5734 LearningRate 0.0696 Epoch: 3 Global Step: 137340 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:08:43,695-Speed 2632.32 samples/sec Loss 11.4111 LearningRate 0.0696 Epoch: 3 Global Step: 137350 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:08:47,589-Speed 2630.48 samples/sec Loss 11.5089 LearningRate 0.0696 Epoch: 3 Global Step: 137360 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:08:51,520-Speed 2605.71 samples/sec Loss 11.3941 LearningRate 0.0696 Epoch: 3 Global Step: 137370 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:08:55,415-Speed 2629.84 samples/sec Loss 11.5494 LearningRate 0.0696 Epoch: 3 Global Step: 137380 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:08:59,289-Speed 2643.57 samples/sec Loss 11.4086 LearningRate 0.0696 Epoch: 3 Global Step: 137390 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:09:03,185-Speed 2628.97 samples/sec Loss 11.5060 LearningRate 0.0696 Epoch: 3 Global Step: 137400 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:09:07,104-Speed 2613.98 samples/sec Loss 11.5417 LearningRate 0.0696 Epoch: 3 Global Step: 137410 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:09:11,118-Speed 2551.62 samples/sec Loss 11.4050 LearningRate 0.0696 Epoch: 3 Global Step: 137420 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:09:15,010-Speed 2631.51 samples/sec Loss 11.4032 LearningRate 0.0696 Epoch: 3 Global Step: 137430 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:09:18,914-Speed 2623.98 samples/sec Loss 11.4911 LearningRate 0.0696 Epoch: 3 Global Step: 137440 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:09:22,815-Speed 2625.12 samples/sec Loss 11.4763 LearningRate 0.0696 Epoch: 3 Global Step: 137450 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:09:26,709-Speed 2631.20 samples/sec Loss 11.5065 LearningRate 0.0696 Epoch: 3 Global Step: 137460 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:09:30,604-Speed 2629.62 samples/sec Loss 11.5966 LearningRate 0.0696 Epoch: 3 Global Step: 137470 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:09:34,516-Speed 2617.92 samples/sec Loss 11.6085 LearningRate 0.0696 Epoch: 3 Global Step: 137480 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:09:38,396-Speed 2639.91 samples/sec Loss 11.5430 LearningRate 0.0696 Epoch: 3 Global Step: 137490 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:09:42,288-Speed 2632.03 samples/sec Loss 11.6471 LearningRate 0.0696 Epoch: 3 Global Step: 137500 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:09:46,180-Speed 2631.48 samples/sec Loss 11.3988 LearningRate 0.0696 Epoch: 3 Global Step: 137510 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:09:50,084-Speed 2623.61 samples/sec Loss 11.5404 LearningRate 0.0696 Epoch: 3 Global Step: 137520 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:09:53,977-Speed 2630.59 samples/sec Loss 11.4839 LearningRate 0.0696 Epoch: 3 Global Step: 137530 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:09:57,871-Speed 2631.01 samples/sec Loss 11.5059 LearningRate 0.0696 Epoch: 3 Global Step: 137540 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:10:01,774-Speed 2624.44 samples/sec Loss 11.4238 LearningRate 0.0696 Epoch: 3 Global Step: 137550 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:10:05,676-Speed 2624.70 samples/sec Loss 11.3554 LearningRate 0.0696 Epoch: 3 Global Step: 137560 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:10:09,577-Speed 2625.59 samples/sec Loss 11.5425 LearningRate 0.0696 Epoch: 3 Global Step: 137570 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:10:13,562-Speed 2569.84 samples/sec Loss 11.4663 LearningRate 0.0696 Epoch: 3 Global Step: 137580 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:10:17,447-Speed 2636.75 samples/sec Loss 11.6018 LearningRate 0.0696 Epoch: 3 Global Step: 137590 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:10:21,342-Speed 2629.35 samples/sec Loss 11.4889 LearningRate 0.0696 Epoch: 3 Global Step: 137600 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:10:25,237-Speed 2629.84 samples/sec Loss 11.5338 LearningRate 0.0696 Epoch: 3 Global Step: 137610 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:10:29,135-Speed 2627.92 samples/sec Loss 11.5610 LearningRate 0.0696 Epoch: 3 Global Step: 137620 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:10:33,029-Speed 2630.25 samples/sec Loss 11.5077 LearningRate 0.0696 Epoch: 3 Global Step: 137630 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:10:36,921-Speed 2631.26 samples/sec Loss 11.4088 LearningRate 0.0696 Epoch: 3 Global Step: 137640 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:10:40,852-Speed 2605.86 samples/sec Loss 11.5476 LearningRate 0.0696 Epoch: 3 Global Step: 137650 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:10:44,735-Speed 2637.57 samples/sec Loss 11.4011 LearningRate 0.0696 Epoch: 3 Global Step: 137660 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:10:48,620-Speed 2636.74 samples/sec Loss 11.4342 LearningRate 0.0696 Epoch: 3 Global Step: 137670 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:10:52,509-Speed 2633.28 samples/sec Loss 11.3969 LearningRate 0.0696 Epoch: 3 Global Step: 137680 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:10:56,407-Speed 2628.17 samples/sec Loss 11.5616 LearningRate 0.0696 Epoch: 3 Global Step: 137690 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:11:00,301-Speed 2629.67 samples/sec Loss 11.3364 LearningRate 0.0696 Epoch: 3 Global Step: 137700 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:11:04,195-Speed 2630.46 samples/sec Loss 11.3219 LearningRate 0.0696 Epoch: 3 Global Step: 137710 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:11:08,087-Speed 2632.10 samples/sec Loss 11.5017 LearningRate 0.0696 Epoch: 3 Global Step: 137720 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:11:11,987-Speed 2625.99 samples/sec Loss 11.4787 LearningRate 0.0696 Epoch: 3 Global Step: 137730 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:11:15,883-Speed 2629.35 samples/sec Loss 11.5068 LearningRate 0.0695 Epoch: 3 Global Step: 137740 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:11:19,793-Speed 2619.84 samples/sec Loss 11.5222 LearningRate 0.0695 Epoch: 3 Global Step: 137750 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:11:23,703-Speed 2619.28 samples/sec Loss 11.3695 LearningRate 0.0695 Epoch: 3 Global Step: 137760 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:11:27,632-Speed 2607.16 samples/sec Loss 11.4330 LearningRate 0.0695 Epoch: 3 Global Step: 137770 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:11:31,526-Speed 2630.19 samples/sec Loss 11.4253 LearningRate 0.0695 Epoch: 3 Global Step: 137780 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:11:35,425-Speed 2627.00 samples/sec Loss 11.5624 LearningRate 0.0695 Epoch: 3 Global Step: 137790 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:11:39,325-Speed 2626.57 samples/sec Loss 11.5141 LearningRate 0.0695 Epoch: 3 Global Step: 137800 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:11:43,224-Speed 2626.80 samples/sec Loss 11.4720 LearningRate 0.0695 Epoch: 3 Global Step: 137810 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:11:47,129-Speed 2623.00 samples/sec Loss 11.4816 LearningRate 0.0695 Epoch: 3 Global Step: 137820 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:11:51,026-Speed 2628.35 samples/sec Loss 11.5032 LearningRate 0.0695 Epoch: 3 Global Step: 137830 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:11:54,925-Speed 2627.43 samples/sec Loss 11.5485 LearningRate 0.0695 Epoch: 3 Global Step: 137840 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:11:58,823-Speed 2627.20 samples/sec Loss 11.5931 LearningRate 0.0695 Epoch: 3 Global Step: 137850 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:12:02,725-Speed 2624.95 samples/sec Loss 11.5961 LearningRate 0.0695 Epoch: 3 Global Step: 137860 Fp16 Grad Scale: 524288 Required: 78 hours
Training: 2022-04-13 11:12:06,608-Speed 2637.52 samples/sec Loss 11.5071 LearningRate 0.0695 Epoch: 3 Global Step: 137870 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:12:10,515-Speed 2622.07 samples/sec Loss 11.5325 LearningRate 0.0695 Epoch: 3 Global Step: 137880 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:12:14,411-Speed 2628.94 samples/sec Loss 11.5771 LearningRate 0.0695 Epoch: 3 Global Step: 137890 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:12:18,313-Speed 2625.50 samples/sec Loss 11.5272 LearningRate 0.0695 Epoch: 3 Global Step: 137900 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:12:22,198-Speed 2636.33 samples/sec Loss 11.6404 LearningRate 0.0695 Epoch: 3 Global Step: 137910 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:12:26,108-Speed 2620.09 samples/sec Loss 11.5025 LearningRate 0.0695 Epoch: 3 Global Step: 137920 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:12:30,001-Speed 2630.57 samples/sec Loss 11.4281 LearningRate 0.0695 Epoch: 3 Global Step: 137930 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:12:33,905-Speed 2623.35 samples/sec Loss 11.4946 LearningRate 0.0695 Epoch: 3 Global Step: 137940 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:12:37,812-Speed 2621.61 samples/sec Loss 11.4084 LearningRate 0.0695 Epoch: 3 Global Step: 137950 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:12:41,707-Speed 2630.13 samples/sec Loss 11.5458 LearningRate 0.0695 Epoch: 3 Global Step: 137960 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:12:45,602-Speed 2629.87 samples/sec Loss 11.5193 LearningRate 0.0695 Epoch: 3 Global Step: 137970 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:12:49,517-Speed 2616.02 samples/sec Loss 11.3555 LearningRate 0.0695 Epoch: 3 Global Step: 137980 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:12:53,418-Speed 2625.56 samples/sec Loss 11.4343 LearningRate 0.0695 Epoch: 3 Global Step: 137990 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:12:57,321-Speed 2624.61 samples/sec Loss 11.4299 LearningRate 0.0695 Epoch: 3 Global Step: 138000 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:13:01,219-Speed 2627.71 samples/sec Loss 11.5280 LearningRate 0.0695 Epoch: 3 Global Step: 138010 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:13:05,127-Speed 2621.03 samples/sec Loss 11.4687 LearningRate 0.0695 Epoch: 3 Global Step: 138020 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:13:09,025-Speed 2627.68 samples/sec Loss 11.6047 LearningRate 0.0695 Epoch: 3 Global Step: 138030 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:13:12,921-Speed 2628.98 samples/sec Loss 11.5437 LearningRate 0.0695 Epoch: 3 Global Step: 138040 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:13:16,815-Speed 2630.52 samples/sec Loss 11.4681 LearningRate 0.0695 Epoch: 3 Global Step: 138050 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:13:20,715-Speed 2626.74 samples/sec Loss 11.5167 LearningRate 0.0695 Epoch: 3 Global Step: 138060 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:13:24,610-Speed 2629.28 samples/sec Loss 11.4287 LearningRate 0.0695 Epoch: 3 Global Step: 138070 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:13:28,506-Speed 2629.54 samples/sec Loss 11.6876 LearningRate 0.0695 Epoch: 3 Global Step: 138080 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:13:32,405-Speed 2626.56 samples/sec Loss 11.4170 LearningRate 0.0695 Epoch: 3 Global Step: 138090 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:13:36,302-Speed 2628.73 samples/sec Loss 11.4325 LearningRate 0.0695 Epoch: 3 Global Step: 138100 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:13:40,199-Speed 2628.16 samples/sec Loss 11.4635 LearningRate 0.0695 Epoch: 3 Global Step: 138110 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:13:44,101-Speed 2625.08 samples/sec Loss 11.4811 LearningRate 0.0695 Epoch: 3 Global Step: 138120 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:13:47,997-Speed 2629.03 samples/sec Loss 11.4078 LearningRate 0.0695 Epoch: 3 Global Step: 138130 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:13:51,896-Speed 2627.02 samples/sec Loss 11.5274 LearningRate 0.0695 Epoch: 3 Global Step: 138140 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:13:55,822-Speed 2608.90 samples/sec Loss 11.4921 LearningRate 0.0695 Epoch: 3 Global Step: 138150 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:13:59,720-Speed 2628.26 samples/sec Loss 11.5089 LearningRate 0.0695 Epoch: 3 Global Step: 138160 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:14:03,627-Speed 2621.04 samples/sec Loss 11.5071 LearningRate 0.0695 Epoch: 3 Global Step: 138170 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:14:07,522-Speed 2629.92 samples/sec Loss 11.3047 LearningRate 0.0695 Epoch: 3 Global Step: 138180 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:14:11,417-Speed 2629.35 samples/sec Loss 11.5076 LearningRate 0.0695 Epoch: 3 Global Step: 138190 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:14:15,315-Speed 2627.84 samples/sec Loss 11.5063 LearningRate 0.0695 Epoch: 3 Global Step: 138200 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:14:19,209-Speed 2631.08 samples/sec Loss 11.5223 LearningRate 0.0695 Epoch: 3 Global Step: 138210 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:14:23,109-Speed 2626.78 samples/sec Loss 11.4328 LearningRate 0.0695 Epoch: 3 Global Step: 138220 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:14:26,998-Speed 2633.05 samples/sec Loss 11.4730 LearningRate 0.0695 Epoch: 3 Global Step: 138230 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:14:30,921-Speed 2610.74 samples/sec Loss 11.3737 LearningRate 0.0694 Epoch: 3 Global Step: 138240 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:14:34,813-Speed 2631.79 samples/sec Loss 11.4526 LearningRate 0.0694 Epoch: 3 Global Step: 138250 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:14:38,711-Speed 2628.41 samples/sec Loss 11.4552 LearningRate 0.0694 Epoch: 3 Global Step: 138260 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:14:42,604-Speed 2630.79 samples/sec Loss 11.4683 LearningRate 0.0694 Epoch: 3 Global Step: 138270 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:14:46,503-Speed 2627.18 samples/sec Loss 11.5682 LearningRate 0.0694 Epoch: 3 Global Step: 138280 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:14:50,394-Speed 2632.23 samples/sec Loss 11.3903 LearningRate 0.0694 Epoch: 3 Global Step: 138290 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:14:54,290-Speed 2628.82 samples/sec Loss 11.4553 LearningRate 0.0694 Epoch: 3 Global Step: 138300 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:14:58,183-Speed 2631.59 samples/sec Loss 11.4477 LearningRate 0.0694 Epoch: 3 Global Step: 138310 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:15:02,081-Speed 2627.44 samples/sec Loss 11.6249 LearningRate 0.0694 Epoch: 3 Global Step: 138320 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:15:05,981-Speed 2625.80 samples/sec Loss 11.5226 LearningRate 0.0694 Epoch: 3 Global Step: 138330 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:15:09,880-Speed 2626.79 samples/sec Loss 11.5584 LearningRate 0.0694 Epoch: 3 Global Step: 138340 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:15:13,780-Speed 2626.99 samples/sec Loss 11.4621 LearningRate 0.0694 Epoch: 3 Global Step: 138350 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:15:17,682-Speed 2625.09 samples/sec Loss 11.4659 LearningRate 0.0694 Epoch: 3 Global Step: 138360 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:15:21,582-Speed 2626.18 samples/sec Loss 11.4728 LearningRate 0.0694 Epoch: 3 Global Step: 138370 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:15:25,477-Speed 2629.61 samples/sec Loss 11.4289 LearningRate 0.0694 Epoch: 3 Global Step: 138380 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:15:29,377-Speed 2626.49 samples/sec Loss 11.4176 LearningRate 0.0694 Epoch: 3 Global Step: 138390 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:15:33,275-Speed 2627.22 samples/sec Loss 11.3821 LearningRate 0.0694 Epoch: 3 Global Step: 138400 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:15:37,170-Speed 2629.87 samples/sec Loss 11.3653 LearningRate 0.0694 Epoch: 3 Global Step: 138410 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:15:41,076-Speed 2622.44 samples/sec Loss 11.6983 LearningRate 0.0694 Epoch: 3 Global Step: 138420 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:15:44,970-Speed 2629.94 samples/sec Loss 11.6812 LearningRate 0.0694 Epoch: 3 Global Step: 138430 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:15:48,862-Speed 2631.73 samples/sec Loss 11.4805 LearningRate 0.0694 Epoch: 3 Global Step: 138440 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:15:52,766-Speed 2623.66 samples/sec Loss 11.6304 LearningRate 0.0694 Epoch: 3 Global Step: 138450 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:15:56,660-Speed 2630.44 samples/sec Loss 11.5149 LearningRate 0.0694 Epoch: 3 Global Step: 138460 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:16:00,561-Speed 2625.86 samples/sec Loss 11.5372 LearningRate 0.0694 Epoch: 3 Global Step: 138470 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:16:04,459-Speed 2627.45 samples/sec Loss 11.4286 LearningRate 0.0694 Epoch: 3 Global Step: 138480 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:16:08,365-Speed 2621.87 samples/sec Loss 11.4786 LearningRate 0.0694 Epoch: 3 Global Step: 138490 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:16:12,261-Speed 2629.19 samples/sec Loss 11.6147 LearningRate 0.0694 Epoch: 3 Global Step: 138500 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:16:16,143-Speed 2638.44 samples/sec Loss 11.5786 LearningRate 0.0694 Epoch: 3 Global Step: 138510 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:16:20,040-Speed 2628.44 samples/sec Loss 11.4549 LearningRate 0.0694 Epoch: 3 Global Step: 138520 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:16:23,944-Speed 2623.22 samples/sec Loss 11.5639 LearningRate 0.0694 Epoch: 3 Global Step: 138530 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:16:27,841-Speed 2629.18 samples/sec Loss 11.4950 LearningRate 0.0694 Epoch: 3 Global Step: 138540 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:16:31,736-Speed 2629.15 samples/sec Loss 11.4718 LearningRate 0.0694 Epoch: 3 Global Step: 138550 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:16:35,629-Speed 2631.31 samples/sec Loss 11.4254 LearningRate 0.0694 Epoch: 3 Global Step: 138560 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:16:39,525-Speed 2628.46 samples/sec Loss 11.6618 LearningRate 0.0694 Epoch: 3 Global Step: 138570 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:16:43,424-Speed 2627.10 samples/sec Loss 11.4042 LearningRate 0.0694 Epoch: 3 Global Step: 138580 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:16:47,320-Speed 2628.51 samples/sec Loss 11.2088 LearningRate 0.0694 Epoch: 3 Global Step: 138590 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:16:51,218-Speed 2628.16 samples/sec Loss 11.4535 LearningRate 0.0694 Epoch: 3 Global Step: 138600 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:16:55,120-Speed 2625.10 samples/sec Loss 11.4674 LearningRate 0.0694 Epoch: 3 Global Step: 138610 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:16:59,019-Speed 2627.56 samples/sec Loss 11.4633 LearningRate 0.0694 Epoch: 3 Global Step: 138620 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:17:02,915-Speed 2628.65 samples/sec Loss 11.5419 LearningRate 0.0694 Epoch: 3 Global Step: 138630 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:17:06,810-Speed 2629.45 samples/sec Loss 11.5184 LearningRate 0.0694 Epoch: 3 Global Step: 138640 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:17:10,725-Speed 2615.99 samples/sec Loss 11.6214 LearningRate 0.0694 Epoch: 3 Global Step: 138650 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:17:14,615-Speed 2633.33 samples/sec Loss 11.4813 LearningRate 0.0694 Epoch: 3 Global Step: 138660 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:17:18,515-Speed 2626.76 samples/sec Loss 11.5295 LearningRate 0.0694 Epoch: 3 Global Step: 138670 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:17:22,412-Speed 2627.96 samples/sec Loss 11.5294 LearningRate 0.0694 Epoch: 3 Global Step: 138680 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:17:26,312-Speed 2626.50 samples/sec Loss 11.3569 LearningRate 0.0694 Epoch: 3 Global Step: 138690 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:17:30,206-Speed 2630.80 samples/sec Loss 11.5122 LearningRate 0.0694 Epoch: 3 Global Step: 138700 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:17:34,213-Speed 2556.11 samples/sec Loss 11.6187 LearningRate 0.0694 Epoch: 3 Global Step: 138710 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:17:38,109-Speed 2628.76 samples/sec Loss 11.4552 LearningRate 0.0694 Epoch: 3 Global Step: 138720 Fp16 Grad Scale: 262144 Required: 78 hours
Training: 2022-04-13 11:17:41,987-Speed 2641.11 samples/sec Loss 11.5110 LearningRate 0.0694 Epoch: 3 Global Step: 138730 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:17:45,897-Speed 2620.12 samples/sec Loss 11.3615 LearningRate 0.0693 Epoch: 3 Global Step: 138740 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:17:49,961-Speed 2520.17 samples/sec Loss 11.4666 LearningRate 0.0693 Epoch: 3 Global Step: 138750 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:17:53,841-Speed 2639.44 samples/sec Loss 11.4790 LearningRate 0.0693 Epoch: 3 Global Step: 138760 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:17:57,830-Speed 2568.29 samples/sec Loss 11.4805 LearningRate 0.0693 Epoch: 3 Global Step: 138770 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:18:01,865-Speed 2538.84 samples/sec Loss 11.4170 LearningRate 0.0693 Epoch: 3 Global Step: 138780 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:18:05,758-Speed 2630.99 samples/sec Loss 11.4590 LearningRate 0.0693 Epoch: 3 Global Step: 138790 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:18:09,656-Speed 2627.11 samples/sec Loss 11.4936 LearningRate 0.0693 Epoch: 3 Global Step: 138800 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:18:13,555-Speed 2627.68 samples/sec Loss 11.4167 LearningRate 0.0693 Epoch: 3 Global Step: 138810 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:18:17,450-Speed 2629.44 samples/sec Loss 11.5533 LearningRate 0.0693 Epoch: 3 Global Step: 138820 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:18:21,370-Speed 2613.12 samples/sec Loss 11.4843 LearningRate 0.0693 Epoch: 3 Global Step: 138830 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:18:25,262-Speed 2631.89 samples/sec Loss 11.7894 LearningRate 0.0693 Epoch: 3 Global Step: 138840 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:18:29,172-Speed 2619.63 samples/sec Loss 11.8044 LearningRate 0.0693 Epoch: 3 Global Step: 138850 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:18:33,077-Speed 2623.21 samples/sec Loss 11.5338 LearningRate 0.0693 Epoch: 3 Global Step: 138860 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:18:36,975-Speed 2626.88 samples/sec Loss 11.6367 LearningRate 0.0693 Epoch: 3 Global Step: 138870 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:18:40,885-Speed 2619.53 samples/sec Loss 11.5650 LearningRate 0.0693 Epoch: 3 Global Step: 138880 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:18:44,790-Speed 2622.60 samples/sec Loss 11.5028 LearningRate 0.0693 Epoch: 3 Global Step: 138890 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:18:48,709-Speed 2614.28 samples/sec Loss 11.6762 LearningRate 0.0693 Epoch: 3 Global Step: 138900 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:18:52,601-Speed 2631.73 samples/sec Loss 11.6278 LearningRate 0.0693 Epoch: 3 Global Step: 138910 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:18:56,529-Speed 2607.65 samples/sec Loss 11.7421 LearningRate 0.0693 Epoch: 3 Global Step: 138920 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:19:00,423-Speed 2630.34 samples/sec Loss 11.5870 LearningRate 0.0693 Epoch: 3 Global Step: 138930 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:19:04,319-Speed 2629.03 samples/sec Loss 11.5055 LearningRate 0.0693 Epoch: 3 Global Step: 138940 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:19:08,218-Speed 2626.81 samples/sec Loss 11.5478 LearningRate 0.0693 Epoch: 3 Global Step: 138950 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:19:12,108-Speed 2632.83 samples/sec Loss 11.6567 LearningRate 0.0693 Epoch: 3 Global Step: 138960 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:19:16,017-Speed 2620.76 samples/sec Loss 11.5085 LearningRate 0.0693 Epoch: 3 Global Step: 138970 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:19:19,914-Speed 2627.85 samples/sec Loss 11.5337 LearningRate 0.0693 Epoch: 3 Global Step: 138980 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:19:23,814-Speed 2626.91 samples/sec Loss 11.5262 LearningRate 0.0693 Epoch: 3 Global Step: 138990 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:19:27,720-Speed 2621.98 samples/sec Loss 11.5880 LearningRate 0.0693 Epoch: 3 Global Step: 139000 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:19:31,628-Speed 2621.22 samples/sec Loss 11.3177 LearningRate 0.0693 Epoch: 3 Global Step: 139010 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:19:35,532-Speed 2623.62 samples/sec Loss 11.4083 LearningRate 0.0693 Epoch: 3 Global Step: 139020 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:19:39,445-Speed 2617.17 samples/sec Loss 11.6097 LearningRate 0.0693 Epoch: 3 Global Step: 139030 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:19:43,352-Speed 2621.76 samples/sec Loss 11.3927 LearningRate 0.0693 Epoch: 3 Global Step: 139040 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:19:47,256-Speed 2623.90 samples/sec Loss 11.6832 LearningRate 0.0693 Epoch: 3 Global Step: 139050 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:19:51,159-Speed 2624.32 samples/sec Loss 11.5999 LearningRate 0.0693 Epoch: 3 Global Step: 139060 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:19:55,049-Speed 2633.17 samples/sec Loss 11.4915 LearningRate 0.0693 Epoch: 3 Global Step: 139070 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:19:58,943-Speed 2630.15 samples/sec Loss 11.5676 LearningRate 0.0693 Epoch: 3 Global Step: 139080 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:20:02,848-Speed 2623.83 samples/sec Loss 11.5259 LearningRate 0.0693 Epoch: 3 Global Step: 139090 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:20:06,725-Speed 2641.14 samples/sec Loss 11.6779 LearningRate 0.0693 Epoch: 3 Global Step: 139100 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:20:10,619-Speed 2630.57 samples/sec Loss 11.5151 LearningRate 0.0693 Epoch: 3 Global Step: 139110 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:20:14,522-Speed 2623.67 samples/sec Loss 11.5945 LearningRate 0.0693 Epoch: 3 Global Step: 139120 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:20:18,419-Speed 2628.90 samples/sec Loss 11.5805 LearningRate 0.0693 Epoch: 3 Global Step: 139130 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:20:22,315-Speed 2629.02 samples/sec Loss 11.5616 LearningRate 0.0693 Epoch: 3 Global Step: 139140 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:20:26,214-Speed 2626.90 samples/sec Loss 11.3740 LearningRate 0.0693 Epoch: 3 Global Step: 139150 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:20:30,185-Speed 2578.89 samples/sec Loss 11.5706 LearningRate 0.0693 Epoch: 3 Global Step: 139160 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:20:34,087-Speed 2624.85 samples/sec Loss 11.5281 LearningRate 0.0693 Epoch: 3 Global Step: 139170 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:20:37,990-Speed 2624.95 samples/sec Loss 11.5730 LearningRate 0.0693 Epoch: 3 Global Step: 139180 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:20:41,885-Speed 2629.32 samples/sec Loss 11.4237 LearningRate 0.0693 Epoch: 3 Global Step: 139190 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:20:45,786-Speed 2625.90 samples/sec Loss 11.4499 LearningRate 0.0693 Epoch: 3 Global Step: 139200 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:20:49,684-Speed 2628.05 samples/sec Loss 11.4868 LearningRate 0.0693 Epoch: 3 Global Step: 139210 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:20:53,599-Speed 2615.98 samples/sec Loss 11.2867 LearningRate 0.0693 Epoch: 3 Global Step: 139220 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:20:57,508-Speed 2620.83 samples/sec Loss 11.3984 LearningRate 0.0693 Epoch: 3 Global Step: 139230 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:21:01,403-Speed 2629.49 samples/sec Loss 11.4265 LearningRate 0.0692 Epoch: 3 Global Step: 139240 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:21:05,298-Speed 2629.69 samples/sec Loss 11.4359 LearningRate 0.0692 Epoch: 3 Global Step: 139250 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:21:09,194-Speed 2628.85 samples/sec Loss 11.4189 LearningRate 0.0692 Epoch: 3 Global Step: 139260 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:21:13,089-Speed 2629.66 samples/sec Loss 11.5822 LearningRate 0.0692 Epoch: 3 Global Step: 139270 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:21:16,985-Speed 2629.29 samples/sec Loss 11.3172 LearningRate 0.0692 Epoch: 3 Global Step: 139280 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:21:20,896-Speed 2618.86 samples/sec Loss 11.6055 LearningRate 0.0692 Epoch: 3 Global Step: 139290 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:21:24,790-Speed 2630.37 samples/sec Loss 11.3424 LearningRate 0.0692 Epoch: 3 Global Step: 139300 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:21:28,667-Speed 2641.86 samples/sec Loss 11.4830 LearningRate 0.0692 Epoch: 3 Global Step: 139310 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:21:32,542-Speed 2643.35 samples/sec Loss 11.4411 LearningRate 0.0692 Epoch: 3 Global Step: 139320 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:21:36,443-Speed 2625.22 samples/sec Loss 11.6338 LearningRate 0.0692 Epoch: 3 Global Step: 139330 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:21:40,345-Speed 2624.83 samples/sec Loss 11.6626 LearningRate 0.0692 Epoch: 3 Global Step: 139340 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:21:44,258-Speed 2619.48 samples/sec Loss 11.5856 LearningRate 0.0692 Epoch: 3 Global Step: 139350 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:21:48,155-Speed 2628.65 samples/sec Loss 11.5247 LearningRate 0.0692 Epoch: 3 Global Step: 139360 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:21:52,064-Speed 2619.57 samples/sec Loss 11.5669 LearningRate 0.0692 Epoch: 3 Global Step: 139370 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:21:55,977-Speed 2618.37 samples/sec Loss 11.5507 LearningRate 0.0692 Epoch: 3 Global Step: 139380 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:21:59,873-Speed 2628.79 samples/sec Loss 11.4529 LearningRate 0.0692 Epoch: 3 Global Step: 139390 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:22:03,768-Speed 2629.42 samples/sec Loss 11.5102 LearningRate 0.0692 Epoch: 3 Global Step: 139400 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:22:07,664-Speed 2629.44 samples/sec Loss 11.5368 LearningRate 0.0692 Epoch: 3 Global Step: 139410 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:22:11,585-Speed 2612.58 samples/sec Loss 11.3993 LearningRate 0.0692 Epoch: 3 Global Step: 139420 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:22:15,484-Speed 2627.12 samples/sec Loss 11.6713 LearningRate 0.0692 Epoch: 3 Global Step: 139430 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:22:19,634-Speed 2468.35 samples/sec Loss 11.4380 LearningRate 0.0692 Epoch: 3 Global Step: 139440 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:22:23,539-Speed 2622.66 samples/sec Loss 11.4658 LearningRate 0.0692 Epoch: 3 Global Step: 139450 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:22:27,422-Speed 2639.03 samples/sec Loss 11.5489 LearningRate 0.0692 Epoch: 3 Global Step: 139460 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:22:31,374-Speed 2591.75 samples/sec Loss 11.5234 LearningRate 0.0692 Epoch: 3 Global Step: 139470 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:22:35,277-Speed 2623.78 samples/sec Loss 11.5054 LearningRate 0.0692 Epoch: 3 Global Step: 139480 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:22:39,171-Speed 2630.75 samples/sec Loss 11.6054 LearningRate 0.0692 Epoch: 3 Global Step: 139490 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:22:43,066-Speed 2629.66 samples/sec Loss 11.4039 LearningRate 0.0692 Epoch: 3 Global Step: 139500 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:22:46,958-Speed 2631.37 samples/sec Loss 11.6920 LearningRate 0.0692 Epoch: 3 Global Step: 139510 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:22:50,855-Speed 2629.08 samples/sec Loss 11.4551 LearningRate 0.0692 Epoch: 3 Global Step: 139520 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:22:54,749-Speed 2630.22 samples/sec Loss 11.4349 LearningRate 0.0692 Epoch: 3 Global Step: 139530 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:22:58,646-Speed 2628.31 samples/sec Loss 11.2923 LearningRate 0.0692 Epoch: 3 Global Step: 139540 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:23:02,545-Speed 2627.11 samples/sec Loss 11.4934 LearningRate 0.0692 Epoch: 3 Global Step: 139550 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:23:06,440-Speed 2630.00 samples/sec Loss 11.4976 LearningRate 0.0692 Epoch: 3 Global Step: 139560 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:23:10,334-Speed 2630.19 samples/sec Loss 11.4292 LearningRate 0.0692 Epoch: 3 Global Step: 139570 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:23:14,230-Speed 2629.02 samples/sec Loss 11.3972 LearningRate 0.0692 Epoch: 3 Global Step: 139580 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:23:18,128-Speed 2627.69 samples/sec Loss 11.4578 LearningRate 0.0692 Epoch: 3 Global Step: 139590 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:23:22,061-Speed 2604.43 samples/sec Loss 11.5704 LearningRate 0.0692 Epoch: 3 Global Step: 139600 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:23:26,066-Speed 2557.48 samples/sec Loss 11.3798 LearningRate 0.0692 Epoch: 3 Global Step: 139610 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:23:29,960-Speed 2630.80 samples/sec Loss 11.6420 LearningRate 0.0692 Epoch: 3 Global Step: 139620 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:23:33,851-Speed 2631.96 samples/sec Loss 11.6040 LearningRate 0.0692 Epoch: 3 Global Step: 139630 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:23:37,745-Speed 2630.67 samples/sec Loss 11.4649 LearningRate 0.0692 Epoch: 3 Global Step: 139640 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:23:41,649-Speed 2623.33 samples/sec Loss 11.5198 LearningRate 0.0692 Epoch: 3 Global Step: 139650 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:23:45,577-Speed 2607.58 samples/sec Loss 11.3524 LearningRate 0.0692 Epoch: 3 Global Step: 139660 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:23:49,490-Speed 2617.71 samples/sec Loss 11.5823 LearningRate 0.0692 Epoch: 3 Global Step: 139670 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:23:53,385-Speed 2629.94 samples/sec Loss 11.5077 LearningRate 0.0692 Epoch: 3 Global Step: 139680 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:23:57,314-Speed 2607.12 samples/sec Loss 11.4814 LearningRate 0.0692 Epoch: 3 Global Step: 139690 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:24:01,211-Speed 2628.76 samples/sec Loss 11.5425 LearningRate 0.0692 Epoch: 3 Global Step: 139700 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:24:05,119-Speed 2620.39 samples/sec Loss 11.5945 LearningRate 0.0692 Epoch: 3 Global Step: 139710 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:24:09,025-Speed 2622.52 samples/sec Loss 11.4116 LearningRate 0.0692 Epoch: 3 Global Step: 139720 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:24:12,925-Speed 2626.56 samples/sec Loss 11.5413 LearningRate 0.0692 Epoch: 3 Global Step: 139730 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:24:16,828-Speed 2624.60 samples/sec Loss 11.4098 LearningRate 0.0691 Epoch: 3 Global Step: 139740 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:24:20,730-Speed 2624.17 samples/sec Loss 11.3048 LearningRate 0.0691 Epoch: 3 Global Step: 139750 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:24:24,628-Speed 2628.37 samples/sec Loss 11.5183 LearningRate 0.0691 Epoch: 3 Global Step: 139760 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:24:28,514-Speed 2635.46 samples/sec Loss 11.5297 LearningRate 0.0691 Epoch: 3 Global Step: 139770 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:24:32,411-Speed 2628.82 samples/sec Loss 11.5358 LearningRate 0.0691 Epoch: 3 Global Step: 139780 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:24:36,313-Speed 2624.64 samples/sec Loss 11.4005 LearningRate 0.0691 Epoch: 3 Global Step: 139790 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:24:40,216-Speed 2624.03 samples/sec Loss 11.4532 LearningRate 0.0691 Epoch: 3 Global Step: 139800 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:24:44,242-Speed 2543.71 samples/sec Loss 11.4930 LearningRate 0.0691 Epoch: 3 Global Step: 139810 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:24:48,199-Speed 2589.18 samples/sec Loss 11.5384 LearningRate 0.0691 Epoch: 3 Global Step: 139820 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:24:52,099-Speed 2626.57 samples/sec Loss 11.4486 LearningRate 0.0691 Epoch: 3 Global Step: 139830 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:24:56,007-Speed 2620.82 samples/sec Loss 11.5250 LearningRate 0.0691 Epoch: 3 Global Step: 139840 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:24:59,894-Speed 2635.54 samples/sec Loss 11.3960 LearningRate 0.0691 Epoch: 3 Global Step: 139850 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:25:03,787-Speed 2631.50 samples/sec Loss 11.5522 LearningRate 0.0691 Epoch: 3 Global Step: 139860 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:25:07,680-Speed 2630.50 samples/sec Loss 11.3853 LearningRate 0.0691 Epoch: 3 Global Step: 139870 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:25:11,574-Speed 2630.11 samples/sec Loss 11.4223 LearningRate 0.0691 Epoch: 3 Global Step: 139880 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:25:15,473-Speed 2626.67 samples/sec Loss 11.4669 LearningRate 0.0691 Epoch: 3 Global Step: 139890 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:25:19,370-Speed 2628.75 samples/sec Loss 11.4567 LearningRate 0.0691 Epoch: 3 Global Step: 139900 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:25:23,272-Speed 2625.16 samples/sec Loss 11.4065 LearningRate 0.0691 Epoch: 3 Global Step: 139910 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:25:27,173-Speed 2625.65 samples/sec Loss 11.3955 LearningRate 0.0691 Epoch: 3 Global Step: 139920 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:25:31,072-Speed 2627.22 samples/sec Loss 11.5157 LearningRate 0.0691 Epoch: 3 Global Step: 139930 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:25:35,087-Speed 2551.26 samples/sec Loss 11.4261 LearningRate 0.0691 Epoch: 3 Global Step: 139940 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:25:39,052-Speed 2582.59 samples/sec Loss 11.5169 LearningRate 0.0691 Epoch: 3 Global Step: 139950 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:25:42,953-Speed 2625.67 samples/sec Loss 11.3971 LearningRate 0.0691 Epoch: 3 Global Step: 139960 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:25:46,832-Speed 2640.26 samples/sec Loss 11.4252 LearningRate 0.0691 Epoch: 3 Global Step: 139970 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:25:50,730-Speed 2627.41 samples/sec Loss 11.3244 LearningRate 0.0691 Epoch: 3 Global Step: 139980 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:25:54,638-Speed 2621.29 samples/sec Loss 11.3840 LearningRate 0.0691 Epoch: 3 Global Step: 139990 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:25:58,542-Speed 2623.70 samples/sec Loss 11.4631 LearningRate 0.0691 Epoch: 3 Global Step: 140000 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:26:41,925-[lfw][140000]XNorm: 23.652834
Training: 2022-04-13 11:26:41,926-[lfw][140000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-13 11:26:41,926-[lfw][140000]Accuracy-Highest: 0.99783
Training: 2022-04-13 11:27:31,837-[cfp_fp][140000]XNorm: 21.368183
Training: 2022-04-13 11:27:31,838-[cfp_fp][140000]Accuracy-Flip: 0.97786+-0.00767
Training: 2022-04-13 11:27:31,839-[cfp_fp][140000]Accuracy-Highest: 0.97986
Training: 2022-04-13 11:28:15,252-[agedb_30][140000]XNorm: 23.439239
Training: 2022-04-13 11:28:15,253-[agedb_30][140000]Accuracy-Flip: 0.96683+-0.00848
Training: 2022-04-13 11:28:15,254-[agedb_30][140000]Accuracy-Highest: 0.96800
Training: 2022-04-13 11:28:19,105-Speed 72.85 samples/sec Loss 11.4378 LearningRate 0.0691 Epoch: 3 Global Step: 140010 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:28:22,972-Speed 2648.64 samples/sec Loss 11.5861 LearningRate 0.0691 Epoch: 3 Global Step: 140020 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:28:26,845-Speed 2645.02 samples/sec Loss 11.3642 LearningRate 0.0691 Epoch: 3 Global Step: 140030 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:28:30,718-Speed 2645.06 samples/sec Loss 11.4575 LearningRate 0.0691 Epoch: 3 Global Step: 140040 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:28:34,593-Speed 2642.83 samples/sec Loss 11.6345 LearningRate 0.0691 Epoch: 3 Global Step: 140050 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:28:38,481-Speed 2634.40 samples/sec Loss 11.5239 LearningRate 0.0691 Epoch: 3 Global Step: 140060 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:28:42,360-Speed 2641.40 samples/sec Loss 11.5056 LearningRate 0.0691 Epoch: 3 Global Step: 140070 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:28:46,238-Speed 2641.57 samples/sec Loss 11.3834 LearningRate 0.0691 Epoch: 3 Global Step: 140080 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:28:50,117-Speed 2640.26 samples/sec Loss 11.4860 LearningRate 0.0691 Epoch: 3 Global Step: 140090 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:28:54,006-Speed 2633.76 samples/sec Loss 11.5370 LearningRate 0.0691 Epoch: 3 Global Step: 140100 Fp16 Grad Scale: 65536 Required: 78 hours
Training: 2022-04-13 11:28:57,903-Speed 2628.66 samples/sec Loss 11.5312 LearningRate 0.0691 Epoch: 3 Global Step: 140110 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:29:01,790-Speed 2635.08 samples/sec Loss 11.4537 LearningRate 0.0691 Epoch: 3 Global Step: 140120 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:29:05,679-Speed 2633.54 samples/sec Loss 11.4460 LearningRate 0.0691 Epoch: 3 Global Step: 140130 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:29:09,570-Speed 2632.67 samples/sec Loss 11.3876 LearningRate 0.0691 Epoch: 3 Global Step: 140140 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:29:13,459-Speed 2633.58 samples/sec Loss 11.4551 LearningRate 0.0691 Epoch: 3 Global Step: 140150 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:29:17,367-Speed 2621.12 samples/sec Loss 11.4543 LearningRate 0.0691 Epoch: 3 Global Step: 140160 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:29:21,267-Speed 2625.96 samples/sec Loss 11.2977 LearningRate 0.0691 Epoch: 3 Global Step: 140170 Fp16 Grad Scale: 131072 Required: 78 hours
Training: 2022-04-13 11:29:25,168-Speed 2625.77 samples/sec Loss 11.4462 LearningRate 0.0691 Epoch: 3 Global Step: 140180 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:29:29,082-Speed 2617.20 samples/sec Loss 11.4731 LearningRate 0.0691 Epoch: 3 Global Step: 140190 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:29:32,997-Speed 2616.31 samples/sec Loss 11.5266 LearningRate 0.0691 Epoch: 3 Global Step: 140200 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:29:36,881-Speed 2636.83 samples/sec Loss 11.4023 LearningRate 0.0691 Epoch: 3 Global Step: 140210 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:29:40,778-Speed 2628.16 samples/sec Loss 11.4398 LearningRate 0.0691 Epoch: 3 Global Step: 140220 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:29:44,676-Speed 2627.40 samples/sec Loss 11.4072 LearningRate 0.0690 Epoch: 3 Global Step: 140230 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:29:48,587-Speed 2619.35 samples/sec Loss 11.4482 LearningRate 0.0690 Epoch: 3 Global Step: 140240 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:29:52,486-Speed 2627.18 samples/sec Loss 11.4549 LearningRate 0.0690 Epoch: 3 Global Step: 140250 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:29:56,387-Speed 2625.25 samples/sec Loss 11.5308 LearningRate 0.0690 Epoch: 3 Global Step: 140260 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:30:00,297-Speed 2620.24 samples/sec Loss 11.6045 LearningRate 0.0690 Epoch: 3 Global Step: 140270 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:30:04,201-Speed 2623.77 samples/sec Loss 11.3271 LearningRate 0.0690 Epoch: 3 Global Step: 140280 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:30:08,107-Speed 2621.71 samples/sec Loss 11.5415 LearningRate 0.0690 Epoch: 3 Global Step: 140290 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:30:12,019-Speed 2618.34 samples/sec Loss 11.4101 LearningRate 0.0690 Epoch: 3 Global Step: 140300 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:30:15,927-Speed 2621.05 samples/sec Loss 11.5158 LearningRate 0.0690 Epoch: 3 Global Step: 140310 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:30:19,824-Speed 2628.79 samples/sec Loss 11.3239 LearningRate 0.0690 Epoch: 3 Global Step: 140320 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:30:23,726-Speed 2625.14 samples/sec Loss 11.4142 LearningRate 0.0690 Epoch: 3 Global Step: 140330 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:30:27,630-Speed 2623.49 samples/sec Loss 11.5095 LearningRate 0.0690 Epoch: 3 Global Step: 140340 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:30:31,530-Speed 2625.81 samples/sec Loss 11.3349 LearningRate 0.0690 Epoch: 3 Global Step: 140350 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:30:35,454-Speed 2610.39 samples/sec Loss 11.6060 LearningRate 0.0690 Epoch: 3 Global Step: 140360 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:30:39,358-Speed 2623.76 samples/sec Loss 11.5652 LearningRate 0.0690 Epoch: 3 Global Step: 140370 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:30:43,264-Speed 2622.66 samples/sec Loss 11.5602 LearningRate 0.0690 Epoch: 3 Global Step: 140380 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:30:47,173-Speed 2620.08 samples/sec Loss 11.5877 LearningRate 0.0690 Epoch: 3 Global Step: 140390 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:30:51,092-Speed 2613.12 samples/sec Loss 11.5007 LearningRate 0.0690 Epoch: 3 Global Step: 140400 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:30:55,007-Speed 2616.75 samples/sec Loss 11.4437 LearningRate 0.0690 Epoch: 3 Global Step: 140410 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:30:58,944-Speed 2601.45 samples/sec Loss 11.4228 LearningRate 0.0690 Epoch: 3 Global Step: 140420 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:31:02,843-Speed 2627.76 samples/sec Loss 11.3630 LearningRate 0.0690 Epoch: 3 Global Step: 140430 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:31:06,725-Speed 2638.12 samples/sec Loss 11.4592 LearningRate 0.0690 Epoch: 3 Global Step: 140440 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:31:10,651-Speed 2609.36 samples/sec Loss 11.9044 LearningRate 0.0690 Epoch: 3 Global Step: 140450 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:31:14,560-Speed 2620.48 samples/sec Loss 11.4944 LearningRate 0.0690 Epoch: 3 Global Step: 140460 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:31:18,466-Speed 2621.89 samples/sec Loss 11.4162 LearningRate 0.0690 Epoch: 3 Global Step: 140470 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:31:22,391-Speed 2609.66 samples/sec Loss 11.4396 LearningRate 0.0690 Epoch: 3 Global Step: 140480 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:31:26,300-Speed 2620.20 samples/sec Loss 11.4124 LearningRate 0.0690 Epoch: 3 Global Step: 140490 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:31:30,217-Speed 2615.37 samples/sec Loss 11.4965 LearningRate 0.0690 Epoch: 3 Global Step: 140500 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:31:34,134-Speed 2615.02 samples/sec Loss 11.3716 LearningRate 0.0690 Epoch: 3 Global Step: 140510 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:31:38,048-Speed 2616.76 samples/sec Loss 11.2961 LearningRate 0.0690 Epoch: 3 Global Step: 140520 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:31:41,962-Speed 2616.96 samples/sec Loss 11.7287 LearningRate 0.0690 Epoch: 3 Global Step: 140530 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:31:45,881-Speed 2613.43 samples/sec Loss 11.4125 LearningRate 0.0690 Epoch: 3 Global Step: 140540 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:31:49,825-Speed 2596.90 samples/sec Loss 11.3419 LearningRate 0.0690 Epoch: 3 Global Step: 140550 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:31:53,752-Speed 2608.72 samples/sec Loss 11.4240 LearningRate 0.0690 Epoch: 3 Global Step: 140560 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:31:57,682-Speed 2606.15 samples/sec Loss 11.4415 LearningRate 0.0690 Epoch: 3 Global Step: 140570 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:32:01,610-Speed 2607.81 samples/sec Loss 11.5156 LearningRate 0.0690 Epoch: 3 Global Step: 140580 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:32:05,609-Speed 2561.84 samples/sec Loss 11.4472 LearningRate 0.0690 Epoch: 3 Global Step: 140590 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:32:09,515-Speed 2622.07 samples/sec Loss 11.3307 LearningRate 0.0690 Epoch: 3 Global Step: 140600 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:32:13,427-Speed 2618.34 samples/sec Loss 11.5070 LearningRate 0.0690 Epoch: 3 Global Step: 140610 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:32:17,342-Speed 2616.20 samples/sec Loss 11.5071 LearningRate 0.0690 Epoch: 3 Global Step: 140620 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:32:21,250-Speed 2620.77 samples/sec Loss 11.4577 LearningRate 0.0690 Epoch: 3 Global Step: 140630 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:32:25,148-Speed 2627.65 samples/sec Loss 11.4826 LearningRate 0.0690 Epoch: 3 Global Step: 140640 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:32:29,053-Speed 2622.99 samples/sec Loss 11.4960 LearningRate 0.0690 Epoch: 3 Global Step: 140650 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:32:32,943-Speed 2633.49 samples/sec Loss 11.6009 LearningRate 0.0690 Epoch: 3 Global Step: 140660 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:32:36,860-Speed 2614.66 samples/sec Loss 11.4315 LearningRate 0.0690 Epoch: 3 Global Step: 140670 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:32:40,775-Speed 2616.19 samples/sec Loss 11.5443 LearningRate 0.0690 Epoch: 3 Global Step: 140680 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:32:44,684-Speed 2620.35 samples/sec Loss 11.3306 LearningRate 0.0690 Epoch: 3 Global Step: 140690 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:32:48,594-Speed 2619.46 samples/sec Loss 11.5870 LearningRate 0.0690 Epoch: 3 Global Step: 140700 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:32:52,500-Speed 2622.40 samples/sec Loss 11.5220 LearningRate 0.0690 Epoch: 3 Global Step: 140710 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:32:56,404-Speed 2623.56 samples/sec Loss 11.4744 LearningRate 0.0690 Epoch: 3 Global Step: 140720 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:33:00,309-Speed 2622.96 samples/sec Loss 11.4578 LearningRate 0.0689 Epoch: 3 Global Step: 140730 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:33:04,218-Speed 2620.26 samples/sec Loss 11.4407 LearningRate 0.0689 Epoch: 3 Global Step: 140740 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:33:08,125-Speed 2621.39 samples/sec Loss 11.5646 LearningRate 0.0689 Epoch: 3 Global Step: 140750 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:33:12,036-Speed 2619.28 samples/sec Loss 11.3965 LearningRate 0.0689 Epoch: 3 Global Step: 140760 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:33:15,946-Speed 2619.74 samples/sec Loss 11.5331 LearningRate 0.0689 Epoch: 3 Global Step: 140770 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:33:19,855-Speed 2620.21 samples/sec Loss 11.4870 LearningRate 0.0689 Epoch: 3 Global Step: 140780 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:33:23,762-Speed 2621.48 samples/sec Loss 11.5189 LearningRate 0.0689 Epoch: 3 Global Step: 140790 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:33:27,682-Speed 2613.45 samples/sec Loss 11.4621 LearningRate 0.0689 Epoch: 3 Global Step: 140800 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:33:31,610-Speed 2607.63 samples/sec Loss 11.4119 LearningRate 0.0689 Epoch: 3 Global Step: 140810 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:33:35,516-Speed 2621.80 samples/sec Loss 11.5202 LearningRate 0.0689 Epoch: 3 Global Step: 140820 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:33:39,429-Speed 2617.38 samples/sec Loss 11.4485 LearningRate 0.0689 Epoch: 3 Global Step: 140830 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:33:43,342-Speed 2618.22 samples/sec Loss 11.3970 LearningRate 0.0689 Epoch: 3 Global Step: 140840 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:33:47,241-Speed 2626.78 samples/sec Loss 11.5087 LearningRate 0.0689 Epoch: 3 Global Step: 140850 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:33:51,157-Speed 2615.45 samples/sec Loss 11.5148 LearningRate 0.0689 Epoch: 3 Global Step: 140860 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:33:55,084-Speed 2608.10 samples/sec Loss 11.3518 LearningRate 0.0689 Epoch: 3 Global Step: 140870 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:33:59,019-Speed 2603.11 samples/sec Loss 11.5160 LearningRate 0.0689 Epoch: 3 Global Step: 140880 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:34:02,927-Speed 2621.14 samples/sec Loss 11.3086 LearningRate 0.0689 Epoch: 3 Global Step: 140890 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:34:06,962-Speed 2537.96 samples/sec Loss 11.3646 LearningRate 0.0689 Epoch: 3 Global Step: 140900 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:34:10,878-Speed 2615.85 samples/sec Loss 11.5812 LearningRate 0.0689 Epoch: 3 Global Step: 140910 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:34:14,794-Speed 2615.33 samples/sec Loss 11.4661 LearningRate 0.0689 Epoch: 3 Global Step: 140920 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:34:18,715-Speed 2612.58 samples/sec Loss 11.4968 LearningRate 0.0689 Epoch: 3 Global Step: 140930 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:34:22,640-Speed 2609.72 samples/sec Loss 11.5453 LearningRate 0.0689 Epoch: 3 Global Step: 140940 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:34:26,560-Speed 2613.16 samples/sec Loss 11.5508 LearningRate 0.0689 Epoch: 3 Global Step: 140950 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:34:30,477-Speed 2614.60 samples/sec Loss 11.4078 LearningRate 0.0689 Epoch: 3 Global Step: 140960 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:34:34,386-Speed 2619.86 samples/sec Loss 11.3966 LearningRate 0.0689 Epoch: 3 Global Step: 140970 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:34:38,298-Speed 2618.44 samples/sec Loss 11.3262 LearningRate 0.0689 Epoch: 3 Global Step: 140980 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:34:42,207-Speed 2621.12 samples/sec Loss 11.4912 LearningRate 0.0689 Epoch: 3 Global Step: 140990 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:34:46,120-Speed 2617.02 samples/sec Loss 11.4893 LearningRate 0.0689 Epoch: 3 Global Step: 141000 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:34:50,034-Speed 2617.55 samples/sec Loss 11.3086 LearningRate 0.0689 Epoch: 3 Global Step: 141010 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:34:53,947-Speed 2616.99 samples/sec Loss 11.4281 LearningRate 0.0689 Epoch: 3 Global Step: 141020 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:34:57,857-Speed 2620.30 samples/sec Loss 11.4373 LearningRate 0.0689 Epoch: 3 Global Step: 141030 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:35:01,766-Speed 2619.97 samples/sec Loss 11.6408 LearningRate 0.0689 Epoch: 3 Global Step: 141040 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:35:05,678-Speed 2618.26 samples/sec Loss 11.4914 LearningRate 0.0689 Epoch: 3 Global Step: 141050 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:35:09,584-Speed 2621.92 samples/sec Loss 11.3488 LearningRate 0.0689 Epoch: 3 Global Step: 141060 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:35:13,499-Speed 2616.46 samples/sec Loss 11.2188 LearningRate 0.0689 Epoch: 3 Global Step: 141070 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:35:17,421-Speed 2611.38 samples/sec Loss 11.3704 LearningRate 0.0689 Epoch: 3 Global Step: 141080 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:35:21,310-Speed 2634.45 samples/sec Loss 11.4613 LearningRate 0.0689 Epoch: 3 Global Step: 141090 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:35:25,215-Speed 2622.51 samples/sec Loss 11.3017 LearningRate 0.0689 Epoch: 3 Global Step: 141100 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:35:29,119-Speed 2623.79 samples/sec Loss 11.3073 LearningRate 0.0689 Epoch: 3 Global Step: 141110 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:35:33,028-Speed 2620.55 samples/sec Loss 11.2259 LearningRate 0.0689 Epoch: 3 Global Step: 141120 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:35:36,931-Speed 2623.69 samples/sec Loss 11.4873 LearningRate 0.0689 Epoch: 3 Global Step: 141130 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:35:40,848-Speed 2615.36 samples/sec Loss 11.5308 LearningRate 0.0689 Epoch: 3 Global Step: 141140 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:35:44,787-Speed 2600.33 samples/sec Loss 11.4312 LearningRate 0.0689 Epoch: 3 Global Step: 141150 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:35:48,759-Speed 2579.08 samples/sec Loss 11.2547 LearningRate 0.0689 Epoch: 3 Global Step: 141160 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:35:52,698-Speed 2600.10 samples/sec Loss 11.4434 LearningRate 0.0689 Epoch: 3 Global Step: 141170 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:35:56,622-Speed 2610.28 samples/sec Loss 11.3866 LearningRate 0.0689 Epoch: 3 Global Step: 141180 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:36:00,536-Speed 2617.28 samples/sec Loss 11.5506 LearningRate 0.0689 Epoch: 3 Global Step: 141190 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:36:04,449-Speed 2617.65 samples/sec Loss 11.3638 LearningRate 0.0689 Epoch: 3 Global Step: 141200 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:36:08,382-Speed 2603.84 samples/sec Loss 11.4456 LearningRate 0.0689 Epoch: 3 Global Step: 141210 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:36:12,308-Speed 2608.89 samples/sec Loss 11.3798 LearningRate 0.0689 Epoch: 3 Global Step: 141220 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:36:16,305-Speed 2562.99 samples/sec Loss 11.2569 LearningRate 0.0688 Epoch: 3 Global Step: 141230 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:36:20,225-Speed 2612.52 samples/sec Loss 11.3921 LearningRate 0.0688 Epoch: 3 Global Step: 141240 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:36:24,128-Speed 2624.60 samples/sec Loss 11.3214 LearningRate 0.0688 Epoch: 3 Global Step: 141250 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:36:28,038-Speed 2620.38 samples/sec Loss 11.4252 LearningRate 0.0688 Epoch: 3 Global Step: 141260 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:36:31,947-Speed 2620.02 samples/sec Loss 11.3511 LearningRate 0.0688 Epoch: 3 Global Step: 141270 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:36:35,864-Speed 2615.44 samples/sec Loss 11.3453 LearningRate 0.0688 Epoch: 3 Global Step: 141280 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:36:39,769-Speed 2622.58 samples/sec Loss 11.4773 LearningRate 0.0688 Epoch: 3 Global Step: 141290 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:36:43,693-Speed 2610.46 samples/sec Loss 11.5312 LearningRate 0.0688 Epoch: 3 Global Step: 141300 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:36:47,614-Speed 2612.48 samples/sec Loss 11.3539 LearningRate 0.0688 Epoch: 3 Global Step: 141310 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:36:51,523-Speed 2620.10 samples/sec Loss 11.4182 LearningRate 0.0688 Epoch: 3 Global Step: 141320 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:36:55,443-Speed 2612.28 samples/sec Loss 11.3839 LearningRate 0.0688 Epoch: 3 Global Step: 141330 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:36:59,379-Speed 2602.94 samples/sec Loss 11.2267 LearningRate 0.0688 Epoch: 3 Global Step: 141340 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:37:03,268-Speed 2633.99 samples/sec Loss 11.4737 LearningRate 0.0688 Epoch: 3 Global Step: 141350 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:37:07,184-Speed 2615.32 samples/sec Loss 11.4666 LearningRate 0.0688 Epoch: 3 Global Step: 141360 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:37:11,095-Speed 2618.99 samples/sec Loss 11.4450 LearningRate 0.0688 Epoch: 3 Global Step: 141370 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:37:15,001-Speed 2622.54 samples/sec Loss 11.4541 LearningRate 0.0688 Epoch: 3 Global Step: 141380 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:37:18,907-Speed 2621.95 samples/sec Loss 11.2949 LearningRate 0.0688 Epoch: 3 Global Step: 141390 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:37:22,821-Speed 2616.75 samples/sec Loss 11.4561 LearningRate 0.0688 Epoch: 3 Global Step: 141400 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:37:26,786-Speed 2583.67 samples/sec Loss 11.3978 LearningRate 0.0688 Epoch: 3 Global Step: 141410 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:37:30,692-Speed 2622.34 samples/sec Loss 11.4393 LearningRate 0.0688 Epoch: 3 Global Step: 141420 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:37:34,602-Speed 2620.00 samples/sec Loss 11.2692 LearningRate 0.0688 Epoch: 3 Global Step: 141430 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:37:38,517-Speed 2615.79 samples/sec Loss 11.4151 LearningRate 0.0688 Epoch: 3 Global Step: 141440 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:37:42,428-Speed 2619.45 samples/sec Loss 11.3961 LearningRate 0.0688 Epoch: 3 Global Step: 141450 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:37:46,335-Speed 2621.32 samples/sec Loss 11.3946 LearningRate 0.0688 Epoch: 3 Global Step: 141460 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:37:50,243-Speed 2620.98 samples/sec Loss 11.4378 LearningRate 0.0688 Epoch: 3 Global Step: 141470 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:37:54,166-Speed 2610.32 samples/sec Loss 11.5497 LearningRate 0.0688 Epoch: 3 Global Step: 141480 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:37:58,073-Speed 2622.20 samples/sec Loss 11.4073 LearningRate 0.0688 Epoch: 3 Global Step: 141490 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:38:01,975-Speed 2624.89 samples/sec Loss 11.4073 LearningRate 0.0688 Epoch: 3 Global Step: 141500 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:38:05,909-Speed 2605.20 samples/sec Loss 11.2899 LearningRate 0.0688 Epoch: 3 Global Step: 141510 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:38:09,814-Speed 2622.70 samples/sec Loss 11.5003 LearningRate 0.0688 Epoch: 3 Global Step: 141520 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:38:13,724-Speed 2619.64 samples/sec Loss 11.4092 LearningRate 0.0688 Epoch: 3 Global Step: 141530 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:38:17,641-Speed 2615.16 samples/sec Loss 11.5568 LearningRate 0.0688 Epoch: 3 Global Step: 141540 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:38:21,550-Speed 2619.54 samples/sec Loss 11.4394 LearningRate 0.0688 Epoch: 3 Global Step: 141550 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:38:25,457-Speed 2621.92 samples/sec Loss 11.3653 LearningRate 0.0688 Epoch: 3 Global Step: 141560 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:38:29,366-Speed 2620.44 samples/sec Loss 11.3104 LearningRate 0.0688 Epoch: 3 Global Step: 141570 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:38:33,275-Speed 2620.55 samples/sec Loss 11.4000 LearningRate 0.0688 Epoch: 3 Global Step: 141580 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:38:37,180-Speed 2622.72 samples/sec Loss 11.4327 LearningRate 0.0688 Epoch: 3 Global Step: 141590 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:38:41,087-Speed 2621.27 samples/sec Loss 11.3998 LearningRate 0.0688 Epoch: 3 Global Step: 141600 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:38:45,022-Speed 2603.22 samples/sec Loss 11.4430 LearningRate 0.0688 Epoch: 3 Global Step: 141610 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:38:48,929-Speed 2621.70 samples/sec Loss 11.2411 LearningRate 0.0688 Epoch: 3 Global Step: 141620 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:38:52,928-Speed 2561.38 samples/sec Loss 11.3899 LearningRate 0.0688 Epoch: 3 Global Step: 141630 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:38:57,041-Speed 2490.41 samples/sec Loss 11.1615 LearningRate 0.0688 Epoch: 3 Global Step: 141640 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:39:01,050-Speed 2554.64 samples/sec Loss 11.4369 LearningRate 0.0688 Epoch: 3 Global Step: 141650 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:39:04,962-Speed 2618.67 samples/sec Loss 11.3843 LearningRate 0.0688 Epoch: 3 Global Step: 141660 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:39:08,875-Speed 2617.47 samples/sec Loss 11.3637 LearningRate 0.0688 Epoch: 3 Global Step: 141670 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:39:12,800-Speed 2609.48 samples/sec Loss 11.3543 LearningRate 0.0688 Epoch: 3 Global Step: 141680 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:39:16,702-Speed 2624.64 samples/sec Loss 11.3089 LearningRate 0.0688 Epoch: 3 Global Step: 141690 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:39:20,597-Speed 2629.88 samples/sec Loss 11.3449 LearningRate 0.0688 Epoch: 3 Global Step: 141700 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:39:24,511-Speed 2617.20 samples/sec Loss 11.3541 LearningRate 0.0688 Epoch: 3 Global Step: 141710 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:39:28,442-Speed 2605.52 samples/sec Loss 11.4136 LearningRate 0.0688 Epoch: 3 Global Step: 141720 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:39:32,348-Speed 2621.96 samples/sec Loss 11.3649 LearningRate 0.0687 Epoch: 3 Global Step: 141730 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:39:36,255-Speed 2621.35 samples/sec Loss 11.3684 LearningRate 0.0687 Epoch: 3 Global Step: 141740 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:39:40,166-Speed 2619.25 samples/sec Loss 11.4180 LearningRate 0.0687 Epoch: 3 Global Step: 141750 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:39:44,074-Speed 2621.06 samples/sec Loss 11.4867 LearningRate 0.0687 Epoch: 3 Global Step: 141760 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:39:47,979-Speed 2623.02 samples/sec Loss 11.3242 LearningRate 0.0687 Epoch: 3 Global Step: 141770 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:39:51,887-Speed 2620.82 samples/sec Loss 11.2706 LearningRate 0.0687 Epoch: 3 Global Step: 141780 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:39:55,794-Speed 2621.58 samples/sec Loss 11.4904 LearningRate 0.0687 Epoch: 3 Global Step: 141790 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:39:59,683-Speed 2633.86 samples/sec Loss 11.4283 LearningRate 0.0687 Epoch: 3 Global Step: 141800 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:40:03,595-Speed 2617.85 samples/sec Loss 11.5027 LearningRate 0.0687 Epoch: 3 Global Step: 141810 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:40:07,530-Speed 2602.62 samples/sec Loss 11.4241 LearningRate 0.0687 Epoch: 3 Global Step: 141820 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:40:11,441-Speed 2619.21 samples/sec Loss 11.4088 LearningRate 0.0687 Epoch: 3 Global Step: 141830 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:40:15,356-Speed 2616.15 samples/sec Loss 11.3979 LearningRate 0.0687 Epoch: 3 Global Step: 141840 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:40:19,486-Speed 2480.17 samples/sec Loss 11.3195 LearningRate 0.0687 Epoch: 3 Global Step: 141850 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:40:23,405-Speed 2613.19 samples/sec Loss 11.3441 LearningRate 0.0687 Epoch: 3 Global Step: 141860 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:40:27,329-Speed 2610.63 samples/sec Loss 11.4124 LearningRate 0.0687 Epoch: 3 Global Step: 141870 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:40:31,250-Speed 2611.94 samples/sec Loss 11.3360 LearningRate 0.0687 Epoch: 3 Global Step: 141880 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:40:35,193-Speed 2597.16 samples/sec Loss 11.3684 LearningRate 0.0687 Epoch: 3 Global Step: 141890 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:40:39,105-Speed 2618.38 samples/sec Loss 11.2552 LearningRate 0.0687 Epoch: 3 Global Step: 141900 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:40:43,081-Speed 2576.52 samples/sec Loss 11.3602 LearningRate 0.0687 Epoch: 3 Global Step: 141910 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:40:47,003-Speed 2611.50 samples/sec Loss 11.3318 LearningRate 0.0687 Epoch: 3 Global Step: 141920 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:40:50,935-Speed 2605.82 samples/sec Loss 11.3419 LearningRate 0.0687 Epoch: 3 Global Step: 141930 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:40:54,855-Speed 2613.17 samples/sec Loss 11.3849 LearningRate 0.0687 Epoch: 3 Global Step: 141940 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:40:58,778-Speed 2610.78 samples/sec Loss 11.5047 LearningRate 0.0687 Epoch: 3 Global Step: 141950 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:41:02,701-Speed 2611.14 samples/sec Loss 11.4682 LearningRate 0.0687 Epoch: 3 Global Step: 141960 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:41:06,604-Speed 2623.58 samples/sec Loss 11.3661 LearningRate 0.0687 Epoch: 3 Global Step: 141970 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:41:10,513-Speed 2620.03 samples/sec Loss 11.4929 LearningRate 0.0687 Epoch: 3 Global Step: 141980 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:41:14,424-Speed 2619.65 samples/sec Loss 11.3642 LearningRate 0.0687 Epoch: 3 Global Step: 141990 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:41:18,329-Speed 2622.61 samples/sec Loss 11.5485 LearningRate 0.0687 Epoch: 3 Global Step: 142000 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:41:22,238-Speed 2620.72 samples/sec Loss 11.3228 LearningRate 0.0687 Epoch: 3 Global Step: 142010 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:41:26,141-Speed 2624.01 samples/sec Loss 11.2822 LearningRate 0.0687 Epoch: 3 Global Step: 142020 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:41:30,045-Speed 2623.46 samples/sec Loss 11.3175 LearningRate 0.0687 Epoch: 3 Global Step: 142030 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:41:33,971-Speed 2609.10 samples/sec Loss 11.3594 LearningRate 0.0687 Epoch: 3 Global Step: 142040 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:41:37,903-Speed 2604.63 samples/sec Loss 11.2956 LearningRate 0.0687 Epoch: 3 Global Step: 142050 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:41:41,823-Speed 2612.43 samples/sec Loss 11.3615 LearningRate 0.0687 Epoch: 3 Global Step: 142060 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:41:45,749-Speed 2608.70 samples/sec Loss 11.4769 LearningRate 0.0687 Epoch: 3 Global Step: 142070 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:41:49,666-Speed 2615.54 samples/sec Loss 11.3290 LearningRate 0.0687 Epoch: 3 Global Step: 142080 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:41:53,570-Speed 2623.30 samples/sec Loss 11.4997 LearningRate 0.0687 Epoch: 3 Global Step: 142090 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:41:57,473-Speed 2624.81 samples/sec Loss 11.4257 LearningRate 0.0687 Epoch: 3 Global Step: 142100 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:42:01,362-Speed 2633.34 samples/sec Loss 11.4061 LearningRate 0.0687 Epoch: 3 Global Step: 142110 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:42:05,285-Speed 2610.80 samples/sec Loss 11.3164 LearningRate 0.0687 Epoch: 3 Global Step: 142120 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:42:09,194-Speed 2620.09 samples/sec Loss 11.4694 LearningRate 0.0687 Epoch: 3 Global Step: 142130 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:42:13,097-Speed 2624.71 samples/sec Loss 11.4833 LearningRate 0.0687 Epoch: 3 Global Step: 142140 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:42:16,998-Speed 2625.06 samples/sec Loss 11.3224 LearningRate 0.0687 Epoch: 3 Global Step: 142150 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:42:20,920-Speed 2611.26 samples/sec Loss 11.4785 LearningRate 0.0687 Epoch: 3 Global Step: 142160 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:42:24,839-Speed 2614.13 samples/sec Loss 11.2885 LearningRate 0.0687 Epoch: 3 Global Step: 142170 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:42:28,740-Speed 2626.94 samples/sec Loss 11.3219 LearningRate 0.0687 Epoch: 3 Global Step: 142180 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:42:32,648-Speed 2620.49 samples/sec Loss 11.3411 LearningRate 0.0687 Epoch: 3 Global Step: 142190 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:42:36,559-Speed 2618.97 samples/sec Loss 11.3157 LearningRate 0.0687 Epoch: 3 Global Step: 142200 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:42:40,474-Speed 2616.42 samples/sec Loss 11.4341 LearningRate 0.0687 Epoch: 3 Global Step: 142210 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:42:44,375-Speed 2625.56 samples/sec Loss 11.3482 LearningRate 0.0687 Epoch: 3 Global Step: 142220 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:42:48,270-Speed 2629.85 samples/sec Loss 11.3652 LearningRate 0.0686 Epoch: 3 Global Step: 142230 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:42:52,147-Speed 2641.49 samples/sec Loss 12.0246 LearningRate 0.0686 Epoch: 3 Global Step: 142240 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 11:42:56,049-Speed 2625.27 samples/sec Loss 11.7898 LearningRate 0.0686 Epoch: 3 Global Step: 142250 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 11:43:00,028-Speed 2573.98 samples/sec Loss 11.7476 LearningRate 0.0686 Epoch: 3 Global Step: 142260 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 11:43:04,082-Speed 2526.19 samples/sec Loss 11.6340 LearningRate 0.0686 Epoch: 3 Global Step: 142270 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 11:43:07,994-Speed 2618.31 samples/sec Loss 11.5403 LearningRate 0.0686 Epoch: 3 Global Step: 142280 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 11:43:11,911-Speed 2615.31 samples/sec Loss 11.4514 LearningRate 0.0686 Epoch: 3 Global Step: 142290 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 11:43:15,809-Speed 2627.19 samples/sec Loss 11.3795 LearningRate 0.0686 Epoch: 3 Global Step: 142300 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 11:43:19,723-Speed 2617.16 samples/sec Loss 11.3141 LearningRate 0.0686 Epoch: 3 Global Step: 142310 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 11:43:23,632-Speed 2620.03 samples/sec Loss 11.4755 LearningRate 0.0686 Epoch: 3 Global Step: 142320 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 11:43:27,542-Speed 2619.93 samples/sec Loss 11.4946 LearningRate 0.0686 Epoch: 3 Global Step: 142330 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 11:43:31,454-Speed 2618.05 samples/sec Loss 11.4763 LearningRate 0.0686 Epoch: 3 Global Step: 142340 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:43:35,367-Speed 2617.43 samples/sec Loss 11.4352 LearningRate 0.0686 Epoch: 3 Global Step: 142350 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:43:39,366-Speed 2561.07 samples/sec Loss 11.4701 LearningRate 0.0686 Epoch: 3 Global Step: 142360 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:43:43,268-Speed 2624.84 samples/sec Loss 11.4895 LearningRate 0.0686 Epoch: 3 Global Step: 142370 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:43:47,177-Speed 2621.11 samples/sec Loss 11.3880 LearningRate 0.0686 Epoch: 3 Global Step: 142380 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:43:51,123-Speed 2595.45 samples/sec Loss 11.4908 LearningRate 0.0686 Epoch: 3 Global Step: 142390 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:43:55,026-Speed 2624.07 samples/sec Loss 11.4450 LearningRate 0.0686 Epoch: 3 Global Step: 142400 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:43:58,944-Speed 2614.58 samples/sec Loss 11.4029 LearningRate 0.0686 Epoch: 3 Global Step: 142410 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:44:02,845-Speed 2625.82 samples/sec Loss 11.3852 LearningRate 0.0686 Epoch: 3 Global Step: 142420 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:44:06,764-Speed 2613.71 samples/sec Loss 11.4392 LearningRate 0.0686 Epoch: 3 Global Step: 142430 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:44:10,691-Speed 2608.61 samples/sec Loss 11.4646 LearningRate 0.0686 Epoch: 3 Global Step: 142440 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:44:14,606-Speed 2616.02 samples/sec Loss 11.3485 LearningRate 0.0686 Epoch: 3 Global Step: 142450 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:44:18,510-Speed 2623.43 samples/sec Loss 11.3384 LearningRate 0.0686 Epoch: 3 Global Step: 142460 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:44:22,411-Speed 2626.12 samples/sec Loss 11.4016 LearningRate 0.0686 Epoch: 3 Global Step: 142470 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:44:26,312-Speed 2625.15 samples/sec Loss 11.3393 LearningRate 0.0686 Epoch: 3 Global Step: 142480 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:44:30,218-Speed 2622.50 samples/sec Loss 11.3671 LearningRate 0.0686 Epoch: 3 Global Step: 142490 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:44:34,120-Speed 2625.06 samples/sec Loss 11.4317 LearningRate 0.0686 Epoch: 3 Global Step: 142500 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:44:38,031-Speed 2618.40 samples/sec Loss 11.3466 LearningRate 0.0686 Epoch: 3 Global Step: 142510 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:44:41,944-Speed 2617.85 samples/sec Loss 11.2513 LearningRate 0.0686 Epoch: 3 Global Step: 142520 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:44:45,854-Speed 2619.46 samples/sec Loss 11.2853 LearningRate 0.0686 Epoch: 3 Global Step: 142530 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:44:49,775-Speed 2612.16 samples/sec Loss 11.4241 LearningRate 0.0686 Epoch: 3 Global Step: 142540 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:44:53,688-Speed 2617.67 samples/sec Loss 11.3326 LearningRate 0.0686 Epoch: 3 Global Step: 142550 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:44:57,614-Speed 2608.89 samples/sec Loss 11.2993 LearningRate 0.0686 Epoch: 3 Global Step: 142560 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:45:01,538-Speed 2609.89 samples/sec Loss 11.4769 LearningRate 0.0686 Epoch: 3 Global Step: 142570 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:45:05,466-Speed 2607.21 samples/sec Loss 11.3171 LearningRate 0.0686 Epoch: 3 Global Step: 142580 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:45:09,380-Speed 2616.97 samples/sec Loss 11.4017 LearningRate 0.0686 Epoch: 3 Global Step: 142590 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:45:13,306-Speed 2609.01 samples/sec Loss 11.3690 LearningRate 0.0686 Epoch: 3 Global Step: 142600 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:45:17,228-Speed 2611.55 samples/sec Loss 11.3179 LearningRate 0.0686 Epoch: 3 Global Step: 142610 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:45:21,132-Speed 2624.19 samples/sec Loss 11.4728 LearningRate 0.0686 Epoch: 3 Global Step: 142620 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:45:25,030-Speed 2627.37 samples/sec Loss 11.3587 LearningRate 0.0686 Epoch: 3 Global Step: 142630 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:45:28,939-Speed 2620.32 samples/sec Loss 11.2927 LearningRate 0.0686 Epoch: 3 Global Step: 142640 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:45:32,847-Speed 2620.65 samples/sec Loss 11.3845 LearningRate 0.0686 Epoch: 3 Global Step: 142650 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:45:36,750-Speed 2623.90 samples/sec Loss 11.2975 LearningRate 0.0686 Epoch: 3 Global Step: 142660 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:45:40,649-Speed 2627.21 samples/sec Loss 11.3435 LearningRate 0.0686 Epoch: 3 Global Step: 142670 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:45:44,555-Speed 2622.50 samples/sec Loss 11.2960 LearningRate 0.0686 Epoch: 3 Global Step: 142680 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:45:48,456-Speed 2625.31 samples/sec Loss 11.3365 LearningRate 0.0686 Epoch: 3 Global Step: 142690 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:45:52,365-Speed 2620.36 samples/sec Loss 11.4033 LearningRate 0.0686 Epoch: 3 Global Step: 142700 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:45:56,266-Speed 2625.67 samples/sec Loss 11.4208 LearningRate 0.0686 Epoch: 3 Global Step: 142710 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:46:00,164-Speed 2627.85 samples/sec Loss 11.2866 LearningRate 0.0686 Epoch: 3 Global Step: 142720 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:46:04,071-Speed 2621.14 samples/sec Loss 11.4143 LearningRate 0.0685 Epoch: 3 Global Step: 142730 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:46:07,961-Speed 2633.17 samples/sec Loss 11.4677 LearningRate 0.0685 Epoch: 3 Global Step: 142740 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:46:11,866-Speed 2622.87 samples/sec Loss 11.3709 LearningRate 0.0685 Epoch: 3 Global Step: 142750 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:46:15,770-Speed 2623.41 samples/sec Loss 11.3866 LearningRate 0.0685 Epoch: 3 Global Step: 142760 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:46:19,686-Speed 2616.32 samples/sec Loss 11.3407 LearningRate 0.0685 Epoch: 3 Global Step: 142770 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:46:23,600-Speed 2616.89 samples/sec Loss 11.4436 LearningRate 0.0685 Epoch: 3 Global Step: 142780 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:46:27,512-Speed 2617.94 samples/sec Loss 11.4021 LearningRate 0.0685 Epoch: 3 Global Step: 142790 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:46:31,427-Speed 2616.54 samples/sec Loss 11.2958 LearningRate 0.0685 Epoch: 3 Global Step: 142800 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:46:35,347-Speed 2613.09 samples/sec Loss 11.4198 LearningRate 0.0685 Epoch: 3 Global Step: 142810 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:46:39,260-Speed 2616.94 samples/sec Loss 11.4151 LearningRate 0.0685 Epoch: 3 Global Step: 142820 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:46:43,191-Speed 2605.92 samples/sec Loss 11.5835 LearningRate 0.0685 Epoch: 3 Global Step: 142830 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:46:47,102-Speed 2618.89 samples/sec Loss 11.5020 LearningRate 0.0685 Epoch: 3 Global Step: 142840 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:46:50,982-Speed 2640.53 samples/sec Loss 11.4261 LearningRate 0.0685 Epoch: 3 Global Step: 142850 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:46:54,897-Speed 2615.72 samples/sec Loss 11.5781 LearningRate 0.0685 Epoch: 3 Global Step: 142860 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:46:58,804-Speed 2621.86 samples/sec Loss 11.2675 LearningRate 0.0685 Epoch: 3 Global Step: 142870 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:47:02,707-Speed 2624.23 samples/sec Loss 11.4017 LearningRate 0.0685 Epoch: 3 Global Step: 142880 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:47:06,614-Speed 2621.64 samples/sec Loss 11.3940 LearningRate 0.0685 Epoch: 3 Global Step: 142890 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:47:10,532-Speed 2613.58 samples/sec Loss 11.2714 LearningRate 0.0685 Epoch: 3 Global Step: 142900 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:47:14,433-Speed 2626.04 samples/sec Loss 11.5382 LearningRate 0.0685 Epoch: 3 Global Step: 142910 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:47:18,333-Speed 2626.66 samples/sec Loss 11.4913 LearningRate 0.0685 Epoch: 3 Global Step: 142920 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:47:22,233-Speed 2626.01 samples/sec Loss 11.3118 LearningRate 0.0685 Epoch: 3 Global Step: 142930 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:47:26,137-Speed 2623.94 samples/sec Loss 11.5241 LearningRate 0.0685 Epoch: 3 Global Step: 142940 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:47:30,037-Speed 2626.89 samples/sec Loss 11.3024 LearningRate 0.0685 Epoch: 3 Global Step: 142950 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:47:33,937-Speed 2625.76 samples/sec Loss 11.3797 LearningRate 0.0685 Epoch: 3 Global Step: 142960 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:47:37,853-Speed 2615.56 samples/sec Loss 11.4275 LearningRate 0.0685 Epoch: 3 Global Step: 142970 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:47:41,762-Speed 2620.48 samples/sec Loss 11.3797 LearningRate 0.0685 Epoch: 3 Global Step: 142980 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:47:45,662-Speed 2626.09 samples/sec Loss 11.3124 LearningRate 0.0685 Epoch: 3 Global Step: 142990 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:47:49,565-Speed 2627.18 samples/sec Loss 11.4924 LearningRate 0.0685 Epoch: 3 Global Step: 143000 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:47:53,466-Speed 2625.19 samples/sec Loss 11.3694 LearningRate 0.0685 Epoch: 3 Global Step: 143010 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:47:57,389-Speed 2610.64 samples/sec Loss 11.3455 LearningRate 0.0685 Epoch: 3 Global Step: 143020 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:48:01,369-Speed 2573.39 samples/sec Loss 11.3581 LearningRate 0.0685 Epoch: 3 Global Step: 143030 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:48:05,272-Speed 2624.55 samples/sec Loss 11.2923 LearningRate 0.0685 Epoch: 3 Global Step: 143040 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:48:09,188-Speed 2615.13 samples/sec Loss 11.5425 LearningRate 0.0685 Epoch: 3 Global Step: 143050 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:48:13,089-Speed 2626.11 samples/sec Loss 11.1669 LearningRate 0.0685 Epoch: 3 Global Step: 143060 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:48:16,989-Speed 2625.83 samples/sec Loss 11.2289 LearningRate 0.0685 Epoch: 3 Global Step: 143070 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:48:20,896-Speed 2621.97 samples/sec Loss 11.3269 LearningRate 0.0685 Epoch: 3 Global Step: 143080 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:48:24,800-Speed 2623.80 samples/sec Loss 11.3883 LearningRate 0.0685 Epoch: 3 Global Step: 143090 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:48:28,699-Speed 2626.44 samples/sec Loss 11.4420 LearningRate 0.0685 Epoch: 3 Global Step: 143100 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:48:32,608-Speed 2620.33 samples/sec Loss 11.4190 LearningRate 0.0685 Epoch: 3 Global Step: 143110 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:48:36,515-Speed 2621.84 samples/sec Loss 11.2121 LearningRate 0.0685 Epoch: 3 Global Step: 143120 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 11:48:40,418-Speed 2624.56 samples/sec Loss 11.4982 LearningRate 0.0685 Epoch: 3 Global Step: 143130 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:48:44,322-Speed 2623.84 samples/sec Loss 11.3652 LearningRate 0.0685 Epoch: 3 Global Step: 143140 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:48:48,221-Speed 2626.53 samples/sec Loss 11.4596 LearningRate 0.0685 Epoch: 3 Global Step: 143150 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:48:52,122-Speed 2625.76 samples/sec Loss 11.3742 LearningRate 0.0685 Epoch: 3 Global Step: 143160 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:48:56,022-Speed 2626.13 samples/sec Loss 11.3343 LearningRate 0.0685 Epoch: 3 Global Step: 143170 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:48:59,919-Speed 2628.30 samples/sec Loss 11.4312 LearningRate 0.0685 Epoch: 3 Global Step: 143180 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:49:03,829-Speed 2619.10 samples/sec Loss 11.4237 LearningRate 0.0685 Epoch: 3 Global Step: 143190 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:49:07,757-Speed 2608.55 samples/sec Loss 11.2716 LearningRate 0.0685 Epoch: 3 Global Step: 143200 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:49:11,658-Speed 2625.92 samples/sec Loss 11.4015 LearningRate 0.0685 Epoch: 3 Global Step: 143210 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:49:15,555-Speed 2627.97 samples/sec Loss 11.2810 LearningRate 0.0685 Epoch: 3 Global Step: 143220 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:49:19,462-Speed 2621.56 samples/sec Loss 11.4352 LearningRate 0.0685 Epoch: 3 Global Step: 143230 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:49:23,366-Speed 2623.57 samples/sec Loss 11.3429 LearningRate 0.0684 Epoch: 3 Global Step: 143240 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:49:27,270-Speed 2623.96 samples/sec Loss 11.4525 LearningRate 0.0684 Epoch: 3 Global Step: 143250 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:49:31,167-Speed 2628.15 samples/sec Loss 11.3752 LearningRate 0.0684 Epoch: 3 Global Step: 143260 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:49:35,065-Speed 2627.60 samples/sec Loss 11.3636 LearningRate 0.0684 Epoch: 3 Global Step: 143270 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:49:38,963-Speed 2627.50 samples/sec Loss 11.3162 LearningRate 0.0684 Epoch: 3 Global Step: 143280 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:49:42,865-Speed 2624.86 samples/sec Loss 11.3827 LearningRate 0.0684 Epoch: 3 Global Step: 143290 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:49:46,747-Speed 2638.89 samples/sec Loss 11.2654 LearningRate 0.0684 Epoch: 3 Global Step: 143300 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:49:50,648-Speed 2625.55 samples/sec Loss 11.4236 LearningRate 0.0684 Epoch: 3 Global Step: 143310 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:49:54,552-Speed 2623.72 samples/sec Loss 11.3961 LearningRate 0.0684 Epoch: 3 Global Step: 143320 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:49:58,475-Speed 2611.27 samples/sec Loss 11.4203 LearningRate 0.0684 Epoch: 3 Global Step: 143330 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:50:02,399-Speed 2609.83 samples/sec Loss 11.2528 LearningRate 0.0684 Epoch: 3 Global Step: 143340 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:50:06,317-Speed 2614.41 samples/sec Loss 11.3385 LearningRate 0.0684 Epoch: 3 Global Step: 143350 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:50:10,248-Speed 2605.78 samples/sec Loss 11.1412 LearningRate 0.0684 Epoch: 3 Global Step: 143360 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:50:14,143-Speed 2629.59 samples/sec Loss 11.4868 LearningRate 0.0684 Epoch: 3 Global Step: 143370 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:50:18,053-Speed 2619.68 samples/sec Loss 11.2580 LearningRate 0.0684 Epoch: 3 Global Step: 143380 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:50:21,961-Speed 2620.75 samples/sec Loss 11.2077 LearningRate 0.0684 Epoch: 3 Global Step: 143390 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:50:25,866-Speed 2623.28 samples/sec Loss 11.2748 LearningRate 0.0684 Epoch: 3 Global Step: 143400 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:50:29,762-Speed 2628.87 samples/sec Loss 11.5051 LearningRate 0.0684 Epoch: 3 Global Step: 143410 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:50:33,667-Speed 2622.71 samples/sec Loss 11.3778 LearningRate 0.0684 Epoch: 3 Global Step: 143420 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:50:37,576-Speed 2620.11 samples/sec Loss 11.5139 LearningRate 0.0684 Epoch: 3 Global Step: 143430 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:50:41,487-Speed 2619.15 samples/sec Loss 11.4445 LearningRate 0.0684 Epoch: 3 Global Step: 143440 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:50:45,420-Speed 2604.22 samples/sec Loss 11.2727 LearningRate 0.0684 Epoch: 3 Global Step: 143450 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:50:49,343-Speed 2611.10 samples/sec Loss 11.3448 LearningRate 0.0684 Epoch: 3 Global Step: 143460 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:50:53,256-Speed 2618.12 samples/sec Loss 11.4153 LearningRate 0.0684 Epoch: 3 Global Step: 143470 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:50:57,161-Speed 2622.35 samples/sec Loss 11.4458 LearningRate 0.0684 Epoch: 3 Global Step: 143480 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:51:01,078-Speed 2615.08 samples/sec Loss 11.3737 LearningRate 0.0684 Epoch: 3 Global Step: 143490 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:51:04,986-Speed 2621.10 samples/sec Loss 11.1666 LearningRate 0.0684 Epoch: 3 Global Step: 143500 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:51:08,944-Speed 2587.53 samples/sec Loss 11.3288 LearningRate 0.0684 Epoch: 3 Global Step: 143510 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:51:12,846-Speed 2624.98 samples/sec Loss 11.3731 LearningRate 0.0684 Epoch: 3 Global Step: 143520 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:51:16,743-Speed 2628.66 samples/sec Loss 11.4612 LearningRate 0.0684 Epoch: 3 Global Step: 143530 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:51:20,620-Speed 2641.72 samples/sec Loss 11.2823 LearningRate 0.0684 Epoch: 3 Global Step: 143540 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:51:24,515-Speed 2629.81 samples/sec Loss 11.4197 LearningRate 0.0684 Epoch: 3 Global Step: 143550 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:51:28,436-Speed 2612.72 samples/sec Loss 11.3761 LearningRate 0.0684 Epoch: 3 Global Step: 143560 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:51:32,357-Speed 2612.03 samples/sec Loss 11.3301 LearningRate 0.0684 Epoch: 3 Global Step: 143570 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:51:36,264-Speed 2621.38 samples/sec Loss 11.3465 LearningRate 0.0684 Epoch: 3 Global Step: 143580 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:51:40,163-Speed 2627.14 samples/sec Loss 11.4766 LearningRate 0.0684 Epoch: 3 Global Step: 143590 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:51:44,082-Speed 2613.51 samples/sec Loss 11.3247 LearningRate 0.0684 Epoch: 3 Global Step: 143600 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:51:47,975-Speed 2631.50 samples/sec Loss 11.2839 LearningRate 0.0684 Epoch: 3 Global Step: 143610 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:51:51,890-Speed 2616.58 samples/sec Loss 11.3397 LearningRate 0.0684 Epoch: 3 Global Step: 143620 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:51:55,779-Speed 2633.27 samples/sec Loss 11.2591 LearningRate 0.0684 Epoch: 3 Global Step: 143630 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:51:59,681-Speed 2625.66 samples/sec Loss 11.3374 LearningRate 0.0684 Epoch: 3 Global Step: 143640 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:52:03,576-Speed 2629.42 samples/sec Loss 11.3049 LearningRate 0.0684 Epoch: 3 Global Step: 143650 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:52:07,473-Speed 2628.56 samples/sec Loss 11.4005 LearningRate 0.0684 Epoch: 3 Global Step: 143660 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:52:11,380-Speed 2621.15 samples/sec Loss 11.4552 LearningRate 0.0684 Epoch: 3 Global Step: 143670 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:52:15,276-Speed 2629.02 samples/sec Loss 11.4094 LearningRate 0.0684 Epoch: 3 Global Step: 143680 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:52:19,173-Speed 2628.55 samples/sec Loss 11.1909 LearningRate 0.0684 Epoch: 3 Global Step: 143690 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:52:23,073-Speed 2626.61 samples/sec Loss 11.3525 LearningRate 0.0684 Epoch: 3 Global Step: 143700 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:52:26,993-Speed 2612.25 samples/sec Loss 11.3216 LearningRate 0.0684 Epoch: 3 Global Step: 143710 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:52:30,892-Speed 2627.86 samples/sec Loss 11.3622 LearningRate 0.0684 Epoch: 3 Global Step: 143720 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:52:34,789-Speed 2628.04 samples/sec Loss 11.4707 LearningRate 0.0684 Epoch: 3 Global Step: 143730 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:52:38,676-Speed 2634.77 samples/sec Loss 11.3827 LearningRate 0.0683 Epoch: 3 Global Step: 143740 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:52:42,573-Speed 2628.37 samples/sec Loss 11.2181 LearningRate 0.0683 Epoch: 3 Global Step: 143750 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:52:46,473-Speed 2626.68 samples/sec Loss 11.2367 LearningRate 0.0683 Epoch: 3 Global Step: 143760 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:52:50,373-Speed 2626.56 samples/sec Loss 11.3859 LearningRate 0.0683 Epoch: 3 Global Step: 143770 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:52:54,269-Speed 2628.68 samples/sec Loss 11.2890 LearningRate 0.0683 Epoch: 3 Global Step: 143780 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:52:58,192-Speed 2611.11 samples/sec Loss 11.4129 LearningRate 0.0683 Epoch: 3 Global Step: 143790 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:53:02,095-Speed 2624.75 samples/sec Loss 11.3436 LearningRate 0.0683 Epoch: 3 Global Step: 143800 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:53:06,037-Speed 2597.90 samples/sec Loss 11.5055 LearningRate 0.0683 Epoch: 3 Global Step: 143810 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:53:09,936-Speed 2627.15 samples/sec Loss 11.4390 LearningRate 0.0683 Epoch: 3 Global Step: 143820 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:53:13,839-Speed 2624.21 samples/sec Loss 11.2670 LearningRate 0.0683 Epoch: 3 Global Step: 143830 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:53:17,714-Speed 2643.13 samples/sec Loss 11.3703 LearningRate 0.0683 Epoch: 3 Global Step: 143840 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:53:21,612-Speed 2628.33 samples/sec Loss 11.3018 LearningRate 0.0683 Epoch: 3 Global Step: 143850 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:53:25,509-Speed 2627.76 samples/sec Loss 11.2876 LearningRate 0.0683 Epoch: 3 Global Step: 143860 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:53:29,798-Speed 2388.38 samples/sec Loss 11.4827 LearningRate 0.0683 Epoch: 3 Global Step: 143870 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:53:33,692-Speed 2630.56 samples/sec Loss 11.3545 LearningRate 0.0683 Epoch: 3 Global Step: 143880 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:53:37,585-Speed 2630.59 samples/sec Loss 11.3555 LearningRate 0.0683 Epoch: 3 Global Step: 143890 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:53:41,484-Speed 2627.60 samples/sec Loss 11.4138 LearningRate 0.0683 Epoch: 3 Global Step: 143900 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:53:45,380-Speed 2628.74 samples/sec Loss 11.3944 LearningRate 0.0683 Epoch: 3 Global Step: 143910 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:53:49,273-Speed 2630.96 samples/sec Loss 11.3637 LearningRate 0.0683 Epoch: 3 Global Step: 143920 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:53:53,172-Speed 2627.35 samples/sec Loss 11.3023 LearningRate 0.0683 Epoch: 3 Global Step: 143930 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:53:57,069-Speed 2628.03 samples/sec Loss 11.4037 LearningRate 0.0683 Epoch: 3 Global Step: 143940 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:54:00,973-Speed 2623.77 samples/sec Loss 11.2783 LearningRate 0.0683 Epoch: 3 Global Step: 143950 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:54:04,875-Speed 2624.96 samples/sec Loss 11.2719 LearningRate 0.0683 Epoch: 3 Global Step: 143960 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:54:08,755-Speed 2639.59 samples/sec Loss 11.4395 LearningRate 0.0683 Epoch: 3 Global Step: 143970 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:54:12,653-Speed 2627.71 samples/sec Loss 11.4096 LearningRate 0.0683 Epoch: 3 Global Step: 143980 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:54:16,553-Speed 2626.70 samples/sec Loss 11.1862 LearningRate 0.0683 Epoch: 3 Global Step: 143990 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:54:20,451-Speed 2627.83 samples/sec Loss 11.3790 LearningRate 0.0683 Epoch: 3 Global Step: 144000 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:54:24,349-Speed 2627.45 samples/sec Loss 11.4311 LearningRate 0.0683 Epoch: 3 Global Step: 144010 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:54:28,265-Speed 2615.46 samples/sec Loss 11.3483 LearningRate 0.0683 Epoch: 3 Global Step: 144020 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:54:32,161-Speed 2629.44 samples/sec Loss 11.3314 LearningRate 0.0683 Epoch: 3 Global Step: 144030 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:54:36,058-Speed 2628.59 samples/sec Loss 11.1951 LearningRate 0.0683 Epoch: 3 Global Step: 144040 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:54:39,956-Speed 2627.35 samples/sec Loss 11.4210 LearningRate 0.0683 Epoch: 3 Global Step: 144050 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:54:43,856-Speed 2626.85 samples/sec Loss 11.3370 LearningRate 0.0683 Epoch: 3 Global Step: 144060 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:54:47,762-Speed 2622.52 samples/sec Loss 11.4236 LearningRate 0.0683 Epoch: 3 Global Step: 144070 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:54:51,685-Speed 2610.62 samples/sec Loss 11.4796 LearningRate 0.0683 Epoch: 3 Global Step: 144080 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:54:55,591-Speed 2622.94 samples/sec Loss 11.3222 LearningRate 0.0683 Epoch: 3 Global Step: 144090 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:54:59,494-Speed 2623.72 samples/sec Loss 11.4548 LearningRate 0.0683 Epoch: 3 Global Step: 144100 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:55:03,440-Speed 2595.74 samples/sec Loss 11.3135 LearningRate 0.0683 Epoch: 3 Global Step: 144110 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:55:07,356-Speed 2615.54 samples/sec Loss 11.4123 LearningRate 0.0683 Epoch: 3 Global Step: 144120 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:55:11,272-Speed 2616.08 samples/sec Loss 11.2495 LearningRate 0.0683 Epoch: 3 Global Step: 144130 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:55:15,192-Speed 2612.94 samples/sec Loss 11.2535 LearningRate 0.0683 Epoch: 3 Global Step: 144140 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:55:19,108-Speed 2615.49 samples/sec Loss 11.5966 LearningRate 0.0683 Epoch: 3 Global Step: 144150 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:55:22,988-Speed 2639.76 samples/sec Loss 11.3114 LearningRate 0.0683 Epoch: 3 Global Step: 144160 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:55:26,885-Speed 2628.35 samples/sec Loss 11.4691 LearningRate 0.0683 Epoch: 3 Global Step: 144170 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:55:30,798-Speed 2617.54 samples/sec Loss 11.2002 LearningRate 0.0683 Epoch: 3 Global Step: 144180 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:55:34,706-Speed 2621.11 samples/sec Loss 11.2825 LearningRate 0.0683 Epoch: 3 Global Step: 144190 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:55:38,630-Speed 2610.06 samples/sec Loss 11.5294 LearningRate 0.0683 Epoch: 3 Global Step: 144200 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:55:42,536-Speed 2622.33 samples/sec Loss 11.2418 LearningRate 0.0683 Epoch: 3 Global Step: 144210 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:55:46,431-Speed 2629.85 samples/sec Loss 11.3954 LearningRate 0.0683 Epoch: 3 Global Step: 144220 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:55:50,336-Speed 2623.06 samples/sec Loss 11.2670 LearningRate 0.0683 Epoch: 3 Global Step: 144230 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:55:54,234-Speed 2627.80 samples/sec Loss 11.3348 LearningRate 0.0682 Epoch: 3 Global Step: 144240 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:55:58,134-Speed 2626.22 samples/sec Loss 11.2415 LearningRate 0.0682 Epoch: 3 Global Step: 144250 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:56:02,032-Speed 2627.59 samples/sec Loss 11.3362 LearningRate 0.0682 Epoch: 3 Global Step: 144260 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:56:05,932-Speed 2625.77 samples/sec Loss 11.2445 LearningRate 0.0682 Epoch: 3 Global Step: 144270 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:56:09,832-Speed 2627.13 samples/sec Loss 11.3054 LearningRate 0.0682 Epoch: 3 Global Step: 144280 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:56:13,729-Speed 2628.11 samples/sec Loss 11.5105 LearningRate 0.0682 Epoch: 3 Global Step: 144290 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:56:17,658-Speed 2606.98 samples/sec Loss 11.3318 LearningRate 0.0682 Epoch: 3 Global Step: 144300 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:56:21,558-Speed 2626.48 samples/sec Loss 11.2847 LearningRate 0.0682 Epoch: 3 Global Step: 144310 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:56:25,437-Speed 2640.58 samples/sec Loss 11.3461 LearningRate 0.0682 Epoch: 3 Global Step: 144320 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:56:29,348-Speed 2619.15 samples/sec Loss 11.3295 LearningRate 0.0682 Epoch: 3 Global Step: 144330 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:56:33,246-Speed 2627.67 samples/sec Loss 11.3617 LearningRate 0.0682 Epoch: 3 Global Step: 144340 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:56:37,146-Speed 2625.96 samples/sec Loss 11.3379 LearningRate 0.0682 Epoch: 3 Global Step: 144350 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:56:41,058-Speed 2618.87 samples/sec Loss 11.1887 LearningRate 0.0682 Epoch: 3 Global Step: 144360 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:56:44,953-Speed 2629.91 samples/sec Loss 11.3410 LearningRate 0.0682 Epoch: 3 Global Step: 144370 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:56:48,846-Speed 2631.17 samples/sec Loss 11.1467 LearningRate 0.0682 Epoch: 3 Global Step: 144380 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:56:52,768-Speed 2611.58 samples/sec Loss 11.2636 LearningRate 0.0682 Epoch: 3 Global Step: 144390 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:56:56,669-Speed 2625.91 samples/sec Loss 11.4993 LearningRate 0.0682 Epoch: 3 Global Step: 144400 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:57:00,580-Speed 2618.78 samples/sec Loss 11.3079 LearningRate 0.0682 Epoch: 3 Global Step: 144410 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:57:04,488-Speed 2620.89 samples/sec Loss 11.5640 LearningRate 0.0682 Epoch: 3 Global Step: 144420 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:57:08,380-Speed 2631.53 samples/sec Loss 11.3615 LearningRate 0.0682 Epoch: 3 Global Step: 144430 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:57:12,275-Speed 2629.60 samples/sec Loss 11.4308 LearningRate 0.0682 Epoch: 3 Global Step: 144440 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:57:16,180-Speed 2623.85 samples/sec Loss 11.3126 LearningRate 0.0682 Epoch: 3 Global Step: 144450 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:57:20,074-Speed 2630.77 samples/sec Loss 11.2255 LearningRate 0.0682 Epoch: 3 Global Step: 144460 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:57:23,951-Speed 2641.64 samples/sec Loss 11.3897 LearningRate 0.0682 Epoch: 3 Global Step: 144470 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:57:27,856-Speed 2623.30 samples/sec Loss 11.2246 LearningRate 0.0682 Epoch: 3 Global Step: 144480 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:57:31,750-Speed 2630.02 samples/sec Loss 11.3708 LearningRate 0.0682 Epoch: 3 Global Step: 144490 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:57:35,643-Speed 2630.68 samples/sec Loss 11.4184 LearningRate 0.0682 Epoch: 3 Global Step: 144500 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:57:39,539-Speed 2629.28 samples/sec Loss 11.3603 LearningRate 0.0682 Epoch: 3 Global Step: 144510 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:57:43,447-Speed 2620.54 samples/sec Loss 11.3762 LearningRate 0.0682 Epoch: 3 Global Step: 144520 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:57:47,347-Speed 2626.82 samples/sec Loss 11.3577 LearningRate 0.0682 Epoch: 3 Global Step: 144530 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:57:51,245-Speed 2627.71 samples/sec Loss 11.3368 LearningRate 0.0682 Epoch: 3 Global Step: 144540 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:57:55,145-Speed 2626.02 samples/sec Loss 11.3305 LearningRate 0.0682 Epoch: 3 Global Step: 144550 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:57:59,040-Speed 2630.59 samples/sec Loss 11.3005 LearningRate 0.0682 Epoch: 3 Global Step: 144560 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:58:02,939-Speed 2626.19 samples/sec Loss 11.4359 LearningRate 0.0682 Epoch: 3 Global Step: 144570 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:58:06,822-Speed 2637.83 samples/sec Loss 11.2130 LearningRate 0.0682 Epoch: 3 Global Step: 144580 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:58:10,740-Speed 2614.44 samples/sec Loss 11.3349 LearningRate 0.0682 Epoch: 3 Global Step: 144590 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:58:14,637-Speed 2628.64 samples/sec Loss 11.3918 LearningRate 0.0682 Epoch: 3 Global Step: 144600 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:58:18,532-Speed 2629.25 samples/sec Loss 11.1919 LearningRate 0.0682 Epoch: 3 Global Step: 144610 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:58:22,436-Speed 2624.16 samples/sec Loss 11.2349 LearningRate 0.0682 Epoch: 3 Global Step: 144620 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:58:26,352-Speed 2615.42 samples/sec Loss 11.2528 LearningRate 0.0682 Epoch: 3 Global Step: 144630 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:58:30,247-Speed 2630.64 samples/sec Loss 11.3412 LearningRate 0.0682 Epoch: 3 Global Step: 144640 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:58:34,145-Speed 2627.74 samples/sec Loss 11.2090 LearningRate 0.0682 Epoch: 3 Global Step: 144650 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:58:38,045-Speed 2625.83 samples/sec Loss 11.3877 LearningRate 0.0682 Epoch: 3 Global Step: 144660 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:58:41,946-Speed 2625.15 samples/sec Loss 11.2594 LearningRate 0.0682 Epoch: 3 Global Step: 144670 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:58:45,866-Speed 2612.98 samples/sec Loss 11.3496 LearningRate 0.0682 Epoch: 3 Global Step: 144680 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:58:49,774-Speed 2620.80 samples/sec Loss 11.2996 LearningRate 0.0682 Epoch: 3 Global Step: 144690 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:58:53,674-Speed 2627.16 samples/sec Loss 11.4878 LearningRate 0.0682 Epoch: 3 Global Step: 144700 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:58:57,554-Speed 2639.45 samples/sec Loss 11.3865 LearningRate 0.0682 Epoch: 3 Global Step: 144710 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:59:01,448-Speed 2631.09 samples/sec Loss 11.2829 LearningRate 0.0682 Epoch: 3 Global Step: 144720 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:59:05,345-Speed 2628.40 samples/sec Loss 11.4582 LearningRate 0.0682 Epoch: 3 Global Step: 144730 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:59:09,247-Speed 2624.61 samples/sec Loss 11.2651 LearningRate 0.0681 Epoch: 3 Global Step: 144740 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:59:13,141-Speed 2629.80 samples/sec Loss 11.4304 LearningRate 0.0681 Epoch: 3 Global Step: 144750 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:59:17,037-Speed 2629.32 samples/sec Loss 11.2763 LearningRate 0.0681 Epoch: 3 Global Step: 144760 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:59:20,930-Speed 2631.43 samples/sec Loss 11.2207 LearningRate 0.0681 Epoch: 3 Global Step: 144770 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:59:24,822-Speed 2631.75 samples/sec Loss 11.3156 LearningRate 0.0681 Epoch: 3 Global Step: 144780 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:59:28,722-Speed 2626.60 samples/sec Loss 11.2079 LearningRate 0.0681 Epoch: 3 Global Step: 144790 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:59:32,621-Speed 2627.07 samples/sec Loss 11.3867 LearningRate 0.0681 Epoch: 3 Global Step: 144800 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 11:59:36,528-Speed 2621.74 samples/sec Loss 11.2268 LearningRate 0.0681 Epoch: 3 Global Step: 144810 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:59:40,421-Speed 2630.84 samples/sec Loss 11.3820 LearningRate 0.0681 Epoch: 3 Global Step: 144820 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:59:44,313-Speed 2631.66 samples/sec Loss 11.4718 LearningRate 0.0681 Epoch: 3 Global Step: 144830 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:59:48,214-Speed 2625.74 samples/sec Loss 11.4413 LearningRate 0.0681 Epoch: 3 Global Step: 144840 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:59:52,115-Speed 2625.51 samples/sec Loss 11.3103 LearningRate 0.0681 Epoch: 3 Global Step: 144850 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:59:56,033-Speed 2614.05 samples/sec Loss 11.3323 LearningRate 0.0681 Epoch: 3 Global Step: 144860 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 11:59:59,943-Speed 2620.14 samples/sec Loss 11.3601 LearningRate 0.0681 Epoch: 3 Global Step: 144870 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:00:03,849-Speed 2622.59 samples/sec Loss 11.3537 LearningRate 0.0681 Epoch: 3 Global Step: 144880 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:00:07,785-Speed 2601.56 samples/sec Loss 11.1541 LearningRate 0.0681 Epoch: 3 Global Step: 144890 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:00:11,693-Speed 2620.99 samples/sec Loss 11.3767 LearningRate 0.0681 Epoch: 3 Global Step: 144900 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:00:15,559-Speed 2649.88 samples/sec Loss 11.3110 LearningRate 0.0681 Epoch: 3 Global Step: 144910 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:00:19,454-Speed 2629.66 samples/sec Loss 11.3541 LearningRate 0.0681 Epoch: 3 Global Step: 144920 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:00:23,359-Speed 2622.39 samples/sec Loss 11.3432 LearningRate 0.0681 Epoch: 3 Global Step: 144930 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:00:27,261-Speed 2625.29 samples/sec Loss 11.3400 LearningRate 0.0681 Epoch: 3 Global Step: 144940 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:00:31,157-Speed 2629.22 samples/sec Loss 11.3020 LearningRate 0.0681 Epoch: 3 Global Step: 144950 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:00:35,052-Speed 2629.87 samples/sec Loss 11.3396 LearningRate 0.0681 Epoch: 3 Global Step: 144960 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:00:38,982-Speed 2605.53 samples/sec Loss 11.3423 LearningRate 0.0681 Epoch: 3 Global Step: 144970 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:00:42,887-Speed 2622.83 samples/sec Loss 11.3327 LearningRate 0.0681 Epoch: 3 Global Step: 144980 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:00:46,791-Speed 2624.17 samples/sec Loss 11.2974 LearningRate 0.0681 Epoch: 3 Global Step: 144990 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:00:50,658-Speed 2648.84 samples/sec Loss 11.3329 LearningRate 0.0681 Epoch: 3 Global Step: 145000 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-04-13 12:00:54,550-Speed 2631.60 samples/sec Loss 11.4481 LearningRate 0.0681 Epoch: 3 Global Step: 145010 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-04-13 12:00:58,447-Speed 2628.24 samples/sec Loss 11.2476 LearningRate 0.0681 Epoch: 3 Global Step: 145020 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-04-13 12:01:02,338-Speed 2632.36 samples/sec Loss 11.3089 LearningRate 0.0681 Epoch: 3 Global Step: 145030 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-04-13 12:01:06,231-Speed 2631.27 samples/sec Loss 11.2747 LearningRate 0.0681 Epoch: 3 Global Step: 145040 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-04-13 12:01:10,126-Speed 2629.69 samples/sec Loss 11.1733 LearningRate 0.0681 Epoch: 3 Global Step: 145050 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-04-13 12:01:14,017-Speed 2632.10 samples/sec Loss 11.2600 LearningRate 0.0681 Epoch: 3 Global Step: 145060 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-04-13 12:01:17,907-Speed 2633.60 samples/sec Loss 11.3667 LearningRate 0.0681 Epoch: 3 Global Step: 145070 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-04-13 12:01:21,805-Speed 2627.91 samples/sec Loss 11.3362 LearningRate 0.0681 Epoch: 3 Global Step: 145080 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-04-13 12:01:25,694-Speed 2633.60 samples/sec Loss 11.4117 LearningRate 0.0681 Epoch: 3 Global Step: 145090 Fp16 Grad Scale: 16384 Required: 77 hours
Training: 2022-04-13 12:01:29,583-Speed 2633.57 samples/sec Loss 11.3965 LearningRate 0.0681 Epoch: 3 Global Step: 145100 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:01:33,473-Speed 2632.95 samples/sec Loss 11.2860 LearningRate 0.0681 Epoch: 3 Global Step: 145110 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:01:37,363-Speed 2632.95 samples/sec Loss 11.2945 LearningRate 0.0681 Epoch: 3 Global Step: 145120 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:01:41,255-Speed 2632.12 samples/sec Loss 11.3820 LearningRate 0.0681 Epoch: 3 Global Step: 145130 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:01:45,154-Speed 2626.67 samples/sec Loss 11.4400 LearningRate 0.0681 Epoch: 3 Global Step: 145140 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:01:49,045-Speed 2632.58 samples/sec Loss 11.2296 LearningRate 0.0681 Epoch: 3 Global Step: 145150 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:01:52,935-Speed 2633.15 samples/sec Loss 11.3105 LearningRate 0.0681 Epoch: 3 Global Step: 145160 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:01:56,825-Speed 2633.27 samples/sec Loss 11.2113 LearningRate 0.0681 Epoch: 3 Global Step: 145170 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:02:00,721-Speed 2629.28 samples/sec Loss 11.1974 LearningRate 0.0681 Epoch: 3 Global Step: 145180 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:02:04,614-Speed 2630.88 samples/sec Loss 11.4205 LearningRate 0.0681 Epoch: 3 Global Step: 145190 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:02:08,509-Speed 2628.92 samples/sec Loss 11.3576 LearningRate 0.0681 Epoch: 3 Global Step: 145200 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:02:12,407-Speed 2628.24 samples/sec Loss 11.3191 LearningRate 0.0681 Epoch: 3 Global Step: 145210 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:02:16,306-Speed 2626.75 samples/sec Loss 11.2745 LearningRate 0.0681 Epoch: 3 Global Step: 145220 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:02:20,202-Speed 2628.96 samples/sec Loss 11.2213 LearningRate 0.0681 Epoch: 3 Global Step: 145230 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:02:24,093-Speed 2632.38 samples/sec Loss 11.1040 LearningRate 0.0680 Epoch: 3 Global Step: 145240 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:02:27,987-Speed 2630.66 samples/sec Loss 11.3500 LearningRate 0.0680 Epoch: 3 Global Step: 145250 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:02:31,876-Speed 2633.09 samples/sec Loss 11.3860 LearningRate 0.0680 Epoch: 3 Global Step: 145260 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:02:35,771-Speed 2629.32 samples/sec Loss 11.2709 LearningRate 0.0680 Epoch: 3 Global Step: 145270 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:02:39,666-Speed 2630.31 samples/sec Loss 11.3316 LearningRate 0.0680 Epoch: 3 Global Step: 145280 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:02:43,615-Speed 2593.85 samples/sec Loss 11.2007 LearningRate 0.0680 Epoch: 3 Global Step: 145290 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:02:47,531-Speed 2615.63 samples/sec Loss 11.2019 LearningRate 0.0680 Epoch: 3 Global Step: 145300 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:02:51,448-Speed 2614.98 samples/sec Loss 11.4168 LearningRate 0.0680 Epoch: 3 Global Step: 145310 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:02:55,361-Speed 2617.73 samples/sec Loss 11.3705 LearningRate 0.0680 Epoch: 3 Global Step: 145320 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:02:59,274-Speed 2617.53 samples/sec Loss 11.4740 LearningRate 0.0680 Epoch: 3 Global Step: 145330 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:03:03,187-Speed 2617.62 samples/sec Loss 11.2923 LearningRate 0.0680 Epoch: 3 Global Step: 145340 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:03:07,101-Speed 2616.44 samples/sec Loss 11.3271 LearningRate 0.0680 Epoch: 3 Global Step: 145350 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:03:11,017-Speed 2615.61 samples/sec Loss 11.4143 LearningRate 0.0680 Epoch: 3 Global Step: 145360 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:03:14,928-Speed 2618.96 samples/sec Loss 11.3862 LearningRate 0.0680 Epoch: 3 Global Step: 145370 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:03:18,829-Speed 2625.70 samples/sec Loss 11.2450 LearningRate 0.0680 Epoch: 3 Global Step: 145380 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:03:22,720-Speed 2632.41 samples/sec Loss 11.3539 LearningRate 0.0680 Epoch: 3 Global Step: 145390 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:03:26,615-Speed 2629.22 samples/sec Loss 11.2771 LearningRate 0.0680 Epoch: 3 Global Step: 145400 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:03:30,503-Speed 2635.58 samples/sec Loss 11.4279 LearningRate 0.0680 Epoch: 3 Global Step: 145410 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:03:34,422-Speed 2613.45 samples/sec Loss 11.2700 LearningRate 0.0680 Epoch: 3 Global Step: 145420 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:03:38,344-Speed 2611.25 samples/sec Loss 11.4144 LearningRate 0.0680 Epoch: 3 Global Step: 145430 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:03:42,270-Speed 2609.10 samples/sec Loss 11.4399 LearningRate 0.0680 Epoch: 3 Global Step: 145440 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:03:46,171-Speed 2625.29 samples/sec Loss 11.3131 LearningRate 0.0680 Epoch: 3 Global Step: 145450 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:03:50,064-Speed 2631.11 samples/sec Loss 11.5128 LearningRate 0.0680 Epoch: 3 Global Step: 145460 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:03:53,970-Speed 2622.96 samples/sec Loss 11.2560 LearningRate 0.0680 Epoch: 3 Global Step: 145470 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:03:57,866-Speed 2628.50 samples/sec Loss 11.3009 LearningRate 0.0680 Epoch: 3 Global Step: 145480 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:04:01,759-Speed 2631.71 samples/sec Loss 11.3572 LearningRate 0.0680 Epoch: 3 Global Step: 145490 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:04:05,656-Speed 2628.34 samples/sec Loss 11.2709 LearningRate 0.0680 Epoch: 3 Global Step: 145500 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:04:09,548-Speed 2631.30 samples/sec Loss 11.2755 LearningRate 0.0680 Epoch: 3 Global Step: 145510 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:04:13,444-Speed 2628.70 samples/sec Loss 11.2194 LearningRate 0.0680 Epoch: 3 Global Step: 145520 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:04:17,338-Speed 2630.64 samples/sec Loss 11.2545 LearningRate 0.0680 Epoch: 3 Global Step: 145530 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:04:21,284-Speed 2596.30 samples/sec Loss 11.2436 LearningRate 0.0680 Epoch: 3 Global Step: 145540 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:04:25,197-Speed 2617.32 samples/sec Loss 11.2325 LearningRate 0.0680 Epoch: 3 Global Step: 145550 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:04:29,108-Speed 2618.95 samples/sec Loss 11.2291 LearningRate 0.0680 Epoch: 3 Global Step: 145560 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:04:33,011-Speed 2624.88 samples/sec Loss 11.3407 LearningRate 0.0680 Epoch: 3 Global Step: 145570 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:04:36,906-Speed 2629.28 samples/sec Loss 11.1417 LearningRate 0.0680 Epoch: 3 Global Step: 145580 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:04:40,809-Speed 2624.24 samples/sec Loss 11.3615 LearningRate 0.0680 Epoch: 3 Global Step: 145590 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:04:44,709-Speed 2626.99 samples/sec Loss 11.4244 LearningRate 0.0680 Epoch: 3 Global Step: 145600 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:04:48,769-Speed 2522.22 samples/sec Loss 11.3314 LearningRate 0.0680 Epoch: 3 Global Step: 145610 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:04:52,845-Speed 2513.09 samples/sec Loss 11.2026 LearningRate 0.0680 Epoch: 3 Global Step: 145620 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:04:56,737-Speed 2631.98 samples/sec Loss 11.4365 LearningRate 0.0680 Epoch: 3 Global Step: 145630 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:05:00,630-Speed 2630.85 samples/sec Loss 11.1501 LearningRate 0.0680 Epoch: 3 Global Step: 145640 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:05:04,528-Speed 2628.13 samples/sec Loss 11.3399 LearningRate 0.0680 Epoch: 3 Global Step: 145650 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:05:08,406-Speed 2640.66 samples/sec Loss 11.3366 LearningRate 0.0680 Epoch: 3 Global Step: 145660 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:05:12,286-Speed 2639.81 samples/sec Loss 11.2301 LearningRate 0.0680 Epoch: 3 Global Step: 145670 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:05:16,186-Speed 2626.28 samples/sec Loss 11.2316 LearningRate 0.0680 Epoch: 3 Global Step: 145680 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:05:20,089-Speed 2624.62 samples/sec Loss 11.3405 LearningRate 0.0680 Epoch: 3 Global Step: 145690 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:05:24,002-Speed 2618.04 samples/sec Loss 11.4510 LearningRate 0.0680 Epoch: 3 Global Step: 145700 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:05:27,927-Speed 2609.07 samples/sec Loss 11.3627 LearningRate 0.0680 Epoch: 3 Global Step: 145710 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:05:31,828-Speed 2626.26 samples/sec Loss 11.2470 LearningRate 0.0680 Epoch: 3 Global Step: 145720 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:05:35,746-Speed 2614.24 samples/sec Loss 11.3217 LearningRate 0.0680 Epoch: 3 Global Step: 145730 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:05:39,647-Speed 2625.92 samples/sec Loss 11.3879 LearningRate 0.0680 Epoch: 3 Global Step: 145740 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:05:43,552-Speed 2622.43 samples/sec Loss 11.2502 LearningRate 0.0679 Epoch: 3 Global Step: 145750 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:05:47,457-Speed 2623.60 samples/sec Loss 11.3596 LearningRate 0.0679 Epoch: 3 Global Step: 145760 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:05:51,364-Speed 2621.73 samples/sec Loss 11.2809 LearningRate 0.0679 Epoch: 3 Global Step: 145770 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:05:55,269-Speed 2622.58 samples/sec Loss 11.2404 LearningRate 0.0679 Epoch: 3 Global Step: 145780 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:05:59,185-Speed 2616.25 samples/sec Loss 11.1463 LearningRate 0.0679 Epoch: 3 Global Step: 145790 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:06:03,095-Speed 2619.14 samples/sec Loss 11.3900 LearningRate 0.0679 Epoch: 3 Global Step: 145800 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:06:06,997-Speed 2625.01 samples/sec Loss 11.2950 LearningRate 0.0679 Epoch: 3 Global Step: 145810 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:06:10,893-Speed 2629.36 samples/sec Loss 11.2848 LearningRate 0.0679 Epoch: 3 Global Step: 145820 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:06:14,800-Speed 2621.25 samples/sec Loss 11.2542 LearningRate 0.0679 Epoch: 3 Global Step: 145830 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:06:18,697-Speed 2628.61 samples/sec Loss 11.2598 LearningRate 0.0679 Epoch: 3 Global Step: 145840 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:06:22,589-Speed 2631.86 samples/sec Loss 11.4223 LearningRate 0.0679 Epoch: 3 Global Step: 145850 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:06:26,489-Speed 2626.90 samples/sec Loss 11.4958 LearningRate 0.0679 Epoch: 3 Global Step: 145860 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:06:30,415-Speed 2608.49 samples/sec Loss 11.2237 LearningRate 0.0679 Epoch: 3 Global Step: 145870 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:06:34,324-Speed 2620.26 samples/sec Loss 11.2599 LearningRate 0.0679 Epoch: 3 Global Step: 145880 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:06:38,202-Speed 2641.32 samples/sec Loss 11.4089 LearningRate 0.0679 Epoch: 3 Global Step: 145890 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:06:42,123-Speed 2612.78 samples/sec Loss 11.4443 LearningRate 0.0679 Epoch: 3 Global Step: 145900 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:06:46,018-Speed 2629.57 samples/sec Loss 11.2553 LearningRate 0.0679 Epoch: 3 Global Step: 145910 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:06:49,926-Speed 2621.04 samples/sec Loss 11.3647 LearningRate 0.0679 Epoch: 3 Global Step: 145920 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:06:53,820-Speed 2630.54 samples/sec Loss 11.4305 LearningRate 0.0679 Epoch: 3 Global Step: 145930 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:06:57,716-Speed 2629.40 samples/sec Loss 11.3808 LearningRate 0.0679 Epoch: 3 Global Step: 145940 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:07:01,649-Speed 2604.09 samples/sec Loss 11.4077 LearningRate 0.0679 Epoch: 3 Global Step: 145950 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:07:05,543-Speed 2630.51 samples/sec Loss 11.4871 LearningRate 0.0679 Epoch: 3 Global Step: 145960 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:07:09,436-Speed 2631.30 samples/sec Loss 11.2719 LearningRate 0.0679 Epoch: 3 Global Step: 145970 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:07:13,333-Speed 2628.30 samples/sec Loss 11.1158 LearningRate 0.0679 Epoch: 3 Global Step: 145980 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:07:17,234-Speed 2625.76 samples/sec Loss 11.3053 LearningRate 0.0679 Epoch: 3 Global Step: 145990 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:07:21,135-Speed 2625.53 samples/sec Loss 11.2831 LearningRate 0.0679 Epoch: 3 Global Step: 146000 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:07:25,035-Speed 2626.57 samples/sec Loss 11.2733 LearningRate 0.0679 Epoch: 3 Global Step: 146010 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:07:28,936-Speed 2625.52 samples/sec Loss 11.3092 LearningRate 0.0679 Epoch: 3 Global Step: 146020 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:07:32,843-Speed 2621.26 samples/sec Loss 11.2789 LearningRate 0.0679 Epoch: 3 Global Step: 146030 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:07:36,731-Speed 2634.35 samples/sec Loss 11.2107 LearningRate 0.0679 Epoch: 3 Global Step: 146040 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:07:40,641-Speed 2619.98 samples/sec Loss 11.4569 LearningRate 0.0679 Epoch: 3 Global Step: 146050 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:07:44,535-Speed 2630.42 samples/sec Loss 11.2603 LearningRate 0.0679 Epoch: 3 Global Step: 146060 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:07:48,434-Speed 2626.97 samples/sec Loss 11.2999 LearningRate 0.0679 Epoch: 3 Global Step: 146070 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:07:52,342-Speed 2621.35 samples/sec Loss 11.3248 LearningRate 0.0679 Epoch: 3 Global Step: 146080 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:07:56,245-Speed 2624.47 samples/sec Loss 11.3705 LearningRate 0.0679 Epoch: 3 Global Step: 146090 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:08:00,139-Speed 2630.25 samples/sec Loss 11.4370 LearningRate 0.0679 Epoch: 3 Global Step: 146100 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:08:04,061-Speed 2611.12 samples/sec Loss 11.2795 LearningRate 0.0679 Epoch: 3 Global Step: 146110 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:08:07,961-Speed 2626.02 samples/sec Loss 11.2829 LearningRate 0.0679 Epoch: 3 Global Step: 146120 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:08:11,863-Speed 2625.90 samples/sec Loss 11.4025 LearningRate 0.0679 Epoch: 3 Global Step: 146130 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:08:15,786-Speed 2610.49 samples/sec Loss 11.2511 LearningRate 0.0679 Epoch: 3 Global Step: 146140 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:08:19,720-Speed 2604.12 samples/sec Loss 11.3757 LearningRate 0.0679 Epoch: 3 Global Step: 146150 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:08:23,680-Speed 2586.81 samples/sec Loss 11.3109 LearningRate 0.0679 Epoch: 3 Global Step: 146160 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:08:27,574-Speed 2629.93 samples/sec Loss 11.2844 LearningRate 0.0679 Epoch: 3 Global Step: 146170 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:08:31,458-Speed 2637.29 samples/sec Loss 11.2822 LearningRate 0.0679 Epoch: 3 Global Step: 146180 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:08:35,347-Speed 2633.68 samples/sec Loss 11.4779 LearningRate 0.0679 Epoch: 3 Global Step: 146190 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:08:39,243-Speed 2629.18 samples/sec Loss 11.3031 LearningRate 0.0679 Epoch: 3 Global Step: 146200 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:08:43,142-Speed 2626.91 samples/sec Loss 11.2598 LearningRate 0.0679 Epoch: 3 Global Step: 146210 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:08:47,040-Speed 2628.33 samples/sec Loss 11.3307 LearningRate 0.0679 Epoch: 3 Global Step: 146220 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:08:50,933-Speed 2630.84 samples/sec Loss 11.3382 LearningRate 0.0679 Epoch: 3 Global Step: 146230 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:08:54,829-Speed 2629.04 samples/sec Loss 11.4272 LearningRate 0.0679 Epoch: 3 Global Step: 146240 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:08:58,723-Speed 2630.62 samples/sec Loss 11.3826 LearningRate 0.0678 Epoch: 3 Global Step: 146250 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:09:02,635-Speed 2617.64 samples/sec Loss 11.2442 LearningRate 0.0678 Epoch: 3 Global Step: 146260 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:09:06,531-Speed 2629.28 samples/sec Loss 11.4507 LearningRate 0.0678 Epoch: 3 Global Step: 146270 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:09:10,424-Speed 2631.22 samples/sec Loss 11.3004 LearningRate 0.0678 Epoch: 3 Global Step: 146280 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:09:14,313-Speed 2633.90 samples/sec Loss 11.3484 LearningRate 0.0678 Epoch: 3 Global Step: 146290 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:09:18,219-Speed 2621.95 samples/sec Loss 11.3721 LearningRate 0.0678 Epoch: 3 Global Step: 146300 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:09:22,113-Speed 2630.91 samples/sec Loss 11.3666 LearningRate 0.0678 Epoch: 3 Global Step: 146310 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:09:26,013-Speed 2625.75 samples/sec Loss 11.2261 LearningRate 0.0678 Epoch: 3 Global Step: 146320 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:09:29,907-Speed 2630.69 samples/sec Loss 11.4024 LearningRate 0.0678 Epoch: 3 Global Step: 146330 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:09:33,803-Speed 2629.06 samples/sec Loss 11.2871 LearningRate 0.0678 Epoch: 3 Global Step: 146340 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:09:37,713-Speed 2619.22 samples/sec Loss 11.2620 LearningRate 0.0678 Epoch: 3 Global Step: 146350 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:09:41,606-Speed 2631.28 samples/sec Loss 11.3792 LearningRate 0.0678 Epoch: 3 Global Step: 146360 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:09:45,515-Speed 2620.27 samples/sec Loss 11.3260 LearningRate 0.0678 Epoch: 3 Global Step: 146370 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:09:49,408-Speed 2630.80 samples/sec Loss 11.3946 LearningRate 0.0678 Epoch: 3 Global Step: 146380 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:09:53,302-Speed 2629.99 samples/sec Loss 11.3673 LearningRate 0.0678 Epoch: 3 Global Step: 146390 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:09:57,183-Speed 2639.62 samples/sec Loss 11.4074 LearningRate 0.0678 Epoch: 3 Global Step: 146400 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:10:01,075-Speed 2632.05 samples/sec Loss 11.1735 LearningRate 0.0678 Epoch: 3 Global Step: 146410 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:10:04,972-Speed 2627.89 samples/sec Loss 11.2649 LearningRate 0.0678 Epoch: 3 Global Step: 146420 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:10:08,869-Speed 2628.43 samples/sec Loss 11.0970 LearningRate 0.0678 Epoch: 3 Global Step: 146430 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:10:12,774-Speed 2622.67 samples/sec Loss 11.1764 LearningRate 0.0678 Epoch: 3 Global Step: 146440 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:10:16,669-Speed 2629.66 samples/sec Loss 11.1774 LearningRate 0.0678 Epoch: 3 Global Step: 146450 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:10:20,574-Speed 2622.35 samples/sec Loss 11.3438 LearningRate 0.0678 Epoch: 3 Global Step: 146460 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:10:24,472-Speed 2627.85 samples/sec Loss 11.4060 LearningRate 0.0678 Epoch: 3 Global Step: 146470 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:10:28,381-Speed 2620.40 samples/sec Loss 11.2783 LearningRate 0.0678 Epoch: 3 Global Step: 146480 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:10:32,274-Speed 2631.36 samples/sec Loss 11.3725 LearningRate 0.0678 Epoch: 3 Global Step: 146490 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:10:36,175-Speed 2625.78 samples/sec Loss 11.1520 LearningRate 0.0678 Epoch: 3 Global Step: 146500 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:10:40,087-Speed 2617.94 samples/sec Loss 11.3444 LearningRate 0.0678 Epoch: 3 Global Step: 146510 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:10:44,042-Speed 2589.54 samples/sec Loss 11.1636 LearningRate 0.0678 Epoch: 3 Global Step: 146520 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:10:47,991-Speed 2594.29 samples/sec Loss 11.9880 LearningRate 0.0678 Epoch: 3 Global Step: 146530 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:10:52,038-Speed 2530.61 samples/sec Loss 11.8950 LearningRate 0.0678 Epoch: 3 Global Step: 146540 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:10:55,944-Speed 2622.34 samples/sec Loss 11.6548 LearningRate 0.0678 Epoch: 3 Global Step: 146550 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:10:59,878-Speed 2604.10 samples/sec Loss 11.5833 LearningRate 0.0678 Epoch: 3 Global Step: 146560 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:11:03,802-Speed 2610.36 samples/sec Loss 11.5135 LearningRate 0.0678 Epoch: 3 Global Step: 146570 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:11:07,756-Speed 2590.46 samples/sec Loss 11.4158 LearningRate 0.0678 Epoch: 3 Global Step: 146580 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:11:11,728-Speed 2578.80 samples/sec Loss 11.2318 LearningRate 0.0678 Epoch: 3 Global Step: 146590 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:11:15,631-Speed 2624.67 samples/sec Loss 11.2839 LearningRate 0.0678 Epoch: 3 Global Step: 146600 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:11:19,539-Speed 2620.82 samples/sec Loss 11.3175 LearningRate 0.0678 Epoch: 3 Global Step: 146610 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:11:23,444-Speed 2623.11 samples/sec Loss 11.3788 LearningRate 0.0678 Epoch: 3 Global Step: 146620 Fp16 Grad Scale: 32768 Required: 77 hours
Training: 2022-04-13 12:11:27,368-Speed 2610.20 samples/sec Loss 11.3432 LearningRate 0.0678 Epoch: 3 Global Step: 146630 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:11:31,279-Speed 2619.20 samples/sec Loss 11.2490 LearningRate 0.0678 Epoch: 3 Global Step: 146640 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:11:35,180-Speed 2625.94 samples/sec Loss 11.3967 LearningRate 0.0678 Epoch: 3 Global Step: 146650 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:11:39,082-Speed 2624.35 samples/sec Loss 11.3202 LearningRate 0.0678 Epoch: 3 Global Step: 146660 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:11:43,008-Speed 2609.24 samples/sec Loss 11.3080 LearningRate 0.0678 Epoch: 3 Global Step: 146670 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:11:46,914-Speed 2622.25 samples/sec Loss 11.3516 LearningRate 0.0678 Epoch: 3 Global Step: 146680 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:11:50,809-Speed 2629.49 samples/sec Loss 11.3803 LearningRate 0.0678 Epoch: 3 Global Step: 146690 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:11:54,713-Speed 2623.77 samples/sec Loss 11.4331 LearningRate 0.0678 Epoch: 3 Global Step: 146700 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:11:58,617-Speed 2623.28 samples/sec Loss 11.3241 LearningRate 0.0678 Epoch: 3 Global Step: 146710 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:12:02,514-Speed 2629.07 samples/sec Loss 11.1940 LearningRate 0.0678 Epoch: 3 Global Step: 146720 Fp16 Grad Scale: 65536 Required: 77 hours
Training: 2022-04-13 12:12:06,413-Speed 2627.14 samples/sec Loss 11.0832 LearningRate 0.0678 Epoch: 3 Global Step: 146730 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:12:10,309-Speed 2628.90 samples/sec Loss 11.4040 LearningRate 0.0678 Epoch: 3 Global Step: 146740 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:12:14,206-Speed 2627.96 samples/sec Loss 11.2842 LearningRate 0.0677 Epoch: 3 Global Step: 146750 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:12:18,101-Speed 2629.95 samples/sec Loss 11.3860 LearningRate 0.0677 Epoch: 3 Global Step: 146760 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:12:21,993-Speed 2631.22 samples/sec Loss 11.2762 LearningRate 0.0677 Epoch: 3 Global Step: 146770 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:12:25,897-Speed 2624.28 samples/sec Loss 11.1613 LearningRate 0.0677 Epoch: 3 Global Step: 146780 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:12:29,891-Speed 2563.99 samples/sec Loss 11.2646 LearningRate 0.0677 Epoch: 3 Global Step: 146790 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:12:33,799-Speed 2620.74 samples/sec Loss 11.3656 LearningRate 0.0677 Epoch: 3 Global Step: 146800 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:12:37,791-Speed 2566.38 samples/sec Loss 11.2888 LearningRate 0.0677 Epoch: 3 Global Step: 146810 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:12:41,704-Speed 2618.15 samples/sec Loss 11.2350 LearningRate 0.0677 Epoch: 3 Global Step: 146820 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:12:45,705-Speed 2559.88 samples/sec Loss 11.2363 LearningRate 0.0677 Epoch: 3 Global Step: 146830 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:12:49,799-Speed 2502.58 samples/sec Loss 11.3201 LearningRate 0.0677 Epoch: 3 Global Step: 146840 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:12:53,679-Speed 2639.78 samples/sec Loss 11.2446 LearningRate 0.0677 Epoch: 3 Global Step: 146850 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:12:57,580-Speed 2625.70 samples/sec Loss 11.2813 LearningRate 0.0677 Epoch: 3 Global Step: 146860 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:13:01,474-Speed 2629.79 samples/sec Loss 11.2666 LearningRate 0.0677 Epoch: 3 Global Step: 146870 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:13:05,375-Speed 2625.82 samples/sec Loss 11.4939 LearningRate 0.0677 Epoch: 3 Global Step: 146880 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:13:09,283-Speed 2620.79 samples/sec Loss 11.4164 LearningRate 0.0677 Epoch: 3 Global Step: 146890 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:13:13,176-Speed 2631.52 samples/sec Loss 11.3223 LearningRate 0.0677 Epoch: 3 Global Step: 146900 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:13:17,076-Speed 2626.47 samples/sec Loss 11.2946 LearningRate 0.0677 Epoch: 3 Global Step: 146910 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:13:20,973-Speed 2628.24 samples/sec Loss 11.3891 LearningRate 0.0677 Epoch: 3 Global Step: 146920 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:13:24,869-Speed 2629.55 samples/sec Loss 11.5075 LearningRate 0.0677 Epoch: 3 Global Step: 146930 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:13:28,766-Speed 2628.46 samples/sec Loss 11.2081 LearningRate 0.0677 Epoch: 3 Global Step: 146940 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:13:32,661-Speed 2629.28 samples/sec Loss 11.4110 LearningRate 0.0677 Epoch: 3 Global Step: 146950 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:13:36,564-Speed 2624.16 samples/sec Loss 11.3713 LearningRate 0.0677 Epoch: 3 Global Step: 146960 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:13:40,469-Speed 2623.60 samples/sec Loss 11.2517 LearningRate 0.0677 Epoch: 3 Global Step: 146970 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:13:44,382-Speed 2617.14 samples/sec Loss 11.1841 LearningRate 0.0677 Epoch: 3 Global Step: 146980 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:13:48,386-Speed 2558.31 samples/sec Loss 11.3186 LearningRate 0.0677 Epoch: 3 Global Step: 146990 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:13:52,298-Speed 2618.63 samples/sec Loss 11.2218 LearningRate 0.0677 Epoch: 3 Global Step: 147000 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:13:56,190-Speed 2631.81 samples/sec Loss 11.2600 LearningRate 0.0677 Epoch: 3 Global Step: 147010 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:14:00,090-Speed 2626.57 samples/sec Loss 11.2880 LearningRate 0.0677 Epoch: 3 Global Step: 147020 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:14:03,990-Speed 2626.30 samples/sec Loss 11.3082 LearningRate 0.0677 Epoch: 3 Global Step: 147030 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:14:07,881-Speed 2632.31 samples/sec Loss 11.4971 LearningRate 0.0677 Epoch: 3 Global Step: 147040 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:14:11,767-Speed 2635.73 samples/sec Loss 11.3934 LearningRate 0.0677 Epoch: 3 Global Step: 147050 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:14:15,662-Speed 2629.82 samples/sec Loss 11.3078 LearningRate 0.0677 Epoch: 3 Global Step: 147060 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:14:19,554-Speed 2631.94 samples/sec Loss 11.3724 LearningRate 0.0677 Epoch: 3 Global Step: 147070 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:14:23,471-Speed 2614.91 samples/sec Loss 11.3861 LearningRate 0.0677 Epoch: 3 Global Step: 147080 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:14:27,389-Speed 2614.39 samples/sec Loss 11.3917 LearningRate 0.0677 Epoch: 3 Global Step: 147090 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:14:31,284-Speed 2629.30 samples/sec Loss 11.2707 LearningRate 0.0677 Epoch: 3 Global Step: 147100 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:14:35,181-Speed 2628.43 samples/sec Loss 11.3048 LearningRate 0.0677 Epoch: 3 Global Step: 147110 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:14:39,083-Speed 2624.73 samples/sec Loss 11.2521 LearningRate 0.0677 Epoch: 3 Global Step: 147120 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:14:43,133-Speed 2529.76 samples/sec Loss 11.1816 LearningRate 0.0677 Epoch: 3 Global Step: 147130 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:14:47,047-Speed 2617.12 samples/sec Loss 11.3071 LearningRate 0.0677 Epoch: 3 Global Step: 147140 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:14:50,962-Speed 2616.12 samples/sec Loss 11.3238 LearningRate 0.0677 Epoch: 3 Global Step: 147150 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:14:54,894-Speed 2604.94 samples/sec Loss 11.2782 LearningRate 0.0677 Epoch: 3 Global Step: 147160 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:14:58,791-Speed 2628.75 samples/sec Loss 11.3070 LearningRate 0.0677 Epoch: 3 Global Step: 147170 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:15:02,713-Speed 2611.02 samples/sec Loss 11.4247 LearningRate 0.0677 Epoch: 3 Global Step: 147180 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:15:06,610-Speed 2628.91 samples/sec Loss 11.2508 LearningRate 0.0677 Epoch: 3 Global Step: 147190 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:15:10,506-Speed 2628.95 samples/sec Loss 11.3248 LearningRate 0.0677 Epoch: 3 Global Step: 147200 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:15:14,417-Speed 2618.98 samples/sec Loss 11.2722 LearningRate 0.0677 Epoch: 3 Global Step: 147210 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:15:18,367-Speed 2593.01 samples/sec Loss 11.1313 LearningRate 0.0677 Epoch: 3 Global Step: 147220 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:15:22,399-Speed 2540.81 samples/sec Loss 11.2744 LearningRate 0.0677 Epoch: 3 Global Step: 147230 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:15:26,291-Speed 2631.25 samples/sec Loss 11.3657 LearningRate 0.0677 Epoch: 3 Global Step: 147240 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:15:30,251-Speed 2586.94 samples/sec Loss 11.2744 LearningRate 0.0677 Epoch: 3 Global Step: 147250 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:15:34,160-Speed 2620.54 samples/sec Loss 11.2641 LearningRate 0.0676 Epoch: 3 Global Step: 147260 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:15:38,068-Speed 2620.68 samples/sec Loss 11.1760 LearningRate 0.0676 Epoch: 3 Global Step: 147270 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:15:41,987-Speed 2613.79 samples/sec Loss 11.3725 LearningRate 0.0676 Epoch: 3 Global Step: 147280 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:15:45,901-Speed 2616.97 samples/sec Loss 11.3348 LearningRate 0.0676 Epoch: 3 Global Step: 147290 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:15:49,801-Speed 2626.19 samples/sec Loss 11.3367 LearningRate 0.0676 Epoch: 3 Global Step: 147300 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:15:53,697-Speed 2629.17 samples/sec Loss 11.2718 LearningRate 0.0676 Epoch: 3 Global Step: 147310 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:15:57,573-Speed 2642.45 samples/sec Loss 11.2572 LearningRate 0.0676 Epoch: 3 Global Step: 147320 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:16:01,465-Speed 2632.05 samples/sec Loss 11.1721 LearningRate 0.0676 Epoch: 3 Global Step: 147330 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:16:05,361-Speed 2628.91 samples/sec Loss 11.2933 LearningRate 0.0676 Epoch: 3 Global Step: 147340 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:16:09,258-Speed 2627.96 samples/sec Loss 11.4295 LearningRate 0.0676 Epoch: 3 Global Step: 147350 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:16:13,175-Speed 2614.70 samples/sec Loss 11.3158 LearningRate 0.0676 Epoch: 3 Global Step: 147360 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:16:17,088-Speed 2618.34 samples/sec Loss 11.1355 LearningRate 0.0676 Epoch: 3 Global Step: 147370 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:16:20,977-Speed 2633.41 samples/sec Loss 11.2267 LearningRate 0.0676 Epoch: 3 Global Step: 147380 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:16:24,870-Speed 2631.01 samples/sec Loss 11.3691 LearningRate 0.0676 Epoch: 3 Global Step: 147390 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:16:28,770-Speed 2626.81 samples/sec Loss 11.1278 LearningRate 0.0676 Epoch: 3 Global Step: 147400 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:16:32,664-Speed 2630.23 samples/sec Loss 11.3830 LearningRate 0.0676 Epoch: 3 Global Step: 147410 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:16:36,594-Speed 2606.43 samples/sec Loss 11.2859 LearningRate 0.0676 Epoch: 3 Global Step: 147420 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:16:40,493-Speed 2627.11 samples/sec Loss 11.1605 LearningRate 0.0676 Epoch: 3 Global Step: 147430 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:16:44,387-Speed 2630.00 samples/sec Loss 11.1559 LearningRate 0.0676 Epoch: 3 Global Step: 147440 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:16:48,282-Speed 2629.66 samples/sec Loss 11.2945 LearningRate 0.0676 Epoch: 3 Global Step: 147450 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:16:52,170-Speed 2635.21 samples/sec Loss 11.3289 LearningRate 0.0676 Epoch: 3 Global Step: 147460 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:16:56,063-Speed 2631.00 samples/sec Loss 11.2367 LearningRate 0.0676 Epoch: 3 Global Step: 147470 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:16:59,975-Speed 2618.38 samples/sec Loss 11.3395 LearningRate 0.0676 Epoch: 3 Global Step: 147480 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:17:03,889-Speed 2616.87 samples/sec Loss 11.3951 LearningRate 0.0676 Epoch: 3 Global Step: 147490 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:17:07,786-Speed 2628.17 samples/sec Loss 11.2951 LearningRate 0.0676 Epoch: 3 Global Step: 147500 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:17:11,696-Speed 2619.89 samples/sec Loss 11.2977 LearningRate 0.0676 Epoch: 3 Global Step: 147510 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:17:15,599-Speed 2624.68 samples/sec Loss 11.1410 LearningRate 0.0676 Epoch: 3 Global Step: 147520 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:17:19,487-Speed 2633.98 samples/sec Loss 11.2410 LearningRate 0.0676 Epoch: 3 Global Step: 147530 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:17:23,388-Speed 2626.27 samples/sec Loss 11.2198 LearningRate 0.0676 Epoch: 3 Global Step: 147540 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:17:27,295-Speed 2621.81 samples/sec Loss 11.4223 LearningRate 0.0676 Epoch: 3 Global Step: 147550 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:17:31,194-Speed 2627.25 samples/sec Loss 11.4217 LearningRate 0.0676 Epoch: 3 Global Step: 147560 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:17:35,120-Speed 2608.85 samples/sec Loss 11.1682 LearningRate 0.0676 Epoch: 3 Global Step: 147570 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:17:39,015-Speed 2629.35 samples/sec Loss 11.3656 LearningRate 0.0676 Epoch: 3 Global Step: 147580 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:17:42,909-Speed 2630.51 samples/sec Loss 11.2635 LearningRate 0.0676 Epoch: 3 Global Step: 147590 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:17:46,811-Speed 2624.71 samples/sec Loss 11.2710 LearningRate 0.0676 Epoch: 3 Global Step: 147600 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:17:50,703-Speed 2631.50 samples/sec Loss 11.1677 LearningRate 0.0676 Epoch: 3 Global Step: 147610 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:17:54,595-Speed 2632.03 samples/sec Loss 11.4823 LearningRate 0.0676 Epoch: 3 Global Step: 147620 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:17:58,494-Speed 2627.32 samples/sec Loss 11.1746 LearningRate 0.0676 Epoch: 3 Global Step: 147630 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:18:02,387-Speed 2630.97 samples/sec Loss 11.2976 LearningRate 0.0676 Epoch: 3 Global Step: 147640 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:18:06,285-Speed 2627.63 samples/sec Loss 11.3105 LearningRate 0.0676 Epoch: 3 Global Step: 147650 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:18:10,170-Speed 2636.05 samples/sec Loss 11.1982 LearningRate 0.0676 Epoch: 3 Global Step: 147660 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:18:14,122-Speed 2591.15 samples/sec Loss 11.3162 LearningRate 0.0676 Epoch: 3 Global Step: 147670 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:18:18,028-Speed 2622.75 samples/sec Loss 11.1553 LearningRate 0.0676 Epoch: 3 Global Step: 147680 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:18:21,925-Speed 2628.50 samples/sec Loss 11.2276 LearningRate 0.0676 Epoch: 3 Global Step: 147690 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:18:25,816-Speed 2632.32 samples/sec Loss 11.2788 LearningRate 0.0676 Epoch: 3 Global Step: 147700 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:18:29,713-Speed 2627.75 samples/sec Loss 11.1792 LearningRate 0.0676 Epoch: 3 Global Step: 147710 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:18:33,615-Speed 2625.48 samples/sec Loss 11.2086 LearningRate 0.0676 Epoch: 3 Global Step: 147720 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:18:37,514-Speed 2627.20 samples/sec Loss 11.2033 LearningRate 0.0676 Epoch: 3 Global Step: 147730 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:18:41,417-Speed 2624.34 samples/sec Loss 11.1510 LearningRate 0.0676 Epoch: 3 Global Step: 147740 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:18:45,325-Speed 2620.59 samples/sec Loss 11.3314 LearningRate 0.0676 Epoch: 3 Global Step: 147750 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:18:49,195-Speed 2647.30 samples/sec Loss 11.3010 LearningRate 0.0675 Epoch: 3 Global Step: 147760 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:18:53,085-Speed 2633.30 samples/sec Loss 11.1880 LearningRate 0.0675 Epoch: 3 Global Step: 147770 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:18:56,981-Speed 2628.64 samples/sec Loss 11.3444 LearningRate 0.0675 Epoch: 3 Global Step: 147780 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:19:00,885-Speed 2623.62 samples/sec Loss 11.2322 LearningRate 0.0675 Epoch: 3 Global Step: 147790 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:19:04,772-Speed 2634.72 samples/sec Loss 11.1788 LearningRate 0.0675 Epoch: 3 Global Step: 147800 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:19:08,668-Speed 2628.98 samples/sec Loss 11.2328 LearningRate 0.0675 Epoch: 3 Global Step: 147810 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:19:12,565-Speed 2628.61 samples/sec Loss 11.3483 LearningRate 0.0675 Epoch: 3 Global Step: 147820 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:19:16,464-Speed 2627.10 samples/sec Loss 11.0399 LearningRate 0.0675 Epoch: 3 Global Step: 147830 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:19:20,357-Speed 2630.98 samples/sec Loss 11.2294 LearningRate 0.0675 Epoch: 3 Global Step: 147840 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:19:24,252-Speed 2629.68 samples/sec Loss 11.2343 LearningRate 0.0675 Epoch: 3 Global Step: 147850 Fp16 Grad Scale: 131072 Required: 77 hours
Training: 2022-04-13 12:19:28,146-Speed 2630.94 samples/sec Loss 11.2732 LearningRate 0.0675 Epoch: 3 Global Step: 147860 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:19:32,043-Speed 2628.12 samples/sec Loss 11.3297 LearningRate 0.0675 Epoch: 3 Global Step: 147870 Fp16 Grad Scale: 262144 Required: 77 hours
Training: 2022-04-13 12:19:35,950-Speed 2621.20 samples/sec Loss 11.3081 LearningRate 0.0675 Epoch: 3 Global Step: 147880 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:19:39,860-Speed 2619.61 samples/sec Loss 11.3774 LearningRate 0.0675 Epoch: 3 Global Step: 147890 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:19:43,757-Speed 2628.90 samples/sec Loss 11.1073 LearningRate 0.0675 Epoch: 3 Global Step: 147900 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:19:47,665-Speed 2620.84 samples/sec Loss 11.2816 LearningRate 0.0675 Epoch: 3 Global Step: 147910 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:19:51,558-Speed 2631.06 samples/sec Loss 11.2661 LearningRate 0.0675 Epoch: 3 Global Step: 147920 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:19:55,471-Speed 2617.52 samples/sec Loss 11.2831 LearningRate 0.0675 Epoch: 3 Global Step: 147930 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:19:59,365-Speed 2630.82 samples/sec Loss 11.2980 LearningRate 0.0675 Epoch: 3 Global Step: 147940 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:20:03,262-Speed 2628.06 samples/sec Loss 11.3030 LearningRate 0.0675 Epoch: 3 Global Step: 147950 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:20:07,158-Speed 2628.63 samples/sec Loss 11.2001 LearningRate 0.0675 Epoch: 3 Global Step: 147960 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:20:11,050-Speed 2631.66 samples/sec Loss 11.4102 LearningRate 0.0675 Epoch: 3 Global Step: 147970 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:20:14,948-Speed 2627.74 samples/sec Loss 11.2580 LearningRate 0.0675 Epoch: 3 Global Step: 147980 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:20:18,850-Speed 2625.39 samples/sec Loss 11.2938 LearningRate 0.0675 Epoch: 3 Global Step: 147990 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:20:22,750-Speed 2626.54 samples/sec Loss 11.1379 LearningRate 0.0675 Epoch: 3 Global Step: 148000 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:20:26,644-Speed 2630.44 samples/sec Loss 11.1784 LearningRate 0.0675 Epoch: 3 Global Step: 148010 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:20:30,542-Speed 2627.44 samples/sec Loss 11.2337 LearningRate 0.0675 Epoch: 3 Global Step: 148020 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:20:34,442-Speed 2626.08 samples/sec Loss 11.1793 LearningRate 0.0675 Epoch: 3 Global Step: 148030 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:20:38,352-Speed 2619.75 samples/sec Loss 11.1736 LearningRate 0.0675 Epoch: 3 Global Step: 148040 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:20:42,237-Speed 2636.86 samples/sec Loss 11.1883 LearningRate 0.0675 Epoch: 3 Global Step: 148050 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:20:46,137-Speed 2626.32 samples/sec Loss 11.1377 LearningRate 0.0675 Epoch: 3 Global Step: 148060 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:20:50,042-Speed 2622.68 samples/sec Loss 11.2101 LearningRate 0.0675 Epoch: 3 Global Step: 148070 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:20:53,939-Speed 2629.03 samples/sec Loss 11.2359 LearningRate 0.0675 Epoch: 3 Global Step: 148080 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:20:57,841-Speed 2625.61 samples/sec Loss 11.2752 LearningRate 0.0675 Epoch: 3 Global Step: 148090 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:21:01,738-Speed 2628.36 samples/sec Loss 11.1303 LearningRate 0.0675 Epoch: 3 Global Step: 148100 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:21:05,658-Speed 2612.85 samples/sec Loss 11.3733 LearningRate 0.0675 Epoch: 3 Global Step: 148110 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:21:09,555-Speed 2627.79 samples/sec Loss 11.1863 LearningRate 0.0675 Epoch: 3 Global Step: 148120 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:21:13,483-Speed 2608.07 samples/sec Loss 11.3325 LearningRate 0.0675 Epoch: 3 Global Step: 148130 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:21:17,380-Speed 2629.28 samples/sec Loss 11.2392 LearningRate 0.0675 Epoch: 3 Global Step: 148140 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:21:21,284-Speed 2622.89 samples/sec Loss 11.3009 LearningRate 0.0675 Epoch: 3 Global Step: 148150 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:21:25,219-Speed 2603.60 samples/sec Loss 11.1720 LearningRate 0.0675 Epoch: 3 Global Step: 148160 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:21:29,102-Speed 2637.88 samples/sec Loss 11.2496 LearningRate 0.0675 Epoch: 3 Global Step: 148170 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:21:33,004-Speed 2624.74 samples/sec Loss 11.1808 LearningRate 0.0675 Epoch: 3 Global Step: 148180 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:21:36,898-Speed 2630.47 samples/sec Loss 11.3508 LearningRate 0.0675 Epoch: 3 Global Step: 148190 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:21:40,792-Speed 2630.29 samples/sec Loss 11.2602 LearningRate 0.0675 Epoch: 3 Global Step: 148200 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:21:44,691-Speed 2626.92 samples/sec Loss 11.2166 LearningRate 0.0675 Epoch: 3 Global Step: 148210 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:21:48,588-Speed 2628.78 samples/sec Loss 11.3948 LearningRate 0.0675 Epoch: 3 Global Step: 148220 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:21:52,468-Speed 2639.64 samples/sec Loss 11.2095 LearningRate 0.0675 Epoch: 3 Global Step: 148230 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:21:56,364-Speed 2629.08 samples/sec Loss 11.2067 LearningRate 0.0675 Epoch: 3 Global Step: 148240 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:22:00,263-Speed 2626.90 samples/sec Loss 11.2218 LearningRate 0.0675 Epoch: 3 Global Step: 148250 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:22:04,162-Speed 2626.78 samples/sec Loss 11.3320 LearningRate 0.0675 Epoch: 3 Global Step: 148260 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:22:08,097-Speed 2603.10 samples/sec Loss 11.3767 LearningRate 0.0674 Epoch: 3 Global Step: 148270 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:22:12,003-Speed 2622.88 samples/sec Loss 11.1937 LearningRate 0.0674 Epoch: 3 Global Step: 148280 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:22:15,948-Speed 2596.32 samples/sec Loss 11.2512 LearningRate 0.0674 Epoch: 3 Global Step: 148290 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:22:19,863-Speed 2616.59 samples/sec Loss 11.3114 LearningRate 0.0674 Epoch: 3 Global Step: 148300 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:22:23,771-Speed 2620.80 samples/sec Loss 11.3157 LearningRate 0.0674 Epoch: 3 Global Step: 148310 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:22:27,683-Speed 2618.40 samples/sec Loss 11.1930 LearningRate 0.0674 Epoch: 3 Global Step: 148320 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:22:31,589-Speed 2621.99 samples/sec Loss 11.3112 LearningRate 0.0674 Epoch: 3 Global Step: 148330 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:22:35,485-Speed 2628.92 samples/sec Loss 11.2844 LearningRate 0.0674 Epoch: 3 Global Step: 148340 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:22:39,395-Speed 2619.66 samples/sec Loss 11.1759 LearningRate 0.0674 Epoch: 3 Global Step: 148350 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:22:43,300-Speed 2622.83 samples/sec Loss 11.1560 LearningRate 0.0674 Epoch: 3 Global Step: 148360 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:22:47,204-Speed 2623.57 samples/sec Loss 11.2356 LearningRate 0.0674 Epoch: 3 Global Step: 148370 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:22:51,100-Speed 2629.56 samples/sec Loss 11.2493 LearningRate 0.0674 Epoch: 3 Global Step: 148380 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:22:54,996-Speed 2628.60 samples/sec Loss 11.3436 LearningRate 0.0674 Epoch: 3 Global Step: 148390 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:22:58,914-Speed 2614.28 samples/sec Loss 11.2941 LearningRate 0.0674 Epoch: 3 Global Step: 148400 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:23:02,818-Speed 2623.60 samples/sec Loss 11.4579 LearningRate 0.0674 Epoch: 3 Global Step: 148410 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:23:06,711-Speed 2630.78 samples/sec Loss 11.2315 LearningRate 0.0674 Epoch: 3 Global Step: 148420 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:23:10,606-Speed 2629.60 samples/sec Loss 11.0127 LearningRate 0.0674 Epoch: 3 Global Step: 148430 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:23:14,502-Speed 2628.88 samples/sec Loss 11.2043 LearningRate 0.0674 Epoch: 3 Global Step: 148440 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:23:18,404-Speed 2626.14 samples/sec Loss 11.2275 LearningRate 0.0674 Epoch: 3 Global Step: 148450 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:23:22,299-Speed 2629.51 samples/sec Loss 11.1839 LearningRate 0.0674 Epoch: 3 Global Step: 148460 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:23:26,195-Speed 2628.79 samples/sec Loss 11.1506 LearningRate 0.0674 Epoch: 3 Global Step: 148470 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:23:30,096-Speed 2625.30 samples/sec Loss 11.3451 LearningRate 0.0674 Epoch: 3 Global Step: 148480 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:23:33,979-Speed 2638.12 samples/sec Loss 11.1499 LearningRate 0.0674 Epoch: 3 Global Step: 148490 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:23:37,870-Speed 2631.72 samples/sec Loss 11.3024 LearningRate 0.0674 Epoch: 3 Global Step: 148500 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:23:41,767-Speed 2628.79 samples/sec Loss 11.2164 LearningRate 0.0674 Epoch: 3 Global Step: 148510 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:23:45,669-Speed 2625.13 samples/sec Loss 11.2351 LearningRate 0.0674 Epoch: 3 Global Step: 148520 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:23:49,569-Speed 2626.45 samples/sec Loss 11.2583 LearningRate 0.0674 Epoch: 3 Global Step: 148530 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:23:53,462-Speed 2630.99 samples/sec Loss 11.1214 LearningRate 0.0674 Epoch: 3 Global Step: 148540 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:23:57,353-Speed 2632.17 samples/sec Loss 11.2412 LearningRate 0.0674 Epoch: 3 Global Step: 148550 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:24:01,251-Speed 2627.69 samples/sec Loss 11.3874 LearningRate 0.0674 Epoch: 3 Global Step: 148560 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:24:05,096-Speed 2663.79 samples/sec Loss 11.2341 LearningRate 0.0674 Epoch: 3 Global Step: 148570 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:24:08,995-Speed 2626.57 samples/sec Loss 11.3096 LearningRate 0.0674 Epoch: 3 Global Step: 148580 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:24:12,899-Speed 2623.88 samples/sec Loss 11.4962 LearningRate 0.0674 Epoch: 3 Global Step: 148590 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:24:16,799-Speed 2625.80 samples/sec Loss 11.1962 LearningRate 0.0674 Epoch: 3 Global Step: 148600 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:24:20,701-Speed 2625.11 samples/sec Loss 11.4817 LearningRate 0.0674 Epoch: 3 Global Step: 148610 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:24:24,601-Speed 2626.46 samples/sec Loss 11.3998 LearningRate 0.0674 Epoch: 3 Global Step: 148620 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:24:28,503-Speed 2624.99 samples/sec Loss 11.2013 LearningRate 0.0674 Epoch: 3 Global Step: 148630 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:24:32,405-Speed 2624.72 samples/sec Loss 11.2708 LearningRate 0.0674 Epoch: 3 Global Step: 148640 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:24:36,388-Speed 2571.59 samples/sec Loss 11.3011 LearningRate 0.0674 Epoch: 3 Global Step: 148650 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:24:40,390-Speed 2558.76 samples/sec Loss 11.2079 LearningRate 0.0674 Epoch: 3 Global Step: 148660 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:24:44,408-Speed 2549.71 samples/sec Loss 11.3193 LearningRate 0.0674 Epoch: 3 Global Step: 148670 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:24:48,311-Speed 2623.95 samples/sec Loss 11.2873 LearningRate 0.0674 Epoch: 3 Global Step: 148680 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:24:52,207-Speed 2629.75 samples/sec Loss 11.2130 LearningRate 0.0674 Epoch: 3 Global Step: 148690 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:24:56,108-Speed 2625.40 samples/sec Loss 11.3460 LearningRate 0.0674 Epoch: 3 Global Step: 148700 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:25:00,002-Speed 2630.07 samples/sec Loss 11.2346 LearningRate 0.0674 Epoch: 3 Global Step: 148710 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:25:03,904-Speed 2624.98 samples/sec Loss 11.4171 LearningRate 0.0674 Epoch: 3 Global Step: 148720 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:25:07,816-Speed 2618.45 samples/sec Loss 11.3762 LearningRate 0.0674 Epoch: 3 Global Step: 148730 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:25:11,712-Speed 2629.49 samples/sec Loss 11.4111 LearningRate 0.0674 Epoch: 3 Global Step: 148740 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:25:15,612-Speed 2625.78 samples/sec Loss 11.2425 LearningRate 0.0674 Epoch: 3 Global Step: 148750 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:25:19,520-Speed 2621.44 samples/sec Loss 11.2509 LearningRate 0.0674 Epoch: 3 Global Step: 148760 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:25:23,406-Speed 2636.09 samples/sec Loss 11.1529 LearningRate 0.0673 Epoch: 3 Global Step: 148770 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:25:27,300-Speed 2630.35 samples/sec Loss 11.1064 LearningRate 0.0673 Epoch: 3 Global Step: 148780 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:25:31,199-Speed 2626.76 samples/sec Loss 11.2624 LearningRate 0.0673 Epoch: 3 Global Step: 148790 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:25:35,108-Speed 2619.87 samples/sec Loss 11.2600 LearningRate 0.0673 Epoch: 3 Global Step: 148800 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:25:39,001-Speed 2631.11 samples/sec Loss 11.4188 LearningRate 0.0673 Epoch: 3 Global Step: 148810 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:25:42,927-Speed 2609.59 samples/sec Loss 11.2670 LearningRate 0.0673 Epoch: 3 Global Step: 148820 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:25:46,851-Speed 2609.72 samples/sec Loss 11.2676 LearningRate 0.0673 Epoch: 3 Global Step: 148830 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:25:50,733-Speed 2642.49 samples/sec Loss 11.3357 LearningRate 0.0673 Epoch: 3 Global Step: 148840 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:25:54,622-Speed 2633.69 samples/sec Loss 11.2700 LearningRate 0.0673 Epoch: 3 Global Step: 148850 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:25:58,534-Speed 2618.38 samples/sec Loss 11.3101 LearningRate 0.0673 Epoch: 3 Global Step: 148860 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:26:02,451-Speed 2614.97 samples/sec Loss 11.2806 LearningRate 0.0673 Epoch: 3 Global Step: 148870 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:26:06,355-Speed 2623.48 samples/sec Loss 11.1791 LearningRate 0.0673 Epoch: 3 Global Step: 148880 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:26:10,255-Speed 2625.94 samples/sec Loss 11.3550 LearningRate 0.0673 Epoch: 3 Global Step: 148890 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:26:14,168-Speed 2618.33 samples/sec Loss 11.3085 LearningRate 0.0673 Epoch: 3 Global Step: 148900 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:26:18,110-Speed 2598.00 samples/sec Loss 11.2584 LearningRate 0.0673 Epoch: 3 Global Step: 148910 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:26:22,007-Speed 2628.32 samples/sec Loss 11.2910 LearningRate 0.0673 Epoch: 3 Global Step: 148920 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:26:25,904-Speed 2628.64 samples/sec Loss 11.2716 LearningRate 0.0673 Epoch: 3 Global Step: 148930 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:26:29,810-Speed 2622.03 samples/sec Loss 11.3054 LearningRate 0.0673 Epoch: 3 Global Step: 148940 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:26:33,721-Speed 2619.27 samples/sec Loss 11.2482 LearningRate 0.0673 Epoch: 3 Global Step: 148950 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:26:37,628-Speed 2621.61 samples/sec Loss 11.3159 LearningRate 0.0673 Epoch: 3 Global Step: 148960 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:26:41,520-Speed 2632.23 samples/sec Loss 11.3510 LearningRate 0.0673 Epoch: 3 Global Step: 148970 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:26:45,412-Speed 2631.29 samples/sec Loss 11.1523 LearningRate 0.0673 Epoch: 3 Global Step: 148980 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:26:49,329-Speed 2615.48 samples/sec Loss 11.1632 LearningRate 0.0673 Epoch: 3 Global Step: 148990 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:26:53,239-Speed 2619.22 samples/sec Loss 11.1516 LearningRate 0.0673 Epoch: 3 Global Step: 149000 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:26:57,157-Speed 2615.10 samples/sec Loss 11.2781 LearningRate 0.0673 Epoch: 3 Global Step: 149010 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:27:01,064-Speed 2621.31 samples/sec Loss 11.1255 LearningRate 0.0673 Epoch: 3 Global Step: 149020 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:27:04,965-Speed 2625.55 samples/sec Loss 11.4366 LearningRate 0.0673 Epoch: 3 Global Step: 149030 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:27:08,888-Speed 2610.53 samples/sec Loss 11.4131 LearningRate 0.0673 Epoch: 3 Global Step: 149040 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:27:12,788-Speed 2626.66 samples/sec Loss 11.0699 LearningRate 0.0673 Epoch: 3 Global Step: 149050 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:27:16,663-Speed 2643.82 samples/sec Loss 11.2967 LearningRate 0.0673 Epoch: 3 Global Step: 149060 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:27:20,536-Speed 2645.16 samples/sec Loss 11.2360 LearningRate 0.0673 Epoch: 3 Global Step: 149070 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:27:24,433-Speed 2627.77 samples/sec Loss 11.1698 LearningRate 0.0673 Epoch: 3 Global Step: 149080 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:27:28,336-Speed 2625.48 samples/sec Loss 11.1338 LearningRate 0.0673 Epoch: 3 Global Step: 149090 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:27:32,234-Speed 2627.49 samples/sec Loss 11.2707 LearningRate 0.0673 Epoch: 3 Global Step: 149100 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:27:36,132-Speed 2627.52 samples/sec Loss 11.1545 LearningRate 0.0673 Epoch: 3 Global Step: 149110 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:27:40,036-Speed 2623.12 samples/sec Loss 11.1665 LearningRate 0.0673 Epoch: 3 Global Step: 149120 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:27:43,937-Speed 2626.18 samples/sec Loss 11.1816 LearningRate 0.0673 Epoch: 3 Global Step: 149130 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:27:47,829-Speed 2631.65 samples/sec Loss 11.1888 LearningRate 0.0673 Epoch: 3 Global Step: 149140 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:27:51,717-Speed 2634.76 samples/sec Loss 11.3147 LearningRate 0.0673 Epoch: 3 Global Step: 149150 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:27:55,618-Speed 2625.13 samples/sec Loss 11.1450 LearningRate 0.0673 Epoch: 3 Global Step: 149160 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 12:27:59,509-Speed 2632.69 samples/sec Loss 11.0684 LearningRate 0.0673 Epoch: 3 Global Step: 149170 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:28:03,403-Speed 2630.33 samples/sec Loss 11.2868 LearningRate 0.0673 Epoch: 3 Global Step: 149180 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:28:07,294-Speed 2632.14 samples/sec Loss 11.4429 LearningRate 0.0673 Epoch: 3 Global Step: 149190 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:28:11,249-Speed 2589.15 samples/sec Loss 11.2643 LearningRate 0.0673 Epoch: 3 Global Step: 149200 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:28:15,138-Speed 2633.49 samples/sec Loss 11.3004 LearningRate 0.0673 Epoch: 3 Global Step: 149210 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:28:19,044-Speed 2622.72 samples/sec Loss 11.2726 LearningRate 0.0673 Epoch: 3 Global Step: 149220 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:28:22,953-Speed 2620.17 samples/sec Loss 11.1921 LearningRate 0.0673 Epoch: 3 Global Step: 149230 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:28:26,864-Speed 2619.54 samples/sec Loss 11.0629 LearningRate 0.0673 Epoch: 3 Global Step: 149240 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:28:30,762-Speed 2627.39 samples/sec Loss 11.0748 LearningRate 0.0673 Epoch: 3 Global Step: 149250 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:28:34,683-Speed 2612.45 samples/sec Loss 11.2977 LearningRate 0.0673 Epoch: 3 Global Step: 149260 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:28:38,578-Speed 2629.50 samples/sec Loss 11.2025 LearningRate 0.0673 Epoch: 3 Global Step: 149270 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:28:42,543-Speed 2583.73 samples/sec Loss 11.1572 LearningRate 0.0672 Epoch: 3 Global Step: 149280 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:28:46,524-Speed 2572.85 samples/sec Loss 11.0558 LearningRate 0.0672 Epoch: 3 Global Step: 149290 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:28:50,455-Speed 2605.61 samples/sec Loss 11.2233 LearningRate 0.0672 Epoch: 3 Global Step: 149300 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:28:54,363-Speed 2620.69 samples/sec Loss 11.0965 LearningRate 0.0672 Epoch: 3 Global Step: 149310 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:28:58,270-Speed 2622.43 samples/sec Loss 11.1488 LearningRate 0.0672 Epoch: 3 Global Step: 149320 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:29:02,185-Speed 2615.70 samples/sec Loss 11.0926 LearningRate 0.0672 Epoch: 3 Global Step: 149330 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:29:06,083-Speed 2627.87 samples/sec Loss 11.1771 LearningRate 0.0672 Epoch: 3 Global Step: 149340 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:29:09,981-Speed 2627.88 samples/sec Loss 11.1843 LearningRate 0.0672 Epoch: 3 Global Step: 149350 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:29:13,877-Speed 2628.95 samples/sec Loss 11.2019 LearningRate 0.0672 Epoch: 3 Global Step: 149360 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:29:17,775-Speed 2627.15 samples/sec Loss 11.2119 LearningRate 0.0672 Epoch: 3 Global Step: 149370 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:29:21,706-Speed 2606.49 samples/sec Loss 11.2318 LearningRate 0.0672 Epoch: 3 Global Step: 149380 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:29:25,611-Speed 2622.81 samples/sec Loss 11.1882 LearningRate 0.0672 Epoch: 3 Global Step: 149390 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:29:29,507-Speed 2628.94 samples/sec Loss 11.1599 LearningRate 0.0672 Epoch: 3 Global Step: 149400 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:29:33,411-Speed 2624.05 samples/sec Loss 11.2349 LearningRate 0.0672 Epoch: 3 Global Step: 149410 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:29:37,304-Speed 2630.77 samples/sec Loss 11.3685 LearningRate 0.0672 Epoch: 3 Global Step: 149420 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:29:41,227-Speed 2610.71 samples/sec Loss 11.2356 LearningRate 0.0672 Epoch: 3 Global Step: 149430 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:29:45,113-Speed 2635.78 samples/sec Loss 11.0778 LearningRate 0.0672 Epoch: 3 Global Step: 149440 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:29:49,004-Speed 2632.43 samples/sec Loss 11.1142 LearningRate 0.0672 Epoch: 3 Global Step: 149450 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:29:52,898-Speed 2630.39 samples/sec Loss 11.2669 LearningRate 0.0672 Epoch: 3 Global Step: 149460 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:29:56,790-Speed 2631.71 samples/sec Loss 11.2282 LearningRate 0.0672 Epoch: 3 Global Step: 149470 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:30:00,692-Speed 2625.03 samples/sec Loss 11.1693 LearningRate 0.0672 Epoch: 3 Global Step: 149480 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:30:04,585-Speed 2630.91 samples/sec Loss 11.1908 LearningRate 0.0672 Epoch: 3 Global Step: 149490 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:30:08,517-Speed 2604.93 samples/sec Loss 11.3058 LearningRate 0.0672 Epoch: 3 Global Step: 149500 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:30:12,403-Speed 2636.00 samples/sec Loss 11.2068 LearningRate 0.0672 Epoch: 3 Global Step: 149510 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:30:16,301-Speed 2627.90 samples/sec Loss 11.1962 LearningRate 0.0672 Epoch: 3 Global Step: 149520 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:30:20,209-Speed 2620.71 samples/sec Loss 11.3849 LearningRate 0.0672 Epoch: 3 Global Step: 149530 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:30:24,088-Speed 2640.91 samples/sec Loss 11.1054 LearningRate 0.0672 Epoch: 3 Global Step: 149540 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:30:27,975-Speed 2634.60 samples/sec Loss 11.1353 LearningRate 0.0672 Epoch: 3 Global Step: 149550 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:30:31,869-Speed 2630.57 samples/sec Loss 11.3026 LearningRate 0.0672 Epoch: 3 Global Step: 149560 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:30:35,762-Speed 2630.51 samples/sec Loss 11.1207 LearningRate 0.0672 Epoch: 3 Global Step: 149570 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:30:39,655-Speed 2631.68 samples/sec Loss 11.2394 LearningRate 0.0672 Epoch: 3 Global Step: 149580 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:30:43,552-Speed 2628.24 samples/sec Loss 11.2139 LearningRate 0.0672 Epoch: 3 Global Step: 149590 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:30:47,446-Speed 2629.82 samples/sec Loss 11.2326 LearningRate 0.0672 Epoch: 3 Global Step: 149600 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:30:51,344-Speed 2628.08 samples/sec Loss 11.2401 LearningRate 0.0672 Epoch: 3 Global Step: 149610 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:30:55,248-Speed 2623.34 samples/sec Loss 11.1541 LearningRate 0.0672 Epoch: 3 Global Step: 149620 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:30:59,142-Speed 2630.39 samples/sec Loss 11.2465 LearningRate 0.0672 Epoch: 3 Global Step: 149630 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:31:03,034-Speed 2631.72 samples/sec Loss 11.3367 LearningRate 0.0672 Epoch: 3 Global Step: 149640 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:31:06,930-Speed 2629.40 samples/sec Loss 11.2130 LearningRate 0.0672 Epoch: 3 Global Step: 149650 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:31:10,842-Speed 2617.93 samples/sec Loss 11.3370 LearningRate 0.0672 Epoch: 3 Global Step: 149660 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:31:14,931-Speed 2504.72 samples/sec Loss 11.2858 LearningRate 0.0672 Epoch: 3 Global Step: 149670 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:31:18,832-Speed 2626.25 samples/sec Loss 11.2174 LearningRate 0.0672 Epoch: 3 Global Step: 149680 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:31:22,731-Speed 2626.64 samples/sec Loss 11.2369 LearningRate 0.0672 Epoch: 3 Global Step: 149690 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:31:26,661-Speed 2606.51 samples/sec Loss 11.1978 LearningRate 0.0672 Epoch: 3 Global Step: 149700 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:31:30,564-Speed 2624.66 samples/sec Loss 11.2191 LearningRate 0.0672 Epoch: 3 Global Step: 149710 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:31:34,456-Speed 2631.53 samples/sec Loss 11.3366 LearningRate 0.0672 Epoch: 3 Global Step: 149720 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:31:38,350-Speed 2629.78 samples/sec Loss 11.2383 LearningRate 0.0672 Epoch: 3 Global Step: 149730 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:31:42,291-Speed 2599.39 samples/sec Loss 11.1974 LearningRate 0.0672 Epoch: 3 Global Step: 149740 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:31:46,186-Speed 2629.82 samples/sec Loss 11.2605 LearningRate 0.0672 Epoch: 3 Global Step: 149750 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:31:50,119-Speed 2604.57 samples/sec Loss 11.2805 LearningRate 0.0672 Epoch: 3 Global Step: 149760 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:31:54,026-Speed 2621.77 samples/sec Loss 11.3401 LearningRate 0.0672 Epoch: 3 Global Step: 149770 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:31:57,934-Speed 2621.23 samples/sec Loss 11.3692 LearningRate 0.0671 Epoch: 3 Global Step: 149780 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:32:01,827-Speed 2630.68 samples/sec Loss 11.1901 LearningRate 0.0671 Epoch: 3 Global Step: 149790 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:32:05,736-Speed 2620.40 samples/sec Loss 11.1693 LearningRate 0.0671 Epoch: 3 Global Step: 149800 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:32:09,629-Speed 2631.08 samples/sec Loss 11.0744 LearningRate 0.0671 Epoch: 3 Global Step: 149810 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:32:13,524-Speed 2630.13 samples/sec Loss 11.3814 LearningRate 0.0671 Epoch: 3 Global Step: 149820 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:32:17,409-Speed 2636.01 samples/sec Loss 11.3633 LearningRate 0.0671 Epoch: 3 Global Step: 149830 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:32:21,308-Speed 2627.30 samples/sec Loss 11.1319 LearningRate 0.0671 Epoch: 3 Global Step: 149840 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:32:25,201-Speed 2631.34 samples/sec Loss 11.2255 LearningRate 0.0671 Epoch: 3 Global Step: 149850 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:32:29,108-Speed 2621.69 samples/sec Loss 11.2961 LearningRate 0.0671 Epoch: 3 Global Step: 149860 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:32:33,009-Speed 2625.46 samples/sec Loss 11.3622 LearningRate 0.0671 Epoch: 3 Global Step: 149870 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:32:36,904-Speed 2629.79 samples/sec Loss 11.2460 LearningRate 0.0671 Epoch: 3 Global Step: 149880 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:32:40,798-Speed 2630.15 samples/sec Loss 11.1109 LearningRate 0.0671 Epoch: 3 Global Step: 149890 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:32:44,693-Speed 2630.06 samples/sec Loss 11.1618 LearningRate 0.0671 Epoch: 3 Global Step: 149900 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:32:48,586-Speed 2631.18 samples/sec Loss 11.3483 LearningRate 0.0671 Epoch: 3 Global Step: 149910 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:32:52,485-Speed 2627.05 samples/sec Loss 11.3089 LearningRate 0.0671 Epoch: 3 Global Step: 149920 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:32:56,386-Speed 2625.67 samples/sec Loss 11.3132 LearningRate 0.0671 Epoch: 3 Global Step: 149930 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:33:00,281-Speed 2629.91 samples/sec Loss 11.0319 LearningRate 0.0671 Epoch: 3 Global Step: 149940 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:33:04,160-Speed 2639.82 samples/sec Loss 11.2636 LearningRate 0.0671 Epoch: 3 Global Step: 149950 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:33:08,058-Speed 2627.84 samples/sec Loss 11.2228 LearningRate 0.0671 Epoch: 3 Global Step: 149960 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:33:11,950-Speed 2632.02 samples/sec Loss 11.3525 LearningRate 0.0671 Epoch: 3 Global Step: 149970 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:33:15,849-Speed 2627.21 samples/sec Loss 11.2732 LearningRate 0.0671 Epoch: 3 Global Step: 149980 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:33:19,742-Speed 2631.30 samples/sec Loss 11.2130 LearningRate 0.0671 Epoch: 3 Global Step: 149990 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:33:23,640-Speed 2627.59 samples/sec Loss 11.1958 LearningRate 0.0671 Epoch: 3 Global Step: 150000 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:34:07,413-[lfw][150000]XNorm: 23.773956
Training: 2022-04-13 12:34:07,414-[lfw][150000]Accuracy-Flip: 0.99750+-0.00271
Training: 2022-04-13 12:34:07,415-[lfw][150000]Accuracy-Highest: 0.99783
Training: 2022-04-13 12:34:57,644-[cfp_fp][150000]XNorm: 21.455639
Training: 2022-04-13 12:34:57,645-[cfp_fp][150000]Accuracy-Flip: 0.98100+-0.00626
Training: 2022-04-13 12:34:57,646-[cfp_fp][150000]Accuracy-Highest: 0.98100
Training: 2022-04-13 12:35:41,139-[agedb_30][150000]XNorm: 23.533388
Training: 2022-04-13 12:35:41,140-[agedb_30][150000]Accuracy-Flip: 0.97000+-0.00645
Training: 2022-04-13 12:35:41,141-[agedb_30][150000]Accuracy-Highest: 0.97000
Training: 2022-04-13 12:35:45,007-Speed 72.44 samples/sec Loss 11.2658 LearningRate 0.0671 Epoch: 3 Global Step: 150010 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:35:48,877-Speed 2646.72 samples/sec Loss 11.3336 LearningRate 0.0671 Epoch: 3 Global Step: 150020 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:35:52,760-Speed 2638.03 samples/sec Loss 11.4217 LearningRate 0.0671 Epoch: 3 Global Step: 150030 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:35:56,636-Speed 2642.86 samples/sec Loss 11.3243 LearningRate 0.0671 Epoch: 3 Global Step: 150040 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:36:00,511-Speed 2643.62 samples/sec Loss 11.3368 LearningRate 0.0671 Epoch: 3 Global Step: 150050 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:36:04,392-Speed 2640.03 samples/sec Loss 11.3843 LearningRate 0.0671 Epoch: 3 Global Step: 150060 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:36:08,258-Speed 2649.38 samples/sec Loss 11.1806 LearningRate 0.0671 Epoch: 3 Global Step: 150070 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:36:12,135-Speed 2641.39 samples/sec Loss 11.1321 LearningRate 0.0671 Epoch: 3 Global Step: 150080 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:36:16,095-Speed 2587.12 samples/sec Loss 11.3816 LearningRate 0.0671 Epoch: 3 Global Step: 150090 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:36:19,997-Speed 2625.27 samples/sec Loss 11.2845 LearningRate 0.0671 Epoch: 3 Global Step: 150100 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:36:23,884-Speed 2634.58 samples/sec Loss 11.0953 LearningRate 0.0671 Epoch: 3 Global Step: 150110 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:36:27,771-Speed 2634.84 samples/sec Loss 11.2218 LearningRate 0.0671 Epoch: 3 Global Step: 150120 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:36:31,656-Speed 2637.31 samples/sec Loss 11.1449 LearningRate 0.0671 Epoch: 3 Global Step: 150130 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:36:35,543-Speed 2634.58 samples/sec Loss 11.2068 LearningRate 0.0671 Epoch: 3 Global Step: 150140 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:36:39,443-Speed 2625.98 samples/sec Loss 11.2226 LearningRate 0.0671 Epoch: 3 Global Step: 150150 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:36:43,332-Speed 2634.29 samples/sec Loss 11.2967 LearningRate 0.0671 Epoch: 3 Global Step: 150160 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:36:47,241-Speed 2620.06 samples/sec Loss 11.2505 LearningRate 0.0671 Epoch: 3 Global Step: 150170 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:36:51,140-Speed 2627.53 samples/sec Loss 11.3009 LearningRate 0.0671 Epoch: 3 Global Step: 150180 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:36:55,029-Speed 2634.04 samples/sec Loss 11.2855 LearningRate 0.0671 Epoch: 3 Global Step: 150190 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:36:58,886-Speed 2655.06 samples/sec Loss 11.1107 LearningRate 0.0671 Epoch: 3 Global Step: 150200 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:37:02,786-Speed 2626.32 samples/sec Loss 10.9861 LearningRate 0.0671 Epoch: 3 Global Step: 150210 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:37:06,674-Speed 2634.50 samples/sec Loss 11.2671 LearningRate 0.0671 Epoch: 3 Global Step: 150220 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:37:10,570-Speed 2629.42 samples/sec Loss 11.1470 LearningRate 0.0671 Epoch: 3 Global Step: 150230 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:37:14,475-Speed 2622.53 samples/sec Loss 11.2473 LearningRate 0.0671 Epoch: 3 Global Step: 150240 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:37:18,376-Speed 2625.56 samples/sec Loss 11.1963 LearningRate 0.0671 Epoch: 3 Global Step: 150250 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:37:22,271-Speed 2629.70 samples/sec Loss 11.1708 LearningRate 0.0671 Epoch: 3 Global Step: 150260 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:37:26,169-Speed 2628.01 samples/sec Loss 11.1549 LearningRate 0.0671 Epoch: 3 Global Step: 150270 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:37:30,094-Speed 2609.37 samples/sec Loss 11.1859 LearningRate 0.0671 Epoch: 3 Global Step: 150280 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:37:34,014-Speed 2613.27 samples/sec Loss 11.2560 LearningRate 0.0670 Epoch: 3 Global Step: 150290 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:37:37,918-Speed 2623.04 samples/sec Loss 11.2425 LearningRate 0.0670 Epoch: 3 Global Step: 150300 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:37:41,888-Speed 2580.77 samples/sec Loss 11.1935 LearningRate 0.0670 Epoch: 3 Global Step: 150310 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:37:45,783-Speed 2629.64 samples/sec Loss 11.1098 LearningRate 0.0670 Epoch: 3 Global Step: 150320 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:37:49,688-Speed 2622.77 samples/sec Loss 11.0595 LearningRate 0.0670 Epoch: 3 Global Step: 150330 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:37:53,608-Speed 2612.98 samples/sec Loss 11.2660 LearningRate 0.0670 Epoch: 3 Global Step: 150340 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:37:57,520-Speed 2618.38 samples/sec Loss 11.2344 LearningRate 0.0670 Epoch: 3 Global Step: 150350 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:38:01,445-Speed 2609.73 samples/sec Loss 11.1905 LearningRate 0.0670 Epoch: 3 Global Step: 150360 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:38:05,374-Speed 2606.60 samples/sec Loss 11.1843 LearningRate 0.0670 Epoch: 3 Global Step: 150370 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:38:09,268-Speed 2630.61 samples/sec Loss 11.1058 LearningRate 0.0670 Epoch: 3 Global Step: 150380 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:38:13,163-Speed 2629.95 samples/sec Loss 11.2435 LearningRate 0.0670 Epoch: 3 Global Step: 150390 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:38:17,055-Speed 2632.09 samples/sec Loss 11.3649 LearningRate 0.0670 Epoch: 3 Global Step: 150400 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:38:20,942-Speed 2634.82 samples/sec Loss 11.1204 LearningRate 0.0670 Epoch: 3 Global Step: 150410 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:38:24,850-Speed 2621.27 samples/sec Loss 11.2142 LearningRate 0.0670 Epoch: 3 Global Step: 150420 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:38:28,759-Speed 2620.12 samples/sec Loss 11.2280 LearningRate 0.0670 Epoch: 3 Global Step: 150430 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:38:32,659-Speed 2626.06 samples/sec Loss 11.2499 LearningRate 0.0670 Epoch: 3 Global Step: 150440 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:38:36,564-Speed 2622.88 samples/sec Loss 11.1691 LearningRate 0.0670 Epoch: 3 Global Step: 150450 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:38:40,461-Speed 2628.72 samples/sec Loss 11.1873 LearningRate 0.0670 Epoch: 3 Global Step: 150460 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:38:44,353-Speed 2631.39 samples/sec Loss 11.2816 LearningRate 0.0670 Epoch: 3 Global Step: 150470 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:38:48,250-Speed 2628.53 samples/sec Loss 11.0650 LearningRate 0.0670 Epoch: 3 Global Step: 150480 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:38:52,140-Speed 2632.73 samples/sec Loss 11.1118 LearningRate 0.0670 Epoch: 3 Global Step: 150490 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:38:56,075-Speed 2603.05 samples/sec Loss 11.3001 LearningRate 0.0670 Epoch: 3 Global Step: 150500 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:38:59,974-Speed 2627.30 samples/sec Loss 11.3237 LearningRate 0.0670 Epoch: 3 Global Step: 150510 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:39:03,877-Speed 2624.51 samples/sec Loss 11.2472 LearningRate 0.0670 Epoch: 3 Global Step: 150520 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:39:07,774-Speed 2628.11 samples/sec Loss 11.2070 LearningRate 0.0670 Epoch: 3 Global Step: 150530 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:39:11,690-Speed 2616.29 samples/sec Loss 11.1619 LearningRate 0.0670 Epoch: 3 Global Step: 150540 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:39:15,631-Speed 2598.43 samples/sec Loss 11.1989 LearningRate 0.0670 Epoch: 3 Global Step: 150550 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:39:19,543-Speed 2619.04 samples/sec Loss 11.3531 LearningRate 0.0670 Epoch: 3 Global Step: 150560 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:39:23,450-Speed 2621.50 samples/sec Loss 11.2000 LearningRate 0.0670 Epoch: 3 Global Step: 150570 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:39:27,356-Speed 2621.91 samples/sec Loss 11.2943 LearningRate 0.0670 Epoch: 3 Global Step: 150580 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:39:31,249-Speed 2631.58 samples/sec Loss 11.2218 LearningRate 0.0670 Epoch: 3 Global Step: 150590 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:39:35,138-Speed 2633.52 samples/sec Loss 11.2876 LearningRate 0.0670 Epoch: 3 Global Step: 150600 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:39:39,029-Speed 2632.10 samples/sec Loss 11.2725 LearningRate 0.0670 Epoch: 3 Global Step: 150610 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:39:42,920-Speed 2632.25 samples/sec Loss 11.2155 LearningRate 0.0670 Epoch: 3 Global Step: 150620 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:39:46,816-Speed 2628.72 samples/sec Loss 11.3199 LearningRate 0.0670 Epoch: 3 Global Step: 150630 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:39:50,709-Speed 2631.53 samples/sec Loss 11.2476 LearningRate 0.0670 Epoch: 3 Global Step: 150640 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:39:54,584-Speed 2643.24 samples/sec Loss 11.2546 LearningRate 0.0670 Epoch: 3 Global Step: 150650 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:39:58,475-Speed 2632.56 samples/sec Loss 11.2389 LearningRate 0.0670 Epoch: 3 Global Step: 150660 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:40:02,371-Speed 2629.08 samples/sec Loss 11.1580 LearningRate 0.0670 Epoch: 3 Global Step: 150670 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:40:06,262-Speed 2631.96 samples/sec Loss 11.1848 LearningRate 0.0670 Epoch: 3 Global Step: 150680 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:40:10,156-Speed 2630.58 samples/sec Loss 11.3783 LearningRate 0.0670 Epoch: 3 Global Step: 150690 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:40:14,055-Speed 2627.10 samples/sec Loss 11.1968 LearningRate 0.0670 Epoch: 3 Global Step: 150700 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:40:17,950-Speed 2629.47 samples/sec Loss 11.2736 LearningRate 0.0670 Epoch: 3 Global Step: 150710 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:40:21,845-Speed 2630.19 samples/sec Loss 11.3500 LearningRate 0.0670 Epoch: 3 Global Step: 150720 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:40:25,741-Speed 2628.88 samples/sec Loss 11.2064 LearningRate 0.0670 Epoch: 3 Global Step: 150730 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:40:29,635-Speed 2630.48 samples/sec Loss 11.3264 LearningRate 0.0670 Epoch: 3 Global Step: 150740 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:40:33,540-Speed 2622.58 samples/sec Loss 11.1431 LearningRate 0.0670 Epoch: 3 Global Step: 150750 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:40:37,446-Speed 2622.47 samples/sec Loss 11.1300 LearningRate 0.0670 Epoch: 3 Global Step: 150760 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:40:41,353-Speed 2621.27 samples/sec Loss 11.3382 LearningRate 0.0670 Epoch: 3 Global Step: 150770 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:40:45,246-Speed 2630.56 samples/sec Loss 11.1434 LearningRate 0.0670 Epoch: 3 Global Step: 150780 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:40:49,105-Speed 2654.11 samples/sec Loss 11.2413 LearningRate 0.0670 Epoch: 3 Global Step: 150790 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:40:53,000-Speed 2630.35 samples/sec Loss 11.9287 LearningRate 0.0669 Epoch: 3 Global Step: 150800 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:40:56,902-Speed 2624.68 samples/sec Loss 11.6009 LearningRate 0.0669 Epoch: 3 Global Step: 150810 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:41:00,796-Speed 2630.88 samples/sec Loss 11.2951 LearningRate 0.0669 Epoch: 3 Global Step: 150820 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:41:04,695-Speed 2626.66 samples/sec Loss 11.3713 LearningRate 0.0669 Epoch: 3 Global Step: 150830 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:41:08,595-Speed 2626.46 samples/sec Loss 11.2565 LearningRate 0.0669 Epoch: 3 Global Step: 150840 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:41:12,498-Speed 2624.20 samples/sec Loss 11.1790 LearningRate 0.0669 Epoch: 3 Global Step: 150850 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:41:16,413-Speed 2615.89 samples/sec Loss 11.2375 LearningRate 0.0669 Epoch: 3 Global Step: 150860 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:41:20,348-Speed 2602.96 samples/sec Loss 11.2973 LearningRate 0.0669 Epoch: 3 Global Step: 150870 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:41:24,246-Speed 2628.32 samples/sec Loss 11.1466 LearningRate 0.0669 Epoch: 3 Global Step: 150880 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 12:41:28,153-Speed 2621.57 samples/sec Loss 11.2189 LearningRate 0.0669 Epoch: 3 Global Step: 150890 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:41:32,054-Speed 2625.86 samples/sec Loss 11.1747 LearningRate 0.0669 Epoch: 3 Global Step: 150900 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:41:35,954-Speed 2626.45 samples/sec Loss 11.2326 LearningRate 0.0669 Epoch: 3 Global Step: 150910 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:41:39,864-Speed 2619.36 samples/sec Loss 11.3105 LearningRate 0.0669 Epoch: 3 Global Step: 150920 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:41:43,759-Speed 2629.77 samples/sec Loss 11.1017 LearningRate 0.0669 Epoch: 3 Global Step: 150930 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:41:47,658-Speed 2626.69 samples/sec Loss 11.2284 LearningRate 0.0669 Epoch: 3 Global Step: 150940 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:41:51,552-Speed 2630.47 samples/sec Loss 11.0929 LearningRate 0.0669 Epoch: 3 Global Step: 150950 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:41:55,476-Speed 2609.95 samples/sec Loss 11.2216 LearningRate 0.0669 Epoch: 3 Global Step: 150960 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:41:59,393-Speed 2615.33 samples/sec Loss 11.2697 LearningRate 0.0669 Epoch: 3 Global Step: 150970 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:42:03,293-Speed 2626.26 samples/sec Loss 11.2435 LearningRate 0.0669 Epoch: 3 Global Step: 150980 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:42:07,205-Speed 2618.51 samples/sec Loss 11.1684 LearningRate 0.0669 Epoch: 3 Global Step: 150990 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:42:11,101-Speed 2628.90 samples/sec Loss 11.2546 LearningRate 0.0669 Epoch: 3 Global Step: 151000 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:42:14,993-Speed 2631.48 samples/sec Loss 11.1903 LearningRate 0.0669 Epoch: 3 Global Step: 151010 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:42:18,898-Speed 2623.40 samples/sec Loss 11.0984 LearningRate 0.0669 Epoch: 3 Global Step: 151020 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:42:22,800-Speed 2624.85 samples/sec Loss 11.2372 LearningRate 0.0669 Epoch: 3 Global Step: 151030 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:42:26,691-Speed 2632.14 samples/sec Loss 11.2275 LearningRate 0.0669 Epoch: 3 Global Step: 151040 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:42:30,586-Speed 2630.08 samples/sec Loss 11.0791 LearningRate 0.0669 Epoch: 3 Global Step: 151050 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:42:34,476-Speed 2632.87 samples/sec Loss 11.1524 LearningRate 0.0669 Epoch: 3 Global Step: 151060 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:42:38,370-Speed 2630.29 samples/sec Loss 11.2594 LearningRate 0.0669 Epoch: 3 Global Step: 151070 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:42:42,265-Speed 2629.48 samples/sec Loss 11.3653 LearningRate 0.0669 Epoch: 3 Global Step: 151080 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:42:46,165-Speed 2626.60 samples/sec Loss 11.2158 LearningRate 0.0669 Epoch: 3 Global Step: 151090 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:42:50,059-Speed 2630.35 samples/sec Loss 11.0708 LearningRate 0.0669 Epoch: 3 Global Step: 151100 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:42:53,956-Speed 2628.62 samples/sec Loss 11.1676 LearningRate 0.0669 Epoch: 3 Global Step: 151110 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:42:57,854-Speed 2627.67 samples/sec Loss 11.3767 LearningRate 0.0669 Epoch: 3 Global Step: 151120 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:43:01,774-Speed 2612.95 samples/sec Loss 11.2296 LearningRate 0.0669 Epoch: 3 Global Step: 151130 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:43:05,678-Speed 2623.36 samples/sec Loss 11.2419 LearningRate 0.0669 Epoch: 3 Global Step: 151140 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:43:09,573-Speed 2629.64 samples/sec Loss 11.1024 LearningRate 0.0669 Epoch: 3 Global Step: 151150 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:43:13,450-Speed 2641.96 samples/sec Loss 11.2318 LearningRate 0.0669 Epoch: 3 Global Step: 151160 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:43:17,346-Speed 2628.82 samples/sec Loss 11.3622 LearningRate 0.0669 Epoch: 3 Global Step: 151170 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:43:21,221-Speed 2643.46 samples/sec Loss 11.1885 LearningRate 0.0669 Epoch: 3 Global Step: 151180 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:43:25,130-Speed 2619.85 samples/sec Loss 11.1192 LearningRate 0.0669 Epoch: 3 Global Step: 151190 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:43:29,023-Speed 2631.59 samples/sec Loss 11.2900 LearningRate 0.0669 Epoch: 3 Global Step: 151200 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:43:32,918-Speed 2629.68 samples/sec Loss 11.1474 LearningRate 0.0669 Epoch: 3 Global Step: 151210 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:43:36,810-Speed 2631.70 samples/sec Loss 11.2814 LearningRate 0.0669 Epoch: 3 Global Step: 151220 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:43:40,701-Speed 2631.74 samples/sec Loss 11.1652 LearningRate 0.0669 Epoch: 3 Global Step: 151230 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:43:44,594-Speed 2630.93 samples/sec Loss 11.3139 LearningRate 0.0669 Epoch: 3 Global Step: 151240 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:43:48,490-Speed 2630.21 samples/sec Loss 11.1118 LearningRate 0.0669 Epoch: 3 Global Step: 151250 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:43:52,384-Speed 2629.71 samples/sec Loss 11.1918 LearningRate 0.0669 Epoch: 3 Global Step: 151260 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:43:56,279-Speed 2630.49 samples/sec Loss 11.2053 LearningRate 0.0669 Epoch: 3 Global Step: 151270 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:44:00,178-Speed 2626.83 samples/sec Loss 11.0513 LearningRate 0.0669 Epoch: 3 Global Step: 151280 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:44:04,080-Speed 2625.03 samples/sec Loss 11.2635 LearningRate 0.0669 Epoch: 3 Global Step: 151290 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:44:07,979-Speed 2626.32 samples/sec Loss 11.2654 LearningRate 0.0669 Epoch: 3 Global Step: 151300 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:44:11,895-Speed 2616.08 samples/sec Loss 11.2495 LearningRate 0.0668 Epoch: 3 Global Step: 151310 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:44:15,785-Speed 2632.93 samples/sec Loss 11.1123 LearningRate 0.0668 Epoch: 3 Global Step: 151320 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:44:19,721-Speed 2602.64 samples/sec Loss 11.1178 LearningRate 0.0668 Epoch: 3 Global Step: 151330 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:44:23,611-Speed 2633.16 samples/sec Loss 11.1610 LearningRate 0.0668 Epoch: 3 Global Step: 151340 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:44:27,503-Speed 2633.86 samples/sec Loss 11.2857 LearningRate 0.0668 Epoch: 3 Global Step: 151350 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:44:31,396-Speed 2630.74 samples/sec Loss 11.1893 LearningRate 0.0668 Epoch: 3 Global Step: 151360 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:44:35,288-Speed 2631.36 samples/sec Loss 11.0503 LearningRate 0.0668 Epoch: 3 Global Step: 151370 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:44:39,184-Speed 2629.05 samples/sec Loss 11.2136 LearningRate 0.0668 Epoch: 3 Global Step: 151380 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:44:43,096-Speed 2617.78 samples/sec Loss 11.1732 LearningRate 0.0668 Epoch: 3 Global Step: 151390 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:44:47,009-Speed 2618.14 samples/sec Loss 11.2076 LearningRate 0.0668 Epoch: 3 Global Step: 151400 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:44:50,921-Speed 2618.59 samples/sec Loss 11.3212 LearningRate 0.0668 Epoch: 3 Global Step: 151410 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:44:54,831-Speed 2618.86 samples/sec Loss 11.0932 LearningRate 0.0668 Epoch: 3 Global Step: 151420 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:44:58,727-Speed 2629.56 samples/sec Loss 11.1054 LearningRate 0.0668 Epoch: 3 Global Step: 151430 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:45:02,623-Speed 2628.71 samples/sec Loss 11.2769 LearningRate 0.0668 Epoch: 3 Global Step: 151440 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:45:06,516-Speed 2630.92 samples/sec Loss 11.1636 LearningRate 0.0668 Epoch: 3 Global Step: 151450 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:45:10,409-Speed 2630.76 samples/sec Loss 11.0719 LearningRate 0.0668 Epoch: 3 Global Step: 151460 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:45:14,305-Speed 2629.10 samples/sec Loss 11.2080 LearningRate 0.0668 Epoch: 3 Global Step: 151470 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:45:18,208-Speed 2624.99 samples/sec Loss 11.0664 LearningRate 0.0668 Epoch: 3 Global Step: 151480 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:45:22,110-Speed 2624.33 samples/sec Loss 11.2349 LearningRate 0.0668 Epoch: 3 Global Step: 151490 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:45:26,009-Speed 2627.43 samples/sec Loss 11.2000 LearningRate 0.0668 Epoch: 3 Global Step: 151500 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:45:29,906-Speed 2628.18 samples/sec Loss 11.1942 LearningRate 0.0668 Epoch: 3 Global Step: 151510 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:45:33,777-Speed 2645.20 samples/sec Loss 11.1060 LearningRate 0.0668 Epoch: 3 Global Step: 151520 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:45:37,670-Speed 2631.36 samples/sec Loss 11.1539 LearningRate 0.0668 Epoch: 3 Global Step: 151530 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:45:41,571-Speed 2625.47 samples/sec Loss 11.0251 LearningRate 0.0668 Epoch: 3 Global Step: 151540 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:45:45,475-Speed 2623.25 samples/sec Loss 11.1792 LearningRate 0.0668 Epoch: 3 Global Step: 151550 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:45:49,382-Speed 2621.90 samples/sec Loss 11.1662 LearningRate 0.0668 Epoch: 3 Global Step: 151560 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:45:53,284-Speed 2624.92 samples/sec Loss 11.2041 LearningRate 0.0668 Epoch: 3 Global Step: 151570 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:45:57,196-Speed 2618.85 samples/sec Loss 11.1114 LearningRate 0.0668 Epoch: 3 Global Step: 151580 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:46:01,089-Speed 2630.75 samples/sec Loss 11.1029 LearningRate 0.0668 Epoch: 3 Global Step: 151590 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:46:05,000-Speed 2618.66 samples/sec Loss 11.2332 LearningRate 0.0668 Epoch: 3 Global Step: 151600 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:46:08,893-Speed 2630.49 samples/sec Loss 11.0893 LearningRate 0.0668 Epoch: 3 Global Step: 151610 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:46:12,816-Speed 2611.38 samples/sec Loss 11.2141 LearningRate 0.0668 Epoch: 3 Global Step: 151620 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:46:16,701-Speed 2636.21 samples/sec Loss 11.1309 LearningRate 0.0668 Epoch: 3 Global Step: 151630 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:46:20,604-Speed 2624.20 samples/sec Loss 11.3117 LearningRate 0.0668 Epoch: 3 Global Step: 151640 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:46:24,614-Speed 2554.24 samples/sec Loss 11.2298 LearningRate 0.0668 Epoch: 3 Global Step: 151650 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:46:28,527-Speed 2618.12 samples/sec Loss 11.1459 LearningRate 0.0668 Epoch: 3 Global Step: 151660 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:46:32,430-Speed 2624.20 samples/sec Loss 11.1483 LearningRate 0.0668 Epoch: 3 Global Step: 151670 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:46:36,324-Speed 2630.17 samples/sec Loss 11.1465 LearningRate 0.0668 Epoch: 3 Global Step: 151680 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:46:40,229-Speed 2622.80 samples/sec Loss 11.1426 LearningRate 0.0668 Epoch: 3 Global Step: 151690 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:46:44,122-Speed 2632.01 samples/sec Loss 11.1959 LearningRate 0.0668 Epoch: 3 Global Step: 151700 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:46:48,014-Speed 2631.74 samples/sec Loss 11.2421 LearningRate 0.0668 Epoch: 3 Global Step: 151710 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:46:51,912-Speed 2627.52 samples/sec Loss 11.3525 LearningRate 0.0668 Epoch: 3 Global Step: 151720 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:46:55,807-Speed 2629.52 samples/sec Loss 11.1625 LearningRate 0.0668 Epoch: 3 Global Step: 151730 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:46:59,702-Speed 2630.18 samples/sec Loss 11.3711 LearningRate 0.0668 Epoch: 3 Global Step: 151740 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:47:03,597-Speed 2629.47 samples/sec Loss 11.2449 LearningRate 0.0668 Epoch: 3 Global Step: 151750 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:47:07,499-Speed 2625.15 samples/sec Loss 11.2041 LearningRate 0.0668 Epoch: 3 Global Step: 151760 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:47:11,519-Speed 2548.20 samples/sec Loss 11.2124 LearningRate 0.0668 Epoch: 3 Global Step: 151770 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:47:15,468-Speed 2593.31 samples/sec Loss 11.3074 LearningRate 0.0668 Epoch: 3 Global Step: 151780 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:47:19,365-Speed 2628.21 samples/sec Loss 11.2162 LearningRate 0.0668 Epoch: 3 Global Step: 151790 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:47:23,260-Speed 2629.67 samples/sec Loss 11.2146 LearningRate 0.0668 Epoch: 3 Global Step: 151800 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:47:27,153-Speed 2631.29 samples/sec Loss 11.2773 LearningRate 0.0667 Epoch: 3 Global Step: 151810 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:47:31,048-Speed 2629.38 samples/sec Loss 11.2531 LearningRate 0.0667 Epoch: 3 Global Step: 151820 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:47:34,950-Speed 2625.27 samples/sec Loss 11.1653 LearningRate 0.0667 Epoch: 3 Global Step: 151830 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:47:38,854-Speed 2622.92 samples/sec Loss 11.2518 LearningRate 0.0667 Epoch: 3 Global Step: 151840 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:47:42,756-Speed 2625.32 samples/sec Loss 11.0536 LearningRate 0.0667 Epoch: 3 Global Step: 151850 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:47:46,666-Speed 2619.94 samples/sec Loss 11.1362 LearningRate 0.0667 Epoch: 3 Global Step: 151860 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:47:50,546-Speed 2639.76 samples/sec Loss 11.0961 LearningRate 0.0667 Epoch: 3 Global Step: 151870 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:47:54,443-Speed 2628.03 samples/sec Loss 11.0880 LearningRate 0.0667 Epoch: 3 Global Step: 151880 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:47:58,341-Speed 2627.78 samples/sec Loss 11.1794 LearningRate 0.0667 Epoch: 3 Global Step: 151890 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:48:02,333-Speed 2565.44 samples/sec Loss 11.2260 LearningRate 0.0667 Epoch: 3 Global Step: 151900 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:48:06,228-Speed 2629.79 samples/sec Loss 11.0702 LearningRate 0.0667 Epoch: 3 Global Step: 151910 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:48:10,120-Speed 2631.24 samples/sec Loss 11.1800 LearningRate 0.0667 Epoch: 3 Global Step: 151920 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:48:14,016-Speed 2629.71 samples/sec Loss 11.2489 LearningRate 0.0667 Epoch: 3 Global Step: 151930 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:48:17,914-Speed 2627.26 samples/sec Loss 11.3266 LearningRate 0.0667 Epoch: 3 Global Step: 151940 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:48:21,809-Speed 2629.67 samples/sec Loss 11.3156 LearningRate 0.0667 Epoch: 3 Global Step: 151950 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:48:25,706-Speed 2628.26 samples/sec Loss 11.2760 LearningRate 0.0667 Epoch: 3 Global Step: 151960 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:48:29,610-Speed 2623.76 samples/sec Loss 11.1628 LearningRate 0.0667 Epoch: 3 Global Step: 151970 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:48:33,496-Speed 2636.48 samples/sec Loss 11.1918 LearningRate 0.0667 Epoch: 3 Global Step: 151980 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:48:37,402-Speed 2621.93 samples/sec Loss 11.3401 LearningRate 0.0667 Epoch: 3 Global Step: 151990 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:48:41,306-Speed 2623.40 samples/sec Loss 11.0194 LearningRate 0.0667 Epoch: 3 Global Step: 152000 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:48:45,212-Speed 2623.04 samples/sec Loss 11.1987 LearningRate 0.0667 Epoch: 3 Global Step: 152010 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:48:49,114-Speed 2624.46 samples/sec Loss 11.1670 LearningRate 0.0667 Epoch: 3 Global Step: 152020 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:48:53,031-Speed 2614.81 samples/sec Loss 11.0334 LearningRate 0.0667 Epoch: 3 Global Step: 152030 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:48:56,931-Speed 2626.06 samples/sec Loss 11.1658 LearningRate 0.0667 Epoch: 3 Global Step: 152040 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:49:00,836-Speed 2623.17 samples/sec Loss 11.1822 LearningRate 0.0667 Epoch: 3 Global Step: 152050 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:49:04,743-Speed 2621.30 samples/sec Loss 11.1838 LearningRate 0.0667 Epoch: 3 Global Step: 152060 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:49:08,637-Speed 2630.73 samples/sec Loss 11.2061 LearningRate 0.0667 Epoch: 3 Global Step: 152070 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:49:12,535-Speed 2627.23 samples/sec Loss 11.2304 LearningRate 0.0667 Epoch: 3 Global Step: 152080 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:49:16,426-Speed 2632.24 samples/sec Loss 11.1708 LearningRate 0.0667 Epoch: 3 Global Step: 152090 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:49:20,329-Speed 2624.91 samples/sec Loss 11.0306 LearningRate 0.0667 Epoch: 3 Global Step: 152100 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:49:24,221-Speed 2631.02 samples/sec Loss 11.0912 LearningRate 0.0667 Epoch: 3 Global Step: 152110 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:49:28,108-Speed 2635.23 samples/sec Loss 11.3664 LearningRate 0.0667 Epoch: 3 Global Step: 152120 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:49:32,000-Speed 2631.74 samples/sec Loss 11.1163 LearningRate 0.0667 Epoch: 3 Global Step: 152130 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:49:35,904-Speed 2623.94 samples/sec Loss 11.1366 LearningRate 0.0667 Epoch: 3 Global Step: 152140 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:49:39,785-Speed 2638.53 samples/sec Loss 11.1841 LearningRate 0.0667 Epoch: 3 Global Step: 152150 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:49:43,681-Speed 2629.43 samples/sec Loss 11.1442 LearningRate 0.0667 Epoch: 3 Global Step: 152160 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:49:47,610-Speed 2606.66 samples/sec Loss 11.2008 LearningRate 0.0667 Epoch: 3 Global Step: 152170 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:49:51,526-Speed 2615.85 samples/sec Loss 11.1621 LearningRate 0.0667 Epoch: 3 Global Step: 152180 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:49:55,450-Speed 2609.93 samples/sec Loss 11.2031 LearningRate 0.0667 Epoch: 3 Global Step: 152190 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:49:59,366-Speed 2615.95 samples/sec Loss 11.1007 LearningRate 0.0667 Epoch: 3 Global Step: 152200 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:50:03,263-Speed 2627.78 samples/sec Loss 11.1351 LearningRate 0.0667 Epoch: 3 Global Step: 152210 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:50:07,183-Speed 2612.90 samples/sec Loss 11.3012 LearningRate 0.0667 Epoch: 3 Global Step: 152220 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:50:11,082-Speed 2626.84 samples/sec Loss 11.1555 LearningRate 0.0667 Epoch: 3 Global Step: 152230 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:50:15,048-Speed 2582.88 samples/sec Loss 10.9694 LearningRate 0.0667 Epoch: 3 Global Step: 152240 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:50:18,952-Speed 2623.96 samples/sec Loss 11.1953 LearningRate 0.0667 Epoch: 3 Global Step: 152250 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:50:22,853-Speed 2625.48 samples/sec Loss 11.1276 LearningRate 0.0667 Epoch: 3 Global Step: 152260 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:50:26,747-Speed 2630.82 samples/sec Loss 11.2728 LearningRate 0.0667 Epoch: 3 Global Step: 152270 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:50:30,638-Speed 2632.26 samples/sec Loss 11.1773 LearningRate 0.0667 Epoch: 3 Global Step: 152280 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:50:34,536-Speed 2627.44 samples/sec Loss 11.2014 LearningRate 0.0667 Epoch: 3 Global Step: 152290 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:50:38,409-Speed 2644.45 samples/sec Loss 11.0222 LearningRate 0.0667 Epoch: 3 Global Step: 152300 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:50:42,309-Speed 2626.55 samples/sec Loss 11.1134 LearningRate 0.0667 Epoch: 3 Global Step: 152310 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:50:46,205-Speed 2628.86 samples/sec Loss 11.2858 LearningRate 0.0666 Epoch: 3 Global Step: 152320 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:50:50,125-Speed 2613.26 samples/sec Loss 11.2296 LearningRate 0.0666 Epoch: 3 Global Step: 152330 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:50:54,047-Speed 2611.51 samples/sec Loss 11.0949 LearningRate 0.0666 Epoch: 3 Global Step: 152340 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:50:57,952-Speed 2623.22 samples/sec Loss 11.2012 LearningRate 0.0666 Epoch: 3 Global Step: 152350 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:51:01,852-Speed 2626.25 samples/sec Loss 11.1367 LearningRate 0.0666 Epoch: 3 Global Step: 152360 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:51:05,759-Speed 2621.50 samples/sec Loss 11.1134 LearningRate 0.0666 Epoch: 3 Global Step: 152370 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:51:09,663-Speed 2623.18 samples/sec Loss 11.1164 LearningRate 0.0666 Epoch: 3 Global Step: 152380 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:51:13,577-Speed 2616.90 samples/sec Loss 11.0711 LearningRate 0.0666 Epoch: 3 Global Step: 152390 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:51:17,475-Speed 2627.71 samples/sec Loss 11.2832 LearningRate 0.0666 Epoch: 3 Global Step: 152400 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:51:21,372-Speed 2629.01 samples/sec Loss 11.2427 LearningRate 0.0666 Epoch: 3 Global Step: 152410 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:51:25,254-Speed 2638.13 samples/sec Loss 11.0830 LearningRate 0.0666 Epoch: 3 Global Step: 152420 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:51:29,164-Speed 2619.90 samples/sec Loss 11.2852 LearningRate 0.0666 Epoch: 3 Global Step: 152430 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:51:33,070-Speed 2622.01 samples/sec Loss 11.2057 LearningRate 0.0666 Epoch: 3 Global Step: 152440 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:51:36,965-Speed 2629.37 samples/sec Loss 11.1609 LearningRate 0.0666 Epoch: 3 Global Step: 152450 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:51:40,865-Speed 2625.77 samples/sec Loss 11.2043 LearningRate 0.0666 Epoch: 3 Global Step: 152460 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:51:44,766-Speed 2626.19 samples/sec Loss 11.0801 LearningRate 0.0666 Epoch: 3 Global Step: 152470 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:51:48,660-Speed 2630.71 samples/sec Loss 11.1485 LearningRate 0.0666 Epoch: 3 Global Step: 152480 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:51:52,551-Speed 2632.23 samples/sec Loss 11.2306 LearningRate 0.0666 Epoch: 3 Global Step: 152490 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:51:56,447-Speed 2629.31 samples/sec Loss 11.1771 LearningRate 0.0666 Epoch: 3 Global Step: 152500 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:52:00,338-Speed 2631.62 samples/sec Loss 11.2742 LearningRate 0.0666 Epoch: 3 Global Step: 152510 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:52:04,223-Speed 2636.42 samples/sec Loss 11.2088 LearningRate 0.0666 Epoch: 3 Global Step: 152520 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:52:08,119-Speed 2628.94 samples/sec Loss 11.1705 LearningRate 0.0666 Epoch: 3 Global Step: 152530 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:52:12,018-Speed 2627.14 samples/sec Loss 11.1337 LearningRate 0.0666 Epoch: 3 Global Step: 152540 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:52:15,906-Speed 2634.53 samples/sec Loss 11.0117 LearningRate 0.0666 Epoch: 3 Global Step: 152550 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:52:19,815-Speed 2620.14 samples/sec Loss 11.0470 LearningRate 0.0666 Epoch: 3 Global Step: 152560 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:52:23,710-Speed 2630.13 samples/sec Loss 11.3354 LearningRate 0.0666 Epoch: 3 Global Step: 152570 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:52:27,607-Speed 2628.73 samples/sec Loss 11.0470 LearningRate 0.0666 Epoch: 3 Global Step: 152580 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:52:31,505-Speed 2627.32 samples/sec Loss 11.0485 LearningRate 0.0666 Epoch: 3 Global Step: 152590 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:52:35,397-Speed 2631.52 samples/sec Loss 11.1520 LearningRate 0.0666 Epoch: 3 Global Step: 152600 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:52:39,295-Speed 2627.96 samples/sec Loss 11.1474 LearningRate 0.0666 Epoch: 3 Global Step: 152610 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:52:43,190-Speed 2629.84 samples/sec Loss 11.1756 LearningRate 0.0666 Epoch: 3 Global Step: 152620 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:52:47,082-Speed 2631.48 samples/sec Loss 11.0534 LearningRate 0.0666 Epoch: 3 Global Step: 152630 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:52:50,977-Speed 2629.89 samples/sec Loss 11.1026 LearningRate 0.0666 Epoch: 3 Global Step: 152640 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:52:54,870-Speed 2630.19 samples/sec Loss 11.2301 LearningRate 0.0666 Epoch: 3 Global Step: 152650 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:52:58,750-Speed 2640.25 samples/sec Loss 11.2485 LearningRate 0.0666 Epoch: 3 Global Step: 152660 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:53:02,645-Speed 2629.74 samples/sec Loss 11.2554 LearningRate 0.0666 Epoch: 3 Global Step: 152670 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:53:06,541-Speed 2629.37 samples/sec Loss 11.1421 LearningRate 0.0666 Epoch: 3 Global Step: 152680 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:53:10,438-Speed 2628.32 samples/sec Loss 11.1446 LearningRate 0.0666 Epoch: 3 Global Step: 152690 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:53:14,332-Speed 2630.66 samples/sec Loss 11.1798 LearningRate 0.0666 Epoch: 3 Global Step: 152700 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:53:18,243-Speed 2618.78 samples/sec Loss 11.2548 LearningRate 0.0666 Epoch: 3 Global Step: 152710 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:53:22,146-Speed 2624.24 samples/sec Loss 11.1468 LearningRate 0.0666 Epoch: 3 Global Step: 152720 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:53:26,039-Speed 2630.62 samples/sec Loss 11.2225 LearningRate 0.0666 Epoch: 3 Global Step: 152730 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:53:29,933-Speed 2630.72 samples/sec Loss 11.2447 LearningRate 0.0666 Epoch: 3 Global Step: 152740 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:53:33,830-Speed 2629.15 samples/sec Loss 11.1689 LearningRate 0.0666 Epoch: 3 Global Step: 152750 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:53:37,727-Speed 2628.06 samples/sec Loss 11.2180 LearningRate 0.0666 Epoch: 3 Global Step: 152760 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:53:41,627-Speed 2626.21 samples/sec Loss 11.1419 LearningRate 0.0666 Epoch: 3 Global Step: 152770 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:53:45,528-Speed 2625.73 samples/sec Loss 11.1147 LearningRate 0.0666 Epoch: 3 Global Step: 152780 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:53:49,437-Speed 2620.33 samples/sec Loss 11.2238 LearningRate 0.0666 Epoch: 3 Global Step: 152790 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:53:53,341-Speed 2623.79 samples/sec Loss 11.1714 LearningRate 0.0666 Epoch: 3 Global Step: 152800 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:53:57,238-Speed 2628.48 samples/sec Loss 11.1661 LearningRate 0.0666 Epoch: 3 Global Step: 152810 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:54:01,173-Speed 2602.51 samples/sec Loss 11.1061 LearningRate 0.0666 Epoch: 3 Global Step: 152820 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:54:05,073-Speed 2626.40 samples/sec Loss 11.2564 LearningRate 0.0665 Epoch: 3 Global Step: 152830 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:54:08,976-Speed 2624.42 samples/sec Loss 11.0701 LearningRate 0.0665 Epoch: 3 Global Step: 152840 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:54:12,878-Speed 2625.19 samples/sec Loss 10.9925 LearningRate 0.0665 Epoch: 3 Global Step: 152850 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:54:16,779-Speed 2625.29 samples/sec Loss 11.4090 LearningRate 0.0665 Epoch: 3 Global Step: 152860 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:54:20,687-Speed 2621.02 samples/sec Loss 11.2099 LearningRate 0.0665 Epoch: 3 Global Step: 152870 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:54:24,581-Speed 2630.03 samples/sec Loss 11.2344 LearningRate 0.0665 Epoch: 3 Global Step: 152880 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:54:28,460-Speed 2640.24 samples/sec Loss 11.3058 LearningRate 0.0665 Epoch: 3 Global Step: 152890 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:54:32,357-Speed 2628.13 samples/sec Loss 11.1195 LearningRate 0.0665 Epoch: 3 Global Step: 152900 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:54:36,247-Speed 2633.32 samples/sec Loss 10.9399 LearningRate 0.0665 Epoch: 3 Global Step: 152910 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:54:40,138-Speed 2632.89 samples/sec Loss 11.1953 LearningRate 0.0665 Epoch: 3 Global Step: 152920 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:54:44,031-Speed 2630.52 samples/sec Loss 11.1390 LearningRate 0.0665 Epoch: 3 Global Step: 152930 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:54:47,924-Speed 2631.05 samples/sec Loss 11.2131 LearningRate 0.0665 Epoch: 3 Global Step: 152940 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:54:51,816-Speed 2631.98 samples/sec Loss 11.0334 LearningRate 0.0665 Epoch: 3 Global Step: 152950 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:54:55,705-Speed 2633.29 samples/sec Loss 11.0654 LearningRate 0.0665 Epoch: 3 Global Step: 152960 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:54:59,597-Speed 2631.50 samples/sec Loss 11.1513 LearningRate 0.0665 Epoch: 3 Global Step: 152970 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:55:03,486-Speed 2633.51 samples/sec Loss 11.3248 LearningRate 0.0665 Epoch: 3 Global Step: 152980 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:55:07,386-Speed 2626.23 samples/sec Loss 11.0641 LearningRate 0.0665 Epoch: 3 Global Step: 152990 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:55:11,288-Speed 2624.88 samples/sec Loss 11.1964 LearningRate 0.0665 Epoch: 3 Global Step: 153000 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:55:15,190-Speed 2625.18 samples/sec Loss 11.2106 LearningRate 0.0665 Epoch: 3 Global Step: 153010 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:55:19,089-Speed 2627.63 samples/sec Loss 11.0460 LearningRate 0.0665 Epoch: 3 Global Step: 153020 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:55:22,982-Speed 2630.31 samples/sec Loss 10.9780 LearningRate 0.0665 Epoch: 3 Global Step: 153030 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:55:26,879-Speed 2628.42 samples/sec Loss 11.1734 LearningRate 0.0665 Epoch: 3 Global Step: 153040 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:55:30,773-Speed 2630.71 samples/sec Loss 11.1327 LearningRate 0.0665 Epoch: 3 Global Step: 153050 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:55:34,674-Speed 2625.38 samples/sec Loss 11.0324 LearningRate 0.0665 Epoch: 3 Global Step: 153060 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:55:38,568-Speed 2629.80 samples/sec Loss 10.9947 LearningRate 0.0665 Epoch: 3 Global Step: 153070 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:55:42,460-Speed 2631.79 samples/sec Loss 11.0680 LearningRate 0.0665 Epoch: 3 Global Step: 153080 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:55:46,340-Speed 2640.07 samples/sec Loss 11.1551 LearningRate 0.0665 Epoch: 3 Global Step: 153090 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:55:50,233-Speed 2631.02 samples/sec Loss 11.2564 LearningRate 0.0665 Epoch: 3 Global Step: 153100 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:55:54,130-Speed 2628.36 samples/sec Loss 11.0994 LearningRate 0.0665 Epoch: 3 Global Step: 153110 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:55:58,024-Speed 2630.48 samples/sec Loss 11.0944 LearningRate 0.0665 Epoch: 3 Global Step: 153120 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:56:01,917-Speed 2630.37 samples/sec Loss 11.0647 LearningRate 0.0665 Epoch: 3 Global Step: 153130 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:56:05,814-Speed 2628.66 samples/sec Loss 11.0261 LearningRate 0.0665 Epoch: 3 Global Step: 153140 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:56:09,717-Speed 2623.65 samples/sec Loss 11.2049 LearningRate 0.0665 Epoch: 3 Global Step: 153150 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:56:13,612-Speed 2630.01 samples/sec Loss 11.1586 LearningRate 0.0665 Epoch: 3 Global Step: 153160 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:56:17,503-Speed 2631.86 samples/sec Loss 11.0555 LearningRate 0.0665 Epoch: 3 Global Step: 153170 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:56:21,398-Speed 2630.07 samples/sec Loss 11.0623 LearningRate 0.0665 Epoch: 3 Global Step: 153180 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:56:25,300-Speed 2624.73 samples/sec Loss 11.1400 LearningRate 0.0665 Epoch: 3 Global Step: 153190 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:56:29,202-Speed 2625.33 samples/sec Loss 11.1805 LearningRate 0.0665 Epoch: 3 Global Step: 153200 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:56:33,106-Speed 2623.01 samples/sec Loss 11.1873 LearningRate 0.0665 Epoch: 3 Global Step: 153210 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:56:36,987-Speed 2638.95 samples/sec Loss 11.2265 LearningRate 0.0665 Epoch: 3 Global Step: 153220 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:56:40,880-Speed 2630.98 samples/sec Loss 11.1994 LearningRate 0.0665 Epoch: 3 Global Step: 153230 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:56:44,777-Speed 2628.08 samples/sec Loss 11.1919 LearningRate 0.0665 Epoch: 3 Global Step: 153240 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:56:48,661-Speed 2637.31 samples/sec Loss 11.2584 LearningRate 0.0665 Epoch: 3 Global Step: 153250 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:56:52,565-Speed 2623.55 samples/sec Loss 11.1318 LearningRate 0.0665 Epoch: 3 Global Step: 153260 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:56:56,456-Speed 2632.05 samples/sec Loss 11.1825 LearningRate 0.0665 Epoch: 3 Global Step: 153270 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:57:00,350-Speed 2630.83 samples/sec Loss 11.1371 LearningRate 0.0665 Epoch: 3 Global Step: 153280 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:57:04,240-Speed 2633.05 samples/sec Loss 11.0479 LearningRate 0.0665 Epoch: 3 Global Step: 153290 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:57:08,130-Speed 2632.48 samples/sec Loss 11.1723 LearningRate 0.0665 Epoch: 3 Global Step: 153300 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:57:12,038-Speed 2621.18 samples/sec Loss 11.0045 LearningRate 0.0665 Epoch: 3 Global Step: 153310 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:57:15,934-Speed 2628.48 samples/sec Loss 11.2263 LearningRate 0.0665 Epoch: 3 Global Step: 153320 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:57:19,829-Speed 2629.80 samples/sec Loss 11.0331 LearningRate 0.0665 Epoch: 3 Global Step: 153330 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:57:23,722-Speed 2630.61 samples/sec Loss 11.2547 LearningRate 0.0664 Epoch: 3 Global Step: 153340 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 12:57:27,636-Speed 2617.36 samples/sec Loss 11.2030 LearningRate 0.0664 Epoch: 3 Global Step: 153350 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:57:31,529-Speed 2630.30 samples/sec Loss 11.0956 LearningRate 0.0664 Epoch: 3 Global Step: 153360 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:57:35,428-Speed 2627.02 samples/sec Loss 11.1824 LearningRate 0.0664 Epoch: 3 Global Step: 153370 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:57:39,327-Speed 2627.03 samples/sec Loss 11.1927 LearningRate 0.0664 Epoch: 3 Global Step: 153380 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:57:43,218-Speed 2632.78 samples/sec Loss 11.2725 LearningRate 0.0664 Epoch: 3 Global Step: 153390 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:57:47,113-Speed 2629.40 samples/sec Loss 11.0960 LearningRate 0.0664 Epoch: 3 Global Step: 153400 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:57:51,007-Speed 2630.27 samples/sec Loss 10.9756 LearningRate 0.0664 Epoch: 3 Global Step: 153410 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:57:54,903-Speed 2628.62 samples/sec Loss 11.1302 LearningRate 0.0664 Epoch: 3 Global Step: 153420 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:57:58,798-Speed 2629.97 samples/sec Loss 11.2066 LearningRate 0.0664 Epoch: 3 Global Step: 153430 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:58:02,691-Speed 2630.77 samples/sec Loss 11.1585 LearningRate 0.0664 Epoch: 3 Global Step: 153440 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:58:06,591-Speed 2625.87 samples/sec Loss 11.2233 LearningRate 0.0664 Epoch: 3 Global Step: 153450 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:58:10,490-Speed 2627.43 samples/sec Loss 11.1819 LearningRate 0.0664 Epoch: 3 Global Step: 153460 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:58:14,390-Speed 2626.35 samples/sec Loss 11.1103 LearningRate 0.0664 Epoch: 3 Global Step: 153470 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:58:18,293-Speed 2623.83 samples/sec Loss 11.1711 LearningRate 0.0664 Epoch: 3 Global Step: 153480 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:58:22,205-Speed 2618.33 samples/sec Loss 11.1068 LearningRate 0.0664 Epoch: 3 Global Step: 153490 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:58:26,112-Speed 2621.47 samples/sec Loss 11.1455 LearningRate 0.0664 Epoch: 3 Global Step: 153500 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:58:30,009-Speed 2627.99 samples/sec Loss 11.0256 LearningRate 0.0664 Epoch: 3 Global Step: 153510 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:58:33,910-Speed 2625.39 samples/sec Loss 10.9933 LearningRate 0.0664 Epoch: 3 Global Step: 153520 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:58:37,806-Speed 2629.37 samples/sec Loss 11.1211 LearningRate 0.0664 Epoch: 3 Global Step: 153530 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:58:41,700-Speed 2629.76 samples/sec Loss 10.9795 LearningRate 0.0664 Epoch: 3 Global Step: 153540 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:58:45,600-Speed 2627.03 samples/sec Loss 11.1762 LearningRate 0.0664 Epoch: 3 Global Step: 153550 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:58:49,492-Speed 2631.16 samples/sec Loss 11.1052 LearningRate 0.0664 Epoch: 3 Global Step: 153560 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:58:53,391-Speed 2627.07 samples/sec Loss 11.0156 LearningRate 0.0664 Epoch: 3 Global Step: 153570 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:58:57,286-Speed 2629.43 samples/sec Loss 11.1075 LearningRate 0.0664 Epoch: 3 Global Step: 153580 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:59:01,180-Speed 2630.05 samples/sec Loss 11.1249 LearningRate 0.0664 Epoch: 3 Global Step: 153590 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:59:05,077-Speed 2628.20 samples/sec Loss 11.0677 LearningRate 0.0664 Epoch: 3 Global Step: 153600 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:59:08,979-Speed 2625.02 samples/sec Loss 11.0685 LearningRate 0.0664 Epoch: 3 Global Step: 153610 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:59:12,873-Speed 2630.23 samples/sec Loss 11.0729 LearningRate 0.0664 Epoch: 3 Global Step: 153620 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:59:16,772-Speed 2626.69 samples/sec Loss 11.1103 LearningRate 0.0664 Epoch: 3 Global Step: 153630 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:59:20,675-Speed 2624.42 samples/sec Loss 11.1908 LearningRate 0.0664 Epoch: 3 Global Step: 153640 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 12:59:24,552-Speed 2641.68 samples/sec Loss 11.0547 LearningRate 0.0664 Epoch: 3 Global Step: 153650 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:59:28,444-Speed 2631.93 samples/sec Loss 11.1407 LearningRate 0.0664 Epoch: 3 Global Step: 153660 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:59:32,343-Speed 2627.07 samples/sec Loss 11.0619 LearningRate 0.0664 Epoch: 3 Global Step: 153670 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:59:36,267-Speed 2609.94 samples/sec Loss 11.1776 LearningRate 0.0664 Epoch: 3 Global Step: 153680 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:59:40,163-Speed 2629.15 samples/sec Loss 11.0929 LearningRate 0.0664 Epoch: 3 Global Step: 153690 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:59:44,083-Speed 2612.94 samples/sec Loss 11.1283 LearningRate 0.0664 Epoch: 3 Global Step: 153700 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:59:47,971-Speed 2633.68 samples/sec Loss 11.0947 LearningRate 0.0664 Epoch: 3 Global Step: 153710 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:59:51,865-Speed 2630.30 samples/sec Loss 11.1717 LearningRate 0.0664 Epoch: 3 Global Step: 153720 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:59:55,765-Speed 2625.88 samples/sec Loss 11.1524 LearningRate 0.0664 Epoch: 3 Global Step: 153730 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 12:59:59,660-Speed 2630.43 samples/sec Loss 11.1155 LearningRate 0.0664 Epoch: 3 Global Step: 153740 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:00:03,555-Speed 2629.54 samples/sec Loss 11.1552 LearningRate 0.0664 Epoch: 3 Global Step: 153750 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:00:07,457-Speed 2624.77 samples/sec Loss 11.0533 LearningRate 0.0664 Epoch: 3 Global Step: 153760 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:00:11,372-Speed 2615.97 samples/sec Loss 11.0910 LearningRate 0.0664 Epoch: 3 Global Step: 153770 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:00:15,284-Speed 2618.63 samples/sec Loss 11.0040 LearningRate 0.0664 Epoch: 3 Global Step: 153780 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:00:19,182-Speed 2626.92 samples/sec Loss 11.2189 LearningRate 0.0664 Epoch: 3 Global Step: 153790 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:00:23,073-Speed 2632.34 samples/sec Loss 11.1419 LearningRate 0.0664 Epoch: 3 Global Step: 153800 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:00:26,977-Speed 2623.60 samples/sec Loss 11.0913 LearningRate 0.0664 Epoch: 3 Global Step: 153810 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:00:30,865-Speed 2634.51 samples/sec Loss 11.0292 LearningRate 0.0664 Epoch: 3 Global Step: 153820 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:00:34,757-Speed 2631.62 samples/sec Loss 11.2055 LearningRate 0.0664 Epoch: 3 Global Step: 153830 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:00:38,647-Speed 2632.90 samples/sec Loss 11.0932 LearningRate 0.0664 Epoch: 3 Global Step: 153840 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:00:42,539-Speed 2632.00 samples/sec Loss 11.0270 LearningRate 0.0663 Epoch: 3 Global Step: 153850 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:00:46,431-Speed 2631.34 samples/sec Loss 11.1711 LearningRate 0.0663 Epoch: 3 Global Step: 153860 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:00:50,334-Speed 2624.26 samples/sec Loss 10.9366 LearningRate 0.0663 Epoch: 3 Global Step: 153870 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:00:54,238-Speed 2623.87 samples/sec Loss 11.1085 LearningRate 0.0663 Epoch: 3 Global Step: 153880 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:00:58,134-Speed 2628.54 samples/sec Loss 11.0749 LearningRate 0.0663 Epoch: 3 Global Step: 153890 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:01:02,018-Speed 2636.92 samples/sec Loss 11.0250 LearningRate 0.0663 Epoch: 3 Global Step: 153900 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:01:05,924-Speed 2622.14 samples/sec Loss 11.1311 LearningRate 0.0663 Epoch: 3 Global Step: 153910 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:01:09,833-Speed 2620.20 samples/sec Loss 10.9843 LearningRate 0.0663 Epoch: 3 Global Step: 153920 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:01:13,749-Speed 2615.36 samples/sec Loss 10.9672 LearningRate 0.0663 Epoch: 3 Global Step: 153930 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:01:17,660-Speed 2619.06 samples/sec Loss 11.1322 LearningRate 0.0663 Epoch: 3 Global Step: 153940 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:01:21,570-Speed 2619.81 samples/sec Loss 11.1331 LearningRate 0.0663 Epoch: 3 Global Step: 153950 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:01:25,479-Speed 2619.87 samples/sec Loss 11.3021 LearningRate 0.0663 Epoch: 3 Global Step: 153960 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:01:29,386-Speed 2621.66 samples/sec Loss 11.1292 LearningRate 0.0663 Epoch: 3 Global Step: 153970 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:01:33,279-Speed 2630.72 samples/sec Loss 11.1533 LearningRate 0.0663 Epoch: 3 Global Step: 153980 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:01:37,171-Speed 2631.31 samples/sec Loss 10.9978 LearningRate 0.0663 Epoch: 3 Global Step: 153990 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:01:41,064-Speed 2631.34 samples/sec Loss 11.2738 LearningRate 0.0663 Epoch: 3 Global Step: 154000 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:01:44,944-Speed 2639.68 samples/sec Loss 11.3387 LearningRate 0.0663 Epoch: 3 Global Step: 154010 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:01:48,841-Speed 2628.06 samples/sec Loss 11.1073 LearningRate 0.0663 Epoch: 3 Global Step: 154020 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:01:52,739-Speed 2628.11 samples/sec Loss 11.0249 LearningRate 0.0663 Epoch: 3 Global Step: 154030 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:01:56,632-Speed 2630.99 samples/sec Loss 11.1470 LearningRate 0.0663 Epoch: 3 Global Step: 154040 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:02:00,525-Speed 2630.93 samples/sec Loss 11.1367 LearningRate 0.0663 Epoch: 3 Global Step: 154050 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:02:04,426-Speed 2625.34 samples/sec Loss 11.1698 LearningRate 0.0663 Epoch: 3 Global Step: 154060 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:02:08,336-Speed 2619.62 samples/sec Loss 11.0824 LearningRate 0.0663 Epoch: 3 Global Step: 154070 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:02:12,232-Speed 2628.25 samples/sec Loss 10.9822 LearningRate 0.0663 Epoch: 3 Global Step: 154080 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:02:16,127-Speed 2630.21 samples/sec Loss 11.3523 LearningRate 0.0663 Epoch: 3 Global Step: 154090 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:02:20,024-Speed 2628.27 samples/sec Loss 11.0428 LearningRate 0.0663 Epoch: 3 Global Step: 154100 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:02:23,932-Speed 2620.81 samples/sec Loss 11.2207 LearningRate 0.0663 Epoch: 3 Global Step: 154110 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:02:27,819-Speed 2635.60 samples/sec Loss 11.0648 LearningRate 0.0663 Epoch: 3 Global Step: 154120 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:02:31,718-Speed 2626.26 samples/sec Loss 11.0872 LearningRate 0.0663 Epoch: 3 Global Step: 154130 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:02:35,620-Speed 2624.73 samples/sec Loss 11.2273 LearningRate 0.0663 Epoch: 3 Global Step: 154140 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:02:39,546-Speed 2609.16 samples/sec Loss 11.1283 LearningRate 0.0663 Epoch: 3 Global Step: 154150 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:02:43,440-Speed 2630.28 samples/sec Loss 11.2649 LearningRate 0.0663 Epoch: 3 Global Step: 154160 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:02:47,333-Speed 2630.60 samples/sec Loss 11.1884 LearningRate 0.0663 Epoch: 3 Global Step: 154170 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:02:51,231-Speed 2627.77 samples/sec Loss 11.1084 LearningRate 0.0663 Epoch: 3 Global Step: 154180 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:02:55,135-Speed 2623.62 samples/sec Loss 11.1827 LearningRate 0.0663 Epoch: 3 Global Step: 154190 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:02:59,027-Speed 2631.61 samples/sec Loss 11.1352 LearningRate 0.0663 Epoch: 3 Global Step: 154200 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:03:02,917-Speed 2633.12 samples/sec Loss 11.2734 LearningRate 0.0663 Epoch: 3 Global Step: 154210 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:03:06,836-Speed 2613.62 samples/sec Loss 11.2369 LearningRate 0.0663 Epoch: 3 Global Step: 154220 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:03:10,734-Speed 2627.13 samples/sec Loss 11.1224 LearningRate 0.0663 Epoch: 3 Global Step: 154230 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:03:14,642-Speed 2621.07 samples/sec Loss 11.2052 LearningRate 0.0663 Epoch: 3 Global Step: 154240 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:03:18,538-Speed 2629.10 samples/sec Loss 11.1296 LearningRate 0.0663 Epoch: 3 Global Step: 154250 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:03:22,435-Speed 2628.22 samples/sec Loss 11.1088 LearningRate 0.0663 Epoch: 3 Global Step: 154260 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:03:26,336-Speed 2625.34 samples/sec Loss 11.1466 LearningRate 0.0663 Epoch: 3 Global Step: 154270 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:03:30,234-Speed 2627.93 samples/sec Loss 11.1188 LearningRate 0.0663 Epoch: 3 Global Step: 154280 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:03:34,134-Speed 2625.98 samples/sec Loss 11.0670 LearningRate 0.0663 Epoch: 3 Global Step: 154290 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:03:38,029-Speed 2629.49 samples/sec Loss 11.0644 LearningRate 0.0663 Epoch: 3 Global Step: 154300 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:03:41,924-Speed 2629.42 samples/sec Loss 11.1319 LearningRate 0.0663 Epoch: 3 Global Step: 154310 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:03:45,846-Speed 2611.88 samples/sec Loss 11.1728 LearningRate 0.0663 Epoch: 3 Global Step: 154320 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:03:49,747-Speed 2625.47 samples/sec Loss 11.1623 LearningRate 0.0663 Epoch: 3 Global Step: 154330 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:03:53,647-Speed 2626.12 samples/sec Loss 11.2026 LearningRate 0.0663 Epoch: 3 Global Step: 154340 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:03:57,520-Speed 2645.06 samples/sec Loss 11.0222 LearningRate 0.0663 Epoch: 3 Global Step: 154350 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:04:01,422-Speed 2624.75 samples/sec Loss 11.1582 LearningRate 0.0662 Epoch: 3 Global Step: 154360 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:04:05,316-Speed 2629.59 samples/sec Loss 11.0706 LearningRate 0.0662 Epoch: 3 Global Step: 154370 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:04:09,188-Speed 2645.11 samples/sec Loss 11.1160 LearningRate 0.0662 Epoch: 3 Global Step: 154380 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:04:13,090-Speed 2625.05 samples/sec Loss 11.0010 LearningRate 0.0662 Epoch: 3 Global Step: 154390 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:04:16,993-Speed 2624.82 samples/sec Loss 10.8389 LearningRate 0.0662 Epoch: 3 Global Step: 154400 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:04:20,852-Speed 2654.04 samples/sec Loss 11.2702 LearningRate 0.0662 Epoch: 3 Global Step: 154410 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 13:04:24,744-Speed 2631.77 samples/sec Loss 11.1618 LearningRate 0.0662 Epoch: 3 Global Step: 154420 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 13:04:28,639-Speed 2629.90 samples/sec Loss 11.3617 LearningRate 0.0662 Epoch: 3 Global Step: 154430 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 13:04:32,540-Speed 2625.49 samples/sec Loss 11.2685 LearningRate 0.0662 Epoch: 3 Global Step: 154440 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 13:04:36,432-Speed 2631.83 samples/sec Loss 11.2392 LearningRate 0.0662 Epoch: 3 Global Step: 154450 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 13:04:40,310-Speed 2640.77 samples/sec Loss 11.4682 LearningRate 0.0662 Epoch: 3 Global Step: 154460 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-04-13 13:04:44,198-Speed 2634.67 samples/sec Loss 11.9239 LearningRate 0.0662 Epoch: 3 Global Step: 154470 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-04-13 13:04:48,097-Speed 2626.80 samples/sec Loss 11.6128 LearningRate 0.0662 Epoch: 3 Global Step: 154480 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-04-13 13:04:51,998-Speed 2625.54 samples/sec Loss 11.3572 LearningRate 0.0662 Epoch: 3 Global Step: 154490 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-04-13 13:04:55,909-Speed 2618.49 samples/sec Loss 11.4364 LearningRate 0.0662 Epoch: 3 Global Step: 154500 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-04-13 13:04:59,804-Speed 2629.84 samples/sec Loss 11.3108 LearningRate 0.0662 Epoch: 3 Global Step: 154510 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-04-13 13:05:03,710-Speed 2622.45 samples/sec Loss 11.2527 LearningRate 0.0662 Epoch: 3 Global Step: 154520 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-04-13 13:05:07,598-Speed 2634.06 samples/sec Loss 11.2783 LearningRate 0.0662 Epoch: 3 Global Step: 154530 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-04-13 13:05:11,489-Speed 2632.20 samples/sec Loss 11.2326 LearningRate 0.0662 Epoch: 3 Global Step: 154540 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-04-13 13:05:15,379-Speed 2633.17 samples/sec Loss 11.2500 LearningRate 0.0662 Epoch: 3 Global Step: 154550 Fp16 Grad Scale: 8192 Required: 76 hours
Training: 2022-04-13 13:05:19,280-Speed 2625.35 samples/sec Loss 11.0948 LearningRate 0.0662 Epoch: 3 Global Step: 154560 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 13:05:23,174-Speed 2629.92 samples/sec Loss 11.2072 LearningRate 0.0662 Epoch: 3 Global Step: 154570 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 13:05:27,067-Speed 2630.75 samples/sec Loss 11.2084 LearningRate 0.0662 Epoch: 3 Global Step: 154580 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 13:05:30,964-Speed 2628.77 samples/sec Loss 11.2586 LearningRate 0.0662 Epoch: 3 Global Step: 154590 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 13:05:34,859-Speed 2630.22 samples/sec Loss 11.2965 LearningRate 0.0662 Epoch: 3 Global Step: 154600 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 13:05:38,749-Speed 2632.58 samples/sec Loss 11.1579 LearningRate 0.0662 Epoch: 3 Global Step: 154610 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 13:05:42,646-Speed 2628.14 samples/sec Loss 11.2257 LearningRate 0.0662 Epoch: 3 Global Step: 154620 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 13:05:46,538-Speed 2631.70 samples/sec Loss 11.1648 LearningRate 0.0662 Epoch: 3 Global Step: 154630 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 13:05:50,431-Speed 2630.95 samples/sec Loss 11.1040 LearningRate 0.0662 Epoch: 3 Global Step: 154640 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 13:05:54,324-Speed 2630.75 samples/sec Loss 11.2622 LearningRate 0.0662 Epoch: 3 Global Step: 154650 Fp16 Grad Scale: 16384 Required: 76 hours
Training: 2022-04-13 13:05:58,215-Speed 2632.22 samples/sec Loss 11.3156 LearningRate 0.0662 Epoch: 3 Global Step: 154660 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:06:02,119-Speed 2623.74 samples/sec Loss 11.0873 LearningRate 0.0662 Epoch: 3 Global Step: 154670 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:06:06,014-Speed 2629.72 samples/sec Loss 11.2779 LearningRate 0.0662 Epoch: 3 Global Step: 154680 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:06:09,907-Speed 2630.79 samples/sec Loss 11.0945 LearningRate 0.0662 Epoch: 3 Global Step: 154690 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:06:13,801-Speed 2630.52 samples/sec Loss 11.2349 LearningRate 0.0662 Epoch: 3 Global Step: 154700 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:06:17,698-Speed 2628.04 samples/sec Loss 11.0770 LearningRate 0.0662 Epoch: 3 Global Step: 154710 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:06:21,591-Speed 2631.19 samples/sec Loss 10.9490 LearningRate 0.0662 Epoch: 3 Global Step: 154720 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:06:25,498-Speed 2621.46 samples/sec Loss 11.0302 LearningRate 0.0662 Epoch: 3 Global Step: 154730 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:06:29,430-Speed 2604.89 samples/sec Loss 11.0803 LearningRate 0.0662 Epoch: 3 Global Step: 154740 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:06:33,332-Speed 2624.66 samples/sec Loss 11.2095 LearningRate 0.0662 Epoch: 3 Global Step: 154750 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:06:37,234-Speed 2624.72 samples/sec Loss 11.2619 LearningRate 0.0662 Epoch: 3 Global Step: 154760 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:06:41,127-Speed 2630.85 samples/sec Loss 11.0578 LearningRate 0.0662 Epoch: 3 Global Step: 154770 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:06:45,021-Speed 2630.76 samples/sec Loss 11.1763 LearningRate 0.0662 Epoch: 3 Global Step: 154780 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:06:48,922-Speed 2625.83 samples/sec Loss 11.1638 LearningRate 0.0662 Epoch: 3 Global Step: 154790 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:06:52,815-Speed 2631.11 samples/sec Loss 11.0629 LearningRate 0.0662 Epoch: 3 Global Step: 154800 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:06:56,714-Speed 2626.60 samples/sec Loss 11.0309 LearningRate 0.0662 Epoch: 3 Global Step: 154810 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:07:00,604-Speed 2633.08 samples/sec Loss 11.1127 LearningRate 0.0662 Epoch: 3 Global Step: 154820 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:07:04,502-Speed 2627.47 samples/sec Loss 11.0586 LearningRate 0.0662 Epoch: 3 Global Step: 154830 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:07:08,399-Speed 2628.15 samples/sec Loss 11.0392 LearningRate 0.0662 Epoch: 3 Global Step: 154840 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:07:12,291-Speed 2631.66 samples/sec Loss 11.1442 LearningRate 0.0662 Epoch: 3 Global Step: 154850 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:07:16,187-Speed 2628.70 samples/sec Loss 11.0905 LearningRate 0.0662 Epoch: 3 Global Step: 154860 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:07:20,090-Speed 2624.63 samples/sec Loss 11.0205 LearningRate 0.0661 Epoch: 3 Global Step: 154870 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:07:23,983-Speed 2631.15 samples/sec Loss 11.0566 LearningRate 0.0661 Epoch: 3 Global Step: 154880 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:07:27,880-Speed 2628.32 samples/sec Loss 11.2532 LearningRate 0.0661 Epoch: 3 Global Step: 154890 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:07:31,774-Speed 2630.53 samples/sec Loss 11.1466 LearningRate 0.0661 Epoch: 3 Global Step: 154900 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:07:35,671-Speed 2627.87 samples/sec Loss 11.1996 LearningRate 0.0661 Epoch: 3 Global Step: 154910 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:07:39,563-Speed 2631.39 samples/sec Loss 11.0892 LearningRate 0.0661 Epoch: 3 Global Step: 154920 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:07:43,462-Speed 2627.33 samples/sec Loss 11.1998 LearningRate 0.0661 Epoch: 3 Global Step: 154930 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:07:47,337-Speed 2642.75 samples/sec Loss 11.1689 LearningRate 0.0661 Epoch: 3 Global Step: 154940 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:07:51,228-Speed 2632.42 samples/sec Loss 11.1040 LearningRate 0.0661 Epoch: 3 Global Step: 154950 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:07:55,120-Speed 2631.39 samples/sec Loss 11.2039 LearningRate 0.0661 Epoch: 3 Global Step: 154960 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:07:59,037-Speed 2615.16 samples/sec Loss 11.0236 LearningRate 0.0661 Epoch: 3 Global Step: 154970 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:08:02,936-Speed 2627.02 samples/sec Loss 11.1073 LearningRate 0.0661 Epoch: 3 Global Step: 154980 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:08:06,843-Speed 2621.31 samples/sec Loss 11.1283 LearningRate 0.0661 Epoch: 3 Global Step: 154990 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:08:10,748-Speed 2622.92 samples/sec Loss 11.0359 LearningRate 0.0661 Epoch: 3 Global Step: 155000 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:08:14,655-Speed 2621.77 samples/sec Loss 11.2346 LearningRate 0.0661 Epoch: 3 Global Step: 155010 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:08:18,550-Speed 2629.11 samples/sec Loss 11.1344 LearningRate 0.0661 Epoch: 3 Global Step: 155020 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:08:22,442-Speed 2632.03 samples/sec Loss 11.1645 LearningRate 0.0661 Epoch: 3 Global Step: 155030 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:08:26,333-Speed 2631.83 samples/sec Loss 11.1771 LearningRate 0.0661 Epoch: 3 Global Step: 155040 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:08:30,235-Speed 2625.16 samples/sec Loss 11.1061 LearningRate 0.0661 Epoch: 3 Global Step: 155050 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:08:34,128-Speed 2631.08 samples/sec Loss 11.1151 LearningRate 0.0661 Epoch: 3 Global Step: 155060 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:08:38,031-Speed 2623.57 samples/sec Loss 11.0212 LearningRate 0.0661 Epoch: 3 Global Step: 155070 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:08:41,926-Speed 2629.79 samples/sec Loss 11.2620 LearningRate 0.0661 Epoch: 3 Global Step: 155080 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:08:45,822-Speed 2629.08 samples/sec Loss 11.1746 LearningRate 0.0661 Epoch: 3 Global Step: 155090 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:08:49,737-Speed 2616.62 samples/sec Loss 11.1275 LearningRate 0.0661 Epoch: 3 Global Step: 155100 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:08:53,633-Speed 2629.03 samples/sec Loss 11.0373 LearningRate 0.0661 Epoch: 3 Global Step: 155110 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:08:57,529-Speed 2628.34 samples/sec Loss 11.1096 LearningRate 0.0661 Epoch: 3 Global Step: 155120 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:09:01,423-Speed 2630.66 samples/sec Loss 11.0306 LearningRate 0.0661 Epoch: 3 Global Step: 155130 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:09:05,317-Speed 2629.82 samples/sec Loss 11.0057 LearningRate 0.0661 Epoch: 3 Global Step: 155140 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:09:09,213-Speed 2628.82 samples/sec Loss 11.1980 LearningRate 0.0661 Epoch: 3 Global Step: 155150 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:09:13,113-Speed 2626.54 samples/sec Loss 11.1525 LearningRate 0.0661 Epoch: 3 Global Step: 155160 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:09:17,019-Speed 2622.26 samples/sec Loss 11.0545 LearningRate 0.0661 Epoch: 3 Global Step: 155170 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:09:20,936-Speed 2615.02 samples/sec Loss 11.0369 LearningRate 0.0661 Epoch: 3 Global Step: 155180 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:09:24,831-Speed 2629.04 samples/sec Loss 11.1004 LearningRate 0.0661 Epoch: 3 Global Step: 155190 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:09:28,737-Speed 2622.46 samples/sec Loss 11.1051 LearningRate 0.0661 Epoch: 3 Global Step: 155200 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:09:32,613-Speed 2642.67 samples/sec Loss 11.2863 LearningRate 0.0661 Epoch: 3 Global Step: 155210 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:09:36,510-Speed 2628.06 samples/sec Loss 11.1310 LearningRate 0.0661 Epoch: 3 Global Step: 155220 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:09:40,404-Speed 2630.12 samples/sec Loss 11.0279 LearningRate 0.0661 Epoch: 3 Global Step: 155230 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:09:44,321-Speed 2615.13 samples/sec Loss 11.1707 LearningRate 0.0661 Epoch: 3 Global Step: 155240 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:09:48,212-Speed 2631.64 samples/sec Loss 10.9466 LearningRate 0.0661 Epoch: 3 Global Step: 155250 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:09:52,109-Speed 2629.00 samples/sec Loss 11.0198 LearningRate 0.0661 Epoch: 3 Global Step: 155260 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:09:56,005-Speed 2628.58 samples/sec Loss 11.1537 LearningRate 0.0661 Epoch: 3 Global Step: 155270 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:09:59,907-Speed 2625.57 samples/sec Loss 10.9516 LearningRate 0.0661 Epoch: 3 Global Step: 155280 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:10:03,806-Speed 2626.25 samples/sec Loss 10.9647 LearningRate 0.0661 Epoch: 3 Global Step: 155290 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:10:07,706-Speed 2626.49 samples/sec Loss 11.1923 LearningRate 0.0661 Epoch: 3 Global Step: 155300 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:10:11,619-Speed 2617.30 samples/sec Loss 11.2441 LearningRate 0.0661 Epoch: 3 Global Step: 155310 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:10:15,535-Speed 2615.77 samples/sec Loss 11.1786 LearningRate 0.0661 Epoch: 3 Global Step: 155320 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:10:19,425-Speed 2632.73 samples/sec Loss 11.1067 LearningRate 0.0661 Epoch: 3 Global Step: 155330 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:10:23,328-Speed 2623.92 samples/sec Loss 11.2660 LearningRate 0.0661 Epoch: 3 Global Step: 155340 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:10:27,219-Speed 2631.99 samples/sec Loss 11.0809 LearningRate 0.0661 Epoch: 3 Global Step: 155350 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:10:31,117-Speed 2628.36 samples/sec Loss 11.2058 LearningRate 0.0661 Epoch: 3 Global Step: 155360 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:10:35,019-Speed 2624.48 samples/sec Loss 11.0810 LearningRate 0.0661 Epoch: 3 Global Step: 155370 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:10:38,915-Speed 2629.27 samples/sec Loss 11.1280 LearningRate 0.0660 Epoch: 3 Global Step: 155380 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:10:42,820-Speed 2622.61 samples/sec Loss 11.0929 LearningRate 0.0660 Epoch: 3 Global Step: 155390 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:10:46,741-Speed 2612.06 samples/sec Loss 11.0019 LearningRate 0.0660 Epoch: 3 Global Step: 155400 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:10:50,647-Speed 2622.11 samples/sec Loss 11.1628 LearningRate 0.0660 Epoch: 3 Global Step: 155410 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:10:54,557-Speed 2619.82 samples/sec Loss 11.0644 LearningRate 0.0660 Epoch: 3 Global Step: 155420 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:10:58,465-Speed 2620.80 samples/sec Loss 11.1797 LearningRate 0.0660 Epoch: 3 Global Step: 155430 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:11:02,350-Speed 2636.27 samples/sec Loss 11.2162 LearningRate 0.0660 Epoch: 3 Global Step: 155440 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:11:06,250-Speed 2626.16 samples/sec Loss 11.1218 LearningRate 0.0660 Epoch: 3 Global Step: 155450 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:11:10,165-Speed 2616.19 samples/sec Loss 11.0670 LearningRate 0.0660 Epoch: 3 Global Step: 155460 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:11:14,060-Speed 2629.64 samples/sec Loss 11.0749 LearningRate 0.0660 Epoch: 3 Global Step: 155470 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:11:17,953-Speed 2631.28 samples/sec Loss 11.1091 LearningRate 0.0660 Epoch: 3 Global Step: 155480 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:11:21,853-Speed 2626.28 samples/sec Loss 10.9433 LearningRate 0.0660 Epoch: 3 Global Step: 155490 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:11:25,751-Speed 2627.08 samples/sec Loss 11.0463 LearningRate 0.0660 Epoch: 3 Global Step: 155500 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:11:29,646-Speed 2629.59 samples/sec Loss 11.0457 LearningRate 0.0660 Epoch: 3 Global Step: 155510 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:11:33,545-Speed 2627.09 samples/sec Loss 10.9713 LearningRate 0.0660 Epoch: 3 Global Step: 155520 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:11:37,442-Speed 2628.13 samples/sec Loss 10.9327 LearningRate 0.0660 Epoch: 3 Global Step: 155530 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:11:41,341-Speed 2626.80 samples/sec Loss 11.1966 LearningRate 0.0660 Epoch: 3 Global Step: 155540 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:11:45,236-Speed 2629.66 samples/sec Loss 11.0890 LearningRate 0.0660 Epoch: 3 Global Step: 155550 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:11:49,138-Speed 2625.31 samples/sec Loss 10.9215 LearningRate 0.0660 Epoch: 3 Global Step: 155560 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:11:53,036-Speed 2627.35 samples/sec Loss 11.0630 LearningRate 0.0660 Epoch: 3 Global Step: 155570 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:11:56,912-Speed 2642.07 samples/sec Loss 11.0531 LearningRate 0.0660 Epoch: 3 Global Step: 155580 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:12:00,811-Speed 2627.10 samples/sec Loss 11.1597 LearningRate 0.0660 Epoch: 3 Global Step: 155590 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:12:04,708-Speed 2627.96 samples/sec Loss 11.0594 LearningRate 0.0660 Epoch: 3 Global Step: 155600 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:12:08,604-Speed 2629.03 samples/sec Loss 11.0803 LearningRate 0.0660 Epoch: 3 Global Step: 155610 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:12:12,502-Speed 2627.43 samples/sec Loss 11.1249 LearningRate 0.0660 Epoch: 3 Global Step: 155620 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:12:16,397-Speed 2629.90 samples/sec Loss 11.0106 LearningRate 0.0660 Epoch: 3 Global Step: 155630 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:12:20,298-Speed 2625.99 samples/sec Loss 10.9701 LearningRate 0.0660 Epoch: 3 Global Step: 155640 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:12:24,210-Speed 2618.05 samples/sec Loss 10.9952 LearningRate 0.0660 Epoch: 3 Global Step: 155650 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:12:28,103-Speed 2630.71 samples/sec Loss 11.0470 LearningRate 0.0660 Epoch: 3 Global Step: 155660 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:12:32,000-Speed 2628.37 samples/sec Loss 11.1336 LearningRate 0.0660 Epoch: 3 Global Step: 155670 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:12:35,895-Speed 2629.31 samples/sec Loss 11.1476 LearningRate 0.0660 Epoch: 3 Global Step: 155680 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:12:39,776-Speed 2639.45 samples/sec Loss 11.1385 LearningRate 0.0660 Epoch: 3 Global Step: 155690 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:12:43,682-Speed 2621.65 samples/sec Loss 11.1845 LearningRate 0.0660 Epoch: 3 Global Step: 155700 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:12:47,578-Speed 2629.24 samples/sec Loss 11.1162 LearningRate 0.0660 Epoch: 3 Global Step: 155710 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:12:51,477-Speed 2626.80 samples/sec Loss 11.0453 LearningRate 0.0660 Epoch: 3 Global Step: 155720 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:12:55,380-Speed 2624.48 samples/sec Loss 11.1715 LearningRate 0.0660 Epoch: 3 Global Step: 155730 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:12:59,275-Speed 2629.51 samples/sec Loss 11.1776 LearningRate 0.0660 Epoch: 3 Global Step: 155740 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:13:03,182-Speed 2621.79 samples/sec Loss 11.0721 LearningRate 0.0660 Epoch: 3 Global Step: 155750 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:13:07,078-Speed 2628.76 samples/sec Loss 10.9655 LearningRate 0.0660 Epoch: 3 Global Step: 155760 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:13:10,977-Speed 2630.88 samples/sec Loss 11.1837 LearningRate 0.0660 Epoch: 3 Global Step: 155770 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:13:14,875-Speed 2627.44 samples/sec Loss 11.1915 LearningRate 0.0660 Epoch: 3 Global Step: 155780 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:13:18,771-Speed 2628.24 samples/sec Loss 11.2692 LearningRate 0.0660 Epoch: 3 Global Step: 155790 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:13:22,676-Speed 2623.70 samples/sec Loss 11.2151 LearningRate 0.0660 Epoch: 3 Global Step: 155800 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:13:26,577-Speed 2625.72 samples/sec Loss 11.2024 LearningRate 0.0660 Epoch: 3 Global Step: 155810 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:13:30,471-Speed 2630.18 samples/sec Loss 11.1498 LearningRate 0.0660 Epoch: 3 Global Step: 155820 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:13:34,362-Speed 2632.48 samples/sec Loss 11.2559 LearningRate 0.0660 Epoch: 3 Global Step: 155830 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:13:38,284-Speed 2611.30 samples/sec Loss 11.2214 LearningRate 0.0660 Epoch: 3 Global Step: 155840 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:13:42,176-Speed 2631.30 samples/sec Loss 11.2537 LearningRate 0.0660 Epoch: 3 Global Step: 155850 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:13:46,066-Speed 2633.67 samples/sec Loss 11.0825 LearningRate 0.0660 Epoch: 3 Global Step: 155860 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:13:49,965-Speed 2626.33 samples/sec Loss 11.1621 LearningRate 0.0660 Epoch: 3 Global Step: 155870 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:13:53,861-Speed 2629.48 samples/sec Loss 11.0486 LearningRate 0.0660 Epoch: 3 Global Step: 155880 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:13:57,751-Speed 2632.21 samples/sec Loss 11.0191 LearningRate 0.0659 Epoch: 3 Global Step: 155890 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:14:01,648-Speed 2628.87 samples/sec Loss 11.1005 LearningRate 0.0659 Epoch: 3 Global Step: 155900 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:14:05,527-Speed 2640.23 samples/sec Loss 11.1874 LearningRate 0.0659 Epoch: 3 Global Step: 155910 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:14:09,419-Speed 2631.40 samples/sec Loss 11.1232 LearningRate 0.0659 Epoch: 3 Global Step: 155920 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:14:13,316-Speed 2628.64 samples/sec Loss 11.0633 LearningRate 0.0659 Epoch: 3 Global Step: 155930 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:14:17,220-Speed 2623.56 samples/sec Loss 11.0006 LearningRate 0.0659 Epoch: 3 Global Step: 155940 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:14:21,135-Speed 2616.21 samples/sec Loss 11.0747 LearningRate 0.0659 Epoch: 3 Global Step: 155950 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:14:25,041-Speed 2622.02 samples/sec Loss 11.0650 LearningRate 0.0659 Epoch: 3 Global Step: 155960 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:14:28,964-Speed 2610.95 samples/sec Loss 11.0233 LearningRate 0.0659 Epoch: 3 Global Step: 155970 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:14:32,876-Speed 2618.37 samples/sec Loss 11.1282 LearningRate 0.0659 Epoch: 3 Global Step: 155980 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:14:36,783-Speed 2621.22 samples/sec Loss 11.0429 LearningRate 0.0659 Epoch: 3 Global Step: 155990 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:14:40,679-Speed 2628.97 samples/sec Loss 11.0727 LearningRate 0.0659 Epoch: 3 Global Step: 156000 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:14:44,561-Speed 2639.28 samples/sec Loss 11.1412 LearningRate 0.0659 Epoch: 3 Global Step: 156010 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:14:48,447-Speed 2635.99 samples/sec Loss 11.1675 LearningRate 0.0659 Epoch: 3 Global Step: 156020 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:14:52,343-Speed 2628.56 samples/sec Loss 10.9344 LearningRate 0.0659 Epoch: 3 Global Step: 156030 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:14:56,237-Speed 2630.64 samples/sec Loss 11.1678 LearningRate 0.0659 Epoch: 3 Global Step: 156040 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:15:00,128-Speed 2632.41 samples/sec Loss 11.1144 LearningRate 0.0659 Epoch: 3 Global Step: 156050 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:15:04,021-Speed 2630.62 samples/sec Loss 11.2482 LearningRate 0.0659 Epoch: 3 Global Step: 156060 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:15:07,914-Speed 2630.68 samples/sec Loss 11.1419 LearningRate 0.0659 Epoch: 3 Global Step: 156070 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:15:11,808-Speed 2630.31 samples/sec Loss 11.2035 LearningRate 0.0659 Epoch: 3 Global Step: 156080 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:15:15,706-Speed 2627.24 samples/sec Loss 11.0633 LearningRate 0.0659 Epoch: 3 Global Step: 156090 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:15:19,601-Speed 2629.67 samples/sec Loss 11.0256 LearningRate 0.0659 Epoch: 3 Global Step: 156100 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:15:23,500-Speed 2627.15 samples/sec Loss 10.9724 LearningRate 0.0659 Epoch: 3 Global Step: 156110 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:15:27,411-Speed 2618.96 samples/sec Loss 11.0605 LearningRate 0.0659 Epoch: 3 Global Step: 156120 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:15:31,307-Speed 2629.72 samples/sec Loss 11.0189 LearningRate 0.0659 Epoch: 3 Global Step: 156130 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:15:35,216-Speed 2620.02 samples/sec Loss 11.0058 LearningRate 0.0659 Epoch: 3 Global Step: 156140 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:15:39,124-Speed 2620.20 samples/sec Loss 11.2181 LearningRate 0.0659 Epoch: 3 Global Step: 156150 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:15:43,009-Speed 2636.76 samples/sec Loss 11.1166 LearningRate 0.0659 Epoch: 3 Global Step: 156160 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:15:46,905-Speed 2628.84 samples/sec Loss 11.0157 LearningRate 0.0659 Epoch: 3 Global Step: 156170 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:15:50,818-Speed 2618.01 samples/sec Loss 11.0484 LearningRate 0.0659 Epoch: 3 Global Step: 156180 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:15:54,720-Speed 2624.51 samples/sec Loss 11.0188 LearningRate 0.0659 Epoch: 3 Global Step: 156190 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:15:58,615-Speed 2630.06 samples/sec Loss 11.0867 LearningRate 0.0659 Epoch: 3 Global Step: 156200 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:16:02,509-Speed 2630.02 samples/sec Loss 11.1431 LearningRate 0.0659 Epoch: 3 Global Step: 156210 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:16:06,406-Speed 2628.23 samples/sec Loss 10.9353 LearningRate 0.0659 Epoch: 3 Global Step: 156220 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:16:10,307-Speed 2625.54 samples/sec Loss 11.0423 LearningRate 0.0659 Epoch: 3 Global Step: 156230 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:16:14,202-Speed 2629.39 samples/sec Loss 11.1525 LearningRate 0.0659 Epoch: 3 Global Step: 156240 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:16:18,097-Speed 2629.91 samples/sec Loss 11.0312 LearningRate 0.0659 Epoch: 3 Global Step: 156250 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:16:21,997-Speed 2628.10 samples/sec Loss 11.1252 LearningRate 0.0659 Epoch: 3 Global Step: 156260 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:16:25,900-Speed 2624.37 samples/sec Loss 11.0805 LearningRate 0.0659 Epoch: 3 Global Step: 156270 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:16:29,803-Speed 2624.11 samples/sec Loss 11.0007 LearningRate 0.0659 Epoch: 3 Global Step: 156280 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:16:33,686-Speed 2637.63 samples/sec Loss 11.1079 LearningRate 0.0659 Epoch: 3 Global Step: 156290 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:16:37,596-Speed 2620.30 samples/sec Loss 11.1496 LearningRate 0.0659 Epoch: 3 Global Step: 156300 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:16:41,503-Speed 2621.60 samples/sec Loss 11.1082 LearningRate 0.0659 Epoch: 3 Global Step: 156310 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:16:45,409-Speed 2621.75 samples/sec Loss 11.0826 LearningRate 0.0659 Epoch: 3 Global Step: 156320 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:16:49,325-Speed 2615.93 samples/sec Loss 11.0307 LearningRate 0.0659 Epoch: 3 Global Step: 156330 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:16:53,226-Speed 2625.11 samples/sec Loss 11.1166 LearningRate 0.0659 Epoch: 3 Global Step: 156340 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:16:57,122-Speed 2629.18 samples/sec Loss 11.2659 LearningRate 0.0659 Epoch: 3 Global Step: 156350 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:17:01,032-Speed 2618.84 samples/sec Loss 11.0473 LearningRate 0.0659 Epoch: 3 Global Step: 156360 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:17:04,963-Speed 2606.28 samples/sec Loss 11.0401 LearningRate 0.0659 Epoch: 3 Global Step: 156370 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:17:08,855-Speed 2631.00 samples/sec Loss 11.0974 LearningRate 0.0659 Epoch: 3 Global Step: 156380 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:17:12,749-Speed 2630.53 samples/sec Loss 11.1351 LearningRate 0.0659 Epoch: 3 Global Step: 156390 Fp16 Grad Scale: 262144 Required: 76 hours
Training: 2022-04-13 13:17:16,634-Speed 2636.71 samples/sec Loss 11.0258 LearningRate 0.0658 Epoch: 3 Global Step: 156400 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:17:20,534-Speed 2626.26 samples/sec Loss 11.0545 LearningRate 0.0658 Epoch: 3 Global Step: 156410 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:17:24,451-Speed 2614.46 samples/sec Loss 10.9695 LearningRate 0.0658 Epoch: 3 Global Step: 156420 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:17:28,366-Speed 2616.03 samples/sec Loss 11.0415 LearningRate 0.0658 Epoch: 3 Global Step: 156430 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:17:32,261-Speed 2629.71 samples/sec Loss 11.1805 LearningRate 0.0658 Epoch: 3 Global Step: 156440 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:17:36,156-Speed 2629.60 samples/sec Loss 11.1316 LearningRate 0.0658 Epoch: 3 Global Step: 156450 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:17:40,060-Speed 2623.28 samples/sec Loss 10.9509 LearningRate 0.0658 Epoch: 3 Global Step: 156460 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:17:43,948-Speed 2634.46 samples/sec Loss 11.0199 LearningRate 0.0658 Epoch: 3 Global Step: 156470 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:17:47,855-Speed 2621.50 samples/sec Loss 11.0394 LearningRate 0.0658 Epoch: 3 Global Step: 156480 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:17:51,762-Speed 2621.99 samples/sec Loss 11.1266 LearningRate 0.0658 Epoch: 3 Global Step: 156490 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:17:55,658-Speed 2628.94 samples/sec Loss 10.9918 LearningRate 0.0658 Epoch: 3 Global Step: 156500 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:17:59,552-Speed 2630.01 samples/sec Loss 11.1837 LearningRate 0.0658 Epoch: 3 Global Step: 156510 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:18:03,447-Speed 2629.38 samples/sec Loss 11.1767 LearningRate 0.0658 Epoch: 3 Global Step: 156520 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:18:07,346-Speed 2627.64 samples/sec Loss 11.1315 LearningRate 0.0658 Epoch: 3 Global Step: 156530 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:18:11,239-Speed 2630.46 samples/sec Loss 11.2450 LearningRate 0.0658 Epoch: 3 Global Step: 156540 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:18:15,129-Speed 2633.24 samples/sec Loss 11.1292 LearningRate 0.0658 Epoch: 3 Global Step: 156550 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:18:19,022-Speed 2630.94 samples/sec Loss 11.1617 LearningRate 0.0658 Epoch: 3 Global Step: 156560 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:18:22,924-Speed 2625.05 samples/sec Loss 11.0917 LearningRate 0.0658 Epoch: 3 Global Step: 156570 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:18:26,816-Speed 2631.10 samples/sec Loss 11.0569 LearningRate 0.0658 Epoch: 3 Global Step: 156580 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:18:30,707-Speed 2632.27 samples/sec Loss 11.0285 LearningRate 0.0658 Epoch: 3 Global Step: 156590 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:18:34,599-Speed 2632.52 samples/sec Loss 11.1067 LearningRate 0.0658 Epoch: 3 Global Step: 156600 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:18:38,496-Speed 2627.74 samples/sec Loss 11.1090 LearningRate 0.0658 Epoch: 3 Global Step: 156610 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:18:42,401-Speed 2623.41 samples/sec Loss 11.3132 LearningRate 0.0658 Epoch: 3 Global Step: 156620 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:18:46,301-Speed 2626.07 samples/sec Loss 11.0221 LearningRate 0.0658 Epoch: 3 Global Step: 156630 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:18:50,195-Speed 2629.84 samples/sec Loss 11.0408 LearningRate 0.0658 Epoch: 3 Global Step: 156640 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:18:54,067-Speed 2645.41 samples/sec Loss 11.7435 LearningRate 0.0658 Epoch: 3 Global Step: 156650 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:18:57,961-Speed 2630.21 samples/sec Loss 11.6227 LearningRate 0.0658 Epoch: 3 Global Step: 156660 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:19:01,858-Speed 2628.21 samples/sec Loss 11.2409 LearningRate 0.0658 Epoch: 3 Global Step: 156670 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:19:05,837-Speed 2573.84 samples/sec Loss 11.1777 LearningRate 0.0658 Epoch: 3 Global Step: 156680 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:19:09,735-Speed 2628.18 samples/sec Loss 11.0120 LearningRate 0.0658 Epoch: 3 Global Step: 156690 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:19:13,644-Speed 2620.04 samples/sec Loss 10.9237 LearningRate 0.0658 Epoch: 3 Global Step: 156700 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:19:17,536-Speed 2631.53 samples/sec Loss 11.3260 LearningRate 0.0658 Epoch: 3 Global Step: 156710 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:19:21,442-Speed 2622.77 samples/sec Loss 11.0820 LearningRate 0.0658 Epoch: 3 Global Step: 156720 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:19:25,364-Speed 2610.97 samples/sec Loss 11.1929 LearningRate 0.0658 Epoch: 3 Global Step: 156730 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:19:29,270-Speed 2622.15 samples/sec Loss 11.2631 LearningRate 0.0658 Epoch: 3 Global Step: 156740 Fp16 Grad Scale: 32768 Required: 76 hours
Training: 2022-04-13 13:19:33,168-Speed 2627.40 samples/sec Loss 10.9760 LearningRate 0.0658 Epoch: 3 Global Step: 156750 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:19:37,068-Speed 2626.70 samples/sec Loss 11.1255 LearningRate 0.0658 Epoch: 3 Global Step: 156760 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:19:40,965-Speed 2627.94 samples/sec Loss 11.0867 LearningRate 0.0658 Epoch: 3 Global Step: 156770 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:19:44,920-Speed 2589.77 samples/sec Loss 11.1070 LearningRate 0.0658 Epoch: 3 Global Step: 156780 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:19:48,814-Speed 2630.56 samples/sec Loss 11.1037 LearningRate 0.0658 Epoch: 3 Global Step: 156790 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:19:52,710-Speed 2628.97 samples/sec Loss 11.1353 LearningRate 0.0658 Epoch: 3 Global Step: 156800 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:19:56,608-Speed 2627.48 samples/sec Loss 11.1322 LearningRate 0.0658 Epoch: 3 Global Step: 156810 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:20:00,508-Speed 2626.71 samples/sec Loss 11.3646 LearningRate 0.0658 Epoch: 3 Global Step: 156820 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:20:04,406-Speed 2627.40 samples/sec Loss 11.0753 LearningRate 0.0658 Epoch: 3 Global Step: 156830 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:20:08,301-Speed 2628.84 samples/sec Loss 11.2847 LearningRate 0.0658 Epoch: 3 Global Step: 156840 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:20:12,200-Speed 2627.23 samples/sec Loss 10.9723 LearningRate 0.0658 Epoch: 3 Global Step: 156850 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:20:16,088-Speed 2634.71 samples/sec Loss 11.0152 LearningRate 0.0658 Epoch: 3 Global Step: 156860 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:20:19,995-Speed 2621.33 samples/sec Loss 11.1577 LearningRate 0.0658 Epoch: 3 Global Step: 156870 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:20:23,889-Speed 2630.70 samples/sec Loss 11.1669 LearningRate 0.0658 Epoch: 3 Global Step: 156880 Fp16 Grad Scale: 131072 Required: 76 hours
Training: 2022-04-13 13:20:27,765-Speed 2642.63 samples/sec Loss 11.1136 LearningRate 0.0658 Epoch: 3 Global Step: 156890 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:20:31,663-Speed 2627.62 samples/sec Loss 11.1051 LearningRate 0.0658 Epoch: 3 Global Step: 156900 Fp16 Grad Scale: 65536 Required: 76 hours
Training: 2022-04-13 13:20:35,555-Speed 2631.31 samples/sec Loss 10.9863 LearningRate 0.0657 Epoch: 3 Global Step: 156910 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:20:39,452-Speed 2628.25 samples/sec Loss 10.9806 LearningRate 0.0657 Epoch: 3 Global Step: 156920 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:20:43,346-Speed 2630.21 samples/sec Loss 11.2286 LearningRate 0.0657 Epoch: 3 Global Step: 156930 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:20:47,250-Speed 2623.37 samples/sec Loss 11.1935 LearningRate 0.0657 Epoch: 3 Global Step: 156940 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:20:51,144-Speed 2630.38 samples/sec Loss 11.1306 LearningRate 0.0657 Epoch: 3 Global Step: 156950 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:20:55,052-Speed 2620.57 samples/sec Loss 11.1238 LearningRate 0.0657 Epoch: 3 Global Step: 156960 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:20:58,943-Speed 2632.61 samples/sec Loss 11.0949 LearningRate 0.0657 Epoch: 3 Global Step: 156970 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:21:02,841-Speed 2627.93 samples/sec Loss 11.0471 LearningRate 0.0657 Epoch: 3 Global Step: 156980 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:21:06,735-Speed 2630.36 samples/sec Loss 10.9186 LearningRate 0.0657 Epoch: 3 Global Step: 156990 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:21:10,641-Speed 2621.89 samples/sec Loss 11.0851 LearningRate 0.0657 Epoch: 3 Global Step: 157000 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:21:14,536-Speed 2629.70 samples/sec Loss 10.9802 LearningRate 0.0657 Epoch: 3 Global Step: 157010 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:21:18,432-Speed 2628.80 samples/sec Loss 11.1047 LearningRate 0.0657 Epoch: 3 Global Step: 157020 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:21:22,339-Speed 2621.75 samples/sec Loss 10.9409 LearningRate 0.0657 Epoch: 3 Global Step: 157030 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:21:26,232-Speed 2630.42 samples/sec Loss 11.0474 LearningRate 0.0657 Epoch: 3 Global Step: 157040 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:21:30,131-Speed 2627.28 samples/sec Loss 11.0304 LearningRate 0.0657 Epoch: 3 Global Step: 157050 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:21:34,024-Speed 2631.32 samples/sec Loss 11.1547 LearningRate 0.0657 Epoch: 3 Global Step: 157060 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:21:37,929-Speed 2622.57 samples/sec Loss 11.2623 LearningRate 0.0657 Epoch: 3 Global Step: 157070 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:21:41,825-Speed 2628.99 samples/sec Loss 11.1166 LearningRate 0.0657 Epoch: 3 Global Step: 157080 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:21:45,718-Speed 2630.81 samples/sec Loss 11.0781 LearningRate 0.0657 Epoch: 3 Global Step: 157090 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:21:49,616-Speed 2627.70 samples/sec Loss 11.1216 LearningRate 0.0657 Epoch: 3 Global Step: 157100 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:21:53,501-Speed 2636.50 samples/sec Loss 11.0033 LearningRate 0.0657 Epoch: 3 Global Step: 157110 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:21:57,410-Speed 2619.93 samples/sec Loss 10.9505 LearningRate 0.0657 Epoch: 3 Global Step: 157120 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:22:01,305-Speed 2629.79 samples/sec Loss 10.9679 LearningRate 0.0657 Epoch: 3 Global Step: 157130 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:22:05,204-Speed 2626.55 samples/sec Loss 11.0108 LearningRate 0.0657 Epoch: 3 Global Step: 157140 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:22:09,096-Speed 2631.99 samples/sec Loss 11.0585 LearningRate 0.0657 Epoch: 3 Global Step: 157150 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:22:12,992-Speed 2629.18 samples/sec Loss 11.1795 LearningRate 0.0657 Epoch: 3 Global Step: 157160 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:22:16,890-Speed 2627.58 samples/sec Loss 11.1119 LearningRate 0.0657 Epoch: 3 Global Step: 157170 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:22:20,783-Speed 2630.95 samples/sec Loss 11.0065 LearningRate 0.0657 Epoch: 3 Global Step: 157180 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:22:24,686-Speed 2624.18 samples/sec Loss 11.1773 LearningRate 0.0657 Epoch: 3 Global Step: 157190 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:22:28,660-Speed 2577.12 samples/sec Loss 11.0295 LearningRate 0.0657 Epoch: 3 Global Step: 157200 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:22:32,570-Speed 2619.48 samples/sec Loss 11.0100 LearningRate 0.0657 Epoch: 3 Global Step: 157210 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:22:36,476-Speed 2623.10 samples/sec Loss 11.2059 LearningRate 0.0657 Epoch: 3 Global Step: 157220 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:22:40,485-Speed 2554.60 samples/sec Loss 11.1905 LearningRate 0.0657 Epoch: 3 Global Step: 157230 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:22:44,362-Speed 2641.70 samples/sec Loss 11.0598 LearningRate 0.0657 Epoch: 3 Global Step: 157240 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:22:48,255-Speed 2631.28 samples/sec Loss 10.9256 LearningRate 0.0657 Epoch: 3 Global Step: 157250 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:22:52,175-Speed 2613.36 samples/sec Loss 11.1204 LearningRate 0.0657 Epoch: 3 Global Step: 157260 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:22:56,117-Speed 2597.86 samples/sec Loss 11.1828 LearningRate 0.0657 Epoch: 3 Global Step: 157270 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:23:00,028-Speed 2618.56 samples/sec Loss 11.1459 LearningRate 0.0657 Epoch: 3 Global Step: 157280 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:23:03,939-Speed 2618.89 samples/sec Loss 11.1567 LearningRate 0.0657 Epoch: 3 Global Step: 157290 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:23:07,838-Speed 2627.10 samples/sec Loss 11.0276 LearningRate 0.0657 Epoch: 3 Global Step: 157300 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:23:11,732-Speed 2630.56 samples/sec Loss 11.0263 LearningRate 0.0657 Epoch: 3 Global Step: 157310 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:23:15,624-Speed 2631.43 samples/sec Loss 11.0590 LearningRate 0.0657 Epoch: 3 Global Step: 157320 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:23:19,518-Speed 2630.30 samples/sec Loss 11.1296 LearningRate 0.0657 Epoch: 3 Global Step: 157330 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:23:23,418-Speed 2626.63 samples/sec Loss 11.0013 LearningRate 0.0657 Epoch: 3 Global Step: 157340 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:23:27,309-Speed 2631.80 samples/sec Loss 11.0813 LearningRate 0.0657 Epoch: 3 Global Step: 157350 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:23:31,208-Speed 2626.72 samples/sec Loss 11.1384 LearningRate 0.0657 Epoch: 3 Global Step: 157360 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:23:35,112-Speed 2623.86 samples/sec Loss 11.1501 LearningRate 0.0657 Epoch: 3 Global Step: 157370 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:23:39,021-Speed 2620.29 samples/sec Loss 11.0319 LearningRate 0.0657 Epoch: 3 Global Step: 157380 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:23:42,930-Speed 2620.33 samples/sec Loss 10.9809 LearningRate 0.0657 Epoch: 3 Global Step: 157390 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:23:46,840-Speed 2619.21 samples/sec Loss 11.1430 LearningRate 0.0657 Epoch: 3 Global Step: 157400 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:23:50,742-Speed 2625.11 samples/sec Loss 11.0852 LearningRate 0.0657 Epoch: 3 Global Step: 157410 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:23:54,638-Speed 2628.67 samples/sec Loss 11.0447 LearningRate 0.0656 Epoch: 3 Global Step: 157420 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:23:58,547-Speed 2619.88 samples/sec Loss 10.9671 LearningRate 0.0656 Epoch: 3 Global Step: 157430 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:24:02,446-Speed 2627.38 samples/sec Loss 11.0544 LearningRate 0.0656 Epoch: 3 Global Step: 157440 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:24:06,342-Speed 2629.41 samples/sec Loss 11.0970 LearningRate 0.0656 Epoch: 3 Global Step: 157450 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:24:10,234-Speed 2630.96 samples/sec Loss 11.1641 LearningRate 0.0656 Epoch: 3 Global Step: 157460 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:24:14,131-Speed 2628.49 samples/sec Loss 11.1483 LearningRate 0.0656 Epoch: 3 Global Step: 157470 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:24:18,032-Speed 2625.83 samples/sec Loss 11.0114 LearningRate 0.0656 Epoch: 3 Global Step: 157480 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:24:21,919-Speed 2635.22 samples/sec Loss 11.0215 LearningRate 0.0656 Epoch: 3 Global Step: 157490 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:24:25,822-Speed 2624.07 samples/sec Loss 11.1728 LearningRate 0.0656 Epoch: 3 Global Step: 157500 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:24:29,735-Speed 2616.74 samples/sec Loss 11.0298 LearningRate 0.0656 Epoch: 3 Global Step: 157510 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:24:33,632-Speed 2628.19 samples/sec Loss 11.1431 LearningRate 0.0656 Epoch: 3 Global Step: 157520 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:24:37,530-Speed 2628.20 samples/sec Loss 10.9670 LearningRate 0.0656 Epoch: 3 Global Step: 157530 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:24:41,428-Speed 2627.97 samples/sec Loss 11.0530 LearningRate 0.0656 Epoch: 3 Global Step: 157540 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:24:45,324-Speed 2628.55 samples/sec Loss 10.9979 LearningRate 0.0656 Epoch: 3 Global Step: 157550 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:24:49,217-Speed 2631.26 samples/sec Loss 11.1027 LearningRate 0.0656 Epoch: 3 Global Step: 157560 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:24:53,116-Speed 2626.32 samples/sec Loss 11.0778 LearningRate 0.0656 Epoch: 3 Global Step: 157570 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:24:57,010-Speed 2630.47 samples/sec Loss 11.0453 LearningRate 0.0656 Epoch: 3 Global Step: 157580 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:25:00,906-Speed 2628.70 samples/sec Loss 11.0441 LearningRate 0.0656 Epoch: 3 Global Step: 157590 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:25:04,803-Speed 2628.57 samples/sec Loss 10.9532 LearningRate 0.0656 Epoch: 3 Global Step: 157600 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:25:08,683-Speed 2639.19 samples/sec Loss 11.0461 LearningRate 0.0656 Epoch: 3 Global Step: 157610 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:25:12,595-Speed 2618.27 samples/sec Loss 11.1466 LearningRate 0.0656 Epoch: 3 Global Step: 157620 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:25:16,492-Speed 2628.64 samples/sec Loss 11.0284 LearningRate 0.0656 Epoch: 3 Global Step: 157630 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:25:20,385-Speed 2630.73 samples/sec Loss 11.0187 LearningRate 0.0656 Epoch: 3 Global Step: 157640 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:25:24,279-Speed 2630.42 samples/sec Loss 11.0660 LearningRate 0.0656 Epoch: 3 Global Step: 157650 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:25:28,176-Speed 2628.19 samples/sec Loss 10.8698 LearningRate 0.0656 Epoch: 3 Global Step: 157660 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:25:32,078-Speed 2624.99 samples/sec Loss 11.1079 LearningRate 0.0656 Epoch: 3 Global Step: 157670 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:25:35,975-Speed 2628.50 samples/sec Loss 11.0752 LearningRate 0.0656 Epoch: 3 Global Step: 157680 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:25:39,880-Speed 2622.59 samples/sec Loss 11.1239 LearningRate 0.0656 Epoch: 3 Global Step: 157690 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:25:43,784-Speed 2623.09 samples/sec Loss 11.1230 LearningRate 0.0656 Epoch: 3 Global Step: 157700 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:25:47,682-Speed 2628.20 samples/sec Loss 10.9456 LearningRate 0.0656 Epoch: 3 Global Step: 157710 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:25:51,580-Speed 2627.21 samples/sec Loss 11.1009 LearningRate 0.0656 Epoch: 3 Global Step: 157720 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:25:55,563-Speed 2571.84 samples/sec Loss 11.0926 LearningRate 0.0656 Epoch: 3 Global Step: 157730 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:25:59,468-Speed 2622.73 samples/sec Loss 11.0888 LearningRate 0.0656 Epoch: 3 Global Step: 157740 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:26:03,357-Speed 2634.03 samples/sec Loss 10.9887 LearningRate 0.0656 Epoch: 3 Global Step: 157750 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:26:07,257-Speed 2626.07 samples/sec Loss 11.0603 LearningRate 0.0656 Epoch: 3 Global Step: 157760 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:26:11,156-Speed 2626.82 samples/sec Loss 10.9405 LearningRate 0.0656 Epoch: 3 Global Step: 157770 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:26:15,051-Speed 2629.81 samples/sec Loss 10.9580 LearningRate 0.0656 Epoch: 3 Global Step: 157780 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:26:18,944-Speed 2630.38 samples/sec Loss 10.8636 LearningRate 0.0656 Epoch: 3 Global Step: 157790 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:26:22,853-Speed 2620.63 samples/sec Loss 10.9487 LearningRate 0.0656 Epoch: 3 Global Step: 157800 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:26:26,751-Speed 2627.26 samples/sec Loss 11.0870 LearningRate 0.0656 Epoch: 3 Global Step: 157810 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:26:30,648-Speed 2628.61 samples/sec Loss 11.2181 LearningRate 0.0656 Epoch: 3 Global Step: 157820 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:26:34,544-Speed 2629.15 samples/sec Loss 11.1467 LearningRate 0.0656 Epoch: 3 Global Step: 157830 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:26:38,490-Speed 2595.58 samples/sec Loss 11.0427 LearningRate 0.0656 Epoch: 3 Global Step: 157840 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:26:42,392-Speed 2624.73 samples/sec Loss 11.1799 LearningRate 0.0656 Epoch: 3 Global Step: 157850 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:26:46,287-Speed 2629.70 samples/sec Loss 11.1548 LearningRate 0.0656 Epoch: 3 Global Step: 157860 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:26:50,181-Speed 2630.31 samples/sec Loss 11.1292 LearningRate 0.0656 Epoch: 3 Global Step: 157870 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:26:54,078-Speed 2628.58 samples/sec Loss 10.9326 LearningRate 0.0656 Epoch: 3 Global Step: 157880 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:26:57,973-Speed 2629.64 samples/sec Loss 11.0211 LearningRate 0.0656 Epoch: 3 Global Step: 157890 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:27:01,875-Speed 2624.95 samples/sec Loss 11.1477 LearningRate 0.0656 Epoch: 3 Global Step: 157900 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:27:05,779-Speed 2623.58 samples/sec Loss 11.0835 LearningRate 0.0656 Epoch: 3 Global Step: 157910 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:27:09,684-Speed 2622.85 samples/sec Loss 10.9023 LearningRate 0.0656 Epoch: 3 Global Step: 157920 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:27:13,584-Speed 2626.19 samples/sec Loss 11.1285 LearningRate 0.0655 Epoch: 3 Global Step: 157930 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:27:17,483-Speed 2627.03 samples/sec Loss 10.9227 LearningRate 0.0655 Epoch: 3 Global Step: 157940 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:27:21,402-Speed 2613.61 samples/sec Loss 11.0143 LearningRate 0.0655 Epoch: 3 Global Step: 157950 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:27:25,290-Speed 2634.61 samples/sec Loss 11.0854 LearningRate 0.0655 Epoch: 3 Global Step: 157960 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:27:29,276-Speed 2569.54 samples/sec Loss 11.0898 LearningRate 0.0655 Epoch: 3 Global Step: 157970 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:27:33,233-Speed 2588.12 samples/sec Loss 11.0233 LearningRate 0.0655 Epoch: 3 Global Step: 157980 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:27:37,135-Speed 2624.78 samples/sec Loss 10.9743 LearningRate 0.0655 Epoch: 3 Global Step: 157990 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:27:41,035-Speed 2626.36 samples/sec Loss 11.0052 LearningRate 0.0655 Epoch: 3 Global Step: 158000 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:27:44,940-Speed 2623.15 samples/sec Loss 11.1171 LearningRate 0.0655 Epoch: 3 Global Step: 158010 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:27:48,851-Speed 2618.60 samples/sec Loss 11.1392 LearningRate 0.0655 Epoch: 3 Global Step: 158020 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:27:52,773-Speed 2611.48 samples/sec Loss 10.9524 LearningRate 0.0655 Epoch: 3 Global Step: 158030 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:27:56,662-Speed 2633.64 samples/sec Loss 11.0757 LearningRate 0.0655 Epoch: 3 Global Step: 158040 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:28:00,567-Speed 2622.85 samples/sec Loss 11.0646 LearningRate 0.0655 Epoch: 3 Global Step: 158050 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:28:04,612-Speed 2532.02 samples/sec Loss 10.9918 LearningRate 0.0655 Epoch: 3 Global Step: 158060 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:28:08,539-Speed 2608.61 samples/sec Loss 10.9907 LearningRate 0.0655 Epoch: 3 Global Step: 158070 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:28:12,532-Speed 2564.67 samples/sec Loss 11.1038 LearningRate 0.0655 Epoch: 3 Global Step: 158080 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:28:16,441-Speed 2620.68 samples/sec Loss 11.1671 LearningRate 0.0655 Epoch: 3 Global Step: 158090 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:28:20,340-Speed 2626.85 samples/sec Loss 11.0691 LearningRate 0.0655 Epoch: 3 Global Step: 158100 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:28:24,242-Speed 2624.86 samples/sec Loss 10.9592 LearningRate 0.0655 Epoch: 3 Global Step: 158110 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:28:28,162-Speed 2613.27 samples/sec Loss 11.0784 LearningRate 0.0655 Epoch: 3 Global Step: 158120 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:28:32,166-Speed 2557.98 samples/sec Loss 11.1611 LearningRate 0.0655 Epoch: 3 Global Step: 158130 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:28:36,059-Speed 2630.84 samples/sec Loss 11.2649 LearningRate 0.0655 Epoch: 3 Global Step: 158140 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:28:39,957-Speed 2627.51 samples/sec Loss 10.9287 LearningRate 0.0655 Epoch: 3 Global Step: 158150 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:28:43,868-Speed 2619.52 samples/sec Loss 11.0348 LearningRate 0.0655 Epoch: 3 Global Step: 158160 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:28:47,772-Speed 2623.22 samples/sec Loss 11.1075 LearningRate 0.0655 Epoch: 3 Global Step: 158170 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:28:51,683-Speed 2619.78 samples/sec Loss 11.0075 LearningRate 0.0655 Epoch: 3 Global Step: 158180 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:28:55,591-Speed 2620.50 samples/sec Loss 11.1345 LearningRate 0.0655 Epoch: 3 Global Step: 158190 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:28:59,498-Speed 2621.25 samples/sec Loss 10.9439 LearningRate 0.0655 Epoch: 3 Global Step: 158200 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:29:03,396-Speed 2627.92 samples/sec Loss 11.0241 LearningRate 0.0655 Epoch: 3 Global Step: 158210 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:29:07,290-Speed 2630.12 samples/sec Loss 11.0618 LearningRate 0.0655 Epoch: 3 Global Step: 158220 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:29:11,194-Speed 2623.55 samples/sec Loss 11.0416 LearningRate 0.0655 Epoch: 3 Global Step: 158230 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:29:15,099-Speed 2622.99 samples/sec Loss 11.1084 LearningRate 0.0655 Epoch: 3 Global Step: 158240 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:29:19,019-Speed 2612.66 samples/sec Loss 10.8599 LearningRate 0.0655 Epoch: 3 Global Step: 158250 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:29:22,922-Speed 2624.58 samples/sec Loss 10.9186 LearningRate 0.0655 Epoch: 3 Global Step: 158260 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:29:26,830-Speed 2620.96 samples/sec Loss 10.9959 LearningRate 0.0655 Epoch: 3 Global Step: 158270 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:29:30,727-Speed 2628.30 samples/sec Loss 11.1525 LearningRate 0.0655 Epoch: 3 Global Step: 158280 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:29:34,620-Speed 2630.40 samples/sec Loss 11.0176 LearningRate 0.0655 Epoch: 3 Global Step: 158290 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:29:38,519-Speed 2626.94 samples/sec Loss 10.9298 LearningRate 0.0655 Epoch: 3 Global Step: 158300 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:29:42,418-Speed 2626.99 samples/sec Loss 10.9840 LearningRate 0.0655 Epoch: 3 Global Step: 158310 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:29:46,311-Speed 2631.21 samples/sec Loss 10.9284 LearningRate 0.0655 Epoch: 3 Global Step: 158320 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:29:50,206-Speed 2629.42 samples/sec Loss 11.0663 LearningRate 0.0655 Epoch: 3 Global Step: 158330 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:29:54,100-Speed 2630.30 samples/sec Loss 11.0194 LearningRate 0.0655 Epoch: 3 Global Step: 158340 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:29:58,003-Speed 2623.84 samples/sec Loss 11.0874 LearningRate 0.0655 Epoch: 3 Global Step: 158350 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:30:01,882-Speed 2640.89 samples/sec Loss 10.9742 LearningRate 0.0655 Epoch: 3 Global Step: 158360 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:30:05,782-Speed 2626.35 samples/sec Loss 10.9956 LearningRate 0.0655 Epoch: 3 Global Step: 158370 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:30:09,698-Speed 2615.71 samples/sec Loss 10.8955 LearningRate 0.0655 Epoch: 3 Global Step: 158380 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:30:13,595-Speed 2628.10 samples/sec Loss 10.9382 LearningRate 0.0655 Epoch: 3 Global Step: 158390 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:30:17,489-Speed 2630.64 samples/sec Loss 11.0688 LearningRate 0.0655 Epoch: 3 Global Step: 158400 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:30:21,385-Speed 2628.59 samples/sec Loss 11.1027 LearningRate 0.0655 Epoch: 3 Global Step: 158410 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:30:25,280-Speed 2629.37 samples/sec Loss 11.0543 LearningRate 0.0655 Epoch: 3 Global Step: 158420 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:30:29,172-Speed 2631.87 samples/sec Loss 11.1253 LearningRate 0.0655 Epoch: 3 Global Step: 158430 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:30:33,055-Speed 2637.69 samples/sec Loss 11.0113 LearningRate 0.0655 Epoch: 3 Global Step: 158440 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:30:36,951-Speed 2629.23 samples/sec Loss 10.9324 LearningRate 0.0654 Epoch: 3 Global Step: 158450 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:30:40,843-Speed 2631.75 samples/sec Loss 11.1827 LearningRate 0.0654 Epoch: 3 Global Step: 158460 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:30:44,733-Speed 2632.96 samples/sec Loss 11.0009 LearningRate 0.0654 Epoch: 3 Global Step: 158470 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:30:48,626-Speed 2630.62 samples/sec Loss 11.1596 LearningRate 0.0654 Epoch: 3 Global Step: 158480 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:30:52,518-Speed 2631.61 samples/sec Loss 11.0842 LearningRate 0.0654 Epoch: 3 Global Step: 158490 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:30:56,411-Speed 2631.18 samples/sec Loss 11.0328 LearningRate 0.0654 Epoch: 3 Global Step: 158500 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:31:00,306-Speed 2629.78 samples/sec Loss 11.1082 LearningRate 0.0654 Epoch: 3 Global Step: 158510 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:31:04,205-Speed 2626.63 samples/sec Loss 11.1368 LearningRate 0.0654 Epoch: 3 Global Step: 158520 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:31:08,102-Speed 2628.78 samples/sec Loss 11.0776 LearningRate 0.0654 Epoch: 3 Global Step: 158530 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:31:12,037-Speed 2602.99 samples/sec Loss 11.0585 LearningRate 0.0654 Epoch: 3 Global Step: 158540 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:31:15,930-Speed 2630.90 samples/sec Loss 11.1082 LearningRate 0.0654 Epoch: 3 Global Step: 158550 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:31:19,827-Speed 2628.54 samples/sec Loss 10.9775 LearningRate 0.0654 Epoch: 3 Global Step: 158560 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:31:23,724-Speed 2628.36 samples/sec Loss 10.8590 LearningRate 0.0654 Epoch: 3 Global Step: 158570 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:31:27,673-Speed 2594.00 samples/sec Loss 11.0476 LearningRate 0.0654 Epoch: 3 Global Step: 158580 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:31:31,569-Speed 2629.04 samples/sec Loss 11.1358 LearningRate 0.0654 Epoch: 3 Global Step: 158590 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:31:35,489-Speed 2612.84 samples/sec Loss 10.9978 LearningRate 0.0654 Epoch: 3 Global Step: 158600 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:31:39,432-Speed 2597.84 samples/sec Loss 11.1069 LearningRate 0.0654 Epoch: 3 Global Step: 158610 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:31:43,339-Speed 2621.76 samples/sec Loss 10.9404 LearningRate 0.0654 Epoch: 3 Global Step: 158620 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:31:47,242-Speed 2625.01 samples/sec Loss 11.1078 LearningRate 0.0654 Epoch: 3 Global Step: 158630 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:31:51,129-Speed 2634.26 samples/sec Loss 11.1671 LearningRate 0.0654 Epoch: 3 Global Step: 158640 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:31:55,024-Speed 2630.79 samples/sec Loss 11.0949 LearningRate 0.0654 Epoch: 3 Global Step: 158650 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:31:58,919-Speed 2629.09 samples/sec Loss 11.0548 LearningRate 0.0654 Epoch: 3 Global Step: 158660 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:32:02,814-Speed 2629.67 samples/sec Loss 11.0745 LearningRate 0.0654 Epoch: 3 Global Step: 158670 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:32:06,733-Speed 2613.21 samples/sec Loss 11.0801 LearningRate 0.0654 Epoch: 3 Global Step: 158680 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:32:10,626-Speed 2631.46 samples/sec Loss 10.9052 LearningRate 0.0654 Epoch: 3 Global Step: 158690 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:32:14,523-Speed 2628.62 samples/sec Loss 11.1758 LearningRate 0.0654 Epoch: 3 Global Step: 158700 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:32:18,422-Speed 2627.12 samples/sec Loss 10.8589 LearningRate 0.0654 Epoch: 3 Global Step: 158710 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:32:22,321-Speed 2627.00 samples/sec Loss 11.0772 LearningRate 0.0654 Epoch: 3 Global Step: 158720 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:32:26,213-Speed 2631.71 samples/sec Loss 11.0902 LearningRate 0.0654 Epoch: 3 Global Step: 158730 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:32:30,111-Speed 2627.40 samples/sec Loss 11.0294 LearningRate 0.0654 Epoch: 3 Global Step: 158740 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:32:34,028-Speed 2614.57 samples/sec Loss 11.0041 LearningRate 0.0654 Epoch: 3 Global Step: 158750 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:32:37,931-Speed 2623.83 samples/sec Loss 10.9601 LearningRate 0.0654 Epoch: 3 Global Step: 158760 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:32:41,838-Speed 2621.77 samples/sec Loss 10.9906 LearningRate 0.0654 Epoch: 3 Global Step: 158770 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:32:45,868-Speed 2541.67 samples/sec Loss 11.0142 LearningRate 0.0654 Epoch: 3 Global Step: 158780 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:32:49,782-Speed 2617.38 samples/sec Loss 10.9692 LearningRate 0.0654 Epoch: 3 Global Step: 158790 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:32:53,682-Speed 2626.10 samples/sec Loss 11.0966 LearningRate 0.0654 Epoch: 3 Global Step: 158800 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:32:57,576-Speed 2630.67 samples/sec Loss 11.0024 LearningRate 0.0654 Epoch: 3 Global Step: 158810 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:33:01,478-Speed 2624.43 samples/sec Loss 10.9753 LearningRate 0.0654 Epoch: 3 Global Step: 158820 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:33:05,370-Speed 2631.68 samples/sec Loss 11.1323 LearningRate 0.0654 Epoch: 3 Global Step: 158830 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:33:09,250-Speed 2639.77 samples/sec Loss 11.1611 LearningRate 0.0654 Epoch: 3 Global Step: 158840 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:33:13,134-Speed 2637.16 samples/sec Loss 10.9784 LearningRate 0.0654 Epoch: 3 Global Step: 158850 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:33:17,041-Speed 2621.72 samples/sec Loss 10.9163 LearningRate 0.0654 Epoch: 3 Global Step: 158860 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:33:21,000-Speed 2587.16 samples/sec Loss 11.0519 LearningRate 0.0654 Epoch: 3 Global Step: 158870 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:33:24,893-Speed 2631.03 samples/sec Loss 10.9338 LearningRate 0.0654 Epoch: 3 Global Step: 158880 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:33:28,786-Speed 2630.98 samples/sec Loss 11.0241 LearningRate 0.0654 Epoch: 3 Global Step: 158890 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:33:32,675-Speed 2633.62 samples/sec Loss 10.9925 LearningRate 0.0654 Epoch: 3 Global Step: 158900 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:33:36,565-Speed 2633.30 samples/sec Loss 11.0176 LearningRate 0.0654 Epoch: 3 Global Step: 158910 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:33:40,455-Speed 2632.70 samples/sec Loss 11.0207 LearningRate 0.0654 Epoch: 3 Global Step: 158920 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:33:44,350-Speed 2629.42 samples/sec Loss 11.0176 LearningRate 0.0654 Epoch: 3 Global Step: 158930 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:33:48,244-Speed 2630.68 samples/sec Loss 11.0826 LearningRate 0.0654 Epoch: 3 Global Step: 158940 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:33:52,155-Speed 2618.46 samples/sec Loss 10.8754 LearningRate 0.0654 Epoch: 3 Global Step: 158950 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:33:56,066-Speed 2618.96 samples/sec Loss 11.0415 LearningRate 0.0653 Epoch: 3 Global Step: 158960 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:33:59,967-Speed 2625.69 samples/sec Loss 10.9902 LearningRate 0.0653 Epoch: 3 Global Step: 158970 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:34:03,863-Speed 2629.15 samples/sec Loss 10.9272 LearningRate 0.0653 Epoch: 3 Global Step: 158980 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:34:07,758-Speed 2629.26 samples/sec Loss 11.0971 LearningRate 0.0653 Epoch: 3 Global Step: 158990 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:34:11,852-Speed 2501.94 samples/sec Loss 11.0825 LearningRate 0.0653 Epoch: 3 Global Step: 159000 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:34:15,830-Speed 2574.87 samples/sec Loss 10.9934 LearningRate 0.0653 Epoch: 3 Global Step: 159010 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:34:19,725-Speed 2629.14 samples/sec Loss 11.0094 LearningRate 0.0653 Epoch: 3 Global Step: 159020 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:34:23,624-Speed 2627.70 samples/sec Loss 10.9966 LearningRate 0.0653 Epoch: 3 Global Step: 159030 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:34:27,518-Speed 2630.58 samples/sec Loss 10.9761 LearningRate 0.0653 Epoch: 3 Global Step: 159040 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:34:31,426-Speed 2620.71 samples/sec Loss 10.8631 LearningRate 0.0653 Epoch: 3 Global Step: 159050 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:34:35,326-Speed 2626.36 samples/sec Loss 10.8999 LearningRate 0.0653 Epoch: 3 Global Step: 159060 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:34:39,236-Speed 2619.86 samples/sec Loss 11.0969 LearningRate 0.0653 Epoch: 3 Global Step: 159070 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:34:43,131-Speed 2629.49 samples/sec Loss 10.9509 LearningRate 0.0653 Epoch: 3 Global Step: 159080 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:34:47,026-Speed 2629.87 samples/sec Loss 11.0387 LearningRate 0.0653 Epoch: 3 Global Step: 159090 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:34:50,920-Speed 2631.02 samples/sec Loss 10.9035 LearningRate 0.0653 Epoch: 3 Global Step: 159100 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:34:54,831-Speed 2618.41 samples/sec Loss 11.0478 LearningRate 0.0653 Epoch: 3 Global Step: 159110 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:34:58,732-Speed 2625.71 samples/sec Loss 10.7785 LearningRate 0.0653 Epoch: 3 Global Step: 159120 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:35:02,651-Speed 2613.04 samples/sec Loss 11.0932 LearningRate 0.0653 Epoch: 3 Global Step: 159130 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:35:06,545-Speed 2630.85 samples/sec Loss 11.0589 LearningRate 0.0653 Epoch: 3 Global Step: 159140 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:35:10,439-Speed 2630.19 samples/sec Loss 11.0685 LearningRate 0.0653 Epoch: 3 Global Step: 159150 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:35:14,335-Speed 2629.31 samples/sec Loss 11.0339 LearningRate 0.0653 Epoch: 3 Global Step: 159160 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:35:18,235-Speed 2625.94 samples/sec Loss 10.9380 LearningRate 0.0653 Epoch: 3 Global Step: 159170 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:35:22,128-Speed 2631.21 samples/sec Loss 10.9841 LearningRate 0.0653 Epoch: 3 Global Step: 159180 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:35:26,027-Speed 2627.39 samples/sec Loss 11.0233 LearningRate 0.0653 Epoch: 3 Global Step: 159190 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:35:29,914-Speed 2635.06 samples/sec Loss 11.0068 LearningRate 0.0653 Epoch: 3 Global Step: 159200 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:35:33,831-Speed 2614.23 samples/sec Loss 11.0771 LearningRate 0.0653 Epoch: 3 Global Step: 159210 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:35:37,728-Speed 2628.98 samples/sec Loss 11.0237 LearningRate 0.0653 Epoch: 3 Global Step: 159220 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:35:41,627-Speed 2626.77 samples/sec Loss 10.9516 LearningRate 0.0653 Epoch: 3 Global Step: 159230 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:35:45,521-Speed 2630.39 samples/sec Loss 10.9383 LearningRate 0.0653 Epoch: 3 Global Step: 159240 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:35:49,420-Speed 2627.26 samples/sec Loss 10.9717 LearningRate 0.0653 Epoch: 3 Global Step: 159250 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:35:53,326-Speed 2622.52 samples/sec Loss 10.9242 LearningRate 0.0653 Epoch: 3 Global Step: 159260 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:35:57,228-Speed 2624.56 samples/sec Loss 10.9373 LearningRate 0.0653 Epoch: 3 Global Step: 159270 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:36:01,154-Speed 2608.89 samples/sec Loss 11.1220 LearningRate 0.0653 Epoch: 3 Global Step: 159280 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:36:05,050-Speed 2629.27 samples/sec Loss 11.1101 LearningRate 0.0653 Epoch: 3 Global Step: 159290 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:36:08,981-Speed 2605.96 samples/sec Loss 11.0193 LearningRate 0.0653 Epoch: 3 Global Step: 159300 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:36:12,887-Speed 2622.42 samples/sec Loss 11.0508 LearningRate 0.0653 Epoch: 3 Global Step: 159310 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:36:16,788-Speed 2626.10 samples/sec Loss 11.0537 LearningRate 0.0653 Epoch: 3 Global Step: 159320 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:36:20,764-Speed 2575.81 samples/sec Loss 10.9994 LearningRate 0.0653 Epoch: 3 Global Step: 159330 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:36:24,754-Speed 2567.17 samples/sec Loss 11.0186 LearningRate 0.0653 Epoch: 3 Global Step: 159340 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:36:28,685-Speed 2605.96 samples/sec Loss 10.9081 LearningRate 0.0653 Epoch: 3 Global Step: 159350 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:36:32,666-Speed 2572.41 samples/sec Loss 10.9585 LearningRate 0.0653 Epoch: 3 Global Step: 159360 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:36:36,560-Speed 2630.69 samples/sec Loss 11.2307 LearningRate 0.0653 Epoch: 3 Global Step: 159370 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:36:40,497-Speed 2601.01 samples/sec Loss 10.8503 LearningRate 0.0653 Epoch: 3 Global Step: 159380 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:36:44,446-Speed 2594.29 samples/sec Loss 11.0791 LearningRate 0.0653 Epoch: 3 Global Step: 159390 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:36:48,351-Speed 2623.12 samples/sec Loss 10.8380 LearningRate 0.0653 Epoch: 3 Global Step: 159400 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:36:52,265-Speed 2616.86 samples/sec Loss 11.0274 LearningRate 0.0653 Epoch: 3 Global Step: 159410 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:36:56,163-Speed 2627.21 samples/sec Loss 10.9564 LearningRate 0.0653 Epoch: 3 Global Step: 159420 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:37:00,056-Speed 2631.29 samples/sec Loss 11.0029 LearningRate 0.0653 Epoch: 3 Global Step: 159430 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:37:03,936-Speed 2639.13 samples/sec Loss 11.0265 LearningRate 0.0653 Epoch: 3 Global Step: 159440 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:37:07,836-Speed 2626.67 samples/sec Loss 11.0949 LearningRate 0.0653 Epoch: 3 Global Step: 159450 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:37:11,727-Speed 2632.22 samples/sec Loss 11.1059 LearningRate 0.0653 Epoch: 3 Global Step: 159460 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:37:15,620-Speed 2631.03 samples/sec Loss 10.9951 LearningRate 0.0652 Epoch: 3 Global Step: 159470 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:37:19,515-Speed 2629.83 samples/sec Loss 10.8879 LearningRate 0.0652 Epoch: 3 Global Step: 159480 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:37:23,398-Speed 2637.97 samples/sec Loss 11.2656 LearningRate 0.0652 Epoch: 3 Global Step: 159490 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:37:27,299-Speed 2625.43 samples/sec Loss 11.5980 LearningRate 0.0652 Epoch: 3 Global Step: 159500 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:37:31,197-Speed 2627.52 samples/sec Loss 11.3739 LearningRate 0.0652 Epoch: 3 Global Step: 159510 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:37:35,094-Speed 2628.08 samples/sec Loss 11.2510 LearningRate 0.0652 Epoch: 3 Global Step: 159520 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:37:38,986-Speed 2631.72 samples/sec Loss 11.1301 LearningRate 0.0652 Epoch: 3 Global Step: 159530 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:37:42,876-Speed 2632.73 samples/sec Loss 10.9917 LearningRate 0.0652 Epoch: 3 Global Step: 159540 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:37:46,770-Speed 2630.43 samples/sec Loss 11.0527 LearningRate 0.0652 Epoch: 3 Global Step: 159550 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:37:50,663-Speed 2631.68 samples/sec Loss 11.1123 LearningRate 0.0652 Epoch: 3 Global Step: 159560 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:37:54,557-Speed 2630.15 samples/sec Loss 11.1206 LearningRate 0.0652 Epoch: 3 Global Step: 159570 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:37:58,447-Speed 2632.78 samples/sec Loss 11.1547 LearningRate 0.0652 Epoch: 3 Global Step: 159580 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:38:02,347-Speed 2626.08 samples/sec Loss 11.0379 LearningRate 0.0652 Epoch: 3 Global Step: 159590 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:38:06,248-Speed 2625.32 samples/sec Loss 10.9958 LearningRate 0.0652 Epoch: 3 Global Step: 159600 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:38:10,153-Speed 2623.25 samples/sec Loss 11.1787 LearningRate 0.0652 Epoch: 3 Global Step: 159610 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:38:14,055-Speed 2625.10 samples/sec Loss 11.0766 LearningRate 0.0652 Epoch: 3 Global Step: 159620 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:38:17,958-Speed 2624.25 samples/sec Loss 11.0770 LearningRate 0.0652 Epoch: 3 Global Step: 159630 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:38:21,863-Speed 2623.17 samples/sec Loss 11.2137 LearningRate 0.0652 Epoch: 3 Global Step: 159640 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:38:25,767-Speed 2623.34 samples/sec Loss 11.2070 LearningRate 0.0652 Epoch: 3 Global Step: 159650 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:38:29,670-Speed 2624.48 samples/sec Loss 11.0141 LearningRate 0.0652 Epoch: 3 Global Step: 159660 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:38:33,580-Speed 2619.28 samples/sec Loss 11.0365 LearningRate 0.0652 Epoch: 3 Global Step: 159670 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:38:37,484-Speed 2623.24 samples/sec Loss 11.2478 LearningRate 0.0652 Epoch: 3 Global Step: 159680 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:38:41,383-Speed 2626.97 samples/sec Loss 11.0513 LearningRate 0.0652 Epoch: 3 Global Step: 159690 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:38:45,258-Speed 2643.26 samples/sec Loss 11.0504 LearningRate 0.0652 Epoch: 3 Global Step: 159700 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:38:49,162-Speed 2623.64 samples/sec Loss 10.8439 LearningRate 0.0652 Epoch: 3 Global Step: 159710 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:38:53,052-Speed 2633.39 samples/sec Loss 11.2633 LearningRate 0.0652 Epoch: 3 Global Step: 159720 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:38:56,979-Speed 2608.22 samples/sec Loss 11.1171 LearningRate 0.0652 Epoch: 3 Global Step: 159730 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:39:00,896-Speed 2615.00 samples/sec Loss 11.0260 LearningRate 0.0652 Epoch: 3 Global Step: 159740 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:39:04,792-Speed 2628.61 samples/sec Loss 10.9534 LearningRate 0.0652 Epoch: 3 Global Step: 159750 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:39:08,702-Speed 2619.28 samples/sec Loss 10.9722 LearningRate 0.0652 Epoch: 3 Global Step: 159760 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:39:12,592-Speed 2633.31 samples/sec Loss 10.9683 LearningRate 0.0652 Epoch: 3 Global Step: 159770 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:39:16,487-Speed 2629.94 samples/sec Loss 11.1234 LearningRate 0.0652 Epoch: 3 Global Step: 159780 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:39:20,381-Speed 2630.24 samples/sec Loss 11.1539 LearningRate 0.0652 Epoch: 3 Global Step: 159790 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:39:24,288-Speed 2621.84 samples/sec Loss 11.2041 LearningRate 0.0652 Epoch: 3 Global Step: 159800 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:39:28,217-Speed 2607.42 samples/sec Loss 11.0727 LearningRate 0.0652 Epoch: 3 Global Step: 159810 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:39:32,114-Speed 2628.08 samples/sec Loss 10.8594 LearningRate 0.0652 Epoch: 3 Global Step: 159820 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:39:36,035-Speed 2611.72 samples/sec Loss 10.9972 LearningRate 0.0652 Epoch: 3 Global Step: 159830 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:39:39,943-Speed 2621.27 samples/sec Loss 11.0341 LearningRate 0.0652 Epoch: 3 Global Step: 159840 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:39:43,845-Speed 2625.00 samples/sec Loss 11.0067 LearningRate 0.0652 Epoch: 3 Global Step: 159850 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:39:47,757-Speed 2618.45 samples/sec Loss 11.1268 LearningRate 0.0652 Epoch: 3 Global Step: 159860 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:39:51,675-Speed 2614.70 samples/sec Loss 11.0921 LearningRate 0.0652 Epoch: 3 Global Step: 159870 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:39:55,596-Speed 2611.88 samples/sec Loss 10.9021 LearningRate 0.0652 Epoch: 3 Global Step: 159880 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:39:59,488-Speed 2632.07 samples/sec Loss 11.0784 LearningRate 0.0652 Epoch: 3 Global Step: 159890 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:40:03,386-Speed 2627.76 samples/sec Loss 10.9704 LearningRate 0.0652 Epoch: 3 Global Step: 159900 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:40:07,287-Speed 2625.41 samples/sec Loss 10.9634 LearningRate 0.0652 Epoch: 3 Global Step: 159910 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:40:11,189-Speed 2624.71 samples/sec Loss 11.0143 LearningRate 0.0652 Epoch: 3 Global Step: 159920 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:40:15,086-Speed 2628.29 samples/sec Loss 10.9031 LearningRate 0.0652 Epoch: 3 Global Step: 159930 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:40:18,980-Speed 2630.76 samples/sec Loss 11.0354 LearningRate 0.0652 Epoch: 3 Global Step: 159940 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:40:22,877-Speed 2627.95 samples/sec Loss 11.1331 LearningRate 0.0652 Epoch: 3 Global Step: 159950 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:40:26,777-Speed 2626.23 samples/sec Loss 11.0678 LearningRate 0.0652 Epoch: 3 Global Step: 159960 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:40:30,669-Speed 2631.47 samples/sec Loss 11.0199 LearningRate 0.0652 Epoch: 3 Global Step: 159970 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:40:34,571-Speed 2625.44 samples/sec Loss 11.0497 LearningRate 0.0651 Epoch: 3 Global Step: 159980 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:40:38,474-Speed 2623.94 samples/sec Loss 10.9674 LearningRate 0.0651 Epoch: 3 Global Step: 159990 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:40:42,380-Speed 2622.25 samples/sec Loss 11.0598 LearningRate 0.0651 Epoch: 3 Global Step: 160000 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:41:26,157-[lfw][160000]XNorm: 23.239836
Training: 2022-04-13 13:41:26,158-[lfw][160000]Accuracy-Flip: 0.99783+-0.00299
Training: 2022-04-13 13:41:26,159-[lfw][160000]Accuracy-Highest: 0.99783
Training: 2022-04-13 13:42:16,619-[cfp_fp][160000]XNorm: 20.861491
Training: 2022-04-13 13:42:16,620-[cfp_fp][160000]Accuracy-Flip: 0.97843+-0.00972
Training: 2022-04-13 13:42:16,622-[cfp_fp][160000]Accuracy-Highest: 0.98100
Training: 2022-04-13 13:43:00,184-[agedb_30][160000]XNorm: 23.004241
Training: 2022-04-13 13:43:00,185-[agedb_30][160000]Accuracy-Flip: 0.97050+-0.00853
Training: 2022-04-13 13:43:00,186-[agedb_30][160000]Accuracy-Highest: 0.97050
Training: 2022-04-13 13:43:04,052-Speed 72.28 samples/sec Loss 11.0553 LearningRate 0.0651 Epoch: 3 Global Step: 160010 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:43:07,909-Speed 2655.15 samples/sec Loss 11.0809 LearningRate 0.0651 Epoch: 3 Global Step: 160020 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:43:11,765-Speed 2656.64 samples/sec Loss 10.9895 LearningRate 0.0651 Epoch: 3 Global Step: 160030 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:43:15,612-Speed 2662.19 samples/sec Loss 11.1210 LearningRate 0.0651 Epoch: 3 Global Step: 160040 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:43:19,510-Speed 2627.82 samples/sec Loss 11.0020 LearningRate 0.0651 Epoch: 3 Global Step: 160050 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:43:23,391-Speed 2639.25 samples/sec Loss 11.0242 LearningRate 0.0651 Epoch: 3 Global Step: 160060 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:43:27,262-Speed 2647.18 samples/sec Loss 11.0286 LearningRate 0.0651 Epoch: 3 Global Step: 160070 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:43:31,138-Speed 2642.89 samples/sec Loss 11.0776 LearningRate 0.0651 Epoch: 3 Global Step: 160080 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:43:35,013-Speed 2643.49 samples/sec Loss 11.1476 LearningRate 0.0651 Epoch: 3 Global Step: 160090 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:43:38,890-Speed 2641.21 samples/sec Loss 11.1376 LearningRate 0.0651 Epoch: 3 Global Step: 160100 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:43:42,776-Speed 2636.37 samples/sec Loss 10.8194 LearningRate 0.0651 Epoch: 3 Global Step: 160110 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:43:46,659-Speed 2637.50 samples/sec Loss 10.9650 LearningRate 0.0651 Epoch: 3 Global Step: 160120 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:43:50,543-Speed 2637.32 samples/sec Loss 10.9648 LearningRate 0.0651 Epoch: 3 Global Step: 160130 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:43:54,431-Speed 2635.22 samples/sec Loss 11.0046 LearningRate 0.0651 Epoch: 3 Global Step: 160140 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:43:58,320-Speed 2633.26 samples/sec Loss 11.0694 LearningRate 0.0651 Epoch: 3 Global Step: 160150 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:44:02,209-Speed 2635.06 samples/sec Loss 11.0220 LearningRate 0.0651 Epoch: 3 Global Step: 160160 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:44:06,105-Speed 2628.69 samples/sec Loss 10.9520 LearningRate 0.0651 Epoch: 3 Global Step: 160170 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:44:09,999-Speed 2630.19 samples/sec Loss 10.9794 LearningRate 0.0651 Epoch: 3 Global Step: 160180 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:44:13,892-Speed 2630.94 samples/sec Loss 10.8064 LearningRate 0.0651 Epoch: 3 Global Step: 160190 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:44:17,791-Speed 2627.30 samples/sec Loss 10.9545 LearningRate 0.0651 Epoch: 3 Global Step: 160200 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:44:21,688-Speed 2628.01 samples/sec Loss 11.0500 LearningRate 0.0651 Epoch: 3 Global Step: 160210 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:44:25,580-Speed 2631.97 samples/sec Loss 11.0423 LearningRate 0.0651 Epoch: 3 Global Step: 160220 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:44:29,471-Speed 2632.61 samples/sec Loss 10.9753 LearningRate 0.0651 Epoch: 3 Global Step: 160230 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:44:33,362-Speed 2632.81 samples/sec Loss 10.9783 LearningRate 0.0651 Epoch: 3 Global Step: 160240 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:44:37,261-Speed 2626.88 samples/sec Loss 10.9169 LearningRate 0.0651 Epoch: 3 Global Step: 160250 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:44:41,137-Speed 2641.96 samples/sec Loss 11.1172 LearningRate 0.0651 Epoch: 3 Global Step: 160260 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:44:45,041-Speed 2623.70 samples/sec Loss 10.9855 LearningRate 0.0651 Epoch: 3 Global Step: 160270 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:44:48,949-Speed 2620.66 samples/sec Loss 11.0250 LearningRate 0.0651 Epoch: 3 Global Step: 160280 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:44:52,862-Speed 2617.46 samples/sec Loss 11.0423 LearningRate 0.0651 Epoch: 3 Global Step: 160290 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:44:56,755-Speed 2631.17 samples/sec Loss 11.1508 LearningRate 0.0651 Epoch: 3 Global Step: 160300 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:45:00,654-Speed 2626.58 samples/sec Loss 10.9996 LearningRate 0.0651 Epoch: 3 Global Step: 160310 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:45:04,551-Speed 2628.67 samples/sec Loss 11.0327 LearningRate 0.0651 Epoch: 3 Global Step: 160320 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:45:08,448-Speed 2628.77 samples/sec Loss 10.9797 LearningRate 0.0651 Epoch: 3 Global Step: 160330 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:45:12,336-Speed 2634.25 samples/sec Loss 10.9576 LearningRate 0.0651 Epoch: 3 Global Step: 160340 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:45:16,244-Speed 2620.33 samples/sec Loss 10.9118 LearningRate 0.0651 Epoch: 3 Global Step: 160350 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:45:20,138-Speed 2630.71 samples/sec Loss 10.8092 LearningRate 0.0651 Epoch: 3 Global Step: 160360 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:45:24,032-Speed 2630.22 samples/sec Loss 11.0090 LearningRate 0.0651 Epoch: 3 Global Step: 160370 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:45:27,935-Speed 2624.16 samples/sec Loss 10.9201 LearningRate 0.0651 Epoch: 3 Global Step: 160380 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:45:31,852-Speed 2615.51 samples/sec Loss 11.2216 LearningRate 0.0651 Epoch: 3 Global Step: 160390 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:45:35,758-Speed 2622.17 samples/sec Loss 10.9905 LearningRate 0.0651 Epoch: 3 Global Step: 160400 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:45:39,665-Speed 2621.96 samples/sec Loss 11.0033 LearningRate 0.0651 Epoch: 3 Global Step: 160410 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:45:43,572-Speed 2620.92 samples/sec Loss 10.9868 LearningRate 0.0651 Epoch: 3 Global Step: 160420 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:45:47,473-Speed 2626.44 samples/sec Loss 11.0350 LearningRate 0.0651 Epoch: 3 Global Step: 160430 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:45:51,366-Speed 2630.60 samples/sec Loss 11.0211 LearningRate 0.0651 Epoch: 3 Global Step: 160440 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:45:55,271-Speed 2623.27 samples/sec Loss 10.9463 LearningRate 0.0651 Epoch: 3 Global Step: 160450 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:45:59,168-Speed 2628.23 samples/sec Loss 10.8495 LearningRate 0.0651 Epoch: 3 Global Step: 160460 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:46:03,068-Speed 2626.91 samples/sec Loss 11.0614 LearningRate 0.0651 Epoch: 3 Global Step: 160470 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:46:06,966-Speed 2627.18 samples/sec Loss 10.9681 LearningRate 0.0651 Epoch: 3 Global Step: 160480 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:46:10,867-Speed 2625.61 samples/sec Loss 11.0133 LearningRate 0.0651 Epoch: 3 Global Step: 160490 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:46:14,763-Speed 2629.05 samples/sec Loss 10.9794 LearningRate 0.0650 Epoch: 3 Global Step: 160500 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:46:18,666-Speed 2624.35 samples/sec Loss 11.0792 LearningRate 0.0650 Epoch: 3 Global Step: 160510 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:46:22,558-Speed 2631.95 samples/sec Loss 10.8852 LearningRate 0.0650 Epoch: 3 Global Step: 160520 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:46:26,459-Speed 2625.49 samples/sec Loss 10.9815 LearningRate 0.0650 Epoch: 3 Global Step: 160530 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:46:30,360-Speed 2625.48 samples/sec Loss 11.0609 LearningRate 0.0650 Epoch: 3 Global Step: 160540 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:46:34,257-Speed 2628.73 samples/sec Loss 11.0984 LearningRate 0.0650 Epoch: 3 Global Step: 160550 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:46:38,155-Speed 2627.77 samples/sec Loss 11.0105 LearningRate 0.0650 Epoch: 3 Global Step: 160560 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:46:42,064-Speed 2620.42 samples/sec Loss 11.0694 LearningRate 0.0650 Epoch: 3 Global Step: 160570 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:46:45,996-Speed 2604.92 samples/sec Loss 10.9604 LearningRate 0.0650 Epoch: 3 Global Step: 160580 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:46:49,905-Speed 2620.59 samples/sec Loss 11.1437 LearningRate 0.0650 Epoch: 3 Global Step: 160590 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:46:53,824-Speed 2613.12 samples/sec Loss 10.8739 LearningRate 0.0650 Epoch: 3 Global Step: 160600 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:46:57,742-Speed 2614.56 samples/sec Loss 11.0632 LearningRate 0.0650 Epoch: 3 Global Step: 160610 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:47:01,646-Speed 2623.63 samples/sec Loss 10.9777 LearningRate 0.0650 Epoch: 3 Global Step: 160620 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:47:05,559-Speed 2618.27 samples/sec Loss 10.9190 LearningRate 0.0650 Epoch: 3 Global Step: 160630 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:47:09,449-Speed 2633.07 samples/sec Loss 10.8692 LearningRate 0.0650 Epoch: 3 Global Step: 160640 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:47:13,353-Speed 2623.76 samples/sec Loss 10.9865 LearningRate 0.0650 Epoch: 3 Global Step: 160650 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:47:17,260-Speed 2621.36 samples/sec Loss 10.9842 LearningRate 0.0650 Epoch: 3 Global Step: 160660 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:47:21,167-Speed 2620.89 samples/sec Loss 10.9515 LearningRate 0.0650 Epoch: 3 Global Step: 160670 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:47:25,075-Speed 2621.30 samples/sec Loss 10.9004 LearningRate 0.0650 Epoch: 3 Global Step: 160680 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:47:28,988-Speed 2617.78 samples/sec Loss 11.1353 LearningRate 0.0650 Epoch: 3 Global Step: 160690 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:47:32,892-Speed 2623.93 samples/sec Loss 11.0403 LearningRate 0.0650 Epoch: 3 Global Step: 160700 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:47:36,797-Speed 2622.84 samples/sec Loss 11.0690 LearningRate 0.0650 Epoch: 3 Global Step: 160710 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:47:40,703-Speed 2622.46 samples/sec Loss 11.0175 LearningRate 0.0650 Epoch: 3 Global Step: 160720 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:47:44,643-Speed 2599.21 samples/sec Loss 10.8331 LearningRate 0.0650 Epoch: 3 Global Step: 160730 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:47:48,554-Speed 2619.18 samples/sec Loss 11.0453 LearningRate 0.0650 Epoch: 3 Global Step: 160740 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:47:52,464-Speed 2619.48 samples/sec Loss 10.9701 LearningRate 0.0650 Epoch: 3 Global Step: 160750 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:47:56,389-Speed 2609.54 samples/sec Loss 10.9400 LearningRate 0.0650 Epoch: 3 Global Step: 160760 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:48:00,306-Speed 2615.23 samples/sec Loss 11.0223 LearningRate 0.0650 Epoch: 3 Global Step: 160770 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:48:04,236-Speed 2606.23 samples/sec Loss 10.9962 LearningRate 0.0650 Epoch: 3 Global Step: 160780 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:48:08,161-Speed 2609.96 samples/sec Loss 11.0467 LearningRate 0.0650 Epoch: 3 Global Step: 160790 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:48:12,082-Speed 2611.74 samples/sec Loss 11.1246 LearningRate 0.0650 Epoch: 3 Global Step: 160800 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:48:16,002-Speed 2613.84 samples/sec Loss 10.9609 LearningRate 0.0650 Epoch: 3 Global Step: 160810 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:48:19,923-Speed 2612.52 samples/sec Loss 10.8932 LearningRate 0.0650 Epoch: 3 Global Step: 160820 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:48:23,835-Speed 2618.23 samples/sec Loss 10.9143 LearningRate 0.0650 Epoch: 3 Global Step: 160830 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:48:27,755-Speed 2613.03 samples/sec Loss 11.0143 LearningRate 0.0650 Epoch: 3 Global Step: 160840 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:48:31,680-Speed 2609.29 samples/sec Loss 11.0515 LearningRate 0.0650 Epoch: 3 Global Step: 160850 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:48:35,636-Speed 2589.69 samples/sec Loss 10.8679 LearningRate 0.0650 Epoch: 3 Global Step: 160860 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:48:39,536-Speed 2626.17 samples/sec Loss 10.9572 LearningRate 0.0650 Epoch: 3 Global Step: 160870 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:48:43,451-Speed 2616.45 samples/sec Loss 10.9156 LearningRate 0.0650 Epoch: 3 Global Step: 160880 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:48:47,365-Speed 2616.77 samples/sec Loss 11.0035 LearningRate 0.0650 Epoch: 3 Global Step: 160890 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:48:51,277-Speed 2618.33 samples/sec Loss 10.8471 LearningRate 0.0650 Epoch: 3 Global Step: 160900 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:48:55,198-Speed 2612.68 samples/sec Loss 11.0861 LearningRate 0.0650 Epoch: 3 Global Step: 160910 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:48:59,114-Speed 2615.39 samples/sec Loss 11.0114 LearningRate 0.0650 Epoch: 3 Global Step: 160920 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:49:03,032-Speed 2613.98 samples/sec Loss 10.9156 LearningRate 0.0650 Epoch: 3 Global Step: 160930 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:49:06,964-Speed 2604.83 samples/sec Loss 11.0080 LearningRate 0.0650 Epoch: 3 Global Step: 160940 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:49:10,883-Speed 2614.17 samples/sec Loss 11.0503 LearningRate 0.0650 Epoch: 3 Global Step: 160950 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:49:14,802-Speed 2613.52 samples/sec Loss 10.8828 LearningRate 0.0650 Epoch: 3 Global Step: 160960 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:49:18,719-Speed 2614.60 samples/sec Loss 11.0044 LearningRate 0.0650 Epoch: 3 Global Step: 160970 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:49:22,615-Speed 2629.16 samples/sec Loss 10.9602 LearningRate 0.0650 Epoch: 3 Global Step: 160980 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:49:26,558-Speed 2598.04 samples/sec Loss 10.9390 LearningRate 0.0650 Epoch: 3 Global Step: 160990 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:49:30,497-Speed 2600.50 samples/sec Loss 10.9519 LearningRate 0.0650 Epoch: 3 Global Step: 161000 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:49:34,432-Speed 2602.87 samples/sec Loss 10.9483 LearningRate 0.0649 Epoch: 3 Global Step: 161010 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:49:38,349-Speed 2614.45 samples/sec Loss 11.0854 LearningRate 0.0649 Epoch: 3 Global Step: 161020 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:49:42,268-Speed 2613.48 samples/sec Loss 11.0612 LearningRate 0.0649 Epoch: 3 Global Step: 161030 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:49:46,186-Speed 2614.72 samples/sec Loss 10.9933 LearningRate 0.0649 Epoch: 3 Global Step: 161040 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:49:50,104-Speed 2614.49 samples/sec Loss 11.0999 LearningRate 0.0649 Epoch: 3 Global Step: 161050 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:49:54,050-Speed 2595.43 samples/sec Loss 11.0406 LearningRate 0.0649 Epoch: 3 Global Step: 161060 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:49:58,088-Speed 2536.55 samples/sec Loss 10.9733 LearningRate 0.0649 Epoch: 3 Global Step: 161070 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:50:02,010-Speed 2612.03 samples/sec Loss 10.9133 LearningRate 0.0649 Epoch: 3 Global Step: 161080 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:50:05,931-Speed 2611.79 samples/sec Loss 10.9337 LearningRate 0.0649 Epoch: 3 Global Step: 161090 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:50:09,857-Speed 2609.09 samples/sec Loss 10.9579 LearningRate 0.0649 Epoch: 3 Global Step: 161100 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:50:13,773-Speed 2615.62 samples/sec Loss 11.1953 LearningRate 0.0649 Epoch: 3 Global Step: 161110 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:50:17,694-Speed 2612.14 samples/sec Loss 11.6893 LearningRate 0.0649 Epoch: 3 Global Step: 161120 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:50:21,610-Speed 2615.77 samples/sec Loss 11.4058 LearningRate 0.0649 Epoch: 3 Global Step: 161130 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:50:25,523-Speed 2617.56 samples/sec Loss 11.0811 LearningRate 0.0649 Epoch: 3 Global Step: 161140 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:50:29,418-Speed 2629.88 samples/sec Loss 11.1700 LearningRate 0.0649 Epoch: 3 Global Step: 161150 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 13:50:33,323-Speed 2622.98 samples/sec Loss 11.1597 LearningRate 0.0649 Epoch: 3 Global Step: 161160 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 13:50:37,266-Speed 2598.07 samples/sec Loss 11.0234 LearningRate 0.0649 Epoch: 3 Global Step: 161170 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 13:50:41,210-Speed 2596.23 samples/sec Loss 11.0049 LearningRate 0.0649 Epoch: 3 Global Step: 161180 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 13:50:45,125-Speed 2617.04 samples/sec Loss 11.0671 LearningRate 0.0649 Epoch: 3 Global Step: 161190 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 13:50:49,037-Speed 2617.83 samples/sec Loss 10.9759 LearningRate 0.0649 Epoch: 3 Global Step: 161200 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 13:50:52,946-Speed 2620.89 samples/sec Loss 10.9412 LearningRate 0.0649 Epoch: 3 Global Step: 161210 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 13:50:56,859-Speed 2617.53 samples/sec Loss 10.9200 LearningRate 0.0649 Epoch: 3 Global Step: 161220 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 13:51:00,773-Speed 2617.35 samples/sec Loss 11.0304 LearningRate 0.0649 Epoch: 3 Global Step: 161230 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 13:51:04,683-Speed 2618.94 samples/sec Loss 10.9520 LearningRate 0.0649 Epoch: 3 Global Step: 161240 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 13:51:08,613-Speed 2606.67 samples/sec Loss 11.0119 LearningRate 0.0649 Epoch: 3 Global Step: 161250 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:51:12,550-Speed 2601.49 samples/sec Loss 11.1321 LearningRate 0.0649 Epoch: 3 Global Step: 161260 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:51:16,463-Speed 2617.64 samples/sec Loss 11.0399 LearningRate 0.0649 Epoch: 3 Global Step: 161270 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:51:20,434-Speed 2579.05 samples/sec Loss 11.0540 LearningRate 0.0649 Epoch: 3 Global Step: 161280 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:51:24,351-Speed 2614.81 samples/sec Loss 11.1016 LearningRate 0.0649 Epoch: 3 Global Step: 161290 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:51:28,258-Speed 2621.99 samples/sec Loss 11.1115 LearningRate 0.0649 Epoch: 3 Global Step: 161300 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:51:32,171-Speed 2618.20 samples/sec Loss 10.7734 LearningRate 0.0649 Epoch: 3 Global Step: 161310 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:51:36,078-Speed 2620.79 samples/sec Loss 11.0235 LearningRate 0.0649 Epoch: 3 Global Step: 161320 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:51:39,990-Speed 2618.19 samples/sec Loss 10.9893 LearningRate 0.0649 Epoch: 3 Global Step: 161330 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:51:43,902-Speed 2618.50 samples/sec Loss 10.9967 LearningRate 0.0649 Epoch: 3 Global Step: 161340 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 13:51:47,825-Speed 2610.73 samples/sec Loss 10.9572 LearningRate 0.0649 Epoch: 3 Global Step: 161350 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:51:51,764-Speed 2600.48 samples/sec Loss 11.0865 LearningRate 0.0649 Epoch: 3 Global Step: 161360 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:51:55,673-Speed 2620.64 samples/sec Loss 10.9735 LearningRate 0.0649 Epoch: 3 Global Step: 161370 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:51:59,576-Speed 2624.00 samples/sec Loss 10.9865 LearningRate 0.0649 Epoch: 3 Global Step: 161380 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:52:03,487-Speed 2619.18 samples/sec Loss 10.9713 LearningRate 0.0649 Epoch: 3 Global Step: 161390 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:52:07,391-Speed 2623.26 samples/sec Loss 11.1946 LearningRate 0.0649 Epoch: 3 Global Step: 161400 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:52:11,297-Speed 2621.95 samples/sec Loss 10.9062 LearningRate 0.0649 Epoch: 3 Global Step: 161410 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:52:15,213-Speed 2616.07 samples/sec Loss 10.8883 LearningRate 0.0649 Epoch: 3 Global Step: 161420 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:52:19,116-Speed 2623.73 samples/sec Loss 11.0296 LearningRate 0.0649 Epoch: 3 Global Step: 161430 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:52:23,024-Speed 2621.30 samples/sec Loss 10.8705 LearningRate 0.0649 Epoch: 3 Global Step: 161440 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:52:27,039-Speed 2550.72 samples/sec Loss 10.9316 LearningRate 0.0649 Epoch: 3 Global Step: 161450 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:52:30,959-Speed 2613.05 samples/sec Loss 10.8400 LearningRate 0.0649 Epoch: 3 Global Step: 161460 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:52:34,863-Speed 2623.83 samples/sec Loss 10.8783 LearningRate 0.0649 Epoch: 3 Global Step: 161470 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:52:38,770-Speed 2621.17 samples/sec Loss 10.9094 LearningRate 0.0649 Epoch: 3 Global Step: 161480 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:52:42,680-Speed 2619.24 samples/sec Loss 10.8981 LearningRate 0.0649 Epoch: 3 Global Step: 161490 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:52:46,592-Speed 2617.97 samples/sec Loss 11.2373 LearningRate 0.0649 Epoch: 3 Global Step: 161500 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:52:50,507-Speed 2616.94 samples/sec Loss 11.0604 LearningRate 0.0649 Epoch: 3 Global Step: 161510 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:52:54,413-Speed 2621.79 samples/sec Loss 10.9507 LearningRate 0.0649 Epoch: 3 Global Step: 161520 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:52:58,347-Speed 2603.70 samples/sec Loss 11.0306 LearningRate 0.0648 Epoch: 3 Global Step: 161530 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:53:02,258-Speed 2619.65 samples/sec Loss 11.0565 LearningRate 0.0648 Epoch: 3 Global Step: 161540 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:53:06,172-Speed 2616.38 samples/sec Loss 10.9149 LearningRate 0.0648 Epoch: 3 Global Step: 161550 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:53:10,071-Speed 2627.08 samples/sec Loss 11.0272 LearningRate 0.0648 Epoch: 3 Global Step: 161560 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:53:13,984-Speed 2617.92 samples/sec Loss 10.9892 LearningRate 0.0648 Epoch: 3 Global Step: 161570 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:53:17,897-Speed 2617.31 samples/sec Loss 11.0065 LearningRate 0.0648 Epoch: 3 Global Step: 161580 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:53:21,813-Speed 2615.94 samples/sec Loss 11.0052 LearningRate 0.0648 Epoch: 3 Global Step: 161590 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:53:25,748-Speed 2602.70 samples/sec Loss 11.0827 LearningRate 0.0648 Epoch: 3 Global Step: 161600 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:53:29,660-Speed 2618.96 samples/sec Loss 11.1151 LearningRate 0.0648 Epoch: 3 Global Step: 161610 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:53:33,581-Speed 2611.70 samples/sec Loss 10.9419 LearningRate 0.0648 Epoch: 3 Global Step: 161620 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:53:37,489-Speed 2620.73 samples/sec Loss 10.8974 LearningRate 0.0648 Epoch: 3 Global Step: 161630 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:53:41,423-Speed 2603.61 samples/sec Loss 10.9977 LearningRate 0.0648 Epoch: 3 Global Step: 161640 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:53:45,337-Speed 2616.98 samples/sec Loss 11.0799 LearningRate 0.0648 Epoch: 3 Global Step: 161650 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:53:49,246-Speed 2620.36 samples/sec Loss 10.8862 LearningRate 0.0648 Epoch: 3 Global Step: 161660 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:53:53,142-Speed 2628.94 samples/sec Loss 10.9816 LearningRate 0.0648 Epoch: 3 Global Step: 161670 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:53:57,039-Speed 2628.33 samples/sec Loss 10.9106 LearningRate 0.0648 Epoch: 3 Global Step: 161680 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:54:00,946-Speed 2621.96 samples/sec Loss 10.8934 LearningRate 0.0648 Epoch: 3 Global Step: 161690 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:54:04,856-Speed 2619.63 samples/sec Loss 10.9718 LearningRate 0.0648 Epoch: 3 Global Step: 161700 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:54:08,759-Speed 2624.37 samples/sec Loss 10.9574 LearningRate 0.0648 Epoch: 3 Global Step: 161710 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:54:12,670-Speed 2618.74 samples/sec Loss 11.0266 LearningRate 0.0648 Epoch: 3 Global Step: 161720 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:54:16,570-Speed 2626.25 samples/sec Loss 10.8213 LearningRate 0.0648 Epoch: 3 Global Step: 161730 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:54:20,471-Speed 2625.88 samples/sec Loss 10.8257 LearningRate 0.0648 Epoch: 3 Global Step: 161740 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:54:24,374-Speed 2624.06 samples/sec Loss 10.8371 LearningRate 0.0648 Epoch: 3 Global Step: 161750 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:54:28,279-Speed 2623.40 samples/sec Loss 11.0967 LearningRate 0.0648 Epoch: 3 Global Step: 161760 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:54:32,187-Speed 2620.77 samples/sec Loss 10.8865 LearningRate 0.0648 Epoch: 3 Global Step: 161770 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:54:36,148-Speed 2586.29 samples/sec Loss 10.9402 LearningRate 0.0648 Epoch: 3 Global Step: 161780 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:54:40,061-Speed 2618.19 samples/sec Loss 10.9140 LearningRate 0.0648 Epoch: 3 Global Step: 161790 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:54:43,983-Speed 2611.42 samples/sec Loss 10.9727 LearningRate 0.0648 Epoch: 3 Global Step: 161800 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:54:47,884-Speed 2625.94 samples/sec Loss 11.0596 LearningRate 0.0648 Epoch: 3 Global Step: 161810 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:54:51,790-Speed 2622.11 samples/sec Loss 10.8905 LearningRate 0.0648 Epoch: 3 Global Step: 161820 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:54:55,716-Speed 2609.24 samples/sec Loss 10.8789 LearningRate 0.0648 Epoch: 3 Global Step: 161830 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:54:59,635-Speed 2612.96 samples/sec Loss 11.0299 LearningRate 0.0648 Epoch: 3 Global Step: 161840 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:55:03,559-Speed 2610.35 samples/sec Loss 10.9173 LearningRate 0.0648 Epoch: 3 Global Step: 161850 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:55:07,467-Speed 2620.87 samples/sec Loss 10.9936 LearningRate 0.0648 Epoch: 3 Global Step: 161860 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:55:11,384-Speed 2615.64 samples/sec Loss 11.0288 LearningRate 0.0648 Epoch: 3 Global Step: 161870 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:55:15,286-Speed 2624.50 samples/sec Loss 10.9858 LearningRate 0.0648 Epoch: 3 Global Step: 161880 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:55:19,191-Speed 2622.91 samples/sec Loss 10.9319 LearningRate 0.0648 Epoch: 3 Global Step: 161890 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:55:23,095-Speed 2623.63 samples/sec Loss 10.9143 LearningRate 0.0648 Epoch: 3 Global Step: 161900 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:55:26,998-Speed 2624.54 samples/sec Loss 11.1582 LearningRate 0.0648 Epoch: 3 Global Step: 161910 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:55:30,889-Speed 2632.27 samples/sec Loss 11.0631 LearningRate 0.0648 Epoch: 3 Global Step: 161920 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:55:34,790-Speed 2625.43 samples/sec Loss 10.8930 LearningRate 0.0648 Epoch: 3 Global Step: 161930 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:55:38,694-Speed 2623.32 samples/sec Loss 10.9116 LearningRate 0.0648 Epoch: 3 Global Step: 161940 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:55:42,600-Speed 2622.32 samples/sec Loss 11.0898 LearningRate 0.0648 Epoch: 3 Global Step: 161950 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:55:46,504-Speed 2623.92 samples/sec Loss 11.0097 LearningRate 0.0648 Epoch: 3 Global Step: 161960 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:55:50,423-Speed 2614.00 samples/sec Loss 10.9757 LearningRate 0.0648 Epoch: 3 Global Step: 161970 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:55:54,324-Speed 2625.52 samples/sec Loss 10.9646 LearningRate 0.0648 Epoch: 3 Global Step: 161980 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:55:58,236-Speed 2618.55 samples/sec Loss 10.8754 LearningRate 0.0648 Epoch: 3 Global Step: 161990 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:56:02,145-Speed 2620.19 samples/sec Loss 10.9727 LearningRate 0.0648 Epoch: 3 Global Step: 162000 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:56:06,056-Speed 2619.19 samples/sec Loss 11.0431 LearningRate 0.0648 Epoch: 3 Global Step: 162010 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:56:09,947-Speed 2631.76 samples/sec Loss 10.9211 LearningRate 0.0648 Epoch: 3 Global Step: 162020 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:56:13,847-Speed 2627.10 samples/sec Loss 10.8532 LearningRate 0.0648 Epoch: 3 Global Step: 162030 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:56:17,746-Speed 2626.49 samples/sec Loss 11.0445 LearningRate 0.0647 Epoch: 3 Global Step: 162040 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:56:21,652-Speed 2623.26 samples/sec Loss 10.9038 LearningRate 0.0647 Epoch: 3 Global Step: 162050 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:56:25,565-Speed 2617.17 samples/sec Loss 11.0513 LearningRate 0.0647 Epoch: 3 Global Step: 162060 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:56:29,474-Speed 2620.23 samples/sec Loss 10.9505 LearningRate 0.0647 Epoch: 3 Global Step: 162070 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:56:33,385-Speed 2619.39 samples/sec Loss 11.0889 LearningRate 0.0647 Epoch: 3 Global Step: 162080 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:56:37,288-Speed 2624.05 samples/sec Loss 11.1723 LearningRate 0.0647 Epoch: 3 Global Step: 162090 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:56:41,192-Speed 2623.15 samples/sec Loss 10.8746 LearningRate 0.0647 Epoch: 3 Global Step: 162100 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:56:45,119-Speed 2608.61 samples/sec Loss 10.9221 LearningRate 0.0647 Epoch: 3 Global Step: 162110 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:56:49,024-Speed 2623.16 samples/sec Loss 11.0800 LearningRate 0.0647 Epoch: 3 Global Step: 162120 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:56:52,933-Speed 2620.66 samples/sec Loss 10.9346 LearningRate 0.0647 Epoch: 3 Global Step: 162130 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:56:56,844-Speed 2618.56 samples/sec Loss 11.0235 LearningRate 0.0647 Epoch: 3 Global Step: 162140 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:57:00,749-Speed 2623.17 samples/sec Loss 10.9611 LearningRate 0.0647 Epoch: 3 Global Step: 162150 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:57:04,640-Speed 2632.22 samples/sec Loss 10.9898 LearningRate 0.0647 Epoch: 3 Global Step: 162160 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:57:08,534-Speed 2630.47 samples/sec Loss 10.9014 LearningRate 0.0647 Epoch: 3 Global Step: 162170 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:57:12,429-Speed 2629.53 samples/sec Loss 10.9344 LearningRate 0.0647 Epoch: 3 Global Step: 162180 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:57:16,333-Speed 2624.32 samples/sec Loss 10.9136 LearningRate 0.0647 Epoch: 3 Global Step: 162190 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:57:20,236-Speed 2624.18 samples/sec Loss 10.8794 LearningRate 0.0647 Epoch: 3 Global Step: 162200 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:57:24,153-Speed 2614.81 samples/sec Loss 11.0097 LearningRate 0.0647 Epoch: 3 Global Step: 162210 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:57:28,058-Speed 2623.21 samples/sec Loss 11.0591 LearningRate 0.0647 Epoch: 3 Global Step: 162220 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:57:31,962-Speed 2624.02 samples/sec Loss 11.0903 LearningRate 0.0647 Epoch: 3 Global Step: 162230 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:57:35,860-Speed 2626.87 samples/sec Loss 10.9612 LearningRate 0.0647 Epoch: 3 Global Step: 162240 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:57:39,758-Speed 2627.41 samples/sec Loss 10.9189 LearningRate 0.0647 Epoch: 3 Global Step: 162250 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:57:43,663-Speed 2623.34 samples/sec Loss 10.9945 LearningRate 0.0647 Epoch: 3 Global Step: 162260 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:57:47,569-Speed 2622.34 samples/sec Loss 10.8974 LearningRate 0.0647 Epoch: 3 Global Step: 162270 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:57:51,488-Speed 2613.56 samples/sec Loss 10.9186 LearningRate 0.0647 Epoch: 3 Global Step: 162280 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:57:55,400-Speed 2618.58 samples/sec Loss 11.1610 LearningRate 0.0647 Epoch: 3 Global Step: 162290 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:57:59,309-Speed 2620.57 samples/sec Loss 10.8216 LearningRate 0.0647 Epoch: 3 Global Step: 162300 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:58:03,221-Speed 2618.36 samples/sec Loss 10.9046 LearningRate 0.0647 Epoch: 3 Global Step: 162310 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:58:07,124-Speed 2624.20 samples/sec Loss 10.9091 LearningRate 0.0647 Epoch: 3 Global Step: 162320 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:58:11,031-Speed 2621.39 samples/sec Loss 11.0537 LearningRate 0.0647 Epoch: 3 Global Step: 162330 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:58:14,941-Speed 2619.92 samples/sec Loss 10.9346 LearningRate 0.0647 Epoch: 3 Global Step: 162340 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:58:18,844-Speed 2624.62 samples/sec Loss 11.0203 LearningRate 0.0647 Epoch: 3 Global Step: 162350 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:58:22,746-Speed 2624.76 samples/sec Loss 10.8797 LearningRate 0.0647 Epoch: 3 Global Step: 162360 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:58:26,703-Speed 2588.48 samples/sec Loss 10.9468 LearningRate 0.0647 Epoch: 3 Global Step: 162370 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:58:30,610-Speed 2621.94 samples/sec Loss 10.9031 LearningRate 0.0647 Epoch: 3 Global Step: 162380 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:58:34,521-Speed 2619.07 samples/sec Loss 10.9080 LearningRate 0.0647 Epoch: 3 Global Step: 162390 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:58:38,434-Speed 2617.03 samples/sec Loss 10.9258 LearningRate 0.0647 Epoch: 3 Global Step: 162400 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 13:58:42,328-Speed 2630.49 samples/sec Loss 10.9565 LearningRate 0.0647 Epoch: 3 Global Step: 162410 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:58:46,237-Speed 2620.45 samples/sec Loss 10.8477 LearningRate 0.0647 Epoch: 3 Global Step: 162420 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:58:50,301-Speed 2520.62 samples/sec Loss 10.8874 LearningRate 0.0647 Epoch: 3 Global Step: 162430 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:58:54,339-Speed 2536.56 samples/sec Loss 11.1556 LearningRate 0.0647 Epoch: 3 Global Step: 162440 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:58:58,241-Speed 2624.89 samples/sec Loss 11.0139 LearningRate 0.0647 Epoch: 3 Global Step: 162450 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:59:02,146-Speed 2622.93 samples/sec Loss 10.9189 LearningRate 0.0647 Epoch: 3 Global Step: 162460 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:59:06,048-Speed 2625.21 samples/sec Loss 11.0007 LearningRate 0.0647 Epoch: 3 Global Step: 162470 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:59:09,952-Speed 2623.50 samples/sec Loss 10.8582 LearningRate 0.0647 Epoch: 3 Global Step: 162480 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:59:13,858-Speed 2622.57 samples/sec Loss 10.9539 LearningRate 0.0647 Epoch: 3 Global Step: 162490 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:59:17,766-Speed 2620.60 samples/sec Loss 10.8772 LearningRate 0.0647 Epoch: 3 Global Step: 162500 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:59:21,676-Speed 2620.01 samples/sec Loss 11.0532 LearningRate 0.0647 Epoch: 3 Global Step: 162510 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:59:25,576-Speed 2626.70 samples/sec Loss 11.0255 LearningRate 0.0647 Epoch: 3 Global Step: 162520 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 13:59:29,474-Speed 2627.44 samples/sec Loss 11.0422 LearningRate 0.0647 Epoch: 3 Global Step: 162530 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:59:33,381-Speed 2621.95 samples/sec Loss 10.9264 LearningRate 0.0647 Epoch: 3 Global Step: 162540 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:59:37,289-Speed 2620.51 samples/sec Loss 10.9139 LearningRate 0.0647 Epoch: 3 Global Step: 162550 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:59:41,197-Speed 2620.90 samples/sec Loss 10.8377 LearningRate 0.0646 Epoch: 3 Global Step: 162560 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:59:45,095-Speed 2627.53 samples/sec Loss 10.9642 LearningRate 0.0646 Epoch: 3 Global Step: 162570 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:59:48,994-Speed 2627.39 samples/sec Loss 10.8296 LearningRate 0.0646 Epoch: 3 Global Step: 162580 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:59:52,893-Speed 2626.54 samples/sec Loss 10.9892 LearningRate 0.0646 Epoch: 3 Global Step: 162590 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 13:59:56,812-Speed 2614.50 samples/sec Loss 10.9705 LearningRate 0.0646 Epoch: 3 Global Step: 162600 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:00:00,804-Speed 2565.55 samples/sec Loss 11.0877 LearningRate 0.0646 Epoch: 3 Global Step: 162610 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:00:04,709-Speed 2623.42 samples/sec Loss 11.0123 LearningRate 0.0646 Epoch: 3 Global Step: 162620 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:00:08,616-Speed 2621.21 samples/sec Loss 10.8443 LearningRate 0.0646 Epoch: 3 Global Step: 162630 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:00:12,525-Speed 2619.81 samples/sec Loss 10.8230 LearningRate 0.0646 Epoch: 3 Global Step: 162640 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:00:16,442-Speed 2614.72 samples/sec Loss 10.8962 LearningRate 0.0646 Epoch: 3 Global Step: 162650 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:00:20,370-Speed 2608.04 samples/sec Loss 10.8180 LearningRate 0.0646 Epoch: 3 Global Step: 162660 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:00:24,313-Speed 2598.26 samples/sec Loss 11.1184 LearningRate 0.0646 Epoch: 3 Global Step: 162670 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:00:28,233-Speed 2612.59 samples/sec Loss 10.8961 LearningRate 0.0646 Epoch: 3 Global Step: 162680 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:00:32,129-Speed 2629.00 samples/sec Loss 10.8859 LearningRate 0.0646 Epoch: 3 Global Step: 162690 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:00:36,040-Speed 2619.77 samples/sec Loss 10.8796 LearningRate 0.0646 Epoch: 3 Global Step: 162700 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:00:39,957-Speed 2614.62 samples/sec Loss 10.9962 LearningRate 0.0646 Epoch: 3 Global Step: 162710 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:00:43,855-Speed 2627.53 samples/sec Loss 11.0315 LearningRate 0.0646 Epoch: 3 Global Step: 162720 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:00:47,756-Speed 2625.67 samples/sec Loss 11.0609 LearningRate 0.0646 Epoch: 3 Global Step: 162730 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:00:51,657-Speed 2625.81 samples/sec Loss 11.0567 LearningRate 0.0646 Epoch: 3 Global Step: 162740 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:00:55,557-Speed 2626.49 samples/sec Loss 10.9373 LearningRate 0.0646 Epoch: 3 Global Step: 162750 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:00:59,458-Speed 2625.58 samples/sec Loss 10.8119 LearningRate 0.0646 Epoch: 3 Global Step: 162760 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:01:03,360-Speed 2624.51 samples/sec Loss 10.8598 LearningRate 0.0646 Epoch: 3 Global Step: 162770 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:01:07,247-Speed 2635.22 samples/sec Loss 10.9757 LearningRate 0.0646 Epoch: 3 Global Step: 162780 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:01:11,140-Speed 2630.89 samples/sec Loss 11.2021 LearningRate 0.0646 Epoch: 3 Global Step: 162790 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:01:15,070-Speed 2605.78 samples/sec Loss 10.9515 LearningRate 0.0646 Epoch: 3 Global Step: 162800 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:01:18,977-Speed 2621.68 samples/sec Loss 10.9650 LearningRate 0.0646 Epoch: 3 Global Step: 162810 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:01:22,992-Speed 2551.35 samples/sec Loss 10.9701 LearningRate 0.0646 Epoch: 3 Global Step: 162820 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:01:26,963-Speed 2579.55 samples/sec Loss 10.8594 LearningRate 0.0646 Epoch: 3 Global Step: 162830 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:01:30,878-Speed 2616.55 samples/sec Loss 10.9784 LearningRate 0.0646 Epoch: 3 Global Step: 162840 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:01:34,776-Speed 2627.90 samples/sec Loss 11.0815 LearningRate 0.0646 Epoch: 3 Global Step: 162850 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:01:38,674-Speed 2627.49 samples/sec Loss 10.9478 LearningRate 0.0646 Epoch: 3 Global Step: 162860 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:01:42,591-Speed 2614.58 samples/sec Loss 10.8980 LearningRate 0.0646 Epoch: 3 Global Step: 162870 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:01:46,486-Speed 2629.80 samples/sec Loss 10.8971 LearningRate 0.0646 Epoch: 3 Global Step: 162880 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:01:50,384-Speed 2628.08 samples/sec Loss 10.9926 LearningRate 0.0646 Epoch: 3 Global Step: 162890 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:01:54,283-Speed 2627.43 samples/sec Loss 11.0128 LearningRate 0.0646 Epoch: 3 Global Step: 162900 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:01:58,183-Speed 2625.86 samples/sec Loss 10.9129 LearningRate 0.0646 Epoch: 3 Global Step: 162910 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:02:02,080-Speed 2628.96 samples/sec Loss 11.0612 LearningRate 0.0646 Epoch: 3 Global Step: 162920 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:02:05,972-Speed 2631.25 samples/sec Loss 10.9481 LearningRate 0.0646 Epoch: 3 Global Step: 162930 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:02:09,868-Speed 2629.30 samples/sec Loss 10.8985 LearningRate 0.0646 Epoch: 3 Global Step: 162940 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:02:13,770-Speed 2624.60 samples/sec Loss 11.0660 LearningRate 0.0646 Epoch: 3 Global Step: 162950 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:02:17,720-Speed 2592.72 samples/sec Loss 10.7478 LearningRate 0.0646 Epoch: 3 Global Step: 162960 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:02:21,619-Speed 2627.14 samples/sec Loss 10.8909 LearningRate 0.0646 Epoch: 3 Global Step: 162970 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:02:25,534-Speed 2616.49 samples/sec Loss 10.8388 LearningRate 0.0646 Epoch: 3 Global Step: 162980 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:02:29,440-Speed 2622.66 samples/sec Loss 10.9614 LearningRate 0.0646 Epoch: 3 Global Step: 162990 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:02:33,342-Speed 2624.71 samples/sec Loss 10.9150 LearningRate 0.0646 Epoch: 3 Global Step: 163000 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:02:37,219-Speed 2641.18 samples/sec Loss 10.8422 LearningRate 0.0646 Epoch: 3 Global Step: 163010 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:02:41,123-Speed 2623.67 samples/sec Loss 11.0231 LearningRate 0.0646 Epoch: 3 Global Step: 163020 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:02:45,029-Speed 2623.14 samples/sec Loss 11.1006 LearningRate 0.0646 Epoch: 3 Global Step: 163030 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:02:48,933-Speed 2623.41 samples/sec Loss 10.9859 LearningRate 0.0646 Epoch: 3 Global Step: 163040 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:02:52,835-Speed 2624.87 samples/sec Loss 11.0140 LearningRate 0.0646 Epoch: 3 Global Step: 163050 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:02:56,742-Speed 2623.23 samples/sec Loss 10.8129 LearningRate 0.0646 Epoch: 3 Global Step: 163060 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:03:00,640-Speed 2627.78 samples/sec Loss 10.8823 LearningRate 0.0646 Epoch: 3 Global Step: 163070 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:03:04,540-Speed 2625.57 samples/sec Loss 10.8996 LearningRate 0.0645 Epoch: 3 Global Step: 163080 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:03:08,436-Speed 2629.00 samples/sec Loss 10.9029 LearningRate 0.0645 Epoch: 3 Global Step: 163090 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:03:12,340-Speed 2623.60 samples/sec Loss 10.9264 LearningRate 0.0645 Epoch: 3 Global Step: 163100 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:03:16,242-Speed 2625.42 samples/sec Loss 10.9929 LearningRate 0.0645 Epoch: 3 Global Step: 163110 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:03:20,150-Speed 2621.07 samples/sec Loss 10.9187 LearningRate 0.0645 Epoch: 3 Global Step: 163120 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:03:24,052-Speed 2625.00 samples/sec Loss 11.0033 LearningRate 0.0645 Epoch: 3 Global Step: 163130 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:03:27,958-Speed 2622.15 samples/sec Loss 11.0705 LearningRate 0.0645 Epoch: 3 Global Step: 163140 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:03:31,861-Speed 2624.71 samples/sec Loss 10.9662 LearningRate 0.0645 Epoch: 3 Global Step: 163150 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:03:35,810-Speed 2593.55 samples/sec Loss 11.1604 LearningRate 0.0645 Epoch: 3 Global Step: 163160 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:03:39,768-Speed 2587.75 samples/sec Loss 10.8614 LearningRate 0.0645 Epoch: 3 Global Step: 163170 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:03:43,667-Speed 2626.96 samples/sec Loss 10.9434 LearningRate 0.0645 Epoch: 3 Global Step: 163180 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:03:47,596-Speed 2607.28 samples/sec Loss 11.1093 LearningRate 0.0645 Epoch: 3 Global Step: 163190 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:03:51,509-Speed 2617.47 samples/sec Loss 11.0872 LearningRate 0.0645 Epoch: 3 Global Step: 163200 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:03:55,407-Speed 2627.54 samples/sec Loss 11.0019 LearningRate 0.0645 Epoch: 3 Global Step: 163210 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:03:59,315-Speed 2621.70 samples/sec Loss 11.0308 LearningRate 0.0645 Epoch: 3 Global Step: 163220 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:04:03,216-Speed 2625.22 samples/sec Loss 11.0455 LearningRate 0.0645 Epoch: 3 Global Step: 163230 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:04:07,118-Speed 2624.85 samples/sec Loss 10.9602 LearningRate 0.0645 Epoch: 3 Global Step: 163240 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:04:11,017-Speed 2626.61 samples/sec Loss 10.9231 LearningRate 0.0645 Epoch: 3 Global Step: 163250 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:04:14,952-Speed 2603.37 samples/sec Loss 10.7435 LearningRate 0.0645 Epoch: 3 Global Step: 163260 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:04:18,853-Speed 2625.88 samples/sec Loss 10.9389 LearningRate 0.0645 Epoch: 3 Global Step: 163270 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:04:22,753-Speed 2626.18 samples/sec Loss 11.0810 LearningRate 0.0645 Epoch: 3 Global Step: 163280 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:04:26,652-Speed 2627.34 samples/sec Loss 10.8992 LearningRate 0.0645 Epoch: 3 Global Step: 163290 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:04:30,541-Speed 2633.48 samples/sec Loss 11.0333 LearningRate 0.0645 Epoch: 3 Global Step: 163300 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:04:34,442-Speed 2625.90 samples/sec Loss 10.9838 LearningRate 0.0645 Epoch: 3 Global Step: 163310 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:04:38,337-Speed 2629.37 samples/sec Loss 10.8345 LearningRate 0.0645 Epoch: 3 Global Step: 163320 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:04:42,236-Speed 2626.91 samples/sec Loss 10.8858 LearningRate 0.0645 Epoch: 3 Global Step: 163330 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:04:46,139-Speed 2624.23 samples/sec Loss 10.9768 LearningRate 0.0645 Epoch: 3 Global Step: 163340 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:04:50,033-Speed 2630.58 samples/sec Loss 10.8694 LearningRate 0.0645 Epoch: 3 Global Step: 163350 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:04:53,932-Speed 2627.04 samples/sec Loss 11.0759 LearningRate 0.0645 Epoch: 3 Global Step: 163360 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:04:57,849-Speed 2614.50 samples/sec Loss 11.0369 LearningRate 0.0645 Epoch: 3 Global Step: 163370 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:05:01,744-Speed 2629.85 samples/sec Loss 10.9376 LearningRate 0.0645 Epoch: 3 Global Step: 163380 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:05:05,654-Speed 2619.35 samples/sec Loss 10.8490 LearningRate 0.0645 Epoch: 3 Global Step: 163390 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:05:09,566-Speed 2618.20 samples/sec Loss 10.8674 LearningRate 0.0645 Epoch: 3 Global Step: 163400 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:05:13,467-Speed 2625.86 samples/sec Loss 10.8802 LearningRate 0.0645 Epoch: 3 Global Step: 163410 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:05:17,364-Speed 2628.30 samples/sec Loss 10.8784 LearningRate 0.0645 Epoch: 3 Global Step: 163420 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:05:21,263-Speed 2627.03 samples/sec Loss 10.9053 LearningRate 0.0645 Epoch: 3 Global Step: 163430 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:05:25,163-Speed 2626.44 samples/sec Loss 10.9024 LearningRate 0.0645 Epoch: 3 Global Step: 163440 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:05:29,071-Speed 2620.67 samples/sec Loss 10.9665 LearningRate 0.0645 Epoch: 3 Global Step: 163450 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:05:32,968-Speed 2628.10 samples/sec Loss 10.9914 LearningRate 0.0645 Epoch: 3 Global Step: 163460 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:05:36,866-Speed 2627.96 samples/sec Loss 10.9675 LearningRate 0.0645 Epoch: 3 Global Step: 163470 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:05:40,763-Speed 2627.48 samples/sec Loss 11.0897 LearningRate 0.0645 Epoch: 3 Global Step: 163480 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:05:44,660-Speed 2629.43 samples/sec Loss 10.8488 LearningRate 0.0645 Epoch: 3 Global Step: 163490 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:05:48,574-Speed 2616.66 samples/sec Loss 10.9809 LearningRate 0.0645 Epoch: 3 Global Step: 163500 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:05:52,462-Speed 2634.55 samples/sec Loss 11.0659 LearningRate 0.0645 Epoch: 3 Global Step: 163510 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:05:56,339-Speed 2641.61 samples/sec Loss 10.7656 LearningRate 0.0645 Epoch: 3 Global Step: 163520 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:06:00,233-Speed 2630.68 samples/sec Loss 10.9432 LearningRate 0.0645 Epoch: 3 Global Step: 163530 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:06:04,138-Speed 2622.40 samples/sec Loss 10.8004 LearningRate 0.0645 Epoch: 3 Global Step: 163540 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:06:08,041-Speed 2624.06 samples/sec Loss 10.8499 LearningRate 0.0645 Epoch: 3 Global Step: 163550 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:06:11,937-Speed 2629.27 samples/sec Loss 10.9630 LearningRate 0.0645 Epoch: 3 Global Step: 163560 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:06:15,832-Speed 2630.07 samples/sec Loss 10.8683 LearningRate 0.0645 Epoch: 3 Global Step: 163570 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:06:19,733-Speed 2625.78 samples/sec Loss 10.9317 LearningRate 0.0645 Epoch: 3 Global Step: 163580 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:06:23,626-Speed 2630.63 samples/sec Loss 10.8380 LearningRate 0.0644 Epoch: 3 Global Step: 163590 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:06:27,522-Speed 2629.28 samples/sec Loss 10.8084 LearningRate 0.0644 Epoch: 3 Global Step: 163600 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:06:31,453-Speed 2606.15 samples/sec Loss 10.9542 LearningRate 0.0644 Epoch: 3 Global Step: 163610 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:06:35,363-Speed 2619.54 samples/sec Loss 10.8166 LearningRate 0.0644 Epoch: 3 Global Step: 163620 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:06:39,266-Speed 2624.32 samples/sec Loss 10.8538 LearningRate 0.0644 Epoch: 3 Global Step: 163630 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:06:43,218-Speed 2592.15 samples/sec Loss 10.9999 LearningRate 0.0644 Epoch: 3 Global Step: 163640 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:06:47,164-Speed 2595.51 samples/sec Loss 10.8345 LearningRate 0.0644 Epoch: 3 Global Step: 163650 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:06:51,069-Speed 2623.16 samples/sec Loss 11.0921 LearningRate 0.0644 Epoch: 3 Global Step: 163660 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:06:54,984-Speed 2616.00 samples/sec Loss 11.0768 LearningRate 0.0644 Epoch: 3 Global Step: 163670 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:06:58,944-Speed 2586.25 samples/sec Loss 10.8516 LearningRate 0.0644 Epoch: 3 Global Step: 163680 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:07:02,891-Speed 2595.11 samples/sec Loss 10.8898 LearningRate 0.0644 Epoch: 3 Global Step: 163690 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:07:06,804-Speed 2617.81 samples/sec Loss 10.9466 LearningRate 0.0644 Epoch: 3 Global Step: 163700 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:07:10,752-Speed 2594.15 samples/sec Loss 10.8563 LearningRate 0.0644 Epoch: 3 Global Step: 163710 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:07:14,655-Speed 2624.53 samples/sec Loss 10.9083 LearningRate 0.0644 Epoch: 3 Global Step: 163720 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:07:18,557-Speed 2624.70 samples/sec Loss 10.9884 LearningRate 0.0644 Epoch: 3 Global Step: 163730 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:07:22,475-Speed 2614.34 samples/sec Loss 10.7579 LearningRate 0.0644 Epoch: 3 Global Step: 163740 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:07:26,379-Speed 2623.95 samples/sec Loss 11.0168 LearningRate 0.0644 Epoch: 3 Global Step: 163750 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:07:30,295-Speed 2615.59 samples/sec Loss 11.0018 LearningRate 0.0644 Epoch: 3 Global Step: 163760 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:07:34,206-Speed 2619.26 samples/sec Loss 11.0297 LearningRate 0.0644 Epoch: 3 Global Step: 163770 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:07:38,110-Speed 2623.08 samples/sec Loss 10.9619 LearningRate 0.0644 Epoch: 3 Global Step: 163780 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:07:42,021-Speed 2618.60 samples/sec Loss 11.0149 LearningRate 0.0644 Epoch: 3 Global Step: 163790 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:07:45,875-Speed 2658.00 samples/sec Loss 11.0789 LearningRate 0.0644 Epoch: 3 Global Step: 163800 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:07:49,780-Speed 2622.80 samples/sec Loss 10.9086 LearningRate 0.0644 Epoch: 3 Global Step: 163810 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:07:53,677-Speed 2628.91 samples/sec Loss 11.0581 LearningRate 0.0644 Epoch: 3 Global Step: 163820 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:07:57,579-Speed 2624.56 samples/sec Loss 11.0664 LearningRate 0.0644 Epoch: 3 Global Step: 163830 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:08:01,512-Speed 2604.93 samples/sec Loss 10.9626 LearningRate 0.0644 Epoch: 3 Global Step: 163840 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:08:05,429-Speed 2614.79 samples/sec Loss 11.0219 LearningRate 0.0644 Epoch: 3 Global Step: 163850 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:08:09,360-Speed 2605.34 samples/sec Loss 10.9675 LearningRate 0.0644 Epoch: 3 Global Step: 163860 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:08:13,269-Speed 2620.48 samples/sec Loss 11.0453 LearningRate 0.0644 Epoch: 3 Global Step: 163870 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:08:17,176-Speed 2621.38 samples/sec Loss 10.9668 LearningRate 0.0644 Epoch: 3 Global Step: 163880 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:08:21,091-Speed 2616.94 samples/sec Loss 10.8767 LearningRate 0.0644 Epoch: 3 Global Step: 163890 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:08:24,988-Speed 2628.44 samples/sec Loss 11.0245 LearningRate 0.0644 Epoch: 3 Global Step: 163900 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 14:08:28,885-Speed 2628.13 samples/sec Loss 11.0223 LearningRate 0.0644 Epoch: 3 Global Step: 163910 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 14:08:32,785-Speed 2626.41 samples/sec Loss 10.9511 LearningRate 0.0644 Epoch: 3 Global Step: 163920 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 14:08:36,690-Speed 2622.42 samples/sec Loss 11.0186 LearningRate 0.0644 Epoch: 3 Global Step: 163930 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 14:08:40,581-Speed 2632.27 samples/sec Loss 10.9575 LearningRate 0.0644 Epoch: 3 Global Step: 163940 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 14:08:44,479-Speed 2628.01 samples/sec Loss 11.1620 LearningRate 0.0644 Epoch: 3 Global Step: 163950 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 14:08:48,389-Speed 2619.22 samples/sec Loss 10.9932 LearningRate 0.0644 Epoch: 3 Global Step: 163960 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 14:08:52,281-Speed 2632.23 samples/sec Loss 10.8964 LearningRate 0.0644 Epoch: 3 Global Step: 163970 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 14:08:56,172-Speed 2632.24 samples/sec Loss 10.9794 LearningRate 0.0644 Epoch: 3 Global Step: 163980 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 14:09:00,076-Speed 2624.13 samples/sec Loss 10.8667 LearningRate 0.0644 Epoch: 3 Global Step: 163990 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 14:09:03,987-Speed 2618.47 samples/sec Loss 10.8452 LearningRate 0.0644 Epoch: 3 Global Step: 164000 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:09:07,886-Speed 2627.43 samples/sec Loss 10.9458 LearningRate 0.0644 Epoch: 3 Global Step: 164010 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:09:11,783-Speed 2627.86 samples/sec Loss 10.8459 LearningRate 0.0644 Epoch: 3 Global Step: 164020 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:09:15,685-Speed 2625.43 samples/sec Loss 10.9420 LearningRate 0.0644 Epoch: 3 Global Step: 164030 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:09:19,579-Speed 2630.19 samples/sec Loss 10.9411 LearningRate 0.0644 Epoch: 3 Global Step: 164040 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:09:23,471-Speed 2632.07 samples/sec Loss 10.8511 LearningRate 0.0644 Epoch: 3 Global Step: 164050 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:09:27,367-Speed 2628.63 samples/sec Loss 10.8697 LearningRate 0.0644 Epoch: 3 Global Step: 164060 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:09:31,261-Speed 2630.42 samples/sec Loss 11.0348 LearningRate 0.0644 Epoch: 3 Global Step: 164070 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:09:35,153-Speed 2631.75 samples/sec Loss 10.9775 LearningRate 0.0644 Epoch: 3 Global Step: 164080 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:09:39,049-Speed 2628.88 samples/sec Loss 11.0851 LearningRate 0.0644 Epoch: 3 Global Step: 164090 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:09:42,941-Speed 2631.43 samples/sec Loss 11.0488 LearningRate 0.0644 Epoch: 3 Global Step: 164100 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:09:46,882-Speed 2599.05 samples/sec Loss 10.8680 LearningRate 0.0643 Epoch: 3 Global Step: 164110 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:09:50,789-Speed 2621.97 samples/sec Loss 11.0148 LearningRate 0.0643 Epoch: 3 Global Step: 164120 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:09:54,687-Speed 2629.23 samples/sec Loss 10.9061 LearningRate 0.0643 Epoch: 3 Global Step: 164130 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:09:58,589-Speed 2625.12 samples/sec Loss 10.9993 LearningRate 0.0643 Epoch: 3 Global Step: 164140 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:10:02,487-Speed 2628.10 samples/sec Loss 10.8930 LearningRate 0.0643 Epoch: 3 Global Step: 164150 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:10:06,389-Speed 2624.88 samples/sec Loss 10.8806 LearningRate 0.0643 Epoch: 3 Global Step: 164160 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:10:10,286-Speed 2628.59 samples/sec Loss 11.0413 LearningRate 0.0643 Epoch: 3 Global Step: 164170 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:10:14,183-Speed 2628.13 samples/sec Loss 10.7614 LearningRate 0.0643 Epoch: 3 Global Step: 164180 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:10:18,083-Speed 2626.11 samples/sec Loss 11.0682 LearningRate 0.0643 Epoch: 3 Global Step: 164190 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:10:21,989-Speed 2622.83 samples/sec Loss 11.0231 LearningRate 0.0643 Epoch: 3 Global Step: 164200 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:10:25,884-Speed 2629.22 samples/sec Loss 11.0412 LearningRate 0.0643 Epoch: 3 Global Step: 164210 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:10:29,783-Speed 2627.30 samples/sec Loss 10.9128 LearningRate 0.0643 Epoch: 3 Global Step: 164220 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:10:33,675-Speed 2631.93 samples/sec Loss 10.8374 LearningRate 0.0643 Epoch: 3 Global Step: 164230 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:10:37,611-Speed 2602.11 samples/sec Loss 10.9962 LearningRate 0.0643 Epoch: 3 Global Step: 164240 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:10:41,511-Speed 2626.32 samples/sec Loss 10.7986 LearningRate 0.0643 Epoch: 3 Global Step: 164250 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:10:45,396-Speed 2636.76 samples/sec Loss 11.1009 LearningRate 0.0643 Epoch: 3 Global Step: 164260 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:10:49,300-Speed 2623.25 samples/sec Loss 11.0798 LearningRate 0.0643 Epoch: 3 Global Step: 164270 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:10:53,198-Speed 2628.22 samples/sec Loss 10.9081 LearningRate 0.0643 Epoch: 3 Global Step: 164280 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:10:57,094-Speed 2629.34 samples/sec Loss 10.8270 LearningRate 0.0643 Epoch: 3 Global Step: 164290 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:11:01,005-Speed 2618.78 samples/sec Loss 10.9867 LearningRate 0.0643 Epoch: 3 Global Step: 164300 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:11:04,898-Speed 2631.50 samples/sec Loss 10.9100 LearningRate 0.0643 Epoch: 3 Global Step: 164310 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:11:08,792-Speed 2629.85 samples/sec Loss 10.7536 LearningRate 0.0643 Epoch: 3 Global Step: 164320 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:11:12,686-Speed 2630.29 samples/sec Loss 10.7388 LearningRate 0.0643 Epoch: 3 Global Step: 164330 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:11:16,679-Speed 2564.89 samples/sec Loss 10.8810 LearningRate 0.0643 Epoch: 3 Global Step: 164340 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:11:20,579-Speed 2626.27 samples/sec Loss 11.0337 LearningRate 0.0643 Epoch: 3 Global Step: 164350 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:11:24,475-Speed 2629.09 samples/sec Loss 10.8927 LearningRate 0.0643 Epoch: 3 Global Step: 164360 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:11:28,383-Speed 2621.34 samples/sec Loss 10.8043 LearningRate 0.0643 Epoch: 3 Global Step: 164370 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:11:32,268-Speed 2636.24 samples/sec Loss 10.9068 LearningRate 0.0643 Epoch: 3 Global Step: 164380 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:11:36,228-Speed 2587.02 samples/sec Loss 10.8396 LearningRate 0.0643 Epoch: 3 Global Step: 164390 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:11:40,231-Speed 2558.31 samples/sec Loss 10.9819 LearningRate 0.0643 Epoch: 3 Global Step: 164400 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:11:44,121-Speed 2633.07 samples/sec Loss 10.8879 LearningRate 0.0643 Epoch: 3 Global Step: 164410 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:11:48,016-Speed 2629.36 samples/sec Loss 10.9090 LearningRate 0.0643 Epoch: 3 Global Step: 164420 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:11:51,989-Speed 2579.08 samples/sec Loss 10.9561 LearningRate 0.0643 Epoch: 3 Global Step: 164430 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:11:55,899-Speed 2619.07 samples/sec Loss 10.8323 LearningRate 0.0643 Epoch: 3 Global Step: 164440 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:11:59,911-Speed 2552.90 samples/sec Loss 10.9161 LearningRate 0.0643 Epoch: 3 Global Step: 164450 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:12:03,824-Speed 2617.75 samples/sec Loss 10.9242 LearningRate 0.0643 Epoch: 3 Global Step: 164460 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:12:07,724-Speed 2626.37 samples/sec Loss 10.8250 LearningRate 0.0643 Epoch: 3 Global Step: 164470 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:12:11,621-Speed 2628.44 samples/sec Loss 10.9552 LearningRate 0.0643 Epoch: 3 Global Step: 164480 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:12:15,551-Speed 2606.34 samples/sec Loss 10.9011 LearningRate 0.0643 Epoch: 3 Global Step: 164490 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:12:19,449-Speed 2627.44 samples/sec Loss 10.8493 LearningRate 0.0643 Epoch: 3 Global Step: 164500 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:12:23,344-Speed 2630.00 samples/sec Loss 10.8890 LearningRate 0.0643 Epoch: 3 Global Step: 164510 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:12:27,240-Speed 2628.88 samples/sec Loss 11.0028 LearningRate 0.0643 Epoch: 3 Global Step: 164520 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:12:31,157-Speed 2614.76 samples/sec Loss 10.8711 LearningRate 0.0643 Epoch: 3 Global Step: 164530 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:12:35,057-Speed 2626.74 samples/sec Loss 10.9015 LearningRate 0.0643 Epoch: 3 Global Step: 164540 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:12:38,948-Speed 2631.82 samples/sec Loss 10.7872 LearningRate 0.0643 Epoch: 3 Global Step: 164550 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:12:42,839-Speed 2632.99 samples/sec Loss 11.0008 LearningRate 0.0643 Epoch: 3 Global Step: 164560 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:12:46,731-Speed 2631.52 samples/sec Loss 10.7964 LearningRate 0.0643 Epoch: 3 Global Step: 164570 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:12:50,628-Speed 2628.47 samples/sec Loss 10.9001 LearningRate 0.0643 Epoch: 3 Global Step: 164580 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:12:54,648-Speed 2547.70 samples/sec Loss 10.8688 LearningRate 0.0643 Epoch: 3 Global Step: 164590 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:12:58,546-Speed 2627.60 samples/sec Loss 10.9641 LearningRate 0.0643 Epoch: 3 Global Step: 164600 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:13:02,422-Speed 2642.47 samples/sec Loss 10.9158 LearningRate 0.0643 Epoch: 3 Global Step: 164610 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:13:06,320-Speed 2627.30 samples/sec Loss 10.9138 LearningRate 0.0643 Epoch: 3 Global Step: 164620 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:13:10,216-Speed 2630.04 samples/sec Loss 10.9936 LearningRate 0.0642 Epoch: 3 Global Step: 164630 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:13:14,119-Speed 2623.95 samples/sec Loss 10.8836 LearningRate 0.0642 Epoch: 3 Global Step: 164640 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:13:18,025-Speed 2622.61 samples/sec Loss 11.0052 LearningRate 0.0642 Epoch: 3 Global Step: 164650 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:13:21,919-Speed 2630.48 samples/sec Loss 10.9046 LearningRate 0.0642 Epoch: 3 Global Step: 164660 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:13:25,812-Speed 2630.75 samples/sec Loss 10.9061 LearningRate 0.0642 Epoch: 3 Global Step: 164670 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:13:29,706-Speed 2630.71 samples/sec Loss 10.8712 LearningRate 0.0642 Epoch: 3 Global Step: 164680 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:13:33,603-Speed 2628.27 samples/sec Loss 10.9413 LearningRate 0.0642 Epoch: 3 Global Step: 164690 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:13:37,496-Speed 2630.63 samples/sec Loss 10.8932 LearningRate 0.0642 Epoch: 3 Global Step: 164700 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:13:41,397-Speed 2625.25 samples/sec Loss 10.9718 LearningRate 0.0642 Epoch: 3 Global Step: 164710 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:13:45,295-Speed 2627.78 samples/sec Loss 10.9277 LearningRate 0.0642 Epoch: 3 Global Step: 164720 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:13:49,198-Speed 2624.25 samples/sec Loss 10.8622 LearningRate 0.0642 Epoch: 3 Global Step: 164730 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:13:53,076-Speed 2641.14 samples/sec Loss 10.9196 LearningRate 0.0642 Epoch: 3 Global Step: 164740 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:13:56,973-Speed 2628.00 samples/sec Loss 10.9475 LearningRate 0.0642 Epoch: 3 Global Step: 164750 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:14:00,870-Speed 2629.19 samples/sec Loss 10.9717 LearningRate 0.0642 Epoch: 3 Global Step: 164760 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:14:04,769-Speed 2627.00 samples/sec Loss 11.0732 LearningRate 0.0642 Epoch: 3 Global Step: 164770 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:14:08,673-Speed 2623.39 samples/sec Loss 10.8268 LearningRate 0.0642 Epoch: 3 Global Step: 164780 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:14:12,565-Speed 2631.25 samples/sec Loss 10.7352 LearningRate 0.0642 Epoch: 3 Global Step: 164790 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:14:16,459-Speed 2630.85 samples/sec Loss 10.8922 LearningRate 0.0642 Epoch: 3 Global Step: 164800 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:14:20,360-Speed 2625.53 samples/sec Loss 10.8651 LearningRate 0.0642 Epoch: 3 Global Step: 164810 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:14:24,263-Speed 2624.72 samples/sec Loss 11.0050 LearningRate 0.0642 Epoch: 3 Global Step: 164820 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:14:28,205-Speed 2597.72 samples/sec Loss 10.9926 LearningRate 0.0642 Epoch: 3 Global Step: 164830 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:14:32,149-Speed 2597.40 samples/sec Loss 10.8458 LearningRate 0.0642 Epoch: 3 Global Step: 164840 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:14:36,070-Speed 2612.43 samples/sec Loss 10.9954 LearningRate 0.0642 Epoch: 3 Global Step: 164850 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:14:39,996-Speed 2609.03 samples/sec Loss 10.9001 LearningRate 0.0642 Epoch: 3 Global Step: 164860 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:14:44,017-Speed 2547.22 samples/sec Loss 10.8517 LearningRate 0.0642 Epoch: 3 Global Step: 164870 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:14:47,919-Speed 2624.93 samples/sec Loss 10.7881 LearningRate 0.0642 Epoch: 3 Global Step: 164880 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:14:51,817-Speed 2628.08 samples/sec Loss 10.8168 LearningRate 0.0642 Epoch: 3 Global Step: 164890 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:14:55,723-Speed 2621.93 samples/sec Loss 10.9518 LearningRate 0.0642 Epoch: 3 Global Step: 164900 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:14:59,621-Speed 2627.60 samples/sec Loss 11.0440 LearningRate 0.0642 Epoch: 3 Global Step: 164910 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:15:03,499-Speed 2641.61 samples/sec Loss 10.9297 LearningRate 0.0642 Epoch: 3 Global Step: 164920 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:15:07,392-Speed 2630.77 samples/sec Loss 10.8497 LearningRate 0.0642 Epoch: 3 Global Step: 164930 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:15:11,295-Speed 2624.35 samples/sec Loss 10.8735 LearningRate 0.0642 Epoch: 3 Global Step: 164940 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:15:15,202-Speed 2621.56 samples/sec Loss 10.9329 LearningRate 0.0642 Epoch: 3 Global Step: 164950 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:15:19,096-Speed 2630.38 samples/sec Loss 11.0297 LearningRate 0.0642 Epoch: 3 Global Step: 164960 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:15:22,993-Speed 2628.82 samples/sec Loss 10.9306 LearningRate 0.0642 Epoch: 3 Global Step: 164970 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:15:26,891-Speed 2626.99 samples/sec Loss 10.9129 LearningRate 0.0642 Epoch: 3 Global Step: 164980 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:15:30,805-Speed 2617.67 samples/sec Loss 10.9431 LearningRate 0.0642 Epoch: 3 Global Step: 164990 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:15:34,702-Speed 2628.34 samples/sec Loss 11.0150 LearningRate 0.0642 Epoch: 3 Global Step: 165000 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:15:38,594-Speed 2631.39 samples/sec Loss 10.8445 LearningRate 0.0642 Epoch: 3 Global Step: 165010 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:15:42,491-Speed 2628.23 samples/sec Loss 10.8902 LearningRate 0.0642 Epoch: 3 Global Step: 165020 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:15:46,385-Speed 2630.70 samples/sec Loss 10.8699 LearningRate 0.0642 Epoch: 3 Global Step: 165030 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:15:50,278-Speed 2631.14 samples/sec Loss 10.7863 LearningRate 0.0642 Epoch: 3 Global Step: 165040 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:15:54,173-Speed 2629.67 samples/sec Loss 10.7478 LearningRate 0.0642 Epoch: 3 Global Step: 165050 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:15:58,089-Speed 2615.79 samples/sec Loss 10.9768 LearningRate 0.0642 Epoch: 3 Global Step: 165060 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:16:01,992-Speed 2624.55 samples/sec Loss 10.9343 LearningRate 0.0642 Epoch: 3 Global Step: 165070 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:16:05,912-Speed 2612.42 samples/sec Loss 10.9832 LearningRate 0.0642 Epoch: 3 Global Step: 165080 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:16:09,804-Speed 2632.21 samples/sec Loss 10.8236 LearningRate 0.0642 Epoch: 3 Global Step: 165090 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:16:13,697-Speed 2630.75 samples/sec Loss 10.7955 LearningRate 0.0642 Epoch: 3 Global Step: 165100 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:16:17,590-Speed 2631.11 samples/sec Loss 10.7674 LearningRate 0.0642 Epoch: 3 Global Step: 165110 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:16:21,481-Speed 2632.65 samples/sec Loss 10.9549 LearningRate 0.0642 Epoch: 3 Global Step: 165120 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:16:25,371-Speed 2632.49 samples/sec Loss 10.9576 LearningRate 0.0642 Epoch: 3 Global Step: 165130 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:16:29,263-Speed 2632.42 samples/sec Loss 10.8442 LearningRate 0.0641 Epoch: 3 Global Step: 165140 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:16:33,160-Speed 2627.94 samples/sec Loss 11.0635 LearningRate 0.0641 Epoch: 3 Global Step: 165150 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:16:37,060-Speed 2626.67 samples/sec Loss 10.9554 LearningRate 0.0641 Epoch: 3 Global Step: 165160 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:16:40,977-Speed 2614.80 samples/sec Loss 10.9926 LearningRate 0.0641 Epoch: 3 Global Step: 165170 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:16:44,873-Speed 2628.59 samples/sec Loss 11.0049 LearningRate 0.0641 Epoch: 3 Global Step: 165180 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:16:48,768-Speed 2629.63 samples/sec Loss 11.1392 LearningRate 0.0641 Epoch: 3 Global Step: 165190 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:16:52,654-Speed 2636.00 samples/sec Loss 10.9094 LearningRate 0.0641 Epoch: 3 Global Step: 165200 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:16:56,558-Speed 2623.31 samples/sec Loss 10.9079 LearningRate 0.0641 Epoch: 3 Global Step: 165210 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:17:00,457-Speed 2627.26 samples/sec Loss 10.9659 LearningRate 0.0641 Epoch: 3 Global Step: 165220 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:17:04,366-Speed 2620.34 samples/sec Loss 10.8910 LearningRate 0.0641 Epoch: 3 Global Step: 165230 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:17:08,266-Speed 2626.32 samples/sec Loss 10.9315 LearningRate 0.0641 Epoch: 3 Global Step: 165240 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:17:12,165-Speed 2626.89 samples/sec Loss 10.9864 LearningRate 0.0641 Epoch: 3 Global Step: 165250 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:17:16,075-Speed 2619.48 samples/sec Loss 10.7931 LearningRate 0.0641 Epoch: 3 Global Step: 165260 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:17:19,974-Speed 2626.41 samples/sec Loss 11.0944 LearningRate 0.0641 Epoch: 3 Global Step: 165270 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:17:23,885-Speed 2619.48 samples/sec Loss 10.9086 LearningRate 0.0641 Epoch: 3 Global Step: 165280 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:17:27,784-Speed 2627.35 samples/sec Loss 10.9571 LearningRate 0.0641 Epoch: 3 Global Step: 165290 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:17:31,678-Speed 2629.62 samples/sec Loss 10.8494 LearningRate 0.0641 Epoch: 3 Global Step: 165300 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:17:35,573-Speed 2630.23 samples/sec Loss 10.9323 LearningRate 0.0641 Epoch: 3 Global Step: 165310 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:17:39,464-Speed 2631.86 samples/sec Loss 11.0264 LearningRate 0.0641 Epoch: 3 Global Step: 165320 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:17:43,364-Speed 2626.46 samples/sec Loss 10.9654 LearningRate 0.0641 Epoch: 3 Global Step: 165330 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:17:47,259-Speed 2629.22 samples/sec Loss 10.8620 LearningRate 0.0641 Epoch: 3 Global Step: 165340 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:17:51,155-Speed 2629.57 samples/sec Loss 10.8100 LearningRate 0.0641 Epoch: 3 Global Step: 165350 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:17:55,073-Speed 2614.16 samples/sec Loss 10.8889 LearningRate 0.0641 Epoch: 3 Global Step: 165360 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:17:58,983-Speed 2620.09 samples/sec Loss 10.8631 LearningRate 0.0641 Epoch: 3 Global Step: 165370 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:18:02,881-Speed 2627.30 samples/sec Loss 10.8381 LearningRate 0.0641 Epoch: 3 Global Step: 165380 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:18:06,802-Speed 2612.69 samples/sec Loss 10.7955 LearningRate 0.0641 Epoch: 3 Global Step: 165390 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:18:10,713-Speed 2618.90 samples/sec Loss 10.8390 LearningRate 0.0641 Epoch: 3 Global Step: 165400 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:18:14,612-Speed 2627.12 samples/sec Loss 10.8736 LearningRate 0.0641 Epoch: 3 Global Step: 165410 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:18:18,515-Speed 2624.23 samples/sec Loss 10.9831 LearningRate 0.0641 Epoch: 3 Global Step: 165420 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:18:22,409-Speed 2630.65 samples/sec Loss 10.9479 LearningRate 0.0641 Epoch: 3 Global Step: 165430 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:18:26,315-Speed 2621.92 samples/sec Loss 10.8576 LearningRate 0.0641 Epoch: 3 Global Step: 165440 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:18:30,211-Speed 2628.93 samples/sec Loss 10.9091 LearningRate 0.0641 Epoch: 3 Global Step: 165450 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:18:34,110-Speed 2626.64 samples/sec Loss 11.0012 LearningRate 0.0641 Epoch: 3 Global Step: 165460 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:18:38,004-Speed 2631.03 samples/sec Loss 10.8923 LearningRate 0.0641 Epoch: 3 Global Step: 165470 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:18:41,911-Speed 2620.91 samples/sec Loss 10.9568 LearningRate 0.0641 Epoch: 3 Global Step: 165480 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:18:45,810-Speed 2627.57 samples/sec Loss 10.8093 LearningRate 0.0641 Epoch: 3 Global Step: 165490 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:18:49,717-Speed 2621.14 samples/sec Loss 10.9434 LearningRate 0.0641 Epoch: 3 Global Step: 165500 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:18:53,626-Speed 2620.32 samples/sec Loss 10.9625 LearningRate 0.0641 Epoch: 3 Global Step: 165510 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:18:57,528-Speed 2625.01 samples/sec Loss 11.0629 LearningRate 0.0641 Epoch: 3 Global Step: 165520 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:19:01,428-Speed 2626.16 samples/sec Loss 10.8915 LearningRate 0.0641 Epoch: 3 Global Step: 165530 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:19:05,324-Speed 2628.90 samples/sec Loss 11.0324 LearningRate 0.0641 Epoch: 3 Global Step: 165540 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:19:09,220-Speed 2629.00 samples/sec Loss 10.9543 LearningRate 0.0641 Epoch: 3 Global Step: 165550 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:19:13,114-Speed 2630.33 samples/sec Loss 10.8658 LearningRate 0.0641 Epoch: 3 Global Step: 165560 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:19:16,995-Speed 2638.78 samples/sec Loss 11.0409 LearningRate 0.0641 Epoch: 3 Global Step: 165570 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:19:20,876-Speed 2639.82 samples/sec Loss 11.0410 LearningRate 0.0641 Epoch: 3 Global Step: 165580 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:19:24,784-Speed 2620.51 samples/sec Loss 10.9511 LearningRate 0.0641 Epoch: 3 Global Step: 165590 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:19:28,688-Speed 2624.25 samples/sec Loss 10.9386 LearningRate 0.0641 Epoch: 3 Global Step: 165600 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:19:32,580-Speed 2631.15 samples/sec Loss 10.9426 LearningRate 0.0641 Epoch: 3 Global Step: 165610 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:19:36,474-Speed 2630.44 samples/sec Loss 11.0673 LearningRate 0.0641 Epoch: 3 Global Step: 165620 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:19:40,397-Speed 2611.11 samples/sec Loss 10.8708 LearningRate 0.0641 Epoch: 3 Global Step: 165630 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:19:44,290-Speed 2631.06 samples/sec Loss 10.8027 LearningRate 0.0641 Epoch: 3 Global Step: 165640 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:19:48,330-Speed 2535.03 samples/sec Loss 10.9088 LearningRate 0.0641 Epoch: 3 Global Step: 165650 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:19:52,223-Speed 2631.16 samples/sec Loss 10.8888 LearningRate 0.0640 Epoch: 3 Global Step: 165660 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:19:56,119-Speed 2628.61 samples/sec Loss 11.0723 LearningRate 0.0640 Epoch: 3 Global Step: 165670 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:20:00,016-Speed 2628.24 samples/sec Loss 10.8624 LearningRate 0.0640 Epoch: 3 Global Step: 165680 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:20:03,926-Speed 2619.85 samples/sec Loss 10.8624 LearningRate 0.0640 Epoch: 3 Global Step: 165690 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:20:07,823-Speed 2628.32 samples/sec Loss 10.8057 LearningRate 0.0640 Epoch: 3 Global Step: 165700 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:20:11,722-Speed 2627.35 samples/sec Loss 10.9920 LearningRate 0.0640 Epoch: 3 Global Step: 165710 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:20:15,626-Speed 2623.23 samples/sec Loss 10.9615 LearningRate 0.0640 Epoch: 3 Global Step: 165720 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:20:19,525-Speed 2627.47 samples/sec Loss 10.9060 LearningRate 0.0640 Epoch: 3 Global Step: 165730 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:20:23,427-Speed 2625.00 samples/sec Loss 10.8880 LearningRate 0.0640 Epoch: 3 Global Step: 165740 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:20:27,331-Speed 2623.83 samples/sec Loss 11.1735 LearningRate 0.0640 Epoch: 3 Global Step: 165750 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:20:31,248-Speed 2614.72 samples/sec Loss 10.8404 LearningRate 0.0640 Epoch: 3 Global Step: 165760 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:20:35,144-Speed 2628.86 samples/sec Loss 11.0280 LearningRate 0.0640 Epoch: 3 Global Step: 165770 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:20:39,040-Speed 2629.50 samples/sec Loss 10.7781 LearningRate 0.0640 Epoch: 3 Global Step: 165780 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:20:42,935-Speed 2629.37 samples/sec Loss 10.8957 LearningRate 0.0640 Epoch: 3 Global Step: 165790 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:20:46,832-Speed 2628.19 samples/sec Loss 10.9039 LearningRate 0.0640 Epoch: 3 Global Step: 165800 Fp16 Grad Scale: 262144 Required: 75 hours
Training: 2022-04-13 14:20:50,711-Speed 2640.55 samples/sec Loss 10.8810 LearningRate 0.0640 Epoch: 3 Global Step: 165810 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:20:54,611-Speed 2626.39 samples/sec Loss 10.8913 LearningRate 0.0640 Epoch: 3 Global Step: 165820 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:20:58,511-Speed 2626.96 samples/sec Loss 10.8704 LearningRate 0.0640 Epoch: 3 Global Step: 165830 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:21:02,420-Speed 2619.87 samples/sec Loss 10.9257 LearningRate 0.0640 Epoch: 3 Global Step: 165840 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:21:06,315-Speed 2629.50 samples/sec Loss 10.8983 LearningRate 0.0640 Epoch: 3 Global Step: 165850 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:21:10,209-Speed 2630.16 samples/sec Loss 10.8850 LearningRate 0.0640 Epoch: 3 Global Step: 165860 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:21:14,118-Speed 2620.20 samples/sec Loss 11.0602 LearningRate 0.0640 Epoch: 3 Global Step: 165870 Fp16 Grad Scale: 131072 Required: 75 hours
Training: 2022-04-13 14:21:18,015-Speed 2628.54 samples/sec Loss 10.8924 LearningRate 0.0640 Epoch: 3 Global Step: 165880 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:21:21,948-Speed 2603.73 samples/sec Loss 10.9192 LearningRate 0.0640 Epoch: 3 Global Step: 165890 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:21:25,842-Speed 2630.97 samples/sec Loss 10.8564 LearningRate 0.0640 Epoch: 3 Global Step: 165900 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:21:29,750-Speed 2620.96 samples/sec Loss 10.9676 LearningRate 0.0640 Epoch: 3 Global Step: 165910 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:21:50,890-Speed 484.41 samples/sec Loss 11.0112 LearningRate 0.0640 Epoch: 4 Global Step: 165920 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:21:54,779-Speed 2634.58 samples/sec Loss 10.9454 LearningRate 0.0640 Epoch: 4 Global Step: 165930 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:21:58,664-Speed 2636.62 samples/sec Loss 10.9245 LearningRate 0.0640 Epoch: 4 Global Step: 165940 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:22:02,558-Speed 2630.73 samples/sec Loss 10.9101 LearningRate 0.0640 Epoch: 4 Global Step: 165950 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:22:06,456-Speed 2627.55 samples/sec Loss 10.8445 LearningRate 0.0640 Epoch: 4 Global Step: 165960 Fp16 Grad Scale: 65536 Required: 75 hours
Training: 2022-04-13 14:22:10,311-Speed 2656.70 samples/sec Loss 10.9454 LearningRate 0.0640 Epoch: 4 Global Step: 165970 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:22:14,255-Speed 2597.05 samples/sec Loss 11.6734 LearningRate 0.0640 Epoch: 4 Global Step: 165980 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:22:18,138-Speed 2637.78 samples/sec Loss 11.0802 LearningRate 0.0640 Epoch: 4 Global Step: 165990 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:22:22,025-Speed 2635.34 samples/sec Loss 10.9931 LearningRate 0.0640 Epoch: 4 Global Step: 166000 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:22:25,909-Speed 2637.48 samples/sec Loss 10.9837 LearningRate 0.0640 Epoch: 4 Global Step: 166010 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:22:29,797-Speed 2633.89 samples/sec Loss 10.9347 LearningRate 0.0640 Epoch: 4 Global Step: 166020 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:22:33,679-Speed 2638.83 samples/sec Loss 10.9395 LearningRate 0.0640 Epoch: 4 Global Step: 166030 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:22:37,577-Speed 2627.57 samples/sec Loss 11.0745 LearningRate 0.0640 Epoch: 4 Global Step: 166040 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:22:41,465-Speed 2634.74 samples/sec Loss 11.0384 LearningRate 0.0640 Epoch: 4 Global Step: 166050 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:22:45,356-Speed 2632.44 samples/sec Loss 10.9228 LearningRate 0.0640 Epoch: 4 Global Step: 166060 Fp16 Grad Scale: 16384 Required: 75 hours
Training: 2022-04-13 14:22:49,255-Speed 2627.06 samples/sec Loss 10.9700 LearningRate 0.0640 Epoch: 4 Global Step: 166070 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 14:22:53,146-Speed 2632.21 samples/sec Loss 10.8792 LearningRate 0.0640 Epoch: 4 Global Step: 166080 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 14:22:57,038-Speed 2631.73 samples/sec Loss 10.8374 LearningRate 0.0640 Epoch: 4 Global Step: 166090 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 14:23:00,933-Speed 2629.31 samples/sec Loss 10.9539 LearningRate 0.0640 Epoch: 4 Global Step: 166100 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 14:23:04,842-Speed 2620.34 samples/sec Loss 10.8036 LearningRate 0.0640 Epoch: 4 Global Step: 166110 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 14:23:08,745-Speed 2623.98 samples/sec Loss 10.8780 LearningRate 0.0640 Epoch: 4 Global Step: 166120 Fp16 Grad Scale: 32768 Required: 75 hours
Training: 2022-04-13 14:23:12,640-Speed 2630.12 samples/sec Loss 10.8332 LearningRate 0.0640 Epoch: 4 Global Step: 166130 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:23:16,541-Speed 2625.42 samples/sec Loss 10.8140 LearningRate 0.0640 Epoch: 4 Global Step: 166140 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:23:20,436-Speed 2629.69 samples/sec Loss 10.8718 LearningRate 0.0640 Epoch: 4 Global Step: 166150 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:23:24,331-Speed 2629.26 samples/sec Loss 10.9417 LearningRate 0.0640 Epoch: 4 Global Step: 166160 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:23:28,235-Speed 2623.78 samples/sec Loss 10.8876 LearningRate 0.0640 Epoch: 4 Global Step: 166170 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:23:32,235-Speed 2561.00 samples/sec Loss 10.9525 LearningRate 0.0639 Epoch: 4 Global Step: 166180 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:23:36,135-Speed 2625.56 samples/sec Loss 10.8792 LearningRate 0.0639 Epoch: 4 Global Step: 166190 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:23:40,032-Speed 2628.36 samples/sec Loss 10.7112 LearningRate 0.0639 Epoch: 4 Global Step: 166200 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:23:43,931-Speed 2626.94 samples/sec Loss 10.8979 LearningRate 0.0639 Epoch: 4 Global Step: 166210 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:23:47,873-Speed 2598.87 samples/sec Loss 10.8686 LearningRate 0.0639 Epoch: 4 Global Step: 166220 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:23:51,767-Speed 2630.18 samples/sec Loss 11.0111 LearningRate 0.0639 Epoch: 4 Global Step: 166230 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:23:55,741-Speed 2577.81 samples/sec Loss 10.9062 LearningRate 0.0639 Epoch: 4 Global Step: 166240 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:23:59,649-Speed 2620.70 samples/sec Loss 11.0863 LearningRate 0.0639 Epoch: 4 Global Step: 166250 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:24:03,594-Speed 2596.61 samples/sec Loss 10.8673 LearningRate 0.0639 Epoch: 4 Global Step: 166260 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:24:07,510-Speed 2615.42 samples/sec Loss 10.8728 LearningRate 0.0639 Epoch: 4 Global Step: 166270 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:24:11,413-Speed 2624.04 samples/sec Loss 10.9223 LearningRate 0.0639 Epoch: 4 Global Step: 166280 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:24:15,309-Speed 2628.99 samples/sec Loss 10.9859 LearningRate 0.0639 Epoch: 4 Global Step: 166290 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:24:19,208-Speed 2627.29 samples/sec Loss 10.9415 LearningRate 0.0639 Epoch: 4 Global Step: 166300 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:24:23,106-Speed 2628.23 samples/sec Loss 10.8897 LearningRate 0.0639 Epoch: 4 Global Step: 166310 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:24:27,002-Speed 2628.25 samples/sec Loss 10.9874 LearningRate 0.0639 Epoch: 4 Global Step: 166320 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:24:30,902-Speed 2626.58 samples/sec Loss 10.8920 LearningRate 0.0639 Epoch: 4 Global Step: 166330 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:24:34,912-Speed 2553.65 samples/sec Loss 10.9370 LearningRate 0.0639 Epoch: 4 Global Step: 166340 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:24:38,839-Speed 2608.82 samples/sec Loss 10.9532 LearningRate 0.0639 Epoch: 4 Global Step: 166350 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:24:42,778-Speed 2600.43 samples/sec Loss 10.8756 LearningRate 0.0639 Epoch: 4 Global Step: 166360 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:24:46,671-Speed 2630.99 samples/sec Loss 10.7874 LearningRate 0.0639 Epoch: 4 Global Step: 166370 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:24:50,567-Speed 2629.39 samples/sec Loss 10.7718 LearningRate 0.0639 Epoch: 4 Global Step: 166380 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:24:54,466-Speed 2626.40 samples/sec Loss 10.8953 LearningRate 0.0639 Epoch: 4 Global Step: 166390 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:24:58,375-Speed 2620.42 samples/sec Loss 10.8186 LearningRate 0.0639 Epoch: 4 Global Step: 166400 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:25:02,275-Speed 2626.48 samples/sec Loss 10.9683 LearningRate 0.0639 Epoch: 4 Global Step: 166410 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:25:06,178-Speed 2624.54 samples/sec Loss 10.9713 LearningRate 0.0639 Epoch: 4 Global Step: 166420 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:25:10,073-Speed 2629.55 samples/sec Loss 10.9294 LearningRate 0.0639 Epoch: 4 Global Step: 166430 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:25:13,981-Speed 2621.09 samples/sec Loss 10.7508 LearningRate 0.0639 Epoch: 4 Global Step: 166440 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:25:17,878-Speed 2628.70 samples/sec Loss 10.8753 LearningRate 0.0639 Epoch: 4 Global Step: 166450 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:25:21,789-Speed 2618.57 samples/sec Loss 10.9423 LearningRate 0.0639 Epoch: 4 Global Step: 166460 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:25:25,682-Speed 2631.28 samples/sec Loss 10.7409 LearningRate 0.0639 Epoch: 4 Global Step: 166470 Fp16 Grad Scale: 524288 Required: 74 hours
Training: 2022-04-13 14:25:29,560-Speed 2641.58 samples/sec Loss 11.0004 LearningRate 0.0639 Epoch: 4 Global Step: 166480 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:25:33,456-Speed 2628.99 samples/sec Loss 10.8695 LearningRate 0.0639 Epoch: 4 Global Step: 166490 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:25:37,353-Speed 2627.73 samples/sec Loss 10.8116 LearningRate 0.0639 Epoch: 4 Global Step: 166500 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:25:41,245-Speed 2632.08 samples/sec Loss 10.9856 LearningRate 0.0639 Epoch: 4 Global Step: 166510 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:25:45,127-Speed 2638.32 samples/sec Loss 10.8548 LearningRate 0.0639 Epoch: 4 Global Step: 166520 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:25:49,036-Speed 2620.71 samples/sec Loss 10.9104 LearningRate 0.0639 Epoch: 4 Global Step: 166530 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:25:52,934-Speed 2627.47 samples/sec Loss 10.8311 LearningRate 0.0639 Epoch: 4 Global Step: 166540 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:25:56,833-Speed 2627.30 samples/sec Loss 10.9935 LearningRate 0.0639 Epoch: 4 Global Step: 166550 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:26:00,735-Speed 2624.83 samples/sec Loss 10.9520 LearningRate 0.0639 Epoch: 4 Global Step: 166560 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:26:04,632-Speed 2628.21 samples/sec Loss 10.8624 LearningRate 0.0639 Epoch: 4 Global Step: 166570 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:26:08,571-Speed 2600.09 samples/sec Loss 10.9174 LearningRate 0.0639 Epoch: 4 Global Step: 166580 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:26:12,487-Speed 2616.24 samples/sec Loss 10.9446 LearningRate 0.0639 Epoch: 4 Global Step: 166590 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:26:16,389-Speed 2624.71 samples/sec Loss 10.5952 LearningRate 0.0639 Epoch: 4 Global Step: 166600 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:26:20,281-Speed 2632.05 samples/sec Loss 10.8732 LearningRate 0.0639 Epoch: 4 Global Step: 166610 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:26:24,177-Speed 2628.87 samples/sec Loss 10.8492 LearningRate 0.0639 Epoch: 4 Global Step: 166620 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:26:28,089-Speed 2618.65 samples/sec Loss 10.8223 LearningRate 0.0639 Epoch: 4 Global Step: 166630 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:26:31,994-Speed 2623.19 samples/sec Loss 10.8288 LearningRate 0.0639 Epoch: 4 Global Step: 166640 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:26:35,901-Speed 2621.00 samples/sec Loss 10.9069 LearningRate 0.0639 Epoch: 4 Global Step: 166650 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:26:39,819-Speed 2614.52 samples/sec Loss 10.8087 LearningRate 0.0639 Epoch: 4 Global Step: 166660 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:26:43,713-Speed 2630.85 samples/sec Loss 10.8250 LearningRate 0.0639 Epoch: 4 Global Step: 166670 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:26:47,608-Speed 2629.17 samples/sec Loss 10.7349 LearningRate 0.0639 Epoch: 4 Global Step: 166680 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:26:51,483-Speed 2643.59 samples/sec Loss 11.0178 LearningRate 0.0639 Epoch: 4 Global Step: 166690 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:26:55,378-Speed 2629.61 samples/sec Loss 10.7513 LearningRate 0.0638 Epoch: 4 Global Step: 166700 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:26:59,274-Speed 2629.35 samples/sec Loss 10.8219 LearningRate 0.0638 Epoch: 4 Global Step: 166710 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:27:03,176-Speed 2624.48 samples/sec Loss 10.8765 LearningRate 0.0638 Epoch: 4 Global Step: 166720 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:27:07,097-Speed 2612.40 samples/sec Loss 10.8427 LearningRate 0.0638 Epoch: 4 Global Step: 166730 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:27:11,009-Speed 2617.74 samples/sec Loss 10.9023 LearningRate 0.0638 Epoch: 4 Global Step: 166740 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:27:14,913-Speed 2623.91 samples/sec Loss 10.8391 LearningRate 0.0638 Epoch: 4 Global Step: 166750 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:27:18,822-Speed 2620.76 samples/sec Loss 10.9604 LearningRate 0.0638 Epoch: 4 Global Step: 166760 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:27:22,713-Speed 2632.06 samples/sec Loss 10.9360 LearningRate 0.0638 Epoch: 4 Global Step: 166770 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:27:26,609-Speed 2629.29 samples/sec Loss 10.9757 LearningRate 0.0638 Epoch: 4 Global Step: 166780 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:27:30,504-Speed 2629.04 samples/sec Loss 10.9240 LearningRate 0.0638 Epoch: 4 Global Step: 166790 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:27:34,362-Speed 2654.66 samples/sec Loss 10.9736 LearningRate 0.0638 Epoch: 4 Global Step: 166800 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:27:38,262-Speed 2626.43 samples/sec Loss 10.9264 LearningRate 0.0638 Epoch: 4 Global Step: 166810 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:27:42,169-Speed 2621.69 samples/sec Loss 10.8554 LearningRate 0.0638 Epoch: 4 Global Step: 166820 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:27:46,074-Speed 2623.35 samples/sec Loss 10.9287 LearningRate 0.0638 Epoch: 4 Global Step: 166830 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:27:49,990-Speed 2615.55 samples/sec Loss 10.9666 LearningRate 0.0638 Epoch: 4 Global Step: 166840 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:27:53,893-Speed 2624.41 samples/sec Loss 10.9769 LearningRate 0.0638 Epoch: 4 Global Step: 166850 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:27:57,816-Speed 2611.37 samples/sec Loss 11.0164 LearningRate 0.0638 Epoch: 4 Global Step: 166860 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:28:01,707-Speed 2631.82 samples/sec Loss 10.9618 LearningRate 0.0638 Epoch: 4 Global Step: 166870 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:28:05,613-Speed 2622.17 samples/sec Loss 10.8318 LearningRate 0.0638 Epoch: 4 Global Step: 166880 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:28:09,531-Speed 2614.57 samples/sec Loss 10.9235 LearningRate 0.0638 Epoch: 4 Global Step: 166890 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:28:13,443-Speed 2618.20 samples/sec Loss 10.6930 LearningRate 0.0638 Epoch: 4 Global Step: 166900 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:28:17,337-Speed 2629.91 samples/sec Loss 10.9575 LearningRate 0.0638 Epoch: 4 Global Step: 166910 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:28:21,251-Speed 2617.15 samples/sec Loss 10.7190 LearningRate 0.0638 Epoch: 4 Global Step: 166920 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:28:25,165-Speed 2616.44 samples/sec Loss 10.7871 LearningRate 0.0638 Epoch: 4 Global Step: 166930 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:28:29,064-Speed 2627.22 samples/sec Loss 10.9264 LearningRate 0.0638 Epoch: 4 Global Step: 166940 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:28:32,942-Speed 2641.26 samples/sec Loss 10.8679 LearningRate 0.0638 Epoch: 4 Global Step: 166950 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:28:36,837-Speed 2630.11 samples/sec Loss 10.8995 LearningRate 0.0638 Epoch: 4 Global Step: 166960 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:28:40,731-Speed 2629.80 samples/sec Loss 10.9415 LearningRate 0.0638 Epoch: 4 Global Step: 166970 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:28:44,628-Speed 2628.58 samples/sec Loss 10.7775 LearningRate 0.0638 Epoch: 4 Global Step: 166980 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:28:48,563-Speed 2602.22 samples/sec Loss 10.8062 LearningRate 0.0638 Epoch: 4 Global Step: 166990 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:28:52,475-Speed 2618.67 samples/sec Loss 10.8774 LearningRate 0.0638 Epoch: 4 Global Step: 167000 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:28:56,368-Speed 2631.23 samples/sec Loss 10.7924 LearningRate 0.0638 Epoch: 4 Global Step: 167010 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:29:00,261-Speed 2630.89 samples/sec Loss 10.8417 LearningRate 0.0638 Epoch: 4 Global Step: 167020 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:29:04,225-Speed 2583.84 samples/sec Loss 10.8530 LearningRate 0.0638 Epoch: 4 Global Step: 167030 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:29:08,312-Speed 2506.32 samples/sec Loss 10.9114 LearningRate 0.0638 Epoch: 4 Global Step: 167040 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:29:12,405-Speed 2502.42 samples/sec Loss 10.7906 LearningRate 0.0638 Epoch: 4 Global Step: 167050 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:29:16,303-Speed 2627.69 samples/sec Loss 10.7801 LearningRate 0.0638 Epoch: 4 Global Step: 167060 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:29:20,198-Speed 2629.52 samples/sec Loss 10.8878 LearningRate 0.0638 Epoch: 4 Global Step: 167070 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:29:24,093-Speed 2630.01 samples/sec Loss 10.8856 LearningRate 0.0638 Epoch: 4 Global Step: 167080 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:29:27,987-Speed 2630.29 samples/sec Loss 11.0918 LearningRate 0.0638 Epoch: 4 Global Step: 167090 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:29:31,880-Speed 2631.22 samples/sec Loss 10.8141 LearningRate 0.0638 Epoch: 4 Global Step: 167100 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:29:35,778-Speed 2627.71 samples/sec Loss 10.8175 LearningRate 0.0638 Epoch: 4 Global Step: 167110 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:29:39,671-Speed 2630.67 samples/sec Loss 10.8642 LearningRate 0.0638 Epoch: 4 Global Step: 167120 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:29:43,582-Speed 2619.14 samples/sec Loss 10.8634 LearningRate 0.0638 Epoch: 4 Global Step: 167130 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:29:47,478-Speed 2628.96 samples/sec Loss 10.8827 LearningRate 0.0638 Epoch: 4 Global Step: 167140 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:29:51,376-Speed 2627.17 samples/sec Loss 10.8517 LearningRate 0.0638 Epoch: 4 Global Step: 167150 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:29:55,276-Speed 2626.05 samples/sec Loss 10.8948 LearningRate 0.0638 Epoch: 4 Global Step: 167160 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:29:59,185-Speed 2621.03 samples/sec Loss 10.8623 LearningRate 0.0638 Epoch: 4 Global Step: 167170 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:30:03,080-Speed 2628.85 samples/sec Loss 11.0594 LearningRate 0.0638 Epoch: 4 Global Step: 167180 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:30:06,981-Speed 2626.57 samples/sec Loss 10.8739 LearningRate 0.0638 Epoch: 4 Global Step: 167190 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:30:10,858-Speed 2641.96 samples/sec Loss 10.8163 LearningRate 0.0638 Epoch: 4 Global Step: 167200 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:30:14,763-Speed 2622.45 samples/sec Loss 12.1286 LearningRate 0.0638 Epoch: 4 Global Step: 167210 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:30:18,663-Speed 2626.48 samples/sec Loss 11.5601 LearningRate 0.0637 Epoch: 4 Global Step: 167220 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:30:22,555-Speed 2632.13 samples/sec Loss 11.2354 LearningRate 0.0637 Epoch: 4 Global Step: 167230 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:30:26,481-Speed 2608.57 samples/sec Loss 11.1638 LearningRate 0.0637 Epoch: 4 Global Step: 167240 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:30:30,377-Speed 2629.05 samples/sec Loss 11.1064 LearningRate 0.0637 Epoch: 4 Global Step: 167250 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:30:34,356-Speed 2574.23 samples/sec Loss 10.9744 LearningRate 0.0637 Epoch: 4 Global Step: 167260 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:30:38,279-Speed 2611.42 samples/sec Loss 11.0518 LearningRate 0.0637 Epoch: 4 Global Step: 167270 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:30:42,186-Speed 2621.41 samples/sec Loss 10.7631 LearningRate 0.0637 Epoch: 4 Global Step: 167280 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:30:46,077-Speed 2632.77 samples/sec Loss 11.0137 LearningRate 0.0637 Epoch: 4 Global Step: 167290 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:30:49,969-Speed 2631.25 samples/sec Loss 10.7934 LearningRate 0.0637 Epoch: 4 Global Step: 167300 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:30:53,861-Speed 2632.37 samples/sec Loss 10.8723 LearningRate 0.0637 Epoch: 4 Global Step: 167310 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:30:57,771-Speed 2619.76 samples/sec Loss 10.9893 LearningRate 0.0637 Epoch: 4 Global Step: 167320 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:31:01,666-Speed 2629.16 samples/sec Loss 10.9357 LearningRate 0.0637 Epoch: 4 Global Step: 167330 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:31:05,587-Speed 2611.96 samples/sec Loss 10.8828 LearningRate 0.0637 Epoch: 4 Global Step: 167340 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:31:09,482-Speed 2630.05 samples/sec Loss 10.9224 LearningRate 0.0637 Epoch: 4 Global Step: 167350 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:31:13,381-Speed 2627.44 samples/sec Loss 10.9445 LearningRate 0.0637 Epoch: 4 Global Step: 167360 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:31:17,280-Speed 2627.02 samples/sec Loss 10.9801 LearningRate 0.0637 Epoch: 4 Global Step: 167370 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:31:21,177-Speed 2628.73 samples/sec Loss 10.7823 LearningRate 0.0637 Epoch: 4 Global Step: 167380 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:31:25,075-Speed 2627.24 samples/sec Loss 10.9843 LearningRate 0.0637 Epoch: 4 Global Step: 167390 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:31:28,972-Speed 2628.80 samples/sec Loss 11.0350 LearningRate 0.0637 Epoch: 4 Global Step: 167400 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:31:32,871-Speed 2626.71 samples/sec Loss 10.9582 LearningRate 0.0637 Epoch: 4 Global Step: 167410 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:31:36,773-Speed 2624.83 samples/sec Loss 10.8611 LearningRate 0.0637 Epoch: 4 Global Step: 167420 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:31:40,671-Speed 2627.26 samples/sec Loss 10.8008 LearningRate 0.0637 Epoch: 4 Global Step: 167430 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:31:44,569-Speed 2628.28 samples/sec Loss 10.8873 LearningRate 0.0637 Epoch: 4 Global Step: 167440 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:31:48,467-Speed 2627.66 samples/sec Loss 10.9177 LearningRate 0.0637 Epoch: 4 Global Step: 167450 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:31:52,363-Speed 2629.13 samples/sec Loss 10.9442 LearningRate 0.0637 Epoch: 4 Global Step: 167460 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:31:56,274-Speed 2618.97 samples/sec Loss 10.9605 LearningRate 0.0637 Epoch: 4 Global Step: 167470 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:32:00,343-Speed 2517.12 samples/sec Loss 10.8941 LearningRate 0.0637 Epoch: 4 Global Step: 167480 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:32:04,416-Speed 2514.30 samples/sec Loss 10.7886 LearningRate 0.0637 Epoch: 4 Global Step: 167490 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:32:08,417-Speed 2560.20 samples/sec Loss 11.0714 LearningRate 0.0637 Epoch: 4 Global Step: 167500 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:32:12,296-Speed 2640.48 samples/sec Loss 10.8346 LearningRate 0.0637 Epoch: 4 Global Step: 167510 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:32:16,197-Speed 2625.70 samples/sec Loss 10.9776 LearningRate 0.0637 Epoch: 4 Global Step: 167520 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:32:20,096-Speed 2626.72 samples/sec Loss 10.9006 LearningRate 0.0637 Epoch: 4 Global Step: 167530 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:32:23,991-Speed 2630.24 samples/sec Loss 10.9260 LearningRate 0.0637 Epoch: 4 Global Step: 167540 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:32:27,893-Speed 2624.76 samples/sec Loss 10.7571 LearningRate 0.0637 Epoch: 4 Global Step: 167550 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:32:31,787-Speed 2630.29 samples/sec Loss 10.8744 LearningRate 0.0637 Epoch: 4 Global Step: 167560 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:32:35,686-Speed 2627.18 samples/sec Loss 10.7480 LearningRate 0.0637 Epoch: 4 Global Step: 167570 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:32:39,590-Speed 2623.22 samples/sec Loss 10.9048 LearningRate 0.0637 Epoch: 4 Global Step: 167580 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:32:43,491-Speed 2625.87 samples/sec Loss 10.9235 LearningRate 0.0637 Epoch: 4 Global Step: 167590 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:32:47,391-Speed 2626.48 samples/sec Loss 11.0814 LearningRate 0.0637 Epoch: 4 Global Step: 167600 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:32:51,293-Speed 2625.23 samples/sec Loss 10.9855 LearningRate 0.0637 Epoch: 4 Global Step: 167610 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:32:55,200-Speed 2620.94 samples/sec Loss 10.7891 LearningRate 0.0637 Epoch: 4 Global Step: 167620 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:32:59,101-Speed 2625.92 samples/sec Loss 10.7830 LearningRate 0.0637 Epoch: 4 Global Step: 167630 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:33:03,002-Speed 2625.17 samples/sec Loss 10.8159 LearningRate 0.0637 Epoch: 4 Global Step: 167640 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:33:06,902-Speed 2626.43 samples/sec Loss 10.7402 LearningRate 0.0637 Epoch: 4 Global Step: 167650 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:33:10,797-Speed 2629.15 samples/sec Loss 10.9173 LearningRate 0.0637 Epoch: 4 Global Step: 167660 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:33:14,698-Speed 2626.30 samples/sec Loss 10.9457 LearningRate 0.0637 Epoch: 4 Global Step: 167670 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:33:18,593-Speed 2629.88 samples/sec Loss 11.0028 LearningRate 0.0637 Epoch: 4 Global Step: 167680 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:33:22,485-Speed 2631.70 samples/sec Loss 10.9536 LearningRate 0.0637 Epoch: 4 Global Step: 167690 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:33:26,381-Speed 2629.26 samples/sec Loss 10.9292 LearningRate 0.0637 Epoch: 4 Global Step: 167700 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:33:30,253-Speed 2645.49 samples/sec Loss 10.9740 LearningRate 0.0637 Epoch: 4 Global Step: 167710 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:33:34,152-Speed 2626.75 samples/sec Loss 10.8057 LearningRate 0.0637 Epoch: 4 Global Step: 167720 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:33:38,056-Speed 2623.33 samples/sec Loss 10.8268 LearningRate 0.0637 Epoch: 4 Global Step: 167730 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:33:41,950-Speed 2630.59 samples/sec Loss 11.0218 LearningRate 0.0636 Epoch: 4 Global Step: 167740 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:33:45,839-Speed 2634.00 samples/sec Loss 10.7145 LearningRate 0.0636 Epoch: 4 Global Step: 167750 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:33:49,758-Speed 2613.40 samples/sec Loss 10.8518 LearningRate 0.0636 Epoch: 4 Global Step: 167760 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:33:53,685-Speed 2608.62 samples/sec Loss 10.9035 LearningRate 0.0636 Epoch: 4 Global Step: 167770 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:33:57,591-Speed 2622.10 samples/sec Loss 10.8465 LearningRate 0.0636 Epoch: 4 Global Step: 167780 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:34:01,502-Speed 2618.86 samples/sec Loss 10.9523 LearningRate 0.0636 Epoch: 4 Global Step: 167790 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:34:05,402-Speed 2625.91 samples/sec Loss 10.8682 LearningRate 0.0636 Epoch: 4 Global Step: 167800 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:34:09,299-Speed 2628.52 samples/sec Loss 10.8621 LearningRate 0.0636 Epoch: 4 Global Step: 167810 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:34:13,212-Speed 2617.33 samples/sec Loss 10.9394 LearningRate 0.0636 Epoch: 4 Global Step: 167820 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:34:17,127-Speed 2616.70 samples/sec Loss 10.9743 LearningRate 0.0636 Epoch: 4 Global Step: 167830 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:34:21,037-Speed 2619.13 samples/sec Loss 10.9306 LearningRate 0.0636 Epoch: 4 Global Step: 167840 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:34:24,939-Speed 2624.99 samples/sec Loss 10.9132 LearningRate 0.0636 Epoch: 4 Global Step: 167850 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:34:28,854-Speed 2616.70 samples/sec Loss 10.7688 LearningRate 0.0636 Epoch: 4 Global Step: 167860 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:34:32,777-Speed 2610.60 samples/sec Loss 10.8973 LearningRate 0.0636 Epoch: 4 Global Step: 167870 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:34:36,682-Speed 2623.01 samples/sec Loss 10.9279 LearningRate 0.0636 Epoch: 4 Global Step: 167880 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:34:40,586-Speed 2623.15 samples/sec Loss 10.9077 LearningRate 0.0636 Epoch: 4 Global Step: 167890 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:34:44,503-Speed 2615.32 samples/sec Loss 10.8090 LearningRate 0.0636 Epoch: 4 Global Step: 167900 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:34:48,399-Speed 2628.81 samples/sec Loss 10.8561 LearningRate 0.0636 Epoch: 4 Global Step: 167910 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:34:52,304-Speed 2623.60 samples/sec Loss 10.9155 LearningRate 0.0636 Epoch: 4 Global Step: 167920 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:34:56,215-Speed 2618.85 samples/sec Loss 10.8991 LearningRate 0.0636 Epoch: 4 Global Step: 167930 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:35:00,123-Speed 2620.98 samples/sec Loss 10.7906 LearningRate 0.0636 Epoch: 4 Global Step: 167940 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:35:04,028-Speed 2622.78 samples/sec Loss 10.8875 LearningRate 0.0636 Epoch: 4 Global Step: 167950 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:35:07,940-Speed 2618.24 samples/sec Loss 10.9149 LearningRate 0.0636 Epoch: 4 Global Step: 167960 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:35:11,852-Speed 2617.82 samples/sec Loss 11.0222 LearningRate 0.0636 Epoch: 4 Global Step: 167970 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:35:15,760-Speed 2620.99 samples/sec Loss 10.8700 LearningRate 0.0636 Epoch: 4 Global Step: 167980 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:35:19,703-Speed 2597.43 samples/sec Loss 10.8854 LearningRate 0.0636 Epoch: 4 Global Step: 167990 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:35:23,649-Speed 2596.14 samples/sec Loss 10.8608 LearningRate 0.0636 Epoch: 4 Global Step: 168000 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:35:27,537-Speed 2634.35 samples/sec Loss 10.8577 LearningRate 0.0636 Epoch: 4 Global Step: 168010 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:35:31,436-Speed 2627.05 samples/sec Loss 10.8728 LearningRate 0.0636 Epoch: 4 Global Step: 168020 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:35:35,346-Speed 2619.27 samples/sec Loss 10.9872 LearningRate 0.0636 Epoch: 4 Global Step: 168030 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:35:39,249-Speed 2624.37 samples/sec Loss 10.8839 LearningRate 0.0636 Epoch: 4 Global Step: 168040 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:35:43,151-Speed 2625.26 samples/sec Loss 10.9563 LearningRate 0.0636 Epoch: 4 Global Step: 168050 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:35:47,053-Speed 2624.55 samples/sec Loss 10.8694 LearningRate 0.0636 Epoch: 4 Global Step: 168060 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:35:50,963-Speed 2619.22 samples/sec Loss 10.7732 LearningRate 0.0636 Epoch: 4 Global Step: 168070 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:35:54,873-Speed 2619.90 samples/sec Loss 10.9464 LearningRate 0.0636 Epoch: 4 Global Step: 168080 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:35:58,755-Speed 2638.60 samples/sec Loss 10.7835 LearningRate 0.0636 Epoch: 4 Global Step: 168090 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:36:02,650-Speed 2629.72 samples/sec Loss 10.8740 LearningRate 0.0636 Epoch: 4 Global Step: 168100 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:36:06,508-Speed 2655.08 samples/sec Loss 11.0024 LearningRate 0.0636 Epoch: 4 Global Step: 168110 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:36:10,380-Speed 2645.31 samples/sec Loss 10.8499 LearningRate 0.0636 Epoch: 4 Global Step: 168120 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:36:14,274-Speed 2630.25 samples/sec Loss 10.6917 LearningRate 0.0636 Epoch: 4 Global Step: 168130 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:36:18,184-Speed 2619.36 samples/sec Loss 10.8943 LearningRate 0.0636 Epoch: 4 Global Step: 168140 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:36:22,074-Speed 2632.79 samples/sec Loss 10.9558 LearningRate 0.0636 Epoch: 4 Global Step: 168150 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:36:25,971-Speed 2628.79 samples/sec Loss 10.8383 LearningRate 0.0636 Epoch: 4 Global Step: 168160 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:36:29,866-Speed 2629.47 samples/sec Loss 10.6689 LearningRate 0.0636 Epoch: 4 Global Step: 168170 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:36:33,758-Speed 2632.52 samples/sec Loss 10.9208 LearningRate 0.0636 Epoch: 4 Global Step: 168180 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:36:37,655-Speed 2627.47 samples/sec Loss 10.8023 LearningRate 0.0636 Epoch: 4 Global Step: 168190 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:36:41,551-Speed 2629.31 samples/sec Loss 10.7389 LearningRate 0.0636 Epoch: 4 Global Step: 168200 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:36:45,448-Speed 2628.85 samples/sec Loss 10.7973 LearningRate 0.0636 Epoch: 4 Global Step: 168210 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:36:49,508-Speed 2522.18 samples/sec Loss 10.9844 LearningRate 0.0636 Epoch: 4 Global Step: 168220 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:36:53,424-Speed 2615.72 samples/sec Loss 10.9070 LearningRate 0.0636 Epoch: 4 Global Step: 168230 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:36:57,331-Speed 2621.43 samples/sec Loss 10.9415 LearningRate 0.0636 Epoch: 4 Global Step: 168240 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:37:01,292-Speed 2585.96 samples/sec Loss 10.9537 LearningRate 0.0636 Epoch: 4 Global Step: 168250 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:37:05,396-Speed 2495.51 samples/sec Loss 10.7509 LearningRate 0.0635 Epoch: 4 Global Step: 168260 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:37:09,499-Speed 2496.77 samples/sec Loss 10.8043 LearningRate 0.0635 Epoch: 4 Global Step: 168270 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:37:13,612-Speed 2490.20 samples/sec Loss 10.8856 LearningRate 0.0635 Epoch: 4 Global Step: 168280 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:37:17,513-Speed 2625.16 samples/sec Loss 10.7765 LearningRate 0.0635 Epoch: 4 Global Step: 168290 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:37:21,402-Speed 2634.01 samples/sec Loss 10.8809 LearningRate 0.0635 Epoch: 4 Global Step: 168300 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:37:25,277-Speed 2643.21 samples/sec Loss 10.9296 LearningRate 0.0635 Epoch: 4 Global Step: 168310 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:37:29,172-Speed 2629.84 samples/sec Loss 10.7748 LearningRate 0.0635 Epoch: 4 Global Step: 168320 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:37:33,067-Speed 2629.35 samples/sec Loss 10.7700 LearningRate 0.0635 Epoch: 4 Global Step: 168330 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:37:36,973-Speed 2622.71 samples/sec Loss 10.8143 LearningRate 0.0635 Epoch: 4 Global Step: 168340 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:37:40,871-Speed 2627.06 samples/sec Loss 10.9175 LearningRate 0.0635 Epoch: 4 Global Step: 168350 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:37:44,833-Speed 2585.81 samples/sec Loss 10.8068 LearningRate 0.0635 Epoch: 4 Global Step: 168360 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:37:48,925-Speed 2502.59 samples/sec Loss 10.7991 LearningRate 0.0635 Epoch: 4 Global Step: 168370 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:37:52,881-Speed 2589.52 samples/sec Loss 10.8701 LearningRate 0.0635 Epoch: 4 Global Step: 168380 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:37:56,781-Speed 2625.64 samples/sec Loss 10.8314 LearningRate 0.0635 Epoch: 4 Global Step: 168390 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:38:00,677-Speed 2629.16 samples/sec Loss 10.9435 LearningRate 0.0635 Epoch: 4 Global Step: 168400 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:38:04,578-Speed 2626.17 samples/sec Loss 10.9113 LearningRate 0.0635 Epoch: 4 Global Step: 168410 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:38:08,475-Speed 2627.98 samples/sec Loss 10.8168 LearningRate 0.0635 Epoch: 4 Global Step: 168420 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:38:12,380-Speed 2623.05 samples/sec Loss 10.9760 LearningRate 0.0635 Epoch: 4 Global Step: 168430 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:38:16,281-Speed 2625.21 samples/sec Loss 10.7986 LearningRate 0.0635 Epoch: 4 Global Step: 168440 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:38:20,165-Speed 2637.08 samples/sec Loss 10.9145 LearningRate 0.0635 Epoch: 4 Global Step: 168450 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:38:24,064-Speed 2626.94 samples/sec Loss 11.0139 LearningRate 0.0635 Epoch: 4 Global Step: 168460 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:38:27,986-Speed 2612.14 samples/sec Loss 11.0297 LearningRate 0.0635 Epoch: 4 Global Step: 168470 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:38:31,884-Speed 2627.37 samples/sec Loss 10.8637 LearningRate 0.0635 Epoch: 4 Global Step: 168480 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:38:35,774-Speed 2632.89 samples/sec Loss 10.9090 LearningRate 0.0635 Epoch: 4 Global Step: 168490 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:38:39,691-Speed 2614.75 samples/sec Loss 10.8270 LearningRate 0.0635 Epoch: 4 Global Step: 168500 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:38:43,592-Speed 2625.92 samples/sec Loss 10.8076 LearningRate 0.0635 Epoch: 4 Global Step: 168510 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:38:47,494-Speed 2624.99 samples/sec Loss 10.8312 LearningRate 0.0635 Epoch: 4 Global Step: 168520 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:38:51,410-Speed 2616.50 samples/sec Loss 10.8941 LearningRate 0.0635 Epoch: 4 Global Step: 168530 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:38:55,311-Speed 2625.30 samples/sec Loss 10.8376 LearningRate 0.0635 Epoch: 4 Global Step: 168540 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:38:59,216-Speed 2623.40 samples/sec Loss 10.8035 LearningRate 0.0635 Epoch: 4 Global Step: 168550 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:39:03,130-Speed 2616.94 samples/sec Loss 10.7926 LearningRate 0.0635 Epoch: 4 Global Step: 168560 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:39:07,029-Speed 2626.64 samples/sec Loss 10.8542 LearningRate 0.0635 Epoch: 4 Global Step: 168570 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:39:10,963-Speed 2603.46 samples/sec Loss 10.7163 LearningRate 0.0635 Epoch: 4 Global Step: 168580 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:39:14,895-Speed 2605.60 samples/sec Loss 10.7991 LearningRate 0.0635 Epoch: 4 Global Step: 168590 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:39:18,790-Speed 2629.23 samples/sec Loss 10.7987 LearningRate 0.0635 Epoch: 4 Global Step: 168600 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:39:22,709-Speed 2613.57 samples/sec Loss 10.9212 LearningRate 0.0635 Epoch: 4 Global Step: 168610 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:39:26,612-Speed 2624.41 samples/sec Loss 10.9234 LearningRate 0.0635 Epoch: 4 Global Step: 168620 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:39:30,525-Speed 2617.90 samples/sec Loss 10.8422 LearningRate 0.0635 Epoch: 4 Global Step: 168630 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:39:34,453-Speed 2607.59 samples/sec Loss 10.8392 LearningRate 0.0635 Epoch: 4 Global Step: 168640 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:39:38,352-Speed 2627.19 samples/sec Loss 10.8371 LearningRate 0.0635 Epoch: 4 Global Step: 168650 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:39:42,250-Speed 2627.56 samples/sec Loss 10.8701 LearningRate 0.0635 Epoch: 4 Global Step: 168660 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:39:46,148-Speed 2627.54 samples/sec Loss 10.8869 LearningRate 0.0635 Epoch: 4 Global Step: 168670 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:39:50,044-Speed 2629.25 samples/sec Loss 10.9157 LearningRate 0.0635 Epoch: 4 Global Step: 168680 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:39:53,937-Speed 2630.35 samples/sec Loss 10.9160 LearningRate 0.0635 Epoch: 4 Global Step: 168690 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:39:57,836-Speed 2627.86 samples/sec Loss 10.9230 LearningRate 0.0635 Epoch: 4 Global Step: 168700 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:40:01,730-Speed 2630.06 samples/sec Loss 10.7169 LearningRate 0.0635 Epoch: 4 Global Step: 168710 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:40:05,632-Speed 2624.91 samples/sec Loss 10.8118 LearningRate 0.0635 Epoch: 4 Global Step: 168720 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:40:09,529-Speed 2628.22 samples/sec Loss 10.9902 LearningRate 0.0635 Epoch: 4 Global Step: 168730 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:40:13,428-Speed 2627.57 samples/sec Loss 10.9293 LearningRate 0.0635 Epoch: 4 Global Step: 168740 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:40:17,330-Speed 2624.62 samples/sec Loss 10.7354 LearningRate 0.0635 Epoch: 4 Global Step: 168750 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:40:21,227-Speed 2628.18 samples/sec Loss 10.8580 LearningRate 0.0635 Epoch: 4 Global Step: 168760 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:40:25,124-Speed 2628.57 samples/sec Loss 10.6964 LearningRate 0.0635 Epoch: 4 Global Step: 168770 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:40:29,020-Speed 2628.96 samples/sec Loss 10.8570 LearningRate 0.0634 Epoch: 4 Global Step: 168780 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:40:32,932-Speed 2617.78 samples/sec Loss 10.7401 LearningRate 0.0634 Epoch: 4 Global Step: 168790 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:40:36,830-Speed 2627.94 samples/sec Loss 10.7566 LearningRate 0.0634 Epoch: 4 Global Step: 168800 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:40:40,727-Speed 2628.28 samples/sec Loss 10.8376 LearningRate 0.0634 Epoch: 4 Global Step: 168810 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:40:44,634-Speed 2622.21 samples/sec Loss 10.6976 LearningRate 0.0634 Epoch: 4 Global Step: 168820 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:40:48,510-Speed 2642.14 samples/sec Loss 10.8791 LearningRate 0.0634 Epoch: 4 Global Step: 168830 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:40:52,406-Speed 2628.87 samples/sec Loss 10.8513 LearningRate 0.0634 Epoch: 4 Global Step: 168840 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:40:56,306-Speed 2626.38 samples/sec Loss 11.0031 LearningRate 0.0634 Epoch: 4 Global Step: 168850 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:41:00,213-Speed 2621.47 samples/sec Loss 10.6927 LearningRate 0.0634 Epoch: 4 Global Step: 168860 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:41:04,130-Speed 2615.41 samples/sec Loss 10.8947 LearningRate 0.0634 Epoch: 4 Global Step: 168870 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:41:08,044-Speed 2616.73 samples/sec Loss 10.9192 LearningRate 0.0634 Epoch: 4 Global Step: 168880 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:41:11,942-Speed 2627.98 samples/sec Loss 10.7492 LearningRate 0.0634 Epoch: 4 Global Step: 168890 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:41:15,837-Speed 2629.86 samples/sec Loss 10.8229 LearningRate 0.0634 Epoch: 4 Global Step: 168900 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:41:19,734-Speed 2627.75 samples/sec Loss 10.9207 LearningRate 0.0634 Epoch: 4 Global Step: 168910 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:41:23,632-Speed 2627.99 samples/sec Loss 10.8541 LearningRate 0.0634 Epoch: 4 Global Step: 168920 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:41:27,528-Speed 2628.97 samples/sec Loss 10.7704 LearningRate 0.0634 Epoch: 4 Global Step: 168930 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:41:31,425-Speed 2627.80 samples/sec Loss 10.7944 LearningRate 0.0634 Epoch: 4 Global Step: 168940 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:41:35,326-Speed 2626.03 samples/sec Loss 10.9315 LearningRate 0.0634 Epoch: 4 Global Step: 168950 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:41:39,227-Speed 2625.33 samples/sec Loss 10.8068 LearningRate 0.0634 Epoch: 4 Global Step: 168960 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:41:43,111-Speed 2637.37 samples/sec Loss 10.8905 LearningRate 0.0634 Epoch: 4 Global Step: 168970 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:41:47,009-Speed 2627.45 samples/sec Loss 10.7923 LearningRate 0.0634 Epoch: 4 Global Step: 168980 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:41:50,902-Speed 2630.61 samples/sec Loss 10.8006 LearningRate 0.0634 Epoch: 4 Global Step: 168990 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:41:54,800-Speed 2627.68 samples/sec Loss 10.7580 LearningRate 0.0634 Epoch: 4 Global Step: 169000 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:41:58,714-Speed 2617.28 samples/sec Loss 10.7472 LearningRate 0.0634 Epoch: 4 Global Step: 169010 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:42:02,609-Speed 2629.52 samples/sec Loss 10.7549 LearningRate 0.0634 Epoch: 4 Global Step: 169020 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:42:06,509-Speed 2626.54 samples/sec Loss 10.7566 LearningRate 0.0634 Epoch: 4 Global Step: 169030 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:42:10,410-Speed 2625.16 samples/sec Loss 10.8527 LearningRate 0.0634 Epoch: 4 Global Step: 169040 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:42:14,308-Speed 2627.78 samples/sec Loss 10.8871 LearningRate 0.0634 Epoch: 4 Global Step: 169050 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:42:18,210-Speed 2624.49 samples/sec Loss 10.8986 LearningRate 0.0634 Epoch: 4 Global Step: 169060 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:42:22,106-Speed 2629.15 samples/sec Loss 10.8517 LearningRate 0.0634 Epoch: 4 Global Step: 169070 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:42:26,002-Speed 2629.39 samples/sec Loss 10.9152 LearningRate 0.0634 Epoch: 4 Global Step: 169080 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:42:29,896-Speed 2630.59 samples/sec Loss 10.8386 LearningRate 0.0634 Epoch: 4 Global Step: 169090 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:42:33,796-Speed 2625.73 samples/sec Loss 10.8781 LearningRate 0.0634 Epoch: 4 Global Step: 169100 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:42:37,690-Speed 2630.16 samples/sec Loss 10.7530 LearningRate 0.0634 Epoch: 4 Global Step: 169110 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:42:41,672-Speed 2572.02 samples/sec Loss 10.8076 LearningRate 0.0634 Epoch: 4 Global Step: 169120 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:42:45,612-Speed 2599.85 samples/sec Loss 10.7672 LearningRate 0.0634 Epoch: 4 Global Step: 169130 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:42:49,490-Speed 2640.87 samples/sec Loss 10.7980 LearningRate 0.0634 Epoch: 4 Global Step: 169140 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:42:53,385-Speed 2630.11 samples/sec Loss 10.9353 LearningRate 0.0634 Epoch: 4 Global Step: 169150 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:42:57,283-Speed 2627.83 samples/sec Loss 10.9227 LearningRate 0.0634 Epoch: 4 Global Step: 169160 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:43:01,179-Speed 2628.39 samples/sec Loss 10.8782 LearningRate 0.0634 Epoch: 4 Global Step: 169170 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:43:05,082-Speed 2624.21 samples/sec Loss 10.7342 LearningRate 0.0634 Epoch: 4 Global Step: 169180 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:43:09,018-Speed 2602.61 samples/sec Loss 10.9147 LearningRate 0.0634 Epoch: 4 Global Step: 169190 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:43:12,914-Speed 2628.97 samples/sec Loss 10.8871 LearningRate 0.0634 Epoch: 4 Global Step: 169200 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:43:16,815-Speed 2625.47 samples/sec Loss 10.8155 LearningRate 0.0634 Epoch: 4 Global Step: 169210 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:43:20,713-Speed 2627.97 samples/sec Loss 10.8832 LearningRate 0.0634 Epoch: 4 Global Step: 169220 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:43:24,612-Speed 2626.80 samples/sec Loss 10.7419 LearningRate 0.0634 Epoch: 4 Global Step: 169230 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:43:28,505-Speed 2631.30 samples/sec Loss 10.7228 LearningRate 0.0634 Epoch: 4 Global Step: 169240 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:43:32,411-Speed 2622.44 samples/sec Loss 10.8407 LearningRate 0.0634 Epoch: 4 Global Step: 169250 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:43:36,306-Speed 2629.59 samples/sec Loss 10.8764 LearningRate 0.0634 Epoch: 4 Global Step: 169260 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:43:40,201-Speed 2628.91 samples/sec Loss 10.9409 LearningRate 0.0634 Epoch: 4 Global Step: 169270 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:43:44,086-Speed 2637.02 samples/sec Loss 10.8578 LearningRate 0.0634 Epoch: 4 Global Step: 169280 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:43:47,972-Speed 2635.98 samples/sec Loss 10.8748 LearningRate 0.0634 Epoch: 4 Global Step: 169290 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:43:51,868-Speed 2629.32 samples/sec Loss 10.8943 LearningRate 0.0633 Epoch: 4 Global Step: 169300 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:43:55,769-Speed 2625.30 samples/sec Loss 10.6833 LearningRate 0.0633 Epoch: 4 Global Step: 169310 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:43:59,665-Speed 2629.18 samples/sec Loss 10.9778 LearningRate 0.0633 Epoch: 4 Global Step: 169320 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:44:03,560-Speed 2629.34 samples/sec Loss 10.8632 LearningRate 0.0633 Epoch: 4 Global Step: 169330 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:44:07,466-Speed 2622.23 samples/sec Loss 10.8101 LearningRate 0.0633 Epoch: 4 Global Step: 169340 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:44:11,366-Speed 2625.69 samples/sec Loss 10.9235 LearningRate 0.0633 Epoch: 4 Global Step: 169350 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:44:15,261-Speed 2630.53 samples/sec Loss 10.8409 LearningRate 0.0633 Epoch: 4 Global Step: 169360 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:44:19,200-Speed 2599.85 samples/sec Loss 10.8915 LearningRate 0.0633 Epoch: 4 Global Step: 169370 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:44:23,087-Speed 2635.62 samples/sec Loss 10.7732 LearningRate 0.0633 Epoch: 4 Global Step: 169380 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:44:26,982-Speed 2629.79 samples/sec Loss 10.8197 LearningRate 0.0633 Epoch: 4 Global Step: 169390 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:44:30,877-Speed 2629.88 samples/sec Loss 10.7959 LearningRate 0.0633 Epoch: 4 Global Step: 169400 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:44:34,869-Speed 2565.72 samples/sec Loss 10.8175 LearningRate 0.0633 Epoch: 4 Global Step: 169410 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:44:38,892-Speed 2545.76 samples/sec Loss 10.6244 LearningRate 0.0633 Epoch: 4 Global Step: 169420 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:44:43,481-Speed 2231.43 samples/sec Loss 10.9064 LearningRate 0.0633 Epoch: 4 Global Step: 169430 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:44:47,375-Speed 2630.63 samples/sec Loss 10.8168 LearningRate 0.0633 Epoch: 4 Global Step: 169440 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:44:51,269-Speed 2630.78 samples/sec Loss 10.9448 LearningRate 0.0633 Epoch: 4 Global Step: 169450 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:44:55,165-Speed 2628.91 samples/sec Loss 10.7307 LearningRate 0.0633 Epoch: 4 Global Step: 169460 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:44:59,086-Speed 2612.76 samples/sec Loss 10.8200 LearningRate 0.0633 Epoch: 4 Global Step: 169470 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:45:02,976-Speed 2632.86 samples/sec Loss 10.8574 LearningRate 0.0633 Epoch: 4 Global Step: 169480 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:45:06,905-Speed 2606.99 samples/sec Loss 10.7305 LearningRate 0.0633 Epoch: 4 Global Step: 169490 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:45:10,809-Speed 2623.71 samples/sec Loss 10.8797 LearningRate 0.0633 Epoch: 4 Global Step: 169500 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:45:14,709-Speed 2626.65 samples/sec Loss 10.8448 LearningRate 0.0633 Epoch: 4 Global Step: 169510 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:45:18,616-Speed 2620.80 samples/sec Loss 10.8014 LearningRate 0.0633 Epoch: 4 Global Step: 169520 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:45:22,533-Speed 2615.17 samples/sec Loss 10.8299 LearningRate 0.0633 Epoch: 4 Global Step: 169530 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:45:26,431-Speed 2627.75 samples/sec Loss 10.7785 LearningRate 0.0633 Epoch: 4 Global Step: 169540 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:45:30,333-Speed 2625.25 samples/sec Loss 10.8068 LearningRate 0.0633 Epoch: 4 Global Step: 169550 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:45:34,236-Speed 2624.37 samples/sec Loss 10.9437 LearningRate 0.0633 Epoch: 4 Global Step: 169560 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:45:38,141-Speed 2623.20 samples/sec Loss 10.7512 LearningRate 0.0633 Epoch: 4 Global Step: 169570 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:45:42,048-Speed 2621.38 samples/sec Loss 10.7983 LearningRate 0.0633 Epoch: 4 Global Step: 169580 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:45:45,935-Speed 2635.54 samples/sec Loss 10.7457 LearningRate 0.0633 Epoch: 4 Global Step: 169590 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:45:49,837-Speed 2624.71 samples/sec Loss 10.7566 LearningRate 0.0633 Epoch: 4 Global Step: 169600 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:45:53,725-Speed 2634.64 samples/sec Loss 10.8635 LearningRate 0.0633 Epoch: 4 Global Step: 169610 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:45:57,621-Speed 2629.22 samples/sec Loss 10.7313 LearningRate 0.0633 Epoch: 4 Global Step: 169620 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:46:01,524-Speed 2623.76 samples/sec Loss 10.8766 LearningRate 0.0633 Epoch: 4 Global Step: 169630 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:46:05,416-Speed 2632.15 samples/sec Loss 10.8297 LearningRate 0.0633 Epoch: 4 Global Step: 169640 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:46:09,314-Speed 2627.45 samples/sec Loss 10.8052 LearningRate 0.0633 Epoch: 4 Global Step: 169650 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:46:13,210-Speed 2629.79 samples/sec Loss 10.7659 LearningRate 0.0633 Epoch: 4 Global Step: 169660 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:46:17,104-Speed 2630.34 samples/sec Loss 10.9454 LearningRate 0.0633 Epoch: 4 Global Step: 169670 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:46:20,995-Speed 2632.52 samples/sec Loss 10.7667 LearningRate 0.0633 Epoch: 4 Global Step: 169680 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:46:24,890-Speed 2629.79 samples/sec Loss 10.7241 LearningRate 0.0633 Epoch: 4 Global Step: 169690 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:46:28,784-Speed 2629.89 samples/sec Loss 10.7315 LearningRate 0.0633 Epoch: 4 Global Step: 169700 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:46:32,686-Speed 2625.05 samples/sec Loss 10.6250 LearningRate 0.0633 Epoch: 4 Global Step: 169710 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:46:36,580-Speed 2630.54 samples/sec Loss 10.7790 LearningRate 0.0633 Epoch: 4 Global Step: 169720 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:46:40,437-Speed 2655.30 samples/sec Loss 10.8038 LearningRate 0.0633 Epoch: 4 Global Step: 169730 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:46:44,333-Speed 2628.96 samples/sec Loss 10.7615 LearningRate 0.0633 Epoch: 4 Global Step: 169740 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:46:48,269-Speed 2602.26 samples/sec Loss 10.7717 LearningRate 0.0633 Epoch: 4 Global Step: 169750 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:46:52,180-Speed 2619.05 samples/sec Loss 10.8169 LearningRate 0.0633 Epoch: 4 Global Step: 169760 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:46:56,075-Speed 2629.04 samples/sec Loss 10.8249 LearningRate 0.0633 Epoch: 4 Global Step: 169770 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:46:59,967-Speed 2632.34 samples/sec Loss 10.7470 LearningRate 0.0633 Epoch: 4 Global Step: 169780 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:47:03,861-Speed 2629.90 samples/sec Loss 10.8131 LearningRate 0.0633 Epoch: 4 Global Step: 169790 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:47:07,764-Speed 2624.81 samples/sec Loss 10.8351 LearningRate 0.0633 Epoch: 4 Global Step: 169800 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:47:11,653-Speed 2633.67 samples/sec Loss 10.9434 LearningRate 0.0633 Epoch: 4 Global Step: 169810 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:47:15,548-Speed 2629.21 samples/sec Loss 10.8935 LearningRate 0.0632 Epoch: 4 Global Step: 169820 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:47:19,451-Speed 2624.55 samples/sec Loss 10.7454 LearningRate 0.0632 Epoch: 4 Global Step: 169830 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:47:23,334-Speed 2637.69 samples/sec Loss 10.8903 LearningRate 0.0632 Epoch: 4 Global Step: 169840 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:47:27,230-Speed 2628.77 samples/sec Loss 10.7526 LearningRate 0.0632 Epoch: 4 Global Step: 169850 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:47:31,134-Speed 2624.00 samples/sec Loss 10.7558 LearningRate 0.0632 Epoch: 4 Global Step: 169860 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:47:35,043-Speed 2620.61 samples/sec Loss 10.7063 LearningRate 0.0632 Epoch: 4 Global Step: 169870 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:47:38,951-Speed 2620.25 samples/sec Loss 10.7951 LearningRate 0.0632 Epoch: 4 Global Step: 169880 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:47:42,854-Speed 2624.88 samples/sec Loss 10.7814 LearningRate 0.0632 Epoch: 4 Global Step: 169890 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:47:46,750-Speed 2629.25 samples/sec Loss 10.7414 LearningRate 0.0632 Epoch: 4 Global Step: 169900 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:47:50,643-Speed 2630.73 samples/sec Loss 10.9030 LearningRate 0.0632 Epoch: 4 Global Step: 169910 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:47:54,542-Speed 2626.46 samples/sec Loss 10.6726 LearningRate 0.0632 Epoch: 4 Global Step: 169920 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:47:58,428-Speed 2635.89 samples/sec Loss 10.7241 LearningRate 0.0632 Epoch: 4 Global Step: 169930 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:48:02,335-Speed 2621.78 samples/sec Loss 10.7837 LearningRate 0.0632 Epoch: 4 Global Step: 169940 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:48:06,232-Speed 2628.79 samples/sec Loss 10.8128 LearningRate 0.0632 Epoch: 4 Global Step: 169950 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:48:10,124-Speed 2631.59 samples/sec Loss 10.8520 LearningRate 0.0632 Epoch: 4 Global Step: 169960 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:48:14,023-Speed 2626.95 samples/sec Loss 10.8551 LearningRate 0.0632 Epoch: 4 Global Step: 169970 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:48:17,920-Speed 2628.09 samples/sec Loss 10.8173 LearningRate 0.0632 Epoch: 4 Global Step: 169980 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:48:21,876-Speed 2588.85 samples/sec Loss 10.7367 LearningRate 0.0632 Epoch: 4 Global Step: 169990 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:48:25,934-Speed 2524.08 samples/sec Loss 10.7718 LearningRate 0.0632 Epoch: 4 Global Step: 170000 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:49:09,339-[lfw][170000]XNorm: 22.842724
Training: 2022-04-13 14:49:09,340-[lfw][170000]Accuracy-Flip: 0.99767+-0.00309
Training: 2022-04-13 14:49:09,341-[lfw][170000]Accuracy-Highest: 0.99783
Training: 2022-04-13 14:49:59,490-[cfp_fp][170000]XNorm: 21.100989
Training: 2022-04-13 14:49:59,491-[cfp_fp][170000]Accuracy-Flip: 0.98071+-0.00565
Training: 2022-04-13 14:49:59,492-[cfp_fp][170000]Accuracy-Highest: 0.98100
Training: 2022-04-13 14:50:42,685-[agedb_30][170000]XNorm: 22.645779
Training: 2022-04-13 14:50:42,686-[agedb_30][170000]Accuracy-Flip: 0.97133+-0.00741
Training: 2022-04-13 14:50:42,686-[agedb_30][170000]Accuracy-Highest: 0.97133
Training: 2022-04-13 14:50:46,560-Speed 72.82 samples/sec Loss 10.9538 LearningRate 0.0632 Epoch: 4 Global Step: 170010 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:50:50,431-Speed 2645.36 samples/sec Loss 10.8029 LearningRate 0.0632 Epoch: 4 Global Step: 170020 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:50:54,286-Speed 2657.06 samples/sec Loss 10.7809 LearningRate 0.0632 Epoch: 4 Global Step: 170030 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:50:58,163-Speed 2642.16 samples/sec Loss 10.6764 LearningRate 0.0632 Epoch: 4 Global Step: 170040 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:51:02,036-Speed 2644.21 samples/sec Loss 10.7266 LearningRate 0.0632 Epoch: 4 Global Step: 170050 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:51:05,918-Speed 2639.17 samples/sec Loss 10.7595 LearningRate 0.0632 Epoch: 4 Global Step: 170060 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:51:09,805-Speed 2635.38 samples/sec Loss 10.8372 LearningRate 0.0632 Epoch: 4 Global Step: 170070 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:51:13,690-Speed 2636.34 samples/sec Loss 10.7721 LearningRate 0.0632 Epoch: 4 Global Step: 170080 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:51:17,569-Speed 2640.68 samples/sec Loss 10.8722 LearningRate 0.0632 Epoch: 4 Global Step: 170090 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:51:21,457-Speed 2634.10 samples/sec Loss 10.7622 LearningRate 0.0632 Epoch: 4 Global Step: 170100 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:51:25,353-Speed 2628.85 samples/sec Loss 10.7884 LearningRate 0.0632 Epoch: 4 Global Step: 170110 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:51:29,246-Speed 2631.38 samples/sec Loss 10.9057 LearningRate 0.0632 Epoch: 4 Global Step: 170120 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:51:33,134-Speed 2634.14 samples/sec Loss 10.7943 LearningRate 0.0632 Epoch: 4 Global Step: 170130 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:51:37,032-Speed 2628.19 samples/sec Loss 10.7729 LearningRate 0.0632 Epoch: 4 Global Step: 170140 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:51:40,933-Speed 2625.41 samples/sec Loss 10.8175 LearningRate 0.0632 Epoch: 4 Global Step: 170150 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:51:44,851-Speed 2614.23 samples/sec Loss 10.7973 LearningRate 0.0632 Epoch: 4 Global Step: 170160 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:51:48,747-Speed 2628.92 samples/sec Loss 10.9252 LearningRate 0.0632 Epoch: 4 Global Step: 170170 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:51:52,642-Speed 2629.80 samples/sec Loss 10.7637 LearningRate 0.0632 Epoch: 4 Global Step: 170180 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:51:56,535-Speed 2631.09 samples/sec Loss 10.9238 LearningRate 0.0632 Epoch: 4 Global Step: 170190 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:52:00,438-Speed 2624.22 samples/sec Loss 10.7465 LearningRate 0.0632 Epoch: 4 Global Step: 170200 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:52:04,344-Speed 2625.15 samples/sec Loss 10.7089 LearningRate 0.0632 Epoch: 4 Global Step: 170210 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:52:08,241-Speed 2628.80 samples/sec Loss 10.8033 LearningRate 0.0632 Epoch: 4 Global Step: 170220 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:52:12,151-Speed 2619.33 samples/sec Loss 10.9080 LearningRate 0.0632 Epoch: 4 Global Step: 170230 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:52:16,053-Speed 2624.79 samples/sec Loss 10.7291 LearningRate 0.0632 Epoch: 4 Global Step: 170240 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:52:19,967-Speed 2616.97 samples/sec Loss 10.8922 LearningRate 0.0632 Epoch: 4 Global Step: 170250 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:52:23,879-Speed 2618.20 samples/sec Loss 10.9011 LearningRate 0.0632 Epoch: 4 Global Step: 170260 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:52:27,800-Speed 2612.70 samples/sec Loss 10.9171 LearningRate 0.0632 Epoch: 4 Global Step: 170270 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:52:31,703-Speed 2624.71 samples/sec Loss 10.7181 LearningRate 0.0632 Epoch: 4 Global Step: 170280 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:52:35,630-Speed 2607.98 samples/sec Loss 10.8054 LearningRate 0.0632 Epoch: 4 Global Step: 170290 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:52:39,533-Speed 2624.45 samples/sec Loss 10.8273 LearningRate 0.0632 Epoch: 4 Global Step: 170300 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:52:43,437-Speed 2623.59 samples/sec Loss 10.7953 LearningRate 0.0632 Epoch: 4 Global Step: 170310 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:52:47,337-Speed 2625.79 samples/sec Loss 10.7408 LearningRate 0.0632 Epoch: 4 Global Step: 170320 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:52:51,239-Speed 2625.45 samples/sec Loss 10.8347 LearningRate 0.0632 Epoch: 4 Global Step: 170330 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:52:55,136-Speed 2628.02 samples/sec Loss 10.7605 LearningRate 0.0631 Epoch: 4 Global Step: 170340 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:52:59,034-Speed 2627.51 samples/sec Loss 10.6719 LearningRate 0.0631 Epoch: 4 Global Step: 170350 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:53:02,935-Speed 2625.69 samples/sec Loss 10.7925 LearningRate 0.0631 Epoch: 4 Global Step: 170360 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:53:06,830-Speed 2629.76 samples/sec Loss 10.8636 LearningRate 0.0631 Epoch: 4 Global Step: 170370 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:53:10,730-Speed 2626.01 samples/sec Loss 10.9459 LearningRate 0.0631 Epoch: 4 Global Step: 170380 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:53:14,632-Speed 2625.26 samples/sec Loss 10.8832 LearningRate 0.0631 Epoch: 4 Global Step: 170390 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:53:18,536-Speed 2623.25 samples/sec Loss 10.8902 LearningRate 0.0631 Epoch: 4 Global Step: 170400 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:53:22,448-Speed 2618.16 samples/sec Loss 10.9881 LearningRate 0.0631 Epoch: 4 Global Step: 170410 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:53:26,340-Speed 2632.08 samples/sec Loss 10.7620 LearningRate 0.0631 Epoch: 4 Global Step: 170420 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:53:30,236-Speed 2629.06 samples/sec Loss 10.8349 LearningRate 0.0631 Epoch: 4 Global Step: 170430 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:53:34,131-Speed 2629.62 samples/sec Loss 10.7660 LearningRate 0.0631 Epoch: 4 Global Step: 170440 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:53:38,006-Speed 2642.96 samples/sec Loss 10.7443 LearningRate 0.0631 Epoch: 4 Global Step: 170450 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:53:41,901-Speed 2629.94 samples/sec Loss 10.8312 LearningRate 0.0631 Epoch: 4 Global Step: 170460 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:53:45,815-Speed 2616.36 samples/sec Loss 10.7489 LearningRate 0.0631 Epoch: 4 Global Step: 170470 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:53:49,755-Speed 2600.43 samples/sec Loss 10.8926 LearningRate 0.0631 Epoch: 4 Global Step: 170480 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:53:53,651-Speed 2628.92 samples/sec Loss 10.8003 LearningRate 0.0631 Epoch: 4 Global Step: 170490 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:53:57,557-Speed 2622.61 samples/sec Loss 10.8559 LearningRate 0.0631 Epoch: 4 Global Step: 170500 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:54:01,461-Speed 2623.32 samples/sec Loss 10.9130 LearningRate 0.0631 Epoch: 4 Global Step: 170510 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:54:05,356-Speed 2629.55 samples/sec Loss 10.8523 LearningRate 0.0631 Epoch: 4 Global Step: 170520 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:54:09,253-Speed 2628.06 samples/sec Loss 10.7294 LearningRate 0.0631 Epoch: 4 Global Step: 170530 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:54:13,157-Speed 2623.90 samples/sec Loss 10.8096 LearningRate 0.0631 Epoch: 4 Global Step: 170540 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:54:17,064-Speed 2621.68 samples/sec Loss 10.7651 LearningRate 0.0631 Epoch: 4 Global Step: 170550 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:54:20,948-Speed 2637.14 samples/sec Loss 10.8537 LearningRate 0.0631 Epoch: 4 Global Step: 170560 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:54:24,897-Speed 2593.58 samples/sec Loss 10.8311 LearningRate 0.0631 Epoch: 4 Global Step: 170570 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:54:28,798-Speed 2626.28 samples/sec Loss 10.8120 LearningRate 0.0631 Epoch: 4 Global Step: 170580 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:54:32,695-Speed 2628.24 samples/sec Loss 10.6872 LearningRate 0.0631 Epoch: 4 Global Step: 170590 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:54:36,594-Speed 2626.31 samples/sec Loss 10.9455 LearningRate 0.0631 Epoch: 4 Global Step: 170600 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:54:40,490-Speed 2628.78 samples/sec Loss 10.9038 LearningRate 0.0631 Epoch: 4 Global Step: 170610 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:54:44,388-Speed 2627.37 samples/sec Loss 10.8175 LearningRate 0.0631 Epoch: 4 Global Step: 170620 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:54:48,284-Speed 2629.80 samples/sec Loss 10.9410 LearningRate 0.0631 Epoch: 4 Global Step: 170630 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:54:52,184-Speed 2625.46 samples/sec Loss 10.8301 LearningRate 0.0631 Epoch: 4 Global Step: 170640 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:54:56,083-Speed 2627.36 samples/sec Loss 10.8605 LearningRate 0.0631 Epoch: 4 Global Step: 170650 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:54:59,983-Speed 2626.34 samples/sec Loss 10.7350 LearningRate 0.0631 Epoch: 4 Global Step: 170660 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:55:04,643-Speed 2198.09 samples/sec Loss 10.8400 LearningRate 0.0631 Epoch: 4 Global Step: 170670 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:55:08,526-Speed 2637.48 samples/sec Loss 10.7616 LearningRate 0.0631 Epoch: 4 Global Step: 170680 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:55:12,479-Speed 2591.29 samples/sec Loss 10.8837 LearningRate 0.0631 Epoch: 4 Global Step: 170690 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:55:16,426-Speed 2595.37 samples/sec Loss 10.8788 LearningRate 0.0631 Epoch: 4 Global Step: 170700 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:55:20,339-Speed 2617.52 samples/sec Loss 10.7320 LearningRate 0.0631 Epoch: 4 Global Step: 170710 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:55:24,248-Speed 2620.13 samples/sec Loss 10.8584 LearningRate 0.0631 Epoch: 4 Global Step: 170720 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:55:28,153-Speed 2624.63 samples/sec Loss 10.7120 LearningRate 0.0631 Epoch: 4 Global Step: 170730 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:55:32,061-Speed 2620.92 samples/sec Loss 10.8788 LearningRate 0.0631 Epoch: 4 Global Step: 170740 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:55:35,963-Speed 2624.48 samples/sec Loss 10.8363 LearningRate 0.0631 Epoch: 4 Global Step: 170750 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:55:39,874-Speed 2618.66 samples/sec Loss 10.8878 LearningRate 0.0631 Epoch: 4 Global Step: 170760 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:55:43,802-Speed 2608.36 samples/sec Loss 10.7838 LearningRate 0.0631 Epoch: 4 Global Step: 170770 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:55:47,719-Speed 2615.16 samples/sec Loss 10.9075 LearningRate 0.0631 Epoch: 4 Global Step: 170780 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:55:51,618-Speed 2626.56 samples/sec Loss 10.7762 LearningRate 0.0631 Epoch: 4 Global Step: 170790 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:55:55,522-Speed 2624.31 samples/sec Loss 10.7452 LearningRate 0.0631 Epoch: 4 Global Step: 170800 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:55:59,415-Speed 2630.84 samples/sec Loss 10.8465 LearningRate 0.0631 Epoch: 4 Global Step: 170810 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:56:03,311-Speed 2628.64 samples/sec Loss 10.9224 LearningRate 0.0631 Epoch: 4 Global Step: 170820 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:56:07,197-Speed 2635.39 samples/sec Loss 10.5966 LearningRate 0.0631 Epoch: 4 Global Step: 170830 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:56:11,095-Speed 2628.18 samples/sec Loss 10.8583 LearningRate 0.0631 Epoch: 4 Global Step: 170840 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:56:14,987-Speed 2631.48 samples/sec Loss 10.7017 LearningRate 0.0631 Epoch: 4 Global Step: 170850 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:56:18,881-Speed 2630.74 samples/sec Loss 10.8001 LearningRate 0.0631 Epoch: 4 Global Step: 170860 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:56:22,779-Speed 2627.57 samples/sec Loss 10.6964 LearningRate 0.0630 Epoch: 4 Global Step: 170870 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:56:26,756-Speed 2575.40 samples/sec Loss 10.8482 LearningRate 0.0630 Epoch: 4 Global Step: 170880 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:56:30,687-Speed 2605.90 samples/sec Loss 10.7218 LearningRate 0.0630 Epoch: 4 Global Step: 170890 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:56:34,594-Speed 2621.23 samples/sec Loss 10.7677 LearningRate 0.0630 Epoch: 4 Global Step: 170900 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:56:38,497-Speed 2624.59 samples/sec Loss 10.7405 LearningRate 0.0630 Epoch: 4 Global Step: 170910 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:56:42,408-Speed 2618.39 samples/sec Loss 10.7030 LearningRate 0.0630 Epoch: 4 Global Step: 170920 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:56:46,318-Speed 2620.39 samples/sec Loss 10.6657 LearningRate 0.0630 Epoch: 4 Global Step: 170930 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:56:50,299-Speed 2572.69 samples/sec Loss 10.8200 LearningRate 0.0630 Epoch: 4 Global Step: 170940 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:56:54,307-Speed 2556.32 samples/sec Loss 10.7527 LearningRate 0.0630 Epoch: 4 Global Step: 170950 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:56:58,222-Speed 2616.02 samples/sec Loss 10.7428 LearningRate 0.0630 Epoch: 4 Global Step: 170960 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:57:02,072-Speed 2660.53 samples/sec Loss 11.2906 LearningRate 0.0630 Epoch: 4 Global Step: 170970 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:57:05,981-Speed 2620.08 samples/sec Loss 11.2391 LearningRate 0.0630 Epoch: 4 Global Step: 170980 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:57:09,883-Speed 2625.08 samples/sec Loss 11.0576 LearningRate 0.0630 Epoch: 4 Global Step: 170990 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:57:13,824-Speed 2598.69 samples/sec Loss 10.8896 LearningRate 0.0630 Epoch: 4 Global Step: 171000 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:57:17,724-Speed 2626.85 samples/sec Loss 10.8454 LearningRate 0.0630 Epoch: 4 Global Step: 171010 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:57:21,689-Speed 2582.90 samples/sec Loss 10.9602 LearningRate 0.0630 Epoch: 4 Global Step: 171020 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:57:25,637-Speed 2594.51 samples/sec Loss 10.9397 LearningRate 0.0630 Epoch: 4 Global Step: 171030 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:57:29,532-Speed 2629.37 samples/sec Loss 10.7468 LearningRate 0.0630 Epoch: 4 Global Step: 171040 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:57:33,425-Speed 2631.35 samples/sec Loss 10.9140 LearningRate 0.0630 Epoch: 4 Global Step: 171050 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:57:37,316-Speed 2632.52 samples/sec Loss 10.8477 LearningRate 0.0630 Epoch: 4 Global Step: 171060 Fp16 Grad Scale: 32768 Required: 74 hours
Training: 2022-04-13 14:57:41,235-Speed 2613.39 samples/sec Loss 10.8083 LearningRate 0.0630 Epoch: 4 Global Step: 171070 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:57:45,139-Speed 2623.98 samples/sec Loss 10.8775 LearningRate 0.0630 Epoch: 4 Global Step: 171080 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:57:49,038-Speed 2627.26 samples/sec Loss 10.7247 LearningRate 0.0630 Epoch: 4 Global Step: 171090 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:57:52,930-Speed 2631.80 samples/sec Loss 10.8801 LearningRate 0.0630 Epoch: 4 Global Step: 171100 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:57:57,942-Speed 2043.11 samples/sec Loss 10.9099 LearningRate 0.0630 Epoch: 4 Global Step: 171110 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:58:01,956-Speed 2552.76 samples/sec Loss 10.8206 LearningRate 0.0630 Epoch: 4 Global Step: 171120 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:58:05,845-Speed 2633.83 samples/sec Loss 10.7613 LearningRate 0.0630 Epoch: 4 Global Step: 171130 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:58:09,757-Speed 2617.55 samples/sec Loss 10.7903 LearningRate 0.0630 Epoch: 4 Global Step: 171140 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:58:13,661-Speed 2623.81 samples/sec Loss 10.8582 LearningRate 0.0630 Epoch: 4 Global Step: 171150 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:58:17,558-Speed 2629.02 samples/sec Loss 10.7767 LearningRate 0.0630 Epoch: 4 Global Step: 171160 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 14:58:21,460-Speed 2624.14 samples/sec Loss 10.7972 LearningRate 0.0630 Epoch: 4 Global Step: 171170 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:58:25,366-Speed 2622.44 samples/sec Loss 10.6315 LearningRate 0.0630 Epoch: 4 Global Step: 171180 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:58:29,296-Speed 2606.56 samples/sec Loss 10.6412 LearningRate 0.0630 Epoch: 4 Global Step: 171190 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:58:33,211-Speed 2616.50 samples/sec Loss 10.7645 LearningRate 0.0630 Epoch: 4 Global Step: 171200 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:58:37,129-Speed 2614.08 samples/sec Loss 10.7257 LearningRate 0.0630 Epoch: 4 Global Step: 171210 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:58:41,024-Speed 2629.83 samples/sec Loss 10.8126 LearningRate 0.0630 Epoch: 4 Global Step: 171220 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:58:44,917-Speed 2631.60 samples/sec Loss 10.7253 LearningRate 0.0630 Epoch: 4 Global Step: 171230 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:58:48,823-Speed 2621.87 samples/sec Loss 10.8373 LearningRate 0.0630 Epoch: 4 Global Step: 171240 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:58:52,718-Speed 2630.30 samples/sec Loss 10.8687 LearningRate 0.0630 Epoch: 4 Global Step: 171250 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:58:56,612-Speed 2630.05 samples/sec Loss 10.9822 LearningRate 0.0630 Epoch: 4 Global Step: 171260 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:59:00,509-Speed 2627.81 samples/sec Loss 10.7715 LearningRate 0.0630 Epoch: 4 Global Step: 171270 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:59:04,432-Speed 2611.21 samples/sec Loss 10.6589 LearningRate 0.0630 Epoch: 4 Global Step: 171280 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:59:08,344-Speed 2618.27 samples/sec Loss 10.9594 LearningRate 0.0630 Epoch: 4 Global Step: 171290 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:59:12,241-Speed 2628.46 samples/sec Loss 10.7787 LearningRate 0.0630 Epoch: 4 Global Step: 171300 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:59:16,136-Speed 2629.90 samples/sec Loss 10.8857 LearningRate 0.0630 Epoch: 4 Global Step: 171310 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 14:59:20,048-Speed 2618.57 samples/sec Loss 10.9644 LearningRate 0.0630 Epoch: 4 Global Step: 171320 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:59:23,945-Speed 2627.84 samples/sec Loss 10.8266 LearningRate 0.0630 Epoch: 4 Global Step: 171330 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:59:27,837-Speed 2632.12 samples/sec Loss 10.9453 LearningRate 0.0630 Epoch: 4 Global Step: 171340 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:59:31,738-Speed 2625.76 samples/sec Loss 10.8112 LearningRate 0.0630 Epoch: 4 Global Step: 171350 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:59:35,636-Speed 2627.58 samples/sec Loss 10.9075 LearningRate 0.0630 Epoch: 4 Global Step: 171360 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:59:39,529-Speed 2630.66 samples/sec Loss 10.8749 LearningRate 0.0630 Epoch: 4 Global Step: 171370 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:59:43,452-Speed 2611.80 samples/sec Loss 10.9388 LearningRate 0.0630 Epoch: 4 Global Step: 171380 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:59:47,364-Speed 2617.54 samples/sec Loss 10.8823 LearningRate 0.0629 Epoch: 4 Global Step: 171390 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:59:51,273-Speed 2620.71 samples/sec Loss 10.8174 LearningRate 0.0629 Epoch: 4 Global Step: 171400 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:59:55,176-Speed 2624.17 samples/sec Loss 10.9107 LearningRate 0.0629 Epoch: 4 Global Step: 171410 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 14:59:59,072-Speed 2628.82 samples/sec Loss 10.8537 LearningRate 0.0629 Epoch: 4 Global Step: 171420 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:00:02,968-Speed 2629.54 samples/sec Loss 10.7312 LearningRate 0.0629 Epoch: 4 Global Step: 171430 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:00:06,844-Speed 2642.04 samples/sec Loss 10.8411 LearningRate 0.0629 Epoch: 4 Global Step: 171440 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:00:10,745-Speed 2625.77 samples/sec Loss 10.6726 LearningRate 0.0629 Epoch: 4 Global Step: 171450 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:00:14,640-Speed 2629.46 samples/sec Loss 10.6338 LearningRate 0.0629 Epoch: 4 Global Step: 171460 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:00:18,575-Speed 2603.39 samples/sec Loss 10.7419 LearningRate 0.0629 Epoch: 4 Global Step: 171470 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:00:22,473-Speed 2627.91 samples/sec Loss 10.8858 LearningRate 0.0629 Epoch: 4 Global Step: 171480 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:00:26,366-Speed 2630.73 samples/sec Loss 10.8297 LearningRate 0.0629 Epoch: 4 Global Step: 171490 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:00:30,262-Speed 2629.17 samples/sec Loss 10.8536 LearningRate 0.0629 Epoch: 4 Global Step: 171500 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:00:34,161-Speed 2627.18 samples/sec Loss 10.7999 LearningRate 0.0629 Epoch: 4 Global Step: 171510 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:00:38,056-Speed 2629.23 samples/sec Loss 10.9471 LearningRate 0.0629 Epoch: 4 Global Step: 171520 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:00:41,955-Speed 2627.60 samples/sec Loss 10.8861 LearningRate 0.0629 Epoch: 4 Global Step: 171530 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:00:45,863-Speed 2620.19 samples/sec Loss 10.7219 LearningRate 0.0629 Epoch: 4 Global Step: 171540 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:00:49,751-Speed 2634.79 samples/sec Loss 10.9466 LearningRate 0.0629 Epoch: 4 Global Step: 171550 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:00:53,647-Speed 2629.20 samples/sec Loss 10.7245 LearningRate 0.0629 Epoch: 4 Global Step: 171560 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:00:57,545-Speed 2628.02 samples/sec Loss 10.7904 LearningRate 0.0629 Epoch: 4 Global Step: 171570 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:01:01,439-Speed 2630.03 samples/sec Loss 10.7776 LearningRate 0.0629 Epoch: 4 Global Step: 171580 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:01:05,337-Speed 2627.36 samples/sec Loss 10.8350 LearningRate 0.0629 Epoch: 4 Global Step: 171590 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:01:09,252-Speed 2616.02 samples/sec Loss 10.8726 LearningRate 0.0629 Epoch: 4 Global Step: 171600 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:01:13,177-Speed 2609.55 samples/sec Loss 10.6939 LearningRate 0.0629 Epoch: 4 Global Step: 171610 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:01:17,716-Speed 2256.39 samples/sec Loss 10.7711 LearningRate 0.0629 Epoch: 4 Global Step: 171620 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:01:21,610-Speed 2630.88 samples/sec Loss 10.7439 LearningRate 0.0629 Epoch: 4 Global Step: 171630 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:01:25,510-Speed 2626.64 samples/sec Loss 10.7036 LearningRate 0.0629 Epoch: 4 Global Step: 171640 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:01:29,406-Speed 2628.64 samples/sec Loss 10.8593 LearningRate 0.0629 Epoch: 4 Global Step: 171650 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:01:33,310-Speed 2623.64 samples/sec Loss 10.8831 LearningRate 0.0629 Epoch: 4 Global Step: 171660 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:01:37,211-Speed 2625.29 samples/sec Loss 10.7542 LearningRate 0.0629 Epoch: 4 Global Step: 171670 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:01:41,107-Speed 2628.85 samples/sec Loss 10.8625 LearningRate 0.0629 Epoch: 4 Global Step: 171680 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:01:45,021-Speed 2616.82 samples/sec Loss 10.9070 LearningRate 0.0629 Epoch: 4 Global Step: 171690 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:01:48,967-Speed 2595.57 samples/sec Loss 10.8027 LearningRate 0.0629 Epoch: 4 Global Step: 171700 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:01:52,867-Speed 2626.51 samples/sec Loss 10.6318 LearningRate 0.0629 Epoch: 4 Global Step: 171710 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:01:56,769-Speed 2625.21 samples/sec Loss 10.8198 LearningRate 0.0629 Epoch: 4 Global Step: 171720 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:02:00,710-Speed 2598.69 samples/sec Loss 10.8320 LearningRate 0.0629 Epoch: 4 Global Step: 171730 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:02:04,606-Speed 2629.03 samples/sec Loss 10.8543 LearningRate 0.0629 Epoch: 4 Global Step: 171740 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:02:08,511-Speed 2623.05 samples/sec Loss 10.7318 LearningRate 0.0629 Epoch: 4 Global Step: 171750 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:02:12,516-Speed 2557.61 samples/sec Loss 10.8245 LearningRate 0.0629 Epoch: 4 Global Step: 171760 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:02:16,433-Speed 2614.68 samples/sec Loss 10.7077 LearningRate 0.0629 Epoch: 4 Global Step: 171770 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:02:20,331-Speed 2627.16 samples/sec Loss 10.6873 LearningRate 0.0629 Epoch: 4 Global Step: 171780 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:02:24,230-Speed 2626.98 samples/sec Loss 10.8784 LearningRate 0.0629 Epoch: 4 Global Step: 171790 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:02:28,133-Speed 2624.09 samples/sec Loss 10.7770 LearningRate 0.0629 Epoch: 4 Global Step: 171800 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:02:32,027-Speed 2630.78 samples/sec Loss 10.6041 LearningRate 0.0629 Epoch: 4 Global Step: 171810 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:02:35,927-Speed 2626.44 samples/sec Loss 10.8865 LearningRate 0.0629 Epoch: 4 Global Step: 171820 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:02:39,828-Speed 2625.57 samples/sec Loss 10.6881 LearningRate 0.0629 Epoch: 4 Global Step: 171830 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:02:43,723-Speed 2629.37 samples/sec Loss 10.8657 LearningRate 0.0629 Epoch: 4 Global Step: 171840 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:02:47,613-Speed 2632.91 samples/sec Loss 10.9765 LearningRate 0.0629 Epoch: 4 Global Step: 171850 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:02:51,512-Speed 2627.23 samples/sec Loss 10.7244 LearningRate 0.0629 Epoch: 4 Global Step: 171860 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:02:55,414-Speed 2625.14 samples/sec Loss 10.8713 LearningRate 0.0629 Epoch: 4 Global Step: 171870 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:02:59,313-Speed 2626.68 samples/sec Loss 10.8089 LearningRate 0.0629 Epoch: 4 Global Step: 171880 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:03:03,208-Speed 2630.35 samples/sec Loss 10.8048 LearningRate 0.0629 Epoch: 4 Global Step: 171890 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:03:07,102-Speed 2630.29 samples/sec Loss 10.8504 LearningRate 0.0629 Epoch: 4 Global Step: 171900 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:03:11,007-Speed 2622.73 samples/sec Loss 10.7104 LearningRate 0.0628 Epoch: 4 Global Step: 171910 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:03:14,901-Speed 2630.58 samples/sec Loss 10.8256 LearningRate 0.0628 Epoch: 4 Global Step: 171920 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:03:18,918-Speed 2549.89 samples/sec Loss 10.7599 LearningRate 0.0628 Epoch: 4 Global Step: 171930 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:03:22,822-Speed 2623.31 samples/sec Loss 10.7853 LearningRate 0.0628 Epoch: 4 Global Step: 171940 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:03:26,715-Speed 2631.22 samples/sec Loss 10.6901 LearningRate 0.0628 Epoch: 4 Global Step: 171950 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:03:30,611-Speed 2628.59 samples/sec Loss 10.7629 LearningRate 0.0628 Epoch: 4 Global Step: 171960 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:03:34,509-Speed 2628.14 samples/sec Loss 10.8013 LearningRate 0.0628 Epoch: 4 Global Step: 171970 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:03:38,403-Speed 2630.68 samples/sec Loss 10.6717 LearningRate 0.0628 Epoch: 4 Global Step: 171980 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:03:42,297-Speed 2630.15 samples/sec Loss 10.7060 LearningRate 0.0628 Epoch: 4 Global Step: 171990 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:03:46,195-Speed 2627.47 samples/sec Loss 10.8420 LearningRate 0.0628 Epoch: 4 Global Step: 172000 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:03:50,075-Speed 2639.55 samples/sec Loss 10.7104 LearningRate 0.0628 Epoch: 4 Global Step: 172010 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:03:53,964-Speed 2633.93 samples/sec Loss 10.8185 LearningRate 0.0628 Epoch: 4 Global Step: 172020 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:03:57,860-Speed 2629.27 samples/sec Loss 11.0383 LearningRate 0.0628 Epoch: 4 Global Step: 172030 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:04:01,777-Speed 2614.80 samples/sec Loss 10.8193 LearningRate 0.0628 Epoch: 4 Global Step: 172040 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:04:05,673-Speed 2628.92 samples/sec Loss 10.8381 LearningRate 0.0628 Epoch: 4 Global Step: 172050 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:04:09,569-Speed 2629.38 samples/sec Loss 10.7874 LearningRate 0.0628 Epoch: 4 Global Step: 172060 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:04:13,467-Speed 2627.07 samples/sec Loss 10.7885 LearningRate 0.0628 Epoch: 4 Global Step: 172070 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:04:17,391-Speed 2610.66 samples/sec Loss 10.7678 LearningRate 0.0628 Epoch: 4 Global Step: 172080 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:04:21,284-Speed 2631.34 samples/sec Loss 10.6034 LearningRate 0.0628 Epoch: 4 Global Step: 172090 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:04:25,177-Speed 2630.76 samples/sec Loss 10.7601 LearningRate 0.0628 Epoch: 4 Global Step: 172100 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:04:29,092-Speed 2616.62 samples/sec Loss 10.9031 LearningRate 0.0628 Epoch: 4 Global Step: 172110 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:04:32,987-Speed 2629.39 samples/sec Loss 10.8284 LearningRate 0.0628 Epoch: 4 Global Step: 172120 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:04:36,892-Speed 2622.76 samples/sec Loss 10.6740 LearningRate 0.0628 Epoch: 4 Global Step: 172130 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:04:40,783-Speed 2632.48 samples/sec Loss 10.6820 LearningRate 0.0628 Epoch: 4 Global Step: 172140 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:04:44,677-Speed 2630.46 samples/sec Loss 10.6698 LearningRate 0.0628 Epoch: 4 Global Step: 172150 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:04:48,570-Speed 2630.89 samples/sec Loss 10.7947 LearningRate 0.0628 Epoch: 4 Global Step: 172160 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:04:52,475-Speed 2622.85 samples/sec Loss 10.7849 LearningRate 0.0628 Epoch: 4 Global Step: 172170 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:04:56,389-Speed 2617.16 samples/sec Loss 10.7802 LearningRate 0.0628 Epoch: 4 Global Step: 172180 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:05:00,284-Speed 2629.73 samples/sec Loss 10.6254 LearningRate 0.0628 Epoch: 4 Global Step: 172190 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:05:04,199-Speed 2615.73 samples/sec Loss 10.8117 LearningRate 0.0628 Epoch: 4 Global Step: 172200 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:05:08,122-Speed 2611.16 samples/sec Loss 10.7770 LearningRate 0.0628 Epoch: 4 Global Step: 172210 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:05:12,026-Speed 2623.54 samples/sec Loss 10.6998 LearningRate 0.0628 Epoch: 4 Global Step: 172220 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:05:16,158-Speed 2478.92 samples/sec Loss 10.7446 LearningRate 0.0628 Epoch: 4 Global Step: 172230 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:05:20,063-Speed 2622.45 samples/sec Loss 10.8250 LearningRate 0.0628 Epoch: 4 Global Step: 172240 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:05:23,954-Speed 2632.19 samples/sec Loss 10.8549 LearningRate 0.0628 Epoch: 4 Global Step: 172250 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:05:27,846-Speed 2632.19 samples/sec Loss 10.7645 LearningRate 0.0628 Epoch: 4 Global Step: 172260 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:05:31,739-Speed 2631.55 samples/sec Loss 10.6717 LearningRate 0.0628 Epoch: 4 Global Step: 172270 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:05:35,629-Speed 2632.82 samples/sec Loss 10.7440 LearningRate 0.0628 Epoch: 4 Global Step: 172280 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:05:39,526-Speed 2628.16 samples/sec Loss 10.7260 LearningRate 0.0628 Epoch: 4 Global Step: 172290 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:05:43,419-Speed 2631.47 samples/sec Loss 10.7482 LearningRate 0.0628 Epoch: 4 Global Step: 172300 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:05:47,315-Speed 2628.55 samples/sec Loss 10.9120 LearningRate 0.0628 Epoch: 4 Global Step: 172310 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:05:51,208-Speed 2631.63 samples/sec Loss 10.7968 LearningRate 0.0628 Epoch: 4 Global Step: 172320 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:05:55,110-Speed 2624.83 samples/sec Loss 10.7646 LearningRate 0.0628 Epoch: 4 Global Step: 172330 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:05:58,993-Speed 2638.77 samples/sec Loss 10.7389 LearningRate 0.0628 Epoch: 4 Global Step: 172340 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:06:02,888-Speed 2629.51 samples/sec Loss 10.7611 LearningRate 0.0628 Epoch: 4 Global Step: 172350 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:06:06,790-Speed 2624.84 samples/sec Loss 10.6705 LearningRate 0.0628 Epoch: 4 Global Step: 172360 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:06:10,730-Speed 2599.70 samples/sec Loss 10.7131 LearningRate 0.0628 Epoch: 4 Global Step: 172370 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:06:14,625-Speed 2630.28 samples/sec Loss 11.0811 LearningRate 0.0628 Epoch: 4 Global Step: 172380 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:06:18,534-Speed 2620.36 samples/sec Loss 10.7106 LearningRate 0.0628 Epoch: 4 Global Step: 172390 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:06:22,433-Speed 2627.15 samples/sec Loss 10.8176 LearningRate 0.0628 Epoch: 4 Global Step: 172400 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:06:26,331-Speed 2627.52 samples/sec Loss 10.7779 LearningRate 0.0628 Epoch: 4 Global Step: 172410 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:06:30,223-Speed 2631.69 samples/sec Loss 10.7210 LearningRate 0.0628 Epoch: 4 Global Step: 172420 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:06:34,121-Speed 2627.97 samples/sec Loss 10.8464 LearningRate 0.0627 Epoch: 4 Global Step: 172430 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:06:38,012-Speed 2632.17 samples/sec Loss 10.7880 LearningRate 0.0627 Epoch: 4 Global Step: 172440 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:06:41,910-Speed 2627.59 samples/sec Loss 10.8712 LearningRate 0.0627 Epoch: 4 Global Step: 172450 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:06:45,815-Speed 2623.10 samples/sec Loss 10.7549 LearningRate 0.0627 Epoch: 4 Global Step: 172460 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:06:49,727-Speed 2617.92 samples/sec Loss 10.6915 LearningRate 0.0627 Epoch: 4 Global Step: 172470 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:06:53,619-Speed 2631.65 samples/sec Loss 10.6299 LearningRate 0.0627 Epoch: 4 Global Step: 172480 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:06:57,516-Speed 2628.52 samples/sec Loss 10.6828 LearningRate 0.0627 Epoch: 4 Global Step: 172490 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:07:01,412-Speed 2629.44 samples/sec Loss 10.7927 LearningRate 0.0627 Epoch: 4 Global Step: 172500 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:07:05,320-Speed 2620.46 samples/sec Loss 10.7660 LearningRate 0.0627 Epoch: 4 Global Step: 172510 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:07:09,214-Speed 2629.91 samples/sec Loss 10.8600 LearningRate 0.0627 Epoch: 4 Global Step: 172520 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:07:13,110-Speed 2629.60 samples/sec Loss 10.9167 LearningRate 0.0627 Epoch: 4 Global Step: 172530 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:07:16,991-Speed 2638.94 samples/sec Loss 10.8000 LearningRate 0.0627 Epoch: 4 Global Step: 172540 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:07:20,882-Speed 2632.66 samples/sec Loss 10.6127 LearningRate 0.0627 Epoch: 4 Global Step: 172550 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:07:24,812-Speed 2606.44 samples/sec Loss 10.8799 LearningRate 0.0627 Epoch: 4 Global Step: 172560 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:07:28,705-Speed 2631.51 samples/sec Loss 10.6614 LearningRate 0.0627 Epoch: 4 Global Step: 172570 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:07:32,609-Speed 2623.21 samples/sec Loss 10.7932 LearningRate 0.0627 Epoch: 4 Global Step: 172580 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:07:36,500-Speed 2632.25 samples/sec Loss 10.8804 LearningRate 0.0627 Epoch: 4 Global Step: 172590 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:07:40,397-Speed 2628.56 samples/sec Loss 10.7321 LearningRate 0.0627 Epoch: 4 Global Step: 172600 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:07:44,323-Speed 2609.10 samples/sec Loss 10.9244 LearningRate 0.0627 Epoch: 4 Global Step: 172610 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:07:48,237-Speed 2616.25 samples/sec Loss 10.8323 LearningRate 0.0627 Epoch: 4 Global Step: 172620 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:07:52,146-Speed 2620.28 samples/sec Loss 10.7039 LearningRate 0.0627 Epoch: 4 Global Step: 172630 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:07:56,041-Speed 2630.04 samples/sec Loss 10.6951 LearningRate 0.0627 Epoch: 4 Global Step: 172640 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:07:59,936-Speed 2629.90 samples/sec Loss 10.6590 LearningRate 0.0627 Epoch: 4 Global Step: 172650 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:08:03,833-Speed 2628.28 samples/sec Loss 10.8381 LearningRate 0.0627 Epoch: 4 Global Step: 172660 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:08:07,748-Speed 2616.10 samples/sec Loss 10.8175 LearningRate 0.0627 Epoch: 4 Global Step: 172670 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:08:11,644-Speed 2629.07 samples/sec Loss 10.7102 LearningRate 0.0627 Epoch: 4 Global Step: 172680 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:08:15,536-Speed 2631.63 samples/sec Loss 10.7960 LearningRate 0.0627 Epoch: 4 Global Step: 172690 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:08:19,429-Speed 2631.59 samples/sec Loss 10.7356 LearningRate 0.0627 Epoch: 4 Global Step: 172700 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:08:23,323-Speed 2630.02 samples/sec Loss 10.7800 LearningRate 0.0627 Epoch: 4 Global Step: 172710 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:08:27,231-Speed 2620.78 samples/sec Loss 10.8633 LearningRate 0.0627 Epoch: 4 Global Step: 172720 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:08:31,131-Speed 2626.27 samples/sec Loss 10.6385 LearningRate 0.0627 Epoch: 4 Global Step: 172730 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:08:35,027-Speed 2629.72 samples/sec Loss 10.8101 LearningRate 0.0627 Epoch: 4 Global Step: 172740 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:08:38,902-Speed 2642.71 samples/sec Loss 10.9086 LearningRate 0.0627 Epoch: 4 Global Step: 172750 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:08:42,796-Speed 2630.16 samples/sec Loss 10.8875 LearningRate 0.0627 Epoch: 4 Global Step: 172760 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:08:46,692-Speed 2629.07 samples/sec Loss 10.6318 LearningRate 0.0627 Epoch: 4 Global Step: 172770 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:08:50,594-Speed 2624.98 samples/sec Loss 10.6061 LearningRate 0.0627 Epoch: 4 Global Step: 172780 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:08:54,493-Speed 2627.14 samples/sec Loss 10.7264 LearningRate 0.0627 Epoch: 4 Global Step: 172790 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:08:58,390-Speed 2628.36 samples/sec Loss 10.6995 LearningRate 0.0627 Epoch: 4 Global Step: 172800 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:09:02,282-Speed 2631.89 samples/sec Loss 10.8567 LearningRate 0.0627 Epoch: 4 Global Step: 172810 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:09:06,190-Speed 2620.73 samples/sec Loss 10.8632 LearningRate 0.0627 Epoch: 4 Global Step: 172820 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:09:10,087-Speed 2628.61 samples/sec Loss 10.6662 LearningRate 0.0627 Epoch: 4 Global Step: 172830 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:09:13,988-Speed 2625.39 samples/sec Loss 10.7887 LearningRate 0.0627 Epoch: 4 Global Step: 172840 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:09:17,886-Speed 2627.68 samples/sec Loss 10.7631 LearningRate 0.0627 Epoch: 4 Global Step: 172850 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:09:21,780-Speed 2630.25 samples/sec Loss 10.7577 LearningRate 0.0627 Epoch: 4 Global Step: 172860 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:09:25,672-Speed 2632.06 samples/sec Loss 10.6995 LearningRate 0.0627 Epoch: 4 Global Step: 172870 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:09:29,568-Speed 2629.17 samples/sec Loss 10.8272 LearningRate 0.0627 Epoch: 4 Global Step: 172880 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:09:33,463-Speed 2629.77 samples/sec Loss 10.7659 LearningRate 0.0627 Epoch: 4 Global Step: 172890 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:09:37,381-Speed 2613.80 samples/sec Loss 10.8155 LearningRate 0.0627 Epoch: 4 Global Step: 172900 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:09:41,274-Speed 2630.74 samples/sec Loss 10.8455 LearningRate 0.0627 Epoch: 4 Global Step: 172910 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:09:45,170-Speed 2629.07 samples/sec Loss 10.8940 LearningRate 0.0627 Epoch: 4 Global Step: 172920 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:09:49,077-Speed 2621.74 samples/sec Loss 10.8651 LearningRate 0.0627 Epoch: 4 Global Step: 172930 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:09:52,968-Speed 2632.37 samples/sec Loss 10.9635 LearningRate 0.0627 Epoch: 4 Global Step: 172940 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:09:56,843-Speed 2643.31 samples/sec Loss 11.3575 LearningRate 0.0627 Epoch: 4 Global Step: 172950 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:10:00,738-Speed 2629.54 samples/sec Loss 10.9786 LearningRate 0.0626 Epoch: 4 Global Step: 172960 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:10:04,674-Speed 2602.58 samples/sec Loss 10.8423 LearningRate 0.0626 Epoch: 4 Global Step: 172970 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:10:08,563-Speed 2633.55 samples/sec Loss 11.1827 LearningRate 0.0626 Epoch: 4 Global Step: 172980 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:10:12,454-Speed 2632.62 samples/sec Loss 10.8828 LearningRate 0.0626 Epoch: 4 Global Step: 172990 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:10:16,346-Speed 2631.49 samples/sec Loss 10.7993 LearningRate 0.0626 Epoch: 4 Global Step: 173000 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:10:20,260-Speed 2616.87 samples/sec Loss 11.0231 LearningRate 0.0626 Epoch: 4 Global Step: 173010 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:10:24,165-Speed 2623.50 samples/sec Loss 10.8502 LearningRate 0.0626 Epoch: 4 Global Step: 173020 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:10:28,056-Speed 2632.33 samples/sec Loss 10.8482 LearningRate 0.0626 Epoch: 4 Global Step: 173030 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:10:31,988-Speed 2604.60 samples/sec Loss 10.7245 LearningRate 0.0626 Epoch: 4 Global Step: 173040 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:10:35,891-Speed 2624.32 samples/sec Loss 10.8305 LearningRate 0.0626 Epoch: 4 Global Step: 173050 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:10:39,788-Speed 2628.40 samples/sec Loss 10.8836 LearningRate 0.0626 Epoch: 4 Global Step: 173060 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:10:43,701-Speed 2617.51 samples/sec Loss 10.8909 LearningRate 0.0626 Epoch: 4 Global Step: 173070 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:10:47,593-Speed 2631.70 samples/sec Loss 10.6543 LearningRate 0.0626 Epoch: 4 Global Step: 173080 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:10:51,472-Speed 2640.89 samples/sec Loss 10.8102 LearningRate 0.0626 Epoch: 4 Global Step: 173090 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:10:55,368-Speed 2629.01 samples/sec Loss 10.6681 LearningRate 0.0626 Epoch: 4 Global Step: 173100 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:10:59,275-Speed 2622.05 samples/sec Loss 10.9143 LearningRate 0.0626 Epoch: 4 Global Step: 173110 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:11:03,175-Speed 2625.84 samples/sec Loss 10.7019 LearningRate 0.0626 Epoch: 4 Global Step: 173120 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:11:07,065-Speed 2632.70 samples/sec Loss 10.8697 LearningRate 0.0626 Epoch: 4 Global Step: 173130 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:11:10,964-Speed 2626.90 samples/sec Loss 10.8202 LearningRate 0.0626 Epoch: 4 Global Step: 173140 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:11:14,866-Speed 2625.27 samples/sec Loss 10.8934 LearningRate 0.0626 Epoch: 4 Global Step: 173150 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:11:18,759-Speed 2630.28 samples/sec Loss 10.7429 LearningRate 0.0626 Epoch: 4 Global Step: 173160 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:11:22,651-Speed 2632.36 samples/sec Loss 10.8234 LearningRate 0.0626 Epoch: 4 Global Step: 173170 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:11:26,550-Speed 2627.20 samples/sec Loss 10.7531 LearningRate 0.0626 Epoch: 4 Global Step: 173180 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:11:30,465-Speed 2615.99 samples/sec Loss 10.7279 LearningRate 0.0626 Epoch: 4 Global Step: 173190 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:11:34,382-Speed 2614.74 samples/sec Loss 10.7632 LearningRate 0.0626 Epoch: 4 Global Step: 173200 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:11:38,292-Speed 2619.40 samples/sec Loss 10.8112 LearningRate 0.0626 Epoch: 4 Global Step: 173210 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:11:42,187-Speed 2629.75 samples/sec Loss 10.9360 LearningRate 0.0626 Epoch: 4 Global Step: 173220 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:11:46,091-Speed 2623.25 samples/sec Loss 10.8704 LearningRate 0.0626 Epoch: 4 Global Step: 173230 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:11:49,987-Speed 2629.13 samples/sec Loss 10.7579 LearningRate 0.0626 Epoch: 4 Global Step: 173240 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:11:53,880-Speed 2631.02 samples/sec Loss 10.8565 LearningRate 0.0626 Epoch: 4 Global Step: 173250 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:11:57,915-Speed 2538.25 samples/sec Loss 10.7464 LearningRate 0.0626 Epoch: 4 Global Step: 173260 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:12:01,810-Speed 2629.71 samples/sec Loss 10.8736 LearningRate 0.0626 Epoch: 4 Global Step: 173270 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:12:05,704-Speed 2629.99 samples/sec Loss 10.8719 LearningRate 0.0626 Epoch: 4 Global Step: 173280 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:12:09,603-Speed 2627.32 samples/sec Loss 10.7459 LearningRate 0.0626 Epoch: 4 Global Step: 173290 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:12:13,496-Speed 2631.07 samples/sec Loss 10.7601 LearningRate 0.0626 Epoch: 4 Global Step: 173300 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:12:17,418-Speed 2610.90 samples/sec Loss 10.8656 LearningRate 0.0626 Epoch: 4 Global Step: 173310 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:12:21,400-Speed 2572.26 samples/sec Loss 10.8739 LearningRate 0.0626 Epoch: 4 Global Step: 173320 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:12:25,295-Speed 2629.51 samples/sec Loss 10.7896 LearningRate 0.0626 Epoch: 4 Global Step: 173330 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:12:29,191-Speed 2628.98 samples/sec Loss 10.5838 LearningRate 0.0626 Epoch: 4 Global Step: 173340 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:12:33,085-Speed 2630.64 samples/sec Loss 10.7678 LearningRate 0.0626 Epoch: 4 Global Step: 173350 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:12:36,974-Speed 2633.35 samples/sec Loss 10.7598 LearningRate 0.0626 Epoch: 4 Global Step: 173360 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:12:40,864-Speed 2632.79 samples/sec Loss 10.7016 LearningRate 0.0626 Epoch: 4 Global Step: 173370 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:12:44,780-Speed 2616.09 samples/sec Loss 10.7955 LearningRate 0.0626 Epoch: 4 Global Step: 173380 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:12:48,700-Speed 2612.37 samples/sec Loss 10.7060 LearningRate 0.0626 Epoch: 4 Global Step: 173390 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:12:52,604-Speed 2623.58 samples/sec Loss 10.9020 LearningRate 0.0626 Epoch: 4 Global Step: 173400 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:12:56,499-Speed 2629.32 samples/sec Loss 10.5962 LearningRate 0.0626 Epoch: 4 Global Step: 173410 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:13:00,376-Speed 2642.14 samples/sec Loss 10.7046 LearningRate 0.0626 Epoch: 4 Global Step: 173420 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:13:04,267-Speed 2632.17 samples/sec Loss 10.8801 LearningRate 0.0626 Epoch: 4 Global Step: 173430 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:13:08,157-Speed 2633.10 samples/sec Loss 10.7361 LearningRate 0.0626 Epoch: 4 Global Step: 173440 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:13:12,046-Speed 2633.45 samples/sec Loss 10.6341 LearningRate 0.0626 Epoch: 4 Global Step: 173450 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:13:15,936-Speed 2633.24 samples/sec Loss 10.7365 LearningRate 0.0626 Epoch: 4 Global Step: 173460 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:13:19,830-Speed 2630.54 samples/sec Loss 10.7617 LearningRate 0.0626 Epoch: 4 Global Step: 173470 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:13:23,730-Speed 2625.93 samples/sec Loss 10.7822 LearningRate 0.0625 Epoch: 4 Global Step: 173480 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:13:27,629-Speed 2627.03 samples/sec Loss 10.7678 LearningRate 0.0625 Epoch: 4 Global Step: 173490 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:13:31,532-Speed 2624.23 samples/sec Loss 10.7415 LearningRate 0.0625 Epoch: 4 Global Step: 173500 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:13:35,434-Speed 2624.88 samples/sec Loss 10.6994 LearningRate 0.0625 Epoch: 4 Global Step: 173510 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:13:39,384-Speed 2592.89 samples/sec Loss 10.7858 LearningRate 0.0625 Epoch: 4 Global Step: 173520 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:13:43,258-Speed 2643.47 samples/sec Loss 10.8031 LearningRate 0.0625 Epoch: 4 Global Step: 173530 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:13:47,157-Speed 2626.90 samples/sec Loss 10.7423 LearningRate 0.0625 Epoch: 4 Global Step: 173540 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:13:51,052-Speed 2630.15 samples/sec Loss 10.7350 LearningRate 0.0625 Epoch: 4 Global Step: 173550 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:13:54,946-Speed 2629.97 samples/sec Loss 10.7598 LearningRate 0.0625 Epoch: 4 Global Step: 173560 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:13:58,844-Speed 2628.14 samples/sec Loss 10.7751 LearningRate 0.0625 Epoch: 4 Global Step: 173570 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:14:02,740-Speed 2628.75 samples/sec Loss 10.6500 LearningRate 0.0625 Epoch: 4 Global Step: 173580 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:14:06,635-Speed 2628.97 samples/sec Loss 10.7573 LearningRate 0.0625 Epoch: 4 Global Step: 173590 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:14:10,543-Speed 2620.95 samples/sec Loss 10.8009 LearningRate 0.0625 Epoch: 4 Global Step: 173600 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:14:14,438-Speed 2629.45 samples/sec Loss 10.7831 LearningRate 0.0625 Epoch: 4 Global Step: 173610 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:14:18,332-Speed 2630.44 samples/sec Loss 10.7263 LearningRate 0.0625 Epoch: 4 Global Step: 173620 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:14:22,234-Speed 2625.13 samples/sec Loss 10.7962 LearningRate 0.0625 Epoch: 4 Global Step: 173630 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:14:26,126-Speed 2631.81 samples/sec Loss 10.7702 LearningRate 0.0625 Epoch: 4 Global Step: 173640 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:14:30,012-Speed 2635.60 samples/sec Loss 10.8154 LearningRate 0.0625 Epoch: 4 Global Step: 173650 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:14:33,898-Speed 2635.46 samples/sec Loss 10.7184 LearningRate 0.0625 Epoch: 4 Global Step: 173660 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:14:37,789-Speed 2632.49 samples/sec Loss 10.5403 LearningRate 0.0625 Epoch: 4 Global Step: 173670 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:14:41,680-Speed 2632.14 samples/sec Loss 10.8104 LearningRate 0.0625 Epoch: 4 Global Step: 173680 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:14:45,584-Speed 2623.36 samples/sec Loss 10.6946 LearningRate 0.0625 Epoch: 4 Global Step: 173690 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:14:49,468-Speed 2637.32 samples/sec Loss 10.9046 LearningRate 0.0625 Epoch: 4 Global Step: 173700 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:14:53,357-Speed 2633.40 samples/sec Loss 10.6771 LearningRate 0.0625 Epoch: 4 Global Step: 173710 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:14:57,249-Speed 2631.66 samples/sec Loss 10.7664 LearningRate 0.0625 Epoch: 4 Global Step: 173720 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:15:01,139-Speed 2633.33 samples/sec Loss 10.8431 LearningRate 0.0625 Epoch: 4 Global Step: 173730 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:15:05,043-Speed 2623.60 samples/sec Loss 10.8031 LearningRate 0.0625 Epoch: 4 Global Step: 173740 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:15:08,934-Speed 2632.35 samples/sec Loss 10.8083 LearningRate 0.0625 Epoch: 4 Global Step: 173750 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:15:12,832-Speed 2627.43 samples/sec Loss 10.7510 LearningRate 0.0625 Epoch: 4 Global Step: 173760 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:15:16,722-Speed 2632.55 samples/sec Loss 10.6500 LearningRate 0.0625 Epoch: 4 Global Step: 173770 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:15:20,611-Speed 2634.09 samples/sec Loss 10.7489 LearningRate 0.0625 Epoch: 4 Global Step: 173780 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:15:24,514-Speed 2624.05 samples/sec Loss 10.7568 LearningRate 0.0625 Epoch: 4 Global Step: 173790 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:15:28,418-Speed 2623.69 samples/sec Loss 10.6574 LearningRate 0.0625 Epoch: 4 Global Step: 173800 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:15:32,324-Speed 2621.80 samples/sec Loss 10.7708 LearningRate 0.0625 Epoch: 4 Global Step: 173810 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:15:36,224-Speed 2626.57 samples/sec Loss 10.7326 LearningRate 0.0625 Epoch: 4 Global Step: 173820 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:15:40,124-Speed 2626.03 samples/sec Loss 10.8245 LearningRate 0.0625 Epoch: 4 Global Step: 173830 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:15:44,016-Speed 2631.44 samples/sec Loss 10.7205 LearningRate 0.0625 Epoch: 4 Global Step: 173840 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:15:47,912-Speed 2628.99 samples/sec Loss 10.7694 LearningRate 0.0625 Epoch: 4 Global Step: 173850 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:15:51,805-Speed 2631.13 samples/sec Loss 10.6566 LearningRate 0.0625 Epoch: 4 Global Step: 173860 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:15:55,698-Speed 2631.07 samples/sec Loss 10.8494 LearningRate 0.0625 Epoch: 4 Global Step: 173870 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:15:59,594-Speed 2628.92 samples/sec Loss 10.7814 LearningRate 0.0625 Epoch: 4 Global Step: 173880 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:16:03,491-Speed 2628.38 samples/sec Loss 10.9482 LearningRate 0.0625 Epoch: 4 Global Step: 173890 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:16:07,386-Speed 2629.77 samples/sec Loss 10.7851 LearningRate 0.0625 Epoch: 4 Global Step: 173900 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:16:11,601-Speed 2429.77 samples/sec Loss 10.7236 LearningRate 0.0625 Epoch: 4 Global Step: 173910 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:16:15,494-Speed 2631.03 samples/sec Loss 10.7261 LearningRate 0.0625 Epoch: 4 Global Step: 173920 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:16:19,391-Speed 2628.26 samples/sec Loss 10.8019 LearningRate 0.0625 Epoch: 4 Global Step: 173930 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:16:23,285-Speed 2629.89 samples/sec Loss 10.6601 LearningRate 0.0625 Epoch: 4 Global Step: 173940 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:16:27,177-Speed 2631.84 samples/sec Loss 10.7178 LearningRate 0.0625 Epoch: 4 Global Step: 173950 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:16:31,075-Speed 2627.95 samples/sec Loss 10.7053 LearningRate 0.0625 Epoch: 4 Global Step: 173960 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:16:34,970-Speed 2629.70 samples/sec Loss 10.7943 LearningRate 0.0625 Epoch: 4 Global Step: 173970 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:16:38,857-Speed 2634.27 samples/sec Loss 10.7280 LearningRate 0.0625 Epoch: 4 Global Step: 173980 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:16:42,733-Speed 2642.77 samples/sec Loss 10.7603 LearningRate 0.0625 Epoch: 4 Global Step: 173990 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:16:46,625-Speed 2631.99 samples/sec Loss 10.6746 LearningRate 0.0625 Epoch: 4 Global Step: 174000 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:16:50,519-Speed 2630.10 samples/sec Loss 10.6649 LearningRate 0.0624 Epoch: 4 Global Step: 174010 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:16:54,413-Speed 2630.46 samples/sec Loss 10.7759 LearningRate 0.0624 Epoch: 4 Global Step: 174020 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:16:58,305-Speed 2631.30 samples/sec Loss 10.6874 LearningRate 0.0624 Epoch: 4 Global Step: 174030 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:17:02,212-Speed 2621.70 samples/sec Loss 10.5995 LearningRate 0.0624 Epoch: 4 Global Step: 174040 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:17:06,134-Speed 2611.54 samples/sec Loss 10.8607 LearningRate 0.0624 Epoch: 4 Global Step: 174050 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:17:10,051-Speed 2614.60 samples/sec Loss 10.7701 LearningRate 0.0624 Epoch: 4 Global Step: 174060 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:17:13,990-Speed 2600.08 samples/sec Loss 10.7226 LearningRate 0.0624 Epoch: 4 Global Step: 174070 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:17:17,911-Speed 2611.84 samples/sec Loss 10.8025 LearningRate 0.0624 Epoch: 4 Global Step: 174080 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:17:21,814-Speed 2624.75 samples/sec Loss 10.8090 LearningRate 0.0624 Epoch: 4 Global Step: 174090 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:17:25,711-Speed 2628.10 samples/sec Loss 10.8929 LearningRate 0.0624 Epoch: 4 Global Step: 174100 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:17:29,611-Speed 2626.59 samples/sec Loss 10.6142 LearningRate 0.0624 Epoch: 4 Global Step: 174110 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:17:33,501-Speed 2633.09 samples/sec Loss 10.6885 LearningRate 0.0624 Epoch: 4 Global Step: 174120 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:17:37,392-Speed 2631.68 samples/sec Loss 10.9025 LearningRate 0.0624 Epoch: 4 Global Step: 174130 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:17:41,285-Speed 2631.09 samples/sec Loss 10.7672 LearningRate 0.0624 Epoch: 4 Global Step: 174140 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:17:45,184-Speed 2627.24 samples/sec Loss 10.8159 LearningRate 0.0624 Epoch: 4 Global Step: 174150 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:17:50,415-Speed 1957.52 samples/sec Loss 10.6434 LearningRate 0.0624 Epoch: 4 Global Step: 174160 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:17:54,306-Speed 2632.58 samples/sec Loss 10.7697 LearningRate 0.0624 Epoch: 4 Global Step: 174170 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:17:58,204-Speed 2627.40 samples/sec Loss 10.9157 LearningRate 0.0624 Epoch: 4 Global Step: 174180 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:18:02,095-Speed 2632.57 samples/sec Loss 10.8239 LearningRate 0.0624 Epoch: 4 Global Step: 174190 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:18:05,989-Speed 2630.21 samples/sec Loss 10.8369 LearningRate 0.0624 Epoch: 4 Global Step: 174200 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:18:09,896-Speed 2621.40 samples/sec Loss 10.8767 LearningRate 0.0624 Epoch: 4 Global Step: 174210 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:18:13,791-Speed 2629.66 samples/sec Loss 10.6767 LearningRate 0.0624 Epoch: 4 Global Step: 174220 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:18:17,684-Speed 2630.98 samples/sec Loss 10.6976 LearningRate 0.0624 Epoch: 4 Global Step: 174230 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:18:21,557-Speed 2644.57 samples/sec Loss 10.7081 LearningRate 0.0624 Epoch: 4 Global Step: 174240 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:18:25,449-Speed 2631.35 samples/sec Loss 10.7545 LearningRate 0.0624 Epoch: 4 Global Step: 174250 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:18:29,342-Speed 2631.38 samples/sec Loss 10.7009 LearningRate 0.0624 Epoch: 4 Global Step: 174260 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:18:33,247-Speed 2622.39 samples/sec Loss 10.7810 LearningRate 0.0624 Epoch: 4 Global Step: 174270 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:18:37,140-Speed 2630.75 samples/sec Loss 10.7507 LearningRate 0.0624 Epoch: 4 Global Step: 174280 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:18:41,029-Speed 2633.50 samples/sec Loss 10.8639 LearningRate 0.0624 Epoch: 4 Global Step: 174290 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:18:44,928-Speed 2627.81 samples/sec Loss 10.7496 LearningRate 0.0624 Epoch: 4 Global Step: 174300 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:18:48,822-Speed 2630.29 samples/sec Loss 10.6550 LearningRate 0.0624 Epoch: 4 Global Step: 174310 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:18:52,717-Speed 2629.56 samples/sec Loss 10.6201 LearningRate 0.0624 Epoch: 4 Global Step: 174320 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:18:56,612-Speed 2629.34 samples/sec Loss 10.6121 LearningRate 0.0624 Epoch: 4 Global Step: 174330 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:19:00,490-Speed 2641.29 samples/sec Loss 11.4523 LearningRate 0.0624 Epoch: 4 Global Step: 174340 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:19:04,389-Speed 2626.48 samples/sec Loss 11.2824 LearningRate 0.0624 Epoch: 4 Global Step: 174350 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:19:08,293-Speed 2623.82 samples/sec Loss 10.9648 LearningRate 0.0624 Epoch: 4 Global Step: 174360 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:19:12,187-Speed 2629.96 samples/sec Loss 10.9998 LearningRate 0.0624 Epoch: 4 Global Step: 174370 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:19:16,078-Speed 2632.48 samples/sec Loss 10.8710 LearningRate 0.0624 Epoch: 4 Global Step: 174380 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:19:19,969-Speed 2632.19 samples/sec Loss 10.6671 LearningRate 0.0624 Epoch: 4 Global Step: 174390 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:19:23,868-Speed 2627.12 samples/sec Loss 10.8038 LearningRate 0.0624 Epoch: 4 Global Step: 174400 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:19:27,776-Speed 2621.41 samples/sec Loss 10.8161 LearningRate 0.0624 Epoch: 4 Global Step: 174410 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:19:31,667-Speed 2631.95 samples/sec Loss 10.7887 LearningRate 0.0624 Epoch: 4 Global Step: 174420 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:19:35,575-Speed 2620.65 samples/sec Loss 10.7683 LearningRate 0.0624 Epoch: 4 Global Step: 174430 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:19:39,489-Speed 2616.57 samples/sec Loss 10.8141 LearningRate 0.0624 Epoch: 4 Global Step: 174440 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:19:43,384-Speed 2630.19 samples/sec Loss 10.6233 LearningRate 0.0624 Epoch: 4 Global Step: 174450 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:19:47,353-Speed 2580.27 samples/sec Loss 10.6904 LearningRate 0.0624 Epoch: 4 Global Step: 174460 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:19:51,447-Speed 2501.97 samples/sec Loss 10.8299 LearningRate 0.0624 Epoch: 4 Global Step: 174470 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:19:55,459-Speed 2553.08 samples/sec Loss 10.8537 LearningRate 0.0624 Epoch: 4 Global Step: 174480 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:19:59,352-Speed 2631.24 samples/sec Loss 10.7206 LearningRate 0.0624 Epoch: 4 Global Step: 174490 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:20:03,245-Speed 2630.49 samples/sec Loss 10.8175 LearningRate 0.0624 Epoch: 4 Global Step: 174500 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:20:07,141-Speed 2629.33 samples/sec Loss 10.6776 LearningRate 0.0624 Epoch: 4 Global Step: 174510 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:20:11,045-Speed 2623.14 samples/sec Loss 10.7478 LearningRate 0.0624 Epoch: 4 Global Step: 174520 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:20:14,951-Speed 2622.37 samples/sec Loss 10.6877 LearningRate 0.0623 Epoch: 4 Global Step: 174530 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:20:18,833-Speed 2638.16 samples/sec Loss 10.9442 LearningRate 0.0623 Epoch: 4 Global Step: 174540 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:20:22,725-Speed 2631.75 samples/sec Loss 10.7491 LearningRate 0.0623 Epoch: 4 Global Step: 174550 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:20:27,787-Speed 2023.45 samples/sec Loss 10.7297 LearningRate 0.0623 Epoch: 4 Global Step: 174560 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:20:31,691-Speed 2623.24 samples/sec Loss 10.7083 LearningRate 0.0623 Epoch: 4 Global Step: 174570 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:20:35,593-Speed 2625.53 samples/sec Loss 10.6207 LearningRate 0.0623 Epoch: 4 Global Step: 174580 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:20:39,487-Speed 2630.14 samples/sec Loss 10.7620 LearningRate 0.0623 Epoch: 4 Global Step: 174590 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:20:43,380-Speed 2630.48 samples/sec Loss 10.6981 LearningRate 0.0623 Epoch: 4 Global Step: 174600 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:20:47,285-Speed 2622.43 samples/sec Loss 10.6995 LearningRate 0.0623 Epoch: 4 Global Step: 174610 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:20:51,190-Speed 2623.43 samples/sec Loss 10.7465 LearningRate 0.0623 Epoch: 4 Global Step: 174620 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:20:55,087-Speed 2628.01 samples/sec Loss 10.6480 LearningRate 0.0623 Epoch: 4 Global Step: 174630 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:20:58,987-Speed 2626.64 samples/sec Loss 10.5357 LearningRate 0.0623 Epoch: 4 Global Step: 174640 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:21:02,883-Speed 2628.65 samples/sec Loss 10.8567 LearningRate 0.0623 Epoch: 4 Global Step: 174650 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:21:06,778-Speed 2629.94 samples/sec Loss 10.6441 LearningRate 0.0623 Epoch: 4 Global Step: 174660 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:21:10,663-Speed 2636.57 samples/sec Loss 10.5831 LearningRate 0.0623 Epoch: 4 Global Step: 174670 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:21:14,547-Speed 2636.60 samples/sec Loss 10.8303 LearningRate 0.0623 Epoch: 4 Global Step: 174680 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:21:18,440-Speed 2631.30 samples/sec Loss 10.7286 LearningRate 0.0623 Epoch: 4 Global Step: 174690 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:21:22,333-Speed 2630.60 samples/sec Loss 10.7322 LearningRate 0.0623 Epoch: 4 Global Step: 174700 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:21:26,223-Speed 2633.24 samples/sec Loss 10.7805 LearningRate 0.0623 Epoch: 4 Global Step: 174710 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:21:30,143-Speed 2612.68 samples/sec Loss 10.7420 LearningRate 0.0623 Epoch: 4 Global Step: 174720 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:21:34,051-Speed 2620.72 samples/sec Loss 10.6903 LearningRate 0.0623 Epoch: 4 Global Step: 174730 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:21:37,962-Speed 2618.89 samples/sec Loss 10.8575 LearningRate 0.0623 Epoch: 4 Global Step: 174740 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:21:41,858-Speed 2628.95 samples/sec Loss 10.7745 LearningRate 0.0623 Epoch: 4 Global Step: 174750 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:21:45,752-Speed 2630.36 samples/sec Loss 10.8433 LearningRate 0.0623 Epoch: 4 Global Step: 174760 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:21:49,648-Speed 2629.20 samples/sec Loss 10.8626 LearningRate 0.0623 Epoch: 4 Global Step: 174770 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:21:53,544-Speed 2628.57 samples/sec Loss 10.7883 LearningRate 0.0623 Epoch: 4 Global Step: 174780 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:21:57,445-Speed 2626.02 samples/sec Loss 10.5689 LearningRate 0.0623 Epoch: 4 Global Step: 174790 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:22:01,342-Speed 2627.87 samples/sec Loss 10.7796 LearningRate 0.0623 Epoch: 4 Global Step: 174800 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:22:05,223-Speed 2639.39 samples/sec Loss 10.7048 LearningRate 0.0623 Epoch: 4 Global Step: 174810 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:22:09,126-Speed 2624.30 samples/sec Loss 10.7274 LearningRate 0.0623 Epoch: 4 Global Step: 174820 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:22:13,023-Speed 2627.51 samples/sec Loss 10.7090 LearningRate 0.0623 Epoch: 4 Global Step: 174830 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:22:16,943-Speed 2612.84 samples/sec Loss 10.6173 LearningRate 0.0623 Epoch: 4 Global Step: 174840 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:22:20,840-Speed 2629.24 samples/sec Loss 10.5902 LearningRate 0.0623 Epoch: 4 Global Step: 174850 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:22:24,738-Speed 2627.09 samples/sec Loss 10.7916 LearningRate 0.0623 Epoch: 4 Global Step: 174860 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:22:28,657-Speed 2613.50 samples/sec Loss 10.7958 LearningRate 0.0623 Epoch: 4 Global Step: 174870 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:22:32,562-Speed 2623.00 samples/sec Loss 10.7846 LearningRate 0.0623 Epoch: 4 Global Step: 174880 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:22:36,586-Speed 2544.82 samples/sec Loss 10.7296 LearningRate 0.0623 Epoch: 4 Global Step: 174890 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:22:40,668-Speed 2509.62 samples/sec Loss 10.6569 LearningRate 0.0623 Epoch: 4 Global Step: 174900 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:22:44,724-Speed 2524.64 samples/sec Loss 10.7406 LearningRate 0.0623 Epoch: 4 Global Step: 174910 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:22:48,622-Speed 2628.01 samples/sec Loss 10.8790 LearningRate 0.0623 Epoch: 4 Global Step: 174920 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:22:52,518-Speed 2628.51 samples/sec Loss 10.6011 LearningRate 0.0623 Epoch: 4 Global Step: 174930 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:22:56,417-Speed 2627.87 samples/sec Loss 10.6952 LearningRate 0.0623 Epoch: 4 Global Step: 174940 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:23:00,313-Speed 2629.02 samples/sec Loss 10.8053 LearningRate 0.0623 Epoch: 4 Global Step: 174950 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:23:04,243-Speed 2605.96 samples/sec Loss 10.8025 LearningRate 0.0623 Epoch: 4 Global Step: 174960 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:23:08,138-Speed 2629.65 samples/sec Loss 10.7369 LearningRate 0.0623 Epoch: 4 Global Step: 174970 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:23:12,030-Speed 2631.34 samples/sec Loss 10.7133 LearningRate 0.0623 Epoch: 4 Global Step: 174980 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:23:15,925-Speed 2629.60 samples/sec Loss 10.7463 LearningRate 0.0623 Epoch: 4 Global Step: 174990 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:23:19,816-Speed 2632.26 samples/sec Loss 10.7484 LearningRate 0.0623 Epoch: 4 Global Step: 175000 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:23:23,706-Speed 2632.91 samples/sec Loss 10.6811 LearningRate 0.0623 Epoch: 4 Global Step: 175010 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:23:27,599-Speed 2630.84 samples/sec Loss 10.6397 LearningRate 0.0623 Epoch: 4 Global Step: 175020 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:23:31,568-Speed 2580.61 samples/sec Loss 10.8277 LearningRate 0.0623 Epoch: 4 Global Step: 175030 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:23:35,494-Speed 2608.82 samples/sec Loss 10.8153 LearningRate 0.0623 Epoch: 4 Global Step: 175040 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:23:39,395-Speed 2625.90 samples/sec Loss 10.7740 LearningRate 0.0623 Epoch: 4 Global Step: 175050 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:23:43,291-Speed 2628.59 samples/sec Loss 10.7886 LearningRate 0.0622 Epoch: 4 Global Step: 175060 Fp16 Grad Scale: 262144 Required: 74 hours
Training: 2022-04-13 15:23:47,164-Speed 2644.55 samples/sec Loss 10.6622 LearningRate 0.0622 Epoch: 4 Global Step: 175070 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:23:51,058-Speed 2630.41 samples/sec Loss 10.8645 LearningRate 0.0622 Epoch: 4 Global Step: 175080 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:23:54,951-Speed 2630.69 samples/sec Loss 10.8176 LearningRate 0.0622 Epoch: 4 Global Step: 175090 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:23:58,845-Speed 2630.18 samples/sec Loss 10.6177 LearningRate 0.0622 Epoch: 4 Global Step: 175100 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:24:02,756-Speed 2618.59 samples/sec Loss 10.7145 LearningRate 0.0622 Epoch: 4 Global Step: 175110 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:24:06,671-Speed 2616.57 samples/sec Loss 10.6010 LearningRate 0.0622 Epoch: 4 Global Step: 175120 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:24:10,586-Speed 2615.71 samples/sec Loss 10.7332 LearningRate 0.0622 Epoch: 4 Global Step: 175130 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:24:14,484-Speed 2628.08 samples/sec Loss 10.8067 LearningRate 0.0622 Epoch: 4 Global Step: 175140 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:24:18,374-Speed 2633.14 samples/sec Loss 10.9011 LearningRate 0.0622 Epoch: 4 Global Step: 175150 Fp16 Grad Scale: 131072 Required: 74 hours
Training: 2022-04-13 15:24:22,253-Speed 2640.63 samples/sec Loss 10.7377 LearningRate 0.0622 Epoch: 4 Global Step: 175160 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:24:26,144-Speed 2631.82 samples/sec Loss 10.7899 LearningRate 0.0622 Epoch: 4 Global Step: 175170 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:24:30,038-Speed 2630.46 samples/sec Loss 10.6972 LearningRate 0.0622 Epoch: 4 Global Step: 175180 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:24:33,929-Speed 2631.84 samples/sec Loss 10.7613 LearningRate 0.0622 Epoch: 4 Global Step: 175190 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:24:37,824-Speed 2630.01 samples/sec Loss 10.6620 LearningRate 0.0622 Epoch: 4 Global Step: 175200 Fp16 Grad Scale: 65536 Required: 74 hours
Training: 2022-04-13 15:24:41,715-Speed 2632.00 samples/sec Loss 10.5800 LearningRate 0.0622 Epoch: 4 Global Step: 175210 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:24:45,609-Speed 2630.01 samples/sec Loss 10.6840 LearningRate 0.0622 Epoch: 4 Global Step: 175220 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:24:49,502-Speed 2631.16 samples/sec Loss 10.9055 LearningRate 0.0622 Epoch: 4 Global Step: 175230 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:24:53,395-Speed 2631.51 samples/sec Loss 10.7669 LearningRate 0.0622 Epoch: 4 Global Step: 175240 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:24:57,286-Speed 2632.72 samples/sec Loss 10.5624 LearningRate 0.0622 Epoch: 4 Global Step: 175250 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:25:01,187-Speed 2625.70 samples/sec Loss 10.6342 LearningRate 0.0622 Epoch: 4 Global Step: 175260 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:25:05,092-Speed 2623.10 samples/sec Loss 10.7093 LearningRate 0.0622 Epoch: 4 Global Step: 175270 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:25:09,017-Speed 2609.22 samples/sec Loss 10.7988 LearningRate 0.0622 Epoch: 4 Global Step: 175280 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:25:12,913-Speed 2628.29 samples/sec Loss 10.7208 LearningRate 0.0622 Epoch: 4 Global Step: 175290 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:25:16,811-Speed 2627.73 samples/sec Loss 10.6920 LearningRate 0.0622 Epoch: 4 Global Step: 175300 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:25:20,705-Speed 2630.50 samples/sec Loss 10.6972 LearningRate 0.0622 Epoch: 4 Global Step: 175310 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:25:24,736-Speed 2541.08 samples/sec Loss 10.6796 LearningRate 0.0622 Epoch: 4 Global Step: 175320 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:25:28,628-Speed 2632.06 samples/sec Loss 10.9342 LearningRate 0.0622 Epoch: 4 Global Step: 175330 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:25:32,505-Speed 2641.93 samples/sec Loss 10.7003 LearningRate 0.0622 Epoch: 4 Global Step: 175340 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:25:36,448-Speed 2596.99 samples/sec Loss 10.6969 LearningRate 0.0622 Epoch: 4 Global Step: 175350 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:25:40,409-Speed 2586.11 samples/sec Loss 10.8403 LearningRate 0.0622 Epoch: 4 Global Step: 175360 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:25:44,304-Speed 2629.14 samples/sec Loss 10.6189 LearningRate 0.0622 Epoch: 4 Global Step: 175370 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:25:48,206-Speed 2625.28 samples/sec Loss 10.6624 LearningRate 0.0622 Epoch: 4 Global Step: 175380 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:25:52,095-Speed 2633.53 samples/sec Loss 10.7478 LearningRate 0.0622 Epoch: 4 Global Step: 175390 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:25:55,985-Speed 2632.93 samples/sec Loss 10.7284 LearningRate 0.0622 Epoch: 4 Global Step: 175400 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:25:59,884-Speed 2626.77 samples/sec Loss 10.6303 LearningRate 0.0622 Epoch: 4 Global Step: 175410 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:26:03,778-Speed 2630.53 samples/sec Loss 10.7890 LearningRate 0.0622 Epoch: 4 Global Step: 175420 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:26:07,669-Speed 2632.32 samples/sec Loss 10.7887 LearningRate 0.0622 Epoch: 4 Global Step: 175430 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:26:11,562-Speed 2630.99 samples/sec Loss 10.7722 LearningRate 0.0622 Epoch: 4 Global Step: 175440 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:26:15,452-Speed 2633.23 samples/sec Loss 10.6236 LearningRate 0.0622 Epoch: 4 Global Step: 175450 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:26:19,343-Speed 2631.67 samples/sec Loss 10.8594 LearningRate 0.0622 Epoch: 4 Global Step: 175460 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:26:23,236-Speed 2631.10 samples/sec Loss 10.8926 LearningRate 0.0622 Epoch: 4 Global Step: 175470 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:26:27,128-Speed 2631.50 samples/sec Loss 10.7877 LearningRate 0.0622 Epoch: 4 Global Step: 175480 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:26:31,022-Speed 2630.34 samples/sec Loss 10.7071 LearningRate 0.0622 Epoch: 4 Global Step: 175490 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:26:34,913-Speed 2632.52 samples/sec Loss 10.6716 LearningRate 0.0622 Epoch: 4 Global Step: 175500 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:26:38,802-Speed 2633.05 samples/sec Loss 10.7889 LearningRate 0.0622 Epoch: 4 Global Step: 175510 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:26:42,694-Speed 2631.87 samples/sec Loss 10.7865 LearningRate 0.0622 Epoch: 4 Global Step: 175520 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:26:46,587-Speed 2631.20 samples/sec Loss 10.8531 LearningRate 0.0622 Epoch: 4 Global Step: 175530 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:26:50,484-Speed 2628.32 samples/sec Loss 10.7706 LearningRate 0.0622 Epoch: 4 Global Step: 175540 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:26:54,374-Speed 2632.55 samples/sec Loss 10.6387 LearningRate 0.0622 Epoch: 4 Global Step: 175550 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:26:58,267-Speed 2631.08 samples/sec Loss 10.8343 LearningRate 0.0622 Epoch: 4 Global Step: 175560 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:27:02,159-Speed 2631.94 samples/sec Loss 10.7015 LearningRate 0.0622 Epoch: 4 Global Step: 175570 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:27:06,053-Speed 2630.00 samples/sec Loss 10.6858 LearningRate 0.0621 Epoch: 4 Global Step: 175580 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:27:09,956-Speed 2624.10 samples/sec Loss 10.8392 LearningRate 0.0621 Epoch: 4 Global Step: 175590 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:27:13,842-Speed 2636.01 samples/sec Loss 10.7554 LearningRate 0.0621 Epoch: 4 Global Step: 175600 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:27:17,736-Speed 2629.49 samples/sec Loss 10.6580 LearningRate 0.0621 Epoch: 4 Global Step: 175610 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:27:21,627-Speed 2632.83 samples/sec Loss 10.8468 LearningRate 0.0621 Epoch: 4 Global Step: 175620 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:27:25,533-Speed 2622.88 samples/sec Loss 10.7791 LearningRate 0.0621 Epoch: 4 Global Step: 175630 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:27:29,433-Speed 2626.14 samples/sec Loss 10.6719 LearningRate 0.0621 Epoch: 4 Global Step: 175640 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:27:33,339-Speed 2622.04 samples/sec Loss 10.7758 LearningRate 0.0621 Epoch: 4 Global Step: 175650 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:27:37,233-Speed 2630.05 samples/sec Loss 10.7553 LearningRate 0.0621 Epoch: 4 Global Step: 175660 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:27:41,134-Speed 2625.56 samples/sec Loss 10.7350 LearningRate 0.0621 Epoch: 4 Global Step: 175670 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:27:45,036-Speed 2624.86 samples/sec Loss 10.9179 LearningRate 0.0621 Epoch: 4 Global Step: 175680 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:27:48,937-Speed 2625.44 samples/sec Loss 10.7973 LearningRate 0.0621 Epoch: 4 Global Step: 175690 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:27:52,836-Speed 2627.10 samples/sec Loss 10.5898 LearningRate 0.0621 Epoch: 4 Global Step: 175700 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:27:56,755-Speed 2613.35 samples/sec Loss 10.6639 LearningRate 0.0621 Epoch: 4 Global Step: 175710 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:28:00,632-Speed 2642.26 samples/sec Loss 10.6602 LearningRate 0.0621 Epoch: 4 Global Step: 175720 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:28:04,617-Speed 2569.87 samples/sec Loss 10.7264 LearningRate 0.0621 Epoch: 4 Global Step: 175730 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:28:08,515-Speed 2627.70 samples/sec Loss 10.6572 LearningRate 0.0621 Epoch: 4 Global Step: 175740 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:28:12,411-Speed 2628.90 samples/sec Loss 10.6827 LearningRate 0.0621 Epoch: 4 Global Step: 175750 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:28:16,306-Speed 2629.25 samples/sec Loss 10.6635 LearningRate 0.0621 Epoch: 4 Global Step: 175760 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:28:20,206-Speed 2626.25 samples/sec Loss 10.7135 LearningRate 0.0621 Epoch: 4 Global Step: 175770 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:28:24,109-Speed 2624.26 samples/sec Loss 10.6920 LearningRate 0.0621 Epoch: 4 Global Step: 175780 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:28:28,024-Speed 2616.14 samples/sec Loss 10.8571 LearningRate 0.0621 Epoch: 4 Global Step: 175790 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:28:31,926-Speed 2624.97 samples/sec Loss 10.6844 LearningRate 0.0621 Epoch: 4 Global Step: 175800 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:28:35,821-Speed 2630.32 samples/sec Loss 10.6087 LearningRate 0.0621 Epoch: 4 Global Step: 175810 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:28:39,732-Speed 2618.32 samples/sec Loss 10.6794 LearningRate 0.0621 Epoch: 4 Global Step: 175820 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:28:43,618-Speed 2635.65 samples/sec Loss 10.7008 LearningRate 0.0621 Epoch: 4 Global Step: 175830 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:28:47,516-Speed 2628.12 samples/sec Loss 10.7695 LearningRate 0.0621 Epoch: 4 Global Step: 175840 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:28:51,418-Speed 2624.63 samples/sec Loss 10.7057 LearningRate 0.0621 Epoch: 4 Global Step: 175850 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:28:55,326-Speed 2620.58 samples/sec Loss 10.6863 LearningRate 0.0621 Epoch: 4 Global Step: 175860 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:28:59,221-Speed 2629.98 samples/sec Loss 10.7089 LearningRate 0.0621 Epoch: 4 Global Step: 175870 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:29:03,116-Speed 2629.10 samples/sec Loss 10.8783 LearningRate 0.0621 Epoch: 4 Global Step: 175880 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:29:07,010-Speed 2630.31 samples/sec Loss 10.7363 LearningRate 0.0621 Epoch: 4 Global Step: 175890 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:29:10,908-Speed 2627.88 samples/sec Loss 10.6791 LearningRate 0.0621 Epoch: 4 Global Step: 175900 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:29:14,799-Speed 2632.56 samples/sec Loss 10.7971 LearningRate 0.0621 Epoch: 4 Global Step: 175910 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:29:18,694-Speed 2629.42 samples/sec Loss 10.6907 LearningRate 0.0621 Epoch: 4 Global Step: 175920 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:29:22,587-Speed 2631.10 samples/sec Loss 10.5945 LearningRate 0.0621 Epoch: 4 Global Step: 175930 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:29:26,480-Speed 2630.67 samples/sec Loss 10.8076 LearningRate 0.0621 Epoch: 4 Global Step: 175940 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:29:30,373-Speed 2631.34 samples/sec Loss 10.6623 LearningRate 0.0621 Epoch: 4 Global Step: 175950 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:29:34,271-Speed 2627.44 samples/sec Loss 10.6712 LearningRate 0.0621 Epoch: 4 Global Step: 175960 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:29:38,165-Speed 2629.67 samples/sec Loss 10.6622 LearningRate 0.0621 Epoch: 4 Global Step: 175970 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:29:42,055-Speed 2633.26 samples/sec Loss 10.7350 LearningRate 0.0621 Epoch: 4 Global Step: 175980 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:29:45,963-Speed 2620.65 samples/sec Loss 10.7615 LearningRate 0.0621 Epoch: 4 Global Step: 175990 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:29:49,861-Speed 2627.94 samples/sec Loss 10.7375 LearningRate 0.0621 Epoch: 4 Global Step: 176000 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:29:53,754-Speed 2631.16 samples/sec Loss 10.6707 LearningRate 0.0621 Epoch: 4 Global Step: 176010 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:29:57,651-Speed 2627.76 samples/sec Loss 10.7087 LearningRate 0.0621 Epoch: 4 Global Step: 176020 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:30:01,546-Speed 2630.10 samples/sec Loss 10.5741 LearningRate 0.0621 Epoch: 4 Global Step: 176030 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:30:05,441-Speed 2629.31 samples/sec Loss 10.7027 LearningRate 0.0621 Epoch: 4 Global Step: 176040 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:30:09,332-Speed 2632.36 samples/sec Loss 10.7372 LearningRate 0.0621 Epoch: 4 Global Step: 176050 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:30:13,224-Speed 2631.44 samples/sec Loss 10.7157 LearningRate 0.0621 Epoch: 4 Global Step: 176060 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:30:17,225-Speed 2559.82 samples/sec Loss 10.6932 LearningRate 0.0621 Epoch: 4 Global Step: 176070 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:30:21,120-Speed 2630.01 samples/sec Loss 10.7112 LearningRate 0.0621 Epoch: 4 Global Step: 176080 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:30:25,021-Speed 2625.13 samples/sec Loss 10.8546 LearningRate 0.0621 Epoch: 4 Global Step: 176090 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:30:28,927-Speed 2623.02 samples/sec Loss 10.7189 LearningRate 0.0621 Epoch: 4 Global Step: 176100 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:30:32,821-Speed 2629.96 samples/sec Loss 10.7999 LearningRate 0.0620 Epoch: 4 Global Step: 176110 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:30:36,714-Speed 2630.91 samples/sec Loss 10.7788 LearningRate 0.0620 Epoch: 4 Global Step: 176120 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:30:40,614-Speed 2626.01 samples/sec Loss 10.7185 LearningRate 0.0620 Epoch: 4 Global Step: 176130 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:30:44,514-Speed 2626.64 samples/sec Loss 10.5909 LearningRate 0.0620 Epoch: 4 Global Step: 176140 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:30:48,412-Speed 2626.78 samples/sec Loss 10.8180 LearningRate 0.0620 Epoch: 4 Global Step: 176150 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:30:52,318-Speed 2622.76 samples/sec Loss 10.7990 LearningRate 0.0620 Epoch: 4 Global Step: 176160 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:30:56,211-Speed 2630.75 samples/sec Loss 10.6689 LearningRate 0.0620 Epoch: 4 Global Step: 176170 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:31:00,116-Speed 2623.46 samples/sec Loss 10.8172 LearningRate 0.0620 Epoch: 4 Global Step: 176180 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:31:04,016-Speed 2625.89 samples/sec Loss 10.7653 LearningRate 0.0620 Epoch: 4 Global Step: 176190 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:31:07,916-Speed 2626.51 samples/sec Loss 10.6732 LearningRate 0.0620 Epoch: 4 Global Step: 176200 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:31:11,809-Speed 2630.80 samples/sec Loss 10.5973 LearningRate 0.0620 Epoch: 4 Global Step: 176210 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:31:15,685-Speed 2642.03 samples/sec Loss 10.6459 LearningRate 0.0620 Epoch: 4 Global Step: 176220 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:31:19,591-Speed 2622.15 samples/sec Loss 10.8487 LearningRate 0.0620 Epoch: 4 Global Step: 176230 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:31:23,485-Speed 2631.16 samples/sec Loss 10.8097 LearningRate 0.0620 Epoch: 4 Global Step: 176240 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:31:27,381-Speed 2628.64 samples/sec Loss 10.7433 LearningRate 0.0620 Epoch: 4 Global Step: 176250 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:31:31,277-Speed 2628.78 samples/sec Loss 10.6186 LearningRate 0.0620 Epoch: 4 Global Step: 176260 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:31:35,183-Speed 2622.35 samples/sec Loss 10.7398 LearningRate 0.0620 Epoch: 4 Global Step: 176270 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:31:39,149-Speed 2582.43 samples/sec Loss 10.8364 LearningRate 0.0620 Epoch: 4 Global Step: 176280 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:31:43,063-Speed 2617.06 samples/sec Loss 10.6513 LearningRate 0.0620 Epoch: 4 Global Step: 176290 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:31:46,954-Speed 2632.78 samples/sec Loss 10.7697 LearningRate 0.0620 Epoch: 4 Global Step: 176300 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:31:50,859-Speed 2622.74 samples/sec Loss 10.7081 LearningRate 0.0620 Epoch: 4 Global Step: 176310 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:31:54,761-Speed 2624.53 samples/sec Loss 10.6848 LearningRate 0.0620 Epoch: 4 Global Step: 176320 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:31:58,666-Speed 2623.44 samples/sec Loss 10.7962 LearningRate 0.0620 Epoch: 4 Global Step: 176330 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:32:02,561-Speed 2629.46 samples/sec Loss 10.7381 LearningRate 0.0620 Epoch: 4 Global Step: 176340 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:32:06,454-Speed 2630.54 samples/sec Loss 10.8210 LearningRate 0.0620 Epoch: 4 Global Step: 176350 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:32:10,354-Speed 2626.10 samples/sec Loss 10.6238 LearningRate 0.0620 Epoch: 4 Global Step: 176360 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:32:14,265-Speed 2618.81 samples/sec Loss 10.5145 LearningRate 0.0620 Epoch: 4 Global Step: 176370 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:32:18,195-Speed 2606.20 samples/sec Loss 10.6527 LearningRate 0.0620 Epoch: 4 Global Step: 176380 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:32:22,106-Speed 2619.41 samples/sec Loss 10.7560 LearningRate 0.0620 Epoch: 4 Global Step: 176390 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:32:25,986-Speed 2639.53 samples/sec Loss 10.7127 LearningRate 0.0620 Epoch: 4 Global Step: 176400 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:32:29,895-Speed 2620.06 samples/sec Loss 10.7619 LearningRate 0.0620 Epoch: 4 Global Step: 176410 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:32:33,800-Speed 2623.24 samples/sec Loss 10.7750 LearningRate 0.0620 Epoch: 4 Global Step: 176420 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:32:38,085-Speed 2389.85 samples/sec Loss 10.6890 LearningRate 0.0620 Epoch: 4 Global Step: 176430 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:32:41,986-Speed 2625.58 samples/sec Loss 10.6991 LearningRate 0.0620 Epoch: 4 Global Step: 176440 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:32:45,888-Speed 2625.16 samples/sec Loss 10.6888 LearningRate 0.0620 Epoch: 4 Global Step: 176450 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:32:49,783-Speed 2629.65 samples/sec Loss 10.6977 LearningRate 0.0620 Epoch: 4 Global Step: 176460 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:32:53,681-Speed 2627.48 samples/sec Loss 10.7055 LearningRate 0.0620 Epoch: 4 Global Step: 176470 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:32:57,586-Speed 2623.03 samples/sec Loss 10.7352 LearningRate 0.0620 Epoch: 4 Global Step: 176480 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:33:01,483-Speed 2627.95 samples/sec Loss 10.7077 LearningRate 0.0620 Epoch: 4 Global Step: 176490 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:33:05,380-Speed 2628.27 samples/sec Loss 10.6808 LearningRate 0.0620 Epoch: 4 Global Step: 176500 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:33:09,281-Speed 2625.84 samples/sec Loss 10.7724 LearningRate 0.0620 Epoch: 4 Global Step: 176510 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:33:13,173-Speed 2631.35 samples/sec Loss 10.5734 LearningRate 0.0620 Epoch: 4 Global Step: 176520 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:33:17,057-Speed 2636.77 samples/sec Loss 10.6240 LearningRate 0.0620 Epoch: 4 Global Step: 176530 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:33:20,943-Speed 2636.13 samples/sec Loss 10.6511 LearningRate 0.0620 Epoch: 4 Global Step: 176540 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:33:24,837-Speed 2630.34 samples/sec Loss 10.8826 LearningRate 0.0620 Epoch: 4 Global Step: 176550 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:33:28,741-Speed 2623.70 samples/sec Loss 10.7811 LearningRate 0.0620 Epoch: 4 Global Step: 176560 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:33:32,655-Speed 2616.92 samples/sec Loss 10.6497 LearningRate 0.0620 Epoch: 4 Global Step: 176570 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:33:36,572-Speed 2633.99 samples/sec Loss 10.6863 LearningRate 0.0620 Epoch: 4 Global Step: 176580 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:33:40,463-Speed 2632.11 samples/sec Loss 10.6572 LearningRate 0.0620 Epoch: 4 Global Step: 176590 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:33:44,354-Speed 2631.93 samples/sec Loss 10.7450 LearningRate 0.0620 Epoch: 4 Global Step: 176600 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:33:48,251-Speed 2627.90 samples/sec Loss 10.6378 LearningRate 0.0620 Epoch: 4 Global Step: 176610 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:33:52,293-Speed 2632.38 samples/sec Loss 10.7356 LearningRate 0.0620 Epoch: 4 Global Step: 176620 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:33:56,366-Speed 2609.40 samples/sec Loss 10.7222 LearningRate 0.0620 Epoch: 4 Global Step: 176630 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:34:00,265-Speed 2627.23 samples/sec Loss 10.6636 LearningRate 0.0619 Epoch: 4 Global Step: 176640 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:34:04,299-Speed 2635.71 samples/sec Loss 10.6590 LearningRate 0.0619 Epoch: 4 Global Step: 176650 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:34:08,201-Speed 2625.49 samples/sec Loss 10.6195 LearningRate 0.0619 Epoch: 4 Global Step: 176660 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:34:12,110-Speed 2619.96 samples/sec Loss 10.7043 LearningRate 0.0619 Epoch: 4 Global Step: 176670 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:34:16,016-Speed 2622.14 samples/sec Loss 10.7197 LearningRate 0.0619 Epoch: 4 Global Step: 176680 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:34:19,946-Speed 2634.07 samples/sec Loss 10.5888 LearningRate 0.0619 Epoch: 4 Global Step: 176690 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:34:23,853-Speed 2621.57 samples/sec Loss 10.7515 LearningRate 0.0619 Epoch: 4 Global Step: 176700 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:34:27,756-Speed 2624.26 samples/sec Loss 10.6949 LearningRate 0.0619 Epoch: 4 Global Step: 176710 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:34:31,666-Speed 2619.41 samples/sec Loss 10.6482 LearningRate 0.0619 Epoch: 4 Global Step: 176720 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:34:35,572-Speed 2622.13 samples/sec Loss 10.8454 LearningRate 0.0619 Epoch: 4 Global Step: 176730 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:34:39,471-Speed 2626.37 samples/sec Loss 10.7773 LearningRate 0.0619 Epoch: 4 Global Step: 176740 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:34:43,370-Speed 2626.82 samples/sec Loss 10.6781 LearningRate 0.0619 Epoch: 4 Global Step: 176750 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:34:47,267-Speed 2629.08 samples/sec Loss 10.6456 LearningRate 0.0619 Epoch: 4 Global Step: 176760 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:34:51,158-Speed 2631.78 samples/sec Loss 10.7127 LearningRate 0.0619 Epoch: 4 Global Step: 176770 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:34:55,050-Speed 2632.32 samples/sec Loss 10.8820 LearningRate 0.0619 Epoch: 4 Global Step: 176780 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:34:59,020-Speed 2579.20 samples/sec Loss 10.7867 LearningRate 0.0619 Epoch: 4 Global Step: 176790 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:35:02,921-Speed 2625.43 samples/sec Loss 10.7515 LearningRate 0.0619 Epoch: 4 Global Step: 176800 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:35:06,832-Speed 2619.11 samples/sec Loss 10.6160 LearningRate 0.0619 Epoch: 4 Global Step: 176810 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:35:10,728-Speed 2628.59 samples/sec Loss 10.7990 LearningRate 0.0619 Epoch: 4 Global Step: 176820 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:35:14,632-Speed 2624.09 samples/sec Loss 10.7396 LearningRate 0.0619 Epoch: 4 Global Step: 176830 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:35:18,546-Speed 2616.55 samples/sec Loss 10.6700 LearningRate 0.0619 Epoch: 4 Global Step: 176840 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:35:22,454-Speed 2620.75 samples/sec Loss 10.6871 LearningRate 0.0619 Epoch: 4 Global Step: 176850 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:35:26,350-Speed 2629.56 samples/sec Loss 10.6838 LearningRate 0.0619 Epoch: 4 Global Step: 176860 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:35:30,240-Speed 2633.15 samples/sec Loss 10.6471 LearningRate 0.0619 Epoch: 4 Global Step: 176870 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:35:34,133-Speed 2630.42 samples/sec Loss 10.7171 LearningRate 0.0619 Epoch: 4 Global Step: 176880 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:35:38,024-Speed 2632.44 samples/sec Loss 10.5183 LearningRate 0.0619 Epoch: 4 Global Step: 176890 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:35:41,915-Speed 2632.17 samples/sec Loss 10.6828 LearningRate 0.0619 Epoch: 4 Global Step: 176900 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:35:45,809-Speed 2629.86 samples/sec Loss 10.7663 LearningRate 0.0619 Epoch: 4 Global Step: 176910 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:35:49,712-Speed 2624.05 samples/sec Loss 10.6216 LearningRate 0.0619 Epoch: 4 Global Step: 176920 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:35:53,592-Speed 2640.31 samples/sec Loss 10.7263 LearningRate 0.0619 Epoch: 4 Global Step: 176930 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:35:57,498-Speed 2621.57 samples/sec Loss 10.6348 LearningRate 0.0619 Epoch: 4 Global Step: 176940 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:36:01,392-Speed 2631.52 samples/sec Loss 10.7550 LearningRate 0.0619 Epoch: 4 Global Step: 176950 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:36:05,300-Speed 2620.94 samples/sec Loss 10.8086 LearningRate 0.0619 Epoch: 4 Global Step: 176960 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:36:09,193-Speed 2630.68 samples/sec Loss 10.7674 LearningRate 0.0619 Epoch: 4 Global Step: 176970 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:36:13,088-Speed 2629.81 samples/sec Loss 10.6824 LearningRate 0.0619 Epoch: 4 Global Step: 176980 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:36:16,983-Speed 2629.73 samples/sec Loss 10.6777 LearningRate 0.0619 Epoch: 4 Global Step: 176990 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:36:20,886-Speed 2624.28 samples/sec Loss 10.6437 LearningRate 0.0619 Epoch: 4 Global Step: 177000 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:36:24,790-Speed 2623.63 samples/sec Loss 10.6668 LearningRate 0.0619 Epoch: 4 Global Step: 177010 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:36:28,694-Speed 2622.86 samples/sec Loss 10.7645 LearningRate 0.0619 Epoch: 4 Global Step: 177020 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:36:32,595-Speed 2625.73 samples/sec Loss 10.5933 LearningRate 0.0619 Epoch: 4 Global Step: 177030 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:36:36,498-Speed 2624.64 samples/sec Loss 10.7008 LearningRate 0.0619 Epoch: 4 Global Step: 177040 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:36:40,393-Speed 2629.60 samples/sec Loss 10.6659 LearningRate 0.0619 Epoch: 4 Global Step: 177050 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:36:44,284-Speed 2632.31 samples/sec Loss 10.7712 LearningRate 0.0619 Epoch: 4 Global Step: 177060 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:36:48,177-Speed 2630.98 samples/sec Loss 10.7532 LearningRate 0.0619 Epoch: 4 Global Step: 177070 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:36:52,076-Speed 2626.70 samples/sec Loss 10.6842 LearningRate 0.0619 Epoch: 4 Global Step: 177080 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:36:56,076-Speed 2560.51 samples/sec Loss 10.5223 LearningRate 0.0619 Epoch: 4 Global Step: 177090 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:36:59,979-Speed 2624.39 samples/sec Loss 10.6365 LearningRate 0.0619 Epoch: 4 Global Step: 177100 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:37:03,872-Speed 2631.06 samples/sec Loss 10.6782 LearningRate 0.0619 Epoch: 4 Global Step: 177110 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:37:07,755-Speed 2637.27 samples/sec Loss 10.7403 LearningRate 0.0619 Epoch: 4 Global Step: 177120 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:37:11,641-Speed 2635.84 samples/sec Loss 10.5900 LearningRate 0.0619 Epoch: 4 Global Step: 177130 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:37:15,541-Speed 2626.40 samples/sec Loss 10.4670 LearningRate 0.0619 Epoch: 4 Global Step: 177140 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:37:19,447-Speed 2622.63 samples/sec Loss 10.6346 LearningRate 0.0619 Epoch: 4 Global Step: 177150 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:37:23,322-Speed 2643.89 samples/sec Loss 10.7860 LearningRate 0.0618 Epoch: 4 Global Step: 177160 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:37:27,215-Speed 2630.36 samples/sec Loss 10.7039 LearningRate 0.0618 Epoch: 4 Global Step: 177170 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:37:31,107-Speed 2632.03 samples/sec Loss 10.7579 LearningRate 0.0618 Epoch: 4 Global Step: 177180 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:37:35,008-Speed 2625.45 samples/sec Loss 10.7255 LearningRate 0.0618 Epoch: 4 Global Step: 177190 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:37:38,898-Speed 2632.47 samples/sec Loss 10.7378 LearningRate 0.0618 Epoch: 4 Global Step: 177200 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:37:42,793-Speed 2629.75 samples/sec Loss 10.6898 LearningRate 0.0618 Epoch: 4 Global Step: 177210 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:37:46,687-Speed 2630.33 samples/sec Loss 10.6790 LearningRate 0.0618 Epoch: 4 Global Step: 177220 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:37:50,925-Speed 2416.62 samples/sec Loss 10.7366 LearningRate 0.0618 Epoch: 4 Global Step: 177230 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:37:54,821-Speed 2629.77 samples/sec Loss 10.8375 LearningRate 0.0618 Epoch: 4 Global Step: 177240 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:37:58,723-Speed 2624.49 samples/sec Loss 10.6167 LearningRate 0.0618 Epoch: 4 Global Step: 177250 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:38:02,614-Speed 2632.10 samples/sec Loss 10.5710 LearningRate 0.0618 Epoch: 4 Global Step: 177260 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:38:06,507-Speed 2631.37 samples/sec Loss 10.7370 LearningRate 0.0618 Epoch: 4 Global Step: 177270 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:38:10,427-Speed 2612.78 samples/sec Loss 10.7185 LearningRate 0.0618 Epoch: 4 Global Step: 177280 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:38:14,334-Speed 2621.13 samples/sec Loss 10.7166 LearningRate 0.0618 Epoch: 4 Global Step: 177290 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:38:18,254-Speed 2613.06 samples/sec Loss 10.6369 LearningRate 0.0618 Epoch: 4 Global Step: 177300 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:38:22,154-Speed 2626.50 samples/sec Loss 10.6174 LearningRate 0.0618 Epoch: 4 Global Step: 177310 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:38:26,069-Speed 2615.61 samples/sec Loss 10.7197 LearningRate 0.0618 Epoch: 4 Global Step: 177320 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:38:29,987-Speed 2615.00 samples/sec Loss 10.6972 LearningRate 0.0618 Epoch: 4 Global Step: 177330 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:38:33,890-Speed 2624.14 samples/sec Loss 10.6162 LearningRate 0.0618 Epoch: 4 Global Step: 177340 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:38:37,817-Speed 2608.30 samples/sec Loss 10.6185 LearningRate 0.0618 Epoch: 4 Global Step: 177350 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:38:41,720-Speed 2624.09 samples/sec Loss 10.8037 LearningRate 0.0618 Epoch: 4 Global Step: 177360 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:38:45,602-Speed 2637.81 samples/sec Loss 10.9066 LearningRate 0.0618 Epoch: 4 Global Step: 177370 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:38:49,495-Speed 2631.20 samples/sec Loss 10.8019 LearningRate 0.0618 Epoch: 4 Global Step: 177380 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:38:53,388-Speed 2630.65 samples/sec Loss 10.7085 LearningRate 0.0618 Epoch: 4 Global Step: 177390 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:38:57,280-Speed 2632.67 samples/sec Loss 10.5987 LearningRate 0.0618 Epoch: 4 Global Step: 177400 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:39:01,178-Speed 2627.42 samples/sec Loss 10.6530 LearningRate 0.0618 Epoch: 4 Global Step: 177410 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:39:05,085-Speed 2622.31 samples/sec Loss 10.7282 LearningRate 0.0618 Epoch: 4 Global Step: 177420 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:39:08,975-Speed 2632.86 samples/sec Loss 10.7099 LearningRate 0.0618 Epoch: 4 Global Step: 177430 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:39:12,874-Speed 2626.92 samples/sec Loss 10.6297 LearningRate 0.0618 Epoch: 4 Global Step: 177440 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:39:16,792-Speed 2613.68 samples/sec Loss 10.5446 LearningRate 0.0618 Epoch: 4 Global Step: 177450 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:39:20,686-Speed 2630.22 samples/sec Loss 10.5561 LearningRate 0.0618 Epoch: 4 Global Step: 177460 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:39:24,568-Speed 2638.44 samples/sec Loss 10.6395 LearningRate 0.0618 Epoch: 4 Global Step: 177470 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:39:28,464-Speed 2629.49 samples/sec Loss 10.7069 LearningRate 0.0618 Epoch: 4 Global Step: 177480 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:39:32,362-Speed 2626.98 samples/sec Loss 10.7957 LearningRate 0.0618 Epoch: 4 Global Step: 177490 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:39:36,254-Speed 2632.13 samples/sec Loss 10.7293 LearningRate 0.0618 Epoch: 4 Global Step: 177500 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:39:40,153-Speed 2626.92 samples/sec Loss 10.6744 LearningRate 0.0618 Epoch: 4 Global Step: 177510 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:39:44,041-Speed 2633.92 samples/sec Loss 10.6719 LearningRate 0.0618 Epoch: 4 Global Step: 177520 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:39:47,943-Speed 2625.26 samples/sec Loss 10.7571 LearningRate 0.0618 Epoch: 4 Global Step: 177530 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:39:51,846-Speed 2623.80 samples/sec Loss 10.7392 LearningRate 0.0618 Epoch: 4 Global Step: 177540 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:39:55,742-Speed 2629.35 samples/sec Loss 10.6444 LearningRate 0.0618 Epoch: 4 Global Step: 177550 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:39:59,635-Speed 2630.84 samples/sec Loss 10.6213 LearningRate 0.0618 Epoch: 4 Global Step: 177560 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:40:03,529-Speed 2630.41 samples/sec Loss 10.8568 LearningRate 0.0618 Epoch: 4 Global Step: 177570 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:40:07,403-Speed 2643.43 samples/sec Loss 10.7055 LearningRate 0.0618 Epoch: 4 Global Step: 177580 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:40:11,294-Speed 2632.32 samples/sec Loss 10.7894 LearningRate 0.0618 Epoch: 4 Global Step: 177590 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:40:15,190-Speed 2628.89 samples/sec Loss 10.6767 LearningRate 0.0618 Epoch: 4 Global Step: 177600 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:40:19,091-Speed 2625.45 samples/sec Loss 10.7030 LearningRate 0.0618 Epoch: 4 Global Step: 177610 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:40:22,991-Speed 2626.49 samples/sec Loss 10.6369 LearningRate 0.0618 Epoch: 4 Global Step: 177620 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:40:26,901-Speed 2620.32 samples/sec Loss 10.5948 LearningRate 0.0618 Epoch: 4 Global Step: 177630 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:40:30,801-Speed 2625.93 samples/sec Loss 10.8084 LearningRate 0.0618 Epoch: 4 Global Step: 177640 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:40:34,696-Speed 2629.33 samples/sec Loss 10.8131 LearningRate 0.0618 Epoch: 4 Global Step: 177650 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:40:38,574-Speed 2641.48 samples/sec Loss 10.6426 LearningRate 0.0618 Epoch: 4 Global Step: 177660 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:40:42,450-Speed 2642.23 samples/sec Loss 10.7322 LearningRate 0.0618 Epoch: 4 Global Step: 177670 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:40:46,352-Speed 2624.85 samples/sec Loss 10.7065 LearningRate 0.0618 Epoch: 4 Global Step: 177680 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:40:50,245-Speed 2630.81 samples/sec Loss 10.6451 LearningRate 0.0617 Epoch: 4 Global Step: 177690 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:40:54,140-Speed 2629.84 samples/sec Loss 10.8034 LearningRate 0.0617 Epoch: 4 Global Step: 177700 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:40:58,035-Speed 2629.82 samples/sec Loss 10.6571 LearningRate 0.0617 Epoch: 4 Global Step: 177710 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:41:01,935-Speed 2626.00 samples/sec Loss 10.6810 LearningRate 0.0617 Epoch: 4 Global Step: 177720 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:41:05,831-Speed 2629.34 samples/sec Loss 10.7038 LearningRate 0.0617 Epoch: 4 Global Step: 177730 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:41:09,725-Speed 2629.57 samples/sec Loss 10.7648 LearningRate 0.0617 Epoch: 4 Global Step: 177740 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:41:13,618-Speed 2631.65 samples/sec Loss 10.7444 LearningRate 0.0617 Epoch: 4 Global Step: 177750 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:41:17,507-Speed 2633.59 samples/sec Loss 10.6907 LearningRate 0.0617 Epoch: 4 Global Step: 177760 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:41:21,400-Speed 2630.51 samples/sec Loss 10.6934 LearningRate 0.0617 Epoch: 4 Global Step: 177770 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:41:25,292-Speed 2631.87 samples/sec Loss 10.7140 LearningRate 0.0617 Epoch: 4 Global Step: 177780 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:41:29,187-Speed 2629.56 samples/sec Loss 10.6796 LearningRate 0.0617 Epoch: 4 Global Step: 177790 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:41:33,092-Speed 2623.01 samples/sec Loss 10.8943 LearningRate 0.0617 Epoch: 4 Global Step: 177800 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:41:36,984-Speed 2631.55 samples/sec Loss 10.8446 LearningRate 0.0617 Epoch: 4 Global Step: 177810 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:41:40,878-Speed 2630.11 samples/sec Loss 10.7733 LearningRate 0.0617 Epoch: 4 Global Step: 177820 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:41:44,782-Speed 2623.54 samples/sec Loss 10.7231 LearningRate 0.0617 Epoch: 4 Global Step: 177830 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:41:48,673-Speed 2634.99 samples/sec Loss 10.7276 LearningRate 0.0617 Epoch: 4 Global Step: 177840 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:41:52,569-Speed 2628.97 samples/sec Loss 10.6892 LearningRate 0.0617 Epoch: 4 Global Step: 177850 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:41:56,460-Speed 2632.63 samples/sec Loss 10.6792 LearningRate 0.0617 Epoch: 4 Global Step: 177860 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:42:00,357-Speed 2628.10 samples/sec Loss 10.6002 LearningRate 0.0617 Epoch: 4 Global Step: 177870 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:42:04,253-Speed 2628.99 samples/sec Loss 10.7297 LearningRate 0.0617 Epoch: 4 Global Step: 177880 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:42:08,146-Speed 2630.87 samples/sec Loss 10.7052 LearningRate 0.0617 Epoch: 4 Global Step: 177890 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:42:12,047-Speed 2625.62 samples/sec Loss 10.7505 LearningRate 0.0617 Epoch: 4 Global Step: 177900 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:42:15,940-Speed 2630.73 samples/sec Loss 10.7265 LearningRate 0.0617 Epoch: 4 Global Step: 177910 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:42:19,830-Speed 2633.81 samples/sec Loss 10.6468 LearningRate 0.0617 Epoch: 4 Global Step: 177920 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:42:23,723-Speed 2630.30 samples/sec Loss 10.7412 LearningRate 0.0617 Epoch: 4 Global Step: 177930 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:42:27,618-Speed 2629.83 samples/sec Loss 10.7731 LearningRate 0.0617 Epoch: 4 Global Step: 177940 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:42:31,520-Speed 2625.09 samples/sec Loss 10.7229 LearningRate 0.0617 Epoch: 4 Global Step: 177950 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:42:35,441-Speed 2611.78 samples/sec Loss 10.7386 LearningRate 0.0617 Epoch: 4 Global Step: 177960 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:42:39,340-Speed 2626.89 samples/sec Loss 10.8662 LearningRate 0.0617 Epoch: 4 Global Step: 177970 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:42:43,231-Speed 2632.46 samples/sec Loss 10.6399 LearningRate 0.0617 Epoch: 4 Global Step: 177980 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:42:47,124-Speed 2631.52 samples/sec Loss 10.7031 LearningRate 0.0617 Epoch: 4 Global Step: 177990 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:42:51,020-Speed 2629.01 samples/sec Loss 10.7105 LearningRate 0.0617 Epoch: 4 Global Step: 178000 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:42:54,896-Speed 2642.01 samples/sec Loss 10.5960 LearningRate 0.0617 Epoch: 4 Global Step: 178010 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:42:58,793-Speed 2628.05 samples/sec Loss 10.5568 LearningRate 0.0617 Epoch: 4 Global Step: 178020 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:43:02,689-Speed 2629.39 samples/sec Loss 10.6569 LearningRate 0.0617 Epoch: 4 Global Step: 178030 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:43:06,583-Speed 2629.97 samples/sec Loss 10.7621 LearningRate 0.0617 Epoch: 4 Global Step: 178040 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:43:10,492-Speed 2620.12 samples/sec Loss 10.7767 LearningRate 0.0617 Epoch: 4 Global Step: 178050 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:43:14,390-Speed 2627.71 samples/sec Loss 10.6641 LearningRate 0.0617 Epoch: 4 Global Step: 178060 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:43:18,285-Speed 2629.56 samples/sec Loss 10.6103 LearningRate 0.0617 Epoch: 4 Global Step: 178070 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:43:22,179-Speed 2630.16 samples/sec Loss 10.8313 LearningRate 0.0617 Epoch: 4 Global Step: 178080 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:43:26,075-Speed 2629.07 samples/sec Loss 10.7969 LearningRate 0.0617 Epoch: 4 Global Step: 178090 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:43:29,968-Speed 2631.14 samples/sec Loss 10.6215 LearningRate 0.0617 Epoch: 4 Global Step: 178100 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:43:33,862-Speed 2630.06 samples/sec Loss 10.7521 LearningRate 0.0617 Epoch: 4 Global Step: 178110 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:43:37,754-Speed 2631.28 samples/sec Loss 10.7286 LearningRate 0.0617 Epoch: 4 Global Step: 178120 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:43:41,650-Speed 2629.64 samples/sec Loss 10.7450 LearningRate 0.0617 Epoch: 4 Global Step: 178130 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:43:45,560-Speed 2618.97 samples/sec Loss 10.7084 LearningRate 0.0617 Epoch: 4 Global Step: 178140 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:43:49,463-Speed 2624.30 samples/sec Loss 10.7583 LearningRate 0.0617 Epoch: 4 Global Step: 178150 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:43:53,356-Speed 2630.66 samples/sec Loss 10.7655 LearningRate 0.0617 Epoch: 4 Global Step: 178160 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:43:57,249-Speed 2631.52 samples/sec Loss 10.6423 LearningRate 0.0617 Epoch: 4 Global Step: 178170 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:44:01,188-Speed 2600.41 samples/sec Loss 10.7013 LearningRate 0.0617 Epoch: 4 Global Step: 178180 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:44:05,084-Speed 2628.53 samples/sec Loss 10.5544 LearningRate 0.0617 Epoch: 4 Global Step: 178190 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:44:08,984-Speed 2625.90 samples/sec Loss 10.7151 LearningRate 0.0617 Epoch: 4 Global Step: 178200 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:44:12,963-Speed 2574.20 samples/sec Loss 10.8585 LearningRate 0.0617 Epoch: 4 Global Step: 178210 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:44:16,916-Speed 2590.92 samples/sec Loss 10.6912 LearningRate 0.0616 Epoch: 4 Global Step: 178220 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:44:20,812-Speed 2628.75 samples/sec Loss 10.7455 LearningRate 0.0616 Epoch: 4 Global Step: 178230 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:44:24,712-Speed 2626.24 samples/sec Loss 10.6219 LearningRate 0.0616 Epoch: 4 Global Step: 178240 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:44:28,619-Speed 2621.78 samples/sec Loss 10.6290 LearningRate 0.0616 Epoch: 4 Global Step: 178250 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:44:32,513-Speed 2630.60 samples/sec Loss 10.6873 LearningRate 0.0616 Epoch: 4 Global Step: 178260 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:44:36,404-Speed 2632.71 samples/sec Loss 10.8280 LearningRate 0.0616 Epoch: 4 Global Step: 178270 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:44:40,312-Speed 2620.81 samples/sec Loss 10.6867 LearningRate 0.0616 Epoch: 4 Global Step: 178280 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:44:44,230-Speed 2614.08 samples/sec Loss 10.7628 LearningRate 0.0616 Epoch: 4 Global Step: 178290 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:44:48,138-Speed 2620.84 samples/sec Loss 10.7297 LearningRate 0.0616 Epoch: 4 Global Step: 178300 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:44:52,022-Speed 2636.88 samples/sec Loss 10.6877 LearningRate 0.0616 Epoch: 4 Global Step: 178310 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:44:55,928-Speed 2622.52 samples/sec Loss 10.6968 LearningRate 0.0616 Epoch: 4 Global Step: 178320 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:44:59,820-Speed 2631.43 samples/sec Loss 10.6623 LearningRate 0.0616 Epoch: 4 Global Step: 178330 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:45:03,713-Speed 2630.59 samples/sec Loss 10.8234 LearningRate 0.0616 Epoch: 4 Global Step: 178340 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:45:07,606-Speed 2631.58 samples/sec Loss 10.6855 LearningRate 0.0616 Epoch: 4 Global Step: 178350 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:45:11,499-Speed 2631.23 samples/sec Loss 10.6041 LearningRate 0.0616 Epoch: 4 Global Step: 178360 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:45:15,394-Speed 2629.08 samples/sec Loss 10.5299 LearningRate 0.0616 Epoch: 4 Global Step: 178370 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:45:19,294-Speed 2626.08 samples/sec Loss 10.6934 LearningRate 0.0616 Epoch: 4 Global Step: 178380 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:45:23,190-Speed 2628.84 samples/sec Loss 10.6114 LearningRate 0.0616 Epoch: 4 Global Step: 178390 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:45:27,089-Speed 2627.10 samples/sec Loss 10.6748 LearningRate 0.0616 Epoch: 4 Global Step: 178400 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:45:30,992-Speed 2623.96 samples/sec Loss 10.6957 LearningRate 0.0616 Epoch: 4 Global Step: 178410 Fp16 Grad Scale: 524288 Required: 73 hours
Training: 2022-04-13 15:45:34,876-Speed 2637.20 samples/sec Loss 10.6529 LearningRate 0.0616 Epoch: 4 Global Step: 178420 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:45:38,770-Speed 2630.70 samples/sec Loss 10.7334 LearningRate 0.0616 Epoch: 4 Global Step: 178430 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:45:42,665-Speed 2629.65 samples/sec Loss 10.6328 LearningRate 0.0616 Epoch: 4 Global Step: 178440 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:45:46,557-Speed 2631.20 samples/sec Loss 10.6947 LearningRate 0.0616 Epoch: 4 Global Step: 178450 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:45:50,450-Speed 2630.95 samples/sec Loss 10.5275 LearningRate 0.0616 Epoch: 4 Global Step: 178460 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:45:54,351-Speed 2625.72 samples/sec Loss 10.8108 LearningRate 0.0616 Epoch: 4 Global Step: 178470 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:45:58,246-Speed 2629.35 samples/sec Loss 10.6088 LearningRate 0.0616 Epoch: 4 Global Step: 178480 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:46:02,142-Speed 2629.14 samples/sec Loss 10.5937 LearningRate 0.0616 Epoch: 4 Global Step: 178490 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:46:06,039-Speed 2628.52 samples/sec Loss 10.7525 LearningRate 0.0616 Epoch: 4 Global Step: 178500 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:46:09,935-Speed 2628.54 samples/sec Loss 10.7295 LearningRate 0.0616 Epoch: 4 Global Step: 178510 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:46:13,815-Speed 2639.29 samples/sec Loss 10.7554 LearningRate 0.0616 Epoch: 4 Global Step: 178520 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:46:17,710-Speed 2629.47 samples/sec Loss 10.7040 LearningRate 0.0616 Epoch: 4 Global Step: 178530 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:46:21,605-Speed 2630.63 samples/sec Loss 10.7745 LearningRate 0.0616 Epoch: 4 Global Step: 178540 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:46:25,495-Speed 2632.38 samples/sec Loss 10.7126 LearningRate 0.0616 Epoch: 4 Global Step: 178550 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:46:29,389-Speed 2630.57 samples/sec Loss 10.6019 LearningRate 0.0616 Epoch: 4 Global Step: 178560 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:46:33,282-Speed 2631.17 samples/sec Loss 10.6452 LearningRate 0.0616 Epoch: 4 Global Step: 178570 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:46:37,176-Speed 2629.60 samples/sec Loss 10.8244 LearningRate 0.0616 Epoch: 4 Global Step: 178580 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:46:41,072-Speed 2628.89 samples/sec Loss 10.6246 LearningRate 0.0616 Epoch: 4 Global Step: 178590 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:46:44,972-Speed 2626.50 samples/sec Loss 10.6387 LearningRate 0.0616 Epoch: 4 Global Step: 178600 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:46:48,851-Speed 2639.91 samples/sec Loss 10.6630 LearningRate 0.0616 Epoch: 4 Global Step: 178610 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:46:52,758-Speed 2622.12 samples/sec Loss 10.7916 LearningRate 0.0616 Epoch: 4 Global Step: 178620 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:46:56,659-Speed 2624.99 samples/sec Loss 10.8277 LearningRate 0.0616 Epoch: 4 Global Step: 178630 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:47:00,559-Speed 2627.03 samples/sec Loss 10.6252 LearningRate 0.0616 Epoch: 4 Global Step: 178640 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:47:04,451-Speed 2631.30 samples/sec Loss 10.7371 LearningRate 0.0616 Epoch: 4 Global Step: 178650 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:47:08,347-Speed 2629.29 samples/sec Loss 10.6699 LearningRate 0.0616 Epoch: 4 Global Step: 178660 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:47:12,241-Speed 2629.93 samples/sec Loss 10.5969 LearningRate 0.0616 Epoch: 4 Global Step: 178670 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:47:16,136-Speed 2629.53 samples/sec Loss 10.5907 LearningRate 0.0616 Epoch: 4 Global Step: 178680 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:47:20,030-Speed 2630.25 samples/sec Loss 10.6133 LearningRate 0.0616 Epoch: 4 Global Step: 178690 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:47:23,936-Speed 2622.20 samples/sec Loss 10.5881 LearningRate 0.0616 Epoch: 4 Global Step: 178700 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:47:27,827-Speed 2632.51 samples/sec Loss 10.8299 LearningRate 0.0616 Epoch: 4 Global Step: 178710 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:47:31,722-Speed 2628.89 samples/sec Loss 10.6099 LearningRate 0.0616 Epoch: 4 Global Step: 178720 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:47:35,617-Speed 2629.98 samples/sec Loss 10.7039 LearningRate 0.0616 Epoch: 4 Global Step: 178730 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:47:39,522-Speed 2623.15 samples/sec Loss 10.6398 LearningRate 0.0616 Epoch: 4 Global Step: 178740 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:47:43,429-Speed 2621.58 samples/sec Loss 10.4824 LearningRate 0.0615 Epoch: 4 Global Step: 178750 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:47:47,330-Speed 2625.20 samples/sec Loss 10.4868 LearningRate 0.0615 Epoch: 4 Global Step: 178760 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:47:51,224-Speed 2630.64 samples/sec Loss 10.5890 LearningRate 0.0615 Epoch: 4 Global Step: 178770 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:47:55,098-Speed 2643.72 samples/sec Loss 10.5201 LearningRate 0.0615 Epoch: 4 Global Step: 178780 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:47:58,988-Speed 2632.70 samples/sec Loss 10.6658 LearningRate 0.0615 Epoch: 4 Global Step: 178790 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:48:02,882-Speed 2630.28 samples/sec Loss 10.7692 LearningRate 0.0615 Epoch: 4 Global Step: 178800 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:48:06,777-Speed 2629.24 samples/sec Loss 10.7885 LearningRate 0.0615 Epoch: 4 Global Step: 178810 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:48:10,670-Speed 2630.95 samples/sec Loss 10.7999 LearningRate 0.0615 Epoch: 4 Global Step: 178820 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:48:14,565-Speed 2630.11 samples/sec Loss 10.5970 LearningRate 0.0615 Epoch: 4 Global Step: 178830 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:48:18,459-Speed 2630.36 samples/sec Loss 10.7011 LearningRate 0.0615 Epoch: 4 Global Step: 178840 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:48:22,351-Speed 2631.29 samples/sec Loss 10.6921 LearningRate 0.0615 Epoch: 4 Global Step: 178850 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:48:26,244-Speed 2631.07 samples/sec Loss 10.5355 LearningRate 0.0615 Epoch: 4 Global Step: 178860 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:48:30,318-Speed 2514.26 samples/sec Loss 10.7501 LearningRate 0.0615 Epoch: 4 Global Step: 178870 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:48:34,355-Speed 2536.71 samples/sec Loss 10.6171 LearningRate 0.0615 Epoch: 4 Global Step: 178880 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:48:38,286-Speed 2605.51 samples/sec Loss 10.6193 LearningRate 0.0615 Epoch: 4 Global Step: 178890 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:48:42,222-Speed 2602.38 samples/sec Loss 10.7977 LearningRate 0.0615 Epoch: 4 Global Step: 178900 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:48:46,140-Speed 2613.74 samples/sec Loss 10.6376 LearningRate 0.0615 Epoch: 4 Global Step: 178910 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:48:50,033-Speed 2631.44 samples/sec Loss 10.7728 LearningRate 0.0615 Epoch: 4 Global Step: 178920 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:48:53,931-Speed 2627.75 samples/sec Loss 10.5955 LearningRate 0.0615 Epoch: 4 Global Step: 178930 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:48:57,840-Speed 2621.01 samples/sec Loss 10.6787 LearningRate 0.0615 Epoch: 4 Global Step: 178940 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:49:01,745-Speed 2622.37 samples/sec Loss 10.6520 LearningRate 0.0615 Epoch: 4 Global Step: 178950 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:49:05,637-Speed 2631.57 samples/sec Loss 10.7467 LearningRate 0.0615 Epoch: 4 Global Step: 178960 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:49:09,532-Speed 2629.26 samples/sec Loss 10.7096 LearningRate 0.0615 Epoch: 4 Global Step: 178970 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:49:13,493-Speed 2586.29 samples/sec Loss 10.8246 LearningRate 0.0615 Epoch: 4 Global Step: 178980 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:49:17,394-Speed 2625.31 samples/sec Loss 10.6916 LearningRate 0.0615 Epoch: 4 Global Step: 178990 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:49:21,271-Speed 2641.74 samples/sec Loss 10.6205 LearningRate 0.0615 Epoch: 4 Global Step: 179000 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:49:25,163-Speed 2631.73 samples/sec Loss 10.7163 LearningRate 0.0615 Epoch: 4 Global Step: 179010 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:49:29,057-Speed 2630.77 samples/sec Loss 10.6433 LearningRate 0.0615 Epoch: 4 Global Step: 179020 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:49:32,951-Speed 2630.28 samples/sec Loss 10.5948 LearningRate 0.0615 Epoch: 4 Global Step: 179030 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:49:36,843-Speed 2631.65 samples/sec Loss 10.6182 LearningRate 0.0615 Epoch: 4 Global Step: 179040 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:49:40,735-Speed 2631.05 samples/sec Loss 10.6851 LearningRate 0.0615 Epoch: 4 Global Step: 179050 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:49:44,631-Speed 2629.08 samples/sec Loss 10.4721 LearningRate 0.0615 Epoch: 4 Global Step: 179060 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:49:48,478-Speed 2662.72 samples/sec Loss 11.4678 LearningRate 0.0615 Epoch: 4 Global Step: 179070 Fp16 Grad Scale: 16384 Required: 73 hours
Training: 2022-04-13 15:49:52,371-Speed 2630.80 samples/sec Loss 11.3414 LearningRate 0.0615 Epoch: 4 Global Step: 179080 Fp16 Grad Scale: 16384 Required: 73 hours
Training: 2022-04-13 15:49:56,262-Speed 2632.50 samples/sec Loss 11.0457 LearningRate 0.0615 Epoch: 4 Global Step: 179090 Fp16 Grad Scale: 16384 Required: 73 hours
Training: 2022-04-13 15:50:00,158-Speed 2628.37 samples/sec Loss 10.8251 LearningRate 0.0615 Epoch: 4 Global Step: 179100 Fp16 Grad Scale: 16384 Required: 73 hours
Training: 2022-04-13 15:50:04,049-Speed 2632.94 samples/sec Loss 10.6977 LearningRate 0.0615 Epoch: 4 Global Step: 179110 Fp16 Grad Scale: 16384 Required: 73 hours
Training: 2022-04-13 15:50:07,937-Speed 2634.26 samples/sec Loss 10.7251 LearningRate 0.0615 Epoch: 4 Global Step: 179120 Fp16 Grad Scale: 16384 Required: 73 hours
Training: 2022-04-13 15:50:11,830-Speed 2631.24 samples/sec Loss 10.7323 LearningRate 0.0615 Epoch: 4 Global Step: 179130 Fp16 Grad Scale: 16384 Required: 73 hours
Training: 2022-04-13 15:50:15,720-Speed 2632.38 samples/sec Loss 10.6973 LearningRate 0.0615 Epoch: 4 Global Step: 179140 Fp16 Grad Scale: 16384 Required: 73 hours
Training: 2022-04-13 15:50:19,615-Speed 2629.35 samples/sec Loss 10.7236 LearningRate 0.0615 Epoch: 4 Global Step: 179150 Fp16 Grad Scale: 16384 Required: 73 hours
Training: 2022-04-13 15:50:23,516-Speed 2625.79 samples/sec Loss 10.7575 LearningRate 0.0615 Epoch: 4 Global Step: 179160 Fp16 Grad Scale: 16384 Required: 73 hours
Training: 2022-04-13 15:50:27,399-Speed 2637.80 samples/sec Loss 10.6121 LearningRate 0.0615 Epoch: 4 Global Step: 179170 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:50:31,294-Speed 2629.71 samples/sec Loss 10.7194 LearningRate 0.0615 Epoch: 4 Global Step: 179180 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:50:35,189-Speed 2629.34 samples/sec Loss 10.7210 LearningRate 0.0615 Epoch: 4 Global Step: 179190 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:50:39,083-Speed 2630.86 samples/sec Loss 10.8128 LearningRate 0.0615 Epoch: 4 Global Step: 179200 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:50:42,976-Speed 2631.26 samples/sec Loss 10.8504 LearningRate 0.0615 Epoch: 4 Global Step: 179210 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:50:46,865-Speed 2633.12 samples/sec Loss 10.7281 LearningRate 0.0615 Epoch: 4 Global Step: 179220 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:50:50,756-Speed 2632.29 samples/sec Loss 10.8003 LearningRate 0.0615 Epoch: 4 Global Step: 179230 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:50:54,645-Speed 2633.32 samples/sec Loss 10.7831 LearningRate 0.0615 Epoch: 4 Global Step: 179240 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:50:58,538-Speed 2631.54 samples/sec Loss 10.6746 LearningRate 0.0615 Epoch: 4 Global Step: 179250 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:51:02,432-Speed 2630.33 samples/sec Loss 10.5514 LearningRate 0.0615 Epoch: 4 Global Step: 179260 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 15:51:06,324-Speed 2631.98 samples/sec Loss 10.6155 LearningRate 0.0615 Epoch: 4 Global Step: 179270 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:51:10,216-Speed 2631.62 samples/sec Loss 10.5372 LearningRate 0.0614 Epoch: 4 Global Step: 179280 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:51:14,109-Speed 2631.31 samples/sec Loss 10.6337 LearningRate 0.0614 Epoch: 4 Global Step: 179290 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:51:18,004-Speed 2629.74 samples/sec Loss 11.4906 LearningRate 0.0614 Epoch: 4 Global Step: 179300 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:51:21,895-Speed 2632.23 samples/sec Loss 11.0863 LearningRate 0.0614 Epoch: 4 Global Step: 179310 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:51:25,790-Speed 2629.39 samples/sec Loss 10.8555 LearningRate 0.0614 Epoch: 4 Global Step: 179320 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:51:29,681-Speed 2632.50 samples/sec Loss 10.8270 LearningRate 0.0614 Epoch: 4 Global Step: 179330 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:51:33,574-Speed 2630.85 samples/sec Loss 10.7821 LearningRate 0.0614 Epoch: 4 Global Step: 179340 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:51:37,466-Speed 2631.54 samples/sec Loss 10.7586 LearningRate 0.0614 Epoch: 4 Global Step: 179350 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:51:41,356-Speed 2632.88 samples/sec Loss 10.6942 LearningRate 0.0614 Epoch: 4 Global Step: 179360 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:51:45,248-Speed 2631.72 samples/sec Loss 10.7069 LearningRate 0.0614 Epoch: 4 Global Step: 179370 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:51:49,141-Speed 2630.64 samples/sec Loss 10.7330 LearningRate 0.0614 Epoch: 4 Global Step: 179380 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:51:53,070-Speed 2607.33 samples/sec Loss 10.8350 LearningRate 0.0614 Epoch: 4 Global Step: 179390 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:51:56,994-Speed 2610.61 samples/sec Loss 10.6367 LearningRate 0.0614 Epoch: 4 Global Step: 179400 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:52:00,887-Speed 2630.55 samples/sec Loss 10.6038 LearningRate 0.0614 Epoch: 4 Global Step: 179410 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:52:04,809-Speed 2612.06 samples/sec Loss 10.7514 LearningRate 0.0614 Epoch: 4 Global Step: 179420 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:52:08,706-Speed 2628.32 samples/sec Loss 10.6394 LearningRate 0.0614 Epoch: 4 Global Step: 179430 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:52:12,598-Speed 2631.94 samples/sec Loss 10.6900 LearningRate 0.0614 Epoch: 4 Global Step: 179440 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:52:16,515-Speed 2614.72 samples/sec Loss 10.6217 LearningRate 0.0614 Epoch: 4 Global Step: 179450 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:52:20,410-Speed 2630.00 samples/sec Loss 10.7656 LearningRate 0.0614 Epoch: 4 Global Step: 179460 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:52:24,310-Speed 2626.08 samples/sec Loss 10.7729 LearningRate 0.0614 Epoch: 4 Global Step: 179470 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:52:28,202-Speed 2631.67 samples/sec Loss 10.6612 LearningRate 0.0614 Epoch: 4 Global Step: 179480 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:52:32,104-Speed 2624.75 samples/sec Loss 10.7255 LearningRate 0.0614 Epoch: 4 Global Step: 179490 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:52:35,980-Speed 2642.33 samples/sec Loss 10.6804 LearningRate 0.0614 Epoch: 4 Global Step: 179500 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:52:39,872-Speed 2631.80 samples/sec Loss 10.6216 LearningRate 0.0614 Epoch: 4 Global Step: 179510 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:52:43,762-Speed 2633.08 samples/sec Loss 10.7651 LearningRate 0.0614 Epoch: 4 Global Step: 179520 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:52:47,653-Speed 2632.22 samples/sec Loss 10.5641 LearningRate 0.0614 Epoch: 4 Global Step: 179530 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:52:51,547-Speed 2630.63 samples/sec Loss 10.6669 LearningRate 0.0614 Epoch: 4 Global Step: 179540 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:52:55,438-Speed 2631.78 samples/sec Loss 10.7712 LearningRate 0.0614 Epoch: 4 Global Step: 179550 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:52:59,330-Speed 2631.68 samples/sec Loss 10.5663 LearningRate 0.0614 Epoch: 4 Global Step: 179560 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:53:03,224-Speed 2630.07 samples/sec Loss 10.9089 LearningRate 0.0614 Epoch: 4 Global Step: 179570 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:53:07,119-Speed 2629.98 samples/sec Loss 10.6575 LearningRate 0.0614 Epoch: 4 Global Step: 179580 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:53:11,012-Speed 2631.42 samples/sec Loss 10.6964 LearningRate 0.0614 Epoch: 4 Global Step: 179590 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:53:14,893-Speed 2638.74 samples/sec Loss 10.8384 LearningRate 0.0614 Epoch: 4 Global Step: 179600 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:53:18,790-Speed 2628.45 samples/sec Loss 10.7343 LearningRate 0.0614 Epoch: 4 Global Step: 179610 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:53:22,680-Speed 2632.77 samples/sec Loss 10.6719 LearningRate 0.0614 Epoch: 4 Global Step: 179620 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:53:26,583-Speed 2624.86 samples/sec Loss 10.8384 LearningRate 0.0614 Epoch: 4 Global Step: 179630 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:53:30,473-Speed 2632.87 samples/sec Loss 10.6800 LearningRate 0.0614 Epoch: 4 Global Step: 179640 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:53:34,356-Speed 2638.04 samples/sec Loss 10.7496 LearningRate 0.0614 Epoch: 4 Global Step: 179650 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:53:38,246-Speed 2632.98 samples/sec Loss 10.8088 LearningRate 0.0614 Epoch: 4 Global Step: 179660 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:53:42,139-Speed 2631.30 samples/sec Loss 10.6878 LearningRate 0.0614 Epoch: 4 Global Step: 179670 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:53:46,030-Speed 2631.97 samples/sec Loss 10.6731 LearningRate 0.0614 Epoch: 4 Global Step: 179680 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:53:49,922-Speed 2632.04 samples/sec Loss 10.8416 LearningRate 0.0614 Epoch: 4 Global Step: 179690 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:53:53,816-Speed 2629.93 samples/sec Loss 10.6913 LearningRate 0.0614 Epoch: 4 Global Step: 179700 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:53:57,708-Speed 2632.20 samples/sec Loss 10.7023 LearningRate 0.0614 Epoch: 4 Global Step: 179710 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:54:01,598-Speed 2633.07 samples/sec Loss 10.6951 LearningRate 0.0614 Epoch: 4 Global Step: 179720 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:54:05,505-Speed 2621.16 samples/sec Loss 10.6363 LearningRate 0.0614 Epoch: 4 Global Step: 179730 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:54:09,400-Speed 2629.97 samples/sec Loss 10.6136 LearningRate 0.0614 Epoch: 4 Global Step: 179740 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 15:54:13,296-Speed 2629.48 samples/sec Loss 10.7743 LearningRate 0.0614 Epoch: 4 Global Step: 179750 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:54:17,194-Speed 2627.55 samples/sec Loss 10.5550 LearningRate 0.0614 Epoch: 4 Global Step: 179760 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:54:21,091-Speed 2628.12 samples/sec Loss 10.5721 LearningRate 0.0614 Epoch: 4 Global Step: 179770 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:54:24,982-Speed 2632.46 samples/sec Loss 10.6277 LearningRate 0.0614 Epoch: 4 Global Step: 179780 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:54:28,875-Speed 2631.35 samples/sec Loss 10.7339 LearningRate 0.0614 Epoch: 4 Global Step: 179790 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:54:32,766-Speed 2631.99 samples/sec Loss 10.7212 LearningRate 0.0614 Epoch: 4 Global Step: 179800 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:54:36,671-Speed 2622.63 samples/sec Loss 10.8057 LearningRate 0.0613 Epoch: 4 Global Step: 179810 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:54:40,574-Speed 2624.23 samples/sec Loss 10.5653 LearningRate 0.0613 Epoch: 4 Global Step: 179820 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:54:44,469-Speed 2630.05 samples/sec Loss 10.7153 LearningRate 0.0613 Epoch: 4 Global Step: 179830 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:54:48,376-Speed 2621.89 samples/sec Loss 10.7245 LearningRate 0.0613 Epoch: 4 Global Step: 179840 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:54:52,289-Speed 2617.62 samples/sec Loss 10.6649 LearningRate 0.0613 Epoch: 4 Global Step: 179850 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:54:56,240-Speed 2592.14 samples/sec Loss 10.5395 LearningRate 0.0613 Epoch: 4 Global Step: 179860 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:55:00,139-Speed 2627.10 samples/sec Loss 10.7672 LearningRate 0.0613 Epoch: 4 Global Step: 179870 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:55:04,037-Speed 2627.84 samples/sec Loss 10.7356 LearningRate 0.0613 Epoch: 4 Global Step: 179880 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:55:07,930-Speed 2630.56 samples/sec Loss 10.6357 LearningRate 0.0613 Epoch: 4 Global Step: 179890 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:55:11,825-Speed 2630.17 samples/sec Loss 10.7863 LearningRate 0.0613 Epoch: 4 Global Step: 179900 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:55:15,753-Speed 2607.18 samples/sec Loss 10.6860 LearningRate 0.0613 Epoch: 4 Global Step: 179910 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:55:19,648-Speed 2630.43 samples/sec Loss 10.6230 LearningRate 0.0613 Epoch: 4 Global Step: 179920 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:55:23,552-Speed 2623.49 samples/sec Loss 10.5793 LearningRate 0.0613 Epoch: 4 Global Step: 179930 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:55:27,453-Speed 2625.59 samples/sec Loss 10.6637 LearningRate 0.0613 Epoch: 4 Global Step: 179940 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:55:31,348-Speed 2630.10 samples/sec Loss 10.5728 LearningRate 0.0613 Epoch: 4 Global Step: 179950 Fp16 Grad Scale: 524288 Required: 73 hours
Training: 2022-04-13 15:55:35,229-Speed 2639.09 samples/sec Loss 10.7175 LearningRate 0.0613 Epoch: 4 Global Step: 179960 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:55:39,118-Speed 2633.78 samples/sec Loss 10.7092 LearningRate 0.0613 Epoch: 4 Global Step: 179970 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:55:43,112-Speed 2564.91 samples/sec Loss 10.7567 LearningRate 0.0613 Epoch: 4 Global Step: 179980 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:55:47,098-Speed 2569.48 samples/sec Loss 10.4948 LearningRate 0.0613 Epoch: 4 Global Step: 179990 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:55:50,989-Speed 2631.95 samples/sec Loss 10.6943 LearningRate 0.0613 Epoch: 4 Global Step: 180000 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:56:33,642-[lfw][180000]XNorm: 23.741681
Training: 2022-04-13 15:56:33,643-[lfw][180000]Accuracy-Flip: 0.99733+-0.00291
Training: 2022-04-13 15:56:33,644-[lfw][180000]Accuracy-Highest: 0.99783
Training: 2022-04-13 15:57:23,685-[cfp_fp][180000]XNorm: 21.689816
Training: 2022-04-13 15:57:23,686-[cfp_fp][180000]Accuracy-Flip: 0.98043+-0.00667
Training: 2022-04-13 15:57:23,687-[cfp_fp][180000]Accuracy-Highest: 0.98100
Training: 2022-04-13 15:58:06,863-[agedb_30][180000]XNorm: 23.205789
Training: 2022-04-13 15:58:06,867-[agedb_30][180000]Accuracy-Flip: 0.97150+-0.00769
Training: 2022-04-13 15:58:06,867-[agedb_30][180000]Accuracy-Highest: 0.97150
Training: 2022-04-13 15:58:10,730-Speed 73.28 samples/sec Loss 10.5278 LearningRate 0.0613 Epoch: 4 Global Step: 180010 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:58:14,592-Speed 2652.22 samples/sec Loss 10.6196 LearningRate 0.0613 Epoch: 4 Global Step: 180020 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:58:18,459-Speed 2648.84 samples/sec Loss 10.6917 LearningRate 0.0613 Epoch: 4 Global Step: 180030 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:58:22,322-Speed 2651.41 samples/sec Loss 10.5523 LearningRate 0.0613 Epoch: 4 Global Step: 180040 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:58:26,189-Speed 2648.36 samples/sec Loss 10.6624 LearningRate 0.0613 Epoch: 4 Global Step: 180050 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:58:30,062-Speed 2644.66 samples/sec Loss 10.7234 LearningRate 0.0613 Epoch: 4 Global Step: 180060 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:58:33,931-Speed 2648.06 samples/sec Loss 10.6265 LearningRate 0.0613 Epoch: 4 Global Step: 180070 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:58:37,805-Speed 2643.89 samples/sec Loss 10.6824 LearningRate 0.0613 Epoch: 4 Global Step: 180080 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:58:41,692-Speed 2635.19 samples/sec Loss 10.7101 LearningRate 0.0613 Epoch: 4 Global Step: 180090 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:58:45,573-Speed 2639.67 samples/sec Loss 10.6837 LearningRate 0.0613 Epoch: 4 Global Step: 180100 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:58:49,453-Speed 2639.66 samples/sec Loss 10.6291 LearningRate 0.0613 Epoch: 4 Global Step: 180110 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:58:53,337-Speed 2637.50 samples/sec Loss 10.5625 LearningRate 0.0613 Epoch: 4 Global Step: 180120 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:58:57,204-Speed 2648.41 samples/sec Loss 10.6789 LearningRate 0.0613 Epoch: 4 Global Step: 180130 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:59:01,088-Speed 2637.47 samples/sec Loss 10.6072 LearningRate 0.0613 Epoch: 4 Global Step: 180140 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:59:04,975-Speed 2634.96 samples/sec Loss 10.5723 LearningRate 0.0613 Epoch: 4 Global Step: 180150 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:59:08,871-Speed 2629.16 samples/sec Loss 10.6580 LearningRate 0.0613 Epoch: 4 Global Step: 180160 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:59:12,759-Speed 2637.26 samples/sec Loss 10.6919 LearningRate 0.0613 Epoch: 4 Global Step: 180170 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:59:16,649-Speed 2633.00 samples/sec Loss 10.5645 LearningRate 0.0613 Epoch: 4 Global Step: 180180 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:59:20,542-Speed 2632.45 samples/sec Loss 10.8033 LearningRate 0.0613 Epoch: 4 Global Step: 180190 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:59:24,433-Speed 2632.10 samples/sec Loss 10.6376 LearningRate 0.0613 Epoch: 4 Global Step: 180200 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:59:28,344-Speed 2618.25 samples/sec Loss 10.5730 LearningRate 0.0613 Epoch: 4 Global Step: 180210 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:59:32,240-Speed 2629.24 samples/sec Loss 10.6707 LearningRate 0.0613 Epoch: 4 Global Step: 180220 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:59:36,133-Speed 2631.32 samples/sec Loss 10.6471 LearningRate 0.0613 Epoch: 4 Global Step: 180230 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:59:40,023-Speed 2633.05 samples/sec Loss 10.8297 LearningRate 0.0613 Epoch: 4 Global Step: 180240 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:59:43,913-Speed 2633.14 samples/sec Loss 10.6352 LearningRate 0.0613 Epoch: 4 Global Step: 180250 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:59:47,809-Speed 2628.75 samples/sec Loss 10.7809 LearningRate 0.0613 Epoch: 4 Global Step: 180260 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:59:51,704-Speed 2630.24 samples/sec Loss 10.4698 LearningRate 0.0613 Epoch: 4 Global Step: 180270 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 15:59:55,588-Speed 2636.84 samples/sec Loss 10.7069 LearningRate 0.0613 Epoch: 4 Global Step: 180280 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 15:59:59,460-Speed 2645.71 samples/sec Loss 10.6178 LearningRate 0.0613 Epoch: 4 Global Step: 180290 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:00:03,354-Speed 2630.48 samples/sec Loss 10.5063 LearningRate 0.0613 Epoch: 4 Global Step: 180300 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:00:07,246-Speed 2631.62 samples/sec Loss 10.7155 LearningRate 0.0613 Epoch: 4 Global Step: 180310 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:00:11,138-Speed 2631.93 samples/sec Loss 10.8032 LearningRate 0.0613 Epoch: 4 Global Step: 180320 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:00:15,039-Speed 2625.64 samples/sec Loss 10.6509 LearningRate 0.0613 Epoch: 4 Global Step: 180330 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:00:18,952-Speed 2618.09 samples/sec Loss 10.6850 LearningRate 0.0612 Epoch: 4 Global Step: 180340 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:00:22,855-Speed 2623.65 samples/sec Loss 10.7459 LearningRate 0.0612 Epoch: 4 Global Step: 180350 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:00:26,745-Speed 2633.42 samples/sec Loss 10.7226 LearningRate 0.0612 Epoch: 4 Global Step: 180360 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:00:30,634-Speed 2633.53 samples/sec Loss 10.6855 LearningRate 0.0612 Epoch: 4 Global Step: 180370 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:00:34,524-Speed 2633.47 samples/sec Loss 10.6966 LearningRate 0.0612 Epoch: 4 Global Step: 180380 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:00:38,567-Speed 2533.00 samples/sec Loss 10.8163 LearningRate 0.0612 Epoch: 4 Global Step: 180390 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:00:42,457-Speed 2633.50 samples/sec Loss 10.6296 LearningRate 0.0612 Epoch: 4 Global Step: 180400 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:00:46,359-Speed 2625.30 samples/sec Loss 10.6958 LearningRate 0.0612 Epoch: 4 Global Step: 180410 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:00:50,241-Speed 2639.01 samples/sec Loss 10.6559 LearningRate 0.0612 Epoch: 4 Global Step: 180420 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:00:54,129-Speed 2633.95 samples/sec Loss 10.7385 LearningRate 0.0612 Epoch: 4 Global Step: 180430 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:00:58,016-Speed 2635.55 samples/sec Loss 10.7052 LearningRate 0.0612 Epoch: 4 Global Step: 180440 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:01:01,910-Speed 2630.01 samples/sec Loss 10.5470 LearningRate 0.0612 Epoch: 4 Global Step: 180450 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:01:05,818-Speed 2621.08 samples/sec Loss 10.6318 LearningRate 0.0612 Epoch: 4 Global Step: 180460 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:01:10,584-Speed 2149.00 samples/sec Loss 10.6697 LearningRate 0.0612 Epoch: 4 Global Step: 180470 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:01:14,472-Speed 2634.88 samples/sec Loss 10.7839 LearningRate 0.0612 Epoch: 4 Global Step: 180480 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:01:18,338-Speed 2648.70 samples/sec Loss 10.8303 LearningRate 0.0612 Epoch: 4 Global Step: 180490 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:01:22,310-Speed 2579.14 samples/sec Loss 10.5812 LearningRate 0.0612 Epoch: 4 Global Step: 180500 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:01:26,316-Speed 2556.49 samples/sec Loss 10.5814 LearningRate 0.0612 Epoch: 4 Global Step: 180510 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:01:30,201-Speed 2635.97 samples/sec Loss 10.7076 LearningRate 0.0612 Epoch: 4 Global Step: 180520 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:01:34,093-Speed 2631.66 samples/sec Loss 10.6563 LearningRate 0.0612 Epoch: 4 Global Step: 180530 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:01:37,987-Speed 2630.97 samples/sec Loss 10.6241 LearningRate 0.0612 Epoch: 4 Global Step: 180540 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:01:41,887-Speed 2626.25 samples/sec Loss 10.7666 LearningRate 0.0612 Epoch: 4 Global Step: 180550 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:01:45,782-Speed 2629.40 samples/sec Loss 10.7071 LearningRate 0.0612 Epoch: 4 Global Step: 180560 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:01:49,681-Speed 2627.84 samples/sec Loss 10.7200 LearningRate 0.0612 Epoch: 4 Global Step: 180570 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:01:53,573-Speed 2631.80 samples/sec Loss 10.6394 LearningRate 0.0612 Epoch: 4 Global Step: 180580 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:01:57,462-Speed 2633.59 samples/sec Loss 10.6807 LearningRate 0.0612 Epoch: 4 Global Step: 180590 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:02:01,443-Speed 2572.16 samples/sec Loss 10.6814 LearningRate 0.0612 Epoch: 4 Global Step: 180600 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:02:05,500-Speed 2525.33 samples/sec Loss 10.6107 LearningRate 0.0612 Epoch: 4 Global Step: 180610 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:02:09,595-Speed 2501.00 samples/sec Loss 10.6533 LearningRate 0.0612 Epoch: 4 Global Step: 180620 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:02:13,535-Speed 2599.77 samples/sec Loss 10.5806 LearningRate 0.0612 Epoch: 4 Global Step: 180630 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:02:17,439-Speed 2623.96 samples/sec Loss 10.5941 LearningRate 0.0612 Epoch: 4 Global Step: 180640 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:02:21,329-Speed 2632.65 samples/sec Loss 10.6386 LearningRate 0.0612 Epoch: 4 Global Step: 180650 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:02:25,219-Speed 2633.27 samples/sec Loss 10.5028 LearningRate 0.0612 Epoch: 4 Global Step: 180660 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:02:29,124-Speed 2622.47 samples/sec Loss 10.8272 LearningRate 0.0612 Epoch: 4 Global Step: 180670 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:02:33,014-Speed 2633.17 samples/sec Loss 10.5021 LearningRate 0.0612 Epoch: 4 Global Step: 180680 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:02:36,905-Speed 2632.43 samples/sec Loss 10.7250 LearningRate 0.0612 Epoch: 4 Global Step: 180690 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:02:40,803-Speed 2627.39 samples/sec Loss 10.7216 LearningRate 0.0612 Epoch: 4 Global Step: 180700 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:02:44,695-Speed 2631.51 samples/sec Loss 10.6586 LearningRate 0.0612 Epoch: 4 Global Step: 180710 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:02:48,606-Speed 2619.37 samples/sec Loss 10.7203 LearningRate 0.0612 Epoch: 4 Global Step: 180720 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:02:52,505-Speed 2626.87 samples/sec Loss 10.6982 LearningRate 0.0612 Epoch: 4 Global Step: 180730 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:02:56,440-Speed 2602.74 samples/sec Loss 10.5833 LearningRate 0.0612 Epoch: 4 Global Step: 180740 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:03:00,336-Speed 2629.09 samples/sec Loss 10.5545 LearningRate 0.0612 Epoch: 4 Global Step: 180750 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:03:04,229-Speed 2630.83 samples/sec Loss 10.6596 LearningRate 0.0612 Epoch: 4 Global Step: 180760 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:03:08,121-Speed 2631.44 samples/sec Loss 10.5933 LearningRate 0.0612 Epoch: 4 Global Step: 180770 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:03:12,021-Speed 2626.22 samples/sec Loss 10.7196 LearningRate 0.0612 Epoch: 4 Global Step: 180780 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:03:15,921-Speed 2626.77 samples/sec Loss 10.6851 LearningRate 0.0612 Epoch: 4 Global Step: 180790 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:03:19,819-Speed 2627.41 samples/sec Loss 10.5461 LearningRate 0.0612 Epoch: 4 Global Step: 180800 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:03:23,699-Speed 2639.65 samples/sec Loss 10.5943 LearningRate 0.0612 Epoch: 4 Global Step: 180810 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:03:27,587-Speed 2634.60 samples/sec Loss 10.7442 LearningRate 0.0612 Epoch: 4 Global Step: 180820 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:03:31,487-Speed 2626.54 samples/sec Loss 10.7036 LearningRate 0.0612 Epoch: 4 Global Step: 180830 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:03:35,464-Speed 2575.19 samples/sec Loss 10.7254 LearningRate 0.0612 Epoch: 4 Global Step: 180840 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:03:39,370-Speed 2622.74 samples/sec Loss 10.6948 LearningRate 0.0612 Epoch: 4 Global Step: 180850 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:03:43,274-Speed 2623.59 samples/sec Loss 10.6432 LearningRate 0.0612 Epoch: 4 Global Step: 180860 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:03:47,168-Speed 2630.66 samples/sec Loss 10.6108 LearningRate 0.0611 Epoch: 4 Global Step: 180870 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:03:51,061-Speed 2630.62 samples/sec Loss 10.5412 LearningRate 0.0611 Epoch: 4 Global Step: 180880 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:03:54,988-Speed 2608.43 samples/sec Loss 10.6483 LearningRate 0.0611 Epoch: 4 Global Step: 180890 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:03:58,885-Speed 2628.43 samples/sec Loss 10.6220 LearningRate 0.0611 Epoch: 4 Global Step: 180900 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:04:02,762-Speed 2645.49 samples/sec Loss 10.6241 LearningRate 0.0611 Epoch: 4 Global Step: 180910 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:04:06,656-Speed 2630.00 samples/sec Loss 10.7727 LearningRate 0.0611 Epoch: 4 Global Step: 180920 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:04:10,539-Speed 2637.44 samples/sec Loss 10.5844 LearningRate 0.0611 Epoch: 4 Global Step: 180930 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:04:14,431-Speed 2631.98 samples/sec Loss 10.5316 LearningRate 0.0611 Epoch: 4 Global Step: 180940 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:04:18,335-Speed 2623.38 samples/sec Loss 10.6153 LearningRate 0.0611 Epoch: 4 Global Step: 180950 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:04:22,253-Speed 2614.08 samples/sec Loss 10.6959 LearningRate 0.0611 Epoch: 4 Global Step: 180960 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:04:26,172-Speed 2613.55 samples/sec Loss 10.6533 LearningRate 0.0611 Epoch: 4 Global Step: 180970 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:04:30,088-Speed 2616.20 samples/sec Loss 10.8060 LearningRate 0.0611 Epoch: 4 Global Step: 180980 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:04:34,439-Speed 2354.05 samples/sec Loss 10.5894 LearningRate 0.0611 Epoch: 4 Global Step: 180990 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:04:38,329-Speed 2632.34 samples/sec Loss 10.4178 LearningRate 0.0611 Epoch: 4 Global Step: 181000 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:04:42,245-Speed 2616.03 samples/sec Loss 10.5947 LearningRate 0.0611 Epoch: 4 Global Step: 181010 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:04:46,132-Speed 2635.24 samples/sec Loss 10.6188 LearningRate 0.0611 Epoch: 4 Global Step: 181020 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:04:50,026-Speed 2629.85 samples/sec Loss 10.6427 LearningRate 0.0611 Epoch: 4 Global Step: 181030 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:04:53,921-Speed 2630.28 samples/sec Loss 10.6929 LearningRate 0.0611 Epoch: 4 Global Step: 181040 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:04:57,818-Speed 2628.64 samples/sec Loss 10.5807 LearningRate 0.0611 Epoch: 4 Global Step: 181050 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:05:01,713-Speed 2629.35 samples/sec Loss 10.5619 LearningRate 0.0611 Epoch: 4 Global Step: 181060 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:05:05,616-Speed 2624.52 samples/sec Loss 10.6897 LearningRate 0.0611 Epoch: 4 Global Step: 181070 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:05:09,513-Speed 2628.18 samples/sec Loss 10.5009 LearningRate 0.0611 Epoch: 4 Global Step: 181080 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:05:13,391-Speed 2641.06 samples/sec Loss 10.5480 LearningRate 0.0611 Epoch: 4 Global Step: 181090 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:05:17,285-Speed 2630.64 samples/sec Loss 10.6286 LearningRate 0.0611 Epoch: 4 Global Step: 181100 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:05:21,178-Speed 2630.42 samples/sec Loss 10.6525 LearningRate 0.0611 Epoch: 4 Global Step: 181110 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:05:25,077-Speed 2627.70 samples/sec Loss 10.5186 LearningRate 0.0611 Epoch: 4 Global Step: 181120 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:05:28,992-Speed 2616.06 samples/sec Loss 10.6526 LearningRate 0.0611 Epoch: 4 Global Step: 181130 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:05:32,897-Speed 2623.05 samples/sec Loss 10.6625 LearningRate 0.0611 Epoch: 4 Global Step: 181140 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:05:36,794-Speed 2627.57 samples/sec Loss 10.5616 LearningRate 0.0611 Epoch: 4 Global Step: 181150 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:05:40,691-Speed 2628.51 samples/sec Loss 10.5912 LearningRate 0.0611 Epoch: 4 Global Step: 181160 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:05:44,599-Speed 2620.58 samples/sec Loss 10.6122 LearningRate 0.0611 Epoch: 4 Global Step: 181170 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:05:48,495-Speed 2629.18 samples/sec Loss 10.5695 LearningRate 0.0611 Epoch: 4 Global Step: 181180 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:05:52,380-Speed 2636.57 samples/sec Loss 10.6333 LearningRate 0.0611 Epoch: 4 Global Step: 181190 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:05:56,277-Speed 2627.96 samples/sec Loss 10.6033 LearningRate 0.0611 Epoch: 4 Global Step: 181200 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:06:00,174-Speed 2628.73 samples/sec Loss 10.6872 LearningRate 0.0611 Epoch: 4 Global Step: 181210 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:06:04,085-Speed 2618.83 samples/sec Loss 10.6635 LearningRate 0.0611 Epoch: 4 Global Step: 181220 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:06:07,990-Speed 2622.82 samples/sec Loss 10.4773 LearningRate 0.0611 Epoch: 4 Global Step: 181230 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:06:11,888-Speed 2627.26 samples/sec Loss 10.6653 LearningRate 0.0611 Epoch: 4 Global Step: 181240 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:06:15,786-Speed 2627.63 samples/sec Loss 10.5633 LearningRate 0.0611 Epoch: 4 Global Step: 181250 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:06:19,684-Speed 2627.74 samples/sec Loss 10.6247 LearningRate 0.0611 Epoch: 4 Global Step: 181260 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:06:23,580-Speed 2628.68 samples/sec Loss 10.5597 LearningRate 0.0611 Epoch: 4 Global Step: 181270 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:06:27,481-Speed 2626.15 samples/sec Loss 10.6892 LearningRate 0.0611 Epoch: 4 Global Step: 181280 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:06:31,383-Speed 2624.80 samples/sec Loss 10.8397 LearningRate 0.0611 Epoch: 4 Global Step: 181290 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:06:35,318-Speed 2602.77 samples/sec Loss 10.5164 LearningRate 0.0611 Epoch: 4 Global Step: 181300 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:06:39,225-Speed 2621.72 samples/sec Loss 10.6927 LearningRate 0.0611 Epoch: 4 Global Step: 181310 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:06:43,121-Speed 2628.46 samples/sec Loss 10.5991 LearningRate 0.0611 Epoch: 4 Global Step: 181320 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:06:47,018-Speed 2628.64 samples/sec Loss 10.6495 LearningRate 0.0611 Epoch: 4 Global Step: 181330 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:06:50,912-Speed 2629.96 samples/sec Loss 10.6532 LearningRate 0.0611 Epoch: 4 Global Step: 181340 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:06:54,813-Speed 2625.95 samples/sec Loss 10.5700 LearningRate 0.0611 Epoch: 4 Global Step: 181350 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:06:58,715-Speed 2625.35 samples/sec Loss 10.4529 LearningRate 0.0611 Epoch: 4 Global Step: 181360 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:07:02,617-Speed 2624.93 samples/sec Loss 10.6327 LearningRate 0.0611 Epoch: 4 Global Step: 181370 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:07:06,512-Speed 2629.61 samples/sec Loss 10.6759 LearningRate 0.0611 Epoch: 4 Global Step: 181380 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:07:10,410-Speed 2627.40 samples/sec Loss 10.5057 LearningRate 0.0611 Epoch: 4 Global Step: 181390 Fp16 Grad Scale: 524288 Required: 73 hours
Training: 2022-04-13 16:07:14,289-Speed 2640.50 samples/sec Loss 10.7592 LearningRate 0.0610 Epoch: 4 Global Step: 181400 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:07:18,171-Speed 2638.37 samples/sec Loss 10.5395 LearningRate 0.0610 Epoch: 4 Global Step: 181410 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:07:22,067-Speed 2628.80 samples/sec Loss 10.5523 LearningRate 0.0610 Epoch: 4 Global Step: 181420 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:07:25,960-Speed 2630.89 samples/sec Loss 10.6261 LearningRate 0.0610 Epoch: 4 Global Step: 181430 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:07:29,862-Speed 2625.65 samples/sec Loss 10.6696 LearningRate 0.0610 Epoch: 4 Global Step: 181440 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:07:33,753-Speed 2632.40 samples/sec Loss 10.5726 LearningRate 0.0610 Epoch: 4 Global Step: 181450 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:07:37,669-Speed 2615.37 samples/sec Loss 10.7882 LearningRate 0.0610 Epoch: 4 Global Step: 181460 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:07:41,588-Speed 2613.72 samples/sec Loss 10.7497 LearningRate 0.0610 Epoch: 4 Global Step: 181470 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:07:45,507-Speed 2613.52 samples/sec Loss 10.6367 LearningRate 0.0610 Epoch: 4 Global Step: 181480 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:07:49,416-Speed 2620.02 samples/sec Loss 10.4848 LearningRate 0.0610 Epoch: 4 Global Step: 181490 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:07:53,302-Speed 2635.99 samples/sec Loss 10.5479 LearningRate 0.0610 Epoch: 4 Global Step: 181500 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:07:57,200-Speed 2628.47 samples/sec Loss 10.5169 LearningRate 0.0610 Epoch: 4 Global Step: 181510 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:08:01,117-Speed 2614.85 samples/sec Loss 10.6270 LearningRate 0.0610 Epoch: 4 Global Step: 181520 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:08:05,038-Speed 2611.80 samples/sec Loss 10.5744 LearningRate 0.0610 Epoch: 4 Global Step: 181530 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:08:08,969-Speed 2605.53 samples/sec Loss 10.5640 LearningRate 0.0610 Epoch: 4 Global Step: 181540 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:08:12,885-Speed 2615.67 samples/sec Loss 10.6765 LearningRate 0.0610 Epoch: 4 Global Step: 181550 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:08:16,783-Speed 2627.64 samples/sec Loss 10.7293 LearningRate 0.0610 Epoch: 4 Global Step: 181560 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:08:20,677-Speed 2630.44 samples/sec Loss 10.5180 LearningRate 0.0610 Epoch: 4 Global Step: 181570 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:08:24,572-Speed 2629.61 samples/sec Loss 10.4729 LearningRate 0.0610 Epoch: 4 Global Step: 181580 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:08:28,486-Speed 2617.30 samples/sec Loss 10.5045 LearningRate 0.0610 Epoch: 4 Global Step: 181590 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:08:32,375-Speed 2634.19 samples/sec Loss 10.6519 LearningRate 0.0610 Epoch: 4 Global Step: 181600 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:08:36,268-Speed 2630.38 samples/sec Loss 10.6497 LearningRate 0.0610 Epoch: 4 Global Step: 181610 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:08:40,180-Speed 2618.19 samples/sec Loss 10.7381 LearningRate 0.0610 Epoch: 4 Global Step: 181620 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:08:44,074-Speed 2630.74 samples/sec Loss 10.6309 LearningRate 0.0610 Epoch: 4 Global Step: 181630 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:08:47,967-Speed 2631.08 samples/sec Loss 10.4371 LearningRate 0.0610 Epoch: 4 Global Step: 181640 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:08:51,865-Speed 2628.10 samples/sec Loss 10.5574 LearningRate 0.0610 Epoch: 4 Global Step: 181650 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:08:55,765-Speed 2626.11 samples/sec Loss 10.5925 LearningRate 0.0610 Epoch: 4 Global Step: 181660 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:08:59,674-Speed 2620.22 samples/sec Loss 10.6029 LearningRate 0.0610 Epoch: 4 Global Step: 181670 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:09:03,568-Speed 2630.15 samples/sec Loss 10.5914 LearningRate 0.0610 Epoch: 4 Global Step: 181680 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:09:07,466-Speed 2627.85 samples/sec Loss 10.6755 LearningRate 0.0610 Epoch: 4 Global Step: 181690 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:09:11,360-Speed 2630.28 samples/sec Loss 10.6614 LearningRate 0.0610 Epoch: 4 Global Step: 181700 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:09:15,257-Speed 2628.46 samples/sec Loss 10.5906 LearningRate 0.0610 Epoch: 4 Global Step: 181710 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:09:19,153-Speed 2628.97 samples/sec Loss 10.5792 LearningRate 0.0610 Epoch: 4 Global Step: 181720 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:09:23,054-Speed 2625.61 samples/sec Loss 10.6830 LearningRate 0.0610 Epoch: 4 Global Step: 181730 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:09:26,936-Speed 2639.11 samples/sec Loss 10.5417 LearningRate 0.0610 Epoch: 4 Global Step: 181740 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:09:30,834-Speed 2627.39 samples/sec Loss 10.7139 LearningRate 0.0610 Epoch: 4 Global Step: 181750 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:09:34,738-Speed 2623.25 samples/sec Loss 10.5911 LearningRate 0.0610 Epoch: 4 Global Step: 181760 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:09:38,617-Speed 2640.44 samples/sec Loss 10.5306 LearningRate 0.0610 Epoch: 4 Global Step: 181770 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:09:42,517-Speed 2626.44 samples/sec Loss 10.6007 LearningRate 0.0610 Epoch: 4 Global Step: 181780 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:09:46,409-Speed 2632.09 samples/sec Loss 10.6638 LearningRate 0.0610 Epoch: 4 Global Step: 181790 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:09:50,304-Speed 2629.62 samples/sec Loss 10.6790 LearningRate 0.0610 Epoch: 4 Global Step: 181800 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:09:54,204-Speed 2626.24 samples/sec Loss 10.4568 LearningRate 0.0610 Epoch: 4 Global Step: 181810 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:09:58,094-Speed 2633.07 samples/sec Loss 10.6224 LearningRate 0.0610 Epoch: 4 Global Step: 181820 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:10:01,987-Speed 2631.05 samples/sec Loss 10.5340 LearningRate 0.0610 Epoch: 4 Global Step: 181830 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:10:05,902-Speed 2616.41 samples/sec Loss 10.7226 LearningRate 0.0610 Epoch: 4 Global Step: 181840 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:10:09,829-Speed 2608.07 samples/sec Loss 10.6149 LearningRate 0.0610 Epoch: 4 Global Step: 181850 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:10:13,725-Speed 2629.85 samples/sec Loss 10.6026 LearningRate 0.0610 Epoch: 4 Global Step: 181860 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:10:17,617-Speed 2632.20 samples/sec Loss 10.6847 LearningRate 0.0610 Epoch: 4 Global Step: 181870 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:10:21,521-Speed 2623.28 samples/sec Loss 10.7361 LearningRate 0.0610 Epoch: 4 Global Step: 181880 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:10:25,414-Speed 2631.36 samples/sec Loss 10.6512 LearningRate 0.0610 Epoch: 4 Global Step: 181890 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:10:29,315-Speed 2625.55 samples/sec Loss 10.6573 LearningRate 0.0610 Epoch: 4 Global Step: 181900 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:10:33,211-Speed 2628.49 samples/sec Loss 10.5991 LearningRate 0.0610 Epoch: 4 Global Step: 181910 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:10:37,124-Speed 2617.97 samples/sec Loss 10.5044 LearningRate 0.0610 Epoch: 4 Global Step: 181920 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:10:41,025-Speed 2625.75 samples/sec Loss 10.5258 LearningRate 0.0609 Epoch: 4 Global Step: 181930 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:10:44,922-Speed 2628.35 samples/sec Loss 10.5621 LearningRate 0.0609 Epoch: 4 Global Step: 181940 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:10:48,818-Speed 2630.05 samples/sec Loss 10.7525 LearningRate 0.0609 Epoch: 4 Global Step: 181950 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:10:52,714-Speed 2628.43 samples/sec Loss 10.5647 LearningRate 0.0609 Epoch: 4 Global Step: 181960 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:10:56,613-Speed 2627.59 samples/sec Loss 10.6026 LearningRate 0.0609 Epoch: 4 Global Step: 181970 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:11:00,508-Speed 2629.49 samples/sec Loss 10.6529 LearningRate 0.0609 Epoch: 4 Global Step: 181980 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:11:04,419-Speed 2618.92 samples/sec Loss 10.6301 LearningRate 0.0609 Epoch: 4 Global Step: 181990 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:11:08,325-Speed 2621.94 samples/sec Loss 10.5934 LearningRate 0.0609 Epoch: 4 Global Step: 182000 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:11:12,226-Speed 2626.53 samples/sec Loss 10.6935 LearningRate 0.0609 Epoch: 4 Global Step: 182010 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:11:16,124-Speed 2627.43 samples/sec Loss 10.7150 LearningRate 0.0609 Epoch: 4 Global Step: 182020 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:11:20,029-Speed 2623.01 samples/sec Loss 10.6341 LearningRate 0.0609 Epoch: 4 Global Step: 182030 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:11:23,950-Speed 2612.12 samples/sec Loss 10.5878 LearningRate 0.0609 Epoch: 4 Global Step: 182040 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:11:27,853-Speed 2624.88 samples/sec Loss 10.7228 LearningRate 0.0609 Epoch: 4 Global Step: 182050 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:11:31,750-Speed 2627.99 samples/sec Loss 10.6698 LearningRate 0.0609 Epoch: 4 Global Step: 182060 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:11:35,650-Speed 2626.18 samples/sec Loss 10.5196 LearningRate 0.0609 Epoch: 4 Global Step: 182070 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:11:39,548-Speed 2627.35 samples/sec Loss 10.4910 LearningRate 0.0609 Epoch: 4 Global Step: 182080 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:11:43,450-Speed 2624.85 samples/sec Loss 10.5626 LearningRate 0.0609 Epoch: 4 Global Step: 182090 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:11:47,486-Speed 2538.62 samples/sec Loss 10.7266 LearningRate 0.0609 Epoch: 4 Global Step: 182100 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:11:51,450-Speed 2584.08 samples/sec Loss 10.6156 LearningRate 0.0609 Epoch: 4 Global Step: 182110 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:11:55,350-Speed 2626.08 samples/sec Loss 10.7578 LearningRate 0.0609 Epoch: 4 Global Step: 182120 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:11:59,248-Speed 2627.35 samples/sec Loss 10.5461 LearningRate 0.0609 Epoch: 4 Global Step: 182130 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:12:03,150-Speed 2624.58 samples/sec Loss 10.6829 LearningRate 0.0609 Epoch: 4 Global Step: 182140 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:12:07,049-Speed 2627.36 samples/sec Loss 10.5379 LearningRate 0.0609 Epoch: 4 Global Step: 182150 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:12:10,959-Speed 2619.51 samples/sec Loss 10.7760 LearningRate 0.0609 Epoch: 4 Global Step: 182160 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:12:14,878-Speed 2613.55 samples/sec Loss 10.5968 LearningRate 0.0609 Epoch: 4 Global Step: 182170 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:12:18,781-Speed 2624.79 samples/sec Loss 10.5206 LearningRate 0.0609 Epoch: 4 Global Step: 182180 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:12:22,691-Speed 2620.11 samples/sec Loss 10.7389 LearningRate 0.0609 Epoch: 4 Global Step: 182190 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:12:26,586-Speed 2629.42 samples/sec Loss 10.5760 LearningRate 0.0609 Epoch: 4 Global Step: 182200 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:12:30,483-Speed 2629.01 samples/sec Loss 10.5448 LearningRate 0.0609 Epoch: 4 Global Step: 182210 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:12:34,399-Speed 2615.33 samples/sec Loss 10.5898 LearningRate 0.0609 Epoch: 4 Global Step: 182220 Fp16 Grad Scale: 524288 Required: 73 hours
Training: 2022-04-13 16:12:38,285-Speed 2635.81 samples/sec Loss 10.5977 LearningRate 0.0609 Epoch: 4 Global Step: 182230 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:12:42,183-Speed 2627.45 samples/sec Loss 10.6495 LearningRate 0.0609 Epoch: 4 Global Step: 182240 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:12:46,085-Speed 2625.37 samples/sec Loss 10.5565 LearningRate 0.0609 Epoch: 4 Global Step: 182250 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:12:49,985-Speed 2625.86 samples/sec Loss 10.5493 LearningRate 0.0609 Epoch: 4 Global Step: 182260 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:12:53,938-Speed 2591.19 samples/sec Loss 10.6555 LearningRate 0.0609 Epoch: 4 Global Step: 182270 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:12:57,959-Speed 2547.69 samples/sec Loss 10.8131 LearningRate 0.0609 Epoch: 4 Global Step: 182280 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:13:01,856-Speed 2628.16 samples/sec Loss 10.6297 LearningRate 0.0609 Epoch: 4 Global Step: 182290 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:13:05,785-Speed 2607.03 samples/sec Loss 10.6467 LearningRate 0.0609 Epoch: 4 Global Step: 182300 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:13:09,685-Speed 2626.60 samples/sec Loss 10.6107 LearningRate 0.0609 Epoch: 4 Global Step: 182310 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:13:13,710-Speed 2544.72 samples/sec Loss 10.7768 LearningRate 0.0609 Epoch: 4 Global Step: 182320 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:13:17,734-Speed 2545.45 samples/sec Loss 10.6302 LearningRate 0.0609 Epoch: 4 Global Step: 182330 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:13:21,638-Speed 2623.61 samples/sec Loss 10.5243 LearningRate 0.0609 Epoch: 4 Global Step: 182340 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:13:25,535-Speed 2628.10 samples/sec Loss 10.7775 LearningRate 0.0609 Epoch: 4 Global Step: 182350 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:13:29,441-Speed 2622.39 samples/sec Loss 10.5437 LearningRate 0.0609 Epoch: 4 Global Step: 182360 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:13:33,382-Speed 2599.46 samples/sec Loss 10.7610 LearningRate 0.0609 Epoch: 4 Global Step: 182370 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:13:37,292-Speed 2619.67 samples/sec Loss 10.6245 LearningRate 0.0609 Epoch: 4 Global Step: 182380 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:13:41,178-Speed 2635.19 samples/sec Loss 10.7282 LearningRate 0.0609 Epoch: 4 Global Step: 182390 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:13:45,087-Speed 2620.23 samples/sec Loss 10.5268 LearningRate 0.0609 Epoch: 4 Global Step: 182400 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:13:48,994-Speed 2622.19 samples/sec Loss 10.6992 LearningRate 0.0609 Epoch: 4 Global Step: 182410 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:13:52,900-Speed 2622.63 samples/sec Loss 10.6386 LearningRate 0.0609 Epoch: 4 Global Step: 182420 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:13:56,811-Speed 2619.17 samples/sec Loss 10.5813 LearningRate 0.0609 Epoch: 4 Global Step: 182430 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:14:00,742-Speed 2605.69 samples/sec Loss 10.6616 LearningRate 0.0609 Epoch: 4 Global Step: 182440 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:14:04,647-Speed 2622.79 samples/sec Loss 10.6459 LearningRate 0.0609 Epoch: 4 Global Step: 182450 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:14:08,560-Speed 2617.74 samples/sec Loss 10.6187 LearningRate 0.0608 Epoch: 4 Global Step: 182460 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:14:12,470-Speed 2619.70 samples/sec Loss 10.4066 LearningRate 0.0608 Epoch: 4 Global Step: 182470 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:14:16,377-Speed 2621.76 samples/sec Loss 10.5464 LearningRate 0.0608 Epoch: 4 Global Step: 182480 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:14:20,274-Speed 2628.05 samples/sec Loss 10.5142 LearningRate 0.0608 Epoch: 4 Global Step: 182490 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:14:24,174-Speed 2627.09 samples/sec Loss 10.6155 LearningRate 0.0608 Epoch: 4 Global Step: 182500 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:14:28,150-Speed 2575.98 samples/sec Loss 10.6917 LearningRate 0.0608 Epoch: 4 Global Step: 182510 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:14:32,080-Speed 2606.18 samples/sec Loss 10.5755 LearningRate 0.0608 Epoch: 4 Global Step: 182520 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:14:35,988-Speed 2621.38 samples/sec Loss 10.5887 LearningRate 0.0608 Epoch: 4 Global Step: 182530 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:14:39,883-Speed 2628.95 samples/sec Loss 10.6140 LearningRate 0.0608 Epoch: 4 Global Step: 182540 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:14:43,794-Speed 2619.22 samples/sec Loss 10.5853 LearningRate 0.0608 Epoch: 4 Global Step: 182550 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:14:47,703-Speed 2620.14 samples/sec Loss 10.6279 LearningRate 0.0608 Epoch: 4 Global Step: 182560 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:14:51,620-Speed 2615.09 samples/sec Loss 10.6035 LearningRate 0.0608 Epoch: 4 Global Step: 182570 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:14:55,542-Speed 2611.73 samples/sec Loss 10.7647 LearningRate 0.0608 Epoch: 4 Global Step: 182580 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:14:59,444-Speed 2624.90 samples/sec Loss 10.6958 LearningRate 0.0608 Epoch: 4 Global Step: 182590 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:15:03,344-Speed 2626.34 samples/sec Loss 10.5617 LearningRate 0.0608 Epoch: 4 Global Step: 182600 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:15:07,240-Speed 2628.73 samples/sec Loss 10.5553 LearningRate 0.0608 Epoch: 4 Global Step: 182610 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:15:11,136-Speed 2629.12 samples/sec Loss 10.5350 LearningRate 0.0608 Epoch: 4 Global Step: 182620 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:15:15,033-Speed 2628.31 samples/sec Loss 10.6299 LearningRate 0.0608 Epoch: 4 Global Step: 182630 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:15:18,956-Speed 2610.90 samples/sec Loss 10.6529 LearningRate 0.0608 Epoch: 4 Global Step: 182640 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:15:22,851-Speed 2629.70 samples/sec Loss 10.5863 LearningRate 0.0608 Epoch: 4 Global Step: 182650 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:15:26,748-Speed 2628.90 samples/sec Loss 10.5848 LearningRate 0.0608 Epoch: 4 Global Step: 182660 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:15:30,651-Speed 2624.14 samples/sec Loss 10.6103 LearningRate 0.0608 Epoch: 4 Global Step: 182670 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:15:34,559-Speed 2620.76 samples/sec Loss 10.4880 LearningRate 0.0608 Epoch: 4 Global Step: 182680 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:15:38,471-Speed 2618.15 samples/sec Loss 10.6970 LearningRate 0.0608 Epoch: 4 Global Step: 182690 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:15:42,370-Speed 2627.61 samples/sec Loss 10.6535 LearningRate 0.0608 Epoch: 4 Global Step: 182700 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:15:46,265-Speed 2629.37 samples/sec Loss 10.5523 LearningRate 0.0608 Epoch: 4 Global Step: 182710 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:15:50,161-Speed 2628.90 samples/sec Loss 10.5487 LearningRate 0.0608 Epoch: 4 Global Step: 182720 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:15:54,060-Speed 2627.12 samples/sec Loss 10.4938 LearningRate 0.0608 Epoch: 4 Global Step: 182730 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:15:57,959-Speed 2627.47 samples/sec Loss 10.6251 LearningRate 0.0608 Epoch: 4 Global Step: 182740 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:16:01,856-Speed 2628.13 samples/sec Loss 10.5306 LearningRate 0.0608 Epoch: 4 Global Step: 182750 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:16:05,763-Speed 2621.58 samples/sec Loss 10.5696 LearningRate 0.0608 Epoch: 4 Global Step: 182760 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:16:09,661-Speed 2627.87 samples/sec Loss 10.6759 LearningRate 0.0608 Epoch: 4 Global Step: 182770 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:16:13,560-Speed 2627.07 samples/sec Loss 10.5406 LearningRate 0.0608 Epoch: 4 Global Step: 182780 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:16:17,469-Speed 2620.35 samples/sec Loss 10.5706 LearningRate 0.0608 Epoch: 4 Global Step: 182790 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:16:21,365-Speed 2629.20 samples/sec Loss 10.5733 LearningRate 0.0608 Epoch: 4 Global Step: 182800 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:16:25,266-Speed 2625.53 samples/sec Loss 10.5642 LearningRate 0.0608 Epoch: 4 Global Step: 182810 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:16:29,168-Speed 2624.65 samples/sec Loss 10.5697 LearningRate 0.0608 Epoch: 4 Global Step: 182820 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:16:33,068-Speed 2626.52 samples/sec Loss 10.6267 LearningRate 0.0608 Epoch: 4 Global Step: 182830 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:16:36,966-Speed 2627.61 samples/sec Loss 10.5356 LearningRate 0.0608 Epoch: 4 Global Step: 182840 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:16:40,863-Speed 2628.29 samples/sec Loss 10.5762 LearningRate 0.0608 Epoch: 4 Global Step: 182850 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:16:44,766-Speed 2624.33 samples/sec Loss 10.4712 LearningRate 0.0608 Epoch: 4 Global Step: 182860 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:16:48,667-Speed 2625.62 samples/sec Loss 10.6671 LearningRate 0.0608 Epoch: 4 Global Step: 182870 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:16:52,563-Speed 2628.95 samples/sec Loss 10.7410 LearningRate 0.0608 Epoch: 4 Global Step: 182880 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:16:56,428-Speed 2650.84 samples/sec Loss 10.6647 LearningRate 0.0608 Epoch: 4 Global Step: 182890 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:17:00,335-Speed 2621.49 samples/sec Loss 10.7148 LearningRate 0.0608 Epoch: 4 Global Step: 182900 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:17:04,253-Speed 2613.77 samples/sec Loss 10.7989 LearningRate 0.0608 Epoch: 4 Global Step: 182910 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:17:08,144-Speed 2632.76 samples/sec Loss 10.7834 LearningRate 0.0608 Epoch: 4 Global Step: 182920 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:17:12,027-Speed 2637.84 samples/sec Loss 10.4767 LearningRate 0.0608 Epoch: 4 Global Step: 182930 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:17:15,918-Speed 2632.25 samples/sec Loss 11.2953 LearningRate 0.0608 Epoch: 4 Global Step: 182940 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:17:19,844-Speed 2609.17 samples/sec Loss 10.8910 LearningRate 0.0608 Epoch: 4 Global Step: 182950 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:17:23,748-Speed 2623.69 samples/sec Loss 10.7037 LearningRate 0.0608 Epoch: 4 Global Step: 182960 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:17:27,660-Speed 2618.68 samples/sec Loss 10.6013 LearningRate 0.0608 Epoch: 4 Global Step: 182970 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:17:31,598-Speed 2600.35 samples/sec Loss 10.6207 LearningRate 0.0608 Epoch: 4 Global Step: 182980 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:17:35,507-Speed 2620.82 samples/sec Loss 10.6187 LearningRate 0.0607 Epoch: 4 Global Step: 182990 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:17:39,406-Speed 2626.83 samples/sec Loss 10.5801 LearningRate 0.0607 Epoch: 4 Global Step: 183000 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:17:43,305-Speed 2626.83 samples/sec Loss 10.7020 LearningRate 0.0607 Epoch: 4 Global Step: 183010 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:17:47,211-Speed 2622.49 samples/sec Loss 10.7388 LearningRate 0.0607 Epoch: 4 Global Step: 183020 Fp16 Grad Scale: 32768 Required: 73 hours
Training: 2022-04-13 16:17:51,147-Speed 2602.46 samples/sec Loss 10.4915 LearningRate 0.0607 Epoch: 4 Global Step: 183030 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:17:55,057-Speed 2619.34 samples/sec Loss 10.7029 LearningRate 0.0607 Epoch: 4 Global Step: 183040 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:17:58,985-Speed 2607.77 samples/sec Loss 10.6336 LearningRate 0.0607 Epoch: 4 Global Step: 183050 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:18:02,901-Speed 2615.97 samples/sec Loss 10.5048 LearningRate 0.0607 Epoch: 4 Global Step: 183060 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:18:06,813-Speed 2617.87 samples/sec Loss 10.6367 LearningRate 0.0607 Epoch: 4 Global Step: 183070 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:18:10,725-Speed 2618.06 samples/sec Loss 10.4730 LearningRate 0.0607 Epoch: 4 Global Step: 183080 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:18:14,733-Speed 2555.67 samples/sec Loss 10.7263 LearningRate 0.0607 Epoch: 4 Global Step: 183090 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:18:18,642-Speed 2620.40 samples/sec Loss 10.6576 LearningRate 0.0607 Epoch: 4 Global Step: 183100 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:18:22,554-Speed 2617.67 samples/sec Loss 10.7076 LearningRate 0.0607 Epoch: 4 Global Step: 183110 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:18:26,455-Speed 2626.32 samples/sec Loss 10.5671 LearningRate 0.0607 Epoch: 4 Global Step: 183120 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:18:30,353-Speed 2627.20 samples/sec Loss 10.6844 LearningRate 0.0607 Epoch: 4 Global Step: 183130 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:18:34,265-Speed 2618.96 samples/sec Loss 10.6549 LearningRate 0.0607 Epoch: 4 Global Step: 183140 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:18:38,185-Speed 2612.49 samples/sec Loss 10.6312 LearningRate 0.0607 Epoch: 4 Global Step: 183150 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:18:42,084-Speed 2627.17 samples/sec Loss 10.5925 LearningRate 0.0607 Epoch: 4 Global Step: 183160 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:18:45,981-Speed 2628.27 samples/sec Loss 10.5358 LearningRate 0.0607 Epoch: 4 Global Step: 183170 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:18:49,881-Speed 2626.30 samples/sec Loss 10.6163 LearningRate 0.0607 Epoch: 4 Global Step: 183180 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:18:53,779-Speed 2627.39 samples/sec Loss 10.5251 LearningRate 0.0607 Epoch: 4 Global Step: 183190 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:18:57,681-Speed 2624.64 samples/sec Loss 10.5916 LearningRate 0.0607 Epoch: 4 Global Step: 183200 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:19:01,581-Speed 2626.65 samples/sec Loss 10.6361 LearningRate 0.0607 Epoch: 4 Global Step: 183210 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:19:05,491-Speed 2619.98 samples/sec Loss 10.5543 LearningRate 0.0607 Epoch: 4 Global Step: 183220 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:19:09,414-Speed 2610.62 samples/sec Loss 10.4912 LearningRate 0.0607 Epoch: 4 Global Step: 183230 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:19:13,329-Speed 2615.56 samples/sec Loss 10.5684 LearningRate 0.0607 Epoch: 4 Global Step: 183240 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:19:17,259-Speed 2606.86 samples/sec Loss 10.4742 LearningRate 0.0607 Epoch: 4 Global Step: 183250 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:19:21,178-Speed 2613.38 samples/sec Loss 10.5769 LearningRate 0.0607 Epoch: 4 Global Step: 183260 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:19:25,085-Speed 2621.53 samples/sec Loss 10.5223 LearningRate 0.0607 Epoch: 4 Global Step: 183270 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:19:28,995-Speed 2619.80 samples/sec Loss 10.4908 LearningRate 0.0607 Epoch: 4 Global Step: 183280 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:19:32,903-Speed 2621.19 samples/sec Loss 10.4481 LearningRate 0.0607 Epoch: 4 Global Step: 183290 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:19:36,788-Speed 2636.60 samples/sec Loss 10.5416 LearningRate 0.0607 Epoch: 4 Global Step: 183300 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:19:40,682-Speed 2629.79 samples/sec Loss 10.5438 LearningRate 0.0607 Epoch: 4 Global Step: 183310 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:19:44,588-Speed 2622.11 samples/sec Loss 10.6805 LearningRate 0.0607 Epoch: 4 Global Step: 183320 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:19:48,504-Speed 2615.96 samples/sec Loss 10.5735 LearningRate 0.0607 Epoch: 4 Global Step: 183330 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:19:52,415-Speed 2618.74 samples/sec Loss 10.6156 LearningRate 0.0607 Epoch: 4 Global Step: 183340 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:19:56,329-Speed 2616.89 samples/sec Loss 10.5934 LearningRate 0.0607 Epoch: 4 Global Step: 183350 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:20:00,249-Speed 2613.21 samples/sec Loss 10.7278 LearningRate 0.0607 Epoch: 4 Global Step: 183360 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:20:04,161-Speed 2618.41 samples/sec Loss 10.5636 LearningRate 0.0607 Epoch: 4 Global Step: 183370 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:20:08,079-Speed 2613.90 samples/sec Loss 10.5440 LearningRate 0.0607 Epoch: 4 Global Step: 183380 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:20:11,988-Speed 2620.74 samples/sec Loss 10.5809 LearningRate 0.0607 Epoch: 4 Global Step: 183390 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:20:15,898-Speed 2619.25 samples/sec Loss 10.7647 LearningRate 0.0607 Epoch: 4 Global Step: 183400 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:20:19,810-Speed 2618.22 samples/sec Loss 10.6400 LearningRate 0.0607 Epoch: 4 Global Step: 183410 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:20:23,749-Speed 2600.56 samples/sec Loss 10.6121 LearningRate 0.0607 Epoch: 4 Global Step: 183420 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:20:27,638-Speed 2633.71 samples/sec Loss 10.6260 LearningRate 0.0607 Epoch: 4 Global Step: 183430 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:20:31,547-Speed 2620.57 samples/sec Loss 10.3317 LearningRate 0.0607 Epoch: 4 Global Step: 183440 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:20:35,456-Speed 2620.14 samples/sec Loss 10.6135 LearningRate 0.0607 Epoch: 4 Global Step: 183450 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:20:39,359-Speed 2624.60 samples/sec Loss 10.7293 LearningRate 0.0607 Epoch: 4 Global Step: 183460 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:20:43,265-Speed 2622.31 samples/sec Loss 10.5256 LearningRate 0.0607 Epoch: 4 Global Step: 183470 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:20:47,169-Speed 2623.82 samples/sec Loss 10.5807 LearningRate 0.0607 Epoch: 4 Global Step: 183480 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:20:51,081-Speed 2618.68 samples/sec Loss 10.6004 LearningRate 0.0607 Epoch: 4 Global Step: 183490 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:20:54,990-Speed 2620.20 samples/sec Loss 10.5726 LearningRate 0.0607 Epoch: 4 Global Step: 183500 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:20:58,898-Speed 2620.54 samples/sec Loss 10.5900 LearningRate 0.0607 Epoch: 4 Global Step: 183510 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:21:02,815-Speed 2614.35 samples/sec Loss 10.4934 LearningRate 0.0606 Epoch: 4 Global Step: 183520 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:21:06,714-Speed 2627.52 samples/sec Loss 10.6406 LearningRate 0.0606 Epoch: 4 Global Step: 183530 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:21:10,623-Speed 2620.33 samples/sec Loss 10.5690 LearningRate 0.0606 Epoch: 4 Global Step: 183540 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:21:14,523-Speed 2626.40 samples/sec Loss 10.5748 LearningRate 0.0606 Epoch: 4 Global Step: 183550 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:21:18,530-Speed 2556.21 samples/sec Loss 10.5404 LearningRate 0.0606 Epoch: 4 Global Step: 183560 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:21:22,450-Speed 2612.31 samples/sec Loss 10.6120 LearningRate 0.0606 Epoch: 4 Global Step: 183570 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:21:26,347-Speed 2629.31 samples/sec Loss 10.6559 LearningRate 0.0606 Epoch: 4 Global Step: 183580 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:21:30,248-Speed 2625.33 samples/sec Loss 10.4617 LearningRate 0.0606 Epoch: 4 Global Step: 183590 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:21:34,150-Speed 2624.73 samples/sec Loss 10.5733 LearningRate 0.0606 Epoch: 4 Global Step: 183600 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:21:38,084-Speed 2603.38 samples/sec Loss 10.5447 LearningRate 0.0606 Epoch: 4 Global Step: 183610 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:21:41,968-Speed 2637.48 samples/sec Loss 10.5933 LearningRate 0.0606 Epoch: 4 Global Step: 183620 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:21:45,869-Speed 2625.74 samples/sec Loss 10.5365 LearningRate 0.0606 Epoch: 4 Global Step: 183630 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:21:49,771-Speed 2625.06 samples/sec Loss 10.5783 LearningRate 0.0606 Epoch: 4 Global Step: 183640 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:21:53,701-Speed 2605.79 samples/sec Loss 10.5058 LearningRate 0.0606 Epoch: 4 Global Step: 183650 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:21:57,612-Speed 2619.27 samples/sec Loss 10.7114 LearningRate 0.0606 Epoch: 4 Global Step: 183660 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:22:01,519-Speed 2621.69 samples/sec Loss 10.6911 LearningRate 0.0606 Epoch: 4 Global Step: 183670 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:22:05,452-Speed 2604.11 samples/sec Loss 10.5334 LearningRate 0.0606 Epoch: 4 Global Step: 183680 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:22:09,364-Speed 2618.46 samples/sec Loss 10.6005 LearningRate 0.0606 Epoch: 4 Global Step: 183690 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:22:13,262-Speed 2627.54 samples/sec Loss 10.4854 LearningRate 0.0606 Epoch: 4 Global Step: 183700 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:22:17,167-Speed 2623.05 samples/sec Loss 10.5098 LearningRate 0.0606 Epoch: 4 Global Step: 183710 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:22:21,070-Speed 2624.29 samples/sec Loss 10.5284 LearningRate 0.0606 Epoch: 4 Global Step: 183720 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:22:24,984-Speed 2617.09 samples/sec Loss 10.5763 LearningRate 0.0606 Epoch: 4 Global Step: 183730 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:22:28,893-Speed 2620.04 samples/sec Loss 10.5038 LearningRate 0.0606 Epoch: 4 Global Step: 183740 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:22:32,796-Speed 2624.28 samples/sec Loss 10.5257 LearningRate 0.0606 Epoch: 4 Global Step: 183750 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:22:36,712-Speed 2615.34 samples/sec Loss 10.6148 LearningRate 0.0606 Epoch: 4 Global Step: 183760 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:22:40,628-Speed 2615.75 samples/sec Loss 10.5216 LearningRate 0.0606 Epoch: 4 Global Step: 183770 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:22:44,529-Speed 2625.64 samples/sec Loss 10.7179 LearningRate 0.0606 Epoch: 4 Global Step: 183780 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:22:48,454-Speed 2610.04 samples/sec Loss 10.5817 LearningRate 0.0606 Epoch: 4 Global Step: 183790 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:22:52,357-Speed 2624.25 samples/sec Loss 10.6328 LearningRate 0.0606 Epoch: 4 Global Step: 183800 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:22:56,257-Speed 2626.81 samples/sec Loss 10.6825 LearningRate 0.0606 Epoch: 4 Global Step: 183810 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:23:00,159-Speed 2624.69 samples/sec Loss 10.6161 LearningRate 0.0606 Epoch: 4 Global Step: 183820 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:23:04,060-Speed 2625.59 samples/sec Loss 10.5105 LearningRate 0.0606 Epoch: 4 Global Step: 183830 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:23:08,104-Speed 2532.98 samples/sec Loss 10.5950 LearningRate 0.0606 Epoch: 4 Global Step: 183840 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:23:12,243-Speed 2474.79 samples/sec Loss 10.6063 LearningRate 0.0606 Epoch: 4 Global Step: 183850 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:23:16,141-Speed 2627.29 samples/sec Loss 10.6862 LearningRate 0.0606 Epoch: 4 Global Step: 183860 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:23:20,049-Speed 2621.34 samples/sec Loss 10.6321 LearningRate 0.0606 Epoch: 4 Global Step: 183870 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:23:23,955-Speed 2621.73 samples/sec Loss 10.6371 LearningRate 0.0606 Epoch: 4 Global Step: 183880 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:23:27,862-Speed 2622.24 samples/sec Loss 10.7326 LearningRate 0.0606 Epoch: 4 Global Step: 183890 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:23:31,745-Speed 2637.55 samples/sec Loss 10.5023 LearningRate 0.0606 Epoch: 4 Global Step: 183900 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:23:35,649-Speed 2623.90 samples/sec Loss 10.6916 LearningRate 0.0606 Epoch: 4 Global Step: 183910 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:23:39,622-Speed 2577.38 samples/sec Loss 10.5836 LearningRate 0.0606 Epoch: 4 Global Step: 183920 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:23:43,528-Speed 2622.93 samples/sec Loss 10.5217 LearningRate 0.0606 Epoch: 4 Global Step: 183930 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:23:47,425-Speed 2628.12 samples/sec Loss 10.4956 LearningRate 0.0606 Epoch: 4 Global Step: 183940 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:23:51,327-Speed 2625.06 samples/sec Loss 10.5683 LearningRate 0.0606 Epoch: 4 Global Step: 183950 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:23:55,232-Speed 2622.90 samples/sec Loss 10.7415 LearningRate 0.0606 Epoch: 4 Global Step: 183960 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:23:59,131-Speed 2626.83 samples/sec Loss 10.4834 LearningRate 0.0606 Epoch: 4 Global Step: 183970 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:24:03,034-Speed 2623.94 samples/sec Loss 10.4396 LearningRate 0.0606 Epoch: 4 Global Step: 183980 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:24:06,934-Speed 2626.15 samples/sec Loss 10.4998 LearningRate 0.0606 Epoch: 4 Global Step: 183990 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:24:10,834-Speed 2626.00 samples/sec Loss 10.4997 LearningRate 0.0606 Epoch: 4 Global Step: 184000 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:24:14,736-Speed 2626.08 samples/sec Loss 10.6454 LearningRate 0.0606 Epoch: 4 Global Step: 184010 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:24:18,639-Speed 2624.32 samples/sec Loss 10.7241 LearningRate 0.0606 Epoch: 4 Global Step: 184020 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:24:22,541-Speed 2624.71 samples/sec Loss 10.6992 LearningRate 0.0606 Epoch: 4 Global Step: 184030 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:24:26,447-Speed 2622.41 samples/sec Loss 10.6502 LearningRate 0.0606 Epoch: 4 Global Step: 184040 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:24:30,352-Speed 2623.02 samples/sec Loss 10.5445 LearningRate 0.0606 Epoch: 4 Global Step: 184050 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:24:34,257-Speed 2622.89 samples/sec Loss 10.6267 LearningRate 0.0605 Epoch: 4 Global Step: 184060 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:24:38,161-Speed 2623.85 samples/sec Loss 10.4949 LearningRate 0.0605 Epoch: 4 Global Step: 184070 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:24:42,069-Speed 2621.03 samples/sec Loss 10.4095 LearningRate 0.0605 Epoch: 4 Global Step: 184080 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:24:46,064-Speed 2563.74 samples/sec Loss 10.7075 LearningRate 0.0605 Epoch: 4 Global Step: 184090 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:24:50,021-Speed 2588.80 samples/sec Loss 10.5862 LearningRate 0.0605 Epoch: 4 Global Step: 184100 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:24:53,925-Speed 2623.38 samples/sec Loss 10.5465 LearningRate 0.0605 Epoch: 4 Global Step: 184110 Fp16 Grad Scale: 262144 Required: 73 hours
Training: 2022-04-13 16:24:57,952-Speed 2543.54 samples/sec Loss 10.5998 LearningRate 0.0605 Epoch: 4 Global Step: 184120 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:25:02,009-Speed 2524.96 samples/sec Loss 10.5300 LearningRate 0.0605 Epoch: 4 Global Step: 184130 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:25:05,924-Speed 2616.27 samples/sec Loss 10.5639 LearningRate 0.0605 Epoch: 4 Global Step: 184140 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:25:09,823-Speed 2626.88 samples/sec Loss 10.6990 LearningRate 0.0605 Epoch: 4 Global Step: 184150 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:25:13,724-Speed 2625.30 samples/sec Loss 10.4950 LearningRate 0.0605 Epoch: 4 Global Step: 184160 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:25:17,627-Speed 2623.98 samples/sec Loss 10.5793 LearningRate 0.0605 Epoch: 4 Global Step: 184170 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:25:21,530-Speed 2624.33 samples/sec Loss 10.6595 LearningRate 0.0605 Epoch: 4 Global Step: 184180 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:25:25,432-Speed 2624.88 samples/sec Loss 10.6213 LearningRate 0.0605 Epoch: 4 Global Step: 184190 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:25:29,332-Speed 2626.24 samples/sec Loss 10.5110 LearningRate 0.0605 Epoch: 4 Global Step: 184200 Fp16 Grad Scale: 131072 Required: 73 hours
Training: 2022-04-13 16:25:33,217-Speed 2636.96 samples/sec Loss 10.6273 LearningRate 0.0605 Epoch: 4 Global Step: 184210 Fp16 Grad Scale: 65536 Required: 73 hours
Training: 2022-04-13 16:25:37,121-Speed 2623.69 samples/sec Loss 10.4822 LearningRate 0.0605 Epoch: 4 Global Step: 184220 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:25:41,009-Speed 2633.79 samples/sec Loss 10.6419 LearningRate 0.0605 Epoch: 4 Global Step: 184230 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:25:44,907-Speed 2627.54 samples/sec Loss 10.7038 LearningRate 0.0605 Epoch: 4 Global Step: 184240 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:25:48,808-Speed 2625.49 samples/sec Loss 10.6483 LearningRate 0.0605 Epoch: 4 Global Step: 184250 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:25:52,708-Speed 2626.58 samples/sec Loss 10.6231 LearningRate 0.0605 Epoch: 4 Global Step: 184260 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:25:56,615-Speed 2621.86 samples/sec Loss 10.5994 LearningRate 0.0605 Epoch: 4 Global Step: 184270 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:26:00,543-Speed 2607.63 samples/sec Loss 10.5681 LearningRate 0.0605 Epoch: 4 Global Step: 184280 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:26:04,446-Speed 2624.94 samples/sec Loss 10.6835 LearningRate 0.0605 Epoch: 4 Global Step: 184290 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:26:08,346-Speed 2626.22 samples/sec Loss 10.6623 LearningRate 0.0605 Epoch: 4 Global Step: 184300 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:26:12,244-Speed 2626.93 samples/sec Loss 10.5720 LearningRate 0.0605 Epoch: 4 Global Step: 184310 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:26:16,154-Speed 2619.90 samples/sec Loss 10.4455 LearningRate 0.0605 Epoch: 4 Global Step: 184320 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:26:20,054-Speed 2626.35 samples/sec Loss 10.6809 LearningRate 0.0605 Epoch: 4 Global Step: 184330 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:26:23,953-Speed 2627.44 samples/sec Loss 10.6126 LearningRate 0.0605 Epoch: 4 Global Step: 184340 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:26:27,837-Speed 2636.58 samples/sec Loss 10.6414 LearningRate 0.0605 Epoch: 4 Global Step: 184350 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:26:31,738-Speed 2625.50 samples/sec Loss 10.5546 LearningRate 0.0605 Epoch: 4 Global Step: 184360 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:26:35,636-Speed 2627.81 samples/sec Loss 10.8078 LearningRate 0.0605 Epoch: 4 Global Step: 184370 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:26:39,547-Speed 2618.84 samples/sec Loss 10.4602 LearningRate 0.0605 Epoch: 4 Global Step: 184380 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:26:43,446-Speed 2627.09 samples/sec Loss 10.6898 LearningRate 0.0605 Epoch: 4 Global Step: 184390 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:26:47,349-Speed 2624.11 samples/sec Loss 10.5053 LearningRate 0.0605 Epoch: 4 Global Step: 184400 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:26:51,256-Speed 2622.20 samples/sec Loss 10.6110 LearningRate 0.0605 Epoch: 4 Global Step: 184410 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:26:55,160-Speed 2623.20 samples/sec Loss 10.5676 LearningRate 0.0605 Epoch: 4 Global Step: 184420 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:26:59,063-Speed 2623.99 samples/sec Loss 10.7032 LearningRate 0.0605 Epoch: 4 Global Step: 184430 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:27:02,977-Speed 2617.26 samples/sec Loss 10.5663 LearningRate 0.0605 Epoch: 4 Global Step: 184440 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:27:06,919-Speed 2598.33 samples/sec Loss 10.6549 LearningRate 0.0605 Epoch: 4 Global Step: 184450 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:27:10,849-Speed 2606.70 samples/sec Loss 10.5947 LearningRate 0.0605 Epoch: 4 Global Step: 184460 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:27:14,754-Speed 2623.27 samples/sec Loss 10.6335 LearningRate 0.0605 Epoch: 4 Global Step: 184470 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:27:18,686-Speed 2604.94 samples/sec Loss 10.5769 LearningRate 0.0605 Epoch: 4 Global Step: 184480 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:27:22,584-Speed 2627.16 samples/sec Loss 10.5360 LearningRate 0.0605 Epoch: 4 Global Step: 184490 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:27:26,491-Speed 2621.81 samples/sec Loss 10.7477 LearningRate 0.0605 Epoch: 4 Global Step: 184500 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:27:30,403-Speed 2618.86 samples/sec Loss 10.6674 LearningRate 0.0605 Epoch: 4 Global Step: 184510 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:27:34,316-Speed 2617.43 samples/sec Loss 10.5243 LearningRate 0.0605 Epoch: 4 Global Step: 184520 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:27:38,218-Speed 2624.31 samples/sec Loss 10.7643 LearningRate 0.0605 Epoch: 4 Global Step: 184530 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:27:42,138-Speed 2613.29 samples/sec Loss 10.6321 LearningRate 0.0605 Epoch: 4 Global Step: 184540 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:27:46,053-Speed 2615.86 samples/sec Loss 10.4938 LearningRate 0.0605 Epoch: 4 Global Step: 184550 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:27:49,975-Speed 2611.81 samples/sec Loss 10.5331 LearningRate 0.0605 Epoch: 4 Global Step: 184560 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:27:53,902-Speed 2608.54 samples/sec Loss 10.5423 LearningRate 0.0605 Epoch: 4 Global Step: 184570 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:27:57,821-Speed 2613.39 samples/sec Loss 10.5167 LearningRate 0.0605 Epoch: 4 Global Step: 184580 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:28:01,742-Speed 2612.36 samples/sec Loss 10.5292 LearningRate 0.0604 Epoch: 4 Global Step: 184590 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:28:05,644-Speed 2624.78 samples/sec Loss 10.6661 LearningRate 0.0604 Epoch: 4 Global Step: 184600 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:28:09,546-Speed 2624.48 samples/sec Loss 10.5439 LearningRate 0.0604 Epoch: 4 Global Step: 184610 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:28:13,445-Speed 2627.14 samples/sec Loss 10.5977 LearningRate 0.0604 Epoch: 4 Global Step: 184620 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:28:17,344-Speed 2626.89 samples/sec Loss 10.7084 LearningRate 0.0604 Epoch: 4 Global Step: 184630 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:28:21,248-Speed 2623.98 samples/sec Loss 10.6098 LearningRate 0.0604 Epoch: 4 Global Step: 184640 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:28:25,148-Speed 2625.73 samples/sec Loss 10.5456 LearningRate 0.0604 Epoch: 4 Global Step: 184650 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:28:29,048-Speed 2626.69 samples/sec Loss 10.6197 LearningRate 0.0604 Epoch: 4 Global Step: 184660 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:28:33,009-Speed 2585.82 samples/sec Loss 10.6041 LearningRate 0.0604 Epoch: 4 Global Step: 184670 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:28:37,022-Speed 2551.79 samples/sec Loss 10.4547 LearningRate 0.0604 Epoch: 4 Global Step: 184680 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:28:40,925-Speed 2624.09 samples/sec Loss 10.4919 LearningRate 0.0604 Epoch: 4 Global Step: 184690 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:28:44,827-Speed 2625.31 samples/sec Loss 10.5920 LearningRate 0.0604 Epoch: 4 Global Step: 184700 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:28:48,734-Speed 2621.91 samples/sec Loss 10.4396 LearningRate 0.0604 Epoch: 4 Global Step: 184710 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:28:52,637-Speed 2624.71 samples/sec Loss 10.5333 LearningRate 0.0604 Epoch: 4 Global Step: 184720 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:28:56,560-Speed 2610.93 samples/sec Loss 10.5034 LearningRate 0.0604 Epoch: 4 Global Step: 184730 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:29:00,457-Speed 2628.06 samples/sec Loss 10.6697 LearningRate 0.0604 Epoch: 4 Global Step: 184740 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:29:04,383-Speed 2609.13 samples/sec Loss 10.4962 LearningRate 0.0604 Epoch: 4 Global Step: 184750 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:29:08,414-Speed 2540.30 samples/sec Loss 10.5709 LearningRate 0.0604 Epoch: 4 Global Step: 184760 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:29:12,319-Speed 2623.72 samples/sec Loss 10.5591 LearningRate 0.0604 Epoch: 4 Global Step: 184770 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:29:16,218-Speed 2626.73 samples/sec Loss 10.5961 LearningRate 0.0604 Epoch: 4 Global Step: 184780 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:29:20,106-Speed 2634.60 samples/sec Loss 10.5266 LearningRate 0.0604 Epoch: 4 Global Step: 184790 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:29:23,998-Speed 2631.77 samples/sec Loss 10.5995 LearningRate 0.0604 Epoch: 4 Global Step: 184800 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:29:27,908-Speed 2620.12 samples/sec Loss 10.6684 LearningRate 0.0604 Epoch: 4 Global Step: 184810 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:29:31,832-Speed 2610.44 samples/sec Loss 10.6136 LearningRate 0.0604 Epoch: 4 Global Step: 184820 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:29:35,759-Speed 2608.10 samples/sec Loss 10.6992 LearningRate 0.0604 Epoch: 4 Global Step: 184830 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:29:39,661-Speed 2624.71 samples/sec Loss 10.6487 LearningRate 0.0604 Epoch: 4 Global Step: 184840 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:29:43,565-Speed 2623.80 samples/sec Loss 10.6348 LearningRate 0.0604 Epoch: 4 Global Step: 184850 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:29:47,468-Speed 2624.28 samples/sec Loss 10.5335 LearningRate 0.0604 Epoch: 4 Global Step: 184860 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:29:51,368-Speed 2626.93 samples/sec Loss 10.6502 LearningRate 0.0604 Epoch: 4 Global Step: 184870 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:29:55,274-Speed 2622.00 samples/sec Loss 10.4774 LearningRate 0.0604 Epoch: 4 Global Step: 184880 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:29:59,177-Speed 2624.36 samples/sec Loss 10.7059 LearningRate 0.0604 Epoch: 4 Global Step: 184890 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:30:03,082-Speed 2622.78 samples/sec Loss 10.7126 LearningRate 0.0604 Epoch: 4 Global Step: 184900 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:30:06,991-Speed 2620.52 samples/sec Loss 10.5312 LearningRate 0.0604 Epoch: 4 Global Step: 184910 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:30:10,872-Speed 2638.74 samples/sec Loss 10.5638 LearningRate 0.0604 Epoch: 4 Global Step: 184920 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:30:14,785-Speed 2618.14 samples/sec Loss 10.6599 LearningRate 0.0604 Epoch: 4 Global Step: 184930 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:30:18,680-Speed 2629.75 samples/sec Loss 10.6631 LearningRate 0.0604 Epoch: 4 Global Step: 184940 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:30:22,577-Speed 2627.97 samples/sec Loss 10.5227 LearningRate 0.0604 Epoch: 4 Global Step: 184950 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:30:26,475-Speed 2627.71 samples/sec Loss 10.5775 LearningRate 0.0604 Epoch: 4 Global Step: 184960 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:30:30,369-Speed 2629.88 samples/sec Loss 10.4100 LearningRate 0.0604 Epoch: 4 Global Step: 184970 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:30:34,273-Speed 2624.44 samples/sec Loss 10.4814 LearningRate 0.0604 Epoch: 4 Global Step: 184980 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:30:38,167-Speed 2630.09 samples/sec Loss 10.4690 LearningRate 0.0604 Epoch: 4 Global Step: 184990 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:30:42,062-Speed 2629.64 samples/sec Loss 10.4557 LearningRate 0.0604 Epoch: 4 Global Step: 185000 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:30:45,960-Speed 2627.08 samples/sec Loss 10.5270 LearningRate 0.0604 Epoch: 4 Global Step: 185010 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:30:49,855-Speed 2630.32 samples/sec Loss 10.4744 LearningRate 0.0604 Epoch: 4 Global Step: 185020 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:30:53,751-Speed 2628.80 samples/sec Loss 10.5129 LearningRate 0.0604 Epoch: 4 Global Step: 185030 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:30:57,650-Speed 2627.39 samples/sec Loss 10.6388 LearningRate 0.0604 Epoch: 4 Global Step: 185040 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:31:01,550-Speed 2626.36 samples/sec Loss 10.6687 LearningRate 0.0604 Epoch: 4 Global Step: 185050 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:31:05,437-Speed 2635.00 samples/sec Loss 10.7029 LearningRate 0.0604 Epoch: 4 Global Step: 185060 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:31:09,345-Speed 2621.13 samples/sec Loss 10.6603 LearningRate 0.0604 Epoch: 4 Global Step: 185070 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:31:13,251-Speed 2622.13 samples/sec Loss 10.5859 LearningRate 0.0604 Epoch: 4 Global Step: 185080 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:31:17,161-Speed 2619.47 samples/sec Loss 10.4911 LearningRate 0.0604 Epoch: 4 Global Step: 185090 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:31:21,029-Speed 2648.01 samples/sec Loss 10.6052 LearningRate 0.0604 Epoch: 4 Global Step: 185100 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:31:24,947-Speed 2614.12 samples/sec Loss 10.5178 LearningRate 0.0604 Epoch: 4 Global Step: 185110 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:31:28,853-Speed 2622.20 samples/sec Loss 10.4556 LearningRate 0.0603 Epoch: 4 Global Step: 185120 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:31:32,762-Speed 2620.62 samples/sec Loss 10.5906 LearningRate 0.0603 Epoch: 4 Global Step: 185130 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:31:36,674-Speed 2618.51 samples/sec Loss 10.4838 LearningRate 0.0603 Epoch: 4 Global Step: 185140 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:31:40,606-Speed 2604.49 samples/sec Loss 11.2617 LearningRate 0.0603 Epoch: 4 Global Step: 185150 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:31:44,520-Speed 2616.68 samples/sec Loss 11.0940 LearningRate 0.0603 Epoch: 4 Global Step: 185160 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:31:48,436-Speed 2615.55 samples/sec Loss 10.9711 LearningRate 0.0603 Epoch: 4 Global Step: 185170 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:31:52,349-Speed 2617.76 samples/sec Loss 10.8780 LearningRate 0.0603 Epoch: 4 Global Step: 185180 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:31:56,267-Speed 2614.53 samples/sec Loss 10.5780 LearningRate 0.0603 Epoch: 4 Global Step: 185190 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:32:00,163-Speed 2629.21 samples/sec Loss 10.7208 LearningRate 0.0603 Epoch: 4 Global Step: 185200 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:32:04,058-Speed 2630.09 samples/sec Loss 10.7208 LearningRate 0.0603 Epoch: 4 Global Step: 185210 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:32:07,954-Speed 2628.56 samples/sec Loss 10.6942 LearningRate 0.0603 Epoch: 4 Global Step: 185220 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:32:11,853-Speed 2626.55 samples/sec Loss 10.6701 LearningRate 0.0603 Epoch: 4 Global Step: 185230 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:32:15,748-Speed 2629.70 samples/sec Loss 10.6755 LearningRate 0.0603 Epoch: 4 Global Step: 185240 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:32:19,650-Speed 2625.71 samples/sec Loss 10.6153 LearningRate 0.0603 Epoch: 4 Global Step: 185250 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:32:23,553-Speed 2624.14 samples/sec Loss 10.6715 LearningRate 0.0603 Epoch: 4 Global Step: 185260 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:32:27,597-Speed 2532.53 samples/sec Loss 10.5906 LearningRate 0.0603 Epoch: 4 Global Step: 185270 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:32:31,493-Speed 2628.66 samples/sec Loss 10.6303 LearningRate 0.0603 Epoch: 4 Global Step: 185280 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:32:35,386-Speed 2631.65 samples/sec Loss 10.5088 LearningRate 0.0603 Epoch: 4 Global Step: 185290 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:32:39,280-Speed 2630.80 samples/sec Loss 10.5092 LearningRate 0.0603 Epoch: 4 Global Step: 185300 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:32:43,172-Speed 2631.69 samples/sec Loss 10.6468 LearningRate 0.0603 Epoch: 4 Global Step: 185310 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:32:47,066-Speed 2630.38 samples/sec Loss 10.6270 LearningRate 0.0603 Epoch: 4 Global Step: 185320 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:32:50,959-Speed 2630.84 samples/sec Loss 10.6446 LearningRate 0.0603 Epoch: 4 Global Step: 185330 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:32:54,853-Speed 2631.05 samples/sec Loss 10.6117 LearningRate 0.0603 Epoch: 4 Global Step: 185340 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:32:58,751-Speed 2627.62 samples/sec Loss 10.4957 LearningRate 0.0603 Epoch: 4 Global Step: 185350 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:33:02,630-Speed 2640.13 samples/sec Loss 10.6169 LearningRate 0.0603 Epoch: 4 Global Step: 185360 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:33:06,528-Speed 2627.54 samples/sec Loss 10.7107 LearningRate 0.0603 Epoch: 4 Global Step: 185370 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:33:10,428-Speed 2626.73 samples/sec Loss 10.5924 LearningRate 0.0603 Epoch: 4 Global Step: 185380 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:33:14,356-Speed 2607.71 samples/sec Loss 10.7093 LearningRate 0.0603 Epoch: 4 Global Step: 185390 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:33:18,257-Speed 2625.63 samples/sec Loss 10.7544 LearningRate 0.0603 Epoch: 4 Global Step: 185400 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:33:22,158-Speed 2626.23 samples/sec Loss 10.6201 LearningRate 0.0603 Epoch: 4 Global Step: 185410 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:33:26,132-Speed 2576.85 samples/sec Loss 10.6731 LearningRate 0.0603 Epoch: 4 Global Step: 185420 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:33:30,037-Speed 2623.43 samples/sec Loss 10.7073 LearningRate 0.0603 Epoch: 4 Global Step: 185430 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:33:33,933-Speed 2628.81 samples/sec Loss 10.5707 LearningRate 0.0603 Epoch: 4 Global Step: 185440 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:33:37,828-Speed 2629.78 samples/sec Loss 10.6469 LearningRate 0.0603 Epoch: 4 Global Step: 185450 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:33:41,737-Speed 2620.31 samples/sec Loss 10.7055 LearningRate 0.0603 Epoch: 4 Global Step: 185460 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:33:45,645-Speed 2621.07 samples/sec Loss 10.5284 LearningRate 0.0603 Epoch: 4 Global Step: 185470 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:33:49,542-Speed 2627.95 samples/sec Loss 10.4981 LearningRate 0.0603 Epoch: 4 Global Step: 185480 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:33:53,439-Speed 2628.50 samples/sec Loss 10.6515 LearningRate 0.0603 Epoch: 4 Global Step: 185490 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:33:57,338-Speed 2626.78 samples/sec Loss 10.6495 LearningRate 0.0603 Epoch: 4 Global Step: 185500 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:34:01,244-Speed 2622.38 samples/sec Loss 10.6250 LearningRate 0.0603 Epoch: 4 Global Step: 185510 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:34:05,151-Speed 2621.29 samples/sec Loss 10.5220 LearningRate 0.0603 Epoch: 4 Global Step: 185520 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:34:09,048-Speed 2627.99 samples/sec Loss 10.6619 LearningRate 0.0603 Epoch: 4 Global Step: 185530 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:34:12,953-Speed 2623.28 samples/sec Loss 10.6647 LearningRate 0.0603 Epoch: 4 Global Step: 185540 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:34:16,852-Speed 2627.22 samples/sec Loss 10.4801 LearningRate 0.0603 Epoch: 4 Global Step: 185550 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:34:20,735-Speed 2637.76 samples/sec Loss 10.6487 LearningRate 0.0603 Epoch: 4 Global Step: 185560 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:34:24,634-Speed 2626.78 samples/sec Loss 10.6922 LearningRate 0.0603 Epoch: 4 Global Step: 185570 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:34:28,534-Speed 2626.31 samples/sec Loss 10.5140 LearningRate 0.0603 Epoch: 4 Global Step: 185580 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:34:32,436-Speed 2625.17 samples/sec Loss 10.6767 LearningRate 0.0603 Epoch: 4 Global Step: 185590 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:34:36,343-Speed 2621.41 samples/sec Loss 10.3920 LearningRate 0.0603 Epoch: 4 Global Step: 185600 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:34:40,252-Speed 2620.09 samples/sec Loss 10.6595 LearningRate 0.0603 Epoch: 4 Global Step: 185610 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:34:44,205-Speed 2591.48 samples/sec Loss 10.5993 LearningRate 0.0603 Epoch: 4 Global Step: 185620 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:34:48,222-Speed 2549.71 samples/sec Loss 10.6928 LearningRate 0.0603 Epoch: 4 Global Step: 185630 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:34:52,138-Speed 2615.71 samples/sec Loss 10.7423 LearningRate 0.0603 Epoch: 4 Global Step: 185640 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:34:56,060-Speed 2611.85 samples/sec Loss 10.5828 LearningRate 0.0603 Epoch: 4 Global Step: 185650 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:34:59,957-Speed 2628.26 samples/sec Loss 10.5885 LearningRate 0.0602 Epoch: 4 Global Step: 185660 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:35:03,861-Speed 2623.68 samples/sec Loss 10.6927 LearningRate 0.0602 Epoch: 4 Global Step: 185670 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:35:07,765-Speed 2623.00 samples/sec Loss 10.5922 LearningRate 0.0602 Epoch: 4 Global Step: 185680 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:35:11,662-Speed 2628.11 samples/sec Loss 10.6038 LearningRate 0.0602 Epoch: 4 Global Step: 185690 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:35:15,559-Speed 2628.83 samples/sec Loss 10.6264 LearningRate 0.0602 Epoch: 4 Global Step: 185700 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:35:19,460-Speed 2626.26 samples/sec Loss 10.6714 LearningRate 0.0602 Epoch: 4 Global Step: 185710 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:35:23,385-Speed 2609.28 samples/sec Loss 10.6732 LearningRate 0.0602 Epoch: 4 Global Step: 185720 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:35:27,303-Speed 2614.10 samples/sec Loss 10.5748 LearningRate 0.0602 Epoch: 4 Global Step: 185730 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:35:31,210-Speed 2621.68 samples/sec Loss 10.5597 LearningRate 0.0602 Epoch: 4 Global Step: 185740 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:35:35,121-Speed 2618.61 samples/sec Loss 10.4787 LearningRate 0.0602 Epoch: 4 Global Step: 185750 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:35:39,033-Speed 2617.90 samples/sec Loss 10.5546 LearningRate 0.0602 Epoch: 4 Global Step: 185760 Fp16 Grad Scale: 524288 Required: 72 hours
Training: 2022-04-13 16:35:42,932-Speed 2630.93 samples/sec Loss 10.3505 LearningRate 0.0602 Epoch: 4 Global Step: 185770 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:35:46,846-Speed 2617.09 samples/sec Loss 10.6557 LearningRate 0.0602 Epoch: 4 Global Step: 185780 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:35:50,745-Speed 2626.80 samples/sec Loss 10.5838 LearningRate 0.0602 Epoch: 4 Global Step: 185790 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:35:54,671-Speed 2608.96 samples/sec Loss 10.5370 LearningRate 0.0602 Epoch: 4 Global Step: 185800 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:35:58,585-Speed 2616.97 samples/sec Loss 10.3702 LearningRate 0.0602 Epoch: 4 Global Step: 185810 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:36:02,482-Speed 2628.42 samples/sec Loss 10.3938 LearningRate 0.0602 Epoch: 4 Global Step: 185820 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:36:06,386-Speed 2623.61 samples/sec Loss 10.5216 LearningRate 0.0602 Epoch: 4 Global Step: 185830 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:36:10,289-Speed 2624.07 samples/sec Loss 10.5938 LearningRate 0.0602 Epoch: 4 Global Step: 185840 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:36:14,193-Speed 2623.83 samples/sec Loss 10.5293 LearningRate 0.0602 Epoch: 4 Global Step: 185850 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:36:18,102-Speed 2619.65 samples/sec Loss 10.5330 LearningRate 0.0602 Epoch: 4 Global Step: 185860 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:36:22,009-Speed 2621.79 samples/sec Loss 10.6690 LearningRate 0.0602 Epoch: 4 Global Step: 185870 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:36:25,908-Speed 2627.07 samples/sec Loss 10.4654 LearningRate 0.0602 Epoch: 4 Global Step: 185880 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:36:29,811-Speed 2624.25 samples/sec Loss 10.6081 LearningRate 0.0602 Epoch: 4 Global Step: 185890 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:36:33,715-Speed 2624.15 samples/sec Loss 10.4066 LearningRate 0.0602 Epoch: 4 Global Step: 185900 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:36:37,619-Speed 2623.08 samples/sec Loss 10.5715 LearningRate 0.0602 Epoch: 4 Global Step: 185910 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:36:41,518-Speed 2627.01 samples/sec Loss 10.4880 LearningRate 0.0602 Epoch: 4 Global Step: 185920 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:36:45,419-Speed 2624.89 samples/sec Loss 10.6168 LearningRate 0.0602 Epoch: 4 Global Step: 185930 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:36:49,328-Speed 2620.57 samples/sec Loss 10.5328 LearningRate 0.0602 Epoch: 4 Global Step: 185940 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:36:53,232-Speed 2623.87 samples/sec Loss 10.6299 LearningRate 0.0602 Epoch: 4 Global Step: 185950 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:36:57,134-Speed 2625.24 samples/sec Loss 10.6442 LearningRate 0.0602 Epoch: 4 Global Step: 185960 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:37:01,034-Speed 2626.19 samples/sec Loss 10.5992 LearningRate 0.0602 Epoch: 4 Global Step: 185970 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:37:04,938-Speed 2631.19 samples/sec Loss 10.5362 LearningRate 0.0602 Epoch: 4 Global Step: 185980 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:37:08,842-Speed 2623.69 samples/sec Loss 10.5972 LearningRate 0.0602 Epoch: 4 Global Step: 185990 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:37:12,744-Speed 2624.80 samples/sec Loss 10.4121 LearningRate 0.0602 Epoch: 4 Global Step: 186000 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:37:16,630-Speed 2635.40 samples/sec Loss 10.6424 LearningRate 0.0602 Epoch: 4 Global Step: 186010 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:37:20,546-Speed 2615.85 samples/sec Loss 10.5703 LearningRate 0.0602 Epoch: 4 Global Step: 186020 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:37:24,437-Speed 2631.97 samples/sec Loss 10.3762 LearningRate 0.0602 Epoch: 4 Global Step: 186030 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:37:28,340-Speed 2624.43 samples/sec Loss 10.6347 LearningRate 0.0602 Epoch: 4 Global Step: 186040 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:37:32,295-Speed 2590.01 samples/sec Loss 10.4400 LearningRate 0.0602 Epoch: 4 Global Step: 186050 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:37:36,201-Speed 2622.52 samples/sec Loss 10.5505 LearningRate 0.0602 Epoch: 4 Global Step: 186060 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:37:40,103-Speed 2624.16 samples/sec Loss 10.5984 LearningRate 0.0602 Epoch: 4 Global Step: 186070 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:37:44,020-Speed 2614.98 samples/sec Loss 10.7104 LearningRate 0.0602 Epoch: 4 Global Step: 186080 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:37:47,921-Speed 2625.13 samples/sec Loss 10.5839 LearningRate 0.0602 Epoch: 4 Global Step: 186090 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:37:51,824-Speed 2624.08 samples/sec Loss 10.3272 LearningRate 0.0602 Epoch: 4 Global Step: 186100 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:37:55,716-Speed 2632.31 samples/sec Loss 10.5290 LearningRate 0.0602 Epoch: 4 Global Step: 186110 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:37:59,629-Speed 2616.78 samples/sec Loss 10.6250 LearningRate 0.0602 Epoch: 4 Global Step: 186120 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:38:03,533-Speed 2623.94 samples/sec Loss 10.6114 LearningRate 0.0602 Epoch: 4 Global Step: 186130 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:38:07,419-Speed 2635.67 samples/sec Loss 10.6750 LearningRate 0.0602 Epoch: 4 Global Step: 186140 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:38:11,321-Speed 2624.88 samples/sec Loss 10.4252 LearningRate 0.0602 Epoch: 4 Global Step: 186150 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:38:15,224-Speed 2624.10 samples/sec Loss 10.4898 LearningRate 0.0602 Epoch: 4 Global Step: 186160 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:38:19,135-Speed 2619.33 samples/sec Loss 10.4463 LearningRate 0.0602 Epoch: 4 Global Step: 186170 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:38:23,038-Speed 2623.86 samples/sec Loss 10.5784 LearningRate 0.0602 Epoch: 4 Global Step: 186180 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:38:26,940-Speed 2624.75 samples/sec Loss 10.5654 LearningRate 0.0601 Epoch: 4 Global Step: 186190 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:38:30,844-Speed 2623.77 samples/sec Loss 10.4988 LearningRate 0.0601 Epoch: 4 Global Step: 186200 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:38:34,744-Speed 2626.14 samples/sec Loss 10.4930 LearningRate 0.0601 Epoch: 4 Global Step: 186210 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:38:38,660-Speed 2615.13 samples/sec Loss 10.5333 LearningRate 0.0601 Epoch: 4 Global Step: 186220 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:38:42,566-Speed 2622.67 samples/sec Loss 10.6241 LearningRate 0.0601 Epoch: 4 Global Step: 186230 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:38:46,467-Speed 2625.49 samples/sec Loss 10.4176 LearningRate 0.0601 Epoch: 4 Global Step: 186240 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:38:50,357-Speed 2633.28 samples/sec Loss 10.6028 LearningRate 0.0601 Epoch: 4 Global Step: 186250 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:38:54,278-Speed 2612.25 samples/sec Loss 10.4645 LearningRate 0.0601 Epoch: 4 Global Step: 186260 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:38:58,132-Speed 2657.40 samples/sec Loss 11.4375 LearningRate 0.0601 Epoch: 4 Global Step: 186270 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 16:39:02,023-Speed 2632.10 samples/sec Loss 11.5140 LearningRate 0.0601 Epoch: 4 Global Step: 186280 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 16:39:05,928-Speed 2622.78 samples/sec Loss 11.2130 LearningRate 0.0601 Epoch: 4 Global Step: 186290 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 16:39:09,834-Speed 2621.86 samples/sec Loss 10.8443 LearningRate 0.0601 Epoch: 4 Global Step: 186300 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 16:39:13,731-Speed 2628.32 samples/sec Loss 10.6664 LearningRate 0.0601 Epoch: 4 Global Step: 186310 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 16:39:17,632-Speed 2625.63 samples/sec Loss 10.5800 LearningRate 0.0601 Epoch: 4 Global Step: 186320 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 16:39:21,532-Speed 2626.31 samples/sec Loss 10.5862 LearningRate 0.0601 Epoch: 4 Global Step: 186330 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 16:39:25,429-Speed 2628.17 samples/sec Loss 10.5918 LearningRate 0.0601 Epoch: 4 Global Step: 186340 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 16:39:29,322-Speed 2631.33 samples/sec Loss 10.4820 LearningRate 0.0601 Epoch: 4 Global Step: 186350 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 16:39:33,214-Speed 2631.39 samples/sec Loss 10.5446 LearningRate 0.0601 Epoch: 4 Global Step: 186360 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 16:39:37,123-Speed 2620.31 samples/sec Loss 10.6743 LearningRate 0.0601 Epoch: 4 Global Step: 186370 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:39:41,021-Speed 2627.13 samples/sec Loss 10.5657 LearningRate 0.0601 Epoch: 4 Global Step: 186380 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:39:44,923-Speed 2625.26 samples/sec Loss 10.6091 LearningRate 0.0601 Epoch: 4 Global Step: 186390 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:39:48,825-Speed 2624.32 samples/sec Loss 10.5826 LearningRate 0.0601 Epoch: 4 Global Step: 186400 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:39:52,729-Speed 2623.69 samples/sec Loss 10.6542 LearningRate 0.0601 Epoch: 4 Global Step: 186410 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:39:56,640-Speed 2619.02 samples/sec Loss 10.5544 LearningRate 0.0601 Epoch: 4 Global Step: 186420 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:40:00,568-Speed 2607.91 samples/sec Loss 10.7568 LearningRate 0.0601 Epoch: 4 Global Step: 186430 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:40:04,483-Speed 2615.89 samples/sec Loss 11.0367 LearningRate 0.0601 Epoch: 4 Global Step: 186440 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:40:08,379-Speed 2629.23 samples/sec Loss 10.8165 LearningRate 0.0601 Epoch: 4 Global Step: 186450 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:40:12,284-Speed 2622.26 samples/sec Loss 10.6589 LearningRate 0.0601 Epoch: 4 Global Step: 186460 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:40:16,186-Speed 2625.28 samples/sec Loss 10.6777 LearningRate 0.0601 Epoch: 4 Global Step: 186470 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:40:20,097-Speed 2618.67 samples/sec Loss 10.5233 LearningRate 0.0601 Epoch: 4 Global Step: 186480 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:40:24,007-Speed 2619.47 samples/sec Loss 10.6226 LearningRate 0.0601 Epoch: 4 Global Step: 186490 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:40:27,910-Speed 2624.42 samples/sec Loss 10.6019 LearningRate 0.0601 Epoch: 4 Global Step: 186500 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:40:31,801-Speed 2632.36 samples/sec Loss 10.5466 LearningRate 0.0601 Epoch: 4 Global Step: 186510 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:40:35,693-Speed 2632.03 samples/sec Loss 10.6319 LearningRate 0.0601 Epoch: 4 Global Step: 186520 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:40:39,588-Speed 2629.32 samples/sec Loss 10.5208 LearningRate 0.0601 Epoch: 4 Global Step: 186530 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:40:43,484-Speed 2628.74 samples/sec Loss 10.5828 LearningRate 0.0601 Epoch: 4 Global Step: 186540 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:40:47,377-Speed 2630.92 samples/sec Loss 10.5851 LearningRate 0.0601 Epoch: 4 Global Step: 186550 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:40:51,285-Speed 2621.25 samples/sec Loss 10.6229 LearningRate 0.0601 Epoch: 4 Global Step: 186560 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:40:55,183-Speed 2627.48 samples/sec Loss 10.5823 LearningRate 0.0601 Epoch: 4 Global Step: 186570 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:40:59,074-Speed 2632.05 samples/sec Loss 10.5791 LearningRate 0.0601 Epoch: 4 Global Step: 186580 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:41:02,970-Speed 2629.29 samples/sec Loss 10.5299 LearningRate 0.0601 Epoch: 4 Global Step: 186590 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:41:06,871-Speed 2625.81 samples/sec Loss 10.7638 LearningRate 0.0601 Epoch: 4 Global Step: 186600 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:41:10,775-Speed 2623.01 samples/sec Loss 10.6339 LearningRate 0.0601 Epoch: 4 Global Step: 186610 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:41:14,669-Speed 2630.44 samples/sec Loss 10.6755 LearningRate 0.0601 Epoch: 4 Global Step: 186620 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:41:18,569-Speed 2626.33 samples/sec Loss 10.6200 LearningRate 0.0601 Epoch: 4 Global Step: 186630 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:41:22,546-Speed 2575.43 samples/sec Loss 10.6563 LearningRate 0.0601 Epoch: 4 Global Step: 186640 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:41:26,455-Speed 2620.03 samples/sec Loss 10.6602 LearningRate 0.0601 Epoch: 4 Global Step: 186650 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:41:30,339-Speed 2637.25 samples/sec Loss 10.6547 LearningRate 0.0601 Epoch: 4 Global Step: 186660 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:41:34,256-Speed 2614.97 samples/sec Loss 10.5280 LearningRate 0.0601 Epoch: 4 Global Step: 186670 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:41:38,164-Speed 2620.47 samples/sec Loss 10.5670 LearningRate 0.0601 Epoch: 4 Global Step: 186680 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:41:42,068-Speed 2623.26 samples/sec Loss 10.5878 LearningRate 0.0601 Epoch: 4 Global Step: 186690 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:41:45,968-Speed 2627.56 samples/sec Loss 10.6878 LearningRate 0.0601 Epoch: 4 Global Step: 186700 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:41:49,870-Speed 2624.81 samples/sec Loss 10.4425 LearningRate 0.0601 Epoch: 4 Global Step: 186710 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:41:53,771-Speed 2625.81 samples/sec Loss 10.6894 LearningRate 0.0601 Epoch: 4 Global Step: 186720 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:41:57,673-Speed 2624.34 samples/sec Loss 10.6315 LearningRate 0.0600 Epoch: 4 Global Step: 186730 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:42:01,578-Speed 2623.08 samples/sec Loss 10.4926 LearningRate 0.0600 Epoch: 4 Global Step: 186740 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:42:05,463-Speed 2636.30 samples/sec Loss 10.5588 LearningRate 0.0600 Epoch: 4 Global Step: 186750 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:42:09,360-Speed 2628.26 samples/sec Loss 10.4621 LearningRate 0.0600 Epoch: 4 Global Step: 186760 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:42:13,261-Speed 2625.16 samples/sec Loss 10.6299 LearningRate 0.0600 Epoch: 4 Global Step: 186770 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:42:17,163-Speed 2624.61 samples/sec Loss 10.5018 LearningRate 0.0600 Epoch: 4 Global Step: 186780 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:42:21,065-Speed 2625.15 samples/sec Loss 10.4914 LearningRate 0.0600 Epoch: 4 Global Step: 186790 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:42:24,972-Speed 2625.15 samples/sec Loss 10.5077 LearningRate 0.0600 Epoch: 4 Global Step: 186800 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:42:28,874-Speed 2624.50 samples/sec Loss 10.4358 LearningRate 0.0600 Epoch: 4 Global Step: 186810 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:42:32,776-Speed 2625.05 samples/sec Loss 10.4901 LearningRate 0.0600 Epoch: 4 Global Step: 186820 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:42:36,677-Speed 2625.36 samples/sec Loss 10.5502 LearningRate 0.0600 Epoch: 4 Global Step: 186830 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:42:40,572-Speed 2629.87 samples/sec Loss 10.5861 LearningRate 0.0600 Epoch: 4 Global Step: 186840 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:42:44,464-Speed 2630.90 samples/sec Loss 10.5704 LearningRate 0.0600 Epoch: 4 Global Step: 186850 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:42:48,357-Speed 2631.15 samples/sec Loss 10.6421 LearningRate 0.0600 Epoch: 4 Global Step: 186860 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:42:52,248-Speed 2632.09 samples/sec Loss 10.6998 LearningRate 0.0600 Epoch: 4 Global Step: 186870 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:42:56,123-Speed 2643.71 samples/sec Loss 10.4447 LearningRate 0.0600 Epoch: 4 Global Step: 186880 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:43:00,016-Speed 2631.09 samples/sec Loss 10.6650 LearningRate 0.0600 Epoch: 4 Global Step: 186890 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:43:03,908-Speed 2631.90 samples/sec Loss 10.5083 LearningRate 0.0600 Epoch: 4 Global Step: 186900 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:43:07,801-Speed 2631.02 samples/sec Loss 10.4122 LearningRate 0.0600 Epoch: 4 Global Step: 186910 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:43:11,693-Speed 2631.36 samples/sec Loss 10.6849 LearningRate 0.0600 Epoch: 4 Global Step: 186920 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:43:15,585-Speed 2631.40 samples/sec Loss 10.6880 LearningRate 0.0600 Epoch: 4 Global Step: 186930 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:43:19,497-Speed 2618.74 samples/sec Loss 10.4990 LearningRate 0.0600 Epoch: 4 Global Step: 186940 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:43:23,443-Speed 2595.80 samples/sec Loss 10.6097 LearningRate 0.0600 Epoch: 4 Global Step: 186950 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:43:27,353-Speed 2619.46 samples/sec Loss 10.6315 LearningRate 0.0600 Epoch: 4 Global Step: 186960 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:43:31,247-Speed 2629.92 samples/sec Loss 10.6655 LearningRate 0.0600 Epoch: 4 Global Step: 186970 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:43:35,145-Speed 2627.97 samples/sec Loss 10.4514 LearningRate 0.0600 Epoch: 4 Global Step: 186980 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:43:39,034-Speed 2633.92 samples/sec Loss 10.6183 LearningRate 0.0600 Epoch: 4 Global Step: 186990 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:43:42,925-Speed 2631.85 samples/sec Loss 10.6322 LearningRate 0.0600 Epoch: 4 Global Step: 187000 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:43:46,818-Speed 2631.12 samples/sec Loss 10.6246 LearningRate 0.0600 Epoch: 4 Global Step: 187010 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:43:50,698-Speed 2639.70 samples/sec Loss 10.6591 LearningRate 0.0600 Epoch: 4 Global Step: 187020 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:43:54,575-Speed 2642.00 samples/sec Loss 10.5420 LearningRate 0.0600 Epoch: 4 Global Step: 187030 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:43:58,462-Speed 2635.18 samples/sec Loss 10.6892 LearningRate 0.0600 Epoch: 4 Global Step: 187040 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:44:02,359-Speed 2627.98 samples/sec Loss 10.7331 LearningRate 0.0600 Epoch: 4 Global Step: 187050 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:44:06,254-Speed 2629.58 samples/sec Loss 10.7130 LearningRate 0.0600 Epoch: 4 Global Step: 187060 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:44:10,155-Speed 2626.04 samples/sec Loss 10.6622 LearningRate 0.0600 Epoch: 4 Global Step: 187070 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:44:14,052-Speed 2628.01 samples/sec Loss 10.5504 LearningRate 0.0600 Epoch: 4 Global Step: 187080 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:44:17,952-Speed 2626.08 samples/sec Loss 10.5250 LearningRate 0.0600 Epoch: 4 Global Step: 187090 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:44:21,848-Speed 2628.72 samples/sec Loss 10.6355 LearningRate 0.0600 Epoch: 4 Global Step: 187100 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:44:25,748-Speed 2626.73 samples/sec Loss 10.6229 LearningRate 0.0600 Epoch: 4 Global Step: 187110 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:44:29,646-Speed 2627.06 samples/sec Loss 10.6098 LearningRate 0.0600 Epoch: 4 Global Step: 187120 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:44:33,568-Speed 2611.63 samples/sec Loss 10.6042 LearningRate 0.0600 Epoch: 4 Global Step: 187130 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 16:44:37,498-Speed 2606.06 samples/sec Loss 10.5648 LearningRate 0.0600 Epoch: 4 Global Step: 187140 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:44:41,422-Speed 2610.14 samples/sec Loss 10.6832 LearningRate 0.0600 Epoch: 4 Global Step: 187150 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:44:45,344-Speed 2612.01 samples/sec Loss 10.5489 LearningRate 0.0600 Epoch: 4 Global Step: 187160 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:44:49,239-Speed 2629.32 samples/sec Loss 10.6238 LearningRate 0.0600 Epoch: 4 Global Step: 187170 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:44:53,130-Speed 2632.68 samples/sec Loss 10.6794 LearningRate 0.0600 Epoch: 4 Global Step: 187180 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:44:57,021-Speed 2631.88 samples/sec Loss 10.5339 LearningRate 0.0600 Epoch: 4 Global Step: 187190 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:45:00,914-Speed 2631.01 samples/sec Loss 10.5048 LearningRate 0.0600 Epoch: 4 Global Step: 187200 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:45:04,807-Speed 2630.60 samples/sec Loss 10.6867 LearningRate 0.0600 Epoch: 4 Global Step: 187210 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:45:08,707-Speed 2626.59 samples/sec Loss 10.6137 LearningRate 0.0600 Epoch: 4 Global Step: 187220 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:45:12,600-Speed 2630.99 samples/sec Loss 10.5960 LearningRate 0.0600 Epoch: 4 Global Step: 187230 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:45:16,492-Speed 2631.91 samples/sec Loss 10.5985 LearningRate 0.0600 Epoch: 4 Global Step: 187240 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:45:20,386-Speed 2630.06 samples/sec Loss 10.5914 LearningRate 0.0600 Epoch: 4 Global Step: 187250 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:45:24,277-Speed 2632.59 samples/sec Loss 10.5621 LearningRate 0.0599 Epoch: 4 Global Step: 187260 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:45:28,170-Speed 2631.01 samples/sec Loss 10.5079 LearningRate 0.0599 Epoch: 4 Global Step: 187270 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:45:32,078-Speed 2620.81 samples/sec Loss 10.5303 LearningRate 0.0599 Epoch: 4 Global Step: 187280 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:45:35,986-Speed 2620.68 samples/sec Loss 10.4235 LearningRate 0.0599 Epoch: 4 Global Step: 187290 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:45:39,884-Speed 2627.48 samples/sec Loss 10.5743 LearningRate 0.0599 Epoch: 4 Global Step: 187300 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:45:43,779-Speed 2629.93 samples/sec Loss 10.5982 LearningRate 0.0599 Epoch: 4 Global Step: 187310 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:45:47,678-Speed 2626.73 samples/sec Loss 10.5930 LearningRate 0.0599 Epoch: 4 Global Step: 187320 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:45:51,569-Speed 2632.65 samples/sec Loss 10.5661 LearningRate 0.0599 Epoch: 4 Global Step: 187330 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:45:55,464-Speed 2629.07 samples/sec Loss 10.4675 LearningRate 0.0599 Epoch: 4 Global Step: 187340 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:45:59,364-Speed 2627.22 samples/sec Loss 10.4862 LearningRate 0.0599 Epoch: 4 Global Step: 187350 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:46:03,262-Speed 2627.54 samples/sec Loss 10.5555 LearningRate 0.0599 Epoch: 4 Global Step: 187360 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 16:46:07,162-Speed 2626.04 samples/sec Loss 10.6979 LearningRate 0.0599 Epoch: 4 Global Step: 187370 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:46:11,057-Speed 2629.04 samples/sec Loss 10.4765 LearningRate 0.0599 Epoch: 4 Global Step: 187380 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:46:14,978-Speed 2612.67 samples/sec Loss 10.4594 LearningRate 0.0599 Epoch: 4 Global Step: 187390 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:46:18,873-Speed 2630.13 samples/sec Loss 10.5156 LearningRate 0.0599 Epoch: 4 Global Step: 187400 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:46:22,771-Speed 2627.22 samples/sec Loss 10.6569 LearningRate 0.0599 Epoch: 4 Global Step: 187410 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:46:26,680-Speed 2620.46 samples/sec Loss 10.4233 LearningRate 0.0599 Epoch: 4 Global Step: 187420 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:46:30,569-Speed 2634.28 samples/sec Loss 10.5055 LearningRate 0.0599 Epoch: 4 Global Step: 187430 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:46:34,478-Speed 2620.11 samples/sec Loss 10.5126 LearningRate 0.0599 Epoch: 4 Global Step: 187440 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:46:38,375-Speed 2628.09 samples/sec Loss 10.6673 LearningRate 0.0599 Epoch: 4 Global Step: 187450 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:46:42,272-Speed 2628.21 samples/sec Loss 10.4025 LearningRate 0.0599 Epoch: 4 Global Step: 187460 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:46:46,168-Speed 2629.11 samples/sec Loss 10.5022 LearningRate 0.0599 Epoch: 4 Global Step: 187470 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:46:50,044-Speed 2642.65 samples/sec Loss 10.4502 LearningRate 0.0599 Epoch: 4 Global Step: 187480 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:46:53,939-Speed 2630.18 samples/sec Loss 10.4803 LearningRate 0.0599 Epoch: 4 Global Step: 187490 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:46:57,831-Speed 2631.39 samples/sec Loss 10.4543 LearningRate 0.0599 Epoch: 4 Global Step: 187500 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:47:01,728-Speed 2628.28 samples/sec Loss 10.5173 LearningRate 0.0599 Epoch: 4 Global Step: 187510 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:47:05,618-Speed 2632.57 samples/sec Loss 10.6053 LearningRate 0.0599 Epoch: 4 Global Step: 187520 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:47:09,507-Speed 2633.51 samples/sec Loss 10.6049 LearningRate 0.0599 Epoch: 4 Global Step: 187530 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:47:13,404-Speed 2628.46 samples/sec Loss 10.5100 LearningRate 0.0599 Epoch: 4 Global Step: 187540 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:47:17,301-Speed 2628.52 samples/sec Loss 10.6536 LearningRate 0.0599 Epoch: 4 Global Step: 187550 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:47:21,196-Speed 2629.93 samples/sec Loss 10.5619 LearningRate 0.0599 Epoch: 4 Global Step: 187560 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:47:25,087-Speed 2632.07 samples/sec Loss 10.5578 LearningRate 0.0599 Epoch: 4 Global Step: 187570 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:47:28,993-Speed 2622.97 samples/sec Loss 10.6286 LearningRate 0.0599 Epoch: 4 Global Step: 187580 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:47:32,889-Speed 2628.58 samples/sec Loss 10.5452 LearningRate 0.0599 Epoch: 4 Global Step: 187590 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:47:36,781-Speed 2631.31 samples/sec Loss 10.5183 LearningRate 0.0599 Epoch: 4 Global Step: 187600 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:47:40,675-Speed 2630.11 samples/sec Loss 10.4594 LearningRate 0.0599 Epoch: 4 Global Step: 187610 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:47:44,579-Speed 2624.01 samples/sec Loss 10.5644 LearningRate 0.0599 Epoch: 4 Global Step: 187620 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:47:48,473-Speed 2630.07 samples/sec Loss 10.6456 LearningRate 0.0599 Epoch: 4 Global Step: 187630 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:47:52,370-Speed 2628.68 samples/sec Loss 10.5082 LearningRate 0.0599 Epoch: 4 Global Step: 187640 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:47:56,296-Speed 2608.69 samples/sec Loss 10.4825 LearningRate 0.0599 Epoch: 4 Global Step: 187650 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:48:00,192-Speed 2629.09 samples/sec Loss 10.5098 LearningRate 0.0599 Epoch: 4 Global Step: 187660 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:48:04,094-Speed 2624.87 samples/sec Loss 10.5094 LearningRate 0.0599 Epoch: 4 Global Step: 187670 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:48:07,978-Speed 2636.91 samples/sec Loss 10.6363 LearningRate 0.0599 Epoch: 4 Global Step: 187680 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:48:11,871-Speed 2631.20 samples/sec Loss 10.6011 LearningRate 0.0599 Epoch: 4 Global Step: 187690 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:48:15,776-Speed 2622.66 samples/sec Loss 10.5870 LearningRate 0.0599 Epoch: 4 Global Step: 187700 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:48:19,673-Speed 2628.65 samples/sec Loss 10.5833 LearningRate 0.0599 Epoch: 4 Global Step: 187710 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:48:23,568-Speed 2629.43 samples/sec Loss 10.6704 LearningRate 0.0599 Epoch: 4 Global Step: 187720 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:48:27,461-Speed 2631.31 samples/sec Loss 10.5115 LearningRate 0.0599 Epoch: 4 Global Step: 187730 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:48:31,337-Speed 2642.52 samples/sec Loss 10.5593 LearningRate 0.0599 Epoch: 4 Global Step: 187740 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:48:35,227-Speed 2633.09 samples/sec Loss 10.5916 LearningRate 0.0599 Epoch: 4 Global Step: 187750 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:48:39,121-Speed 2630.67 samples/sec Loss 10.6684 LearningRate 0.0599 Epoch: 4 Global Step: 187760 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:48:43,011-Speed 2632.69 samples/sec Loss 10.5890 LearningRate 0.0599 Epoch: 4 Global Step: 187770 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:48:46,902-Speed 2632.23 samples/sec Loss 10.4179 LearningRate 0.0599 Epoch: 4 Global Step: 187780 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:48:50,796-Speed 2630.57 samples/sec Loss 10.6200 LearningRate 0.0599 Epoch: 4 Global Step: 187790 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:48:54,693-Speed 2628.47 samples/sec Loss 10.4958 LearningRate 0.0598 Epoch: 4 Global Step: 187800 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:48:58,585-Speed 2631.46 samples/sec Loss 10.5773 LearningRate 0.0598 Epoch: 4 Global Step: 187810 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:49:02,482-Speed 2628.06 samples/sec Loss 10.6375 LearningRate 0.0598 Epoch: 4 Global Step: 187820 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:49:06,376-Speed 2630.82 samples/sec Loss 10.6877 LearningRate 0.0598 Epoch: 4 Global Step: 187830 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:49:10,281-Speed 2622.65 samples/sec Loss 10.4339 LearningRate 0.0598 Epoch: 4 Global Step: 187840 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:49:14,178-Speed 2628.00 samples/sec Loss 10.5151 LearningRate 0.0598 Epoch: 4 Global Step: 187850 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:49:18,073-Speed 2629.54 samples/sec Loss 10.3564 LearningRate 0.0598 Epoch: 4 Global Step: 187860 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:49:21,949-Speed 2642.36 samples/sec Loss 10.7312 LearningRate 0.0598 Epoch: 4 Global Step: 187870 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:49:25,842-Speed 2631.77 samples/sec Loss 10.4175 LearningRate 0.0598 Epoch: 4 Global Step: 187880 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:49:29,749-Speed 2621.72 samples/sec Loss 10.6202 LearningRate 0.0598 Epoch: 4 Global Step: 187890 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:49:33,641-Speed 2631.65 samples/sec Loss 10.6131 LearningRate 0.0598 Epoch: 4 Global Step: 187900 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:49:37,543-Speed 2624.59 samples/sec Loss 10.4425 LearningRate 0.0598 Epoch: 4 Global Step: 187910 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:49:41,442-Speed 2626.77 samples/sec Loss 10.5132 LearningRate 0.0598 Epoch: 4 Global Step: 187920 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:49:45,347-Speed 2623.00 samples/sec Loss 10.7070 LearningRate 0.0598 Epoch: 4 Global Step: 187930 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:49:49,251-Speed 2623.80 samples/sec Loss 10.4266 LearningRate 0.0598 Epoch: 4 Global Step: 187940 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:49:53,165-Speed 2616.60 samples/sec Loss 10.6033 LearningRate 0.0598 Epoch: 4 Global Step: 187950 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:49:57,070-Speed 2622.90 samples/sec Loss 10.4511 LearningRate 0.0598 Epoch: 4 Global Step: 187960 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:50:00,973-Speed 2624.65 samples/sec Loss 10.5942 LearningRate 0.0598 Epoch: 4 Global Step: 187970 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:50:04,879-Speed 2622.24 samples/sec Loss 10.4375 LearningRate 0.0598 Epoch: 4 Global Step: 187980 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:50:08,784-Speed 2623.07 samples/sec Loss 10.3505 LearningRate 0.0598 Epoch: 4 Global Step: 187990 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:50:12,666-Speed 2638.34 samples/sec Loss 10.6777 LearningRate 0.0598 Epoch: 4 Global Step: 188000 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:50:16,562-Speed 2628.63 samples/sec Loss 10.6346 LearningRate 0.0598 Epoch: 4 Global Step: 188010 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:50:20,457-Speed 2629.88 samples/sec Loss 10.6032 LearningRate 0.0598 Epoch: 4 Global Step: 188020 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:50:24,376-Speed 2613.51 samples/sec Loss 10.5192 LearningRate 0.0598 Epoch: 4 Global Step: 188030 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:50:28,269-Speed 2630.62 samples/sec Loss 10.4713 LearningRate 0.0598 Epoch: 4 Global Step: 188040 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:50:32,179-Speed 2620.23 samples/sec Loss 10.5585 LearningRate 0.0598 Epoch: 4 Global Step: 188050 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:50:36,099-Speed 2613.09 samples/sec Loss 10.5279 LearningRate 0.0598 Epoch: 4 Global Step: 188060 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:50:39,995-Speed 2628.55 samples/sec Loss 10.4093 LearningRate 0.0598 Epoch: 4 Global Step: 188070 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:50:43,886-Speed 2632.81 samples/sec Loss 10.3749 LearningRate 0.0598 Epoch: 4 Global Step: 188080 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:50:47,782-Speed 2628.82 samples/sec Loss 10.5468 LearningRate 0.0598 Epoch: 4 Global Step: 188090 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:50:51,680-Speed 2627.40 samples/sec Loss 10.4930 LearningRate 0.0598 Epoch: 4 Global Step: 188100 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:50:55,611-Speed 2606.56 samples/sec Loss 10.4804 LearningRate 0.0598 Epoch: 4 Global Step: 188110 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:50:59,509-Speed 2627.14 samples/sec Loss 10.6600 LearningRate 0.0598 Epoch: 4 Global Step: 188120 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:51:03,399-Speed 2633.41 samples/sec Loss 10.6941 LearningRate 0.0598 Epoch: 4 Global Step: 188130 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:51:07,293-Speed 2630.13 samples/sec Loss 10.5225 LearningRate 0.0598 Epoch: 4 Global Step: 188140 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:51:11,191-Speed 2627.69 samples/sec Loss 10.5506 LearningRate 0.0598 Epoch: 4 Global Step: 188150 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:51:15,063-Speed 2645.16 samples/sec Loss 10.5293 LearningRate 0.0598 Epoch: 4 Global Step: 188160 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:51:18,967-Speed 2623.91 samples/sec Loss 10.5275 LearningRate 0.0598 Epoch: 4 Global Step: 188170 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:51:22,866-Speed 2626.89 samples/sec Loss 10.5690 LearningRate 0.0598 Epoch: 4 Global Step: 188180 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:51:26,759-Speed 2630.62 samples/sec Loss 10.5950 LearningRate 0.0598 Epoch: 4 Global Step: 188190 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:51:30,652-Speed 2631.85 samples/sec Loss 10.3900 LearningRate 0.0598 Epoch: 4 Global Step: 188200 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:51:34,550-Speed 2627.68 samples/sec Loss 10.5584 LearningRate 0.0598 Epoch: 4 Global Step: 188210 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:51:38,444-Speed 2629.81 samples/sec Loss 10.5079 LearningRate 0.0598 Epoch: 4 Global Step: 188220 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:51:42,336-Speed 2631.47 samples/sec Loss 10.5920 LearningRate 0.0598 Epoch: 4 Global Step: 188230 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:51:46,230-Speed 2630.85 samples/sec Loss 10.6104 LearningRate 0.0598 Epoch: 4 Global Step: 188240 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:51:50,125-Speed 2629.45 samples/sec Loss 10.6349 LearningRate 0.0598 Epoch: 4 Global Step: 188250 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:51:54,021-Speed 2629.38 samples/sec Loss 10.5141 LearningRate 0.0598 Epoch: 4 Global Step: 188260 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:51:57,902-Speed 2638.68 samples/sec Loss 10.5972 LearningRate 0.0598 Epoch: 4 Global Step: 188270 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:52:01,798-Speed 2629.39 samples/sec Loss 10.4540 LearningRate 0.0598 Epoch: 4 Global Step: 188280 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:52:05,691-Speed 2631.14 samples/sec Loss 10.4539 LearningRate 0.0598 Epoch: 4 Global Step: 188290 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:52:09,583-Speed 2631.62 samples/sec Loss 10.5220 LearningRate 0.0598 Epoch: 4 Global Step: 188300 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:52:13,477-Speed 2630.01 samples/sec Loss 10.6716 LearningRate 0.0598 Epoch: 4 Global Step: 188310 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:52:17,370-Speed 2631.21 samples/sec Loss 10.4980 LearningRate 0.0598 Epoch: 4 Global Step: 188320 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:52:21,267-Speed 2628.60 samples/sec Loss 10.4403 LearningRate 0.0598 Epoch: 4 Global Step: 188330 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:52:25,171-Speed 2623.78 samples/sec Loss 10.4099 LearningRate 0.0597 Epoch: 4 Global Step: 188340 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:52:29,067-Speed 2629.21 samples/sec Loss 10.4585 LearningRate 0.0597 Epoch: 4 Global Step: 188350 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:52:32,962-Speed 2629.47 samples/sec Loss 10.6576 LearningRate 0.0597 Epoch: 4 Global Step: 188360 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:52:36,858-Speed 2628.93 samples/sec Loss 10.4857 LearningRate 0.0597 Epoch: 4 Global Step: 188370 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:52:40,760-Speed 2624.45 samples/sec Loss 10.5338 LearningRate 0.0597 Epoch: 4 Global Step: 188380 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:52:44,657-Speed 2628.83 samples/sec Loss 10.5389 LearningRate 0.0597 Epoch: 4 Global Step: 188390 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:52:48,566-Speed 2620.43 samples/sec Loss 10.6752 LearningRate 0.0597 Epoch: 4 Global Step: 188400 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:52:52,476-Speed 2619.84 samples/sec Loss 10.6253 LearningRate 0.0597 Epoch: 4 Global Step: 188410 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:52:56,378-Speed 2624.34 samples/sec Loss 10.6157 LearningRate 0.0597 Epoch: 4 Global Step: 188420 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:53:00,285-Speed 2622.28 samples/sec Loss 10.4068 LearningRate 0.0597 Epoch: 4 Global Step: 188430 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:53:04,206-Speed 2611.56 samples/sec Loss 10.3940 LearningRate 0.0597 Epoch: 4 Global Step: 188440 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:53:08,103-Speed 2627.99 samples/sec Loss 10.5349 LearningRate 0.0597 Epoch: 4 Global Step: 188450 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:53:12,007-Speed 2623.59 samples/sec Loss 10.5692 LearningRate 0.0597 Epoch: 4 Global Step: 188460 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:53:15,889-Speed 2639.07 samples/sec Loss 10.6236 LearningRate 0.0597 Epoch: 4 Global Step: 188470 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:53:19,789-Speed 2625.62 samples/sec Loss 10.5436 LearningRate 0.0597 Epoch: 4 Global Step: 188480 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:53:23,692-Speed 2624.63 samples/sec Loss 10.5635 LearningRate 0.0597 Epoch: 4 Global Step: 188490 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:53:27,647-Speed 2590.31 samples/sec Loss 10.6575 LearningRate 0.0597 Epoch: 4 Global Step: 188500 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:53:31,553-Speed 2622.32 samples/sec Loss 10.3975 LearningRate 0.0597 Epoch: 4 Global Step: 188510 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:53:35,458-Speed 2622.31 samples/sec Loss 10.6141 LearningRate 0.0597 Epoch: 4 Global Step: 188520 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:53:39,361-Speed 2624.25 samples/sec Loss 10.5599 LearningRate 0.0597 Epoch: 4 Global Step: 188530 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:53:43,323-Speed 2584.97 samples/sec Loss 10.5793 LearningRate 0.0597 Epoch: 4 Global Step: 188540 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:53:47,219-Speed 2629.66 samples/sec Loss 10.5825 LearningRate 0.0597 Epoch: 4 Global Step: 188550 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:53:51,125-Speed 2622.24 samples/sec Loss 10.5474 LearningRate 0.0597 Epoch: 4 Global Step: 188560 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:53:55,004-Speed 2640.25 samples/sec Loss 10.5616 LearningRate 0.0597 Epoch: 4 Global Step: 188570 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:53:58,888-Speed 2637.26 samples/sec Loss 10.5254 LearningRate 0.0597 Epoch: 4 Global Step: 188580 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:54:02,779-Speed 2632.68 samples/sec Loss 10.4823 LearningRate 0.0597 Epoch: 4 Global Step: 188590 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:54:06,675-Speed 2628.66 samples/sec Loss 10.5075 LearningRate 0.0597 Epoch: 4 Global Step: 188600 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:54:10,566-Speed 2632.60 samples/sec Loss 10.6056 LearningRate 0.0597 Epoch: 4 Global Step: 188610 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:54:14,461-Speed 2629.92 samples/sec Loss 10.5104 LearningRate 0.0597 Epoch: 4 Global Step: 188620 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:54:18,354-Speed 2631.05 samples/sec Loss 10.4864 LearningRate 0.0597 Epoch: 4 Global Step: 188630 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:54:22,246-Speed 2631.69 samples/sec Loss 10.5838 LearningRate 0.0597 Epoch: 4 Global Step: 188640 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:54:26,143-Speed 2628.50 samples/sec Loss 10.5098 LearningRate 0.0597 Epoch: 4 Global Step: 188650 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:54:30,037-Speed 2630.20 samples/sec Loss 10.5311 LearningRate 0.0597 Epoch: 4 Global Step: 188660 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:54:33,929-Speed 2631.71 samples/sec Loss 10.6492 LearningRate 0.0597 Epoch: 4 Global Step: 188670 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:54:37,824-Speed 2630.02 samples/sec Loss 10.5138 LearningRate 0.0597 Epoch: 4 Global Step: 188680 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:54:41,741-Speed 2614.41 samples/sec Loss 10.5098 LearningRate 0.0597 Epoch: 4 Global Step: 188690 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:54:45,676-Speed 2602.85 samples/sec Loss 10.7023 LearningRate 0.0597 Epoch: 4 Global Step: 188700 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:54:49,559-Speed 2637.96 samples/sec Loss 10.5941 LearningRate 0.0597 Epoch: 4 Global Step: 188710 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:54:53,458-Speed 2626.71 samples/sec Loss 10.5436 LearningRate 0.0597 Epoch: 4 Global Step: 188720 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:54:57,360-Speed 2625.37 samples/sec Loss 10.5774 LearningRate 0.0597 Epoch: 4 Global Step: 188730 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:55:01,255-Speed 2629.85 samples/sec Loss 10.3777 LearningRate 0.0597 Epoch: 4 Global Step: 188740 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:55:05,171-Speed 2615.82 samples/sec Loss 10.6520 LearningRate 0.0597 Epoch: 4 Global Step: 188750 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:55:09,090-Speed 2613.47 samples/sec Loss 10.5316 LearningRate 0.0597 Epoch: 4 Global Step: 188760 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:55:12,988-Speed 2627.42 samples/sec Loss 10.5007 LearningRate 0.0597 Epoch: 4 Global Step: 188770 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:55:16,884-Speed 2628.92 samples/sec Loss 10.4725 LearningRate 0.0597 Epoch: 4 Global Step: 188780 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:55:20,779-Speed 2629.59 samples/sec Loss 10.4902 LearningRate 0.0597 Epoch: 4 Global Step: 188790 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:55:24,674-Speed 2629.81 samples/sec Loss 10.4950 LearningRate 0.0597 Epoch: 4 Global Step: 188800 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:55:28,573-Speed 2627.45 samples/sec Loss 10.4425 LearningRate 0.0597 Epoch: 4 Global Step: 188810 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:55:32,466-Speed 2630.67 samples/sec Loss 10.4759 LearningRate 0.0597 Epoch: 4 Global Step: 188820 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:55:36,387-Speed 2612.33 samples/sec Loss 10.4692 LearningRate 0.0597 Epoch: 4 Global Step: 188830 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:55:40,290-Speed 2624.64 samples/sec Loss 10.5670 LearningRate 0.0597 Epoch: 4 Global Step: 188840 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:55:44,187-Speed 2628.02 samples/sec Loss 10.4246 LearningRate 0.0597 Epoch: 4 Global Step: 188850 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:55:48,092-Speed 2622.63 samples/sec Loss 10.4728 LearningRate 0.0597 Epoch: 4 Global Step: 188860 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:55:51,990-Speed 2627.76 samples/sec Loss 10.4553 LearningRate 0.0596 Epoch: 4 Global Step: 188870 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:55:55,886-Speed 2628.89 samples/sec Loss 10.3828 LearningRate 0.0596 Epoch: 4 Global Step: 188880 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:55:59,770-Speed 2637.74 samples/sec Loss 10.4221 LearningRate 0.0596 Epoch: 4 Global Step: 188890 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:56:03,674-Speed 2623.30 samples/sec Loss 10.5587 LearningRate 0.0596 Epoch: 4 Global Step: 188900 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:56:07,569-Speed 2630.36 samples/sec Loss 10.5024 LearningRate 0.0596 Epoch: 4 Global Step: 188910 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:56:11,473-Speed 2622.85 samples/sec Loss 10.4777 LearningRate 0.0596 Epoch: 4 Global Step: 188920 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:56:15,374-Speed 2625.47 samples/sec Loss 10.6322 LearningRate 0.0596 Epoch: 4 Global Step: 188930 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:56:19,277-Speed 2624.59 samples/sec Loss 10.5715 LearningRate 0.0596 Epoch: 4 Global Step: 188940 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:56:23,167-Speed 2633.07 samples/sec Loss 10.4818 LearningRate 0.0596 Epoch: 4 Global Step: 188950 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:56:27,061-Speed 2630.80 samples/sec Loss 10.5365 LearningRate 0.0596 Epoch: 4 Global Step: 188960 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:56:30,972-Speed 2618.76 samples/sec Loss 10.5647 LearningRate 0.0596 Epoch: 4 Global Step: 188970 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:56:34,884-Speed 2618.32 samples/sec Loss 10.5226 LearningRate 0.0596 Epoch: 4 Global Step: 188980 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:56:38,784-Speed 2626.29 samples/sec Loss 10.6781 LearningRate 0.0596 Epoch: 4 Global Step: 188990 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:56:42,673-Speed 2633.80 samples/sec Loss 10.6533 LearningRate 0.0596 Epoch: 4 Global Step: 189000 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:56:46,567-Speed 2630.84 samples/sec Loss 10.4945 LearningRate 0.0596 Epoch: 4 Global Step: 189010 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:56:50,464-Speed 2628.13 samples/sec Loss 10.3796 LearningRate 0.0596 Epoch: 4 Global Step: 189020 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:56:54,360-Speed 2628.84 samples/sec Loss 10.5220 LearningRate 0.0596 Epoch: 4 Global Step: 189030 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:56:58,237-Speed 2641.78 samples/sec Loss 10.3986 LearningRate 0.0596 Epoch: 4 Global Step: 189040 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:57:02,127-Speed 2633.26 samples/sec Loss 10.4290 LearningRate 0.0596 Epoch: 4 Global Step: 189050 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:57:06,020-Speed 2631.02 samples/sec Loss 10.5022 LearningRate 0.0596 Epoch: 4 Global Step: 189060 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:57:09,911-Speed 2632.55 samples/sec Loss 10.5786 LearningRate 0.0596 Epoch: 4 Global Step: 189070 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:57:13,802-Speed 2631.76 samples/sec Loss 10.5241 LearningRate 0.0596 Epoch: 4 Global Step: 189080 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:57:17,691-Speed 2634.22 samples/sec Loss 10.5457 LearningRate 0.0596 Epoch: 4 Global Step: 189090 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:57:21,582-Speed 2632.17 samples/sec Loss 10.6496 LearningRate 0.0596 Epoch: 4 Global Step: 189100 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:57:25,470-Speed 2634.32 samples/sec Loss 10.4341 LearningRate 0.0596 Epoch: 4 Global Step: 189110 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:57:29,359-Speed 2633.61 samples/sec Loss 10.4533 LearningRate 0.0596 Epoch: 4 Global Step: 189120 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:57:33,250-Speed 2632.64 samples/sec Loss 10.4929 LearningRate 0.0596 Epoch: 4 Global Step: 189130 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:57:37,148-Speed 2627.75 samples/sec Loss 10.5382 LearningRate 0.0596 Epoch: 4 Global Step: 189140 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:57:41,020-Speed 2644.99 samples/sec Loss 10.6221 LearningRate 0.0596 Epoch: 4 Global Step: 189150 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:57:44,913-Speed 2631.08 samples/sec Loss 10.4533 LearningRate 0.0596 Epoch: 4 Global Step: 189160 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:57:48,806-Speed 2630.52 samples/sec Loss 10.5076 LearningRate 0.0596 Epoch: 4 Global Step: 189170 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:57:52,696-Speed 2633.33 samples/sec Loss 10.5468 LearningRate 0.0596 Epoch: 4 Global Step: 189180 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:57:56,586-Speed 2633.34 samples/sec Loss 10.5621 LearningRate 0.0596 Epoch: 4 Global Step: 189190 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:58:00,479-Speed 2630.63 samples/sec Loss 10.5538 LearningRate 0.0596 Epoch: 4 Global Step: 189200 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:58:04,378-Speed 2627.21 samples/sec Loss 10.4538 LearningRate 0.0596 Epoch: 4 Global Step: 189210 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:58:08,283-Speed 2622.71 samples/sec Loss 10.6164 LearningRate 0.0596 Epoch: 4 Global Step: 189220 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:58:12,172-Speed 2633.75 samples/sec Loss 10.4241 LearningRate 0.0596 Epoch: 4 Global Step: 189230 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:58:16,063-Speed 2632.42 samples/sec Loss 10.4067 LearningRate 0.0596 Epoch: 4 Global Step: 189240 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:58:19,960-Speed 2628.45 samples/sec Loss 10.4888 LearningRate 0.0596 Epoch: 4 Global Step: 189250 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:58:23,861-Speed 2625.44 samples/sec Loss 10.5865 LearningRate 0.0596 Epoch: 4 Global Step: 189260 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:58:27,765-Speed 2623.24 samples/sec Loss 10.3425 LearningRate 0.0596 Epoch: 4 Global Step: 189270 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:58:31,674-Speed 2620.51 samples/sec Loss 10.4800 LearningRate 0.0596 Epoch: 4 Global Step: 189280 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:58:35,572-Speed 2627.70 samples/sec Loss 10.6627 LearningRate 0.0596 Epoch: 4 Global Step: 189290 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:58:39,458-Speed 2635.34 samples/sec Loss 10.6046 LearningRate 0.0596 Epoch: 4 Global Step: 189300 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:58:43,356-Speed 2627.70 samples/sec Loss 10.4624 LearningRate 0.0596 Epoch: 4 Global Step: 189310 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:58:47,266-Speed 2619.76 samples/sec Loss 10.4929 LearningRate 0.0596 Epoch: 4 Global Step: 189320 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:58:51,165-Speed 2626.81 samples/sec Loss 10.6571 LearningRate 0.0596 Epoch: 4 Global Step: 189330 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:58:55,065-Speed 2626.84 samples/sec Loss 10.5338 LearningRate 0.0596 Epoch: 4 Global Step: 189340 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:58:58,974-Speed 2619.79 samples/sec Loss 10.5571 LearningRate 0.0596 Epoch: 4 Global Step: 189350 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:59:02,873-Speed 2626.92 samples/sec Loss 10.5913 LearningRate 0.0596 Epoch: 4 Global Step: 189360 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:59:06,767-Speed 2630.30 samples/sec Loss 10.4842 LearningRate 0.0596 Epoch: 4 Global Step: 189370 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:59:10,662-Speed 2629.17 samples/sec Loss 10.6622 LearningRate 0.0596 Epoch: 4 Global Step: 189380 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:59:14,564-Speed 2625.22 samples/sec Loss 10.4648 LearningRate 0.0596 Epoch: 4 Global Step: 189390 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:59:18,466-Speed 2624.86 samples/sec Loss 10.3647 LearningRate 0.0596 Epoch: 4 Global Step: 189400 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:59:22,360-Speed 2630.70 samples/sec Loss 10.4286 LearningRate 0.0595 Epoch: 4 Global Step: 189410 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:59:26,255-Speed 2629.82 samples/sec Loss 10.4091 LearningRate 0.0595 Epoch: 4 Global Step: 189420 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 16:59:30,141-Speed 2635.61 samples/sec Loss 10.5437 LearningRate 0.0595 Epoch: 4 Global Step: 189430 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:59:34,049-Speed 2621.08 samples/sec Loss 10.5333 LearningRate 0.0595 Epoch: 4 Global Step: 189440 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:59:37,949-Speed 2625.94 samples/sec Loss 10.5275 LearningRate 0.0595 Epoch: 4 Global Step: 189450 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:59:41,851-Speed 2624.86 samples/sec Loss 10.4434 LearningRate 0.0595 Epoch: 4 Global Step: 189460 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:59:45,758-Speed 2621.89 samples/sec Loss 10.5698 LearningRate 0.0595 Epoch: 4 Global Step: 189470 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:59:49,646-Speed 2633.61 samples/sec Loss 10.3985 LearningRate 0.0595 Epoch: 4 Global Step: 189480 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:59:53,541-Speed 2630.70 samples/sec Loss 10.3998 LearningRate 0.0595 Epoch: 4 Global Step: 189490 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 16:59:57,434-Speed 2630.44 samples/sec Loss 10.4035 LearningRate 0.0595 Epoch: 4 Global Step: 189500 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:00:01,325-Speed 2632.59 samples/sec Loss 10.5844 LearningRate 0.0595 Epoch: 4 Global Step: 189510 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:00:05,217-Speed 2631.44 samples/sec Loss 10.4401 LearningRate 0.0595 Epoch: 4 Global Step: 189520 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:00:09,106-Speed 2633.85 samples/sec Loss 10.5408 LearningRate 0.0595 Epoch: 4 Global Step: 189530 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:00:12,999-Speed 2630.59 samples/sec Loss 10.5481 LearningRate 0.0595 Epoch: 4 Global Step: 189540 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:00:16,890-Speed 2632.43 samples/sec Loss 10.4053 LearningRate 0.0595 Epoch: 4 Global Step: 189550 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:00:20,788-Speed 2627.55 samples/sec Loss 10.3970 LearningRate 0.0595 Epoch: 4 Global Step: 189560 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:00:24,679-Speed 2632.20 samples/sec Loss 10.5072 LearningRate 0.0595 Epoch: 4 Global Step: 189570 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:00:28,571-Speed 2631.89 samples/sec Loss 10.4413 LearningRate 0.0595 Epoch: 4 Global Step: 189580 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:00:32,483-Speed 2618.35 samples/sec Loss 10.5960 LearningRate 0.0595 Epoch: 4 Global Step: 189590 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:00:36,371-Speed 2634.28 samples/sec Loss 10.5894 LearningRate 0.0595 Epoch: 4 Global Step: 189600 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:00:40,264-Speed 2631.02 samples/sec Loss 10.5283 LearningRate 0.0595 Epoch: 4 Global Step: 189610 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:00:44,156-Speed 2631.75 samples/sec Loss 10.5360 LearningRate 0.0595 Epoch: 4 Global Step: 189620 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:00:48,045-Speed 2633.06 samples/sec Loss 10.5062 LearningRate 0.0595 Epoch: 4 Global Step: 189630 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:00:51,944-Speed 2627.90 samples/sec Loss 10.4863 LearningRate 0.0595 Epoch: 4 Global Step: 189640 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:00:55,840-Speed 2629.01 samples/sec Loss 10.5333 LearningRate 0.0595 Epoch: 4 Global Step: 189650 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:00:59,764-Speed 2610.38 samples/sec Loss 10.3506 LearningRate 0.0595 Epoch: 4 Global Step: 189660 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:01:03,664-Speed 2625.96 samples/sec Loss 10.4475 LearningRate 0.0595 Epoch: 4 Global Step: 189670 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:01:07,553-Speed 2633.95 samples/sec Loss 10.3458 LearningRate 0.0595 Epoch: 4 Global Step: 189680 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:01:11,446-Speed 2631.32 samples/sec Loss 10.5997 LearningRate 0.0595 Epoch: 4 Global Step: 189690 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:01:15,353-Speed 2621.25 samples/sec Loss 10.3963 LearningRate 0.0595 Epoch: 4 Global Step: 189700 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:01:19,241-Speed 2634.32 samples/sec Loss 10.5665 LearningRate 0.0595 Epoch: 4 Global Step: 189710 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:01:23,139-Speed 2628.01 samples/sec Loss 10.3996 LearningRate 0.0595 Epoch: 4 Global Step: 189720 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:01:27,035-Speed 2629.00 samples/sec Loss 10.4833 LearningRate 0.0595 Epoch: 4 Global Step: 189730 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:01:30,951-Speed 2615.63 samples/sec Loss 10.4912 LearningRate 0.0595 Epoch: 4 Global Step: 189740 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:01:35,014-Speed 2520.77 samples/sec Loss 10.5216 LearningRate 0.0595 Epoch: 4 Global Step: 189750 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:01:38,984-Speed 2580.23 samples/sec Loss 10.4604 LearningRate 0.0595 Epoch: 4 Global Step: 189760 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:01:42,877-Speed 2630.26 samples/sec Loss 10.5101 LearningRate 0.0595 Epoch: 4 Global Step: 189770 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:01:46,778-Speed 2625.83 samples/sec Loss 10.5915 LearningRate 0.0595 Epoch: 4 Global Step: 189780 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:01:50,669-Speed 2632.75 samples/sec Loss 10.5055 LearningRate 0.0595 Epoch: 4 Global Step: 189790 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:01:54,562-Speed 2631.08 samples/sec Loss 10.4294 LearningRate 0.0595 Epoch: 4 Global Step: 189800 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:01:58,460-Speed 2627.90 samples/sec Loss 10.6250 LearningRate 0.0595 Epoch: 4 Global Step: 189810 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:02:02,354-Speed 2630.20 samples/sec Loss 10.4573 LearningRate 0.0595 Epoch: 4 Global Step: 189820 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:02:06,247-Speed 2631.27 samples/sec Loss 10.4817 LearningRate 0.0595 Epoch: 4 Global Step: 189830 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:02:10,143-Speed 2628.67 samples/sec Loss 10.4053 LearningRate 0.0595 Epoch: 4 Global Step: 189840 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:02:14,038-Speed 2629.20 samples/sec Loss 10.4245 LearningRate 0.0595 Epoch: 4 Global Step: 189850 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:02:17,928-Speed 2633.18 samples/sec Loss 10.5061 LearningRate 0.0595 Epoch: 4 Global Step: 189860 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:02:21,826-Speed 2628.12 samples/sec Loss 10.4600 LearningRate 0.0595 Epoch: 4 Global Step: 189870 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:02:25,738-Speed 2618.05 samples/sec Loss 10.2931 LearningRate 0.0595 Epoch: 4 Global Step: 189880 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:02:29,630-Speed 2631.36 samples/sec Loss 10.4296 LearningRate 0.0595 Epoch: 4 Global Step: 189890 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:02:33,533-Speed 2624.90 samples/sec Loss 10.3907 LearningRate 0.0595 Epoch: 4 Global Step: 189900 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:02:37,431-Speed 2627.71 samples/sec Loss 10.5253 LearningRate 0.0595 Epoch: 4 Global Step: 189910 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:02:41,326-Speed 2629.89 samples/sec Loss 10.5326 LearningRate 0.0595 Epoch: 4 Global Step: 189920 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:02:45,238-Speed 2618.71 samples/sec Loss 10.5214 LearningRate 0.0595 Epoch: 4 Global Step: 189930 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:02:49,119-Speed 2639.13 samples/sec Loss 10.5698 LearningRate 0.0595 Epoch: 4 Global Step: 189940 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:02:53,031-Speed 2618.50 samples/sec Loss 10.5068 LearningRate 0.0594 Epoch: 4 Global Step: 189950 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:02:56,931-Speed 2626.50 samples/sec Loss 10.5631 LearningRate 0.0594 Epoch: 4 Global Step: 189960 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:03:00,825-Speed 2630.47 samples/sec Loss 10.3840 LearningRate 0.0594 Epoch: 4 Global Step: 189970 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:03:04,719-Speed 2629.70 samples/sec Loss 10.4495 LearningRate 0.0594 Epoch: 4 Global Step: 189980 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:03:08,614-Speed 2630.54 samples/sec Loss 10.4605 LearningRate 0.0594 Epoch: 4 Global Step: 189990 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:03:12,541-Speed 2608.47 samples/sec Loss 10.5089 LearningRate 0.0594 Epoch: 4 Global Step: 190000 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:03:55,704-[lfw][190000]XNorm: 23.847233
Training: 2022-04-13 17:03:55,704-[lfw][190000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-04-13 17:03:55,705-[lfw][190000]Accuracy-Highest: 0.99783
Training: 2022-04-13 17:04:46,102-[cfp_fp][190000]XNorm: 21.677466
Training: 2022-04-13 17:04:46,103-[cfp_fp][190000]Accuracy-Flip: 0.98071+-0.00804
Training: 2022-04-13 17:04:46,104-[cfp_fp][190000]Accuracy-Highest: 0.98100
Training: 2022-04-13 17:05:28,829-[agedb_30][190000]XNorm: 23.758015
Training: 2022-04-13 17:05:28,830-[agedb_30][190000]Accuracy-Flip: 0.96967+-0.00785
Training: 2022-04-13 17:05:28,831-[agedb_30][190000]Accuracy-Highest: 0.97150
Training: 2022-04-13 17:05:32,703-Speed 73.06 samples/sec Loss 10.4995 LearningRate 0.0594 Epoch: 4 Global Step: 190010 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:05:36,530-Speed 2676.24 samples/sec Loss 10.4946 LearningRate 0.0594 Epoch: 4 Global Step: 190020 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:05:40,400-Speed 2647.36 samples/sec Loss 10.6074 LearningRate 0.0594 Epoch: 4 Global Step: 190030 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:05:44,283-Speed 2637.17 samples/sec Loss 10.4411 LearningRate 0.0594 Epoch: 4 Global Step: 190040 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:05:48,284-Speed 2561.59 samples/sec Loss 10.4263 LearningRate 0.0594 Epoch: 4 Global Step: 190050 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:05:52,156-Speed 2645.62 samples/sec Loss 11.1960 LearningRate 0.0594 Epoch: 4 Global Step: 190060 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 17:05:56,037-Speed 2639.45 samples/sec Loss 11.1806 LearningRate 0.0594 Epoch: 4 Global Step: 190070 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 17:05:59,916-Speed 2639.78 samples/sec Loss 10.7807 LearningRate 0.0594 Epoch: 4 Global Step: 190080 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 17:06:03,803-Speed 2635.90 samples/sec Loss 10.5264 LearningRate 0.0594 Epoch: 4 Global Step: 190090 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 17:06:07,696-Speed 2630.58 samples/sec Loss 10.5076 LearningRate 0.0594 Epoch: 4 Global Step: 190100 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 17:06:11,579-Speed 2638.17 samples/sec Loss 10.4712 LearningRate 0.0594 Epoch: 4 Global Step: 190110 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 17:06:15,461-Speed 2638.00 samples/sec Loss 10.5662 LearningRate 0.0594 Epoch: 4 Global Step: 190120 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 17:06:19,356-Speed 2629.79 samples/sec Loss 10.5367 LearningRate 0.0594 Epoch: 4 Global Step: 190130 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 17:06:23,243-Speed 2635.81 samples/sec Loss 10.8120 LearningRate 0.0594 Epoch: 4 Global Step: 190140 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 17:06:27,129-Speed 2635.55 samples/sec Loss 10.5976 LearningRate 0.0594 Epoch: 4 Global Step: 190150 Fp16 Grad Scale: 16384 Required: 72 hours
Training: 2022-04-13 17:06:31,018-Speed 2634.12 samples/sec Loss 10.5918 LearningRate 0.0594 Epoch: 4 Global Step: 190160 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:06:34,913-Speed 2629.08 samples/sec Loss 10.4924 LearningRate 0.0594 Epoch: 4 Global Step: 190170 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:06:38,806-Speed 2631.10 samples/sec Loss 10.5546 LearningRate 0.0594 Epoch: 4 Global Step: 190180 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:06:42,699-Speed 2631.14 samples/sec Loss 10.5572 LearningRate 0.0594 Epoch: 4 Global Step: 190190 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:06:46,603-Speed 2623.29 samples/sec Loss 10.5023 LearningRate 0.0594 Epoch: 4 Global Step: 190200 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:06:50,497-Speed 2630.54 samples/sec Loss 10.5750 LearningRate 0.0594 Epoch: 4 Global Step: 190210 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:06:54,392-Speed 2629.76 samples/sec Loss 10.6720 LearningRate 0.0594 Epoch: 4 Global Step: 190220 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:06:58,289-Speed 2628.73 samples/sec Loss 10.5994 LearningRate 0.0594 Epoch: 4 Global Step: 190230 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:07:02,192-Speed 2624.00 samples/sec Loss 10.7873 LearningRate 0.0594 Epoch: 4 Global Step: 190240 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:07:06,100-Speed 2621.39 samples/sec Loss 10.5660 LearningRate 0.0594 Epoch: 4 Global Step: 190250 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:07:10,086-Speed 2569.48 samples/sec Loss 10.4971 LearningRate 0.0594 Epoch: 4 Global Step: 190260 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:07:14,155-Speed 2517.45 samples/sec Loss 10.5623 LearningRate 0.0594 Epoch: 4 Global Step: 190270 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:07:18,108-Speed 2590.96 samples/sec Loss 10.5113 LearningRate 0.0594 Epoch: 4 Global Step: 190280 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:07:22,212-Speed 2495.77 samples/sec Loss 10.5859 LearningRate 0.0594 Epoch: 4 Global Step: 190290 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:07:26,141-Speed 2607.52 samples/sec Loss 10.7468 LearningRate 0.0594 Epoch: 4 Global Step: 190300 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:07:30,070-Speed 2606.92 samples/sec Loss 10.3825 LearningRate 0.0594 Epoch: 4 Global Step: 190310 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:07:33,961-Speed 2631.87 samples/sec Loss 10.5035 LearningRate 0.0594 Epoch: 4 Global Step: 190320 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:07:37,857-Speed 2628.84 samples/sec Loss 10.6838 LearningRate 0.0594 Epoch: 4 Global Step: 190330 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:07:41,752-Speed 2630.30 samples/sec Loss 10.5093 LearningRate 0.0594 Epoch: 4 Global Step: 190340 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:07:45,645-Speed 2630.71 samples/sec Loss 10.6347 LearningRate 0.0594 Epoch: 4 Global Step: 190350 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:07:49,665-Speed 2548.36 samples/sec Loss 10.5058 LearningRate 0.0594 Epoch: 4 Global Step: 190360 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:07:53,644-Speed 2573.94 samples/sec Loss 10.4967 LearningRate 0.0594 Epoch: 4 Global Step: 190370 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:07:57,537-Speed 2631.77 samples/sec Loss 10.4387 LearningRate 0.0594 Epoch: 4 Global Step: 190380 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:08:01,433-Speed 2628.93 samples/sec Loss 10.6314 LearningRate 0.0594 Epoch: 4 Global Step: 190390 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:08:05,328-Speed 2628.94 samples/sec Loss 10.5006 LearningRate 0.0594 Epoch: 4 Global Step: 190400 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:08:09,235-Speed 2621.53 samples/sec Loss 10.4913 LearningRate 0.0594 Epoch: 4 Global Step: 190410 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:08:13,162-Speed 2608.71 samples/sec Loss 10.4664 LearningRate 0.0594 Epoch: 4 Global Step: 190420 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:08:17,061-Speed 2627.18 samples/sec Loss 10.6770 LearningRate 0.0594 Epoch: 4 Global Step: 190430 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:08:20,984-Speed 2610.44 samples/sec Loss 10.5858 LearningRate 0.0594 Epoch: 4 Global Step: 190440 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:08:24,886-Speed 2625.18 samples/sec Loss 10.4522 LearningRate 0.0594 Epoch: 4 Global Step: 190450 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:08:28,785-Speed 2626.46 samples/sec Loss 10.5944 LearningRate 0.0594 Epoch: 4 Global Step: 190460 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:08:32,686-Speed 2625.92 samples/sec Loss 10.4850 LearningRate 0.0594 Epoch: 4 Global Step: 190470 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:08:36,584-Speed 2627.91 samples/sec Loss 10.3938 LearningRate 0.0594 Epoch: 4 Global Step: 190480 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:08:40,478-Speed 2629.81 samples/sec Loss 10.5246 LearningRate 0.0593 Epoch: 4 Global Step: 190490 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:08:44,389-Speed 2618.67 samples/sec Loss 10.4154 LearningRate 0.0593 Epoch: 4 Global Step: 190500 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:08:48,286-Speed 2628.94 samples/sec Loss 10.3993 LearningRate 0.0593 Epoch: 4 Global Step: 190510 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:08:52,186-Speed 2626.36 samples/sec Loss 10.5274 LearningRate 0.0593 Epoch: 4 Global Step: 190520 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:08:56,109-Speed 2610.79 samples/sec Loss 10.3678 LearningRate 0.0593 Epoch: 4 Global Step: 190530 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:09:00,071-Speed 2584.91 samples/sec Loss 10.5340 LearningRate 0.0593 Epoch: 4 Global Step: 190540 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:09:03,965-Speed 2630.23 samples/sec Loss 10.6263 LearningRate 0.0593 Epoch: 4 Global Step: 190550 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:09:07,871-Speed 2622.52 samples/sec Loss 10.4625 LearningRate 0.0593 Epoch: 4 Global Step: 190560 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:09:11,768-Speed 2627.69 samples/sec Loss 10.4980 LearningRate 0.0593 Epoch: 4 Global Step: 190570 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:09:15,670-Speed 2625.32 samples/sec Loss 10.4501 LearningRate 0.0593 Epoch: 4 Global Step: 190580 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:09:19,571-Speed 2625.44 samples/sec Loss 10.5338 LearningRate 0.0593 Epoch: 4 Global Step: 190590 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:09:23,470-Speed 2626.97 samples/sec Loss 10.6639 LearningRate 0.0593 Epoch: 4 Global Step: 190600 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:09:27,370-Speed 2625.98 samples/sec Loss 10.4374 LearningRate 0.0593 Epoch: 4 Global Step: 190610 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:09:31,266-Speed 2629.25 samples/sec Loss 10.4807 LearningRate 0.0593 Epoch: 4 Global Step: 190620 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:09:35,203-Speed 2601.37 samples/sec Loss 10.5609 LearningRate 0.0593 Epoch: 4 Global Step: 190630 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:09:39,300-Speed 2499.93 samples/sec Loss 10.4715 LearningRate 0.0593 Epoch: 4 Global Step: 190640 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:09:43,224-Speed 2609.84 samples/sec Loss 10.5445 LearningRate 0.0593 Epoch: 4 Global Step: 190650 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:09:47,123-Speed 2627.84 samples/sec Loss 10.4267 LearningRate 0.0593 Epoch: 4 Global Step: 190660 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:09:51,023-Speed 2626.33 samples/sec Loss 10.5045 LearningRate 0.0593 Epoch: 4 Global Step: 190670 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:09:54,918-Speed 2629.38 samples/sec Loss 10.5177 LearningRate 0.0593 Epoch: 4 Global Step: 190680 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:09:58,812-Speed 2630.59 samples/sec Loss 10.3624 LearningRate 0.0593 Epoch: 4 Global Step: 190690 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:10:02,711-Speed 2627.67 samples/sec Loss 10.2636 LearningRate 0.0593 Epoch: 4 Global Step: 190700 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:10:06,604-Speed 2630.48 samples/sec Loss 10.5871 LearningRate 0.0593 Epoch: 4 Global Step: 190710 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:10:10,483-Speed 2640.28 samples/sec Loss 10.4481 LearningRate 0.0593 Epoch: 4 Global Step: 190720 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:10:14,381-Speed 2627.77 samples/sec Loss 10.5135 LearningRate 0.0593 Epoch: 4 Global Step: 190730 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:10:18,282-Speed 2626.02 samples/sec Loss 10.5842 LearningRate 0.0593 Epoch: 4 Global Step: 190740 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:10:22,183-Speed 2625.72 samples/sec Loss 10.5903 LearningRate 0.0593 Epoch: 4 Global Step: 190750 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:10:26,080-Speed 2627.71 samples/sec Loss 10.5571 LearningRate 0.0593 Epoch: 4 Global Step: 190760 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:10:29,980-Speed 2626.37 samples/sec Loss 10.4954 LearningRate 0.0593 Epoch: 4 Global Step: 190770 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:10:33,876-Speed 2628.71 samples/sec Loss 10.5377 LearningRate 0.0593 Epoch: 4 Global Step: 190780 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:10:37,774-Speed 2628.04 samples/sec Loss 10.5018 LearningRate 0.0593 Epoch: 4 Global Step: 190790 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:10:41,681-Speed 2621.16 samples/sec Loss 10.4894 LearningRate 0.0593 Epoch: 4 Global Step: 190800 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:10:45,579-Speed 2627.52 samples/sec Loss 10.4479 LearningRate 0.0593 Epoch: 4 Global Step: 190810 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:10:49,520-Speed 2598.92 samples/sec Loss 10.5496 LearningRate 0.0593 Epoch: 4 Global Step: 190820 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:10:53,419-Speed 2627.27 samples/sec Loss 10.5843 LearningRate 0.0593 Epoch: 4 Global Step: 190830 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:10:57,302-Speed 2637.94 samples/sec Loss 10.5491 LearningRate 0.0593 Epoch: 4 Global Step: 190840 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:11:01,200-Speed 2627.45 samples/sec Loss 10.3084 LearningRate 0.0593 Epoch: 4 Global Step: 190850 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:11:05,109-Speed 2620.11 samples/sec Loss 10.5249 LearningRate 0.0593 Epoch: 4 Global Step: 190860 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:11:09,181-Speed 2515.10 samples/sec Loss 10.4921 LearningRate 0.0593 Epoch: 4 Global Step: 190870 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:11:13,251-Speed 2516.85 samples/sec Loss 10.5815 LearningRate 0.0593 Epoch: 4 Global Step: 190880 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:11:17,207-Speed 2588.41 samples/sec Loss 10.5238 LearningRate 0.0593 Epoch: 4 Global Step: 190890 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:11:21,108-Speed 2625.71 samples/sec Loss 10.3851 LearningRate 0.0593 Epoch: 4 Global Step: 190900 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:11:25,020-Speed 2618.86 samples/sec Loss 10.5488 LearningRate 0.0593 Epoch: 4 Global Step: 190910 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:11:28,921-Speed 2625.90 samples/sec Loss 10.4772 LearningRate 0.0593 Epoch: 4 Global Step: 190920 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:11:32,820-Speed 2626.80 samples/sec Loss 10.4737 LearningRate 0.0593 Epoch: 4 Global Step: 190930 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:11:36,716-Speed 2628.76 samples/sec Loss 10.5006 LearningRate 0.0593 Epoch: 4 Global Step: 190940 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:11:40,612-Speed 2628.82 samples/sec Loss 10.5006 LearningRate 0.0593 Epoch: 4 Global Step: 190950 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:11:44,512-Speed 2625.99 samples/sec Loss 10.4179 LearningRate 0.0593 Epoch: 4 Global Step: 190960 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:11:48,411-Speed 2627.28 samples/sec Loss 10.4668 LearningRate 0.0593 Epoch: 4 Global Step: 190970 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:11:52,306-Speed 2629.08 samples/sec Loss 10.4917 LearningRate 0.0593 Epoch: 4 Global Step: 190980 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:11:56,210-Speed 2623.77 samples/sec Loss 10.5002 LearningRate 0.0593 Epoch: 4 Global Step: 190990 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:12:00,111-Speed 2625.60 samples/sec Loss 10.6046 LearningRate 0.0593 Epoch: 4 Global Step: 191000 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:12:04,008-Speed 2628.17 samples/sec Loss 10.5510 LearningRate 0.0593 Epoch: 4 Global Step: 191010 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:12:07,904-Speed 2629.30 samples/sec Loss 10.5355 LearningRate 0.0592 Epoch: 4 Global Step: 191020 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:12:11,798-Speed 2630.17 samples/sec Loss 10.5403 LearningRate 0.0592 Epoch: 4 Global Step: 191030 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:12:15,677-Speed 2640.32 samples/sec Loss 10.4950 LearningRate 0.0592 Epoch: 4 Global Step: 191040 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:12:19,561-Speed 2637.46 samples/sec Loss 10.3723 LearningRate 0.0592 Epoch: 4 Global Step: 191050 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:12:23,437-Speed 2642.42 samples/sec Loss 10.5365 LearningRate 0.0592 Epoch: 4 Global Step: 191060 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:12:27,343-Speed 2622.61 samples/sec Loss 11.3496 LearningRate 0.0592 Epoch: 4 Global Step: 191070 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:12:31,237-Speed 2629.68 samples/sec Loss 10.9315 LearningRate 0.0592 Epoch: 4 Global Step: 191080 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:12:35,137-Speed 2626.16 samples/sec Loss 10.6548 LearningRate 0.0592 Epoch: 4 Global Step: 191090 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:12:39,034-Speed 2628.75 samples/sec Loss 10.7133 LearningRate 0.0592 Epoch: 4 Global Step: 191100 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:12:42,926-Speed 2632.10 samples/sec Loss 10.6835 LearningRate 0.0592 Epoch: 4 Global Step: 191110 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:12:46,818-Speed 2631.02 samples/sec Loss 10.6263 LearningRate 0.0592 Epoch: 4 Global Step: 191120 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:12:50,719-Speed 2625.58 samples/sec Loss 10.5824 LearningRate 0.0592 Epoch: 4 Global Step: 191130 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:12:54,637-Speed 2614.18 samples/sec Loss 10.3810 LearningRate 0.0592 Epoch: 4 Global Step: 191140 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:12:58,548-Speed 2618.96 samples/sec Loss 10.4915 LearningRate 0.0592 Epoch: 4 Global Step: 191150 Fp16 Grad Scale: 32768 Required: 72 hours
Training: 2022-04-13 17:13:02,440-Speed 2631.40 samples/sec Loss 10.5862 LearningRate 0.0592 Epoch: 4 Global Step: 191160 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:13:06,337-Speed 2628.25 samples/sec Loss 10.5384 LearningRate 0.0592 Epoch: 4 Global Step: 191170 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:13:10,238-Speed 2625.37 samples/sec Loss 10.7065 LearningRate 0.0592 Epoch: 4 Global Step: 191180 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:13:14,132-Speed 2630.92 samples/sec Loss 10.4927 LearningRate 0.0592 Epoch: 4 Global Step: 191190 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:13:18,035-Speed 2624.52 samples/sec Loss 10.4100 LearningRate 0.0592 Epoch: 4 Global Step: 191200 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:13:21,934-Speed 2626.87 samples/sec Loss 10.6214 LearningRate 0.0592 Epoch: 4 Global Step: 191210 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:13:25,840-Speed 2623.13 samples/sec Loss 10.6271 LearningRate 0.0592 Epoch: 4 Global Step: 191220 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:13:29,733-Speed 2631.02 samples/sec Loss 10.5808 LearningRate 0.0592 Epoch: 4 Global Step: 191230 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:13:33,626-Speed 2630.29 samples/sec Loss 10.4136 LearningRate 0.0592 Epoch: 4 Global Step: 191240 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:13:37,545-Speed 2613.85 samples/sec Loss 10.4926 LearningRate 0.0592 Epoch: 4 Global Step: 191250 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:13:41,441-Speed 2629.60 samples/sec Loss 10.4247 LearningRate 0.0592 Epoch: 4 Global Step: 191260 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:13:45,349-Speed 2620.51 samples/sec Loss 10.3414 LearningRate 0.0592 Epoch: 4 Global Step: 191270 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:13:49,264-Speed 2616.28 samples/sec Loss 10.4203 LearningRate 0.0592 Epoch: 4 Global Step: 191280 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:13:53,184-Speed 2612.84 samples/sec Loss 10.5427 LearningRate 0.0592 Epoch: 4 Global Step: 191290 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:13:57,116-Speed 2605.52 samples/sec Loss 10.7028 LearningRate 0.0592 Epoch: 4 Global Step: 191300 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:14:01,046-Speed 2605.74 samples/sec Loss 10.6545 LearningRate 0.0592 Epoch: 4 Global Step: 191310 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:14:04,956-Speed 2619.79 samples/sec Loss 10.4746 LearningRate 0.0592 Epoch: 4 Global Step: 191320 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:14:08,858-Speed 2624.41 samples/sec Loss 10.5597 LearningRate 0.0592 Epoch: 4 Global Step: 191330 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:14:12,763-Speed 2623.83 samples/sec Loss 10.5071 LearningRate 0.0592 Epoch: 4 Global Step: 191340 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:14:16,667-Speed 2623.10 samples/sec Loss 10.6524 LearningRate 0.0592 Epoch: 4 Global Step: 191350 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:14:20,571-Speed 2623.30 samples/sec Loss 10.5086 LearningRate 0.0592 Epoch: 4 Global Step: 191360 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:14:24,477-Speed 2622.68 samples/sec Loss 10.7572 LearningRate 0.0592 Epoch: 4 Global Step: 191370 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:14:28,385-Speed 2620.84 samples/sec Loss 10.6984 LearningRate 0.0592 Epoch: 4 Global Step: 191380 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:14:32,283-Speed 2628.06 samples/sec Loss 10.5030 LearningRate 0.0592 Epoch: 4 Global Step: 191390 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:14:36,181-Speed 2628.12 samples/sec Loss 10.5052 LearningRate 0.0592 Epoch: 4 Global Step: 191400 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:14:40,075-Speed 2630.12 samples/sec Loss 10.5307 LearningRate 0.0592 Epoch: 4 Global Step: 191410 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:14:43,971-Speed 2628.96 samples/sec Loss 10.4809 LearningRate 0.0592 Epoch: 4 Global Step: 191420 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:14:47,879-Speed 2620.79 samples/sec Loss 10.4106 LearningRate 0.0592 Epoch: 4 Global Step: 191430 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:14:51,943-Speed 2520.99 samples/sec Loss 10.5279 LearningRate 0.0592 Epoch: 4 Global Step: 191440 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:14:55,914-Speed 2579.15 samples/sec Loss 10.6795 LearningRate 0.0592 Epoch: 4 Global Step: 191450 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:14:59,839-Speed 2609.48 samples/sec Loss 10.6149 LearningRate 0.0592 Epoch: 4 Global Step: 191460 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:15:03,733-Speed 2630.78 samples/sec Loss 10.4043 LearningRate 0.0592 Epoch: 4 Global Step: 191470 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:15:07,630-Speed 2628.40 samples/sec Loss 10.5212 LearningRate 0.0592 Epoch: 4 Global Step: 191480 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:15:11,536-Speed 2621.76 samples/sec Loss 10.4960 LearningRate 0.0592 Epoch: 4 Global Step: 191490 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:15:15,435-Speed 2627.50 samples/sec Loss 10.4771 LearningRate 0.0592 Epoch: 4 Global Step: 191500 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:15:19,330-Speed 2629.27 samples/sec Loss 10.4369 LearningRate 0.0592 Epoch: 4 Global Step: 191510 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:15:23,223-Speed 2631.54 samples/sec Loss 10.4728 LearningRate 0.0592 Epoch: 4 Global Step: 191520 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:15:27,118-Speed 2629.04 samples/sec Loss 10.5501 LearningRate 0.0592 Epoch: 4 Global Step: 191530 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:15:31,021-Speed 2624.53 samples/sec Loss 10.4331 LearningRate 0.0592 Epoch: 4 Global Step: 191540 Fp16 Grad Scale: 65536 Required: 72 hours
Training: 2022-04-13 17:15:34,916-Speed 2629.58 samples/sec Loss 10.4178 LearningRate 0.0592 Epoch: 4 Global Step: 191550 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:15:38,814-Speed 2628.41 samples/sec Loss 10.4856 LearningRate 0.0591 Epoch: 4 Global Step: 191560 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:15:42,711-Speed 2628.23 samples/sec Loss 10.4312 LearningRate 0.0591 Epoch: 4 Global Step: 191570 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:15:46,602-Speed 2631.82 samples/sec Loss 10.5477 LearningRate 0.0591 Epoch: 4 Global Step: 191580 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:15:50,496-Speed 2630.52 samples/sec Loss 10.4769 LearningRate 0.0591 Epoch: 4 Global Step: 191590 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:15:54,391-Speed 2629.35 samples/sec Loss 10.4061 LearningRate 0.0591 Epoch: 4 Global Step: 191600 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:15:58,292-Speed 2625.87 samples/sec Loss 10.5409 LearningRate 0.0591 Epoch: 4 Global Step: 191610 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:16:02,190-Speed 2627.20 samples/sec Loss 10.4956 LearningRate 0.0591 Epoch: 4 Global Step: 191620 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:16:06,091-Speed 2627.01 samples/sec Loss 10.5650 LearningRate 0.0591 Epoch: 4 Global Step: 191630 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:16:09,993-Speed 2624.91 samples/sec Loss 10.5025 LearningRate 0.0591 Epoch: 4 Global Step: 191640 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:16:13,903-Speed 2618.97 samples/sec Loss 10.4961 LearningRate 0.0591 Epoch: 4 Global Step: 191650 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:16:17,974-Speed 2516.30 samples/sec Loss 10.5350 LearningRate 0.0591 Epoch: 4 Global Step: 191660 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:16:21,937-Speed 2585.15 samples/sec Loss 10.5844 LearningRate 0.0591 Epoch: 4 Global Step: 191670 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:16:25,832-Speed 2629.44 samples/sec Loss 10.6002 LearningRate 0.0591 Epoch: 4 Global Step: 191680 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:16:29,728-Speed 2628.65 samples/sec Loss 10.4022 LearningRate 0.0591 Epoch: 4 Global Step: 191690 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:16:33,634-Speed 2622.96 samples/sec Loss 10.5772 LearningRate 0.0591 Epoch: 4 Global Step: 191700 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:16:37,509-Speed 2643.47 samples/sec Loss 10.5031 LearningRate 0.0591 Epoch: 4 Global Step: 191710 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:16:41,408-Speed 2626.63 samples/sec Loss 10.4935 LearningRate 0.0591 Epoch: 4 Global Step: 191720 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:16:45,302-Speed 2630.61 samples/sec Loss 10.6202 LearningRate 0.0591 Epoch: 4 Global Step: 191730 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:16:49,197-Speed 2629.43 samples/sec Loss 10.5857 LearningRate 0.0591 Epoch: 4 Global Step: 191740 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:16:53,252-Speed 2526.24 samples/sec Loss 10.5287 LearningRate 0.0591 Epoch: 4 Global Step: 191750 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:16:57,148-Speed 2628.89 samples/sec Loss 10.4343 LearningRate 0.0591 Epoch: 4 Global Step: 191760 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:17:01,044-Speed 2629.00 samples/sec Loss 10.4776 LearningRate 0.0591 Epoch: 4 Global Step: 191770 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:17:04,950-Speed 2621.85 samples/sec Loss 10.4395 LearningRate 0.0591 Epoch: 4 Global Step: 191780 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:17:08,881-Speed 2606.15 samples/sec Loss 10.5774 LearningRate 0.0591 Epoch: 4 Global Step: 191790 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:17:12,775-Speed 2630.39 samples/sec Loss 10.6506 LearningRate 0.0591 Epoch: 4 Global Step: 191800 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:17:16,681-Speed 2622.30 samples/sec Loss 10.4952 LearningRate 0.0591 Epoch: 4 Global Step: 191810 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:17:20,599-Speed 2614.37 samples/sec Loss 10.4556 LearningRate 0.0591 Epoch: 4 Global Step: 191820 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:17:24,499-Speed 2626.31 samples/sec Loss 10.5630 LearningRate 0.0591 Epoch: 4 Global Step: 191830 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:17:28,393-Speed 2630.48 samples/sec Loss 10.5401 LearningRate 0.0591 Epoch: 4 Global Step: 191840 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:17:32,333-Speed 2599.29 samples/sec Loss 10.4607 LearningRate 0.0591 Epoch: 4 Global Step: 191850 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:17:36,274-Speed 2599.15 samples/sec Loss 10.5014 LearningRate 0.0591 Epoch: 4 Global Step: 191860 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:17:40,176-Speed 2625.25 samples/sec Loss 10.4124 LearningRate 0.0591 Epoch: 4 Global Step: 191870 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:17:44,065-Speed 2633.52 samples/sec Loss 10.4306 LearningRate 0.0591 Epoch: 4 Global Step: 191880 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:17:47,940-Speed 2642.96 samples/sec Loss 10.5148 LearningRate 0.0591 Epoch: 4 Global Step: 191890 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:17:51,863-Speed 2611.58 samples/sec Loss 10.4844 LearningRate 0.0591 Epoch: 4 Global Step: 191900 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:17:55,757-Speed 2630.14 samples/sec Loss 10.5163 LearningRate 0.0591 Epoch: 4 Global Step: 191910 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:17:59,655-Speed 2628.37 samples/sec Loss 10.4989 LearningRate 0.0591 Epoch: 4 Global Step: 191920 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:18:03,549-Speed 2630.22 samples/sec Loss 10.6097 LearningRate 0.0591 Epoch: 4 Global Step: 191930 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:18:07,482-Speed 2604.04 samples/sec Loss 10.5250 LearningRate 0.0591 Epoch: 4 Global Step: 191940 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:18:11,378-Speed 2628.87 samples/sec Loss 10.6141 LearningRate 0.0591 Epoch: 4 Global Step: 191950 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:18:15,279-Speed 2626.17 samples/sec Loss 10.3741 LearningRate 0.0591 Epoch: 4 Global Step: 191960 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:18:19,172-Speed 2631.58 samples/sec Loss 10.5229 LearningRate 0.0591 Epoch: 4 Global Step: 191970 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:18:23,070-Speed 2627.41 samples/sec Loss 10.4623 LearningRate 0.0591 Epoch: 4 Global Step: 191980 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:18:26,969-Speed 2627.45 samples/sec Loss 10.4178 LearningRate 0.0591 Epoch: 4 Global Step: 191990 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:18:30,867-Speed 2627.45 samples/sec Loss 10.4474 LearningRate 0.0591 Epoch: 4 Global Step: 192000 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:18:34,770-Speed 2624.37 samples/sec Loss 10.3882 LearningRate 0.0591 Epoch: 4 Global Step: 192010 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:18:38,670-Speed 2626.10 samples/sec Loss 10.5311 LearningRate 0.0591 Epoch: 4 Global Step: 192020 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:18:42,565-Speed 2629.76 samples/sec Loss 10.4435 LearningRate 0.0591 Epoch: 4 Global Step: 192030 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:18:46,465-Speed 2626.38 samples/sec Loss 10.6205 LearningRate 0.0591 Epoch: 4 Global Step: 192040 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:18:50,376-Speed 2618.83 samples/sec Loss 10.4176 LearningRate 0.0591 Epoch: 4 Global Step: 192050 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:18:54,280-Speed 2623.28 samples/sec Loss 10.5343 LearningRate 0.0591 Epoch: 4 Global Step: 192060 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:18:58,155-Speed 2643.84 samples/sec Loss 10.4303 LearningRate 0.0591 Epoch: 4 Global Step: 192070 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:19:02,063-Speed 2621.13 samples/sec Loss 10.4623 LearningRate 0.0591 Epoch: 4 Global Step: 192080 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:19:05,960-Speed 2627.90 samples/sec Loss 10.4371 LearningRate 0.0591 Epoch: 4 Global Step: 192090 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:19:09,915-Speed 2589.93 samples/sec Loss 10.4647 LearningRate 0.0590 Epoch: 4 Global Step: 192100 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:19:13,829-Speed 2617.01 samples/sec Loss 10.4058 LearningRate 0.0590 Epoch: 4 Global Step: 192110 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:19:17,770-Speed 2599.12 samples/sec Loss 10.4531 LearningRate 0.0590 Epoch: 4 Global Step: 192120 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:19:21,674-Speed 2624.12 samples/sec Loss 10.5838 LearningRate 0.0590 Epoch: 4 Global Step: 192130 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:19:25,581-Speed 2621.21 samples/sec Loss 10.6053 LearningRate 0.0590 Epoch: 4 Global Step: 192140 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:19:29,472-Speed 2632.55 samples/sec Loss 10.5158 LearningRate 0.0590 Epoch: 4 Global Step: 192150 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:19:33,367-Speed 2630.05 samples/sec Loss 10.3791 LearningRate 0.0590 Epoch: 4 Global Step: 192160 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:19:37,260-Speed 2630.53 samples/sec Loss 10.5939 LearningRate 0.0590 Epoch: 4 Global Step: 192170 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:19:41,154-Speed 2630.14 samples/sec Loss 10.5032 LearningRate 0.0590 Epoch: 4 Global Step: 192180 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:19:45,049-Speed 2630.17 samples/sec Loss 10.4860 LearningRate 0.0590 Epoch: 4 Global Step: 192190 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:19:48,942-Speed 2631.11 samples/sec Loss 10.5053 LearningRate 0.0590 Epoch: 4 Global Step: 192200 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:19:52,834-Speed 2632.19 samples/sec Loss 10.5103 LearningRate 0.0590 Epoch: 4 Global Step: 192210 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:19:56,728-Speed 2630.05 samples/sec Loss 10.4409 LearningRate 0.0590 Epoch: 4 Global Step: 192220 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:20:00,631-Speed 2624.44 samples/sec Loss 10.3759 LearningRate 0.0590 Epoch: 4 Global Step: 192230 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:20:04,542-Speed 2618.94 samples/sec Loss 10.4965 LearningRate 0.0590 Epoch: 4 Global Step: 192240 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:20:08,621-Speed 2510.65 samples/sec Loss 10.5306 LearningRate 0.0590 Epoch: 4 Global Step: 192250 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:20:12,550-Speed 2607.24 samples/sec Loss 10.3114 LearningRate 0.0590 Epoch: 4 Global Step: 192260 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:20:16,446-Speed 2629.07 samples/sec Loss 10.6083 LearningRate 0.0590 Epoch: 4 Global Step: 192270 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:20:20,346-Speed 2626.47 samples/sec Loss 10.5971 LearningRate 0.0590 Epoch: 4 Global Step: 192280 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:20:24,246-Speed 2627.01 samples/sec Loss 10.4582 LearningRate 0.0590 Epoch: 4 Global Step: 192290 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:20:28,148-Speed 2624.86 samples/sec Loss 10.4621 LearningRate 0.0590 Epoch: 4 Global Step: 192300 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:20:32,046-Speed 2627.67 samples/sec Loss 10.4431 LearningRate 0.0590 Epoch: 4 Global Step: 192310 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:20:35,938-Speed 2631.05 samples/sec Loss 10.3284 LearningRate 0.0590 Epoch: 4 Global Step: 192320 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:20:39,830-Speed 2632.46 samples/sec Loss 10.4806 LearningRate 0.0590 Epoch: 4 Global Step: 192330 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:20:43,720-Speed 2633.29 samples/sec Loss 10.4774 LearningRate 0.0590 Epoch: 4 Global Step: 192340 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:20:47,613-Speed 2630.54 samples/sec Loss 10.5255 LearningRate 0.0590 Epoch: 4 Global Step: 192350 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:20:51,505-Speed 2631.73 samples/sec Loss 10.5499 LearningRate 0.0590 Epoch: 4 Global Step: 192360 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:20:55,396-Speed 2632.32 samples/sec Loss 10.5138 LearningRate 0.0590 Epoch: 4 Global Step: 192370 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:20:59,286-Speed 2633.72 samples/sec Loss 10.5129 LearningRate 0.0590 Epoch: 4 Global Step: 192380 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:21:03,174-Speed 2633.91 samples/sec Loss 10.4724 LearningRate 0.0590 Epoch: 4 Global Step: 192390 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:21:07,069-Speed 2629.60 samples/sec Loss 10.5467 LearningRate 0.0590 Epoch: 4 Global Step: 192400 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:21:10,970-Speed 2626.07 samples/sec Loss 10.3416 LearningRate 0.0590 Epoch: 4 Global Step: 192410 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:21:14,872-Speed 2624.60 samples/sec Loss 10.5119 LearningRate 0.0590 Epoch: 4 Global Step: 192420 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:21:18,769-Speed 2628.36 samples/sec Loss 10.4790 LearningRate 0.0590 Epoch: 4 Global Step: 192430 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:21:22,666-Speed 2628.20 samples/sec Loss 10.3944 LearningRate 0.0590 Epoch: 4 Global Step: 192440 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:21:26,564-Speed 2627.87 samples/sec Loss 10.4448 LearningRate 0.0590 Epoch: 4 Global Step: 192450 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:21:30,460-Speed 2629.49 samples/sec Loss 10.4803 LearningRate 0.0590 Epoch: 4 Global Step: 192460 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:21:34,367-Speed 2621.47 samples/sec Loss 10.4869 LearningRate 0.0590 Epoch: 4 Global Step: 192470 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:21:38,263-Speed 2629.12 samples/sec Loss 10.5119 LearningRate 0.0590 Epoch: 4 Global Step: 192480 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:21:42,140-Speed 2642.06 samples/sec Loss 10.4901 LearningRate 0.0590 Epoch: 4 Global Step: 192490 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:21:46,035-Speed 2629.10 samples/sec Loss 10.5229 LearningRate 0.0590 Epoch: 4 Global Step: 192500 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:21:49,929-Speed 2629.92 samples/sec Loss 10.3956 LearningRate 0.0590 Epoch: 4 Global Step: 192510 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:21:53,840-Speed 2619.27 samples/sec Loss 10.3481 LearningRate 0.0590 Epoch: 4 Global Step: 192520 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:21:57,734-Speed 2630.56 samples/sec Loss 10.5195 LearningRate 0.0590 Epoch: 4 Global Step: 192530 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:22:01,631-Speed 2628.22 samples/sec Loss 10.6734 LearningRate 0.0590 Epoch: 4 Global Step: 192540 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:22:05,526-Speed 2629.44 samples/sec Loss 10.6376 LearningRate 0.0590 Epoch: 4 Global Step: 192550 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:22:09,465-Speed 2601.11 samples/sec Loss 10.5139 LearningRate 0.0590 Epoch: 4 Global Step: 192560 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:22:13,359-Speed 2629.86 samples/sec Loss 10.4673 LearningRate 0.0590 Epoch: 4 Global Step: 192570 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:22:17,251-Speed 2631.85 samples/sec Loss 10.3801 LearningRate 0.0590 Epoch: 4 Global Step: 192580 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:22:21,156-Speed 2622.47 samples/sec Loss 10.6125 LearningRate 0.0590 Epoch: 4 Global Step: 192590 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:22:25,038-Speed 2638.92 samples/sec Loss 10.5090 LearningRate 0.0590 Epoch: 4 Global Step: 192600 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:22:28,935-Speed 2628.97 samples/sec Loss 10.5034 LearningRate 0.0590 Epoch: 4 Global Step: 192610 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:22:32,833-Speed 2627.54 samples/sec Loss 10.4223 LearningRate 0.0590 Epoch: 4 Global Step: 192620 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:22:36,731-Speed 2627.34 samples/sec Loss 10.4503 LearningRate 0.0590 Epoch: 4 Global Step: 192630 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:22:40,646-Speed 2616.44 samples/sec Loss 10.5290 LearningRate 0.0589 Epoch: 4 Global Step: 192640 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:22:44,559-Speed 2617.28 samples/sec Loss 10.6451 LearningRate 0.0589 Epoch: 4 Global Step: 192650 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:22:48,453-Speed 2631.01 samples/sec Loss 10.3437 LearningRate 0.0589 Epoch: 4 Global Step: 192660 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:22:52,348-Speed 2629.83 samples/sec Loss 10.6338 LearningRate 0.0589 Epoch: 4 Global Step: 192670 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:22:56,243-Speed 2629.16 samples/sec Loss 10.4286 LearningRate 0.0589 Epoch: 4 Global Step: 192680 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:23:00,137-Speed 2630.59 samples/sec Loss 10.5150 LearningRate 0.0589 Epoch: 4 Global Step: 192690 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:23:04,029-Speed 2631.08 samples/sec Loss 10.4191 LearningRate 0.0589 Epoch: 4 Global Step: 192700 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:23:07,928-Speed 2627.09 samples/sec Loss 10.5200 LearningRate 0.0589 Epoch: 4 Global Step: 192710 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:23:11,834-Speed 2622.63 samples/sec Loss 10.5364 LearningRate 0.0589 Epoch: 4 Global Step: 192720 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:23:15,732-Speed 2627.18 samples/sec Loss 10.3980 LearningRate 0.0589 Epoch: 4 Global Step: 192730 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:23:19,630-Speed 2628.49 samples/sec Loss 10.4335 LearningRate 0.0589 Epoch: 4 Global Step: 192740 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:23:23,529-Speed 2626.60 samples/sec Loss 10.5531 LearningRate 0.0589 Epoch: 4 Global Step: 192750 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:23:27,427-Speed 2628.26 samples/sec Loss 10.4418 LearningRate 0.0589 Epoch: 4 Global Step: 192760 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:23:31,302-Speed 2642.48 samples/sec Loss 10.3948 LearningRate 0.0589 Epoch: 4 Global Step: 192770 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:23:35,194-Speed 2631.87 samples/sec Loss 10.5187 LearningRate 0.0589 Epoch: 4 Global Step: 192780 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:23:39,092-Speed 2627.86 samples/sec Loss 10.3317 LearningRate 0.0589 Epoch: 4 Global Step: 192790 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:23:42,986-Speed 2630.28 samples/sec Loss 10.4212 LearningRate 0.0589 Epoch: 4 Global Step: 192800 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:23:46,883-Speed 2628.57 samples/sec Loss 10.4529 LearningRate 0.0589 Epoch: 4 Global Step: 192810 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:23:50,778-Speed 2629.31 samples/sec Loss 10.4418 LearningRate 0.0589 Epoch: 4 Global Step: 192820 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:23:54,669-Speed 2632.19 samples/sec Loss 10.4392 LearningRate 0.0589 Epoch: 4 Global Step: 192830 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:23:58,570-Speed 2626.00 samples/sec Loss 10.4866 LearningRate 0.0589 Epoch: 4 Global Step: 192840 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:24:02,466-Speed 2628.98 samples/sec Loss 10.3803 LearningRate 0.0589 Epoch: 4 Global Step: 192850 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:24:06,364-Speed 2627.79 samples/sec Loss 10.4751 LearningRate 0.0589 Epoch: 4 Global Step: 192860 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:24:10,256-Speed 2631.38 samples/sec Loss 10.4258 LearningRate 0.0589 Epoch: 4 Global Step: 192870 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:24:14,153-Speed 2628.45 samples/sec Loss 10.3057 LearningRate 0.0589 Epoch: 4 Global Step: 192880 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:24:18,044-Speed 2632.27 samples/sec Loss 10.5834 LearningRate 0.0589 Epoch: 4 Global Step: 192890 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:24:21,938-Speed 2630.52 samples/sec Loss 10.4333 LearningRate 0.0589 Epoch: 4 Global Step: 192900 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:24:25,831-Speed 2630.47 samples/sec Loss 10.4258 LearningRate 0.0589 Epoch: 4 Global Step: 192910 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:24:29,728-Speed 2628.85 samples/sec Loss 10.5334 LearningRate 0.0589 Epoch: 4 Global Step: 192920 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:24:33,620-Speed 2631.63 samples/sec Loss 10.4144 LearningRate 0.0589 Epoch: 4 Global Step: 192930 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:24:37,518-Speed 2627.53 samples/sec Loss 10.2876 LearningRate 0.0589 Epoch: 4 Global Step: 192940 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:24:41,410-Speed 2631.36 samples/sec Loss 10.5343 LearningRate 0.0589 Epoch: 4 Global Step: 192950 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:24:45,305-Speed 2630.54 samples/sec Loss 10.5487 LearningRate 0.0589 Epoch: 4 Global Step: 192960 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:24:49,183-Speed 2640.99 samples/sec Loss 10.4622 LearningRate 0.0589 Epoch: 4 Global Step: 192970 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:24:53,071-Speed 2634.26 samples/sec Loss 10.5165 LearningRate 0.0589 Epoch: 4 Global Step: 192980 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:24:56,966-Speed 2629.93 samples/sec Loss 10.3910 LearningRate 0.0589 Epoch: 4 Global Step: 192990 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:25:00,862-Speed 2628.72 samples/sec Loss 10.3098 LearningRate 0.0589 Epoch: 4 Global Step: 193000 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:25:04,760-Speed 2627.83 samples/sec Loss 10.4170 LearningRate 0.0589 Epoch: 4 Global Step: 193010 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:25:08,655-Speed 2629.84 samples/sec Loss 10.3986 LearningRate 0.0589 Epoch: 4 Global Step: 193020 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:25:12,556-Speed 2625.51 samples/sec Loss 10.4623 LearningRate 0.0589 Epoch: 4 Global Step: 193030 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:25:16,454-Speed 2627.08 samples/sec Loss 10.5320 LearningRate 0.0589 Epoch: 4 Global Step: 193040 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:25:20,347-Speed 2631.66 samples/sec Loss 10.3433 LearningRate 0.0589 Epoch: 4 Global Step: 193050 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:25:24,251-Speed 2623.04 samples/sec Loss 10.4526 LearningRate 0.0589 Epoch: 4 Global Step: 193060 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:25:28,144-Speed 2631.23 samples/sec Loss 10.4225 LearningRate 0.0589 Epoch: 4 Global Step: 193070 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:25:32,050-Speed 2622.21 samples/sec Loss 10.4402 LearningRate 0.0589 Epoch: 4 Global Step: 193080 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:25:35,931-Speed 2638.97 samples/sec Loss 10.4358 LearningRate 0.0589 Epoch: 4 Global Step: 193090 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:25:39,827-Speed 2628.99 samples/sec Loss 10.3895 LearningRate 0.0589 Epoch: 4 Global Step: 193100 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:25:43,719-Speed 2631.59 samples/sec Loss 10.5987 LearningRate 0.0589 Epoch: 4 Global Step: 193110 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:25:47,616-Speed 2628.35 samples/sec Loss 10.4339 LearningRate 0.0589 Epoch: 4 Global Step: 193120 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:25:51,510-Speed 2630.60 samples/sec Loss 10.4200 LearningRate 0.0589 Epoch: 4 Global Step: 193130 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:25:55,403-Speed 2630.25 samples/sec Loss 10.4893 LearningRate 0.0589 Epoch: 4 Global Step: 193140 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:25:59,310-Speed 2622.36 samples/sec Loss 10.5237 LearningRate 0.0589 Epoch: 4 Global Step: 193150 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:26:03,223-Speed 2617.11 samples/sec Loss 10.3940 LearningRate 0.0589 Epoch: 4 Global Step: 193160 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:26:07,122-Speed 2627.02 samples/sec Loss 10.4343 LearningRate 0.0589 Epoch: 4 Global Step: 193170 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:26:11,015-Speed 2631.15 samples/sec Loss 10.4905 LearningRate 0.0588 Epoch: 4 Global Step: 193180 Fp16 Grad Scale: 131072 Required: 72 hours
Training: 2022-04-13 17:26:14,912-Speed 2628.22 samples/sec Loss 10.3805 LearningRate 0.0588 Epoch: 4 Global Step: 193190 Fp16 Grad Scale: 262144 Required: 72 hours
Training: 2022-04-13 17:26:18,806-Speed 2630.61 samples/sec Loss 10.4434 LearningRate 0.0588 Epoch: 4 Global Step: 193200 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:26:22,702-Speed 2628.47 samples/sec Loss 10.5285 LearningRate 0.0588 Epoch: 4 Global Step: 193210 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:26:26,594-Speed 2631.50 samples/sec Loss 10.4103 LearningRate 0.0588 Epoch: 4 Global Step: 193220 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:26:30,492-Speed 2627.92 samples/sec Loss 10.3601 LearningRate 0.0588 Epoch: 4 Global Step: 193230 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:26:34,369-Speed 2641.48 samples/sec Loss 10.3899 LearningRate 0.0588 Epoch: 4 Global Step: 193240 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:26:38,274-Speed 2623.41 samples/sec Loss 10.6618 LearningRate 0.0588 Epoch: 4 Global Step: 193250 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:26:42,166-Speed 2631.32 samples/sec Loss 10.3574 LearningRate 0.0588 Epoch: 4 Global Step: 193260 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:26:46,057-Speed 2632.26 samples/sec Loss 10.4385 LearningRate 0.0588 Epoch: 4 Global Step: 193270 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:26:49,954-Speed 2628.12 samples/sec Loss 10.3976 LearningRate 0.0588 Epoch: 4 Global Step: 193280 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:26:53,847-Speed 2631.14 samples/sec Loss 10.4743 LearningRate 0.0588 Epoch: 4 Global Step: 193290 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:26:57,750-Speed 2624.48 samples/sec Loss 10.5361 LearningRate 0.0588 Epoch: 4 Global Step: 193300 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:27:01,657-Speed 2621.62 samples/sec Loss 10.4404 LearningRate 0.0588 Epoch: 4 Global Step: 193310 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:27:05,553-Speed 2628.28 samples/sec Loss 10.4547 LearningRate 0.0588 Epoch: 4 Global Step: 193320 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:27:09,467-Speed 2617.18 samples/sec Loss 10.4065 LearningRate 0.0588 Epoch: 4 Global Step: 193330 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:27:13,364-Speed 2627.93 samples/sec Loss 10.5066 LearningRate 0.0588 Epoch: 4 Global Step: 193340 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:27:17,252-Speed 2634.74 samples/sec Loss 10.3510 LearningRate 0.0588 Epoch: 4 Global Step: 193350 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:27:21,144-Speed 2631.44 samples/sec Loss 10.3582 LearningRate 0.0588 Epoch: 4 Global Step: 193360 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:27:25,039-Speed 2629.44 samples/sec Loss 10.4325 LearningRate 0.0588 Epoch: 4 Global Step: 193370 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:27:28,939-Speed 2626.46 samples/sec Loss 10.4872 LearningRate 0.0588 Epoch: 4 Global Step: 193380 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:27:32,838-Speed 2627.38 samples/sec Loss 10.5159 LearningRate 0.0588 Epoch: 4 Global Step: 193390 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:27:36,730-Speed 2631.08 samples/sec Loss 10.4044 LearningRate 0.0588 Epoch: 4 Global Step: 193400 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:27:40,623-Speed 2631.24 samples/sec Loss 10.4220 LearningRate 0.0588 Epoch: 4 Global Step: 193410 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:27:44,532-Speed 2619.97 samples/sec Loss 10.5120 LearningRate 0.0588 Epoch: 4 Global Step: 193420 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:27:48,424-Speed 2631.78 samples/sec Loss 10.5098 LearningRate 0.0588 Epoch: 4 Global Step: 193430 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:27:52,322-Speed 2627.13 samples/sec Loss 10.4596 LearningRate 0.0588 Epoch: 4 Global Step: 193440 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:27:56,213-Speed 2632.68 samples/sec Loss 10.4354 LearningRate 0.0588 Epoch: 4 Global Step: 193450 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:28:00,087-Speed 2644.21 samples/sec Loss 10.3342 LearningRate 0.0588 Epoch: 4 Global Step: 193460 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:28:03,981-Speed 2630.06 samples/sec Loss 10.5046 LearningRate 0.0588 Epoch: 4 Global Step: 193470 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:28:07,886-Speed 2623.16 samples/sec Loss 10.5819 LearningRate 0.0588 Epoch: 4 Global Step: 193480 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:28:11,778-Speed 2631.68 samples/sec Loss 10.4953 LearningRate 0.0588 Epoch: 4 Global Step: 193490 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:28:15,678-Speed 2626.04 samples/sec Loss 10.5536 LearningRate 0.0588 Epoch: 4 Global Step: 193500 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:28:19,582-Speed 2623.38 samples/sec Loss 10.3898 LearningRate 0.0588 Epoch: 4 Global Step: 193510 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:28:23,488-Speed 2624.53 samples/sec Loss 10.4309 LearningRate 0.0588 Epoch: 4 Global Step: 193520 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:28:27,383-Speed 2629.42 samples/sec Loss 10.4654 LearningRate 0.0588 Epoch: 4 Global Step: 193530 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:28:31,287-Speed 2623.82 samples/sec Loss 10.3823 LearningRate 0.0588 Epoch: 4 Global Step: 193540 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:28:35,179-Speed 2631.52 samples/sec Loss 10.5580 LearningRate 0.0588 Epoch: 4 Global Step: 193550 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:28:39,070-Speed 2632.30 samples/sec Loss 10.4463 LearningRate 0.0588 Epoch: 4 Global Step: 193560 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:28:42,962-Speed 2631.93 samples/sec Loss 10.2640 LearningRate 0.0588 Epoch: 4 Global Step: 193570 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:28:46,842-Speed 2639.62 samples/sec Loss 10.3917 LearningRate 0.0588 Epoch: 4 Global Step: 193580 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:28:50,743-Speed 2625.46 samples/sec Loss 10.3143 LearningRate 0.0588 Epoch: 4 Global Step: 193590 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:28:54,649-Speed 2622.67 samples/sec Loss 10.3842 LearningRate 0.0588 Epoch: 4 Global Step: 193600 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:28:58,560-Speed 2618.68 samples/sec Loss 10.3889 LearningRate 0.0588 Epoch: 4 Global Step: 193610 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:29:02,451-Speed 2632.27 samples/sec Loss 10.4834 LearningRate 0.0588 Epoch: 4 Global Step: 193620 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:29:06,305-Speed 2657.03 samples/sec Loss 10.7338 LearningRate 0.0588 Epoch: 4 Global Step: 193630 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:29:10,198-Speed 2631.46 samples/sec Loss 10.6741 LearningRate 0.0588 Epoch: 4 Global Step: 193640 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:29:14,088-Speed 2632.75 samples/sec Loss 10.4991 LearningRate 0.0588 Epoch: 4 Global Step: 193650 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:29:17,990-Speed 2625.23 samples/sec Loss 10.5148 LearningRate 0.0588 Epoch: 4 Global Step: 193660 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:29:21,898-Speed 2621.55 samples/sec Loss 10.4970 LearningRate 0.0588 Epoch: 4 Global Step: 193670 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:29:25,805-Speed 2621.28 samples/sec Loss 10.3676 LearningRate 0.0588 Epoch: 4 Global Step: 193680 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:29:29,703-Speed 2627.48 samples/sec Loss 10.5299 LearningRate 0.0588 Epoch: 4 Global Step: 193690 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:29:33,609-Speed 2622.27 samples/sec Loss 10.6203 LearningRate 0.0588 Epoch: 4 Global Step: 193700 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:29:37,500-Speed 2632.49 samples/sec Loss 10.5414 LearningRate 0.0588 Epoch: 4 Global Step: 193710 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:29:41,392-Speed 2631.15 samples/sec Loss 10.5456 LearningRate 0.0587 Epoch: 4 Global Step: 193720 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:29:45,283-Speed 2632.12 samples/sec Loss 10.4822 LearningRate 0.0587 Epoch: 4 Global Step: 193730 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:29:49,192-Speed 2620.42 samples/sec Loss 10.4345 LearningRate 0.0587 Epoch: 4 Global Step: 193740 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:29:53,084-Speed 2631.41 samples/sec Loss 10.4839 LearningRate 0.0587 Epoch: 4 Global Step: 193750 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:29:56,976-Speed 2631.91 samples/sec Loss 10.4016 LearningRate 0.0587 Epoch: 4 Global Step: 193760 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:30:00,869-Speed 2630.96 samples/sec Loss 10.4564 LearningRate 0.0587 Epoch: 4 Global Step: 193770 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:30:04,762-Speed 2631.09 samples/sec Loss 10.4015 LearningRate 0.0587 Epoch: 4 Global Step: 193780 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:30:08,658-Speed 2628.67 samples/sec Loss 10.5133 LearningRate 0.0587 Epoch: 4 Global Step: 193790 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:30:12,560-Speed 2624.98 samples/sec Loss 10.3703 LearningRate 0.0587 Epoch: 4 Global Step: 193800 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:30:16,462-Speed 2625.12 samples/sec Loss 10.4176 LearningRate 0.0587 Epoch: 4 Global Step: 193810 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:30:20,387-Speed 2609.80 samples/sec Loss 10.5472 LearningRate 0.0587 Epoch: 4 Global Step: 193820 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:30:24,287-Speed 2625.65 samples/sec Loss 10.3967 LearningRate 0.0587 Epoch: 4 Global Step: 193830 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:30:28,189-Speed 2630.61 samples/sec Loss 10.5139 LearningRate 0.0587 Epoch: 4 Global Step: 193840 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:30:32,098-Speed 2620.59 samples/sec Loss 10.4967 LearningRate 0.0587 Epoch: 4 Global Step: 193850 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:30:35,996-Speed 2627.34 samples/sec Loss 10.4487 LearningRate 0.0587 Epoch: 4 Global Step: 193860 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:30:39,902-Speed 2622.36 samples/sec Loss 10.4200 LearningRate 0.0587 Epoch: 4 Global Step: 193870 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:30:43,813-Speed 2619.05 samples/sec Loss 10.4419 LearningRate 0.0587 Epoch: 4 Global Step: 193880 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:30:47,724-Speed 2618.25 samples/sec Loss 10.5825 LearningRate 0.0587 Epoch: 4 Global Step: 193890 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:30:51,621-Speed 2628.30 samples/sec Loss 10.5541 LearningRate 0.0587 Epoch: 4 Global Step: 193900 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:30:55,497-Speed 2642.34 samples/sec Loss 10.4614 LearningRate 0.0587 Epoch: 4 Global Step: 193910 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:30:59,390-Speed 2631.57 samples/sec Loss 10.5560 LearningRate 0.0587 Epoch: 4 Global Step: 193920 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:31:03,285-Speed 2629.25 samples/sec Loss 10.5408 LearningRate 0.0587 Epoch: 4 Global Step: 193930 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:31:07,179-Speed 2630.59 samples/sec Loss 10.5044 LearningRate 0.0587 Epoch: 4 Global Step: 193940 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:31:11,072-Speed 2630.72 samples/sec Loss 10.5688 LearningRate 0.0587 Epoch: 4 Global Step: 193950 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:31:14,968-Speed 2628.78 samples/sec Loss 10.3114 LearningRate 0.0587 Epoch: 4 Global Step: 193960 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:31:18,862-Speed 2630.33 samples/sec Loss 10.4915 LearningRate 0.0587 Epoch: 4 Global Step: 193970 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:31:22,754-Speed 2631.91 samples/sec Loss 10.3883 LearningRate 0.0587 Epoch: 4 Global Step: 193980 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:31:26,647-Speed 2630.54 samples/sec Loss 10.3266 LearningRate 0.0587 Epoch: 4 Global Step: 193990 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:31:30,540-Speed 2631.43 samples/sec Loss 10.5439 LearningRate 0.0587 Epoch: 4 Global Step: 194000 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:31:34,438-Speed 2627.23 samples/sec Loss 10.3950 LearningRate 0.0587 Epoch: 4 Global Step: 194010 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:31:38,337-Speed 2626.79 samples/sec Loss 10.6151 LearningRate 0.0587 Epoch: 4 Global Step: 194020 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:31:42,237-Speed 2626.30 samples/sec Loss 10.4826 LearningRate 0.0587 Epoch: 4 Global Step: 194030 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:31:46,149-Speed 2618.33 samples/sec Loss 10.4776 LearningRate 0.0587 Epoch: 4 Global Step: 194040 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:31:50,049-Speed 2626.59 samples/sec Loss 10.5517 LearningRate 0.0587 Epoch: 4 Global Step: 194050 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:31:53,998-Speed 2593.53 samples/sec Loss 10.5884 LearningRate 0.0587 Epoch: 4 Global Step: 194060 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:31:57,902-Speed 2623.63 samples/sec Loss 10.3668 LearningRate 0.0587 Epoch: 4 Global Step: 194070 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:32:01,819-Speed 2614.37 samples/sec Loss 10.5485 LearningRate 0.0587 Epoch: 4 Global Step: 194080 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:32:05,727-Speed 2620.91 samples/sec Loss 10.5255 LearningRate 0.0587 Epoch: 4 Global Step: 194090 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:32:09,596-Speed 2647.48 samples/sec Loss 10.3380 LearningRate 0.0587 Epoch: 4 Global Step: 194100 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:32:13,486-Speed 2632.52 samples/sec Loss 10.6794 LearningRate 0.0587 Epoch: 4 Global Step: 194110 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:32:17,402-Speed 2615.84 samples/sec Loss 10.5922 LearningRate 0.0587 Epoch: 4 Global Step: 194120 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:32:21,290-Speed 2633.96 samples/sec Loss 10.5169 LearningRate 0.0587 Epoch: 4 Global Step: 194130 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:32:25,191-Speed 2635.16 samples/sec Loss 10.3884 LearningRate 0.0587 Epoch: 4 Global Step: 194140 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:32:29,082-Speed 2632.19 samples/sec Loss 10.5331 LearningRate 0.0587 Epoch: 4 Global Step: 194150 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:32:32,971-Speed 2633.41 samples/sec Loss 10.4854 LearningRate 0.0587 Epoch: 4 Global Step: 194160 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:32:36,863-Speed 2631.36 samples/sec Loss 10.4795 LearningRate 0.0587 Epoch: 4 Global Step: 194170 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:32:40,756-Speed 2631.53 samples/sec Loss 10.3662 LearningRate 0.0587 Epoch: 4 Global Step: 194180 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:32:44,658-Speed 2624.33 samples/sec Loss 10.4130 LearningRate 0.0587 Epoch: 4 Global Step: 194190 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:32:48,565-Speed 2621.86 samples/sec Loss 10.4290 LearningRate 0.0587 Epoch: 4 Global Step: 194200 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:32:52,476-Speed 2618.28 samples/sec Loss 10.3865 LearningRate 0.0587 Epoch: 4 Global Step: 194210 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:32:56,383-Speed 2621.52 samples/sec Loss 10.5444 LearningRate 0.0587 Epoch: 4 Global Step: 194220 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:33:00,289-Speed 2622.83 samples/sec Loss 10.5760 LearningRate 0.0587 Epoch: 4 Global Step: 194230 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:33:04,187-Speed 2627.71 samples/sec Loss 10.4786 LearningRate 0.0587 Epoch: 4 Global Step: 194240 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:33:08,131-Speed 2596.99 samples/sec Loss 10.5105 LearningRate 0.0587 Epoch: 4 Global Step: 194250 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:33:12,020-Speed 2633.70 samples/sec Loss 10.3389 LearningRate 0.0587 Epoch: 4 Global Step: 194260 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:33:15,915-Speed 2629.63 samples/sec Loss 10.3588 LearningRate 0.0586 Epoch: 4 Global Step: 194270 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:33:19,805-Speed 2632.64 samples/sec Loss 10.6322 LearningRate 0.0586 Epoch: 4 Global Step: 194280 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:33:23,693-Speed 2634.58 samples/sec Loss 10.3447 LearningRate 0.0586 Epoch: 4 Global Step: 194290 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:33:27,581-Speed 2634.03 samples/sec Loss 10.4917 LearningRate 0.0586 Epoch: 4 Global Step: 194300 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:33:31,470-Speed 2633.97 samples/sec Loss 10.5911 LearningRate 0.0586 Epoch: 4 Global Step: 194310 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:33:35,361-Speed 2632.19 samples/sec Loss 10.4286 LearningRate 0.0586 Epoch: 4 Global Step: 194320 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:33:39,270-Speed 2620.84 samples/sec Loss 10.5158 LearningRate 0.0586 Epoch: 4 Global Step: 194330 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:33:43,173-Speed 2623.66 samples/sec Loss 10.4981 LearningRate 0.0586 Epoch: 4 Global Step: 194340 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:33:47,070-Speed 2628.12 samples/sec Loss 10.4288 LearningRate 0.0586 Epoch: 4 Global Step: 194350 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:33:50,976-Speed 2622.00 samples/sec Loss 10.5846 LearningRate 0.0586 Epoch: 4 Global Step: 194360 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:33:54,882-Speed 2622.31 samples/sec Loss 10.5262 LearningRate 0.0586 Epoch: 4 Global Step: 194370 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:33:58,791-Speed 2619.97 samples/sec Loss 10.4031 LearningRate 0.0586 Epoch: 4 Global Step: 194380 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:34:02,694-Speed 2624.35 samples/sec Loss 10.3984 LearningRate 0.0586 Epoch: 4 Global Step: 194390 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:34:06,590-Speed 2628.75 samples/sec Loss 10.5938 LearningRate 0.0586 Epoch: 4 Global Step: 194400 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:34:10,489-Speed 2627.42 samples/sec Loss 10.4459 LearningRate 0.0586 Epoch: 4 Global Step: 194410 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:34:14,372-Speed 2637.66 samples/sec Loss 10.5859 LearningRate 0.0586 Epoch: 4 Global Step: 194420 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:34:18,272-Speed 2626.18 samples/sec Loss 10.4843 LearningRate 0.0586 Epoch: 4 Global Step: 194430 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:34:22,168-Speed 2628.79 samples/sec Loss 10.4234 LearningRate 0.0586 Epoch: 4 Global Step: 194440 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:34:26,062-Speed 2630.62 samples/sec Loss 10.4070 LearningRate 0.0586 Epoch: 4 Global Step: 194450 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:34:29,968-Speed 2622.43 samples/sec Loss 10.2644 LearningRate 0.0586 Epoch: 4 Global Step: 194460 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:34:33,868-Speed 2625.66 samples/sec Loss 10.3778 LearningRate 0.0586 Epoch: 4 Global Step: 194470 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:34:38,174-Speed 2378.76 samples/sec Loss 10.6105 LearningRate 0.0586 Epoch: 4 Global Step: 194480 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:34:42,094-Speed 2612.45 samples/sec Loss 10.4944 LearningRate 0.0586 Epoch: 4 Global Step: 194490 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:34:46,000-Speed 2622.55 samples/sec Loss 10.4377 LearningRate 0.0586 Epoch: 4 Global Step: 194500 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:34:49,899-Speed 2627.27 samples/sec Loss 10.3634 LearningRate 0.0586 Epoch: 4 Global Step: 194510 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:34:53,793-Speed 2630.22 samples/sec Loss 10.4524 LearningRate 0.0586 Epoch: 4 Global Step: 194520 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:34:57,687-Speed 2630.19 samples/sec Loss 10.5385 LearningRate 0.0586 Epoch: 4 Global Step: 194530 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:35:01,582-Speed 2629.59 samples/sec Loss 10.4636 LearningRate 0.0586 Epoch: 4 Global Step: 194540 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:35:05,475-Speed 2630.38 samples/sec Loss 10.4604 LearningRate 0.0586 Epoch: 4 Global Step: 194550 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:35:09,348-Speed 2644.79 samples/sec Loss 10.4315 LearningRate 0.0586 Epoch: 4 Global Step: 194560 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:35:13,247-Speed 2626.97 samples/sec Loss 10.4704 LearningRate 0.0586 Epoch: 4 Global Step: 194570 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:35:17,143-Speed 2628.96 samples/sec Loss 10.4368 LearningRate 0.0586 Epoch: 4 Global Step: 194580 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:35:21,054-Speed 2619.74 samples/sec Loss 10.2237 LearningRate 0.0586 Epoch: 4 Global Step: 194590 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:35:24,947-Speed 2630.92 samples/sec Loss 10.4520 LearningRate 0.0586 Epoch: 4 Global Step: 194600 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:35:28,842-Speed 2629.53 samples/sec Loss 10.3830 LearningRate 0.0586 Epoch: 4 Global Step: 194610 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:35:32,736-Speed 2629.96 samples/sec Loss 10.4642 LearningRate 0.0586 Epoch: 4 Global Step: 194620 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:35:36,632-Speed 2628.90 samples/sec Loss 10.4451 LearningRate 0.0586 Epoch: 4 Global Step: 194630 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:35:40,538-Speed 2622.28 samples/sec Loss 10.4560 LearningRate 0.0586 Epoch: 4 Global Step: 194640 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:35:44,439-Speed 2625.34 samples/sec Loss 10.4652 LearningRate 0.0586 Epoch: 4 Global Step: 194650 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:35:48,335-Speed 2628.89 samples/sec Loss 10.3180 LearningRate 0.0586 Epoch: 4 Global Step: 194660 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:35:52,225-Speed 2633.54 samples/sec Loss 10.4037 LearningRate 0.0586 Epoch: 4 Global Step: 194670 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:35:56,119-Speed 2630.43 samples/sec Loss 10.4611 LearningRate 0.0586 Epoch: 4 Global Step: 194680 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:36:00,032-Speed 2617.25 samples/sec Loss 10.3290 LearningRate 0.0586 Epoch: 4 Global Step: 194690 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:36:03,930-Speed 2627.87 samples/sec Loss 10.4149 LearningRate 0.0586 Epoch: 4 Global Step: 194700 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:36:07,828-Speed 2627.63 samples/sec Loss 10.4126 LearningRate 0.0586 Epoch: 4 Global Step: 194710 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:36:11,703-Speed 2642.66 samples/sec Loss 10.5091 LearningRate 0.0586 Epoch: 4 Global Step: 194720 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:36:15,596-Speed 2630.92 samples/sec Loss 10.3087 LearningRate 0.0586 Epoch: 4 Global Step: 194730 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:36:19,489-Speed 2631.34 samples/sec Loss 10.3361 LearningRate 0.0586 Epoch: 4 Global Step: 194740 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:36:23,383-Speed 2629.96 samples/sec Loss 10.5851 LearningRate 0.0586 Epoch: 4 Global Step: 194750 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:36:27,288-Speed 2622.97 samples/sec Loss 10.5224 LearningRate 0.0586 Epoch: 4 Global Step: 194760 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:36:31,210-Speed 2611.58 samples/sec Loss 10.6179 LearningRate 0.0586 Epoch: 4 Global Step: 194770 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:36:35,103-Speed 2631.09 samples/sec Loss 10.4611 LearningRate 0.0586 Epoch: 4 Global Step: 194780 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:36:38,998-Speed 2629.55 samples/sec Loss 10.4132 LearningRate 0.0586 Epoch: 4 Global Step: 194790 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:36:42,891-Speed 2630.83 samples/sec Loss 10.2616 LearningRate 0.0586 Epoch: 4 Global Step: 194800 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:36:46,788-Speed 2628.36 samples/sec Loss 10.5292 LearningRate 0.0585 Epoch: 4 Global Step: 194810 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:36:50,770-Speed 2572.18 samples/sec Loss 10.5019 LearningRate 0.0585 Epoch: 4 Global Step: 194820 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:36:54,663-Speed 2630.71 samples/sec Loss 10.5847 LearningRate 0.0585 Epoch: 4 Global Step: 194830 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:36:58,539-Speed 2642.78 samples/sec Loss 10.5374 LearningRate 0.0585 Epoch: 4 Global Step: 194840 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:37:02,433-Speed 2630.08 samples/sec Loss 10.2645 LearningRate 0.0585 Epoch: 4 Global Step: 194850 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:37:06,327-Speed 2630.36 samples/sec Loss 10.3010 LearningRate 0.0585 Epoch: 4 Global Step: 194860 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:37:10,208-Speed 2639.07 samples/sec Loss 10.5895 LearningRate 0.0585 Epoch: 4 Global Step: 194870 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:37:14,101-Speed 2631.46 samples/sec Loss 10.9760 LearningRate 0.0585 Epoch: 4 Global Step: 194880 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:37:17,991-Speed 2632.76 samples/sec Loss 10.5610 LearningRate 0.0585 Epoch: 4 Global Step: 194890 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:37:21,887-Speed 2629.07 samples/sec Loss 10.6208 LearningRate 0.0585 Epoch: 4 Global Step: 194900 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:37:25,778-Speed 2631.88 samples/sec Loss 10.4875 LearningRate 0.0585 Epoch: 4 Global Step: 194910 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:37:29,672-Speed 2630.67 samples/sec Loss 10.3605 LearningRate 0.0585 Epoch: 4 Global Step: 194920 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:37:33,567-Speed 2629.35 samples/sec Loss 10.4060 LearningRate 0.0585 Epoch: 4 Global Step: 194930 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:37:37,470-Speed 2624.12 samples/sec Loss 10.5207 LearningRate 0.0585 Epoch: 4 Global Step: 194940 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:37:41,368-Speed 2627.46 samples/sec Loss 10.3897 LearningRate 0.0585 Epoch: 4 Global Step: 194950 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:37:45,262-Speed 2630.74 samples/sec Loss 10.4983 LearningRate 0.0585 Epoch: 4 Global Step: 194960 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:37:49,157-Speed 2629.60 samples/sec Loss 10.4698 LearningRate 0.0585 Epoch: 4 Global Step: 194970 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:37:53,052-Speed 2629.45 samples/sec Loss 10.3695 LearningRate 0.0585 Epoch: 4 Global Step: 194980 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:37:56,950-Speed 2628.05 samples/sec Loss 10.5045 LearningRate 0.0585 Epoch: 4 Global Step: 194990 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:38:00,851-Speed 2625.46 samples/sec Loss 10.3719 LearningRate 0.0585 Epoch: 4 Global Step: 195000 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:38:04,742-Speed 2632.20 samples/sec Loss 10.4637 LearningRate 0.0585 Epoch: 4 Global Step: 195010 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:38:08,631-Speed 2633.51 samples/sec Loss 10.5118 LearningRate 0.0585 Epoch: 4 Global Step: 195020 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:38:12,519-Speed 2633.80 samples/sec Loss 10.4103 LearningRate 0.0585 Epoch: 4 Global Step: 195030 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:38:16,432-Speed 2618.03 samples/sec Loss 10.5204 LearningRate 0.0585 Epoch: 4 Global Step: 195040 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:38:20,333-Speed 2625.82 samples/sec Loss 10.5117 LearningRate 0.0585 Epoch: 4 Global Step: 195050 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:38:24,257-Speed 2610.24 samples/sec Loss 10.4717 LearningRate 0.0585 Epoch: 4 Global Step: 195060 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:38:28,174-Speed 2614.92 samples/sec Loss 10.4220 LearningRate 0.0585 Epoch: 4 Global Step: 195070 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:38:32,087-Speed 2617.66 samples/sec Loss 10.3991 LearningRate 0.0585 Epoch: 4 Global Step: 195080 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:38:35,986-Speed 2626.86 samples/sec Loss 10.4104 LearningRate 0.0585 Epoch: 4 Global Step: 195090 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:38:39,883-Speed 2628.05 samples/sec Loss 10.4155 LearningRate 0.0585 Epoch: 4 Global Step: 195100 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:38:43,782-Speed 2627.17 samples/sec Loss 10.4960 LearningRate 0.0585 Epoch: 4 Global Step: 195110 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:38:47,684-Speed 2624.20 samples/sec Loss 10.4392 LearningRate 0.0585 Epoch: 4 Global Step: 195120 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:38:51,588-Speed 2623.55 samples/sec Loss 10.5189 LearningRate 0.0585 Epoch: 4 Global Step: 195130 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:38:55,488-Speed 2626.23 samples/sec Loss 10.4035 LearningRate 0.0585 Epoch: 4 Global Step: 195140 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:38:59,420-Speed 2605.29 samples/sec Loss 10.6137 LearningRate 0.0585 Epoch: 4 Global Step: 195150 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:39:03,445-Speed 2544.38 samples/sec Loss 10.3402 LearningRate 0.0585 Epoch: 4 Global Step: 195160 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:39:07,339-Speed 2630.50 samples/sec Loss 10.5238 LearningRate 0.0585 Epoch: 4 Global Step: 195170 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:39:11,240-Speed 2625.55 samples/sec Loss 10.5159 LearningRate 0.0585 Epoch: 4 Global Step: 195180 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:39:15,134-Speed 2630.18 samples/sec Loss 10.3269 LearningRate 0.0585 Epoch: 4 Global Step: 195190 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:39:19,046-Speed 2618.15 samples/sec Loss 10.3445 LearningRate 0.0585 Epoch: 4 Global Step: 195200 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:39:22,922-Speed 2642.24 samples/sec Loss 10.4759 LearningRate 0.0585 Epoch: 4 Global Step: 195210 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:39:26,823-Speed 2625.66 samples/sec Loss 10.4147 LearningRate 0.0585 Epoch: 4 Global Step: 195220 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:39:30,718-Speed 2629.81 samples/sec Loss 10.4385 LearningRate 0.0585 Epoch: 4 Global Step: 195230 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:39:34,611-Speed 2631.02 samples/sec Loss 10.3898 LearningRate 0.0585 Epoch: 4 Global Step: 195240 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:39:38,501-Speed 2633.01 samples/sec Loss 10.2816 LearningRate 0.0585 Epoch: 4 Global Step: 195250 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:39:42,392-Speed 2632.16 samples/sec Loss 10.2661 LearningRate 0.0585 Epoch: 4 Global Step: 195260 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:39:46,283-Speed 2632.68 samples/sec Loss 10.4566 LearningRate 0.0585 Epoch: 4 Global Step: 195270 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:39:50,175-Speed 2631.25 samples/sec Loss 10.4649 LearningRate 0.0585 Epoch: 4 Global Step: 195280 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:39:54,065-Speed 2632.90 samples/sec Loss 10.4242 LearningRate 0.0585 Epoch: 4 Global Step: 195290 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:39:57,957-Speed 2631.71 samples/sec Loss 10.4424 LearningRate 0.0585 Epoch: 4 Global Step: 195300 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:40:01,853-Speed 2628.64 samples/sec Loss 10.5314 LearningRate 0.0585 Epoch: 4 Global Step: 195310 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:40:05,731-Speed 2641.34 samples/sec Loss 10.3340 LearningRate 0.0585 Epoch: 4 Global Step: 195320 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:40:09,635-Speed 2624.14 samples/sec Loss 10.3515 LearningRate 0.0585 Epoch: 4 Global Step: 195330 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:40:13,522-Speed 2635.36 samples/sec Loss 10.4366 LearningRate 0.0585 Epoch: 4 Global Step: 195340 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:40:17,434-Speed 2617.47 samples/sec Loss 10.4311 LearningRate 0.0584 Epoch: 4 Global Step: 195350 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:40:21,324-Speed 2632.77 samples/sec Loss 10.3511 LearningRate 0.0584 Epoch: 4 Global Step: 195360 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:40:25,216-Speed 2631.74 samples/sec Loss 10.4466 LearningRate 0.0584 Epoch: 4 Global Step: 195370 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:40:29,105-Speed 2634.11 samples/sec Loss 10.4421 LearningRate 0.0584 Epoch: 4 Global Step: 195380 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:40:32,998-Speed 2630.28 samples/sec Loss 10.5444 LearningRate 0.0584 Epoch: 4 Global Step: 195390 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:40:36,895-Speed 2628.42 samples/sec Loss 10.3556 LearningRate 0.0584 Epoch: 4 Global Step: 195400 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:40:40,790-Speed 2630.11 samples/sec Loss 10.4540 LearningRate 0.0584 Epoch: 4 Global Step: 195410 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:40:44,667-Speed 2641.19 samples/sec Loss 10.4165 LearningRate 0.0584 Epoch: 4 Global Step: 195420 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:40:48,567-Speed 2627.12 samples/sec Loss 10.5216 LearningRate 0.0584 Epoch: 4 Global Step: 195430 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:40:52,474-Speed 2621.58 samples/sec Loss 10.3916 LearningRate 0.0584 Epoch: 4 Global Step: 195440 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:40:56,360-Speed 2635.38 samples/sec Loss 10.4956 LearningRate 0.0584 Epoch: 4 Global Step: 195450 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:41:00,259-Speed 2627.26 samples/sec Loss 10.4520 LearningRate 0.0584 Epoch: 4 Global Step: 195460 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:41:04,162-Speed 2623.66 samples/sec Loss 10.4518 LearningRate 0.0584 Epoch: 4 Global Step: 195470 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:41:08,063-Speed 2625.54 samples/sec Loss 10.4115 LearningRate 0.0584 Epoch: 4 Global Step: 195480 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:41:11,967-Speed 2623.37 samples/sec Loss 10.5708 LearningRate 0.0584 Epoch: 4 Global Step: 195490 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:41:15,875-Speed 2620.95 samples/sec Loss 10.5797 LearningRate 0.0584 Epoch: 4 Global Step: 195500 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:41:19,785-Speed 2620.22 samples/sec Loss 10.3262 LearningRate 0.0584 Epoch: 4 Global Step: 195510 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:41:23,691-Speed 2622.34 samples/sec Loss 10.3824 LearningRate 0.0584 Epoch: 4 Global Step: 195520 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:41:27,598-Speed 2621.37 samples/sec Loss 10.4856 LearningRate 0.0584 Epoch: 4 Global Step: 195530 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:41:31,487-Speed 2633.80 samples/sec Loss 10.5168 LearningRate 0.0584 Epoch: 4 Global Step: 195540 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:41:35,388-Speed 2625.50 samples/sec Loss 10.4482 LearningRate 0.0584 Epoch: 4 Global Step: 195550 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:41:39,290-Speed 2624.53 samples/sec Loss 10.4520 LearningRate 0.0584 Epoch: 4 Global Step: 195560 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:41:43,188-Speed 2627.43 samples/sec Loss 10.4964 LearningRate 0.0584 Epoch: 4 Global Step: 195570 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:41:47,089-Speed 2625.70 samples/sec Loss 10.4238 LearningRate 0.0584 Epoch: 4 Global Step: 195580 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:41:50,985-Speed 2629.11 samples/sec Loss 10.6008 LearningRate 0.0584 Epoch: 4 Global Step: 195590 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:41:54,879-Speed 2630.41 samples/sec Loss 10.3144 LearningRate 0.0584 Epoch: 4 Global Step: 195600 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:41:58,773-Speed 2630.28 samples/sec Loss 10.4682 LearningRate 0.0584 Epoch: 4 Global Step: 195610 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:42:02,662-Speed 2633.98 samples/sec Loss 10.3484 LearningRate 0.0584 Epoch: 4 Global Step: 195620 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:42:06,521-Speed 2653.74 samples/sec Loss 10.4623 LearningRate 0.0584 Epoch: 4 Global Step: 195630 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:42:10,419-Speed 2627.21 samples/sec Loss 10.4403 LearningRate 0.0584 Epoch: 4 Global Step: 195640 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:42:14,320-Speed 2626.15 samples/sec Loss 10.3305 LearningRate 0.0584 Epoch: 4 Global Step: 195650 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:42:18,221-Speed 2625.16 samples/sec Loss 10.5424 LearningRate 0.0584 Epoch: 4 Global Step: 195660 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:42:22,120-Speed 2626.98 samples/sec Loss 10.4219 LearningRate 0.0584 Epoch: 4 Global Step: 195670 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:42:26,020-Speed 2625.95 samples/sec Loss 10.4380 LearningRate 0.0584 Epoch: 4 Global Step: 195680 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:42:29,919-Speed 2626.93 samples/sec Loss 10.4683 LearningRate 0.0584 Epoch: 4 Global Step: 195690 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:42:33,816-Speed 2628.52 samples/sec Loss 10.3514 LearningRate 0.0584 Epoch: 4 Global Step: 195700 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:42:37,704-Speed 2634.50 samples/sec Loss 10.4938 LearningRate 0.0584 Epoch: 4 Global Step: 195710 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:42:41,591-Speed 2634.56 samples/sec Loss 10.3844 LearningRate 0.0584 Epoch: 4 Global Step: 195720 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:42:45,480-Speed 2633.77 samples/sec Loss 10.4108 LearningRate 0.0584 Epoch: 4 Global Step: 195730 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:42:49,377-Speed 2628.42 samples/sec Loss 10.4036 LearningRate 0.0584 Epoch: 4 Global Step: 195740 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:42:53,286-Speed 2620.49 samples/sec Loss 10.4991 LearningRate 0.0584 Epoch: 4 Global Step: 195750 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:42:57,176-Speed 2632.44 samples/sec Loss 10.3666 LearningRate 0.0584 Epoch: 4 Global Step: 195760 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:43:01,068-Speed 2632.06 samples/sec Loss 10.5063 LearningRate 0.0584 Epoch: 4 Global Step: 195770 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:43:04,975-Speed 2620.91 samples/sec Loss 10.4933 LearningRate 0.0584 Epoch: 4 Global Step: 195780 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:43:08,869-Speed 2630.65 samples/sec Loss 10.3949 LearningRate 0.0584 Epoch: 4 Global Step: 195790 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:43:12,760-Speed 2631.89 samples/sec Loss 10.4316 LearningRate 0.0584 Epoch: 4 Global Step: 195800 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:43:16,653-Speed 2631.74 samples/sec Loss 10.4356 LearningRate 0.0584 Epoch: 4 Global Step: 195810 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:43:20,550-Speed 2627.77 samples/sec Loss 10.3790 LearningRate 0.0584 Epoch: 4 Global Step: 195820 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:43:24,424-Speed 2643.94 samples/sec Loss 10.5013 LearningRate 0.0584 Epoch: 4 Global Step: 195830 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:43:28,301-Speed 2641.61 samples/sec Loss 10.5331 LearningRate 0.0584 Epoch: 4 Global Step: 195840 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:43:32,216-Speed 2616.68 samples/sec Loss 10.3322 LearningRate 0.0584 Epoch: 4 Global Step: 195850 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:43:36,108-Speed 2632.19 samples/sec Loss 10.3776 LearningRate 0.0584 Epoch: 4 Global Step: 195860 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:43:40,858-Speed 2156.19 samples/sec Loss 10.4236 LearningRate 0.0584 Epoch: 4 Global Step: 195870 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:43:44,748-Speed 2633.25 samples/sec Loss 10.4972 LearningRate 0.0584 Epoch: 4 Global Step: 195880 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:43:48,638-Speed 2632.73 samples/sec Loss 10.4866 LearningRate 0.0583 Epoch: 4 Global Step: 195890 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:43:52,538-Speed 2626.51 samples/sec Loss 10.4607 LearningRate 0.0583 Epoch: 4 Global Step: 195900 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:43:56,434-Speed 2629.35 samples/sec Loss 10.3716 LearningRate 0.0583 Epoch: 4 Global Step: 195910 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:44:00,328-Speed 2630.87 samples/sec Loss 10.3816 LearningRate 0.0583 Epoch: 4 Global Step: 195920 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:44:04,223-Speed 2629.36 samples/sec Loss 10.4443 LearningRate 0.0583 Epoch: 4 Global Step: 195930 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:44:08,160-Speed 2601.24 samples/sec Loss 10.3697 LearningRate 0.0583 Epoch: 4 Global Step: 195940 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:44:12,111-Speed 2592.32 samples/sec Loss 10.4352 LearningRate 0.0583 Epoch: 4 Global Step: 195950 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:44:16,054-Speed 2598.11 samples/sec Loss 10.2621 LearningRate 0.0583 Epoch: 4 Global Step: 195960 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:44:19,952-Speed 2627.24 samples/sec Loss 10.3767 LearningRate 0.0583 Epoch: 4 Global Step: 195970 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:44:23,832-Speed 2639.79 samples/sec Loss 10.3697 LearningRate 0.0583 Epoch: 4 Global Step: 195980 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:44:27,732-Speed 2626.79 samples/sec Loss 10.6179 LearningRate 0.0583 Epoch: 4 Global Step: 195990 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:44:31,631-Speed 2627.02 samples/sec Loss 10.5081 LearningRate 0.0583 Epoch: 4 Global Step: 196000 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:44:35,524-Speed 2631.53 samples/sec Loss 10.5198 LearningRate 0.0583 Epoch: 4 Global Step: 196010 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:44:39,432-Speed 2620.87 samples/sec Loss 10.3783 LearningRate 0.0583 Epoch: 4 Global Step: 196020 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:44:43,315-Speed 2637.61 samples/sec Loss 10.5284 LearningRate 0.0583 Epoch: 4 Global Step: 196030 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:44:47,208-Speed 2631.75 samples/sec Loss 10.4940 LearningRate 0.0583 Epoch: 4 Global Step: 196040 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:44:51,110-Speed 2624.40 samples/sec Loss 10.5327 LearningRate 0.0583 Epoch: 4 Global Step: 196050 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:44:54,999-Speed 2633.42 samples/sec Loss 10.4428 LearningRate 0.0583 Epoch: 4 Global Step: 196060 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:44:58,887-Speed 2634.08 samples/sec Loss 10.4469 LearningRate 0.0583 Epoch: 4 Global Step: 196070 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:45:02,782-Speed 2630.65 samples/sec Loss 10.4707 LearningRate 0.0583 Epoch: 4 Global Step: 196080 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:45:06,673-Speed 2632.56 samples/sec Loss 10.5234 LearningRate 0.0583 Epoch: 4 Global Step: 196090 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:45:10,575-Speed 2624.55 samples/sec Loss 10.3899 LearningRate 0.0583 Epoch: 4 Global Step: 196100 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:45:14,507-Speed 2605.56 samples/sec Loss 10.3976 LearningRate 0.0583 Epoch: 4 Global Step: 196110 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:45:18,398-Speed 2632.32 samples/sec Loss 10.4360 LearningRate 0.0583 Epoch: 4 Global Step: 196120 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:45:22,288-Speed 2632.67 samples/sec Loss 10.5279 LearningRate 0.0583 Epoch: 4 Global Step: 196130 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:45:26,179-Speed 2632.46 samples/sec Loss 10.4037 LearningRate 0.0583 Epoch: 4 Global Step: 196140 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:45:30,071-Speed 2631.71 samples/sec Loss 10.4984 LearningRate 0.0583 Epoch: 4 Global Step: 196150 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:45:33,971-Speed 2626.64 samples/sec Loss 10.3996 LearningRate 0.0583 Epoch: 4 Global Step: 196160 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:45:37,867-Speed 2629.40 samples/sec Loss 10.4503 LearningRate 0.0583 Epoch: 4 Global Step: 196170 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:45:41,766-Speed 2626.81 samples/sec Loss 10.4866 LearningRate 0.0583 Epoch: 4 Global Step: 196180 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:45:45,658-Speed 2631.58 samples/sec Loss 10.3683 LearningRate 0.0583 Epoch: 4 Global Step: 196190 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:45:49,553-Speed 2630.02 samples/sec Loss 10.5584 LearningRate 0.0583 Epoch: 4 Global Step: 196200 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:45:53,456-Speed 2624.29 samples/sec Loss 10.3382 LearningRate 0.0583 Epoch: 4 Global Step: 196210 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:45:57,390-Speed 2603.01 samples/sec Loss 10.4543 LearningRate 0.0583 Epoch: 4 Global Step: 196220 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:46:01,286-Speed 2629.85 samples/sec Loss 10.3824 LearningRate 0.0583 Epoch: 4 Global Step: 196230 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:46:05,187-Speed 2625.69 samples/sec Loss 10.4958 LearningRate 0.0583 Epoch: 4 Global Step: 196240 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:46:09,084-Speed 2628.73 samples/sec Loss 10.4756 LearningRate 0.0583 Epoch: 4 Global Step: 196250 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:46:12,997-Speed 2617.12 samples/sec Loss 10.4927 LearningRate 0.0583 Epoch: 4 Global Step: 196260 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:46:16,890-Speed 2631.57 samples/sec Loss 10.5423 LearningRate 0.0583 Epoch: 4 Global Step: 196270 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:46:20,782-Speed 2631.76 samples/sec Loss 10.4132 LearningRate 0.0583 Epoch: 4 Global Step: 196280 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:46:24,676-Speed 2629.98 samples/sec Loss 10.5662 LearningRate 0.0583 Epoch: 4 Global Step: 196290 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:46:28,599-Speed 2610.62 samples/sec Loss 10.6715 LearningRate 0.0583 Epoch: 4 Global Step: 196300 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:46:32,493-Speed 2630.33 samples/sec Loss 10.3744 LearningRate 0.0583 Epoch: 4 Global Step: 196310 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:46:36,388-Speed 2630.54 samples/sec Loss 10.4869 LearningRate 0.0583 Epoch: 4 Global Step: 196320 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:46:40,263-Speed 2642.97 samples/sec Loss 10.5147 LearningRate 0.0583 Epoch: 4 Global Step: 196330 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:46:44,162-Speed 2627.03 samples/sec Loss 10.3240 LearningRate 0.0583 Epoch: 4 Global Step: 196340 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:46:48,060-Speed 2628.28 samples/sec Loss 10.5427 LearningRate 0.0583 Epoch: 4 Global Step: 196350 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:46:51,962-Speed 2625.31 samples/sec Loss 10.4806 LearningRate 0.0583 Epoch: 4 Global Step: 196360 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:46:55,860-Speed 2627.32 samples/sec Loss 10.2553 LearningRate 0.0583 Epoch: 4 Global Step: 196370 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:46:59,763-Speed 2624.54 samples/sec Loss 10.5110 LearningRate 0.0583 Epoch: 4 Global Step: 196380 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:47:03,663-Speed 2626.36 samples/sec Loss 10.2777 LearningRate 0.0583 Epoch: 4 Global Step: 196390 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:47:07,564-Speed 2625.41 samples/sec Loss 10.5187 LearningRate 0.0583 Epoch: 4 Global Step: 196400 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:47:11,458-Speed 2631.06 samples/sec Loss 10.4711 LearningRate 0.0583 Epoch: 4 Global Step: 196410 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:47:15,359-Speed 2625.57 samples/sec Loss 10.4209 LearningRate 0.0583 Epoch: 4 Global Step: 196420 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:47:19,261-Speed 2624.77 samples/sec Loss 10.4228 LearningRate 0.0583 Epoch: 4 Global Step: 196430 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:47:23,154-Speed 2630.51 samples/sec Loss 10.3993 LearningRate 0.0582 Epoch: 4 Global Step: 196440 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:47:27,030-Speed 2642.66 samples/sec Loss 10.3544 LearningRate 0.0582 Epoch: 4 Global Step: 196450 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:47:30,923-Speed 2631.26 samples/sec Loss 10.5714 LearningRate 0.0582 Epoch: 4 Global Step: 196460 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:47:34,818-Speed 2629.69 samples/sec Loss 10.4527 LearningRate 0.0582 Epoch: 4 Global Step: 196470 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:47:38,727-Speed 2619.96 samples/sec Loss 10.3883 LearningRate 0.0582 Epoch: 4 Global Step: 196480 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:47:42,610-Speed 2638.36 samples/sec Loss 10.4453 LearningRate 0.0582 Epoch: 4 Global Step: 196490 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:47:46,509-Speed 2627.33 samples/sec Loss 10.3745 LearningRate 0.0582 Epoch: 4 Global Step: 196500 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:47:50,411-Speed 2624.52 samples/sec Loss 10.3026 LearningRate 0.0582 Epoch: 4 Global Step: 196510 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:47:54,351-Speed 2600.15 samples/sec Loss 10.5075 LearningRate 0.0582 Epoch: 4 Global Step: 196520 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:47:58,252-Speed 2625.50 samples/sec Loss 10.4627 LearningRate 0.0582 Epoch: 4 Global Step: 196530 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:48:02,146-Speed 2630.46 samples/sec Loss 10.3505 LearningRate 0.0582 Epoch: 4 Global Step: 196540 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:48:06,075-Speed 2607.27 samples/sec Loss 10.4308 LearningRate 0.0582 Epoch: 4 Global Step: 196550 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:48:09,977-Speed 2624.74 samples/sec Loss 10.5093 LearningRate 0.0582 Epoch: 4 Global Step: 196560 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:48:13,869-Speed 2631.25 samples/sec Loss 10.3869 LearningRate 0.0582 Epoch: 4 Global Step: 196570 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:48:17,833-Speed 2584.80 samples/sec Loss 10.3372 LearningRate 0.0582 Epoch: 4 Global Step: 196580 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:48:21,729-Speed 2629.43 samples/sec Loss 10.3753 LearningRate 0.0582 Epoch: 4 Global Step: 196590 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:48:25,641-Speed 2617.99 samples/sec Loss 10.3114 LearningRate 0.0582 Epoch: 4 Global Step: 196600 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:48:29,536-Speed 2630.17 samples/sec Loss 10.3678 LearningRate 0.0582 Epoch: 4 Global Step: 196610 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:48:33,425-Speed 2633.34 samples/sec Loss 10.4687 LearningRate 0.0582 Epoch: 4 Global Step: 196620 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:48:37,317-Speed 2631.31 samples/sec Loss 10.4292 LearningRate 0.0582 Epoch: 4 Global Step: 196630 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:48:41,209-Speed 2631.72 samples/sec Loss 10.2840 LearningRate 0.0582 Epoch: 4 Global Step: 196640 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:48:45,106-Speed 2628.50 samples/sec Loss 10.5408 LearningRate 0.0582 Epoch: 4 Global Step: 196650 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:48:48,995-Speed 2633.94 samples/sec Loss 10.4216 LearningRate 0.0582 Epoch: 4 Global Step: 196660 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:48:52,887-Speed 2631.84 samples/sec Loss 10.2949 LearningRate 0.0582 Epoch: 4 Global Step: 196670 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:48:56,781-Speed 2629.91 samples/sec Loss 10.3655 LearningRate 0.0582 Epoch: 4 Global Step: 196680 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:49:00,672-Speed 2632.96 samples/sec Loss 10.3716 LearningRate 0.0582 Epoch: 4 Global Step: 196690 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:49:04,549-Speed 2641.41 samples/sec Loss 10.2886 LearningRate 0.0582 Epoch: 4 Global Step: 196700 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:49:08,444-Speed 2629.77 samples/sec Loss 10.4190 LearningRate 0.0582 Epoch: 4 Global Step: 196710 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:49:12,340-Speed 2628.43 samples/sec Loss 10.2947 LearningRate 0.0582 Epoch: 4 Global Step: 196720 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:49:16,235-Speed 2630.18 samples/sec Loss 10.2878 LearningRate 0.0582 Epoch: 4 Global Step: 196730 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:49:20,135-Speed 2626.55 samples/sec Loss 10.4809 LearningRate 0.0582 Epoch: 4 Global Step: 196740 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:49:24,025-Speed 2633.40 samples/sec Loss 10.4769 LearningRate 0.0582 Epoch: 4 Global Step: 196750 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:49:27,915-Speed 2632.76 samples/sec Loss 10.3076 LearningRate 0.0582 Epoch: 4 Global Step: 196760 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:49:31,831-Speed 2615.60 samples/sec Loss 10.4215 LearningRate 0.0582 Epoch: 4 Global Step: 196770 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:49:35,710-Speed 2640.96 samples/sec Loss 10.5399 LearningRate 0.0582 Epoch: 4 Global Step: 196780 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:49:39,614-Speed 2623.78 samples/sec Loss 10.3658 LearningRate 0.0582 Epoch: 4 Global Step: 196790 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:49:43,575-Speed 2585.72 samples/sec Loss 10.4174 LearningRate 0.0582 Epoch: 4 Global Step: 196800 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:49:47,442-Speed 2648.80 samples/sec Loss 10.9131 LearningRate 0.0582 Epoch: 4 Global Step: 196810 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:49:51,337-Speed 2628.89 samples/sec Loss 11.1056 LearningRate 0.0582 Epoch: 4 Global Step: 196820 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:49:55,234-Speed 2629.11 samples/sec Loss 10.6958 LearningRate 0.0582 Epoch: 4 Global Step: 196830 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:49:59,139-Speed 2622.97 samples/sec Loss 10.6166 LearningRate 0.0582 Epoch: 4 Global Step: 196840 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:50:03,065-Speed 2608.59 samples/sec Loss 10.5500 LearningRate 0.0582 Epoch: 4 Global Step: 196850 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:50:06,957-Speed 2632.25 samples/sec Loss 10.4917 LearningRate 0.0582 Epoch: 4 Global Step: 196860 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:50:10,846-Speed 2633.08 samples/sec Loss 10.5279 LearningRate 0.0582 Epoch: 4 Global Step: 196870 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:50:14,735-Speed 2634.11 samples/sec Loss 10.4238 LearningRate 0.0582 Epoch: 4 Global Step: 196880 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:50:18,806-Speed 2515.62 samples/sec Loss 10.3944 LearningRate 0.0582 Epoch: 4 Global Step: 196890 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:50:22,728-Speed 2612.18 samples/sec Loss 10.4822 LearningRate 0.0582 Epoch: 4 Global Step: 196900 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:50:26,617-Speed 2633.72 samples/sec Loss 10.5178 LearningRate 0.0582 Epoch: 4 Global Step: 196910 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:50:30,505-Speed 2634.62 samples/sec Loss 10.4282 LearningRate 0.0582 Epoch: 4 Global Step: 196920 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:50:34,399-Speed 2630.16 samples/sec Loss 10.4445 LearningRate 0.0582 Epoch: 4 Global Step: 196930 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:50:38,290-Speed 2633.98 samples/sec Loss 10.4601 LearningRate 0.0582 Epoch: 4 Global Step: 196940 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:50:42,196-Speed 2623.19 samples/sec Loss 10.6059 LearningRate 0.0582 Epoch: 4 Global Step: 196950 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:50:46,103-Speed 2621.07 samples/sec Loss 10.5952 LearningRate 0.0582 Epoch: 4 Global Step: 196960 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:50:50,022-Speed 2614.60 samples/sec Loss 10.4728 LearningRate 0.0582 Epoch: 4 Global Step: 196970 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:50:53,945-Speed 2610.92 samples/sec Loss 10.3693 LearningRate 0.0581 Epoch: 4 Global Step: 196980 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:50:57,839-Speed 2630.59 samples/sec Loss 10.3948 LearningRate 0.0581 Epoch: 4 Global Step: 196990 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:51:01,730-Speed 2631.78 samples/sec Loss 10.4573 LearningRate 0.0581 Epoch: 4 Global Step: 197000 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:51:05,627-Speed 2628.82 samples/sec Loss 10.5263 LearningRate 0.0581 Epoch: 4 Global Step: 197010 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:51:09,516-Speed 2633.27 samples/sec Loss 10.6233 LearningRate 0.0581 Epoch: 4 Global Step: 197020 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:51:13,409-Speed 2631.09 samples/sec Loss 10.3667 LearningRate 0.0581 Epoch: 4 Global Step: 197030 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:51:17,301-Speed 2632.02 samples/sec Loss 10.3658 LearningRate 0.0581 Epoch: 4 Global Step: 197040 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:51:21,194-Speed 2630.76 samples/sec Loss 10.4213 LearningRate 0.0581 Epoch: 4 Global Step: 197050 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:51:25,090-Speed 2628.78 samples/sec Loss 10.4672 LearningRate 0.0581 Epoch: 4 Global Step: 197060 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:51:28,981-Speed 2632.31 samples/sec Loss 10.4857 LearningRate 0.0581 Epoch: 4 Global Step: 197070 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:51:32,873-Speed 2631.88 samples/sec Loss 10.3606 LearningRate 0.0581 Epoch: 4 Global Step: 197080 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:51:36,766-Speed 2631.01 samples/sec Loss 10.3250 LearningRate 0.0581 Epoch: 4 Global Step: 197090 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:51:40,658-Speed 2631.30 samples/sec Loss 10.5526 LearningRate 0.0581 Epoch: 4 Global Step: 197100 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:51:44,554-Speed 2629.50 samples/sec Loss 10.5863 LearningRate 0.0581 Epoch: 4 Global Step: 197110 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:51:48,469-Speed 2616.27 samples/sec Loss 10.4409 LearningRate 0.0581 Epoch: 4 Global Step: 197120 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:51:52,361-Speed 2631.38 samples/sec Loss 10.4660 LearningRate 0.0581 Epoch: 4 Global Step: 197130 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:51:56,255-Speed 2630.39 samples/sec Loss 10.4518 LearningRate 0.0581 Epoch: 4 Global Step: 197140 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:52:00,147-Speed 2631.43 samples/sec Loss 10.3312 LearningRate 0.0581 Epoch: 4 Global Step: 197150 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:52:04,038-Speed 2632.12 samples/sec Loss 10.3926 LearningRate 0.0581 Epoch: 4 Global Step: 197160 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:52:07,930-Speed 2631.75 samples/sec Loss 10.4467 LearningRate 0.0581 Epoch: 4 Global Step: 197170 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:52:11,821-Speed 2632.76 samples/sec Loss 10.3357 LearningRate 0.0581 Epoch: 4 Global Step: 197180 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:52:15,712-Speed 2631.88 samples/sec Loss 10.4009 LearningRate 0.0581 Epoch: 4 Global Step: 197190 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:52:19,611-Speed 2627.14 samples/sec Loss 10.4595 LearningRate 0.0581 Epoch: 4 Global Step: 197200 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:52:23,503-Speed 2631.86 samples/sec Loss 10.4756 LearningRate 0.0581 Epoch: 4 Global Step: 197210 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:52:27,399-Speed 2628.85 samples/sec Loss 10.3292 LearningRate 0.0581 Epoch: 4 Global Step: 197220 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:52:31,297-Speed 2627.13 samples/sec Loss 10.3188 LearningRate 0.0581 Epoch: 4 Global Step: 197230 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:52:35,202-Speed 2623.12 samples/sec Loss 10.3516 LearningRate 0.0581 Epoch: 4 Global Step: 197240 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:52:39,097-Speed 2629.81 samples/sec Loss 10.2428 LearningRate 0.0581 Epoch: 4 Global Step: 197250 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:52:42,988-Speed 2632.37 samples/sec Loss 10.3009 LearningRate 0.0581 Epoch: 4 Global Step: 197260 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:52:46,879-Speed 2631.84 samples/sec Loss 10.4962 LearningRate 0.0581 Epoch: 4 Global Step: 197270 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:52:50,772-Speed 2631.43 samples/sec Loss 10.3290 LearningRate 0.0581 Epoch: 4 Global Step: 197280 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:52:54,671-Speed 2627.41 samples/sec Loss 10.4644 LearningRate 0.0581 Epoch: 4 Global Step: 197290 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:52:58,563-Speed 2631.47 samples/sec Loss 10.3060 LearningRate 0.0581 Epoch: 4 Global Step: 197300 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:53:02,457-Speed 2630.41 samples/sec Loss 10.3693 LearningRate 0.0581 Epoch: 4 Global Step: 197310 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:53:06,337-Speed 2639.45 samples/sec Loss 10.5333 LearningRate 0.0581 Epoch: 4 Global Step: 197320 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:53:10,231-Speed 2630.32 samples/sec Loss 10.4675 LearningRate 0.0581 Epoch: 4 Global Step: 197330 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:53:14,122-Speed 2632.04 samples/sec Loss 10.3704 LearningRate 0.0581 Epoch: 4 Global Step: 197340 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:53:18,036-Speed 2617.59 samples/sec Loss 10.3461 LearningRate 0.0581 Epoch: 4 Global Step: 197350 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:53:21,927-Speed 2632.42 samples/sec Loss 10.4917 LearningRate 0.0581 Epoch: 4 Global Step: 197360 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:53:25,819-Speed 2631.45 samples/sec Loss 10.4343 LearningRate 0.0581 Epoch: 4 Global Step: 197370 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:53:29,726-Speed 2621.52 samples/sec Loss 10.4759 LearningRate 0.0581 Epoch: 4 Global Step: 197380 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:53:33,618-Speed 2631.72 samples/sec Loss 10.3831 LearningRate 0.0581 Epoch: 4 Global Step: 197390 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:53:37,507-Speed 2633.70 samples/sec Loss 10.3751 LearningRate 0.0581 Epoch: 4 Global Step: 197400 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:53:41,399-Speed 2631.89 samples/sec Loss 10.4450 LearningRate 0.0581 Epoch: 4 Global Step: 197410 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:53:45,288-Speed 2634.12 samples/sec Loss 10.4758 LearningRate 0.0581 Epoch: 4 Global Step: 197420 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:53:49,180-Speed 2632.15 samples/sec Loss 10.3804 LearningRate 0.0581 Epoch: 4 Global Step: 197430 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:53:53,083-Speed 2624.23 samples/sec Loss 10.5227 LearningRate 0.0581 Epoch: 4 Global Step: 197440 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:53:56,992-Speed 2620.22 samples/sec Loss 10.2125 LearningRate 0.0581 Epoch: 4 Global Step: 197450 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:54:00,883-Speed 2632.34 samples/sec Loss 10.4716 LearningRate 0.0581 Epoch: 4 Global Step: 197460 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:54:04,782-Speed 2626.66 samples/sec Loss 10.4624 LearningRate 0.0581 Epoch: 4 Global Step: 197470 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:54:08,688-Speed 2622.40 samples/sec Loss 10.5621 LearningRate 0.0581 Epoch: 4 Global Step: 197480 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:54:12,578-Speed 2633.42 samples/sec Loss 10.4772 LearningRate 0.0581 Epoch: 4 Global Step: 197490 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:54:16,470-Speed 2631.90 samples/sec Loss 10.4322 LearningRate 0.0581 Epoch: 4 Global Step: 197500 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:54:20,360-Speed 2632.98 samples/sec Loss 10.4675 LearningRate 0.0581 Epoch: 4 Global Step: 197510 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:54:24,260-Speed 2625.73 samples/sec Loss 10.4205 LearningRate 0.0580 Epoch: 4 Global Step: 197520 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:54:28,156-Speed 2629.41 samples/sec Loss 10.2901 LearningRate 0.0580 Epoch: 4 Global Step: 197530 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:54:31,976-Speed 2681.76 samples/sec Loss 11.1399 LearningRate 0.0580 Epoch: 4 Global Step: 197540 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:54:35,870-Speed 2630.00 samples/sec Loss 10.6462 LearningRate 0.0580 Epoch: 4 Global Step: 197550 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:54:39,768-Speed 2627.20 samples/sec Loss 10.3412 LearningRate 0.0580 Epoch: 4 Global Step: 197560 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:54:43,768-Speed 2561.02 samples/sec Loss 10.3980 LearningRate 0.0580 Epoch: 4 Global Step: 197570 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:54:47,692-Speed 2611.07 samples/sec Loss 10.4657 LearningRate 0.0580 Epoch: 4 Global Step: 197580 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:54:51,582-Speed 2632.34 samples/sec Loss 10.3875 LearningRate 0.0580 Epoch: 4 Global Step: 197590 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:54:55,479-Speed 2628.35 samples/sec Loss 10.4700 LearningRate 0.0580 Epoch: 4 Global Step: 197600 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:54:59,477-Speed 2562.33 samples/sec Loss 10.4853 LearningRate 0.0580 Epoch: 4 Global Step: 197610 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:55:03,550-Speed 2514.67 samples/sec Loss 10.3904 LearningRate 0.0580 Epoch: 4 Global Step: 197620 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:55:07,482-Speed 2604.58 samples/sec Loss 10.5059 LearningRate 0.0580 Epoch: 4 Global Step: 197630 Fp16 Grad Scale: 8192 Required: 71 hours
Training: 2022-04-13 17:55:11,382-Speed 2626.37 samples/sec Loss 10.3158 LearningRate 0.0580 Epoch: 4 Global Step: 197640 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:55:15,270-Speed 2634.77 samples/sec Loss 10.3801 LearningRate 0.0580 Epoch: 4 Global Step: 197650 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:55:19,228-Speed 2587.71 samples/sec Loss 10.4093 LearningRate 0.0580 Epoch: 4 Global Step: 197660 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:55:23,197-Speed 2580.23 samples/sec Loss 10.3120 LearningRate 0.0580 Epoch: 4 Global Step: 197670 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:55:27,113-Speed 2616.12 samples/sec Loss 10.3512 LearningRate 0.0580 Epoch: 4 Global Step: 197680 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:55:31,006-Speed 2631.10 samples/sec Loss 10.3528 LearningRate 0.0580 Epoch: 4 Global Step: 197690 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:55:34,913-Speed 2621.19 samples/sec Loss 10.2624 LearningRate 0.0580 Epoch: 4 Global Step: 197700 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:55:38,819-Speed 2622.51 samples/sec Loss 10.4612 LearningRate 0.0580 Epoch: 4 Global Step: 197710 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:55:42,715-Speed 2629.19 samples/sec Loss 10.5356 LearningRate 0.0580 Epoch: 4 Global Step: 197720 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:55:46,604-Speed 2633.14 samples/sec Loss 10.3982 LearningRate 0.0580 Epoch: 4 Global Step: 197730 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:55:50,498-Speed 2630.54 samples/sec Loss 10.3856 LearningRate 0.0580 Epoch: 4 Global Step: 197740 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:55:54,406-Speed 2621.09 samples/sec Loss 10.5326 LearningRate 0.0580 Epoch: 4 Global Step: 197750 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:55:58,308-Speed 2624.66 samples/sec Loss 10.4146 LearningRate 0.0580 Epoch: 4 Global Step: 197760 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:56:02,203-Speed 2629.97 samples/sec Loss 10.2138 LearningRate 0.0580 Epoch: 4 Global Step: 197770 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:56:06,099-Speed 2629.14 samples/sec Loss 10.4380 LearningRate 0.0580 Epoch: 4 Global Step: 197780 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:56:09,992-Speed 2631.02 samples/sec Loss 10.4261 LearningRate 0.0580 Epoch: 4 Global Step: 197790 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:56:13,888-Speed 2628.67 samples/sec Loss 10.4682 LearningRate 0.0580 Epoch: 4 Global Step: 197800 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:56:17,785-Speed 2629.18 samples/sec Loss 10.3841 LearningRate 0.0580 Epoch: 4 Global Step: 197810 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:56:21,674-Speed 2633.27 samples/sec Loss 10.3649 LearningRate 0.0580 Epoch: 4 Global Step: 197820 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:56:25,571-Speed 2628.37 samples/sec Loss 10.4876 LearningRate 0.0580 Epoch: 4 Global Step: 197830 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:56:29,467-Speed 2629.08 samples/sec Loss 10.3514 LearningRate 0.0580 Epoch: 4 Global Step: 197840 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:56:33,358-Speed 2632.96 samples/sec Loss 10.4595 LearningRate 0.0580 Epoch: 4 Global Step: 197850 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:56:37,249-Speed 2632.42 samples/sec Loss 10.4034 LearningRate 0.0580 Epoch: 4 Global Step: 197860 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:56:41,144-Speed 2629.89 samples/sec Loss 10.2291 LearningRate 0.0580 Epoch: 4 Global Step: 197870 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:56:45,035-Speed 2632.25 samples/sec Loss 10.4837 LearningRate 0.0580 Epoch: 4 Global Step: 197880 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:56:48,927-Speed 2631.40 samples/sec Loss 10.4227 LearningRate 0.0580 Epoch: 4 Global Step: 197890 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:56:52,854-Speed 2608.80 samples/sec Loss 10.5299 LearningRate 0.0580 Epoch: 4 Global Step: 197900 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:56:56,750-Speed 2629.03 samples/sec Loss 10.4116 LearningRate 0.0580 Epoch: 4 Global Step: 197910 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:57:00,659-Speed 2619.87 samples/sec Loss 10.4247 LearningRate 0.0580 Epoch: 4 Global Step: 197920 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:57:04,554-Speed 2629.82 samples/sec Loss 10.4715 LearningRate 0.0580 Epoch: 4 Global Step: 197930 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:57:08,447-Speed 2631.45 samples/sec Loss 10.2220 LearningRate 0.0580 Epoch: 4 Global Step: 197940 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:57:12,346-Speed 2627.16 samples/sec Loss 10.3490 LearningRate 0.0580 Epoch: 4 Global Step: 197950 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:57:16,241-Speed 2629.44 samples/sec Loss 10.4925 LearningRate 0.0580 Epoch: 4 Global Step: 197960 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:57:20,141-Speed 2626.52 samples/sec Loss 10.4743 LearningRate 0.0580 Epoch: 4 Global Step: 197970 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:57:24,042-Speed 2625.66 samples/sec Loss 10.4589 LearningRate 0.0580 Epoch: 4 Global Step: 197980 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:57:27,934-Speed 2631.49 samples/sec Loss 10.4698 LearningRate 0.0580 Epoch: 4 Global Step: 197990 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:57:31,858-Speed 2610.04 samples/sec Loss 10.4029 LearningRate 0.0580 Epoch: 4 Global Step: 198000 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:57:35,810-Speed 2592.47 samples/sec Loss 10.4302 LearningRate 0.0580 Epoch: 4 Global Step: 198010 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:57:39,821-Speed 2553.50 samples/sec Loss 10.3776 LearningRate 0.0580 Epoch: 4 Global Step: 198020 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:57:43,716-Speed 2630.50 samples/sec Loss 10.2587 LearningRate 0.0580 Epoch: 4 Global Step: 198030 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:57:47,622-Speed 2621.86 samples/sec Loss 10.4220 LearningRate 0.0580 Epoch: 4 Global Step: 198040 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:57:51,516-Speed 2630.27 samples/sec Loss 10.4292 LearningRate 0.0580 Epoch: 4 Global Step: 198050 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:57:55,413-Speed 2628.48 samples/sec Loss 10.2373 LearningRate 0.0580 Epoch: 4 Global Step: 198060 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 17:57:59,297-Speed 2636.93 samples/sec Loss 10.4108 LearningRate 0.0579 Epoch: 4 Global Step: 198070 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:58:03,188-Speed 2632.17 samples/sec Loss 10.4480 LearningRate 0.0579 Epoch: 4 Global Step: 198080 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:58:07,108-Speed 2613.11 samples/sec Loss 10.3403 LearningRate 0.0579 Epoch: 4 Global Step: 198090 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:58:11,117-Speed 2555.08 samples/sec Loss 10.3406 LearningRate 0.0579 Epoch: 4 Global Step: 198100 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:58:15,019-Speed 2625.44 samples/sec Loss 10.4041 LearningRate 0.0579 Epoch: 4 Global Step: 198110 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:58:19,031-Speed 2552.65 samples/sec Loss 10.4557 LearningRate 0.0579 Epoch: 4 Global Step: 198120 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:58:22,924-Speed 2632.49 samples/sec Loss 10.2965 LearningRate 0.0579 Epoch: 4 Global Step: 198130 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:58:26,818-Speed 2629.76 samples/sec Loss 10.4455 LearningRate 0.0579 Epoch: 4 Global Step: 198140 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:58:30,718-Speed 2626.69 samples/sec Loss 10.5091 LearningRate 0.0579 Epoch: 4 Global Step: 198150 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 17:58:34,599-Speed 2639.15 samples/sec Loss 10.4013 LearningRate 0.0579 Epoch: 4 Global Step: 198160 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:58:38,493-Speed 2631.02 samples/sec Loss 10.5431 LearningRate 0.0579 Epoch: 4 Global Step: 198170 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:58:42,387-Speed 2630.13 samples/sec Loss 10.4734 LearningRate 0.0579 Epoch: 4 Global Step: 198180 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:58:46,280-Speed 2631.65 samples/sec Loss 10.5082 LearningRate 0.0579 Epoch: 4 Global Step: 198190 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:58:50,170-Speed 2632.84 samples/sec Loss 10.4451 LearningRate 0.0579 Epoch: 4 Global Step: 198200 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:58:54,097-Speed 2608.51 samples/sec Loss 10.4188 LearningRate 0.0579 Epoch: 4 Global Step: 198210 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:58:57,992-Speed 2629.77 samples/sec Loss 10.3816 LearningRate 0.0579 Epoch: 4 Global Step: 198220 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:59:01,882-Speed 2632.98 samples/sec Loss 10.3799 LearningRate 0.0579 Epoch: 4 Global Step: 198230 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 17:59:05,756-Speed 2644.23 samples/sec Loss 10.5500 LearningRate 0.0579 Epoch: 4 Global Step: 198240 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:59:09,657-Speed 2625.57 samples/sec Loss 10.5172 LearningRate 0.0579 Epoch: 4 Global Step: 198250 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:59:13,556-Speed 2627.28 samples/sec Loss 10.3226 LearningRate 0.0579 Epoch: 4 Global Step: 198260 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:59:17,460-Speed 2624.03 samples/sec Loss 10.4706 LearningRate 0.0579 Epoch: 4 Global Step: 198270 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:59:21,358-Speed 2627.46 samples/sec Loss 10.5123 LearningRate 0.0579 Epoch: 4 Global Step: 198280 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:59:25,260-Speed 2624.98 samples/sec Loss 10.3765 LearningRate 0.0579 Epoch: 4 Global Step: 198290 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:59:29,159-Speed 2627.23 samples/sec Loss 10.4044 LearningRate 0.0579 Epoch: 4 Global Step: 198300 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:59:33,093-Speed 2603.46 samples/sec Loss 10.4614 LearningRate 0.0579 Epoch: 4 Global Step: 198310 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:59:37,221-Speed 2480.76 samples/sec Loss 10.4310 LearningRate 0.0579 Epoch: 4 Global Step: 198320 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:59:41,120-Speed 2627.24 samples/sec Loss 10.2911 LearningRate 0.0579 Epoch: 4 Global Step: 198330 Fp16 Grad Scale: 16384 Required: 71 hours
Training: 2022-04-13 17:59:45,028-Speed 2620.55 samples/sec Loss 10.4362 LearningRate 0.0579 Epoch: 4 Global Step: 198340 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:59:48,942-Speed 2617.63 samples/sec Loss 10.3099 LearningRate 0.0579 Epoch: 4 Global Step: 198350 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:59:52,842-Speed 2626.37 samples/sec Loss 10.4934 LearningRate 0.0579 Epoch: 4 Global Step: 198360 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 17:59:56,740-Speed 2627.33 samples/sec Loss 10.1897 LearningRate 0.0579 Epoch: 4 Global Step: 198370 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 18:00:00,632-Speed 2631.86 samples/sec Loss 10.2833 LearningRate 0.0579 Epoch: 4 Global Step: 198380 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 18:00:04,525-Speed 2630.98 samples/sec Loss 10.3524 LearningRate 0.0579 Epoch: 4 Global Step: 198390 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 18:00:08,421-Speed 2628.72 samples/sec Loss 10.3483 LearningRate 0.0579 Epoch: 4 Global Step: 198400 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 18:00:12,330-Speed 2620.35 samples/sec Loss 10.4928 LearningRate 0.0579 Epoch: 4 Global Step: 198410 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 18:00:16,226-Speed 2629.52 samples/sec Loss 10.5135 LearningRate 0.0579 Epoch: 4 Global Step: 198420 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 18:00:20,120-Speed 2630.38 samples/sec Loss 10.4320 LearningRate 0.0579 Epoch: 4 Global Step: 198430 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 18:00:24,030-Speed 2619.56 samples/sec Loss 10.4642 LearningRate 0.0579 Epoch: 4 Global Step: 198440 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:00:27,927-Speed 2628.37 samples/sec Loss 10.5113 LearningRate 0.0579 Epoch: 4 Global Step: 198450 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:00:31,825-Speed 2627.99 samples/sec Loss 10.3928 LearningRate 0.0579 Epoch: 4 Global Step: 198460 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:00:35,714-Speed 2632.96 samples/sec Loss 10.3958 LearningRate 0.0579 Epoch: 4 Global Step: 198470 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:00:39,616-Speed 2625.06 samples/sec Loss 10.4309 LearningRate 0.0579 Epoch: 4 Global Step: 198480 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:00:43,509-Speed 2631.70 samples/sec Loss 10.4315 LearningRate 0.0579 Epoch: 4 Global Step: 198490 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:00:47,406-Speed 2628.04 samples/sec Loss 10.3515 LearningRate 0.0579 Epoch: 4 Global Step: 198500 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:00:51,319-Speed 2617.87 samples/sec Loss 10.3621 LearningRate 0.0579 Epoch: 4 Global Step: 198510 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:00:55,218-Speed 2626.56 samples/sec Loss 10.4952 LearningRate 0.0579 Epoch: 4 Global Step: 198520 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:00:59,111-Speed 2631.96 samples/sec Loss 10.4245 LearningRate 0.0579 Epoch: 4 Global Step: 198530 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:01:03,001-Speed 2632.55 samples/sec Loss 10.3728 LearningRate 0.0579 Epoch: 4 Global Step: 198540 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:01:06,898-Speed 2627.98 samples/sec Loss 10.2532 LearningRate 0.0579 Epoch: 4 Global Step: 198550 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:01:10,792-Speed 2630.64 samples/sec Loss 10.4546 LearningRate 0.0579 Epoch: 4 Global Step: 198560 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:01:14,691-Speed 2626.39 samples/sec Loss 10.4405 LearningRate 0.0579 Epoch: 4 Global Step: 198570 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:01:18,586-Speed 2630.36 samples/sec Loss 10.3963 LearningRate 0.0579 Epoch: 4 Global Step: 198580 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:01:22,479-Speed 2631.02 samples/sec Loss 10.5806 LearningRate 0.0579 Epoch: 4 Global Step: 198590 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:01:26,369-Speed 2632.51 samples/sec Loss 10.4830 LearningRate 0.0579 Epoch: 4 Global Step: 198600 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:01:30,262-Speed 2630.89 samples/sec Loss 10.3835 LearningRate 0.0578 Epoch: 4 Global Step: 198610 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:01:34,155-Speed 2630.82 samples/sec Loss 10.4958 LearningRate 0.0578 Epoch: 4 Global Step: 198620 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:01:38,046-Speed 2632.07 samples/sec Loss 10.5085 LearningRate 0.0578 Epoch: 4 Global Step: 198630 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:01:41,936-Speed 2633.16 samples/sec Loss 10.4809 LearningRate 0.0578 Epoch: 4 Global Step: 198640 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:01:45,811-Speed 2643.49 samples/sec Loss 10.4385 LearningRate 0.0578 Epoch: 4 Global Step: 198650 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:01:49,708-Speed 2628.46 samples/sec Loss 10.2259 LearningRate 0.0578 Epoch: 4 Global Step: 198660 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:01:53,606-Speed 2627.52 samples/sec Loss 10.3577 LearningRate 0.0578 Epoch: 4 Global Step: 198670 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:01:57,500-Speed 2630.63 samples/sec Loss 10.4017 LearningRate 0.0578 Epoch: 4 Global Step: 198680 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:02:01,406-Speed 2622.02 samples/sec Loss 10.4402 LearningRate 0.0578 Epoch: 4 Global Step: 198690 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:02:05,304-Speed 2626.99 samples/sec Loss 10.4360 LearningRate 0.0578 Epoch: 4 Global Step: 198700 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:02:09,195-Speed 2632.52 samples/sec Loss 10.3393 LearningRate 0.0578 Epoch: 4 Global Step: 198710 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:02:13,096-Speed 2625.64 samples/sec Loss 10.3022 LearningRate 0.0578 Epoch: 4 Global Step: 198720 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:02:16,991-Speed 2629.10 samples/sec Loss 10.4760 LearningRate 0.0578 Epoch: 4 Global Step: 198730 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:02:20,882-Speed 2632.06 samples/sec Loss 10.3442 LearningRate 0.0578 Epoch: 4 Global Step: 198740 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:02:24,780-Speed 2628.13 samples/sec Loss 10.3906 LearningRate 0.0578 Epoch: 4 Global Step: 198750 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:02:28,673-Speed 2631.24 samples/sec Loss 10.2228 LearningRate 0.0578 Epoch: 4 Global Step: 198760 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:02:32,567-Speed 2630.16 samples/sec Loss 10.4550 LearningRate 0.0578 Epoch: 4 Global Step: 198770 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:02:36,502-Speed 2602.74 samples/sec Loss 10.2655 LearningRate 0.0578 Epoch: 4 Global Step: 198780 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:02:40,403-Speed 2625.24 samples/sec Loss 10.5210 LearningRate 0.0578 Epoch: 4 Global Step: 198790 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:02:44,306-Speed 2624.37 samples/sec Loss 10.4149 LearningRate 0.0578 Epoch: 4 Global Step: 198800 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:02:48,213-Speed 2621.80 samples/sec Loss 10.4094 LearningRate 0.0578 Epoch: 4 Global Step: 198810 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:02:52,107-Speed 2630.28 samples/sec Loss 10.3558 LearningRate 0.0578 Epoch: 4 Global Step: 198820 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:02:56,007-Speed 2626.21 samples/sec Loss 10.2301 LearningRate 0.0578 Epoch: 4 Global Step: 198830 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:02:59,918-Speed 2618.70 samples/sec Loss 10.3536 LearningRate 0.0578 Epoch: 4 Global Step: 198840 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:03:03,800-Speed 2638.70 samples/sec Loss 10.4048 LearningRate 0.0578 Epoch: 4 Global Step: 198850 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:03:07,701-Speed 2625.27 samples/sec Loss 10.3115 LearningRate 0.0578 Epoch: 4 Global Step: 198860 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:03:11,586-Speed 2636.58 samples/sec Loss 10.4263 LearningRate 0.0578 Epoch: 4 Global Step: 198870 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:03:15,489-Speed 2624.13 samples/sec Loss 10.3710 LearningRate 0.0578 Epoch: 4 Global Step: 198880 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:03:19,389-Speed 2626.26 samples/sec Loss 10.3086 LearningRate 0.0578 Epoch: 4 Global Step: 198890 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:03:23,294-Speed 2623.34 samples/sec Loss 10.4092 LearningRate 0.0578 Epoch: 4 Global Step: 198900 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:03:27,205-Speed 2618.61 samples/sec Loss 10.3059 LearningRate 0.0578 Epoch: 4 Global Step: 198910 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:03:31,125-Speed 2612.42 samples/sec Loss 10.5314 LearningRate 0.0578 Epoch: 4 Global Step: 198920 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:03:35,026-Speed 2625.46 samples/sec Loss 10.4327 LearningRate 0.0578 Epoch: 4 Global Step: 198930 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:03:38,931-Speed 2623.65 samples/sec Loss 10.2482 LearningRate 0.0578 Epoch: 4 Global Step: 198940 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:03:42,829-Speed 2627.53 samples/sec Loss 10.4961 LearningRate 0.0578 Epoch: 4 Global Step: 198950 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:03:46,747-Speed 2614.43 samples/sec Loss 10.3463 LearningRate 0.0578 Epoch: 4 Global Step: 198960 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:03:50,636-Speed 2633.19 samples/sec Loss 10.3763 LearningRate 0.0578 Epoch: 4 Global Step: 198970 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:03:54,550-Speed 2617.26 samples/sec Loss 10.4557 LearningRate 0.0578 Epoch: 4 Global Step: 198980 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:03:58,473-Speed 2610.80 samples/sec Loss 10.4232 LearningRate 0.0578 Epoch: 4 Global Step: 198990 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:04:02,375-Speed 2624.72 samples/sec Loss 10.3001 LearningRate 0.0578 Epoch: 4 Global Step: 199000 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:04:06,270-Speed 2629.27 samples/sec Loss 10.2978 LearningRate 0.0578 Epoch: 4 Global Step: 199010 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:04:10,172-Speed 2624.91 samples/sec Loss 10.4877 LearningRate 0.0578 Epoch: 4 Global Step: 199020 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:04:14,065-Speed 2631.18 samples/sec Loss 10.4304 LearningRate 0.0578 Epoch: 4 Global Step: 199030 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:04:17,968-Speed 2624.38 samples/sec Loss 10.4121 LearningRate 0.0578 Epoch: 4 Global Step: 199040 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:04:21,860-Speed 2632.04 samples/sec Loss 10.4827 LearningRate 0.0578 Epoch: 4 Global Step: 199050 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:04:25,754-Speed 2630.31 samples/sec Loss 10.4122 LearningRate 0.0578 Epoch: 4 Global Step: 199060 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:04:29,659-Speed 2622.37 samples/sec Loss 10.1884 LearningRate 0.0578 Epoch: 4 Global Step: 199070 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:04:33,566-Speed 2621.22 samples/sec Loss 10.3624 LearningRate 0.0578 Epoch: 4 Global Step: 199080 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:04:37,466-Speed 2626.95 samples/sec Loss 10.3697 LearningRate 0.0578 Epoch: 4 Global Step: 199090 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:04:41,360-Speed 2630.05 samples/sec Loss 10.3972 LearningRate 0.0578 Epoch: 4 Global Step: 199100 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:04:45,262-Speed 2624.80 samples/sec Loss 10.4243 LearningRate 0.0578 Epoch: 4 Global Step: 199110 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:04:49,165-Speed 2624.09 samples/sec Loss 10.2614 LearningRate 0.0578 Epoch: 4 Global Step: 199120 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:04:53,059-Speed 2631.86 samples/sec Loss 10.3766 LearningRate 0.0578 Epoch: 4 Global Step: 199130 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:04:56,953-Speed 2630.64 samples/sec Loss 10.3419 LearningRate 0.0578 Epoch: 4 Global Step: 199140 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:05:00,849-Speed 2628.95 samples/sec Loss 10.4512 LearningRate 0.0578 Epoch: 4 Global Step: 199150 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:05:04,748-Speed 2627.00 samples/sec Loss 10.3635 LearningRate 0.0577 Epoch: 4 Global Step: 199160 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:05:08,653-Speed 2622.64 samples/sec Loss 10.4523 LearningRate 0.0577 Epoch: 4 Global Step: 199170 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:05:12,553-Speed 2626.54 samples/sec Loss 10.3363 LearningRate 0.0577 Epoch: 4 Global Step: 199180 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:05:16,445-Speed 2631.01 samples/sec Loss 10.3246 LearningRate 0.0577 Epoch: 4 Global Step: 199190 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:05:20,348-Speed 2624.29 samples/sec Loss 10.3816 LearningRate 0.0577 Epoch: 4 Global Step: 199200 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:05:24,253-Speed 2623.04 samples/sec Loss 10.1874 LearningRate 0.0577 Epoch: 4 Global Step: 199210 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:05:28,145-Speed 2631.78 samples/sec Loss 10.3675 LearningRate 0.0577 Epoch: 4 Global Step: 199220 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:05:32,041-Speed 2628.66 samples/sec Loss 10.2827 LearningRate 0.0577 Epoch: 4 Global Step: 199230 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:05:35,915-Speed 2644.17 samples/sec Loss 10.5425 LearningRate 0.0577 Epoch: 4 Global Step: 199240 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:05:39,820-Speed 2622.67 samples/sec Loss 10.3043 LearningRate 0.0577 Epoch: 4 Global Step: 199250 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:05:43,724-Speed 2623.63 samples/sec Loss 10.4082 LearningRate 0.0577 Epoch: 4 Global Step: 199260 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:05:47,629-Speed 2622.62 samples/sec Loss 10.3106 LearningRate 0.0577 Epoch: 4 Global Step: 199270 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:05:51,534-Speed 2623.16 samples/sec Loss 10.3620 LearningRate 0.0577 Epoch: 4 Global Step: 199280 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:05:55,436-Speed 2624.92 samples/sec Loss 10.2897 LearningRate 0.0577 Epoch: 4 Global Step: 199290 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:05:59,328-Speed 2631.92 samples/sec Loss 10.4026 LearningRate 0.0577 Epoch: 4 Global Step: 199300 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:06:03,224-Speed 2628.28 samples/sec Loss 10.3689 LearningRate 0.0577 Epoch: 4 Global Step: 199310 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:06:07,121-Speed 2628.24 samples/sec Loss 10.5717 LearningRate 0.0577 Epoch: 4 Global Step: 199320 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:06:11,014-Speed 2631.14 samples/sec Loss 10.3764 LearningRate 0.0577 Epoch: 4 Global Step: 199330 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:06:14,922-Speed 2621.47 samples/sec Loss 10.4316 LearningRate 0.0577 Epoch: 4 Global Step: 199340 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:06:18,889-Speed 2581.88 samples/sec Loss 10.3267 LearningRate 0.0577 Epoch: 4 Global Step: 199350 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:06:22,771-Speed 2638.19 samples/sec Loss 10.4749 LearningRate 0.0577 Epoch: 4 Global Step: 199360 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:06:26,670-Speed 2627.04 samples/sec Loss 10.4410 LearningRate 0.0577 Epoch: 4 Global Step: 199370 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:06:30,571-Speed 2625.50 samples/sec Loss 10.3568 LearningRate 0.0577 Epoch: 4 Global Step: 199380 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:06:34,494-Speed 2610.65 samples/sec Loss 10.2446 LearningRate 0.0577 Epoch: 4 Global Step: 199390 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:06:38,394-Speed 2626.34 samples/sec Loss 10.3340 LearningRate 0.0577 Epoch: 4 Global Step: 199400 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:06:42,287-Speed 2631.39 samples/sec Loss 10.2946 LearningRate 0.0577 Epoch: 4 Global Step: 199410 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:06:46,181-Speed 2629.92 samples/sec Loss 10.3393 LearningRate 0.0577 Epoch: 4 Global Step: 199420 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:06:50,075-Speed 2630.38 samples/sec Loss 10.2619 LearningRate 0.0577 Epoch: 4 Global Step: 199430 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:06:53,954-Speed 2641.00 samples/sec Loss 10.4370 LearningRate 0.0577 Epoch: 4 Global Step: 199440 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:06:57,848-Speed 2630.27 samples/sec Loss 10.4986 LearningRate 0.0577 Epoch: 4 Global Step: 199450 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:07:01,738-Speed 2632.43 samples/sec Loss 10.3995 LearningRate 0.0577 Epoch: 4 Global Step: 199460 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:07:05,629-Speed 2632.58 samples/sec Loss 10.3696 LearningRate 0.0577 Epoch: 4 Global Step: 199470 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:07:09,520-Speed 2632.19 samples/sec Loss 10.4180 LearningRate 0.0577 Epoch: 4 Global Step: 199480 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:07:13,411-Speed 2632.13 samples/sec Loss 10.3220 LearningRate 0.0577 Epoch: 4 Global Step: 199490 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:07:17,301-Speed 2632.95 samples/sec Loss 10.2431 LearningRate 0.0577 Epoch: 4 Global Step: 199500 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:07:21,198-Speed 2628.60 samples/sec Loss 10.2741 LearningRate 0.0577 Epoch: 4 Global Step: 199510 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:07:25,096-Speed 2627.48 samples/sec Loss 10.4599 LearningRate 0.0577 Epoch: 4 Global Step: 199520 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:07:28,993-Speed 2628.87 samples/sec Loss 10.3936 LearningRate 0.0577 Epoch: 4 Global Step: 199530 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:07:32,884-Speed 2631.64 samples/sec Loss 10.4091 LearningRate 0.0577 Epoch: 4 Global Step: 199540 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:07:36,773-Speed 2634.05 samples/sec Loss 10.3351 LearningRate 0.0577 Epoch: 4 Global Step: 199550 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:07:40,660-Speed 2634.33 samples/sec Loss 10.3175 LearningRate 0.0577 Epoch: 4 Global Step: 199560 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:07:44,550-Speed 2633.30 samples/sec Loss 10.2832 LearningRate 0.0577 Epoch: 4 Global Step: 199570 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:07:48,441-Speed 2633.25 samples/sec Loss 10.2602 LearningRate 0.0577 Epoch: 4 Global Step: 199580 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:07:52,423-Speed 2572.11 samples/sec Loss 10.4254 LearningRate 0.0577 Epoch: 4 Global Step: 199590 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:07:56,326-Speed 2624.11 samples/sec Loss 10.2727 LearningRate 0.0577 Epoch: 4 Global Step: 199600 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:08:00,223-Speed 2628.42 samples/sec Loss 10.3611 LearningRate 0.0577 Epoch: 4 Global Step: 199610 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:08:04,145-Speed 2611.39 samples/sec Loss 10.4112 LearningRate 0.0577 Epoch: 4 Global Step: 199620 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:08:08,038-Speed 2631.26 samples/sec Loss 10.3248 LearningRate 0.0577 Epoch: 4 Global Step: 199630 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:08:11,931-Speed 2631.05 samples/sec Loss 10.3844 LearningRate 0.0577 Epoch: 4 Global Step: 199640 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:08:15,807-Speed 2642.79 samples/sec Loss 10.2962 LearningRate 0.0577 Epoch: 4 Global Step: 199650 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:08:19,724-Speed 2614.84 samples/sec Loss 10.2848 LearningRate 0.0577 Epoch: 4 Global Step: 199660 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:08:23,623-Speed 2627.44 samples/sec Loss 10.2787 LearningRate 0.0577 Epoch: 4 Global Step: 199670 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:08:27,519-Speed 2628.93 samples/sec Loss 10.4202 LearningRate 0.0577 Epoch: 4 Global Step: 199680 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:08:31,414-Speed 2629.35 samples/sec Loss 10.4731 LearningRate 0.0577 Epoch: 4 Global Step: 199690 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:08:35,312-Speed 2627.82 samples/sec Loss 10.4287 LearningRate 0.0576 Epoch: 4 Global Step: 199700 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:08:39,295-Speed 2571.84 samples/sec Loss 10.2613 LearningRate 0.0576 Epoch: 4 Global Step: 199710 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:08:43,192-Speed 2628.25 samples/sec Loss 10.2827 LearningRate 0.0576 Epoch: 4 Global Step: 199720 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:08:47,083-Speed 2632.23 samples/sec Loss 10.4764 LearningRate 0.0576 Epoch: 4 Global Step: 199730 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:08:50,989-Speed 2622.93 samples/sec Loss 10.3902 LearningRate 0.0576 Epoch: 4 Global Step: 199740 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:08:54,864-Speed 2643.47 samples/sec Loss 10.3001 LearningRate 0.0576 Epoch: 4 Global Step: 199750 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:08:58,756-Speed 2630.99 samples/sec Loss 10.3683 LearningRate 0.0576 Epoch: 4 Global Step: 199760 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:09:02,649-Speed 2631.26 samples/sec Loss 10.4276 LearningRate 0.0576 Epoch: 4 Global Step: 199770 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:09:06,542-Speed 2630.91 samples/sec Loss 10.3477 LearningRate 0.0576 Epoch: 4 Global Step: 199780 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:09:10,432-Speed 2633.05 samples/sec Loss 10.1873 LearningRate 0.0576 Epoch: 4 Global Step: 199790 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:09:14,360-Speed 2608.35 samples/sec Loss 10.4312 LearningRate 0.0576 Epoch: 4 Global Step: 199800 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:09:18,255-Speed 2629.23 samples/sec Loss 10.3294 LearningRate 0.0576 Epoch: 4 Global Step: 199810 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:09:22,147-Speed 2632.36 samples/sec Loss 10.5087 LearningRate 0.0576 Epoch: 4 Global Step: 199820 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:09:26,070-Speed 2610.66 samples/sec Loss 10.4282 LearningRate 0.0576 Epoch: 4 Global Step: 199830 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:09:29,961-Speed 2632.47 samples/sec Loss 10.3917 LearningRate 0.0576 Epoch: 4 Global Step: 199840 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:09:33,852-Speed 2632.19 samples/sec Loss 10.3286 LearningRate 0.0576 Epoch: 4 Global Step: 199850 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:09:37,745-Speed 2631.22 samples/sec Loss 10.1622 LearningRate 0.0576 Epoch: 4 Global Step: 199860 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:09:41,642-Speed 2628.49 samples/sec Loss 10.3089 LearningRate 0.0576 Epoch: 4 Global Step: 199870 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:09:45,536-Speed 2630.71 samples/sec Loss 10.3292 LearningRate 0.0576 Epoch: 4 Global Step: 199880 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:09:49,439-Speed 2623.76 samples/sec Loss 10.3811 LearningRate 0.0576 Epoch: 4 Global Step: 199890 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:09:53,336-Speed 2628.95 samples/sec Loss 10.4237 LearningRate 0.0576 Epoch: 4 Global Step: 199900 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:09:57,226-Speed 2632.95 samples/sec Loss 10.5008 LearningRate 0.0576 Epoch: 4 Global Step: 199910 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:10:01,119-Speed 2630.72 samples/sec Loss 10.3433 LearningRate 0.0576 Epoch: 4 Global Step: 199920 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:10:05,017-Speed 2627.17 samples/sec Loss 10.3834 LearningRate 0.0576 Epoch: 4 Global Step: 199930 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:10:08,912-Speed 2630.57 samples/sec Loss 10.3877 LearningRate 0.0576 Epoch: 4 Global Step: 199940 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:10:12,808-Speed 2628.82 samples/sec Loss 10.2307 LearningRate 0.0576 Epoch: 4 Global Step: 199950 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:10:16,700-Speed 2631.46 samples/sec Loss 10.3340 LearningRate 0.0576 Epoch: 4 Global Step: 199960 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:10:20,602-Speed 2625.01 samples/sec Loss 10.3875 LearningRate 0.0576 Epoch: 4 Global Step: 199970 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:10:24,502-Speed 2626.64 samples/sec Loss 10.3997 LearningRate 0.0576 Epoch: 4 Global Step: 199980 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:10:28,394-Speed 2631.97 samples/sec Loss 10.4073 LearningRate 0.0576 Epoch: 4 Global Step: 199990 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:10:32,297-Speed 2623.94 samples/sec Loss 10.2945 LearningRate 0.0576 Epoch: 4 Global Step: 200000 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:11:15,517-[lfw][200000]XNorm: 23.224185
Training: 2022-04-13 18:11:15,518-[lfw][200000]Accuracy-Flip: 0.99767+-0.00226
Training: 2022-04-13 18:11:15,518-[lfw][200000]Accuracy-Highest: 0.99783
Training: 2022-04-13 18:12:05,937-[cfp_fp][200000]XNorm: 20.991996
Training: 2022-04-13 18:12:05,938-[cfp_fp][200000]Accuracy-Flip: 0.98314+-0.00477
Training: 2022-04-13 18:12:05,939-[cfp_fp][200000]Accuracy-Highest: 0.98314
Training: 2022-04-13 18:12:49,331-[agedb_30][200000]XNorm: 23.042334
Training: 2022-04-13 18:12:49,332-[agedb_30][200000]Accuracy-Flip: 0.96933+-0.00731
Training: 2022-04-13 18:12:49,333-[agedb_30][200000]Accuracy-Highest: 0.97150
Training: 2022-04-13 18:12:53,172-Speed 72.69 samples/sec Loss 10.4043 LearningRate 0.0576 Epoch: 4 Global Step: 200010 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:12:57,039-Speed 2648.66 samples/sec Loss 10.3826 LearningRate 0.0576 Epoch: 4 Global Step: 200020 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:13:00,906-Speed 2648.98 samples/sec Loss 10.1876 LearningRate 0.0576 Epoch: 4 Global Step: 200030 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:13:04,780-Speed 2644.20 samples/sec Loss 10.3414 LearningRate 0.0576 Epoch: 4 Global Step: 200040 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:13:08,647-Speed 2648.14 samples/sec Loss 10.4036 LearningRate 0.0576 Epoch: 4 Global Step: 200050 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:13:12,534-Speed 2636.13 samples/sec Loss 10.4969 LearningRate 0.0576 Epoch: 4 Global Step: 200060 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:13:16,409-Speed 2643.51 samples/sec Loss 10.4154 LearningRate 0.0576 Epoch: 4 Global Step: 200070 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:13:20,280-Speed 2645.73 samples/sec Loss 10.3380 LearningRate 0.0576 Epoch: 4 Global Step: 200080 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:13:24,164-Speed 2637.74 samples/sec Loss 10.2099 LearningRate 0.0576 Epoch: 4 Global Step: 200090 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:13:28,040-Speed 2641.99 samples/sec Loss 10.2916 LearningRate 0.0576 Epoch: 4 Global Step: 200100 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:13:31,924-Speed 2637.72 samples/sec Loss 10.4134 LearningRate 0.0576 Epoch: 4 Global Step: 200110 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:13:35,807-Speed 2637.67 samples/sec Loss 10.3706 LearningRate 0.0576 Epoch: 4 Global Step: 200120 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:13:39,699-Speed 2631.26 samples/sec Loss 10.4044 LearningRate 0.0576 Epoch: 4 Global Step: 200130 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:13:43,609-Speed 2619.49 samples/sec Loss 10.3783 LearningRate 0.0576 Epoch: 4 Global Step: 200140 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:13:47,494-Speed 2637.22 samples/sec Loss 10.4498 LearningRate 0.0576 Epoch: 4 Global Step: 200150 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:13:51,382-Speed 2633.96 samples/sec Loss 10.4382 LearningRate 0.0576 Epoch: 4 Global Step: 200160 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:13:55,271-Speed 2634.33 samples/sec Loss 10.4895 LearningRate 0.0576 Epoch: 4 Global Step: 200170 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:13:59,173-Speed 2624.17 samples/sec Loss 10.2979 LearningRate 0.0576 Epoch: 4 Global Step: 200180 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:14:03,057-Speed 2637.23 samples/sec Loss 10.3013 LearningRate 0.0576 Epoch: 4 Global Step: 200190 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:14:06,958-Speed 2625.98 samples/sec Loss 10.3608 LearningRate 0.0576 Epoch: 4 Global Step: 200200 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:14:10,852-Speed 2630.32 samples/sec Loss 10.2863 LearningRate 0.0576 Epoch: 4 Global Step: 200210 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:14:14,745-Speed 2630.44 samples/sec Loss 10.3409 LearningRate 0.0576 Epoch: 4 Global Step: 200220 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:14:18,638-Speed 2631.03 samples/sec Loss 10.2493 LearningRate 0.0576 Epoch: 4 Global Step: 200230 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:14:22,532-Speed 2630.54 samples/sec Loss 10.2352 LearningRate 0.0576 Epoch: 4 Global Step: 200240 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:14:26,410-Speed 2640.65 samples/sec Loss 10.4536 LearningRate 0.0575 Epoch: 4 Global Step: 200250 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:14:30,284-Speed 2644.24 samples/sec Loss 10.3163 LearningRate 0.0575 Epoch: 4 Global Step: 200260 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:14:34,297-Speed 2552.47 samples/sec Loss 10.3092 LearningRate 0.0575 Epoch: 4 Global Step: 200270 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:14:38,247-Speed 2593.13 samples/sec Loss 10.2065 LearningRate 0.0575 Epoch: 4 Global Step: 200280 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:14:42,227-Speed 2573.58 samples/sec Loss 10.4271 LearningRate 0.0575 Epoch: 4 Global Step: 200290 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:14:46,298-Speed 2516.04 samples/sec Loss 10.3799 LearningRate 0.0575 Epoch: 4 Global Step: 200300 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:14:50,294-Speed 2562.52 samples/sec Loss 10.4337 LearningRate 0.0575 Epoch: 4 Global Step: 200310 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:14:54,184-Speed 2633.70 samples/sec Loss 10.3961 LearningRate 0.0575 Epoch: 4 Global Step: 200320 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:14:58,077-Speed 2630.69 samples/sec Loss 10.3465 LearningRate 0.0575 Epoch: 4 Global Step: 200330 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:15:01,971-Speed 2629.95 samples/sec Loss 10.2584 LearningRate 0.0575 Epoch: 4 Global Step: 200340 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:15:05,861-Speed 2632.98 samples/sec Loss 10.3747 LearningRate 0.0575 Epoch: 4 Global Step: 200350 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:15:09,751-Speed 2633.63 samples/sec Loss 10.2967 LearningRate 0.0575 Epoch: 4 Global Step: 200360 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:15:13,663-Speed 2618.34 samples/sec Loss 10.2796 LearningRate 0.0575 Epoch: 4 Global Step: 200370 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:15:17,560-Speed 2627.94 samples/sec Loss 10.3790 LearningRate 0.0575 Epoch: 4 Global Step: 200380 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:15:21,456-Speed 2629.26 samples/sec Loss 10.3223 LearningRate 0.0575 Epoch: 4 Global Step: 200390 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:15:25,346-Speed 2632.54 samples/sec Loss 10.3495 LearningRate 0.0575 Epoch: 4 Global Step: 200400 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:15:29,242-Speed 2629.21 samples/sec Loss 10.4922 LearningRate 0.0575 Epoch: 4 Global Step: 200410 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:15:33,146-Speed 2623.46 samples/sec Loss 10.2768 LearningRate 0.0575 Epoch: 4 Global Step: 200420 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:15:37,203-Speed 2524.35 samples/sec Loss 10.2515 LearningRate 0.0575 Epoch: 4 Global Step: 200430 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:15:41,095-Speed 2631.78 samples/sec Loss 10.3976 LearningRate 0.0575 Epoch: 4 Global Step: 200440 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:15:44,997-Speed 2625.14 samples/sec Loss 10.3555 LearningRate 0.0575 Epoch: 4 Global Step: 200450 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:15:48,890-Speed 2631.46 samples/sec Loss 10.4310 LearningRate 0.0575 Epoch: 4 Global Step: 200460 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:15:52,785-Speed 2629.46 samples/sec Loss 10.3209 LearningRate 0.0575 Epoch: 4 Global Step: 200470 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:15:56,683-Speed 2627.43 samples/sec Loss 10.2903 LearningRate 0.0575 Epoch: 4 Global Step: 200480 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:16:00,569-Speed 2635.63 samples/sec Loss 10.3903 LearningRate 0.0575 Epoch: 4 Global Step: 200490 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:16:04,464-Speed 2629.13 samples/sec Loss 10.2712 LearningRate 0.0575 Epoch: 4 Global Step: 200500 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:16:08,365-Speed 2625.85 samples/sec Loss 10.3528 LearningRate 0.0575 Epoch: 4 Global Step: 200510 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:16:12,250-Speed 2636.37 samples/sec Loss 10.2308 LearningRate 0.0575 Epoch: 4 Global Step: 200520 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:16:16,140-Speed 2632.58 samples/sec Loss 10.5774 LearningRate 0.0575 Epoch: 4 Global Step: 200530 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:16:20,146-Speed 2557.55 samples/sec Loss 10.1639 LearningRate 0.0575 Epoch: 4 Global Step: 200540 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:16:24,066-Speed 2612.74 samples/sec Loss 10.3310 LearningRate 0.0575 Epoch: 4 Global Step: 200550 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:16:27,955-Speed 2633.75 samples/sec Loss 10.4402 LearningRate 0.0575 Epoch: 4 Global Step: 200560 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:16:31,850-Speed 2629.80 samples/sec Loss 10.3769 LearningRate 0.0575 Epoch: 4 Global Step: 200570 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:16:35,761-Speed 2618.57 samples/sec Loss 10.4585 LearningRate 0.0575 Epoch: 4 Global Step: 200580 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:16:39,640-Speed 2640.15 samples/sec Loss 10.3648 LearningRate 0.0575 Epoch: 4 Global Step: 200590 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:16:43,535-Speed 2629.51 samples/sec Loss 10.3323 LearningRate 0.0575 Epoch: 4 Global Step: 200600 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:16:47,438-Speed 2624.61 samples/sec Loss 10.3710 LearningRate 0.0575 Epoch: 4 Global Step: 200610 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:16:51,342-Speed 2623.26 samples/sec Loss 10.2220 LearningRate 0.0575 Epoch: 4 Global Step: 200620 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:16:55,233-Speed 2632.19 samples/sec Loss 10.2811 LearningRate 0.0575 Epoch: 4 Global Step: 200630 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:16:59,126-Speed 2631.26 samples/sec Loss 10.3466 LearningRate 0.0575 Epoch: 4 Global Step: 200640 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:17:03,023-Speed 2628.64 samples/sec Loss 10.1749 LearningRate 0.0575 Epoch: 4 Global Step: 200650 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:17:06,918-Speed 2629.38 samples/sec Loss 10.3942 LearningRate 0.0575 Epoch: 4 Global Step: 200660 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:17:10,796-Speed 2640.95 samples/sec Loss 10.2925 LearningRate 0.0575 Epoch: 4 Global Step: 200670 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:17:14,691-Speed 2629.76 samples/sec Loss 10.2613 LearningRate 0.0575 Epoch: 4 Global Step: 200680 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:17:18,586-Speed 2629.34 samples/sec Loss 10.2377 LearningRate 0.0575 Epoch: 4 Global Step: 200690 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:17:22,481-Speed 2629.97 samples/sec Loss 10.3873 LearningRate 0.0575 Epoch: 4 Global Step: 200700 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:17:26,379-Speed 2627.36 samples/sec Loss 10.2964 LearningRate 0.0575 Epoch: 4 Global Step: 200710 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:17:30,288-Speed 2620.55 samples/sec Loss 10.4622 LearningRate 0.0575 Epoch: 4 Global Step: 200720 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:17:34,185-Speed 2627.79 samples/sec Loss 10.4409 LearningRate 0.0575 Epoch: 4 Global Step: 200730 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:17:38,083-Speed 2627.52 samples/sec Loss 10.2889 LearningRate 0.0575 Epoch: 4 Global Step: 200740 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:17:41,977-Speed 2630.40 samples/sec Loss 10.2983 LearningRate 0.0575 Epoch: 4 Global Step: 200750 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:17:45,880-Speed 2624.39 samples/sec Loss 10.3795 LearningRate 0.0575 Epoch: 4 Global Step: 200760 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:17:49,798-Speed 2614.67 samples/sec Loss 10.2564 LearningRate 0.0575 Epoch: 4 Global Step: 200770 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:17:53,734-Speed 2601.48 samples/sec Loss 10.3226 LearningRate 0.0575 Epoch: 4 Global Step: 200780 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:17:57,652-Speed 2614.82 samples/sec Loss 10.2650 LearningRate 0.0575 Epoch: 4 Global Step: 200790 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:18:01,556-Speed 2623.13 samples/sec Loss 10.4238 LearningRate 0.0574 Epoch: 4 Global Step: 200800 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:18:05,477-Speed 2611.74 samples/sec Loss 10.3200 LearningRate 0.0574 Epoch: 4 Global Step: 200810 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:18:09,381-Speed 2623.76 samples/sec Loss 10.2479 LearningRate 0.0574 Epoch: 4 Global Step: 200820 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:18:13,287-Speed 2622.89 samples/sec Loss 10.3806 LearningRate 0.0574 Epoch: 4 Global Step: 200830 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:18:17,194-Speed 2621.56 samples/sec Loss 10.4470 LearningRate 0.0574 Epoch: 4 Global Step: 200840 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:18:21,098-Speed 2624.13 samples/sec Loss 10.3558 LearningRate 0.0574 Epoch: 4 Global Step: 200850 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:18:25,001-Speed 2624.27 samples/sec Loss 10.4616 LearningRate 0.0574 Epoch: 4 Global Step: 200860 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:18:28,904-Speed 2624.29 samples/sec Loss 10.4647 LearningRate 0.0574 Epoch: 4 Global Step: 200870 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:18:32,815-Speed 2618.88 samples/sec Loss 10.2822 LearningRate 0.0574 Epoch: 4 Global Step: 200880 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:18:36,721-Speed 2621.95 samples/sec Loss 10.5319 LearningRate 0.0574 Epoch: 4 Global Step: 200890 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:18:40,678-Speed 2588.00 samples/sec Loss 10.4486 LearningRate 0.0574 Epoch: 4 Global Step: 200900 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:18:44,578-Speed 2625.97 samples/sec Loss 10.3828 LearningRate 0.0574 Epoch: 4 Global Step: 200910 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:18:48,480-Speed 2626.02 samples/sec Loss 10.3694 LearningRate 0.0574 Epoch: 4 Global Step: 200920 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:18:52,382-Speed 2624.83 samples/sec Loss 10.2716 LearningRate 0.0574 Epoch: 4 Global Step: 200930 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:18:56,283-Speed 2625.60 samples/sec Loss 10.3191 LearningRate 0.0574 Epoch: 4 Global Step: 200940 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:19:00,190-Speed 2621.82 samples/sec Loss 10.4931 LearningRate 0.0574 Epoch: 4 Global Step: 200950 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:19:04,096-Speed 2622.20 samples/sec Loss 10.3818 LearningRate 0.0574 Epoch: 4 Global Step: 200960 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:19:07,987-Speed 2632.37 samples/sec Loss 10.5187 LearningRate 0.0574 Epoch: 4 Global Step: 200970 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:19:11,892-Speed 2622.89 samples/sec Loss 10.4167 LearningRate 0.0574 Epoch: 4 Global Step: 200980 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:19:15,795-Speed 2623.68 samples/sec Loss 10.3395 LearningRate 0.0574 Epoch: 4 Global Step: 200990 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:19:19,707-Speed 2618.40 samples/sec Loss 10.3017 LearningRate 0.0574 Epoch: 4 Global Step: 201000 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:19:23,608-Speed 2625.57 samples/sec Loss 10.2783 LearningRate 0.0574 Epoch: 4 Global Step: 201010 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:19:27,510-Speed 2624.52 samples/sec Loss 10.2057 LearningRate 0.0574 Epoch: 4 Global Step: 201020 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:19:31,413-Speed 2624.81 samples/sec Loss 10.2202 LearningRate 0.0574 Epoch: 4 Global Step: 201030 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:19:35,314-Speed 2625.40 samples/sec Loss 10.2703 LearningRate 0.0574 Epoch: 4 Global Step: 201040 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:19:39,219-Speed 2623.30 samples/sec Loss 10.4041 LearningRate 0.0574 Epoch: 4 Global Step: 201050 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:19:43,151-Speed 2604.65 samples/sec Loss 10.4855 LearningRate 0.0574 Epoch: 4 Global Step: 201060 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:19:47,067-Speed 2615.18 samples/sec Loss 10.2893 LearningRate 0.0574 Epoch: 4 Global Step: 201070 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:19:50,978-Speed 2618.82 samples/sec Loss 10.2592 LearningRate 0.0574 Epoch: 4 Global Step: 201080 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:19:54,890-Speed 2618.59 samples/sec Loss 10.2758 LearningRate 0.0574 Epoch: 4 Global Step: 201090 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:19:58,798-Speed 2620.89 samples/sec Loss 10.3186 LearningRate 0.0574 Epoch: 4 Global Step: 201100 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:20:02,711-Speed 2617.48 samples/sec Loss 10.4393 LearningRate 0.0574 Epoch: 4 Global Step: 201110 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:20:06,620-Speed 2620.08 samples/sec Loss 10.3071 LearningRate 0.0574 Epoch: 4 Global Step: 201120 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:20:10,524-Speed 2623.21 samples/sec Loss 10.3154 LearningRate 0.0574 Epoch: 4 Global Step: 201130 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:20:14,425-Speed 2625.72 samples/sec Loss 10.5162 LearningRate 0.0574 Epoch: 4 Global Step: 201140 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:20:18,331-Speed 2622.57 samples/sec Loss 10.4712 LearningRate 0.0574 Epoch: 4 Global Step: 201150 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:20:22,243-Speed 2618.21 samples/sec Loss 10.4910 LearningRate 0.0574 Epoch: 4 Global Step: 201160 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:20:26,131-Speed 2634.44 samples/sec Loss 10.4604 LearningRate 0.0574 Epoch: 4 Global Step: 201170 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:20:30,050-Speed 2613.45 samples/sec Loss 10.4167 LearningRate 0.0574 Epoch: 4 Global Step: 201180 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:20:33,958-Speed 2620.59 samples/sec Loss 10.3073 LearningRate 0.0574 Epoch: 4 Global Step: 201190 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:20:37,880-Speed 2611.50 samples/sec Loss 10.2696 LearningRate 0.0574 Epoch: 4 Global Step: 201200 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:20:41,793-Speed 2617.56 samples/sec Loss 10.3561 LearningRate 0.0574 Epoch: 4 Global Step: 201210 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:20:45,702-Speed 2619.92 samples/sec Loss 10.3029 LearningRate 0.0574 Epoch: 4 Global Step: 201220 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:20:49,622-Speed 2613.08 samples/sec Loss 10.2778 LearningRate 0.0574 Epoch: 4 Global Step: 201230 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:20:53,544-Speed 2612.06 samples/sec Loss 10.4619 LearningRate 0.0574 Epoch: 4 Global Step: 201240 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:20:57,459-Speed 2615.91 samples/sec Loss 10.2682 LearningRate 0.0574 Epoch: 4 Global Step: 201250 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:21:01,364-Speed 2622.41 samples/sec Loss 10.2486 LearningRate 0.0574 Epoch: 4 Global Step: 201260 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:21:05,255-Speed 2632.49 samples/sec Loss 10.3128 LearningRate 0.0574 Epoch: 4 Global Step: 201270 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:21:09,160-Speed 2622.57 samples/sec Loss 10.3626 LearningRate 0.0574 Epoch: 4 Global Step: 201280 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:21:13,067-Speed 2621.82 samples/sec Loss 10.4133 LearningRate 0.0574 Epoch: 4 Global Step: 201290 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:21:16,954-Speed 2634.92 samples/sec Loss 10.3380 LearningRate 0.0574 Epoch: 4 Global Step: 201300 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:21:20,885-Speed 2605.45 samples/sec Loss 10.2953 LearningRate 0.0574 Epoch: 4 Global Step: 201310 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:21:24,790-Speed 2622.83 samples/sec Loss 10.3719 LearningRate 0.0574 Epoch: 4 Global Step: 201320 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:21:28,696-Speed 2622.59 samples/sec Loss 10.3280 LearningRate 0.0574 Epoch: 4 Global Step: 201330 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:21:32,600-Speed 2623.53 samples/sec Loss 10.4203 LearningRate 0.0574 Epoch: 4 Global Step: 201340 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:21:36,507-Speed 2621.72 samples/sec Loss 10.1271 LearningRate 0.0573 Epoch: 4 Global Step: 201350 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:21:40,410-Speed 2624.02 samples/sec Loss 10.2698 LearningRate 0.0573 Epoch: 4 Global Step: 201360 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:21:44,316-Speed 2622.23 samples/sec Loss 10.2664 LearningRate 0.0573 Epoch: 4 Global Step: 201370 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:21:48,217-Speed 2625.38 samples/sec Loss 10.3412 LearningRate 0.0573 Epoch: 4 Global Step: 201380 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:21:52,135-Speed 2614.24 samples/sec Loss 10.3065 LearningRate 0.0573 Epoch: 4 Global Step: 201390 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:21:56,044-Speed 2619.82 samples/sec Loss 10.2351 LearningRate 0.0573 Epoch: 4 Global Step: 201400 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:21:59,951-Speed 2621.53 samples/sec Loss 10.3788 LearningRate 0.0573 Epoch: 4 Global Step: 201410 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:22:03,858-Speed 2621.65 samples/sec Loss 10.5261 LearningRate 0.0573 Epoch: 4 Global Step: 201420 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:22:07,764-Speed 2622.21 samples/sec Loss 10.3217 LearningRate 0.0573 Epoch: 4 Global Step: 201430 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:22:11,668-Speed 2623.76 samples/sec Loss 10.3905 LearningRate 0.0573 Epoch: 4 Global Step: 201440 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:22:15,574-Speed 2622.41 samples/sec Loss 10.4972 LearningRate 0.0573 Epoch: 4 Global Step: 201450 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:22:19,488-Speed 2616.36 samples/sec Loss 10.2720 LearningRate 0.0573 Epoch: 4 Global Step: 201460 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:22:23,421-Speed 2604.35 samples/sec Loss 10.1958 LearningRate 0.0573 Epoch: 4 Global Step: 201470 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:22:27,476-Speed 2525.94 samples/sec Loss 10.4877 LearningRate 0.0573 Epoch: 4 Global Step: 201480 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:22:31,523-Speed 2538.21 samples/sec Loss 10.2434 LearningRate 0.0573 Epoch: 4 Global Step: 201490 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:22:35,417-Speed 2630.11 samples/sec Loss 10.3084 LearningRate 0.0573 Epoch: 4 Global Step: 201500 Fp16 Grad Scale: 262144 Required: 71 hours
Training: 2022-04-13 18:22:39,279-Speed 2651.69 samples/sec Loss 10.5117 LearningRate 0.0573 Epoch: 4 Global Step: 201510 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 18:22:43,184-Speed 2623.38 samples/sec Loss 10.3410 LearningRate 0.0573 Epoch: 4 Global Step: 201520 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 18:22:47,088-Speed 2623.51 samples/sec Loss 10.2819 LearningRate 0.0573 Epoch: 4 Global Step: 201530 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 18:22:50,991-Speed 2623.90 samples/sec Loss 10.3414 LearningRate 0.0573 Epoch: 4 Global Step: 201540 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 18:22:54,897-Speed 2621.87 samples/sec Loss 10.4695 LearningRate 0.0573 Epoch: 4 Global Step: 201550 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 18:22:58,802-Speed 2623.54 samples/sec Loss 10.3834 LearningRate 0.0573 Epoch: 4 Global Step: 201560 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 18:23:02,717-Speed 2615.95 samples/sec Loss 10.3872 LearningRate 0.0573 Epoch: 4 Global Step: 201570 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 18:23:06,635-Speed 2614.03 samples/sec Loss 10.4523 LearningRate 0.0573 Epoch: 4 Global Step: 201580 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 18:23:10,540-Speed 2622.32 samples/sec Loss 10.3353 LearningRate 0.0573 Epoch: 4 Global Step: 201590 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 18:23:14,451-Speed 2619.39 samples/sec Loss 10.2683 LearningRate 0.0573 Epoch: 4 Global Step: 201600 Fp16 Grad Scale: 32768 Required: 71 hours
Training: 2022-04-13 18:23:18,363-Speed 2618.14 samples/sec Loss 10.3702 LearningRate 0.0573 Epoch: 4 Global Step: 201610 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:23:22,294-Speed 2605.84 samples/sec Loss 10.4157 LearningRate 0.0573 Epoch: 4 Global Step: 201620 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:23:26,198-Speed 2623.95 samples/sec Loss 10.3672 LearningRate 0.0573 Epoch: 4 Global Step: 201630 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:23:30,118-Speed 2612.92 samples/sec Loss 10.6150 LearningRate 0.0573 Epoch: 4 Global Step: 201640 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:23:34,017-Speed 2626.62 samples/sec Loss 10.3129 LearningRate 0.0573 Epoch: 4 Global Step: 201650 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:23:37,923-Speed 2622.29 samples/sec Loss 10.3904 LearningRate 0.0573 Epoch: 4 Global Step: 201660 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:23:41,843-Speed 2613.14 samples/sec Loss 10.3802 LearningRate 0.0573 Epoch: 4 Global Step: 201670 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:23:45,749-Speed 2622.27 samples/sec Loss 10.2976 LearningRate 0.0573 Epoch: 4 Global Step: 201680 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:23:49,651-Speed 2624.97 samples/sec Loss 10.3534 LearningRate 0.0573 Epoch: 4 Global Step: 201690 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:23:53,561-Speed 2620.10 samples/sec Loss 10.4000 LearningRate 0.0573 Epoch: 4 Global Step: 201700 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:23:57,494-Speed 2604.40 samples/sec Loss 10.4569 LearningRate 0.0573 Epoch: 4 Global Step: 201710 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:24:01,462-Speed 2581.53 samples/sec Loss 10.1986 LearningRate 0.0573 Epoch: 4 Global Step: 201720 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:24:05,374-Speed 2618.55 samples/sec Loss 10.4465 LearningRate 0.0573 Epoch: 4 Global Step: 201730 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:24:09,284-Speed 2619.38 samples/sec Loss 10.3938 LearningRate 0.0573 Epoch: 4 Global Step: 201740 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:24:13,209-Speed 2609.36 samples/sec Loss 10.2406 LearningRate 0.0573 Epoch: 4 Global Step: 201750 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:24:17,101-Speed 2632.02 samples/sec Loss 10.2313 LearningRate 0.0573 Epoch: 4 Global Step: 201760 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:24:21,023-Speed 2611.57 samples/sec Loss 10.2049 LearningRate 0.0573 Epoch: 4 Global Step: 201770 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:24:24,930-Speed 2621.87 samples/sec Loss 10.3739 LearningRate 0.0573 Epoch: 4 Global Step: 201780 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:24:28,835-Speed 2623.06 samples/sec Loss 10.3621 LearningRate 0.0573 Epoch: 4 Global Step: 201790 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:24:32,806-Speed 2580.19 samples/sec Loss 10.4749 LearningRate 0.0573 Epoch: 4 Global Step: 201800 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:24:36,715-Speed 2619.94 samples/sec Loss 10.4446 LearningRate 0.0573 Epoch: 4 Global Step: 201810 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:24:40,657-Speed 2598.06 samples/sec Loss 10.3831 LearningRate 0.0573 Epoch: 4 Global Step: 201820 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:24:44,572-Speed 2616.49 samples/sec Loss 10.2731 LearningRate 0.0573 Epoch: 4 Global Step: 201830 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:24:48,477-Speed 2625.51 samples/sec Loss 10.3336 LearningRate 0.0573 Epoch: 4 Global Step: 201840 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:24:52,384-Speed 2621.32 samples/sec Loss 10.3421 LearningRate 0.0573 Epoch: 4 Global Step: 201850 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:24:56,297-Speed 2617.59 samples/sec Loss 10.3256 LearningRate 0.0573 Epoch: 4 Global Step: 201860 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:25:00,209-Speed 2618.04 samples/sec Loss 10.3059 LearningRate 0.0573 Epoch: 4 Global Step: 201870 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:25:04,119-Speed 2619.93 samples/sec Loss 10.4974 LearningRate 0.0573 Epoch: 4 Global Step: 201880 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:25:08,022-Speed 2624.40 samples/sec Loss 10.2912 LearningRate 0.0572 Epoch: 4 Global Step: 201890 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:25:11,927-Speed 2623.03 samples/sec Loss 10.3058 LearningRate 0.0572 Epoch: 4 Global Step: 201900 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:25:15,841-Speed 2616.68 samples/sec Loss 10.3577 LearningRate 0.0572 Epoch: 4 Global Step: 201910 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:25:19,750-Speed 2620.46 samples/sec Loss 10.2604 LearningRate 0.0572 Epoch: 4 Global Step: 201920 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:25:23,667-Speed 2614.76 samples/sec Loss 10.4215 LearningRate 0.0572 Epoch: 4 Global Step: 201930 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:25:27,574-Speed 2622.01 samples/sec Loss 10.1342 LearningRate 0.0572 Epoch: 4 Global Step: 201940 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:25:31,477-Speed 2624.24 samples/sec Loss 10.2372 LearningRate 0.0572 Epoch: 4 Global Step: 201950 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:25:35,380-Speed 2624.46 samples/sec Loss 10.2974 LearningRate 0.0572 Epoch: 4 Global Step: 201960 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:25:39,289-Speed 2619.51 samples/sec Loss 10.2658 LearningRate 0.0572 Epoch: 4 Global Step: 201970 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:25:43,177-Speed 2634.83 samples/sec Loss 10.3843 LearningRate 0.0572 Epoch: 4 Global Step: 201980 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:25:47,076-Speed 2626.92 samples/sec Loss 10.3624 LearningRate 0.0572 Epoch: 4 Global Step: 201990 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:25:50,985-Speed 2620.31 samples/sec Loss 10.3574 LearningRate 0.0572 Epoch: 4 Global Step: 202000 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:25:54,890-Speed 2622.78 samples/sec Loss 10.3056 LearningRate 0.0572 Epoch: 4 Global Step: 202010 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:25:58,805-Speed 2616.31 samples/sec Loss 10.4280 LearningRate 0.0572 Epoch: 4 Global Step: 202020 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:26:02,711-Speed 2621.86 samples/sec Loss 10.3060 LearningRate 0.0572 Epoch: 4 Global Step: 202030 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:26:06,615-Speed 2623.50 samples/sec Loss 10.2442 LearningRate 0.0572 Epoch: 4 Global Step: 202040 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:26:10,527-Speed 2618.50 samples/sec Loss 10.4084 LearningRate 0.0572 Epoch: 4 Global Step: 202050 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:26:14,430-Speed 2624.07 samples/sec Loss 10.1889 LearningRate 0.0572 Epoch: 4 Global Step: 202060 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:26:18,332-Speed 2624.90 samples/sec Loss 10.5120 LearningRate 0.0572 Epoch: 4 Global Step: 202070 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:26:22,234-Speed 2624.92 samples/sec Loss 10.2824 LearningRate 0.0572 Epoch: 4 Global Step: 202080 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:26:26,300-Speed 2518.98 samples/sec Loss 10.3773 LearningRate 0.0572 Epoch: 4 Global Step: 202090 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:26:30,201-Speed 2626.34 samples/sec Loss 10.2204 LearningRate 0.0572 Epoch: 4 Global Step: 202100 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:26:34,103-Speed 2624.44 samples/sec Loss 10.3483 LearningRate 0.0572 Epoch: 4 Global Step: 202110 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:26:38,004-Speed 2625.69 samples/sec Loss 10.3124 LearningRate 0.0572 Epoch: 4 Global Step: 202120 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:26:41,984-Speed 2573.63 samples/sec Loss 10.1134 LearningRate 0.0572 Epoch: 4 Global Step: 202130 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:26:45,897-Speed 2618.04 samples/sec Loss 10.3395 LearningRate 0.0572 Epoch: 4 Global Step: 202140 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:26:49,847-Speed 2593.37 samples/sec Loss 10.3456 LearningRate 0.0572 Epoch: 4 Global Step: 202150 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:26:53,747-Speed 2626.26 samples/sec Loss 10.2930 LearningRate 0.0572 Epoch: 4 Global Step: 202160 Fp16 Grad Scale: 131072 Required: 71 hours
Training: 2022-04-13 18:26:57,640-Speed 2630.74 samples/sec Loss 10.1797 LearningRate 0.0572 Epoch: 4 Global Step: 202170 Fp16 Grad Scale: 65536 Required: 71 hours
Training: 2022-04-13 18:27:01,535-Speed 2629.71 samples/sec Loss 10.2714 LearningRate 0.0572 Epoch: 4 Global Step: 202180 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:27:05,441-Speed 2622.29 samples/sec Loss 10.3675 LearningRate 0.0572 Epoch: 4 Global Step: 202190 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:27:09,347-Speed 2622.52 samples/sec Loss 10.4090 LearningRate 0.0572 Epoch: 4 Global Step: 202200 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:27:13,249-Speed 2624.50 samples/sec Loss 10.3138 LearningRate 0.0572 Epoch: 4 Global Step: 202210 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:27:17,161-Speed 2617.97 samples/sec Loss 10.4232 LearningRate 0.0572 Epoch: 4 Global Step: 202220 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:27:21,068-Speed 2621.95 samples/sec Loss 10.3558 LearningRate 0.0572 Epoch: 4 Global Step: 202230 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:27:24,994-Speed 2609.01 samples/sec Loss 10.3599 LearningRate 0.0572 Epoch: 4 Global Step: 202240 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:27:28,918-Speed 2610.37 samples/sec Loss 10.3696 LearningRate 0.0572 Epoch: 4 Global Step: 202250 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:27:32,837-Speed 2613.96 samples/sec Loss 10.2741 LearningRate 0.0572 Epoch: 4 Global Step: 202260 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:27:36,737-Speed 2626.15 samples/sec Loss 10.2507 LearningRate 0.0572 Epoch: 4 Global Step: 202270 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:27:40,636-Speed 2626.55 samples/sec Loss 10.3358 LearningRate 0.0572 Epoch: 4 Global Step: 202280 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:27:44,567-Speed 2605.96 samples/sec Loss 10.3579 LearningRate 0.0572 Epoch: 4 Global Step: 202290 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:27:48,466-Speed 2626.89 samples/sec Loss 10.2818 LearningRate 0.0572 Epoch: 4 Global Step: 202300 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:27:53,196-Speed 2165.45 samples/sec Loss 10.4024 LearningRate 0.0572 Epoch: 4 Global Step: 202310 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:27:57,090-Speed 2629.88 samples/sec Loss 10.2841 LearningRate 0.0572 Epoch: 4 Global Step: 202320 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:28:00,988-Speed 2628.93 samples/sec Loss 10.3818 LearningRate 0.0572 Epoch: 4 Global Step: 202330 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:28:04,892-Speed 2623.49 samples/sec Loss 10.2622 LearningRate 0.0572 Epoch: 4 Global Step: 202340 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:28:08,796-Speed 2623.56 samples/sec Loss 10.2386 LearningRate 0.0572 Epoch: 4 Global Step: 202350 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:28:12,712-Speed 2615.75 samples/sec Loss 10.2011 LearningRate 0.0572 Epoch: 4 Global Step: 202360 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:28:16,611-Speed 2627.16 samples/sec Loss 10.3406 LearningRate 0.0572 Epoch: 4 Global Step: 202370 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:28:20,501-Speed 2633.13 samples/sec Loss 10.3185 LearningRate 0.0572 Epoch: 4 Global Step: 202380 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:28:24,403-Speed 2624.84 samples/sec Loss 10.2250 LearningRate 0.0572 Epoch: 4 Global Step: 202390 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:28:28,307-Speed 2624.49 samples/sec Loss 10.2576 LearningRate 0.0572 Epoch: 4 Global Step: 202400 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:28:32,203-Speed 2628.35 samples/sec Loss 10.2137 LearningRate 0.0572 Epoch: 4 Global Step: 202410 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:28:36,130-Speed 2608.35 samples/sec Loss 10.4293 LearningRate 0.0572 Epoch: 4 Global Step: 202420 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:28:40,043-Speed 2617.82 samples/sec Loss 10.3706 LearningRate 0.0572 Epoch: 4 Global Step: 202430 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:28:43,942-Speed 2626.78 samples/sec Loss 10.4389 LearningRate 0.0571 Epoch: 4 Global Step: 202440 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:28:47,840-Speed 2627.85 samples/sec Loss 10.4578 LearningRate 0.0571 Epoch: 4 Global Step: 202450 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:28:51,756-Speed 2615.64 samples/sec Loss 10.1901 LearningRate 0.0571 Epoch: 4 Global Step: 202460 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:28:55,656-Speed 2626.04 samples/sec Loss 10.3187 LearningRate 0.0571 Epoch: 4 Global Step: 202470 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:28:59,554-Speed 2628.40 samples/sec Loss 10.3157 LearningRate 0.0571 Epoch: 4 Global Step: 202480 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:29:03,455-Speed 2625.43 samples/sec Loss 10.3455 LearningRate 0.0571 Epoch: 4 Global Step: 202490 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:29:07,341-Speed 2635.57 samples/sec Loss 10.3501 LearningRate 0.0571 Epoch: 4 Global Step: 202500 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:29:11,256-Speed 2616.08 samples/sec Loss 10.3920 LearningRate 0.0571 Epoch: 4 Global Step: 202510 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:29:15,150-Speed 2630.96 samples/sec Loss 10.4077 LearningRate 0.0571 Epoch: 4 Global Step: 202520 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:29:19,074-Speed 2610.16 samples/sec Loss 10.3641 LearningRate 0.0571 Epoch: 4 Global Step: 202530 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:29:22,976-Speed 2624.86 samples/sec Loss 10.2482 LearningRate 0.0571 Epoch: 4 Global Step: 202540 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:29:26,876-Speed 2626.09 samples/sec Loss 10.2359 LearningRate 0.0571 Epoch: 4 Global Step: 202550 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:29:30,829-Speed 2591.33 samples/sec Loss 10.3318 LearningRate 0.0571 Epoch: 4 Global Step: 202560 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:29:34,729-Speed 2626.26 samples/sec Loss 10.2995 LearningRate 0.0571 Epoch: 4 Global Step: 202570 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:29:38,631-Speed 2624.80 samples/sec Loss 10.3629 LearningRate 0.0571 Epoch: 4 Global Step: 202580 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:29:42,531-Speed 2626.34 samples/sec Loss 10.4088 LearningRate 0.0571 Epoch: 4 Global Step: 202590 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:29:46,455-Speed 2610.66 samples/sec Loss 10.3405 LearningRate 0.0571 Epoch: 4 Global Step: 202600 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:29:50,359-Speed 2623.36 samples/sec Loss 10.2797 LearningRate 0.0571 Epoch: 4 Global Step: 202610 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:29:54,258-Speed 2627.72 samples/sec Loss 10.2088 LearningRate 0.0571 Epoch: 4 Global Step: 202620 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:29:58,153-Speed 2629.50 samples/sec Loss 10.2691 LearningRate 0.0571 Epoch: 4 Global Step: 202630 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:30:02,066-Speed 2617.23 samples/sec Loss 10.4444 LearningRate 0.0571 Epoch: 4 Global Step: 202640 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:30:06,067-Speed 2560.43 samples/sec Loss 10.3843 LearningRate 0.0571 Epoch: 4 Global Step: 202650 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:30:09,964-Speed 2628.35 samples/sec Loss 10.3414 LearningRate 0.0571 Epoch: 4 Global Step: 202660 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:30:13,865-Speed 2625.24 samples/sec Loss 10.3180 LearningRate 0.0571 Epoch: 4 Global Step: 202670 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:30:17,778-Speed 2617.88 samples/sec Loss 10.3536 LearningRate 0.0571 Epoch: 4 Global Step: 202680 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:30:21,851-Speed 2514.63 samples/sec Loss 10.2562 LearningRate 0.0571 Epoch: 4 Global Step: 202690 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:30:25,785-Speed 2604.14 samples/sec Loss 10.4002 LearningRate 0.0571 Epoch: 4 Global Step: 202700 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:30:30,190-Speed 2324.97 samples/sec Loss 10.2708 LearningRate 0.0571 Epoch: 4 Global Step: 202710 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:30:34,093-Speed 2624.57 samples/sec Loss 10.2935 LearningRate 0.0571 Epoch: 4 Global Step: 202720 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:30:37,985-Speed 2631.60 samples/sec Loss 10.2613 LearningRate 0.0571 Epoch: 4 Global Step: 202730 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:30:41,862-Speed 2642.03 samples/sec Loss 10.2380 LearningRate 0.0571 Epoch: 4 Global Step: 202740 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:30:45,758-Speed 2628.40 samples/sec Loss 10.3516 LearningRate 0.0571 Epoch: 4 Global Step: 202750 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:30:49,663-Speed 2622.99 samples/sec Loss 10.3343 LearningRate 0.0571 Epoch: 4 Global Step: 202760 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:30:53,580-Speed 2614.33 samples/sec Loss 10.3271 LearningRate 0.0571 Epoch: 4 Global Step: 202770 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:30:57,474-Speed 2630.55 samples/sec Loss 10.1916 LearningRate 0.0571 Epoch: 4 Global Step: 202780 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:31:01,412-Speed 2601.58 samples/sec Loss 10.5179 LearningRate 0.0571 Epoch: 4 Global Step: 202790 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:31:05,304-Speed 2631.36 samples/sec Loss 10.4698 LearningRate 0.0571 Epoch: 4 Global Step: 202800 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:31:09,200-Speed 2629.36 samples/sec Loss 10.3037 LearningRate 0.0571 Epoch: 4 Global Step: 202810 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:31:13,093-Speed 2630.75 samples/sec Loss 10.2740 LearningRate 0.0571 Epoch: 4 Global Step: 202820 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:31:16,986-Speed 2630.33 samples/sec Loss 10.1639 LearningRate 0.0571 Epoch: 4 Global Step: 202830 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:31:20,870-Speed 2636.93 samples/sec Loss 10.2326 LearningRate 0.0571 Epoch: 4 Global Step: 202840 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:31:24,736-Speed 2649.93 samples/sec Loss 11.5342 LearningRate 0.0571 Epoch: 4 Global Step: 202850 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:31:28,637-Speed 2625.33 samples/sec Loss 10.9231 LearningRate 0.0571 Epoch: 4 Global Step: 202860 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:31:32,660-Speed 2546.38 samples/sec Loss 10.5368 LearningRate 0.0571 Epoch: 4 Global Step: 202870 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:31:36,684-Speed 2545.80 samples/sec Loss 10.4977 LearningRate 0.0571 Epoch: 4 Global Step: 202880 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:31:40,576-Speed 2631.71 samples/sec Loss 10.5378 LearningRate 0.0571 Epoch: 4 Global Step: 202890 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:31:44,484-Speed 2620.74 samples/sec Loss 10.4855 LearningRate 0.0571 Epoch: 4 Global Step: 202900 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:31:48,393-Speed 2619.75 samples/sec Loss 10.3557 LearningRate 0.0571 Epoch: 4 Global Step: 202910 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:31:52,301-Speed 2620.97 samples/sec Loss 10.4171 LearningRate 0.0571 Epoch: 4 Global Step: 202920 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:31:56,194-Speed 2631.13 samples/sec Loss 10.3597 LearningRate 0.0571 Epoch: 4 Global Step: 202930 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:32:00,089-Speed 2628.99 samples/sec Loss 10.2792 LearningRate 0.0571 Epoch: 4 Global Step: 202940 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:32:03,983-Speed 2630.92 samples/sec Loss 10.3669 LearningRate 0.0571 Epoch: 4 Global Step: 202950 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:32:07,880-Speed 2627.65 samples/sec Loss 10.3227 LearningRate 0.0571 Epoch: 4 Global Step: 202960 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:32:11,782-Speed 2625.33 samples/sec Loss 10.4495 LearningRate 0.0571 Epoch: 4 Global Step: 202970 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:32:15,675-Speed 2631.26 samples/sec Loss 10.2559 LearningRate 0.0571 Epoch: 4 Global Step: 202980 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:32:19,568-Speed 2630.96 samples/sec Loss 10.3955 LearningRate 0.0570 Epoch: 4 Global Step: 202990 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:32:23,461-Speed 2630.19 samples/sec Loss 10.4653 LearningRate 0.0570 Epoch: 4 Global Step: 203000 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:32:27,399-Speed 2601.65 samples/sec Loss 10.2367 LearningRate 0.0570 Epoch: 4 Global Step: 203010 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:32:31,292-Speed 2630.97 samples/sec Loss 10.3600 LearningRate 0.0570 Epoch: 4 Global Step: 203020 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:32:35,183-Speed 2632.17 samples/sec Loss 10.2954 LearningRate 0.0570 Epoch: 4 Global Step: 203030 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:32:39,076-Speed 2630.45 samples/sec Loss 10.2831 LearningRate 0.0570 Epoch: 4 Global Step: 203040 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:32:42,972-Speed 2628.88 samples/sec Loss 10.3914 LearningRate 0.0570 Epoch: 4 Global Step: 203050 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:32:46,886-Speed 2617.32 samples/sec Loss 10.2662 LearningRate 0.0570 Epoch: 4 Global Step: 203060 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:32:50,805-Speed 2613.43 samples/sec Loss 10.3283 LearningRate 0.0570 Epoch: 4 Global Step: 203070 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:32:54,701-Speed 2629.32 samples/sec Loss 10.3708 LearningRate 0.0570 Epoch: 4 Global Step: 203080 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:32:58,733-Speed 2539.90 samples/sec Loss 10.3791 LearningRate 0.0570 Epoch: 4 Global Step: 203090 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:33:02,805-Speed 2515.40 samples/sec Loss 10.3486 LearningRate 0.0570 Epoch: 4 Global Step: 203100 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:33:06,727-Speed 2611.83 samples/sec Loss 10.5007 LearningRate 0.0570 Epoch: 4 Global Step: 203110 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:33:10,622-Speed 2629.47 samples/sec Loss 10.3721 LearningRate 0.0570 Epoch: 4 Global Step: 203120 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:33:14,515-Speed 2630.74 samples/sec Loss 10.3153 LearningRate 0.0570 Epoch: 4 Global Step: 203130 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:33:18,445-Speed 2605.74 samples/sec Loss 10.5571 LearningRate 0.0570 Epoch: 4 Global Step: 203140 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:33:22,337-Speed 2631.90 samples/sec Loss 10.3252 LearningRate 0.0570 Epoch: 4 Global Step: 203150 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:33:26,230-Speed 2631.37 samples/sec Loss 10.3450 LearningRate 0.0570 Epoch: 4 Global Step: 203160 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:33:30,122-Speed 2632.00 samples/sec Loss 10.3465 LearningRate 0.0570 Epoch: 4 Global Step: 203170 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:33:34,014-Speed 2631.15 samples/sec Loss 10.3506 LearningRate 0.0570 Epoch: 4 Global Step: 203180 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:33:37,905-Speed 2631.96 samples/sec Loss 10.3483 LearningRate 0.0570 Epoch: 4 Global Step: 203190 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:33:41,799-Speed 2630.43 samples/sec Loss 10.3289 LearningRate 0.0570 Epoch: 4 Global Step: 203200 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:33:45,692-Speed 2631.12 samples/sec Loss 10.2776 LearningRate 0.0570 Epoch: 4 Global Step: 203210 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:33:49,595-Speed 2624.25 samples/sec Loss 10.3095 LearningRate 0.0570 Epoch: 4 Global Step: 203220 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:33:53,479-Speed 2637.05 samples/sec Loss 10.4911 LearningRate 0.0570 Epoch: 4 Global Step: 203230 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:33:57,379-Speed 2626.48 samples/sec Loss 10.4234 LearningRate 0.0570 Epoch: 4 Global Step: 203240 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:34:01,276-Speed 2628.34 samples/sec Loss 10.3254 LearningRate 0.0570 Epoch: 4 Global Step: 203250 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:34:05,176-Speed 2626.24 samples/sec Loss 10.2598 LearningRate 0.0570 Epoch: 4 Global Step: 203260 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:34:09,074-Speed 2627.45 samples/sec Loss 10.3372 LearningRate 0.0570 Epoch: 4 Global Step: 203270 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:34:12,968-Speed 2630.52 samples/sec Loss 10.2527 LearningRate 0.0570 Epoch: 4 Global Step: 203280 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:34:16,899-Speed 2606.45 samples/sec Loss 10.2866 LearningRate 0.0570 Epoch: 4 Global Step: 203290 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:34:20,816-Speed 2614.97 samples/sec Loss 10.2451 LearningRate 0.0570 Epoch: 4 Global Step: 203300 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:34:24,707-Speed 2632.34 samples/sec Loss 10.3618 LearningRate 0.0570 Epoch: 4 Global Step: 203310 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:34:28,606-Speed 2627.54 samples/sec Loss 10.3280 LearningRate 0.0570 Epoch: 4 Global Step: 203320 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:34:32,539-Speed 2603.87 samples/sec Loss 10.4834 LearningRate 0.0570 Epoch: 4 Global Step: 203330 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:34:36,447-Speed 2621.43 samples/sec Loss 10.3859 LearningRate 0.0570 Epoch: 4 Global Step: 203340 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:34:40,344-Speed 2627.77 samples/sec Loss 10.3711 LearningRate 0.0570 Epoch: 4 Global Step: 203350 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:34:44,361-Speed 2550.17 samples/sec Loss 10.4033 LearningRate 0.0570 Epoch: 4 Global Step: 203360 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:34:48,256-Speed 2629.71 samples/sec Loss 10.2251 LearningRate 0.0570 Epoch: 4 Global Step: 203370 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:34:52,153-Speed 2628.49 samples/sec Loss 10.2522 LearningRate 0.0570 Epoch: 4 Global Step: 203380 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:34:56,047-Speed 2629.97 samples/sec Loss 10.8794 LearningRate 0.0570 Epoch: 4 Global Step: 203390 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:34:59,940-Speed 2631.23 samples/sec Loss 10.4137 LearningRate 0.0570 Epoch: 4 Global Step: 203400 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:35:03,833-Speed 2631.37 samples/sec Loss 10.3778 LearningRate 0.0570 Epoch: 4 Global Step: 203410 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:35:07,726-Speed 2630.61 samples/sec Loss 10.2914 LearningRate 0.0570 Epoch: 4 Global Step: 203420 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:35:11,631-Speed 2622.95 samples/sec Loss 10.4338 LearningRate 0.0570 Epoch: 4 Global Step: 203430 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:35:15,525-Speed 2630.46 samples/sec Loss 10.4656 LearningRate 0.0570 Epoch: 4 Global Step: 203440 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:35:19,425-Speed 2625.57 samples/sec Loss 10.4303 LearningRate 0.0570 Epoch: 4 Global Step: 203450 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:35:23,330-Speed 2623.55 samples/sec Loss 10.1896 LearningRate 0.0570 Epoch: 4 Global Step: 203460 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:35:27,223-Speed 2630.63 samples/sec Loss 10.3304 LearningRate 0.0570 Epoch: 4 Global Step: 203470 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:35:31,119-Speed 2629.70 samples/sec Loss 10.2772 LearningRate 0.0570 Epoch: 4 Global Step: 203480 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:35:34,997-Speed 2640.78 samples/sec Loss 10.1107 LearningRate 0.0570 Epoch: 4 Global Step: 203490 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:35:38,894-Speed 2628.06 samples/sec Loss 10.3120 LearningRate 0.0570 Epoch: 4 Global Step: 203500 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:35:42,787-Speed 2630.96 samples/sec Loss 10.1539 LearningRate 0.0570 Epoch: 4 Global Step: 203510 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:35:46,685-Speed 2627.75 samples/sec Loss 10.1909 LearningRate 0.0570 Epoch: 4 Global Step: 203520 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:35:50,583-Speed 2628.14 samples/sec Loss 10.2619 LearningRate 0.0570 Epoch: 4 Global Step: 203530 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:35:54,442-Speed 2654.09 samples/sec Loss 10.2899 LearningRate 0.0569 Epoch: 4 Global Step: 203540 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:35:58,367-Speed 2609.33 samples/sec Loss 10.4997 LearningRate 0.0569 Epoch: 4 Global Step: 203550 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:36:02,259-Speed 2632.06 samples/sec Loss 10.4775 LearningRate 0.0569 Epoch: 4 Global Step: 203560 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:36:06,150-Speed 2632.59 samples/sec Loss 10.2056 LearningRate 0.0569 Epoch: 4 Global Step: 203570 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:36:10,053-Speed 2623.91 samples/sec Loss 10.3093 LearningRate 0.0569 Epoch: 4 Global Step: 203580 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:36:13,944-Speed 2632.20 samples/sec Loss 10.4330 LearningRate 0.0569 Epoch: 4 Global Step: 203590 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:36:17,845-Speed 2626.28 samples/sec Loss 10.3353 LearningRate 0.0569 Epoch: 4 Global Step: 203600 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:36:21,739-Speed 2630.70 samples/sec Loss 10.2117 LearningRate 0.0569 Epoch: 4 Global Step: 203610 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:36:25,636-Speed 2627.53 samples/sec Loss 10.2872 LearningRate 0.0569 Epoch: 4 Global Step: 203620 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:36:29,528-Speed 2632.19 samples/sec Loss 10.5829 LearningRate 0.0569 Epoch: 4 Global Step: 203630 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:36:33,420-Speed 2631.82 samples/sec Loss 10.3993 LearningRate 0.0569 Epoch: 4 Global Step: 203640 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:36:37,312-Speed 2631.58 samples/sec Loss 10.2476 LearningRate 0.0569 Epoch: 4 Global Step: 203650 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:36:41,221-Speed 2620.19 samples/sec Loss 10.3281 LearningRate 0.0569 Epoch: 4 Global Step: 203660 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:36:45,120-Speed 2626.52 samples/sec Loss 10.4878 LearningRate 0.0569 Epoch: 4 Global Step: 203670 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:36:49,016-Speed 2629.66 samples/sec Loss 10.2664 LearningRate 0.0569 Epoch: 4 Global Step: 203680 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:36:52,915-Speed 2626.77 samples/sec Loss 10.3128 LearningRate 0.0569 Epoch: 4 Global Step: 203690 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:36:56,814-Speed 2627.17 samples/sec Loss 10.2862 LearningRate 0.0569 Epoch: 4 Global Step: 203700 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:37:00,709-Speed 2629.43 samples/sec Loss 10.2066 LearningRate 0.0569 Epoch: 4 Global Step: 203710 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:37:04,603-Speed 2630.18 samples/sec Loss 10.1825 LearningRate 0.0569 Epoch: 4 Global Step: 203720 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:37:08,596-Speed 2564.86 samples/sec Loss 10.2483 LearningRate 0.0569 Epoch: 4 Global Step: 203730 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:37:12,496-Speed 2626.53 samples/sec Loss 10.2025 LearningRate 0.0569 Epoch: 4 Global Step: 203740 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:37:16,388-Speed 2631.78 samples/sec Loss 10.2777 LearningRate 0.0569 Epoch: 4 Global Step: 203750 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:37:20,291-Speed 2624.11 samples/sec Loss 10.2301 LearningRate 0.0569 Epoch: 4 Global Step: 203760 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:37:24,222-Speed 2605.66 samples/sec Loss 10.1485 LearningRate 0.0569 Epoch: 4 Global Step: 203770 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:37:28,112-Speed 2633.34 samples/sec Loss 10.2616 LearningRate 0.0569 Epoch: 4 Global Step: 203780 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:37:32,004-Speed 2631.42 samples/sec Loss 10.2097 LearningRate 0.0569 Epoch: 4 Global Step: 203790 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:37:35,903-Speed 2626.90 samples/sec Loss 10.3104 LearningRate 0.0569 Epoch: 4 Global Step: 203800 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:37:39,801-Speed 2627.95 samples/sec Loss 10.3226 LearningRate 0.0569 Epoch: 4 Global Step: 203810 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:37:43,705-Speed 2623.33 samples/sec Loss 10.4569 LearningRate 0.0569 Epoch: 4 Global Step: 203820 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:37:47,635-Speed 2606.37 samples/sec Loss 10.3469 LearningRate 0.0569 Epoch: 4 Global Step: 203830 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:37:51,533-Speed 2627.72 samples/sec Loss 10.4684 LearningRate 0.0569 Epoch: 4 Global Step: 203840 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:37:55,433-Speed 2625.94 samples/sec Loss 10.2775 LearningRate 0.0569 Epoch: 4 Global Step: 203850 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:37:59,329-Speed 2629.36 samples/sec Loss 10.2928 LearningRate 0.0569 Epoch: 4 Global Step: 203860 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:38:03,210-Speed 2639.15 samples/sec Loss 10.1512 LearningRate 0.0569 Epoch: 4 Global Step: 203870 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:38:07,124-Speed 2616.84 samples/sec Loss 10.3809 LearningRate 0.0569 Epoch: 4 Global Step: 203880 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:38:11,017-Speed 2630.86 samples/sec Loss 10.2318 LearningRate 0.0569 Epoch: 4 Global Step: 203890 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:38:14,915-Speed 2627.29 samples/sec Loss 10.4688 LearningRate 0.0569 Epoch: 4 Global Step: 203900 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:38:18,809-Speed 2630.12 samples/sec Loss 10.2795 LearningRate 0.0569 Epoch: 4 Global Step: 203910 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:38:22,704-Speed 2629.69 samples/sec Loss 10.2807 LearningRate 0.0569 Epoch: 4 Global Step: 203920 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:38:26,599-Speed 2629.94 samples/sec Loss 10.2797 LearningRate 0.0569 Epoch: 4 Global Step: 203930 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:38:30,494-Speed 2630.10 samples/sec Loss 10.2660 LearningRate 0.0569 Epoch: 4 Global Step: 203940 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:38:34,387-Speed 2630.62 samples/sec Loss 10.4464 LearningRate 0.0569 Epoch: 4 Global Step: 203950 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:38:38,280-Speed 2631.01 samples/sec Loss 10.1889 LearningRate 0.0569 Epoch: 4 Global Step: 203960 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:38:42,180-Speed 2626.21 samples/sec Loss 10.1856 LearningRate 0.0569 Epoch: 4 Global Step: 203970 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:38:46,059-Speed 2640.31 samples/sec Loss 10.3671 LearningRate 0.0569 Epoch: 4 Global Step: 203980 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:38:49,951-Speed 2632.04 samples/sec Loss 10.2983 LearningRate 0.0569 Epoch: 4 Global Step: 203990 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:38:53,843-Speed 2631.55 samples/sec Loss 10.3187 LearningRate 0.0569 Epoch: 4 Global Step: 204000 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:38:57,742-Speed 2627.26 samples/sec Loss 10.3440 LearningRate 0.0569 Epoch: 4 Global Step: 204010 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:39:01,635-Speed 2631.15 samples/sec Loss 10.2745 LearningRate 0.0569 Epoch: 4 Global Step: 204020 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:39:05,535-Speed 2626.42 samples/sec Loss 10.1974 LearningRate 0.0569 Epoch: 4 Global Step: 204030 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:39:09,429-Speed 2629.99 samples/sec Loss 10.2210 LearningRate 0.0569 Epoch: 4 Global Step: 204040 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:39:13,320-Speed 2632.17 samples/sec Loss 10.3358 LearningRate 0.0569 Epoch: 4 Global Step: 204050 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:39:17,215-Speed 2630.22 samples/sec Loss 10.2898 LearningRate 0.0569 Epoch: 4 Global Step: 204060 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:39:21,111-Speed 2629.01 samples/sec Loss 10.1560 LearningRate 0.0569 Epoch: 4 Global Step: 204070 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:39:25,036-Speed 2609.88 samples/sec Loss 10.2297 LearningRate 0.0569 Epoch: 4 Global Step: 204080 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:39:28,918-Speed 2638.87 samples/sec Loss 10.3151 LearningRate 0.0568 Epoch: 4 Global Step: 204090 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:39:32,811-Speed 2631.31 samples/sec Loss 10.1852 LearningRate 0.0568 Epoch: 4 Global Step: 204100 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:39:36,708-Speed 2627.81 samples/sec Loss 10.3536 LearningRate 0.0568 Epoch: 4 Global Step: 204110 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:39:40,606-Speed 2628.06 samples/sec Loss 10.3520 LearningRate 0.0568 Epoch: 4 Global Step: 204120 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:39:44,501-Speed 2629.55 samples/sec Loss 10.3973 LearningRate 0.0568 Epoch: 4 Global Step: 204130 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:39:48,407-Speed 2622.16 samples/sec Loss 10.2924 LearningRate 0.0568 Epoch: 4 Global Step: 204140 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:39:52,331-Speed 2610.67 samples/sec Loss 10.2506 LearningRate 0.0568 Epoch: 4 Global Step: 204150 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:39:56,239-Speed 2621.02 samples/sec Loss 10.2505 LearningRate 0.0568 Epoch: 4 Global Step: 204160 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:40:00,163-Speed 2610.49 samples/sec Loss 10.4084 LearningRate 0.0568 Epoch: 4 Global Step: 204170 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:40:04,057-Speed 2630.20 samples/sec Loss 10.4093 LearningRate 0.0568 Epoch: 4 Global Step: 204180 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:40:07,950-Speed 2631.14 samples/sec Loss 10.2733 LearningRate 0.0568 Epoch: 4 Global Step: 204190 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:40:11,846-Speed 2629.14 samples/sec Loss 10.3285 LearningRate 0.0568 Epoch: 4 Global Step: 204200 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:40:15,774-Speed 2607.72 samples/sec Loss 10.1302 LearningRate 0.0568 Epoch: 4 Global Step: 204210 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:40:19,660-Speed 2635.91 samples/sec Loss 10.4072 LearningRate 0.0568 Epoch: 4 Global Step: 204220 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:40:23,553-Speed 2631.14 samples/sec Loss 10.3276 LearningRate 0.0568 Epoch: 4 Global Step: 204230 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:40:27,462-Speed 2619.71 samples/sec Loss 10.4047 LearningRate 0.0568 Epoch: 4 Global Step: 204240 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:40:31,362-Speed 2626.88 samples/sec Loss 10.3891 LearningRate 0.0568 Epoch: 4 Global Step: 204250 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:40:35,257-Speed 2629.79 samples/sec Loss 10.2768 LearningRate 0.0568 Epoch: 4 Global Step: 204260 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:40:39,149-Speed 2631.58 samples/sec Loss 10.2003 LearningRate 0.0568 Epoch: 4 Global Step: 204270 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:40:43,049-Speed 2625.77 samples/sec Loss 10.2865 LearningRate 0.0568 Epoch: 4 Global Step: 204280 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:40:46,955-Speed 2623.03 samples/sec Loss 10.2141 LearningRate 0.0568 Epoch: 4 Global Step: 204290 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:40:50,836-Speed 2639.01 samples/sec Loss 10.2661 LearningRate 0.0568 Epoch: 4 Global Step: 204300 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:40:54,770-Speed 2603.86 samples/sec Loss 10.2149 LearningRate 0.0568 Epoch: 4 Global Step: 204310 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:40:58,665-Speed 2629.23 samples/sec Loss 10.1552 LearningRate 0.0568 Epoch: 4 Global Step: 204320 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:41:02,566-Speed 2626.60 samples/sec Loss 10.3635 LearningRate 0.0568 Epoch: 4 Global Step: 204330 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:41:06,457-Speed 2631.75 samples/sec Loss 10.1789 LearningRate 0.0568 Epoch: 4 Global Step: 204340 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:41:10,355-Speed 2627.46 samples/sec Loss 10.4146 LearningRate 0.0568 Epoch: 4 Global Step: 204350 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:41:14,249-Speed 2630.27 samples/sec Loss 10.2491 LearningRate 0.0568 Epoch: 4 Global Step: 204360 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:41:18,255-Speed 2557.02 samples/sec Loss 10.1658 LearningRate 0.0568 Epoch: 4 Global Step: 204370 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:41:22,152-Speed 2628.96 samples/sec Loss 10.2969 LearningRate 0.0568 Epoch: 4 Global Step: 204380 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:41:26,052-Speed 2626.52 samples/sec Loss 10.3156 LearningRate 0.0568 Epoch: 4 Global Step: 204390 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:41:29,932-Speed 2640.12 samples/sec Loss 10.3187 LearningRate 0.0568 Epoch: 4 Global Step: 204400 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:41:33,848-Speed 2615.38 samples/sec Loss 10.4201 LearningRate 0.0568 Epoch: 4 Global Step: 204410 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:41:37,754-Speed 2621.87 samples/sec Loss 10.3263 LearningRate 0.0568 Epoch: 4 Global Step: 204420 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:41:41,648-Speed 2630.58 samples/sec Loss 10.3449 LearningRate 0.0568 Epoch: 4 Global Step: 204430 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:41:45,544-Speed 2634.60 samples/sec Loss 10.3183 LearningRate 0.0568 Epoch: 4 Global Step: 204440 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:41:49,441-Speed 2628.70 samples/sec Loss 10.3286 LearningRate 0.0568 Epoch: 4 Global Step: 204450 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:41:53,340-Speed 2626.63 samples/sec Loss 10.2944 LearningRate 0.0568 Epoch: 4 Global Step: 204460 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:41:57,232-Speed 2632.04 samples/sec Loss 10.3630 LearningRate 0.0568 Epoch: 4 Global Step: 204470 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:42:01,124-Speed 2631.89 samples/sec Loss 10.3990 LearningRate 0.0568 Epoch: 4 Global Step: 204480 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:42:05,019-Speed 2629.80 samples/sec Loss 10.2107 LearningRate 0.0568 Epoch: 4 Global Step: 204490 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:42:08,917-Speed 2627.34 samples/sec Loss 10.3875 LearningRate 0.0568 Epoch: 4 Global Step: 204500 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:42:12,799-Speed 2638.29 samples/sec Loss 10.2618 LearningRate 0.0568 Epoch: 4 Global Step: 204510 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:42:16,685-Speed 2635.86 samples/sec Loss 10.2901 LearningRate 0.0568 Epoch: 4 Global Step: 204520 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:42:20,591-Speed 2622.05 samples/sec Loss 10.3299 LearningRate 0.0568 Epoch: 4 Global Step: 204530 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:42:24,533-Speed 2598.27 samples/sec Loss 10.4106 LearningRate 0.0568 Epoch: 4 Global Step: 204540 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:42:28,425-Speed 2631.74 samples/sec Loss 10.2430 LearningRate 0.0568 Epoch: 4 Global Step: 204550 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:42:32,325-Speed 2627.17 samples/sec Loss 10.2451 LearningRate 0.0568 Epoch: 4 Global Step: 204560 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:42:36,222-Speed 2628.18 samples/sec Loss 10.1198 LearningRate 0.0568 Epoch: 4 Global Step: 204570 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:42:40,121-Speed 2627.27 samples/sec Loss 10.3569 LearningRate 0.0568 Epoch: 4 Global Step: 204580 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:42:44,016-Speed 2628.94 samples/sec Loss 10.5012 LearningRate 0.0568 Epoch: 4 Global Step: 204590 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:42:47,918-Speed 2625.18 samples/sec Loss 11.1485 LearningRate 0.0568 Epoch: 4 Global Step: 204600 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:42:51,944-Speed 2543.66 samples/sec Loss 10.7269 LearningRate 0.0568 Epoch: 4 Global Step: 204610 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:42:55,821-Speed 2642.14 samples/sec Loss 10.7903 LearningRate 0.0568 Epoch: 4 Global Step: 204620 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:42:59,719-Speed 2627.50 samples/sec Loss 10.5225 LearningRate 0.0568 Epoch: 4 Global Step: 204630 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:43:03,614-Speed 2629.65 samples/sec Loss 10.4841 LearningRate 0.0567 Epoch: 4 Global Step: 204640 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:43:07,513-Speed 2627.15 samples/sec Loss 10.3535 LearningRate 0.0567 Epoch: 4 Global Step: 204650 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:43:11,409-Speed 2629.16 samples/sec Loss 10.3338 LearningRate 0.0567 Epoch: 4 Global Step: 204660 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:43:15,304-Speed 2629.43 samples/sec Loss 10.2820 LearningRate 0.0567 Epoch: 4 Global Step: 204670 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:43:19,198-Speed 2629.93 samples/sec Loss 10.2677 LearningRate 0.0567 Epoch: 4 Global Step: 204680 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:43:23,091-Speed 2631.30 samples/sec Loss 10.3610 LearningRate 0.0567 Epoch: 4 Global Step: 204690 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:43:26,983-Speed 2631.35 samples/sec Loss 10.3176 LearningRate 0.0567 Epoch: 4 Global Step: 204700 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:43:30,874-Speed 2632.63 samples/sec Loss 10.2077 LearningRate 0.0567 Epoch: 4 Global Step: 204710 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:43:34,765-Speed 2632.53 samples/sec Loss 10.2684 LearningRate 0.0567 Epoch: 4 Global Step: 204720 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:43:38,657-Speed 2631.06 samples/sec Loss 10.2316 LearningRate 0.0567 Epoch: 4 Global Step: 204730 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:43:42,551-Speed 2635.46 samples/sec Loss 10.3056 LearningRate 0.0567 Epoch: 4 Global Step: 204740 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:43:46,443-Speed 2631.54 samples/sec Loss 10.2954 LearningRate 0.0567 Epoch: 4 Global Step: 204750 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:43:50,337-Speed 2630.56 samples/sec Loss 10.2069 LearningRate 0.0567 Epoch: 4 Global Step: 204760 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:43:54,229-Speed 2631.29 samples/sec Loss 10.3918 LearningRate 0.0567 Epoch: 4 Global Step: 204770 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:43:58,126-Speed 2628.90 samples/sec Loss 10.3407 LearningRate 0.0567 Epoch: 4 Global Step: 204780 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:44:02,017-Speed 2632.26 samples/sec Loss 10.1028 LearningRate 0.0567 Epoch: 4 Global Step: 204790 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:44:05,909-Speed 2631.70 samples/sec Loss 10.2874 LearningRate 0.0567 Epoch: 4 Global Step: 204800 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:44:09,802-Speed 2631.25 samples/sec Loss 10.2549 LearningRate 0.0567 Epoch: 4 Global Step: 204810 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:44:13,695-Speed 2630.26 samples/sec Loss 10.2350 LearningRate 0.0567 Epoch: 4 Global Step: 204820 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:44:17,600-Speed 2623.55 samples/sec Loss 10.3439 LearningRate 0.0567 Epoch: 4 Global Step: 204830 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:44:21,494-Speed 2630.34 samples/sec Loss 10.3742 LearningRate 0.0567 Epoch: 4 Global Step: 204840 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:44:25,383-Speed 2633.63 samples/sec Loss 10.2739 LearningRate 0.0567 Epoch: 4 Global Step: 204850 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:44:29,283-Speed 2626.43 samples/sec Loss 10.2532 LearningRate 0.0567 Epoch: 4 Global Step: 204860 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:44:33,180-Speed 2628.19 samples/sec Loss 10.3628 LearningRate 0.0567 Epoch: 4 Global Step: 204870 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:44:37,075-Speed 2629.67 samples/sec Loss 10.3176 LearningRate 0.0567 Epoch: 4 Global Step: 204880 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:44:40,968-Speed 2631.08 samples/sec Loss 10.3018 LearningRate 0.0567 Epoch: 4 Global Step: 204890 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:44:44,862-Speed 2629.77 samples/sec Loss 10.3376 LearningRate 0.0567 Epoch: 4 Global Step: 204900 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:44:48,764-Speed 2625.22 samples/sec Loss 10.3023 LearningRate 0.0567 Epoch: 4 Global Step: 204910 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:44:52,657-Speed 2630.78 samples/sec Loss 10.2403 LearningRate 0.0567 Epoch: 4 Global Step: 204920 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:44:56,553-Speed 2629.66 samples/sec Loss 10.2757 LearningRate 0.0567 Epoch: 4 Global Step: 204930 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:45:00,445-Speed 2631.88 samples/sec Loss 10.1436 LearningRate 0.0567 Epoch: 4 Global Step: 204940 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:45:04,342-Speed 2627.91 samples/sec Loss 10.2279 LearningRate 0.0567 Epoch: 4 Global Step: 204950 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:45:08,235-Speed 2630.83 samples/sec Loss 10.3373 LearningRate 0.0567 Epoch: 4 Global Step: 204960 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:45:12,126-Speed 2632.96 samples/sec Loss 10.1707 LearningRate 0.0567 Epoch: 4 Global Step: 204970 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:45:16,016-Speed 2633.13 samples/sec Loss 10.3810 LearningRate 0.0567 Epoch: 4 Global Step: 204980 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:45:19,905-Speed 2633.41 samples/sec Loss 10.2784 LearningRate 0.0567 Epoch: 4 Global Step: 204990 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:45:23,803-Speed 2627.45 samples/sec Loss 10.3048 LearningRate 0.0567 Epoch: 4 Global Step: 205000 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:45:27,705-Speed 2625.87 samples/sec Loss 10.3435 LearningRate 0.0567 Epoch: 4 Global Step: 205010 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:45:31,600-Speed 2629.41 samples/sec Loss 10.3481 LearningRate 0.0567 Epoch: 4 Global Step: 205020 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:45:35,496-Speed 2628.81 samples/sec Loss 10.2181 LearningRate 0.0567 Epoch: 4 Global Step: 205030 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:45:39,400-Speed 2623.31 samples/sec Loss 10.3828 LearningRate 0.0567 Epoch: 4 Global Step: 205040 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:45:43,295-Speed 2630.00 samples/sec Loss 10.3935 LearningRate 0.0567 Epoch: 4 Global Step: 205050 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:45:47,193-Speed 2627.92 samples/sec Loss 10.3161 LearningRate 0.0567 Epoch: 4 Global Step: 205060 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:45:51,095-Speed 2624.43 samples/sec Loss 10.1193 LearningRate 0.0567 Epoch: 4 Global Step: 205070 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:45:54,990-Speed 2630.00 samples/sec Loss 10.3624 LearningRate 0.0567 Epoch: 4 Global Step: 205080 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:45:58,871-Speed 2639.35 samples/sec Loss 10.3411 LearningRate 0.0567 Epoch: 4 Global Step: 205090 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:46:02,785-Speed 2616.54 samples/sec Loss 10.3078 LearningRate 0.0567 Epoch: 4 Global Step: 205100 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:46:06,684-Speed 2626.42 samples/sec Loss 10.2477 LearningRate 0.0567 Epoch: 4 Global Step: 205110 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:46:10,589-Speed 2623.86 samples/sec Loss 10.1790 LearningRate 0.0567 Epoch: 4 Global Step: 205120 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:46:14,485-Speed 2628.78 samples/sec Loss 10.3540 LearningRate 0.0567 Epoch: 4 Global Step: 205130 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:46:18,381-Speed 2629.21 samples/sec Loss 10.2100 LearningRate 0.0567 Epoch: 4 Global Step: 205140 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:46:22,284-Speed 2623.73 samples/sec Loss 10.1932 LearningRate 0.0567 Epoch: 4 Global Step: 205150 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:46:26,184-Speed 2627.00 samples/sec Loss 10.2906 LearningRate 0.0567 Epoch: 4 Global Step: 205160 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:46:30,084-Speed 2626.49 samples/sec Loss 10.2999 LearningRate 0.0567 Epoch: 4 Global Step: 205170 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:46:33,993-Speed 2619.66 samples/sec Loss 10.3178 LearningRate 0.0567 Epoch: 4 Global Step: 205180 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:46:37,901-Speed 2620.96 samples/sec Loss 10.2045 LearningRate 0.0566 Epoch: 4 Global Step: 205190 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:46:41,806-Speed 2623.20 samples/sec Loss 10.2883 LearningRate 0.0566 Epoch: 4 Global Step: 205200 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:46:45,699-Speed 2631.01 samples/sec Loss 10.3563 LearningRate 0.0566 Epoch: 4 Global Step: 205210 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:46:49,584-Speed 2636.34 samples/sec Loss 10.2753 LearningRate 0.0566 Epoch: 4 Global Step: 205220 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 18:46:53,474-Speed 2633.19 samples/sec Loss 10.2899 LearningRate 0.0566 Epoch: 4 Global Step: 205230 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:46:57,393-Speed 2613.67 samples/sec Loss 10.3289 LearningRate 0.0566 Epoch: 4 Global Step: 205240 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:47:01,283-Speed 2633.31 samples/sec Loss 10.2388 LearningRate 0.0566 Epoch: 4 Global Step: 205250 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:47:05,185-Speed 2624.69 samples/sec Loss 10.2007 LearningRate 0.0566 Epoch: 4 Global Step: 205260 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:47:09,067-Speed 2638.24 samples/sec Loss 10.3315 LearningRate 0.0566 Epoch: 4 Global Step: 205270 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:47:12,957-Speed 2633.18 samples/sec Loss 10.2288 LearningRate 0.0566 Epoch: 4 Global Step: 205280 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:47:16,847-Speed 2633.16 samples/sec Loss 10.4193 LearningRate 0.0566 Epoch: 4 Global Step: 205290 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:47:20,761-Speed 2617.41 samples/sec Loss 10.1154 LearningRate 0.0566 Epoch: 4 Global Step: 205300 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:47:24,668-Speed 2621.84 samples/sec Loss 10.4729 LearningRate 0.0566 Epoch: 4 Global Step: 205310 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:47:28,564-Speed 2629.13 samples/sec Loss 10.4085 LearningRate 0.0566 Epoch: 4 Global Step: 205320 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 18:47:32,461-Speed 2628.29 samples/sec Loss 10.4225 LearningRate 0.0566 Epoch: 4 Global Step: 205330 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:47:36,351-Speed 2633.36 samples/sec Loss 10.2530 LearningRate 0.0566 Epoch: 4 Global Step: 205340 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:47:40,246-Speed 2629.25 samples/sec Loss 10.3073 LearningRate 0.0566 Epoch: 4 Global Step: 205350 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:47:44,135-Speed 2633.52 samples/sec Loss 10.3168 LearningRate 0.0566 Epoch: 4 Global Step: 205360 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:47:48,046-Speed 2619.77 samples/sec Loss 10.3818 LearningRate 0.0566 Epoch: 4 Global Step: 205370 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:47:51,937-Speed 2631.83 samples/sec Loss 10.3360 LearningRate 0.0566 Epoch: 4 Global Step: 205380 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:47:55,830-Speed 2631.65 samples/sec Loss 10.2839 LearningRate 0.0566 Epoch: 4 Global Step: 205390 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:47:59,749-Speed 2613.44 samples/sec Loss 10.1118 LearningRate 0.0566 Epoch: 4 Global Step: 205400 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:48:03,638-Speed 2633.82 samples/sec Loss 10.2283 LearningRate 0.0566 Epoch: 4 Global Step: 205410 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:48:07,530-Speed 2631.03 samples/sec Loss 10.3300 LearningRate 0.0566 Epoch: 4 Global Step: 205420 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:48:11,435-Speed 2623.31 samples/sec Loss 10.2219 LearningRate 0.0566 Epoch: 4 Global Step: 205430 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:48:15,327-Speed 2631.18 samples/sec Loss 10.4791 LearningRate 0.0566 Epoch: 4 Global Step: 205440 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:48:19,216-Speed 2633.72 samples/sec Loss 10.2120 LearningRate 0.0566 Epoch: 4 Global Step: 205450 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:48:23,107-Speed 2632.55 samples/sec Loss 10.2731 LearningRate 0.0566 Epoch: 4 Global Step: 205460 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:48:27,012-Speed 2622.83 samples/sec Loss 10.2196 LearningRate 0.0566 Epoch: 4 Global Step: 205470 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:48:30,909-Speed 2628.16 samples/sec Loss 10.1469 LearningRate 0.0566 Epoch: 4 Global Step: 205480 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:48:34,799-Speed 2633.59 samples/sec Loss 10.2617 LearningRate 0.0566 Epoch: 4 Global Step: 205490 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:48:38,703-Speed 2623.29 samples/sec Loss 10.1501 LearningRate 0.0566 Epoch: 4 Global Step: 205500 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:48:42,598-Speed 2629.75 samples/sec Loss 10.2886 LearningRate 0.0566 Epoch: 4 Global Step: 205510 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:48:46,495-Speed 2628.07 samples/sec Loss 10.1293 LearningRate 0.0566 Epoch: 4 Global Step: 205520 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:48:50,393-Speed 2627.51 samples/sec Loss 10.1011 LearningRate 0.0566 Epoch: 4 Global Step: 205530 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:48:54,291-Speed 2627.32 samples/sec Loss 10.1766 LearningRate 0.0566 Epoch: 4 Global Step: 205540 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:48:58,184-Speed 2631.73 samples/sec Loss 10.3087 LearningRate 0.0566 Epoch: 4 Global Step: 205550 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:49:02,075-Speed 2632.24 samples/sec Loss 10.1897 LearningRate 0.0566 Epoch: 4 Global Step: 205560 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:49:05,977-Speed 2624.51 samples/sec Loss 10.2535 LearningRate 0.0566 Epoch: 4 Global Step: 205570 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:49:09,889-Speed 2619.50 samples/sec Loss 10.2133 LearningRate 0.0566 Epoch: 4 Global Step: 205580 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:49:13,780-Speed 2632.73 samples/sec Loss 10.3322 LearningRate 0.0566 Epoch: 4 Global Step: 205590 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:49:17,671-Speed 2632.12 samples/sec Loss 10.3177 LearningRate 0.0566 Epoch: 4 Global Step: 205600 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:49:21,565-Speed 2630.15 samples/sec Loss 10.3312 LearningRate 0.0566 Epoch: 4 Global Step: 205610 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:49:25,455-Speed 2633.44 samples/sec Loss 10.3323 LearningRate 0.0566 Epoch: 4 Global Step: 205620 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:49:29,354-Speed 2626.42 samples/sec Loss 10.2682 LearningRate 0.0566 Epoch: 4 Global Step: 205630 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:49:33,258-Speed 2623.89 samples/sec Loss 10.2596 LearningRate 0.0566 Epoch: 4 Global Step: 205640 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:49:37,147-Speed 2633.60 samples/sec Loss 10.3462 LearningRate 0.0566 Epoch: 4 Global Step: 205650 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:49:41,037-Speed 2633.83 samples/sec Loss 10.3564 LearningRate 0.0566 Epoch: 4 Global Step: 205660 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:49:44,927-Speed 2632.77 samples/sec Loss 10.2406 LearningRate 0.0566 Epoch: 4 Global Step: 205670 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:49:48,818-Speed 2632.46 samples/sec Loss 10.2658 LearningRate 0.0566 Epoch: 4 Global Step: 205680 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:49:52,731-Speed 2617.06 samples/sec Loss 10.4572 LearningRate 0.0566 Epoch: 4 Global Step: 205690 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:49:56,629-Speed 2628.02 samples/sec Loss 10.2648 LearningRate 0.0566 Epoch: 4 Global Step: 205700 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:50:00,522-Speed 2630.96 samples/sec Loss 10.3032 LearningRate 0.0566 Epoch: 4 Global Step: 205710 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:50:04,426-Speed 2624.15 samples/sec Loss 10.3160 LearningRate 0.0566 Epoch: 4 Global Step: 205720 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:50:08,321-Speed 2629.82 samples/sec Loss 10.2813 LearningRate 0.0566 Epoch: 4 Global Step: 205730 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:50:12,232-Speed 2619.14 samples/sec Loss 10.2908 LearningRate 0.0565 Epoch: 4 Global Step: 205740 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:50:16,130-Speed 2627.69 samples/sec Loss 10.2138 LearningRate 0.0565 Epoch: 4 Global Step: 205750 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:50:20,036-Speed 2621.88 samples/sec Loss 10.1492 LearningRate 0.0565 Epoch: 4 Global Step: 205760 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:50:23,955-Speed 2613.35 samples/sec Loss 10.0790 LearningRate 0.0565 Epoch: 4 Global Step: 205770 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:50:27,892-Speed 2601.71 samples/sec Loss 10.2860 LearningRate 0.0565 Epoch: 4 Global Step: 205780 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:50:31,777-Speed 2636.60 samples/sec Loss 10.1903 LearningRate 0.0565 Epoch: 4 Global Step: 205790 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:50:35,681-Speed 2623.76 samples/sec Loss 10.3287 LearningRate 0.0565 Epoch: 4 Global Step: 205800 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:50:39,580-Speed 2626.89 samples/sec Loss 10.2165 LearningRate 0.0565 Epoch: 4 Global Step: 205810 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:50:43,485-Speed 2622.81 samples/sec Loss 10.3478 LearningRate 0.0565 Epoch: 4 Global Step: 205820 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:50:47,385-Speed 2626.33 samples/sec Loss 10.1149 LearningRate 0.0565 Epoch: 4 Global Step: 205830 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:50:51,277-Speed 2632.37 samples/sec Loss 10.2550 LearningRate 0.0565 Epoch: 4 Global Step: 205840 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:50:55,175-Speed 2627.41 samples/sec Loss 10.1919 LearningRate 0.0565 Epoch: 4 Global Step: 205850 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:50:59,069-Speed 2630.22 samples/sec Loss 10.2684 LearningRate 0.0565 Epoch: 4 Global Step: 205860 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:51:02,977-Speed 2620.90 samples/sec Loss 10.2981 LearningRate 0.0565 Epoch: 4 Global Step: 205870 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:51:06,878-Speed 2626.02 samples/sec Loss 10.1892 LearningRate 0.0565 Epoch: 4 Global Step: 205880 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:51:10,766-Speed 2634.07 samples/sec Loss 10.2073 LearningRate 0.0565 Epoch: 4 Global Step: 205890 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:51:14,666-Speed 2626.12 samples/sec Loss 10.1509 LearningRate 0.0565 Epoch: 4 Global Step: 205900 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:51:18,554-Speed 2635.04 samples/sec Loss 10.1997 LearningRate 0.0565 Epoch: 4 Global Step: 205910 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:51:22,450-Speed 2629.09 samples/sec Loss 10.1774 LearningRate 0.0565 Epoch: 4 Global Step: 205920 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:51:26,342-Speed 2631.99 samples/sec Loss 10.2694 LearningRate 0.0565 Epoch: 4 Global Step: 205930 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:51:30,231-Speed 2633.91 samples/sec Loss 10.2712 LearningRate 0.0565 Epoch: 4 Global Step: 205940 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:51:34,134-Speed 2624.16 samples/sec Loss 10.1286 LearningRate 0.0565 Epoch: 4 Global Step: 205950 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:51:38,021-Speed 2635.14 samples/sec Loss 10.1476 LearningRate 0.0565 Epoch: 4 Global Step: 205960 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:51:41,912-Speed 2632.35 samples/sec Loss 10.2498 LearningRate 0.0565 Epoch: 4 Global Step: 205970 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:51:45,805-Speed 2630.60 samples/sec Loss 10.2202 LearningRate 0.0565 Epoch: 4 Global Step: 205980 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:51:49,698-Speed 2631.73 samples/sec Loss 10.0856 LearningRate 0.0565 Epoch: 4 Global Step: 205990 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:51:53,590-Speed 2631.52 samples/sec Loss 10.2553 LearningRate 0.0565 Epoch: 4 Global Step: 206000 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:51:57,481-Speed 2632.34 samples/sec Loss 10.2475 LearningRate 0.0565 Epoch: 4 Global Step: 206010 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:52:01,371-Speed 2633.22 samples/sec Loss 10.2330 LearningRate 0.0565 Epoch: 4 Global Step: 206020 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:52:05,263-Speed 2631.62 samples/sec Loss 10.3422 LearningRate 0.0565 Epoch: 4 Global Step: 206030 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:52:09,164-Speed 2625.68 samples/sec Loss 10.2610 LearningRate 0.0565 Epoch: 4 Global Step: 206040 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:52:13,057-Speed 2631.05 samples/sec Loss 10.2592 LearningRate 0.0565 Epoch: 4 Global Step: 206050 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:52:16,951-Speed 2630.19 samples/sec Loss 10.3231 LearningRate 0.0565 Epoch: 4 Global Step: 206060 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:52:20,845-Speed 2630.55 samples/sec Loss 10.1802 LearningRate 0.0565 Epoch: 4 Global Step: 206070 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:52:24,708-Speed 2651.93 samples/sec Loss 10.2339 LearningRate 0.0565 Epoch: 4 Global Step: 206080 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:52:28,603-Speed 2630.33 samples/sec Loss 10.2397 LearningRate 0.0565 Epoch: 4 Global Step: 206090 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:52:32,501-Speed 2627.03 samples/sec Loss 10.1519 LearningRate 0.0565 Epoch: 4 Global Step: 206100 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:52:36,393-Speed 2631.48 samples/sec Loss 10.2817 LearningRate 0.0565 Epoch: 4 Global Step: 206110 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:52:40,286-Speed 2630.82 samples/sec Loss 10.2748 LearningRate 0.0565 Epoch: 4 Global Step: 206120 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:52:44,179-Speed 2631.05 samples/sec Loss 10.2215 LearningRate 0.0565 Epoch: 4 Global Step: 206130 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:52:48,080-Speed 2626.18 samples/sec Loss 10.2644 LearningRate 0.0565 Epoch: 4 Global Step: 206140 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:52:51,972-Speed 2631.80 samples/sec Loss 10.2231 LearningRate 0.0565 Epoch: 4 Global Step: 206150 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:52:55,872-Speed 2626.26 samples/sec Loss 10.2600 LearningRate 0.0565 Epoch: 4 Global Step: 206160 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:52:59,773-Speed 2625.38 samples/sec Loss 10.2204 LearningRate 0.0565 Epoch: 4 Global Step: 206170 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:53:03,671-Speed 2628.00 samples/sec Loss 10.2032 LearningRate 0.0565 Epoch: 4 Global Step: 206180 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:53:07,566-Speed 2629.27 samples/sec Loss 10.2872 LearningRate 0.0565 Epoch: 4 Global Step: 206190 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:53:11,473-Speed 2621.82 samples/sec Loss 10.2237 LearningRate 0.0565 Epoch: 4 Global Step: 206200 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:53:15,368-Speed 2629.83 samples/sec Loss 10.3095 LearningRate 0.0565 Epoch: 4 Global Step: 206210 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:53:19,264-Speed 2629.31 samples/sec Loss 10.3964 LearningRate 0.0565 Epoch: 4 Global Step: 206220 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:53:23,156-Speed 2631.96 samples/sec Loss 10.2083 LearningRate 0.0565 Epoch: 4 Global Step: 206230 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:53:27,160-Speed 2558.50 samples/sec Loss 10.2757 LearningRate 0.0565 Epoch: 4 Global Step: 206240 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:53:31,087-Speed 2608.22 samples/sec Loss 10.3265 LearningRate 0.0565 Epoch: 4 Global Step: 206250 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:53:34,982-Speed 2629.21 samples/sec Loss 10.2276 LearningRate 0.0565 Epoch: 4 Global Step: 206260 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:53:38,878-Speed 2629.10 samples/sec Loss 10.1861 LearningRate 0.0565 Epoch: 4 Global Step: 206270 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:53:42,784-Speed 2622.18 samples/sec Loss 10.2174 LearningRate 0.0565 Epoch: 4 Global Step: 206280 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:53:46,674-Speed 2633.01 samples/sec Loss 10.3533 LearningRate 0.0564 Epoch: 4 Global Step: 206290 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:53:50,566-Speed 2632.03 samples/sec Loss 10.1808 LearningRate 0.0564 Epoch: 4 Global Step: 206300 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:53:54,443-Speed 2641.58 samples/sec Loss 10.2743 LearningRate 0.0564 Epoch: 4 Global Step: 206310 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:53:58,337-Speed 2630.52 samples/sec Loss 10.1632 LearningRate 0.0564 Epoch: 4 Global Step: 206320 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:54:02,230-Speed 2631.22 samples/sec Loss 10.1266 LearningRate 0.0564 Epoch: 4 Global Step: 206330 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:54:06,124-Speed 2629.68 samples/sec Loss 10.1586 LearningRate 0.0564 Epoch: 4 Global Step: 206340 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:54:10,029-Speed 2623.17 samples/sec Loss 10.2597 LearningRate 0.0564 Epoch: 4 Global Step: 206350 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:54:13,924-Speed 2630.10 samples/sec Loss 10.3938 LearningRate 0.0564 Epoch: 4 Global Step: 206360 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:54:17,822-Speed 2628.05 samples/sec Loss 10.3069 LearningRate 0.0564 Epoch: 4 Global Step: 206370 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:54:21,718-Speed 2628.34 samples/sec Loss 10.2749 LearningRate 0.0564 Epoch: 4 Global Step: 206380 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:54:25,618-Speed 2626.26 samples/sec Loss 10.3739 LearningRate 0.0564 Epoch: 4 Global Step: 206390 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:54:29,550-Speed 2605.51 samples/sec Loss 10.3179 LearningRate 0.0564 Epoch: 4 Global Step: 206400 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:54:33,491-Speed 2598.98 samples/sec Loss 10.2002 LearningRate 0.0564 Epoch: 4 Global Step: 206410 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:54:37,382-Speed 2632.56 samples/sec Loss 10.1633 LearningRate 0.0564 Epoch: 4 Global Step: 206420 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:54:41,286-Speed 2624.26 samples/sec Loss 10.1936 LearningRate 0.0564 Epoch: 4 Global Step: 206430 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:54:45,189-Speed 2623.91 samples/sec Loss 10.3172 LearningRate 0.0564 Epoch: 4 Global Step: 206440 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:54:49,092-Speed 2623.97 samples/sec Loss 10.2171 LearningRate 0.0564 Epoch: 4 Global Step: 206450 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:54:52,997-Speed 2623.18 samples/sec Loss 10.3120 LearningRate 0.0564 Epoch: 4 Global Step: 206460 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:54:56,894-Speed 2628.73 samples/sec Loss 10.1387 LearningRate 0.0564 Epoch: 4 Global Step: 206470 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:55:00,818-Speed 2609.38 samples/sec Loss 10.0336 LearningRate 0.0564 Epoch: 4 Global Step: 206480 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:55:04,719-Speed 2625.56 samples/sec Loss 10.1511 LearningRate 0.0564 Epoch: 4 Global Step: 206490 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:55:08,631-Speed 2618.89 samples/sec Loss 10.1632 LearningRate 0.0564 Epoch: 4 Global Step: 206500 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:55:12,535-Speed 2623.21 samples/sec Loss 10.2656 LearningRate 0.0564 Epoch: 4 Global Step: 206510 Fp16 Grad Scale: 524288 Required: 70 hours
Training: 2022-04-13 18:55:16,410-Speed 2643.60 samples/sec Loss 10.1910 LearningRate 0.0564 Epoch: 4 Global Step: 206520 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:55:20,312-Speed 2625.30 samples/sec Loss 10.1811 LearningRate 0.0564 Epoch: 4 Global Step: 206530 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:55:24,326-Speed 2551.98 samples/sec Loss 10.1974 LearningRate 0.0564 Epoch: 4 Global Step: 206540 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:55:28,232-Speed 2621.55 samples/sec Loss 10.3151 LearningRate 0.0564 Epoch: 4 Global Step: 206550 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:55:32,138-Speed 2622.86 samples/sec Loss 10.2599 LearningRate 0.0564 Epoch: 4 Global Step: 206560 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:55:36,128-Speed 2566.94 samples/sec Loss 10.3835 LearningRate 0.0564 Epoch: 4 Global Step: 206570 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:55:40,082-Speed 2590.44 samples/sec Loss 10.2470 LearningRate 0.0564 Epoch: 4 Global Step: 206580 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:55:43,994-Speed 2618.61 samples/sec Loss 10.2505 LearningRate 0.0564 Epoch: 4 Global Step: 206590 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:55:47,893-Speed 2627.17 samples/sec Loss 10.1751 LearningRate 0.0564 Epoch: 4 Global Step: 206600 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:55:51,818-Speed 2609.15 samples/sec Loss 10.1713 LearningRate 0.0564 Epoch: 4 Global Step: 206610 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:55:55,708-Speed 2633.42 samples/sec Loss 10.2782 LearningRate 0.0564 Epoch: 4 Global Step: 206620 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:55:59,581-Speed 2643.95 samples/sec Loss 10.1930 LearningRate 0.0564 Epoch: 4 Global Step: 206630 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:56:03,471-Speed 2633.58 samples/sec Loss 10.2761 LearningRate 0.0564 Epoch: 4 Global Step: 206640 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:56:07,362-Speed 2631.82 samples/sec Loss 10.2165 LearningRate 0.0564 Epoch: 4 Global Step: 206650 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:56:11,253-Speed 2632.91 samples/sec Loss 10.2356 LearningRate 0.0564 Epoch: 4 Global Step: 206660 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:56:15,145-Speed 2630.99 samples/sec Loss 10.3037 LearningRate 0.0564 Epoch: 4 Global Step: 206670 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:56:19,047-Speed 2625.72 samples/sec Loss 10.1478 LearningRate 0.0564 Epoch: 4 Global Step: 206680 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:56:22,934-Speed 2634.77 samples/sec Loss 10.1948 LearningRate 0.0564 Epoch: 4 Global Step: 206690 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:56:26,827-Speed 2630.82 samples/sec Loss 10.2767 LearningRate 0.0564 Epoch: 4 Global Step: 206700 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:56:30,718-Speed 2632.45 samples/sec Loss 10.2230 LearningRate 0.0564 Epoch: 4 Global Step: 206710 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:56:34,634-Speed 2619.00 samples/sec Loss 10.1750 LearningRate 0.0564 Epoch: 4 Global Step: 206720 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:56:38,515-Speed 2640.04 samples/sec Loss 10.3754 LearningRate 0.0564 Epoch: 4 Global Step: 206730 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:56:42,410-Speed 2629.33 samples/sec Loss 10.2502 LearningRate 0.0564 Epoch: 4 Global Step: 206740 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:56:46,304-Speed 2630.44 samples/sec Loss 10.3147 LearningRate 0.0564 Epoch: 4 Global Step: 206750 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:56:50,195-Speed 2632.14 samples/sec Loss 10.2821 LearningRate 0.0564 Epoch: 4 Global Step: 206760 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:56:54,086-Speed 2632.83 samples/sec Loss 10.1641 LearningRate 0.0564 Epoch: 4 Global Step: 206770 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:56:57,985-Speed 2626.71 samples/sec Loss 10.3617 LearningRate 0.0564 Epoch: 4 Global Step: 206780 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:57:01,887-Speed 2624.57 samples/sec Loss 10.3324 LearningRate 0.0564 Epoch: 4 Global Step: 206790 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:57:05,794-Speed 2621.47 samples/sec Loss 10.1734 LearningRate 0.0564 Epoch: 4 Global Step: 206800 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:57:09,697-Speed 2624.53 samples/sec Loss 10.1399 LearningRate 0.0564 Epoch: 4 Global Step: 206810 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:57:13,599-Speed 2625.71 samples/sec Loss 10.0145 LearningRate 0.0564 Epoch: 4 Global Step: 206820 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:57:17,498-Speed 2626.24 samples/sec Loss 10.2673 LearningRate 0.0564 Epoch: 4 Global Step: 206830 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:57:21,403-Speed 2623.25 samples/sec Loss 10.3352 LearningRate 0.0564 Epoch: 4 Global Step: 206840 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:57:25,313-Speed 2619.24 samples/sec Loss 10.2177 LearningRate 0.0563 Epoch: 4 Global Step: 206850 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:57:29,205-Speed 2632.28 samples/sec Loss 10.2682 LearningRate 0.0563 Epoch: 4 Global Step: 206860 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:57:33,105-Speed 2625.49 samples/sec Loss 10.3002 LearningRate 0.0563 Epoch: 4 Global Step: 206870 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:57:37,004-Speed 2626.78 samples/sec Loss 10.1814 LearningRate 0.0563 Epoch: 4 Global Step: 206880 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:57:40,874-Speed 2647.08 samples/sec Loss 10.2023 LearningRate 0.0563 Epoch: 4 Global Step: 206890 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:57:44,771-Speed 2628.17 samples/sec Loss 10.1508 LearningRate 0.0563 Epoch: 4 Global Step: 206900 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:57:48,663-Speed 2631.85 samples/sec Loss 10.1792 LearningRate 0.0563 Epoch: 4 Global Step: 206910 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:57:52,549-Speed 2636.07 samples/sec Loss 10.1440 LearningRate 0.0563 Epoch: 4 Global Step: 206920 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:57:56,450-Speed 2625.04 samples/sec Loss 10.2070 LearningRate 0.0563 Epoch: 4 Global Step: 206930 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:58:00,354-Speed 2623.79 samples/sec Loss 10.2303 LearningRate 0.0563 Epoch: 4 Global Step: 206940 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:58:04,244-Speed 2632.70 samples/sec Loss 10.3011 LearningRate 0.0563 Epoch: 4 Global Step: 206950 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:58:08,144-Speed 2626.05 samples/sec Loss 10.3708 LearningRate 0.0563 Epoch: 4 Global Step: 206960 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:58:12,036-Speed 2631.53 samples/sec Loss 10.3821 LearningRate 0.0563 Epoch: 4 Global Step: 206970 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:58:15,930-Speed 2630.21 samples/sec Loss 10.2297 LearningRate 0.0563 Epoch: 4 Global Step: 206980 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:58:19,828-Speed 2628.17 samples/sec Loss 10.2548 LearningRate 0.0563 Epoch: 4 Global Step: 206990 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:58:23,722-Speed 2630.14 samples/sec Loss 10.2683 LearningRate 0.0563 Epoch: 4 Global Step: 207000 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:58:27,615-Speed 2631.29 samples/sec Loss 10.2533 LearningRate 0.0563 Epoch: 4 Global Step: 207010 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 18:58:31,505-Speed 2632.55 samples/sec Loss 10.1371 LearningRate 0.0563 Epoch: 4 Global Step: 207020 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:58:35,397-Speed 2632.00 samples/sec Loss 10.0513 LearningRate 0.0563 Epoch: 4 Global Step: 207030 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:58:39,294-Speed 2627.77 samples/sec Loss 10.3981 LearningRate 0.0563 Epoch: 4 Global Step: 207040 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:58:43,237-Speed 2598.29 samples/sec Loss 10.2875 LearningRate 0.0563 Epoch: 4 Global Step: 207050 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:58:47,267-Speed 2541.32 samples/sec Loss 10.3357 LearningRate 0.0563 Epoch: 4 Global Step: 207060 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:58:51,184-Speed 2615.02 samples/sec Loss 10.1166 LearningRate 0.0563 Epoch: 4 Global Step: 207070 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:58:55,077-Speed 2631.00 samples/sec Loss 10.3260 LearningRate 0.0563 Epoch: 4 Global Step: 207080 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:58:58,977-Speed 2626.60 samples/sec Loss 10.1833 LearningRate 0.0563 Epoch: 4 Global Step: 207090 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:59:02,873-Speed 2628.54 samples/sec Loss 10.1157 LearningRate 0.0563 Epoch: 4 Global Step: 207100 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:59:06,764-Speed 2632.42 samples/sec Loss 10.2392 LearningRate 0.0563 Epoch: 4 Global Step: 207110 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:59:10,645-Speed 2639.07 samples/sec Loss 10.2330 LearningRate 0.0563 Epoch: 4 Global Step: 207120 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:59:14,541-Speed 2628.82 samples/sec Loss 10.3843 LearningRate 0.0563 Epoch: 4 Global Step: 207130 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:59:18,441-Speed 2627.07 samples/sec Loss 10.2725 LearningRate 0.0563 Epoch: 4 Global Step: 207140 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:59:22,337-Speed 2628.54 samples/sec Loss 10.3561 LearningRate 0.0563 Epoch: 4 Global Step: 207150 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:59:26,269-Speed 2605.48 samples/sec Loss 10.0333 LearningRate 0.0563 Epoch: 4 Global Step: 207160 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:59:30,167-Speed 2627.80 samples/sec Loss 10.1933 LearningRate 0.0563 Epoch: 4 Global Step: 207170 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:59:34,065-Speed 2627.04 samples/sec Loss 10.0970 LearningRate 0.0563 Epoch: 4 Global Step: 207180 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:59:37,960-Speed 2629.73 samples/sec Loss 10.2746 LearningRate 0.0563 Epoch: 4 Global Step: 207190 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:59:41,853-Speed 2631.49 samples/sec Loss 10.0865 LearningRate 0.0563 Epoch: 4 Global Step: 207200 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:59:45,782-Speed 2607.03 samples/sec Loss 10.2238 LearningRate 0.0563 Epoch: 4 Global Step: 207210 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 18:59:49,689-Speed 2621.67 samples/sec Loss 10.2111 LearningRate 0.0563 Epoch: 4 Global Step: 207220 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:59:53,588-Speed 2626.62 samples/sec Loss 10.1879 LearningRate 0.0563 Epoch: 4 Global Step: 207230 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 18:59:57,489-Speed 2625.70 samples/sec Loss 10.3176 LearningRate 0.0563 Epoch: 4 Global Step: 207240 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:00:01,387-Speed 2627.43 samples/sec Loss 10.2092 LearningRate 0.0563 Epoch: 4 Global Step: 207250 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:00:05,282-Speed 2630.19 samples/sec Loss 10.2484 LearningRate 0.0563 Epoch: 4 Global Step: 207260 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:00:09,179-Speed 2628.13 samples/sec Loss 10.1984 LearningRate 0.0563 Epoch: 4 Global Step: 207270 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:00:13,074-Speed 2630.62 samples/sec Loss 10.3023 LearningRate 0.0563 Epoch: 4 Global Step: 207280 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:00:16,980-Speed 2621.98 samples/sec Loss 10.2435 LearningRate 0.0563 Epoch: 4 Global Step: 207290 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:00:20,876-Speed 2629.03 samples/sec Loss 10.2460 LearningRate 0.0563 Epoch: 4 Global Step: 207300 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:00:24,814-Speed 2601.37 samples/sec Loss 10.1800 LearningRate 0.0563 Epoch: 4 Global Step: 207310 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:00:28,717-Speed 2624.53 samples/sec Loss 10.2498 LearningRate 0.0563 Epoch: 4 Global Step: 207320 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:00:32,610-Speed 2630.87 samples/sec Loss 10.2448 LearningRate 0.0563 Epoch: 4 Global Step: 207330 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:00:36,500-Speed 2632.67 samples/sec Loss 10.3323 LearningRate 0.0563 Epoch: 4 Global Step: 207340 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:00:40,392-Speed 2631.81 samples/sec Loss 10.2795 LearningRate 0.0563 Epoch: 4 Global Step: 207350 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:00:44,286-Speed 2630.15 samples/sec Loss 10.0809 LearningRate 0.0563 Epoch: 4 Global Step: 207360 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:00:48,180-Speed 2630.73 samples/sec Loss 10.1817 LearningRate 0.0563 Epoch: 4 Global Step: 207370 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:00:52,073-Speed 2630.47 samples/sec Loss 10.1995 LearningRate 0.0563 Epoch: 4 Global Step: 207380 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:00:55,966-Speed 2631.30 samples/sec Loss 10.2686 LearningRate 0.0563 Epoch: 4 Global Step: 207390 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:01:18,026-Speed 464.22 samples/sec Loss 10.2975 LearningRate 0.0562 Epoch: 5 Global Step: 207400 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:01:21,936-Speed 2619.62 samples/sec Loss 10.3200 LearningRate 0.0562 Epoch: 5 Global Step: 207410 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:01:25,831-Speed 2629.65 samples/sec Loss 10.2283 LearningRate 0.0562 Epoch: 5 Global Step: 207420 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:01:29,702-Speed 2646.40 samples/sec Loss 10.2658 LearningRate 0.0562 Epoch: 5 Global Step: 207430 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:01:33,600-Speed 2627.41 samples/sec Loss 10.3411 LearningRate 0.0562 Epoch: 5 Global Step: 207440 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:01:37,502-Speed 2625.03 samples/sec Loss 10.0276 LearningRate 0.0562 Epoch: 5 Global Step: 207450 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:01:41,402-Speed 2626.05 samples/sec Loss 10.2505 LearningRate 0.0562 Epoch: 5 Global Step: 207460 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:01:45,315-Speed 2617.46 samples/sec Loss 10.1436 LearningRate 0.0562 Epoch: 5 Global Step: 207470 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:01:49,211-Speed 2629.24 samples/sec Loss 10.2258 LearningRate 0.0562 Epoch: 5 Global Step: 207480 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:01:53,104-Speed 2630.85 samples/sec Loss 10.2635 LearningRate 0.0562 Epoch: 5 Global Step: 207490 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:01:56,987-Speed 2637.66 samples/sec Loss 10.2554 LearningRate 0.0562 Epoch: 5 Global Step: 207500 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:02:00,877-Speed 2633.56 samples/sec Loss 10.2796 LearningRate 0.0562 Epoch: 5 Global Step: 207510 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:02:04,760-Speed 2637.23 samples/sec Loss 10.2798 LearningRate 0.0562 Epoch: 5 Global Step: 207520 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:02:08,644-Speed 2637.32 samples/sec Loss 10.2898 LearningRate 0.0562 Epoch: 5 Global Step: 207530 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:02:12,560-Speed 2615.80 samples/sec Loss 10.2261 LearningRate 0.0562 Epoch: 5 Global Step: 207540 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:02:16,446-Speed 2635.12 samples/sec Loss 10.1726 LearningRate 0.0562 Epoch: 5 Global Step: 207550 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:02:20,341-Speed 2630.26 samples/sec Loss 10.2015 LearningRate 0.0562 Epoch: 5 Global Step: 207560 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:02:24,228-Speed 2634.38 samples/sec Loss 10.2307 LearningRate 0.0562 Epoch: 5 Global Step: 207570 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:02:28,117-Speed 2634.27 samples/sec Loss 10.2276 LearningRate 0.0562 Epoch: 5 Global Step: 207580 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:02:32,007-Speed 2632.78 samples/sec Loss 10.3049 LearningRate 0.0562 Epoch: 5 Global Step: 207590 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:02:35,899-Speed 2631.87 samples/sec Loss 10.2986 LearningRate 0.0562 Epoch: 5 Global Step: 207600 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:02:39,773-Speed 2643.68 samples/sec Loss 10.2951 LearningRate 0.0562 Epoch: 5 Global Step: 207610 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:02:43,715-Speed 2598.87 samples/sec Loss 10.3588 LearningRate 0.0562 Epoch: 5 Global Step: 207620 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:02:47,606-Speed 2632.28 samples/sec Loss 10.1834 LearningRate 0.0562 Epoch: 5 Global Step: 207630 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:02:51,501-Speed 2629.59 samples/sec Loss 10.2311 LearningRate 0.0562 Epoch: 5 Global Step: 207640 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:02:55,404-Speed 2624.57 samples/sec Loss 10.2852 LearningRate 0.0562 Epoch: 5 Global Step: 207650 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:02:59,293-Speed 2633.66 samples/sec Loss 10.1762 LearningRate 0.0562 Epoch: 5 Global Step: 207660 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:03:03,197-Speed 2623.41 samples/sec Loss 10.3404 LearningRate 0.0562 Epoch: 5 Global Step: 207670 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:03:07,091-Speed 2630.61 samples/sec Loss 10.2018 LearningRate 0.0562 Epoch: 5 Global Step: 207680 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:03:10,987-Speed 2628.72 samples/sec Loss 10.3055 LearningRate 0.0562 Epoch: 5 Global Step: 207690 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:03:14,882-Speed 2629.82 samples/sec Loss 10.1820 LearningRate 0.0562 Epoch: 5 Global Step: 207700 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:03:18,784-Speed 2625.51 samples/sec Loss 10.1384 LearningRate 0.0562 Epoch: 5 Global Step: 207710 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:03:22,640-Speed 2655.85 samples/sec Loss 10.0547 LearningRate 0.0562 Epoch: 5 Global Step: 207720 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:03:26,531-Speed 2633.03 samples/sec Loss 10.3268 LearningRate 0.0562 Epoch: 5 Global Step: 207730 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:03:30,437-Speed 2621.95 samples/sec Loss 10.1329 LearningRate 0.0562 Epoch: 5 Global Step: 207740 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:03:34,340-Speed 2624.17 samples/sec Loss 10.0766 LearningRate 0.0562 Epoch: 5 Global Step: 207750 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:03:38,256-Speed 2615.33 samples/sec Loss 10.3164 LearningRate 0.0562 Epoch: 5 Global Step: 207760 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:03:42,161-Speed 2623.57 samples/sec Loss 10.3755 LearningRate 0.0562 Epoch: 5 Global Step: 207770 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:03:46,059-Speed 2627.29 samples/sec Loss 10.2363 LearningRate 0.0562 Epoch: 5 Global Step: 207780 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:03:49,944-Speed 2636.67 samples/sec Loss 10.0848 LearningRate 0.0562 Epoch: 5 Global Step: 207790 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:03:53,868-Speed 2610.43 samples/sec Loss 10.1099 LearningRate 0.0562 Epoch: 5 Global Step: 207800 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:03:57,765-Speed 2628.69 samples/sec Loss 10.1998 LearningRate 0.0562 Epoch: 5 Global Step: 207810 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:04:01,662-Speed 2628.32 samples/sec Loss 10.1235 LearningRate 0.0562 Epoch: 5 Global Step: 207820 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:04:05,554-Speed 2630.98 samples/sec Loss 9.9540 LearningRate 0.0562 Epoch: 5 Global Step: 207830 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:04:09,452-Speed 2628.05 samples/sec Loss 10.0919 LearningRate 0.0562 Epoch: 5 Global Step: 207840 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:04:13,355-Speed 2624.59 samples/sec Loss 10.2729 LearningRate 0.0562 Epoch: 5 Global Step: 207850 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:04:17,249-Speed 2630.25 samples/sec Loss 10.1988 LearningRate 0.0562 Epoch: 5 Global Step: 207860 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:04:21,151-Speed 2624.52 samples/sec Loss 10.2423 LearningRate 0.0562 Epoch: 5 Global Step: 207870 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:04:25,046-Speed 2629.70 samples/sec Loss 10.0729 LearningRate 0.0562 Epoch: 5 Global Step: 207880 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:04:28,940-Speed 2630.40 samples/sec Loss 10.1158 LearningRate 0.0562 Epoch: 5 Global Step: 207890 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:04:32,844-Speed 2623.19 samples/sec Loss 10.1440 LearningRate 0.0562 Epoch: 5 Global Step: 207900 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:04:36,745-Speed 2625.82 samples/sec Loss 10.2384 LearningRate 0.0562 Epoch: 5 Global Step: 207910 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:04:40,626-Speed 2638.71 samples/sec Loss 10.1496 LearningRate 0.0562 Epoch: 5 Global Step: 207920 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:04:44,517-Speed 2631.97 samples/sec Loss 10.2422 LearningRate 0.0562 Epoch: 5 Global Step: 207930 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:04:48,395-Speed 2641.99 samples/sec Loss 10.0844 LearningRate 0.0562 Epoch: 5 Global Step: 207940 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:04:52,287-Speed 2631.61 samples/sec Loss 10.1690 LearningRate 0.0561 Epoch: 5 Global Step: 207950 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:04:56,178-Speed 2632.11 samples/sec Loss 10.1670 LearningRate 0.0561 Epoch: 5 Global Step: 207960 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:05:00,068-Speed 2632.77 samples/sec Loss 10.2187 LearningRate 0.0561 Epoch: 5 Global Step: 207970 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:05:03,965-Speed 2628.70 samples/sec Loss 10.2447 LearningRate 0.0561 Epoch: 5 Global Step: 207980 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:05:07,848-Speed 2637.36 samples/sec Loss 10.1757 LearningRate 0.0561 Epoch: 5 Global Step: 207990 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:05:11,740-Speed 2631.66 samples/sec Loss 10.2504 LearningRate 0.0561 Epoch: 5 Global Step: 208000 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:05:15,628-Speed 2634.06 samples/sec Loss 10.2388 LearningRate 0.0561 Epoch: 5 Global Step: 208010 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:05:19,518-Speed 2633.56 samples/sec Loss 10.2922 LearningRate 0.0561 Epoch: 5 Global Step: 208020 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:05:23,407-Speed 2633.97 samples/sec Loss 10.1399 LearningRate 0.0561 Epoch: 5 Global Step: 208030 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:05:27,296-Speed 2633.42 samples/sec Loss 10.2039 LearningRate 0.0561 Epoch: 5 Global Step: 208040 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:05:31,187-Speed 2632.30 samples/sec Loss 10.1591 LearningRate 0.0561 Epoch: 5 Global Step: 208050 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:05:35,090-Speed 2624.48 samples/sec Loss 10.1732 LearningRate 0.0561 Epoch: 5 Global Step: 208060 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:05:38,989-Speed 2626.73 samples/sec Loss 10.2063 LearningRate 0.0561 Epoch: 5 Global Step: 208070 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:05:42,883-Speed 2629.62 samples/sec Loss 10.2183 LearningRate 0.0561 Epoch: 5 Global Step: 208080 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:05:46,776-Speed 2631.07 samples/sec Loss 10.2398 LearningRate 0.0561 Epoch: 5 Global Step: 208090 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:05:50,674-Speed 2627.44 samples/sec Loss 10.2120 LearningRate 0.0561 Epoch: 5 Global Step: 208100 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:05:54,567-Speed 2631.36 samples/sec Loss 10.0830 LearningRate 0.0561 Epoch: 5 Global Step: 208110 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:05:58,463-Speed 2629.07 samples/sec Loss 10.1203 LearningRate 0.0561 Epoch: 5 Global Step: 208120 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:06:02,358-Speed 2630.09 samples/sec Loss 10.1373 LearningRate 0.0561 Epoch: 5 Global Step: 208130 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:06:06,252-Speed 2629.73 samples/sec Loss 10.1185 LearningRate 0.0561 Epoch: 5 Global Step: 208140 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:06:10,209-Speed 2588.38 samples/sec Loss 10.2124 LearningRate 0.0561 Epoch: 5 Global Step: 208150 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:06:14,106-Speed 2627.98 samples/sec Loss 10.1902 LearningRate 0.0561 Epoch: 5 Global Step: 208160 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:06:18,001-Speed 2629.70 samples/sec Loss 10.2645 LearningRate 0.0561 Epoch: 5 Global Step: 208170 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:06:21,896-Speed 2629.49 samples/sec Loss 10.2559 LearningRate 0.0561 Epoch: 5 Global Step: 208180 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:06:25,772-Speed 2642.49 samples/sec Loss 10.2590 LearningRate 0.0561 Epoch: 5 Global Step: 208190 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:06:29,667-Speed 2630.20 samples/sec Loss 10.1508 LearningRate 0.0561 Epoch: 5 Global Step: 208200 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:06:33,560-Speed 2630.61 samples/sec Loss 10.2492 LearningRate 0.0561 Epoch: 5 Global Step: 208210 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:06:37,462-Speed 2624.85 samples/sec Loss 10.2859 LearningRate 0.0561 Epoch: 5 Global Step: 208220 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:06:41,356-Speed 2630.35 samples/sec Loss 10.1174 LearningRate 0.0561 Epoch: 5 Global Step: 208230 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:06:45,205-Speed 2661.38 samples/sec Loss 10.7692 LearningRate 0.0561 Epoch: 5 Global Step: 208240 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:06:49,058-Speed 2658.29 samples/sec Loss 11.1137 LearningRate 0.0561 Epoch: 5 Global Step: 208250 Fp16 Grad Scale: 4096 Required: 70 hours
Training: 2022-04-13 19:06:52,983-Speed 2609.74 samples/sec Loss 10.7215 LearningRate 0.0561 Epoch: 5 Global Step: 208260 Fp16 Grad Scale: 4096 Required: 70 hours
Training: 2022-04-13 19:06:56,878-Speed 2630.13 samples/sec Loss 10.3438 LearningRate 0.0561 Epoch: 5 Global Step: 208270 Fp16 Grad Scale: 4096 Required: 70 hours
Training: 2022-04-13 19:07:00,770-Speed 2631.61 samples/sec Loss 10.3542 LearningRate 0.0561 Epoch: 5 Global Step: 208280 Fp16 Grad Scale: 4096 Required: 70 hours
Training: 2022-04-13 19:07:04,686-Speed 2615.23 samples/sec Loss 10.2814 LearningRate 0.0561 Epoch: 5 Global Step: 208290 Fp16 Grad Scale: 4096 Required: 70 hours
Training: 2022-04-13 19:07:08,583-Speed 2628.03 samples/sec Loss 10.2290 LearningRate 0.0561 Epoch: 5 Global Step: 208300 Fp16 Grad Scale: 4096 Required: 70 hours
Training: 2022-04-13 19:07:12,473-Speed 2633.27 samples/sec Loss 10.3113 LearningRate 0.0561 Epoch: 5 Global Step: 208310 Fp16 Grad Scale: 4096 Required: 70 hours
Training: 2022-04-13 19:07:16,361-Speed 2634.36 samples/sec Loss 10.3821 LearningRate 0.0561 Epoch: 5 Global Step: 208320 Fp16 Grad Scale: 4096 Required: 70 hours
Training: 2022-04-13 19:07:20,280-Speed 2613.60 samples/sec Loss 10.2131 LearningRate 0.0561 Epoch: 5 Global Step: 208330 Fp16 Grad Scale: 4096 Required: 70 hours
Training: 2022-04-13 19:07:24,173-Speed 2631.32 samples/sec Loss 10.2192 LearningRate 0.0561 Epoch: 5 Global Step: 208340 Fp16 Grad Scale: 4096 Required: 70 hours
Training: 2022-04-13 19:07:28,060-Speed 2635.53 samples/sec Loss 10.2328 LearningRate 0.0561 Epoch: 5 Global Step: 208350 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:07:31,959-Speed 2626.59 samples/sec Loss 10.2900 LearningRate 0.0561 Epoch: 5 Global Step: 208360 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:07:35,848-Speed 2633.84 samples/sec Loss 10.2369 LearningRate 0.0561 Epoch: 5 Global Step: 208370 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:07:39,735-Speed 2634.59 samples/sec Loss 10.1767 LearningRate 0.0561 Epoch: 5 Global Step: 208380 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:07:43,633-Speed 2628.35 samples/sec Loss 10.4207 LearningRate 0.0561 Epoch: 5 Global Step: 208390 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:07:47,524-Speed 2631.83 samples/sec Loss 10.2725 LearningRate 0.0561 Epoch: 5 Global Step: 208400 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:07:51,416-Speed 2632.34 samples/sec Loss 10.1260 LearningRate 0.0561 Epoch: 5 Global Step: 208410 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:07:55,318-Speed 2624.84 samples/sec Loss 10.3363 LearningRate 0.0561 Epoch: 5 Global Step: 208420 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:07:59,205-Speed 2635.03 samples/sec Loss 10.2080 LearningRate 0.0561 Epoch: 5 Global Step: 208430 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:08:03,091-Speed 2635.41 samples/sec Loss 10.2364 LearningRate 0.0561 Epoch: 5 Global Step: 208440 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:08:06,983-Speed 2632.46 samples/sec Loss 10.2732 LearningRate 0.0561 Epoch: 5 Global Step: 208450 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:08:10,874-Speed 2631.86 samples/sec Loss 10.2375 LearningRate 0.0561 Epoch: 5 Global Step: 208460 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:08:14,766-Speed 2632.12 samples/sec Loss 10.2847 LearningRate 0.0561 Epoch: 5 Global Step: 208470 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:08:18,655-Speed 2633.45 samples/sec Loss 10.2029 LearningRate 0.0561 Epoch: 5 Global Step: 208480 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:08:22,545-Speed 2633.54 samples/sec Loss 10.2913 LearningRate 0.0561 Epoch: 5 Global Step: 208490 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:08:26,432-Speed 2634.95 samples/sec Loss 10.1835 LearningRate 0.0561 Epoch: 5 Global Step: 208500 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:08:30,325-Speed 2631.06 samples/sec Loss 10.2781 LearningRate 0.0560 Epoch: 5 Global Step: 208510 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:08:34,238-Speed 2617.69 samples/sec Loss 10.2770 LearningRate 0.0560 Epoch: 5 Global Step: 208520 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:08:38,284-Speed 2531.56 samples/sec Loss 10.2585 LearningRate 0.0560 Epoch: 5 Global Step: 208530 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:08:42,173-Speed 2633.25 samples/sec Loss 10.2131 LearningRate 0.0560 Epoch: 5 Global Step: 208540 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:08:46,060-Speed 2634.78 samples/sec Loss 10.1797 LearningRate 0.0560 Epoch: 5 Global Step: 208550 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:08:49,950-Speed 2632.99 samples/sec Loss 10.1315 LearningRate 0.0560 Epoch: 5 Global Step: 208560 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:08:53,847-Speed 2628.44 samples/sec Loss 10.3651 LearningRate 0.0560 Epoch: 5 Global Step: 208570 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:08:57,740-Speed 2631.27 samples/sec Loss 10.1471 LearningRate 0.0560 Epoch: 5 Global Step: 208580 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:09:01,632-Speed 2631.24 samples/sec Loss 10.3860 LearningRate 0.0560 Epoch: 5 Global Step: 208590 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:09:05,533-Speed 2625.59 samples/sec Loss 10.1614 LearningRate 0.0560 Epoch: 5 Global Step: 208600 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:09:09,423-Speed 2632.90 samples/sec Loss 10.0886 LearningRate 0.0560 Epoch: 5 Global Step: 208610 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:09:13,316-Speed 2631.44 samples/sec Loss 10.2571 LearningRate 0.0560 Epoch: 5 Global Step: 208620 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:09:17,204-Speed 2634.20 samples/sec Loss 10.2493 LearningRate 0.0560 Epoch: 5 Global Step: 208630 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:09:21,101-Speed 2627.78 samples/sec Loss 10.2137 LearningRate 0.0560 Epoch: 5 Global Step: 208640 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:09:24,992-Speed 2632.80 samples/sec Loss 10.2175 LearningRate 0.0560 Epoch: 5 Global Step: 208650 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:09:28,878-Speed 2635.17 samples/sec Loss 10.3209 LearningRate 0.0560 Epoch: 5 Global Step: 208660 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:09:32,769-Speed 2632.67 samples/sec Loss 10.1688 LearningRate 0.0560 Epoch: 5 Global Step: 208670 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:09:36,657-Speed 2634.56 samples/sec Loss 10.2904 LearningRate 0.0560 Epoch: 5 Global Step: 208680 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:09:40,553-Speed 2628.70 samples/sec Loss 10.3774 LearningRate 0.0560 Epoch: 5 Global Step: 208690 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:09:44,507-Speed 2590.23 samples/sec Loss 10.2943 LearningRate 0.0560 Epoch: 5 Global Step: 208700 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:09:48,372-Speed 2650.19 samples/sec Loss 10.8591 LearningRate 0.0560 Epoch: 5 Global Step: 208710 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:09:52,301-Speed 2607.08 samples/sec Loss 10.2306 LearningRate 0.0560 Epoch: 5 Global Step: 208720 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:09:56,204-Speed 2624.07 samples/sec Loss 10.1797 LearningRate 0.0560 Epoch: 5 Global Step: 208730 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:10:00,099-Speed 2629.83 samples/sec Loss 10.2020 LearningRate 0.0560 Epoch: 5 Global Step: 208740 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:10:03,992-Speed 2630.95 samples/sec Loss 10.1961 LearningRate 0.0560 Epoch: 5 Global Step: 208750 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:10:07,881-Speed 2634.10 samples/sec Loss 10.2556 LearningRate 0.0560 Epoch: 5 Global Step: 208760 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:10:11,771-Speed 2632.81 samples/sec Loss 10.0751 LearningRate 0.0560 Epoch: 5 Global Step: 208770 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:10:15,662-Speed 2632.78 samples/sec Loss 10.1959 LearningRate 0.0560 Epoch: 5 Global Step: 208780 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:10:19,554-Speed 2631.85 samples/sec Loss 10.1936 LearningRate 0.0560 Epoch: 5 Global Step: 208790 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:10:23,440-Speed 2635.00 samples/sec Loss 10.1590 LearningRate 0.0560 Epoch: 5 Global Step: 208800 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:10:27,329-Speed 2634.28 samples/sec Loss 10.1302 LearningRate 0.0560 Epoch: 5 Global Step: 208810 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:10:31,222-Speed 2630.89 samples/sec Loss 10.2293 LearningRate 0.0560 Epoch: 5 Global Step: 208820 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:10:35,116-Speed 2630.82 samples/sec Loss 10.1616 LearningRate 0.0560 Epoch: 5 Global Step: 208830 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:10:39,009-Speed 2630.66 samples/sec Loss 10.4207 LearningRate 0.0560 Epoch: 5 Global Step: 208840 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:10:42,900-Speed 2632.43 samples/sec Loss 10.2717 LearningRate 0.0560 Epoch: 5 Global Step: 208850 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:10:46,828-Speed 2607.43 samples/sec Loss 10.1430 LearningRate 0.0560 Epoch: 5 Global Step: 208860 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:10:50,718-Speed 2633.59 samples/sec Loss 10.2189 LearningRate 0.0560 Epoch: 5 Global Step: 208870 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:10:54,611-Speed 2630.71 samples/sec Loss 10.3740 LearningRate 0.0560 Epoch: 5 Global Step: 208880 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:10:58,504-Speed 2630.92 samples/sec Loss 10.2807 LearningRate 0.0560 Epoch: 5 Global Step: 208890 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:11:02,407-Speed 2624.66 samples/sec Loss 10.3255 LearningRate 0.0560 Epoch: 5 Global Step: 208900 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:11:06,294-Speed 2635.07 samples/sec Loss 10.2573 LearningRate 0.0560 Epoch: 5 Global Step: 208910 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:11:10,187-Speed 2630.64 samples/sec Loss 10.2411 LearningRate 0.0560 Epoch: 5 Global Step: 208920 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:11:14,114-Speed 2608.72 samples/sec Loss 10.2109 LearningRate 0.0560 Epoch: 5 Global Step: 208930 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:11:18,009-Speed 2629.71 samples/sec Loss 10.1777 LearningRate 0.0560 Epoch: 5 Global Step: 208940 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:11:21,897-Speed 2633.94 samples/sec Loss 10.2104 LearningRate 0.0560 Epoch: 5 Global Step: 208950 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:11:25,788-Speed 2632.26 samples/sec Loss 10.1081 LearningRate 0.0560 Epoch: 5 Global Step: 208960 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:11:29,677-Speed 2633.66 samples/sec Loss 10.2000 LearningRate 0.0560 Epoch: 5 Global Step: 208970 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:11:33,569-Speed 2631.90 samples/sec Loss 10.0917 LearningRate 0.0560 Epoch: 5 Global Step: 208980 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:11:37,459-Speed 2632.55 samples/sec Loss 10.0864 LearningRate 0.0560 Epoch: 5 Global Step: 208990 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:11:41,353-Speed 2630.54 samples/sec Loss 10.2728 LearningRate 0.0560 Epoch: 5 Global Step: 209000 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:11:45,248-Speed 2629.55 samples/sec Loss 10.2300 LearningRate 0.0560 Epoch: 5 Global Step: 209010 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:11:49,140-Speed 2631.73 samples/sec Loss 10.2805 LearningRate 0.0560 Epoch: 5 Global Step: 209020 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:11:53,033-Speed 2631.19 samples/sec Loss 10.2026 LearningRate 0.0560 Epoch: 5 Global Step: 209030 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:11:56,927-Speed 2630.21 samples/sec Loss 10.1828 LearningRate 0.0560 Epoch: 5 Global Step: 209040 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:12:00,833-Speed 2622.16 samples/sec Loss 10.2208 LearningRate 0.0560 Epoch: 5 Global Step: 209050 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:12:04,726-Speed 2630.62 samples/sec Loss 10.1583 LearningRate 0.0559 Epoch: 5 Global Step: 209060 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:12:08,621-Speed 2629.48 samples/sec Loss 10.3044 LearningRate 0.0559 Epoch: 5 Global Step: 209070 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:12:12,517-Speed 2628.85 samples/sec Loss 10.3321 LearningRate 0.0559 Epoch: 5 Global Step: 209080 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:12:16,420-Speed 2624.65 samples/sec Loss 10.2407 LearningRate 0.0559 Epoch: 5 Global Step: 209090 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:12:20,391-Speed 2579.58 samples/sec Loss 10.1396 LearningRate 0.0559 Epoch: 5 Global Step: 209100 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:12:24,284-Speed 2630.29 samples/sec Loss 10.3872 LearningRate 0.0559 Epoch: 5 Global Step: 209110 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:12:28,179-Speed 2629.68 samples/sec Loss 10.1163 LearningRate 0.0559 Epoch: 5 Global Step: 209120 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:12:32,076-Speed 2628.12 samples/sec Loss 10.1792 LearningRate 0.0559 Epoch: 5 Global Step: 209130 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:12:35,969-Speed 2631.52 samples/sec Loss 10.1904 LearningRate 0.0559 Epoch: 5 Global Step: 209140 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:12:39,866-Speed 2628.55 samples/sec Loss 10.2136 LearningRate 0.0559 Epoch: 5 Global Step: 209150 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:12:43,753-Speed 2635.21 samples/sec Loss 10.1372 LearningRate 0.0559 Epoch: 5 Global Step: 209160 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:12:47,646-Speed 2630.88 samples/sec Loss 10.3020 LearningRate 0.0559 Epoch: 5 Global Step: 209170 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:12:51,538-Speed 2631.25 samples/sec Loss 10.2102 LearningRate 0.0559 Epoch: 5 Global Step: 209180 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:12:56,293-Speed 2154.50 samples/sec Loss 10.1995 LearningRate 0.0559 Epoch: 5 Global Step: 209190 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:13:00,188-Speed 2629.58 samples/sec Loss 10.2636 LearningRate 0.0559 Epoch: 5 Global Step: 209200 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:13:04,079-Speed 2632.19 samples/sec Loss 10.0835 LearningRate 0.0559 Epoch: 5 Global Step: 209210 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:13:07,969-Speed 2633.03 samples/sec Loss 10.1723 LearningRate 0.0559 Epoch: 5 Global Step: 209220 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:13:11,866-Speed 2628.13 samples/sec Loss 10.3657 LearningRate 0.0559 Epoch: 5 Global Step: 209230 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:13:15,761-Speed 2629.88 samples/sec Loss 10.1915 LearningRate 0.0559 Epoch: 5 Global Step: 209240 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:13:19,657-Speed 2629.79 samples/sec Loss 10.2623 LearningRate 0.0559 Epoch: 5 Global Step: 209250 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:13:23,552-Speed 2629.11 samples/sec Loss 10.2230 LearningRate 0.0559 Epoch: 5 Global Step: 209260 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:13:27,428-Speed 2642.78 samples/sec Loss 10.1954 LearningRate 0.0559 Epoch: 5 Global Step: 209270 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:13:31,396-Speed 2581.46 samples/sec Loss 10.2502 LearningRate 0.0559 Epoch: 5 Global Step: 209280 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:13:35,321-Speed 2609.20 samples/sec Loss 10.2160 LearningRate 0.0559 Epoch: 5 Global Step: 209290 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:13:39,274-Speed 2591.25 samples/sec Loss 10.1025 LearningRate 0.0559 Epoch: 5 Global Step: 209300 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:13:43,170-Speed 2629.14 samples/sec Loss 10.3507 LearningRate 0.0559 Epoch: 5 Global Step: 209310 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:13:47,062-Speed 2631.68 samples/sec Loss 10.2316 LearningRate 0.0559 Epoch: 5 Global Step: 209320 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:13:50,964-Speed 2625.11 samples/sec Loss 10.1128 LearningRate 0.0559 Epoch: 5 Global Step: 209330 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:13:54,860-Speed 2629.37 samples/sec Loss 10.2683 LearningRate 0.0559 Epoch: 5 Global Step: 209340 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:13:58,749-Speed 2634.19 samples/sec Loss 10.1848 LearningRate 0.0559 Epoch: 5 Global Step: 209350 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:14:02,640-Speed 2631.67 samples/sec Loss 10.3750 LearningRate 0.0559 Epoch: 5 Global Step: 209360 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:14:06,665-Speed 2545.12 samples/sec Loss 10.1466 LearningRate 0.0559 Epoch: 5 Global Step: 209370 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:14:10,756-Speed 2503.36 samples/sec Loss 10.3312 LearningRate 0.0559 Epoch: 5 Global Step: 209380 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:14:14,795-Speed 2536.43 samples/sec Loss 10.2260 LearningRate 0.0559 Epoch: 5 Global Step: 209390 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:14:18,710-Speed 2615.56 samples/sec Loss 10.2792 LearningRate 0.0559 Epoch: 5 Global Step: 209400 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:14:22,606-Speed 2629.64 samples/sec Loss 10.1602 LearningRate 0.0559 Epoch: 5 Global Step: 209410 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:14:26,500-Speed 2630.04 samples/sec Loss 10.1357 LearningRate 0.0559 Epoch: 5 Global Step: 209420 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:14:30,415-Speed 2616.57 samples/sec Loss 10.1345 LearningRate 0.0559 Epoch: 5 Global Step: 209430 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:14:34,322-Speed 2621.61 samples/sec Loss 10.2345 LearningRate 0.0559 Epoch: 5 Global Step: 209440 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:14:38,245-Speed 2610.77 samples/sec Loss 10.2325 LearningRate 0.0559 Epoch: 5 Global Step: 209450 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:14:42,147-Speed 2624.55 samples/sec Loss 10.2051 LearningRate 0.0559 Epoch: 5 Global Step: 209460 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:14:46,051-Speed 2624.65 samples/sec Loss 10.2689 LearningRate 0.0559 Epoch: 5 Global Step: 209470 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:14:49,935-Speed 2637.18 samples/sec Loss 10.1544 LearningRate 0.0559 Epoch: 5 Global Step: 209480 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:14:53,839-Speed 2623.84 samples/sec Loss 10.2268 LearningRate 0.0559 Epoch: 5 Global Step: 209490 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:14:57,760-Speed 2612.32 samples/sec Loss 10.1811 LearningRate 0.0559 Epoch: 5 Global Step: 209500 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:15:01,667-Speed 2621.68 samples/sec Loss 10.3559 LearningRate 0.0559 Epoch: 5 Global Step: 209510 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:15:05,558-Speed 2632.12 samples/sec Loss 10.1576 LearningRate 0.0559 Epoch: 5 Global Step: 209520 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:15:09,449-Speed 2632.06 samples/sec Loss 10.1893 LearningRate 0.0559 Epoch: 5 Global Step: 209530 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:15:13,346-Speed 2629.07 samples/sec Loss 10.1281 LearningRate 0.0559 Epoch: 5 Global Step: 209540 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:15:17,236-Speed 2632.61 samples/sec Loss 10.2050 LearningRate 0.0559 Epoch: 5 Global Step: 209550 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:15:21,125-Speed 2633.86 samples/sec Loss 10.2056 LearningRate 0.0559 Epoch: 5 Global Step: 209560 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:15:25,029-Speed 2623.50 samples/sec Loss 10.1803 LearningRate 0.0559 Epoch: 5 Global Step: 209570 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:15:28,910-Speed 2639.14 samples/sec Loss 10.2458 LearningRate 0.0559 Epoch: 5 Global Step: 209580 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:15:32,807-Speed 2628.67 samples/sec Loss 10.1349 LearningRate 0.0559 Epoch: 5 Global Step: 209590 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:15:36,701-Speed 2629.91 samples/sec Loss 10.2076 LearningRate 0.0559 Epoch: 5 Global Step: 209600 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:15:40,594-Speed 2630.62 samples/sec Loss 10.1696 LearningRate 0.0559 Epoch: 5 Global Step: 209610 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:15:44,484-Speed 2633.62 samples/sec Loss 10.2533 LearningRate 0.0558 Epoch: 5 Global Step: 209620 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:15:48,379-Speed 2629.70 samples/sec Loss 10.2718 LearningRate 0.0558 Epoch: 5 Global Step: 209630 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:15:52,279-Speed 2626.31 samples/sec Loss 10.1063 LearningRate 0.0558 Epoch: 5 Global Step: 209640 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:15:56,198-Speed 2613.42 samples/sec Loss 10.0807 LearningRate 0.0558 Epoch: 5 Global Step: 209650 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:16:00,094-Speed 2629.21 samples/sec Loss 10.1517 LearningRate 0.0558 Epoch: 5 Global Step: 209660 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:16:03,986-Speed 2631.97 samples/sec Loss 10.1760 LearningRate 0.0558 Epoch: 5 Global Step: 209670 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:16:07,879-Speed 2630.72 samples/sec Loss 10.1047 LearningRate 0.0558 Epoch: 5 Global Step: 209680 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:16:11,757-Speed 2641.56 samples/sec Loss 10.1884 LearningRate 0.0558 Epoch: 5 Global Step: 209690 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:16:15,632-Speed 2643.34 samples/sec Loss 10.2628 LearningRate 0.0558 Epoch: 5 Global Step: 209700 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:16:19,527-Speed 2629.30 samples/sec Loss 10.2766 LearningRate 0.0558 Epoch: 5 Global Step: 209710 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:16:23,419-Speed 2631.55 samples/sec Loss 10.3392 LearningRate 0.0558 Epoch: 5 Global Step: 209720 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:16:27,315-Speed 2629.40 samples/sec Loss 10.1931 LearningRate 0.0558 Epoch: 5 Global Step: 209730 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:16:31,216-Speed 2625.67 samples/sec Loss 10.0785 LearningRate 0.0558 Epoch: 5 Global Step: 209740 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:16:35,122-Speed 2622.57 samples/sec Loss 10.1159 LearningRate 0.0558 Epoch: 5 Global Step: 209750 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:16:39,034-Speed 2617.83 samples/sec Loss 10.1624 LearningRate 0.0558 Epoch: 5 Global Step: 209760 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:16:42,936-Speed 2625.15 samples/sec Loss 10.2672 LearningRate 0.0558 Epoch: 5 Global Step: 209770 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:16:46,842-Speed 2622.36 samples/sec Loss 10.2930 LearningRate 0.0558 Epoch: 5 Global Step: 209780 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:16:50,733-Speed 2632.23 samples/sec Loss 10.2097 LearningRate 0.0558 Epoch: 5 Global Step: 209790 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:16:54,625-Speed 2631.76 samples/sec Loss 10.0654 LearningRate 0.0558 Epoch: 5 Global Step: 209800 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:16:58,522-Speed 2628.63 samples/sec Loss 10.0295 LearningRate 0.0558 Epoch: 5 Global Step: 209810 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:17:02,412-Speed 2632.88 samples/sec Loss 10.1445 LearningRate 0.0558 Epoch: 5 Global Step: 209820 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:17:06,309-Speed 2628.22 samples/sec Loss 10.2062 LearningRate 0.0558 Epoch: 5 Global Step: 209830 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:17:10,187-Speed 2641.49 samples/sec Loss 10.1323 LearningRate 0.0558 Epoch: 5 Global Step: 209840 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:17:14,083-Speed 2628.88 samples/sec Loss 10.1948 LearningRate 0.0558 Epoch: 5 Global Step: 209850 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:17:17,975-Speed 2631.32 samples/sec Loss 10.1950 LearningRate 0.0558 Epoch: 5 Global Step: 209860 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:17:21,871-Speed 2629.14 samples/sec Loss 10.2500 LearningRate 0.0558 Epoch: 5 Global Step: 209870 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:17:25,819-Speed 2594.39 samples/sec Loss 10.1438 LearningRate 0.0558 Epoch: 5 Global Step: 209880 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:17:29,712-Speed 2631.26 samples/sec Loss 10.2083 LearningRate 0.0558 Epoch: 5 Global Step: 209890 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:17:33,604-Speed 2631.73 samples/sec Loss 10.2260 LearningRate 0.0558 Epoch: 5 Global Step: 209900 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:17:37,494-Speed 2632.76 samples/sec Loss 10.2018 LearningRate 0.0558 Epoch: 5 Global Step: 209910 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:17:41,386-Speed 2632.20 samples/sec Loss 10.1746 LearningRate 0.0558 Epoch: 5 Global Step: 209920 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:17:45,277-Speed 2632.29 samples/sec Loss 10.2572 LearningRate 0.0558 Epoch: 5 Global Step: 209930 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:17:49,171-Speed 2630.46 samples/sec Loss 10.1229 LearningRate 0.0558 Epoch: 5 Global Step: 209940 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:17:53,070-Speed 2626.59 samples/sec Loss 10.0628 LearningRate 0.0558 Epoch: 5 Global Step: 209950 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:17:56,967-Speed 2628.67 samples/sec Loss 10.2470 LearningRate 0.0558 Epoch: 5 Global Step: 209960 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:18:00,864-Speed 2628.37 samples/sec Loss 10.1227 LearningRate 0.0558 Epoch: 5 Global Step: 209970 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:18:04,764-Speed 2626.74 samples/sec Loss 10.1346 LearningRate 0.0558 Epoch: 5 Global Step: 209980 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:18:08,661-Speed 2627.69 samples/sec Loss 10.1441 LearningRate 0.0558 Epoch: 5 Global Step: 209990 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:18:12,555-Speed 2630.29 samples/sec Loss 10.1887 LearningRate 0.0558 Epoch: 5 Global Step: 210000 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:18:55,793-[lfw][210000]XNorm: 24.450740
Training: 2022-04-13 19:18:55,794-[lfw][210000]Accuracy-Flip: 0.99650+-0.00345
Training: 2022-04-13 19:18:55,794-[lfw][210000]Accuracy-Highest: 0.99783
Training: 2022-04-13 19:19:45,881-[cfp_fp][210000]XNorm: 22.013891
Training: 2022-04-13 19:19:45,881-[cfp_fp][210000]Accuracy-Flip: 0.98214+-0.00607
Training: 2022-04-13 19:19:45,883-[cfp_fp][210000]Accuracy-Highest: 0.98314
Training: 2022-04-13 19:20:28,899-[agedb_30][210000]XNorm: 24.174806
Training: 2022-04-13 19:20:28,900-[agedb_30][210000]Accuracy-Flip: 0.96983+-0.00769
Training: 2022-04-13 19:20:28,901-[agedb_30][210000]Accuracy-Highest: 0.97150
Training: 2022-04-13 19:20:32,768-Speed 73.03 samples/sec Loss 10.1919 LearningRate 0.0558 Epoch: 5 Global Step: 210010 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:20:36,627-Speed 2654.07 samples/sec Loss 10.1917 LearningRate 0.0558 Epoch: 5 Global Step: 210020 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:20:40,493-Speed 2649.14 samples/sec Loss 10.2132 LearningRate 0.0558 Epoch: 5 Global Step: 210030 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:20:44,473-Speed 2573.71 samples/sec Loss 10.1918 LearningRate 0.0558 Epoch: 5 Global Step: 210040 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:20:48,349-Speed 2642.72 samples/sec Loss 9.9840 LearningRate 0.0558 Epoch: 5 Global Step: 210050 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:20:52,219-Speed 2646.49 samples/sec Loss 10.1846 LearningRate 0.0558 Epoch: 5 Global Step: 210060 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:20:56,128-Speed 2620.13 samples/sec Loss 10.2476 LearningRate 0.0558 Epoch: 5 Global Step: 210070 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:21:00,005-Speed 2642.23 samples/sec Loss 10.1166 LearningRate 0.0558 Epoch: 5 Global Step: 210080 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:21:03,885-Speed 2639.99 samples/sec Loss 10.1512 LearningRate 0.0558 Epoch: 5 Global Step: 210090 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:21:07,763-Speed 2640.97 samples/sec Loss 10.3491 LearningRate 0.0558 Epoch: 5 Global Step: 210100 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:21:11,651-Speed 2634.53 samples/sec Loss 10.2785 LearningRate 0.0558 Epoch: 5 Global Step: 210110 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:21:15,535-Speed 2638.02 samples/sec Loss 10.2470 LearningRate 0.0558 Epoch: 5 Global Step: 210120 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:21:19,423-Speed 2634.80 samples/sec Loss 10.0660 LearningRate 0.0558 Epoch: 5 Global Step: 210130 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:21:23,349-Speed 2608.46 samples/sec Loss 10.1811 LearningRate 0.0558 Epoch: 5 Global Step: 210140 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:21:27,239-Speed 2633.48 samples/sec Loss 10.1758 LearningRate 0.0558 Epoch: 5 Global Step: 210150 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:21:31,124-Speed 2636.29 samples/sec Loss 10.0067 LearningRate 0.0558 Epoch: 5 Global Step: 210160 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:21:34,996-Speed 2645.20 samples/sec Loss 10.2034 LearningRate 0.0557 Epoch: 5 Global Step: 210170 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:21:38,886-Speed 2633.23 samples/sec Loss 10.2643 LearningRate 0.0557 Epoch: 5 Global Step: 210180 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:21:42,776-Speed 2633.22 samples/sec Loss 10.1790 LearningRate 0.0557 Epoch: 5 Global Step: 210190 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:21:46,665-Speed 2633.87 samples/sec Loss 10.2525 LearningRate 0.0557 Epoch: 5 Global Step: 210200 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:21:50,572-Speed 2621.50 samples/sec Loss 10.2492 LearningRate 0.0557 Epoch: 5 Global Step: 210210 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:21:54,491-Speed 2613.27 samples/sec Loss 10.1375 LearningRate 0.0557 Epoch: 5 Global Step: 210220 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:21:58,391-Speed 2626.65 samples/sec Loss 10.0901 LearningRate 0.0557 Epoch: 5 Global Step: 210230 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:22:02,285-Speed 2630.37 samples/sec Loss 10.1774 LearningRate 0.0557 Epoch: 5 Global Step: 210240 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:22:06,241-Speed 2589.12 samples/sec Loss 10.1780 LearningRate 0.0557 Epoch: 5 Global Step: 210250 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:22:10,133-Speed 2632.32 samples/sec Loss 10.1509 LearningRate 0.0557 Epoch: 5 Global Step: 210260 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:22:14,029-Speed 2628.96 samples/sec Loss 10.0641 LearningRate 0.0557 Epoch: 5 Global Step: 210270 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:22:17,927-Speed 2627.25 samples/sec Loss 10.2250 LearningRate 0.0557 Epoch: 5 Global Step: 210280 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:22:21,825-Speed 2628.30 samples/sec Loss 10.2398 LearningRate 0.0557 Epoch: 5 Global Step: 210290 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:22:25,720-Speed 2629.02 samples/sec Loss 10.0868 LearningRate 0.0557 Epoch: 5 Global Step: 210300 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:22:29,613-Speed 2630.92 samples/sec Loss 10.0943 LearningRate 0.0557 Epoch: 5 Global Step: 210310 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:22:33,510-Speed 2629.11 samples/sec Loss 10.1441 LearningRate 0.0557 Epoch: 5 Global Step: 210320 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:22:37,400-Speed 2633.17 samples/sec Loss 10.1125 LearningRate 0.0557 Epoch: 5 Global Step: 210330 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:22:41,290-Speed 2633.22 samples/sec Loss 10.2013 LearningRate 0.0557 Epoch: 5 Global Step: 210340 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:22:45,186-Speed 2628.84 samples/sec Loss 10.0513 LearningRate 0.0557 Epoch: 5 Global Step: 210350 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:22:49,076-Speed 2633.25 samples/sec Loss 10.1727 LearningRate 0.0557 Epoch: 5 Global Step: 210360 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:22:52,965-Speed 2633.52 samples/sec Loss 10.3022 LearningRate 0.0557 Epoch: 5 Global Step: 210370 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:22:56,856-Speed 2632.18 samples/sec Loss 10.2713 LearningRate 0.0557 Epoch: 5 Global Step: 210380 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:23:00,749-Speed 2631.23 samples/sec Loss 10.1817 LearningRate 0.0557 Epoch: 5 Global Step: 210390 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:23:04,645-Speed 2628.66 samples/sec Loss 10.2585 LearningRate 0.0557 Epoch: 5 Global Step: 210400 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:23:08,540-Speed 2629.76 samples/sec Loss 10.0575 LearningRate 0.0557 Epoch: 5 Global Step: 210410 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:23:12,480-Speed 2600.00 samples/sec Loss 10.2934 LearningRate 0.0557 Epoch: 5 Global Step: 210420 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:23:16,378-Speed 2628.13 samples/sec Loss 10.2617 LearningRate 0.0557 Epoch: 5 Global Step: 210430 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:23:20,284-Speed 2622.09 samples/sec Loss 10.2623 LearningRate 0.0557 Epoch: 5 Global Step: 210440 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:23:24,225-Speed 2598.95 samples/sec Loss 10.1867 LearningRate 0.0557 Epoch: 5 Global Step: 210450 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:23:28,092-Speed 2648.63 samples/sec Loss 10.2117 LearningRate 0.0557 Epoch: 5 Global Step: 210460 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:23:31,988-Speed 2628.57 samples/sec Loss 9.9907 LearningRate 0.0557 Epoch: 5 Global Step: 210470 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:23:35,894-Speed 2622.88 samples/sec Loss 10.1518 LearningRate 0.0557 Epoch: 5 Global Step: 210480 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:23:39,800-Speed 2621.98 samples/sec Loss 10.1344 LearningRate 0.0557 Epoch: 5 Global Step: 210490 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:23:43,705-Speed 2622.94 samples/sec Loss 10.3725 LearningRate 0.0557 Epoch: 5 Global Step: 210500 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:23:47,607-Speed 2624.83 samples/sec Loss 10.1875 LearningRate 0.0557 Epoch: 5 Global Step: 210510 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:23:51,514-Speed 2621.92 samples/sec Loss 10.2259 LearningRate 0.0557 Epoch: 5 Global Step: 210520 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:23:55,427-Speed 2616.96 samples/sec Loss 10.2289 LearningRate 0.0557 Epoch: 5 Global Step: 210530 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:23:59,339-Speed 2618.60 samples/sec Loss 10.1435 LearningRate 0.0557 Epoch: 5 Global Step: 210540 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:24:03,239-Speed 2625.80 samples/sec Loss 10.1141 LearningRate 0.0557 Epoch: 5 Global Step: 210550 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:24:07,137-Speed 2627.47 samples/sec Loss 10.2015 LearningRate 0.0557 Epoch: 5 Global Step: 210560 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:24:11,034-Speed 2628.27 samples/sec Loss 9.9223 LearningRate 0.0557 Epoch: 5 Global Step: 210570 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:24:14,934-Speed 2626.37 samples/sec Loss 10.1115 LearningRate 0.0557 Epoch: 5 Global Step: 210580 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:24:18,835-Speed 2625.83 samples/sec Loss 10.1705 LearningRate 0.0557 Epoch: 5 Global Step: 210590 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:24:22,733-Speed 2627.47 samples/sec Loss 10.1948 LearningRate 0.0557 Epoch: 5 Global Step: 210600 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:24:26,630-Speed 2628.19 samples/sec Loss 10.0486 LearningRate 0.0557 Epoch: 5 Global Step: 210610 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:24:30,518-Speed 2634.11 samples/sec Loss 10.2177 LearningRate 0.0557 Epoch: 5 Global Step: 210620 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:24:34,414-Speed 2629.19 samples/sec Loss 10.1990 LearningRate 0.0557 Epoch: 5 Global Step: 210630 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:24:38,317-Speed 2624.08 samples/sec Loss 10.3333 LearningRate 0.0557 Epoch: 5 Global Step: 210640 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:24:42,218-Speed 2625.34 samples/sec Loss 10.2414 LearningRate 0.0557 Epoch: 5 Global Step: 210650 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:24:46,117-Speed 2627.62 samples/sec Loss 10.1277 LearningRate 0.0557 Epoch: 5 Global Step: 210660 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:24:50,028-Speed 2618.48 samples/sec Loss 10.1846 LearningRate 0.0557 Epoch: 5 Global Step: 210670 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:24:53,927-Speed 2626.68 samples/sec Loss 10.2920 LearningRate 0.0557 Epoch: 5 Global Step: 210680 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:24:57,829-Speed 2625.57 samples/sec Loss 10.0526 LearningRate 0.0557 Epoch: 5 Global Step: 210690 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:25:01,728-Speed 2626.43 samples/sec Loss 10.1982 LearningRate 0.0557 Epoch: 5 Global Step: 210700 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:25:05,638-Speed 2619.47 samples/sec Loss 10.1887 LearningRate 0.0557 Epoch: 5 Global Step: 210710 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:25:09,539-Speed 2625.62 samples/sec Loss 10.2778 LearningRate 0.0557 Epoch: 5 Global Step: 210720 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:25:13,478-Speed 2600.05 samples/sec Loss 10.1750 LearningRate 0.0556 Epoch: 5 Global Step: 210730 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:25:17,380-Speed 2624.92 samples/sec Loss 10.1711 LearningRate 0.0556 Epoch: 5 Global Step: 210740 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:25:21,285-Speed 2623.29 samples/sec Loss 9.9844 LearningRate 0.0556 Epoch: 5 Global Step: 210750 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:25:25,188-Speed 2624.48 samples/sec Loss 10.1572 LearningRate 0.0556 Epoch: 5 Global Step: 210760 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:25:29,098-Speed 2619.07 samples/sec Loss 10.1847 LearningRate 0.0556 Epoch: 5 Global Step: 210770 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:25:32,997-Speed 2626.77 samples/sec Loss 10.0585 LearningRate 0.0556 Epoch: 5 Global Step: 210780 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:25:36,914-Speed 2614.64 samples/sec Loss 10.2087 LearningRate 0.0556 Epoch: 5 Global Step: 210790 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:25:40,824-Speed 2620.16 samples/sec Loss 10.3117 LearningRate 0.0556 Epoch: 5 Global Step: 210800 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:25:44,732-Speed 2620.94 samples/sec Loss 10.1242 LearningRate 0.0556 Epoch: 5 Global Step: 210810 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:25:48,633-Speed 2625.20 samples/sec Loss 10.1794 LearningRate 0.0556 Epoch: 5 Global Step: 210820 Fp16 Grad Scale: 262144 Required: 70 hours
Training: 2022-04-13 19:25:52,471-Speed 2668.48 samples/sec Loss 10.2761 LearningRate 0.0556 Epoch: 5 Global Step: 210830 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:25:56,345-Speed 2644.39 samples/sec Loss 10.3384 LearningRate 0.0556 Epoch: 5 Global Step: 210840 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:26:00,242-Speed 2627.94 samples/sec Loss 10.1931 LearningRate 0.0556 Epoch: 5 Global Step: 210850 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:26:04,149-Speed 2621.52 samples/sec Loss 10.1985 LearningRate 0.0556 Epoch: 5 Global Step: 210860 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:26:08,058-Speed 2620.29 samples/sec Loss 10.2912 LearningRate 0.0556 Epoch: 5 Global Step: 210870 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:26:11,968-Speed 2619.36 samples/sec Loss 10.2146 LearningRate 0.0556 Epoch: 5 Global Step: 210880 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:26:15,866-Speed 2627.52 samples/sec Loss 10.0060 LearningRate 0.0556 Epoch: 5 Global Step: 210890 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:26:19,769-Speed 2624.59 samples/sec Loss 10.2251 LearningRate 0.0556 Epoch: 5 Global Step: 210900 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:26:23,674-Speed 2622.73 samples/sec Loss 10.3475 LearningRate 0.0556 Epoch: 5 Global Step: 210910 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:26:27,572-Speed 2627.83 samples/sec Loss 10.1959 LearningRate 0.0556 Epoch: 5 Global Step: 210920 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:26:31,471-Speed 2626.95 samples/sec Loss 10.2786 LearningRate 0.0556 Epoch: 5 Global Step: 210930 Fp16 Grad Scale: 8192 Required: 70 hours
Training: 2022-04-13 19:26:35,370-Speed 2626.33 samples/sec Loss 10.0667 LearningRate 0.0556 Epoch: 5 Global Step: 210940 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:26:39,267-Speed 2628.38 samples/sec Loss 10.1806 LearningRate 0.0556 Epoch: 5 Global Step: 210950 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:26:43,165-Speed 2627.86 samples/sec Loss 10.1512 LearningRate 0.0556 Epoch: 5 Global Step: 210960 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:26:47,066-Speed 2624.99 samples/sec Loss 10.1626 LearningRate 0.0556 Epoch: 5 Global Step: 210970 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:26:50,979-Speed 2617.80 samples/sec Loss 10.2326 LearningRate 0.0556 Epoch: 5 Global Step: 210980 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:26:54,888-Speed 2620.57 samples/sec Loss 10.2345 LearningRate 0.0556 Epoch: 5 Global Step: 210990 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:26:58,790-Speed 2624.91 samples/sec Loss 10.0684 LearningRate 0.0556 Epoch: 5 Global Step: 211000 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:27:02,691-Speed 2625.23 samples/sec Loss 10.3161 LearningRate 0.0556 Epoch: 5 Global Step: 211010 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:27:06,603-Speed 2618.53 samples/sec Loss 10.1371 LearningRate 0.0556 Epoch: 5 Global Step: 211020 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:27:10,500-Speed 2627.79 samples/sec Loss 10.1921 LearningRate 0.0556 Epoch: 5 Global Step: 211030 Fp16 Grad Scale: 16384 Required: 70 hours
Training: 2022-04-13 19:27:14,401-Speed 2626.02 samples/sec Loss 10.1511 LearningRate 0.0556 Epoch: 5 Global Step: 211040 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:27:18,316-Speed 2616.79 samples/sec Loss 10.1384 LearningRate 0.0556 Epoch: 5 Global Step: 211050 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:27:22,266-Speed 2592.92 samples/sec Loss 10.2221 LearningRate 0.0556 Epoch: 5 Global Step: 211060 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:27:26,167-Speed 2625.65 samples/sec Loss 10.3201 LearningRate 0.0556 Epoch: 5 Global Step: 211070 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:27:30,076-Speed 2619.86 samples/sec Loss 10.1340 LearningRate 0.0556 Epoch: 5 Global Step: 211080 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:27:33,983-Speed 2621.65 samples/sec Loss 10.1888 LearningRate 0.0556 Epoch: 5 Global Step: 211090 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:27:37,892-Speed 2620.23 samples/sec Loss 10.0708 LearningRate 0.0556 Epoch: 5 Global Step: 211100 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:27:41,801-Speed 2620.52 samples/sec Loss 10.2007 LearningRate 0.0556 Epoch: 5 Global Step: 211110 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:27:45,739-Speed 2600.74 samples/sec Loss 10.1394 LearningRate 0.0556 Epoch: 5 Global Step: 211120 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:27:49,816-Speed 2512.36 samples/sec Loss 10.1230 LearningRate 0.0556 Epoch: 5 Global Step: 211130 Fp16 Grad Scale: 32768 Required: 70 hours
Training: 2022-04-13 19:27:53,806-Speed 2566.61 samples/sec Loss 10.1460 LearningRate 0.0556 Epoch: 5 Global Step: 211140 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:27:57,723-Speed 2614.70 samples/sec Loss 9.9662 LearningRate 0.0556 Epoch: 5 Global Step: 211150 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:28:01,628-Speed 2622.78 samples/sec Loss 10.1193 LearningRate 0.0556 Epoch: 5 Global Step: 211160 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:28:05,536-Speed 2621.00 samples/sec Loss 10.0186 LearningRate 0.0556 Epoch: 5 Global Step: 211170 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:28:09,450-Speed 2616.79 samples/sec Loss 10.1189 LearningRate 0.0556 Epoch: 5 Global Step: 211180 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:28:13,356-Speed 2622.39 samples/sec Loss 10.2578 LearningRate 0.0556 Epoch: 5 Global Step: 211190 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:28:17,262-Speed 2622.11 samples/sec Loss 10.0660 LearningRate 0.0556 Epoch: 5 Global Step: 211200 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:28:21,175-Speed 2617.80 samples/sec Loss 9.9885 LearningRate 0.0556 Epoch: 5 Global Step: 211210 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:28:25,082-Speed 2621.39 samples/sec Loss 10.0794 LearningRate 0.0556 Epoch: 5 Global Step: 211220 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:28:28,985-Speed 2624.14 samples/sec Loss 10.1331 LearningRate 0.0556 Epoch: 5 Global Step: 211230 Fp16 Grad Scale: 65536 Required: 70 hours
Training: 2022-04-13 19:28:32,891-Speed 2622.28 samples/sec Loss 10.0917 LearningRate 0.0556 Epoch: 5 Global Step: 211240 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:28:36,804-Speed 2617.53 samples/sec Loss 10.1733 LearningRate 0.0556 Epoch: 5 Global Step: 211250 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:28:40,704-Speed 2626.28 samples/sec Loss 10.0458 LearningRate 0.0556 Epoch: 5 Global Step: 211260 Fp16 Grad Scale: 131072 Required: 70 hours
Training: 2022-04-13 19:28:44,609-Speed 2622.50 samples/sec Loss 10.0672 LearningRate 0.0556 Epoch: 5 Global Step: 211270 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:28:48,512-Speed 2624.64 samples/sec Loss 10.1543 LearningRate 0.0555 Epoch: 5 Global Step: 211280 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:28:52,416-Speed 2623.25 samples/sec Loss 10.0823 LearningRate 0.0555 Epoch: 5 Global Step: 211290 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:28:56,317-Speed 2625.89 samples/sec Loss 10.1985 LearningRate 0.0555 Epoch: 5 Global Step: 211300 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:29:00,230-Speed 2617.19 samples/sec Loss 9.9412 LearningRate 0.0555 Epoch: 5 Global Step: 211310 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:29:04,139-Speed 2620.20 samples/sec Loss 10.3201 LearningRate 0.0555 Epoch: 5 Global Step: 211320 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:29:08,056-Speed 2614.99 samples/sec Loss 10.1608 LearningRate 0.0555 Epoch: 5 Global Step: 211330 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:29:11,947-Speed 2632.53 samples/sec Loss 10.1609 LearningRate 0.0555 Epoch: 5 Global Step: 211340 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:29:15,854-Speed 2621.59 samples/sec Loss 10.1651 LearningRate 0.0555 Epoch: 5 Global Step: 211350 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:29:19,764-Speed 2618.88 samples/sec Loss 9.9713 LearningRate 0.0555 Epoch: 5 Global Step: 211360 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:29:23,668-Speed 2623.79 samples/sec Loss 10.1756 LearningRate 0.0555 Epoch: 5 Global Step: 211370 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:29:27,569-Speed 2625.37 samples/sec Loss 10.0961 LearningRate 0.0555 Epoch: 5 Global Step: 211380 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:29:31,472-Speed 2624.52 samples/sec Loss 10.2439 LearningRate 0.0555 Epoch: 5 Global Step: 211390 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:29:35,376-Speed 2623.50 samples/sec Loss 10.1738 LearningRate 0.0555 Epoch: 5 Global Step: 211400 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:29:39,280-Speed 2623.64 samples/sec Loss 10.0216 LearningRate 0.0555 Epoch: 5 Global Step: 211410 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:29:43,181-Speed 2625.78 samples/sec Loss 10.2341 LearningRate 0.0555 Epoch: 5 Global Step: 211420 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:29:47,087-Speed 2622.36 samples/sec Loss 10.1856 LearningRate 0.0555 Epoch: 5 Global Step: 211430 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:29:50,977-Speed 2632.46 samples/sec Loss 10.2407 LearningRate 0.0555 Epoch: 5 Global Step: 211440 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:29:54,882-Speed 2623.13 samples/sec Loss 10.0431 LearningRate 0.0555 Epoch: 5 Global Step: 211450 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:29:58,784-Speed 2624.22 samples/sec Loss 10.0909 LearningRate 0.0555 Epoch: 5 Global Step: 211460 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:30:02,689-Speed 2623.68 samples/sec Loss 10.0293 LearningRate 0.0555 Epoch: 5 Global Step: 211470 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:30:06,594-Speed 2622.34 samples/sec Loss 10.1527 LearningRate 0.0555 Epoch: 5 Global Step: 211480 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:30:10,504-Speed 2619.97 samples/sec Loss 10.0855 LearningRate 0.0555 Epoch: 5 Global Step: 211490 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:30:14,408-Speed 2623.37 samples/sec Loss 10.2199 LearningRate 0.0555 Epoch: 5 Global Step: 211500 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:30:18,312-Speed 2623.66 samples/sec Loss 10.1278 LearningRate 0.0555 Epoch: 5 Global Step: 211510 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:30:22,214-Speed 2625.29 samples/sec Loss 10.1111 LearningRate 0.0555 Epoch: 5 Global Step: 211520 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:30:26,102-Speed 2633.91 samples/sec Loss 10.1553 LearningRate 0.0555 Epoch: 5 Global Step: 211530 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:30:30,012-Speed 2619.25 samples/sec Loss 10.2680 LearningRate 0.0555 Epoch: 5 Global Step: 211540 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:30:33,914-Speed 2625.31 samples/sec Loss 10.1072 LearningRate 0.0555 Epoch: 5 Global Step: 211550 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:30:37,826-Speed 2617.75 samples/sec Loss 10.1725 LearningRate 0.0555 Epoch: 5 Global Step: 211560 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:30:41,726-Speed 2626.25 samples/sec Loss 10.0600 LearningRate 0.0555 Epoch: 5 Global Step: 211570 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:30:45,634-Speed 2621.15 samples/sec Loss 10.0241 LearningRate 0.0555 Epoch: 5 Global Step: 211580 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:30:49,536-Speed 2625.33 samples/sec Loss 10.2179 LearningRate 0.0555 Epoch: 5 Global Step: 211590 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:30:53,436-Speed 2626.42 samples/sec Loss 10.1039 LearningRate 0.0555 Epoch: 5 Global Step: 211600 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:30:57,337-Speed 2625.43 samples/sec Loss 10.1268 LearningRate 0.0555 Epoch: 5 Global Step: 211610 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:31:01,238-Speed 2625.02 samples/sec Loss 10.2151 LearningRate 0.0555 Epoch: 5 Global Step: 211620 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:31:05,142-Speed 2623.29 samples/sec Loss 10.1066 LearningRate 0.0555 Epoch: 5 Global Step: 211630 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:31:09,033-Speed 2632.38 samples/sec Loss 10.0953 LearningRate 0.0555 Epoch: 5 Global Step: 211640 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:31:12,929-Speed 2629.54 samples/sec Loss 10.1013 LearningRate 0.0555 Epoch: 5 Global Step: 211650 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:31:16,834-Speed 2622.78 samples/sec Loss 10.0850 LearningRate 0.0555 Epoch: 5 Global Step: 211660 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:31:20,740-Speed 2622.50 samples/sec Loss 10.1922 LearningRate 0.0555 Epoch: 5 Global Step: 211670 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:31:24,644-Speed 2624.05 samples/sec Loss 10.2748 LearningRate 0.0555 Epoch: 5 Global Step: 211680 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:31:28,547-Speed 2623.93 samples/sec Loss 10.2802 LearningRate 0.0555 Epoch: 5 Global Step: 211690 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:31:32,457-Speed 2619.63 samples/sec Loss 10.2168 LearningRate 0.0555 Epoch: 5 Global Step: 211700 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:31:36,359-Speed 2624.61 samples/sec Loss 10.2276 LearningRate 0.0555 Epoch: 5 Global Step: 211710 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:31:40,261-Speed 2625.03 samples/sec Loss 10.1493 LearningRate 0.0555 Epoch: 5 Global Step: 211720 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:31:44,162-Speed 2625.95 samples/sec Loss 10.1244 LearningRate 0.0555 Epoch: 5 Global Step: 211730 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:31:48,060-Speed 2627.04 samples/sec Loss 10.2042 LearningRate 0.0555 Epoch: 5 Global Step: 211740 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:31:51,964-Speed 2623.61 samples/sec Loss 10.1774 LearningRate 0.0555 Epoch: 5 Global Step: 211750 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:31:55,867-Speed 2624.67 samples/sec Loss 10.1623 LearningRate 0.0555 Epoch: 5 Global Step: 211760 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:31:59,769-Speed 2624.35 samples/sec Loss 10.1256 LearningRate 0.0555 Epoch: 5 Global Step: 211770 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:32:03,678-Speed 2620.84 samples/sec Loss 10.2684 LearningRate 0.0555 Epoch: 5 Global Step: 211780 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:32:07,578-Speed 2625.99 samples/sec Loss 9.9127 LearningRate 0.0555 Epoch: 5 Global Step: 211790 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:32:11,480-Speed 2625.04 samples/sec Loss 10.0331 LearningRate 0.0555 Epoch: 5 Global Step: 211800 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:32:15,382-Speed 2624.36 samples/sec Loss 10.1048 LearningRate 0.0555 Epoch: 5 Global Step: 211810 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:32:19,287-Speed 2623.19 samples/sec Loss 10.1211 LearningRate 0.0555 Epoch: 5 Global Step: 211820 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:32:23,195-Speed 2620.53 samples/sec Loss 10.1074 LearningRate 0.0555 Epoch: 5 Global Step: 211830 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:32:27,099-Speed 2624.07 samples/sec Loss 10.0940 LearningRate 0.0554 Epoch: 5 Global Step: 211840 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:32:31,004-Speed 2622.91 samples/sec Loss 10.0175 LearningRate 0.0554 Epoch: 5 Global Step: 211850 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:32:34,836-Speed 2672.64 samples/sec Loss 10.9874 LearningRate 0.0554 Epoch: 5 Global Step: 211860 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:32:38,738-Speed 2624.53 samples/sec Loss 10.5663 LearningRate 0.0554 Epoch: 5 Global Step: 211870 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:32:42,638-Speed 2626.84 samples/sec Loss 10.2804 LearningRate 0.0554 Epoch: 5 Global Step: 211880 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:32:46,549-Speed 2618.44 samples/sec Loss 10.2284 LearningRate 0.0554 Epoch: 5 Global Step: 211890 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:32:50,448-Speed 2627.20 samples/sec Loss 10.2934 LearningRate 0.0554 Epoch: 5 Global Step: 211900 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:32:54,354-Speed 2622.16 samples/sec Loss 10.2174 LearningRate 0.0554 Epoch: 5 Global Step: 211910 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:32:58,267-Speed 2617.44 samples/sec Loss 10.1605 LearningRate 0.0554 Epoch: 5 Global Step: 211920 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:33:02,174-Speed 2621.70 samples/sec Loss 10.3022 LearningRate 0.0554 Epoch: 5 Global Step: 211930 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:33:06,097-Speed 2610.69 samples/sec Loss 10.1307 LearningRate 0.0554 Epoch: 5 Global Step: 211940 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:33:10,004-Speed 2621.44 samples/sec Loss 10.1084 LearningRate 0.0554 Epoch: 5 Global Step: 211950 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:33:13,910-Speed 2621.90 samples/sec Loss 10.0934 LearningRate 0.0554 Epoch: 5 Global Step: 211960 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:33:17,831-Speed 2612.58 samples/sec Loss 10.0829 LearningRate 0.0554 Epoch: 5 Global Step: 211970 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:33:21,729-Speed 2627.51 samples/sec Loss 10.2402 LearningRate 0.0554 Epoch: 5 Global Step: 211980 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:33:25,640-Speed 2619.21 samples/sec Loss 10.2310 LearningRate 0.0554 Epoch: 5 Global Step: 211990 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:33:29,552-Speed 2617.67 samples/sec Loss 10.2151 LearningRate 0.0554 Epoch: 5 Global Step: 212000 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:33:33,456-Speed 2623.54 samples/sec Loss 10.0501 LearningRate 0.0554 Epoch: 5 Global Step: 212010 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:33:37,353-Speed 2628.46 samples/sec Loss 10.1725 LearningRate 0.0554 Epoch: 5 Global Step: 212020 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:33:41,253-Speed 2626.17 samples/sec Loss 10.0609 LearningRate 0.0554 Epoch: 5 Global Step: 212030 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:33:45,150-Speed 2627.81 samples/sec Loss 10.1716 LearningRate 0.0554 Epoch: 5 Global Step: 212040 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:33:49,052-Speed 2625.06 samples/sec Loss 10.2343 LearningRate 0.0554 Epoch: 5 Global Step: 212050 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:33:52,956-Speed 2623.03 samples/sec Loss 10.1094 LearningRate 0.0554 Epoch: 5 Global Step: 212060 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:33:56,860-Speed 2624.21 samples/sec Loss 10.1718 LearningRate 0.0554 Epoch: 5 Global Step: 212070 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:34:00,758-Speed 2627.55 samples/sec Loss 10.0912 LearningRate 0.0554 Epoch: 5 Global Step: 212080 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:34:04,661-Speed 2624.52 samples/sec Loss 10.2962 LearningRate 0.0554 Epoch: 5 Global Step: 212090 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:34:08,564-Speed 2623.88 samples/sec Loss 10.0589 LearningRate 0.0554 Epoch: 5 Global Step: 212100 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:34:12,467-Speed 2624.11 samples/sec Loss 10.1192 LearningRate 0.0554 Epoch: 5 Global Step: 212110 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:34:16,368-Speed 2625.63 samples/sec Loss 10.1077 LearningRate 0.0554 Epoch: 5 Global Step: 212120 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:34:20,277-Speed 2619.80 samples/sec Loss 10.1732 LearningRate 0.0554 Epoch: 5 Global Step: 212130 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:34:24,181-Speed 2623.87 samples/sec Loss 10.1707 LearningRate 0.0554 Epoch: 5 Global Step: 212140 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:34:28,082-Speed 2625.67 samples/sec Loss 10.2095 LearningRate 0.0554 Epoch: 5 Global Step: 212150 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:34:31,987-Speed 2622.72 samples/sec Loss 10.2704 LearningRate 0.0554 Epoch: 5 Global Step: 212160 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:34:35,897-Speed 2619.56 samples/sec Loss 10.1100 LearningRate 0.0554 Epoch: 5 Global Step: 212170 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:34:39,809-Speed 2618.21 samples/sec Loss 10.2212 LearningRate 0.0554 Epoch: 5 Global Step: 212180 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:34:43,714-Speed 2623.09 samples/sec Loss 10.2064 LearningRate 0.0554 Epoch: 5 Global Step: 212190 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:34:47,615-Speed 2625.77 samples/sec Loss 10.2096 LearningRate 0.0554 Epoch: 5 Global Step: 212200 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:34:51,531-Speed 2615.95 samples/sec Loss 10.0939 LearningRate 0.0554 Epoch: 5 Global Step: 212210 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:34:55,451-Speed 2613.03 samples/sec Loss 10.0855 LearningRate 0.0554 Epoch: 5 Global Step: 212220 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:34:59,354-Speed 2624.29 samples/sec Loss 10.1417 LearningRate 0.0554 Epoch: 5 Global Step: 212230 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:35:03,267-Speed 2617.65 samples/sec Loss 10.0282 LearningRate 0.0554 Epoch: 5 Global Step: 212240 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:35:07,174-Speed 2621.58 samples/sec Loss 9.9935 LearningRate 0.0554 Epoch: 5 Global Step: 212250 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:35:11,078-Speed 2623.55 samples/sec Loss 10.0121 LearningRate 0.0554 Epoch: 5 Global Step: 212260 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:35:14,946-Speed 2648.45 samples/sec Loss 10.0243 LearningRate 0.0554 Epoch: 5 Global Step: 212270 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:35:18,852-Speed 2621.86 samples/sec Loss 10.0981 LearningRate 0.0554 Epoch: 5 Global Step: 212280 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:35:22,760-Speed 2621.24 samples/sec Loss 10.1227 LearningRate 0.0554 Epoch: 5 Global Step: 212290 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:35:26,665-Speed 2623.42 samples/sec Loss 10.3192 LearningRate 0.0554 Epoch: 5 Global Step: 212300 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:35:30,568-Speed 2623.91 samples/sec Loss 10.2721 LearningRate 0.0554 Epoch: 5 Global Step: 212310 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:35:34,475-Speed 2621.71 samples/sec Loss 10.1312 LearningRate 0.0554 Epoch: 5 Global Step: 212320 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:35:38,388-Speed 2617.86 samples/sec Loss 10.2010 LearningRate 0.0554 Epoch: 5 Global Step: 212330 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:35:42,308-Speed 2612.23 samples/sec Loss 10.2171 LearningRate 0.0554 Epoch: 5 Global Step: 212340 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:35:46,218-Speed 2619.65 samples/sec Loss 10.1024 LearningRate 0.0554 Epoch: 5 Global Step: 212350 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:35:50,127-Speed 2620.34 samples/sec Loss 10.1842 LearningRate 0.0554 Epoch: 5 Global Step: 212360 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:35:54,032-Speed 2623.64 samples/sec Loss 10.2526 LearningRate 0.0554 Epoch: 5 Global Step: 212370 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:35:57,934-Speed 2624.58 samples/sec Loss 10.1520 LearningRate 0.0554 Epoch: 5 Global Step: 212380 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:36:01,836-Speed 2624.68 samples/sec Loss 10.2281 LearningRate 0.0554 Epoch: 5 Global Step: 212390 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:36:05,743-Speed 2621.46 samples/sec Loss 10.2213 LearningRate 0.0553 Epoch: 5 Global Step: 212400 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:36:09,650-Speed 2621.80 samples/sec Loss 10.1360 LearningRate 0.0553 Epoch: 5 Global Step: 212410 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:36:13,562-Speed 2618.61 samples/sec Loss 9.9852 LearningRate 0.0553 Epoch: 5 Global Step: 212420 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:36:17,469-Speed 2621.22 samples/sec Loss 10.1296 LearningRate 0.0553 Epoch: 5 Global Step: 212430 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:36:21,495-Speed 2543.88 samples/sec Loss 10.1634 LearningRate 0.0553 Epoch: 5 Global Step: 212440 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:36:25,398-Speed 2624.11 samples/sec Loss 9.9626 LearningRate 0.0553 Epoch: 5 Global Step: 212450 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:36:29,343-Speed 2596.25 samples/sec Loss 10.0982 LearningRate 0.0553 Epoch: 5 Global Step: 212460 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:36:33,248-Speed 2622.96 samples/sec Loss 10.1854 LearningRate 0.0553 Epoch: 5 Global Step: 212470 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:36:37,152-Speed 2623.97 samples/sec Loss 9.9703 LearningRate 0.0553 Epoch: 5 Global Step: 212480 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:36:41,057-Speed 2622.98 samples/sec Loss 10.1460 LearningRate 0.0553 Epoch: 5 Global Step: 212490 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:36:44,961-Speed 2623.21 samples/sec Loss 10.0228 LearningRate 0.0553 Epoch: 5 Global Step: 212500 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:36:48,868-Speed 2621.47 samples/sec Loss 10.2066 LearningRate 0.0553 Epoch: 5 Global Step: 212510 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:36:52,756-Speed 2634.65 samples/sec Loss 10.0838 LearningRate 0.0553 Epoch: 5 Global Step: 212520 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:36:56,679-Speed 2611.03 samples/sec Loss 10.0979 LearningRate 0.0553 Epoch: 5 Global Step: 212530 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:37:00,581-Speed 2624.28 samples/sec Loss 10.1037 LearningRate 0.0553 Epoch: 5 Global Step: 212540 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:37:04,493-Speed 2617.93 samples/sec Loss 10.1119 LearningRate 0.0553 Epoch: 5 Global Step: 212550 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:37:08,360-Speed 2649.40 samples/sec Loss 10.6067 LearningRate 0.0553 Epoch: 5 Global Step: 212560 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:37:12,261-Speed 2625.37 samples/sec Loss 10.8599 LearningRate 0.0553 Epoch: 5 Global Step: 212570 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:37:16,169-Speed 2620.93 samples/sec Loss 10.3965 LearningRate 0.0553 Epoch: 5 Global Step: 212580 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:37:20,071-Speed 2625.19 samples/sec Loss 10.1178 LearningRate 0.0553 Epoch: 5 Global Step: 212590 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:37:23,969-Speed 2627.25 samples/sec Loss 10.2907 LearningRate 0.0553 Epoch: 5 Global Step: 212600 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:37:27,888-Speed 2613.42 samples/sec Loss 10.1987 LearningRate 0.0553 Epoch: 5 Global Step: 212610 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:37:31,788-Speed 2626.31 samples/sec Loss 10.1748 LearningRate 0.0553 Epoch: 5 Global Step: 212620 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:37:35,687-Speed 2626.45 samples/sec Loss 10.1578 LearningRate 0.0553 Epoch: 5 Global Step: 212630 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:37:39,587-Speed 2626.71 samples/sec Loss 10.1653 LearningRate 0.0553 Epoch: 5 Global Step: 212640 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:37:43,486-Speed 2632.40 samples/sec Loss 10.1844 LearningRate 0.0553 Epoch: 5 Global Step: 212650 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:37:47,385-Speed 2627.08 samples/sec Loss 10.1652 LearningRate 0.0553 Epoch: 5 Global Step: 212660 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:37:51,285-Speed 2626.50 samples/sec Loss 10.3488 LearningRate 0.0553 Epoch: 5 Global Step: 212670 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:37:55,188-Speed 2624.47 samples/sec Loss 10.1354 LearningRate 0.0553 Epoch: 5 Global Step: 212680 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:37:59,091-Speed 2623.86 samples/sec Loss 10.3222 LearningRate 0.0553 Epoch: 5 Global Step: 212690 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:38:03,010-Speed 2613.46 samples/sec Loss 10.1712 LearningRate 0.0553 Epoch: 5 Global Step: 212700 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:38:06,910-Speed 2626.36 samples/sec Loss 10.1502 LearningRate 0.0553 Epoch: 5 Global Step: 212710 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:38:10,808-Speed 2627.48 samples/sec Loss 10.1835 LearningRate 0.0553 Epoch: 5 Global Step: 212720 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:38:14,708-Speed 2626.50 samples/sec Loss 10.0940 LearningRate 0.0553 Epoch: 5 Global Step: 212730 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:38:18,605-Speed 2627.94 samples/sec Loss 10.1035 LearningRate 0.0553 Epoch: 5 Global Step: 212740 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:38:22,505-Speed 2625.94 samples/sec Loss 10.1605 LearningRate 0.0553 Epoch: 5 Global Step: 212750 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:38:26,402-Speed 2628.37 samples/sec Loss 10.3384 LearningRate 0.0553 Epoch: 5 Global Step: 212760 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:38:30,303-Speed 2625.83 samples/sec Loss 10.0925 LearningRate 0.0553 Epoch: 5 Global Step: 212770 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:38:34,200-Speed 2628.28 samples/sec Loss 10.1741 LearningRate 0.0553 Epoch: 5 Global Step: 212780 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:38:38,099-Speed 2627.16 samples/sec Loss 10.0695 LearningRate 0.0553 Epoch: 5 Global Step: 212790 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:38:41,997-Speed 2627.38 samples/sec Loss 10.2096 LearningRate 0.0553 Epoch: 5 Global Step: 212800 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:38:45,882-Speed 2636.26 samples/sec Loss 10.1160 LearningRate 0.0553 Epoch: 5 Global Step: 212810 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:38:49,784-Speed 2624.99 samples/sec Loss 10.1628 LearningRate 0.0553 Epoch: 5 Global Step: 212820 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:38:53,685-Speed 2625.34 samples/sec Loss 10.1123 LearningRate 0.0553 Epoch: 5 Global Step: 212830 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:38:57,588-Speed 2624.55 samples/sec Loss 10.0966 LearningRate 0.0553 Epoch: 5 Global Step: 212840 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:39:01,494-Speed 2621.73 samples/sec Loss 10.0848 LearningRate 0.0553 Epoch: 5 Global Step: 212850 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:39:05,400-Speed 2621.95 samples/sec Loss 10.3336 LearningRate 0.0553 Epoch: 5 Global Step: 212860 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:39:09,311-Speed 2619.23 samples/sec Loss 10.2227 LearningRate 0.0553 Epoch: 5 Global Step: 212870 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:39:13,217-Speed 2622.76 samples/sec Loss 10.0907 LearningRate 0.0553 Epoch: 5 Global Step: 212880 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:39:17,119-Speed 2624.52 samples/sec Loss 9.9937 LearningRate 0.0553 Epoch: 5 Global Step: 212890 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:39:21,022-Speed 2623.98 samples/sec Loss 10.2966 LearningRate 0.0553 Epoch: 5 Global Step: 212900 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:39:24,929-Speed 2621.79 samples/sec Loss 10.1534 LearningRate 0.0553 Epoch: 5 Global Step: 212910 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:39:28,837-Speed 2620.75 samples/sec Loss 10.2650 LearningRate 0.0553 Epoch: 5 Global Step: 212920 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:39:32,753-Speed 2615.37 samples/sec Loss 10.2010 LearningRate 0.0553 Epoch: 5 Global Step: 212930 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:39:36,659-Speed 2621.71 samples/sec Loss 10.0295 LearningRate 0.0553 Epoch: 5 Global Step: 212940 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:39:40,573-Speed 2616.76 samples/sec Loss 10.1349 LearningRate 0.0553 Epoch: 5 Global Step: 212950 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:39:44,483-Speed 2620.07 samples/sec Loss 10.0060 LearningRate 0.0552 Epoch: 5 Global Step: 212960 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:39:48,387-Speed 2623.26 samples/sec Loss 10.1486 LearningRate 0.0552 Epoch: 5 Global Step: 212970 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:39:52,293-Speed 2622.87 samples/sec Loss 10.1675 LearningRate 0.0552 Epoch: 5 Global Step: 212980 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:39:56,213-Speed 2612.61 samples/sec Loss 10.1735 LearningRate 0.0552 Epoch: 5 Global Step: 212990 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:40:00,120-Speed 2621.56 samples/sec Loss 10.1420 LearningRate 0.0552 Epoch: 5 Global Step: 213000 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:40:04,028-Speed 2620.74 samples/sec Loss 10.1809 LearningRate 0.0552 Epoch: 5 Global Step: 213010 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:40:07,934-Speed 2621.85 samples/sec Loss 10.0385 LearningRate 0.0552 Epoch: 5 Global Step: 213020 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:40:11,842-Speed 2620.67 samples/sec Loss 10.1539 LearningRate 0.0552 Epoch: 5 Global Step: 213030 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:40:15,756-Speed 2617.36 samples/sec Loss 10.0369 LearningRate 0.0552 Epoch: 5 Global Step: 213040 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:40:19,660-Speed 2623.85 samples/sec Loss 9.9586 LearningRate 0.0552 Epoch: 5 Global Step: 213050 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:40:23,565-Speed 2622.34 samples/sec Loss 10.1241 LearningRate 0.0552 Epoch: 5 Global Step: 213060 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:40:27,467-Speed 2625.06 samples/sec Loss 10.2414 LearningRate 0.0552 Epoch: 5 Global Step: 213070 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:40:31,367-Speed 2626.48 samples/sec Loss 10.1469 LearningRate 0.0552 Epoch: 5 Global Step: 213080 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:40:35,267-Speed 2626.07 samples/sec Loss 10.0809 LearningRate 0.0552 Epoch: 5 Global Step: 213090 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:40:39,172-Speed 2622.76 samples/sec Loss 10.1759 LearningRate 0.0552 Epoch: 5 Global Step: 213100 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:40:43,073-Speed 2625.67 samples/sec Loss 10.2441 LearningRate 0.0552 Epoch: 5 Global Step: 213110 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:40:46,975-Speed 2624.87 samples/sec Loss 10.3381 LearningRate 0.0552 Epoch: 5 Global Step: 213120 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:40:50,888-Speed 2617.95 samples/sec Loss 10.2304 LearningRate 0.0552 Epoch: 5 Global Step: 213130 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:40:54,805-Speed 2614.75 samples/sec Loss 10.1481 LearningRate 0.0552 Epoch: 5 Global Step: 213140 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:40:58,715-Speed 2619.27 samples/sec Loss 10.3498 LearningRate 0.0552 Epoch: 5 Global Step: 213150 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:41:02,616-Speed 2625.52 samples/sec Loss 10.2120 LearningRate 0.0552 Epoch: 5 Global Step: 213160 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:41:06,518-Speed 2624.93 samples/sec Loss 10.1768 LearningRate 0.0552 Epoch: 5 Global Step: 213170 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:41:10,423-Speed 2623.08 samples/sec Loss 10.1734 LearningRate 0.0552 Epoch: 5 Global Step: 213180 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:41:14,329-Speed 2622.61 samples/sec Loss 10.0751 LearningRate 0.0552 Epoch: 5 Global Step: 213190 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:41:18,233-Speed 2623.34 samples/sec Loss 10.0828 LearningRate 0.0552 Epoch: 5 Global Step: 213200 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:41:22,123-Speed 2633.04 samples/sec Loss 10.1075 LearningRate 0.0552 Epoch: 5 Global Step: 213210 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:41:26,025-Speed 2625.04 samples/sec Loss 10.1665 LearningRate 0.0552 Epoch: 5 Global Step: 213220 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:41:29,927-Speed 2624.76 samples/sec Loss 10.0532 LearningRate 0.0552 Epoch: 5 Global Step: 213230 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:41:33,826-Speed 2626.95 samples/sec Loss 9.9607 LearningRate 0.0552 Epoch: 5 Global Step: 213240 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:41:37,725-Speed 2627.26 samples/sec Loss 10.0450 LearningRate 0.0552 Epoch: 5 Global Step: 213250 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:41:41,639-Speed 2616.90 samples/sec Loss 9.9743 LearningRate 0.0552 Epoch: 5 Global Step: 213260 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:41:45,535-Speed 2628.71 samples/sec Loss 10.0555 LearningRate 0.0552 Epoch: 5 Global Step: 213270 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:41:49,431-Speed 2628.82 samples/sec Loss 10.2828 LearningRate 0.0552 Epoch: 5 Global Step: 213280 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:41:53,334-Speed 2624.81 samples/sec Loss 10.2069 LearningRate 0.0552 Epoch: 5 Global Step: 213290 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:41:57,231-Speed 2628.16 samples/sec Loss 10.3347 LearningRate 0.0552 Epoch: 5 Global Step: 213300 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:42:01,179-Speed 2594.28 samples/sec Loss 10.1107 LearningRate 0.0552 Epoch: 5 Global Step: 213310 Fp16 Grad Scale: 524288 Required: 69 hours
Training: 2022-04-13 19:42:05,118-Speed 2600.01 samples/sec Loss 10.0960 LearningRate 0.0552 Epoch: 5 Global Step: 213320 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:42:09,022-Speed 2624.61 samples/sec Loss 10.0347 LearningRate 0.0552 Epoch: 5 Global Step: 213330 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:42:12,919-Speed 2627.91 samples/sec Loss 10.1402 LearningRate 0.0552 Epoch: 5 Global Step: 213340 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:42:16,817-Speed 2627.78 samples/sec Loss 10.1330 LearningRate 0.0552 Epoch: 5 Global Step: 213350 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:42:20,702-Speed 2636.48 samples/sec Loss 10.1890 LearningRate 0.0552 Epoch: 5 Global Step: 213360 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:42:24,621-Speed 2614.02 samples/sec Loss 9.9674 LearningRate 0.0552 Epoch: 5 Global Step: 213370 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:42:28,553-Speed 2604.77 samples/sec Loss 10.0064 LearningRate 0.0552 Epoch: 5 Global Step: 213380 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:42:32,462-Speed 2620.47 samples/sec Loss 10.1349 LearningRate 0.0552 Epoch: 5 Global Step: 213390 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:42:36,358-Speed 2628.50 samples/sec Loss 10.0540 LearningRate 0.0552 Epoch: 5 Global Step: 213400 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:42:40,256-Speed 2627.89 samples/sec Loss 10.1493 LearningRate 0.0552 Epoch: 5 Global Step: 213410 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:42:44,155-Speed 2627.06 samples/sec Loss 10.0716 LearningRate 0.0552 Epoch: 5 Global Step: 213420 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:42:48,056-Speed 2625.31 samples/sec Loss 10.0279 LearningRate 0.0552 Epoch: 5 Global Step: 213430 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:42:51,951-Speed 2630.15 samples/sec Loss 10.2648 LearningRate 0.0552 Epoch: 5 Global Step: 213440 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:42:55,847-Speed 2628.77 samples/sec Loss 10.0211 LearningRate 0.0552 Epoch: 5 Global Step: 213450 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:42:59,772-Speed 2609.44 samples/sec Loss 10.1115 LearningRate 0.0552 Epoch: 5 Global Step: 213460 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:43:03,643-Speed 2646.30 samples/sec Loss 10.0691 LearningRate 0.0552 Epoch: 5 Global Step: 213470 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:43:07,546-Speed 2624.76 samples/sec Loss 10.0490 LearningRate 0.0552 Epoch: 5 Global Step: 213480 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:43:11,474-Speed 2607.10 samples/sec Loss 10.1847 LearningRate 0.0552 Epoch: 5 Global Step: 213490 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:43:15,369-Speed 2630.41 samples/sec Loss 10.0792 LearningRate 0.0552 Epoch: 5 Global Step: 213500 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:43:19,267-Speed 2627.83 samples/sec Loss 9.9246 LearningRate 0.0551 Epoch: 5 Global Step: 213510 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:43:23,171-Speed 2623.63 samples/sec Loss 10.1290 LearningRate 0.0551 Epoch: 5 Global Step: 213520 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:43:27,067-Speed 2629.24 samples/sec Loss 10.1581 LearningRate 0.0551 Epoch: 5 Global Step: 213530 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:43:30,972-Speed 2622.84 samples/sec Loss 10.0891 LearningRate 0.0551 Epoch: 5 Global Step: 213540 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:43:34,871-Speed 2626.38 samples/sec Loss 10.0260 LearningRate 0.0551 Epoch: 5 Global Step: 213550 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:43:38,767-Speed 2629.22 samples/sec Loss 10.1020 LearningRate 0.0551 Epoch: 5 Global Step: 213560 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:43:42,662-Speed 2629.60 samples/sec Loss 10.2256 LearningRate 0.0551 Epoch: 5 Global Step: 213570 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:43:46,587-Speed 2609.92 samples/sec Loss 10.2028 LearningRate 0.0551 Epoch: 5 Global Step: 213580 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:43:50,480-Speed 2631.33 samples/sec Loss 10.0497 LearningRate 0.0551 Epoch: 5 Global Step: 213590 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:43:54,353-Speed 2644.37 samples/sec Loss 10.1815 LearningRate 0.0551 Epoch: 5 Global Step: 213600 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:43:58,253-Speed 2626.00 samples/sec Loss 10.0912 LearningRate 0.0551 Epoch: 5 Global Step: 213610 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:44:02,158-Speed 2623.42 samples/sec Loss 10.1352 LearningRate 0.0551 Epoch: 5 Global Step: 213620 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:44:06,055-Speed 2628.17 samples/sec Loss 10.2677 LearningRate 0.0551 Epoch: 5 Global Step: 213630 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:44:09,952-Speed 2627.77 samples/sec Loss 10.1049 LearningRate 0.0551 Epoch: 5 Global Step: 213640 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:44:13,856-Speed 2624.50 samples/sec Loss 10.1611 LearningRate 0.0551 Epoch: 5 Global Step: 213650 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:44:17,761-Speed 2622.76 samples/sec Loss 10.1771 LearningRate 0.0551 Epoch: 5 Global Step: 213660 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:44:21,677-Speed 2615.81 samples/sec Loss 10.1722 LearningRate 0.0551 Epoch: 5 Global Step: 213670 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:44:25,597-Speed 2612.42 samples/sec Loss 10.0932 LearningRate 0.0551 Epoch: 5 Global Step: 213680 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:44:29,514-Speed 2615.54 samples/sec Loss 10.1383 LearningRate 0.0551 Epoch: 5 Global Step: 213690 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:44:33,442-Speed 2607.59 samples/sec Loss 10.1069 LearningRate 0.0551 Epoch: 5 Global Step: 213700 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:44:37,345-Speed 2623.82 samples/sec Loss 10.1284 LearningRate 0.0551 Epoch: 5 Global Step: 213710 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:44:41,249-Speed 2623.56 samples/sec Loss 10.0913 LearningRate 0.0551 Epoch: 5 Global Step: 213720 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:44:45,144-Speed 2629.53 samples/sec Loss 10.1550 LearningRate 0.0551 Epoch: 5 Global Step: 213730 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:44:49,038-Speed 2630.29 samples/sec Loss 10.1705 LearningRate 0.0551 Epoch: 5 Global Step: 213740 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:44:52,936-Speed 2628.12 samples/sec Loss 10.1346 LearningRate 0.0551 Epoch: 5 Global Step: 213750 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:44:56,837-Speed 2625.54 samples/sec Loss 10.1142 LearningRate 0.0551 Epoch: 5 Global Step: 213760 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:45:00,740-Speed 2624.24 samples/sec Loss 10.1865 LearningRate 0.0551 Epoch: 5 Global Step: 213770 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:45:04,674-Speed 2603.63 samples/sec Loss 10.0326 LearningRate 0.0551 Epoch: 5 Global Step: 213780 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:45:08,585-Speed 2618.82 samples/sec Loss 9.9870 LearningRate 0.0551 Epoch: 5 Global Step: 213790 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:45:12,468-Speed 2637.20 samples/sec Loss 10.1744 LearningRate 0.0551 Epoch: 5 Global Step: 213800 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:45:16,367-Speed 2627.21 samples/sec Loss 10.1329 LearningRate 0.0551 Epoch: 5 Global Step: 213810 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:45:20,266-Speed 2627.33 samples/sec Loss 10.2364 LearningRate 0.0551 Epoch: 5 Global Step: 213820 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:45:24,157-Speed 2631.88 samples/sec Loss 10.1638 LearningRate 0.0551 Epoch: 5 Global Step: 213830 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:45:28,049-Speed 2631.44 samples/sec Loss 10.1014 LearningRate 0.0551 Epoch: 5 Global Step: 213840 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:45:31,944-Speed 2630.02 samples/sec Loss 10.1271 LearningRate 0.0551 Epoch: 5 Global Step: 213850 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:45:35,841-Speed 2628.05 samples/sec Loss 9.9681 LearningRate 0.0551 Epoch: 5 Global Step: 213860 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:45:39,838-Speed 2562.03 samples/sec Loss 10.1946 LearningRate 0.0551 Epoch: 5 Global Step: 213870 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:45:43,732-Speed 2630.42 samples/sec Loss 9.9576 LearningRate 0.0551 Epoch: 5 Global Step: 213880 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:45:47,621-Speed 2633.89 samples/sec Loss 10.0210 LearningRate 0.0551 Epoch: 5 Global Step: 213890 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:45:51,516-Speed 2629.17 samples/sec Loss 10.0735 LearningRate 0.0551 Epoch: 5 Global Step: 213900 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:45:55,414-Speed 2627.39 samples/sec Loss 10.1393 LearningRate 0.0551 Epoch: 5 Global Step: 213910 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:45:59,305-Speed 2633.12 samples/sec Loss 10.1055 LearningRate 0.0551 Epoch: 5 Global Step: 213920 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:46:03,198-Speed 2631.10 samples/sec Loss 10.0852 LearningRate 0.0551 Epoch: 5 Global Step: 213930 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:46:07,092-Speed 2630.03 samples/sec Loss 10.0834 LearningRate 0.0551 Epoch: 5 Global Step: 213940 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:46:10,987-Speed 2629.57 samples/sec Loss 10.1673 LearningRate 0.0551 Epoch: 5 Global Step: 213950 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:46:14,880-Speed 2630.85 samples/sec Loss 9.8407 LearningRate 0.0551 Epoch: 5 Global Step: 213960 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:46:18,774-Speed 2630.33 samples/sec Loss 10.0365 LearningRate 0.0551 Epoch: 5 Global Step: 213970 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:46:22,681-Speed 2621.44 samples/sec Loss 9.9783 LearningRate 0.0551 Epoch: 5 Global Step: 213980 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:46:26,582-Speed 2625.27 samples/sec Loss 10.0372 LearningRate 0.0551 Epoch: 5 Global Step: 213990 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:46:30,481-Speed 2626.95 samples/sec Loss 10.1813 LearningRate 0.0551 Epoch: 5 Global Step: 214000 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:46:34,390-Speed 2620.81 samples/sec Loss 10.1282 LearningRate 0.0551 Epoch: 5 Global Step: 214010 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:46:38,299-Speed 2619.77 samples/sec Loss 10.1578 LearningRate 0.0551 Epoch: 5 Global Step: 214020 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:46:42,192-Speed 2630.81 samples/sec Loss 10.1586 LearningRate 0.0551 Epoch: 5 Global Step: 214030 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:46:46,085-Speed 2631.00 samples/sec Loss 10.1800 LearningRate 0.0551 Epoch: 5 Global Step: 214040 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:46:49,979-Speed 2630.59 samples/sec Loss 9.9574 LearningRate 0.0551 Epoch: 5 Global Step: 214050 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:46:53,880-Speed 2625.82 samples/sec Loss 10.0342 LearningRate 0.0551 Epoch: 5 Global Step: 214060 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:46:57,774-Speed 2630.17 samples/sec Loss 10.0879 LearningRate 0.0550 Epoch: 5 Global Step: 214070 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:47:01,671-Speed 2628.06 samples/sec Loss 10.1276 LearningRate 0.0550 Epoch: 5 Global Step: 214080 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:47:05,562-Speed 2632.04 samples/sec Loss 10.0394 LearningRate 0.0550 Epoch: 5 Global Step: 214090 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:47:09,453-Speed 2632.48 samples/sec Loss 10.1381 LearningRate 0.0550 Epoch: 5 Global Step: 214100 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:47:13,332-Speed 2640.32 samples/sec Loss 10.1154 LearningRate 0.0550 Epoch: 5 Global Step: 214110 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:47:17,228-Speed 2629.50 samples/sec Loss 9.9950 LearningRate 0.0550 Epoch: 5 Global Step: 214120 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:47:21,133-Speed 2622.40 samples/sec Loss 10.1754 LearningRate 0.0550 Epoch: 5 Global Step: 214130 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:47:25,029-Speed 2628.94 samples/sec Loss 10.0531 LearningRate 0.0550 Epoch: 5 Global Step: 214140 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:47:28,926-Speed 2628.19 samples/sec Loss 10.0373 LearningRate 0.0550 Epoch: 5 Global Step: 214150 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:47:32,943-Speed 2549.86 samples/sec Loss 10.0658 LearningRate 0.0550 Epoch: 5 Global Step: 214160 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:47:36,834-Speed 2631.91 samples/sec Loss 10.1497 LearningRate 0.0550 Epoch: 5 Global Step: 214170 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:47:40,765-Speed 2605.95 samples/sec Loss 10.1005 LearningRate 0.0550 Epoch: 5 Global Step: 214180 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:47:44,658-Speed 2630.68 samples/sec Loss 10.1297 LearningRate 0.0550 Epoch: 5 Global Step: 214190 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:47:48,552-Speed 2630.68 samples/sec Loss 9.9419 LearningRate 0.0550 Epoch: 5 Global Step: 214200 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:47:52,428-Speed 2642.71 samples/sec Loss 10.1148 LearningRate 0.0550 Epoch: 5 Global Step: 214210 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:47:56,321-Speed 2630.95 samples/sec Loss 10.1190 LearningRate 0.0550 Epoch: 5 Global Step: 214220 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:48:00,218-Speed 2627.89 samples/sec Loss 10.0422 LearningRate 0.0550 Epoch: 5 Global Step: 214230 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:48:04,115-Speed 2628.05 samples/sec Loss 10.0708 LearningRate 0.0550 Epoch: 5 Global Step: 214240 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:48:08,008-Speed 2630.86 samples/sec Loss 10.0866 LearningRate 0.0550 Epoch: 5 Global Step: 214250 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:48:11,904-Speed 2629.06 samples/sec Loss 10.1565 LearningRate 0.0550 Epoch: 5 Global Step: 214260 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:48:15,804-Speed 2626.08 samples/sec Loss 10.1215 LearningRate 0.0550 Epoch: 5 Global Step: 214270 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:48:19,705-Speed 2625.52 samples/sec Loss 10.0653 LearningRate 0.0550 Epoch: 5 Global Step: 214280 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:48:23,615-Speed 2619.88 samples/sec Loss 10.1824 LearningRate 0.0550 Epoch: 5 Global Step: 214290 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:48:27,508-Speed 2630.84 samples/sec Loss 10.1327 LearningRate 0.0550 Epoch: 5 Global Step: 214300 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:48:31,402-Speed 2630.64 samples/sec Loss 10.1063 LearningRate 0.0550 Epoch: 5 Global Step: 214310 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:48:35,297-Speed 2629.04 samples/sec Loss 10.1547 LearningRate 0.0550 Epoch: 5 Global Step: 214320 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:48:39,193-Speed 2629.10 samples/sec Loss 10.0620 LearningRate 0.0550 Epoch: 5 Global Step: 214330 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:48:43,089-Speed 2628.81 samples/sec Loss 10.0049 LearningRate 0.0550 Epoch: 5 Global Step: 214340 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:48:46,985-Speed 2629.07 samples/sec Loss 10.0172 LearningRate 0.0550 Epoch: 5 Global Step: 214350 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:48:50,887-Speed 2624.99 samples/sec Loss 10.0646 LearningRate 0.0550 Epoch: 5 Global Step: 214360 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:48:54,796-Speed 2619.96 samples/sec Loss 10.0037 LearningRate 0.0550 Epoch: 5 Global Step: 214370 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:48:58,691-Speed 2629.39 samples/sec Loss 10.2389 LearningRate 0.0550 Epoch: 5 Global Step: 214380 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:49:02,590-Speed 2626.96 samples/sec Loss 10.1073 LearningRate 0.0550 Epoch: 5 Global Step: 214390 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:49:06,485-Speed 2629.81 samples/sec Loss 10.0018 LearningRate 0.0550 Epoch: 5 Global Step: 214400 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:49:10,386-Speed 2625.76 samples/sec Loss 10.1334 LearningRate 0.0550 Epoch: 5 Global Step: 214410 Fp16 Grad Scale: 524288 Required: 69 hours
Training: 2022-04-13 19:49:14,265-Speed 2640.13 samples/sec Loss 10.0744 LearningRate 0.0550 Epoch: 5 Global Step: 214420 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:49:18,164-Speed 2627.28 samples/sec Loss 9.9934 LearningRate 0.0550 Epoch: 5 Global Step: 214430 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:49:22,046-Speed 2638.32 samples/sec Loss 9.9855 LearningRate 0.0550 Epoch: 5 Global Step: 214440 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:49:25,944-Speed 2627.64 samples/sec Loss 10.0936 LearningRate 0.0550 Epoch: 5 Global Step: 214450 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:49:29,844-Speed 2626.31 samples/sec Loss 10.1699 LearningRate 0.0550 Epoch: 5 Global Step: 214460 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:49:33,740-Speed 2628.91 samples/sec Loss 10.1738 LearningRate 0.0550 Epoch: 5 Global Step: 214470 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:49:37,634-Speed 2629.69 samples/sec Loss 10.1349 LearningRate 0.0550 Epoch: 5 Global Step: 214480 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:49:41,528-Speed 2630.38 samples/sec Loss 9.9568 LearningRate 0.0550 Epoch: 5 Global Step: 214490 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:49:45,422-Speed 2630.69 samples/sec Loss 10.1405 LearningRate 0.0550 Epoch: 5 Global Step: 214500 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:49:49,353-Speed 2605.23 samples/sec Loss 10.1247 LearningRate 0.0550 Epoch: 5 Global Step: 214510 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:49:53,431-Speed 2511.67 samples/sec Loss 10.2048 LearningRate 0.0550 Epoch: 5 Global Step: 214520 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:49:57,328-Speed 2628.33 samples/sec Loss 10.0751 LearningRate 0.0550 Epoch: 5 Global Step: 214530 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:50:01,220-Speed 2631.64 samples/sec Loss 10.0005 LearningRate 0.0550 Epoch: 5 Global Step: 214540 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:50:05,118-Speed 2627.45 samples/sec Loss 10.0453 LearningRate 0.0550 Epoch: 5 Global Step: 214550 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:50:09,012-Speed 2630.21 samples/sec Loss 10.1243 LearningRate 0.0550 Epoch: 5 Global Step: 214560 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:50:12,909-Speed 2628.25 samples/sec Loss 10.1515 LearningRate 0.0550 Epoch: 5 Global Step: 214570 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:50:16,788-Speed 2640.44 samples/sec Loss 10.2128 LearningRate 0.0550 Epoch: 5 Global Step: 214580 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:50:20,828-Speed 2535.78 samples/sec Loss 10.2314 LearningRate 0.0550 Epoch: 5 Global Step: 214590 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:50:24,724-Speed 2628.72 samples/sec Loss 10.0388 LearningRate 0.0550 Epoch: 5 Global Step: 214600 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:50:28,619-Speed 2630.61 samples/sec Loss 10.0905 LearningRate 0.0550 Epoch: 5 Global Step: 214610 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:50:32,511-Speed 2631.31 samples/sec Loss 9.9867 LearningRate 0.0550 Epoch: 5 Global Step: 214620 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:50:36,401-Speed 2632.98 samples/sec Loss 10.0139 LearningRate 0.0549 Epoch: 5 Global Step: 214630 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:50:40,292-Speed 2631.93 samples/sec Loss 10.1939 LearningRate 0.0549 Epoch: 5 Global Step: 214640 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:50:44,189-Speed 2629.34 samples/sec Loss 10.0823 LearningRate 0.0549 Epoch: 5 Global Step: 214650 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:50:48,115-Speed 2608.61 samples/sec Loss 10.1066 LearningRate 0.0549 Epoch: 5 Global Step: 214660 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:50:52,004-Speed 2633.59 samples/sec Loss 9.9786 LearningRate 0.0549 Epoch: 5 Global Step: 214670 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:50:55,895-Speed 2632.54 samples/sec Loss 10.0649 LearningRate 0.0549 Epoch: 5 Global Step: 214680 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:50:59,792-Speed 2627.93 samples/sec Loss 10.1151 LearningRate 0.0549 Epoch: 5 Global Step: 214690 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:51:03,693-Speed 2625.72 samples/sec Loss 10.1462 LearningRate 0.0549 Epoch: 5 Global Step: 214700 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:51:07,585-Speed 2631.37 samples/sec Loss 9.9792 LearningRate 0.0549 Epoch: 5 Global Step: 214710 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:51:11,482-Speed 2628.41 samples/sec Loss 10.0255 LearningRate 0.0549 Epoch: 5 Global Step: 214720 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:51:15,389-Speed 2621.74 samples/sec Loss 10.1003 LearningRate 0.0549 Epoch: 5 Global Step: 214730 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:51:19,279-Speed 2632.46 samples/sec Loss 10.1039 LearningRate 0.0549 Epoch: 5 Global Step: 214740 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:51:23,175-Speed 2629.51 samples/sec Loss 10.2006 LearningRate 0.0549 Epoch: 5 Global Step: 214750 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:51:27,078-Speed 2623.70 samples/sec Loss 9.9297 LearningRate 0.0549 Epoch: 5 Global Step: 214760 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:51:30,980-Speed 2625.39 samples/sec Loss 9.9596 LearningRate 0.0549 Epoch: 5 Global Step: 214770 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:51:34,861-Speed 2639.29 samples/sec Loss 10.1970 LearningRate 0.0549 Epoch: 5 Global Step: 214780 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:51:38,753-Speed 2630.96 samples/sec Loss 10.0613 LearningRate 0.0549 Epoch: 5 Global Step: 214790 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:51:42,646-Speed 2630.98 samples/sec Loss 10.1730 LearningRate 0.0549 Epoch: 5 Global Step: 214800 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:51:46,545-Speed 2627.33 samples/sec Loss 10.0581 LearningRate 0.0549 Epoch: 5 Global Step: 214810 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:51:50,438-Speed 2630.88 samples/sec Loss 10.0977 LearningRate 0.0549 Epoch: 5 Global Step: 214820 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:51:54,331-Speed 2630.71 samples/sec Loss 10.1010 LearningRate 0.0549 Epoch: 5 Global Step: 214830 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:51:58,224-Speed 2631.20 samples/sec Loss 10.0416 LearningRate 0.0549 Epoch: 5 Global Step: 214840 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:52:02,127-Speed 2623.82 samples/sec Loss 10.0765 LearningRate 0.0549 Epoch: 5 Global Step: 214850 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:52:06,027-Speed 2626.21 samples/sec Loss 10.1832 LearningRate 0.0549 Epoch: 5 Global Step: 214860 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:52:09,924-Speed 2628.38 samples/sec Loss 10.0473 LearningRate 0.0549 Epoch: 5 Global Step: 214870 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:52:13,818-Speed 2630.51 samples/sec Loss 10.1552 LearningRate 0.0549 Epoch: 5 Global Step: 214880 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:52:17,687-Speed 2647.13 samples/sec Loss 10.0774 LearningRate 0.0549 Epoch: 5 Global Step: 214890 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:52:21,599-Speed 2618.18 samples/sec Loss 10.0829 LearningRate 0.0549 Epoch: 5 Global Step: 214900 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:52:25,491-Speed 2632.00 samples/sec Loss 10.0875 LearningRate 0.0549 Epoch: 5 Global Step: 214910 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:52:29,389-Speed 2627.25 samples/sec Loss 10.1231 LearningRate 0.0549 Epoch: 5 Global Step: 214920 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:52:33,323-Speed 2603.36 samples/sec Loss 10.2137 LearningRate 0.0549 Epoch: 5 Global Step: 214930 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:52:37,287-Speed 2584.30 samples/sec Loss 10.1433 LearningRate 0.0549 Epoch: 5 Global Step: 214940 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:52:41,179-Speed 2631.18 samples/sec Loss 10.0920 LearningRate 0.0549 Epoch: 5 Global Step: 214950 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:52:45,073-Speed 2630.58 samples/sec Loss 10.1711 LearningRate 0.0549 Epoch: 5 Global Step: 214960 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:52:48,966-Speed 2631.02 samples/sec Loss 10.1188 LearningRate 0.0549 Epoch: 5 Global Step: 214970 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:52:52,860-Speed 2630.47 samples/sec Loss 10.0397 LearningRate 0.0549 Epoch: 5 Global Step: 214980 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:52:56,752-Speed 2631.61 samples/sec Loss 10.0868 LearningRate 0.0549 Epoch: 5 Global Step: 214990 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:53:00,643-Speed 2632.37 samples/sec Loss 10.1237 LearningRate 0.0549 Epoch: 5 Global Step: 215000 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:53:04,536-Speed 2630.51 samples/sec Loss 10.0398 LearningRate 0.0549 Epoch: 5 Global Step: 215010 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:53:08,430-Speed 2630.26 samples/sec Loss 10.0859 LearningRate 0.0549 Epoch: 5 Global Step: 215020 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:53:12,318-Speed 2634.44 samples/sec Loss 10.0759 LearningRate 0.0549 Epoch: 5 Global Step: 215030 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:53:16,211-Speed 2631.29 samples/sec Loss 10.1056 LearningRate 0.0549 Epoch: 5 Global Step: 215040 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:53:20,111-Speed 2625.72 samples/sec Loss 10.0657 LearningRate 0.0549 Epoch: 5 Global Step: 215050 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:53:24,002-Speed 2633.03 samples/sec Loss 10.0260 LearningRate 0.0549 Epoch: 5 Global Step: 215060 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:53:27,897-Speed 2629.69 samples/sec Loss 10.0463 LearningRate 0.0549 Epoch: 5 Global Step: 215070 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:53:31,788-Speed 2632.31 samples/sec Loss 10.0727 LearningRate 0.0549 Epoch: 5 Global Step: 215080 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:53:35,682-Speed 2630.13 samples/sec Loss 10.0551 LearningRate 0.0549 Epoch: 5 Global Step: 215090 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:53:39,575-Speed 2630.69 samples/sec Loss 10.0459 LearningRate 0.0549 Epoch: 5 Global Step: 215100 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:53:43,468-Speed 2630.59 samples/sec Loss 10.2447 LearningRate 0.0549 Epoch: 5 Global Step: 215110 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:53:47,367-Speed 2627.33 samples/sec Loss 10.1985 LearningRate 0.0549 Epoch: 5 Global Step: 215120 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:53:51,241-Speed 2643.58 samples/sec Loss 10.0566 LearningRate 0.0549 Epoch: 5 Global Step: 215130 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:53:55,145-Speed 2623.57 samples/sec Loss 10.0327 LearningRate 0.0549 Epoch: 5 Global Step: 215140 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:53:59,037-Speed 2631.46 samples/sec Loss 10.0899 LearningRate 0.0549 Epoch: 5 Global Step: 215150 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:54:02,932-Speed 2629.68 samples/sec Loss 10.0036 LearningRate 0.0549 Epoch: 5 Global Step: 215160 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:54:06,830-Speed 2628.06 samples/sec Loss 10.0705 LearningRate 0.0549 Epoch: 5 Global Step: 215170 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:54:10,734-Speed 2623.29 samples/sec Loss 10.0096 LearningRate 0.0549 Epoch: 5 Global Step: 215180 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:54:14,648-Speed 2616.97 samples/sec Loss 10.0743 LearningRate 0.0548 Epoch: 5 Global Step: 215190 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:54:18,566-Speed 2614.09 samples/sec Loss 9.9948 LearningRate 0.0548 Epoch: 5 Global Step: 215200 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:54:22,486-Speed 2613.02 samples/sec Loss 10.1949 LearningRate 0.0548 Epoch: 5 Global Step: 215210 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:54:26,337-Speed 2659.64 samples/sec Loss 10.6052 LearningRate 0.0548 Epoch: 5 Global Step: 215220 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:54:30,243-Speed 2622.16 samples/sec Loss 10.3042 LearningRate 0.0548 Epoch: 5 Global Step: 215230 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:54:34,136-Speed 2630.82 samples/sec Loss 10.3436 LearningRate 0.0548 Epoch: 5 Global Step: 215240 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:54:38,033-Speed 2628.48 samples/sec Loss 10.0639 LearningRate 0.0548 Epoch: 5 Global Step: 215250 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:54:41,924-Speed 2635.58 samples/sec Loss 10.1076 LearningRate 0.0548 Epoch: 5 Global Step: 215260 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:54:45,820-Speed 2628.59 samples/sec Loss 10.1824 LearningRate 0.0548 Epoch: 5 Global Step: 215270 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:54:49,714-Speed 2630.03 samples/sec Loss 10.1345 LearningRate 0.0548 Epoch: 5 Global Step: 215280 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:54:53,614-Speed 2626.75 samples/sec Loss 9.9370 LearningRate 0.0548 Epoch: 5 Global Step: 215290 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:54:57,505-Speed 2632.12 samples/sec Loss 10.1115 LearningRate 0.0548 Epoch: 5 Global Step: 215300 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:55:01,404-Speed 2626.97 samples/sec Loss 10.0184 LearningRate 0.0548 Epoch: 5 Global Step: 215310 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 19:55:05,295-Speed 2632.04 samples/sec Loss 10.1159 LearningRate 0.0548 Epoch: 5 Global Step: 215320 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:55:09,191-Speed 2628.92 samples/sec Loss 10.1747 LearningRate 0.0548 Epoch: 5 Global Step: 215330 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:55:13,083-Speed 2631.68 samples/sec Loss 10.2457 LearningRate 0.0548 Epoch: 5 Global Step: 215340 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:55:16,983-Speed 2626.57 samples/sec Loss 10.1608 LearningRate 0.0548 Epoch: 5 Global Step: 215350 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:55:20,875-Speed 2631.93 samples/sec Loss 10.1179 LearningRate 0.0548 Epoch: 5 Global Step: 215360 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:55:24,773-Speed 2626.98 samples/sec Loss 10.4762 LearningRate 0.0548 Epoch: 5 Global Step: 215370 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:55:28,677-Speed 2623.56 samples/sec Loss 10.8365 LearningRate 0.0548 Epoch: 5 Global Step: 215380 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:55:32,572-Speed 2629.67 samples/sec Loss 10.2773 LearningRate 0.0548 Epoch: 5 Global Step: 215390 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:55:36,462-Speed 2632.75 samples/sec Loss 10.0210 LearningRate 0.0548 Epoch: 5 Global Step: 215400 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:55:40,353-Speed 2632.18 samples/sec Loss 10.1552 LearningRate 0.0548 Epoch: 5 Global Step: 215410 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 19:55:44,270-Speed 2615.15 samples/sec Loss 10.0499 LearningRate 0.0548 Epoch: 5 Global Step: 215420 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:55:48,292-Speed 2546.57 samples/sec Loss 10.1392 LearningRate 0.0548 Epoch: 5 Global Step: 215430 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:55:52,189-Speed 2628.75 samples/sec Loss 10.1152 LearningRate 0.0548 Epoch: 5 Global Step: 215440 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:55:56,083-Speed 2630.11 samples/sec Loss 10.2149 LearningRate 0.0548 Epoch: 5 Global Step: 215450 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:56:00,089-Speed 2556.83 samples/sec Loss 10.1835 LearningRate 0.0548 Epoch: 5 Global Step: 215460 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:56:03,986-Speed 2627.97 samples/sec Loss 10.2761 LearningRate 0.0548 Epoch: 5 Global Step: 215470 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:56:07,885-Speed 2627.03 samples/sec Loss 10.2039 LearningRate 0.0548 Epoch: 5 Global Step: 215480 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:56:11,779-Speed 2630.07 samples/sec Loss 10.2547 LearningRate 0.0548 Epoch: 5 Global Step: 215490 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:56:15,680-Speed 2625.37 samples/sec Loss 10.0131 LearningRate 0.0548 Epoch: 5 Global Step: 215500 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:56:19,593-Speed 2617.34 samples/sec Loss 10.0749 LearningRate 0.0548 Epoch: 5 Global Step: 215510 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 19:56:23,492-Speed 2627.93 samples/sec Loss 10.2214 LearningRate 0.0548 Epoch: 5 Global Step: 215520 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:56:27,388-Speed 2628.66 samples/sec Loss 10.1414 LearningRate 0.0548 Epoch: 5 Global Step: 215530 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:56:31,284-Speed 2628.99 samples/sec Loss 10.0247 LearningRate 0.0548 Epoch: 5 Global Step: 215540 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:56:35,177-Speed 2630.92 samples/sec Loss 10.1898 LearningRate 0.0548 Epoch: 5 Global Step: 215550 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:56:39,068-Speed 2632.52 samples/sec Loss 10.0449 LearningRate 0.0548 Epoch: 5 Global Step: 215560 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:56:42,960-Speed 2631.07 samples/sec Loss 10.1265 LearningRate 0.0548 Epoch: 5 Global Step: 215570 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:56:46,852-Speed 2631.98 samples/sec Loss 10.0250 LearningRate 0.0548 Epoch: 5 Global Step: 215580 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:56:50,744-Speed 2631.72 samples/sec Loss 10.1180 LearningRate 0.0548 Epoch: 5 Global Step: 215590 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:56:54,655-Speed 2619.01 samples/sec Loss 9.9520 LearningRate 0.0548 Epoch: 5 Global Step: 215600 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:56:58,550-Speed 2629.39 samples/sec Loss 10.1569 LearningRate 0.0548 Epoch: 5 Global Step: 215610 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:57:02,448-Speed 2627.52 samples/sec Loss 10.0692 LearningRate 0.0548 Epoch: 5 Global Step: 215620 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:57:06,346-Speed 2627.89 samples/sec Loss 10.1952 LearningRate 0.0548 Epoch: 5 Global Step: 215630 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:57:10,241-Speed 2629.39 samples/sec Loss 10.0606 LearningRate 0.0548 Epoch: 5 Global Step: 215640 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:57:14,136-Speed 2630.37 samples/sec Loss 10.1459 LearningRate 0.0548 Epoch: 5 Global Step: 215650 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:57:18,012-Speed 2641.74 samples/sec Loss 10.0123 LearningRate 0.0548 Epoch: 5 Global Step: 215660 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:57:21,905-Speed 2631.61 samples/sec Loss 10.0523 LearningRate 0.0548 Epoch: 5 Global Step: 215670 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:57:25,801-Speed 2628.77 samples/sec Loss 9.9774 LearningRate 0.0548 Epoch: 5 Global Step: 215680 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:57:29,693-Speed 2631.49 samples/sec Loss 10.0688 LearningRate 0.0548 Epoch: 5 Global Step: 215690 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:57:33,594-Speed 2625.49 samples/sec Loss 9.9664 LearningRate 0.0548 Epoch: 5 Global Step: 215700 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:57:37,500-Speed 2622.32 samples/sec Loss 10.1447 LearningRate 0.0548 Epoch: 5 Global Step: 215710 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:57:41,396-Speed 2628.76 samples/sec Loss 10.0694 LearningRate 0.0548 Epoch: 5 Global Step: 215720 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:57:45,288-Speed 2631.74 samples/sec Loss 9.8772 LearningRate 0.0548 Epoch: 5 Global Step: 215730 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:57:49,179-Speed 2632.30 samples/sec Loss 9.9155 LearningRate 0.0548 Epoch: 5 Global Step: 215740 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:57:53,070-Speed 2632.75 samples/sec Loss 10.1016 LearningRate 0.0547 Epoch: 5 Global Step: 215750 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:57:56,963-Speed 2630.62 samples/sec Loss 10.1050 LearningRate 0.0547 Epoch: 5 Global Step: 215760 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:58:00,858-Speed 2629.96 samples/sec Loss 10.1032 LearningRate 0.0547 Epoch: 5 Global Step: 215770 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:58:04,751-Speed 2630.62 samples/sec Loss 9.9849 LearningRate 0.0547 Epoch: 5 Global Step: 215780 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:58:08,651-Speed 2625.94 samples/sec Loss 10.0752 LearningRate 0.0547 Epoch: 5 Global Step: 215790 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:58:12,551-Speed 2625.83 samples/sec Loss 10.1828 LearningRate 0.0547 Epoch: 5 Global Step: 215800 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:58:16,459-Speed 2621.02 samples/sec Loss 10.0465 LearningRate 0.0547 Epoch: 5 Global Step: 215810 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:58:20,357-Speed 2627.72 samples/sec Loss 10.0983 LearningRate 0.0547 Epoch: 5 Global Step: 215820 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:58:24,256-Speed 2627.00 samples/sec Loss 9.9483 LearningRate 0.0547 Epoch: 5 Global Step: 215830 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:58:28,164-Speed 2620.64 samples/sec Loss 10.1484 LearningRate 0.0547 Epoch: 5 Global Step: 215840 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:58:32,083-Speed 2613.58 samples/sec Loss 10.1368 LearningRate 0.0547 Epoch: 5 Global Step: 215850 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:58:36,003-Speed 2613.18 samples/sec Loss 10.0650 LearningRate 0.0547 Epoch: 5 Global Step: 215860 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:58:40,078-Speed 2513.36 samples/sec Loss 10.0661 LearningRate 0.0547 Epoch: 5 Global Step: 215870 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:58:44,002-Speed 2609.82 samples/sec Loss 10.0620 LearningRate 0.0547 Epoch: 5 Global Step: 215880 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:58:47,934-Speed 2605.89 samples/sec Loss 10.0619 LearningRate 0.0547 Epoch: 5 Global Step: 215890 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:58:51,831-Speed 2628.16 samples/sec Loss 9.9651 LearningRate 0.0547 Epoch: 5 Global Step: 215900 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:58:55,736-Speed 2623.22 samples/sec Loss 9.9596 LearningRate 0.0547 Epoch: 5 Global Step: 215910 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:58:59,636-Speed 2626.26 samples/sec Loss 9.9378 LearningRate 0.0547 Epoch: 5 Global Step: 215920 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:59:03,541-Speed 2622.82 samples/sec Loss 9.9800 LearningRate 0.0547 Epoch: 5 Global Step: 215930 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:59:07,452-Speed 2618.49 samples/sec Loss 10.1137 LearningRate 0.0547 Epoch: 5 Global Step: 215940 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:59:11,357-Speed 2622.97 samples/sec Loss 10.0145 LearningRate 0.0547 Epoch: 5 Global Step: 215950 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:59:15,247-Speed 2633.23 samples/sec Loss 9.9867 LearningRate 0.0547 Epoch: 5 Global Step: 215960 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:59:19,143-Speed 2629.26 samples/sec Loss 10.1360 LearningRate 0.0547 Epoch: 5 Global Step: 215970 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:59:23,040-Speed 2628.91 samples/sec Loss 10.0032 LearningRate 0.0547 Epoch: 5 Global Step: 215980 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:59:26,930-Speed 2633.94 samples/sec Loss 10.0657 LearningRate 0.0547 Epoch: 5 Global Step: 215990 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:59:30,842-Speed 2618.38 samples/sec Loss 10.2476 LearningRate 0.0547 Epoch: 5 Global Step: 216000 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:59:34,738-Speed 2629.11 samples/sec Loss 10.0435 LearningRate 0.0547 Epoch: 5 Global Step: 216010 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:59:38,631-Speed 2631.23 samples/sec Loss 9.9358 LearningRate 0.0547 Epoch: 5 Global Step: 216020 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:59:42,523-Speed 2631.58 samples/sec Loss 9.9274 LearningRate 0.0547 Epoch: 5 Global Step: 216030 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:59:46,465-Speed 2598.38 samples/sec Loss 10.1142 LearningRate 0.0547 Epoch: 5 Global Step: 216040 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:59:50,381-Speed 2615.51 samples/sec Loss 10.1660 LearningRate 0.0547 Epoch: 5 Global Step: 216050 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 19:59:54,272-Speed 2632.58 samples/sec Loss 10.1042 LearningRate 0.0547 Epoch: 5 Global Step: 216060 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 19:59:58,167-Speed 2629.67 samples/sec Loss 10.0708 LearningRate 0.0547 Epoch: 5 Global Step: 216070 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:00:02,081-Speed 2617.15 samples/sec Loss 10.1124 LearningRate 0.0547 Epoch: 5 Global Step: 216080 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:00:05,981-Speed 2626.35 samples/sec Loss 10.0829 LearningRate 0.0547 Epoch: 5 Global Step: 216090 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:00:09,890-Speed 2620.48 samples/sec Loss 10.2089 LearningRate 0.0547 Epoch: 5 Global Step: 216100 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:00:13,786-Speed 2628.62 samples/sec Loss 9.9972 LearningRate 0.0547 Epoch: 5 Global Step: 216110 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:00:17,688-Speed 2625.08 samples/sec Loss 10.0061 LearningRate 0.0547 Epoch: 5 Global Step: 216120 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:00:21,587-Speed 2627.39 samples/sec Loss 9.9419 LearningRate 0.0547 Epoch: 5 Global Step: 216130 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:00:25,482-Speed 2629.96 samples/sec Loss 10.0756 LearningRate 0.0547 Epoch: 5 Global Step: 216140 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:00:29,606-Speed 2483.05 samples/sec Loss 10.1056 LearningRate 0.0547 Epoch: 5 Global Step: 216150 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:00:33,509-Speed 2624.43 samples/sec Loss 10.0427 LearningRate 0.0547 Epoch: 5 Global Step: 216160 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:00:37,382-Speed 2644.78 samples/sec Loss 10.0735 LearningRate 0.0547 Epoch: 5 Global Step: 216170 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:00:41,275-Speed 2630.99 samples/sec Loss 10.1282 LearningRate 0.0547 Epoch: 5 Global Step: 216180 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:00:45,170-Speed 2629.71 samples/sec Loss 10.0710 LearningRate 0.0547 Epoch: 5 Global Step: 216190 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:00:49,063-Speed 2630.86 samples/sec Loss 10.0142 LearningRate 0.0547 Epoch: 5 Global Step: 216200 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:00:52,957-Speed 2630.16 samples/sec Loss 9.9890 LearningRate 0.0547 Epoch: 5 Global Step: 216210 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:00:56,850-Speed 2631.41 samples/sec Loss 10.0780 LearningRate 0.0547 Epoch: 5 Global Step: 216220 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:01:00,759-Speed 2620.35 samples/sec Loss 10.1171 LearningRate 0.0547 Epoch: 5 Global Step: 216230 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:01:04,672-Speed 2617.34 samples/sec Loss 10.1073 LearningRate 0.0547 Epoch: 5 Global Step: 216240 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:01:08,572-Speed 2626.40 samples/sec Loss 10.1200 LearningRate 0.0547 Epoch: 5 Global Step: 216250 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:01:12,479-Speed 2621.58 samples/sec Loss 10.0703 LearningRate 0.0547 Epoch: 5 Global Step: 216260 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:01:16,372-Speed 2631.16 samples/sec Loss 10.0508 LearningRate 0.0547 Epoch: 5 Global Step: 216270 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:01:20,266-Speed 2630.66 samples/sec Loss 9.9793 LearningRate 0.0547 Epoch: 5 Global Step: 216280 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:01:24,160-Speed 2630.39 samples/sec Loss 10.0980 LearningRate 0.0547 Epoch: 5 Global Step: 216290 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:01:28,056-Speed 2629.04 samples/sec Loss 10.1142 LearningRate 0.0547 Epoch: 5 Global Step: 216300 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:01:31,956-Speed 2626.58 samples/sec Loss 10.1500 LearningRate 0.0546 Epoch: 5 Global Step: 216310 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:01:35,850-Speed 2630.27 samples/sec Loss 10.0766 LearningRate 0.0546 Epoch: 5 Global Step: 216320 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:01:39,743-Speed 2630.65 samples/sec Loss 10.1190 LearningRate 0.0546 Epoch: 5 Global Step: 216330 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:01:43,633-Speed 2633.25 samples/sec Loss 10.0501 LearningRate 0.0546 Epoch: 5 Global Step: 216340 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:01:47,527-Speed 2630.58 samples/sec Loss 10.0521 LearningRate 0.0546 Epoch: 5 Global Step: 216350 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:01:51,419-Speed 2631.85 samples/sec Loss 10.0829 LearningRate 0.0546 Epoch: 5 Global Step: 216360 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:01:55,309-Speed 2632.93 samples/sec Loss 10.1391 LearningRate 0.0546 Epoch: 5 Global Step: 216370 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:01:59,219-Speed 2619.90 samples/sec Loss 10.1233 LearningRate 0.0546 Epoch: 5 Global Step: 216380 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:02:03,110-Speed 2631.80 samples/sec Loss 10.1611 LearningRate 0.0546 Epoch: 5 Global Step: 216390 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:02:07,003-Speed 2631.33 samples/sec Loss 10.0832 LearningRate 0.0546 Epoch: 5 Global Step: 216400 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:02:10,898-Speed 2629.46 samples/sec Loss 10.1673 LearningRate 0.0546 Epoch: 5 Global Step: 216410 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:02:14,793-Speed 2630.27 samples/sec Loss 10.0024 LearningRate 0.0546 Epoch: 5 Global Step: 216420 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:02:18,688-Speed 2629.75 samples/sec Loss 10.1689 LearningRate 0.0546 Epoch: 5 Global Step: 216430 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:02:22,579-Speed 2632.74 samples/sec Loss 9.9777 LearningRate 0.0546 Epoch: 5 Global Step: 216440 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:02:27,298-Speed 2170.07 samples/sec Loss 10.2525 LearningRate 0.0546 Epoch: 5 Global Step: 216450 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:02:31,187-Speed 2633.96 samples/sec Loss 10.1546 LearningRate 0.0546 Epoch: 5 Global Step: 216460 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:02:35,061-Speed 2643.90 samples/sec Loss 9.9999 LearningRate 0.0546 Epoch: 5 Global Step: 216470 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:02:38,939-Speed 2641.19 samples/sec Loss 9.9806 LearningRate 0.0546 Epoch: 5 Global Step: 216480 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:02:42,832-Speed 2630.87 samples/sec Loss 10.1740 LearningRate 0.0546 Epoch: 5 Global Step: 216490 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:02:46,743-Speed 2618.92 samples/sec Loss 10.0428 LearningRate 0.0546 Epoch: 5 Global Step: 216500 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:02:50,646-Speed 2624.74 samples/sec Loss 10.1231 LearningRate 0.0546 Epoch: 5 Global Step: 216510 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:02:54,540-Speed 2630.09 samples/sec Loss 10.0081 LearningRate 0.0546 Epoch: 5 Global Step: 216520 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:02:58,427-Speed 2635.26 samples/sec Loss 10.0590 LearningRate 0.0546 Epoch: 5 Global Step: 216530 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:03:02,321-Speed 2631.10 samples/sec Loss 10.0873 LearningRate 0.0546 Epoch: 5 Global Step: 216540 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:03:06,218-Speed 2627.92 samples/sec Loss 10.1131 LearningRate 0.0546 Epoch: 5 Global Step: 216550 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:03:10,145-Speed 2607.84 samples/sec Loss 10.0188 LearningRate 0.0546 Epoch: 5 Global Step: 216560 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:03:14,038-Speed 2630.96 samples/sec Loss 10.0041 LearningRate 0.0546 Epoch: 5 Global Step: 216570 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:03:17,930-Speed 2632.35 samples/sec Loss 10.0229 LearningRate 0.0546 Epoch: 5 Global Step: 216580 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:03:21,837-Speed 2621.79 samples/sec Loss 10.0692 LearningRate 0.0546 Epoch: 5 Global Step: 216590 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:03:25,729-Speed 2631.19 samples/sec Loss 10.0824 LearningRate 0.0546 Epoch: 5 Global Step: 216600 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:03:29,619-Speed 2633.71 samples/sec Loss 10.0409 LearningRate 0.0546 Epoch: 5 Global Step: 216610 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:03:33,529-Speed 2619.01 samples/sec Loss 10.0942 LearningRate 0.0546 Epoch: 5 Global Step: 216620 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:03:37,422-Speed 2631.14 samples/sec Loss 10.0293 LearningRate 0.0546 Epoch: 5 Global Step: 216630 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:03:41,315-Speed 2630.95 samples/sec Loss 9.9682 LearningRate 0.0546 Epoch: 5 Global Step: 216640 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:03:45,211-Speed 2628.99 samples/sec Loss 9.9969 LearningRate 0.0546 Epoch: 5 Global Step: 216650 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:03:49,111-Speed 2626.49 samples/sec Loss 9.9300 LearningRate 0.0546 Epoch: 5 Global Step: 216660 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:03:53,009-Speed 2627.99 samples/sec Loss 10.0362 LearningRate 0.0546 Epoch: 5 Global Step: 216670 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:03:56,907-Speed 2627.02 samples/sec Loss 10.1355 LearningRate 0.0546 Epoch: 5 Global Step: 216680 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:04:00,815-Speed 2621.12 samples/sec Loss 10.1593 LearningRate 0.0546 Epoch: 5 Global Step: 216690 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:04:04,718-Speed 2624.22 samples/sec Loss 9.9772 LearningRate 0.0546 Epoch: 5 Global Step: 216700 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:04:08,621-Speed 2624.48 samples/sec Loss 10.1129 LearningRate 0.0546 Epoch: 5 Global Step: 216710 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:04:12,513-Speed 2631.76 samples/sec Loss 10.0196 LearningRate 0.0546 Epoch: 5 Global Step: 216720 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:04:16,405-Speed 2631.28 samples/sec Loss 9.9015 LearningRate 0.0546 Epoch: 5 Global Step: 216730 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:04:20,303-Speed 2627.88 samples/sec Loss 10.0913 LearningRate 0.0546 Epoch: 5 Global Step: 216740 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:04:24,207-Speed 2623.36 samples/sec Loss 10.0472 LearningRate 0.0546 Epoch: 5 Global Step: 216750 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:04:28,105-Speed 2627.42 samples/sec Loss 9.8549 LearningRate 0.0546 Epoch: 5 Global Step: 216760 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:04:31,999-Speed 2630.28 samples/sec Loss 10.1018 LearningRate 0.0546 Epoch: 5 Global Step: 216770 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:04:35,916-Speed 2614.68 samples/sec Loss 10.0443 LearningRate 0.0546 Epoch: 5 Global Step: 216780 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:04:39,817-Speed 2625.77 samples/sec Loss 9.9793 LearningRate 0.0546 Epoch: 5 Global Step: 216790 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:04:43,711-Speed 2629.95 samples/sec Loss 9.9900 LearningRate 0.0546 Epoch: 5 Global Step: 216800 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:04:47,609-Speed 2628.12 samples/sec Loss 10.0682 LearningRate 0.0546 Epoch: 5 Global Step: 216810 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:04:51,503-Speed 2630.10 samples/sec Loss 10.0313 LearningRate 0.0546 Epoch: 5 Global Step: 216820 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:04:55,400-Speed 2628.39 samples/sec Loss 10.1056 LearningRate 0.0546 Epoch: 5 Global Step: 216830 Fp16 Grad Scale: 524288 Required: 69 hours
Training: 2022-04-13 20:04:59,279-Speed 2640.50 samples/sec Loss 9.9827 LearningRate 0.0546 Epoch: 5 Global Step: 216840 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:05:03,181-Speed 2624.79 samples/sec Loss 10.0372 LearningRate 0.0546 Epoch: 5 Global Step: 216850 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:05:07,073-Speed 2631.35 samples/sec Loss 10.0318 LearningRate 0.0546 Epoch: 5 Global Step: 216860 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:05:10,968-Speed 2629.60 samples/sec Loss 10.0534 LearningRate 0.0545 Epoch: 5 Global Step: 216870 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:05:14,863-Speed 2629.90 samples/sec Loss 10.0579 LearningRate 0.0545 Epoch: 5 Global Step: 216880 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:05:18,760-Speed 2628.92 samples/sec Loss 10.0201 LearningRate 0.0545 Epoch: 5 Global Step: 216890 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:05:22,652-Speed 2631.41 samples/sec Loss 10.0954 LearningRate 0.0545 Epoch: 5 Global Step: 216900 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:05:26,544-Speed 2631.82 samples/sec Loss 10.0812 LearningRate 0.0545 Epoch: 5 Global Step: 216910 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:05:30,470-Speed 2608.37 samples/sec Loss 9.9815 LearningRate 0.0545 Epoch: 5 Global Step: 216920 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:05:34,346-Speed 2642.82 samples/sec Loss 10.1541 LearningRate 0.0545 Epoch: 5 Global Step: 216930 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:05:38,256-Speed 2619.62 samples/sec Loss 10.0795 LearningRate 0.0545 Epoch: 5 Global Step: 216940 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:05:42,157-Speed 2625.80 samples/sec Loss 9.9195 LearningRate 0.0545 Epoch: 5 Global Step: 216950 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:05:46,051-Speed 2630.31 samples/sec Loss 10.1355 LearningRate 0.0545 Epoch: 5 Global Step: 216960 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:05:49,954-Speed 2624.18 samples/sec Loss 9.9105 LearningRate 0.0545 Epoch: 5 Global Step: 216970 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:05:53,859-Speed 2622.56 samples/sec Loss 9.9455 LearningRate 0.0545 Epoch: 5 Global Step: 216980 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:05:57,885-Speed 2544.68 samples/sec Loss 9.9500 LearningRate 0.0545 Epoch: 5 Global Step: 216990 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:06:01,784-Speed 2626.88 samples/sec Loss 10.0066 LearningRate 0.0545 Epoch: 5 Global Step: 217000 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:06:05,671-Speed 2634.70 samples/sec Loss 10.1468 LearningRate 0.0545 Epoch: 5 Global Step: 217010 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:06:09,560-Speed 2633.53 samples/sec Loss 10.0161 LearningRate 0.0545 Epoch: 5 Global Step: 217020 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:06:13,434-Speed 2644.00 samples/sec Loss 10.1798 LearningRate 0.0545 Epoch: 5 Global Step: 217030 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:06:17,345-Speed 2619.16 samples/sec Loss 9.9199 LearningRate 0.0545 Epoch: 5 Global Step: 217040 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:06:21,237-Speed 2631.25 samples/sec Loss 10.1146 LearningRate 0.0545 Epoch: 5 Global Step: 217050 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:06:25,130-Speed 2631.12 samples/sec Loss 10.0760 LearningRate 0.0545 Epoch: 5 Global Step: 217060 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:06:29,027-Speed 2628.78 samples/sec Loss 10.0989 LearningRate 0.0545 Epoch: 5 Global Step: 217070 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:06:32,955-Speed 2606.93 samples/sec Loss 10.1636 LearningRate 0.0545 Epoch: 5 Global Step: 217080 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:06:36,815-Speed 2653.80 samples/sec Loss 10.4585 LearningRate 0.0545 Epoch: 5 Global Step: 217090 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 20:06:40,704-Speed 2634.17 samples/sec Loss 10.2140 LearningRate 0.0545 Epoch: 5 Global Step: 217100 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 20:06:44,590-Speed 2635.43 samples/sec Loss 10.1633 LearningRate 0.0545 Epoch: 5 Global Step: 217110 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 20:06:48,476-Speed 2636.03 samples/sec Loss 10.2315 LearningRate 0.0545 Epoch: 5 Global Step: 217120 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 20:06:52,366-Speed 2632.87 samples/sec Loss 10.1710 LearningRate 0.0545 Epoch: 5 Global Step: 217130 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 20:06:56,252-Speed 2636.27 samples/sec Loss 10.1395 LearningRate 0.0545 Epoch: 5 Global Step: 217140 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 20:07:00,138-Speed 2635.51 samples/sec Loss 10.1025 LearningRate 0.0545 Epoch: 5 Global Step: 217150 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 20:07:04,032-Speed 2629.97 samples/sec Loss 10.1127 LearningRate 0.0545 Epoch: 5 Global Step: 217160 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 20:07:07,934-Speed 2624.50 samples/sec Loss 10.0332 LearningRate 0.0545 Epoch: 5 Global Step: 217170 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 20:07:11,846-Speed 2618.83 samples/sec Loss 10.0382 LearningRate 0.0545 Epoch: 5 Global Step: 217180 Fp16 Grad Scale: 16384 Required: 69 hours
Training: 2022-04-13 20:07:15,753-Speed 2621.29 samples/sec Loss 9.9599 LearningRate 0.0545 Epoch: 5 Global Step: 217190 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:07:19,654-Speed 2625.47 samples/sec Loss 10.0123 LearningRate 0.0545 Epoch: 5 Global Step: 217200 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:07:23,572-Speed 2614.66 samples/sec Loss 10.1065 LearningRate 0.0545 Epoch: 5 Global Step: 217210 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:07:27,469-Speed 2628.61 samples/sec Loss 10.0463 LearningRate 0.0545 Epoch: 5 Global Step: 217220 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:07:31,360-Speed 2632.67 samples/sec Loss 10.0176 LearningRate 0.0545 Epoch: 5 Global Step: 217230 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:07:35,254-Speed 2629.69 samples/sec Loss 10.0639 LearningRate 0.0545 Epoch: 5 Global Step: 217240 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:07:39,247-Speed 2565.12 samples/sec Loss 10.0764 LearningRate 0.0545 Epoch: 5 Global Step: 217250 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:07:43,137-Speed 2632.50 samples/sec Loss 10.1571 LearningRate 0.0545 Epoch: 5 Global Step: 217260 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:07:47,028-Speed 2632.80 samples/sec Loss 10.0825 LearningRate 0.0545 Epoch: 5 Global Step: 217270 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:07:50,923-Speed 2629.54 samples/sec Loss 10.0185 LearningRate 0.0545 Epoch: 5 Global Step: 217280 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:07:54,817-Speed 2631.21 samples/sec Loss 10.0667 LearningRate 0.0545 Epoch: 5 Global Step: 217290 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:07:58,710-Speed 2630.22 samples/sec Loss 10.1981 LearningRate 0.0545 Epoch: 5 Global Step: 217300 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:08:02,607-Speed 2628.76 samples/sec Loss 10.0923 LearningRate 0.0545 Epoch: 5 Global Step: 217310 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:08:06,499-Speed 2631.62 samples/sec Loss 10.0721 LearningRate 0.0545 Epoch: 5 Global Step: 217320 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:08:10,428-Speed 2606.83 samples/sec Loss 10.1296 LearningRate 0.0545 Epoch: 5 Global Step: 217330 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:08:14,323-Speed 2628.94 samples/sec Loss 10.0501 LearningRate 0.0545 Epoch: 5 Global Step: 217340 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:08:18,215-Speed 2632.03 samples/sec Loss 10.1830 LearningRate 0.0545 Epoch: 5 Global Step: 217350 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:08:22,109-Speed 2630.82 samples/sec Loss 10.0462 LearningRate 0.0545 Epoch: 5 Global Step: 217360 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:08:26,022-Speed 2617.40 samples/sec Loss 10.4574 LearningRate 0.0545 Epoch: 5 Global Step: 217370 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:08:29,920-Speed 2627.74 samples/sec Loss 11.1287 LearningRate 0.0545 Epoch: 5 Global Step: 217380 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:08:33,820-Speed 2626.42 samples/sec Loss 10.4590 LearningRate 0.0545 Epoch: 5 Global Step: 217390 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:08:37,703-Speed 2637.58 samples/sec Loss 10.3181 LearningRate 0.0545 Epoch: 5 Global Step: 217400 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:08:41,594-Speed 2632.01 samples/sec Loss 10.1067 LearningRate 0.0545 Epoch: 5 Global Step: 217410 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:08:45,487-Speed 2631.50 samples/sec Loss 10.1444 LearningRate 0.0545 Epoch: 5 Global Step: 217420 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:08:49,376-Speed 2633.52 samples/sec Loss 10.1066 LearningRate 0.0545 Epoch: 5 Global Step: 217430 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:08:53,267-Speed 2632.66 samples/sec Loss 10.1837 LearningRate 0.0544 Epoch: 5 Global Step: 217440 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:08:57,156-Speed 2633.89 samples/sec Loss 10.2390 LearningRate 0.0544 Epoch: 5 Global Step: 217450 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:09:01,045-Speed 2633.99 samples/sec Loss 10.1422 LearningRate 0.0544 Epoch: 5 Global Step: 217460 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:09:05,114-Speed 2516.87 samples/sec Loss 10.0577 LearningRate 0.0544 Epoch: 5 Global Step: 217470 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:09:09,204-Speed 2504.06 samples/sec Loss 10.1584 LearningRate 0.0544 Epoch: 5 Global Step: 217480 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:09:13,243-Speed 2535.91 samples/sec Loss 10.1698 LearningRate 0.0544 Epoch: 5 Global Step: 217490 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:09:17,135-Speed 2631.87 samples/sec Loss 10.2079 LearningRate 0.0544 Epoch: 5 Global Step: 217500 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:09:21,085-Speed 2593.28 samples/sec Loss 10.1691 LearningRate 0.0544 Epoch: 5 Global Step: 217510 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:09:25,044-Speed 2586.99 samples/sec Loss 10.0856 LearningRate 0.0544 Epoch: 5 Global Step: 217520 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:09:28,934-Speed 2633.64 samples/sec Loss 10.1531 LearningRate 0.0544 Epoch: 5 Global Step: 217530 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:09:32,848-Speed 2616.65 samples/sec Loss 10.0526 LearningRate 0.0544 Epoch: 5 Global Step: 217540 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:09:36,740-Speed 2631.91 samples/sec Loss 10.1068 LearningRate 0.0544 Epoch: 5 Global Step: 217550 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:09:40,633-Speed 2630.67 samples/sec Loss 10.1644 LearningRate 0.0544 Epoch: 5 Global Step: 217560 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:09:44,539-Speed 2622.18 samples/sec Loss 10.0139 LearningRate 0.0544 Epoch: 5 Global Step: 217570 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:09:48,497-Speed 2587.80 samples/sec Loss 10.0515 LearningRate 0.0544 Epoch: 5 Global Step: 217580 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:09:52,408-Speed 2619.38 samples/sec Loss 10.0225 LearningRate 0.0544 Epoch: 5 Global Step: 217590 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:09:56,300-Speed 2631.80 samples/sec Loss 10.1342 LearningRate 0.0544 Epoch: 5 Global Step: 217600 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:10:00,193-Speed 2630.56 samples/sec Loss 9.9988 LearningRate 0.0544 Epoch: 5 Global Step: 217610 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:10:04,092-Speed 2627.15 samples/sec Loss 9.9976 LearningRate 0.0544 Epoch: 5 Global Step: 217620 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:10:07,987-Speed 2629.28 samples/sec Loss 10.0973 LearningRate 0.0544 Epoch: 5 Global Step: 217630 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:10:11,883-Speed 2628.70 samples/sec Loss 10.0947 LearningRate 0.0544 Epoch: 5 Global Step: 217640 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:10:15,776-Speed 2631.56 samples/sec Loss 10.0439 LearningRate 0.0544 Epoch: 5 Global Step: 217650 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:10:19,670-Speed 2629.96 samples/sec Loss 9.9518 LearningRate 0.0544 Epoch: 5 Global Step: 217660 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:10:23,568-Speed 2628.55 samples/sec Loss 9.9444 LearningRate 0.0544 Epoch: 5 Global Step: 217670 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:10:27,471-Speed 2624.04 samples/sec Loss 10.0400 LearningRate 0.0544 Epoch: 5 Global Step: 217680 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:10:31,363-Speed 2631.70 samples/sec Loss 10.1288 LearningRate 0.0544 Epoch: 5 Global Step: 217690 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:10:35,289-Speed 2608.38 samples/sec Loss 10.0135 LearningRate 0.0544 Epoch: 5 Global Step: 217700 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:10:39,163-Speed 2644.08 samples/sec Loss 10.0862 LearningRate 0.0544 Epoch: 5 Global Step: 217710 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:10:43,056-Speed 2631.10 samples/sec Loss 9.9968 LearningRate 0.0544 Epoch: 5 Global Step: 217720 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:10:46,947-Speed 2632.17 samples/sec Loss 10.0521 LearningRate 0.0544 Epoch: 5 Global Step: 217730 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:10:50,851-Speed 2623.65 samples/sec Loss 10.0330 LearningRate 0.0544 Epoch: 5 Global Step: 217740 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:10:54,738-Speed 2635.50 samples/sec Loss 10.0074 LearningRate 0.0544 Epoch: 5 Global Step: 217750 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:10:58,630-Speed 2631.31 samples/sec Loss 9.9975 LearningRate 0.0544 Epoch: 5 Global Step: 217760 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:11:02,520-Speed 2632.96 samples/sec Loss 10.1640 LearningRate 0.0544 Epoch: 5 Global Step: 217770 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:11:06,416-Speed 2629.19 samples/sec Loss 10.0885 LearningRate 0.0544 Epoch: 5 Global Step: 217780 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:11:10,283-Speed 2648.83 samples/sec Loss 10.4367 LearningRate 0.0544 Epoch: 5 Global Step: 217790 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:11:14,182-Speed 2626.79 samples/sec Loss 10.1568 LearningRate 0.0544 Epoch: 5 Global Step: 217800 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:11:18,246-Speed 2520.51 samples/sec Loss 10.1481 LearningRate 0.0544 Epoch: 5 Global Step: 217810 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:11:22,138-Speed 2631.29 samples/sec Loss 10.0982 LearningRate 0.0544 Epoch: 5 Global Step: 217820 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:11:26,044-Speed 2622.82 samples/sec Loss 10.1176 LearningRate 0.0544 Epoch: 5 Global Step: 217830 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:11:29,940-Speed 2629.28 samples/sec Loss 9.9946 LearningRate 0.0544 Epoch: 5 Global Step: 217840 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:11:33,840-Speed 2625.95 samples/sec Loss 10.1233 LearningRate 0.0544 Epoch: 5 Global Step: 217850 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:11:37,734-Speed 2630.47 samples/sec Loss 10.0687 LearningRate 0.0544 Epoch: 5 Global Step: 217860 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:11:41,622-Speed 2634.16 samples/sec Loss 10.1467 LearningRate 0.0544 Epoch: 5 Global Step: 217870 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:11:45,513-Speed 2632.34 samples/sec Loss 10.0776 LearningRate 0.0544 Epoch: 5 Global Step: 217880 Fp16 Grad Scale: 32768 Required: 69 hours
Training: 2022-04-13 20:11:49,408-Speed 2629.56 samples/sec Loss 10.0334 LearningRate 0.0544 Epoch: 5 Global Step: 217890 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:11:53,299-Speed 2632.76 samples/sec Loss 9.9675 LearningRate 0.0544 Epoch: 5 Global Step: 217900 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:11:57,192-Speed 2630.96 samples/sec Loss 10.1356 LearningRate 0.0544 Epoch: 5 Global Step: 217910 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:12:01,083-Speed 2632.11 samples/sec Loss 9.9899 LearningRate 0.0544 Epoch: 5 Global Step: 217920 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:12:04,977-Speed 2630.35 samples/sec Loss 9.8191 LearningRate 0.0544 Epoch: 5 Global Step: 217930 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:12:08,865-Speed 2634.25 samples/sec Loss 10.0460 LearningRate 0.0544 Epoch: 5 Global Step: 217940 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:12:12,757-Speed 2631.61 samples/sec Loss 10.0970 LearningRate 0.0544 Epoch: 5 Global Step: 217950 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:12:16,649-Speed 2631.88 samples/sec Loss 10.0359 LearningRate 0.0544 Epoch: 5 Global Step: 217960 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:12:20,566-Speed 2614.84 samples/sec Loss 9.8988 LearningRate 0.0544 Epoch: 5 Global Step: 217970 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:12:24,489-Speed 2611.40 samples/sec Loss 10.0387 LearningRate 0.0544 Epoch: 5 Global Step: 217980 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:12:28,390-Speed 2625.77 samples/sec Loss 10.0061 LearningRate 0.0544 Epoch: 5 Global Step: 217990 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:12:32,297-Speed 2621.49 samples/sec Loss 10.0660 LearningRate 0.0543 Epoch: 5 Global Step: 218000 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:12:36,192-Speed 2629.25 samples/sec Loss 10.0600 LearningRate 0.0543 Epoch: 5 Global Step: 218010 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:12:40,094-Speed 2624.87 samples/sec Loss 10.0744 LearningRate 0.0543 Epoch: 5 Global Step: 218020 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:12:43,992-Speed 2627.61 samples/sec Loss 10.0627 LearningRate 0.0543 Epoch: 5 Global Step: 218030 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:12:47,914-Speed 2611.83 samples/sec Loss 10.0427 LearningRate 0.0543 Epoch: 5 Global Step: 218040 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:12:51,809-Speed 2629.14 samples/sec Loss 10.0970 LearningRate 0.0543 Epoch: 5 Global Step: 218050 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:12:55,719-Speed 2619.97 samples/sec Loss 10.0568 LearningRate 0.0543 Epoch: 5 Global Step: 218060 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:12:59,620-Speed 2625.51 samples/sec Loss 10.0643 LearningRate 0.0543 Epoch: 5 Global Step: 218070 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:13:03,523-Speed 2624.40 samples/sec Loss 10.0285 LearningRate 0.0543 Epoch: 5 Global Step: 218080 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:13:07,425-Speed 2625.12 samples/sec Loss 9.9531 LearningRate 0.0543 Epoch: 5 Global Step: 218090 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:13:11,317-Speed 2631.83 samples/sec Loss 10.0583 LearningRate 0.0543 Epoch: 5 Global Step: 218100 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:13:15,213-Speed 2628.60 samples/sec Loss 10.1306 LearningRate 0.0543 Epoch: 5 Global Step: 218110 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:13:19,091-Speed 2641.49 samples/sec Loss 10.0554 LearningRate 0.0543 Epoch: 5 Global Step: 218120 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:13:22,983-Speed 2631.60 samples/sec Loss 10.0395 LearningRate 0.0543 Epoch: 5 Global Step: 218130 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:13:27,019-Speed 2538.29 samples/sec Loss 10.0835 LearningRate 0.0543 Epoch: 5 Global Step: 218140 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:13:31,084-Speed 2519.42 samples/sec Loss 9.9614 LearningRate 0.0543 Epoch: 5 Global Step: 218150 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:13:35,082-Speed 2561.83 samples/sec Loss 9.9924 LearningRate 0.0543 Epoch: 5 Global Step: 218160 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:13:39,010-Speed 2607.14 samples/sec Loss 10.1951 LearningRate 0.0543 Epoch: 5 Global Step: 218170 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:13:42,902-Speed 2631.79 samples/sec Loss 10.0745 LearningRate 0.0543 Epoch: 5 Global Step: 218180 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:13:46,803-Speed 2625.06 samples/sec Loss 10.0729 LearningRate 0.0543 Epoch: 5 Global Step: 218190 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:13:50,695-Speed 2631.85 samples/sec Loss 9.9785 LearningRate 0.0543 Epoch: 5 Global Step: 218200 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:13:54,586-Speed 2632.22 samples/sec Loss 9.9833 LearningRate 0.0543 Epoch: 5 Global Step: 218210 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:13:58,476-Speed 2633.95 samples/sec Loss 9.9666 LearningRate 0.0543 Epoch: 5 Global Step: 218220 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:14:02,351-Speed 2643.12 samples/sec Loss 9.9895 LearningRate 0.0543 Epoch: 5 Global Step: 218230 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:14:06,243-Speed 2631.33 samples/sec Loss 9.9958 LearningRate 0.0543 Epoch: 5 Global Step: 218240 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:14:10,134-Speed 2632.23 samples/sec Loss 10.1030 LearningRate 0.0543 Epoch: 5 Global Step: 218250 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:14:14,023-Speed 2633.64 samples/sec Loss 9.9691 LearningRate 0.0543 Epoch: 5 Global Step: 218260 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:14:17,914-Speed 2632.05 samples/sec Loss 10.0582 LearningRate 0.0543 Epoch: 5 Global Step: 218270 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:14:21,816-Speed 2624.84 samples/sec Loss 10.0572 LearningRate 0.0543 Epoch: 5 Global Step: 218280 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:14:25,732-Speed 2615.93 samples/sec Loss 10.0170 LearningRate 0.0543 Epoch: 5 Global Step: 218290 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:14:29,622-Speed 2632.44 samples/sec Loss 9.9769 LearningRate 0.0543 Epoch: 5 Global Step: 218300 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:14:33,515-Speed 2631.51 samples/sec Loss 10.0784 LearningRate 0.0543 Epoch: 5 Global Step: 218310 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:14:37,409-Speed 2630.11 samples/sec Loss 10.0153 LearningRate 0.0543 Epoch: 5 Global Step: 218320 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:14:41,326-Speed 2615.15 samples/sec Loss 9.9660 LearningRate 0.0543 Epoch: 5 Global Step: 218330 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:14:45,253-Speed 2607.94 samples/sec Loss 10.1008 LearningRate 0.0543 Epoch: 5 Global Step: 218340 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:14:49,147-Speed 2630.28 samples/sec Loss 9.9166 LearningRate 0.0543 Epoch: 5 Global Step: 218350 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:14:53,045-Speed 2627.82 samples/sec Loss 10.0474 LearningRate 0.0543 Epoch: 5 Global Step: 218360 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:14:56,945-Speed 2626.62 samples/sec Loss 10.1493 LearningRate 0.0543 Epoch: 5 Global Step: 218370 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:15:00,845-Speed 2626.68 samples/sec Loss 10.1483 LearningRate 0.0543 Epoch: 5 Global Step: 218380 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:15:04,743-Speed 2627.34 samples/sec Loss 10.0854 LearningRate 0.0543 Epoch: 5 Global Step: 218390 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:15:08,639-Speed 2628.94 samples/sec Loss 10.0507 LearningRate 0.0543 Epoch: 5 Global Step: 218400 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:15:12,534-Speed 2629.96 samples/sec Loss 9.9535 LearningRate 0.0543 Epoch: 5 Global Step: 218410 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:15:16,427-Speed 2630.92 samples/sec Loss 10.0315 LearningRate 0.0543 Epoch: 5 Global Step: 218420 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:15:20,321-Speed 2630.07 samples/sec Loss 10.0884 LearningRate 0.0543 Epoch: 5 Global Step: 218430 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:15:24,212-Speed 2632.64 samples/sec Loss 9.8550 LearningRate 0.0543 Epoch: 5 Global Step: 218440 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:15:28,102-Speed 2632.75 samples/sec Loss 9.9957 LearningRate 0.0543 Epoch: 5 Global Step: 218450 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:15:31,997-Speed 2630.55 samples/sec Loss 10.0913 LearningRate 0.0543 Epoch: 5 Global Step: 218460 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:15:35,902-Speed 2622.40 samples/sec Loss 10.1362 LearningRate 0.0543 Epoch: 5 Global Step: 218470 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:15:39,799-Speed 2628.40 samples/sec Loss 10.1242 LearningRate 0.0543 Epoch: 5 Global Step: 218480 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:15:43,699-Speed 2626.13 samples/sec Loss 9.9719 LearningRate 0.0543 Epoch: 5 Global Step: 218490 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:15:47,585-Speed 2635.67 samples/sec Loss 10.1098 LearningRate 0.0543 Epoch: 5 Global Step: 218500 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:15:51,483-Speed 2627.50 samples/sec Loss 10.0373 LearningRate 0.0543 Epoch: 5 Global Step: 218510 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:15:55,383-Speed 2626.74 samples/sec Loss 9.9240 LearningRate 0.0543 Epoch: 5 Global Step: 218520 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:15:59,279-Speed 2629.09 samples/sec Loss 10.0497 LearningRate 0.0543 Epoch: 5 Global Step: 218530 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:16:03,237-Speed 2587.82 samples/sec Loss 10.0975 LearningRate 0.0543 Epoch: 5 Global Step: 218540 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:16:07,111-Speed 2643.61 samples/sec Loss 10.0148 LearningRate 0.0543 Epoch: 5 Global Step: 218550 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:16:11,026-Speed 2616.94 samples/sec Loss 10.1473 LearningRate 0.0542 Epoch: 5 Global Step: 218560 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:16:14,916-Speed 2632.66 samples/sec Loss 10.0384 LearningRate 0.0542 Epoch: 5 Global Step: 218570 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:16:18,812-Speed 2629.27 samples/sec Loss 9.9203 LearningRate 0.0542 Epoch: 5 Global Step: 218580 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:16:22,737-Speed 2610.30 samples/sec Loss 10.0120 LearningRate 0.0542 Epoch: 5 Global Step: 218590 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:16:26,632-Speed 2629.62 samples/sec Loss 10.1177 LearningRate 0.0542 Epoch: 5 Global Step: 218600 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:16:30,531-Speed 2626.72 samples/sec Loss 10.0681 LearningRate 0.0542 Epoch: 5 Global Step: 218610 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:16:34,427-Speed 2628.91 samples/sec Loss 10.0942 LearningRate 0.0542 Epoch: 5 Global Step: 218620 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:16:38,319-Speed 2631.88 samples/sec Loss 9.9943 LearningRate 0.0542 Epoch: 5 Global Step: 218630 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:16:42,219-Speed 2626.46 samples/sec Loss 10.0674 LearningRate 0.0542 Epoch: 5 Global Step: 218640 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:16:46,223-Speed 2557.95 samples/sec Loss 9.9767 LearningRate 0.0542 Epoch: 5 Global Step: 218650 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:16:50,118-Speed 2629.78 samples/sec Loss 10.1444 LearningRate 0.0542 Epoch: 5 Global Step: 218660 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:16:54,016-Speed 2627.85 samples/sec Loss 9.8594 LearningRate 0.0542 Epoch: 5 Global Step: 218670 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:16:57,912-Speed 2628.79 samples/sec Loss 10.0918 LearningRate 0.0542 Epoch: 5 Global Step: 218680 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:17:01,806-Speed 2630.22 samples/sec Loss 10.0289 LearningRate 0.0542 Epoch: 5 Global Step: 218690 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:17:05,702-Speed 2629.48 samples/sec Loss 9.9871 LearningRate 0.0542 Epoch: 5 Global Step: 218700 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:17:09,603-Speed 2625.16 samples/sec Loss 10.1083 LearningRate 0.0542 Epoch: 5 Global Step: 218710 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:17:13,499-Speed 2629.39 samples/sec Loss 10.1507 LearningRate 0.0542 Epoch: 5 Global Step: 218720 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:17:17,398-Speed 2627.72 samples/sec Loss 9.9886 LearningRate 0.0542 Epoch: 5 Global Step: 218730 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:17:21,343-Speed 2596.20 samples/sec Loss 10.1028 LearningRate 0.0542 Epoch: 5 Global Step: 218740 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:17:25,242-Speed 2627.13 samples/sec Loss 10.0931 LearningRate 0.0542 Epoch: 5 Global Step: 218750 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:17:29,144-Speed 2624.72 samples/sec Loss 10.1606 LearningRate 0.0542 Epoch: 5 Global Step: 218760 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:17:33,022-Speed 2641.78 samples/sec Loss 9.9648 LearningRate 0.0542 Epoch: 5 Global Step: 218770 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:17:36,920-Speed 2627.09 samples/sec Loss 10.0013 LearningRate 0.0542 Epoch: 5 Global Step: 218780 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:17:40,820-Speed 2626.39 samples/sec Loss 10.0684 LearningRate 0.0542 Epoch: 5 Global Step: 218790 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:17:44,718-Speed 2627.90 samples/sec Loss 9.9735 LearningRate 0.0542 Epoch: 5 Global Step: 218800 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:17:48,617-Speed 2627.33 samples/sec Loss 9.9950 LearningRate 0.0542 Epoch: 5 Global Step: 218810 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:17:52,510-Speed 2631.13 samples/sec Loss 10.0293 LearningRate 0.0542 Epoch: 5 Global Step: 218820 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:17:56,400-Speed 2632.85 samples/sec Loss 10.1088 LearningRate 0.0542 Epoch: 5 Global Step: 218830 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:18:00,290-Speed 2633.14 samples/sec Loss 9.9623 LearningRate 0.0542 Epoch: 5 Global Step: 218840 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:18:04,184-Speed 2629.59 samples/sec Loss 9.9415 LearningRate 0.0542 Epoch: 5 Global Step: 218850 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:18:08,088-Speed 2623.79 samples/sec Loss 9.9904 LearningRate 0.0542 Epoch: 5 Global Step: 218860 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:18:11,985-Speed 2628.44 samples/sec Loss 10.0636 LearningRate 0.0542 Epoch: 5 Global Step: 218870 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:18:15,893-Speed 2621.21 samples/sec Loss 10.0654 LearningRate 0.0542 Epoch: 5 Global Step: 218880 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:18:19,799-Speed 2622.59 samples/sec Loss 9.9790 LearningRate 0.0542 Epoch: 5 Global Step: 218890 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:18:23,717-Speed 2614.24 samples/sec Loss 9.9989 LearningRate 0.0542 Epoch: 5 Global Step: 218900 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:18:27,608-Speed 2632.91 samples/sec Loss 9.9674 LearningRate 0.0542 Epoch: 5 Global Step: 218910 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:18:31,498-Speed 2632.64 samples/sec Loss 10.0210 LearningRate 0.0542 Epoch: 5 Global Step: 218920 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:18:35,390-Speed 2631.27 samples/sec Loss 9.9453 LearningRate 0.0542 Epoch: 5 Global Step: 218930 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:18:39,292-Speed 2625.07 samples/sec Loss 9.9155 LearningRate 0.0542 Epoch: 5 Global Step: 218940 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:18:43,195-Speed 2624.45 samples/sec Loss 9.9185 LearningRate 0.0542 Epoch: 5 Global Step: 218950 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:18:47,091-Speed 2629.37 samples/sec Loss 10.0309 LearningRate 0.0542 Epoch: 5 Global Step: 218960 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:18:50,982-Speed 2632.08 samples/sec Loss 10.0034 LearningRate 0.0542 Epoch: 5 Global Step: 218970 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:18:54,907-Speed 2609.61 samples/sec Loss 9.9939 LearningRate 0.0542 Epoch: 5 Global Step: 218980 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:18:58,798-Speed 2632.74 samples/sec Loss 9.9940 LearningRate 0.0542 Epoch: 5 Global Step: 218990 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:19:02,693-Speed 2629.85 samples/sec Loss 9.9783 LearningRate 0.0542 Epoch: 5 Global Step: 219000 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:19:06,570-Speed 2641.31 samples/sec Loss 10.0626 LearningRate 0.0542 Epoch: 5 Global Step: 219010 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:19:10,456-Speed 2635.38 samples/sec Loss 10.1523 LearningRate 0.0542 Epoch: 5 Global Step: 219020 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:19:14,360-Speed 2624.35 samples/sec Loss 10.0614 LearningRate 0.0542 Epoch: 5 Global Step: 219030 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:19:18,253-Speed 2630.95 samples/sec Loss 9.9222 LearningRate 0.0542 Epoch: 5 Global Step: 219040 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:19:22,145-Speed 2631.75 samples/sec Loss 10.0708 LearningRate 0.0542 Epoch: 5 Global Step: 219050 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:19:26,037-Speed 2632.03 samples/sec Loss 10.1807 LearningRate 0.0542 Epoch: 5 Global Step: 219060 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:19:29,931-Speed 2630.31 samples/sec Loss 9.9998 LearningRate 0.0542 Epoch: 5 Global Step: 219070 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:19:33,834-Speed 2623.59 samples/sec Loss 10.0663 LearningRate 0.0542 Epoch: 5 Global Step: 219080 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:19:37,726-Speed 2631.47 samples/sec Loss 10.0483 LearningRate 0.0542 Epoch: 5 Global Step: 219090 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:19:41,618-Speed 2632.31 samples/sec Loss 10.0424 LearningRate 0.0542 Epoch: 5 Global Step: 219100 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:19:45,515-Speed 2628.24 samples/sec Loss 9.9180 LearningRate 0.0542 Epoch: 5 Global Step: 219110 Fp16 Grad Scale: 65536 Required: 69 hours
Training: 2022-04-13 20:19:49,439-Speed 2610.16 samples/sec Loss 9.9792 LearningRate 0.0541 Epoch: 5 Global Step: 219120 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:19:53,512-Speed 2514.73 samples/sec Loss 9.9893 LearningRate 0.0541 Epoch: 5 Global Step: 219130 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:19:57,588-Speed 2512.89 samples/sec Loss 9.9848 LearningRate 0.0541 Epoch: 5 Global Step: 219140 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:20:01,658-Speed 2516.45 samples/sec Loss 10.0273 LearningRate 0.0541 Epoch: 5 Global Step: 219150 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:20:05,641-Speed 2571.69 samples/sec Loss 10.1292 LearningRate 0.0541 Epoch: 5 Global Step: 219160 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:20:09,546-Speed 2622.57 samples/sec Loss 9.9445 LearningRate 0.0541 Epoch: 5 Global Step: 219170 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:20:13,449-Speed 2624.49 samples/sec Loss 10.0377 LearningRate 0.0541 Epoch: 5 Global Step: 219180 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:20:17,356-Speed 2621.64 samples/sec Loss 10.1857 LearningRate 0.0541 Epoch: 5 Global Step: 219190 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:20:21,250-Speed 2630.12 samples/sec Loss 10.1430 LearningRate 0.0541 Epoch: 5 Global Step: 219200 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:20:25,155-Speed 2623.14 samples/sec Loss 9.9388 LearningRate 0.0541 Epoch: 5 Global Step: 219210 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:20:29,050-Speed 2629.47 samples/sec Loss 9.9866 LearningRate 0.0541 Epoch: 5 Global Step: 219220 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:20:32,948-Speed 2627.73 samples/sec Loss 9.9781 LearningRate 0.0541 Epoch: 5 Global Step: 219230 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:20:36,852-Speed 2623.75 samples/sec Loss 9.9573 LearningRate 0.0541 Epoch: 5 Global Step: 219240 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:20:40,736-Speed 2636.47 samples/sec Loss 9.9393 LearningRate 0.0541 Epoch: 5 Global Step: 219250 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:20:44,630-Speed 2630.69 samples/sec Loss 10.1807 LearningRate 0.0541 Epoch: 5 Global Step: 219260 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:20:48,521-Speed 2632.28 samples/sec Loss 9.9795 LearningRate 0.0541 Epoch: 5 Global Step: 219270 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:20:52,410-Speed 2633.56 samples/sec Loss 10.0123 LearningRate 0.0541 Epoch: 5 Global Step: 219280 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:20:56,302-Speed 2631.80 samples/sec Loss 10.0497 LearningRate 0.0541 Epoch: 5 Global Step: 219290 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:21:00,198-Speed 2629.47 samples/sec Loss 10.0614 LearningRate 0.0541 Epoch: 5 Global Step: 219300 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:21:04,091-Speed 2631.25 samples/sec Loss 9.9597 LearningRate 0.0541 Epoch: 5 Global Step: 219310 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:21:07,967-Speed 2641.87 samples/sec Loss 10.1479 LearningRate 0.0541 Epoch: 5 Global Step: 219320 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:21:11,857-Speed 2632.92 samples/sec Loss 9.9663 LearningRate 0.0541 Epoch: 5 Global Step: 219330 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:21:15,749-Speed 2631.77 samples/sec Loss 10.0250 LearningRate 0.0541 Epoch: 5 Global Step: 219340 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:21:19,641-Speed 2632.06 samples/sec Loss 9.9026 LearningRate 0.0541 Epoch: 5 Global Step: 219350 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:21:23,519-Speed 2640.88 samples/sec Loss 10.0417 LearningRate 0.0541 Epoch: 5 Global Step: 219360 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:21:27,403-Speed 2636.93 samples/sec Loss 10.0396 LearningRate 0.0541 Epoch: 5 Global Step: 219370 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:21:31,297-Speed 2630.99 samples/sec Loss 10.0664 LearningRate 0.0541 Epoch: 5 Global Step: 219380 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:21:35,247-Speed 2592.82 samples/sec Loss 9.9745 LearningRate 0.0541 Epoch: 5 Global Step: 219390 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:21:39,145-Speed 2627.64 samples/sec Loss 9.9969 LearningRate 0.0541 Epoch: 5 Global Step: 219400 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:21:43,036-Speed 2631.92 samples/sec Loss 9.9790 LearningRate 0.0541 Epoch: 5 Global Step: 219410 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:21:46,927-Speed 2632.77 samples/sec Loss 9.9665 LearningRate 0.0541 Epoch: 5 Global Step: 219420 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:21:50,817-Speed 2633.06 samples/sec Loss 10.0347 LearningRate 0.0541 Epoch: 5 Global Step: 219430 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:21:54,708-Speed 2632.60 samples/sec Loss 10.0895 LearningRate 0.0541 Epoch: 5 Global Step: 219440 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:21:58,602-Speed 2630.53 samples/sec Loss 10.0710 LearningRate 0.0541 Epoch: 5 Global Step: 219450 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:22:02,498-Speed 2628.70 samples/sec Loss 10.0501 LearningRate 0.0541 Epoch: 5 Global Step: 219460 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:22:06,395-Speed 2628.40 samples/sec Loss 10.1238 LearningRate 0.0541 Epoch: 5 Global Step: 219470 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:22:10,296-Speed 2625.73 samples/sec Loss 9.8773 LearningRate 0.0541 Epoch: 5 Global Step: 219480 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:22:14,210-Speed 2617.00 samples/sec Loss 9.9269 LearningRate 0.0541 Epoch: 5 Global Step: 219490 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:22:18,104-Speed 2630.24 samples/sec Loss 10.0223 LearningRate 0.0541 Epoch: 5 Global Step: 219500 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:22:22,002-Speed 2627.59 samples/sec Loss 10.1305 LearningRate 0.0541 Epoch: 5 Global Step: 219510 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:22:25,896-Speed 2630.23 samples/sec Loss 10.0255 LearningRate 0.0541 Epoch: 5 Global Step: 219520 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:22:29,810-Speed 2616.89 samples/sec Loss 10.0652 LearningRate 0.0541 Epoch: 5 Global Step: 219530 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:22:33,706-Speed 2628.80 samples/sec Loss 9.9454 LearningRate 0.0541 Epoch: 5 Global Step: 219540 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:22:37,611-Speed 2623.31 samples/sec Loss 10.0909 LearningRate 0.0541 Epoch: 5 Global Step: 219550 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:22:41,527-Speed 2615.05 samples/sec Loss 9.9847 LearningRate 0.0541 Epoch: 5 Global Step: 219560 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:22:45,435-Speed 2621.09 samples/sec Loss 10.0680 LearningRate 0.0541 Epoch: 5 Global Step: 219570 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:22:49,330-Speed 2629.53 samples/sec Loss 10.0363 LearningRate 0.0541 Epoch: 5 Global Step: 219580 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:22:53,222-Speed 2632.08 samples/sec Loss 10.0211 LearningRate 0.0541 Epoch: 5 Global Step: 219590 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:22:57,112-Speed 2632.85 samples/sec Loss 10.0274 LearningRate 0.0541 Epoch: 5 Global Step: 219600 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:23:01,002-Speed 2632.55 samples/sec Loss 10.0392 LearningRate 0.0541 Epoch: 5 Global Step: 219610 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:23:04,902-Speed 2626.98 samples/sec Loss 10.0463 LearningRate 0.0541 Epoch: 5 Global Step: 219620 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:23:08,793-Speed 2632.15 samples/sec Loss 10.0101 LearningRate 0.0541 Epoch: 5 Global Step: 219630 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:23:12,661-Speed 2647.95 samples/sec Loss 10.0421 LearningRate 0.0541 Epoch: 5 Global Step: 219640 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:23:16,566-Speed 2622.68 samples/sec Loss 9.9684 LearningRate 0.0541 Epoch: 5 Global Step: 219650 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:23:20,456-Speed 2633.30 samples/sec Loss 9.9565 LearningRate 0.0541 Epoch: 5 Global Step: 219660 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:23:24,350-Speed 2630.12 samples/sec Loss 10.0608 LearningRate 0.0541 Epoch: 5 Global Step: 219670 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:23:28,245-Speed 2629.53 samples/sec Loss 9.9962 LearningRate 0.0541 Epoch: 5 Global Step: 219680 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:23:32,137-Speed 2631.50 samples/sec Loss 10.0436 LearningRate 0.0540 Epoch: 5 Global Step: 219690 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:23:36,028-Speed 2632.69 samples/sec Loss 10.1261 LearningRate 0.0540 Epoch: 5 Global Step: 219700 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:23:39,921-Speed 2630.91 samples/sec Loss 10.0081 LearningRate 0.0540 Epoch: 5 Global Step: 219710 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:23:43,814-Speed 2631.19 samples/sec Loss 10.0462 LearningRate 0.0540 Epoch: 5 Global Step: 219720 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:23:47,722-Speed 2620.77 samples/sec Loss 10.0509 LearningRate 0.0540 Epoch: 5 Global Step: 219730 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:23:51,632-Speed 2619.49 samples/sec Loss 10.0479 LearningRate 0.0540 Epoch: 5 Global Step: 219740 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:23:55,541-Speed 2620.08 samples/sec Loss 9.9296 LearningRate 0.0540 Epoch: 5 Global Step: 219750 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:23:59,463-Speed 2611.62 samples/sec Loss 9.8947 LearningRate 0.0540 Epoch: 5 Global Step: 219760 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:24:03,371-Speed 2621.50 samples/sec Loss 10.0902 LearningRate 0.0540 Epoch: 5 Global Step: 219770 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:24:07,275-Speed 2623.16 samples/sec Loss 10.1367 LearningRate 0.0540 Epoch: 5 Global Step: 219780 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:24:11,171-Speed 2628.97 samples/sec Loss 10.0022 LearningRate 0.0540 Epoch: 5 Global Step: 219790 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:24:15,091-Speed 2613.26 samples/sec Loss 10.0311 LearningRate 0.0540 Epoch: 5 Global Step: 219800 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:24:18,989-Speed 2627.79 samples/sec Loss 9.9355 LearningRate 0.0540 Epoch: 5 Global Step: 219810 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:24:22,919-Speed 2606.44 samples/sec Loss 9.9337 LearningRate 0.0540 Epoch: 5 Global Step: 219820 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:24:26,810-Speed 2632.83 samples/sec Loss 10.0110 LearningRate 0.0540 Epoch: 5 Global Step: 219830 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:24:30,701-Speed 2632.02 samples/sec Loss 9.9141 LearningRate 0.0540 Epoch: 5 Global Step: 219840 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:24:34,594-Speed 2630.86 samples/sec Loss 10.1088 LearningRate 0.0540 Epoch: 5 Global Step: 219850 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:24:38,485-Speed 2632.26 samples/sec Loss 9.9927 LearningRate 0.0540 Epoch: 5 Global Step: 219860 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:24:42,369-Speed 2637.47 samples/sec Loss 9.9011 LearningRate 0.0540 Epoch: 5 Global Step: 219870 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:24:46,268-Speed 2626.75 samples/sec Loss 9.9875 LearningRate 0.0540 Epoch: 5 Global Step: 219880 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:24:50,169-Speed 2626.04 samples/sec Loss 9.8731 LearningRate 0.0540 Epoch: 5 Global Step: 219890 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:24:54,074-Speed 2622.71 samples/sec Loss 10.0153 LearningRate 0.0540 Epoch: 5 Global Step: 219900 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:24:57,976-Speed 2625.57 samples/sec Loss 10.1247 LearningRate 0.0540 Epoch: 5 Global Step: 219910 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:25:01,877-Speed 2625.61 samples/sec Loss 9.9480 LearningRate 0.0540 Epoch: 5 Global Step: 219920 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:25:05,774-Speed 2627.60 samples/sec Loss 10.0217 LearningRate 0.0540 Epoch: 5 Global Step: 219930 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:25:09,676-Speed 2625.31 samples/sec Loss 10.0062 LearningRate 0.0540 Epoch: 5 Global Step: 219940 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:25:13,565-Speed 2633.38 samples/sec Loss 10.0309 LearningRate 0.0540 Epoch: 5 Global Step: 219950 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:25:17,454-Speed 2633.62 samples/sec Loss 10.1350 LearningRate 0.0540 Epoch: 5 Global Step: 219960 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:25:21,353-Speed 2627.76 samples/sec Loss 9.8697 LearningRate 0.0540 Epoch: 5 Global Step: 219970 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:25:25,243-Speed 2633.08 samples/sec Loss 10.0525 LearningRate 0.0540 Epoch: 5 Global Step: 219980 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:25:29,147-Speed 2623.71 samples/sec Loss 9.9963 LearningRate 0.0540 Epoch: 5 Global Step: 219990 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:25:33,051-Speed 2623.29 samples/sec Loss 9.9675 LearningRate 0.0540 Epoch: 5 Global Step: 220000 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:26:16,230-[lfw][220000]XNorm: 23.200060
Training: 2022-04-13 20:26:16,231-[lfw][220000]Accuracy-Flip: 0.99733+-0.00291
Training: 2022-04-13 20:26:16,232-[lfw][220000]Accuracy-Highest: 0.99783
Training: 2022-04-13 20:27:06,276-[cfp_fp][220000]XNorm: 21.186069
Training: 2022-04-13 20:27:06,277-[cfp_fp][220000]Accuracy-Flip: 0.98314+-0.00635
Training: 2022-04-13 20:27:06,278-[cfp_fp][220000]Accuracy-Highest: 0.98314
Training: 2022-04-13 20:27:49,348-[agedb_30][220000]XNorm: 22.932315
Training: 2022-04-13 20:27:49,349-[agedb_30][220000]Accuracy-Flip: 0.97133+-0.00653
Training: 2022-04-13 20:27:49,349-[agedb_30][220000]Accuracy-Highest: 0.97150
Training: 2022-04-13 20:27:53,241-Speed 73.04 samples/sec Loss 10.0306 LearningRate 0.0540 Epoch: 5 Global Step: 220010 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:27:57,124-Speed 2637.24 samples/sec Loss 9.9739 LearningRate 0.0540 Epoch: 5 Global Step: 220020 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:28:01,049-Speed 2610.26 samples/sec Loss 10.1418 LearningRate 0.0540 Epoch: 5 Global Step: 220030 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:28:04,965-Speed 2615.20 samples/sec Loss 9.9842 LearningRate 0.0540 Epoch: 5 Global Step: 220040 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:28:08,838-Speed 2644.69 samples/sec Loss 10.0220 LearningRate 0.0540 Epoch: 5 Global Step: 220050 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:28:12,842-Speed 2558.22 samples/sec Loss 9.9369 LearningRate 0.0540 Epoch: 5 Global Step: 220060 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:28:16,822-Speed 2574.39 samples/sec Loss 10.1259 LearningRate 0.0540 Epoch: 5 Global Step: 220070 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:28:20,712-Speed 2633.18 samples/sec Loss 9.8278 LearningRate 0.0540 Epoch: 5 Global Step: 220080 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:28:24,602-Speed 2633.24 samples/sec Loss 10.1270 LearningRate 0.0540 Epoch: 5 Global Step: 220090 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:28:28,487-Speed 2636.40 samples/sec Loss 9.8763 LearningRate 0.0540 Epoch: 5 Global Step: 220100 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:28:32,370-Speed 2638.58 samples/sec Loss 10.0805 LearningRate 0.0540 Epoch: 5 Global Step: 220110 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:28:36,265-Speed 2629.07 samples/sec Loss 10.0684 LearningRate 0.0540 Epoch: 5 Global Step: 220120 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:28:40,165-Speed 2626.84 samples/sec Loss 9.9581 LearningRate 0.0540 Epoch: 5 Global Step: 220130 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:28:44,053-Speed 2634.11 samples/sec Loss 10.0655 LearningRate 0.0540 Epoch: 5 Global Step: 220140 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:28:47,944-Speed 2632.16 samples/sec Loss 9.8805 LearningRate 0.0540 Epoch: 5 Global Step: 220150 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:28:51,837-Speed 2631.33 samples/sec Loss 9.9495 LearningRate 0.0540 Epoch: 5 Global Step: 220160 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:28:55,739-Speed 2624.68 samples/sec Loss 9.7935 LearningRate 0.0540 Epoch: 5 Global Step: 220170 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:28:59,647-Speed 2620.86 samples/sec Loss 9.9669 LearningRate 0.0540 Epoch: 5 Global Step: 220180 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:29:03,548-Speed 2626.10 samples/sec Loss 9.9065 LearningRate 0.0540 Epoch: 5 Global Step: 220190 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:29:07,440-Speed 2631.68 samples/sec Loss 10.0515 LearningRate 0.0540 Epoch: 5 Global Step: 220200 Fp16 Grad Scale: 131072 Required: 69 hours
Training: 2022-04-13 20:29:11,343-Speed 2623.81 samples/sec Loss 10.0182 LearningRate 0.0540 Epoch: 5 Global Step: 220210 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:29:15,238-Speed 2629.97 samples/sec Loss 10.0195 LearningRate 0.0540 Epoch: 5 Global Step: 220220 Fp16 Grad Scale: 262144 Required: 69 hours
Training: 2022-04-13 20:29:19,136-Speed 2627.35 samples/sec Loss 9.8854 LearningRate 0.0540 Epoch: 5 Global Step: 220230 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:29:23,026-Speed 2632.97 samples/sec Loss 9.9001 LearningRate 0.0540 Epoch: 5 Global Step: 220240 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:29:26,928-Speed 2625.51 samples/sec Loss 10.0019 LearningRate 0.0539 Epoch: 5 Global Step: 220250 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:29:30,822-Speed 2630.79 samples/sec Loss 10.0835 LearningRate 0.0539 Epoch: 5 Global Step: 220260 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:29:34,719-Speed 2628.76 samples/sec Loss 9.9621 LearningRate 0.0539 Epoch: 5 Global Step: 220270 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:29:38,641-Speed 2611.14 samples/sec Loss 9.9796 LearningRate 0.0539 Epoch: 5 Global Step: 220280 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:29:42,537-Speed 2629.41 samples/sec Loss 9.9820 LearningRate 0.0539 Epoch: 5 Global Step: 220290 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:29:46,436-Speed 2626.58 samples/sec Loss 9.8204 LearningRate 0.0539 Epoch: 5 Global Step: 220300 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:29:50,357-Speed 2612.51 samples/sec Loss 9.9489 LearningRate 0.0539 Epoch: 5 Global Step: 220310 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:29:54,251-Speed 2630.68 samples/sec Loss 10.0093 LearningRate 0.0539 Epoch: 5 Global Step: 220320 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:29:58,143-Speed 2631.69 samples/sec Loss 9.9711 LearningRate 0.0539 Epoch: 5 Global Step: 220330 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:30:02,056-Speed 2617.95 samples/sec Loss 10.0277 LearningRate 0.0539 Epoch: 5 Global Step: 220340 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:30:05,950-Speed 2630.23 samples/sec Loss 10.0085 LearningRate 0.0539 Epoch: 5 Global Step: 220350 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:30:09,859-Speed 2619.94 samples/sec Loss 10.0123 LearningRate 0.0539 Epoch: 5 Global Step: 220360 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:30:13,749-Speed 2633.11 samples/sec Loss 9.9321 LearningRate 0.0539 Epoch: 5 Global Step: 220370 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:30:17,644-Speed 2629.80 samples/sec Loss 10.0915 LearningRate 0.0539 Epoch: 5 Global Step: 220380 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:30:21,547-Speed 2624.56 samples/sec Loss 10.1267 LearningRate 0.0539 Epoch: 5 Global Step: 220390 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:30:25,444-Speed 2628.29 samples/sec Loss 10.0198 LearningRate 0.0539 Epoch: 5 Global Step: 220400 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:30:29,340-Speed 2629.32 samples/sec Loss 10.0362 LearningRate 0.0539 Epoch: 5 Global Step: 220410 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:30:33,237-Speed 2628.65 samples/sec Loss 9.9091 LearningRate 0.0539 Epoch: 5 Global Step: 220420 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:30:37,128-Speed 2631.91 samples/sec Loss 9.8926 LearningRate 0.0539 Epoch: 5 Global Step: 220430 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:30:41,020-Speed 2632.17 samples/sec Loss 10.0105 LearningRate 0.0539 Epoch: 5 Global Step: 220440 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:30:44,892-Speed 2644.56 samples/sec Loss 10.1809 LearningRate 0.0539 Epoch: 5 Global Step: 220450 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:30:48,785-Speed 2631.68 samples/sec Loss 10.0105 LearningRate 0.0539 Epoch: 5 Global Step: 220460 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:30:52,678-Speed 2630.73 samples/sec Loss 9.9888 LearningRate 0.0539 Epoch: 5 Global Step: 220470 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:30:56,571-Speed 2631.20 samples/sec Loss 10.1567 LearningRate 0.0539 Epoch: 5 Global Step: 220480 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:31:00,465-Speed 2630.27 samples/sec Loss 10.0783 LearningRate 0.0539 Epoch: 5 Global Step: 220490 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:31:04,393-Speed 2607.58 samples/sec Loss 10.0672 LearningRate 0.0539 Epoch: 5 Global Step: 220500 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:31:08,289-Speed 2629.65 samples/sec Loss 10.0104 LearningRate 0.0539 Epoch: 5 Global Step: 220510 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:31:12,208-Speed 2613.54 samples/sec Loss 9.9724 LearningRate 0.0539 Epoch: 5 Global Step: 220520 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:31:16,110-Speed 2624.63 samples/sec Loss 9.8584 LearningRate 0.0539 Epoch: 5 Global Step: 220530 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:31:20,013-Speed 2624.24 samples/sec Loss 9.9497 LearningRate 0.0539 Epoch: 5 Global Step: 220540 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:31:23,914-Speed 2625.47 samples/sec Loss 10.0391 LearningRate 0.0539 Epoch: 5 Global Step: 220550 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:31:27,813-Speed 2626.71 samples/sec Loss 9.9497 LearningRate 0.0539 Epoch: 5 Global Step: 220560 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:31:31,691-Speed 2642.27 samples/sec Loss 9.8686 LearningRate 0.0539 Epoch: 5 Global Step: 220570 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:31:35,547-Speed 2655.94 samples/sec Loss 9.9410 LearningRate 0.0539 Epoch: 5 Global Step: 220580 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:31:39,442-Speed 2629.23 samples/sec Loss 10.2647 LearningRate 0.0539 Epoch: 5 Global Step: 220590 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:31:43,341-Speed 2626.76 samples/sec Loss 10.2443 LearningRate 0.0539 Epoch: 5 Global Step: 220600 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:31:47,236-Speed 2629.84 samples/sec Loss 9.9690 LearningRate 0.0539 Epoch: 5 Global Step: 220610 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:31:51,136-Speed 2626.02 samples/sec Loss 9.9935 LearningRate 0.0539 Epoch: 5 Global Step: 220620 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:31:55,047-Speed 2619.00 samples/sec Loss 10.0623 LearningRate 0.0539 Epoch: 5 Global Step: 220630 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:31:58,933-Speed 2635.53 samples/sec Loss 10.0607 LearningRate 0.0539 Epoch: 5 Global Step: 220640 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:32:02,843-Speed 2620.19 samples/sec Loss 10.0129 LearningRate 0.0539 Epoch: 5 Global Step: 220650 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:32:06,734-Speed 2632.42 samples/sec Loss 9.8744 LearningRate 0.0539 Epoch: 5 Global Step: 220660 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:32:10,662-Speed 2607.11 samples/sec Loss 9.9987 LearningRate 0.0539 Epoch: 5 Global Step: 220670 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:32:14,554-Speed 2631.71 samples/sec Loss 10.0584 LearningRate 0.0539 Epoch: 5 Global Step: 220680 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:32:18,443-Speed 2634.11 samples/sec Loss 9.9873 LearningRate 0.0539 Epoch: 5 Global Step: 220690 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:32:22,333-Speed 2633.09 samples/sec Loss 10.0505 LearningRate 0.0539 Epoch: 5 Global Step: 220700 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:32:26,225-Speed 2631.34 samples/sec Loss 9.9631 LearningRate 0.0539 Epoch: 5 Global Step: 220710 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:32:30,124-Speed 2627.21 samples/sec Loss 10.0928 LearningRate 0.0539 Epoch: 5 Global Step: 220720 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:32:34,040-Speed 2616.12 samples/sec Loss 10.0408 LearningRate 0.0539 Epoch: 5 Global Step: 220730 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:32:37,938-Speed 2627.43 samples/sec Loss 9.9417 LearningRate 0.0539 Epoch: 5 Global Step: 220740 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:32:41,838-Speed 2626.46 samples/sec Loss 10.1104 LearningRate 0.0539 Epoch: 5 Global Step: 220750 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:32:45,740-Speed 2624.71 samples/sec Loss 9.9397 LearningRate 0.0539 Epoch: 5 Global Step: 220760 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:32:49,647-Speed 2621.83 samples/sec Loss 10.0041 LearningRate 0.0539 Epoch: 5 Global Step: 220770 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:32:53,556-Speed 2619.56 samples/sec Loss 9.9856 LearningRate 0.0539 Epoch: 5 Global Step: 220780 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:32:57,467-Speed 2619.50 samples/sec Loss 10.0469 LearningRate 0.0539 Epoch: 5 Global Step: 220790 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:33:01,361-Speed 2630.61 samples/sec Loss 10.0396 LearningRate 0.0539 Epoch: 5 Global Step: 220800 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:33:05,255-Speed 2630.43 samples/sec Loss 10.0961 LearningRate 0.0539 Epoch: 5 Global Step: 220810 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:33:09,149-Speed 2630.13 samples/sec Loss 9.9185 LearningRate 0.0538 Epoch: 5 Global Step: 220820 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:33:13,046-Speed 2628.37 samples/sec Loss 9.9684 LearningRate 0.0538 Epoch: 5 Global Step: 220830 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:33:16,942-Speed 2628.87 samples/sec Loss 10.0556 LearningRate 0.0538 Epoch: 5 Global Step: 220840 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:33:20,839-Speed 2628.70 samples/sec Loss 9.8995 LearningRate 0.0538 Epoch: 5 Global Step: 220850 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:33:24,732-Speed 2630.45 samples/sec Loss 9.9357 LearningRate 0.0538 Epoch: 5 Global Step: 220860 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:33:28,644-Speed 2618.68 samples/sec Loss 10.0277 LearningRate 0.0538 Epoch: 5 Global Step: 220870 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:33:32,534-Speed 2632.47 samples/sec Loss 10.0318 LearningRate 0.0538 Epoch: 5 Global Step: 220880 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:33:36,437-Speed 2625.32 samples/sec Loss 9.8649 LearningRate 0.0538 Epoch: 5 Global Step: 220890 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:33:40,334-Speed 2628.01 samples/sec Loss 10.0395 LearningRate 0.0538 Epoch: 5 Global Step: 220900 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:33:44,226-Speed 2631.89 samples/sec Loss 9.9568 LearningRate 0.0538 Epoch: 5 Global Step: 220910 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:33:48,123-Speed 2628.20 samples/sec Loss 9.8580 LearningRate 0.0538 Epoch: 5 Global Step: 220920 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:33:52,016-Speed 2631.05 samples/sec Loss 10.0262 LearningRate 0.0538 Epoch: 5 Global Step: 220930 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:33:55,888-Speed 2645.19 samples/sec Loss 10.9438 LearningRate 0.0538 Epoch: 5 Global Step: 220940 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:33:59,801-Speed 2617.70 samples/sec Loss 10.4241 LearningRate 0.0538 Epoch: 5 Global Step: 220950 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:34:03,711-Speed 2619.17 samples/sec Loss 10.2572 LearningRate 0.0538 Epoch: 5 Global Step: 220960 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:34:07,655-Speed 2597.34 samples/sec Loss 10.0677 LearningRate 0.0538 Epoch: 5 Global Step: 220970 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:34:11,548-Speed 2631.45 samples/sec Loss 10.1055 LearningRate 0.0538 Epoch: 5 Global Step: 220980 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:34:15,489-Speed 2598.95 samples/sec Loss 10.0243 LearningRate 0.0538 Epoch: 5 Global Step: 220990 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:34:19,386-Speed 2628.66 samples/sec Loss 10.2253 LearningRate 0.0538 Epoch: 5 Global Step: 221000 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:34:23,328-Speed 2598.24 samples/sec Loss 9.8980 LearningRate 0.0538 Epoch: 5 Global Step: 221010 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:34:27,251-Speed 2610.72 samples/sec Loss 9.9987 LearningRate 0.0538 Epoch: 5 Global Step: 221020 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:34:31,154-Speed 2624.39 samples/sec Loss 10.0776 LearningRate 0.0538 Epoch: 5 Global Step: 221030 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:34:35,045-Speed 2632.56 samples/sec Loss 10.1081 LearningRate 0.0538 Epoch: 5 Global Step: 221040 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:34:39,049-Speed 2557.73 samples/sec Loss 10.0113 LearningRate 0.0538 Epoch: 5 Global Step: 221050 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:34:42,960-Speed 2618.91 samples/sec Loss 9.9864 LearningRate 0.0538 Epoch: 5 Global Step: 221060 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:34:46,856-Speed 2629.34 samples/sec Loss 9.7910 LearningRate 0.0538 Epoch: 5 Global Step: 221070 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:34:50,758-Speed 2624.96 samples/sec Loss 9.9383 LearningRate 0.0538 Epoch: 5 Global Step: 221080 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:34:54,651-Speed 2631.56 samples/sec Loss 9.9549 LearningRate 0.0538 Epoch: 5 Global Step: 221090 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:34:58,546-Speed 2629.27 samples/sec Loss 9.8907 LearningRate 0.0538 Epoch: 5 Global Step: 221100 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:35:02,467-Speed 2611.70 samples/sec Loss 9.9447 LearningRate 0.0538 Epoch: 5 Global Step: 221110 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:35:06,397-Speed 2606.70 samples/sec Loss 9.8893 LearningRate 0.0538 Epoch: 5 Global Step: 221120 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:35:10,303-Speed 2622.39 samples/sec Loss 9.9943 LearningRate 0.0538 Epoch: 5 Global Step: 221130 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:35:14,217-Speed 2617.03 samples/sec Loss 10.0138 LearningRate 0.0538 Epoch: 5 Global Step: 221140 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:35:18,123-Speed 2621.88 samples/sec Loss 10.0316 LearningRate 0.0538 Epoch: 5 Global Step: 221150 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:35:22,024-Speed 2626.33 samples/sec Loss 9.8782 LearningRate 0.0538 Epoch: 5 Global Step: 221160 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:35:25,926-Speed 2624.89 samples/sec Loss 9.8930 LearningRate 0.0538 Epoch: 5 Global Step: 221170 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:35:29,870-Speed 2597.16 samples/sec Loss 9.8945 LearningRate 0.0538 Epoch: 5 Global Step: 221180 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:35:33,768-Speed 2628.02 samples/sec Loss 9.9212 LearningRate 0.0538 Epoch: 5 Global Step: 221190 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:35:37,688-Speed 2612.30 samples/sec Loss 9.9390 LearningRate 0.0538 Epoch: 5 Global Step: 221200 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:35:41,593-Speed 2622.96 samples/sec Loss 9.9658 LearningRate 0.0538 Epoch: 5 Global Step: 221210 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:35:45,493-Speed 2626.61 samples/sec Loss 9.9981 LearningRate 0.0538 Epoch: 5 Global Step: 221220 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:35:49,392-Speed 2627.13 samples/sec Loss 9.9202 LearningRate 0.0538 Epoch: 5 Global Step: 221230 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:35:53,294-Speed 2625.14 samples/sec Loss 9.9435 LearningRate 0.0538 Epoch: 5 Global Step: 221240 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:35:57,191-Speed 2628.47 samples/sec Loss 10.0340 LearningRate 0.0538 Epoch: 5 Global Step: 221250 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:36:01,093-Speed 2625.02 samples/sec Loss 9.9066 LearningRate 0.0538 Epoch: 5 Global Step: 221260 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:36:05,003-Speed 2619.70 samples/sec Loss 10.0100 LearningRate 0.0538 Epoch: 5 Global Step: 221270 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:36:08,906-Speed 2624.00 samples/sec Loss 10.0515 LearningRate 0.0538 Epoch: 5 Global Step: 221280 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:36:12,809-Speed 2624.28 samples/sec Loss 10.0242 LearningRate 0.0538 Epoch: 5 Global Step: 221290 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:36:16,695-Speed 2635.49 samples/sec Loss 10.1204 LearningRate 0.0538 Epoch: 5 Global Step: 221300 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:36:20,599-Speed 2624.56 samples/sec Loss 10.1266 LearningRate 0.0538 Epoch: 5 Global Step: 221310 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:36:24,495-Speed 2628.68 samples/sec Loss 9.9080 LearningRate 0.0538 Epoch: 5 Global Step: 221320 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:36:28,407-Speed 2618.24 samples/sec Loss 9.9866 LearningRate 0.0538 Epoch: 5 Global Step: 221330 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:36:32,313-Speed 2622.37 samples/sec Loss 9.7967 LearningRate 0.0538 Epoch: 5 Global Step: 221340 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:36:36,212-Speed 2626.85 samples/sec Loss 10.0145 LearningRate 0.0538 Epoch: 5 Global Step: 221350 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:36:40,106-Speed 2630.17 samples/sec Loss 10.0062 LearningRate 0.0538 Epoch: 5 Global Step: 221360 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:36:44,037-Speed 2606.10 samples/sec Loss 9.7928 LearningRate 0.0538 Epoch: 5 Global Step: 221370 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:36:47,946-Speed 2620.55 samples/sec Loss 9.8781 LearningRate 0.0537 Epoch: 5 Global Step: 221380 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:36:51,913-Speed 2582.32 samples/sec Loss 9.9520 LearningRate 0.0537 Epoch: 5 Global Step: 221390 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:36:55,837-Speed 2610.04 samples/sec Loss 9.8845 LearningRate 0.0537 Epoch: 5 Global Step: 221400 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:36:59,734-Speed 2628.38 samples/sec Loss 9.9165 LearningRate 0.0537 Epoch: 5 Global Step: 221410 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:37:03,626-Speed 2631.45 samples/sec Loss 9.9415 LearningRate 0.0537 Epoch: 5 Global Step: 221420 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:37:07,522-Speed 2629.27 samples/sec Loss 10.0071 LearningRate 0.0537 Epoch: 5 Global Step: 221430 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:37:11,414-Speed 2631.48 samples/sec Loss 9.9442 LearningRate 0.0537 Epoch: 5 Global Step: 221440 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:37:15,314-Speed 2626.18 samples/sec Loss 10.0824 LearningRate 0.0537 Epoch: 5 Global Step: 221450 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:37:19,234-Speed 2613.82 samples/sec Loss 9.9524 LearningRate 0.0537 Epoch: 5 Global Step: 221460 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:37:23,143-Speed 2619.63 samples/sec Loss 10.0173 LearningRate 0.0537 Epoch: 5 Global Step: 221470 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:37:27,065-Speed 2612.17 samples/sec Loss 9.9579 LearningRate 0.0537 Epoch: 5 Global Step: 221480 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:37:30,965-Speed 2626.27 samples/sec Loss 10.0224 LearningRate 0.0537 Epoch: 5 Global Step: 221490 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:37:34,862-Speed 2628.18 samples/sec Loss 10.0503 LearningRate 0.0537 Epoch: 5 Global Step: 221500 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:37:38,752-Speed 2632.77 samples/sec Loss 9.8846 LearningRate 0.0537 Epoch: 5 Global Step: 221510 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:37:42,644-Speed 2631.95 samples/sec Loss 9.9804 LearningRate 0.0537 Epoch: 5 Global Step: 221520 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:37:46,542-Speed 2627.31 samples/sec Loss 10.1132 LearningRate 0.0537 Epoch: 5 Global Step: 221530 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:37:50,436-Speed 2630.66 samples/sec Loss 9.9822 LearningRate 0.0537 Epoch: 5 Global Step: 221540 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:37:54,335-Speed 2626.85 samples/sec Loss 10.0538 LearningRate 0.0537 Epoch: 5 Global Step: 221550 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:37:58,229-Speed 2630.86 samples/sec Loss 9.9929 LearningRate 0.0537 Epoch: 5 Global Step: 221560 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:38:02,127-Speed 2627.49 samples/sec Loss 9.9637 LearningRate 0.0537 Epoch: 5 Global Step: 221570 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:38:06,020-Speed 2631.08 samples/sec Loss 9.9839 LearningRate 0.0537 Epoch: 5 Global Step: 221580 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:38:09,925-Speed 2622.64 samples/sec Loss 9.9822 LearningRate 0.0537 Epoch: 5 Global Step: 221590 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:38:13,816-Speed 2632.28 samples/sec Loss 9.9101 LearningRate 0.0537 Epoch: 5 Global Step: 221600 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:38:17,713-Speed 2628.62 samples/sec Loss 9.8222 LearningRate 0.0537 Epoch: 5 Global Step: 221610 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:38:21,642-Speed 2606.58 samples/sec Loss 10.0665 LearningRate 0.0537 Epoch: 5 Global Step: 221620 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:38:25,539-Speed 2628.85 samples/sec Loss 9.9446 LearningRate 0.0537 Epoch: 5 Global Step: 221630 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:38:29,430-Speed 2632.65 samples/sec Loss 10.0640 LearningRate 0.0537 Epoch: 5 Global Step: 221640 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:38:33,322-Speed 2631.87 samples/sec Loss 9.9477 LearningRate 0.0537 Epoch: 5 Global Step: 221650 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:38:37,216-Speed 2630.51 samples/sec Loss 9.7971 LearningRate 0.0537 Epoch: 5 Global Step: 221660 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:38:41,101-Speed 2636.34 samples/sec Loss 9.9805 LearningRate 0.0537 Epoch: 5 Global Step: 221670 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:38:45,020-Speed 2613.72 samples/sec Loss 9.9391 LearningRate 0.0537 Epoch: 5 Global Step: 221680 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:38:48,925-Speed 2622.90 samples/sec Loss 9.9267 LearningRate 0.0537 Epoch: 5 Global Step: 221690 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:38:52,825-Speed 2626.09 samples/sec Loss 9.9418 LearningRate 0.0537 Epoch: 5 Global Step: 221700 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:38:56,716-Speed 2632.54 samples/sec Loss 10.0152 LearningRate 0.0537 Epoch: 5 Global Step: 221710 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:39:00,607-Speed 2632.68 samples/sec Loss 9.9849 LearningRate 0.0537 Epoch: 5 Global Step: 221720 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:39:04,506-Speed 2627.00 samples/sec Loss 9.8714 LearningRate 0.0537 Epoch: 5 Global Step: 221730 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:39:08,406-Speed 2626.33 samples/sec Loss 10.0591 LearningRate 0.0537 Epoch: 5 Global Step: 221740 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:39:12,326-Speed 2613.04 samples/sec Loss 9.9576 LearningRate 0.0537 Epoch: 5 Global Step: 221750 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:39:16,252-Speed 2608.97 samples/sec Loss 10.0310 LearningRate 0.0537 Epoch: 5 Global Step: 221760 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:39:20,159-Speed 2621.68 samples/sec Loss 9.9116 LearningRate 0.0537 Epoch: 5 Global Step: 221770 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:39:24,062-Speed 2624.19 samples/sec Loss 9.9616 LearningRate 0.0537 Epoch: 5 Global Step: 221780 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:39:27,959-Speed 2628.33 samples/sec Loss 9.8618 LearningRate 0.0537 Epoch: 5 Global Step: 221790 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:39:31,871-Speed 2617.85 samples/sec Loss 9.9557 LearningRate 0.0537 Epoch: 5 Global Step: 221800 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:39:35,774-Speed 2624.99 samples/sec Loss 10.1888 LearningRate 0.0537 Epoch: 5 Global Step: 221810 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:39:39,657-Speed 2637.74 samples/sec Loss 9.9377 LearningRate 0.0537 Epoch: 5 Global Step: 221820 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:39:43,544-Speed 2635.55 samples/sec Loss 10.0117 LearningRate 0.0537 Epoch: 5 Global Step: 221830 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:39:47,437-Speed 2630.55 samples/sec Loss 10.0307 LearningRate 0.0537 Epoch: 5 Global Step: 221840 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:39:51,334-Speed 2627.80 samples/sec Loss 10.1571 LearningRate 0.0537 Epoch: 5 Global Step: 221850 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:39:55,260-Speed 2608.92 samples/sec Loss 9.9051 LearningRate 0.0537 Epoch: 5 Global Step: 221860 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:39:59,162-Speed 2624.83 samples/sec Loss 9.9544 LearningRate 0.0537 Epoch: 5 Global Step: 221870 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:40:03,227-Speed 2519.57 samples/sec Loss 10.1159 LearningRate 0.0537 Epoch: 5 Global Step: 221880 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:40:07,318-Speed 2504.03 samples/sec Loss 10.0403 LearningRate 0.0537 Epoch: 5 Global Step: 221890 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:40:11,409-Speed 2503.82 samples/sec Loss 9.7826 LearningRate 0.0537 Epoch: 5 Global Step: 221900 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:40:15,349-Speed 2599.95 samples/sec Loss 9.9961 LearningRate 0.0537 Epoch: 5 Global Step: 221910 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:40:19,244-Speed 2629.42 samples/sec Loss 10.1118 LearningRate 0.0537 Epoch: 5 Global Step: 221920 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:40:23,166-Speed 2611.67 samples/sec Loss 9.9938 LearningRate 0.0537 Epoch: 5 Global Step: 221930 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:40:27,080-Speed 2617.00 samples/sec Loss 9.8600 LearningRate 0.0537 Epoch: 5 Global Step: 221940 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:40:30,972-Speed 2631.62 samples/sec Loss 10.0447 LearningRate 0.0536 Epoch: 5 Global Step: 221950 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:40:34,866-Speed 2630.86 samples/sec Loss 10.0652 LearningRate 0.0536 Epoch: 5 Global Step: 221960 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:40:38,761-Speed 2629.72 samples/sec Loss 9.9364 LearningRate 0.0536 Epoch: 5 Global Step: 221970 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:40:42,655-Speed 2629.94 samples/sec Loss 9.8763 LearningRate 0.0536 Epoch: 5 Global Step: 221980 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:40:46,551-Speed 2629.26 samples/sec Loss 9.9005 LearningRate 0.0536 Epoch: 5 Global Step: 221990 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:40:50,448-Speed 2628.27 samples/sec Loss 9.9638 LearningRate 0.0536 Epoch: 5 Global Step: 222000 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:40:54,342-Speed 2630.64 samples/sec Loss 10.0484 LearningRate 0.0536 Epoch: 5 Global Step: 222010 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:40:58,218-Speed 2642.65 samples/sec Loss 9.9676 LearningRate 0.0536 Epoch: 5 Global Step: 222020 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:41:02,115-Speed 2628.25 samples/sec Loss 9.9068 LearningRate 0.0536 Epoch: 5 Global Step: 222030 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:41:06,010-Speed 2629.85 samples/sec Loss 10.1066 LearningRate 0.0536 Epoch: 5 Global Step: 222040 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:41:09,907-Speed 2628.78 samples/sec Loss 10.0034 LearningRate 0.0536 Epoch: 5 Global Step: 222050 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:41:13,807-Speed 2625.99 samples/sec Loss 9.9523 LearningRate 0.0536 Epoch: 5 Global Step: 222060 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:41:17,709-Speed 2624.37 samples/sec Loss 9.9761 LearningRate 0.0536 Epoch: 5 Global Step: 222070 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:41:21,608-Speed 2627.50 samples/sec Loss 9.9762 LearningRate 0.0536 Epoch: 5 Global Step: 222080 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:41:25,505-Speed 2628.15 samples/sec Loss 9.8709 LearningRate 0.0536 Epoch: 5 Global Step: 222090 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:41:29,402-Speed 2628.68 samples/sec Loss 10.0299 LearningRate 0.0536 Epoch: 5 Global Step: 222100 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:41:33,294-Speed 2631.75 samples/sec Loss 10.0269 LearningRate 0.0536 Epoch: 5 Global Step: 222110 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:41:37,188-Speed 2630.20 samples/sec Loss 10.0309 LearningRate 0.0536 Epoch: 5 Global Step: 222120 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:41:41,087-Speed 2626.50 samples/sec Loss 10.0736 LearningRate 0.0536 Epoch: 5 Global Step: 222130 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:41:45,070-Speed 2571.94 samples/sec Loss 10.0035 LearningRate 0.0536 Epoch: 5 Global Step: 222140 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:41:48,983-Speed 2617.59 samples/sec Loss 9.8634 LearningRate 0.0536 Epoch: 5 Global Step: 222150 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:41:52,886-Speed 2624.79 samples/sec Loss 9.9059 LearningRate 0.0536 Epoch: 5 Global Step: 222160 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:41:56,820-Speed 2603.45 samples/sec Loss 9.9098 LearningRate 0.0536 Epoch: 5 Global Step: 222170 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:42:00,729-Speed 2620.24 samples/sec Loss 9.9441 LearningRate 0.0536 Epoch: 5 Global Step: 222180 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:42:04,627-Speed 2627.69 samples/sec Loss 10.0259 LearningRate 0.0536 Epoch: 5 Global Step: 222190 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:42:08,536-Speed 2620.49 samples/sec Loss 9.8892 LearningRate 0.0536 Epoch: 5 Global Step: 222200 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:42:12,439-Speed 2623.88 samples/sec Loss 10.0029 LearningRate 0.0536 Epoch: 5 Global Step: 222210 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:42:16,323-Speed 2637.11 samples/sec Loss 10.0500 LearningRate 0.0536 Epoch: 5 Global Step: 222220 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:42:20,159-Speed 2670.64 samples/sec Loss 10.3783 LearningRate 0.0536 Epoch: 5 Global Step: 222230 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:42:24,032-Speed 2644.54 samples/sec Loss 10.0954 LearningRate 0.0536 Epoch: 5 Global Step: 222240 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 20:42:27,924-Speed 2632.01 samples/sec Loss 9.9727 LearningRate 0.0536 Epoch: 5 Global Step: 222250 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 20:42:31,817-Speed 2631.08 samples/sec Loss 10.1716 LearningRate 0.0536 Epoch: 5 Global Step: 222260 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 20:42:35,706-Speed 2632.86 samples/sec Loss 9.7784 LearningRate 0.0536 Epoch: 5 Global Step: 222270 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 20:42:39,596-Speed 2632.91 samples/sec Loss 10.0891 LearningRate 0.0536 Epoch: 5 Global Step: 222280 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 20:42:43,485-Speed 2634.68 samples/sec Loss 9.9355 LearningRate 0.0536 Epoch: 5 Global Step: 222290 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 20:42:47,384-Speed 2626.66 samples/sec Loss 9.8839 LearningRate 0.0536 Epoch: 5 Global Step: 222300 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 20:42:51,277-Speed 2631.45 samples/sec Loss 10.0169 LearningRate 0.0536 Epoch: 5 Global Step: 222310 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 20:42:55,169-Speed 2631.71 samples/sec Loss 9.9630 LearningRate 0.0536 Epoch: 5 Global Step: 222320 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 20:42:59,055-Speed 2635.63 samples/sec Loss 9.9222 LearningRate 0.0536 Epoch: 5 Global Step: 222330 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 20:43:02,945-Speed 2633.37 samples/sec Loss 9.9072 LearningRate 0.0536 Epoch: 5 Global Step: 222340 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:43:06,836-Speed 2632.62 samples/sec Loss 9.9409 LearningRate 0.0536 Epoch: 5 Global Step: 222350 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:43:10,725-Speed 2633.50 samples/sec Loss 10.1096 LearningRate 0.0536 Epoch: 5 Global Step: 222360 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:43:14,616-Speed 2632.07 samples/sec Loss 9.9633 LearningRate 0.0536 Epoch: 5 Global Step: 222370 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:43:18,508-Speed 2631.91 samples/sec Loss 10.0456 LearningRate 0.0536 Epoch: 5 Global Step: 222380 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:43:22,402-Speed 2630.86 samples/sec Loss 9.8395 LearningRate 0.0536 Epoch: 5 Global Step: 222390 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:43:26,292-Speed 2632.76 samples/sec Loss 9.9683 LearningRate 0.0536 Epoch: 5 Global Step: 222400 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:43:30,181-Speed 2633.47 samples/sec Loss 10.0553 LearningRate 0.0536 Epoch: 5 Global Step: 222410 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:43:34,081-Speed 2626.44 samples/sec Loss 9.8993 LearningRate 0.0536 Epoch: 5 Global Step: 222420 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:43:37,972-Speed 2632.64 samples/sec Loss 10.0165 LearningRate 0.0536 Epoch: 5 Global Step: 222430 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:43:41,865-Speed 2630.99 samples/sec Loss 10.0069 LearningRate 0.0536 Epoch: 5 Global Step: 222440 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:43:45,761-Speed 2629.31 samples/sec Loss 9.9758 LearningRate 0.0536 Epoch: 5 Global Step: 222450 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:43:49,650-Speed 2633.73 samples/sec Loss 10.0067 LearningRate 0.0536 Epoch: 5 Global Step: 222460 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:43:53,548-Speed 2628.08 samples/sec Loss 10.0894 LearningRate 0.0536 Epoch: 5 Global Step: 222470 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:43:57,443-Speed 2629.38 samples/sec Loss 9.7530 LearningRate 0.0536 Epoch: 5 Global Step: 222480 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:44:01,335-Speed 2631.85 samples/sec Loss 9.8219 LearningRate 0.0536 Epoch: 5 Global Step: 222490 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:44:05,225-Speed 2633.13 samples/sec Loss 9.9561 LearningRate 0.0536 Epoch: 5 Global Step: 222500 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:44:09,116-Speed 2632.07 samples/sec Loss 9.8875 LearningRate 0.0536 Epoch: 5 Global Step: 222510 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:44:13,012-Speed 2628.51 samples/sec Loss 10.0546 LearningRate 0.0535 Epoch: 5 Global Step: 222520 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:44:16,915-Speed 2624.27 samples/sec Loss 9.8989 LearningRate 0.0535 Epoch: 5 Global Step: 222530 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:44:20,866-Speed 2593.29 samples/sec Loss 9.9713 LearningRate 0.0535 Epoch: 5 Global Step: 222540 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:44:24,890-Speed 2544.63 samples/sec Loss 9.8810 LearningRate 0.0535 Epoch: 5 Global Step: 222550 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:44:28,803-Speed 2617.52 samples/sec Loss 9.7996 LearningRate 0.0535 Epoch: 5 Global Step: 222560 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:44:32,702-Speed 2626.92 samples/sec Loss 10.0661 LearningRate 0.0535 Epoch: 5 Global Step: 222570 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:44:36,594-Speed 2632.18 samples/sec Loss 10.0310 LearningRate 0.0535 Epoch: 5 Global Step: 222580 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:44:40,522-Speed 2607.57 samples/sec Loss 9.8175 LearningRate 0.0535 Epoch: 5 Global Step: 222590 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:44:44,416-Speed 2630.49 samples/sec Loss 9.8700 LearningRate 0.0535 Epoch: 5 Global Step: 222600 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:44:48,325-Speed 2620.51 samples/sec Loss 10.0659 LearningRate 0.0535 Epoch: 5 Global Step: 222610 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:44:52,217-Speed 2631.57 samples/sec Loss 9.8055 LearningRate 0.0535 Epoch: 5 Global Step: 222620 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:44:56,106-Speed 2634.41 samples/sec Loss 9.9905 LearningRate 0.0535 Epoch: 5 Global Step: 222630 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:44:59,998-Speed 2631.79 samples/sec Loss 9.8704 LearningRate 0.0535 Epoch: 5 Global Step: 222640 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:45:03,892-Speed 2630.10 samples/sec Loss 9.7757 LearningRate 0.0535 Epoch: 5 Global Step: 222650 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:45:07,789-Speed 2628.27 samples/sec Loss 10.0900 LearningRate 0.0535 Epoch: 5 Global Step: 222660 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:45:11,680-Speed 2632.47 samples/sec Loss 10.1294 LearningRate 0.0535 Epoch: 5 Global Step: 222670 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:45:15,555-Speed 2643.93 samples/sec Loss 10.0662 LearningRate 0.0535 Epoch: 5 Global Step: 222680 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:45:19,469-Speed 2616.25 samples/sec Loss 9.9463 LearningRate 0.0535 Epoch: 5 Global Step: 222690 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:45:23,360-Speed 2633.22 samples/sec Loss 9.9107 LearningRate 0.0535 Epoch: 5 Global Step: 222700 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:45:27,248-Speed 2634.55 samples/sec Loss 9.9043 LearningRate 0.0535 Epoch: 5 Global Step: 222710 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:45:31,139-Speed 2631.72 samples/sec Loss 10.0594 LearningRate 0.0535 Epoch: 5 Global Step: 222720 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:45:35,038-Speed 2627.23 samples/sec Loss 9.8619 LearningRate 0.0535 Epoch: 5 Global Step: 222730 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:45:38,931-Speed 2631.08 samples/sec Loss 9.9575 LearningRate 0.0535 Epoch: 5 Global Step: 222740 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:45:42,842-Speed 2618.97 samples/sec Loss 9.8973 LearningRate 0.0535 Epoch: 5 Global Step: 222750 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:45:46,730-Speed 2634.56 samples/sec Loss 9.9674 LearningRate 0.0535 Epoch: 5 Global Step: 222760 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:45:50,621-Speed 2632.16 samples/sec Loss 10.0083 LearningRate 0.0535 Epoch: 5 Global Step: 222770 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:45:54,567-Speed 2596.47 samples/sec Loss 9.8616 LearningRate 0.0535 Epoch: 5 Global Step: 222780 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:45:58,464-Speed 2627.73 samples/sec Loss 9.8128 LearningRate 0.0535 Epoch: 5 Global Step: 222790 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:46:02,359-Speed 2629.87 samples/sec Loss 9.9636 LearningRate 0.0535 Epoch: 5 Global Step: 222800 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:46:06,285-Speed 2608.51 samples/sec Loss 9.8985 LearningRate 0.0535 Epoch: 5 Global Step: 222810 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:46:10,177-Speed 2632.34 samples/sec Loss 9.9453 LearningRate 0.0535 Epoch: 5 Global Step: 222820 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:46:14,078-Speed 2625.76 samples/sec Loss 9.8297 LearningRate 0.0535 Epoch: 5 Global Step: 222830 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:46:17,972-Speed 2630.10 samples/sec Loss 9.9408 LearningRate 0.0535 Epoch: 5 Global Step: 222840 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:46:21,868-Speed 2628.56 samples/sec Loss 10.1231 LearningRate 0.0535 Epoch: 5 Global Step: 222850 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:46:25,755-Speed 2635.85 samples/sec Loss 9.9355 LearningRate 0.0535 Epoch: 5 Global Step: 222860 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:46:29,648-Speed 2630.65 samples/sec Loss 9.9057 LearningRate 0.0535 Epoch: 5 Global Step: 222870 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:46:33,583-Speed 2603.01 samples/sec Loss 9.8778 LearningRate 0.0535 Epoch: 5 Global Step: 222880 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:46:37,543-Speed 2586.44 samples/sec Loss 9.8476 LearningRate 0.0535 Epoch: 5 Global Step: 222890 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:46:41,447-Speed 2623.86 samples/sec Loss 9.8638 LearningRate 0.0535 Epoch: 5 Global Step: 222900 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:46:45,336-Speed 2633.41 samples/sec Loss 9.9004 LearningRate 0.0535 Epoch: 5 Global Step: 222910 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:46:49,283-Speed 2595.04 samples/sec Loss 9.9720 LearningRate 0.0535 Epoch: 5 Global Step: 222920 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:46:53,184-Speed 2626.03 samples/sec Loss 9.9502 LearningRate 0.0535 Epoch: 5 Global Step: 222930 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:46:57,092-Speed 2620.77 samples/sec Loss 9.8201 LearningRate 0.0535 Epoch: 5 Global Step: 222940 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:47:00,984-Speed 2631.74 samples/sec Loss 9.9970 LearningRate 0.0535 Epoch: 5 Global Step: 222950 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:47:04,886-Speed 2624.84 samples/sec Loss 9.8847 LearningRate 0.0535 Epoch: 5 Global Step: 222960 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:47:08,785-Speed 2626.69 samples/sec Loss 9.9449 LearningRate 0.0535 Epoch: 5 Global Step: 222970 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:47:12,670-Speed 2637.21 samples/sec Loss 9.9037 LearningRate 0.0535 Epoch: 5 Global Step: 222980 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:47:16,548-Speed 2641.04 samples/sec Loss 9.8392 LearningRate 0.0535 Epoch: 5 Global Step: 222990 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:47:20,445-Speed 2628.84 samples/sec Loss 10.0679 LearningRate 0.0535 Epoch: 5 Global Step: 223000 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:47:24,343-Speed 2627.02 samples/sec Loss 10.0924 LearningRate 0.0535 Epoch: 5 Global Step: 223010 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:47:28,234-Speed 2633.23 samples/sec Loss 9.8146 LearningRate 0.0535 Epoch: 5 Global Step: 223020 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:47:32,126-Speed 2631.69 samples/sec Loss 10.0594 LearningRate 0.0535 Epoch: 5 Global Step: 223030 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:47:36,018-Speed 2630.93 samples/sec Loss 9.9995 LearningRate 0.0535 Epoch: 5 Global Step: 223040 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:47:39,915-Speed 2628.69 samples/sec Loss 9.8340 LearningRate 0.0535 Epoch: 5 Global Step: 223050 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:47:43,805-Speed 2633.13 samples/sec Loss 9.8454 LearningRate 0.0535 Epoch: 5 Global Step: 223060 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:47:47,694-Speed 2633.41 samples/sec Loss 9.9458 LearningRate 0.0535 Epoch: 5 Global Step: 223070 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:47:51,617-Speed 2611.59 samples/sec Loss 10.0023 LearningRate 0.0534 Epoch: 5 Global Step: 223080 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:47:55,507-Speed 2632.57 samples/sec Loss 10.0731 LearningRate 0.0534 Epoch: 5 Global Step: 223090 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:47:59,421-Speed 2617.15 samples/sec Loss 9.8902 LearningRate 0.0534 Epoch: 5 Global Step: 223100 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:48:03,302-Speed 2639.60 samples/sec Loss 9.9975 LearningRate 0.0534 Epoch: 5 Global Step: 223110 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:48:07,178-Speed 2642.67 samples/sec Loss 9.9069 LearningRate 0.0534 Epoch: 5 Global Step: 223120 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:48:11,072-Speed 2629.79 samples/sec Loss 10.0021 LearningRate 0.0534 Epoch: 5 Global Step: 223130 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:48:14,966-Speed 2630.14 samples/sec Loss 9.7332 LearningRate 0.0534 Epoch: 5 Global Step: 223140 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:48:18,857-Speed 2632.23 samples/sec Loss 9.9114 LearningRate 0.0534 Epoch: 5 Global Step: 223150 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:48:22,749-Speed 2632.64 samples/sec Loss 9.9074 LearningRate 0.0534 Epoch: 5 Global Step: 223160 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:48:26,642-Speed 2630.60 samples/sec Loss 10.0004 LearningRate 0.0534 Epoch: 5 Global Step: 223170 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:48:30,561-Speed 2614.55 samples/sec Loss 10.0383 LearningRate 0.0534 Epoch: 5 Global Step: 223180 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:48:34,495-Speed 2603.69 samples/sec Loss 9.9453 LearningRate 0.0534 Epoch: 5 Global Step: 223190 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:48:38,386-Speed 2632.18 samples/sec Loss 10.0069 LearningRate 0.0534 Epoch: 5 Global Step: 223200 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:48:42,291-Speed 2622.65 samples/sec Loss 9.9091 LearningRate 0.0534 Epoch: 5 Global Step: 223210 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:48:46,191-Speed 2626.08 samples/sec Loss 9.9508 LearningRate 0.0534 Epoch: 5 Global Step: 223220 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:48:50,084-Speed 2631.31 samples/sec Loss 10.0275 LearningRate 0.0534 Epoch: 5 Global Step: 223230 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:48:53,978-Speed 2630.83 samples/sec Loss 9.7795 LearningRate 0.0534 Epoch: 5 Global Step: 223240 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:48:57,871-Speed 2630.95 samples/sec Loss 9.8413 LearningRate 0.0534 Epoch: 5 Global Step: 223250 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:49:01,823-Speed 2591.64 samples/sec Loss 9.8719 LearningRate 0.0534 Epoch: 5 Global Step: 223260 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:49:05,924-Speed 2497.14 samples/sec Loss 9.8953 LearningRate 0.0534 Epoch: 5 Global Step: 223270 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:49:09,903-Speed 2574.35 samples/sec Loss 10.0541 LearningRate 0.0534 Epoch: 5 Global Step: 223280 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:49:13,794-Speed 2632.67 samples/sec Loss 9.9652 LearningRate 0.0534 Epoch: 5 Global Step: 223290 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:49:17,691-Speed 2627.98 samples/sec Loss 10.0003 LearningRate 0.0534 Epoch: 5 Global Step: 223300 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:49:21,590-Speed 2627.64 samples/sec Loss 10.0383 LearningRate 0.0534 Epoch: 5 Global Step: 223310 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:49:25,499-Speed 2620.33 samples/sec Loss 9.9839 LearningRate 0.0534 Epoch: 5 Global Step: 223320 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:49:29,386-Speed 2634.95 samples/sec Loss 9.9028 LearningRate 0.0534 Epoch: 5 Global Step: 223330 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:49:33,280-Speed 2630.05 samples/sec Loss 9.7972 LearningRate 0.0534 Epoch: 5 Global Step: 223340 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:49:37,174-Speed 2630.58 samples/sec Loss 10.1121 LearningRate 0.0534 Epoch: 5 Global Step: 223350 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:49:41,066-Speed 2631.77 samples/sec Loss 9.9421 LearningRate 0.0534 Epoch: 5 Global Step: 223360 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:49:44,954-Speed 2634.75 samples/sec Loss 9.7636 LearningRate 0.0534 Epoch: 5 Global Step: 223370 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:49:48,833-Speed 2640.11 samples/sec Loss 9.9285 LearningRate 0.0534 Epoch: 5 Global Step: 223380 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:49:52,724-Speed 2632.97 samples/sec Loss 9.8530 LearningRate 0.0534 Epoch: 5 Global Step: 223390 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:49:56,615-Speed 2632.33 samples/sec Loss 9.9277 LearningRate 0.0534 Epoch: 5 Global Step: 223400 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:50:00,519-Speed 2624.23 samples/sec Loss 9.9369 LearningRate 0.0534 Epoch: 5 Global Step: 223410 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:50:04,518-Speed 2560.95 samples/sec Loss 9.7782 LearningRate 0.0534 Epoch: 5 Global Step: 223420 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:50:08,412-Speed 2630.46 samples/sec Loss 9.9601 LearningRate 0.0534 Epoch: 5 Global Step: 223430 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:50:12,312-Speed 2625.83 samples/sec Loss 9.8683 LearningRate 0.0534 Epoch: 5 Global Step: 223440 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:50:16,217-Speed 2623.54 samples/sec Loss 9.8748 LearningRate 0.0534 Epoch: 5 Global Step: 223450 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:50:20,137-Speed 2612.41 samples/sec Loss 9.7779 LearningRate 0.0534 Epoch: 5 Global Step: 223460 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:50:24,040-Speed 2624.10 samples/sec Loss 10.1118 LearningRate 0.0534 Epoch: 5 Global Step: 223470 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:50:27,942-Speed 2625.35 samples/sec Loss 9.8733 LearningRate 0.0534 Epoch: 5 Global Step: 223480 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:50:31,837-Speed 2630.05 samples/sec Loss 9.9905 LearningRate 0.0534 Epoch: 5 Global Step: 223490 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:50:35,747-Speed 2619.40 samples/sec Loss 9.9618 LearningRate 0.0534 Epoch: 5 Global Step: 223500 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:50:39,647-Speed 2625.85 samples/sec Loss 10.0169 LearningRate 0.0534 Epoch: 5 Global Step: 223510 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:50:43,522-Speed 2643.25 samples/sec Loss 10.0026 LearningRate 0.0534 Epoch: 5 Global Step: 223520 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:50:47,414-Speed 2631.70 samples/sec Loss 9.9424 LearningRate 0.0534 Epoch: 5 Global Step: 223530 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:50:51,313-Speed 2626.99 samples/sec Loss 10.0219 LearningRate 0.0534 Epoch: 5 Global Step: 223540 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:50:55,213-Speed 2626.69 samples/sec Loss 10.0499 LearningRate 0.0534 Epoch: 5 Global Step: 223550 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:50:59,114-Speed 2625.49 samples/sec Loss 9.7926 LearningRate 0.0534 Epoch: 5 Global Step: 223560 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:51:03,032-Speed 2614.01 samples/sec Loss 9.9254 LearningRate 0.0534 Epoch: 5 Global Step: 223570 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:51:06,936-Speed 2623.74 samples/sec Loss 9.7633 LearningRate 0.0534 Epoch: 5 Global Step: 223580 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:51:10,832-Speed 2629.13 samples/sec Loss 10.0974 LearningRate 0.0534 Epoch: 5 Global Step: 223590 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:51:14,730-Speed 2627.01 samples/sec Loss 9.9393 LearningRate 0.0534 Epoch: 5 Global Step: 223600 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:51:18,647-Speed 2615.54 samples/sec Loss 9.8615 LearningRate 0.0534 Epoch: 5 Global Step: 223610 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:51:22,549-Speed 2625.02 samples/sec Loss 10.0826 LearningRate 0.0534 Epoch: 5 Global Step: 223620 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:51:26,436-Speed 2635.06 samples/sec Loss 9.9477 LearningRate 0.0534 Epoch: 5 Global Step: 223630 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:51:30,328-Speed 2631.45 samples/sec Loss 10.0366 LearningRate 0.0534 Epoch: 5 Global Step: 223640 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:51:34,215-Speed 2635.07 samples/sec Loss 9.9425 LearningRate 0.0533 Epoch: 5 Global Step: 223650 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:51:38,111-Speed 2628.96 samples/sec Loss 9.8353 LearningRate 0.0533 Epoch: 5 Global Step: 223660 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:51:42,010-Speed 2627.02 samples/sec Loss 9.9725 LearningRate 0.0533 Epoch: 5 Global Step: 223670 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:51:45,950-Speed 2600.11 samples/sec Loss 10.0483 LearningRate 0.0533 Epoch: 5 Global Step: 223680 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:51:49,899-Speed 2593.78 samples/sec Loss 9.9384 LearningRate 0.0533 Epoch: 5 Global Step: 223690 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:51:53,792-Speed 2631.36 samples/sec Loss 9.8662 LearningRate 0.0533 Epoch: 5 Global Step: 223700 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:51:57,687-Speed 2629.35 samples/sec Loss 10.0280 LearningRate 0.0533 Epoch: 5 Global Step: 223710 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:52:01,565-Speed 2641.74 samples/sec Loss 9.8743 LearningRate 0.0533 Epoch: 5 Global Step: 223720 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:52:05,457-Speed 2631.08 samples/sec Loss 10.0845 LearningRate 0.0533 Epoch: 5 Global Step: 223730 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:52:09,349-Speed 2632.79 samples/sec Loss 9.9653 LearningRate 0.0533 Epoch: 5 Global Step: 223740 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:52:13,236-Speed 2634.92 samples/sec Loss 9.8704 LearningRate 0.0533 Epoch: 5 Global Step: 223750 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:52:17,130-Speed 2630.41 samples/sec Loss 9.9584 LearningRate 0.0533 Epoch: 5 Global Step: 223760 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:52:21,027-Speed 2628.48 samples/sec Loss 9.9776 LearningRate 0.0533 Epoch: 5 Global Step: 223770 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:52:25,050-Speed 2545.86 samples/sec Loss 9.8518 LearningRate 0.0533 Epoch: 5 Global Step: 223780 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:52:29,118-Speed 2518.12 samples/sec Loss 9.8563 LearningRate 0.0533 Epoch: 5 Global Step: 223790 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:52:33,189-Speed 2516.13 samples/sec Loss 9.9064 LearningRate 0.0533 Epoch: 5 Global Step: 223800 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:52:37,258-Speed 2517.37 samples/sec Loss 9.9785 LearningRate 0.0533 Epoch: 5 Global Step: 223810 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:52:41,270-Speed 2552.71 samples/sec Loss 9.9337 LearningRate 0.0533 Epoch: 5 Global Step: 223820 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:52:45,168-Speed 2627.62 samples/sec Loss 9.9514 LearningRate 0.0533 Epoch: 5 Global Step: 223830 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:52:49,078-Speed 2619.41 samples/sec Loss 9.8206 LearningRate 0.0533 Epoch: 5 Global Step: 223840 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:52:52,976-Speed 2627.78 samples/sec Loss 9.9791 LearningRate 0.0533 Epoch: 5 Global Step: 223850 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:52:56,873-Speed 2628.37 samples/sec Loss 9.8044 LearningRate 0.0533 Epoch: 5 Global Step: 223860 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:53:00,777-Speed 2623.95 samples/sec Loss 10.0532 LearningRate 0.0533 Epoch: 5 Global Step: 223870 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:53:04,676-Speed 2626.68 samples/sec Loss 9.8862 LearningRate 0.0533 Epoch: 5 Global Step: 223880 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:53:08,582-Speed 2622.13 samples/sec Loss 10.0209 LearningRate 0.0533 Epoch: 5 Global Step: 223890 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:53:12,478-Speed 2629.04 samples/sec Loss 9.9271 LearningRate 0.0533 Epoch: 5 Global Step: 223900 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:53:16,371-Speed 2631.45 samples/sec Loss 9.9515 LearningRate 0.0533 Epoch: 5 Global Step: 223910 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:53:20,262-Speed 2632.17 samples/sec Loss 9.9501 LearningRate 0.0533 Epoch: 5 Global Step: 223920 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:53:24,151-Speed 2633.99 samples/sec Loss 9.8312 LearningRate 0.0533 Epoch: 5 Global Step: 223930 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:53:28,043-Speed 2631.33 samples/sec Loss 10.0079 LearningRate 0.0533 Epoch: 5 Global Step: 223940 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:53:31,935-Speed 2632.05 samples/sec Loss 10.0324 LearningRate 0.0533 Epoch: 5 Global Step: 223950 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:53:35,836-Speed 2625.75 samples/sec Loss 9.8009 LearningRate 0.0533 Epoch: 5 Global Step: 223960 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:53:39,729-Speed 2630.65 samples/sec Loss 9.8744 LearningRate 0.0533 Epoch: 5 Global Step: 223970 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:53:43,609-Speed 2639.58 samples/sec Loss 9.9999 LearningRate 0.0533 Epoch: 5 Global Step: 223980 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:53:47,494-Speed 2636.72 samples/sec Loss 9.8168 LearningRate 0.0533 Epoch: 5 Global Step: 223990 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:53:51,389-Speed 2629.92 samples/sec Loss 9.9550 LearningRate 0.0533 Epoch: 5 Global Step: 224000 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:53:55,278-Speed 2633.98 samples/sec Loss 9.8599 LearningRate 0.0533 Epoch: 5 Global Step: 224010 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:53:59,171-Speed 2630.67 samples/sec Loss 9.8806 LearningRate 0.0533 Epoch: 5 Global Step: 224020 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:54:03,068-Speed 2628.68 samples/sec Loss 9.8718 LearningRate 0.0533 Epoch: 5 Global Step: 224030 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:54:06,961-Speed 2630.39 samples/sec Loss 9.9094 LearningRate 0.0533 Epoch: 5 Global Step: 224040 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:54:10,849-Speed 2634.32 samples/sec Loss 9.8365 LearningRate 0.0533 Epoch: 5 Global Step: 224050 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:54:14,742-Speed 2631.06 samples/sec Loss 10.0507 LearningRate 0.0533 Epoch: 5 Global Step: 224060 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:54:18,634-Speed 2631.69 samples/sec Loss 9.9663 LearningRate 0.0533 Epoch: 5 Global Step: 224070 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:54:22,528-Speed 2630.41 samples/sec Loss 9.8775 LearningRate 0.0533 Epoch: 5 Global Step: 224080 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:54:26,419-Speed 2632.57 samples/sec Loss 9.9971 LearningRate 0.0533 Epoch: 5 Global Step: 224090 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:54:30,312-Speed 2631.28 samples/sec Loss 9.9837 LearningRate 0.0533 Epoch: 5 Global Step: 224100 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:54:34,202-Speed 2632.28 samples/sec Loss 9.8657 LearningRate 0.0533 Epoch: 5 Global Step: 224110 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:54:38,099-Speed 2628.47 samples/sec Loss 9.9011 LearningRate 0.0533 Epoch: 5 Global Step: 224120 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:54:41,989-Speed 2632.53 samples/sec Loss 9.8177 LearningRate 0.0533 Epoch: 5 Global Step: 224130 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:54:45,879-Speed 2633.37 samples/sec Loss 9.8377 LearningRate 0.0533 Epoch: 5 Global Step: 224140 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:54:49,772-Speed 2631.06 samples/sec Loss 9.8346 LearningRate 0.0533 Epoch: 5 Global Step: 224150 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:54:53,652-Speed 2640.37 samples/sec Loss 9.7820 LearningRate 0.0533 Epoch: 5 Global Step: 224160 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:54:57,567-Speed 2615.81 samples/sec Loss 9.8801 LearningRate 0.0533 Epoch: 5 Global Step: 224170 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:55:01,530-Speed 2584.85 samples/sec Loss 9.8018 LearningRate 0.0533 Epoch: 5 Global Step: 224180 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:55:05,423-Speed 2630.60 samples/sec Loss 9.9661 LearningRate 0.0533 Epoch: 5 Global Step: 224190 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:55:09,265-Speed 2665.78 samples/sec Loss 9.9443 LearningRate 0.0533 Epoch: 5 Global Step: 224200 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:55:13,169-Speed 2623.31 samples/sec Loss 10.2531 LearningRate 0.0533 Epoch: 5 Global Step: 224210 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:55:17,272-Speed 2496.69 samples/sec Loss 10.3264 LearningRate 0.0532 Epoch: 5 Global Step: 224220 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:55:21,169-Speed 2628.78 samples/sec Loss 10.2215 LearningRate 0.0532 Epoch: 5 Global Step: 224230 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:55:25,055-Speed 2635.91 samples/sec Loss 9.9916 LearningRate 0.0532 Epoch: 5 Global Step: 224240 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:55:28,946-Speed 2632.42 samples/sec Loss 9.8787 LearningRate 0.0532 Epoch: 5 Global Step: 224250 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:55:32,831-Speed 2636.75 samples/sec Loss 10.0229 LearningRate 0.0532 Epoch: 5 Global Step: 224260 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:55:36,723-Speed 2631.70 samples/sec Loss 9.9440 LearningRate 0.0532 Epoch: 5 Global Step: 224270 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:55:40,613-Speed 2633.00 samples/sec Loss 10.0088 LearningRate 0.0532 Epoch: 5 Global Step: 224280 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:55:44,506-Speed 2630.85 samples/sec Loss 9.9179 LearningRate 0.0532 Epoch: 5 Global Step: 224290 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 20:55:48,403-Speed 2628.66 samples/sec Loss 10.0302 LearningRate 0.0532 Epoch: 5 Global Step: 224300 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:55:52,294-Speed 2632.95 samples/sec Loss 10.0066 LearningRate 0.0532 Epoch: 5 Global Step: 224310 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:55:56,200-Speed 2621.76 samples/sec Loss 10.1273 LearningRate 0.0532 Epoch: 5 Global Step: 224320 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:56:00,096-Speed 2629.28 samples/sec Loss 9.9389 LearningRate 0.0532 Epoch: 5 Global Step: 224330 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:56:03,994-Speed 2627.84 samples/sec Loss 9.9657 LearningRate 0.0532 Epoch: 5 Global Step: 224340 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:56:07,903-Speed 2620.06 samples/sec Loss 9.8748 LearningRate 0.0532 Epoch: 5 Global Step: 224350 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:56:11,797-Speed 2630.72 samples/sec Loss 9.8609 LearningRate 0.0532 Epoch: 5 Global Step: 224360 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:56:15,692-Speed 2629.98 samples/sec Loss 9.9681 LearningRate 0.0532 Epoch: 5 Global Step: 224370 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:56:19,591-Speed 2626.41 samples/sec Loss 9.9890 LearningRate 0.0532 Epoch: 5 Global Step: 224380 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:56:23,487-Speed 2628.76 samples/sec Loss 10.0159 LearningRate 0.0532 Epoch: 5 Global Step: 224390 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 20:56:27,384-Speed 2628.42 samples/sec Loss 10.0307 LearningRate 0.0532 Epoch: 5 Global Step: 224400 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:56:31,277-Speed 2631.68 samples/sec Loss 9.8227 LearningRate 0.0532 Epoch: 5 Global Step: 224410 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:56:35,168-Speed 2632.47 samples/sec Loss 10.0589 LearningRate 0.0532 Epoch: 5 Global Step: 224420 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:56:39,067-Speed 2626.42 samples/sec Loss 9.9223 LearningRate 0.0532 Epoch: 5 Global Step: 224430 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:56:43,029-Speed 2585.33 samples/sec Loss 10.0054 LearningRate 0.0532 Epoch: 5 Global Step: 224440 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:56:46,934-Speed 2623.33 samples/sec Loss 9.8702 LearningRate 0.0532 Epoch: 5 Global Step: 224450 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:56:50,826-Speed 2632.02 samples/sec Loss 10.0326 LearningRate 0.0532 Epoch: 5 Global Step: 224460 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:56:54,718-Speed 2631.86 samples/sec Loss 9.7734 LearningRate 0.0532 Epoch: 5 Global Step: 224470 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:56:58,611-Speed 2630.80 samples/sec Loss 9.8825 LearningRate 0.0532 Epoch: 5 Global Step: 224480 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:57:02,552-Speed 2598.86 samples/sec Loss 10.0180 LearningRate 0.0532 Epoch: 5 Global Step: 224490 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:57:06,444-Speed 2631.83 samples/sec Loss 9.9771 LearningRate 0.0532 Epoch: 5 Global Step: 224500 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:57:10,334-Speed 2632.98 samples/sec Loss 9.9258 LearningRate 0.0532 Epoch: 5 Global Step: 224510 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:57:14,209-Speed 2642.94 samples/sec Loss 9.8681 LearningRate 0.0532 Epoch: 5 Global Step: 224520 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:57:18,113-Speed 2624.21 samples/sec Loss 9.9952 LearningRate 0.0532 Epoch: 5 Global Step: 224530 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:57:22,014-Speed 2625.10 samples/sec Loss 9.9513 LearningRate 0.0532 Epoch: 5 Global Step: 224540 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:57:25,910-Speed 2629.55 samples/sec Loss 9.9112 LearningRate 0.0532 Epoch: 5 Global Step: 224550 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:57:29,800-Speed 2632.56 samples/sec Loss 9.8678 LearningRate 0.0532 Epoch: 5 Global Step: 224560 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:57:33,699-Speed 2627.04 samples/sec Loss 9.9772 LearningRate 0.0532 Epoch: 5 Global Step: 224570 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:57:37,593-Speed 2630.31 samples/sec Loss 9.9936 LearningRate 0.0532 Epoch: 5 Global Step: 224580 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:57:41,481-Speed 2634.11 samples/sec Loss 9.9751 LearningRate 0.0532 Epoch: 5 Global Step: 224590 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:57:45,384-Speed 2624.59 samples/sec Loss 9.9290 LearningRate 0.0532 Epoch: 5 Global Step: 224600 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:57:49,281-Speed 2628.42 samples/sec Loss 9.9046 LearningRate 0.0532 Epoch: 5 Global Step: 224610 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 20:57:53,169-Speed 2634.47 samples/sec Loss 9.7644 LearningRate 0.0532 Epoch: 5 Global Step: 224620 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:57:57,060-Speed 2632.39 samples/sec Loss 10.0718 LearningRate 0.0532 Epoch: 5 Global Step: 224630 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:58:00,950-Speed 2633.46 samples/sec Loss 9.9740 LearningRate 0.0532 Epoch: 5 Global Step: 224640 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:58:04,853-Speed 2624.34 samples/sec Loss 9.7975 LearningRate 0.0532 Epoch: 5 Global Step: 224650 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:58:08,747-Speed 2630.03 samples/sec Loss 9.8088 LearningRate 0.0532 Epoch: 5 Global Step: 224660 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:58:12,687-Speed 2599.87 samples/sec Loss 10.0030 LearningRate 0.0532 Epoch: 5 Global Step: 224670 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:58:16,578-Speed 2632.59 samples/sec Loss 10.1395 LearningRate 0.0532 Epoch: 5 Global Step: 224680 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:58:20,475-Speed 2628.59 samples/sec Loss 9.9603 LearningRate 0.0532 Epoch: 5 Global Step: 224690 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:58:24,367-Speed 2631.97 samples/sec Loss 9.9104 LearningRate 0.0532 Epoch: 5 Global Step: 224700 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:58:28,259-Speed 2631.80 samples/sec Loss 9.9652 LearningRate 0.0532 Epoch: 5 Global Step: 224710 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:58:32,160-Speed 2625.50 samples/sec Loss 9.7889 LearningRate 0.0532 Epoch: 5 Global Step: 224720 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:58:36,083-Speed 2610.36 samples/sec Loss 9.8432 LearningRate 0.0532 Epoch: 5 Global Step: 224730 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:58:39,980-Speed 2629.02 samples/sec Loss 9.8835 LearningRate 0.0532 Epoch: 5 Global Step: 224740 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:58:43,914-Speed 2603.89 samples/sec Loss 9.8571 LearningRate 0.0532 Epoch: 5 Global Step: 224750 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:58:47,789-Speed 2643.27 samples/sec Loss 9.9702 LearningRate 0.0532 Epoch: 5 Global Step: 224760 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:58:51,818-Speed 2541.95 samples/sec Loss 9.7622 LearningRate 0.0532 Epoch: 5 Global Step: 224770 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:58:55,744-Speed 2609.25 samples/sec Loss 10.0066 LearningRate 0.0532 Epoch: 5 Global Step: 224780 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:58:59,651-Speed 2621.96 samples/sec Loss 10.0500 LearningRate 0.0531 Epoch: 5 Global Step: 224790 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:59:03,546-Speed 2628.97 samples/sec Loss 9.8071 LearningRate 0.0531 Epoch: 5 Global Step: 224800 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:59:07,437-Speed 2632.46 samples/sec Loss 9.8555 LearningRate 0.0531 Epoch: 5 Global Step: 224810 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:59:11,341-Speed 2624.08 samples/sec Loss 9.9674 LearningRate 0.0531 Epoch: 5 Global Step: 224820 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:59:15,232-Speed 2632.18 samples/sec Loss 9.9435 LearningRate 0.0531 Epoch: 5 Global Step: 224830 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:59:19,123-Speed 2632.31 samples/sec Loss 9.8121 LearningRate 0.0531 Epoch: 5 Global Step: 224840 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:59:23,019-Speed 2628.83 samples/sec Loss 10.0046 LearningRate 0.0531 Epoch: 5 Global Step: 224850 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:59:26,913-Speed 2630.38 samples/sec Loss 9.8554 LearningRate 0.0531 Epoch: 5 Global Step: 224860 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 20:59:30,783-Speed 2646.91 samples/sec Loss 9.8210 LearningRate 0.0531 Epoch: 5 Global Step: 224870 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:59:34,681-Speed 2627.77 samples/sec Loss 9.8464 LearningRate 0.0531 Epoch: 5 Global Step: 224880 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:59:38,574-Speed 2630.23 samples/sec Loss 9.8833 LearningRate 0.0531 Epoch: 5 Global Step: 224890 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:59:42,468-Speed 2630.67 samples/sec Loss 9.9301 LearningRate 0.0531 Epoch: 5 Global Step: 224900 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:59:46,367-Speed 2627.48 samples/sec Loss 9.7920 LearningRate 0.0531 Epoch: 5 Global Step: 224910 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:59:50,280-Speed 2617.25 samples/sec Loss 9.9007 LearningRate 0.0531 Epoch: 5 Global Step: 224920 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:59:54,168-Speed 2634.83 samples/sec Loss 9.9045 LearningRate 0.0531 Epoch: 5 Global Step: 224930 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 20:59:58,064-Speed 2628.67 samples/sec Loss 9.9983 LearningRate 0.0531 Epoch: 5 Global Step: 224940 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:00:01,954-Speed 2633.99 samples/sec Loss 9.7964 LearningRate 0.0531 Epoch: 5 Global Step: 224950 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:00:05,847-Speed 2630.87 samples/sec Loss 9.9651 LearningRate 0.0531 Epoch: 5 Global Step: 224960 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:00:09,735-Speed 2633.89 samples/sec Loss 9.8456 LearningRate 0.0531 Epoch: 5 Global Step: 224970 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:00:13,629-Speed 2630.26 samples/sec Loss 9.7829 LearningRate 0.0531 Epoch: 5 Global Step: 224980 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:00:17,536-Speed 2621.97 samples/sec Loss 9.8832 LearningRate 0.0531 Epoch: 5 Global Step: 224990 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:00:21,432-Speed 2629.03 samples/sec Loss 9.8628 LearningRate 0.0531 Epoch: 5 Global Step: 225000 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:00:25,319-Speed 2635.00 samples/sec Loss 9.8885 LearningRate 0.0531 Epoch: 5 Global Step: 225010 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:00:29,221-Speed 2625.03 samples/sec Loss 9.8711 LearningRate 0.0531 Epoch: 5 Global Step: 225020 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:00:33,110-Speed 2633.54 samples/sec Loss 9.9835 LearningRate 0.0531 Epoch: 5 Global Step: 225030 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:00:37,000-Speed 2633.19 samples/sec Loss 9.8811 LearningRate 0.0531 Epoch: 5 Global Step: 225040 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:00:40,892-Speed 2631.49 samples/sec Loss 9.8894 LearningRate 0.0531 Epoch: 5 Global Step: 225050 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:00:44,854-Speed 2585.02 samples/sec Loss 9.8009 LearningRate 0.0531 Epoch: 5 Global Step: 225060 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:00:48,824-Speed 2579.88 samples/sec Loss 9.9235 LearningRate 0.0531 Epoch: 5 Global Step: 225070 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:00:52,712-Speed 2634.30 samples/sec Loss 9.7276 LearningRate 0.0531 Epoch: 5 Global Step: 225080 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:00:56,603-Speed 2632.54 samples/sec Loss 9.6960 LearningRate 0.0531 Epoch: 5 Global Step: 225090 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:01:00,494-Speed 2634.47 samples/sec Loss 9.7407 LearningRate 0.0531 Epoch: 5 Global Step: 225100 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:01:04,388-Speed 2630.13 samples/sec Loss 9.8953 LearningRate 0.0531 Epoch: 5 Global Step: 225110 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:01:08,285-Speed 2628.62 samples/sec Loss 9.7714 LearningRate 0.0531 Epoch: 5 Global Step: 225120 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:01:12,175-Speed 2632.80 samples/sec Loss 9.8345 LearningRate 0.0531 Epoch: 5 Global Step: 225130 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:01:16,068-Speed 2630.78 samples/sec Loss 9.8705 LearningRate 0.0531 Epoch: 5 Global Step: 225140 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:01:19,961-Speed 2630.64 samples/sec Loss 9.9753 LearningRate 0.0531 Epoch: 5 Global Step: 225150 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:01:23,851-Speed 2633.16 samples/sec Loss 9.9479 LearningRate 0.0531 Epoch: 5 Global Step: 225160 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:01:27,735-Speed 2637.27 samples/sec Loss 9.9552 LearningRate 0.0531 Epoch: 5 Global Step: 225170 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:01:31,633-Speed 2628.38 samples/sec Loss 9.9410 LearningRate 0.0531 Epoch: 5 Global Step: 225180 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:01:35,524-Speed 2632.14 samples/sec Loss 9.9377 LearningRate 0.0531 Epoch: 5 Global Step: 225190 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:01:39,417-Speed 2630.69 samples/sec Loss 9.8561 LearningRate 0.0531 Epoch: 5 Global Step: 225200 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:01:43,306-Speed 2633.32 samples/sec Loss 9.8944 LearningRate 0.0531 Epoch: 5 Global Step: 225210 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:01:47,205-Speed 2627.51 samples/sec Loss 9.8515 LearningRate 0.0531 Epoch: 5 Global Step: 225220 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:01:51,115-Speed 2618.98 samples/sec Loss 9.9554 LearningRate 0.0531 Epoch: 5 Global Step: 225230 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:01:55,020-Speed 2623.03 samples/sec Loss 9.9009 LearningRate 0.0531 Epoch: 5 Global Step: 225240 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:01:58,910-Speed 2633.26 samples/sec Loss 9.8546 LearningRate 0.0531 Epoch: 5 Global Step: 225250 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:02:02,823-Speed 2617.55 samples/sec Loss 9.8796 LearningRate 0.0531 Epoch: 5 Global Step: 225260 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:02:06,743-Speed 2612.77 samples/sec Loss 9.9280 LearningRate 0.0531 Epoch: 5 Global Step: 225270 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:02:10,638-Speed 2629.50 samples/sec Loss 9.8144 LearningRate 0.0531 Epoch: 5 Global Step: 225280 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:02:14,528-Speed 2633.45 samples/sec Loss 9.9028 LearningRate 0.0531 Epoch: 5 Global Step: 225290 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:02:18,423-Speed 2629.58 samples/sec Loss 9.9500 LearningRate 0.0531 Epoch: 5 Global Step: 225300 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:02:22,361-Speed 2600.98 samples/sec Loss 9.8282 LearningRate 0.0531 Epoch: 5 Global Step: 225310 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:02:26,235-Speed 2643.83 samples/sec Loss 9.8044 LearningRate 0.0531 Epoch: 5 Global Step: 225320 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:02:30,134-Speed 2627.36 samples/sec Loss 9.7558 LearningRate 0.0531 Epoch: 5 Global Step: 225330 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:02:34,034-Speed 2625.90 samples/sec Loss 9.9243 LearningRate 0.0531 Epoch: 5 Global Step: 225340 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:02:37,926-Speed 2632.05 samples/sec Loss 9.8330 LearningRate 0.0531 Epoch: 5 Global Step: 225350 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:02:41,821-Speed 2630.32 samples/sec Loss 9.9333 LearningRate 0.0530 Epoch: 5 Global Step: 225360 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:02:45,718-Speed 2627.55 samples/sec Loss 9.8278 LearningRate 0.0530 Epoch: 5 Global Step: 225370 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:02:49,621-Speed 2624.56 samples/sec Loss 9.9343 LearningRate 0.0530 Epoch: 5 Global Step: 225380 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:02:53,523-Speed 2625.16 samples/sec Loss 9.8656 LearningRate 0.0530 Epoch: 5 Global Step: 225390 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:02:57,420-Speed 2628.90 samples/sec Loss 9.8388 LearningRate 0.0530 Epoch: 5 Global Step: 225400 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:03:01,340-Speed 2612.36 samples/sec Loss 9.8457 LearningRate 0.0530 Epoch: 5 Global Step: 225410 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:03:05,237-Speed 2628.93 samples/sec Loss 9.8211 LearningRate 0.0530 Epoch: 5 Global Step: 225420 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:03:09,132-Speed 2629.18 samples/sec Loss 9.8868 LearningRate 0.0530 Epoch: 5 Global Step: 225430 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:03:12,988-Speed 2656.68 samples/sec Loss 9.7363 LearningRate 0.0530 Epoch: 5 Global Step: 225440 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:03:16,913-Speed 2609.60 samples/sec Loss 9.8877 LearningRate 0.0530 Epoch: 5 Global Step: 225450 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:03:20,807-Speed 2630.39 samples/sec Loss 9.8981 LearningRate 0.0530 Epoch: 5 Global Step: 225460 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:03:24,747-Speed 2600.11 samples/sec Loss 9.9329 LearningRate 0.0530 Epoch: 5 Global Step: 225470 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:03:28,644-Speed 2628.22 samples/sec Loss 9.9714 LearningRate 0.0530 Epoch: 5 Global Step: 225480 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:03:32,536-Speed 2631.45 samples/sec Loss 9.8181 LearningRate 0.0530 Epoch: 5 Global Step: 225490 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:03:36,429-Speed 2631.08 samples/sec Loss 9.8691 LearningRate 0.0530 Epoch: 5 Global Step: 225500 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:03:40,318-Speed 2634.10 samples/sec Loss 9.7142 LearningRate 0.0530 Epoch: 5 Global Step: 225510 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:03:44,208-Speed 2632.95 samples/sec Loss 9.8789 LearningRate 0.0530 Epoch: 5 Global Step: 225520 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:03:48,126-Speed 2614.20 samples/sec Loss 9.8394 LearningRate 0.0530 Epoch: 5 Global Step: 225530 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:03:52,018-Speed 2631.75 samples/sec Loss 9.7966 LearningRate 0.0530 Epoch: 5 Global Step: 225540 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:03:55,910-Speed 2631.85 samples/sec Loss 9.9231 LearningRate 0.0530 Epoch: 5 Global Step: 225550 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:03:59,808-Speed 2627.25 samples/sec Loss 9.9690 LearningRate 0.0530 Epoch: 5 Global Step: 225560 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:04:03,702-Speed 2630.39 samples/sec Loss 9.8700 LearningRate 0.0530 Epoch: 5 Global Step: 225570 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:04:07,601-Speed 2627.30 samples/sec Loss 9.9112 LearningRate 0.0530 Epoch: 5 Global Step: 225580 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:04:11,493-Speed 2632.14 samples/sec Loss 9.9762 LearningRate 0.0530 Epoch: 5 Global Step: 225590 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:04:15,387-Speed 2630.00 samples/sec Loss 9.9101 LearningRate 0.0530 Epoch: 5 Global Step: 225600 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:04:19,285-Speed 2627.89 samples/sec Loss 10.0678 LearningRate 0.0530 Epoch: 5 Global Step: 225610 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:04:23,181-Speed 2628.80 samples/sec Loss 9.9200 LearningRate 0.0530 Epoch: 5 Global Step: 225620 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:04:27,075-Speed 2630.70 samples/sec Loss 9.8766 LearningRate 0.0530 Epoch: 5 Global Step: 225630 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:04:30,970-Speed 2629.95 samples/sec Loss 9.8959 LearningRate 0.0530 Epoch: 5 Global Step: 225640 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:04:34,865-Speed 2629.63 samples/sec Loss 9.9377 LearningRate 0.0530 Epoch: 5 Global Step: 225650 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:04:38,756-Speed 2632.13 samples/sec Loss 9.8460 LearningRate 0.0530 Epoch: 5 Global Step: 225660 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:04:42,653-Speed 2628.90 samples/sec Loss 9.8461 LearningRate 0.0530 Epoch: 5 Global Step: 225670 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:04:46,541-Speed 2634.07 samples/sec Loss 9.9429 LearningRate 0.0530 Epoch: 5 Global Step: 225680 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:04:50,424-Speed 2637.60 samples/sec Loss 9.9131 LearningRate 0.0530 Epoch: 5 Global Step: 225690 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:04:54,320-Speed 2628.73 samples/sec Loss 9.9352 LearningRate 0.0530 Epoch: 5 Global Step: 225700 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:04:58,211-Speed 2632.66 samples/sec Loss 9.8361 LearningRate 0.0530 Epoch: 5 Global Step: 225710 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:05:02,106-Speed 2629.70 samples/sec Loss 9.9805 LearningRate 0.0530 Epoch: 5 Global Step: 225720 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:05:05,995-Speed 2633.96 samples/sec Loss 9.8376 LearningRate 0.0530 Epoch: 5 Global Step: 225730 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:05:09,883-Speed 2634.47 samples/sec Loss 9.8924 LearningRate 0.0530 Epoch: 5 Global Step: 225740 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:05:13,774-Speed 2632.01 samples/sec Loss 9.9249 LearningRate 0.0530 Epoch: 5 Global Step: 225750 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:05:17,664-Speed 2633.14 samples/sec Loss 9.8687 LearningRate 0.0530 Epoch: 5 Global Step: 225760 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:05:21,551-Speed 2635.27 samples/sec Loss 9.9754 LearningRate 0.0530 Epoch: 5 Global Step: 225770 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:05:25,441-Speed 2633.14 samples/sec Loss 9.9145 LearningRate 0.0530 Epoch: 5 Global Step: 225780 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:05:29,336-Speed 2629.71 samples/sec Loss 9.8998 LearningRate 0.0530 Epoch: 5 Global Step: 225790 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:05:33,229-Speed 2630.99 samples/sec Loss 9.9975 LearningRate 0.0530 Epoch: 5 Global Step: 225800 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:05:37,119-Speed 2632.58 samples/sec Loss 9.8822 LearningRate 0.0530 Epoch: 5 Global Step: 225810 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:05:41,013-Speed 2631.15 samples/sec Loss 9.8139 LearningRate 0.0530 Epoch: 5 Global Step: 225820 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:05:44,918-Speed 2622.97 samples/sec Loss 9.8483 LearningRate 0.0530 Epoch: 5 Global Step: 225830 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:05:48,808-Speed 2633.14 samples/sec Loss 9.8473 LearningRate 0.0530 Epoch: 5 Global Step: 225840 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:05:52,722-Speed 2616.98 samples/sec Loss 9.9438 LearningRate 0.0530 Epoch: 5 Global Step: 225850 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:05:56,614-Speed 2631.68 samples/sec Loss 9.9323 LearningRate 0.0530 Epoch: 5 Global Step: 225860 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:06:00,508-Speed 2630.22 samples/sec Loss 9.9712 LearningRate 0.0530 Epoch: 5 Global Step: 225870 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:06:04,406-Speed 2627.43 samples/sec Loss 10.0379 LearningRate 0.0530 Epoch: 5 Global Step: 225880 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:06:08,297-Speed 2632.55 samples/sec Loss 9.9290 LearningRate 0.0530 Epoch: 5 Global Step: 225890 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:06:12,200-Speed 2624.07 samples/sec Loss 9.8584 LearningRate 0.0530 Epoch: 5 Global Step: 225900 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:06:16,091-Speed 2632.95 samples/sec Loss 9.8207 LearningRate 0.0530 Epoch: 5 Global Step: 225910 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:06:19,992-Speed 2625.48 samples/sec Loss 9.7512 LearningRate 0.0530 Epoch: 5 Global Step: 225920 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:06:23,887-Speed 2629.88 samples/sec Loss 9.8351 LearningRate 0.0529 Epoch: 5 Global Step: 225930 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:06:27,783-Speed 2628.58 samples/sec Loss 9.9161 LearningRate 0.0529 Epoch: 5 Global Step: 225940 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:06:31,754-Speed 2580.17 samples/sec Loss 9.8685 LearningRate 0.0529 Epoch: 5 Global Step: 225950 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:06:35,641-Speed 2634.90 samples/sec Loss 9.8877 LearningRate 0.0529 Epoch: 5 Global Step: 225960 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:06:39,588-Speed 2594.90 samples/sec Loss 9.7178 LearningRate 0.0529 Epoch: 5 Global Step: 225970 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:06:43,545-Speed 2588.82 samples/sec Loss 9.8415 LearningRate 0.0529 Epoch: 5 Global Step: 225980 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:06:47,438-Speed 2631.17 samples/sec Loss 9.8302 LearningRate 0.0529 Epoch: 5 Global Step: 225990 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:06:51,329-Speed 2632.62 samples/sec Loss 9.6877 LearningRate 0.0529 Epoch: 5 Global Step: 226000 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:06:55,228-Speed 2626.70 samples/sec Loss 9.7771 LearningRate 0.0529 Epoch: 5 Global Step: 226010 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:06:59,135-Speed 2621.83 samples/sec Loss 9.7988 LearningRate 0.0529 Epoch: 5 Global Step: 226020 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:07:03,035-Speed 2626.72 samples/sec Loss 9.7365 LearningRate 0.0529 Epoch: 5 Global Step: 226030 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:07:06,942-Speed 2621.06 samples/sec Loss 9.7143 LearningRate 0.0529 Epoch: 5 Global Step: 226040 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:07:10,838-Speed 2629.51 samples/sec Loss 10.0356 LearningRate 0.0529 Epoch: 5 Global Step: 226050 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:07:14,727-Speed 2633.56 samples/sec Loss 9.9277 LearningRate 0.0529 Epoch: 5 Global Step: 226060 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:07:18,627-Speed 2626.51 samples/sec Loss 9.9412 LearningRate 0.0529 Epoch: 5 Global Step: 226070 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:07:22,508-Speed 2639.67 samples/sec Loss 9.8453 LearningRate 0.0529 Epoch: 5 Global Step: 226080 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:07:26,400-Speed 2631.58 samples/sec Loss 9.8129 LearningRate 0.0529 Epoch: 5 Global Step: 226090 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:07:30,295-Speed 2629.72 samples/sec Loss 9.8300 LearningRate 0.0529 Epoch: 5 Global Step: 226100 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:07:34,185-Speed 2633.29 samples/sec Loss 9.8265 LearningRate 0.0529 Epoch: 5 Global Step: 226110 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:07:38,077-Speed 2631.18 samples/sec Loss 9.8776 LearningRate 0.0529 Epoch: 5 Global Step: 226120 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:07:41,968-Speed 2632.03 samples/sec Loss 9.8815 LearningRate 0.0529 Epoch: 5 Global Step: 226130 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:07:45,873-Speed 2623.55 samples/sec Loss 10.0422 LearningRate 0.0529 Epoch: 5 Global Step: 226140 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:07:49,775-Speed 2624.61 samples/sec Loss 9.7725 LearningRate 0.0529 Epoch: 5 Global Step: 226150 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:07:53,684-Speed 2620.85 samples/sec Loss 9.8758 LearningRate 0.0529 Epoch: 5 Global Step: 226160 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:07:57,576-Speed 2631.48 samples/sec Loss 9.9495 LearningRate 0.0529 Epoch: 5 Global Step: 226170 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:08:01,473-Speed 2628.09 samples/sec Loss 9.8432 LearningRate 0.0529 Epoch: 5 Global Step: 226180 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:08:05,365-Speed 2632.09 samples/sec Loss 9.7617 LearningRate 0.0529 Epoch: 5 Global Step: 226190 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:08:09,267-Speed 2624.71 samples/sec Loss 9.8240 LearningRate 0.0529 Epoch: 5 Global Step: 226200 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:08:13,161-Speed 2630.43 samples/sec Loss 9.8898 LearningRate 0.0529 Epoch: 5 Global Step: 226210 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:08:17,054-Speed 2630.93 samples/sec Loss 9.9547 LearningRate 0.0529 Epoch: 5 Global Step: 226220 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:08:20,950-Speed 2628.73 samples/sec Loss 9.8866 LearningRate 0.0529 Epoch: 5 Global Step: 226230 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:08:24,842-Speed 2631.83 samples/sec Loss 9.7861 LearningRate 0.0529 Epoch: 5 Global Step: 226240 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:08:28,735-Speed 2631.95 samples/sec Loss 9.7419 LearningRate 0.0529 Epoch: 5 Global Step: 226250 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:08:32,628-Speed 2631.09 samples/sec Loss 9.9339 LearningRate 0.0529 Epoch: 5 Global Step: 226260 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:08:36,530-Speed 2624.71 samples/sec Loss 9.6784 LearningRate 0.0529 Epoch: 5 Global Step: 226270 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:08:40,422-Speed 2631.50 samples/sec Loss 9.8624 LearningRate 0.0529 Epoch: 5 Global Step: 226280 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:08:44,319-Speed 2628.61 samples/sec Loss 9.9416 LearningRate 0.0529 Epoch: 5 Global Step: 226290 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:08:48,196-Speed 2641.74 samples/sec Loss 10.0205 LearningRate 0.0529 Epoch: 5 Global Step: 226300 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:08:52,096-Speed 2626.74 samples/sec Loss 9.8431 LearningRate 0.0529 Epoch: 5 Global Step: 226310 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:08:55,992-Speed 2628.53 samples/sec Loss 9.9061 LearningRate 0.0529 Epoch: 5 Global Step: 226320 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:08:59,919-Speed 2607.97 samples/sec Loss 9.9165 LearningRate 0.0529 Epoch: 5 Global Step: 226330 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:09:03,815-Speed 2629.62 samples/sec Loss 9.7797 LearningRate 0.0529 Epoch: 5 Global Step: 226340 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:09:07,724-Speed 2620.20 samples/sec Loss 9.9701 LearningRate 0.0529 Epoch: 5 Global Step: 226350 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:09:11,641-Speed 2614.52 samples/sec Loss 9.9066 LearningRate 0.0529 Epoch: 5 Global Step: 226360 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:09:15,557-Speed 2615.70 samples/sec Loss 9.8721 LearningRate 0.0529 Epoch: 5 Global Step: 226370 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:09:19,519-Speed 2585.88 samples/sec Loss 9.7727 LearningRate 0.0529 Epoch: 5 Global Step: 226380 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:09:23,429-Speed 2619.42 samples/sec Loss 9.9427 LearningRate 0.0529 Epoch: 5 Global Step: 226390 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:09:27,326-Speed 2628.44 samples/sec Loss 9.8236 LearningRate 0.0529 Epoch: 5 Global Step: 226400 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:09:31,210-Speed 2636.69 samples/sec Loss 10.0187 LearningRate 0.0529 Epoch: 5 Global Step: 226410 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:09:35,106-Speed 2628.89 samples/sec Loss 9.7853 LearningRate 0.0529 Epoch: 5 Global Step: 226420 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:09:39,005-Speed 2627.01 samples/sec Loss 9.8686 LearningRate 0.0529 Epoch: 5 Global Step: 226430 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:09:42,906-Speed 2625.55 samples/sec Loss 9.9708 LearningRate 0.0529 Epoch: 5 Global Step: 226440 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:09:46,809-Speed 2624.54 samples/sec Loss 9.9198 LearningRate 0.0529 Epoch: 5 Global Step: 226450 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:09:50,699-Speed 2633.34 samples/sec Loss 9.8728 LearningRate 0.0529 Epoch: 5 Global Step: 226460 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:09:54,589-Speed 2632.78 samples/sec Loss 9.8466 LearningRate 0.0529 Epoch: 5 Global Step: 226470 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:09:58,478-Speed 2633.74 samples/sec Loss 9.9398 LearningRate 0.0529 Epoch: 5 Global Step: 226480 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:10:02,373-Speed 2629.55 samples/sec Loss 9.9029 LearningRate 0.0529 Epoch: 5 Global Step: 226490 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:10:06,269-Speed 2629.28 samples/sec Loss 9.8788 LearningRate 0.0528 Epoch: 5 Global Step: 226500 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:10:10,159-Speed 2632.61 samples/sec Loss 9.7614 LearningRate 0.0528 Epoch: 5 Global Step: 226510 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:10:14,061-Speed 2625.32 samples/sec Loss 9.9428 LearningRate 0.0528 Epoch: 5 Global Step: 226520 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:10:17,957-Speed 2629.45 samples/sec Loss 9.8262 LearningRate 0.0528 Epoch: 5 Global Step: 226530 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:10:21,847-Speed 2632.87 samples/sec Loss 9.8847 LearningRate 0.0528 Epoch: 5 Global Step: 226540 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:10:25,737-Speed 2633.25 samples/sec Loss 9.9376 LearningRate 0.0528 Epoch: 5 Global Step: 226550 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:10:29,629-Speed 2631.39 samples/sec Loss 9.9114 LearningRate 0.0528 Epoch: 5 Global Step: 226560 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:10:33,521-Speed 2631.74 samples/sec Loss 9.7601 LearningRate 0.0528 Epoch: 5 Global Step: 226570 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:10:37,431-Speed 2619.55 samples/sec Loss 9.9706 LearningRate 0.0528 Epoch: 5 Global Step: 226580 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:10:41,323-Speed 2632.07 samples/sec Loss 9.8925 LearningRate 0.0528 Epoch: 5 Global Step: 226590 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:10:45,220-Speed 2628.91 samples/sec Loss 9.7990 LearningRate 0.0528 Epoch: 5 Global Step: 226600 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:10:49,114-Speed 2630.77 samples/sec Loss 9.9947 LearningRate 0.0528 Epoch: 5 Global Step: 226610 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:10:53,004-Speed 2632.86 samples/sec Loss 9.8000 LearningRate 0.0528 Epoch: 5 Global Step: 226620 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:10:56,880-Speed 2642.67 samples/sec Loss 9.8742 LearningRate 0.0528 Epoch: 5 Global Step: 226630 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:11:00,779-Speed 2626.47 samples/sec Loss 9.9762 LearningRate 0.0528 Epoch: 5 Global Step: 226640 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:11:04,675-Speed 2629.27 samples/sec Loss 9.7696 LearningRate 0.0528 Epoch: 5 Global Step: 226650 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:11:08,571-Speed 2628.39 samples/sec Loss 9.8425 LearningRate 0.0528 Epoch: 5 Global Step: 226660 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:11:12,468-Speed 2628.77 samples/sec Loss 9.8359 LearningRate 0.0528 Epoch: 5 Global Step: 226670 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:11:16,362-Speed 2630.07 samples/sec Loss 9.8227 LearningRate 0.0528 Epoch: 5 Global Step: 226680 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:11:20,252-Speed 2633.43 samples/sec Loss 9.8461 LearningRate 0.0528 Epoch: 5 Global Step: 226690 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:11:24,141-Speed 2633.65 samples/sec Loss 9.9862 LearningRate 0.0528 Epoch: 5 Global Step: 226700 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:11:28,032-Speed 2632.02 samples/sec Loss 10.0625 LearningRate 0.0528 Epoch: 5 Global Step: 226710 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:11:31,927-Speed 2629.98 samples/sec Loss 9.9421 LearningRate 0.0528 Epoch: 5 Global Step: 226720 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:11:35,817-Speed 2632.37 samples/sec Loss 9.7232 LearningRate 0.0528 Epoch: 5 Global Step: 226730 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:11:39,711-Speed 2630.17 samples/sec Loss 10.0695 LearningRate 0.0528 Epoch: 5 Global Step: 226740 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:11:43,584-Speed 2645.29 samples/sec Loss 9.9310 LearningRate 0.0528 Epoch: 5 Global Step: 226750 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:11:47,473-Speed 2633.82 samples/sec Loss 9.6964 LearningRate 0.0528 Epoch: 5 Global Step: 226760 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:11:51,365-Speed 2631.83 samples/sec Loss 9.8832 LearningRate 0.0528 Epoch: 5 Global Step: 226770 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:11:55,258-Speed 2631.12 samples/sec Loss 9.8677 LearningRate 0.0528 Epoch: 5 Global Step: 226780 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:11:59,149-Speed 2632.47 samples/sec Loss 9.7987 LearningRate 0.0528 Epoch: 5 Global Step: 226790 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:12:03,040-Speed 2632.56 samples/sec Loss 9.8798 LearningRate 0.0528 Epoch: 5 Global Step: 226800 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:12:06,930-Speed 2632.78 samples/sec Loss 9.7775 LearningRate 0.0528 Epoch: 5 Global Step: 226810 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:12:10,832-Speed 2625.10 samples/sec Loss 9.8182 LearningRate 0.0528 Epoch: 5 Global Step: 226820 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:12:14,723-Speed 2632.24 samples/sec Loss 9.8398 LearningRate 0.0528 Epoch: 5 Global Step: 226830 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:12:18,619-Speed 2629.17 samples/sec Loss 9.8796 LearningRate 0.0528 Epoch: 5 Global Step: 226840 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:12:22,515-Speed 2628.94 samples/sec Loss 9.8380 LearningRate 0.0528 Epoch: 5 Global Step: 226850 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:12:26,415-Speed 2626.10 samples/sec Loss 9.8218 LearningRate 0.0528 Epoch: 5 Global Step: 226860 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:12:30,305-Speed 2633.15 samples/sec Loss 9.8286 LearningRate 0.0528 Epoch: 5 Global Step: 226870 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:12:34,205-Speed 2626.46 samples/sec Loss 9.8502 LearningRate 0.0528 Epoch: 5 Global Step: 226880 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:12:38,102-Speed 2628.73 samples/sec Loss 9.8977 LearningRate 0.0528 Epoch: 5 Global Step: 226890 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:12:42,121-Speed 2548.61 samples/sec Loss 9.8004 LearningRate 0.0528 Epoch: 5 Global Step: 226900 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:12:46,032-Speed 2618.43 samples/sec Loss 9.8753 LearningRate 0.0528 Epoch: 5 Global Step: 226910 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:12:49,924-Speed 2632.02 samples/sec Loss 9.9656 LearningRate 0.0528 Epoch: 5 Global Step: 226920 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:12:53,813-Speed 2633.81 samples/sec Loss 9.8443 LearningRate 0.0528 Epoch: 5 Global Step: 226930 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:12:57,692-Speed 2640.92 samples/sec Loss 9.8162 LearningRate 0.0528 Epoch: 5 Global Step: 226940 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:13:01,578-Speed 2635.51 samples/sec Loss 9.7680 LearningRate 0.0528 Epoch: 5 Global Step: 226950 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:13:05,470-Speed 2631.37 samples/sec Loss 9.8323 LearningRate 0.0528 Epoch: 5 Global Step: 226960 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:13:09,364-Speed 2630.27 samples/sec Loss 9.8226 LearningRate 0.0528 Epoch: 5 Global Step: 226970 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:13:13,262-Speed 2627.43 samples/sec Loss 9.8559 LearningRate 0.0528 Epoch: 5 Global Step: 226980 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:13:17,161-Speed 2627.36 samples/sec Loss 9.8507 LearningRate 0.0528 Epoch: 5 Global Step: 226990 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:13:21,059-Speed 2627.46 samples/sec Loss 9.8497 LearningRate 0.0528 Epoch: 5 Global Step: 227000 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:13:24,953-Speed 2630.76 samples/sec Loss 9.7569 LearningRate 0.0528 Epoch: 5 Global Step: 227010 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:13:28,844-Speed 2632.06 samples/sec Loss 9.9044 LearningRate 0.0528 Epoch: 5 Global Step: 227020 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:13:32,736-Speed 2632.22 samples/sec Loss 9.7592 LearningRate 0.0528 Epoch: 5 Global Step: 227030 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:13:36,627-Speed 2632.06 samples/sec Loss 9.7386 LearningRate 0.0528 Epoch: 5 Global Step: 227040 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:13:40,518-Speed 2632.34 samples/sec Loss 9.8670 LearningRate 0.0528 Epoch: 5 Global Step: 227050 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:13:44,409-Speed 2631.85 samples/sec Loss 9.8762 LearningRate 0.0528 Epoch: 5 Global Step: 227060 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:13:48,299-Speed 2633.31 samples/sec Loss 9.8265 LearningRate 0.0527 Epoch: 5 Global Step: 227070 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:13:52,183-Speed 2637.32 samples/sec Loss 9.8186 LearningRate 0.0527 Epoch: 5 Global Step: 227080 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:13:56,067-Speed 2637.44 samples/sec Loss 9.8738 LearningRate 0.0527 Epoch: 5 Global Step: 227090 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:13:59,956-Speed 2633.74 samples/sec Loss 9.8987 LearningRate 0.0527 Epoch: 5 Global Step: 227100 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:14:03,848-Speed 2632.19 samples/sec Loss 9.8156 LearningRate 0.0527 Epoch: 5 Global Step: 227110 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:14:07,745-Speed 2628.05 samples/sec Loss 9.8858 LearningRate 0.0527 Epoch: 5 Global Step: 227120 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:14:11,653-Speed 2620.72 samples/sec Loss 9.7881 LearningRate 0.0527 Epoch: 5 Global Step: 227130 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:14:15,578-Speed 2609.20 samples/sec Loss 9.9087 LearningRate 0.0527 Epoch: 5 Global Step: 227140 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:14:19,491-Speed 2617.90 samples/sec Loss 9.7901 LearningRate 0.0527 Epoch: 5 Global Step: 227150 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:14:23,398-Speed 2621.86 samples/sec Loss 9.8208 LearningRate 0.0527 Epoch: 5 Global Step: 227160 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:14:27,356-Speed 2587.63 samples/sec Loss 9.7692 LearningRate 0.0527 Epoch: 5 Global Step: 227170 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:14:31,265-Speed 2620.53 samples/sec Loss 9.9342 LearningRate 0.0527 Epoch: 5 Global Step: 227180 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:14:35,172-Speed 2621.15 samples/sec Loss 9.9150 LearningRate 0.0527 Epoch: 5 Global Step: 227190 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:14:39,071-Speed 2627.58 samples/sec Loss 9.8596 LearningRate 0.0527 Epoch: 5 Global Step: 227200 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:14:42,969-Speed 2627.63 samples/sec Loss 9.7946 LearningRate 0.0527 Epoch: 5 Global Step: 227210 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:14:46,863-Speed 2629.75 samples/sec Loss 9.8059 LearningRate 0.0527 Epoch: 5 Global Step: 227220 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:14:50,758-Speed 2630.40 samples/sec Loss 9.9264 LearningRate 0.0527 Epoch: 5 Global Step: 227230 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:14:54,643-Speed 2636.14 samples/sec Loss 9.8543 LearningRate 0.0527 Epoch: 5 Global Step: 227240 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:14:58,532-Speed 2633.59 samples/sec Loss 9.9215 LearningRate 0.0527 Epoch: 5 Global Step: 227250 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:15:02,429-Speed 2627.94 samples/sec Loss 9.7911 LearningRate 0.0527 Epoch: 5 Global Step: 227260 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:15:06,325-Speed 2629.17 samples/sec Loss 9.8505 LearningRate 0.0527 Epoch: 5 Global Step: 227270 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:15:10,220-Speed 2629.82 samples/sec Loss 10.0209 LearningRate 0.0527 Epoch: 5 Global Step: 227280 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:15:14,120-Speed 2626.70 samples/sec Loss 9.8580 LearningRate 0.0527 Epoch: 5 Global Step: 227290 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:15:18,024-Speed 2624.06 samples/sec Loss 9.8922 LearningRate 0.0527 Epoch: 5 Global Step: 227300 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:15:21,920-Speed 2628.72 samples/sec Loss 9.8653 LearningRate 0.0527 Epoch: 5 Global Step: 227310 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:15:25,815-Speed 2629.91 samples/sec Loss 9.8166 LearningRate 0.0527 Epoch: 5 Global Step: 227320 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:15:29,713-Speed 2627.25 samples/sec Loss 9.9071 LearningRate 0.0527 Epoch: 5 Global Step: 227330 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:15:33,777-Speed 2520.19 samples/sec Loss 9.7583 LearningRate 0.0527 Epoch: 5 Global Step: 227340 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:15:37,852-Speed 2513.60 samples/sec Loss 9.8765 LearningRate 0.0527 Epoch: 5 Global Step: 227350 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:15:41,818-Speed 2582.56 samples/sec Loss 9.8323 LearningRate 0.0527 Epoch: 5 Global Step: 227360 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:15:45,697-Speed 2640.61 samples/sec Loss 9.9044 LearningRate 0.0527 Epoch: 5 Global Step: 227370 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:15:49,590-Speed 2631.26 samples/sec Loss 9.8999 LearningRate 0.0527 Epoch: 5 Global Step: 227380 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:15:53,485-Speed 2629.35 samples/sec Loss 9.9318 LearningRate 0.0527 Epoch: 5 Global Step: 227390 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:15:57,388-Speed 2625.19 samples/sec Loss 9.8124 LearningRate 0.0527 Epoch: 5 Global Step: 227400 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:16:01,286-Speed 2627.16 samples/sec Loss 9.8219 LearningRate 0.0527 Epoch: 5 Global Step: 227410 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:16:05,140-Speed 2657.81 samples/sec Loss 9.9917 LearningRate 0.0527 Epoch: 5 Global Step: 227420 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 21:16:09,005-Speed 2649.58 samples/sec Loss 10.1925 LearningRate 0.0527 Epoch: 5 Global Step: 227430 Fp16 Grad Scale: 4096 Required: 68 hours
Training: 2022-04-13 21:16:12,896-Speed 2633.00 samples/sec Loss 10.5108 LearningRate 0.0527 Epoch: 5 Global Step: 227440 Fp16 Grad Scale: 4096 Required: 68 hours
Training: 2022-04-13 21:16:16,792-Speed 2628.98 samples/sec Loss 10.2685 LearningRate 0.0527 Epoch: 5 Global Step: 227450 Fp16 Grad Scale: 4096 Required: 68 hours
Training: 2022-04-13 21:16:20,685-Speed 2631.04 samples/sec Loss 10.4492 LearningRate 0.0527 Epoch: 5 Global Step: 227460 Fp16 Grad Scale: 4096 Required: 68 hours
Training: 2022-04-13 21:16:24,626-Speed 2599.28 samples/sec Loss 10.1342 LearningRate 0.0527 Epoch: 5 Global Step: 227470 Fp16 Grad Scale: 4096 Required: 68 hours
Training: 2022-04-13 21:16:28,516-Speed 2633.67 samples/sec Loss 10.1101 LearningRate 0.0527 Epoch: 5 Global Step: 227480 Fp16 Grad Scale: 4096 Required: 68 hours
Training: 2022-04-13 21:16:32,407-Speed 2632.12 samples/sec Loss 10.1559 LearningRate 0.0527 Epoch: 5 Global Step: 227490 Fp16 Grad Scale: 4096 Required: 68 hours
Training: 2022-04-13 21:16:36,294-Speed 2634.81 samples/sec Loss 9.9477 LearningRate 0.0527 Epoch: 5 Global Step: 227500 Fp16 Grad Scale: 4096 Required: 68 hours
Training: 2022-04-13 21:16:40,190-Speed 2629.05 samples/sec Loss 10.1452 LearningRate 0.0527 Epoch: 5 Global Step: 227510 Fp16 Grad Scale: 4096 Required: 68 hours
Training: 2022-04-13 21:16:44,080-Speed 2633.17 samples/sec Loss 10.2475 LearningRate 0.0527 Epoch: 5 Global Step: 227520 Fp16 Grad Scale: 4096 Required: 68 hours
Training: 2022-04-13 21:16:47,984-Speed 2623.78 samples/sec Loss 10.0801 LearningRate 0.0527 Epoch: 5 Global Step: 227530 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 21:16:51,873-Speed 2633.71 samples/sec Loss 9.8441 LearningRate 0.0527 Epoch: 5 Global Step: 227540 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 21:16:55,765-Speed 2632.57 samples/sec Loss 10.0396 LearningRate 0.0527 Epoch: 5 Global Step: 227550 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 21:16:59,658-Speed 2630.51 samples/sec Loss 10.1784 LearningRate 0.0527 Epoch: 5 Global Step: 227560 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 21:17:03,564-Speed 2622.63 samples/sec Loss 10.0526 LearningRate 0.0527 Epoch: 5 Global Step: 227570 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 21:17:07,461-Speed 2627.92 samples/sec Loss 9.8167 LearningRate 0.0527 Epoch: 5 Global Step: 227580 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 21:17:11,352-Speed 2633.81 samples/sec Loss 9.9052 LearningRate 0.0527 Epoch: 5 Global Step: 227590 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 21:17:15,244-Speed 2631.38 samples/sec Loss 9.8902 LearningRate 0.0527 Epoch: 5 Global Step: 227600 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 21:17:19,139-Speed 2630.05 samples/sec Loss 9.9874 LearningRate 0.0527 Epoch: 5 Global Step: 227610 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 21:17:23,027-Speed 2633.76 samples/sec Loss 9.9426 LearningRate 0.0527 Epoch: 5 Global Step: 227620 Fp16 Grad Scale: 8192 Required: 68 hours
Training: 2022-04-13 21:17:26,917-Speed 2633.93 samples/sec Loss 9.7821 LearningRate 0.0527 Epoch: 5 Global Step: 227630 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 21:17:30,809-Speed 2631.47 samples/sec Loss 9.8416 LearningRate 0.0526 Epoch: 5 Global Step: 227640 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 21:17:34,825-Speed 2549.74 samples/sec Loss 9.8864 LearningRate 0.0526 Epoch: 5 Global Step: 227650 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 21:17:38,922-Speed 2500.28 samples/sec Loss 9.9466 LearningRate 0.0526 Epoch: 5 Global Step: 227660 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 21:17:42,809-Speed 2635.01 samples/sec Loss 9.8985 LearningRate 0.0526 Epoch: 5 Global Step: 227670 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 21:17:46,699-Speed 2633.02 samples/sec Loss 9.8022 LearningRate 0.0526 Epoch: 5 Global Step: 227680 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 21:17:50,669-Speed 2580.38 samples/sec Loss 9.9261 LearningRate 0.0526 Epoch: 5 Global Step: 227690 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 21:17:54,556-Speed 2635.28 samples/sec Loss 9.9278 LearningRate 0.0526 Epoch: 5 Global Step: 227700 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 21:17:58,464-Speed 2621.61 samples/sec Loss 9.9068 LearningRate 0.0526 Epoch: 5 Global Step: 227710 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 21:18:02,356-Speed 2631.16 samples/sec Loss 9.8973 LearningRate 0.0526 Epoch: 5 Global Step: 227720 Fp16 Grad Scale: 16384 Required: 68 hours
Training: 2022-04-13 21:18:06,256-Speed 2626.42 samples/sec Loss 10.0978 LearningRate 0.0526 Epoch: 5 Global Step: 227730 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 21:18:10,149-Speed 2630.78 samples/sec Loss 9.8081 LearningRate 0.0526 Epoch: 5 Global Step: 227740 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 21:18:14,044-Speed 2629.73 samples/sec Loss 9.9210 LearningRate 0.0526 Epoch: 5 Global Step: 227750 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 21:18:17,928-Speed 2637.23 samples/sec Loss 9.7745 LearningRate 0.0526 Epoch: 5 Global Step: 227760 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 21:18:21,820-Speed 2631.55 samples/sec Loss 9.8985 LearningRate 0.0526 Epoch: 5 Global Step: 227770 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 21:18:25,721-Speed 2626.06 samples/sec Loss 9.7383 LearningRate 0.0526 Epoch: 5 Global Step: 227780 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 21:18:29,627-Speed 2622.39 samples/sec Loss 9.9062 LearningRate 0.0526 Epoch: 5 Global Step: 227790 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 21:18:33,525-Speed 2627.52 samples/sec Loss 9.9404 LearningRate 0.0526 Epoch: 5 Global Step: 227800 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 21:18:37,429-Speed 2623.10 samples/sec Loss 9.8179 LearningRate 0.0526 Epoch: 5 Global Step: 227810 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 21:18:41,329-Speed 2626.12 samples/sec Loss 9.7933 LearningRate 0.0526 Epoch: 5 Global Step: 227820 Fp16 Grad Scale: 32768 Required: 68 hours
Training: 2022-04-13 21:18:45,233-Speed 2623.48 samples/sec Loss 9.9860 LearningRate 0.0526 Epoch: 5 Global Step: 227830 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:18:49,129-Speed 2629.89 samples/sec Loss 9.9819 LearningRate 0.0526 Epoch: 5 Global Step: 227840 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:18:53,023-Speed 2630.06 samples/sec Loss 9.7907 LearningRate 0.0526 Epoch: 5 Global Step: 227850 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:18:57,014-Speed 2566.45 samples/sec Loss 9.9148 LearningRate 0.0526 Epoch: 5 Global Step: 227860 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:19:01,028-Speed 2551.87 samples/sec Loss 9.9249 LearningRate 0.0526 Epoch: 5 Global Step: 227870 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:19:05,059-Speed 2540.89 samples/sec Loss 9.9128 LearningRate 0.0526 Epoch: 5 Global Step: 227880 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:19:08,954-Speed 2630.30 samples/sec Loss 9.8933 LearningRate 0.0526 Epoch: 5 Global Step: 227890 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:19:12,970-Speed 2549.98 samples/sec Loss 10.0497 LearningRate 0.0526 Epoch: 5 Global Step: 227900 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:19:16,875-Speed 2623.12 samples/sec Loss 9.9188 LearningRate 0.0526 Epoch: 5 Global Step: 227910 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:19:20,773-Speed 2627.02 samples/sec Loss 9.8926 LearningRate 0.0526 Epoch: 5 Global Step: 227920 Fp16 Grad Scale: 65536 Required: 68 hours
Training: 2022-04-13 21:19:24,665-Speed 2632.19 samples/sec Loss 9.9236 LearningRate 0.0526 Epoch: 5 Global Step: 227930 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:19:28,556-Speed 2631.95 samples/sec Loss 9.8049 LearningRate 0.0526 Epoch: 5 Global Step: 227940 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:19:32,447-Speed 2633.26 samples/sec Loss 10.0675 LearningRate 0.0526 Epoch: 5 Global Step: 227950 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:19:36,338-Speed 2632.20 samples/sec Loss 9.8432 LearningRate 0.0526 Epoch: 5 Global Step: 227960 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:19:40,262-Speed 2610.09 samples/sec Loss 9.8670 LearningRate 0.0526 Epoch: 5 Global Step: 227970 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:19:44,157-Speed 2629.07 samples/sec Loss 9.8002 LearningRate 0.0526 Epoch: 5 Global Step: 227980 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:19:48,051-Speed 2630.55 samples/sec Loss 9.7892 LearningRate 0.0526 Epoch: 5 Global Step: 227990 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:19:51,943-Speed 2631.40 samples/sec Loss 10.0635 LearningRate 0.0526 Epoch: 5 Global Step: 228000 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:19:55,835-Speed 2632.06 samples/sec Loss 9.8422 LearningRate 0.0526 Epoch: 5 Global Step: 228010 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:19:59,732-Speed 2628.49 samples/sec Loss 9.6603 LearningRate 0.0526 Epoch: 5 Global Step: 228020 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:20:03,661-Speed 2607.10 samples/sec Loss 9.8730 LearningRate 0.0526 Epoch: 5 Global Step: 228030 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:20:07,557-Speed 2628.83 samples/sec Loss 9.8153 LearningRate 0.0526 Epoch: 5 Global Step: 228040 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:20:11,447-Speed 2633.44 samples/sec Loss 9.8006 LearningRate 0.0526 Epoch: 5 Global Step: 228050 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:20:15,339-Speed 2630.98 samples/sec Loss 9.8018 LearningRate 0.0526 Epoch: 5 Global Step: 228060 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:20:19,215-Speed 2643.73 samples/sec Loss 9.7958 LearningRate 0.0526 Epoch: 5 Global Step: 228070 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:20:23,105-Speed 2632.39 samples/sec Loss 9.8084 LearningRate 0.0526 Epoch: 5 Global Step: 228080 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:20:26,999-Speed 2630.59 samples/sec Loss 9.9294 LearningRate 0.0526 Epoch: 5 Global Step: 228090 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:20:30,891-Speed 2631.47 samples/sec Loss 9.8010 LearningRate 0.0526 Epoch: 5 Global Step: 228100 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:20:34,782-Speed 2632.42 samples/sec Loss 9.8490 LearningRate 0.0526 Epoch: 5 Global Step: 228110 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:20:38,673-Speed 2632.80 samples/sec Loss 9.7908 LearningRate 0.0526 Epoch: 5 Global Step: 228120 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:20:42,567-Speed 2630.35 samples/sec Loss 9.8342 LearningRate 0.0526 Epoch: 5 Global Step: 228130 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:20:46,498-Speed 2605.87 samples/sec Loss 9.8004 LearningRate 0.0526 Epoch: 5 Global Step: 228140 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:20:50,394-Speed 2629.06 samples/sec Loss 9.7694 LearningRate 0.0526 Epoch: 5 Global Step: 228150 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:20:54,291-Speed 2628.44 samples/sec Loss 9.8967 LearningRate 0.0526 Epoch: 5 Global Step: 228160 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:20:58,190-Speed 2626.96 samples/sec Loss 9.8275 LearningRate 0.0526 Epoch: 5 Global Step: 228170 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:21:02,097-Speed 2622.04 samples/sec Loss 9.9173 LearningRate 0.0526 Epoch: 5 Global Step: 228180 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:21:06,094-Speed 2562.13 samples/sec Loss 9.8699 LearningRate 0.0526 Epoch: 5 Global Step: 228190 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:21:09,993-Speed 2627.56 samples/sec Loss 9.8566 LearningRate 0.0526 Epoch: 5 Global Step: 228200 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:21:13,871-Speed 2641.62 samples/sec Loss 9.7890 LearningRate 0.0525 Epoch: 5 Global Step: 228210 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:21:17,775-Speed 2622.98 samples/sec Loss 9.7980 LearningRate 0.0525 Epoch: 5 Global Step: 228220 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:21:21,672-Speed 2628.44 samples/sec Loss 9.9529 LearningRate 0.0525 Epoch: 5 Global Step: 228230 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:21:25,605-Speed 2604.22 samples/sec Loss 9.8907 LearningRate 0.0525 Epoch: 5 Global Step: 228240 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:21:29,500-Speed 2630.56 samples/sec Loss 9.8995 LearningRate 0.0525 Epoch: 5 Global Step: 228250 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:21:33,394-Speed 2629.68 samples/sec Loss 9.8172 LearningRate 0.0525 Epoch: 5 Global Step: 228260 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:21:37,287-Speed 2630.94 samples/sec Loss 9.8757 LearningRate 0.0525 Epoch: 5 Global Step: 228270 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:21:41,180-Speed 2630.91 samples/sec Loss 9.7587 LearningRate 0.0525 Epoch: 5 Global Step: 228280 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:21:45,084-Speed 2623.87 samples/sec Loss 9.7156 LearningRate 0.0525 Epoch: 5 Global Step: 228290 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:21:48,992-Speed 2621.27 samples/sec Loss 9.6410 LearningRate 0.0525 Epoch: 5 Global Step: 228300 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:21:52,900-Speed 2621.28 samples/sec Loss 9.7655 LearningRate 0.0525 Epoch: 5 Global Step: 228310 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:21:56,793-Speed 2631.12 samples/sec Loss 9.9339 LearningRate 0.0525 Epoch: 5 Global Step: 228320 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:22:00,688-Speed 2629.41 samples/sec Loss 9.8453 LearningRate 0.0525 Epoch: 5 Global Step: 228330 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:22:04,592-Speed 2623.00 samples/sec Loss 9.7382 LearningRate 0.0525 Epoch: 5 Global Step: 228340 Fp16 Grad Scale: 262144 Required: 68 hours
Training: 2022-04-13 21:22:08,470-Speed 2641.15 samples/sec Loss 9.6318 LearningRate 0.0525 Epoch: 5 Global Step: 228350 Fp16 Grad Scale: 131072 Required: 68 hours
Training: 2022-04-13 21:22:12,368-Speed 2627.97 samples/sec Loss 9.8867 LearningRate 0.0525 Epoch: 5 Global Step: 228360 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:22:16,258-Speed 2632.81 samples/sec Loss 9.7298 LearningRate 0.0525 Epoch: 5 Global Step: 228370 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:22:20,158-Speed 2627.00 samples/sec Loss 9.8544 LearningRate 0.0525 Epoch: 5 Global Step: 228380 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:22:24,053-Speed 2629.02 samples/sec Loss 9.8291 LearningRate 0.0525 Epoch: 5 Global Step: 228390 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:22:27,946-Speed 2631.82 samples/sec Loss 9.7749 LearningRate 0.0525 Epoch: 5 Global Step: 228400 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:22:31,834-Speed 2634.16 samples/sec Loss 9.9054 LearningRate 0.0525 Epoch: 5 Global Step: 228410 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:22:35,727-Speed 2631.33 samples/sec Loss 9.8730 LearningRate 0.0525 Epoch: 5 Global Step: 228420 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:22:39,625-Speed 2627.05 samples/sec Loss 9.8657 LearningRate 0.0525 Epoch: 5 Global Step: 228430 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:22:43,519-Speed 2630.18 samples/sec Loss 9.8124 LearningRate 0.0525 Epoch: 5 Global Step: 228440 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:22:47,423-Speed 2623.76 samples/sec Loss 9.8531 LearningRate 0.0525 Epoch: 5 Global Step: 228450 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:22:51,303-Speed 2640.03 samples/sec Loss 9.9762 LearningRate 0.0525 Epoch: 5 Global Step: 228460 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:22:55,195-Speed 2631.82 samples/sec Loss 9.9897 LearningRate 0.0525 Epoch: 5 Global Step: 228470 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:22:59,093-Speed 2627.47 samples/sec Loss 9.8956 LearningRate 0.0525 Epoch: 5 Global Step: 228480 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:23:02,995-Speed 2624.77 samples/sec Loss 9.8224 LearningRate 0.0525 Epoch: 5 Global Step: 228490 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:23:06,888-Speed 2630.82 samples/sec Loss 9.9958 LearningRate 0.0525 Epoch: 5 Global Step: 228500 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:23:10,781-Speed 2631.12 samples/sec Loss 9.7713 LearningRate 0.0525 Epoch: 5 Global Step: 228510 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:23:14,674-Speed 2631.17 samples/sec Loss 9.7802 LearningRate 0.0525 Epoch: 5 Global Step: 228520 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:23:18,576-Speed 2624.56 samples/sec Loss 9.8220 LearningRate 0.0525 Epoch: 5 Global Step: 228530 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:23:22,470-Speed 2630.17 samples/sec Loss 9.7678 LearningRate 0.0525 Epoch: 5 Global Step: 228540 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:23:26,362-Speed 2631.66 samples/sec Loss 9.8270 LearningRate 0.0525 Epoch: 5 Global Step: 228550 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:23:30,255-Speed 2631.22 samples/sec Loss 9.8184 LearningRate 0.0525 Epoch: 5 Global Step: 228560 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:23:34,150-Speed 2630.15 samples/sec Loss 9.8497 LearningRate 0.0525 Epoch: 5 Global Step: 228570 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:23:38,043-Speed 2630.35 samples/sec Loss 9.6895 LearningRate 0.0525 Epoch: 5 Global Step: 228580 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:23:41,935-Speed 2632.11 samples/sec Loss 9.8083 LearningRate 0.0525 Epoch: 5 Global Step: 228590 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:23:45,825-Speed 2632.62 samples/sec Loss 9.8014 LearningRate 0.0525 Epoch: 5 Global Step: 228600 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:23:49,719-Speed 2630.01 samples/sec Loss 9.8100 LearningRate 0.0525 Epoch: 5 Global Step: 228610 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:23:53,620-Speed 2625.61 samples/sec Loss 9.7099 LearningRate 0.0525 Epoch: 5 Global Step: 228620 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:23:57,537-Speed 2615.07 samples/sec Loss 9.8838 LearningRate 0.0525 Epoch: 5 Global Step: 228630 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:24:01,426-Speed 2633.41 samples/sec Loss 9.7078 LearningRate 0.0525 Epoch: 5 Global Step: 228640 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:24:05,321-Speed 2629.27 samples/sec Loss 9.8971 LearningRate 0.0525 Epoch: 5 Global Step: 228650 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:24:09,203-Speed 2638.30 samples/sec Loss 9.7835 LearningRate 0.0525 Epoch: 5 Global Step: 228660 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:24:13,091-Speed 2634.89 samples/sec Loss 9.8990 LearningRate 0.0525 Epoch: 5 Global Step: 228670 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:24:17,000-Speed 2620.16 samples/sec Loss 9.9254 LearningRate 0.0525 Epoch: 5 Global Step: 228680 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:24:20,930-Speed 2606.25 samples/sec Loss 9.7564 LearningRate 0.0525 Epoch: 5 Global Step: 228690 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:24:24,798-Speed 2648.24 samples/sec Loss 9.8552 LearningRate 0.0525 Epoch: 5 Global Step: 228700 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:24:28,693-Speed 2629.46 samples/sec Loss 9.9765 LearningRate 0.0525 Epoch: 5 Global Step: 228710 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:24:32,586-Speed 2631.98 samples/sec Loss 9.8288 LearningRate 0.0525 Epoch: 5 Global Step: 228720 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:24:36,480-Speed 2629.92 samples/sec Loss 9.8658 LearningRate 0.0525 Epoch: 5 Global Step: 228730 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:24:40,372-Speed 2631.72 samples/sec Loss 9.7838 LearningRate 0.0525 Epoch: 5 Global Step: 228740 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:24:44,267-Speed 2629.39 samples/sec Loss 9.8776 LearningRate 0.0525 Epoch: 5 Global Step: 228750 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:24:48,157-Speed 2632.93 samples/sec Loss 9.9648 LearningRate 0.0525 Epoch: 5 Global Step: 228760 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:24:52,054-Speed 2628.76 samples/sec Loss 9.7620 LearningRate 0.0525 Epoch: 5 Global Step: 228770 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:24:55,940-Speed 2635.40 samples/sec Loss 9.8367 LearningRate 0.0524 Epoch: 5 Global Step: 228780 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:24:59,831-Speed 2632.08 samples/sec Loss 9.8701 LearningRate 0.0524 Epoch: 5 Global Step: 228790 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:25:03,730-Speed 2627.33 samples/sec Loss 9.7737 LearningRate 0.0524 Epoch: 5 Global Step: 228800 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:25:07,615-Speed 2636.48 samples/sec Loss 9.7795 LearningRate 0.0524 Epoch: 5 Global Step: 228810 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:25:11,509-Speed 2629.66 samples/sec Loss 9.7346 LearningRate 0.0524 Epoch: 5 Global Step: 228820 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:25:15,407-Speed 2627.99 samples/sec Loss 9.7857 LearningRate 0.0524 Epoch: 5 Global Step: 228830 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:25:19,303-Speed 2628.94 samples/sec Loss 9.9292 LearningRate 0.0524 Epoch: 5 Global Step: 228840 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:25:23,205-Speed 2625.09 samples/sec Loss 9.9171 LearningRate 0.0524 Epoch: 5 Global Step: 228850 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:25:27,103-Speed 2627.75 samples/sec Loss 9.8420 LearningRate 0.0524 Epoch: 5 Global Step: 228860 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:25:31,113-Speed 2554.08 samples/sec Loss 9.6799 LearningRate 0.0524 Epoch: 5 Global Step: 228870 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:25:35,005-Speed 2631.65 samples/sec Loss 9.8173 LearningRate 0.0524 Epoch: 5 Global Step: 228880 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:25:38,911-Speed 2622.66 samples/sec Loss 9.8063 LearningRate 0.0524 Epoch: 5 Global Step: 228890 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:25:42,797-Speed 2635.47 samples/sec Loss 9.7215 LearningRate 0.0524 Epoch: 5 Global Step: 228900 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:25:46,686-Speed 2633.37 samples/sec Loss 9.7699 LearningRate 0.0524 Epoch: 5 Global Step: 228910 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:25:50,561-Speed 2643.19 samples/sec Loss 9.7514 LearningRate 0.0524 Epoch: 5 Global Step: 228920 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:25:54,465-Speed 2624.02 samples/sec Loss 9.7724 LearningRate 0.0524 Epoch: 5 Global Step: 228930 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:25:58,350-Speed 2636.09 samples/sec Loss 9.7929 LearningRate 0.0524 Epoch: 5 Global Step: 228940 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:26:02,243-Speed 2631.05 samples/sec Loss 9.9832 LearningRate 0.0524 Epoch: 5 Global Step: 228950 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:26:06,138-Speed 2629.37 samples/sec Loss 9.9217 LearningRate 0.0524 Epoch: 5 Global Step: 228960 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:26:10,034-Speed 2629.31 samples/sec Loss 9.7519 LearningRate 0.0524 Epoch: 5 Global Step: 228970 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:26:13,928-Speed 2629.85 samples/sec Loss 9.9709 LearningRate 0.0524 Epoch: 5 Global Step: 228980 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:26:17,820-Speed 2631.82 samples/sec Loss 9.7709 LearningRate 0.0524 Epoch: 5 Global Step: 228990 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:26:21,716-Speed 2628.92 samples/sec Loss 9.8292 LearningRate 0.0524 Epoch: 5 Global Step: 229000 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:26:25,619-Speed 2624.37 samples/sec Loss 9.7905 LearningRate 0.0524 Epoch: 5 Global Step: 229010 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:26:29,504-Speed 2636.79 samples/sec Loss 9.7845 LearningRate 0.0524 Epoch: 5 Global Step: 229020 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:26:33,411-Speed 2621.22 samples/sec Loss 9.6040 LearningRate 0.0524 Epoch: 5 Global Step: 229030 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:26:37,320-Speed 2620.04 samples/sec Loss 9.7803 LearningRate 0.0524 Epoch: 5 Global Step: 229040 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:26:41,215-Speed 2629.35 samples/sec Loss 9.8802 LearningRate 0.0524 Epoch: 5 Global Step: 229050 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:26:45,112-Speed 2628.59 samples/sec Loss 9.7287 LearningRate 0.0524 Epoch: 5 Global Step: 229060 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:26:49,007-Speed 2629.55 samples/sec Loss 9.8175 LearningRate 0.0524 Epoch: 5 Global Step: 229070 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:26:52,901-Speed 2630.57 samples/sec Loss 9.8914 LearningRate 0.0524 Epoch: 5 Global Step: 229080 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:26:56,796-Speed 2629.42 samples/sec Loss 9.8856 LearningRate 0.0524 Epoch: 5 Global Step: 229090 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:27:00,692-Speed 2628.98 samples/sec Loss 9.8603 LearningRate 0.0524 Epoch: 5 Global Step: 229100 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:27:04,588-Speed 2628.70 samples/sec Loss 9.7127 LearningRate 0.0524 Epoch: 5 Global Step: 229110 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:27:08,485-Speed 2628.36 samples/sec Loss 9.8773 LearningRate 0.0524 Epoch: 5 Global Step: 229120 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:27:12,381-Speed 2628.81 samples/sec Loss 9.8683 LearningRate 0.0524 Epoch: 5 Global Step: 229130 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:27:16,281-Speed 2626.27 samples/sec Loss 9.9217 LearningRate 0.0524 Epoch: 5 Global Step: 229140 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:27:20,180-Speed 2626.87 samples/sec Loss 9.7654 LearningRate 0.0524 Epoch: 5 Global Step: 229150 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:27:24,075-Speed 2629.58 samples/sec Loss 9.8050 LearningRate 0.0524 Epoch: 5 Global Step: 229160 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:27:27,970-Speed 2629.90 samples/sec Loss 9.8452 LearningRate 0.0524 Epoch: 5 Global Step: 229170 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:27:31,875-Speed 2623.02 samples/sec Loss 9.9138 LearningRate 0.0524 Epoch: 5 Global Step: 229180 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:27:35,775-Speed 2626.16 samples/sec Loss 9.7683 LearningRate 0.0524 Epoch: 5 Global Step: 229190 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:27:39,672-Speed 2627.92 samples/sec Loss 9.8034 LearningRate 0.0524 Epoch: 5 Global Step: 229200 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:27:43,565-Speed 2631.27 samples/sec Loss 9.8375 LearningRate 0.0524 Epoch: 5 Global Step: 229210 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:27:47,488-Speed 2611.04 samples/sec Loss 9.8806 LearningRate 0.0524 Epoch: 5 Global Step: 229220 Fp16 Grad Scale: 524288 Required: 67 hours
Training: 2022-04-13 21:27:51,383-Speed 2629.68 samples/sec Loss 9.9527 LearningRate 0.0524 Epoch: 5 Global Step: 229230 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:27:55,259-Speed 2642.59 samples/sec Loss 9.7682 LearningRate 0.0524 Epoch: 5 Global Step: 229240 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:27:59,150-Speed 2632.25 samples/sec Loss 9.9179 LearningRate 0.0524 Epoch: 5 Global Step: 229250 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:28:03,039-Speed 2633.91 samples/sec Loss 9.7879 LearningRate 0.0524 Epoch: 5 Global Step: 229260 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:28:06,927-Speed 2633.93 samples/sec Loss 9.9731 LearningRate 0.0524 Epoch: 5 Global Step: 229270 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:28:10,823-Speed 2628.91 samples/sec Loss 9.8915 LearningRate 0.0524 Epoch: 5 Global Step: 229280 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:28:14,731-Speed 2620.97 samples/sec Loss 9.8824 LearningRate 0.0524 Epoch: 5 Global Step: 229290 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:28:18,634-Speed 2624.07 samples/sec Loss 9.7522 LearningRate 0.0524 Epoch: 5 Global Step: 229300 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:28:22,531-Speed 2628.72 samples/sec Loss 9.8321 LearningRate 0.0524 Epoch: 5 Global Step: 229310 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:28:26,428-Speed 2627.90 samples/sec Loss 9.7910 LearningRate 0.0524 Epoch: 5 Global Step: 229320 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:28:30,335-Speed 2621.86 samples/sec Loss 9.8892 LearningRate 0.0524 Epoch: 5 Global Step: 229330 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:28:34,240-Speed 2622.47 samples/sec Loss 9.8043 LearningRate 0.0524 Epoch: 5 Global Step: 229340 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:28:38,147-Speed 2621.23 samples/sec Loss 9.7912 LearningRate 0.0524 Epoch: 5 Global Step: 229350 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:28:42,046-Speed 2627.23 samples/sec Loss 9.6682 LearningRate 0.0523 Epoch: 5 Global Step: 229360 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:28:45,943-Speed 2628.23 samples/sec Loss 9.7437 LearningRate 0.0523 Epoch: 5 Global Step: 229370 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:28:49,835-Speed 2632.36 samples/sec Loss 9.7033 LearningRate 0.0523 Epoch: 5 Global Step: 229380 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:28:53,728-Speed 2630.70 samples/sec Loss 9.6556 LearningRate 0.0523 Epoch: 5 Global Step: 229390 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:28:57,604-Speed 2642.53 samples/sec Loss 9.8847 LearningRate 0.0523 Epoch: 5 Global Step: 229400 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:29:01,512-Speed 2620.44 samples/sec Loss 9.7251 LearningRate 0.0523 Epoch: 5 Global Step: 229410 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:29:05,395-Speed 2637.84 samples/sec Loss 9.8833 LearningRate 0.0523 Epoch: 5 Global Step: 229420 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:29:09,285-Speed 2632.76 samples/sec Loss 9.8807 LearningRate 0.0523 Epoch: 5 Global Step: 229430 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:29:13,181-Speed 2629.29 samples/sec Loss 9.7266 LearningRate 0.0523 Epoch: 5 Global Step: 229440 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:29:17,073-Speed 2631.59 samples/sec Loss 9.8985 LearningRate 0.0523 Epoch: 5 Global Step: 229450 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:29:20,966-Speed 2631.70 samples/sec Loss 9.8841 LearningRate 0.0523 Epoch: 5 Global Step: 229460 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:29:24,856-Speed 2632.82 samples/sec Loss 9.7409 LearningRate 0.0523 Epoch: 5 Global Step: 229470 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:29:28,748-Speed 2632.04 samples/sec Loss 9.8716 LearningRate 0.0523 Epoch: 5 Global Step: 229480 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:29:32,640-Speed 2631.47 samples/sec Loss 9.8441 LearningRate 0.0523 Epoch: 5 Global Step: 229490 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:29:36,536-Speed 2628.83 samples/sec Loss 9.7414 LearningRate 0.0523 Epoch: 5 Global Step: 229500 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:29:40,470-Speed 2603.13 samples/sec Loss 9.9053 LearningRate 0.0523 Epoch: 5 Global Step: 229510 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:29:44,475-Speed 2557.80 samples/sec Loss 9.7993 LearningRate 0.0523 Epoch: 5 Global Step: 229520 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:29:48,371-Speed 2629.11 samples/sec Loss 9.8965 LearningRate 0.0523 Epoch: 5 Global Step: 229530 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:29:52,271-Speed 2626.35 samples/sec Loss 9.6851 LearningRate 0.0523 Epoch: 5 Global Step: 229540 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:29:56,188-Speed 2615.10 samples/sec Loss 9.7830 LearningRate 0.0523 Epoch: 5 Global Step: 229550 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:30:00,080-Speed 2631.87 samples/sec Loss 9.9539 LearningRate 0.0523 Epoch: 5 Global Step: 229560 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:30:03,969-Speed 2633.61 samples/sec Loss 9.9666 LearningRate 0.0523 Epoch: 5 Global Step: 229570 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:30:07,882-Speed 2617.57 samples/sec Loss 9.6941 LearningRate 0.0523 Epoch: 5 Global Step: 229580 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:30:11,755-Speed 2644.90 samples/sec Loss 9.8202 LearningRate 0.0523 Epoch: 5 Global Step: 229590 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:30:15,648-Speed 2630.86 samples/sec Loss 9.7357 LearningRate 0.0523 Epoch: 5 Global Step: 229600 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:30:19,536-Speed 2634.32 samples/sec Loss 9.6396 LearningRate 0.0523 Epoch: 5 Global Step: 229610 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:30:23,426-Speed 2633.06 samples/sec Loss 9.8508 LearningRate 0.0523 Epoch: 5 Global Step: 229620 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:30:27,317-Speed 2632.98 samples/sec Loss 9.7181 LearningRate 0.0523 Epoch: 5 Global Step: 229630 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:30:31,208-Speed 2632.49 samples/sec Loss 9.7654 LearningRate 0.0523 Epoch: 5 Global Step: 229640 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:30:35,097-Speed 2633.37 samples/sec Loss 9.7813 LearningRate 0.0523 Epoch: 5 Global Step: 229650 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:30:38,993-Speed 2629.25 samples/sec Loss 9.7371 LearningRate 0.0523 Epoch: 5 Global Step: 229660 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:30:42,896-Speed 2623.66 samples/sec Loss 9.8867 LearningRate 0.0523 Epoch: 5 Global Step: 229670 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:30:46,787-Speed 2632.02 samples/sec Loss 9.8386 LearningRate 0.0523 Epoch: 5 Global Step: 229680 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:30:50,678-Speed 2632.59 samples/sec Loss 9.8681 LearningRate 0.0523 Epoch: 5 Global Step: 229690 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:30:54,575-Speed 2629.21 samples/sec Loss 9.7223 LearningRate 0.0523 Epoch: 5 Global Step: 229700 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:30:58,470-Speed 2629.29 samples/sec Loss 9.8495 LearningRate 0.0523 Epoch: 5 Global Step: 229710 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:31:02,342-Speed 2645.67 samples/sec Loss 9.8324 LearningRate 0.0523 Epoch: 5 Global Step: 229720 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:31:06,238-Speed 2628.47 samples/sec Loss 9.9164 LearningRate 0.0523 Epoch: 5 Global Step: 229730 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:31:10,137-Speed 2627.50 samples/sec Loss 9.6812 LearningRate 0.0523 Epoch: 5 Global Step: 229740 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:31:14,036-Speed 2626.55 samples/sec Loss 9.9089 LearningRate 0.0523 Epoch: 5 Global Step: 229750 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:31:18,029-Speed 2565.35 samples/sec Loss 9.8591 LearningRate 0.0523 Epoch: 5 Global Step: 229760 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:31:22,127-Speed 2499.28 samples/sec Loss 9.9088 LearningRate 0.0523 Epoch: 5 Global Step: 229770 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:31:26,061-Speed 2604.21 samples/sec Loss 9.7362 LearningRate 0.0523 Epoch: 5 Global Step: 229780 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:31:29,957-Speed 2628.42 samples/sec Loss 9.8069 LearningRate 0.0523 Epoch: 5 Global Step: 229790 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:31:33,851-Speed 2630.88 samples/sec Loss 9.8735 LearningRate 0.0523 Epoch: 5 Global Step: 229800 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:31:37,750-Speed 2626.25 samples/sec Loss 9.8636 LearningRate 0.0523 Epoch: 5 Global Step: 229810 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:31:41,644-Speed 2630.77 samples/sec Loss 9.7094 LearningRate 0.0523 Epoch: 5 Global Step: 229820 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:31:45,518-Speed 2643.63 samples/sec Loss 9.9173 LearningRate 0.0523 Epoch: 5 Global Step: 229830 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:31:49,419-Speed 2625.47 samples/sec Loss 9.8395 LearningRate 0.0523 Epoch: 5 Global Step: 229840 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:31:53,309-Speed 2633.34 samples/sec Loss 9.8648 LearningRate 0.0523 Epoch: 5 Global Step: 229850 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:31:57,198-Speed 2634.12 samples/sec Loss 9.9796 LearningRate 0.0523 Epoch: 5 Global Step: 229860 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:32:01,128-Speed 2605.80 samples/sec Loss 9.9202 LearningRate 0.0523 Epoch: 5 Global Step: 229870 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:32:05,035-Speed 2622.38 samples/sec Loss 9.8551 LearningRate 0.0523 Epoch: 5 Global Step: 229880 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:32:08,931-Speed 2628.93 samples/sec Loss 9.7935 LearningRate 0.0523 Epoch: 5 Global Step: 229890 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:32:12,831-Speed 2625.93 samples/sec Loss 9.8396 LearningRate 0.0523 Epoch: 5 Global Step: 229900 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:32:16,738-Speed 2621.30 samples/sec Loss 9.7856 LearningRate 0.0523 Epoch: 5 Global Step: 229910 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:32:20,627-Speed 2634.16 samples/sec Loss 9.7553 LearningRate 0.0523 Epoch: 5 Global Step: 229920 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:32:24,542-Speed 2616.25 samples/sec Loss 9.6696 LearningRate 0.0522 Epoch: 5 Global Step: 229930 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:32:28,437-Speed 2629.96 samples/sec Loss 9.9108 LearningRate 0.0522 Epoch: 5 Global Step: 229940 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:32:32,347-Speed 2619.72 samples/sec Loss 9.7605 LearningRate 0.0522 Epoch: 5 Global Step: 229950 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:32:36,239-Speed 2632.01 samples/sec Loss 9.7759 LearningRate 0.0522 Epoch: 5 Global Step: 229960 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:32:40,128-Speed 2633.85 samples/sec Loss 9.7984 LearningRate 0.0522 Epoch: 5 Global Step: 229970 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:32:44,017-Speed 2633.59 samples/sec Loss 9.8140 LearningRate 0.0522 Epoch: 5 Global Step: 229980 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:32:47,919-Speed 2625.36 samples/sec Loss 9.7710 LearningRate 0.0522 Epoch: 5 Global Step: 229990 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:32:51,806-Speed 2635.54 samples/sec Loss 9.9782 LearningRate 0.0522 Epoch: 5 Global Step: 230000 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:33:35,754-[lfw][230000]XNorm: 23.869069
Training: 2022-04-13 21:33:35,755-[lfw][230000]Accuracy-Flip: 0.99667+-0.00357
Training: 2022-04-13 21:33:35,755-[lfw][230000]Accuracy-Highest: 0.99783
Training: 2022-04-13 21:34:26,737-[cfp_fp][230000]XNorm: 21.752470
Training: 2022-04-13 21:34:26,738-[cfp_fp][230000]Accuracy-Flip: 0.98300+-0.00601
Training: 2022-04-13 21:34:26,739-[cfp_fp][230000]Accuracy-Highest: 0.98314
Training: 2022-04-13 21:35:10,391-[agedb_30][230000]XNorm: 23.469151
Training: 2022-04-13 21:35:10,392-[agedb_30][230000]Accuracy-Flip: 0.97250+-0.00704
Training: 2022-04-13 21:35:10,392-[agedb_30][230000]Accuracy-Highest: 0.97250
Training: 2022-04-13 21:35:14,253-Speed 71.89 samples/sec Loss 9.8625 LearningRate 0.0522 Epoch: 5 Global Step: 230010 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:35:18,116-Speed 2651.71 samples/sec Loss 9.9053 LearningRate 0.0522 Epoch: 5 Global Step: 230020 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:35:22,007-Speed 2632.87 samples/sec Loss 9.8077 LearningRate 0.0522 Epoch: 5 Global Step: 230030 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:35:25,870-Speed 2651.21 samples/sec Loss 9.6645 LearningRate 0.0522 Epoch: 5 Global Step: 230040 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:35:29,739-Speed 2647.51 samples/sec Loss 9.8851 LearningRate 0.0522 Epoch: 5 Global Step: 230050 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:35:33,629-Speed 2632.55 samples/sec Loss 9.9087 LearningRate 0.0522 Epoch: 5 Global Step: 230060 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:35:37,500-Speed 2647.39 samples/sec Loss 9.7955 LearningRate 0.0522 Epoch: 5 Global Step: 230070 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:35:41,375-Speed 2643.20 samples/sec Loss 9.8599 LearningRate 0.0522 Epoch: 5 Global Step: 230080 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:35:45,262-Speed 2635.06 samples/sec Loss 9.8509 LearningRate 0.0522 Epoch: 5 Global Step: 230090 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:35:49,144-Speed 2638.42 samples/sec Loss 9.8000 LearningRate 0.0522 Epoch: 5 Global Step: 230100 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:35:53,031-Speed 2634.35 samples/sec Loss 9.9286 LearningRate 0.0522 Epoch: 5 Global Step: 230110 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:35:56,911-Speed 2640.11 samples/sec Loss 9.8851 LearningRate 0.0522 Epoch: 5 Global Step: 230120 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:36:00,790-Speed 2640.62 samples/sec Loss 9.9577 LearningRate 0.0522 Epoch: 5 Global Step: 230130 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:36:04,671-Speed 2639.47 samples/sec Loss 9.6234 LearningRate 0.0522 Epoch: 5 Global Step: 230140 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:36:08,565-Speed 2629.61 samples/sec Loss 9.6021 LearningRate 0.0522 Epoch: 5 Global Step: 230150 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:36:12,490-Speed 2609.90 samples/sec Loss 9.7450 LearningRate 0.0522 Epoch: 5 Global Step: 230160 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:36:16,372-Speed 2638.87 samples/sec Loss 9.7543 LearningRate 0.0522 Epoch: 5 Global Step: 230170 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:36:20,258-Speed 2635.68 samples/sec Loss 9.7625 LearningRate 0.0522 Epoch: 5 Global Step: 230180 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:36:24,130-Speed 2645.06 samples/sec Loss 9.8229 LearningRate 0.0522 Epoch: 5 Global Step: 230190 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:36:27,994-Speed 2651.24 samples/sec Loss 10.0426 LearningRate 0.0522 Epoch: 5 Global Step: 230200 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:36:31,880-Speed 2635.35 samples/sec Loss 10.5652 LearningRate 0.0522 Epoch: 5 Global Step: 230210 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:36:35,765-Speed 2636.84 samples/sec Loss 10.3296 LearningRate 0.0522 Epoch: 5 Global Step: 230220 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:36:39,648-Speed 2638.08 samples/sec Loss 9.9743 LearningRate 0.0522 Epoch: 5 Global Step: 230230 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:36:43,532-Speed 2636.97 samples/sec Loss 9.8951 LearningRate 0.0522 Epoch: 5 Global Step: 230240 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:36:47,417-Speed 2636.28 samples/sec Loss 9.8186 LearningRate 0.0522 Epoch: 5 Global Step: 230250 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:36:51,305-Speed 2634.38 samples/sec Loss 9.8904 LearningRate 0.0522 Epoch: 5 Global Step: 230260 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:36:55,229-Speed 2610.39 samples/sec Loss 9.7790 LearningRate 0.0522 Epoch: 5 Global Step: 230270 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:36:59,149-Speed 2613.17 samples/sec Loss 9.8529 LearningRate 0.0522 Epoch: 5 Global Step: 230280 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:37:03,036-Speed 2634.51 samples/sec Loss 9.7580 LearningRate 0.0522 Epoch: 5 Global Step: 230290 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:37:06,931-Speed 2630.21 samples/sec Loss 9.8505 LearningRate 0.0522 Epoch: 5 Global Step: 230300 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:37:10,819-Speed 2633.76 samples/sec Loss 9.8186 LearningRate 0.0522 Epoch: 5 Global Step: 230310 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:37:14,707-Speed 2635.37 samples/sec Loss 9.7626 LearningRate 0.0522 Epoch: 5 Global Step: 230320 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:37:18,600-Speed 2630.96 samples/sec Loss 9.8483 LearningRate 0.0522 Epoch: 5 Global Step: 230330 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:37:22,490-Speed 2632.75 samples/sec Loss 9.6805 LearningRate 0.0522 Epoch: 5 Global Step: 230340 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:37:26,378-Speed 2634.70 samples/sec Loss 9.8696 LearningRate 0.0522 Epoch: 5 Global Step: 230350 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:37:30,268-Speed 2633.09 samples/sec Loss 9.7255 LearningRate 0.0522 Epoch: 5 Global Step: 230360 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:37:34,157-Speed 2634.18 samples/sec Loss 9.7267 LearningRate 0.0522 Epoch: 5 Global Step: 230370 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:37:38,060-Speed 2624.33 samples/sec Loss 9.9357 LearningRate 0.0522 Epoch: 5 Global Step: 230380 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:37:41,951-Speed 2632.87 samples/sec Loss 9.8404 LearningRate 0.0522 Epoch: 5 Global Step: 230390 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:37:45,839-Speed 2634.36 samples/sec Loss 9.8091 LearningRate 0.0522 Epoch: 5 Global Step: 230400 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:37:49,740-Speed 2625.70 samples/sec Loss 9.6754 LearningRate 0.0522 Epoch: 5 Global Step: 230410 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:37:53,627-Speed 2634.35 samples/sec Loss 9.8196 LearningRate 0.0522 Epoch: 5 Global Step: 230420 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:37:57,521-Speed 2630.72 samples/sec Loss 9.7971 LearningRate 0.0522 Epoch: 5 Global Step: 230430 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:38:01,409-Speed 2634.15 samples/sec Loss 9.8717 LearningRate 0.0522 Epoch: 5 Global Step: 230440 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:38:05,310-Speed 2626.16 samples/sec Loss 9.9549 LearningRate 0.0522 Epoch: 5 Global Step: 230450 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:38:09,220-Speed 2619.29 samples/sec Loss 9.8492 LearningRate 0.0522 Epoch: 5 Global Step: 230460 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:38:13,113-Speed 2631.72 samples/sec Loss 9.7719 LearningRate 0.0522 Epoch: 5 Global Step: 230470 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:38:17,005-Speed 2631.46 samples/sec Loss 9.8339 LearningRate 0.0522 Epoch: 5 Global Step: 230480 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:38:20,895-Speed 2632.78 samples/sec Loss 9.8051 LearningRate 0.0522 Epoch: 5 Global Step: 230490 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:38:24,789-Speed 2629.77 samples/sec Loss 9.9133 LearningRate 0.0521 Epoch: 5 Global Step: 230500 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:38:28,683-Speed 2630.27 samples/sec Loss 9.5236 LearningRate 0.0521 Epoch: 5 Global Step: 230510 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:38:32,604-Speed 2612.13 samples/sec Loss 9.8038 LearningRate 0.0521 Epoch: 5 Global Step: 230520 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:38:36,512-Speed 2620.69 samples/sec Loss 9.8068 LearningRate 0.0521 Epoch: 5 Global Step: 230530 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:38:40,401-Speed 2634.33 samples/sec Loss 9.7807 LearningRate 0.0521 Epoch: 5 Global Step: 230540 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:38:44,295-Speed 2630.38 samples/sec Loss 9.7948 LearningRate 0.0521 Epoch: 5 Global Step: 230550 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:38:48,195-Speed 2626.30 samples/sec Loss 9.7360 LearningRate 0.0521 Epoch: 5 Global Step: 230560 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:38:52,088-Speed 2630.81 samples/sec Loss 9.7249 LearningRate 0.0521 Epoch: 5 Global Step: 230570 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:38:55,980-Speed 2631.44 samples/sec Loss 9.8685 LearningRate 0.0521 Epoch: 5 Global Step: 230580 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:38:59,869-Speed 2634.10 samples/sec Loss 9.7505 LearningRate 0.0521 Epoch: 5 Global Step: 230590 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:39:03,765-Speed 2628.80 samples/sec Loss 9.7403 LearningRate 0.0521 Epoch: 5 Global Step: 230600 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:39:07,659-Speed 2630.08 samples/sec Loss 9.7601 LearningRate 0.0521 Epoch: 5 Global Step: 230610 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:39:11,731-Speed 2515.71 samples/sec Loss 9.7380 LearningRate 0.0521 Epoch: 5 Global Step: 230620 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:39:15,621-Speed 2632.95 samples/sec Loss 9.8582 LearningRate 0.0521 Epoch: 5 Global Step: 230630 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:39:19,513-Speed 2632.06 samples/sec Loss 9.6587 LearningRate 0.0521 Epoch: 5 Global Step: 230640 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:39:23,384-Speed 2645.56 samples/sec Loss 9.8743 LearningRate 0.0521 Epoch: 5 Global Step: 230650 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:39:27,273-Speed 2633.91 samples/sec Loss 9.7950 LearningRate 0.0521 Epoch: 5 Global Step: 230660 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:39:31,163-Speed 2632.40 samples/sec Loss 9.7500 LearningRate 0.0521 Epoch: 5 Global Step: 230670 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:39:35,060-Speed 2628.98 samples/sec Loss 9.7974 LearningRate 0.0521 Epoch: 5 Global Step: 230680 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:39:38,969-Speed 2620.23 samples/sec Loss 9.6792 LearningRate 0.0521 Epoch: 5 Global Step: 230690 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:39:42,870-Speed 2625.87 samples/sec Loss 9.8514 LearningRate 0.0521 Epoch: 5 Global Step: 230700 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:39:46,761-Speed 2631.84 samples/sec Loss 9.6318 LearningRate 0.0521 Epoch: 5 Global Step: 230710 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:39:50,663-Speed 2625.10 samples/sec Loss 9.7846 LearningRate 0.0521 Epoch: 5 Global Step: 230720 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:39:54,558-Speed 2629.36 samples/sec Loss 9.8558 LearningRate 0.0521 Epoch: 5 Global Step: 230730 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:39:58,452-Speed 2630.65 samples/sec Loss 9.8073 LearningRate 0.0521 Epoch: 5 Global Step: 230740 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:40:02,359-Speed 2621.14 samples/sec Loss 9.6216 LearningRate 0.0521 Epoch: 5 Global Step: 230750 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:40:06,249-Speed 2633.28 samples/sec Loss 9.9251 LearningRate 0.0521 Epoch: 5 Global Step: 230760 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:40:10,143-Speed 2630.89 samples/sec Loss 9.8116 LearningRate 0.0521 Epoch: 5 Global Step: 230770 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:40:14,001-Speed 2654.59 samples/sec Loss 10.2534 LearningRate 0.0521 Epoch: 5 Global Step: 230780 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:40:17,887-Speed 2636.12 samples/sec Loss 9.8844 LearningRate 0.0521 Epoch: 5 Global Step: 230790 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:40:21,791-Speed 2622.83 samples/sec Loss 9.9411 LearningRate 0.0521 Epoch: 5 Global Step: 230800 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:40:25,690-Speed 2627.26 samples/sec Loss 10.1853 LearningRate 0.0521 Epoch: 5 Global Step: 230810 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:40:29,593-Speed 2623.91 samples/sec Loss 9.7436 LearningRate 0.0521 Epoch: 5 Global Step: 230820 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:40:33,481-Speed 2634.74 samples/sec Loss 9.8479 LearningRate 0.0521 Epoch: 5 Global Step: 230830 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:40:37,370-Speed 2633.55 samples/sec Loss 9.7450 LearningRate 0.0521 Epoch: 5 Global Step: 230840 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:40:41,260-Speed 2633.26 samples/sec Loss 9.9395 LearningRate 0.0521 Epoch: 5 Global Step: 230850 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:40:45,144-Speed 2636.95 samples/sec Loss 9.8519 LearningRate 0.0521 Epoch: 5 Global Step: 230860 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:40:49,034-Speed 2632.83 samples/sec Loss 9.7411 LearningRate 0.0521 Epoch: 5 Global Step: 230870 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:40:52,927-Speed 2631.27 samples/sec Loss 9.8191 LearningRate 0.0521 Epoch: 5 Global Step: 230880 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:40:56,822-Speed 2629.41 samples/sec Loss 9.7529 LearningRate 0.0521 Epoch: 5 Global Step: 230890 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:41:00,718-Speed 2628.87 samples/sec Loss 9.8076 LearningRate 0.0521 Epoch: 5 Global Step: 230900 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:41:04,621-Speed 2624.07 samples/sec Loss 9.7680 LearningRate 0.0521 Epoch: 5 Global Step: 230910 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:41:08,527-Speed 2623.18 samples/sec Loss 9.8391 LearningRate 0.0521 Epoch: 5 Global Step: 230920 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:41:12,418-Speed 2632.24 samples/sec Loss 9.7800 LearningRate 0.0521 Epoch: 5 Global Step: 230930 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:41:16,309-Speed 2632.11 samples/sec Loss 9.7455 LearningRate 0.0521 Epoch: 5 Global Step: 230940 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:41:20,202-Speed 2631.88 samples/sec Loss 9.6255 LearningRate 0.0521 Epoch: 5 Global Step: 230950 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:41:24,105-Speed 2624.00 samples/sec Loss 9.8212 LearningRate 0.0521 Epoch: 5 Global Step: 230960 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:41:27,999-Speed 2630.45 samples/sec Loss 9.9593 LearningRate 0.0521 Epoch: 5 Global Step: 230970 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:41:31,902-Speed 2624.49 samples/sec Loss 9.7792 LearningRate 0.0521 Epoch: 5 Global Step: 230980 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:41:35,801-Speed 2626.49 samples/sec Loss 9.7873 LearningRate 0.0521 Epoch: 5 Global Step: 230990 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:41:39,706-Speed 2623.04 samples/sec Loss 9.9744 LearningRate 0.0521 Epoch: 5 Global Step: 231000 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:41:43,602-Speed 2629.12 samples/sec Loss 9.8963 LearningRate 0.0521 Epoch: 5 Global Step: 231010 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:41:47,495-Speed 2630.75 samples/sec Loss 9.8595 LearningRate 0.0521 Epoch: 5 Global Step: 231020 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:41:51,404-Speed 2620.72 samples/sec Loss 9.7018 LearningRate 0.0521 Epoch: 5 Global Step: 231030 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:41:55,304-Speed 2626.05 samples/sec Loss 9.6928 LearningRate 0.0521 Epoch: 5 Global Step: 231040 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:41:59,198-Speed 2630.35 samples/sec Loss 9.7146 LearningRate 0.0521 Epoch: 5 Global Step: 231050 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:42:03,101-Speed 2624.72 samples/sec Loss 9.7601 LearningRate 0.0521 Epoch: 5 Global Step: 231060 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:42:06,994-Speed 2630.65 samples/sec Loss 9.9034 LearningRate 0.0521 Epoch: 5 Global Step: 231070 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:42:10,886-Speed 2631.10 samples/sec Loss 9.9238 LearningRate 0.0520 Epoch: 5 Global Step: 231080 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:42:14,777-Speed 2632.92 samples/sec Loss 9.6877 LearningRate 0.0520 Epoch: 5 Global Step: 231090 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:42:18,668-Speed 2632.40 samples/sec Loss 9.7121 LearningRate 0.0520 Epoch: 5 Global Step: 231100 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:42:22,557-Speed 2633.80 samples/sec Loss 9.7665 LearningRate 0.0520 Epoch: 5 Global Step: 231110 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:42:26,451-Speed 2629.89 samples/sec Loss 9.8416 LearningRate 0.0520 Epoch: 5 Global Step: 231120 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:42:30,342-Speed 2632.59 samples/sec Loss 9.8422 LearningRate 0.0520 Epoch: 5 Global Step: 231130 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:42:34,237-Speed 2629.66 samples/sec Loss 9.7206 LearningRate 0.0520 Epoch: 5 Global Step: 231140 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:42:38,131-Speed 2629.93 samples/sec Loss 9.8123 LearningRate 0.0520 Epoch: 5 Global Step: 231150 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:42:42,022-Speed 2632.01 samples/sec Loss 9.7194 LearningRate 0.0520 Epoch: 5 Global Step: 231160 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:42:45,918-Speed 2629.61 samples/sec Loss 9.6622 LearningRate 0.0520 Epoch: 5 Global Step: 231170 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:42:49,816-Speed 2628.74 samples/sec Loss 9.6802 LearningRate 0.0520 Epoch: 5 Global Step: 231180 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:42:53,709-Speed 2630.71 samples/sec Loss 9.8650 LearningRate 0.0520 Epoch: 5 Global Step: 231190 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:42:57,600-Speed 2632.72 samples/sec Loss 9.8409 LearningRate 0.0520 Epoch: 5 Global Step: 231200 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:43:01,495-Speed 2629.18 samples/sec Loss 9.8954 LearningRate 0.0520 Epoch: 5 Global Step: 231210 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:43:05,389-Speed 2630.44 samples/sec Loss 9.8246 LearningRate 0.0520 Epoch: 5 Global Step: 231220 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:43:09,280-Speed 2632.23 samples/sec Loss 9.7654 LearningRate 0.0520 Epoch: 5 Global Step: 231230 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:43:13,176-Speed 2628.75 samples/sec Loss 9.7698 LearningRate 0.0520 Epoch: 5 Global Step: 231240 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:43:17,072-Speed 2628.76 samples/sec Loss 9.6777 LearningRate 0.0520 Epoch: 5 Global Step: 231250 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:43:20,968-Speed 2629.42 samples/sec Loss 9.6634 LearningRate 0.0520 Epoch: 5 Global Step: 231260 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:43:24,861-Speed 2631.18 samples/sec Loss 9.7781 LearningRate 0.0520 Epoch: 5 Global Step: 231270 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:43:28,754-Speed 2631.83 samples/sec Loss 9.8227 LearningRate 0.0520 Epoch: 5 Global Step: 231280 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:43:32,648-Speed 2629.75 samples/sec Loss 9.7856 LearningRate 0.0520 Epoch: 5 Global Step: 231290 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:43:36,542-Speed 2629.87 samples/sec Loss 9.7307 LearningRate 0.0520 Epoch: 5 Global Step: 231300 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:43:40,437-Speed 2629.95 samples/sec Loss 9.8062 LearningRate 0.0520 Epoch: 5 Global Step: 231310 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:43:44,332-Speed 2629.98 samples/sec Loss 9.8873 LearningRate 0.0520 Epoch: 5 Global Step: 231320 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:43:48,228-Speed 2628.56 samples/sec Loss 9.6944 LearningRate 0.0520 Epoch: 5 Global Step: 231330 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:43:52,125-Speed 2628.38 samples/sec Loss 9.7146 LearningRate 0.0520 Epoch: 5 Global Step: 231340 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:43:56,035-Speed 2619.24 samples/sec Loss 9.7236 LearningRate 0.0520 Epoch: 5 Global Step: 231350 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:43:59,945-Speed 2620.26 samples/sec Loss 9.8859 LearningRate 0.0520 Epoch: 5 Global Step: 231360 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:44:03,844-Speed 2626.82 samples/sec Loss 9.7829 LearningRate 0.0520 Epoch: 5 Global Step: 231370 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:44:07,737-Speed 2630.67 samples/sec Loss 9.7376 LearningRate 0.0520 Epoch: 5 Global Step: 231380 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:44:11,619-Speed 2638.90 samples/sec Loss 9.8859 LearningRate 0.0520 Epoch: 5 Global Step: 231390 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:44:15,512-Speed 2631.04 samples/sec Loss 9.7900 LearningRate 0.0520 Epoch: 5 Global Step: 231400 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:44:19,408-Speed 2629.04 samples/sec Loss 9.7217 LearningRate 0.0520 Epoch: 5 Global Step: 231410 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:44:23,303-Speed 2629.26 samples/sec Loss 9.7442 LearningRate 0.0520 Epoch: 5 Global Step: 231420 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:44:27,198-Speed 2630.01 samples/sec Loss 9.7979 LearningRate 0.0520 Epoch: 5 Global Step: 231430 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:44:31,096-Speed 2627.38 samples/sec Loss 9.8394 LearningRate 0.0520 Epoch: 5 Global Step: 231440 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:44:34,992-Speed 2629.70 samples/sec Loss 9.8832 LearningRate 0.0520 Epoch: 5 Global Step: 231450 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:44:38,891-Speed 2626.98 samples/sec Loss 9.6881 LearningRate 0.0520 Epoch: 5 Global Step: 231460 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:44:42,780-Speed 2633.17 samples/sec Loss 9.8174 LearningRate 0.0520 Epoch: 5 Global Step: 231470 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:44:46,643-Speed 2651.32 samples/sec Loss 10.0610 LearningRate 0.0520 Epoch: 5 Global Step: 231480 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:44:50,541-Speed 2628.10 samples/sec Loss 10.0430 LearningRate 0.0520 Epoch: 5 Global Step: 231490 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:44:54,433-Speed 2631.27 samples/sec Loss 10.9428 LearningRate 0.0520 Epoch: 5 Global Step: 231500 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:44:58,330-Speed 2629.53 samples/sec Loss 10.5008 LearningRate 0.0520 Epoch: 5 Global Step: 231510 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:45:02,236-Speed 2622.23 samples/sec Loss 10.0203 LearningRate 0.0520 Epoch: 5 Global Step: 231520 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:45:06,130-Speed 2630.09 samples/sec Loss 9.9930 LearningRate 0.0520 Epoch: 5 Global Step: 231530 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:45:10,026-Speed 2628.56 samples/sec Loss 9.7291 LearningRate 0.0520 Epoch: 5 Global Step: 231540 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:45:13,922-Speed 2629.14 samples/sec Loss 9.9213 LearningRate 0.0520 Epoch: 5 Global Step: 231550 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:45:17,814-Speed 2632.14 samples/sec Loss 9.8422 LearningRate 0.0520 Epoch: 5 Global Step: 231560 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:45:21,708-Speed 2629.94 samples/sec Loss 9.8898 LearningRate 0.0520 Epoch: 5 Global Step: 231570 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:45:25,620-Speed 2619.23 samples/sec Loss 9.8806 LearningRate 0.0520 Epoch: 5 Global Step: 231580 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:45:29,509-Speed 2633.77 samples/sec Loss 9.9224 LearningRate 0.0520 Epoch: 5 Global Step: 231590 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 21:45:33,402-Speed 2631.02 samples/sec Loss 9.8334 LearningRate 0.0520 Epoch: 5 Global Step: 231600 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:45:37,310-Speed 2621.34 samples/sec Loss 9.8104 LearningRate 0.0520 Epoch: 5 Global Step: 231610 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:45:41,204-Speed 2630.25 samples/sec Loss 9.8004 LearningRate 0.0520 Epoch: 5 Global Step: 231620 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:45:45,094-Speed 2632.58 samples/sec Loss 9.7949 LearningRate 0.0520 Epoch: 5 Global Step: 231630 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:45:48,987-Speed 2631.66 samples/sec Loss 9.8596 LearningRate 0.0520 Epoch: 5 Global Step: 231640 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:45:52,890-Speed 2623.75 samples/sec Loss 9.7850 LearningRate 0.0519 Epoch: 5 Global Step: 231650 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:45:56,787-Speed 2628.71 samples/sec Loss 9.6964 LearningRate 0.0519 Epoch: 5 Global Step: 231660 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:46:00,682-Speed 2629.70 samples/sec Loss 9.9002 LearningRate 0.0519 Epoch: 5 Global Step: 231670 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:46:04,573-Speed 2632.47 samples/sec Loss 9.7571 LearningRate 0.0519 Epoch: 5 Global Step: 231680 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:46:08,468-Speed 2629.77 samples/sec Loss 9.7008 LearningRate 0.0519 Epoch: 5 Global Step: 231690 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 21:46:12,366-Speed 2627.91 samples/sec Loss 9.8077 LearningRate 0.0519 Epoch: 5 Global Step: 231700 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:46:16,260-Speed 2629.66 samples/sec Loss 9.9805 LearningRate 0.0519 Epoch: 5 Global Step: 231710 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:46:20,246-Speed 2570.26 samples/sec Loss 9.7473 LearningRate 0.0519 Epoch: 5 Global Step: 231720 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:46:24,142-Speed 2628.91 samples/sec Loss 9.8181 LearningRate 0.0519 Epoch: 5 Global Step: 231730 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:46:28,034-Speed 2631.38 samples/sec Loss 9.8934 LearningRate 0.0519 Epoch: 5 Global Step: 231740 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:46:31,935-Speed 2625.75 samples/sec Loss 9.8305 LearningRate 0.0519 Epoch: 5 Global Step: 231750 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:46:35,832-Speed 2628.03 samples/sec Loss 9.7533 LearningRate 0.0519 Epoch: 5 Global Step: 231760 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:46:39,726-Speed 2630.48 samples/sec Loss 9.7168 LearningRate 0.0519 Epoch: 5 Global Step: 231770 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:46:43,642-Speed 2615.29 samples/sec Loss 9.8785 LearningRate 0.0519 Epoch: 5 Global Step: 231780 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:46:47,540-Speed 2627.96 samples/sec Loss 9.9097 LearningRate 0.0519 Epoch: 5 Global Step: 231790 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 21:46:51,438-Speed 2627.07 samples/sec Loss 9.9153 LearningRate 0.0519 Epoch: 5 Global Step: 231800 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:46:55,333-Speed 2630.09 samples/sec Loss 9.8737 LearningRate 0.0519 Epoch: 5 Global Step: 231810 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:46:59,227-Speed 2630.10 samples/sec Loss 9.7101 LearningRate 0.0519 Epoch: 5 Global Step: 231820 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:47:03,135-Speed 2621.20 samples/sec Loss 9.7496 LearningRate 0.0519 Epoch: 5 Global Step: 231830 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:47:07,031-Speed 2628.50 samples/sec Loss 9.7450 LearningRate 0.0519 Epoch: 5 Global Step: 231840 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:47:10,929-Speed 2627.72 samples/sec Loss 9.8928 LearningRate 0.0519 Epoch: 5 Global Step: 231850 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:47:14,832-Speed 2624.37 samples/sec Loss 9.8142 LearningRate 0.0519 Epoch: 5 Global Step: 231860 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:47:18,755-Speed 2610.54 samples/sec Loss 9.8191 LearningRate 0.0519 Epoch: 5 Global Step: 231870 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:47:22,653-Speed 2627.63 samples/sec Loss 9.8057 LearningRate 0.0519 Epoch: 5 Global Step: 231880 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:47:26,550-Speed 2628.67 samples/sec Loss 9.7159 LearningRate 0.0519 Epoch: 5 Global Step: 231890 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:47:30,457-Speed 2621.75 samples/sec Loss 9.6906 LearningRate 0.0519 Epoch: 5 Global Step: 231900 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:47:34,359-Speed 2625.19 samples/sec Loss 9.8551 LearningRate 0.0519 Epoch: 5 Global Step: 231910 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:47:38,268-Speed 2619.97 samples/sec Loss 9.8518 LearningRate 0.0519 Epoch: 5 Global Step: 231920 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:47:42,236-Speed 2580.98 samples/sec Loss 9.8471 LearningRate 0.0519 Epoch: 5 Global Step: 231930 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:47:46,138-Speed 2625.80 samples/sec Loss 9.7912 LearningRate 0.0519 Epoch: 5 Global Step: 231940 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:47:50,035-Speed 2628.39 samples/sec Loss 9.6639 LearningRate 0.0519 Epoch: 5 Global Step: 231950 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:47:53,978-Speed 2597.34 samples/sec Loss 9.7980 LearningRate 0.0519 Epoch: 5 Global Step: 231960 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:47:57,886-Speed 2621.23 samples/sec Loss 9.8351 LearningRate 0.0519 Epoch: 5 Global Step: 231970 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:48:01,795-Speed 2619.91 samples/sec Loss 9.6150 LearningRate 0.0519 Epoch: 5 Global Step: 231980 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:48:05,697-Speed 2625.22 samples/sec Loss 9.6512 LearningRate 0.0519 Epoch: 5 Global Step: 231990 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:48:09,602-Speed 2622.54 samples/sec Loss 9.6359 LearningRate 0.0519 Epoch: 5 Global Step: 232000 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:48:13,569-Speed 2583.65 samples/sec Loss 9.7138 LearningRate 0.0519 Epoch: 5 Global Step: 232010 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:48:17,477-Speed 2620.34 samples/sec Loss 9.8355 LearningRate 0.0519 Epoch: 5 Global Step: 232020 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:48:21,391-Speed 2617.08 samples/sec Loss 9.7443 LearningRate 0.0519 Epoch: 5 Global Step: 232030 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:48:25,298-Speed 2621.61 samples/sec Loss 9.6662 LearningRate 0.0519 Epoch: 5 Global Step: 232040 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:48:29,199-Speed 2625.91 samples/sec Loss 9.7995 LearningRate 0.0519 Epoch: 5 Global Step: 232050 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:48:33,110-Speed 2618.64 samples/sec Loss 9.6479 LearningRate 0.0519 Epoch: 5 Global Step: 232060 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:48:37,140-Speed 2541.53 samples/sec Loss 9.5514 LearningRate 0.0519 Epoch: 5 Global Step: 232070 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:48:41,065-Speed 2609.68 samples/sec Loss 9.6364 LearningRate 0.0519 Epoch: 5 Global Step: 232080 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:48:44,969-Speed 2623.46 samples/sec Loss 9.8109 LearningRate 0.0519 Epoch: 5 Global Step: 232090 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:48:48,872-Speed 2624.26 samples/sec Loss 9.6854 LearningRate 0.0519 Epoch: 5 Global Step: 232100 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:48:52,776-Speed 2624.02 samples/sec Loss 9.7537 LearningRate 0.0519 Epoch: 5 Global Step: 232110 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:48:56,678-Speed 2624.80 samples/sec Loss 9.8796 LearningRate 0.0519 Epoch: 5 Global Step: 232120 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:49:00,573-Speed 2629.63 samples/sec Loss 9.6682 LearningRate 0.0519 Epoch: 5 Global Step: 232130 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:49:04,473-Speed 2626.64 samples/sec Loss 9.8155 LearningRate 0.0519 Epoch: 5 Global Step: 232140 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:49:08,391-Speed 2613.71 samples/sec Loss 9.7665 LearningRate 0.0519 Epoch: 5 Global Step: 232150 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:49:12,283-Speed 2631.96 samples/sec Loss 9.7417 LearningRate 0.0519 Epoch: 5 Global Step: 232160 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:49:16,191-Speed 2621.17 samples/sec Loss 9.8342 LearningRate 0.0519 Epoch: 5 Global Step: 232170 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:49:20,103-Speed 2618.35 samples/sec Loss 9.7386 LearningRate 0.0519 Epoch: 5 Global Step: 232180 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:49:24,022-Speed 2613.31 samples/sec Loss 9.7969 LearningRate 0.0519 Epoch: 5 Global Step: 232190 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:49:27,934-Speed 2618.73 samples/sec Loss 9.7117 LearningRate 0.0519 Epoch: 5 Global Step: 232200 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:49:31,843-Speed 2620.29 samples/sec Loss 9.6739 LearningRate 0.0519 Epoch: 5 Global Step: 232210 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:49:35,765-Speed 2611.53 samples/sec Loss 9.7795 LearningRate 0.0519 Epoch: 5 Global Step: 232220 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:49:39,672-Speed 2621.31 samples/sec Loss 9.8126 LearningRate 0.0518 Epoch: 5 Global Step: 232230 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:49:43,590-Speed 2614.74 samples/sec Loss 9.7087 LearningRate 0.0518 Epoch: 5 Global Step: 232240 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:49:47,501-Speed 2619.21 samples/sec Loss 9.6367 LearningRate 0.0518 Epoch: 5 Global Step: 232250 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:49:51,406-Speed 2622.61 samples/sec Loss 9.8581 LearningRate 0.0518 Epoch: 5 Global Step: 232260 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:49:55,323-Speed 2615.14 samples/sec Loss 9.8919 LearningRate 0.0518 Epoch: 5 Global Step: 232270 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:49:59,229-Speed 2629.44 samples/sec Loss 9.7216 LearningRate 0.0518 Epoch: 5 Global Step: 232280 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:50:03,138-Speed 2620.78 samples/sec Loss 9.5848 LearningRate 0.0518 Epoch: 5 Global Step: 232290 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:50:07,022-Speed 2636.83 samples/sec Loss 9.7468 LearningRate 0.0518 Epoch: 5 Global Step: 232300 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:50:10,931-Speed 2619.90 samples/sec Loss 9.6552 LearningRate 0.0518 Epoch: 5 Global Step: 232310 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:50:14,853-Speed 2633.69 samples/sec Loss 9.7744 LearningRate 0.0518 Epoch: 5 Global Step: 232320 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:50:18,752-Speed 2626.85 samples/sec Loss 9.6841 LearningRate 0.0518 Epoch: 5 Global Step: 232330 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:50:22,648-Speed 2629.26 samples/sec Loss 9.7162 LearningRate 0.0518 Epoch: 5 Global Step: 232340 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:50:26,579-Speed 2627.67 samples/sec Loss 9.7553 LearningRate 0.0518 Epoch: 5 Global Step: 232350 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:50:30,473-Speed 2630.02 samples/sec Loss 9.7059 LearningRate 0.0518 Epoch: 5 Global Step: 232360 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:50:34,518-Speed 2610.45 samples/sec Loss 9.8521 LearningRate 0.0518 Epoch: 5 Global Step: 232370 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:50:38,399-Speed 2639.65 samples/sec Loss 9.7351 LearningRate 0.0518 Epoch: 5 Global Step: 232380 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:50:42,295-Speed 2628.92 samples/sec Loss 9.5892 LearningRate 0.0518 Epoch: 5 Global Step: 232390 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:50:46,192-Speed 2628.21 samples/sec Loss 9.7444 LearningRate 0.0518 Epoch: 5 Global Step: 232400 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:50:50,386-Speed 2624.31 samples/sec Loss 9.7410 LearningRate 0.0518 Epoch: 5 Global Step: 232410 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:50:54,282-Speed 2628.89 samples/sec Loss 9.7363 LearningRate 0.0518 Epoch: 5 Global Step: 232420 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:50:58,193-Speed 2618.75 samples/sec Loss 9.5443 LearningRate 0.0518 Epoch: 5 Global Step: 232430 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:51:02,107-Speed 2617.13 samples/sec Loss 9.8728 LearningRate 0.0518 Epoch: 5 Global Step: 232440 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:51:06,007-Speed 2626.33 samples/sec Loss 9.5993 LearningRate 0.0518 Epoch: 5 Global Step: 232450 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:51:09,918-Speed 2619.04 samples/sec Loss 9.6884 LearningRate 0.0518 Epoch: 5 Global Step: 232460 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:51:13,813-Speed 2629.04 samples/sec Loss 9.6427 LearningRate 0.0518 Epoch: 5 Global Step: 232470 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:51:17,709-Speed 2628.93 samples/sec Loss 9.6900 LearningRate 0.0518 Epoch: 5 Global Step: 232480 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:51:21,608-Speed 2627.49 samples/sec Loss 9.7604 LearningRate 0.0518 Epoch: 5 Global Step: 232490 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:51:25,506-Speed 2627.34 samples/sec Loss 9.6651 LearningRate 0.0518 Epoch: 5 Global Step: 232500 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:51:29,411-Speed 2622.64 samples/sec Loss 9.6750 LearningRate 0.0518 Epoch: 5 Global Step: 232510 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:51:33,309-Speed 2628.61 samples/sec Loss 9.7617 LearningRate 0.0518 Epoch: 5 Global Step: 232520 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:51:37,218-Speed 2620.32 samples/sec Loss 9.6899 LearningRate 0.0518 Epoch: 5 Global Step: 232530 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:51:41,115-Speed 2627.94 samples/sec Loss 9.7294 LearningRate 0.0518 Epoch: 5 Global Step: 232540 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:51:45,017-Speed 2625.26 samples/sec Loss 9.8289 LearningRate 0.0518 Epoch: 5 Global Step: 232550 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:51:48,917-Speed 2625.61 samples/sec Loss 9.6832 LearningRate 0.0518 Epoch: 5 Global Step: 232560 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:51:52,815-Speed 2627.83 samples/sec Loss 9.7667 LearningRate 0.0518 Epoch: 5 Global Step: 232570 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:51:56,715-Speed 2626.54 samples/sec Loss 9.8103 LearningRate 0.0518 Epoch: 5 Global Step: 232580 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:52:00,617-Speed 2625.39 samples/sec Loss 9.7692 LearningRate 0.0518 Epoch: 5 Global Step: 232590 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:52:04,513-Speed 2628.43 samples/sec Loss 9.8072 LearningRate 0.0518 Epoch: 5 Global Step: 232600 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:52:08,420-Speed 2622.27 samples/sec Loss 9.8087 LearningRate 0.0518 Epoch: 5 Global Step: 232610 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:52:12,337-Speed 2614.78 samples/sec Loss 9.8579 LearningRate 0.0518 Epoch: 5 Global Step: 232620 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:52:16,238-Speed 2625.78 samples/sec Loss 9.9213 LearningRate 0.0518 Epoch: 5 Global Step: 232630 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:52:20,152-Speed 2616.16 samples/sec Loss 9.7420 LearningRate 0.0518 Epoch: 5 Global Step: 232640 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:52:24,034-Speed 2638.81 samples/sec Loss 9.7674 LearningRate 0.0518 Epoch: 5 Global Step: 232650 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:52:27,935-Speed 2625.76 samples/sec Loss 9.7215 LearningRate 0.0518 Epoch: 5 Global Step: 232660 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:52:31,837-Speed 2625.02 samples/sec Loss 9.7328 LearningRate 0.0518 Epoch: 5 Global Step: 232670 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:52:35,737-Speed 2626.81 samples/sec Loss 9.6441 LearningRate 0.0518 Epoch: 5 Global Step: 232680 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:52:39,638-Speed 2625.65 samples/sec Loss 9.7006 LearningRate 0.0518 Epoch: 5 Global Step: 232690 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:52:43,617-Speed 2574.15 samples/sec Loss 9.7523 LearningRate 0.0518 Epoch: 5 Global Step: 232700 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:52:47,533-Speed 2624.55 samples/sec Loss 9.7486 LearningRate 0.0518 Epoch: 5 Global Step: 232710 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:52:51,452-Speed 2613.67 samples/sec Loss 9.6615 LearningRate 0.0518 Epoch: 5 Global Step: 232720 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:52:55,354-Speed 2624.78 samples/sec Loss 9.8118 LearningRate 0.0518 Epoch: 5 Global Step: 232730 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:52:59,262-Speed 2621.17 samples/sec Loss 9.7848 LearningRate 0.0518 Epoch: 5 Global Step: 232740 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:53:03,180-Speed 2613.93 samples/sec Loss 9.9054 LearningRate 0.0518 Epoch: 5 Global Step: 232750 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:53:07,175-Speed 2563.62 samples/sec Loss 9.6542 LearningRate 0.0518 Epoch: 5 Global Step: 232760 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:53:11,082-Speed 2622.24 samples/sec Loss 9.7388 LearningRate 0.0518 Epoch: 5 Global Step: 232770 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:53:15,102-Speed 2547.72 samples/sec Loss 9.8142 LearningRate 0.0518 Epoch: 5 Global Step: 232780 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:53:19,003-Speed 2625.01 samples/sec Loss 9.6755 LearningRate 0.0518 Epoch: 5 Global Step: 232790 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:53:22,903-Speed 2627.17 samples/sec Loss 9.7317 LearningRate 0.0518 Epoch: 5 Global Step: 232800 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:53:26,799-Speed 2628.57 samples/sec Loss 9.7418 LearningRate 0.0517 Epoch: 5 Global Step: 232810 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:53:30,699-Speed 2626.83 samples/sec Loss 9.7311 LearningRate 0.0517 Epoch: 5 Global Step: 232820 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:53:34,601-Speed 2624.94 samples/sec Loss 9.7169 LearningRate 0.0517 Epoch: 5 Global Step: 232830 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:53:38,500-Speed 2626.91 samples/sec Loss 9.7494 LearningRate 0.0517 Epoch: 5 Global Step: 232840 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:53:42,412-Speed 2617.59 samples/sec Loss 9.7701 LearningRate 0.0517 Epoch: 5 Global Step: 232850 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:53:46,310-Speed 2628.61 samples/sec Loss 9.7571 LearningRate 0.0517 Epoch: 5 Global Step: 232860 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:53:50,206-Speed 2628.69 samples/sec Loss 9.7013 LearningRate 0.0517 Epoch: 5 Global Step: 232870 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:53:54,111-Speed 2622.65 samples/sec Loss 9.6785 LearningRate 0.0517 Epoch: 5 Global Step: 232880 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:53:57,990-Speed 2640.86 samples/sec Loss 9.6876 LearningRate 0.0517 Epoch: 5 Global Step: 232890 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:54:01,891-Speed 2625.91 samples/sec Loss 9.8004 LearningRate 0.0517 Epoch: 5 Global Step: 232900 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:54:05,803-Speed 2617.84 samples/sec Loss 9.8267 LearningRate 0.0517 Epoch: 5 Global Step: 232910 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:54:09,713-Speed 2619.69 samples/sec Loss 9.7475 LearningRate 0.0517 Epoch: 5 Global Step: 232920 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:54:13,615-Speed 2624.63 samples/sec Loss 9.7006 LearningRate 0.0517 Epoch: 5 Global Step: 232930 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:54:17,520-Speed 2623.63 samples/sec Loss 9.8258 LearningRate 0.0517 Epoch: 5 Global Step: 232940 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:54:21,420-Speed 2626.58 samples/sec Loss 9.8286 LearningRate 0.0517 Epoch: 5 Global Step: 232950 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:54:25,328-Speed 2620.68 samples/sec Loss 9.7703 LearningRate 0.0517 Epoch: 5 Global Step: 232960 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:54:29,338-Speed 2554.39 samples/sec Loss 9.6291 LearningRate 0.0517 Epoch: 5 Global Step: 232970 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:54:33,260-Speed 2611.09 samples/sec Loss 9.7009 LearningRate 0.0517 Epoch: 5 Global Step: 232980 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:54:37,170-Speed 2619.64 samples/sec Loss 9.6802 LearningRate 0.0517 Epoch: 5 Global Step: 232990 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:54:41,078-Speed 2621.15 samples/sec Loss 9.8060 LearningRate 0.0517 Epoch: 5 Global Step: 233000 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:54:44,984-Speed 2622.65 samples/sec Loss 9.8255 LearningRate 0.0517 Epoch: 5 Global Step: 233010 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:54:48,884-Speed 2626.48 samples/sec Loss 9.8890 LearningRate 0.0517 Epoch: 5 Global Step: 233020 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:54:52,786-Speed 2625.29 samples/sec Loss 9.6707 LearningRate 0.0517 Epoch: 5 Global Step: 233030 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:54:56,796-Speed 2553.74 samples/sec Loss 9.8292 LearningRate 0.0517 Epoch: 5 Global Step: 233040 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:55:00,847-Speed 2528.67 samples/sec Loss 9.7821 LearningRate 0.0517 Epoch: 5 Global Step: 233050 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:55:04,750-Speed 2624.23 samples/sec Loss 9.7041 LearningRate 0.0517 Epoch: 5 Global Step: 233060 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:55:08,655-Speed 2622.89 samples/sec Loss 9.7532 LearningRate 0.0517 Epoch: 5 Global Step: 233070 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:55:12,552-Speed 2627.93 samples/sec Loss 9.7795 LearningRate 0.0517 Epoch: 5 Global Step: 233080 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:55:16,436-Speed 2637.69 samples/sec Loss 9.6811 LearningRate 0.0517 Epoch: 5 Global Step: 233090 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:55:20,339-Speed 2624.24 samples/sec Loss 9.6263 LearningRate 0.0517 Epoch: 5 Global Step: 233100 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:55:24,273-Speed 2603.93 samples/sec Loss 9.7415 LearningRate 0.0517 Epoch: 5 Global Step: 233110 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:55:28,156-Speed 2638.09 samples/sec Loss 9.7885 LearningRate 0.0517 Epoch: 5 Global Step: 233120 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:55:32,066-Speed 2620.01 samples/sec Loss 9.7925 LearningRate 0.0517 Epoch: 5 Global Step: 233130 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:55:35,970-Speed 2623.41 samples/sec Loss 9.8500 LearningRate 0.0517 Epoch: 5 Global Step: 233140 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:55:39,872-Speed 2624.83 samples/sec Loss 9.7569 LearningRate 0.0517 Epoch: 5 Global Step: 233150 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:55:43,770-Speed 2627.44 samples/sec Loss 9.7345 LearningRate 0.0517 Epoch: 5 Global Step: 233160 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:55:47,670-Speed 2626.28 samples/sec Loss 9.8273 LearningRate 0.0517 Epoch: 5 Global Step: 233170 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:55:51,572-Speed 2625.44 samples/sec Loss 9.9768 LearningRate 0.0517 Epoch: 5 Global Step: 233180 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:55:55,484-Speed 2618.03 samples/sec Loss 9.7270 LearningRate 0.0517 Epoch: 5 Global Step: 233190 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:55:59,382-Speed 2627.62 samples/sec Loss 9.6823 LearningRate 0.0517 Epoch: 5 Global Step: 233200 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:56:03,284-Speed 2625.09 samples/sec Loss 9.7188 LearningRate 0.0517 Epoch: 5 Global Step: 233210 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:56:07,172-Speed 2634.37 samples/sec Loss 9.7304 LearningRate 0.0517 Epoch: 5 Global Step: 233220 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:56:11,069-Speed 2627.77 samples/sec Loss 9.7133 LearningRate 0.0517 Epoch: 5 Global Step: 233230 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:56:15,001-Speed 2605.44 samples/sec Loss 9.7049 LearningRate 0.0517 Epoch: 5 Global Step: 233240 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:56:18,899-Speed 2627.05 samples/sec Loss 9.8155 LearningRate 0.0517 Epoch: 5 Global Step: 233250 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:56:22,810-Speed 2619.47 samples/sec Loss 9.8673 LearningRate 0.0517 Epoch: 5 Global Step: 233260 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:56:26,711-Speed 2625.83 samples/sec Loss 9.7987 LearningRate 0.0517 Epoch: 5 Global Step: 233270 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:56:30,627-Speed 2615.15 samples/sec Loss 9.6865 LearningRate 0.0517 Epoch: 5 Global Step: 233280 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:56:34,524-Speed 2628.71 samples/sec Loss 9.7004 LearningRate 0.0517 Epoch: 5 Global Step: 233290 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:56:38,428-Speed 2623.27 samples/sec Loss 9.7518 LearningRate 0.0517 Epoch: 5 Global Step: 233300 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:56:42,332-Speed 2624.04 samples/sec Loss 9.7403 LearningRate 0.0517 Epoch: 5 Global Step: 233310 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:56:46,231-Speed 2626.77 samples/sec Loss 9.7040 LearningRate 0.0517 Epoch: 5 Global Step: 233320 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:56:50,132-Speed 2625.44 samples/sec Loss 9.7062 LearningRate 0.0517 Epoch: 5 Global Step: 233330 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:56:54,038-Speed 2621.87 samples/sec Loss 9.7217 LearningRate 0.0517 Epoch: 5 Global Step: 233340 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:56:57,947-Speed 2620.51 samples/sec Loss 9.6391 LearningRate 0.0517 Epoch: 5 Global Step: 233350 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:57:01,867-Speed 2612.75 samples/sec Loss 9.7192 LearningRate 0.0517 Epoch: 5 Global Step: 233360 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:57:05,765-Speed 2627.58 samples/sec Loss 9.8140 LearningRate 0.0517 Epoch: 5 Global Step: 233370 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:57:09,649-Speed 2637.82 samples/sec Loss 9.7906 LearningRate 0.0516 Epoch: 5 Global Step: 233380 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:57:13,550-Speed 2625.65 samples/sec Loss 9.7835 LearningRate 0.0516 Epoch: 5 Global Step: 233390 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:57:17,453-Speed 2624.14 samples/sec Loss 9.7347 LearningRate 0.0516 Epoch: 5 Global Step: 233400 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:57:21,382-Speed 2606.98 samples/sec Loss 9.7432 LearningRate 0.0516 Epoch: 5 Global Step: 233410 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:57:25,302-Speed 2612.65 samples/sec Loss 9.6952 LearningRate 0.0516 Epoch: 5 Global Step: 233420 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:57:29,214-Speed 2618.09 samples/sec Loss 9.8663 LearningRate 0.0516 Epoch: 5 Global Step: 233430 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:57:33,125-Speed 2618.85 samples/sec Loss 9.6149 LearningRate 0.0516 Epoch: 5 Global Step: 233440 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:57:37,012-Speed 2634.98 samples/sec Loss 9.6401 LearningRate 0.0516 Epoch: 5 Global Step: 233450 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:57:40,916-Speed 2624.00 samples/sec Loss 9.6578 LearningRate 0.0516 Epoch: 5 Global Step: 233460 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:57:44,823-Speed 2621.39 samples/sec Loss 9.7915 LearningRate 0.0516 Epoch: 5 Global Step: 233470 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:57:48,736-Speed 2617.57 samples/sec Loss 9.7008 LearningRate 0.0516 Epoch: 5 Global Step: 233480 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:57:52,638-Speed 2624.69 samples/sec Loss 9.6981 LearningRate 0.0516 Epoch: 5 Global Step: 233490 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:57:56,549-Speed 2619.08 samples/sec Loss 9.7230 LearningRate 0.0516 Epoch: 5 Global Step: 233500 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:58:00,458-Speed 2620.23 samples/sec Loss 9.8299 LearningRate 0.0516 Epoch: 5 Global Step: 233510 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:58:04,545-Speed 2506.09 samples/sec Loss 9.6847 LearningRate 0.0516 Epoch: 5 Global Step: 233520 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:58:08,490-Speed 2596.37 samples/sec Loss 9.6998 LearningRate 0.0516 Epoch: 5 Global Step: 233530 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:58:12,395-Speed 2623.54 samples/sec Loss 9.7523 LearningRate 0.0516 Epoch: 5 Global Step: 233540 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 21:58:16,311-Speed 2615.14 samples/sec Loss 9.6691 LearningRate 0.0516 Epoch: 5 Global Step: 233550 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:58:20,216-Speed 2623.05 samples/sec Loss 9.6761 LearningRate 0.0516 Epoch: 5 Global Step: 233560 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:58:24,120-Speed 2623.45 samples/sec Loss 9.6091 LearningRate 0.0516 Epoch: 5 Global Step: 233570 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:58:28,033-Speed 2618.27 samples/sec Loss 9.9069 LearningRate 0.0516 Epoch: 5 Global Step: 233580 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:58:31,939-Speed 2622.38 samples/sec Loss 9.8050 LearningRate 0.0516 Epoch: 5 Global Step: 233590 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:58:35,842-Speed 2623.84 samples/sec Loss 9.7090 LearningRate 0.0516 Epoch: 5 Global Step: 233600 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:58:39,744-Speed 2624.75 samples/sec Loss 9.7013 LearningRate 0.0516 Epoch: 5 Global Step: 233610 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:58:43,647-Speed 2624.44 samples/sec Loss 9.7485 LearningRate 0.0516 Epoch: 5 Global Step: 233620 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:58:47,548-Speed 2625.49 samples/sec Loss 9.7105 LearningRate 0.0516 Epoch: 5 Global Step: 233630 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:58:51,451-Speed 2624.62 samples/sec Loss 9.7282 LearningRate 0.0516 Epoch: 5 Global Step: 233640 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:58:55,353-Speed 2624.70 samples/sec Loss 9.7370 LearningRate 0.0516 Epoch: 5 Global Step: 233650 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:58:59,267-Speed 2617.18 samples/sec Loss 9.7608 LearningRate 0.0516 Epoch: 5 Global Step: 233660 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:59:03,176-Speed 2620.10 samples/sec Loss 9.7230 LearningRate 0.0516 Epoch: 5 Global Step: 233670 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:59:07,082-Speed 2622.12 samples/sec Loss 9.6919 LearningRate 0.0516 Epoch: 5 Global Step: 233680 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:59:10,983-Speed 2625.84 samples/sec Loss 9.6308 LearningRate 0.0516 Epoch: 5 Global Step: 233690 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:59:14,881-Speed 2627.88 samples/sec Loss 9.8770 LearningRate 0.0516 Epoch: 5 Global Step: 233700 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:59:18,782-Speed 2625.54 samples/sec Loss 9.7603 LearningRate 0.0516 Epoch: 5 Global Step: 233710 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:59:22,691-Speed 2620.39 samples/sec Loss 9.6441 LearningRate 0.0516 Epoch: 5 Global Step: 233720 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:59:26,589-Speed 2627.29 samples/sec Loss 9.7212 LearningRate 0.0516 Epoch: 5 Global Step: 233730 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:59:30,487-Speed 2627.76 samples/sec Loss 9.7339 LearningRate 0.0516 Epoch: 5 Global Step: 233740 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:59:34,378-Speed 2632.77 samples/sec Loss 9.7961 LearningRate 0.0516 Epoch: 5 Global Step: 233750 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:59:38,298-Speed 2612.41 samples/sec Loss 9.8724 LearningRate 0.0516 Epoch: 5 Global Step: 233760 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:59:42,200-Speed 2624.54 samples/sec Loss 9.8087 LearningRate 0.0516 Epoch: 5 Global Step: 233770 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:59:46,110-Speed 2619.70 samples/sec Loss 9.8057 LearningRate 0.0516 Epoch: 5 Global Step: 233780 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:59:50,012-Speed 2624.81 samples/sec Loss 9.7914 LearningRate 0.0516 Epoch: 5 Global Step: 233790 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 21:59:53,949-Speed 2602.24 samples/sec Loss 9.7257 LearningRate 0.0516 Epoch: 5 Global Step: 233800 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 21:59:57,853-Speed 2622.93 samples/sec Loss 9.7371 LearningRate 0.0516 Epoch: 5 Global Step: 233810 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:00:01,771-Speed 2615.03 samples/sec Loss 9.7383 LearningRate 0.0516 Epoch: 5 Global Step: 233820 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:00:05,669-Speed 2627.25 samples/sec Loss 9.6887 LearningRate 0.0516 Epoch: 5 Global Step: 233830 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:00:09,574-Speed 2622.69 samples/sec Loss 9.6303 LearningRate 0.0516 Epoch: 5 Global Step: 233840 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:00:13,476-Speed 2625.06 samples/sec Loss 9.6508 LearningRate 0.0516 Epoch: 5 Global Step: 233850 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:00:17,376-Speed 2626.21 samples/sec Loss 9.7708 LearningRate 0.0516 Epoch: 5 Global Step: 233860 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:00:21,277-Speed 2625.37 samples/sec Loss 9.6842 LearningRate 0.0516 Epoch: 5 Global Step: 233870 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:00:25,185-Speed 2621.23 samples/sec Loss 9.6681 LearningRate 0.0516 Epoch: 5 Global Step: 233880 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:00:29,100-Speed 2616.17 samples/sec Loss 9.5982 LearningRate 0.0516 Epoch: 5 Global Step: 233890 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:00:33,003-Speed 2624.76 samples/sec Loss 9.6731 LearningRate 0.0516 Epoch: 5 Global Step: 233900 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:00:36,900-Speed 2627.61 samples/sec Loss 9.7050 LearningRate 0.0516 Epoch: 5 Global Step: 233910 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:00:40,803-Speed 2624.42 samples/sec Loss 9.6333 LearningRate 0.0516 Epoch: 5 Global Step: 233920 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:00:44,738-Speed 2603.00 samples/sec Loss 9.6948 LearningRate 0.0516 Epoch: 5 Global Step: 233930 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:00:48,664-Speed 2608.77 samples/sec Loss 9.6118 LearningRate 0.0516 Epoch: 5 Global Step: 233940 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:00:52,566-Speed 2625.52 samples/sec Loss 9.8251 LearningRate 0.0516 Epoch: 5 Global Step: 233950 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:00:56,468-Speed 2624.32 samples/sec Loss 9.6850 LearningRate 0.0515 Epoch: 5 Global Step: 233960 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:01:00,374-Speed 2622.56 samples/sec Loss 9.6183 LearningRate 0.0515 Epoch: 5 Global Step: 233970 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:01:04,296-Speed 2611.22 samples/sec Loss 9.7376 LearningRate 0.0515 Epoch: 5 Global Step: 233980 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:01:08,205-Speed 2620.56 samples/sec Loss 9.7863 LearningRate 0.0515 Epoch: 5 Global Step: 233990 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:01:12,110-Speed 2622.70 samples/sec Loss 9.7109 LearningRate 0.0515 Epoch: 5 Global Step: 234000 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:01:16,009-Speed 2626.96 samples/sec Loss 9.7741 LearningRate 0.0515 Epoch: 5 Global Step: 234010 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:01:19,910-Speed 2625.91 samples/sec Loss 9.6156 LearningRate 0.0515 Epoch: 5 Global Step: 234020 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:01:23,810-Speed 2626.12 samples/sec Loss 9.7286 LearningRate 0.0515 Epoch: 5 Global Step: 234030 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:01:27,710-Speed 2626.29 samples/sec Loss 9.7384 LearningRate 0.0515 Epoch: 5 Global Step: 234040 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:01:31,610-Speed 2626.03 samples/sec Loss 9.8157 LearningRate 0.0515 Epoch: 5 Global Step: 234050 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:01:35,523-Speed 2617.44 samples/sec Loss 9.7646 LearningRate 0.0515 Epoch: 5 Global Step: 234060 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:01:39,415-Speed 2632.06 samples/sec Loss 9.8225 LearningRate 0.0515 Epoch: 5 Global Step: 234070 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:01:43,305-Speed 2632.42 samples/sec Loss 9.8607 LearningRate 0.0515 Epoch: 5 Global Step: 234080 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:01:47,209-Speed 2623.94 samples/sec Loss 9.6685 LearningRate 0.0515 Epoch: 5 Global Step: 234090 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:01:51,114-Speed 2622.95 samples/sec Loss 9.6380 LearningRate 0.0515 Epoch: 5 Global Step: 234100 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:01:55,014-Speed 2626.24 samples/sec Loss 9.6690 LearningRate 0.0515 Epoch: 5 Global Step: 234110 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:01:58,913-Speed 2627.00 samples/sec Loss 9.7547 LearningRate 0.0515 Epoch: 5 Global Step: 234120 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:02:02,834-Speed 2613.01 samples/sec Loss 9.7151 LearningRate 0.0515 Epoch: 5 Global Step: 234130 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:02:06,732-Speed 2627.24 samples/sec Loss 9.8126 LearningRate 0.0515 Epoch: 5 Global Step: 234140 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:02:10,636-Speed 2623.57 samples/sec Loss 9.7886 LearningRate 0.0515 Epoch: 5 Global Step: 234150 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:02:14,546-Speed 2619.44 samples/sec Loss 9.7274 LearningRate 0.0515 Epoch: 5 Global Step: 234160 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:02:18,450-Speed 2623.82 samples/sec Loss 9.8461 LearningRate 0.0515 Epoch: 5 Global Step: 234170 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:02:22,352-Speed 2625.62 samples/sec Loss 9.7313 LearningRate 0.0515 Epoch: 5 Global Step: 234180 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:02:26,257-Speed 2623.21 samples/sec Loss 9.6920 LearningRate 0.0515 Epoch: 5 Global Step: 234190 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:02:30,169-Speed 2618.02 samples/sec Loss 9.8085 LearningRate 0.0515 Epoch: 5 Global Step: 234200 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:02:34,096-Speed 2608.78 samples/sec Loss 9.6495 LearningRate 0.0515 Epoch: 5 Global Step: 234210 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:02:38,011-Speed 2615.80 samples/sec Loss 9.6200 LearningRate 0.0515 Epoch: 5 Global Step: 234220 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:02:41,914-Speed 2624.94 samples/sec Loss 9.7383 LearningRate 0.0515 Epoch: 5 Global Step: 234230 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:02:45,816-Speed 2624.45 samples/sec Loss 9.7445 LearningRate 0.0515 Epoch: 5 Global Step: 234240 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:02:49,721-Speed 2622.68 samples/sec Loss 9.7992 LearningRate 0.0515 Epoch: 5 Global Step: 234250 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:02:53,619-Speed 2627.39 samples/sec Loss 9.6189 LearningRate 0.0515 Epoch: 5 Global Step: 234260 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:02:57,522-Speed 2624.46 samples/sec Loss 9.7324 LearningRate 0.0515 Epoch: 5 Global Step: 234270 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:03:01,423-Speed 2625.39 samples/sec Loss 9.6063 LearningRate 0.0515 Epoch: 5 Global Step: 234280 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:03:05,326-Speed 2624.99 samples/sec Loss 9.6588 LearningRate 0.0515 Epoch: 5 Global Step: 234290 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:03:09,226-Speed 2625.87 samples/sec Loss 9.7344 LearningRate 0.0515 Epoch: 5 Global Step: 234300 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:03:13,128-Speed 2625.48 samples/sec Loss 9.7507 LearningRate 0.0515 Epoch: 5 Global Step: 234310 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:03:17,032-Speed 2623.37 samples/sec Loss 9.6607 LearningRate 0.0515 Epoch: 5 Global Step: 234320 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:03:20,934-Speed 2624.92 samples/sec Loss 9.7709 LearningRate 0.0515 Epoch: 5 Global Step: 234330 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:03:24,958-Speed 2545.14 samples/sec Loss 9.8205 LearningRate 0.0515 Epoch: 5 Global Step: 234340 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:03:29,051-Speed 2502.02 samples/sec Loss 9.6415 LearningRate 0.0515 Epoch: 5 Global Step: 234350 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:03:33,146-Speed 2501.70 samples/sec Loss 9.7782 LearningRate 0.0515 Epoch: 5 Global Step: 234360 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:03:37,185-Speed 2535.90 samples/sec Loss 9.7401 LearningRate 0.0515 Epoch: 5 Global Step: 234370 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:03:41,117-Speed 2605.16 samples/sec Loss 9.6131 LearningRate 0.0515 Epoch: 5 Global Step: 234380 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:03:45,032-Speed 2616.04 samples/sec Loss 9.7507 LearningRate 0.0515 Epoch: 5 Global Step: 234390 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:03:48,940-Speed 2620.86 samples/sec Loss 9.7706 LearningRate 0.0515 Epoch: 5 Global Step: 234400 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:03:52,905-Speed 2583.48 samples/sec Loss 9.5920 LearningRate 0.0515 Epoch: 5 Global Step: 234410 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:03:56,843-Speed 2600.96 samples/sec Loss 9.6953 LearningRate 0.0515 Epoch: 5 Global Step: 234420 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:04:00,744-Speed 2625.02 samples/sec Loss 9.7451 LearningRate 0.0515 Epoch: 5 Global Step: 234430 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:04:04,641-Speed 2629.24 samples/sec Loss 9.7459 LearningRate 0.0515 Epoch: 5 Global Step: 234440 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:04:08,551-Speed 2619.35 samples/sec Loss 9.6761 LearningRate 0.0515 Epoch: 5 Global Step: 234450 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:04:12,457-Speed 2622.31 samples/sec Loss 9.6921 LearningRate 0.0515 Epoch: 5 Global Step: 234460 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:04:16,356-Speed 2627.15 samples/sec Loss 9.7441 LearningRate 0.0515 Epoch: 5 Global Step: 234470 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:04:20,260-Speed 2623.38 samples/sec Loss 9.6468 LearningRate 0.0515 Epoch: 5 Global Step: 234480 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:04:24,159-Speed 2626.45 samples/sec Loss 9.6106 LearningRate 0.0515 Epoch: 5 Global Step: 234490 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:04:28,074-Speed 2616.59 samples/sec Loss 9.6778 LearningRate 0.0515 Epoch: 5 Global Step: 234500 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:04:31,990-Speed 2615.71 samples/sec Loss 9.6197 LearningRate 0.0515 Epoch: 5 Global Step: 234510 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:04:35,982-Speed 2565.56 samples/sec Loss 9.6722 LearningRate 0.0515 Epoch: 5 Global Step: 234520 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:04:39,937-Speed 2589.98 samples/sec Loss 9.7275 LearningRate 0.0515 Epoch: 5 Global Step: 234530 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:04:43,858-Speed 2612.81 samples/sec Loss 9.7165 LearningRate 0.0514 Epoch: 5 Global Step: 234540 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:04:47,762-Speed 2623.13 samples/sec Loss 9.8305 LearningRate 0.0514 Epoch: 5 Global Step: 234550 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:04:51,661-Speed 2627.10 samples/sec Loss 9.7353 LearningRate 0.0514 Epoch: 5 Global Step: 234560 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:04:55,537-Speed 2642.63 samples/sec Loss 9.7669 LearningRate 0.0514 Epoch: 5 Global Step: 234570 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:04:59,459-Speed 2611.39 samples/sec Loss 9.6030 LearningRate 0.0514 Epoch: 5 Global Step: 234580 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:05:03,358-Speed 2626.76 samples/sec Loss 9.6305 LearningRate 0.0514 Epoch: 5 Global Step: 234590 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:05:07,240-Speed 2638.10 samples/sec Loss 9.5646 LearningRate 0.0514 Epoch: 5 Global Step: 234600 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:05:11,138-Speed 2627.40 samples/sec Loss 9.8047 LearningRate 0.0514 Epoch: 5 Global Step: 234610 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:05:15,046-Speed 2621.39 samples/sec Loss 9.5724 LearningRate 0.0514 Epoch: 5 Global Step: 234620 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:05:18,955-Speed 2620.57 samples/sec Loss 9.6911 LearningRate 0.0514 Epoch: 5 Global Step: 234630 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:05:22,868-Speed 2617.39 samples/sec Loss 9.6458 LearningRate 0.0514 Epoch: 5 Global Step: 234640 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:05:26,772-Speed 2624.07 samples/sec Loss 9.5640 LearningRate 0.0514 Epoch: 5 Global Step: 234650 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:05:30,672-Speed 2625.80 samples/sec Loss 9.6515 LearningRate 0.0514 Epoch: 5 Global Step: 234660 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:05:34,587-Speed 2616.18 samples/sec Loss 9.7988 LearningRate 0.0514 Epoch: 5 Global Step: 234670 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:05:38,502-Speed 2615.75 samples/sec Loss 9.6617 LearningRate 0.0514 Epoch: 5 Global Step: 234680 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:05:42,401-Speed 2627.21 samples/sec Loss 9.7477 LearningRate 0.0514 Epoch: 5 Global Step: 234690 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:05:46,282-Speed 2639.11 samples/sec Loss 9.6757 LearningRate 0.0514 Epoch: 5 Global Step: 234700 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:05:50,197-Speed 2616.86 samples/sec Loss 9.7726 LearningRate 0.0514 Epoch: 5 Global Step: 234710 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:05:54,105-Speed 2620.15 samples/sec Loss 9.6646 LearningRate 0.0514 Epoch: 5 Global Step: 234720 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:05:58,016-Speed 2619.48 samples/sec Loss 9.5845 LearningRate 0.0514 Epoch: 5 Global Step: 234730 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:06:01,919-Speed 2623.62 samples/sec Loss 9.6412 LearningRate 0.0514 Epoch: 5 Global Step: 234740 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:06:05,831-Speed 2618.21 samples/sec Loss 9.8052 LearningRate 0.0514 Epoch: 5 Global Step: 234750 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:06:09,744-Speed 2617.66 samples/sec Loss 9.6234 LearningRate 0.0514 Epoch: 5 Global Step: 234760 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:06:13,656-Speed 2618.47 samples/sec Loss 9.7499 LearningRate 0.0514 Epoch: 5 Global Step: 234770 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:06:17,566-Speed 2620.04 samples/sec Loss 9.8322 LearningRate 0.0514 Epoch: 5 Global Step: 234780 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:06:21,481-Speed 2616.12 samples/sec Loss 9.7636 LearningRate 0.0514 Epoch: 5 Global Step: 234790 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:06:25,379-Speed 2627.97 samples/sec Loss 9.7988 LearningRate 0.0514 Epoch: 5 Global Step: 234800 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:06:29,267-Speed 2634.72 samples/sec Loss 9.6107 LearningRate 0.0514 Epoch: 5 Global Step: 234810 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:06:33,165-Speed 2627.41 samples/sec Loss 9.6638 LearningRate 0.0514 Epoch: 5 Global Step: 234820 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:06:37,064-Speed 2626.80 samples/sec Loss 9.6293 LearningRate 0.0514 Epoch: 5 Global Step: 234830 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:06:40,961-Speed 2628.74 samples/sec Loss 9.7686 LearningRate 0.0514 Epoch: 5 Global Step: 234840 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:06:44,862-Speed 2625.34 samples/sec Loss 9.8022 LearningRate 0.0514 Epoch: 5 Global Step: 234850 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:06:48,774-Speed 2618.91 samples/sec Loss 9.7111 LearningRate 0.0514 Epoch: 5 Global Step: 234860 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:06:52,671-Speed 2627.74 samples/sec Loss 9.6136 LearningRate 0.0514 Epoch: 5 Global Step: 234870 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:06:56,571-Speed 2626.54 samples/sec Loss 9.7552 LearningRate 0.0514 Epoch: 5 Global Step: 234880 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:07:00,471-Speed 2626.53 samples/sec Loss 9.7458 LearningRate 0.0514 Epoch: 5 Global Step: 234890 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:07:04,385-Speed 2616.39 samples/sec Loss 9.7816 LearningRate 0.0514 Epoch: 5 Global Step: 234900 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:07:08,282-Speed 2627.89 samples/sec Loss 9.8849 LearningRate 0.0514 Epoch: 5 Global Step: 234910 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:07:12,189-Speed 2622.58 samples/sec Loss 9.7079 LearningRate 0.0514 Epoch: 5 Global Step: 234920 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:07:16,087-Speed 2627.15 samples/sec Loss 9.6327 LearningRate 0.0514 Epoch: 5 Global Step: 234930 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:07:20,019-Speed 2605.30 samples/sec Loss 9.6362 LearningRate 0.0514 Epoch: 5 Global Step: 234940 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:07:23,917-Speed 2627.62 samples/sec Loss 9.6131 LearningRate 0.0514 Epoch: 5 Global Step: 234950 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:07:27,821-Speed 2624.28 samples/sec Loss 9.7220 LearningRate 0.0514 Epoch: 5 Global Step: 234960 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:07:31,718-Speed 2628.35 samples/sec Loss 9.7012 LearningRate 0.0514 Epoch: 5 Global Step: 234970 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:07:35,614-Speed 2628.46 samples/sec Loss 9.6241 LearningRate 0.0514 Epoch: 5 Global Step: 234980 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:07:39,512-Speed 2628.13 samples/sec Loss 9.6818 LearningRate 0.0514 Epoch: 5 Global Step: 234990 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:07:43,409-Speed 2628.31 samples/sec Loss 9.5934 LearningRate 0.0514 Epoch: 5 Global Step: 235000 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:07:47,262-Speed 2658.57 samples/sec Loss 9.6311 LearningRate 0.0514 Epoch: 5 Global Step: 235010 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:07:51,160-Speed 2627.86 samples/sec Loss 9.8216 LearningRate 0.0514 Epoch: 5 Global Step: 235020 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:07:55,054-Speed 2630.29 samples/sec Loss 9.6516 LearningRate 0.0514 Epoch: 5 Global Step: 235030 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:07:58,951-Speed 2628.07 samples/sec Loss 9.7291 LearningRate 0.0514 Epoch: 5 Global Step: 235040 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:08:02,846-Speed 2629.81 samples/sec Loss 9.6002 LearningRate 0.0514 Epoch: 5 Global Step: 235050 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:08:06,749-Speed 2624.22 samples/sec Loss 9.6283 LearningRate 0.0514 Epoch: 5 Global Step: 235060 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:08:10,646-Speed 2628.15 samples/sec Loss 9.5898 LearningRate 0.0514 Epoch: 5 Global Step: 235070 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:08:14,552-Speed 2622.48 samples/sec Loss 9.7587 LearningRate 0.0514 Epoch: 5 Global Step: 235080 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:08:18,455-Speed 2624.45 samples/sec Loss 9.6995 LearningRate 0.0514 Epoch: 5 Global Step: 235090 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:08:22,359-Speed 2624.05 samples/sec Loss 9.6330 LearningRate 0.0514 Epoch: 5 Global Step: 235100 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:08:26,213-Speed 2657.32 samples/sec Loss 10.1494 LearningRate 0.0514 Epoch: 5 Global Step: 235110 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 22:08:30,123-Speed 2619.17 samples/sec Loss 10.1164 LearningRate 0.0513 Epoch: 5 Global Step: 235120 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 22:08:34,024-Speed 2626.26 samples/sec Loss 10.7050 LearningRate 0.0513 Epoch: 5 Global Step: 235130 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 22:08:37,940-Speed 2615.38 samples/sec Loss 10.2369 LearningRate 0.0513 Epoch: 5 Global Step: 235140 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 22:08:41,848-Speed 2620.90 samples/sec Loss 9.9680 LearningRate 0.0513 Epoch: 5 Global Step: 235150 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 22:08:45,766-Speed 2613.94 samples/sec Loss 9.6812 LearningRate 0.0513 Epoch: 5 Global Step: 235160 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 22:08:49,662-Speed 2629.32 samples/sec Loss 9.6828 LearningRate 0.0513 Epoch: 5 Global Step: 235170 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 22:08:53,568-Speed 2621.92 samples/sec Loss 9.9019 LearningRate 0.0513 Epoch: 5 Global Step: 235180 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 22:08:57,488-Speed 2613.28 samples/sec Loss 9.8400 LearningRate 0.0513 Epoch: 5 Global Step: 235190 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 22:09:01,394-Speed 2622.29 samples/sec Loss 9.8673 LearningRate 0.0513 Epoch: 5 Global Step: 235200 Fp16 Grad Scale: 8192 Required: 67 hours
Training: 2022-04-13 22:09:05,326-Speed 2604.90 samples/sec Loss 9.7918 LearningRate 0.0513 Epoch: 5 Global Step: 235210 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 22:09:09,221-Speed 2630.28 samples/sec Loss 9.7706 LearningRate 0.0513 Epoch: 5 Global Step: 235220 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 22:09:13,130-Speed 2620.25 samples/sec Loss 9.6315 LearningRate 0.0513 Epoch: 5 Global Step: 235230 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 22:09:17,022-Speed 2631.73 samples/sec Loss 9.5911 LearningRate 0.0513 Epoch: 5 Global Step: 235240 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 22:09:20,916-Speed 2630.32 samples/sec Loss 9.8050 LearningRate 0.0513 Epoch: 5 Global Step: 235250 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 22:09:24,847-Speed 2605.35 samples/sec Loss 9.8296 LearningRate 0.0513 Epoch: 5 Global Step: 235260 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 22:09:28,751-Speed 2624.10 samples/sec Loss 9.5250 LearningRate 0.0513 Epoch: 5 Global Step: 235270 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 22:09:32,651-Speed 2626.87 samples/sec Loss 9.6874 LearningRate 0.0513 Epoch: 5 Global Step: 235280 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 22:09:36,553-Speed 2624.68 samples/sec Loss 9.7870 LearningRate 0.0513 Epoch: 5 Global Step: 235290 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 22:09:40,482-Speed 2608.02 samples/sec Loss 9.8816 LearningRate 0.0513 Epoch: 5 Global Step: 235300 Fp16 Grad Scale: 16384 Required: 67 hours
Training: 2022-04-13 22:09:44,390-Speed 2620.54 samples/sec Loss 9.7102 LearningRate 0.0513 Epoch: 5 Global Step: 235310 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 22:09:48,290-Speed 2626.51 samples/sec Loss 9.7193 LearningRate 0.0513 Epoch: 5 Global Step: 235320 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 22:09:52,192-Speed 2624.89 samples/sec Loss 9.6638 LearningRate 0.0513 Epoch: 5 Global Step: 235330 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 22:09:56,100-Speed 2620.74 samples/sec Loss 9.6739 LearningRate 0.0513 Epoch: 5 Global Step: 235340 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 22:10:00,008-Speed 2621.07 samples/sec Loss 9.6905 LearningRate 0.0513 Epoch: 5 Global Step: 235350 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 22:10:03,956-Speed 2595.01 samples/sec Loss 9.6328 LearningRate 0.0513 Epoch: 5 Global Step: 235360 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 22:10:07,866-Speed 2619.71 samples/sec Loss 9.8303 LearningRate 0.0513 Epoch: 5 Global Step: 235370 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 22:10:11,760-Speed 2629.86 samples/sec Loss 9.6382 LearningRate 0.0513 Epoch: 5 Global Step: 235380 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 22:10:15,655-Speed 2630.25 samples/sec Loss 9.8365 LearningRate 0.0513 Epoch: 5 Global Step: 235390 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 22:10:19,557-Speed 2624.51 samples/sec Loss 9.6368 LearningRate 0.0513 Epoch: 5 Global Step: 235400 Fp16 Grad Scale: 32768 Required: 67 hours
Training: 2022-04-13 22:10:23,449-Speed 2631.96 samples/sec Loss 9.8367 LearningRate 0.0513 Epoch: 5 Global Step: 235410 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:10:27,344-Speed 2629.21 samples/sec Loss 9.6563 LearningRate 0.0513 Epoch: 5 Global Step: 235420 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:10:31,245-Speed 2626.14 samples/sec Loss 9.6945 LearningRate 0.0513 Epoch: 5 Global Step: 235430 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:10:35,138-Speed 2630.85 samples/sec Loss 9.7751 LearningRate 0.0513 Epoch: 5 Global Step: 235440 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:10:39,032-Speed 2630.70 samples/sec Loss 9.6506 LearningRate 0.0513 Epoch: 5 Global Step: 235450 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:10:42,924-Speed 2631.93 samples/sec Loss 9.6121 LearningRate 0.0513 Epoch: 5 Global Step: 235460 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:10:46,836-Speed 2617.76 samples/sec Loss 9.6069 LearningRate 0.0513 Epoch: 5 Global Step: 235470 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:10:50,733-Speed 2628.96 samples/sec Loss 9.6682 LearningRate 0.0513 Epoch: 5 Global Step: 235480 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:10:54,629-Speed 2628.56 samples/sec Loss 9.7235 LearningRate 0.0513 Epoch: 5 Global Step: 235490 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:10:58,535-Speed 2622.38 samples/sec Loss 9.6425 LearningRate 0.0513 Epoch: 5 Global Step: 235500 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:11:02,425-Speed 2632.48 samples/sec Loss 9.6079 LearningRate 0.0513 Epoch: 5 Global Step: 235510 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:11:06,315-Speed 2633.60 samples/sec Loss 9.7048 LearningRate 0.0513 Epoch: 5 Global Step: 235520 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:11:10,211-Speed 2629.15 samples/sec Loss 9.6408 LearningRate 0.0513 Epoch: 5 Global Step: 235530 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:11:14,110-Speed 2626.95 samples/sec Loss 9.7062 LearningRate 0.0513 Epoch: 5 Global Step: 235540 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:11:18,004-Speed 2630.64 samples/sec Loss 9.7928 LearningRate 0.0513 Epoch: 5 Global Step: 235550 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:11:21,896-Speed 2631.50 samples/sec Loss 9.6764 LearningRate 0.0513 Epoch: 5 Global Step: 235560 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:11:25,801-Speed 2623.58 samples/sec Loss 9.8308 LearningRate 0.0513 Epoch: 5 Global Step: 235570 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:11:29,694-Speed 2630.23 samples/sec Loss 9.6778 LearningRate 0.0513 Epoch: 5 Global Step: 235580 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:11:33,590-Speed 2629.63 samples/sec Loss 9.8219 LearningRate 0.0513 Epoch: 5 Global Step: 235590 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:11:37,481-Speed 2632.02 samples/sec Loss 9.7869 LearningRate 0.0513 Epoch: 5 Global Step: 235600 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:11:41,376-Speed 2629.73 samples/sec Loss 9.6826 LearningRate 0.0513 Epoch: 5 Global Step: 235610 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:11:45,272-Speed 2628.96 samples/sec Loss 9.6824 LearningRate 0.0513 Epoch: 5 Global Step: 235620 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:11:49,177-Speed 2623.83 samples/sec Loss 9.5521 LearningRate 0.0513 Epoch: 5 Global Step: 235630 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:11:53,233-Speed 2525.15 samples/sec Loss 9.6498 LearningRate 0.0513 Epoch: 5 Global Step: 235640 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:11:57,343-Speed 2492.52 samples/sec Loss 9.7593 LearningRate 0.0513 Epoch: 5 Global Step: 235650 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:12:01,277-Speed 2602.88 samples/sec Loss 9.8059 LearningRate 0.0513 Epoch: 5 Global Step: 235660 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:12:05,176-Speed 2626.79 samples/sec Loss 9.6771 LearningRate 0.0513 Epoch: 5 Global Step: 235670 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:12:09,074-Speed 2627.86 samples/sec Loss 9.7064 LearningRate 0.0513 Epoch: 5 Global Step: 235680 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:12:12,978-Speed 2623.62 samples/sec Loss 9.7535 LearningRate 0.0513 Epoch: 5 Global Step: 235690 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:12:16,884-Speed 2622.25 samples/sec Loss 9.7739 LearningRate 0.0512 Epoch: 5 Global Step: 235700 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:12:20,785-Speed 2625.94 samples/sec Loss 9.6103 LearningRate 0.0512 Epoch: 5 Global Step: 235710 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:12:24,682-Speed 2628.36 samples/sec Loss 9.7072 LearningRate 0.0512 Epoch: 5 Global Step: 235720 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:12:28,574-Speed 2632.09 samples/sec Loss 9.6485 LearningRate 0.0512 Epoch: 5 Global Step: 235730 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:12:32,472-Speed 2627.81 samples/sec Loss 9.7072 LearningRate 0.0512 Epoch: 5 Global Step: 235740 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:12:36,366-Speed 2630.15 samples/sec Loss 9.6284 LearningRate 0.0512 Epoch: 5 Global Step: 235750 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:12:40,260-Speed 2629.78 samples/sec Loss 9.7014 LearningRate 0.0512 Epoch: 5 Global Step: 235760 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:12:44,151-Speed 2632.39 samples/sec Loss 9.7314 LearningRate 0.0512 Epoch: 5 Global Step: 235770 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:12:48,046-Speed 2630.53 samples/sec Loss 9.8262 LearningRate 0.0512 Epoch: 5 Global Step: 235780 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:12:51,937-Speed 2631.80 samples/sec Loss 9.7813 LearningRate 0.0512 Epoch: 5 Global Step: 235790 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:12:55,832-Speed 2629.88 samples/sec Loss 9.5084 LearningRate 0.0512 Epoch: 5 Global Step: 235800 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:12:59,727-Speed 2629.56 samples/sec Loss 9.6315 LearningRate 0.0512 Epoch: 5 Global Step: 235810 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:13:03,634-Speed 2621.68 samples/sec Loss 9.6594 LearningRate 0.0512 Epoch: 5 Global Step: 235820 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:13:07,533-Speed 2626.57 samples/sec Loss 9.7877 LearningRate 0.0512 Epoch: 5 Global Step: 235830 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:13:11,430-Speed 2628.74 samples/sec Loss 9.6317 LearningRate 0.0512 Epoch: 5 Global Step: 235840 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:13:15,334-Speed 2623.35 samples/sec Loss 9.6186 LearningRate 0.0512 Epoch: 5 Global Step: 235850 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:13:19,234-Speed 2626.46 samples/sec Loss 9.6559 LearningRate 0.0512 Epoch: 5 Global Step: 235860 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:13:23,135-Speed 2625.51 samples/sec Loss 9.6028 LearningRate 0.0512 Epoch: 5 Global Step: 235870 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:13:27,032-Speed 2628.94 samples/sec Loss 9.7335 LearningRate 0.0512 Epoch: 5 Global Step: 235880 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:13:30,927-Speed 2629.66 samples/sec Loss 9.6884 LearningRate 0.0512 Epoch: 5 Global Step: 235890 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:13:34,825-Speed 2627.13 samples/sec Loss 9.7448 LearningRate 0.0512 Epoch: 5 Global Step: 235900 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:13:38,726-Speed 2625.40 samples/sec Loss 9.6309 LearningRate 0.0512 Epoch: 5 Global Step: 235910 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:13:42,608-Speed 2639.26 samples/sec Loss 9.5963 LearningRate 0.0512 Epoch: 5 Global Step: 235920 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:13:46,506-Speed 2627.50 samples/sec Loss 9.5436 LearningRate 0.0512 Epoch: 5 Global Step: 235930 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:13:50,416-Speed 2619.74 samples/sec Loss 9.6195 LearningRate 0.0512 Epoch: 5 Global Step: 235940 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:13:54,306-Speed 2632.44 samples/sec Loss 9.5578 LearningRate 0.0512 Epoch: 5 Global Step: 235950 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:13:58,209-Speed 2625.00 samples/sec Loss 9.7493 LearningRate 0.0512 Epoch: 5 Global Step: 235960 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:14:02,106-Speed 2628.11 samples/sec Loss 9.6478 LearningRate 0.0512 Epoch: 5 Global Step: 235970 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:14:06,001-Speed 2629.42 samples/sec Loss 9.6296 LearningRate 0.0512 Epoch: 5 Global Step: 235980 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:14:09,899-Speed 2627.33 samples/sec Loss 9.5784 LearningRate 0.0512 Epoch: 5 Global Step: 235990 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:14:13,795-Speed 2629.68 samples/sec Loss 9.7253 LearningRate 0.0512 Epoch: 5 Global Step: 236000 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:14:17,696-Speed 2625.54 samples/sec Loss 9.7426 LearningRate 0.0512 Epoch: 5 Global Step: 236010 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:14:21,600-Speed 2623.97 samples/sec Loss 9.7399 LearningRate 0.0512 Epoch: 5 Global Step: 236020 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:14:25,496-Speed 2628.91 samples/sec Loss 9.4397 LearningRate 0.0512 Epoch: 5 Global Step: 236030 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:14:29,379-Speed 2638.09 samples/sec Loss 9.7469 LearningRate 0.0512 Epoch: 5 Global Step: 236040 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:14:33,276-Speed 2628.57 samples/sec Loss 9.7430 LearningRate 0.0512 Epoch: 5 Global Step: 236050 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:14:37,212-Speed 2602.09 samples/sec Loss 9.8899 LearningRate 0.0512 Epoch: 5 Global Step: 236060 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:14:41,133-Speed 2612.47 samples/sec Loss 9.6538 LearningRate 0.0512 Epoch: 5 Global Step: 236070 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:14:45,027-Speed 2630.28 samples/sec Loss 9.7708 LearningRate 0.0512 Epoch: 5 Global Step: 236080 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:14:48,926-Speed 2627.72 samples/sec Loss 9.6010 LearningRate 0.0512 Epoch: 5 Global Step: 236090 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:14:52,823-Speed 2628.24 samples/sec Loss 9.6986 LearningRate 0.0512 Epoch: 5 Global Step: 236100 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:14:56,719-Speed 2628.91 samples/sec Loss 9.7660 LearningRate 0.0512 Epoch: 5 Global Step: 236110 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:15:00,626-Speed 2621.49 samples/sec Loss 9.5844 LearningRate 0.0512 Epoch: 5 Global Step: 236120 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:15:04,521-Speed 2629.46 samples/sec Loss 9.5850 LearningRate 0.0512 Epoch: 5 Global Step: 236130 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:15:08,413-Speed 2631.79 samples/sec Loss 9.5367 LearningRate 0.0512 Epoch: 5 Global Step: 236140 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:15:12,297-Speed 2637.35 samples/sec Loss 9.6912 LearningRate 0.0512 Epoch: 5 Global Step: 236150 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:15:16,192-Speed 2630.11 samples/sec Loss 9.6956 LearningRate 0.0512 Epoch: 5 Global Step: 236160 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:15:20,122-Speed 2605.93 samples/sec Loss 9.6834 LearningRate 0.0512 Epoch: 5 Global Step: 236170 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:15:24,016-Speed 2630.47 samples/sec Loss 9.6983 LearningRate 0.0512 Epoch: 5 Global Step: 236180 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:15:27,910-Speed 2630.40 samples/sec Loss 9.6665 LearningRate 0.0512 Epoch: 5 Global Step: 236190 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:15:31,923-Speed 2552.07 samples/sec Loss 9.5978 LearningRate 0.0512 Epoch: 5 Global Step: 236200 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:15:35,812-Speed 2633.69 samples/sec Loss 9.7145 LearningRate 0.0512 Epoch: 5 Global Step: 236210 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:15:39,715-Speed 2625.13 samples/sec Loss 9.7228 LearningRate 0.0512 Epoch: 5 Global Step: 236220 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:15:43,610-Speed 2628.86 samples/sec Loss 9.6890 LearningRate 0.0512 Epoch: 5 Global Step: 236230 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:15:47,541-Speed 2606.46 samples/sec Loss 9.6790 LearningRate 0.0512 Epoch: 5 Global Step: 236240 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:15:51,447-Speed 2621.78 samples/sec Loss 9.5257 LearningRate 0.0512 Epoch: 5 Global Step: 236250 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:15:55,353-Speed 2622.67 samples/sec Loss 9.7073 LearningRate 0.0512 Epoch: 5 Global Step: 236260 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:15:59,251-Speed 2627.71 samples/sec Loss 9.5046 LearningRate 0.0512 Epoch: 5 Global Step: 236270 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:16:03,145-Speed 2630.46 samples/sec Loss 9.5542 LearningRate 0.0511 Epoch: 5 Global Step: 236280 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:16:07,039-Speed 2630.26 samples/sec Loss 9.6944 LearningRate 0.0511 Epoch: 5 Global Step: 236290 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:16:10,915-Speed 2642.52 samples/sec Loss 9.5768 LearningRate 0.0511 Epoch: 5 Global Step: 236300 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:16:14,815-Speed 2626.68 samples/sec Loss 9.6690 LearningRate 0.0511 Epoch: 5 Global Step: 236310 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:16:18,732-Speed 2614.62 samples/sec Loss 9.6634 LearningRate 0.0511 Epoch: 5 Global Step: 236320 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:16:22,628-Speed 2629.53 samples/sec Loss 9.6047 LearningRate 0.0511 Epoch: 5 Global Step: 236330 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:16:26,531-Speed 2623.85 samples/sec Loss 9.6344 LearningRate 0.0511 Epoch: 5 Global Step: 236340 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:16:30,427-Speed 2628.85 samples/sec Loss 9.6071 LearningRate 0.0511 Epoch: 5 Global Step: 236350 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:16:34,356-Speed 2607.26 samples/sec Loss 9.5685 LearningRate 0.0511 Epoch: 5 Global Step: 236360 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:16:38,249-Speed 2631.16 samples/sec Loss 9.6619 LearningRate 0.0511 Epoch: 5 Global Step: 236370 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:16:42,153-Speed 2623.89 samples/sec Loss 9.5389 LearningRate 0.0511 Epoch: 5 Global Step: 236380 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:16:46,067-Speed 2617.25 samples/sec Loss 9.5701 LearningRate 0.0511 Epoch: 5 Global Step: 236390 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:16:50,019-Speed 2591.69 samples/sec Loss 9.7119 LearningRate 0.0511 Epoch: 5 Global Step: 236400 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:16:53,910-Speed 2632.17 samples/sec Loss 9.7484 LearningRate 0.0511 Epoch: 5 Global Step: 236410 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:16:57,818-Speed 2621.34 samples/sec Loss 9.9769 LearningRate 0.0511 Epoch: 5 Global Step: 236420 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:17:01,715-Speed 2628.07 samples/sec Loss 9.6886 LearningRate 0.0511 Epoch: 5 Global Step: 236430 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:17:05,610-Speed 2629.82 samples/sec Loss 9.6117 LearningRate 0.0511 Epoch: 5 Global Step: 236440 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:17:09,508-Speed 2628.00 samples/sec Loss 9.6962 LearningRate 0.0511 Epoch: 5 Global Step: 236450 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:17:13,404-Speed 2629.60 samples/sec Loss 9.6447 LearningRate 0.0511 Epoch: 5 Global Step: 236460 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:17:17,279-Speed 2642.87 samples/sec Loss 9.6442 LearningRate 0.0511 Epoch: 5 Global Step: 236470 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:17:21,185-Speed 2622.59 samples/sec Loss 9.7325 LearningRate 0.0511 Epoch: 5 Global Step: 236480 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:17:25,077-Speed 2631.80 samples/sec Loss 9.5067 LearningRate 0.0511 Epoch: 5 Global Step: 236490 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:17:28,969-Speed 2632.15 samples/sec Loss 9.6437 LearningRate 0.0511 Epoch: 5 Global Step: 236500 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:17:32,866-Speed 2627.96 samples/sec Loss 9.5476 LearningRate 0.0511 Epoch: 5 Global Step: 236510 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:17:36,763-Speed 2628.30 samples/sec Loss 9.6733 LearningRate 0.0511 Epoch: 5 Global Step: 236520 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:17:40,658-Speed 2629.44 samples/sec Loss 9.6113 LearningRate 0.0511 Epoch: 5 Global Step: 236530 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:17:44,556-Speed 2627.91 samples/sec Loss 9.6417 LearningRate 0.0511 Epoch: 5 Global Step: 236540 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:17:48,480-Speed 2610.80 samples/sec Loss 9.5774 LearningRate 0.0511 Epoch: 5 Global Step: 236550 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:17:52,384-Speed 2623.63 samples/sec Loss 9.6518 LearningRate 0.0511 Epoch: 5 Global Step: 236560 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:17:56,286-Speed 2625.36 samples/sec Loss 9.7686 LearningRate 0.0511 Epoch: 5 Global Step: 236570 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:18:00,205-Speed 2613.56 samples/sec Loss 9.5466 LearningRate 0.0511 Epoch: 5 Global Step: 236580 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:18:04,100-Speed 2629.27 samples/sec Loss 9.6511 LearningRate 0.0511 Epoch: 5 Global Step: 236590 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:18:07,973-Speed 2644.33 samples/sec Loss 9.6534 LearningRate 0.0511 Epoch: 5 Global Step: 236600 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:18:11,874-Speed 2626.03 samples/sec Loss 9.6949 LearningRate 0.0511 Epoch: 5 Global Step: 236610 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:18:15,773-Speed 2626.97 samples/sec Loss 9.5971 LearningRate 0.0511 Epoch: 5 Global Step: 236620 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:18:19,667-Speed 2630.46 samples/sec Loss 9.7359 LearningRate 0.0511 Epoch: 5 Global Step: 236630 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:18:23,563-Speed 2628.62 samples/sec Loss 9.6852 LearningRate 0.0511 Epoch: 5 Global Step: 236640 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:18:27,456-Speed 2631.49 samples/sec Loss 9.6621 LearningRate 0.0511 Epoch: 5 Global Step: 236650 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:18:31,351-Speed 2629.64 samples/sec Loss 9.5832 LearningRate 0.0511 Epoch: 5 Global Step: 236660 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:18:35,245-Speed 2630.27 samples/sec Loss 9.5781 LearningRate 0.0511 Epoch: 5 Global Step: 236670 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:18:39,139-Speed 2630.22 samples/sec Loss 9.5112 LearningRate 0.0511 Epoch: 5 Global Step: 236680 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:18:43,037-Speed 2627.83 samples/sec Loss 9.5185 LearningRate 0.0511 Epoch: 5 Global Step: 236690 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:18:46,932-Speed 2629.48 samples/sec Loss 9.7118 LearningRate 0.0511 Epoch: 5 Global Step: 236700 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:18:50,828-Speed 2629.39 samples/sec Loss 9.6263 LearningRate 0.0511 Epoch: 5 Global Step: 236710 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:18:54,710-Speed 2638.53 samples/sec Loss 9.6693 LearningRate 0.0511 Epoch: 5 Global Step: 236720 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:18:58,605-Speed 2630.16 samples/sec Loss 9.7265 LearningRate 0.0511 Epoch: 5 Global Step: 236730 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:19:02,529-Speed 2610.16 samples/sec Loss 9.7383 LearningRate 0.0511 Epoch: 5 Global Step: 236740 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:19:06,425-Speed 2629.16 samples/sec Loss 9.6659 LearningRate 0.0511 Epoch: 5 Global Step: 236750 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:19:10,321-Speed 2628.71 samples/sec Loss 9.5037 LearningRate 0.0511 Epoch: 5 Global Step: 236760 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:19:14,226-Speed 2622.86 samples/sec Loss 9.7815 LearningRate 0.0511 Epoch: 5 Global Step: 236770 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:19:18,132-Speed 2622.51 samples/sec Loss 9.5558 LearningRate 0.0511 Epoch: 5 Global Step: 236780 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:19:22,024-Speed 2632.03 samples/sec Loss 9.6067 LearningRate 0.0511 Epoch: 5 Global Step: 236790 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:19:25,915-Speed 2632.76 samples/sec Loss 9.5507 LearningRate 0.0511 Epoch: 5 Global Step: 236800 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:19:29,811-Speed 2628.59 samples/sec Loss 9.6969 LearningRate 0.0511 Epoch: 5 Global Step: 236810 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:19:33,705-Speed 2630.32 samples/sec Loss 9.5628 LearningRate 0.0511 Epoch: 5 Global Step: 236820 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:19:37,601-Speed 2628.93 samples/sec Loss 9.5332 LearningRate 0.0511 Epoch: 5 Global Step: 236830 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:19:41,540-Speed 2600.72 samples/sec Loss 9.6556 LearningRate 0.0511 Epoch: 5 Global Step: 236840 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:19:45,421-Speed 2639.35 samples/sec Loss 9.7448 LearningRate 0.0511 Epoch: 5 Global Step: 236850 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:19:49,315-Speed 2630.61 samples/sec Loss 9.7453 LearningRate 0.0510 Epoch: 5 Global Step: 236860 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:19:53,211-Speed 2628.39 samples/sec Loss 9.6810 LearningRate 0.0510 Epoch: 5 Global Step: 236870 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:19:57,105-Speed 2630.83 samples/sec Loss 9.6214 LearningRate 0.0510 Epoch: 5 Global Step: 236880 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:20:00,998-Speed 2631.03 samples/sec Loss 9.7246 LearningRate 0.0510 Epoch: 5 Global Step: 236890 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:20:04,888-Speed 2632.99 samples/sec Loss 9.6048 LearningRate 0.0510 Epoch: 5 Global Step: 236900 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:20:08,811-Speed 2611.13 samples/sec Loss 9.5717 LearningRate 0.0510 Epoch: 5 Global Step: 236910 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:20:12,705-Speed 2630.27 samples/sec Loss 9.5346 LearningRate 0.0510 Epoch: 5 Global Step: 236920 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:20:16,630-Speed 2609.68 samples/sec Loss 9.6144 LearningRate 0.0510 Epoch: 5 Global Step: 236930 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:20:20,526-Speed 2629.11 samples/sec Loss 9.7973 LearningRate 0.0510 Epoch: 5 Global Step: 236940 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:20:24,423-Speed 2628.45 samples/sec Loss 9.7516 LearningRate 0.0510 Epoch: 5 Global Step: 236950 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:20:28,318-Speed 2630.04 samples/sec Loss 9.6064 LearningRate 0.0510 Epoch: 5 Global Step: 236960 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:20:32,210-Speed 2631.87 samples/sec Loss 9.7135 LearningRate 0.0510 Epoch: 5 Global Step: 236970 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:20:36,108-Speed 2627.54 samples/sec Loss 9.6018 LearningRate 0.0510 Epoch: 5 Global Step: 236980 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:20:39,989-Speed 2639.40 samples/sec Loss 9.7401 LearningRate 0.0510 Epoch: 5 Global Step: 236990 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:20:43,882-Speed 2631.56 samples/sec Loss 9.6216 LearningRate 0.0510 Epoch: 5 Global Step: 237000 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:20:47,777-Speed 2629.12 samples/sec Loss 9.6820 LearningRate 0.0510 Epoch: 5 Global Step: 237010 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:20:51,668-Speed 2632.69 samples/sec Loss 9.6148 LearningRate 0.0510 Epoch: 5 Global Step: 237020 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:20:55,571-Speed 2624.80 samples/sec Loss 9.6471 LearningRate 0.0510 Epoch: 5 Global Step: 237030 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:20:59,477-Speed 2622.10 samples/sec Loss 9.6663 LearningRate 0.0510 Epoch: 5 Global Step: 237040 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:21:03,389-Speed 2617.92 samples/sec Loss 9.6127 LearningRate 0.0510 Epoch: 5 Global Step: 237050 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:21:07,285-Speed 2629.56 samples/sec Loss 9.7094 LearningRate 0.0510 Epoch: 5 Global Step: 237060 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:21:11,183-Speed 2627.08 samples/sec Loss 9.8181 LearningRate 0.0510 Epoch: 5 Global Step: 237070 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:21:15,079-Speed 2629.37 samples/sec Loss 9.6885 LearningRate 0.0510 Epoch: 5 Global Step: 237080 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:21:18,958-Speed 2640.21 samples/sec Loss 9.7662 LearningRate 0.0510 Epoch: 5 Global Step: 237090 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:21:22,853-Speed 2629.80 samples/sec Loss 9.7612 LearningRate 0.0510 Epoch: 5 Global Step: 237100 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:21:26,765-Speed 2618.58 samples/sec Loss 9.7758 LearningRate 0.0510 Epoch: 5 Global Step: 237110 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:21:30,674-Speed 2620.01 samples/sec Loss 9.7971 LearningRate 0.0510 Epoch: 5 Global Step: 237120 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:21:34,570-Speed 2628.94 samples/sec Loss 9.6956 LearningRate 0.0510 Epoch: 5 Global Step: 237130 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:21:38,504-Speed 2603.48 samples/sec Loss 9.4805 LearningRate 0.0510 Epoch: 5 Global Step: 237140 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:21:42,399-Speed 2630.19 samples/sec Loss 9.6208 LearningRate 0.0510 Epoch: 5 Global Step: 237150 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:21:46,297-Speed 2627.98 samples/sec Loss 9.6919 LearningRate 0.0510 Epoch: 5 Global Step: 237160 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:21:50,192-Speed 2629.49 samples/sec Loss 9.6490 LearningRate 0.0510 Epoch: 5 Global Step: 237170 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:21:54,096-Speed 2623.74 samples/sec Loss 9.5316 LearningRate 0.0510 Epoch: 5 Global Step: 237180 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:21:57,997-Speed 2625.77 samples/sec Loss 9.6983 LearningRate 0.0510 Epoch: 5 Global Step: 237190 Fp16 Grad Scale: 262144 Required: 67 hours
Training: 2022-04-13 22:22:01,871-Speed 2643.81 samples/sec Loss 9.5725 LearningRate 0.0510 Epoch: 5 Global Step: 237200 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:22:05,763-Speed 2630.96 samples/sec Loss 9.5487 LearningRate 0.0510 Epoch: 5 Global Step: 237210 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:22:09,664-Speed 2632.52 samples/sec Loss 9.6562 LearningRate 0.0510 Epoch: 5 Global Step: 237220 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:22:13,559-Speed 2629.95 samples/sec Loss 9.6381 LearningRate 0.0510 Epoch: 5 Global Step: 237230 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:22:17,436-Speed 2642.23 samples/sec Loss 9.6046 LearningRate 0.0510 Epoch: 5 Global Step: 237240 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:22:21,342-Speed 2622.31 samples/sec Loss 9.5628 LearningRate 0.0510 Epoch: 5 Global Step: 237250 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:22:25,259-Speed 2614.85 samples/sec Loss 9.7132 LearningRate 0.0510 Epoch: 5 Global Step: 237260 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:22:29,152-Speed 2631.06 samples/sec Loss 9.6412 LearningRate 0.0510 Epoch: 5 Global Step: 237270 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:22:33,042-Speed 2632.70 samples/sec Loss 9.6422 LearningRate 0.0510 Epoch: 5 Global Step: 237280 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:22:36,992-Speed 2593.41 samples/sec Loss 9.6570 LearningRate 0.0510 Epoch: 5 Global Step: 237290 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:22:40,879-Speed 2635.08 samples/sec Loss 9.8004 LearningRate 0.0510 Epoch: 5 Global Step: 237300 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:22:44,791-Speed 2618.25 samples/sec Loss 9.7383 LearningRate 0.0510 Epoch: 5 Global Step: 237310 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:22:48,696-Speed 2622.86 samples/sec Loss 9.5697 LearningRate 0.0510 Epoch: 5 Global Step: 237320 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:22:52,652-Speed 2589.37 samples/sec Loss 9.6855 LearningRate 0.0510 Epoch: 5 Global Step: 237330 Fp16 Grad Scale: 65536 Required: 67 hours
Training: 2022-04-13 22:22:56,545-Speed 2630.96 samples/sec Loss 9.5905 LearningRate 0.0510 Epoch: 5 Global Step: 237340 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:23:00,438-Speed 2631.07 samples/sec Loss 9.7095 LearningRate 0.0510 Epoch: 5 Global Step: 237350 Fp16 Grad Scale: 131072 Required: 67 hours
Training: 2022-04-13 22:23:04,370-Speed 2604.88 samples/sec Loss 9.6726 LearningRate 0.0510 Epoch: 5 Global Step: 237360 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:23:08,264-Speed 2630.72 samples/sec Loss 9.5924 LearningRate 0.0510 Epoch: 5 Global Step: 237370 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:23:12,159-Speed 2629.19 samples/sec Loss 9.6212 LearningRate 0.0510 Epoch: 5 Global Step: 237380 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:23:16,075-Speed 2616.00 samples/sec Loss 9.6786 LearningRate 0.0510 Epoch: 5 Global Step: 237390 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:23:19,968-Speed 2630.67 samples/sec Loss 9.6644 LearningRate 0.0510 Epoch: 5 Global Step: 237400 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:23:23,869-Speed 2625.71 samples/sec Loss 9.6284 LearningRate 0.0510 Epoch: 5 Global Step: 237410 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:23:27,774-Speed 2623.23 samples/sec Loss 9.7717 LearningRate 0.0510 Epoch: 5 Global Step: 237420 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:23:31,681-Speed 2621.61 samples/sec Loss 9.6909 LearningRate 0.0510 Epoch: 5 Global Step: 237430 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:23:35,586-Speed 2622.63 samples/sec Loss 9.8044 LearningRate 0.0509 Epoch: 5 Global Step: 237440 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:23:39,511-Speed 2609.09 samples/sec Loss 9.5783 LearningRate 0.0509 Epoch: 5 Global Step: 237450 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:23:43,407-Speed 2629.83 samples/sec Loss 9.6575 LearningRate 0.0509 Epoch: 5 Global Step: 237460 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:23:47,302-Speed 2629.57 samples/sec Loss 9.7507 LearningRate 0.0509 Epoch: 5 Global Step: 237470 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:23:51,214-Speed 2618.41 samples/sec Loss 9.6082 LearningRate 0.0509 Epoch: 5 Global Step: 237480 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:23:55,093-Speed 2640.62 samples/sec Loss 9.6826 LearningRate 0.0509 Epoch: 5 Global Step: 237490 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:23:58,988-Speed 2630.24 samples/sec Loss 9.6751 LearningRate 0.0509 Epoch: 5 Global Step: 237500 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:24:02,887-Speed 2626.85 samples/sec Loss 9.7269 LearningRate 0.0509 Epoch: 5 Global Step: 237510 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:24:06,784-Speed 2628.18 samples/sec Loss 9.7018 LearningRate 0.0509 Epoch: 5 Global Step: 237520 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:24:10,681-Speed 2628.36 samples/sec Loss 9.6690 LearningRate 0.0509 Epoch: 5 Global Step: 237530 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:24:14,597-Speed 2616.26 samples/sec Loss 9.5834 LearningRate 0.0509 Epoch: 5 Global Step: 237540 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:24:18,553-Speed 2588.94 samples/sec Loss 9.5216 LearningRate 0.0509 Epoch: 5 Global Step: 237550 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:24:22,462-Speed 2620.58 samples/sec Loss 9.5822 LearningRate 0.0509 Epoch: 5 Global Step: 237560 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:24:26,366-Speed 2623.33 samples/sec Loss 9.5265 LearningRate 0.0509 Epoch: 5 Global Step: 237570 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:24:30,310-Speed 2597.60 samples/sec Loss 9.7859 LearningRate 0.0509 Epoch: 5 Global Step: 237580 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:24:34,224-Speed 2616.30 samples/sec Loss 9.7176 LearningRate 0.0509 Epoch: 5 Global Step: 237590 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:24:38,145-Speed 2612.06 samples/sec Loss 9.7267 LearningRate 0.0509 Epoch: 5 Global Step: 237600 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:24:42,055-Speed 2619.82 samples/sec Loss 9.5985 LearningRate 0.0509 Epoch: 5 Global Step: 237610 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:24:46,061-Speed 2556.64 samples/sec Loss 9.6001 LearningRate 0.0509 Epoch: 5 Global Step: 237620 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:24:49,956-Speed 2630.31 samples/sec Loss 9.6581 LearningRate 0.0509 Epoch: 5 Global Step: 237630 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:24:53,852-Speed 2628.48 samples/sec Loss 9.6555 LearningRate 0.0509 Epoch: 5 Global Step: 237640 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:24:57,746-Speed 2630.68 samples/sec Loss 9.7336 LearningRate 0.0509 Epoch: 5 Global Step: 237650 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:25:01,639-Speed 2631.40 samples/sec Loss 9.5738 LearningRate 0.0509 Epoch: 5 Global Step: 237660 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:25:05,514-Speed 2643.01 samples/sec Loss 9.6523 LearningRate 0.0509 Epoch: 5 Global Step: 237670 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:25:09,418-Speed 2622.90 samples/sec Loss 9.6131 LearningRate 0.0509 Epoch: 5 Global Step: 237680 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:25:13,313-Speed 2630.83 samples/sec Loss 9.5005 LearningRate 0.0509 Epoch: 5 Global Step: 237690 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:25:17,212-Speed 2626.63 samples/sec Loss 9.6487 LearningRate 0.0509 Epoch: 5 Global Step: 237700 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:25:21,049-Speed 2669.87 samples/sec Loss 10.9180 LearningRate 0.0509 Epoch: 5 Global Step: 237710 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:25:24,949-Speed 2626.16 samples/sec Loss 10.5578 LearningRate 0.0509 Epoch: 5 Global Step: 237720 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:25:28,846-Speed 2628.98 samples/sec Loss 9.9241 LearningRate 0.0509 Epoch: 5 Global Step: 237730 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:25:32,830-Speed 2570.25 samples/sec Loss 9.9336 LearningRate 0.0509 Epoch: 5 Global Step: 237740 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:25:36,720-Speed 2632.94 samples/sec Loss 9.9190 LearningRate 0.0509 Epoch: 5 Global Step: 237750 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:25:40,620-Speed 2626.45 samples/sec Loss 9.7368 LearningRate 0.0509 Epoch: 5 Global Step: 237760 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:25:44,520-Speed 2626.77 samples/sec Loss 9.7254 LearningRate 0.0509 Epoch: 5 Global Step: 237770 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:25:48,414-Speed 2629.98 samples/sec Loss 9.8848 LearningRate 0.0509 Epoch: 5 Global Step: 237780 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:25:52,305-Speed 2632.29 samples/sec Loss 9.8414 LearningRate 0.0509 Epoch: 5 Global Step: 237790 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:25:56,196-Speed 2632.71 samples/sec Loss 9.7727 LearningRate 0.0509 Epoch: 5 Global Step: 237800 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:26:00,122-Speed 2608.97 samples/sec Loss 9.7137 LearningRate 0.0509 Epoch: 5 Global Step: 237810 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:26:04,014-Speed 2631.52 samples/sec Loss 9.7447 LearningRate 0.0509 Epoch: 5 Global Step: 237820 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:26:07,920-Speed 2622.22 samples/sec Loss 9.6381 LearningRate 0.0509 Epoch: 5 Global Step: 237830 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:26:11,821-Speed 2625.81 samples/sec Loss 9.7811 LearningRate 0.0509 Epoch: 5 Global Step: 237840 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:26:15,726-Speed 2623.33 samples/sec Loss 9.8625 LearningRate 0.0509 Epoch: 5 Global Step: 237850 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:26:19,630-Speed 2623.97 samples/sec Loss 9.7596 LearningRate 0.0509 Epoch: 5 Global Step: 237860 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:26:23,521-Speed 2631.95 samples/sec Loss 9.7315 LearningRate 0.0509 Epoch: 5 Global Step: 237870 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:26:27,430-Speed 2621.00 samples/sec Loss 9.7744 LearningRate 0.0509 Epoch: 5 Global Step: 237880 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:26:31,325-Speed 2629.24 samples/sec Loss 9.5963 LearningRate 0.0509 Epoch: 5 Global Step: 237890 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:26:35,229-Speed 2624.44 samples/sec Loss 9.6569 LearningRate 0.0509 Epoch: 5 Global Step: 237900 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:26:39,144-Speed 2616.31 samples/sec Loss 9.9626 LearningRate 0.0509 Epoch: 5 Global Step: 237910 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:26:43,044-Speed 2626.49 samples/sec Loss 10.2195 LearningRate 0.0509 Epoch: 5 Global Step: 237920 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:26:46,939-Speed 2629.41 samples/sec Loss 9.9637 LearningRate 0.0509 Epoch: 5 Global Step: 237930 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:26:50,837-Speed 2628.24 samples/sec Loss 9.8019 LearningRate 0.0509 Epoch: 5 Global Step: 237940 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:26:54,744-Speed 2621.81 samples/sec Loss 9.9155 LearningRate 0.0509 Epoch: 5 Global Step: 237950 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:26:58,660-Speed 2615.56 samples/sec Loss 9.7602 LearningRate 0.0509 Epoch: 5 Global Step: 237960 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:27:02,638-Speed 2574.62 samples/sec Loss 9.7123 LearningRate 0.0509 Epoch: 5 Global Step: 237970 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:27:06,606-Speed 2581.27 samples/sec Loss 9.6146 LearningRate 0.0509 Epoch: 5 Global Step: 237980 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:27:10,513-Speed 2621.91 samples/sec Loss 9.6617 LearningRate 0.0509 Epoch: 5 Global Step: 237990 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:27:14,415-Speed 2625.19 samples/sec Loss 9.6413 LearningRate 0.0509 Epoch: 5 Global Step: 238000 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:27:18,312-Speed 2628.17 samples/sec Loss 9.6201 LearningRate 0.0509 Epoch: 5 Global Step: 238010 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:27:22,219-Speed 2621.60 samples/sec Loss 9.6604 LearningRate 0.0508 Epoch: 5 Global Step: 238020 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:27:26,111-Speed 2632.03 samples/sec Loss 9.7983 LearningRate 0.0508 Epoch: 5 Global Step: 238030 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:27:30,002-Speed 2632.22 samples/sec Loss 9.8264 LearningRate 0.0508 Epoch: 5 Global Step: 238040 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:27:33,895-Speed 2631.17 samples/sec Loss 9.7625 LearningRate 0.0508 Epoch: 5 Global Step: 238050 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:27:37,793-Speed 2627.54 samples/sec Loss 9.6017 LearningRate 0.0508 Epoch: 5 Global Step: 238060 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:27:41,694-Speed 2625.50 samples/sec Loss 9.7294 LearningRate 0.0508 Epoch: 5 Global Step: 238070 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:27:45,593-Speed 2627.26 samples/sec Loss 9.6919 LearningRate 0.0508 Epoch: 5 Global Step: 238080 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:27:49,488-Speed 2629.35 samples/sec Loss 9.7363 LearningRate 0.0508 Epoch: 5 Global Step: 238090 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:27:53,379-Speed 2633.18 samples/sec Loss 9.6182 LearningRate 0.0508 Epoch: 5 Global Step: 238100 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:27:57,277-Speed 2627.32 samples/sec Loss 9.5102 LearningRate 0.0508 Epoch: 5 Global Step: 238110 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:28:01,176-Speed 2626.68 samples/sec Loss 9.7035 LearningRate 0.0508 Epoch: 5 Global Step: 238120 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:28:05,116-Speed 2599.69 samples/sec Loss 9.7175 LearningRate 0.0508 Epoch: 5 Global Step: 238130 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:28:09,037-Speed 2612.46 samples/sec Loss 9.8575 LearningRate 0.0508 Epoch: 5 Global Step: 238140 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:28:12,942-Speed 2622.79 samples/sec Loss 9.6981 LearningRate 0.0508 Epoch: 5 Global Step: 238150 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:28:16,837-Speed 2630.72 samples/sec Loss 9.6875 LearningRate 0.0508 Epoch: 5 Global Step: 238160 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:28:20,733-Speed 2628.56 samples/sec Loss 9.6225 LearningRate 0.0508 Epoch: 5 Global Step: 238170 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:28:24,633-Speed 2626.85 samples/sec Loss 9.6387 LearningRate 0.0508 Epoch: 5 Global Step: 238180 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:28:28,550-Speed 2614.70 samples/sec Loss 9.5213 LearningRate 0.0508 Epoch: 5 Global Step: 238190 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:28:32,469-Speed 2613.40 samples/sec Loss 9.6932 LearningRate 0.0508 Epoch: 5 Global Step: 238200 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:28:36,374-Speed 2623.12 samples/sec Loss 9.6191 LearningRate 0.0508 Epoch: 5 Global Step: 238210 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:28:40,268-Speed 2630.73 samples/sec Loss 9.7425 LearningRate 0.0508 Epoch: 5 Global Step: 238220 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:28:44,172-Speed 2623.64 samples/sec Loss 9.6397 LearningRate 0.0508 Epoch: 5 Global Step: 238230 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:28:48,083-Speed 2618.75 samples/sec Loss 9.5447 LearningRate 0.0508 Epoch: 5 Global Step: 238240 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:28:51,979-Speed 2629.14 samples/sec Loss 9.6771 LearningRate 0.0508 Epoch: 5 Global Step: 238250 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:28:55,871-Speed 2631.91 samples/sec Loss 9.6930 LearningRate 0.0508 Epoch: 5 Global Step: 238260 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:28:59,767-Speed 2628.85 samples/sec Loss 9.5158 LearningRate 0.0508 Epoch: 5 Global Step: 238270 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:29:03,646-Speed 2639.71 samples/sec Loss 9.5982 LearningRate 0.0508 Epoch: 5 Global Step: 238280 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:29:07,548-Speed 2625.37 samples/sec Loss 9.7792 LearningRate 0.0508 Epoch: 5 Global Step: 238290 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:29:11,445-Speed 2628.73 samples/sec Loss 9.5279 LearningRate 0.0508 Epoch: 5 Global Step: 238300 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:29:15,350-Speed 2622.89 samples/sec Loss 9.7304 LearningRate 0.0508 Epoch: 5 Global Step: 238310 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:29:19,256-Speed 2623.16 samples/sec Loss 9.5259 LearningRate 0.0508 Epoch: 5 Global Step: 238320 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:29:23,158-Speed 2625.08 samples/sec Loss 9.5878 LearningRate 0.0508 Epoch: 5 Global Step: 238330 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:29:27,053-Speed 2630.04 samples/sec Loss 9.6982 LearningRate 0.0508 Epoch: 5 Global Step: 238340 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:29:30,949-Speed 2628.36 samples/sec Loss 9.6239 LearningRate 0.0508 Epoch: 5 Global Step: 238350 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:29:34,840-Speed 2632.05 samples/sec Loss 9.6458 LearningRate 0.0508 Epoch: 5 Global Step: 238360 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:29:38,734-Speed 2630.63 samples/sec Loss 9.8311 LearningRate 0.0508 Epoch: 5 Global Step: 238370 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:29:42,627-Speed 2631.47 samples/sec Loss 9.5519 LearningRate 0.0508 Epoch: 5 Global Step: 238380 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:29:46,521-Speed 2629.88 samples/sec Loss 9.6572 LearningRate 0.0508 Epoch: 5 Global Step: 238390 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:29:50,444-Speed 2611.50 samples/sec Loss 9.7158 LearningRate 0.0508 Epoch: 5 Global Step: 238400 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:29:54,407-Speed 2584.21 samples/sec Loss 9.6974 LearningRate 0.0508 Epoch: 5 Global Step: 238410 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:29:58,304-Speed 2628.52 samples/sec Loss 9.7066 LearningRate 0.0508 Epoch: 5 Global Step: 238420 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:30:02,200-Speed 2629.48 samples/sec Loss 9.7074 LearningRate 0.0508 Epoch: 5 Global Step: 238430 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:30:06,096-Speed 2628.96 samples/sec Loss 9.7128 LearningRate 0.0508 Epoch: 5 Global Step: 238440 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:30:09,992-Speed 2628.37 samples/sec Loss 9.5827 LearningRate 0.0508 Epoch: 5 Global Step: 238450 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:30:13,894-Speed 2625.32 samples/sec Loss 9.4651 LearningRate 0.0508 Epoch: 5 Global Step: 238460 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:30:17,789-Speed 2629.52 samples/sec Loss 9.6660 LearningRate 0.0508 Epoch: 5 Global Step: 238470 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:30:21,684-Speed 2630.21 samples/sec Loss 9.5593 LearningRate 0.0508 Epoch: 5 Global Step: 238480 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:30:25,564-Speed 2639.45 samples/sec Loss 9.6959 LearningRate 0.0508 Epoch: 5 Global Step: 238490 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:30:29,459-Speed 2630.08 samples/sec Loss 9.6461 LearningRate 0.0508 Epoch: 5 Global Step: 238500 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:30:33,361-Speed 2624.58 samples/sec Loss 9.6648 LearningRate 0.0508 Epoch: 5 Global Step: 238510 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:30:37,255-Speed 2630.33 samples/sec Loss 9.6478 LearningRate 0.0508 Epoch: 5 Global Step: 238520 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:30:41,156-Speed 2625.14 samples/sec Loss 9.5149 LearningRate 0.0508 Epoch: 5 Global Step: 238530 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:30:45,069-Speed 2618.49 samples/sec Loss 9.7228 LearningRate 0.0508 Epoch: 5 Global Step: 238540 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:30:48,968-Speed 2626.70 samples/sec Loss 9.5042 LearningRate 0.0508 Epoch: 5 Global Step: 238550 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:30:52,866-Speed 2628.01 samples/sec Loss 9.5073 LearningRate 0.0508 Epoch: 5 Global Step: 238560 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:30:56,782-Speed 2615.10 samples/sec Loss 9.5730 LearningRate 0.0508 Epoch: 5 Global Step: 238570 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:31:00,677-Speed 2629.99 samples/sec Loss 9.7565 LearningRate 0.0508 Epoch: 5 Global Step: 238580 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:31:04,579-Speed 2624.98 samples/sec Loss 9.6984 LearningRate 0.0508 Epoch: 5 Global Step: 238590 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:31:08,476-Speed 2628.48 samples/sec Loss 9.6380 LearningRate 0.0507 Epoch: 5 Global Step: 238600 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:31:12,350-Speed 2643.67 samples/sec Loss 9.6669 LearningRate 0.0507 Epoch: 5 Global Step: 238610 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:31:16,245-Speed 2629.44 samples/sec Loss 9.6810 LearningRate 0.0507 Epoch: 5 Global Step: 238620 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:31:20,148-Speed 2625.05 samples/sec Loss 9.6389 LearningRate 0.0507 Epoch: 5 Global Step: 238630 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:31:24,045-Speed 2628.30 samples/sec Loss 9.5675 LearningRate 0.0507 Epoch: 5 Global Step: 238640 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:31:27,944-Speed 2626.79 samples/sec Loss 9.6317 LearningRate 0.0507 Epoch: 5 Global Step: 238650 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:31:31,838-Speed 2630.15 samples/sec Loss 9.6046 LearningRate 0.0507 Epoch: 5 Global Step: 238660 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:31:35,756-Speed 2613.90 samples/sec Loss 9.6210 LearningRate 0.0507 Epoch: 5 Global Step: 238670 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:31:39,653-Speed 2628.53 samples/sec Loss 9.5847 LearningRate 0.0507 Epoch: 5 Global Step: 238680 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:31:43,550-Speed 2627.97 samples/sec Loss 9.6570 LearningRate 0.0507 Epoch: 5 Global Step: 238690 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:31:47,443-Speed 2630.78 samples/sec Loss 9.5857 LearningRate 0.0507 Epoch: 5 Global Step: 238700 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:31:51,342-Speed 2627.36 samples/sec Loss 9.5127 LearningRate 0.0507 Epoch: 5 Global Step: 238710 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:31:55,244-Speed 2625.38 samples/sec Loss 9.7246 LearningRate 0.0507 Epoch: 5 Global Step: 238720 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:31:59,141-Speed 2628.00 samples/sec Loss 9.6734 LearningRate 0.0507 Epoch: 5 Global Step: 238730 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:32:03,043-Speed 2625.27 samples/sec Loss 9.7323 LearningRate 0.0507 Epoch: 5 Global Step: 238740 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:32:06,945-Speed 2624.76 samples/sec Loss 9.6729 LearningRate 0.0507 Epoch: 5 Global Step: 238750 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:32:10,849-Speed 2623.62 samples/sec Loss 9.5945 LearningRate 0.0507 Epoch: 5 Global Step: 238760 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:32:14,755-Speed 2622.42 samples/sec Loss 9.6388 LearningRate 0.0507 Epoch: 5 Global Step: 238770 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:32:18,685-Speed 2605.89 samples/sec Loss 9.5103 LearningRate 0.0507 Epoch: 5 Global Step: 238780 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:32:22,627-Speed 2598.29 samples/sec Loss 9.5403 LearningRate 0.0507 Epoch: 5 Global Step: 238790 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:32:26,530-Speed 2624.31 samples/sec Loss 9.5917 LearningRate 0.0507 Epoch: 5 Global Step: 238800 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:32:30,433-Speed 2624.44 samples/sec Loss 9.5826 LearningRate 0.0507 Epoch: 5 Global Step: 238810 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:32:34,325-Speed 2631.68 samples/sec Loss 9.4727 LearningRate 0.0507 Epoch: 5 Global Step: 238820 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:32:38,226-Speed 2626.01 samples/sec Loss 9.6242 LearningRate 0.0507 Epoch: 5 Global Step: 238830 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:32:42,125-Speed 2626.92 samples/sec Loss 9.6659 LearningRate 0.0507 Epoch: 5 Global Step: 238840 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:32:46,060-Speed 2602.88 samples/sec Loss 9.6400 LearningRate 0.0507 Epoch: 5 Global Step: 238850 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:32:49,960-Speed 2626.14 samples/sec Loss 9.6881 LearningRate 0.0507 Epoch: 5 Global Step: 238860 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:32:53,898-Speed 2601.17 samples/sec Loss 9.7768 LearningRate 0.0507 Epoch: 5 Global Step: 238870 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:32:57,881-Speed 2571.29 samples/sec Loss 9.6120 LearningRate 0.0507 Epoch: 5 Global Step: 238880 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:33:01,798-Speed 2615.29 samples/sec Loss 9.6991 LearningRate 0.0507 Epoch: 5 Global Step: 238890 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:33:05,691-Speed 2630.21 samples/sec Loss 9.5686 LearningRate 0.0507 Epoch: 5 Global Step: 238900 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:33:09,626-Speed 2603.82 samples/sec Loss 9.5242 LearningRate 0.0507 Epoch: 5 Global Step: 238910 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:33:13,521-Speed 2629.46 samples/sec Loss 9.6910 LearningRate 0.0507 Epoch: 5 Global Step: 238920 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:33:17,415-Speed 2630.33 samples/sec Loss 9.4934 LearningRate 0.0507 Epoch: 5 Global Step: 238930 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:33:21,309-Speed 2630.53 samples/sec Loss 9.6179 LearningRate 0.0507 Epoch: 5 Global Step: 238940 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:33:25,208-Speed 2626.32 samples/sec Loss 9.7767 LearningRate 0.0507 Epoch: 5 Global Step: 238950 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:33:29,115-Speed 2622.20 samples/sec Loss 9.6037 LearningRate 0.0507 Epoch: 5 Global Step: 238960 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:33:33,017-Speed 2624.80 samples/sec Loss 9.5474 LearningRate 0.0507 Epoch: 5 Global Step: 238970 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:33:36,926-Speed 2620.11 samples/sec Loss 9.8367 LearningRate 0.0507 Epoch: 5 Global Step: 238980 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:33:40,826-Speed 2626.07 samples/sec Loss 9.5452 LearningRate 0.0507 Epoch: 5 Global Step: 238990 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:33:44,723-Speed 2634.94 samples/sec Loss 9.5831 LearningRate 0.0507 Epoch: 5 Global Step: 239000 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:33:48,598-Speed 2642.93 samples/sec Loss 9.6567 LearningRate 0.0507 Epoch: 5 Global Step: 239010 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:33:52,498-Speed 2626.77 samples/sec Loss 9.5911 LearningRate 0.0507 Epoch: 5 Global Step: 239020 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:33:56,394-Speed 2629.16 samples/sec Loss 9.6234 LearningRate 0.0507 Epoch: 5 Global Step: 239030 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:34:00,295-Speed 2626.12 samples/sec Loss 9.6211 LearningRate 0.0507 Epoch: 5 Global Step: 239040 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:34:04,205-Speed 2619.60 samples/sec Loss 9.6483 LearningRate 0.0507 Epoch: 5 Global Step: 239050 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:34:08,109-Speed 2623.08 samples/sec Loss 9.7200 LearningRate 0.0507 Epoch: 5 Global Step: 239060 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:34:12,029-Speed 2612.55 samples/sec Loss 9.6056 LearningRate 0.0507 Epoch: 5 Global Step: 239070 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:34:15,932-Speed 2624.70 samples/sec Loss 9.7006 LearningRate 0.0507 Epoch: 5 Global Step: 239080 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:34:19,825-Speed 2631.34 samples/sec Loss 9.6033 LearningRate 0.0507 Epoch: 5 Global Step: 239090 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:34:23,719-Speed 2630.24 samples/sec Loss 9.6665 LearningRate 0.0507 Epoch: 5 Global Step: 239100 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:34:27,628-Speed 2620.74 samples/sec Loss 9.7288 LearningRate 0.0507 Epoch: 5 Global Step: 239110 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:34:31,497-Speed 2647.21 samples/sec Loss 9.6068 LearningRate 0.0507 Epoch: 5 Global Step: 239120 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:34:35,391-Speed 2630.07 samples/sec Loss 9.7058 LearningRate 0.0507 Epoch: 5 Global Step: 239130 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:34:39,289-Speed 2627.81 samples/sec Loss 9.8129 LearningRate 0.0507 Epoch: 5 Global Step: 239140 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:34:43,183-Speed 2630.28 samples/sec Loss 9.6292 LearningRate 0.0507 Epoch: 5 Global Step: 239150 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:34:47,078-Speed 2630.23 samples/sec Loss 9.6994 LearningRate 0.0507 Epoch: 5 Global Step: 239160 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:34:50,972-Speed 2630.67 samples/sec Loss 9.7340 LearningRate 0.0507 Epoch: 5 Global Step: 239170 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:34:54,868-Speed 2628.84 samples/sec Loss 9.4168 LearningRate 0.0506 Epoch: 5 Global Step: 239180 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:34:58,762-Speed 2630.04 samples/sec Loss 9.5664 LearningRate 0.0506 Epoch: 5 Global Step: 239190 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:35:02,659-Speed 2628.77 samples/sec Loss 9.5271 LearningRate 0.0506 Epoch: 5 Global Step: 239200 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:35:06,570-Speed 2618.87 samples/sec Loss 9.5342 LearningRate 0.0506 Epoch: 5 Global Step: 239210 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:35:10,495-Speed 2609.45 samples/sec Loss 9.5612 LearningRate 0.0506 Epoch: 5 Global Step: 239220 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:35:14,404-Speed 2620.60 samples/sec Loss 9.5705 LearningRate 0.0506 Epoch: 5 Global Step: 239230 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:35:18,289-Speed 2636.54 samples/sec Loss 9.3532 LearningRate 0.0506 Epoch: 5 Global Step: 239240 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:35:22,208-Speed 2613.42 samples/sec Loss 9.5635 LearningRate 0.0506 Epoch: 5 Global Step: 239250 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:35:26,104-Speed 2628.70 samples/sec Loss 9.7912 LearningRate 0.0506 Epoch: 5 Global Step: 239260 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:35:30,137-Speed 2540.46 samples/sec Loss 9.7369 LearningRate 0.0506 Epoch: 5 Global Step: 239270 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:35:34,033-Speed 2628.80 samples/sec Loss 9.5918 LearningRate 0.0506 Epoch: 5 Global Step: 239280 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:35:37,934-Speed 2625.73 samples/sec Loss 9.7405 LearningRate 0.0506 Epoch: 5 Global Step: 239290 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:35:41,825-Speed 2631.88 samples/sec Loss 9.7451 LearningRate 0.0506 Epoch: 5 Global Step: 239300 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:35:45,720-Speed 2629.94 samples/sec Loss 9.5564 LearningRate 0.0506 Epoch: 5 Global Step: 239310 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:35:49,617-Speed 2628.45 samples/sec Loss 9.5339 LearningRate 0.0506 Epoch: 5 Global Step: 239320 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:35:53,511-Speed 2630.25 samples/sec Loss 9.7271 LearningRate 0.0506 Epoch: 5 Global Step: 239330 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:35:57,405-Speed 2630.40 samples/sec Loss 9.5353 LearningRate 0.0506 Epoch: 5 Global Step: 239340 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:36:01,299-Speed 2630.24 samples/sec Loss 9.6525 LearningRate 0.0506 Epoch: 5 Global Step: 239350 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:36:05,203-Speed 2624.03 samples/sec Loss 9.7727 LearningRate 0.0506 Epoch: 5 Global Step: 239360 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:36:09,092-Speed 2633.77 samples/sec Loss 9.6135 LearningRate 0.0506 Epoch: 5 Global Step: 239370 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:36:12,986-Speed 2629.99 samples/sec Loss 9.6289 LearningRate 0.0506 Epoch: 5 Global Step: 239380 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:36:16,878-Speed 2631.50 samples/sec Loss 9.6026 LearningRate 0.0506 Epoch: 5 Global Step: 239390 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:36:20,760-Speed 2638.89 samples/sec Loss 9.5305 LearningRate 0.0506 Epoch: 5 Global Step: 239400 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:36:24,662-Speed 2625.39 samples/sec Loss 9.5489 LearningRate 0.0506 Epoch: 5 Global Step: 239410 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:36:28,578-Speed 2615.55 samples/sec Loss 9.7963 LearningRate 0.0506 Epoch: 5 Global Step: 239420 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:36:32,489-Speed 2619.24 samples/sec Loss 9.7612 LearningRate 0.0506 Epoch: 5 Global Step: 239430 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:36:36,382-Speed 2630.78 samples/sec Loss 9.5960 LearningRate 0.0506 Epoch: 5 Global Step: 239440 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:36:40,293-Speed 2618.90 samples/sec Loss 9.5748 LearningRate 0.0506 Epoch: 5 Global Step: 239450 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:36:44,185-Speed 2631.91 samples/sec Loss 9.5992 LearningRate 0.0506 Epoch: 5 Global Step: 239460 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:36:48,101-Speed 2615.42 samples/sec Loss 9.5432 LearningRate 0.0506 Epoch: 5 Global Step: 239470 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:36:51,996-Speed 2629.53 samples/sec Loss 9.5112 LearningRate 0.0506 Epoch: 5 Global Step: 239480 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:36:55,896-Speed 2626.12 samples/sec Loss 9.6065 LearningRate 0.0506 Epoch: 5 Global Step: 239490 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:36:59,809-Speed 2618.43 samples/sec Loss 9.6752 LearningRate 0.0506 Epoch: 5 Global Step: 239500 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:37:03,706-Speed 2628.26 samples/sec Loss 9.5553 LearningRate 0.0506 Epoch: 5 Global Step: 239510 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:37:07,600-Speed 2629.60 samples/sec Loss 9.6292 LearningRate 0.0506 Epoch: 5 Global Step: 239520 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:37:11,501-Speed 2625.51 samples/sec Loss 9.7144 LearningRate 0.0506 Epoch: 5 Global Step: 239530 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:37:15,389-Speed 2634.96 samples/sec Loss 9.6763 LearningRate 0.0506 Epoch: 5 Global Step: 239540 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:37:19,284-Speed 2629.25 samples/sec Loss 9.6021 LearningRate 0.0506 Epoch: 5 Global Step: 239550 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:37:23,208-Speed 2610.84 samples/sec Loss 9.5623 LearningRate 0.0506 Epoch: 5 Global Step: 239560 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:37:27,104-Speed 2629.06 samples/sec Loss 9.5855 LearningRate 0.0506 Epoch: 5 Global Step: 239570 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:37:31,052-Speed 2594.49 samples/sec Loss 9.6279 LearningRate 0.0506 Epoch: 5 Global Step: 239580 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:37:34,954-Speed 2625.12 samples/sec Loss 9.5373 LearningRate 0.0506 Epoch: 5 Global Step: 239590 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:37:38,852-Speed 2627.17 samples/sec Loss 9.5985 LearningRate 0.0506 Epoch: 5 Global Step: 239600 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:37:42,748-Speed 2629.05 samples/sec Loss 9.5146 LearningRate 0.0506 Epoch: 5 Global Step: 239610 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:37:46,644-Speed 2629.78 samples/sec Loss 9.6765 LearningRate 0.0506 Epoch: 5 Global Step: 239620 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:37:50,540-Speed 2628.89 samples/sec Loss 9.5264 LearningRate 0.0506 Epoch: 5 Global Step: 239630 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:37:54,433-Speed 2631.12 samples/sec Loss 9.6811 LearningRate 0.0506 Epoch: 5 Global Step: 239640 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:37:58,329-Speed 2628.84 samples/sec Loss 9.6771 LearningRate 0.0506 Epoch: 5 Global Step: 239650 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:38:02,231-Speed 2624.84 samples/sec Loss 9.7347 LearningRate 0.0506 Epoch: 5 Global Step: 239660 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:38:06,126-Speed 2629.62 samples/sec Loss 9.5918 LearningRate 0.0506 Epoch: 5 Global Step: 239670 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:38:10,021-Speed 2630.00 samples/sec Loss 9.6581 LearningRate 0.0506 Epoch: 5 Global Step: 239680 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:38:13,898-Speed 2641.51 samples/sec Loss 9.7239 LearningRate 0.0506 Epoch: 5 Global Step: 239690 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:38:17,793-Speed 2629.93 samples/sec Loss 9.6733 LearningRate 0.0506 Epoch: 5 Global Step: 239700 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:38:21,687-Speed 2630.21 samples/sec Loss 9.6765 LearningRate 0.0506 Epoch: 5 Global Step: 239710 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:38:25,580-Speed 2631.26 samples/sec Loss 9.7637 LearningRate 0.0506 Epoch: 5 Global Step: 239720 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:38:29,494-Speed 2617.11 samples/sec Loss 9.7022 LearningRate 0.0506 Epoch: 5 Global Step: 239730 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:38:33,409-Speed 2615.83 samples/sec Loss 9.7183 LearningRate 0.0506 Epoch: 5 Global Step: 239740 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:38:37,308-Speed 2627.16 samples/sec Loss 9.6106 LearningRate 0.0506 Epoch: 5 Global Step: 239750 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:38:41,204-Speed 2629.35 samples/sec Loss 9.6950 LearningRate 0.0506 Epoch: 5 Global Step: 239760 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:38:45,100-Speed 2629.37 samples/sec Loss 9.7001 LearningRate 0.0505 Epoch: 5 Global Step: 239770 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:38:48,996-Speed 2628.41 samples/sec Loss 9.5944 LearningRate 0.0505 Epoch: 5 Global Step: 239780 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:38:52,902-Speed 2622.77 samples/sec Loss 9.6273 LearningRate 0.0505 Epoch: 5 Global Step: 239790 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:38:56,715-Speed 2686.47 samples/sec Loss 9.9507 LearningRate 0.0505 Epoch: 5 Global Step: 239800 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:39:00,610-Speed 2629.55 samples/sec Loss 10.8635 LearningRate 0.0505 Epoch: 5 Global Step: 239810 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:39:04,511-Speed 2624.81 samples/sec Loss 10.1667 LearningRate 0.0505 Epoch: 5 Global Step: 239820 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:39:08,417-Speed 2622.58 samples/sec Loss 9.8011 LearningRate 0.0505 Epoch: 5 Global Step: 239830 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:39:12,317-Speed 2626.27 samples/sec Loss 9.7404 LearningRate 0.0505 Epoch: 5 Global Step: 239840 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:39:16,212-Speed 2630.69 samples/sec Loss 9.7233 LearningRate 0.0505 Epoch: 5 Global Step: 239850 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:39:20,102-Speed 2633.07 samples/sec Loss 9.6349 LearningRate 0.0505 Epoch: 5 Global Step: 239860 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:39:23,993-Speed 2632.05 samples/sec Loss 9.5466 LearningRate 0.0505 Epoch: 5 Global Step: 239870 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:39:27,891-Speed 2627.72 samples/sec Loss 9.7187 LearningRate 0.0505 Epoch: 5 Global Step: 239880 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:39:31,782-Speed 2631.87 samples/sec Loss 9.7336 LearningRate 0.0505 Epoch: 5 Global Step: 239890 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:39:35,677-Speed 2629.57 samples/sec Loss 9.5187 LearningRate 0.0505 Epoch: 5 Global Step: 239900 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:39:39,568-Speed 2632.44 samples/sec Loss 9.7353 LearningRate 0.0505 Epoch: 5 Global Step: 239910 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:39:43,464-Speed 2629.10 samples/sec Loss 9.6673 LearningRate 0.0505 Epoch: 5 Global Step: 239920 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:39:47,360-Speed 2629.01 samples/sec Loss 9.5959 LearningRate 0.0505 Epoch: 5 Global Step: 239930 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:39:51,282-Speed 2612.38 samples/sec Loss 9.6061 LearningRate 0.0505 Epoch: 5 Global Step: 239940 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:39:55,174-Speed 2631.42 samples/sec Loss 9.7010 LearningRate 0.0505 Epoch: 5 Global Step: 239950 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:39:59,080-Speed 2622.66 samples/sec Loss 9.5629 LearningRate 0.0505 Epoch: 5 Global Step: 239960 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:40:02,982-Speed 2624.95 samples/sec Loss 9.4253 LearningRate 0.0505 Epoch: 5 Global Step: 239970 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:40:06,873-Speed 2632.28 samples/sec Loss 9.7121 LearningRate 0.0505 Epoch: 5 Global Step: 239980 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:40:10,763-Speed 2632.55 samples/sec Loss 9.6184 LearningRate 0.0505 Epoch: 5 Global Step: 239990 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:40:14,661-Speed 2627.97 samples/sec Loss 9.5810 LearningRate 0.0505 Epoch: 5 Global Step: 240000 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:40:57,423-[lfw][240000]XNorm: 23.453357
Training: 2022-04-13 22:40:57,424-[lfw][240000]Accuracy-Flip: 0.99750+-0.00300
Training: 2022-04-13 22:40:57,425-[lfw][240000]Accuracy-Highest: 0.99783
Training: 2022-04-13 22:41:47,529-[cfp_fp][240000]XNorm: 21.397198
Training: 2022-04-13 22:41:47,529-[cfp_fp][240000]Accuracy-Flip: 0.97871+-0.00689
Training: 2022-04-13 22:41:47,531-[cfp_fp][240000]Accuracy-Highest: 0.98314
Training: 2022-04-13 22:42:30,745-[agedb_30][240000]XNorm: 23.193004
Training: 2022-04-13 22:42:30,746-[agedb_30][240000]Accuracy-Flip: 0.97267+-0.00814
Training: 2022-04-13 22:42:30,746-[agedb_30][240000]Accuracy-Highest: 0.97267
Training: 2022-04-13 22:42:34,620-Speed 73.16 samples/sec Loss 9.6308 LearningRate 0.0505 Epoch: 5 Global Step: 240010 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:42:38,486-Speed 2649.22 samples/sec Loss 9.6315 LearningRate 0.0505 Epoch: 5 Global Step: 240020 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:42:42,354-Speed 2647.98 samples/sec Loss 9.6037 LearningRate 0.0505 Epoch: 5 Global Step: 240030 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:42:46,253-Speed 2627.16 samples/sec Loss 9.6040 LearningRate 0.0505 Epoch: 5 Global Step: 240040 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:42:50,128-Speed 2643.02 samples/sec Loss 9.6319 LearningRate 0.0505 Epoch: 5 Global Step: 240050 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:42:54,009-Speed 2639.43 samples/sec Loss 9.6521 LearningRate 0.0505 Epoch: 5 Global Step: 240060 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:42:57,898-Speed 2635.08 samples/sec Loss 9.6619 LearningRate 0.0505 Epoch: 5 Global Step: 240070 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:43:01,779-Speed 2639.98 samples/sec Loss 9.6239 LearningRate 0.0505 Epoch: 5 Global Step: 240080 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:43:05,659-Speed 2639.73 samples/sec Loss 9.7536 LearningRate 0.0505 Epoch: 5 Global Step: 240090 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:43:09,612-Speed 2590.83 samples/sec Loss 9.5619 LearningRate 0.0505 Epoch: 5 Global Step: 240100 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:43:13,494-Speed 2637.99 samples/sec Loss 9.8461 LearningRate 0.0505 Epoch: 5 Global Step: 240110 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:43:17,384-Speed 2634.59 samples/sec Loss 9.6905 LearningRate 0.0505 Epoch: 5 Global Step: 240120 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:43:21,268-Speed 2636.91 samples/sec Loss 9.5922 LearningRate 0.0505 Epoch: 5 Global Step: 240130 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:43:25,154-Speed 2635.89 samples/sec Loss 9.7915 LearningRate 0.0505 Epoch: 5 Global Step: 240140 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:43:29,047-Speed 2631.23 samples/sec Loss 9.6019 LearningRate 0.0505 Epoch: 5 Global Step: 240150 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:43:32,933-Speed 2636.01 samples/sec Loss 9.5559 LearningRate 0.0505 Epoch: 5 Global Step: 240160 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:43:36,824-Speed 2632.28 samples/sec Loss 9.6549 LearningRate 0.0505 Epoch: 5 Global Step: 240170 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:43:40,714-Speed 2632.72 samples/sec Loss 9.6300 LearningRate 0.0505 Epoch: 5 Global Step: 240180 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:43:44,609-Speed 2630.19 samples/sec Loss 9.5507 LearningRate 0.0505 Epoch: 5 Global Step: 240190 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:43:48,496-Speed 2635.21 samples/sec Loss 9.5817 LearningRate 0.0505 Epoch: 5 Global Step: 240200 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:43:52,389-Speed 2630.88 samples/sec Loss 9.5426 LearningRate 0.0505 Epoch: 5 Global Step: 240210 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:43:56,286-Speed 2628.62 samples/sec Loss 9.5314 LearningRate 0.0505 Epoch: 5 Global Step: 240220 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:44:00,214-Speed 2607.93 samples/sec Loss 9.6816 LearningRate 0.0505 Epoch: 5 Global Step: 240230 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:44:04,105-Speed 2632.38 samples/sec Loss 9.4340 LearningRate 0.0505 Epoch: 5 Global Step: 240240 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:44:07,998-Speed 2630.82 samples/sec Loss 9.7597 LearningRate 0.0505 Epoch: 5 Global Step: 240250 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:44:11,904-Speed 2622.33 samples/sec Loss 9.5818 LearningRate 0.0505 Epoch: 5 Global Step: 240260 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:44:15,798-Speed 2630.66 samples/sec Loss 9.6377 LearningRate 0.0505 Epoch: 5 Global Step: 240270 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:44:19,691-Speed 2630.95 samples/sec Loss 9.6915 LearningRate 0.0505 Epoch: 5 Global Step: 240280 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:44:23,585-Speed 2630.34 samples/sec Loss 9.6877 LearningRate 0.0505 Epoch: 5 Global Step: 240290 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:44:27,499-Speed 2617.47 samples/sec Loss 9.6008 LearningRate 0.0505 Epoch: 5 Global Step: 240300 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:44:31,385-Speed 2635.97 samples/sec Loss 9.6503 LearningRate 0.0505 Epoch: 5 Global Step: 240310 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:44:35,277-Speed 2631.46 samples/sec Loss 9.7214 LearningRate 0.0505 Epoch: 5 Global Step: 240320 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:44:39,172-Speed 2629.40 samples/sec Loss 9.5754 LearningRate 0.0505 Epoch: 5 Global Step: 240330 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:44:43,065-Speed 2631.36 samples/sec Loss 9.5089 LearningRate 0.0505 Epoch: 5 Global Step: 240340 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:44:46,964-Speed 2626.87 samples/sec Loss 9.6744 LearningRate 0.0504 Epoch: 5 Global Step: 240350 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:44:50,855-Speed 2632.52 samples/sec Loss 9.6775 LearningRate 0.0504 Epoch: 5 Global Step: 240360 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:44:54,750-Speed 2629.40 samples/sec Loss 9.5177 LearningRate 0.0504 Epoch: 5 Global Step: 240370 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:44:58,645-Speed 2629.75 samples/sec Loss 9.4817 LearningRate 0.0504 Epoch: 5 Global Step: 240380 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:45:02,541-Speed 2629.30 samples/sec Loss 9.5052 LearningRate 0.0504 Epoch: 5 Global Step: 240390 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:45:06,436-Speed 2629.01 samples/sec Loss 9.6230 LearningRate 0.0504 Epoch: 5 Global Step: 240400 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:45:10,340-Speed 2623.63 samples/sec Loss 9.6899 LearningRate 0.0504 Epoch: 5 Global Step: 240410 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:45:14,235-Speed 2629.87 samples/sec Loss 9.6543 LearningRate 0.0504 Epoch: 5 Global Step: 240420 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:45:18,136-Speed 2625.69 samples/sec Loss 9.5262 LearningRate 0.0504 Epoch: 5 Global Step: 240430 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:45:22,031-Speed 2630.04 samples/sec Loss 9.5618 LearningRate 0.0504 Epoch: 5 Global Step: 240440 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:45:25,919-Speed 2634.39 samples/sec Loss 9.6032 LearningRate 0.0504 Epoch: 5 Global Step: 240450 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:45:29,812-Speed 2631.25 samples/sec Loss 9.6756 LearningRate 0.0504 Epoch: 5 Global Step: 240460 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:45:33,707-Speed 2629.91 samples/sec Loss 9.7120 LearningRate 0.0504 Epoch: 5 Global Step: 240470 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:45:37,600-Speed 2630.54 samples/sec Loss 9.6119 LearningRate 0.0504 Epoch: 5 Global Step: 240480 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:45:41,504-Speed 2623.62 samples/sec Loss 9.6405 LearningRate 0.0504 Epoch: 5 Global Step: 240490 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:45:45,403-Speed 2626.98 samples/sec Loss 9.5614 LearningRate 0.0504 Epoch: 5 Global Step: 240500 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:45:49,297-Speed 2630.63 samples/sec Loss 9.6849 LearningRate 0.0504 Epoch: 5 Global Step: 240510 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:45:53,196-Speed 2627.14 samples/sec Loss 9.5409 LearningRate 0.0504 Epoch: 5 Global Step: 240520 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:45:57,086-Speed 2632.87 samples/sec Loss 9.6413 LearningRate 0.0504 Epoch: 5 Global Step: 240530 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:46:00,882-Speed 2698.52 samples/sec Loss 9.6836 LearningRate 0.0504 Epoch: 5 Global Step: 240540 Fp16 Grad Scale: 2048 Required: 66 hours
Training: 2022-04-13 22:46:04,776-Speed 2630.63 samples/sec Loss 9.9694 LearningRate 0.0504 Epoch: 5 Global Step: 240550 Fp16 Grad Scale: 2048 Required: 66 hours
Training: 2022-04-13 22:46:08,672-Speed 2628.66 samples/sec Loss 9.7306 LearningRate 0.0504 Epoch: 5 Global Step: 240560 Fp16 Grad Scale: 2048 Required: 66 hours
Training: 2022-04-13 22:46:12,558-Speed 2635.27 samples/sec Loss 9.6238 LearningRate 0.0504 Epoch: 5 Global Step: 240570 Fp16 Grad Scale: 2048 Required: 66 hours
Training: 2022-04-13 22:46:16,456-Speed 2627.73 samples/sec Loss 9.4066 LearningRate 0.0504 Epoch: 5 Global Step: 240580 Fp16 Grad Scale: 2048 Required: 66 hours
Training: 2022-04-13 22:46:20,349-Speed 2631.05 samples/sec Loss 9.6675 LearningRate 0.0504 Epoch: 5 Global Step: 240590 Fp16 Grad Scale: 2048 Required: 66 hours
Training: 2022-04-13 22:46:24,242-Speed 2631.80 samples/sec Loss 9.6137 LearningRate 0.0504 Epoch: 5 Global Step: 240600 Fp16 Grad Scale: 2048 Required: 66 hours
Training: 2022-04-13 22:46:28,132-Speed 2632.74 samples/sec Loss 9.5432 LearningRate 0.0504 Epoch: 5 Global Step: 240610 Fp16 Grad Scale: 2048 Required: 66 hours
Training: 2022-04-13 22:46:32,023-Speed 2632.56 samples/sec Loss 9.7246 LearningRate 0.0504 Epoch: 5 Global Step: 240620 Fp16 Grad Scale: 2048 Required: 66 hours
Training: 2022-04-13 22:46:35,914-Speed 2632.18 samples/sec Loss 9.5727 LearningRate 0.0504 Epoch: 5 Global Step: 240630 Fp16 Grad Scale: 2048 Required: 66 hours
Training: 2022-04-13 22:46:39,803-Speed 2633.55 samples/sec Loss 9.5412 LearningRate 0.0504 Epoch: 5 Global Step: 240640 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:46:43,696-Speed 2630.65 samples/sec Loss 9.5173 LearningRate 0.0504 Epoch: 5 Global Step: 240650 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:46:47,600-Speed 2624.04 samples/sec Loss 9.6442 LearningRate 0.0504 Epoch: 5 Global Step: 240660 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:46:51,519-Speed 2613.72 samples/sec Loss 9.7092 LearningRate 0.0504 Epoch: 5 Global Step: 240670 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:46:55,410-Speed 2632.63 samples/sec Loss 9.5415 LearningRate 0.0504 Epoch: 5 Global Step: 240680 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:46:59,303-Speed 2631.30 samples/sec Loss 9.5696 LearningRate 0.0504 Epoch: 5 Global Step: 240690 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:47:03,205-Speed 2624.68 samples/sec Loss 9.7120 LearningRate 0.0504 Epoch: 5 Global Step: 240700 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:47:07,104-Speed 2627.02 samples/sec Loss 9.6448 LearningRate 0.0504 Epoch: 5 Global Step: 240710 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:47:11,003-Speed 2627.00 samples/sec Loss 9.6730 LearningRate 0.0504 Epoch: 5 Global Step: 240720 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:47:14,900-Speed 2628.07 samples/sec Loss 9.5785 LearningRate 0.0504 Epoch: 5 Global Step: 240730 Fp16 Grad Scale: 4096 Required: 66 hours
Training: 2022-04-13 22:47:18,809-Speed 2620.46 samples/sec Loss 9.6927 LearningRate 0.0504 Epoch: 5 Global Step: 240740 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:47:22,712-Speed 2624.69 samples/sec Loss 9.7495 LearningRate 0.0504 Epoch: 5 Global Step: 240750 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:47:26,640-Speed 2607.03 samples/sec Loss 9.7443 LearningRate 0.0504 Epoch: 5 Global Step: 240760 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:47:30,542-Speed 2625.03 samples/sec Loss 9.6745 LearningRate 0.0504 Epoch: 5 Global Step: 240770 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:47:34,441-Speed 2627.64 samples/sec Loss 9.6713 LearningRate 0.0504 Epoch: 5 Global Step: 240780 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:47:38,339-Speed 2627.22 samples/sec Loss 9.6373 LearningRate 0.0504 Epoch: 5 Global Step: 240790 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:47:42,258-Speed 2613.43 samples/sec Loss 9.4895 LearningRate 0.0504 Epoch: 5 Global Step: 240800 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:47:46,159-Speed 2626.72 samples/sec Loss 9.6245 LearningRate 0.0504 Epoch: 5 Global Step: 240810 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:47:50,226-Speed 2518.41 samples/sec Loss 9.6234 LearningRate 0.0504 Epoch: 5 Global Step: 240820 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:47:54,262-Speed 2537.74 samples/sec Loss 9.6951 LearningRate 0.0504 Epoch: 5 Global Step: 240830 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:47:58,149-Speed 2634.81 samples/sec Loss 9.5792 LearningRate 0.0504 Epoch: 5 Global Step: 240840 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:48:02,039-Speed 2633.54 samples/sec Loss 9.5714 LearningRate 0.0504 Epoch: 5 Global Step: 240850 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:48:05,938-Speed 2627.27 samples/sec Loss 9.5532 LearningRate 0.0504 Epoch: 5 Global Step: 240860 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:48:09,834-Speed 2629.04 samples/sec Loss 9.5669 LearningRate 0.0504 Epoch: 5 Global Step: 240870 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:48:13,727-Speed 2630.50 samples/sec Loss 9.5145 LearningRate 0.0504 Epoch: 5 Global Step: 240880 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:48:17,616-Speed 2633.96 samples/sec Loss 9.5479 LearningRate 0.0504 Epoch: 5 Global Step: 240890 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:48:21,511-Speed 2630.57 samples/sec Loss 9.5530 LearningRate 0.0504 Epoch: 5 Global Step: 240900 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:48:25,406-Speed 2629.40 samples/sec Loss 9.6303 LearningRate 0.0504 Epoch: 5 Global Step: 240910 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:48:29,318-Speed 2618.26 samples/sec Loss 9.7224 LearningRate 0.0504 Epoch: 5 Global Step: 240920 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:48:33,210-Speed 2632.13 samples/sec Loss 9.6214 LearningRate 0.0503 Epoch: 5 Global Step: 240930 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:48:37,106-Speed 2628.36 samples/sec Loss 9.6509 LearningRate 0.0503 Epoch: 5 Global Step: 240940 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:48:41,003-Speed 2628.58 samples/sec Loss 9.5858 LearningRate 0.0503 Epoch: 5 Global Step: 240950 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:48:44,905-Speed 2625.16 samples/sec Loss 9.7072 LearningRate 0.0503 Epoch: 5 Global Step: 240960 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:48:48,798-Speed 2630.83 samples/sec Loss 9.5562 LearningRate 0.0503 Epoch: 5 Global Step: 240970 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:48:52,730-Speed 2604.67 samples/sec Loss 9.5693 LearningRate 0.0503 Epoch: 5 Global Step: 240980 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:48:56,623-Speed 2631.64 samples/sec Loss 9.5509 LearningRate 0.0503 Epoch: 5 Global Step: 240990 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:49:00,517-Speed 2630.33 samples/sec Loss 9.6085 LearningRate 0.0503 Epoch: 5 Global Step: 241000 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:49:04,410-Speed 2631.04 samples/sec Loss 9.5667 LearningRate 0.0503 Epoch: 5 Global Step: 241010 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:49:08,307-Speed 2628.45 samples/sec Loss 9.7293 LearningRate 0.0503 Epoch: 5 Global Step: 241020 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:49:12,206-Speed 2627.43 samples/sec Loss 9.4664 LearningRate 0.0503 Epoch: 5 Global Step: 241030 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:49:16,129-Speed 2610.91 samples/sec Loss 9.5814 LearningRate 0.0503 Epoch: 5 Global Step: 241040 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:49:20,021-Speed 2631.34 samples/sec Loss 9.6152 LearningRate 0.0503 Epoch: 5 Global Step: 241050 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:49:23,912-Speed 2632.21 samples/sec Loss 9.5086 LearningRate 0.0503 Epoch: 5 Global Step: 241060 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:49:27,811-Speed 2627.15 samples/sec Loss 9.4464 LearningRate 0.0503 Epoch: 5 Global Step: 241070 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:49:31,710-Speed 2627.07 samples/sec Loss 9.6022 LearningRate 0.0503 Epoch: 5 Global Step: 241080 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:49:35,607-Speed 2628.51 samples/sec Loss 9.6741 LearningRate 0.0503 Epoch: 5 Global Step: 241090 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:49:39,511-Speed 2624.03 samples/sec Loss 9.7050 LearningRate 0.0503 Epoch: 5 Global Step: 241100 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:49:43,401-Speed 2633.11 samples/sec Loss 9.4665 LearningRate 0.0503 Epoch: 5 Global Step: 241110 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:49:47,294-Speed 2630.58 samples/sec Loss 9.6026 LearningRate 0.0503 Epoch: 5 Global Step: 241120 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:49:51,192-Speed 2627.15 samples/sec Loss 9.5717 LearningRate 0.0503 Epoch: 5 Global Step: 241130 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:49:55,084-Speed 2631.99 samples/sec Loss 9.4032 LearningRate 0.0503 Epoch: 5 Global Step: 241140 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:49:58,984-Speed 2625.84 samples/sec Loss 9.6200 LearningRate 0.0503 Epoch: 5 Global Step: 241150 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:50:02,896-Speed 2618.84 samples/sec Loss 9.6502 LearningRate 0.0503 Epoch: 5 Global Step: 241160 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:50:06,790-Speed 2629.88 samples/sec Loss 9.6076 LearningRate 0.0503 Epoch: 5 Global Step: 241170 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:50:10,699-Speed 2620.55 samples/sec Loss 9.7879 LearningRate 0.0503 Epoch: 5 Global Step: 241180 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:50:14,595-Speed 2628.90 samples/sec Loss 9.6629 LearningRate 0.0503 Epoch: 5 Global Step: 241190 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:50:18,632-Speed 2537.19 samples/sec Loss 9.6317 LearningRate 0.0503 Epoch: 5 Global Step: 241200 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:50:22,639-Speed 2555.77 samples/sec Loss 9.6151 LearningRate 0.0503 Epoch: 5 Global Step: 241210 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:50:26,534-Speed 2629.87 samples/sec Loss 9.4833 LearningRate 0.0503 Epoch: 5 Global Step: 241220 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:50:30,429-Speed 2629.55 samples/sec Loss 9.3730 LearningRate 0.0503 Epoch: 5 Global Step: 241230 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:50:34,335-Speed 2622.14 samples/sec Loss 9.5630 LearningRate 0.0503 Epoch: 5 Global Step: 241240 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:50:38,217-Speed 2637.87 samples/sec Loss 9.6796 LearningRate 0.0503 Epoch: 5 Global Step: 241250 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:50:42,116-Speed 2627.44 samples/sec Loss 9.5776 LearningRate 0.0503 Epoch: 5 Global Step: 241260 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:50:46,016-Speed 2626.17 samples/sec Loss 9.5396 LearningRate 0.0503 Epoch: 5 Global Step: 241270 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:50:49,906-Speed 2633.09 samples/sec Loss 9.5468 LearningRate 0.0503 Epoch: 5 Global Step: 241280 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:50:53,902-Speed 2563.07 samples/sec Loss 9.4856 LearningRate 0.0503 Epoch: 5 Global Step: 241290 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:50:57,870-Speed 2581.90 samples/sec Loss 9.4349 LearningRate 0.0503 Epoch: 5 Global Step: 241300 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:51:01,767-Speed 2627.63 samples/sec Loss 9.6910 LearningRate 0.0503 Epoch: 5 Global Step: 241310 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:51:05,672-Speed 2622.90 samples/sec Loss 9.3746 LearningRate 0.0503 Epoch: 5 Global Step: 241320 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:51:09,567-Speed 2629.19 samples/sec Loss 9.5807 LearningRate 0.0503 Epoch: 5 Global Step: 241330 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:51:13,475-Speed 2621.28 samples/sec Loss 9.5275 LearningRate 0.0503 Epoch: 5 Global Step: 241340 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:51:17,370-Speed 2629.35 samples/sec Loss 9.5242 LearningRate 0.0503 Epoch: 5 Global Step: 241350 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:51:21,264-Speed 2630.29 samples/sec Loss 9.7167 LearningRate 0.0503 Epoch: 5 Global Step: 241360 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:51:25,161-Speed 2628.90 samples/sec Loss 9.7536 LearningRate 0.0503 Epoch: 5 Global Step: 241370 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:51:29,057-Speed 2628.89 samples/sec Loss 9.6219 LearningRate 0.0503 Epoch: 5 Global Step: 241380 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:51:32,949-Speed 2631.51 samples/sec Loss 9.4199 LearningRate 0.0503 Epoch: 5 Global Step: 241390 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:51:36,843-Speed 2629.94 samples/sec Loss 9.5039 LearningRate 0.0503 Epoch: 5 Global Step: 241400 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:51:40,738-Speed 2629.74 samples/sec Loss 9.6803 LearningRate 0.0503 Epoch: 5 Global Step: 241410 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:51:44,631-Speed 2631.17 samples/sec Loss 9.6459 LearningRate 0.0503 Epoch: 5 Global Step: 241420 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:51:48,533-Speed 2624.42 samples/sec Loss 9.6261 LearningRate 0.0503 Epoch: 5 Global Step: 241430 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:51:52,425-Speed 2631.81 samples/sec Loss 9.5267 LearningRate 0.0503 Epoch: 5 Global Step: 241440 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:51:56,317-Speed 2631.77 samples/sec Loss 9.5857 LearningRate 0.0503 Epoch: 5 Global Step: 241450 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:52:00,213-Speed 2629.10 samples/sec Loss 9.6555 LearningRate 0.0503 Epoch: 5 Global Step: 241460 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:52:04,128-Speed 2615.83 samples/sec Loss 9.5818 LearningRate 0.0503 Epoch: 5 Global Step: 241470 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:52:08,019-Speed 2632.78 samples/sec Loss 9.4933 LearningRate 0.0503 Epoch: 5 Global Step: 241480 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:52:11,913-Speed 2630.18 samples/sec Loss 9.5944 LearningRate 0.0503 Epoch: 5 Global Step: 241490 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:52:15,817-Speed 2623.73 samples/sec Loss 9.6481 LearningRate 0.0503 Epoch: 5 Global Step: 241500 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:52:19,714-Speed 2628.06 samples/sec Loss 9.4666 LearningRate 0.0503 Epoch: 5 Global Step: 241510 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:52:23,609-Speed 2629.62 samples/sec Loss 9.5775 LearningRate 0.0502 Epoch: 5 Global Step: 241520 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:52:27,494-Speed 2636.28 samples/sec Loss 9.5784 LearningRate 0.0502 Epoch: 5 Global Step: 241530 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:52:31,382-Speed 2634.85 samples/sec Loss 9.6397 LearningRate 0.0502 Epoch: 5 Global Step: 241540 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:52:35,278-Speed 2628.87 samples/sec Loss 9.6433 LearningRate 0.0502 Epoch: 5 Global Step: 241550 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:52:39,171-Speed 2630.57 samples/sec Loss 9.6789 LearningRate 0.0502 Epoch: 5 Global Step: 241560 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:52:43,048-Speed 2642.45 samples/sec Loss 9.5991 LearningRate 0.0502 Epoch: 5 Global Step: 241570 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:52:46,948-Speed 2626.42 samples/sec Loss 9.5586 LearningRate 0.0502 Epoch: 5 Global Step: 241580 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:52:50,839-Speed 2631.93 samples/sec Loss 9.6669 LearningRate 0.0502 Epoch: 5 Global Step: 241590 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:52:54,732-Speed 2631.18 samples/sec Loss 9.5931 LearningRate 0.0502 Epoch: 5 Global Step: 241600 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:52:58,626-Speed 2630.00 samples/sec Loss 9.6079 LearningRate 0.0502 Epoch: 5 Global Step: 241610 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:53:02,525-Speed 2627.07 samples/sec Loss 9.6771 LearningRate 0.0502 Epoch: 5 Global Step: 241620 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:53:06,419-Speed 2630.07 samples/sec Loss 9.5190 LearningRate 0.0502 Epoch: 5 Global Step: 241630 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:53:10,323-Speed 2623.60 samples/sec Loss 9.5424 LearningRate 0.0502 Epoch: 5 Global Step: 241640 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:53:14,218-Speed 2629.57 samples/sec Loss 9.5316 LearningRate 0.0502 Epoch: 5 Global Step: 241650 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:53:18,115-Speed 2628.20 samples/sec Loss 9.5656 LearningRate 0.0502 Epoch: 5 Global Step: 241660 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:53:22,044-Speed 2607.30 samples/sec Loss 9.5833 LearningRate 0.0502 Epoch: 5 Global Step: 241670 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:53:25,938-Speed 2629.83 samples/sec Loss 9.7235 LearningRate 0.0502 Epoch: 5 Global Step: 241680 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:53:29,831-Speed 2631.45 samples/sec Loss 9.6158 LearningRate 0.0502 Epoch: 5 Global Step: 241690 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:53:33,741-Speed 2619.12 samples/sec Loss 9.6054 LearningRate 0.0502 Epoch: 5 Global Step: 241700 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:53:37,638-Speed 2628.23 samples/sec Loss 9.5648 LearningRate 0.0502 Epoch: 5 Global Step: 241710 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:53:41,534-Speed 2628.82 samples/sec Loss 9.5490 LearningRate 0.0502 Epoch: 5 Global Step: 241720 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:53:45,433-Speed 2627.19 samples/sec Loss 9.6323 LearningRate 0.0502 Epoch: 5 Global Step: 241730 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:53:49,335-Speed 2624.37 samples/sec Loss 9.7011 LearningRate 0.0502 Epoch: 5 Global Step: 241740 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:53:53,226-Speed 2632.77 samples/sec Loss 9.5842 LearningRate 0.0502 Epoch: 5 Global Step: 241750 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:53:57,121-Speed 2629.72 samples/sec Loss 9.6947 LearningRate 0.0502 Epoch: 5 Global Step: 241760 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:54:01,015-Speed 2630.65 samples/sec Loss 9.4814 LearningRate 0.0502 Epoch: 5 Global Step: 241770 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:54:04,909-Speed 2630.22 samples/sec Loss 9.6277 LearningRate 0.0502 Epoch: 5 Global Step: 241780 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:54:08,802-Speed 2630.96 samples/sec Loss 9.6722 LearningRate 0.0502 Epoch: 5 Global Step: 241790 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:54:12,694-Speed 2631.01 samples/sec Loss 9.3911 LearningRate 0.0502 Epoch: 5 Global Step: 241800 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:54:16,597-Speed 2624.83 samples/sec Loss 9.5473 LearningRate 0.0502 Epoch: 5 Global Step: 241810 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:54:20,500-Speed 2623.53 samples/sec Loss 9.6183 LearningRate 0.0502 Epoch: 5 Global Step: 241820 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:54:24,398-Speed 2627.94 samples/sec Loss 9.6044 LearningRate 0.0502 Epoch: 5 Global Step: 241830 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:54:28,294-Speed 2628.56 samples/sec Loss 9.5543 LearningRate 0.0502 Epoch: 5 Global Step: 241840 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:54:32,192-Speed 2628.28 samples/sec Loss 9.5905 LearningRate 0.0502 Epoch: 5 Global Step: 241850 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:54:36,090-Speed 2627.34 samples/sec Loss 9.5688 LearningRate 0.0502 Epoch: 5 Global Step: 241860 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:54:39,977-Speed 2635.55 samples/sec Loss 9.4925 LearningRate 0.0502 Epoch: 5 Global Step: 241870 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:54:43,870-Speed 2630.91 samples/sec Loss 9.4454 LearningRate 0.0502 Epoch: 5 Global Step: 241880 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:54:47,766-Speed 2628.84 samples/sec Loss 9.5981 LearningRate 0.0502 Epoch: 5 Global Step: 241890 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:54:51,665-Speed 2626.90 samples/sec Loss 9.5134 LearningRate 0.0502 Epoch: 5 Global Step: 241900 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:54:55,560-Speed 2629.24 samples/sec Loss 9.4869 LearningRate 0.0502 Epoch: 5 Global Step: 241910 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:54:59,465-Speed 2623.04 samples/sec Loss 9.6220 LearningRate 0.0502 Epoch: 5 Global Step: 241920 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:55:03,347-Speed 2638.23 samples/sec Loss 9.4374 LearningRate 0.0502 Epoch: 5 Global Step: 241930 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:55:07,249-Speed 2625.11 samples/sec Loss 9.5672 LearningRate 0.0502 Epoch: 5 Global Step: 241940 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:55:11,120-Speed 2645.54 samples/sec Loss 9.5963 LearningRate 0.0502 Epoch: 5 Global Step: 241950 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:55:15,149-Speed 2542.66 samples/sec Loss 11.4892 LearningRate 0.0502 Epoch: 5 Global Step: 241960 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:55:19,045-Speed 2629.14 samples/sec Loss 10.7380 LearningRate 0.0502 Epoch: 5 Global Step: 241970 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:55:22,940-Speed 2629.65 samples/sec Loss 10.0265 LearningRate 0.0502 Epoch: 5 Global Step: 241980 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:55:26,834-Speed 2629.75 samples/sec Loss 9.7536 LearningRate 0.0502 Epoch: 5 Global Step: 241990 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:55:30,734-Speed 2626.80 samples/sec Loss 9.6523 LearningRate 0.0502 Epoch: 5 Global Step: 242000 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:55:34,632-Speed 2627.12 samples/sec Loss 9.6295 LearningRate 0.0502 Epoch: 5 Global Step: 242010 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:55:38,534-Speed 2625.17 samples/sec Loss 9.6919 LearningRate 0.0502 Epoch: 5 Global Step: 242020 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:55:42,442-Speed 2620.20 samples/sec Loss 9.6211 LearningRate 0.0502 Epoch: 5 Global Step: 242030 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:55:46,343-Speed 2625.96 samples/sec Loss 9.6253 LearningRate 0.0502 Epoch: 5 Global Step: 242040 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:55:50,236-Speed 2630.88 samples/sec Loss 9.6718 LearningRate 0.0502 Epoch: 5 Global Step: 242050 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 22:55:54,131-Speed 2629.47 samples/sec Loss 9.7397 LearningRate 0.0502 Epoch: 5 Global Step: 242060 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:55:58,031-Speed 2626.76 samples/sec Loss 9.6491 LearningRate 0.0502 Epoch: 5 Global Step: 242070 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:56:01,920-Speed 2633.53 samples/sec Loss 9.6305 LearningRate 0.0502 Epoch: 5 Global Step: 242080 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:56:05,807-Speed 2635.08 samples/sec Loss 9.6038 LearningRate 0.0502 Epoch: 5 Global Step: 242090 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:56:09,704-Speed 2628.24 samples/sec Loss 9.6552 LearningRate 0.0501 Epoch: 5 Global Step: 242100 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:56:13,595-Speed 2632.17 samples/sec Loss 9.5818 LearningRate 0.0501 Epoch: 5 Global Step: 242110 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:56:17,495-Speed 2626.01 samples/sec Loss 9.6714 LearningRate 0.0501 Epoch: 5 Global Step: 242120 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:56:21,391-Speed 2629.11 samples/sec Loss 9.5844 LearningRate 0.0501 Epoch: 5 Global Step: 242130 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:56:25,283-Speed 2631.11 samples/sec Loss 9.4961 LearningRate 0.0501 Epoch: 5 Global Step: 242140 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:56:29,179-Speed 2629.18 samples/sec Loss 9.5645 LearningRate 0.0501 Epoch: 5 Global Step: 242150 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 22:56:33,079-Speed 2626.83 samples/sec Loss 9.6289 LearningRate 0.0501 Epoch: 5 Global Step: 242160 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:56:36,981-Speed 2624.65 samples/sec Loss 9.7160 LearningRate 0.0501 Epoch: 5 Global Step: 242170 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:56:40,878-Speed 2628.05 samples/sec Loss 9.6856 LearningRate 0.0501 Epoch: 5 Global Step: 242180 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:56:44,767-Speed 2633.52 samples/sec Loss 9.7048 LearningRate 0.0501 Epoch: 5 Global Step: 242190 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:56:48,662-Speed 2630.28 samples/sec Loss 9.7238 LearningRate 0.0501 Epoch: 5 Global Step: 242200 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:56:52,552-Speed 2632.66 samples/sec Loss 9.5040 LearningRate 0.0501 Epoch: 5 Global Step: 242210 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:56:56,445-Speed 2630.89 samples/sec Loss 9.7094 LearningRate 0.0501 Epoch: 5 Global Step: 242220 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:57:00,339-Speed 2630.58 samples/sec Loss 9.5797 LearningRate 0.0501 Epoch: 5 Global Step: 242230 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:57:04,236-Speed 2628.32 samples/sec Loss 9.7126 LearningRate 0.0501 Epoch: 5 Global Step: 242240 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:57:08,132-Speed 2628.75 samples/sec Loss 9.6384 LearningRate 0.0501 Epoch: 5 Global Step: 242250 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 22:57:12,024-Speed 2631.85 samples/sec Loss 9.6486 LearningRate 0.0501 Epoch: 5 Global Step: 242260 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:57:15,917-Speed 2630.90 samples/sec Loss 9.7077 LearningRate 0.0501 Epoch: 5 Global Step: 242270 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:57:19,817-Speed 2626.09 samples/sec Loss 9.7032 LearningRate 0.0501 Epoch: 5 Global Step: 242280 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:57:23,722-Speed 2623.14 samples/sec Loss 9.6351 LearningRate 0.0501 Epoch: 5 Global Step: 242290 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:57:27,614-Speed 2631.84 samples/sec Loss 9.5485 LearningRate 0.0501 Epoch: 5 Global Step: 242300 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:57:31,506-Speed 2631.39 samples/sec Loss 9.6364 LearningRate 0.0501 Epoch: 5 Global Step: 242310 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:57:35,436-Speed 2605.53 samples/sec Loss 9.6132 LearningRate 0.0501 Epoch: 5 Global Step: 242320 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:57:39,416-Speed 2573.41 samples/sec Loss 9.7375 LearningRate 0.0501 Epoch: 5 Global Step: 242330 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:57:43,315-Speed 2627.16 samples/sec Loss 9.5880 LearningRate 0.0501 Epoch: 5 Global Step: 242340 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:57:47,221-Speed 2622.52 samples/sec Loss 9.6511 LearningRate 0.0501 Epoch: 5 Global Step: 242350 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 22:57:51,238-Speed 2550.15 samples/sec Loss 9.5500 LearningRate 0.0501 Epoch: 5 Global Step: 242360 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:57:55,212-Speed 2577.17 samples/sec Loss 9.6620 LearningRate 0.0501 Epoch: 5 Global Step: 242370 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:57:59,105-Speed 2631.00 samples/sec Loss 9.5893 LearningRate 0.0501 Epoch: 5 Global Step: 242380 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:58:02,997-Speed 2631.34 samples/sec Loss 9.6520 LearningRate 0.0501 Epoch: 5 Global Step: 242390 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:58:06,895-Speed 2627.27 samples/sec Loss 9.5425 LearningRate 0.0501 Epoch: 5 Global Step: 242400 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:58:10,791-Speed 2629.34 samples/sec Loss 9.5294 LearningRate 0.0501 Epoch: 5 Global Step: 242410 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:58:14,683-Speed 2631.65 samples/sec Loss 9.5137 LearningRate 0.0501 Epoch: 5 Global Step: 242420 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:58:18,580-Speed 2628.44 samples/sec Loss 9.5825 LearningRate 0.0501 Epoch: 5 Global Step: 242430 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:58:22,482-Speed 2624.58 samples/sec Loss 9.6119 LearningRate 0.0501 Epoch: 5 Global Step: 242440 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:58:26,373-Speed 2632.56 samples/sec Loss 9.5348 LearningRate 0.0501 Epoch: 5 Global Step: 242450 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:58:30,266-Speed 2630.90 samples/sec Loss 9.6036 LearningRate 0.0501 Epoch: 5 Global Step: 242460 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:58:34,159-Speed 2630.86 samples/sec Loss 9.4674 LearningRate 0.0501 Epoch: 5 Global Step: 242470 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:58:38,034-Speed 2642.99 samples/sec Loss 9.5108 LearningRate 0.0501 Epoch: 5 Global Step: 242480 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:58:41,924-Speed 2633.35 samples/sec Loss 9.4541 LearningRate 0.0501 Epoch: 5 Global Step: 242490 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:58:45,817-Speed 2630.93 samples/sec Loss 9.5117 LearningRate 0.0501 Epoch: 5 Global Step: 242500 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:58:49,813-Speed 2563.02 samples/sec Loss 9.5961 LearningRate 0.0501 Epoch: 5 Global Step: 242510 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:58:53,705-Speed 2631.51 samples/sec Loss 9.6013 LearningRate 0.0501 Epoch: 5 Global Step: 242520 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:58:57,595-Speed 2632.92 samples/sec Loss 9.6053 LearningRate 0.0501 Epoch: 5 Global Step: 242530 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:59:01,490-Speed 2630.02 samples/sec Loss 9.6510 LearningRate 0.0501 Epoch: 5 Global Step: 242540 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:59:05,381-Speed 2632.56 samples/sec Loss 9.6339 LearningRate 0.0501 Epoch: 5 Global Step: 242550 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:59:09,277-Speed 2628.32 samples/sec Loss 9.5529 LearningRate 0.0501 Epoch: 5 Global Step: 242560 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:59:13,170-Speed 2631.58 samples/sec Loss 9.5424 LearningRate 0.0501 Epoch: 5 Global Step: 242570 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:59:17,065-Speed 2630.09 samples/sec Loss 9.5287 LearningRate 0.0501 Epoch: 5 Global Step: 242580 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:59:20,961-Speed 2629.23 samples/sec Loss 9.5470 LearningRate 0.0501 Epoch: 5 Global Step: 242590 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:59:24,852-Speed 2632.10 samples/sec Loss 9.4078 LearningRate 0.0501 Epoch: 5 Global Step: 242600 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:59:28,744-Speed 2631.84 samples/sec Loss 9.6398 LearningRate 0.0501 Epoch: 5 Global Step: 242610 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 22:59:32,618-Speed 2643.33 samples/sec Loss 9.5391 LearningRate 0.0501 Epoch: 5 Global Step: 242620 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:59:36,509-Speed 2632.37 samples/sec Loss 9.5516 LearningRate 0.0501 Epoch: 5 Global Step: 242630 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:59:40,402-Speed 2631.03 samples/sec Loss 9.6545 LearningRate 0.0501 Epoch: 5 Global Step: 242640 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:59:44,300-Speed 2628.22 samples/sec Loss 9.4493 LearningRate 0.0501 Epoch: 5 Global Step: 242650 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:59:48,191-Speed 2632.62 samples/sec Loss 9.5623 LearningRate 0.0501 Epoch: 5 Global Step: 242660 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:59:52,086-Speed 2629.44 samples/sec Loss 9.4283 LearningRate 0.0501 Epoch: 5 Global Step: 242670 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:59:55,979-Speed 2631.33 samples/sec Loss 9.4749 LearningRate 0.0501 Epoch: 5 Global Step: 242680 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 22:59:59,868-Speed 2633.50 samples/sec Loss 9.5422 LearningRate 0.0500 Epoch: 5 Global Step: 242690 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:00:03,768-Speed 2625.55 samples/sec Loss 9.5512 LearningRate 0.0500 Epoch: 5 Global Step: 242700 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:00:07,686-Speed 2614.61 samples/sec Loss 9.5925 LearningRate 0.0500 Epoch: 5 Global Step: 242710 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:00:11,583-Speed 2627.93 samples/sec Loss 9.5093 LearningRate 0.0500 Epoch: 5 Global Step: 242720 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:00:15,476-Speed 2630.66 samples/sec Loss 9.6019 LearningRate 0.0500 Epoch: 5 Global Step: 242730 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:00:19,353-Speed 2641.92 samples/sec Loss 9.5133 LearningRate 0.0500 Epoch: 5 Global Step: 242740 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:00:23,253-Speed 2626.70 samples/sec Loss 9.5171 LearningRate 0.0500 Epoch: 5 Global Step: 242750 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:00:27,146-Speed 2630.93 samples/sec Loss 9.5982 LearningRate 0.0500 Epoch: 5 Global Step: 242760 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:00:31,044-Speed 2627.34 samples/sec Loss 9.6362 LearningRate 0.0500 Epoch: 5 Global Step: 242770 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:00:34,940-Speed 2628.81 samples/sec Loss 9.6093 LearningRate 0.0500 Epoch: 5 Global Step: 242780 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:00:38,835-Speed 2629.94 samples/sec Loss 9.6093 LearningRate 0.0500 Epoch: 5 Global Step: 242790 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:00:42,729-Speed 2630.71 samples/sec Loss 9.5227 LearningRate 0.0500 Epoch: 5 Global Step: 242800 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:00:46,622-Speed 2630.77 samples/sec Loss 9.4961 LearningRate 0.0500 Epoch: 5 Global Step: 242810 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:00:50,545-Speed 2610.65 samples/sec Loss 9.5147 LearningRate 0.0500 Epoch: 5 Global Step: 242820 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:00:54,438-Speed 2631.03 samples/sec Loss 9.6220 LearningRate 0.0500 Epoch: 5 Global Step: 242830 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:00:58,329-Speed 2632.05 samples/sec Loss 9.4557 LearningRate 0.0500 Epoch: 5 Global Step: 242840 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:01:02,223-Speed 2630.68 samples/sec Loss 9.5262 LearningRate 0.0500 Epoch: 5 Global Step: 242850 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:01:06,115-Speed 2631.84 samples/sec Loss 9.5740 LearningRate 0.0500 Epoch: 5 Global Step: 242860 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:01:10,003-Speed 2634.32 samples/sec Loss 9.5194 LearningRate 0.0500 Epoch: 5 Global Step: 242870 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:01:13,923-Speed 2612.75 samples/sec Loss 9.6196 LearningRate 0.0500 Epoch: 5 Global Step: 242880 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:01:17,841-Speed 2614.09 samples/sec Loss 9.6638 LearningRate 0.0500 Epoch: 5 Global Step: 242890 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:01:21,737-Speed 2629.06 samples/sec Loss 9.6983 LearningRate 0.0500 Epoch: 5 Global Step: 242900 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:01:25,633-Speed 2628.53 samples/sec Loss 9.5624 LearningRate 0.0500 Epoch: 5 Global Step: 242910 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:01:29,534-Speed 2625.62 samples/sec Loss 9.5927 LearningRate 0.0500 Epoch: 5 Global Step: 242920 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:01:33,431-Speed 2628.28 samples/sec Loss 9.5097 LearningRate 0.0500 Epoch: 5 Global Step: 242930 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:01:37,329-Speed 2627.77 samples/sec Loss 9.5724 LearningRate 0.0500 Epoch: 5 Global Step: 242940 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:01:41,222-Speed 2631.10 samples/sec Loss 9.5709 LearningRate 0.0500 Epoch: 5 Global Step: 242950 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:01:45,115-Speed 2631.16 samples/sec Loss 9.5346 LearningRate 0.0500 Epoch: 5 Global Step: 242960 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:01:49,007-Speed 2631.57 samples/sec Loss 9.5385 LearningRate 0.0500 Epoch: 5 Global Step: 242970 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:01:52,911-Speed 2623.46 samples/sec Loss 9.5208 LearningRate 0.0500 Epoch: 5 Global Step: 242980 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:01:56,804-Speed 2630.97 samples/sec Loss 9.6250 LearningRate 0.0500 Epoch: 5 Global Step: 242990 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:02:00,703-Speed 2626.48 samples/sec Loss 9.6781 LearningRate 0.0500 Epoch: 5 Global Step: 243000 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:02:04,604-Speed 2625.73 samples/sec Loss 9.4186 LearningRate 0.0500 Epoch: 5 Global Step: 243010 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:02:08,507-Speed 2624.36 samples/sec Loss 9.4201 LearningRate 0.0500 Epoch: 5 Global Step: 243020 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:02:12,391-Speed 2637.30 samples/sec Loss 9.5773 LearningRate 0.0500 Epoch: 5 Global Step: 243030 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:02:16,282-Speed 2632.12 samples/sec Loss 9.5785 LearningRate 0.0500 Epoch: 5 Global Step: 243040 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:02:20,174-Speed 2632.11 samples/sec Loss 9.5238 LearningRate 0.0500 Epoch: 5 Global Step: 243050 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:02:24,064-Speed 2632.80 samples/sec Loss 9.4708 LearningRate 0.0500 Epoch: 5 Global Step: 243060 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:02:27,958-Speed 2630.06 samples/sec Loss 9.6038 LearningRate 0.0500 Epoch: 5 Global Step: 243070 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:02:31,859-Speed 2625.30 samples/sec Loss 9.5677 LearningRate 0.0500 Epoch: 5 Global Step: 243080 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:02:35,767-Speed 2620.80 samples/sec Loss 9.6036 LearningRate 0.0500 Epoch: 5 Global Step: 243090 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:02:39,675-Speed 2620.93 samples/sec Loss 9.5077 LearningRate 0.0500 Epoch: 5 Global Step: 243100 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:02:43,577-Speed 2625.08 samples/sec Loss 9.4909 LearningRate 0.0500 Epoch: 5 Global Step: 243110 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:02:47,473-Speed 2628.68 samples/sec Loss 9.5404 LearningRate 0.0500 Epoch: 5 Global Step: 243120 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:02:51,365-Speed 2631.92 samples/sec Loss 9.6138 LearningRate 0.0500 Epoch: 5 Global Step: 243130 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:02:55,251-Speed 2635.96 samples/sec Loss 9.6331 LearningRate 0.0500 Epoch: 5 Global Step: 243140 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:02:59,144-Speed 2630.89 samples/sec Loss 9.6430 LearningRate 0.0500 Epoch: 5 Global Step: 243150 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:03:03,036-Speed 2631.18 samples/sec Loss 9.4864 LearningRate 0.0500 Epoch: 5 Global Step: 243160 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:03:06,929-Speed 2631.02 samples/sec Loss 9.5007 LearningRate 0.0500 Epoch: 5 Global Step: 243170 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:03:10,827-Speed 2627.39 samples/sec Loss 9.7045 LearningRate 0.0500 Epoch: 5 Global Step: 243180 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:03:14,720-Speed 2630.80 samples/sec Loss 9.6017 LearningRate 0.0500 Epoch: 5 Global Step: 243190 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:03:18,614-Speed 2631.27 samples/sec Loss 9.4976 LearningRate 0.0500 Epoch: 5 Global Step: 243200 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:03:22,507-Speed 2630.45 samples/sec Loss 9.5074 LearningRate 0.0500 Epoch: 5 Global Step: 243210 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:03:26,395-Speed 2634.42 samples/sec Loss 9.3759 LearningRate 0.0500 Epoch: 5 Global Step: 243220 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:03:30,299-Speed 2623.97 samples/sec Loss 9.5247 LearningRate 0.0500 Epoch: 5 Global Step: 243230 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:03:34,191-Speed 2631.44 samples/sec Loss 9.5673 LearningRate 0.0500 Epoch: 5 Global Step: 243240 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:03:38,066-Speed 2643.33 samples/sec Loss 9.4863 LearningRate 0.0500 Epoch: 5 Global Step: 243250 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:03:41,959-Speed 2630.73 samples/sec Loss 9.5296 LearningRate 0.0500 Epoch: 5 Global Step: 243260 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:03:45,866-Speed 2621.81 samples/sec Loss 9.6507 LearningRate 0.0500 Epoch: 5 Global Step: 243270 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:03:49,762-Speed 2628.98 samples/sec Loss 9.6225 LearningRate 0.0499 Epoch: 5 Global Step: 243280 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:03:53,667-Speed 2622.72 samples/sec Loss 9.3916 LearningRate 0.0499 Epoch: 5 Global Step: 243290 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:03:57,560-Speed 2630.86 samples/sec Loss 9.6218 LearningRate 0.0499 Epoch: 5 Global Step: 243300 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:04:01,453-Speed 2630.86 samples/sec Loss 9.6018 LearningRate 0.0499 Epoch: 5 Global Step: 243310 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:04:05,423-Speed 2579.78 samples/sec Loss 9.4607 LearningRate 0.0499 Epoch: 5 Global Step: 243320 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:04:09,317-Speed 2630.54 samples/sec Loss 9.4764 LearningRate 0.0499 Epoch: 5 Global Step: 243330 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:04:13,208-Speed 2632.61 samples/sec Loss 9.4637 LearningRate 0.0499 Epoch: 5 Global Step: 243340 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:04:17,099-Speed 2632.23 samples/sec Loss 9.3648 LearningRate 0.0499 Epoch: 5 Global Step: 243350 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:04:21,009-Speed 2619.74 samples/sec Loss 9.5423 LearningRate 0.0499 Epoch: 5 Global Step: 243360 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:04:24,899-Speed 2632.54 samples/sec Loss 9.6360 LearningRate 0.0499 Epoch: 5 Global Step: 243370 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:04:28,786-Speed 2635.41 samples/sec Loss 9.5939 LearningRate 0.0499 Epoch: 5 Global Step: 243380 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:04:32,676-Speed 2632.93 samples/sec Loss 9.5576 LearningRate 0.0499 Epoch: 5 Global Step: 243390 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:04:36,571-Speed 2629.47 samples/sec Loss 9.5428 LearningRate 0.0499 Epoch: 5 Global Step: 243400 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:04:40,460-Speed 2633.25 samples/sec Loss 9.5032 LearningRate 0.0499 Epoch: 5 Global Step: 243410 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:04:44,354-Speed 2630.54 samples/sec Loss 9.4540 LearningRate 0.0499 Epoch: 5 Global Step: 243420 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:04:48,262-Speed 2620.83 samples/sec Loss 9.5002 LearningRate 0.0499 Epoch: 5 Global Step: 243430 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:04:52,166-Speed 2623.93 samples/sec Loss 9.5840 LearningRate 0.0499 Epoch: 5 Global Step: 243440 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:04:56,069-Speed 2623.99 samples/sec Loss 9.4318 LearningRate 0.0499 Epoch: 5 Global Step: 243450 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:04:59,968-Speed 2626.87 samples/sec Loss 9.5688 LearningRate 0.0499 Epoch: 5 Global Step: 243460 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:05:03,863-Speed 2629.58 samples/sec Loss 9.5579 LearningRate 0.0499 Epoch: 5 Global Step: 243470 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:05:07,768-Speed 2622.72 samples/sec Loss 9.5493 LearningRate 0.0499 Epoch: 5 Global Step: 243480 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:05:11,668-Speed 2626.68 samples/sec Loss 9.5798 LearningRate 0.0499 Epoch: 5 Global Step: 243490 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:05:15,563-Speed 2629.51 samples/sec Loss 9.4727 LearningRate 0.0499 Epoch: 5 Global Step: 243500 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:05:19,458-Speed 2629.39 samples/sec Loss 9.4792 LearningRate 0.0499 Epoch: 5 Global Step: 243510 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:05:23,354-Speed 2628.73 samples/sec Loss 9.4273 LearningRate 0.0499 Epoch: 5 Global Step: 243520 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:05:27,256-Speed 2625.34 samples/sec Loss 9.5826 LearningRate 0.0499 Epoch: 5 Global Step: 243530 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:05:31,158-Speed 2624.47 samples/sec Loss 9.5092 LearningRate 0.0499 Epoch: 5 Global Step: 243540 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:05:35,054-Speed 2629.15 samples/sec Loss 9.5678 LearningRate 0.0499 Epoch: 5 Global Step: 243550 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:05:38,952-Speed 2627.22 samples/sec Loss 9.5847 LearningRate 0.0499 Epoch: 5 Global Step: 243560 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:05:42,853-Speed 2625.82 samples/sec Loss 9.4845 LearningRate 0.0499 Epoch: 5 Global Step: 243570 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:05:46,752-Speed 2626.75 samples/sec Loss 9.5772 LearningRate 0.0499 Epoch: 5 Global Step: 243580 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:05:50,644-Speed 2631.95 samples/sec Loss 9.4587 LearningRate 0.0499 Epoch: 5 Global Step: 243590 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:05:54,545-Speed 2625.39 samples/sec Loss 9.5407 LearningRate 0.0499 Epoch: 5 Global Step: 243600 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:05:58,443-Speed 2627.56 samples/sec Loss 9.5189 LearningRate 0.0499 Epoch: 5 Global Step: 243610 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:06:02,351-Speed 2620.87 samples/sec Loss 9.5395 LearningRate 0.0499 Epoch: 5 Global Step: 243620 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:06:06,247-Speed 2628.87 samples/sec Loss 9.5943 LearningRate 0.0499 Epoch: 5 Global Step: 243630 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:06:10,142-Speed 2629.55 samples/sec Loss 9.4794 LearningRate 0.0499 Epoch: 5 Global Step: 243640 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:06:14,035-Speed 2631.57 samples/sec Loss 9.5402 LearningRate 0.0499 Epoch: 5 Global Step: 243650 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:06:17,935-Speed 2626.72 samples/sec Loss 9.5519 LearningRate 0.0499 Epoch: 5 Global Step: 243660 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:06:21,828-Speed 2630.42 samples/sec Loss 9.6836 LearningRate 0.0499 Epoch: 5 Global Step: 243670 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:06:25,902-Speed 2514.16 samples/sec Loss 9.6297 LearningRate 0.0499 Epoch: 5 Global Step: 243680 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:06:29,809-Speed 2621.31 samples/sec Loss 9.4276 LearningRate 0.0499 Epoch: 5 Global Step: 243690 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:06:33,713-Speed 2624.05 samples/sec Loss 9.6045 LearningRate 0.0499 Epoch: 5 Global Step: 243700 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:06:37,703-Speed 2566.84 samples/sec Loss 9.4886 LearningRate 0.0499 Epoch: 5 Global Step: 243710 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:06:41,628-Speed 2609.25 samples/sec Loss 9.5183 LearningRate 0.0499 Epoch: 5 Global Step: 243720 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:06:45,528-Speed 2626.99 samples/sec Loss 9.6091 LearningRate 0.0499 Epoch: 5 Global Step: 243730 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:06:49,424-Speed 2628.36 samples/sec Loss 9.5656 LearningRate 0.0499 Epoch: 5 Global Step: 243740 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:06:53,340-Speed 2616.11 samples/sec Loss 9.4950 LearningRate 0.0499 Epoch: 5 Global Step: 243750 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:06:57,240-Speed 2625.44 samples/sec Loss 9.6014 LearningRate 0.0499 Epoch: 5 Global Step: 243760 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:07:01,135-Speed 2629.99 samples/sec Loss 9.6773 LearningRate 0.0499 Epoch: 5 Global Step: 243770 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:07:05,033-Speed 2627.10 samples/sec Loss 9.5384 LearningRate 0.0499 Epoch: 5 Global Step: 243780 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:07:08,939-Speed 2622.99 samples/sec Loss 9.6108 LearningRate 0.0499 Epoch: 5 Global Step: 243790 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:07:12,843-Speed 2623.69 samples/sec Loss 9.5664 LearningRate 0.0499 Epoch: 5 Global Step: 243800 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:07:16,746-Speed 2623.96 samples/sec Loss 9.5914 LearningRate 0.0499 Epoch: 5 Global Step: 243810 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:07:20,648-Speed 2625.12 samples/sec Loss 9.5369 LearningRate 0.0499 Epoch: 5 Global Step: 243820 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:07:25,057-Speed 2323.12 samples/sec Loss 9.6036 LearningRate 0.0499 Epoch: 5 Global Step: 243830 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:07:28,949-Speed 2631.92 samples/sec Loss 9.4308 LearningRate 0.0499 Epoch: 5 Global Step: 243840 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:07:32,846-Speed 2628.40 samples/sec Loss 9.4898 LearningRate 0.0499 Epoch: 5 Global Step: 243850 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:07:36,740-Speed 2630.12 samples/sec Loss 9.5318 LearningRate 0.0498 Epoch: 5 Global Step: 243860 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:07:40,646-Speed 2622.03 samples/sec Loss 9.5273 LearningRate 0.0498 Epoch: 5 Global Step: 243870 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:07:44,535-Speed 2633.40 samples/sec Loss 9.4509 LearningRate 0.0498 Epoch: 5 Global Step: 243880 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:07:48,439-Speed 2623.43 samples/sec Loss 9.4978 LearningRate 0.0498 Epoch: 5 Global Step: 243890 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:07:52,361-Speed 2611.53 samples/sec Loss 9.4145 LearningRate 0.0498 Epoch: 5 Global Step: 243900 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:07:56,264-Speed 2624.00 samples/sec Loss 9.4847 LearningRate 0.0498 Epoch: 5 Global Step: 243910 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:08:00,176-Speed 2618.52 samples/sec Loss 9.3742 LearningRate 0.0498 Epoch: 5 Global Step: 243920 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:08:04,091-Speed 2616.14 samples/sec Loss 9.5231 LearningRate 0.0498 Epoch: 5 Global Step: 243930 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:08:07,984-Speed 2631.39 samples/sec Loss 9.5326 LearningRate 0.0498 Epoch: 5 Global Step: 243940 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:08:12,051-Speed 2518.18 samples/sec Loss 9.4407 LearningRate 0.0498 Epoch: 5 Global Step: 243950 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:08:16,145-Speed 2502.04 samples/sec Loss 9.4794 LearningRate 0.0498 Epoch: 5 Global Step: 243960 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:08:20,059-Speed 2616.63 samples/sec Loss 9.5419 LearningRate 0.0498 Epoch: 5 Global Step: 243970 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:08:23,959-Speed 2626.37 samples/sec Loss 9.5483 LearningRate 0.0498 Epoch: 5 Global Step: 243980 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:08:27,864-Speed 2623.12 samples/sec Loss 9.4537 LearningRate 0.0498 Epoch: 5 Global Step: 243990 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:08:31,773-Speed 2620.25 samples/sec Loss 9.5237 LearningRate 0.0498 Epoch: 5 Global Step: 244000 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:08:35,788-Speed 2550.85 samples/sec Loss 9.5870 LearningRate 0.0498 Epoch: 5 Global Step: 244010 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:08:39,699-Speed 2618.79 samples/sec Loss 9.6192 LearningRate 0.0498 Epoch: 5 Global Step: 244020 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:08:43,608-Speed 2620.35 samples/sec Loss 9.4953 LearningRate 0.0498 Epoch: 5 Global Step: 244030 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:08:47,502-Speed 2630.65 samples/sec Loss 9.5935 LearningRate 0.0498 Epoch: 5 Global Step: 244040 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:08:51,400-Speed 2627.23 samples/sec Loss 9.4358 LearningRate 0.0498 Epoch: 5 Global Step: 244050 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:08:55,297-Speed 2628.65 samples/sec Loss 9.3532 LearningRate 0.0498 Epoch: 5 Global Step: 244060 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:08:59,193-Speed 2628.83 samples/sec Loss 9.2878 LearningRate 0.0498 Epoch: 5 Global Step: 244070 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:09:03,091-Speed 2627.31 samples/sec Loss 9.5918 LearningRate 0.0498 Epoch: 5 Global Step: 244080 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:09:06,985-Speed 2630.54 samples/sec Loss 9.4669 LearningRate 0.0498 Epoch: 5 Global Step: 244090 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:09:10,880-Speed 2629.39 samples/sec Loss 9.5101 LearningRate 0.0498 Epoch: 5 Global Step: 244100 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:09:14,773-Speed 2631.53 samples/sec Loss 9.5884 LearningRate 0.0498 Epoch: 5 Global Step: 244110 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:09:18,669-Speed 2628.85 samples/sec Loss 9.5769 LearningRate 0.0498 Epoch: 5 Global Step: 244120 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:09:22,550-Speed 2639.13 samples/sec Loss 9.5879 LearningRate 0.0498 Epoch: 5 Global Step: 244130 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:09:26,445-Speed 2629.35 samples/sec Loss 9.5015 LearningRate 0.0498 Epoch: 5 Global Step: 244140 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:09:30,355-Speed 2619.52 samples/sec Loss 9.6124 LearningRate 0.0498 Epoch: 5 Global Step: 244150 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:09:34,257-Speed 2624.81 samples/sec Loss 9.5030 LearningRate 0.0498 Epoch: 5 Global Step: 244160 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:09:38,149-Speed 2631.34 samples/sec Loss 9.5326 LearningRate 0.0498 Epoch: 5 Global Step: 244170 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:09:42,040-Speed 2632.57 samples/sec Loss 9.4647 LearningRate 0.0498 Epoch: 5 Global Step: 244180 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:09:47,025-Speed 2054.36 samples/sec Loss 9.5337 LearningRate 0.0498 Epoch: 5 Global Step: 244190 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:09:50,917-Speed 2632.00 samples/sec Loss 9.4710 LearningRate 0.0498 Epoch: 5 Global Step: 244200 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:09:54,806-Speed 2634.09 samples/sec Loss 9.5339 LearningRate 0.0498 Epoch: 5 Global Step: 244210 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:09:58,712-Speed 2622.03 samples/sec Loss 9.5856 LearningRate 0.0498 Epoch: 5 Global Step: 244220 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:10:02,608-Speed 2628.92 samples/sec Loss 9.4389 LearningRate 0.0498 Epoch: 5 Global Step: 244230 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:10:06,498-Speed 2632.54 samples/sec Loss 9.5809 LearningRate 0.0498 Epoch: 5 Global Step: 244240 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:10:10,380-Speed 2638.42 samples/sec Loss 9.6281 LearningRate 0.0498 Epoch: 5 Global Step: 244250 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:10:14,252-Speed 2645.42 samples/sec Loss 9.6348 LearningRate 0.0498 Epoch: 5 Global Step: 244260 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:10:18,154-Speed 2625.13 samples/sec Loss 9.5880 LearningRate 0.0498 Epoch: 5 Global Step: 244270 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:10:22,044-Speed 2632.61 samples/sec Loss 9.6289 LearningRate 0.0498 Epoch: 5 Global Step: 244280 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:10:25,886-Speed 2666.10 samples/sec Loss 10.4292 LearningRate 0.0498 Epoch: 5 Global Step: 244290 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 23:10:29,782-Speed 2629.41 samples/sec Loss 10.0445 LearningRate 0.0498 Epoch: 5 Global Step: 244300 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 23:10:33,673-Speed 2631.86 samples/sec Loss 10.2844 LearningRate 0.0498 Epoch: 5 Global Step: 244310 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 23:10:37,563-Speed 2632.86 samples/sec Loss 9.9684 LearningRate 0.0498 Epoch: 5 Global Step: 244320 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 23:10:41,454-Speed 2632.39 samples/sec Loss 9.6799 LearningRate 0.0498 Epoch: 5 Global Step: 244330 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 23:10:45,343-Speed 2633.95 samples/sec Loss 9.7636 LearningRate 0.0498 Epoch: 5 Global Step: 244340 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 23:10:49,233-Speed 2632.63 samples/sec Loss 9.5752 LearningRate 0.0498 Epoch: 5 Global Step: 244350 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 23:10:53,123-Speed 2632.70 samples/sec Loss 9.5306 LearningRate 0.0498 Epoch: 5 Global Step: 244360 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 23:10:57,066-Speed 2597.81 samples/sec Loss 9.5039 LearningRate 0.0498 Epoch: 5 Global Step: 244370 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 23:11:00,962-Speed 2628.77 samples/sec Loss 9.5097 LearningRate 0.0498 Epoch: 5 Global Step: 244380 Fp16 Grad Scale: 8192 Required: 66 hours
Training: 2022-04-13 23:11:04,855-Speed 2631.23 samples/sec Loss 9.5638 LearningRate 0.0498 Epoch: 5 Global Step: 244390 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 23:11:08,744-Speed 2633.92 samples/sec Loss 9.6094 LearningRate 0.0498 Epoch: 5 Global Step: 244400 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 23:11:12,635-Speed 2632.41 samples/sec Loss 9.5353 LearningRate 0.0498 Epoch: 5 Global Step: 244410 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 23:11:16,534-Speed 2626.55 samples/sec Loss 9.5719 LearningRate 0.0498 Epoch: 5 Global Step: 244420 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 23:11:20,442-Speed 2621.20 samples/sec Loss 9.5188 LearningRate 0.0498 Epoch: 5 Global Step: 244430 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 23:11:24,335-Speed 2631.03 samples/sec Loss 9.5126 LearningRate 0.0498 Epoch: 5 Global Step: 244440 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 23:11:28,225-Speed 2632.78 samples/sec Loss 9.6500 LearningRate 0.0497 Epoch: 5 Global Step: 244450 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 23:11:32,117-Speed 2631.61 samples/sec Loss 9.5831 LearningRate 0.0497 Epoch: 5 Global Step: 244460 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 23:11:36,033-Speed 2615.49 samples/sec Loss 9.5298 LearningRate 0.0497 Epoch: 5 Global Step: 244470 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 23:11:39,926-Speed 2630.79 samples/sec Loss 9.4112 LearningRate 0.0497 Epoch: 5 Global Step: 244480 Fp16 Grad Scale: 16384 Required: 66 hours
Training: 2022-04-13 23:11:43,826-Speed 2626.34 samples/sec Loss 9.5163 LearningRate 0.0497 Epoch: 5 Global Step: 244490 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:11:47,718-Speed 2631.57 samples/sec Loss 9.4381 LearningRate 0.0497 Epoch: 5 Global Step: 244500 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:11:51,615-Speed 2628.45 samples/sec Loss 9.4471 LearningRate 0.0497 Epoch: 5 Global Step: 244510 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:11:55,509-Speed 2630.18 samples/sec Loss 9.5059 LearningRate 0.0497 Epoch: 5 Global Step: 244520 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:11:59,400-Speed 2632.75 samples/sec Loss 9.6157 LearningRate 0.0497 Epoch: 5 Global Step: 244530 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:12:03,302-Speed 2624.32 samples/sec Loss 9.5382 LearningRate 0.0497 Epoch: 5 Global Step: 244540 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:12:07,195-Speed 2630.84 samples/sec Loss 9.6516 LearningRate 0.0497 Epoch: 5 Global Step: 244550 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:12:11,087-Speed 2631.97 samples/sec Loss 9.5524 LearningRate 0.0497 Epoch: 5 Global Step: 244560 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:12:14,977-Speed 2632.86 samples/sec Loss 9.4898 LearningRate 0.0497 Epoch: 5 Global Step: 244570 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:12:18,871-Speed 2630.58 samples/sec Loss 9.6360 LearningRate 0.0497 Epoch: 5 Global Step: 244580 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:12:22,762-Speed 2632.30 samples/sec Loss 9.6281 LearningRate 0.0497 Epoch: 5 Global Step: 244590 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:12:26,661-Speed 2626.71 samples/sec Loss 9.4374 LearningRate 0.0497 Epoch: 5 Global Step: 244600 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:12:30,557-Speed 2629.06 samples/sec Loss 9.5340 LearningRate 0.0497 Epoch: 5 Global Step: 244610 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:12:34,485-Speed 2607.80 samples/sec Loss 9.5603 LearningRate 0.0497 Epoch: 5 Global Step: 244620 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:12:38,377-Speed 2631.52 samples/sec Loss 9.4732 LearningRate 0.0497 Epoch: 5 Global Step: 244630 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:12:42,269-Speed 2631.63 samples/sec Loss 9.4795 LearningRate 0.0497 Epoch: 5 Global Step: 244640 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:12:46,159-Speed 2632.60 samples/sec Loss 9.6481 LearningRate 0.0497 Epoch: 5 Global Step: 244650 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:12:50,050-Speed 2632.60 samples/sec Loss 9.6600 LearningRate 0.0497 Epoch: 5 Global Step: 244660 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:12:53,951-Speed 2625.81 samples/sec Loss 9.6427 LearningRate 0.0497 Epoch: 5 Global Step: 244670 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:12:57,839-Speed 2633.74 samples/sec Loss 9.5495 LearningRate 0.0497 Epoch: 5 Global Step: 244680 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:13:01,730-Speed 2632.52 samples/sec Loss 9.3985 LearningRate 0.0497 Epoch: 5 Global Step: 244690 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:13:05,621-Speed 2632.46 samples/sec Loss 9.4884 LearningRate 0.0497 Epoch: 5 Global Step: 244700 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:13:09,536-Speed 2616.35 samples/sec Loss 9.5271 LearningRate 0.0497 Epoch: 5 Global Step: 244710 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:13:13,431-Speed 2629.43 samples/sec Loss 9.3309 LearningRate 0.0497 Epoch: 5 Global Step: 244720 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:13:17,328-Speed 2628.49 samples/sec Loss 9.6025 LearningRate 0.0497 Epoch: 5 Global Step: 244730 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:13:21,222-Speed 2629.90 samples/sec Loss 9.5742 LearningRate 0.0497 Epoch: 5 Global Step: 244740 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:13:25,115-Speed 2631.41 samples/sec Loss 9.4002 LearningRate 0.0497 Epoch: 5 Global Step: 244750 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:13:29,019-Speed 2623.23 samples/sec Loss 9.5843 LearningRate 0.0497 Epoch: 5 Global Step: 244760 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:13:32,923-Speed 2623.40 samples/sec Loss 9.5983 LearningRate 0.0497 Epoch: 5 Global Step: 244770 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:13:36,826-Speed 2624.37 samples/sec Loss 9.5729 LearningRate 0.0497 Epoch: 5 Global Step: 244780 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:13:40,724-Speed 2627.64 samples/sec Loss 9.4820 LearningRate 0.0497 Epoch: 5 Global Step: 244790 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:13:44,604-Speed 2640.14 samples/sec Loss 9.5908 LearningRate 0.0497 Epoch: 5 Global Step: 244800 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:13:48,493-Speed 2633.64 samples/sec Loss 9.5456 LearningRate 0.0497 Epoch: 5 Global Step: 244810 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:13:52,385-Speed 2631.18 samples/sec Loss 9.3413 LearningRate 0.0497 Epoch: 5 Global Step: 244820 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:13:56,279-Speed 2630.40 samples/sec Loss 9.6042 LearningRate 0.0497 Epoch: 5 Global Step: 244830 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:14:00,172-Speed 2630.83 samples/sec Loss 9.3242 LearningRate 0.0497 Epoch: 5 Global Step: 244840 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:14:04,084-Speed 2618.39 samples/sec Loss 9.5815 LearningRate 0.0497 Epoch: 5 Global Step: 244850 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:14:07,967-Speed 2637.37 samples/sec Loss 9.4715 LearningRate 0.0497 Epoch: 5 Global Step: 244860 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:14:11,869-Speed 2625.44 samples/sec Loss 9.5135 LearningRate 0.0497 Epoch: 5 Global Step: 244870 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:14:15,766-Speed 2627.81 samples/sec Loss 9.4639 LearningRate 0.0497 Epoch: 5 Global Step: 244880 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:14:19,664-Speed 2628.06 samples/sec Loss 9.5543 LearningRate 0.0497 Epoch: 5 Global Step: 244890 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:14:23,581-Speed 2614.58 samples/sec Loss 9.5567 LearningRate 0.0497 Epoch: 5 Global Step: 244900 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:14:27,473-Speed 2632.22 samples/sec Loss 9.4505 LearningRate 0.0497 Epoch: 5 Global Step: 244910 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:14:31,360-Speed 2635.01 samples/sec Loss 9.7151 LearningRate 0.0497 Epoch: 5 Global Step: 244920 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:14:35,247-Speed 2634.62 samples/sec Loss 9.5913 LearningRate 0.0497 Epoch: 5 Global Step: 244930 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:14:39,138-Speed 2631.91 samples/sec Loss 9.5374 LearningRate 0.0497 Epoch: 5 Global Step: 244940 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:14:43,032-Speed 2630.45 samples/sec Loss 9.5653 LearningRate 0.0497 Epoch: 5 Global Step: 244950 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:14:46,927-Speed 2629.74 samples/sec Loss 9.4287 LearningRate 0.0497 Epoch: 5 Global Step: 244960 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:14:50,826-Speed 2627.11 samples/sec Loss 9.4067 LearningRate 0.0497 Epoch: 5 Global Step: 244970 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:14:54,773-Speed 2594.74 samples/sec Loss 9.6900 LearningRate 0.0497 Epoch: 5 Global Step: 244980 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:14:58,671-Speed 2628.11 samples/sec Loss 9.5027 LearningRate 0.0497 Epoch: 5 Global Step: 244990 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:15:02,565-Speed 2629.86 samples/sec Loss 9.6579 LearningRate 0.0497 Epoch: 5 Global Step: 245000 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:15:06,464-Speed 2626.61 samples/sec Loss 9.4506 LearningRate 0.0497 Epoch: 5 Global Step: 245010 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:15:10,358-Speed 2630.50 samples/sec Loss 9.5236 LearningRate 0.0497 Epoch: 5 Global Step: 245020 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:15:14,250-Speed 2631.51 samples/sec Loss 9.5722 LearningRate 0.0497 Epoch: 5 Global Step: 245030 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:15:18,151-Speed 2625.33 samples/sec Loss 9.5003 LearningRate 0.0496 Epoch: 5 Global Step: 245040 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:15:22,047-Speed 2629.30 samples/sec Loss 9.4521 LearningRate 0.0496 Epoch: 5 Global Step: 245050 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:15:25,940-Speed 2630.79 samples/sec Loss 9.4213 LearningRate 0.0496 Epoch: 5 Global Step: 245060 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:15:29,836-Speed 2629.16 samples/sec Loss 9.5540 LearningRate 0.0496 Epoch: 5 Global Step: 245070 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:15:33,732-Speed 2629.08 samples/sec Loss 9.6330 LearningRate 0.0496 Epoch: 5 Global Step: 245080 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:15:37,625-Speed 2631.08 samples/sec Loss 9.4697 LearningRate 0.0496 Epoch: 5 Global Step: 245090 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:15:41,523-Speed 2626.98 samples/sec Loss 9.4399 LearningRate 0.0496 Epoch: 5 Global Step: 245100 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:15:45,422-Speed 2627.04 samples/sec Loss 9.5548 LearningRate 0.0496 Epoch: 5 Global Step: 245110 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:15:49,321-Speed 2627.15 samples/sec Loss 9.5632 LearningRate 0.0496 Epoch: 5 Global Step: 245120 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:15:53,182-Speed 2653.00 samples/sec Loss 9.6432 LearningRate 0.0496 Epoch: 5 Global Step: 245130 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:15:57,079-Speed 2627.83 samples/sec Loss 9.7183 LearningRate 0.0496 Epoch: 5 Global Step: 245140 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:16:00,975-Speed 2628.88 samples/sec Loss 9.5990 LearningRate 0.0496 Epoch: 5 Global Step: 245150 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:16:04,872-Speed 2628.80 samples/sec Loss 9.4572 LearningRate 0.0496 Epoch: 5 Global Step: 245160 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:16:08,761-Speed 2633.31 samples/sec Loss 9.4389 LearningRate 0.0496 Epoch: 5 Global Step: 245170 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:16:12,652-Speed 2632.93 samples/sec Loss 9.6179 LearningRate 0.0496 Epoch: 5 Global Step: 245180 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:16:16,547-Speed 2629.95 samples/sec Loss 9.6185 LearningRate 0.0496 Epoch: 5 Global Step: 245190 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:16:20,438-Speed 2632.00 samples/sec Loss 9.4326 LearningRate 0.0496 Epoch: 5 Global Step: 245200 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:16:24,332-Speed 2630.43 samples/sec Loss 9.5039 LearningRate 0.0496 Epoch: 5 Global Step: 245210 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:16:28,236-Speed 2623.25 samples/sec Loss 9.6326 LearningRate 0.0496 Epoch: 5 Global Step: 245220 Fp16 Grad Scale: 32768 Required: 66 hours
Training: 2022-04-13 23:16:32,125-Speed 2633.83 samples/sec Loss 9.4809 LearningRate 0.0496 Epoch: 5 Global Step: 245230 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:16:36,023-Speed 2627.34 samples/sec Loss 9.5188 LearningRate 0.0496 Epoch: 5 Global Step: 245240 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:16:39,911-Speed 2634.26 samples/sec Loss 9.5549 LearningRate 0.0496 Epoch: 5 Global Step: 245250 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:16:43,804-Speed 2631.40 samples/sec Loss 9.4520 LearningRate 0.0496 Epoch: 5 Global Step: 245260 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:16:47,699-Speed 2629.84 samples/sec Loss 9.4896 LearningRate 0.0496 Epoch: 5 Global Step: 245270 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:16:51,593-Speed 2630.42 samples/sec Loss 9.5749 LearningRate 0.0496 Epoch: 5 Global Step: 245280 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:16:55,485-Speed 2631.16 samples/sec Loss 9.4860 LearningRate 0.0496 Epoch: 5 Global Step: 245290 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:16:59,389-Speed 2623.91 samples/sec Loss 9.4873 LearningRate 0.0496 Epoch: 5 Global Step: 245300 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:17:03,306-Speed 2614.80 samples/sec Loss 9.5904 LearningRate 0.0496 Epoch: 5 Global Step: 245310 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:17:07,201-Speed 2629.01 samples/sec Loss 9.5188 LearningRate 0.0496 Epoch: 5 Global Step: 245320 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:17:11,105-Speed 2623.85 samples/sec Loss 9.4256 LearningRate 0.0496 Epoch: 5 Global Step: 245330 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:17:14,996-Speed 2631.99 samples/sec Loss 9.5615 LearningRate 0.0496 Epoch: 5 Global Step: 245340 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:17:18,895-Speed 2627.56 samples/sec Loss 9.5036 LearningRate 0.0496 Epoch: 5 Global Step: 245350 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:17:22,795-Speed 2626.05 samples/sec Loss 9.5128 LearningRate 0.0496 Epoch: 5 Global Step: 245360 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:17:26,693-Speed 2627.72 samples/sec Loss 9.4971 LearningRate 0.0496 Epoch: 5 Global Step: 245370 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:17:30,591-Speed 2627.81 samples/sec Loss 9.5387 LearningRate 0.0496 Epoch: 5 Global Step: 245380 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:17:34,481-Speed 2632.83 samples/sec Loss 9.7222 LearningRate 0.0496 Epoch: 5 Global Step: 245390 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:17:38,379-Speed 2627.43 samples/sec Loss 9.3275 LearningRate 0.0496 Epoch: 5 Global Step: 245400 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:17:42,273-Speed 2630.37 samples/sec Loss 9.5118 LearningRate 0.0496 Epoch: 5 Global Step: 245410 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:17:46,163-Speed 2632.44 samples/sec Loss 9.4769 LearningRate 0.0496 Epoch: 5 Global Step: 245420 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:17:50,055-Speed 2632.22 samples/sec Loss 9.6651 LearningRate 0.0496 Epoch: 5 Global Step: 245430 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:17:53,927-Speed 2644.56 samples/sec Loss 9.4596 LearningRate 0.0496 Epoch: 5 Global Step: 245440 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:17:57,819-Speed 2632.21 samples/sec Loss 9.4676 LearningRate 0.0496 Epoch: 5 Global Step: 245450 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:18:01,712-Speed 2630.91 samples/sec Loss 9.4487 LearningRate 0.0496 Epoch: 5 Global Step: 245460 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:18:05,590-Speed 2641.50 samples/sec Loss 9.5453 LearningRate 0.0496 Epoch: 5 Global Step: 245470 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:18:09,483-Speed 2630.45 samples/sec Loss 9.4230 LearningRate 0.0496 Epoch: 5 Global Step: 245480 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:18:13,373-Speed 2633.53 samples/sec Loss 9.5638 LearningRate 0.0496 Epoch: 5 Global Step: 245490 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:18:17,285-Speed 2617.83 samples/sec Loss 9.5743 LearningRate 0.0496 Epoch: 5 Global Step: 245500 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:18:21,223-Speed 2601.24 samples/sec Loss 9.5884 LearningRate 0.0496 Epoch: 5 Global Step: 245510 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:18:25,117-Speed 2630.16 samples/sec Loss 9.3782 LearningRate 0.0496 Epoch: 5 Global Step: 245520 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:18:29,010-Speed 2631.26 samples/sec Loss 9.4843 LearningRate 0.0496 Epoch: 5 Global Step: 245530 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:18:32,910-Speed 2625.92 samples/sec Loss 9.6982 LearningRate 0.0496 Epoch: 5 Global Step: 245540 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:18:36,813-Speed 2624.56 samples/sec Loss 9.6400 LearningRate 0.0496 Epoch: 5 Global Step: 245550 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:18:40,708-Speed 2629.63 samples/sec Loss 9.4664 LearningRate 0.0496 Epoch: 5 Global Step: 245560 Fp16 Grad Scale: 65536 Required: 66 hours
Training: 2022-04-13 23:18:44,601-Speed 2631.19 samples/sec Loss 9.6289 LearningRate 0.0496 Epoch: 5 Global Step: 245570 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:18:48,499-Speed 2627.55 samples/sec Loss 9.4742 LearningRate 0.0496 Epoch: 5 Global Step: 245580 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:18:52,388-Speed 2633.76 samples/sec Loss 9.4050 LearningRate 0.0496 Epoch: 5 Global Step: 245590 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:18:56,283-Speed 2629.46 samples/sec Loss 9.5620 LearningRate 0.0496 Epoch: 5 Global Step: 245600 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:19:00,175-Speed 2631.10 samples/sec Loss 9.3687 LearningRate 0.0496 Epoch: 5 Global Step: 245610 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:19:04,070-Speed 2629.87 samples/sec Loss 9.4938 LearningRate 0.0496 Epoch: 5 Global Step: 245620 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:19:07,966-Speed 2629.17 samples/sec Loss 9.5758 LearningRate 0.0495 Epoch: 5 Global Step: 245630 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:19:11,861-Speed 2629.64 samples/sec Loss 9.3954 LearningRate 0.0495 Epoch: 5 Global Step: 245640 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:19:15,756-Speed 2629.62 samples/sec Loss 9.6540 LearningRate 0.0495 Epoch: 5 Global Step: 245650 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:19:19,648-Speed 2631.43 samples/sec Loss 9.5716 LearningRate 0.0495 Epoch: 5 Global Step: 245660 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:19:23,545-Speed 2628.69 samples/sec Loss 9.5915 LearningRate 0.0495 Epoch: 5 Global Step: 245670 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:19:27,447-Speed 2624.95 samples/sec Loss 9.4593 LearningRate 0.0495 Epoch: 5 Global Step: 245680 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:19:31,325-Speed 2640.84 samples/sec Loss 9.6289 LearningRate 0.0495 Epoch: 5 Global Step: 245690 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:19:35,220-Speed 2629.06 samples/sec Loss 9.5068 LearningRate 0.0495 Epoch: 5 Global Step: 245700 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:19:39,136-Speed 2615.57 samples/sec Loss 9.5931 LearningRate 0.0495 Epoch: 5 Global Step: 245710 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:19:43,026-Speed 2633.57 samples/sec Loss 9.4722 LearningRate 0.0495 Epoch: 5 Global Step: 245720 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:19:46,916-Speed 2632.98 samples/sec Loss 9.5446 LearningRate 0.0495 Epoch: 5 Global Step: 245730 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:19:50,827-Speed 2619.59 samples/sec Loss 9.6133 LearningRate 0.0495 Epoch: 5 Global Step: 245740 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:19:54,718-Speed 2632.17 samples/sec Loss 9.5165 LearningRate 0.0495 Epoch: 5 Global Step: 245750 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:19:58,606-Speed 2634.14 samples/sec Loss 9.5269 LearningRate 0.0495 Epoch: 5 Global Step: 245760 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:20:02,498-Speed 2631.65 samples/sec Loss 9.5341 LearningRate 0.0495 Epoch: 5 Global Step: 245770 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:20:06,387-Speed 2633.11 samples/sec Loss 9.6066 LearningRate 0.0495 Epoch: 5 Global Step: 245780 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:20:10,263-Speed 2642.43 samples/sec Loss 9.4708 LearningRate 0.0495 Epoch: 5 Global Step: 245790 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:20:14,165-Speed 2625.24 samples/sec Loss 9.4873 LearningRate 0.0495 Epoch: 5 Global Step: 245800 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:20:18,058-Speed 2630.59 samples/sec Loss 9.4257 LearningRate 0.0495 Epoch: 5 Global Step: 245810 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:20:21,959-Speed 2625.89 samples/sec Loss 9.5249 LearningRate 0.0495 Epoch: 5 Global Step: 245820 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:20:25,854-Speed 2629.86 samples/sec Loss 9.4715 LearningRate 0.0495 Epoch: 5 Global Step: 245830 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:20:29,757-Speed 2624.59 samples/sec Loss 9.6067 LearningRate 0.0495 Epoch: 5 Global Step: 245840 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:20:33,653-Speed 2628.87 samples/sec Loss 9.5344 LearningRate 0.0495 Epoch: 5 Global Step: 245850 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:20:37,549-Speed 2628.45 samples/sec Loss 9.5790 LearningRate 0.0495 Epoch: 5 Global Step: 245860 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:20:41,440-Speed 2632.06 samples/sec Loss 9.4988 LearningRate 0.0495 Epoch: 5 Global Step: 245870 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:20:45,348-Speed 2621.02 samples/sec Loss 9.6179 LearningRate 0.0495 Epoch: 5 Global Step: 245880 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:20:49,245-Speed 2628.78 samples/sec Loss 9.4863 LearningRate 0.0495 Epoch: 5 Global Step: 245890 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:20:53,130-Speed 2636.02 samples/sec Loss 9.4308 LearningRate 0.0495 Epoch: 5 Global Step: 245900 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:20:57,016-Speed 2636.23 samples/sec Loss 9.4794 LearningRate 0.0495 Epoch: 5 Global Step: 245910 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:21:00,909-Speed 2630.22 samples/sec Loss 9.5567 LearningRate 0.0495 Epoch: 5 Global Step: 245920 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:21:04,806-Speed 2628.63 samples/sec Loss 9.4828 LearningRate 0.0495 Epoch: 5 Global Step: 245930 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:21:08,697-Speed 2632.18 samples/sec Loss 9.5852 LearningRate 0.0495 Epoch: 5 Global Step: 245940 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:21:12,589-Speed 2632.03 samples/sec Loss 9.4544 LearningRate 0.0495 Epoch: 5 Global Step: 245950 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:21:16,483-Speed 2630.02 samples/sec Loss 9.5163 LearningRate 0.0495 Epoch: 5 Global Step: 245960 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:21:20,384-Speed 2625.83 samples/sec Loss 9.5153 LearningRate 0.0495 Epoch: 5 Global Step: 245970 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:21:24,278-Speed 2630.23 samples/sec Loss 9.5120 LearningRate 0.0495 Epoch: 5 Global Step: 245980 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:21:28,179-Speed 2626.23 samples/sec Loss 9.5111 LearningRate 0.0495 Epoch: 5 Global Step: 245990 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:21:32,075-Speed 2628.67 samples/sec Loss 9.2883 LearningRate 0.0495 Epoch: 5 Global Step: 246000 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:21:35,985-Speed 2619.23 samples/sec Loss 9.4712 LearningRate 0.0495 Epoch: 5 Global Step: 246010 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:21:39,868-Speed 2637.77 samples/sec Loss 9.4856 LearningRate 0.0495 Epoch: 5 Global Step: 246020 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:21:43,764-Speed 2628.91 samples/sec Loss 9.4596 LearningRate 0.0495 Epoch: 5 Global Step: 246030 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:21:47,655-Speed 2632.74 samples/sec Loss 9.4530 LearningRate 0.0495 Epoch: 5 Global Step: 246040 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:21:51,546-Speed 2632.37 samples/sec Loss 9.5612 LearningRate 0.0495 Epoch: 5 Global Step: 246050 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:21:55,437-Speed 2632.20 samples/sec Loss 9.6297 LearningRate 0.0495 Epoch: 5 Global Step: 246060 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:21:59,329-Speed 2631.35 samples/sec Loss 9.4544 LearningRate 0.0495 Epoch: 5 Global Step: 246070 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:22:03,223-Speed 2630.50 samples/sec Loss 9.6352 LearningRate 0.0495 Epoch: 5 Global Step: 246080 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:22:07,117-Speed 2630.28 samples/sec Loss 9.5611 LearningRate 0.0495 Epoch: 5 Global Step: 246090 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:22:11,012-Speed 2629.19 samples/sec Loss 9.5247 LearningRate 0.0495 Epoch: 5 Global Step: 246100 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:22:14,908-Speed 2628.73 samples/sec Loss 9.5208 LearningRate 0.0495 Epoch: 5 Global Step: 246110 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:22:18,839-Speed 2605.71 samples/sec Loss 9.4050 LearningRate 0.0495 Epoch: 5 Global Step: 246120 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:22:22,719-Speed 2639.86 samples/sec Loss 9.6034 LearningRate 0.0495 Epoch: 5 Global Step: 246130 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:22:26,613-Speed 2630.55 samples/sec Loss 9.5929 LearningRate 0.0495 Epoch: 5 Global Step: 246140 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:22:30,505-Speed 2631.44 samples/sec Loss 9.4005 LearningRate 0.0495 Epoch: 5 Global Step: 246150 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:22:34,400-Speed 2629.90 samples/sec Loss 9.3717 LearningRate 0.0495 Epoch: 5 Global Step: 246160 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:22:38,301-Speed 2625.75 samples/sec Loss 9.5968 LearningRate 0.0495 Epoch: 5 Global Step: 246170 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:22:42,198-Speed 2627.80 samples/sec Loss 9.5320 LearningRate 0.0495 Epoch: 5 Global Step: 246180 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:22:46,105-Speed 2621.67 samples/sec Loss 9.4852 LearningRate 0.0495 Epoch: 5 Global Step: 246190 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:22:50,009-Speed 2623.39 samples/sec Loss 9.4553 LearningRate 0.0495 Epoch: 5 Global Step: 246200 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:22:53,897-Speed 2634.10 samples/sec Loss 9.5129 LearningRate 0.0495 Epoch: 5 Global Step: 246210 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:22:57,790-Speed 2630.83 samples/sec Loss 9.4790 LearningRate 0.0494 Epoch: 5 Global Step: 246220 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:23:01,681-Speed 2632.73 samples/sec Loss 9.5185 LearningRate 0.0494 Epoch: 5 Global Step: 246230 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:23:05,573-Speed 2632.12 samples/sec Loss 9.5821 LearningRate 0.0494 Epoch: 5 Global Step: 246240 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:23:09,463-Speed 2632.72 samples/sec Loss 9.7049 LearningRate 0.0494 Epoch: 5 Global Step: 246250 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:23:13,352-Speed 2633.51 samples/sec Loss 9.6951 LearningRate 0.0494 Epoch: 5 Global Step: 246260 Fp16 Grad Scale: 262144 Required: 66 hours
Training: 2022-04-13 23:23:17,223-Speed 2645.49 samples/sec Loss 9.5504 LearningRate 0.0494 Epoch: 5 Global Step: 246270 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:23:21,114-Speed 2632.75 samples/sec Loss 9.5188 LearningRate 0.0494 Epoch: 5 Global Step: 246280 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:23:25,005-Speed 2632.49 samples/sec Loss 9.4864 LearningRate 0.0494 Epoch: 5 Global Step: 246290 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:23:28,896-Speed 2632.26 samples/sec Loss 9.4730 LearningRate 0.0494 Epoch: 5 Global Step: 246300 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:23:32,789-Speed 2631.00 samples/sec Loss 9.5173 LearningRate 0.0494 Epoch: 5 Global Step: 246310 Fp16 Grad Scale: 131072 Required: 66 hours
Training: 2022-04-13 23:23:36,681-Speed 2631.31 samples/sec Loss 9.5009 LearningRate 0.0494 Epoch: 5 Global Step: 246320 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:23:40,576-Speed 2630.00 samples/sec Loss 9.5282 LearningRate 0.0494 Epoch: 5 Global Step: 246330 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:23:44,465-Speed 2633.35 samples/sec Loss 9.3546 LearningRate 0.0494 Epoch: 5 Global Step: 246340 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:23:48,357-Speed 2631.89 samples/sec Loss 9.4394 LearningRate 0.0494 Epoch: 5 Global Step: 246350 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:23:52,255-Speed 2627.54 samples/sec Loss 9.4334 LearningRate 0.0494 Epoch: 5 Global Step: 246360 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:23:56,151-Speed 2629.08 samples/sec Loss 9.4848 LearningRate 0.0494 Epoch: 5 Global Step: 246370 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:24:00,053-Speed 2624.37 samples/sec Loss 9.4792 LearningRate 0.0494 Epoch: 5 Global Step: 246380 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:24:03,924-Speed 2646.47 samples/sec Loss 9.5001 LearningRate 0.0494 Epoch: 5 Global Step: 246390 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:24:07,814-Speed 2632.53 samples/sec Loss 9.4353 LearningRate 0.0494 Epoch: 5 Global Step: 246400 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:24:11,708-Speed 2630.04 samples/sec Loss 9.4539 LearningRate 0.0494 Epoch: 5 Global Step: 246410 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:24:15,609-Speed 2625.65 samples/sec Loss 9.5909 LearningRate 0.0494 Epoch: 5 Global Step: 246420 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:24:19,503-Speed 2630.31 samples/sec Loss 9.6451 LearningRate 0.0494 Epoch: 5 Global Step: 246430 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:24:23,397-Speed 2630.71 samples/sec Loss 9.5123 LearningRate 0.0494 Epoch: 5 Global Step: 246440 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:24:27,290-Speed 2630.68 samples/sec Loss 9.5545 LearningRate 0.0494 Epoch: 5 Global Step: 246450 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:24:31,182-Speed 2631.89 samples/sec Loss 9.5017 LearningRate 0.0494 Epoch: 5 Global Step: 246460 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:24:35,073-Speed 2632.58 samples/sec Loss 9.4450 LearningRate 0.0494 Epoch: 5 Global Step: 246470 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:24:38,964-Speed 2632.03 samples/sec Loss 9.5850 LearningRate 0.0494 Epoch: 5 Global Step: 246480 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:24:42,844-Speed 2639.36 samples/sec Loss 9.5815 LearningRate 0.0494 Epoch: 5 Global Step: 246490 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:24:46,729-Speed 2637.09 samples/sec Loss 9.5430 LearningRate 0.0494 Epoch: 5 Global Step: 246500 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:24:50,624-Speed 2629.65 samples/sec Loss 9.4508 LearningRate 0.0494 Epoch: 5 Global Step: 246510 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:24:54,502-Speed 2641.16 samples/sec Loss 9.4488 LearningRate 0.0494 Epoch: 5 Global Step: 246520 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:24:58,398-Speed 2628.94 samples/sec Loss 9.6648 LearningRate 0.0494 Epoch: 5 Global Step: 246530 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:25:02,296-Speed 2627.95 samples/sec Loss 9.5843 LearningRate 0.0494 Epoch: 5 Global Step: 246540 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:25:06,188-Speed 2631.29 samples/sec Loss 9.3273 LearningRate 0.0494 Epoch: 5 Global Step: 246550 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:25:10,079-Speed 2632.58 samples/sec Loss 9.2950 LearningRate 0.0494 Epoch: 5 Global Step: 246560 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:25:13,969-Speed 2632.52 samples/sec Loss 9.5634 LearningRate 0.0494 Epoch: 5 Global Step: 246570 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:25:17,870-Speed 2625.65 samples/sec Loss 9.6061 LearningRate 0.0494 Epoch: 5 Global Step: 246580 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:25:21,777-Speed 2621.79 samples/sec Loss 9.5308 LearningRate 0.0494 Epoch: 5 Global Step: 246590 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:25:25,681-Speed 2623.51 samples/sec Loss 9.4987 LearningRate 0.0494 Epoch: 5 Global Step: 246600 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:25:29,571-Speed 2633.33 samples/sec Loss 9.4526 LearningRate 0.0494 Epoch: 5 Global Step: 246610 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:25:33,466-Speed 2629.73 samples/sec Loss 9.3456 LearningRate 0.0494 Epoch: 5 Global Step: 246620 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:25:37,346-Speed 2640.02 samples/sec Loss 9.4790 LearningRate 0.0494 Epoch: 5 Global Step: 246630 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:25:41,236-Speed 2632.48 samples/sec Loss 9.5100 LearningRate 0.0494 Epoch: 5 Global Step: 246640 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:25:45,136-Speed 2626.38 samples/sec Loss 9.5086 LearningRate 0.0494 Epoch: 5 Global Step: 246650 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:25:49,049-Speed 2617.74 samples/sec Loss 9.5084 LearningRate 0.0494 Epoch: 5 Global Step: 246660 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:25:52,935-Speed 2635.78 samples/sec Loss 9.4968 LearningRate 0.0494 Epoch: 5 Global Step: 246670 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:25:56,833-Speed 2627.71 samples/sec Loss 9.4380 LearningRate 0.0494 Epoch: 5 Global Step: 246680 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:26:00,722-Speed 2633.77 samples/sec Loss 9.5739 LearningRate 0.0494 Epoch: 5 Global Step: 246690 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:26:04,617-Speed 2630.03 samples/sec Loss 9.5049 LearningRate 0.0494 Epoch: 5 Global Step: 246700 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:26:08,519-Speed 2624.13 samples/sec Loss 9.3442 LearningRate 0.0494 Epoch: 5 Global Step: 246710 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:26:12,410-Speed 2632.71 samples/sec Loss 9.5678 LearningRate 0.0494 Epoch: 5 Global Step: 246720 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:26:16,301-Speed 2632.24 samples/sec Loss 9.5976 LearningRate 0.0494 Epoch: 5 Global Step: 246730 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:26:20,193-Speed 2631.93 samples/sec Loss 9.4155 LearningRate 0.0494 Epoch: 5 Global Step: 246740 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:26:24,108-Speed 2616.00 samples/sec Loss 9.4416 LearningRate 0.0494 Epoch: 5 Global Step: 246750 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:26:28,010-Speed 2625.17 samples/sec Loss 9.5018 LearningRate 0.0494 Epoch: 5 Global Step: 246760 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:26:31,903-Speed 2631.54 samples/sec Loss 9.4580 LearningRate 0.0494 Epoch: 5 Global Step: 246770 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:26:35,793-Speed 2632.67 samples/sec Loss 9.5319 LearningRate 0.0494 Epoch: 5 Global Step: 246780 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:26:39,687-Speed 2629.93 samples/sec Loss 9.4249 LearningRate 0.0494 Epoch: 5 Global Step: 246790 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:26:43,580-Speed 2631.36 samples/sec Loss 9.4487 LearningRate 0.0494 Epoch: 5 Global Step: 246800 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:26:47,475-Speed 2629.55 samples/sec Loss 9.6767 LearningRate 0.0493 Epoch: 5 Global Step: 246810 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:26:51,368-Speed 2630.90 samples/sec Loss 9.4131 LearningRate 0.0493 Epoch: 5 Global Step: 246820 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:26:55,259-Speed 2632.28 samples/sec Loss 9.4214 LearningRate 0.0493 Epoch: 5 Global Step: 246830 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:26:59,176-Speed 2615.01 samples/sec Loss 9.5140 LearningRate 0.0493 Epoch: 5 Global Step: 246840 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:27:03,085-Speed 2620.60 samples/sec Loss 9.5358 LearningRate 0.0493 Epoch: 5 Global Step: 246850 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:27:06,975-Speed 2632.79 samples/sec Loss 9.6177 LearningRate 0.0493 Epoch: 5 Global Step: 246860 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:27:10,866-Speed 2632.43 samples/sec Loss 9.5170 LearningRate 0.0493 Epoch: 5 Global Step: 246870 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:27:14,764-Speed 2627.62 samples/sec Loss 9.5328 LearningRate 0.0493 Epoch: 5 Global Step: 246880 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:27:18,658-Speed 2630.47 samples/sec Loss 9.5197 LearningRate 0.0493 Epoch: 5 Global Step: 246890 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:27:22,555-Speed 2628.79 samples/sec Loss 9.5363 LearningRate 0.0493 Epoch: 5 Global Step: 246900 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:27:26,447-Speed 2631.36 samples/sec Loss 9.4983 LearningRate 0.0493 Epoch: 5 Global Step: 246910 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:27:30,342-Speed 2629.86 samples/sec Loss 9.4820 LearningRate 0.0493 Epoch: 5 Global Step: 246920 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:27:34,244-Speed 2624.90 samples/sec Loss 9.5489 LearningRate 0.0493 Epoch: 5 Global Step: 246930 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:27:38,137-Speed 2630.60 samples/sec Loss 9.4980 LearningRate 0.0493 Epoch: 5 Global Step: 246940 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:27:42,032-Speed 2630.11 samples/sec Loss 9.4854 LearningRate 0.0493 Epoch: 5 Global Step: 246950 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:27:45,904-Speed 2645.54 samples/sec Loss 9.5613 LearningRate 0.0493 Epoch: 5 Global Step: 246960 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:27:49,763-Speed 2654.47 samples/sec Loss 11.1611 LearningRate 0.0493 Epoch: 5 Global Step: 246970 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:27:53,648-Speed 2636.16 samples/sec Loss 10.2777 LearningRate 0.0493 Epoch: 5 Global Step: 246980 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:27:57,543-Speed 2629.52 samples/sec Loss 9.9330 LearningRate 0.0493 Epoch: 5 Global Step: 246990 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:28:01,435-Speed 2631.72 samples/sec Loss 9.5338 LearningRate 0.0493 Epoch: 5 Global Step: 247000 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:28:05,323-Speed 2634.18 samples/sec Loss 9.4666 LearningRate 0.0493 Epoch: 5 Global Step: 247010 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:28:09,211-Speed 2634.02 samples/sec Loss 9.5782 LearningRate 0.0493 Epoch: 5 Global Step: 247020 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:28:13,101-Speed 2633.89 samples/sec Loss 9.6396 LearningRate 0.0493 Epoch: 5 Global Step: 247030 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:28:16,988-Speed 2634.49 samples/sec Loss 9.4538 LearningRate 0.0493 Epoch: 5 Global Step: 247040 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:28:20,883-Speed 2630.22 samples/sec Loss 9.6430 LearningRate 0.0493 Epoch: 5 Global Step: 247050 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:28:24,770-Speed 2634.76 samples/sec Loss 9.3908 LearningRate 0.0493 Epoch: 5 Global Step: 247060 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:28:28,663-Speed 2631.08 samples/sec Loss 9.6417 LearningRate 0.0493 Epoch: 5 Global Step: 247070 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:28:32,554-Speed 2632.62 samples/sec Loss 9.5588 LearningRate 0.0493 Epoch: 5 Global Step: 247080 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:28:36,444-Speed 2632.97 samples/sec Loss 9.5378 LearningRate 0.0493 Epoch: 5 Global Step: 247090 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:28:40,330-Speed 2635.27 samples/sec Loss 9.5037 LearningRate 0.0493 Epoch: 5 Global Step: 247100 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:28:44,222-Speed 2632.06 samples/sec Loss 9.4120 LearningRate 0.0493 Epoch: 5 Global Step: 247110 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:28:48,118-Speed 2629.57 samples/sec Loss 9.5266 LearningRate 0.0493 Epoch: 5 Global Step: 247120 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:28:52,011-Speed 2630.76 samples/sec Loss 9.4221 LearningRate 0.0493 Epoch: 5 Global Step: 247130 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:28:55,900-Speed 2633.73 samples/sec Loss 9.6960 LearningRate 0.0493 Epoch: 5 Global Step: 247140 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:28:59,807-Speed 2621.57 samples/sec Loss 9.5500 LearningRate 0.0493 Epoch: 5 Global Step: 247150 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:29:03,719-Speed 2618.89 samples/sec Loss 9.6194 LearningRate 0.0493 Epoch: 5 Global Step: 247160 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:29:07,624-Speed 2623.04 samples/sec Loss 9.4456 LearningRate 0.0493 Epoch: 5 Global Step: 247170 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:29:11,548-Speed 2610.01 samples/sec Loss 9.5187 LearningRate 0.0493 Epoch: 5 Global Step: 247180 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:29:15,447-Speed 2627.27 samples/sec Loss 9.5582 LearningRate 0.0493 Epoch: 5 Global Step: 247190 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:29:19,344-Speed 2628.28 samples/sec Loss 9.6012 LearningRate 0.0493 Epoch: 5 Global Step: 247200 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:29:23,241-Speed 2628.32 samples/sec Loss 9.6363 LearningRate 0.0493 Epoch: 5 Global Step: 247210 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:29:27,091-Speed 2660.64 samples/sec Loss 10.2801 LearningRate 0.0493 Epoch: 5 Global Step: 247220 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:29:30,982-Speed 2632.00 samples/sec Loss 9.9408 LearningRate 0.0493 Epoch: 5 Global Step: 247230 Fp16 Grad Scale: 4096 Required: 65 hours
Training: 2022-04-13 23:29:34,859-Speed 2642.45 samples/sec Loss 9.7733 LearningRate 0.0493 Epoch: 5 Global Step: 247240 Fp16 Grad Scale: 4096 Required: 65 hours
Training: 2022-04-13 23:29:38,745-Speed 2635.33 samples/sec Loss 9.4521 LearningRate 0.0493 Epoch: 5 Global Step: 247250 Fp16 Grad Scale: 4096 Required: 65 hours
Training: 2022-04-13 23:29:42,632-Speed 2635.04 samples/sec Loss 9.6497 LearningRate 0.0493 Epoch: 5 Global Step: 247260 Fp16 Grad Scale: 4096 Required: 65 hours
Training: 2022-04-13 23:29:46,525-Speed 2631.01 samples/sec Loss 9.5586 LearningRate 0.0493 Epoch: 5 Global Step: 247270 Fp16 Grad Scale: 4096 Required: 65 hours
Training: 2022-04-13 23:29:50,536-Speed 2553.50 samples/sec Loss 9.5815 LearningRate 0.0493 Epoch: 5 Global Step: 247280 Fp16 Grad Scale: 4096 Required: 65 hours
Training: 2022-04-13 23:29:54,416-Speed 2640.14 samples/sec Loss 9.6092 LearningRate 0.0493 Epoch: 5 Global Step: 247290 Fp16 Grad Scale: 4096 Required: 65 hours
Training: 2022-04-13 23:29:58,323-Speed 2621.45 samples/sec Loss 9.5481 LearningRate 0.0493 Epoch: 5 Global Step: 247300 Fp16 Grad Scale: 4096 Required: 65 hours
Training: 2022-04-13 23:30:02,220-Speed 2628.99 samples/sec Loss 9.5804 LearningRate 0.0493 Epoch: 5 Global Step: 247310 Fp16 Grad Scale: 4096 Required: 65 hours
Training: 2022-04-13 23:30:06,115-Speed 2630.03 samples/sec Loss 9.4982 LearningRate 0.0493 Epoch: 5 Global Step: 247320 Fp16 Grad Scale: 4096 Required: 65 hours
Training: 2022-04-13 23:30:10,006-Speed 2631.91 samples/sec Loss 9.5348 LearningRate 0.0493 Epoch: 5 Global Step: 247330 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:30:13,902-Speed 2628.99 samples/sec Loss 9.5320 LearningRate 0.0493 Epoch: 5 Global Step: 247340 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:30:17,793-Speed 2631.87 samples/sec Loss 9.4938 LearningRate 0.0493 Epoch: 5 Global Step: 247350 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:30:21,688-Speed 2630.02 samples/sec Loss 9.5110 LearningRate 0.0493 Epoch: 5 Global Step: 247360 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:30:25,579-Speed 2633.92 samples/sec Loss 9.4957 LearningRate 0.0493 Epoch: 5 Global Step: 247370 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:30:29,503-Speed 2610.43 samples/sec Loss 9.5665 LearningRate 0.0493 Epoch: 5 Global Step: 247380 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:30:33,396-Speed 2630.26 samples/sec Loss 9.6129 LearningRate 0.0493 Epoch: 5 Global Step: 247390 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:30:37,289-Speed 2631.10 samples/sec Loss 9.6245 LearningRate 0.0492 Epoch: 5 Global Step: 247400 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:30:41,178-Speed 2633.93 samples/sec Loss 9.5903 LearningRate 0.0492 Epoch: 5 Global Step: 247410 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:30:45,063-Speed 2636.86 samples/sec Loss 9.5536 LearningRate 0.0492 Epoch: 5 Global Step: 247420 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:30:48,951-Speed 2634.53 samples/sec Loss 9.4534 LearningRate 0.0492 Epoch: 5 Global Step: 247430 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:30:52,837-Speed 2635.85 samples/sec Loss 9.4697 LearningRate 0.0492 Epoch: 5 Global Step: 247440 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:30:56,726-Speed 2633.55 samples/sec Loss 9.3993 LearningRate 0.0492 Epoch: 5 Global Step: 247450 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:31:00,612-Speed 2635.93 samples/sec Loss 9.4525 LearningRate 0.0492 Epoch: 5 Global Step: 247460 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:31:04,501-Speed 2633.06 samples/sec Loss 9.3200 LearningRate 0.0492 Epoch: 5 Global Step: 247470 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:31:08,412-Speed 2619.11 samples/sec Loss 9.5724 LearningRate 0.0492 Epoch: 5 Global Step: 247480 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:31:12,310-Speed 2627.53 samples/sec Loss 9.5571 LearningRate 0.0492 Epoch: 5 Global Step: 247490 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:31:16,202-Speed 2632.03 samples/sec Loss 9.5423 LearningRate 0.0492 Epoch: 5 Global Step: 247500 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:31:20,094-Speed 2631.60 samples/sec Loss 9.5337 LearningRate 0.0492 Epoch: 5 Global Step: 247510 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:31:23,995-Speed 2625.57 samples/sec Loss 9.4674 LearningRate 0.0492 Epoch: 5 Global Step: 247520 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:31:27,893-Speed 2627.54 samples/sec Loss 9.5570 LearningRate 0.0492 Epoch: 5 Global Step: 247530 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:31:31,781-Speed 2634.75 samples/sec Loss 9.5704 LearningRate 0.0492 Epoch: 5 Global Step: 247540 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:31:35,680-Speed 2627.06 samples/sec Loss 9.5659 LearningRate 0.0492 Epoch: 5 Global Step: 247550 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:31:39,572-Speed 2631.45 samples/sec Loss 9.4392 LearningRate 0.0492 Epoch: 5 Global Step: 247560 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:31:43,466-Speed 2630.61 samples/sec Loss 9.5027 LearningRate 0.0492 Epoch: 5 Global Step: 247570 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:31:47,367-Speed 2625.22 samples/sec Loss 9.6044 LearningRate 0.0492 Epoch: 5 Global Step: 247580 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:31:51,257-Speed 2633.70 samples/sec Loss 9.4583 LearningRate 0.0492 Epoch: 5 Global Step: 247590 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:31:55,151-Speed 2630.05 samples/sec Loss 9.4611 LearningRate 0.0492 Epoch: 5 Global Step: 247600 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:31:59,043-Speed 2632.05 samples/sec Loss 9.5308 LearningRate 0.0492 Epoch: 5 Global Step: 247610 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:32:02,933-Speed 2633.12 samples/sec Loss 9.4002 LearningRate 0.0492 Epoch: 5 Global Step: 247620 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:32:06,832-Speed 2626.90 samples/sec Loss 9.4878 LearningRate 0.0492 Epoch: 5 Global Step: 247630 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:32:10,724-Speed 2631.66 samples/sec Loss 9.4217 LearningRate 0.0492 Epoch: 5 Global Step: 247640 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:32:14,637-Speed 2617.92 samples/sec Loss 9.4637 LearningRate 0.0492 Epoch: 5 Global Step: 247650 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:32:18,536-Speed 2627.12 samples/sec Loss 9.4736 LearningRate 0.0492 Epoch: 5 Global Step: 247660 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:32:22,425-Speed 2633.37 samples/sec Loss 9.5300 LearningRate 0.0492 Epoch: 5 Global Step: 247670 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:32:26,321-Speed 2628.96 samples/sec Loss 9.4464 LearningRate 0.0492 Epoch: 5 Global Step: 247680 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:32:30,236-Speed 2616.52 samples/sec Loss 9.4886 LearningRate 0.0492 Epoch: 5 Global Step: 247690 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:32:34,142-Speed 2622.29 samples/sec Loss 9.4753 LearningRate 0.0492 Epoch: 5 Global Step: 247700 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:32:38,039-Speed 2628.27 samples/sec Loss 9.3649 LearningRate 0.0492 Epoch: 5 Global Step: 247710 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:32:41,934-Speed 2629.58 samples/sec Loss 9.5172 LearningRate 0.0492 Epoch: 5 Global Step: 247720 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:32:45,858-Speed 2610.24 samples/sec Loss 9.4279 LearningRate 0.0492 Epoch: 5 Global Step: 247730 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:32:49,751-Speed 2631.31 samples/sec Loss 9.4792 LearningRate 0.0492 Epoch: 5 Global Step: 247740 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:32:53,684-Speed 2604.81 samples/sec Loss 9.6541 LearningRate 0.0492 Epoch: 5 Global Step: 247750 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:32:57,609-Speed 2609.11 samples/sec Loss 9.6085 LearningRate 0.0492 Epoch: 5 Global Step: 247760 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:33:01,502-Speed 2631.19 samples/sec Loss 9.5938 LearningRate 0.0492 Epoch: 5 Global Step: 247770 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:33:05,392-Speed 2633.43 samples/sec Loss 9.3586 LearningRate 0.0492 Epoch: 5 Global Step: 247780 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:33:09,280-Speed 2633.81 samples/sec Loss 9.4507 LearningRate 0.0492 Epoch: 5 Global Step: 247790 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:33:13,172-Speed 2631.79 samples/sec Loss 9.4023 LearningRate 0.0492 Epoch: 5 Global Step: 247800 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:33:17,068-Speed 2628.76 samples/sec Loss 9.5912 LearningRate 0.0492 Epoch: 5 Global Step: 247810 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:33:20,982-Speed 2617.73 samples/sec Loss 9.5501 LearningRate 0.0492 Epoch: 5 Global Step: 247820 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:33:24,876-Speed 2630.12 samples/sec Loss 9.3740 LearningRate 0.0492 Epoch: 5 Global Step: 247830 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:33:28,747-Speed 2646.20 samples/sec Loss 9.4285 LearningRate 0.0492 Epoch: 5 Global Step: 247840 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:33:32,638-Speed 2632.30 samples/sec Loss 9.5019 LearningRate 0.0492 Epoch: 5 Global Step: 247850 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:33:36,534-Speed 2628.44 samples/sec Loss 9.6026 LearningRate 0.0492 Epoch: 5 Global Step: 247860 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:33:40,426-Speed 2631.31 samples/sec Loss 9.5223 LearningRate 0.0492 Epoch: 5 Global Step: 247870 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:33:44,320-Speed 2630.64 samples/sec Loss 9.5117 LearningRate 0.0492 Epoch: 5 Global Step: 247880 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:33:48,210-Speed 2633.09 samples/sec Loss 9.4472 LearningRate 0.0492 Epoch: 5 Global Step: 247890 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:33:52,101-Speed 2632.53 samples/sec Loss 9.5637 LearningRate 0.0492 Epoch: 5 Global Step: 247900 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:33:56,001-Speed 2626.48 samples/sec Loss 9.5537 LearningRate 0.0492 Epoch: 5 Global Step: 247910 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:33:59,910-Speed 2620.08 samples/sec Loss 9.3801 LearningRate 0.0492 Epoch: 5 Global Step: 247920 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:34:03,808-Speed 2627.53 samples/sec Loss 9.2518 LearningRate 0.0492 Epoch: 5 Global Step: 247930 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:34:07,705-Speed 2628.15 samples/sec Loss 9.6209 LearningRate 0.0492 Epoch: 5 Global Step: 247940 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:34:11,597-Speed 2631.74 samples/sec Loss 9.4764 LearningRate 0.0492 Epoch: 5 Global Step: 247950 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:34:15,481-Speed 2637.09 samples/sec Loss 9.5373 LearningRate 0.0492 Epoch: 5 Global Step: 247960 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:34:19,404-Speed 2610.68 samples/sec Loss 9.5635 LearningRate 0.0492 Epoch: 5 Global Step: 247970 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:34:23,298-Speed 2630.35 samples/sec Loss 9.4709 LearningRate 0.0492 Epoch: 5 Global Step: 247980 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:34:27,191-Speed 2631.36 samples/sec Loss 9.4743 LearningRate 0.0491 Epoch: 5 Global Step: 247990 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:34:31,080-Speed 2633.50 samples/sec Loss 9.5458 LearningRate 0.0491 Epoch: 5 Global Step: 248000 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:34:34,970-Speed 2633.29 samples/sec Loss 9.5879 LearningRate 0.0491 Epoch: 5 Global Step: 248010 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:34:38,863-Speed 2630.84 samples/sec Loss 9.5188 LearningRate 0.0491 Epoch: 5 Global Step: 248020 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:34:42,756-Speed 2631.09 samples/sec Loss 9.5124 LearningRate 0.0491 Epoch: 5 Global Step: 248030 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:34:46,649-Speed 2630.58 samples/sec Loss 9.4197 LearningRate 0.0491 Epoch: 5 Global Step: 248040 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:34:50,547-Speed 2628.37 samples/sec Loss 9.4812 LearningRate 0.0491 Epoch: 5 Global Step: 248050 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:34:54,438-Speed 2632.54 samples/sec Loss 9.4440 LearningRate 0.0491 Epoch: 5 Global Step: 248060 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:34:58,369-Speed 2605.74 samples/sec Loss 9.5516 LearningRate 0.0491 Epoch: 5 Global Step: 248070 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:35:02,257-Speed 2634.37 samples/sec Loss 9.6292 LearningRate 0.0491 Epoch: 5 Global Step: 248080 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:35:06,157-Speed 2626.76 samples/sec Loss 9.5712 LearningRate 0.0491 Epoch: 5 Global Step: 248090 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:35:10,064-Speed 2620.84 samples/sec Loss 9.4463 LearningRate 0.0491 Epoch: 5 Global Step: 248100 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:35:13,956-Speed 2631.77 samples/sec Loss 9.5136 LearningRate 0.0491 Epoch: 5 Global Step: 248110 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:35:17,849-Speed 2631.10 samples/sec Loss 9.4908 LearningRate 0.0491 Epoch: 5 Global Step: 248120 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:35:21,756-Speed 2621.55 samples/sec Loss 9.5279 LearningRate 0.0491 Epoch: 5 Global Step: 248130 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:35:25,659-Speed 2624.48 samples/sec Loss 9.4909 LearningRate 0.0491 Epoch: 5 Global Step: 248140 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:35:29,554-Speed 2629.76 samples/sec Loss 9.4608 LearningRate 0.0491 Epoch: 5 Global Step: 248150 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:35:33,450-Speed 2629.26 samples/sec Loss 9.6543 LearningRate 0.0491 Epoch: 5 Global Step: 248160 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:35:37,343-Speed 2631.06 samples/sec Loss 9.4186 LearningRate 0.0491 Epoch: 5 Global Step: 248170 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:35:41,273-Speed 2606.41 samples/sec Loss 9.5465 LearningRate 0.0491 Epoch: 5 Global Step: 248180 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:35:45,179-Speed 2621.89 samples/sec Loss 9.4689 LearningRate 0.0491 Epoch: 5 Global Step: 248190 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:35:49,080-Speed 2626.27 samples/sec Loss 9.6112 LearningRate 0.0491 Epoch: 5 Global Step: 248200 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:35:52,982-Speed 2624.66 samples/sec Loss 9.4852 LearningRate 0.0491 Epoch: 5 Global Step: 248210 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:35:56,879-Speed 2628.54 samples/sec Loss 9.5967 LearningRate 0.0491 Epoch: 5 Global Step: 248220 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:36:00,796-Speed 2614.35 samples/sec Loss 9.5618 LearningRate 0.0491 Epoch: 5 Global Step: 248230 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:36:04,699-Speed 2625.22 samples/sec Loss 9.3876 LearningRate 0.0491 Epoch: 5 Global Step: 248240 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:36:08,602-Speed 2624.08 samples/sec Loss 9.4614 LearningRate 0.0491 Epoch: 5 Global Step: 248250 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:36:12,516-Speed 2617.11 samples/sec Loss 9.2963 LearningRate 0.0491 Epoch: 5 Global Step: 248260 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:36:16,402-Speed 2635.85 samples/sec Loss 9.5491 LearningRate 0.0491 Epoch: 5 Global Step: 248270 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:36:20,290-Speed 2634.11 samples/sec Loss 9.4949 LearningRate 0.0491 Epoch: 5 Global Step: 248280 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:36:24,166-Speed 2642.53 samples/sec Loss 9.5197 LearningRate 0.0491 Epoch: 5 Global Step: 248290 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:36:28,059-Speed 2630.89 samples/sec Loss 9.4746 LearningRate 0.0491 Epoch: 5 Global Step: 248300 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:36:31,971-Speed 2618.52 samples/sec Loss 9.4196 LearningRate 0.0491 Epoch: 5 Global Step: 248310 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:36:35,861-Speed 2633.23 samples/sec Loss 9.3797 LearningRate 0.0491 Epoch: 5 Global Step: 248320 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:36:39,763-Speed 2625.18 samples/sec Loss 9.4763 LearningRate 0.0491 Epoch: 5 Global Step: 248330 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:36:43,662-Speed 2626.75 samples/sec Loss 9.4439 LearningRate 0.0491 Epoch: 5 Global Step: 248340 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:36:47,595-Speed 2604.23 samples/sec Loss 9.3295 LearningRate 0.0491 Epoch: 5 Global Step: 248350 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:36:51,526-Speed 2605.73 samples/sec Loss 9.4576 LearningRate 0.0491 Epoch: 5 Global Step: 248360 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:36:55,400-Speed 2644.20 samples/sec Loss 9.4841 LearningRate 0.0491 Epoch: 5 Global Step: 248370 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:36:59,290-Speed 2633.07 samples/sec Loss 9.5818 LearningRate 0.0491 Epoch: 5 Global Step: 248380 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:37:03,181-Speed 2632.25 samples/sec Loss 9.5118 LearningRate 0.0491 Epoch: 5 Global Step: 248390 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:37:07,098-Speed 2615.12 samples/sec Loss 9.4670 LearningRate 0.0491 Epoch: 5 Global Step: 248400 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:37:11,016-Speed 2614.22 samples/sec Loss 9.4847 LearningRate 0.0491 Epoch: 5 Global Step: 248410 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:37:14,994-Speed 2574.63 samples/sec Loss 9.4926 LearningRate 0.0491 Epoch: 5 Global Step: 248420 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:37:18,888-Speed 2630.28 samples/sec Loss 9.5388 LearningRate 0.0491 Epoch: 5 Global Step: 248430 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:37:22,779-Speed 2632.79 samples/sec Loss 9.4482 LearningRate 0.0491 Epoch: 5 Global Step: 248440 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:37:26,667-Speed 2634.13 samples/sec Loss 9.4247 LearningRate 0.0491 Epoch: 5 Global Step: 248450 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:37:30,559-Speed 2631.91 samples/sec Loss 9.4933 LearningRate 0.0491 Epoch: 5 Global Step: 248460 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:37:34,457-Speed 2627.19 samples/sec Loss 9.3606 LearningRate 0.0491 Epoch: 5 Global Step: 248470 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:37:38,354-Speed 2628.47 samples/sec Loss 9.4377 LearningRate 0.0491 Epoch: 5 Global Step: 248480 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:37:42,257-Speed 2623.82 samples/sec Loss 9.5610 LearningRate 0.0491 Epoch: 5 Global Step: 248490 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:37:46,146-Speed 2634.06 samples/sec Loss 9.3322 LearningRate 0.0491 Epoch: 5 Global Step: 248500 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:37:50,036-Speed 2633.04 samples/sec Loss 9.5179 LearningRate 0.0491 Epoch: 5 Global Step: 248510 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:37:53,928-Speed 2632.12 samples/sec Loss 9.4951 LearningRate 0.0491 Epoch: 5 Global Step: 248520 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:37:57,828-Speed 2625.79 samples/sec Loss 9.4132 LearningRate 0.0491 Epoch: 5 Global Step: 248530 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:38:01,739-Speed 2618.93 samples/sec Loss 9.4243 LearningRate 0.0491 Epoch: 5 Global Step: 248540 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:38:05,630-Speed 2632.55 samples/sec Loss 9.5829 LearningRate 0.0491 Epoch: 5 Global Step: 248550 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:38:09,529-Speed 2626.34 samples/sec Loss 9.5176 LearningRate 0.0491 Epoch: 5 Global Step: 248560 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:38:13,443-Speed 2617.33 samples/sec Loss 9.5063 LearningRate 0.0491 Epoch: 5 Global Step: 248570 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:38:17,423-Speed 2573.55 samples/sec Loss 9.5146 LearningRate 0.0490 Epoch: 5 Global Step: 248580 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:38:21,313-Speed 2633.61 samples/sec Loss 9.4220 LearningRate 0.0490 Epoch: 5 Global Step: 248590 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:38:25,207-Speed 2629.94 samples/sec Loss 9.6004 LearningRate 0.0490 Epoch: 5 Global Step: 248600 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:38:29,107-Speed 2626.53 samples/sec Loss 9.5224 LearningRate 0.0490 Epoch: 5 Global Step: 248610 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:38:32,998-Speed 2632.73 samples/sec Loss 9.4652 LearningRate 0.0490 Epoch: 5 Global Step: 248620 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:38:36,887-Speed 2633.68 samples/sec Loss 9.4678 LearningRate 0.0490 Epoch: 5 Global Step: 248630 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:38:40,794-Speed 2620.91 samples/sec Loss 9.3503 LearningRate 0.0490 Epoch: 5 Global Step: 248640 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:38:44,674-Speed 2639.93 samples/sec Loss 9.5733 LearningRate 0.0490 Epoch: 5 Global Step: 248650 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:38:48,565-Speed 2632.42 samples/sec Loss 9.5011 LearningRate 0.0490 Epoch: 5 Global Step: 248660 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:38:52,442-Speed 2642.19 samples/sec Loss 9.3746 LearningRate 0.0490 Epoch: 5 Global Step: 248670 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:38:56,335-Speed 2631.34 samples/sec Loss 9.5417 LearningRate 0.0490 Epoch: 5 Global Step: 248680 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:39:00,179-Speed 2664.70 samples/sec Loss 9.9596 LearningRate 0.0490 Epoch: 5 Global Step: 248690 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:39:04,073-Speed 2630.04 samples/sec Loss 10.5796 LearningRate 0.0490 Epoch: 5 Global Step: 248700 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:39:07,960-Speed 2634.89 samples/sec Loss 9.8318 LearningRate 0.0490 Epoch: 5 Global Step: 248710 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:39:11,850-Speed 2632.55 samples/sec Loss 9.4759 LearningRate 0.0490 Epoch: 5 Global Step: 248720 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:39:15,740-Speed 2633.43 samples/sec Loss 9.5001 LearningRate 0.0490 Epoch: 5 Global Step: 248730 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:39:19,643-Speed 2623.97 samples/sec Loss 9.4250 LearningRate 0.0490 Epoch: 5 Global Step: 248740 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:39:23,536-Speed 2631.21 samples/sec Loss 9.4808 LearningRate 0.0490 Epoch: 5 Global Step: 248750 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:39:27,432-Speed 2628.83 samples/sec Loss 9.4480 LearningRate 0.0490 Epoch: 5 Global Step: 248760 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:39:31,329-Speed 2628.77 samples/sec Loss 9.3496 LearningRate 0.0490 Epoch: 5 Global Step: 248770 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:39:35,221-Speed 2631.65 samples/sec Loss 9.3858 LearningRate 0.0490 Epoch: 5 Global Step: 248780 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-13 23:39:39,115-Speed 2629.99 samples/sec Loss 9.5104 LearningRate 0.0490 Epoch: 5 Global Step: 248790 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:39:43,018-Speed 2624.14 samples/sec Loss 9.5013 LearningRate 0.0490 Epoch: 5 Global Step: 248800 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:39:46,926-Speed 2621.10 samples/sec Loss 9.5317 LearningRate 0.0490 Epoch: 5 Global Step: 248810 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:39:50,815-Speed 2633.22 samples/sec Loss 9.6170 LearningRate 0.0490 Epoch: 5 Global Step: 248820 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:39:54,712-Speed 2628.23 samples/sec Loss 9.5691 LearningRate 0.0490 Epoch: 5 Global Step: 248830 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:39:58,601-Speed 2634.15 samples/sec Loss 9.6470 LearningRate 0.0490 Epoch: 5 Global Step: 248840 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:40:02,494-Speed 2631.53 samples/sec Loss 9.3551 LearningRate 0.0490 Epoch: 5 Global Step: 248850 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:40:06,387-Speed 2630.54 samples/sec Loss 9.4282 LearningRate 0.0490 Epoch: 5 Global Step: 248860 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:40:10,292-Speed 2623.02 samples/sec Loss 9.4197 LearningRate 0.0490 Epoch: 5 Global Step: 248870 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:40:32,124-Speed 469.05 samples/sec Loss 9.4151 LearningRate 0.0490 Epoch: 6 Global Step: 248880 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:40:36,018-Speed 2630.43 samples/sec Loss 9.3991 LearningRate 0.0490 Epoch: 6 Global Step: 248890 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:40:39,907-Speed 2633.73 samples/sec Loss 9.4376 LearningRate 0.0490 Epoch: 6 Global Step: 248900 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:40:43,826-Speed 2614.61 samples/sec Loss 9.4657 LearningRate 0.0490 Epoch: 6 Global Step: 248910 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:40:47,749-Speed 2610.95 samples/sec Loss 9.5165 LearningRate 0.0490 Epoch: 6 Global Step: 248920 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:40:51,642-Speed 2631.45 samples/sec Loss 9.4526 LearningRate 0.0490 Epoch: 6 Global Step: 248930 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:40:55,533-Speed 2632.09 samples/sec Loss 9.3744 LearningRate 0.0490 Epoch: 6 Global Step: 248940 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:40:59,424-Speed 2632.87 samples/sec Loss 9.5917 LearningRate 0.0490 Epoch: 6 Global Step: 248950 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:41:03,348-Speed 2610.23 samples/sec Loss 9.5356 LearningRate 0.0490 Epoch: 6 Global Step: 248960 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:41:07,254-Speed 2621.89 samples/sec Loss 9.5657 LearningRate 0.0490 Epoch: 6 Global Step: 248970 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:41:11,154-Speed 2626.24 samples/sec Loss 9.5554 LearningRate 0.0490 Epoch: 6 Global Step: 248980 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:41:15,063-Speed 2620.69 samples/sec Loss 9.4879 LearningRate 0.0490 Epoch: 6 Global Step: 248990 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:41:18,974-Speed 2618.81 samples/sec Loss 9.4240 LearningRate 0.0490 Epoch: 6 Global Step: 249000 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:41:22,879-Speed 2623.17 samples/sec Loss 9.4620 LearningRate 0.0490 Epoch: 6 Global Step: 249010 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:41:26,784-Speed 2622.62 samples/sec Loss 9.3585 LearningRate 0.0490 Epoch: 6 Global Step: 249020 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:41:30,699-Speed 2616.71 samples/sec Loss 9.5357 LearningRate 0.0490 Epoch: 6 Global Step: 249030 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:41:34,618-Speed 2613.36 samples/sec Loss 9.4126 LearningRate 0.0490 Epoch: 6 Global Step: 249040 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:41:38,526-Speed 2620.49 samples/sec Loss 9.3555 LearningRate 0.0490 Epoch: 6 Global Step: 249050 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:41:42,435-Speed 2620.50 samples/sec Loss 9.3726 LearningRate 0.0490 Epoch: 6 Global Step: 249060 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:41:46,353-Speed 2614.35 samples/sec Loss 9.4706 LearningRate 0.0490 Epoch: 6 Global Step: 249070 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:41:50,269-Speed 2614.90 samples/sec Loss 9.4136 LearningRate 0.0490 Epoch: 6 Global Step: 249080 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:41:54,176-Speed 2621.92 samples/sec Loss 9.5219 LearningRate 0.0490 Epoch: 6 Global Step: 249090 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:41:58,079-Speed 2624.18 samples/sec Loss 9.4934 LearningRate 0.0490 Epoch: 6 Global Step: 249100 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:42:02,003-Speed 2610.86 samples/sec Loss 9.4199 LearningRate 0.0490 Epoch: 6 Global Step: 249110 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:42:05,928-Speed 2609.35 samples/sec Loss 9.4692 LearningRate 0.0490 Epoch: 6 Global Step: 249120 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:42:09,833-Speed 2622.53 samples/sec Loss 9.4948 LearningRate 0.0490 Epoch: 6 Global Step: 249130 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:42:13,741-Speed 2621.39 samples/sec Loss 9.7260 LearningRate 0.0490 Epoch: 6 Global Step: 249140 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:42:17,647-Speed 2622.41 samples/sec Loss 9.4086 LearningRate 0.0490 Epoch: 6 Global Step: 249150 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:42:21,564-Speed 2614.59 samples/sec Loss 9.5127 LearningRate 0.0490 Epoch: 6 Global Step: 249160 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:42:25,492-Speed 2607.70 samples/sec Loss 9.5220 LearningRate 0.0490 Epoch: 6 Global Step: 249170 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:42:29,398-Speed 2622.32 samples/sec Loss 9.3732 LearningRate 0.0489 Epoch: 6 Global Step: 249180 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:42:33,299-Speed 2625.22 samples/sec Loss 9.5379 LearningRate 0.0489 Epoch: 6 Global Step: 249190 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:42:37,267-Speed 2581.98 samples/sec Loss 9.3875 LearningRate 0.0489 Epoch: 6 Global Step: 249200 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:42:41,194-Speed 2608.02 samples/sec Loss 9.5787 LearningRate 0.0489 Epoch: 6 Global Step: 249210 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:42:45,095-Speed 2626.05 samples/sec Loss 9.4610 LearningRate 0.0489 Epoch: 6 Global Step: 249220 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:42:48,995-Speed 2626.30 samples/sec Loss 9.3558 LearningRate 0.0489 Epoch: 6 Global Step: 249230 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:42:52,902-Speed 2621.41 samples/sec Loss 9.2167 LearningRate 0.0489 Epoch: 6 Global Step: 249240 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:42:56,822-Speed 2613.52 samples/sec Loss 9.3626 LearningRate 0.0489 Epoch: 6 Global Step: 249250 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:43:00,726-Speed 2623.21 samples/sec Loss 9.4149 LearningRate 0.0489 Epoch: 6 Global Step: 249260 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:43:04,628-Speed 2625.28 samples/sec Loss 9.4750 LearningRate 0.0489 Epoch: 6 Global Step: 249270 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:43:08,528-Speed 2625.85 samples/sec Loss 9.3846 LearningRate 0.0489 Epoch: 6 Global Step: 249280 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:43:12,430-Speed 2624.79 samples/sec Loss 9.5522 LearningRate 0.0489 Epoch: 6 Global Step: 249290 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:43:16,339-Speed 2621.17 samples/sec Loss 9.3321 LearningRate 0.0489 Epoch: 6 Global Step: 249300 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:43:20,237-Speed 2627.21 samples/sec Loss 9.4543 LearningRate 0.0489 Epoch: 6 Global Step: 249310 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:43:24,167-Speed 2606.42 samples/sec Loss 9.4917 LearningRate 0.0489 Epoch: 6 Global Step: 249320 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:43:28,072-Speed 2622.98 samples/sec Loss 9.5252 LearningRate 0.0489 Epoch: 6 Global Step: 249330 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:43:31,979-Speed 2622.28 samples/sec Loss 9.1780 LearningRate 0.0489 Epoch: 6 Global Step: 249340 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:43:35,864-Speed 2636.02 samples/sec Loss 9.5262 LearningRate 0.0489 Epoch: 6 Global Step: 249350 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:43:39,770-Speed 2622.35 samples/sec Loss 9.4032 LearningRate 0.0489 Epoch: 6 Global Step: 249360 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:43:43,672-Speed 2624.33 samples/sec Loss 9.4291 LearningRate 0.0489 Epoch: 6 Global Step: 249370 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:43:47,575-Speed 2625.01 samples/sec Loss 9.4990 LearningRate 0.0489 Epoch: 6 Global Step: 249380 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:43:51,478-Speed 2624.25 samples/sec Loss 9.4268 LearningRate 0.0489 Epoch: 6 Global Step: 249390 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:43:55,338-Speed 2653.71 samples/sec Loss 10.2778 LearningRate 0.0489 Epoch: 6 Global Step: 249400 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:43:59,238-Speed 2626.54 samples/sec Loss 10.1882 LearningRate 0.0489 Epoch: 6 Global Step: 249410 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:44:03,145-Speed 2620.94 samples/sec Loss 9.8791 LearningRate 0.0489 Epoch: 6 Global Step: 249420 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:44:07,048-Speed 2624.24 samples/sec Loss 9.4665 LearningRate 0.0489 Epoch: 6 Global Step: 249430 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:44:10,960-Speed 2617.87 samples/sec Loss 9.6702 LearningRate 0.0489 Epoch: 6 Global Step: 249440 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:44:14,894-Speed 2604.54 samples/sec Loss 9.4674 LearningRate 0.0489 Epoch: 6 Global Step: 249450 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:44:18,796-Speed 2625.44 samples/sec Loss 9.2932 LearningRate 0.0489 Epoch: 6 Global Step: 249460 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:44:22,699-Speed 2624.41 samples/sec Loss 9.3700 LearningRate 0.0489 Epoch: 6 Global Step: 249470 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:44:26,604-Speed 2623.19 samples/sec Loss 9.3945 LearningRate 0.0489 Epoch: 6 Global Step: 249480 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:44:30,506-Speed 2624.93 samples/sec Loss 9.5187 LearningRate 0.0489 Epoch: 6 Global Step: 249490 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-13 23:44:34,416-Speed 2619.07 samples/sec Loss 9.4341 LearningRate 0.0489 Epoch: 6 Global Step: 249500 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:44:38,345-Speed 2606.85 samples/sec Loss 9.4035 LearningRate 0.0489 Epoch: 6 Global Step: 249510 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:44:42,249-Speed 2623.76 samples/sec Loss 9.6053 LearningRate 0.0489 Epoch: 6 Global Step: 249520 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:44:46,154-Speed 2622.95 samples/sec Loss 9.3582 LearningRate 0.0489 Epoch: 6 Global Step: 249530 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:44:50,058-Speed 2623.94 samples/sec Loss 9.4756 LearningRate 0.0489 Epoch: 6 Global Step: 249540 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:44:53,965-Speed 2621.33 samples/sec Loss 9.4923 LearningRate 0.0489 Epoch: 6 Global Step: 249550 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:44:57,866-Speed 2625.47 samples/sec Loss 9.5627 LearningRate 0.0489 Epoch: 6 Global Step: 249560 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:45:01,775-Speed 2620.90 samples/sec Loss 9.3328 LearningRate 0.0489 Epoch: 6 Global Step: 249570 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:45:05,680-Speed 2622.69 samples/sec Loss 9.5177 LearningRate 0.0489 Epoch: 6 Global Step: 249580 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:45:09,592-Speed 2618.16 samples/sec Loss 9.4032 LearningRate 0.0489 Epoch: 6 Global Step: 249590 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:45:13,499-Speed 2621.32 samples/sec Loss 9.6054 LearningRate 0.0489 Epoch: 6 Global Step: 249600 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:45:17,419-Speed 2613.41 samples/sec Loss 9.4426 LearningRate 0.0489 Epoch: 6 Global Step: 249610 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:45:21,326-Speed 2621.60 samples/sec Loss 9.3167 LearningRate 0.0489 Epoch: 6 Global Step: 249620 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:45:25,226-Speed 2626.38 samples/sec Loss 9.5632 LearningRate 0.0489 Epoch: 6 Global Step: 249630 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:45:29,129-Speed 2624.17 samples/sec Loss 9.4452 LearningRate 0.0489 Epoch: 6 Global Step: 249640 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:45:33,039-Speed 2620.25 samples/sec Loss 9.5188 LearningRate 0.0489 Epoch: 6 Global Step: 249650 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:45:36,942-Speed 2623.92 samples/sec Loss 9.6226 LearningRate 0.0489 Epoch: 6 Global Step: 249660 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:45:40,854-Speed 2617.98 samples/sec Loss 9.3679 LearningRate 0.0489 Epoch: 6 Global Step: 249670 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:45:44,768-Speed 2616.65 samples/sec Loss 9.6470 LearningRate 0.0489 Epoch: 6 Global Step: 249680 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:45:48,682-Speed 2617.50 samples/sec Loss 9.5597 LearningRate 0.0489 Epoch: 6 Global Step: 249690 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:45:52,601-Speed 2613.30 samples/sec Loss 9.4092 LearningRate 0.0489 Epoch: 6 Global Step: 249700 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:45:56,516-Speed 2616.10 samples/sec Loss 9.5076 LearningRate 0.0489 Epoch: 6 Global Step: 249710 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:46:00,451-Speed 2603.65 samples/sec Loss 9.5039 LearningRate 0.0489 Epoch: 6 Global Step: 249720 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:46:04,364-Speed 2617.00 samples/sec Loss 9.4566 LearningRate 0.0489 Epoch: 6 Global Step: 249730 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:46:08,396-Speed 2540.56 samples/sec Loss 9.3634 LearningRate 0.0489 Epoch: 6 Global Step: 249740 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:46:12,319-Speed 2610.37 samples/sec Loss 9.3496 LearningRate 0.0489 Epoch: 6 Global Step: 249750 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:46:16,239-Speed 2614.52 samples/sec Loss 9.4532 LearningRate 0.0489 Epoch: 6 Global Step: 249760 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:46:20,149-Speed 2619.09 samples/sec Loss 9.4841 LearningRate 0.0488 Epoch: 6 Global Step: 249770 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:46:24,060-Speed 2619.26 samples/sec Loss 9.3839 LearningRate 0.0488 Epoch: 6 Global Step: 249780 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:46:27,979-Speed 2613.34 samples/sec Loss 9.5379 LearningRate 0.0488 Epoch: 6 Global Step: 249790 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:46:31,890-Speed 2619.42 samples/sec Loss 9.5682 LearningRate 0.0488 Epoch: 6 Global Step: 249800 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:46:35,807-Speed 2614.66 samples/sec Loss 9.3601 LearningRate 0.0488 Epoch: 6 Global Step: 249810 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:46:39,754-Speed 2595.70 samples/sec Loss 9.5184 LearningRate 0.0488 Epoch: 6 Global Step: 249820 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:46:43,662-Speed 2620.66 samples/sec Loss 9.5268 LearningRate 0.0488 Epoch: 6 Global Step: 249830 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:46:47,562-Speed 2625.98 samples/sec Loss 9.4600 LearningRate 0.0488 Epoch: 6 Global Step: 249840 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:46:51,469-Speed 2621.67 samples/sec Loss 9.4771 LearningRate 0.0488 Epoch: 6 Global Step: 249850 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:46:55,374-Speed 2622.85 samples/sec Loss 9.4950 LearningRate 0.0488 Epoch: 6 Global Step: 249860 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:46:59,276-Speed 2624.75 samples/sec Loss 9.4730 LearningRate 0.0488 Epoch: 6 Global Step: 249870 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:47:03,179-Speed 2624.84 samples/sec Loss 9.4417 LearningRate 0.0488 Epoch: 6 Global Step: 249880 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:47:07,089-Speed 2619.23 samples/sec Loss 9.5983 LearningRate 0.0488 Epoch: 6 Global Step: 249890 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:47:11,022-Speed 2604.74 samples/sec Loss 9.3588 LearningRate 0.0488 Epoch: 6 Global Step: 249900 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:47:14,925-Speed 2624.51 samples/sec Loss 9.4094 LearningRate 0.0488 Epoch: 6 Global Step: 249910 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:47:18,826-Speed 2625.28 samples/sec Loss 9.4231 LearningRate 0.0488 Epoch: 6 Global Step: 249920 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:47:22,735-Speed 2620.26 samples/sec Loss 9.4332 LearningRate 0.0488 Epoch: 6 Global Step: 249930 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:47:26,623-Speed 2634.66 samples/sec Loss 9.5170 LearningRate 0.0488 Epoch: 6 Global Step: 249940 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:47:30,526-Speed 2624.17 samples/sec Loss 9.3892 LearningRate 0.0488 Epoch: 6 Global Step: 249950 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:47:34,462-Speed 2601.80 samples/sec Loss 9.4715 LearningRate 0.0488 Epoch: 6 Global Step: 249960 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:47:38,387-Speed 2610.05 samples/sec Loss 9.4741 LearningRate 0.0488 Epoch: 6 Global Step: 249970 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:47:42,301-Speed 2617.27 samples/sec Loss 9.2392 LearningRate 0.0488 Epoch: 6 Global Step: 249980 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:47:46,205-Speed 2622.92 samples/sec Loss 9.4043 LearningRate 0.0488 Epoch: 6 Global Step: 249990 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:47:50,114-Speed 2620.51 samples/sec Loss 9.4199 LearningRate 0.0488 Epoch: 6 Global Step: 250000 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:48:32,686-[lfw][250000]XNorm: 23.060224
Training: 2022-04-13 23:48:32,687-[lfw][250000]Accuracy-Flip: 0.99767+-0.00260
Training: 2022-04-13 23:48:32,687-[lfw][250000]Accuracy-Highest: 0.99783
Training: 2022-04-13 23:49:23,677-[cfp_fp][250000]XNorm: 21.058497
Training: 2022-04-13 23:49:23,678-[cfp_fp][250000]Accuracy-Flip: 0.98643+-0.00630
Training: 2022-04-13 23:49:23,679-[cfp_fp][250000]Accuracy-Highest: 0.98643
Training: 2022-04-13 23:50:06,750-[agedb_30][250000]XNorm: 22.844863
Training: 2022-04-13 23:50:06,751-[agedb_30][250000]Accuracy-Flip: 0.97350+-0.00497
Training: 2022-04-13 23:50:06,752-[agedb_30][250000]Accuracy-Highest: 0.97350
Training: 2022-04-13 23:50:10,633-Speed 72.87 samples/sec Loss 9.4795 LearningRate 0.0488 Epoch: 6 Global Step: 250010 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:50:14,512-Speed 2640.83 samples/sec Loss 9.5247 LearningRate 0.0488 Epoch: 6 Global Step: 250020 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:50:18,393-Speed 2639.27 samples/sec Loss 9.4675 LearningRate 0.0488 Epoch: 6 Global Step: 250030 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:50:22,278-Speed 2636.03 samples/sec Loss 9.4229 LearningRate 0.0488 Epoch: 6 Global Step: 250040 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:50:26,163-Speed 2636.19 samples/sec Loss 9.3145 LearningRate 0.0488 Epoch: 6 Global Step: 250050 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:50:30,056-Speed 2630.61 samples/sec Loss 9.4436 LearningRate 0.0488 Epoch: 6 Global Step: 250060 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:50:33,954-Speed 2628.23 samples/sec Loss 9.4900 LearningRate 0.0488 Epoch: 6 Global Step: 250070 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:50:37,859-Speed 2622.24 samples/sec Loss 9.3488 LearningRate 0.0488 Epoch: 6 Global Step: 250080 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:50:41,755-Speed 2629.38 samples/sec Loss 9.5109 LearningRate 0.0488 Epoch: 6 Global Step: 250090 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:50:45,649-Speed 2629.82 samples/sec Loss 9.4124 LearningRate 0.0488 Epoch: 6 Global Step: 250100 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:50:49,539-Speed 2633.93 samples/sec Loss 9.3919 LearningRate 0.0488 Epoch: 6 Global Step: 250110 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:50:53,427-Speed 2634.24 samples/sec Loss 9.3098 LearningRate 0.0488 Epoch: 6 Global Step: 250120 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:50:57,319-Speed 2631.70 samples/sec Loss 9.3235 LearningRate 0.0488 Epoch: 6 Global Step: 250130 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:51:01,207-Speed 2634.66 samples/sec Loss 9.3885 LearningRate 0.0488 Epoch: 6 Global Step: 250140 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:51:05,101-Speed 2630.68 samples/sec Loss 9.5117 LearningRate 0.0488 Epoch: 6 Global Step: 250150 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:51:08,997-Speed 2628.73 samples/sec Loss 9.5828 LearningRate 0.0488 Epoch: 6 Global Step: 250160 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:51:12,908-Speed 2619.04 samples/sec Loss 9.5401 LearningRate 0.0488 Epoch: 6 Global Step: 250170 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:51:16,801-Speed 2630.81 samples/sec Loss 9.4196 LearningRate 0.0488 Epoch: 6 Global Step: 250180 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:51:20,704-Speed 2624.72 samples/sec Loss 9.4573 LearningRate 0.0488 Epoch: 6 Global Step: 250190 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:51:24,597-Speed 2631.16 samples/sec Loss 9.5196 LearningRate 0.0488 Epoch: 6 Global Step: 250200 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:51:28,496-Speed 2626.72 samples/sec Loss 9.5458 LearningRate 0.0488 Epoch: 6 Global Step: 250210 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:51:32,390-Speed 2630.17 samples/sec Loss 9.4914 LearningRate 0.0488 Epoch: 6 Global Step: 250220 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:51:36,282-Speed 2631.53 samples/sec Loss 9.4577 LearningRate 0.0488 Epoch: 6 Global Step: 250230 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:51:40,161-Speed 2641.32 samples/sec Loss 9.3600 LearningRate 0.0488 Epoch: 6 Global Step: 250240 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:51:44,061-Speed 2625.55 samples/sec Loss 9.3471 LearningRate 0.0488 Epoch: 6 Global Step: 250250 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:51:47,955-Speed 2630.40 samples/sec Loss 9.4754 LearningRate 0.0488 Epoch: 6 Global Step: 250260 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:51:51,866-Speed 2618.97 samples/sec Loss 9.3924 LearningRate 0.0488 Epoch: 6 Global Step: 250270 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:51:55,762-Speed 2629.26 samples/sec Loss 9.4724 LearningRate 0.0488 Epoch: 6 Global Step: 250280 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:51:59,660-Speed 2627.70 samples/sec Loss 9.3624 LearningRate 0.0488 Epoch: 6 Global Step: 250290 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:52:03,605-Speed 2595.94 samples/sec Loss 9.4878 LearningRate 0.0488 Epoch: 6 Global Step: 250300 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:52:07,497-Speed 2632.09 samples/sec Loss 9.3383 LearningRate 0.0488 Epoch: 6 Global Step: 250310 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:52:11,393-Speed 2629.28 samples/sec Loss 9.3810 LearningRate 0.0488 Epoch: 6 Global Step: 250320 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:52:15,293-Speed 2625.72 samples/sec Loss 9.3959 LearningRate 0.0488 Epoch: 6 Global Step: 250330 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:52:19,198-Speed 2623.75 samples/sec Loss 9.3561 LearningRate 0.0488 Epoch: 6 Global Step: 250340 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:52:23,100-Speed 2624.51 samples/sec Loss 9.4484 LearningRate 0.0488 Epoch: 6 Global Step: 250350 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:52:26,993-Speed 2630.81 samples/sec Loss 9.4237 LearningRate 0.0487 Epoch: 6 Global Step: 250360 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:52:30,871-Speed 2641.31 samples/sec Loss 9.3676 LearningRate 0.0487 Epoch: 6 Global Step: 250370 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:52:34,764-Speed 2630.89 samples/sec Loss 9.4265 LearningRate 0.0487 Epoch: 6 Global Step: 250380 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:52:38,659-Speed 2629.39 samples/sec Loss 9.3274 LearningRate 0.0487 Epoch: 6 Global Step: 250390 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:52:42,551-Speed 2632.10 samples/sec Loss 9.4898 LearningRate 0.0487 Epoch: 6 Global Step: 250400 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:52:46,442-Speed 2632.90 samples/sec Loss 9.4714 LearningRate 0.0487 Epoch: 6 Global Step: 250410 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:52:50,334-Speed 2631.07 samples/sec Loss 9.3755 LearningRate 0.0487 Epoch: 6 Global Step: 250420 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:52:54,255-Speed 2612.77 samples/sec Loss 9.4086 LearningRate 0.0487 Epoch: 6 Global Step: 250430 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:52:58,149-Speed 2630.42 samples/sec Loss 9.3413 LearningRate 0.0487 Epoch: 6 Global Step: 250440 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:53:02,047-Speed 2627.52 samples/sec Loss 9.2456 LearningRate 0.0487 Epoch: 6 Global Step: 250450 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:53:05,944-Speed 2627.98 samples/sec Loss 9.4180 LearningRate 0.0487 Epoch: 6 Global Step: 250460 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:53:09,838-Speed 2630.70 samples/sec Loss 9.4440 LearningRate 0.0487 Epoch: 6 Global Step: 250470 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:53:13,737-Speed 2627.21 samples/sec Loss 9.4338 LearningRate 0.0487 Epoch: 6 Global Step: 250480 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:53:17,627-Speed 2632.86 samples/sec Loss 9.5037 LearningRate 0.0487 Epoch: 6 Global Step: 250490 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:53:21,519-Speed 2631.67 samples/sec Loss 9.4525 LearningRate 0.0487 Epoch: 6 Global Step: 250500 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:53:25,414-Speed 2630.02 samples/sec Loss 9.4378 LearningRate 0.0487 Epoch: 6 Global Step: 250510 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:53:29,308-Speed 2630.23 samples/sec Loss 9.5830 LearningRate 0.0487 Epoch: 6 Global Step: 250520 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:53:33,217-Speed 2620.46 samples/sec Loss 9.4648 LearningRate 0.0487 Epoch: 6 Global Step: 250530 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:53:37,128-Speed 2618.13 samples/sec Loss 9.3446 LearningRate 0.0487 Epoch: 6 Global Step: 250540 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:53:41,022-Speed 2630.85 samples/sec Loss 9.3836 LearningRate 0.0487 Epoch: 6 Global Step: 250550 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:53:44,914-Speed 2631.81 samples/sec Loss 9.4484 LearningRate 0.0487 Epoch: 6 Global Step: 250560 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:53:48,804-Speed 2632.75 samples/sec Loss 9.3939 LearningRate 0.0487 Epoch: 6 Global Step: 250570 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:53:52,696-Speed 2631.86 samples/sec Loss 9.2878 LearningRate 0.0487 Epoch: 6 Global Step: 250580 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:53:56,591-Speed 2629.74 samples/sec Loss 9.4067 LearningRate 0.0487 Epoch: 6 Global Step: 250590 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:54:00,486-Speed 2629.66 samples/sec Loss 9.5045 LearningRate 0.0487 Epoch: 6 Global Step: 250600 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:54:04,382-Speed 2628.62 samples/sec Loss 9.5429 LearningRate 0.0487 Epoch: 6 Global Step: 250610 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:54:08,278-Speed 2629.00 samples/sec Loss 9.4358 LearningRate 0.0487 Epoch: 6 Global Step: 250620 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:54:12,176-Speed 2627.12 samples/sec Loss 9.4058 LearningRate 0.0487 Epoch: 6 Global Step: 250630 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:54:16,072-Speed 2630.01 samples/sec Loss 9.4923 LearningRate 0.0487 Epoch: 6 Global Step: 250640 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:54:19,946-Speed 2643.24 samples/sec Loss 9.5148 LearningRate 0.0487 Epoch: 6 Global Step: 250650 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:54:23,840-Speed 2630.74 samples/sec Loss 9.5599 LearningRate 0.0487 Epoch: 6 Global Step: 250660 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:54:27,735-Speed 2629.36 samples/sec Loss 9.4298 LearningRate 0.0487 Epoch: 6 Global Step: 250670 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:54:31,640-Speed 2623.01 samples/sec Loss 9.4514 LearningRate 0.0487 Epoch: 6 Global Step: 250680 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:54:35,544-Speed 2623.43 samples/sec Loss 9.3914 LearningRate 0.0487 Epoch: 6 Global Step: 250690 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:54:39,452-Speed 2620.93 samples/sec Loss 9.4688 LearningRate 0.0487 Epoch: 6 Global Step: 250700 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:54:43,357-Speed 2622.89 samples/sec Loss 9.4397 LearningRate 0.0487 Epoch: 6 Global Step: 250710 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:54:47,263-Speed 2622.02 samples/sec Loss 9.4321 LearningRate 0.0487 Epoch: 6 Global Step: 250720 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:54:51,168-Speed 2623.38 samples/sec Loss 9.3600 LearningRate 0.0487 Epoch: 6 Global Step: 250730 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:54:55,079-Speed 2618.60 samples/sec Loss 9.3223 LearningRate 0.0487 Epoch: 6 Global Step: 250740 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:54:58,975-Speed 2628.99 samples/sec Loss 9.4056 LearningRate 0.0487 Epoch: 6 Global Step: 250750 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:55:02,886-Speed 2618.67 samples/sec Loss 9.3621 LearningRate 0.0487 Epoch: 6 Global Step: 250760 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:55:06,809-Speed 2611.09 samples/sec Loss 9.4855 LearningRate 0.0487 Epoch: 6 Global Step: 250770 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:55:10,707-Speed 2627.24 samples/sec Loss 9.4017 LearningRate 0.0487 Epoch: 6 Global Step: 250780 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:55:14,607-Speed 2626.18 samples/sec Loss 9.4804 LearningRate 0.0487 Epoch: 6 Global Step: 250790 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:55:18,512-Speed 2622.72 samples/sec Loss 9.5169 LearningRate 0.0487 Epoch: 6 Global Step: 250800 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:55:22,423-Speed 2619.62 samples/sec Loss 9.4991 LearningRate 0.0487 Epoch: 6 Global Step: 250810 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:55:26,318-Speed 2629.39 samples/sec Loss 9.4095 LearningRate 0.0487 Epoch: 6 Global Step: 250820 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:55:30,225-Speed 2622.03 samples/sec Loss 9.4573 LearningRate 0.0487 Epoch: 6 Global Step: 250830 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:55:34,132-Speed 2621.62 samples/sec Loss 9.3957 LearningRate 0.0487 Epoch: 6 Global Step: 250840 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:55:38,049-Speed 2614.86 samples/sec Loss 9.4991 LearningRate 0.0487 Epoch: 6 Global Step: 250850 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:55:41,957-Speed 2621.53 samples/sec Loss 9.4137 LearningRate 0.0487 Epoch: 6 Global Step: 250860 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:55:45,868-Speed 2618.96 samples/sec Loss 9.5137 LearningRate 0.0487 Epoch: 6 Global Step: 250870 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:55:49,815-Speed 2594.95 samples/sec Loss 9.4216 LearningRate 0.0487 Epoch: 6 Global Step: 250880 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:55:53,711-Speed 2629.26 samples/sec Loss 9.4598 LearningRate 0.0487 Epoch: 6 Global Step: 250890 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:55:57,607-Speed 2629.20 samples/sec Loss 9.5282 LearningRate 0.0487 Epoch: 6 Global Step: 250900 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:56:01,493-Speed 2636.08 samples/sec Loss 9.4723 LearningRate 0.0487 Epoch: 6 Global Step: 250910 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:56:05,396-Speed 2624.16 samples/sec Loss 9.4627 LearningRate 0.0487 Epoch: 6 Global Step: 250920 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:56:09,300-Speed 2623.34 samples/sec Loss 9.4100 LearningRate 0.0487 Epoch: 6 Global Step: 250930 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:56:13,204-Speed 2623.33 samples/sec Loss 9.3773 LearningRate 0.0487 Epoch: 6 Global Step: 250940 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:56:17,105-Speed 2625.61 samples/sec Loss 9.3816 LearningRate 0.0487 Epoch: 6 Global Step: 250950 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:56:21,011-Speed 2622.28 samples/sec Loss 9.4757 LearningRate 0.0486 Epoch: 6 Global Step: 250960 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:56:24,915-Speed 2623.37 samples/sec Loss 9.2176 LearningRate 0.0486 Epoch: 6 Global Step: 250970 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:56:28,809-Speed 2630.30 samples/sec Loss 9.2546 LearningRate 0.0486 Epoch: 6 Global Step: 250980 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:56:33,216-Speed 2324.37 samples/sec Loss 9.5268 LearningRate 0.0486 Epoch: 6 Global Step: 250990 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:56:37,180-Speed 2584.09 samples/sec Loss 9.3457 LearningRate 0.0486 Epoch: 6 Global Step: 251000 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:56:41,066-Speed 2635.63 samples/sec Loss 9.4429 LearningRate 0.0486 Epoch: 6 Global Step: 251010 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:56:44,954-Speed 2634.25 samples/sec Loss 9.4228 LearningRate 0.0486 Epoch: 6 Global Step: 251020 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:56:48,850-Speed 2628.48 samples/sec Loss 9.3850 LearningRate 0.0486 Epoch: 6 Global Step: 251030 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:56:52,746-Speed 2629.22 samples/sec Loss 9.3057 LearningRate 0.0486 Epoch: 6 Global Step: 251040 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:56:56,644-Speed 2627.27 samples/sec Loss 9.5021 LearningRate 0.0486 Epoch: 6 Global Step: 251050 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:57:00,540-Speed 2629.15 samples/sec Loss 9.4535 LearningRate 0.0486 Epoch: 6 Global Step: 251060 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:57:04,437-Speed 2628.33 samples/sec Loss 9.4585 LearningRate 0.0486 Epoch: 6 Global Step: 251070 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:57:08,331-Speed 2630.33 samples/sec Loss 9.3224 LearningRate 0.0486 Epoch: 6 Global Step: 251080 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:57:12,214-Speed 2637.54 samples/sec Loss 9.4492 LearningRate 0.0486 Epoch: 6 Global Step: 251090 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:57:16,105-Speed 2632.51 samples/sec Loss 9.2040 LearningRate 0.0486 Epoch: 6 Global Step: 251100 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:57:20,010-Speed 2622.72 samples/sec Loss 9.4946 LearningRate 0.0486 Epoch: 6 Global Step: 251110 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:57:23,921-Speed 2618.92 samples/sec Loss 9.4182 LearningRate 0.0486 Epoch: 6 Global Step: 251120 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:57:27,828-Speed 2621.98 samples/sec Loss 9.3053 LearningRate 0.0486 Epoch: 6 Global Step: 251130 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:57:31,731-Speed 2623.74 samples/sec Loss 9.3866 LearningRate 0.0486 Epoch: 6 Global Step: 251140 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:57:35,637-Speed 2622.50 samples/sec Loss 9.4247 LearningRate 0.0486 Epoch: 6 Global Step: 251150 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:57:39,544-Speed 2621.27 samples/sec Loss 9.5253 LearningRate 0.0486 Epoch: 6 Global Step: 251160 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:57:43,458-Speed 2616.83 samples/sec Loss 9.4614 LearningRate 0.0486 Epoch: 6 Global Step: 251170 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:57:47,355-Speed 2628.43 samples/sec Loss 9.4296 LearningRate 0.0486 Epoch: 6 Global Step: 251180 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:57:51,254-Speed 2627.18 samples/sec Loss 9.4460 LearningRate 0.0486 Epoch: 6 Global Step: 251190 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:57:55,154-Speed 2626.12 samples/sec Loss 9.3367 LearningRate 0.0486 Epoch: 6 Global Step: 251200 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:57:59,055-Speed 2625.66 samples/sec Loss 9.4120 LearningRate 0.0486 Epoch: 6 Global Step: 251210 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:58:02,975-Speed 2612.72 samples/sec Loss 9.4276 LearningRate 0.0486 Epoch: 6 Global Step: 251220 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:58:06,872-Speed 2628.20 samples/sec Loss 9.4086 LearningRate 0.0486 Epoch: 6 Global Step: 251230 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:58:10,767-Speed 2629.47 samples/sec Loss 9.4558 LearningRate 0.0486 Epoch: 6 Global Step: 251240 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:58:14,662-Speed 2629.88 samples/sec Loss 9.4585 LearningRate 0.0486 Epoch: 6 Global Step: 251250 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:58:18,573-Speed 2618.12 samples/sec Loss 9.3473 LearningRate 0.0486 Epoch: 6 Global Step: 251260 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:58:22,471-Speed 2627.82 samples/sec Loss 9.3434 LearningRate 0.0486 Epoch: 6 Global Step: 251270 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:58:26,367-Speed 2630.01 samples/sec Loss 9.4197 LearningRate 0.0486 Epoch: 6 Global Step: 251280 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:58:30,260-Speed 2631.13 samples/sec Loss 9.5521 LearningRate 0.0486 Epoch: 6 Global Step: 251290 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-13 23:58:34,142-Speed 2638.48 samples/sec Loss 9.4919 LearningRate 0.0486 Epoch: 6 Global Step: 251300 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:58:38,099-Speed 2588.10 samples/sec Loss 9.3230 LearningRate 0.0486 Epoch: 6 Global Step: 251310 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:58:42,191-Speed 2503.05 samples/sec Loss 9.5084 LearningRate 0.0486 Epoch: 6 Global Step: 251320 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:58:46,164-Speed 2578.31 samples/sec Loss 9.3383 LearningRate 0.0486 Epoch: 6 Global Step: 251330 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-13 23:58:50,021-Speed 2654.97 samples/sec Loss 9.4361 LearningRate 0.0486 Epoch: 6 Global Step: 251340 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:58:53,983-Speed 2585.49 samples/sec Loss 9.5937 LearningRate 0.0486 Epoch: 6 Global Step: 251350 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:58:57,875-Speed 2631.22 samples/sec Loss 9.4681 LearningRate 0.0486 Epoch: 6 Global Step: 251360 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:59:01,764-Speed 2633.71 samples/sec Loss 9.5214 LearningRate 0.0486 Epoch: 6 Global Step: 251370 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:59:05,656-Speed 2631.26 samples/sec Loss 9.2972 LearningRate 0.0486 Epoch: 6 Global Step: 251380 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:59:09,552-Speed 2629.65 samples/sec Loss 9.5050 LearningRate 0.0486 Epoch: 6 Global Step: 251390 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:59:13,446-Speed 2630.39 samples/sec Loss 9.4187 LearningRate 0.0486 Epoch: 6 Global Step: 251400 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:59:17,339-Speed 2630.72 samples/sec Loss 9.4854 LearningRate 0.0486 Epoch: 6 Global Step: 251410 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:59:21,232-Speed 2631.02 samples/sec Loss 9.3733 LearningRate 0.0486 Epoch: 6 Global Step: 251420 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:59:25,129-Speed 2628.31 samples/sec Loss 9.5612 LearningRate 0.0486 Epoch: 6 Global Step: 251430 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:59:29,030-Speed 2625.32 samples/sec Loss 9.4489 LearningRate 0.0486 Epoch: 6 Global Step: 251440 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:59:32,924-Speed 2630.24 samples/sec Loss 9.4247 LearningRate 0.0486 Epoch: 6 Global Step: 251450 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:59:36,824-Speed 2626.08 samples/sec Loss 9.2978 LearningRate 0.0486 Epoch: 6 Global Step: 251460 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-13 23:59:40,715-Speed 2632.64 samples/sec Loss 9.9335 LearningRate 0.0486 Epoch: 6 Global Step: 251470 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:59:44,608-Speed 2631.11 samples/sec Loss 10.0452 LearningRate 0.0486 Epoch: 6 Global Step: 251480 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:59:48,508-Speed 2626.04 samples/sec Loss 9.8083 LearningRate 0.0486 Epoch: 6 Global Step: 251490 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:59:52,409-Speed 2625.65 samples/sec Loss 9.7663 LearningRate 0.0486 Epoch: 6 Global Step: 251500 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-13 23:59:56,316-Speed 2621.86 samples/sec Loss 9.7303 LearningRate 0.0486 Epoch: 6 Global Step: 251510 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:00:00,205-Speed 2633.38 samples/sec Loss 9.5860 LearningRate 0.0486 Epoch: 6 Global Step: 251520 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:00:04,122-Speed 2614.90 samples/sec Loss 9.6441 LearningRate 0.0486 Epoch: 6 Global Step: 251530 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:00:08,025-Speed 2624.08 samples/sec Loss 9.6077 LearningRate 0.0486 Epoch: 6 Global Step: 251540 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:00:11,927-Speed 2624.59 samples/sec Loss 9.6171 LearningRate 0.0485 Epoch: 6 Global Step: 251550 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:00:15,833-Speed 2622.94 samples/sec Loss 9.5051 LearningRate 0.0485 Epoch: 6 Global Step: 251560 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:00:19,726-Speed 2630.84 samples/sec Loss 9.2651 LearningRate 0.0485 Epoch: 6 Global Step: 251570 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:00:23,624-Speed 2628.65 samples/sec Loss 9.4983 LearningRate 0.0485 Epoch: 6 Global Step: 251580 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:00:27,514-Speed 2632.78 samples/sec Loss 9.5020 LearningRate 0.0485 Epoch: 6 Global Step: 251590 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:00:31,405-Speed 2633.03 samples/sec Loss 9.5140 LearningRate 0.0485 Epoch: 6 Global Step: 251600 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:00:35,294-Speed 2633.08 samples/sec Loss 9.5048 LearningRate 0.0485 Epoch: 6 Global Step: 251610 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:00:39,187-Speed 2630.96 samples/sec Loss 9.5146 LearningRate 0.0485 Epoch: 6 Global Step: 251620 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:00:43,147-Speed 2586.30 samples/sec Loss 9.3200 LearningRate 0.0485 Epoch: 6 Global Step: 251630 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:00:47,010-Speed 2651.20 samples/sec Loss 9.5977 LearningRate 0.0485 Epoch: 6 Global Step: 251640 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:00:50,913-Speed 2624.22 samples/sec Loss 9.5351 LearningRate 0.0485 Epoch: 6 Global Step: 251650 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:00:54,803-Speed 2632.90 samples/sec Loss 9.4801 LearningRate 0.0485 Epoch: 6 Global Step: 251660 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:00:58,693-Speed 2633.05 samples/sec Loss 9.4278 LearningRate 0.0485 Epoch: 6 Global Step: 251670 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:01:02,585-Speed 2631.72 samples/sec Loss 9.3629 LearningRate 0.0485 Epoch: 6 Global Step: 251680 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:01:06,479-Speed 2630.38 samples/sec Loss 9.4279 LearningRate 0.0485 Epoch: 6 Global Step: 251690 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:01:10,368-Speed 2633.41 samples/sec Loss 9.4417 LearningRate 0.0485 Epoch: 6 Global Step: 251700 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:01:14,255-Speed 2635.39 samples/sec Loss 9.4812 LearningRate 0.0485 Epoch: 6 Global Step: 251710 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:01:18,145-Speed 2632.80 samples/sec Loss 9.4336 LearningRate 0.0485 Epoch: 6 Global Step: 251720 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:01:22,035-Speed 2633.12 samples/sec Loss 9.4536 LearningRate 0.0485 Epoch: 6 Global Step: 251730 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:01:25,924-Speed 2633.23 samples/sec Loss 9.5544 LearningRate 0.0485 Epoch: 6 Global Step: 251740 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:01:29,814-Speed 2633.11 samples/sec Loss 9.4111 LearningRate 0.0485 Epoch: 6 Global Step: 251750 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:01:33,713-Speed 2626.76 samples/sec Loss 9.2984 LearningRate 0.0485 Epoch: 6 Global Step: 251760 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:01:37,639-Speed 2608.86 samples/sec Loss 9.4468 LearningRate 0.0485 Epoch: 6 Global Step: 251770 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:01:41,527-Speed 2634.40 samples/sec Loss 9.3687 LearningRate 0.0485 Epoch: 6 Global Step: 251780 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:01:45,419-Speed 2631.81 samples/sec Loss 9.5083 LearningRate 0.0485 Epoch: 6 Global Step: 251790 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:01:49,312-Speed 2631.11 samples/sec Loss 9.5243 LearningRate 0.0485 Epoch: 6 Global Step: 251800 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:01:53,206-Speed 2630.23 samples/sec Loss 9.3359 LearningRate 0.0485 Epoch: 6 Global Step: 251810 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:01:57,102-Speed 2628.75 samples/sec Loss 9.5942 LearningRate 0.0485 Epoch: 6 Global Step: 251820 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:02:00,993-Speed 2632.74 samples/sec Loss 9.4217 LearningRate 0.0485 Epoch: 6 Global Step: 251830 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:02:04,884-Speed 2632.27 samples/sec Loss 9.5020 LearningRate 0.0485 Epoch: 6 Global Step: 251840 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:02:08,795-Speed 2618.78 samples/sec Loss 9.5082 LearningRate 0.0485 Epoch: 6 Global Step: 251850 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:02:12,703-Speed 2620.38 samples/sec Loss 9.4866 LearningRate 0.0485 Epoch: 6 Global Step: 251860 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:02:16,595-Speed 2632.17 samples/sec Loss 9.2952 LearningRate 0.0485 Epoch: 6 Global Step: 251870 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:02:20,564-Speed 2580.61 samples/sec Loss 9.3871 LearningRate 0.0485 Epoch: 6 Global Step: 251880 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:02:24,482-Speed 2614.07 samples/sec Loss 9.2919 LearningRate 0.0485 Epoch: 6 Global Step: 251890 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:02:28,379-Speed 2628.39 samples/sec Loss 9.4255 LearningRate 0.0485 Epoch: 6 Global Step: 251900 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:02:32,283-Speed 2623.44 samples/sec Loss 9.3593 LearningRate 0.0485 Epoch: 6 Global Step: 251910 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:02:36,179-Speed 2628.66 samples/sec Loss 9.3582 LearningRate 0.0485 Epoch: 6 Global Step: 251920 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:02:40,094-Speed 2616.27 samples/sec Loss 9.4645 LearningRate 0.0485 Epoch: 6 Global Step: 251930 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:02:43,993-Speed 2626.43 samples/sec Loss 9.3517 LearningRate 0.0485 Epoch: 6 Global Step: 251940 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:02:47,910-Speed 2614.80 samples/sec Loss 9.5079 LearningRate 0.0485 Epoch: 6 Global Step: 251950 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:02:51,811-Speed 2625.99 samples/sec Loss 9.4325 LearningRate 0.0485 Epoch: 6 Global Step: 251960 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:02:55,712-Speed 2625.34 samples/sec Loss 9.4171 LearningRate 0.0485 Epoch: 6 Global Step: 251970 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:02:59,626-Speed 2617.22 samples/sec Loss 9.3582 LearningRate 0.0485 Epoch: 6 Global Step: 251980 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:03:03,536-Speed 2619.65 samples/sec Loss 9.4337 LearningRate 0.0485 Epoch: 6 Global Step: 251990 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:03:07,425-Speed 2633.69 samples/sec Loss 9.8376 LearningRate 0.0485 Epoch: 6 Global Step: 252000 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:03:11,269-Speed 2664.51 samples/sec Loss 10.1109 LearningRate 0.0485 Epoch: 6 Global Step: 252010 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-14 00:03:15,246-Speed 2575.33 samples/sec Loss 9.6810 LearningRate 0.0485 Epoch: 6 Global Step: 252020 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-14 00:03:19,155-Speed 2619.95 samples/sec Loss 9.4125 LearningRate 0.0485 Epoch: 6 Global Step: 252030 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-14 00:03:23,062-Speed 2621.07 samples/sec Loss 9.3281 LearningRate 0.0485 Epoch: 6 Global Step: 252040 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-14 00:03:27,071-Speed 2555.21 samples/sec Loss 9.3749 LearningRate 0.0485 Epoch: 6 Global Step: 252050 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-14 00:03:30,970-Speed 2627.14 samples/sec Loss 9.4468 LearningRate 0.0485 Epoch: 6 Global Step: 252060 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-14 00:03:34,867-Speed 2628.75 samples/sec Loss 9.5038 LearningRate 0.0485 Epoch: 6 Global Step: 252070 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-14 00:03:38,764-Speed 2628.38 samples/sec Loss 9.5788 LearningRate 0.0485 Epoch: 6 Global Step: 252080 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-14 00:03:42,663-Speed 2626.60 samples/sec Loss 9.4654 LearningRate 0.0485 Epoch: 6 Global Step: 252090 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-14 00:03:46,562-Speed 2626.50 samples/sec Loss 9.5197 LearningRate 0.0485 Epoch: 6 Global Step: 252100 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-14 00:03:50,467-Speed 2623.44 samples/sec Loss 9.5352 LearningRate 0.0485 Epoch: 6 Global Step: 252110 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:03:54,371-Speed 2622.84 samples/sec Loss 9.5479 LearningRate 0.0485 Epoch: 6 Global Step: 252120 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:03:58,271-Speed 2626.61 samples/sec Loss 9.5011 LearningRate 0.0485 Epoch: 6 Global Step: 252130 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:04:02,162-Speed 2632.56 samples/sec Loss 9.4276 LearningRate 0.0485 Epoch: 6 Global Step: 252140 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:04:06,065-Speed 2624.61 samples/sec Loss 9.5564 LearningRate 0.0484 Epoch: 6 Global Step: 252150 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:04:09,968-Speed 2624.28 samples/sec Loss 9.4295 LearningRate 0.0484 Epoch: 6 Global Step: 252160 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:04:13,875-Speed 2621.33 samples/sec Loss 9.5223 LearningRate 0.0484 Epoch: 6 Global Step: 252170 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:04:17,769-Speed 2630.11 samples/sec Loss 9.3926 LearningRate 0.0484 Epoch: 6 Global Step: 252180 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:04:21,660-Speed 2632.57 samples/sec Loss 9.4471 LearningRate 0.0484 Epoch: 6 Global Step: 252190 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:04:25,554-Speed 2630.47 samples/sec Loss 9.4198 LearningRate 0.0484 Epoch: 6 Global Step: 252200 Fp16 Grad Scale: 16384 Required: 65 hours
Training: 2022-04-14 00:04:29,445-Speed 2632.08 samples/sec Loss 9.3389 LearningRate 0.0484 Epoch: 6 Global Step: 252210 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:04:33,335-Speed 2633.12 samples/sec Loss 9.4040 LearningRate 0.0484 Epoch: 6 Global Step: 252220 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:04:37,229-Speed 2630.48 samples/sec Loss 9.4637 LearningRate 0.0484 Epoch: 6 Global Step: 252230 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:04:41,120-Speed 2632.51 samples/sec Loss 9.4646 LearningRate 0.0484 Epoch: 6 Global Step: 252240 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:04:45,020-Speed 2626.50 samples/sec Loss 9.4161 LearningRate 0.0484 Epoch: 6 Global Step: 252250 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:04:48,924-Speed 2623.87 samples/sec Loss 9.5107 LearningRate 0.0484 Epoch: 6 Global Step: 252260 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:04:52,838-Speed 2616.71 samples/sec Loss 9.4835 LearningRate 0.0484 Epoch: 6 Global Step: 252270 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:04:56,740-Speed 2625.49 samples/sec Loss 9.3066 LearningRate 0.0484 Epoch: 6 Global Step: 252280 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:05:00,638-Speed 2627.11 samples/sec Loss 9.3419 LearningRate 0.0484 Epoch: 6 Global Step: 252290 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:05:04,539-Speed 2625.57 samples/sec Loss 9.4849 LearningRate 0.0484 Epoch: 6 Global Step: 252300 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:05:08,435-Speed 2628.64 samples/sec Loss 9.4173 LearningRate 0.0484 Epoch: 6 Global Step: 252310 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:05:12,334-Speed 2627.26 samples/sec Loss 9.4621 LearningRate 0.0484 Epoch: 6 Global Step: 252320 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:05:16,235-Speed 2625.73 samples/sec Loss 9.4722 LearningRate 0.0484 Epoch: 6 Global Step: 252330 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:05:20,130-Speed 2629.97 samples/sec Loss 9.4955 LearningRate 0.0484 Epoch: 6 Global Step: 252340 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:05:24,021-Speed 2632.37 samples/sec Loss 9.5071 LearningRate 0.0484 Epoch: 6 Global Step: 252350 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:05:27,917-Speed 2629.23 samples/sec Loss 9.4428 LearningRate 0.0484 Epoch: 6 Global Step: 252360 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:05:31,807-Speed 2633.12 samples/sec Loss 9.3581 LearningRate 0.0484 Epoch: 6 Global Step: 252370 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:05:35,719-Speed 2618.02 samples/sec Loss 9.4784 LearningRate 0.0484 Epoch: 6 Global Step: 252380 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:05:39,611-Speed 2631.92 samples/sec Loss 9.3502 LearningRate 0.0484 Epoch: 6 Global Step: 252390 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:05:43,518-Speed 2621.49 samples/sec Loss 9.5433 LearningRate 0.0484 Epoch: 6 Global Step: 252400 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:05:47,435-Speed 2615.24 samples/sec Loss 9.4299 LearningRate 0.0484 Epoch: 6 Global Step: 252410 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:05:51,328-Speed 2631.24 samples/sec Loss 9.4332 LearningRate 0.0484 Epoch: 6 Global Step: 252420 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:05:55,227-Speed 2626.88 samples/sec Loss 9.3745 LearningRate 0.0484 Epoch: 6 Global Step: 252430 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:05:59,142-Speed 2616.56 samples/sec Loss 9.3318 LearningRate 0.0484 Epoch: 6 Global Step: 252440 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:06:03,025-Speed 2637.28 samples/sec Loss 9.3531 LearningRate 0.0484 Epoch: 6 Global Step: 252450 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:06:06,943-Speed 2614.04 samples/sec Loss 9.5271 LearningRate 0.0484 Epoch: 6 Global Step: 252460 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:06:10,869-Speed 2609.30 samples/sec Loss 9.4452 LearningRate 0.0484 Epoch: 6 Global Step: 252470 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:06:14,777-Speed 2621.01 samples/sec Loss 9.3570 LearningRate 0.0484 Epoch: 6 Global Step: 252480 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:06:18,679-Speed 2625.24 samples/sec Loss 9.4265 LearningRate 0.0484 Epoch: 6 Global Step: 252490 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:06:22,589-Speed 2619.70 samples/sec Loss 9.5435 LearningRate 0.0484 Epoch: 6 Global Step: 252500 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:06:26,957-Speed 2344.58 samples/sec Loss 9.5941 LearningRate 0.0484 Epoch: 6 Global Step: 252510 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:06:30,859-Speed 2625.48 samples/sec Loss 9.3541 LearningRate 0.0484 Epoch: 6 Global Step: 252520 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:06:34,755-Speed 2628.45 samples/sec Loss 9.4209 LearningRate 0.0484 Epoch: 6 Global Step: 252530 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:06:38,651-Speed 2629.22 samples/sec Loss 9.4453 LearningRate 0.0484 Epoch: 6 Global Step: 252540 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:06:42,574-Speed 2610.78 samples/sec Loss 9.4179 LearningRate 0.0484 Epoch: 6 Global Step: 252550 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:06:46,471-Speed 2629.29 samples/sec Loss 9.4406 LearningRate 0.0484 Epoch: 6 Global Step: 252560 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:06:50,374-Speed 2624.03 samples/sec Loss 9.5032 LearningRate 0.0484 Epoch: 6 Global Step: 252570 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:06:54,306-Speed 2605.29 samples/sec Loss 9.4537 LearningRate 0.0484 Epoch: 6 Global Step: 252580 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:06:58,206-Speed 2625.95 samples/sec Loss 9.3712 LearningRate 0.0484 Epoch: 6 Global Step: 252590 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:07:02,110-Speed 2624.28 samples/sec Loss 9.3186 LearningRate 0.0484 Epoch: 6 Global Step: 252600 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:07:06,013-Speed 2624.34 samples/sec Loss 9.4636 LearningRate 0.0484 Epoch: 6 Global Step: 252610 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:07:09,908-Speed 2629.34 samples/sec Loss 9.4142 LearningRate 0.0484 Epoch: 6 Global Step: 252620 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:07:13,803-Speed 2629.27 samples/sec Loss 9.3145 LearningRate 0.0484 Epoch: 6 Global Step: 252630 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:07:17,703-Speed 2626.43 samples/sec Loss 9.3416 LearningRate 0.0484 Epoch: 6 Global Step: 252640 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:07:21,608-Speed 2623.07 samples/sec Loss 9.4111 LearningRate 0.0484 Epoch: 6 Global Step: 252650 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:07:25,524-Speed 2615.61 samples/sec Loss 9.4148 LearningRate 0.0484 Epoch: 6 Global Step: 252660 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:07:29,430-Speed 2622.49 samples/sec Loss 9.3465 LearningRate 0.0484 Epoch: 6 Global Step: 252670 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:07:33,337-Speed 2621.69 samples/sec Loss 9.3383 LearningRate 0.0484 Epoch: 6 Global Step: 252680 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:07:37,235-Speed 2627.15 samples/sec Loss 9.2627 LearningRate 0.0484 Epoch: 6 Global Step: 252690 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:07:41,124-Speed 2633.71 samples/sec Loss 9.3866 LearningRate 0.0484 Epoch: 6 Global Step: 252700 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:07:45,017-Speed 2631.12 samples/sec Loss 9.4513 LearningRate 0.0484 Epoch: 6 Global Step: 252710 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:07:49,088-Speed 2515.98 samples/sec Loss 9.3831 LearningRate 0.0484 Epoch: 6 Global Step: 252720 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:07:53,139-Speed 2528.30 samples/sec Loss 9.3708 LearningRate 0.0484 Epoch: 6 Global Step: 252730 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:07:57,028-Speed 2633.52 samples/sec Loss 9.5279 LearningRate 0.0483 Epoch: 6 Global Step: 252740 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:08:00,931-Speed 2624.24 samples/sec Loss 9.4012 LearningRate 0.0483 Epoch: 6 Global Step: 252750 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:08:04,824-Speed 2631.06 samples/sec Loss 9.3135 LearningRate 0.0483 Epoch: 6 Global Step: 252760 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:08:08,721-Speed 2627.97 samples/sec Loss 9.2369 LearningRate 0.0483 Epoch: 6 Global Step: 252770 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:08:12,622-Speed 2625.50 samples/sec Loss 9.4375 LearningRate 0.0483 Epoch: 6 Global Step: 252780 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:08:16,554-Speed 2605.49 samples/sec Loss 9.2156 LearningRate 0.0483 Epoch: 6 Global Step: 252790 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:08:20,457-Speed 2623.85 samples/sec Loss 9.2856 LearningRate 0.0483 Epoch: 6 Global Step: 252800 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:08:24,327-Speed 2646.43 samples/sec Loss 9.3443 LearningRate 0.0483 Epoch: 6 Global Step: 252810 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:08:28,220-Speed 2631.55 samples/sec Loss 9.3690 LearningRate 0.0483 Epoch: 6 Global Step: 252820 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:08:32,095-Speed 2642.90 samples/sec Loss 9.4245 LearningRate 0.0483 Epoch: 6 Global Step: 252830 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:08:35,995-Speed 2626.88 samples/sec Loss 9.4008 LearningRate 0.0483 Epoch: 6 Global Step: 252840 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:08:39,886-Speed 2631.85 samples/sec Loss 9.3114 LearningRate 0.0483 Epoch: 6 Global Step: 252850 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:08:43,779-Speed 2631.16 samples/sec Loss 9.4414 LearningRate 0.0483 Epoch: 6 Global Step: 252860 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:08:47,684-Speed 2622.55 samples/sec Loss 9.2895 LearningRate 0.0483 Epoch: 6 Global Step: 252870 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:08:51,575-Speed 2632.90 samples/sec Loss 9.4244 LearningRate 0.0483 Epoch: 6 Global Step: 252880 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:08:55,461-Speed 2635.31 samples/sec Loss 9.3397 LearningRate 0.0483 Epoch: 6 Global Step: 252890 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:08:59,403-Speed 2598.47 samples/sec Loss 9.3726 LearningRate 0.0483 Epoch: 6 Global Step: 252900 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:09:03,302-Speed 2626.77 samples/sec Loss 9.2781 LearningRate 0.0483 Epoch: 6 Global Step: 252910 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:09:07,192-Speed 2633.48 samples/sec Loss 9.4458 LearningRate 0.0483 Epoch: 6 Global Step: 252920 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:09:11,093-Speed 2625.39 samples/sec Loss 9.4163 LearningRate 0.0483 Epoch: 6 Global Step: 252930 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:09:15,004-Speed 2618.92 samples/sec Loss 9.4878 LearningRate 0.0483 Epoch: 6 Global Step: 252940 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:09:18,898-Speed 2629.81 samples/sec Loss 9.3606 LearningRate 0.0483 Epoch: 6 Global Step: 252950 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:09:22,790-Speed 2632.02 samples/sec Loss 9.4504 LearningRate 0.0483 Epoch: 6 Global Step: 252960 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:09:26,682-Speed 2632.18 samples/sec Loss 9.4150 LearningRate 0.0483 Epoch: 6 Global Step: 252970 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:09:30,578-Speed 2628.84 samples/sec Loss 9.3797 LearningRate 0.0483 Epoch: 6 Global Step: 252980 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:09:34,484-Speed 2622.74 samples/sec Loss 9.3068 LearningRate 0.0483 Epoch: 6 Global Step: 252990 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:09:38,372-Speed 2633.87 samples/sec Loss 9.3599 LearningRate 0.0483 Epoch: 6 Global Step: 253000 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:09:42,269-Speed 2628.74 samples/sec Loss 9.2946 LearningRate 0.0483 Epoch: 6 Global Step: 253010 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:09:46,162-Speed 2630.73 samples/sec Loss 9.4395 LearningRate 0.0483 Epoch: 6 Global Step: 253020 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:09:50,052-Speed 2633.24 samples/sec Loss 9.4563 LearningRate 0.0483 Epoch: 6 Global Step: 253030 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:09:53,960-Speed 2620.53 samples/sec Loss 9.3712 LearningRate 0.0483 Epoch: 6 Global Step: 253040 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:09:57,849-Speed 2633.87 samples/sec Loss 9.3822 LearningRate 0.0483 Epoch: 6 Global Step: 253050 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:10:01,738-Speed 2633.53 samples/sec Loss 9.3181 LearningRate 0.0483 Epoch: 6 Global Step: 253060 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:10:05,635-Speed 2628.98 samples/sec Loss 9.4531 LearningRate 0.0483 Epoch: 6 Global Step: 253070 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:10:09,525-Speed 2632.80 samples/sec Loss 9.4350 LearningRate 0.0483 Epoch: 6 Global Step: 253080 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:10:13,418-Speed 2631.06 samples/sec Loss 9.4511 LearningRate 0.0483 Epoch: 6 Global Step: 253090 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:10:17,317-Speed 2626.86 samples/sec Loss 9.3431 LearningRate 0.0483 Epoch: 6 Global Step: 253100 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:10:21,208-Speed 2632.15 samples/sec Loss 9.3729 LearningRate 0.0483 Epoch: 6 Global Step: 253110 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:10:25,114-Speed 2622.26 samples/sec Loss 9.3504 LearningRate 0.0483 Epoch: 6 Global Step: 253120 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:10:29,003-Speed 2633.76 samples/sec Loss 9.4420 LearningRate 0.0483 Epoch: 6 Global Step: 253130 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:10:32,893-Speed 2633.76 samples/sec Loss 9.4977 LearningRate 0.0483 Epoch: 6 Global Step: 253140 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:10:36,782-Speed 2633.69 samples/sec Loss 9.4500 LearningRate 0.0483 Epoch: 6 Global Step: 253150 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:10:40,683-Speed 2625.73 samples/sec Loss 9.3107 LearningRate 0.0483 Epoch: 6 Global Step: 253160 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:10:44,555-Speed 2644.68 samples/sec Loss 9.3187 LearningRate 0.0483 Epoch: 6 Global Step: 253170 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:10:48,476-Speed 2613.05 samples/sec Loss 9.4640 LearningRate 0.0483 Epoch: 6 Global Step: 253180 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:10:52,377-Speed 2625.57 samples/sec Loss 9.2665 LearningRate 0.0483 Epoch: 6 Global Step: 253190 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:10:56,289-Speed 2620.21 samples/sec Loss 9.4145 LearningRate 0.0483 Epoch: 6 Global Step: 253200 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:11:00,181-Speed 2631.84 samples/sec Loss 9.3432 LearningRate 0.0483 Epoch: 6 Global Step: 253210 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:11:04,074-Speed 2631.39 samples/sec Loss 9.3764 LearningRate 0.0483 Epoch: 6 Global Step: 253220 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:11:07,971-Speed 2628.18 samples/sec Loss 9.3073 LearningRate 0.0483 Epoch: 6 Global Step: 253230 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:11:11,874-Speed 2624.27 samples/sec Loss 9.4201 LearningRate 0.0483 Epoch: 6 Global Step: 253240 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:11:15,784-Speed 2619.29 samples/sec Loss 9.2494 LearningRate 0.0483 Epoch: 6 Global Step: 253250 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:11:19,675-Speed 2632.22 samples/sec Loss 9.4385 LearningRate 0.0483 Epoch: 6 Global Step: 253260 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:11:23,572-Speed 2628.74 samples/sec Loss 9.4092 LearningRate 0.0483 Epoch: 6 Global Step: 253270 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:11:27,450-Speed 2641.42 samples/sec Loss 9.4206 LearningRate 0.0483 Epoch: 6 Global Step: 253280 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:11:31,353-Speed 2624.18 samples/sec Loss 9.5244 LearningRate 0.0483 Epoch: 6 Global Step: 253290 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:11:35,251-Speed 2626.92 samples/sec Loss 9.4353 LearningRate 0.0483 Epoch: 6 Global Step: 253300 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:11:39,147-Speed 2628.83 samples/sec Loss 9.4166 LearningRate 0.0483 Epoch: 6 Global Step: 253310 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:11:43,050-Speed 2625.13 samples/sec Loss 9.3224 LearningRate 0.0483 Epoch: 6 Global Step: 253320 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:11:46,953-Speed 2624.72 samples/sec Loss 9.4245 LearningRate 0.0483 Epoch: 6 Global Step: 253330 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:11:50,849-Speed 2628.73 samples/sec Loss 9.4358 LearningRate 0.0482 Epoch: 6 Global Step: 253340 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:11:54,768-Speed 2613.76 samples/sec Loss 9.4151 LearningRate 0.0482 Epoch: 6 Global Step: 253350 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:11:58,673-Speed 2623.19 samples/sec Loss 9.3522 LearningRate 0.0482 Epoch: 6 Global Step: 253360 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:12:02,560-Speed 2634.38 samples/sec Loss 9.4009 LearningRate 0.0482 Epoch: 6 Global Step: 253370 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:12:06,453-Speed 2631.08 samples/sec Loss 9.3560 LearningRate 0.0482 Epoch: 6 Global Step: 253380 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:12:10,345-Speed 2632.24 samples/sec Loss 9.4311 LearningRate 0.0482 Epoch: 6 Global Step: 253390 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:12:14,231-Speed 2635.46 samples/sec Loss 9.3714 LearningRate 0.0482 Epoch: 6 Global Step: 253400 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:12:18,133-Speed 2625.52 samples/sec Loss 9.3641 LearningRate 0.0482 Epoch: 6 Global Step: 253410 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:12:22,039-Speed 2622.14 samples/sec Loss 9.1858 LearningRate 0.0482 Epoch: 6 Global Step: 253420 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:12:25,943-Speed 2623.82 samples/sec Loss 9.3700 LearningRate 0.0482 Epoch: 6 Global Step: 253430 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:12:29,832-Speed 2633.92 samples/sec Loss 9.2845 LearningRate 0.0482 Epoch: 6 Global Step: 253440 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:12:33,734-Speed 2624.40 samples/sec Loss 9.2886 LearningRate 0.0482 Epoch: 6 Global Step: 253450 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:12:37,624-Speed 2632.94 samples/sec Loss 9.3624 LearningRate 0.0482 Epoch: 6 Global Step: 253460 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:12:41,521-Speed 2628.58 samples/sec Loss 9.3885 LearningRate 0.0482 Epoch: 6 Global Step: 253470 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:12:45,416-Speed 2629.56 samples/sec Loss 9.5156 LearningRate 0.0482 Epoch: 6 Global Step: 253480 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:12:49,326-Speed 2620.30 samples/sec Loss 9.4809 LearningRate 0.0482 Epoch: 6 Global Step: 253490 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:12:53,226-Speed 2626.47 samples/sec Loss 9.2725 LearningRate 0.0482 Epoch: 6 Global Step: 253500 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:12:57,494-Speed 2399.97 samples/sec Loss 9.4320 LearningRate 0.0482 Epoch: 6 Global Step: 253510 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:13:01,397-Speed 2623.70 samples/sec Loss 9.4817 LearningRate 0.0482 Epoch: 6 Global Step: 253520 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:13:05,285-Speed 2634.24 samples/sec Loss 9.4958 LearningRate 0.0482 Epoch: 6 Global Step: 253530 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:13:09,180-Speed 2630.08 samples/sec Loss 9.5266 LearningRate 0.0482 Epoch: 6 Global Step: 253540 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:13:13,075-Speed 2629.92 samples/sec Loss 9.3996 LearningRate 0.0482 Epoch: 6 Global Step: 253550 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:13:16,975-Speed 2626.34 samples/sec Loss 9.5002 LearningRate 0.0482 Epoch: 6 Global Step: 253560 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:13:20,873-Speed 2627.17 samples/sec Loss 9.3431 LearningRate 0.0482 Epoch: 6 Global Step: 253570 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:13:24,783-Speed 2620.12 samples/sec Loss 9.3227 LearningRate 0.0482 Epoch: 6 Global Step: 253580 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:13:28,712-Speed 2607.08 samples/sec Loss 9.3212 LearningRate 0.0482 Epoch: 6 Global Step: 253590 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:13:32,616-Speed 2623.68 samples/sec Loss 9.2732 LearningRate 0.0482 Epoch: 6 Global Step: 253600 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:13:36,511-Speed 2629.35 samples/sec Loss 9.4549 LearningRate 0.0482 Epoch: 6 Global Step: 253610 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:13:40,380-Speed 2647.15 samples/sec Loss 9.4954 LearningRate 0.0482 Epoch: 6 Global Step: 253620 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:13:44,275-Speed 2629.23 samples/sec Loss 9.3920 LearningRate 0.0482 Epoch: 6 Global Step: 253630 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:13:48,171-Speed 2629.56 samples/sec Loss 9.3857 LearningRate 0.0482 Epoch: 6 Global Step: 253640 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:13:52,065-Speed 2630.22 samples/sec Loss 9.4604 LearningRate 0.0482 Epoch: 6 Global Step: 253650 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:13:55,926-Speed 2654.11 samples/sec Loss 9.3340 LearningRate 0.0482 Epoch: 6 Global Step: 253660 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:13:59,818-Speed 2631.82 samples/sec Loss 9.4157 LearningRate 0.0482 Epoch: 6 Global Step: 253670 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:14:03,720-Speed 2624.95 samples/sec Loss 9.3732 LearningRate 0.0482 Epoch: 6 Global Step: 253680 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:14:07,614-Speed 2630.60 samples/sec Loss 9.3076 LearningRate 0.0482 Epoch: 6 Global Step: 253690 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:14:11,511-Speed 2628.32 samples/sec Loss 9.4626 LearningRate 0.0482 Epoch: 6 Global Step: 253700 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:14:15,412-Speed 2625.26 samples/sec Loss 9.4240 LearningRate 0.0482 Epoch: 6 Global Step: 253710 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:14:19,309-Speed 2628.11 samples/sec Loss 9.2921 LearningRate 0.0482 Epoch: 6 Global Step: 253720 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:14:23,205-Speed 2629.47 samples/sec Loss 9.3452 LearningRate 0.0482 Epoch: 6 Global Step: 253730 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:14:27,098-Speed 2630.43 samples/sec Loss 9.1973 LearningRate 0.0482 Epoch: 6 Global Step: 253740 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:14:30,998-Speed 2627.04 samples/sec Loss 9.4426 LearningRate 0.0482 Epoch: 6 Global Step: 253750 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:14:34,891-Speed 2630.96 samples/sec Loss 9.3620 LearningRate 0.0482 Epoch: 6 Global Step: 253760 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:14:38,784-Speed 2630.83 samples/sec Loss 9.3367 LearningRate 0.0482 Epoch: 6 Global Step: 253770 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:14:42,675-Speed 2631.78 samples/sec Loss 9.4745 LearningRate 0.0482 Epoch: 6 Global Step: 253780 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:14:46,572-Speed 2628.42 samples/sec Loss 9.3304 LearningRate 0.0482 Epoch: 6 Global Step: 253790 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:14:50,468-Speed 2629.16 samples/sec Loss 9.4082 LearningRate 0.0482 Epoch: 6 Global Step: 253800 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:14:54,366-Speed 2627.97 samples/sec Loss 9.3235 LearningRate 0.0482 Epoch: 6 Global Step: 253810 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:14:58,260-Speed 2630.33 samples/sec Loss 9.3663 LearningRate 0.0482 Epoch: 6 Global Step: 253820 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:15:02,168-Speed 2621.38 samples/sec Loss 9.3124 LearningRate 0.0482 Epoch: 6 Global Step: 253830 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:15:06,073-Speed 2622.71 samples/sec Loss 9.5028 LearningRate 0.0482 Epoch: 6 Global Step: 253840 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:15:09,966-Speed 2630.86 samples/sec Loss 9.4900 LearningRate 0.0482 Epoch: 6 Global Step: 253850 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:15:13,862-Speed 2629.15 samples/sec Loss 9.3643 LearningRate 0.0482 Epoch: 6 Global Step: 253860 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:15:17,781-Speed 2613.32 samples/sec Loss 9.4642 LearningRate 0.0482 Epoch: 6 Global Step: 253870 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:15:21,688-Speed 2621.80 samples/sec Loss 9.2150 LearningRate 0.0482 Epoch: 6 Global Step: 253880 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:15:25,579-Speed 2632.44 samples/sec Loss 9.3354 LearningRate 0.0482 Epoch: 6 Global Step: 253890 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:15:29,472-Speed 2630.96 samples/sec Loss 9.4717 LearningRate 0.0482 Epoch: 6 Global Step: 253900 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:15:33,365-Speed 2631.48 samples/sec Loss 9.3613 LearningRate 0.0482 Epoch: 6 Global Step: 253910 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:15:37,257-Speed 2632.07 samples/sec Loss 9.3564 LearningRate 0.0482 Epoch: 6 Global Step: 253920 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:15:41,150-Speed 2630.51 samples/sec Loss 9.4285 LearningRate 0.0482 Epoch: 6 Global Step: 253930 Fp16 Grad Scale: 32768 Required: 65 hours
Training: 2022-04-14 00:15:45,044-Speed 2630.59 samples/sec Loss 9.5535 LearningRate 0.0481 Epoch: 6 Global Step: 253940 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:15:48,938-Speed 2630.46 samples/sec Loss 9.3318 LearningRate 0.0481 Epoch: 6 Global Step: 253950 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:15:52,832-Speed 2630.31 samples/sec Loss 9.4854 LearningRate 0.0481 Epoch: 6 Global Step: 253960 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:15:56,726-Speed 2630.26 samples/sec Loss 9.1871 LearningRate 0.0481 Epoch: 6 Global Step: 253970 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:16:00,619-Speed 2631.53 samples/sec Loss 9.3354 LearningRate 0.0481 Epoch: 6 Global Step: 253980 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:16:04,510-Speed 2632.22 samples/sec Loss 9.4565 LearningRate 0.0481 Epoch: 6 Global Step: 253990 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:16:08,402-Speed 2631.54 samples/sec Loss 9.2906 LearningRate 0.0481 Epoch: 6 Global Step: 254000 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:16:12,311-Speed 2620.12 samples/sec Loss 9.4804 LearningRate 0.0481 Epoch: 6 Global Step: 254010 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:16:16,228-Speed 2614.55 samples/sec Loss 9.3428 LearningRate 0.0481 Epoch: 6 Global Step: 254020 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:16:20,149-Speed 2612.03 samples/sec Loss 9.2807 LearningRate 0.0481 Epoch: 6 Global Step: 254030 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:16:24,054-Speed 2623.23 samples/sec Loss 9.3086 LearningRate 0.0481 Epoch: 6 Global Step: 254040 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:16:27,955-Speed 2625.93 samples/sec Loss 9.5311 LearningRate 0.0481 Epoch: 6 Global Step: 254050 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:16:31,847-Speed 2631.91 samples/sec Loss 9.4259 LearningRate 0.0481 Epoch: 6 Global Step: 254060 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:16:35,737-Speed 2632.63 samples/sec Loss 9.2874 LearningRate 0.0481 Epoch: 6 Global Step: 254070 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:16:39,631-Speed 2630.36 samples/sec Loss 9.3541 LearningRate 0.0481 Epoch: 6 Global Step: 254080 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:16:43,522-Speed 2632.64 samples/sec Loss 9.5260 LearningRate 0.0481 Epoch: 6 Global Step: 254090 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:16:47,414-Speed 2631.03 samples/sec Loss 9.3094 LearningRate 0.0481 Epoch: 6 Global Step: 254100 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:16:51,306-Speed 2633.25 samples/sec Loss 9.3980 LearningRate 0.0481 Epoch: 6 Global Step: 254110 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:16:55,198-Speed 2631.55 samples/sec Loss 9.3522 LearningRate 0.0481 Epoch: 6 Global Step: 254120 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:16:59,094-Speed 2628.90 samples/sec Loss 9.3988 LearningRate 0.0481 Epoch: 6 Global Step: 254130 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:17:02,989-Speed 2629.34 samples/sec Loss 9.4884 LearningRate 0.0481 Epoch: 6 Global Step: 254140 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:17:06,898-Speed 2620.83 samples/sec Loss 9.3284 LearningRate 0.0481 Epoch: 6 Global Step: 254150 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:17:10,791-Speed 2630.80 samples/sec Loss 9.4247 LearningRate 0.0481 Epoch: 6 Global Step: 254160 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:17:14,755-Speed 2583.45 samples/sec Loss 9.4412 LearningRate 0.0481 Epoch: 6 Global Step: 254170 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:17:18,659-Speed 2623.73 samples/sec Loss 9.4746 LearningRate 0.0481 Epoch: 6 Global Step: 254180 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:17:22,552-Speed 2630.96 samples/sec Loss 9.2639 LearningRate 0.0481 Epoch: 6 Global Step: 254190 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:17:26,444-Speed 2632.38 samples/sec Loss 9.1681 LearningRate 0.0481 Epoch: 6 Global Step: 254200 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:17:30,332-Speed 2633.80 samples/sec Loss 9.3389 LearningRate 0.0481 Epoch: 6 Global Step: 254210 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:17:34,229-Speed 2628.73 samples/sec Loss 9.3739 LearningRate 0.0481 Epoch: 6 Global Step: 254220 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:17:38,120-Speed 2631.81 samples/sec Loss 9.3663 LearningRate 0.0481 Epoch: 6 Global Step: 254230 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:17:42,011-Speed 2632.20 samples/sec Loss 9.4813 LearningRate 0.0481 Epoch: 6 Global Step: 254240 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:17:45,910-Speed 2626.84 samples/sec Loss 9.5216 LearningRate 0.0481 Epoch: 6 Global Step: 254250 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:17:49,806-Speed 2629.31 samples/sec Loss 9.4011 LearningRate 0.0481 Epoch: 6 Global Step: 254260 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:17:53,703-Speed 2628.46 samples/sec Loss 9.5079 LearningRate 0.0481 Epoch: 6 Global Step: 254270 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:17:57,622-Speed 2614.01 samples/sec Loss 9.4274 LearningRate 0.0481 Epoch: 6 Global Step: 254280 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:18:01,512-Speed 2632.47 samples/sec Loss 9.3396 LearningRate 0.0481 Epoch: 6 Global Step: 254290 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:18:05,404-Speed 2631.64 samples/sec Loss 9.2364 LearningRate 0.0481 Epoch: 6 Global Step: 254300 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:18:09,298-Speed 2630.68 samples/sec Loss 9.4154 LearningRate 0.0481 Epoch: 6 Global Step: 254310 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:18:13,191-Speed 2631.09 samples/sec Loss 9.3154 LearningRate 0.0481 Epoch: 6 Global Step: 254320 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:18:17,082-Speed 2631.98 samples/sec Loss 9.4108 LearningRate 0.0481 Epoch: 6 Global Step: 254330 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:18:20,976-Speed 2630.74 samples/sec Loss 9.4294 LearningRate 0.0481 Epoch: 6 Global Step: 254340 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:18:24,868-Speed 2631.59 samples/sec Loss 9.5172 LearningRate 0.0481 Epoch: 6 Global Step: 254350 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:18:28,764-Speed 2629.16 samples/sec Loss 9.2955 LearningRate 0.0481 Epoch: 6 Global Step: 254360 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:18:32,658-Speed 2630.29 samples/sec Loss 9.4180 LearningRate 0.0481 Epoch: 6 Global Step: 254370 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:18:36,556-Speed 2627.79 samples/sec Loss 9.4184 LearningRate 0.0481 Epoch: 6 Global Step: 254380 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:18:40,440-Speed 2636.76 samples/sec Loss 9.2774 LearningRate 0.0481 Epoch: 6 Global Step: 254390 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:18:44,338-Speed 2627.53 samples/sec Loss 9.3389 LearningRate 0.0481 Epoch: 6 Global Step: 254400 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:18:48,232-Speed 2630.90 samples/sec Loss 9.3813 LearningRate 0.0481 Epoch: 6 Global Step: 254410 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:18:52,127-Speed 2628.86 samples/sec Loss 9.4033 LearningRate 0.0481 Epoch: 6 Global Step: 254420 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:18:56,104-Speed 2576.14 samples/sec Loss 9.5909 LearningRate 0.0481 Epoch: 6 Global Step: 254430 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:19:00,016-Speed 2618.08 samples/sec Loss 9.3831 LearningRate 0.0481 Epoch: 6 Global Step: 254440 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:19:03,964-Speed 2594.05 samples/sec Loss 9.3552 LearningRate 0.0481 Epoch: 6 Global Step: 254450 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:19:07,880-Speed 2615.52 samples/sec Loss 9.3644 LearningRate 0.0481 Epoch: 6 Global Step: 254460 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:19:11,785-Speed 2623.72 samples/sec Loss 9.2155 LearningRate 0.0481 Epoch: 6 Global Step: 254470 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:19:15,687-Speed 2624.91 samples/sec Loss 9.4367 LearningRate 0.0481 Epoch: 6 Global Step: 254480 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:19:19,620-Speed 2604.14 samples/sec Loss 9.3810 LearningRate 0.0481 Epoch: 6 Global Step: 254490 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:19:23,535-Speed 2616.42 samples/sec Loss 9.4112 LearningRate 0.0481 Epoch: 6 Global Step: 254500 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:19:27,499-Speed 2584.31 samples/sec Loss 9.2350 LearningRate 0.0481 Epoch: 6 Global Step: 254510 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:19:31,395-Speed 2628.42 samples/sec Loss 9.3694 LearningRate 0.0481 Epoch: 6 Global Step: 254520 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:19:35,309-Speed 2617.51 samples/sec Loss 9.3182 LearningRate 0.0481 Epoch: 6 Global Step: 254530 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:19:39,215-Speed 2621.56 samples/sec Loss 9.4574 LearningRate 0.0480 Epoch: 6 Global Step: 254540 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:19:43,108-Speed 2632.09 samples/sec Loss 9.2283 LearningRate 0.0480 Epoch: 6 Global Step: 254550 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:19:47,008-Speed 2625.79 samples/sec Loss 9.3506 LearningRate 0.0480 Epoch: 6 Global Step: 254560 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:19:50,908-Speed 2627.12 samples/sec Loss 9.3377 LearningRate 0.0480 Epoch: 6 Global Step: 254570 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:19:54,800-Speed 2631.32 samples/sec Loss 9.4233 LearningRate 0.0480 Epoch: 6 Global Step: 254580 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:19:58,695-Speed 2629.65 samples/sec Loss 9.3478 LearningRate 0.0480 Epoch: 6 Global Step: 254590 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:20:02,597-Speed 2625.46 samples/sec Loss 9.3723 LearningRate 0.0480 Epoch: 6 Global Step: 254600 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:20:06,494-Speed 2628.27 samples/sec Loss 9.3435 LearningRate 0.0480 Epoch: 6 Global Step: 254610 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:20:10,403-Speed 2620.14 samples/sec Loss 9.4181 LearningRate 0.0480 Epoch: 6 Global Step: 254620 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:20:14,305-Speed 2625.02 samples/sec Loss 9.3003 LearningRate 0.0480 Epoch: 6 Global Step: 254630 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:20:18,203-Speed 2628.10 samples/sec Loss 9.4184 LearningRate 0.0480 Epoch: 6 Global Step: 254640 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:20:22,102-Speed 2626.81 samples/sec Loss 9.3461 LearningRate 0.0480 Epoch: 6 Global Step: 254650 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:20:25,993-Speed 2632.56 samples/sec Loss 9.2933 LearningRate 0.0480 Epoch: 6 Global Step: 254660 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:20:29,887-Speed 2629.83 samples/sec Loss 9.1438 LearningRate 0.0480 Epoch: 6 Global Step: 254670 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:20:33,783-Speed 2629.69 samples/sec Loss 9.3008 LearningRate 0.0480 Epoch: 6 Global Step: 254680 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:20:37,677-Speed 2630.34 samples/sec Loss 9.3001 LearningRate 0.0480 Epoch: 6 Global Step: 254690 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:20:41,570-Speed 2630.39 samples/sec Loss 9.2207 LearningRate 0.0480 Epoch: 6 Global Step: 254700 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:20:45,475-Speed 2622.74 samples/sec Loss 9.3652 LearningRate 0.0480 Epoch: 6 Global Step: 254710 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:20:49,364-Speed 2634.08 samples/sec Loss 9.5513 LearningRate 0.0480 Epoch: 6 Global Step: 254720 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:20:53,277-Speed 2617.85 samples/sec Loss 9.3312 LearningRate 0.0480 Epoch: 6 Global Step: 254730 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:20:57,176-Speed 2626.84 samples/sec Loss 9.3915 LearningRate 0.0480 Epoch: 6 Global Step: 254740 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:21:01,068-Speed 2631.88 samples/sec Loss 9.3970 LearningRate 0.0480 Epoch: 6 Global Step: 254750 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:21:04,960-Speed 2631.67 samples/sec Loss 9.2229 LearningRate 0.0480 Epoch: 6 Global Step: 254760 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:21:08,852-Speed 2631.82 samples/sec Loss 9.4840 LearningRate 0.0480 Epoch: 6 Global Step: 254770 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:21:12,744-Speed 2632.00 samples/sec Loss 9.5189 LearningRate 0.0480 Epoch: 6 Global Step: 254780 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:21:16,640-Speed 2628.68 samples/sec Loss 9.4431 LearningRate 0.0480 Epoch: 6 Global Step: 254790 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:21:20,534-Speed 2630.24 samples/sec Loss 9.4731 LearningRate 0.0480 Epoch: 6 Global Step: 254800 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:21:24,424-Speed 2632.96 samples/sec Loss 9.4406 LearningRate 0.0480 Epoch: 6 Global Step: 254810 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:21:28,315-Speed 2632.59 samples/sec Loss 9.3865 LearningRate 0.0480 Epoch: 6 Global Step: 254820 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:21:32,207-Speed 2631.81 samples/sec Loss 9.4392 LearningRate 0.0480 Epoch: 6 Global Step: 254830 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:21:36,081-Speed 2643.81 samples/sec Loss 9.3988 LearningRate 0.0480 Epoch: 6 Global Step: 254840 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:21:39,985-Speed 2623.58 samples/sec Loss 9.3001 LearningRate 0.0480 Epoch: 6 Global Step: 254850 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:21:43,876-Speed 2632.17 samples/sec Loss 9.2859 LearningRate 0.0480 Epoch: 6 Global Step: 254860 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:21:47,802-Speed 2610.14 samples/sec Loss 9.4067 LearningRate 0.0480 Epoch: 6 Global Step: 254870 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:21:51,691-Speed 2633.59 samples/sec Loss 9.4027 LearningRate 0.0480 Epoch: 6 Global Step: 254880 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:21:55,584-Speed 2631.43 samples/sec Loss 9.2539 LearningRate 0.0480 Epoch: 6 Global Step: 254890 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:21:59,478-Speed 2630.03 samples/sec Loss 9.3397 LearningRate 0.0480 Epoch: 6 Global Step: 254900 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:22:03,378-Speed 2626.86 samples/sec Loss 9.4470 LearningRate 0.0480 Epoch: 6 Global Step: 254910 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:22:07,265-Speed 2635.14 samples/sec Loss 9.4021 LearningRate 0.0480 Epoch: 6 Global Step: 254920 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:22:11,157-Speed 2630.92 samples/sec Loss 9.2563 LearningRate 0.0480 Epoch: 6 Global Step: 254930 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:22:15,055-Speed 2627.88 samples/sec Loss 9.3652 LearningRate 0.0480 Epoch: 6 Global Step: 254940 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:22:18,954-Speed 2627.02 samples/sec Loss 9.2756 LearningRate 0.0480 Epoch: 6 Global Step: 254950 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:22:22,841-Speed 2635.68 samples/sec Loss 9.2873 LearningRate 0.0480 Epoch: 6 Global Step: 254960 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:22:26,768-Speed 2607.68 samples/sec Loss 9.3885 LearningRate 0.0480 Epoch: 6 Global Step: 254970 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:22:30,660-Speed 2631.77 samples/sec Loss 9.4474 LearningRate 0.0480 Epoch: 6 Global Step: 254980 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:22:34,552-Speed 2632.51 samples/sec Loss 9.3241 LearningRate 0.0480 Epoch: 6 Global Step: 254990 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:22:38,442-Speed 2633.17 samples/sec Loss 9.2574 LearningRate 0.0480 Epoch: 6 Global Step: 255000 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:22:42,336-Speed 2630.36 samples/sec Loss 9.3613 LearningRate 0.0480 Epoch: 6 Global Step: 255010 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:22:46,230-Speed 2629.95 samples/sec Loss 9.3216 LearningRate 0.0480 Epoch: 6 Global Step: 255020 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:22:50,126-Speed 2628.95 samples/sec Loss 9.3919 LearningRate 0.0480 Epoch: 6 Global Step: 255030 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:22:54,019-Speed 2631.65 samples/sec Loss 9.2392 LearningRate 0.0480 Epoch: 6 Global Step: 255040 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:22:57,904-Speed 2636.07 samples/sec Loss 9.4204 LearningRate 0.0480 Epoch: 6 Global Step: 255050 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:23:01,802-Speed 2627.63 samples/sec Loss 9.4325 LearningRate 0.0480 Epoch: 6 Global Step: 255060 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:23:05,695-Speed 2631.05 samples/sec Loss 9.3752 LearningRate 0.0480 Epoch: 6 Global Step: 255070 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:23:09,592-Speed 2628.10 samples/sec Loss 9.3678 LearningRate 0.0480 Epoch: 6 Global Step: 255080 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:23:13,495-Speed 2624.68 samples/sec Loss 9.5213 LearningRate 0.0480 Epoch: 6 Global Step: 255090 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:23:17,422-Speed 2608.25 samples/sec Loss 9.3958 LearningRate 0.0480 Epoch: 6 Global Step: 255100 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:23:21,320-Speed 2628.09 samples/sec Loss 9.3312 LearningRate 0.0480 Epoch: 6 Global Step: 255110 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:23:25,395-Speed 2513.38 samples/sec Loss 9.4893 LearningRate 0.0480 Epoch: 6 Global Step: 255120 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:23:29,490-Speed 2501.07 samples/sec Loss 9.2761 LearningRate 0.0479 Epoch: 6 Global Step: 255130 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:23:33,583-Speed 2502.50 samples/sec Loss 9.3233 LearningRate 0.0479 Epoch: 6 Global Step: 255140 Fp16 Grad Scale: 65536 Required: 65 hours
Training: 2022-04-14 00:23:37,643-Speed 2522.63 samples/sec Loss 9.3012 LearningRate 0.0479 Epoch: 6 Global Step: 255150 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:23:41,536-Speed 2630.68 samples/sec Loss 9.2848 LearningRate 0.0479 Epoch: 6 Global Step: 255160 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:23:45,442-Speed 2622.54 samples/sec Loss 9.3239 LearningRate 0.0479 Epoch: 6 Global Step: 255170 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:23:49,336-Speed 2630.91 samples/sec Loss 9.3636 LearningRate 0.0479 Epoch: 6 Global Step: 255180 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:23:53,228-Speed 2631.59 samples/sec Loss 9.3235 LearningRate 0.0479 Epoch: 6 Global Step: 255190 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:23:57,134-Speed 2622.95 samples/sec Loss 9.1791 LearningRate 0.0479 Epoch: 6 Global Step: 255200 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:24:01,024-Speed 2632.76 samples/sec Loss 9.2292 LearningRate 0.0479 Epoch: 6 Global Step: 255210 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:24:04,915-Speed 2632.14 samples/sec Loss 9.3869 LearningRate 0.0479 Epoch: 6 Global Step: 255220 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:24:08,808-Speed 2631.09 samples/sec Loss 9.3195 LearningRate 0.0479 Epoch: 6 Global Step: 255230 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:24:12,715-Speed 2621.75 samples/sec Loss 9.2643 LearningRate 0.0479 Epoch: 6 Global Step: 255240 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:24:16,625-Speed 2619.49 samples/sec Loss 9.4006 LearningRate 0.0479 Epoch: 6 Global Step: 255250 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:24:20,524-Speed 2627.28 samples/sec Loss 9.2903 LearningRate 0.0479 Epoch: 6 Global Step: 255260 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:24:24,479-Speed 2589.92 samples/sec Loss 9.3417 LearningRate 0.0479 Epoch: 6 Global Step: 255270 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:24:28,378-Speed 2626.74 samples/sec Loss 9.3303 LearningRate 0.0479 Epoch: 6 Global Step: 255280 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:24:32,286-Speed 2621.25 samples/sec Loss 9.3043 LearningRate 0.0479 Epoch: 6 Global Step: 255290 Fp16 Grad Scale: 262144 Required: 65 hours
Training: 2022-04-14 00:24:36,161-Speed 2643.17 samples/sec Loss 9.3658 LearningRate 0.0479 Epoch: 6 Global Step: 255300 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:24:40,056-Speed 2629.38 samples/sec Loss 9.3841 LearningRate 0.0479 Epoch: 6 Global Step: 255310 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:24:43,955-Speed 2627.01 samples/sec Loss 9.4411 LearningRate 0.0479 Epoch: 6 Global Step: 255320 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:24:47,843-Speed 2634.16 samples/sec Loss 9.2321 LearningRate 0.0479 Epoch: 6 Global Step: 255330 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:24:51,737-Speed 2630.98 samples/sec Loss 9.3488 LearningRate 0.0479 Epoch: 6 Global Step: 255340 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:24:55,632-Speed 2629.37 samples/sec Loss 9.3183 LearningRate 0.0479 Epoch: 6 Global Step: 255350 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:24:59,534-Speed 2625.33 samples/sec Loss 9.3482 LearningRate 0.0479 Epoch: 6 Global Step: 255360 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:25:03,427-Speed 2630.71 samples/sec Loss 9.2486 LearningRate 0.0479 Epoch: 6 Global Step: 255370 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:25:07,829-Speed 2326.92 samples/sec Loss 9.2527 LearningRate 0.0479 Epoch: 6 Global Step: 255380 Fp16 Grad Scale: 131072 Required: 65 hours
Training: 2022-04-14 00:25:11,730-Speed 2625.30 samples/sec Loss 9.2879 LearningRate 0.0479 Epoch: 6 Global Step: 255390 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:25:15,638-Speed 2620.74 samples/sec Loss 9.3393 LearningRate 0.0479 Epoch: 6 Global Step: 255400 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:25:19,534-Speed 2629.60 samples/sec Loss 9.2640 LearningRate 0.0479 Epoch: 6 Global Step: 255410 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:25:23,433-Speed 2626.87 samples/sec Loss 9.2852 LearningRate 0.0479 Epoch: 6 Global Step: 255420 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:25:27,342-Speed 2620.29 samples/sec Loss 9.2366 LearningRate 0.0479 Epoch: 6 Global Step: 255430 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:25:31,247-Speed 2623.27 samples/sec Loss 9.3034 LearningRate 0.0479 Epoch: 6 Global Step: 255440 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:25:35,139-Speed 2631.34 samples/sec Loss 9.4197 LearningRate 0.0479 Epoch: 6 Global Step: 255450 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:25:39,032-Speed 2631.03 samples/sec Loss 9.4302 LearningRate 0.0479 Epoch: 6 Global Step: 255460 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:25:42,983-Speed 2592.57 samples/sec Loss 9.3632 LearningRate 0.0479 Epoch: 6 Global Step: 255470 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:25:46,865-Speed 2638.87 samples/sec Loss 9.3711 LearningRate 0.0479 Epoch: 6 Global Step: 255480 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:25:50,744-Speed 2640.88 samples/sec Loss 9.2383 LearningRate 0.0479 Epoch: 6 Global Step: 255490 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:25:54,657-Speed 2617.42 samples/sec Loss 9.3260 LearningRate 0.0479 Epoch: 6 Global Step: 255500 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:25:58,554-Speed 2628.99 samples/sec Loss 9.3045 LearningRate 0.0479 Epoch: 6 Global Step: 255510 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:26:02,450-Speed 2629.05 samples/sec Loss 9.3607 LearningRate 0.0479 Epoch: 6 Global Step: 255520 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:26:06,347-Speed 2628.07 samples/sec Loss 9.3788 LearningRate 0.0479 Epoch: 6 Global Step: 255530 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:26:10,241-Speed 2630.14 samples/sec Loss 9.2419 LearningRate 0.0479 Epoch: 6 Global Step: 255540 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:26:14,134-Speed 2631.27 samples/sec Loss 9.3859 LearningRate 0.0479 Epoch: 6 Global Step: 255550 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:26:18,033-Speed 2627.05 samples/sec Loss 9.3852 LearningRate 0.0479 Epoch: 6 Global Step: 255560 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:26:21,926-Speed 2631.23 samples/sec Loss 9.2674 LearningRate 0.0479 Epoch: 6 Global Step: 255570 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:26:25,852-Speed 2608.62 samples/sec Loss 9.3234 LearningRate 0.0479 Epoch: 6 Global Step: 255580 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:26:29,741-Speed 2633.89 samples/sec Loss 9.2325 LearningRate 0.0479 Epoch: 6 Global Step: 255590 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:26:33,618-Speed 2642.40 samples/sec Loss 9.3203 LearningRate 0.0479 Epoch: 6 Global Step: 255600 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:26:37,506-Speed 2633.91 samples/sec Loss 9.3882 LearningRate 0.0479 Epoch: 6 Global Step: 255610 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:26:41,399-Speed 2631.31 samples/sec Loss 9.3583 LearningRate 0.0479 Epoch: 6 Global Step: 255620 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:26:45,292-Speed 2631.36 samples/sec Loss 9.3957 LearningRate 0.0479 Epoch: 6 Global Step: 255630 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:26:49,198-Speed 2622.01 samples/sec Loss 9.3278 LearningRate 0.0479 Epoch: 6 Global Step: 255640 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:26:53,086-Speed 2633.87 samples/sec Loss 9.3688 LearningRate 0.0479 Epoch: 6 Global Step: 255650 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:26:56,981-Speed 2630.17 samples/sec Loss 9.3843 LearningRate 0.0479 Epoch: 6 Global Step: 255660 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:27:00,881-Speed 2626.01 samples/sec Loss 9.1826 LearningRate 0.0479 Epoch: 6 Global Step: 255670 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:27:04,779-Speed 2627.57 samples/sec Loss 9.2576 LearningRate 0.0479 Epoch: 6 Global Step: 255680 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:27:08,694-Speed 2616.56 samples/sec Loss 9.3502 LearningRate 0.0479 Epoch: 6 Global Step: 255690 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:27:12,601-Speed 2621.83 samples/sec Loss 9.2874 LearningRate 0.0479 Epoch: 6 Global Step: 255700 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:27:16,494-Speed 2630.28 samples/sec Loss 9.1917 LearningRate 0.0479 Epoch: 6 Global Step: 255710 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:27:20,367-Speed 2644.56 samples/sec Loss 9.3818 LearningRate 0.0479 Epoch: 6 Global Step: 255720 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:27:24,265-Speed 2627.22 samples/sec Loss 9.3183 LearningRate 0.0478 Epoch: 6 Global Step: 255730 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:27:28,167-Speed 2625.16 samples/sec Loss 9.4427 LearningRate 0.0478 Epoch: 6 Global Step: 255740 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:27:32,063-Speed 2629.16 samples/sec Loss 9.3313 LearningRate 0.0478 Epoch: 6 Global Step: 255750 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:27:35,969-Speed 2621.96 samples/sec Loss 9.2427 LearningRate 0.0478 Epoch: 6 Global Step: 255760 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:27:39,866-Speed 2628.65 samples/sec Loss 9.4379 LearningRate 0.0478 Epoch: 6 Global Step: 255770 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:27:43,755-Speed 2633.87 samples/sec Loss 9.3683 LearningRate 0.0478 Epoch: 6 Global Step: 255780 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:27:47,662-Speed 2621.19 samples/sec Loss 9.4403 LearningRate 0.0478 Epoch: 6 Global Step: 255790 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:27:51,553-Speed 2632.00 samples/sec Loss 9.3348 LearningRate 0.0478 Epoch: 6 Global Step: 255800 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:27:55,420-Speed 2649.08 samples/sec Loss 9.2580 LearningRate 0.0478 Epoch: 6 Global Step: 255810 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:27:59,311-Speed 2632.11 samples/sec Loss 9.2096 LearningRate 0.0478 Epoch: 6 Global Step: 255820 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:28:03,199-Speed 2634.11 samples/sec Loss 9.3387 LearningRate 0.0478 Epoch: 6 Global Step: 255830 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:28:07,088-Speed 2633.91 samples/sec Loss 9.2939 LearningRate 0.0478 Epoch: 6 Global Step: 255840 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:28:10,986-Speed 2627.28 samples/sec Loss 9.3893 LearningRate 0.0478 Epoch: 6 Global Step: 255850 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:28:14,873-Speed 2635.25 samples/sec Loss 9.3125 LearningRate 0.0478 Epoch: 6 Global Step: 255860 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:28:18,766-Speed 2631.05 samples/sec Loss 9.4206 LearningRate 0.0478 Epoch: 6 Global Step: 255870 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:28:22,654-Speed 2634.15 samples/sec Loss 9.3570 LearningRate 0.0478 Epoch: 6 Global Step: 255880 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:28:26,542-Speed 2634.44 samples/sec Loss 9.2210 LearningRate 0.0478 Epoch: 6 Global Step: 255890 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:28:30,435-Speed 2630.91 samples/sec Loss 9.2343 LearningRate 0.0478 Epoch: 6 Global Step: 255900 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:28:34,328-Speed 2631.22 samples/sec Loss 9.1719 LearningRate 0.0478 Epoch: 6 Global Step: 255910 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:28:38,234-Speed 2621.72 samples/sec Loss 9.2611 LearningRate 0.0478 Epoch: 6 Global Step: 255920 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:28:42,133-Speed 2626.77 samples/sec Loss 9.2826 LearningRate 0.0478 Epoch: 6 Global Step: 255930 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:28:46,027-Speed 2630.15 samples/sec Loss 9.2198 LearningRate 0.0478 Epoch: 6 Global Step: 255940 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:28:49,958-Speed 2605.66 samples/sec Loss 9.3051 LearningRate 0.0478 Epoch: 6 Global Step: 255950 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:28:53,868-Speed 2619.95 samples/sec Loss 9.3085 LearningRate 0.0478 Epoch: 6 Global Step: 255960 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:28:57,763-Speed 2629.43 samples/sec Loss 9.4268 LearningRate 0.0478 Epoch: 6 Global Step: 255970 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:29:01,652-Speed 2634.08 samples/sec Loss 9.2555 LearningRate 0.0478 Epoch: 6 Global Step: 255980 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:29:05,542-Speed 2632.51 samples/sec Loss 9.1527 LearningRate 0.0478 Epoch: 6 Global Step: 255990 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:29:09,441-Speed 2627.29 samples/sec Loss 9.4246 LearningRate 0.0478 Epoch: 6 Global Step: 256000 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:29:13,335-Speed 2630.27 samples/sec Loss 9.3888 LearningRate 0.0478 Epoch: 6 Global Step: 256010 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:29:17,214-Speed 2639.98 samples/sec Loss 9.2636 LearningRate 0.0478 Epoch: 6 Global Step: 256020 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:29:21,106-Speed 2631.52 samples/sec Loss 9.2981 LearningRate 0.0478 Epoch: 6 Global Step: 256030 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:29:24,996-Speed 2633.43 samples/sec Loss 9.3076 LearningRate 0.0478 Epoch: 6 Global Step: 256040 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:29:28,887-Speed 2631.93 samples/sec Loss 9.2668 LearningRate 0.0478 Epoch: 6 Global Step: 256050 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:29:32,780-Speed 2631.38 samples/sec Loss 9.2631 LearningRate 0.0478 Epoch: 6 Global Step: 256060 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:29:36,669-Speed 2633.79 samples/sec Loss 9.3412 LearningRate 0.0478 Epoch: 6 Global Step: 256070 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:29:40,559-Speed 2632.46 samples/sec Loss 9.3459 LearningRate 0.0478 Epoch: 6 Global Step: 256080 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:29:44,478-Speed 2614.12 samples/sec Loss 9.2466 LearningRate 0.0478 Epoch: 6 Global Step: 256090 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:29:48,373-Speed 2629.36 samples/sec Loss 9.4224 LearningRate 0.0478 Epoch: 6 Global Step: 256100 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:29:52,269-Speed 2628.80 samples/sec Loss 9.4310 LearningRate 0.0478 Epoch: 6 Global Step: 256110 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:29:56,172-Speed 2624.37 samples/sec Loss 9.4097 LearningRate 0.0478 Epoch: 6 Global Step: 256120 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:30:00,065-Speed 2630.94 samples/sec Loss 9.3323 LearningRate 0.0478 Epoch: 6 Global Step: 256130 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:30:03,959-Speed 2630.52 samples/sec Loss 9.2956 LearningRate 0.0478 Epoch: 6 Global Step: 256140 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:30:07,854-Speed 2629.56 samples/sec Loss 9.2999 LearningRate 0.0478 Epoch: 6 Global Step: 256150 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:30:11,757-Speed 2624.24 samples/sec Loss 9.2648 LearningRate 0.0478 Epoch: 6 Global Step: 256160 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:30:15,665-Speed 2620.85 samples/sec Loss 9.4173 LearningRate 0.0478 Epoch: 6 Global Step: 256170 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:30:19,563-Speed 2627.55 samples/sec Loss 9.3289 LearningRate 0.0478 Epoch: 6 Global Step: 256180 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:30:23,433-Speed 2646.58 samples/sec Loss 9.6394 LearningRate 0.0478 Epoch: 6 Global Step: 256190 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:30:27,324-Speed 2632.61 samples/sec Loss 10.2517 LearningRate 0.0478 Epoch: 6 Global Step: 256200 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:30:31,215-Speed 2632.29 samples/sec Loss 9.6302 LearningRate 0.0478 Epoch: 6 Global Step: 256210 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:30:35,113-Speed 2627.47 samples/sec Loss 9.4730 LearningRate 0.0478 Epoch: 6 Global Step: 256220 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:30:39,009-Speed 2629.27 samples/sec Loss 9.4542 LearningRate 0.0478 Epoch: 6 Global Step: 256230 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:30:42,909-Speed 2626.40 samples/sec Loss 9.4651 LearningRate 0.0478 Epoch: 6 Global Step: 256240 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:30:46,808-Speed 2627.18 samples/sec Loss 9.4570 LearningRate 0.0478 Epoch: 6 Global Step: 256250 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:30:50,702-Speed 2630.00 samples/sec Loss 9.4355 LearningRate 0.0478 Epoch: 6 Global Step: 256260 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:30:54,596-Speed 2630.14 samples/sec Loss 9.3151 LearningRate 0.0478 Epoch: 6 Global Step: 256270 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:30:58,510-Speed 2617.56 samples/sec Loss 9.5244 LearningRate 0.0478 Epoch: 6 Global Step: 256280 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:31:02,413-Speed 2624.48 samples/sec Loss 9.3561 LearningRate 0.0478 Epoch: 6 Global Step: 256290 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:31:06,305-Speed 2631.20 samples/sec Loss 9.2891 LearningRate 0.0478 Epoch: 6 Global Step: 256300 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:31:10,199-Speed 2630.42 samples/sec Loss 9.3955 LearningRate 0.0478 Epoch: 6 Global Step: 256310 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:31:14,097-Speed 2627.20 samples/sec Loss 9.3477 LearningRate 0.0478 Epoch: 6 Global Step: 256320 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:31:18,056-Speed 2587.27 samples/sec Loss 9.2997 LearningRate 0.0477 Epoch: 6 Global Step: 256330 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:31:21,974-Speed 2614.48 samples/sec Loss 9.3726 LearningRate 0.0477 Epoch: 6 Global Step: 256340 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:31:25,865-Speed 2632.75 samples/sec Loss 9.3744 LearningRate 0.0477 Epoch: 6 Global Step: 256350 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:31:29,766-Speed 2625.51 samples/sec Loss 9.4466 LearningRate 0.0477 Epoch: 6 Global Step: 256360 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:31:33,656-Speed 2633.24 samples/sec Loss 9.2423 LearningRate 0.0477 Epoch: 6 Global Step: 256370 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:31:37,546-Speed 2632.78 samples/sec Loss 9.2881 LearningRate 0.0477 Epoch: 6 Global Step: 256380 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:31:41,437-Speed 2632.60 samples/sec Loss 9.3142 LearningRate 0.0477 Epoch: 6 Global Step: 256390 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:31:45,331-Speed 2629.90 samples/sec Loss 9.2245 LearningRate 0.0477 Epoch: 6 Global Step: 256400 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:31:49,226-Speed 2630.23 samples/sec Loss 9.4184 LearningRate 0.0477 Epoch: 6 Global Step: 256410 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:31:53,114-Speed 2634.27 samples/sec Loss 9.2702 LearningRate 0.0477 Epoch: 6 Global Step: 256420 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:31:57,005-Speed 2632.86 samples/sec Loss 9.4496 LearningRate 0.0477 Epoch: 6 Global Step: 256430 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:32:00,896-Speed 2631.91 samples/sec Loss 9.2843 LearningRate 0.0477 Epoch: 6 Global Step: 256440 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:32:04,791-Speed 2629.41 samples/sec Loss 9.2783 LearningRate 0.0477 Epoch: 6 Global Step: 256450 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:32:08,689-Speed 2627.38 samples/sec Loss 9.2719 LearningRate 0.0477 Epoch: 6 Global Step: 256460 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:32:12,584-Speed 2630.30 samples/sec Loss 9.1924 LearningRate 0.0477 Epoch: 6 Global Step: 256470 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:32:16,488-Speed 2622.81 samples/sec Loss 9.3790 LearningRate 0.0477 Epoch: 6 Global Step: 256480 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:32:20,388-Speed 2627.37 samples/sec Loss 9.4057 LearningRate 0.0477 Epoch: 6 Global Step: 256490 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:32:24,278-Speed 2632.80 samples/sec Loss 9.4259 LearningRate 0.0477 Epoch: 6 Global Step: 256500 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:32:28,279-Speed 2560.28 samples/sec Loss 9.3610 LearningRate 0.0477 Epoch: 6 Global Step: 256510 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:32:32,261-Speed 2571.94 samples/sec Loss 9.5499 LearningRate 0.0477 Epoch: 6 Global Step: 256520 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:32:36,152-Speed 2632.95 samples/sec Loss 9.3627 LearningRate 0.0477 Epoch: 6 Global Step: 256530 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:32:40,043-Speed 2631.96 samples/sec Loss 9.3386 LearningRate 0.0477 Epoch: 6 Global Step: 256540 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:32:43,957-Speed 2617.01 samples/sec Loss 9.2978 LearningRate 0.0477 Epoch: 6 Global Step: 256550 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:32:47,859-Speed 2625.44 samples/sec Loss 9.3443 LearningRate 0.0477 Epoch: 6 Global Step: 256560 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:32:51,749-Speed 2632.72 samples/sec Loss 9.1997 LearningRate 0.0477 Epoch: 6 Global Step: 256570 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:32:55,697-Speed 2593.93 samples/sec Loss 9.3028 LearningRate 0.0477 Epoch: 6 Global Step: 256580 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:32:59,611-Speed 2617.55 samples/sec Loss 9.2852 LearningRate 0.0477 Epoch: 6 Global Step: 256590 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:33:03,505-Speed 2629.97 samples/sec Loss 9.3353 LearningRate 0.0477 Epoch: 6 Global Step: 256600 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:33:07,392-Speed 2635.06 samples/sec Loss 9.2139 LearningRate 0.0477 Epoch: 6 Global Step: 256610 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:33:11,287-Speed 2629.97 samples/sec Loss 9.2390 LearningRate 0.0477 Epoch: 6 Global Step: 256620 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:33:15,182-Speed 2630.65 samples/sec Loss 9.3686 LearningRate 0.0477 Epoch: 6 Global Step: 256630 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:33:19,072-Speed 2632.93 samples/sec Loss 9.1924 LearningRate 0.0477 Epoch: 6 Global Step: 256640 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:33:22,999-Speed 2607.76 samples/sec Loss 9.3151 LearningRate 0.0477 Epoch: 6 Global Step: 256650 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:33:26,894-Speed 2629.76 samples/sec Loss 9.2622 LearningRate 0.0477 Epoch: 6 Global Step: 256660 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:33:30,785-Speed 2632.85 samples/sec Loss 9.4232 LearningRate 0.0477 Epoch: 6 Global Step: 256670 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:33:34,701-Speed 2615.87 samples/sec Loss 9.3352 LearningRate 0.0477 Epoch: 6 Global Step: 256680 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:33:38,597-Speed 2628.81 samples/sec Loss 9.3637 LearningRate 0.0477 Epoch: 6 Global Step: 256690 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:33:42,470-Speed 2644.69 samples/sec Loss 9.2595 LearningRate 0.0477 Epoch: 6 Global Step: 256700 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:33:46,367-Speed 2628.41 samples/sec Loss 9.3306 LearningRate 0.0477 Epoch: 6 Global Step: 256710 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:33:50,253-Speed 2635.55 samples/sec Loss 9.2974 LearningRate 0.0477 Epoch: 6 Global Step: 256720 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:33:54,144-Speed 2632.89 samples/sec Loss 9.3492 LearningRate 0.0477 Epoch: 6 Global Step: 256730 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:33:58,035-Speed 2633.02 samples/sec Loss 9.3463 LearningRate 0.0477 Epoch: 6 Global Step: 256740 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:34:01,925-Speed 2632.65 samples/sec Loss 9.2700 LearningRate 0.0477 Epoch: 6 Global Step: 256750 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:34:05,820-Speed 2629.95 samples/sec Loss 9.3756 LearningRate 0.0477 Epoch: 6 Global Step: 256760 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:34:09,726-Speed 2622.65 samples/sec Loss 9.3795 LearningRate 0.0477 Epoch: 6 Global Step: 256770 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:34:13,619-Speed 2630.87 samples/sec Loss 9.4077 LearningRate 0.0477 Epoch: 6 Global Step: 256780 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:34:17,609-Speed 2566.89 samples/sec Loss 9.4251 LearningRate 0.0477 Epoch: 6 Global Step: 256790 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:34:21,588-Speed 2573.66 samples/sec Loss 9.3112 LearningRate 0.0477 Epoch: 6 Global Step: 256800 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:34:25,680-Speed 2503.46 samples/sec Loss 9.3031 LearningRate 0.0477 Epoch: 6 Global Step: 256810 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:34:29,781-Speed 2497.34 samples/sec Loss 9.3439 LearningRate 0.0477 Epoch: 6 Global Step: 256820 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:34:33,877-Speed 2500.37 samples/sec Loss 9.3793 LearningRate 0.0477 Epoch: 6 Global Step: 256830 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:34:37,968-Speed 2503.92 samples/sec Loss 9.3710 LearningRate 0.0477 Epoch: 6 Global Step: 256840 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:34:42,109-Speed 2473.18 samples/sec Loss 9.2353 LearningRate 0.0477 Epoch: 6 Global Step: 256850 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:34:46,080-Speed 2579.76 samples/sec Loss 9.2402 LearningRate 0.0477 Epoch: 6 Global Step: 256860 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:34:49,982-Speed 2624.62 samples/sec Loss 9.2874 LearningRate 0.0477 Epoch: 6 Global Step: 256870 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:34:53,875-Speed 2631.22 samples/sec Loss 9.2494 LearningRate 0.0477 Epoch: 6 Global Step: 256880 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:34:57,793-Speed 2613.86 samples/sec Loss 9.3731 LearningRate 0.0477 Epoch: 6 Global Step: 256890 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:35:01,684-Speed 2632.31 samples/sec Loss 9.2457 LearningRate 0.0477 Epoch: 6 Global Step: 256900 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:35:05,593-Speed 2620.14 samples/sec Loss 9.2212 LearningRate 0.0477 Epoch: 6 Global Step: 256910 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:35:09,479-Speed 2635.90 samples/sec Loss 9.2917 LearningRate 0.0477 Epoch: 6 Global Step: 256920 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:35:13,362-Speed 2637.71 samples/sec Loss 9.2781 LearningRate 0.0476 Epoch: 6 Global Step: 256930 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:35:17,252-Speed 2633.10 samples/sec Loss 9.3387 LearningRate 0.0476 Epoch: 6 Global Step: 256940 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:35:21,146-Speed 2630.45 samples/sec Loss 9.3964 LearningRate 0.0476 Epoch: 6 Global Step: 256950 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:35:25,038-Speed 2631.67 samples/sec Loss 9.4241 LearningRate 0.0476 Epoch: 6 Global Step: 256960 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:35:28,946-Speed 2621.14 samples/sec Loss 9.4986 LearningRate 0.0476 Epoch: 6 Global Step: 256970 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:35:33,384-Speed 2307.76 samples/sec Loss 9.3272 LearningRate 0.0476 Epoch: 6 Global Step: 256980 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:35:37,289-Speed 2622.97 samples/sec Loss 9.3493 LearningRate 0.0476 Epoch: 6 Global Step: 256990 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:35:41,181-Speed 2631.99 samples/sec Loss 9.4710 LearningRate 0.0476 Epoch: 6 Global Step: 257000 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:35:45,074-Speed 2630.94 samples/sec Loss 9.4081 LearningRate 0.0476 Epoch: 6 Global Step: 257010 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:35:48,966-Speed 2631.57 samples/sec Loss 9.2253 LearningRate 0.0476 Epoch: 6 Global Step: 257020 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:35:52,860-Speed 2630.12 samples/sec Loss 9.2520 LearningRate 0.0476 Epoch: 6 Global Step: 257030 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:35:56,755-Speed 2630.08 samples/sec Loss 9.2456 LearningRate 0.0476 Epoch: 6 Global Step: 257040 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:36:00,642-Speed 2635.06 samples/sec Loss 9.3387 LearningRate 0.0476 Epoch: 6 Global Step: 257050 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:36:04,538-Speed 2629.00 samples/sec Loss 9.3387 LearningRate 0.0476 Epoch: 6 Global Step: 257060 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:36:08,443-Speed 2622.53 samples/sec Loss 9.3559 LearningRate 0.0476 Epoch: 6 Global Step: 257070 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:36:12,341-Speed 2627.46 samples/sec Loss 9.4001 LearningRate 0.0476 Epoch: 6 Global Step: 257080 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:36:16,234-Speed 2631.35 samples/sec Loss 9.4275 LearningRate 0.0476 Epoch: 6 Global Step: 257090 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:36:20,128-Speed 2630.22 samples/sec Loss 9.2943 LearningRate 0.0476 Epoch: 6 Global Step: 257100 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:36:24,013-Speed 2636.42 samples/sec Loss 9.4513 LearningRate 0.0476 Epoch: 6 Global Step: 257110 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:36:27,903-Speed 2633.24 samples/sec Loss 9.2191 LearningRate 0.0476 Epoch: 6 Global Step: 257120 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:36:31,775-Speed 2645.23 samples/sec Loss 9.2981 LearningRate 0.0476 Epoch: 6 Global Step: 257130 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:36:35,670-Speed 2629.25 samples/sec Loss 9.1914 LearningRate 0.0476 Epoch: 6 Global Step: 257140 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:36:39,560-Speed 2633.43 samples/sec Loss 9.1680 LearningRate 0.0476 Epoch: 6 Global Step: 257150 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:36:43,453-Speed 2630.81 samples/sec Loss 9.2664 LearningRate 0.0476 Epoch: 6 Global Step: 257160 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:36:47,349-Speed 2628.72 samples/sec Loss 9.2161 LearningRate 0.0476 Epoch: 6 Global Step: 257170 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:36:51,251-Speed 2625.37 samples/sec Loss 9.3330 LearningRate 0.0476 Epoch: 6 Global Step: 257180 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:36:55,147-Speed 2629.15 samples/sec Loss 9.3369 LearningRate 0.0476 Epoch: 6 Global Step: 257190 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:36:59,067-Speed 2613.22 samples/sec Loss 9.3564 LearningRate 0.0476 Epoch: 6 Global Step: 257200 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:37:02,971-Speed 2622.80 samples/sec Loss 9.2337 LearningRate 0.0476 Epoch: 6 Global Step: 257210 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:37:06,863-Speed 2631.94 samples/sec Loss 9.2711 LearningRate 0.0476 Epoch: 6 Global Step: 257220 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:37:10,761-Speed 2627.95 samples/sec Loss 9.4395 LearningRate 0.0476 Epoch: 6 Global Step: 257230 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:37:14,653-Speed 2631.63 samples/sec Loss 9.3455 LearningRate 0.0476 Epoch: 6 Global Step: 257240 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:37:18,553-Speed 2625.94 samples/sec Loss 9.2083 LearningRate 0.0476 Epoch: 6 Global Step: 257250 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:37:22,445-Speed 2631.82 samples/sec Loss 9.1862 LearningRate 0.0476 Epoch: 6 Global Step: 257260 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:37:26,346-Speed 2625.58 samples/sec Loss 9.3805 LearningRate 0.0476 Epoch: 6 Global Step: 257270 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:37:30,244-Speed 2627.85 samples/sec Loss 9.1761 LearningRate 0.0476 Epoch: 6 Global Step: 257280 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:37:34,234-Speed 2567.00 samples/sec Loss 9.0991 LearningRate 0.0476 Epoch: 6 Global Step: 257290 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:37:38,127-Speed 2631.20 samples/sec Loss 9.1915 LearningRate 0.0476 Epoch: 6 Global Step: 257300 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:37:42,031-Speed 2622.83 samples/sec Loss 9.2351 LearningRate 0.0476 Epoch: 6 Global Step: 257310 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:37:45,964-Speed 2604.77 samples/sec Loss 9.2158 LearningRate 0.0476 Epoch: 6 Global Step: 257320 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:37:49,842-Speed 2641.80 samples/sec Loss 9.2230 LearningRate 0.0476 Epoch: 6 Global Step: 257330 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:37:53,726-Speed 2637.03 samples/sec Loss 9.3332 LearningRate 0.0476 Epoch: 6 Global Step: 257340 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:37:57,633-Speed 2621.60 samples/sec Loss 9.2765 LearningRate 0.0476 Epoch: 6 Global Step: 257350 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:38:01,529-Speed 2629.38 samples/sec Loss 9.3562 LearningRate 0.0476 Epoch: 6 Global Step: 257360 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:38:05,456-Speed 2607.77 samples/sec Loss 9.2915 LearningRate 0.0476 Epoch: 6 Global Step: 257370 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:38:09,463-Speed 2555.79 samples/sec Loss 9.1731 LearningRate 0.0476 Epoch: 6 Global Step: 257380 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:38:13,375-Speed 2618.90 samples/sec Loss 9.2042 LearningRate 0.0476 Epoch: 6 Global Step: 257390 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:38:17,284-Speed 2619.85 samples/sec Loss 9.3544 LearningRate 0.0476 Epoch: 6 Global Step: 257400 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:38:21,183-Speed 2627.70 samples/sec Loss 9.3654 LearningRate 0.0476 Epoch: 6 Global Step: 257410 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:38:25,077-Speed 2629.92 samples/sec Loss 9.2745 LearningRate 0.0476 Epoch: 6 Global Step: 257420 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:38:28,985-Speed 2621.34 samples/sec Loss 9.3035 LearningRate 0.0476 Epoch: 6 Global Step: 257430 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:38:32,877-Speed 2631.30 samples/sec Loss 9.1818 LearningRate 0.0476 Epoch: 6 Global Step: 257440 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:38:36,768-Speed 2632.29 samples/sec Loss 9.4194 LearningRate 0.0476 Epoch: 6 Global Step: 257450 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:38:40,904-Speed 2476.34 samples/sec Loss 9.3030 LearningRate 0.0476 Epoch: 6 Global Step: 257460 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:38:44,797-Speed 2630.56 samples/sec Loss 9.3137 LearningRate 0.0476 Epoch: 6 Global Step: 257470 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:38:48,695-Speed 2628.37 samples/sec Loss 9.3646 LearningRate 0.0476 Epoch: 6 Global Step: 257480 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:38:52,601-Speed 2622.18 samples/sec Loss 9.2218 LearningRate 0.0476 Epoch: 6 Global Step: 257490 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:38:56,506-Speed 2622.37 samples/sec Loss 9.2664 LearningRate 0.0476 Epoch: 6 Global Step: 257500 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:39:00,401-Speed 2629.24 samples/sec Loss 9.3190 LearningRate 0.0476 Epoch: 6 Global Step: 257510 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:39:04,299-Speed 2627.91 samples/sec Loss 9.3380 LearningRate 0.0476 Epoch: 6 Global Step: 257520 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:39:08,202-Speed 2624.48 samples/sec Loss 9.2043 LearningRate 0.0476 Epoch: 6 Global Step: 257530 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:39:12,083-Speed 2639.35 samples/sec Loss 9.6132 LearningRate 0.0475 Epoch: 6 Global Step: 257540 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:39:15,990-Speed 2620.81 samples/sec Loss 9.2332 LearningRate 0.0475 Epoch: 6 Global Step: 257550 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:39:19,899-Speed 2621.02 samples/sec Loss 9.2408 LearningRate 0.0475 Epoch: 6 Global Step: 257560 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:39:23,796-Speed 2628.73 samples/sec Loss 9.2913 LearningRate 0.0475 Epoch: 6 Global Step: 257570 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:39:27,728-Speed 2604.47 samples/sec Loss 9.4731 LearningRate 0.0475 Epoch: 6 Global Step: 257580 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:39:31,626-Speed 2627.77 samples/sec Loss 9.2252 LearningRate 0.0475 Epoch: 6 Global Step: 257590 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:39:35,533-Speed 2621.87 samples/sec Loss 9.4229 LearningRate 0.0475 Epoch: 6 Global Step: 257600 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:39:39,424-Speed 2632.48 samples/sec Loss 9.3440 LearningRate 0.0475 Epoch: 6 Global Step: 257610 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:39:43,316-Speed 2631.83 samples/sec Loss 9.3792 LearningRate 0.0475 Epoch: 6 Global Step: 257620 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:39:47,209-Speed 2631.06 samples/sec Loss 9.3579 LearningRate 0.0475 Epoch: 6 Global Step: 257630 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:39:51,108-Speed 2626.96 samples/sec Loss 9.2052 LearningRate 0.0475 Epoch: 6 Global Step: 257640 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:39:55,006-Speed 2627.10 samples/sec Loss 9.2409 LearningRate 0.0475 Epoch: 6 Global Step: 257650 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:39:58,901-Speed 2629.95 samples/sec Loss 9.3507 LearningRate 0.0475 Epoch: 6 Global Step: 257660 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:40:02,794-Speed 2630.94 samples/sec Loss 9.3944 LearningRate 0.0475 Epoch: 6 Global Step: 257670 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:40:06,691-Speed 2627.94 samples/sec Loss 9.3275 LearningRate 0.0475 Epoch: 6 Global Step: 257680 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:40:10,596-Speed 2623.56 samples/sec Loss 9.4472 LearningRate 0.0475 Epoch: 6 Global Step: 257690 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:40:14,484-Speed 2634.60 samples/sec Loss 9.3378 LearningRate 0.0475 Epoch: 6 Global Step: 257700 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:40:19,212-Speed 2166.11 samples/sec Loss 9.3988 LearningRate 0.0475 Epoch: 6 Global Step: 257710 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:40:23,106-Speed 2630.14 samples/sec Loss 9.2480 LearningRate 0.0475 Epoch: 6 Global Step: 257720 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:40:27,002-Speed 2629.17 samples/sec Loss 9.3789 LearningRate 0.0475 Epoch: 6 Global Step: 257730 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:40:30,892-Speed 2633.56 samples/sec Loss 9.4614 LearningRate 0.0475 Epoch: 6 Global Step: 257740 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:40:34,783-Speed 2632.32 samples/sec Loss 9.3156 LearningRate 0.0475 Epoch: 6 Global Step: 257750 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:40:38,674-Speed 2631.62 samples/sec Loss 9.3270 LearningRate 0.0475 Epoch: 6 Global Step: 257760 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:40:42,576-Speed 2624.96 samples/sec Loss 9.3457 LearningRate 0.0475 Epoch: 6 Global Step: 257770 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:40:46,476-Speed 2626.93 samples/sec Loss 9.2466 LearningRate 0.0475 Epoch: 6 Global Step: 257780 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:40:50,375-Speed 2626.58 samples/sec Loss 9.3595 LearningRate 0.0475 Epoch: 6 Global Step: 257790 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:40:54,272-Speed 2628.78 samples/sec Loss 9.3127 LearningRate 0.0475 Epoch: 6 Global Step: 257800 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:40:58,170-Speed 2627.51 samples/sec Loss 9.4071 LearningRate 0.0475 Epoch: 6 Global Step: 257810 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:41:02,081-Speed 2618.62 samples/sec Loss 9.3302 LearningRate 0.0475 Epoch: 6 Global Step: 257820 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:41:05,985-Speed 2622.92 samples/sec Loss 9.5456 LearningRate 0.0475 Epoch: 6 Global Step: 257830 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:41:09,877-Speed 2632.26 samples/sec Loss 9.3997 LearningRate 0.0475 Epoch: 6 Global Step: 257840 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:41:13,781-Speed 2623.52 samples/sec Loss 9.4751 LearningRate 0.0475 Epoch: 6 Global Step: 257850 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:41:17,670-Speed 2633.62 samples/sec Loss 9.3040 LearningRate 0.0475 Epoch: 6 Global Step: 257860 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:41:21,559-Speed 2634.17 samples/sec Loss 9.3652 LearningRate 0.0475 Epoch: 6 Global Step: 257870 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:41:25,450-Speed 2632.13 samples/sec Loss 9.3169 LearningRate 0.0475 Epoch: 6 Global Step: 257880 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:41:29,340-Speed 2633.34 samples/sec Loss 9.2846 LearningRate 0.0475 Epoch: 6 Global Step: 257890 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:41:33,233-Speed 2631.28 samples/sec Loss 9.2311 LearningRate 0.0475 Epoch: 6 Global Step: 257900 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:41:37,128-Speed 2629.19 samples/sec Loss 9.2855 LearningRate 0.0475 Epoch: 6 Global Step: 257910 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:41:41,019-Speed 2632.41 samples/sec Loss 9.3601 LearningRate 0.0475 Epoch: 6 Global Step: 257920 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:41:44,916-Speed 2628.56 samples/sec Loss 9.2201 LearningRate 0.0475 Epoch: 6 Global Step: 257930 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:41:48,807-Speed 2632.16 samples/sec Loss 9.3665 LearningRate 0.0475 Epoch: 6 Global Step: 257940 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:41:52,722-Speed 2616.68 samples/sec Loss 9.3279 LearningRate 0.0475 Epoch: 6 Global Step: 257950 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:41:56,618-Speed 2629.31 samples/sec Loss 9.3540 LearningRate 0.0475 Epoch: 6 Global Step: 257960 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:42:00,529-Speed 2618.93 samples/sec Loss 9.3081 LearningRate 0.0475 Epoch: 6 Global Step: 257970 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:42:04,440-Speed 2618.97 samples/sec Loss 9.3246 LearningRate 0.0475 Epoch: 6 Global Step: 257980 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:42:08,345-Speed 2622.48 samples/sec Loss 9.3309 LearningRate 0.0475 Epoch: 6 Global Step: 257990 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:42:12,226-Speed 2638.98 samples/sec Loss 9.3146 LearningRate 0.0475 Epoch: 6 Global Step: 258000 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:42:16,119-Speed 2631.49 samples/sec Loss 9.2187 LearningRate 0.0475 Epoch: 6 Global Step: 258010 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:42:20,023-Speed 2623.89 samples/sec Loss 9.3413 LearningRate 0.0475 Epoch: 6 Global Step: 258020 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:42:23,926-Speed 2623.79 samples/sec Loss 9.2451 LearningRate 0.0475 Epoch: 6 Global Step: 258030 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:42:27,826-Speed 2626.66 samples/sec Loss 9.1660 LearningRate 0.0475 Epoch: 6 Global Step: 258040 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:42:31,725-Speed 2626.83 samples/sec Loss 9.3019 LearningRate 0.0475 Epoch: 6 Global Step: 258050 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:42:35,624-Speed 2627.15 samples/sec Loss 9.2972 LearningRate 0.0475 Epoch: 6 Global Step: 258060 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:42:39,520-Speed 2629.08 samples/sec Loss 9.2013 LearningRate 0.0475 Epoch: 6 Global Step: 258070 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:42:43,422-Speed 2624.99 samples/sec Loss 9.1813 LearningRate 0.0475 Epoch: 6 Global Step: 258080 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:42:47,331-Speed 2620.20 samples/sec Loss 9.2575 LearningRate 0.0475 Epoch: 6 Global Step: 258090 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:42:51,224-Speed 2630.88 samples/sec Loss 9.2935 LearningRate 0.0475 Epoch: 6 Global Step: 258100 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:42:55,123-Speed 2627.03 samples/sec Loss 9.4688 LearningRate 0.0475 Epoch: 6 Global Step: 258110 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:42:59,023-Speed 2626.27 samples/sec Loss 9.2976 LearningRate 0.0475 Epoch: 6 Global Step: 258120 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:43:02,913-Speed 2633.03 samples/sec Loss 9.2504 LearningRate 0.0475 Epoch: 6 Global Step: 258130 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:43:06,814-Speed 2625.58 samples/sec Loss 9.2448 LearningRate 0.0474 Epoch: 6 Global Step: 258140 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:43:10,687-Speed 2644.75 samples/sec Loss 9.2246 LearningRate 0.0474 Epoch: 6 Global Step: 258150 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:43:14,583-Speed 2629.41 samples/sec Loss 9.2861 LearningRate 0.0474 Epoch: 6 Global Step: 258160 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:43:18,482-Speed 2626.41 samples/sec Loss 9.2918 LearningRate 0.0474 Epoch: 6 Global Step: 258170 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:43:22,397-Speed 2616.69 samples/sec Loss 9.3700 LearningRate 0.0474 Epoch: 6 Global Step: 258180 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:43:26,290-Speed 2630.91 samples/sec Loss 9.2886 LearningRate 0.0474 Epoch: 6 Global Step: 258190 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:43:30,184-Speed 2630.60 samples/sec Loss 9.2520 LearningRate 0.0474 Epoch: 6 Global Step: 258200 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:43:34,082-Speed 2627.99 samples/sec Loss 9.2962 LearningRate 0.0474 Epoch: 6 Global Step: 258210 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:43:37,987-Speed 2622.39 samples/sec Loss 9.2664 LearningRate 0.0474 Epoch: 6 Global Step: 258220 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:43:41,882-Speed 2630.06 samples/sec Loss 9.5074 LearningRate 0.0474 Epoch: 6 Global Step: 258230 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:43:45,775-Speed 2630.68 samples/sec Loss 9.3547 LearningRate 0.0474 Epoch: 6 Global Step: 258240 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:43:49,669-Speed 2630.59 samples/sec Loss 9.4494 LearningRate 0.0474 Epoch: 6 Global Step: 258250 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:43:53,562-Speed 2631.31 samples/sec Loss 9.3129 LearningRate 0.0474 Epoch: 6 Global Step: 258260 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:43:57,459-Speed 2628.05 samples/sec Loss 9.4075 LearningRate 0.0474 Epoch: 6 Global Step: 258270 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:44:01,353-Speed 2630.89 samples/sec Loss 9.4261 LearningRate 0.0474 Epoch: 6 Global Step: 258280 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:44:05,247-Speed 2629.74 samples/sec Loss 9.3017 LearningRate 0.0474 Epoch: 6 Global Step: 258290 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:44:09,143-Speed 2629.18 samples/sec Loss 9.3250 LearningRate 0.0474 Epoch: 6 Global Step: 258300 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:44:13,039-Speed 2628.21 samples/sec Loss 9.1945 LearningRate 0.0474 Epoch: 6 Global Step: 258310 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:44:16,915-Speed 2642.87 samples/sec Loss 9.2853 LearningRate 0.0474 Epoch: 6 Global Step: 258320 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:44:20,811-Speed 2628.85 samples/sec Loss 9.2984 LearningRate 0.0474 Epoch: 6 Global Step: 258330 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:44:24,714-Speed 2624.31 samples/sec Loss 9.2386 LearningRate 0.0474 Epoch: 6 Global Step: 258340 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:44:28,618-Speed 2623.78 samples/sec Loss 9.3428 LearningRate 0.0474 Epoch: 6 Global Step: 258350 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:44:32,506-Speed 2634.11 samples/sec Loss 9.3050 LearningRate 0.0474 Epoch: 6 Global Step: 258360 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:44:36,403-Speed 2628.71 samples/sec Loss 9.3346 LearningRate 0.0474 Epoch: 6 Global Step: 258370 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:44:40,307-Speed 2623.80 samples/sec Loss 9.3338 LearningRate 0.0474 Epoch: 6 Global Step: 258380 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:44:44,198-Speed 2632.24 samples/sec Loss 9.3300 LearningRate 0.0474 Epoch: 6 Global Step: 258390 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:44:48,110-Speed 2618.16 samples/sec Loss 9.2801 LearningRate 0.0474 Epoch: 6 Global Step: 258400 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:44:52,007-Speed 2628.59 samples/sec Loss 9.1841 LearningRate 0.0474 Epoch: 6 Global Step: 258410 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:44:55,903-Speed 2629.20 samples/sec Loss 9.2164 LearningRate 0.0474 Epoch: 6 Global Step: 258420 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:44:59,794-Speed 2632.18 samples/sec Loss 9.2835 LearningRate 0.0474 Epoch: 6 Global Step: 258430 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:45:03,692-Speed 2627.95 samples/sec Loss 9.2134 LearningRate 0.0474 Epoch: 6 Global Step: 258440 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:45:07,586-Speed 2629.99 samples/sec Loss 9.2375 LearningRate 0.0474 Epoch: 6 Global Step: 258450 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:45:11,476-Speed 2633.22 samples/sec Loss 9.3099 LearningRate 0.0474 Epoch: 6 Global Step: 258460 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:45:15,380-Speed 2623.92 samples/sec Loss 9.2507 LearningRate 0.0474 Epoch: 6 Global Step: 258470 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:45:19,272-Speed 2631.63 samples/sec Loss 9.3861 LearningRate 0.0474 Epoch: 6 Global Step: 258480 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:45:23,176-Speed 2623.38 samples/sec Loss 9.2074 LearningRate 0.0474 Epoch: 6 Global Step: 258490 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:45:27,068-Speed 2631.48 samples/sec Loss 9.2637 LearningRate 0.0474 Epoch: 6 Global Step: 258500 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:45:30,966-Speed 2627.92 samples/sec Loss 9.3454 LearningRate 0.0474 Epoch: 6 Global Step: 258510 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:45:34,861-Speed 2629.43 samples/sec Loss 9.2652 LearningRate 0.0474 Epoch: 6 Global Step: 258520 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:45:38,753-Speed 2631.77 samples/sec Loss 9.2454 LearningRate 0.0474 Epoch: 6 Global Step: 258530 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:45:42,626-Speed 2644.24 samples/sec Loss 9.3229 LearningRate 0.0474 Epoch: 6 Global Step: 258540 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:45:46,515-Speed 2634.55 samples/sec Loss 9.2622 LearningRate 0.0474 Epoch: 6 Global Step: 258550 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:45:50,409-Speed 2629.83 samples/sec Loss 9.3645 LearningRate 0.0474 Epoch: 6 Global Step: 258560 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:45:54,319-Speed 2620.12 samples/sec Loss 9.3005 LearningRate 0.0474 Epoch: 6 Global Step: 258570 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:45:58,228-Speed 2619.97 samples/sec Loss 9.3783 LearningRate 0.0474 Epoch: 6 Global Step: 258580 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:46:02,120-Speed 2631.13 samples/sec Loss 9.2230 LearningRate 0.0474 Epoch: 6 Global Step: 258590 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:46:06,016-Speed 2629.23 samples/sec Loss 9.3962 LearningRate 0.0474 Epoch: 6 Global Step: 258600 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:46:09,926-Speed 2619.82 samples/sec Loss 9.3597 LearningRate 0.0474 Epoch: 6 Global Step: 258610 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:46:13,825-Speed 2626.94 samples/sec Loss 9.4502 LearningRate 0.0474 Epoch: 6 Global Step: 258620 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:46:17,727-Speed 2625.05 samples/sec Loss 9.3258 LearningRate 0.0474 Epoch: 6 Global Step: 258630 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:46:21,624-Speed 2628.07 samples/sec Loss 9.1703 LearningRate 0.0474 Epoch: 6 Global Step: 258640 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:46:25,519-Speed 2630.47 samples/sec Loss 9.3282 LearningRate 0.0474 Epoch: 6 Global Step: 258650 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:46:29,449-Speed 2606.01 samples/sec Loss 9.3260 LearningRate 0.0474 Epoch: 6 Global Step: 258660 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:46:33,355-Speed 2622.06 samples/sec Loss 9.2505 LearningRate 0.0474 Epoch: 6 Global Step: 258670 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:46:37,317-Speed 2585.25 samples/sec Loss 9.3255 LearningRate 0.0474 Epoch: 6 Global Step: 258680 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:46:41,230-Speed 2617.67 samples/sec Loss 9.2693 LearningRate 0.0474 Epoch: 6 Global Step: 258690 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:46:45,124-Speed 2630.55 samples/sec Loss 9.3186 LearningRate 0.0474 Epoch: 6 Global Step: 258700 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:46:49,020-Speed 2629.28 samples/sec Loss 9.1871 LearningRate 0.0474 Epoch: 6 Global Step: 258710 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:46:52,931-Speed 2619.18 samples/sec Loss 9.2124 LearningRate 0.0474 Epoch: 6 Global Step: 258720 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:46:56,820-Speed 2633.86 samples/sec Loss 9.3573 LearningRate 0.0474 Epoch: 6 Global Step: 258730 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:47:00,735-Speed 2616.40 samples/sec Loss 9.1195 LearningRate 0.0473 Epoch: 6 Global Step: 258740 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:47:04,641-Speed 2621.91 samples/sec Loss 9.2279 LearningRate 0.0473 Epoch: 6 Global Step: 258750 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:47:08,552-Speed 2619.11 samples/sec Loss 9.2606 LearningRate 0.0473 Epoch: 6 Global Step: 258760 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:47:12,452-Speed 2625.63 samples/sec Loss 9.3489 LearningRate 0.0473 Epoch: 6 Global Step: 258770 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:47:16,348-Speed 2629.67 samples/sec Loss 9.3376 LearningRate 0.0473 Epoch: 6 Global Step: 258780 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:47:20,243-Speed 2629.66 samples/sec Loss 9.2892 LearningRate 0.0473 Epoch: 6 Global Step: 258790 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:47:24,140-Speed 2628.74 samples/sec Loss 9.3506 LearningRate 0.0473 Epoch: 6 Global Step: 258800 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:47:28,035-Speed 2630.23 samples/sec Loss 9.4524 LearningRate 0.0473 Epoch: 6 Global Step: 258810 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:47:31,946-Speed 2618.23 samples/sec Loss 9.3649 LearningRate 0.0473 Epoch: 6 Global Step: 258820 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:47:35,847-Speed 2625.91 samples/sec Loss 9.3421 LearningRate 0.0473 Epoch: 6 Global Step: 258830 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:47:39,744-Speed 2628.52 samples/sec Loss 9.4250 LearningRate 0.0473 Epoch: 6 Global Step: 258840 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:47:43,638-Speed 2629.89 samples/sec Loss 9.2958 LearningRate 0.0473 Epoch: 6 Global Step: 258850 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:47:47,535-Speed 2629.04 samples/sec Loss 9.2431 LearningRate 0.0473 Epoch: 6 Global Step: 258860 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:47:51,418-Speed 2637.50 samples/sec Loss 9.2359 LearningRate 0.0473 Epoch: 6 Global Step: 258870 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:47:55,318-Speed 2626.12 samples/sec Loss 9.4550 LearningRate 0.0473 Epoch: 6 Global Step: 258880 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:47:59,211-Speed 2631.52 samples/sec Loss 9.3604 LearningRate 0.0473 Epoch: 6 Global Step: 258890 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:48:03,149-Speed 2601.06 samples/sec Loss 9.2946 LearningRate 0.0473 Epoch: 6 Global Step: 258900 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:48:07,043-Speed 2630.16 samples/sec Loss 9.1932 LearningRate 0.0473 Epoch: 6 Global Step: 258910 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:48:10,984-Speed 2598.82 samples/sec Loss 9.1500 LearningRate 0.0473 Epoch: 6 Global Step: 258920 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:48:14,877-Speed 2631.33 samples/sec Loss 9.2639 LearningRate 0.0473 Epoch: 6 Global Step: 258930 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:48:18,770-Speed 2630.69 samples/sec Loss 9.1816 LearningRate 0.0473 Epoch: 6 Global Step: 258940 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:48:22,661-Speed 2632.45 samples/sec Loss 9.2249 LearningRate 0.0473 Epoch: 6 Global Step: 258950 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:48:26,551-Speed 2632.86 samples/sec Loss 9.3060 LearningRate 0.0473 Epoch: 6 Global Step: 258960 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:48:30,444-Speed 2631.79 samples/sec Loss 9.2927 LearningRate 0.0473 Epoch: 6 Global Step: 258970 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:48:34,345-Speed 2625.66 samples/sec Loss 9.2062 LearningRate 0.0473 Epoch: 6 Global Step: 258980 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:48:38,241-Speed 2628.39 samples/sec Loss 9.2550 LearningRate 0.0473 Epoch: 6 Global Step: 258990 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:48:42,134-Speed 2631.21 samples/sec Loss 9.2743 LearningRate 0.0473 Epoch: 6 Global Step: 259000 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:48:46,033-Speed 2626.84 samples/sec Loss 9.2341 LearningRate 0.0473 Epoch: 6 Global Step: 259010 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:48:49,930-Speed 2628.49 samples/sec Loss 9.4150 LearningRate 0.0473 Epoch: 6 Global Step: 259020 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:48:53,834-Speed 2623.48 samples/sec Loss 9.3058 LearningRate 0.0473 Epoch: 6 Global Step: 259030 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:48:57,754-Speed 2612.65 samples/sec Loss 9.1531 LearningRate 0.0473 Epoch: 6 Global Step: 259040 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:49:01,647-Speed 2631.15 samples/sec Loss 9.1386 LearningRate 0.0473 Epoch: 6 Global Step: 259050 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:49:05,545-Speed 2627.58 samples/sec Loss 9.3299 LearningRate 0.0473 Epoch: 6 Global Step: 259060 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:49:09,450-Speed 2623.17 samples/sec Loss 9.2274 LearningRate 0.0473 Epoch: 6 Global Step: 259070 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:49:13,341-Speed 2632.05 samples/sec Loss 9.2936 LearningRate 0.0473 Epoch: 6 Global Step: 259080 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:49:17,232-Speed 2632.74 samples/sec Loss 9.0871 LearningRate 0.0473 Epoch: 6 Global Step: 259090 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:49:21,124-Speed 2631.51 samples/sec Loss 9.1796 LearningRate 0.0473 Epoch: 6 Global Step: 259100 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:49:25,015-Speed 2631.96 samples/sec Loss 9.1784 LearningRate 0.0473 Epoch: 6 Global Step: 259110 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:49:28,906-Speed 2632.89 samples/sec Loss 9.3877 LearningRate 0.0473 Epoch: 6 Global Step: 259120 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:49:33,378-Speed 2290.20 samples/sec Loss 9.3622 LearningRate 0.0473 Epoch: 6 Global Step: 259130 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:49:37,269-Speed 2632.24 samples/sec Loss 9.3102 LearningRate 0.0473 Epoch: 6 Global Step: 259140 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:49:41,176-Speed 2621.66 samples/sec Loss 9.2454 LearningRate 0.0473 Epoch: 6 Global Step: 259150 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:49:45,069-Speed 2630.77 samples/sec Loss 9.2485 LearningRate 0.0473 Epoch: 6 Global Step: 259160 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:49:48,962-Speed 2631.28 samples/sec Loss 9.2820 LearningRate 0.0473 Epoch: 6 Global Step: 259170 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:49:52,862-Speed 2626.56 samples/sec Loss 9.3817 LearningRate 0.0473 Epoch: 6 Global Step: 259180 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:49:56,753-Speed 2632.07 samples/sec Loss 9.2470 LearningRate 0.0473 Epoch: 6 Global Step: 259190 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:50:00,649-Speed 2629.25 samples/sec Loss 9.2634 LearningRate 0.0473 Epoch: 6 Global Step: 259200 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:50:04,556-Speed 2621.66 samples/sec Loss 9.2956 LearningRate 0.0473 Epoch: 6 Global Step: 259210 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:50:08,448-Speed 2631.76 samples/sec Loss 9.2740 LearningRate 0.0473 Epoch: 6 Global Step: 259220 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:50:12,336-Speed 2634.16 samples/sec Loss 9.3669 LearningRate 0.0473 Epoch: 6 Global Step: 259230 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:50:16,229-Speed 2631.30 samples/sec Loss 9.3159 LearningRate 0.0473 Epoch: 6 Global Step: 259240 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:50:20,123-Speed 2629.89 samples/sec Loss 9.3500 LearningRate 0.0473 Epoch: 6 Global Step: 259250 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:50:24,014-Speed 2632.32 samples/sec Loss 9.2463 LearningRate 0.0473 Epoch: 6 Global Step: 259260 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:50:27,893-Speed 2640.77 samples/sec Loss 9.3466 LearningRate 0.0473 Epoch: 6 Global Step: 259270 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:50:31,782-Speed 2633.88 samples/sec Loss 9.2109 LearningRate 0.0473 Epoch: 6 Global Step: 259280 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:50:35,673-Speed 2632.72 samples/sec Loss 9.2991 LearningRate 0.0473 Epoch: 6 Global Step: 259290 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:50:39,567-Speed 2630.03 samples/sec Loss 9.1871 LearningRate 0.0473 Epoch: 6 Global Step: 259300 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:50:43,455-Speed 2634.03 samples/sec Loss 9.1792 LearningRate 0.0473 Epoch: 6 Global Step: 259310 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:50:47,375-Speed 2613.31 samples/sec Loss 9.2774 LearningRate 0.0473 Epoch: 6 Global Step: 259320 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:50:51,404-Speed 2542.41 samples/sec Loss 9.2240 LearningRate 0.0473 Epoch: 6 Global Step: 259330 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:50:55,300-Speed 2629.11 samples/sec Loss 9.2550 LearningRate 0.0472 Epoch: 6 Global Step: 259340 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:50:59,210-Speed 2619.98 samples/sec Loss 9.3015 LearningRate 0.0472 Epoch: 6 Global Step: 259350 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:51:03,112-Speed 2624.83 samples/sec Loss 9.2088 LearningRate 0.0472 Epoch: 6 Global Step: 259360 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:51:07,017-Speed 2622.44 samples/sec Loss 9.1403 LearningRate 0.0472 Epoch: 6 Global Step: 259370 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:51:10,924-Speed 2621.83 samples/sec Loss 9.1900 LearningRate 0.0472 Epoch: 6 Global Step: 259380 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:51:14,835-Speed 2619.35 samples/sec Loss 9.3320 LearningRate 0.0472 Epoch: 6 Global Step: 259390 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:51:18,737-Speed 2625.20 samples/sec Loss 9.3308 LearningRate 0.0472 Epoch: 6 Global Step: 259400 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:51:22,634-Speed 2627.63 samples/sec Loss 9.2413 LearningRate 0.0472 Epoch: 6 Global Step: 259410 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:51:26,530-Speed 2629.90 samples/sec Loss 9.1968 LearningRate 0.0472 Epoch: 6 Global Step: 259420 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:51:30,426-Speed 2628.64 samples/sec Loss 9.3696 LearningRate 0.0472 Epoch: 6 Global Step: 259430 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:51:34,300-Speed 2643.74 samples/sec Loss 9.2565 LearningRate 0.0472 Epoch: 6 Global Step: 259440 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:51:38,212-Speed 2617.92 samples/sec Loss 9.3413 LearningRate 0.0472 Epoch: 6 Global Step: 259450 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:51:42,098-Speed 2636.31 samples/sec Loss 9.3065 LearningRate 0.0472 Epoch: 6 Global Step: 259460 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:51:45,989-Speed 2632.67 samples/sec Loss 9.2756 LearningRate 0.0472 Epoch: 6 Global Step: 259470 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:51:49,878-Speed 2633.33 samples/sec Loss 9.4087 LearningRate 0.0472 Epoch: 6 Global Step: 259480 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:51:53,787-Speed 2620.98 samples/sec Loss 9.3959 LearningRate 0.0472 Epoch: 6 Global Step: 259490 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:51:57,681-Speed 2630.31 samples/sec Loss 9.2322 LearningRate 0.0472 Epoch: 6 Global Step: 259500 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:52:01,569-Speed 2634.50 samples/sec Loss 9.2074 LearningRate 0.0472 Epoch: 6 Global Step: 259510 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:52:05,460-Speed 2631.78 samples/sec Loss 9.2823 LearningRate 0.0472 Epoch: 6 Global Step: 259520 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:52:09,356-Speed 2629.89 samples/sec Loss 9.2931 LearningRate 0.0472 Epoch: 6 Global Step: 259530 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:52:13,297-Speed 2598.82 samples/sec Loss 9.2783 LearningRate 0.0472 Epoch: 6 Global Step: 259540 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:52:17,191-Speed 2630.26 samples/sec Loss 9.1967 LearningRate 0.0472 Epoch: 6 Global Step: 259550 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:52:21,069-Speed 2641.08 samples/sec Loss 9.3083 LearningRate 0.0472 Epoch: 6 Global Step: 259560 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:52:24,958-Speed 2633.53 samples/sec Loss 9.3590 LearningRate 0.0472 Epoch: 6 Global Step: 259570 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:52:28,851-Speed 2631.36 samples/sec Loss 9.1579 LearningRate 0.0472 Epoch: 6 Global Step: 259580 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:52:32,737-Speed 2635.65 samples/sec Loss 9.2513 LearningRate 0.0472 Epoch: 6 Global Step: 259590 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:52:36,642-Speed 2622.67 samples/sec Loss 9.2381 LearningRate 0.0472 Epoch: 6 Global Step: 259600 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:52:40,533-Speed 2632.99 samples/sec Loss 9.3988 LearningRate 0.0472 Epoch: 6 Global Step: 259610 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 00:52:44,436-Speed 2624.40 samples/sec Loss 9.3679 LearningRate 0.0472 Epoch: 6 Global Step: 259620 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 00:52:48,366-Speed 2606.30 samples/sec Loss 9.3976 LearningRate 0.0472 Epoch: 6 Global Step: 259630 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 00:52:52,272-Speed 2622.26 samples/sec Loss 9.2715 LearningRate 0.0472 Epoch: 6 Global Step: 259640 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 00:52:56,158-Speed 2635.70 samples/sec Loss 9.1641 LearningRate 0.0472 Epoch: 6 Global Step: 259650 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 00:53:00,047-Speed 2633.93 samples/sec Loss 9.4508 LearningRate 0.0472 Epoch: 6 Global Step: 259660 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 00:53:03,934-Speed 2634.73 samples/sec Loss 9.3374 LearningRate 0.0472 Epoch: 6 Global Step: 259670 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 00:53:07,837-Speed 2623.91 samples/sec Loss 9.4126 LearningRate 0.0472 Epoch: 6 Global Step: 259680 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 00:53:11,736-Speed 2627.62 samples/sec Loss 9.3526 LearningRate 0.0472 Epoch: 6 Global Step: 259690 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 00:53:15,636-Speed 2626.12 samples/sec Loss 9.4899 LearningRate 0.0472 Epoch: 6 Global Step: 259700 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 00:53:19,537-Speed 2625.72 samples/sec Loss 9.3522 LearningRate 0.0472 Epoch: 6 Global Step: 259710 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:53:23,428-Speed 2632.80 samples/sec Loss 9.5188 LearningRate 0.0472 Epoch: 6 Global Step: 259720 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:53:27,315-Speed 2634.27 samples/sec Loss 9.2959 LearningRate 0.0472 Epoch: 6 Global Step: 259730 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:53:31,203-Speed 2634.78 samples/sec Loss 9.2792 LearningRate 0.0472 Epoch: 6 Global Step: 259740 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:53:35,102-Speed 2626.59 samples/sec Loss 9.2457 LearningRate 0.0472 Epoch: 6 Global Step: 259750 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:53:38,993-Speed 2632.62 samples/sec Loss 9.3375 LearningRate 0.0472 Epoch: 6 Global Step: 259760 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:53:42,887-Speed 2630.43 samples/sec Loss 9.2777 LearningRate 0.0472 Epoch: 6 Global Step: 259770 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:53:46,775-Speed 2634.73 samples/sec Loss 9.3129 LearningRate 0.0472 Epoch: 6 Global Step: 259780 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:53:50,668-Speed 2630.71 samples/sec Loss 9.4397 LearningRate 0.0472 Epoch: 6 Global Step: 259790 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:53:54,556-Speed 2634.50 samples/sec Loss 9.1472 LearningRate 0.0472 Epoch: 6 Global Step: 259800 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 00:53:58,446-Speed 2632.70 samples/sec Loss 9.1649 LearningRate 0.0472 Epoch: 6 Global Step: 259810 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:54:02,356-Speed 2620.14 samples/sec Loss 9.2767 LearningRate 0.0472 Epoch: 6 Global Step: 259820 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:54:06,243-Speed 2634.29 samples/sec Loss 9.2436 LearningRate 0.0472 Epoch: 6 Global Step: 259830 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:54:10,135-Speed 2631.86 samples/sec Loss 9.2647 LearningRate 0.0472 Epoch: 6 Global Step: 259840 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:54:14,027-Speed 2631.37 samples/sec Loss 9.2154 LearningRate 0.0472 Epoch: 6 Global Step: 259850 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:54:17,918-Speed 2632.42 samples/sec Loss 9.2670 LearningRate 0.0472 Epoch: 6 Global Step: 259860 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:54:21,809-Speed 2634.40 samples/sec Loss 9.1465 LearningRate 0.0472 Epoch: 6 Global Step: 259870 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:54:25,700-Speed 2632.30 samples/sec Loss 9.1874 LearningRate 0.0472 Epoch: 6 Global Step: 259880 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:54:29,597-Speed 2628.66 samples/sec Loss 9.1450 LearningRate 0.0472 Epoch: 6 Global Step: 259890 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:54:33,488-Speed 2632.06 samples/sec Loss 9.2378 LearningRate 0.0472 Epoch: 6 Global Step: 259900 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:54:37,377-Speed 2633.05 samples/sec Loss 9.3374 LearningRate 0.0472 Epoch: 6 Global Step: 259910 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:54:41,277-Speed 2626.65 samples/sec Loss 9.3231 LearningRate 0.0472 Epoch: 6 Global Step: 259920 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:54:45,171-Speed 2629.99 samples/sec Loss 9.1656 LearningRate 0.0472 Epoch: 6 Global Step: 259930 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:54:49,061-Speed 2633.70 samples/sec Loss 9.4582 LearningRate 0.0472 Epoch: 6 Global Step: 259940 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:54:52,952-Speed 2632.70 samples/sec Loss 9.3525 LearningRate 0.0471 Epoch: 6 Global Step: 259950 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:54:56,845-Speed 2630.49 samples/sec Loss 9.1429 LearningRate 0.0471 Epoch: 6 Global Step: 259960 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:55:00,767-Speed 2611.84 samples/sec Loss 9.1893 LearningRate 0.0471 Epoch: 6 Global Step: 259970 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:55:04,657-Speed 2632.89 samples/sec Loss 9.1382 LearningRate 0.0471 Epoch: 6 Global Step: 259980 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:55:08,539-Speed 2638.60 samples/sec Loss 9.1973 LearningRate 0.0471 Epoch: 6 Global Step: 259990 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:55:12,439-Speed 2626.35 samples/sec Loss 9.4235 LearningRate 0.0471 Epoch: 6 Global Step: 260000 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:55:55,770-[lfw][260000]XNorm: 23.913672
Training: 2022-04-14 00:55:55,770-[lfw][260000]Accuracy-Flip: 0.99783+-0.00269
Training: 2022-04-14 00:55:55,771-[lfw][260000]Accuracy-Highest: 0.99783
Training: 2022-04-14 00:56:46,222-[cfp_fp][260000]XNorm: 22.023019
Training: 2022-04-14 00:56:46,222-[cfp_fp][260000]Accuracy-Flip: 0.98214+-0.00471
Training: 2022-04-14 00:56:46,223-[cfp_fp][260000]Accuracy-Highest: 0.98643
Training: 2022-04-14 00:57:29,826-[agedb_30][260000]XNorm: 23.688854
Training: 2022-04-14 00:57:29,827-[agedb_30][260000]Accuracy-Flip: 0.97317+-0.00643
Training: 2022-04-14 00:57:29,827-[agedb_30][260000]Accuracy-Highest: 0.97350
Training: 2022-04-14 00:57:33,685-Speed 72.50 samples/sec Loss 9.2724 LearningRate 0.0471 Epoch: 6 Global Step: 260010 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:57:37,554-Speed 2646.58 samples/sec Loss 9.3040 LearningRate 0.0471 Epoch: 6 Global Step: 260020 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:57:41,415-Speed 2653.06 samples/sec Loss 9.2536 LearningRate 0.0471 Epoch: 6 Global Step: 260030 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:57:45,275-Speed 2653.59 samples/sec Loss 9.3815 LearningRate 0.0471 Epoch: 6 Global Step: 260040 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:57:49,142-Speed 2648.32 samples/sec Loss 9.2068 LearningRate 0.0471 Epoch: 6 Global Step: 260050 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:57:53,016-Speed 2644.38 samples/sec Loss 9.2394 LearningRate 0.0471 Epoch: 6 Global Step: 260060 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:57:56,881-Speed 2650.12 samples/sec Loss 9.3356 LearningRate 0.0471 Epoch: 6 Global Step: 260070 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:58:00,750-Speed 2647.22 samples/sec Loss 9.2719 LearningRate 0.0471 Epoch: 6 Global Step: 260080 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:58:04,619-Speed 2647.21 samples/sec Loss 9.4407 LearningRate 0.0471 Epoch: 6 Global Step: 260090 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:58:08,499-Speed 2639.31 samples/sec Loss 9.3249 LearningRate 0.0471 Epoch: 6 Global Step: 260100 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:58:12,379-Speed 2640.23 samples/sec Loss 9.2983 LearningRate 0.0471 Epoch: 6 Global Step: 260110 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:58:16,254-Speed 2643.57 samples/sec Loss 9.2421 LearningRate 0.0471 Epoch: 6 Global Step: 260120 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:58:20,131-Speed 2641.85 samples/sec Loss 9.3179 LearningRate 0.0471 Epoch: 6 Global Step: 260130 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:58:23,994-Speed 2652.25 samples/sec Loss 9.2956 LearningRate 0.0471 Epoch: 6 Global Step: 260140 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:58:27,881-Speed 2634.67 samples/sec Loss 9.1237 LearningRate 0.0471 Epoch: 6 Global Step: 260150 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:58:31,762-Speed 2639.44 samples/sec Loss 9.3284 LearningRate 0.0471 Epoch: 6 Global Step: 260160 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:58:35,648-Speed 2635.57 samples/sec Loss 9.2577 LearningRate 0.0471 Epoch: 6 Global Step: 260170 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:58:39,546-Speed 2627.81 samples/sec Loss 9.2829 LearningRate 0.0471 Epoch: 6 Global Step: 260180 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:58:43,440-Speed 2630.41 samples/sec Loss 9.3043 LearningRate 0.0471 Epoch: 6 Global Step: 260190 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:58:47,339-Speed 2626.93 samples/sec Loss 9.3381 LearningRate 0.0471 Epoch: 6 Global Step: 260200 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:58:51,248-Speed 2620.31 samples/sec Loss 9.3855 LearningRate 0.0471 Epoch: 6 Global Step: 260210 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:58:55,149-Speed 2626.23 samples/sec Loss 9.3459 LearningRate 0.0471 Epoch: 6 Global Step: 260220 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:58:59,054-Speed 2622.82 samples/sec Loss 9.3257 LearningRate 0.0471 Epoch: 6 Global Step: 260230 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:59:02,943-Speed 2633.40 samples/sec Loss 9.2594 LearningRate 0.0471 Epoch: 6 Global Step: 260240 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:59:06,831-Speed 2634.56 samples/sec Loss 9.1176 LearningRate 0.0471 Epoch: 6 Global Step: 260250 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:59:10,718-Speed 2634.52 samples/sec Loss 9.1402 LearningRate 0.0471 Epoch: 6 Global Step: 260260 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:59:14,607-Speed 2633.75 samples/sec Loss 9.3969 LearningRate 0.0471 Epoch: 6 Global Step: 260270 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:59:18,496-Speed 2633.63 samples/sec Loss 9.1800 LearningRate 0.0471 Epoch: 6 Global Step: 260280 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:59:22,389-Speed 2630.99 samples/sec Loss 9.2420 LearningRate 0.0471 Epoch: 6 Global Step: 260290 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:59:26,278-Speed 2633.70 samples/sec Loss 9.3097 LearningRate 0.0471 Epoch: 6 Global Step: 260300 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:59:30,169-Speed 2632.49 samples/sec Loss 9.3353 LearningRate 0.0471 Epoch: 6 Global Step: 260310 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:59:34,081-Speed 2618.42 samples/sec Loss 9.1351 LearningRate 0.0471 Epoch: 6 Global Step: 260320 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:59:37,970-Speed 2633.45 samples/sec Loss 9.2650 LearningRate 0.0471 Epoch: 6 Global Step: 260330 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:59:41,868-Speed 2627.46 samples/sec Loss 9.2781 LearningRate 0.0471 Epoch: 6 Global Step: 260340 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:59:45,757-Speed 2634.00 samples/sec Loss 9.2407 LearningRate 0.0471 Epoch: 6 Global Step: 260350 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 00:59:49,632-Speed 2643.02 samples/sec Loss 9.2036 LearningRate 0.0471 Epoch: 6 Global Step: 260360 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 00:59:53,505-Speed 2644.68 samples/sec Loss 9.2483 LearningRate 0.0471 Epoch: 6 Global Step: 260370 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 00:59:57,388-Speed 2637.95 samples/sec Loss 9.2243 LearningRate 0.0471 Epoch: 6 Global Step: 260380 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:00:01,397-Speed 2554.86 samples/sec Loss 9.2508 LearningRate 0.0471 Epoch: 6 Global Step: 260390 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:00:05,300-Speed 2624.22 samples/sec Loss 9.2135 LearningRate 0.0471 Epoch: 6 Global Step: 260400 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:00:09,241-Speed 2598.77 samples/sec Loss 9.2521 LearningRate 0.0471 Epoch: 6 Global Step: 260410 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:00:13,142-Speed 2625.48 samples/sec Loss 9.1829 LearningRate 0.0471 Epoch: 6 Global Step: 260420 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:00:17,032-Speed 2633.61 samples/sec Loss 9.1878 LearningRate 0.0471 Epoch: 6 Global Step: 260430 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:00:20,931-Speed 2626.67 samples/sec Loss 9.2132 LearningRate 0.0471 Epoch: 6 Global Step: 260440 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:00:24,823-Speed 2631.75 samples/sec Loss 9.3421 LearningRate 0.0471 Epoch: 6 Global Step: 260450 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:00:28,714-Speed 2632.75 samples/sec Loss 9.2498 LearningRate 0.0471 Epoch: 6 Global Step: 260460 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:00:32,603-Speed 2633.11 samples/sec Loss 9.1950 LearningRate 0.0471 Epoch: 6 Global Step: 260470 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:00:36,494-Speed 2632.64 samples/sec Loss 9.2662 LearningRate 0.0471 Epoch: 6 Global Step: 260480 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:00:40,387-Speed 2630.79 samples/sec Loss 9.2957 LearningRate 0.0471 Epoch: 6 Global Step: 260490 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:00:44,277-Speed 2632.84 samples/sec Loss 9.2002 LearningRate 0.0471 Epoch: 6 Global Step: 260500 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:00:48,174-Speed 2628.18 samples/sec Loss 9.1870 LearningRate 0.0471 Epoch: 6 Global Step: 260510 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:00:52,070-Speed 2628.92 samples/sec Loss 9.2567 LearningRate 0.0471 Epoch: 6 Global Step: 260520 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:00:55,965-Speed 2629.88 samples/sec Loss 9.2238 LearningRate 0.0471 Epoch: 6 Global Step: 260530 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:00:59,828-Speed 2651.85 samples/sec Loss 10.1081 LearningRate 0.0471 Epoch: 6 Global Step: 260540 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:01:03,704-Speed 2642.03 samples/sec Loss 9.9400 LearningRate 0.0470 Epoch: 6 Global Step: 260550 Fp16 Grad Scale: 8192 Required: 64 hours
Training: 2022-04-14 01:01:07,597-Speed 2631.67 samples/sec Loss 9.6538 LearningRate 0.0470 Epoch: 6 Global Step: 260560 Fp16 Grad Scale: 8192 Required: 64 hours
Training: 2022-04-14 01:01:11,494-Speed 2627.88 samples/sec Loss 9.3360 LearningRate 0.0470 Epoch: 6 Global Step: 260570 Fp16 Grad Scale: 8192 Required: 64 hours
Training: 2022-04-14 01:01:15,391-Speed 2628.22 samples/sec Loss 9.3206 LearningRate 0.0470 Epoch: 6 Global Step: 260580 Fp16 Grad Scale: 8192 Required: 64 hours
Training: 2022-04-14 01:01:19,295-Speed 2623.27 samples/sec Loss 9.4960 LearningRate 0.0470 Epoch: 6 Global Step: 260590 Fp16 Grad Scale: 8192 Required: 64 hours
Training: 2022-04-14 01:01:23,193-Speed 2627.62 samples/sec Loss 9.1986 LearningRate 0.0470 Epoch: 6 Global Step: 260600 Fp16 Grad Scale: 8192 Required: 64 hours
Training: 2022-04-14 01:01:27,092-Speed 2627.50 samples/sec Loss 9.1352 LearningRate 0.0470 Epoch: 6 Global Step: 260610 Fp16 Grad Scale: 8192 Required: 64 hours
Training: 2022-04-14 01:01:30,984-Speed 2631.69 samples/sec Loss 9.2217 LearningRate 0.0470 Epoch: 6 Global Step: 260620 Fp16 Grad Scale: 8192 Required: 64 hours
Training: 2022-04-14 01:01:34,889-Speed 2622.87 samples/sec Loss 9.2340 LearningRate 0.0470 Epoch: 6 Global Step: 260630 Fp16 Grad Scale: 8192 Required: 64 hours
Training: 2022-04-14 01:01:38,786-Speed 2628.61 samples/sec Loss 9.3027 LearningRate 0.0470 Epoch: 6 Global Step: 260640 Fp16 Grad Scale: 8192 Required: 64 hours
Training: 2022-04-14 01:01:42,676-Speed 2632.57 samples/sec Loss 9.0990 LearningRate 0.0470 Epoch: 6 Global Step: 260650 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:01:46,568-Speed 2631.42 samples/sec Loss 9.2187 LearningRate 0.0470 Epoch: 6 Global Step: 260660 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:01:50,475-Speed 2621.66 samples/sec Loss 9.2319 LearningRate 0.0470 Epoch: 6 Global Step: 260670 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:01:54,370-Speed 2629.47 samples/sec Loss 9.2991 LearningRate 0.0470 Epoch: 6 Global Step: 260680 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:01:58,262-Speed 2632.58 samples/sec Loss 9.3289 LearningRate 0.0470 Epoch: 6 Global Step: 260690 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:02:02,221-Speed 2586.70 samples/sec Loss 9.2110 LearningRate 0.0470 Epoch: 6 Global Step: 260700 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:02:06,208-Speed 2569.26 samples/sec Loss 9.1589 LearningRate 0.0470 Epoch: 6 Global Step: 260710 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:02:10,117-Speed 2620.39 samples/sec Loss 9.1763 LearningRate 0.0470 Epoch: 6 Global Step: 260720 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:02:14,011-Speed 2630.02 samples/sec Loss 9.1826 LearningRate 0.0470 Epoch: 6 Global Step: 260730 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:02:17,912-Speed 2625.77 samples/sec Loss 9.1789 LearningRate 0.0470 Epoch: 6 Global Step: 260740 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:02:21,805-Speed 2631.00 samples/sec Loss 9.2150 LearningRate 0.0470 Epoch: 6 Global Step: 260750 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:02:25,697-Speed 2631.81 samples/sec Loss 9.2670 LearningRate 0.0470 Epoch: 6 Global Step: 260760 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:02:29,588-Speed 2631.99 samples/sec Loss 9.2770 LearningRate 0.0470 Epoch: 6 Global Step: 260770 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:02:33,483-Speed 2630.03 samples/sec Loss 9.2247 LearningRate 0.0470 Epoch: 6 Global Step: 260780 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:02:37,374-Speed 2632.99 samples/sec Loss 9.0764 LearningRate 0.0470 Epoch: 6 Global Step: 260790 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:02:41,270-Speed 2628.85 samples/sec Loss 9.2535 LearningRate 0.0470 Epoch: 6 Global Step: 260800 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:02:45,169-Speed 2626.37 samples/sec Loss 9.2443 LearningRate 0.0470 Epoch: 6 Global Step: 260810 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:02:49,073-Speed 2623.89 samples/sec Loss 9.3572 LearningRate 0.0470 Epoch: 6 Global Step: 260820 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:02:52,975-Speed 2624.63 samples/sec Loss 9.2036 LearningRate 0.0470 Epoch: 6 Global Step: 260830 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:02:56,870-Speed 2630.47 samples/sec Loss 9.1784 LearningRate 0.0470 Epoch: 6 Global Step: 260840 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:03:00,794-Speed 2610.29 samples/sec Loss 9.2854 LearningRate 0.0470 Epoch: 6 Global Step: 260850 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:03:04,695-Speed 2624.93 samples/sec Loss 9.2734 LearningRate 0.0470 Epoch: 6 Global Step: 260860 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:03:08,606-Speed 2618.86 samples/sec Loss 9.3007 LearningRate 0.0470 Epoch: 6 Global Step: 260870 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:03:12,540-Speed 2603.65 samples/sec Loss 9.2543 LearningRate 0.0470 Epoch: 6 Global Step: 260880 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:03:16,441-Speed 2625.87 samples/sec Loss 9.1682 LearningRate 0.0470 Epoch: 6 Global Step: 260890 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:03:20,344-Speed 2624.94 samples/sec Loss 9.3656 LearningRate 0.0470 Epoch: 6 Global Step: 260900 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:03:24,242-Speed 2627.34 samples/sec Loss 9.3395 LearningRate 0.0470 Epoch: 6 Global Step: 260910 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:03:28,145-Speed 2624.32 samples/sec Loss 9.3962 LearningRate 0.0470 Epoch: 6 Global Step: 260920 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:03:32,045-Speed 2626.68 samples/sec Loss 9.2309 LearningRate 0.0470 Epoch: 6 Global Step: 260930 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:03:35,947-Speed 2624.68 samples/sec Loss 9.1626 LearningRate 0.0470 Epoch: 6 Global Step: 260940 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:03:39,848-Speed 2625.28 samples/sec Loss 9.3604 LearningRate 0.0470 Epoch: 6 Global Step: 260950 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:03:43,744-Speed 2629.07 samples/sec Loss 9.1980 LearningRate 0.0470 Epoch: 6 Global Step: 260960 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:03:47,639-Speed 2630.12 samples/sec Loss 9.2100 LearningRate 0.0470 Epoch: 6 Global Step: 260970 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:03:51,579-Speed 2599.49 samples/sec Loss 9.2555 LearningRate 0.0470 Epoch: 6 Global Step: 260980 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:03:55,478-Speed 2627.57 samples/sec Loss 9.3299 LearningRate 0.0470 Epoch: 6 Global Step: 260990 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:03:59,376-Speed 2627.13 samples/sec Loss 9.3467 LearningRate 0.0470 Epoch: 6 Global Step: 261000 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:04:03,276-Speed 2626.57 samples/sec Loss 9.2966 LearningRate 0.0470 Epoch: 6 Global Step: 261010 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:04:07,172-Speed 2629.01 samples/sec Loss 9.3155 LearningRate 0.0470 Epoch: 6 Global Step: 261020 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:04:11,079-Speed 2621.26 samples/sec Loss 9.2296 LearningRate 0.0470 Epoch: 6 Global Step: 261030 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:04:14,987-Speed 2620.68 samples/sec Loss 9.1885 LearningRate 0.0470 Epoch: 6 Global Step: 261040 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:04:18,876-Speed 2634.35 samples/sec Loss 9.2891 LearningRate 0.0470 Epoch: 6 Global Step: 261050 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:04:22,799-Speed 2611.06 samples/sec Loss 9.2019 LearningRate 0.0470 Epoch: 6 Global Step: 261060 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:04:26,721-Speed 2611.79 samples/sec Loss 9.2837 LearningRate 0.0470 Epoch: 6 Global Step: 261070 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:04:30,633-Speed 2618.34 samples/sec Loss 9.3010 LearningRate 0.0470 Epoch: 6 Global Step: 261080 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:04:34,555-Speed 2611.18 samples/sec Loss 9.3459 LearningRate 0.0470 Epoch: 6 Global Step: 261090 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:04:38,452-Speed 2628.61 samples/sec Loss 9.2476 LearningRate 0.0470 Epoch: 6 Global Step: 261100 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:04:42,350-Speed 2627.47 samples/sec Loss 9.2778 LearningRate 0.0470 Epoch: 6 Global Step: 261110 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:04:46,275-Speed 2609.09 samples/sec Loss 9.2627 LearningRate 0.0470 Epoch: 6 Global Step: 261120 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:04:50,292-Speed 2551.31 samples/sec Loss 9.3790 LearningRate 0.0470 Epoch: 6 Global Step: 261130 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:04:54,182-Speed 2632.67 samples/sec Loss 9.1724 LearningRate 0.0470 Epoch: 6 Global Step: 261140 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:04:58,080-Speed 2627.57 samples/sec Loss 9.1947 LearningRate 0.0470 Epoch: 6 Global Step: 261150 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:05:02,002-Speed 2611.48 samples/sec Loss 9.2995 LearningRate 0.0469 Epoch: 6 Global Step: 261160 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:05:05,883-Speed 2639.20 samples/sec Loss 9.1701 LearningRate 0.0469 Epoch: 6 Global Step: 261170 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:05:09,779-Speed 2628.88 samples/sec Loss 9.3574 LearningRate 0.0469 Epoch: 6 Global Step: 261180 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:05:13,682-Speed 2624.54 samples/sec Loss 9.2555 LearningRate 0.0469 Epoch: 6 Global Step: 261190 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:05:17,579-Speed 2628.59 samples/sec Loss 9.3104 LearningRate 0.0469 Epoch: 6 Global Step: 261200 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:05:21,478-Speed 2626.86 samples/sec Loss 9.2424 LearningRate 0.0469 Epoch: 6 Global Step: 261210 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:05:25,408-Speed 2607.10 samples/sec Loss 9.0989 LearningRate 0.0469 Epoch: 6 Global Step: 261220 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:05:29,321-Speed 2617.60 samples/sec Loss 9.1232 LearningRate 0.0469 Epoch: 6 Global Step: 261230 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:05:33,223-Speed 2625.11 samples/sec Loss 9.2238 LearningRate 0.0469 Epoch: 6 Global Step: 261240 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:05:37,125-Speed 2624.44 samples/sec Loss 9.3301 LearningRate 0.0469 Epoch: 6 Global Step: 261250 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:05:41,031-Speed 2622.25 samples/sec Loss 9.1405 LearningRate 0.0469 Epoch: 6 Global Step: 261260 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:05:44,913-Speed 2638.56 samples/sec Loss 9.0649 LearningRate 0.0469 Epoch: 6 Global Step: 261270 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:05:48,810-Speed 2628.52 samples/sec Loss 9.2868 LearningRate 0.0469 Epoch: 6 Global Step: 261280 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:05:52,712-Speed 2624.93 samples/sec Loss 9.2370 LearningRate 0.0469 Epoch: 6 Global Step: 261290 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:05:56,616-Speed 2623.45 samples/sec Loss 9.2468 LearningRate 0.0469 Epoch: 6 Global Step: 261300 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:06:00,516-Speed 2626.45 samples/sec Loss 9.1639 LearningRate 0.0469 Epoch: 6 Global Step: 261310 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:06:04,433-Speed 2614.37 samples/sec Loss 9.1618 LearningRate 0.0469 Epoch: 6 Global Step: 261320 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:06:08,334-Speed 2625.82 samples/sec Loss 9.2549 LearningRate 0.0469 Epoch: 6 Global Step: 261330 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:06:12,249-Speed 2616.30 samples/sec Loss 9.2546 LearningRate 0.0469 Epoch: 6 Global Step: 261340 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:06:16,155-Speed 2621.81 samples/sec Loss 9.4286 LearningRate 0.0469 Epoch: 6 Global Step: 261350 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:06:20,064-Speed 2619.99 samples/sec Loss 9.2630 LearningRate 0.0469 Epoch: 6 Global Step: 261360 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:06:23,970-Speed 2622.72 samples/sec Loss 9.2905 LearningRate 0.0469 Epoch: 6 Global Step: 261370 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:06:27,875-Speed 2623.17 samples/sec Loss 9.1993 LearningRate 0.0469 Epoch: 6 Global Step: 261380 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:06:31,764-Speed 2633.62 samples/sec Loss 9.2192 LearningRate 0.0469 Epoch: 6 Global Step: 261390 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:06:35,673-Speed 2620.35 samples/sec Loss 9.3156 LearningRate 0.0469 Epoch: 6 Global Step: 261400 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:06:39,620-Speed 2594.75 samples/sec Loss 9.3245 LearningRate 0.0469 Epoch: 6 Global Step: 261410 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:06:43,540-Speed 2612.94 samples/sec Loss 9.3047 LearningRate 0.0469 Epoch: 6 Global Step: 261420 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:06:47,452-Speed 2617.67 samples/sec Loss 9.1664 LearningRate 0.0469 Epoch: 6 Global Step: 261430 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:06:51,373-Speed 2612.20 samples/sec Loss 9.1545 LearningRate 0.0469 Epoch: 6 Global Step: 261440 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:06:55,283-Speed 2619.32 samples/sec Loss 9.1567 LearningRate 0.0469 Epoch: 6 Global Step: 261450 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:06:59,193-Speed 2619.37 samples/sec Loss 9.2579 LearningRate 0.0469 Epoch: 6 Global Step: 261460 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:07:03,103-Speed 2621.95 samples/sec Loss 9.2190 LearningRate 0.0469 Epoch: 6 Global Step: 261470 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:07:07,009-Speed 2621.89 samples/sec Loss 9.2224 LearningRate 0.0469 Epoch: 6 Global Step: 261480 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:07:10,892-Speed 2637.78 samples/sec Loss 9.2570 LearningRate 0.0469 Epoch: 6 Global Step: 261490 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:07:14,793-Speed 2625.53 samples/sec Loss 9.3710 LearningRate 0.0469 Epoch: 6 Global Step: 261500 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:07:18,693-Speed 2626.00 samples/sec Loss 9.2780 LearningRate 0.0469 Epoch: 6 Global Step: 261510 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:07:22,593-Speed 2626.02 samples/sec Loss 9.1665 LearningRate 0.0469 Epoch: 6 Global Step: 261520 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:07:26,497-Speed 2623.99 samples/sec Loss 9.1884 LearningRate 0.0469 Epoch: 6 Global Step: 261530 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:07:30,399-Speed 2624.83 samples/sec Loss 9.1978 LearningRate 0.0469 Epoch: 6 Global Step: 261540 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:07:34,300-Speed 2625.85 samples/sec Loss 9.4194 LearningRate 0.0469 Epoch: 6 Global Step: 261550 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:07:38,202-Speed 2624.98 samples/sec Loss 9.2858 LearningRate 0.0469 Epoch: 6 Global Step: 261560 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:07:42,187-Speed 2570.45 samples/sec Loss 9.1380 LearningRate 0.0469 Epoch: 6 Global Step: 261570 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:07:46,087-Speed 2625.56 samples/sec Loss 9.2373 LearningRate 0.0469 Epoch: 6 Global Step: 261580 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:07:49,991-Speed 2623.48 samples/sec Loss 9.1890 LearningRate 0.0469 Epoch: 6 Global Step: 261590 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:07:53,901-Speed 2619.34 samples/sec Loss 9.1681 LearningRate 0.0469 Epoch: 6 Global Step: 261600 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:07:57,801-Speed 2626.92 samples/sec Loss 9.0903 LearningRate 0.0469 Epoch: 6 Global Step: 261610 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:08:01,708-Speed 2621.20 samples/sec Loss 9.3436 LearningRate 0.0469 Epoch: 6 Global Step: 261620 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:08:05,619-Speed 2619.04 samples/sec Loss 9.2264 LearningRate 0.0469 Epoch: 6 Global Step: 261630 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:08:09,526-Speed 2621.29 samples/sec Loss 9.2654 LearningRate 0.0469 Epoch: 6 Global Step: 261640 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:08:13,426-Speed 2626.25 samples/sec Loss 9.2074 LearningRate 0.0469 Epoch: 6 Global Step: 261650 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:08:17,332-Speed 2622.22 samples/sec Loss 9.1564 LearningRate 0.0469 Epoch: 6 Global Step: 261660 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:08:21,240-Speed 2621.10 samples/sec Loss 9.3567 LearningRate 0.0469 Epoch: 6 Global Step: 261670 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:08:25,152-Speed 2617.70 samples/sec Loss 9.2837 LearningRate 0.0469 Epoch: 6 Global Step: 261680 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:08:29,061-Speed 2620.81 samples/sec Loss 9.2734 LearningRate 0.0469 Epoch: 6 Global Step: 261690 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:08:32,975-Speed 2616.72 samples/sec Loss 9.2225 LearningRate 0.0469 Epoch: 6 Global Step: 261700 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:08:36,877-Speed 2624.56 samples/sec Loss 9.3101 LearningRate 0.0469 Epoch: 6 Global Step: 261710 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:08:40,778-Speed 2625.34 samples/sec Loss 9.3182 LearningRate 0.0469 Epoch: 6 Global Step: 261720 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:08:44,689-Speed 2619.41 samples/sec Loss 9.1571 LearningRate 0.0469 Epoch: 6 Global Step: 261730 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:08:48,601-Speed 2618.24 samples/sec Loss 9.0678 LearningRate 0.0469 Epoch: 6 Global Step: 261740 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:08:52,505-Speed 2624.39 samples/sec Loss 9.2497 LearningRate 0.0469 Epoch: 6 Global Step: 261750 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:08:56,391-Speed 2635.47 samples/sec Loss 9.2452 LearningRate 0.0468 Epoch: 6 Global Step: 261760 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:09:00,314-Speed 2610.94 samples/sec Loss 9.2832 LearningRate 0.0468 Epoch: 6 Global Step: 261770 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:09:04,217-Speed 2624.15 samples/sec Loss 9.1378 LearningRate 0.0468 Epoch: 6 Global Step: 261780 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:09:08,171-Speed 2590.08 samples/sec Loss 9.1352 LearningRate 0.0468 Epoch: 6 Global Step: 261790 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:09:12,073-Speed 2624.75 samples/sec Loss 9.0706 LearningRate 0.0468 Epoch: 6 Global Step: 261800 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:09:15,976-Speed 2624.41 samples/sec Loss 9.2424 LearningRate 0.0468 Epoch: 6 Global Step: 261810 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:09:19,877-Speed 2625.27 samples/sec Loss 9.2490 LearningRate 0.0468 Epoch: 6 Global Step: 261820 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:09:23,780-Speed 2624.35 samples/sec Loss 9.2125 LearningRate 0.0468 Epoch: 6 Global Step: 261830 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:09:27,686-Speed 2622.34 samples/sec Loss 9.1749 LearningRate 0.0468 Epoch: 6 Global Step: 261840 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:09:31,596-Speed 2620.01 samples/sec Loss 9.3120 LearningRate 0.0468 Epoch: 6 Global Step: 261850 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:09:35,499-Speed 2624.01 samples/sec Loss 9.2362 LearningRate 0.0468 Epoch: 6 Global Step: 261860 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:09:39,402-Speed 2624.26 samples/sec Loss 9.2307 LearningRate 0.0468 Epoch: 6 Global Step: 261870 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:09:43,304-Speed 2624.67 samples/sec Loss 9.2804 LearningRate 0.0468 Epoch: 6 Global Step: 261880 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:09:47,201-Speed 2628.00 samples/sec Loss 9.2343 LearningRate 0.0468 Epoch: 6 Global Step: 261890 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:09:51,079-Speed 2641.50 samples/sec Loss 9.2482 LearningRate 0.0468 Epoch: 6 Global Step: 261900 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:09:54,975-Speed 2629.20 samples/sec Loss 9.4344 LearningRate 0.0468 Epoch: 6 Global Step: 261910 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:09:58,882-Speed 2621.52 samples/sec Loss 9.2806 LearningRate 0.0468 Epoch: 6 Global Step: 261920 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:10:02,788-Speed 2622.23 samples/sec Loss 9.2688 LearningRate 0.0468 Epoch: 6 Global Step: 261930 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:10:06,697-Speed 2620.25 samples/sec Loss 9.2624 LearningRate 0.0468 Epoch: 6 Global Step: 261940 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:10:10,607-Speed 2619.64 samples/sec Loss 9.1308 LearningRate 0.0468 Epoch: 6 Global Step: 261950 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:10:14,514-Speed 2621.01 samples/sec Loss 9.2742 LearningRate 0.0468 Epoch: 6 Global Step: 261960 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:10:18,421-Speed 2622.15 samples/sec Loss 9.2846 LearningRate 0.0468 Epoch: 6 Global Step: 261970 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:10:22,332-Speed 2618.59 samples/sec Loss 9.3354 LearningRate 0.0468 Epoch: 6 Global Step: 261980 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:10:26,231-Speed 2626.71 samples/sec Loss 9.1928 LearningRate 0.0468 Epoch: 6 Global Step: 261990 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:10:30,136-Speed 2623.16 samples/sec Loss 9.2115 LearningRate 0.0468 Epoch: 6 Global Step: 262000 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:10:34,019-Speed 2637.73 samples/sec Loss 9.9037 LearningRate 0.0468 Epoch: 6 Global Step: 262010 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:10:37,923-Speed 2623.19 samples/sec Loss 9.6131 LearningRate 0.0468 Epoch: 6 Global Step: 262020 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:10:41,823-Speed 2626.99 samples/sec Loss 9.2629 LearningRate 0.0468 Epoch: 6 Global Step: 262030 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:10:45,746-Speed 2610.79 samples/sec Loss 9.4049 LearningRate 0.0468 Epoch: 6 Global Step: 262040 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:10:49,659-Speed 2617.25 samples/sec Loss 9.2102 LearningRate 0.0468 Epoch: 6 Global Step: 262050 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:10:53,562-Speed 2624.41 samples/sec Loss 9.2468 LearningRate 0.0468 Epoch: 6 Global Step: 262060 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:10:57,461-Speed 2626.83 samples/sec Loss 9.1777 LearningRate 0.0468 Epoch: 6 Global Step: 262070 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:11:01,389-Speed 2607.46 samples/sec Loss 9.3041 LearningRate 0.0468 Epoch: 6 Global Step: 262080 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:11:05,293-Speed 2623.59 samples/sec Loss 9.2939 LearningRate 0.0468 Epoch: 6 Global Step: 262090 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:11:09,196-Speed 2623.70 samples/sec Loss 9.2053 LearningRate 0.0468 Epoch: 6 Global Step: 262100 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:11:13,101-Speed 2623.26 samples/sec Loss 9.2765 LearningRate 0.0468 Epoch: 6 Global Step: 262110 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:11:17,004-Speed 2624.43 samples/sec Loss 9.1356 LearningRate 0.0468 Epoch: 6 Global Step: 262120 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:11:20,913-Speed 2620.51 samples/sec Loss 9.2192 LearningRate 0.0468 Epoch: 6 Global Step: 262130 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:11:24,814-Speed 2625.41 samples/sec Loss 9.2142 LearningRate 0.0468 Epoch: 6 Global Step: 262140 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:11:28,713-Speed 2626.97 samples/sec Loss 9.1812 LearningRate 0.0468 Epoch: 6 Global Step: 262150 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:11:32,623-Speed 2619.17 samples/sec Loss 9.1961 LearningRate 0.0468 Epoch: 6 Global Step: 262160 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:11:36,526-Speed 2623.92 samples/sec Loss 9.3491 LearningRate 0.0468 Epoch: 6 Global Step: 262170 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:11:40,427-Speed 2625.44 samples/sec Loss 9.1941 LearningRate 0.0468 Epoch: 6 Global Step: 262180 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:11:44,331-Speed 2624.45 samples/sec Loss 9.1652 LearningRate 0.0468 Epoch: 6 Global Step: 262190 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:11:48,235-Speed 2623.97 samples/sec Loss 9.3160 LearningRate 0.0468 Epoch: 6 Global Step: 262200 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:11:52,171-Speed 2602.06 samples/sec Loss 9.1676 LearningRate 0.0468 Epoch: 6 Global Step: 262210 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:11:56,086-Speed 2616.80 samples/sec Loss 9.2593 LearningRate 0.0468 Epoch: 6 Global Step: 262220 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:12:00,003-Speed 2614.86 samples/sec Loss 9.3155 LearningRate 0.0468 Epoch: 6 Global Step: 262230 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:12:03,917-Speed 2616.45 samples/sec Loss 9.3277 LearningRate 0.0468 Epoch: 6 Global Step: 262240 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:12:07,842-Speed 2609.75 samples/sec Loss 9.1389 LearningRate 0.0468 Epoch: 6 Global Step: 262250 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:12:11,738-Speed 2629.21 samples/sec Loss 9.1594 LearningRate 0.0468 Epoch: 6 Global Step: 262260 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:12:15,646-Speed 2621.08 samples/sec Loss 9.2517 LearningRate 0.0468 Epoch: 6 Global Step: 262270 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:12:19,564-Speed 2614.07 samples/sec Loss 9.1566 LearningRate 0.0468 Epoch: 6 Global Step: 262280 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:12:23,485-Speed 2612.55 samples/sec Loss 9.1424 LearningRate 0.0468 Epoch: 6 Global Step: 262290 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:12:27,388-Speed 2624.10 samples/sec Loss 9.1153 LearningRate 0.0468 Epoch: 6 Global Step: 262300 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:12:31,289-Speed 2625.96 samples/sec Loss 9.2618 LearningRate 0.0468 Epoch: 6 Global Step: 262310 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:12:35,195-Speed 2622.68 samples/sec Loss 9.2762 LearningRate 0.0468 Epoch: 6 Global Step: 262320 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:12:39,097-Speed 2624.85 samples/sec Loss 9.2411 LearningRate 0.0468 Epoch: 6 Global Step: 262330 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:12:43,000-Speed 2624.70 samples/sec Loss 9.1421 LearningRate 0.0468 Epoch: 6 Global Step: 262340 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:12:46,908-Speed 2620.58 samples/sec Loss 9.2093 LearningRate 0.0468 Epoch: 6 Global Step: 262350 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:12:50,842-Speed 2603.25 samples/sec Loss 9.2418 LearningRate 0.0468 Epoch: 6 Global Step: 262360 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:12:54,746-Speed 2623.99 samples/sec Loss 9.2634 LearningRate 0.0467 Epoch: 6 Global Step: 262370 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:12:58,646-Speed 2626.28 samples/sec Loss 9.1771 LearningRate 0.0467 Epoch: 6 Global Step: 262380 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:13:02,549-Speed 2624.11 samples/sec Loss 9.1892 LearningRate 0.0467 Epoch: 6 Global Step: 262390 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:13:06,452-Speed 2624.33 samples/sec Loss 9.0898 LearningRate 0.0467 Epoch: 6 Global Step: 262400 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:13:10,356-Speed 2623.74 samples/sec Loss 9.3280 LearningRate 0.0467 Epoch: 6 Global Step: 262410 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:13:14,261-Speed 2623.70 samples/sec Loss 9.2930 LearningRate 0.0467 Epoch: 6 Global Step: 262420 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:13:18,166-Speed 2622.52 samples/sec Loss 9.3030 LearningRate 0.0467 Epoch: 6 Global Step: 262430 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:13:22,068-Speed 2625.13 samples/sec Loss 9.0957 LearningRate 0.0467 Epoch: 6 Global Step: 262440 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:13:25,969-Speed 2625.09 samples/sec Loss 9.2138 LearningRate 0.0467 Epoch: 6 Global Step: 262450 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:13:29,892-Speed 2611.16 samples/sec Loss 9.3274 LearningRate 0.0467 Epoch: 6 Global Step: 262460 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:13:33,799-Speed 2622.06 samples/sec Loss 9.2066 LearningRate 0.0467 Epoch: 6 Global Step: 262470 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:13:37,707-Speed 2620.46 samples/sec Loss 9.2782 LearningRate 0.0467 Epoch: 6 Global Step: 262480 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:13:41,608-Speed 2626.04 samples/sec Loss 9.3124 LearningRate 0.0467 Epoch: 6 Global Step: 262490 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:13:45,513-Speed 2622.47 samples/sec Loss 9.2547 LearningRate 0.0467 Epoch: 6 Global Step: 262500 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:13:49,411-Speed 2627.80 samples/sec Loss 9.1423 LearningRate 0.0467 Epoch: 6 Global Step: 262510 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:13:53,317-Speed 2622.21 samples/sec Loss 9.1490 LearningRate 0.0467 Epoch: 6 Global Step: 262520 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:13:57,220-Speed 2625.00 samples/sec Loss 9.2157 LearningRate 0.0467 Epoch: 6 Global Step: 262530 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:14:01,133-Speed 2616.87 samples/sec Loss 9.1942 LearningRate 0.0467 Epoch: 6 Global Step: 262540 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:14:05,053-Speed 2613.07 samples/sec Loss 9.0535 LearningRate 0.0467 Epoch: 6 Global Step: 262550 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:14:09,053-Speed 2560.60 samples/sec Loss 9.1599 LearningRate 0.0467 Epoch: 6 Global Step: 262560 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:14:13,087-Speed 2539.40 samples/sec Loss 9.2909 LearningRate 0.0467 Epoch: 6 Global Step: 262570 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:14:17,006-Speed 2613.44 samples/sec Loss 9.2658 LearningRate 0.0467 Epoch: 6 Global Step: 262580 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:14:20,906-Speed 2626.63 samples/sec Loss 9.1921 LearningRate 0.0467 Epoch: 6 Global Step: 262590 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:14:24,827-Speed 2611.88 samples/sec Loss 9.1271 LearningRate 0.0467 Epoch: 6 Global Step: 262600 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:14:28,729-Speed 2625.04 samples/sec Loss 9.2491 LearningRate 0.0467 Epoch: 6 Global Step: 262610 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:14:32,632-Speed 2624.52 samples/sec Loss 9.2344 LearningRate 0.0467 Epoch: 6 Global Step: 262620 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:14:36,541-Speed 2619.88 samples/sec Loss 9.2355 LearningRate 0.0467 Epoch: 6 Global Step: 262630 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:14:40,465-Speed 2609.95 samples/sec Loss 9.1555 LearningRate 0.0467 Epoch: 6 Global Step: 262640 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:14:44,378-Speed 2618.14 samples/sec Loss 9.0573 LearningRate 0.0467 Epoch: 6 Global Step: 262650 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:14:48,283-Speed 2622.62 samples/sec Loss 9.1819 LearningRate 0.0467 Epoch: 6 Global Step: 262660 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:14:52,191-Speed 2621.51 samples/sec Loss 9.2511 LearningRate 0.0467 Epoch: 6 Global Step: 262670 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:14:56,091-Speed 2626.24 samples/sec Loss 9.2722 LearningRate 0.0467 Epoch: 6 Global Step: 262680 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:14:59,991-Speed 2626.45 samples/sec Loss 9.2072 LearningRate 0.0467 Epoch: 6 Global Step: 262690 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:15:03,891-Speed 2626.16 samples/sec Loss 9.3122 LearningRate 0.0467 Epoch: 6 Global Step: 262700 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:15:08,267-Speed 2340.54 samples/sec Loss 9.3831 LearningRate 0.0467 Epoch: 6 Global Step: 262710 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:15:12,177-Speed 2618.90 samples/sec Loss 9.1450 LearningRate 0.0467 Epoch: 6 Global Step: 262720 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:15:16,069-Speed 2631.93 samples/sec Loss 9.1126 LearningRate 0.0467 Epoch: 6 Global Step: 262730 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:15:19,970-Speed 2626.04 samples/sec Loss 9.3413 LearningRate 0.0467 Epoch: 6 Global Step: 262740 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:15:23,872-Speed 2625.04 samples/sec Loss 9.3232 LearningRate 0.0467 Epoch: 6 Global Step: 262750 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:15:27,782-Speed 2619.26 samples/sec Loss 9.0779 LearningRate 0.0467 Epoch: 6 Global Step: 262760 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:15:31,683-Speed 2625.34 samples/sec Loss 9.1160 LearningRate 0.0467 Epoch: 6 Global Step: 262770 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:15:35,582-Speed 2627.39 samples/sec Loss 9.1678 LearningRate 0.0467 Epoch: 6 Global Step: 262780 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:15:39,494-Speed 2617.37 samples/sec Loss 9.1487 LearningRate 0.0467 Epoch: 6 Global Step: 262790 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:15:43,425-Speed 2606.13 samples/sec Loss 9.0731 LearningRate 0.0467 Epoch: 6 Global Step: 262800 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:15:47,312-Speed 2635.28 samples/sec Loss 9.2119 LearningRate 0.0467 Epoch: 6 Global Step: 262810 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:15:51,211-Speed 2627.03 samples/sec Loss 9.3134 LearningRate 0.0467 Epoch: 6 Global Step: 262820 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:15:55,110-Speed 2626.81 samples/sec Loss 9.1003 LearningRate 0.0467 Epoch: 6 Global Step: 262830 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:15:59,007-Speed 2629.06 samples/sec Loss 9.2042 LearningRate 0.0467 Epoch: 6 Global Step: 262840 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:16:02,906-Speed 2626.67 samples/sec Loss 9.1625 LearningRate 0.0467 Epoch: 6 Global Step: 262850 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:16:06,795-Speed 2633.15 samples/sec Loss 9.2072 LearningRate 0.0467 Epoch: 6 Global Step: 262860 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:16:10,642-Speed 2662.22 samples/sec Loss 10.1797 LearningRate 0.0467 Epoch: 6 Global Step: 262870 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:16:14,543-Speed 2625.98 samples/sec Loss 9.4990 LearningRate 0.0467 Epoch: 6 Global Step: 262880 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:16:18,441-Speed 2628.25 samples/sec Loss 9.4066 LearningRate 0.0467 Epoch: 6 Global Step: 262890 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:16:22,340-Speed 2626.95 samples/sec Loss 9.5263 LearningRate 0.0467 Epoch: 6 Global Step: 262900 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:16:26,230-Speed 2633.31 samples/sec Loss 9.5334 LearningRate 0.0467 Epoch: 6 Global Step: 262910 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:16:30,132-Speed 2624.32 samples/sec Loss 9.5311 LearningRate 0.0467 Epoch: 6 Global Step: 262920 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:16:34,041-Speed 2620.22 samples/sec Loss 9.2314 LearningRate 0.0467 Epoch: 6 Global Step: 262930 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:16:37,938-Speed 2628.39 samples/sec Loss 9.1933 LearningRate 0.0467 Epoch: 6 Global Step: 262940 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:16:41,838-Speed 2626.43 samples/sec Loss 9.1598 LearningRate 0.0467 Epoch: 6 Global Step: 262950 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:16:45,744-Speed 2621.92 samples/sec Loss 9.1579 LearningRate 0.0467 Epoch: 6 Global Step: 262960 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:16:49,664-Speed 2613.14 samples/sec Loss 9.1998 LearningRate 0.0467 Epoch: 6 Global Step: 262970 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:16:53,575-Speed 2618.87 samples/sec Loss 9.1647 LearningRate 0.0466 Epoch: 6 Global Step: 262980 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:16:57,478-Speed 2624.65 samples/sec Loss 9.3311 LearningRate 0.0466 Epoch: 6 Global Step: 262990 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:17:01,370-Speed 2631.97 samples/sec Loss 9.4422 LearningRate 0.0466 Epoch: 6 Global Step: 263000 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:17:05,276-Speed 2621.77 samples/sec Loss 9.4562 LearningRate 0.0466 Epoch: 6 Global Step: 263010 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:17:09,173-Speed 2628.34 samples/sec Loss 9.1567 LearningRate 0.0466 Epoch: 6 Global Step: 263020 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:17:13,068-Speed 2629.88 samples/sec Loss 9.3790 LearningRate 0.0466 Epoch: 6 Global Step: 263030 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:17:16,962-Speed 2629.64 samples/sec Loss 9.1704 LearningRate 0.0466 Epoch: 6 Global Step: 263040 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:17:20,858-Speed 2629.48 samples/sec Loss 9.2678 LearningRate 0.0466 Epoch: 6 Global Step: 263050 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:17:24,759-Speed 2625.90 samples/sec Loss 9.2913 LearningRate 0.0466 Epoch: 6 Global Step: 263060 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:17:28,677-Speed 2614.23 samples/sec Loss 9.1328 LearningRate 0.0466 Epoch: 6 Global Step: 263070 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:17:32,579-Speed 2625.29 samples/sec Loss 9.1627 LearningRate 0.0466 Epoch: 6 Global Step: 263080 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:17:36,482-Speed 2624.20 samples/sec Loss 9.1861 LearningRate 0.0466 Epoch: 6 Global Step: 263090 Fp16 Grad Scale: 16384 Required: 64 hours
Training: 2022-04-14 01:17:40,383-Speed 2625.24 samples/sec Loss 9.3165 LearningRate 0.0466 Epoch: 6 Global Step: 263100 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:17:44,335-Speed 2591.93 samples/sec Loss 9.3166 LearningRate 0.0466 Epoch: 6 Global Step: 263110 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:17:48,230-Speed 2630.02 samples/sec Loss 9.0572 LearningRate 0.0466 Epoch: 6 Global Step: 263120 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:17:52,247-Speed 2549.74 samples/sec Loss 9.2407 LearningRate 0.0466 Epoch: 6 Global Step: 263130 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:17:56,149-Speed 2625.59 samples/sec Loss 9.1151 LearningRate 0.0466 Epoch: 6 Global Step: 263140 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:18:00,047-Speed 2627.37 samples/sec Loss 9.2782 LearningRate 0.0466 Epoch: 6 Global Step: 263150 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:18:03,941-Speed 2630.13 samples/sec Loss 9.2057 LearningRate 0.0466 Epoch: 6 Global Step: 263160 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:18:07,836-Speed 2629.44 samples/sec Loss 9.2333 LearningRate 0.0466 Epoch: 6 Global Step: 263170 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:18:11,736-Speed 2626.65 samples/sec Loss 9.3420 LearningRate 0.0466 Epoch: 6 Global Step: 263180 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:18:15,639-Speed 2624.46 samples/sec Loss 9.2935 LearningRate 0.0466 Epoch: 6 Global Step: 263190 Fp16 Grad Scale: 32768 Required: 64 hours
Training: 2022-04-14 01:18:19,637-Speed 2561.94 samples/sec Loss 9.3636 LearningRate 0.0466 Epoch: 6 Global Step: 263200 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:18:23,534-Speed 2628.75 samples/sec Loss 9.2897 LearningRate 0.0466 Epoch: 6 Global Step: 263210 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:18:27,438-Speed 2623.32 samples/sec Loss 9.2388 LearningRate 0.0466 Epoch: 6 Global Step: 263220 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:18:31,334-Speed 2628.80 samples/sec Loss 9.3316 LearningRate 0.0466 Epoch: 6 Global Step: 263230 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:18:35,235-Speed 2625.88 samples/sec Loss 9.0562 LearningRate 0.0466 Epoch: 6 Global Step: 263240 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:18:39,144-Speed 2620.52 samples/sec Loss 9.2089 LearningRate 0.0466 Epoch: 6 Global Step: 263250 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:18:43,049-Speed 2622.94 samples/sec Loss 9.2495 LearningRate 0.0466 Epoch: 6 Global Step: 263260 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:18:46,992-Speed 2597.81 samples/sec Loss 9.2239 LearningRate 0.0466 Epoch: 6 Global Step: 263270 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:18:50,903-Speed 2619.07 samples/sec Loss 9.2950 LearningRate 0.0466 Epoch: 6 Global Step: 263280 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:18:54,822-Speed 2613.60 samples/sec Loss 9.2554 LearningRate 0.0466 Epoch: 6 Global Step: 263290 Fp16 Grad Scale: 65536 Required: 64 hours
Training: 2022-04-14 01:18:58,716-Speed 2630.79 samples/sec Loss 9.2371 LearningRate 0.0466 Epoch: 6 Global Step: 263300 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:19:02,621-Speed 2623.13 samples/sec Loss 9.3310 LearningRate 0.0466 Epoch: 6 Global Step: 263310 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:19:06,530-Speed 2620.03 samples/sec Loss 9.2226 LearningRate 0.0466 Epoch: 6 Global Step: 263320 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:19:10,428-Speed 2627.42 samples/sec Loss 9.2552 LearningRate 0.0466 Epoch: 6 Global Step: 263330 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:19:14,338-Speed 2619.62 samples/sec Loss 9.1275 LearningRate 0.0466 Epoch: 6 Global Step: 263340 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:19:18,255-Speed 2614.25 samples/sec Loss 9.2388 LearningRate 0.0466 Epoch: 6 Global Step: 263350 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:19:22,160-Speed 2623.58 samples/sec Loss 9.3164 LearningRate 0.0466 Epoch: 6 Global Step: 263360 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:19:26,065-Speed 2623.12 samples/sec Loss 9.1982 LearningRate 0.0466 Epoch: 6 Global Step: 263370 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:19:29,985-Speed 2612.35 samples/sec Loss 9.2265 LearningRate 0.0466 Epoch: 6 Global Step: 263380 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:19:33,889-Speed 2623.62 samples/sec Loss 9.3696 LearningRate 0.0466 Epoch: 6 Global Step: 263390 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:19:37,787-Speed 2627.71 samples/sec Loss 9.2311 LearningRate 0.0466 Epoch: 6 Global Step: 263400 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:19:41,684-Speed 2628.17 samples/sec Loss 9.1499 LearningRate 0.0466 Epoch: 6 Global Step: 263410 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:19:45,580-Speed 2629.34 samples/sec Loss 9.3494 LearningRate 0.0466 Epoch: 6 Global Step: 263420 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:19:49,491-Speed 2618.87 samples/sec Loss 9.2067 LearningRate 0.0466 Epoch: 6 Global Step: 263430 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:19:53,388-Speed 2628.01 samples/sec Loss 9.1828 LearningRate 0.0466 Epoch: 6 Global Step: 263440 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:19:57,319-Speed 2605.50 samples/sec Loss 9.1426 LearningRate 0.0466 Epoch: 6 Global Step: 263450 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:20:01,218-Speed 2627.24 samples/sec Loss 9.0667 LearningRate 0.0466 Epoch: 6 Global Step: 263460 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:20:05,120-Speed 2624.68 samples/sec Loss 9.2244 LearningRate 0.0466 Epoch: 6 Global Step: 263470 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:20:09,036-Speed 2616.42 samples/sec Loss 9.1824 LearningRate 0.0466 Epoch: 6 Global Step: 263480 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:20:12,942-Speed 2621.79 samples/sec Loss 9.2147 LearningRate 0.0466 Epoch: 6 Global Step: 263490 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:20:16,834-Speed 2632.43 samples/sec Loss 9.1513 LearningRate 0.0466 Epoch: 6 Global Step: 263500 Fp16 Grad Scale: 524288 Required: 64 hours
Training: 2022-04-14 01:20:20,709-Speed 2642.66 samples/sec Loss 9.1465 LearningRate 0.0466 Epoch: 6 Global Step: 263510 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:20:24,607-Speed 2628.44 samples/sec Loss 9.0161 LearningRate 0.0466 Epoch: 6 Global Step: 263520 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:20:28,508-Speed 2625.53 samples/sec Loss 9.1992 LearningRate 0.0466 Epoch: 6 Global Step: 263530 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:20:32,414-Speed 2621.57 samples/sec Loss 9.2987 LearningRate 0.0466 Epoch: 6 Global Step: 263540 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:20:36,309-Speed 2629.59 samples/sec Loss 9.2718 LearningRate 0.0466 Epoch: 6 Global Step: 263550 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:20:40,204-Speed 2630.16 samples/sec Loss 9.1791 LearningRate 0.0466 Epoch: 6 Global Step: 263560 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:20:44,087-Speed 2638.09 samples/sec Loss 9.1690 LearningRate 0.0466 Epoch: 6 Global Step: 263570 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:20:47,984-Speed 2628.24 samples/sec Loss 9.2729 LearningRate 0.0465 Epoch: 6 Global Step: 263580 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:20:51,878-Speed 2630.43 samples/sec Loss 9.2890 LearningRate 0.0465 Epoch: 6 Global Step: 263590 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:20:55,771-Speed 2630.34 samples/sec Loss 9.0941 LearningRate 0.0465 Epoch: 6 Global Step: 263600 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:20:59,665-Speed 2630.36 samples/sec Loss 9.1732 LearningRate 0.0465 Epoch: 6 Global Step: 263610 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:21:03,579-Speed 2616.74 samples/sec Loss 9.1104 LearningRate 0.0465 Epoch: 6 Global Step: 263620 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:21:07,489-Speed 2619.63 samples/sec Loss 9.1513 LearningRate 0.0465 Epoch: 6 Global Step: 263630 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:21:11,382-Speed 2630.60 samples/sec Loss 9.1482 LearningRate 0.0465 Epoch: 6 Global Step: 263640 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:21:15,282-Speed 2627.17 samples/sec Loss 9.1857 LearningRate 0.0465 Epoch: 6 Global Step: 263650 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:21:19,204-Speed 2611.49 samples/sec Loss 9.2532 LearningRate 0.0465 Epoch: 6 Global Step: 263660 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:21:23,101-Speed 2628.41 samples/sec Loss 9.1827 LearningRate 0.0465 Epoch: 6 Global Step: 263670 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:21:26,995-Speed 2630.83 samples/sec Loss 9.2047 LearningRate 0.0465 Epoch: 6 Global Step: 263680 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:21:30,888-Speed 2631.14 samples/sec Loss 9.1886 LearningRate 0.0465 Epoch: 6 Global Step: 263690 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:21:34,783-Speed 2629.17 samples/sec Loss 9.2098 LearningRate 0.0465 Epoch: 6 Global Step: 263700 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:21:38,678-Speed 2629.63 samples/sec Loss 9.1538 LearningRate 0.0465 Epoch: 6 Global Step: 263710 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:21:42,556-Speed 2641.26 samples/sec Loss 9.2764 LearningRate 0.0465 Epoch: 6 Global Step: 263720 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:21:46,474-Speed 2614.20 samples/sec Loss 9.1297 LearningRate 0.0465 Epoch: 6 Global Step: 263730 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:21:50,371-Speed 2628.60 samples/sec Loss 9.2397 LearningRate 0.0465 Epoch: 6 Global Step: 263740 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:21:54,266-Speed 2629.43 samples/sec Loss 9.2802 LearningRate 0.0465 Epoch: 6 Global Step: 263750 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:21:58,168-Speed 2625.17 samples/sec Loss 9.2292 LearningRate 0.0465 Epoch: 6 Global Step: 263760 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:22:02,062-Speed 2630.14 samples/sec Loss 9.1145 LearningRate 0.0465 Epoch: 6 Global Step: 263770 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:22:05,958-Speed 2628.96 samples/sec Loss 9.2279 LearningRate 0.0465 Epoch: 6 Global Step: 263780 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:22:09,855-Speed 2628.64 samples/sec Loss 9.1273 LearningRate 0.0465 Epoch: 6 Global Step: 263790 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:22:13,750-Speed 2629.62 samples/sec Loss 9.3160 LearningRate 0.0465 Epoch: 6 Global Step: 263800 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:22:17,647-Speed 2628.12 samples/sec Loss 9.2509 LearningRate 0.0465 Epoch: 6 Global Step: 263810 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:22:21,589-Speed 2601.57 samples/sec Loss 9.1596 LearningRate 0.0465 Epoch: 6 Global Step: 263820 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:22:25,513-Speed 2610.71 samples/sec Loss 9.2304 LearningRate 0.0465 Epoch: 6 Global Step: 263830 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:22:29,408-Speed 2629.53 samples/sec Loss 9.1339 LearningRate 0.0465 Epoch: 6 Global Step: 263840 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:22:33,336-Speed 2607.33 samples/sec Loss 9.3021 LearningRate 0.0465 Epoch: 6 Global Step: 263850 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:22:37,237-Speed 2625.87 samples/sec Loss 9.2391 LearningRate 0.0465 Epoch: 6 Global Step: 263860 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:22:41,128-Speed 2632.26 samples/sec Loss 9.0957 LearningRate 0.0465 Epoch: 6 Global Step: 263870 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:22:45,019-Speed 2632.52 samples/sec Loss 9.2584 LearningRate 0.0465 Epoch: 6 Global Step: 263880 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:22:48,913-Speed 2630.45 samples/sec Loss 9.1363 LearningRate 0.0465 Epoch: 6 Global Step: 263890 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:22:52,809-Speed 2628.73 samples/sec Loss 9.1756 LearningRate 0.0465 Epoch: 6 Global Step: 263900 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:22:56,709-Speed 2626.85 samples/sec Loss 9.2217 LearningRate 0.0465 Epoch: 6 Global Step: 263910 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:23:00,593-Speed 2637.42 samples/sec Loss 9.3410 LearningRate 0.0465 Epoch: 6 Global Step: 263920 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:23:04,523-Speed 2605.79 samples/sec Loss 9.0970 LearningRate 0.0465 Epoch: 6 Global Step: 263930 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:23:08,423-Speed 2626.20 samples/sec Loss 9.0977 LearningRate 0.0465 Epoch: 6 Global Step: 263940 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:23:12,412-Speed 2568.33 samples/sec Loss 9.1443 LearningRate 0.0465 Epoch: 6 Global Step: 263950 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:23:16,310-Speed 2627.50 samples/sec Loss 9.0902 LearningRate 0.0465 Epoch: 6 Global Step: 263960 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:23:20,211-Speed 2625.43 samples/sec Loss 9.2658 LearningRate 0.0465 Epoch: 6 Global Step: 263970 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:23:24,107-Speed 2629.00 samples/sec Loss 9.0931 LearningRate 0.0465 Epoch: 6 Global Step: 263980 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:23:28,003-Speed 2629.29 samples/sec Loss 9.2630 LearningRate 0.0465 Epoch: 6 Global Step: 263990 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:23:31,902-Speed 2627.01 samples/sec Loss 9.2446 LearningRate 0.0465 Epoch: 6 Global Step: 264000 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:23:35,781-Speed 2640.44 samples/sec Loss 9.3000 LearningRate 0.0465 Epoch: 6 Global Step: 264010 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:23:39,674-Speed 2630.87 samples/sec Loss 9.2545 LearningRate 0.0465 Epoch: 6 Global Step: 264020 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:23:43,591-Speed 2614.48 samples/sec Loss 9.2766 LearningRate 0.0465 Epoch: 6 Global Step: 264030 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:23:47,495-Speed 2623.68 samples/sec Loss 9.1744 LearningRate 0.0465 Epoch: 6 Global Step: 264040 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:23:51,399-Speed 2623.49 samples/sec Loss 9.2429 LearningRate 0.0465 Epoch: 6 Global Step: 264050 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:23:55,302-Speed 2624.38 samples/sec Loss 9.1579 LearningRate 0.0465 Epoch: 6 Global Step: 264060 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:23:59,200-Speed 2627.75 samples/sec Loss 9.1081 LearningRate 0.0465 Epoch: 6 Global Step: 264070 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:24:03,096-Speed 2628.97 samples/sec Loss 9.1781 LearningRate 0.0465 Epoch: 6 Global Step: 264080 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:24:06,989-Speed 2631.02 samples/sec Loss 9.0835 LearningRate 0.0465 Epoch: 6 Global Step: 264090 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:24:10,884-Speed 2629.32 samples/sec Loss 9.3056 LearningRate 0.0465 Epoch: 6 Global Step: 264100 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:24:14,787-Speed 2624.38 samples/sec Loss 9.0850 LearningRate 0.0465 Epoch: 6 Global Step: 264110 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:24:18,779-Speed 2565.37 samples/sec Loss 9.2030 LearningRate 0.0465 Epoch: 6 Global Step: 264120 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:24:22,674-Speed 2629.72 samples/sec Loss 9.1087 LearningRate 0.0465 Epoch: 6 Global Step: 264130 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:24:26,574-Speed 2626.17 samples/sec Loss 9.1901 LearningRate 0.0465 Epoch: 6 Global Step: 264140 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:24:30,486-Speed 2618.66 samples/sec Loss 9.3093 LearningRate 0.0465 Epoch: 6 Global Step: 264150 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:24:34,382-Speed 2628.78 samples/sec Loss 9.2321 LearningRate 0.0465 Epoch: 6 Global Step: 264160 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:24:38,277-Speed 2629.99 samples/sec Loss 9.2625 LearningRate 0.0465 Epoch: 6 Global Step: 264170 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:24:42,174-Speed 2628.16 samples/sec Loss 9.1524 LearningRate 0.0465 Epoch: 6 Global Step: 264180 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:24:46,070-Speed 2629.40 samples/sec Loss 9.2506 LearningRate 0.0464 Epoch: 6 Global Step: 264190 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:24:49,971-Speed 2625.14 samples/sec Loss 9.1457 LearningRate 0.0464 Epoch: 6 Global Step: 264200 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:24:53,850-Speed 2640.79 samples/sec Loss 9.2871 LearningRate 0.0464 Epoch: 6 Global Step: 264210 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:24:57,754-Speed 2623.54 samples/sec Loss 9.1198 LearningRate 0.0464 Epoch: 6 Global Step: 264220 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:25:01,624-Speed 2646.69 samples/sec Loss 9.1953 LearningRate 0.0464 Epoch: 6 Global Step: 264230 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:25:05,517-Speed 2630.35 samples/sec Loss 9.1582 LearningRate 0.0464 Epoch: 6 Global Step: 264240 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:25:09,412-Speed 2630.07 samples/sec Loss 9.2408 LearningRate 0.0464 Epoch: 6 Global Step: 264250 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:25:13,305-Speed 2631.08 samples/sec Loss 9.3129 LearningRate 0.0464 Epoch: 6 Global Step: 264260 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:25:17,202-Speed 2628.74 samples/sec Loss 9.2099 LearningRate 0.0464 Epoch: 6 Global Step: 264270 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:25:21,102-Speed 2626.00 samples/sec Loss 9.2662 LearningRate 0.0464 Epoch: 6 Global Step: 264280 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:25:25,007-Speed 2623.54 samples/sec Loss 9.1563 LearningRate 0.0464 Epoch: 6 Global Step: 264290 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:25:28,914-Speed 2621.72 samples/sec Loss 9.2395 LearningRate 0.0464 Epoch: 6 Global Step: 264300 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:25:32,813-Speed 2626.48 samples/sec Loss 9.0805 LearningRate 0.0464 Epoch: 6 Global Step: 264310 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:25:36,741-Speed 2607.30 samples/sec Loss 9.3564 LearningRate 0.0464 Epoch: 6 Global Step: 264320 Fp16 Grad Scale: 131072 Required: 64 hours
Training: 2022-04-14 01:25:40,638-Speed 2628.80 samples/sec Loss 9.1900 LearningRate 0.0464 Epoch: 6 Global Step: 264330 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:25:44,534-Speed 2628.87 samples/sec Loss 9.2690 LearningRate 0.0464 Epoch: 6 Global Step: 264340 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:25:48,440-Speed 2622.46 samples/sec Loss 9.0355 LearningRate 0.0464 Epoch: 6 Global Step: 264350 Fp16 Grad Scale: 262144 Required: 64 hours
Training: 2022-04-14 01:25:52,339-Speed 2627.14 samples/sec Loss 9.0888 LearningRate 0.0464 Epoch: 6 Global Step: 264360 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:25:56,246-Speed 2620.90 samples/sec Loss 9.2013 LearningRate 0.0464 Epoch: 6 Global Step: 264370 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:26:00,141-Speed 2629.81 samples/sec Loss 9.2131 LearningRate 0.0464 Epoch: 6 Global Step: 264380 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:26:04,040-Speed 2627.37 samples/sec Loss 9.2771 LearningRate 0.0464 Epoch: 6 Global Step: 264390 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:26:07,947-Speed 2621.34 samples/sec Loss 9.1014 LearningRate 0.0464 Epoch: 6 Global Step: 264400 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:26:11,842-Speed 2629.72 samples/sec Loss 9.2060 LearningRate 0.0464 Epoch: 6 Global Step: 264410 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:26:15,702-Speed 2653.37 samples/sec Loss 9.2021 LearningRate 0.0464 Epoch: 6 Global Step: 264420 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:26:19,597-Speed 2629.72 samples/sec Loss 9.0939 LearningRate 0.0464 Epoch: 6 Global Step: 264430 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:26:23,498-Speed 2625.78 samples/sec Loss 9.2565 LearningRate 0.0464 Epoch: 6 Global Step: 264440 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:26:27,413-Speed 2615.70 samples/sec Loss 9.3083 LearningRate 0.0464 Epoch: 6 Global Step: 264450 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:26:31,309-Speed 2628.74 samples/sec Loss 9.1897 LearningRate 0.0464 Epoch: 6 Global Step: 264460 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:26:35,210-Speed 2626.21 samples/sec Loss 9.2161 LearningRate 0.0464 Epoch: 6 Global Step: 264470 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:26:39,105-Speed 2629.75 samples/sec Loss 9.2354 LearningRate 0.0464 Epoch: 6 Global Step: 264480 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:26:42,999-Speed 2629.70 samples/sec Loss 9.2656 LearningRate 0.0464 Epoch: 6 Global Step: 264490 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:26:46,900-Speed 2626.19 samples/sec Loss 9.0600 LearningRate 0.0464 Epoch: 6 Global Step: 264500 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:26:50,805-Speed 2622.31 samples/sec Loss 9.0897 LearningRate 0.0464 Epoch: 6 Global Step: 264510 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:26:54,711-Speed 2623.03 samples/sec Loss 9.0864 LearningRate 0.0464 Epoch: 6 Global Step: 264520 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:26:58,612-Speed 2625.63 samples/sec Loss 9.1468 LearningRate 0.0464 Epoch: 6 Global Step: 264530 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:27:02,533-Speed 2611.64 samples/sec Loss 9.2488 LearningRate 0.0464 Epoch: 6 Global Step: 264540 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:27:06,433-Speed 2626.06 samples/sec Loss 9.1851 LearningRate 0.0464 Epoch: 6 Global Step: 264550 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:27:10,334-Speed 2625.79 samples/sec Loss 9.1958 LearningRate 0.0464 Epoch: 6 Global Step: 264560 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:27:14,226-Speed 2632.07 samples/sec Loss 9.2520 LearningRate 0.0464 Epoch: 6 Global Step: 264570 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:27:18,119-Speed 2631.02 samples/sec Loss 9.0856 LearningRate 0.0464 Epoch: 6 Global Step: 264580 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:27:22,016-Speed 2628.18 samples/sec Loss 9.1361 LearningRate 0.0464 Epoch: 6 Global Step: 264590 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:27:25,919-Speed 2624.44 samples/sec Loss 9.1702 LearningRate 0.0464 Epoch: 6 Global Step: 264600 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:27:29,804-Speed 2636.32 samples/sec Loss 9.1747 LearningRate 0.0464 Epoch: 6 Global Step: 264610 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:27:33,677-Speed 2644.37 samples/sec Loss 9.2879 LearningRate 0.0464 Epoch: 6 Global Step: 264620 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:27:37,578-Speed 2625.38 samples/sec Loss 9.1029 LearningRate 0.0464 Epoch: 6 Global Step: 264630 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:27:41,472-Speed 2630.06 samples/sec Loss 9.2817 LearningRate 0.0464 Epoch: 6 Global Step: 264640 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:27:45,363-Speed 2632.58 samples/sec Loss 9.3088 LearningRate 0.0464 Epoch: 6 Global Step: 264650 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:27:49,254-Speed 2632.21 samples/sec Loss 9.2329 LearningRate 0.0464 Epoch: 6 Global Step: 264660 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:27:53,147-Speed 2631.50 samples/sec Loss 9.1758 LearningRate 0.0464 Epoch: 6 Global Step: 264670 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:27:57,043-Speed 2629.05 samples/sec Loss 9.1226 LearningRate 0.0464 Epoch: 6 Global Step: 264680 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:28:00,943-Speed 2625.71 samples/sec Loss 9.2058 LearningRate 0.0464 Epoch: 6 Global Step: 264690 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:28:04,836-Speed 2631.04 samples/sec Loss 9.1217 LearningRate 0.0464 Epoch: 6 Global Step: 264700 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:28:08,728-Speed 2631.60 samples/sec Loss 9.1670 LearningRate 0.0464 Epoch: 6 Global Step: 264710 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:28:12,620-Speed 2631.62 samples/sec Loss 9.1946 LearningRate 0.0464 Epoch: 6 Global Step: 264720 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:28:16,520-Speed 2626.50 samples/sec Loss 9.2835 LearningRate 0.0464 Epoch: 6 Global Step: 264730 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:28:20,413-Speed 2630.91 samples/sec Loss 9.1700 LearningRate 0.0464 Epoch: 6 Global Step: 264740 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:28:24,316-Speed 2624.53 samples/sec Loss 9.1058 LearningRate 0.0464 Epoch: 6 Global Step: 264750 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:28:28,218-Speed 2624.48 samples/sec Loss 9.2139 LearningRate 0.0464 Epoch: 6 Global Step: 264760 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:28:32,169-Speed 2592.97 samples/sec Loss 9.1715 LearningRate 0.0464 Epoch: 6 Global Step: 264770 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:28:36,069-Speed 2625.92 samples/sec Loss 9.1192 LearningRate 0.0464 Epoch: 6 Global Step: 264780 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:28:39,963-Speed 2630.56 samples/sec Loss 9.2642 LearningRate 0.0464 Epoch: 6 Global Step: 264790 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:28:43,857-Speed 2630.16 samples/sec Loss 9.1170 LearningRate 0.0463 Epoch: 6 Global Step: 264800 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:28:47,750-Speed 2631.02 samples/sec Loss 9.2077 LearningRate 0.0463 Epoch: 6 Global Step: 264810 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:28:51,645-Speed 2629.56 samples/sec Loss 9.1888 LearningRate 0.0463 Epoch: 6 Global Step: 264820 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:28:55,538-Speed 2630.55 samples/sec Loss 9.2259 LearningRate 0.0463 Epoch: 6 Global Step: 264830 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:28:59,472-Speed 2603.63 samples/sec Loss 9.1364 LearningRate 0.0463 Epoch: 6 Global Step: 264840 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:29:03,367-Speed 2629.65 samples/sec Loss 9.2314 LearningRate 0.0463 Epoch: 6 Global Step: 264850 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:29:07,267-Speed 2626.08 samples/sec Loss 9.2554 LearningRate 0.0463 Epoch: 6 Global Step: 264860 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:29:11,162-Speed 2629.87 samples/sec Loss 9.1884 LearningRate 0.0463 Epoch: 6 Global Step: 264870 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:29:15,056-Speed 2630.05 samples/sec Loss 9.0456 LearningRate 0.0463 Epoch: 6 Global Step: 264880 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:29:18,947-Speed 2632.68 samples/sec Loss 9.1341 LearningRate 0.0463 Epoch: 6 Global Step: 264890 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:29:22,858-Speed 2618.56 samples/sec Loss 9.1847 LearningRate 0.0463 Epoch: 6 Global Step: 264900 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:29:26,755-Speed 2628.37 samples/sec Loss 9.2878 LearningRate 0.0463 Epoch: 6 Global Step: 264910 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:29:30,652-Speed 2628.22 samples/sec Loss 9.2322 LearningRate 0.0463 Epoch: 6 Global Step: 264920 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:29:34,558-Speed 2622.27 samples/sec Loss 9.1522 LearningRate 0.0463 Epoch: 6 Global Step: 264930 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:29:38,452-Speed 2630.00 samples/sec Loss 9.1455 LearningRate 0.0463 Epoch: 6 Global Step: 264940 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:29:42,346-Speed 2630.15 samples/sec Loss 9.2020 LearningRate 0.0463 Epoch: 6 Global Step: 264950 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:29:46,243-Speed 2629.00 samples/sec Loss 9.3399 LearningRate 0.0463 Epoch: 6 Global Step: 264960 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:29:50,132-Speed 2633.33 samples/sec Loss 9.1852 LearningRate 0.0463 Epoch: 6 Global Step: 264970 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:29:54,036-Speed 2623.86 samples/sec Loss 9.2086 LearningRate 0.0463 Epoch: 6 Global Step: 264980 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:29:57,944-Speed 2621.08 samples/sec Loss 9.2156 LearningRate 0.0463 Epoch: 6 Global Step: 264990 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:30:01,849-Speed 2622.23 samples/sec Loss 9.1064 LearningRate 0.0463 Epoch: 6 Global Step: 265000 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:30:05,744-Speed 2629.83 samples/sec Loss 9.1058 LearningRate 0.0463 Epoch: 6 Global Step: 265010 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:30:09,643-Speed 2626.45 samples/sec Loss 9.1622 LearningRate 0.0463 Epoch: 6 Global Step: 265020 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:30:13,550-Speed 2621.87 samples/sec Loss 9.2011 LearningRate 0.0463 Epoch: 6 Global Step: 265030 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:30:17,449-Speed 2626.86 samples/sec Loss 9.1821 LearningRate 0.0463 Epoch: 6 Global Step: 265040 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:30:21,348-Speed 2627.00 samples/sec Loss 9.3022 LearningRate 0.0463 Epoch: 6 Global Step: 265050 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:30:25,244-Speed 2629.34 samples/sec Loss 9.3022 LearningRate 0.0463 Epoch: 6 Global Step: 265060 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:30:29,138-Speed 2629.71 samples/sec Loss 9.1682 LearningRate 0.0463 Epoch: 6 Global Step: 265070 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:30:33,035-Speed 2628.12 samples/sec Loss 9.1386 LearningRate 0.0463 Epoch: 6 Global Step: 265080 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:30:36,953-Speed 2613.96 samples/sec Loss 9.2544 LearningRate 0.0463 Epoch: 6 Global Step: 265090 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:30:40,853-Speed 2626.76 samples/sec Loss 9.2569 LearningRate 0.0463 Epoch: 6 Global Step: 265100 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:30:44,756-Speed 2624.44 samples/sec Loss 9.1780 LearningRate 0.0463 Epoch: 6 Global Step: 265110 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:30:48,808-Speed 2527.11 samples/sec Loss 9.1205 LearningRate 0.0463 Epoch: 6 Global Step: 265120 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:30:52,699-Speed 2633.01 samples/sec Loss 9.3194 LearningRate 0.0463 Epoch: 6 Global Step: 265130 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:30:56,653-Speed 2589.90 samples/sec Loss 9.2631 LearningRate 0.0463 Epoch: 6 Global Step: 265140 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:31:00,548-Speed 2630.06 samples/sec Loss 9.1085 LearningRate 0.0463 Epoch: 6 Global Step: 265150 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:31:04,446-Speed 2627.62 samples/sec Loss 9.1874 LearningRate 0.0463 Epoch: 6 Global Step: 265160 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:31:08,345-Speed 2626.94 samples/sec Loss 9.2438 LearningRate 0.0463 Epoch: 6 Global Step: 265170 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:31:12,246-Speed 2625.07 samples/sec Loss 9.2611 LearningRate 0.0463 Epoch: 6 Global Step: 265180 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:31:16,144-Speed 2628.15 samples/sec Loss 9.1815 LearningRate 0.0463 Epoch: 6 Global Step: 265190 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:31:20,046-Speed 2624.58 samples/sec Loss 9.1817 LearningRate 0.0463 Epoch: 6 Global Step: 265200 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:31:23,950-Speed 2623.31 samples/sec Loss 9.1011 LearningRate 0.0463 Epoch: 6 Global Step: 265210 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:31:27,849-Speed 2626.95 samples/sec Loss 9.0888 LearningRate 0.0463 Epoch: 6 Global Step: 265220 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:31:31,746-Speed 2628.77 samples/sec Loss 9.0715 LearningRate 0.0463 Epoch: 6 Global Step: 265230 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:31:35,642-Speed 2628.70 samples/sec Loss 9.1624 LearningRate 0.0463 Epoch: 6 Global Step: 265240 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:31:39,545-Speed 2624.13 samples/sec Loss 9.1057 LearningRate 0.0463 Epoch: 6 Global Step: 265250 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:31:43,435-Speed 2633.15 samples/sec Loss 9.1433 LearningRate 0.0463 Epoch: 6 Global Step: 265260 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:31:47,311-Speed 2642.73 samples/sec Loss 9.1483 LearningRate 0.0463 Epoch: 6 Global Step: 265270 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:31:51,206-Speed 2629.36 samples/sec Loss 9.1457 LearningRate 0.0463 Epoch: 6 Global Step: 265280 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:31:55,106-Speed 2626.41 samples/sec Loss 9.1816 LearningRate 0.0463 Epoch: 6 Global Step: 265290 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:31:59,005-Speed 2626.75 samples/sec Loss 9.1957 LearningRate 0.0463 Epoch: 6 Global Step: 265300 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:32:02,903-Speed 2627.46 samples/sec Loss 9.3195 LearningRate 0.0463 Epoch: 6 Global Step: 265310 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:32:06,801-Speed 2627.62 samples/sec Loss 9.3575 LearningRate 0.0463 Epoch: 6 Global Step: 265320 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:32:10,694-Speed 2630.87 samples/sec Loss 9.2072 LearningRate 0.0463 Epoch: 6 Global Step: 265330 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:32:14,590-Speed 2631.57 samples/sec Loss 9.1894 LearningRate 0.0463 Epoch: 6 Global Step: 265340 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:32:18,485-Speed 2629.62 samples/sec Loss 9.1208 LearningRate 0.0463 Epoch: 6 Global Step: 265350 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:32:22,380-Speed 2629.58 samples/sec Loss 9.0974 LearningRate 0.0463 Epoch: 6 Global Step: 265360 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:32:26,277-Speed 2628.42 samples/sec Loss 9.0885 LearningRate 0.0463 Epoch: 6 Global Step: 265370 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:32:30,176-Speed 2626.59 samples/sec Loss 9.1062 LearningRate 0.0463 Epoch: 6 Global Step: 265380 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:32:34,068-Speed 2631.56 samples/sec Loss 9.1701 LearningRate 0.0463 Epoch: 6 Global Step: 265390 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:32:37,961-Speed 2630.81 samples/sec Loss 9.2199 LearningRate 0.0463 Epoch: 6 Global Step: 265400 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:32:41,856-Speed 2630.22 samples/sec Loss 9.0588 LearningRate 0.0462 Epoch: 6 Global Step: 265410 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:32:45,749-Speed 2631.24 samples/sec Loss 9.1332 LearningRate 0.0462 Epoch: 6 Global Step: 265420 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:32:49,648-Speed 2626.32 samples/sec Loss 9.1749 LearningRate 0.0462 Epoch: 6 Global Step: 265430 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:32:53,549-Speed 2626.13 samples/sec Loss 9.2501 LearningRate 0.0462 Epoch: 6 Global Step: 265440 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:32:57,444-Speed 2629.17 samples/sec Loss 9.1865 LearningRate 0.0462 Epoch: 6 Global Step: 265450 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:33:01,342-Speed 2627.19 samples/sec Loss 9.0980 LearningRate 0.0462 Epoch: 6 Global Step: 265460 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:33:05,243-Speed 2625.61 samples/sec Loss 9.1739 LearningRate 0.0462 Epoch: 6 Global Step: 265470 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:33:09,145-Speed 2625.20 samples/sec Loss 9.0882 LearningRate 0.0462 Epoch: 6 Global Step: 265480 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:33:13,042-Speed 2627.73 samples/sec Loss 9.1090 LearningRate 0.0462 Epoch: 6 Global Step: 265490 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:33:16,940-Speed 2627.86 samples/sec Loss 9.1197 LearningRate 0.0462 Epoch: 6 Global Step: 265500 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:33:20,820-Speed 2639.83 samples/sec Loss 9.1602 LearningRate 0.0462 Epoch: 6 Global Step: 265510 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:33:24,716-Speed 2629.27 samples/sec Loss 9.0818 LearningRate 0.0462 Epoch: 6 Global Step: 265520 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:33:28,610-Speed 2630.45 samples/sec Loss 9.2642 LearningRate 0.0462 Epoch: 6 Global Step: 265530 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:33:32,509-Speed 2626.57 samples/sec Loss 9.2282 LearningRate 0.0462 Epoch: 6 Global Step: 265540 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:33:36,409-Speed 2626.21 samples/sec Loss 9.1838 LearningRate 0.0462 Epoch: 6 Global Step: 265550 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:33:40,307-Speed 2627.59 samples/sec Loss 9.1829 LearningRate 0.0462 Epoch: 6 Global Step: 265560 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:33:44,206-Speed 2626.88 samples/sec Loss 9.1831 LearningRate 0.0462 Epoch: 6 Global Step: 265570 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:33:48,100-Speed 2629.79 samples/sec Loss 9.1310 LearningRate 0.0462 Epoch: 6 Global Step: 265580 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:33:51,993-Speed 2631.54 samples/sec Loss 9.1898 LearningRate 0.0462 Epoch: 6 Global Step: 265590 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:33:55,886-Speed 2630.30 samples/sec Loss 9.0862 LearningRate 0.0462 Epoch: 6 Global Step: 265600 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:33:59,851-Speed 2583.89 samples/sec Loss 9.1468 LearningRate 0.0462 Epoch: 6 Global Step: 265610 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:34:03,753-Speed 2624.78 samples/sec Loss 9.1972 LearningRate 0.0462 Epoch: 6 Global Step: 265620 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:34:07,648-Speed 2629.62 samples/sec Loss 9.0979 LearningRate 0.0462 Epoch: 6 Global Step: 265630 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:34:11,526-Speed 2641.00 samples/sec Loss 9.2230 LearningRate 0.0462 Epoch: 6 Global Step: 265640 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:34:15,420-Speed 2630.34 samples/sec Loss 9.1888 LearningRate 0.0462 Epoch: 6 Global Step: 265650 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:34:19,312-Speed 2631.38 samples/sec Loss 9.1273 LearningRate 0.0462 Epoch: 6 Global Step: 265660 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:34:23,189-Speed 2642.26 samples/sec Loss 9.2424 LearningRate 0.0462 Epoch: 6 Global Step: 265670 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:34:27,080-Speed 2631.99 samples/sec Loss 9.2086 LearningRate 0.0462 Epoch: 6 Global Step: 265680 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:34:30,973-Speed 2630.63 samples/sec Loss 9.1889 LearningRate 0.0462 Epoch: 6 Global Step: 265690 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:34:34,870-Speed 2628.69 samples/sec Loss 9.2061 LearningRate 0.0462 Epoch: 6 Global Step: 265700 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:34:38,780-Speed 2619.30 samples/sec Loss 9.0879 LearningRate 0.0462 Epoch: 6 Global Step: 265710 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:34:42,681-Speed 2625.30 samples/sec Loss 9.0281 LearningRate 0.0462 Epoch: 6 Global Step: 265720 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:34:46,581-Speed 2626.85 samples/sec Loss 9.1777 LearningRate 0.0462 Epoch: 6 Global Step: 265730 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:34:50,479-Speed 2627.31 samples/sec Loss 9.1412 LearningRate 0.0462 Epoch: 6 Global Step: 265740 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:34:54,379-Speed 2626.41 samples/sec Loss 9.1138 LearningRate 0.0462 Epoch: 6 Global Step: 265750 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:34:58,285-Speed 2621.90 samples/sec Loss 9.0203 LearningRate 0.0462 Epoch: 6 Global Step: 265760 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:35:02,199-Speed 2616.71 samples/sec Loss 9.1845 LearningRate 0.0462 Epoch: 6 Global Step: 265770 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:35:06,096-Speed 2628.39 samples/sec Loss 9.1656 LearningRate 0.0462 Epoch: 6 Global Step: 265780 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:35:09,996-Speed 2626.38 samples/sec Loss 9.3102 LearningRate 0.0462 Epoch: 6 Global Step: 265790 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:35:13,887-Speed 2632.43 samples/sec Loss 9.2031 LearningRate 0.0462 Epoch: 6 Global Step: 265800 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:35:17,779-Speed 2631.73 samples/sec Loss 9.1359 LearningRate 0.0462 Epoch: 6 Global Step: 265810 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:35:21,672-Speed 2630.71 samples/sec Loss 9.0629 LearningRate 0.0462 Epoch: 6 Global Step: 265820 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:35:25,565-Speed 2630.99 samples/sec Loss 9.1627 LearningRate 0.0462 Epoch: 6 Global Step: 265830 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:35:29,457-Speed 2631.65 samples/sec Loss 9.0543 LearningRate 0.0462 Epoch: 6 Global Step: 265840 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:35:33,351-Speed 2630.67 samples/sec Loss 9.0558 LearningRate 0.0462 Epoch: 6 Global Step: 265850 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:35:37,246-Speed 2629.00 samples/sec Loss 9.1681 LearningRate 0.0462 Epoch: 6 Global Step: 265860 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:35:41,147-Speed 2625.89 samples/sec Loss 9.1783 LearningRate 0.0462 Epoch: 6 Global Step: 265870 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:35:45,040-Speed 2631.10 samples/sec Loss 9.0727 LearningRate 0.0462 Epoch: 6 Global Step: 265880 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:35:48,936-Speed 2628.83 samples/sec Loss 9.1216 LearningRate 0.0462 Epoch: 6 Global Step: 265890 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:35:52,837-Speed 2625.47 samples/sec Loss 9.1237 LearningRate 0.0462 Epoch: 6 Global Step: 265900 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:35:56,727-Speed 2633.35 samples/sec Loss 9.1288 LearningRate 0.0462 Epoch: 6 Global Step: 265910 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:36:00,619-Speed 2631.60 samples/sec Loss 9.2038 LearningRate 0.0462 Epoch: 6 Global Step: 265920 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:36:04,517-Speed 2627.20 samples/sec Loss 9.1121 LearningRate 0.0462 Epoch: 6 Global Step: 265930 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:36:08,407-Speed 2633.41 samples/sec Loss 9.1696 LearningRate 0.0462 Epoch: 6 Global Step: 265940 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:36:12,301-Speed 2629.66 samples/sec Loss 9.2043 LearningRate 0.0462 Epoch: 6 Global Step: 265950 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:36:16,196-Speed 2630.25 samples/sec Loss 9.1610 LearningRate 0.0462 Epoch: 6 Global Step: 265960 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:36:20,082-Speed 2635.25 samples/sec Loss 9.1834 LearningRate 0.0462 Epoch: 6 Global Step: 265970 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:36:23,975-Speed 2631.06 samples/sec Loss 9.1859 LearningRate 0.0462 Epoch: 6 Global Step: 265980 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:36:27,866-Speed 2631.92 samples/sec Loss 9.1711 LearningRate 0.0462 Epoch: 6 Global Step: 265990 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:36:31,761-Speed 2630.49 samples/sec Loss 9.2010 LearningRate 0.0462 Epoch: 6 Global Step: 266000 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:36:35,639-Speed 2641.42 samples/sec Loss 9.2503 LearningRate 0.0462 Epoch: 6 Global Step: 266010 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:36:39,531-Speed 2631.19 samples/sec Loss 9.2122 LearningRate 0.0461 Epoch: 6 Global Step: 266020 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:36:43,426-Speed 2629.61 samples/sec Loss 9.1820 LearningRate 0.0461 Epoch: 6 Global Step: 266030 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:36:47,319-Speed 2631.32 samples/sec Loss 9.0522 LearningRate 0.0461 Epoch: 6 Global Step: 266040 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:36:51,217-Speed 2627.46 samples/sec Loss 9.0558 LearningRate 0.0461 Epoch: 6 Global Step: 266050 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:36:55,112-Speed 2629.73 samples/sec Loss 9.0779 LearningRate 0.0461 Epoch: 6 Global Step: 266060 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:36:59,008-Speed 2628.17 samples/sec Loss 9.1016 LearningRate 0.0461 Epoch: 6 Global Step: 266070 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:37:02,901-Speed 2631.01 samples/sec Loss 9.3237 LearningRate 0.0461 Epoch: 6 Global Step: 266080 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:37:06,794-Speed 2631.54 samples/sec Loss 9.1533 LearningRate 0.0461 Epoch: 6 Global Step: 266090 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:37:10,698-Speed 2623.76 samples/sec Loss 9.1501 LearningRate 0.0461 Epoch: 6 Global Step: 266100 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:37:14,591-Speed 2630.72 samples/sec Loss 9.1268 LearningRate 0.0461 Epoch: 6 Global Step: 266110 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:37:18,483-Speed 2631.87 samples/sec Loss 9.2441 LearningRate 0.0461 Epoch: 6 Global Step: 266120 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:37:22,393-Speed 2618.96 samples/sec Loss 9.1390 LearningRate 0.0461 Epoch: 6 Global Step: 266130 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:37:26,294-Speed 2626.11 samples/sec Loss 9.0206 LearningRate 0.0461 Epoch: 6 Global Step: 266140 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:37:30,185-Speed 2631.83 samples/sec Loss 9.1701 LearningRate 0.0461 Epoch: 6 Global Step: 266150 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:37:34,064-Speed 2640.73 samples/sec Loss 9.1470 LearningRate 0.0461 Epoch: 6 Global Step: 266160 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:37:37,929-Speed 2649.44 samples/sec Loss 9.3504 LearningRate 0.0461 Epoch: 6 Global Step: 266170 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:37:41,784-Speed 2657.75 samples/sec Loss 10.3604 LearningRate 0.0461 Epoch: 6 Global Step: 266180 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:37:45,673-Speed 2634.58 samples/sec Loss 9.4956 LearningRate 0.0461 Epoch: 6 Global Step: 266190 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:37:49,569-Speed 2628.88 samples/sec Loss 9.3731 LearningRate 0.0461 Epoch: 6 Global Step: 266200 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:37:53,462-Speed 2630.80 samples/sec Loss 9.1694 LearningRate 0.0461 Epoch: 6 Global Step: 266210 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:37:57,361-Speed 2627.01 samples/sec Loss 9.3986 LearningRate 0.0461 Epoch: 6 Global Step: 266220 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:38:01,271-Speed 2619.46 samples/sec Loss 9.3582 LearningRate 0.0461 Epoch: 6 Global Step: 266230 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:38:05,168-Speed 2627.87 samples/sec Loss 9.3504 LearningRate 0.0461 Epoch: 6 Global Step: 266240 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:38:09,069-Speed 2625.89 samples/sec Loss 9.2250 LearningRate 0.0461 Epoch: 6 Global Step: 266250 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:38:12,958-Speed 2633.26 samples/sec Loss 9.2518 LearningRate 0.0461 Epoch: 6 Global Step: 266260 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:38:16,844-Speed 2635.99 samples/sec Loss 9.2282 LearningRate 0.0461 Epoch: 6 Global Step: 266270 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:38:20,737-Speed 2631.62 samples/sec Loss 9.0972 LearningRate 0.0461 Epoch: 6 Global Step: 266280 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:38:24,628-Speed 2632.12 samples/sec Loss 9.2932 LearningRate 0.0461 Epoch: 6 Global Step: 266290 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:38:28,517-Speed 2633.42 samples/sec Loss 9.0843 LearningRate 0.0461 Epoch: 6 Global Step: 266300 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:38:32,409-Speed 2631.49 samples/sec Loss 9.2862 LearningRate 0.0461 Epoch: 6 Global Step: 266310 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:38:36,313-Speed 2623.72 samples/sec Loss 9.2598 LearningRate 0.0461 Epoch: 6 Global Step: 266320 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:38:40,240-Speed 2608.44 samples/sec Loss 9.2813 LearningRate 0.0461 Epoch: 6 Global Step: 266330 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:38:44,219-Speed 2574.55 samples/sec Loss 9.1196 LearningRate 0.0461 Epoch: 6 Global Step: 266340 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:38:48,110-Speed 2631.99 samples/sec Loss 9.3377 LearningRate 0.0461 Epoch: 6 Global Step: 266350 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:38:52,016-Speed 2622.80 samples/sec Loss 9.2073 LearningRate 0.0461 Epoch: 6 Global Step: 266360 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:38:55,905-Speed 2633.78 samples/sec Loss 9.7803 LearningRate 0.0461 Epoch: 6 Global Step: 266370 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:38:59,804-Speed 2627.32 samples/sec Loss 9.3257 LearningRate 0.0461 Epoch: 6 Global Step: 266380 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:39:03,696-Speed 2631.55 samples/sec Loss 9.3188 LearningRate 0.0461 Epoch: 6 Global Step: 266390 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:39:07,593-Speed 2628.28 samples/sec Loss 9.1749 LearningRate 0.0461 Epoch: 6 Global Step: 266400 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:39:11,491-Speed 2627.60 samples/sec Loss 9.3855 LearningRate 0.0461 Epoch: 6 Global Step: 266410 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:39:15,389-Speed 2628.18 samples/sec Loss 9.2611 LearningRate 0.0461 Epoch: 6 Global Step: 266420 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:39:19,286-Speed 2628.41 samples/sec Loss 9.2166 LearningRate 0.0461 Epoch: 6 Global Step: 266430 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:39:23,177-Speed 2632.44 samples/sec Loss 9.1428 LearningRate 0.0461 Epoch: 6 Global Step: 266440 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:39:27,070-Speed 2630.77 samples/sec Loss 9.0774 LearningRate 0.0461 Epoch: 6 Global Step: 266450 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:39:30,966-Speed 2629.19 samples/sec Loss 9.0415 LearningRate 0.0461 Epoch: 6 Global Step: 266460 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:39:34,859-Speed 2630.48 samples/sec Loss 9.2308 LearningRate 0.0461 Epoch: 6 Global Step: 266470 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:39:38,739-Speed 2640.17 samples/sec Loss 9.1615 LearningRate 0.0461 Epoch: 6 Global Step: 266480 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:39:42,633-Speed 2630.00 samples/sec Loss 9.0656 LearningRate 0.0461 Epoch: 6 Global Step: 266490 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:39:46,514-Speed 2639.43 samples/sec Loss 9.4187 LearningRate 0.0461 Epoch: 6 Global Step: 266500 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:39:50,380-Speed 2649.33 samples/sec Loss 9.2080 LearningRate 0.0461 Epoch: 6 Global Step: 266510 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:39:54,267-Speed 2635.46 samples/sec Loss 9.5471 LearningRate 0.0461 Epoch: 6 Global Step: 266520 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:39:58,225-Speed 2587.83 samples/sec Loss 9.3387 LearningRate 0.0461 Epoch: 6 Global Step: 266530 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:40:02,128-Speed 2624.13 samples/sec Loss 9.2242 LearningRate 0.0461 Epoch: 6 Global Step: 266540 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:40:06,021-Speed 2631.24 samples/sec Loss 9.2322 LearningRate 0.0461 Epoch: 6 Global Step: 266550 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:40:09,910-Speed 2633.86 samples/sec Loss 9.2469 LearningRate 0.0461 Epoch: 6 Global Step: 266560 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:40:13,800-Speed 2632.79 samples/sec Loss 9.2257 LearningRate 0.0461 Epoch: 6 Global Step: 266570 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:40:17,692-Speed 2631.53 samples/sec Loss 9.2191 LearningRate 0.0461 Epoch: 6 Global Step: 266580 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:40:21,585-Speed 2631.62 samples/sec Loss 9.0826 LearningRate 0.0461 Epoch: 6 Global Step: 266590 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:40:25,489-Speed 2623.43 samples/sec Loss 9.1764 LearningRate 0.0461 Epoch: 6 Global Step: 266600 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:40:29,387-Speed 2627.65 samples/sec Loss 9.2248 LearningRate 0.0461 Epoch: 6 Global Step: 266610 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:40:33,277-Speed 2633.38 samples/sec Loss 9.0953 LearningRate 0.0461 Epoch: 6 Global Step: 266620 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:40:37,168-Speed 2632.25 samples/sec Loss 9.1415 LearningRate 0.0460 Epoch: 6 Global Step: 266630 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:40:41,056-Speed 2633.76 samples/sec Loss 9.2451 LearningRate 0.0460 Epoch: 6 Global Step: 266640 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:40:44,955-Speed 2627.52 samples/sec Loss 9.1765 LearningRate 0.0460 Epoch: 6 Global Step: 266650 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:40:48,842-Speed 2634.92 samples/sec Loss 9.1716 LearningRate 0.0460 Epoch: 6 Global Step: 266660 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:40:52,738-Speed 2629.08 samples/sec Loss 9.0849 LearningRate 0.0460 Epoch: 6 Global Step: 266670 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:40:56,634-Speed 2629.10 samples/sec Loss 9.2545 LearningRate 0.0460 Epoch: 6 Global Step: 266680 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:41:00,519-Speed 2636.31 samples/sec Loss 9.4418 LearningRate 0.0460 Epoch: 6 Global Step: 266690 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:41:04,437-Speed 2614.15 samples/sec Loss 9.2953 LearningRate 0.0460 Epoch: 6 Global Step: 266700 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:41:08,327-Speed 2633.12 samples/sec Loss 9.2514 LearningRate 0.0460 Epoch: 6 Global Step: 266710 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:41:12,217-Speed 2633.24 samples/sec Loss 9.2841 LearningRate 0.0460 Epoch: 6 Global Step: 266720 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:41:16,118-Speed 2625.22 samples/sec Loss 9.4557 LearningRate 0.0460 Epoch: 6 Global Step: 266730 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:41:20,013-Speed 2630.08 samples/sec Loss 9.2274 LearningRate 0.0460 Epoch: 6 Global Step: 266740 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:41:23,903-Speed 2633.24 samples/sec Loss 9.2323 LearningRate 0.0460 Epoch: 6 Global Step: 266750 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:41:27,795-Speed 2631.28 samples/sec Loss 9.0541 LearningRate 0.0460 Epoch: 6 Global Step: 266760 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:41:31,689-Speed 2630.99 samples/sec Loss 9.1400 LearningRate 0.0460 Epoch: 6 Global Step: 266770 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:41:35,582-Speed 2630.66 samples/sec Loss 9.1131 LearningRate 0.0460 Epoch: 6 Global Step: 266780 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:41:39,476-Speed 2629.74 samples/sec Loss 9.1192 LearningRate 0.0460 Epoch: 6 Global Step: 266790 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:41:43,369-Speed 2631.12 samples/sec Loss 9.1786 LearningRate 0.0460 Epoch: 6 Global Step: 266800 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:41:47,270-Speed 2625.53 samples/sec Loss 9.1742 LearningRate 0.0460 Epoch: 6 Global Step: 266810 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:41:51,162-Speed 2632.07 samples/sec Loss 9.1666 LearningRate 0.0460 Epoch: 6 Global Step: 266820 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:41:55,052-Speed 2632.72 samples/sec Loss 9.0722 LearningRate 0.0460 Epoch: 6 Global Step: 266830 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:41:58,943-Speed 2632.81 samples/sec Loss 9.1309 LearningRate 0.0460 Epoch: 6 Global Step: 266840 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:42:02,834-Speed 2631.99 samples/sec Loss 9.1093 LearningRate 0.0460 Epoch: 6 Global Step: 266850 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:42:06,727-Speed 2631.02 samples/sec Loss 9.2125 LearningRate 0.0460 Epoch: 6 Global Step: 266860 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:42:10,619-Speed 2631.16 samples/sec Loss 9.1259 LearningRate 0.0460 Epoch: 6 Global Step: 266870 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:42:14,511-Speed 2632.08 samples/sec Loss 9.2960 LearningRate 0.0460 Epoch: 6 Global Step: 266880 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:42:18,407-Speed 2628.86 samples/sec Loss 9.2261 LearningRate 0.0460 Epoch: 6 Global Step: 266890 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:42:22,302-Speed 2629.73 samples/sec Loss 9.1497 LearningRate 0.0460 Epoch: 6 Global Step: 266900 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:42:26,193-Speed 2631.93 samples/sec Loss 9.1184 LearningRate 0.0460 Epoch: 6 Global Step: 266910 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:42:30,036-Speed 2665.17 samples/sec Loss 10.5463 LearningRate 0.0460 Epoch: 6 Global Step: 266920 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:42:33,930-Speed 2630.50 samples/sec Loss 10.1699 LearningRate 0.0460 Epoch: 6 Global Step: 266930 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:42:37,822-Speed 2631.99 samples/sec Loss 9.4872 LearningRate 0.0460 Epoch: 6 Global Step: 266940 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:42:41,714-Speed 2631.49 samples/sec Loss 9.2826 LearningRate 0.0460 Epoch: 6 Global Step: 266950 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:42:45,606-Speed 2632.12 samples/sec Loss 9.3340 LearningRate 0.0460 Epoch: 6 Global Step: 266960 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:42:49,500-Speed 2630.10 samples/sec Loss 9.3453 LearningRate 0.0460 Epoch: 6 Global Step: 266970 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:42:53,388-Speed 2633.97 samples/sec Loss 9.2801 LearningRate 0.0460 Epoch: 6 Global Step: 266980 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:42:57,278-Speed 2632.98 samples/sec Loss 9.1569 LearningRate 0.0460 Epoch: 6 Global Step: 266990 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:43:01,169-Speed 2632.23 samples/sec Loss 9.3122 LearningRate 0.0460 Epoch: 6 Global Step: 267000 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:43:05,061-Speed 2631.25 samples/sec Loss 9.0867 LearningRate 0.0460 Epoch: 6 Global Step: 267010 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:43:08,951-Speed 2633.26 samples/sec Loss 9.2511 LearningRate 0.0460 Epoch: 6 Global Step: 267020 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:43:12,842-Speed 2631.94 samples/sec Loss 9.1416 LearningRate 0.0460 Epoch: 6 Global Step: 267030 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:43:16,733-Speed 2632.97 samples/sec Loss 9.0925 LearningRate 0.0460 Epoch: 6 Global Step: 267040 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:43:20,626-Speed 2631.63 samples/sec Loss 9.1270 LearningRate 0.0460 Epoch: 6 Global Step: 267050 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:43:24,516-Speed 2632.69 samples/sec Loss 9.1804 LearningRate 0.0460 Epoch: 6 Global Step: 267060 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:43:28,406-Speed 2633.33 samples/sec Loss 9.2760 LearningRate 0.0460 Epoch: 6 Global Step: 267070 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:43:32,300-Speed 2629.81 samples/sec Loss 9.1959 LearningRate 0.0460 Epoch: 6 Global Step: 267080 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:43:36,216-Speed 2615.31 samples/sec Loss 9.1730 LearningRate 0.0460 Epoch: 6 Global Step: 267090 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:43:40,109-Speed 2631.09 samples/sec Loss 9.3232 LearningRate 0.0460 Epoch: 6 Global Step: 267100 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:43:44,005-Speed 2628.75 samples/sec Loss 9.1902 LearningRate 0.0460 Epoch: 6 Global Step: 267110 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:43:47,901-Speed 2628.93 samples/sec Loss 9.1487 LearningRate 0.0460 Epoch: 6 Global Step: 267120 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:43:51,790-Speed 2633.73 samples/sec Loss 9.0621 LearningRate 0.0460 Epoch: 6 Global Step: 267130 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:43:55,681-Speed 2632.47 samples/sec Loss 9.2116 LearningRate 0.0460 Epoch: 6 Global Step: 267140 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:43:59,585-Speed 2623.38 samples/sec Loss 9.1523 LearningRate 0.0460 Epoch: 6 Global Step: 267150 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:44:03,477-Speed 2631.50 samples/sec Loss 9.1569 LearningRate 0.0460 Epoch: 6 Global Step: 267160 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:44:07,373-Speed 2628.97 samples/sec Loss 9.1786 LearningRate 0.0460 Epoch: 6 Global Step: 267170 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:44:11,264-Speed 2632.33 samples/sec Loss 9.1109 LearningRate 0.0460 Epoch: 6 Global Step: 267180 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:44:15,154-Speed 2632.83 samples/sec Loss 9.1845 LearningRate 0.0460 Epoch: 6 Global Step: 267190 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:44:19,048-Speed 2630.56 samples/sec Loss 9.2540 LearningRate 0.0460 Epoch: 6 Global Step: 267200 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:44:22,942-Speed 2630.29 samples/sec Loss 9.3528 LearningRate 0.0460 Epoch: 6 Global Step: 267210 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:44:26,836-Speed 2630.28 samples/sec Loss 9.2025 LearningRate 0.0460 Epoch: 6 Global Step: 267220 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:44:30,731-Speed 2630.19 samples/sec Loss 9.2620 LearningRate 0.0460 Epoch: 6 Global Step: 267230 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:44:34,630-Speed 2626.75 samples/sec Loss 9.1044 LearningRate 0.0459 Epoch: 6 Global Step: 267240 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:44:38,489-Speed 2653.60 samples/sec Loss 10.0610 LearningRate 0.0459 Epoch: 6 Global Step: 267250 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:44:42,374-Speed 2636.61 samples/sec Loss 9.7661 LearningRate 0.0459 Epoch: 6 Global Step: 267260 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:44:46,265-Speed 2631.92 samples/sec Loss 9.3829 LearningRate 0.0459 Epoch: 6 Global Step: 267270 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:44:50,162-Speed 2628.30 samples/sec Loss 9.3004 LearningRate 0.0459 Epoch: 6 Global Step: 267280 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:44:54,050-Speed 2634.18 samples/sec Loss 9.2730 LearningRate 0.0459 Epoch: 6 Global Step: 267290 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:44:57,952-Speed 2624.86 samples/sec Loss 9.2176 LearningRate 0.0459 Epoch: 6 Global Step: 267300 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:45:01,855-Speed 2624.52 samples/sec Loss 9.2507 LearningRate 0.0459 Epoch: 6 Global Step: 267310 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:45:05,744-Speed 2633.82 samples/sec Loss 9.1882 LearningRate 0.0459 Epoch: 6 Global Step: 267320 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:45:09,639-Speed 2629.36 samples/sec Loss 9.0132 LearningRate 0.0459 Epoch: 6 Global Step: 267330 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:45:13,529-Speed 2632.78 samples/sec Loss 9.3156 LearningRate 0.0459 Epoch: 6 Global Step: 267340 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:45:17,425-Speed 2629.81 samples/sec Loss 9.1435 LearningRate 0.0459 Epoch: 6 Global Step: 267350 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:45:21,317-Speed 2632.12 samples/sec Loss 9.1580 LearningRate 0.0459 Epoch: 6 Global Step: 267360 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:45:25,209-Speed 2630.86 samples/sec Loss 9.2676 LearningRate 0.0459 Epoch: 6 Global Step: 267370 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:45:29,103-Speed 2630.45 samples/sec Loss 9.0773 LearningRate 0.0459 Epoch: 6 Global Step: 267380 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:45:33,004-Speed 2625.33 samples/sec Loss 9.1655 LearningRate 0.0459 Epoch: 6 Global Step: 267390 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:45:36,895-Speed 2632.25 samples/sec Loss 9.2141 LearningRate 0.0459 Epoch: 6 Global Step: 267400 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:45:40,796-Speed 2625.58 samples/sec Loss 9.2807 LearningRate 0.0459 Epoch: 6 Global Step: 267410 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:45:44,768-Speed 2579.08 samples/sec Loss 9.1256 LearningRate 0.0459 Epoch: 6 Global Step: 267420 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:45:48,675-Speed 2621.35 samples/sec Loss 9.2097 LearningRate 0.0459 Epoch: 6 Global Step: 267430 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:45:52,574-Speed 2626.98 samples/sec Loss 9.1410 LearningRate 0.0459 Epoch: 6 Global Step: 267440 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:45:56,470-Speed 2628.79 samples/sec Loss 9.1573 LearningRate 0.0459 Epoch: 6 Global Step: 267450 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:46:00,362-Speed 2631.72 samples/sec Loss 9.1304 LearningRate 0.0459 Epoch: 6 Global Step: 267460 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:46:04,258-Speed 2628.76 samples/sec Loss 9.1206 LearningRate 0.0459 Epoch: 6 Global Step: 267470 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:46:08,153-Speed 2629.47 samples/sec Loss 9.2798 LearningRate 0.0459 Epoch: 6 Global Step: 267480 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:46:12,053-Speed 2626.08 samples/sec Loss 9.2727 LearningRate 0.0459 Epoch: 6 Global Step: 267490 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:46:15,947-Speed 2630.94 samples/sec Loss 9.2133 LearningRate 0.0459 Epoch: 6 Global Step: 267500 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:46:19,838-Speed 2632.44 samples/sec Loss 9.2090 LearningRate 0.0459 Epoch: 6 Global Step: 267510 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:46:23,732-Speed 2630.10 samples/sec Loss 9.3152 LearningRate 0.0459 Epoch: 6 Global Step: 267520 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:46:27,626-Speed 2630.70 samples/sec Loss 9.1938 LearningRate 0.0459 Epoch: 6 Global Step: 267530 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:46:31,519-Speed 2630.40 samples/sec Loss 9.3048 LearningRate 0.0459 Epoch: 6 Global Step: 267540 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:46:35,411-Speed 2631.74 samples/sec Loss 9.2459 LearningRate 0.0459 Epoch: 6 Global Step: 267550 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:46:39,315-Speed 2623.14 samples/sec Loss 9.2494 LearningRate 0.0459 Epoch: 6 Global Step: 267560 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:46:43,208-Speed 2631.47 samples/sec Loss 9.2269 LearningRate 0.0459 Epoch: 6 Global Step: 267570 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:46:47,102-Speed 2629.96 samples/sec Loss 9.0547 LearningRate 0.0459 Epoch: 6 Global Step: 267580 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:46:51,002-Speed 2626.50 samples/sec Loss 9.2585 LearningRate 0.0459 Epoch: 6 Global Step: 267590 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:46:54,890-Speed 2634.00 samples/sec Loss 9.1076 LearningRate 0.0459 Epoch: 6 Global Step: 267600 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:46:58,784-Speed 2630.76 samples/sec Loss 9.2237 LearningRate 0.0459 Epoch: 6 Global Step: 267610 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:47:02,677-Speed 2630.89 samples/sec Loss 9.1706 LearningRate 0.0459 Epoch: 6 Global Step: 267620 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:47:06,574-Speed 2628.58 samples/sec Loss 9.1382 LearningRate 0.0459 Epoch: 6 Global Step: 267630 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:47:10,475-Speed 2624.86 samples/sec Loss 9.2575 LearningRate 0.0459 Epoch: 6 Global Step: 267640 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:47:14,375-Speed 2626.01 samples/sec Loss 9.0491 LearningRate 0.0459 Epoch: 6 Global Step: 267650 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:47:18,266-Speed 2632.17 samples/sec Loss 9.1341 LearningRate 0.0459 Epoch: 6 Global Step: 267660 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:47:22,155-Speed 2634.24 samples/sec Loss 9.0575 LearningRate 0.0459 Epoch: 6 Global Step: 267670 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:47:26,060-Speed 2623.08 samples/sec Loss 9.1250 LearningRate 0.0459 Epoch: 6 Global Step: 267680 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:47:29,959-Speed 2626.89 samples/sec Loss 9.1191 LearningRate 0.0459 Epoch: 6 Global Step: 267690 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:47:33,851-Speed 2631.87 samples/sec Loss 9.2159 LearningRate 0.0459 Epoch: 6 Global Step: 267700 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:47:37,744-Speed 2630.49 samples/sec Loss 9.3464 LearningRate 0.0459 Epoch: 6 Global Step: 267710 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:47:41,635-Speed 2632.47 samples/sec Loss 9.1230 LearningRate 0.0459 Epoch: 6 Global Step: 267720 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:47:45,540-Speed 2622.37 samples/sec Loss 9.1680 LearningRate 0.0459 Epoch: 6 Global Step: 267730 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:47:49,432-Speed 2632.04 samples/sec Loss 9.3370 LearningRate 0.0459 Epoch: 6 Global Step: 267740 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:47:53,309-Speed 2641.91 samples/sec Loss 9.1873 LearningRate 0.0459 Epoch: 6 Global Step: 267750 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:47:57,182-Speed 2644.45 samples/sec Loss 9.2112 LearningRate 0.0459 Epoch: 6 Global Step: 267760 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:48:01,072-Speed 2633.04 samples/sec Loss 9.2474 LearningRate 0.0459 Epoch: 6 Global Step: 267770 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:48:04,984-Speed 2618.00 samples/sec Loss 9.3296 LearningRate 0.0459 Epoch: 6 Global Step: 267780 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:48:08,879-Speed 2629.30 samples/sec Loss 9.1859 LearningRate 0.0459 Epoch: 6 Global Step: 267790 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:48:12,771-Speed 2631.77 samples/sec Loss 9.0392 LearningRate 0.0459 Epoch: 6 Global Step: 267800 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:48:16,674-Speed 2624.46 samples/sec Loss 9.2447 LearningRate 0.0459 Epoch: 6 Global Step: 267810 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:48:20,566-Speed 2632.44 samples/sec Loss 9.1000 LearningRate 0.0459 Epoch: 6 Global Step: 267820 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:48:24,455-Speed 2633.20 samples/sec Loss 8.9644 LearningRate 0.0459 Epoch: 6 Global Step: 267830 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:48:28,348-Speed 2630.97 samples/sec Loss 9.2387 LearningRate 0.0459 Epoch: 6 Global Step: 267840 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:48:32,248-Speed 2626.47 samples/sec Loss 9.1431 LearningRate 0.0458 Epoch: 6 Global Step: 267850 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:48:36,132-Speed 2637.45 samples/sec Loss 9.2797 LearningRate 0.0458 Epoch: 6 Global Step: 267860 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:48:40,022-Speed 2632.81 samples/sec Loss 9.1682 LearningRate 0.0458 Epoch: 6 Global Step: 267870 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:48:43,912-Speed 2633.05 samples/sec Loss 9.2243 LearningRate 0.0458 Epoch: 6 Global Step: 267880 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:48:47,812-Speed 2626.24 samples/sec Loss 9.1913 LearningRate 0.0458 Epoch: 6 Global Step: 267890 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:48:51,704-Speed 2631.82 samples/sec Loss 9.3623 LearningRate 0.0458 Epoch: 6 Global Step: 267900 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:48:55,597-Speed 2631.35 samples/sec Loss 9.2358 LearningRate 0.0458 Epoch: 6 Global Step: 267910 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:48:59,485-Speed 2633.60 samples/sec Loss 9.2960 LearningRate 0.0458 Epoch: 6 Global Step: 267920 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:49:03,377-Speed 2631.46 samples/sec Loss 9.0651 LearningRate 0.0458 Epoch: 6 Global Step: 267930 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:49:07,269-Speed 2631.91 samples/sec Loss 9.1021 LearningRate 0.0458 Epoch: 6 Global Step: 267940 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:49:11,158-Speed 2633.87 samples/sec Loss 9.1856 LearningRate 0.0458 Epoch: 6 Global Step: 267950 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:49:14,989-Speed 2672.95 samples/sec Loss 9.5729 LearningRate 0.0458 Epoch: 6 Global Step: 267960 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:49:18,877-Speed 2634.70 samples/sec Loss 9.2156 LearningRate 0.0458 Epoch: 6 Global Step: 267970 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:49:22,763-Speed 2635.32 samples/sec Loss 9.2090 LearningRate 0.0458 Epoch: 6 Global Step: 267980 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:49:26,661-Speed 2627.74 samples/sec Loss 9.0292 LearningRate 0.0458 Epoch: 6 Global Step: 267990 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:49:30,559-Speed 2627.77 samples/sec Loss 9.2254 LearningRate 0.0458 Epoch: 6 Global Step: 268000 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:49:34,453-Speed 2630.78 samples/sec Loss 9.1225 LearningRate 0.0458 Epoch: 6 Global Step: 268010 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:49:38,348-Speed 2629.39 samples/sec Loss 9.0253 LearningRate 0.0458 Epoch: 6 Global Step: 268020 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:49:42,239-Speed 2631.91 samples/sec Loss 9.2111 LearningRate 0.0458 Epoch: 6 Global Step: 268030 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:49:46,131-Speed 2631.90 samples/sec Loss 9.1300 LearningRate 0.0458 Epoch: 6 Global Step: 268040 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:49:50,024-Speed 2630.90 samples/sec Loss 9.0211 LearningRate 0.0458 Epoch: 6 Global Step: 268050 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:49:53,924-Speed 2626.49 samples/sec Loss 9.1664 LearningRate 0.0458 Epoch: 6 Global Step: 268060 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:49:57,812-Speed 2634.04 samples/sec Loss 9.0721 LearningRate 0.0458 Epoch: 6 Global Step: 268070 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:50:01,705-Speed 2631.52 samples/sec Loss 9.2144 LearningRate 0.0458 Epoch: 6 Global Step: 268080 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:50:05,594-Speed 2633.41 samples/sec Loss 9.2032 LearningRate 0.0458 Epoch: 6 Global Step: 268090 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:50:09,487-Speed 2630.77 samples/sec Loss 9.1543 LearningRate 0.0458 Epoch: 6 Global Step: 268100 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:50:13,380-Speed 2631.07 samples/sec Loss 9.1297 LearningRate 0.0458 Epoch: 6 Global Step: 268110 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:50:17,273-Speed 2630.65 samples/sec Loss 9.2631 LearningRate 0.0458 Epoch: 6 Global Step: 268120 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:50:21,175-Speed 2625.07 samples/sec Loss 9.0455 LearningRate 0.0458 Epoch: 6 Global Step: 268130 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:50:25,087-Speed 2618.09 samples/sec Loss 9.1653 LearningRate 0.0458 Epoch: 6 Global Step: 268140 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:50:28,988-Speed 2625.30 samples/sec Loss 9.0919 LearningRate 0.0458 Epoch: 6 Global Step: 268150 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:50:32,889-Speed 2625.69 samples/sec Loss 9.2184 LearningRate 0.0458 Epoch: 6 Global Step: 268160 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:50:36,790-Speed 2625.74 samples/sec Loss 9.0628 LearningRate 0.0458 Epoch: 6 Global Step: 268170 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:50:40,692-Speed 2625.09 samples/sec Loss 9.1862 LearningRate 0.0458 Epoch: 6 Global Step: 268180 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:50:44,590-Speed 2627.79 samples/sec Loss 9.2036 LearningRate 0.0458 Epoch: 6 Global Step: 268190 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:50:48,502-Speed 2618.02 samples/sec Loss 9.0463 LearningRate 0.0458 Epoch: 6 Global Step: 268200 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:50:52,417-Speed 2616.93 samples/sec Loss 9.1768 LearningRate 0.0458 Epoch: 6 Global Step: 268210 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:50:56,316-Speed 2626.72 samples/sec Loss 9.3072 LearningRate 0.0458 Epoch: 6 Global Step: 268220 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:51:00,215-Speed 2626.81 samples/sec Loss 9.1510 LearningRate 0.0458 Epoch: 6 Global Step: 268230 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:51:04,110-Speed 2629.15 samples/sec Loss 9.0620 LearningRate 0.0458 Epoch: 6 Global Step: 268240 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:51:08,012-Speed 2624.70 samples/sec Loss 9.2038 LearningRate 0.0458 Epoch: 6 Global Step: 268250 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:51:11,912-Speed 2626.37 samples/sec Loss 9.2141 LearningRate 0.0458 Epoch: 6 Global Step: 268260 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:51:15,809-Speed 2628.18 samples/sec Loss 9.0850 LearningRate 0.0458 Epoch: 6 Global Step: 268270 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:51:19,709-Speed 2626.64 samples/sec Loss 9.2841 LearningRate 0.0458 Epoch: 6 Global Step: 268280 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:51:23,602-Speed 2631.42 samples/sec Loss 9.2963 LearningRate 0.0458 Epoch: 6 Global Step: 268290 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:51:27,494-Speed 2631.46 samples/sec Loss 9.2716 LearningRate 0.0458 Epoch: 6 Global Step: 268300 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:51:31,390-Speed 2628.94 samples/sec Loss 9.1219 LearningRate 0.0458 Epoch: 6 Global Step: 268310 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:51:35,279-Speed 2633.42 samples/sec Loss 9.3093 LearningRate 0.0458 Epoch: 6 Global Step: 268320 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:51:39,171-Speed 2631.76 samples/sec Loss 9.2101 LearningRate 0.0458 Epoch: 6 Global Step: 268330 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:51:43,068-Speed 2627.67 samples/sec Loss 9.2050 LearningRate 0.0458 Epoch: 6 Global Step: 268340 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:51:46,975-Speed 2621.54 samples/sec Loss 9.2335 LearningRate 0.0458 Epoch: 6 Global Step: 268350 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:51:50,868-Speed 2630.95 samples/sec Loss 9.0954 LearningRate 0.0458 Epoch: 6 Global Step: 268360 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:51:54,763-Speed 2629.94 samples/sec Loss 9.2823 LearningRate 0.0458 Epoch: 6 Global Step: 268370 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:51:58,668-Speed 2623.00 samples/sec Loss 9.1728 LearningRate 0.0458 Epoch: 6 Global Step: 268380 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:52:02,561-Speed 2631.47 samples/sec Loss 9.1748 LearningRate 0.0458 Epoch: 6 Global Step: 268390 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:52:06,481-Speed 2612.77 samples/sec Loss 9.1087 LearningRate 0.0458 Epoch: 6 Global Step: 268400 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:52:10,581-Speed 2497.80 samples/sec Loss 9.1412 LearningRate 0.0458 Epoch: 6 Global Step: 268410 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:52:14,674-Speed 2502.20 samples/sec Loss 9.0897 LearningRate 0.0458 Epoch: 6 Global Step: 268420 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:52:18,773-Speed 2498.42 samples/sec Loss 9.2223 LearningRate 0.0458 Epoch: 6 Global Step: 268430 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:52:22,667-Speed 2630.69 samples/sec Loss 9.1864 LearningRate 0.0458 Epoch: 6 Global Step: 268440 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:52:26,560-Speed 2630.91 samples/sec Loss 9.2008 LearningRate 0.0458 Epoch: 6 Global Step: 268450 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:52:30,452-Speed 2631.72 samples/sec Loss 9.1448 LearningRate 0.0458 Epoch: 6 Global Step: 268460 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:52:34,341-Speed 2633.70 samples/sec Loss 9.2019 LearningRate 0.0457 Epoch: 6 Global Step: 268470 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:52:38,249-Speed 2620.95 samples/sec Loss 9.2118 LearningRate 0.0457 Epoch: 6 Global Step: 268480 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:52:42,140-Speed 2631.86 samples/sec Loss 9.1443 LearningRate 0.0457 Epoch: 6 Global Step: 268490 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:52:46,033-Speed 2631.07 samples/sec Loss 9.1033 LearningRate 0.0457 Epoch: 6 Global Step: 268500 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:52:49,929-Speed 2629.42 samples/sec Loss 9.1153 LearningRate 0.0457 Epoch: 6 Global Step: 268510 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:52:53,826-Speed 2627.59 samples/sec Loss 9.1183 LearningRate 0.0457 Epoch: 6 Global Step: 268520 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:52:57,757-Speed 2605.90 samples/sec Loss 9.0094 LearningRate 0.0457 Epoch: 6 Global Step: 268530 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:53:01,649-Speed 2631.71 samples/sec Loss 9.0682 LearningRate 0.0457 Epoch: 6 Global Step: 268540 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:53:05,543-Speed 2630.25 samples/sec Loss 9.0503 LearningRate 0.0457 Epoch: 6 Global Step: 268550 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:53:09,420-Speed 2641.92 samples/sec Loss 9.0889 LearningRate 0.0457 Epoch: 6 Global Step: 268560 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:53:13,317-Speed 2627.97 samples/sec Loss 9.1871 LearningRate 0.0457 Epoch: 6 Global Step: 268570 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:53:17,214-Speed 2628.02 samples/sec Loss 9.0346 LearningRate 0.0457 Epoch: 6 Global Step: 268580 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:53:21,113-Speed 2627.19 samples/sec Loss 9.2283 LearningRate 0.0457 Epoch: 6 Global Step: 268590 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:53:25,010-Speed 2628.08 samples/sec Loss 9.2022 LearningRate 0.0457 Epoch: 6 Global Step: 268600 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:53:28,921-Speed 2618.87 samples/sec Loss 9.0251 LearningRate 0.0457 Epoch: 6 Global Step: 268610 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 01:53:32,809-Speed 2634.44 samples/sec Loss 9.1621 LearningRate 0.0457 Epoch: 6 Global Step: 268620 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:53:36,722-Speed 2617.90 samples/sec Loss 9.1180 LearningRate 0.0457 Epoch: 6 Global Step: 268630 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:53:40,619-Speed 2627.69 samples/sec Loss 9.2702 LearningRate 0.0457 Epoch: 6 Global Step: 268640 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:53:44,674-Speed 2526.14 samples/sec Loss 9.2492 LearningRate 0.0457 Epoch: 6 Global Step: 268650 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:53:48,725-Speed 2528.35 samples/sec Loss 9.0661 LearningRate 0.0457 Epoch: 6 Global Step: 268660 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:53:52,525-Speed 2695.63 samples/sec Loss 9.5631 LearningRate 0.0457 Epoch: 6 Global Step: 268670 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 01:53:56,389-Speed 2650.57 samples/sec Loss 9.9581 LearningRate 0.0457 Epoch: 6 Global Step: 268680 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 01:54:00,276-Speed 2635.17 samples/sec Loss 9.2565 LearningRate 0.0457 Epoch: 6 Global Step: 268690 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 01:54:04,173-Speed 2628.28 samples/sec Loss 9.2040 LearningRate 0.0457 Epoch: 6 Global Step: 268700 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 01:54:08,071-Speed 2627.64 samples/sec Loss 9.1258 LearningRate 0.0457 Epoch: 6 Global Step: 268710 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 01:54:11,968-Speed 2627.95 samples/sec Loss 9.1247 LearningRate 0.0457 Epoch: 6 Global Step: 268720 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 01:54:15,868-Speed 2626.22 samples/sec Loss 9.1598 LearningRate 0.0457 Epoch: 6 Global Step: 268730 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 01:54:19,768-Speed 2625.71 samples/sec Loss 9.2367 LearningRate 0.0457 Epoch: 6 Global Step: 268740 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 01:54:23,666-Speed 2627.58 samples/sec Loss 9.1420 LearningRate 0.0457 Epoch: 6 Global Step: 268750 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 01:54:27,559-Speed 2631.59 samples/sec Loss 9.0880 LearningRate 0.0457 Epoch: 6 Global Step: 268760 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 01:54:31,447-Speed 2633.96 samples/sec Loss 9.0544 LearningRate 0.0457 Epoch: 6 Global Step: 268770 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 01:54:35,334-Speed 2634.85 samples/sec Loss 9.1698 LearningRate 0.0457 Epoch: 6 Global Step: 268780 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 01:54:39,225-Speed 2632.20 samples/sec Loss 9.1856 LearningRate 0.0457 Epoch: 6 Global Step: 268790 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 01:54:43,130-Speed 2623.44 samples/sec Loss 9.0475 LearningRate 0.0457 Epoch: 6 Global Step: 268800 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 01:54:47,021-Speed 2632.09 samples/sec Loss 9.1468 LearningRate 0.0457 Epoch: 6 Global Step: 268810 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 01:54:50,919-Speed 2627.84 samples/sec Loss 9.1548 LearningRate 0.0457 Epoch: 6 Global Step: 268820 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 01:54:54,811-Speed 2631.55 samples/sec Loss 9.1372 LearningRate 0.0457 Epoch: 6 Global Step: 268830 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 01:54:58,697-Speed 2635.02 samples/sec Loss 9.0984 LearningRate 0.0457 Epoch: 6 Global Step: 268840 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 01:55:02,588-Speed 2633.15 samples/sec Loss 9.1668 LearningRate 0.0457 Epoch: 6 Global Step: 268850 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 01:55:06,497-Speed 2620.21 samples/sec Loss 9.1635 LearningRate 0.0457 Epoch: 6 Global Step: 268860 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 01:55:10,387-Speed 2632.88 samples/sec Loss 9.1835 LearningRate 0.0457 Epoch: 6 Global Step: 268870 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 01:55:14,275-Speed 2634.16 samples/sec Loss 9.1107 LearningRate 0.0457 Epoch: 6 Global Step: 268880 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:55:18,181-Speed 2622.56 samples/sec Loss 9.1757 LearningRate 0.0457 Epoch: 6 Global Step: 268890 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:55:22,074-Speed 2631.14 samples/sec Loss 9.1194 LearningRate 0.0457 Epoch: 6 Global Step: 268900 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:55:25,963-Speed 2633.66 samples/sec Loss 9.0952 LearningRate 0.0457 Epoch: 6 Global Step: 268910 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:55:29,856-Speed 2630.99 samples/sec Loss 9.2855 LearningRate 0.0457 Epoch: 6 Global Step: 268920 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:55:33,746-Speed 2632.41 samples/sec Loss 9.2702 LearningRate 0.0457 Epoch: 6 Global Step: 268930 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:55:37,635-Speed 2634.04 samples/sec Loss 9.1805 LearningRate 0.0457 Epoch: 6 Global Step: 268940 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:55:41,525-Speed 2632.68 samples/sec Loss 9.0697 LearningRate 0.0457 Epoch: 6 Global Step: 268950 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:55:45,418-Speed 2631.65 samples/sec Loss 8.9927 LearningRate 0.0457 Epoch: 6 Global Step: 268960 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:55:49,306-Speed 2633.92 samples/sec Loss 9.0862 LearningRate 0.0457 Epoch: 6 Global Step: 268970 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 01:55:53,194-Speed 2634.70 samples/sec Loss 9.2092 LearningRate 0.0457 Epoch: 6 Global Step: 268980 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:55:57,086-Speed 2631.32 samples/sec Loss 9.2305 LearningRate 0.0457 Epoch: 6 Global Step: 268990 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:56:00,996-Speed 2619.75 samples/sec Loss 9.1049 LearningRate 0.0457 Epoch: 6 Global Step: 269000 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:56:04,887-Speed 2632.27 samples/sec Loss 9.2217 LearningRate 0.0457 Epoch: 6 Global Step: 269010 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:56:08,780-Speed 2630.42 samples/sec Loss 9.1229 LearningRate 0.0457 Epoch: 6 Global Step: 269020 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:56:12,669-Speed 2633.71 samples/sec Loss 9.2245 LearningRate 0.0457 Epoch: 6 Global Step: 269030 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:56:16,559-Speed 2633.16 samples/sec Loss 9.1123 LearningRate 0.0457 Epoch: 6 Global Step: 269040 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:56:20,447-Speed 2634.50 samples/sec Loss 9.2719 LearningRate 0.0457 Epoch: 6 Global Step: 269050 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:56:24,335-Speed 2634.76 samples/sec Loss 9.1591 LearningRate 0.0457 Epoch: 6 Global Step: 269060 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:56:28,224-Speed 2633.73 samples/sec Loss 9.4100 LearningRate 0.0457 Epoch: 6 Global Step: 269070 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 01:56:32,125-Speed 2626.69 samples/sec Loss 9.2763 LearningRate 0.0456 Epoch: 6 Global Step: 269080 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:56:36,020-Speed 2629.41 samples/sec Loss 9.3571 LearningRate 0.0456 Epoch: 6 Global Step: 269090 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:56:39,905-Speed 2635.77 samples/sec Loss 9.2778 LearningRate 0.0456 Epoch: 6 Global Step: 269100 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:56:43,797-Speed 2631.58 samples/sec Loss 9.1069 LearningRate 0.0456 Epoch: 6 Global Step: 269110 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:56:47,709-Speed 2618.82 samples/sec Loss 9.1829 LearningRate 0.0456 Epoch: 6 Global Step: 269120 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:56:51,731-Speed 2546.27 samples/sec Loss 9.1981 LearningRate 0.0456 Epoch: 6 Global Step: 269130 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:56:55,624-Speed 2631.42 samples/sec Loss 9.1931 LearningRate 0.0456 Epoch: 6 Global Step: 269140 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:56:59,529-Speed 2622.59 samples/sec Loss 9.1725 LearningRate 0.0456 Epoch: 6 Global Step: 269150 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:57:03,434-Speed 2623.24 samples/sec Loss 9.2072 LearningRate 0.0456 Epoch: 6 Global Step: 269160 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:57:07,334-Speed 2626.48 samples/sec Loss 9.1949 LearningRate 0.0456 Epoch: 6 Global Step: 269170 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:57:11,209-Speed 2642.49 samples/sec Loss 9.3185 LearningRate 0.0456 Epoch: 6 Global Step: 269180 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:57:15,103-Speed 2630.60 samples/sec Loss 9.2522 LearningRate 0.0456 Epoch: 6 Global Step: 269190 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:57:18,998-Speed 2629.51 samples/sec Loss 9.5083 LearningRate 0.0456 Epoch: 6 Global Step: 269200 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:57:22,897-Speed 2627.49 samples/sec Loss 9.3266 LearningRate 0.0456 Epoch: 6 Global Step: 269210 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:57:26,788-Speed 2632.10 samples/sec Loss 9.2419 LearningRate 0.0456 Epoch: 6 Global Step: 269220 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:57:30,680-Speed 2631.71 samples/sec Loss 9.0598 LearningRate 0.0456 Epoch: 6 Global Step: 269230 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:57:34,569-Speed 2633.68 samples/sec Loss 9.1555 LearningRate 0.0456 Epoch: 6 Global Step: 269240 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:57:38,483-Speed 2616.77 samples/sec Loss 9.1251 LearningRate 0.0456 Epoch: 6 Global Step: 269250 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:57:42,541-Speed 2524.12 samples/sec Loss 9.1182 LearningRate 0.0456 Epoch: 6 Global Step: 269260 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:57:46,615-Speed 2514.61 samples/sec Loss 9.0999 LearningRate 0.0456 Epoch: 6 Global Step: 269270 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 01:57:50,532-Speed 2614.33 samples/sec Loss 9.2130 LearningRate 0.0456 Epoch: 6 Global Step: 269280 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:57:54,446-Speed 2617.46 samples/sec Loss 9.2605 LearningRate 0.0456 Epoch: 6 Global Step: 269290 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:57:58,338-Speed 2631.26 samples/sec Loss 9.1481 LearningRate 0.0456 Epoch: 6 Global Step: 269300 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:58:02,234-Speed 2629.48 samples/sec Loss 9.0933 LearningRate 0.0456 Epoch: 6 Global Step: 269310 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:58:06,125-Speed 2632.47 samples/sec Loss 9.1332 LearningRate 0.0456 Epoch: 6 Global Step: 269320 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:58:10,019-Speed 2629.98 samples/sec Loss 9.0286 LearningRate 0.0456 Epoch: 6 Global Step: 269330 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:58:13,917-Speed 2627.19 samples/sec Loss 9.0254 LearningRate 0.0456 Epoch: 6 Global Step: 269340 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:58:17,813-Speed 2629.74 samples/sec Loss 9.2052 LearningRate 0.0456 Epoch: 6 Global Step: 269350 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:58:21,726-Speed 2617.15 samples/sec Loss 9.1417 LearningRate 0.0456 Epoch: 6 Global Step: 269360 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:58:25,667-Speed 2598.70 samples/sec Loss 9.2790 LearningRate 0.0456 Epoch: 6 Global Step: 269370 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 01:58:29,562-Speed 2629.72 samples/sec Loss 9.2202 LearningRate 0.0456 Epoch: 6 Global Step: 269380 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:58:33,459-Speed 2628.21 samples/sec Loss 9.0683 LearningRate 0.0456 Epoch: 6 Global Step: 269390 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:58:37,351-Speed 2632.01 samples/sec Loss 9.0703 LearningRate 0.0456 Epoch: 6 Global Step: 269400 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:58:41,245-Speed 2630.42 samples/sec Loss 9.2846 LearningRate 0.0456 Epoch: 6 Global Step: 269410 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:58:45,144-Speed 2626.63 samples/sec Loss 9.0753 LearningRate 0.0456 Epoch: 6 Global Step: 269420 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:58:49,035-Speed 2632.16 samples/sec Loss 9.0994 LearningRate 0.0456 Epoch: 6 Global Step: 269430 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:58:52,926-Speed 2632.87 samples/sec Loss 9.1568 LearningRate 0.0456 Epoch: 6 Global Step: 269440 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:58:56,817-Speed 2632.59 samples/sec Loss 9.1248 LearningRate 0.0456 Epoch: 6 Global Step: 269450 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:59:00,709-Speed 2631.50 samples/sec Loss 9.0883 LearningRate 0.0456 Epoch: 6 Global Step: 269460 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:59:04,756-Speed 2530.94 samples/sec Loss 9.1651 LearningRate 0.0456 Epoch: 6 Global Step: 269470 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 01:59:08,675-Speed 2614.16 samples/sec Loss 9.1019 LearningRate 0.0456 Epoch: 6 Global Step: 269480 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:59:12,605-Speed 2606.50 samples/sec Loss 9.1191 LearningRate 0.0456 Epoch: 6 Global Step: 269490 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:59:16,497-Speed 2631.01 samples/sec Loss 9.0802 LearningRate 0.0456 Epoch: 6 Global Step: 269500 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:59:20,399-Speed 2624.76 samples/sec Loss 9.1310 LearningRate 0.0456 Epoch: 6 Global Step: 269510 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:59:24,293-Speed 2631.02 samples/sec Loss 9.1247 LearningRate 0.0456 Epoch: 6 Global Step: 269520 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:59:28,183-Speed 2633.25 samples/sec Loss 9.0995 LearningRate 0.0456 Epoch: 6 Global Step: 269530 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:59:32,101-Speed 2614.09 samples/sec Loss 9.0800 LearningRate 0.0456 Epoch: 6 Global Step: 269540 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:59:35,996-Speed 2630.18 samples/sec Loss 9.1482 LearningRate 0.0456 Epoch: 6 Global Step: 269550 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:59:39,891-Speed 2629.33 samples/sec Loss 9.0423 LearningRate 0.0456 Epoch: 6 Global Step: 269560 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:59:43,784-Speed 2631.20 samples/sec Loss 9.0153 LearningRate 0.0456 Epoch: 6 Global Step: 269570 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:59:47,658-Speed 2643.39 samples/sec Loss 9.0404 LearningRate 0.0456 Epoch: 6 Global Step: 269580 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:59:51,549-Speed 2632.59 samples/sec Loss 9.0595 LearningRate 0.0456 Epoch: 6 Global Step: 269590 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:59:55,440-Speed 2632.23 samples/sec Loss 9.1197 LearningRate 0.0456 Epoch: 6 Global Step: 269600 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 01:59:59,333-Speed 2630.96 samples/sec Loss 9.1980 LearningRate 0.0456 Epoch: 6 Global Step: 269610 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:00:03,245-Speed 2618.72 samples/sec Loss 9.1376 LearningRate 0.0456 Epoch: 6 Global Step: 269620 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:00:07,149-Speed 2623.70 samples/sec Loss 9.2206 LearningRate 0.0456 Epoch: 6 Global Step: 269630 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:00:11,046-Speed 2628.31 samples/sec Loss 9.1564 LearningRate 0.0456 Epoch: 6 Global Step: 269640 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:00:14,938-Speed 2631.70 samples/sec Loss 9.1459 LearningRate 0.0456 Epoch: 6 Global Step: 269650 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:00:18,834-Speed 2629.14 samples/sec Loss 9.0089 LearningRate 0.0456 Epoch: 6 Global Step: 269660 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:00:22,735-Speed 2625.69 samples/sec Loss 9.0946 LearningRate 0.0456 Epoch: 6 Global Step: 269670 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:00:26,621-Speed 2635.05 samples/sec Loss 9.1567 LearningRate 0.0456 Epoch: 6 Global Step: 269680 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:00:30,519-Speed 2627.49 samples/sec Loss 9.1374 LearningRate 0.0456 Epoch: 6 Global Step: 269690 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:00:34,438-Speed 2613.96 samples/sec Loss 9.1618 LearningRate 0.0455 Epoch: 6 Global Step: 269700 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:00:38,340-Speed 2625.15 samples/sec Loss 9.0348 LearningRate 0.0455 Epoch: 6 Global Step: 269710 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:00:42,207-Speed 2648.91 samples/sec Loss 9.1594 LearningRate 0.0455 Epoch: 6 Global Step: 269720 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:00:46,110-Speed 2624.11 samples/sec Loss 9.1597 LearningRate 0.0455 Epoch: 6 Global Step: 269730 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:00:50,013-Speed 2624.38 samples/sec Loss 9.0662 LearningRate 0.0455 Epoch: 6 Global Step: 269740 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:00:53,916-Speed 2624.26 samples/sec Loss 9.0704 LearningRate 0.0455 Epoch: 6 Global Step: 269750 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:00:57,817-Speed 2625.44 samples/sec Loss 9.1675 LearningRate 0.0455 Epoch: 6 Global Step: 269760 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:01:01,721-Speed 2623.39 samples/sec Loss 9.1074 LearningRate 0.0455 Epoch: 6 Global Step: 269770 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:01:05,621-Speed 2626.09 samples/sec Loss 9.1198 LearningRate 0.0455 Epoch: 6 Global Step: 269780 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:01:09,522-Speed 2625.40 samples/sec Loss 9.0081 LearningRate 0.0455 Epoch: 6 Global Step: 269790 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:01:13,423-Speed 2626.20 samples/sec Loss 9.1608 LearningRate 0.0455 Epoch: 6 Global Step: 269800 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:01:17,305-Speed 2638.56 samples/sec Loss 9.6563 LearningRate 0.0455 Epoch: 6 Global Step: 269810 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:01:21,214-Speed 2620.16 samples/sec Loss 9.4555 LearningRate 0.0455 Epoch: 6 Global Step: 269820 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:01:25,130-Speed 2615.90 samples/sec Loss 9.0788 LearningRate 0.0455 Epoch: 6 Global Step: 269830 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:01:29,020-Speed 2632.70 samples/sec Loss 9.2156 LearningRate 0.0455 Epoch: 6 Global Step: 269840 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:01:32,880-Speed 2653.53 samples/sec Loss 9.4486 LearningRate 0.0455 Epoch: 6 Global Step: 269850 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 02:01:36,761-Speed 2639.20 samples/sec Loss 9.5067 LearningRate 0.0455 Epoch: 6 Global Step: 269860 Fp16 Grad Scale: 512 Required: 63 hours
Training: 2022-04-14 02:01:40,655-Speed 2630.43 samples/sec Loss 9.1704 LearningRate 0.0455 Epoch: 6 Global Step: 269870 Fp16 Grad Scale: 512 Required: 63 hours
Training: 2022-04-14 02:01:44,552-Speed 2628.09 samples/sec Loss 10.2979 LearningRate 0.0455 Epoch: 6 Global Step: 269880 Fp16 Grad Scale: 512 Required: 63 hours
Training: 2022-04-14 02:01:48,490-Speed 2601.47 samples/sec Loss 10.0212 LearningRate 0.0455 Epoch: 6 Global Step: 269890 Fp16 Grad Scale: 512 Required: 63 hours
Training: 2022-04-14 02:01:52,397-Speed 2621.30 samples/sec Loss 9.7023 LearningRate 0.0455 Epoch: 6 Global Step: 269900 Fp16 Grad Scale: 512 Required: 63 hours
Training: 2022-04-14 02:01:56,281-Speed 2637.93 samples/sec Loss 9.4155 LearningRate 0.0455 Epoch: 6 Global Step: 269910 Fp16 Grad Scale: 512 Required: 63 hours
Training: 2022-04-14 02:02:00,167-Speed 2635.61 samples/sec Loss 9.3079 LearningRate 0.0455 Epoch: 6 Global Step: 269920 Fp16 Grad Scale: 512 Required: 63 hours
Training: 2022-04-14 02:02:04,055-Speed 2634.08 samples/sec Loss 9.1971 LearningRate 0.0455 Epoch: 6 Global Step: 269930 Fp16 Grad Scale: 512 Required: 63 hours
Training: 2022-04-14 02:02:07,942-Speed 2634.75 samples/sec Loss 9.2992 LearningRate 0.0455 Epoch: 6 Global Step: 269940 Fp16 Grad Scale: 512 Required: 63 hours
Training: 2022-04-14 02:02:11,832-Speed 2633.80 samples/sec Loss 9.1423 LearningRate 0.0455 Epoch: 6 Global Step: 269950 Fp16 Grad Scale: 512 Required: 63 hours
Training: 2022-04-14 02:02:15,721-Speed 2633.62 samples/sec Loss 9.1950 LearningRate 0.0455 Epoch: 6 Global Step: 269960 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 02:02:19,616-Speed 2630.18 samples/sec Loss 9.1468 LearningRate 0.0455 Epoch: 6 Global Step: 269970 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 02:02:23,503-Speed 2635.03 samples/sec Loss 9.1318 LearningRate 0.0455 Epoch: 6 Global Step: 269980 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 02:02:27,392-Speed 2633.66 samples/sec Loss 9.1776 LearningRate 0.0455 Epoch: 6 Global Step: 269990 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 02:02:31,314-Speed 2611.69 samples/sec Loss 9.1283 LearningRate 0.0455 Epoch: 6 Global Step: 270000 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 02:03:14,635-[lfw][270000]XNorm: 23.538378
Training: 2022-04-14 02:03:14,636-[lfw][270000]Accuracy-Flip: 0.99717+-0.00248
Training: 2022-04-14 02:03:14,636-[lfw][270000]Accuracy-Highest: 0.99783
Training: 2022-04-14 02:04:05,255-[cfp_fp][270000]XNorm: 21.506028
Training: 2022-04-14 02:04:05,256-[cfp_fp][270000]Accuracy-Flip: 0.98400+-0.00423
Training: 2022-04-14 02:04:05,257-[cfp_fp][270000]Accuracy-Highest: 0.98643
Training: 2022-04-14 02:04:48,760-[agedb_30][270000]XNorm: 23.227426
Training: 2022-04-14 02:04:48,761-[agedb_30][270000]Accuracy-Flip: 0.97367+-0.00552
Training: 2022-04-14 02:04:48,762-[agedb_30][270000]Accuracy-Highest: 0.97367
Training: 2022-04-14 02:04:52,633-Speed 72.46 samples/sec Loss 9.0449 LearningRate 0.0455 Epoch: 6 Global Step: 270010 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 02:04:56,545-Speed 2618.26 samples/sec Loss 9.1670 LearningRate 0.0455 Epoch: 6 Global Step: 270020 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 02:05:00,427-Speed 2639.24 samples/sec Loss 9.0819 LearningRate 0.0455 Epoch: 6 Global Step: 270030 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 02:05:04,354-Speed 2608.66 samples/sec Loss 9.1276 LearningRate 0.0455 Epoch: 6 Global Step: 270040 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 02:05:08,230-Speed 2643.20 samples/sec Loss 9.2126 LearningRate 0.0455 Epoch: 6 Global Step: 270050 Fp16 Grad Scale: 1024 Required: 63 hours
Training: 2022-04-14 02:05:12,151-Speed 2611.57 samples/sec Loss 8.9853 LearningRate 0.0455 Epoch: 6 Global Step: 270060 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 02:05:16,024-Speed 2645.35 samples/sec Loss 9.1526 LearningRate 0.0455 Epoch: 6 Global Step: 270070 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 02:05:19,902-Speed 2641.02 samples/sec Loss 9.1344 LearningRate 0.0455 Epoch: 6 Global Step: 270080 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 02:05:23,783-Speed 2638.85 samples/sec Loss 9.2694 LearningRate 0.0455 Epoch: 6 Global Step: 270090 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 02:05:27,664-Speed 2639.81 samples/sec Loss 9.0678 LearningRate 0.0455 Epoch: 6 Global Step: 270100 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 02:05:31,548-Speed 2636.96 samples/sec Loss 9.0944 LearningRate 0.0455 Epoch: 6 Global Step: 270110 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 02:05:35,447-Speed 2627.14 samples/sec Loss 9.1972 LearningRate 0.0455 Epoch: 6 Global Step: 270120 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 02:05:39,343-Speed 2628.91 samples/sec Loss 9.1729 LearningRate 0.0455 Epoch: 6 Global Step: 270130 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 02:05:43,239-Speed 2629.23 samples/sec Loss 9.3025 LearningRate 0.0455 Epoch: 6 Global Step: 270140 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 02:05:47,117-Speed 2641.51 samples/sec Loss 9.0491 LearningRate 0.0455 Epoch: 6 Global Step: 270150 Fp16 Grad Scale: 2048 Required: 63 hours
Training: 2022-04-14 02:05:51,002-Speed 2636.25 samples/sec Loss 9.2114 LearningRate 0.0455 Epoch: 6 Global Step: 270160 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 02:05:54,898-Speed 2629.10 samples/sec Loss 9.1614 LearningRate 0.0455 Epoch: 6 Global Step: 270170 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 02:05:58,790-Speed 2632.06 samples/sec Loss 9.2348 LearningRate 0.0455 Epoch: 6 Global Step: 270180 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 02:06:02,677-Speed 2635.01 samples/sec Loss 9.2104 LearningRate 0.0455 Epoch: 6 Global Step: 270190 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 02:06:06,564-Speed 2635.31 samples/sec Loss 9.1985 LearningRate 0.0455 Epoch: 6 Global Step: 270200 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 02:06:10,455-Speed 2632.32 samples/sec Loss 9.0861 LearningRate 0.0455 Epoch: 6 Global Step: 270210 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 02:06:14,346-Speed 2632.95 samples/sec Loss 9.0845 LearningRate 0.0455 Epoch: 6 Global Step: 270220 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 02:06:18,243-Speed 2628.60 samples/sec Loss 9.1891 LearningRate 0.0455 Epoch: 6 Global Step: 270230 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 02:06:22,142-Speed 2627.07 samples/sec Loss 9.0076 LearningRate 0.0455 Epoch: 6 Global Step: 270240 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 02:06:26,035-Speed 2631.27 samples/sec Loss 9.2130 LearningRate 0.0455 Epoch: 6 Global Step: 270250 Fp16 Grad Scale: 4096 Required: 63 hours
Training: 2022-04-14 02:06:29,924-Speed 2633.07 samples/sec Loss 9.1656 LearningRate 0.0455 Epoch: 6 Global Step: 270260 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:06:33,820-Speed 2629.11 samples/sec Loss 9.0832 LearningRate 0.0455 Epoch: 6 Global Step: 270270 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:06:37,718-Speed 2627.88 samples/sec Loss 9.0835 LearningRate 0.0455 Epoch: 6 Global Step: 270280 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:06:41,617-Speed 2627.27 samples/sec Loss 8.9163 LearningRate 0.0455 Epoch: 6 Global Step: 270290 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:06:45,520-Speed 2623.94 samples/sec Loss 9.2278 LearningRate 0.0455 Epoch: 6 Global Step: 270300 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:06:49,425-Speed 2622.77 samples/sec Loss 9.2327 LearningRate 0.0454 Epoch: 6 Global Step: 270310 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:06:53,349-Speed 2610.46 samples/sec Loss 9.0614 LearningRate 0.0454 Epoch: 6 Global Step: 270320 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:06:57,262-Speed 2618.10 samples/sec Loss 9.1127 LearningRate 0.0454 Epoch: 6 Global Step: 270330 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:07:01,149-Speed 2634.85 samples/sec Loss 8.9733 LearningRate 0.0454 Epoch: 6 Global Step: 270340 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:07:05,152-Speed 2558.84 samples/sec Loss 9.1382 LearningRate 0.0454 Epoch: 6 Global Step: 270350 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:07:09,217-Speed 2519.89 samples/sec Loss 9.0729 LearningRate 0.0454 Epoch: 6 Global Step: 270360 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:07:13,958-Speed 2160.38 samples/sec Loss 9.2538 LearningRate 0.0454 Epoch: 6 Global Step: 270370 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:07:17,862-Speed 2622.89 samples/sec Loss 9.1760 LearningRate 0.0454 Epoch: 6 Global Step: 270380 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:07:21,766-Speed 2624.19 samples/sec Loss 9.1868 LearningRate 0.0454 Epoch: 6 Global Step: 270390 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:07:25,665-Speed 2626.79 samples/sec Loss 9.1874 LearningRate 0.0454 Epoch: 6 Global Step: 270400 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:07:29,561-Speed 2629.79 samples/sec Loss 9.1103 LearningRate 0.0454 Epoch: 6 Global Step: 270410 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:07:33,454-Speed 2630.81 samples/sec Loss 9.1034 LearningRate 0.0454 Epoch: 6 Global Step: 270420 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:07:37,343-Speed 2633.57 samples/sec Loss 9.2100 LearningRate 0.0454 Epoch: 6 Global Step: 270430 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:07:41,230-Speed 2634.65 samples/sec Loss 9.0315 LearningRate 0.0454 Epoch: 6 Global Step: 270440 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:07:45,124-Speed 2630.50 samples/sec Loss 9.1871 LearningRate 0.0454 Epoch: 6 Global Step: 270450 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:07:49,024-Speed 2626.70 samples/sec Loss 9.2010 LearningRate 0.0454 Epoch: 6 Global Step: 270460 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:07:52,915-Speed 2632.60 samples/sec Loss 9.1460 LearningRate 0.0454 Epoch: 6 Global Step: 270470 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:07:56,801-Speed 2635.68 samples/sec Loss 9.1486 LearningRate 0.0454 Epoch: 6 Global Step: 270480 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:08:00,687-Speed 2635.65 samples/sec Loss 9.1179 LearningRate 0.0454 Epoch: 6 Global Step: 270490 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:08:04,576-Speed 2633.32 samples/sec Loss 9.1230 LearningRate 0.0454 Epoch: 6 Global Step: 270500 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:08:08,486-Speed 2619.52 samples/sec Loss 9.0775 LearningRate 0.0454 Epoch: 6 Global Step: 270510 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:08:12,414-Speed 2607.77 samples/sec Loss 9.3501 LearningRate 0.0454 Epoch: 6 Global Step: 270520 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:08:16,313-Speed 2626.39 samples/sec Loss 9.0490 LearningRate 0.0454 Epoch: 6 Global Step: 270530 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:08:20,214-Speed 2625.96 samples/sec Loss 9.1464 LearningRate 0.0454 Epoch: 6 Global Step: 270540 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:08:24,105-Speed 2632.64 samples/sec Loss 9.2749 LearningRate 0.0454 Epoch: 6 Global Step: 270550 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:08:28,017-Speed 2618.24 samples/sec Loss 9.1247 LearningRate 0.0454 Epoch: 6 Global Step: 270560 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:08:31,909-Speed 2631.90 samples/sec Loss 9.0588 LearningRate 0.0454 Epoch: 6 Global Step: 270570 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:08:35,808-Speed 2626.68 samples/sec Loss 9.1182 LearningRate 0.0454 Epoch: 6 Global Step: 270580 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:08:39,731-Speed 2611.20 samples/sec Loss 9.2150 LearningRate 0.0454 Epoch: 6 Global Step: 270590 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:08:43,625-Speed 2630.55 samples/sec Loss 9.0785 LearningRate 0.0454 Epoch: 6 Global Step: 270600 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:08:47,516-Speed 2631.92 samples/sec Loss 9.2137 LearningRate 0.0454 Epoch: 6 Global Step: 270610 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:08:51,404-Speed 2634.44 samples/sec Loss 9.0487 LearningRate 0.0454 Epoch: 6 Global Step: 270620 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:08:55,297-Speed 2631.64 samples/sec Loss 8.9464 LearningRate 0.0454 Epoch: 6 Global Step: 270630 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:08:59,192-Speed 2630.11 samples/sec Loss 9.1150 LearningRate 0.0454 Epoch: 6 Global Step: 270640 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:09:03,082-Speed 2632.39 samples/sec Loss 9.0587 LearningRate 0.0454 Epoch: 6 Global Step: 270650 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:09:06,982-Speed 2626.27 samples/sec Loss 9.1351 LearningRate 0.0454 Epoch: 6 Global Step: 270660 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:09:10,877-Speed 2629.53 samples/sec Loss 9.1047 LearningRate 0.0454 Epoch: 6 Global Step: 270670 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:09:14,770-Speed 2635.76 samples/sec Loss 9.1351 LearningRate 0.0454 Epoch: 6 Global Step: 270680 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:09:18,663-Speed 2631.48 samples/sec Loss 9.2324 LearningRate 0.0454 Epoch: 6 Global Step: 270690 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:09:22,556-Speed 2631.23 samples/sec Loss 9.0936 LearningRate 0.0454 Epoch: 6 Global Step: 270700 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:09:26,457-Speed 2625.85 samples/sec Loss 9.0997 LearningRate 0.0454 Epoch: 6 Global Step: 270710 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:09:30,353-Speed 2628.38 samples/sec Loss 9.1235 LearningRate 0.0454 Epoch: 6 Global Step: 270720 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:09:34,231-Speed 2640.81 samples/sec Loss 9.1874 LearningRate 0.0454 Epoch: 6 Global Step: 270730 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:09:38,140-Speed 2620.06 samples/sec Loss 9.1059 LearningRate 0.0454 Epoch: 6 Global Step: 270740 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:09:42,054-Speed 2617.49 samples/sec Loss 9.1740 LearningRate 0.0454 Epoch: 6 Global Step: 270750 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:09:46,045-Speed 2566.34 samples/sec Loss 9.1508 LearningRate 0.0454 Epoch: 6 Global Step: 270760 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:09:49,969-Speed 2610.74 samples/sec Loss 9.0312 LearningRate 0.0454 Epoch: 6 Global Step: 270770 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:09:53,868-Speed 2626.85 samples/sec Loss 9.0760 LearningRate 0.0454 Epoch: 6 Global Step: 270780 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:09:57,779-Speed 2619.48 samples/sec Loss 9.1552 LearningRate 0.0454 Epoch: 6 Global Step: 270790 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:10:01,659-Speed 2639.65 samples/sec Loss 9.1552 LearningRate 0.0454 Epoch: 6 Global Step: 270800 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:10:05,533-Speed 2644.23 samples/sec Loss 10.4164 LearningRate 0.0454 Epoch: 6 Global Step: 270810 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:10:09,426-Speed 2630.63 samples/sec Loss 9.3996 LearningRate 0.0454 Epoch: 6 Global Step: 270820 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:10:13,354-Speed 2607.51 samples/sec Loss 9.1364 LearningRate 0.0454 Epoch: 6 Global Step: 270830 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:10:17,260-Speed 2622.46 samples/sec Loss 9.2270 LearningRate 0.0454 Epoch: 6 Global Step: 270840 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:10:21,148-Speed 2634.49 samples/sec Loss 9.2777 LearningRate 0.0454 Epoch: 6 Global Step: 270850 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:10:25,038-Speed 2633.10 samples/sec Loss 9.1491 LearningRate 0.0454 Epoch: 6 Global Step: 270860 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:10:28,927-Speed 2634.04 samples/sec Loss 9.2240 LearningRate 0.0454 Epoch: 6 Global Step: 270870 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:10:32,824-Speed 2628.15 samples/sec Loss 9.2931 LearningRate 0.0454 Epoch: 6 Global Step: 270880 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:10:36,715-Speed 2632.47 samples/sec Loss 9.3030 LearningRate 0.0454 Epoch: 6 Global Step: 270890 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:10:40,614-Speed 2626.84 samples/sec Loss 9.1960 LearningRate 0.0454 Epoch: 6 Global Step: 270900 Fp16 Grad Scale: 8192 Required: 63 hours
Training: 2022-04-14 02:10:44,504-Speed 2633.46 samples/sec Loss 9.1650 LearningRate 0.0454 Epoch: 6 Global Step: 270910 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:10:48,400-Speed 2628.78 samples/sec Loss 9.2469 LearningRate 0.0454 Epoch: 6 Global Step: 270920 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:10:52,289-Speed 2634.25 samples/sec Loss 9.1162 LearningRate 0.0453 Epoch: 6 Global Step: 270930 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:10:56,189-Speed 2625.75 samples/sec Loss 9.2315 LearningRate 0.0453 Epoch: 6 Global Step: 270940 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:11:00,080-Speed 2632.89 samples/sec Loss 9.0535 LearningRate 0.0453 Epoch: 6 Global Step: 270950 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:11:03,971-Speed 2631.77 samples/sec Loss 9.0634 LearningRate 0.0453 Epoch: 6 Global Step: 270960 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:11:07,865-Speed 2630.37 samples/sec Loss 9.1429 LearningRate 0.0453 Epoch: 6 Global Step: 270970 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:11:11,756-Speed 2632.28 samples/sec Loss 9.0520 LearningRate 0.0453 Epoch: 6 Global Step: 270980 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:11:15,645-Speed 2633.43 samples/sec Loss 9.1484 LearningRate 0.0453 Epoch: 6 Global Step: 270990 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:11:19,548-Speed 2624.91 samples/sec Loss 9.1962 LearningRate 0.0453 Epoch: 6 Global Step: 271000 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:11:23,437-Speed 2633.09 samples/sec Loss 9.1372 LearningRate 0.0453 Epoch: 6 Global Step: 271010 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:11:27,333-Speed 2629.51 samples/sec Loss 9.3175 LearningRate 0.0453 Epoch: 6 Global Step: 271020 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:11:31,226-Speed 2631.49 samples/sec Loss 9.1155 LearningRate 0.0453 Epoch: 6 Global Step: 271030 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:11:35,098-Speed 2645.06 samples/sec Loss 9.2446 LearningRate 0.0453 Epoch: 6 Global Step: 271040 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:11:39,002-Speed 2623.00 samples/sec Loss 9.1921 LearningRate 0.0453 Epoch: 6 Global Step: 271050 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:11:42,904-Speed 2625.92 samples/sec Loss 9.2182 LearningRate 0.0453 Epoch: 6 Global Step: 271060 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:11:46,799-Speed 2629.42 samples/sec Loss 9.1075 LearningRate 0.0453 Epoch: 6 Global Step: 271070 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:11:50,686-Speed 2635.97 samples/sec Loss 9.0749 LearningRate 0.0453 Epoch: 6 Global Step: 271080 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:11:54,577-Speed 2631.93 samples/sec Loss 9.2306 LearningRate 0.0453 Epoch: 6 Global Step: 271090 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:11:58,470-Speed 2631.28 samples/sec Loss 9.1135 LearningRate 0.0453 Epoch: 6 Global Step: 271100 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:12:02,469-Speed 2560.85 samples/sec Loss 9.2025 LearningRate 0.0453 Epoch: 6 Global Step: 271110 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:12:06,365-Speed 2628.82 samples/sec Loss 9.1623 LearningRate 0.0453 Epoch: 6 Global Step: 271120 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:12:10,254-Speed 2633.64 samples/sec Loss 9.0675 LearningRate 0.0453 Epoch: 6 Global Step: 271130 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:12:14,144-Speed 2633.04 samples/sec Loss 9.3496 LearningRate 0.0453 Epoch: 6 Global Step: 271140 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:12:18,036-Speed 2631.87 samples/sec Loss 9.2246 LearningRate 0.0453 Epoch: 6 Global Step: 271150 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:12:21,931-Speed 2629.42 samples/sec Loss 9.0531 LearningRate 0.0453 Epoch: 6 Global Step: 271160 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:12:25,833-Speed 2624.98 samples/sec Loss 9.0349 LearningRate 0.0453 Epoch: 6 Global Step: 271170 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:12:29,726-Speed 2630.92 samples/sec Loss 9.0798 LearningRate 0.0453 Epoch: 6 Global Step: 271180 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:12:33,626-Speed 2626.90 samples/sec Loss 9.1832 LearningRate 0.0453 Epoch: 6 Global Step: 271190 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:12:37,525-Speed 2626.70 samples/sec Loss 9.1298 LearningRate 0.0453 Epoch: 6 Global Step: 271200 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:12:41,417-Speed 2631.72 samples/sec Loss 9.0529 LearningRate 0.0453 Epoch: 6 Global Step: 271210 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:12:45,347-Speed 2606.35 samples/sec Loss 9.0269 LearningRate 0.0453 Epoch: 6 Global Step: 271220 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:12:49,242-Speed 2629.53 samples/sec Loss 9.1795 LearningRate 0.0453 Epoch: 6 Global Step: 271230 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:12:53,137-Speed 2630.02 samples/sec Loss 9.2210 LearningRate 0.0453 Epoch: 6 Global Step: 271240 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:12:57,030-Speed 2630.88 samples/sec Loss 9.2076 LearningRate 0.0453 Epoch: 6 Global Step: 271250 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:13:00,919-Speed 2633.90 samples/sec Loss 9.1369 LearningRate 0.0453 Epoch: 6 Global Step: 271260 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:13:04,810-Speed 2632.61 samples/sec Loss 9.1059 LearningRate 0.0453 Epoch: 6 Global Step: 271270 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:13:08,699-Speed 2633.41 samples/sec Loss 9.1601 LearningRate 0.0453 Epoch: 6 Global Step: 271280 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:13:12,603-Speed 2623.25 samples/sec Loss 9.2113 LearningRate 0.0453 Epoch: 6 Global Step: 271290 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:13:16,502-Speed 2626.93 samples/sec Loss 9.0220 LearningRate 0.0453 Epoch: 6 Global Step: 271300 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:13:20,404-Speed 2625.14 samples/sec Loss 9.0734 LearningRate 0.0453 Epoch: 6 Global Step: 271310 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:13:24,306-Speed 2625.07 samples/sec Loss 9.1686 LearningRate 0.0453 Epoch: 6 Global Step: 271320 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:13:28,210-Speed 2623.97 samples/sec Loss 9.1982 LearningRate 0.0453 Epoch: 6 Global Step: 271330 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:13:32,114-Speed 2623.29 samples/sec Loss 9.0533 LearningRate 0.0453 Epoch: 6 Global Step: 271340 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:13:36,017-Speed 2623.60 samples/sec Loss 9.1025 LearningRate 0.0453 Epoch: 6 Global Step: 271350 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:13:39,916-Speed 2627.25 samples/sec Loss 9.2158 LearningRate 0.0453 Epoch: 6 Global Step: 271360 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:13:43,820-Speed 2623.78 samples/sec Loss 9.0814 LearningRate 0.0453 Epoch: 6 Global Step: 271370 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:13:47,713-Speed 2630.40 samples/sec Loss 9.2049 LearningRate 0.0453 Epoch: 6 Global Step: 271380 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:13:51,614-Speed 2626.11 samples/sec Loss 9.2041 LearningRate 0.0453 Epoch: 6 Global Step: 271390 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:13:55,510-Speed 2628.66 samples/sec Loss 9.1719 LearningRate 0.0453 Epoch: 6 Global Step: 271400 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:13:59,401-Speed 2632.56 samples/sec Loss 9.0678 LearningRate 0.0453 Epoch: 6 Global Step: 271410 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:14:03,300-Speed 2627.28 samples/sec Loss 9.2481 LearningRate 0.0453 Epoch: 6 Global Step: 271420 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:14:07,191-Speed 2632.04 samples/sec Loss 8.9610 LearningRate 0.0453 Epoch: 6 Global Step: 271430 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:14:11,080-Speed 2633.19 samples/sec Loss 9.1730 LearningRate 0.0453 Epoch: 6 Global Step: 271440 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:14:14,976-Speed 2629.60 samples/sec Loss 9.1125 LearningRate 0.0453 Epoch: 6 Global Step: 271450 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:14:18,866-Speed 2632.80 samples/sec Loss 9.1159 LearningRate 0.0453 Epoch: 6 Global Step: 271460 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:14:22,770-Speed 2623.39 samples/sec Loss 9.0579 LearningRate 0.0453 Epoch: 6 Global Step: 271470 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:14:26,663-Speed 2631.03 samples/sec Loss 9.0725 LearningRate 0.0453 Epoch: 6 Global Step: 271480 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:14:30,549-Speed 2636.03 samples/sec Loss 9.0398 LearningRate 0.0453 Epoch: 6 Global Step: 271490 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:14:34,445-Speed 2628.79 samples/sec Loss 9.1499 LearningRate 0.0453 Epoch: 6 Global Step: 271500 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:14:38,339-Speed 2629.88 samples/sec Loss 9.0691 LearningRate 0.0453 Epoch: 6 Global Step: 271510 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:14:42,242-Speed 2624.63 samples/sec Loss 9.0351 LearningRate 0.0453 Epoch: 6 Global Step: 271520 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:14:46,146-Speed 2623.11 samples/sec Loss 9.0761 LearningRate 0.0453 Epoch: 6 Global Step: 271530 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:14:50,044-Speed 2627.70 samples/sec Loss 9.1224 LearningRate 0.0452 Epoch: 6 Global Step: 271540 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:14:53,937-Speed 2631.13 samples/sec Loss 9.1614 LearningRate 0.0452 Epoch: 6 Global Step: 271550 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:14:57,834-Speed 2628.51 samples/sec Loss 9.1691 LearningRate 0.0452 Epoch: 6 Global Step: 271560 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:15:01,800-Speed 2582.54 samples/sec Loss 9.2008 LearningRate 0.0452 Epoch: 6 Global Step: 271570 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:15:05,694-Speed 2629.75 samples/sec Loss 9.0378 LearningRate 0.0452 Epoch: 6 Global Step: 271580 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:15:09,588-Speed 2630.94 samples/sec Loss 9.1030 LearningRate 0.0452 Epoch: 6 Global Step: 271590 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:15:13,477-Speed 2633.47 samples/sec Loss 9.1971 LearningRate 0.0452 Epoch: 6 Global Step: 271600 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:15:17,376-Speed 2627.07 samples/sec Loss 8.9549 LearningRate 0.0452 Epoch: 6 Global Step: 271610 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:15:21,273-Speed 2627.90 samples/sec Loss 9.0286 LearningRate 0.0452 Epoch: 6 Global Step: 271620 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:15:25,176-Speed 2624.34 samples/sec Loss 9.1546 LearningRate 0.0452 Epoch: 6 Global Step: 271630 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:15:29,075-Speed 2626.86 samples/sec Loss 9.1388 LearningRate 0.0452 Epoch: 6 Global Step: 271640 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:15:32,973-Speed 2627.83 samples/sec Loss 9.1137 LearningRate 0.0452 Epoch: 6 Global Step: 271650 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:15:36,880-Speed 2621.12 samples/sec Loss 8.9945 LearningRate 0.0452 Epoch: 6 Global Step: 271660 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:15:40,772-Speed 2631.56 samples/sec Loss 9.1259 LearningRate 0.0452 Epoch: 6 Global Step: 271670 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:15:44,663-Speed 2632.37 samples/sec Loss 9.0475 LearningRate 0.0452 Epoch: 6 Global Step: 271680 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:15:48,546-Speed 2637.76 samples/sec Loss 9.0812 LearningRate 0.0452 Epoch: 6 Global Step: 271690 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:15:52,439-Speed 2631.51 samples/sec Loss 9.0946 LearningRate 0.0452 Epoch: 6 Global Step: 271700 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:15:56,331-Speed 2631.57 samples/sec Loss 9.1211 LearningRate 0.0452 Epoch: 6 Global Step: 271710 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:16:00,223-Speed 2632.00 samples/sec Loss 9.0482 LearningRate 0.0452 Epoch: 6 Global Step: 271720 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:16:04,103-Speed 2639.57 samples/sec Loss 9.1513 LearningRate 0.0452 Epoch: 6 Global Step: 271730 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:16:08,002-Speed 2626.39 samples/sec Loss 9.0349 LearningRate 0.0452 Epoch: 6 Global Step: 271740 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:16:11,900-Speed 2627.40 samples/sec Loss 9.2121 LearningRate 0.0452 Epoch: 6 Global Step: 271750 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:16:15,794-Speed 2630.50 samples/sec Loss 9.2539 LearningRate 0.0452 Epoch: 6 Global Step: 271760 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:16:19,692-Speed 2627.65 samples/sec Loss 9.2860 LearningRate 0.0452 Epoch: 6 Global Step: 271770 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:16:23,584-Speed 2631.59 samples/sec Loss 9.0873 LearningRate 0.0452 Epoch: 6 Global Step: 271780 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:16:27,475-Speed 2632.14 samples/sec Loss 9.0989 LearningRate 0.0452 Epoch: 6 Global Step: 271790 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:16:31,365-Speed 2633.33 samples/sec Loss 9.1550 LearningRate 0.0452 Epoch: 6 Global Step: 271800 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:16:35,259-Speed 2630.38 samples/sec Loss 9.0844 LearningRate 0.0452 Epoch: 6 Global Step: 271810 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:16:39,149-Speed 2633.31 samples/sec Loss 9.1804 LearningRate 0.0452 Epoch: 6 Global Step: 271820 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:16:43,042-Speed 2630.22 samples/sec Loss 8.9586 LearningRate 0.0452 Epoch: 6 Global Step: 271830 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:16:46,940-Speed 2628.26 samples/sec Loss 8.9621 LearningRate 0.0452 Epoch: 6 Global Step: 271840 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:16:50,813-Speed 2643.80 samples/sec Loss 9.0887 LearningRate 0.0452 Epoch: 6 Global Step: 271850 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:16:54,721-Speed 2621.28 samples/sec Loss 9.0103 LearningRate 0.0452 Epoch: 6 Global Step: 271860 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:16:58,612-Speed 2632.30 samples/sec Loss 9.2593 LearningRate 0.0452 Epoch: 6 Global Step: 271870 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:17:02,503-Speed 2631.97 samples/sec Loss 8.9974 LearningRate 0.0452 Epoch: 6 Global Step: 271880 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:17:06,394-Speed 2632.40 samples/sec Loss 9.2117 LearningRate 0.0452 Epoch: 6 Global Step: 271890 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:17:10,298-Speed 2623.86 samples/sec Loss 9.1182 LearningRate 0.0452 Epoch: 6 Global Step: 271900 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:17:14,193-Speed 2629.57 samples/sec Loss 9.1037 LearningRate 0.0452 Epoch: 6 Global Step: 271910 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:17:18,090-Speed 2628.26 samples/sec Loss 8.9867 LearningRate 0.0452 Epoch: 6 Global Step: 271920 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:17:21,995-Speed 2623.47 samples/sec Loss 9.0497 LearningRate 0.0452 Epoch: 6 Global Step: 271930 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:17:25,888-Speed 2630.72 samples/sec Loss 9.0934 LearningRate 0.0452 Epoch: 6 Global Step: 271940 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:17:29,779-Speed 2631.96 samples/sec Loss 9.1297 LearningRate 0.0452 Epoch: 6 Global Step: 271950 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:17:33,683-Speed 2623.82 samples/sec Loss 9.1317 LearningRate 0.0452 Epoch: 6 Global Step: 271960 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:17:37,587-Speed 2623.16 samples/sec Loss 9.0998 LearningRate 0.0452 Epoch: 6 Global Step: 271970 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:17:41,482-Speed 2630.02 samples/sec Loss 8.9317 LearningRate 0.0452 Epoch: 6 Global Step: 271980 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:17:45,370-Speed 2634.74 samples/sec Loss 9.1644 LearningRate 0.0452 Epoch: 6 Global Step: 271990 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:17:49,194-Speed 2678.51 samples/sec Loss 9.1725 LearningRate 0.0452 Epoch: 6 Global Step: 272000 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:17:53,087-Speed 2630.94 samples/sec Loss 9.1632 LearningRate 0.0452 Epoch: 6 Global Step: 272010 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:17:56,985-Speed 2627.51 samples/sec Loss 9.0492 LearningRate 0.0452 Epoch: 6 Global Step: 272020 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:18:00,871-Speed 2635.37 samples/sec Loss 9.0960 LearningRate 0.0452 Epoch: 6 Global Step: 272030 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:18:04,773-Speed 2625.09 samples/sec Loss 9.2208 LearningRate 0.0452 Epoch: 6 Global Step: 272040 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:18:08,694-Speed 2611.47 samples/sec Loss 9.0638 LearningRate 0.0452 Epoch: 6 Global Step: 272050 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:18:12,590-Speed 2629.44 samples/sec Loss 9.0947 LearningRate 0.0452 Epoch: 6 Global Step: 272060 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:18:16,502-Speed 2618.66 samples/sec Loss 9.1800 LearningRate 0.0452 Epoch: 6 Global Step: 272070 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:18:20,547-Speed 2532.32 samples/sec Loss 9.1584 LearningRate 0.0452 Epoch: 6 Global Step: 272080 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:18:24,451-Speed 2623.28 samples/sec Loss 9.1210 LearningRate 0.0452 Epoch: 6 Global Step: 272090 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:18:28,350-Speed 2626.90 samples/sec Loss 9.1383 LearningRate 0.0452 Epoch: 6 Global Step: 272100 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:18:32,272-Speed 2611.84 samples/sec Loss 9.1006 LearningRate 0.0452 Epoch: 6 Global Step: 272110 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:18:36,234-Speed 2584.97 samples/sec Loss 9.1667 LearningRate 0.0452 Epoch: 6 Global Step: 272120 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:18:40,127-Speed 2630.49 samples/sec Loss 9.0688 LearningRate 0.0452 Epoch: 6 Global Step: 272130 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:18:44,070-Speed 2598.08 samples/sec Loss 9.1572 LearningRate 0.0452 Epoch: 6 Global Step: 272140 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:18:48,115-Speed 2532.11 samples/sec Loss 8.8733 LearningRate 0.0452 Epoch: 6 Global Step: 272150 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:18:52,005-Speed 2632.81 samples/sec Loss 9.0902 LearningRate 0.0451 Epoch: 6 Global Step: 272160 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:18:55,903-Speed 2628.14 samples/sec Loss 9.1426 LearningRate 0.0451 Epoch: 6 Global Step: 272170 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:18:59,801-Speed 2627.49 samples/sec Loss 9.1896 LearningRate 0.0451 Epoch: 6 Global Step: 272180 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:19:03,699-Speed 2627.34 samples/sec Loss 9.0063 LearningRate 0.0451 Epoch: 6 Global Step: 272190 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:19:07,609-Speed 2619.51 samples/sec Loss 9.0152 LearningRate 0.0451 Epoch: 6 Global Step: 272200 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:19:11,507-Speed 2627.25 samples/sec Loss 9.1573 LearningRate 0.0451 Epoch: 6 Global Step: 272210 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:19:15,409-Speed 2625.04 samples/sec Loss 9.1389 LearningRate 0.0451 Epoch: 6 Global Step: 272220 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:19:19,303-Speed 2630.98 samples/sec Loss 8.9970 LearningRate 0.0451 Epoch: 6 Global Step: 272230 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:19:23,194-Speed 2632.16 samples/sec Loss 8.9117 LearningRate 0.0451 Epoch: 6 Global Step: 272240 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:19:27,090-Speed 2628.69 samples/sec Loss 9.0752 LearningRate 0.0451 Epoch: 6 Global Step: 272250 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:19:30,985-Speed 2629.84 samples/sec Loss 9.0162 LearningRate 0.0451 Epoch: 6 Global Step: 272260 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:19:34,878-Speed 2630.53 samples/sec Loss 9.1673 LearningRate 0.0451 Epoch: 6 Global Step: 272270 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:19:38,772-Speed 2630.34 samples/sec Loss 9.1945 LearningRate 0.0451 Epoch: 6 Global Step: 272280 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:19:42,662-Speed 2633.04 samples/sec Loss 9.0748 LearningRate 0.0451 Epoch: 6 Global Step: 272290 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:19:46,557-Speed 2629.81 samples/sec Loss 9.0188 LearningRate 0.0451 Epoch: 6 Global Step: 272300 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:19:50,452-Speed 2629.48 samples/sec Loss 9.0961 LearningRate 0.0451 Epoch: 6 Global Step: 272310 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:19:54,352-Speed 2626.19 samples/sec Loss 9.1485 LearningRate 0.0451 Epoch: 6 Global Step: 272320 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:19:58,249-Speed 2628.32 samples/sec Loss 9.1161 LearningRate 0.0451 Epoch: 6 Global Step: 272330 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:20:02,147-Speed 2627.38 samples/sec Loss 9.1769 LearningRate 0.0451 Epoch: 6 Global Step: 272340 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:20:06,043-Speed 2629.19 samples/sec Loss 9.0535 LearningRate 0.0451 Epoch: 6 Global Step: 272350 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:20:09,913-Speed 2646.28 samples/sec Loss 9.1662 LearningRate 0.0451 Epoch: 6 Global Step: 272360 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:20:13,816-Speed 2624.41 samples/sec Loss 8.9721 LearningRate 0.0451 Epoch: 6 Global Step: 272370 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:20:17,726-Speed 2619.66 samples/sec Loss 9.0023 LearningRate 0.0451 Epoch: 6 Global Step: 272380 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:20:21,625-Speed 2626.91 samples/sec Loss 9.0944 LearningRate 0.0451 Epoch: 6 Global Step: 272390 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:20:25,520-Speed 2629.42 samples/sec Loss 9.0521 LearningRate 0.0451 Epoch: 6 Global Step: 272400 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:20:29,419-Speed 2626.93 samples/sec Loss 8.9872 LearningRate 0.0451 Epoch: 6 Global Step: 272410 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:20:33,316-Speed 2628.06 samples/sec Loss 9.2541 LearningRate 0.0451 Epoch: 6 Global Step: 272420 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:20:37,222-Speed 2622.61 samples/sec Loss 9.1797 LearningRate 0.0451 Epoch: 6 Global Step: 272430 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:20:41,113-Speed 2631.75 samples/sec Loss 9.0082 LearningRate 0.0451 Epoch: 6 Global Step: 272440 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:20:45,129-Speed 2550.25 samples/sec Loss 9.0742 LearningRate 0.0451 Epoch: 6 Global Step: 272450 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:20:49,198-Speed 2517.62 samples/sec Loss 8.9884 LearningRate 0.0451 Epoch: 6 Global Step: 272460 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:20:53,210-Speed 2552.98 samples/sec Loss 9.0929 LearningRate 0.0451 Epoch: 6 Global Step: 272470 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:20:57,103-Speed 2630.98 samples/sec Loss 9.0814 LearningRate 0.0451 Epoch: 6 Global Step: 272480 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:21:00,997-Speed 2630.25 samples/sec Loss 9.1110 LearningRate 0.0451 Epoch: 6 Global Step: 272490 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:21:04,894-Speed 2628.14 samples/sec Loss 9.0227 LearningRate 0.0451 Epoch: 6 Global Step: 272500 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:21:08,790-Speed 2629.09 samples/sec Loss 9.0382 LearningRate 0.0451 Epoch: 6 Global Step: 272510 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:21:12,685-Speed 2629.63 samples/sec Loss 9.0675 LearningRate 0.0451 Epoch: 6 Global Step: 272520 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:21:16,594-Speed 2619.92 samples/sec Loss 8.9888 LearningRate 0.0451 Epoch: 6 Global Step: 272530 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:21:20,511-Speed 2614.83 samples/sec Loss 9.0601 LearningRate 0.0451 Epoch: 6 Global Step: 272540 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:21:24,429-Speed 2614.66 samples/sec Loss 8.9756 LearningRate 0.0451 Epoch: 6 Global Step: 272550 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:21:28,320-Speed 2632.12 samples/sec Loss 9.0741 LearningRate 0.0451 Epoch: 6 Global Step: 272560 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:21:32,253-Speed 2604.56 samples/sec Loss 9.2027 LearningRate 0.0451 Epoch: 6 Global Step: 272570 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:21:36,154-Speed 2625.00 samples/sec Loss 9.0464 LearningRate 0.0451 Epoch: 6 Global Step: 272580 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:21:40,063-Speed 2620.54 samples/sec Loss 9.0050 LearningRate 0.0451 Epoch: 6 Global Step: 272590 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:21:43,982-Speed 2613.36 samples/sec Loss 9.1593 LearningRate 0.0451 Epoch: 6 Global Step: 272600 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:21:47,877-Speed 2629.83 samples/sec Loss 8.9886 LearningRate 0.0451 Epoch: 6 Global Step: 272610 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:21:51,752-Speed 2643.02 samples/sec Loss 9.2335 LearningRate 0.0451 Epoch: 6 Global Step: 272620 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:21:55,646-Speed 2630.12 samples/sec Loss 9.0525 LearningRate 0.0451 Epoch: 6 Global Step: 272630 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:21:59,540-Speed 2630.51 samples/sec Loss 8.9383 LearningRate 0.0451 Epoch: 6 Global Step: 272640 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:22:03,436-Speed 2629.09 samples/sec Loss 9.0615 LearningRate 0.0451 Epoch: 6 Global Step: 272650 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:22:07,329-Speed 2630.65 samples/sec Loss 9.2300 LearningRate 0.0451 Epoch: 6 Global Step: 272660 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:22:11,222-Speed 2630.92 samples/sec Loss 8.9946 LearningRate 0.0451 Epoch: 6 Global Step: 272670 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:22:15,115-Speed 2631.13 samples/sec Loss 9.1087 LearningRate 0.0451 Epoch: 6 Global Step: 272680 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:22:19,012-Speed 2628.38 samples/sec Loss 9.1115 LearningRate 0.0451 Epoch: 6 Global Step: 272690 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:22:22,908-Speed 2628.91 samples/sec Loss 9.1837 LearningRate 0.0451 Epoch: 6 Global Step: 272700 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:22:26,801-Speed 2630.49 samples/sec Loss 9.1801 LearningRate 0.0451 Epoch: 6 Global Step: 272710 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:22:30,701-Speed 2626.33 samples/sec Loss 9.1115 LearningRate 0.0451 Epoch: 6 Global Step: 272720 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:22:34,594-Speed 2631.36 samples/sec Loss 8.9063 LearningRate 0.0451 Epoch: 6 Global Step: 272730 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:22:38,497-Speed 2624.15 samples/sec Loss 9.1312 LearningRate 0.0451 Epoch: 6 Global Step: 272740 Fp16 Grad Scale: 262144 Required: 63 hours
Training: 2022-04-14 02:22:42,367-Speed 2646.53 samples/sec Loss 9.0609 LearningRate 0.0451 Epoch: 6 Global Step: 272750 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:22:46,267-Speed 2626.18 samples/sec Loss 9.0327 LearningRate 0.0451 Epoch: 6 Global Step: 272760 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:22:50,170-Speed 2624.48 samples/sec Loss 8.9783 LearningRate 0.0451 Epoch: 6 Global Step: 272770 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:22:54,064-Speed 2630.35 samples/sec Loss 9.0692 LearningRate 0.0450 Epoch: 6 Global Step: 272780 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:22:57,958-Speed 2630.38 samples/sec Loss 9.1075 LearningRate 0.0450 Epoch: 6 Global Step: 272790 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:23:01,863-Speed 2622.66 samples/sec Loss 9.1074 LearningRate 0.0450 Epoch: 6 Global Step: 272800 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:23:05,755-Speed 2631.19 samples/sec Loss 9.0316 LearningRate 0.0450 Epoch: 6 Global Step: 272810 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:23:09,649-Speed 2630.82 samples/sec Loss 9.0110 LearningRate 0.0450 Epoch: 6 Global Step: 272820 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:23:13,531-Speed 2638.20 samples/sec Loss 9.0142 LearningRate 0.0450 Epoch: 6 Global Step: 272830 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:23:17,426-Speed 2629.29 samples/sec Loss 9.2166 LearningRate 0.0450 Epoch: 6 Global Step: 272840 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:23:21,323-Speed 2628.52 samples/sec Loss 9.0436 LearningRate 0.0450 Epoch: 6 Global Step: 272850 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:23:25,217-Speed 2630.00 samples/sec Loss 9.0564 LearningRate 0.0450 Epoch: 6 Global Step: 272860 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:23:29,117-Speed 2626.64 samples/sec Loss 9.2659 LearningRate 0.0450 Epoch: 6 Global Step: 272870 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:23:33,008-Speed 2631.93 samples/sec Loss 9.0812 LearningRate 0.0450 Epoch: 6 Global Step: 272880 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:23:36,901-Speed 2631.05 samples/sec Loss 8.9512 LearningRate 0.0450 Epoch: 6 Global Step: 272890 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:23:40,794-Speed 2631.19 samples/sec Loss 9.0290 LearningRate 0.0450 Epoch: 6 Global Step: 272900 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:23:44,686-Speed 2631.32 samples/sec Loss 9.0601 LearningRate 0.0450 Epoch: 6 Global Step: 272910 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:23:48,593-Speed 2621.40 samples/sec Loss 9.0539 LearningRate 0.0450 Epoch: 6 Global Step: 272920 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:23:52,507-Speed 2617.33 samples/sec Loss 8.9983 LearningRate 0.0450 Epoch: 6 Global Step: 272930 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:23:56,417-Speed 2619.27 samples/sec Loss 9.0926 LearningRate 0.0450 Epoch: 6 Global Step: 272940 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:24:00,318-Speed 2625.58 samples/sec Loss 9.0170 LearningRate 0.0450 Epoch: 6 Global Step: 272950 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:24:04,216-Speed 2627.44 samples/sec Loss 9.0911 LearningRate 0.0450 Epoch: 6 Global Step: 272960 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:24:08,114-Speed 2628.10 samples/sec Loss 8.9428 LearningRate 0.0450 Epoch: 6 Global Step: 272970 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:24:12,011-Speed 2627.95 samples/sec Loss 9.1126 LearningRate 0.0450 Epoch: 6 Global Step: 272980 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:24:15,907-Speed 2628.79 samples/sec Loss 9.0505 LearningRate 0.0450 Epoch: 6 Global Step: 272990 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:24:19,803-Speed 2629.10 samples/sec Loss 9.0296 LearningRate 0.0450 Epoch: 6 Global Step: 273000 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:24:23,697-Speed 2630.08 samples/sec Loss 9.0091 LearningRate 0.0450 Epoch: 6 Global Step: 273010 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:24:27,589-Speed 2631.83 samples/sec Loss 9.0784 LearningRate 0.0450 Epoch: 6 Global Step: 273020 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:24:31,485-Speed 2628.66 samples/sec Loss 9.0220 LearningRate 0.0450 Epoch: 6 Global Step: 273030 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:24:35,382-Speed 2628.51 samples/sec Loss 9.2219 LearningRate 0.0450 Epoch: 6 Global Step: 273040 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:24:39,274-Speed 2631.53 samples/sec Loss 9.0816 LearningRate 0.0450 Epoch: 6 Global Step: 273050 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:24:43,168-Speed 2630.63 samples/sec Loss 9.2257 LearningRate 0.0450 Epoch: 6 Global Step: 273060 Fp16 Grad Scale: 131072 Required: 63 hours
Training: 2022-04-14 02:24:46,998-Speed 2674.12 samples/sec Loss 9.5799 LearningRate 0.0450 Epoch: 6 Global Step: 273070 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:24:50,891-Speed 2630.79 samples/sec Loss 9.0185 LearningRate 0.0450 Epoch: 6 Global Step: 273080 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:24:54,796-Speed 2622.75 samples/sec Loss 9.2515 LearningRate 0.0450 Epoch: 6 Global Step: 273090 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:24:58,689-Speed 2631.10 samples/sec Loss 9.1252 LearningRate 0.0450 Epoch: 6 Global Step: 273100 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:25:02,585-Speed 2629.14 samples/sec Loss 9.1247 LearningRate 0.0450 Epoch: 6 Global Step: 273110 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:25:06,489-Speed 2623.50 samples/sec Loss 9.1615 LearningRate 0.0450 Epoch: 6 Global Step: 273120 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:25:10,393-Speed 2623.06 samples/sec Loss 8.9382 LearningRate 0.0450 Epoch: 6 Global Step: 273130 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:25:14,318-Speed 2610.09 samples/sec Loss 8.9621 LearningRate 0.0450 Epoch: 6 Global Step: 273140 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:25:18,211-Speed 2630.76 samples/sec Loss 9.1391 LearningRate 0.0450 Epoch: 6 Global Step: 273150 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:25:22,104-Speed 2631.38 samples/sec Loss 9.1207 LearningRate 0.0450 Epoch: 6 Global Step: 273160 Fp16 Grad Scale: 16384 Required: 63 hours
Training: 2022-04-14 02:25:26,000-Speed 2629.05 samples/sec Loss 9.0570 LearningRate 0.0450 Epoch: 6 Global Step: 273170 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:25:29,893-Speed 2630.88 samples/sec Loss 9.0310 LearningRate 0.0450 Epoch: 6 Global Step: 273180 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:25:33,785-Speed 2631.57 samples/sec Loss 9.1909 LearningRate 0.0450 Epoch: 6 Global Step: 273190 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:25:37,676-Speed 2632.40 samples/sec Loss 9.1139 LearningRate 0.0450 Epoch: 6 Global Step: 273200 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:25:41,572-Speed 2628.32 samples/sec Loss 9.1304 LearningRate 0.0450 Epoch: 6 Global Step: 273210 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:25:45,467-Speed 2629.65 samples/sec Loss 8.9547 LearningRate 0.0450 Epoch: 6 Global Step: 273220 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:25:49,362-Speed 2629.93 samples/sec Loss 9.0597 LearningRate 0.0450 Epoch: 6 Global Step: 273230 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:25:53,260-Speed 2627.05 samples/sec Loss 9.0596 LearningRate 0.0450 Epoch: 6 Global Step: 273240 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:25:57,152-Speed 2632.09 samples/sec Loss 9.1033 LearningRate 0.0450 Epoch: 6 Global Step: 273250 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:26:01,045-Speed 2631.36 samples/sec Loss 9.0466 LearningRate 0.0450 Epoch: 6 Global Step: 273260 Fp16 Grad Scale: 32768 Required: 63 hours
Training: 2022-04-14 02:26:04,942-Speed 2628.32 samples/sec Loss 9.0866 LearningRate 0.0450 Epoch: 6 Global Step: 273270 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:26:08,836-Speed 2630.03 samples/sec Loss 9.0503 LearningRate 0.0450 Epoch: 6 Global Step: 273280 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:26:12,732-Speed 2628.57 samples/sec Loss 9.0056 LearningRate 0.0450 Epoch: 6 Global Step: 273290 Fp16 Grad Scale: 65536 Required: 63 hours
Training: 2022-04-14 02:26:16,622-Speed 2633.07 samples/sec Loss 9.0413 LearningRate 0.0450 Epoch: 6 Global Step: 273300 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:26:20,516-Speed 2630.21 samples/sec Loss 9.0370 LearningRate 0.0450 Epoch: 6 Global Step: 273310 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:26:24,423-Speed 2621.78 samples/sec Loss 9.1355 LearningRate 0.0450 Epoch: 6 Global Step: 273320 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:26:28,323-Speed 2625.94 samples/sec Loss 9.1432 LearningRate 0.0450 Epoch: 6 Global Step: 273330 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:26:32,215-Speed 2632.09 samples/sec Loss 9.1759 LearningRate 0.0450 Epoch: 6 Global Step: 273340 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:26:36,114-Speed 2626.98 samples/sec Loss 9.1769 LearningRate 0.0450 Epoch: 6 Global Step: 273350 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:26:40,008-Speed 2630.14 samples/sec Loss 9.0470 LearningRate 0.0450 Epoch: 6 Global Step: 273360 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:26:43,903-Speed 2629.51 samples/sec Loss 8.9972 LearningRate 0.0450 Epoch: 6 Global Step: 273370 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:26:47,800-Speed 2628.37 samples/sec Loss 9.0702 LearningRate 0.0450 Epoch: 6 Global Step: 273380 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:26:51,693-Speed 2631.23 samples/sec Loss 9.2020 LearningRate 0.0450 Epoch: 6 Global Step: 273390 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:26:55,588-Speed 2629.68 samples/sec Loss 9.0192 LearningRate 0.0449 Epoch: 6 Global Step: 273400 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:26:59,480-Speed 2631.18 samples/sec Loss 9.0765 LearningRate 0.0449 Epoch: 6 Global Step: 273410 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:27:03,372-Speed 2631.57 samples/sec Loss 9.0369 LearningRate 0.0449 Epoch: 6 Global Step: 273420 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:27:07,266-Speed 2630.59 samples/sec Loss 9.0294 LearningRate 0.0449 Epoch: 6 Global Step: 273430 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:27:11,168-Speed 2624.87 samples/sec Loss 9.1363 LearningRate 0.0449 Epoch: 6 Global Step: 273440 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:27:15,072-Speed 2623.30 samples/sec Loss 9.0556 LearningRate 0.0449 Epoch: 6 Global Step: 273450 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:27:18,977-Speed 2623.08 samples/sec Loss 9.0369 LearningRate 0.0449 Epoch: 6 Global Step: 273460 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:27:22,878-Speed 2625.83 samples/sec Loss 9.0781 LearningRate 0.0449 Epoch: 6 Global Step: 273470 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:27:26,771-Speed 2630.59 samples/sec Loss 9.2498 LearningRate 0.0449 Epoch: 6 Global Step: 273480 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:27:30,663-Speed 2631.57 samples/sec Loss 9.3477 LearningRate 0.0449 Epoch: 6 Global Step: 273490 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:27:34,559-Speed 2629.29 samples/sec Loss 9.0495 LearningRate 0.0449 Epoch: 6 Global Step: 273500 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:27:38,459-Speed 2626.01 samples/sec Loss 9.1121 LearningRate 0.0449 Epoch: 6 Global Step: 273510 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:27:42,341-Speed 2638.36 samples/sec Loss 9.1372 LearningRate 0.0449 Epoch: 6 Global Step: 273520 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:27:46,243-Speed 2624.47 samples/sec Loss 8.9546 LearningRate 0.0449 Epoch: 6 Global Step: 273530 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:27:50,153-Speed 2619.89 samples/sec Loss 9.0067 LearningRate 0.0449 Epoch: 6 Global Step: 273540 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:27:54,049-Speed 2629.20 samples/sec Loss 9.0878 LearningRate 0.0449 Epoch: 6 Global Step: 273550 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:27:57,956-Speed 2621.41 samples/sec Loss 9.0045 LearningRate 0.0449 Epoch: 6 Global Step: 273560 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:28:01,850-Speed 2630.54 samples/sec Loss 9.0865 LearningRate 0.0449 Epoch: 6 Global Step: 273570 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:28:05,744-Speed 2630.13 samples/sec Loss 8.9950 LearningRate 0.0449 Epoch: 6 Global Step: 273580 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:28:09,640-Speed 2628.94 samples/sec Loss 9.0696 LearningRate 0.0449 Epoch: 6 Global Step: 273590 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:28:13,546-Speed 2621.80 samples/sec Loss 9.1241 LearningRate 0.0449 Epoch: 6 Global Step: 273600 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:28:17,445-Speed 2626.77 samples/sec Loss 9.0012 LearningRate 0.0449 Epoch: 6 Global Step: 273610 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:28:21,338-Speed 2631.59 samples/sec Loss 9.0782 LearningRate 0.0449 Epoch: 6 Global Step: 273620 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:28:25,260-Speed 2611.64 samples/sec Loss 9.0597 LearningRate 0.0449 Epoch: 6 Global Step: 273630 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:28:29,161-Speed 2625.74 samples/sec Loss 9.0552 LearningRate 0.0449 Epoch: 6 Global Step: 273640 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:28:33,052-Speed 2632.37 samples/sec Loss 8.9383 LearningRate 0.0449 Epoch: 6 Global Step: 273650 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:28:36,928-Speed 2642.17 samples/sec Loss 9.1006 LearningRate 0.0449 Epoch: 6 Global Step: 273660 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:28:40,822-Speed 2630.45 samples/sec Loss 9.0059 LearningRate 0.0449 Epoch: 6 Global Step: 273670 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:28:44,718-Speed 2628.58 samples/sec Loss 9.0531 LearningRate 0.0449 Epoch: 6 Global Step: 273680 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:28:48,613-Speed 2629.97 samples/sec Loss 9.0196 LearningRate 0.0449 Epoch: 6 Global Step: 273690 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:28:52,506-Speed 2630.78 samples/sec Loss 9.0164 LearningRate 0.0449 Epoch: 6 Global Step: 273700 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:28:56,400-Speed 2629.95 samples/sec Loss 9.0793 LearningRate 0.0449 Epoch: 6 Global Step: 273710 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:29:00,293-Speed 2631.62 samples/sec Loss 9.1050 LearningRate 0.0449 Epoch: 6 Global Step: 273720 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:29:04,188-Speed 2629.49 samples/sec Loss 9.0998 LearningRate 0.0449 Epoch: 6 Global Step: 273730 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:29:08,065-Speed 2641.89 samples/sec Loss 8.9995 LearningRate 0.0449 Epoch: 6 Global Step: 273740 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:29:11,960-Speed 2629.62 samples/sec Loss 9.1166 LearningRate 0.0449 Epoch: 6 Global Step: 273750 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:29:15,851-Speed 2632.50 samples/sec Loss 9.0992 LearningRate 0.0449 Epoch: 6 Global Step: 273760 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:29:19,754-Speed 2623.61 samples/sec Loss 8.8886 LearningRate 0.0449 Epoch: 6 Global Step: 273770 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:29:23,652-Speed 2627.59 samples/sec Loss 9.0703 LearningRate 0.0449 Epoch: 6 Global Step: 273780 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:29:27,550-Speed 2627.61 samples/sec Loss 9.0335 LearningRate 0.0449 Epoch: 6 Global Step: 273790 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:29:31,455-Speed 2622.98 samples/sec Loss 9.0189 LearningRate 0.0449 Epoch: 6 Global Step: 273800 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:29:35,347-Speed 2631.72 samples/sec Loss 8.9884 LearningRate 0.0449 Epoch: 6 Global Step: 273810 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:29:39,243-Speed 2629.24 samples/sec Loss 9.1956 LearningRate 0.0449 Epoch: 6 Global Step: 273820 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:29:43,137-Speed 2630.00 samples/sec Loss 9.0691 LearningRate 0.0449 Epoch: 6 Global Step: 273830 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:29:47,043-Speed 2621.83 samples/sec Loss 8.9649 LearningRate 0.0449 Epoch: 6 Global Step: 273840 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:29:50,939-Speed 2628.98 samples/sec Loss 8.9655 LearningRate 0.0449 Epoch: 6 Global Step: 273850 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:29:54,834-Speed 2629.54 samples/sec Loss 9.0924 LearningRate 0.0449 Epoch: 6 Global Step: 273860 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:29:58,726-Speed 2631.77 samples/sec Loss 9.1389 LearningRate 0.0449 Epoch: 6 Global Step: 273870 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:30:02,618-Speed 2632.04 samples/sec Loss 9.0764 LearningRate 0.0449 Epoch: 6 Global Step: 273880 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:30:06,533-Speed 2615.84 samples/sec Loss 9.0577 LearningRate 0.0449 Epoch: 6 Global Step: 273890 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:30:10,391-Speed 2654.74 samples/sec Loss 9.1627 LearningRate 0.0449 Epoch: 6 Global Step: 273900 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:30:14,283-Speed 2631.55 samples/sec Loss 9.2238 LearningRate 0.0449 Epoch: 6 Global Step: 273910 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:30:18,177-Speed 2630.25 samples/sec Loss 9.0911 LearningRate 0.0449 Epoch: 6 Global Step: 273920 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:30:22,070-Speed 2631.67 samples/sec Loss 9.1113 LearningRate 0.0449 Epoch: 6 Global Step: 273930 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:30:25,960-Speed 2632.61 samples/sec Loss 9.1411 LearningRate 0.0449 Epoch: 6 Global Step: 273940 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:30:29,855-Speed 2629.74 samples/sec Loss 9.3182 LearningRate 0.0449 Epoch: 6 Global Step: 273950 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:30:33,747-Speed 2631.96 samples/sec Loss 9.1127 LearningRate 0.0449 Epoch: 6 Global Step: 273960 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:30:37,644-Speed 2628.40 samples/sec Loss 9.1040 LearningRate 0.0449 Epoch: 6 Global Step: 273970 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:30:41,576-Speed 2604.34 samples/sec Loss 9.0840 LearningRate 0.0449 Epoch: 6 Global Step: 273980 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:30:45,490-Speed 2620.57 samples/sec Loss 9.0764 LearningRate 0.0449 Epoch: 6 Global Step: 273990 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:30:49,384-Speed 2630.17 samples/sec Loss 9.1094 LearningRate 0.0449 Epoch: 6 Global Step: 274000 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:30:53,325-Speed 2599.74 samples/sec Loss 9.0822 LearningRate 0.0448 Epoch: 6 Global Step: 274010 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:30:57,221-Speed 2629.18 samples/sec Loss 9.0481 LearningRate 0.0448 Epoch: 6 Global Step: 274020 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:31:01,112-Speed 2632.07 samples/sec Loss 9.0356 LearningRate 0.0448 Epoch: 6 Global Step: 274030 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:31:05,023-Speed 2619.41 samples/sec Loss 9.0993 LearningRate 0.0448 Epoch: 6 Global Step: 274040 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:31:08,932-Speed 2620.15 samples/sec Loss 9.2268 LearningRate 0.0448 Epoch: 6 Global Step: 274050 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:31:12,841-Speed 2619.98 samples/sec Loss 9.0539 LearningRate 0.0448 Epoch: 6 Global Step: 274060 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:31:16,737-Speed 2629.12 samples/sec Loss 9.1194 LearningRate 0.0448 Epoch: 6 Global Step: 274070 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:31:20,634-Speed 2628.60 samples/sec Loss 8.8718 LearningRate 0.0448 Epoch: 6 Global Step: 274080 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:31:24,541-Speed 2621.38 samples/sec Loss 9.2029 LearningRate 0.0448 Epoch: 6 Global Step: 274090 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:31:28,439-Speed 2628.49 samples/sec Loss 8.9262 LearningRate 0.0448 Epoch: 6 Global Step: 274100 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:31:32,345-Speed 2622.05 samples/sec Loss 9.1334 LearningRate 0.0448 Epoch: 6 Global Step: 274110 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:31:36,232-Speed 2634.68 samples/sec Loss 9.1357 LearningRate 0.0448 Epoch: 6 Global Step: 274120 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:31:40,124-Speed 2632.02 samples/sec Loss 9.1065 LearningRate 0.0448 Epoch: 6 Global Step: 274130 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:31:44,022-Speed 2627.56 samples/sec Loss 9.0799 LearningRate 0.0448 Epoch: 6 Global Step: 274140 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:31:47,921-Speed 2627.19 samples/sec Loss 8.9372 LearningRate 0.0448 Epoch: 6 Global Step: 274150 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:31:51,815-Speed 2630.13 samples/sec Loss 9.1956 LearningRate 0.0448 Epoch: 6 Global Step: 274160 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:31:55,712-Speed 2628.54 samples/sec Loss 9.0864 LearningRate 0.0448 Epoch: 6 Global Step: 274170 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:31:59,606-Speed 2630.37 samples/sec Loss 9.0944 LearningRate 0.0448 Epoch: 6 Global Step: 274180 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:32:03,499-Speed 2630.99 samples/sec Loss 8.9958 LearningRate 0.0448 Epoch: 6 Global Step: 274190 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:32:07,391-Speed 2631.40 samples/sec Loss 9.1259 LearningRate 0.0448 Epoch: 6 Global Step: 274200 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:32:11,284-Speed 2631.15 samples/sec Loss 9.0404 LearningRate 0.0448 Epoch: 6 Global Step: 274210 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:32:15,188-Speed 2623.64 samples/sec Loss 8.9970 LearningRate 0.0448 Epoch: 6 Global Step: 274220 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:32:19,087-Speed 2626.75 samples/sec Loss 8.9732 LearningRate 0.0448 Epoch: 6 Global Step: 274230 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:32:22,980-Speed 2630.82 samples/sec Loss 9.0264 LearningRate 0.0448 Epoch: 6 Global Step: 274240 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:32:26,873-Speed 2631.09 samples/sec Loss 9.0787 LearningRate 0.0448 Epoch: 6 Global Step: 274250 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:32:30,767-Speed 2630.47 samples/sec Loss 9.0424 LearningRate 0.0448 Epoch: 6 Global Step: 274260 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:32:34,657-Speed 2634.80 samples/sec Loss 9.1284 LearningRate 0.0448 Epoch: 6 Global Step: 274270 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:32:38,552-Speed 2629.46 samples/sec Loss 9.0661 LearningRate 0.0448 Epoch: 6 Global Step: 274280 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:32:42,443-Speed 2632.13 samples/sec Loss 9.0221 LearningRate 0.0448 Epoch: 6 Global Step: 274290 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:32:46,336-Speed 2630.39 samples/sec Loss 9.1187 LearningRate 0.0448 Epoch: 6 Global Step: 274300 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:32:50,236-Speed 2626.91 samples/sec Loss 9.0285 LearningRate 0.0448 Epoch: 6 Global Step: 274310 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:32:54,135-Speed 2626.55 samples/sec Loss 9.1385 LearningRate 0.0448 Epoch: 6 Global Step: 274320 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:32:58,118-Speed 2571.93 samples/sec Loss 9.0175 LearningRate 0.0448 Epoch: 6 Global Step: 274330 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:33:02,014-Speed 2629.52 samples/sec Loss 9.1562 LearningRate 0.0448 Epoch: 6 Global Step: 274340 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:33:05,910-Speed 2628.94 samples/sec Loss 9.0310 LearningRate 0.0448 Epoch: 6 Global Step: 274350 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:33:09,824-Speed 2616.96 samples/sec Loss 8.9929 LearningRate 0.0448 Epoch: 6 Global Step: 274360 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:33:13,717-Speed 2630.81 samples/sec Loss 8.9792 LearningRate 0.0448 Epoch: 6 Global Step: 274370 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:33:17,613-Speed 2628.81 samples/sec Loss 9.0559 LearningRate 0.0448 Epoch: 6 Global Step: 274380 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:33:21,515-Speed 2625.70 samples/sec Loss 9.0287 LearningRate 0.0448 Epoch: 6 Global Step: 274390 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:33:25,414-Speed 2627.44 samples/sec Loss 9.1096 LearningRate 0.0448 Epoch: 6 Global Step: 274400 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:33:29,329-Speed 2615.95 samples/sec Loss 8.9958 LearningRate 0.0448 Epoch: 6 Global Step: 274410 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:33:33,211-Speed 2638.54 samples/sec Loss 9.0789 LearningRate 0.0448 Epoch: 6 Global Step: 274420 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:33:37,110-Speed 2626.89 samples/sec Loss 9.0951 LearningRate 0.0448 Epoch: 6 Global Step: 274430 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:33:41,103-Speed 2565.45 samples/sec Loss 9.1546 LearningRate 0.0448 Epoch: 6 Global Step: 274440 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:33:45,052-Speed 2593.24 samples/sec Loss 9.1426 LearningRate 0.0448 Epoch: 6 Global Step: 274450 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:33:48,983-Speed 2606.36 samples/sec Loss 9.0890 LearningRate 0.0448 Epoch: 6 Global Step: 274460 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:33:52,879-Speed 2628.61 samples/sec Loss 9.0332 LearningRate 0.0448 Epoch: 6 Global Step: 274470 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:33:56,811-Speed 2605.93 samples/sec Loss 9.1470 LearningRate 0.0448 Epoch: 6 Global Step: 274480 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:34:00,743-Speed 2604.61 samples/sec Loss 9.1715 LearningRate 0.0448 Epoch: 6 Global Step: 274490 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:34:04,641-Speed 2627.53 samples/sec Loss 9.1017 LearningRate 0.0448 Epoch: 6 Global Step: 274500 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:34:08,532-Speed 2632.63 samples/sec Loss 9.0686 LearningRate 0.0448 Epoch: 6 Global Step: 274510 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:34:12,425-Speed 2631.13 samples/sec Loss 9.0506 LearningRate 0.0448 Epoch: 6 Global Step: 274520 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:34:16,315-Speed 2632.94 samples/sec Loss 8.9238 LearningRate 0.0448 Epoch: 6 Global Step: 274530 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:34:20,216-Speed 2625.81 samples/sec Loss 8.9795 LearningRate 0.0448 Epoch: 6 Global Step: 274540 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:34:24,106-Speed 2632.33 samples/sec Loss 8.8285 LearningRate 0.0448 Epoch: 6 Global Step: 274550 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:34:27,998-Speed 2631.98 samples/sec Loss 8.9468 LearningRate 0.0448 Epoch: 6 Global Step: 274560 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:34:31,897-Speed 2631.26 samples/sec Loss 9.0005 LearningRate 0.0448 Epoch: 6 Global Step: 274570 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:34:35,804-Speed 2621.72 samples/sec Loss 9.0935 LearningRate 0.0448 Epoch: 6 Global Step: 274580 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:34:39,709-Speed 2623.05 samples/sec Loss 8.9895 LearningRate 0.0448 Epoch: 6 Global Step: 274590 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:34:43,606-Speed 2628.12 samples/sec Loss 9.0154 LearningRate 0.0448 Epoch: 6 Global Step: 274600 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:34:47,501-Speed 2630.05 samples/sec Loss 9.1054 LearningRate 0.0448 Epoch: 6 Global Step: 274610 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:34:51,401-Speed 2625.99 samples/sec Loss 9.1361 LearningRate 0.0448 Epoch: 6 Global Step: 274620 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:34:55,295-Speed 2630.44 samples/sec Loss 9.1187 LearningRate 0.0447 Epoch: 6 Global Step: 274630 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:34:59,193-Speed 2627.53 samples/sec Loss 9.1495 LearningRate 0.0447 Epoch: 6 Global Step: 274640 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:35:03,092-Speed 2627.09 samples/sec Loss 9.1373 LearningRate 0.0447 Epoch: 6 Global Step: 274650 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:35:06,998-Speed 2623.86 samples/sec Loss 9.0060 LearningRate 0.0447 Epoch: 6 Global Step: 274660 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:35:10,898-Speed 2627.01 samples/sec Loss 9.1457 LearningRate 0.0447 Epoch: 6 Global Step: 274670 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:35:14,795-Speed 2628.02 samples/sec Loss 8.9856 LearningRate 0.0447 Epoch: 6 Global Step: 274680 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:35:18,698-Speed 2624.50 samples/sec Loss 9.1154 LearningRate 0.0447 Epoch: 6 Global Step: 274690 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:35:22,595-Speed 2629.13 samples/sec Loss 9.1519 LearningRate 0.0447 Epoch: 6 Global Step: 274700 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:35:26,494-Speed 2626.48 samples/sec Loss 9.0739 LearningRate 0.0447 Epoch: 6 Global Step: 274710 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:35:30,394-Speed 2626.61 samples/sec Loss 9.1502 LearningRate 0.0447 Epoch: 6 Global Step: 274720 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:35:34,294-Speed 2626.54 samples/sec Loss 9.1135 LearningRate 0.0447 Epoch: 6 Global Step: 274730 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:35:38,236-Speed 2598.08 samples/sec Loss 9.0944 LearningRate 0.0447 Epoch: 6 Global Step: 274740 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:35:42,131-Speed 2629.87 samples/sec Loss 9.1323 LearningRate 0.0447 Epoch: 6 Global Step: 274750 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:35:46,022-Speed 2632.33 samples/sec Loss 9.1096 LearningRate 0.0447 Epoch: 6 Global Step: 274760 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:35:49,892-Speed 2646.52 samples/sec Loss 9.0398 LearningRate 0.0447 Epoch: 6 Global Step: 274770 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:35:53,798-Speed 2622.29 samples/sec Loss 8.8713 LearningRate 0.0447 Epoch: 6 Global Step: 274780 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:35:57,705-Speed 2621.67 samples/sec Loss 9.0592 LearningRate 0.0447 Epoch: 6 Global Step: 274790 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:36:01,587-Speed 2639.06 samples/sec Loss 9.0519 LearningRate 0.0447 Epoch: 6 Global Step: 274800 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:36:05,491-Speed 2623.28 samples/sec Loss 8.9809 LearningRate 0.0447 Epoch: 6 Global Step: 274810 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:36:09,408-Speed 2614.68 samples/sec Loss 9.0447 LearningRate 0.0447 Epoch: 6 Global Step: 274820 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:36:13,304-Speed 2629.07 samples/sec Loss 9.0727 LearningRate 0.0447 Epoch: 6 Global Step: 274830 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:36:17,220-Speed 2615.58 samples/sec Loss 9.1816 LearningRate 0.0447 Epoch: 6 Global Step: 274840 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:36:21,113-Speed 2631.43 samples/sec Loss 9.0411 LearningRate 0.0447 Epoch: 6 Global Step: 274850 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:36:25,004-Speed 2632.11 samples/sec Loss 8.9740 LearningRate 0.0447 Epoch: 6 Global Step: 274860 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:36:28,896-Speed 2631.73 samples/sec Loss 9.0570 LearningRate 0.0447 Epoch: 6 Global Step: 274870 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:36:32,792-Speed 2629.19 samples/sec Loss 9.0544 LearningRate 0.0447 Epoch: 6 Global Step: 274880 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:36:36,665-Speed 2644.48 samples/sec Loss 8.9348 LearningRate 0.0447 Epoch: 6 Global Step: 274890 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:36:40,534-Speed 2647.69 samples/sec Loss 9.4221 LearningRate 0.0447 Epoch: 6 Global Step: 274900 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:36:44,418-Speed 2636.79 samples/sec Loss 9.0815 LearningRate 0.0447 Epoch: 6 Global Step: 274910 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:36:48,309-Speed 2632.46 samples/sec Loss 9.1308 LearningRate 0.0447 Epoch: 6 Global Step: 274920 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:36:52,245-Speed 2602.84 samples/sec Loss 9.0556 LearningRate 0.0447 Epoch: 6 Global Step: 274930 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:36:56,143-Speed 2627.46 samples/sec Loss 9.1050 LearningRate 0.0447 Epoch: 6 Global Step: 274940 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:37:00,036-Speed 2631.08 samples/sec Loss 9.0902 LearningRate 0.0447 Epoch: 6 Global Step: 274950 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:37:03,930-Speed 2630.05 samples/sec Loss 9.0904 LearningRate 0.0447 Epoch: 6 Global Step: 274960 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:37:07,822-Speed 2631.74 samples/sec Loss 9.0725 LearningRate 0.0447 Epoch: 6 Global Step: 274970 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:37:11,718-Speed 2628.74 samples/sec Loss 9.0295 LearningRate 0.0447 Epoch: 6 Global Step: 274980 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:37:15,619-Speed 2625.92 samples/sec Loss 9.0278 LearningRate 0.0447 Epoch: 6 Global Step: 274990 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:37:19,514-Speed 2629.48 samples/sec Loss 9.0413 LearningRate 0.0447 Epoch: 6 Global Step: 275000 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:37:23,408-Speed 2630.49 samples/sec Loss 8.9908 LearningRate 0.0447 Epoch: 6 Global Step: 275010 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:37:27,307-Speed 2626.83 samples/sec Loss 9.1073 LearningRate 0.0447 Epoch: 6 Global Step: 275020 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:37:31,211-Speed 2623.83 samples/sec Loss 9.0880 LearningRate 0.0447 Epoch: 6 Global Step: 275030 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:37:35,105-Speed 2630.40 samples/sec Loss 9.0310 LearningRate 0.0447 Epoch: 6 Global Step: 275040 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:37:39,025-Speed 2612.69 samples/sec Loss 8.9886 LearningRate 0.0447 Epoch: 6 Global Step: 275050 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:37:42,990-Speed 2582.88 samples/sec Loss 9.1065 LearningRate 0.0447 Epoch: 6 Global Step: 275060 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:37:46,882-Speed 2632.43 samples/sec Loss 9.1028 LearningRate 0.0447 Epoch: 6 Global Step: 275070 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:37:50,776-Speed 2629.92 samples/sec Loss 8.9660 LearningRate 0.0447 Epoch: 6 Global Step: 275080 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:37:54,666-Speed 2633.46 samples/sec Loss 9.0888 LearningRate 0.0447 Epoch: 6 Global Step: 275090 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:37:58,557-Speed 2632.07 samples/sec Loss 9.0976 LearningRate 0.0447 Epoch: 6 Global Step: 275100 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:38:02,451-Speed 2630.88 samples/sec Loss 9.0992 LearningRate 0.0447 Epoch: 6 Global Step: 275110 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:38:06,358-Speed 2621.35 samples/sec Loss 9.0682 LearningRate 0.0447 Epoch: 6 Global Step: 275120 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:38:10,254-Speed 2628.78 samples/sec Loss 9.1132 LearningRate 0.0447 Epoch: 6 Global Step: 275130 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:38:14,141-Speed 2634.96 samples/sec Loss 9.1619 LearningRate 0.0447 Epoch: 6 Global Step: 275140 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:38:18,034-Speed 2631.02 samples/sec Loss 9.2207 LearningRate 0.0447 Epoch: 6 Global Step: 275150 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:38:21,925-Speed 2633.02 samples/sec Loss 9.0451 LearningRate 0.0447 Epoch: 6 Global Step: 275160 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:38:25,826-Speed 2625.42 samples/sec Loss 8.9939 LearningRate 0.0447 Epoch: 6 Global Step: 275170 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:38:29,726-Speed 2626.55 samples/sec Loss 9.0324 LearningRate 0.0447 Epoch: 6 Global Step: 275180 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:38:33,616-Speed 2632.94 samples/sec Loss 9.0204 LearningRate 0.0447 Epoch: 6 Global Step: 275190 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:38:37,514-Speed 2627.38 samples/sec Loss 9.0423 LearningRate 0.0447 Epoch: 6 Global Step: 275200 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:38:41,413-Speed 2627.51 samples/sec Loss 9.1613 LearningRate 0.0447 Epoch: 6 Global Step: 275210 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:38:45,307-Speed 2630.05 samples/sec Loss 9.2489 LearningRate 0.0447 Epoch: 6 Global Step: 275220 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:38:49,246-Speed 2600.47 samples/sec Loss 9.1214 LearningRate 0.0447 Epoch: 6 Global Step: 275230 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:38:53,148-Speed 2625.02 samples/sec Loss 8.9993 LearningRate 0.0447 Epoch: 6 Global Step: 275240 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:38:57,050-Speed 2625.18 samples/sec Loss 9.0327 LearningRate 0.0446 Epoch: 6 Global Step: 275250 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:39:00,965-Speed 2616.11 samples/sec Loss 9.0799 LearningRate 0.0446 Epoch: 6 Global Step: 275260 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:39:04,889-Speed 2610.37 samples/sec Loss 8.9443 LearningRate 0.0446 Epoch: 6 Global Step: 275270 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:39:08,779-Speed 2632.72 samples/sec Loss 9.1322 LearningRate 0.0446 Epoch: 6 Global Step: 275280 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:39:12,693-Speed 2617.19 samples/sec Loss 9.0330 LearningRate 0.0446 Epoch: 6 Global Step: 275290 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:39:16,594-Speed 2625.21 samples/sec Loss 9.0024 LearningRate 0.0446 Epoch: 6 Global Step: 275300 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:39:20,493-Speed 2627.28 samples/sec Loss 9.0026 LearningRate 0.0446 Epoch: 6 Global Step: 275310 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:39:24,393-Speed 2626.67 samples/sec Loss 9.1172 LearningRate 0.0446 Epoch: 6 Global Step: 275320 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:39:28,299-Speed 2621.70 samples/sec Loss 9.1167 LearningRate 0.0446 Epoch: 6 Global Step: 275330 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:39:32,198-Speed 2627.16 samples/sec Loss 9.0252 LearningRate 0.0446 Epoch: 6 Global Step: 275340 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:39:36,094-Speed 2629.02 samples/sec Loss 9.1314 LearningRate 0.0446 Epoch: 6 Global Step: 275350 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:39:39,986-Speed 2631.51 samples/sec Loss 9.1173 LearningRate 0.0446 Epoch: 6 Global Step: 275360 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:39:43,876-Speed 2633.01 samples/sec Loss 9.0741 LearningRate 0.0446 Epoch: 6 Global Step: 275370 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:39:47,766-Speed 2632.67 samples/sec Loss 8.9137 LearningRate 0.0446 Epoch: 6 Global Step: 275380 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:39:51,668-Speed 2625.05 samples/sec Loss 9.1454 LearningRate 0.0446 Epoch: 6 Global Step: 275390 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:39:55,565-Speed 2628.15 samples/sec Loss 9.1138 LearningRate 0.0446 Epoch: 6 Global Step: 275400 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:39:59,446-Speed 2639.75 samples/sec Loss 9.0892 LearningRate 0.0446 Epoch: 6 Global Step: 275410 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:40:03,339-Speed 2630.18 samples/sec Loss 9.1725 LearningRate 0.0446 Epoch: 6 Global Step: 275420 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:40:07,230-Speed 2633.01 samples/sec Loss 9.0755 LearningRate 0.0446 Epoch: 6 Global Step: 275430 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:40:11,117-Speed 2634.69 samples/sec Loss 8.9741 LearningRate 0.0446 Epoch: 6 Global Step: 275440 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:40:15,006-Speed 2633.79 samples/sec Loss 8.9143 LearningRate 0.0446 Epoch: 6 Global Step: 275450 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:40:18,897-Speed 2632.09 samples/sec Loss 9.0316 LearningRate 0.0446 Epoch: 6 Global Step: 275460 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:40:22,788-Speed 2632.70 samples/sec Loss 9.0079 LearningRate 0.0446 Epoch: 6 Global Step: 275470 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:40:26,676-Speed 2634.29 samples/sec Loss 8.9824 LearningRate 0.0446 Epoch: 6 Global Step: 275480 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:40:30,565-Speed 2633.31 samples/sec Loss 9.0314 LearningRate 0.0446 Epoch: 6 Global Step: 275490 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:40:34,461-Speed 2629.18 samples/sec Loss 9.1013 LearningRate 0.0446 Epoch: 6 Global Step: 275500 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:40:38,336-Speed 2642.62 samples/sec Loss 9.0099 LearningRate 0.0446 Epoch: 6 Global Step: 275510 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:40:42,237-Speed 2625.98 samples/sec Loss 8.8642 LearningRate 0.0446 Epoch: 6 Global Step: 275520 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:40:46,131-Speed 2630.37 samples/sec Loss 9.0474 LearningRate 0.0446 Epoch: 6 Global Step: 275530 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:40:50,025-Speed 2630.48 samples/sec Loss 9.0593 LearningRate 0.0446 Epoch: 6 Global Step: 275540 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:40:53,922-Speed 2634.38 samples/sec Loss 9.0398 LearningRate 0.0446 Epoch: 6 Global Step: 275550 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:40:57,831-Speed 2620.20 samples/sec Loss 9.0580 LearningRate 0.0446 Epoch: 6 Global Step: 275560 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:41:01,722-Speed 2631.78 samples/sec Loss 9.0585 LearningRate 0.0446 Epoch: 6 Global Step: 275570 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:41:05,619-Speed 2628.57 samples/sec Loss 8.9013 LearningRate 0.0446 Epoch: 6 Global Step: 275580 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:41:09,513-Speed 2630.02 samples/sec Loss 9.0767 LearningRate 0.0446 Epoch: 6 Global Step: 275590 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:41:13,406-Speed 2630.80 samples/sec Loss 8.7813 LearningRate 0.0446 Epoch: 6 Global Step: 275600 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:41:17,297-Speed 2632.75 samples/sec Loss 9.0610 LearningRate 0.0446 Epoch: 6 Global Step: 275610 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:41:21,190-Speed 2630.57 samples/sec Loss 8.8983 LearningRate 0.0446 Epoch: 6 Global Step: 275620 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:41:25,166-Speed 2575.80 samples/sec Loss 9.0037 LearningRate 0.0446 Epoch: 6 Global Step: 275630 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:41:29,052-Speed 2636.23 samples/sec Loss 9.0879 LearningRate 0.0446 Epoch: 6 Global Step: 275640 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:41:32,966-Speed 2616.75 samples/sec Loss 9.0606 LearningRate 0.0446 Epoch: 6 Global Step: 275650 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:41:36,876-Speed 2619.46 samples/sec Loss 8.9662 LearningRate 0.0446 Epoch: 6 Global Step: 275660 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:41:40,792-Speed 2615.26 samples/sec Loss 8.9947 LearningRate 0.0446 Epoch: 6 Global Step: 275670 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:41:44,703-Speed 2618.96 samples/sec Loss 9.1233 LearningRate 0.0446 Epoch: 6 Global Step: 275680 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:41:48,599-Speed 2629.16 samples/sec Loss 9.0191 LearningRate 0.0446 Epoch: 6 Global Step: 275690 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:41:52,516-Speed 2615.00 samples/sec Loss 8.9448 LearningRate 0.0446 Epoch: 6 Global Step: 275700 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:41:56,410-Speed 2630.04 samples/sec Loss 9.0298 LearningRate 0.0446 Epoch: 6 Global Step: 275710 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:42:00,303-Speed 2630.99 samples/sec Loss 8.8883 LearningRate 0.0446 Epoch: 6 Global Step: 275720 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:42:04,205-Speed 2625.06 samples/sec Loss 8.9870 LearningRate 0.0446 Epoch: 6 Global Step: 275730 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:42:08,096-Speed 2632.07 samples/sec Loss 8.9914 LearningRate 0.0446 Epoch: 6 Global Step: 275740 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:42:11,985-Speed 2633.66 samples/sec Loss 8.9403 LearningRate 0.0446 Epoch: 6 Global Step: 275750 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:42:15,874-Speed 2634.01 samples/sec Loss 9.0478 LearningRate 0.0446 Epoch: 6 Global Step: 275760 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:42:19,763-Speed 2633.29 samples/sec Loss 8.9611 LearningRate 0.0446 Epoch: 6 Global Step: 275770 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:42:23,654-Speed 2632.64 samples/sec Loss 9.0574 LearningRate 0.0446 Epoch: 6 Global Step: 275780 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:42:27,544-Speed 2632.75 samples/sec Loss 9.0086 LearningRate 0.0446 Epoch: 6 Global Step: 275790 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:42:31,434-Speed 2633.04 samples/sec Loss 9.1585 LearningRate 0.0446 Epoch: 6 Global Step: 275800 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:42:35,323-Speed 2633.24 samples/sec Loss 9.0164 LearningRate 0.0446 Epoch: 6 Global Step: 275810 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:42:39,218-Speed 2629.67 samples/sec Loss 8.9396 LearningRate 0.0446 Epoch: 6 Global Step: 275820 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:42:43,109-Speed 2632.07 samples/sec Loss 9.0792 LearningRate 0.0446 Epoch: 6 Global Step: 275830 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:42:47,008-Speed 2627.52 samples/sec Loss 8.9270 LearningRate 0.0446 Epoch: 6 Global Step: 275840 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:42:50,903-Speed 2629.28 samples/sec Loss 8.8109 LearningRate 0.0446 Epoch: 6 Global Step: 275850 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:42:54,808-Speed 2623.58 samples/sec Loss 8.9965 LearningRate 0.0446 Epoch: 6 Global Step: 275860 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:42:58,704-Speed 2628.97 samples/sec Loss 9.1074 LearningRate 0.0446 Epoch: 6 Global Step: 275870 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:43:02,698-Speed 2564.91 samples/sec Loss 9.2022 LearningRate 0.0445 Epoch: 6 Global Step: 275880 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:43:06,581-Speed 2637.29 samples/sec Loss 9.1152 LearningRate 0.0445 Epoch: 6 Global Step: 275890 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:43:10,479-Speed 2627.50 samples/sec Loss 9.1229 LearningRate 0.0445 Epoch: 6 Global Step: 275900 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:43:14,359-Speed 2639.80 samples/sec Loss 9.0744 LearningRate 0.0445 Epoch: 6 Global Step: 275910 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:43:18,249-Speed 2632.69 samples/sec Loss 9.0938 LearningRate 0.0445 Epoch: 6 Global Step: 275920 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:43:22,146-Speed 2628.38 samples/sec Loss 9.1223 LearningRate 0.0445 Epoch: 6 Global Step: 275930 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:43:26,044-Speed 2627.69 samples/sec Loss 8.9926 LearningRate 0.0445 Epoch: 6 Global Step: 275940 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:43:29,941-Speed 2630.28 samples/sec Loss 9.1735 LearningRate 0.0445 Epoch: 6 Global Step: 275950 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:43:33,833-Speed 2631.39 samples/sec Loss 9.0263 LearningRate 0.0445 Epoch: 6 Global Step: 275960 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:43:37,728-Speed 2628.99 samples/sec Loss 9.0667 LearningRate 0.0445 Epoch: 6 Global Step: 275970 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:43:41,625-Speed 2628.68 samples/sec Loss 9.0150 LearningRate 0.0445 Epoch: 6 Global Step: 275980 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:43:45,519-Speed 2630.12 samples/sec Loss 8.9241 LearningRate 0.0445 Epoch: 6 Global Step: 275990 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:43:49,397-Speed 2641.11 samples/sec Loss 8.9738 LearningRate 0.0445 Epoch: 6 Global Step: 276000 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:43:53,307-Speed 2619.36 samples/sec Loss 9.0757 LearningRate 0.0445 Epoch: 6 Global Step: 276010 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:43:57,197-Speed 2633.22 samples/sec Loss 8.8957 LearningRate 0.0445 Epoch: 6 Global Step: 276020 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:44:01,116-Speed 2613.49 samples/sec Loss 9.0034 LearningRate 0.0445 Epoch: 6 Global Step: 276030 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:44:05,014-Speed 2627.98 samples/sec Loss 9.1013 LearningRate 0.0445 Epoch: 6 Global Step: 276040 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:44:08,913-Speed 2626.37 samples/sec Loss 8.9554 LearningRate 0.0445 Epoch: 6 Global Step: 276050 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:44:12,801-Speed 2634.63 samples/sec Loss 8.9608 LearningRate 0.0445 Epoch: 6 Global Step: 276060 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:44:16,686-Speed 2636.44 samples/sec Loss 9.9828 LearningRate 0.0445 Epoch: 6 Global Step: 276070 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:44:20,576-Speed 2632.64 samples/sec Loss 9.2565 LearningRate 0.0445 Epoch: 6 Global Step: 276080 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:44:24,468-Speed 2631.42 samples/sec Loss 8.9631 LearningRate 0.0445 Epoch: 6 Global Step: 276090 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:44:28,360-Speed 2632.18 samples/sec Loss 8.9721 LearningRate 0.0445 Epoch: 6 Global Step: 276100 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:44:32,251-Speed 2631.66 samples/sec Loss 9.0377 LearningRate 0.0445 Epoch: 6 Global Step: 276110 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:44:36,142-Speed 2632.75 samples/sec Loss 9.0455 LearningRate 0.0445 Epoch: 6 Global Step: 276120 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:44:40,032-Speed 2633.22 samples/sec Loss 9.2326 LearningRate 0.0445 Epoch: 6 Global Step: 276130 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:44:43,923-Speed 2632.23 samples/sec Loss 8.9542 LearningRate 0.0445 Epoch: 6 Global Step: 276140 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:44:47,818-Speed 2629.54 samples/sec Loss 8.9695 LearningRate 0.0445 Epoch: 6 Global Step: 276150 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:44:51,708-Speed 2633.06 samples/sec Loss 9.0648 LearningRate 0.0445 Epoch: 6 Global Step: 276160 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:44:55,603-Speed 2629.21 samples/sec Loss 8.8742 LearningRate 0.0445 Epoch: 6 Global Step: 276170 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:44:59,507-Speed 2623.75 samples/sec Loss 9.0309 LearningRate 0.0445 Epoch: 6 Global Step: 276180 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:45:03,399-Speed 2631.52 samples/sec Loss 8.9437 LearningRate 0.0445 Epoch: 6 Global Step: 276190 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:45:07,293-Speed 2630.78 samples/sec Loss 9.1598 LearningRate 0.0445 Epoch: 6 Global Step: 276200 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:45:11,190-Speed 2628.31 samples/sec Loss 9.1717 LearningRate 0.0445 Epoch: 6 Global Step: 276210 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:45:15,081-Speed 2632.32 samples/sec Loss 9.1189 LearningRate 0.0445 Epoch: 6 Global Step: 276220 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:45:18,974-Speed 2630.92 samples/sec Loss 8.9501 LearningRate 0.0445 Epoch: 6 Global Step: 276230 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:45:22,866-Speed 2631.92 samples/sec Loss 9.0750 LearningRate 0.0445 Epoch: 6 Global Step: 276240 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:45:26,760-Speed 2630.35 samples/sec Loss 9.0945 LearningRate 0.0445 Epoch: 6 Global Step: 276250 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:45:30,651-Speed 2632.18 samples/sec Loss 9.1089 LearningRate 0.0445 Epoch: 6 Global Step: 276260 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:45:34,541-Speed 2632.59 samples/sec Loss 8.8600 LearningRate 0.0445 Epoch: 6 Global Step: 276270 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:45:38,442-Speed 2625.46 samples/sec Loss 9.0019 LearningRate 0.0445 Epoch: 6 Global Step: 276280 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:45:42,334-Speed 2631.58 samples/sec Loss 8.9781 LearningRate 0.0445 Epoch: 6 Global Step: 276290 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:45:46,229-Speed 2629.91 samples/sec Loss 9.0723 LearningRate 0.0445 Epoch: 6 Global Step: 276300 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:45:50,120-Speed 2632.90 samples/sec Loss 9.0380 LearningRate 0.0445 Epoch: 6 Global Step: 276310 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:45:54,017-Speed 2628.09 samples/sec Loss 9.2052 LearningRate 0.0445 Epoch: 6 Global Step: 276320 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:45:57,909-Speed 2632.05 samples/sec Loss 8.9615 LearningRate 0.0445 Epoch: 6 Global Step: 276330 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:46:01,812-Speed 2624.19 samples/sec Loss 9.0832 LearningRate 0.0445 Epoch: 6 Global Step: 276340 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:46:05,707-Speed 2629.58 samples/sec Loss 9.0658 LearningRate 0.0445 Epoch: 6 Global Step: 276350 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:46:09,601-Speed 2630.73 samples/sec Loss 8.9684 LearningRate 0.0445 Epoch: 6 Global Step: 276360 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:46:13,494-Speed 2630.67 samples/sec Loss 8.9288 LearningRate 0.0445 Epoch: 6 Global Step: 276370 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:46:17,389-Speed 2629.22 samples/sec Loss 8.9473 LearningRate 0.0445 Epoch: 6 Global Step: 276380 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:46:21,274-Speed 2636.59 samples/sec Loss 9.0991 LearningRate 0.0445 Epoch: 6 Global Step: 276390 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:46:25,167-Speed 2630.77 samples/sec Loss 8.9935 LearningRate 0.0445 Epoch: 6 Global Step: 276400 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:46:29,055-Speed 2634.03 samples/sec Loss 8.9068 LearningRate 0.0445 Epoch: 6 Global Step: 276410 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:46:32,947-Speed 2632.36 samples/sec Loss 8.9687 LearningRate 0.0445 Epoch: 6 Global Step: 276420 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:46:36,837-Speed 2632.90 samples/sec Loss 9.0202 LearningRate 0.0445 Epoch: 6 Global Step: 276430 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:46:40,719-Speed 2638.80 samples/sec Loss 8.9308 LearningRate 0.0445 Epoch: 6 Global Step: 276440 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:46:44,614-Speed 2628.92 samples/sec Loss 9.0477 LearningRate 0.0445 Epoch: 6 Global Step: 276450 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:46:48,505-Speed 2632.62 samples/sec Loss 9.0896 LearningRate 0.0445 Epoch: 6 Global Step: 276460 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:46:52,395-Speed 2633.03 samples/sec Loss 8.8678 LearningRate 0.0445 Epoch: 6 Global Step: 276470 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:46:56,288-Speed 2630.86 samples/sec Loss 8.9724 LearningRate 0.0445 Epoch: 6 Global Step: 276480 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:47:00,182-Speed 2630.27 samples/sec Loss 9.0040 LearningRate 0.0445 Epoch: 6 Global Step: 276490 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:47:04,087-Speed 2622.89 samples/sec Loss 9.0540 LearningRate 0.0444 Epoch: 6 Global Step: 276500 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:47:07,974-Speed 2634.71 samples/sec Loss 8.9655 LearningRate 0.0444 Epoch: 6 Global Step: 276510 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:47:11,880-Speed 2622.58 samples/sec Loss 9.0025 LearningRate 0.0444 Epoch: 6 Global Step: 276520 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:47:15,784-Speed 2623.17 samples/sec Loss 8.9154 LearningRate 0.0444 Epoch: 6 Global Step: 276530 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:47:19,694-Speed 2619.98 samples/sec Loss 8.8858 LearningRate 0.0444 Epoch: 6 Global Step: 276540 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:47:23,601-Speed 2622.09 samples/sec Loss 9.1899 LearningRate 0.0444 Epoch: 6 Global Step: 276550 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:47:27,505-Speed 2623.15 samples/sec Loss 9.0248 LearningRate 0.0444 Epoch: 6 Global Step: 276560 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:47:31,402-Speed 2628.35 samples/sec Loss 9.1546 LearningRate 0.0444 Epoch: 6 Global Step: 276570 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:47:35,307-Speed 2622.67 samples/sec Loss 9.1235 LearningRate 0.0444 Epoch: 6 Global Step: 276580 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:47:39,203-Speed 2628.70 samples/sec Loss 8.9532 LearningRate 0.0444 Epoch: 6 Global Step: 276590 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:47:43,095-Speed 2631.77 samples/sec Loss 8.9852 LearningRate 0.0444 Epoch: 6 Global Step: 276600 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:47:46,985-Speed 2632.83 samples/sec Loss 9.0151 LearningRate 0.0444 Epoch: 6 Global Step: 276610 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:47:50,883-Speed 2627.96 samples/sec Loss 9.1891 LearningRate 0.0444 Epoch: 6 Global Step: 276620 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:47:54,799-Speed 2615.57 samples/sec Loss 9.1338 LearningRate 0.0444 Epoch: 6 Global Step: 276630 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:47:58,714-Speed 2616.12 samples/sec Loss 8.9941 LearningRate 0.0444 Epoch: 6 Global Step: 276640 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:48:02,615-Speed 2625.93 samples/sec Loss 9.0463 LearningRate 0.0444 Epoch: 6 Global Step: 276650 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:48:06,511-Speed 2628.15 samples/sec Loss 8.9332 LearningRate 0.0444 Epoch: 6 Global Step: 276660 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:48:10,406-Speed 2630.10 samples/sec Loss 9.0515 LearningRate 0.0444 Epoch: 6 Global Step: 276670 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:48:14,301-Speed 2629.04 samples/sec Loss 9.1195 LearningRate 0.0444 Epoch: 6 Global Step: 276680 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:48:18,213-Speed 2618.06 samples/sec Loss 9.0538 LearningRate 0.0444 Epoch: 6 Global Step: 276690 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:48:22,112-Speed 2627.27 samples/sec Loss 9.0545 LearningRate 0.0444 Epoch: 6 Global Step: 276700 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:48:26,009-Speed 2628.08 samples/sec Loss 8.9145 LearningRate 0.0444 Epoch: 6 Global Step: 276710 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:48:29,899-Speed 2633.30 samples/sec Loss 8.9627 LearningRate 0.0444 Epoch: 6 Global Step: 276720 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:48:33,791-Speed 2631.64 samples/sec Loss 9.1191 LearningRate 0.0444 Epoch: 6 Global Step: 276730 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:48:37,683-Speed 2631.85 samples/sec Loss 8.9794 LearningRate 0.0444 Epoch: 6 Global Step: 276740 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:48:41,555-Speed 2644.88 samples/sec Loss 8.9992 LearningRate 0.0444 Epoch: 6 Global Step: 276750 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:48:45,448-Speed 2631.12 samples/sec Loss 9.0680 LearningRate 0.0444 Epoch: 6 Global Step: 276760 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:48:49,339-Speed 2632.20 samples/sec Loss 9.0591 LearningRate 0.0444 Epoch: 6 Global Step: 276770 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:48:53,229-Speed 2632.79 samples/sec Loss 9.1203 LearningRate 0.0444 Epoch: 6 Global Step: 276780 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:48:57,119-Speed 2632.87 samples/sec Loss 8.9693 LearningRate 0.0444 Epoch: 6 Global Step: 276790 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:49:01,006-Speed 2635.32 samples/sec Loss 9.0464 LearningRate 0.0444 Epoch: 6 Global Step: 276800 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:49:04,913-Speed 2621.57 samples/sec Loss 9.1247 LearningRate 0.0444 Epoch: 6 Global Step: 276810 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:49:08,810-Speed 2628.04 samples/sec Loss 8.9989 LearningRate 0.0444 Epoch: 6 Global Step: 276820 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:49:12,703-Speed 2630.97 samples/sec Loss 9.0413 LearningRate 0.0444 Epoch: 6 Global Step: 276830 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:49:16,595-Speed 2631.67 samples/sec Loss 9.0566 LearningRate 0.0444 Epoch: 6 Global Step: 276840 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:49:20,488-Speed 2630.67 samples/sec Loss 9.0227 LearningRate 0.0444 Epoch: 6 Global Step: 276850 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:49:24,391-Speed 2624.73 samples/sec Loss 9.0557 LearningRate 0.0444 Epoch: 6 Global Step: 276860 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:49:28,292-Speed 2625.36 samples/sec Loss 9.0833 LearningRate 0.0444 Epoch: 6 Global Step: 276870 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:49:32,187-Speed 2629.46 samples/sec Loss 9.0066 LearningRate 0.0444 Epoch: 6 Global Step: 276880 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:49:36,096-Speed 2620.24 samples/sec Loss 8.9822 LearningRate 0.0444 Epoch: 6 Global Step: 276890 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:49:39,997-Speed 2625.37 samples/sec Loss 9.0291 LearningRate 0.0444 Epoch: 6 Global Step: 276900 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:49:43,928-Speed 2605.37 samples/sec Loss 9.0104 LearningRate 0.0444 Epoch: 6 Global Step: 276910 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:49:47,831-Speed 2625.02 samples/sec Loss 8.9314 LearningRate 0.0444 Epoch: 6 Global Step: 276920 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:49:51,730-Speed 2626.55 samples/sec Loss 9.0053 LearningRate 0.0444 Epoch: 6 Global Step: 276930 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:49:55,627-Speed 2627.98 samples/sec Loss 8.9029 LearningRate 0.0444 Epoch: 6 Global Step: 276940 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:49:59,529-Speed 2625.25 samples/sec Loss 8.8978 LearningRate 0.0444 Epoch: 6 Global Step: 276950 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:50:03,435-Speed 2622.07 samples/sec Loss 9.0802 LearningRate 0.0444 Epoch: 6 Global Step: 276960 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 02:50:07,320-Speed 2636.53 samples/sec Loss 9.0903 LearningRate 0.0444 Epoch: 6 Global Step: 276970 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:50:11,217-Speed 2627.49 samples/sec Loss 8.9225 LearningRate 0.0444 Epoch: 6 Global Step: 276980 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:50:15,094-Speed 2642.19 samples/sec Loss 8.9467 LearningRate 0.0444 Epoch: 6 Global Step: 276990 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:50:19,000-Speed 2622.15 samples/sec Loss 8.9133 LearningRate 0.0444 Epoch: 6 Global Step: 277000 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:50:22,853-Speed 2658.47 samples/sec Loss 10.4272 LearningRate 0.0444 Epoch: 6 Global Step: 277010 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:50:26,761-Speed 2620.95 samples/sec Loss 9.7567 LearningRate 0.0444 Epoch: 6 Global Step: 277020 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:50:30,654-Speed 2630.85 samples/sec Loss 9.3507 LearningRate 0.0444 Epoch: 6 Global Step: 277030 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:50:34,543-Speed 2633.59 samples/sec Loss 9.1629 LearningRate 0.0444 Epoch: 6 Global Step: 277040 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:50:38,462-Speed 2614.05 samples/sec Loss 9.1675 LearningRate 0.0444 Epoch: 6 Global Step: 277050 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:50:42,384-Speed 2611.30 samples/sec Loss 9.0715 LearningRate 0.0444 Epoch: 6 Global Step: 277060 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:50:46,280-Speed 2628.42 samples/sec Loss 9.0038 LearningRate 0.0444 Epoch: 6 Global Step: 277070 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:50:50,179-Speed 2626.71 samples/sec Loss 9.0204 LearningRate 0.0444 Epoch: 6 Global Step: 277080 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:50:54,083-Speed 2623.37 samples/sec Loss 8.9527 LearningRate 0.0444 Epoch: 6 Global Step: 277090 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:50:58,011-Speed 2607.70 samples/sec Loss 9.1512 LearningRate 0.0444 Epoch: 6 Global Step: 277100 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:51:01,910-Speed 2626.98 samples/sec Loss 8.9931 LearningRate 0.0444 Epoch: 6 Global Step: 277110 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:51:05,814-Speed 2623.90 samples/sec Loss 9.0037 LearningRate 0.0443 Epoch: 6 Global Step: 277120 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:51:09,713-Speed 2626.53 samples/sec Loss 9.0782 LearningRate 0.0443 Epoch: 6 Global Step: 277130 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:51:13,614-Speed 2625.91 samples/sec Loss 9.0677 LearningRate 0.0443 Epoch: 6 Global Step: 277140 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:51:17,517-Speed 2623.88 samples/sec Loss 9.0674 LearningRate 0.0443 Epoch: 6 Global Step: 277150 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:51:21,417-Speed 2626.19 samples/sec Loss 9.0026 LearningRate 0.0443 Epoch: 6 Global Step: 277160 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:51:25,319-Speed 2624.72 samples/sec Loss 8.9905 LearningRate 0.0443 Epoch: 6 Global Step: 277170 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:51:29,220-Speed 2625.82 samples/sec Loss 8.9556 LearningRate 0.0443 Epoch: 6 Global Step: 277180 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:51:33,113-Speed 2630.60 samples/sec Loss 8.9209 LearningRate 0.0443 Epoch: 6 Global Step: 277190 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:51:37,022-Speed 2620.65 samples/sec Loss 8.9214 LearningRate 0.0443 Epoch: 6 Global Step: 277200 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:51:40,922-Speed 2626.53 samples/sec Loss 9.0091 LearningRate 0.0443 Epoch: 6 Global Step: 277210 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:51:44,832-Speed 2619.22 samples/sec Loss 9.1305 LearningRate 0.0443 Epoch: 6 Global Step: 277220 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:51:48,730-Speed 2627.41 samples/sec Loss 9.0519 LearningRate 0.0443 Epoch: 6 Global Step: 277230 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:51:52,629-Speed 2627.51 samples/sec Loss 9.0510 LearningRate 0.0443 Epoch: 6 Global Step: 277240 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:51:56,524-Speed 2629.27 samples/sec Loss 9.0653 LearningRate 0.0443 Epoch: 6 Global Step: 277250 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:52:00,426-Speed 2624.64 samples/sec Loss 8.9521 LearningRate 0.0443 Epoch: 6 Global Step: 277260 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:52:04,347-Speed 2612.06 samples/sec Loss 9.1505 LearningRate 0.0443 Epoch: 6 Global Step: 277270 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:52:08,238-Speed 2632.46 samples/sec Loss 8.9143 LearningRate 0.0443 Epoch: 6 Global Step: 277280 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:52:12,131-Speed 2630.94 samples/sec Loss 8.9926 LearningRate 0.0443 Epoch: 6 Global Step: 277290 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:52:16,026-Speed 2629.57 samples/sec Loss 8.9500 LearningRate 0.0443 Epoch: 6 Global Step: 277300 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:52:19,921-Speed 2630.04 samples/sec Loss 9.1072 LearningRate 0.0443 Epoch: 6 Global Step: 277310 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:52:23,809-Speed 2634.42 samples/sec Loss 8.9963 LearningRate 0.0443 Epoch: 6 Global Step: 277320 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:52:27,700-Speed 2633.19 samples/sec Loss 9.1754 LearningRate 0.0443 Epoch: 6 Global Step: 277330 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:52:31,589-Speed 2633.09 samples/sec Loss 9.1734 LearningRate 0.0443 Epoch: 6 Global Step: 277340 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:52:35,494-Speed 2623.12 samples/sec Loss 8.8445 LearningRate 0.0443 Epoch: 6 Global Step: 277350 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:52:39,381-Speed 2634.24 samples/sec Loss 8.9494 LearningRate 0.0443 Epoch: 6 Global Step: 277360 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:52:43,274-Speed 2631.33 samples/sec Loss 8.9140 LearningRate 0.0443 Epoch: 6 Global Step: 277370 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:52:47,162-Speed 2634.18 samples/sec Loss 8.9851 LearningRate 0.0443 Epoch: 6 Global Step: 277380 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:52:51,063-Speed 2625.97 samples/sec Loss 8.9002 LearningRate 0.0443 Epoch: 6 Global Step: 277390 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:52:54,967-Speed 2623.29 samples/sec Loss 8.9900 LearningRate 0.0443 Epoch: 6 Global Step: 277400 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:52:58,883-Speed 2615.40 samples/sec Loss 9.0751 LearningRate 0.0443 Epoch: 6 Global Step: 277410 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:53:02,778-Speed 2629.34 samples/sec Loss 9.0031 LearningRate 0.0443 Epoch: 6 Global Step: 277420 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:53:06,669-Speed 2632.48 samples/sec Loss 8.9778 LearningRate 0.0443 Epoch: 6 Global Step: 277430 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:53:10,565-Speed 2629.19 samples/sec Loss 9.0595 LearningRate 0.0443 Epoch: 6 Global Step: 277440 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:53:14,457-Speed 2631.37 samples/sec Loss 9.0309 LearningRate 0.0443 Epoch: 6 Global Step: 277450 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:53:18,359-Speed 2624.80 samples/sec Loss 9.0122 LearningRate 0.0443 Epoch: 6 Global Step: 277460 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:53:22,256-Speed 2628.03 samples/sec Loss 9.0936 LearningRate 0.0443 Epoch: 6 Global Step: 277470 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:53:26,158-Speed 2625.65 samples/sec Loss 9.0688 LearningRate 0.0443 Epoch: 6 Global Step: 277480 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:53:30,046-Speed 2633.92 samples/sec Loss 9.0156 LearningRate 0.0443 Epoch: 6 Global Step: 277490 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:53:33,842-Speed 2698.51 samples/sec Loss 9.5045 LearningRate 0.0443 Epoch: 6 Global Step: 277500 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 02:53:37,742-Speed 2626.48 samples/sec Loss 9.6891 LearningRate 0.0443 Epoch: 6 Global Step: 277510 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 02:53:41,641-Speed 2626.73 samples/sec Loss 9.4107 LearningRate 0.0443 Epoch: 6 Global Step: 277520 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 02:53:45,544-Speed 2624.02 samples/sec Loss 9.0700 LearningRate 0.0443 Epoch: 6 Global Step: 277530 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 02:53:49,456-Speed 2618.11 samples/sec Loss 9.1172 LearningRate 0.0443 Epoch: 6 Global Step: 277540 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 02:53:53,412-Speed 2588.92 samples/sec Loss 8.9188 LearningRate 0.0443 Epoch: 6 Global Step: 277550 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 02:53:57,313-Speed 2626.06 samples/sec Loss 8.9332 LearningRate 0.0443 Epoch: 6 Global Step: 277560 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 02:54:01,203-Speed 2632.66 samples/sec Loss 9.1221 LearningRate 0.0443 Epoch: 6 Global Step: 277570 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 02:54:05,094-Speed 2632.27 samples/sec Loss 9.0326 LearningRate 0.0443 Epoch: 6 Global Step: 277580 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 02:54:09,020-Speed 2608.88 samples/sec Loss 9.1520 LearningRate 0.0443 Epoch: 6 Global Step: 277590 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 02:54:12,914-Speed 2630.35 samples/sec Loss 9.1262 LearningRate 0.0443 Epoch: 6 Global Step: 277600 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 02:54:16,801-Speed 2635.28 samples/sec Loss 8.9184 LearningRate 0.0443 Epoch: 6 Global Step: 277610 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 02:54:20,689-Speed 2634.34 samples/sec Loss 8.8883 LearningRate 0.0443 Epoch: 6 Global Step: 277620 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 02:54:24,587-Speed 2627.68 samples/sec Loss 8.9355 LearningRate 0.0443 Epoch: 6 Global Step: 277630 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 02:54:28,473-Speed 2635.79 samples/sec Loss 8.8760 LearningRate 0.0443 Epoch: 6 Global Step: 277640 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 02:54:32,378-Speed 2622.48 samples/sec Loss 8.9678 LearningRate 0.0443 Epoch: 6 Global Step: 277650 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 02:54:36,267-Speed 2633.73 samples/sec Loss 9.0321 LearningRate 0.0443 Epoch: 6 Global Step: 277660 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 02:54:40,155-Speed 2634.10 samples/sec Loss 9.0278 LearningRate 0.0443 Epoch: 6 Global Step: 277670 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 02:54:44,046-Speed 2632.12 samples/sec Loss 9.0445 LearningRate 0.0443 Epoch: 6 Global Step: 277680 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 02:54:48,107-Speed 2522.28 samples/sec Loss 9.0047 LearningRate 0.0443 Epoch: 6 Global Step: 277690 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 02:54:52,073-Speed 2582.88 samples/sec Loss 8.9879 LearningRate 0.0443 Epoch: 6 Global Step: 277700 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:54:55,967-Speed 2630.49 samples/sec Loss 9.1299 LearningRate 0.0443 Epoch: 6 Global Step: 277710 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:54:59,858-Speed 2632.05 samples/sec Loss 8.9635 LearningRate 0.0443 Epoch: 6 Global Step: 277720 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:55:03,749-Speed 2632.45 samples/sec Loss 8.9971 LearningRate 0.0443 Epoch: 6 Global Step: 277730 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:55:07,642-Speed 2630.59 samples/sec Loss 8.9687 LearningRate 0.0442 Epoch: 6 Global Step: 277740 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:55:11,530-Speed 2634.30 samples/sec Loss 8.9989 LearningRate 0.0442 Epoch: 6 Global Step: 277750 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:55:15,429-Speed 2627.08 samples/sec Loss 8.9195 LearningRate 0.0442 Epoch: 6 Global Step: 277760 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:55:19,322-Speed 2631.21 samples/sec Loss 9.1430 LearningRate 0.0442 Epoch: 6 Global Step: 277770 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:55:23,214-Speed 2631.53 samples/sec Loss 8.9309 LearningRate 0.0442 Epoch: 6 Global Step: 277780 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:55:27,125-Speed 2619.02 samples/sec Loss 8.9962 LearningRate 0.0442 Epoch: 6 Global Step: 277790 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 02:55:31,015-Speed 2632.49 samples/sec Loss 8.9418 LearningRate 0.0442 Epoch: 6 Global Step: 277800 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:55:34,903-Speed 2634.37 samples/sec Loss 9.1413 LearningRate 0.0442 Epoch: 6 Global Step: 277810 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:55:38,795-Speed 2631.69 samples/sec Loss 9.0943 LearningRate 0.0442 Epoch: 6 Global Step: 277820 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:55:42,687-Speed 2631.69 samples/sec Loss 8.8337 LearningRate 0.0442 Epoch: 6 Global Step: 277830 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:55:46,577-Speed 2632.85 samples/sec Loss 8.9245 LearningRate 0.0442 Epoch: 6 Global Step: 277840 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:55:50,468-Speed 2633.05 samples/sec Loss 8.9308 LearningRate 0.0442 Epoch: 6 Global Step: 277850 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:55:54,360-Speed 2631.41 samples/sec Loss 9.0975 LearningRate 0.0442 Epoch: 6 Global Step: 277860 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:55:58,247-Speed 2634.87 samples/sec Loss 9.0419 LearningRate 0.0442 Epoch: 6 Global Step: 277870 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:56:02,134-Speed 2634.83 samples/sec Loss 8.9586 LearningRate 0.0442 Epoch: 6 Global Step: 277880 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:56:06,025-Speed 2632.10 samples/sec Loss 9.0307 LearningRate 0.0442 Epoch: 6 Global Step: 277890 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:56:09,915-Speed 2632.73 samples/sec Loss 9.1291 LearningRate 0.0442 Epoch: 6 Global Step: 277900 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:56:13,817-Speed 2625.49 samples/sec Loss 8.8499 LearningRate 0.0442 Epoch: 6 Global Step: 277910 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:56:17,714-Speed 2628.86 samples/sec Loss 8.9917 LearningRate 0.0442 Epoch: 6 Global Step: 277920 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:56:21,609-Speed 2628.90 samples/sec Loss 9.0485 LearningRate 0.0442 Epoch: 6 Global Step: 277930 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:56:25,505-Speed 2629.11 samples/sec Loss 8.9522 LearningRate 0.0442 Epoch: 6 Global Step: 277940 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:56:29,397-Speed 2631.75 samples/sec Loss 9.0368 LearningRate 0.0442 Epoch: 6 Global Step: 277950 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:56:33,291-Speed 2630.16 samples/sec Loss 8.9376 LearningRate 0.0442 Epoch: 6 Global Step: 277960 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:56:37,181-Speed 2632.69 samples/sec Loss 9.0874 LearningRate 0.0442 Epoch: 6 Global Step: 277970 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:56:41,071-Speed 2633.19 samples/sec Loss 8.9794 LearningRate 0.0442 Epoch: 6 Global Step: 277980 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:56:44,964-Speed 2630.65 samples/sec Loss 8.9658 LearningRate 0.0442 Epoch: 6 Global Step: 277990 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:56:48,857-Speed 2631.34 samples/sec Loss 8.9922 LearningRate 0.0442 Epoch: 6 Global Step: 278000 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:56:52,749-Speed 2632.11 samples/sec Loss 8.8166 LearningRate 0.0442 Epoch: 6 Global Step: 278010 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:56:56,651-Speed 2624.49 samples/sec Loss 8.9576 LearningRate 0.0442 Epoch: 6 Global Step: 278020 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:57:00,541-Speed 2633.41 samples/sec Loss 9.0017 LearningRate 0.0442 Epoch: 6 Global Step: 278030 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:57:04,434-Speed 2630.74 samples/sec Loss 8.9956 LearningRate 0.0442 Epoch: 6 Global Step: 278040 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:57:08,332-Speed 2627.25 samples/sec Loss 9.0721 LearningRate 0.0442 Epoch: 6 Global Step: 278050 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:57:12,245-Speed 2617.37 samples/sec Loss 8.9560 LearningRate 0.0442 Epoch: 6 Global Step: 278060 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:57:16,170-Speed 2609.93 samples/sec Loss 8.8745 LearningRate 0.0442 Epoch: 6 Global Step: 278070 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:57:20,064-Speed 2630.01 samples/sec Loss 9.0135 LearningRate 0.0442 Epoch: 6 Global Step: 278080 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:57:23,956-Speed 2631.99 samples/sec Loss 8.9615 LearningRate 0.0442 Epoch: 6 Global Step: 278090 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:57:27,855-Speed 2627.29 samples/sec Loss 9.0905 LearningRate 0.0442 Epoch: 6 Global Step: 278100 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:57:31,752-Speed 2628.44 samples/sec Loss 8.9499 LearningRate 0.0442 Epoch: 6 Global Step: 278110 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:57:35,650-Speed 2627.79 samples/sec Loss 9.0840 LearningRate 0.0442 Epoch: 6 Global Step: 278120 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:57:39,544-Speed 2629.78 samples/sec Loss 9.0457 LearningRate 0.0442 Epoch: 6 Global Step: 278130 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:57:43,439-Speed 2629.62 samples/sec Loss 9.0397 LearningRate 0.0442 Epoch: 6 Global Step: 278140 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:57:47,333-Speed 2630.62 samples/sec Loss 9.0452 LearningRate 0.0442 Epoch: 6 Global Step: 278150 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:57:51,227-Speed 2629.85 samples/sec Loss 9.0264 LearningRate 0.0442 Epoch: 6 Global Step: 278160 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:57:55,121-Speed 2631.13 samples/sec Loss 8.9524 LearningRate 0.0442 Epoch: 6 Global Step: 278170 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:57:59,014-Speed 2630.65 samples/sec Loss 8.7909 LearningRate 0.0442 Epoch: 6 Global Step: 278180 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:58:02,947-Speed 2604.35 samples/sec Loss 9.0750 LearningRate 0.0442 Epoch: 6 Global Step: 278190 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:58:06,809-Speed 2652.25 samples/sec Loss 9.1158 LearningRate 0.0442 Epoch: 6 Global Step: 278200 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:58:10,701-Speed 2631.80 samples/sec Loss 9.0542 LearningRate 0.0442 Epoch: 6 Global Step: 278210 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:58:14,599-Speed 2627.06 samples/sec Loss 9.0434 LearningRate 0.0442 Epoch: 6 Global Step: 278220 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:58:18,499-Speed 2627.06 samples/sec Loss 9.0670 LearningRate 0.0442 Epoch: 6 Global Step: 278230 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:58:22,391-Speed 2631.71 samples/sec Loss 9.0737 LearningRate 0.0442 Epoch: 6 Global Step: 278240 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:58:26,284-Speed 2630.61 samples/sec Loss 9.0138 LearningRate 0.0442 Epoch: 6 Global Step: 278250 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:58:30,177-Speed 2631.18 samples/sec Loss 8.9717 LearningRate 0.0442 Epoch: 6 Global Step: 278260 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:58:34,126-Speed 2593.84 samples/sec Loss 9.0575 LearningRate 0.0442 Epoch: 6 Global Step: 278270 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 02:58:38,014-Speed 2634.56 samples/sec Loss 8.9040 LearningRate 0.0442 Epoch: 6 Global Step: 278280 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:58:41,908-Speed 2629.98 samples/sec Loss 8.9646 LearningRate 0.0442 Epoch: 6 Global Step: 278290 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:58:45,802-Speed 2631.10 samples/sec Loss 9.1065 LearningRate 0.0442 Epoch: 6 Global Step: 278300 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:58:49,703-Speed 2625.17 samples/sec Loss 9.0127 LearningRate 0.0442 Epoch: 6 Global Step: 278310 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:58:53,602-Speed 2627.55 samples/sec Loss 8.9398 LearningRate 0.0442 Epoch: 6 Global Step: 278320 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 02:58:57,489-Speed 2634.52 samples/sec Loss 8.8893 LearningRate 0.0442 Epoch: 6 Global Step: 278330 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:59:01,370-Speed 2639.37 samples/sec Loss 10.4496 LearningRate 0.0442 Epoch: 6 Global Step: 278340 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:59:05,259-Speed 2633.37 samples/sec Loss 9.3183 LearningRate 0.0442 Epoch: 6 Global Step: 278350 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:59:09,165-Speed 2622.80 samples/sec Loss 9.0245 LearningRate 0.0442 Epoch: 6 Global Step: 278360 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:59:13,056-Speed 2632.42 samples/sec Loss 9.1371 LearningRate 0.0441 Epoch: 6 Global Step: 278370 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:59:16,953-Speed 2628.35 samples/sec Loss 9.0456 LearningRate 0.0441 Epoch: 6 Global Step: 278380 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:59:20,842-Speed 2634.01 samples/sec Loss 9.1619 LearningRate 0.0441 Epoch: 6 Global Step: 278390 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:59:24,732-Speed 2632.91 samples/sec Loss 9.0527 LearningRate 0.0441 Epoch: 6 Global Step: 278400 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:59:28,621-Speed 2633.96 samples/sec Loss 8.9310 LearningRate 0.0441 Epoch: 6 Global Step: 278410 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:59:32,512-Speed 2631.88 samples/sec Loss 9.0816 LearningRate 0.0441 Epoch: 6 Global Step: 278420 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:59:36,404-Speed 2631.80 samples/sec Loss 9.0097 LearningRate 0.0441 Epoch: 6 Global Step: 278430 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 02:59:40,299-Speed 2629.18 samples/sec Loss 8.9990 LearningRate 0.0441 Epoch: 6 Global Step: 278440 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:59:44,197-Speed 2627.94 samples/sec Loss 9.1718 LearningRate 0.0441 Epoch: 6 Global Step: 278450 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:59:48,096-Speed 2627.38 samples/sec Loss 9.0540 LearningRate 0.0441 Epoch: 6 Global Step: 278460 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:59:51,995-Speed 2627.03 samples/sec Loss 9.0670 LearningRate 0.0441 Epoch: 6 Global Step: 278470 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:59:55,885-Speed 2633.36 samples/sec Loss 9.0236 LearningRate 0.0441 Epoch: 6 Global Step: 278480 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 02:59:59,780-Speed 2629.86 samples/sec Loss 9.0997 LearningRate 0.0441 Epoch: 6 Global Step: 278490 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 03:00:03,672-Speed 2631.99 samples/sec Loss 8.9716 LearningRate 0.0441 Epoch: 6 Global Step: 278500 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 03:00:07,566-Speed 2630.08 samples/sec Loss 8.9815 LearningRate 0.0441 Epoch: 6 Global Step: 278510 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 03:00:11,456-Speed 2632.98 samples/sec Loss 8.9321 LearningRate 0.0441 Epoch: 6 Global Step: 278520 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 03:00:15,347-Speed 2632.73 samples/sec Loss 8.9983 LearningRate 0.0441 Epoch: 6 Global Step: 278530 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 03:00:19,238-Speed 2632.18 samples/sec Loss 9.0005 LearningRate 0.0441 Epoch: 6 Global Step: 278540 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:00:23,134-Speed 2629.26 samples/sec Loss 9.0043 LearningRate 0.0441 Epoch: 6 Global Step: 278550 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:00:27,032-Speed 2627.74 samples/sec Loss 8.9910 LearningRate 0.0441 Epoch: 6 Global Step: 278560 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:00:30,923-Speed 2632.30 samples/sec Loss 9.0256 LearningRate 0.0441 Epoch: 6 Global Step: 278570 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:00:34,814-Speed 2632.57 samples/sec Loss 8.9620 LearningRate 0.0441 Epoch: 6 Global Step: 278580 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:00:38,731-Speed 2614.42 samples/sec Loss 9.0308 LearningRate 0.0441 Epoch: 6 Global Step: 278590 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:00:42,630-Speed 2627.80 samples/sec Loss 9.0202 LearningRate 0.0441 Epoch: 6 Global Step: 278600 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:00:46,536-Speed 2622.29 samples/sec Loss 8.9684 LearningRate 0.0441 Epoch: 6 Global Step: 278610 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:00:50,429-Speed 2630.62 samples/sec Loss 9.0199 LearningRate 0.0441 Epoch: 6 Global Step: 278620 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:00:54,331-Speed 2625.40 samples/sec Loss 8.9362 LearningRate 0.0441 Epoch: 6 Global Step: 278630 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:00:58,231-Speed 2626.27 samples/sec Loss 9.0653 LearningRate 0.0441 Epoch: 6 Global Step: 278640 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:01:02,139-Speed 2621.19 samples/sec Loss 9.0079 LearningRate 0.0441 Epoch: 6 Global Step: 278650 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:01:06,032-Speed 2630.53 samples/sec Loss 8.9335 LearningRate 0.0441 Epoch: 6 Global Step: 278660 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:01:09,925-Speed 2630.99 samples/sec Loss 9.1521 LearningRate 0.0441 Epoch: 6 Global Step: 278670 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:01:13,816-Speed 2633.01 samples/sec Loss 9.0895 LearningRate 0.0441 Epoch: 6 Global Step: 278680 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:01:17,719-Speed 2623.72 samples/sec Loss 9.0110 LearningRate 0.0441 Epoch: 6 Global Step: 278690 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:01:21,603-Speed 2637.30 samples/sec Loss 9.1152 LearningRate 0.0441 Epoch: 6 Global Step: 278700 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:01:25,480-Speed 2641.65 samples/sec Loss 9.0820 LearningRate 0.0441 Epoch: 6 Global Step: 278710 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:01:29,376-Speed 2629.57 samples/sec Loss 8.8736 LearningRate 0.0441 Epoch: 6 Global Step: 278720 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:01:33,264-Speed 2634.14 samples/sec Loss 8.8816 LearningRate 0.0441 Epoch: 6 Global Step: 278730 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:01:37,158-Speed 2630.47 samples/sec Loss 8.9730 LearningRate 0.0441 Epoch: 6 Global Step: 278740 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:01:41,050-Speed 2631.38 samples/sec Loss 8.9492 LearningRate 0.0441 Epoch: 6 Global Step: 278750 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:01:44,939-Speed 2633.77 samples/sec Loss 9.0489 LearningRate 0.0441 Epoch: 6 Global Step: 278760 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:01:48,837-Speed 2628.01 samples/sec Loss 8.8933 LearningRate 0.0441 Epoch: 6 Global Step: 278770 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:01:52,766-Speed 2606.66 samples/sec Loss 8.9841 LearningRate 0.0441 Epoch: 6 Global Step: 278780 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:01:56,804-Speed 2536.99 samples/sec Loss 9.0055 LearningRate 0.0441 Epoch: 6 Global Step: 278790 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:02:00,704-Speed 2626.03 samples/sec Loss 9.0059 LearningRate 0.0441 Epoch: 6 Global Step: 278800 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:02:04,598-Speed 2630.40 samples/sec Loss 8.9355 LearningRate 0.0441 Epoch: 6 Global Step: 278810 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:02:08,495-Speed 2627.90 samples/sec Loss 9.1970 LearningRate 0.0441 Epoch: 6 Global Step: 278820 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:02:12,392-Speed 2628.51 samples/sec Loss 9.0509 LearningRate 0.0441 Epoch: 6 Global Step: 278830 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:02:16,306-Speed 2616.57 samples/sec Loss 9.0913 LearningRate 0.0441 Epoch: 6 Global Step: 278840 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:02:20,219-Speed 2617.52 samples/sec Loss 8.8813 LearningRate 0.0441 Epoch: 6 Global Step: 278850 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:02:24,116-Speed 2628.49 samples/sec Loss 9.0504 LearningRate 0.0441 Epoch: 6 Global Step: 278860 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:02:28,013-Speed 2627.99 samples/sec Loss 9.1014 LearningRate 0.0441 Epoch: 6 Global Step: 278870 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:02:31,917-Speed 2624.60 samples/sec Loss 9.0511 LearningRate 0.0441 Epoch: 6 Global Step: 278880 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:02:35,813-Speed 2628.61 samples/sec Loss 8.9463 LearningRate 0.0441 Epoch: 6 Global Step: 278890 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:02:39,708-Speed 2630.59 samples/sec Loss 8.9307 LearningRate 0.0441 Epoch: 6 Global Step: 278900 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:02:43,581-Speed 2644.57 samples/sec Loss 9.0581 LearningRate 0.0441 Epoch: 6 Global Step: 278910 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:02:47,484-Speed 2624.29 samples/sec Loss 8.9225 LearningRate 0.0441 Epoch: 6 Global Step: 278920 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:02:51,376-Speed 2631.52 samples/sec Loss 8.9744 LearningRate 0.0441 Epoch: 6 Global Step: 278930 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:02:55,377-Speed 2559.77 samples/sec Loss 9.1344 LearningRate 0.0441 Epoch: 6 Global Step: 278940 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:02:59,269-Speed 2633.15 samples/sec Loss 9.0223 LearningRate 0.0441 Epoch: 6 Global Step: 278950 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:03:03,199-Speed 2605.97 samples/sec Loss 9.0515 LearningRate 0.0441 Epoch: 6 Global Step: 278960 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:03:07,337-Speed 2475.51 samples/sec Loss 9.0594 LearningRate 0.0441 Epoch: 6 Global Step: 278970 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:03:11,330-Speed 2565.04 samples/sec Loss 8.9325 LearningRate 0.0441 Epoch: 6 Global Step: 278980 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:03:15,202-Speed 2645.46 samples/sec Loss 8.9693 LearningRate 0.0440 Epoch: 6 Global Step: 278990 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:03:19,106-Speed 2624.08 samples/sec Loss 8.9084 LearningRate 0.0440 Epoch: 6 Global Step: 279000 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:03:23,005-Speed 2626.80 samples/sec Loss 8.9853 LearningRate 0.0440 Epoch: 6 Global Step: 279010 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:03:26,897-Speed 2631.05 samples/sec Loss 8.9623 LearningRate 0.0440 Epoch: 6 Global Step: 279020 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:03:30,800-Speed 2624.72 samples/sec Loss 8.9611 LearningRate 0.0440 Epoch: 6 Global Step: 279030 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:03:34,693-Speed 2631.16 samples/sec Loss 8.9535 LearningRate 0.0440 Epoch: 6 Global Step: 279040 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:03:38,598-Speed 2623.35 samples/sec Loss 9.1517 LearningRate 0.0440 Epoch: 6 Global Step: 279050 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:03:42,495-Speed 2627.76 samples/sec Loss 9.0432 LearningRate 0.0440 Epoch: 6 Global Step: 279060 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:03:46,391-Speed 2628.88 samples/sec Loss 8.9982 LearningRate 0.0440 Epoch: 6 Global Step: 279070 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:03:50,298-Speed 2621.35 samples/sec Loss 9.0345 LearningRate 0.0440 Epoch: 6 Global Step: 279080 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:03:54,182-Speed 2638.43 samples/sec Loss 9.0306 LearningRate 0.0440 Epoch: 6 Global Step: 279090 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:03:58,076-Speed 2630.61 samples/sec Loss 9.1267 LearningRate 0.0440 Epoch: 6 Global Step: 279100 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:04:01,967-Speed 2632.20 samples/sec Loss 9.1098 LearningRate 0.0440 Epoch: 6 Global Step: 279110 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:04:05,857-Speed 2632.78 samples/sec Loss 8.9204 LearningRate 0.0440 Epoch: 6 Global Step: 279120 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:04:09,762-Speed 2623.42 samples/sec Loss 8.8393 LearningRate 0.0440 Epoch: 6 Global Step: 279130 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:04:13,660-Speed 2627.65 samples/sec Loss 9.0099 LearningRate 0.0440 Epoch: 6 Global Step: 279140 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:04:17,553-Speed 2630.85 samples/sec Loss 9.1117 LearningRate 0.0440 Epoch: 6 Global Step: 279150 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:04:21,448-Speed 2629.27 samples/sec Loss 8.8690 LearningRate 0.0440 Epoch: 6 Global Step: 279160 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:04:25,343-Speed 2630.12 samples/sec Loss 9.0466 LearningRate 0.0440 Epoch: 6 Global Step: 279170 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:04:29,235-Speed 2632.13 samples/sec Loss 8.9296 LearningRate 0.0440 Epoch: 6 Global Step: 279180 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:04:33,127-Speed 2631.83 samples/sec Loss 9.1035 LearningRate 0.0440 Epoch: 6 Global Step: 279190 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:04:37,034-Speed 2621.48 samples/sec Loss 8.9707 LearningRate 0.0440 Epoch: 6 Global Step: 279200 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:04:40,902-Speed 2648.07 samples/sec Loss 9.1641 LearningRate 0.0440 Epoch: 6 Global Step: 279210 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:04:44,793-Speed 2632.73 samples/sec Loss 8.9883 LearningRate 0.0440 Epoch: 6 Global Step: 279220 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:04:48,667-Speed 2643.91 samples/sec Loss 8.9760 LearningRate 0.0440 Epoch: 6 Global Step: 279230 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:04:52,566-Speed 2626.72 samples/sec Loss 9.1030 LearningRate 0.0440 Epoch: 6 Global Step: 279240 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:04:56,454-Speed 2633.99 samples/sec Loss 8.9655 LearningRate 0.0440 Epoch: 6 Global Step: 279250 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:05:00,343-Speed 2633.74 samples/sec Loss 9.0973 LearningRate 0.0440 Epoch: 6 Global Step: 279260 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:05:04,237-Speed 2630.41 samples/sec Loss 9.1752 LearningRate 0.0440 Epoch: 6 Global Step: 279270 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:05:08,137-Speed 2626.91 samples/sec Loss 8.7684 LearningRate 0.0440 Epoch: 6 Global Step: 279280 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:05:12,042-Speed 2622.79 samples/sec Loss 8.9119 LearningRate 0.0440 Epoch: 6 Global Step: 279290 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:05:15,937-Speed 2629.35 samples/sec Loss 8.9617 LearningRate 0.0440 Epoch: 6 Global Step: 279300 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:05:19,834-Speed 2628.37 samples/sec Loss 8.9659 LearningRate 0.0440 Epoch: 6 Global Step: 279310 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:05:23,726-Speed 2632.00 samples/sec Loss 8.9045 LearningRate 0.0440 Epoch: 6 Global Step: 279320 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:05:27,618-Speed 2631.92 samples/sec Loss 8.9818 LearningRate 0.0440 Epoch: 6 Global Step: 279330 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:05:31,540-Speed 2611.34 samples/sec Loss 8.8630 LearningRate 0.0440 Epoch: 6 Global Step: 279340 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:05:35,437-Speed 2628.36 samples/sec Loss 9.0226 LearningRate 0.0440 Epoch: 6 Global Step: 279350 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:05:39,328-Speed 2632.12 samples/sec Loss 9.0932 LearningRate 0.0440 Epoch: 6 Global Step: 279360 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:05:43,215-Speed 2635.86 samples/sec Loss 9.0954 LearningRate 0.0440 Epoch: 6 Global Step: 279370 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:05:47,110-Speed 2629.73 samples/sec Loss 9.0028 LearningRate 0.0440 Epoch: 6 Global Step: 279380 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:05:51,001-Speed 2632.05 samples/sec Loss 9.0266 LearningRate 0.0440 Epoch: 6 Global Step: 279390 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:05:54,890-Speed 2633.93 samples/sec Loss 9.0738 LearningRate 0.0440 Epoch: 6 Global Step: 279400 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:05:58,781-Speed 2632.61 samples/sec Loss 8.9515 LearningRate 0.0440 Epoch: 6 Global Step: 279410 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:06:02,678-Speed 2628.15 samples/sec Loss 9.0879 LearningRate 0.0440 Epoch: 6 Global Step: 279420 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:06:06,566-Speed 2634.05 samples/sec Loss 8.8812 LearningRate 0.0440 Epoch: 6 Global Step: 279430 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:06:10,458-Speed 2631.45 samples/sec Loss 8.9554 LearningRate 0.0440 Epoch: 6 Global Step: 279440 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:06:14,366-Speed 2621.43 samples/sec Loss 8.9898 LearningRate 0.0440 Epoch: 6 Global Step: 279450 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:06:18,260-Speed 2630.63 samples/sec Loss 8.9311 LearningRate 0.0440 Epoch: 6 Global Step: 279460 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:06:22,152-Speed 2631.76 samples/sec Loss 9.0198 LearningRate 0.0440 Epoch: 6 Global Step: 279470 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:06:26,046-Speed 2630.12 samples/sec Loss 9.0839 LearningRate 0.0440 Epoch: 6 Global Step: 279480 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:06:29,939-Speed 2631.29 samples/sec Loss 9.0238 LearningRate 0.0440 Epoch: 6 Global Step: 279490 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:06:33,835-Speed 2629.20 samples/sec Loss 9.0049 LearningRate 0.0440 Epoch: 6 Global Step: 279500 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:06:37,730-Speed 2628.95 samples/sec Loss 9.0078 LearningRate 0.0440 Epoch: 6 Global Step: 279510 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:06:41,620-Speed 2632.94 samples/sec Loss 8.9178 LearningRate 0.0440 Epoch: 6 Global Step: 279520 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:06:45,521-Speed 2625.73 samples/sec Loss 8.9392 LearningRate 0.0440 Epoch: 6 Global Step: 279530 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:06:49,406-Speed 2636.68 samples/sec Loss 8.9866 LearningRate 0.0440 Epoch: 6 Global Step: 279540 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:06:53,323-Speed 2615.18 samples/sec Loss 8.9090 LearningRate 0.0440 Epoch: 6 Global Step: 279550 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:06:57,227-Speed 2623.62 samples/sec Loss 9.0745 LearningRate 0.0440 Epoch: 6 Global Step: 279560 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:07:01,107-Speed 2640.10 samples/sec Loss 9.1133 LearningRate 0.0440 Epoch: 6 Global Step: 279570 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:07:05,026-Speed 2612.96 samples/sec Loss 8.8836 LearningRate 0.0440 Epoch: 6 Global Step: 279580 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:07:09,137-Speed 2491.34 samples/sec Loss 8.9301 LearningRate 0.0440 Epoch: 6 Global Step: 279590 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:07:13,043-Speed 2622.73 samples/sec Loss 8.9580 LearningRate 0.0440 Epoch: 6 Global Step: 279600 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:07:16,948-Speed 2622.80 samples/sec Loss 8.9343 LearningRate 0.0440 Epoch: 6 Global Step: 279610 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:07:20,905-Speed 2588.84 samples/sec Loss 9.0927 LearningRate 0.0439 Epoch: 6 Global Step: 279620 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:07:24,800-Speed 2629.22 samples/sec Loss 9.0413 LearningRate 0.0439 Epoch: 6 Global Step: 279630 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:07:28,709-Speed 2620.39 samples/sec Loss 8.9017 LearningRate 0.0439 Epoch: 6 Global Step: 279640 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:07:32,600-Speed 2632.31 samples/sec Loss 8.9519 LearningRate 0.0439 Epoch: 6 Global Step: 279650 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:07:36,492-Speed 2631.78 samples/sec Loss 8.9735 LearningRate 0.0439 Epoch: 6 Global Step: 279660 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:07:40,511-Speed 2548.16 samples/sec Loss 8.8835 LearningRate 0.0439 Epoch: 6 Global Step: 279670 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:07:44,486-Speed 2577.46 samples/sec Loss 8.8599 LearningRate 0.0439 Epoch: 6 Global Step: 279680 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:07:48,389-Speed 2624.47 samples/sec Loss 8.9864 LearningRate 0.0439 Epoch: 6 Global Step: 279690 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:07:52,314-Speed 2609.75 samples/sec Loss 8.9950 LearningRate 0.0439 Epoch: 6 Global Step: 279700 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:07:56,207-Speed 2631.02 samples/sec Loss 8.9535 LearningRate 0.0439 Epoch: 6 Global Step: 279710 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:08:00,076-Speed 2648.07 samples/sec Loss 8.8302 LearningRate 0.0439 Epoch: 6 Global Step: 279720 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:08:03,970-Speed 2630.10 samples/sec Loss 9.0154 LearningRate 0.0439 Epoch: 6 Global Step: 279730 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:08:07,860-Speed 2633.12 samples/sec Loss 8.9597 LearningRate 0.0439 Epoch: 6 Global Step: 279740 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:08:11,755-Speed 2629.55 samples/sec Loss 8.9614 LearningRate 0.0439 Epoch: 6 Global Step: 279750 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:08:15,645-Speed 2633.18 samples/sec Loss 8.8942 LearningRate 0.0439 Epoch: 6 Global Step: 279760 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:08:19,537-Speed 2632.18 samples/sec Loss 9.1038 LearningRate 0.0439 Epoch: 6 Global Step: 279770 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:08:23,426-Speed 2633.67 samples/sec Loss 8.9829 LearningRate 0.0439 Epoch: 6 Global Step: 279780 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:08:27,318-Speed 2631.29 samples/sec Loss 8.9764 LearningRate 0.0439 Epoch: 6 Global Step: 279790 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:08:31,220-Speed 2625.79 samples/sec Loss 9.0265 LearningRate 0.0439 Epoch: 6 Global Step: 279800 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:08:35,115-Speed 2629.76 samples/sec Loss 9.0534 LearningRate 0.0439 Epoch: 6 Global Step: 279810 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:08:39,011-Speed 2629.73 samples/sec Loss 9.0303 LearningRate 0.0439 Epoch: 6 Global Step: 279820 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:08:42,908-Speed 2627.86 samples/sec Loss 8.9436 LearningRate 0.0439 Epoch: 6 Global Step: 279830 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:08:46,809-Speed 2626.03 samples/sec Loss 8.8809 LearningRate 0.0439 Epoch: 6 Global Step: 279840 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:08:50,687-Speed 2641.20 samples/sec Loss 8.8413 LearningRate 0.0439 Epoch: 6 Global Step: 279850 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:08:54,580-Speed 2631.01 samples/sec Loss 8.8164 LearningRate 0.0439 Epoch: 6 Global Step: 279860 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:08:58,481-Speed 2625.71 samples/sec Loss 8.8733 LearningRate 0.0439 Epoch: 6 Global Step: 279870 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:09:02,389-Speed 2620.97 samples/sec Loss 9.0040 LearningRate 0.0439 Epoch: 6 Global Step: 279880 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:09:06,289-Speed 2626.37 samples/sec Loss 9.0797 LearningRate 0.0439 Epoch: 6 Global Step: 279890 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:09:10,189-Speed 2626.92 samples/sec Loss 9.0121 LearningRate 0.0439 Epoch: 6 Global Step: 279900 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:09:14,087-Speed 2627.65 samples/sec Loss 8.9514 LearningRate 0.0439 Epoch: 6 Global Step: 279910 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:09:18,026-Speed 2599.80 samples/sec Loss 8.9799 LearningRate 0.0439 Epoch: 6 Global Step: 279920 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:09:21,919-Speed 2631.13 samples/sec Loss 9.0018 LearningRate 0.0439 Epoch: 6 Global Step: 279930 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:09:25,813-Speed 2630.89 samples/sec Loss 8.8582 LearningRate 0.0439 Epoch: 6 Global Step: 279940 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:09:29,707-Speed 2630.76 samples/sec Loss 9.0563 LearningRate 0.0439 Epoch: 6 Global Step: 279950 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:09:33,579-Speed 2645.13 samples/sec Loss 8.9739 LearningRate 0.0439 Epoch: 6 Global Step: 279960 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:09:37,480-Speed 2625.60 samples/sec Loss 8.9084 LearningRate 0.0439 Epoch: 6 Global Step: 279970 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:09:41,372-Speed 2631.49 samples/sec Loss 8.9583 LearningRate 0.0439 Epoch: 6 Global Step: 279980 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:09:45,266-Speed 2630.58 samples/sec Loss 8.8825 LearningRate 0.0439 Epoch: 6 Global Step: 279990 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:09:49,161-Speed 2629.66 samples/sec Loss 8.8247 LearningRate 0.0439 Epoch: 6 Global Step: 280000 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:10:32,439-[lfw][280000]XNorm: 23.485694
Training: 2022-04-14 03:10:32,440-[lfw][280000]Accuracy-Flip: 0.99767+-0.00200
Training: 2022-04-14 03:10:32,441-[lfw][280000]Accuracy-Highest: 0.99783
Training: 2022-04-14 03:11:22,285-[cfp_fp][280000]XNorm: 21.563512
Training: 2022-04-14 03:11:22,286-[cfp_fp][280000]Accuracy-Flip: 0.98457+-0.00750
Training: 2022-04-14 03:11:22,288-[cfp_fp][280000]Accuracy-Highest: 0.98643
Training: 2022-04-14 03:12:04,966-[agedb_30][280000]XNorm: 23.049351
Training: 2022-04-14 03:12:04,967-[agedb_30][280000]Accuracy-Flip: 0.97567+-0.00898
Training: 2022-04-14 03:12:04,967-[agedb_30][280000]Accuracy-Highest: 0.97567
Training: 2022-04-14 03:12:08,836-Speed 73.31 samples/sec Loss 8.9456 LearningRate 0.0439 Epoch: 6 Global Step: 280010 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:12:12,703-Speed 2648.44 samples/sec Loss 9.0089 LearningRate 0.0439 Epoch: 6 Global Step: 280020 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:12:16,573-Speed 2646.83 samples/sec Loss 8.8560 LearningRate 0.0439 Epoch: 6 Global Step: 280030 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:12:20,445-Speed 2644.97 samples/sec Loss 8.9283 LearningRate 0.0439 Epoch: 6 Global Step: 280040 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:12:24,322-Speed 2641.94 samples/sec Loss 8.9151 LearningRate 0.0439 Epoch: 6 Global Step: 280050 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:12:28,201-Speed 2640.34 samples/sec Loss 8.9687 LearningRate 0.0439 Epoch: 6 Global Step: 280060 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:12:32,080-Speed 2641.03 samples/sec Loss 8.9975 LearningRate 0.0439 Epoch: 6 Global Step: 280070 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:12:35,940-Speed 2653.27 samples/sec Loss 9.0290 LearningRate 0.0439 Epoch: 6 Global Step: 280080 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:12:39,842-Speed 2625.36 samples/sec Loss 8.8998 LearningRate 0.0439 Epoch: 6 Global Step: 280090 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:12:43,729-Speed 2635.43 samples/sec Loss 8.9782 LearningRate 0.0439 Epoch: 6 Global Step: 280100 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:12:47,618-Speed 2633.87 samples/sec Loss 9.0031 LearningRate 0.0439 Epoch: 6 Global Step: 280110 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:12:51,611-Speed 2565.33 samples/sec Loss 9.0930 LearningRate 0.0439 Epoch: 6 Global Step: 280120 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:12:55,505-Speed 2630.43 samples/sec Loss 9.0085 LearningRate 0.0439 Epoch: 6 Global Step: 280130 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:12:59,441-Speed 2602.16 samples/sec Loss 8.9029 LearningRate 0.0439 Epoch: 6 Global Step: 280140 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:13:03,362-Speed 2612.27 samples/sec Loss 8.8970 LearningRate 0.0439 Epoch: 6 Global Step: 280150 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:13:07,262-Speed 2626.66 samples/sec Loss 8.9113 LearningRate 0.0439 Epoch: 6 Global Step: 280160 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:13:11,156-Speed 2630.41 samples/sec Loss 8.8401 LearningRate 0.0439 Epoch: 6 Global Step: 280170 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:13:15,053-Speed 2628.12 samples/sec Loss 9.0423 LearningRate 0.0439 Epoch: 6 Global Step: 280180 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:13:18,928-Speed 2643.80 samples/sec Loss 9.0882 LearningRate 0.0439 Epoch: 6 Global Step: 280190 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:13:22,819-Speed 2632.33 samples/sec Loss 8.8513 LearningRate 0.0439 Epoch: 6 Global Step: 280200 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:13:26,713-Speed 2630.17 samples/sec Loss 9.0091 LearningRate 0.0439 Epoch: 6 Global Step: 280210 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:13:30,607-Speed 2629.55 samples/sec Loss 8.9569 LearningRate 0.0439 Epoch: 6 Global Step: 280220 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:13:34,507-Speed 2627.27 samples/sec Loss 9.0598 LearningRate 0.0439 Epoch: 6 Global Step: 280230 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:13:38,402-Speed 2629.59 samples/sec Loss 8.8813 LearningRate 0.0438 Epoch: 6 Global Step: 280240 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:13:42,306-Speed 2623.53 samples/sec Loss 9.0159 LearningRate 0.0438 Epoch: 6 Global Step: 280250 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:13:46,185-Speed 2640.31 samples/sec Loss 9.0656 LearningRate 0.0438 Epoch: 6 Global Step: 280260 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:13:50,083-Speed 2627.79 samples/sec Loss 8.9253 LearningRate 0.0438 Epoch: 6 Global Step: 280270 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:13:53,982-Speed 2627.14 samples/sec Loss 8.9244 LearningRate 0.0438 Epoch: 6 Global Step: 280280 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:13:57,879-Speed 2628.27 samples/sec Loss 9.0218 LearningRate 0.0438 Epoch: 6 Global Step: 280290 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:14:01,779-Speed 2625.54 samples/sec Loss 9.0316 LearningRate 0.0438 Epoch: 6 Global Step: 280300 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:14:05,680-Speed 2626.24 samples/sec Loss 8.9545 LearningRate 0.0438 Epoch: 6 Global Step: 280310 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:14:09,577-Speed 2627.78 samples/sec Loss 9.0684 LearningRate 0.0438 Epoch: 6 Global Step: 280320 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:14:13,477-Speed 2626.58 samples/sec Loss 9.0044 LearningRate 0.0438 Epoch: 6 Global Step: 280330 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:14:17,381-Speed 2623.20 samples/sec Loss 9.0814 LearningRate 0.0438 Epoch: 6 Global Step: 280340 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:14:21,277-Speed 2629.59 samples/sec Loss 9.0173 LearningRate 0.0438 Epoch: 6 Global Step: 280350 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:14:25,176-Speed 2626.61 samples/sec Loss 8.9822 LearningRate 0.0438 Epoch: 6 Global Step: 280360 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:14:29,072-Speed 2629.41 samples/sec Loss 8.9960 LearningRate 0.0438 Epoch: 6 Global Step: 280370 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:14:32,968-Speed 2628.17 samples/sec Loss 8.9358 LearningRate 0.0438 Epoch: 6 Global Step: 280380 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:14:36,864-Speed 2629.01 samples/sec Loss 8.9109 LearningRate 0.0438 Epoch: 6 Global Step: 280390 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:14:40,762-Speed 2628.17 samples/sec Loss 8.7813 LearningRate 0.0438 Epoch: 6 Global Step: 280400 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:14:44,657-Speed 2629.52 samples/sec Loss 8.9115 LearningRate 0.0438 Epoch: 6 Global Step: 280410 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:14:48,552-Speed 2629.25 samples/sec Loss 8.9847 LearningRate 0.0438 Epoch: 6 Global Step: 280420 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:14:52,457-Speed 2622.76 samples/sec Loss 8.8875 LearningRate 0.0438 Epoch: 6 Global Step: 280430 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:14:56,361-Speed 2625.53 samples/sec Loss 8.9508 LearningRate 0.0438 Epoch: 6 Global Step: 280440 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:15:00,257-Speed 2628.80 samples/sec Loss 8.9338 LearningRate 0.0438 Epoch: 6 Global Step: 280450 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:15:04,154-Speed 2627.97 samples/sec Loss 8.9023 LearningRate 0.0438 Epoch: 6 Global Step: 280460 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:15:08,058-Speed 2623.60 samples/sec Loss 8.9377 LearningRate 0.0438 Epoch: 6 Global Step: 280470 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:15:11,940-Speed 2638.58 samples/sec Loss 8.9073 LearningRate 0.0438 Epoch: 6 Global Step: 280480 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:15:15,835-Speed 2629.80 samples/sec Loss 9.0006 LearningRate 0.0438 Epoch: 6 Global Step: 280490 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:15:19,736-Speed 2625.87 samples/sec Loss 8.8645 LearningRate 0.0438 Epoch: 6 Global Step: 280500 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:15:23,636-Speed 2626.58 samples/sec Loss 8.9572 LearningRate 0.0438 Epoch: 6 Global Step: 280510 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:15:27,569-Speed 2604.39 samples/sec Loss 8.9949 LearningRate 0.0438 Epoch: 6 Global Step: 280520 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:15:31,475-Speed 2622.75 samples/sec Loss 8.9765 LearningRate 0.0438 Epoch: 6 Global Step: 280530 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:15:35,368-Speed 2630.46 samples/sec Loss 8.9749 LearningRate 0.0438 Epoch: 6 Global Step: 280540 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:15:39,266-Speed 2627.49 samples/sec Loss 8.9542 LearningRate 0.0438 Epoch: 6 Global Step: 280550 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:15:43,164-Speed 2627.82 samples/sec Loss 8.9421 LearningRate 0.0438 Epoch: 6 Global Step: 280560 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:15:47,061-Speed 2628.21 samples/sec Loss 9.0020 LearningRate 0.0438 Epoch: 6 Global Step: 280570 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:15:50,940-Speed 2640.74 samples/sec Loss 9.0008 LearningRate 0.0438 Epoch: 6 Global Step: 280580 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:15:54,835-Speed 2629.50 samples/sec Loss 9.0894 LearningRate 0.0438 Epoch: 6 Global Step: 280590 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:15:58,736-Speed 2626.09 samples/sec Loss 9.0437 LearningRate 0.0438 Epoch: 6 Global Step: 280600 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:16:02,635-Speed 2626.32 samples/sec Loss 8.8511 LearningRate 0.0438 Epoch: 6 Global Step: 280610 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:16:06,535-Speed 2626.65 samples/sec Loss 8.9105 LearningRate 0.0438 Epoch: 6 Global Step: 280620 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:16:10,436-Speed 2625.73 samples/sec Loss 8.9800 LearningRate 0.0438 Epoch: 6 Global Step: 280630 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:16:14,336-Speed 2625.99 samples/sec Loss 9.0714 LearningRate 0.0438 Epoch: 6 Global Step: 280640 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:16:18,235-Speed 2626.99 samples/sec Loss 8.9980 LearningRate 0.0438 Epoch: 6 Global Step: 280650 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:16:22,224-Speed 2567.68 samples/sec Loss 8.9944 LearningRate 0.0438 Epoch: 6 Global Step: 280660 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:16:26,125-Speed 2626.06 samples/sec Loss 8.9840 LearningRate 0.0438 Epoch: 6 Global Step: 280670 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:16:30,024-Speed 2627.64 samples/sec Loss 8.8323 LearningRate 0.0438 Epoch: 6 Global Step: 280680 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:16:33,919-Speed 2629.28 samples/sec Loss 8.9555 LearningRate 0.0438 Epoch: 6 Global Step: 280690 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:16:37,800-Speed 2639.00 samples/sec Loss 9.0310 LearningRate 0.0438 Epoch: 6 Global Step: 280700 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:16:41,704-Speed 2623.33 samples/sec Loss 8.9441 LearningRate 0.0438 Epoch: 6 Global Step: 280710 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:16:45,607-Speed 2625.09 samples/sec Loss 8.9997 LearningRate 0.0438 Epoch: 6 Global Step: 280720 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:16:49,508-Speed 2624.88 samples/sec Loss 8.9671 LearningRate 0.0438 Epoch: 6 Global Step: 280730 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:16:53,414-Speed 2622.51 samples/sec Loss 9.0527 LearningRate 0.0438 Epoch: 6 Global Step: 280740 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:16:57,284-Speed 2646.85 samples/sec Loss 9.1476 LearningRate 0.0438 Epoch: 6 Global Step: 280750 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 03:17:01,128-Speed 2664.95 samples/sec Loss 9.2360 LearningRate 0.0438 Epoch: 6 Global Step: 280760 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 03:17:05,017-Speed 2633.46 samples/sec Loss 9.3282 LearningRate 0.0438 Epoch: 6 Global Step: 280770 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:17:08,910-Speed 2630.67 samples/sec Loss 9.0494 LearningRate 0.0438 Epoch: 6 Global Step: 280780 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:17:12,803-Speed 2630.93 samples/sec Loss 9.0928 LearningRate 0.0438 Epoch: 6 Global Step: 280790 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:17:16,702-Speed 2627.29 samples/sec Loss 8.9651 LearningRate 0.0438 Epoch: 6 Global Step: 280800 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:17:20,623-Speed 2612.73 samples/sec Loss 8.9473 LearningRate 0.0438 Epoch: 6 Global Step: 280810 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:17:24,535-Speed 2618.08 samples/sec Loss 8.9585 LearningRate 0.0438 Epoch: 6 Global Step: 280820 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:17:28,444-Speed 2620.33 samples/sec Loss 8.9572 LearningRate 0.0438 Epoch: 6 Global Step: 280830 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:17:32,337-Speed 2630.98 samples/sec Loss 9.3185 LearningRate 0.0438 Epoch: 6 Global Step: 280840 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:17:36,238-Speed 2625.54 samples/sec Loss 8.9271 LearningRate 0.0438 Epoch: 6 Global Step: 280850 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:17:40,133-Speed 2629.61 samples/sec Loss 9.0746 LearningRate 0.0438 Epoch: 6 Global Step: 280860 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:17:44,038-Speed 2623.02 samples/sec Loss 8.9255 LearningRate 0.0437 Epoch: 6 Global Step: 280870 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 03:17:47,937-Speed 2626.90 samples/sec Loss 9.0843 LearningRate 0.0437 Epoch: 6 Global Step: 280880 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 03:17:51,832-Speed 2629.35 samples/sec Loss 8.9828 LearningRate 0.0437 Epoch: 6 Global Step: 280890 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 03:17:55,724-Speed 2631.83 samples/sec Loss 8.9213 LearningRate 0.0437 Epoch: 6 Global Step: 280900 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 03:17:59,617-Speed 2631.61 samples/sec Loss 8.9702 LearningRate 0.0437 Epoch: 6 Global Step: 280910 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 03:18:03,479-Speed 2652.23 samples/sec Loss 9.4241 LearningRate 0.0437 Epoch: 6 Global Step: 280920 Fp16 Grad Scale: 1024 Required: 62 hours
Training: 2022-04-14 03:18:07,369-Speed 2632.26 samples/sec Loss 9.2552 LearningRate 0.0437 Epoch: 6 Global Step: 280930 Fp16 Grad Scale: 1024 Required: 62 hours
Training: 2022-04-14 03:18:11,276-Speed 2621.68 samples/sec Loss 8.9660 LearningRate 0.0437 Epoch: 6 Global Step: 280940 Fp16 Grad Scale: 1024 Required: 62 hours
Training: 2022-04-14 03:18:15,168-Speed 2631.65 samples/sec Loss 9.0553 LearningRate 0.0437 Epoch: 6 Global Step: 280950 Fp16 Grad Scale: 1024 Required: 62 hours
Training: 2022-04-14 03:18:19,065-Speed 2628.21 samples/sec Loss 8.9928 LearningRate 0.0437 Epoch: 6 Global Step: 280960 Fp16 Grad Scale: 1024 Required: 62 hours
Training: 2022-04-14 03:18:22,969-Speed 2623.53 samples/sec Loss 9.0390 LearningRate 0.0437 Epoch: 6 Global Step: 280970 Fp16 Grad Scale: 1024 Required: 62 hours
Training: 2022-04-14 03:18:26,860-Speed 2632.89 samples/sec Loss 9.0432 LearningRate 0.0437 Epoch: 6 Global Step: 280980 Fp16 Grad Scale: 1024 Required: 62 hours
Training: 2022-04-14 03:18:30,754-Speed 2630.43 samples/sec Loss 9.0705 LearningRate 0.0437 Epoch: 6 Global Step: 280990 Fp16 Grad Scale: 1024 Required: 62 hours
Training: 2022-04-14 03:18:34,648-Speed 2630.13 samples/sec Loss 9.0612 LearningRate 0.0437 Epoch: 6 Global Step: 281000 Fp16 Grad Scale: 1024 Required: 62 hours
Training: 2022-04-14 03:18:38,540-Speed 2631.87 samples/sec Loss 9.0888 LearningRate 0.0437 Epoch: 6 Global Step: 281010 Fp16 Grad Scale: 1024 Required: 62 hours
Training: 2022-04-14 03:18:42,434-Speed 2629.70 samples/sec Loss 9.1420 LearningRate 0.0437 Epoch: 6 Global Step: 281020 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:18:46,328-Speed 2630.39 samples/sec Loss 9.0120 LearningRate 0.0437 Epoch: 6 Global Step: 281030 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:18:50,220-Speed 2631.38 samples/sec Loss 9.0516 LearningRate 0.0437 Epoch: 6 Global Step: 281040 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:18:54,125-Speed 2623.04 samples/sec Loss 8.8993 LearningRate 0.0437 Epoch: 6 Global Step: 281050 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:18:58,025-Speed 2626.52 samples/sec Loss 8.9613 LearningRate 0.0437 Epoch: 6 Global Step: 281060 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:19:01,928-Speed 2624.15 samples/sec Loss 9.0560 LearningRate 0.0437 Epoch: 6 Global Step: 281070 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:19:05,832-Speed 2623.75 samples/sec Loss 9.0815 LearningRate 0.0437 Epoch: 6 Global Step: 281080 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:19:09,726-Speed 2630.18 samples/sec Loss 9.0891 LearningRate 0.0437 Epoch: 6 Global Step: 281090 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:19:13,623-Speed 2628.43 samples/sec Loss 8.9693 LearningRate 0.0437 Epoch: 6 Global Step: 281100 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:19:17,571-Speed 2594.37 samples/sec Loss 8.9380 LearningRate 0.0437 Epoch: 6 Global Step: 281110 Fp16 Grad Scale: 2048 Required: 62 hours
Training: 2022-04-14 03:19:21,547-Speed 2575.76 samples/sec Loss 8.9945 LearningRate 0.0437 Epoch: 6 Global Step: 281120 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 03:19:25,446-Speed 2627.55 samples/sec Loss 9.0591 LearningRate 0.0437 Epoch: 6 Global Step: 281130 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 03:19:29,342-Speed 2629.21 samples/sec Loss 9.0176 LearningRate 0.0437 Epoch: 6 Global Step: 281140 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 03:19:33,236-Speed 2630.21 samples/sec Loss 9.0158 LearningRate 0.0437 Epoch: 6 Global Step: 281150 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 03:19:37,129-Speed 2631.58 samples/sec Loss 8.9700 LearningRate 0.0437 Epoch: 6 Global Step: 281160 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 03:19:41,020-Speed 2631.86 samples/sec Loss 8.9311 LearningRate 0.0437 Epoch: 6 Global Step: 281170 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 03:19:44,913-Speed 2631.09 samples/sec Loss 9.1067 LearningRate 0.0437 Epoch: 6 Global Step: 281180 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 03:19:48,807-Speed 2630.64 samples/sec Loss 9.1425 LearningRate 0.0437 Epoch: 6 Global Step: 281190 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 03:19:52,700-Speed 2630.45 samples/sec Loss 8.9268 LearningRate 0.0437 Epoch: 6 Global Step: 281200 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 03:19:56,595-Speed 2629.92 samples/sec Loss 8.8681 LearningRate 0.0437 Epoch: 6 Global Step: 281210 Fp16 Grad Scale: 4096 Required: 62 hours
Training: 2022-04-14 03:20:00,486-Speed 2632.37 samples/sec Loss 9.0335 LearningRate 0.0437 Epoch: 6 Global Step: 281220 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:20:04,391-Speed 2623.26 samples/sec Loss 9.0426 LearningRate 0.0437 Epoch: 6 Global Step: 281230 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:20:08,281-Speed 2632.91 samples/sec Loss 8.9539 LearningRate 0.0437 Epoch: 6 Global Step: 281240 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:20:12,178-Speed 2628.05 samples/sec Loss 9.0671 LearningRate 0.0437 Epoch: 6 Global Step: 281250 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:20:16,072-Speed 2630.51 samples/sec Loss 8.9817 LearningRate 0.0437 Epoch: 6 Global Step: 281260 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:20:19,964-Speed 2631.78 samples/sec Loss 9.0183 LearningRate 0.0437 Epoch: 6 Global Step: 281270 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:20:23,863-Speed 2626.49 samples/sec Loss 8.9837 LearningRate 0.0437 Epoch: 6 Global Step: 281280 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:20:27,759-Speed 2628.91 samples/sec Loss 9.0317 LearningRate 0.0437 Epoch: 6 Global Step: 281290 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:20:31,659-Speed 2626.89 samples/sec Loss 8.9265 LearningRate 0.0437 Epoch: 6 Global Step: 281300 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:20:35,566-Speed 2621.73 samples/sec Loss 8.9251 LearningRate 0.0437 Epoch: 6 Global Step: 281310 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:20:39,458-Speed 2631.40 samples/sec Loss 8.9856 LearningRate 0.0437 Epoch: 6 Global Step: 281320 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 03:20:43,346-Speed 2634.56 samples/sec Loss 8.9205 LearningRate 0.0437 Epoch: 6 Global Step: 281330 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:20:47,238-Speed 2631.29 samples/sec Loss 8.8283 LearningRate 0.0437 Epoch: 6 Global Step: 281340 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:20:51,137-Speed 2627.31 samples/sec Loss 9.1450 LearningRate 0.0437 Epoch: 6 Global Step: 281350 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:20:55,031-Speed 2630.39 samples/sec Loss 8.8764 LearningRate 0.0437 Epoch: 6 Global Step: 281360 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:20:58,973-Speed 2598.63 samples/sec Loss 8.8675 LearningRate 0.0437 Epoch: 6 Global Step: 281370 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:21:02,872-Speed 2626.98 samples/sec Loss 8.8885 LearningRate 0.0437 Epoch: 6 Global Step: 281380 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:21:06,809-Speed 2601.58 samples/sec Loss 9.0188 LearningRate 0.0437 Epoch: 6 Global Step: 281390 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:21:10,704-Speed 2629.62 samples/sec Loss 8.8985 LearningRate 0.0437 Epoch: 6 Global Step: 281400 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:21:14,605-Speed 2625.25 samples/sec Loss 9.0631 LearningRate 0.0437 Epoch: 6 Global Step: 281410 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:21:18,501-Speed 2628.87 samples/sec Loss 9.0718 LearningRate 0.0437 Epoch: 6 Global Step: 281420 Fp16 Grad Scale: 8192 Required: 62 hours
Training: 2022-04-14 03:21:22,397-Speed 2629.35 samples/sec Loss 9.1348 LearningRate 0.0437 Epoch: 6 Global Step: 281430 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 03:21:26,296-Speed 2627.01 samples/sec Loss 8.8798 LearningRate 0.0437 Epoch: 6 Global Step: 281440 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 03:21:30,301-Speed 2557.84 samples/sec Loss 9.0280 LearningRate 0.0437 Epoch: 6 Global Step: 281450 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 03:21:34,211-Speed 2619.34 samples/sec Loss 8.9755 LearningRate 0.0437 Epoch: 6 Global Step: 281460 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 03:21:38,120-Speed 2619.81 samples/sec Loss 9.0067 LearningRate 0.0437 Epoch: 6 Global Step: 281470 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 03:21:42,021-Speed 2625.59 samples/sec Loss 9.1671 LearningRate 0.0437 Epoch: 6 Global Step: 281480 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 03:21:45,914-Speed 2630.85 samples/sec Loss 8.9879 LearningRate 0.0437 Epoch: 6 Global Step: 281490 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 03:21:49,809-Speed 2630.25 samples/sec Loss 8.8654 LearningRate 0.0436 Epoch: 6 Global Step: 281500 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 03:21:53,739-Speed 2606.42 samples/sec Loss 9.1164 LearningRate 0.0436 Epoch: 6 Global Step: 281510 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 03:21:57,653-Speed 2616.31 samples/sec Loss 9.1049 LearningRate 0.0436 Epoch: 6 Global Step: 281520 Fp16 Grad Scale: 16384 Required: 62 hours
Training: 2022-04-14 03:22:01,555-Speed 2625.70 samples/sec Loss 8.9823 LearningRate 0.0436 Epoch: 6 Global Step: 281530 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 03:22:05,452-Speed 2627.92 samples/sec Loss 8.9096 LearningRate 0.0436 Epoch: 6 Global Step: 281540 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 03:22:09,357-Speed 2623.19 samples/sec Loss 8.9282 LearningRate 0.0436 Epoch: 6 Global Step: 281550 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 03:22:13,282-Speed 2609.01 samples/sec Loss 8.9781 LearningRate 0.0436 Epoch: 6 Global Step: 281560 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 03:22:17,177-Speed 2630.39 samples/sec Loss 8.8895 LearningRate 0.0436 Epoch: 6 Global Step: 281570 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 03:22:21,086-Speed 2620.58 samples/sec Loss 8.9911 LearningRate 0.0436 Epoch: 6 Global Step: 281580 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 03:22:24,982-Speed 2628.47 samples/sec Loss 8.8412 LearningRate 0.0436 Epoch: 6 Global Step: 281590 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 03:22:28,884-Speed 2625.41 samples/sec Loss 8.9361 LearningRate 0.0436 Epoch: 6 Global Step: 281600 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 03:22:32,776-Speed 2631.74 samples/sec Loss 8.8902 LearningRate 0.0436 Epoch: 6 Global Step: 281610 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 03:22:36,668-Speed 2631.21 samples/sec Loss 8.9763 LearningRate 0.0436 Epoch: 6 Global Step: 281620 Fp16 Grad Scale: 32768 Required: 62 hours
Training: 2022-04-14 03:22:40,577-Speed 2620.51 samples/sec Loss 8.9413 LearningRate 0.0436 Epoch: 6 Global Step: 281630 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:22:44,469-Speed 2632.25 samples/sec Loss 8.9328 LearningRate 0.0436 Epoch: 6 Global Step: 281640 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:22:48,364-Speed 2629.53 samples/sec Loss 8.8902 LearningRate 0.0436 Epoch: 6 Global Step: 281650 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:22:52,257-Speed 2630.92 samples/sec Loss 8.8618 LearningRate 0.0436 Epoch: 6 Global Step: 281660 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:22:56,149-Speed 2631.47 samples/sec Loss 8.7509 LearningRate 0.0436 Epoch: 6 Global Step: 281670 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:23:00,046-Speed 2628.78 samples/sec Loss 9.0099 LearningRate 0.0436 Epoch: 6 Global Step: 281680 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:23:03,947-Speed 2625.18 samples/sec Loss 9.0073 LearningRate 0.0436 Epoch: 6 Global Step: 281690 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:23:07,842-Speed 2629.84 samples/sec Loss 8.9276 LearningRate 0.0436 Epoch: 6 Global Step: 281700 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:23:11,749-Speed 2621.68 samples/sec Loss 9.0180 LearningRate 0.0436 Epoch: 6 Global Step: 281710 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:23:15,649-Speed 2626.64 samples/sec Loss 8.8211 LearningRate 0.0436 Epoch: 6 Global Step: 281720 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:23:19,541-Speed 2631.33 samples/sec Loss 9.0041 LearningRate 0.0436 Epoch: 6 Global Step: 281730 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:23:23,437-Speed 2629.12 samples/sec Loss 8.9672 LearningRate 0.0436 Epoch: 6 Global Step: 281740 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:23:27,333-Speed 2628.64 samples/sec Loss 8.9152 LearningRate 0.0436 Epoch: 6 Global Step: 281750 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:23:31,227-Speed 2630.49 samples/sec Loss 9.0015 LearningRate 0.0436 Epoch: 6 Global Step: 281760 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:23:35,123-Speed 2628.91 samples/sec Loss 9.0612 LearningRate 0.0436 Epoch: 6 Global Step: 281770 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:23:39,019-Speed 2628.96 samples/sec Loss 9.0239 LearningRate 0.0436 Epoch: 6 Global Step: 281780 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:23:42,916-Speed 2628.49 samples/sec Loss 8.8624 LearningRate 0.0436 Epoch: 6 Global Step: 281790 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:23:46,811-Speed 2630.22 samples/sec Loss 8.8957 LearningRate 0.0436 Epoch: 6 Global Step: 281800 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:23:50,707-Speed 2628.90 samples/sec Loss 8.9611 LearningRate 0.0436 Epoch: 6 Global Step: 281810 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:23:54,601-Speed 2630.42 samples/sec Loss 8.9604 LearningRate 0.0436 Epoch: 6 Global Step: 281820 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:23:58,497-Speed 2629.18 samples/sec Loss 9.0328 LearningRate 0.0436 Epoch: 6 Global Step: 281830 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:24:02,394-Speed 2627.84 samples/sec Loss 8.9101 LearningRate 0.0436 Epoch: 6 Global Step: 281840 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:24:06,290-Speed 2629.06 samples/sec Loss 9.0393 LearningRate 0.0436 Epoch: 6 Global Step: 281850 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:24:10,184-Speed 2630.34 samples/sec Loss 8.9201 LearningRate 0.0436 Epoch: 6 Global Step: 281860 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:24:14,095-Speed 2618.90 samples/sec Loss 8.9401 LearningRate 0.0436 Epoch: 6 Global Step: 281870 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:24:18,012-Speed 2614.44 samples/sec Loss 8.8353 LearningRate 0.0436 Epoch: 6 Global Step: 281880 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:24:21,998-Speed 2569.98 samples/sec Loss 8.8345 LearningRate 0.0436 Epoch: 6 Global Step: 281890 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:24:25,887-Speed 2634.33 samples/sec Loss 8.9828 LearningRate 0.0436 Epoch: 6 Global Step: 281900 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:24:29,790-Speed 2623.69 samples/sec Loss 8.9775 LearningRate 0.0436 Epoch: 6 Global Step: 281910 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:24:33,690-Speed 2626.51 samples/sec Loss 8.8142 LearningRate 0.0436 Epoch: 6 Global Step: 281920 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:24:37,587-Speed 2628.60 samples/sec Loss 9.0971 LearningRate 0.0436 Epoch: 6 Global Step: 281930 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:24:41,500-Speed 2617.34 samples/sec Loss 8.9270 LearningRate 0.0436 Epoch: 6 Global Step: 281940 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:24:45,400-Speed 2626.15 samples/sec Loss 8.8469 LearningRate 0.0436 Epoch: 6 Global Step: 281950 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:24:49,292-Speed 2631.70 samples/sec Loss 8.9425 LearningRate 0.0436 Epoch: 6 Global Step: 281960 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:24:53,212-Speed 2612.88 samples/sec Loss 8.9375 LearningRate 0.0436 Epoch: 6 Global Step: 281970 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:24:57,106-Speed 2630.25 samples/sec Loss 8.9487 LearningRate 0.0436 Epoch: 6 Global Step: 281980 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:25:01,003-Speed 2628.13 samples/sec Loss 9.0739 LearningRate 0.0436 Epoch: 6 Global Step: 281990 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:25:04,897-Speed 2630.79 samples/sec Loss 9.0035 LearningRate 0.0436 Epoch: 6 Global Step: 282000 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:25:08,789-Speed 2631.39 samples/sec Loss 8.9741 LearningRate 0.0436 Epoch: 6 Global Step: 282010 Fp16 Grad Scale: 262144 Required: 62 hours
Training: 2022-04-14 03:25:12,695-Speed 2622.38 samples/sec Loss 8.8845 LearningRate 0.0436 Epoch: 6 Global Step: 282020 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:25:16,711-Speed 2550.27 samples/sec Loss 8.9392 LearningRate 0.0436 Epoch: 6 Global Step: 282030 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:25:20,637-Speed 2608.97 samples/sec Loss 8.9805 LearningRate 0.0436 Epoch: 6 Global Step: 282040 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:25:24,553-Speed 2615.66 samples/sec Loss 8.9431 LearningRate 0.0436 Epoch: 6 Global Step: 282050 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:25:28,628-Speed 2513.27 samples/sec Loss 9.0294 LearningRate 0.0436 Epoch: 6 Global Step: 282060 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:25:32,628-Speed 2560.87 samples/sec Loss 8.9947 LearningRate 0.0436 Epoch: 6 Global Step: 282070 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:25:36,529-Speed 2625.98 samples/sec Loss 8.9471 LearningRate 0.0436 Epoch: 6 Global Step: 282080 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:25:40,406-Speed 2641.80 samples/sec Loss 8.9559 LearningRate 0.0436 Epoch: 6 Global Step: 282090 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:25:44,308-Speed 2625.17 samples/sec Loss 8.9892 LearningRate 0.0436 Epoch: 6 Global Step: 282100 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:25:48,203-Speed 2629.02 samples/sec Loss 8.8704 LearningRate 0.0436 Epoch: 6 Global Step: 282110 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:25:52,102-Speed 2627.76 samples/sec Loss 8.8955 LearningRate 0.0436 Epoch: 6 Global Step: 282120 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:25:56,001-Speed 2627.42 samples/sec Loss 8.8446 LearningRate 0.0435 Epoch: 6 Global Step: 282130 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:25:59,899-Speed 2627.06 samples/sec Loss 8.9933 LearningRate 0.0435 Epoch: 6 Global Step: 282140 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:26:03,793-Speed 2631.27 samples/sec Loss 9.0237 LearningRate 0.0435 Epoch: 6 Global Step: 282150 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:26:07,832-Speed 2535.42 samples/sec Loss 9.0079 LearningRate 0.0435 Epoch: 6 Global Step: 282160 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:26:11,905-Speed 2514.60 samples/sec Loss 8.9651 LearningRate 0.0435 Epoch: 6 Global Step: 282170 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:26:15,992-Speed 2506.41 samples/sec Loss 9.0572 LearningRate 0.0435 Epoch: 6 Global Step: 282180 Fp16 Grad Scale: 65536 Required: 62 hours
Training: 2022-04-14 03:26:20,069-Speed 2512.64 samples/sec Loss 9.0076 LearningRate 0.0435 Epoch: 6 Global Step: 282190 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:26:24,089-Speed 2547.37 samples/sec Loss 9.0730 LearningRate 0.0435 Epoch: 6 Global Step: 282200 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:26:27,981-Speed 2631.48 samples/sec Loss 8.8980 LearningRate 0.0435 Epoch: 6 Global Step: 282210 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:26:31,881-Speed 2626.77 samples/sec Loss 9.0204 LearningRate 0.0435 Epoch: 6 Global Step: 282220 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:26:35,776-Speed 2629.71 samples/sec Loss 8.9595 LearningRate 0.0435 Epoch: 6 Global Step: 282230 Fp16 Grad Scale: 131072 Required: 62 hours
Training: 2022-04-14 03:26:39,670-Speed 2629.87 samples/sec Loss 8.9385 LearningRate 0.0435 Epoch: 6 Global Step: 282240 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:26:43,576-Speed 2622.34 samples/sec Loss 8.9436 LearningRate 0.0435 Epoch: 6 Global Step: 282250 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:26:47,486-Speed 2619.58 samples/sec Loss 9.0003 LearningRate 0.0435 Epoch: 6 Global Step: 282260 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:26:51,380-Speed 2630.34 samples/sec Loss 8.9251 LearningRate 0.0435 Epoch: 6 Global Step: 282270 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:26:55,274-Speed 2630.87 samples/sec Loss 8.9841 LearningRate 0.0435 Epoch: 6 Global Step: 282280 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:26:59,187-Speed 2617.35 samples/sec Loss 8.9209 LearningRate 0.0435 Epoch: 6 Global Step: 282290 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:27:03,080-Speed 2630.86 samples/sec Loss 8.9781 LearningRate 0.0435 Epoch: 6 Global Step: 282300 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:27:06,977-Speed 2628.25 samples/sec Loss 8.8820 LearningRate 0.0435 Epoch: 6 Global Step: 282310 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:27:10,877-Speed 2626.67 samples/sec Loss 8.9492 LearningRate 0.0435 Epoch: 6 Global Step: 282320 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:27:14,754-Speed 2641.88 samples/sec Loss 9.0781 LearningRate 0.0435 Epoch: 6 Global Step: 282330 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:27:18,646-Speed 2631.35 samples/sec Loss 8.8727 LearningRate 0.0435 Epoch: 6 Global Step: 282340 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:27:22,539-Speed 2631.18 samples/sec Loss 8.9967 LearningRate 0.0435 Epoch: 6 Global Step: 282350 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:27:26,432-Speed 2631.02 samples/sec Loss 8.9524 LearningRate 0.0435 Epoch: 6 Global Step: 282360 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:27:30,324-Speed 2631.78 samples/sec Loss 8.7617 LearningRate 0.0435 Epoch: 6 Global Step: 282370 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:27:34,226-Speed 2624.61 samples/sec Loss 9.0567 LearningRate 0.0435 Epoch: 6 Global Step: 282380 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:27:38,100-Speed 2644.07 samples/sec Loss 8.9479 LearningRate 0.0435 Epoch: 6 Global Step: 282390 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:27:41,994-Speed 2630.19 samples/sec Loss 8.8291 LearningRate 0.0435 Epoch: 6 Global Step: 282400 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:27:45,915-Speed 2612.65 samples/sec Loss 8.9099 LearningRate 0.0435 Epoch: 6 Global Step: 282410 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:27:49,807-Speed 2631.49 samples/sec Loss 8.9919 LearningRate 0.0435 Epoch: 6 Global Step: 282420 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:27:53,699-Speed 2632.09 samples/sec Loss 8.9133 LearningRate 0.0435 Epoch: 6 Global Step: 282430 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:27:57,596-Speed 2628.15 samples/sec Loss 8.9549 LearningRate 0.0435 Epoch: 6 Global Step: 282440 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:28:01,487-Speed 2632.67 samples/sec Loss 8.8331 LearningRate 0.0435 Epoch: 6 Global Step: 282450 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:28:05,393-Speed 2621.89 samples/sec Loss 8.8774 LearningRate 0.0435 Epoch: 6 Global Step: 282460 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:28:09,308-Speed 2616.46 samples/sec Loss 8.9156 LearningRate 0.0435 Epoch: 6 Global Step: 282470 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:28:13,210-Speed 2624.80 samples/sec Loss 8.9302 LearningRate 0.0435 Epoch: 6 Global Step: 282480 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:28:17,110-Speed 2626.37 samples/sec Loss 8.8316 LearningRate 0.0435 Epoch: 6 Global Step: 282490 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:28:21,015-Speed 2623.12 samples/sec Loss 9.0360 LearningRate 0.0435 Epoch: 6 Global Step: 282500 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:28:24,916-Speed 2625.37 samples/sec Loss 8.8750 LearningRate 0.0435 Epoch: 6 Global Step: 282510 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:28:28,822-Speed 2622.50 samples/sec Loss 8.8773 LearningRate 0.0435 Epoch: 6 Global Step: 282520 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:28:32,720-Speed 2628.10 samples/sec Loss 9.0990 LearningRate 0.0435 Epoch: 6 Global Step: 282530 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:28:36,621-Speed 2625.35 samples/sec Loss 8.9860 LearningRate 0.0435 Epoch: 6 Global Step: 282540 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:28:40,525-Speed 2623.28 samples/sec Loss 8.8866 LearningRate 0.0435 Epoch: 6 Global Step: 282550 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:28:44,431-Speed 2622.52 samples/sec Loss 8.7707 LearningRate 0.0435 Epoch: 6 Global Step: 282560 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:28:48,346-Speed 2615.69 samples/sec Loss 8.8888 LearningRate 0.0435 Epoch: 6 Global Step: 282570 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:28:52,240-Speed 2630.92 samples/sec Loss 9.0367 LearningRate 0.0435 Epoch: 6 Global Step: 282580 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:28:56,118-Speed 2641.09 samples/sec Loss 8.8452 LearningRate 0.0435 Epoch: 6 Global Step: 282590 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:29:00,011-Speed 2631.22 samples/sec Loss 8.9118 LearningRate 0.0435 Epoch: 6 Global Step: 282600 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:29:03,920-Speed 2620.07 samples/sec Loss 8.9233 LearningRate 0.0435 Epoch: 6 Global Step: 282610 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:29:07,813-Speed 2631.21 samples/sec Loss 9.0544 LearningRate 0.0435 Epoch: 6 Global Step: 282620 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:29:11,708-Speed 2629.30 samples/sec Loss 8.9487 LearningRate 0.0435 Epoch: 6 Global Step: 282630 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:29:15,604-Speed 2629.56 samples/sec Loss 8.9414 LearningRate 0.0435 Epoch: 6 Global Step: 282640 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:29:19,508-Speed 2623.47 samples/sec Loss 8.9213 LearningRate 0.0435 Epoch: 6 Global Step: 282650 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:29:23,399-Speed 2632.68 samples/sec Loss 9.0602 LearningRate 0.0435 Epoch: 6 Global Step: 282660 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:29:27,311-Speed 2618.00 samples/sec Loss 8.9149 LearningRate 0.0435 Epoch: 6 Global Step: 282670 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:29:31,169-Speed 2654.79 samples/sec Loss 9.7200 LearningRate 0.0435 Epoch: 6 Global Step: 282680 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:29:35,043-Speed 2644.37 samples/sec Loss 10.1879 LearningRate 0.0435 Epoch: 6 Global Step: 282690 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:29:38,938-Speed 2629.37 samples/sec Loss 9.4880 LearningRate 0.0435 Epoch: 6 Global Step: 282700 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:29:42,830-Speed 2631.81 samples/sec Loss 9.1766 LearningRate 0.0435 Epoch: 6 Global Step: 282710 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:29:46,721-Speed 2632.28 samples/sec Loss 9.0414 LearningRate 0.0435 Epoch: 6 Global Step: 282720 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:29:50,618-Speed 2627.97 samples/sec Loss 9.0536 LearningRate 0.0435 Epoch: 6 Global Step: 282730 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:29:54,550-Speed 2605.84 samples/sec Loss 8.9925 LearningRate 0.0435 Epoch: 6 Global Step: 282740 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:29:58,443-Speed 2630.62 samples/sec Loss 9.2231 LearningRate 0.0434 Epoch: 6 Global Step: 282750 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:30:02,338-Speed 2630.21 samples/sec Loss 8.8733 LearningRate 0.0434 Epoch: 6 Global Step: 282760 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:30:06,230-Speed 2631.06 samples/sec Loss 8.9299 LearningRate 0.0434 Epoch: 6 Global Step: 282770 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:30:10,133-Speed 2624.87 samples/sec Loss 8.8776 LearningRate 0.0434 Epoch: 6 Global Step: 282780 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:30:14,024-Speed 2631.73 samples/sec Loss 8.8722 LearningRate 0.0434 Epoch: 6 Global Step: 282790 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:30:17,919-Speed 2630.45 samples/sec Loss 8.9566 LearningRate 0.0434 Epoch: 6 Global Step: 282800 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:30:21,809-Speed 2633.27 samples/sec Loss 8.8503 LearningRate 0.0434 Epoch: 6 Global Step: 282810 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:30:25,716-Speed 2621.42 samples/sec Loss 8.9178 LearningRate 0.0434 Epoch: 6 Global Step: 282820 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:30:29,609-Speed 2630.57 samples/sec Loss 8.9567 LearningRate 0.0434 Epoch: 6 Global Step: 282830 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:30:33,500-Speed 2632.40 samples/sec Loss 8.9086 LearningRate 0.0434 Epoch: 6 Global Step: 282840 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:30:37,389-Speed 2634.13 samples/sec Loss 9.0240 LearningRate 0.0434 Epoch: 6 Global Step: 282850 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:30:41,345-Speed 2589.26 samples/sec Loss 8.8817 LearningRate 0.0434 Epoch: 6 Global Step: 282860 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:30:45,288-Speed 2597.59 samples/sec Loss 8.8911 LearningRate 0.0434 Epoch: 6 Global Step: 282870 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:30:49,227-Speed 2599.89 samples/sec Loss 8.9283 LearningRate 0.0434 Epoch: 6 Global Step: 282880 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:30:53,181-Speed 2590.99 samples/sec Loss 9.0108 LearningRate 0.0434 Epoch: 6 Global Step: 282890 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:30:57,088-Speed 2621.52 samples/sec Loss 8.8391 LearningRate 0.0434 Epoch: 6 Global Step: 282900 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:31:00,988-Speed 2626.04 samples/sec Loss 8.8304 LearningRate 0.0434 Epoch: 6 Global Step: 282910 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:31:04,890-Speed 2625.69 samples/sec Loss 8.9451 LearningRate 0.0434 Epoch: 6 Global Step: 282920 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:31:08,782-Speed 2631.29 samples/sec Loss 8.8829 LearningRate 0.0434 Epoch: 6 Global Step: 282930 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:31:12,670-Speed 2634.27 samples/sec Loss 8.8565 LearningRate 0.0434 Epoch: 6 Global Step: 282940 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:31:16,564-Speed 2630.30 samples/sec Loss 9.0117 LearningRate 0.0434 Epoch: 6 Global Step: 282950 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:31:20,492-Speed 2608.05 samples/sec Loss 8.8951 LearningRate 0.0434 Epoch: 6 Global Step: 282960 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:31:24,397-Speed 2622.59 samples/sec Loss 9.0913 LearningRate 0.0434 Epoch: 6 Global Step: 282970 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:31:28,288-Speed 2633.07 samples/sec Loss 9.0802 LearningRate 0.0434 Epoch: 6 Global Step: 282980 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:31:32,183-Speed 2629.09 samples/sec Loss 8.9298 LearningRate 0.0434 Epoch: 6 Global Step: 282990 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:31:36,074-Speed 2633.10 samples/sec Loss 9.1370 LearningRate 0.0434 Epoch: 6 Global Step: 283000 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:31:39,977-Speed 2624.14 samples/sec Loss 8.8356 LearningRate 0.0434 Epoch: 6 Global Step: 283010 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:31:43,869-Speed 2631.49 samples/sec Loss 8.9387 LearningRate 0.0434 Epoch: 6 Global Step: 283020 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:31:47,760-Speed 2632.11 samples/sec Loss 8.8209 LearningRate 0.0434 Epoch: 6 Global Step: 283030 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:31:51,652-Speed 2631.43 samples/sec Loss 8.9578 LearningRate 0.0434 Epoch: 6 Global Step: 283040 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:31:55,548-Speed 2629.44 samples/sec Loss 8.9158 LearningRate 0.0434 Epoch: 6 Global Step: 283050 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:31:59,437-Speed 2633.83 samples/sec Loss 8.8162 LearningRate 0.0434 Epoch: 6 Global Step: 283060 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:32:03,336-Speed 2626.86 samples/sec Loss 8.8478 LearningRate 0.0434 Epoch: 6 Global Step: 283070 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:32:07,233-Speed 2628.70 samples/sec Loss 8.8345 LearningRate 0.0434 Epoch: 6 Global Step: 283080 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:32:11,109-Speed 2642.12 samples/sec Loss 8.9750 LearningRate 0.0434 Epoch: 6 Global Step: 283090 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:32:15,001-Speed 2631.60 samples/sec Loss 8.9799 LearningRate 0.0434 Epoch: 6 Global Step: 283100 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:32:18,892-Speed 2631.92 samples/sec Loss 8.9213 LearningRate 0.0434 Epoch: 6 Global Step: 283110 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:32:22,791-Speed 2627.70 samples/sec Loss 9.0041 LearningRate 0.0434 Epoch: 6 Global Step: 283120 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:32:26,685-Speed 2630.48 samples/sec Loss 9.0127 LearningRate 0.0434 Epoch: 6 Global Step: 283130 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:32:30,590-Speed 2622.61 samples/sec Loss 8.8208 LearningRate 0.0434 Epoch: 6 Global Step: 283140 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:32:34,612-Speed 2546.88 samples/sec Loss 8.8410 LearningRate 0.0434 Epoch: 6 Global Step: 283150 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:32:38,731-Speed 2486.13 samples/sec Loss 8.9548 LearningRate 0.0434 Epoch: 6 Global Step: 283160 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:32:42,627-Speed 2629.93 samples/sec Loss 9.1274 LearningRate 0.0434 Epoch: 6 Global Step: 283170 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:32:46,628-Speed 2560.08 samples/sec Loss 8.9281 LearningRate 0.0434 Epoch: 6 Global Step: 283180 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:32:50,517-Speed 2632.89 samples/sec Loss 8.8762 LearningRate 0.0434 Epoch: 6 Global Step: 283190 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:32:54,409-Speed 2632.19 samples/sec Loss 8.7945 LearningRate 0.0434 Epoch: 6 Global Step: 283200 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:32:58,303-Speed 2630.24 samples/sec Loss 8.9922 LearningRate 0.0434 Epoch: 6 Global Step: 283210 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:33:02,201-Speed 2627.65 samples/sec Loss 8.8654 LearningRate 0.0434 Epoch: 6 Global Step: 283220 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:33:06,075-Speed 2643.78 samples/sec Loss 8.9529 LearningRate 0.0434 Epoch: 6 Global Step: 283230 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:33:09,966-Speed 2632.63 samples/sec Loss 8.9257 LearningRate 0.0434 Epoch: 6 Global Step: 283240 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:33:13,861-Speed 2628.91 samples/sec Loss 8.9442 LearningRate 0.0434 Epoch: 6 Global Step: 283250 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:33:17,757-Speed 2629.35 samples/sec Loss 8.9082 LearningRate 0.0434 Epoch: 6 Global Step: 283260 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:33:21,655-Speed 2628.17 samples/sec Loss 8.9432 LearningRate 0.0434 Epoch: 6 Global Step: 283270 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:33:25,544-Speed 2633.03 samples/sec Loss 8.9398 LearningRate 0.0434 Epoch: 6 Global Step: 283280 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:33:29,454-Speed 2619.81 samples/sec Loss 8.8568 LearningRate 0.0434 Epoch: 6 Global Step: 283290 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:33:33,353-Speed 2627.40 samples/sec Loss 8.9499 LearningRate 0.0434 Epoch: 6 Global Step: 283300 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:33:37,281-Speed 2607.29 samples/sec Loss 8.9313 LearningRate 0.0434 Epoch: 6 Global Step: 283310 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:33:41,182-Speed 2625.36 samples/sec Loss 8.8944 LearningRate 0.0434 Epoch: 6 Global Step: 283320 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:33:45,088-Speed 2622.28 samples/sec Loss 8.8433 LearningRate 0.0434 Epoch: 6 Global Step: 283330 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:33:48,986-Speed 2627.93 samples/sec Loss 8.9407 LearningRate 0.0434 Epoch: 6 Global Step: 283340 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:33:52,885-Speed 2627.46 samples/sec Loss 8.8699 LearningRate 0.0434 Epoch: 6 Global Step: 283350 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:33:56,788-Speed 2623.63 samples/sec Loss 8.8830 LearningRate 0.0434 Epoch: 6 Global Step: 283360 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:34:00,685-Speed 2628.66 samples/sec Loss 9.0905 LearningRate 0.0434 Epoch: 6 Global Step: 283370 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:34:04,584-Speed 2627.17 samples/sec Loss 8.9905 LearningRate 0.0433 Epoch: 6 Global Step: 283380 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:34:08,487-Speed 2624.26 samples/sec Loss 8.8887 LearningRate 0.0433 Epoch: 6 Global Step: 283390 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:34:12,391-Speed 2623.39 samples/sec Loss 8.9381 LearningRate 0.0433 Epoch: 6 Global Step: 283400 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:34:16,298-Speed 2621.39 samples/sec Loss 8.8762 LearningRate 0.0433 Epoch: 6 Global Step: 283410 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:34:20,205-Speed 2621.73 samples/sec Loss 8.9485 LearningRate 0.0433 Epoch: 6 Global Step: 283420 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:34:24,108-Speed 2624.78 samples/sec Loss 8.8977 LearningRate 0.0433 Epoch: 6 Global Step: 283430 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:34:28,007-Speed 2626.87 samples/sec Loss 8.9242 LearningRate 0.0433 Epoch: 6 Global Step: 283440 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:34:31,892-Speed 2636.37 samples/sec Loss 8.8448 LearningRate 0.0433 Epoch: 6 Global Step: 283450 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:34:35,793-Speed 2625.10 samples/sec Loss 8.8434 LearningRate 0.0433 Epoch: 6 Global Step: 283460 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:34:39,697-Speed 2623.54 samples/sec Loss 9.0238 LearningRate 0.0433 Epoch: 6 Global Step: 283470 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:34:43,622-Speed 2609.53 samples/sec Loss 8.8630 LearningRate 0.0433 Epoch: 6 Global Step: 283480 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:34:47,529-Speed 2621.87 samples/sec Loss 8.8017 LearningRate 0.0433 Epoch: 6 Global Step: 283490 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:34:51,425-Speed 2628.38 samples/sec Loss 8.8091 LearningRate 0.0433 Epoch: 6 Global Step: 283500 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:34:55,326-Speed 2626.33 samples/sec Loss 8.8419 LearningRate 0.0433 Epoch: 6 Global Step: 283510 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:34:59,217-Speed 2631.96 samples/sec Loss 8.8987 LearningRate 0.0433 Epoch: 6 Global Step: 283520 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:35:03,110-Speed 2631.02 samples/sec Loss 8.9971 LearningRate 0.0433 Epoch: 6 Global Step: 283530 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:35:07,028-Speed 2614.00 samples/sec Loss 8.8823 LearningRate 0.0433 Epoch: 6 Global Step: 283540 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:35:10,928-Speed 2626.09 samples/sec Loss 8.8846 LearningRate 0.0433 Epoch: 6 Global Step: 283550 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:35:14,816-Speed 2634.65 samples/sec Loss 8.9660 LearningRate 0.0433 Epoch: 6 Global Step: 283560 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:35:18,728-Speed 2617.70 samples/sec Loss 8.9007 LearningRate 0.0433 Epoch: 6 Global Step: 283570 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:35:22,640-Speed 2619.06 samples/sec Loss 9.0620 LearningRate 0.0433 Epoch: 6 Global Step: 283580 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:35:26,545-Speed 2622.85 samples/sec Loss 9.0099 LearningRate 0.0433 Epoch: 6 Global Step: 283590 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:35:30,440-Speed 2629.58 samples/sec Loss 8.9696 LearningRate 0.0433 Epoch: 6 Global Step: 283600 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:35:34,333-Speed 2631.18 samples/sec Loss 8.7813 LearningRate 0.0433 Epoch: 6 Global Step: 283610 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:35:38,231-Speed 2627.49 samples/sec Loss 8.7517 LearningRate 0.0433 Epoch: 6 Global Step: 283620 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:35:42,129-Speed 2627.55 samples/sec Loss 8.9927 LearningRate 0.0433 Epoch: 6 Global Step: 283630 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:35:46,024-Speed 2629.87 samples/sec Loss 8.8960 LearningRate 0.0433 Epoch: 6 Global Step: 283640 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:35:49,916-Speed 2631.36 samples/sec Loss 8.9649 LearningRate 0.0433 Epoch: 6 Global Step: 283650 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:35:53,810-Speed 2630.77 samples/sec Loss 8.9273 LearningRate 0.0433 Epoch: 6 Global Step: 283660 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:35:57,712-Speed 2624.82 samples/sec Loss 8.8360 LearningRate 0.0433 Epoch: 6 Global Step: 283670 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:36:01,582-Speed 2647.02 samples/sec Loss 8.9061 LearningRate 0.0433 Epoch: 6 Global Step: 283680 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:36:05,479-Speed 2627.80 samples/sec Loss 8.9706 LearningRate 0.0433 Epoch: 6 Global Step: 283690 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:36:09,371-Speed 2631.42 samples/sec Loss 8.8992 LearningRate 0.0433 Epoch: 6 Global Step: 283700 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:36:13,267-Speed 2628.93 samples/sec Loss 8.9134 LearningRate 0.0433 Epoch: 6 Global Step: 283710 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:36:17,164-Speed 2628.55 samples/sec Loss 8.9191 LearningRate 0.0433 Epoch: 6 Global Step: 283720 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:36:21,059-Speed 2629.76 samples/sec Loss 8.8853 LearningRate 0.0433 Epoch: 6 Global Step: 283730 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:36:24,972-Speed 2617.50 samples/sec Loss 8.8802 LearningRate 0.0433 Epoch: 6 Global Step: 283740 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:36:28,874-Speed 2625.43 samples/sec Loss 8.7946 LearningRate 0.0433 Epoch: 6 Global Step: 283750 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:36:32,767-Speed 2630.66 samples/sec Loss 9.0343 LearningRate 0.0433 Epoch: 6 Global Step: 283760 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:36:36,668-Speed 2625.52 samples/sec Loss 8.9650 LearningRate 0.0433 Epoch: 6 Global Step: 283770 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:36:40,564-Speed 2628.99 samples/sec Loss 8.8570 LearningRate 0.0433 Epoch: 6 Global Step: 283780 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:36:44,459-Speed 2629.55 samples/sec Loss 8.9908 LearningRate 0.0433 Epoch: 6 Global Step: 283790 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:36:48,362-Speed 2624.57 samples/sec Loss 8.8424 LearningRate 0.0433 Epoch: 6 Global Step: 283800 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:36:52,254-Speed 2631.35 samples/sec Loss 8.8871 LearningRate 0.0433 Epoch: 6 Global Step: 283810 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:36:56,145-Speed 2632.64 samples/sec Loss 8.8931 LearningRate 0.0433 Epoch: 6 Global Step: 283820 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:37:00,037-Speed 2631.76 samples/sec Loss 8.9809 LearningRate 0.0433 Epoch: 6 Global Step: 283830 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:37:03,845-Speed 2689.20 samples/sec Loss 9.6805 LearningRate 0.0433 Epoch: 6 Global Step: 283840 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:37:07,742-Speed 2628.93 samples/sec Loss 9.4544 LearningRate 0.0433 Epoch: 6 Global Step: 283850 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:37:11,629-Speed 2634.81 samples/sec Loss 10.2517 LearningRate 0.0433 Epoch: 6 Global Step: 283860 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:37:15,522-Speed 2630.90 samples/sec Loss 9.6497 LearningRate 0.0433 Epoch: 6 Global Step: 283870 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:37:19,412-Speed 2632.95 samples/sec Loss 9.3343 LearningRate 0.0433 Epoch: 6 Global Step: 283880 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:37:23,309-Speed 2628.62 samples/sec Loss 9.0732 LearningRate 0.0433 Epoch: 6 Global Step: 283890 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:37:27,212-Speed 2624.00 samples/sec Loss 9.0652 LearningRate 0.0433 Epoch: 6 Global Step: 283900 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:37:31,095-Speed 2637.94 samples/sec Loss 9.0428 LearningRate 0.0433 Epoch: 6 Global Step: 283910 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:37:34,986-Speed 2632.70 samples/sec Loss 8.9241 LearningRate 0.0433 Epoch: 6 Global Step: 283920 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:37:38,884-Speed 2627.28 samples/sec Loss 8.9324 LearningRate 0.0433 Epoch: 6 Global Step: 283930 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:37:42,779-Speed 2630.20 samples/sec Loss 9.0772 LearningRate 0.0433 Epoch: 6 Global Step: 283940 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:37:46,678-Speed 2627.09 samples/sec Loss 8.9428 LearningRate 0.0433 Epoch: 6 Global Step: 283950 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:37:50,578-Speed 2625.82 samples/sec Loss 8.9241 LearningRate 0.0433 Epoch: 6 Global Step: 283960 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:37:54,473-Speed 2630.30 samples/sec Loss 8.9176 LearningRate 0.0433 Epoch: 6 Global Step: 283970 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:37:58,363-Speed 2633.07 samples/sec Loss 8.9799 LearningRate 0.0433 Epoch: 6 Global Step: 283980 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:38:02,263-Speed 2626.39 samples/sec Loss 8.9135 LearningRate 0.0433 Epoch: 6 Global Step: 283990 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:38:06,183-Speed 2613.22 samples/sec Loss 9.0558 LearningRate 0.0433 Epoch: 6 Global Step: 284000 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:38:10,072-Speed 2633.14 samples/sec Loss 9.1041 LearningRate 0.0432 Epoch: 6 Global Step: 284010 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:38:13,963-Speed 2632.78 samples/sec Loss 8.9242 LearningRate 0.0432 Epoch: 6 Global Step: 284020 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:38:17,855-Speed 2631.20 samples/sec Loss 8.9266 LearningRate 0.0432 Epoch: 6 Global Step: 284030 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:38:21,746-Speed 2632.69 samples/sec Loss 8.9780 LearningRate 0.0432 Epoch: 6 Global Step: 284040 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:38:25,636-Speed 2632.56 samples/sec Loss 8.9575 LearningRate 0.0432 Epoch: 6 Global Step: 284050 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:38:29,531-Speed 2629.60 samples/sec Loss 9.0099 LearningRate 0.0432 Epoch: 6 Global Step: 284060 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:38:33,425-Speed 2630.74 samples/sec Loss 8.7765 LearningRate 0.0432 Epoch: 6 Global Step: 284070 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:38:37,318-Speed 2631.47 samples/sec Loss 8.8995 LearningRate 0.0432 Epoch: 6 Global Step: 284080 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:38:41,220-Speed 2624.87 samples/sec Loss 8.9357 LearningRate 0.0432 Epoch: 6 Global Step: 284090 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:38:45,113-Speed 2630.78 samples/sec Loss 8.7364 LearningRate 0.0432 Epoch: 6 Global Step: 284100 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:38:49,006-Speed 2630.75 samples/sec Loss 8.8756 LearningRate 0.0432 Epoch: 6 Global Step: 284110 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:38:52,904-Speed 2627.93 samples/sec Loss 8.9952 LearningRate 0.0432 Epoch: 6 Global Step: 284120 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:38:56,806-Speed 2624.49 samples/sec Loss 8.9586 LearningRate 0.0432 Epoch: 6 Global Step: 284130 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:39:00,708-Speed 2624.99 samples/sec Loss 8.7528 LearningRate 0.0432 Epoch: 6 Global Step: 284140 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:39:04,609-Speed 2625.50 samples/sec Loss 8.8272 LearningRate 0.0432 Epoch: 6 Global Step: 284150 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:39:08,516-Speed 2622.37 samples/sec Loss 8.8586 LearningRate 0.0432 Epoch: 6 Global Step: 284160 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:39:12,415-Speed 2626.55 samples/sec Loss 8.8151 LearningRate 0.0432 Epoch: 6 Global Step: 284170 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:39:16,309-Speed 2630.02 samples/sec Loss 8.8711 LearningRate 0.0432 Epoch: 6 Global Step: 284180 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:39:20,206-Speed 2637.15 samples/sec Loss 8.9263 LearningRate 0.0432 Epoch: 6 Global Step: 284190 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:39:24,097-Speed 2632.58 samples/sec Loss 8.9584 LearningRate 0.0432 Epoch: 6 Global Step: 284200 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:39:27,987-Speed 2632.74 samples/sec Loss 8.9674 LearningRate 0.0432 Epoch: 6 Global Step: 284210 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:39:31,880-Speed 2631.11 samples/sec Loss 8.8194 LearningRate 0.0432 Epoch: 6 Global Step: 284220 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:39:35,774-Speed 2630.11 samples/sec Loss 8.8574 LearningRate 0.0432 Epoch: 6 Global Step: 284230 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:39:39,668-Speed 2630.41 samples/sec Loss 8.8515 LearningRate 0.0432 Epoch: 6 Global Step: 284240 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:39:43,567-Speed 2626.94 samples/sec Loss 8.9459 LearningRate 0.0432 Epoch: 6 Global Step: 284250 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:39:47,449-Speed 2639.18 samples/sec Loss 9.0747 LearningRate 0.0432 Epoch: 6 Global Step: 284260 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:39:51,346-Speed 2628.14 samples/sec Loss 8.8551 LearningRate 0.0432 Epoch: 6 Global Step: 284270 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:39:55,241-Speed 2629.32 samples/sec Loss 8.8200 LearningRate 0.0432 Epoch: 6 Global Step: 284280 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:39:59,136-Speed 2629.37 samples/sec Loss 8.8766 LearningRate 0.0432 Epoch: 6 Global Step: 284290 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:40:03,040-Speed 2623.45 samples/sec Loss 8.8757 LearningRate 0.0432 Epoch: 6 Global Step: 284300 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:40:06,940-Speed 2625.97 samples/sec Loss 8.7575 LearningRate 0.0432 Epoch: 6 Global Step: 284310 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:40:10,900-Speed 2586.76 samples/sec Loss 8.8676 LearningRate 0.0432 Epoch: 6 Global Step: 284320 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:40:14,827-Speed 2608.55 samples/sec Loss 8.8954 LearningRate 0.0432 Epoch: 6 Global Step: 284330 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:40:18,726-Speed 2627.37 samples/sec Loss 8.8152 LearningRate 0.0432 Epoch: 6 Global Step: 284340 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:40:22,615-Speed 2633.52 samples/sec Loss 8.8055 LearningRate 0.0432 Epoch: 6 Global Step: 284350 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:40:26,516-Speed 2625.52 samples/sec Loss 8.9406 LearningRate 0.0432 Epoch: 6 Global Step: 284360 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:40:30,423-Speed 2621.35 samples/sec Loss 8.9736 LearningRate 0.0432 Epoch: 6 Global Step: 284370 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:40:34,316-Speed 2630.97 samples/sec Loss 8.9150 LearningRate 0.0432 Epoch: 6 Global Step: 284380 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:40:38,208-Speed 2631.37 samples/sec Loss 8.9411 LearningRate 0.0432 Epoch: 6 Global Step: 284390 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:40:42,100-Speed 2632.09 samples/sec Loss 8.9420 LearningRate 0.0432 Epoch: 6 Global Step: 284400 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:40:45,991-Speed 2632.13 samples/sec Loss 8.9817 LearningRate 0.0432 Epoch: 6 Global Step: 284410 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:40:49,945-Speed 2590.58 samples/sec Loss 8.7446 LearningRate 0.0432 Epoch: 6 Global Step: 284420 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:40:53,846-Speed 2625.67 samples/sec Loss 8.9735 LearningRate 0.0432 Epoch: 6 Global Step: 284430 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:40:57,743-Speed 2628.08 samples/sec Loss 9.0133 LearningRate 0.0432 Epoch: 6 Global Step: 284440 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:41:01,669-Speed 2608.66 samples/sec Loss 9.2844 LearningRate 0.0432 Epoch: 6 Global Step: 284450 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:41:05,585-Speed 2615.60 samples/sec Loss 9.1676 LearningRate 0.0432 Epoch: 6 Global Step: 284460 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:41:09,479-Speed 2630.32 samples/sec Loss 8.9153 LearningRate 0.0432 Epoch: 6 Global Step: 284470 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:41:13,384-Speed 2622.93 samples/sec Loss 8.9011 LearningRate 0.0432 Epoch: 6 Global Step: 284480 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:41:17,280-Speed 2629.75 samples/sec Loss 8.8281 LearningRate 0.0432 Epoch: 6 Global Step: 284490 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:41:21,174-Speed 2630.16 samples/sec Loss 8.9251 LearningRate 0.0432 Epoch: 6 Global Step: 284500 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:41:25,084-Speed 2619.34 samples/sec Loss 8.8534 LearningRate 0.0432 Epoch: 6 Global Step: 284510 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:41:28,945-Speed 2652.91 samples/sec Loss 8.9321 LearningRate 0.0432 Epoch: 6 Global Step: 284520 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:41:32,839-Speed 2630.26 samples/sec Loss 8.8975 LearningRate 0.0432 Epoch: 6 Global Step: 284530 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:41:36,731-Speed 2631.68 samples/sec Loss 8.9555 LearningRate 0.0432 Epoch: 6 Global Step: 284540 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:41:40,622-Speed 2632.64 samples/sec Loss 9.0184 LearningRate 0.0432 Epoch: 6 Global Step: 284550 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:41:44,528-Speed 2622.33 samples/sec Loss 8.9165 LearningRate 0.0432 Epoch: 6 Global Step: 284560 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:41:48,419-Speed 2632.37 samples/sec Loss 8.9417 LearningRate 0.0432 Epoch: 6 Global Step: 284570 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:41:52,317-Speed 2628.09 samples/sec Loss 8.8982 LearningRate 0.0432 Epoch: 6 Global Step: 284580 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:41:56,239-Speed 2611.44 samples/sec Loss 8.9229 LearningRate 0.0432 Epoch: 6 Global Step: 284590 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:42:00,139-Speed 2626.59 samples/sec Loss 8.8080 LearningRate 0.0432 Epoch: 6 Global Step: 284600 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:42:04,030-Speed 2631.82 samples/sec Loss 8.9316 LearningRate 0.0432 Epoch: 6 Global Step: 284610 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:42:07,923-Speed 2631.13 samples/sec Loss 8.8812 LearningRate 0.0432 Epoch: 6 Global Step: 284620 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:42:11,819-Speed 2629.17 samples/sec Loss 8.9072 LearningRate 0.0432 Epoch: 6 Global Step: 284630 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:42:15,715-Speed 2629.35 samples/sec Loss 8.8138 LearningRate 0.0432 Epoch: 6 Global Step: 284640 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:42:19,613-Speed 2627.25 samples/sec Loss 8.7925 LearningRate 0.0431 Epoch: 6 Global Step: 284650 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:42:23,512-Speed 2627.08 samples/sec Loss 9.2147 LearningRate 0.0431 Epoch: 6 Global Step: 284660 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:42:27,420-Speed 2620.99 samples/sec Loss 9.4688 LearningRate 0.0431 Epoch: 6 Global Step: 284670 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:42:31,319-Speed 2627.32 samples/sec Loss 9.1214 LearningRate 0.0431 Epoch: 6 Global Step: 284680 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:42:35,219-Speed 2626.48 samples/sec Loss 8.8818 LearningRate 0.0431 Epoch: 6 Global Step: 284690 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:42:39,141-Speed 2611.29 samples/sec Loss 8.8672 LearningRate 0.0431 Epoch: 6 Global Step: 284700 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:42:43,043-Speed 2624.92 samples/sec Loss 8.9459 LearningRate 0.0431 Epoch: 6 Global Step: 284710 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:42:46,949-Speed 2622.14 samples/sec Loss 8.9172 LearningRate 0.0431 Epoch: 6 Global Step: 284720 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:42:50,842-Speed 2631.08 samples/sec Loss 8.9771 LearningRate 0.0431 Epoch: 6 Global Step: 284730 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:42:54,734-Speed 2631.40 samples/sec Loss 9.0622 LearningRate 0.0431 Epoch: 6 Global Step: 284740 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:42:58,629-Speed 2630.13 samples/sec Loss 8.8644 LearningRate 0.0431 Epoch: 6 Global Step: 284750 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:43:02,524-Speed 2629.93 samples/sec Loss 8.8639 LearningRate 0.0431 Epoch: 6 Global Step: 284760 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:43:06,416-Speed 2631.77 samples/sec Loss 8.9525 LearningRate 0.0431 Epoch: 6 Global Step: 284770 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:43:10,306-Speed 2632.73 samples/sec Loss 8.9583 LearningRate 0.0431 Epoch: 6 Global Step: 284780 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:43:14,203-Speed 2627.99 samples/sec Loss 8.9803 LearningRate 0.0431 Epoch: 6 Global Step: 284790 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:43:18,098-Speed 2629.90 samples/sec Loss 8.8963 LearningRate 0.0431 Epoch: 6 Global Step: 284800 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:43:21,990-Speed 2631.75 samples/sec Loss 8.8503 LearningRate 0.0431 Epoch: 6 Global Step: 284810 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:43:25,893-Speed 2623.78 samples/sec Loss 8.9528 LearningRate 0.0431 Epoch: 6 Global Step: 284820 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:43:29,786-Speed 2631.42 samples/sec Loss 8.9669 LearningRate 0.0431 Epoch: 6 Global Step: 284830 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:43:33,693-Speed 2621.48 samples/sec Loss 8.8878 LearningRate 0.0431 Epoch: 6 Global Step: 284840 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:43:37,585-Speed 2632.13 samples/sec Loss 8.8753 LearningRate 0.0431 Epoch: 6 Global Step: 284850 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:43:41,472-Speed 2634.80 samples/sec Loss 8.9990 LearningRate 0.0431 Epoch: 6 Global Step: 284860 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:43:45,364-Speed 2631.68 samples/sec Loss 8.9565 LearningRate 0.0431 Epoch: 6 Global Step: 284870 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:43:49,255-Speed 2631.90 samples/sec Loss 8.8862 LearningRate 0.0431 Epoch: 6 Global Step: 284880 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:43:53,148-Speed 2631.34 samples/sec Loss 8.8896 LearningRate 0.0431 Epoch: 6 Global Step: 284890 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:43:57,040-Speed 2631.59 samples/sec Loss 8.7729 LearningRate 0.0431 Epoch: 6 Global Step: 284900 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:44:00,933-Speed 2630.84 samples/sec Loss 8.9622 LearningRate 0.0431 Epoch: 6 Global Step: 284910 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:44:04,838-Speed 2622.85 samples/sec Loss 8.8108 LearningRate 0.0431 Epoch: 6 Global Step: 284920 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:44:08,699-Speed 2653.56 samples/sec Loss 9.1165 LearningRate 0.0431 Epoch: 6 Global Step: 284930 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:44:12,602-Speed 2623.47 samples/sec Loss 9.7217 LearningRate 0.0431 Epoch: 6 Global Step: 284940 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:44:16,504-Speed 2625.03 samples/sec Loss 9.3048 LearningRate 0.0431 Epoch: 6 Global Step: 284950 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:44:20,408-Speed 2623.74 samples/sec Loss 9.1264 LearningRate 0.0431 Epoch: 6 Global Step: 284960 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:44:24,301-Speed 2631.07 samples/sec Loss 8.9491 LearningRate 0.0431 Epoch: 6 Global Step: 284970 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:44:28,196-Speed 2629.96 samples/sec Loss 9.1535 LearningRate 0.0431 Epoch: 6 Global Step: 284980 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:44:32,088-Speed 2631.65 samples/sec Loss 8.9666 LearningRate 0.0431 Epoch: 6 Global Step: 284990 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:44:35,977-Speed 2633.66 samples/sec Loss 8.9818 LearningRate 0.0431 Epoch: 6 Global Step: 285000 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:44:39,869-Speed 2631.74 samples/sec Loss 8.9760 LearningRate 0.0431 Epoch: 6 Global Step: 285010 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:44:43,772-Speed 2624.58 samples/sec Loss 8.8992 LearningRate 0.0431 Epoch: 6 Global Step: 285020 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:44:47,666-Speed 2630.07 samples/sec Loss 8.8725 LearningRate 0.0431 Epoch: 6 Global Step: 285030 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:44:51,557-Speed 2632.49 samples/sec Loss 8.9109 LearningRate 0.0431 Epoch: 6 Global Step: 285040 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:44:55,448-Speed 2632.14 samples/sec Loss 8.9703 LearningRate 0.0431 Epoch: 6 Global Step: 285050 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:44:59,342-Speed 2630.12 samples/sec Loss 9.0213 LearningRate 0.0431 Epoch: 6 Global Step: 285060 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:45:03,295-Speed 2590.66 samples/sec Loss 8.9783 LearningRate 0.0431 Epoch: 6 Global Step: 285070 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:45:07,295-Speed 2560.67 samples/sec Loss 8.9851 LearningRate 0.0431 Epoch: 6 Global Step: 285080 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:45:11,191-Speed 2629.28 samples/sec Loss 8.9855 LearningRate 0.0431 Epoch: 6 Global Step: 285090 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:45:15,093-Speed 2624.46 samples/sec Loss 8.9348 LearningRate 0.0431 Epoch: 6 Global Step: 285100 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:45:18,987-Speed 2630.76 samples/sec Loss 8.8648 LearningRate 0.0431 Epoch: 6 Global Step: 285110 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:45:22,893-Speed 2622.41 samples/sec Loss 8.8777 LearningRate 0.0431 Epoch: 6 Global Step: 285120 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:45:26,785-Speed 2631.45 samples/sec Loss 8.8209 LearningRate 0.0431 Epoch: 6 Global Step: 285130 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:45:30,679-Speed 2630.48 samples/sec Loss 8.8930 LearningRate 0.0431 Epoch: 6 Global Step: 285140 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:45:34,583-Speed 2623.39 samples/sec Loss 8.8300 LearningRate 0.0431 Epoch: 6 Global Step: 285150 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:45:38,479-Speed 2628.92 samples/sec Loss 8.9131 LearningRate 0.0431 Epoch: 6 Global Step: 285160 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:45:42,372-Speed 2631.26 samples/sec Loss 8.8374 LearningRate 0.0431 Epoch: 6 Global Step: 285170 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:45:46,267-Speed 2628.99 samples/sec Loss 8.9038 LearningRate 0.0431 Epoch: 6 Global Step: 285180 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:45:50,160-Speed 2631.35 samples/sec Loss 8.8810 LearningRate 0.0431 Epoch: 6 Global Step: 285190 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:45:54,053-Speed 2631.41 samples/sec Loss 8.9144 LearningRate 0.0431 Epoch: 6 Global Step: 285200 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:45:57,945-Speed 2631.41 samples/sec Loss 8.9599 LearningRate 0.0431 Epoch: 6 Global Step: 285210 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:46:01,841-Speed 2629.25 samples/sec Loss 8.9223 LearningRate 0.0431 Epoch: 6 Global Step: 285220 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:46:05,716-Speed 2643.12 samples/sec Loss 8.9774 LearningRate 0.0431 Epoch: 6 Global Step: 285230 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:46:09,607-Speed 2632.08 samples/sec Loss 8.8502 LearningRate 0.0431 Epoch: 6 Global Step: 285240 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:46:13,480-Speed 2644.55 samples/sec Loss 8.8896 LearningRate 0.0431 Epoch: 6 Global Step: 285250 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:46:17,395-Speed 2615.83 samples/sec Loss 8.7804 LearningRate 0.0431 Epoch: 6 Global Step: 285260 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:46:21,286-Speed 2632.68 samples/sec Loss 8.7975 LearningRate 0.0431 Epoch: 6 Global Step: 285270 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:46:25,177-Speed 2632.47 samples/sec Loss 9.0551 LearningRate 0.0430 Epoch: 6 Global Step: 285280 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:46:29,073-Speed 2629.20 samples/sec Loss 8.9144 LearningRate 0.0430 Epoch: 6 Global Step: 285290 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:46:32,969-Speed 2628.72 samples/sec Loss 8.8985 LearningRate 0.0430 Epoch: 6 Global Step: 285300 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:46:36,877-Speed 2621.02 samples/sec Loss 8.8625 LearningRate 0.0430 Epoch: 6 Global Step: 285310 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:46:40,779-Speed 2624.97 samples/sec Loss 8.8138 LearningRate 0.0430 Epoch: 6 Global Step: 285320 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:46:44,707-Speed 2607.37 samples/sec Loss 8.8824 LearningRate 0.0430 Epoch: 6 Global Step: 285330 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:46:48,614-Speed 2621.86 samples/sec Loss 8.7687 LearningRate 0.0430 Epoch: 6 Global Step: 285340 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:46:52,510-Speed 2629.03 samples/sec Loss 8.7439 LearningRate 0.0430 Epoch: 6 Global Step: 285350 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:46:56,412-Speed 2625.70 samples/sec Loss 8.9328 LearningRate 0.0430 Epoch: 6 Global Step: 285360 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:47:00,304-Speed 2631.14 samples/sec Loss 8.8748 LearningRate 0.0430 Epoch: 6 Global Step: 285370 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:47:04,222-Speed 2614.48 samples/sec Loss 8.9924 LearningRate 0.0430 Epoch: 6 Global Step: 285380 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:47:08,116-Speed 2630.34 samples/sec Loss 8.8701 LearningRate 0.0430 Epoch: 6 Global Step: 285390 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:47:12,022-Speed 2622.08 samples/sec Loss 8.8680 LearningRate 0.0430 Epoch: 6 Global Step: 285400 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:47:15,919-Speed 2628.57 samples/sec Loss 8.9465 LearningRate 0.0430 Epoch: 6 Global Step: 285410 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:47:19,815-Speed 2628.99 samples/sec Loss 8.9274 LearningRate 0.0430 Epoch: 6 Global Step: 285420 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:47:23,798-Speed 2571.17 samples/sec Loss 9.0353 LearningRate 0.0430 Epoch: 6 Global Step: 285430 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:47:27,693-Speed 2629.78 samples/sec Loss 8.8179 LearningRate 0.0430 Epoch: 6 Global Step: 285440 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:47:31,587-Speed 2630.39 samples/sec Loss 8.9270 LearningRate 0.0430 Epoch: 6 Global Step: 285450 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:47:35,467-Speed 2639.97 samples/sec Loss 8.9519 LearningRate 0.0430 Epoch: 6 Global Step: 285460 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:47:39,361-Speed 2629.62 samples/sec Loss 9.0484 LearningRate 0.0430 Epoch: 6 Global Step: 285470 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:47:43,264-Speed 2626.20 samples/sec Loss 8.8603 LearningRate 0.0430 Epoch: 6 Global Step: 285480 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:47:47,161-Speed 2628.37 samples/sec Loss 8.8813 LearningRate 0.0430 Epoch: 6 Global Step: 285490 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:47:51,058-Speed 2628.39 samples/sec Loss 9.0342 LearningRate 0.0430 Epoch: 6 Global Step: 285500 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:47:54,953-Speed 2629.25 samples/sec Loss 8.8599 LearningRate 0.0430 Epoch: 6 Global Step: 285510 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:47:58,851-Speed 2628.48 samples/sec Loss 8.9248 LearningRate 0.0430 Epoch: 6 Global Step: 285520 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:48:02,753-Speed 2624.97 samples/sec Loss 8.8273 LearningRate 0.0430 Epoch: 6 Global Step: 285530 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:48:06,647-Speed 2629.95 samples/sec Loss 8.8884 LearningRate 0.0430 Epoch: 6 Global Step: 285540 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:48:10,541-Speed 2629.75 samples/sec Loss 8.8839 LearningRate 0.0430 Epoch: 6 Global Step: 285550 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:48:14,435-Speed 2631.04 samples/sec Loss 8.8652 LearningRate 0.0430 Epoch: 6 Global Step: 285560 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:48:18,325-Speed 2632.99 samples/sec Loss 8.8423 LearningRate 0.0430 Epoch: 6 Global Step: 285570 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:48:22,221-Speed 2629.09 samples/sec Loss 8.9437 LearningRate 0.0430 Epoch: 6 Global Step: 285580 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:48:26,116-Speed 2629.03 samples/sec Loss 8.9040 LearningRate 0.0430 Epoch: 6 Global Step: 285590 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:48:30,008-Speed 2631.56 samples/sec Loss 8.9193 LearningRate 0.0430 Epoch: 6 Global Step: 285600 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:48:33,911-Speed 2624.88 samples/sec Loss 8.8932 LearningRate 0.0430 Epoch: 6 Global Step: 285610 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:48:37,805-Speed 2629.88 samples/sec Loss 8.9576 LearningRate 0.0430 Epoch: 6 Global Step: 285620 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:48:41,651-Speed 2663.40 samples/sec Loss 9.5452 LearningRate 0.0430 Epoch: 6 Global Step: 285630 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:48:45,533-Speed 2638.85 samples/sec Loss 9.2923 LearningRate 0.0430 Epoch: 6 Global Step: 285640 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:48:49,423-Speed 2633.06 samples/sec Loss 8.8949 LearningRate 0.0430 Epoch: 6 Global Step: 285650 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:48:53,313-Speed 2632.87 samples/sec Loss 8.9823 LearningRate 0.0430 Epoch: 6 Global Step: 285660 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:48:57,240-Speed 2607.90 samples/sec Loss 8.9827 LearningRate 0.0430 Epoch: 6 Global Step: 285670 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:49:01,182-Speed 2598.32 samples/sec Loss 8.9465 LearningRate 0.0430 Epoch: 6 Global Step: 285680 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:49:05,075-Speed 2631.26 samples/sec Loss 8.8702 LearningRate 0.0430 Epoch: 6 Global Step: 285690 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:49:08,967-Speed 2631.74 samples/sec Loss 8.8931 LearningRate 0.0430 Epoch: 6 Global Step: 285700 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:49:12,858-Speed 2632.76 samples/sec Loss 8.9291 LearningRate 0.0430 Epoch: 6 Global Step: 285710 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:49:16,752-Speed 2630.17 samples/sec Loss 8.9855 LearningRate 0.0430 Epoch: 6 Global Step: 285720 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:49:20,645-Speed 2630.33 samples/sec Loss 8.9469 LearningRate 0.0430 Epoch: 6 Global Step: 285730 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:49:24,546-Speed 2625.73 samples/sec Loss 8.8598 LearningRate 0.0430 Epoch: 6 Global Step: 285740 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:49:28,444-Speed 2627.64 samples/sec Loss 8.9843 LearningRate 0.0430 Epoch: 6 Global Step: 285750 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:49:32,338-Speed 2630.39 samples/sec Loss 8.9529 LearningRate 0.0430 Epoch: 6 Global Step: 285760 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:49:36,228-Speed 2633.01 samples/sec Loss 8.8560 LearningRate 0.0430 Epoch: 6 Global Step: 285770 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:49:40,134-Speed 2622.06 samples/sec Loss 8.9067 LearningRate 0.0430 Epoch: 6 Global Step: 285780 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:49:44,038-Speed 2624.20 samples/sec Loss 8.8978 LearningRate 0.0430 Epoch: 6 Global Step: 285790 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:49:47,932-Speed 2629.87 samples/sec Loss 8.9465 LearningRate 0.0430 Epoch: 6 Global Step: 285800 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:49:51,822-Speed 2633.23 samples/sec Loss 8.8681 LearningRate 0.0430 Epoch: 6 Global Step: 285810 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:49:55,714-Speed 2631.19 samples/sec Loss 9.0106 LearningRate 0.0430 Epoch: 6 Global Step: 285820 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:49:59,628-Speed 2617.41 samples/sec Loss 8.7821 LearningRate 0.0430 Epoch: 6 Global Step: 285830 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:50:03,560-Speed 2604.98 samples/sec Loss 8.9618 LearningRate 0.0430 Epoch: 6 Global Step: 285840 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:50:07,445-Speed 2636.31 samples/sec Loss 9.0039 LearningRate 0.0430 Epoch: 6 Global Step: 285850 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:50:11,329-Speed 2637.54 samples/sec Loss 9.1228 LearningRate 0.0430 Epoch: 6 Global Step: 285860 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:50:15,228-Speed 2626.98 samples/sec Loss 8.8524 LearningRate 0.0430 Epoch: 6 Global Step: 285870 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:50:19,118-Speed 2632.78 samples/sec Loss 8.9462 LearningRate 0.0430 Epoch: 6 Global Step: 285880 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:50:23,009-Speed 2632.46 samples/sec Loss 8.9877 LearningRate 0.0430 Epoch: 6 Global Step: 285890 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:50:26,905-Speed 2628.94 samples/sec Loss 8.9179 LearningRate 0.0430 Epoch: 6 Global Step: 285900 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:50:30,799-Speed 2630.36 samples/sec Loss 8.8983 LearningRate 0.0429 Epoch: 6 Global Step: 285910 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:50:34,690-Speed 2632.62 samples/sec Loss 8.8748 LearningRate 0.0429 Epoch: 6 Global Step: 285920 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:50:38,589-Speed 2627.29 samples/sec Loss 8.7838 LearningRate 0.0429 Epoch: 6 Global Step: 285930 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:50:42,481-Speed 2631.62 samples/sec Loss 8.9174 LearningRate 0.0429 Epoch: 6 Global Step: 285940 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:50:46,370-Speed 2634.06 samples/sec Loss 8.8817 LearningRate 0.0429 Epoch: 6 Global Step: 285950 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:50:50,263-Speed 2630.25 samples/sec Loss 9.0496 LearningRate 0.0429 Epoch: 6 Global Step: 285960 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:50:54,195-Speed 2605.13 samples/sec Loss 8.8577 LearningRate 0.0429 Epoch: 6 Global Step: 285970 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:50:58,088-Speed 2631.12 samples/sec Loss 8.9609 LearningRate 0.0429 Epoch: 6 Global Step: 285980 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:51:02,015-Speed 2608.48 samples/sec Loss 8.8734 LearningRate 0.0429 Epoch: 6 Global Step: 285990 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:51:05,911-Speed 2629.03 samples/sec Loss 8.8471 LearningRate 0.0429 Epoch: 6 Global Step: 286000 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:51:09,834-Speed 2611.10 samples/sec Loss 8.8320 LearningRate 0.0429 Epoch: 6 Global Step: 286010 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:51:13,723-Speed 2633.63 samples/sec Loss 8.9255 LearningRate 0.0429 Epoch: 6 Global Step: 286020 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:51:17,631-Speed 2620.95 samples/sec Loss 8.9276 LearningRate 0.0429 Epoch: 6 Global Step: 286030 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:51:21,520-Speed 2633.90 samples/sec Loss 8.8883 LearningRate 0.0429 Epoch: 6 Global Step: 286040 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:51:25,410-Speed 2633.17 samples/sec Loss 8.8449 LearningRate 0.0429 Epoch: 6 Global Step: 286050 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:51:29,301-Speed 2632.34 samples/sec Loss 8.8542 LearningRate 0.0429 Epoch: 6 Global Step: 286060 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:51:33,197-Speed 2629.04 samples/sec Loss 8.9034 LearningRate 0.0429 Epoch: 6 Global Step: 286070 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:51:37,089-Speed 2631.42 samples/sec Loss 8.8308 LearningRate 0.0429 Epoch: 6 Global Step: 286080 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:51:40,989-Speed 2626.21 samples/sec Loss 8.8789 LearningRate 0.0429 Epoch: 6 Global Step: 286090 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:51:44,885-Speed 2629.35 samples/sec Loss 8.7973 LearningRate 0.0429 Epoch: 6 Global Step: 286100 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:51:48,781-Speed 2629.41 samples/sec Loss 8.7821 LearningRate 0.0429 Epoch: 6 Global Step: 286110 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:51:52,676-Speed 2629.77 samples/sec Loss 8.8225 LearningRate 0.0429 Epoch: 6 Global Step: 286120 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:51:56,598-Speed 2611.11 samples/sec Loss 8.8779 LearningRate 0.0429 Epoch: 6 Global Step: 286130 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:52:00,492-Speed 2630.86 samples/sec Loss 8.8507 LearningRate 0.0429 Epoch: 6 Global Step: 286140 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:52:04,385-Speed 2631.06 samples/sec Loss 8.9463 LearningRate 0.0429 Epoch: 6 Global Step: 286150 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:52:08,279-Speed 2630.20 samples/sec Loss 8.9571 LearningRate 0.0429 Epoch: 6 Global Step: 286160 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:52:12,207-Speed 2607.70 samples/sec Loss 8.9851 LearningRate 0.0429 Epoch: 6 Global Step: 286170 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:52:16,113-Speed 2621.65 samples/sec Loss 8.7836 LearningRate 0.0429 Epoch: 6 Global Step: 286180 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:52:20,031-Speed 2614.92 samples/sec Loss 8.9396 LearningRate 0.0429 Epoch: 6 Global Step: 286190 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:52:23,928-Speed 2628.73 samples/sec Loss 8.7012 LearningRate 0.0429 Epoch: 6 Global Step: 286200 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:52:27,841-Speed 2617.65 samples/sec Loss 8.8809 LearningRate 0.0429 Epoch: 6 Global Step: 286210 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:52:31,735-Speed 2630.14 samples/sec Loss 8.7933 LearningRate 0.0429 Epoch: 6 Global Step: 286220 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:52:35,641-Speed 2622.42 samples/sec Loss 9.0005 LearningRate 0.0429 Epoch: 6 Global Step: 286230 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:52:39,545-Speed 2623.10 samples/sec Loss 8.8451 LearningRate 0.0429 Epoch: 6 Global Step: 286240 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:52:43,460-Speed 2616.58 samples/sec Loss 8.8605 LearningRate 0.0429 Epoch: 6 Global Step: 286250 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:52:47,346-Speed 2635.69 samples/sec Loss 8.8547 LearningRate 0.0429 Epoch: 6 Global Step: 286260 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:52:51,245-Speed 2627.42 samples/sec Loss 8.9040 LearningRate 0.0429 Epoch: 6 Global Step: 286270 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:52:55,137-Speed 2631.39 samples/sec Loss 8.8688 LearningRate 0.0429 Epoch: 6 Global Step: 286280 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:52:59,037-Speed 2626.53 samples/sec Loss 8.9741 LearningRate 0.0429 Epoch: 6 Global Step: 286290 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:53:02,950-Speed 2617.57 samples/sec Loss 8.8313 LearningRate 0.0429 Epoch: 6 Global Step: 286300 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:53:06,842-Speed 2631.42 samples/sec Loss 8.7843 LearningRate 0.0429 Epoch: 6 Global Step: 286310 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:53:10,717-Speed 2642.84 samples/sec Loss 8.9032 LearningRate 0.0429 Epoch: 6 Global Step: 286320 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:53:14,609-Speed 2632.00 samples/sec Loss 8.7930 LearningRate 0.0429 Epoch: 6 Global Step: 286330 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:53:18,505-Speed 2629.08 samples/sec Loss 8.9555 LearningRate 0.0429 Epoch: 6 Global Step: 286340 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:53:22,401-Speed 2629.06 samples/sec Loss 8.9625 LearningRate 0.0429 Epoch: 6 Global Step: 286350 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:53:26,293-Speed 2631.59 samples/sec Loss 8.9354 LearningRate 0.0429 Epoch: 6 Global Step: 286360 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:53:30,197-Speed 2623.56 samples/sec Loss 8.7708 LearningRate 0.0429 Epoch: 6 Global Step: 286370 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:53:34,092-Speed 2629.61 samples/sec Loss 9.0417 LearningRate 0.0429 Epoch: 6 Global Step: 286380 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:53:38,025-Speed 2604.20 samples/sec Loss 8.8520 LearningRate 0.0429 Epoch: 6 Global Step: 286390 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:53:41,931-Speed 2622.81 samples/sec Loss 8.9354 LearningRate 0.0429 Epoch: 6 Global Step: 286400 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:53:45,832-Speed 2625.55 samples/sec Loss 8.9083 LearningRate 0.0429 Epoch: 6 Global Step: 286410 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:53:49,749-Speed 2614.84 samples/sec Loss 8.9260 LearningRate 0.0429 Epoch: 6 Global Step: 286420 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:53:53,667-Speed 2614.16 samples/sec Loss 8.8826 LearningRate 0.0429 Epoch: 6 Global Step: 286430 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:53:57,563-Speed 2628.91 samples/sec Loss 8.7748 LearningRate 0.0429 Epoch: 6 Global Step: 286440 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:54:01,467-Speed 2623.66 samples/sec Loss 8.9304 LearningRate 0.0429 Epoch: 6 Global Step: 286450 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:54:05,360-Speed 2631.22 samples/sec Loss 8.8718 LearningRate 0.0429 Epoch: 6 Global Step: 286460 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:54:09,425-Speed 2519.20 samples/sec Loss 8.7937 LearningRate 0.0429 Epoch: 6 Global Step: 286470 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:54:13,322-Speed 2628.75 samples/sec Loss 8.9734 LearningRate 0.0429 Epoch: 6 Global Step: 286480 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:54:17,226-Speed 2623.30 samples/sec Loss 8.8066 LearningRate 0.0429 Epoch: 6 Global Step: 286490 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:54:21,109-Speed 2638.37 samples/sec Loss 8.7259 LearningRate 0.0429 Epoch: 6 Global Step: 286500 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:54:25,020-Speed 2618.48 samples/sec Loss 8.9206 LearningRate 0.0429 Epoch: 6 Global Step: 286510 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:54:28,921-Speed 2625.77 samples/sec Loss 8.8667 LearningRate 0.0429 Epoch: 6 Global Step: 286520 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:54:32,817-Speed 2629.27 samples/sec Loss 8.9286 LearningRate 0.0429 Epoch: 6 Global Step: 286530 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:54:36,715-Speed 2627.94 samples/sec Loss 8.8362 LearningRate 0.0428 Epoch: 6 Global Step: 286540 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:54:40,610-Speed 2629.64 samples/sec Loss 8.8034 LearningRate 0.0428 Epoch: 6 Global Step: 286550 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:54:44,504-Speed 2630.26 samples/sec Loss 8.8559 LearningRate 0.0428 Epoch: 6 Global Step: 286560 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:54:48,400-Speed 2628.99 samples/sec Loss 8.9308 LearningRate 0.0428 Epoch: 6 Global Step: 286570 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:54:52,299-Speed 2626.62 samples/sec Loss 8.8630 LearningRate 0.0428 Epoch: 6 Global Step: 286580 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:54:56,189-Speed 2633.08 samples/sec Loss 8.8022 LearningRate 0.0428 Epoch: 6 Global Step: 286590 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 03:55:00,080-Speed 2632.41 samples/sec Loss 8.7190 LearningRate 0.0428 Epoch: 6 Global Step: 286600 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:55:03,973-Speed 2631.66 samples/sec Loss 8.9018 LearningRate 0.0428 Epoch: 6 Global Step: 286610 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:55:07,870-Speed 2627.69 samples/sec Loss 8.8504 LearningRate 0.0428 Epoch: 6 Global Step: 286620 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:55:11,764-Speed 2630.80 samples/sec Loss 8.7191 LearningRate 0.0428 Epoch: 6 Global Step: 286630 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:55:15,669-Speed 2622.80 samples/sec Loss 8.9175 LearningRate 0.0428 Epoch: 6 Global Step: 286640 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:55:19,560-Speed 2632.35 samples/sec Loss 8.8566 LearningRate 0.0428 Epoch: 6 Global Step: 286650 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:55:23,451-Speed 2631.92 samples/sec Loss 8.7625 LearningRate 0.0428 Epoch: 6 Global Step: 286660 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:55:27,345-Speed 2630.64 samples/sec Loss 8.8737 LearningRate 0.0428 Epoch: 6 Global Step: 286670 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:55:31,240-Speed 2629.61 samples/sec Loss 8.9980 LearningRate 0.0428 Epoch: 6 Global Step: 286680 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:55:35,135-Speed 2629.77 samples/sec Loss 8.7775 LearningRate 0.0428 Epoch: 6 Global Step: 286690 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:55:39,015-Speed 2639.63 samples/sec Loss 8.7025 LearningRate 0.0428 Epoch: 6 Global Step: 286700 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:55:42,912-Speed 2628.34 samples/sec Loss 8.7550 LearningRate 0.0428 Epoch: 6 Global Step: 286710 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:55:46,805-Speed 2630.97 samples/sec Loss 8.8933 LearningRate 0.0428 Epoch: 6 Global Step: 286720 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:55:50,702-Speed 2628.55 samples/sec Loss 8.9661 LearningRate 0.0428 Epoch: 6 Global Step: 286730 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:55:54,596-Speed 2629.59 samples/sec Loss 8.7855 LearningRate 0.0428 Epoch: 6 Global Step: 286740 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:55:58,495-Speed 2627.67 samples/sec Loss 8.8989 LearningRate 0.0428 Epoch: 6 Global Step: 286750 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:56:02,387-Speed 2631.98 samples/sec Loss 8.9575 LearningRate 0.0428 Epoch: 6 Global Step: 286760 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:56:06,281-Speed 2630.47 samples/sec Loss 8.8681 LearningRate 0.0428 Epoch: 6 Global Step: 286770 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:56:10,171-Speed 2632.36 samples/sec Loss 8.8652 LearningRate 0.0428 Epoch: 6 Global Step: 286780 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:56:14,070-Speed 2627.26 samples/sec Loss 8.9659 LearningRate 0.0428 Epoch: 6 Global Step: 286790 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:56:17,967-Speed 2628.11 samples/sec Loss 8.7647 LearningRate 0.0428 Epoch: 6 Global Step: 286800 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:56:21,860-Speed 2630.79 samples/sec Loss 8.9200 LearningRate 0.0428 Epoch: 6 Global Step: 286810 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:56:25,752-Speed 2631.50 samples/sec Loss 8.9286 LearningRate 0.0428 Epoch: 6 Global Step: 286820 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:56:29,661-Speed 2620.45 samples/sec Loss 8.9507 LearningRate 0.0428 Epoch: 6 Global Step: 286830 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 03:56:33,544-Speed 2637.77 samples/sec Loss 8.8557 LearningRate 0.0428 Epoch: 6 Global Step: 286840 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:56:37,444-Speed 2626.22 samples/sec Loss 8.8693 LearningRate 0.0428 Epoch: 6 Global Step: 286850 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:56:41,346-Speed 2624.66 samples/sec Loss 8.7317 LearningRate 0.0428 Epoch: 6 Global Step: 286860 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:56:45,254-Speed 2621.16 samples/sec Loss 8.7406 LearningRate 0.0428 Epoch: 6 Global Step: 286870 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 03:56:49,092-Speed 2668.74 samples/sec Loss 9.2719 LearningRate 0.0428 Epoch: 6 Global Step: 286880 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:56:53,027-Speed 2603.37 samples/sec Loss 9.2186 LearningRate 0.0428 Epoch: 6 Global Step: 286890 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:56:56,938-Speed 2618.69 samples/sec Loss 9.0050 LearningRate 0.0428 Epoch: 6 Global Step: 286900 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:57:00,838-Speed 2626.17 samples/sec Loss 9.0380 LearningRate 0.0428 Epoch: 6 Global Step: 286910 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 03:57:04,729-Speed 2632.31 samples/sec Loss 9.8714 LearningRate 0.0428 Epoch: 6 Global Step: 286920 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 03:57:08,712-Speed 2571.85 samples/sec Loss 9.0143 LearningRate 0.0428 Epoch: 6 Global Step: 286930 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 03:57:12,620-Speed 2621.24 samples/sec Loss 8.8739 LearningRate 0.0428 Epoch: 6 Global Step: 286940 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 03:57:16,513-Speed 2630.69 samples/sec Loss 8.9658 LearningRate 0.0428 Epoch: 6 Global Step: 286950 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 03:57:20,409-Speed 2629.65 samples/sec Loss 8.8880 LearningRate 0.0428 Epoch: 6 Global Step: 286960 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 03:57:24,305-Speed 2628.82 samples/sec Loss 8.6798 LearningRate 0.0428 Epoch: 6 Global Step: 286970 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 03:57:28,230-Speed 2609.72 samples/sec Loss 8.7869 LearningRate 0.0428 Epoch: 6 Global Step: 286980 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 03:57:32,127-Speed 2628.31 samples/sec Loss 8.9023 LearningRate 0.0428 Epoch: 6 Global Step: 286990 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 03:57:36,083-Speed 2589.22 samples/sec Loss 8.8405 LearningRate 0.0428 Epoch: 6 Global Step: 287000 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 03:57:40,040-Speed 2588.43 samples/sec Loss 8.7638 LearningRate 0.0428 Epoch: 6 Global Step: 287010 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 03:57:43,941-Speed 2626.29 samples/sec Loss 9.0077 LearningRate 0.0428 Epoch: 6 Global Step: 287020 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 03:57:47,834-Speed 2630.92 samples/sec Loss 8.9418 LearningRate 0.0428 Epoch: 6 Global Step: 287030 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 03:57:51,725-Speed 2633.00 samples/sec Loss 8.7865 LearningRate 0.0428 Epoch: 6 Global Step: 287040 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 03:57:55,622-Speed 2627.90 samples/sec Loss 8.7501 LearningRate 0.0428 Epoch: 6 Global Step: 287050 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 03:57:59,512-Speed 2633.61 samples/sec Loss 8.7439 LearningRate 0.0428 Epoch: 6 Global Step: 287060 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 03:58:03,421-Speed 2619.98 samples/sec Loss 9.0548 LearningRate 0.0428 Epoch: 6 Global Step: 287070 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 03:58:07,314-Speed 2631.32 samples/sec Loss 8.9224 LearningRate 0.0428 Epoch: 6 Global Step: 287080 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 03:58:11,208-Speed 2630.42 samples/sec Loss 9.0056 LearningRate 0.0428 Epoch: 6 Global Step: 287090 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 03:58:15,105-Speed 2628.30 samples/sec Loss 8.8965 LearningRate 0.0428 Epoch: 6 Global Step: 287100 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 03:58:19,000-Speed 2629.77 samples/sec Loss 8.7452 LearningRate 0.0428 Epoch: 6 Global Step: 287110 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:58:22,893-Speed 2631.07 samples/sec Loss 8.7861 LearningRate 0.0428 Epoch: 6 Global Step: 287120 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:58:26,788-Speed 2630.21 samples/sec Loss 8.7474 LearningRate 0.0428 Epoch: 6 Global Step: 287130 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:58:30,684-Speed 2628.90 samples/sec Loss 8.9313 LearningRate 0.0428 Epoch: 6 Global Step: 287140 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:58:34,620-Speed 2601.96 samples/sec Loss 8.9003 LearningRate 0.0428 Epoch: 6 Global Step: 287150 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:58:38,516-Speed 2628.72 samples/sec Loss 8.8665 LearningRate 0.0428 Epoch: 6 Global Step: 287160 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:58:42,421-Speed 2623.32 samples/sec Loss 8.8242 LearningRate 0.0428 Epoch: 6 Global Step: 287170 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:58:46,311-Speed 2632.77 samples/sec Loss 8.9077 LearningRate 0.0427 Epoch: 6 Global Step: 287180 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:58:50,219-Speed 2621.56 samples/sec Loss 8.7656 LearningRate 0.0427 Epoch: 6 Global Step: 287190 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:58:54,109-Speed 2632.95 samples/sec Loss 8.7641 LearningRate 0.0427 Epoch: 6 Global Step: 287200 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 03:58:58,003-Speed 2630.48 samples/sec Loss 8.7608 LearningRate 0.0427 Epoch: 6 Global Step: 287210 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:59:01,903-Speed 2626.50 samples/sec Loss 8.9416 LearningRate 0.0427 Epoch: 6 Global Step: 287220 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:59:05,804-Speed 2625.41 samples/sec Loss 8.8090 LearningRate 0.0427 Epoch: 6 Global Step: 287230 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:59:09,704-Speed 2626.05 samples/sec Loss 8.8045 LearningRate 0.0427 Epoch: 6 Global Step: 287240 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:59:13,619-Speed 2616.45 samples/sec Loss 8.8548 LearningRate 0.0427 Epoch: 6 Global Step: 287250 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:59:17,511-Speed 2631.72 samples/sec Loss 8.8767 LearningRate 0.0427 Epoch: 6 Global Step: 287260 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:59:21,417-Speed 2622.62 samples/sec Loss 8.9487 LearningRate 0.0427 Epoch: 6 Global Step: 287270 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:59:25,342-Speed 2609.92 samples/sec Loss 8.7965 LearningRate 0.0427 Epoch: 6 Global Step: 287280 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:59:29,235-Speed 2631.24 samples/sec Loss 8.9410 LearningRate 0.0427 Epoch: 6 Global Step: 287290 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:59:33,152-Speed 2614.85 samples/sec Loss 8.9101 LearningRate 0.0427 Epoch: 6 Global Step: 287300 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 03:59:37,047-Speed 2629.91 samples/sec Loss 8.9007 LearningRate 0.0427 Epoch: 6 Global Step: 287310 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:59:40,938-Speed 2632.30 samples/sec Loss 8.8432 LearningRate 0.0427 Epoch: 6 Global Step: 287320 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:59:44,828-Speed 2632.64 samples/sec Loss 9.0032 LearningRate 0.0427 Epoch: 6 Global Step: 287330 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:59:48,719-Speed 2632.62 samples/sec Loss 8.8828 LearningRate 0.0427 Epoch: 6 Global Step: 287340 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:59:52,611-Speed 2631.59 samples/sec Loss 8.8099 LearningRate 0.0427 Epoch: 6 Global Step: 287350 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 03:59:56,519-Speed 2621.19 samples/sec Loss 8.8690 LearningRate 0.0427 Epoch: 6 Global Step: 287360 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:00:00,430-Speed 2619.05 samples/sec Loss 8.8603 LearningRate 0.0427 Epoch: 6 Global Step: 287370 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:00:04,324-Speed 2629.91 samples/sec Loss 9.0728 LearningRate 0.0427 Epoch: 6 Global Step: 287380 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:00:08,225-Speed 2625.66 samples/sec Loss 8.7380 LearningRate 0.0427 Epoch: 6 Global Step: 287390 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:00:12,123-Speed 2627.48 samples/sec Loss 8.7835 LearningRate 0.0427 Epoch: 6 Global Step: 287400 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:00:16,017-Speed 2631.09 samples/sec Loss 8.7907 LearningRate 0.0427 Epoch: 6 Global Step: 287410 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:00:19,908-Speed 2632.39 samples/sec Loss 8.8328 LearningRate 0.0427 Epoch: 6 Global Step: 287420 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:00:23,803-Speed 2629.56 samples/sec Loss 8.9030 LearningRate 0.0427 Epoch: 6 Global Step: 287430 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:00:27,694-Speed 2631.97 samples/sec Loss 8.8473 LearningRate 0.0427 Epoch: 6 Global Step: 287440 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:00:31,585-Speed 2632.30 samples/sec Loss 8.8169 LearningRate 0.0427 Epoch: 6 Global Step: 287450 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:00:35,481-Speed 2629.74 samples/sec Loss 8.9102 LearningRate 0.0427 Epoch: 6 Global Step: 287460 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:00:39,372-Speed 2632.77 samples/sec Loss 8.8460 LearningRate 0.0427 Epoch: 6 Global Step: 287470 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:00:43,265-Speed 2630.76 samples/sec Loss 8.8026 LearningRate 0.0427 Epoch: 6 Global Step: 287480 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:00:47,155-Speed 2633.04 samples/sec Loss 8.8992 LearningRate 0.0427 Epoch: 6 Global Step: 287490 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:00:51,049-Speed 2630.02 samples/sec Loss 8.8784 LearningRate 0.0427 Epoch: 6 Global Step: 287500 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:00:54,941-Speed 2631.75 samples/sec Loss 8.9321 LearningRate 0.0427 Epoch: 6 Global Step: 287510 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:00:58,836-Speed 2630.07 samples/sec Loss 8.8583 LearningRate 0.0427 Epoch: 6 Global Step: 287520 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:01:02,727-Speed 2632.19 samples/sec Loss 8.9179 LearningRate 0.0427 Epoch: 6 Global Step: 287530 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:01:06,620-Speed 2630.37 samples/sec Loss 8.8815 LearningRate 0.0427 Epoch: 6 Global Step: 287540 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:01:10,533-Speed 2617.92 samples/sec Loss 8.8478 LearningRate 0.0427 Epoch: 6 Global Step: 287550 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:01:14,448-Speed 2616.72 samples/sec Loss 8.6600 LearningRate 0.0427 Epoch: 6 Global Step: 287560 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:01:18,338-Speed 2632.70 samples/sec Loss 8.8550 LearningRate 0.0427 Epoch: 6 Global Step: 287570 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:01:22,251-Speed 2618.00 samples/sec Loss 8.9142 LearningRate 0.0427 Epoch: 6 Global Step: 287580 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:01:26,144-Speed 2631.29 samples/sec Loss 8.7996 LearningRate 0.0427 Epoch: 6 Global Step: 287590 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:01:30,037-Speed 2630.96 samples/sec Loss 8.7543 LearningRate 0.0427 Epoch: 6 Global Step: 287600 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:01:33,930-Speed 2630.92 samples/sec Loss 8.7624 LearningRate 0.0427 Epoch: 6 Global Step: 287610 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:01:37,820-Speed 2633.67 samples/sec Loss 8.8421 LearningRate 0.0427 Epoch: 6 Global Step: 287620 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:01:41,712-Speed 2631.10 samples/sec Loss 8.9717 LearningRate 0.0427 Epoch: 6 Global Step: 287630 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:01:45,627-Speed 2617.37 samples/sec Loss 8.7752 LearningRate 0.0427 Epoch: 6 Global Step: 287640 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:01:49,500-Speed 2644.10 samples/sec Loss 8.8473 LearningRate 0.0427 Epoch: 6 Global Step: 287650 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:01:53,393-Speed 2631.88 samples/sec Loss 8.9374 LearningRate 0.0427 Epoch: 6 Global Step: 287660 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:01:57,302-Speed 2620.01 samples/sec Loss 8.7892 LearningRate 0.0427 Epoch: 6 Global Step: 287670 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:02:01,206-Speed 2623.11 samples/sec Loss 8.8526 LearningRate 0.0427 Epoch: 6 Global Step: 287680 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:02:05,127-Speed 2612.41 samples/sec Loss 8.9547 LearningRate 0.0427 Epoch: 6 Global Step: 287690 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:02:09,020-Speed 2631.49 samples/sec Loss 8.8329 LearningRate 0.0427 Epoch: 6 Global Step: 287700 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:02:12,918-Speed 2627.49 samples/sec Loss 8.8719 LearningRate 0.0427 Epoch: 6 Global Step: 287710 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:02:16,859-Speed 2598.54 samples/sec Loss 8.8581 LearningRate 0.0427 Epoch: 6 Global Step: 287720 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:02:20,758-Speed 2627.12 samples/sec Loss 8.9932 LearningRate 0.0427 Epoch: 6 Global Step: 287730 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:02:24,650-Speed 2633.49 samples/sec Loss 8.7630 LearningRate 0.0427 Epoch: 6 Global Step: 287740 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:02:28,546-Speed 2628.71 samples/sec Loss 8.7477 LearningRate 0.0427 Epoch: 6 Global Step: 287750 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:02:32,440-Speed 2630.44 samples/sec Loss 8.8856 LearningRate 0.0427 Epoch: 6 Global Step: 287760 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:02:36,333-Speed 2630.99 samples/sec Loss 8.8012 LearningRate 0.0427 Epoch: 6 Global Step: 287770 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:02:40,230-Speed 2628.36 samples/sec Loss 8.8557 LearningRate 0.0427 Epoch: 6 Global Step: 287780 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:02:44,133-Speed 2624.12 samples/sec Loss 8.8310 LearningRate 0.0427 Epoch: 6 Global Step: 287790 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:02:48,035-Speed 2625.06 samples/sec Loss 8.8490 LearningRate 0.0427 Epoch: 6 Global Step: 287800 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:02:51,939-Speed 2623.73 samples/sec Loss 8.7370 LearningRate 0.0426 Epoch: 6 Global Step: 287810 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:02:55,842-Speed 2625.10 samples/sec Loss 8.7276 LearningRate 0.0426 Epoch: 6 Global Step: 287820 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:02:59,735-Speed 2630.38 samples/sec Loss 8.8903 LearningRate 0.0426 Epoch: 6 Global Step: 287830 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:03:03,634-Speed 2626.90 samples/sec Loss 8.8913 LearningRate 0.0426 Epoch: 6 Global Step: 287840 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:03:07,526-Speed 2631.37 samples/sec Loss 8.7094 LearningRate 0.0426 Epoch: 6 Global Step: 287850 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:03:11,406-Speed 2640.02 samples/sec Loss 8.7339 LearningRate 0.0426 Epoch: 6 Global Step: 287860 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:03:15,318-Speed 2618.88 samples/sec Loss 8.8680 LearningRate 0.0426 Epoch: 6 Global Step: 287870 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:03:19,213-Speed 2629.23 samples/sec Loss 8.6976 LearningRate 0.0426 Epoch: 6 Global Step: 287880 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:03:23,107-Speed 2630.67 samples/sec Loss 8.8111 LearningRate 0.0426 Epoch: 6 Global Step: 287890 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:03:27,010-Speed 2624.39 samples/sec Loss 8.8564 LearningRate 0.0426 Epoch: 6 Global Step: 287900 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:03:30,911-Speed 2625.49 samples/sec Loss 9.0071 LearningRate 0.0426 Epoch: 6 Global Step: 287910 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:03:34,817-Speed 2622.26 samples/sec Loss 8.8268 LearningRate 0.0426 Epoch: 6 Global Step: 287920 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:03:38,713-Speed 2629.55 samples/sec Loss 8.7943 LearningRate 0.0426 Epoch: 6 Global Step: 287930 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:03:42,609-Speed 2628.34 samples/sec Loss 8.8428 LearningRate 0.0426 Epoch: 6 Global Step: 287940 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:03:46,506-Speed 2628.50 samples/sec Loss 8.8218 LearningRate 0.0426 Epoch: 6 Global Step: 287950 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:03:50,399-Speed 2630.90 samples/sec Loss 8.8046 LearningRate 0.0426 Epoch: 6 Global Step: 287960 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:03:54,296-Speed 2628.55 samples/sec Loss 8.7818 LearningRate 0.0426 Epoch: 6 Global Step: 287970 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:03:58,190-Speed 2630.25 samples/sec Loss 8.8843 LearningRate 0.0426 Epoch: 6 Global Step: 287980 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:04:02,086-Speed 2628.99 samples/sec Loss 8.7882 LearningRate 0.0426 Epoch: 6 Global Step: 287990 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:04:05,984-Speed 2627.41 samples/sec Loss 8.8555 LearningRate 0.0426 Epoch: 6 Global Step: 288000 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:04:09,895-Speed 2618.84 samples/sec Loss 8.9000 LearningRate 0.0426 Epoch: 6 Global Step: 288010 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:04:13,791-Speed 2629.32 samples/sec Loss 8.9746 LearningRate 0.0426 Epoch: 6 Global Step: 288020 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:04:17,720-Speed 2607.24 samples/sec Loss 8.9422 LearningRate 0.0426 Epoch: 6 Global Step: 288030 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:04:21,615-Speed 2630.54 samples/sec Loss 8.7369 LearningRate 0.0426 Epoch: 6 Global Step: 288040 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:04:25,493-Speed 2640.89 samples/sec Loss 8.8891 LearningRate 0.0426 Epoch: 6 Global Step: 288050 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:04:29,400-Speed 2621.81 samples/sec Loss 8.7529 LearningRate 0.0426 Epoch: 6 Global Step: 288060 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:04:33,302-Speed 2624.96 samples/sec Loss 8.8767 LearningRate 0.0426 Epoch: 6 Global Step: 288070 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:04:37,197-Speed 2629.84 samples/sec Loss 8.7865 LearningRate 0.0426 Epoch: 6 Global Step: 288080 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:04:41,092-Speed 2629.27 samples/sec Loss 8.9017 LearningRate 0.0426 Epoch: 6 Global Step: 288090 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:04:44,989-Speed 2628.64 samples/sec Loss 8.7983 LearningRate 0.0426 Epoch: 6 Global Step: 288100 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:04:48,885-Speed 2628.98 samples/sec Loss 8.6845 LearningRate 0.0426 Epoch: 6 Global Step: 288110 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:04:52,814-Speed 2607.57 samples/sec Loss 8.8939 LearningRate 0.0426 Epoch: 6 Global Step: 288120 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:04:56,746-Speed 2604.92 samples/sec Loss 8.8225 LearningRate 0.0426 Epoch: 6 Global Step: 288130 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:05:00,668-Speed 2611.52 samples/sec Loss 8.8206 LearningRate 0.0426 Epoch: 6 Global Step: 288140 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:05:04,574-Speed 2622.46 samples/sec Loss 8.7923 LearningRate 0.0426 Epoch: 6 Global Step: 288150 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:05:08,469-Speed 2629.23 samples/sec Loss 8.8896 LearningRate 0.0426 Epoch: 6 Global Step: 288160 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:05:12,361-Speed 2631.69 samples/sec Loss 8.8625 LearningRate 0.0426 Epoch: 6 Global Step: 288170 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:05:16,274-Speed 2617.59 samples/sec Loss 8.8153 LearningRate 0.0426 Epoch: 6 Global Step: 288180 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:05:20,171-Speed 2628.76 samples/sec Loss 8.7170 LearningRate 0.0426 Epoch: 6 Global Step: 288190 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:05:24,062-Speed 2632.27 samples/sec Loss 8.8606 LearningRate 0.0426 Epoch: 6 Global Step: 288200 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:05:27,958-Speed 2629.17 samples/sec Loss 8.9279 LearningRate 0.0426 Epoch: 6 Global Step: 288210 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:05:31,854-Speed 2629.14 samples/sec Loss 8.8217 LearningRate 0.0426 Epoch: 6 Global Step: 288220 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:05:35,749-Speed 2629.90 samples/sec Loss 8.8973 LearningRate 0.0426 Epoch: 6 Global Step: 288230 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:05:39,656-Speed 2621.36 samples/sec Loss 8.8210 LearningRate 0.0426 Epoch: 6 Global Step: 288240 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:05:43,533-Speed 2642.43 samples/sec Loss 8.8048 LearningRate 0.0426 Epoch: 6 Global Step: 288250 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:05:47,432-Speed 2626.47 samples/sec Loss 8.8530 LearningRate 0.0426 Epoch: 6 Global Step: 288260 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:05:51,326-Speed 2630.56 samples/sec Loss 8.7668 LearningRate 0.0426 Epoch: 6 Global Step: 288270 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:05:55,221-Speed 2629.89 samples/sec Loss 8.6517 LearningRate 0.0426 Epoch: 6 Global Step: 288280 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:05:59,113-Speed 2631.71 samples/sec Loss 8.8953 LearningRate 0.0426 Epoch: 6 Global Step: 288290 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:06:03,008-Speed 2629.76 samples/sec Loss 8.8058 LearningRate 0.0426 Epoch: 6 Global Step: 288300 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:06:06,908-Speed 2626.40 samples/sec Loss 9.0845 LearningRate 0.0426 Epoch: 6 Global Step: 288310 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:06:10,800-Speed 2631.49 samples/sec Loss 8.9324 LearningRate 0.0426 Epoch: 6 Global Step: 288320 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:06:14,725-Speed 2610.07 samples/sec Loss 8.9075 LearningRate 0.0426 Epoch: 6 Global Step: 288330 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:06:18,654-Speed 2606.28 samples/sec Loss 8.7097 LearningRate 0.0426 Epoch: 6 Global Step: 288340 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:06:22,565-Speed 2619.28 samples/sec Loss 8.8265 LearningRate 0.0426 Epoch: 6 Global Step: 288350 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:06:26,481-Speed 2629.82 samples/sec Loss 8.6841 LearningRate 0.0426 Epoch: 6 Global Step: 288360 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:06:30,407-Speed 2633.72 samples/sec Loss 8.7554 LearningRate 0.0426 Epoch: 6 Global Step: 288370 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:06:34,331-Speed 2610.20 samples/sec Loss 8.8536 LearningRate 0.0426 Epoch: 6 Global Step: 288380 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:06:38,224-Speed 2631.11 samples/sec Loss 8.9111 LearningRate 0.0426 Epoch: 6 Global Step: 288390 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:06:42,126-Speed 2625.10 samples/sec Loss 8.8460 LearningRate 0.0426 Epoch: 6 Global Step: 288400 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:06:50,214-Speed 2616.32 samples/sec Loss 8.7493 LearningRate 0.0426 Epoch: 6 Global Step: 288410 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:06:54,282-Speed 2641.40 samples/sec Loss 8.7888 LearningRate 0.0426 Epoch: 6 Global Step: 288420 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:06:58,190-Speed 2620.66 samples/sec Loss 8.7511 LearningRate 0.0426 Epoch: 6 Global Step: 288430 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:07:02,258-Speed 2640.38 samples/sec Loss 8.9033 LearningRate 0.0426 Epoch: 6 Global Step: 288440 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:07:06,153-Speed 2629.11 samples/sec Loss 8.8011 LearningRate 0.0425 Epoch: 6 Global Step: 288450 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:07:10,041-Speed 2634.39 samples/sec Loss 8.8530 LearningRate 0.0425 Epoch: 6 Global Step: 288460 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:07:13,944-Speed 2624.44 samples/sec Loss 8.8107 LearningRate 0.0425 Epoch: 6 Global Step: 288470 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:07:18,065-Speed 2629.68 samples/sec Loss 8.9325 LearningRate 0.0425 Epoch: 6 Global Step: 288480 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:07:21,961-Speed 2629.14 samples/sec Loss 8.6886 LearningRate 0.0425 Epoch: 6 Global Step: 288490 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:07:25,846-Speed 2636.57 samples/sec Loss 8.8505 LearningRate 0.0425 Epoch: 6 Global Step: 288500 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:07:29,733-Speed 2635.20 samples/sec Loss 8.8464 LearningRate 0.0425 Epoch: 6 Global Step: 288510 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:07:33,621-Speed 2634.16 samples/sec Loss 8.8143 LearningRate 0.0425 Epoch: 6 Global Step: 288520 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:07:37,515-Speed 2629.96 samples/sec Loss 8.6437 LearningRate 0.0425 Epoch: 6 Global Step: 288530 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:07:41,407-Speed 2632.48 samples/sec Loss 8.9358 LearningRate 0.0425 Epoch: 6 Global Step: 288540 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:07:45,297-Speed 2632.88 samples/sec Loss 8.8143 LearningRate 0.0425 Epoch: 6 Global Step: 288550 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:07:49,090-Speed 2700.70 samples/sec Loss 9.9233 LearningRate 0.0425 Epoch: 6 Global Step: 288560 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 04:07:52,989-Speed 2627.04 samples/sec Loss 9.7595 LearningRate 0.0425 Epoch: 6 Global Step: 288570 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 04:07:56,964-Speed 2576.66 samples/sec Loss 9.7823 LearningRate 0.0425 Epoch: 6 Global Step: 288580 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 04:08:00,845-Speed 2638.70 samples/sec Loss 9.3922 LearningRate 0.0425 Epoch: 6 Global Step: 288590 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 04:08:04,737-Speed 2632.15 samples/sec Loss 9.2172 LearningRate 0.0425 Epoch: 6 Global Step: 288600 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 04:08:08,642-Speed 2623.10 samples/sec Loss 9.0778 LearningRate 0.0425 Epoch: 6 Global Step: 288610 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 04:08:12,559-Speed 2614.62 samples/sec Loss 8.8250 LearningRate 0.0425 Epoch: 6 Global Step: 288620 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 04:08:16,458-Speed 2627.74 samples/sec Loss 8.9765 LearningRate 0.0425 Epoch: 6 Global Step: 288630 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 04:08:20,354-Speed 2628.40 samples/sec Loss 8.8867 LearningRate 0.0425 Epoch: 6 Global Step: 288640 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 04:08:24,241-Speed 2635.19 samples/sec Loss 8.7609 LearningRate 0.0425 Epoch: 6 Global Step: 288650 Fp16 Grad Scale: 2048 Required: 61 hours
Training: 2022-04-14 04:08:28,131-Speed 2633.30 samples/sec Loss 8.8971 LearningRate 0.0425 Epoch: 6 Global Step: 288660 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 04:08:32,022-Speed 2632.48 samples/sec Loss 8.9420 LearningRate 0.0425 Epoch: 6 Global Step: 288670 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 04:08:35,912-Speed 2632.98 samples/sec Loss 8.8059 LearningRate 0.0425 Epoch: 6 Global Step: 288680 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 04:08:39,802-Speed 2632.70 samples/sec Loss 8.8804 LearningRate 0.0425 Epoch: 6 Global Step: 288690 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 04:08:43,692-Speed 2632.82 samples/sec Loss 8.8257 LearningRate 0.0425 Epoch: 6 Global Step: 288700 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 04:08:47,583-Speed 2632.90 samples/sec Loss 8.7469 LearningRate 0.0425 Epoch: 6 Global Step: 288710 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 04:08:51,487-Speed 2623.54 samples/sec Loss 8.8783 LearningRate 0.0425 Epoch: 6 Global Step: 288720 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 04:08:55,397-Speed 2619.70 samples/sec Loss 8.8933 LearningRate 0.0425 Epoch: 6 Global Step: 288730 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 04:08:59,327-Speed 2606.41 samples/sec Loss 8.7898 LearningRate 0.0425 Epoch: 6 Global Step: 288740 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 04:09:03,222-Speed 2629.34 samples/sec Loss 8.9776 LearningRate 0.0425 Epoch: 6 Global Step: 288750 Fp16 Grad Scale: 4096 Required: 61 hours
Training: 2022-04-14 04:09:07,119-Speed 2628.25 samples/sec Loss 8.7799 LearningRate 0.0425 Epoch: 6 Global Step: 288760 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 04:09:11,010-Speed 2632.97 samples/sec Loss 8.9022 LearningRate 0.0425 Epoch: 6 Global Step: 288770 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 04:09:14,901-Speed 2631.84 samples/sec Loss 8.9764 LearningRate 0.0425 Epoch: 6 Global Step: 288780 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 04:09:18,801-Speed 2626.01 samples/sec Loss 8.9349 LearningRate 0.0425 Epoch: 6 Global Step: 288790 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 04:09:22,702-Speed 2626.51 samples/sec Loss 8.6480 LearningRate 0.0425 Epoch: 6 Global Step: 288800 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 04:09:26,607-Speed 2623.08 samples/sec Loss 8.7625 LearningRate 0.0425 Epoch: 6 Global Step: 288810 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 04:09:30,519-Speed 2618.48 samples/sec Loss 8.8194 LearningRate 0.0425 Epoch: 6 Global Step: 288820 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 04:09:34,421-Speed 2625.19 samples/sec Loss 8.8932 LearningRate 0.0425 Epoch: 6 Global Step: 288830 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 04:09:38,328-Speed 2621.56 samples/sec Loss 8.8216 LearningRate 0.0425 Epoch: 6 Global Step: 288840 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 04:09:42,228-Speed 2626.04 samples/sec Loss 8.6688 LearningRate 0.0425 Epoch: 6 Global Step: 288850 Fp16 Grad Scale: 8192 Required: 61 hours
Training: 2022-04-14 04:09:46,136-Speed 2621.39 samples/sec Loss 8.8345 LearningRate 0.0425 Epoch: 6 Global Step: 288860 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:09:50,025-Speed 2633.58 samples/sec Loss 8.8273 LearningRate 0.0425 Epoch: 6 Global Step: 288870 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:09:53,918-Speed 2631.43 samples/sec Loss 8.8943 LearningRate 0.0425 Epoch: 6 Global Step: 288880 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:09:57,801-Speed 2637.02 samples/sec Loss 8.8060 LearningRate 0.0425 Epoch: 6 Global Step: 288890 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:10:01,704-Speed 2624.76 samples/sec Loss 8.9558 LearningRate 0.0425 Epoch: 6 Global Step: 288900 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:10:05,599-Speed 2629.77 samples/sec Loss 8.8570 LearningRate 0.0425 Epoch: 6 Global Step: 288910 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:10:09,543-Speed 2596.83 samples/sec Loss 8.6972 LearningRate 0.0425 Epoch: 6 Global Step: 288920 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:10:13,444-Speed 2625.79 samples/sec Loss 8.9259 LearningRate 0.0425 Epoch: 6 Global Step: 288930 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:10:17,385-Speed 2598.42 samples/sec Loss 8.8055 LearningRate 0.0425 Epoch: 6 Global Step: 288940 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:10:21,272-Speed 2635.27 samples/sec Loss 8.5954 LearningRate 0.0425 Epoch: 6 Global Step: 288950 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:10:25,163-Speed 2632.63 samples/sec Loss 8.9194 LearningRate 0.0425 Epoch: 6 Global Step: 288960 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:10:29,051-Speed 2634.70 samples/sec Loss 8.6978 LearningRate 0.0425 Epoch: 6 Global Step: 288970 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:10:32,940-Speed 2633.40 samples/sec Loss 8.7914 LearningRate 0.0425 Epoch: 6 Global Step: 288980 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:10:36,952-Speed 2552.59 samples/sec Loss 8.9471 LearningRate 0.0425 Epoch: 6 Global Step: 288990 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:10:40,848-Speed 2628.72 samples/sec Loss 8.8134 LearningRate 0.0425 Epoch: 6 Global Step: 289000 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:10:44,744-Speed 2629.55 samples/sec Loss 8.9043 LearningRate 0.0425 Epoch: 6 Global Step: 289010 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:10:48,644-Speed 2626.14 samples/sec Loss 8.7262 LearningRate 0.0425 Epoch: 6 Global Step: 289020 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:10:52,534-Speed 2633.22 samples/sec Loss 8.8118 LearningRate 0.0425 Epoch: 6 Global Step: 289030 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:10:56,425-Speed 2632.49 samples/sec Loss 8.7297 LearningRate 0.0425 Epoch: 6 Global Step: 289040 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:11:00,393-Speed 2580.95 samples/sec Loss 8.8854 LearningRate 0.0425 Epoch: 6 Global Step: 289050 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:11:04,292-Speed 2627.07 samples/sec Loss 8.8463 LearningRate 0.0425 Epoch: 6 Global Step: 289060 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:11:08,199-Speed 2621.99 samples/sec Loss 8.9806 LearningRate 0.0425 Epoch: 6 Global Step: 289070 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:11:12,099-Speed 2625.79 samples/sec Loss 8.8818 LearningRate 0.0424 Epoch: 6 Global Step: 289080 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:11:16,017-Speed 2614.84 samples/sec Loss 8.8311 LearningRate 0.0424 Epoch: 6 Global Step: 289090 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:11:19,910-Speed 2631.14 samples/sec Loss 8.8965 LearningRate 0.0424 Epoch: 6 Global Step: 289100 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:11:23,797-Speed 2635.13 samples/sec Loss 8.8498 LearningRate 0.0424 Epoch: 6 Global Step: 289110 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:11:27,695-Speed 2627.18 samples/sec Loss 8.8187 LearningRate 0.0424 Epoch: 6 Global Step: 289120 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:11:31,586-Speed 2633.11 samples/sec Loss 8.9151 LearningRate 0.0424 Epoch: 6 Global Step: 289130 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:11:35,476-Speed 2633.03 samples/sec Loss 8.8820 LearningRate 0.0424 Epoch: 6 Global Step: 289140 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:11:39,369-Speed 2630.94 samples/sec Loss 8.6749 LearningRate 0.0424 Epoch: 6 Global Step: 289150 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:11:43,251-Speed 2637.89 samples/sec Loss 8.8423 LearningRate 0.0424 Epoch: 6 Global Step: 289160 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:11:47,209-Speed 2588.40 samples/sec Loss 8.8006 LearningRate 0.0424 Epoch: 6 Global Step: 289170 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:11:51,108-Speed 2626.48 samples/sec Loss 8.7925 LearningRate 0.0424 Epoch: 6 Global Step: 289180 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:11:55,001-Speed 2631.13 samples/sec Loss 8.7969 LearningRate 0.0424 Epoch: 6 Global Step: 289190 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:11:58,889-Speed 2634.67 samples/sec Loss 8.7191 LearningRate 0.0424 Epoch: 6 Global Step: 289200 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:12:02,788-Speed 2627.30 samples/sec Loss 8.7777 LearningRate 0.0424 Epoch: 6 Global Step: 289210 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:12:06,696-Speed 2620.83 samples/sec Loss 8.7717 LearningRate 0.0424 Epoch: 6 Global Step: 289220 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:12:10,591-Speed 2629.16 samples/sec Loss 8.7781 LearningRate 0.0424 Epoch: 6 Global Step: 289230 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:12:14,502-Speed 2619.22 samples/sec Loss 8.8968 LearningRate 0.0424 Epoch: 6 Global Step: 289240 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:12:18,392-Speed 2632.66 samples/sec Loss 8.7545 LearningRate 0.0424 Epoch: 6 Global Step: 289250 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:12:22,287-Speed 2630.16 samples/sec Loss 8.8849 LearningRate 0.0424 Epoch: 6 Global Step: 289260 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:12:26,181-Speed 2630.18 samples/sec Loss 8.7773 LearningRate 0.0424 Epoch: 6 Global Step: 289270 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:12:30,085-Speed 2623.93 samples/sec Loss 8.9770 LearningRate 0.0424 Epoch: 6 Global Step: 289280 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:12:33,980-Speed 2629.55 samples/sec Loss 8.8916 LearningRate 0.0424 Epoch: 6 Global Step: 289290 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:12:37,871-Speed 2632.27 samples/sec Loss 8.7966 LearningRate 0.0424 Epoch: 6 Global Step: 289300 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:12:41,761-Speed 2633.02 samples/sec Loss 8.9332 LearningRate 0.0424 Epoch: 6 Global Step: 289310 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:12:45,665-Speed 2623.68 samples/sec Loss 8.8474 LearningRate 0.0424 Epoch: 6 Global Step: 289320 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:12:49,567-Speed 2625.13 samples/sec Loss 8.7935 LearningRate 0.0424 Epoch: 6 Global Step: 289330 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:12:53,469-Speed 2625.35 samples/sec Loss 8.8367 LearningRate 0.0424 Epoch: 6 Global Step: 289340 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:12:57,360-Speed 2632.29 samples/sec Loss 8.8713 LearningRate 0.0424 Epoch: 6 Global Step: 289350 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:13:01,256-Speed 2628.88 samples/sec Loss 8.9480 LearningRate 0.0424 Epoch: 6 Global Step: 289360 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:13:05,147-Speed 2632.44 samples/sec Loss 8.8148 LearningRate 0.0424 Epoch: 6 Global Step: 289370 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:13:09,043-Speed 2629.46 samples/sec Loss 8.8040 LearningRate 0.0424 Epoch: 6 Global Step: 289380 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:13:12,935-Speed 2631.53 samples/sec Loss 8.8113 LearningRate 0.0424 Epoch: 6 Global Step: 289390 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:13:16,833-Speed 2628.10 samples/sec Loss 8.6761 LearningRate 0.0424 Epoch: 6 Global Step: 289400 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:13:20,727-Speed 2630.51 samples/sec Loss 8.7031 LearningRate 0.0424 Epoch: 6 Global Step: 289410 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:13:24,623-Speed 2628.67 samples/sec Loss 8.8505 LearningRate 0.0424 Epoch: 6 Global Step: 289420 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:13:28,519-Speed 2629.51 samples/sec Loss 8.7437 LearningRate 0.0424 Epoch: 6 Global Step: 289430 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:13:32,419-Speed 2626.43 samples/sec Loss 8.9131 LearningRate 0.0424 Epoch: 6 Global Step: 289440 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:13:36,338-Speed 2612.96 samples/sec Loss 8.7717 LearningRate 0.0424 Epoch: 6 Global Step: 289450 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:13:40,231-Speed 2630.95 samples/sec Loss 8.7603 LearningRate 0.0424 Epoch: 6 Global Step: 289460 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:13:44,124-Speed 2631.29 samples/sec Loss 8.7356 LearningRate 0.0424 Epoch: 6 Global Step: 289470 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:13:48,005-Speed 2638.98 samples/sec Loss 8.7980 LearningRate 0.0424 Epoch: 6 Global Step: 289480 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:13:51,898-Speed 2631.30 samples/sec Loss 8.8773 LearningRate 0.0424 Epoch: 6 Global Step: 289490 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:13:55,797-Speed 2626.65 samples/sec Loss 8.9322 LearningRate 0.0424 Epoch: 6 Global Step: 289500 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:13:59,724-Speed 2608.79 samples/sec Loss 8.9133 LearningRate 0.0424 Epoch: 6 Global Step: 289510 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:14:03,623-Speed 2627.24 samples/sec Loss 8.8405 LearningRate 0.0424 Epoch: 6 Global Step: 289520 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:14:07,519-Speed 2628.79 samples/sec Loss 8.8212 LearningRate 0.0424 Epoch: 6 Global Step: 289530 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:14:11,419-Speed 2625.90 samples/sec Loss 8.9146 LearningRate 0.0424 Epoch: 6 Global Step: 289540 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:14:15,315-Speed 2629.68 samples/sec Loss 8.9040 LearningRate 0.0424 Epoch: 6 Global Step: 289550 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:14:19,219-Speed 2623.42 samples/sec Loss 8.8090 LearningRate 0.0424 Epoch: 6 Global Step: 289560 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:14:23,119-Speed 2626.38 samples/sec Loss 8.8062 LearningRate 0.0424 Epoch: 6 Global Step: 289570 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:14:26,993-Speed 2643.41 samples/sec Loss 8.7264 LearningRate 0.0424 Epoch: 6 Global Step: 289580 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:14:30,891-Speed 2627.98 samples/sec Loss 8.7980 LearningRate 0.0424 Epoch: 6 Global Step: 289590 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:14:34,790-Speed 2626.92 samples/sec Loss 8.7977 LearningRate 0.0424 Epoch: 6 Global Step: 289600 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:14:38,676-Speed 2635.51 samples/sec Loss 8.8956 LearningRate 0.0424 Epoch: 6 Global Step: 289610 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:14:42,578-Speed 2625.38 samples/sec Loss 8.7872 LearningRate 0.0424 Epoch: 6 Global Step: 289620 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:14:46,495-Speed 2614.87 samples/sec Loss 8.8678 LearningRate 0.0424 Epoch: 6 Global Step: 289630 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:14:50,403-Speed 2620.70 samples/sec Loss 8.6948 LearningRate 0.0424 Epoch: 6 Global Step: 289640 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:14:54,290-Speed 2635.33 samples/sec Loss 8.7375 LearningRate 0.0424 Epoch: 6 Global Step: 289650 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:14:58,180-Speed 2633.05 samples/sec Loss 8.7370 LearningRate 0.0424 Epoch: 6 Global Step: 289660 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:15:02,094-Speed 2617.33 samples/sec Loss 8.7897 LearningRate 0.0424 Epoch: 6 Global Step: 289670 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:15:06,000-Speed 2622.23 samples/sec Loss 8.7292 LearningRate 0.0424 Epoch: 6 Global Step: 289680 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:15:09,902-Speed 2624.41 samples/sec Loss 8.8666 LearningRate 0.0424 Epoch: 6 Global Step: 289690 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:15:13,800-Speed 2627.59 samples/sec Loss 8.9690 LearningRate 0.0424 Epoch: 6 Global Step: 289700 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:15:17,700-Speed 2626.00 samples/sec Loss 8.8151 LearningRate 0.0424 Epoch: 6 Global Step: 289710 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:15:21,598-Speed 2628.05 samples/sec Loss 8.7990 LearningRate 0.0423 Epoch: 6 Global Step: 289720 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:15:25,493-Speed 2629.53 samples/sec Loss 8.8809 LearningRate 0.0423 Epoch: 6 Global Step: 289730 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:15:29,399-Speed 2631.20 samples/sec Loss 8.6339 LearningRate 0.0423 Epoch: 6 Global Step: 289740 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:15:33,293-Speed 2630.24 samples/sec Loss 8.7933 LearningRate 0.0423 Epoch: 6 Global Step: 289750 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:15:37,189-Speed 2628.62 samples/sec Loss 8.7944 LearningRate 0.0423 Epoch: 6 Global Step: 289760 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:15:41,085-Speed 2628.65 samples/sec Loss 8.7399 LearningRate 0.0423 Epoch: 6 Global Step: 289770 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:15:44,985-Speed 2626.83 samples/sec Loss 8.8266 LearningRate 0.0423 Epoch: 6 Global Step: 289780 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:15:48,880-Speed 2629.34 samples/sec Loss 8.8990 LearningRate 0.0423 Epoch: 6 Global Step: 289790 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:15:52,776-Speed 2629.51 samples/sec Loss 8.8402 LearningRate 0.0423 Epoch: 6 Global Step: 289800 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:15:56,668-Speed 2631.22 samples/sec Loss 8.8825 LearningRate 0.0423 Epoch: 6 Global Step: 289810 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:16:00,561-Speed 2631.40 samples/sec Loss 8.6754 LearningRate 0.0423 Epoch: 6 Global Step: 289820 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:16:04,458-Speed 2628.46 samples/sec Loss 8.8711 LearningRate 0.0423 Epoch: 6 Global Step: 289830 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:16:08,347-Speed 2633.28 samples/sec Loss 8.6748 LearningRate 0.0423 Epoch: 6 Global Step: 289840 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:16:12,221-Speed 2643.77 samples/sec Loss 8.7400 LearningRate 0.0423 Epoch: 6 Global Step: 289850 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:16:16,105-Speed 2637.22 samples/sec Loss 8.9369 LearningRate 0.0423 Epoch: 6 Global Step: 289860 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:16:20,018-Speed 2617.06 samples/sec Loss 8.8927 LearningRate 0.0423 Epoch: 6 Global Step: 289870 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:16:23,918-Speed 2627.27 samples/sec Loss 8.7779 LearningRate 0.0423 Epoch: 6 Global Step: 289880 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:16:27,806-Speed 2634.19 samples/sec Loss 8.8740 LearningRate 0.0423 Epoch: 6 Global Step: 289890 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:16:31,739-Speed 2603.98 samples/sec Loss 8.6785 LearningRate 0.0423 Epoch: 6 Global Step: 289900 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:16:35,790-Speed 2528.54 samples/sec Loss 8.9122 LearningRate 0.0423 Epoch: 6 Global Step: 289910 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:16:39,694-Speed 2623.33 samples/sec Loss 8.7766 LearningRate 0.0423 Epoch: 6 Global Step: 289920 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:16:43,593-Speed 2627.15 samples/sec Loss 8.8468 LearningRate 0.0423 Epoch: 6 Global Step: 289930 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:16:47,485-Speed 2631.69 samples/sec Loss 8.7730 LearningRate 0.0423 Epoch: 6 Global Step: 289940 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:16:51,377-Speed 2632.12 samples/sec Loss 8.8498 LearningRate 0.0423 Epoch: 6 Global Step: 289950 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:16:55,269-Speed 2631.48 samples/sec Loss 8.8886 LearningRate 0.0423 Epoch: 6 Global Step: 289960 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:16:59,158-Speed 2633.97 samples/sec Loss 8.8051 LearningRate 0.0423 Epoch: 6 Global Step: 289970 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:17:03,032-Speed 2643.70 samples/sec Loss 8.8787 LearningRate 0.0423 Epoch: 6 Global Step: 289980 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:17:06,932-Speed 2626.01 samples/sec Loss 8.8915 LearningRate 0.0423 Epoch: 6 Global Step: 289990 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:17:10,826-Speed 2630.27 samples/sec Loss 8.6995 LearningRate 0.0423 Epoch: 6 Global Step: 290000 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:17:54,451-[lfw][290000]XNorm: 22.874048
Training: 2022-04-14 04:17:54,452-[lfw][290000]Accuracy-Flip: 0.99717+-0.00279
Training: 2022-04-14 04:17:54,453-[lfw][290000]Accuracy-Highest: 0.99783
Training: 2022-04-14 04:18:45,175-[cfp_fp][290000]XNorm: 20.931474
Training: 2022-04-14 04:18:45,176-[cfp_fp][290000]Accuracy-Flip: 0.98443+-0.00587
Training: 2022-04-14 04:18:45,177-[cfp_fp][290000]Accuracy-Highest: 0.98643
Training: 2022-04-14 04:19:28,384-[agedb_30][290000]XNorm: 22.810445
Training: 2022-04-14 04:19:28,385-[agedb_30][290000]Accuracy-Flip: 0.97483+-0.00603
Training: 2022-04-14 04:19:28,386-[agedb_30][290000]Accuracy-Highest: 0.97567
Training: 2022-04-14 04:19:32,247-Speed 72.41 samples/sec Loss 8.8164 LearningRate 0.0423 Epoch: 6 Global Step: 290010 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:19:36,108-Speed 2652.77 samples/sec Loss 8.7170 LearningRate 0.0423 Epoch: 6 Global Step: 290020 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:19:39,990-Speed 2639.06 samples/sec Loss 8.8588 LearningRate 0.0423 Epoch: 6 Global Step: 290030 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:19:44,042-Speed 2527.39 samples/sec Loss 8.9399 LearningRate 0.0423 Epoch: 6 Global Step: 290040 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:19:47,966-Speed 2611.04 samples/sec Loss 8.9161 LearningRate 0.0423 Epoch: 6 Global Step: 290050 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:19:51,831-Speed 2650.11 samples/sec Loss 8.7573 LearningRate 0.0423 Epoch: 6 Global Step: 290060 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:19:55,700-Speed 2647.20 samples/sec Loss 8.8720 LearningRate 0.0423 Epoch: 6 Global Step: 290070 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:19:59,599-Speed 2627.69 samples/sec Loss 8.7707 LearningRate 0.0423 Epoch: 6 Global Step: 290080 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:20:03,471-Speed 2644.63 samples/sec Loss 8.8243 LearningRate 0.0423 Epoch: 6 Global Step: 290090 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:20:07,355-Speed 2637.40 samples/sec Loss 8.9221 LearningRate 0.0423 Epoch: 6 Global Step: 290100 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:20:11,234-Speed 2640.46 samples/sec Loss 8.7086 LearningRate 0.0423 Epoch: 6 Global Step: 290110 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:20:15,115-Speed 2639.05 samples/sec Loss 8.7687 LearningRate 0.0423 Epoch: 6 Global Step: 290120 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:20:18,997-Speed 2638.32 samples/sec Loss 8.9350 LearningRate 0.0423 Epoch: 6 Global Step: 290130 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:20:22,881-Speed 2637.35 samples/sec Loss 9.0165 LearningRate 0.0423 Epoch: 6 Global Step: 290140 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:20:26,769-Speed 2634.45 samples/sec Loss 8.8082 LearningRate 0.0423 Epoch: 6 Global Step: 290150 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:20:30,655-Speed 2636.14 samples/sec Loss 8.7674 LearningRate 0.0423 Epoch: 6 Global Step: 290160 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:20:34,543-Speed 2634.16 samples/sec Loss 8.8797 LearningRate 0.0423 Epoch: 6 Global Step: 290170 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:20:38,429-Speed 2635.94 samples/sec Loss 8.8336 LearningRate 0.0423 Epoch: 6 Global Step: 290180 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:20:42,318-Speed 2633.49 samples/sec Loss 8.8589 LearningRate 0.0423 Epoch: 6 Global Step: 290190 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:20:46,312-Speed 2564.75 samples/sec Loss 8.7375 LearningRate 0.0423 Epoch: 6 Global Step: 290200 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:20:50,214-Speed 2624.10 samples/sec Loss 8.8825 LearningRate 0.0423 Epoch: 6 Global Step: 290210 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:20:54,078-Speed 2651.37 samples/sec Loss 8.8864 LearningRate 0.0423 Epoch: 6 Global Step: 290220 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:20:57,969-Speed 2632.01 samples/sec Loss 8.9049 LearningRate 0.0423 Epoch: 6 Global Step: 290230 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:21:01,859-Speed 2633.36 samples/sec Loss 8.8268 LearningRate 0.0423 Epoch: 6 Global Step: 290240 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:21:05,762-Speed 2624.29 samples/sec Loss 8.8174 LearningRate 0.0423 Epoch: 6 Global Step: 290250 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:21:09,655-Speed 2630.83 samples/sec Loss 8.8647 LearningRate 0.0423 Epoch: 6 Global Step: 290260 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:21:13,550-Speed 2629.69 samples/sec Loss 8.7727 LearningRate 0.0423 Epoch: 6 Global Step: 290270 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:21:17,445-Speed 2629.51 samples/sec Loss 8.7293 LearningRate 0.0423 Epoch: 6 Global Step: 290280 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:21:21,343-Speed 2628.46 samples/sec Loss 8.6967 LearningRate 0.0423 Epoch: 6 Global Step: 290290 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:21:25,231-Speed 2634.18 samples/sec Loss 8.9588 LearningRate 0.0423 Epoch: 6 Global Step: 290300 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:21:29,125-Speed 2630.28 samples/sec Loss 8.7845 LearningRate 0.0423 Epoch: 6 Global Step: 290310 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:21:33,026-Speed 2625.27 samples/sec Loss 8.8547 LearningRate 0.0423 Epoch: 6 Global Step: 290320 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:21:36,922-Speed 2629.46 samples/sec Loss 8.8374 LearningRate 0.0423 Epoch: 6 Global Step: 290330 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:21:40,810-Speed 2634.13 samples/sec Loss 8.8431 LearningRate 0.0423 Epoch: 6 Global Step: 290340 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:21:44,711-Speed 2625.75 samples/sec Loss 8.9314 LearningRate 0.0423 Epoch: 6 Global Step: 290350 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:22:05,650-Speed 489.08 samples/sec Loss 8.8387 LearningRate 0.0422 Epoch: 7 Global Step: 290360 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:22:09,529-Speed 2640.49 samples/sec Loss 8.7885 LearningRate 0.0422 Epoch: 7 Global Step: 290370 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:22:13,408-Speed 2641.06 samples/sec Loss 8.7895 LearningRate 0.0422 Epoch: 7 Global Step: 290380 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:22:17,306-Speed 2627.78 samples/sec Loss 8.8993 LearningRate 0.0422 Epoch: 7 Global Step: 290390 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:22:21,209-Speed 2624.16 samples/sec Loss 8.8399 LearningRate 0.0422 Epoch: 7 Global Step: 290400 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:22:25,170-Speed 2586.26 samples/sec Loss 8.8313 LearningRate 0.0422 Epoch: 7 Global Step: 290410 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:22:29,078-Speed 2620.80 samples/sec Loss 8.7965 LearningRate 0.0422 Epoch: 7 Global Step: 290420 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:22:32,967-Speed 2634.00 samples/sec Loss 8.7864 LearningRate 0.0422 Epoch: 7 Global Step: 290430 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:22:36,857-Speed 2632.41 samples/sec Loss 8.8116 LearningRate 0.0422 Epoch: 7 Global Step: 290440 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:22:40,752-Speed 2629.79 samples/sec Loss 8.7601 LearningRate 0.0422 Epoch: 7 Global Step: 290450 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:22:44,644-Speed 2631.34 samples/sec Loss 8.6554 LearningRate 0.0422 Epoch: 7 Global Step: 290460 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:22:48,543-Speed 2626.92 samples/sec Loss 8.9084 LearningRate 0.0422 Epoch: 7 Global Step: 290470 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:22:52,438-Speed 2629.78 samples/sec Loss 8.7083 LearningRate 0.0422 Epoch: 7 Global Step: 290480 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:22:56,314-Speed 2642.35 samples/sec Loss 8.7206 LearningRate 0.0422 Epoch: 7 Global Step: 290490 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:23:00,210-Speed 2629.06 samples/sec Loss 8.8826 LearningRate 0.0422 Epoch: 7 Global Step: 290500 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:23:04,085-Speed 2643.06 samples/sec Loss 9.6456 LearningRate 0.0422 Epoch: 7 Global Step: 290510 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:23:07,995-Speed 2619.60 samples/sec Loss 9.1266 LearningRate 0.0422 Epoch: 7 Global Step: 290520 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:23:11,907-Speed 2617.80 samples/sec Loss 10.0776 LearningRate 0.0422 Epoch: 7 Global Step: 290530 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:23:15,810-Speed 2624.77 samples/sec Loss 9.0079 LearningRate 0.0422 Epoch: 7 Global Step: 290540 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:23:19,707-Speed 2628.31 samples/sec Loss 9.0149 LearningRate 0.0422 Epoch: 7 Global Step: 290550 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:23:23,607-Speed 2626.94 samples/sec Loss 8.7510 LearningRate 0.0422 Epoch: 7 Global Step: 290560 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:23:27,506-Speed 2626.65 samples/sec Loss 8.8153 LearningRate 0.0422 Epoch: 7 Global Step: 290570 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:23:31,405-Speed 2627.26 samples/sec Loss 8.7424 LearningRate 0.0422 Epoch: 7 Global Step: 290580 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:23:35,316-Speed 2618.90 samples/sec Loss 8.7181 LearningRate 0.0422 Epoch: 7 Global Step: 290590 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:23:39,219-Speed 2624.05 samples/sec Loss 8.7930 LearningRate 0.0422 Epoch: 7 Global Step: 290600 Fp16 Grad Scale: 16384 Required: 61 hours
Training: 2022-04-14 04:23:43,113-Speed 2630.26 samples/sec Loss 8.7527 LearningRate 0.0422 Epoch: 7 Global Step: 290610 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:23:47,013-Speed 2626.93 samples/sec Loss 8.8855 LearningRate 0.0422 Epoch: 7 Global Step: 290620 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:23:50,912-Speed 2626.28 samples/sec Loss 8.6941 LearningRate 0.0422 Epoch: 7 Global Step: 290630 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:23:54,851-Speed 2600.62 samples/sec Loss 8.8424 LearningRate 0.0422 Epoch: 7 Global Step: 290640 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:23:58,756-Speed 2623.05 samples/sec Loss 8.6961 LearningRate 0.0422 Epoch: 7 Global Step: 290650 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:24:02,661-Speed 2623.00 samples/sec Loss 8.7247 LearningRate 0.0422 Epoch: 7 Global Step: 290660 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:24:06,570-Speed 2620.31 samples/sec Loss 8.8417 LearningRate 0.0422 Epoch: 7 Global Step: 290670 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:24:10,507-Speed 2601.81 samples/sec Loss 8.7811 LearningRate 0.0422 Epoch: 7 Global Step: 290680 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:24:14,414-Speed 2621.36 samples/sec Loss 8.6956 LearningRate 0.0422 Epoch: 7 Global Step: 290690 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:24:18,316-Speed 2625.08 samples/sec Loss 8.9194 LearningRate 0.0422 Epoch: 7 Global Step: 290700 Fp16 Grad Scale: 32768 Required: 61 hours
Training: 2022-04-14 04:24:22,224-Speed 2621.33 samples/sec Loss 8.8088 LearningRate 0.0422 Epoch: 7 Global Step: 290710 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:24:26,133-Speed 2619.99 samples/sec Loss 8.8212 LearningRate 0.0422 Epoch: 7 Global Step: 290720 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:24:30,035-Speed 2625.16 samples/sec Loss 8.8487 LearningRate 0.0422 Epoch: 7 Global Step: 290730 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:24:33,946-Speed 2618.99 samples/sec Loss 8.7830 LearningRate 0.0422 Epoch: 7 Global Step: 290740 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:24:37,871-Speed 2609.40 samples/sec Loss 8.8074 LearningRate 0.0422 Epoch: 7 Global Step: 290750 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:24:41,796-Speed 2609.76 samples/sec Loss 8.7769 LearningRate 0.0422 Epoch: 7 Global Step: 290760 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:24:45,701-Speed 2623.19 samples/sec Loss 8.6644 LearningRate 0.0422 Epoch: 7 Global Step: 290770 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:24:49,602-Speed 2626.08 samples/sec Loss 8.6984 LearningRate 0.0422 Epoch: 7 Global Step: 290780 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:24:53,505-Speed 2624.18 samples/sec Loss 8.9119 LearningRate 0.0422 Epoch: 7 Global Step: 290790 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:24:57,408-Speed 2624.24 samples/sec Loss 8.6952 LearningRate 0.0422 Epoch: 7 Global Step: 290800 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:25:01,313-Speed 2623.37 samples/sec Loss 8.8433 LearningRate 0.0422 Epoch: 7 Global Step: 290810 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:25:05,218-Speed 2622.36 samples/sec Loss 8.7789 LearningRate 0.0422 Epoch: 7 Global Step: 290820 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:25:09,122-Speed 2623.52 samples/sec Loss 8.6967 LearningRate 0.0422 Epoch: 7 Global Step: 290830 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:25:13,044-Speed 2611.49 samples/sec Loss 8.6908 LearningRate 0.0422 Epoch: 7 Global Step: 290840 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:25:16,953-Speed 2620.89 samples/sec Loss 8.6945 LearningRate 0.0422 Epoch: 7 Global Step: 290850 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:25:20,855-Speed 2624.64 samples/sec Loss 8.8852 LearningRate 0.0422 Epoch: 7 Global Step: 290860 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:25:24,745-Speed 2633.51 samples/sec Loss 8.7699 LearningRate 0.0422 Epoch: 7 Global Step: 290870 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:25:28,645-Speed 2626.07 samples/sec Loss 8.6910 LearningRate 0.0422 Epoch: 7 Global Step: 290880 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:25:32,547-Speed 2624.61 samples/sec Loss 8.5520 LearningRate 0.0422 Epoch: 7 Global Step: 290890 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:25:36,456-Speed 2620.84 samples/sec Loss 8.8328 LearningRate 0.0422 Epoch: 7 Global Step: 290900 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:25:40,359-Speed 2624.19 samples/sec Loss 8.7666 LearningRate 0.0422 Epoch: 7 Global Step: 290910 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:25:44,261-Speed 2624.51 samples/sec Loss 8.8398 LearningRate 0.0422 Epoch: 7 Global Step: 290920 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:25:48,170-Speed 2620.25 samples/sec Loss 8.8658 LearningRate 0.0422 Epoch: 7 Global Step: 290930 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:25:52,095-Speed 2609.72 samples/sec Loss 8.7627 LearningRate 0.0422 Epoch: 7 Global Step: 290940 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:25:56,004-Speed 2620.27 samples/sec Loss 8.7521 LearningRate 0.0422 Epoch: 7 Global Step: 290950 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:25:59,907-Speed 2624.27 samples/sec Loss 8.7851 LearningRate 0.0422 Epoch: 7 Global Step: 290960 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:26:03,822-Speed 2616.18 samples/sec Loss 8.7684 LearningRate 0.0422 Epoch: 7 Global Step: 290970 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:26:07,731-Speed 2620.09 samples/sec Loss 8.7246 LearningRate 0.0422 Epoch: 7 Global Step: 290980 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:26:11,636-Speed 2623.32 samples/sec Loss 8.7573 LearningRate 0.0422 Epoch: 7 Global Step: 290990 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:26:15,540-Speed 2623.45 samples/sec Loss 8.8260 LearningRate 0.0421 Epoch: 7 Global Step: 291000 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:26:19,444-Speed 2623.16 samples/sec Loss 8.7945 LearningRate 0.0421 Epoch: 7 Global Step: 291010 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:26:23,348-Speed 2623.92 samples/sec Loss 8.6822 LearningRate 0.0421 Epoch: 7 Global Step: 291020 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:26:27,264-Speed 2615.39 samples/sec Loss 8.7892 LearningRate 0.0421 Epoch: 7 Global Step: 291030 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:26:31,173-Speed 2620.58 samples/sec Loss 8.8364 LearningRate 0.0421 Epoch: 7 Global Step: 291040 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:26:35,078-Speed 2622.36 samples/sec Loss 8.7326 LearningRate 0.0421 Epoch: 7 Global Step: 291050 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:26:38,989-Speed 2618.92 samples/sec Loss 8.5724 LearningRate 0.0421 Epoch: 7 Global Step: 291060 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:26:42,893-Speed 2623.30 samples/sec Loss 8.8028 LearningRate 0.0421 Epoch: 7 Global Step: 291070 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:26:46,798-Speed 2623.66 samples/sec Loss 8.8966 LearningRate 0.0421 Epoch: 7 Global Step: 291080 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:26:50,704-Speed 2621.79 samples/sec Loss 8.7306 LearningRate 0.0421 Epoch: 7 Global Step: 291090 Fp16 Grad Scale: 262144 Required: 61 hours
Training: 2022-04-14 04:26:54,610-Speed 2622.89 samples/sec Loss 8.8487 LearningRate 0.0421 Epoch: 7 Global Step: 291100 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:26:58,511-Speed 2625.38 samples/sec Loss 8.7482 LearningRate 0.0421 Epoch: 7 Global Step: 291110 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:27:02,422-Speed 2618.76 samples/sec Loss 8.6437 LearningRate 0.0421 Epoch: 7 Global Step: 291120 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:27:06,329-Speed 2621.41 samples/sec Loss 8.9087 LearningRate 0.0421 Epoch: 7 Global Step: 291130 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:27:10,234-Speed 2622.46 samples/sec Loss 8.8100 LearningRate 0.0421 Epoch: 7 Global Step: 291140 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:27:14,136-Speed 2624.75 samples/sec Loss 8.7911 LearningRate 0.0421 Epoch: 7 Global Step: 291150 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:27:18,048-Speed 2618.91 samples/sec Loss 8.7321 LearningRate 0.0421 Epoch: 7 Global Step: 291160 Fp16 Grad Scale: 131072 Required: 61 hours
Training: 2022-04-14 04:27:21,927-Speed 2640.94 samples/sec Loss 8.8615 LearningRate 0.0421 Epoch: 7 Global Step: 291170 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:27:25,833-Speed 2622.03 samples/sec Loss 8.9750 LearningRate 0.0421 Epoch: 7 Global Step: 291180 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:27:29,739-Speed 2622.39 samples/sec Loss 8.6822 LearningRate 0.0421 Epoch: 7 Global Step: 291190 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:27:33,657-Speed 2613.98 samples/sec Loss 8.7759 LearningRate 0.0421 Epoch: 7 Global Step: 291200 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:27:37,565-Speed 2621.13 samples/sec Loss 8.8528 LearningRate 0.0421 Epoch: 7 Global Step: 291210 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:27:41,515-Speed 2593.00 samples/sec Loss 8.6457 LearningRate 0.0421 Epoch: 7 Global Step: 291220 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:27:45,426-Speed 2619.15 samples/sec Loss 8.7139 LearningRate 0.0421 Epoch: 7 Global Step: 291230 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:27:49,333-Speed 2621.68 samples/sec Loss 8.8526 LearningRate 0.0421 Epoch: 7 Global Step: 291240 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:27:53,241-Speed 2621.18 samples/sec Loss 8.7571 LearningRate 0.0421 Epoch: 7 Global Step: 291250 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:27:57,148-Speed 2621.06 samples/sec Loss 8.7422 LearningRate 0.0421 Epoch: 7 Global Step: 291260 Fp16 Grad Scale: 65536 Required: 61 hours
Training: 2022-04-14 04:28:01,055-Speed 2621.64 samples/sec Loss 8.8106 LearningRate 0.0421 Epoch: 7 Global Step: 291270 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:28:04,968-Speed 2618.06 samples/sec Loss 8.6496 LearningRate 0.0421 Epoch: 7 Global Step: 291280 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:28:08,869-Speed 2625.22 samples/sec Loss 8.7925 LearningRate 0.0421 Epoch: 7 Global Step: 291290 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:28:12,772-Speed 2624.11 samples/sec Loss 8.7306 LearningRate 0.0421 Epoch: 7 Global Step: 291300 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:28:16,677-Speed 2623.17 samples/sec Loss 8.6794 LearningRate 0.0421 Epoch: 7 Global Step: 291310 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:28:20,581-Speed 2623.55 samples/sec Loss 8.8286 LearningRate 0.0421 Epoch: 7 Global Step: 291320 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:28:24,489-Speed 2620.92 samples/sec Loss 8.8067 LearningRate 0.0421 Epoch: 7 Global Step: 291330 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:28:28,391-Speed 2625.26 samples/sec Loss 8.6626 LearningRate 0.0421 Epoch: 7 Global Step: 291340 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:28:32,325-Speed 2603.73 samples/sec Loss 8.7238 LearningRate 0.0421 Epoch: 7 Global Step: 291350 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:28:36,229-Speed 2623.46 samples/sec Loss 8.6466 LearningRate 0.0421 Epoch: 7 Global Step: 291360 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:28:40,140-Speed 2619.14 samples/sec Loss 8.6735 LearningRate 0.0421 Epoch: 7 Global Step: 291370 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:28:44,045-Speed 2622.68 samples/sec Loss 8.7162 LearningRate 0.0421 Epoch: 7 Global Step: 291380 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:28:47,939-Speed 2630.54 samples/sec Loss 8.7799 LearningRate 0.0421 Epoch: 7 Global Step: 291390 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:28:51,845-Speed 2622.85 samples/sec Loss 8.7210 LearningRate 0.0421 Epoch: 7 Global Step: 291400 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:28:55,757-Speed 2618.06 samples/sec Loss 8.8538 LearningRate 0.0421 Epoch: 7 Global Step: 291410 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:28:59,662-Speed 2622.36 samples/sec Loss 9.0102 LearningRate 0.0421 Epoch: 7 Global Step: 291420 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:29:03,573-Speed 2619.13 samples/sec Loss 8.7515 LearningRate 0.0421 Epoch: 7 Global Step: 291430 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:29:07,491-Speed 2614.23 samples/sec Loss 8.6658 LearningRate 0.0421 Epoch: 7 Global Step: 291440 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:29:11,403-Speed 2618.24 samples/sec Loss 8.7093 LearningRate 0.0421 Epoch: 7 Global Step: 291450 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:29:15,293-Speed 2633.06 samples/sec Loss 8.7196 LearningRate 0.0421 Epoch: 7 Global Step: 291460 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:29:19,199-Speed 2622.29 samples/sec Loss 8.7519 LearningRate 0.0421 Epoch: 7 Global Step: 291470 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:29:23,103-Speed 2623.57 samples/sec Loss 8.8411 LearningRate 0.0421 Epoch: 7 Global Step: 291480 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:29:27,007-Speed 2623.54 samples/sec Loss 8.7974 LearningRate 0.0421 Epoch: 7 Global Step: 291490 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:29:30,911-Speed 2623.55 samples/sec Loss 8.7441 LearningRate 0.0421 Epoch: 7 Global Step: 291500 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:29:34,816-Speed 2623.30 samples/sec Loss 8.7863 LearningRate 0.0421 Epoch: 7 Global Step: 291510 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:29:38,720-Speed 2623.07 samples/sec Loss 8.7152 LearningRate 0.0421 Epoch: 7 Global Step: 291520 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:29:42,642-Speed 2611.77 samples/sec Loss 8.7474 LearningRate 0.0421 Epoch: 7 Global Step: 291530 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:29:46,546-Speed 2623.41 samples/sec Loss 8.8766 LearningRate 0.0421 Epoch: 7 Global Step: 291540 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:29:50,449-Speed 2624.83 samples/sec Loss 8.7270 LearningRate 0.0421 Epoch: 7 Global Step: 291550 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:29:54,349-Speed 2626.10 samples/sec Loss 8.7656 LearningRate 0.0421 Epoch: 7 Global Step: 291560 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:29:58,254-Speed 2623.29 samples/sec Loss 8.8047 LearningRate 0.0421 Epoch: 7 Global Step: 291570 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:30:02,157-Speed 2624.26 samples/sec Loss 8.6371 LearningRate 0.0421 Epoch: 7 Global Step: 291580 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:30:06,060-Speed 2623.89 samples/sec Loss 8.7659 LearningRate 0.0421 Epoch: 7 Global Step: 291590 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:30:10,003-Speed 2597.68 samples/sec Loss 8.7534 LearningRate 0.0421 Epoch: 7 Global Step: 291600 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:30:13,909-Speed 2621.94 samples/sec Loss 8.7460 LearningRate 0.0421 Epoch: 7 Global Step: 291610 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:30:17,851-Speed 2599.18 samples/sec Loss 8.7162 LearningRate 0.0421 Epoch: 7 Global Step: 291620 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:30:21,760-Speed 2619.85 samples/sec Loss 8.5825 LearningRate 0.0421 Epoch: 7 Global Step: 291630 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:30:25,663-Speed 2624.95 samples/sec Loss 8.9060 LearningRate 0.0420 Epoch: 7 Global Step: 291640 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:30:29,564-Speed 2625.75 samples/sec Loss 8.7560 LearningRate 0.0420 Epoch: 7 Global Step: 291650 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:30:33,495-Speed 2605.03 samples/sec Loss 8.7122 LearningRate 0.0420 Epoch: 7 Global Step: 291660 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:30:37,412-Speed 2615.13 samples/sec Loss 8.8131 LearningRate 0.0420 Epoch: 7 Global Step: 291670 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:30:41,346-Speed 2603.52 samples/sec Loss 8.7482 LearningRate 0.0420 Epoch: 7 Global Step: 291680 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:30:45,257-Speed 2619.14 samples/sec Loss 8.7778 LearningRate 0.0420 Epoch: 7 Global Step: 291690 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:30:50,279-Speed 2039.61 samples/sec Loss 8.7464 LearningRate 0.0420 Epoch: 7 Global Step: 291700 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:30:54,192-Speed 2617.74 samples/sec Loss 8.7163 LearningRate 0.0420 Epoch: 7 Global Step: 291710 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:30:58,107-Speed 2617.56 samples/sec Loss 8.7921 LearningRate 0.0420 Epoch: 7 Global Step: 291720 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:31:02,017-Speed 2619.22 samples/sec Loss 8.8190 LearningRate 0.0420 Epoch: 7 Global Step: 291730 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:31:05,929-Speed 2617.77 samples/sec Loss 8.8449 LearningRate 0.0420 Epoch: 7 Global Step: 291740 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:31:09,847-Speed 2614.06 samples/sec Loss 8.8964 LearningRate 0.0420 Epoch: 7 Global Step: 291750 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:31:13,765-Speed 2614.88 samples/sec Loss 8.7647 LearningRate 0.0420 Epoch: 7 Global Step: 291760 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:31:17,677-Speed 2618.83 samples/sec Loss 8.9597 LearningRate 0.0420 Epoch: 7 Global Step: 291770 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:31:21,588-Speed 2618.26 samples/sec Loss 8.8159 LearningRate 0.0420 Epoch: 7 Global Step: 291780 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:31:25,515-Speed 2609.25 samples/sec Loss 8.7683 LearningRate 0.0420 Epoch: 7 Global Step: 291790 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:31:29,404-Speed 2633.13 samples/sec Loss 8.8441 LearningRate 0.0420 Epoch: 7 Global Step: 291800 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:31:33,352-Speed 2594.26 samples/sec Loss 8.8043 LearningRate 0.0420 Epoch: 7 Global Step: 291810 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:31:37,271-Speed 2614.00 samples/sec Loss 8.7027 LearningRate 0.0420 Epoch: 7 Global Step: 291820 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:31:41,176-Speed 2623.23 samples/sec Loss 8.8064 LearningRate 0.0420 Epoch: 7 Global Step: 291830 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:31:45,078-Speed 2624.33 samples/sec Loss 8.7223 LearningRate 0.0420 Epoch: 7 Global Step: 291840 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:31:48,989-Speed 2619.26 samples/sec Loss 8.8033 LearningRate 0.0420 Epoch: 7 Global Step: 291850 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:31:52,910-Speed 2611.62 samples/sec Loss 8.7067 LearningRate 0.0420 Epoch: 7 Global Step: 291860 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:31:56,802-Speed 2632.11 samples/sec Loss 8.9241 LearningRate 0.0420 Epoch: 7 Global Step: 291870 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:32:00,707-Speed 2622.90 samples/sec Loss 8.8993 LearningRate 0.0420 Epoch: 7 Global Step: 291880 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:32:04,622-Speed 2616.39 samples/sec Loss 8.5751 LearningRate 0.0420 Epoch: 7 Global Step: 291890 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:32:08,524-Speed 2624.99 samples/sec Loss 8.9187 LearningRate 0.0420 Epoch: 7 Global Step: 291900 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:32:12,429-Speed 2622.75 samples/sec Loss 8.8472 LearningRate 0.0420 Epoch: 7 Global Step: 291910 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:32:16,332-Speed 2624.39 samples/sec Loss 8.7701 LearningRate 0.0420 Epoch: 7 Global Step: 291920 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:32:20,350-Speed 2549.02 samples/sec Loss 8.7971 LearningRate 0.0420 Epoch: 7 Global Step: 291930 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:32:24,302-Speed 2591.56 samples/sec Loss 8.8163 LearningRate 0.0420 Epoch: 7 Global Step: 291940 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:32:28,206-Speed 2623.82 samples/sec Loss 8.8556 LearningRate 0.0420 Epoch: 7 Global Step: 291950 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:32:32,113-Speed 2621.88 samples/sec Loss 8.6865 LearningRate 0.0420 Epoch: 7 Global Step: 291960 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:32:36,020-Speed 2621.28 samples/sec Loss 8.9367 LearningRate 0.0420 Epoch: 7 Global Step: 291970 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:32:39,927-Speed 2621.51 samples/sec Loss 8.6961 LearningRate 0.0420 Epoch: 7 Global Step: 291980 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:32:43,833-Speed 2622.30 samples/sec Loss 8.6897 LearningRate 0.0420 Epoch: 7 Global Step: 291990 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:32:47,748-Speed 2616.44 samples/sec Loss 8.9166 LearningRate 0.0420 Epoch: 7 Global Step: 292000 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:32:51,661-Speed 2617.27 samples/sec Loss 8.8403 LearningRate 0.0420 Epoch: 7 Global Step: 292010 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:32:55,576-Speed 2616.20 samples/sec Loss 8.7333 LearningRate 0.0420 Epoch: 7 Global Step: 292020 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:32:59,478-Speed 2624.99 samples/sec Loss 8.9394 LearningRate 0.0420 Epoch: 7 Global Step: 292030 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:33:03,382-Speed 2623.84 samples/sec Loss 8.9444 LearningRate 0.0420 Epoch: 7 Global Step: 292040 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:33:07,291-Speed 2620.48 samples/sec Loss 8.8774 LearningRate 0.0420 Epoch: 7 Global Step: 292050 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:33:11,303-Speed 2552.36 samples/sec Loss 8.7786 LearningRate 0.0420 Epoch: 7 Global Step: 292060 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:33:15,206-Speed 2624.59 samples/sec Loss 8.7415 LearningRate 0.0420 Epoch: 7 Global Step: 292070 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:33:19,110-Speed 2623.51 samples/sec Loss 8.7790 LearningRate 0.0420 Epoch: 7 Global Step: 292080 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:33:22,998-Speed 2634.47 samples/sec Loss 8.8399 LearningRate 0.0420 Epoch: 7 Global Step: 292090 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:33:26,912-Speed 2616.68 samples/sec Loss 8.7873 LearningRate 0.0420 Epoch: 7 Global Step: 292100 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:33:30,914-Speed 2560.27 samples/sec Loss 8.8275 LearningRate 0.0420 Epoch: 7 Global Step: 292110 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:33:34,815-Speed 2625.15 samples/sec Loss 8.8031 LearningRate 0.0420 Epoch: 7 Global Step: 292120 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:33:38,719-Speed 2623.14 samples/sec Loss 8.7093 LearningRate 0.0420 Epoch: 7 Global Step: 292130 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:33:42,622-Speed 2624.86 samples/sec Loss 8.6192 LearningRate 0.0420 Epoch: 7 Global Step: 292140 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:33:46,523-Speed 2625.88 samples/sec Loss 8.7195 LearningRate 0.0420 Epoch: 7 Global Step: 292150 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:33:50,428-Speed 2622.53 samples/sec Loss 8.7747 LearningRate 0.0420 Epoch: 7 Global Step: 292160 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:33:54,329-Speed 2625.66 samples/sec Loss 8.7360 LearningRate 0.0420 Epoch: 7 Global Step: 292170 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:33:58,227-Speed 2627.53 samples/sec Loss 8.8705 LearningRate 0.0420 Epoch: 7 Global Step: 292180 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:34:02,112-Speed 2637.36 samples/sec Loss 8.8390 LearningRate 0.0420 Epoch: 7 Global Step: 292190 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:34:05,995-Speed 2637.37 samples/sec Loss 8.7839 LearningRate 0.0420 Epoch: 7 Global Step: 292200 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:34:09,897-Speed 2624.44 samples/sec Loss 8.7069 LearningRate 0.0420 Epoch: 7 Global Step: 292210 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:34:13,801-Speed 2623.68 samples/sec Loss 8.6793 LearningRate 0.0420 Epoch: 7 Global Step: 292220 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:34:17,703-Speed 2625.43 samples/sec Loss 8.7945 LearningRate 0.0420 Epoch: 7 Global Step: 292230 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:34:21,606-Speed 2623.57 samples/sec Loss 8.7945 LearningRate 0.0420 Epoch: 7 Global Step: 292240 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:34:25,510-Speed 2623.76 samples/sec Loss 8.7382 LearningRate 0.0420 Epoch: 7 Global Step: 292250 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:34:29,411-Speed 2625.76 samples/sec Loss 9.0063 LearningRate 0.0420 Epoch: 7 Global Step: 292260 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:34:33,314-Speed 2624.30 samples/sec Loss 8.8589 LearningRate 0.0420 Epoch: 7 Global Step: 292270 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:34:37,215-Speed 2625.78 samples/sec Loss 8.6967 LearningRate 0.0419 Epoch: 7 Global Step: 292280 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:34:41,114-Speed 2626.54 samples/sec Loss 8.8070 LearningRate 0.0419 Epoch: 7 Global Step: 292290 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:34:45,019-Speed 2623.74 samples/sec Loss 8.7595 LearningRate 0.0419 Epoch: 7 Global Step: 292300 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:34:48,923-Speed 2623.62 samples/sec Loss 8.6963 LearningRate 0.0419 Epoch: 7 Global Step: 292310 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:34:52,860-Speed 2601.86 samples/sec Loss 8.7918 LearningRate 0.0419 Epoch: 7 Global Step: 292320 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:34:56,774-Speed 2616.78 samples/sec Loss 8.8219 LearningRate 0.0419 Epoch: 7 Global Step: 292330 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:35:00,676-Speed 2624.82 samples/sec Loss 8.7501 LearningRate 0.0419 Epoch: 7 Global Step: 292340 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:35:04,578-Speed 2624.63 samples/sec Loss 8.8259 LearningRate 0.0419 Epoch: 7 Global Step: 292350 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:35:08,477-Speed 2627.74 samples/sec Loss 8.6324 LearningRate 0.0419 Epoch: 7 Global Step: 292360 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:35:12,376-Speed 2626.69 samples/sec Loss 8.7403 LearningRate 0.0419 Epoch: 7 Global Step: 292370 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:35:16,280-Speed 2623.52 samples/sec Loss 8.8079 LearningRate 0.0419 Epoch: 7 Global Step: 292380 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:35:20,253-Speed 2578.92 samples/sec Loss 8.7048 LearningRate 0.0419 Epoch: 7 Global Step: 292390 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:35:24,152-Speed 2626.45 samples/sec Loss 8.6606 LearningRate 0.0419 Epoch: 7 Global Step: 292400 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:35:28,050-Speed 2627.73 samples/sec Loss 8.6633 LearningRate 0.0419 Epoch: 7 Global Step: 292410 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:35:31,950-Speed 2626.04 samples/sec Loss 8.8200 LearningRate 0.0419 Epoch: 7 Global Step: 292420 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:35:35,849-Speed 2627.08 samples/sec Loss 8.7832 LearningRate 0.0419 Epoch: 7 Global Step: 292430 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:35:39,751-Speed 2624.50 samples/sec Loss 8.7788 LearningRate 0.0419 Epoch: 7 Global Step: 292440 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:35:43,659-Speed 2621.37 samples/sec Loss 8.8007 LearningRate 0.0419 Epoch: 7 Global Step: 292450 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:35:47,574-Speed 2615.83 samples/sec Loss 8.7769 LearningRate 0.0419 Epoch: 7 Global Step: 292460 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:35:51,487-Speed 2617.93 samples/sec Loss 8.7666 LearningRate 0.0419 Epoch: 7 Global Step: 292470 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:35:55,400-Speed 2617.42 samples/sec Loss 8.6618 LearningRate 0.0419 Epoch: 7 Global Step: 292480 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:35:59,307-Speed 2621.26 samples/sec Loss 8.8372 LearningRate 0.0419 Epoch: 7 Global Step: 292490 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:36:03,220-Speed 2617.54 samples/sec Loss 8.7619 LearningRate 0.0419 Epoch: 7 Global Step: 292500 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:36:07,112-Speed 2631.89 samples/sec Loss 8.7046 LearningRate 0.0419 Epoch: 7 Global Step: 292510 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:36:10,974-Speed 2651.56 samples/sec Loss 8.7303 LearningRate 0.0419 Epoch: 7 Global Step: 292520 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:36:14,883-Speed 2620.01 samples/sec Loss 8.7242 LearningRate 0.0419 Epoch: 7 Global Step: 292530 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:36:18,785-Speed 2625.08 samples/sec Loss 8.8278 LearningRate 0.0419 Epoch: 7 Global Step: 292540 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:36:22,688-Speed 2623.77 samples/sec Loss 8.8623 LearningRate 0.0419 Epoch: 7 Global Step: 292550 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:36:26,587-Speed 2627.22 samples/sec Loss 8.8410 LearningRate 0.0419 Epoch: 7 Global Step: 292560 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:36:30,492-Speed 2623.28 samples/sec Loss 8.8693 LearningRate 0.0419 Epoch: 7 Global Step: 292570 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:36:34,396-Speed 2623.48 samples/sec Loss 8.7892 LearningRate 0.0419 Epoch: 7 Global Step: 292580 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:36:38,293-Speed 2628.20 samples/sec Loss 8.7430 LearningRate 0.0419 Epoch: 7 Global Step: 292590 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:36:42,262-Speed 2580.75 samples/sec Loss 8.8558 LearningRate 0.0419 Epoch: 7 Global Step: 292600 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:36:46,163-Speed 2625.30 samples/sec Loss 8.7325 LearningRate 0.0419 Epoch: 7 Global Step: 292610 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:36:50,064-Speed 2625.24 samples/sec Loss 8.7593 LearningRate 0.0419 Epoch: 7 Global Step: 292620 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:36:53,973-Speed 2620.29 samples/sec Loss 8.7877 LearningRate 0.0419 Epoch: 7 Global Step: 292630 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:36:57,881-Speed 2621.45 samples/sec Loss 8.8322 LearningRate 0.0419 Epoch: 7 Global Step: 292640 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:37:01,789-Speed 2620.75 samples/sec Loss 8.6892 LearningRate 0.0419 Epoch: 7 Global Step: 292650 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:37:05,689-Speed 2626.49 samples/sec Loss 8.6122 LearningRate 0.0419 Epoch: 7 Global Step: 292660 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:37:09,588-Speed 2626.84 samples/sec Loss 8.7900 LearningRate 0.0419 Epoch: 7 Global Step: 292670 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:37:13,489-Speed 2625.60 samples/sec Loss 8.7391 LearningRate 0.0419 Epoch: 7 Global Step: 292680 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:37:17,389-Speed 2626.45 samples/sec Loss 8.8374 LearningRate 0.0419 Epoch: 7 Global Step: 292690 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:37:21,290-Speed 2625.77 samples/sec Loss 8.6994 LearningRate 0.0419 Epoch: 7 Global Step: 292700 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:37:25,188-Speed 2627.48 samples/sec Loss 8.7866 LearningRate 0.0419 Epoch: 7 Global Step: 292710 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:37:29,087-Speed 2626.63 samples/sec Loss 8.7670 LearningRate 0.0419 Epoch: 7 Global Step: 292720 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:37:32,989-Speed 2624.68 samples/sec Loss 8.6591 LearningRate 0.0419 Epoch: 7 Global Step: 292730 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:37:37,259-Speed 2398.72 samples/sec Loss 8.6771 LearningRate 0.0419 Epoch: 7 Global Step: 292740 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:37:41,167-Speed 2620.91 samples/sec Loss 8.6213 LearningRate 0.0419 Epoch: 7 Global Step: 292750 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:37:45,067-Speed 2626.57 samples/sec Loss 8.6862 LearningRate 0.0419 Epoch: 7 Global Step: 292760 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:37:48,964-Speed 2628.03 samples/sec Loss 8.6957 LearningRate 0.0419 Epoch: 7 Global Step: 292770 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:37:52,866-Speed 2625.11 samples/sec Loss 8.6844 LearningRate 0.0419 Epoch: 7 Global Step: 292780 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:37:56,763-Speed 2628.14 samples/sec Loss 8.7776 LearningRate 0.0419 Epoch: 7 Global Step: 292790 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:38:00,663-Speed 2626.52 samples/sec Loss 8.8204 LearningRate 0.0419 Epoch: 7 Global Step: 292800 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:38:04,564-Speed 2625.01 samples/sec Loss 8.7251 LearningRate 0.0419 Epoch: 7 Global Step: 292810 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:38:08,469-Speed 2622.78 samples/sec Loss 8.7416 LearningRate 0.0419 Epoch: 7 Global Step: 292820 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:38:12,369-Speed 2626.20 samples/sec Loss 8.7414 LearningRate 0.0419 Epoch: 7 Global Step: 292830 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:38:16,251-Speed 2638.71 samples/sec Loss 8.7858 LearningRate 0.0419 Epoch: 7 Global Step: 292840 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:38:20,150-Speed 2626.77 samples/sec Loss 8.7377 LearningRate 0.0419 Epoch: 7 Global Step: 292850 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:38:24,046-Speed 2629.21 samples/sec Loss 8.8350 LearningRate 0.0419 Epoch: 7 Global Step: 292860 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:38:27,946-Speed 2626.34 samples/sec Loss 8.8064 LearningRate 0.0419 Epoch: 7 Global Step: 292870 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:38:31,851-Speed 2622.47 samples/sec Loss 8.8011 LearningRate 0.0419 Epoch: 7 Global Step: 292880 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:38:35,749-Speed 2627.56 samples/sec Loss 8.7361 LearningRate 0.0419 Epoch: 7 Global Step: 292890 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:38:39,646-Speed 2628.20 samples/sec Loss 8.5805 LearningRate 0.0419 Epoch: 7 Global Step: 292900 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:38:43,543-Speed 2628.33 samples/sec Loss 8.7746 LearningRate 0.0419 Epoch: 7 Global Step: 292910 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:38:47,442-Speed 2626.50 samples/sec Loss 8.7687 LearningRate 0.0418 Epoch: 7 Global Step: 292920 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:38:51,344-Speed 2625.08 samples/sec Loss 8.9179 LearningRate 0.0418 Epoch: 7 Global Step: 292930 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:38:55,256-Speed 2618.39 samples/sec Loss 8.6856 LearningRate 0.0418 Epoch: 7 Global Step: 292940 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:38:59,168-Speed 2617.74 samples/sec Loss 8.7664 LearningRate 0.0418 Epoch: 7 Global Step: 292950 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:39:03,042-Speed 2644.82 samples/sec Loss 8.7556 LearningRate 0.0418 Epoch: 7 Global Step: 292960 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:39:06,939-Speed 2628.08 samples/sec Loss 8.7595 LearningRate 0.0418 Epoch: 7 Global Step: 292970 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:39:10,841-Speed 2624.99 samples/sec Loss 8.6338 LearningRate 0.0418 Epoch: 7 Global Step: 292980 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:39:14,743-Speed 2624.63 samples/sec Loss 8.8289 LearningRate 0.0418 Epoch: 7 Global Step: 292990 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:39:18,642-Speed 2626.85 samples/sec Loss 8.5993 LearningRate 0.0418 Epoch: 7 Global Step: 293000 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:39:22,542-Speed 2625.85 samples/sec Loss 8.8501 LearningRate 0.0418 Epoch: 7 Global Step: 293010 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:39:26,442-Speed 2626.62 samples/sec Loss 8.6968 LearningRate 0.0418 Epoch: 7 Global Step: 293020 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:39:30,341-Speed 2626.64 samples/sec Loss 8.7869 LearningRate 0.0418 Epoch: 7 Global Step: 293030 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:39:34,259-Speed 2614.63 samples/sec Loss 8.8365 LearningRate 0.0418 Epoch: 7 Global Step: 293040 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:39:38,153-Speed 2629.95 samples/sec Loss 8.6822 LearningRate 0.0418 Epoch: 7 Global Step: 293050 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:39:42,052-Speed 2627.12 samples/sec Loss 8.8005 LearningRate 0.0418 Epoch: 7 Global Step: 293060 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:39:45,947-Speed 2629.59 samples/sec Loss 8.8405 LearningRate 0.0418 Epoch: 7 Global Step: 293070 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:39:49,824-Speed 2641.69 samples/sec Loss 8.6516 LearningRate 0.0418 Epoch: 7 Global Step: 293080 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:39:53,720-Speed 2629.08 samples/sec Loss 8.6694 LearningRate 0.0418 Epoch: 7 Global Step: 293090 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:39:57,616-Speed 2628.67 samples/sec Loss 8.7429 LearningRate 0.0418 Epoch: 7 Global Step: 293100 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:40:01,511-Speed 2629.90 samples/sec Loss 8.7842 LearningRate 0.0418 Epoch: 7 Global Step: 293110 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:40:05,405-Speed 2630.05 samples/sec Loss 8.7618 LearningRate 0.0418 Epoch: 7 Global Step: 293120 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:40:09,305-Speed 2626.32 samples/sec Loss 8.7699 LearningRate 0.0418 Epoch: 7 Global Step: 293130 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:40:13,199-Speed 2629.81 samples/sec Loss 8.7929 LearningRate 0.0418 Epoch: 7 Global Step: 293140 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:40:17,099-Speed 2627.04 samples/sec Loss 8.9325 LearningRate 0.0418 Epoch: 7 Global Step: 293150 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:40:20,997-Speed 2627.75 samples/sec Loss 8.6316 LearningRate 0.0418 Epoch: 7 Global Step: 293160 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:40:24,894-Speed 2628.04 samples/sec Loss 8.6855 LearningRate 0.0418 Epoch: 7 Global Step: 293170 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:40:28,795-Speed 2625.86 samples/sec Loss 8.8138 LearningRate 0.0418 Epoch: 7 Global Step: 293180 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:40:32,692-Speed 2628.09 samples/sec Loss 8.6398 LearningRate 0.0418 Epoch: 7 Global Step: 293190 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:40:36,587-Speed 2629.46 samples/sec Loss 8.8381 LearningRate 0.0418 Epoch: 7 Global Step: 293200 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:40:40,528-Speed 2598.83 samples/sec Loss 8.7146 LearningRate 0.0418 Epoch: 7 Global Step: 293210 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:40:44,420-Speed 2631.69 samples/sec Loss 8.8164 LearningRate 0.0418 Epoch: 7 Global Step: 293220 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:40:48,316-Speed 2628.49 samples/sec Loss 8.7560 LearningRate 0.0418 Epoch: 7 Global Step: 293230 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:40:52,228-Speed 2618.99 samples/sec Loss 8.7227 LearningRate 0.0418 Epoch: 7 Global Step: 293240 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:40:56,121-Speed 2630.50 samples/sec Loss 8.7887 LearningRate 0.0418 Epoch: 7 Global Step: 293250 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:41:00,018-Speed 2628.38 samples/sec Loss 8.6370 LearningRate 0.0418 Epoch: 7 Global Step: 293260 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:41:03,897-Speed 2640.59 samples/sec Loss 8.7375 LearningRate 0.0418 Epoch: 7 Global Step: 293270 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:41:07,804-Speed 2621.53 samples/sec Loss 8.6615 LearningRate 0.0418 Epoch: 7 Global Step: 293280 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:41:11,708-Speed 2623.02 samples/sec Loss 8.6697 LearningRate 0.0418 Epoch: 7 Global Step: 293290 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:41:15,638-Speed 2606.37 samples/sec Loss 8.8314 LearningRate 0.0418 Epoch: 7 Global Step: 293300 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:41:19,545-Speed 2621.90 samples/sec Loss 8.8119 LearningRate 0.0418 Epoch: 7 Global Step: 293310 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:41:23,438-Speed 2631.00 samples/sec Loss 8.8069 LearningRate 0.0418 Epoch: 7 Global Step: 293320 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:41:27,344-Speed 2622.28 samples/sec Loss 8.7726 LearningRate 0.0418 Epoch: 7 Global Step: 293330 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:41:31,245-Speed 2625.70 samples/sec Loss 8.6816 LearningRate 0.0418 Epoch: 7 Global Step: 293340 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:41:35,140-Speed 2629.42 samples/sec Loss 8.6903 LearningRate 0.0418 Epoch: 7 Global Step: 293350 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:41:39,034-Speed 2630.27 samples/sec Loss 8.7214 LearningRate 0.0418 Epoch: 7 Global Step: 293360 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:41:42,928-Speed 2630.92 samples/sec Loss 8.6950 LearningRate 0.0418 Epoch: 7 Global Step: 293370 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:41:46,821-Speed 2630.88 samples/sec Loss 8.6635 LearningRate 0.0418 Epoch: 7 Global Step: 293380 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:41:50,717-Speed 2629.13 samples/sec Loss 8.8100 LearningRate 0.0418 Epoch: 7 Global Step: 293390 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:41:54,652-Speed 2602.79 samples/sec Loss 8.8197 LearningRate 0.0418 Epoch: 7 Global Step: 293400 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:41:58,551-Speed 2627.39 samples/sec Loss 8.7551 LearningRate 0.0418 Epoch: 7 Global Step: 293410 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:42:02,455-Speed 2623.81 samples/sec Loss 8.6565 LearningRate 0.0418 Epoch: 7 Global Step: 293420 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:42:06,352-Speed 2628.10 samples/sec Loss 8.7155 LearningRate 0.0418 Epoch: 7 Global Step: 293430 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:42:10,246-Speed 2629.57 samples/sec Loss 8.8379 LearningRate 0.0418 Epoch: 7 Global Step: 293440 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:42:14,155-Speed 2620.71 samples/sec Loss 8.6190 LearningRate 0.0418 Epoch: 7 Global Step: 293450 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:42:18,060-Speed 2622.97 samples/sec Loss 8.7832 LearningRate 0.0418 Epoch: 7 Global Step: 293460 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:42:21,960-Speed 2626.46 samples/sec Loss 8.6596 LearningRate 0.0418 Epoch: 7 Global Step: 293470 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:42:25,886-Speed 2608.80 samples/sec Loss 8.7730 LearningRate 0.0418 Epoch: 7 Global Step: 293480 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:42:29,794-Speed 2621.63 samples/sec Loss 8.8352 LearningRate 0.0418 Epoch: 7 Global Step: 293490 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:42:33,679-Speed 2636.36 samples/sec Loss 8.7043 LearningRate 0.0418 Epoch: 7 Global Step: 293500 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:42:37,585-Speed 2622.20 samples/sec Loss 8.8199 LearningRate 0.0418 Epoch: 7 Global Step: 293510 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:42:41,484-Speed 2626.53 samples/sec Loss 8.7841 LearningRate 0.0418 Epoch: 7 Global Step: 293520 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:42:45,379-Speed 2629.09 samples/sec Loss 8.7525 LearningRate 0.0418 Epoch: 7 Global Step: 293530 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:42:49,279-Speed 2626.54 samples/sec Loss 8.8122 LearningRate 0.0418 Epoch: 7 Global Step: 293540 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:42:53,197-Speed 2614.74 samples/sec Loss 8.7009 LearningRate 0.0418 Epoch: 7 Global Step: 293550 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:42:57,120-Speed 2611.36 samples/sec Loss 9.3674 LearningRate 0.0417 Epoch: 7 Global Step: 293560 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:43:01,085-Speed 2582.79 samples/sec Loss 9.5308 LearningRate 0.0417 Epoch: 7 Global Step: 293570 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:43:04,994-Speed 2619.92 samples/sec Loss 8.9210 LearningRate 0.0417 Epoch: 7 Global Step: 293580 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:43:08,901-Speed 2621.84 samples/sec Loss 8.8632 LearningRate 0.0417 Epoch: 7 Global Step: 293590 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:43:12,847-Speed 2595.80 samples/sec Loss 8.6328 LearningRate 0.0417 Epoch: 7 Global Step: 293600 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:43:16,753-Speed 2621.83 samples/sec Loss 8.9378 LearningRate 0.0417 Epoch: 7 Global Step: 293610 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:43:20,751-Speed 2562.41 samples/sec Loss 8.7668 LearningRate 0.0417 Epoch: 7 Global Step: 293620 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:43:24,683-Speed 2604.58 samples/sec Loss 9.4611 LearningRate 0.0417 Epoch: 7 Global Step: 293630 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:43:28,631-Speed 2594.98 samples/sec Loss 9.2657 LearningRate 0.0417 Epoch: 7 Global Step: 293640 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:43:32,530-Speed 2626.50 samples/sec Loss 8.9565 LearningRate 0.0417 Epoch: 7 Global Step: 293650 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:43:36,426-Speed 2628.82 samples/sec Loss 8.7954 LearningRate 0.0417 Epoch: 7 Global Step: 293660 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:43:40,329-Speed 2623.92 samples/sec Loss 8.8260 LearningRate 0.0417 Epoch: 7 Global Step: 293670 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:43:44,236-Speed 2621.69 samples/sec Loss 8.8460 LearningRate 0.0417 Epoch: 7 Global Step: 293680 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:43:48,137-Speed 2625.63 samples/sec Loss 8.8512 LearningRate 0.0417 Epoch: 7 Global Step: 293690 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:43:52,032-Speed 2630.23 samples/sec Loss 8.7707 LearningRate 0.0417 Epoch: 7 Global Step: 293700 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:43:55,930-Speed 2627.76 samples/sec Loss 8.8947 LearningRate 0.0417 Epoch: 7 Global Step: 293710 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:43:59,830-Speed 2626.42 samples/sec Loss 8.8077 LearningRate 0.0417 Epoch: 7 Global Step: 293720 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:44:03,730-Speed 2625.90 samples/sec Loss 8.9748 LearningRate 0.0417 Epoch: 7 Global Step: 293730 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:44:07,629-Speed 2627.06 samples/sec Loss 8.8469 LearningRate 0.0417 Epoch: 7 Global Step: 293740 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:44:11,525-Speed 2629.08 samples/sec Loss 8.9237 LearningRate 0.0417 Epoch: 7 Global Step: 293750 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:44:15,434-Speed 2619.51 samples/sec Loss 8.8734 LearningRate 0.0417 Epoch: 7 Global Step: 293760 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:44:19,338-Speed 2624.19 samples/sec Loss 8.8400 LearningRate 0.0417 Epoch: 7 Global Step: 293770 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:44:23,227-Speed 2633.54 samples/sec Loss 8.7822 LearningRate 0.0417 Epoch: 7 Global Step: 293780 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:44:27,136-Speed 2620.86 samples/sec Loss 8.8896 LearningRate 0.0417 Epoch: 7 Global Step: 293790 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:44:31,028-Speed 2631.55 samples/sec Loss 8.8413 LearningRate 0.0417 Epoch: 7 Global Step: 293800 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:44:34,929-Speed 2625.84 samples/sec Loss 8.8373 LearningRate 0.0417 Epoch: 7 Global Step: 293810 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:44:38,832-Speed 2623.95 samples/sec Loss 8.7889 LearningRate 0.0417 Epoch: 7 Global Step: 293820 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:44:42,733-Speed 2626.02 samples/sec Loss 8.7525 LearningRate 0.0417 Epoch: 7 Global Step: 293830 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:44:46,799-Speed 2518.86 samples/sec Loss 8.7748 LearningRate 0.0417 Epoch: 7 Global Step: 293840 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:44:50,713-Speed 2616.81 samples/sec Loss 8.8581 LearningRate 0.0417 Epoch: 7 Global Step: 293850 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:44:54,640-Speed 2608.34 samples/sec Loss 8.6766 LearningRate 0.0417 Epoch: 7 Global Step: 293860 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:44:58,536-Speed 2628.80 samples/sec Loss 8.7198 LearningRate 0.0417 Epoch: 7 Global Step: 293870 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:45:02,522-Speed 2570.31 samples/sec Loss 8.8588 LearningRate 0.0417 Epoch: 7 Global Step: 293880 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:45:06,420-Speed 2627.47 samples/sec Loss 8.8323 LearningRate 0.0417 Epoch: 7 Global Step: 293890 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:45:10,337-Speed 2614.82 samples/sec Loss 8.7371 LearningRate 0.0417 Epoch: 7 Global Step: 293900 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:45:14,249-Speed 2618.09 samples/sec Loss 8.8152 LearningRate 0.0417 Epoch: 7 Global Step: 293910 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:45:18,165-Speed 2615.63 samples/sec Loss 8.7105 LearningRate 0.0417 Epoch: 7 Global Step: 293920 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:45:22,073-Speed 2621.23 samples/sec Loss 8.7342 LearningRate 0.0417 Epoch: 7 Global Step: 293930 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:45:25,966-Speed 2631.13 samples/sec Loss 8.6956 LearningRate 0.0417 Epoch: 7 Global Step: 293940 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:45:29,861-Speed 2629.77 samples/sec Loss 8.7261 LearningRate 0.0417 Epoch: 7 Global Step: 293950 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:45:33,754-Speed 2630.97 samples/sec Loss 8.8336 LearningRate 0.0417 Epoch: 7 Global Step: 293960 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:45:37,648-Speed 2630.71 samples/sec Loss 8.8861 LearningRate 0.0417 Epoch: 7 Global Step: 293970 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:45:41,539-Speed 2632.20 samples/sec Loss 8.7111 LearningRate 0.0417 Epoch: 7 Global Step: 293980 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:45:45,432-Speed 2630.97 samples/sec Loss 8.7516 LearningRate 0.0417 Epoch: 7 Global Step: 293990 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:45:49,338-Speed 2622.50 samples/sec Loss 8.8241 LearningRate 0.0417 Epoch: 7 Global Step: 294000 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:45:53,246-Speed 2621.19 samples/sec Loss 8.8791 LearningRate 0.0417 Epoch: 7 Global Step: 294010 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:45:57,143-Speed 2628.36 samples/sec Loss 8.7999 LearningRate 0.0417 Epoch: 7 Global Step: 294020 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:46:01,046-Speed 2623.78 samples/sec Loss 8.7519 LearningRate 0.0417 Epoch: 7 Global Step: 294030 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:46:04,943-Speed 2628.68 samples/sec Loss 8.7849 LearningRate 0.0417 Epoch: 7 Global Step: 294040 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:46:08,860-Speed 2615.12 samples/sec Loss 8.8512 LearningRate 0.0417 Epoch: 7 Global Step: 294050 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:46:12,788-Speed 2607.31 samples/sec Loss 8.7843 LearningRate 0.0417 Epoch: 7 Global Step: 294060 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:46:16,682-Speed 2630.53 samples/sec Loss 8.6921 LearningRate 0.0417 Epoch: 7 Global Step: 294070 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:46:20,579-Speed 2628.47 samples/sec Loss 8.6886 LearningRate 0.0417 Epoch: 7 Global Step: 294080 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:46:24,456-Speed 2642.14 samples/sec Loss 8.7521 LearningRate 0.0417 Epoch: 7 Global Step: 294090 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:46:28,347-Speed 2631.94 samples/sec Loss 8.6902 LearningRate 0.0417 Epoch: 7 Global Step: 294100 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:46:32,240-Speed 2630.85 samples/sec Loss 8.7986 LearningRate 0.0417 Epoch: 7 Global Step: 294110 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:46:36,133-Speed 2631.12 samples/sec Loss 8.7286 LearningRate 0.0417 Epoch: 7 Global Step: 294120 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:46:40,047-Speed 2616.98 samples/sec Loss 8.8391 LearningRate 0.0417 Epoch: 7 Global Step: 294130 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:46:43,955-Speed 2621.65 samples/sec Loss 8.6807 LearningRate 0.0417 Epoch: 7 Global Step: 294140 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:46:47,860-Speed 2622.46 samples/sec Loss 8.7728 LearningRate 0.0417 Epoch: 7 Global Step: 294150 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:46:51,776-Speed 2616.43 samples/sec Loss 8.7919 LearningRate 0.0417 Epoch: 7 Global Step: 294160 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:46:55,689-Speed 2617.12 samples/sec Loss 8.7245 LearningRate 0.0417 Epoch: 7 Global Step: 294170 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:46:59,630-Speed 2599.06 samples/sec Loss 8.7146 LearningRate 0.0417 Epoch: 7 Global Step: 294180 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:47:03,525-Speed 2629.30 samples/sec Loss 8.9039 LearningRate 0.0417 Epoch: 7 Global Step: 294190 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:47:07,419-Speed 2631.06 samples/sec Loss 9.1688 LearningRate 0.0416 Epoch: 7 Global Step: 294200 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:47:11,311-Speed 2631.36 samples/sec Loss 8.6836 LearningRate 0.0416 Epoch: 7 Global Step: 294210 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:47:15,227-Speed 2615.46 samples/sec Loss 8.7322 LearningRate 0.0416 Epoch: 7 Global Step: 294220 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:47:19,274-Speed 2531.31 samples/sec Loss 8.7779 LearningRate 0.0416 Epoch: 7 Global Step: 294230 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:47:23,168-Speed 2629.82 samples/sec Loss 8.9031 LearningRate 0.0416 Epoch: 7 Global Step: 294240 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:47:27,062-Speed 2630.89 samples/sec Loss 8.7394 LearningRate 0.0416 Epoch: 7 Global Step: 294250 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:47:30,954-Speed 2631.66 samples/sec Loss 8.7134 LearningRate 0.0416 Epoch: 7 Global Step: 294260 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:47:34,853-Speed 2627.02 samples/sec Loss 8.6432 LearningRate 0.0416 Epoch: 7 Global Step: 294270 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:47:38,747-Speed 2630.17 samples/sec Loss 8.7634 LearningRate 0.0416 Epoch: 7 Global Step: 294280 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:47:42,642-Speed 2629.60 samples/sec Loss 8.5009 LearningRate 0.0416 Epoch: 7 Global Step: 294290 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:47:46,539-Speed 2628.30 samples/sec Loss 8.8659 LearningRate 0.0416 Epoch: 7 Global Step: 294300 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:47:50,451-Speed 2618.54 samples/sec Loss 8.7234 LearningRate 0.0416 Epoch: 7 Global Step: 294310 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:47:54,358-Speed 2621.17 samples/sec Loss 9.0650 LearningRate 0.0416 Epoch: 7 Global Step: 294320 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:47:58,286-Speed 2608.30 samples/sec Loss 9.2601 LearningRate 0.0416 Epoch: 7 Global Step: 294330 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:48:02,191-Speed 2623.06 samples/sec Loss 8.6971 LearningRate 0.0416 Epoch: 7 Global Step: 294340 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:48:06,093-Speed 2624.97 samples/sec Loss 8.6390 LearningRate 0.0416 Epoch: 7 Global Step: 294350 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:48:10,001-Speed 2621.01 samples/sec Loss 8.6849 LearningRate 0.0416 Epoch: 7 Global Step: 294360 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:48:13,903-Speed 2624.68 samples/sec Loss 8.7182 LearningRate 0.0416 Epoch: 7 Global Step: 294370 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:48:17,812-Speed 2620.49 samples/sec Loss 8.7751 LearningRate 0.0416 Epoch: 7 Global Step: 294380 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:48:21,717-Speed 2623.58 samples/sec Loss 8.8332 LearningRate 0.0416 Epoch: 7 Global Step: 294390 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:48:25,619-Speed 2624.26 samples/sec Loss 8.8013 LearningRate 0.0416 Epoch: 7 Global Step: 294400 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:48:29,526-Speed 2622.18 samples/sec Loss 8.7941 LearningRate 0.0416 Epoch: 7 Global Step: 294410 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:48:33,421-Speed 2629.65 samples/sec Loss 8.5362 LearningRate 0.0416 Epoch: 7 Global Step: 294420 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:48:37,317-Speed 2628.58 samples/sec Loss 8.8894 LearningRate 0.0416 Epoch: 7 Global Step: 294430 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:48:41,213-Speed 2629.11 samples/sec Loss 8.8172 LearningRate 0.0416 Epoch: 7 Global Step: 294440 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:48:45,107-Speed 2630.40 samples/sec Loss 8.7417 LearningRate 0.0416 Epoch: 7 Global Step: 294450 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:48:49,014-Speed 2622.00 samples/sec Loss 8.6788 LearningRate 0.0416 Epoch: 7 Global Step: 294460 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:48:52,909-Speed 2629.12 samples/sec Loss 8.7313 LearningRate 0.0416 Epoch: 7 Global Step: 294470 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:48:56,821-Speed 2618.74 samples/sec Loss 8.7369 LearningRate 0.0416 Epoch: 7 Global Step: 294480 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:49:00,722-Speed 2625.38 samples/sec Loss 8.6453 LearningRate 0.0416 Epoch: 7 Global Step: 294490 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:49:04,621-Speed 2626.87 samples/sec Loss 8.7640 LearningRate 0.0416 Epoch: 7 Global Step: 294500 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:49:08,520-Speed 2626.70 samples/sec Loss 8.7678 LearningRate 0.0416 Epoch: 7 Global Step: 294510 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:49:12,417-Speed 2628.84 samples/sec Loss 8.6859 LearningRate 0.0416 Epoch: 7 Global Step: 294520 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:49:16,318-Speed 2625.33 samples/sec Loss 8.6972 LearningRate 0.0416 Epoch: 7 Global Step: 294530 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:49:20,225-Speed 2621.81 samples/sec Loss 8.8166 LearningRate 0.0416 Epoch: 7 Global Step: 294540 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:49:24,116-Speed 2632.10 samples/sec Loss 8.7408 LearningRate 0.0416 Epoch: 7 Global Step: 294550 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:49:28,012-Speed 2629.10 samples/sec Loss 8.7748 LearningRate 0.0416 Epoch: 7 Global Step: 294560 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:49:31,909-Speed 2628.76 samples/sec Loss 8.7638 LearningRate 0.0416 Epoch: 7 Global Step: 294570 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:49:35,775-Speed 2648.93 samples/sec Loss 9.6169 LearningRate 0.0416 Epoch: 7 Global Step: 294580 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:49:39,667-Speed 2631.53 samples/sec Loss 9.1071 LearningRate 0.0416 Epoch: 7 Global Step: 294590 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:49:43,561-Speed 2634.75 samples/sec Loss 8.8230 LearningRate 0.0416 Epoch: 7 Global Step: 294600 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:49:47,458-Speed 2628.58 samples/sec Loss 8.7909 LearningRate 0.0416 Epoch: 7 Global Step: 294610 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:49:51,350-Speed 2631.79 samples/sec Loss 8.7722 LearningRate 0.0416 Epoch: 7 Global Step: 294620 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:49:55,242-Speed 2631.49 samples/sec Loss 8.7162 LearningRate 0.0416 Epoch: 7 Global Step: 294630 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:49:59,135-Speed 2630.98 samples/sec Loss 8.7688 LearningRate 0.0416 Epoch: 7 Global Step: 294640 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:50:03,031-Speed 2628.93 samples/sec Loss 8.8540 LearningRate 0.0416 Epoch: 7 Global Step: 294650 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:50:06,923-Speed 2631.71 samples/sec Loss 8.7050 LearningRate 0.0416 Epoch: 7 Global Step: 294660 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:50:10,821-Speed 2627.61 samples/sec Loss 8.8674 LearningRate 0.0416 Epoch: 7 Global Step: 294670 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:50:14,716-Speed 2629.68 samples/sec Loss 8.8424 LearningRate 0.0416 Epoch: 7 Global Step: 294680 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:50:18,614-Speed 2627.45 samples/sec Loss 8.7645 LearningRate 0.0416 Epoch: 7 Global Step: 294690 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:50:22,532-Speed 2613.86 samples/sec Loss 8.8857 LearningRate 0.0416 Epoch: 7 Global Step: 294700 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:50:26,432-Speed 2626.54 samples/sec Loss 8.7948 LearningRate 0.0416 Epoch: 7 Global Step: 294710 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:50:30,329-Speed 2628.17 samples/sec Loss 8.8063 LearningRate 0.0416 Epoch: 7 Global Step: 294720 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:50:34,230-Speed 2625.81 samples/sec Loss 8.6679 LearningRate 0.0416 Epoch: 7 Global Step: 294730 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:50:38,125-Speed 2630.09 samples/sec Loss 8.6284 LearningRate 0.0416 Epoch: 7 Global Step: 294740 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:50:42,022-Speed 2627.85 samples/sec Loss 8.6810 LearningRate 0.0416 Epoch: 7 Global Step: 294750 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:50:45,917-Speed 2629.28 samples/sec Loss 8.8773 LearningRate 0.0416 Epoch: 7 Global Step: 294760 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:50:49,816-Speed 2627.48 samples/sec Loss 8.7453 LearningRate 0.0416 Epoch: 7 Global Step: 294770 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:50:53,721-Speed 2622.65 samples/sec Loss 8.7031 LearningRate 0.0416 Epoch: 7 Global Step: 294780 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:50:57,649-Speed 2607.97 samples/sec Loss 8.7584 LearningRate 0.0416 Epoch: 7 Global Step: 294790 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:51:01,545-Speed 2629.18 samples/sec Loss 8.6941 LearningRate 0.0416 Epoch: 7 Global Step: 294800 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:51:05,453-Speed 2620.53 samples/sec Loss 8.8336 LearningRate 0.0416 Epoch: 7 Global Step: 294810 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:51:09,342-Speed 2633.68 samples/sec Loss 8.7132 LearningRate 0.0416 Epoch: 7 Global Step: 294820 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:51:13,253-Speed 2619.41 samples/sec Loss 8.6420 LearningRate 0.0416 Epoch: 7 Global Step: 294830 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:51:17,185-Speed 2604.80 samples/sec Loss 8.7512 LearningRate 0.0415 Epoch: 7 Global Step: 294840 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:51:21,116-Speed 2605.84 samples/sec Loss 8.8586 LearningRate 0.0415 Epoch: 7 Global Step: 294850 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:51:25,021-Speed 2623.00 samples/sec Loss 8.8046 LearningRate 0.0415 Epoch: 7 Global Step: 294860 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:51:28,957-Speed 2602.06 samples/sec Loss 8.8028 LearningRate 0.0415 Epoch: 7 Global Step: 294870 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:51:32,876-Speed 2613.74 samples/sec Loss 8.7540 LearningRate 0.0415 Epoch: 7 Global Step: 294880 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:51:36,784-Speed 2621.35 samples/sec Loss 8.8894 LearningRate 0.0415 Epoch: 7 Global Step: 294890 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:51:40,660-Speed 2642.31 samples/sec Loss 8.7653 LearningRate 0.0415 Epoch: 7 Global Step: 294900 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:51:44,562-Speed 2625.09 samples/sec Loss 8.8143 LearningRate 0.0415 Epoch: 7 Global Step: 294910 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:51:48,460-Speed 2627.46 samples/sec Loss 8.6922 LearningRate 0.0415 Epoch: 7 Global Step: 294920 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:51:52,364-Speed 2623.14 samples/sec Loss 8.6293 LearningRate 0.0415 Epoch: 7 Global Step: 294930 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:51:56,358-Speed 2565.12 samples/sec Loss 8.8658 LearningRate 0.0415 Epoch: 7 Global Step: 294940 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:52:00,251-Speed 2630.62 samples/sec Loss 8.7273 LearningRate 0.0415 Epoch: 7 Global Step: 294950 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:52:04,145-Speed 2630.64 samples/sec Loss 8.8544 LearningRate 0.0415 Epoch: 7 Global Step: 294960 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:52:08,084-Speed 2600.17 samples/sec Loss 8.6800 LearningRate 0.0415 Epoch: 7 Global Step: 294970 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:52:12,063-Speed 2573.78 samples/sec Loss 8.9891 LearningRate 0.0415 Epoch: 7 Global Step: 294980 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:52:15,962-Speed 2627.14 samples/sec Loss 8.8475 LearningRate 0.0415 Epoch: 7 Global Step: 294990 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:52:19,859-Speed 2628.11 samples/sec Loss 8.7313 LearningRate 0.0415 Epoch: 7 Global Step: 295000 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:52:23,756-Speed 2628.34 samples/sec Loss 8.8441 LearningRate 0.0415 Epoch: 7 Global Step: 295010 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:52:27,659-Speed 2624.42 samples/sec Loss 8.8184 LearningRate 0.0415 Epoch: 7 Global Step: 295020 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:52:31,555-Speed 2629.19 samples/sec Loss 8.8022 LearningRate 0.0415 Epoch: 7 Global Step: 295030 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:52:35,456-Speed 2625.40 samples/sec Loss 8.6722 LearningRate 0.0415 Epoch: 7 Global Step: 295040 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:52:39,358-Speed 2625.09 samples/sec Loss 8.5694 LearningRate 0.0415 Epoch: 7 Global Step: 295050 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:52:43,439-Speed 2509.85 samples/sec Loss 8.7389 LearningRate 0.0415 Epoch: 7 Global Step: 295060 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:52:47,337-Speed 2628.27 samples/sec Loss 8.7558 LearningRate 0.0415 Epoch: 7 Global Step: 295070 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:52:51,233-Speed 2628.76 samples/sec Loss 8.7563 LearningRate 0.0415 Epoch: 7 Global Step: 295080 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:52:55,129-Speed 2629.69 samples/sec Loss 8.7258 LearningRate 0.0415 Epoch: 7 Global Step: 295090 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:52:59,025-Speed 2628.65 samples/sec Loss 8.6634 LearningRate 0.0415 Epoch: 7 Global Step: 295100 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:53:02,923-Speed 2627.66 samples/sec Loss 8.7014 LearningRate 0.0415 Epoch: 7 Global Step: 295110 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:53:06,802-Speed 2640.62 samples/sec Loss 8.7412 LearningRate 0.0415 Epoch: 7 Global Step: 295120 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:53:10,749-Speed 2594.92 samples/sec Loss 8.7460 LearningRate 0.0415 Epoch: 7 Global Step: 295130 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:53:14,664-Speed 2616.95 samples/sec Loss 8.8211 LearningRate 0.0415 Epoch: 7 Global Step: 295140 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:53:18,580-Speed 2615.77 samples/sec Loss 8.6728 LearningRate 0.0415 Epoch: 7 Global Step: 295150 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:53:22,522-Speed 2597.93 samples/sec Loss 8.7093 LearningRate 0.0415 Epoch: 7 Global Step: 295160 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:53:26,437-Speed 2616.62 samples/sec Loss 8.7562 LearningRate 0.0415 Epoch: 7 Global Step: 295170 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:53:30,350-Speed 2617.82 samples/sec Loss 8.6242 LearningRate 0.0415 Epoch: 7 Global Step: 295180 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:53:34,263-Speed 2617.06 samples/sec Loss 8.7194 LearningRate 0.0415 Epoch: 7 Global Step: 295190 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:53:38,173-Speed 2619.73 samples/sec Loss 8.6563 LearningRate 0.0415 Epoch: 7 Global Step: 295200 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:53:42,090-Speed 2614.30 samples/sec Loss 8.7036 LearningRate 0.0415 Epoch: 7 Global Step: 295210 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:53:46,008-Speed 2614.88 samples/sec Loss 8.6472 LearningRate 0.0415 Epoch: 7 Global Step: 295220 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:53:49,911-Speed 2624.88 samples/sec Loss 8.7779 LearningRate 0.0415 Epoch: 7 Global Step: 295230 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:53:53,828-Speed 2614.74 samples/sec Loss 8.6634 LearningRate 0.0415 Epoch: 7 Global Step: 295240 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:53:57,727-Speed 2627.41 samples/sec Loss 8.7565 LearningRate 0.0415 Epoch: 7 Global Step: 295250 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:54:01,621-Speed 2630.29 samples/sec Loss 8.7069 LearningRate 0.0415 Epoch: 7 Global Step: 295260 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:54:05,522-Speed 2624.99 samples/sec Loss 8.7182 LearningRate 0.0415 Epoch: 7 Global Step: 295270 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:54:09,417-Speed 2629.83 samples/sec Loss 8.7246 LearningRate 0.0415 Epoch: 7 Global Step: 295280 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:54:13,310-Speed 2631.65 samples/sec Loss 8.7254 LearningRate 0.0415 Epoch: 7 Global Step: 295290 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:54:17,206-Speed 2628.52 samples/sec Loss 8.6835 LearningRate 0.0415 Epoch: 7 Global Step: 295300 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:54:21,144-Speed 2600.90 samples/sec Loss 8.7974 LearningRate 0.0415 Epoch: 7 Global Step: 295310 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:54:25,040-Speed 2628.98 samples/sec Loss 8.6359 LearningRate 0.0415 Epoch: 7 Global Step: 295320 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:54:28,937-Speed 2629.07 samples/sec Loss 8.6681 LearningRate 0.0415 Epoch: 7 Global Step: 295330 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:54:32,860-Speed 2610.31 samples/sec Loss 8.7082 LearningRate 0.0415 Epoch: 7 Global Step: 295340 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:54:36,757-Speed 2628.41 samples/sec Loss 8.6576 LearningRate 0.0415 Epoch: 7 Global Step: 295350 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:54:40,672-Speed 2616.33 samples/sec Loss 8.7503 LearningRate 0.0415 Epoch: 7 Global Step: 295360 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:54:44,582-Speed 2619.79 samples/sec Loss 8.8370 LearningRate 0.0415 Epoch: 7 Global Step: 295370 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:54:48,488-Speed 2622.29 samples/sec Loss 8.7437 LearningRate 0.0415 Epoch: 7 Global Step: 295380 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:54:52,399-Speed 2618.49 samples/sec Loss 8.7317 LearningRate 0.0415 Epoch: 7 Global Step: 295390 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:54:56,298-Speed 2627.46 samples/sec Loss 8.7662 LearningRate 0.0415 Epoch: 7 Global Step: 295400 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:55:00,188-Speed 2633.48 samples/sec Loss 8.7187 LearningRate 0.0415 Epoch: 7 Global Step: 295410 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:55:04,119-Speed 2605.08 samples/sec Loss 8.7507 LearningRate 0.0415 Epoch: 7 Global Step: 295420 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:55:08,011-Speed 2631.75 samples/sec Loss 8.6980 LearningRate 0.0415 Epoch: 7 Global Step: 295430 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:55:11,905-Speed 2629.67 samples/sec Loss 8.7405 LearningRate 0.0415 Epoch: 7 Global Step: 295440 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:55:15,801-Speed 2629.60 samples/sec Loss 8.8168 LearningRate 0.0415 Epoch: 7 Global Step: 295450 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:55:19,713-Speed 2618.66 samples/sec Loss 8.7811 LearningRate 0.0415 Epoch: 7 Global Step: 295460 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:55:23,607-Speed 2630.27 samples/sec Loss 8.7370 LearningRate 0.0415 Epoch: 7 Global Step: 295470 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:55:27,447-Speed 2667.37 samples/sec Loss 9.0950 LearningRate 0.0415 Epoch: 7 Global Step: 295480 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 04:55:31,349-Speed 2624.93 samples/sec Loss 8.8500 LearningRate 0.0414 Epoch: 7 Global Step: 295490 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 04:55:35,242-Speed 2630.99 samples/sec Loss 8.7658 LearningRate 0.0414 Epoch: 7 Global Step: 295500 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 04:55:39,132-Speed 2632.79 samples/sec Loss 8.6482 LearningRate 0.0414 Epoch: 7 Global Step: 295510 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 04:55:43,030-Speed 2627.82 samples/sec Loss 8.7964 LearningRate 0.0414 Epoch: 7 Global Step: 295520 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 04:55:46,922-Speed 2631.66 samples/sec Loss 8.7322 LearningRate 0.0414 Epoch: 7 Global Step: 295530 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 04:55:50,814-Speed 2631.67 samples/sec Loss 8.7511 LearningRate 0.0414 Epoch: 7 Global Step: 295540 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 04:55:54,713-Speed 2627.01 samples/sec Loss 8.7640 LearningRate 0.0414 Epoch: 7 Global Step: 295550 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 04:55:58,622-Speed 2620.33 samples/sec Loss 8.6434 LearningRate 0.0414 Epoch: 7 Global Step: 295560 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 04:56:02,521-Speed 2627.25 samples/sec Loss 8.7873 LearningRate 0.0414 Epoch: 7 Global Step: 295570 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 04:56:06,523-Speed 2558.79 samples/sec Loss 8.6608 LearningRate 0.0414 Epoch: 7 Global Step: 295580 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:56:10,459-Speed 2602.15 samples/sec Loss 8.6680 LearningRate 0.0414 Epoch: 7 Global Step: 295590 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:56:14,403-Speed 2597.46 samples/sec Loss 8.7566 LearningRate 0.0414 Epoch: 7 Global Step: 295600 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:56:18,395-Speed 2565.51 samples/sec Loss 8.8387 LearningRate 0.0414 Epoch: 7 Global Step: 295610 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:56:22,365-Speed 2580.24 samples/sec Loss 8.6428 LearningRate 0.0414 Epoch: 7 Global Step: 295620 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:56:26,259-Speed 2630.47 samples/sec Loss 8.7572 LearningRate 0.0414 Epoch: 7 Global Step: 295630 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:56:30,153-Speed 2630.36 samples/sec Loss 8.8638 LearningRate 0.0414 Epoch: 7 Global Step: 295640 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:56:34,044-Speed 2631.98 samples/sec Loss 8.8450 LearningRate 0.0414 Epoch: 7 Global Step: 295650 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:56:37,946-Speed 2625.23 samples/sec Loss 8.8729 LearningRate 0.0414 Epoch: 7 Global Step: 295660 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:56:41,853-Speed 2620.97 samples/sec Loss 8.7834 LearningRate 0.0414 Epoch: 7 Global Step: 295670 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 04:56:45,766-Speed 2617.71 samples/sec Loss 8.7186 LearningRate 0.0414 Epoch: 7 Global Step: 295680 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:56:49,668-Speed 2624.83 samples/sec Loss 8.7128 LearningRate 0.0414 Epoch: 7 Global Step: 295690 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:56:53,588-Speed 2612.58 samples/sec Loss 8.7111 LearningRate 0.0414 Epoch: 7 Global Step: 295700 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:56:57,487-Speed 2627.64 samples/sec Loss 8.8224 LearningRate 0.0414 Epoch: 7 Global Step: 295710 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:57:01,382-Speed 2629.43 samples/sec Loss 8.7858 LearningRate 0.0414 Epoch: 7 Global Step: 295720 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:57:05,280-Speed 2627.98 samples/sec Loss 8.8385 LearningRate 0.0414 Epoch: 7 Global Step: 295730 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:57:09,203-Speed 2610.33 samples/sec Loss 8.8047 LearningRate 0.0414 Epoch: 7 Global Step: 295740 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:57:13,104-Speed 2625.81 samples/sec Loss 8.7842 LearningRate 0.0414 Epoch: 7 Global Step: 295750 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:57:17,004-Speed 2626.22 samples/sec Loss 8.8455 LearningRate 0.0414 Epoch: 7 Global Step: 295760 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:57:20,904-Speed 2626.37 samples/sec Loss 8.7067 LearningRate 0.0414 Epoch: 7 Global Step: 295770 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:57:24,817-Speed 2617.75 samples/sec Loss 8.7691 LearningRate 0.0414 Epoch: 7 Global Step: 295780 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:57:28,717-Speed 2625.70 samples/sec Loss 8.7381 LearningRate 0.0414 Epoch: 7 Global Step: 295790 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:57:32,612-Speed 2630.55 samples/sec Loss 8.7323 LearningRate 0.0414 Epoch: 7 Global Step: 295800 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:57:36,502-Speed 2632.98 samples/sec Loss 8.6742 LearningRate 0.0414 Epoch: 7 Global Step: 295810 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:57:40,377-Speed 2642.96 samples/sec Loss 8.7261 LearningRate 0.0414 Epoch: 7 Global Step: 295820 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:57:44,271-Speed 2630.37 samples/sec Loss 8.7105 LearningRate 0.0414 Epoch: 7 Global Step: 295830 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:57:48,181-Speed 2619.47 samples/sec Loss 8.8610 LearningRate 0.0414 Epoch: 7 Global Step: 295840 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:57:52,080-Speed 2627.09 samples/sec Loss 8.7596 LearningRate 0.0414 Epoch: 7 Global Step: 295850 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:57:55,971-Speed 2632.75 samples/sec Loss 8.8516 LearningRate 0.0414 Epoch: 7 Global Step: 295860 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:57:59,864-Speed 2630.84 samples/sec Loss 8.9096 LearningRate 0.0414 Epoch: 7 Global Step: 295870 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:58:03,758-Speed 2630.37 samples/sec Loss 8.5406 LearningRate 0.0414 Epoch: 7 Global Step: 295880 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:58:07,650-Speed 2632.22 samples/sec Loss 8.6950 LearningRate 0.0414 Epoch: 7 Global Step: 295890 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:58:11,541-Speed 2631.62 samples/sec Loss 8.7615 LearningRate 0.0414 Epoch: 7 Global Step: 295900 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:58:15,436-Speed 2629.57 samples/sec Loss 8.7855 LearningRate 0.0414 Epoch: 7 Global Step: 295910 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 04:58:19,343-Speed 2622.08 samples/sec Loss 8.8171 LearningRate 0.0414 Epoch: 7 Global Step: 295920 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:58:23,276-Speed 2604.18 samples/sec Loss 8.8148 LearningRate 0.0414 Epoch: 7 Global Step: 295930 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:58:27,182-Speed 2621.80 samples/sec Loss 8.7104 LearningRate 0.0414 Epoch: 7 Global Step: 295940 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:58:31,082-Speed 2627.12 samples/sec Loss 8.8743 LearningRate 0.0414 Epoch: 7 Global Step: 295950 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:58:34,978-Speed 2628.88 samples/sec Loss 8.6682 LearningRate 0.0414 Epoch: 7 Global Step: 295960 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:58:38,880-Speed 2624.94 samples/sec Loss 8.7541 LearningRate 0.0414 Epoch: 7 Global Step: 295970 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:58:42,799-Speed 2613.21 samples/sec Loss 8.7517 LearningRate 0.0414 Epoch: 7 Global Step: 295980 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:58:46,709-Speed 2619.90 samples/sec Loss 8.5924 LearningRate 0.0414 Epoch: 7 Global Step: 295990 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:58:50,615-Speed 2622.32 samples/sec Loss 8.6849 LearningRate 0.0414 Epoch: 7 Global Step: 296000 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:58:54,522-Speed 2621.87 samples/sec Loss 8.7039 LearningRate 0.0414 Epoch: 7 Global Step: 296010 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 04:58:58,420-Speed 2627.50 samples/sec Loss 8.7917 LearningRate 0.0414 Epoch: 7 Global Step: 296020 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:59:02,316-Speed 2629.13 samples/sec Loss 8.8864 LearningRate 0.0414 Epoch: 7 Global Step: 296030 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:59:06,221-Speed 2622.36 samples/sec Loss 8.5722 LearningRate 0.0414 Epoch: 7 Global Step: 296040 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:59:10,992-Speed 2147.04 samples/sec Loss 8.7141 LearningRate 0.0414 Epoch: 7 Global Step: 296050 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:59:14,881-Speed 2633.52 samples/sec Loss 8.6772 LearningRate 0.0414 Epoch: 7 Global Step: 296060 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:59:18,774-Speed 2630.91 samples/sec Loss 8.7612 LearningRate 0.0414 Epoch: 7 Global Step: 296070 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:59:22,671-Speed 2629.05 samples/sec Loss 8.8424 LearningRate 0.0414 Epoch: 7 Global Step: 296080 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:59:26,563-Speed 2631.71 samples/sec Loss 8.7760 LearningRate 0.0414 Epoch: 7 Global Step: 296090 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:59:30,455-Speed 2631.45 samples/sec Loss 8.8602 LearningRate 0.0414 Epoch: 7 Global Step: 296100 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:59:34,349-Speed 2630.39 samples/sec Loss 8.8089 LearningRate 0.0414 Epoch: 7 Global Step: 296110 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:59:38,244-Speed 2629.47 samples/sec Loss 8.7100 LearningRate 0.0414 Epoch: 7 Global Step: 296120 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:59:42,135-Speed 2631.97 samples/sec Loss 8.6249 LearningRate 0.0413 Epoch: 7 Global Step: 296130 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:59:46,040-Speed 2623.40 samples/sec Loss 8.6295 LearningRate 0.0413 Epoch: 7 Global Step: 296140 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 04:59:49,913-Speed 2644.93 samples/sec Loss 8.7503 LearningRate 0.0413 Epoch: 7 Global Step: 296150 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:59:53,805-Speed 2631.44 samples/sec Loss 8.7408 LearningRate 0.0413 Epoch: 7 Global Step: 296160 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 04:59:57,713-Speed 2621.51 samples/sec Loss 8.6590 LearningRate 0.0413 Epoch: 7 Global Step: 296170 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:00:01,605-Speed 2631.74 samples/sec Loss 8.7676 LearningRate 0.0413 Epoch: 7 Global Step: 296180 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:00:05,523-Speed 2613.77 samples/sec Loss 8.7279 LearningRate 0.0413 Epoch: 7 Global Step: 296190 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:00:09,430-Speed 2621.62 samples/sec Loss 8.7892 LearningRate 0.0413 Epoch: 7 Global Step: 296200 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:00:13,327-Speed 2628.88 samples/sec Loss 8.7405 LearningRate 0.0413 Epoch: 7 Global Step: 296210 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:00:17,250-Speed 2612.12 samples/sec Loss 8.6858 LearningRate 0.0413 Epoch: 7 Global Step: 296220 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:00:21,142-Speed 2631.53 samples/sec Loss 8.7197 LearningRate 0.0413 Epoch: 7 Global Step: 296230 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:00:25,058-Speed 2615.53 samples/sec Loss 8.7615 LearningRate 0.0413 Epoch: 7 Global Step: 296240 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:00:29,037-Speed 2574.41 samples/sec Loss 8.8666 LearningRate 0.0413 Epoch: 7 Global Step: 296250 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:00:32,919-Speed 2638.34 samples/sec Loss 8.7076 LearningRate 0.0413 Epoch: 7 Global Step: 296260 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:00:36,822-Speed 2624.51 samples/sec Loss 8.7959 LearningRate 0.0413 Epoch: 7 Global Step: 296270 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:00:40,727-Speed 2622.79 samples/sec Loss 8.7235 LearningRate 0.0413 Epoch: 7 Global Step: 296280 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:00:44,630-Speed 2624.75 samples/sec Loss 8.7333 LearningRate 0.0413 Epoch: 7 Global Step: 296290 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:00:48,532-Speed 2624.90 samples/sec Loss 8.7447 LearningRate 0.0413 Epoch: 7 Global Step: 296300 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:00:52,426-Speed 2630.62 samples/sec Loss 8.5678 LearningRate 0.0413 Epoch: 7 Global Step: 296310 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:00:56,371-Speed 2595.82 samples/sec Loss 8.6196 LearningRate 0.0413 Epoch: 7 Global Step: 296320 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:01:00,299-Speed 2608.38 samples/sec Loss 8.8279 LearningRate 0.0413 Epoch: 7 Global Step: 296330 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:01:04,209-Speed 2619.83 samples/sec Loss 8.7722 LearningRate 0.0413 Epoch: 7 Global Step: 296340 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:01:08,110-Speed 2625.44 samples/sec Loss 8.6233 LearningRate 0.0413 Epoch: 7 Global Step: 296350 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:01:12,007-Speed 2628.02 samples/sec Loss 8.6491 LearningRate 0.0413 Epoch: 7 Global Step: 296360 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:01:15,925-Speed 2613.95 samples/sec Loss 8.7661 LearningRate 0.0413 Epoch: 7 Global Step: 296370 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:01:19,835-Speed 2619.84 samples/sec Loss 8.7274 LearningRate 0.0413 Epoch: 7 Global Step: 296380 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:01:23,750-Speed 2616.41 samples/sec Loss 8.7630 LearningRate 0.0413 Epoch: 7 Global Step: 296390 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:01:27,675-Speed 2609.28 samples/sec Loss 8.8526 LearningRate 0.0413 Epoch: 7 Global Step: 296400 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:01:31,560-Speed 2637.13 samples/sec Loss 8.6730 LearningRate 0.0413 Epoch: 7 Global Step: 296410 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:01:35,462-Speed 2624.95 samples/sec Loss 8.6557 LearningRate 0.0413 Epoch: 7 Global Step: 296420 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:01:39,365-Speed 2623.84 samples/sec Loss 8.7498 LearningRate 0.0413 Epoch: 7 Global Step: 296430 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:01:43,276-Speed 2619.09 samples/sec Loss 8.6804 LearningRate 0.0413 Epoch: 7 Global Step: 296440 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:01:47,187-Speed 2618.50 samples/sec Loss 8.8154 LearningRate 0.0413 Epoch: 7 Global Step: 296450 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:01:51,072-Speed 2636.78 samples/sec Loss 9.5079 LearningRate 0.0413 Epoch: 7 Global Step: 296460 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:01:54,984-Speed 2618.07 samples/sec Loss 9.4364 LearningRate 0.0413 Epoch: 7 Global Step: 296470 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:01:58,897-Speed 2617.50 samples/sec Loss 8.9499 LearningRate 0.0413 Epoch: 7 Global Step: 296480 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:02:02,780-Speed 2637.63 samples/sec Loss 9.1301 LearningRate 0.0413 Epoch: 7 Global Step: 296490 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:02:06,675-Speed 2629.69 samples/sec Loss 8.8027 LearningRate 0.0413 Epoch: 7 Global Step: 296500 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:02:10,582-Speed 2621.56 samples/sec Loss 8.8797 LearningRate 0.0413 Epoch: 7 Global Step: 296510 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:02:14,495-Speed 2618.20 samples/sec Loss 8.7494 LearningRate 0.0413 Epoch: 7 Global Step: 296520 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:02:18,400-Speed 2622.87 samples/sec Loss 8.9801 LearningRate 0.0413 Epoch: 7 Global Step: 296530 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:02:22,333-Speed 2603.76 samples/sec Loss 8.7040 LearningRate 0.0413 Epoch: 7 Global Step: 296540 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:02:26,370-Speed 2537.74 samples/sec Loss 8.9457 LearningRate 0.0413 Epoch: 7 Global Step: 296550 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:02:30,296-Speed 2609.05 samples/sec Loss 8.8109 LearningRate 0.0413 Epoch: 7 Global Step: 296560 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:02:34,240-Speed 2597.02 samples/sec Loss 8.8017 LearningRate 0.0413 Epoch: 7 Global Step: 296570 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:02:38,140-Speed 2625.74 samples/sec Loss 8.8670 LearningRate 0.0413 Epoch: 7 Global Step: 296580 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:02:42,044-Speed 2623.84 samples/sec Loss 8.6444 LearningRate 0.0413 Epoch: 7 Global Step: 296590 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:02:45,969-Speed 2609.59 samples/sec Loss 8.6779 LearningRate 0.0413 Epoch: 7 Global Step: 296600 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:02:49,915-Speed 2595.74 samples/sec Loss 8.6197 LearningRate 0.0413 Epoch: 7 Global Step: 296610 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:02:53,809-Speed 2629.94 samples/sec Loss 8.8237 LearningRate 0.0413 Epoch: 7 Global Step: 296620 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:02:57,703-Speed 2630.86 samples/sec Loss 8.9255 LearningRate 0.0413 Epoch: 7 Global Step: 296630 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:03:01,639-Speed 2601.95 samples/sec Loss 8.8743 LearningRate 0.0413 Epoch: 7 Global Step: 296640 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:03:05,531-Speed 2632.00 samples/sec Loss 8.8341 LearningRate 0.0413 Epoch: 7 Global Step: 296650 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:03:09,423-Speed 2631.19 samples/sec Loss 8.8796 LearningRate 0.0413 Epoch: 7 Global Step: 296660 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:03:13,314-Speed 2632.10 samples/sec Loss 8.8677 LearningRate 0.0413 Epoch: 7 Global Step: 296670 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:03:17,207-Speed 2631.16 samples/sec Loss 8.7732 LearningRate 0.0413 Epoch: 7 Global Step: 296680 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:03:21,098-Speed 2633.21 samples/sec Loss 8.7431 LearningRate 0.0413 Epoch: 7 Global Step: 296690 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:03:24,990-Speed 2631.42 samples/sec Loss 8.6340 LearningRate 0.0413 Epoch: 7 Global Step: 296700 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:03:28,881-Speed 2632.32 samples/sec Loss 8.8378 LearningRate 0.0413 Epoch: 7 Global Step: 296710 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:03:32,772-Speed 2632.28 samples/sec Loss 8.7200 LearningRate 0.0413 Epoch: 7 Global Step: 296720 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:03:36,664-Speed 2631.63 samples/sec Loss 8.7064 LearningRate 0.0413 Epoch: 7 Global Step: 296730 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:03:40,560-Speed 2629.24 samples/sec Loss 8.7237 LearningRate 0.0413 Epoch: 7 Global Step: 296740 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:03:44,452-Speed 2631.17 samples/sec Loss 8.8157 LearningRate 0.0413 Epoch: 7 Global Step: 296750 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:03:48,351-Speed 2626.87 samples/sec Loss 8.7266 LearningRate 0.0413 Epoch: 7 Global Step: 296760 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:03:52,249-Speed 2627.78 samples/sec Loss 8.7055 LearningRate 0.0413 Epoch: 7 Global Step: 296770 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:03:56,152-Speed 2624.29 samples/sec Loss 8.6098 LearningRate 0.0412 Epoch: 7 Global Step: 296780 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:04:00,052-Speed 2626.73 samples/sec Loss 8.6474 LearningRate 0.0412 Epoch: 7 Global Step: 296790 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:04:03,944-Speed 2632.55 samples/sec Loss 8.6915 LearningRate 0.0412 Epoch: 7 Global Step: 296800 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:04:07,817-Speed 2644.24 samples/sec Loss 8.8474 LearningRate 0.0412 Epoch: 7 Global Step: 296810 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:04:11,706-Speed 2633.32 samples/sec Loss 8.6236 LearningRate 0.0412 Epoch: 7 Global Step: 296820 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:04:15,613-Speed 2621.91 samples/sec Loss 8.6695 LearningRate 0.0412 Epoch: 7 Global Step: 296830 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:04:19,503-Speed 2633.22 samples/sec Loss 8.6896 LearningRate 0.0412 Epoch: 7 Global Step: 296840 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:04:23,396-Speed 2630.65 samples/sec Loss 8.7826 LearningRate 0.0412 Epoch: 7 Global Step: 296850 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:04:27,445-Speed 2530.42 samples/sec Loss 8.5391 LearningRate 0.0412 Epoch: 7 Global Step: 296860 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:04:31,327-Speed 2638.35 samples/sec Loss 8.5693 LearningRate 0.0412 Epoch: 7 Global Step: 296870 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:04:35,219-Speed 2632.02 samples/sec Loss 8.5983 LearningRate 0.0412 Epoch: 7 Global Step: 296880 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:04:39,119-Speed 2625.82 samples/sec Loss 8.5391 LearningRate 0.0412 Epoch: 7 Global Step: 296890 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:04:43,001-Speed 2638.56 samples/sec Loss 8.5836 LearningRate 0.0412 Epoch: 7 Global Step: 296900 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:04:46,895-Speed 2630.70 samples/sec Loss 8.6271 LearningRate 0.0412 Epoch: 7 Global Step: 296910 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:04:50,787-Speed 2631.95 samples/sec Loss 8.7976 LearningRate 0.0412 Epoch: 7 Global Step: 296920 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:04:54,686-Speed 2626.83 samples/sec Loss 8.5946 LearningRate 0.0412 Epoch: 7 Global Step: 296930 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:04:58,597-Speed 2618.64 samples/sec Loss 8.5478 LearningRate 0.0412 Epoch: 7 Global Step: 296940 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:05:02,488-Speed 2632.34 samples/sec Loss 8.8146 LearningRate 0.0412 Epoch: 7 Global Step: 296950 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:05:06,378-Speed 2633.16 samples/sec Loss 8.7793 LearningRate 0.0412 Epoch: 7 Global Step: 296960 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:05:10,267-Speed 2634.24 samples/sec Loss 8.6596 LearningRate 0.0412 Epoch: 7 Global Step: 296970 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:05:14,156-Speed 2633.26 samples/sec Loss 8.6492 LearningRate 0.0412 Epoch: 7 Global Step: 296980 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:05:18,054-Speed 2627.11 samples/sec Loss 8.5724 LearningRate 0.0412 Epoch: 7 Global Step: 296990 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:05:21,946-Speed 2632.19 samples/sec Loss 8.7432 LearningRate 0.0412 Epoch: 7 Global Step: 297000 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:05:25,840-Speed 2630.93 samples/sec Loss 8.7314 LearningRate 0.0412 Epoch: 7 Global Step: 297010 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:05:29,725-Speed 2636.15 samples/sec Loss 8.5690 LearningRate 0.0412 Epoch: 7 Global Step: 297020 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:05:33,599-Speed 2644.14 samples/sec Loss 8.6563 LearningRate 0.0412 Epoch: 7 Global Step: 297030 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:05:37,491-Speed 2631.48 samples/sec Loss 8.6714 LearningRate 0.0412 Epoch: 7 Global Step: 297040 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:05:41,383-Speed 2631.77 samples/sec Loss 8.6710 LearningRate 0.0412 Epoch: 7 Global Step: 297050 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:05:45,277-Speed 2630.70 samples/sec Loss 8.6953 LearningRate 0.0412 Epoch: 7 Global Step: 297060 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:05:49,164-Speed 2634.55 samples/sec Loss 8.9223 LearningRate 0.0412 Epoch: 7 Global Step: 297070 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:05:53,056-Speed 2632.54 samples/sec Loss 8.7365 LearningRate 0.0412 Epoch: 7 Global Step: 297080 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:05:56,944-Speed 2634.13 samples/sec Loss 8.8094 LearningRate 0.0412 Epoch: 7 Global Step: 297090 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:06:00,834-Speed 2632.91 samples/sec Loss 8.6971 LearningRate 0.0412 Epoch: 7 Global Step: 297100 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:06:04,730-Speed 2628.68 samples/sec Loss 8.8590 LearningRate 0.0412 Epoch: 7 Global Step: 297110 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:06:08,633-Speed 2624.65 samples/sec Loss 8.7375 LearningRate 0.0412 Epoch: 7 Global Step: 297120 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:06:12,526-Speed 2630.42 samples/sec Loss 8.6669 LearningRate 0.0412 Epoch: 7 Global Step: 297130 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:06:16,426-Speed 2626.74 samples/sec Loss 8.6073 LearningRate 0.0412 Epoch: 7 Global Step: 297140 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:06:20,315-Speed 2633.26 samples/sec Loss 8.6257 LearningRate 0.0412 Epoch: 7 Global Step: 297150 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:06:24,206-Speed 2632.91 samples/sec Loss 8.5388 LearningRate 0.0412 Epoch: 7 Global Step: 297160 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:06:28,108-Speed 2625.19 samples/sec Loss 8.6973 LearningRate 0.0412 Epoch: 7 Global Step: 297170 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:06:31,997-Speed 2633.28 samples/sec Loss 8.7175 LearningRate 0.0412 Epoch: 7 Global Step: 297180 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:06:35,904-Speed 2621.44 samples/sec Loss 8.7733 LearningRate 0.0412 Epoch: 7 Global Step: 297190 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:06:39,798-Speed 2630.41 samples/sec Loss 8.5766 LearningRate 0.0412 Epoch: 7 Global Step: 297200 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:06:43,691-Speed 2630.70 samples/sec Loss 8.6508 LearningRate 0.0412 Epoch: 7 Global Step: 297210 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:06:47,582-Speed 2632.41 samples/sec Loss 8.7525 LearningRate 0.0412 Epoch: 7 Global Step: 297220 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:06:51,479-Speed 2628.60 samples/sec Loss 8.6406 LearningRate 0.0412 Epoch: 7 Global Step: 297230 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:06:55,382-Speed 2623.68 samples/sec Loss 8.6799 LearningRate 0.0412 Epoch: 7 Global Step: 297240 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:06:59,261-Speed 2640.92 samples/sec Loss 8.6497 LearningRate 0.0412 Epoch: 7 Global Step: 297250 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:07:03,185-Speed 2610.09 samples/sec Loss 8.7255 LearningRate 0.0412 Epoch: 7 Global Step: 297260 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:07:07,085-Speed 2626.09 samples/sec Loss 8.6291 LearningRate 0.0412 Epoch: 7 Global Step: 297270 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:07:11,006-Speed 2612.50 samples/sec Loss 8.7373 LearningRate 0.0412 Epoch: 7 Global Step: 297280 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:07:14,908-Speed 2625.07 samples/sec Loss 8.7764 LearningRate 0.0412 Epoch: 7 Global Step: 297290 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:07:18,805-Speed 2628.47 samples/sec Loss 8.5489 LearningRate 0.0412 Epoch: 7 Global Step: 297300 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:07:22,717-Speed 2618.25 samples/sec Loss 8.6191 LearningRate 0.0412 Epoch: 7 Global Step: 297310 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:07:26,619-Speed 2625.39 samples/sec Loss 8.4798 LearningRate 0.0412 Epoch: 7 Global Step: 297320 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:07:30,530-Speed 2618.65 samples/sec Loss 8.7061 LearningRate 0.0412 Epoch: 7 Global Step: 297330 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:07:34,422-Speed 2631.80 samples/sec Loss 8.6175 LearningRate 0.0412 Epoch: 7 Global Step: 297340 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:07:38,298-Speed 2642.25 samples/sec Loss 8.6624 LearningRate 0.0412 Epoch: 7 Global Step: 297350 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:07:42,171-Speed 2644.68 samples/sec Loss 8.7388 LearningRate 0.0412 Epoch: 7 Global Step: 297360 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:07:46,065-Speed 2630.75 samples/sec Loss 8.6762 LearningRate 0.0412 Epoch: 7 Global Step: 297370 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:07:49,957-Speed 2631.28 samples/sec Loss 8.4520 LearningRate 0.0412 Epoch: 7 Global Step: 297380 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:07:53,855-Speed 2628.17 samples/sec Loss 8.7897 LearningRate 0.0412 Epoch: 7 Global Step: 297390 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:07:57,748-Speed 2630.78 samples/sec Loss 8.7002 LearningRate 0.0412 Epoch: 7 Global Step: 297400 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:08:01,653-Speed 2623.26 samples/sec Loss 8.7199 LearningRate 0.0412 Epoch: 7 Global Step: 297410 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:08:05,543-Speed 2632.99 samples/sec Loss 8.5757 LearningRate 0.0411 Epoch: 7 Global Step: 297420 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:08:09,447-Speed 2623.53 samples/sec Loss 8.7661 LearningRate 0.0411 Epoch: 7 Global Step: 297430 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:08:13,339-Speed 2631.20 samples/sec Loss 8.7487 LearningRate 0.0411 Epoch: 7 Global Step: 297440 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:08:17,250-Speed 2619.17 samples/sec Loss 8.8176 LearningRate 0.0411 Epoch: 7 Global Step: 297450 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:08:21,165-Speed 2616.75 samples/sec Loss 8.7362 LearningRate 0.0411 Epoch: 7 Global Step: 297460 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:08:25,096-Speed 2605.62 samples/sec Loss 8.7285 LearningRate 0.0411 Epoch: 7 Global Step: 297470 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:08:28,996-Speed 2626.82 samples/sec Loss 8.7165 LearningRate 0.0411 Epoch: 7 Global Step: 297480 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:08:32,896-Speed 2625.74 samples/sec Loss 8.6144 LearningRate 0.0411 Epoch: 7 Global Step: 297490 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:08:36,758-Speed 2652.58 samples/sec Loss 8.6759 LearningRate 0.0411 Epoch: 7 Global Step: 297500 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:08:40,653-Speed 2629.91 samples/sec Loss 9.9806 LearningRate 0.0411 Epoch: 7 Global Step: 297510 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:08:44,554-Speed 2625.76 samples/sec Loss 9.2055 LearningRate 0.0411 Epoch: 7 Global Step: 297520 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:08:48,464-Speed 2618.80 samples/sec Loss 8.9084 LearningRate 0.0411 Epoch: 7 Global Step: 297530 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:08:52,364-Speed 2626.74 samples/sec Loss 8.7804 LearningRate 0.0411 Epoch: 7 Global Step: 297540 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:08:56,267-Speed 2624.62 samples/sec Loss 8.8066 LearningRate 0.0411 Epoch: 7 Global Step: 297550 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:09:00,163-Speed 2629.23 samples/sec Loss 8.7772 LearningRate 0.0411 Epoch: 7 Global Step: 297560 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:09:04,094-Speed 2606.10 samples/sec Loss 8.8103 LearningRate 0.0411 Epoch: 7 Global Step: 297570 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:09:08,026-Speed 2604.65 samples/sec Loss 8.6820 LearningRate 0.0411 Epoch: 7 Global Step: 297580 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:09:12,012-Speed 2569.90 samples/sec Loss 8.6297 LearningRate 0.0411 Epoch: 7 Global Step: 297590 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:09:15,911-Speed 2626.51 samples/sec Loss 8.7040 LearningRate 0.0411 Epoch: 7 Global Step: 297600 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:09:19,822-Speed 2619.41 samples/sec Loss 8.6943 LearningRate 0.0411 Epoch: 7 Global Step: 297610 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:09:23,726-Speed 2623.43 samples/sec Loss 8.7883 LearningRate 0.0411 Epoch: 7 Global Step: 297620 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:09:27,638-Speed 2617.95 samples/sec Loss 8.6729 LearningRate 0.0411 Epoch: 7 Global Step: 297630 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:09:31,528-Speed 2633.20 samples/sec Loss 8.7042 LearningRate 0.0411 Epoch: 7 Global Step: 297640 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:09:35,416-Speed 2634.77 samples/sec Loss 8.7404 LearningRate 0.0411 Epoch: 7 Global Step: 297650 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:09:39,304-Speed 2633.96 samples/sec Loss 8.6709 LearningRate 0.0411 Epoch: 7 Global Step: 297660 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:09:43,193-Speed 2633.48 samples/sec Loss 8.7497 LearningRate 0.0411 Epoch: 7 Global Step: 297670 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:09:47,085-Speed 2631.55 samples/sec Loss 8.7009 LearningRate 0.0411 Epoch: 7 Global Step: 297680 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:09:51,062-Speed 2575.71 samples/sec Loss 8.5600 LearningRate 0.0411 Epoch: 7 Global Step: 297690 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:09:54,959-Speed 2628.64 samples/sec Loss 8.7837 LearningRate 0.0411 Epoch: 7 Global Step: 297700 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:09:58,857-Speed 2627.76 samples/sec Loss 8.6869 LearningRate 0.0411 Epoch: 7 Global Step: 297710 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:10:02,764-Speed 2621.81 samples/sec Loss 8.4883 LearningRate 0.0411 Epoch: 7 Global Step: 297720 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:10:06,814-Speed 2529.10 samples/sec Loss 8.7326 LearningRate 0.0411 Epoch: 7 Global Step: 297730 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:10:10,712-Speed 2627.37 samples/sec Loss 8.7573 LearningRate 0.0411 Epoch: 7 Global Step: 297740 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:10:14,611-Speed 2626.50 samples/sec Loss 8.8483 LearningRate 0.0411 Epoch: 7 Global Step: 297750 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:10:18,503-Speed 2631.39 samples/sec Loss 8.7900 LearningRate 0.0411 Epoch: 7 Global Step: 297760 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:10:22,397-Speed 2630.92 samples/sec Loss 8.6780 LearningRate 0.0411 Epoch: 7 Global Step: 297770 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:10:26,288-Speed 2632.95 samples/sec Loss 8.7562 LearningRate 0.0411 Epoch: 7 Global Step: 297780 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:10:30,196-Speed 2620.73 samples/sec Loss 8.7820 LearningRate 0.0411 Epoch: 7 Global Step: 297790 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:10:34,122-Speed 2608.60 samples/sec Loss 8.7014 LearningRate 0.0411 Epoch: 7 Global Step: 297800 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:10:38,023-Speed 2625.69 samples/sec Loss 8.6186 LearningRate 0.0411 Epoch: 7 Global Step: 297810 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:10:41,911-Speed 2634.54 samples/sec Loss 8.7382 LearningRate 0.0411 Epoch: 7 Global Step: 297820 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:10:45,806-Speed 2629.56 samples/sec Loss 8.5980 LearningRate 0.0411 Epoch: 7 Global Step: 297830 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:10:49,697-Speed 2632.46 samples/sec Loss 8.6866 LearningRate 0.0411 Epoch: 7 Global Step: 297840 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:10:53,599-Speed 2625.18 samples/sec Loss 8.7261 LearningRate 0.0411 Epoch: 7 Global Step: 297850 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:10:57,500-Speed 2625.50 samples/sec Loss 8.7173 LearningRate 0.0411 Epoch: 7 Global Step: 297860 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:11:01,390-Speed 2633.31 samples/sec Loss 8.7743 LearningRate 0.0411 Epoch: 7 Global Step: 297870 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:11:05,280-Speed 2632.46 samples/sec Loss 8.6952 LearningRate 0.0411 Epoch: 7 Global Step: 297880 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:11:09,174-Speed 2630.37 samples/sec Loss 8.7054 LearningRate 0.0411 Epoch: 7 Global Step: 297890 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:11:13,065-Speed 2632.44 samples/sec Loss 8.5070 LearningRate 0.0411 Epoch: 7 Global Step: 297900 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:11:16,952-Speed 2634.84 samples/sec Loss 8.7835 LearningRate 0.0411 Epoch: 7 Global Step: 297910 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:11:20,841-Speed 2633.48 samples/sec Loss 8.6772 LearningRate 0.0411 Epoch: 7 Global Step: 297920 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:11:24,734-Speed 2631.14 samples/sec Loss 8.6483 LearningRate 0.0411 Epoch: 7 Global Step: 297930 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:11:28,621-Speed 2635.08 samples/sec Loss 8.7195 LearningRate 0.0411 Epoch: 7 Global Step: 297940 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:11:32,474-Speed 2658.49 samples/sec Loss 8.6679 LearningRate 0.0411 Epoch: 7 Global Step: 297950 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:11:36,345-Speed 2645.64 samples/sec Loss 8.5772 LearningRate 0.0411 Epoch: 7 Global Step: 297960 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:11:40,235-Speed 2633.22 samples/sec Loss 8.6861 LearningRate 0.0411 Epoch: 7 Global Step: 297970 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:11:44,127-Speed 2631.33 samples/sec Loss 8.5286 LearningRate 0.0411 Epoch: 7 Global Step: 297980 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:11:48,020-Speed 2631.19 samples/sec Loss 8.5680 LearningRate 0.0411 Epoch: 7 Global Step: 297990 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:11:51,926-Speed 2622.60 samples/sec Loss 8.5532 LearningRate 0.0411 Epoch: 7 Global Step: 298000 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:11:55,822-Speed 2629.11 samples/sec Loss 8.7101 LearningRate 0.0411 Epoch: 7 Global Step: 298010 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:11:59,728-Speed 2622.58 samples/sec Loss 8.7728 LearningRate 0.0411 Epoch: 7 Global Step: 298020 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:12:03,645-Speed 2615.04 samples/sec Loss 8.6848 LearningRate 0.0411 Epoch: 7 Global Step: 298030 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:12:07,539-Speed 2630.06 samples/sec Loss 8.6108 LearningRate 0.0411 Epoch: 7 Global Step: 298040 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:12:11,437-Speed 2627.32 samples/sec Loss 8.5991 LearningRate 0.0411 Epoch: 7 Global Step: 298050 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:12:15,405-Speed 2581.77 samples/sec Loss 8.6390 LearningRate 0.0411 Epoch: 7 Global Step: 298060 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:12:19,319-Speed 2617.01 samples/sec Loss 8.5958 LearningRate 0.0410 Epoch: 7 Global Step: 298070 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:12:23,203-Speed 2637.45 samples/sec Loss 9.4731 LearningRate 0.0410 Epoch: 7 Global Step: 298080 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:12:27,087-Speed 2636.91 samples/sec Loss 9.3013 LearningRate 0.0410 Epoch: 7 Global Step: 298090 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:12:30,979-Speed 2632.10 samples/sec Loss 8.6679 LearningRate 0.0410 Epoch: 7 Global Step: 298100 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:12:34,871-Speed 2631.33 samples/sec Loss 8.7046 LearningRate 0.0410 Epoch: 7 Global Step: 298110 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:12:38,770-Speed 2627.00 samples/sec Loss 8.7489 LearningRate 0.0410 Epoch: 7 Global Step: 298120 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:12:42,661-Speed 2632.21 samples/sec Loss 8.7588 LearningRate 0.0410 Epoch: 7 Global Step: 298130 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:12:46,563-Speed 2625.36 samples/sec Loss 8.6167 LearningRate 0.0410 Epoch: 7 Global Step: 298140 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:12:50,458-Speed 2629.92 samples/sec Loss 8.6424 LearningRate 0.0410 Epoch: 7 Global Step: 298150 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:12:54,357-Speed 2627.17 samples/sec Loss 8.5511 LearningRate 0.0410 Epoch: 7 Global Step: 298160 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:12:58,244-Speed 2634.88 samples/sec Loss 8.6360 LearningRate 0.0410 Epoch: 7 Global Step: 298170 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:13:02,133-Speed 2634.41 samples/sec Loss 8.7446 LearningRate 0.0410 Epoch: 7 Global Step: 298180 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:13:06,024-Speed 2632.13 samples/sec Loss 8.7145 LearningRate 0.0410 Epoch: 7 Global Step: 298190 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:13:09,914-Speed 2632.61 samples/sec Loss 8.5651 LearningRate 0.0410 Epoch: 7 Global Step: 298200 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:13:13,804-Speed 2633.30 samples/sec Loss 8.7329 LearningRate 0.0410 Epoch: 7 Global Step: 298210 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:13:17,695-Speed 2632.87 samples/sec Loss 8.7479 LearningRate 0.0410 Epoch: 7 Global Step: 298220 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:13:21,584-Speed 2633.35 samples/sec Loss 8.5429 LearningRate 0.0410 Epoch: 7 Global Step: 298230 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:13:25,481-Speed 2628.06 samples/sec Loss 8.5732 LearningRate 0.0410 Epoch: 7 Global Step: 298240 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:13:29,373-Speed 2632.37 samples/sec Loss 8.6122 LearningRate 0.0410 Epoch: 7 Global Step: 298250 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:13:33,261-Speed 2634.69 samples/sec Loss 8.6247 LearningRate 0.0410 Epoch: 7 Global Step: 298260 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:13:37,152-Speed 2632.14 samples/sec Loss 8.5535 LearningRate 0.0410 Epoch: 7 Global Step: 298270 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:13:41,043-Speed 2632.00 samples/sec Loss 8.6000 LearningRate 0.0410 Epoch: 7 Global Step: 298280 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:13:44,932-Speed 2633.77 samples/sec Loss 8.6018 LearningRate 0.0410 Epoch: 7 Global Step: 298290 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:13:48,838-Speed 2622.24 samples/sec Loss 8.7116 LearningRate 0.0410 Epoch: 7 Global Step: 298300 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:13:52,731-Speed 2631.28 samples/sec Loss 8.6498 LearningRate 0.0410 Epoch: 7 Global Step: 298310 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:13:56,640-Speed 2620.00 samples/sec Loss 8.5443 LearningRate 0.0410 Epoch: 7 Global Step: 298320 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:14:00,551-Speed 2619.20 samples/sec Loss 8.6758 LearningRate 0.0410 Epoch: 7 Global Step: 298330 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:14:04,439-Speed 2634.18 samples/sec Loss 8.7803 LearningRate 0.0410 Epoch: 7 Global Step: 298340 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:14:08,328-Speed 2633.45 samples/sec Loss 8.7194 LearningRate 0.0410 Epoch: 7 Global Step: 298350 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:14:12,216-Speed 2633.96 samples/sec Loss 8.6261 LearningRate 0.0410 Epoch: 7 Global Step: 298360 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:14:16,113-Speed 2628.36 samples/sec Loss 8.7203 LearningRate 0.0410 Epoch: 7 Global Step: 298370 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:14:20,003-Speed 2633.36 samples/sec Loss 8.6771 LearningRate 0.0410 Epoch: 7 Global Step: 298380 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:14:23,897-Speed 2630.09 samples/sec Loss 8.6921 LearningRate 0.0410 Epoch: 7 Global Step: 298390 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:14:27,788-Speed 2632.58 samples/sec Loss 8.7522 LearningRate 0.0410 Epoch: 7 Global Step: 298400 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:14:31,680-Speed 2633.37 samples/sec Loss 8.7518 LearningRate 0.0410 Epoch: 7 Global Step: 298410 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:14:35,576-Speed 2628.84 samples/sec Loss 8.8198 LearningRate 0.0410 Epoch: 7 Global Step: 298420 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:14:39,466-Speed 2633.29 samples/sec Loss 8.7418 LearningRate 0.0410 Epoch: 7 Global Step: 298430 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:14:43,366-Speed 2626.26 samples/sec Loss 8.7169 LearningRate 0.0410 Epoch: 7 Global Step: 298440 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:14:47,272-Speed 2622.50 samples/sec Loss 8.5899 LearningRate 0.0410 Epoch: 7 Global Step: 298450 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:14:51,163-Speed 2632.29 samples/sec Loss 8.5347 LearningRate 0.0410 Epoch: 7 Global Step: 298460 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:14:55,054-Speed 2632.28 samples/sec Loss 8.5415 LearningRate 0.0410 Epoch: 7 Global Step: 298470 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:14:58,947-Speed 2630.74 samples/sec Loss 8.7200 LearningRate 0.0410 Epoch: 7 Global Step: 298480 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:15:02,827-Speed 2639.92 samples/sec Loss 8.6947 LearningRate 0.0410 Epoch: 7 Global Step: 298490 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:15:06,718-Speed 2632.56 samples/sec Loss 8.6624 LearningRate 0.0410 Epoch: 7 Global Step: 298500 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:15:10,620-Speed 2624.84 samples/sec Loss 8.5859 LearningRate 0.0410 Epoch: 7 Global Step: 298510 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:15:14,511-Speed 2633.33 samples/sec Loss 8.6642 LearningRate 0.0410 Epoch: 7 Global Step: 298520 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:15:18,401-Speed 2632.50 samples/sec Loss 8.6338 LearningRate 0.0410 Epoch: 7 Global Step: 298530 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:15:22,414-Speed 2552.30 samples/sec Loss 8.6372 LearningRate 0.0410 Epoch: 7 Global Step: 298540 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:15:26,311-Speed 2628.54 samples/sec Loss 8.7166 LearningRate 0.0410 Epoch: 7 Global Step: 298550 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:15:30,246-Speed 2603.38 samples/sec Loss 8.7734 LearningRate 0.0410 Epoch: 7 Global Step: 298560 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:15:34,137-Speed 2631.94 samples/sec Loss 8.6406 LearningRate 0.0410 Epoch: 7 Global Step: 298570 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:15:38,026-Speed 2633.76 samples/sec Loss 8.7823 LearningRate 0.0410 Epoch: 7 Global Step: 298580 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:15:41,919-Speed 2631.20 samples/sec Loss 8.6141 LearningRate 0.0410 Epoch: 7 Global Step: 298590 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:15:45,808-Speed 2634.05 samples/sec Loss 8.7506 LearningRate 0.0410 Epoch: 7 Global Step: 298600 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:15:49,701-Speed 2630.71 samples/sec Loss 8.6767 LearningRate 0.0410 Epoch: 7 Global Step: 298610 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:15:53,591-Speed 2633.08 samples/sec Loss 8.6970 LearningRate 0.0410 Epoch: 7 Global Step: 298620 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:15:57,484-Speed 2630.99 samples/sec Loss 8.7327 LearningRate 0.0410 Epoch: 7 Global Step: 298630 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:16:01,375-Speed 2632.26 samples/sec Loss 8.8499 LearningRate 0.0410 Epoch: 7 Global Step: 298640 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:16:05,281-Speed 2622.14 samples/sec Loss 8.5869 LearningRate 0.0410 Epoch: 7 Global Step: 298650 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:16:09,188-Speed 2621.59 samples/sec Loss 8.7342 LearningRate 0.0410 Epoch: 7 Global Step: 298660 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:16:13,082-Speed 2630.48 samples/sec Loss 8.5493 LearningRate 0.0410 Epoch: 7 Global Step: 298670 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:16:16,961-Speed 2640.92 samples/sec Loss 8.5649 LearningRate 0.0410 Epoch: 7 Global Step: 298680 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:16:20,856-Speed 2629.34 samples/sec Loss 8.6222 LearningRate 0.0410 Epoch: 7 Global Step: 298690 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:16:24,769-Speed 2617.43 samples/sec Loss 8.6833 LearningRate 0.0410 Epoch: 7 Global Step: 298700 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:16:28,659-Speed 2633.05 samples/sec Loss 8.6354 LearningRate 0.0410 Epoch: 7 Global Step: 298710 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:16:32,552-Speed 2631.01 samples/sec Loss 8.4323 LearningRate 0.0409 Epoch: 7 Global Step: 298720 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:16:36,444-Speed 2631.52 samples/sec Loss 8.6261 LearningRate 0.0409 Epoch: 7 Global Step: 298730 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:16:40,341-Speed 2628.10 samples/sec Loss 8.6571 LearningRate 0.0409 Epoch: 7 Global Step: 298740 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:16:44,247-Speed 2622.21 samples/sec Loss 8.6609 LearningRate 0.0409 Epoch: 7 Global Step: 298750 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:16:48,142-Speed 2630.02 samples/sec Loss 8.6412 LearningRate 0.0409 Epoch: 7 Global Step: 298760 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:16:52,035-Speed 2631.30 samples/sec Loss 8.5932 LearningRate 0.0409 Epoch: 7 Global Step: 298770 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:16:55,926-Speed 2632.53 samples/sec Loss 8.5933 LearningRate 0.0409 Epoch: 7 Global Step: 298780 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:16:59,804-Speed 2640.65 samples/sec Loss 8.5606 LearningRate 0.0409 Epoch: 7 Global Step: 298790 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:17:03,698-Speed 2630.46 samples/sec Loss 8.6056 LearningRate 0.0409 Epoch: 7 Global Step: 298800 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:17:07,541-Speed 2665.37 samples/sec Loss 8.7659 LearningRate 0.0409 Epoch: 7 Global Step: 298810 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:17:11,426-Speed 2636.40 samples/sec Loss 9.1316 LearningRate 0.0409 Epoch: 7 Global Step: 298820 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:17:15,323-Speed 2628.50 samples/sec Loss 9.1463 LearningRate 0.0409 Epoch: 7 Global Step: 298830 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:17:19,219-Speed 2628.71 samples/sec Loss 8.7152 LearningRate 0.0409 Epoch: 7 Global Step: 298840 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:17:23,132-Speed 2618.10 samples/sec Loss 8.8216 LearningRate 0.0409 Epoch: 7 Global Step: 298850 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:17:27,036-Speed 2624.11 samples/sec Loss 8.7848 LearningRate 0.0409 Epoch: 7 Global Step: 298860 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:17:30,932-Speed 2629.15 samples/sec Loss 8.5544 LearningRate 0.0409 Epoch: 7 Global Step: 298870 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:17:34,822-Speed 2633.28 samples/sec Loss 8.6403 LearningRate 0.0409 Epoch: 7 Global Step: 298880 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:17:38,709-Speed 2634.75 samples/sec Loss 8.6325 LearningRate 0.0409 Epoch: 7 Global Step: 298890 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:17:42,598-Speed 2633.62 samples/sec Loss 8.5850 LearningRate 0.0409 Epoch: 7 Global Step: 298900 Fp16 Grad Scale: 8192 Required: 60 hours
Training: 2022-04-14 05:17:46,490-Speed 2632.40 samples/sec Loss 8.5903 LearningRate 0.0409 Epoch: 7 Global Step: 298910 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:17:50,381-Speed 2631.84 samples/sec Loss 8.6723 LearningRate 0.0409 Epoch: 7 Global Step: 298920 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:17:54,272-Speed 2632.96 samples/sec Loss 8.6360 LearningRate 0.0409 Epoch: 7 Global Step: 298930 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:17:58,162-Speed 2632.78 samples/sec Loss 8.6457 LearningRate 0.0409 Epoch: 7 Global Step: 298940 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:18:02,062-Speed 2626.71 samples/sec Loss 8.6152 LearningRate 0.0409 Epoch: 7 Global Step: 298950 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:18:05,950-Speed 2634.32 samples/sec Loss 8.9333 LearningRate 0.0409 Epoch: 7 Global Step: 298960 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:18:09,859-Speed 2619.90 samples/sec Loss 9.3442 LearningRate 0.0409 Epoch: 7 Global Step: 298970 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:18:13,750-Speed 2632.25 samples/sec Loss 8.9096 LearningRate 0.0409 Epoch: 7 Global Step: 298980 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:18:17,647-Speed 2628.56 samples/sec Loss 8.8624 LearningRate 0.0409 Epoch: 7 Global Step: 298990 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:18:21,557-Speed 2619.66 samples/sec Loss 8.6450 LearningRate 0.0409 Epoch: 7 Global Step: 299000 Fp16 Grad Scale: 16384 Required: 60 hours
Training: 2022-04-14 05:18:25,448-Speed 2632.36 samples/sec Loss 8.8364 LearningRate 0.0409 Epoch: 7 Global Step: 299010 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:18:29,364-Speed 2615.82 samples/sec Loss 8.7022 LearningRate 0.0409 Epoch: 7 Global Step: 299020 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:18:33,253-Speed 2633.80 samples/sec Loss 8.6572 LearningRate 0.0409 Epoch: 7 Global Step: 299030 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:18:37,144-Speed 2632.64 samples/sec Loss 8.7033 LearningRate 0.0409 Epoch: 7 Global Step: 299040 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:18:41,036-Speed 2631.40 samples/sec Loss 8.6895 LearningRate 0.0409 Epoch: 7 Global Step: 299050 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:18:44,924-Speed 2634.44 samples/sec Loss 8.6062 LearningRate 0.0409 Epoch: 7 Global Step: 299060 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:18:48,817-Speed 2631.08 samples/sec Loss 8.6283 LearningRate 0.0409 Epoch: 7 Global Step: 299070 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:18:52,701-Speed 2636.75 samples/sec Loss 9.0010 LearningRate 0.0409 Epoch: 7 Global Step: 299080 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:18:56,613-Speed 2617.99 samples/sec Loss 8.7356 LearningRate 0.0409 Epoch: 7 Global Step: 299090 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:19:00,520-Speed 2622.46 samples/sec Loss 8.7372 LearningRate 0.0409 Epoch: 7 Global Step: 299100 Fp16 Grad Scale: 32768 Required: 60 hours
Training: 2022-04-14 05:19:04,410-Speed 2633.01 samples/sec Loss 8.7280 LearningRate 0.0409 Epoch: 7 Global Step: 299110 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:19:08,299-Speed 2633.56 samples/sec Loss 8.6951 LearningRate 0.0409 Epoch: 7 Global Step: 299120 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:19:12,200-Speed 2625.27 samples/sec Loss 8.6638 LearningRate 0.0409 Epoch: 7 Global Step: 299130 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:19:16,093-Speed 2631.56 samples/sec Loss 8.6906 LearningRate 0.0409 Epoch: 7 Global Step: 299140 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:19:19,982-Speed 2633.89 samples/sec Loss 8.5102 LearningRate 0.0409 Epoch: 7 Global Step: 299150 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:19:23,901-Speed 2613.11 samples/sec Loss 8.5199 LearningRate 0.0409 Epoch: 7 Global Step: 299160 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:19:27,798-Speed 2628.85 samples/sec Loss 8.6649 LearningRate 0.0409 Epoch: 7 Global Step: 299170 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:19:31,690-Speed 2631.63 samples/sec Loss 8.7197 LearningRate 0.0409 Epoch: 7 Global Step: 299180 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:19:35,583-Speed 2631.11 samples/sec Loss 8.6354 LearningRate 0.0409 Epoch: 7 Global Step: 299190 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:19:39,473-Speed 2633.25 samples/sec Loss 8.7473 LearningRate 0.0409 Epoch: 7 Global Step: 299200 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:19:43,379-Speed 2622.40 samples/sec Loss 8.6758 LearningRate 0.0409 Epoch: 7 Global Step: 299210 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:19:47,290-Speed 2619.03 samples/sec Loss 8.6448 LearningRate 0.0409 Epoch: 7 Global Step: 299220 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:19:51,204-Speed 2616.90 samples/sec Loss 8.6861 LearningRate 0.0409 Epoch: 7 Global Step: 299230 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:19:55,092-Speed 2634.34 samples/sec Loss 8.7831 LearningRate 0.0409 Epoch: 7 Global Step: 299240 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:19:58,987-Speed 2629.56 samples/sec Loss 8.6388 LearningRate 0.0409 Epoch: 7 Global Step: 299250 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:20:02,892-Speed 2623.11 samples/sec Loss 8.5884 LearningRate 0.0409 Epoch: 7 Global Step: 299260 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:20:06,779-Speed 2635.00 samples/sec Loss 8.7068 LearningRate 0.0409 Epoch: 7 Global Step: 299270 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:20:10,680-Speed 2625.70 samples/sec Loss 8.6892 LearningRate 0.0409 Epoch: 7 Global Step: 299280 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:20:14,574-Speed 2629.99 samples/sec Loss 8.8166 LearningRate 0.0409 Epoch: 7 Global Step: 299290 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:20:18,466-Speed 2632.35 samples/sec Loss 8.6075 LearningRate 0.0409 Epoch: 7 Global Step: 299300 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:20:22,386-Speed 2612.63 samples/sec Loss 8.6343 LearningRate 0.0409 Epoch: 7 Global Step: 299310 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:20:26,259-Speed 2644.55 samples/sec Loss 8.6401 LearningRate 0.0409 Epoch: 7 Global Step: 299320 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:20:30,152-Speed 2631.13 samples/sec Loss 8.6640 LearningRate 0.0409 Epoch: 7 Global Step: 299330 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:20:34,045-Speed 2630.73 samples/sec Loss 8.7306 LearningRate 0.0409 Epoch: 7 Global Step: 299340 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:20:37,945-Speed 2626.85 samples/sec Loss 8.7749 LearningRate 0.0409 Epoch: 7 Global Step: 299350 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:20:41,837-Speed 2631.20 samples/sec Loss 8.6802 LearningRate 0.0409 Epoch: 7 Global Step: 299360 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:20:45,733-Speed 2630.53 samples/sec Loss 8.7716 LearningRate 0.0408 Epoch: 7 Global Step: 299370 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:20:49,622-Speed 2633.33 samples/sec Loss 8.6386 LearningRate 0.0408 Epoch: 7 Global Step: 299380 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:20:53,524-Speed 2625.18 samples/sec Loss 8.7566 LearningRate 0.0408 Epoch: 7 Global Step: 299390 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:20:57,428-Speed 2623.41 samples/sec Loss 8.6789 LearningRate 0.0408 Epoch: 7 Global Step: 299400 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:21:01,323-Speed 2630.12 samples/sec Loss 8.5916 LearningRate 0.0408 Epoch: 7 Global Step: 299410 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:21:05,292-Speed 2580.40 samples/sec Loss 8.6343 LearningRate 0.0408 Epoch: 7 Global Step: 299420 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:21:09,180-Speed 2634.15 samples/sec Loss 8.5514 LearningRate 0.0408 Epoch: 7 Global Step: 299430 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:21:13,098-Speed 2614.35 samples/sec Loss 8.6232 LearningRate 0.0408 Epoch: 7 Global Step: 299440 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:21:16,996-Speed 2628.82 samples/sec Loss 8.6151 LearningRate 0.0408 Epoch: 7 Global Step: 299450 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:21:20,892-Speed 2628.83 samples/sec Loss 8.7250 LearningRate 0.0408 Epoch: 7 Global Step: 299460 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:21:24,794-Speed 2624.72 samples/sec Loss 8.6852 LearningRate 0.0408 Epoch: 7 Global Step: 299470 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:21:28,689-Speed 2629.96 samples/sec Loss 8.7014 LearningRate 0.0408 Epoch: 7 Global Step: 299480 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:21:32,579-Speed 2632.94 samples/sec Loss 8.6613 LearningRate 0.0408 Epoch: 7 Global Step: 299490 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:21:36,479-Speed 2626.02 samples/sec Loss 8.6496 LearningRate 0.0408 Epoch: 7 Global Step: 299500 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:21:40,363-Speed 2637.18 samples/sec Loss 8.6699 LearningRate 0.0408 Epoch: 7 Global Step: 299510 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:21:44,264-Speed 2625.26 samples/sec Loss 8.7048 LearningRate 0.0408 Epoch: 7 Global Step: 299520 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:21:48,161-Speed 2629.34 samples/sec Loss 8.6622 LearningRate 0.0408 Epoch: 7 Global Step: 299530 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:21:52,049-Speed 2633.74 samples/sec Loss 8.6879 LearningRate 0.0408 Epoch: 7 Global Step: 299540 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:21:55,941-Speed 2632.08 samples/sec Loss 8.6391 LearningRate 0.0408 Epoch: 7 Global Step: 299550 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:21:59,898-Speed 2587.94 samples/sec Loss 8.7260 LearningRate 0.0408 Epoch: 7 Global Step: 299560 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:22:03,790-Speed 2631.85 samples/sec Loss 8.6324 LearningRate 0.0408 Epoch: 7 Global Step: 299570 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:22:07,684-Speed 2630.51 samples/sec Loss 8.5227 LearningRate 0.0408 Epoch: 7 Global Step: 299580 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:22:11,575-Speed 2632.86 samples/sec Loss 8.6787 LearningRate 0.0408 Epoch: 7 Global Step: 299590 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:22:15,464-Speed 2633.58 samples/sec Loss 8.5762 LearningRate 0.0408 Epoch: 7 Global Step: 299600 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:22:19,358-Speed 2630.03 samples/sec Loss 8.7866 LearningRate 0.0408 Epoch: 7 Global Step: 299610 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:22:23,264-Speed 2622.62 samples/sec Loss 8.7278 LearningRate 0.0408 Epoch: 7 Global Step: 299620 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:22:27,157-Speed 2630.76 samples/sec Loss 8.6216 LearningRate 0.0408 Epoch: 7 Global Step: 299630 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:22:31,045-Speed 2634.59 samples/sec Loss 8.5517 LearningRate 0.0408 Epoch: 7 Global Step: 299640 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:22:35,669-Speed 2215.17 samples/sec Loss 8.5131 LearningRate 0.0408 Epoch: 7 Global Step: 299650 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:22:39,556-Speed 2634.93 samples/sec Loss 8.5635 LearningRate 0.0408 Epoch: 7 Global Step: 299660 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:22:43,445-Speed 2634.13 samples/sec Loss 8.6404 LearningRate 0.0408 Epoch: 7 Global Step: 299670 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:22:47,336-Speed 2632.20 samples/sec Loss 8.6074 LearningRate 0.0408 Epoch: 7 Global Step: 299680 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:22:51,234-Speed 2628.05 samples/sec Loss 8.6236 LearningRate 0.0408 Epoch: 7 Global Step: 299690 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:22:55,129-Speed 2629.69 samples/sec Loss 8.6496 LearningRate 0.0408 Epoch: 7 Global Step: 299700 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:22:59,106-Speed 2574.91 samples/sec Loss 8.7506 LearningRate 0.0408 Epoch: 7 Global Step: 299710 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:23:03,005-Speed 2627.10 samples/sec Loss 8.6555 LearningRate 0.0408 Epoch: 7 Global Step: 299720 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:23:06,894-Speed 2634.29 samples/sec Loss 8.5970 LearningRate 0.0408 Epoch: 7 Global Step: 299730 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:23:10,769-Speed 2642.79 samples/sec Loss 8.6981 LearningRate 0.0408 Epoch: 7 Global Step: 299740 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:23:14,659-Speed 2633.47 samples/sec Loss 8.6193 LearningRate 0.0408 Epoch: 7 Global Step: 299750 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:23:18,523-Speed 2650.81 samples/sec Loss 8.6360 LearningRate 0.0408 Epoch: 7 Global Step: 299760 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:23:22,421-Speed 2628.11 samples/sec Loss 8.6622 LearningRate 0.0408 Epoch: 7 Global Step: 299770 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:23:26,328-Speed 2621.97 samples/sec Loss 8.5768 LearningRate 0.0408 Epoch: 7 Global Step: 299780 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:23:30,230-Speed 2624.55 samples/sec Loss 8.6343 LearningRate 0.0408 Epoch: 7 Global Step: 299790 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:23:34,147-Speed 2614.88 samples/sec Loss 8.6228 LearningRate 0.0408 Epoch: 7 Global Step: 299800 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:23:38,045-Speed 2627.61 samples/sec Loss 8.7490 LearningRate 0.0408 Epoch: 7 Global Step: 299810 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:23:41,936-Speed 2632.72 samples/sec Loss 8.6621 LearningRate 0.0408 Epoch: 7 Global Step: 299820 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:23:45,827-Speed 2632.33 samples/sec Loss 8.6681 LearningRate 0.0408 Epoch: 7 Global Step: 299830 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:23:49,718-Speed 2632.32 samples/sec Loss 8.5943 LearningRate 0.0408 Epoch: 7 Global Step: 299840 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:23:53,612-Speed 2630.60 samples/sec Loss 8.6973 LearningRate 0.0408 Epoch: 7 Global Step: 299850 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:23:57,508-Speed 2629.08 samples/sec Loss 8.5324 LearningRate 0.0408 Epoch: 7 Global Step: 299860 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:24:01,432-Speed 2610.49 samples/sec Loss 8.6043 LearningRate 0.0408 Epoch: 7 Global Step: 299870 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:24:05,331-Speed 2626.77 samples/sec Loss 8.6051 LearningRate 0.0408 Epoch: 7 Global Step: 299880 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:24:09,224-Speed 2631.41 samples/sec Loss 8.6200 LearningRate 0.0408 Epoch: 7 Global Step: 299890 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:24:13,153-Speed 2606.58 samples/sec Loss 8.6816 LearningRate 0.0408 Epoch: 7 Global Step: 299900 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:24:17,049-Speed 2628.65 samples/sec Loss 8.6934 LearningRate 0.0408 Epoch: 7 Global Step: 299910 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:24:20,940-Speed 2632.54 samples/sec Loss 8.6635 LearningRate 0.0408 Epoch: 7 Global Step: 299920 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:24:24,834-Speed 2630.22 samples/sec Loss 8.5108 LearningRate 0.0408 Epoch: 7 Global Step: 299930 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:24:28,728-Speed 2630.97 samples/sec Loss 8.6060 LearningRate 0.0408 Epoch: 7 Global Step: 299940 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:24:32,624-Speed 2628.70 samples/sec Loss 8.6793 LearningRate 0.0408 Epoch: 7 Global Step: 299950 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:24:36,514-Speed 2633.29 samples/sec Loss 8.6662 LearningRate 0.0408 Epoch: 7 Global Step: 299960 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:24:40,407-Speed 2630.73 samples/sec Loss 8.6111 LearningRate 0.0408 Epoch: 7 Global Step: 299970 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:24:44,297-Speed 2632.82 samples/sec Loss 8.6515 LearningRate 0.0408 Epoch: 7 Global Step: 299980 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:24:48,201-Speed 2623.97 samples/sec Loss 8.5599 LearningRate 0.0408 Epoch: 7 Global Step: 299990 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:24:52,093-Speed 2631.12 samples/sec Loss 8.5346 LearningRate 0.0408 Epoch: 7 Global Step: 300000 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:25:35,138-[lfw][300000]XNorm: 23.715297
Training: 2022-04-14 05:25:35,139-[lfw][300000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-14 05:25:35,140-[lfw][300000]Accuracy-Highest: 0.99783
Training: 2022-04-14 05:26:25,234-[cfp_fp][300000]XNorm: 21.356608
Training: 2022-04-14 05:26:25,235-[cfp_fp][300000]Accuracy-Flip: 0.98343+-0.00703
Training: 2022-04-14 05:26:25,236-[cfp_fp][300000]Accuracy-Highest: 0.98643
Training: 2022-04-14 05:27:08,313-[agedb_30][300000]XNorm: 23.332386
Training: 2022-04-14 05:27:08,314-[agedb_30][300000]Accuracy-Flip: 0.97233+-0.00782
Training: 2022-04-14 05:27:08,314-[agedb_30][300000]Accuracy-Highest: 0.97567
Training: 2022-04-14 05:27:12,200-Speed 73.09 samples/sec Loss 8.6982 LearningRate 0.0408 Epoch: 7 Global Step: 300010 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:27:16,080-Speed 2639.92 samples/sec Loss 8.6507 LearningRate 0.0407 Epoch: 7 Global Step: 300020 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:27:19,978-Speed 2627.67 samples/sec Loss 8.5857 LearningRate 0.0407 Epoch: 7 Global Step: 300030 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:27:23,864-Speed 2635.31 samples/sec Loss 8.6176 LearningRate 0.0407 Epoch: 7 Global Step: 300040 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:27:27,756-Speed 2632.16 samples/sec Loss 8.4628 LearningRate 0.0407 Epoch: 7 Global Step: 300050 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:27:31,632-Speed 2642.00 samples/sec Loss 8.6328 LearningRate 0.0407 Epoch: 7 Global Step: 300060 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:27:35,519-Speed 2635.92 samples/sec Loss 8.6108 LearningRate 0.0407 Epoch: 7 Global Step: 300070 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:27:39,401-Speed 2638.23 samples/sec Loss 8.5989 LearningRate 0.0407 Epoch: 7 Global Step: 300080 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:27:43,291-Speed 2633.31 samples/sec Loss 8.6361 LearningRate 0.0407 Epoch: 7 Global Step: 300090 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:27:47,169-Speed 2641.04 samples/sec Loss 8.6133 LearningRate 0.0407 Epoch: 7 Global Step: 300100 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:27:51,054-Speed 2636.59 samples/sec Loss 8.7090 LearningRate 0.0407 Epoch: 7 Global Step: 300110 Fp16 Grad Scale: 262144 Required: 60 hours
Training: 2022-04-14 05:27:54,931-Speed 2641.59 samples/sec Loss 8.7565 LearningRate 0.0407 Epoch: 7 Global Step: 300120 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:27:58,818-Speed 2634.67 samples/sec Loss 8.5640 LearningRate 0.0407 Epoch: 7 Global Step: 300130 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:28:02,712-Speed 2630.58 samples/sec Loss 8.5801 LearningRate 0.0407 Epoch: 7 Global Step: 300140 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:28:06,599-Speed 2635.41 samples/sec Loss 8.7097 LearningRate 0.0407 Epoch: 7 Global Step: 300150 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:28:10,488-Speed 2634.22 samples/sec Loss 8.7816 LearningRate 0.0407 Epoch: 7 Global Step: 300160 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:28:14,377-Speed 2633.67 samples/sec Loss 8.5912 LearningRate 0.0407 Epoch: 7 Global Step: 300170 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:28:18,266-Speed 2633.71 samples/sec Loss 8.5642 LearningRate 0.0407 Epoch: 7 Global Step: 300180 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:28:22,159-Speed 2631.39 samples/sec Loss 8.7474 LearningRate 0.0407 Epoch: 7 Global Step: 300190 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:28:26,053-Speed 2630.28 samples/sec Loss 8.6771 LearningRate 0.0407 Epoch: 7 Global Step: 300200 Fp16 Grad Scale: 131072 Required: 60 hours
Training: 2022-04-14 05:28:29,943-Speed 2632.93 samples/sec Loss 8.5695 LearningRate 0.0407 Epoch: 7 Global Step: 300210 Fp16 Grad Scale: 65536 Required: 60 hours
Training: 2022-04-14 05:28:33,822-Speed 2641.05 samples/sec Loss 8.6666 LearningRate 0.0407 Epoch: 7 Global Step: 300220 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:28:37,700-Speed 2640.79 samples/sec Loss 9.1170 LearningRate 0.0407 Epoch: 7 Global Step: 300230 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:28:41,592-Speed 2632.54 samples/sec Loss 9.0319 LearningRate 0.0407 Epoch: 7 Global Step: 300240 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:28:45,484-Speed 2631.18 samples/sec Loss 9.0011 LearningRate 0.0407 Epoch: 7 Global Step: 300250 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:28:49,377-Speed 2630.87 samples/sec Loss 8.5888 LearningRate 0.0407 Epoch: 7 Global Step: 300260 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:28:53,283-Speed 2622.29 samples/sec Loss 8.6442 LearningRate 0.0407 Epoch: 7 Global Step: 300270 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:28:57,179-Speed 2628.69 samples/sec Loss 8.6917 LearningRate 0.0407 Epoch: 7 Global Step: 300280 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:29:01,079-Speed 2626.48 samples/sec Loss 8.5547 LearningRate 0.0407 Epoch: 7 Global Step: 300290 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:29:04,979-Speed 2626.21 samples/sec Loss 8.5868 LearningRate 0.0407 Epoch: 7 Global Step: 300300 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:29:08,888-Speed 2619.80 samples/sec Loss 8.5809 LearningRate 0.0407 Epoch: 7 Global Step: 300310 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:29:12,811-Speed 2611.34 samples/sec Loss 8.5393 LearningRate 0.0407 Epoch: 7 Global Step: 300320 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:29:16,707-Speed 2629.21 samples/sec Loss 8.6306 LearningRate 0.0407 Epoch: 7 Global Step: 300330 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:29:20,604-Speed 2627.81 samples/sec Loss 8.5821 LearningRate 0.0407 Epoch: 7 Global Step: 300340 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:29:24,496-Speed 2631.48 samples/sec Loss 8.6266 LearningRate 0.0407 Epoch: 7 Global Step: 300350 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:29:28,388-Speed 2632.14 samples/sec Loss 8.6503 LearningRate 0.0407 Epoch: 7 Global Step: 300360 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:29:32,285-Speed 2627.98 samples/sec Loss 8.6106 LearningRate 0.0407 Epoch: 7 Global Step: 300370 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:29:36,193-Speed 2620.97 samples/sec Loss 8.5959 LearningRate 0.0407 Epoch: 7 Global Step: 300380 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:29:40,081-Speed 2634.26 samples/sec Loss 8.5557 LearningRate 0.0407 Epoch: 7 Global Step: 300390 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:29:43,978-Speed 2628.84 samples/sec Loss 8.7449 LearningRate 0.0407 Epoch: 7 Global Step: 300400 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:29:47,879-Speed 2625.26 samples/sec Loss 8.6684 LearningRate 0.0407 Epoch: 7 Global Step: 300410 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:29:51,881-Speed 2559.49 samples/sec Loss 8.5411 LearningRate 0.0407 Epoch: 7 Global Step: 300420 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:29:55,779-Speed 2627.69 samples/sec Loss 8.6853 LearningRate 0.0407 Epoch: 7 Global Step: 300430 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:29:59,674-Speed 2629.41 samples/sec Loss 8.5811 LearningRate 0.0407 Epoch: 7 Global Step: 300440 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:30:03,572-Speed 2627.96 samples/sec Loss 8.7870 LearningRate 0.0407 Epoch: 7 Global Step: 300450 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:30:07,469-Speed 2627.94 samples/sec Loss 8.6899 LearningRate 0.0407 Epoch: 7 Global Step: 300460 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:30:11,361-Speed 2631.96 samples/sec Loss 8.5119 LearningRate 0.0407 Epoch: 7 Global Step: 300470 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:30:15,291-Speed 2606.05 samples/sec Loss 8.6046 LearningRate 0.0407 Epoch: 7 Global Step: 300480 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:30:19,193-Speed 2624.93 samples/sec Loss 8.6358 LearningRate 0.0407 Epoch: 7 Global Step: 300490 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:30:23,092-Speed 2627.24 samples/sec Loss 8.5388 LearningRate 0.0407 Epoch: 7 Global Step: 300500 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:30:26,988-Speed 2629.52 samples/sec Loss 8.6257 LearningRate 0.0407 Epoch: 7 Global Step: 300510 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:30:30,883-Speed 2629.20 samples/sec Loss 8.7022 LearningRate 0.0407 Epoch: 7 Global Step: 300520 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:30:34,780-Speed 2628.38 samples/sec Loss 8.6100 LearningRate 0.0407 Epoch: 7 Global Step: 300530 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:30:38,692-Speed 2618.21 samples/sec Loss 8.5855 LearningRate 0.0407 Epoch: 7 Global Step: 300540 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:30:42,605-Speed 2617.45 samples/sec Loss 8.5852 LearningRate 0.0407 Epoch: 7 Global Step: 300550 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:30:46,513-Speed 2621.43 samples/sec Loss 8.7911 LearningRate 0.0407 Epoch: 7 Global Step: 300560 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:30:50,415-Speed 2624.56 samples/sec Loss 8.7191 LearningRate 0.0407 Epoch: 7 Global Step: 300570 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:30:54,325-Speed 2619.55 samples/sec Loss 8.6048 LearningRate 0.0407 Epoch: 7 Global Step: 300580 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:30:58,227-Speed 2624.43 samples/sec Loss 8.6415 LearningRate 0.0407 Epoch: 7 Global Step: 300590 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:31:02,159-Speed 2605.12 samples/sec Loss 8.7449 LearningRate 0.0407 Epoch: 7 Global Step: 300600 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:31:06,052-Speed 2630.86 samples/sec Loss 8.6928 LearningRate 0.0407 Epoch: 7 Global Step: 300610 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:31:09,957-Speed 2623.80 samples/sec Loss 8.5858 LearningRate 0.0407 Epoch: 7 Global Step: 300620 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:31:13,862-Speed 2622.79 samples/sec Loss 8.8001 LearningRate 0.0407 Epoch: 7 Global Step: 300630 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:31:17,780-Speed 2614.43 samples/sec Loss 8.7148 LearningRate 0.0407 Epoch: 7 Global Step: 300640 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:31:21,689-Speed 2619.83 samples/sec Loss 8.6645 LearningRate 0.0407 Epoch: 7 Global Step: 300650 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:31:25,577-Speed 2634.80 samples/sec Loss 8.6478 LearningRate 0.0407 Epoch: 7 Global Step: 300660 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:31:29,477-Speed 2626.23 samples/sec Loss 8.6879 LearningRate 0.0406 Epoch: 7 Global Step: 300670 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:31:33,375-Speed 2627.29 samples/sec Loss 8.7101 LearningRate 0.0406 Epoch: 7 Global Step: 300680 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:31:37,289-Speed 2616.87 samples/sec Loss 8.5811 LearningRate 0.0406 Epoch: 7 Global Step: 300690 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:31:41,191-Speed 2625.25 samples/sec Loss 8.7123 LearningRate 0.0406 Epoch: 7 Global Step: 300700 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:31:45,105-Speed 2616.98 samples/sec Loss 8.6816 LearningRate 0.0406 Epoch: 7 Global Step: 300710 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:31:49,011-Speed 2622.73 samples/sec Loss 8.6026 LearningRate 0.0406 Epoch: 7 Global Step: 300720 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:31:52,910-Speed 2626.72 samples/sec Loss 8.5960 LearningRate 0.0406 Epoch: 7 Global Step: 300730 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:31:56,812-Speed 2624.63 samples/sec Loss 8.6204 LearningRate 0.0406 Epoch: 7 Global Step: 300740 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:32:00,718-Speed 2621.95 samples/sec Loss 8.6502 LearningRate 0.0406 Epoch: 7 Global Step: 300750 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:32:04,619-Speed 2625.66 samples/sec Loss 8.5313 LearningRate 0.0406 Epoch: 7 Global Step: 300760 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:32:08,543-Speed 2610.41 samples/sec Loss 8.5643 LearningRate 0.0406 Epoch: 7 Global Step: 300770 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:32:12,445-Speed 2625.05 samples/sec Loss 8.6893 LearningRate 0.0406 Epoch: 7 Global Step: 300780 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:32:16,347-Speed 2625.58 samples/sec Loss 8.4074 LearningRate 0.0406 Epoch: 7 Global Step: 300790 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:32:20,313-Speed 2582.19 samples/sec Loss 8.6638 LearningRate 0.0406 Epoch: 7 Global Step: 300800 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:32:24,231-Speed 2613.82 samples/sec Loss 8.6809 LearningRate 0.0406 Epoch: 7 Global Step: 300810 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:32:28,139-Speed 2621.53 samples/sec Loss 8.5037 LearningRate 0.0406 Epoch: 7 Global Step: 300820 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:32:32,046-Speed 2621.40 samples/sec Loss 8.7189 LearningRate 0.0406 Epoch: 7 Global Step: 300830 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:32:35,957-Speed 2619.07 samples/sec Loss 8.7217 LearningRate 0.0406 Epoch: 7 Global Step: 300840 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:32:39,872-Speed 2615.92 samples/sec Loss 8.7548 LearningRate 0.0406 Epoch: 7 Global Step: 300850 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:32:43,794-Speed 2611.99 samples/sec Loss 8.6716 LearningRate 0.0406 Epoch: 7 Global Step: 300860 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:32:47,705-Speed 2618.70 samples/sec Loss 8.6933 LearningRate 0.0406 Epoch: 7 Global Step: 300870 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:32:51,608-Speed 2624.79 samples/sec Loss 8.6151 LearningRate 0.0406 Epoch: 7 Global Step: 300880 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:32:55,515-Speed 2621.38 samples/sec Loss 8.5755 LearningRate 0.0406 Epoch: 7 Global Step: 300890 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:32:59,416-Speed 2625.40 samples/sec Loss 8.6322 LearningRate 0.0406 Epoch: 7 Global Step: 300900 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:33:03,319-Speed 2624.15 samples/sec Loss 8.4404 LearningRate 0.0406 Epoch: 7 Global Step: 300910 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:33:07,213-Speed 2630.77 samples/sec Loss 8.6108 LearningRate 0.0406 Epoch: 7 Global Step: 300920 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:33:11,112-Speed 2626.15 samples/sec Loss 8.5980 LearningRate 0.0406 Epoch: 7 Global Step: 300930 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:33:15,007-Speed 2629.74 samples/sec Loss 8.5669 LearningRate 0.0406 Epoch: 7 Global Step: 300940 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:33:18,904-Speed 2628.55 samples/sec Loss 8.6926 LearningRate 0.0406 Epoch: 7 Global Step: 300950 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:33:22,804-Speed 2626.36 samples/sec Loss 8.4904 LearningRate 0.0406 Epoch: 7 Global Step: 300960 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:33:26,703-Speed 2627.36 samples/sec Loss 8.6461 LearningRate 0.0406 Epoch: 7 Global Step: 300970 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:33:30,568-Speed 2649.48 samples/sec Loss 8.6319 LearningRate 0.0406 Epoch: 7 Global Step: 300980 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:33:34,465-Speed 2628.24 samples/sec Loss 8.6378 LearningRate 0.0406 Epoch: 7 Global Step: 300990 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:33:38,362-Speed 2628.09 samples/sec Loss 8.6147 LearningRate 0.0406 Epoch: 7 Global Step: 301000 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:33:42,279-Speed 2615.70 samples/sec Loss 8.6909 LearningRate 0.0406 Epoch: 7 Global Step: 301010 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:33:46,190-Speed 2618.66 samples/sec Loss 8.6164 LearningRate 0.0406 Epoch: 7 Global Step: 301020 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:33:50,095-Speed 2623.15 samples/sec Loss 8.6779 LearningRate 0.0406 Epoch: 7 Global Step: 301030 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:33:54,030-Speed 2602.97 samples/sec Loss 8.8170 LearningRate 0.0406 Epoch: 7 Global Step: 301040 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:33:57,937-Speed 2621.89 samples/sec Loss 8.7029 LearningRate 0.0406 Epoch: 7 Global Step: 301050 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:34:01,845-Speed 2620.52 samples/sec Loss 8.6733 LearningRate 0.0406 Epoch: 7 Global Step: 301060 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:34:05,751-Speed 2622.52 samples/sec Loss 8.5225 LearningRate 0.0406 Epoch: 7 Global Step: 301070 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:34:09,650-Speed 2626.10 samples/sec Loss 8.7103 LearningRate 0.0406 Epoch: 7 Global Step: 301080 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:34:13,547-Speed 2629.25 samples/sec Loss 8.6538 LearningRate 0.0406 Epoch: 7 Global Step: 301090 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:34:17,443-Speed 2628.52 samples/sec Loss 8.5707 LearningRate 0.0406 Epoch: 7 Global Step: 301100 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:34:21,337-Speed 2630.53 samples/sec Loss 8.5898 LearningRate 0.0406 Epoch: 7 Global Step: 301110 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:34:25,232-Speed 2629.47 samples/sec Loss 8.4709 LearningRate 0.0406 Epoch: 7 Global Step: 301120 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:34:29,132-Speed 2627.27 samples/sec Loss 8.6096 LearningRate 0.0406 Epoch: 7 Global Step: 301130 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:34:33,025-Speed 2630.72 samples/sec Loss 8.6752 LearningRate 0.0406 Epoch: 7 Global Step: 301140 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:34:36,920-Speed 2629.55 samples/sec Loss 8.6499 LearningRate 0.0406 Epoch: 7 Global Step: 301150 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:34:40,826-Speed 2622.47 samples/sec Loss 8.6730 LearningRate 0.0406 Epoch: 7 Global Step: 301160 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:34:44,721-Speed 2629.72 samples/sec Loss 8.5951 LearningRate 0.0406 Epoch: 7 Global Step: 301170 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:34:48,620-Speed 2626.80 samples/sec Loss 8.4610 LearningRate 0.0406 Epoch: 7 Global Step: 301180 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:34:52,515-Speed 2629.46 samples/sec Loss 8.6043 LearningRate 0.0406 Epoch: 7 Global Step: 301190 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:34:56,412-Speed 2628.41 samples/sec Loss 8.5519 LearningRate 0.0406 Epoch: 7 Global Step: 301200 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:35:00,310-Speed 2627.71 samples/sec Loss 8.6702 LearningRate 0.0406 Epoch: 7 Global Step: 301210 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:35:04,207-Speed 2628.84 samples/sec Loss 8.7808 LearningRate 0.0406 Epoch: 7 Global Step: 301220 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:35:08,106-Speed 2626.71 samples/sec Loss 8.6169 LearningRate 0.0406 Epoch: 7 Global Step: 301230 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:35:12,017-Speed 2618.85 samples/sec Loss 8.5955 LearningRate 0.0406 Epoch: 7 Global Step: 301240 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:35:15,933-Speed 2615.54 samples/sec Loss 8.6130 LearningRate 0.0406 Epoch: 7 Global Step: 301250 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:35:19,841-Speed 2620.54 samples/sec Loss 8.5306 LearningRate 0.0406 Epoch: 7 Global Step: 301260 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:35:23,734-Speed 2631.09 samples/sec Loss 8.5318 LearningRate 0.0406 Epoch: 7 Global Step: 301270 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:35:27,633-Speed 2627.51 samples/sec Loss 8.5773 LearningRate 0.0406 Epoch: 7 Global Step: 301280 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:35:31,526-Speed 2630.49 samples/sec Loss 8.6454 LearningRate 0.0406 Epoch: 7 Global Step: 301290 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:35:35,422-Speed 2629.07 samples/sec Loss 8.6020 LearningRate 0.0406 Epoch: 7 Global Step: 301300 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:35:39,299-Speed 2641.97 samples/sec Loss 8.7366 LearningRate 0.0406 Epoch: 7 Global Step: 301310 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:35:43,233-Speed 2603.64 samples/sec Loss 8.5993 LearningRate 0.0405 Epoch: 7 Global Step: 301320 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:35:47,129-Speed 2628.77 samples/sec Loss 8.6778 LearningRate 0.0405 Epoch: 7 Global Step: 301330 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:35:51,010-Speed 2639.83 samples/sec Loss 8.5886 LearningRate 0.0405 Epoch: 7 Global Step: 301340 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:35:54,910-Speed 2626.06 samples/sec Loss 8.6152 LearningRate 0.0405 Epoch: 7 Global Step: 301350 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:35:58,814-Speed 2631.58 samples/sec Loss 8.4756 LearningRate 0.0405 Epoch: 7 Global Step: 301360 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:36:02,707-Speed 2630.25 samples/sec Loss 8.5966 LearningRate 0.0405 Epoch: 7 Global Step: 301370 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:36:06,599-Speed 2632.10 samples/sec Loss 8.6341 LearningRate 0.0405 Epoch: 7 Global Step: 301380 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:36:10,498-Speed 2626.26 samples/sec Loss 8.6950 LearningRate 0.0405 Epoch: 7 Global Step: 301390 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:36:14,410-Speed 2618.64 samples/sec Loss 8.5425 LearningRate 0.0405 Epoch: 7 Global Step: 301400 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:36:18,316-Speed 2622.35 samples/sec Loss 8.6609 LearningRate 0.0405 Epoch: 7 Global Step: 301410 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:36:22,212-Speed 2629.32 samples/sec Loss 8.4582 LearningRate 0.0405 Epoch: 7 Global Step: 301420 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:36:26,109-Speed 2628.45 samples/sec Loss 8.5244 LearningRate 0.0405 Epoch: 7 Global Step: 301430 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:36:30,005-Speed 2628.84 samples/sec Loss 8.5288 LearningRate 0.0405 Epoch: 7 Global Step: 301440 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:36:33,901-Speed 2628.84 samples/sec Loss 8.5914 LearningRate 0.0405 Epoch: 7 Global Step: 301450 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:36:37,804-Speed 2624.58 samples/sec Loss 8.5043 LearningRate 0.0405 Epoch: 7 Global Step: 301460 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:36:41,697-Speed 2630.79 samples/sec Loss 8.5729 LearningRate 0.0405 Epoch: 7 Global Step: 301470 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:36:45,596-Speed 2626.78 samples/sec Loss 8.5369 LearningRate 0.0405 Epoch: 7 Global Step: 301480 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:36:49,494-Speed 2628.10 samples/sec Loss 8.5318 LearningRate 0.0405 Epoch: 7 Global Step: 301490 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:36:53,394-Speed 2626.18 samples/sec Loss 8.5828 LearningRate 0.0405 Epoch: 7 Global Step: 301500 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:36:57,294-Speed 2626.43 samples/sec Loss 8.5720 LearningRate 0.0405 Epoch: 7 Global Step: 301510 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:37:01,176-Speed 2638.80 samples/sec Loss 8.5723 LearningRate 0.0405 Epoch: 7 Global Step: 301520 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:37:05,072-Speed 2629.09 samples/sec Loss 8.5837 LearningRate 0.0405 Epoch: 7 Global Step: 301530 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:37:08,986-Speed 2617.00 samples/sec Loss 8.6308 LearningRate 0.0405 Epoch: 7 Global Step: 301540 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:37:12,911-Speed 2609.32 samples/sec Loss 8.6942 LearningRate 0.0405 Epoch: 7 Global Step: 301550 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:37:16,828-Speed 2615.33 samples/sec Loss 8.5566 LearningRate 0.0405 Epoch: 7 Global Step: 301560 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:37:20,747-Speed 2613.56 samples/sec Loss 8.6740 LearningRate 0.0405 Epoch: 7 Global Step: 301570 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:37:24,659-Speed 2617.91 samples/sec Loss 8.6463 LearningRate 0.0405 Epoch: 7 Global Step: 301580 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:37:28,578-Speed 2613.65 samples/sec Loss 8.5403 LearningRate 0.0405 Epoch: 7 Global Step: 301590 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:37:32,474-Speed 2629.65 samples/sec Loss 8.5765 LearningRate 0.0405 Epoch: 7 Global Step: 301600 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:37:36,364-Speed 2632.86 samples/sec Loss 8.5631 LearningRate 0.0405 Epoch: 7 Global Step: 301610 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:37:40,211-Speed 2662.34 samples/sec Loss 9.1707 LearningRate 0.0405 Epoch: 7 Global Step: 301620 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:37:44,101-Speed 2632.63 samples/sec Loss 9.8343 LearningRate 0.0405 Epoch: 7 Global Step: 301630 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:37:48,000-Speed 2627.45 samples/sec Loss 8.8314 LearningRate 0.0405 Epoch: 7 Global Step: 301640 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:37:51,890-Speed 2632.93 samples/sec Loss 8.6405 LearningRate 0.0405 Epoch: 7 Global Step: 301650 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:37:55,820-Speed 2606.47 samples/sec Loss 8.6751 LearningRate 0.0405 Epoch: 7 Global Step: 301660 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:37:59,716-Speed 2628.54 samples/sec Loss 8.6192 LearningRate 0.0405 Epoch: 7 Global Step: 301670 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:38:03,608-Speed 2631.92 samples/sec Loss 8.6075 LearningRate 0.0405 Epoch: 7 Global Step: 301680 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:38:07,510-Speed 2625.04 samples/sec Loss 8.6569 LearningRate 0.0405 Epoch: 7 Global Step: 301690 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:38:11,406-Speed 2628.72 samples/sec Loss 8.6255 LearningRate 0.0405 Epoch: 7 Global Step: 301700 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:38:15,313-Speed 2621.75 samples/sec Loss 8.6594 LearningRate 0.0405 Epoch: 7 Global Step: 301710 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:38:19,273-Speed 2586.46 samples/sec Loss 8.5642 LearningRate 0.0405 Epoch: 7 Global Step: 301720 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:38:23,184-Speed 2619.26 samples/sec Loss 8.5914 LearningRate 0.0405 Epoch: 7 Global Step: 301730 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:38:27,087-Speed 2624.33 samples/sec Loss 8.5669 LearningRate 0.0405 Epoch: 7 Global Step: 301740 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:38:30,984-Speed 2628.01 samples/sec Loss 8.6176 LearningRate 0.0405 Epoch: 7 Global Step: 301750 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:38:34,890-Speed 2621.98 samples/sec Loss 8.6876 LearningRate 0.0405 Epoch: 7 Global Step: 301760 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:38:39,011-Speed 2485.90 samples/sec Loss 8.5783 LearningRate 0.0405 Epoch: 7 Global Step: 301770 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:38:42,936-Speed 2609.97 samples/sec Loss 8.6240 LearningRate 0.0405 Epoch: 7 Global Step: 301780 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:38:46,838-Speed 2624.63 samples/sec Loss 8.8659 LearningRate 0.0405 Epoch: 7 Global Step: 301790 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:38:50,729-Speed 2632.44 samples/sec Loss 8.6648 LearningRate 0.0405 Epoch: 7 Global Step: 301800 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:38:54,618-Speed 2634.18 samples/sec Loss 8.6098 LearningRate 0.0405 Epoch: 7 Global Step: 301810 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:38:58,511-Speed 2631.08 samples/sec Loss 8.6333 LearningRate 0.0405 Epoch: 7 Global Step: 301820 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:39:02,409-Speed 2627.16 samples/sec Loss 8.4904 LearningRate 0.0405 Epoch: 7 Global Step: 301830 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:39:06,298-Speed 2633.45 samples/sec Loss 8.7026 LearningRate 0.0405 Epoch: 7 Global Step: 301840 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:39:10,191-Speed 2631.22 samples/sec Loss 8.6363 LearningRate 0.0405 Epoch: 7 Global Step: 301850 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:39:14,084-Speed 2631.36 samples/sec Loss 8.7039 LearningRate 0.0405 Epoch: 7 Global Step: 301860 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:39:17,974-Speed 2633.18 samples/sec Loss 8.6542 LearningRate 0.0405 Epoch: 7 Global Step: 301870 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:39:21,872-Speed 2627.86 samples/sec Loss 8.7114 LearningRate 0.0405 Epoch: 7 Global Step: 301880 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:39:25,762-Speed 2632.31 samples/sec Loss 8.5186 LearningRate 0.0405 Epoch: 7 Global Step: 301890 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:39:29,675-Speed 2617.81 samples/sec Loss 8.6536 LearningRate 0.0405 Epoch: 7 Global Step: 301900 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:39:33,570-Speed 2629.91 samples/sec Loss 8.5891 LearningRate 0.0405 Epoch: 7 Global Step: 301910 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:39:37,467-Speed 2628.32 samples/sec Loss 8.6733 LearningRate 0.0405 Epoch: 7 Global Step: 301920 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:39:41,361-Speed 2629.91 samples/sec Loss 8.5154 LearningRate 0.0405 Epoch: 7 Global Step: 301930 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:39:45,255-Speed 2630.89 samples/sec Loss 8.6200 LearningRate 0.0405 Epoch: 7 Global Step: 301940 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:39:49,146-Speed 2632.61 samples/sec Loss 8.5817 LearningRate 0.0405 Epoch: 7 Global Step: 301950 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:39:53,042-Speed 2629.42 samples/sec Loss 8.5521 LearningRate 0.0405 Epoch: 7 Global Step: 301960 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:39:56,933-Speed 2632.70 samples/sec Loss 8.5069 LearningRate 0.0404 Epoch: 7 Global Step: 301970 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:40:00,881-Speed 2594.14 samples/sec Loss 8.4494 LearningRate 0.0404 Epoch: 7 Global Step: 301980 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:40:04,956-Speed 2513.08 samples/sec Loss 8.6200 LearningRate 0.0404 Epoch: 7 Global Step: 301990 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:40:09,029-Speed 2514.96 samples/sec Loss 8.6581 LearningRate 0.0404 Epoch: 7 Global Step: 302000 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:40:13,014-Speed 2570.62 samples/sec Loss 8.5082 LearningRate 0.0404 Epoch: 7 Global Step: 302010 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:40:16,908-Speed 2629.97 samples/sec Loss 8.6256 LearningRate 0.0404 Epoch: 7 Global Step: 302020 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:40:20,803-Speed 2629.60 samples/sec Loss 8.5461 LearningRate 0.0404 Epoch: 7 Global Step: 302030 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:40:24,700-Speed 2628.89 samples/sec Loss 8.6890 LearningRate 0.0404 Epoch: 7 Global Step: 302040 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:40:28,593-Speed 2631.18 samples/sec Loss 8.6296 LearningRate 0.0404 Epoch: 7 Global Step: 302050 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:40:32,544-Speed 2591.68 samples/sec Loss 8.6041 LearningRate 0.0404 Epoch: 7 Global Step: 302060 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:40:36,438-Speed 2630.87 samples/sec Loss 8.6134 LearningRate 0.0404 Epoch: 7 Global Step: 302070 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:40:40,333-Speed 2629.14 samples/sec Loss 8.7129 LearningRate 0.0404 Epoch: 7 Global Step: 302080 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:40:44,234-Speed 2626.27 samples/sec Loss 8.4771 LearningRate 0.0404 Epoch: 7 Global Step: 302090 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:40:48,128-Speed 2629.74 samples/sec Loss 8.6349 LearningRate 0.0404 Epoch: 7 Global Step: 302100 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:40:52,026-Speed 2627.95 samples/sec Loss 8.5490 LearningRate 0.0404 Epoch: 7 Global Step: 302110 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:40:55,927-Speed 2625.69 samples/sec Loss 8.5906 LearningRate 0.0404 Epoch: 7 Global Step: 302120 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:40:59,825-Speed 2627.88 samples/sec Loss 8.5896 LearningRate 0.0404 Epoch: 7 Global Step: 302130 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:41:03,720-Speed 2629.36 samples/sec Loss 8.6486 LearningRate 0.0404 Epoch: 7 Global Step: 302140 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:41:07,614-Speed 2630.54 samples/sec Loss 8.6640 LearningRate 0.0404 Epoch: 7 Global Step: 302150 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:41:11,505-Speed 2632.25 samples/sec Loss 8.6988 LearningRate 0.0404 Epoch: 7 Global Step: 302160 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:41:15,396-Speed 2632.11 samples/sec Loss 8.6473 LearningRate 0.0404 Epoch: 7 Global Step: 302170 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:41:19,296-Speed 2626.15 samples/sec Loss 8.6794 LearningRate 0.0404 Epoch: 7 Global Step: 302180 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:41:23,197-Speed 2625.85 samples/sec Loss 8.5578 LearningRate 0.0404 Epoch: 7 Global Step: 302190 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:41:27,091-Speed 2630.37 samples/sec Loss 8.5811 LearningRate 0.0404 Epoch: 7 Global Step: 302200 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:41:30,985-Speed 2629.85 samples/sec Loss 8.7337 LearningRate 0.0404 Epoch: 7 Global Step: 302210 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:41:34,879-Speed 2631.05 samples/sec Loss 8.5822 LearningRate 0.0404 Epoch: 7 Global Step: 302220 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:41:38,757-Speed 2640.75 samples/sec Loss 8.5446 LearningRate 0.0404 Epoch: 7 Global Step: 302230 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:41:42,650-Speed 2631.41 samples/sec Loss 8.6835 LearningRate 0.0404 Epoch: 7 Global Step: 302240 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:41:46,544-Speed 2630.03 samples/sec Loss 8.4898 LearningRate 0.0404 Epoch: 7 Global Step: 302250 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:41:50,439-Speed 2629.86 samples/sec Loss 8.6335 LearningRate 0.0404 Epoch: 7 Global Step: 302260 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:41:54,322-Speed 2637.49 samples/sec Loss 8.4650 LearningRate 0.0404 Epoch: 7 Global Step: 302270 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:41:58,239-Speed 2615.74 samples/sec Loss 8.6236 LearningRate 0.0404 Epoch: 7 Global Step: 302280 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:42:02,132-Speed 2630.42 samples/sec Loss 8.6800 LearningRate 0.0404 Epoch: 7 Global Step: 302290 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:42:06,040-Speed 2620.58 samples/sec Loss 8.6254 LearningRate 0.0404 Epoch: 7 Global Step: 302300 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:42:09,937-Speed 2628.42 samples/sec Loss 8.5945 LearningRate 0.0404 Epoch: 7 Global Step: 302310 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:42:13,839-Speed 2625.78 samples/sec Loss 8.4579 LearningRate 0.0404 Epoch: 7 Global Step: 302320 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:42:17,735-Speed 2628.82 samples/sec Loss 8.6709 LearningRate 0.0404 Epoch: 7 Global Step: 302330 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:42:21,633-Speed 2627.07 samples/sec Loss 8.5214 LearningRate 0.0404 Epoch: 7 Global Step: 302340 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:42:25,539-Speed 2622.59 samples/sec Loss 8.6490 LearningRate 0.0404 Epoch: 7 Global Step: 302350 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:42:29,444-Speed 2623.08 samples/sec Loss 8.5634 LearningRate 0.0404 Epoch: 7 Global Step: 302360 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:42:33,340-Speed 2629.01 samples/sec Loss 8.6290 LearningRate 0.0404 Epoch: 7 Global Step: 302370 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:42:37,260-Speed 2612.69 samples/sec Loss 8.6433 LearningRate 0.0404 Epoch: 7 Global Step: 302380 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:42:41,157-Speed 2628.54 samples/sec Loss 8.6529 LearningRate 0.0404 Epoch: 7 Global Step: 302390 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:42:45,051-Speed 2630.43 samples/sec Loss 8.5648 LearningRate 0.0404 Epoch: 7 Global Step: 302400 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:42:48,960-Speed 2620.62 samples/sec Loss 8.6408 LearningRate 0.0404 Epoch: 7 Global Step: 302410 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:42:52,896-Speed 2602.00 samples/sec Loss 8.5904 LearningRate 0.0404 Epoch: 7 Global Step: 302420 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:42:56,796-Speed 2626.51 samples/sec Loss 8.5337 LearningRate 0.0404 Epoch: 7 Global Step: 302430 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:43:00,700-Speed 2623.74 samples/sec Loss 8.5443 LearningRate 0.0404 Epoch: 7 Global Step: 302440 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:43:04,598-Speed 2627.96 samples/sec Loss 8.6204 LearningRate 0.0404 Epoch: 7 Global Step: 302450 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:43:08,488-Speed 2632.77 samples/sec Loss 8.7381 LearningRate 0.0404 Epoch: 7 Global Step: 302460 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:43:12,380-Speed 2631.44 samples/sec Loss 8.4755 LearningRate 0.0404 Epoch: 7 Global Step: 302470 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:43:16,289-Speed 2620.57 samples/sec Loss 8.5345 LearningRate 0.0404 Epoch: 7 Global Step: 302480 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:43:20,181-Speed 2631.48 samples/sec Loss 8.6634 LearningRate 0.0404 Epoch: 7 Global Step: 302490 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:43:24,079-Speed 2628.23 samples/sec Loss 8.5959 LearningRate 0.0404 Epoch: 7 Global Step: 302500 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:43:28,015-Speed 2602.56 samples/sec Loss 8.6728 LearningRate 0.0404 Epoch: 7 Global Step: 302510 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:43:31,911-Speed 2629.07 samples/sec Loss 8.6042 LearningRate 0.0404 Epoch: 7 Global Step: 302520 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:43:35,807-Speed 2629.36 samples/sec Loss 8.4388 LearningRate 0.0404 Epoch: 7 Global Step: 302530 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:43:39,703-Speed 2628.82 samples/sec Loss 8.5678 LearningRate 0.0404 Epoch: 7 Global Step: 302540 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:43:43,631-Speed 2607.09 samples/sec Loss 8.6417 LearningRate 0.0404 Epoch: 7 Global Step: 302550 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:43:47,530-Speed 2627.20 samples/sec Loss 8.6519 LearningRate 0.0404 Epoch: 7 Global Step: 302560 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:43:51,438-Speed 2621.45 samples/sec Loss 8.6162 LearningRate 0.0404 Epoch: 7 Global Step: 302570 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:43:55,339-Speed 2625.82 samples/sec Loss 8.5397 LearningRate 0.0404 Epoch: 7 Global Step: 302580 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:43:59,242-Speed 2624.07 samples/sec Loss 8.5832 LearningRate 0.0404 Epoch: 7 Global Step: 302590 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:44:03,124-Speed 2638.26 samples/sec Loss 8.6466 LearningRate 0.0404 Epoch: 7 Global Step: 302600 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:44:07,042-Speed 2614.33 samples/sec Loss 8.6765 LearningRate 0.0404 Epoch: 7 Global Step: 302610 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:44:10,945-Speed 2624.47 samples/sec Loss 8.6216 LearningRate 0.0403 Epoch: 7 Global Step: 302620 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:44:14,845-Speed 2626.26 samples/sec Loss 8.5772 LearningRate 0.0403 Epoch: 7 Global Step: 302630 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:44:18,746-Speed 2625.33 samples/sec Loss 8.4765 LearningRate 0.0403 Epoch: 7 Global Step: 302640 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:44:22,640-Speed 2630.12 samples/sec Loss 8.5564 LearningRate 0.0403 Epoch: 7 Global Step: 302650 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:44:26,534-Speed 2630.13 samples/sec Loss 8.5635 LearningRate 0.0403 Epoch: 7 Global Step: 302660 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:44:30,432-Speed 2627.60 samples/sec Loss 8.5865 LearningRate 0.0403 Epoch: 7 Global Step: 302670 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:44:34,384-Speed 2592.47 samples/sec Loss 8.4992 LearningRate 0.0403 Epoch: 7 Global Step: 302680 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:44:38,271-Speed 2635.20 samples/sec Loss 8.5881 LearningRate 0.0403 Epoch: 7 Global Step: 302690 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:44:42,167-Speed 2628.81 samples/sec Loss 8.5098 LearningRate 0.0403 Epoch: 7 Global Step: 302700 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:44:46,042-Speed 2643.45 samples/sec Loss 8.6913 LearningRate 0.0403 Epoch: 7 Global Step: 302710 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:44:49,948-Speed 2622.50 samples/sec Loss 8.6545 LearningRate 0.0403 Epoch: 7 Global Step: 302720 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:44:53,850-Speed 2625.05 samples/sec Loss 8.6081 LearningRate 0.0403 Epoch: 7 Global Step: 302730 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:44:57,763-Speed 2617.03 samples/sec Loss 8.6560 LearningRate 0.0403 Epoch: 7 Global Step: 302740 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:45:01,675-Speed 2618.05 samples/sec Loss 8.6816 LearningRate 0.0403 Epoch: 7 Global Step: 302750 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:45:05,576-Speed 2625.49 samples/sec Loss 8.6471 LearningRate 0.0403 Epoch: 7 Global Step: 302760 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:45:09,470-Speed 2631.08 samples/sec Loss 8.5144 LearningRate 0.0403 Epoch: 7 Global Step: 302770 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:45:13,369-Speed 2626.71 samples/sec Loss 8.6442 LearningRate 0.0403 Epoch: 7 Global Step: 302780 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:45:17,273-Speed 2623.38 samples/sec Loss 8.5806 LearningRate 0.0403 Epoch: 7 Global Step: 302790 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:45:21,170-Speed 2628.69 samples/sec Loss 8.6734 LearningRate 0.0403 Epoch: 7 Global Step: 302800 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:45:25,042-Speed 2645.07 samples/sec Loss 8.5627 LearningRate 0.0403 Epoch: 7 Global Step: 302810 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:45:28,912-Speed 2646.89 samples/sec Loss 8.6022 LearningRate 0.0403 Epoch: 7 Global Step: 302820 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:45:32,762-Speed 2660.11 samples/sec Loss 9.5869 LearningRate 0.0403 Epoch: 7 Global Step: 302830 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:45:36,634-Speed 2645.46 samples/sec Loss 10.0393 LearningRate 0.0403 Epoch: 7 Global Step: 302840 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:45:40,530-Speed 2629.13 samples/sec Loss 9.1763 LearningRate 0.0403 Epoch: 7 Global Step: 302850 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:45:44,428-Speed 2627.38 samples/sec Loss 8.7632 LearningRate 0.0403 Epoch: 7 Global Step: 302860 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:45:48,321-Speed 2631.01 samples/sec Loss 8.5994 LearningRate 0.0403 Epoch: 7 Global Step: 302870 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:45:52,216-Speed 2630.47 samples/sec Loss 8.6019 LearningRate 0.0403 Epoch: 7 Global Step: 302880 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:45:56,106-Speed 2633.02 samples/sec Loss 8.5697 LearningRate 0.0403 Epoch: 7 Global Step: 302890 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:46:00,037-Speed 2605.77 samples/sec Loss 8.6088 LearningRate 0.0403 Epoch: 7 Global Step: 302900 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:46:03,930-Speed 2631.12 samples/sec Loss 8.6115 LearningRate 0.0403 Epoch: 7 Global Step: 302910 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:46:07,818-Speed 2634.30 samples/sec Loss 8.4923 LearningRate 0.0403 Epoch: 7 Global Step: 302920 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:46:11,706-Speed 2634.18 samples/sec Loss 8.5929 LearningRate 0.0403 Epoch: 7 Global Step: 302930 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:46:15,602-Speed 2628.74 samples/sec Loss 8.6266 LearningRate 0.0403 Epoch: 7 Global Step: 302940 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:46:19,539-Speed 2602.55 samples/sec Loss 8.3892 LearningRate 0.0403 Epoch: 7 Global Step: 302950 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:46:23,436-Speed 2628.33 samples/sec Loss 8.7562 LearningRate 0.0403 Epoch: 7 Global Step: 302960 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:46:27,340-Speed 2623.79 samples/sec Loss 8.6779 LearningRate 0.0403 Epoch: 7 Global Step: 302970 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:46:31,240-Speed 2626.35 samples/sec Loss 8.5622 LearningRate 0.0403 Epoch: 7 Global Step: 302980 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:46:35,133-Speed 2631.23 samples/sec Loss 8.6117 LearningRate 0.0403 Epoch: 7 Global Step: 302990 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:46:39,024-Speed 2632.37 samples/sec Loss 8.5719 LearningRate 0.0403 Epoch: 7 Global Step: 303000 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:46:42,918-Speed 2630.39 samples/sec Loss 8.3568 LearningRate 0.0403 Epoch: 7 Global Step: 303010 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:46:46,812-Speed 2629.71 samples/sec Loss 8.7089 LearningRate 0.0403 Epoch: 7 Global Step: 303020 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:46:50,706-Speed 2631.01 samples/sec Loss 8.5514 LearningRate 0.0403 Epoch: 7 Global Step: 303030 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:46:54,604-Speed 2627.83 samples/sec Loss 8.7036 LearningRate 0.0403 Epoch: 7 Global Step: 303040 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:46:58,503-Speed 2626.90 samples/sec Loss 8.7243 LearningRate 0.0403 Epoch: 7 Global Step: 303050 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:47:02,386-Speed 2637.32 samples/sec Loss 8.7464 LearningRate 0.0403 Epoch: 7 Global Step: 303060 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:47:06,275-Speed 2633.76 samples/sec Loss 8.8313 LearningRate 0.0403 Epoch: 7 Global Step: 303070 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:47:10,168-Speed 2630.63 samples/sec Loss 8.6678 LearningRate 0.0403 Epoch: 7 Global Step: 303080 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:47:14,063-Speed 2629.92 samples/sec Loss 8.6554 LearningRate 0.0403 Epoch: 7 Global Step: 303090 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:47:17,957-Speed 2630.49 samples/sec Loss 8.5399 LearningRate 0.0403 Epoch: 7 Global Step: 303100 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:47:21,856-Speed 2627.16 samples/sec Loss 8.6151 LearningRate 0.0403 Epoch: 7 Global Step: 303110 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:47:25,754-Speed 2628.04 samples/sec Loss 8.6132 LearningRate 0.0403 Epoch: 7 Global Step: 303120 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:47:29,647-Speed 2631.17 samples/sec Loss 8.5972 LearningRate 0.0403 Epoch: 7 Global Step: 303130 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:47:33,538-Speed 2632.23 samples/sec Loss 8.5206 LearningRate 0.0403 Epoch: 7 Global Step: 303140 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:47:37,435-Speed 2627.91 samples/sec Loss 8.6129 LearningRate 0.0403 Epoch: 7 Global Step: 303150 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:47:41,335-Speed 2625.98 samples/sec Loss 8.6220 LearningRate 0.0403 Epoch: 7 Global Step: 303160 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:47:45,229-Speed 2630.35 samples/sec Loss 8.5449 LearningRate 0.0403 Epoch: 7 Global Step: 303170 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:47:49,124-Speed 2629.91 samples/sec Loss 8.5931 LearningRate 0.0403 Epoch: 7 Global Step: 303180 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:47:53,019-Speed 2629.92 samples/sec Loss 8.5726 LearningRate 0.0403 Epoch: 7 Global Step: 303190 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:47:56,914-Speed 2629.25 samples/sec Loss 8.6641 LearningRate 0.0403 Epoch: 7 Global Step: 303200 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:48:00,808-Speed 2630.67 samples/sec Loss 8.5205 LearningRate 0.0403 Epoch: 7 Global Step: 303210 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:48:04,700-Speed 2631.64 samples/sec Loss 8.5833 LearningRate 0.0403 Epoch: 7 Global Step: 303220 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:48:08,594-Speed 2630.12 samples/sec Loss 8.6219 LearningRate 0.0403 Epoch: 7 Global Step: 303230 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:48:12,497-Speed 2623.87 samples/sec Loss 9.2859 LearningRate 0.0403 Epoch: 7 Global Step: 303240 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:48:16,392-Speed 2629.92 samples/sec Loss 9.0265 LearningRate 0.0403 Epoch: 7 Global Step: 303250 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:48:20,287-Speed 2629.78 samples/sec Loss 8.6759 LearningRate 0.0403 Epoch: 7 Global Step: 303260 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:48:24,180-Speed 2630.99 samples/sec Loss 8.6433 LearningRate 0.0403 Epoch: 7 Global Step: 303270 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:48:28,073-Speed 2630.97 samples/sec Loss 8.6435 LearningRate 0.0402 Epoch: 7 Global Step: 303280 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:48:31,971-Speed 2627.52 samples/sec Loss 8.6000 LearningRate 0.0402 Epoch: 7 Global Step: 303290 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:48:35,870-Speed 2627.06 samples/sec Loss 8.6982 LearningRate 0.0402 Epoch: 7 Global Step: 303300 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:48:39,768-Speed 2628.28 samples/sec Loss 8.5412 LearningRate 0.0402 Epoch: 7 Global Step: 303310 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:48:43,664-Speed 2628.95 samples/sec Loss 8.4961 LearningRate 0.0402 Epoch: 7 Global Step: 303320 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:48:47,560-Speed 2628.56 samples/sec Loss 8.7140 LearningRate 0.0402 Epoch: 7 Global Step: 303330 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:48:51,456-Speed 2628.92 samples/sec Loss 8.6829 LearningRate 0.0402 Epoch: 7 Global Step: 303340 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:48:55,347-Speed 2632.54 samples/sec Loss 8.6759 LearningRate 0.0402 Epoch: 7 Global Step: 303350 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:48:59,245-Speed 2627.50 samples/sec Loss 8.5308 LearningRate 0.0402 Epoch: 7 Global Step: 303360 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:49:03,141-Speed 2629.55 samples/sec Loss 8.6502 LearningRate 0.0402 Epoch: 7 Global Step: 303370 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:49:07,213-Speed 2514.84 samples/sec Loss 8.5963 LearningRate 0.0402 Epoch: 7 Global Step: 303380 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:49:11,279-Speed 2519.60 samples/sec Loss 8.4633 LearningRate 0.0402 Epoch: 7 Global Step: 303390 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:49:15,277-Speed 2561.99 samples/sec Loss 8.5225 LearningRate 0.0402 Epoch: 7 Global Step: 303400 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:49:19,183-Speed 2621.92 samples/sec Loss 8.5967 LearningRate 0.0402 Epoch: 7 Global Step: 303410 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:49:23,076-Speed 2630.92 samples/sec Loss 8.5180 LearningRate 0.0402 Epoch: 7 Global Step: 303420 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:49:26,969-Speed 2631.02 samples/sec Loss 8.5892 LearningRate 0.0402 Epoch: 7 Global Step: 303430 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:49:30,874-Speed 2631.84 samples/sec Loss 8.4495 LearningRate 0.0402 Epoch: 7 Global Step: 303440 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:49:34,779-Speed 2622.96 samples/sec Loss 8.5674 LearningRate 0.0402 Epoch: 7 Global Step: 303450 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:49:38,674-Speed 2630.45 samples/sec Loss 8.6830 LearningRate 0.0402 Epoch: 7 Global Step: 303460 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:49:42,575-Speed 2625.71 samples/sec Loss 8.5565 LearningRate 0.0402 Epoch: 7 Global Step: 303470 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:49:46,472-Speed 2627.60 samples/sec Loss 8.6901 LearningRate 0.0402 Epoch: 7 Global Step: 303480 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:49:50,368-Speed 2629.08 samples/sec Loss 8.6641 LearningRate 0.0402 Epoch: 7 Global Step: 303490 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:49:54,274-Speed 2622.20 samples/sec Loss 8.6410 LearningRate 0.0402 Epoch: 7 Global Step: 303500 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:49:58,169-Speed 2629.61 samples/sec Loss 8.5229 LearningRate 0.0402 Epoch: 7 Global Step: 303510 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:50:02,071-Speed 2625.39 samples/sec Loss 8.7474 LearningRate 0.0402 Epoch: 7 Global Step: 303520 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:50:05,997-Speed 2609.07 samples/sec Loss 8.7021 LearningRate 0.0402 Epoch: 7 Global Step: 303530 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:50:09,891-Speed 2630.63 samples/sec Loss 8.5468 LearningRate 0.0402 Epoch: 7 Global Step: 303540 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:50:13,805-Speed 2616.68 samples/sec Loss 8.6317 LearningRate 0.0402 Epoch: 7 Global Step: 303550 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:50:17,697-Speed 2631.63 samples/sec Loss 8.7164 LearningRate 0.0402 Epoch: 7 Global Step: 303560 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:50:21,590-Speed 2630.95 samples/sec Loss 8.5792 LearningRate 0.0402 Epoch: 7 Global Step: 303570 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:50:25,495-Speed 2623.00 samples/sec Loss 8.5302 LearningRate 0.0402 Epoch: 7 Global Step: 303580 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:50:29,387-Speed 2631.57 samples/sec Loss 8.6014 LearningRate 0.0402 Epoch: 7 Global Step: 303590 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:50:33,291-Speed 2623.57 samples/sec Loss 8.5258 LearningRate 0.0402 Epoch: 7 Global Step: 303600 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:50:37,200-Speed 2620.42 samples/sec Loss 8.5586 LearningRate 0.0402 Epoch: 7 Global Step: 303610 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:50:41,204-Speed 2558.28 samples/sec Loss 8.5798 LearningRate 0.0402 Epoch: 7 Global Step: 303620 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:50:45,100-Speed 2629.35 samples/sec Loss 8.6245 LearningRate 0.0402 Epoch: 7 Global Step: 303630 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:50:48,997-Speed 2628.21 samples/sec Loss 8.6226 LearningRate 0.0402 Epoch: 7 Global Step: 303640 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:50:52,873-Speed 2642.33 samples/sec Loss 8.4849 LearningRate 0.0402 Epoch: 7 Global Step: 303650 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:50:56,773-Speed 2626.49 samples/sec Loss 8.4408 LearningRate 0.0402 Epoch: 7 Global Step: 303660 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:51:00,666-Speed 2631.23 samples/sec Loss 8.5225 LearningRate 0.0402 Epoch: 7 Global Step: 303670 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:51:04,564-Speed 2627.01 samples/sec Loss 8.6282 LearningRate 0.0402 Epoch: 7 Global Step: 303680 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:51:08,459-Speed 2629.87 samples/sec Loss 8.5871 LearningRate 0.0402 Epoch: 7 Global Step: 303690 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:51:12,356-Speed 2628.52 samples/sec Loss 8.4940 LearningRate 0.0402 Epoch: 7 Global Step: 303700 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:51:16,249-Speed 2631.14 samples/sec Loss 8.7352 LearningRate 0.0402 Epoch: 7 Global Step: 303710 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:51:20,141-Speed 2631.96 samples/sec Loss 8.6403 LearningRate 0.0402 Epoch: 7 Global Step: 303720 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:51:24,040-Speed 2627.03 samples/sec Loss 8.6264 LearningRate 0.0402 Epoch: 7 Global Step: 303730 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:51:27,933-Speed 2630.81 samples/sec Loss 8.4726 LearningRate 0.0402 Epoch: 7 Global Step: 303740 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:51:31,828-Speed 2629.42 samples/sec Loss 8.4729 LearningRate 0.0402 Epoch: 7 Global Step: 303750 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:51:35,722-Speed 2630.56 samples/sec Loss 8.5622 LearningRate 0.0402 Epoch: 7 Global Step: 303760 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:51:39,626-Speed 2623.21 samples/sec Loss 8.6490 LearningRate 0.0402 Epoch: 7 Global Step: 303770 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:51:43,519-Speed 2631.31 samples/sec Loss 8.8295 LearningRate 0.0402 Epoch: 7 Global Step: 303780 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:51:47,413-Speed 2630.13 samples/sec Loss 8.4584 LearningRate 0.0402 Epoch: 7 Global Step: 303790 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:51:51,307-Speed 2631.00 samples/sec Loss 8.6541 LearningRate 0.0402 Epoch: 7 Global Step: 303800 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:51:55,207-Speed 2626.36 samples/sec Loss 8.5962 LearningRate 0.0402 Epoch: 7 Global Step: 303810 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:51:59,103-Speed 2628.91 samples/sec Loss 8.5817 LearningRate 0.0402 Epoch: 7 Global Step: 303820 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:52:02,998-Speed 2629.68 samples/sec Loss 8.5626 LearningRate 0.0402 Epoch: 7 Global Step: 303830 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:52:06,894-Speed 2628.78 samples/sec Loss 8.6514 LearningRate 0.0402 Epoch: 7 Global Step: 303840 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:52:10,791-Speed 2627.89 samples/sec Loss 8.4795 LearningRate 0.0402 Epoch: 7 Global Step: 303850 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:52:14,687-Speed 2629.18 samples/sec Loss 8.6875 LearningRate 0.0402 Epoch: 7 Global Step: 303860 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:52:18,586-Speed 2627.35 samples/sec Loss 8.6502 LearningRate 0.0402 Epoch: 7 Global Step: 303870 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:52:22,479-Speed 2631.30 samples/sec Loss 8.6692 LearningRate 0.0402 Epoch: 7 Global Step: 303880 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:52:26,363-Speed 2637.18 samples/sec Loss 8.5692 LearningRate 0.0402 Epoch: 7 Global Step: 303890 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:52:30,257-Speed 2630.44 samples/sec Loss 8.4990 LearningRate 0.0402 Epoch: 7 Global Step: 303900 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:52:34,151-Speed 2630.17 samples/sec Loss 8.6063 LearningRate 0.0402 Epoch: 7 Global Step: 303910 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:52:38,053-Speed 2624.52 samples/sec Loss 8.5409 LearningRate 0.0402 Epoch: 7 Global Step: 303920 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:52:41,952-Speed 2626.81 samples/sec Loss 8.5797 LearningRate 0.0401 Epoch: 7 Global Step: 303930 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:52:45,851-Speed 2627.64 samples/sec Loss 8.4928 LearningRate 0.0401 Epoch: 7 Global Step: 303940 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:52:49,727-Speed 2642.82 samples/sec Loss 8.5471 LearningRate 0.0401 Epoch: 7 Global Step: 303950 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:52:53,626-Speed 2626.14 samples/sec Loss 8.4690 LearningRate 0.0401 Epoch: 7 Global Step: 303960 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:52:57,520-Speed 2631.34 samples/sec Loss 8.5969 LearningRate 0.0401 Epoch: 7 Global Step: 303970 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:53:01,419-Speed 2626.95 samples/sec Loss 8.5844 LearningRate 0.0401 Epoch: 7 Global Step: 303980 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:53:05,317-Speed 2627.12 samples/sec Loss 8.4908 LearningRate 0.0401 Epoch: 7 Global Step: 303990 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:53:09,212-Speed 2629.49 samples/sec Loss 8.5825 LearningRate 0.0401 Epoch: 7 Global Step: 304000 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:53:13,025-Speed 2686.50 samples/sec Loss 9.2987 LearningRate 0.0401 Epoch: 7 Global Step: 304010 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 05:53:16,924-Speed 2627.26 samples/sec Loss 8.4796 LearningRate 0.0401 Epoch: 7 Global Step: 304020 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 05:53:20,825-Speed 2625.98 samples/sec Loss 8.6199 LearningRate 0.0401 Epoch: 7 Global Step: 304030 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 05:53:24,720-Speed 2629.33 samples/sec Loss 8.7720 LearningRate 0.0401 Epoch: 7 Global Step: 304040 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 05:53:28,617-Speed 2628.27 samples/sec Loss 8.7637 LearningRate 0.0401 Epoch: 7 Global Step: 304050 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 05:53:32,516-Speed 2626.91 samples/sec Loss 8.5875 LearningRate 0.0401 Epoch: 7 Global Step: 304060 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 05:53:36,412-Speed 2628.55 samples/sec Loss 8.6728 LearningRate 0.0401 Epoch: 7 Global Step: 304070 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 05:53:40,312-Speed 2625.90 samples/sec Loss 8.5514 LearningRate 0.0401 Epoch: 7 Global Step: 304080 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 05:53:44,209-Speed 2628.78 samples/sec Loss 8.6294 LearningRate 0.0401 Epoch: 7 Global Step: 304090 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 05:53:48,102-Speed 2631.14 samples/sec Loss 8.5215 LearningRate 0.0401 Epoch: 7 Global Step: 304100 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 05:53:52,000-Speed 2627.62 samples/sec Loss 8.4511 LearningRate 0.0401 Epoch: 7 Global Step: 304110 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:53:55,892-Speed 2631.74 samples/sec Loss 8.6081 LearningRate 0.0401 Epoch: 7 Global Step: 304120 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:53:59,787-Speed 2629.81 samples/sec Loss 8.6296 LearningRate 0.0401 Epoch: 7 Global Step: 304130 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:54:03,682-Speed 2629.55 samples/sec Loss 8.4919 LearningRate 0.0401 Epoch: 7 Global Step: 304140 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:54:07,582-Speed 2626.27 samples/sec Loss 8.5160 LearningRate 0.0401 Epoch: 7 Global Step: 304150 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:54:11,487-Speed 2622.44 samples/sec Loss 8.5753 LearningRate 0.0401 Epoch: 7 Global Step: 304160 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:54:15,385-Speed 2628.21 samples/sec Loss 8.5074 LearningRate 0.0401 Epoch: 7 Global Step: 304170 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:54:19,282-Speed 2627.89 samples/sec Loss 8.6776 LearningRate 0.0401 Epoch: 7 Global Step: 304180 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:54:23,284-Speed 2559.68 samples/sec Loss 8.4884 LearningRate 0.0401 Epoch: 7 Global Step: 304190 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:54:27,174-Speed 2633.07 samples/sec Loss 8.4716 LearningRate 0.0401 Epoch: 7 Global Step: 304200 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 05:54:31,063-Speed 2633.71 samples/sec Loss 8.6322 LearningRate 0.0401 Epoch: 7 Global Step: 304210 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:54:34,957-Speed 2630.88 samples/sec Loss 8.5658 LearningRate 0.0401 Epoch: 7 Global Step: 304220 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:54:38,848-Speed 2631.90 samples/sec Loss 8.4835 LearningRate 0.0401 Epoch: 7 Global Step: 304230 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:54:42,745-Speed 2628.63 samples/sec Loss 8.5354 LearningRate 0.0401 Epoch: 7 Global Step: 304240 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:54:46,646-Speed 2625.42 samples/sec Loss 8.5428 LearningRate 0.0401 Epoch: 7 Global Step: 304250 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:54:52,108-Speed 1875.21 samples/sec Loss 8.6405 LearningRate 0.0401 Epoch: 7 Global Step: 304260 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:54:55,993-Speed 2636.59 samples/sec Loss 8.5469 LearningRate 0.0401 Epoch: 7 Global Step: 304270 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:54:59,890-Speed 2627.97 samples/sec Loss 8.5972 LearningRate 0.0401 Epoch: 7 Global Step: 304280 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:55:03,789-Speed 2627.21 samples/sec Loss 8.7873 LearningRate 0.0401 Epoch: 7 Global Step: 304290 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:55:07,735-Speed 2595.18 samples/sec Loss 8.6374 LearningRate 0.0401 Epoch: 7 Global Step: 304300 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 05:55:11,634-Speed 2627.07 samples/sec Loss 8.4911 LearningRate 0.0401 Epoch: 7 Global Step: 304310 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:55:15,538-Speed 2623.83 samples/sec Loss 8.6899 LearningRate 0.0401 Epoch: 7 Global Step: 304320 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:55:19,442-Speed 2623.92 samples/sec Loss 8.4429 LearningRate 0.0401 Epoch: 7 Global Step: 304330 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:55:23,337-Speed 2629.00 samples/sec Loss 8.6087 LearningRate 0.0401 Epoch: 7 Global Step: 304340 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:55:27,226-Speed 2634.02 samples/sec Loss 8.5974 LearningRate 0.0401 Epoch: 7 Global Step: 304350 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:55:31,118-Speed 2631.89 samples/sec Loss 8.4825 LearningRate 0.0401 Epoch: 7 Global Step: 304360 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:55:35,057-Speed 2600.19 samples/sec Loss 8.6743 LearningRate 0.0401 Epoch: 7 Global Step: 304370 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:55:38,950-Speed 2631.51 samples/sec Loss 8.6165 LearningRate 0.0401 Epoch: 7 Global Step: 304380 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:55:42,855-Speed 2622.96 samples/sec Loss 8.4773 LearningRate 0.0401 Epoch: 7 Global Step: 304390 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:55:46,745-Speed 2633.00 samples/sec Loss 8.5520 LearningRate 0.0401 Epoch: 7 Global Step: 304400 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 05:55:50,633-Speed 2634.15 samples/sec Loss 8.5794 LearningRate 0.0401 Epoch: 7 Global Step: 304410 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:55:54,521-Speed 2633.99 samples/sec Loss 8.6732 LearningRate 0.0401 Epoch: 7 Global Step: 304420 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:55:58,416-Speed 2630.46 samples/sec Loss 8.5286 LearningRate 0.0401 Epoch: 7 Global Step: 304430 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:56:02,310-Speed 2630.08 samples/sec Loss 8.6001 LearningRate 0.0401 Epoch: 7 Global Step: 304440 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:56:06,203-Speed 2631.58 samples/sec Loss 8.4908 LearningRate 0.0401 Epoch: 7 Global Step: 304450 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:56:10,107-Speed 2623.26 samples/sec Loss 8.4991 LearningRate 0.0401 Epoch: 7 Global Step: 304460 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:56:14,003-Speed 2629.38 samples/sec Loss 8.5999 LearningRate 0.0401 Epoch: 7 Global Step: 304470 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:56:17,991-Speed 2567.76 samples/sec Loss 8.5342 LearningRate 0.0401 Epoch: 7 Global Step: 304480 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:56:21,870-Speed 2640.82 samples/sec Loss 8.6815 LearningRate 0.0401 Epoch: 7 Global Step: 304490 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:56:25,772-Speed 2624.66 samples/sec Loss 8.7450 LearningRate 0.0401 Epoch: 7 Global Step: 304500 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 05:56:29,661-Speed 2633.85 samples/sec Loss 8.6513 LearningRate 0.0401 Epoch: 7 Global Step: 304510 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:56:33,551-Speed 2632.36 samples/sec Loss 8.5443 LearningRate 0.0401 Epoch: 7 Global Step: 304520 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:56:37,447-Speed 2629.74 samples/sec Loss 8.6006 LearningRate 0.0401 Epoch: 7 Global Step: 304530 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:56:41,339-Speed 2631.63 samples/sec Loss 8.4508 LearningRate 0.0401 Epoch: 7 Global Step: 304540 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:56:45,265-Speed 2609.19 samples/sec Loss 8.5842 LearningRate 0.0401 Epoch: 7 Global Step: 304550 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:56:49,159-Speed 2630.36 samples/sec Loss 8.4830 LearningRate 0.0401 Epoch: 7 Global Step: 304560 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:56:53,051-Speed 2631.57 samples/sec Loss 8.5183 LearningRate 0.0401 Epoch: 7 Global Step: 304570 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:56:56,939-Speed 2634.27 samples/sec Loss 8.5437 LearningRate 0.0400 Epoch: 7 Global Step: 304580 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:57:00,826-Speed 2634.93 samples/sec Loss 8.4319 LearningRate 0.0400 Epoch: 7 Global Step: 304590 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:57:04,716-Speed 2632.78 samples/sec Loss 8.7338 LearningRate 0.0400 Epoch: 7 Global Step: 304600 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:57:08,623-Speed 2621.74 samples/sec Loss 8.5108 LearningRate 0.0400 Epoch: 7 Global Step: 304610 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:57:12,517-Speed 2630.47 samples/sec Loss 8.5684 LearningRate 0.0400 Epoch: 7 Global Step: 304620 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:57:16,426-Speed 2620.32 samples/sec Loss 8.5007 LearningRate 0.0400 Epoch: 7 Global Step: 304630 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:57:20,326-Speed 2626.31 samples/sec Loss 8.5713 LearningRate 0.0400 Epoch: 7 Global Step: 304640 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:57:24,218-Speed 2631.89 samples/sec Loss 8.6032 LearningRate 0.0400 Epoch: 7 Global Step: 304650 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:57:28,109-Speed 2632.03 samples/sec Loss 8.5719 LearningRate 0.0400 Epoch: 7 Global Step: 304660 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:57:32,007-Speed 2627.53 samples/sec Loss 8.5706 LearningRate 0.0400 Epoch: 7 Global Step: 304670 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:57:35,904-Speed 2627.95 samples/sec Loss 8.5493 LearningRate 0.0400 Epoch: 7 Global Step: 304680 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:57:39,794-Speed 2633.30 samples/sec Loss 8.6357 LearningRate 0.0400 Epoch: 7 Global Step: 304690 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:57:43,685-Speed 2632.42 samples/sec Loss 8.5500 LearningRate 0.0400 Epoch: 7 Global Step: 304700 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:57:47,579-Speed 2630.20 samples/sec Loss 8.5527 LearningRate 0.0400 Epoch: 7 Global Step: 304710 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:57:51,473-Speed 2630.92 samples/sec Loss 8.5974 LearningRate 0.0400 Epoch: 7 Global Step: 304720 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:57:55,370-Speed 2628.07 samples/sec Loss 8.6293 LearningRate 0.0400 Epoch: 7 Global Step: 304730 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:57:59,267-Speed 2628.83 samples/sec Loss 8.5306 LearningRate 0.0400 Epoch: 7 Global Step: 304740 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:58:03,166-Speed 2626.74 samples/sec Loss 8.6730 LearningRate 0.0400 Epoch: 7 Global Step: 304750 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:58:07,064-Speed 2627.66 samples/sec Loss 8.5227 LearningRate 0.0400 Epoch: 7 Global Step: 304760 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:58:10,963-Speed 2626.42 samples/sec Loss 8.5748 LearningRate 0.0400 Epoch: 7 Global Step: 304770 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:58:14,856-Speed 2631.71 samples/sec Loss 8.5343 LearningRate 0.0400 Epoch: 7 Global Step: 304780 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:58:18,745-Speed 2633.39 samples/sec Loss 8.5296 LearningRate 0.0400 Epoch: 7 Global Step: 304790 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:58:22,641-Speed 2628.92 samples/sec Loss 8.5801 LearningRate 0.0400 Epoch: 7 Global Step: 304800 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:58:26,536-Speed 2630.08 samples/sec Loss 8.6237 LearningRate 0.0400 Epoch: 7 Global Step: 304810 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:58:30,432-Speed 2628.98 samples/sec Loss 8.5912 LearningRate 0.0400 Epoch: 7 Global Step: 304820 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:58:34,334-Speed 2624.70 samples/sec Loss 8.5501 LearningRate 0.0400 Epoch: 7 Global Step: 304830 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:58:38,234-Speed 2626.15 samples/sec Loss 8.5468 LearningRate 0.0400 Epoch: 7 Global Step: 304840 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:58:42,136-Speed 2624.68 samples/sec Loss 8.5936 LearningRate 0.0400 Epoch: 7 Global Step: 304850 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:58:46,020-Speed 2637.17 samples/sec Loss 8.5516 LearningRate 0.0400 Epoch: 7 Global Step: 304860 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:58:49,937-Speed 2614.72 samples/sec Loss 8.5850 LearningRate 0.0400 Epoch: 7 Global Step: 304870 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:58:53,842-Speed 2623.49 samples/sec Loss 8.4428 LearningRate 0.0400 Epoch: 7 Global Step: 304880 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:58:57,744-Speed 2624.67 samples/sec Loss 8.6072 LearningRate 0.0400 Epoch: 7 Global Step: 304890 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:59:01,688-Speed 2597.41 samples/sec Loss 8.6228 LearningRate 0.0400 Epoch: 7 Global Step: 304900 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:59:05,685-Speed 2562.46 samples/sec Loss 8.6526 LearningRate 0.0400 Epoch: 7 Global Step: 304910 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:59:09,582-Speed 2628.41 samples/sec Loss 8.5440 LearningRate 0.0400 Epoch: 7 Global Step: 304920 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:59:13,480-Speed 2627.30 samples/sec Loss 8.3943 LearningRate 0.0400 Epoch: 7 Global Step: 304930 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:59:17,381-Speed 2625.32 samples/sec Loss 8.6671 LearningRate 0.0400 Epoch: 7 Global Step: 304940 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:59:21,281-Speed 2626.18 samples/sec Loss 8.6681 LearningRate 0.0400 Epoch: 7 Global Step: 304950 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:59:25,177-Speed 2629.07 samples/sec Loss 8.5241 LearningRate 0.0400 Epoch: 7 Global Step: 304960 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:59:29,089-Speed 2618.29 samples/sec Loss 8.4978 LearningRate 0.0400 Epoch: 7 Global Step: 304970 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:59:32,996-Speed 2621.76 samples/sec Loss 8.6836 LearningRate 0.0400 Epoch: 7 Global Step: 304980 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:59:36,889-Speed 2630.77 samples/sec Loss 8.5408 LearningRate 0.0400 Epoch: 7 Global Step: 304990 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:59:40,787-Speed 2627.63 samples/sec Loss 8.4563 LearningRate 0.0400 Epoch: 7 Global Step: 305000 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 05:59:44,676-Speed 2633.99 samples/sec Loss 8.5791 LearningRate 0.0400 Epoch: 7 Global Step: 305010 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:59:48,585-Speed 2620.18 samples/sec Loss 8.6122 LearningRate 0.0400 Epoch: 7 Global Step: 305020 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 05:59:52,471-Speed 2636.31 samples/sec Loss 8.4478 LearningRate 0.0400 Epoch: 7 Global Step: 305030 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 05:59:56,368-Speed 2628.10 samples/sec Loss 8.6102 LearningRate 0.0400 Epoch: 7 Global Step: 305040 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:00:00,269-Speed 2625.82 samples/sec Loss 8.6850 LearningRate 0.0400 Epoch: 7 Global Step: 305050 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:00:04,165-Speed 2628.59 samples/sec Loss 8.6982 LearningRate 0.0400 Epoch: 7 Global Step: 305060 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:00:08,056-Speed 2632.19 samples/sec Loss 8.6112 LearningRate 0.0400 Epoch: 7 Global Step: 305070 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:00:11,952-Speed 2628.93 samples/sec Loss 8.4594 LearningRate 0.0400 Epoch: 7 Global Step: 305080 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:00:15,876-Speed 2610.86 samples/sec Loss 8.3985 LearningRate 0.0400 Epoch: 7 Global Step: 305090 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:00:19,766-Speed 2632.93 samples/sec Loss 8.6440 LearningRate 0.0400 Epoch: 7 Global Step: 305100 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:00:23,658-Speed 2631.52 samples/sec Loss 8.6814 LearningRate 0.0400 Epoch: 7 Global Step: 305110 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:00:27,555-Speed 2627.97 samples/sec Loss 8.5238 LearningRate 0.0400 Epoch: 7 Global Step: 305120 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:00:31,455-Speed 2627.06 samples/sec Loss 8.5401 LearningRate 0.0400 Epoch: 7 Global Step: 305130 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:00:35,350-Speed 2629.24 samples/sec Loss 8.3756 LearningRate 0.0400 Epoch: 7 Global Step: 305140 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:00:39,256-Speed 2622.05 samples/sec Loss 8.5011 LearningRate 0.0400 Epoch: 7 Global Step: 305150 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:00:43,155-Speed 2627.44 samples/sec Loss 8.5539 LearningRate 0.0400 Epoch: 7 Global Step: 305160 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:00:47,053-Speed 2627.09 samples/sec Loss 8.5468 LearningRate 0.0400 Epoch: 7 Global Step: 305170 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:00:50,945-Speed 2632.11 samples/sec Loss 8.5157 LearningRate 0.0400 Epoch: 7 Global Step: 305180 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:00:54,731-Speed 2705.79 samples/sec Loss 9.4146 LearningRate 0.0400 Epoch: 7 Global Step: 305190 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:00:58,626-Speed 2629.01 samples/sec Loss 9.8982 LearningRate 0.0400 Epoch: 7 Global Step: 305200 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:01:02,515-Speed 2633.94 samples/sec Loss 8.9029 LearningRate 0.0400 Epoch: 7 Global Step: 305210 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:01:06,403-Speed 2634.94 samples/sec Loss 8.6005 LearningRate 0.0400 Epoch: 7 Global Step: 305220 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:01:10,291-Speed 2634.30 samples/sec Loss 8.6981 LearningRate 0.0400 Epoch: 7 Global Step: 305230 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:01:14,201-Speed 2619.70 samples/sec Loss 8.5706 LearningRate 0.0399 Epoch: 7 Global Step: 305240 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:01:18,197-Speed 2562.95 samples/sec Loss 8.6055 LearningRate 0.0399 Epoch: 7 Global Step: 305250 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:01:22,090-Speed 2630.96 samples/sec Loss 8.6630 LearningRate 0.0399 Epoch: 7 Global Step: 305260 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:01:25,987-Speed 2628.32 samples/sec Loss 8.6119 LearningRate 0.0399 Epoch: 7 Global Step: 305270 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:01:29,891-Speed 2623.59 samples/sec Loss 8.5976 LearningRate 0.0399 Epoch: 7 Global Step: 305280 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:01:33,829-Speed 2600.66 samples/sec Loss 8.6122 LearningRate 0.0399 Epoch: 7 Global Step: 305290 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:01:37,829-Speed 2560.17 samples/sec Loss 8.5278 LearningRate 0.0399 Epoch: 7 Global Step: 305300 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:01:41,751-Speed 2612.16 samples/sec Loss 8.5347 LearningRate 0.0399 Epoch: 7 Global Step: 305310 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:01:45,644-Speed 2631.00 samples/sec Loss 8.4925 LearningRate 0.0399 Epoch: 7 Global Step: 305320 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:01:49,534-Speed 2633.33 samples/sec Loss 8.4748 LearningRate 0.0399 Epoch: 7 Global Step: 305330 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:01:53,428-Speed 2630.17 samples/sec Loss 8.5090 LearningRate 0.0399 Epoch: 7 Global Step: 305340 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:01:57,334-Speed 2622.36 samples/sec Loss 8.4794 LearningRate 0.0399 Epoch: 7 Global Step: 305350 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:02:01,237-Speed 2624.31 samples/sec Loss 8.5918 LearningRate 0.0399 Epoch: 7 Global Step: 305360 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:02:05,133-Speed 2629.30 samples/sec Loss 8.7403 LearningRate 0.0399 Epoch: 7 Global Step: 305370 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:02:09,046-Speed 2617.30 samples/sec Loss 8.5713 LearningRate 0.0399 Epoch: 7 Global Step: 305380 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:02:12,935-Speed 2636.03 samples/sec Loss 8.4303 LearningRate 0.0399 Epoch: 7 Global Step: 305390 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:02:16,826-Speed 2632.38 samples/sec Loss 8.6535 LearningRate 0.0399 Epoch: 7 Global Step: 305400 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:02:20,727-Speed 2626.03 samples/sec Loss 8.5731 LearningRate 0.0399 Epoch: 7 Global Step: 305410 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:02:24,622-Speed 2629.54 samples/sec Loss 8.4004 LearningRate 0.0399 Epoch: 7 Global Step: 305420 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:02:28,544-Speed 2611.28 samples/sec Loss 8.5080 LearningRate 0.0399 Epoch: 7 Global Step: 305430 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:02:32,449-Speed 2623.20 samples/sec Loss 8.4514 LearningRate 0.0399 Epoch: 7 Global Step: 305440 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:02:36,356-Speed 2621.85 samples/sec Loss 8.4432 LearningRate 0.0399 Epoch: 7 Global Step: 305450 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:02:40,257-Speed 2625.36 samples/sec Loss 8.4795 LearningRate 0.0399 Epoch: 7 Global Step: 305460 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:02:44,170-Speed 2617.30 samples/sec Loss 8.6018 LearningRate 0.0399 Epoch: 7 Global Step: 305470 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:02:48,076-Speed 2622.21 samples/sec Loss 8.4387 LearningRate 0.0399 Epoch: 7 Global Step: 305480 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:02:51,986-Speed 2619.73 samples/sec Loss 8.5378 LearningRate 0.0399 Epoch: 7 Global Step: 305490 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:02:55,909-Speed 2611.27 samples/sec Loss 8.4738 LearningRate 0.0399 Epoch: 7 Global Step: 305500 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:02:59,805-Speed 2628.71 samples/sec Loss 8.4426 LearningRate 0.0399 Epoch: 7 Global Step: 305510 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:03:03,698-Speed 2630.91 samples/sec Loss 8.4181 LearningRate 0.0399 Epoch: 7 Global Step: 305520 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:03:07,592-Speed 2630.10 samples/sec Loss 8.5765 LearningRate 0.0399 Epoch: 7 Global Step: 305530 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:03:11,494-Speed 2625.74 samples/sec Loss 8.6355 LearningRate 0.0399 Epoch: 7 Global Step: 305540 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:03:15,403-Speed 2620.12 samples/sec Loss 8.5735 LearningRate 0.0399 Epoch: 7 Global Step: 305550 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:03:19,293-Speed 2632.69 samples/sec Loss 8.5515 LearningRate 0.0399 Epoch: 7 Global Step: 305560 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:03:23,190-Speed 2628.07 samples/sec Loss 8.5326 LearningRate 0.0399 Epoch: 7 Global Step: 305570 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:03:27,103-Speed 2617.58 samples/sec Loss 8.6175 LearningRate 0.0399 Epoch: 7 Global Step: 305580 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:03:31,177-Speed 2514.51 samples/sec Loss 8.4354 LearningRate 0.0399 Epoch: 7 Global Step: 305590 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:03:35,228-Speed 2528.45 samples/sec Loss 8.5142 LearningRate 0.0399 Epoch: 7 Global Step: 305600 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:03:39,133-Speed 2622.76 samples/sec Loss 8.5604 LearningRate 0.0399 Epoch: 7 Global Step: 305610 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:03:43,067-Speed 2603.32 samples/sec Loss 8.4914 LearningRate 0.0399 Epoch: 7 Global Step: 305620 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:03:47,056-Speed 2567.79 samples/sec Loss 8.5680 LearningRate 0.0399 Epoch: 7 Global Step: 305630 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:03:50,950-Speed 2630.88 samples/sec Loss 8.6136 LearningRate 0.0399 Epoch: 7 Global Step: 305640 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:03:54,870-Speed 2612.55 samples/sec Loss 8.5115 LearningRate 0.0399 Epoch: 7 Global Step: 305650 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:03:58,774-Speed 2624.20 samples/sec Loss 8.4392 LearningRate 0.0399 Epoch: 7 Global Step: 305660 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:04:02,691-Speed 2614.46 samples/sec Loss 8.5593 LearningRate 0.0399 Epoch: 7 Global Step: 305670 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:04:06,676-Speed 2570.56 samples/sec Loss 8.5004 LearningRate 0.0399 Epoch: 7 Global Step: 305680 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:04:10,567-Speed 2631.97 samples/sec Loss 8.5839 LearningRate 0.0399 Epoch: 7 Global Step: 305690 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:04:14,463-Speed 2628.74 samples/sec Loss 8.4492 LearningRate 0.0399 Epoch: 7 Global Step: 305700 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:04:18,357-Speed 2630.69 samples/sec Loss 8.4585 LearningRate 0.0399 Epoch: 7 Global Step: 305710 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:04:22,247-Speed 2633.00 samples/sec Loss 8.4294 LearningRate 0.0399 Epoch: 7 Global Step: 305720 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:04:26,141-Speed 2630.42 samples/sec Loss 8.5161 LearningRate 0.0399 Epoch: 7 Global Step: 305730 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:04:30,032-Speed 2632.25 samples/sec Loss 8.3683 LearningRate 0.0399 Epoch: 7 Global Step: 305740 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:04:33,921-Speed 2633.75 samples/sec Loss 8.5120 LearningRate 0.0399 Epoch: 7 Global Step: 305750 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:04:37,831-Speed 2619.48 samples/sec Loss 8.5376 LearningRate 0.0399 Epoch: 7 Global Step: 305760 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:04:41,726-Speed 2629.47 samples/sec Loss 8.4286 LearningRate 0.0399 Epoch: 7 Global Step: 305770 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:04:45,616-Speed 2633.19 samples/sec Loss 8.4957 LearningRate 0.0399 Epoch: 7 Global Step: 305780 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:04:49,507-Speed 2632.56 samples/sec Loss 8.5162 LearningRate 0.0399 Epoch: 7 Global Step: 305790 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:04:53,399-Speed 2631.96 samples/sec Loss 8.4525 LearningRate 0.0399 Epoch: 7 Global Step: 305800 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:04:57,294-Speed 2629.27 samples/sec Loss 8.6355 LearningRate 0.0399 Epoch: 7 Global Step: 305810 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:05:01,190-Speed 2629.50 samples/sec Loss 8.4964 LearningRate 0.0399 Epoch: 7 Global Step: 305820 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:05:05,068-Speed 2640.61 samples/sec Loss 8.5204 LearningRate 0.0399 Epoch: 7 Global Step: 305830 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:05:08,973-Speed 2623.08 samples/sec Loss 8.4920 LearningRate 0.0399 Epoch: 7 Global Step: 305840 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:05:12,850-Speed 2641.32 samples/sec Loss 8.5726 LearningRate 0.0399 Epoch: 7 Global Step: 305850 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:05:16,770-Speed 2613.78 samples/sec Loss 8.4063 LearningRate 0.0399 Epoch: 7 Global Step: 305860 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:05:20,669-Speed 2627.27 samples/sec Loss 8.4011 LearningRate 0.0399 Epoch: 7 Global Step: 305870 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:05:24,564-Speed 2629.75 samples/sec Loss 8.5305 LearningRate 0.0399 Epoch: 7 Global Step: 305880 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:05:28,459-Speed 2629.69 samples/sec Loss 8.5265 LearningRate 0.0399 Epoch: 7 Global Step: 305890 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:05:32,351-Speed 2632.55 samples/sec Loss 8.6343 LearningRate 0.0398 Epoch: 7 Global Step: 305900 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:05:36,242-Speed 2632.47 samples/sec Loss 8.4502 LearningRate 0.0398 Epoch: 7 Global Step: 305910 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:05:40,133-Speed 2631.96 samples/sec Loss 8.5370 LearningRate 0.0398 Epoch: 7 Global Step: 305920 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:05:44,028-Speed 2629.72 samples/sec Loss 8.4392 LearningRate 0.0398 Epoch: 7 Global Step: 305930 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:05:47,922-Speed 2629.84 samples/sec Loss 8.4694 LearningRate 0.0398 Epoch: 7 Global Step: 305940 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:05:51,827-Speed 2623.63 samples/sec Loss 8.6031 LearningRate 0.0398 Epoch: 7 Global Step: 305950 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:05:55,726-Speed 2627.00 samples/sec Loss 8.4668 LearningRate 0.0398 Epoch: 7 Global Step: 305960 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:05:59,691-Speed 2583.35 samples/sec Loss 8.4533 LearningRate 0.0398 Epoch: 7 Global Step: 305970 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:06:03,653-Speed 2584.86 samples/sec Loss 8.5926 LearningRate 0.0398 Epoch: 7 Global Step: 305980 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:06:07,546-Speed 2631.16 samples/sec Loss 8.6704 LearningRate 0.0398 Epoch: 7 Global Step: 305990 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:06:11,438-Speed 2631.40 samples/sec Loss 8.5273 LearningRate 0.0398 Epoch: 7 Global Step: 306000 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:06:15,349-Speed 2619.22 samples/sec Loss 8.5875 LearningRate 0.0398 Epoch: 7 Global Step: 306010 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:06:19,244-Speed 2629.44 samples/sec Loss 8.6387 LearningRate 0.0398 Epoch: 7 Global Step: 306020 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:06:23,152-Speed 2621.13 samples/sec Loss 8.4123 LearningRate 0.0398 Epoch: 7 Global Step: 306030 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:06:26,971-Speed 2682.16 samples/sec Loss 9.0834 LearningRate 0.0398 Epoch: 7 Global Step: 306040 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:06:30,856-Speed 2635.99 samples/sec Loss 8.6332 LearningRate 0.0398 Epoch: 7 Global Step: 306050 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:06:34,748-Speed 2631.67 samples/sec Loss 8.6777 LearningRate 0.0398 Epoch: 7 Global Step: 306060 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:06:38,639-Speed 2632.15 samples/sec Loss 8.6228 LearningRate 0.0398 Epoch: 7 Global Step: 306070 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:06:42,530-Speed 2632.40 samples/sec Loss 8.5618 LearningRate 0.0398 Epoch: 7 Global Step: 306080 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:06:46,417-Speed 2635.97 samples/sec Loss 8.5394 LearningRate 0.0398 Epoch: 7 Global Step: 306090 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:06:50,303-Speed 2635.55 samples/sec Loss 8.5831 LearningRate 0.0398 Epoch: 7 Global Step: 306100 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:06:54,190-Speed 2634.92 samples/sec Loss 8.4807 LearningRate 0.0398 Epoch: 7 Global Step: 306110 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:06:58,079-Speed 2633.73 samples/sec Loss 8.4766 LearningRate 0.0398 Epoch: 7 Global Step: 306120 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:07:01,981-Speed 2624.84 samples/sec Loss 8.3966 LearningRate 0.0398 Epoch: 7 Global Step: 306130 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:07:05,881-Speed 2625.74 samples/sec Loss 8.5247 LearningRate 0.0398 Epoch: 7 Global Step: 306140 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:07:09,771-Speed 2634.02 samples/sec Loss 8.5659 LearningRate 0.0398 Epoch: 7 Global Step: 306150 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:07:13,660-Speed 2633.11 samples/sec Loss 8.5497 LearningRate 0.0398 Epoch: 7 Global Step: 306160 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:07:17,549-Speed 2634.11 samples/sec Loss 8.5692 LearningRate 0.0398 Epoch: 7 Global Step: 306170 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:07:21,441-Speed 2631.59 samples/sec Loss 8.5985 LearningRate 0.0398 Epoch: 7 Global Step: 306180 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:07:25,334-Speed 2631.65 samples/sec Loss 8.5638 LearningRate 0.0398 Epoch: 7 Global Step: 306190 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:07:29,238-Speed 2623.25 samples/sec Loss 8.5046 LearningRate 0.0398 Epoch: 7 Global Step: 306200 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:07:33,128-Speed 2632.98 samples/sec Loss 8.6065 LearningRate 0.0398 Epoch: 7 Global Step: 306210 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:07:37,021-Speed 2630.53 samples/sec Loss 8.5638 LearningRate 0.0398 Epoch: 7 Global Step: 306220 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:07:40,909-Speed 2634.69 samples/sec Loss 8.5878 LearningRate 0.0398 Epoch: 7 Global Step: 306230 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:07:44,837-Speed 2607.83 samples/sec Loss 8.5466 LearningRate 0.0398 Epoch: 7 Global Step: 306240 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:07:48,730-Speed 2631.35 samples/sec Loss 8.5451 LearningRate 0.0398 Epoch: 7 Global Step: 306250 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:07:52,633-Speed 2623.77 samples/sec Loss 8.6375 LearningRate 0.0398 Epoch: 7 Global Step: 306260 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:07:56,533-Speed 2626.93 samples/sec Loss 8.5221 LearningRate 0.0398 Epoch: 7 Global Step: 306270 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:08:00,435-Speed 2625.15 samples/sec Loss 8.5941 LearningRate 0.0398 Epoch: 7 Global Step: 306280 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:08:04,332-Speed 2628.09 samples/sec Loss 8.3997 LearningRate 0.0398 Epoch: 7 Global Step: 306290 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:08:08,221-Speed 2633.40 samples/sec Loss 8.5476 LearningRate 0.0398 Epoch: 7 Global Step: 306300 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:08:12,109-Speed 2634.18 samples/sec Loss 8.5686 LearningRate 0.0398 Epoch: 7 Global Step: 306310 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:08:15,998-Speed 2633.81 samples/sec Loss 8.3976 LearningRate 0.0398 Epoch: 7 Global Step: 306320 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:08:19,886-Speed 2635.17 samples/sec Loss 8.5447 LearningRate 0.0398 Epoch: 7 Global Step: 306330 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:08:23,779-Speed 2630.76 samples/sec Loss 8.5990 LearningRate 0.0398 Epoch: 7 Global Step: 306340 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:08:27,676-Speed 2628.61 samples/sec Loss 8.4592 LearningRate 0.0398 Epoch: 7 Global Step: 306350 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:08:31,566-Speed 2632.28 samples/sec Loss 8.3781 LearningRate 0.0398 Epoch: 7 Global Step: 306360 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:08:35,476-Speed 2619.32 samples/sec Loss 8.6451 LearningRate 0.0398 Epoch: 7 Global Step: 306370 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:08:39,362-Speed 2636.16 samples/sec Loss 8.5282 LearningRate 0.0398 Epoch: 7 Global Step: 306380 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:08:43,261-Speed 2626.57 samples/sec Loss 8.6735 LearningRate 0.0398 Epoch: 7 Global Step: 306390 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:08:47,159-Speed 2627.97 samples/sec Loss 8.4448 LearningRate 0.0398 Epoch: 7 Global Step: 306400 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:08:51,051-Speed 2631.75 samples/sec Loss 8.4476 LearningRate 0.0398 Epoch: 7 Global Step: 306410 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:08:54,944-Speed 2630.97 samples/sec Loss 8.4803 LearningRate 0.0398 Epoch: 7 Global Step: 306420 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:08:58,836-Speed 2631.73 samples/sec Loss 8.5601 LearningRate 0.0398 Epoch: 7 Global Step: 306430 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:09:02,728-Speed 2630.92 samples/sec Loss 8.5499 LearningRate 0.0398 Epoch: 7 Global Step: 306440 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:09:06,620-Speed 2631.94 samples/sec Loss 8.5537 LearningRate 0.0398 Epoch: 7 Global Step: 306450 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:09:10,513-Speed 2630.86 samples/sec Loss 8.4526 LearningRate 0.0398 Epoch: 7 Global Step: 306460 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:09:14,420-Speed 2621.76 samples/sec Loss 8.6343 LearningRate 0.0398 Epoch: 7 Global Step: 306470 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:09:18,313-Speed 2630.82 samples/sec Loss 8.4936 LearningRate 0.0398 Epoch: 7 Global Step: 306480 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:09:22,204-Speed 2632.32 samples/sec Loss 8.5257 LearningRate 0.0398 Epoch: 7 Global Step: 306490 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:09:26,053-Speed 2661.02 samples/sec Loss 8.5776 LearningRate 0.0398 Epoch: 7 Global Step: 306500 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:09:29,951-Speed 2628.22 samples/sec Loss 8.4915 LearningRate 0.0398 Epoch: 7 Global Step: 306510 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:09:33,843-Speed 2631.23 samples/sec Loss 8.5474 LearningRate 0.0398 Epoch: 7 Global Step: 306520 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:09:37,748-Speed 2622.89 samples/sec Loss 8.4136 LearningRate 0.0398 Epoch: 7 Global Step: 306530 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:09:41,691-Speed 2597.73 samples/sec Loss 8.4598 LearningRate 0.0398 Epoch: 7 Global Step: 306540 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:09:45,654-Speed 2584.75 samples/sec Loss 8.5344 LearningRate 0.0397 Epoch: 7 Global Step: 306550 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:09:49,558-Speed 2622.83 samples/sec Loss 8.5641 LearningRate 0.0397 Epoch: 7 Global Step: 306560 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:09:53,465-Speed 2621.71 samples/sec Loss 8.5657 LearningRate 0.0397 Epoch: 7 Global Step: 306570 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:09:57,343-Speed 2640.97 samples/sec Loss 9.7156 LearningRate 0.0397 Epoch: 7 Global Step: 306580 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:10:01,237-Speed 2630.47 samples/sec Loss 9.2574 LearningRate 0.0397 Epoch: 7 Global Step: 306590 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:10:05,135-Speed 2627.66 samples/sec Loss 8.8070 LearningRate 0.0397 Epoch: 7 Global Step: 306600 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:10:09,232-Speed 2500.25 samples/sec Loss 8.7514 LearningRate 0.0397 Epoch: 7 Global Step: 306610 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:10:13,123-Speed 2632.11 samples/sec Loss 9.1219 LearningRate 0.0397 Epoch: 7 Global Step: 306620 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:10:17,032-Speed 2620.43 samples/sec Loss 8.7113 LearningRate 0.0397 Epoch: 7 Global Step: 306630 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:10:20,939-Speed 2620.97 samples/sec Loss 8.5985 LearningRate 0.0397 Epoch: 7 Global Step: 306640 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:10:24,850-Speed 2619.46 samples/sec Loss 8.5483 LearningRate 0.0397 Epoch: 7 Global Step: 306650 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:10:28,748-Speed 2627.41 samples/sec Loss 8.5958 LearningRate 0.0397 Epoch: 7 Global Step: 306660 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:10:32,643-Speed 2629.44 samples/sec Loss 8.6119 LearningRate 0.0397 Epoch: 7 Global Step: 306670 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:10:36,541-Speed 2627.63 samples/sec Loss 8.7665 LearningRate 0.0397 Epoch: 7 Global Step: 306680 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:10:40,439-Speed 2627.47 samples/sec Loss 8.5087 LearningRate 0.0397 Epoch: 7 Global Step: 306690 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:10:44,334-Speed 2629.89 samples/sec Loss 8.5878 LearningRate 0.0397 Epoch: 7 Global Step: 306700 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:10:48,226-Speed 2631.66 samples/sec Loss 8.4358 LearningRate 0.0397 Epoch: 7 Global Step: 306710 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:10:52,115-Speed 2633.87 samples/sec Loss 8.4224 LearningRate 0.0397 Epoch: 7 Global Step: 306720 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:10:56,014-Speed 2626.44 samples/sec Loss 8.6340 LearningRate 0.0397 Epoch: 7 Global Step: 306730 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:10:59,913-Speed 2626.87 samples/sec Loss 8.5921 LearningRate 0.0397 Epoch: 7 Global Step: 306740 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:11:03,817-Speed 2623.34 samples/sec Loss 8.5835 LearningRate 0.0397 Epoch: 7 Global Step: 306750 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:11:07,721-Speed 2624.03 samples/sec Loss 8.5784 LearningRate 0.0397 Epoch: 7 Global Step: 306760 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:11:11,609-Speed 2633.59 samples/sec Loss 8.5789 LearningRate 0.0397 Epoch: 7 Global Step: 306770 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:11:15,502-Speed 2631.17 samples/sec Loss 8.5298 LearningRate 0.0397 Epoch: 7 Global Step: 306780 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:11:19,392-Speed 2633.50 samples/sec Loss 8.4788 LearningRate 0.0397 Epoch: 7 Global Step: 306790 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:11:23,282-Speed 2632.95 samples/sec Loss 8.6456 LearningRate 0.0397 Epoch: 7 Global Step: 306800 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:11:27,172-Speed 2632.78 samples/sec Loss 8.5784 LearningRate 0.0397 Epoch: 7 Global Step: 306810 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:11:31,069-Speed 2628.76 samples/sec Loss 8.3486 LearningRate 0.0397 Epoch: 7 Global Step: 306820 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:11:34,959-Speed 2632.58 samples/sec Loss 8.6489 LearningRate 0.0397 Epoch: 7 Global Step: 306830 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:11:38,850-Speed 2632.24 samples/sec Loss 8.6731 LearningRate 0.0397 Epoch: 7 Global Step: 306840 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:11:42,741-Speed 2632.35 samples/sec Loss 8.6948 LearningRate 0.0397 Epoch: 7 Global Step: 306850 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:11:46,634-Speed 2630.94 samples/sec Loss 8.4830 LearningRate 0.0397 Epoch: 7 Global Step: 306860 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:11:50,536-Speed 2624.64 samples/sec Loss 9.1530 LearningRate 0.0397 Epoch: 7 Global Step: 306870 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:11:54,408-Speed 2645.22 samples/sec Loss 9.8304 LearningRate 0.0397 Epoch: 7 Global Step: 306880 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:11:58,312-Speed 2624.09 samples/sec Loss 8.7436 LearningRate 0.0397 Epoch: 7 Global Step: 306890 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:12:02,207-Speed 2629.25 samples/sec Loss 8.5982 LearningRate 0.0397 Epoch: 7 Global Step: 306900 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:12:06,107-Speed 2626.29 samples/sec Loss 8.4542 LearningRate 0.0397 Epoch: 7 Global Step: 306910 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:12:10,000-Speed 2631.11 samples/sec Loss 8.5401 LearningRate 0.0397 Epoch: 7 Global Step: 306920 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:12:13,891-Speed 2632.15 samples/sec Loss 8.6119 LearningRate 0.0397 Epoch: 7 Global Step: 306930 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:12:17,779-Speed 2634.14 samples/sec Loss 8.6828 LearningRate 0.0397 Epoch: 7 Global Step: 306940 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:12:21,667-Speed 2634.74 samples/sec Loss 8.6490 LearningRate 0.0397 Epoch: 7 Global Step: 306950 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:12:25,562-Speed 2629.42 samples/sec Loss 8.7066 LearningRate 0.0397 Epoch: 7 Global Step: 306960 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:12:29,457-Speed 2629.67 samples/sec Loss 8.7376 LearningRate 0.0397 Epoch: 7 Global Step: 306970 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:12:33,349-Speed 2631.77 samples/sec Loss 8.6126 LearningRate 0.0397 Epoch: 7 Global Step: 306980 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:12:37,242-Speed 2631.33 samples/sec Loss 8.6520 LearningRate 0.0397 Epoch: 7 Global Step: 306990 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:12:41,135-Speed 2630.89 samples/sec Loss 8.5282 LearningRate 0.0397 Epoch: 7 Global Step: 307000 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:12:45,028-Speed 2630.84 samples/sec Loss 8.5342 LearningRate 0.0397 Epoch: 7 Global Step: 307010 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:12:48,919-Speed 2632.17 samples/sec Loss 8.4754 LearningRate 0.0397 Epoch: 7 Global Step: 307020 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:12:52,808-Speed 2633.84 samples/sec Loss 8.4402 LearningRate 0.0397 Epoch: 7 Global Step: 307030 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:12:56,698-Speed 2632.91 samples/sec Loss 8.5790 LearningRate 0.0397 Epoch: 7 Global Step: 307040 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:13:00,594-Speed 2628.44 samples/sec Loss 8.6240 LearningRate 0.0397 Epoch: 7 Global Step: 307050 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:13:04,484-Speed 2633.20 samples/sec Loss 8.4448 LearningRate 0.0397 Epoch: 7 Global Step: 307060 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:13:08,375-Speed 2632.35 samples/sec Loss 8.6467 LearningRate 0.0397 Epoch: 7 Global Step: 307070 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:13:12,274-Speed 2626.81 samples/sec Loss 8.4554 LearningRate 0.0397 Epoch: 7 Global Step: 307080 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:13:16,166-Speed 2631.98 samples/sec Loss 8.5313 LearningRate 0.0397 Epoch: 7 Global Step: 307090 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:13:20,058-Speed 2631.78 samples/sec Loss 8.5625 LearningRate 0.0397 Epoch: 7 Global Step: 307100 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:13:23,951-Speed 2630.75 samples/sec Loss 8.5934 LearningRate 0.0397 Epoch: 7 Global Step: 307110 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:13:27,855-Speed 2623.80 samples/sec Loss 8.5541 LearningRate 0.0397 Epoch: 7 Global Step: 307120 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:13:31,758-Speed 2623.59 samples/sec Loss 8.4677 LearningRate 0.0397 Epoch: 7 Global Step: 307130 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:13:35,650-Speed 2631.56 samples/sec Loss 8.4157 LearningRate 0.0397 Epoch: 7 Global Step: 307140 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:13:39,550-Speed 2626.22 samples/sec Loss 8.5586 LearningRate 0.0397 Epoch: 7 Global Step: 307150 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:13:43,445-Speed 2629.49 samples/sec Loss 8.3655 LearningRate 0.0397 Epoch: 7 Global Step: 307160 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:13:47,346-Speed 2626.21 samples/sec Loss 8.4945 LearningRate 0.0397 Epoch: 7 Global Step: 307170 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:13:51,235-Speed 2633.62 samples/sec Loss 8.5064 LearningRate 0.0397 Epoch: 7 Global Step: 307180 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:13:55,125-Speed 2633.43 samples/sec Loss 8.4117 LearningRate 0.0397 Epoch: 7 Global Step: 307190 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:13:59,024-Speed 2626.57 samples/sec Loss 8.6368 LearningRate 0.0397 Epoch: 7 Global Step: 307200 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:14:02,923-Speed 2627.01 samples/sec Loss 8.5043 LearningRate 0.0396 Epoch: 7 Global Step: 307210 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:14:06,816-Speed 2630.70 samples/sec Loss 8.3859 LearningRate 0.0396 Epoch: 7 Global Step: 307220 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:14:10,707-Speed 2632.07 samples/sec Loss 8.4620 LearningRate 0.0396 Epoch: 7 Global Step: 307230 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:14:14,603-Speed 2629.12 samples/sec Loss 8.3385 LearningRate 0.0396 Epoch: 7 Global Step: 307240 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:14:18,500-Speed 2628.05 samples/sec Loss 8.5942 LearningRate 0.0396 Epoch: 7 Global Step: 307250 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:14:22,402-Speed 2624.84 samples/sec Loss 8.5698 LearningRate 0.0396 Epoch: 7 Global Step: 307260 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:14:26,301-Speed 2627.09 samples/sec Loss 8.4955 LearningRate 0.0396 Epoch: 7 Global Step: 307270 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:14:30,193-Speed 2631.90 samples/sec Loss 8.4619 LearningRate 0.0396 Epoch: 7 Global Step: 307280 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:14:34,111-Speed 2614.23 samples/sec Loss 8.3843 LearningRate 0.0396 Epoch: 7 Global Step: 307290 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:14:38,016-Speed 2622.98 samples/sec Loss 8.5922 LearningRate 0.0396 Epoch: 7 Global Step: 307300 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:14:41,908-Speed 2631.61 samples/sec Loss 8.4839 LearningRate 0.0396 Epoch: 7 Global Step: 307310 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:14:45,801-Speed 2630.35 samples/sec Loss 8.4409 LearningRate 0.0396 Epoch: 7 Global Step: 307320 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:14:49,693-Speed 2631.84 samples/sec Loss 8.6235 LearningRate 0.0396 Epoch: 7 Global Step: 307330 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:14:53,612-Speed 2613.32 samples/sec Loss 8.5956 LearningRate 0.0396 Epoch: 7 Global Step: 307340 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:14:57,515-Speed 2624.49 samples/sec Loss 8.4972 LearningRate 0.0396 Epoch: 7 Global Step: 307350 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:15:01,406-Speed 2632.19 samples/sec Loss 8.6210 LearningRate 0.0396 Epoch: 7 Global Step: 307360 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:15:05,297-Speed 2632.08 samples/sec Loss 8.3880 LearningRate 0.0396 Epoch: 7 Global Step: 307370 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:15:09,191-Speed 2630.91 samples/sec Loss 8.4912 LearningRate 0.0396 Epoch: 7 Global Step: 307380 Fp16 Grad Scale: 262144 Required: 59 hours
Training: 2022-04-14 06:15:13,065-Speed 2644.17 samples/sec Loss 8.6472 LearningRate 0.0396 Epoch: 7 Global Step: 307390 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:15:16,939-Speed 2643.99 samples/sec Loss 8.5155 LearningRate 0.0396 Epoch: 7 Global Step: 307400 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:15:20,837-Speed 2627.48 samples/sec Loss 8.4387 LearningRate 0.0396 Epoch: 7 Global Step: 307410 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:15:24,752-Speed 2616.19 samples/sec Loss 8.6198 LearningRate 0.0396 Epoch: 7 Global Step: 307420 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:15:28,661-Speed 2619.89 samples/sec Loss 8.5854 LearningRate 0.0396 Epoch: 7 Global Step: 307430 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:15:32,590-Speed 2606.54 samples/sec Loss 8.6455 LearningRate 0.0396 Epoch: 7 Global Step: 307440 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:15:36,494-Speed 2623.56 samples/sec Loss 8.4526 LearningRate 0.0396 Epoch: 7 Global Step: 307450 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:15:40,404-Speed 2620.39 samples/sec Loss 8.5534 LearningRate 0.0396 Epoch: 7 Global Step: 307460 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:15:44,313-Speed 2620.27 samples/sec Loss 8.5634 LearningRate 0.0396 Epoch: 7 Global Step: 307470 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:15:48,289-Speed 2577.06 samples/sec Loss 8.5424 LearningRate 0.0396 Epoch: 7 Global Step: 307480 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:15:52,220-Speed 2605.33 samples/sec Loss 8.6292 LearningRate 0.0396 Epoch: 7 Global Step: 307490 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:15:56,156-Speed 2602.65 samples/sec Loss 8.4712 LearningRate 0.0396 Epoch: 7 Global Step: 307500 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:16:00,083-Speed 2607.99 samples/sec Loss 8.3610 LearningRate 0.0396 Epoch: 7 Global Step: 307510 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:16:03,969-Speed 2635.67 samples/sec Loss 8.4428 LearningRate 0.0396 Epoch: 7 Global Step: 307520 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:16:07,868-Speed 2626.68 samples/sec Loss 8.5538 LearningRate 0.0396 Epoch: 7 Global Step: 307530 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:16:11,772-Speed 2624.13 samples/sec Loss 8.4669 LearningRate 0.0396 Epoch: 7 Global Step: 307540 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:16:15,670-Speed 2627.63 samples/sec Loss 8.4651 LearningRate 0.0396 Epoch: 7 Global Step: 307550 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:16:19,587-Speed 2614.33 samples/sec Loss 8.4724 LearningRate 0.0396 Epoch: 7 Global Step: 307560 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:16:23,501-Speed 2617.39 samples/sec Loss 8.4370 LearningRate 0.0396 Epoch: 7 Global Step: 307570 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:16:27,417-Speed 2615.37 samples/sec Loss 8.4345 LearningRate 0.0396 Epoch: 7 Global Step: 307580 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:16:31,338-Speed 2612.41 samples/sec Loss 8.3647 LearningRate 0.0396 Epoch: 7 Global Step: 307590 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:16:35,253-Speed 2616.32 samples/sec Loss 8.4921 LearningRate 0.0396 Epoch: 7 Global Step: 307600 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:16:39,161-Speed 2620.27 samples/sec Loss 8.4610 LearningRate 0.0396 Epoch: 7 Global Step: 307610 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:16:43,060-Speed 2626.92 samples/sec Loss 8.4593 LearningRate 0.0396 Epoch: 7 Global Step: 307620 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:16:46,949-Speed 2633.88 samples/sec Loss 8.5064 LearningRate 0.0396 Epoch: 7 Global Step: 307630 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:16:50,844-Speed 2629.96 samples/sec Loss 8.5415 LearningRate 0.0396 Epoch: 7 Global Step: 307640 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:16:54,748-Speed 2623.60 samples/sec Loss 8.5716 LearningRate 0.0396 Epoch: 7 Global Step: 307650 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:16:58,639-Speed 2632.71 samples/sec Loss 8.4230 LearningRate 0.0396 Epoch: 7 Global Step: 307660 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:17:02,536-Speed 2627.99 samples/sec Loss 8.4567 LearningRate 0.0396 Epoch: 7 Global Step: 307670 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:17:06,428-Speed 2631.81 samples/sec Loss 8.6068 LearningRate 0.0396 Epoch: 7 Global Step: 307680 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:17:10,328-Speed 2625.68 samples/sec Loss 8.4986 LearningRate 0.0396 Epoch: 7 Global Step: 307690 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:17:14,225-Speed 2628.47 samples/sec Loss 8.4527 LearningRate 0.0396 Epoch: 7 Global Step: 307700 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:17:18,120-Speed 2629.99 samples/sec Loss 8.4345 LearningRate 0.0396 Epoch: 7 Global Step: 307710 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:17:21,971-Speed 2660.89 samples/sec Loss 8.6708 LearningRate 0.0396 Epoch: 7 Global Step: 307720 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:17:25,809-Speed 2668.62 samples/sec Loss 8.7105 LearningRate 0.0396 Epoch: 7 Global Step: 307730 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:17:29,702-Speed 2631.65 samples/sec Loss 8.6422 LearningRate 0.0396 Epoch: 7 Global Step: 307740 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:17:33,614-Speed 2618.16 samples/sec Loss 8.8436 LearningRate 0.0396 Epoch: 7 Global Step: 307750 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:17:37,508-Speed 2630.36 samples/sec Loss 8.6113 LearningRate 0.0396 Epoch: 7 Global Step: 307760 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:17:41,399-Speed 2631.73 samples/sec Loss 8.5940 LearningRate 0.0396 Epoch: 7 Global Step: 307770 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:17:45,289-Speed 2633.65 samples/sec Loss 8.4961 LearningRate 0.0396 Epoch: 7 Global Step: 307780 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:17:49,181-Speed 2631.41 samples/sec Loss 8.3510 LearningRate 0.0396 Epoch: 7 Global Step: 307790 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:17:53,076-Speed 2630.43 samples/sec Loss 8.5675 LearningRate 0.0396 Epoch: 7 Global Step: 307800 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:17:56,981-Speed 2622.58 samples/sec Loss 8.5359 LearningRate 0.0396 Epoch: 7 Global Step: 307810 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:18:00,872-Speed 2632.90 samples/sec Loss 8.5792 LearningRate 0.0396 Epoch: 7 Global Step: 307820 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:18:04,762-Speed 2632.35 samples/sec Loss 8.5047 LearningRate 0.0396 Epoch: 7 Global Step: 307830 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:18:08,661-Speed 2626.96 samples/sec Loss 8.4304 LearningRate 0.0396 Epoch: 7 Global Step: 307840 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:18:12,562-Speed 2625.35 samples/sec Loss 8.4281 LearningRate 0.0396 Epoch: 7 Global Step: 307850 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:18:16,445-Speed 2638.08 samples/sec Loss 8.4438 LearningRate 0.0396 Epoch: 7 Global Step: 307860 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:18:20,337-Speed 2631.88 samples/sec Loss 8.5404 LearningRate 0.0395 Epoch: 7 Global Step: 307870 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:18:24,235-Speed 2627.82 samples/sec Loss 8.4261 LearningRate 0.0395 Epoch: 7 Global Step: 307880 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:18:28,123-Speed 2634.57 samples/sec Loss 8.4069 LearningRate 0.0395 Epoch: 7 Global Step: 307890 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:18:32,013-Speed 2632.96 samples/sec Loss 8.6262 LearningRate 0.0395 Epoch: 7 Global Step: 307900 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:18:35,904-Speed 2631.65 samples/sec Loss 8.4875 LearningRate 0.0395 Epoch: 7 Global Step: 307910 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:18:39,792-Speed 2634.17 samples/sec Loss 8.4669 LearningRate 0.0395 Epoch: 7 Global Step: 307920 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:18:43,686-Speed 2631.22 samples/sec Loss 8.5024 LearningRate 0.0395 Epoch: 7 Global Step: 307930 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:18:47,581-Speed 2629.79 samples/sec Loss 8.5814 LearningRate 0.0395 Epoch: 7 Global Step: 307940 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:18:51,471-Speed 2632.66 samples/sec Loss 8.5327 LearningRate 0.0395 Epoch: 7 Global Step: 307950 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:18:55,361-Speed 2632.96 samples/sec Loss 8.4293 LearningRate 0.0395 Epoch: 7 Global Step: 307960 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:18:59,254-Speed 2631.04 samples/sec Loss 8.5021 LearningRate 0.0395 Epoch: 7 Global Step: 307970 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:19:03,146-Speed 2632.15 samples/sec Loss 8.4035 LearningRate 0.0395 Epoch: 7 Global Step: 307980 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:19:07,049-Speed 2624.02 samples/sec Loss 8.4024 LearningRate 0.0395 Epoch: 7 Global Step: 307990 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:19:10,954-Speed 2622.83 samples/sec Loss 8.4141 LearningRate 0.0395 Epoch: 7 Global Step: 308000 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:19:14,857-Speed 2624.03 samples/sec Loss 8.5504 LearningRate 0.0395 Epoch: 7 Global Step: 308010 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:19:18,861-Speed 2558.26 samples/sec Loss 8.4524 LearningRate 0.0395 Epoch: 7 Global Step: 308020 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:19:22,766-Speed 2623.00 samples/sec Loss 8.5845 LearningRate 0.0395 Epoch: 7 Global Step: 308030 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:19:26,670-Speed 2623.86 samples/sec Loss 8.5667 LearningRate 0.0395 Epoch: 7 Global Step: 308040 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:19:30,571-Speed 2625.32 samples/sec Loss 8.4266 LearningRate 0.0395 Epoch: 7 Global Step: 308050 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:19:34,477-Speed 2621.96 samples/sec Loss 8.5423 LearningRate 0.0395 Epoch: 7 Global Step: 308060 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:19:38,379-Speed 2625.28 samples/sec Loss 8.4437 LearningRate 0.0395 Epoch: 7 Global Step: 308070 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:19:42,291-Speed 2618.07 samples/sec Loss 8.5925 LearningRate 0.0395 Epoch: 7 Global Step: 308080 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:19:46,189-Speed 2627.43 samples/sec Loss 8.4922 LearningRate 0.0395 Epoch: 7 Global Step: 308090 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:19:50,093-Speed 2624.16 samples/sec Loss 8.7308 LearningRate 0.0395 Epoch: 7 Global Step: 308100 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:19:53,997-Speed 2623.65 samples/sec Loss 8.5134 LearningRate 0.0395 Epoch: 7 Global Step: 308110 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:19:57,900-Speed 2624.49 samples/sec Loss 8.4781 LearningRate 0.0395 Epoch: 7 Global Step: 308120 Fp16 Grad Scale: 65536 Required: 59 hours
Training: 2022-04-14 06:20:01,799-Speed 2627.00 samples/sec Loss 8.5302 LearningRate 0.0395 Epoch: 7 Global Step: 308130 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:20:05,692-Speed 2630.62 samples/sec Loss 8.5971 LearningRate 0.0395 Epoch: 7 Global Step: 308140 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:20:09,607-Speed 2616.14 samples/sec Loss 8.5727 LearningRate 0.0395 Epoch: 7 Global Step: 308150 Fp16 Grad Scale: 131072 Required: 59 hours
Training: 2022-04-14 06:20:13,464-Speed 2655.77 samples/sec Loss 8.6178 LearningRate 0.0395 Epoch: 7 Global Step: 308160 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:20:17,363-Speed 2627.34 samples/sec Loss 8.5798 LearningRate 0.0395 Epoch: 7 Global Step: 308170 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:20:21,258-Speed 2629.26 samples/sec Loss 8.4101 LearningRate 0.0395 Epoch: 7 Global Step: 308180 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:20:25,152-Speed 2630.97 samples/sec Loss 8.4745 LearningRate 0.0395 Epoch: 7 Global Step: 308190 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:20:29,052-Speed 2626.12 samples/sec Loss 8.5937 LearningRate 0.0395 Epoch: 7 Global Step: 308200 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:20:32,947-Speed 2629.88 samples/sec Loss 8.4862 LearningRate 0.0395 Epoch: 7 Global Step: 308210 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:20:36,837-Speed 2632.35 samples/sec Loss 8.5687 LearningRate 0.0395 Epoch: 7 Global Step: 308220 Fp16 Grad Scale: 32768 Required: 59 hours
Training: 2022-04-14 06:20:40,689-Speed 2659.73 samples/sec Loss 8.7167 LearningRate 0.0395 Epoch: 7 Global Step: 308230 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:20:44,543-Speed 2657.99 samples/sec Loss 8.8230 LearningRate 0.0395 Epoch: 7 Global Step: 308240 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:20:48,435-Speed 2631.65 samples/sec Loss 8.6552 LearningRate 0.0395 Epoch: 7 Global Step: 308250 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:20:52,325-Speed 2633.07 samples/sec Loss 9.7335 LearningRate 0.0395 Epoch: 7 Global Step: 308260 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:20:56,206-Speed 2638.65 samples/sec Loss 8.9205 LearningRate 0.0395 Epoch: 7 Global Step: 308270 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:21:00,097-Speed 2632.46 samples/sec Loss 8.6870 LearningRate 0.0395 Epoch: 7 Global Step: 308280 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:21:04,007-Speed 2619.43 samples/sec Loss 8.5886 LearningRate 0.0395 Epoch: 7 Global Step: 308290 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:21:07,905-Speed 2628.02 samples/sec Loss 8.4214 LearningRate 0.0395 Epoch: 7 Global Step: 308300 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:21:11,790-Speed 2636.18 samples/sec Loss 8.3900 LearningRate 0.0395 Epoch: 7 Global Step: 308310 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:21:15,732-Speed 2598.40 samples/sec Loss 8.4451 LearningRate 0.0395 Epoch: 7 Global Step: 308320 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:21:19,674-Speed 2598.47 samples/sec Loss 8.7108 LearningRate 0.0395 Epoch: 7 Global Step: 308330 Fp16 Grad Scale: 2048 Required: 59 hours
Training: 2022-04-14 06:21:23,737-Speed 2520.64 samples/sec Loss 8.4454 LearningRate 0.0395 Epoch: 7 Global Step: 308340 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:21:27,623-Speed 2635.90 samples/sec Loss 8.4966 LearningRate 0.0395 Epoch: 7 Global Step: 308350 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:21:31,522-Speed 2627.31 samples/sec Loss 8.4999 LearningRate 0.0395 Epoch: 7 Global Step: 308360 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:21:35,420-Speed 2627.77 samples/sec Loss 8.4976 LearningRate 0.0395 Epoch: 7 Global Step: 308370 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:21:39,317-Speed 2628.02 samples/sec Loss 8.4305 LearningRate 0.0395 Epoch: 7 Global Step: 308380 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:21:43,214-Speed 2628.38 samples/sec Loss 8.5757 LearningRate 0.0395 Epoch: 7 Global Step: 308390 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:21:47,115-Speed 2625.42 samples/sec Loss 8.8415 LearningRate 0.0395 Epoch: 7 Global Step: 308400 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:21:51,009-Speed 2630.73 samples/sec Loss 8.6073 LearningRate 0.0395 Epoch: 7 Global Step: 308410 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:21:54,903-Speed 2630.16 samples/sec Loss 8.4787 LearningRate 0.0395 Epoch: 7 Global Step: 308420 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:21:58,806-Speed 2624.74 samples/sec Loss 8.3238 LearningRate 0.0395 Epoch: 7 Global Step: 308430 Fp16 Grad Scale: 4096 Required: 59 hours
Training: 2022-04-14 06:22:02,709-Speed 2623.81 samples/sec Loss 8.3841 LearningRate 0.0395 Epoch: 7 Global Step: 308440 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:22:06,609-Speed 2626.27 samples/sec Loss 8.5576 LearningRate 0.0395 Epoch: 7 Global Step: 308450 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:22:10,507-Speed 2627.04 samples/sec Loss 8.5972 LearningRate 0.0395 Epoch: 7 Global Step: 308460 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:22:14,397-Speed 2633.60 samples/sec Loss 8.5586 LearningRate 0.0395 Epoch: 7 Global Step: 308470 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:22:18,285-Speed 2635.00 samples/sec Loss 8.5333 LearningRate 0.0395 Epoch: 7 Global Step: 308480 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:22:22,172-Speed 2634.49 samples/sec Loss 8.5699 LearningRate 0.0395 Epoch: 7 Global Step: 308490 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:22:26,065-Speed 2631.65 samples/sec Loss 8.5284 LearningRate 0.0395 Epoch: 7 Global Step: 308500 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:22:29,964-Speed 2626.51 samples/sec Loss 8.5197 LearningRate 0.0395 Epoch: 7 Global Step: 308510 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:22:33,861-Speed 2629.10 samples/sec Loss 8.4175 LearningRate 0.0395 Epoch: 7 Global Step: 308520 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:22:37,748-Speed 2634.81 samples/sec Loss 8.4657 LearningRate 0.0394 Epoch: 7 Global Step: 308530 Fp16 Grad Scale: 8192 Required: 59 hours
Training: 2022-04-14 06:22:41,642-Speed 2630.42 samples/sec Loss 8.4253 LearningRate 0.0394 Epoch: 7 Global Step: 308540 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:22:45,528-Speed 2635.19 samples/sec Loss 8.3868 LearningRate 0.0394 Epoch: 7 Global Step: 308550 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:22:49,423-Speed 2629.95 samples/sec Loss 8.4683 LearningRate 0.0394 Epoch: 7 Global Step: 308560 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:22:53,354-Speed 2605.94 samples/sec Loss 8.4060 LearningRate 0.0394 Epoch: 7 Global Step: 308570 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:22:57,244-Speed 2633.02 samples/sec Loss 8.3843 LearningRate 0.0394 Epoch: 7 Global Step: 308580 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:23:01,145-Speed 2625.60 samples/sec Loss 8.4389 LearningRate 0.0394 Epoch: 7 Global Step: 308590 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:23:05,045-Speed 2625.98 samples/sec Loss 8.5073 LearningRate 0.0394 Epoch: 7 Global Step: 308600 Fp16 Grad Scale: 16384 Required: 59 hours
Training: 2022-04-14 06:23:08,942-Speed 2628.52 samples/sec Loss 8.4851 LearningRate 0.0394 Epoch: 7 Global Step: 308610 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:23:12,837-Speed 2629.81 samples/sec Loss 8.4878 LearningRate 0.0394 Epoch: 7 Global Step: 308620 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:23:16,731-Speed 2630.41 samples/sec Loss 8.5755 LearningRate 0.0394 Epoch: 7 Global Step: 308630 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:23:20,622-Speed 2632.53 samples/sec Loss 8.3862 LearningRate 0.0394 Epoch: 7 Global Step: 308640 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:23:24,510-Speed 2634.45 samples/sec Loss 8.5592 LearningRate 0.0394 Epoch: 7 Global Step: 308650 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:23:28,408-Speed 2627.16 samples/sec Loss 8.5079 LearningRate 0.0394 Epoch: 7 Global Step: 308660 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:23:32,308-Speed 2626.72 samples/sec Loss 8.5640 LearningRate 0.0394 Epoch: 7 Global Step: 308670 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:23:36,196-Speed 2634.31 samples/sec Loss 8.5854 LearningRate 0.0394 Epoch: 7 Global Step: 308680 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:23:40,087-Speed 2632.49 samples/sec Loss 8.4391 LearningRate 0.0394 Epoch: 7 Global Step: 308690 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:23:44,164-Speed 2511.83 samples/sec Loss 8.5312 LearningRate 0.0394 Epoch: 7 Global Step: 308700 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:23:48,258-Speed 2502.44 samples/sec Loss 8.4583 LearningRate 0.0394 Epoch: 7 Global Step: 308710 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:23:52,148-Speed 2633.00 samples/sec Loss 8.5228 LearningRate 0.0394 Epoch: 7 Global Step: 308720 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:23:56,035-Speed 2637.76 samples/sec Loss 8.3965 LearningRate 0.0394 Epoch: 7 Global Step: 308730 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:23:59,926-Speed 2632.57 samples/sec Loss 8.4136 LearningRate 0.0394 Epoch: 7 Global Step: 308740 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:24:03,819-Speed 2630.85 samples/sec Loss 8.4296 LearningRate 0.0394 Epoch: 7 Global Step: 308750 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:24:07,709-Speed 2633.23 samples/sec Loss 8.4548 LearningRate 0.0394 Epoch: 7 Global Step: 308760 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:24:11,606-Speed 2627.93 samples/sec Loss 8.6017 LearningRate 0.0394 Epoch: 7 Global Step: 308770 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:24:15,497-Speed 2632.31 samples/sec Loss 8.4613 LearningRate 0.0394 Epoch: 7 Global Step: 308780 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:24:19,389-Speed 2631.82 samples/sec Loss 8.4584 LearningRate 0.0394 Epoch: 7 Global Step: 308790 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:24:23,289-Speed 2626.92 samples/sec Loss 8.5562 LearningRate 0.0394 Epoch: 7 Global Step: 308800 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:24:27,179-Speed 2632.39 samples/sec Loss 8.4878 LearningRate 0.0394 Epoch: 7 Global Step: 308810 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:24:31,070-Speed 2632.56 samples/sec Loss 8.5162 LearningRate 0.0394 Epoch: 7 Global Step: 308820 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:24:34,972-Speed 2624.36 samples/sec Loss 8.4275 LearningRate 0.0394 Epoch: 7 Global Step: 308830 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:24:38,869-Speed 2628.91 samples/sec Loss 8.4009 LearningRate 0.0394 Epoch: 7 Global Step: 308840 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:24:42,764-Speed 2630.22 samples/sec Loss 8.4804 LearningRate 0.0394 Epoch: 7 Global Step: 308850 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:24:46,653-Speed 2633.28 samples/sec Loss 8.5580 LearningRate 0.0394 Epoch: 7 Global Step: 308860 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:24:50,551-Speed 2627.71 samples/sec Loss 8.5187 LearningRate 0.0394 Epoch: 7 Global Step: 308870 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:24:54,449-Speed 2627.97 samples/sec Loss 8.4698 LearningRate 0.0394 Epoch: 7 Global Step: 308880 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:24:58,341-Speed 2631.29 samples/sec Loss 8.5116 LearningRate 0.0394 Epoch: 7 Global Step: 308890 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:25:02,253-Speed 2617.91 samples/sec Loss 8.4344 LearningRate 0.0394 Epoch: 7 Global Step: 308900 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:25:06,155-Speed 2625.37 samples/sec Loss 8.4927 LearningRate 0.0394 Epoch: 7 Global Step: 308910 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:25:10,057-Speed 2624.68 samples/sec Loss 8.4245 LearningRate 0.0394 Epoch: 7 Global Step: 308920 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:25:13,972-Speed 2616.07 samples/sec Loss 8.5987 LearningRate 0.0394 Epoch: 7 Global Step: 308930 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:25:17,886-Speed 2617.27 samples/sec Loss 8.4517 LearningRate 0.0394 Epoch: 7 Global Step: 308940 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:25:21,898-Speed 2552.88 samples/sec Loss 8.5927 LearningRate 0.0394 Epoch: 7 Global Step: 308950 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:25:25,789-Speed 2632.81 samples/sec Loss 8.4875 LearningRate 0.0394 Epoch: 7 Global Step: 308960 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:25:29,666-Speed 2641.66 samples/sec Loss 8.5480 LearningRate 0.0394 Epoch: 7 Global Step: 308970 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:25:33,564-Speed 2627.61 samples/sec Loss 8.4443 LearningRate 0.0394 Epoch: 7 Global Step: 308980 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:25:37,457-Speed 2630.80 samples/sec Loss 8.5753 LearningRate 0.0394 Epoch: 7 Global Step: 308990 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:25:41,352-Speed 2629.94 samples/sec Loss 8.5165 LearningRate 0.0394 Epoch: 7 Global Step: 309000 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:25:45,243-Speed 2631.90 samples/sec Loss 8.4937 LearningRate 0.0394 Epoch: 7 Global Step: 309010 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:25:49,148-Speed 2623.27 samples/sec Loss 8.4928 LearningRate 0.0394 Epoch: 7 Global Step: 309020 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:25:53,040-Speed 2631.22 samples/sec Loss 8.4642 LearningRate 0.0394 Epoch: 7 Global Step: 309030 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:25:56,930-Speed 2633.83 samples/sec Loss 8.5321 LearningRate 0.0394 Epoch: 7 Global Step: 309040 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:26:00,822-Speed 2631.33 samples/sec Loss 8.4940 LearningRate 0.0394 Epoch: 7 Global Step: 309050 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:26:04,723-Speed 2625.89 samples/sec Loss 8.4421 LearningRate 0.0394 Epoch: 7 Global Step: 309060 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:26:08,672-Speed 2593.28 samples/sec Loss 8.4986 LearningRate 0.0394 Epoch: 7 Global Step: 309070 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:26:12,563-Speed 2632.58 samples/sec Loss 8.4248 LearningRate 0.0394 Epoch: 7 Global Step: 309080 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:26:16,463-Speed 2626.32 samples/sec Loss 8.4871 LearningRate 0.0394 Epoch: 7 Global Step: 309090 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:26:20,352-Speed 2634.11 samples/sec Loss 8.4150 LearningRate 0.0394 Epoch: 7 Global Step: 309100 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:26:24,248-Speed 2629.28 samples/sec Loss 8.4823 LearningRate 0.0394 Epoch: 7 Global Step: 309110 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:26:28,148-Speed 2626.39 samples/sec Loss 8.5307 LearningRate 0.0394 Epoch: 7 Global Step: 309120 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:26:32,066-Speed 2613.80 samples/sec Loss 8.4975 LearningRate 0.0394 Epoch: 7 Global Step: 309130 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:26:35,972-Speed 2622.49 samples/sec Loss 8.4433 LearningRate 0.0394 Epoch: 7 Global Step: 309140 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:26:39,868-Speed 2628.52 samples/sec Loss 8.4213 LearningRate 0.0394 Epoch: 7 Global Step: 309150 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:26:43,761-Speed 2631.66 samples/sec Loss 8.4295 LearningRate 0.0394 Epoch: 7 Global Step: 309160 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:26:47,653-Speed 2631.62 samples/sec Loss 8.3982 LearningRate 0.0394 Epoch: 7 Global Step: 309170 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:26:51,549-Speed 2629.28 samples/sec Loss 8.3217 LearningRate 0.0394 Epoch: 7 Global Step: 309180 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:26:55,420-Speed 2645.61 samples/sec Loss 8.4426 LearningRate 0.0393 Epoch: 7 Global Step: 309190 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:26:59,312-Speed 2631.83 samples/sec Loss 8.4536 LearningRate 0.0393 Epoch: 7 Global Step: 309200 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:27:03,205-Speed 2631.01 samples/sec Loss 8.6384 LearningRate 0.0393 Epoch: 7 Global Step: 309210 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:27:07,098-Speed 2631.41 samples/sec Loss 8.4910 LearningRate 0.0393 Epoch: 7 Global Step: 309220 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:27:11,006-Speed 2620.53 samples/sec Loss 8.3839 LearningRate 0.0393 Epoch: 7 Global Step: 309230 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:27:14,905-Speed 2627.20 samples/sec Loss 8.4581 LearningRate 0.0393 Epoch: 7 Global Step: 309240 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:27:18,829-Speed 2610.65 samples/sec Loss 8.6374 LearningRate 0.0393 Epoch: 7 Global Step: 309250 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:27:22,807-Speed 2574.27 samples/sec Loss 8.4646 LearningRate 0.0393 Epoch: 7 Global Step: 309260 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:27:26,698-Speed 2633.16 samples/sec Loss 8.4138 LearningRate 0.0393 Epoch: 7 Global Step: 309270 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:27:30,714-Speed 2550.18 samples/sec Loss 8.3663 LearningRate 0.0393 Epoch: 7 Global Step: 309280 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:27:34,612-Speed 2627.77 samples/sec Loss 8.5359 LearningRate 0.0393 Epoch: 7 Global Step: 309290 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:27:38,502-Speed 2632.64 samples/sec Loss 8.4124 LearningRate 0.0393 Epoch: 7 Global Step: 309300 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:27:42,405-Speed 2624.50 samples/sec Loss 8.3847 LearningRate 0.0393 Epoch: 7 Global Step: 309310 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:27:46,305-Speed 2626.31 samples/sec Loss 8.4783 LearningRate 0.0393 Epoch: 7 Global Step: 309320 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:27:50,199-Speed 2630.22 samples/sec Loss 8.5011 LearningRate 0.0393 Epoch: 7 Global Step: 309330 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:27:54,091-Speed 2631.86 samples/sec Loss 8.5466 LearningRate 0.0393 Epoch: 7 Global Step: 309340 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:27:57,982-Speed 2633.01 samples/sec Loss 8.4649 LearningRate 0.0393 Epoch: 7 Global Step: 309350 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:28:01,877-Speed 2629.09 samples/sec Loss 8.5918 LearningRate 0.0393 Epoch: 7 Global Step: 309360 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:28:05,782-Speed 2623.18 samples/sec Loss 8.4195 LearningRate 0.0393 Epoch: 7 Global Step: 309370 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:28:09,710-Speed 2607.56 samples/sec Loss 8.4345 LearningRate 0.0393 Epoch: 7 Global Step: 309380 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:28:13,607-Speed 2628.36 samples/sec Loss 8.4247 LearningRate 0.0393 Epoch: 7 Global Step: 309390 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:28:17,487-Speed 2639.58 samples/sec Loss 8.5292 LearningRate 0.0393 Epoch: 7 Global Step: 309400 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:28:21,382-Speed 2629.52 samples/sec Loss 8.3265 LearningRate 0.0393 Epoch: 7 Global Step: 309410 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:28:25,293-Speed 2619.54 samples/sec Loss 8.5024 LearningRate 0.0393 Epoch: 7 Global Step: 309420 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:28:29,172-Speed 2639.99 samples/sec Loss 8.3635 LearningRate 0.0393 Epoch: 7 Global Step: 309430 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:28:33,062-Speed 2633.34 samples/sec Loss 8.4103 LearningRate 0.0393 Epoch: 7 Global Step: 309440 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:28:36,956-Speed 2630.47 samples/sec Loss 8.4197 LearningRate 0.0393 Epoch: 7 Global Step: 309450 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:28:40,856-Speed 2626.38 samples/sec Loss 8.5054 LearningRate 0.0393 Epoch: 7 Global Step: 309460 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:28:44,747-Speed 2631.88 samples/sec Loss 8.4290 LearningRate 0.0393 Epoch: 7 Global Step: 309470 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:28:48,655-Speed 2621.01 samples/sec Loss 8.5689 LearningRate 0.0393 Epoch: 7 Global Step: 309480 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:28:52,548-Speed 2631.08 samples/sec Loss 8.4280 LearningRate 0.0393 Epoch: 7 Global Step: 309490 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:28:56,434-Speed 2635.73 samples/sec Loss 8.5492 LearningRate 0.0393 Epoch: 7 Global Step: 309500 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:29:00,324-Speed 2632.97 samples/sec Loss 8.5456 LearningRate 0.0393 Epoch: 7 Global Step: 309510 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:29:04,217-Speed 2631.12 samples/sec Loss 8.4889 LearningRate 0.0393 Epoch: 7 Global Step: 309520 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:29:08,121-Speed 2623.75 samples/sec Loss 8.4631 LearningRate 0.0393 Epoch: 7 Global Step: 309530 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:29:12,014-Speed 2630.83 samples/sec Loss 8.5471 LearningRate 0.0393 Epoch: 7 Global Step: 309540 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:29:15,925-Speed 2619.34 samples/sec Loss 8.5183 LearningRate 0.0393 Epoch: 7 Global Step: 309550 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:29:19,817-Speed 2631.81 samples/sec Loss 8.6114 LearningRate 0.0393 Epoch: 7 Global Step: 309560 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:29:23,708-Speed 2632.31 samples/sec Loss 8.5222 LearningRate 0.0393 Epoch: 7 Global Step: 309570 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:29:27,602-Speed 2630.68 samples/sec Loss 8.4574 LearningRate 0.0393 Epoch: 7 Global Step: 309580 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:29:31,493-Speed 2632.15 samples/sec Loss 8.4022 LearningRate 0.0393 Epoch: 7 Global Step: 309590 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:29:35,384-Speed 2632.11 samples/sec Loss 8.5125 LearningRate 0.0393 Epoch: 7 Global Step: 309600 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:29:39,259-Speed 2643.20 samples/sec Loss 8.5852 LearningRate 0.0393 Epoch: 7 Global Step: 309610 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:29:43,154-Speed 2629.86 samples/sec Loss 8.5481 LearningRate 0.0393 Epoch: 7 Global Step: 309620 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:29:46,998-Speed 2665.23 samples/sec Loss 8.8497 LearningRate 0.0393 Epoch: 7 Global Step: 309630 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:29:50,809-Speed 2686.99 samples/sec Loss 8.9631 LearningRate 0.0393 Epoch: 7 Global Step: 309640 Fp16 Grad Scale: 1024 Required: 58 hours
Training: 2022-04-14 06:29:54,750-Speed 2599.62 samples/sec Loss 9.9050 LearningRate 0.0393 Epoch: 7 Global Step: 309650 Fp16 Grad Scale: 1024 Required: 58 hours
Training: 2022-04-14 06:29:58,652-Speed 2624.64 samples/sec Loss 9.2481 LearningRate 0.0393 Epoch: 7 Global Step: 309660 Fp16 Grad Scale: 1024 Required: 58 hours
Training: 2022-04-14 06:30:02,547-Speed 2630.11 samples/sec Loss 8.6500 LearningRate 0.0393 Epoch: 7 Global Step: 309670 Fp16 Grad Scale: 1024 Required: 58 hours
Training: 2022-04-14 06:30:06,426-Speed 2640.55 samples/sec Loss 8.6328 LearningRate 0.0393 Epoch: 7 Global Step: 309680 Fp16 Grad Scale: 1024 Required: 58 hours
Training: 2022-04-14 06:30:10,315-Speed 2633.70 samples/sec Loss 8.5906 LearningRate 0.0393 Epoch: 7 Global Step: 309690 Fp16 Grad Scale: 1024 Required: 58 hours
Training: 2022-04-14 06:30:14,199-Speed 2637.15 samples/sec Loss 8.3866 LearningRate 0.0393 Epoch: 7 Global Step: 309700 Fp16 Grad Scale: 1024 Required: 58 hours
Training: 2022-04-14 06:30:18,090-Speed 2632.32 samples/sec Loss 8.5847 LearningRate 0.0393 Epoch: 7 Global Step: 309710 Fp16 Grad Scale: 1024 Required: 58 hours
Training: 2022-04-14 06:30:21,973-Speed 2637.40 samples/sec Loss 8.5630 LearningRate 0.0393 Epoch: 7 Global Step: 309720 Fp16 Grad Scale: 1024 Required: 58 hours
Training: 2022-04-14 06:30:25,858-Speed 2636.44 samples/sec Loss 8.4527 LearningRate 0.0393 Epoch: 7 Global Step: 309730 Fp16 Grad Scale: 1024 Required: 58 hours
Training: 2022-04-14 06:30:29,745-Speed 2635.72 samples/sec Loss 8.6084 LearningRate 0.0393 Epoch: 7 Global Step: 309740 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:30:33,632-Speed 2636.31 samples/sec Loss 8.5018 LearningRate 0.0393 Epoch: 7 Global Step: 309750 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:30:37,520-Speed 2633.70 samples/sec Loss 8.5537 LearningRate 0.0393 Epoch: 7 Global Step: 309760 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:30:41,423-Speed 2624.41 samples/sec Loss 8.5187 LearningRate 0.0393 Epoch: 7 Global Step: 309770 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:30:45,317-Speed 2630.65 samples/sec Loss 8.4933 LearningRate 0.0393 Epoch: 7 Global Step: 309780 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:30:49,203-Speed 2635.93 samples/sec Loss 8.4284 LearningRate 0.0393 Epoch: 7 Global Step: 309790 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:30:53,093-Speed 2633.08 samples/sec Loss 8.4809 LearningRate 0.0393 Epoch: 7 Global Step: 309800 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:30:56,981-Speed 2634.24 samples/sec Loss 8.4584 LearningRate 0.0393 Epoch: 7 Global Step: 309810 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:31:00,868-Speed 2634.98 samples/sec Loss 8.4936 LearningRate 0.0393 Epoch: 7 Global Step: 309820 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:31:04,759-Speed 2632.50 samples/sec Loss 8.5804 LearningRate 0.0393 Epoch: 7 Global Step: 309830 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:31:08,656-Speed 2628.43 samples/sec Loss 8.5066 LearningRate 0.0393 Epoch: 7 Global Step: 309840 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:31:12,547-Speed 2632.26 samples/sec Loss 8.4511 LearningRate 0.0392 Epoch: 7 Global Step: 309850 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:31:16,440-Speed 2631.48 samples/sec Loss 8.5333 LearningRate 0.0392 Epoch: 7 Global Step: 309860 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:31:20,336-Speed 2628.91 samples/sec Loss 8.5429 LearningRate 0.0392 Epoch: 7 Global Step: 309870 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:31:24,220-Speed 2636.96 samples/sec Loss 8.4158 LearningRate 0.0392 Epoch: 7 Global Step: 309880 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:31:28,108-Speed 2635.20 samples/sec Loss 8.5322 LearningRate 0.0392 Epoch: 7 Global Step: 309890 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:31:31,994-Speed 2635.58 samples/sec Loss 8.4952 LearningRate 0.0392 Epoch: 7 Global Step: 309900 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:31:35,883-Speed 2633.77 samples/sec Loss 8.5263 LearningRate 0.0392 Epoch: 7 Global Step: 309910 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:31:39,769-Speed 2635.86 samples/sec Loss 8.4671 LearningRate 0.0392 Epoch: 7 Global Step: 309920 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:31:43,656-Speed 2635.37 samples/sec Loss 8.5088 LearningRate 0.0392 Epoch: 7 Global Step: 309930 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:31:47,542-Speed 2635.45 samples/sec Loss 8.5801 LearningRate 0.0392 Epoch: 7 Global Step: 309940 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:31:51,433-Speed 2633.54 samples/sec Loss 8.4806 LearningRate 0.0392 Epoch: 7 Global Step: 309950 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:31:55,321-Speed 2633.95 samples/sec Loss 8.5577 LearningRate 0.0392 Epoch: 7 Global Step: 309960 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:31:59,211-Speed 2633.66 samples/sec Loss 8.5020 LearningRate 0.0392 Epoch: 7 Global Step: 309970 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:32:03,102-Speed 2632.13 samples/sec Loss 8.4447 LearningRate 0.0392 Epoch: 7 Global Step: 309980 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:32:06,989-Speed 2634.90 samples/sec Loss 8.5255 LearningRate 0.0392 Epoch: 7 Global Step: 309990 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:32:10,884-Speed 2629.29 samples/sec Loss 8.4560 LearningRate 0.0392 Epoch: 7 Global Step: 310000 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:32:53,903-[lfw][310000]XNorm: 24.335987
Training: 2022-04-14 06:32:53,904-[lfw][310000]Accuracy-Flip: 0.99717+-0.00224
Training: 2022-04-14 06:32:53,905-[lfw][310000]Accuracy-Highest: 0.99783
Training: 2022-04-14 06:33:43,984-[cfp_fp][310000]XNorm: 22.351407
Training: 2022-04-14 06:33:43,985-[cfp_fp][310000]Accuracy-Flip: 0.98671+-0.00723
Training: 2022-04-14 06:33:43,986-[cfp_fp][310000]Accuracy-Highest: 0.98671
Training: 2022-04-14 06:34:27,092-[agedb_30][310000]XNorm: 24.160127
Training: 2022-04-14 06:34:27,093-[agedb_30][310000]Accuracy-Flip: 0.97483+-0.00736
Training: 2022-04-14 06:34:27,094-[agedb_30][310000]Accuracy-Highest: 0.97567
Training: 2022-04-14 06:34:30,951-Speed 73.11 samples/sec Loss 8.4833 LearningRate 0.0392 Epoch: 7 Global Step: 310010 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:34:34,806-Speed 2656.60 samples/sec Loss 8.4297 LearningRate 0.0392 Epoch: 7 Global Step: 310020 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:34:38,661-Speed 2656.92 samples/sec Loss 8.4514 LearningRate 0.0392 Epoch: 7 Global Step: 310030 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:34:42,530-Speed 2647.73 samples/sec Loss 8.3931 LearningRate 0.0392 Epoch: 7 Global Step: 310040 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:34:46,392-Speed 2651.85 samples/sec Loss 8.4707 LearningRate 0.0392 Epoch: 7 Global Step: 310050 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:34:50,266-Speed 2644.73 samples/sec Loss 8.4335 LearningRate 0.0392 Epoch: 7 Global Step: 310060 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:34:54,130-Speed 2651.87 samples/sec Loss 8.5364 LearningRate 0.0392 Epoch: 7 Global Step: 310070 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:34:57,999-Speed 2647.05 samples/sec Loss 8.3961 LearningRate 0.0392 Epoch: 7 Global Step: 310080 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:35:01,870-Speed 2646.24 samples/sec Loss 8.4945 LearningRate 0.0392 Epoch: 7 Global Step: 310090 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:35:05,741-Speed 2645.42 samples/sec Loss 8.4302 LearningRate 0.0392 Epoch: 7 Global Step: 310100 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:35:09,630-Speed 2633.88 samples/sec Loss 8.3383 LearningRate 0.0392 Epoch: 7 Global Step: 310110 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:35:13,502-Speed 2645.21 samples/sec Loss 8.5111 LearningRate 0.0392 Epoch: 7 Global Step: 310120 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:35:17,378-Speed 2642.66 samples/sec Loss 8.5368 LearningRate 0.0392 Epoch: 7 Global Step: 310130 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:35:21,255-Speed 2642.28 samples/sec Loss 8.4182 LearningRate 0.0392 Epoch: 7 Global Step: 310140 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:35:25,137-Speed 2638.13 samples/sec Loss 8.4755 LearningRate 0.0392 Epoch: 7 Global Step: 310150 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:35:29,061-Speed 2610.54 samples/sec Loss 8.5016 LearningRate 0.0392 Epoch: 7 Global Step: 310160 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:35:32,952-Speed 2632.84 samples/sec Loss 8.5592 LearningRate 0.0392 Epoch: 7 Global Step: 310170 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:35:36,844-Speed 2631.26 samples/sec Loss 8.5650 LearningRate 0.0392 Epoch: 7 Global Step: 310180 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:35:40,735-Speed 2632.18 samples/sec Loss 8.4760 LearningRate 0.0392 Epoch: 7 Global Step: 310190 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:35:44,624-Speed 2634.45 samples/sec Loss 8.4790 LearningRate 0.0392 Epoch: 7 Global Step: 310200 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:35:48,520-Speed 2628.88 samples/sec Loss 8.3787 LearningRate 0.0392 Epoch: 7 Global Step: 310210 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:35:52,387-Speed 2648.45 samples/sec Loss 9.4608 LearningRate 0.0392 Epoch: 7 Global Step: 310220 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:35:56,274-Speed 2634.98 samples/sec Loss 9.1655 LearningRate 0.0392 Epoch: 7 Global Step: 310230 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:36:00,165-Speed 2633.05 samples/sec Loss 8.7293 LearningRate 0.0392 Epoch: 7 Global Step: 310240 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:36:04,051-Speed 2635.47 samples/sec Loss 8.5570 LearningRate 0.0392 Epoch: 7 Global Step: 310250 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:36:07,941-Speed 2632.87 samples/sec Loss 8.5848 LearningRate 0.0392 Epoch: 7 Global Step: 310260 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:36:11,828-Speed 2635.52 samples/sec Loss 8.4708 LearningRate 0.0392 Epoch: 7 Global Step: 310270 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:36:15,711-Speed 2637.59 samples/sec Loss 8.5518 LearningRate 0.0392 Epoch: 7 Global Step: 310280 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:36:19,603-Speed 2632.09 samples/sec Loss 8.5161 LearningRate 0.0392 Epoch: 7 Global Step: 310290 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:36:23,492-Speed 2633.54 samples/sec Loss 8.4514 LearningRate 0.0392 Epoch: 7 Global Step: 310300 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:36:27,379-Speed 2635.63 samples/sec Loss 8.4196 LearningRate 0.0392 Epoch: 7 Global Step: 310310 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:36:31,267-Speed 2634.19 samples/sec Loss 8.4066 LearningRate 0.0392 Epoch: 7 Global Step: 310320 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:36:35,154-Speed 2635.19 samples/sec Loss 8.4794 LearningRate 0.0392 Epoch: 7 Global Step: 310330 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:36:39,045-Speed 2632.21 samples/sec Loss 8.4514 LearningRate 0.0392 Epoch: 7 Global Step: 310340 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:36:42,933-Speed 2634.20 samples/sec Loss 8.4282 LearningRate 0.0392 Epoch: 7 Global Step: 310350 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:36:46,820-Speed 2635.27 samples/sec Loss 8.3427 LearningRate 0.0392 Epoch: 7 Global Step: 310360 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:36:50,721-Speed 2625.85 samples/sec Loss 8.6256 LearningRate 0.0392 Epoch: 7 Global Step: 310370 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:36:54,607-Speed 2636.22 samples/sec Loss 8.4908 LearningRate 0.0392 Epoch: 7 Global Step: 310380 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:36:58,493-Speed 2635.66 samples/sec Loss 8.4220 LearningRate 0.0392 Epoch: 7 Global Step: 310390 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:37:02,401-Speed 2621.20 samples/sec Loss 8.3828 LearningRate 0.0392 Epoch: 7 Global Step: 310400 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:37:06,293-Speed 2631.27 samples/sec Loss 8.3579 LearningRate 0.0392 Epoch: 7 Global Step: 310410 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:37:10,182-Speed 2633.94 samples/sec Loss 8.4408 LearningRate 0.0392 Epoch: 7 Global Step: 310420 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:37:14,074-Speed 2631.18 samples/sec Loss 8.4449 LearningRate 0.0392 Epoch: 7 Global Step: 310430 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:37:17,963-Speed 2633.95 samples/sec Loss 8.4036 LearningRate 0.0392 Epoch: 7 Global Step: 310440 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:37:21,855-Speed 2631.36 samples/sec Loss 8.5152 LearningRate 0.0392 Epoch: 7 Global Step: 310450 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:37:25,750-Speed 2630.15 samples/sec Loss 8.5218 LearningRate 0.0392 Epoch: 7 Global Step: 310460 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:37:29,645-Speed 2629.31 samples/sec Loss 8.4608 LearningRate 0.0392 Epoch: 7 Global Step: 310470 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:37:33,543-Speed 2627.64 samples/sec Loss 8.4746 LearningRate 0.0392 Epoch: 7 Global Step: 310480 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:37:37,454-Speed 2619.12 samples/sec Loss 8.5441 LearningRate 0.0392 Epoch: 7 Global Step: 310490 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:37:41,371-Speed 2614.84 samples/sec Loss 8.5683 LearningRate 0.0392 Epoch: 7 Global Step: 310500 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:37:45,269-Speed 2627.93 samples/sec Loss 8.4970 LearningRate 0.0392 Epoch: 7 Global Step: 310510 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:37:49,160-Speed 2631.99 samples/sec Loss 8.5016 LearningRate 0.0391 Epoch: 7 Global Step: 310520 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:37:53,055-Speed 2629.45 samples/sec Loss 8.3966 LearningRate 0.0391 Epoch: 7 Global Step: 310530 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:37:56,952-Speed 2628.55 samples/sec Loss 8.3233 LearningRate 0.0391 Epoch: 7 Global Step: 310540 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:38:00,844-Speed 2631.51 samples/sec Loss 8.5658 LearningRate 0.0391 Epoch: 7 Global Step: 310550 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:38:04,749-Speed 2622.93 samples/sec Loss 8.4176 LearningRate 0.0391 Epoch: 7 Global Step: 310560 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:38:08,645-Speed 2629.18 samples/sec Loss 8.4756 LearningRate 0.0391 Epoch: 7 Global Step: 310570 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:38:12,580-Speed 2602.77 samples/sec Loss 8.4438 LearningRate 0.0391 Epoch: 7 Global Step: 310580 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:38:16,492-Speed 2618.23 samples/sec Loss 8.4199 LearningRate 0.0391 Epoch: 7 Global Step: 310590 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:38:20,391-Speed 2627.40 samples/sec Loss 8.5280 LearningRate 0.0391 Epoch: 7 Global Step: 310600 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:38:24,292-Speed 2625.64 samples/sec Loss 8.4418 LearningRate 0.0391 Epoch: 7 Global Step: 310610 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:38:28,204-Speed 2617.82 samples/sec Loss 8.4298 LearningRate 0.0391 Epoch: 7 Global Step: 310620 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:38:32,245-Speed 2534.45 samples/sec Loss 8.4175 LearningRate 0.0391 Epoch: 7 Global Step: 310630 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:38:36,143-Speed 2627.53 samples/sec Loss 8.5185 LearningRate 0.0391 Epoch: 7 Global Step: 310640 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:38:40,056-Speed 2618.37 samples/sec Loss 8.4457 LearningRate 0.0391 Epoch: 7 Global Step: 310650 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:38:43,981-Speed 2609.07 samples/sec Loss 8.4389 LearningRate 0.0391 Epoch: 7 Global Step: 310660 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:38:47,897-Speed 2616.27 samples/sec Loss 8.4078 LearningRate 0.0391 Epoch: 7 Global Step: 310670 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:38:51,875-Speed 2574.62 samples/sec Loss 8.4601 LearningRate 0.0391 Epoch: 7 Global Step: 310680 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:38:55,799-Speed 2610.77 samples/sec Loss 8.4157 LearningRate 0.0391 Epoch: 7 Global Step: 310690 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:38:59,694-Speed 2629.43 samples/sec Loss 8.4592 LearningRate 0.0391 Epoch: 7 Global Step: 310700 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:39:03,591-Speed 2628.03 samples/sec Loss 8.3541 LearningRate 0.0391 Epoch: 7 Global Step: 310710 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:39:07,494-Speed 2623.60 samples/sec Loss 8.4443 LearningRate 0.0391 Epoch: 7 Global Step: 310720 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:39:11,392-Speed 2628.05 samples/sec Loss 8.4224 LearningRate 0.0391 Epoch: 7 Global Step: 310730 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:39:15,291-Speed 2627.25 samples/sec Loss 8.3854 LearningRate 0.0391 Epoch: 7 Global Step: 310740 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:39:19,196-Speed 2623.42 samples/sec Loss 8.4832 LearningRate 0.0391 Epoch: 7 Global Step: 310750 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:39:23,113-Speed 2614.22 samples/sec Loss 8.4698 LearningRate 0.0391 Epoch: 7 Global Step: 310760 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:39:27,036-Speed 2611.28 samples/sec Loss 8.4268 LearningRate 0.0391 Epoch: 7 Global Step: 310770 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:39:30,935-Speed 2627.13 samples/sec Loss 8.4991 LearningRate 0.0391 Epoch: 7 Global Step: 310780 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:39:34,869-Speed 2603.56 samples/sec Loss 8.4699 LearningRate 0.0391 Epoch: 7 Global Step: 310790 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:39:38,778-Speed 2620.20 samples/sec Loss 8.3524 LearningRate 0.0391 Epoch: 7 Global Step: 310800 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:39:42,679-Speed 2625.91 samples/sec Loss 8.4687 LearningRate 0.0391 Epoch: 7 Global Step: 310810 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:39:46,586-Speed 2621.66 samples/sec Loss 8.3580 LearningRate 0.0391 Epoch: 7 Global Step: 310820 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:39:50,485-Speed 2626.71 samples/sec Loss 8.5220 LearningRate 0.0391 Epoch: 7 Global Step: 310830 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:39:54,384-Speed 2626.66 samples/sec Loss 8.5933 LearningRate 0.0391 Epoch: 7 Global Step: 310840 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:39:58,288-Speed 2624.22 samples/sec Loss 8.4849 LearningRate 0.0391 Epoch: 7 Global Step: 310850 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:40:02,204-Speed 2615.44 samples/sec Loss 8.4485 LearningRate 0.0391 Epoch: 7 Global Step: 310860 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:40:06,106-Speed 2624.85 samples/sec Loss 8.5360 LearningRate 0.0391 Epoch: 7 Global Step: 310870 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:40:10,004-Speed 2627.69 samples/sec Loss 8.4456 LearningRate 0.0391 Epoch: 7 Global Step: 310880 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:40:13,917-Speed 2617.77 samples/sec Loss 8.3758 LearningRate 0.0391 Epoch: 7 Global Step: 310890 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:40:17,812-Speed 2629.64 samples/sec Loss 8.4242 LearningRate 0.0391 Epoch: 7 Global Step: 310900 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:40:21,721-Speed 2620.16 samples/sec Loss 8.4243 LearningRate 0.0391 Epoch: 7 Global Step: 310910 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:40:25,660-Speed 2600.27 samples/sec Loss 8.4565 LearningRate 0.0391 Epoch: 7 Global Step: 310920 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:40:29,553-Speed 2631.25 samples/sec Loss 8.4456 LearningRate 0.0391 Epoch: 7 Global Step: 310930 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:40:33,451-Speed 2627.89 samples/sec Loss 8.4747 LearningRate 0.0391 Epoch: 7 Global Step: 310940 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:40:37,349-Speed 2627.41 samples/sec Loss 8.3616 LearningRate 0.0391 Epoch: 7 Global Step: 310950 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:40:41,250-Speed 2625.21 samples/sec Loss 8.4452 LearningRate 0.0391 Epoch: 7 Global Step: 310960 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:40:45,155-Speed 2623.70 samples/sec Loss 8.4052 LearningRate 0.0391 Epoch: 7 Global Step: 310970 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:40:49,055-Speed 2625.96 samples/sec Loss 8.4364 LearningRate 0.0391 Epoch: 7 Global Step: 310980 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:40:52,954-Speed 2626.93 samples/sec Loss 8.2832 LearningRate 0.0391 Epoch: 7 Global Step: 310990 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:40:56,855-Speed 2625.47 samples/sec Loss 8.4401 LearningRate 0.0391 Epoch: 7 Global Step: 311000 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:41:00,773-Speed 2615.16 samples/sec Loss 8.5486 LearningRate 0.0391 Epoch: 7 Global Step: 311010 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:41:04,677-Speed 2623.27 samples/sec Loss 8.4415 LearningRate 0.0391 Epoch: 7 Global Step: 311020 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:41:08,580-Speed 2624.11 samples/sec Loss 8.2420 LearningRate 0.0391 Epoch: 7 Global Step: 311030 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:41:12,480-Speed 2626.50 samples/sec Loss 8.5357 LearningRate 0.0391 Epoch: 7 Global Step: 311040 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:41:16,359-Speed 2640.07 samples/sec Loss 8.4349 LearningRate 0.0391 Epoch: 7 Global Step: 311050 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:41:20,268-Speed 2621.22 samples/sec Loss 8.3645 LearningRate 0.0391 Epoch: 7 Global Step: 311060 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:41:24,168-Speed 2626.30 samples/sec Loss 8.3454 LearningRate 0.0391 Epoch: 7 Global Step: 311070 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:41:28,064-Speed 2629.24 samples/sec Loss 8.3852 LearningRate 0.0391 Epoch: 7 Global Step: 311080 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:41:31,962-Speed 2627.53 samples/sec Loss 8.5385 LearningRate 0.0391 Epoch: 7 Global Step: 311090 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:41:35,882-Speed 2612.80 samples/sec Loss 8.5042 LearningRate 0.0391 Epoch: 7 Global Step: 311100 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:41:39,778-Speed 2629.17 samples/sec Loss 8.4377 LearningRate 0.0391 Epoch: 7 Global Step: 311110 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:41:43,693-Speed 2616.23 samples/sec Loss 8.3575 LearningRate 0.0391 Epoch: 7 Global Step: 311120 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:41:47,587-Speed 2630.31 samples/sec Loss 8.3399 LearningRate 0.0391 Epoch: 7 Global Step: 311130 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:41:51,492-Speed 2622.86 samples/sec Loss 8.2684 LearningRate 0.0391 Epoch: 7 Global Step: 311140 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:41:55,397-Speed 2622.49 samples/sec Loss 8.5362 LearningRate 0.0391 Epoch: 7 Global Step: 311150 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:41:59,306-Speed 2620.74 samples/sec Loss 8.6516 LearningRate 0.0391 Epoch: 7 Global Step: 311160 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:42:03,211-Speed 2622.77 samples/sec Loss 8.3665 LearningRate 0.0391 Epoch: 7 Global Step: 311170 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:42:07,115-Speed 2623.89 samples/sec Loss 8.3676 LearningRate 0.0390 Epoch: 7 Global Step: 311180 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:42:11,019-Speed 2623.19 samples/sec Loss 8.3812 LearningRate 0.0390 Epoch: 7 Global Step: 311190 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:42:14,930-Speed 2618.70 samples/sec Loss 8.3970 LearningRate 0.0390 Epoch: 7 Global Step: 311200 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:42:18,843-Speed 2617.52 samples/sec Loss 8.4444 LearningRate 0.0390 Epoch: 7 Global Step: 311210 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:42:22,752-Speed 2619.98 samples/sec Loss 8.4633 LearningRate 0.0390 Epoch: 7 Global Step: 311220 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:42:26,666-Speed 2617.83 samples/sec Loss 8.3880 LearningRate 0.0390 Epoch: 7 Global Step: 311230 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:42:30,580-Speed 2616.78 samples/sec Loss 8.4947 LearningRate 0.0390 Epoch: 7 Global Step: 311240 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:42:34,478-Speed 2627.71 samples/sec Loss 8.5214 LearningRate 0.0390 Epoch: 7 Global Step: 311250 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:42:38,357-Speed 2640.74 samples/sec Loss 8.2512 LearningRate 0.0390 Epoch: 7 Global Step: 311260 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:42:42,254-Speed 2627.66 samples/sec Loss 8.4475 LearningRate 0.0390 Epoch: 7 Global Step: 311270 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:42:46,155-Speed 2625.80 samples/sec Loss 8.4358 LearningRate 0.0390 Epoch: 7 Global Step: 311280 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:42:50,068-Speed 2617.90 samples/sec Loss 8.3676 LearningRate 0.0390 Epoch: 7 Global Step: 311290 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:42:53,985-Speed 2614.47 samples/sec Loss 8.4807 LearningRate 0.0390 Epoch: 7 Global Step: 311300 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:42:57,907-Speed 2611.87 samples/sec Loss 8.4474 LearningRate 0.0390 Epoch: 7 Global Step: 311310 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:43:01,858-Speed 2592.25 samples/sec Loss 8.3757 LearningRate 0.0390 Epoch: 7 Global Step: 311320 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:43:05,894-Speed 2538.11 samples/sec Loss 8.4751 LearningRate 0.0390 Epoch: 7 Global Step: 311330 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:43:09,815-Speed 2612.31 samples/sec Loss 8.3552 LearningRate 0.0390 Epoch: 7 Global Step: 311340 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:43:13,717-Speed 2625.13 samples/sec Loss 8.4421 LearningRate 0.0390 Epoch: 7 Global Step: 311350 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:43:17,619-Speed 2624.37 samples/sec Loss 8.2570 LearningRate 0.0390 Epoch: 7 Global Step: 311360 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:43:21,530-Speed 2618.94 samples/sec Loss 8.5122 LearningRate 0.0390 Epoch: 7 Global Step: 311370 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:43:25,436-Speed 2622.29 samples/sec Loss 8.4127 LearningRate 0.0390 Epoch: 7 Global Step: 311380 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:43:29,341-Speed 2622.80 samples/sec Loss 8.5092 LearningRate 0.0390 Epoch: 7 Global Step: 311390 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:43:33,240-Speed 2627.58 samples/sec Loss 8.5100 LearningRate 0.0390 Epoch: 7 Global Step: 311400 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:43:37,138-Speed 2627.54 samples/sec Loss 8.4577 LearningRate 0.0390 Epoch: 7 Global Step: 311410 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:43:41,038-Speed 2626.53 samples/sec Loss 8.4460 LearningRate 0.0390 Epoch: 7 Global Step: 311420 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:43:44,938-Speed 2626.34 samples/sec Loss 8.4963 LearningRate 0.0390 Epoch: 7 Global Step: 311430 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:43:48,837-Speed 2627.10 samples/sec Loss 8.3741 LearningRate 0.0390 Epoch: 7 Global Step: 311440 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:43:52,736-Speed 2626.44 samples/sec Loss 8.4036 LearningRate 0.0390 Epoch: 7 Global Step: 311450 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:43:56,664-Speed 2608.13 samples/sec Loss 8.5723 LearningRate 0.0390 Epoch: 7 Global Step: 311460 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:44:00,586-Speed 2611.67 samples/sec Loss 8.4579 LearningRate 0.0390 Epoch: 7 Global Step: 311470 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:44:04,491-Speed 2623.18 samples/sec Loss 8.4031 LearningRate 0.0390 Epoch: 7 Global Step: 311480 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:44:08,423-Speed 2604.63 samples/sec Loss 8.3330 LearningRate 0.0390 Epoch: 7 Global Step: 311490 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:44:12,337-Speed 2616.69 samples/sec Loss 8.5702 LearningRate 0.0390 Epoch: 7 Global Step: 311500 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:44:16,250-Speed 2617.70 samples/sec Loss 8.3518 LearningRate 0.0390 Epoch: 7 Global Step: 311510 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:44:20,160-Speed 2622.24 samples/sec Loss 8.5166 LearningRate 0.0390 Epoch: 7 Global Step: 311520 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:44:24,079-Speed 2613.34 samples/sec Loss 8.4391 LearningRate 0.0390 Epoch: 7 Global Step: 311530 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:44:27,995-Speed 2615.62 samples/sec Loss 8.3858 LearningRate 0.0390 Epoch: 7 Global Step: 311540 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:44:31,898-Speed 2623.92 samples/sec Loss 8.3988 LearningRate 0.0390 Epoch: 7 Global Step: 311550 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:44:35,800-Speed 2625.50 samples/sec Loss 8.6242 LearningRate 0.0390 Epoch: 7 Global Step: 311560 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:44:39,701-Speed 2626.17 samples/sec Loss 8.4315 LearningRate 0.0390 Epoch: 7 Global Step: 311570 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:44:43,602-Speed 2625.23 samples/sec Loss 8.3229 LearningRate 0.0390 Epoch: 7 Global Step: 311580 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:44:47,468-Speed 2649.60 samples/sec Loss 8.4096 LearningRate 0.0390 Epoch: 7 Global Step: 311590 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:44:51,265-Speed 2697.38 samples/sec Loss 9.7189 LearningRate 0.0390 Epoch: 7 Global Step: 311600 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:44:55,170-Speed 2623.43 samples/sec Loss 10.1570 LearningRate 0.0390 Epoch: 7 Global Step: 311610 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:44:59,066-Speed 2629.18 samples/sec Loss 8.9240 LearningRate 0.0390 Epoch: 7 Global Step: 311620 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:45:03,054-Speed 2567.61 samples/sec Loss 8.6603 LearningRate 0.0390 Epoch: 7 Global Step: 311630 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:45:06,954-Speed 2626.56 samples/sec Loss 8.5170 LearningRate 0.0390 Epoch: 7 Global Step: 311640 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:45:10,851-Speed 2628.45 samples/sec Loss 8.5118 LearningRate 0.0390 Epoch: 7 Global Step: 311650 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:45:14,752-Speed 2626.90 samples/sec Loss 8.4128 LearningRate 0.0390 Epoch: 7 Global Step: 311660 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:45:18,714-Speed 2584.79 samples/sec Loss 8.4458 LearningRate 0.0390 Epoch: 7 Global Step: 311670 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:45:22,610-Speed 2629.13 samples/sec Loss 8.4683 LearningRate 0.0390 Epoch: 7 Global Step: 311680 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:45:26,517-Speed 2621.08 samples/sec Loss 8.4163 LearningRate 0.0390 Epoch: 7 Global Step: 311690 Fp16 Grad Scale: 2048 Required: 58 hours
Training: 2022-04-14 06:45:30,415-Speed 2628.29 samples/sec Loss 8.4282 LearningRate 0.0390 Epoch: 7 Global Step: 311700 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:45:34,318-Speed 2623.87 samples/sec Loss 8.4193 LearningRate 0.0390 Epoch: 7 Global Step: 311710 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:45:38,217-Speed 2626.90 samples/sec Loss 8.5137 LearningRate 0.0390 Epoch: 7 Global Step: 311720 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:45:42,121-Speed 2623.28 samples/sec Loss 8.3824 LearningRate 0.0390 Epoch: 7 Global Step: 311730 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:45:46,010-Speed 2634.56 samples/sec Loss 8.6685 LearningRate 0.0390 Epoch: 7 Global Step: 311740 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:45:49,909-Speed 2627.16 samples/sec Loss 8.4228 LearningRate 0.0390 Epoch: 7 Global Step: 311750 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:45:53,808-Speed 2626.82 samples/sec Loss 8.5446 LearningRate 0.0390 Epoch: 7 Global Step: 311760 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:45:57,707-Speed 2627.66 samples/sec Loss 8.3771 LearningRate 0.0390 Epoch: 7 Global Step: 311770 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:46:01,604-Speed 2628.16 samples/sec Loss 8.3709 LearningRate 0.0390 Epoch: 7 Global Step: 311780 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:46:05,505-Speed 2625.23 samples/sec Loss 8.3808 LearningRate 0.0390 Epoch: 7 Global Step: 311790 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 06:46:09,425-Speed 2612.83 samples/sec Loss 8.4147 LearningRate 0.0390 Epoch: 7 Global Step: 311800 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:46:13,554-Speed 2480.69 samples/sec Loss 8.3914 LearningRate 0.0390 Epoch: 7 Global Step: 311810 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:46:17,688-Speed 2477.33 samples/sec Loss 8.5301 LearningRate 0.0390 Epoch: 7 Global Step: 311820 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:46:21,675-Speed 2569.53 samples/sec Loss 8.3412 LearningRate 0.0390 Epoch: 7 Global Step: 311830 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:46:25,750-Speed 2512.96 samples/sec Loss 8.5161 LearningRate 0.0389 Epoch: 7 Global Step: 311840 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:46:29,694-Speed 2597.65 samples/sec Loss 8.5450 LearningRate 0.0389 Epoch: 7 Global Step: 311850 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:46:33,590-Speed 2628.63 samples/sec Loss 8.5380 LearningRate 0.0389 Epoch: 7 Global Step: 311860 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:46:37,491-Speed 2625.54 samples/sec Loss 8.4247 LearningRate 0.0389 Epoch: 7 Global Step: 311870 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:46:41,397-Speed 2622.01 samples/sec Loss 8.3437 LearningRate 0.0389 Epoch: 7 Global Step: 311880 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:46:45,299-Speed 2625.57 samples/sec Loss 8.4183 LearningRate 0.0389 Epoch: 7 Global Step: 311890 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:46:49,202-Speed 2624.14 samples/sec Loss 8.3847 LearningRate 0.0389 Epoch: 7 Global Step: 311900 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:46:53,110-Speed 2621.58 samples/sec Loss 8.6306 LearningRate 0.0389 Epoch: 7 Global Step: 311910 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:46:57,013-Speed 2624.30 samples/sec Loss 8.4912 LearningRate 0.0389 Epoch: 7 Global Step: 311920 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:47:00,930-Speed 2614.82 samples/sec Loss 8.3079 LearningRate 0.0389 Epoch: 7 Global Step: 311930 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:47:04,831-Speed 2625.45 samples/sec Loss 8.4149 LearningRate 0.0389 Epoch: 7 Global Step: 311940 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:47:08,732-Speed 2625.67 samples/sec Loss 8.4739 LearningRate 0.0389 Epoch: 7 Global Step: 311950 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:47:12,638-Speed 2622.70 samples/sec Loss 8.4635 LearningRate 0.0389 Epoch: 7 Global Step: 311960 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:47:16,541-Speed 2624.01 samples/sec Loss 8.4842 LearningRate 0.0389 Epoch: 7 Global Step: 311970 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:47:20,448-Speed 2622.05 samples/sec Loss 8.3876 LearningRate 0.0389 Epoch: 7 Global Step: 311980 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:47:24,348-Speed 2626.06 samples/sec Loss 8.5130 LearningRate 0.0389 Epoch: 7 Global Step: 311990 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:47:28,255-Speed 2621.80 samples/sec Loss 8.3572 LearningRate 0.0389 Epoch: 7 Global Step: 312000 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:47:32,162-Speed 2621.89 samples/sec Loss 8.4163 LearningRate 0.0389 Epoch: 7 Global Step: 312010 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:47:36,071-Speed 2619.55 samples/sec Loss 8.4925 LearningRate 0.0389 Epoch: 7 Global Step: 312020 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:47:39,986-Speed 2616.15 samples/sec Loss 8.4854 LearningRate 0.0389 Epoch: 7 Global Step: 312030 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:47:43,903-Speed 2615.42 samples/sec Loss 8.4175 LearningRate 0.0389 Epoch: 7 Global Step: 312040 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:47:47,815-Speed 2617.95 samples/sec Loss 8.4683 LearningRate 0.0389 Epoch: 7 Global Step: 312050 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:47:51,888-Speed 2514.90 samples/sec Loss 8.4024 LearningRate 0.0389 Epoch: 7 Global Step: 312060 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:47:55,970-Speed 2509.06 samples/sec Loss 8.2676 LearningRate 0.0389 Epoch: 7 Global Step: 312070 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:47:59,899-Speed 2607.22 samples/sec Loss 8.3406 LearningRate 0.0389 Epoch: 7 Global Step: 312080 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:48:03,809-Speed 2619.16 samples/sec Loss 8.3854 LearningRate 0.0389 Epoch: 7 Global Step: 312090 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:48:07,726-Speed 2615.16 samples/sec Loss 8.3958 LearningRate 0.0389 Epoch: 7 Global Step: 312100 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:48:11,639-Speed 2617.19 samples/sec Loss 8.4724 LearningRate 0.0389 Epoch: 7 Global Step: 312110 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:48:15,549-Speed 2620.37 samples/sec Loss 8.4581 LearningRate 0.0389 Epoch: 7 Global Step: 312120 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:48:19,460-Speed 2618.35 samples/sec Loss 8.4161 LearningRate 0.0389 Epoch: 7 Global Step: 312130 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:48:23,369-Speed 2620.20 samples/sec Loss 8.3917 LearningRate 0.0389 Epoch: 7 Global Step: 312140 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:48:27,275-Speed 2622.79 samples/sec Loss 8.3975 LearningRate 0.0389 Epoch: 7 Global Step: 312150 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:48:31,175-Speed 2626.27 samples/sec Loss 8.4877 LearningRate 0.0389 Epoch: 7 Global Step: 312160 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:48:35,079-Speed 2623.66 samples/sec Loss 8.3622 LearningRate 0.0389 Epoch: 7 Global Step: 312170 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:48:39,014-Speed 2603.13 samples/sec Loss 8.5725 LearningRate 0.0389 Epoch: 7 Global Step: 312180 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:48:42,925-Speed 2618.47 samples/sec Loss 8.5494 LearningRate 0.0389 Epoch: 7 Global Step: 312190 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:48:46,836-Speed 2618.76 samples/sec Loss 8.3946 LearningRate 0.0389 Epoch: 7 Global Step: 312200 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:48:50,762-Speed 2609.37 samples/sec Loss 8.4427 LearningRate 0.0389 Epoch: 7 Global Step: 312210 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:48:54,665-Speed 2624.41 samples/sec Loss 8.4159 LearningRate 0.0389 Epoch: 7 Global Step: 312220 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:48:58,596-Speed 2606.01 samples/sec Loss 8.4530 LearningRate 0.0389 Epoch: 7 Global Step: 312230 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:49:02,528-Speed 2604.92 samples/sec Loss 8.3843 LearningRate 0.0389 Epoch: 7 Global Step: 312240 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:49:06,428-Speed 2626.55 samples/sec Loss 8.4410 LearningRate 0.0389 Epoch: 7 Global Step: 312250 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:49:10,336-Speed 2620.83 samples/sec Loss 8.4254 LearningRate 0.0389 Epoch: 7 Global Step: 312260 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:49:14,238-Speed 2624.78 samples/sec Loss 8.4932 LearningRate 0.0389 Epoch: 7 Global Step: 312270 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:49:18,140-Speed 2625.09 samples/sec Loss 8.4168 LearningRate 0.0389 Epoch: 7 Global Step: 312280 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:49:22,056-Speed 2615.53 samples/sec Loss 8.4995 LearningRate 0.0389 Epoch: 7 Global Step: 312290 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:49:25,966-Speed 2619.91 samples/sec Loss 8.4804 LearningRate 0.0389 Epoch: 7 Global Step: 312300 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:49:29,852-Speed 2635.70 samples/sec Loss 8.5981 LearningRate 0.0389 Epoch: 7 Global Step: 312310 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:49:33,765-Speed 2617.68 samples/sec Loss 8.4015 LearningRate 0.0389 Epoch: 7 Global Step: 312320 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:49:37,661-Speed 2629.19 samples/sec Loss 8.4235 LearningRate 0.0389 Epoch: 7 Global Step: 312330 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:49:41,564-Speed 2624.53 samples/sec Loss 8.4754 LearningRate 0.0389 Epoch: 7 Global Step: 312340 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:49:45,474-Speed 2619.26 samples/sec Loss 8.5315 LearningRate 0.0389 Epoch: 7 Global Step: 312350 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:49:49,388-Speed 2616.58 samples/sec Loss 8.2924 LearningRate 0.0389 Epoch: 7 Global Step: 312360 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:49:53,294-Speed 2622.66 samples/sec Loss 8.3392 LearningRate 0.0389 Epoch: 7 Global Step: 312370 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:49:57,200-Speed 2622.58 samples/sec Loss 8.4595 LearningRate 0.0389 Epoch: 7 Global Step: 312380 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:50:01,106-Speed 2622.35 samples/sec Loss 8.5479 LearningRate 0.0389 Epoch: 7 Global Step: 312390 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:50:05,019-Speed 2617.66 samples/sec Loss 8.5559 LearningRate 0.0389 Epoch: 7 Global Step: 312400 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:50:08,927-Speed 2621.45 samples/sec Loss 8.4579 LearningRate 0.0389 Epoch: 7 Global Step: 312410 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:50:12,829-Speed 2624.44 samples/sec Loss 8.2464 LearningRate 0.0389 Epoch: 7 Global Step: 312420 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:50:16,734-Speed 2622.79 samples/sec Loss 8.3115 LearningRate 0.0389 Epoch: 7 Global Step: 312430 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:50:20,633-Speed 2627.39 samples/sec Loss 8.4618 LearningRate 0.0389 Epoch: 7 Global Step: 312440 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:50:24,534-Speed 2625.92 samples/sec Loss 8.5139 LearningRate 0.0389 Epoch: 7 Global Step: 312450 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:50:28,432-Speed 2627.22 samples/sec Loss 8.4921 LearningRate 0.0389 Epoch: 7 Global Step: 312460 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:50:32,335-Speed 2624.53 samples/sec Loss 8.4759 LearningRate 0.0389 Epoch: 7 Global Step: 312470 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:50:36,244-Speed 2619.96 samples/sec Loss 8.4726 LearningRate 0.0389 Epoch: 7 Global Step: 312480 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:50:40,154-Speed 2619.60 samples/sec Loss 8.4313 LearningRate 0.0389 Epoch: 7 Global Step: 312490 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:50:44,061-Speed 2621.87 samples/sec Loss 8.3497 LearningRate 0.0389 Epoch: 7 Global Step: 312500 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:50:47,953-Speed 2631.89 samples/sec Loss 8.4629 LearningRate 0.0388 Epoch: 7 Global Step: 312510 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:50:51,848-Speed 2629.06 samples/sec Loss 8.4168 LearningRate 0.0388 Epoch: 7 Global Step: 312520 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:50:55,761-Speed 2617.98 samples/sec Loss 8.4600 LearningRate 0.0388 Epoch: 7 Global Step: 312530 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:50:59,664-Speed 2623.98 samples/sec Loss 8.3542 LearningRate 0.0388 Epoch: 7 Global Step: 312540 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:51:03,564-Speed 2626.41 samples/sec Loss 8.4968 LearningRate 0.0388 Epoch: 7 Global Step: 312550 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:51:07,463-Speed 2626.26 samples/sec Loss 8.2081 LearningRate 0.0388 Epoch: 7 Global Step: 312560 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:51:11,368-Speed 2623.71 samples/sec Loss 8.4110 LearningRate 0.0388 Epoch: 7 Global Step: 312570 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:51:15,279-Speed 2619.00 samples/sec Loss 8.3726 LearningRate 0.0388 Epoch: 7 Global Step: 312580 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:51:19,179-Speed 2625.93 samples/sec Loss 8.5513 LearningRate 0.0388 Epoch: 7 Global Step: 312590 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:51:23,091-Speed 2618.86 samples/sec Loss 8.4975 LearningRate 0.0388 Epoch: 7 Global Step: 312600 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:51:27,001-Speed 2619.42 samples/sec Loss 8.5310 LearningRate 0.0388 Epoch: 7 Global Step: 312610 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:51:30,912-Speed 2618.83 samples/sec Loss 8.2921 LearningRate 0.0388 Epoch: 7 Global Step: 312620 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:51:34,819-Speed 2621.84 samples/sec Loss 8.4409 LearningRate 0.0388 Epoch: 7 Global Step: 312630 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:51:38,746-Speed 2608.09 samples/sec Loss 8.4652 LearningRate 0.0388 Epoch: 7 Global Step: 312640 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:51:42,653-Speed 2621.52 samples/sec Loss 8.5112 LearningRate 0.0388 Epoch: 7 Global Step: 312650 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:51:46,561-Speed 2621.02 samples/sec Loss 8.4297 LearningRate 0.0388 Epoch: 7 Global Step: 312660 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:51:50,470-Speed 2620.74 samples/sec Loss 8.2682 LearningRate 0.0388 Epoch: 7 Global Step: 312670 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:51:54,406-Speed 2602.48 samples/sec Loss 8.2828 LearningRate 0.0388 Epoch: 7 Global Step: 312680 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:51:58,319-Speed 2617.66 samples/sec Loss 8.4542 LearningRate 0.0388 Epoch: 7 Global Step: 312690 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:52:02,228-Speed 2619.83 samples/sec Loss 8.4604 LearningRate 0.0388 Epoch: 7 Global Step: 312700 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:52:06,119-Speed 2632.70 samples/sec Loss 8.3983 LearningRate 0.0388 Epoch: 7 Global Step: 312710 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:52:10,031-Speed 2618.05 samples/sec Loss 8.5225 LearningRate 0.0388 Epoch: 7 Global Step: 312720 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:52:14,014-Speed 2572.07 samples/sec Loss 8.5229 LearningRate 0.0388 Epoch: 7 Global Step: 312730 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:52:17,921-Speed 2621.52 samples/sec Loss 8.4133 LearningRate 0.0388 Epoch: 7 Global Step: 312740 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:52:21,850-Speed 2607.03 samples/sec Loss 8.3631 LearningRate 0.0388 Epoch: 7 Global Step: 312750 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:52:25,925-Speed 2513.35 samples/sec Loss 8.3578 LearningRate 0.0388 Epoch: 7 Global Step: 312760 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:52:29,939-Speed 2551.92 samples/sec Loss 8.3996 LearningRate 0.0388 Epoch: 7 Global Step: 312770 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:52:33,842-Speed 2624.06 samples/sec Loss 8.4446 LearningRate 0.0388 Epoch: 7 Global Step: 312780 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:52:37,748-Speed 2622.46 samples/sec Loss 8.4536 LearningRate 0.0388 Epoch: 7 Global Step: 312790 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:52:41,659-Speed 2618.94 samples/sec Loss 8.3795 LearningRate 0.0388 Epoch: 7 Global Step: 312800 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:52:45,586-Speed 2608.22 samples/sec Loss 8.3635 LearningRate 0.0388 Epoch: 7 Global Step: 312810 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:52:49,468-Speed 2638.60 samples/sec Loss 8.3704 LearningRate 0.0388 Epoch: 7 Global Step: 312820 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:52:53,368-Speed 2626.44 samples/sec Loss 8.3452 LearningRate 0.0388 Epoch: 7 Global Step: 312830 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:52:57,269-Speed 2625.43 samples/sec Loss 8.5296 LearningRate 0.0388 Epoch: 7 Global Step: 312840 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:53:01,168-Speed 2627.83 samples/sec Loss 8.4627 LearningRate 0.0388 Epoch: 7 Global Step: 312850 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:53:05,074-Speed 2621.91 samples/sec Loss 8.3356 LearningRate 0.0388 Epoch: 7 Global Step: 312860 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:53:08,976-Speed 2625.16 samples/sec Loss 8.4248 LearningRate 0.0388 Epoch: 7 Global Step: 312870 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:53:12,897-Speed 2611.90 samples/sec Loss 8.3113 LearningRate 0.0388 Epoch: 7 Global Step: 312880 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:53:16,980-Speed 2508.48 samples/sec Loss 8.3428 LearningRate 0.0388 Epoch: 7 Global Step: 312890 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:53:20,983-Speed 2559.20 samples/sec Loss 8.5260 LearningRate 0.0388 Epoch: 7 Global Step: 312900 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:53:24,887-Speed 2623.24 samples/sec Loss 8.5183 LearningRate 0.0388 Epoch: 7 Global Step: 312910 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:53:28,793-Speed 2622.72 samples/sec Loss 8.3623 LearningRate 0.0388 Epoch: 7 Global Step: 312920 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:53:32,697-Speed 2623.54 samples/sec Loss 8.3593 LearningRate 0.0388 Epoch: 7 Global Step: 312930 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:53:36,636-Speed 2600.62 samples/sec Loss 8.4198 LearningRate 0.0388 Epoch: 7 Global Step: 312940 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:53:40,483-Speed 2662.57 samples/sec Loss 8.9214 LearningRate 0.0388 Epoch: 7 Global Step: 312950 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:53:44,387-Speed 2623.78 samples/sec Loss 9.0122 LearningRate 0.0388 Epoch: 7 Global Step: 312960 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:53:48,290-Speed 2624.09 samples/sec Loss 8.7122 LearningRate 0.0388 Epoch: 7 Global Step: 312970 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:53:52,192-Speed 2628.70 samples/sec Loss 8.5815 LearningRate 0.0388 Epoch: 7 Global Step: 312980 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:53:56,089-Speed 2628.30 samples/sec Loss 8.4867 LearningRate 0.0388 Epoch: 7 Global Step: 312990 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:53:59,985-Speed 2629.25 samples/sec Loss 8.5565 LearningRate 0.0388 Epoch: 7 Global Step: 313000 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:54:03,882-Speed 2628.43 samples/sec Loss 8.4591 LearningRate 0.0388 Epoch: 7 Global Step: 313010 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:54:07,779-Speed 2628.40 samples/sec Loss 8.3508 LearningRate 0.0388 Epoch: 7 Global Step: 313020 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:54:11,678-Speed 2626.32 samples/sec Loss 8.3312 LearningRate 0.0388 Epoch: 7 Global Step: 313030 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:54:15,574-Speed 2629.39 samples/sec Loss 8.4270 LearningRate 0.0388 Epoch: 7 Global Step: 313040 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 06:54:19,472-Speed 2627.49 samples/sec Loss 8.4652 LearningRate 0.0388 Epoch: 7 Global Step: 313050 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:54:23,370-Speed 2628.03 samples/sec Loss 8.4812 LearningRate 0.0388 Epoch: 7 Global Step: 313060 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:54:27,297-Speed 2608.14 samples/sec Loss 8.3695 LearningRate 0.0388 Epoch: 7 Global Step: 313070 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:54:31,194-Speed 2628.99 samples/sec Loss 8.5941 LearningRate 0.0388 Epoch: 7 Global Step: 313080 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:54:35,093-Speed 2627.40 samples/sec Loss 8.3987 LearningRate 0.0388 Epoch: 7 Global Step: 313090 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:54:38,995-Speed 2625.39 samples/sec Loss 8.3289 LearningRate 0.0388 Epoch: 7 Global Step: 313100 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:54:42,890-Speed 2629.01 samples/sec Loss 8.4923 LearningRate 0.0388 Epoch: 7 Global Step: 313110 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:54:46,787-Speed 2628.35 samples/sec Loss 8.3501 LearningRate 0.0388 Epoch: 7 Global Step: 313120 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:54:50,685-Speed 2627.95 samples/sec Loss 8.5258 LearningRate 0.0388 Epoch: 7 Global Step: 313130 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:54:54,581-Speed 2628.90 samples/sec Loss 8.3088 LearningRate 0.0388 Epoch: 7 Global Step: 313140 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 06:54:58,476-Speed 2629.35 samples/sec Loss 8.4711 LearningRate 0.0388 Epoch: 7 Global Step: 313150 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:55:02,373-Speed 2628.33 samples/sec Loss 8.3861 LearningRate 0.0388 Epoch: 7 Global Step: 313160 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:55:06,269-Speed 2629.17 samples/sec Loss 8.5660 LearningRate 0.0388 Epoch: 7 Global Step: 313170 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:55:10,181-Speed 2618.28 samples/sec Loss 8.2951 LearningRate 0.0387 Epoch: 7 Global Step: 313180 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:55:14,076-Speed 2629.18 samples/sec Loss 8.3155 LearningRate 0.0387 Epoch: 7 Global Step: 313190 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:55:17,977-Speed 2625.55 samples/sec Loss 8.2870 LearningRate 0.0387 Epoch: 7 Global Step: 313200 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:55:21,877-Speed 2626.42 samples/sec Loss 8.5498 LearningRate 0.0387 Epoch: 7 Global Step: 313210 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:55:25,788-Speed 2619.32 samples/sec Loss 8.5426 LearningRate 0.0387 Epoch: 7 Global Step: 313220 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:55:29,697-Speed 2620.12 samples/sec Loss 8.4436 LearningRate 0.0387 Epoch: 7 Global Step: 313230 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:55:33,611-Speed 2616.89 samples/sec Loss 8.4365 LearningRate 0.0387 Epoch: 7 Global Step: 313240 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 06:55:37,531-Speed 2612.64 samples/sec Loss 8.2447 LearningRate 0.0387 Epoch: 7 Global Step: 313250 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:55:41,444-Speed 2617.78 samples/sec Loss 8.4537 LearningRate 0.0387 Epoch: 7 Global Step: 313260 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:55:45,345-Speed 2625.49 samples/sec Loss 8.4907 LearningRate 0.0387 Epoch: 7 Global Step: 313270 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:55:49,251-Speed 2622.49 samples/sec Loss 8.3773 LearningRate 0.0387 Epoch: 7 Global Step: 313280 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:55:53,151-Speed 2626.33 samples/sec Loss 8.4008 LearningRate 0.0387 Epoch: 7 Global Step: 313290 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:55:57,050-Speed 2627.20 samples/sec Loss 8.2501 LearningRate 0.0387 Epoch: 7 Global Step: 313300 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:56:00,963-Speed 2617.85 samples/sec Loss 8.3963 LearningRate 0.0387 Epoch: 7 Global Step: 313310 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:56:04,896-Speed 2604.03 samples/sec Loss 8.5668 LearningRate 0.0387 Epoch: 7 Global Step: 313320 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:56:08,793-Speed 2628.66 samples/sec Loss 8.2914 LearningRate 0.0387 Epoch: 7 Global Step: 313330 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:56:12,695-Speed 2625.08 samples/sec Loss 8.4025 LearningRate 0.0387 Epoch: 7 Global Step: 313340 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 06:56:16,594-Speed 2626.58 samples/sec Loss 8.3324 LearningRate 0.0387 Epoch: 7 Global Step: 313350 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:56:20,494-Speed 2626.15 samples/sec Loss 8.3138 LearningRate 0.0387 Epoch: 7 Global Step: 313360 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:56:24,398-Speed 2624.20 samples/sec Loss 8.4370 LearningRate 0.0387 Epoch: 7 Global Step: 313370 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:56:28,301-Speed 2623.63 samples/sec Loss 8.4275 LearningRate 0.0387 Epoch: 7 Global Step: 313380 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:56:32,216-Speed 2621.21 samples/sec Loss 8.4522 LearningRate 0.0387 Epoch: 7 Global Step: 313390 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:56:36,133-Speed 2615.28 samples/sec Loss 8.5992 LearningRate 0.0387 Epoch: 7 Global Step: 313400 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:56:40,040-Speed 2621.51 samples/sec Loss 8.3160 LearningRate 0.0387 Epoch: 7 Global Step: 313410 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:56:43,946-Speed 2622.26 samples/sec Loss 8.4191 LearningRate 0.0387 Epoch: 7 Global Step: 313420 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:56:47,855-Speed 2620.82 samples/sec Loss 8.4298 LearningRate 0.0387 Epoch: 7 Global Step: 313430 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:56:51,756-Speed 2625.61 samples/sec Loss 8.3636 LearningRate 0.0387 Epoch: 7 Global Step: 313440 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:56:55,638-Speed 2638.21 samples/sec Loss 8.4477 LearningRate 0.0387 Epoch: 7 Global Step: 313450 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:56:59,691-Speed 2527.02 samples/sec Loss 8.3152 LearningRate 0.0387 Epoch: 7 Global Step: 313460 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:57:03,779-Speed 2506.06 samples/sec Loss 8.4328 LearningRate 0.0387 Epoch: 7 Global Step: 313470 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:57:07,680-Speed 2626.43 samples/sec Loss 8.3860 LearningRate 0.0387 Epoch: 7 Global Step: 313480 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:57:11,575-Speed 2628.93 samples/sec Loss 8.3482 LearningRate 0.0387 Epoch: 7 Global Step: 313490 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:57:15,479-Speed 2624.28 samples/sec Loss 8.3845 LearningRate 0.0387 Epoch: 7 Global Step: 313500 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:57:19,373-Speed 2629.75 samples/sec Loss 8.2739 LearningRate 0.0387 Epoch: 7 Global Step: 313510 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:57:23,366-Speed 2565.29 samples/sec Loss 8.3508 LearningRate 0.0387 Epoch: 7 Global Step: 313520 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:57:27,361-Speed 2563.78 samples/sec Loss 8.4475 LearningRate 0.0387 Epoch: 7 Global Step: 313530 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:57:31,260-Speed 2627.43 samples/sec Loss 8.4503 LearningRate 0.0387 Epoch: 7 Global Step: 313540 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:57:35,156-Speed 2628.44 samples/sec Loss 8.2536 LearningRate 0.0387 Epoch: 7 Global Step: 313550 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:57:39,053-Speed 2628.01 samples/sec Loss 8.4307 LearningRate 0.0387 Epoch: 7 Global Step: 313560 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:57:42,967-Speed 2617.40 samples/sec Loss 8.5069 LearningRate 0.0387 Epoch: 7 Global Step: 313570 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:57:46,856-Speed 2633.94 samples/sec Loss 8.4516 LearningRate 0.0387 Epoch: 7 Global Step: 313580 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:57:50,752-Speed 2628.56 samples/sec Loss 8.4846 LearningRate 0.0387 Epoch: 7 Global Step: 313590 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:57:54,665-Speed 2617.68 samples/sec Loss 8.4165 LearningRate 0.0387 Epoch: 7 Global Step: 313600 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:57:58,564-Speed 2627.49 samples/sec Loss 8.3312 LearningRate 0.0387 Epoch: 7 Global Step: 313610 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:58:02,466-Speed 2625.22 samples/sec Loss 8.4209 LearningRate 0.0387 Epoch: 7 Global Step: 313620 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:58:06,366-Speed 2626.06 samples/sec Loss 8.2358 LearningRate 0.0387 Epoch: 7 Global Step: 313630 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:58:10,262-Speed 2628.67 samples/sec Loss 8.4108 LearningRate 0.0387 Epoch: 7 Global Step: 313640 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:58:14,160-Speed 2628.07 samples/sec Loss 8.5222 LearningRate 0.0387 Epoch: 7 Global Step: 313650 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:58:18,056-Speed 2628.63 samples/sec Loss 8.4529 LearningRate 0.0387 Epoch: 7 Global Step: 313660 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:58:21,953-Speed 2628.26 samples/sec Loss 8.3541 LearningRate 0.0387 Epoch: 7 Global Step: 313670 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:58:25,848-Speed 2629.64 samples/sec Loss 8.3057 LearningRate 0.0387 Epoch: 7 Global Step: 313680 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:58:29,726-Speed 2641.32 samples/sec Loss 8.3187 LearningRate 0.0387 Epoch: 7 Global Step: 313690 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:58:33,626-Speed 2626.70 samples/sec Loss 8.5005 LearningRate 0.0387 Epoch: 7 Global Step: 313700 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:58:37,525-Speed 2626.83 samples/sec Loss 8.3177 LearningRate 0.0387 Epoch: 7 Global Step: 313710 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:58:41,420-Speed 2629.22 samples/sec Loss 8.3100 LearningRate 0.0387 Epoch: 7 Global Step: 313720 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:58:45,315-Speed 2629.66 samples/sec Loss 8.4117 LearningRate 0.0387 Epoch: 7 Global Step: 313730 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:58:49,211-Speed 2628.67 samples/sec Loss 8.2567 LearningRate 0.0387 Epoch: 7 Global Step: 313740 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:58:53,113-Speed 2625.54 samples/sec Loss 8.4380 LearningRate 0.0387 Epoch: 7 Global Step: 313750 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:58:57,008-Speed 2628.98 samples/sec Loss 8.4623 LearningRate 0.0387 Epoch: 7 Global Step: 313760 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:59:00,910-Speed 2625.72 samples/sec Loss 8.3529 LearningRate 0.0387 Epoch: 7 Global Step: 313770 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:59:04,810-Speed 2626.18 samples/sec Loss 8.2957 LearningRate 0.0387 Epoch: 7 Global Step: 313780 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:59:08,706-Speed 2628.94 samples/sec Loss 8.3447 LearningRate 0.0387 Epoch: 7 Global Step: 313790 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:59:12,589-Speed 2637.51 samples/sec Loss 8.4049 LearningRate 0.0387 Epoch: 7 Global Step: 313800 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:59:16,518-Speed 2606.96 samples/sec Loss 8.3787 LearningRate 0.0387 Epoch: 7 Global Step: 313810 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:59:20,442-Speed 2610.38 samples/sec Loss 8.4359 LearningRate 0.0387 Epoch: 7 Global Step: 313820 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:59:24,409-Speed 2582.03 samples/sec Loss 8.4365 LearningRate 0.0387 Epoch: 7 Global Step: 313830 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:59:28,301-Speed 2631.46 samples/sec Loss 8.4283 LearningRate 0.0386 Epoch: 7 Global Step: 313840 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:59:32,196-Speed 2630.38 samples/sec Loss 8.2462 LearningRate 0.0386 Epoch: 7 Global Step: 313850 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:59:36,102-Speed 2622.44 samples/sec Loss 8.4215 LearningRate 0.0386 Epoch: 7 Global Step: 313860 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:59:40,021-Speed 2613.18 samples/sec Loss 8.3696 LearningRate 0.0386 Epoch: 7 Global Step: 313870 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:59:43,920-Speed 2627.05 samples/sec Loss 8.2472 LearningRate 0.0386 Epoch: 7 Global Step: 313880 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:59:47,813-Speed 2631.38 samples/sec Loss 8.2705 LearningRate 0.0386 Epoch: 7 Global Step: 313890 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:59:51,707-Speed 2630.33 samples/sec Loss 8.2812 LearningRate 0.0386 Epoch: 7 Global Step: 313900 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 06:59:55,584-Speed 2641.45 samples/sec Loss 8.5181 LearningRate 0.0386 Epoch: 7 Global Step: 313910 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 06:59:59,454-Speed 2647.40 samples/sec Loss 8.3667 LearningRate 0.0386 Epoch: 7 Global Step: 313920 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:00:03,310-Speed 2655.77 samples/sec Loss 9.1675 LearningRate 0.0386 Epoch: 7 Global Step: 313930 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:00:07,202-Speed 2631.56 samples/sec Loss 8.4503 LearningRate 0.0386 Epoch: 7 Global Step: 313940 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:00:11,099-Speed 2628.23 samples/sec Loss 8.2998 LearningRate 0.0386 Epoch: 7 Global Step: 313950 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:00:14,998-Speed 2627.08 samples/sec Loss 8.4337 LearningRate 0.0386 Epoch: 7 Global Step: 313960 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:00:18,902-Speed 2624.27 samples/sec Loss 8.4363 LearningRate 0.0386 Epoch: 7 Global Step: 313970 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:00:22,802-Speed 2626.45 samples/sec Loss 8.3871 LearningRate 0.0386 Epoch: 7 Global Step: 313980 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:00:26,725-Speed 2610.81 samples/sec Loss 8.3781 LearningRate 0.0386 Epoch: 7 Global Step: 313990 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:00:30,634-Speed 2620.04 samples/sec Loss 8.3533 LearningRate 0.0386 Epoch: 7 Global Step: 314000 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:00:34,535-Speed 2626.19 samples/sec Loss 8.4515 LearningRate 0.0386 Epoch: 7 Global Step: 314010 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:00:38,431-Speed 2629.08 samples/sec Loss 8.3631 LearningRate 0.0386 Epoch: 7 Global Step: 314020 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:00:42,438-Speed 2555.96 samples/sec Loss 8.4662 LearningRate 0.0386 Epoch: 7 Global Step: 314030 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:00:46,346-Speed 2621.27 samples/sec Loss 8.4242 LearningRate 0.0386 Epoch: 7 Global Step: 314040 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:00:50,238-Speed 2631.57 samples/sec Loss 8.2071 LearningRate 0.0386 Epoch: 7 Global Step: 314050 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:00:54,139-Speed 2625.21 samples/sec Loss 8.4033 LearningRate 0.0386 Epoch: 7 Global Step: 314060 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:00:58,045-Speed 2622.95 samples/sec Loss 8.3031 LearningRate 0.0386 Epoch: 7 Global Step: 314070 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:01:01,980-Speed 2603.42 samples/sec Loss 8.3817 LearningRate 0.0386 Epoch: 7 Global Step: 314080 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:01:05,885-Speed 2622.48 samples/sec Loss 8.4643 LearningRate 0.0386 Epoch: 7 Global Step: 314090 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:01:09,801-Speed 2615.49 samples/sec Loss 8.3311 LearningRate 0.0386 Epoch: 7 Global Step: 314100 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:01:13,710-Speed 2620.47 samples/sec Loss 8.4259 LearningRate 0.0386 Epoch: 7 Global Step: 314110 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:01:17,609-Speed 2627.23 samples/sec Loss 8.3462 LearningRate 0.0386 Epoch: 7 Global Step: 314120 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:01:21,509-Speed 2625.99 samples/sec Loss 8.3164 LearningRate 0.0386 Epoch: 7 Global Step: 314130 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:01:25,406-Speed 2628.72 samples/sec Loss 8.3957 LearningRate 0.0386 Epoch: 7 Global Step: 314140 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:01:29,299-Speed 2630.68 samples/sec Loss 8.4108 LearningRate 0.0386 Epoch: 7 Global Step: 314150 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:01:33,204-Speed 2622.90 samples/sec Loss 8.1881 LearningRate 0.0386 Epoch: 7 Global Step: 314160 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:01:37,108-Speed 2623.53 samples/sec Loss 8.3484 LearningRate 0.0386 Epoch: 7 Global Step: 314170 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:01:41,003-Speed 2630.26 samples/sec Loss 8.3335 LearningRate 0.0386 Epoch: 7 Global Step: 314180 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:01:44,898-Speed 2629.75 samples/sec Loss 8.4306 LearningRate 0.0386 Epoch: 7 Global Step: 314190 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:01:48,804-Speed 2622.52 samples/sec Loss 8.3914 LearningRate 0.0386 Epoch: 7 Global Step: 314200 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:01:52,708-Speed 2623.21 samples/sec Loss 8.3754 LearningRate 0.0386 Epoch: 7 Global Step: 314210 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:01:56,615-Speed 2621.82 samples/sec Loss 8.4052 LearningRate 0.0386 Epoch: 7 Global Step: 314220 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:02:00,523-Speed 2620.62 samples/sec Loss 8.2964 LearningRate 0.0386 Epoch: 7 Global Step: 314230 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:02:04,426-Speed 2625.10 samples/sec Loss 8.2416 LearningRate 0.0386 Epoch: 7 Global Step: 314240 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:02:08,327-Speed 2625.43 samples/sec Loss 8.3019 LearningRate 0.0386 Epoch: 7 Global Step: 314250 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:02:12,232-Speed 2627.94 samples/sec Loss 8.2565 LearningRate 0.0386 Epoch: 7 Global Step: 314260 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:02:16,134-Speed 2624.91 samples/sec Loss 8.4917 LearningRate 0.0386 Epoch: 7 Global Step: 314270 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:02:20,041-Speed 2621.39 samples/sec Loss 8.4745 LearningRate 0.0386 Epoch: 7 Global Step: 314280 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:02:23,954-Speed 2617.23 samples/sec Loss 8.3453 LearningRate 0.0386 Epoch: 7 Global Step: 314290 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:02:27,858-Speed 2624.64 samples/sec Loss 8.3809 LearningRate 0.0386 Epoch: 7 Global Step: 314300 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:02:31,762-Speed 2623.63 samples/sec Loss 8.3749 LearningRate 0.0386 Epoch: 7 Global Step: 314310 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:02:35,712-Speed 2593.10 samples/sec Loss 8.4046 LearningRate 0.0386 Epoch: 7 Global Step: 314320 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:02:39,612-Speed 2626.37 samples/sec Loss 8.4971 LearningRate 0.0386 Epoch: 7 Global Step: 314330 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:02:43,511-Speed 2627.03 samples/sec Loss 8.3650 LearningRate 0.0386 Epoch: 7 Global Step: 314340 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:02:47,412-Speed 2625.64 samples/sec Loss 8.2364 LearningRate 0.0386 Epoch: 7 Global Step: 314350 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:02:51,317-Speed 2622.78 samples/sec Loss 8.4136 LearningRate 0.0386 Epoch: 7 Global Step: 314360 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:02:55,219-Speed 2625.46 samples/sec Loss 8.4568 LearningRate 0.0386 Epoch: 7 Global Step: 314370 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:02:59,143-Speed 2609.89 samples/sec Loss 8.3637 LearningRate 0.0386 Epoch: 7 Global Step: 314380 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:03:03,052-Speed 2620.82 samples/sec Loss 8.3933 LearningRate 0.0386 Epoch: 7 Global Step: 314390 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:03:06,938-Speed 2635.63 samples/sec Loss 8.4170 LearningRate 0.0386 Epoch: 7 Global Step: 314400 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:03:10,837-Speed 2626.53 samples/sec Loss 8.5058 LearningRate 0.0386 Epoch: 7 Global Step: 314410 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:03:14,739-Speed 2625.40 samples/sec Loss 8.3713 LearningRate 0.0386 Epoch: 7 Global Step: 314420 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:03:18,646-Speed 2621.29 samples/sec Loss 8.3983 LearningRate 0.0386 Epoch: 7 Global Step: 314430 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:03:22,673-Speed 2543.57 samples/sec Loss 8.3091 LearningRate 0.0386 Epoch: 7 Global Step: 314440 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:03:26,579-Speed 2622.21 samples/sec Loss 8.3159 LearningRate 0.0386 Epoch: 7 Global Step: 314450 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:03:30,485-Speed 2622.58 samples/sec Loss 8.4077 LearningRate 0.0386 Epoch: 7 Global Step: 314460 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:03:34,385-Speed 2626.39 samples/sec Loss 8.4487 LearningRate 0.0386 Epoch: 7 Global Step: 314470 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:03:38,282-Speed 2628.19 samples/sec Loss 8.3613 LearningRate 0.0386 Epoch: 7 Global Step: 314480 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:03:42,178-Speed 2629.06 samples/sec Loss 8.4535 LearningRate 0.0386 Epoch: 7 Global Step: 314490 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:03:46,082-Speed 2623.44 samples/sec Loss 8.3490 LearningRate 0.0386 Epoch: 7 Global Step: 314500 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:03:49,973-Speed 2632.50 samples/sec Loss 8.3504 LearningRate 0.0385 Epoch: 7 Global Step: 314510 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:03:53,905-Speed 2606.35 samples/sec Loss 8.3939 LearningRate 0.0385 Epoch: 7 Global Step: 314520 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:03:57,801-Speed 2628.98 samples/sec Loss 8.2606 LearningRate 0.0385 Epoch: 7 Global Step: 314530 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:04:01,698-Speed 2628.76 samples/sec Loss 8.4327 LearningRate 0.0385 Epoch: 7 Global Step: 314540 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:04:05,581-Speed 2637.39 samples/sec Loss 8.4017 LearningRate 0.0385 Epoch: 7 Global Step: 314550 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:04:09,407-Speed 2676.85 samples/sec Loss 10.9358 LearningRate 0.0385 Epoch: 7 Global Step: 314560 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 07:04:13,298-Speed 2632.76 samples/sec Loss 10.0894 LearningRate 0.0385 Epoch: 7 Global Step: 314570 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 07:04:17,204-Speed 2622.42 samples/sec Loss 8.9212 LearningRate 0.0385 Epoch: 7 Global Step: 314580 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 07:04:21,101-Speed 2628.07 samples/sec Loss 8.5450 LearningRate 0.0385 Epoch: 7 Global Step: 314590 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 07:04:25,008-Speed 2621.59 samples/sec Loss 8.5014 LearningRate 0.0385 Epoch: 7 Global Step: 314600 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 07:04:28,900-Speed 2631.87 samples/sec Loss 8.5322 LearningRate 0.0385 Epoch: 7 Global Step: 314610 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 07:04:32,788-Speed 2634.66 samples/sec Loss 8.4170 LearningRate 0.0385 Epoch: 7 Global Step: 314620 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 07:04:36,688-Speed 2626.48 samples/sec Loss 8.3995 LearningRate 0.0385 Epoch: 7 Global Step: 314630 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 07:04:40,585-Speed 2627.85 samples/sec Loss 8.2608 LearningRate 0.0385 Epoch: 7 Global Step: 314640 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 07:04:44,477-Speed 2631.74 samples/sec Loss 8.4068 LearningRate 0.0385 Epoch: 7 Global Step: 314650 Fp16 Grad Scale: 4096 Required: 58 hours
Training: 2022-04-14 07:04:48,368-Speed 2632.04 samples/sec Loss 8.4015 LearningRate 0.0385 Epoch: 7 Global Step: 314660 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:04:52,265-Speed 2629.30 samples/sec Loss 8.2650 LearningRate 0.0385 Epoch: 7 Global Step: 314670 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:04:56,159-Speed 2629.73 samples/sec Loss 8.4644 LearningRate 0.0385 Epoch: 7 Global Step: 314680 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:05:00,073-Speed 2617.52 samples/sec Loss 8.3841 LearningRate 0.0385 Epoch: 7 Global Step: 314690 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:05:03,985-Speed 2618.16 samples/sec Loss 8.3682 LearningRate 0.0385 Epoch: 7 Global Step: 314700 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:05:07,886-Speed 2625.78 samples/sec Loss 8.3898 LearningRate 0.0385 Epoch: 7 Global Step: 314710 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:05:11,782-Speed 2628.91 samples/sec Loss 8.3686 LearningRate 0.0385 Epoch: 7 Global Step: 314720 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:05:15,715-Speed 2604.20 samples/sec Loss 8.2676 LearningRate 0.0385 Epoch: 7 Global Step: 314730 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:05:19,614-Speed 2626.97 samples/sec Loss 8.2747 LearningRate 0.0385 Epoch: 7 Global Step: 314740 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:05:23,509-Speed 2630.30 samples/sec Loss 8.3930 LearningRate 0.0385 Epoch: 7 Global Step: 314750 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:05:27,401-Speed 2630.95 samples/sec Loss 8.3653 LearningRate 0.0385 Epoch: 7 Global Step: 314760 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:05:31,307-Speed 2622.61 samples/sec Loss 8.3738 LearningRate 0.0385 Epoch: 7 Global Step: 314770 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:05:35,199-Speed 2631.78 samples/sec Loss 8.4193 LearningRate 0.0385 Epoch: 7 Global Step: 314780 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:05:39,162-Speed 2584.45 samples/sec Loss 8.2151 LearningRate 0.0385 Epoch: 7 Global Step: 314790 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:05:43,074-Speed 2618.18 samples/sec Loss 8.3393 LearningRate 0.0385 Epoch: 7 Global Step: 314800 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:05:46,986-Speed 2619.04 samples/sec Loss 8.3031 LearningRate 0.0385 Epoch: 7 Global Step: 314810 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:05:50,891-Speed 2623.12 samples/sec Loss 8.4423 LearningRate 0.0385 Epoch: 7 Global Step: 314820 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:05:54,802-Speed 2619.08 samples/sec Loss 8.3461 LearningRate 0.0385 Epoch: 7 Global Step: 314830 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:05:58,713-Speed 2618.95 samples/sec Loss 8.3117 LearningRate 0.0385 Epoch: 7 Global Step: 314840 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:06:02,606-Speed 2631.26 samples/sec Loss 8.4429 LearningRate 0.0385 Epoch: 7 Global Step: 314850 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:06:06,514-Speed 2620.94 samples/sec Loss 8.3235 LearningRate 0.0385 Epoch: 7 Global Step: 314860 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:06:10,410-Speed 2628.77 samples/sec Loss 8.4373 LearningRate 0.0385 Epoch: 7 Global Step: 314870 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:06:14,300-Speed 2633.26 samples/sec Loss 8.3953 LearningRate 0.0385 Epoch: 7 Global Step: 314880 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:06:18,194-Speed 2629.99 samples/sec Loss 8.6395 LearningRate 0.0385 Epoch: 7 Global Step: 314890 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:06:22,089-Speed 2630.08 samples/sec Loss 8.3625 LearningRate 0.0385 Epoch: 7 Global Step: 314900 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:06:25,983-Speed 2629.92 samples/sec Loss 8.4191 LearningRate 0.0385 Epoch: 7 Global Step: 314910 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:06:29,879-Speed 2629.64 samples/sec Loss 8.3623 LearningRate 0.0385 Epoch: 7 Global Step: 314920 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:06:33,775-Speed 2628.39 samples/sec Loss 8.3656 LearningRate 0.0385 Epoch: 7 Global Step: 314930 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:06:37,666-Speed 2633.09 samples/sec Loss 8.3665 LearningRate 0.0385 Epoch: 7 Global Step: 314940 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:06:41,558-Speed 2631.39 samples/sec Loss 8.4483 LearningRate 0.0385 Epoch: 7 Global Step: 314950 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:06:45,453-Speed 2629.83 samples/sec Loss 8.2097 LearningRate 0.0385 Epoch: 7 Global Step: 314960 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:06:49,349-Speed 2629.10 samples/sec Loss 8.3446 LearningRate 0.0385 Epoch: 7 Global Step: 314970 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:06:53,255-Speed 2628.79 samples/sec Loss 8.1831 LearningRate 0.0385 Epoch: 7 Global Step: 314980 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:06:57,154-Speed 2626.46 samples/sec Loss 8.3842 LearningRate 0.0385 Epoch: 7 Global Step: 314990 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:07:01,051-Speed 2628.15 samples/sec Loss 8.3206 LearningRate 0.0385 Epoch: 7 Global Step: 315000 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:07:04,949-Speed 2627.44 samples/sec Loss 8.3986 LearningRate 0.0385 Epoch: 7 Global Step: 315010 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:07:08,844-Speed 2629.65 samples/sec Loss 8.4205 LearningRate 0.0385 Epoch: 7 Global Step: 315020 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:07:12,745-Speed 2626.39 samples/sec Loss 8.2620 LearningRate 0.0385 Epoch: 7 Global Step: 315030 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:07:16,656-Speed 2618.85 samples/sec Loss 8.3603 LearningRate 0.0385 Epoch: 7 Global Step: 315040 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:07:20,562-Speed 2622.55 samples/sec Loss 8.3709 LearningRate 0.0385 Epoch: 7 Global Step: 315050 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:07:24,459-Speed 2627.82 samples/sec Loss 8.3655 LearningRate 0.0385 Epoch: 7 Global Step: 315060 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:07:28,358-Speed 2626.51 samples/sec Loss 8.2804 LearningRate 0.0385 Epoch: 7 Global Step: 315070 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:07:32,257-Speed 2627.25 samples/sec Loss 8.3233 LearningRate 0.0385 Epoch: 7 Global Step: 315080 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:07:36,166-Speed 2620.01 samples/sec Loss 8.4489 LearningRate 0.0385 Epoch: 7 Global Step: 315090 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:07:40,063-Speed 2628.30 samples/sec Loss 8.3803 LearningRate 0.0385 Epoch: 7 Global Step: 315100 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:07:43,968-Speed 2623.34 samples/sec Loss 8.2017 LearningRate 0.0385 Epoch: 7 Global Step: 315110 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:07:47,868-Speed 2625.97 samples/sec Loss 8.3602 LearningRate 0.0385 Epoch: 7 Global Step: 315120 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:07:51,792-Speed 2610.87 samples/sec Loss 8.2454 LearningRate 0.0385 Epoch: 7 Global Step: 315130 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:07:55,686-Speed 2630.11 samples/sec Loss 8.3337 LearningRate 0.0385 Epoch: 7 Global Step: 315140 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:07:59,570-Speed 2637.53 samples/sec Loss 8.3603 LearningRate 0.0385 Epoch: 7 Global Step: 315150 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:08:03,472-Speed 2624.70 samples/sec Loss 8.3915 LearningRate 0.0385 Epoch: 7 Global Step: 315160 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:08:07,368-Speed 2628.82 samples/sec Loss 8.4410 LearningRate 0.0385 Epoch: 7 Global Step: 315170 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:08:11,283-Speed 2616.38 samples/sec Loss 8.2686 LearningRate 0.0384 Epoch: 7 Global Step: 315180 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:08:15,184-Speed 2625.61 samples/sec Loss 8.3024 LearningRate 0.0384 Epoch: 7 Global Step: 315190 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:08:19,090-Speed 2622.60 samples/sec Loss 8.3673 LearningRate 0.0384 Epoch: 7 Global Step: 315200 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:08:23,024-Speed 2603.56 samples/sec Loss 8.3669 LearningRate 0.0384 Epoch: 7 Global Step: 315210 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:08:26,926-Speed 2624.95 samples/sec Loss 8.5385 LearningRate 0.0384 Epoch: 7 Global Step: 315220 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:08:30,845-Speed 2613.70 samples/sec Loss 8.4053 LearningRate 0.0384 Epoch: 7 Global Step: 315230 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:08:34,790-Speed 2596.20 samples/sec Loss 8.4083 LearningRate 0.0384 Epoch: 7 Global Step: 315240 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:08:38,887-Speed 2500.03 samples/sec Loss 8.2749 LearningRate 0.0384 Epoch: 7 Global Step: 315250 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:08:42,853-Speed 2583.06 samples/sec Loss 8.2867 LearningRate 0.0384 Epoch: 7 Global Step: 315260 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:08:46,760-Speed 2621.04 samples/sec Loss 8.3032 LearningRate 0.0384 Epoch: 7 Global Step: 315270 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:08:50,661-Speed 2626.10 samples/sec Loss 8.2271 LearningRate 0.0384 Epoch: 7 Global Step: 315280 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:08:54,559-Speed 2627.39 samples/sec Loss 8.2975 LearningRate 0.0384 Epoch: 7 Global Step: 315290 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:08:58,457-Speed 2628.02 samples/sec Loss 8.3992 LearningRate 0.0384 Epoch: 7 Global Step: 315300 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:09:02,360-Speed 2623.82 samples/sec Loss 8.3974 LearningRate 0.0384 Epoch: 7 Global Step: 315310 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:09:06,274-Speed 2622.59 samples/sec Loss 8.3009 LearningRate 0.0384 Epoch: 7 Global Step: 315320 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:09:10,177-Speed 2623.97 samples/sec Loss 8.1871 LearningRate 0.0384 Epoch: 7 Global Step: 315330 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:09:14,074-Speed 2628.05 samples/sec Loss 8.2599 LearningRate 0.0384 Epoch: 7 Global Step: 315340 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:09:17,955-Speed 2639.22 samples/sec Loss 8.3150 LearningRate 0.0384 Epoch: 7 Global Step: 315350 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:09:21,832-Speed 2642.06 samples/sec Loss 8.3305 LearningRate 0.0384 Epoch: 7 Global Step: 315360 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:09:25,724-Speed 2632.17 samples/sec Loss 8.5007 LearningRate 0.0384 Epoch: 7 Global Step: 315370 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:09:29,614-Speed 2632.85 samples/sec Loss 8.3357 LearningRate 0.0384 Epoch: 7 Global Step: 315380 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:09:33,544-Speed 2605.88 samples/sec Loss 8.3776 LearningRate 0.0384 Epoch: 7 Global Step: 315390 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:09:37,444-Speed 2626.78 samples/sec Loss 8.3510 LearningRate 0.0384 Epoch: 7 Global Step: 315400 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:09:41,344-Speed 2626.26 samples/sec Loss 8.3112 LearningRate 0.0384 Epoch: 7 Global Step: 315410 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:09:45,238-Speed 2630.58 samples/sec Loss 8.2726 LearningRate 0.0384 Epoch: 7 Global Step: 315420 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:09:49,172-Speed 2603.52 samples/sec Loss 8.3312 LearningRate 0.0384 Epoch: 7 Global Step: 315430 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:09:53,066-Speed 2630.42 samples/sec Loss 8.3031 LearningRate 0.0384 Epoch: 7 Global Step: 315440 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:09:56,957-Speed 2632.40 samples/sec Loss 8.3532 LearningRate 0.0384 Epoch: 7 Global Step: 315450 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:10:00,852-Speed 2629.68 samples/sec Loss 8.2955 LearningRate 0.0384 Epoch: 7 Global Step: 315460 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:10:04,745-Speed 2631.12 samples/sec Loss 8.3887 LearningRate 0.0384 Epoch: 7 Global Step: 315470 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:10:08,639-Speed 2630.36 samples/sec Loss 8.3419 LearningRate 0.0384 Epoch: 7 Global Step: 315480 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:10:12,532-Speed 2631.03 samples/sec Loss 8.3016 LearningRate 0.0384 Epoch: 7 Global Step: 315490 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:10:16,426-Speed 2630.30 samples/sec Loss 8.2490 LearningRate 0.0384 Epoch: 7 Global Step: 315500 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:10:20,353-Speed 2608.61 samples/sec Loss 8.4333 LearningRate 0.0384 Epoch: 7 Global Step: 315510 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:10:24,245-Speed 2631.50 samples/sec Loss 8.3962 LearningRate 0.0384 Epoch: 7 Global Step: 315520 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:10:28,142-Speed 2628.26 samples/sec Loss 8.4777 LearningRate 0.0384 Epoch: 7 Global Step: 315530 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:10:32,030-Speed 2634.15 samples/sec Loss 8.3865 LearningRate 0.0384 Epoch: 7 Global Step: 315540 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:10:35,924-Speed 2630.62 samples/sec Loss 8.4350 LearningRate 0.0384 Epoch: 7 Global Step: 315550 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:10:39,831-Speed 2621.27 samples/sec Loss 8.2778 LearningRate 0.0384 Epoch: 7 Global Step: 315560 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:10:43,704-Speed 2644.72 samples/sec Loss 8.3109 LearningRate 0.0384 Epoch: 7 Global Step: 315570 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:10:47,601-Speed 2628.49 samples/sec Loss 8.3337 LearningRate 0.0384 Epoch: 7 Global Step: 315580 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:10:51,493-Speed 2631.65 samples/sec Loss 8.4512 LearningRate 0.0384 Epoch: 7 Global Step: 315590 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:10:55,408-Speed 2616.79 samples/sec Loss 8.3714 LearningRate 0.0384 Epoch: 7 Global Step: 315600 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:10:59,322-Speed 2616.77 samples/sec Loss 8.3660 LearningRate 0.0384 Epoch: 7 Global Step: 315610 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:11:03,214-Speed 2631.33 samples/sec Loss 8.4619 LearningRate 0.0384 Epoch: 7 Global Step: 315620 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:11:07,123-Speed 2620.50 samples/sec Loss 8.3971 LearningRate 0.0384 Epoch: 7 Global Step: 315630 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:11:11,027-Speed 2623.66 samples/sec Loss 8.3782 LearningRate 0.0384 Epoch: 7 Global Step: 315640 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:11:14,932-Speed 2623.01 samples/sec Loss 8.4921 LearningRate 0.0384 Epoch: 7 Global Step: 315650 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:11:18,830-Speed 2627.64 samples/sec Loss 8.3439 LearningRate 0.0384 Epoch: 7 Global Step: 315660 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:11:22,734-Speed 2623.64 samples/sec Loss 8.2835 LearningRate 0.0384 Epoch: 7 Global Step: 315670 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:11:26,622-Speed 2634.63 samples/sec Loss 8.4404 LearningRate 0.0384 Epoch: 7 Global Step: 315680 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:11:30,519-Speed 2628.49 samples/sec Loss 8.4559 LearningRate 0.0384 Epoch: 7 Global Step: 315690 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:11:34,413-Speed 2629.89 samples/sec Loss 8.5016 LearningRate 0.0384 Epoch: 7 Global Step: 315700 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:11:38,313-Speed 2626.16 samples/sec Loss 8.2981 LearningRate 0.0384 Epoch: 7 Global Step: 315710 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:11:42,242-Speed 2607.54 samples/sec Loss 8.4897 LearningRate 0.0384 Epoch: 7 Global Step: 315720 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:11:46,141-Speed 2627.01 samples/sec Loss 8.3149 LearningRate 0.0384 Epoch: 7 Global Step: 315730 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:11:50,040-Speed 2627.29 samples/sec Loss 8.4341 LearningRate 0.0384 Epoch: 7 Global Step: 315740 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:11:53,935-Speed 2629.10 samples/sec Loss 8.4287 LearningRate 0.0384 Epoch: 7 Global Step: 315750 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:11:57,834-Speed 2627.34 samples/sec Loss 8.2658 LearningRate 0.0384 Epoch: 7 Global Step: 315760 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:12:01,729-Speed 2630.12 samples/sec Loss 8.3934 LearningRate 0.0384 Epoch: 7 Global Step: 315770 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:12:05,608-Speed 2640.20 samples/sec Loss 8.2978 LearningRate 0.0384 Epoch: 7 Global Step: 315780 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:12:09,507-Speed 2626.28 samples/sec Loss 8.4732 LearningRate 0.0384 Epoch: 7 Global Step: 315790 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:12:13,401-Speed 2630.90 samples/sec Loss 8.3351 LearningRate 0.0384 Epoch: 7 Global Step: 315800 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:12:17,298-Speed 2628.47 samples/sec Loss 8.4606 LearningRate 0.0384 Epoch: 7 Global Step: 315810 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:12:21,200-Speed 2624.90 samples/sec Loss 8.2258 LearningRate 0.0384 Epoch: 7 Global Step: 315820 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:12:25,100-Speed 2626.45 samples/sec Loss 8.4059 LearningRate 0.0384 Epoch: 7 Global Step: 315830 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:12:29,005-Speed 2622.69 samples/sec Loss 8.2423 LearningRate 0.0384 Epoch: 7 Global Step: 315840 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:12:32,925-Speed 2613.04 samples/sec Loss 8.4077 LearningRate 0.0383 Epoch: 7 Global Step: 315850 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:12:36,823-Speed 2627.91 samples/sec Loss 8.3731 LearningRate 0.0383 Epoch: 7 Global Step: 315860 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:12:40,719-Speed 2628.90 samples/sec Loss 8.3717 LearningRate 0.0383 Epoch: 7 Global Step: 315870 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:12:44,609-Speed 2633.15 samples/sec Loss 8.4484 LearningRate 0.0383 Epoch: 7 Global Step: 315880 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:12:48,499-Speed 2633.40 samples/sec Loss 8.3794 LearningRate 0.0383 Epoch: 7 Global Step: 315890 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:12:52,399-Speed 2626.41 samples/sec Loss 8.3619 LearningRate 0.0383 Epoch: 7 Global Step: 315900 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:12:56,298-Speed 2626.80 samples/sec Loss 8.4211 LearningRate 0.0383 Epoch: 7 Global Step: 315910 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:13:00,198-Speed 2626.33 samples/sec Loss 8.2379 LearningRate 0.0383 Epoch: 7 Global Step: 315920 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:13:04,103-Speed 2622.82 samples/sec Loss 8.3679 LearningRate 0.0383 Epoch: 7 Global Step: 315930 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:13:08,002-Speed 2627.42 samples/sec Loss 8.3061 LearningRate 0.0383 Epoch: 7 Global Step: 315940 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:13:11,899-Speed 2628.10 samples/sec Loss 8.2791 LearningRate 0.0383 Epoch: 7 Global Step: 315950 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:13:15,792-Speed 2631.08 samples/sec Loss 8.4487 LearningRate 0.0383 Epoch: 7 Global Step: 315960 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:13:19,687-Speed 2629.57 samples/sec Loss 8.3806 LearningRate 0.0383 Epoch: 7 Global Step: 315970 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:13:23,584-Speed 2628.45 samples/sec Loss 8.3376 LearningRate 0.0383 Epoch: 7 Global Step: 315980 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:13:27,478-Speed 2630.74 samples/sec Loss 8.4977 LearningRate 0.0383 Epoch: 7 Global Step: 315990 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:13:31,374-Speed 2628.82 samples/sec Loss 8.3822 LearningRate 0.0383 Epoch: 7 Global Step: 316000 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:13:35,273-Speed 2626.60 samples/sec Loss 8.2379 LearningRate 0.0383 Epoch: 7 Global Step: 316010 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:13:39,176-Speed 2624.84 samples/sec Loss 8.3776 LearningRate 0.0383 Epoch: 7 Global Step: 316020 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:13:43,071-Speed 2629.96 samples/sec Loss 8.2459 LearningRate 0.0383 Epoch: 7 Global Step: 316030 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:13:46,971-Speed 2626.01 samples/sec Loss 8.2738 LearningRate 0.0383 Epoch: 7 Global Step: 316040 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:13:50,875-Speed 2624.21 samples/sec Loss 8.2280 LearningRate 0.0383 Epoch: 7 Global Step: 316050 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:13:54,760-Speed 2636.52 samples/sec Loss 8.4276 LearningRate 0.0383 Epoch: 7 Global Step: 316060 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:13:58,652-Speed 2631.30 samples/sec Loss 8.3382 LearningRate 0.0383 Epoch: 7 Global Step: 316070 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:14:02,524-Speed 2645.35 samples/sec Loss 8.4526 LearningRate 0.0383 Epoch: 7 Global Step: 316080 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:14:06,419-Speed 2629.37 samples/sec Loss 8.6118 LearningRate 0.0383 Epoch: 7 Global Step: 316090 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:14:10,326-Speed 2622.31 samples/sec Loss 8.3476 LearningRate 0.0383 Epoch: 7 Global Step: 316100 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:14:14,319-Speed 2564.90 samples/sec Loss 8.3417 LearningRate 0.0383 Epoch: 7 Global Step: 316110 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:14:18,419-Speed 2498.40 samples/sec Loss 8.4076 LearningRate 0.0383 Epoch: 7 Global Step: 316120 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:14:22,514-Speed 2501.19 samples/sec Loss 8.2674 LearningRate 0.0383 Epoch: 7 Global Step: 316130 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:14:26,612-Speed 2499.35 samples/sec Loss 8.2755 LearningRate 0.0383 Epoch: 7 Global Step: 316140 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:14:30,563-Speed 2592.39 samples/sec Loss 8.5051 LearningRate 0.0383 Epoch: 7 Global Step: 316150 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:14:34,471-Speed 2621.09 samples/sec Loss 8.2054 LearningRate 0.0383 Epoch: 7 Global Step: 316160 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:14:38,369-Speed 2627.68 samples/sec Loss 8.4073 LearningRate 0.0383 Epoch: 7 Global Step: 316170 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:14:42,274-Speed 2622.76 samples/sec Loss 8.4042 LearningRate 0.0383 Epoch: 7 Global Step: 316180 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:14:46,197-Speed 2610.48 samples/sec Loss 8.3752 LearningRate 0.0383 Epoch: 7 Global Step: 316190 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:14:50,117-Speed 2613.09 samples/sec Loss 8.3694 LearningRate 0.0383 Epoch: 7 Global Step: 316200 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:14:54,025-Speed 2621.43 samples/sec Loss 8.4069 LearningRate 0.0383 Epoch: 7 Global Step: 316210 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:14:57,928-Speed 2624.12 samples/sec Loss 8.3380 LearningRate 0.0383 Epoch: 7 Global Step: 316220 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:15:01,827-Speed 2626.90 samples/sec Loss 8.2653 LearningRate 0.0383 Epoch: 7 Global Step: 316230 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:15:05,740-Speed 2616.79 samples/sec Loss 8.3647 LearningRate 0.0383 Epoch: 7 Global Step: 316240 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:15:09,658-Speed 2614.14 samples/sec Loss 8.3239 LearningRate 0.0383 Epoch: 7 Global Step: 316250 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:15:13,538-Speed 2639.95 samples/sec Loss 8.3292 LearningRate 0.0383 Epoch: 7 Global Step: 316260 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:15:17,403-Speed 2649.92 samples/sec Loss 8.5933 LearningRate 0.0383 Epoch: 7 Global Step: 316270 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:15:21,315-Speed 2620.71 samples/sec Loss 8.3474 LearningRate 0.0383 Epoch: 7 Global Step: 316280 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:15:25,216-Speed 2625.77 samples/sec Loss 8.5153 LearningRate 0.0383 Epoch: 7 Global Step: 316290 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:15:29,123-Speed 2621.66 samples/sec Loss 8.2504 LearningRate 0.0383 Epoch: 7 Global Step: 316300 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:15:33,039-Speed 2615.31 samples/sec Loss 8.3669 LearningRate 0.0383 Epoch: 7 Global Step: 316310 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:15:36,978-Speed 2600.16 samples/sec Loss 8.4641 LearningRate 0.0383 Epoch: 7 Global Step: 316320 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:15:40,878-Speed 2626.11 samples/sec Loss 8.4283 LearningRate 0.0383 Epoch: 7 Global Step: 316330 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:15:44,788-Speed 2619.62 samples/sec Loss 8.2604 LearningRate 0.0383 Epoch: 7 Global Step: 316340 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:15:48,695-Speed 2621.90 samples/sec Loss 8.2818 LearningRate 0.0383 Epoch: 7 Global Step: 316350 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:15:52,601-Speed 2622.41 samples/sec Loss 8.4958 LearningRate 0.0383 Epoch: 7 Global Step: 316360 Fp16 Grad Scale: 8192 Required: 58 hours
Training: 2022-04-14 07:15:56,515-Speed 2616.79 samples/sec Loss 8.2361 LearningRate 0.0383 Epoch: 7 Global Step: 316370 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:16:00,427-Speed 2618.79 samples/sec Loss 8.4765 LearningRate 0.0383 Epoch: 7 Global Step: 316380 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:16:04,332-Speed 2622.67 samples/sec Loss 8.2564 LearningRate 0.0383 Epoch: 7 Global Step: 316390 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:16:08,250-Speed 2613.73 samples/sec Loss 8.4094 LearningRate 0.0383 Epoch: 7 Global Step: 316400 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:16:12,176-Speed 2609.33 samples/sec Loss 8.3332 LearningRate 0.0383 Epoch: 7 Global Step: 316410 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:16:16,077-Speed 2625.42 samples/sec Loss 8.4835 LearningRate 0.0383 Epoch: 7 Global Step: 316420 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:16:19,986-Speed 2621.06 samples/sec Loss 8.3861 LearningRate 0.0383 Epoch: 7 Global Step: 316430 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:16:23,903-Speed 2614.78 samples/sec Loss 8.2589 LearningRate 0.0383 Epoch: 7 Global Step: 316440 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:16:27,818-Speed 2616.71 samples/sec Loss 8.3335 LearningRate 0.0383 Epoch: 7 Global Step: 316450 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:16:31,722-Speed 2623.13 samples/sec Loss 8.2649 LearningRate 0.0383 Epoch: 7 Global Step: 316460 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:16:35,625-Speed 2624.23 samples/sec Loss 8.4021 LearningRate 0.0383 Epoch: 7 Global Step: 316470 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:16:39,528-Speed 2623.98 samples/sec Loss 8.3896 LearningRate 0.0383 Epoch: 7 Global Step: 316480 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:16:43,429-Speed 2625.74 samples/sec Loss 8.2875 LearningRate 0.0383 Epoch: 7 Global Step: 316490 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:16:47,335-Speed 2622.16 samples/sec Loss 8.3795 LearningRate 0.0383 Epoch: 7 Global Step: 316500 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:16:51,231-Speed 2629.69 samples/sec Loss 8.3749 LearningRate 0.0383 Epoch: 7 Global Step: 316510 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:16:55,172-Speed 2598.56 samples/sec Loss 8.2715 LearningRate 0.0382 Epoch: 7 Global Step: 316520 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:16:59,082-Speed 2619.86 samples/sec Loss 8.3240 LearningRate 0.0382 Epoch: 7 Global Step: 316530 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:17:02,990-Speed 2621.02 samples/sec Loss 8.3757 LearningRate 0.0382 Epoch: 7 Global Step: 316540 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:17:06,889-Speed 2626.74 samples/sec Loss 8.3690 LearningRate 0.0382 Epoch: 7 Global Step: 316550 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:17:10,791-Speed 2624.72 samples/sec Loss 8.4462 LearningRate 0.0382 Epoch: 7 Global Step: 316560 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:17:14,689-Speed 2628.34 samples/sec Loss 8.3005 LearningRate 0.0382 Epoch: 7 Global Step: 316570 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:17:18,602-Speed 2618.27 samples/sec Loss 8.4581 LearningRate 0.0382 Epoch: 7 Global Step: 316580 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:17:22,626-Speed 2545.10 samples/sec Loss 8.2945 LearningRate 0.0382 Epoch: 7 Global Step: 316590 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:17:26,544-Speed 2614.92 samples/sec Loss 8.3021 LearningRate 0.0382 Epoch: 7 Global Step: 316600 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:17:30,512-Speed 2580.97 samples/sec Loss 8.5026 LearningRate 0.0382 Epoch: 7 Global Step: 316610 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:17:34,462-Speed 2593.36 samples/sec Loss 8.3314 LearningRate 0.0382 Epoch: 7 Global Step: 316620 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:17:38,365-Speed 2624.38 samples/sec Loss 8.1932 LearningRate 0.0382 Epoch: 7 Global Step: 316630 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:17:42,262-Speed 2628.04 samples/sec Loss 8.4220 LearningRate 0.0382 Epoch: 7 Global Step: 316640 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:17:46,156-Speed 2630.31 samples/sec Loss 8.3645 LearningRate 0.0382 Epoch: 7 Global Step: 316650 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:17:50,056-Speed 2626.42 samples/sec Loss 8.2322 LearningRate 0.0382 Epoch: 7 Global Step: 316660 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:17:53,950-Speed 2629.84 samples/sec Loss 8.3090 LearningRate 0.0382 Epoch: 7 Global Step: 316670 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:17:57,857-Speed 2622.53 samples/sec Loss 8.3968 LearningRate 0.0382 Epoch: 7 Global Step: 316680 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:18:01,755-Speed 2627.44 samples/sec Loss 8.4082 LearningRate 0.0382 Epoch: 7 Global Step: 316690 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:18:05,661-Speed 2622.31 samples/sec Loss 8.2836 LearningRate 0.0382 Epoch: 7 Global Step: 316700 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:18:09,559-Speed 2627.42 samples/sec Loss 8.3800 LearningRate 0.0382 Epoch: 7 Global Step: 316710 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:18:13,456-Speed 2628.40 samples/sec Loss 8.3999 LearningRate 0.0382 Epoch: 7 Global Step: 316720 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:18:17,348-Speed 2631.37 samples/sec Loss 8.2995 LearningRate 0.0382 Epoch: 7 Global Step: 316730 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:18:21,242-Speed 2630.25 samples/sec Loss 8.3235 LearningRate 0.0382 Epoch: 7 Global Step: 316740 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:18:25,133-Speed 2631.68 samples/sec Loss 8.2379 LearningRate 0.0382 Epoch: 7 Global Step: 316750 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:18:29,033-Speed 2626.98 samples/sec Loss 8.2681 LearningRate 0.0382 Epoch: 7 Global Step: 316760 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:18:32,926-Speed 2631.05 samples/sec Loss 8.3652 LearningRate 0.0382 Epoch: 7 Global Step: 316770 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:18:36,819-Speed 2631.20 samples/sec Loss 8.2482 LearningRate 0.0382 Epoch: 7 Global Step: 316780 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:18:40,712-Speed 2630.97 samples/sec Loss 8.3293 LearningRate 0.0382 Epoch: 7 Global Step: 316790 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:18:44,638-Speed 2608.96 samples/sec Loss 8.2703 LearningRate 0.0382 Epoch: 7 Global Step: 316800 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:18:48,530-Speed 2631.69 samples/sec Loss 8.2896 LearningRate 0.0382 Epoch: 7 Global Step: 316810 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:18:52,424-Speed 2630.60 samples/sec Loss 8.3874 LearningRate 0.0382 Epoch: 7 Global Step: 316820 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:18:56,323-Speed 2627.03 samples/sec Loss 8.3049 LearningRate 0.0382 Epoch: 7 Global Step: 316830 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:19:00,221-Speed 2627.54 samples/sec Loss 8.3062 LearningRate 0.0382 Epoch: 7 Global Step: 316840 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:19:04,117-Speed 2628.92 samples/sec Loss 8.2319 LearningRate 0.0382 Epoch: 7 Global Step: 316850 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:19:08,014-Speed 2628.72 samples/sec Loss 8.2918 LearningRate 0.0382 Epoch: 7 Global Step: 316860 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:19:11,917-Speed 2624.02 samples/sec Loss 8.3587 LearningRate 0.0382 Epoch: 7 Global Step: 316870 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:19:15,810-Speed 2630.79 samples/sec Loss 8.3640 LearningRate 0.0382 Epoch: 7 Global Step: 316880 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:19:19,704-Speed 2630.38 samples/sec Loss 8.4525 LearningRate 0.0382 Epoch: 7 Global Step: 316890 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:19:23,605-Speed 2626.15 samples/sec Loss 8.3394 LearningRate 0.0382 Epoch: 7 Global Step: 316900 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:19:27,500-Speed 2629.29 samples/sec Loss 8.3917 LearningRate 0.0382 Epoch: 7 Global Step: 316910 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:19:31,395-Speed 2629.51 samples/sec Loss 8.2647 LearningRate 0.0382 Epoch: 7 Global Step: 316920 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:19:35,275-Speed 2639.73 samples/sec Loss 8.2563 LearningRate 0.0382 Epoch: 7 Global Step: 316930 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:19:39,200-Speed 2610.20 samples/sec Loss 8.3150 LearningRate 0.0382 Epoch: 7 Global Step: 316940 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:19:43,092-Speed 2632.02 samples/sec Loss 8.3565 LearningRate 0.0382 Epoch: 7 Global Step: 316950 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:19:46,986-Speed 2629.96 samples/sec Loss 8.3834 LearningRate 0.0382 Epoch: 7 Global Step: 316960 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:19:50,879-Speed 2631.13 samples/sec Loss 8.3781 LearningRate 0.0382 Epoch: 7 Global Step: 316970 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:19:54,775-Speed 2629.08 samples/sec Loss 8.4133 LearningRate 0.0382 Epoch: 7 Global Step: 316980 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:19:58,669-Speed 2630.18 samples/sec Loss 8.4296 LearningRate 0.0382 Epoch: 7 Global Step: 316990 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:20:02,568-Speed 2627.30 samples/sec Loss 8.3389 LearningRate 0.0382 Epoch: 7 Global Step: 317000 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:20:06,472-Speed 2623.85 samples/sec Loss 8.3562 LearningRate 0.0382 Epoch: 7 Global Step: 317010 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:20:10,376-Speed 2623.54 samples/sec Loss 8.2498 LearningRate 0.0382 Epoch: 7 Global Step: 317020 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:20:14,280-Speed 2624.23 samples/sec Loss 8.3319 LearningRate 0.0382 Epoch: 7 Global Step: 317030 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:20:18,201-Speed 2612.23 samples/sec Loss 8.2558 LearningRate 0.0382 Epoch: 7 Global Step: 317040 Fp16 Grad Scale: 262144 Required: 58 hours
Training: 2022-04-14 07:20:22,072-Speed 2646.19 samples/sec Loss 8.3133 LearningRate 0.0382 Epoch: 7 Global Step: 317050 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:20:25,972-Speed 2626.42 samples/sec Loss 8.3187 LearningRate 0.0382 Epoch: 7 Global Step: 317060 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:20:29,869-Speed 2628.22 samples/sec Loss 8.3350 LearningRate 0.0382 Epoch: 7 Global Step: 317070 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:20:33,793-Speed 2610.03 samples/sec Loss 8.4008 LearningRate 0.0382 Epoch: 7 Global Step: 317080 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:20:37,685-Speed 2632.30 samples/sec Loss 8.2194 LearningRate 0.0382 Epoch: 7 Global Step: 317090 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:20:41,590-Speed 2623.11 samples/sec Loss 8.2149 LearningRate 0.0382 Epoch: 7 Global Step: 317100 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:20:45,484-Speed 2630.32 samples/sec Loss 8.2595 LearningRate 0.0382 Epoch: 7 Global Step: 317110 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:20:49,348-Speed 2650.60 samples/sec Loss 8.7182 LearningRate 0.0382 Epoch: 7 Global Step: 317120 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:20:53,252-Speed 2623.39 samples/sec Loss 8.7446 LearningRate 0.0382 Epoch: 7 Global Step: 317130 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:20:57,155-Speed 2624.37 samples/sec Loss 8.2988 LearningRate 0.0382 Epoch: 7 Global Step: 317140 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:21:01,046-Speed 2632.52 samples/sec Loss 8.3048 LearningRate 0.0382 Epoch: 7 Global Step: 317150 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:21:04,945-Speed 2626.91 samples/sec Loss 8.3540 LearningRate 0.0382 Epoch: 7 Global Step: 317160 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:21:08,849-Speed 2624.74 samples/sec Loss 8.1649 LearningRate 0.0382 Epoch: 7 Global Step: 317170 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:21:12,755-Speed 2622.00 samples/sec Loss 8.3914 LearningRate 0.0382 Epoch: 7 Global Step: 317180 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:21:16,652-Speed 2628.41 samples/sec Loss 8.3376 LearningRate 0.0381 Epoch: 7 Global Step: 317190 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:21:20,557-Speed 2623.38 samples/sec Loss 8.3392 LearningRate 0.0381 Epoch: 7 Global Step: 317200 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:21:24,456-Speed 2626.77 samples/sec Loss 8.3762 LearningRate 0.0381 Epoch: 7 Global Step: 317210 Fp16 Grad Scale: 16384 Required: 58 hours
Training: 2022-04-14 07:21:28,446-Speed 2567.56 samples/sec Loss 8.3406 LearningRate 0.0381 Epoch: 7 Global Step: 317220 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:21:32,355-Speed 2620.20 samples/sec Loss 8.3884 LearningRate 0.0381 Epoch: 7 Global Step: 317230 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:21:36,254-Speed 2627.13 samples/sec Loss 8.3121 LearningRate 0.0381 Epoch: 7 Global Step: 317240 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:21:40,158-Speed 2623.32 samples/sec Loss 8.4365 LearningRate 0.0381 Epoch: 7 Global Step: 317250 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:21:44,059-Speed 2625.27 samples/sec Loss 8.2480 LearningRate 0.0381 Epoch: 7 Global Step: 317260 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:21:47,998-Speed 2600.81 samples/sec Loss 8.3484 LearningRate 0.0381 Epoch: 7 Global Step: 317270 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:21:51,927-Speed 2606.84 samples/sec Loss 8.2808 LearningRate 0.0381 Epoch: 7 Global Step: 317280 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:21:55,855-Speed 2607.59 samples/sec Loss 8.3084 LearningRate 0.0381 Epoch: 7 Global Step: 317290 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:21:59,794-Speed 2600.84 samples/sec Loss 8.2626 LearningRate 0.0381 Epoch: 7 Global Step: 317300 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:22:03,761-Speed 2581.59 samples/sec Loss 8.2895 LearningRate 0.0381 Epoch: 7 Global Step: 317310 Fp16 Grad Scale: 32768 Required: 58 hours
Training: 2022-04-14 07:22:07,667-Speed 2622.42 samples/sec Loss 8.2610 LearningRate 0.0381 Epoch: 7 Global Step: 317320 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:22:11,580-Speed 2617.24 samples/sec Loss 8.3216 LearningRate 0.0381 Epoch: 7 Global Step: 317330 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:22:15,487-Speed 2621.85 samples/sec Loss 8.4064 LearningRate 0.0381 Epoch: 7 Global Step: 317340 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:22:19,400-Speed 2617.64 samples/sec Loss 8.2590 LearningRate 0.0381 Epoch: 7 Global Step: 317350 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:22:23,295-Speed 2629.80 samples/sec Loss 8.4323 LearningRate 0.0381 Epoch: 7 Global Step: 317360 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:22:27,209-Speed 2617.45 samples/sec Loss 8.3623 LearningRate 0.0381 Epoch: 7 Global Step: 317370 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:22:31,201-Speed 2565.48 samples/sec Loss 8.2801 LearningRate 0.0381 Epoch: 7 Global Step: 317380 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:22:35,097-Speed 2629.06 samples/sec Loss 8.2398 LearningRate 0.0381 Epoch: 7 Global Step: 317390 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:22:38,989-Speed 2631.86 samples/sec Loss 8.3546 LearningRate 0.0381 Epoch: 7 Global Step: 317400 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:22:42,881-Speed 2632.00 samples/sec Loss 8.3725 LearningRate 0.0381 Epoch: 7 Global Step: 317410 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:22:46,784-Speed 2623.49 samples/sec Loss 8.3188 LearningRate 0.0381 Epoch: 7 Global Step: 317420 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:22:50,677-Speed 2631.31 samples/sec Loss 8.2928 LearningRate 0.0381 Epoch: 7 Global Step: 317430 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:22:54,570-Speed 2631.13 samples/sec Loss 8.3401 LearningRate 0.0381 Epoch: 7 Global Step: 317440 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:22:58,463-Speed 2631.21 samples/sec Loss 8.4106 LearningRate 0.0381 Epoch: 7 Global Step: 317450 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:23:02,341-Speed 2641.23 samples/sec Loss 8.3293 LearningRate 0.0381 Epoch: 7 Global Step: 317460 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:23:06,232-Speed 2631.92 samples/sec Loss 8.3590 LearningRate 0.0381 Epoch: 7 Global Step: 317470 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:23:10,126-Speed 2630.24 samples/sec Loss 8.2964 LearningRate 0.0381 Epoch: 7 Global Step: 317480 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:23:14,074-Speed 2595.05 samples/sec Loss 8.4283 LearningRate 0.0381 Epoch: 7 Global Step: 317490 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:23:18,006-Speed 2604.90 samples/sec Loss 8.3066 LearningRate 0.0381 Epoch: 7 Global Step: 317500 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:23:21,897-Speed 2632.26 samples/sec Loss 8.2904 LearningRate 0.0381 Epoch: 7 Global Step: 317510 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:23:25,819-Speed 2611.72 samples/sec Loss 8.5070 LearningRate 0.0381 Epoch: 7 Global Step: 317520 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:23:29,715-Speed 2629.40 samples/sec Loss 8.3631 LearningRate 0.0381 Epoch: 7 Global Step: 317530 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:23:33,632-Speed 2615.00 samples/sec Loss 8.2246 LearningRate 0.0381 Epoch: 7 Global Step: 317540 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:23:37,533-Speed 2625.32 samples/sec Loss 8.4549 LearningRate 0.0381 Epoch: 7 Global Step: 317550 Fp16 Grad Scale: 65536 Required: 58 hours
Training: 2022-04-14 07:23:41,442-Speed 2620.53 samples/sec Loss 8.4589 LearningRate 0.0381 Epoch: 7 Global Step: 317560 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:23:45,357-Speed 2616.51 samples/sec Loss 8.4767 LearningRate 0.0381 Epoch: 7 Global Step: 317570 Fp16 Grad Scale: 131072 Required: 58 hours
Training: 2022-04-14 07:23:49,251-Speed 2630.28 samples/sec Loss 8.1746 LearningRate 0.0381 Epoch: 7 Global Step: 317580 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:23:53,148-Speed 2628.01 samples/sec Loss 8.3344 LearningRate 0.0381 Epoch: 7 Global Step: 317590 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:23:57,058-Speed 2619.82 samples/sec Loss 8.2970 LearningRate 0.0381 Epoch: 7 Global Step: 317600 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:24:00,943-Speed 2636.63 samples/sec Loss 8.2874 LearningRate 0.0381 Epoch: 7 Global Step: 317610 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:24:04,830-Speed 2635.24 samples/sec Loss 8.3713 LearningRate 0.0381 Epoch: 7 Global Step: 317620 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:24:08,719-Speed 2633.23 samples/sec Loss 8.3563 LearningRate 0.0381 Epoch: 7 Global Step: 317630 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:24:12,611-Speed 2631.78 samples/sec Loss 8.3975 LearningRate 0.0381 Epoch: 7 Global Step: 317640 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:24:16,507-Speed 2629.26 samples/sec Loss 8.4268 LearningRate 0.0381 Epoch: 7 Global Step: 317650 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:24:20,498-Speed 2566.86 samples/sec Loss 8.3593 LearningRate 0.0381 Epoch: 7 Global Step: 317660 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:24:24,397-Speed 2627.02 samples/sec Loss 8.1139 LearningRate 0.0381 Epoch: 7 Global Step: 317670 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:24:28,292-Speed 2629.25 samples/sec Loss 8.4648 LearningRate 0.0381 Epoch: 7 Global Step: 317680 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:24:32,186-Speed 2630.36 samples/sec Loss 8.4114 LearningRate 0.0381 Epoch: 7 Global Step: 317690 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:24:36,082-Speed 2628.72 samples/sec Loss 8.3482 LearningRate 0.0381 Epoch: 7 Global Step: 317700 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:24:39,977-Speed 2630.17 samples/sec Loss 8.1791 LearningRate 0.0381 Epoch: 7 Global Step: 317710 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:24:43,876-Speed 2627.75 samples/sec Loss 8.3384 LearningRate 0.0381 Epoch: 7 Global Step: 317720 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:24:47,787-Speed 2618.58 samples/sec Loss 8.3765 LearningRate 0.0381 Epoch: 7 Global Step: 317730 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:24:51,682-Speed 2629.84 samples/sec Loss 8.3278 LearningRate 0.0381 Epoch: 7 Global Step: 317740 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:24:55,577-Speed 2629.61 samples/sec Loss 8.3426 LearningRate 0.0381 Epoch: 7 Global Step: 317750 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:24:59,479-Speed 2624.60 samples/sec Loss 8.2069 LearningRate 0.0381 Epoch: 7 Global Step: 317760 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:25:03,395-Speed 2615.73 samples/sec Loss 8.1880 LearningRate 0.0381 Epoch: 7 Global Step: 317770 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:25:07,302-Speed 2622.12 samples/sec Loss 8.3356 LearningRate 0.0381 Epoch: 7 Global Step: 317780 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:25:11,213-Speed 2619.26 samples/sec Loss 8.2776 LearningRate 0.0381 Epoch: 7 Global Step: 317790 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:25:15,121-Speed 2620.95 samples/sec Loss 8.3743 LearningRate 0.0381 Epoch: 7 Global Step: 317800 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:25:19,040-Speed 2613.76 samples/sec Loss 8.3377 LearningRate 0.0381 Epoch: 7 Global Step: 317810 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 07:25:22,937-Speed 2628.65 samples/sec Loss 8.2363 LearningRate 0.0381 Epoch: 7 Global Step: 317820 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 07:25:26,834-Speed 2628.64 samples/sec Loss 8.3245 LearningRate 0.0381 Epoch: 7 Global Step: 317830 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 07:25:30,750-Speed 2615.03 samples/sec Loss 8.3206 LearningRate 0.0381 Epoch: 7 Global Step: 317840 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 07:25:34,647-Speed 2628.66 samples/sec Loss 8.2352 LearningRate 0.0381 Epoch: 7 Global Step: 317850 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 07:25:38,539-Speed 2631.91 samples/sec Loss 8.2326 LearningRate 0.0380 Epoch: 7 Global Step: 317860 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:25:42,465-Speed 2608.81 samples/sec Loss 8.3391 LearningRate 0.0380 Epoch: 7 Global Step: 317870 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:25:46,364-Speed 2626.44 samples/sec Loss 8.2320 LearningRate 0.0380 Epoch: 7 Global Step: 317880 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:25:50,262-Speed 2628.43 samples/sec Loss 8.4431 LearningRate 0.0380 Epoch: 7 Global Step: 317890 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:25:54,155-Speed 2631.24 samples/sec Loss 8.2717 LearningRate 0.0380 Epoch: 7 Global Step: 317900 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:25:58,053-Speed 2627.49 samples/sec Loss 8.3320 LearningRate 0.0380 Epoch: 7 Global Step: 317910 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:26:01,950-Speed 2628.41 samples/sec Loss 8.4076 LearningRate 0.0380 Epoch: 7 Global Step: 317920 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:26:05,847-Speed 2628.30 samples/sec Loss 8.3434 LearningRate 0.0380 Epoch: 7 Global Step: 317930 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:26:09,754-Speed 2621.87 samples/sec Loss 8.1882 LearningRate 0.0380 Epoch: 7 Global Step: 317940 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:26:13,682-Speed 2607.62 samples/sec Loss 8.3798 LearningRate 0.0380 Epoch: 7 Global Step: 317950 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:26:17,576-Speed 2630.29 samples/sec Loss 8.3475 LearningRate 0.0380 Epoch: 7 Global Step: 317960 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 07:26:21,455-Speed 2640.55 samples/sec Loss 8.2919 LearningRate 0.0380 Epoch: 7 Global Step: 317970 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:26:25,333-Speed 2641.07 samples/sec Loss 8.3476 LearningRate 0.0380 Epoch: 7 Global Step: 317980 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:26:29,236-Speed 2624.39 samples/sec Loss 8.2416 LearningRate 0.0380 Epoch: 7 Global Step: 317990 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:26:33,132-Speed 2629.09 samples/sec Loss 8.4061 LearningRate 0.0380 Epoch: 7 Global Step: 318000 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:26:37,035-Speed 2624.33 samples/sec Loss 8.2773 LearningRate 0.0380 Epoch: 7 Global Step: 318010 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:26:40,934-Speed 2626.83 samples/sec Loss 8.3205 LearningRate 0.0380 Epoch: 7 Global Step: 318020 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:26:44,838-Speed 2624.14 samples/sec Loss 8.3328 LearningRate 0.0380 Epoch: 7 Global Step: 318030 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:26:48,731-Speed 2631.73 samples/sec Loss 8.3357 LearningRate 0.0380 Epoch: 7 Global Step: 318040 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:26:52,630-Speed 2626.89 samples/sec Loss 8.3949 LearningRate 0.0380 Epoch: 7 Global Step: 318050 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:26:56,555-Speed 2609.37 samples/sec Loss 8.4148 LearningRate 0.0380 Epoch: 7 Global Step: 318060 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:27:00,473-Speed 2614.82 samples/sec Loss 8.3398 LearningRate 0.0380 Epoch: 7 Global Step: 318070 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:27:04,352-Speed 2640.19 samples/sec Loss 8.2788 LearningRate 0.0380 Epoch: 7 Global Step: 318080 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:27:08,260-Speed 2620.93 samples/sec Loss 8.4207 LearningRate 0.0380 Epoch: 7 Global Step: 318090 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:27:12,157-Speed 2628.50 samples/sec Loss 8.3794 LearningRate 0.0380 Epoch: 7 Global Step: 318100 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:27:16,061-Speed 2623.78 samples/sec Loss 8.4353 LearningRate 0.0380 Epoch: 7 Global Step: 318110 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:27:19,953-Speed 2631.48 samples/sec Loss 8.1908 LearningRate 0.0380 Epoch: 7 Global Step: 318120 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:27:23,846-Speed 2631.58 samples/sec Loss 8.2512 LearningRate 0.0380 Epoch: 7 Global Step: 318130 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:27:27,737-Speed 2632.67 samples/sec Loss 8.3191 LearningRate 0.0380 Epoch: 7 Global Step: 318140 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:27:31,649-Speed 2617.87 samples/sec Loss 8.2028 LearningRate 0.0380 Epoch: 7 Global Step: 318150 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:27:35,548-Speed 2627.02 samples/sec Loss 8.3369 LearningRate 0.0380 Epoch: 7 Global Step: 318160 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:27:39,441-Speed 2630.70 samples/sec Loss 8.1997 LearningRate 0.0380 Epoch: 7 Global Step: 318170 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:27:43,335-Speed 2630.89 samples/sec Loss 8.1718 LearningRate 0.0380 Epoch: 7 Global Step: 318180 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:27:47,227-Speed 2631.10 samples/sec Loss 8.2933 LearningRate 0.0380 Epoch: 7 Global Step: 318190 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:27:51,123-Speed 2629.53 samples/sec Loss 8.2264 LearningRate 0.0380 Epoch: 7 Global Step: 318200 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:27:55,023-Speed 2625.62 samples/sec Loss 8.3820 LearningRate 0.0380 Epoch: 7 Global Step: 318210 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:27:58,916-Speed 2632.02 samples/sec Loss 8.3120 LearningRate 0.0380 Epoch: 7 Global Step: 318220 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:28:02,818-Speed 2624.61 samples/sec Loss 8.2664 LearningRate 0.0380 Epoch: 7 Global Step: 318230 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:28:06,715-Speed 2627.86 samples/sec Loss 8.4670 LearningRate 0.0380 Epoch: 7 Global Step: 318240 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:28:10,621-Speed 2622.21 samples/sec Loss 8.3445 LearningRate 0.0380 Epoch: 7 Global Step: 318250 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:28:14,524-Speed 2624.34 samples/sec Loss 8.3859 LearningRate 0.0380 Epoch: 7 Global Step: 318260 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:28:18,426-Speed 2625.12 samples/sec Loss 8.2987 LearningRate 0.0380 Epoch: 7 Global Step: 318270 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:28:22,333-Speed 2621.15 samples/sec Loss 8.3711 LearningRate 0.0380 Epoch: 7 Global Step: 318280 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 07:28:26,231-Speed 2628.39 samples/sec Loss 8.2078 LearningRate 0.0380 Epoch: 7 Global Step: 318290 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:28:30,106-Speed 2642.86 samples/sec Loss 8.3456 LearningRate 0.0380 Epoch: 7 Global Step: 318300 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:28:34,019-Speed 2617.75 samples/sec Loss 8.3201 LearningRate 0.0380 Epoch: 7 Global Step: 318310 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:28:37,918-Speed 2626.50 samples/sec Loss 8.3950 LearningRate 0.0380 Epoch: 7 Global Step: 318320 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:28:41,814-Speed 2629.00 samples/sec Loss 8.2582 LearningRate 0.0380 Epoch: 7 Global Step: 318330 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:28:45,720-Speed 2622.31 samples/sec Loss 8.2810 LearningRate 0.0380 Epoch: 7 Global Step: 318340 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:28:49,594-Speed 2643.79 samples/sec Loss 8.4146 LearningRate 0.0380 Epoch: 7 Global Step: 318350 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:28:53,488-Speed 2630.10 samples/sec Loss 8.3526 LearningRate 0.0380 Epoch: 7 Global Step: 318360 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:28:57,378-Speed 2634.02 samples/sec Loss 8.3283 LearningRate 0.0380 Epoch: 7 Global Step: 318370 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:29:01,269-Speed 2631.66 samples/sec Loss 8.2501 LearningRate 0.0380 Epoch: 7 Global Step: 318380 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:29:05,212-Speed 2598.47 samples/sec Loss 8.1956 LearningRate 0.0380 Epoch: 7 Global Step: 318390 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:29:09,108-Speed 2628.69 samples/sec Loss 8.2995 LearningRate 0.0380 Epoch: 7 Global Step: 318400 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:29:13,047-Speed 2600.26 samples/sec Loss 8.3785 LearningRate 0.0380 Epoch: 7 Global Step: 318410 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:29:16,947-Speed 2626.42 samples/sec Loss 8.3167 LearningRate 0.0380 Epoch: 7 Global Step: 318420 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:29:20,842-Speed 2629.71 samples/sec Loss 8.2573 LearningRate 0.0380 Epoch: 7 Global Step: 318430 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:29:24,737-Speed 2630.62 samples/sec Loss 8.3403 LearningRate 0.0380 Epoch: 7 Global Step: 318440 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:29:28,627-Speed 2632.65 samples/sec Loss 8.2710 LearningRate 0.0380 Epoch: 7 Global Step: 318450 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:29:32,519-Speed 2631.97 samples/sec Loss 8.3257 LearningRate 0.0380 Epoch: 7 Global Step: 318460 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:29:36,412-Speed 2630.96 samples/sec Loss 8.4147 LearningRate 0.0380 Epoch: 7 Global Step: 318470 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:29:40,314-Speed 2625.06 samples/sec Loss 8.3619 LearningRate 0.0380 Epoch: 7 Global Step: 318480 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:29:44,207-Speed 2630.54 samples/sec Loss 8.3504 LearningRate 0.0380 Epoch: 7 Global Step: 318490 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:29:48,135-Speed 2608.32 samples/sec Loss 8.2606 LearningRate 0.0380 Epoch: 7 Global Step: 318500 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:29:52,030-Speed 2629.30 samples/sec Loss 8.2435 LearningRate 0.0380 Epoch: 7 Global Step: 318510 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:29:55,954-Speed 2610.91 samples/sec Loss 8.3353 LearningRate 0.0380 Epoch: 7 Global Step: 318520 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:29:59,840-Speed 2635.38 samples/sec Loss 8.3432 LearningRate 0.0379 Epoch: 7 Global Step: 318530 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:30:03,734-Speed 2630.55 samples/sec Loss 8.2462 LearningRate 0.0379 Epoch: 7 Global Step: 318540 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:30:07,629-Speed 2629.59 samples/sec Loss 8.2082 LearningRate 0.0379 Epoch: 7 Global Step: 318550 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:30:11,523-Speed 2630.47 samples/sec Loss 8.3148 LearningRate 0.0379 Epoch: 7 Global Step: 318560 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:30:15,418-Speed 2629.16 samples/sec Loss 8.3576 LearningRate 0.0379 Epoch: 7 Global Step: 318570 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:30:19,319-Speed 2626.20 samples/sec Loss 8.3091 LearningRate 0.0379 Epoch: 7 Global Step: 318580 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:30:23,220-Speed 2625.40 samples/sec Loss 8.3032 LearningRate 0.0379 Epoch: 7 Global Step: 318590 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:30:27,125-Speed 2623.17 samples/sec Loss 8.2695 LearningRate 0.0379 Epoch: 7 Global Step: 318600 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:30:31,015-Speed 2632.83 samples/sec Loss 8.2230 LearningRate 0.0379 Epoch: 7 Global Step: 318610 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:30:34,906-Speed 2632.50 samples/sec Loss 8.3716 LearningRate 0.0379 Epoch: 7 Global Step: 318620 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:30:38,801-Speed 2629.41 samples/sec Loss 8.3276 LearningRate 0.0379 Epoch: 7 Global Step: 318630 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:30:42,704-Speed 2624.18 samples/sec Loss 8.4249 LearningRate 0.0379 Epoch: 7 Global Step: 318640 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:30:46,582-Speed 2640.82 samples/sec Loss 8.4329 LearningRate 0.0379 Epoch: 7 Global Step: 318650 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:30:50,474-Speed 2631.94 samples/sec Loss 8.2633 LearningRate 0.0379 Epoch: 7 Global Step: 318660 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:30:54,407-Speed 2604.26 samples/sec Loss 8.3915 LearningRate 0.0379 Epoch: 7 Global Step: 318670 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:30:58,203-Speed 2698.26 samples/sec Loss 8.7561 LearningRate 0.0379 Epoch: 7 Global Step: 318680 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:31:02,097-Speed 2630.67 samples/sec Loss 9.6438 LearningRate 0.0379 Epoch: 7 Global Step: 318690 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:31:05,990-Speed 2630.79 samples/sec Loss 9.0493 LearningRate 0.0379 Epoch: 7 Global Step: 318700 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:31:09,878-Speed 2634.19 samples/sec Loss 8.6103 LearningRate 0.0379 Epoch: 7 Global Step: 318710 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:31:13,767-Speed 2633.43 samples/sec Loss 8.3840 LearningRate 0.0379 Epoch: 7 Global Step: 318720 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:31:17,656-Speed 2633.57 samples/sec Loss 8.2082 LearningRate 0.0379 Epoch: 7 Global Step: 318730 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:31:21,548-Speed 2631.64 samples/sec Loss 8.2505 LearningRate 0.0379 Epoch: 7 Global Step: 318740 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:31:25,443-Speed 2629.38 samples/sec Loss 8.3549 LearningRate 0.0379 Epoch: 7 Global Step: 318750 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:31:29,368-Speed 2609.51 samples/sec Loss 8.3969 LearningRate 0.0379 Epoch: 7 Global Step: 318760 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:31:33,261-Speed 2631.45 samples/sec Loss 8.2817 LearningRate 0.0379 Epoch: 7 Global Step: 318770 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:31:37,158-Speed 2627.96 samples/sec Loss 8.2085 LearningRate 0.0379 Epoch: 7 Global Step: 318780 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:31:41,049-Speed 2632.19 samples/sec Loss 8.3142 LearningRate 0.0379 Epoch: 7 Global Step: 318790 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:31:45,037-Speed 2568.30 samples/sec Loss 8.2587 LearningRate 0.0379 Epoch: 7 Global Step: 318800 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:31:48,934-Speed 2628.32 samples/sec Loss 8.2675 LearningRate 0.0379 Epoch: 7 Global Step: 318810 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:31:52,829-Speed 2630.12 samples/sec Loss 8.1560 LearningRate 0.0379 Epoch: 7 Global Step: 318820 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:31:56,728-Speed 2626.30 samples/sec Loss 8.3424 LearningRate 0.0379 Epoch: 7 Global Step: 318830 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:32:00,620-Speed 2632.12 samples/sec Loss 8.3857 LearningRate 0.0379 Epoch: 7 Global Step: 318840 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:32:04,530-Speed 2619.18 samples/sec Loss 8.1504 LearningRate 0.0379 Epoch: 7 Global Step: 318850 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:32:08,423-Speed 2630.92 samples/sec Loss 8.2625 LearningRate 0.0379 Epoch: 7 Global Step: 318860 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:32:12,313-Speed 2633.32 samples/sec Loss 8.2137 LearningRate 0.0379 Epoch: 7 Global Step: 318870 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:32:16,202-Speed 2633.47 samples/sec Loss 8.3323 LearningRate 0.0379 Epoch: 7 Global Step: 318880 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:32:20,094-Speed 2632.16 samples/sec Loss 8.3291 LearningRate 0.0379 Epoch: 7 Global Step: 318890 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:32:23,986-Speed 2631.17 samples/sec Loss 8.3216 LearningRate 0.0379 Epoch: 7 Global Step: 318900 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:32:27,880-Speed 2630.78 samples/sec Loss 8.2693 LearningRate 0.0379 Epoch: 7 Global Step: 318910 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:32:31,819-Speed 2599.81 samples/sec Loss 8.3509 LearningRate 0.0379 Epoch: 7 Global Step: 318920 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:32:35,721-Speed 2624.66 samples/sec Loss 8.2552 LearningRate 0.0379 Epoch: 7 Global Step: 318930 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:32:39,617-Speed 2628.68 samples/sec Loss 8.2491 LearningRate 0.0379 Epoch: 7 Global Step: 318940 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:32:43,517-Speed 2626.54 samples/sec Loss 8.2295 LearningRate 0.0379 Epoch: 7 Global Step: 318950 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:32:47,419-Speed 2625.03 samples/sec Loss 8.2863 LearningRate 0.0379 Epoch: 7 Global Step: 318960 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:32:51,317-Speed 2627.66 samples/sec Loss 8.2829 LearningRate 0.0379 Epoch: 7 Global Step: 318970 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:32:55,216-Speed 2627.02 samples/sec Loss 8.2950 LearningRate 0.0379 Epoch: 7 Global Step: 318980 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:32:59,114-Speed 2627.68 samples/sec Loss 8.2057 LearningRate 0.0379 Epoch: 7 Global Step: 318990 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:33:03,009-Speed 2629.19 samples/sec Loss 8.2335 LearningRate 0.0379 Epoch: 7 Global Step: 319000 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:33:06,901-Speed 2631.50 samples/sec Loss 8.1410 LearningRate 0.0379 Epoch: 7 Global Step: 319010 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:33:10,791-Speed 2633.08 samples/sec Loss 8.2797 LearningRate 0.0379 Epoch: 7 Global Step: 319020 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:33:14,686-Speed 2629.39 samples/sec Loss 8.3621 LearningRate 0.0379 Epoch: 7 Global Step: 319030 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:33:18,579-Speed 2631.41 samples/sec Loss 8.3365 LearningRate 0.0379 Epoch: 7 Global Step: 319040 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:33:22,468-Speed 2633.86 samples/sec Loss 8.3144 LearningRate 0.0379 Epoch: 7 Global Step: 319050 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:33:26,360-Speed 2631.20 samples/sec Loss 8.1555 LearningRate 0.0379 Epoch: 7 Global Step: 319060 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:33:30,252-Speed 2632.30 samples/sec Loss 8.4053 LearningRate 0.0379 Epoch: 7 Global Step: 319070 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:33:34,144-Speed 2631.47 samples/sec Loss 8.3965 LearningRate 0.0379 Epoch: 7 Global Step: 319080 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:33:38,038-Speed 2629.81 samples/sec Loss 8.2327 LearningRate 0.0379 Epoch: 7 Global Step: 319090 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:33:41,933-Speed 2629.38 samples/sec Loss 8.2876 LearningRate 0.0379 Epoch: 7 Global Step: 319100 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:33:45,830-Speed 2628.62 samples/sec Loss 8.2711 LearningRate 0.0379 Epoch: 7 Global Step: 319110 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:33:49,733-Speed 2624.56 samples/sec Loss 8.3325 LearningRate 0.0379 Epoch: 7 Global Step: 319120 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:33:53,628-Speed 2629.14 samples/sec Loss 8.3168 LearningRate 0.0379 Epoch: 7 Global Step: 319130 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:33:57,527-Speed 2627.20 samples/sec Loss 8.2303 LearningRate 0.0379 Epoch: 7 Global Step: 319140 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:34:01,416-Speed 2633.07 samples/sec Loss 8.1086 LearningRate 0.0379 Epoch: 7 Global Step: 319150 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:34:05,308-Speed 2632.19 samples/sec Loss 8.3268 LearningRate 0.0379 Epoch: 7 Global Step: 319160 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:34:09,203-Speed 2629.44 samples/sec Loss 8.3504 LearningRate 0.0379 Epoch: 7 Global Step: 319170 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:34:13,103-Speed 2626.34 samples/sec Loss 8.3301 LearningRate 0.0379 Epoch: 7 Global Step: 319180 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:34:16,997-Speed 2630.16 samples/sec Loss 8.2675 LearningRate 0.0379 Epoch: 7 Global Step: 319190 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:34:20,892-Speed 2630.07 samples/sec Loss 8.3351 LearningRate 0.0379 Epoch: 7 Global Step: 319200 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:34:24,783-Speed 2632.10 samples/sec Loss 8.2187 LearningRate 0.0378 Epoch: 7 Global Step: 319210 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:34:28,674-Speed 2632.31 samples/sec Loss 8.3020 LearningRate 0.0378 Epoch: 7 Global Step: 319220 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:34:32,568-Speed 2630.46 samples/sec Loss 8.2393 LearningRate 0.0378 Epoch: 7 Global Step: 319230 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:34:36,464-Speed 2628.59 samples/sec Loss 8.2958 LearningRate 0.0378 Epoch: 7 Global Step: 319240 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:34:40,356-Speed 2631.50 samples/sec Loss 8.2756 LearningRate 0.0378 Epoch: 7 Global Step: 319250 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:34:44,249-Speed 2631.27 samples/sec Loss 8.3031 LearningRate 0.0378 Epoch: 7 Global Step: 319260 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:34:48,162-Speed 2617.56 samples/sec Loss 8.2886 LearningRate 0.0378 Epoch: 7 Global Step: 319270 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:34:52,067-Speed 2622.85 samples/sec Loss 8.3519 LearningRate 0.0378 Epoch: 7 Global Step: 319280 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:34:56,088-Speed 2547.26 samples/sec Loss 8.3268 LearningRate 0.0378 Epoch: 7 Global Step: 319290 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:34:59,984-Speed 2628.54 samples/sec Loss 8.2736 LearningRate 0.0378 Epoch: 7 Global Step: 319300 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:35:03,877-Speed 2631.28 samples/sec Loss 8.3498 LearningRate 0.0378 Epoch: 7 Global Step: 319310 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:35:07,771-Speed 2630.16 samples/sec Loss 8.3004 LearningRate 0.0378 Epoch: 7 Global Step: 319320 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:35:11,667-Speed 2628.91 samples/sec Loss 8.2817 LearningRate 0.0378 Epoch: 7 Global Step: 319330 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:35:15,580-Speed 2617.25 samples/sec Loss 8.2975 LearningRate 0.0378 Epoch: 7 Global Step: 319340 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:35:19,475-Speed 2630.16 samples/sec Loss 8.2945 LearningRate 0.0378 Epoch: 7 Global Step: 319350 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:35:23,368-Speed 2630.58 samples/sec Loss 8.4310 LearningRate 0.0378 Epoch: 7 Global Step: 319360 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:35:27,295-Speed 2608.77 samples/sec Loss 8.2188 LearningRate 0.0378 Epoch: 7 Global Step: 319370 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:35:31,185-Speed 2632.60 samples/sec Loss 8.2707 LearningRate 0.0378 Epoch: 7 Global Step: 319380 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:35:35,076-Speed 2632.98 samples/sec Loss 8.2516 LearningRate 0.0378 Epoch: 7 Global Step: 319390 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:35:38,966-Speed 2632.57 samples/sec Loss 8.3972 LearningRate 0.0378 Epoch: 7 Global Step: 319400 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:35:42,858-Speed 2631.53 samples/sec Loss 8.2336 LearningRate 0.0378 Epoch: 7 Global Step: 319410 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:35:46,750-Speed 2631.95 samples/sec Loss 8.2421 LearningRate 0.0378 Epoch: 7 Global Step: 319420 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:35:50,654-Speed 2623.85 samples/sec Loss 8.3302 LearningRate 0.0378 Epoch: 7 Global Step: 319430 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:35:54,589-Speed 2603.13 samples/sec Loss 8.3223 LearningRate 0.0378 Epoch: 7 Global Step: 319440 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:35:58,490-Speed 2625.51 samples/sec Loss 8.2494 LearningRate 0.0378 Epoch: 7 Global Step: 319450 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:36:02,383-Speed 2631.83 samples/sec Loss 8.2340 LearningRate 0.0378 Epoch: 7 Global Step: 319460 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:36:06,274-Speed 2631.53 samples/sec Loss 8.2124 LearningRate 0.0378 Epoch: 7 Global Step: 319470 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:36:10,166-Speed 2632.05 samples/sec Loss 8.1704 LearningRate 0.0378 Epoch: 7 Global Step: 319480 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:36:14,071-Speed 2622.80 samples/sec Loss 8.1196 LearningRate 0.0378 Epoch: 7 Global Step: 319490 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:36:17,976-Speed 2622.66 samples/sec Loss 8.2366 LearningRate 0.0378 Epoch: 7 Global Step: 319500 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:36:21,868-Speed 2631.59 samples/sec Loss 8.2898 LearningRate 0.0378 Epoch: 7 Global Step: 319510 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 07:36:25,757-Speed 2633.62 samples/sec Loss 8.1793 LearningRate 0.0378 Epoch: 7 Global Step: 319520 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:36:29,649-Speed 2631.86 samples/sec Loss 8.1830 LearningRate 0.0378 Epoch: 7 Global Step: 319530 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:36:33,540-Speed 2632.18 samples/sec Loss 8.2049 LearningRate 0.0378 Epoch: 7 Global Step: 319540 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:36:37,448-Speed 2620.91 samples/sec Loss 8.2111 LearningRate 0.0378 Epoch: 7 Global Step: 319550 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:36:41,333-Speed 2636.88 samples/sec Loss 8.2671 LearningRate 0.0378 Epoch: 7 Global Step: 319560 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:36:45,224-Speed 2631.98 samples/sec Loss 8.2265 LearningRate 0.0378 Epoch: 7 Global Step: 319570 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:36:49,117-Speed 2631.11 samples/sec Loss 8.2437 LearningRate 0.0378 Epoch: 7 Global Step: 319580 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:36:53,009-Speed 2632.09 samples/sec Loss 8.2410 LearningRate 0.0378 Epoch: 7 Global Step: 319590 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:36:56,904-Speed 2629.39 samples/sec Loss 8.2983 LearningRate 0.0378 Epoch: 7 Global Step: 319600 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:37:00,796-Speed 2631.42 samples/sec Loss 8.2854 LearningRate 0.0378 Epoch: 7 Global Step: 319610 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:37:04,697-Speed 2625.45 samples/sec Loss 8.2704 LearningRate 0.0378 Epoch: 7 Global Step: 319620 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:37:08,593-Speed 2628.73 samples/sec Loss 8.2259 LearningRate 0.0378 Epoch: 7 Global Step: 319630 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:37:12,485-Speed 2631.77 samples/sec Loss 8.1561 LearningRate 0.0378 Epoch: 7 Global Step: 319640 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:37:16,378-Speed 2631.47 samples/sec Loss 8.2714 LearningRate 0.0378 Epoch: 7 Global Step: 319650 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:37:20,274-Speed 2628.61 samples/sec Loss 8.2681 LearningRate 0.0378 Epoch: 7 Global Step: 319660 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:37:24,171-Speed 2628.71 samples/sec Loss 8.2244 LearningRate 0.0378 Epoch: 7 Global Step: 319670 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:37:28,068-Speed 2627.85 samples/sec Loss 8.2056 LearningRate 0.0378 Epoch: 7 Global Step: 319680 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:37:31,962-Speed 2630.14 samples/sec Loss 8.2484 LearningRate 0.0378 Epoch: 7 Global Step: 319690 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:37:35,854-Speed 2632.00 samples/sec Loss 8.2332 LearningRate 0.0378 Epoch: 7 Global Step: 319700 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:37:39,762-Speed 2620.52 samples/sec Loss 8.2482 LearningRate 0.0378 Epoch: 7 Global Step: 319710 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:37:43,660-Speed 2627.47 samples/sec Loss 8.3118 LearningRate 0.0378 Epoch: 7 Global Step: 319720 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:37:47,550-Speed 2632.89 samples/sec Loss 8.3125 LearningRate 0.0378 Epoch: 7 Global Step: 319730 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:37:51,444-Speed 2630.53 samples/sec Loss 8.2983 LearningRate 0.0378 Epoch: 7 Global Step: 319740 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:37:55,336-Speed 2631.82 samples/sec Loss 8.2334 LearningRate 0.0378 Epoch: 7 Global Step: 319750 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:37:59,226-Speed 2633.03 samples/sec Loss 8.1663 LearningRate 0.0378 Epoch: 7 Global Step: 319760 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:38:03,117-Speed 2632.05 samples/sec Loss 8.2955 LearningRate 0.0378 Epoch: 7 Global Step: 319770 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:38:07,013-Speed 2629.13 samples/sec Loss 8.4008 LearningRate 0.0378 Epoch: 7 Global Step: 319780 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:38:10,911-Speed 2627.57 samples/sec Loss 8.2766 LearningRate 0.0378 Epoch: 7 Global Step: 319790 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:38:14,810-Speed 2627.03 samples/sec Loss 8.3339 LearningRate 0.0378 Epoch: 7 Global Step: 319800 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:38:18,702-Speed 2631.13 samples/sec Loss 8.2455 LearningRate 0.0378 Epoch: 7 Global Step: 319810 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:38:22,598-Speed 2629.24 samples/sec Loss 8.3829 LearningRate 0.0378 Epoch: 7 Global Step: 319820 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:38:26,492-Speed 2629.90 samples/sec Loss 8.3126 LearningRate 0.0378 Epoch: 7 Global Step: 319830 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:38:30,385-Speed 2631.29 samples/sec Loss 8.3227 LearningRate 0.0378 Epoch: 7 Global Step: 319840 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:38:34,293-Speed 2621.33 samples/sec Loss 8.2995 LearningRate 0.0378 Epoch: 7 Global Step: 319850 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:38:38,164-Speed 2645.63 samples/sec Loss 8.1712 LearningRate 0.0378 Epoch: 7 Global Step: 319860 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:38:42,056-Speed 2631.49 samples/sec Loss 8.3084 LearningRate 0.0378 Epoch: 7 Global Step: 319870 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:38:45,948-Speed 2632.15 samples/sec Loss 8.2234 LearningRate 0.0377 Epoch: 7 Global Step: 319880 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:38:49,837-Speed 2633.76 samples/sec Loss 8.1874 LearningRate 0.0377 Epoch: 7 Global Step: 319890 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:38:53,732-Speed 2629.45 samples/sec Loss 8.2146 LearningRate 0.0377 Epoch: 7 Global Step: 319900 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:38:57,624-Speed 2631.85 samples/sec Loss 8.2127 LearningRate 0.0377 Epoch: 7 Global Step: 319910 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:39:01,522-Speed 2627.38 samples/sec Loss 8.3027 LearningRate 0.0377 Epoch: 7 Global Step: 319920 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:39:05,415-Speed 2631.17 samples/sec Loss 8.3110 LearningRate 0.0377 Epoch: 7 Global Step: 319930 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:39:09,312-Speed 2628.10 samples/sec Loss 8.2290 LearningRate 0.0377 Epoch: 7 Global Step: 319940 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:39:13,207-Speed 2629.73 samples/sec Loss 8.2414 LearningRate 0.0377 Epoch: 7 Global Step: 319950 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:39:17,104-Speed 2628.11 samples/sec Loss 8.2287 LearningRate 0.0377 Epoch: 7 Global Step: 319960 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:39:20,999-Speed 2630.34 samples/sec Loss 8.2893 LearningRate 0.0377 Epoch: 7 Global Step: 319970 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:39:24,896-Speed 2627.82 samples/sec Loss 8.3418 LearningRate 0.0377 Epoch: 7 Global Step: 319980 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:39:28,793-Speed 2628.56 samples/sec Loss 8.3549 LearningRate 0.0377 Epoch: 7 Global Step: 319990 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:39:32,693-Speed 2626.13 samples/sec Loss 8.1529 LearningRate 0.0377 Epoch: 7 Global Step: 320000 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:40:15,731-[lfw][320000]XNorm: 23.231626
Training: 2022-04-14 07:40:15,732-[lfw][320000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-14 07:40:15,732-[lfw][320000]Accuracy-Highest: 0.99783
Training: 2022-04-14 07:41:05,862-[cfp_fp][320000]XNorm: 21.376191
Training: 2022-04-14 07:41:05,863-[cfp_fp][320000]Accuracy-Flip: 0.98443+-0.00559
Training: 2022-04-14 07:41:05,863-[cfp_fp][320000]Accuracy-Highest: 0.98671
Training: 2022-04-14 07:41:48,754-[agedb_30][320000]XNorm: 23.276776
Training: 2022-04-14 07:41:48,755-[agedb_30][320000]Accuracy-Flip: 0.97367+-0.00741
Training: 2022-04-14 07:41:48,755-[agedb_30][320000]Accuracy-Highest: 0.97567
Training: 2022-04-14 07:41:52,619-Speed 73.18 samples/sec Loss 8.2415 LearningRate 0.0377 Epoch: 7 Global Step: 320010 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:41:56,612-Speed 2564.72 samples/sec Loss 8.2733 LearningRate 0.0377 Epoch: 7 Global Step: 320020 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:42:00,486-Speed 2644.42 samples/sec Loss 8.2650 LearningRate 0.0377 Epoch: 7 Global Step: 320030 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:42:04,357-Speed 2645.42 samples/sec Loss 8.2616 LearningRate 0.0377 Epoch: 7 Global Step: 320040 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:42:08,257-Speed 2626.13 samples/sec Loss 8.2859 LearningRate 0.0377 Epoch: 7 Global Step: 320050 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:42:12,130-Speed 2644.57 samples/sec Loss 8.3468 LearningRate 0.0377 Epoch: 7 Global Step: 320060 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:42:16,004-Speed 2644.46 samples/sec Loss 8.2589 LearningRate 0.0377 Epoch: 7 Global Step: 320070 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:42:19,882-Speed 2641.17 samples/sec Loss 8.3195 LearningRate 0.0377 Epoch: 7 Global Step: 320080 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:42:23,763-Speed 2639.11 samples/sec Loss 8.1749 LearningRate 0.0377 Epoch: 7 Global Step: 320090 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:42:27,650-Speed 2634.75 samples/sec Loss 8.2900 LearningRate 0.0377 Epoch: 7 Global Step: 320100 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:42:31,536-Speed 2636.22 samples/sec Loss 8.3292 LearningRate 0.0377 Epoch: 7 Global Step: 320110 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:42:35,441-Speed 2622.51 samples/sec Loss 8.4126 LearningRate 0.0377 Epoch: 7 Global Step: 320120 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:42:39,335-Speed 2630.45 samples/sec Loss 8.3268 LearningRate 0.0377 Epoch: 7 Global Step: 320130 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:42:43,223-Speed 2633.78 samples/sec Loss 8.3719 LearningRate 0.0377 Epoch: 7 Global Step: 320140 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:42:47,112-Speed 2634.44 samples/sec Loss 8.3383 LearningRate 0.0377 Epoch: 7 Global Step: 320150 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:42:50,982-Speed 2646.20 samples/sec Loss 8.1859 LearningRate 0.0377 Epoch: 7 Global Step: 320160 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:42:54,872-Speed 2633.60 samples/sec Loss 8.2915 LearningRate 0.0377 Epoch: 7 Global Step: 320170 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:42:58,765-Speed 2630.82 samples/sec Loss 8.1501 LearningRate 0.0377 Epoch: 7 Global Step: 320180 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:43:02,653-Speed 2634.39 samples/sec Loss 8.3285 LearningRate 0.0377 Epoch: 7 Global Step: 320190 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:43:06,555-Speed 2624.41 samples/sec Loss 8.2682 LearningRate 0.0377 Epoch: 7 Global Step: 320200 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:43:10,453-Speed 2627.65 samples/sec Loss 8.2238 LearningRate 0.0377 Epoch: 7 Global Step: 320210 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:43:14,343-Speed 2632.94 samples/sec Loss 8.2562 LearningRate 0.0377 Epoch: 7 Global Step: 320220 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:43:18,221-Speed 2641.28 samples/sec Loss 8.2568 LearningRate 0.0377 Epoch: 7 Global Step: 320230 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:43:22,110-Speed 2633.69 samples/sec Loss 8.3588 LearningRate 0.0377 Epoch: 7 Global Step: 320240 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:43:26,002-Speed 2631.98 samples/sec Loss 8.2503 LearningRate 0.0377 Epoch: 7 Global Step: 320250 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:43:29,899-Speed 2628.60 samples/sec Loss 8.3487 LearningRate 0.0377 Epoch: 7 Global Step: 320260 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:43:33,793-Speed 2630.19 samples/sec Loss 8.2552 LearningRate 0.0377 Epoch: 7 Global Step: 320270 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:43:37,693-Speed 2625.82 samples/sec Loss 8.2438 LearningRate 0.0377 Epoch: 7 Global Step: 320280 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:43:41,599-Speed 2622.37 samples/sec Loss 8.1329 LearningRate 0.0377 Epoch: 7 Global Step: 320290 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:43:45,496-Speed 2627.91 samples/sec Loss 8.1990 LearningRate 0.0377 Epoch: 7 Global Step: 320300 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:43:49,392-Speed 2629.99 samples/sec Loss 8.4165 LearningRate 0.0377 Epoch: 7 Global Step: 320310 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:43:53,285-Speed 2631.34 samples/sec Loss 8.2202 LearningRate 0.0377 Epoch: 7 Global Step: 320320 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:43:57,175-Speed 2633.06 samples/sec Loss 8.2801 LearningRate 0.0377 Epoch: 7 Global Step: 320330 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:44:01,065-Speed 2633.27 samples/sec Loss 8.2131 LearningRate 0.0377 Epoch: 7 Global Step: 320340 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:44:04,956-Speed 2632.05 samples/sec Loss 8.2217 LearningRate 0.0377 Epoch: 7 Global Step: 320350 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:44:08,848-Speed 2631.74 samples/sec Loss 8.2645 LearningRate 0.0377 Epoch: 7 Global Step: 320360 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:44:12,738-Speed 2632.65 samples/sec Loss 8.2530 LearningRate 0.0377 Epoch: 7 Global Step: 320370 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:44:16,548-Speed 2688.41 samples/sec Loss 8.7854 LearningRate 0.0377 Epoch: 7 Global Step: 320380 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:44:20,442-Speed 2630.50 samples/sec Loss 10.4934 LearningRate 0.0377 Epoch: 7 Global Step: 320390 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:44:24,345-Speed 2624.26 samples/sec Loss 8.9209 LearningRate 0.0377 Epoch: 7 Global Step: 320400 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:44:28,232-Speed 2635.64 samples/sec Loss 8.4355 LearningRate 0.0377 Epoch: 7 Global Step: 320410 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:44:32,120-Speed 2634.68 samples/sec Loss 8.4534 LearningRate 0.0377 Epoch: 7 Global Step: 320420 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:44:36,005-Speed 2636.39 samples/sec Loss 8.2574 LearningRate 0.0377 Epoch: 7 Global Step: 320430 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:44:39,906-Speed 2625.44 samples/sec Loss 8.2184 LearningRate 0.0377 Epoch: 7 Global Step: 320440 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:44:43,807-Speed 2625.51 samples/sec Loss 8.1456 LearningRate 0.0377 Epoch: 7 Global Step: 320450 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:44:47,715-Speed 2621.10 samples/sec Loss 8.3298 LearningRate 0.0377 Epoch: 7 Global Step: 320460 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:44:51,682-Speed 2581.71 samples/sec Loss 8.3292 LearningRate 0.0377 Epoch: 7 Global Step: 320470 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:44:55,675-Speed 2565.01 samples/sec Loss 8.3170 LearningRate 0.0377 Epoch: 7 Global Step: 320480 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:44:59,574-Speed 2627.63 samples/sec Loss 8.2369 LearningRate 0.0377 Epoch: 7 Global Step: 320490 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:45:03,472-Speed 2627.51 samples/sec Loss 8.1798 LearningRate 0.0377 Epoch: 7 Global Step: 320500 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:45:07,375-Speed 2624.29 samples/sec Loss 8.3522 LearningRate 0.0377 Epoch: 7 Global Step: 320510 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:45:11,283-Speed 2620.70 samples/sec Loss 8.3097 LearningRate 0.0377 Epoch: 7 Global Step: 320520 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:45:15,177-Speed 2629.91 samples/sec Loss 8.2277 LearningRate 0.0377 Epoch: 7 Global Step: 320530 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:45:19,080-Speed 2624.53 samples/sec Loss 8.1828 LearningRate 0.0377 Epoch: 7 Global Step: 320540 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:45:22,967-Speed 2635.15 samples/sec Loss 8.4459 LearningRate 0.0377 Epoch: 7 Global Step: 320550 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:45:26,927-Speed 2586.61 samples/sec Loss 8.3319 LearningRate 0.0376 Epoch: 7 Global Step: 320560 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:45:30,817-Speed 2632.74 samples/sec Loss 8.3353 LearningRate 0.0376 Epoch: 7 Global Step: 320570 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:45:34,707-Speed 2633.68 samples/sec Loss 8.4024 LearningRate 0.0376 Epoch: 7 Global Step: 320580 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:45:38,610-Speed 2623.92 samples/sec Loss 8.3342 LearningRate 0.0376 Epoch: 7 Global Step: 320590 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:45:42,501-Speed 2632.47 samples/sec Loss 8.3075 LearningRate 0.0376 Epoch: 7 Global Step: 320600 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:45:46,389-Speed 2634.04 samples/sec Loss 8.2512 LearningRate 0.0376 Epoch: 7 Global Step: 320610 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:45:50,276-Speed 2634.59 samples/sec Loss 8.2460 LearningRate 0.0376 Epoch: 7 Global Step: 320620 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:45:54,174-Speed 2628.43 samples/sec Loss 8.1746 LearningRate 0.0376 Epoch: 7 Global Step: 320630 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:45:58,062-Speed 2634.16 samples/sec Loss 8.2369 LearningRate 0.0376 Epoch: 7 Global Step: 320640 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:46:01,964-Speed 2625.41 samples/sec Loss 8.2817 LearningRate 0.0376 Epoch: 7 Global Step: 320650 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:46:05,845-Speed 2638.69 samples/sec Loss 8.3533 LearningRate 0.0376 Epoch: 7 Global Step: 320660 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:46:09,739-Speed 2630.61 samples/sec Loss 8.1770 LearningRate 0.0376 Epoch: 7 Global Step: 320670 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:46:13,635-Speed 2628.61 samples/sec Loss 8.2218 LearningRate 0.0376 Epoch: 7 Global Step: 320680 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:46:17,525-Speed 2633.22 samples/sec Loss 8.2374 LearningRate 0.0376 Epoch: 7 Global Step: 320690 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:46:21,414-Speed 2633.51 samples/sec Loss 8.3840 LearningRate 0.0376 Epoch: 7 Global Step: 320700 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:46:25,303-Speed 2633.62 samples/sec Loss 8.3315 LearningRate 0.0376 Epoch: 7 Global Step: 320710 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:46:29,192-Speed 2634.41 samples/sec Loss 8.3050 LearningRate 0.0376 Epoch: 7 Global Step: 320720 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:46:33,078-Speed 2635.67 samples/sec Loss 8.2594 LearningRate 0.0376 Epoch: 7 Global Step: 320730 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:46:36,965-Speed 2634.38 samples/sec Loss 8.2622 LearningRate 0.0376 Epoch: 7 Global Step: 320740 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:46:40,883-Speed 2614.81 samples/sec Loss 8.2620 LearningRate 0.0376 Epoch: 7 Global Step: 320750 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:46:44,773-Speed 2633.06 samples/sec Loss 8.2762 LearningRate 0.0376 Epoch: 7 Global Step: 320760 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:46:48,716-Speed 2597.53 samples/sec Loss 8.1460 LearningRate 0.0376 Epoch: 7 Global Step: 320770 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:46:52,603-Speed 2635.13 samples/sec Loss 8.2766 LearningRate 0.0376 Epoch: 7 Global Step: 320780 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:46:56,492-Speed 2633.77 samples/sec Loss 8.1972 LearningRate 0.0376 Epoch: 7 Global Step: 320790 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:47:00,382-Speed 2632.97 samples/sec Loss 8.3506 LearningRate 0.0376 Epoch: 7 Global Step: 320800 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:47:04,288-Speed 2622.42 samples/sec Loss 8.2703 LearningRate 0.0376 Epoch: 7 Global Step: 320810 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:47:08,192-Speed 2623.54 samples/sec Loss 8.3327 LearningRate 0.0376 Epoch: 7 Global Step: 320820 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:47:12,122-Speed 2606.36 samples/sec Loss 8.2406 LearningRate 0.0376 Epoch: 7 Global Step: 320830 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:47:16,021-Speed 2627.39 samples/sec Loss 8.0995 LearningRate 0.0376 Epoch: 7 Global Step: 320840 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:47:19,913-Speed 2631.68 samples/sec Loss 8.2216 LearningRate 0.0376 Epoch: 7 Global Step: 320850 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:47:23,812-Speed 2626.95 samples/sec Loss 8.2562 LearningRate 0.0376 Epoch: 7 Global Step: 320860 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:47:27,706-Speed 2629.97 samples/sec Loss 8.2314 LearningRate 0.0376 Epoch: 7 Global Step: 320870 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:47:31,594-Speed 2634.58 samples/sec Loss 8.3607 LearningRate 0.0376 Epoch: 7 Global Step: 320880 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:47:35,482-Speed 2634.00 samples/sec Loss 8.1791 LearningRate 0.0376 Epoch: 7 Global Step: 320890 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:47:39,375-Speed 2631.56 samples/sec Loss 8.3548 LearningRate 0.0376 Epoch: 7 Global Step: 320900 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:47:43,266-Speed 2632.14 samples/sec Loss 8.2724 LearningRate 0.0376 Epoch: 7 Global Step: 320910 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:47:47,154-Speed 2634.50 samples/sec Loss 8.2089 LearningRate 0.0376 Epoch: 7 Global Step: 320920 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:47:51,043-Speed 2633.74 samples/sec Loss 8.3063 LearningRate 0.0376 Epoch: 7 Global Step: 320930 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:47:54,931-Speed 2634.23 samples/sec Loss 8.2665 LearningRate 0.0376 Epoch: 7 Global Step: 320940 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:47:58,797-Speed 2649.53 samples/sec Loss 8.5661 LearningRate 0.0376 Epoch: 7 Global Step: 320950 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:48:02,725-Speed 2607.30 samples/sec Loss 8.7098 LearningRate 0.0376 Epoch: 7 Global Step: 320960 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:48:06,586-Speed 2652.87 samples/sec Loss 8.9958 LearningRate 0.0376 Epoch: 7 Global Step: 320970 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:48:10,473-Speed 2635.25 samples/sec Loss 8.4089 LearningRate 0.0376 Epoch: 7 Global Step: 320980 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:48:14,365-Speed 2631.44 samples/sec Loss 8.1900 LearningRate 0.0376 Epoch: 7 Global Step: 320990 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:48:18,257-Speed 2631.98 samples/sec Loss 8.3110 LearningRate 0.0376 Epoch: 7 Global Step: 321000 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:48:22,151-Speed 2630.40 samples/sec Loss 8.2273 LearningRate 0.0376 Epoch: 7 Global Step: 321010 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:48:26,039-Speed 2634.50 samples/sec Loss 8.2188 LearningRate 0.0376 Epoch: 7 Global Step: 321020 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:48:29,928-Speed 2633.09 samples/sec Loss 8.3557 LearningRate 0.0376 Epoch: 7 Global Step: 321030 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:48:33,827-Speed 2626.97 samples/sec Loss 8.2471 LearningRate 0.0376 Epoch: 7 Global Step: 321040 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:48:37,717-Speed 2633.10 samples/sec Loss 8.3447 LearningRate 0.0376 Epoch: 7 Global Step: 321050 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:48:41,606-Speed 2633.41 samples/sec Loss 8.1803 LearningRate 0.0376 Epoch: 7 Global Step: 321060 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:48:45,496-Speed 2632.95 samples/sec Loss 8.2697 LearningRate 0.0376 Epoch: 7 Global Step: 321070 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:48:49,388-Speed 2631.48 samples/sec Loss 8.1169 LearningRate 0.0376 Epoch: 7 Global Step: 321080 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:48:53,279-Speed 2633.32 samples/sec Loss 8.2244 LearningRate 0.0376 Epoch: 7 Global Step: 321090 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:48:57,172-Speed 2630.96 samples/sec Loss 8.3264 LearningRate 0.0376 Epoch: 7 Global Step: 321100 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:49:01,061-Speed 2633.56 samples/sec Loss 8.2384 LearningRate 0.0376 Epoch: 7 Global Step: 321110 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:49:05,070-Speed 2554.35 samples/sec Loss 8.2940 LearningRate 0.0376 Epoch: 7 Global Step: 321120 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:49:09,061-Speed 2567.17 samples/sec Loss 8.2516 LearningRate 0.0376 Epoch: 7 Global Step: 321130 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:49:12,963-Speed 2624.91 samples/sec Loss 8.2151 LearningRate 0.0376 Epoch: 7 Global Step: 321140 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:49:16,861-Speed 2627.21 samples/sec Loss 8.3660 LearningRate 0.0376 Epoch: 7 Global Step: 321150 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:49:20,753-Speed 2631.88 samples/sec Loss 8.3114 LearningRate 0.0376 Epoch: 7 Global Step: 321160 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:49:24,647-Speed 2630.70 samples/sec Loss 8.3980 LearningRate 0.0376 Epoch: 7 Global Step: 321170 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:49:28,547-Speed 2626.67 samples/sec Loss 8.1749 LearningRate 0.0376 Epoch: 7 Global Step: 321180 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:49:32,443-Speed 2628.95 samples/sec Loss 8.2504 LearningRate 0.0376 Epoch: 7 Global Step: 321190 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:49:36,341-Speed 2627.22 samples/sec Loss 8.3607 LearningRate 0.0376 Epoch: 7 Global Step: 321200 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:49:40,242-Speed 2625.67 samples/sec Loss 8.2534 LearningRate 0.0376 Epoch: 7 Global Step: 321210 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:49:44,132-Speed 2633.31 samples/sec Loss 8.3576 LearningRate 0.0376 Epoch: 7 Global Step: 321220 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:49:48,026-Speed 2630.75 samples/sec Loss 8.2457 LearningRate 0.0375 Epoch: 7 Global Step: 321230 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:49:51,919-Speed 2631.37 samples/sec Loss 8.2804 LearningRate 0.0375 Epoch: 7 Global Step: 321240 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:49:55,823-Speed 2623.22 samples/sec Loss 8.1273 LearningRate 0.0375 Epoch: 7 Global Step: 321250 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:49:59,703-Speed 2640.15 samples/sec Loss 8.3278 LearningRate 0.0375 Epoch: 7 Global Step: 321260 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:50:03,594-Speed 2632.37 samples/sec Loss 8.2663 LearningRate 0.0375 Epoch: 7 Global Step: 321270 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:50:07,552-Speed 2588.03 samples/sec Loss 8.2970 LearningRate 0.0375 Epoch: 7 Global Step: 321280 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:50:11,461-Speed 2619.77 samples/sec Loss 8.2703 LearningRate 0.0375 Epoch: 7 Global Step: 321290 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:50:15,351-Speed 2633.11 samples/sec Loss 8.3071 LearningRate 0.0375 Epoch: 7 Global Step: 321300 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:50:19,244-Speed 2631.60 samples/sec Loss 8.2781 LearningRate 0.0375 Epoch: 7 Global Step: 321310 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:50:23,146-Speed 2624.69 samples/sec Loss 8.3412 LearningRate 0.0375 Epoch: 7 Global Step: 321320 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:50:27,065-Speed 2613.74 samples/sec Loss 8.1100 LearningRate 0.0375 Epoch: 7 Global Step: 321330 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:50:30,978-Speed 2617.63 samples/sec Loss 8.2230 LearningRate 0.0375 Epoch: 7 Global Step: 321340 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:50:34,894-Speed 2615.43 samples/sec Loss 8.2875 LearningRate 0.0375 Epoch: 7 Global Step: 321350 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:50:38,800-Speed 2622.63 samples/sec Loss 8.2225 LearningRate 0.0375 Epoch: 7 Global Step: 321360 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:50:42,729-Speed 2606.98 samples/sec Loss 8.2652 LearningRate 0.0375 Epoch: 7 Global Step: 321370 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:50:46,640-Speed 2618.72 samples/sec Loss 8.2068 LearningRate 0.0375 Epoch: 7 Global Step: 321380 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:50:50,555-Speed 2616.71 samples/sec Loss 8.0741 LearningRate 0.0375 Epoch: 7 Global Step: 321390 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:50:54,459-Speed 2623.28 samples/sec Loss 8.2275 LearningRate 0.0375 Epoch: 7 Global Step: 321400 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:50:58,347-Speed 2634.61 samples/sec Loss 8.3012 LearningRate 0.0375 Epoch: 7 Global Step: 321410 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:51:02,250-Speed 2624.13 samples/sec Loss 8.1889 LearningRate 0.0375 Epoch: 7 Global Step: 321420 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:51:06,154-Speed 2623.19 samples/sec Loss 8.2061 LearningRate 0.0375 Epoch: 7 Global Step: 321430 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:51:10,050-Speed 2629.08 samples/sec Loss 8.1211 LearningRate 0.0375 Epoch: 7 Global Step: 321440 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:51:13,958-Speed 2620.82 samples/sec Loss 8.0751 LearningRate 0.0375 Epoch: 7 Global Step: 321450 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:51:17,864-Speed 2622.26 samples/sec Loss 8.2323 LearningRate 0.0375 Epoch: 7 Global Step: 321460 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:51:21,778-Speed 2617.46 samples/sec Loss 8.2167 LearningRate 0.0375 Epoch: 7 Global Step: 321470 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:51:25,678-Speed 2626.07 samples/sec Loss 8.3174 LearningRate 0.0375 Epoch: 7 Global Step: 321480 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:51:29,570-Speed 2632.26 samples/sec Loss 8.2868 LearningRate 0.0375 Epoch: 7 Global Step: 321490 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:51:33,462-Speed 2630.94 samples/sec Loss 8.2812 LearningRate 0.0375 Epoch: 7 Global Step: 321500 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:51:37,354-Speed 2631.74 samples/sec Loss 8.2543 LearningRate 0.0375 Epoch: 7 Global Step: 321510 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:51:41,249-Speed 2630.12 samples/sec Loss 8.1901 LearningRate 0.0375 Epoch: 7 Global Step: 321520 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:51:45,151-Speed 2624.71 samples/sec Loss 8.4679 LearningRate 0.0375 Epoch: 7 Global Step: 321530 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:51:49,056-Speed 2623.69 samples/sec Loss 8.4743 LearningRate 0.0375 Epoch: 7 Global Step: 321540 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:51:52,947-Speed 2632.47 samples/sec Loss 8.3268 LearningRate 0.0375 Epoch: 7 Global Step: 321550 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:51:56,837-Speed 2632.86 samples/sec Loss 8.1322 LearningRate 0.0375 Epoch: 7 Global Step: 321560 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:52:00,730-Speed 2631.21 samples/sec Loss 8.2183 LearningRate 0.0375 Epoch: 7 Global Step: 321570 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:52:04,624-Speed 2630.08 samples/sec Loss 8.1180 LearningRate 0.0375 Epoch: 7 Global Step: 321580 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:52:08,517-Speed 2630.99 samples/sec Loss 8.1764 LearningRate 0.0375 Epoch: 7 Global Step: 321590 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:52:12,412-Speed 2629.67 samples/sec Loss 8.1412 LearningRate 0.0375 Epoch: 7 Global Step: 321600 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:52:16,311-Speed 2626.82 samples/sec Loss 8.1172 LearningRate 0.0375 Epoch: 7 Global Step: 321610 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:52:20,202-Speed 2632.72 samples/sec Loss 8.2325 LearningRate 0.0375 Epoch: 7 Global Step: 321620 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:52:24,093-Speed 2633.08 samples/sec Loss 8.3090 LearningRate 0.0375 Epoch: 7 Global Step: 321630 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:52:27,986-Speed 2631.02 samples/sec Loss 8.2025 LearningRate 0.0375 Epoch: 7 Global Step: 321640 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:52:31,880-Speed 2630.19 samples/sec Loss 8.3481 LearningRate 0.0375 Epoch: 7 Global Step: 321650 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:52:35,774-Speed 2630.12 samples/sec Loss 8.3418 LearningRate 0.0375 Epoch: 7 Global Step: 321660 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:52:39,669-Speed 2630.07 samples/sec Loss 8.2951 LearningRate 0.0375 Epoch: 7 Global Step: 321670 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 07:52:43,557-Speed 2633.73 samples/sec Loss 8.2306 LearningRate 0.0375 Epoch: 7 Global Step: 321680 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:52:47,463-Speed 2622.35 samples/sec Loss 8.3020 LearningRate 0.0375 Epoch: 7 Global Step: 321690 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:52:51,360-Speed 2628.27 samples/sec Loss 8.2173 LearningRate 0.0375 Epoch: 7 Global Step: 321700 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:52:55,267-Speed 2622.26 samples/sec Loss 8.1783 LearningRate 0.0375 Epoch: 7 Global Step: 321710 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:52:59,161-Speed 2630.25 samples/sec Loss 8.3220 LearningRate 0.0375 Epoch: 7 Global Step: 321720 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:53:03,055-Speed 2630.38 samples/sec Loss 8.2598 LearningRate 0.0375 Epoch: 7 Global Step: 321730 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:53:06,965-Speed 2619.11 samples/sec Loss 8.2265 LearningRate 0.0375 Epoch: 7 Global Step: 321740 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:53:10,858-Speed 2630.98 samples/sec Loss 8.3990 LearningRate 0.0375 Epoch: 7 Global Step: 321750 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:53:14,755-Speed 2628.63 samples/sec Loss 8.2998 LearningRate 0.0375 Epoch: 7 Global Step: 321760 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:53:18,650-Speed 2629.79 samples/sec Loss 8.2148 LearningRate 0.0375 Epoch: 7 Global Step: 321770 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:53:22,546-Speed 2628.87 samples/sec Loss 8.2455 LearningRate 0.0375 Epoch: 7 Global Step: 321780 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:53:26,471-Speed 2609.24 samples/sec Loss 8.2474 LearningRate 0.0375 Epoch: 7 Global Step: 321790 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:53:30,370-Speed 2627.11 samples/sec Loss 8.2499 LearningRate 0.0375 Epoch: 7 Global Step: 321800 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:53:34,281-Speed 2619.22 samples/sec Loss 8.2884 LearningRate 0.0375 Epoch: 7 Global Step: 321810 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:53:38,175-Speed 2630.28 samples/sec Loss 8.2073 LearningRate 0.0375 Epoch: 7 Global Step: 321820 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:53:42,069-Speed 2630.49 samples/sec Loss 8.1349 LearningRate 0.0375 Epoch: 7 Global Step: 321830 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:53:45,958-Speed 2633.26 samples/sec Loss 8.2780 LearningRate 0.0375 Epoch: 7 Global Step: 321840 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:53:49,863-Speed 2622.88 samples/sec Loss 8.2273 LearningRate 0.0375 Epoch: 7 Global Step: 321850 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:53:53,756-Speed 2630.88 samples/sec Loss 8.3127 LearningRate 0.0375 Epoch: 7 Global Step: 321860 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:53:57,648-Speed 2631.67 samples/sec Loss 8.1246 LearningRate 0.0375 Epoch: 7 Global Step: 321870 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:54:01,539-Speed 2632.67 samples/sec Loss 8.3288 LearningRate 0.0375 Epoch: 7 Global Step: 321880 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:54:05,435-Speed 2628.67 samples/sec Loss 8.1800 LearningRate 0.0375 Epoch: 7 Global Step: 321890 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:54:09,330-Speed 2630.14 samples/sec Loss 8.2192 LearningRate 0.0375 Epoch: 7 Global Step: 321900 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:54:13,221-Speed 2632.18 samples/sec Loss 8.1754 LearningRate 0.0374 Epoch: 7 Global Step: 321910 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:54:17,085-Speed 2650.99 samples/sec Loss 8.2980 LearningRate 0.0374 Epoch: 7 Global Step: 321920 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:54:20,978-Speed 2631.20 samples/sec Loss 8.2863 LearningRate 0.0374 Epoch: 7 Global Step: 321930 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:54:24,899-Speed 2612.24 samples/sec Loss 8.2258 LearningRate 0.0374 Epoch: 7 Global Step: 321940 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:54:28,790-Speed 2632.42 samples/sec Loss 8.3574 LearningRate 0.0374 Epoch: 7 Global Step: 321950 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:54:32,683-Speed 2631.21 samples/sec Loss 8.1619 LearningRate 0.0374 Epoch: 7 Global Step: 321960 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:54:36,586-Speed 2624.17 samples/sec Loss 8.2622 LearningRate 0.0374 Epoch: 7 Global Step: 321970 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:54:40,477-Speed 2632.66 samples/sec Loss 8.3165 LearningRate 0.0374 Epoch: 7 Global Step: 321980 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:54:44,367-Speed 2633.36 samples/sec Loss 8.3320 LearningRate 0.0374 Epoch: 7 Global Step: 321990 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:54:48,259-Speed 2631.17 samples/sec Loss 8.2515 LearningRate 0.0374 Epoch: 7 Global Step: 322000 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:54:52,156-Speed 2629.27 samples/sec Loss 8.3110 LearningRate 0.0374 Epoch: 7 Global Step: 322010 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 07:54:56,064-Speed 2620.81 samples/sec Loss 8.2667 LearningRate 0.0374 Epoch: 7 Global Step: 322020 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:54:59,974-Speed 2618.90 samples/sec Loss 8.3264 LearningRate 0.0374 Epoch: 7 Global Step: 322030 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:55:03,887-Speed 2617.89 samples/sec Loss 8.1444 LearningRate 0.0374 Epoch: 7 Global Step: 322040 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:55:07,779-Speed 2632.30 samples/sec Loss 8.2264 LearningRate 0.0374 Epoch: 7 Global Step: 322050 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:55:11,677-Speed 2627.05 samples/sec Loss 8.1238 LearningRate 0.0374 Epoch: 7 Global Step: 322060 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:55:15,567-Speed 2633.48 samples/sec Loss 8.2566 LearningRate 0.0374 Epoch: 7 Global Step: 322070 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:55:19,553-Speed 2569.88 samples/sec Loss 8.2034 LearningRate 0.0374 Epoch: 7 Global Step: 322080 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:55:23,631-Speed 2511.78 samples/sec Loss 8.2602 LearningRate 0.0374 Epoch: 7 Global Step: 322090 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:55:27,615-Speed 2571.10 samples/sec Loss 8.2424 LearningRate 0.0374 Epoch: 7 Global Step: 322100 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:55:31,507-Speed 2631.61 samples/sec Loss 8.2013 LearningRate 0.0374 Epoch: 7 Global Step: 322110 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:55:35,398-Speed 2631.88 samples/sec Loss 8.2673 LearningRate 0.0374 Epoch: 7 Global Step: 322120 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:55:39,293-Speed 2629.76 samples/sec Loss 8.2427 LearningRate 0.0374 Epoch: 7 Global Step: 322130 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:55:43,192-Speed 2626.71 samples/sec Loss 8.2063 LearningRate 0.0374 Epoch: 7 Global Step: 322140 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:55:47,089-Speed 2628.48 samples/sec Loss 8.3063 LearningRate 0.0374 Epoch: 7 Global Step: 322150 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:55:50,984-Speed 2629.86 samples/sec Loss 8.2415 LearningRate 0.0374 Epoch: 7 Global Step: 322160 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:55:54,877-Speed 2630.93 samples/sec Loss 8.0396 LearningRate 0.0374 Epoch: 7 Global Step: 322170 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:55:58,766-Speed 2633.79 samples/sec Loss 8.2154 LearningRate 0.0374 Epoch: 7 Global Step: 322180 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:56:02,648-Speed 2638.69 samples/sec Loss 8.3045 LearningRate 0.0374 Epoch: 7 Global Step: 322190 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:56:06,574-Speed 2609.02 samples/sec Loss 8.2242 LearningRate 0.0374 Epoch: 7 Global Step: 322200 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:56:10,488-Speed 2616.82 samples/sec Loss 8.2505 LearningRate 0.0374 Epoch: 7 Global Step: 322210 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:56:14,380-Speed 2632.37 samples/sec Loss 8.0504 LearningRate 0.0374 Epoch: 7 Global Step: 322220 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:56:18,273-Speed 2630.73 samples/sec Loss 8.1271 LearningRate 0.0374 Epoch: 7 Global Step: 322230 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:56:22,166-Speed 2631.01 samples/sec Loss 8.2596 LearningRate 0.0374 Epoch: 7 Global Step: 322240 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:56:26,063-Speed 2628.49 samples/sec Loss 8.2252 LearningRate 0.0374 Epoch: 7 Global Step: 322250 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:56:29,956-Speed 2631.67 samples/sec Loss 8.2769 LearningRate 0.0374 Epoch: 7 Global Step: 322260 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:56:33,848-Speed 2631.34 samples/sec Loss 8.1872 LearningRate 0.0374 Epoch: 7 Global Step: 322270 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:56:37,743-Speed 2629.78 samples/sec Loss 8.2020 LearningRate 0.0374 Epoch: 7 Global Step: 322280 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 07:56:41,640-Speed 2628.36 samples/sec Loss 8.1919 LearningRate 0.0374 Epoch: 7 Global Step: 322290 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:56:45,533-Speed 2630.69 samples/sec Loss 8.2577 LearningRate 0.0374 Epoch: 7 Global Step: 322300 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:56:49,429-Speed 2629.46 samples/sec Loss 8.2490 LearningRate 0.0374 Epoch: 7 Global Step: 322310 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:56:53,323-Speed 2630.57 samples/sec Loss 8.2403 LearningRate 0.0374 Epoch: 7 Global Step: 322320 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:56:57,220-Speed 2628.43 samples/sec Loss 8.0996 LearningRate 0.0374 Epoch: 7 Global Step: 322330 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:57:01,112-Speed 2631.02 samples/sec Loss 8.2977 LearningRate 0.0374 Epoch: 7 Global Step: 322340 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:57:05,010-Speed 2627.90 samples/sec Loss 8.3270 LearningRate 0.0374 Epoch: 7 Global Step: 322350 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:57:08,904-Speed 2629.87 samples/sec Loss 8.3075 LearningRate 0.0374 Epoch: 7 Global Step: 322360 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:57:12,801-Speed 2628.34 samples/sec Loss 8.1137 LearningRate 0.0374 Epoch: 7 Global Step: 322370 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 07:57:16,631-Speed 2674.63 samples/sec Loss 8.4211 LearningRate 0.0374 Epoch: 7 Global Step: 322380 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:57:20,507-Speed 2642.23 samples/sec Loss 8.5375 LearningRate 0.0374 Epoch: 7 Global Step: 322390 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:57:24,397-Speed 2632.81 samples/sec Loss 8.4116 LearningRate 0.0374 Epoch: 7 Global Step: 322400 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:57:28,288-Speed 2632.85 samples/sec Loss 8.3062 LearningRate 0.0374 Epoch: 7 Global Step: 322410 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:57:32,178-Speed 2632.92 samples/sec Loss 8.3057 LearningRate 0.0374 Epoch: 7 Global Step: 322420 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:57:36,073-Speed 2629.99 samples/sec Loss 8.3136 LearningRate 0.0374 Epoch: 7 Global Step: 322430 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:57:39,973-Speed 2625.75 samples/sec Loss 8.3057 LearningRate 0.0374 Epoch: 7 Global Step: 322440 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:57:43,862-Speed 2633.50 samples/sec Loss 8.2801 LearningRate 0.0374 Epoch: 7 Global Step: 322450 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:57:47,750-Speed 2634.54 samples/sec Loss 8.5245 LearningRate 0.0374 Epoch: 7 Global Step: 322460 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:57:51,641-Speed 2632.66 samples/sec Loss 8.1412 LearningRate 0.0374 Epoch: 7 Global Step: 322470 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:57:55,551-Speed 2619.28 samples/sec Loss 8.3392 LearningRate 0.0374 Epoch: 7 Global Step: 322480 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 07:57:59,440-Speed 2633.76 samples/sec Loss 8.2428 LearningRate 0.0374 Epoch: 7 Global Step: 322490 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:58:03,332-Speed 2631.56 samples/sec Loss 8.0859 LearningRate 0.0374 Epoch: 7 Global Step: 322500 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:58:07,225-Speed 2631.21 samples/sec Loss 8.1145 LearningRate 0.0374 Epoch: 7 Global Step: 322510 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:58:11,116-Speed 2632.12 samples/sec Loss 8.2212 LearningRate 0.0374 Epoch: 7 Global Step: 322520 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:58:15,011-Speed 2629.38 samples/sec Loss 8.1932 LearningRate 0.0374 Epoch: 7 Global Step: 322530 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:58:18,908-Speed 2628.08 samples/sec Loss 8.2330 LearningRate 0.0374 Epoch: 7 Global Step: 322540 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:58:22,800-Speed 2631.84 samples/sec Loss 8.2540 LearningRate 0.0374 Epoch: 7 Global Step: 322550 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:58:26,691-Speed 2631.84 samples/sec Loss 8.3254 LearningRate 0.0374 Epoch: 7 Global Step: 322560 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:58:30,587-Speed 2629.24 samples/sec Loss 8.2750 LearningRate 0.0374 Epoch: 7 Global Step: 322570 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:58:34,482-Speed 2629.94 samples/sec Loss 8.4396 LearningRate 0.0374 Epoch: 7 Global Step: 322580 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 07:58:38,383-Speed 2625.41 samples/sec Loss 8.3487 LearningRate 0.0373 Epoch: 7 Global Step: 322590 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:58:42,280-Speed 2628.06 samples/sec Loss 8.2220 LearningRate 0.0373 Epoch: 7 Global Step: 322600 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:58:46,174-Speed 2630.87 samples/sec Loss 8.2577 LearningRate 0.0373 Epoch: 7 Global Step: 322610 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:58:50,073-Speed 2626.68 samples/sec Loss 8.3470 LearningRate 0.0373 Epoch: 7 Global Step: 322620 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:58:53,964-Speed 2632.44 samples/sec Loss 8.2660 LearningRate 0.0373 Epoch: 7 Global Step: 322630 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:58:57,872-Speed 2620.65 samples/sec Loss 8.2279 LearningRate 0.0373 Epoch: 7 Global Step: 322640 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:59:01,768-Speed 2628.67 samples/sec Loss 8.2572 LearningRate 0.0373 Epoch: 7 Global Step: 322650 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:59:05,658-Speed 2632.71 samples/sec Loss 8.1925 LearningRate 0.0373 Epoch: 7 Global Step: 322660 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:59:09,550-Speed 2631.82 samples/sec Loss 8.2294 LearningRate 0.0373 Epoch: 7 Global Step: 322670 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:59:13,449-Speed 2627.08 samples/sec Loss 8.0773 LearningRate 0.0373 Epoch: 7 Global Step: 322680 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 07:59:17,352-Speed 2624.21 samples/sec Loss 8.1560 LearningRate 0.0373 Epoch: 7 Global Step: 322690 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:59:21,255-Speed 2624.14 samples/sec Loss 8.2565 LearningRate 0.0373 Epoch: 7 Global Step: 322700 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:59:25,152-Speed 2628.72 samples/sec Loss 8.1938 LearningRate 0.0373 Epoch: 7 Global Step: 322710 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:59:29,046-Speed 2630.48 samples/sec Loss 8.2482 LearningRate 0.0373 Epoch: 7 Global Step: 322720 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:59:32,937-Speed 2632.39 samples/sec Loss 8.2163 LearningRate 0.0373 Epoch: 7 Global Step: 322730 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:59:36,827-Speed 2632.57 samples/sec Loss 8.3267 LearningRate 0.0373 Epoch: 7 Global Step: 322740 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:59:40,720-Speed 2631.35 samples/sec Loss 8.1422 LearningRate 0.0373 Epoch: 7 Global Step: 322750 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:59:44,608-Speed 2634.57 samples/sec Loss 8.1597 LearningRate 0.0373 Epoch: 7 Global Step: 322760 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:59:48,502-Speed 2630.29 samples/sec Loss 8.3599 LearningRate 0.0373 Epoch: 7 Global Step: 322770 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:59:52,400-Speed 2627.51 samples/sec Loss 8.2294 LearningRate 0.0373 Epoch: 7 Global Step: 322780 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 07:59:56,295-Speed 2629.65 samples/sec Loss 8.1935 LearningRate 0.0373 Epoch: 7 Global Step: 322790 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:00:00,191-Speed 2629.10 samples/sec Loss 8.1916 LearningRate 0.0373 Epoch: 7 Global Step: 322800 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:00:04,090-Speed 2626.71 samples/sec Loss 8.2611 LearningRate 0.0373 Epoch: 7 Global Step: 322810 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:00:07,985-Speed 2629.58 samples/sec Loss 8.1089 LearningRate 0.0373 Epoch: 7 Global Step: 322820 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:00:11,876-Speed 2632.15 samples/sec Loss 8.2498 LearningRate 0.0373 Epoch: 7 Global Step: 322830 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:00:15,763-Speed 2635.41 samples/sec Loss 8.1359 LearningRate 0.0373 Epoch: 7 Global Step: 322840 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:00:19,652-Speed 2633.47 samples/sec Loss 8.3276 LearningRate 0.0373 Epoch: 7 Global Step: 322850 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:00:23,555-Speed 2624.43 samples/sec Loss 8.2619 LearningRate 0.0373 Epoch: 7 Global Step: 322860 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:00:27,444-Speed 2633.36 samples/sec Loss 8.3124 LearningRate 0.0373 Epoch: 7 Global Step: 322870 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:00:31,337-Speed 2631.13 samples/sec Loss 8.1369 LearningRate 0.0373 Epoch: 7 Global Step: 322880 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:00:35,228-Speed 2632.16 samples/sec Loss 8.2838 LearningRate 0.0373 Epoch: 7 Global Step: 322890 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:00:39,119-Speed 2632.49 samples/sec Loss 8.2448 LearningRate 0.0373 Epoch: 7 Global Step: 322900 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:00:43,010-Speed 2632.55 samples/sec Loss 8.3040 LearningRate 0.0373 Epoch: 7 Global Step: 322910 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:00:46,905-Speed 2629.73 samples/sec Loss 8.2531 LearningRate 0.0373 Epoch: 7 Global Step: 322920 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:00:50,805-Speed 2625.75 samples/sec Loss 8.1787 LearningRate 0.0373 Epoch: 7 Global Step: 322930 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:00:54,709-Speed 2623.94 samples/sec Loss 8.3644 LearningRate 0.0373 Epoch: 7 Global Step: 322940 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:00:58,618-Speed 2619.79 samples/sec Loss 8.1807 LearningRate 0.0373 Epoch: 7 Global Step: 322950 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:01:02,520-Speed 2625.10 samples/sec Loss 8.2888 LearningRate 0.0373 Epoch: 7 Global Step: 322960 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:01:06,423-Speed 2624.02 samples/sec Loss 8.1933 LearningRate 0.0373 Epoch: 7 Global Step: 322970 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:01:10,324-Speed 2625.90 samples/sec Loss 8.1847 LearningRate 0.0373 Epoch: 7 Global Step: 322980 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:01:14,221-Speed 2627.75 samples/sec Loss 8.2751 LearningRate 0.0373 Epoch: 7 Global Step: 322990 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:01:18,119-Speed 2627.56 samples/sec Loss 8.3229 LearningRate 0.0373 Epoch: 7 Global Step: 323000 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:01:22,030-Speed 2618.94 samples/sec Loss 8.1761 LearningRate 0.0373 Epoch: 7 Global Step: 323010 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:01:25,925-Speed 2629.71 samples/sec Loss 8.2443 LearningRate 0.0373 Epoch: 7 Global Step: 323020 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:01:29,815-Speed 2633.02 samples/sec Loss 8.2466 LearningRate 0.0373 Epoch: 7 Global Step: 323030 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:01:33,720-Speed 2622.79 samples/sec Loss 8.1614 LearningRate 0.0373 Epoch: 7 Global Step: 323040 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:01:37,624-Speed 2623.35 samples/sec Loss 8.2585 LearningRate 0.0373 Epoch: 7 Global Step: 323050 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:01:41,524-Speed 2625.97 samples/sec Loss 8.1760 LearningRate 0.0373 Epoch: 7 Global Step: 323060 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:01:45,415-Speed 2632.58 samples/sec Loss 8.1035 LearningRate 0.0373 Epoch: 7 Global Step: 323070 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:01:49,312-Speed 2628.43 samples/sec Loss 8.3743 LearningRate 0.0373 Epoch: 7 Global Step: 323080 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:01:53,181-Speed 2647.73 samples/sec Loss 8.1303 LearningRate 0.0373 Epoch: 7 Global Step: 323090 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:01:57,074-Speed 2630.97 samples/sec Loss 8.3789 LearningRate 0.0373 Epoch: 7 Global Step: 323100 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:02:00,967-Speed 2630.82 samples/sec Loss 8.2241 LearningRate 0.0373 Epoch: 7 Global Step: 323110 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:02:04,860-Speed 2630.99 samples/sec Loss 8.1243 LearningRate 0.0373 Epoch: 7 Global Step: 323120 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:02:08,758-Speed 2627.29 samples/sec Loss 8.3115 LearningRate 0.0373 Epoch: 7 Global Step: 323130 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:02:12,658-Speed 2626.05 samples/sec Loss 8.2603 LearningRate 0.0373 Epoch: 7 Global Step: 323140 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:02:16,550-Speed 2631.60 samples/sec Loss 8.2245 LearningRate 0.0373 Epoch: 7 Global Step: 323150 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:02:20,445-Speed 2630.23 samples/sec Loss 8.2658 LearningRate 0.0373 Epoch: 7 Global Step: 323160 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:02:24,339-Speed 2630.37 samples/sec Loss 8.1335 LearningRate 0.0373 Epoch: 7 Global Step: 323170 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:02:28,235-Speed 2628.72 samples/sec Loss 8.2884 LearningRate 0.0373 Epoch: 7 Global Step: 323180 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:02:32,120-Speed 2636.80 samples/sec Loss 8.1889 LearningRate 0.0373 Epoch: 7 Global Step: 323190 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:02:36,010-Speed 2632.59 samples/sec Loss 8.2568 LearningRate 0.0373 Epoch: 7 Global Step: 323200 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:02:39,903-Speed 2630.79 samples/sec Loss 8.0151 LearningRate 0.0373 Epoch: 7 Global Step: 323210 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:02:43,795-Speed 2631.87 samples/sec Loss 8.2043 LearningRate 0.0373 Epoch: 7 Global Step: 323220 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:02:47,692-Speed 2627.68 samples/sec Loss 8.3020 LearningRate 0.0373 Epoch: 7 Global Step: 323230 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:02:51,588-Speed 2629.09 samples/sec Loss 8.2595 LearningRate 0.0373 Epoch: 7 Global Step: 323240 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:02:55,483-Speed 2629.97 samples/sec Loss 8.1965 LearningRate 0.0373 Epoch: 7 Global Step: 323250 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:02:59,386-Speed 2623.94 samples/sec Loss 8.2719 LearningRate 0.0373 Epoch: 7 Global Step: 323260 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:03:03,281-Speed 2629.63 samples/sec Loss 8.2135 LearningRate 0.0372 Epoch: 7 Global Step: 323270 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:03:07,178-Speed 2628.13 samples/sec Loss 8.3132 LearningRate 0.0372 Epoch: 7 Global Step: 323280 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:03:11,073-Speed 2629.95 samples/sec Loss 8.2782 LearningRate 0.0372 Epoch: 7 Global Step: 323290 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:03:14,972-Speed 2626.52 samples/sec Loss 8.3013 LearningRate 0.0372 Epoch: 7 Global Step: 323300 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:03:18,883-Speed 2619.08 samples/sec Loss 8.3226 LearningRate 0.0372 Epoch: 7 Global Step: 323310 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:03:22,793-Speed 2619.44 samples/sec Loss 8.1770 LearningRate 0.0372 Epoch: 7 Global Step: 323320 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:03:26,723-Speed 2605.82 samples/sec Loss 8.2176 LearningRate 0.0372 Epoch: 7 Global Step: 323330 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:03:30,616-Speed 2631.43 samples/sec Loss 8.1023 LearningRate 0.0372 Epoch: 7 Global Step: 323340 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:03:34,519-Speed 2623.94 samples/sec Loss 8.2271 LearningRate 0.0372 Epoch: 7 Global Step: 323350 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:03:38,430-Speed 2618.92 samples/sec Loss 8.1013 LearningRate 0.0372 Epoch: 7 Global Step: 323360 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:03:42,339-Speed 2619.96 samples/sec Loss 8.3039 LearningRate 0.0372 Epoch: 7 Global Step: 323370 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:03:46,237-Speed 2627.97 samples/sec Loss 8.1926 LearningRate 0.0372 Epoch: 7 Global Step: 323380 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:03:50,140-Speed 2624.30 samples/sec Loss 8.2885 LearningRate 0.0372 Epoch: 7 Global Step: 323390 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:03:54,043-Speed 2624.09 samples/sec Loss 8.0593 LearningRate 0.0372 Epoch: 7 Global Step: 323400 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:03:57,945-Speed 2624.91 samples/sec Loss 8.2487 LearningRate 0.0372 Epoch: 7 Global Step: 323410 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:04:01,848-Speed 2624.44 samples/sec Loss 8.3200 LearningRate 0.0372 Epoch: 7 Global Step: 323420 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:04:05,752-Speed 2623.46 samples/sec Loss 8.1806 LearningRate 0.0372 Epoch: 7 Global Step: 323430 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:04:09,643-Speed 2632.23 samples/sec Loss 8.2882 LearningRate 0.0372 Epoch: 7 Global Step: 323440 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:04:13,536-Speed 2630.87 samples/sec Loss 8.2100 LearningRate 0.0372 Epoch: 7 Global Step: 323450 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:04:17,432-Speed 2629.02 samples/sec Loss 8.3376 LearningRate 0.0372 Epoch: 7 Global Step: 323460 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:04:21,320-Speed 2634.54 samples/sec Loss 8.1986 LearningRate 0.0372 Epoch: 7 Global Step: 323470 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:04:25,214-Speed 2630.61 samples/sec Loss 8.1757 LearningRate 0.0372 Epoch: 7 Global Step: 323480 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:04:29,107-Speed 2630.69 samples/sec Loss 8.1473 LearningRate 0.0372 Epoch: 7 Global Step: 323490 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:04:33,001-Speed 2630.15 samples/sec Loss 8.2106 LearningRate 0.0372 Epoch: 7 Global Step: 323500 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:04:36,896-Speed 2630.06 samples/sec Loss 8.1036 LearningRate 0.0372 Epoch: 7 Global Step: 323510 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:04:40,798-Speed 2624.70 samples/sec Loss 8.2113 LearningRate 0.0372 Epoch: 7 Global Step: 323520 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:04:44,698-Speed 2626.12 samples/sec Loss 8.2024 LearningRate 0.0372 Epoch: 7 Global Step: 323530 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:04:48,595-Speed 2628.33 samples/sec Loss 8.1758 LearningRate 0.0372 Epoch: 7 Global Step: 323540 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:04:52,508-Speed 2617.41 samples/sec Loss 8.3267 LearningRate 0.0372 Epoch: 7 Global Step: 323550 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:04:56,436-Speed 2607.72 samples/sec Loss 8.1800 LearningRate 0.0372 Epoch: 7 Global Step: 323560 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:05:00,366-Speed 2606.60 samples/sec Loss 8.3108 LearningRate 0.0372 Epoch: 7 Global Step: 323570 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:05:04,373-Speed 2556.16 samples/sec Loss 8.2897 LearningRate 0.0372 Epoch: 7 Global Step: 323580 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:05:08,270-Speed 2627.52 samples/sec Loss 8.0463 LearningRate 0.0372 Epoch: 7 Global Step: 323590 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:05:12,166-Speed 2628.91 samples/sec Loss 8.2040 LearningRate 0.0372 Epoch: 7 Global Step: 323600 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:05:16,072-Speed 2622.53 samples/sec Loss 8.1305 LearningRate 0.0372 Epoch: 7 Global Step: 323610 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:05:19,968-Speed 2628.93 samples/sec Loss 8.1378 LearningRate 0.0372 Epoch: 7 Global Step: 323620 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:05:23,863-Speed 2629.80 samples/sec Loss 8.2499 LearningRate 0.0372 Epoch: 7 Global Step: 323630 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:05:27,759-Speed 2628.77 samples/sec Loss 8.3127 LearningRate 0.0372 Epoch: 7 Global Step: 323640 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 08:05:31,654-Speed 2629.67 samples/sec Loss 8.1701 LearningRate 0.0372 Epoch: 7 Global Step: 323650 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 08:05:35,530-Speed 2642.71 samples/sec Loss 8.1140 LearningRate 0.0372 Epoch: 7 Global Step: 323660 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:05:39,426-Speed 2628.97 samples/sec Loss 8.2762 LearningRate 0.0372 Epoch: 7 Global Step: 323670 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:05:43,341-Speed 2615.86 samples/sec Loss 8.3343 LearningRate 0.0372 Epoch: 7 Global Step: 323680 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:05:47,237-Speed 2628.80 samples/sec Loss 8.1312 LearningRate 0.0372 Epoch: 7 Global Step: 323690 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:05:51,136-Speed 2627.22 samples/sec Loss 8.2646 LearningRate 0.0372 Epoch: 7 Global Step: 323700 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:05:55,026-Speed 2632.49 samples/sec Loss 8.1718 LearningRate 0.0372 Epoch: 7 Global Step: 323710 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:05:58,921-Speed 2630.16 samples/sec Loss 8.1119 LearningRate 0.0372 Epoch: 7 Global Step: 323720 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:06:02,825-Speed 2623.42 samples/sec Loss 8.1840 LearningRate 0.0372 Epoch: 7 Global Step: 323730 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:06:06,725-Speed 2625.75 samples/sec Loss 8.1966 LearningRate 0.0372 Epoch: 7 Global Step: 323740 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:06:10,620-Speed 2629.79 samples/sec Loss 8.2561 LearningRate 0.0372 Epoch: 7 Global Step: 323750 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:06:14,521-Speed 2626.23 samples/sec Loss 8.3880 LearningRate 0.0372 Epoch: 7 Global Step: 323760 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 08:06:18,412-Speed 2631.95 samples/sec Loss 8.3049 LearningRate 0.0372 Epoch: 7 Global Step: 323770 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:06:22,329-Speed 2614.75 samples/sec Loss 8.2251 LearningRate 0.0372 Epoch: 7 Global Step: 323780 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:06:26,227-Speed 2627.87 samples/sec Loss 8.2904 LearningRate 0.0372 Epoch: 7 Global Step: 323790 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:06:30,119-Speed 2631.49 samples/sec Loss 8.1073 LearningRate 0.0372 Epoch: 7 Global Step: 323800 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:06:34,024-Speed 2622.59 samples/sec Loss 8.1454 LearningRate 0.0372 Epoch: 7 Global Step: 323810 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:06:37,932-Speed 2620.91 samples/sec Loss 8.1054 LearningRate 0.0372 Epoch: 7 Global Step: 323820 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:06:41,881-Speed 2593.81 samples/sec Loss 8.2495 LearningRate 0.0372 Epoch: 7 Global Step: 323830 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:06:45,821-Speed 2599.60 samples/sec Loss 8.0734 LearningRate 0.0372 Epoch: 7 Global Step: 323840 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:06:49,722-Speed 2625.04 samples/sec Loss 8.3051 LearningRate 0.0372 Epoch: 7 Global Step: 323850 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:06:53,634-Speed 2618.63 samples/sec Loss 8.2604 LearningRate 0.0372 Epoch: 7 Global Step: 323860 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:06:57,508-Speed 2643.82 samples/sec Loss 8.3323 LearningRate 0.0372 Epoch: 7 Global Step: 323870 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:07:01,401-Speed 2631.68 samples/sec Loss 8.2517 LearningRate 0.0372 Epoch: 7 Global Step: 323880 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:07:05,293-Speed 2631.39 samples/sec Loss 8.0254 LearningRate 0.0372 Epoch: 7 Global Step: 323890 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:07:09,191-Speed 2627.37 samples/sec Loss 8.3701 LearningRate 0.0372 Epoch: 7 Global Step: 323900 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:07:13,096-Speed 2622.27 samples/sec Loss 8.1439 LearningRate 0.0372 Epoch: 7 Global Step: 323910 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:07:16,998-Speed 2625.20 samples/sec Loss 8.2107 LearningRate 0.0372 Epoch: 7 Global Step: 323920 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:07:20,891-Speed 2631.33 samples/sec Loss 8.2545 LearningRate 0.0372 Epoch: 7 Global Step: 323930 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:07:24,787-Speed 2628.55 samples/sec Loss 8.3875 LearningRate 0.0372 Epoch: 7 Global Step: 323940 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:07:28,681-Speed 2631.05 samples/sec Loss 8.1710 LearningRate 0.0371 Epoch: 7 Global Step: 323950 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:07:32,583-Speed 2624.93 samples/sec Loss 8.1882 LearningRate 0.0371 Epoch: 7 Global Step: 323960 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:07:36,475-Speed 2631.46 samples/sec Loss 8.1856 LearningRate 0.0371 Epoch: 7 Global Step: 323970 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 08:07:40,352-Speed 2641.58 samples/sec Loss 8.2776 LearningRate 0.0371 Epoch: 7 Global Step: 323980 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:07:44,244-Speed 2631.78 samples/sec Loss 8.0367 LearningRate 0.0371 Epoch: 7 Global Step: 323990 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:07:48,138-Speed 2629.56 samples/sec Loss 8.2100 LearningRate 0.0371 Epoch: 7 Global Step: 324000 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:07:52,015-Speed 2642.45 samples/sec Loss 8.0615 LearningRate 0.0371 Epoch: 7 Global Step: 324010 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:07:55,905-Speed 2632.99 samples/sec Loss 8.1882 LearningRate 0.0371 Epoch: 7 Global Step: 324020 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:07:59,805-Speed 2626.12 samples/sec Loss 8.2857 LearningRate 0.0371 Epoch: 7 Global Step: 324030 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:08:03,700-Speed 2629.67 samples/sec Loss 8.1728 LearningRate 0.0371 Epoch: 7 Global Step: 324040 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:08:07,595-Speed 2629.88 samples/sec Loss 8.2033 LearningRate 0.0371 Epoch: 7 Global Step: 324050 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:08:11,487-Speed 2630.98 samples/sec Loss 8.1857 LearningRate 0.0371 Epoch: 7 Global Step: 324060 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:08:15,379-Speed 2631.98 samples/sec Loss 8.1555 LearningRate 0.0371 Epoch: 7 Global Step: 324070 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:08:19,270-Speed 2632.05 samples/sec Loss 8.1471 LearningRate 0.0371 Epoch: 7 Global Step: 324080 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:08:23,175-Speed 2622.95 samples/sec Loss 8.1432 LearningRate 0.0371 Epoch: 7 Global Step: 324090 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:08:27,075-Speed 2626.37 samples/sec Loss 8.2290 LearningRate 0.0371 Epoch: 7 Global Step: 324100 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:08:30,970-Speed 2629.82 samples/sec Loss 8.1153 LearningRate 0.0371 Epoch: 7 Global Step: 324110 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:08:34,875-Speed 2622.57 samples/sec Loss 8.2192 LearningRate 0.0371 Epoch: 7 Global Step: 324120 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:08:38,778-Speed 2624.60 samples/sec Loss 8.2261 LearningRate 0.0371 Epoch: 7 Global Step: 324130 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:08:42,671-Speed 2631.13 samples/sec Loss 8.2270 LearningRate 0.0371 Epoch: 7 Global Step: 324140 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:08:46,567-Speed 2629.25 samples/sec Loss 8.1644 LearningRate 0.0371 Epoch: 7 Global Step: 324150 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:08:50,460-Speed 2631.05 samples/sec Loss 8.2495 LearningRate 0.0371 Epoch: 7 Global Step: 324160 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:08:54,354-Speed 2629.91 samples/sec Loss 8.2469 LearningRate 0.0371 Epoch: 7 Global Step: 324170 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:08:58,247-Speed 2631.24 samples/sec Loss 8.2318 LearningRate 0.0371 Epoch: 7 Global Step: 324180 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:09:02,154-Speed 2621.28 samples/sec Loss 8.1349 LearningRate 0.0371 Epoch: 7 Global Step: 324190 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:09:06,047-Speed 2631.17 samples/sec Loss 8.2776 LearningRate 0.0371 Epoch: 7 Global Step: 324200 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:09:09,927-Speed 2639.43 samples/sec Loss 8.1813 LearningRate 0.0371 Epoch: 7 Global Step: 324210 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:09:13,820-Speed 2630.91 samples/sec Loss 8.2136 LearningRate 0.0371 Epoch: 7 Global Step: 324220 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:09:17,714-Speed 2630.52 samples/sec Loss 8.1511 LearningRate 0.0371 Epoch: 7 Global Step: 324230 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:09:21,605-Speed 2632.31 samples/sec Loss 8.2374 LearningRate 0.0371 Epoch: 7 Global Step: 324240 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:09:25,497-Speed 2631.89 samples/sec Loss 8.1928 LearningRate 0.0371 Epoch: 7 Global Step: 324250 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:09:29,406-Speed 2620.25 samples/sec Loss 8.1896 LearningRate 0.0371 Epoch: 7 Global Step: 324260 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:09:33,297-Speed 2632.27 samples/sec Loss 8.2167 LearningRate 0.0371 Epoch: 7 Global Step: 324270 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:09:37,204-Speed 2621.12 samples/sec Loss 8.1037 LearningRate 0.0371 Epoch: 7 Global Step: 324280 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:09:41,094-Speed 2632.75 samples/sec Loss 8.2433 LearningRate 0.0371 Epoch: 7 Global Step: 324290 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:09:44,999-Speed 2623.22 samples/sec Loss 8.1867 LearningRate 0.0371 Epoch: 7 Global Step: 324300 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:09:48,889-Speed 2632.50 samples/sec Loss 8.2536 LearningRate 0.0371 Epoch: 7 Global Step: 324310 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 08:09:52,786-Speed 2629.05 samples/sec Loss 8.1671 LearningRate 0.0371 Epoch: 7 Global Step: 324320 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 08:09:56,683-Speed 2627.81 samples/sec Loss 8.2178 LearningRate 0.0371 Epoch: 7 Global Step: 324330 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 08:10:00,573-Speed 2635.54 samples/sec Loss 8.2591 LearningRate 0.0371 Epoch: 7 Global Step: 324340 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 08:10:04,451-Speed 2641.02 samples/sec Loss 8.0822 LearningRate 0.0371 Epoch: 7 Global Step: 324350 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:10:08,327-Speed 2642.72 samples/sec Loss 8.1992 LearningRate 0.0371 Epoch: 7 Global Step: 324360 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:10:12,219-Speed 2631.65 samples/sec Loss 8.1305 LearningRate 0.0371 Epoch: 7 Global Step: 324370 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:10:16,114-Speed 2629.57 samples/sec Loss 8.1562 LearningRate 0.0371 Epoch: 7 Global Step: 324380 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:10:20,005-Speed 2632.24 samples/sec Loss 8.1906 LearningRate 0.0371 Epoch: 7 Global Step: 324390 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:10:23,896-Speed 2632.24 samples/sec Loss 8.2135 LearningRate 0.0371 Epoch: 7 Global Step: 324400 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:10:27,787-Speed 2631.92 samples/sec Loss 8.0871 LearningRate 0.0371 Epoch: 7 Global Step: 324410 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:10:31,676-Speed 2634.30 samples/sec Loss 8.2739 LearningRate 0.0371 Epoch: 7 Global Step: 324420 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:10:35,570-Speed 2630.10 samples/sec Loss 8.1621 LearningRate 0.0371 Epoch: 7 Global Step: 324430 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:10:39,474-Speed 2623.49 samples/sec Loss 8.0331 LearningRate 0.0371 Epoch: 7 Global Step: 324440 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:10:43,402-Speed 2607.76 samples/sec Loss 8.2574 LearningRate 0.0371 Epoch: 7 Global Step: 324450 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:10:47,295-Speed 2630.89 samples/sec Loss 8.1786 LearningRate 0.0371 Epoch: 7 Global Step: 324460 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:10:51,172-Speed 2641.71 samples/sec Loss 8.1159 LearningRate 0.0371 Epoch: 7 Global Step: 324470 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:10:55,072-Speed 2626.98 samples/sec Loss 8.2105 LearningRate 0.0371 Epoch: 7 Global Step: 324480 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:10:58,964-Speed 2631.69 samples/sec Loss 8.1302 LearningRate 0.0371 Epoch: 7 Global Step: 324490 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:11:02,860-Speed 2628.47 samples/sec Loss 8.2334 LearningRate 0.0371 Epoch: 7 Global Step: 324500 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:11:06,764-Speed 2623.93 samples/sec Loss 8.1632 LearningRate 0.0371 Epoch: 7 Global Step: 324510 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:11:10,662-Speed 2627.71 samples/sec Loss 8.1607 LearningRate 0.0371 Epoch: 7 Global Step: 324520 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:11:14,558-Speed 2628.67 samples/sec Loss 8.0956 LearningRate 0.0371 Epoch: 7 Global Step: 324530 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:11:18,458-Speed 2626.05 samples/sec Loss 8.2098 LearningRate 0.0371 Epoch: 7 Global Step: 324540 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:11:22,359-Speed 2625.60 samples/sec Loss 8.1457 LearningRate 0.0371 Epoch: 7 Global Step: 324550 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:11:26,259-Speed 2626.27 samples/sec Loss 8.2800 LearningRate 0.0371 Epoch: 7 Global Step: 324560 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:11:30,155-Speed 2629.17 samples/sec Loss 8.1249 LearningRate 0.0371 Epoch: 7 Global Step: 324570 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:11:34,050-Speed 2629.66 samples/sec Loss 8.1857 LearningRate 0.0371 Epoch: 7 Global Step: 324580 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:11:37,955-Speed 2622.40 samples/sec Loss 8.2117 LearningRate 0.0371 Epoch: 7 Global Step: 324590 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:11:41,853-Speed 2627.58 samples/sec Loss 8.1500 LearningRate 0.0371 Epoch: 7 Global Step: 324600 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:11:45,752-Speed 2627.46 samples/sec Loss 8.1789 LearningRate 0.0371 Epoch: 7 Global Step: 324610 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:11:49,649-Speed 2627.89 samples/sec Loss 8.0609 LearningRate 0.0371 Epoch: 7 Global Step: 324620 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:11:53,547-Speed 2627.96 samples/sec Loss 8.2062 LearningRate 0.0370 Epoch: 7 Global Step: 324630 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:11:57,452-Speed 2623.13 samples/sec Loss 8.3936 LearningRate 0.0370 Epoch: 7 Global Step: 324640 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:12:01,346-Speed 2629.93 samples/sec Loss 8.1562 LearningRate 0.0370 Epoch: 7 Global Step: 324650 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:12:05,236-Speed 2632.88 samples/sec Loss 8.2340 LearningRate 0.0370 Epoch: 7 Global Step: 324660 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:12:09,127-Speed 2632.14 samples/sec Loss 8.2560 LearningRate 0.0370 Epoch: 7 Global Step: 324670 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 08:12:13,008-Speed 2639.26 samples/sec Loss 8.2580 LearningRate 0.0370 Epoch: 7 Global Step: 324680 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:12:16,905-Speed 2628.37 samples/sec Loss 8.3107 LearningRate 0.0370 Epoch: 7 Global Step: 324690 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:12:20,806-Speed 2625.56 samples/sec Loss 8.0904 LearningRate 0.0370 Epoch: 7 Global Step: 324700 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:12:24,700-Speed 2630.11 samples/sec Loss 8.2057 LearningRate 0.0370 Epoch: 7 Global Step: 324710 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:12:28,495-Speed 2699.44 samples/sec Loss 8.4634 LearningRate 0.0370 Epoch: 7 Global Step: 324720 Fp16 Grad Scale: 1024 Required: 57 hours
Training: 2022-04-14 08:12:32,380-Speed 2636.59 samples/sec Loss 8.4139 LearningRate 0.0370 Epoch: 7 Global Step: 324730 Fp16 Grad Scale: 1024 Required: 57 hours
Training: 2022-04-14 08:12:36,331-Speed 2592.03 samples/sec Loss 8.2323 LearningRate 0.0370 Epoch: 7 Global Step: 324740 Fp16 Grad Scale: 1024 Required: 57 hours
Training: 2022-04-14 08:12:40,232-Speed 2625.42 samples/sec Loss 8.1621 LearningRate 0.0370 Epoch: 7 Global Step: 324750 Fp16 Grad Scale: 1024 Required: 57 hours
Training: 2022-04-14 08:12:44,120-Speed 2634.29 samples/sec Loss 8.3162 LearningRate 0.0370 Epoch: 7 Global Step: 324760 Fp16 Grad Scale: 1024 Required: 57 hours
Training: 2022-04-14 08:12:48,013-Speed 2630.86 samples/sec Loss 8.2872 LearningRate 0.0370 Epoch: 7 Global Step: 324770 Fp16 Grad Scale: 1024 Required: 57 hours
Training: 2022-04-14 08:12:51,911-Speed 2627.99 samples/sec Loss 8.1777 LearningRate 0.0370 Epoch: 7 Global Step: 324780 Fp16 Grad Scale: 1024 Required: 57 hours
Training: 2022-04-14 08:12:55,804-Speed 2630.45 samples/sec Loss 8.1920 LearningRate 0.0370 Epoch: 7 Global Step: 324790 Fp16 Grad Scale: 1024 Required: 57 hours
Training: 2022-04-14 08:12:59,701-Speed 2628.52 samples/sec Loss 8.2690 LearningRate 0.0370 Epoch: 7 Global Step: 324800 Fp16 Grad Scale: 1024 Required: 57 hours
Training: 2022-04-14 08:13:03,597-Speed 2628.97 samples/sec Loss 8.1036 LearningRate 0.0370 Epoch: 7 Global Step: 324810 Fp16 Grad Scale: 1024 Required: 57 hours
Training: 2022-04-14 08:13:07,491-Speed 2630.09 samples/sec Loss 8.1956 LearningRate 0.0370 Epoch: 7 Global Step: 324820 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 08:13:11,397-Speed 2622.61 samples/sec Loss 8.1517 LearningRate 0.0370 Epoch: 7 Global Step: 324830 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 08:13:15,291-Speed 2630.29 samples/sec Loss 8.1446 LearningRate 0.0370 Epoch: 7 Global Step: 324840 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 08:13:19,179-Speed 2633.89 samples/sec Loss 8.1770 LearningRate 0.0370 Epoch: 7 Global Step: 324850 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 08:13:23,067-Speed 2634.48 samples/sec Loss 8.2195 LearningRate 0.0370 Epoch: 7 Global Step: 324860 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 08:13:26,956-Speed 2633.49 samples/sec Loss 8.2281 LearningRate 0.0370 Epoch: 7 Global Step: 324870 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 08:13:30,842-Speed 2635.78 samples/sec Loss 8.1292 LearningRate 0.0370 Epoch: 7 Global Step: 324880 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 08:13:34,731-Speed 2633.35 samples/sec Loss 8.3120 LearningRate 0.0370 Epoch: 7 Global Step: 324890 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 08:13:38,619-Speed 2634.71 samples/sec Loss 8.2246 LearningRate 0.0370 Epoch: 7 Global Step: 324900 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 08:13:42,512-Speed 2631.14 samples/sec Loss 8.1428 LearningRate 0.0370 Epoch: 7 Global Step: 324910 Fp16 Grad Scale: 2048 Required: 57 hours
Training: 2022-04-14 08:13:46,405-Speed 2630.76 samples/sec Loss 8.1922 LearningRate 0.0370 Epoch: 7 Global Step: 324920 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 08:13:50,302-Speed 2628.69 samples/sec Loss 8.0814 LearningRate 0.0370 Epoch: 7 Global Step: 324930 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 08:13:54,201-Speed 2626.70 samples/sec Loss 8.2327 LearningRate 0.0370 Epoch: 7 Global Step: 324940 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 08:13:58,102-Speed 2625.20 samples/sec Loss 8.2220 LearningRate 0.0370 Epoch: 7 Global Step: 324950 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 08:14:02,043-Speed 2599.32 samples/sec Loss 8.1802 LearningRate 0.0370 Epoch: 7 Global Step: 324960 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 08:14:05,941-Speed 2627.63 samples/sec Loss 8.4914 LearningRate 0.0370 Epoch: 7 Global Step: 324970 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 08:14:09,839-Speed 2627.45 samples/sec Loss 9.7753 LearningRate 0.0370 Epoch: 7 Global Step: 324980 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 08:14:13,730-Speed 2631.86 samples/sec Loss 8.7474 LearningRate 0.0370 Epoch: 7 Global Step: 324990 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 08:14:17,621-Speed 2633.03 samples/sec Loss 8.5662 LearningRate 0.0370 Epoch: 7 Global Step: 325000 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 08:14:21,511-Speed 2633.06 samples/sec Loss 8.6005 LearningRate 0.0370 Epoch: 7 Global Step: 325010 Fp16 Grad Scale: 4096 Required: 57 hours
Training: 2022-04-14 08:14:25,402-Speed 2632.07 samples/sec Loss 8.3873 LearningRate 0.0370 Epoch: 7 Global Step: 325020 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 08:14:29,293-Speed 2632.91 samples/sec Loss 8.3539 LearningRate 0.0370 Epoch: 7 Global Step: 325030 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 08:14:33,182-Speed 2633.10 samples/sec Loss 8.2783 LearningRate 0.0370 Epoch: 7 Global Step: 325040 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 08:14:37,071-Speed 2634.27 samples/sec Loss 8.3541 LearningRate 0.0370 Epoch: 7 Global Step: 325050 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 08:14:40,961-Speed 2633.09 samples/sec Loss 8.2536 LearningRate 0.0370 Epoch: 7 Global Step: 325060 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 08:14:44,856-Speed 2629.43 samples/sec Loss 8.3219 LearningRate 0.0370 Epoch: 7 Global Step: 325070 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 08:14:48,750-Speed 2630.29 samples/sec Loss 8.1877 LearningRate 0.0370 Epoch: 7 Global Step: 325080 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 08:14:52,643-Speed 2631.32 samples/sec Loss 8.2336 LearningRate 0.0370 Epoch: 7 Global Step: 325090 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 08:14:56,539-Speed 2629.80 samples/sec Loss 8.3311 LearningRate 0.0370 Epoch: 7 Global Step: 325100 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 08:15:00,447-Speed 2620.71 samples/sec Loss 8.2135 LearningRate 0.0370 Epoch: 7 Global Step: 325110 Fp16 Grad Scale: 8192 Required: 57 hours
Training: 2022-04-14 08:15:04,344-Speed 2627.92 samples/sec Loss 8.2210 LearningRate 0.0370 Epoch: 7 Global Step: 325120 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 08:15:08,240-Speed 2629.23 samples/sec Loss 8.1852 LearningRate 0.0370 Epoch: 7 Global Step: 325130 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 08:15:12,145-Speed 2622.86 samples/sec Loss 8.1394 LearningRate 0.0370 Epoch: 7 Global Step: 325140 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 08:15:16,076-Speed 2605.28 samples/sec Loss 8.1339 LearningRate 0.0370 Epoch: 7 Global Step: 325150 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 08:15:19,964-Speed 2634.92 samples/sec Loss 8.0512 LearningRate 0.0370 Epoch: 7 Global Step: 325160 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 08:15:23,856-Speed 2632.03 samples/sec Loss 7.9733 LearningRate 0.0370 Epoch: 7 Global Step: 325170 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 08:15:27,765-Speed 2620.52 samples/sec Loss 8.2161 LearningRate 0.0370 Epoch: 7 Global Step: 325180 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 08:15:31,662-Speed 2627.70 samples/sec Loss 8.3257 LearningRate 0.0370 Epoch: 7 Global Step: 325190 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 08:15:35,562-Speed 2626.65 samples/sec Loss 8.1498 LearningRate 0.0370 Epoch: 7 Global Step: 325200 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 08:15:39,458-Speed 2629.01 samples/sec Loss 8.2505 LearningRate 0.0370 Epoch: 7 Global Step: 325210 Fp16 Grad Scale: 16384 Required: 57 hours
Training: 2022-04-14 08:15:43,358-Speed 2625.95 samples/sec Loss 8.1995 LearningRate 0.0370 Epoch: 7 Global Step: 325220 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:15:47,249-Speed 2632.33 samples/sec Loss 8.1666 LearningRate 0.0370 Epoch: 7 Global Step: 325230 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:15:51,142-Speed 2630.93 samples/sec Loss 8.1047 LearningRate 0.0370 Epoch: 7 Global Step: 325240 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:15:55,035-Speed 2631.27 samples/sec Loss 8.0400 LearningRate 0.0370 Epoch: 7 Global Step: 325250 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:15:58,928-Speed 2630.87 samples/sec Loss 8.2464 LearningRate 0.0370 Epoch: 7 Global Step: 325260 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:16:02,818-Speed 2633.06 samples/sec Loss 8.1512 LearningRate 0.0370 Epoch: 7 Global Step: 325270 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:16:06,716-Speed 2628.19 samples/sec Loss 8.1529 LearningRate 0.0370 Epoch: 7 Global Step: 325280 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:16:10,600-Speed 2637.20 samples/sec Loss 8.2101 LearningRate 0.0370 Epoch: 7 Global Step: 325290 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:16:14,495-Speed 2628.96 samples/sec Loss 8.2998 LearningRate 0.0370 Epoch: 7 Global Step: 325300 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:16:18,394-Speed 2626.88 samples/sec Loss 8.1014 LearningRate 0.0369 Epoch: 7 Global Step: 325310 Fp16 Grad Scale: 32768 Required: 57 hours
Training: 2022-04-14 08:16:22,294-Speed 2626.35 samples/sec Loss 8.2315 LearningRate 0.0369 Epoch: 7 Global Step: 325320 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:16:26,200-Speed 2622.39 samples/sec Loss 8.3914 LearningRate 0.0369 Epoch: 7 Global Step: 325330 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:16:30,095-Speed 2629.61 samples/sec Loss 8.1339 LearningRate 0.0369 Epoch: 7 Global Step: 325340 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:16:33,988-Speed 2631.56 samples/sec Loss 8.1389 LearningRate 0.0369 Epoch: 7 Global Step: 325350 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:16:37,876-Speed 2634.18 samples/sec Loss 8.1974 LearningRate 0.0369 Epoch: 7 Global Step: 325360 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:16:41,766-Speed 2633.67 samples/sec Loss 8.1706 LearningRate 0.0369 Epoch: 7 Global Step: 325370 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:16:45,657-Speed 2631.63 samples/sec Loss 8.2502 LearningRate 0.0369 Epoch: 7 Global Step: 325380 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:16:49,546-Speed 2633.86 samples/sec Loss 8.2727 LearningRate 0.0369 Epoch: 7 Global Step: 325390 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:16:53,441-Speed 2629.55 samples/sec Loss 8.1787 LearningRate 0.0369 Epoch: 7 Global Step: 325400 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:16:57,370-Speed 2607.33 samples/sec Loss 8.2321 LearningRate 0.0369 Epoch: 7 Global Step: 325410 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:17:01,270-Speed 2625.74 samples/sec Loss 8.2805 LearningRate 0.0369 Epoch: 7 Global Step: 325420 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:17:05,170-Speed 2626.50 samples/sec Loss 8.0914 LearningRate 0.0369 Epoch: 7 Global Step: 325430 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:17:09,060-Speed 2632.80 samples/sec Loss 8.1180 LearningRate 0.0369 Epoch: 7 Global Step: 325440 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:17:12,949-Speed 2634.21 samples/sec Loss 8.0060 LearningRate 0.0369 Epoch: 7 Global Step: 325450 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:17:16,841-Speed 2631.99 samples/sec Loss 8.2054 LearningRate 0.0369 Epoch: 7 Global Step: 325460 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:17:20,798-Speed 2588.54 samples/sec Loss 8.2214 LearningRate 0.0369 Epoch: 7 Global Step: 325470 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:17:24,864-Speed 2519.02 samples/sec Loss 8.2392 LearningRate 0.0369 Epoch: 7 Global Step: 325480 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:17:28,795-Speed 2605.40 samples/sec Loss 8.1915 LearningRate 0.0369 Epoch: 7 Global Step: 325490 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:17:32,699-Speed 2624.03 samples/sec Loss 8.1593 LearningRate 0.0369 Epoch: 7 Global Step: 325500 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:17:36,598-Speed 2626.70 samples/sec Loss 8.1851 LearningRate 0.0369 Epoch: 7 Global Step: 325510 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:17:40,493-Speed 2630.11 samples/sec Loss 8.1199 LearningRate 0.0369 Epoch: 7 Global Step: 325520 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:17:44,419-Speed 2608.45 samples/sec Loss 8.1724 LearningRate 0.0369 Epoch: 7 Global Step: 325530 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:17:48,424-Speed 2557.85 samples/sec Loss 8.2100 LearningRate 0.0369 Epoch: 7 Global Step: 325540 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:17:52,313-Speed 2634.34 samples/sec Loss 8.1241 LearningRate 0.0369 Epoch: 7 Global Step: 325550 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:17:56,269-Speed 2588.37 samples/sec Loss 8.2377 LearningRate 0.0369 Epoch: 7 Global Step: 325560 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:18:00,354-Speed 2507.62 samples/sec Loss 8.1252 LearningRate 0.0369 Epoch: 7 Global Step: 325570 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:18:04,246-Speed 2632.12 samples/sec Loss 8.1792 LearningRate 0.0369 Epoch: 7 Global Step: 325580 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:18:08,137-Speed 2632.03 samples/sec Loss 8.1602 LearningRate 0.0369 Epoch: 7 Global Step: 325590 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:18:12,043-Speed 2621.79 samples/sec Loss 8.2393 LearningRate 0.0369 Epoch: 7 Global Step: 325600 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:18:15,938-Speed 2630.92 samples/sec Loss 8.2439 LearningRate 0.0369 Epoch: 7 Global Step: 325610 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:18:19,843-Speed 2623.30 samples/sec Loss 8.0668 LearningRate 0.0369 Epoch: 7 Global Step: 325620 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:18:23,736-Speed 2631.00 samples/sec Loss 8.2349 LearningRate 0.0369 Epoch: 7 Global Step: 325630 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:18:27,629-Speed 2631.13 samples/sec Loss 7.9739 LearningRate 0.0369 Epoch: 7 Global Step: 325640 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:18:31,643-Speed 2551.70 samples/sec Loss 8.2249 LearningRate 0.0369 Epoch: 7 Global Step: 325650 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:18:35,527-Speed 2637.30 samples/sec Loss 8.1013 LearningRate 0.0369 Epoch: 7 Global Step: 325660 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:18:39,416-Speed 2633.36 samples/sec Loss 8.2822 LearningRate 0.0369 Epoch: 7 Global Step: 325670 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:18:43,306-Speed 2632.54 samples/sec Loss 8.1805 LearningRate 0.0369 Epoch: 7 Global Step: 325680 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:18:47,197-Speed 2632.55 samples/sec Loss 8.1327 LearningRate 0.0369 Epoch: 7 Global Step: 325690 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:18:51,089-Speed 2632.15 samples/sec Loss 8.0916 LearningRate 0.0369 Epoch: 7 Global Step: 325700 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:18:54,979-Speed 2633.48 samples/sec Loss 8.1488 LearningRate 0.0369 Epoch: 7 Global Step: 325710 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:18:58,875-Speed 2628.24 samples/sec Loss 8.1577 LearningRate 0.0369 Epoch: 7 Global Step: 325720 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:19:02,767-Speed 2632.09 samples/sec Loss 8.2222 LearningRate 0.0369 Epoch: 7 Global Step: 325730 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:19:06,671-Speed 2623.57 samples/sec Loss 8.2740 LearningRate 0.0369 Epoch: 7 Global Step: 325740 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:19:10,572-Speed 2625.54 samples/sec Loss 8.1048 LearningRate 0.0369 Epoch: 7 Global Step: 325750 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:19:14,463-Speed 2631.78 samples/sec Loss 8.2022 LearningRate 0.0369 Epoch: 7 Global Step: 325760 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:19:18,404-Speed 2599.18 samples/sec Loss 8.0972 LearningRate 0.0369 Epoch: 7 Global Step: 325770 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:19:22,294-Speed 2633.25 samples/sec Loss 8.2073 LearningRate 0.0369 Epoch: 7 Global Step: 325780 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:19:26,188-Speed 2630.78 samples/sec Loss 8.2338 LearningRate 0.0369 Epoch: 7 Global Step: 325790 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:19:30,082-Speed 2630.08 samples/sec Loss 8.1638 LearningRate 0.0369 Epoch: 7 Global Step: 325800 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:19:33,979-Speed 2628.03 samples/sec Loss 8.2014 LearningRate 0.0369 Epoch: 7 Global Step: 325810 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:19:37,888-Speed 2620.34 samples/sec Loss 8.2862 LearningRate 0.0369 Epoch: 7 Global Step: 325820 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:19:41,789-Speed 2625.60 samples/sec Loss 8.2135 LearningRate 0.0369 Epoch: 7 Global Step: 325830 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:19:45,692-Speed 2624.52 samples/sec Loss 8.1700 LearningRate 0.0369 Epoch: 7 Global Step: 325840 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:19:49,607-Speed 2616.08 samples/sec Loss 8.1419 LearningRate 0.0369 Epoch: 7 Global Step: 325850 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:19:53,517-Speed 2619.92 samples/sec Loss 8.2179 LearningRate 0.0369 Epoch: 7 Global Step: 325860 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 08:19:57,422-Speed 2622.63 samples/sec Loss 8.2351 LearningRate 0.0369 Epoch: 7 Global Step: 325870 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:20:01,333-Speed 2618.84 samples/sec Loss 8.2260 LearningRate 0.0369 Epoch: 7 Global Step: 325880 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:20:05,251-Speed 2614.21 samples/sec Loss 8.1209 LearningRate 0.0369 Epoch: 7 Global Step: 325890 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:20:09,150-Speed 2626.94 samples/sec Loss 8.2907 LearningRate 0.0369 Epoch: 7 Global Step: 325900 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:20:13,041-Speed 2632.14 samples/sec Loss 8.0296 LearningRate 0.0369 Epoch: 7 Global Step: 325910 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:20:16,933-Speed 2632.26 samples/sec Loss 8.1545 LearningRate 0.0369 Epoch: 7 Global Step: 325920 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:20:20,804-Speed 2645.27 samples/sec Loss 8.0361 LearningRate 0.0369 Epoch: 7 Global Step: 325930 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:20:24,695-Speed 2632.50 samples/sec Loss 8.1655 LearningRate 0.0369 Epoch: 7 Global Step: 325940 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:20:28,584-Speed 2633.72 samples/sec Loss 8.1514 LearningRate 0.0369 Epoch: 7 Global Step: 325950 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:20:32,477-Speed 2631.05 samples/sec Loss 8.1053 LearningRate 0.0369 Epoch: 7 Global Step: 325960 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:20:36,368-Speed 2632.11 samples/sec Loss 8.1733 LearningRate 0.0369 Epoch: 7 Global Step: 325970 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:20:40,264-Speed 2629.02 samples/sec Loss 8.0660 LearningRate 0.0369 Epoch: 7 Global Step: 325980 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:20:44,156-Speed 2631.79 samples/sec Loss 8.1463 LearningRate 0.0369 Epoch: 7 Global Step: 325990 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:20:48,050-Speed 2629.99 samples/sec Loss 8.0442 LearningRate 0.0368 Epoch: 7 Global Step: 326000 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:20:51,947-Speed 2628.95 samples/sec Loss 8.0426 LearningRate 0.0368 Epoch: 7 Global Step: 326010 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:20:55,838-Speed 2632.01 samples/sec Loss 8.2487 LearningRate 0.0368 Epoch: 7 Global Step: 326020 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:20:59,729-Speed 2632.12 samples/sec Loss 8.0900 LearningRate 0.0368 Epoch: 7 Global Step: 326030 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:21:03,638-Speed 2620.31 samples/sec Loss 8.1789 LearningRate 0.0368 Epoch: 7 Global Step: 326040 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:21:07,541-Speed 2625.00 samples/sec Loss 8.1231 LearningRate 0.0368 Epoch: 7 Global Step: 326050 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:21:11,440-Speed 2626.67 samples/sec Loss 8.2435 LearningRate 0.0368 Epoch: 7 Global Step: 326060 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:21:15,368-Speed 2607.18 samples/sec Loss 8.2907 LearningRate 0.0368 Epoch: 7 Global Step: 326070 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:21:19,258-Speed 2633.56 samples/sec Loss 7.9792 LearningRate 0.0368 Epoch: 7 Global Step: 326080 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:21:23,150-Speed 2631.75 samples/sec Loss 8.0802 LearningRate 0.0368 Epoch: 7 Global Step: 326090 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:21:27,043-Speed 2631.47 samples/sec Loss 8.1097 LearningRate 0.0368 Epoch: 7 Global Step: 326100 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:21:30,950-Speed 2621.68 samples/sec Loss 8.1579 LearningRate 0.0368 Epoch: 7 Global Step: 326110 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:21:34,844-Speed 2630.40 samples/sec Loss 8.1637 LearningRate 0.0368 Epoch: 7 Global Step: 326120 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:21:38,753-Speed 2620.23 samples/sec Loss 8.0694 LearningRate 0.0368 Epoch: 7 Global Step: 326130 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:21:42,654-Speed 2625.23 samples/sec Loss 8.1450 LearningRate 0.0368 Epoch: 7 Global Step: 326140 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:21:46,552-Speed 2628.15 samples/sec Loss 8.1532 LearningRate 0.0368 Epoch: 7 Global Step: 326150 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:21:50,448-Speed 2629.04 samples/sec Loss 8.2410 LearningRate 0.0368 Epoch: 7 Global Step: 326160 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:21:54,341-Speed 2630.96 samples/sec Loss 8.2696 LearningRate 0.0368 Epoch: 7 Global Step: 326170 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:21:58,232-Speed 2632.69 samples/sec Loss 8.2285 LearningRate 0.0368 Epoch: 7 Global Step: 326180 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:22:02,127-Speed 2629.48 samples/sec Loss 8.0963 LearningRate 0.0368 Epoch: 7 Global Step: 326190 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:22:06,019-Speed 2631.88 samples/sec Loss 8.0706 LearningRate 0.0368 Epoch: 7 Global Step: 326200 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:22:09,919-Speed 2626.07 samples/sec Loss 8.1965 LearningRate 0.0368 Epoch: 7 Global Step: 326210 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:22:13,827-Speed 2621.04 samples/sec Loss 8.0451 LearningRate 0.0368 Epoch: 7 Global Step: 326220 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:22:17,838-Speed 2553.77 samples/sec Loss 8.2903 LearningRate 0.0368 Epoch: 7 Global Step: 326230 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:22:21,814-Speed 2576.94 samples/sec Loss 8.1542 LearningRate 0.0368 Epoch: 7 Global Step: 326240 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:22:25,707-Speed 2630.78 samples/sec Loss 8.1264 LearningRate 0.0368 Epoch: 7 Global Step: 326250 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:22:29,622-Speed 2615.79 samples/sec Loss 8.1908 LearningRate 0.0368 Epoch: 7 Global Step: 326260 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:22:33,529-Speed 2622.02 samples/sec Loss 8.1861 LearningRate 0.0368 Epoch: 7 Global Step: 326270 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 08:22:37,422-Speed 2630.94 samples/sec Loss 8.1376 LearningRate 0.0368 Epoch: 7 Global Step: 326280 Fp16 Grad Scale: 262144 Required: 57 hours
Training: 2022-04-14 08:22:41,295-Speed 2644.87 samples/sec Loss 8.2118 LearningRate 0.0368 Epoch: 7 Global Step: 326290 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:22:45,195-Speed 2626.21 samples/sec Loss 8.1495 LearningRate 0.0368 Epoch: 7 Global Step: 326300 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:22:49,088-Speed 2630.62 samples/sec Loss 8.0880 LearningRate 0.0368 Epoch: 7 Global Step: 326310 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:22:52,984-Speed 2628.89 samples/sec Loss 8.0397 LearningRate 0.0368 Epoch: 7 Global Step: 326320 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:22:56,870-Speed 2635.77 samples/sec Loss 8.1632 LearningRate 0.0368 Epoch: 7 Global Step: 326330 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:23:00,758-Speed 2634.63 samples/sec Loss 8.1328 LearningRate 0.0368 Epoch: 7 Global Step: 326340 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:23:04,656-Speed 2627.29 samples/sec Loss 8.1810 LearningRate 0.0368 Epoch: 7 Global Step: 326350 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:23:08,554-Speed 2627.92 samples/sec Loss 8.0872 LearningRate 0.0368 Epoch: 7 Global Step: 326360 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:23:12,458-Speed 2623.86 samples/sec Loss 8.2099 LearningRate 0.0368 Epoch: 7 Global Step: 326370 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:23:16,356-Speed 2627.35 samples/sec Loss 8.1177 LearningRate 0.0368 Epoch: 7 Global Step: 326380 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:23:20,260-Speed 2623.50 samples/sec Loss 8.1818 LearningRate 0.0368 Epoch: 7 Global Step: 326390 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:23:24,158-Speed 2627.85 samples/sec Loss 8.1492 LearningRate 0.0368 Epoch: 7 Global Step: 326400 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:23:28,206-Speed 2529.91 samples/sec Loss 8.2405 LearningRate 0.0368 Epoch: 7 Global Step: 326410 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:23:32,109-Speed 2624.28 samples/sec Loss 8.1460 LearningRate 0.0368 Epoch: 7 Global Step: 326420 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:23:36,005-Speed 2629.10 samples/sec Loss 8.1566 LearningRate 0.0368 Epoch: 7 Global Step: 326430 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:23:39,921-Speed 2615.27 samples/sec Loss 8.2216 LearningRate 0.0368 Epoch: 7 Global Step: 326440 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:23:43,928-Speed 2556.20 samples/sec Loss 8.2039 LearningRate 0.0368 Epoch: 7 Global Step: 326450 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:23:47,821-Speed 2631.02 samples/sec Loss 8.0517 LearningRate 0.0368 Epoch: 7 Global Step: 326460 Fp16 Grad Scale: 131072 Required: 57 hours
Training: 2022-04-14 08:23:51,704-Speed 2637.59 samples/sec Loss 8.2661 LearningRate 0.0368 Epoch: 7 Global Step: 326470 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:23:55,602-Speed 2627.79 samples/sec Loss 8.0343 LearningRate 0.0368 Epoch: 7 Global Step: 326480 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:23:59,506-Speed 2623.77 samples/sec Loss 8.0780 LearningRate 0.0368 Epoch: 7 Global Step: 326490 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:24:03,420-Speed 2616.62 samples/sec Loss 8.1274 LearningRate 0.0368 Epoch: 7 Global Step: 326500 Fp16 Grad Scale: 65536 Required: 57 hours
Training: 2022-04-14 08:24:07,324-Speed 2623.85 samples/sec Loss 8.2094 LearningRate 0.0368 Epoch: 7 Global Step: 326510 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:24:11,229-Speed 2622.90 samples/sec Loss 8.1154 LearningRate 0.0368 Epoch: 7 Global Step: 326520 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:24:15,130-Speed 2625.42 samples/sec Loss 8.2928 LearningRate 0.0368 Epoch: 7 Global Step: 326530 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:24:19,043-Speed 2617.42 samples/sec Loss 8.0782 LearningRate 0.0368 Epoch: 7 Global Step: 326540 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:24:22,939-Speed 2629.17 samples/sec Loss 8.1641 LearningRate 0.0368 Epoch: 7 Global Step: 326550 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:24:26,836-Speed 2628.21 samples/sec Loss 8.1989 LearningRate 0.0368 Epoch: 7 Global Step: 326560 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:24:30,732-Speed 2629.30 samples/sec Loss 8.1182 LearningRate 0.0368 Epoch: 7 Global Step: 326570 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:24:34,627-Speed 2629.31 samples/sec Loss 8.1735 LearningRate 0.0368 Epoch: 7 Global Step: 326580 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:24:38,520-Speed 2630.83 samples/sec Loss 8.2256 LearningRate 0.0368 Epoch: 7 Global Step: 326590 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:24:42,518-Speed 2561.69 samples/sec Loss 8.1668 LearningRate 0.0368 Epoch: 7 Global Step: 326600 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:24:46,437-Speed 2614.31 samples/sec Loss 8.1214 LearningRate 0.0368 Epoch: 7 Global Step: 326610 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:24:50,341-Speed 2623.31 samples/sec Loss 8.2659 LearningRate 0.0368 Epoch: 7 Global Step: 326620 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:24:54,278-Speed 2602.16 samples/sec Loss 8.2757 LearningRate 0.0368 Epoch: 7 Global Step: 326630 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:24:58,177-Speed 2626.45 samples/sec Loss 8.0873 LearningRate 0.0368 Epoch: 7 Global Step: 326640 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:25:02,078-Speed 2626.59 samples/sec Loss 8.2161 LearningRate 0.0368 Epoch: 7 Global Step: 326650 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:25:05,976-Speed 2627.13 samples/sec Loss 8.1336 LearningRate 0.0368 Epoch: 7 Global Step: 326660 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:25:09,893-Speed 2614.74 samples/sec Loss 8.1433 LearningRate 0.0368 Epoch: 7 Global Step: 326670 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:25:13,816-Speed 2610.70 samples/sec Loss 7.9984 LearningRate 0.0367 Epoch: 7 Global Step: 326680 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:25:17,721-Speed 2623.57 samples/sec Loss 8.1502 LearningRate 0.0367 Epoch: 7 Global Step: 326690 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:25:21,628-Speed 2621.67 samples/sec Loss 8.2783 LearningRate 0.0367 Epoch: 7 Global Step: 326700 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:25:25,533-Speed 2623.20 samples/sec Loss 8.1706 LearningRate 0.0367 Epoch: 7 Global Step: 326710 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:25:29,437-Speed 2623.60 samples/sec Loss 8.0965 LearningRate 0.0367 Epoch: 7 Global Step: 326720 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:25:33,340-Speed 2624.03 samples/sec Loss 8.0767 LearningRate 0.0367 Epoch: 7 Global Step: 326730 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:25:37,253-Speed 2617.86 samples/sec Loss 8.2058 LearningRate 0.0367 Epoch: 7 Global Step: 326740 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:25:41,160-Speed 2621.73 samples/sec Loss 8.2440 LearningRate 0.0367 Epoch: 7 Global Step: 326750 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:25:45,065-Speed 2622.24 samples/sec Loss 8.0394 LearningRate 0.0367 Epoch: 7 Global Step: 326760 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:25:48,964-Speed 2627.55 samples/sec Loss 8.1224 LearningRate 0.0367 Epoch: 7 Global Step: 326770 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:25:52,871-Speed 2622.03 samples/sec Loss 8.1282 LearningRate 0.0367 Epoch: 7 Global Step: 326780 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:25:56,747-Speed 2642.48 samples/sec Loss 8.2175 LearningRate 0.0367 Epoch: 7 Global Step: 326790 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:26:00,673-Speed 2609.05 samples/sec Loss 8.2444 LearningRate 0.0367 Epoch: 7 Global Step: 326800 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:26:04,569-Speed 2628.87 samples/sec Loss 8.0896 LearningRate 0.0367 Epoch: 7 Global Step: 326810 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:26:08,500-Speed 2605.59 samples/sec Loss 8.1243 LearningRate 0.0367 Epoch: 7 Global Step: 326820 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:26:12,399-Speed 2627.18 samples/sec Loss 8.1964 LearningRate 0.0367 Epoch: 7 Global Step: 326830 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:26:16,299-Speed 2626.58 samples/sec Loss 8.2458 LearningRate 0.0367 Epoch: 7 Global Step: 326840 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:26:20,209-Speed 2619.69 samples/sec Loss 8.0723 LearningRate 0.0367 Epoch: 7 Global Step: 326850 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:26:24,104-Speed 2629.08 samples/sec Loss 8.1573 LearningRate 0.0367 Epoch: 7 Global Step: 326860 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:26:28,001-Speed 2628.71 samples/sec Loss 8.0980 LearningRate 0.0367 Epoch: 7 Global Step: 326870 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:26:31,894-Speed 2630.82 samples/sec Loss 8.1555 LearningRate 0.0367 Epoch: 7 Global Step: 326880 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:26:35,812-Speed 2614.35 samples/sec Loss 8.1039 LearningRate 0.0367 Epoch: 7 Global Step: 326890 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:26:39,720-Speed 2621.34 samples/sec Loss 8.0641 LearningRate 0.0367 Epoch: 7 Global Step: 326900 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:26:43,594-Speed 2643.53 samples/sec Loss 8.1763 LearningRate 0.0367 Epoch: 7 Global Step: 326910 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:26:47,495-Speed 2625.90 samples/sec Loss 8.1100 LearningRate 0.0367 Epoch: 7 Global Step: 326920 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:26:51,396-Speed 2625.86 samples/sec Loss 8.1091 LearningRate 0.0367 Epoch: 7 Global Step: 326930 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:26:55,574-Speed 2451.51 samples/sec Loss 8.0540 LearningRate 0.0367 Epoch: 7 Global Step: 326940 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:26:59,463-Speed 2634.08 samples/sec Loss 8.0526 LearningRate 0.0367 Epoch: 7 Global Step: 326950 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:27:03,354-Speed 2631.95 samples/sec Loss 8.2035 LearningRate 0.0367 Epoch: 7 Global Step: 326960 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:27:07,249-Speed 2629.88 samples/sec Loss 8.2878 LearningRate 0.0367 Epoch: 7 Global Step: 326970 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:27:11,142-Speed 2630.93 samples/sec Loss 8.1960 LearningRate 0.0367 Epoch: 7 Global Step: 326980 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:27:15,054-Speed 2618.51 samples/sec Loss 8.1780 LearningRate 0.0367 Epoch: 7 Global Step: 326990 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:27:18,958-Speed 2623.78 samples/sec Loss 8.2624 LearningRate 0.0367 Epoch: 7 Global Step: 327000 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:27:22,864-Speed 2622.27 samples/sec Loss 8.0982 LearningRate 0.0367 Epoch: 7 Global Step: 327010 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:27:26,791-Speed 2608.64 samples/sec Loss 8.0944 LearningRate 0.0367 Epoch: 7 Global Step: 327020 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:27:30,698-Speed 2622.16 samples/sec Loss 8.1071 LearningRate 0.0367 Epoch: 7 Global Step: 327030 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:27:34,635-Speed 2601.76 samples/sec Loss 8.1806 LearningRate 0.0367 Epoch: 7 Global Step: 327040 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:27:38,528-Speed 2631.24 samples/sec Loss 8.2596 LearningRate 0.0367 Epoch: 7 Global Step: 327050 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:27:42,417-Speed 2633.35 samples/sec Loss 8.2442 LearningRate 0.0367 Epoch: 7 Global Step: 327060 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:27:46,404-Speed 2569.30 samples/sec Loss 8.2423 LearningRate 0.0367 Epoch: 7 Global Step: 327070 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:27:50,299-Speed 2629.29 samples/sec Loss 8.1806 LearningRate 0.0367 Epoch: 7 Global Step: 327080 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:27:54,191-Speed 2631.64 samples/sec Loss 8.1455 LearningRate 0.0367 Epoch: 7 Global Step: 327090 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:27:58,085-Speed 2630.94 samples/sec Loss 8.1764 LearningRate 0.0367 Epoch: 7 Global Step: 327100 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:28:01,977-Speed 2631.81 samples/sec Loss 8.2413 LearningRate 0.0367 Epoch: 7 Global Step: 327110 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:28:05,864-Speed 2635.26 samples/sec Loss 8.0954 LearningRate 0.0367 Epoch: 7 Global Step: 327120 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:28:09,760-Speed 2628.98 samples/sec Loss 8.0977 LearningRate 0.0367 Epoch: 7 Global Step: 327130 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:28:13,661-Speed 2625.42 samples/sec Loss 8.1186 LearningRate 0.0367 Epoch: 7 Global Step: 327140 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:28:17,563-Speed 2625.31 samples/sec Loss 8.1151 LearningRate 0.0367 Epoch: 7 Global Step: 327150 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:28:21,498-Speed 2602.85 samples/sec Loss 8.0799 LearningRate 0.0367 Epoch: 7 Global Step: 327160 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:28:25,408-Speed 2619.83 samples/sec Loss 8.1363 LearningRate 0.0367 Epoch: 7 Global Step: 327170 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:28:29,294-Speed 2635.37 samples/sec Loss 8.0896 LearningRate 0.0367 Epoch: 7 Global Step: 327180 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:28:33,238-Speed 2597.46 samples/sec Loss 8.0244 LearningRate 0.0367 Epoch: 7 Global Step: 327190 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:28:37,141-Speed 2624.12 samples/sec Loss 8.0983 LearningRate 0.0367 Epoch: 7 Global Step: 327200 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:28:41,042-Speed 2625.51 samples/sec Loss 8.1584 LearningRate 0.0367 Epoch: 7 Global Step: 327210 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:28:44,935-Speed 2631.23 samples/sec Loss 8.1186 LearningRate 0.0367 Epoch: 7 Global Step: 327220 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:28:49,000-Speed 2520.01 samples/sec Loss 8.1522 LearningRate 0.0367 Epoch: 7 Global Step: 327230 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:28:53,065-Speed 2519.91 samples/sec Loss 8.1547 LearningRate 0.0367 Epoch: 7 Global Step: 327240 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:28:57,095-Speed 2541.36 samples/sec Loss 8.1952 LearningRate 0.0367 Epoch: 7 Global Step: 327250 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:29:00,948-Speed 2658.00 samples/sec Loss 8.1660 LearningRate 0.0367 Epoch: 7 Global Step: 327260 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:29:04,864-Speed 2615.66 samples/sec Loss 8.2206 LearningRate 0.0367 Epoch: 7 Global Step: 327270 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:29:08,760-Speed 2629.10 samples/sec Loss 8.2367 LearningRate 0.0367 Epoch: 7 Global Step: 327280 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:29:12,651-Speed 2632.71 samples/sec Loss 8.1580 LearningRate 0.0367 Epoch: 7 Global Step: 327290 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:29:16,538-Speed 2634.73 samples/sec Loss 8.1345 LearningRate 0.0367 Epoch: 7 Global Step: 327300 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:29:20,426-Speed 2634.93 samples/sec Loss 8.1865 LearningRate 0.0367 Epoch: 7 Global Step: 327310 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:29:24,317-Speed 2632.00 samples/sec Loss 8.1996 LearningRate 0.0367 Epoch: 7 Global Step: 327320 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:29:28,218-Speed 2626.15 samples/sec Loss 7.9910 LearningRate 0.0367 Epoch: 7 Global Step: 327330 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:29:32,121-Speed 2624.26 samples/sec Loss 8.1013 LearningRate 0.0367 Epoch: 7 Global Step: 327340 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:29:36,011-Speed 2632.71 samples/sec Loss 8.1580 LearningRate 0.0367 Epoch: 7 Global Step: 327350 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:29:39,903-Speed 2631.71 samples/sec Loss 8.0383 LearningRate 0.0366 Epoch: 7 Global Step: 327360 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:29:43,807-Speed 2623.65 samples/sec Loss 8.1110 LearningRate 0.0366 Epoch: 7 Global Step: 327370 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:29:47,700-Speed 2631.66 samples/sec Loss 8.0130 LearningRate 0.0366 Epoch: 7 Global Step: 327380 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:29:51,587-Speed 2634.98 samples/sec Loss 8.1668 LearningRate 0.0366 Epoch: 7 Global Step: 327390 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:29:55,481-Speed 2629.90 samples/sec Loss 8.0786 LearningRate 0.0366 Epoch: 7 Global Step: 327400 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:29:59,379-Speed 2627.91 samples/sec Loss 8.0651 LearningRate 0.0366 Epoch: 7 Global Step: 327410 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:30:03,279-Speed 2625.58 samples/sec Loss 8.2173 LearningRate 0.0366 Epoch: 7 Global Step: 327420 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:30:07,167-Speed 2634.21 samples/sec Loss 8.2218 LearningRate 0.0366 Epoch: 7 Global Step: 327430 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:30:11,058-Speed 2632.38 samples/sec Loss 8.0354 LearningRate 0.0366 Epoch: 7 Global Step: 327440 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:30:14,947-Speed 2633.69 samples/sec Loss 8.1350 LearningRate 0.0366 Epoch: 7 Global Step: 327450 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:30:18,837-Speed 2633.02 samples/sec Loss 8.0377 LearningRate 0.0366 Epoch: 7 Global Step: 327460 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:30:22,727-Speed 2632.72 samples/sec Loss 8.1731 LearningRate 0.0366 Epoch: 7 Global Step: 327470 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:30:26,619-Speed 2632.65 samples/sec Loss 8.1795 LearningRate 0.0366 Epoch: 7 Global Step: 327480 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:30:30,506-Speed 2634.49 samples/sec Loss 8.0886 LearningRate 0.0366 Epoch: 7 Global Step: 327490 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:30:34,397-Speed 2632.37 samples/sec Loss 8.1323 LearningRate 0.0366 Epoch: 7 Global Step: 327500 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:30:38,288-Speed 2632.08 samples/sec Loss 7.9924 LearningRate 0.0366 Epoch: 7 Global Step: 327510 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:30:42,189-Speed 2626.11 samples/sec Loss 8.1929 LearningRate 0.0366 Epoch: 7 Global Step: 327520 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:30:46,080-Speed 2632.14 samples/sec Loss 8.1847 LearningRate 0.0366 Epoch: 7 Global Step: 327530 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:30:49,987-Speed 2621.17 samples/sec Loss 8.1544 LearningRate 0.0366 Epoch: 7 Global Step: 327540 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:30:53,891-Speed 2623.28 samples/sec Loss 8.0707 LearningRate 0.0366 Epoch: 7 Global Step: 327550 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:30:57,773-Speed 2639.34 samples/sec Loss 8.1005 LearningRate 0.0366 Epoch: 7 Global Step: 327560 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:31:01,669-Speed 2628.84 samples/sec Loss 8.1602 LearningRate 0.0366 Epoch: 7 Global Step: 327570 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:31:05,562-Speed 2630.73 samples/sec Loss 8.1234 LearningRate 0.0366 Epoch: 7 Global Step: 327580 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:31:09,455-Speed 2630.66 samples/sec Loss 7.9662 LearningRate 0.0366 Epoch: 7 Global Step: 327590 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:31:13,355-Speed 2626.76 samples/sec Loss 8.1813 LearningRate 0.0366 Epoch: 7 Global Step: 327600 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:31:17,244-Speed 2633.49 samples/sec Loss 8.1562 LearningRate 0.0366 Epoch: 7 Global Step: 327610 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:31:21,137-Speed 2630.97 samples/sec Loss 8.0458 LearningRate 0.0366 Epoch: 7 Global Step: 327620 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:31:25,027-Speed 2632.91 samples/sec Loss 8.2252 LearningRate 0.0366 Epoch: 7 Global Step: 327630 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:31:28,932-Speed 2622.84 samples/sec Loss 8.1585 LearningRate 0.0366 Epoch: 7 Global Step: 327640 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:31:32,835-Speed 2624.67 samples/sec Loss 8.2341 LearningRate 0.0366 Epoch: 7 Global Step: 327650 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:31:36,727-Speed 2631.61 samples/sec Loss 8.1469 LearningRate 0.0366 Epoch: 7 Global Step: 327660 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:31:40,627-Speed 2625.89 samples/sec Loss 8.1814 LearningRate 0.0366 Epoch: 7 Global Step: 327670 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:31:44,524-Speed 2628.64 samples/sec Loss 8.1838 LearningRate 0.0366 Epoch: 7 Global Step: 327680 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:31:48,417-Speed 2630.75 samples/sec Loss 8.1291 LearningRate 0.0366 Epoch: 7 Global Step: 327690 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:31:52,313-Speed 2628.83 samples/sec Loss 8.1140 LearningRate 0.0366 Epoch: 7 Global Step: 327700 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:31:56,203-Speed 2632.84 samples/sec Loss 8.1558 LearningRate 0.0366 Epoch: 7 Global Step: 327710 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:32:00,101-Speed 2627.91 samples/sec Loss 8.1348 LearningRate 0.0366 Epoch: 7 Global Step: 327720 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:32:03,995-Speed 2630.15 samples/sec Loss 8.0966 LearningRate 0.0366 Epoch: 7 Global Step: 327730 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:32:07,888-Speed 2630.32 samples/sec Loss 8.1870 LearningRate 0.0366 Epoch: 7 Global Step: 327740 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:32:11,798-Speed 2620.41 samples/sec Loss 8.1504 LearningRate 0.0366 Epoch: 7 Global Step: 327750 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:32:15,694-Speed 2628.53 samples/sec Loss 8.1952 LearningRate 0.0366 Epoch: 7 Global Step: 327760 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 08:32:19,570-Speed 2642.74 samples/sec Loss 8.2051 LearningRate 0.0366 Epoch: 7 Global Step: 327770 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:32:23,466-Speed 2628.78 samples/sec Loss 8.1512 LearningRate 0.0366 Epoch: 7 Global Step: 327780 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:32:27,361-Speed 2629.73 samples/sec Loss 8.0387 LearningRate 0.0366 Epoch: 7 Global Step: 327790 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:32:31,256-Speed 2630.03 samples/sec Loss 8.1163 LearningRate 0.0366 Epoch: 7 Global Step: 327800 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:32:35,149-Speed 2630.25 samples/sec Loss 8.0387 LearningRate 0.0366 Epoch: 7 Global Step: 327810 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:32:39,042-Speed 2631.06 samples/sec Loss 8.1019 LearningRate 0.0366 Epoch: 7 Global Step: 327820 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:32:42,914-Speed 2644.86 samples/sec Loss 8.0904 LearningRate 0.0366 Epoch: 7 Global Step: 327830 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:32:46,807-Speed 2631.99 samples/sec Loss 8.1277 LearningRate 0.0366 Epoch: 7 Global Step: 327840 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:32:50,699-Speed 2631.19 samples/sec Loss 8.1223 LearningRate 0.0366 Epoch: 7 Global Step: 327850 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:32:54,595-Speed 2629.44 samples/sec Loss 8.2161 LearningRate 0.0366 Epoch: 7 Global Step: 327860 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:32:58,500-Speed 2622.83 samples/sec Loss 8.2207 LearningRate 0.0366 Epoch: 7 Global Step: 327870 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:33:02,395-Speed 2630.16 samples/sec Loss 8.0936 LearningRate 0.0366 Epoch: 7 Global Step: 327880 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:33:06,286-Speed 2632.19 samples/sec Loss 8.1114 LearningRate 0.0366 Epoch: 7 Global Step: 327890 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:33:10,187-Speed 2625.92 samples/sec Loss 8.0377 LearningRate 0.0366 Epoch: 7 Global Step: 327900 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:33:14,082-Speed 2629.37 samples/sec Loss 8.1164 LearningRate 0.0366 Epoch: 7 Global Step: 327910 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:33:17,976-Speed 2630.03 samples/sec Loss 8.0370 LearningRate 0.0366 Epoch: 7 Global Step: 327920 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:33:21,871-Speed 2629.83 samples/sec Loss 8.3547 LearningRate 0.0366 Epoch: 7 Global Step: 327930 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:33:25,769-Speed 2627.54 samples/sec Loss 8.2295 LearningRate 0.0366 Epoch: 7 Global Step: 327940 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:33:29,665-Speed 2629.21 samples/sec Loss 8.1272 LearningRate 0.0366 Epoch: 7 Global Step: 327950 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:33:33,556-Speed 2632.88 samples/sec Loss 8.1098 LearningRate 0.0366 Epoch: 7 Global Step: 327960 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:33:37,456-Speed 2625.83 samples/sec Loss 8.1288 LearningRate 0.0366 Epoch: 7 Global Step: 327970 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:33:41,357-Speed 2625.50 samples/sec Loss 8.1741 LearningRate 0.0366 Epoch: 7 Global Step: 327980 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:33:45,269-Speed 2617.73 samples/sec Loss 8.1011 LearningRate 0.0366 Epoch: 7 Global Step: 327990 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:33:49,162-Speed 2631.49 samples/sec Loss 8.1783 LearningRate 0.0366 Epoch: 7 Global Step: 328000 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:33:53,062-Speed 2625.86 samples/sec Loss 8.1080 LearningRate 0.0366 Epoch: 7 Global Step: 328010 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:33:56,954-Speed 2631.68 samples/sec Loss 8.0725 LearningRate 0.0366 Epoch: 7 Global Step: 328020 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:34:00,829-Speed 2643.68 samples/sec Loss 8.0081 LearningRate 0.0366 Epoch: 7 Global Step: 328030 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:34:04,725-Speed 2628.99 samples/sec Loss 7.9479 LearningRate 0.0366 Epoch: 7 Global Step: 328040 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:34:08,620-Speed 2629.20 samples/sec Loss 8.1045 LearningRate 0.0365 Epoch: 7 Global Step: 328050 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:34:12,514-Speed 2630.32 samples/sec Loss 8.1597 LearningRate 0.0365 Epoch: 7 Global Step: 328060 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:34:16,397-Speed 2638.03 samples/sec Loss 8.1038 LearningRate 0.0365 Epoch: 7 Global Step: 328070 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:34:20,305-Speed 2620.74 samples/sec Loss 8.2151 LearningRate 0.0365 Epoch: 7 Global Step: 328080 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:34:24,196-Speed 2632.29 samples/sec Loss 8.0777 LearningRate 0.0365 Epoch: 7 Global Step: 328090 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:34:28,107-Speed 2619.10 samples/sec Loss 8.0893 LearningRate 0.0365 Epoch: 7 Global Step: 328100 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:34:32,007-Speed 2626.21 samples/sec Loss 8.0992 LearningRate 0.0365 Epoch: 7 Global Step: 328110 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:34:35,913-Speed 2622.61 samples/sec Loss 8.2456 LearningRate 0.0365 Epoch: 7 Global Step: 328120 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:34:39,778-Speed 2650.01 samples/sec Loss 8.1067 LearningRate 0.0365 Epoch: 7 Global Step: 328130 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:34:43,672-Speed 2630.98 samples/sec Loss 8.0582 LearningRate 0.0365 Epoch: 7 Global Step: 328140 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:34:47,565-Speed 2630.89 samples/sec Loss 8.1136 LearningRate 0.0365 Epoch: 7 Global Step: 328150 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:34:51,448-Speed 2637.63 samples/sec Loss 8.2020 LearningRate 0.0365 Epoch: 7 Global Step: 328160 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:34:55,348-Speed 2625.96 samples/sec Loss 8.0654 LearningRate 0.0365 Epoch: 7 Global Step: 328170 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:34:59,240-Speed 2631.47 samples/sec Loss 8.1405 LearningRate 0.0365 Epoch: 7 Global Step: 328180 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:35:03,143-Speed 2624.70 samples/sec Loss 8.1505 LearningRate 0.0365 Epoch: 7 Global Step: 328190 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:35:07,035-Speed 2631.89 samples/sec Loss 8.2061 LearningRate 0.0365 Epoch: 7 Global Step: 328200 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:35:10,930-Speed 2628.98 samples/sec Loss 8.1476 LearningRate 0.0365 Epoch: 7 Global Step: 328210 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:35:14,839-Speed 2620.82 samples/sec Loss 8.0518 LearningRate 0.0365 Epoch: 7 Global Step: 328220 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:35:18,732-Speed 2630.79 samples/sec Loss 8.0464 LearningRate 0.0365 Epoch: 7 Global Step: 328230 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:35:22,628-Speed 2629.01 samples/sec Loss 8.1577 LearningRate 0.0365 Epoch: 7 Global Step: 328240 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:35:26,502-Speed 2643.81 samples/sec Loss 8.0592 LearningRate 0.0365 Epoch: 7 Global Step: 328250 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:35:30,397-Speed 2629.36 samples/sec Loss 8.1471 LearningRate 0.0365 Epoch: 7 Global Step: 328260 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:35:34,293-Speed 2629.45 samples/sec Loss 8.0707 LearningRate 0.0365 Epoch: 7 Global Step: 328270 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:35:38,187-Speed 2630.32 samples/sec Loss 8.0863 LearningRate 0.0365 Epoch: 7 Global Step: 328280 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:35:42,079-Speed 2631.83 samples/sec Loss 8.1765 LearningRate 0.0365 Epoch: 7 Global Step: 328290 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:35:45,998-Speed 2613.34 samples/sec Loss 8.0213 LearningRate 0.0365 Epoch: 7 Global Step: 328300 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:35:49,888-Speed 2633.09 samples/sec Loss 8.0944 LearningRate 0.0365 Epoch: 7 Global Step: 328310 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:35:53,779-Speed 2632.27 samples/sec Loss 8.1714 LearningRate 0.0365 Epoch: 7 Global Step: 328320 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:35:57,672-Speed 2631.42 samples/sec Loss 8.1119 LearningRate 0.0365 Epoch: 7 Global Step: 328330 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:36:01,566-Speed 2629.84 samples/sec Loss 8.1380 LearningRate 0.0365 Epoch: 7 Global Step: 328340 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:36:05,554-Speed 2568.44 samples/sec Loss 8.1707 LearningRate 0.0365 Epoch: 7 Global Step: 328350 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:36:09,459-Speed 2622.93 samples/sec Loss 8.1987 LearningRate 0.0365 Epoch: 7 Global Step: 328360 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:36:13,355-Speed 2628.79 samples/sec Loss 8.0001 LearningRate 0.0365 Epoch: 7 Global Step: 328370 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:36:17,250-Speed 2629.14 samples/sec Loss 8.0694 LearningRate 0.0365 Epoch: 7 Global Step: 328380 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:36:21,203-Speed 2592.07 samples/sec Loss 8.1255 LearningRate 0.0365 Epoch: 7 Global Step: 328390 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:36:25,095-Speed 2631.59 samples/sec Loss 8.1810 LearningRate 0.0365 Epoch: 7 Global Step: 328400 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:36:29,001-Speed 2621.95 samples/sec Loss 8.2147 LearningRate 0.0365 Epoch: 7 Global Step: 328410 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:36:32,905-Speed 2623.91 samples/sec Loss 8.1779 LearningRate 0.0365 Epoch: 7 Global Step: 328420 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:36:36,809-Speed 2623.67 samples/sec Loss 8.1010 LearningRate 0.0365 Epoch: 7 Global Step: 328430 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:36:40,711-Speed 2624.47 samples/sec Loss 8.2304 LearningRate 0.0365 Epoch: 7 Global Step: 328440 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:36:44,624-Speed 2618.21 samples/sec Loss 8.1569 LearningRate 0.0365 Epoch: 7 Global Step: 328450 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:36:48,549-Speed 2609.73 samples/sec Loss 7.9764 LearningRate 0.0365 Epoch: 7 Global Step: 328460 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:36:52,452-Speed 2624.23 samples/sec Loss 8.0040 LearningRate 0.0365 Epoch: 7 Global Step: 328470 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:36:56,359-Speed 2621.76 samples/sec Loss 8.0856 LearningRate 0.0365 Epoch: 7 Global Step: 328480 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:37:00,240-Speed 2639.23 samples/sec Loss 8.1226 LearningRate 0.0365 Epoch: 7 Global Step: 328490 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:37:04,136-Speed 2628.84 samples/sec Loss 8.0944 LearningRate 0.0365 Epoch: 7 Global Step: 328500 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:37:08,048-Speed 2617.90 samples/sec Loss 8.0844 LearningRate 0.0365 Epoch: 7 Global Step: 328510 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:37:11,953-Speed 2623.10 samples/sec Loss 8.0426 LearningRate 0.0365 Epoch: 7 Global Step: 328520 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:37:15,855-Speed 2625.38 samples/sec Loss 8.0676 LearningRate 0.0365 Epoch: 7 Global Step: 328530 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:37:19,751-Speed 2629.20 samples/sec Loss 7.9865 LearningRate 0.0365 Epoch: 7 Global Step: 328540 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:37:23,664-Speed 2617.28 samples/sec Loss 8.3084 LearningRate 0.0365 Epoch: 7 Global Step: 328550 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:37:27,573-Speed 2621.19 samples/sec Loss 8.1218 LearningRate 0.0365 Epoch: 7 Global Step: 328560 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:37:31,477-Speed 2623.20 samples/sec Loss 8.0390 LearningRate 0.0365 Epoch: 7 Global Step: 328570 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:37:35,369-Speed 2631.30 samples/sec Loss 8.0364 LearningRate 0.0365 Epoch: 7 Global Step: 328580 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:37:39,260-Speed 2632.46 samples/sec Loss 8.1470 LearningRate 0.0365 Epoch: 7 Global Step: 328590 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:37:43,152-Speed 2631.41 samples/sec Loss 8.1364 LearningRate 0.0365 Epoch: 7 Global Step: 328600 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:37:47,045-Speed 2631.40 samples/sec Loss 8.1358 LearningRate 0.0365 Epoch: 7 Global Step: 328610 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:37:50,938-Speed 2631.17 samples/sec Loss 7.9764 LearningRate 0.0365 Epoch: 7 Global Step: 328620 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:37:54,842-Speed 2623.70 samples/sec Loss 8.1183 LearningRate 0.0365 Epoch: 7 Global Step: 328630 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:37:58,734-Speed 2631.94 samples/sec Loss 8.0677 LearningRate 0.0365 Epoch: 7 Global Step: 328640 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:38:02,645-Speed 2619.33 samples/sec Loss 8.0779 LearningRate 0.0365 Epoch: 7 Global Step: 328650 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:38:06,538-Speed 2630.95 samples/sec Loss 8.2151 LearningRate 0.0365 Epoch: 7 Global Step: 328660 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:38:10,467-Speed 2606.84 samples/sec Loss 8.2015 LearningRate 0.0365 Epoch: 7 Global Step: 328670 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:38:14,367-Speed 2626.57 samples/sec Loss 8.2321 LearningRate 0.0365 Epoch: 7 Global Step: 328680 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:38:18,265-Speed 2627.85 samples/sec Loss 7.9971 LearningRate 0.0365 Epoch: 7 Global Step: 328690 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 08:38:22,200-Speed 2602.28 samples/sec Loss 8.0172 LearningRate 0.0365 Epoch: 7 Global Step: 328700 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:38:26,110-Speed 2620.64 samples/sec Loss 8.1359 LearningRate 0.0365 Epoch: 7 Global Step: 328710 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:38:30,009-Speed 2626.57 samples/sec Loss 8.2565 LearningRate 0.0365 Epoch: 7 Global Step: 328720 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:38:33,927-Speed 2614.57 samples/sec Loss 8.0301 LearningRate 0.0365 Epoch: 7 Global Step: 328730 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:38:37,823-Speed 2629.31 samples/sec Loss 8.2939 LearningRate 0.0364 Epoch: 7 Global Step: 328740 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:38:41,714-Speed 2632.44 samples/sec Loss 8.1770 LearningRate 0.0364 Epoch: 7 Global Step: 328750 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:38:45,606-Speed 2631.64 samples/sec Loss 8.0877 LearningRate 0.0364 Epoch: 7 Global Step: 328760 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:38:49,498-Speed 2631.52 samples/sec Loss 8.0049 LearningRate 0.0364 Epoch: 7 Global Step: 328770 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:38:53,394-Speed 2628.88 samples/sec Loss 8.0422 LearningRate 0.0364 Epoch: 7 Global Step: 328780 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:38:57,286-Speed 2631.96 samples/sec Loss 8.1137 LearningRate 0.0364 Epoch: 7 Global Step: 328790 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:39:01,182-Speed 2629.07 samples/sec Loss 8.1543 LearningRate 0.0364 Epoch: 7 Global Step: 328800 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 08:39:05,061-Speed 2640.86 samples/sec Loss 8.1259 LearningRate 0.0364 Epoch: 7 Global Step: 328810 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:39:08,951-Speed 2632.71 samples/sec Loss 8.1790 LearningRate 0.0364 Epoch: 7 Global Step: 328820 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:39:12,847-Speed 2628.55 samples/sec Loss 8.0353 LearningRate 0.0364 Epoch: 7 Global Step: 328830 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:39:16,742-Speed 2630.30 samples/sec Loss 8.2955 LearningRate 0.0364 Epoch: 7 Global Step: 328840 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:39:20,641-Speed 2626.26 samples/sec Loss 8.0843 LearningRate 0.0364 Epoch: 7 Global Step: 328850 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:39:24,532-Speed 2633.06 samples/sec Loss 8.2083 LearningRate 0.0364 Epoch: 7 Global Step: 328860 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:39:28,424-Speed 2631.32 samples/sec Loss 8.1427 LearningRate 0.0364 Epoch: 7 Global Step: 328870 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:39:32,320-Speed 2629.38 samples/sec Loss 8.1286 LearningRate 0.0364 Epoch: 7 Global Step: 328880 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:39:36,238-Speed 2614.19 samples/sec Loss 8.0770 LearningRate 0.0364 Epoch: 7 Global Step: 328890 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:39:40,138-Speed 2626.27 samples/sec Loss 8.1877 LearningRate 0.0364 Epoch: 7 Global Step: 328900 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:39:44,025-Speed 2635.00 samples/sec Loss 7.9978 LearningRate 0.0364 Epoch: 7 Global Step: 328910 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:39:47,909-Speed 2637.70 samples/sec Loss 7.9878 LearningRate 0.0364 Epoch: 7 Global Step: 328920 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:39:51,804-Speed 2629.65 samples/sec Loss 8.1977 LearningRate 0.0364 Epoch: 7 Global Step: 328930 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:39:55,701-Speed 2628.51 samples/sec Loss 8.1133 LearningRate 0.0364 Epoch: 7 Global Step: 328940 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:39:59,757-Speed 2525.48 samples/sec Loss 8.1080 LearningRate 0.0364 Epoch: 7 Global Step: 328950 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:40:03,850-Speed 2502.52 samples/sec Loss 8.1555 LearningRate 0.0364 Epoch: 7 Global Step: 328960 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:40:07,797-Speed 2594.50 samples/sec Loss 8.1108 LearningRate 0.0364 Epoch: 7 Global Step: 328970 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:40:11,687-Speed 2633.04 samples/sec Loss 8.1426 LearningRate 0.0364 Epoch: 7 Global Step: 328980 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:40:15,592-Speed 2623.08 samples/sec Loss 8.0940 LearningRate 0.0364 Epoch: 7 Global Step: 328990 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:40:19,469-Speed 2642.17 samples/sec Loss 8.1087 LearningRate 0.0364 Epoch: 7 Global Step: 329000 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:40:23,308-Speed 2667.34 samples/sec Loss 10.8725 LearningRate 0.0364 Epoch: 7 Global Step: 329010 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:40:27,203-Speed 2629.70 samples/sec Loss 9.6417 LearningRate 0.0364 Epoch: 7 Global Step: 329020 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:40:31,093-Speed 2633.14 samples/sec Loss 8.7480 LearningRate 0.0364 Epoch: 7 Global Step: 329030 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:40:34,981-Speed 2634.59 samples/sec Loss 8.5940 LearningRate 0.0364 Epoch: 7 Global Step: 329040 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:40:38,868-Speed 2635.26 samples/sec Loss 8.5332 LearningRate 0.0364 Epoch: 7 Global Step: 329050 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:40:42,770-Speed 2625.03 samples/sec Loss 8.3560 LearningRate 0.0364 Epoch: 7 Global Step: 329060 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:40:46,663-Speed 2631.07 samples/sec Loss 8.3910 LearningRate 0.0364 Epoch: 7 Global Step: 329070 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:40:50,558-Speed 2629.49 samples/sec Loss 8.1806 LearningRate 0.0364 Epoch: 7 Global Step: 329080 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:40:54,454-Speed 2629.93 samples/sec Loss 8.2291 LearningRate 0.0364 Epoch: 7 Global Step: 329090 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:40:58,362-Speed 2620.74 samples/sec Loss 8.1816 LearningRate 0.0364 Epoch: 7 Global Step: 329100 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:41:02,270-Speed 2620.70 samples/sec Loss 8.1761 LearningRate 0.0364 Epoch: 7 Global Step: 329110 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:41:06,159-Speed 2633.80 samples/sec Loss 8.1736 LearningRate 0.0364 Epoch: 7 Global Step: 329120 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:41:10,050-Speed 2633.17 samples/sec Loss 8.1065 LearningRate 0.0364 Epoch: 7 Global Step: 329130 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:41:13,942-Speed 2631.53 samples/sec Loss 8.1764 LearningRate 0.0364 Epoch: 7 Global Step: 329140 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:41:17,847-Speed 2622.74 samples/sec Loss 8.3217 LearningRate 0.0364 Epoch: 7 Global Step: 329150 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:41:21,737-Speed 2633.38 samples/sec Loss 8.0647 LearningRate 0.0364 Epoch: 7 Global Step: 329160 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:41:25,628-Speed 2632.51 samples/sec Loss 8.1968 LearningRate 0.0364 Epoch: 7 Global Step: 329170 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:41:29,517-Speed 2633.91 samples/sec Loss 8.1257 LearningRate 0.0364 Epoch: 7 Global Step: 329180 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:41:33,406-Speed 2633.60 samples/sec Loss 8.2120 LearningRate 0.0364 Epoch: 7 Global Step: 329190 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:41:37,296-Speed 2632.88 samples/sec Loss 8.1391 LearningRate 0.0364 Epoch: 7 Global Step: 329200 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:41:41,211-Speed 2616.14 samples/sec Loss 8.1714 LearningRate 0.0364 Epoch: 7 Global Step: 329210 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:41:45,113-Speed 2625.06 samples/sec Loss 8.1300 LearningRate 0.0364 Epoch: 7 Global Step: 329220 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:41:49,014-Speed 2625.79 samples/sec Loss 8.5430 LearningRate 0.0364 Epoch: 7 Global Step: 329230 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:41:52,900-Speed 2635.67 samples/sec Loss 8.3279 LearningRate 0.0364 Epoch: 7 Global Step: 329240 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:41:56,799-Speed 2627.12 samples/sec Loss 8.3015 LearningRate 0.0364 Epoch: 7 Global Step: 329250 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:42:00,700-Speed 2625.84 samples/sec Loss 8.1661 LearningRate 0.0364 Epoch: 7 Global Step: 329260 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:42:04,620-Speed 2612.94 samples/sec Loss 8.2971 LearningRate 0.0364 Epoch: 7 Global Step: 329270 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:42:08,536-Speed 2615.06 samples/sec Loss 8.0996 LearningRate 0.0364 Epoch: 7 Global Step: 329280 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:42:12,428-Speed 2632.51 samples/sec Loss 8.1152 LearningRate 0.0364 Epoch: 7 Global Step: 329290 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:42:16,316-Speed 2633.99 samples/sec Loss 8.2030 LearningRate 0.0364 Epoch: 7 Global Step: 329300 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:42:20,241-Speed 2609.82 samples/sec Loss 8.1352 LearningRate 0.0364 Epoch: 7 Global Step: 329310 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:42:24,130-Speed 2634.07 samples/sec Loss 8.1144 LearningRate 0.0364 Epoch: 7 Global Step: 329320 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:42:28,022-Speed 2631.87 samples/sec Loss 8.2009 LearningRate 0.0364 Epoch: 7 Global Step: 329330 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:42:31,924-Speed 2624.77 samples/sec Loss 8.1490 LearningRate 0.0364 Epoch: 7 Global Step: 329340 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:42:35,823-Speed 2627.23 samples/sec Loss 8.0751 LearningRate 0.0364 Epoch: 7 Global Step: 329350 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:42:39,711-Speed 2634.25 samples/sec Loss 8.0779 LearningRate 0.0364 Epoch: 7 Global Step: 329360 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:42:43,601-Speed 2633.21 samples/sec Loss 8.1529 LearningRate 0.0364 Epoch: 7 Global Step: 329370 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:42:47,491-Speed 2632.71 samples/sec Loss 8.1250 LearningRate 0.0364 Epoch: 7 Global Step: 329380 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:42:51,384-Speed 2631.61 samples/sec Loss 8.0555 LearningRate 0.0364 Epoch: 7 Global Step: 329390 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:42:55,273-Speed 2633.03 samples/sec Loss 8.1229 LearningRate 0.0364 Epoch: 7 Global Step: 329400 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:42:59,165-Speed 2632.71 samples/sec Loss 8.1422 LearningRate 0.0364 Epoch: 7 Global Step: 329410 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:43:03,061-Speed 2628.34 samples/sec Loss 8.1322 LearningRate 0.0363 Epoch: 7 Global Step: 329420 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:43:06,954-Speed 2631.03 samples/sec Loss 8.1082 LearningRate 0.0363 Epoch: 7 Global Step: 329430 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:43:10,856-Speed 2624.80 samples/sec Loss 8.0912 LearningRate 0.0363 Epoch: 7 Global Step: 329440 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:43:14,807-Speed 2593.47 samples/sec Loss 7.9926 LearningRate 0.0363 Epoch: 7 Global Step: 329450 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:43:18,878-Speed 2515.47 samples/sec Loss 8.1861 LearningRate 0.0363 Epoch: 7 Global Step: 329460 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:43:22,826-Speed 2594.28 samples/sec Loss 8.1270 LearningRate 0.0363 Epoch: 7 Global Step: 329470 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:43:26,719-Speed 2631.35 samples/sec Loss 8.2033 LearningRate 0.0363 Epoch: 7 Global Step: 329480 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:43:30,594-Speed 2643.19 samples/sec Loss 8.1985 LearningRate 0.0363 Epoch: 7 Global Step: 329490 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:43:34,493-Speed 2627.32 samples/sec Loss 8.0285 LearningRate 0.0363 Epoch: 7 Global Step: 329500 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:43:38,386-Speed 2630.29 samples/sec Loss 8.2013 LearningRate 0.0363 Epoch: 7 Global Step: 329510 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:43:42,289-Speed 2624.33 samples/sec Loss 8.2040 LearningRate 0.0363 Epoch: 7 Global Step: 329520 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:43:46,181-Speed 2631.79 samples/sec Loss 8.1049 LearningRate 0.0363 Epoch: 7 Global Step: 329530 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:43:50,073-Speed 2631.83 samples/sec Loss 8.1323 LearningRate 0.0363 Epoch: 7 Global Step: 329540 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:43:53,964-Speed 2631.83 samples/sec Loss 8.1710 LearningRate 0.0363 Epoch: 7 Global Step: 329550 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:43:57,861-Speed 2628.56 samples/sec Loss 8.1913 LearningRate 0.0363 Epoch: 7 Global Step: 329560 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:44:01,752-Speed 2632.61 samples/sec Loss 8.2087 LearningRate 0.0363 Epoch: 7 Global Step: 329570 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:44:05,660-Speed 2620.35 samples/sec Loss 8.1881 LearningRate 0.0363 Epoch: 7 Global Step: 329580 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:44:09,555-Speed 2630.05 samples/sec Loss 8.0790 LearningRate 0.0363 Epoch: 7 Global Step: 329590 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:44:13,446-Speed 2632.62 samples/sec Loss 8.0691 LearningRate 0.0363 Epoch: 7 Global Step: 329600 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:44:17,361-Speed 2615.73 samples/sec Loss 8.0776 LearningRate 0.0363 Epoch: 7 Global Step: 329610 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:44:21,255-Speed 2630.08 samples/sec Loss 8.1396 LearningRate 0.0363 Epoch: 7 Global Step: 329620 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:44:25,146-Speed 2632.58 samples/sec Loss 8.0423 LearningRate 0.0363 Epoch: 7 Global Step: 329630 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:44:29,036-Speed 2633.23 samples/sec Loss 8.2633 LearningRate 0.0363 Epoch: 7 Global Step: 329640 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:44:32,925-Speed 2633.81 samples/sec Loss 7.9302 LearningRate 0.0363 Epoch: 7 Global Step: 329650 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:44:36,816-Speed 2632.06 samples/sec Loss 8.1422 LearningRate 0.0363 Epoch: 7 Global Step: 329660 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:44:40,704-Speed 2633.85 samples/sec Loss 8.1406 LearningRate 0.0363 Epoch: 7 Global Step: 329670 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:44:44,593-Speed 2633.74 samples/sec Loss 8.0434 LearningRate 0.0363 Epoch: 7 Global Step: 329680 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:44:48,483-Speed 2633.44 samples/sec Loss 8.1317 LearningRate 0.0363 Epoch: 7 Global Step: 329690 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:44:52,384-Speed 2625.22 samples/sec Loss 8.0829 LearningRate 0.0363 Epoch: 7 Global Step: 329700 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:44:56,275-Speed 2632.51 samples/sec Loss 8.1063 LearningRate 0.0363 Epoch: 7 Global Step: 329710 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:45:00,191-Speed 2615.52 samples/sec Loss 8.2397 LearningRate 0.0363 Epoch: 7 Global Step: 329720 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:45:04,084-Speed 2631.41 samples/sec Loss 8.1862 LearningRate 0.0363 Epoch: 7 Global Step: 329730 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:45:07,978-Speed 2629.79 samples/sec Loss 8.0775 LearningRate 0.0363 Epoch: 7 Global Step: 329740 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:45:11,871-Speed 2630.87 samples/sec Loss 8.1837 LearningRate 0.0363 Epoch: 7 Global Step: 329750 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:45:15,772-Speed 2625.39 samples/sec Loss 8.0501 LearningRate 0.0363 Epoch: 7 Global Step: 329760 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:45:19,669-Speed 2628.85 samples/sec Loss 8.1020 LearningRate 0.0363 Epoch: 7 Global Step: 329770 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:45:23,556-Speed 2634.33 samples/sec Loss 8.1519 LearningRate 0.0363 Epoch: 7 Global Step: 329780 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:45:27,450-Speed 2630.78 samples/sec Loss 8.1505 LearningRate 0.0363 Epoch: 7 Global Step: 329790 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:45:31,369-Speed 2613.37 samples/sec Loss 8.1504 LearningRate 0.0363 Epoch: 7 Global Step: 329800 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:45:35,261-Speed 2632.19 samples/sec Loss 8.0446 LearningRate 0.0363 Epoch: 7 Global Step: 329810 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:45:39,155-Speed 2629.74 samples/sec Loss 8.1318 LearningRate 0.0363 Epoch: 7 Global Step: 329820 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:45:43,052-Speed 2628.59 samples/sec Loss 8.1994 LearningRate 0.0363 Epoch: 7 Global Step: 329830 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:45:46,967-Speed 2615.97 samples/sec Loss 8.1027 LearningRate 0.0363 Epoch: 7 Global Step: 329840 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:45:50,860-Speed 2630.56 samples/sec Loss 8.1851 LearningRate 0.0363 Epoch: 7 Global Step: 329850 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:45:54,757-Speed 2628.76 samples/sec Loss 8.1204 LearningRate 0.0363 Epoch: 7 Global Step: 329860 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:45:58,650-Speed 2630.70 samples/sec Loss 8.1545 LearningRate 0.0363 Epoch: 7 Global Step: 329870 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:46:02,542-Speed 2632.03 samples/sec Loss 8.1141 LearningRate 0.0363 Epoch: 7 Global Step: 329880 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:46:06,432-Speed 2632.75 samples/sec Loss 8.1234 LearningRate 0.0363 Epoch: 7 Global Step: 329890 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 08:46:10,322-Speed 2632.93 samples/sec Loss 8.1777 LearningRate 0.0363 Epoch: 7 Global Step: 329900 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 08:46:14,195-Speed 2644.72 samples/sec Loss 8.1891 LearningRate 0.0363 Epoch: 7 Global Step: 329910 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:46:18,098-Speed 2624.07 samples/sec Loss 8.0486 LearningRate 0.0363 Epoch: 7 Global Step: 329920 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:46:21,990-Speed 2631.37 samples/sec Loss 8.1874 LearningRate 0.0363 Epoch: 7 Global Step: 329930 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:46:25,884-Speed 2630.71 samples/sec Loss 8.0364 LearningRate 0.0363 Epoch: 7 Global Step: 329940 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:46:29,762-Speed 2640.93 samples/sec Loss 8.2020 LearningRate 0.0363 Epoch: 7 Global Step: 329950 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:46:33,651-Speed 2633.92 samples/sec Loss 8.1450 LearningRate 0.0363 Epoch: 7 Global Step: 329960 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:46:37,541-Speed 2632.41 samples/sec Loss 8.1426 LearningRate 0.0363 Epoch: 7 Global Step: 329970 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:46:41,436-Speed 2629.88 samples/sec Loss 8.1128 LearningRate 0.0363 Epoch: 7 Global Step: 329980 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:46:45,327-Speed 2633.07 samples/sec Loss 8.0848 LearningRate 0.0363 Epoch: 7 Global Step: 329990 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:46:49,215-Speed 2633.95 samples/sec Loss 8.0960 LearningRate 0.0363 Epoch: 7 Global Step: 330000 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:47:32,167-[lfw][330000]XNorm: 23.292403
Training: 2022-04-14 08:47:32,168-[lfw][330000]Accuracy-Flip: 0.99683+-0.00283
Training: 2022-04-14 08:47:32,168-[lfw][330000]Accuracy-Highest: 0.99783
Training: 2022-04-14 08:48:22,119-[cfp_fp][330000]XNorm: 21.462890
Training: 2022-04-14 08:48:22,120-[cfp_fp][330000]Accuracy-Flip: 0.98529+-0.00703
Training: 2022-04-14 08:48:22,120-[cfp_fp][330000]Accuracy-Highest: 0.98671
Training: 2022-04-14 08:49:05,100-[agedb_30][330000]XNorm: 22.999810
Training: 2022-04-14 08:49:05,101-[agedb_30][330000]Accuracy-Flip: 0.97350+-0.00732
Training: 2022-04-14 08:49:05,101-[agedb_30][330000]Accuracy-Highest: 0.97567
Training: 2022-04-14 08:49:08,965-Speed 73.27 samples/sec Loss 8.1676 LearningRate 0.0363 Epoch: 7 Global Step: 330010 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:49:12,833-Speed 2647.85 samples/sec Loss 7.9833 LearningRate 0.0363 Epoch: 7 Global Step: 330020 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:49:16,705-Speed 2645.47 samples/sec Loss 8.0206 LearningRate 0.0363 Epoch: 7 Global Step: 330030 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:49:20,578-Speed 2644.78 samples/sec Loss 8.1956 LearningRate 0.0363 Epoch: 7 Global Step: 330040 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:49:24,451-Speed 2643.87 samples/sec Loss 8.1524 LearningRate 0.0363 Epoch: 7 Global Step: 330050 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:49:28,312-Speed 2652.93 samples/sec Loss 8.2106 LearningRate 0.0363 Epoch: 7 Global Step: 330060 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:49:32,133-Speed 2680.72 samples/sec Loss 8.9486 LearningRate 0.0363 Epoch: 7 Global Step: 330070 Fp16 Grad Scale: 1024 Required: 56 hours
Training: 2022-04-14 08:49:36,007-Speed 2643.48 samples/sec Loss 8.7149 LearningRate 0.0363 Epoch: 7 Global Step: 330080 Fp16 Grad Scale: 1024 Required: 56 hours
Training: 2022-04-14 08:49:39,891-Speed 2637.70 samples/sec Loss 8.2281 LearningRate 0.0363 Epoch: 7 Global Step: 330090 Fp16 Grad Scale: 1024 Required: 56 hours
Training: 2022-04-14 08:49:43,772-Speed 2639.03 samples/sec Loss 8.1858 LearningRate 0.0363 Epoch: 7 Global Step: 330100 Fp16 Grad Scale: 1024 Required: 56 hours
Training: 2022-04-14 08:49:47,658-Speed 2635.51 samples/sec Loss 8.3165 LearningRate 0.0362 Epoch: 7 Global Step: 330110 Fp16 Grad Scale: 1024 Required: 56 hours
Training: 2022-04-14 08:49:51,549-Speed 2632.39 samples/sec Loss 8.1772 LearningRate 0.0362 Epoch: 7 Global Step: 330120 Fp16 Grad Scale: 1024 Required: 56 hours
Training: 2022-04-14 08:49:55,454-Speed 2623.18 samples/sec Loss 8.1508 LearningRate 0.0362 Epoch: 7 Global Step: 330130 Fp16 Grad Scale: 1024 Required: 56 hours
Training: 2022-04-14 08:49:59,335-Speed 2639.30 samples/sec Loss 8.2061 LearningRate 0.0362 Epoch: 7 Global Step: 330140 Fp16 Grad Scale: 1024 Required: 56 hours
Training: 2022-04-14 08:50:03,219-Speed 2636.59 samples/sec Loss 7.9901 LearningRate 0.0362 Epoch: 7 Global Step: 330150 Fp16 Grad Scale: 1024 Required: 56 hours
Training: 2022-04-14 08:50:07,108-Speed 2633.99 samples/sec Loss 8.1764 LearningRate 0.0362 Epoch: 7 Global Step: 330160 Fp16 Grad Scale: 1024 Required: 56 hours
Training: 2022-04-14 08:50:10,994-Speed 2635.05 samples/sec Loss 8.1125 LearningRate 0.0362 Epoch: 7 Global Step: 330170 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:50:14,883-Speed 2633.90 samples/sec Loss 8.0601 LearningRate 0.0362 Epoch: 7 Global Step: 330180 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:50:18,773-Speed 2633.01 samples/sec Loss 8.1133 LearningRate 0.0362 Epoch: 7 Global Step: 330190 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:50:22,676-Speed 2624.59 samples/sec Loss 8.1122 LearningRate 0.0362 Epoch: 7 Global Step: 330200 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:50:26,564-Speed 2634.25 samples/sec Loss 8.1955 LearningRate 0.0362 Epoch: 7 Global Step: 330210 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:50:30,452-Speed 2634.52 samples/sec Loss 8.1748 LearningRate 0.0362 Epoch: 7 Global Step: 330220 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:50:34,353-Speed 2625.21 samples/sec Loss 8.1836 LearningRate 0.0362 Epoch: 7 Global Step: 330230 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:50:38,248-Speed 2629.86 samples/sec Loss 8.1812 LearningRate 0.0362 Epoch: 7 Global Step: 330240 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:50:42,150-Speed 2624.97 samples/sec Loss 8.1763 LearningRate 0.0362 Epoch: 7 Global Step: 330250 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:50:46,059-Speed 2619.68 samples/sec Loss 8.4262 LearningRate 0.0362 Epoch: 7 Global Step: 330260 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:50:49,953-Speed 2630.56 samples/sec Loss 8.1134 LearningRate 0.0362 Epoch: 7 Global Step: 330270 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:50:53,847-Speed 2630.50 samples/sec Loss 8.1188 LearningRate 0.0362 Epoch: 7 Global Step: 330280 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:50:57,739-Speed 2631.65 samples/sec Loss 8.0827 LearningRate 0.0362 Epoch: 7 Global Step: 330290 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:51:01,630-Speed 2631.83 samples/sec Loss 8.0315 LearningRate 0.0362 Epoch: 7 Global Step: 330300 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:51:05,519-Speed 2633.82 samples/sec Loss 8.0526 LearningRate 0.0362 Epoch: 7 Global Step: 330310 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:51:09,409-Speed 2633.43 samples/sec Loss 7.9905 LearningRate 0.0362 Epoch: 7 Global Step: 330320 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:51:13,298-Speed 2633.60 samples/sec Loss 8.1033 LearningRate 0.0362 Epoch: 7 Global Step: 330330 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:51:17,192-Speed 2630.13 samples/sec Loss 8.1345 LearningRate 0.0362 Epoch: 7 Global Step: 330340 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:51:21,088-Speed 2629.56 samples/sec Loss 8.1873 LearningRate 0.0362 Epoch: 7 Global Step: 330350 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:51:24,980-Speed 2631.34 samples/sec Loss 8.0582 LearningRate 0.0362 Epoch: 7 Global Step: 330360 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:51:28,870-Speed 2632.87 samples/sec Loss 8.0762 LearningRate 0.0362 Epoch: 7 Global Step: 330370 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:51:32,770-Speed 2626.47 samples/sec Loss 8.0095 LearningRate 0.0362 Epoch: 7 Global Step: 330380 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:51:36,663-Speed 2630.72 samples/sec Loss 8.1338 LearningRate 0.0362 Epoch: 7 Global Step: 330390 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:51:40,557-Speed 2630.54 samples/sec Loss 7.9657 LearningRate 0.0362 Epoch: 7 Global Step: 330400 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:51:44,529-Speed 2578.65 samples/sec Loss 7.8959 LearningRate 0.0362 Epoch: 7 Global Step: 330410 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:51:48,425-Speed 2629.03 samples/sec Loss 8.0477 LearningRate 0.0362 Epoch: 7 Global Step: 330420 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:51:52,321-Speed 2629.55 samples/sec Loss 8.1013 LearningRate 0.0362 Epoch: 7 Global Step: 330430 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:51:56,220-Speed 2626.29 samples/sec Loss 8.0865 LearningRate 0.0362 Epoch: 7 Global Step: 330440 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:52:00,130-Speed 2620.29 samples/sec Loss 8.1931 LearningRate 0.0362 Epoch: 7 Global Step: 330450 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:52:04,056-Speed 2609.06 samples/sec Loss 7.9621 LearningRate 0.0362 Epoch: 7 Global Step: 330460 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:52:07,957-Speed 2625.07 samples/sec Loss 8.0899 LearningRate 0.0362 Epoch: 7 Global Step: 330470 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:52:11,850-Speed 2630.62 samples/sec Loss 8.2089 LearningRate 0.0362 Epoch: 7 Global Step: 330480 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:52:15,744-Speed 2630.51 samples/sec Loss 8.0687 LearningRate 0.0362 Epoch: 7 Global Step: 330490 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:52:19,652-Speed 2621.49 samples/sec Loss 8.0107 LearningRate 0.0362 Epoch: 7 Global Step: 330500 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:52:23,549-Speed 2628.62 samples/sec Loss 8.0096 LearningRate 0.0362 Epoch: 7 Global Step: 330510 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:52:27,641-Speed 2503.38 samples/sec Loss 8.1173 LearningRate 0.0362 Epoch: 7 Global Step: 330520 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:52:31,678-Speed 2537.03 samples/sec Loss 8.1890 LearningRate 0.0362 Epoch: 7 Global Step: 330530 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:52:35,569-Speed 2632.12 samples/sec Loss 8.0871 LearningRate 0.0362 Epoch: 7 Global Step: 330540 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:52:39,463-Speed 2630.64 samples/sec Loss 8.1733 LearningRate 0.0362 Epoch: 7 Global Step: 330550 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:52:43,356-Speed 2630.76 samples/sec Loss 8.1522 LearningRate 0.0362 Epoch: 7 Global Step: 330560 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:52:47,252-Speed 2629.06 samples/sec Loss 8.1062 LearningRate 0.0362 Epoch: 7 Global Step: 330570 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:52:51,152-Speed 2626.09 samples/sec Loss 8.1481 LearningRate 0.0362 Epoch: 7 Global Step: 330580 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:52:55,049-Speed 2628.72 samples/sec Loss 8.1906 LearningRate 0.0362 Epoch: 7 Global Step: 330590 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:52:58,951-Speed 2624.80 samples/sec Loss 8.1648 LearningRate 0.0362 Epoch: 7 Global Step: 330600 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:53:02,847-Speed 2629.08 samples/sec Loss 8.1323 LearningRate 0.0362 Epoch: 7 Global Step: 330610 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:53:06,744-Speed 2628.14 samples/sec Loss 8.0844 LearningRate 0.0362 Epoch: 7 Global Step: 330620 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:53:10,641-Speed 2627.89 samples/sec Loss 7.9767 LearningRate 0.0362 Epoch: 7 Global Step: 330630 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:53:14,536-Speed 2630.06 samples/sec Loss 8.1886 LearningRate 0.0362 Epoch: 7 Global Step: 330640 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:53:18,436-Speed 2626.28 samples/sec Loss 8.1683 LearningRate 0.0362 Epoch: 7 Global Step: 330650 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:53:22,330-Speed 2630.51 samples/sec Loss 8.2013 LearningRate 0.0362 Epoch: 7 Global Step: 330660 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:53:26,226-Speed 2629.12 samples/sec Loss 8.1985 LearningRate 0.0362 Epoch: 7 Global Step: 330670 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:53:30,120-Speed 2630.03 samples/sec Loss 8.1756 LearningRate 0.0362 Epoch: 7 Global Step: 330680 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:53:34,018-Speed 2628.02 samples/sec Loss 8.0954 LearningRate 0.0362 Epoch: 7 Global Step: 330690 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:53:37,916-Speed 2627.42 samples/sec Loss 8.1198 LearningRate 0.0362 Epoch: 7 Global Step: 330700 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:53:41,825-Speed 2619.67 samples/sec Loss 8.1433 LearningRate 0.0362 Epoch: 7 Global Step: 330710 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:53:45,733-Speed 2621.29 samples/sec Loss 8.0453 LearningRate 0.0362 Epoch: 7 Global Step: 330720 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:53:49,642-Speed 2620.24 samples/sec Loss 8.0526 LearningRate 0.0362 Epoch: 7 Global Step: 330730 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:53:53,542-Speed 2626.47 samples/sec Loss 8.1178 LearningRate 0.0362 Epoch: 7 Global Step: 330740 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:53:57,443-Speed 2625.79 samples/sec Loss 8.0351 LearningRate 0.0362 Epoch: 7 Global Step: 330750 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:54:01,338-Speed 2628.91 samples/sec Loss 8.0716 LearningRate 0.0362 Epoch: 7 Global Step: 330760 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:54:05,234-Speed 2629.63 samples/sec Loss 8.0972 LearningRate 0.0362 Epoch: 7 Global Step: 330770 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:54:09,131-Speed 2627.81 samples/sec Loss 8.0501 LearningRate 0.0362 Epoch: 7 Global Step: 330780 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:54:13,032-Speed 2625.88 samples/sec Loss 8.1485 LearningRate 0.0362 Epoch: 7 Global Step: 330790 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:54:16,928-Speed 2628.68 samples/sec Loss 8.0203 LearningRate 0.0361 Epoch: 7 Global Step: 330800 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:54:20,883-Speed 2590.05 samples/sec Loss 8.0866 LearningRate 0.0361 Epoch: 7 Global Step: 330810 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:54:24,778-Speed 2630.47 samples/sec Loss 8.1810 LearningRate 0.0361 Epoch: 7 Global Step: 330820 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:54:28,676-Speed 2627.23 samples/sec Loss 8.0424 LearningRate 0.0361 Epoch: 7 Global Step: 330830 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:54:32,623-Speed 2595.52 samples/sec Loss 8.2643 LearningRate 0.0361 Epoch: 7 Global Step: 330840 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:54:36,533-Speed 2619.23 samples/sec Loss 8.1416 LearningRate 0.0361 Epoch: 7 Global Step: 330850 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:54:40,431-Speed 2627.85 samples/sec Loss 7.9999 LearningRate 0.0361 Epoch: 7 Global Step: 330860 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:54:44,324-Speed 2631.09 samples/sec Loss 8.1633 LearningRate 0.0361 Epoch: 7 Global Step: 330870 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 08:54:48,204-Speed 2640.01 samples/sec Loss 8.0789 LearningRate 0.0361 Epoch: 7 Global Step: 330880 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:54:52,104-Speed 2625.71 samples/sec Loss 8.1589 LearningRate 0.0361 Epoch: 7 Global Step: 330890 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:54:55,999-Speed 2629.95 samples/sec Loss 8.0979 LearningRate 0.0361 Epoch: 7 Global Step: 330900 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:54:59,895-Speed 2629.33 samples/sec Loss 7.9973 LearningRate 0.0361 Epoch: 7 Global Step: 330910 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:55:03,792-Speed 2629.45 samples/sec Loss 8.2101 LearningRate 0.0361 Epoch: 7 Global Step: 330920 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:55:07,689-Speed 2627.95 samples/sec Loss 7.9794 LearningRate 0.0361 Epoch: 7 Global Step: 330930 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:55:11,582-Speed 2631.04 samples/sec Loss 8.1453 LearningRate 0.0361 Epoch: 7 Global Step: 330940 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:55:15,488-Speed 2622.65 samples/sec Loss 8.1556 LearningRate 0.0361 Epoch: 7 Global Step: 330950 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:55:19,382-Speed 2630.38 samples/sec Loss 8.2977 LearningRate 0.0361 Epoch: 7 Global Step: 330960 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:55:23,281-Speed 2626.54 samples/sec Loss 8.0516 LearningRate 0.0361 Epoch: 7 Global Step: 330970 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:55:27,181-Speed 2626.89 samples/sec Loss 7.9823 LearningRate 0.0361 Epoch: 7 Global Step: 330980 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 08:55:31,055-Speed 2643.31 samples/sec Loss 8.0335 LearningRate 0.0361 Epoch: 7 Global Step: 330990 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 08:55:34,924-Speed 2647.57 samples/sec Loss 8.1116 LearningRate 0.0361 Epoch: 7 Global Step: 331000 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:55:38,817-Speed 2631.39 samples/sec Loss 8.1938 LearningRate 0.0361 Epoch: 7 Global Step: 331010 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:55:42,727-Speed 2619.34 samples/sec Loss 8.1826 LearningRate 0.0361 Epoch: 7 Global Step: 331020 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:55:46,620-Speed 2631.13 samples/sec Loss 8.0827 LearningRate 0.0361 Epoch: 7 Global Step: 331030 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:55:50,516-Speed 2629.06 samples/sec Loss 8.0451 LearningRate 0.0361 Epoch: 7 Global Step: 331040 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:55:54,407-Speed 2632.07 samples/sec Loss 8.1705 LearningRate 0.0361 Epoch: 7 Global Step: 331050 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:55:58,311-Speed 2623.76 samples/sec Loss 8.0927 LearningRate 0.0361 Epoch: 7 Global Step: 331060 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:56:02,210-Speed 2626.69 samples/sec Loss 8.0800 LearningRate 0.0361 Epoch: 7 Global Step: 331070 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:56:06,106-Speed 2629.64 samples/sec Loss 8.2255 LearningRate 0.0361 Epoch: 7 Global Step: 331080 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:56:10,004-Speed 2627.22 samples/sec Loss 8.1545 LearningRate 0.0361 Epoch: 7 Global Step: 331090 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:56:13,900-Speed 2629.15 samples/sec Loss 8.0935 LearningRate 0.0361 Epoch: 7 Global Step: 331100 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:56:17,794-Speed 2629.96 samples/sec Loss 8.0919 LearningRate 0.0361 Epoch: 7 Global Step: 331110 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 08:56:21,675-Speed 2639.18 samples/sec Loss 8.0859 LearningRate 0.0361 Epoch: 7 Global Step: 331120 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:56:25,569-Speed 2630.92 samples/sec Loss 8.1039 LearningRate 0.0361 Epoch: 7 Global Step: 331130 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:56:29,469-Speed 2626.29 samples/sec Loss 8.0268 LearningRate 0.0361 Epoch: 7 Global Step: 331140 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:56:33,360-Speed 2631.83 samples/sec Loss 8.1113 LearningRate 0.0361 Epoch: 7 Global Step: 331150 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:56:37,265-Speed 2623.53 samples/sec Loss 8.0356 LearningRate 0.0361 Epoch: 7 Global Step: 331160 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:56:41,157-Speed 2631.31 samples/sec Loss 8.1111 LearningRate 0.0361 Epoch: 7 Global Step: 331170 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:56:45,061-Speed 2624.29 samples/sec Loss 7.9105 LearningRate 0.0361 Epoch: 7 Global Step: 331180 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:56:48,952-Speed 2632.29 samples/sec Loss 8.1415 LearningRate 0.0361 Epoch: 7 Global Step: 331190 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:56:52,851-Speed 2627.06 samples/sec Loss 8.1897 LearningRate 0.0361 Epoch: 7 Global Step: 331200 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:56:56,742-Speed 2632.27 samples/sec Loss 8.1025 LearningRate 0.0361 Epoch: 7 Global Step: 331210 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:57:00,633-Speed 2632.17 samples/sec Loss 8.1088 LearningRate 0.0361 Epoch: 7 Global Step: 331220 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:57:04,533-Speed 2626.42 samples/sec Loss 7.9081 LearningRate 0.0361 Epoch: 7 Global Step: 331230 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:57:08,423-Speed 2632.54 samples/sec Loss 8.0862 LearningRate 0.0361 Epoch: 7 Global Step: 331240 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:57:12,317-Speed 2630.45 samples/sec Loss 8.0739 LearningRate 0.0361 Epoch: 7 Global Step: 331250 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:57:16,209-Speed 2631.74 samples/sec Loss 8.1016 LearningRate 0.0361 Epoch: 7 Global Step: 331260 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 08:57:20,078-Speed 2647.95 samples/sec Loss 8.4140 LearningRate 0.0361 Epoch: 7 Global Step: 331270 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:57:23,929-Speed 2660.01 samples/sec Loss 9.5979 LearningRate 0.0361 Epoch: 7 Global Step: 331280 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:57:27,837-Speed 2620.71 samples/sec Loss 9.0119 LearningRate 0.0361 Epoch: 7 Global Step: 331290 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:57:31,726-Speed 2633.45 samples/sec Loss 8.4616 LearningRate 0.0361 Epoch: 7 Global Step: 331300 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:57:35,623-Speed 2628.55 samples/sec Loss 8.1836 LearningRate 0.0361 Epoch: 7 Global Step: 331310 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:57:39,514-Speed 2632.44 samples/sec Loss 8.2211 LearningRate 0.0361 Epoch: 7 Global Step: 331320 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:57:43,403-Speed 2633.60 samples/sec Loss 7.9472 LearningRate 0.0361 Epoch: 7 Global Step: 331330 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:57:47,299-Speed 2629.10 samples/sec Loss 8.0013 LearningRate 0.0361 Epoch: 7 Global Step: 331340 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:57:51,213-Speed 2617.06 samples/sec Loss 8.1469 LearningRate 0.0361 Epoch: 7 Global Step: 331350 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:57:55,103-Speed 2632.28 samples/sec Loss 8.1538 LearningRate 0.0361 Epoch: 7 Global Step: 331360 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:57:58,994-Speed 2633.09 samples/sec Loss 8.1315 LearningRate 0.0361 Epoch: 7 Global Step: 331370 Fp16 Grad Scale: 2048 Required: 56 hours
Training: 2022-04-14 08:58:02,900-Speed 2622.18 samples/sec Loss 8.0082 LearningRate 0.0361 Epoch: 7 Global Step: 331380 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:58:06,809-Speed 2620.14 samples/sec Loss 8.0951 LearningRate 0.0361 Epoch: 7 Global Step: 331390 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:58:10,714-Speed 2622.84 samples/sec Loss 8.1328 LearningRate 0.0361 Epoch: 7 Global Step: 331400 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:58:14,626-Speed 2618.46 samples/sec Loss 8.0562 LearningRate 0.0361 Epoch: 7 Global Step: 331410 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:58:18,521-Speed 2629.49 samples/sec Loss 8.1339 LearningRate 0.0361 Epoch: 7 Global Step: 331420 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:58:22,414-Speed 2631.20 samples/sec Loss 8.0964 LearningRate 0.0361 Epoch: 7 Global Step: 331430 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:58:26,310-Speed 2629.42 samples/sec Loss 8.0811 LearningRate 0.0361 Epoch: 7 Global Step: 331440 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:58:30,291-Speed 2572.83 samples/sec Loss 8.1476 LearningRate 0.0361 Epoch: 7 Global Step: 331450 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:58:34,204-Speed 2617.56 samples/sec Loss 8.2405 LearningRate 0.0361 Epoch: 7 Global Step: 331460 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:58:38,095-Speed 2632.52 samples/sec Loss 8.1784 LearningRate 0.0361 Epoch: 7 Global Step: 331470 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 08:58:41,990-Speed 2629.26 samples/sec Loss 8.1231 LearningRate 0.0361 Epoch: 7 Global Step: 331480 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:58:45,886-Speed 2629.85 samples/sec Loss 8.0009 LearningRate 0.0360 Epoch: 7 Global Step: 331490 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:58:49,784-Speed 2627.29 samples/sec Loss 8.2593 LearningRate 0.0360 Epoch: 7 Global Step: 331500 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:58:53,677-Speed 2630.84 samples/sec Loss 8.2235 LearningRate 0.0360 Epoch: 7 Global Step: 331510 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:58:57,575-Speed 2627.37 samples/sec Loss 8.0172 LearningRate 0.0360 Epoch: 7 Global Step: 331520 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:59:01,472-Speed 2629.05 samples/sec Loss 8.0116 LearningRate 0.0360 Epoch: 7 Global Step: 331530 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:59:05,375-Speed 2623.93 samples/sec Loss 8.0141 LearningRate 0.0360 Epoch: 7 Global Step: 331540 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:59:09,270-Speed 2630.03 samples/sec Loss 8.0661 LearningRate 0.0360 Epoch: 7 Global Step: 331550 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:59:13,171-Speed 2625.22 samples/sec Loss 8.6936 LearningRate 0.0360 Epoch: 7 Global Step: 331560 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:59:17,069-Speed 2627.47 samples/sec Loss 8.6215 LearningRate 0.0360 Epoch: 7 Global Step: 331570 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 08:59:20,968-Speed 2627.38 samples/sec Loss 8.2048 LearningRate 0.0360 Epoch: 7 Global Step: 331580 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:59:24,863-Speed 2629.32 samples/sec Loss 8.1931 LearningRate 0.0360 Epoch: 7 Global Step: 331590 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:59:28,756-Speed 2631.52 samples/sec Loss 8.1277 LearningRate 0.0360 Epoch: 7 Global Step: 331600 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:59:32,651-Speed 2629.22 samples/sec Loss 8.0890 LearningRate 0.0360 Epoch: 7 Global Step: 331610 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:59:36,544-Speed 2630.89 samples/sec Loss 8.0975 LearningRate 0.0360 Epoch: 7 Global Step: 331620 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:59:40,449-Speed 2623.45 samples/sec Loss 8.1331 LearningRate 0.0360 Epoch: 7 Global Step: 331630 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:59:44,355-Speed 2621.92 samples/sec Loss 8.1233 LearningRate 0.0360 Epoch: 7 Global Step: 331640 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:59:48,279-Speed 2610.33 samples/sec Loss 8.1467 LearningRate 0.0360 Epoch: 7 Global Step: 331650 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:59:52,179-Speed 2626.46 samples/sec Loss 7.9623 LearningRate 0.0360 Epoch: 7 Global Step: 331660 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 08:59:56,084-Speed 2622.46 samples/sec Loss 8.1169 LearningRate 0.0360 Epoch: 7 Global Step: 331670 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:00:00,012-Speed 2608.32 samples/sec Loss 8.1322 LearningRate 0.0360 Epoch: 7 Global Step: 331680 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:00:03,912-Speed 2626.37 samples/sec Loss 8.1812 LearningRate 0.0360 Epoch: 7 Global Step: 331690 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:00:07,805-Speed 2630.41 samples/sec Loss 8.1219 LearningRate 0.0360 Epoch: 7 Global Step: 331700 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:00:11,699-Speed 2630.16 samples/sec Loss 8.1624 LearningRate 0.0360 Epoch: 7 Global Step: 331710 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:00:15,592-Speed 2631.36 samples/sec Loss 8.1138 LearningRate 0.0360 Epoch: 7 Global Step: 331720 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:00:19,494-Speed 2624.86 samples/sec Loss 8.1536 LearningRate 0.0360 Epoch: 7 Global Step: 331730 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:00:23,380-Speed 2635.60 samples/sec Loss 8.4053 LearningRate 0.0360 Epoch: 7 Global Step: 331740 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:00:27,279-Speed 2627.45 samples/sec Loss 8.2858 LearningRate 0.0360 Epoch: 7 Global Step: 331750 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:00:31,168-Speed 2633.67 samples/sec Loss 8.1461 LearningRate 0.0360 Epoch: 7 Global Step: 331760 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:00:35,063-Speed 2630.09 samples/sec Loss 8.1461 LearningRate 0.0360 Epoch: 7 Global Step: 331770 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:00:38,966-Speed 2623.84 samples/sec Loss 8.1362 LearningRate 0.0360 Epoch: 7 Global Step: 331780 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:00:42,868-Speed 2624.92 samples/sec Loss 8.1501 LearningRate 0.0360 Epoch: 7 Global Step: 331790 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:00:46,757-Speed 2633.34 samples/sec Loss 8.0914 LearningRate 0.0360 Epoch: 7 Global Step: 331800 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:00:50,661-Speed 2624.57 samples/sec Loss 8.0940 LearningRate 0.0360 Epoch: 7 Global Step: 331810 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:00:54,563-Speed 2624.97 samples/sec Loss 8.1197 LearningRate 0.0360 Epoch: 7 Global Step: 331820 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:00:58,462-Speed 2627.50 samples/sec Loss 8.1127 LearningRate 0.0360 Epoch: 7 Global Step: 331830 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:01:19,502-Speed 486.72 samples/sec Loss 8.0724 LearningRate 0.0360 Epoch: 8 Global Step: 331840 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:01:23,400-Speed 2627.68 samples/sec Loss 8.0930 LearningRate 0.0360 Epoch: 8 Global Step: 331850 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:01:27,297-Speed 2628.78 samples/sec Loss 8.0882 LearningRate 0.0360 Epoch: 8 Global Step: 331860 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:01:31,196-Speed 2626.83 samples/sec Loss 8.1416 LearningRate 0.0360 Epoch: 8 Global Step: 331870 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:01:35,085-Speed 2634.27 samples/sec Loss 8.1663 LearningRate 0.0360 Epoch: 8 Global Step: 331880 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:01:38,973-Speed 2633.76 samples/sec Loss 8.0787 LearningRate 0.0360 Epoch: 8 Global Step: 331890 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:01:42,865-Speed 2631.88 samples/sec Loss 8.0898 LearningRate 0.0360 Epoch: 8 Global Step: 331900 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:01:46,762-Speed 2628.35 samples/sec Loss 8.0664 LearningRate 0.0360 Epoch: 8 Global Step: 331910 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:01:50,676-Speed 2616.77 samples/sec Loss 8.0642 LearningRate 0.0360 Epoch: 8 Global Step: 331920 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:01:54,608-Speed 2604.70 samples/sec Loss 8.0583 LearningRate 0.0360 Epoch: 8 Global Step: 331930 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:01:58,505-Speed 2628.62 samples/sec Loss 8.0791 LearningRate 0.0360 Epoch: 8 Global Step: 331940 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:02:02,406-Speed 2625.98 samples/sec Loss 8.2003 LearningRate 0.0360 Epoch: 8 Global Step: 331950 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:02:06,327-Speed 2612.40 samples/sec Loss 8.0981 LearningRate 0.0360 Epoch: 8 Global Step: 331960 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:02:10,238-Speed 2618.68 samples/sec Loss 8.2651 LearningRate 0.0360 Epoch: 8 Global Step: 331970 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:02:14,138-Speed 2626.59 samples/sec Loss 8.1200 LearningRate 0.0360 Epoch: 8 Global Step: 331980 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:02:18,042-Speed 2623.09 samples/sec Loss 8.1717 LearningRate 0.0360 Epoch: 8 Global Step: 331990 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:02:21,968-Speed 2608.90 samples/sec Loss 8.1598 LearningRate 0.0360 Epoch: 8 Global Step: 332000 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:02:25,890-Speed 2611.98 samples/sec Loss 8.1122 LearningRate 0.0360 Epoch: 8 Global Step: 332010 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:02:29,803-Speed 2617.49 samples/sec Loss 8.0745 LearningRate 0.0360 Epoch: 8 Global Step: 332020 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:02:33,700-Speed 2628.39 samples/sec Loss 8.1734 LearningRate 0.0360 Epoch: 8 Global Step: 332030 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:02:37,600-Speed 2626.45 samples/sec Loss 8.0309 LearningRate 0.0360 Epoch: 8 Global Step: 332040 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:02:41,572-Speed 2578.97 samples/sec Loss 8.0865 LearningRate 0.0360 Epoch: 8 Global Step: 332050 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:02:45,652-Speed 2510.64 samples/sec Loss 7.8931 LearningRate 0.0360 Epoch: 8 Global Step: 332060 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:02:49,732-Speed 2509.69 samples/sec Loss 8.1778 LearningRate 0.0360 Epoch: 8 Global Step: 332070 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:02:53,647-Speed 2616.44 samples/sec Loss 8.1967 LearningRate 0.0360 Epoch: 8 Global Step: 332080 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:02:57,545-Speed 2628.33 samples/sec Loss 8.0222 LearningRate 0.0360 Epoch: 8 Global Step: 332090 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:03:01,443-Speed 2627.36 samples/sec Loss 8.0256 LearningRate 0.0360 Epoch: 8 Global Step: 332100 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:03:05,346-Speed 2623.86 samples/sec Loss 7.9926 LearningRate 0.0360 Epoch: 8 Global Step: 332110 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:03:09,250-Speed 2623.92 samples/sec Loss 8.1616 LearningRate 0.0360 Epoch: 8 Global Step: 332120 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:03:13,154-Speed 2623.87 samples/sec Loss 8.1122 LearningRate 0.0360 Epoch: 8 Global Step: 332130 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:03:17,193-Speed 2536.16 samples/sec Loss 8.0400 LearningRate 0.0360 Epoch: 8 Global Step: 332140 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 09:03:21,174-Speed 2573.13 samples/sec Loss 7.9568 LearningRate 0.0360 Epoch: 8 Global Step: 332150 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:03:25,066-Speed 2631.37 samples/sec Loss 7.9864 LearningRate 0.0360 Epoch: 8 Global Step: 332160 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:03:28,971-Speed 2623.59 samples/sec Loss 8.1531 LearningRate 0.0360 Epoch: 8 Global Step: 332170 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:03:32,869-Speed 2627.85 samples/sec Loss 8.0557 LearningRate 0.0359 Epoch: 8 Global Step: 332180 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:03:36,767-Speed 2626.99 samples/sec Loss 8.0888 LearningRate 0.0359 Epoch: 8 Global Step: 332190 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:03:40,637-Speed 2647.02 samples/sec Loss 7.9639 LearningRate 0.0359 Epoch: 8 Global Step: 332200 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:03:44,599-Speed 2585.62 samples/sec Loss 8.1019 LearningRate 0.0359 Epoch: 8 Global Step: 332210 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:03:48,492-Speed 2630.81 samples/sec Loss 8.1202 LearningRate 0.0359 Epoch: 8 Global Step: 332220 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:03:52,391-Speed 2626.94 samples/sec Loss 8.0869 LearningRate 0.0359 Epoch: 8 Global Step: 332230 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:03:56,284-Speed 2630.70 samples/sec Loss 7.9768 LearningRate 0.0359 Epoch: 8 Global Step: 332240 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:04:00,183-Speed 2627.57 samples/sec Loss 7.8780 LearningRate 0.0359 Epoch: 8 Global Step: 332250 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:04:04,091-Speed 2620.50 samples/sec Loss 8.1383 LearningRate 0.0359 Epoch: 8 Global Step: 332260 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:04:07,990-Speed 2627.30 samples/sec Loss 8.0668 LearningRate 0.0359 Epoch: 8 Global Step: 332270 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:04:11,887-Speed 2628.02 samples/sec Loss 8.0280 LearningRate 0.0359 Epoch: 8 Global Step: 332280 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:04:15,785-Speed 2628.01 samples/sec Loss 8.0694 LearningRate 0.0359 Epoch: 8 Global Step: 332290 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:04:19,684-Speed 2626.73 samples/sec Loss 8.1257 LearningRate 0.0359 Epoch: 8 Global Step: 332300 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:04:23,586-Speed 2625.33 samples/sec Loss 8.0586 LearningRate 0.0359 Epoch: 8 Global Step: 332310 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:04:27,495-Speed 2620.34 samples/sec Loss 7.9935 LearningRate 0.0359 Epoch: 8 Global Step: 332320 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:04:31,423-Speed 2608.03 samples/sec Loss 8.1010 LearningRate 0.0359 Epoch: 8 Global Step: 332330 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:04:35,337-Speed 2616.37 samples/sec Loss 8.0936 LearningRate 0.0359 Epoch: 8 Global Step: 332340 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:04:39,193-Speed 2656.25 samples/sec Loss 8.1958 LearningRate 0.0359 Epoch: 8 Global Step: 332350 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:04:43,088-Speed 2629.51 samples/sec Loss 8.1956 LearningRate 0.0359 Epoch: 8 Global Step: 332360 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:04:46,994-Speed 2622.79 samples/sec Loss 8.0880 LearningRate 0.0359 Epoch: 8 Global Step: 332370 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:04:50,890-Speed 2629.58 samples/sec Loss 8.1545 LearningRate 0.0359 Epoch: 8 Global Step: 332380 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:04:54,783-Speed 2630.54 samples/sec Loss 8.0955 LearningRate 0.0359 Epoch: 8 Global Step: 332390 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:04:58,678-Speed 2629.94 samples/sec Loss 8.0489 LearningRate 0.0359 Epoch: 8 Global Step: 332400 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:05:02,571-Speed 2630.93 samples/sec Loss 8.0866 LearningRate 0.0359 Epoch: 8 Global Step: 332410 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:05:06,482-Speed 2618.94 samples/sec Loss 8.0286 LearningRate 0.0359 Epoch: 8 Global Step: 332420 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:05:10,378-Speed 2629.06 samples/sec Loss 7.9878 LearningRate 0.0359 Epoch: 8 Global Step: 332430 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:05:14,269-Speed 2632.88 samples/sec Loss 7.9828 LearningRate 0.0359 Epoch: 8 Global Step: 332440 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:05:18,170-Speed 2625.18 samples/sec Loss 8.1495 LearningRate 0.0359 Epoch: 8 Global Step: 332450 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:05:22,067-Speed 2628.65 samples/sec Loss 8.0742 LearningRate 0.0359 Epoch: 8 Global Step: 332460 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:05:25,963-Speed 2628.55 samples/sec Loss 8.0345 LearningRate 0.0359 Epoch: 8 Global Step: 332470 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:05:29,858-Speed 2630.48 samples/sec Loss 8.0629 LearningRate 0.0359 Epoch: 8 Global Step: 332480 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:05:33,753-Speed 2629.52 samples/sec Loss 8.0729 LearningRate 0.0359 Epoch: 8 Global Step: 332490 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:05:37,663-Speed 2620.07 samples/sec Loss 8.1325 LearningRate 0.0359 Epoch: 8 Global Step: 332500 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:05:41,560-Speed 2627.92 samples/sec Loss 7.8843 LearningRate 0.0359 Epoch: 8 Global Step: 332510 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:05:45,458-Speed 2628.05 samples/sec Loss 8.0671 LearningRate 0.0359 Epoch: 8 Global Step: 332520 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:05:49,354-Speed 2628.95 samples/sec Loss 8.0667 LearningRate 0.0359 Epoch: 8 Global Step: 332530 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:05:53,250-Speed 2629.05 samples/sec Loss 8.0703 LearningRate 0.0359 Epoch: 8 Global Step: 332540 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:05:57,144-Speed 2630.27 samples/sec Loss 8.0832 LearningRate 0.0359 Epoch: 8 Global Step: 332550 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:06:01,017-Speed 2644.82 samples/sec Loss 8.3471 LearningRate 0.0359 Epoch: 8 Global Step: 332560 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:06:04,889-Speed 2645.33 samples/sec Loss 8.6420 LearningRate 0.0359 Epoch: 8 Global Step: 332570 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:06:08,779-Speed 2632.78 samples/sec Loss 8.1058 LearningRate 0.0359 Epoch: 8 Global Step: 332580 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:06:12,673-Speed 2630.58 samples/sec Loss 8.0368 LearningRate 0.0359 Epoch: 8 Global Step: 332590 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:06:16,571-Speed 2627.42 samples/sec Loss 8.1550 LearningRate 0.0359 Epoch: 8 Global Step: 332600 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:06:20,469-Speed 2627.65 samples/sec Loss 8.0765 LearningRate 0.0359 Epoch: 8 Global Step: 332610 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:06:24,360-Speed 2632.45 samples/sec Loss 8.0532 LearningRate 0.0359 Epoch: 8 Global Step: 332620 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:06:28,252-Speed 2631.50 samples/sec Loss 8.1302 LearningRate 0.0359 Epoch: 8 Global Step: 332630 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:06:32,163-Speed 2618.95 samples/sec Loss 7.9484 LearningRate 0.0359 Epoch: 8 Global Step: 332640 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:06:36,054-Speed 2632.48 samples/sec Loss 8.1387 LearningRate 0.0359 Epoch: 8 Global Step: 332650 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:06:39,949-Speed 2630.35 samples/sec Loss 8.0824 LearningRate 0.0359 Epoch: 8 Global Step: 332660 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:06:43,841-Speed 2631.73 samples/sec Loss 8.0997 LearningRate 0.0359 Epoch: 8 Global Step: 332670 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:06:47,927-Speed 2506.53 samples/sec Loss 8.1206 LearningRate 0.0359 Epoch: 8 Global Step: 332680 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:06:52,009-Speed 2509.00 samples/sec Loss 8.0730 LearningRate 0.0359 Epoch: 8 Global Step: 332690 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:06:56,050-Speed 2534.63 samples/sec Loss 8.0453 LearningRate 0.0359 Epoch: 8 Global Step: 332700 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:06:59,988-Speed 2601.41 samples/sec Loss 8.0840 LearningRate 0.0359 Epoch: 8 Global Step: 332710 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:07:03,883-Speed 2630.35 samples/sec Loss 8.0519 LearningRate 0.0359 Epoch: 8 Global Step: 332720 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:07:07,773-Speed 2632.46 samples/sec Loss 8.2101 LearningRate 0.0359 Epoch: 8 Global Step: 332730 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:07:11,775-Speed 2559.76 samples/sec Loss 7.9456 LearningRate 0.0359 Epoch: 8 Global Step: 332740 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:07:15,666-Speed 2632.17 samples/sec Loss 8.0233 LearningRate 0.0359 Epoch: 8 Global Step: 332750 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:07:19,582-Speed 2615.44 samples/sec Loss 8.1102 LearningRate 0.0359 Epoch: 8 Global Step: 332760 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:07:23,486-Speed 2623.89 samples/sec Loss 8.1020 LearningRate 0.0359 Epoch: 8 Global Step: 332770 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:07:27,379-Speed 2631.25 samples/sec Loss 8.0722 LearningRate 0.0359 Epoch: 8 Global Step: 332780 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:07:31,273-Speed 2630.30 samples/sec Loss 8.1105 LearningRate 0.0359 Epoch: 8 Global Step: 332790 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:07:35,170-Speed 2628.35 samples/sec Loss 7.9900 LearningRate 0.0359 Epoch: 8 Global Step: 332800 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:07:39,068-Speed 2627.88 samples/sec Loss 8.2125 LearningRate 0.0359 Epoch: 8 Global Step: 332810 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:07:42,961-Speed 2631.36 samples/sec Loss 8.0646 LearningRate 0.0359 Epoch: 8 Global Step: 332820 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:07:47,012-Speed 2528.04 samples/sec Loss 8.0637 LearningRate 0.0359 Epoch: 8 Global Step: 332830 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:07:51,080-Speed 2517.81 samples/sec Loss 8.1037 LearningRate 0.0359 Epoch: 8 Global Step: 332840 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:07:54,977-Speed 2628.51 samples/sec Loss 8.0799 LearningRate 0.0359 Epoch: 8 Global Step: 332850 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:07:58,878-Speed 2625.15 samples/sec Loss 8.0609 LearningRate 0.0359 Epoch: 8 Global Step: 332860 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:08:02,787-Speed 2620.72 samples/sec Loss 7.9097 LearningRate 0.0359 Epoch: 8 Global Step: 332870 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:08:06,703-Speed 2615.20 samples/sec Loss 8.0930 LearningRate 0.0358 Epoch: 8 Global Step: 332880 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:08:10,623-Speed 2612.74 samples/sec Loss 7.9687 LearningRate 0.0358 Epoch: 8 Global Step: 332890 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:08:14,521-Speed 2628.05 samples/sec Loss 8.0454 LearningRate 0.0358 Epoch: 8 Global Step: 332900 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:08:18,414-Speed 2630.93 samples/sec Loss 8.1104 LearningRate 0.0358 Epoch: 8 Global Step: 332910 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:08:22,310-Speed 2628.83 samples/sec Loss 8.1454 LearningRate 0.0358 Epoch: 8 Global Step: 332920 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:08:26,204-Speed 2630.77 samples/sec Loss 8.1043 LearningRate 0.0358 Epoch: 8 Global Step: 332930 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:08:30,097-Speed 2630.78 samples/sec Loss 8.1660 LearningRate 0.0358 Epoch: 8 Global Step: 332940 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:08:33,998-Speed 2625.74 samples/sec Loss 8.0346 LearningRate 0.0358 Epoch: 8 Global Step: 332950 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:08:37,921-Speed 2611.16 samples/sec Loss 8.0763 LearningRate 0.0358 Epoch: 8 Global Step: 332960 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:08:41,821-Speed 2626.77 samples/sec Loss 8.0541 LearningRate 0.0358 Epoch: 8 Global Step: 332970 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:08:45,726-Speed 2622.82 samples/sec Loss 8.0873 LearningRate 0.0358 Epoch: 8 Global Step: 332980 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:08:49,630-Speed 2623.25 samples/sec Loss 8.1418 LearningRate 0.0358 Epoch: 8 Global Step: 332990 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:08:53,531-Speed 2625.92 samples/sec Loss 8.2173 LearningRate 0.0358 Epoch: 8 Global Step: 333000 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:08:57,433-Speed 2624.99 samples/sec Loss 8.0241 LearningRate 0.0358 Epoch: 8 Global Step: 333010 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:09:01,426-Speed 2565.37 samples/sec Loss 8.0322 LearningRate 0.0358 Epoch: 8 Global Step: 333020 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:09:05,320-Speed 2629.58 samples/sec Loss 8.0987 LearningRate 0.0358 Epoch: 8 Global Step: 333030 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:09:09,214-Speed 2630.44 samples/sec Loss 7.9428 LearningRate 0.0358 Epoch: 8 Global Step: 333040 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:09:13,107-Speed 2630.96 samples/sec Loss 8.0302 LearningRate 0.0358 Epoch: 8 Global Step: 333050 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:09:17,003-Speed 2629.44 samples/sec Loss 8.1008 LearningRate 0.0358 Epoch: 8 Global Step: 333060 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:09:20,910-Speed 2621.31 samples/sec Loss 7.9886 LearningRate 0.0358 Epoch: 8 Global Step: 333070 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:09:24,815-Speed 2622.91 samples/sec Loss 8.0530 LearningRate 0.0358 Epoch: 8 Global Step: 333080 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:09:28,710-Speed 2629.31 samples/sec Loss 8.0570 LearningRate 0.0358 Epoch: 8 Global Step: 333090 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:09:32,605-Speed 2629.77 samples/sec Loss 8.0753 LearningRate 0.0358 Epoch: 8 Global Step: 333100 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:09:36,503-Speed 2627.66 samples/sec Loss 7.9732 LearningRate 0.0358 Epoch: 8 Global Step: 333110 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:09:40,400-Speed 2628.31 samples/sec Loss 7.8666 LearningRate 0.0358 Epoch: 8 Global Step: 333120 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:09:44,294-Speed 2630.15 samples/sec Loss 8.0497 LearningRate 0.0358 Epoch: 8 Global Step: 333130 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:09:48,192-Speed 2627.88 samples/sec Loss 8.0661 LearningRate 0.0358 Epoch: 8 Global Step: 333140 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:09:52,106-Speed 2617.41 samples/sec Loss 8.1224 LearningRate 0.0358 Epoch: 8 Global Step: 333150 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:09:56,006-Speed 2625.75 samples/sec Loss 8.1106 LearningRate 0.0358 Epoch: 8 Global Step: 333160 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:09:59,940-Speed 2604.20 samples/sec Loss 8.1165 LearningRate 0.0358 Epoch: 8 Global Step: 333170 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 09:10:03,848-Speed 2621.11 samples/sec Loss 8.0717 LearningRate 0.0358 Epoch: 8 Global Step: 333180 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 09:10:07,727-Speed 2640.02 samples/sec Loss 7.9876 LearningRate 0.0358 Epoch: 8 Global Step: 333190 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:10:11,620-Speed 2631.08 samples/sec Loss 8.0861 LearningRate 0.0358 Epoch: 8 Global Step: 333200 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:10:15,524-Speed 2623.99 samples/sec Loss 8.0739 LearningRate 0.0358 Epoch: 8 Global Step: 333210 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:10:19,424-Speed 2626.06 samples/sec Loss 8.1661 LearningRate 0.0358 Epoch: 8 Global Step: 333220 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:10:23,322-Speed 2628.37 samples/sec Loss 8.1139 LearningRate 0.0358 Epoch: 8 Global Step: 333230 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:10:27,233-Speed 2618.77 samples/sec Loss 8.0243 LearningRate 0.0358 Epoch: 8 Global Step: 333240 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:10:31,124-Speed 2632.68 samples/sec Loss 8.1090 LearningRate 0.0358 Epoch: 8 Global Step: 333250 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:10:35,021-Speed 2628.00 samples/sec Loss 8.0065 LearningRate 0.0358 Epoch: 8 Global Step: 333260 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:10:38,917-Speed 2629.25 samples/sec Loss 8.1280 LearningRate 0.0358 Epoch: 8 Global Step: 333270 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:10:42,810-Speed 2630.70 samples/sec Loss 8.1283 LearningRate 0.0358 Epoch: 8 Global Step: 333280 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:10:46,705-Speed 2629.94 samples/sec Loss 8.0781 LearningRate 0.0358 Epoch: 8 Global Step: 333290 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:10:50,596-Speed 2632.41 samples/sec Loss 8.0848 LearningRate 0.0358 Epoch: 8 Global Step: 333300 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:10:54,496-Speed 2626.34 samples/sec Loss 8.1469 LearningRate 0.0358 Epoch: 8 Global Step: 333310 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:10:58,392-Speed 2628.68 samples/sec Loss 8.1304 LearningRate 0.0358 Epoch: 8 Global Step: 333320 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:11:02,308-Speed 2616.22 samples/sec Loss 8.0745 LearningRate 0.0358 Epoch: 8 Global Step: 333330 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:11:06,201-Speed 2630.76 samples/sec Loss 7.9707 LearningRate 0.0358 Epoch: 8 Global Step: 333340 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:11:10,098-Speed 2628.30 samples/sec Loss 7.9917 LearningRate 0.0358 Epoch: 8 Global Step: 333350 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:11:13,977-Speed 2640.70 samples/sec Loss 7.9272 LearningRate 0.0358 Epoch: 8 Global Step: 333360 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:11:17,874-Speed 2627.89 samples/sec Loss 8.0231 LearningRate 0.0358 Epoch: 8 Global Step: 333370 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:11:21,775-Speed 2626.30 samples/sec Loss 7.9398 LearningRate 0.0358 Epoch: 8 Global Step: 333380 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:11:25,679-Speed 2623.40 samples/sec Loss 8.1658 LearningRate 0.0358 Epoch: 8 Global Step: 333390 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:11:29,586-Speed 2622.23 samples/sec Loss 8.0233 LearningRate 0.0358 Epoch: 8 Global Step: 333400 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:11:33,486-Speed 2626.04 samples/sec Loss 8.0400 LearningRate 0.0358 Epoch: 8 Global Step: 333410 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:11:37,380-Speed 2630.16 samples/sec Loss 7.9075 LearningRate 0.0358 Epoch: 8 Global Step: 333420 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:11:41,284-Speed 2622.98 samples/sec Loss 8.0718 LearningRate 0.0358 Epoch: 8 Global Step: 333430 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:11:45,197-Speed 2618.21 samples/sec Loss 8.0394 LearningRate 0.0358 Epoch: 8 Global Step: 333440 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:11:49,101-Speed 2623.54 samples/sec Loss 7.9447 LearningRate 0.0358 Epoch: 8 Global Step: 333450 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:11:53,001-Speed 2626.21 samples/sec Loss 8.0330 LearningRate 0.0358 Epoch: 8 Global Step: 333460 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:11:56,908-Speed 2622.10 samples/sec Loss 8.0817 LearningRate 0.0358 Epoch: 8 Global Step: 333470 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:12:00,817-Speed 2620.05 samples/sec Loss 7.9722 LearningRate 0.0358 Epoch: 8 Global Step: 333480 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:12:04,719-Speed 2625.65 samples/sec Loss 8.0094 LearningRate 0.0358 Epoch: 8 Global Step: 333490 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:12:08,629-Speed 2619.03 samples/sec Loss 8.0012 LearningRate 0.0358 Epoch: 8 Global Step: 333500 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:12:12,524-Speed 2629.90 samples/sec Loss 8.0354 LearningRate 0.0358 Epoch: 8 Global Step: 333510 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:12:16,455-Speed 2605.68 samples/sec Loss 8.0766 LearningRate 0.0358 Epoch: 8 Global Step: 333520 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:12:20,348-Speed 2631.05 samples/sec Loss 8.0330 LearningRate 0.0358 Epoch: 8 Global Step: 333530 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:12:24,242-Speed 2630.44 samples/sec Loss 7.9942 LearningRate 0.0358 Epoch: 8 Global Step: 333540 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:12:28,150-Speed 2620.76 samples/sec Loss 8.0467 LearningRate 0.0358 Epoch: 8 Global Step: 333550 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:12:32,057-Speed 2621.76 samples/sec Loss 8.0287 LearningRate 0.0358 Epoch: 8 Global Step: 333560 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:12:35,965-Speed 2620.99 samples/sec Loss 8.0512 LearningRate 0.0357 Epoch: 8 Global Step: 333570 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:12:39,858-Speed 2631.10 samples/sec Loss 8.0338 LearningRate 0.0357 Epoch: 8 Global Step: 333580 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:12:43,756-Speed 2627.67 samples/sec Loss 8.0226 LearningRate 0.0357 Epoch: 8 Global Step: 333590 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:12:47,663-Speed 2621.73 samples/sec Loss 8.1382 LearningRate 0.0357 Epoch: 8 Global Step: 333600 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:12:51,558-Speed 2629.39 samples/sec Loss 8.0635 LearningRate 0.0357 Epoch: 8 Global Step: 333610 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:12:55,452-Speed 2630.22 samples/sec Loss 8.1867 LearningRate 0.0357 Epoch: 8 Global Step: 333620 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:12:59,344-Speed 2632.03 samples/sec Loss 8.1224 LearningRate 0.0357 Epoch: 8 Global Step: 333630 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:13:03,241-Speed 2628.15 samples/sec Loss 8.0921 LearningRate 0.0357 Epoch: 8 Global Step: 333640 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:13:07,146-Speed 2623.40 samples/sec Loss 8.0493 LearningRate 0.0357 Epoch: 8 Global Step: 333650 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:13:11,029-Speed 2637.73 samples/sec Loss 8.2181 LearningRate 0.0357 Epoch: 8 Global Step: 333660 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:13:14,928-Speed 2627.14 samples/sec Loss 7.9532 LearningRate 0.0357 Epoch: 8 Global Step: 333670 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:13:18,868-Speed 2599.48 samples/sec Loss 8.0531 LearningRate 0.0357 Epoch: 8 Global Step: 333680 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:13:22,749-Speed 2639.00 samples/sec Loss 8.1026 LearningRate 0.0357 Epoch: 8 Global Step: 333690 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:13:26,635-Speed 2636.02 samples/sec Loss 7.9632 LearningRate 0.0357 Epoch: 8 Global Step: 333700 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:13:30,526-Speed 2632.60 samples/sec Loss 8.0493 LearningRate 0.0357 Epoch: 8 Global Step: 333710 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:13:34,421-Speed 2629.64 samples/sec Loss 8.0637 LearningRate 0.0357 Epoch: 8 Global Step: 333720 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:13:38,313-Speed 2632.22 samples/sec Loss 7.9183 LearningRate 0.0357 Epoch: 8 Global Step: 333730 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:13:42,205-Speed 2630.95 samples/sec Loss 8.0091 LearningRate 0.0357 Epoch: 8 Global Step: 333740 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:13:46,108-Speed 2624.91 samples/sec Loss 8.1248 LearningRate 0.0357 Epoch: 8 Global Step: 333750 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:13:50,019-Speed 2619.28 samples/sec Loss 8.1167 LearningRate 0.0357 Epoch: 8 Global Step: 333760 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:13:53,911-Speed 2631.13 samples/sec Loss 8.1648 LearningRate 0.0357 Epoch: 8 Global Step: 333770 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:13:57,804-Speed 2630.71 samples/sec Loss 8.0331 LearningRate 0.0357 Epoch: 8 Global Step: 333780 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:14:01,696-Speed 2632.29 samples/sec Loss 8.0924 LearningRate 0.0357 Epoch: 8 Global Step: 333790 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:14:05,595-Speed 2626.78 samples/sec Loss 8.0477 LearningRate 0.0357 Epoch: 8 Global Step: 333800 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:14:09,490-Speed 2629.98 samples/sec Loss 8.1474 LearningRate 0.0357 Epoch: 8 Global Step: 333810 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:14:13,381-Speed 2631.99 samples/sec Loss 7.9747 LearningRate 0.0357 Epoch: 8 Global Step: 333820 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:14:17,274-Speed 2631.61 samples/sec Loss 8.0949 LearningRate 0.0357 Epoch: 8 Global Step: 333830 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:14:21,171-Speed 2628.09 samples/sec Loss 7.9709 LearningRate 0.0357 Epoch: 8 Global Step: 333840 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:14:25,072-Speed 2625.44 samples/sec Loss 7.9283 LearningRate 0.0357 Epoch: 8 Global Step: 333850 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:14:28,964-Speed 2631.01 samples/sec Loss 8.2173 LearningRate 0.0357 Epoch: 8 Global Step: 333860 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:14:32,866-Speed 2625.65 samples/sec Loss 8.0289 LearningRate 0.0357 Epoch: 8 Global Step: 333870 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:14:36,757-Speed 2632.80 samples/sec Loss 7.9851 LearningRate 0.0357 Epoch: 8 Global Step: 333880 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:14:40,651-Speed 2629.79 samples/sec Loss 8.0797 LearningRate 0.0357 Epoch: 8 Global Step: 333890 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 09:14:44,543-Speed 2631.66 samples/sec Loss 8.0128 LearningRate 0.0357 Epoch: 8 Global Step: 333900 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 09:14:48,470-Speed 2608.92 samples/sec Loss 8.1130 LearningRate 0.0357 Epoch: 8 Global Step: 333910 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 09:14:52,371-Speed 2625.19 samples/sec Loss 8.0733 LearningRate 0.0357 Epoch: 8 Global Step: 333920 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 09:14:56,240-Speed 2647.47 samples/sec Loss 8.0124 LearningRate 0.0357 Epoch: 8 Global Step: 333930 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:15:00,128-Speed 2634.92 samples/sec Loss 7.9999 LearningRate 0.0357 Epoch: 8 Global Step: 333940 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:15:04,027-Speed 2626.95 samples/sec Loss 8.0684 LearningRate 0.0357 Epoch: 8 Global Step: 333950 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:15:07,919-Speed 2631.37 samples/sec Loss 8.0414 LearningRate 0.0357 Epoch: 8 Global Step: 333960 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:15:11,812-Speed 2631.27 samples/sec Loss 8.0710 LearningRate 0.0357 Epoch: 8 Global Step: 333970 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:15:15,711-Speed 2627.44 samples/sec Loss 7.9338 LearningRate 0.0357 Epoch: 8 Global Step: 333980 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:15:19,611-Speed 2626.06 samples/sec Loss 8.1180 LearningRate 0.0357 Epoch: 8 Global Step: 333990 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:15:23,509-Speed 2628.01 samples/sec Loss 7.9387 LearningRate 0.0357 Epoch: 8 Global Step: 334000 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:15:27,411-Speed 2625.25 samples/sec Loss 8.0927 LearningRate 0.0357 Epoch: 8 Global Step: 334010 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:15:31,320-Speed 2620.28 samples/sec Loss 7.9982 LearningRate 0.0357 Epoch: 8 Global Step: 334020 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:15:35,217-Speed 2628.32 samples/sec Loss 7.9400 LearningRate 0.0357 Epoch: 8 Global Step: 334030 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:15:39,123-Speed 2622.19 samples/sec Loss 8.1059 LearningRate 0.0357 Epoch: 8 Global Step: 334040 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:15:43,062-Speed 2600.27 samples/sec Loss 8.1633 LearningRate 0.0357 Epoch: 8 Global Step: 334050 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:15:46,962-Speed 2626.47 samples/sec Loss 8.0581 LearningRate 0.0357 Epoch: 8 Global Step: 334060 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:15:50,856-Speed 2630.51 samples/sec Loss 8.0625 LearningRate 0.0357 Epoch: 8 Global Step: 334070 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:15:54,750-Speed 2630.43 samples/sec Loss 7.8791 LearningRate 0.0357 Epoch: 8 Global Step: 334080 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:15:58,641-Speed 2631.81 samples/sec Loss 8.0558 LearningRate 0.0357 Epoch: 8 Global Step: 334090 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:16:02,550-Speed 2620.53 samples/sec Loss 8.2167 LearningRate 0.0357 Epoch: 8 Global Step: 334100 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:16:06,453-Speed 2624.54 samples/sec Loss 8.0438 LearningRate 0.0357 Epoch: 8 Global Step: 334110 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:16:10,347-Speed 2630.02 samples/sec Loss 8.0723 LearningRate 0.0357 Epoch: 8 Global Step: 334120 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:16:14,276-Speed 2607.11 samples/sec Loss 8.0993 LearningRate 0.0357 Epoch: 8 Global Step: 334130 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 09:16:18,165-Speed 2633.86 samples/sec Loss 8.0316 LearningRate 0.0357 Epoch: 8 Global Step: 334140 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 09:16:22,043-Speed 2642.06 samples/sec Loss 8.0657 LearningRate 0.0357 Epoch: 8 Global Step: 334150 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:16:25,935-Speed 2631.51 samples/sec Loss 8.0733 LearningRate 0.0357 Epoch: 8 Global Step: 334160 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:16:29,815-Speed 2639.53 samples/sec Loss 8.0651 LearningRate 0.0357 Epoch: 8 Global Step: 334170 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:16:33,717-Speed 2624.94 samples/sec Loss 8.0315 LearningRate 0.0357 Epoch: 8 Global Step: 334180 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:16:37,573-Speed 2655.93 samples/sec Loss 8.4830 LearningRate 0.0357 Epoch: 8 Global Step: 334190 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:16:41,474-Speed 2626.13 samples/sec Loss 9.7285 LearningRate 0.0357 Epoch: 8 Global Step: 334200 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:16:45,368-Speed 2630.28 samples/sec Loss 8.6545 LearningRate 0.0357 Epoch: 8 Global Step: 334210 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:16:49,261-Speed 2631.06 samples/sec Loss 8.5442 LearningRate 0.0357 Epoch: 8 Global Step: 334220 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:16:53,158-Speed 2628.63 samples/sec Loss 8.2880 LearningRate 0.0357 Epoch: 8 Global Step: 334230 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:16:57,054-Speed 2628.60 samples/sec Loss 8.3946 LearningRate 0.0357 Epoch: 8 Global Step: 334240 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:17:00,954-Speed 2626.35 samples/sec Loss 8.1937 LearningRate 0.0357 Epoch: 8 Global Step: 334250 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:17:04,854-Speed 2625.80 samples/sec Loss 8.2796 LearningRate 0.0356 Epoch: 8 Global Step: 334260 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:17:08,747-Speed 2631.10 samples/sec Loss 8.1742 LearningRate 0.0356 Epoch: 8 Global Step: 334270 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:17:12,654-Speed 2621.77 samples/sec Loss 8.2107 LearningRate 0.0356 Epoch: 8 Global Step: 334280 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:17:16,548-Speed 2631.33 samples/sec Loss 8.2308 LearningRate 0.0356 Epoch: 8 Global Step: 334290 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:17:20,445-Speed 2628.15 samples/sec Loss 8.1824 LearningRate 0.0356 Epoch: 8 Global Step: 334300 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:17:24,337-Speed 2631.26 samples/sec Loss 8.1221 LearningRate 0.0356 Epoch: 8 Global Step: 334310 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:17:28,230-Speed 2631.24 samples/sec Loss 8.0363 LearningRate 0.0356 Epoch: 8 Global Step: 334320 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:17:32,124-Speed 2630.08 samples/sec Loss 8.0958 LearningRate 0.0356 Epoch: 8 Global Step: 334330 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:17:36,030-Speed 2622.35 samples/sec Loss 8.1907 LearningRate 0.0356 Epoch: 8 Global Step: 334340 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:17:39,935-Speed 2622.44 samples/sec Loss 8.0704 LearningRate 0.0356 Epoch: 8 Global Step: 334350 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:17:43,833-Speed 2627.75 samples/sec Loss 7.9724 LearningRate 0.0356 Epoch: 8 Global Step: 334360 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:17:47,726-Speed 2630.81 samples/sec Loss 8.1434 LearningRate 0.0356 Epoch: 8 Global Step: 334370 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:17:51,618-Speed 2631.89 samples/sec Loss 8.0575 LearningRate 0.0356 Epoch: 8 Global Step: 334380 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:17:55,510-Speed 2632.38 samples/sec Loss 8.0326 LearningRate 0.0356 Epoch: 8 Global Step: 334390 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:17:59,416-Speed 2622.01 samples/sec Loss 8.1342 LearningRate 0.0356 Epoch: 8 Global Step: 334400 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:18:03,339-Speed 2610.89 samples/sec Loss 7.9441 LearningRate 0.0356 Epoch: 8 Global Step: 334410 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:18:07,231-Speed 2631.67 samples/sec Loss 8.0842 LearningRate 0.0356 Epoch: 8 Global Step: 334420 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:18:11,129-Speed 2627.31 samples/sec Loss 8.1988 LearningRate 0.0356 Epoch: 8 Global Step: 334430 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:18:15,017-Speed 2634.44 samples/sec Loss 7.9733 LearningRate 0.0356 Epoch: 8 Global Step: 334440 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:18:18,909-Speed 2631.94 samples/sec Loss 8.0313 LearningRate 0.0356 Epoch: 8 Global Step: 334450 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:18:22,801-Speed 2631.69 samples/sec Loss 8.0987 LearningRate 0.0356 Epoch: 8 Global Step: 334460 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:18:26,691-Speed 2632.87 samples/sec Loss 8.0949 LearningRate 0.0356 Epoch: 8 Global Step: 334470 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:18:30,583-Speed 2631.62 samples/sec Loss 8.0157 LearningRate 0.0356 Epoch: 8 Global Step: 334480 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:18:34,472-Speed 2633.46 samples/sec Loss 8.0237 LearningRate 0.0356 Epoch: 8 Global Step: 334490 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:18:38,363-Speed 2632.62 samples/sec Loss 8.1195 LearningRate 0.0356 Epoch: 8 Global Step: 334500 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:18:42,252-Speed 2633.83 samples/sec Loss 7.9493 LearningRate 0.0356 Epoch: 8 Global Step: 334510 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:18:46,164-Speed 2618.52 samples/sec Loss 8.0683 LearningRate 0.0356 Epoch: 8 Global Step: 334520 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:18:50,063-Speed 2626.43 samples/sec Loss 8.0767 LearningRate 0.0356 Epoch: 8 Global Step: 334530 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:18:53,964-Speed 2625.98 samples/sec Loss 8.0652 LearningRate 0.0356 Epoch: 8 Global Step: 334540 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:18:57,861-Speed 2627.76 samples/sec Loss 8.0097 LearningRate 0.0356 Epoch: 8 Global Step: 334550 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:19:01,760-Speed 2627.14 samples/sec Loss 8.0974 LearningRate 0.0356 Epoch: 8 Global Step: 334560 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:19:05,658-Speed 2627.52 samples/sec Loss 7.9060 LearningRate 0.0356 Epoch: 8 Global Step: 334570 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:19:09,553-Speed 2629.33 samples/sec Loss 8.0722 LearningRate 0.0356 Epoch: 8 Global Step: 334580 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:19:13,452-Speed 2627.58 samples/sec Loss 8.1737 LearningRate 0.0356 Epoch: 8 Global Step: 334590 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:19:17,352-Speed 2626.35 samples/sec Loss 8.0482 LearningRate 0.0356 Epoch: 8 Global Step: 334600 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:19:21,258-Speed 2622.17 samples/sec Loss 8.1149 LearningRate 0.0356 Epoch: 8 Global Step: 334610 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:19:25,156-Speed 2627.30 samples/sec Loss 7.9826 LearningRate 0.0356 Epoch: 8 Global Step: 334620 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:19:29,068-Speed 2618.36 samples/sec Loss 8.1030 LearningRate 0.0356 Epoch: 8 Global Step: 334630 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:19:32,964-Speed 2628.70 samples/sec Loss 7.9665 LearningRate 0.0356 Epoch: 8 Global Step: 334640 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:19:36,871-Speed 2621.55 samples/sec Loss 8.0085 LearningRate 0.0356 Epoch: 8 Global Step: 334650 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:19:40,776-Speed 2622.55 samples/sec Loss 8.0740 LearningRate 0.0356 Epoch: 8 Global Step: 334660 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:19:44,676-Speed 2626.62 samples/sec Loss 8.0358 LearningRate 0.0356 Epoch: 8 Global Step: 334670 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:19:48,570-Speed 2629.76 samples/sec Loss 7.9615 LearningRate 0.0356 Epoch: 8 Global Step: 334680 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:19:52,465-Speed 2630.21 samples/sec Loss 8.0334 LearningRate 0.0356 Epoch: 8 Global Step: 334690 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 09:19:56,368-Speed 2623.95 samples/sec Loss 7.9116 LearningRate 0.0356 Epoch: 8 Global Step: 334700 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 09:20:00,287-Speed 2613.63 samples/sec Loss 8.1159 LearningRate 0.0356 Epoch: 8 Global Step: 334710 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 09:20:04,216-Speed 2606.90 samples/sec Loss 8.0513 LearningRate 0.0356 Epoch: 8 Global Step: 334720 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:20:08,110-Speed 2630.47 samples/sec Loss 8.0598 LearningRate 0.0356 Epoch: 8 Global Step: 334730 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:20:12,007-Speed 2627.86 samples/sec Loss 8.0012 LearningRate 0.0356 Epoch: 8 Global Step: 334740 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:20:15,906-Speed 2626.98 samples/sec Loss 8.0792 LearningRate 0.0356 Epoch: 8 Global Step: 334750 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:20:19,814-Speed 2620.78 samples/sec Loss 8.0090 LearningRate 0.0356 Epoch: 8 Global Step: 334760 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:20:23,715-Speed 2625.79 samples/sec Loss 8.1064 LearningRate 0.0356 Epoch: 8 Global Step: 334770 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:20:27,612-Speed 2627.65 samples/sec Loss 8.0194 LearningRate 0.0356 Epoch: 8 Global Step: 334780 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:20:31,514-Speed 2624.99 samples/sec Loss 8.0149 LearningRate 0.0356 Epoch: 8 Global Step: 334790 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:20:35,418-Speed 2624.13 samples/sec Loss 7.9058 LearningRate 0.0356 Epoch: 8 Global Step: 334800 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:20:39,313-Speed 2629.03 samples/sec Loss 7.9774 LearningRate 0.0356 Epoch: 8 Global Step: 334810 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:20:43,219-Speed 2622.66 samples/sec Loss 8.0421 LearningRate 0.0356 Epoch: 8 Global Step: 334820 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 09:20:47,096-Speed 2641.39 samples/sec Loss 8.1303 LearningRate 0.0356 Epoch: 8 Global Step: 334830 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:20:50,996-Speed 2626.34 samples/sec Loss 7.9717 LearningRate 0.0356 Epoch: 8 Global Step: 334840 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:20:54,894-Speed 2627.31 samples/sec Loss 7.9770 LearningRate 0.0356 Epoch: 8 Global Step: 334850 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:20:58,791-Speed 2628.70 samples/sec Loss 7.9962 LearningRate 0.0356 Epoch: 8 Global Step: 334860 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:21:02,695-Speed 2623.38 samples/sec Loss 7.8968 LearningRate 0.0356 Epoch: 8 Global Step: 334870 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:21:06,592-Speed 2628.28 samples/sec Loss 8.0249 LearningRate 0.0356 Epoch: 8 Global Step: 334880 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:21:10,488-Speed 2628.62 samples/sec Loss 7.9010 LearningRate 0.0356 Epoch: 8 Global Step: 334890 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:21:14,384-Speed 2629.11 samples/sec Loss 8.0430 LearningRate 0.0356 Epoch: 8 Global Step: 334900 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:21:18,283-Speed 2627.55 samples/sec Loss 8.2498 LearningRate 0.0356 Epoch: 8 Global Step: 334910 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:21:22,181-Speed 2627.28 samples/sec Loss 8.1058 LearningRate 0.0356 Epoch: 8 Global Step: 334920 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:21:26,074-Speed 2630.78 samples/sec Loss 8.0953 LearningRate 0.0356 Epoch: 8 Global Step: 334930 Fp16 Grad Scale: 262144 Required: 56 hours
Training: 2022-04-14 09:21:29,957-Speed 2637.88 samples/sec Loss 8.0272 LearningRate 0.0356 Epoch: 8 Global Step: 334940 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:21:33,863-Speed 2621.80 samples/sec Loss 7.9921 LearningRate 0.0356 Epoch: 8 Global Step: 334950 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:21:37,709-Speed 2662.85 samples/sec Loss 8.0177 LearningRate 0.0355 Epoch: 8 Global Step: 334960 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:21:41,582-Speed 2645.10 samples/sec Loss 8.4671 LearningRate 0.0355 Epoch: 8 Global Step: 334970 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:21:45,472-Speed 2632.65 samples/sec Loss 8.1287 LearningRate 0.0355 Epoch: 8 Global Step: 334980 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:21:49,372-Speed 2626.34 samples/sec Loss 7.9874 LearningRate 0.0355 Epoch: 8 Global Step: 334990 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:21:53,259-Speed 2635.24 samples/sec Loss 8.1363 LearningRate 0.0355 Epoch: 8 Global Step: 335000 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:21:57,147-Speed 2634.44 samples/sec Loss 7.9930 LearningRate 0.0355 Epoch: 8 Global Step: 335010 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:22:01,037-Speed 2633.00 samples/sec Loss 7.9800 LearningRate 0.0355 Epoch: 8 Global Step: 335020 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:22:04,929-Speed 2631.09 samples/sec Loss 8.0743 LearningRate 0.0355 Epoch: 8 Global Step: 335030 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:22:08,820-Speed 2632.30 samples/sec Loss 8.0762 LearningRate 0.0355 Epoch: 8 Global Step: 335040 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:22:12,707-Speed 2635.28 samples/sec Loss 7.9823 LearningRate 0.0355 Epoch: 8 Global Step: 335050 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:22:16,596-Speed 2633.53 samples/sec Loss 8.0050 LearningRate 0.0355 Epoch: 8 Global Step: 335060 Fp16 Grad Scale: 4096 Required: 56 hours
Training: 2022-04-14 09:22:20,494-Speed 2627.58 samples/sec Loss 8.0219 LearningRate 0.0355 Epoch: 8 Global Step: 335070 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:22:24,382-Speed 2634.59 samples/sec Loss 8.0500 LearningRate 0.0355 Epoch: 8 Global Step: 335080 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:22:28,275-Speed 2631.11 samples/sec Loss 8.0313 LearningRate 0.0355 Epoch: 8 Global Step: 335090 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:22:32,171-Speed 2628.91 samples/sec Loss 8.0486 LearningRate 0.0355 Epoch: 8 Global Step: 335100 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:22:36,064-Speed 2630.84 samples/sec Loss 8.0530 LearningRate 0.0355 Epoch: 8 Global Step: 335110 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:22:39,953-Speed 2633.52 samples/sec Loss 8.0839 LearningRate 0.0355 Epoch: 8 Global Step: 335120 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:22:43,933-Speed 2573.73 samples/sec Loss 7.9958 LearningRate 0.0355 Epoch: 8 Global Step: 335130 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:22:47,839-Speed 2622.01 samples/sec Loss 8.0360 LearningRate 0.0355 Epoch: 8 Global Step: 335140 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:22:51,737-Speed 2627.84 samples/sec Loss 8.0032 LearningRate 0.0355 Epoch: 8 Global Step: 335150 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:22:55,637-Speed 2625.65 samples/sec Loss 8.1026 LearningRate 0.0355 Epoch: 8 Global Step: 335160 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-14 09:22:59,528-Speed 2632.75 samples/sec Loss 7.9349 LearningRate 0.0355 Epoch: 8 Global Step: 335170 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:23:03,424-Speed 2628.89 samples/sec Loss 8.0349 LearningRate 0.0355 Epoch: 8 Global Step: 335180 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:23:07,317-Speed 2630.73 samples/sec Loss 7.9283 LearningRate 0.0355 Epoch: 8 Global Step: 335190 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:23:11,209-Speed 2632.23 samples/sec Loss 7.9055 LearningRate 0.0355 Epoch: 8 Global Step: 335200 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:23:15,118-Speed 2620.06 samples/sec Loss 7.9836 LearningRate 0.0355 Epoch: 8 Global Step: 335210 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:23:19,018-Speed 2625.96 samples/sec Loss 8.0254 LearningRate 0.0355 Epoch: 8 Global Step: 335220 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:23:22,907-Speed 2633.50 samples/sec Loss 8.0392 LearningRate 0.0355 Epoch: 8 Global Step: 335230 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:23:26,801-Speed 2630.73 samples/sec Loss 8.0136 LearningRate 0.0355 Epoch: 8 Global Step: 335240 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:23:30,701-Speed 2625.73 samples/sec Loss 7.9715 LearningRate 0.0355 Epoch: 8 Global Step: 335250 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:23:34,606-Speed 2622.93 samples/sec Loss 8.0281 LearningRate 0.0355 Epoch: 8 Global Step: 335260 Fp16 Grad Scale: 16384 Required: 56 hours
Training: 2022-04-14 09:23:38,505-Speed 2626.79 samples/sec Loss 8.0632 LearningRate 0.0355 Epoch: 8 Global Step: 335270 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:23:42,395-Speed 2632.94 samples/sec Loss 7.9342 LearningRate 0.0355 Epoch: 8 Global Step: 335280 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:23:46,287-Speed 2631.80 samples/sec Loss 8.0293 LearningRate 0.0355 Epoch: 8 Global Step: 335290 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:23:50,180-Speed 2631.79 samples/sec Loss 7.9624 LearningRate 0.0355 Epoch: 8 Global Step: 335300 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:23:54,075-Speed 2628.95 samples/sec Loss 7.8603 LearningRate 0.0355 Epoch: 8 Global Step: 335310 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:23:57,969-Speed 2630.72 samples/sec Loss 8.1627 LearningRate 0.0355 Epoch: 8 Global Step: 335320 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:24:01,863-Speed 2630.37 samples/sec Loss 8.1809 LearningRate 0.0355 Epoch: 8 Global Step: 335330 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:24:05,757-Speed 2629.76 samples/sec Loss 7.9393 LearningRate 0.0355 Epoch: 8 Global Step: 335340 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:24:09,651-Speed 2630.07 samples/sec Loss 7.9522 LearningRate 0.0355 Epoch: 8 Global Step: 335350 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:24:13,547-Speed 2629.53 samples/sec Loss 8.0189 LearningRate 0.0355 Epoch: 8 Global Step: 335360 Fp16 Grad Scale: 32768 Required: 56 hours
Training: 2022-04-14 09:24:17,447-Speed 2626.14 samples/sec Loss 8.1489 LearningRate 0.0355 Epoch: 8 Global Step: 335370 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:24:21,355-Speed 2620.33 samples/sec Loss 8.1272 LearningRate 0.0355 Epoch: 8 Global Step: 335380 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:24:25,265-Speed 2620.06 samples/sec Loss 8.0075 LearningRate 0.0355 Epoch: 8 Global Step: 335390 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:24:29,157-Speed 2631.33 samples/sec Loss 8.1175 LearningRate 0.0355 Epoch: 8 Global Step: 335400 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:24:33,060-Speed 2625.11 samples/sec Loss 8.1167 LearningRate 0.0355 Epoch: 8 Global Step: 335410 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:24:36,963-Speed 2623.99 samples/sec Loss 8.0120 LearningRate 0.0355 Epoch: 8 Global Step: 335420 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:24:40,865-Speed 2624.56 samples/sec Loss 7.9457 LearningRate 0.0355 Epoch: 8 Global Step: 335430 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:24:44,764-Speed 2627.14 samples/sec Loss 7.9917 LearningRate 0.0355 Epoch: 8 Global Step: 335440 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:24:48,663-Speed 2626.74 samples/sec Loss 8.0372 LearningRate 0.0355 Epoch: 8 Global Step: 335450 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:24:52,572-Speed 2620.45 samples/sec Loss 8.0429 LearningRate 0.0355 Epoch: 8 Global Step: 335460 Fp16 Grad Scale: 65536 Required: 56 hours
Training: 2022-04-14 09:24:56,479-Speed 2621.41 samples/sec Loss 7.9779 LearningRate 0.0355 Epoch: 8 Global Step: 335470 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:25:00,383-Speed 2623.95 samples/sec Loss 8.1447 LearningRate 0.0355 Epoch: 8 Global Step: 335480 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:25:04,278-Speed 2629.65 samples/sec Loss 8.0687 LearningRate 0.0355 Epoch: 8 Global Step: 335490 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:25:08,172-Speed 2629.93 samples/sec Loss 7.8880 LearningRate 0.0355 Epoch: 8 Global Step: 335500 Fp16 Grad Scale: 131072 Required: 56 hours
Training: 2022-04-14 09:25:12,065-Speed 2631.16 samples/sec Loss 7.8851 LearningRate 0.0355 Epoch: 8 Global Step: 335510 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:25:15,955-Speed 2633.14 samples/sec Loss 8.1351 LearningRate 0.0355 Epoch: 8 Global Step: 335520 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:25:19,851-Speed 2628.53 samples/sec Loss 8.0394 LearningRate 0.0355 Epoch: 8 Global Step: 335530 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:25:23,745-Speed 2630.55 samples/sec Loss 8.0980 LearningRate 0.0355 Epoch: 8 Global Step: 335540 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:25:27,635-Speed 2632.90 samples/sec Loss 8.1482 LearningRate 0.0355 Epoch: 8 Global Step: 335550 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:25:31,531-Speed 2628.30 samples/sec Loss 8.0788 LearningRate 0.0355 Epoch: 8 Global Step: 335560 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:25:35,422-Speed 2632.43 samples/sec Loss 7.9495 LearningRate 0.0355 Epoch: 8 Global Step: 335570 Fp16 Grad Scale: 262144 Required: 55 hours
Training: 2022-04-14 09:25:39,315-Speed 2631.63 samples/sec Loss 7.9988 LearningRate 0.0355 Epoch: 8 Global Step: 335580 Fp16 Grad Scale: 262144 Required: 55 hours
Training: 2022-04-14 09:25:43,194-Speed 2640.25 samples/sec Loss 8.0488 LearningRate 0.0355 Epoch: 8 Global Step: 335590 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:25:47,088-Speed 2630.61 samples/sec Loss 7.8176 LearningRate 0.0355 Epoch: 8 Global Step: 335600 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:25:50,981-Speed 2630.89 samples/sec Loss 8.0471 LearningRate 0.0355 Epoch: 8 Global Step: 335610 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:25:54,874-Speed 2630.71 samples/sec Loss 7.8607 LearningRate 0.0355 Epoch: 8 Global Step: 335620 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:25:58,770-Speed 2628.71 samples/sec Loss 8.0093 LearningRate 0.0355 Epoch: 8 Global Step: 335630 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:26:02,664-Speed 2629.93 samples/sec Loss 8.0792 LearningRate 0.0355 Epoch: 8 Global Step: 335640 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:26:06,559-Speed 2629.76 samples/sec Loss 7.9881 LearningRate 0.0354 Epoch: 8 Global Step: 335650 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:26:10,453-Speed 2630.23 samples/sec Loss 7.9375 LearningRate 0.0354 Epoch: 8 Global Step: 335660 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:26:14,361-Speed 2621.16 samples/sec Loss 7.9280 LearningRate 0.0354 Epoch: 8 Global Step: 335670 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:26:18,272-Speed 2619.14 samples/sec Loss 8.0308 LearningRate 0.0354 Epoch: 8 Global Step: 335680 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:26:22,181-Speed 2620.04 samples/sec Loss 8.1375 LearningRate 0.0354 Epoch: 8 Global Step: 335690 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:26:26,085-Speed 2624.06 samples/sec Loss 8.0259 LearningRate 0.0354 Epoch: 8 Global Step: 335700 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:26:29,992-Speed 2621.36 samples/sec Loss 8.0576 LearningRate 0.0354 Epoch: 8 Global Step: 335710 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:26:33,885-Speed 2630.75 samples/sec Loss 8.0322 LearningRate 0.0354 Epoch: 8 Global Step: 335720 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:26:37,793-Speed 2620.82 samples/sec Loss 7.9785 LearningRate 0.0354 Epoch: 8 Global Step: 335730 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:26:41,683-Speed 2632.70 samples/sec Loss 8.0153 LearningRate 0.0354 Epoch: 8 Global Step: 335740 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:26:45,578-Speed 2629.87 samples/sec Loss 8.0276 LearningRate 0.0354 Epoch: 8 Global Step: 335750 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:26:49,482-Speed 2623.17 samples/sec Loss 7.9987 LearningRate 0.0354 Epoch: 8 Global Step: 335760 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:26:53,376-Speed 2630.42 samples/sec Loss 8.0419 LearningRate 0.0354 Epoch: 8 Global Step: 335770 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:26:57,272-Speed 2629.24 samples/sec Loss 7.9567 LearningRate 0.0354 Epoch: 8 Global Step: 335780 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:27:01,189-Speed 2614.68 samples/sec Loss 8.0865 LearningRate 0.0354 Epoch: 8 Global Step: 335790 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:27:05,084-Speed 2629.60 samples/sec Loss 8.0006 LearningRate 0.0354 Epoch: 8 Global Step: 335800 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:27:08,975-Speed 2632.41 samples/sec Loss 8.0160 LearningRate 0.0354 Epoch: 8 Global Step: 335810 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:27:12,881-Speed 2622.25 samples/sec Loss 8.0751 LearningRate 0.0354 Epoch: 8 Global Step: 335820 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:27:16,778-Speed 2628.19 samples/sec Loss 8.0595 LearningRate 0.0354 Epoch: 8 Global Step: 335830 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:27:20,671-Speed 2631.09 samples/sec Loss 7.9819 LearningRate 0.0354 Epoch: 8 Global Step: 335840 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:27:24,564-Speed 2630.61 samples/sec Loss 8.0534 LearningRate 0.0354 Epoch: 8 Global Step: 335850 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:27:28,440-Speed 2642.54 samples/sec Loss 7.9034 LearningRate 0.0354 Epoch: 8 Global Step: 335860 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:27:32,335-Speed 2629.79 samples/sec Loss 7.8690 LearningRate 0.0354 Epoch: 8 Global Step: 335870 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:27:36,229-Speed 2630.29 samples/sec Loss 8.1321 LearningRate 0.0354 Epoch: 8 Global Step: 335880 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:27:40,120-Speed 2632.57 samples/sec Loss 8.0090 LearningRate 0.0354 Epoch: 8 Global Step: 335890 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:27:44,019-Speed 2627.25 samples/sec Loss 7.9536 LearningRate 0.0354 Epoch: 8 Global Step: 335900 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:27:47,914-Speed 2629.29 samples/sec Loss 8.0416 LearningRate 0.0354 Epoch: 8 Global Step: 335910 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:27:51,818-Speed 2623.45 samples/sec Loss 8.0155 LearningRate 0.0354 Epoch: 8 Global Step: 335920 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:27:55,712-Speed 2630.57 samples/sec Loss 7.9525 LearningRate 0.0354 Epoch: 8 Global Step: 335930 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:27:59,605-Speed 2631.03 samples/sec Loss 7.9480 LearningRate 0.0354 Epoch: 8 Global Step: 335940 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:28:03,504-Speed 2626.54 samples/sec Loss 7.9338 LearningRate 0.0354 Epoch: 8 Global Step: 335950 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:28:07,387-Speed 2637.53 samples/sec Loss 7.9257 LearningRate 0.0354 Epoch: 8 Global Step: 335960 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:28:11,277-Speed 2634.08 samples/sec Loss 7.9190 LearningRate 0.0354 Epoch: 8 Global Step: 335970 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:28:15,178-Speed 2625.72 samples/sec Loss 8.0475 LearningRate 0.0354 Epoch: 8 Global Step: 335980 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:28:19,072-Speed 2630.10 samples/sec Loss 7.9882 LearningRate 0.0354 Epoch: 8 Global Step: 335990 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:28:22,967-Speed 2629.60 samples/sec Loss 8.0164 LearningRate 0.0354 Epoch: 8 Global Step: 336000 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:28:26,863-Speed 2629.28 samples/sec Loss 7.9467 LearningRate 0.0354 Epoch: 8 Global Step: 336010 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:28:30,756-Speed 2630.50 samples/sec Loss 7.9321 LearningRate 0.0354 Epoch: 8 Global Step: 336020 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:28:34,651-Speed 2629.67 samples/sec Loss 7.9072 LearningRate 0.0354 Epoch: 8 Global Step: 336030 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:28:38,545-Speed 2630.35 samples/sec Loss 8.0132 LearningRate 0.0354 Epoch: 8 Global Step: 336040 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:28:42,500-Speed 2589.52 samples/sec Loss 7.9319 LearningRate 0.0354 Epoch: 8 Global Step: 336050 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:28:46,600-Speed 2498.19 samples/sec Loss 7.8498 LearningRate 0.0354 Epoch: 8 Global Step: 336060 Fp16 Grad Scale: 262144 Required: 55 hours
Training: 2022-04-14 09:28:50,670-Speed 2516.38 samples/sec Loss 8.0606 LearningRate 0.0354 Epoch: 8 Global Step: 336070 Fp16 Grad Scale: 262144 Required: 55 hours
Training: 2022-04-14 09:28:54,554-Speed 2637.18 samples/sec Loss 8.1456 LearningRate 0.0354 Epoch: 8 Global Step: 336080 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:28:58,449-Speed 2629.91 samples/sec Loss 8.1207 LearningRate 0.0354 Epoch: 8 Global Step: 336090 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:29:02,350-Speed 2625.34 samples/sec Loss 8.0580 LearningRate 0.0354 Epoch: 8 Global Step: 336100 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:29:06,245-Speed 2629.45 samples/sec Loss 7.9348 LearningRate 0.0354 Epoch: 8 Global Step: 336110 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:29:10,154-Speed 2620.13 samples/sec Loss 7.9213 LearningRate 0.0354 Epoch: 8 Global Step: 336120 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:29:14,047-Speed 2631.45 samples/sec Loss 7.9146 LearningRate 0.0354 Epoch: 8 Global Step: 336130 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:29:17,950-Speed 2623.86 samples/sec Loss 8.0964 LearningRate 0.0354 Epoch: 8 Global Step: 336140 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:29:21,849-Speed 2626.68 samples/sec Loss 8.0148 LearningRate 0.0354 Epoch: 8 Global Step: 336150 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:29:25,751-Speed 2625.06 samples/sec Loss 8.0580 LearningRate 0.0354 Epoch: 8 Global Step: 336160 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:29:29,650-Speed 2627.11 samples/sec Loss 8.0233 LearningRate 0.0354 Epoch: 8 Global Step: 336170 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:29:33,552-Speed 2625.32 samples/sec Loss 7.9100 LearningRate 0.0354 Epoch: 8 Global Step: 336180 Fp16 Grad Scale: 262144 Required: 55 hours
Training: 2022-04-14 09:29:37,427-Speed 2642.90 samples/sec Loss 8.0841 LearningRate 0.0354 Epoch: 8 Global Step: 336190 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:29:41,321-Speed 2630.47 samples/sec Loss 7.8968 LearningRate 0.0354 Epoch: 8 Global Step: 336200 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:29:45,216-Speed 2629.20 samples/sec Loss 7.9629 LearningRate 0.0354 Epoch: 8 Global Step: 336210 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:29:49,109-Speed 2630.84 samples/sec Loss 7.9466 LearningRate 0.0354 Epoch: 8 Global Step: 336220 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:29:53,001-Speed 2632.02 samples/sec Loss 7.9270 LearningRate 0.0354 Epoch: 8 Global Step: 336230 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:29:56,936-Speed 2602.80 samples/sec Loss 7.9442 LearningRate 0.0354 Epoch: 8 Global Step: 336240 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:30:00,850-Speed 2616.30 samples/sec Loss 7.8910 LearningRate 0.0354 Epoch: 8 Global Step: 336250 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:30:04,762-Speed 2618.24 samples/sec Loss 8.1024 LearningRate 0.0354 Epoch: 8 Global Step: 336260 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:30:08,652-Speed 2632.90 samples/sec Loss 8.0423 LearningRate 0.0354 Epoch: 8 Global Step: 336270 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:30:12,545-Speed 2631.14 samples/sec Loss 8.0058 LearningRate 0.0354 Epoch: 8 Global Step: 336280 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:30:16,422-Speed 2642.34 samples/sec Loss 7.9741 LearningRate 0.0354 Epoch: 8 Global Step: 336290 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:30:20,317-Speed 2629.38 samples/sec Loss 7.8021 LearningRate 0.0354 Epoch: 8 Global Step: 336300 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:30:24,197-Speed 2639.46 samples/sec Loss 7.9267 LearningRate 0.0354 Epoch: 8 Global Step: 336310 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:30:28,090-Speed 2631.20 samples/sec Loss 7.9353 LearningRate 0.0354 Epoch: 8 Global Step: 336320 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:30:31,984-Speed 2630.57 samples/sec Loss 7.9928 LearningRate 0.0354 Epoch: 8 Global Step: 336330 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:30:35,875-Speed 2631.83 samples/sec Loss 7.9033 LearningRate 0.0354 Epoch: 8 Global Step: 336340 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:30:39,768-Speed 2631.28 samples/sec Loss 7.9172 LearningRate 0.0353 Epoch: 8 Global Step: 336350 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:30:43,664-Speed 2628.62 samples/sec Loss 7.9095 LearningRate 0.0353 Epoch: 8 Global Step: 336360 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:30:47,563-Speed 2626.95 samples/sec Loss 7.9392 LearningRate 0.0353 Epoch: 8 Global Step: 336370 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:30:51,460-Speed 2628.81 samples/sec Loss 8.0450 LearningRate 0.0353 Epoch: 8 Global Step: 336380 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:30:55,353-Speed 2631.20 samples/sec Loss 7.9724 LearningRate 0.0353 Epoch: 8 Global Step: 336390 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:30:59,245-Speed 2631.17 samples/sec Loss 7.9859 LearningRate 0.0353 Epoch: 8 Global Step: 336400 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:31:03,155-Speed 2619.53 samples/sec Loss 8.1257 LearningRate 0.0353 Epoch: 8 Global Step: 336410 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:31:07,060-Speed 2623.18 samples/sec Loss 7.8633 LearningRate 0.0353 Epoch: 8 Global Step: 336420 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:31:10,975-Speed 2615.74 samples/sec Loss 7.8669 LearningRate 0.0353 Epoch: 8 Global Step: 336430 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:31:14,897-Speed 2611.83 samples/sec Loss 8.1444 LearningRate 0.0353 Epoch: 8 Global Step: 336440 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:31:18,815-Speed 2613.71 samples/sec Loss 7.9417 LearningRate 0.0353 Epoch: 8 Global Step: 336450 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:31:22,729-Speed 2617.20 samples/sec Loss 8.0654 LearningRate 0.0353 Epoch: 8 Global Step: 336460 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:31:26,629-Speed 2625.71 samples/sec Loss 8.0625 LearningRate 0.0353 Epoch: 8 Global Step: 336470 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:31:30,520-Speed 2632.55 samples/sec Loss 8.0186 LearningRate 0.0353 Epoch: 8 Global Step: 336480 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:31:34,425-Speed 2623.11 samples/sec Loss 7.9785 LearningRate 0.0353 Epoch: 8 Global Step: 336490 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:31:38,260-Speed 2670.78 samples/sec Loss 8.2996 LearningRate 0.0353 Epoch: 8 Global Step: 336500 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:31:42,115-Speed 2656.61 samples/sec Loss 9.6476 LearningRate 0.0353 Epoch: 8 Global Step: 336510 Fp16 Grad Scale: 1024 Required: 55 hours
Training: 2022-04-14 09:31:46,002-Speed 2635.04 samples/sec Loss 9.2868 LearningRate 0.0353 Epoch: 8 Global Step: 336520 Fp16 Grad Scale: 1024 Required: 55 hours
Training: 2022-04-14 09:31:49,892-Speed 2633.04 samples/sec Loss 8.5001 LearningRate 0.0353 Epoch: 8 Global Step: 336530 Fp16 Grad Scale: 1024 Required: 55 hours
Training: 2022-04-14 09:31:53,782-Speed 2633.19 samples/sec Loss 8.3730 LearningRate 0.0353 Epoch: 8 Global Step: 336540 Fp16 Grad Scale: 1024 Required: 55 hours
Training: 2022-04-14 09:31:57,670-Speed 2634.40 samples/sec Loss 8.1863 LearningRate 0.0353 Epoch: 8 Global Step: 336550 Fp16 Grad Scale: 1024 Required: 55 hours
Training: 2022-04-14 09:32:01,559-Speed 2633.16 samples/sec Loss 8.1522 LearningRate 0.0353 Epoch: 8 Global Step: 336560 Fp16 Grad Scale: 1024 Required: 55 hours
Training: 2022-04-14 09:32:05,450-Speed 2632.53 samples/sec Loss 8.1246 LearningRate 0.0353 Epoch: 8 Global Step: 336570 Fp16 Grad Scale: 1024 Required: 55 hours
Training: 2022-04-14 09:32:09,337-Speed 2635.19 samples/sec Loss 8.1362 LearningRate 0.0353 Epoch: 8 Global Step: 336580 Fp16 Grad Scale: 1024 Required: 55 hours
Training: 2022-04-14 09:32:13,237-Speed 2626.68 samples/sec Loss 8.1412 LearningRate 0.0353 Epoch: 8 Global Step: 336590 Fp16 Grad Scale: 1024 Required: 55 hours
Training: 2022-04-14 09:32:17,132-Speed 2629.53 samples/sec Loss 7.9520 LearningRate 0.0353 Epoch: 8 Global Step: 336600 Fp16 Grad Scale: 1024 Required: 55 hours
Training: 2022-04-14 09:32:21,031-Speed 2626.73 samples/sec Loss 8.3721 LearningRate 0.0353 Epoch: 8 Global Step: 336610 Fp16 Grad Scale: 2048 Required: 55 hours
Training: 2022-04-14 09:32:24,924-Speed 2631.21 samples/sec Loss 8.1390 LearningRate 0.0353 Epoch: 8 Global Step: 336620 Fp16 Grad Scale: 2048 Required: 55 hours
Training: 2022-04-14 09:32:28,816-Speed 2631.03 samples/sec Loss 8.0495 LearningRate 0.0353 Epoch: 8 Global Step: 336630 Fp16 Grad Scale: 2048 Required: 55 hours
Training: 2022-04-14 09:32:32,705-Speed 2633.89 samples/sec Loss 7.9751 LearningRate 0.0353 Epoch: 8 Global Step: 336640 Fp16 Grad Scale: 2048 Required: 55 hours
Training: 2022-04-14 09:32:36,600-Speed 2629.24 samples/sec Loss 7.9379 LearningRate 0.0353 Epoch: 8 Global Step: 336650 Fp16 Grad Scale: 2048 Required: 55 hours
Training: 2022-04-14 09:32:40,491-Speed 2632.54 samples/sec Loss 7.9204 LearningRate 0.0353 Epoch: 8 Global Step: 336660 Fp16 Grad Scale: 2048 Required: 55 hours
Training: 2022-04-14 09:32:44,382-Speed 2632.52 samples/sec Loss 8.0854 LearningRate 0.0353 Epoch: 8 Global Step: 336670 Fp16 Grad Scale: 2048 Required: 55 hours
Training: 2022-04-14 09:32:48,275-Speed 2630.34 samples/sec Loss 8.0291 LearningRate 0.0353 Epoch: 8 Global Step: 336680 Fp16 Grad Scale: 2048 Required: 55 hours
Training: 2022-04-14 09:32:52,181-Speed 2623.10 samples/sec Loss 8.0704 LearningRate 0.0353 Epoch: 8 Global Step: 336690 Fp16 Grad Scale: 2048 Required: 55 hours
Training: 2022-04-14 09:32:56,071-Speed 2632.48 samples/sec Loss 8.1585 LearningRate 0.0353 Epoch: 8 Global Step: 336700 Fp16 Grad Scale: 2048 Required: 55 hours
Training: 2022-04-14 09:32:59,971-Speed 2626.41 samples/sec Loss 8.0747 LearningRate 0.0353 Epoch: 8 Global Step: 336710 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:33:03,874-Speed 2623.92 samples/sec Loss 8.0608 LearningRate 0.0353 Epoch: 8 Global Step: 336720 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:33:07,771-Speed 2628.54 samples/sec Loss 7.9898 LearningRate 0.0353 Epoch: 8 Global Step: 336730 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:33:11,675-Speed 2623.12 samples/sec Loss 8.0042 LearningRate 0.0353 Epoch: 8 Global Step: 336740 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:33:15,573-Speed 2627.82 samples/sec Loss 7.8826 LearningRate 0.0353 Epoch: 8 Global Step: 336750 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:33:19,478-Speed 2622.59 samples/sec Loss 7.9699 LearningRate 0.0353 Epoch: 8 Global Step: 336760 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:33:23,372-Speed 2630.35 samples/sec Loss 7.8981 LearningRate 0.0353 Epoch: 8 Global Step: 336770 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:33:27,259-Speed 2635.08 samples/sec Loss 8.0476 LearningRate 0.0353 Epoch: 8 Global Step: 336780 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:33:31,152-Speed 2631.26 samples/sec Loss 7.9586 LearningRate 0.0353 Epoch: 8 Global Step: 336790 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:33:35,041-Speed 2633.88 samples/sec Loss 7.9436 LearningRate 0.0353 Epoch: 8 Global Step: 336800 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:33:38,929-Speed 2634.03 samples/sec Loss 8.0179 LearningRate 0.0353 Epoch: 8 Global Step: 336810 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:33:42,819-Speed 2633.36 samples/sec Loss 8.0359 LearningRate 0.0353 Epoch: 8 Global Step: 336820 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:33:46,708-Speed 2633.04 samples/sec Loss 7.9777 LearningRate 0.0353 Epoch: 8 Global Step: 336830 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:33:50,600-Speed 2631.94 samples/sec Loss 7.8492 LearningRate 0.0353 Epoch: 8 Global Step: 336840 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:33:54,491-Speed 2632.16 samples/sec Loss 7.9421 LearningRate 0.0353 Epoch: 8 Global Step: 336850 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:33:58,392-Speed 2625.69 samples/sec Loss 7.9635 LearningRate 0.0353 Epoch: 8 Global Step: 336860 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:34:02,286-Speed 2629.83 samples/sec Loss 8.1616 LearningRate 0.0353 Epoch: 8 Global Step: 336870 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:34:06,181-Speed 2629.58 samples/sec Loss 8.1640 LearningRate 0.0353 Epoch: 8 Global Step: 336880 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:34:10,071-Speed 2633.55 samples/sec Loss 8.0713 LearningRate 0.0353 Epoch: 8 Global Step: 336890 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:34:13,965-Speed 2630.03 samples/sec Loss 8.1333 LearningRate 0.0353 Epoch: 8 Global Step: 336900 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:34:17,860-Speed 2629.84 samples/sec Loss 7.9144 LearningRate 0.0353 Epoch: 8 Global Step: 336910 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:34:21,760-Speed 2626.11 samples/sec Loss 7.9661 LearningRate 0.0353 Epoch: 8 Global Step: 336920 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:34:25,669-Speed 2619.95 samples/sec Loss 7.9862 LearningRate 0.0353 Epoch: 8 Global Step: 336930 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:34:29,574-Speed 2623.03 samples/sec Loss 7.9816 LearningRate 0.0353 Epoch: 8 Global Step: 336940 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:34:33,481-Speed 2621.54 samples/sec Loss 7.9184 LearningRate 0.0353 Epoch: 8 Global Step: 336950 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:34:37,376-Speed 2629.56 samples/sec Loss 8.0473 LearningRate 0.0353 Epoch: 8 Global Step: 336960 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:34:41,272-Speed 2628.86 samples/sec Loss 8.0612 LearningRate 0.0353 Epoch: 8 Global Step: 336970 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:34:45,175-Speed 2624.51 samples/sec Loss 7.9732 LearningRate 0.0353 Epoch: 8 Global Step: 336980 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:34:49,247-Speed 2515.49 samples/sec Loss 7.9811 LearningRate 0.0353 Epoch: 8 Global Step: 336990 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:34:53,171-Speed 2610.63 samples/sec Loss 7.9859 LearningRate 0.0353 Epoch: 8 Global Step: 337000 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:34:57,089-Speed 2614.69 samples/sec Loss 7.9548 LearningRate 0.0353 Epoch: 8 Global Step: 337010 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:35:00,994-Speed 2622.57 samples/sec Loss 7.9606 LearningRate 0.0353 Epoch: 8 Global Step: 337020 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:35:04,894-Speed 2626.11 samples/sec Loss 7.9180 LearningRate 0.0353 Epoch: 8 Global Step: 337030 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:35:08,804-Speed 2619.84 samples/sec Loss 8.0358 LearningRate 0.0353 Epoch: 8 Global Step: 337040 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:35:12,707-Speed 2624.56 samples/sec Loss 7.9054 LearningRate 0.0352 Epoch: 8 Global Step: 337050 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:35:16,670-Speed 2584.27 samples/sec Loss 8.0399 LearningRate 0.0352 Epoch: 8 Global Step: 337060 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:35:20,576-Speed 2622.38 samples/sec Loss 7.9303 LearningRate 0.0352 Epoch: 8 Global Step: 337070 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:35:24,472-Speed 2628.45 samples/sec Loss 7.9484 LearningRate 0.0352 Epoch: 8 Global Step: 337080 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:35:28,369-Speed 2628.61 samples/sec Loss 8.0752 LearningRate 0.0352 Epoch: 8 Global Step: 337090 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:35:32,269-Speed 2626.32 samples/sec Loss 8.0516 LearningRate 0.0352 Epoch: 8 Global Step: 337100 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:35:36,206-Speed 2601.82 samples/sec Loss 8.0061 LearningRate 0.0352 Epoch: 8 Global Step: 337110 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:35:40,129-Speed 2610.77 samples/sec Loss 7.9998 LearningRate 0.0352 Epoch: 8 Global Step: 337120 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:35:44,028-Speed 2627.06 samples/sec Loss 8.0091 LearningRate 0.0352 Epoch: 8 Global Step: 337130 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:35:47,931-Speed 2624.47 samples/sec Loss 7.9982 LearningRate 0.0352 Epoch: 8 Global Step: 337140 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:35:51,824-Speed 2630.87 samples/sec Loss 8.0101 LearningRate 0.0352 Epoch: 8 Global Step: 337150 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:35:55,727-Speed 2624.55 samples/sec Loss 8.0102 LearningRate 0.0352 Epoch: 8 Global Step: 337160 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:35:59,619-Speed 2631.80 samples/sec Loss 7.9769 LearningRate 0.0352 Epoch: 8 Global Step: 337170 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:36:03,516-Speed 2628.62 samples/sec Loss 7.9233 LearningRate 0.0352 Epoch: 8 Global Step: 337180 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:36:07,407-Speed 2632.43 samples/sec Loss 7.9450 LearningRate 0.0352 Epoch: 8 Global Step: 337190 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:36:11,303-Speed 2628.55 samples/sec Loss 7.9599 LearningRate 0.0352 Epoch: 8 Global Step: 337200 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:36:15,197-Speed 2630.54 samples/sec Loss 7.9538 LearningRate 0.0352 Epoch: 8 Global Step: 337210 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:36:19,094-Speed 2628.41 samples/sec Loss 7.9975 LearningRate 0.0352 Epoch: 8 Global Step: 337220 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:36:22,988-Speed 2630.41 samples/sec Loss 7.9725 LearningRate 0.0352 Epoch: 8 Global Step: 337230 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:36:26,882-Speed 2630.23 samples/sec Loss 8.0153 LearningRate 0.0352 Epoch: 8 Global Step: 337240 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:36:30,775-Speed 2630.68 samples/sec Loss 7.9725 LearningRate 0.0352 Epoch: 8 Global Step: 337250 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:36:34,668-Speed 2631.81 samples/sec Loss 7.9564 LearningRate 0.0352 Epoch: 8 Global Step: 337260 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:36:38,557-Speed 2633.57 samples/sec Loss 7.8330 LearningRate 0.0352 Epoch: 8 Global Step: 337270 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:36:42,521-Speed 2583.49 samples/sec Loss 8.0324 LearningRate 0.0352 Epoch: 8 Global Step: 337280 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:36:46,414-Speed 2631.13 samples/sec Loss 7.9748 LearningRate 0.0352 Epoch: 8 Global Step: 337290 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:36:50,319-Speed 2622.51 samples/sec Loss 7.9492 LearningRate 0.0352 Epoch: 8 Global Step: 337300 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:36:54,214-Speed 2630.02 samples/sec Loss 7.9788 LearningRate 0.0352 Epoch: 8 Global Step: 337310 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:36:58,104-Speed 2633.01 samples/sec Loss 7.9299 LearningRate 0.0352 Epoch: 8 Global Step: 337320 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:37:02,071-Speed 2581.54 samples/sec Loss 8.0857 LearningRate 0.0352 Epoch: 8 Global Step: 337330 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:37:05,964-Speed 2631.18 samples/sec Loss 7.9149 LearningRate 0.0352 Epoch: 8 Global Step: 337340 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:37:09,878-Speed 2617.52 samples/sec Loss 7.8947 LearningRate 0.0352 Epoch: 8 Global Step: 337350 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:37:13,777-Speed 2626.84 samples/sec Loss 7.9759 LearningRate 0.0352 Epoch: 8 Global Step: 337360 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:37:17,669-Speed 2631.60 samples/sec Loss 7.8689 LearningRate 0.0352 Epoch: 8 Global Step: 337370 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:37:21,563-Speed 2630.34 samples/sec Loss 7.9855 LearningRate 0.0352 Epoch: 8 Global Step: 337380 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:37:25,458-Speed 2629.81 samples/sec Loss 7.9851 LearningRate 0.0352 Epoch: 8 Global Step: 337390 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:37:29,338-Speed 2639.86 samples/sec Loss 7.9962 LearningRate 0.0352 Epoch: 8 Global Step: 337400 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:37:33,217-Speed 2640.87 samples/sec Loss 7.8919 LearningRate 0.0352 Epoch: 8 Global Step: 337410 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:37:37,120-Speed 2623.84 samples/sec Loss 8.0419 LearningRate 0.0352 Epoch: 8 Global Step: 337420 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:37:41,031-Speed 2619.18 samples/sec Loss 8.0328 LearningRate 0.0352 Epoch: 8 Global Step: 337430 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:37:44,925-Speed 2630.47 samples/sec Loss 8.0438 LearningRate 0.0352 Epoch: 8 Global Step: 337440 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:37:48,822-Speed 2628.54 samples/sec Loss 8.0731 LearningRate 0.0352 Epoch: 8 Global Step: 337450 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:37:52,717-Speed 2629.78 samples/sec Loss 8.0769 LearningRate 0.0352 Epoch: 8 Global Step: 337460 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:37:56,617-Speed 2626.03 samples/sec Loss 8.0494 LearningRate 0.0352 Epoch: 8 Global Step: 337470 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:38:00,510-Speed 2630.96 samples/sec Loss 8.0452 LearningRate 0.0352 Epoch: 8 Global Step: 337480 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:38:04,418-Speed 2621.01 samples/sec Loss 8.0485 LearningRate 0.0352 Epoch: 8 Global Step: 337490 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:38:08,324-Speed 2622.42 samples/sec Loss 8.1787 LearningRate 0.0352 Epoch: 8 Global Step: 337500 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:38:12,220-Speed 2629.33 samples/sec Loss 7.9836 LearningRate 0.0352 Epoch: 8 Global Step: 337510 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:38:16,124-Speed 2623.57 samples/sec Loss 8.0842 LearningRate 0.0352 Epoch: 8 Global Step: 337520 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:38:20,014-Speed 2635.09 samples/sec Loss 7.9624 LearningRate 0.0352 Epoch: 8 Global Step: 337530 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:38:23,905-Speed 2632.08 samples/sec Loss 7.9183 LearningRate 0.0352 Epoch: 8 Global Step: 337540 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:38:27,811-Speed 2622.36 samples/sec Loss 8.0051 LearningRate 0.0352 Epoch: 8 Global Step: 337550 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:38:31,710-Speed 2627.09 samples/sec Loss 8.0248 LearningRate 0.0352 Epoch: 8 Global Step: 337560 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:38:35,609-Speed 2626.94 samples/sec Loss 8.0141 LearningRate 0.0352 Epoch: 8 Global Step: 337570 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:38:39,500-Speed 2632.43 samples/sec Loss 7.8904 LearningRate 0.0352 Epoch: 8 Global Step: 337580 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:38:43,375-Speed 2643.38 samples/sec Loss 8.0529 LearningRate 0.0352 Epoch: 8 Global Step: 337590 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:38:47,277-Speed 2624.38 samples/sec Loss 7.8664 LearningRate 0.0352 Epoch: 8 Global Step: 337600 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:38:51,178-Speed 2625.88 samples/sec Loss 8.0271 LearningRate 0.0352 Epoch: 8 Global Step: 337610 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:38:55,072-Speed 2630.43 samples/sec Loss 7.9790 LearningRate 0.0352 Epoch: 8 Global Step: 337620 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:38:58,966-Speed 2630.63 samples/sec Loss 7.9603 LearningRate 0.0352 Epoch: 8 Global Step: 337630 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:39:02,870-Speed 2623.99 samples/sec Loss 7.8262 LearningRate 0.0352 Epoch: 8 Global Step: 337640 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:39:06,766-Speed 2628.67 samples/sec Loss 7.8898 LearningRate 0.0352 Epoch: 8 Global Step: 337650 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:39:10,670-Speed 2623.34 samples/sec Loss 8.0611 LearningRate 0.0352 Epoch: 8 Global Step: 337660 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:39:15,100-Speed 2312.31 samples/sec Loss 7.9946 LearningRate 0.0352 Epoch: 8 Global Step: 337670 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:39:18,997-Speed 2628.90 samples/sec Loss 7.9938 LearningRate 0.0352 Epoch: 8 Global Step: 337680 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:39:22,887-Speed 2632.87 samples/sec Loss 7.9372 LearningRate 0.0352 Epoch: 8 Global Step: 337690 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:39:26,780-Speed 2631.11 samples/sec Loss 7.9112 LearningRate 0.0352 Epoch: 8 Global Step: 337700 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:39:30,682-Speed 2624.64 samples/sec Loss 7.9351 LearningRate 0.0352 Epoch: 8 Global Step: 337710 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:39:34,579-Speed 2628.11 samples/sec Loss 7.9374 LearningRate 0.0352 Epoch: 8 Global Step: 337720 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:39:38,473-Speed 2630.37 samples/sec Loss 8.1149 LearningRate 0.0352 Epoch: 8 Global Step: 337730 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:39:42,360-Speed 2635.31 samples/sec Loss 7.8833 LearningRate 0.0352 Epoch: 8 Global Step: 337740 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:39:46,254-Speed 2630.23 samples/sec Loss 7.8826 LearningRate 0.0351 Epoch: 8 Global Step: 337750 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:39:50,151-Speed 2628.17 samples/sec Loss 8.0132 LearningRate 0.0351 Epoch: 8 Global Step: 337760 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:39:54,050-Speed 2627.10 samples/sec Loss 8.0316 LearningRate 0.0351 Epoch: 8 Global Step: 337770 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:39:57,940-Speed 2633.14 samples/sec Loss 8.0454 LearningRate 0.0351 Epoch: 8 Global Step: 337780 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:40:01,837-Speed 2628.18 samples/sec Loss 7.9559 LearningRate 0.0351 Epoch: 8 Global Step: 337790 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:40:05,833-Speed 2563.37 samples/sec Loss 7.9335 LearningRate 0.0351 Epoch: 8 Global Step: 337800 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:40:09,737-Speed 2623.39 samples/sec Loss 7.7894 LearningRate 0.0351 Epoch: 8 Global Step: 337810 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:40:13,631-Speed 2630.65 samples/sec Loss 7.9599 LearningRate 0.0351 Epoch: 8 Global Step: 337820 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:40:17,531-Speed 2626.56 samples/sec Loss 7.9711 LearningRate 0.0351 Epoch: 8 Global Step: 337830 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:40:21,405-Speed 2643.19 samples/sec Loss 8.0088 LearningRate 0.0351 Epoch: 8 Global Step: 337840 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:40:25,302-Speed 2628.99 samples/sec Loss 7.8439 LearningRate 0.0351 Epoch: 8 Global Step: 337850 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:40:29,190-Speed 2633.78 samples/sec Loss 8.0350 LearningRate 0.0351 Epoch: 8 Global Step: 337860 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:40:33,086-Speed 2629.23 samples/sec Loss 7.9946 LearningRate 0.0351 Epoch: 8 Global Step: 337870 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:40:36,986-Speed 2626.46 samples/sec Loss 7.9543 LearningRate 0.0351 Epoch: 8 Global Step: 337880 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:40:40,891-Speed 2623.05 samples/sec Loss 8.0857 LearningRate 0.0351 Epoch: 8 Global Step: 337890 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:40:44,801-Speed 2618.97 samples/sec Loss 7.8618 LearningRate 0.0351 Epoch: 8 Global Step: 337900 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:40:48,697-Speed 2629.21 samples/sec Loss 7.9443 LearningRate 0.0351 Epoch: 8 Global Step: 337910 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:40:52,603-Speed 2622.35 samples/sec Loss 8.0386 LearningRate 0.0351 Epoch: 8 Global Step: 337920 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:40:56,491-Speed 2634.57 samples/sec Loss 7.8367 LearningRate 0.0351 Epoch: 8 Global Step: 337930 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:41:00,381-Speed 2632.86 samples/sec Loss 8.0727 LearningRate 0.0351 Epoch: 8 Global Step: 337940 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:41:04,275-Speed 2630.63 samples/sec Loss 7.9707 LearningRate 0.0351 Epoch: 8 Global Step: 337950 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:41:08,175-Speed 2626.22 samples/sec Loss 7.8645 LearningRate 0.0351 Epoch: 8 Global Step: 337960 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:41:12,083-Speed 2621.04 samples/sec Loss 8.0009 LearningRate 0.0351 Epoch: 8 Global Step: 337970 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:41:15,982-Speed 2626.40 samples/sec Loss 8.0018 LearningRate 0.0351 Epoch: 8 Global Step: 337980 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:41:19,887-Speed 2623.28 samples/sec Loss 8.0029 LearningRate 0.0351 Epoch: 8 Global Step: 337990 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:41:23,778-Speed 2632.26 samples/sec Loss 7.8915 LearningRate 0.0351 Epoch: 8 Global Step: 338000 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:41:27,667-Speed 2633.41 samples/sec Loss 8.0747 LearningRate 0.0351 Epoch: 8 Global Step: 338010 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:41:31,559-Speed 2631.96 samples/sec Loss 8.0921 LearningRate 0.0351 Epoch: 8 Global Step: 338020 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:41:35,453-Speed 2630.44 samples/sec Loss 8.0423 LearningRate 0.0351 Epoch: 8 Global Step: 338030 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:41:39,343-Speed 2632.86 samples/sec Loss 7.9880 LearningRate 0.0351 Epoch: 8 Global Step: 338040 Fp16 Grad Scale: 262144 Required: 55 hours
Training: 2022-04-14 09:41:43,215-Speed 2645.08 samples/sec Loss 8.0901 LearningRate 0.0351 Epoch: 8 Global Step: 338050 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:41:47,104-Speed 2633.58 samples/sec Loss 7.9339 LearningRate 0.0351 Epoch: 8 Global Step: 338060 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:41:51,019-Speed 2617.05 samples/sec Loss 7.9587 LearningRate 0.0351 Epoch: 8 Global Step: 338070 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:41:54,908-Speed 2633.58 samples/sec Loss 8.0033 LearningRate 0.0351 Epoch: 8 Global Step: 338080 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:41:58,811-Speed 2624.48 samples/sec Loss 7.9809 LearningRate 0.0351 Epoch: 8 Global Step: 338090 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:42:02,693-Speed 2638.94 samples/sec Loss 8.1167 LearningRate 0.0351 Epoch: 8 Global Step: 338100 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:42:06,583-Speed 2633.08 samples/sec Loss 8.1010 LearningRate 0.0351 Epoch: 8 Global Step: 338110 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:42:10,473-Speed 2633.05 samples/sec Loss 8.0137 LearningRate 0.0351 Epoch: 8 Global Step: 338120 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:42:14,369-Speed 2628.94 samples/sec Loss 8.0095 LearningRate 0.0351 Epoch: 8 Global Step: 338130 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:42:18,298-Speed 2606.82 samples/sec Loss 7.8961 LearningRate 0.0351 Epoch: 8 Global Step: 338140 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:42:22,197-Speed 2627.64 samples/sec Loss 8.0655 LearningRate 0.0351 Epoch: 8 Global Step: 338150 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:42:26,092-Speed 2629.66 samples/sec Loss 7.9946 LearningRate 0.0351 Epoch: 8 Global Step: 338160 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:42:29,997-Speed 2623.38 samples/sec Loss 7.9334 LearningRate 0.0351 Epoch: 8 Global Step: 338170 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:42:33,896-Speed 2626.92 samples/sec Loss 8.0891 LearningRate 0.0351 Epoch: 8 Global Step: 338180 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:42:37,797-Speed 2625.47 samples/sec Loss 7.9621 LearningRate 0.0351 Epoch: 8 Global Step: 338190 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:42:41,689-Speed 2631.75 samples/sec Loss 7.9376 LearningRate 0.0351 Epoch: 8 Global Step: 338200 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:42:45,583-Speed 2629.95 samples/sec Loss 7.9314 LearningRate 0.0351 Epoch: 8 Global Step: 338210 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:42:49,491-Speed 2621.08 samples/sec Loss 7.9817 LearningRate 0.0351 Epoch: 8 Global Step: 338220 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:42:53,393-Speed 2625.04 samples/sec Loss 7.9082 LearningRate 0.0351 Epoch: 8 Global Step: 338230 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:42:57,261-Speed 2648.55 samples/sec Loss 7.9909 LearningRate 0.0351 Epoch: 8 Global Step: 338240 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:43:01,154-Speed 2630.52 samples/sec Loss 8.0251 LearningRate 0.0351 Epoch: 8 Global Step: 338250 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:43:05,049-Speed 2629.59 samples/sec Loss 7.9131 LearningRate 0.0351 Epoch: 8 Global Step: 338260 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:43:08,941-Speed 2631.98 samples/sec Loss 8.0629 LearningRate 0.0351 Epoch: 8 Global Step: 338270 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:43:12,835-Speed 2630.36 samples/sec Loss 7.8694 LearningRate 0.0351 Epoch: 8 Global Step: 338280 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:43:16,724-Speed 2633.26 samples/sec Loss 8.0289 LearningRate 0.0351 Epoch: 8 Global Step: 338290 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:43:20,617-Speed 2631.42 samples/sec Loss 8.0423 LearningRate 0.0351 Epoch: 8 Global Step: 338300 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:43:24,515-Speed 2627.35 samples/sec Loss 7.9896 LearningRate 0.0351 Epoch: 8 Global Step: 338310 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:43:28,411-Speed 2628.92 samples/sec Loss 7.9212 LearningRate 0.0351 Epoch: 8 Global Step: 338320 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:43:32,326-Speed 2616.15 samples/sec Loss 7.9284 LearningRate 0.0351 Epoch: 8 Global Step: 338330 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:43:36,223-Speed 2628.46 samples/sec Loss 7.8352 LearningRate 0.0351 Epoch: 8 Global Step: 338340 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:43:40,100-Speed 2641.99 samples/sec Loss 7.9056 LearningRate 0.0351 Epoch: 8 Global Step: 338350 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:43:43,995-Speed 2629.39 samples/sec Loss 7.7857 LearningRate 0.0351 Epoch: 8 Global Step: 338360 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:43:47,898-Speed 2624.12 samples/sec Loss 8.0167 LearningRate 0.0351 Epoch: 8 Global Step: 338370 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:43:51,802-Speed 2623.70 samples/sec Loss 7.9391 LearningRate 0.0351 Epoch: 8 Global Step: 338380 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:43:55,692-Speed 2633.03 samples/sec Loss 7.8911 LearningRate 0.0351 Epoch: 8 Global Step: 338390 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:43:59,583-Speed 2632.77 samples/sec Loss 7.8312 LearningRate 0.0351 Epoch: 8 Global Step: 338400 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:44:03,477-Speed 2630.48 samples/sec Loss 8.0847 LearningRate 0.0351 Epoch: 8 Global Step: 338410 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:44:07,369-Speed 2631.60 samples/sec Loss 7.8350 LearningRate 0.0351 Epoch: 8 Global Step: 338420 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:44:11,605-Speed 2417.64 samples/sec Loss 8.0544 LearningRate 0.0351 Epoch: 8 Global Step: 338430 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:44:15,496-Speed 2633.07 samples/sec Loss 7.8910 LearningRate 0.0351 Epoch: 8 Global Step: 338440 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:44:19,387-Speed 2632.06 samples/sec Loss 7.9383 LearningRate 0.0350 Epoch: 8 Global Step: 338450 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:44:23,281-Speed 2630.66 samples/sec Loss 7.9400 LearningRate 0.0350 Epoch: 8 Global Step: 338460 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:44:27,176-Speed 2629.60 samples/sec Loss 8.0561 LearningRate 0.0350 Epoch: 8 Global Step: 338470 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:44:31,052-Speed 2642.50 samples/sec Loss 8.0196 LearningRate 0.0350 Epoch: 8 Global Step: 338480 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:44:34,947-Speed 2629.93 samples/sec Loss 7.9174 LearningRate 0.0350 Epoch: 8 Global Step: 338490 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:44:38,838-Speed 2632.29 samples/sec Loss 7.8772 LearningRate 0.0350 Epoch: 8 Global Step: 338500 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:44:42,732-Speed 2630.37 samples/sec Loss 7.9026 LearningRate 0.0350 Epoch: 8 Global Step: 338510 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:44:46,626-Speed 2630.02 samples/sec Loss 7.8589 LearningRate 0.0350 Epoch: 8 Global Step: 338520 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:44:50,515-Speed 2634.31 samples/sec Loss 7.9386 LearningRate 0.0350 Epoch: 8 Global Step: 338530 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:44:54,356-Speed 2666.42 samples/sec Loss 7.9283 LearningRate 0.0350 Epoch: 8 Global Step: 338540 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:44:58,226-Speed 2646.78 samples/sec Loss 9.5497 LearningRate 0.0350 Epoch: 8 Global Step: 338550 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:45:02,316-Speed 2504.35 samples/sec Loss 8.8636 LearningRate 0.0350 Epoch: 8 Global Step: 338560 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:45:06,332-Speed 2549.90 samples/sec Loss 8.4360 LearningRate 0.0350 Epoch: 8 Global Step: 338570 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:45:10,227-Speed 2629.61 samples/sec Loss 8.2983 LearningRate 0.0350 Epoch: 8 Global Step: 338580 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:45:14,119-Speed 2632.14 samples/sec Loss 8.1735 LearningRate 0.0350 Epoch: 8 Global Step: 338590 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:45:18,012-Speed 2631.08 samples/sec Loss 8.1899 LearningRate 0.0350 Epoch: 8 Global Step: 338600 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:45:21,900-Speed 2634.74 samples/sec Loss 7.9739 LearningRate 0.0350 Epoch: 8 Global Step: 338610 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:45:25,793-Speed 2630.22 samples/sec Loss 8.1017 LearningRate 0.0350 Epoch: 8 Global Step: 338620 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:45:29,683-Speed 2633.70 samples/sec Loss 7.8620 LearningRate 0.0350 Epoch: 8 Global Step: 338630 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:45:33,570-Speed 2635.39 samples/sec Loss 8.1524 LearningRate 0.0350 Epoch: 8 Global Step: 338640 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 09:45:37,465-Speed 2629.27 samples/sec Loss 8.0067 LearningRate 0.0350 Epoch: 8 Global Step: 338650 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:45:41,364-Speed 2626.50 samples/sec Loss 8.1343 LearningRate 0.0350 Epoch: 8 Global Step: 338660 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:45:45,256-Speed 2631.69 samples/sec Loss 8.1179 LearningRate 0.0350 Epoch: 8 Global Step: 338670 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:45:49,155-Speed 2626.76 samples/sec Loss 7.9194 LearningRate 0.0350 Epoch: 8 Global Step: 338680 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:45:53,046-Speed 2632.46 samples/sec Loss 8.2150 LearningRate 0.0350 Epoch: 8 Global Step: 338690 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:45:56,937-Speed 2632.49 samples/sec Loss 8.0253 LearningRate 0.0350 Epoch: 8 Global Step: 338700 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:46:00,827-Speed 2633.25 samples/sec Loss 8.1474 LearningRate 0.0350 Epoch: 8 Global Step: 338710 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:46:04,720-Speed 2631.15 samples/sec Loss 7.9717 LearningRate 0.0350 Epoch: 8 Global Step: 338720 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:46:08,623-Speed 2624.25 samples/sec Loss 8.0563 LearningRate 0.0350 Epoch: 8 Global Step: 338730 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:46:12,528-Speed 2622.82 samples/sec Loss 7.9754 LearningRate 0.0350 Epoch: 8 Global Step: 338740 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:46:16,423-Speed 2629.37 samples/sec Loss 8.0336 LearningRate 0.0350 Epoch: 8 Global Step: 338750 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:46:20,316-Speed 2631.97 samples/sec Loss 8.0466 LearningRate 0.0350 Epoch: 8 Global Step: 338760 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:46:24,212-Speed 2628.35 samples/sec Loss 8.0276 LearningRate 0.0350 Epoch: 8 Global Step: 338770 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:46:28,107-Speed 2629.63 samples/sec Loss 8.0643 LearningRate 0.0350 Epoch: 8 Global Step: 338780 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:46:32,005-Speed 2628.08 samples/sec Loss 8.0411 LearningRate 0.0350 Epoch: 8 Global Step: 338790 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:46:35,894-Speed 2633.67 samples/sec Loss 7.9511 LearningRate 0.0350 Epoch: 8 Global Step: 338800 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:46:39,789-Speed 2629.82 samples/sec Loss 7.9391 LearningRate 0.0350 Epoch: 8 Global Step: 338810 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:46:43,687-Speed 2627.33 samples/sec Loss 7.8976 LearningRate 0.0350 Epoch: 8 Global Step: 338820 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:46:47,599-Speed 2617.95 samples/sec Loss 7.9477 LearningRate 0.0350 Epoch: 8 Global Step: 338830 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:46:51,500-Speed 2626.16 samples/sec Loss 8.0167 LearningRate 0.0350 Epoch: 8 Global Step: 338840 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:46:55,402-Speed 2625.38 samples/sec Loss 7.9363 LearningRate 0.0350 Epoch: 8 Global Step: 338850 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:46:59,298-Speed 2628.73 samples/sec Loss 7.9686 LearningRate 0.0350 Epoch: 8 Global Step: 338860 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:47:03,211-Speed 2617.52 samples/sec Loss 8.0678 LearningRate 0.0350 Epoch: 8 Global Step: 338870 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:47:07,105-Speed 2630.54 samples/sec Loss 8.0457 LearningRate 0.0350 Epoch: 8 Global Step: 338880 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:47:11,016-Speed 2618.62 samples/sec Loss 7.8693 LearningRate 0.0350 Epoch: 8 Global Step: 338890 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:47:14,910-Speed 2630.61 samples/sec Loss 7.8882 LearningRate 0.0350 Epoch: 8 Global Step: 338900 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:47:18,824-Speed 2617.11 samples/sec Loss 8.0679 LearningRate 0.0350 Epoch: 8 Global Step: 338910 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:47:22,712-Speed 2634.54 samples/sec Loss 8.0666 LearningRate 0.0350 Epoch: 8 Global Step: 338920 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:47:26,623-Speed 2618.63 samples/sec Loss 8.0276 LearningRate 0.0350 Epoch: 8 Global Step: 338930 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:47:30,543-Speed 2612.77 samples/sec Loss 8.1153 LearningRate 0.0350 Epoch: 8 Global Step: 338940 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:47:34,437-Speed 2631.06 samples/sec Loss 8.0473 LearningRate 0.0350 Epoch: 8 Global Step: 338950 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:47:38,332-Speed 2629.67 samples/sec Loss 8.0351 LearningRate 0.0350 Epoch: 8 Global Step: 338960 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:47:42,225-Speed 2630.72 samples/sec Loss 7.9532 LearningRate 0.0350 Epoch: 8 Global Step: 338970 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:47:46,118-Speed 2630.72 samples/sec Loss 7.9283 LearningRate 0.0350 Epoch: 8 Global Step: 338980 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:47:50,010-Speed 2631.85 samples/sec Loss 8.1120 LearningRate 0.0350 Epoch: 8 Global Step: 338990 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:47:53,889-Speed 2640.12 samples/sec Loss 7.9984 LearningRate 0.0350 Epoch: 8 Global Step: 339000 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:47:57,785-Speed 2629.30 samples/sec Loss 7.9890 LearningRate 0.0350 Epoch: 8 Global Step: 339010 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:48:01,675-Speed 2633.22 samples/sec Loss 7.8484 LearningRate 0.0350 Epoch: 8 Global Step: 339020 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:48:05,567-Speed 2631.52 samples/sec Loss 7.9943 LearningRate 0.0350 Epoch: 8 Global Step: 339030 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:48:09,460-Speed 2630.79 samples/sec Loss 8.0453 LearningRate 0.0350 Epoch: 8 Global Step: 339040 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:48:13,355-Speed 2629.71 samples/sec Loss 8.0161 LearningRate 0.0350 Epoch: 8 Global Step: 339050 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:48:17,248-Speed 2630.82 samples/sec Loss 7.8990 LearningRate 0.0350 Epoch: 8 Global Step: 339060 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:48:21,152-Speed 2623.64 samples/sec Loss 8.0268 LearningRate 0.0350 Epoch: 8 Global Step: 339070 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:48:25,043-Speed 2632.10 samples/sec Loss 7.8441 LearningRate 0.0350 Epoch: 8 Global Step: 339080 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:48:28,938-Speed 2630.25 samples/sec Loss 7.9639 LearningRate 0.0350 Epoch: 8 Global Step: 339090 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:48:32,841-Speed 2624.01 samples/sec Loss 7.8993 LearningRate 0.0350 Epoch: 8 Global Step: 339100 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:48:36,736-Speed 2629.89 samples/sec Loss 8.1350 LearningRate 0.0350 Epoch: 8 Global Step: 339110 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:48:40,628-Speed 2631.49 samples/sec Loss 8.1458 LearningRate 0.0350 Epoch: 8 Global Step: 339120 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:48:44,520-Speed 2631.51 samples/sec Loss 7.9502 LearningRate 0.0350 Epoch: 8 Global Step: 339130 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:48:48,413-Speed 2630.87 samples/sec Loss 7.9121 LearningRate 0.0350 Epoch: 8 Global Step: 339140 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:48:52,310-Speed 2628.28 samples/sec Loss 8.0395 LearningRate 0.0349 Epoch: 8 Global Step: 339150 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:48:56,214-Speed 2624.14 samples/sec Loss 8.0078 LearningRate 0.0349 Epoch: 8 Global Step: 339160 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:49:00,108-Speed 2630.03 samples/sec Loss 7.9907 LearningRate 0.0349 Epoch: 8 Global Step: 339170 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:49:04,007-Speed 2626.75 samples/sec Loss 8.0781 LearningRate 0.0349 Epoch: 8 Global Step: 339180 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:49:07,901-Speed 2630.79 samples/sec Loss 7.8831 LearningRate 0.0349 Epoch: 8 Global Step: 339190 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:49:11,792-Speed 2632.20 samples/sec Loss 7.9878 LearningRate 0.0349 Epoch: 8 Global Step: 339200 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:49:15,698-Speed 2622.74 samples/sec Loss 7.9832 LearningRate 0.0349 Epoch: 8 Global Step: 339210 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:49:19,592-Speed 2630.66 samples/sec Loss 8.0507 LearningRate 0.0349 Epoch: 8 Global Step: 339220 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:49:23,486-Speed 2630.20 samples/sec Loss 8.0694 LearningRate 0.0349 Epoch: 8 Global Step: 339230 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:49:27,381-Speed 2629.77 samples/sec Loss 7.8998 LearningRate 0.0349 Epoch: 8 Global Step: 339240 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:49:31,300-Speed 2614.16 samples/sec Loss 7.9569 LearningRate 0.0349 Epoch: 8 Global Step: 339250 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:49:35,195-Speed 2629.50 samples/sec Loss 7.9741 LearningRate 0.0349 Epoch: 8 Global Step: 339260 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:49:39,088-Speed 2630.75 samples/sec Loss 7.9594 LearningRate 0.0349 Epoch: 8 Global Step: 339270 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:49:42,983-Speed 2629.75 samples/sec Loss 7.9949 LearningRate 0.0349 Epoch: 8 Global Step: 339280 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:49:46,859-Speed 2642.59 samples/sec Loss 7.9878 LearningRate 0.0349 Epoch: 8 Global Step: 339290 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:49:50,753-Speed 2631.14 samples/sec Loss 8.0969 LearningRate 0.0349 Epoch: 8 Global Step: 339300 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:49:54,645-Speed 2631.43 samples/sec Loss 8.0157 LearningRate 0.0349 Epoch: 8 Global Step: 339310 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:49:58,537-Speed 2631.81 samples/sec Loss 7.9732 LearningRate 0.0349 Epoch: 8 Global Step: 339320 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:50:02,432-Speed 2629.28 samples/sec Loss 7.8061 LearningRate 0.0349 Epoch: 8 Global Step: 339330 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:50:06,327-Speed 2629.76 samples/sec Loss 8.0038 LearningRate 0.0349 Epoch: 8 Global Step: 339340 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:50:10,229-Speed 2624.57 samples/sec Loss 8.0598 LearningRate 0.0349 Epoch: 8 Global Step: 339350 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:50:14,075-Speed 2663.91 samples/sec Loss 8.2910 LearningRate 0.0349 Epoch: 8 Global Step: 339360 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:50:17,969-Speed 2630.06 samples/sec Loss 7.9730 LearningRate 0.0349 Epoch: 8 Global Step: 339370 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:50:21,902-Speed 2604.72 samples/sec Loss 8.0503 LearningRate 0.0349 Epoch: 8 Global Step: 339380 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:50:25,798-Speed 2629.08 samples/sec Loss 7.9168 LearningRate 0.0349 Epoch: 8 Global Step: 339390 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:50:29,688-Speed 2633.01 samples/sec Loss 7.8734 LearningRate 0.0349 Epoch: 8 Global Step: 339400 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:50:33,580-Speed 2631.60 samples/sec Loss 7.9402 LearningRate 0.0349 Epoch: 8 Global Step: 339410 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:50:37,471-Speed 2632.21 samples/sec Loss 7.8947 LearningRate 0.0349 Epoch: 8 Global Step: 339420 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:50:41,362-Speed 2632.19 samples/sec Loss 7.8516 LearningRate 0.0349 Epoch: 8 Global Step: 339430 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:50:45,256-Speed 2630.44 samples/sec Loss 7.9952 LearningRate 0.0349 Epoch: 8 Global Step: 339440 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:50:49,148-Speed 2631.68 samples/sec Loss 7.7993 LearningRate 0.0349 Epoch: 8 Global Step: 339450 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 09:50:53,040-Speed 2632.01 samples/sec Loss 7.8172 LearningRate 0.0349 Epoch: 8 Global Step: 339460 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:50:56,929-Speed 2633.25 samples/sec Loss 8.1144 LearningRate 0.0349 Epoch: 8 Global Step: 339470 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:51:00,827-Speed 2627.71 samples/sec Loss 7.9956 LearningRate 0.0349 Epoch: 8 Global Step: 339480 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:51:04,764-Speed 2601.75 samples/sec Loss 8.0030 LearningRate 0.0349 Epoch: 8 Global Step: 339490 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:51:08,648-Speed 2637.36 samples/sec Loss 7.9646 LearningRate 0.0349 Epoch: 8 Global Step: 339500 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:51:12,542-Speed 2630.03 samples/sec Loss 7.9496 LearningRate 0.0349 Epoch: 8 Global Step: 339510 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:51:16,445-Speed 2624.52 samples/sec Loss 7.9647 LearningRate 0.0349 Epoch: 8 Global Step: 339520 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:51:20,362-Speed 2615.14 samples/sec Loss 7.8483 LearningRate 0.0349 Epoch: 8 Global Step: 339530 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:51:24,253-Speed 2632.49 samples/sec Loss 8.0874 LearningRate 0.0349 Epoch: 8 Global Step: 339540 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:51:28,143-Speed 2632.97 samples/sec Loss 8.0018 LearningRate 0.0349 Epoch: 8 Global Step: 339550 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 09:51:32,038-Speed 2629.85 samples/sec Loss 7.9948 LearningRate 0.0349 Epoch: 8 Global Step: 339560 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:51:35,930-Speed 2631.56 samples/sec Loss 7.9766 LearningRate 0.0349 Epoch: 8 Global Step: 339570 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:51:39,825-Speed 2629.79 samples/sec Loss 7.9354 LearningRate 0.0349 Epoch: 8 Global Step: 339580 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:51:43,717-Speed 2631.83 samples/sec Loss 7.8204 LearningRate 0.0349 Epoch: 8 Global Step: 339590 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:51:47,615-Speed 2627.50 samples/sec Loss 7.8979 LearningRate 0.0349 Epoch: 8 Global Step: 339600 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:51:51,508-Speed 2631.14 samples/sec Loss 7.9025 LearningRate 0.0349 Epoch: 8 Global Step: 339610 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:51:55,518-Speed 2554.32 samples/sec Loss 7.7623 LearningRate 0.0349 Epoch: 8 Global Step: 339620 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:51:59,535-Speed 2550.10 samples/sec Loss 7.9556 LearningRate 0.0349 Epoch: 8 Global Step: 339630 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:52:03,448-Speed 2617.34 samples/sec Loss 7.8470 LearningRate 0.0349 Epoch: 8 Global Step: 339640 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:52:07,344-Speed 2629.19 samples/sec Loss 7.9783 LearningRate 0.0349 Epoch: 8 Global Step: 339650 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:52:11,241-Speed 2627.86 samples/sec Loss 7.8676 LearningRate 0.0349 Epoch: 8 Global Step: 339660 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:52:15,135-Speed 2631.16 samples/sec Loss 7.9799 LearningRate 0.0349 Epoch: 8 Global Step: 339670 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:52:19,032-Speed 2627.45 samples/sec Loss 7.9575 LearningRate 0.0349 Epoch: 8 Global Step: 339680 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:52:22,936-Speed 2623.78 samples/sec Loss 7.7965 LearningRate 0.0349 Epoch: 8 Global Step: 339690 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:52:26,837-Speed 2626.10 samples/sec Loss 7.9749 LearningRate 0.0349 Epoch: 8 Global Step: 339700 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:52:30,731-Speed 2630.26 samples/sec Loss 7.9899 LearningRate 0.0349 Epoch: 8 Global Step: 339710 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:52:34,625-Speed 2630.52 samples/sec Loss 7.9731 LearningRate 0.0349 Epoch: 8 Global Step: 339720 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:52:38,522-Speed 2627.99 samples/sec Loss 7.9102 LearningRate 0.0349 Epoch: 8 Global Step: 339730 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:52:42,424-Speed 2624.45 samples/sec Loss 8.0756 LearningRate 0.0349 Epoch: 8 Global Step: 339740 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:52:46,318-Speed 2630.52 samples/sec Loss 7.8615 LearningRate 0.0349 Epoch: 8 Global Step: 339750 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:52:50,219-Speed 2625.56 samples/sec Loss 7.9894 LearningRate 0.0349 Epoch: 8 Global Step: 339760 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:52:54,112-Speed 2631.25 samples/sec Loss 7.9055 LearningRate 0.0349 Epoch: 8 Global Step: 339770 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:52:58,007-Speed 2629.99 samples/sec Loss 8.0346 LearningRate 0.0349 Epoch: 8 Global Step: 339780 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:53:01,889-Speed 2638.29 samples/sec Loss 7.9438 LearningRate 0.0349 Epoch: 8 Global Step: 339790 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:53:05,781-Speed 2631.69 samples/sec Loss 7.9971 LearningRate 0.0349 Epoch: 8 Global Step: 339800 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:53:09,672-Speed 2632.37 samples/sec Loss 7.8766 LearningRate 0.0349 Epoch: 8 Global Step: 339810 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:53:13,567-Speed 2629.88 samples/sec Loss 7.9329 LearningRate 0.0349 Epoch: 8 Global Step: 339820 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:53:17,481-Speed 2616.58 samples/sec Loss 8.0590 LearningRate 0.0349 Epoch: 8 Global Step: 339830 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:53:21,379-Speed 2628.16 samples/sec Loss 7.9473 LearningRate 0.0349 Epoch: 8 Global Step: 339840 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:53:25,277-Speed 2627.40 samples/sec Loss 7.9461 LearningRate 0.0348 Epoch: 8 Global Step: 339850 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:53:29,173-Speed 2629.29 samples/sec Loss 7.9514 LearningRate 0.0348 Epoch: 8 Global Step: 339860 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:53:33,066-Speed 2631.35 samples/sec Loss 7.9888 LearningRate 0.0348 Epoch: 8 Global Step: 339870 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:53:37,036-Speed 2579.52 samples/sec Loss 7.8185 LearningRate 0.0348 Epoch: 8 Global Step: 339880 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:53:40,942-Speed 2622.12 samples/sec Loss 8.1055 LearningRate 0.0348 Epoch: 8 Global Step: 339890 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:53:44,841-Speed 2626.73 samples/sec Loss 7.8751 LearningRate 0.0348 Epoch: 8 Global Step: 339900 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:53:48,733-Speed 2632.25 samples/sec Loss 7.9305 LearningRate 0.0348 Epoch: 8 Global Step: 339910 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:53:52,641-Speed 2620.90 samples/sec Loss 7.9404 LearningRate 0.0348 Epoch: 8 Global Step: 339920 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:53:56,572-Speed 2605.90 samples/sec Loss 7.8447 LearningRate 0.0348 Epoch: 8 Global Step: 339930 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:54:00,468-Speed 2629.17 samples/sec Loss 7.8512 LearningRate 0.0348 Epoch: 8 Global Step: 339940 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:54:04,360-Speed 2631.92 samples/sec Loss 7.9657 LearningRate 0.0348 Epoch: 8 Global Step: 339950 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:54:08,252-Speed 2631.33 samples/sec Loss 8.0410 LearningRate 0.0348 Epoch: 8 Global Step: 339960 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:54:12,152-Speed 2625.98 samples/sec Loss 7.9330 LearningRate 0.0348 Epoch: 8 Global Step: 339970 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:54:16,053-Speed 2625.51 samples/sec Loss 8.0138 LearningRate 0.0348 Epoch: 8 Global Step: 339980 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:54:19,949-Speed 2629.11 samples/sec Loss 7.9488 LearningRate 0.0348 Epoch: 8 Global Step: 339990 Fp16 Grad Scale: 262144 Required: 55 hours
Training: 2022-04-14 09:54:23,826-Speed 2642.07 samples/sec Loss 7.9458 LearningRate 0.0348 Epoch: 8 Global Step: 340000 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:55:06,916-[lfw][340000]XNorm: 23.821781
Training: 2022-04-14 09:55:06,917-[lfw][340000]Accuracy-Flip: 0.99750+-0.00300
Training: 2022-04-14 09:55:06,917-[lfw][340000]Accuracy-Highest: 0.99783
Training: 2022-04-14 09:55:57,090-[cfp_fp][340000]XNorm: 22.373600
Training: 2022-04-14 09:55:57,091-[cfp_fp][340000]Accuracy-Flip: 0.98471+-0.00568
Training: 2022-04-14 09:55:57,092-[cfp_fp][340000]Accuracy-Highest: 0.98671
Training: 2022-04-14 09:56:39,927-[agedb_30][340000]XNorm: 23.840092
Training: 2022-04-14 09:56:39,928-[agedb_30][340000]Accuracy-Flip: 0.97550+-0.00624
Training: 2022-04-14 09:56:39,929-[agedb_30][340000]Accuracy-Highest: 0.97567
Training: 2022-04-14 09:56:43,776-Speed 73.17 samples/sec Loss 7.8511 LearningRate 0.0348 Epoch: 8 Global Step: 340010 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:56:47,707-Speed 2605.66 samples/sec Loss 7.9034 LearningRate 0.0348 Epoch: 8 Global Step: 340020 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:56:51,767-Speed 2522.87 samples/sec Loss 8.0212 LearningRate 0.0348 Epoch: 8 Global Step: 340030 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:56:55,662-Speed 2629.72 samples/sec Loss 8.0207 LearningRate 0.0348 Epoch: 8 Global Step: 340040 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:56:59,536-Speed 2643.87 samples/sec Loss 8.0078 LearningRate 0.0348 Epoch: 8 Global Step: 340050 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:57:03,419-Speed 2637.03 samples/sec Loss 7.8868 LearningRate 0.0348 Epoch: 8 Global Step: 340060 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:57:07,299-Speed 2640.21 samples/sec Loss 7.9230 LearningRate 0.0348 Epoch: 8 Global Step: 340070 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:57:11,189-Speed 2632.70 samples/sec Loss 8.0346 LearningRate 0.0348 Epoch: 8 Global Step: 340080 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:57:15,081-Speed 2631.87 samples/sec Loss 8.0475 LearningRate 0.0348 Epoch: 8 Global Step: 340090 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:57:18,972-Speed 2632.61 samples/sec Loss 8.0318 LearningRate 0.0348 Epoch: 8 Global Step: 340100 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:57:22,858-Speed 2636.18 samples/sec Loss 7.7697 LearningRate 0.0348 Epoch: 8 Global Step: 340110 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:57:26,742-Speed 2636.85 samples/sec Loss 7.8530 LearningRate 0.0348 Epoch: 8 Global Step: 340120 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:57:30,628-Speed 2635.79 samples/sec Loss 7.8862 LearningRate 0.0348 Epoch: 8 Global Step: 340130 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:57:34,513-Speed 2636.20 samples/sec Loss 7.8605 LearningRate 0.0348 Epoch: 8 Global Step: 340140 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:57:38,408-Speed 2629.56 samples/sec Loss 7.9551 LearningRate 0.0348 Epoch: 8 Global Step: 340150 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:57:42,310-Speed 2625.29 samples/sec Loss 7.9196 LearningRate 0.0348 Epoch: 8 Global Step: 340160 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:57:46,209-Speed 2626.95 samples/sec Loss 7.8640 LearningRate 0.0348 Epoch: 8 Global Step: 340170 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:57:50,118-Speed 2620.50 samples/sec Loss 7.8891 LearningRate 0.0348 Epoch: 8 Global Step: 340180 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:57:54,014-Speed 2628.70 samples/sec Loss 7.8499 LearningRate 0.0348 Epoch: 8 Global Step: 340190 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:57:58,417-Speed 2326.66 samples/sec Loss 8.0422 LearningRate 0.0348 Epoch: 8 Global Step: 340200 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:58:02,308-Speed 2632.74 samples/sec Loss 7.8702 LearningRate 0.0348 Epoch: 8 Global Step: 340210 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:58:06,180-Speed 2645.15 samples/sec Loss 8.0474 LearningRate 0.0348 Epoch: 8 Global Step: 340220 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:58:10,074-Speed 2630.75 samples/sec Loss 7.9089 LearningRate 0.0348 Epoch: 8 Global Step: 340230 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:58:13,964-Speed 2632.81 samples/sec Loss 8.0307 LearningRate 0.0348 Epoch: 8 Global Step: 340240 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:58:17,855-Speed 2632.77 samples/sec Loss 8.0519 LearningRate 0.0348 Epoch: 8 Global Step: 340250 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:58:21,745-Speed 2632.68 samples/sec Loss 7.8379 LearningRate 0.0348 Epoch: 8 Global Step: 340260 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:58:25,654-Speed 2620.38 samples/sec Loss 8.0399 LearningRate 0.0348 Epoch: 8 Global Step: 340270 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:58:29,545-Speed 2632.24 samples/sec Loss 8.7269 LearningRate 0.0348 Epoch: 8 Global Step: 340280 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:58:33,462-Speed 2615.32 samples/sec Loss 9.0381 LearningRate 0.0348 Epoch: 8 Global Step: 340290 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:58:37,353-Speed 2633.04 samples/sec Loss 8.5724 LearningRate 0.0348 Epoch: 8 Global Step: 340300 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:58:41,242-Speed 2633.64 samples/sec Loss 8.1886 LearningRate 0.0348 Epoch: 8 Global Step: 340310 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:58:45,132-Speed 2632.77 samples/sec Loss 8.2123 LearningRate 0.0348 Epoch: 8 Global Step: 340320 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:58:49,026-Speed 2630.43 samples/sec Loss 8.1547 LearningRate 0.0348 Epoch: 8 Global Step: 340330 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:58:52,915-Speed 2633.97 samples/sec Loss 8.1452 LearningRate 0.0348 Epoch: 8 Global Step: 340340 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:58:56,811-Speed 2629.12 samples/sec Loss 7.9938 LearningRate 0.0348 Epoch: 8 Global Step: 340350 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:59:00,702-Speed 2632.26 samples/sec Loss 7.9891 LearningRate 0.0348 Epoch: 8 Global Step: 340360 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:59:04,603-Speed 2625.60 samples/sec Loss 8.1016 LearningRate 0.0348 Epoch: 8 Global Step: 340370 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 09:59:08,501-Speed 2628.08 samples/sec Loss 7.9774 LearningRate 0.0348 Epoch: 8 Global Step: 340380 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:59:12,392-Speed 2632.44 samples/sec Loss 8.0738 LearningRate 0.0348 Epoch: 8 Global Step: 340390 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:59:16,285-Speed 2630.91 samples/sec Loss 8.0044 LearningRate 0.0348 Epoch: 8 Global Step: 340400 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:59:20,180-Speed 2629.89 samples/sec Loss 8.1242 LearningRate 0.0348 Epoch: 8 Global Step: 340410 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:59:24,081-Speed 2625.36 samples/sec Loss 7.9443 LearningRate 0.0348 Epoch: 8 Global Step: 340420 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:59:27,980-Speed 2626.42 samples/sec Loss 8.0022 LearningRate 0.0348 Epoch: 8 Global Step: 340430 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:59:31,875-Speed 2629.81 samples/sec Loss 7.8025 LearningRate 0.0348 Epoch: 8 Global Step: 340440 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:59:35,817-Speed 2598.25 samples/sec Loss 7.7468 LearningRate 0.0348 Epoch: 8 Global Step: 340450 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:59:39,744-Speed 2608.46 samples/sec Loss 7.9952 LearningRate 0.0348 Epoch: 8 Global Step: 340460 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:59:43,640-Speed 2628.96 samples/sec Loss 7.9654 LearningRate 0.0348 Epoch: 8 Global Step: 340470 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 09:59:47,532-Speed 2631.67 samples/sec Loss 7.9899 LearningRate 0.0348 Epoch: 8 Global Step: 340480 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:59:51,422-Speed 2633.32 samples/sec Loss 8.0450 LearningRate 0.0348 Epoch: 8 Global Step: 340490 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:59:55,318-Speed 2628.90 samples/sec Loss 7.8309 LearningRate 0.0348 Epoch: 8 Global Step: 340500 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 09:59:59,218-Speed 2625.73 samples/sec Loss 7.9258 LearningRate 0.0348 Epoch: 8 Global Step: 340510 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:00:03,113-Speed 2629.96 samples/sec Loss 7.8648 LearningRate 0.0348 Epoch: 8 Global Step: 340520 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:00:07,018-Speed 2622.77 samples/sec Loss 8.0062 LearningRate 0.0348 Epoch: 8 Global Step: 340530 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:00:10,919-Speed 2625.69 samples/sec Loss 7.8725 LearningRate 0.0348 Epoch: 8 Global Step: 340540 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:00:14,797-Speed 2641.51 samples/sec Loss 7.9454 LearningRate 0.0347 Epoch: 8 Global Step: 340550 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:00:18,692-Speed 2629.45 samples/sec Loss 7.9662 LearningRate 0.0347 Epoch: 8 Global Step: 340560 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:00:22,595-Speed 2624.72 samples/sec Loss 7.8498 LearningRate 0.0347 Epoch: 8 Global Step: 340570 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:00:26,497-Speed 2624.52 samples/sec Loss 7.8733 LearningRate 0.0347 Epoch: 8 Global Step: 340580 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:00:30,401-Speed 2623.22 samples/sec Loss 8.0905 LearningRate 0.0347 Epoch: 8 Global Step: 340590 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:00:34,301-Speed 2626.45 samples/sec Loss 7.9081 LearningRate 0.0347 Epoch: 8 Global Step: 340600 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:00:38,210-Speed 2620.83 samples/sec Loss 7.9177 LearningRate 0.0347 Epoch: 8 Global Step: 340610 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:00:42,116-Speed 2621.93 samples/sec Loss 7.9005 LearningRate 0.0347 Epoch: 8 Global Step: 340620 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:00:46,013-Speed 2628.67 samples/sec Loss 8.0401 LearningRate 0.0347 Epoch: 8 Global Step: 340630 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:00:49,940-Speed 2607.93 samples/sec Loss 7.8778 LearningRate 0.0347 Epoch: 8 Global Step: 340640 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:00:53,839-Speed 2627.31 samples/sec Loss 7.9331 LearningRate 0.0347 Epoch: 8 Global Step: 340650 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:00:57,738-Speed 2627.08 samples/sec Loss 8.0384 LearningRate 0.0347 Epoch: 8 Global Step: 340660 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:01:01,634-Speed 2628.85 samples/sec Loss 8.0554 LearningRate 0.0347 Epoch: 8 Global Step: 340670 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:01:05,532-Speed 2627.08 samples/sec Loss 8.0004 LearningRate 0.0347 Epoch: 8 Global Step: 340680 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:01:09,441-Speed 2620.14 samples/sec Loss 7.9008 LearningRate 0.0347 Epoch: 8 Global Step: 340690 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:01:13,336-Speed 2630.09 samples/sec Loss 7.9485 LearningRate 0.0347 Epoch: 8 Global Step: 340700 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:01:17,232-Speed 2628.99 samples/sec Loss 7.9989 LearningRate 0.0347 Epoch: 8 Global Step: 340710 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:01:21,130-Speed 2633.43 samples/sec Loss 7.8034 LearningRate 0.0347 Epoch: 8 Global Step: 340720 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:01:25,023-Speed 2630.89 samples/sec Loss 7.8701 LearningRate 0.0347 Epoch: 8 Global Step: 340730 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:01:28,950-Speed 2608.35 samples/sec Loss 8.0094 LearningRate 0.0347 Epoch: 8 Global Step: 340740 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:01:32,845-Speed 2629.80 samples/sec Loss 7.8054 LearningRate 0.0347 Epoch: 8 Global Step: 340750 Fp16 Grad Scale: 262144 Required: 55 hours
Training: 2022-04-14 10:01:36,727-Speed 2637.86 samples/sec Loss 7.9206 LearningRate 0.0347 Epoch: 8 Global Step: 340760 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:01:40,655-Speed 2607.57 samples/sec Loss 7.8546 LearningRate 0.0347 Epoch: 8 Global Step: 340770 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:01:44,570-Speed 2616.17 samples/sec Loss 7.9081 LearningRate 0.0347 Epoch: 8 Global Step: 340780 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:01:48,471-Speed 2626.38 samples/sec Loss 7.9521 LearningRate 0.0347 Epoch: 8 Global Step: 340790 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:01:52,364-Speed 2630.39 samples/sec Loss 8.0036 LearningRate 0.0347 Epoch: 8 Global Step: 340800 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:01:56,277-Speed 2617.88 samples/sec Loss 7.9155 LearningRate 0.0347 Epoch: 8 Global Step: 340810 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:02:00,174-Speed 2628.65 samples/sec Loss 7.9237 LearningRate 0.0347 Epoch: 8 Global Step: 340820 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:02:04,070-Speed 2628.85 samples/sec Loss 7.9557 LearningRate 0.0347 Epoch: 8 Global Step: 340830 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:02:08,000-Speed 2605.73 samples/sec Loss 7.9668 LearningRate 0.0347 Epoch: 8 Global Step: 340840 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:02:11,900-Speed 2627.36 samples/sec Loss 8.0415 LearningRate 0.0347 Epoch: 8 Global Step: 340850 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:02:15,805-Speed 2622.30 samples/sec Loss 7.9707 LearningRate 0.0347 Epoch: 8 Global Step: 340860 Fp16 Grad Scale: 262144 Required: 55 hours
Training: 2022-04-14 10:02:19,726-Speed 2612.59 samples/sec Loss 7.9143 LearningRate 0.0347 Epoch: 8 Global Step: 340870 Fp16 Grad Scale: 262144 Required: 55 hours
Training: 2022-04-14 10:02:23,609-Speed 2637.77 samples/sec Loss 7.9574 LearningRate 0.0347 Epoch: 8 Global Step: 340880 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:02:27,510-Speed 2625.98 samples/sec Loss 7.8478 LearningRate 0.0347 Epoch: 8 Global Step: 340890 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:02:31,411-Speed 2625.66 samples/sec Loss 7.9028 LearningRate 0.0347 Epoch: 8 Global Step: 340900 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:02:35,309-Speed 2627.10 samples/sec Loss 7.9920 LearningRate 0.0347 Epoch: 8 Global Step: 340910 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:02:39,206-Speed 2628.30 samples/sec Loss 7.8776 LearningRate 0.0347 Epoch: 8 Global Step: 340920 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:02:43,104-Speed 2628.09 samples/sec Loss 7.8448 LearningRate 0.0347 Epoch: 8 Global Step: 340930 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:02:47,008-Speed 2623.24 samples/sec Loss 7.8146 LearningRate 0.0347 Epoch: 8 Global Step: 340940 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:02:50,917-Speed 2620.87 samples/sec Loss 8.0020 LearningRate 0.0347 Epoch: 8 Global Step: 340950 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:02:54,818-Speed 2625.26 samples/sec Loss 7.9560 LearningRate 0.0347 Epoch: 8 Global Step: 340960 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:02:58,714-Speed 2629.47 samples/sec Loss 7.9864 LearningRate 0.0347 Epoch: 8 Global Step: 340970 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:03:02,611-Speed 2627.98 samples/sec Loss 7.9373 LearningRate 0.0347 Epoch: 8 Global Step: 340980 Fp16 Grad Scale: 262144 Required: 55 hours
Training: 2022-04-14 10:03:06,523-Speed 2618.26 samples/sec Loss 8.0075 LearningRate 0.0347 Epoch: 8 Global Step: 340990 Fp16 Grad Scale: 262144 Required: 55 hours
Training: 2022-04-14 10:03:10,421-Speed 2627.36 samples/sec Loss 7.9683 LearningRate 0.0347 Epoch: 8 Global Step: 341000 Fp16 Grad Scale: 262144 Required: 55 hours
Training: 2022-04-14 10:03:14,299-Speed 2641.79 samples/sec Loss 7.8587 LearningRate 0.0347 Epoch: 8 Global Step: 341010 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:03:18,196-Speed 2628.70 samples/sec Loss 7.8989 LearningRate 0.0347 Epoch: 8 Global Step: 341020 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:03:22,095-Speed 2627.00 samples/sec Loss 7.9263 LearningRate 0.0347 Epoch: 8 Global Step: 341030 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:03:25,993-Speed 2627.94 samples/sec Loss 7.9307 LearningRate 0.0347 Epoch: 8 Global Step: 341040 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:03:29,892-Speed 2626.90 samples/sec Loss 7.8846 LearningRate 0.0347 Epoch: 8 Global Step: 341050 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:03:33,794-Speed 2624.74 samples/sec Loss 7.7976 LearningRate 0.0347 Epoch: 8 Global Step: 341060 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:03:37,688-Speed 2629.99 samples/sec Loss 7.9674 LearningRate 0.0347 Epoch: 8 Global Step: 341070 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:03:41,617-Speed 2607.37 samples/sec Loss 8.0206 LearningRate 0.0347 Epoch: 8 Global Step: 341080 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:03:45,525-Speed 2620.58 samples/sec Loss 7.8754 LearningRate 0.0347 Epoch: 8 Global Step: 341090 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:03:49,431-Speed 2622.36 samples/sec Loss 7.8484 LearningRate 0.0347 Epoch: 8 Global Step: 341100 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:03:53,343-Speed 2618.23 samples/sec Loss 8.0606 LearningRate 0.0347 Epoch: 8 Global Step: 341110 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:03:57,235-Speed 2631.74 samples/sec Loss 7.8745 LearningRate 0.0347 Epoch: 8 Global Step: 341120 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:04:01,133-Speed 2628.20 samples/sec Loss 7.9030 LearningRate 0.0347 Epoch: 8 Global Step: 341130 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:04:05,031-Speed 2627.07 samples/sec Loss 7.9575 LearningRate 0.0347 Epoch: 8 Global Step: 341140 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:04:08,927-Speed 2629.05 samples/sec Loss 7.8086 LearningRate 0.0347 Epoch: 8 Global Step: 341150 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:04:12,859-Speed 2604.97 samples/sec Loss 7.7418 LearningRate 0.0347 Epoch: 8 Global Step: 341160 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:04:16,753-Speed 2630.39 samples/sec Loss 7.7818 LearningRate 0.0347 Epoch: 8 Global Step: 341170 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:04:20,655-Speed 2625.18 samples/sec Loss 7.9450 LearningRate 0.0347 Epoch: 8 Global Step: 341180 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:04:24,553-Speed 2627.95 samples/sec Loss 7.9672 LearningRate 0.0347 Epoch: 8 Global Step: 341190 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:04:28,450-Speed 2628.73 samples/sec Loss 8.0333 LearningRate 0.0347 Epoch: 8 Global Step: 341200 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:04:32,331-Speed 2638.47 samples/sec Loss 7.9409 LearningRate 0.0347 Epoch: 8 Global Step: 341210 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:04:36,212-Speed 2638.95 samples/sec Loss 7.7935 LearningRate 0.0347 Epoch: 8 Global Step: 341220 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:04:40,115-Speed 2624.32 samples/sec Loss 7.8945 LearningRate 0.0347 Epoch: 8 Global Step: 341230 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:04:44,014-Speed 2627.38 samples/sec Loss 7.9664 LearningRate 0.0347 Epoch: 8 Global Step: 341240 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:04:47,923-Speed 2620.34 samples/sec Loss 8.0478 LearningRate 0.0347 Epoch: 8 Global Step: 341250 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:04:51,816-Speed 2631.27 samples/sec Loss 7.9736 LearningRate 0.0346 Epoch: 8 Global Step: 341260 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:04:55,714-Speed 2627.60 samples/sec Loss 7.8703 LearningRate 0.0346 Epoch: 8 Global Step: 341270 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:04:59,622-Speed 2621.24 samples/sec Loss 7.8077 LearningRate 0.0346 Epoch: 8 Global Step: 341280 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:05:03,530-Speed 2620.44 samples/sec Loss 7.9746 LearningRate 0.0346 Epoch: 8 Global Step: 341290 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:05:07,426-Speed 2629.19 samples/sec Loss 8.0740 LearningRate 0.0346 Epoch: 8 Global Step: 341300 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:05:11,321-Speed 2629.29 samples/sec Loss 7.8197 LearningRate 0.0346 Epoch: 8 Global Step: 341310 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:05:15,217-Speed 2629.43 samples/sec Loss 7.7600 LearningRate 0.0346 Epoch: 8 Global Step: 341320 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:05:19,121-Speed 2623.85 samples/sec Loss 7.9964 LearningRate 0.0346 Epoch: 8 Global Step: 341330 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:05:23,021-Speed 2626.44 samples/sec Loss 7.8693 LearningRate 0.0346 Epoch: 8 Global Step: 341340 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:05:26,926-Speed 2622.49 samples/sec Loss 7.9845 LearningRate 0.0346 Epoch: 8 Global Step: 341350 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:05:30,831-Speed 2622.64 samples/sec Loss 8.0050 LearningRate 0.0346 Epoch: 8 Global Step: 341360 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:05:34,740-Speed 2620.21 samples/sec Loss 7.7824 LearningRate 0.0346 Epoch: 8 Global Step: 341370 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:05:38,643-Speed 2624.13 samples/sec Loss 7.8336 LearningRate 0.0346 Epoch: 8 Global Step: 341380 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:05:42,542-Speed 2627.10 samples/sec Loss 7.9266 LearningRate 0.0346 Epoch: 8 Global Step: 341390 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:05:46,450-Speed 2620.78 samples/sec Loss 8.0142 LearningRate 0.0346 Epoch: 8 Global Step: 341400 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:05:50,362-Speed 2618.81 samples/sec Loss 7.9090 LearningRate 0.0346 Epoch: 8 Global Step: 341410 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:05:54,246-Speed 2636.82 samples/sec Loss 8.0771 LearningRate 0.0346 Epoch: 8 Global Step: 341420 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:05:58,097-Speed 2659.78 samples/sec Loss 8.6716 LearningRate 0.0346 Epoch: 8 Global Step: 341430 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:06:01,995-Speed 2627.80 samples/sec Loss 8.2575 LearningRate 0.0346 Epoch: 8 Global Step: 341440 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:06:05,903-Speed 2620.48 samples/sec Loss 7.9662 LearningRate 0.0346 Epoch: 8 Global Step: 341450 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:06:09,826-Speed 2611.28 samples/sec Loss 7.9188 LearningRate 0.0346 Epoch: 8 Global Step: 341460 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:06:13,739-Speed 2618.10 samples/sec Loss 7.8936 LearningRate 0.0346 Epoch: 8 Global Step: 341470 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:06:17,635-Speed 2628.33 samples/sec Loss 7.7918 LearningRate 0.0346 Epoch: 8 Global Step: 341480 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:06:21,548-Speed 2618.45 samples/sec Loss 7.9093 LearningRate 0.0346 Epoch: 8 Global Step: 341490 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:06:25,452-Speed 2623.55 samples/sec Loss 7.9551 LearningRate 0.0346 Epoch: 8 Global Step: 341500 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:06:29,355-Speed 2623.77 samples/sec Loss 7.9204 LearningRate 0.0346 Epoch: 8 Global Step: 341510 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:06:33,264-Speed 2620.84 samples/sec Loss 7.9169 LearningRate 0.0346 Epoch: 8 Global Step: 341520 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:06:37,526-Speed 2403.16 samples/sec Loss 7.8830 LearningRate 0.0346 Epoch: 8 Global Step: 341530 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:06:41,410-Speed 2636.64 samples/sec Loss 8.4310 LearningRate 0.0346 Epoch: 8 Global Step: 341540 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:06:45,310-Speed 2626.65 samples/sec Loss 8.0837 LearningRate 0.0346 Epoch: 8 Global Step: 341550 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:06:49,208-Speed 2627.22 samples/sec Loss 7.8581 LearningRate 0.0346 Epoch: 8 Global Step: 341560 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:06:53,114-Speed 2623.34 samples/sec Loss 7.9432 LearningRate 0.0346 Epoch: 8 Global Step: 341570 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:06:57,009-Speed 2629.04 samples/sec Loss 8.1530 LearningRate 0.0346 Epoch: 8 Global Step: 341580 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:07:00,905-Speed 2628.80 samples/sec Loss 8.0352 LearningRate 0.0346 Epoch: 8 Global Step: 341590 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:07:04,817-Speed 2618.19 samples/sec Loss 8.0449 LearningRate 0.0346 Epoch: 8 Global Step: 341600 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:07:08,708-Speed 2633.13 samples/sec Loss 7.9288 LearningRate 0.0346 Epoch: 8 Global Step: 341610 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:07:12,602-Speed 2629.89 samples/sec Loss 7.9582 LearningRate 0.0346 Epoch: 8 Global Step: 341620 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:07:16,496-Speed 2630.28 samples/sec Loss 8.0268 LearningRate 0.0346 Epoch: 8 Global Step: 341630 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:07:20,446-Speed 2593.20 samples/sec Loss 7.9262 LearningRate 0.0346 Epoch: 8 Global Step: 341640 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:07:24,351-Speed 2623.04 samples/sec Loss 7.8959 LearningRate 0.0346 Epoch: 8 Global Step: 341650 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:07:28,247-Speed 2629.70 samples/sec Loss 7.9458 LearningRate 0.0346 Epoch: 8 Global Step: 341660 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:07:32,157-Speed 2618.82 samples/sec Loss 8.0379 LearningRate 0.0346 Epoch: 8 Global Step: 341670 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:07:36,061-Speed 2623.58 samples/sec Loss 8.0735 LearningRate 0.0346 Epoch: 8 Global Step: 341680 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:07:39,968-Speed 2621.71 samples/sec Loss 7.8118 LearningRate 0.0346 Epoch: 8 Global Step: 341690 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:07:43,886-Speed 2614.92 samples/sec Loss 7.8323 LearningRate 0.0346 Epoch: 8 Global Step: 341700 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:07:47,802-Speed 2615.02 samples/sec Loss 8.0126 LearningRate 0.0346 Epoch: 8 Global Step: 341710 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:07:51,732-Speed 2606.47 samples/sec Loss 7.8688 LearningRate 0.0346 Epoch: 8 Global Step: 341720 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:07:55,639-Speed 2621.70 samples/sec Loss 7.9436 LearningRate 0.0346 Epoch: 8 Global Step: 341730 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:07:59,556-Speed 2615.17 samples/sec Loss 7.7712 LearningRate 0.0346 Epoch: 8 Global Step: 341740 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:08:03,478-Speed 2611.76 samples/sec Loss 7.8621 LearningRate 0.0346 Epoch: 8 Global Step: 341750 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:08:07,389-Speed 2618.82 samples/sec Loss 7.9714 LearningRate 0.0346 Epoch: 8 Global Step: 341760 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:08:11,292-Speed 2624.12 samples/sec Loss 7.8966 LearningRate 0.0346 Epoch: 8 Global Step: 341770 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:08:15,188-Speed 2628.48 samples/sec Loss 7.8837 LearningRate 0.0346 Epoch: 8 Global Step: 341780 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:08:19,085-Speed 2628.85 samples/sec Loss 7.9603 LearningRate 0.0346 Epoch: 8 Global Step: 341790 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:08:22,981-Speed 2628.72 samples/sec Loss 7.9803 LearningRate 0.0346 Epoch: 8 Global Step: 341800 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:08:26,876-Speed 2629.63 samples/sec Loss 7.9907 LearningRate 0.0346 Epoch: 8 Global Step: 341810 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:08:30,790-Speed 2616.88 samples/sec Loss 7.9503 LearningRate 0.0346 Epoch: 8 Global Step: 341820 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:08:34,686-Speed 2629.45 samples/sec Loss 7.8332 LearningRate 0.0346 Epoch: 8 Global Step: 341830 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:08:38,580-Speed 2629.65 samples/sec Loss 7.8954 LearningRate 0.0346 Epoch: 8 Global Step: 341840 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:08:42,477-Speed 2628.54 samples/sec Loss 8.0631 LearningRate 0.0346 Epoch: 8 Global Step: 341850 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:08:46,376-Speed 2627.12 samples/sec Loss 8.0510 LearningRate 0.0346 Epoch: 8 Global Step: 341860 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:08:50,273-Speed 2628.38 samples/sec Loss 8.0112 LearningRate 0.0346 Epoch: 8 Global Step: 341870 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:08:54,192-Speed 2613.36 samples/sec Loss 8.0189 LearningRate 0.0346 Epoch: 8 Global Step: 341880 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:08:58,093-Speed 2626.59 samples/sec Loss 7.9102 LearningRate 0.0346 Epoch: 8 Global Step: 341890 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:09:01,989-Speed 2628.97 samples/sec Loss 7.8732 LearningRate 0.0346 Epoch: 8 Global Step: 341900 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:09:05,889-Speed 2625.65 samples/sec Loss 7.8541 LearningRate 0.0346 Epoch: 8 Global Step: 341910 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:09:09,807-Speed 2614.40 samples/sec Loss 7.9649 LearningRate 0.0346 Epoch: 8 Global Step: 341920 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:09:13,706-Speed 2627.05 samples/sec Loss 7.8767 LearningRate 0.0346 Epoch: 8 Global Step: 341930 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:09:17,604-Speed 2628.06 samples/sec Loss 7.8136 LearningRate 0.0346 Epoch: 8 Global Step: 341940 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:09:21,502-Speed 2627.10 samples/sec Loss 7.9221 LearningRate 0.0346 Epoch: 8 Global Step: 341950 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:09:25,407-Speed 2623.11 samples/sec Loss 7.9133 LearningRate 0.0345 Epoch: 8 Global Step: 341960 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:09:29,304-Speed 2628.86 samples/sec Loss 7.9048 LearningRate 0.0345 Epoch: 8 Global Step: 341970 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:09:33,205-Speed 2625.15 samples/sec Loss 7.9027 LearningRate 0.0345 Epoch: 8 Global Step: 341980 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:09:37,105-Speed 2626.45 samples/sec Loss 8.0492 LearningRate 0.0345 Epoch: 8 Global Step: 341990 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:09:41,008-Speed 2624.37 samples/sec Loss 7.9389 LearningRate 0.0345 Epoch: 8 Global Step: 342000 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:09:44,918-Speed 2619.85 samples/sec Loss 7.7952 LearningRate 0.0345 Epoch: 8 Global Step: 342010 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:09:48,804-Speed 2636.13 samples/sec Loss 8.0121 LearningRate 0.0345 Epoch: 8 Global Step: 342020 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:09:52,680-Speed 2642.08 samples/sec Loss 8.2410 LearningRate 0.0345 Epoch: 8 Global Step: 342030 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:09:56,581-Speed 2625.95 samples/sec Loss 7.9169 LearningRate 0.0345 Epoch: 8 Global Step: 342040 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:10:00,504-Speed 2610.38 samples/sec Loss 7.8646 LearningRate 0.0345 Epoch: 8 Global Step: 342050 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:10:04,399-Speed 2630.00 samples/sec Loss 7.9424 LearningRate 0.0345 Epoch: 8 Global Step: 342060 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:10:08,295-Speed 2628.97 samples/sec Loss 7.8945 LearningRate 0.0345 Epoch: 8 Global Step: 342070 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:10:12,247-Speed 2591.75 samples/sec Loss 7.9067 LearningRate 0.0345 Epoch: 8 Global Step: 342080 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:10:16,152-Speed 2622.79 samples/sec Loss 7.8917 LearningRate 0.0345 Epoch: 8 Global Step: 342090 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:10:20,051-Speed 2627.28 samples/sec Loss 7.8844 LearningRate 0.0345 Epoch: 8 Global Step: 342100 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:10:23,949-Speed 2628.10 samples/sec Loss 7.9944 LearningRate 0.0345 Epoch: 8 Global Step: 342110 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:10:27,864-Speed 2616.41 samples/sec Loss 8.1113 LearningRate 0.0345 Epoch: 8 Global Step: 342120 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:10:31,751-Speed 2634.99 samples/sec Loss 7.9404 LearningRate 0.0345 Epoch: 8 Global Step: 342130 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:10:35,658-Speed 2621.38 samples/sec Loss 7.8962 LearningRate 0.0345 Epoch: 8 Global Step: 342140 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:10:39,562-Speed 2623.81 samples/sec Loss 7.8804 LearningRate 0.0345 Epoch: 8 Global Step: 342150 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:10:43,471-Speed 2620.36 samples/sec Loss 8.0955 LearningRate 0.0345 Epoch: 8 Global Step: 342160 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:10:47,369-Speed 2627.18 samples/sec Loss 7.9205 LearningRate 0.0345 Epoch: 8 Global Step: 342170 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:10:51,266-Speed 2628.96 samples/sec Loss 7.8481 LearningRate 0.0345 Epoch: 8 Global Step: 342180 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:10:55,184-Speed 2614.23 samples/sec Loss 7.8845 LearningRate 0.0345 Epoch: 8 Global Step: 342190 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:10:59,117-Speed 2604.26 samples/sec Loss 8.0072 LearningRate 0.0345 Epoch: 8 Global Step: 342200 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:11:03,017-Speed 2626.17 samples/sec Loss 7.8686 LearningRate 0.0345 Epoch: 8 Global Step: 342210 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:11:06,912-Speed 2629.81 samples/sec Loss 7.8992 LearningRate 0.0345 Epoch: 8 Global Step: 342220 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:11:10,810-Speed 2627.93 samples/sec Loss 8.0933 LearningRate 0.0345 Epoch: 8 Global Step: 342230 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:11:14,712-Speed 2624.47 samples/sec Loss 7.9106 LearningRate 0.0345 Epoch: 8 Global Step: 342240 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:11:18,626-Speed 2617.84 samples/sec Loss 7.8118 LearningRate 0.0345 Epoch: 8 Global Step: 342250 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:11:22,530-Speed 2623.59 samples/sec Loss 7.9546 LearningRate 0.0345 Epoch: 8 Global Step: 342260 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:11:26,426-Speed 2629.15 samples/sec Loss 7.9431 LearningRate 0.0345 Epoch: 8 Global Step: 342270 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:11:30,336-Speed 2618.90 samples/sec Loss 7.8000 LearningRate 0.0345 Epoch: 8 Global Step: 342280 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:11:34,237-Speed 2626.21 samples/sec Loss 7.9052 LearningRate 0.0345 Epoch: 8 Global Step: 342290 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:11:38,135-Speed 2627.18 samples/sec Loss 7.8945 LearningRate 0.0345 Epoch: 8 Global Step: 342300 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:11:42,109-Speed 2577.75 samples/sec Loss 7.9867 LearningRate 0.0345 Epoch: 8 Global Step: 342310 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:11:46,007-Speed 2627.18 samples/sec Loss 7.9127 LearningRate 0.0345 Epoch: 8 Global Step: 342320 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:11:49,922-Speed 2616.81 samples/sec Loss 7.7460 LearningRate 0.0345 Epoch: 8 Global Step: 342330 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:11:53,819-Speed 2628.52 samples/sec Loss 7.9858 LearningRate 0.0345 Epoch: 8 Global Step: 342340 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:11:57,692-Speed 2644.40 samples/sec Loss 7.8953 LearningRate 0.0345 Epoch: 8 Global Step: 342350 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:12:01,543-Speed 2659.67 samples/sec Loss 8.7897 LearningRate 0.0345 Epoch: 8 Global Step: 342360 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:12:05,456-Speed 2617.96 samples/sec Loss 7.8878 LearningRate 0.0345 Epoch: 8 Global Step: 342370 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:12:09,350-Speed 2629.88 samples/sec Loss 7.8771 LearningRate 0.0345 Epoch: 8 Global Step: 342380 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:12:13,261-Speed 2619.42 samples/sec Loss 8.0207 LearningRate 0.0345 Epoch: 8 Global Step: 342390 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:12:17,182-Speed 2612.08 samples/sec Loss 7.9690 LearningRate 0.0345 Epoch: 8 Global Step: 342400 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:12:21,279-Speed 2499.68 samples/sec Loss 7.8673 LearningRate 0.0345 Epoch: 8 Global Step: 342410 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:12:25,377-Speed 2499.78 samples/sec Loss 7.8216 LearningRate 0.0345 Epoch: 8 Global Step: 342420 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:12:29,315-Speed 2601.33 samples/sec Loss 7.9205 LearningRate 0.0345 Epoch: 8 Global Step: 342430 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:12:33,215-Speed 2626.10 samples/sec Loss 7.7641 LearningRate 0.0345 Epoch: 8 Global Step: 342440 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:12:37,109-Speed 2629.71 samples/sec Loss 7.9393 LearningRate 0.0345 Epoch: 8 Global Step: 342450 Fp16 Grad Scale: 4096 Required: 55 hours
Training: 2022-04-14 10:12:41,006-Speed 2628.53 samples/sec Loss 8.0094 LearningRate 0.0345 Epoch: 8 Global Step: 342460 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:12:44,908-Speed 2625.18 samples/sec Loss 8.0010 LearningRate 0.0345 Epoch: 8 Global Step: 342470 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:12:48,807-Speed 2626.97 samples/sec Loss 7.8715 LearningRate 0.0345 Epoch: 8 Global Step: 342480 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:12:52,707-Speed 2625.80 samples/sec Loss 7.9087 LearningRate 0.0345 Epoch: 8 Global Step: 342490 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:12:56,619-Speed 2618.62 samples/sec Loss 7.8827 LearningRate 0.0345 Epoch: 8 Global Step: 342500 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:13:00,524-Speed 2623.26 samples/sec Loss 7.8095 LearningRate 0.0345 Epoch: 8 Global Step: 342510 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:13:04,425-Speed 2625.19 samples/sec Loss 7.9034 LearningRate 0.0345 Epoch: 8 Global Step: 342520 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:13:08,333-Speed 2620.67 samples/sec Loss 7.9145 LearningRate 0.0345 Epoch: 8 Global Step: 342530 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:13:12,232-Speed 2627.25 samples/sec Loss 8.0495 LearningRate 0.0345 Epoch: 8 Global Step: 342540 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:13:16,127-Speed 2629.34 samples/sec Loss 7.8400 LearningRate 0.0345 Epoch: 8 Global Step: 342550 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:13:20,030-Speed 2624.83 samples/sec Loss 7.8567 LearningRate 0.0345 Epoch: 8 Global Step: 342560 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:13:24,030-Speed 2560.44 samples/sec Loss 7.9241 LearningRate 0.0345 Epoch: 8 Global Step: 342570 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:13:27,944-Speed 2616.79 samples/sec Loss 7.9777 LearningRate 0.0345 Epoch: 8 Global Step: 342580 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:13:31,842-Speed 2627.56 samples/sec Loss 7.8761 LearningRate 0.0345 Epoch: 8 Global Step: 342590 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:13:35,744-Speed 2624.83 samples/sec Loss 7.9197 LearningRate 0.0345 Epoch: 8 Global Step: 342600 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:13:39,644-Speed 2626.12 samples/sec Loss 7.8438 LearningRate 0.0345 Epoch: 8 Global Step: 342610 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:13:43,540-Speed 2629.41 samples/sec Loss 7.9158 LearningRate 0.0345 Epoch: 8 Global Step: 342620 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:13:47,438-Speed 2627.16 samples/sec Loss 7.9066 LearningRate 0.0345 Epoch: 8 Global Step: 342630 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:13:51,351-Speed 2618.10 samples/sec Loss 7.9297 LearningRate 0.0345 Epoch: 8 Global Step: 342640 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:13:55,248-Speed 2628.41 samples/sec Loss 7.8621 LearningRate 0.0345 Epoch: 8 Global Step: 342650 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:13:59,152-Speed 2623.67 samples/sec Loss 7.9786 LearningRate 0.0345 Epoch: 8 Global Step: 342660 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:14:03,086-Speed 2603.23 samples/sec Loss 7.8980 LearningRate 0.0344 Epoch: 8 Global Step: 342670 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:14:06,991-Speed 2623.23 samples/sec Loss 7.9100 LearningRate 0.0344 Epoch: 8 Global Step: 342680 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:14:10,891-Speed 2626.15 samples/sec Loss 7.8471 LearningRate 0.0344 Epoch: 8 Global Step: 342690 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:14:14,794-Speed 2624.30 samples/sec Loss 7.9825 LearningRate 0.0344 Epoch: 8 Global Step: 342700 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:14:18,698-Speed 2624.10 samples/sec Loss 7.9368 LearningRate 0.0344 Epoch: 8 Global Step: 342710 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:14:22,602-Speed 2623.29 samples/sec Loss 7.8433 LearningRate 0.0344 Epoch: 8 Global Step: 342720 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:14:26,508-Speed 2623.01 samples/sec Loss 7.9471 LearningRate 0.0344 Epoch: 8 Global Step: 342730 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:14:30,417-Speed 2619.83 samples/sec Loss 7.9235 LearningRate 0.0344 Epoch: 8 Global Step: 342740 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:14:34,325-Speed 2621.45 samples/sec Loss 7.8550 LearningRate 0.0344 Epoch: 8 Global Step: 342750 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:14:38,196-Speed 2646.03 samples/sec Loss 8.2141 LearningRate 0.0344 Epoch: 8 Global Step: 342760 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:14:42,108-Speed 2617.50 samples/sec Loss 7.9374 LearningRate 0.0344 Epoch: 8 Global Step: 342770 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:14:46,017-Speed 2620.22 samples/sec Loss 7.8251 LearningRate 0.0344 Epoch: 8 Global Step: 342780 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:14:49,928-Speed 2619.40 samples/sec Loss 7.8581 LearningRate 0.0344 Epoch: 8 Global Step: 342790 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:14:53,828-Speed 2626.42 samples/sec Loss 8.0606 LearningRate 0.0344 Epoch: 8 Global Step: 342800 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:14:57,728-Speed 2625.88 samples/sec Loss 7.9558 LearningRate 0.0344 Epoch: 8 Global Step: 342810 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:15:01,627-Speed 2627.71 samples/sec Loss 7.8890 LearningRate 0.0344 Epoch: 8 Global Step: 342820 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:15:05,529-Speed 2624.34 samples/sec Loss 7.8654 LearningRate 0.0344 Epoch: 8 Global Step: 342830 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:15:09,430-Speed 2625.66 samples/sec Loss 7.9421 LearningRate 0.0344 Epoch: 8 Global Step: 342840 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:15:13,330-Speed 2625.88 samples/sec Loss 7.9264 LearningRate 0.0344 Epoch: 8 Global Step: 342850 Fp16 Grad Scale: 8192 Required: 55 hours
Training: 2022-04-14 10:15:17,231-Speed 2626.02 samples/sec Loss 7.8864 LearningRate 0.0344 Epoch: 8 Global Step: 342860 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:15:21,182-Speed 2592.78 samples/sec Loss 7.8516 LearningRate 0.0344 Epoch: 8 Global Step: 342870 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:15:25,093-Speed 2618.51 samples/sec Loss 7.8951 LearningRate 0.0344 Epoch: 8 Global Step: 342880 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:15:29,057-Speed 2584.39 samples/sec Loss 8.0302 LearningRate 0.0344 Epoch: 8 Global Step: 342890 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:15:32,999-Speed 2598.59 samples/sec Loss 7.9692 LearningRate 0.0344 Epoch: 8 Global Step: 342900 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:15:36,903-Speed 2623.26 samples/sec Loss 7.9599 LearningRate 0.0344 Epoch: 8 Global Step: 342910 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:15:40,823-Speed 2613.01 samples/sec Loss 7.9227 LearningRate 0.0344 Epoch: 8 Global Step: 342920 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:15:44,736-Speed 2617.19 samples/sec Loss 8.5335 LearningRate 0.0344 Epoch: 8 Global Step: 342930 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:15:48,629-Speed 2631.24 samples/sec Loss 8.3345 LearningRate 0.0344 Epoch: 8 Global Step: 342940 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:15:52,531-Speed 2625.05 samples/sec Loss 8.0201 LearningRate 0.0344 Epoch: 8 Global Step: 342950 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:15:56,430-Speed 2627.32 samples/sec Loss 7.9588 LearningRate 0.0344 Epoch: 8 Global Step: 342960 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:16:00,332-Speed 2624.64 samples/sec Loss 7.9812 LearningRate 0.0344 Epoch: 8 Global Step: 342970 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:16:04,232-Speed 2626.14 samples/sec Loss 7.9498 LearningRate 0.0344 Epoch: 8 Global Step: 342980 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:16:08,144-Speed 2618.09 samples/sec Loss 7.9961 LearningRate 0.0344 Epoch: 8 Global Step: 342990 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:16:12,040-Speed 2628.67 samples/sec Loss 7.9199 LearningRate 0.0344 Epoch: 8 Global Step: 343000 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:16:15,944-Speed 2623.90 samples/sec Loss 7.9879 LearningRate 0.0344 Epoch: 8 Global Step: 343010 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:16:19,844-Speed 2626.15 samples/sec Loss 7.9076 LearningRate 0.0344 Epoch: 8 Global Step: 343020 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:16:23,746-Speed 2625.42 samples/sec Loss 7.8109 LearningRate 0.0344 Epoch: 8 Global Step: 343030 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:16:27,651-Speed 2622.30 samples/sec Loss 7.8617 LearningRate 0.0344 Epoch: 8 Global Step: 343040 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:16:31,554-Speed 2623.96 samples/sec Loss 8.0200 LearningRate 0.0344 Epoch: 8 Global Step: 343050 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:16:35,462-Speed 2621.21 samples/sec Loss 7.9847 LearningRate 0.0344 Epoch: 8 Global Step: 343060 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:16:39,366-Speed 2623.93 samples/sec Loss 7.9407 LearningRate 0.0344 Epoch: 8 Global Step: 343070 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:16:43,270-Speed 2623.60 samples/sec Loss 7.9319 LearningRate 0.0344 Epoch: 8 Global Step: 343080 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:16:47,174-Speed 2623.81 samples/sec Loss 7.8767 LearningRate 0.0344 Epoch: 8 Global Step: 343090 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:16:51,079-Speed 2622.68 samples/sec Loss 8.0037 LearningRate 0.0344 Epoch: 8 Global Step: 343100 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:16:54,978-Speed 2627.08 samples/sec Loss 8.0162 LearningRate 0.0344 Epoch: 8 Global Step: 343110 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:16:58,968-Speed 2566.43 samples/sec Loss 7.9013 LearningRate 0.0344 Epoch: 8 Global Step: 343120 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:17:02,860-Speed 2632.08 samples/sec Loss 7.9745 LearningRate 0.0344 Epoch: 8 Global Step: 343130 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:17:06,762-Speed 2625.23 samples/sec Loss 7.9045 LearningRate 0.0344 Epoch: 8 Global Step: 343140 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:17:10,871-Speed 2492.80 samples/sec Loss 8.4964 LearningRate 0.0344 Epoch: 8 Global Step: 343150 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:17:14,774-Speed 2624.41 samples/sec Loss 8.0938 LearningRate 0.0344 Epoch: 8 Global Step: 343160 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:17:18,678-Speed 2623.40 samples/sec Loss 8.1156 LearningRate 0.0344 Epoch: 8 Global Step: 343170 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:17:22,577-Speed 2626.55 samples/sec Loss 7.9588 LearningRate 0.0344 Epoch: 8 Global Step: 343180 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:17:26,484-Speed 2621.82 samples/sec Loss 7.8948 LearningRate 0.0344 Epoch: 8 Global Step: 343190 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:17:30,384-Speed 2626.35 samples/sec Loss 7.9987 LearningRate 0.0344 Epoch: 8 Global Step: 343200 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:17:34,281-Speed 2628.15 samples/sec Loss 7.9803 LearningRate 0.0344 Epoch: 8 Global Step: 343210 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:17:38,181-Speed 2625.72 samples/sec Loss 7.8838 LearningRate 0.0344 Epoch: 8 Global Step: 343220 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:17:42,094-Speed 2618.00 samples/sec Loss 7.8814 LearningRate 0.0344 Epoch: 8 Global Step: 343230 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:17:45,988-Speed 2630.37 samples/sec Loss 7.8695 LearningRate 0.0344 Epoch: 8 Global Step: 343240 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:17:49,904-Speed 2615.43 samples/sec Loss 7.9163 LearningRate 0.0344 Epoch: 8 Global Step: 343250 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:17:53,846-Speed 2598.10 samples/sec Loss 7.8913 LearningRate 0.0344 Epoch: 8 Global Step: 343260 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:17:57,900-Speed 2526.87 samples/sec Loss 7.9469 LearningRate 0.0344 Epoch: 8 Global Step: 343270 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:18:01,799-Speed 2626.54 samples/sec Loss 7.9997 LearningRate 0.0344 Epoch: 8 Global Step: 343280 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:18:05,724-Speed 2609.20 samples/sec Loss 7.9336 LearningRate 0.0344 Epoch: 8 Global Step: 343290 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:18:09,625-Speed 2625.93 samples/sec Loss 7.8669 LearningRate 0.0344 Epoch: 8 Global Step: 343300 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:18:13,526-Speed 2625.78 samples/sec Loss 7.8972 LearningRate 0.0344 Epoch: 8 Global Step: 343310 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:18:17,427-Speed 2625.65 samples/sec Loss 7.9307 LearningRate 0.0344 Epoch: 8 Global Step: 343320 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:18:21,343-Speed 2615.41 samples/sec Loss 7.9119 LearningRate 0.0344 Epoch: 8 Global Step: 343330 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:18:25,764-Speed 2317.06 samples/sec Loss 7.8637 LearningRate 0.0344 Epoch: 8 Global Step: 343340 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:18:29,659-Speed 2629.54 samples/sec Loss 7.8165 LearningRate 0.0344 Epoch: 8 Global Step: 343350 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:18:33,580-Speed 2612.58 samples/sec Loss 7.9034 LearningRate 0.0344 Epoch: 8 Global Step: 343360 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:18:37,480-Speed 2625.81 samples/sec Loss 7.9102 LearningRate 0.0344 Epoch: 8 Global Step: 343370 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:18:41,382-Speed 2625.10 samples/sec Loss 8.0062 LearningRate 0.0343 Epoch: 8 Global Step: 343380 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:18:45,283-Speed 2625.69 samples/sec Loss 7.9011 LearningRate 0.0343 Epoch: 8 Global Step: 343390 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:18:49,185-Speed 2625.33 samples/sec Loss 7.8710 LearningRate 0.0343 Epoch: 8 Global Step: 343400 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:18:53,093-Speed 2620.52 samples/sec Loss 7.9260 LearningRate 0.0343 Epoch: 8 Global Step: 343410 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:18:56,990-Speed 2628.50 samples/sec Loss 7.7146 LearningRate 0.0343 Epoch: 8 Global Step: 343420 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:19:00,892-Speed 2625.42 samples/sec Loss 8.0502 LearningRate 0.0343 Epoch: 8 Global Step: 343430 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:19:04,793-Speed 2625.53 samples/sec Loss 7.9394 LearningRate 0.0343 Epoch: 8 Global Step: 343440 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:19:08,698-Speed 2622.74 samples/sec Loss 7.8623 LearningRate 0.0343 Epoch: 8 Global Step: 343450 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:19:12,615-Speed 2615.17 samples/sec Loss 7.9480 LearningRate 0.0343 Epoch: 8 Global Step: 343460 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:19:16,534-Speed 2613.25 samples/sec Loss 7.7824 LearningRate 0.0343 Epoch: 8 Global Step: 343470 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:19:20,443-Speed 2620.79 samples/sec Loss 7.9526 LearningRate 0.0343 Epoch: 8 Global Step: 343480 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:19:24,373-Speed 2606.07 samples/sec Loss 8.0242 LearningRate 0.0343 Epoch: 8 Global Step: 343490 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:19:28,276-Speed 2624.44 samples/sec Loss 7.9020 LearningRate 0.0343 Epoch: 8 Global Step: 343500 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:19:32,187-Speed 2618.78 samples/sec Loss 7.9917 LearningRate 0.0343 Epoch: 8 Global Step: 343510 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:19:36,123-Speed 2602.22 samples/sec Loss 7.9417 LearningRate 0.0343 Epoch: 8 Global Step: 343520 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:19:40,025-Speed 2624.87 samples/sec Loss 7.8452 LearningRate 0.0343 Epoch: 8 Global Step: 343530 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:19:43,926-Speed 2625.98 samples/sec Loss 7.7171 LearningRate 0.0343 Epoch: 8 Global Step: 343540 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:19:47,808-Speed 2638.12 samples/sec Loss 7.9090 LearningRate 0.0343 Epoch: 8 Global Step: 343550 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:19:51,720-Speed 2618.69 samples/sec Loss 7.8575 LearningRate 0.0343 Epoch: 8 Global Step: 343560 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:19:55,612-Speed 2631.44 samples/sec Loss 7.8959 LearningRate 0.0343 Epoch: 8 Global Step: 343570 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:19:59,513-Speed 2625.89 samples/sec Loss 7.9508 LearningRate 0.0343 Epoch: 8 Global Step: 343580 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:20:03,417-Speed 2624.13 samples/sec Loss 7.9215 LearningRate 0.0343 Epoch: 8 Global Step: 343590 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:20:07,318-Speed 2624.80 samples/sec Loss 8.0183 LearningRate 0.0343 Epoch: 8 Global Step: 343600 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:20:11,223-Speed 2623.63 samples/sec Loss 7.8771 LearningRate 0.0343 Epoch: 8 Global Step: 343610 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:20:15,124-Speed 2625.83 samples/sec Loss 7.8974 LearningRate 0.0343 Epoch: 8 Global Step: 343620 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:20:19,055-Speed 2605.39 samples/sec Loss 7.9814 LearningRate 0.0343 Epoch: 8 Global Step: 343630 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:20:22,955-Speed 2626.03 samples/sec Loss 7.8426 LearningRate 0.0343 Epoch: 8 Global Step: 343640 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:20:26,815-Speed 2653.92 samples/sec Loss 8.3507 LearningRate 0.0343 Epoch: 8 Global Step: 343650 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:20:30,714-Speed 2626.69 samples/sec Loss 8.2890 LearningRate 0.0343 Epoch: 8 Global Step: 343660 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:20:34,619-Speed 2623.79 samples/sec Loss 7.9839 LearningRate 0.0343 Epoch: 8 Global Step: 343670 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:20:38,524-Speed 2622.63 samples/sec Loss 7.9309 LearningRate 0.0343 Epoch: 8 Global Step: 343680 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:20:42,433-Speed 2620.38 samples/sec Loss 7.9292 LearningRate 0.0343 Epoch: 8 Global Step: 343690 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:20:46,340-Speed 2621.26 samples/sec Loss 7.9746 LearningRate 0.0343 Epoch: 8 Global Step: 343700 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:20:50,244-Speed 2624.00 samples/sec Loss 8.0519 LearningRate 0.0343 Epoch: 8 Global Step: 343710 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:20:54,184-Speed 2599.93 samples/sec Loss 7.8018 LearningRate 0.0343 Epoch: 8 Global Step: 343720 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:20:58,084-Speed 2626.65 samples/sec Loss 7.8848 LearningRate 0.0343 Epoch: 8 Global Step: 343730 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:21:01,985-Speed 2625.57 samples/sec Loss 7.7898 LearningRate 0.0343 Epoch: 8 Global Step: 343740 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:21:05,897-Speed 2618.33 samples/sec Loss 8.1016 LearningRate 0.0343 Epoch: 8 Global Step: 343750 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:21:09,807-Speed 2619.48 samples/sec Loss 7.9748 LearningRate 0.0343 Epoch: 8 Global Step: 343760 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:21:13,708-Speed 2625.98 samples/sec Loss 8.0036 LearningRate 0.0343 Epoch: 8 Global Step: 343770 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:21:17,612-Speed 2623.49 samples/sec Loss 7.9259 LearningRate 0.0343 Epoch: 8 Global Step: 343780 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:21:21,611-Speed 2560.65 samples/sec Loss 7.9538 LearningRate 0.0343 Epoch: 8 Global Step: 343790 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:21:25,706-Speed 2501.86 samples/sec Loss 7.8550 LearningRate 0.0343 Epoch: 8 Global Step: 343800 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:21:29,806-Speed 2497.89 samples/sec Loss 7.8359 LearningRate 0.0343 Epoch: 8 Global Step: 343810 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:21:33,744-Speed 2601.09 samples/sec Loss 8.7317 LearningRate 0.0343 Epoch: 8 Global Step: 343820 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:21:37,655-Speed 2618.69 samples/sec Loss 8.4939 LearningRate 0.0343 Epoch: 8 Global Step: 343830 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:21:41,562-Speed 2622.11 samples/sec Loss 8.0353 LearningRate 0.0343 Epoch: 8 Global Step: 343840 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:21:45,466-Speed 2623.23 samples/sec Loss 7.9259 LearningRate 0.0343 Epoch: 8 Global Step: 343850 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:21:49,400-Speed 2603.91 samples/sec Loss 7.9249 LearningRate 0.0343 Epoch: 8 Global Step: 343860 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:21:53,356-Speed 2589.36 samples/sec Loss 7.9028 LearningRate 0.0343 Epoch: 8 Global Step: 343870 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:21:57,254-Speed 2627.34 samples/sec Loss 7.9552 LearningRate 0.0343 Epoch: 8 Global Step: 343880 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:22:01,177-Speed 2610.83 samples/sec Loss 7.9651 LearningRate 0.0343 Epoch: 8 Global Step: 343890 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:22:05,076-Speed 2627.52 samples/sec Loss 8.0174 LearningRate 0.0343 Epoch: 8 Global Step: 343900 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:22:08,976-Speed 2626.22 samples/sec Loss 7.9624 LearningRate 0.0343 Epoch: 8 Global Step: 343910 Fp16 Grad Scale: 16384 Required: 55 hours
Training: 2022-04-14 10:22:12,878-Speed 2624.92 samples/sec Loss 7.8906 LearningRate 0.0343 Epoch: 8 Global Step: 343920 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:22:16,781-Speed 2624.15 samples/sec Loss 7.8734 LearningRate 0.0343 Epoch: 8 Global Step: 343930 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:22:20,696-Speed 2616.24 samples/sec Loss 7.9260 LearningRate 0.0343 Epoch: 8 Global Step: 343940 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:22:24,600-Speed 2623.92 samples/sec Loss 7.9233 LearningRate 0.0343 Epoch: 8 Global Step: 343950 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:22:28,500-Speed 2625.88 samples/sec Loss 7.9950 LearningRate 0.0343 Epoch: 8 Global Step: 343960 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:22:32,409-Speed 2620.75 samples/sec Loss 7.9261 LearningRate 0.0343 Epoch: 8 Global Step: 343970 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:22:36,312-Speed 2624.46 samples/sec Loss 7.9126 LearningRate 0.0343 Epoch: 8 Global Step: 343980 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:22:40,210-Speed 2627.62 samples/sec Loss 7.9646 LearningRate 0.0343 Epoch: 8 Global Step: 343990 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:22:44,109-Speed 2626.97 samples/sec Loss 7.7875 LearningRate 0.0343 Epoch: 8 Global Step: 344000 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:22:48,012-Speed 2624.36 samples/sec Loss 7.9603 LearningRate 0.0343 Epoch: 8 Global Step: 344010 Fp16 Grad Scale: 32768 Required: 55 hours
Training: 2022-04-14 10:22:51,909-Speed 2628.14 samples/sec Loss 7.7901 LearningRate 0.0343 Epoch: 8 Global Step: 344020 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:22:55,810-Speed 2625.64 samples/sec Loss 7.8518 LearningRate 0.0343 Epoch: 8 Global Step: 344030 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:22:59,710-Speed 2626.77 samples/sec Loss 7.8894 LearningRate 0.0343 Epoch: 8 Global Step: 344040 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:23:03,610-Speed 2626.13 samples/sec Loss 7.8350 LearningRate 0.0343 Epoch: 8 Global Step: 344050 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:23:07,511-Speed 2625.68 samples/sec Loss 7.9356 LearningRate 0.0343 Epoch: 8 Global Step: 344060 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:23:11,487-Speed 2633.62 samples/sec Loss 7.9871 LearningRate 0.0343 Epoch: 8 Global Step: 344070 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:23:17,636-Speed 2635.25 samples/sec Loss 7.9165 LearningRate 0.0343 Epoch: 8 Global Step: 344080 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:23:21,569-Speed 2634.75 samples/sec Loss 7.8219 LearningRate 0.0342 Epoch: 8 Global Step: 344090 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:23:25,844-Speed 2633.53 samples/sec Loss 7.8738 LearningRate 0.0342 Epoch: 8 Global Step: 344100 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:23:29,742-Speed 2627.56 samples/sec Loss 7.8206 LearningRate 0.0342 Epoch: 8 Global Step: 344110 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:23:34,348-Speed 2558.60 samples/sec Loss 7.9399 LearningRate 0.0342 Epoch: 8 Global Step: 344120 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:23:38,247-Speed 2627.00 samples/sec Loss 8.0391 LearningRate 0.0342 Epoch: 8 Global Step: 344130 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:23:42,150-Speed 2623.89 samples/sec Loss 8.0710 LearningRate 0.0342 Epoch: 8 Global Step: 344140 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:23:46,166-Speed 2629.86 samples/sec Loss 7.9274 LearningRate 0.0342 Epoch: 8 Global Step: 344150 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:23:50,072-Speed 2622.52 samples/sec Loss 8.0546 LearningRate 0.0342 Epoch: 8 Global Step: 344160 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:23:53,975-Speed 2624.16 samples/sec Loss 7.9103 LearningRate 0.0342 Epoch: 8 Global Step: 344170 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:23:57,875-Speed 2630.69 samples/sec Loss 7.8566 LearningRate 0.0342 Epoch: 8 Global Step: 344180 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:24:01,779-Speed 2623.86 samples/sec Loss 7.7637 LearningRate 0.0342 Epoch: 8 Global Step: 344190 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:24:05,675-Speed 2628.97 samples/sec Loss 7.8268 LearningRate 0.0342 Epoch: 8 Global Step: 344200 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:24:09,573-Speed 2627.69 samples/sec Loss 7.8255 LearningRate 0.0342 Epoch: 8 Global Step: 344210 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:24:13,468-Speed 2629.21 samples/sec Loss 7.8713 LearningRate 0.0342 Epoch: 8 Global Step: 344220 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:24:17,368-Speed 2626.40 samples/sec Loss 7.9797 LearningRate 0.0342 Epoch: 8 Global Step: 344230 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:24:21,282-Speed 2617.44 samples/sec Loss 7.9842 LearningRate 0.0342 Epoch: 8 Global Step: 344240 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:24:25,187-Speed 2622.88 samples/sec Loss 7.8225 LearningRate 0.0342 Epoch: 8 Global Step: 344250 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:24:29,092-Speed 2622.90 samples/sec Loss 7.8924 LearningRate 0.0342 Epoch: 8 Global Step: 344260 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:24:32,996-Speed 2624.20 samples/sec Loss 7.8847 LearningRate 0.0342 Epoch: 8 Global Step: 344270 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:24:36,896-Speed 2626.32 samples/sec Loss 7.8366 LearningRate 0.0342 Epoch: 8 Global Step: 344280 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:24:40,875-Speed 2574.00 samples/sec Loss 7.9456 LearningRate 0.0342 Epoch: 8 Global Step: 344290 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:24:44,778-Speed 2624.26 samples/sec Loss 7.8950 LearningRate 0.0342 Epoch: 8 Global Step: 344300 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:24:48,692-Speed 2616.88 samples/sec Loss 7.8650 LearningRate 0.0342 Epoch: 8 Global Step: 344310 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:24:52,621-Speed 2606.71 samples/sec Loss 7.8948 LearningRate 0.0342 Epoch: 8 Global Step: 344320 Fp16 Grad Scale: 262144 Required: 55 hours
Training: 2022-04-14 10:24:56,571-Speed 2593.39 samples/sec Loss 7.8113 LearningRate 0.0342 Epoch: 8 Global Step: 344330 Fp16 Grad Scale: 262144 Required: 55 hours
Training: 2022-04-14 10:25:00,484-Speed 2617.59 samples/sec Loss 8.0525 LearningRate 0.0342 Epoch: 8 Global Step: 344340 Fp16 Grad Scale: 262144 Required: 55 hours
Training: 2022-04-14 10:25:04,372-Speed 2634.63 samples/sec Loss 7.9327 LearningRate 0.0342 Epoch: 8 Global Step: 344350 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:25:08,278-Speed 2622.42 samples/sec Loss 7.8092 LearningRate 0.0342 Epoch: 8 Global Step: 344360 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:25:12,184-Speed 2621.92 samples/sec Loss 7.8199 LearningRate 0.0342 Epoch: 8 Global Step: 344370 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:25:16,083-Speed 2627.13 samples/sec Loss 7.8280 LearningRate 0.0342 Epoch: 8 Global Step: 344380 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:25:19,989-Speed 2622.58 samples/sec Loss 7.8747 LearningRate 0.0342 Epoch: 8 Global Step: 344390 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:25:23,901-Speed 2618.29 samples/sec Loss 7.8363 LearningRate 0.0342 Epoch: 8 Global Step: 344400 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:25:27,800-Speed 2627.06 samples/sec Loss 7.8189 LearningRate 0.0342 Epoch: 8 Global Step: 344410 Fp16 Grad Scale: 131072 Required: 55 hours
Training: 2022-04-14 10:25:31,683-Speed 2638.04 samples/sec Loss 7.6890 LearningRate 0.0342 Epoch: 8 Global Step: 344420 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:25:35,584-Speed 2626.33 samples/sec Loss 7.9643 LearningRate 0.0342 Epoch: 8 Global Step: 344430 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:25:39,489-Speed 2623.06 samples/sec Loss 7.8943 LearningRate 0.0342 Epoch: 8 Global Step: 344440 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:25:43,390-Speed 2624.84 samples/sec Loss 7.8186 LearningRate 0.0342 Epoch: 8 Global Step: 344450 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:25:47,288-Speed 2627.91 samples/sec Loss 7.7636 LearningRate 0.0342 Epoch: 8 Global Step: 344460 Fp16 Grad Scale: 65536 Required: 55 hours
Training: 2022-04-14 10:25:51,186-Speed 2627.38 samples/sec Loss 7.8479 LearningRate 0.0342 Epoch: 8 Global Step: 344470 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:25:55,089-Speed 2624.59 samples/sec Loss 7.9047 LearningRate 0.0342 Epoch: 8 Global Step: 344480 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:25:58,989-Speed 2627.41 samples/sec Loss 7.9527 LearningRate 0.0342 Epoch: 8 Global Step: 344490 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:26:02,891-Speed 2624.72 samples/sec Loss 7.8730 LearningRate 0.0342 Epoch: 8 Global Step: 344500 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:26:06,791-Speed 2626.33 samples/sec Loss 8.0033 LearningRate 0.0342 Epoch: 8 Global Step: 344510 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:26:10,695-Speed 2623.60 samples/sec Loss 8.0000 LearningRate 0.0342 Epoch: 8 Global Step: 344520 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:26:14,596-Speed 2625.25 samples/sec Loss 7.8658 LearningRate 0.0342 Epoch: 8 Global Step: 344530 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:26:18,495-Speed 2627.30 samples/sec Loss 7.8347 LearningRate 0.0342 Epoch: 8 Global Step: 344540 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:26:22,394-Speed 2626.85 samples/sec Loss 7.9306 LearningRate 0.0342 Epoch: 8 Global Step: 344550 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:26:26,274-Speed 2640.71 samples/sec Loss 7.9302 LearningRate 0.0342 Epoch: 8 Global Step: 344560 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:26:30,182-Speed 2620.82 samples/sec Loss 7.8399 LearningRate 0.0342 Epoch: 8 Global Step: 344570 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:26:34,096-Speed 2616.52 samples/sec Loss 8.0043 LearningRate 0.0342 Epoch: 8 Global Step: 344580 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:26:38,015-Speed 2613.55 samples/sec Loss 8.0199 LearningRate 0.0342 Epoch: 8 Global Step: 344590 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:26:41,913-Speed 2628.28 samples/sec Loss 7.8215 LearningRate 0.0342 Epoch: 8 Global Step: 344600 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:26:45,814-Speed 2625.84 samples/sec Loss 7.9068 LearningRate 0.0342 Epoch: 8 Global Step: 344610 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:26:49,707-Speed 2630.20 samples/sec Loss 7.9176 LearningRate 0.0342 Epoch: 8 Global Step: 344620 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:26:53,610-Speed 2624.25 samples/sec Loss 7.7155 LearningRate 0.0342 Epoch: 8 Global Step: 344630 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:26:57,506-Speed 2629.47 samples/sec Loss 7.8267 LearningRate 0.0342 Epoch: 8 Global Step: 344640 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:27:01,413-Speed 2622.00 samples/sec Loss 7.7969 LearningRate 0.0342 Epoch: 8 Global Step: 344650 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:27:05,309-Speed 2628.85 samples/sec Loss 7.8175 LearningRate 0.0342 Epoch: 8 Global Step: 344660 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:27:09,219-Speed 2619.89 samples/sec Loss 7.8889 LearningRate 0.0342 Epoch: 8 Global Step: 344670 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:27:13,117-Speed 2627.28 samples/sec Loss 7.8732 LearningRate 0.0342 Epoch: 8 Global Step: 344680 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:27:17,016-Speed 2626.96 samples/sec Loss 7.7636 LearningRate 0.0342 Epoch: 8 Global Step: 344690 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:27:20,912-Speed 2628.95 samples/sec Loss 7.8892 LearningRate 0.0342 Epoch: 8 Global Step: 344700 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:27:24,810-Speed 2628.06 samples/sec Loss 7.7887 LearningRate 0.0342 Epoch: 8 Global Step: 344710 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:27:28,693-Speed 2638.06 samples/sec Loss 7.8919 LearningRate 0.0342 Epoch: 8 Global Step: 344720 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:27:32,591-Speed 2627.62 samples/sec Loss 7.9255 LearningRate 0.0342 Epoch: 8 Global Step: 344730 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:27:36,489-Speed 2627.39 samples/sec Loss 7.8386 LearningRate 0.0342 Epoch: 8 Global Step: 344740 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:27:40,411-Speed 2611.90 samples/sec Loss 7.8712 LearningRate 0.0342 Epoch: 8 Global Step: 344750 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:27:44,311-Speed 2625.97 samples/sec Loss 7.7699 LearningRate 0.0342 Epoch: 8 Global Step: 344760 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:27:48,210-Speed 2626.80 samples/sec Loss 7.8505 LearningRate 0.0342 Epoch: 8 Global Step: 344770 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:27:52,113-Speed 2624.36 samples/sec Loss 7.8846 LearningRate 0.0342 Epoch: 8 Global Step: 344780 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:27:56,005-Speed 2631.42 samples/sec Loss 7.9297 LearningRate 0.0342 Epoch: 8 Global Step: 344790 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:27:59,905-Speed 2626.29 samples/sec Loss 7.8681 LearningRate 0.0341 Epoch: 8 Global Step: 344800 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:28:03,811-Speed 2623.09 samples/sec Loss 7.9260 LearningRate 0.0341 Epoch: 8 Global Step: 344810 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:28:07,730-Speed 2613.30 samples/sec Loss 7.9141 LearningRate 0.0341 Epoch: 8 Global Step: 344820 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:28:11,639-Speed 2620.58 samples/sec Loss 7.9327 LearningRate 0.0341 Epoch: 8 Global Step: 344830 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:28:15,542-Speed 2624.27 samples/sec Loss 7.8110 LearningRate 0.0341 Epoch: 8 Global Step: 344840 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:28:19,443-Speed 2625.50 samples/sec Loss 7.8188 LearningRate 0.0341 Epoch: 8 Global Step: 344850 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:28:23,342-Speed 2627.23 samples/sec Loss 7.7880 LearningRate 0.0341 Epoch: 8 Global Step: 344860 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:28:27,237-Speed 2629.79 samples/sec Loss 7.9714 LearningRate 0.0341 Epoch: 8 Global Step: 344870 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:28:31,132-Speed 2629.25 samples/sec Loss 7.9231 LearningRate 0.0341 Epoch: 8 Global Step: 344880 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:28:35,031-Speed 2626.98 samples/sec Loss 7.8557 LearningRate 0.0341 Epoch: 8 Global Step: 344890 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:28:38,927-Speed 2628.54 samples/sec Loss 7.9344 LearningRate 0.0341 Epoch: 8 Global Step: 344900 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:28:42,826-Speed 2627.99 samples/sec Loss 7.8748 LearningRate 0.0341 Epoch: 8 Global Step: 344910 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:28:46,720-Speed 2630.82 samples/sec Loss 7.9061 LearningRate 0.0341 Epoch: 8 Global Step: 344920 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:28:50,546-Speed 2677.13 samples/sec Loss 8.1705 LearningRate 0.0341 Epoch: 8 Global Step: 344930 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:28:54,447-Speed 2625.51 samples/sec Loss 8.5245 LearningRate 0.0341 Epoch: 8 Global Step: 344940 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:28:58,338-Speed 2632.81 samples/sec Loss 8.1137 LearningRate 0.0341 Epoch: 8 Global Step: 344950 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:29:02,232-Speed 2629.91 samples/sec Loss 8.0001 LearningRate 0.0341 Epoch: 8 Global Step: 344960 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:29:06,128-Speed 2629.27 samples/sec Loss 7.9297 LearningRate 0.0341 Epoch: 8 Global Step: 344970 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:29:10,031-Speed 2624.23 samples/sec Loss 7.8084 LearningRate 0.0341 Epoch: 8 Global Step: 344980 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:29:13,929-Speed 2627.51 samples/sec Loss 7.9958 LearningRate 0.0341 Epoch: 8 Global Step: 344990 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:29:17,926-Speed 2562.98 samples/sec Loss 7.8248 LearningRate 0.0341 Epoch: 8 Global Step: 345000 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:29:21,886-Speed 2586.48 samples/sec Loss 7.8522 LearningRate 0.0341 Epoch: 8 Global Step: 345010 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:29:25,784-Speed 2627.37 samples/sec Loss 7.7826 LearningRate 0.0341 Epoch: 8 Global Step: 345020 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:29:29,684-Speed 2626.76 samples/sec Loss 7.8064 LearningRate 0.0341 Epoch: 8 Global Step: 345030 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:29:33,600-Speed 2615.27 samples/sec Loss 7.8404 LearningRate 0.0341 Epoch: 8 Global Step: 345040 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:29:37,496-Speed 2629.05 samples/sec Loss 7.8901 LearningRate 0.0341 Epoch: 8 Global Step: 345050 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:29:41,408-Speed 2617.54 samples/sec Loss 7.8451 LearningRate 0.0341 Epoch: 8 Global Step: 345060 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:29:45,310-Speed 2625.42 samples/sec Loss 8.0110 LearningRate 0.0341 Epoch: 8 Global Step: 345070 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:29:49,241-Speed 2605.46 samples/sec Loss 7.7627 LearningRate 0.0341 Epoch: 8 Global Step: 345080 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:29:53,139-Speed 2628.09 samples/sec Loss 7.8378 LearningRate 0.0341 Epoch: 8 Global Step: 345090 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:29:57,036-Speed 2627.73 samples/sec Loss 7.9626 LearningRate 0.0341 Epoch: 8 Global Step: 345100 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:30:00,941-Speed 2624.32 samples/sec Loss 7.7687 LearningRate 0.0341 Epoch: 8 Global Step: 345110 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:30:04,848-Speed 2621.51 samples/sec Loss 7.9925 LearningRate 0.0341 Epoch: 8 Global Step: 345120 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:30:08,755-Speed 2621.32 samples/sec Loss 7.9093 LearningRate 0.0341 Epoch: 8 Global Step: 345130 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:30:12,719-Speed 2583.69 samples/sec Loss 7.6971 LearningRate 0.0341 Epoch: 8 Global Step: 345140 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:30:16,661-Speed 2598.35 samples/sec Loss 7.8716 LearningRate 0.0341 Epoch: 8 Global Step: 345150 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:30:20,560-Speed 2627.04 samples/sec Loss 7.9349 LearningRate 0.0341 Epoch: 8 Global Step: 345160 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:30:24,507-Speed 2595.25 samples/sec Loss 7.9160 LearningRate 0.0341 Epoch: 8 Global Step: 345170 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:30:28,417-Speed 2619.25 samples/sec Loss 7.8188 LearningRate 0.0341 Epoch: 8 Global Step: 345180 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:30:32,334-Speed 2615.12 samples/sec Loss 7.9551 LearningRate 0.0341 Epoch: 8 Global Step: 345190 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:30:36,246-Speed 2618.44 samples/sec Loss 7.8733 LearningRate 0.0341 Epoch: 8 Global Step: 345200 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:30:40,172-Speed 2608.68 samples/sec Loss 7.8140 LearningRate 0.0341 Epoch: 8 Global Step: 345210 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:30:44,069-Speed 2628.29 samples/sec Loss 7.7297 LearningRate 0.0341 Epoch: 8 Global Step: 345220 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:30:47,971-Speed 2624.99 samples/sec Loss 7.9628 LearningRate 0.0341 Epoch: 8 Global Step: 345230 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:30:51,875-Speed 2623.71 samples/sec Loss 7.6836 LearningRate 0.0341 Epoch: 8 Global Step: 345240 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:30:55,777-Speed 2625.28 samples/sec Loss 7.8826 LearningRate 0.0341 Epoch: 8 Global Step: 345250 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:30:59,673-Speed 2629.03 samples/sec Loss 7.8541 LearningRate 0.0341 Epoch: 8 Global Step: 345260 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:31:03,541-Speed 2647.63 samples/sec Loss 8.3049 LearningRate 0.0341 Epoch: 8 Global Step: 345270 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:31:07,448-Speed 2622.12 samples/sec Loss 8.9489 LearningRate 0.0341 Epoch: 8 Global Step: 345280 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:31:11,339-Speed 2631.74 samples/sec Loss 8.0012 LearningRate 0.0341 Epoch: 8 Global Step: 345290 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:31:15,236-Speed 2628.97 samples/sec Loss 7.9733 LearningRate 0.0341 Epoch: 8 Global Step: 345300 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:31:19,132-Speed 2629.56 samples/sec Loss 7.8927 LearningRate 0.0341 Epoch: 8 Global Step: 345310 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:31:23,030-Speed 2627.79 samples/sec Loss 7.8236 LearningRate 0.0341 Epoch: 8 Global Step: 345320 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:31:26,960-Speed 2605.95 samples/sec Loss 7.9125 LearningRate 0.0341 Epoch: 8 Global Step: 345330 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:31:30,868-Speed 2621.54 samples/sec Loss 7.9849 LearningRate 0.0341 Epoch: 8 Global Step: 345340 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:31:34,761-Speed 2630.88 samples/sec Loss 7.7005 LearningRate 0.0341 Epoch: 8 Global Step: 345350 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:31:38,660-Speed 2627.11 samples/sec Loss 7.8426 LearningRate 0.0341 Epoch: 8 Global Step: 345360 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:31:42,556-Speed 2628.21 samples/sec Loss 7.8181 LearningRate 0.0341 Epoch: 8 Global Step: 345370 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:31:46,452-Speed 2629.46 samples/sec Loss 7.8950 LearningRate 0.0341 Epoch: 8 Global Step: 345380 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:31:50,363-Speed 2618.93 samples/sec Loss 7.9274 LearningRate 0.0341 Epoch: 8 Global Step: 345390 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:31:54,261-Speed 2627.63 samples/sec Loss 7.8563 LearningRate 0.0341 Epoch: 8 Global Step: 345400 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:31:58,173-Speed 2618.04 samples/sec Loss 7.7982 LearningRate 0.0341 Epoch: 8 Global Step: 345410 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:32:02,090-Speed 2615.29 samples/sec Loss 7.8781 LearningRate 0.0341 Epoch: 8 Global Step: 345420 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:32:06,002-Speed 2617.74 samples/sec Loss 7.8484 LearningRate 0.0341 Epoch: 8 Global Step: 345430 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:32:09,904-Speed 2625.66 samples/sec Loss 7.8630 LearningRate 0.0341 Epoch: 8 Global Step: 345440 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:32:13,798-Speed 2630.07 samples/sec Loss 8.1017 LearningRate 0.0341 Epoch: 8 Global Step: 345450 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:32:17,700-Speed 2624.99 samples/sec Loss 8.0147 LearningRate 0.0341 Epoch: 8 Global Step: 345460 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:32:21,601-Speed 2625.39 samples/sec Loss 7.9137 LearningRate 0.0341 Epoch: 8 Global Step: 345470 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:32:25,499-Speed 2627.71 samples/sec Loss 7.9250 LearningRate 0.0341 Epoch: 8 Global Step: 345480 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:32:29,397-Speed 2628.03 samples/sec Loss 7.8693 LearningRate 0.0341 Epoch: 8 Global Step: 345490 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:32:33,293-Speed 2629.17 samples/sec Loss 7.8976 LearningRate 0.0341 Epoch: 8 Global Step: 345500 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:32:37,201-Speed 2620.93 samples/sec Loss 7.8967 LearningRate 0.0340 Epoch: 8 Global Step: 345510 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:32:41,099-Speed 2627.45 samples/sec Loss 7.7663 LearningRate 0.0340 Epoch: 8 Global Step: 345520 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:32:44,995-Speed 2628.64 samples/sec Loss 7.8249 LearningRate 0.0340 Epoch: 8 Global Step: 345530 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:32:48,968-Speed 2577.79 samples/sec Loss 7.8093 LearningRate 0.0340 Epoch: 8 Global Step: 345540 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:32:52,867-Speed 2627.52 samples/sec Loss 7.8893 LearningRate 0.0340 Epoch: 8 Global Step: 345550 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:32:56,779-Speed 2618.17 samples/sec Loss 7.9519 LearningRate 0.0340 Epoch: 8 Global Step: 345560 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:33:00,680-Speed 2625.63 samples/sec Loss 7.8887 LearningRate 0.0340 Epoch: 8 Global Step: 345570 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:33:04,717-Speed 2537.03 samples/sec Loss 7.8996 LearningRate 0.0340 Epoch: 8 Global Step: 345580 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:33:08,707-Speed 2567.66 samples/sec Loss 7.9283 LearningRate 0.0340 Epoch: 8 Global Step: 345590 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:33:12,601-Speed 2630.54 samples/sec Loss 7.9674 LearningRate 0.0340 Epoch: 8 Global Step: 345600 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:33:16,494-Speed 2630.35 samples/sec Loss 7.8233 LearningRate 0.0340 Epoch: 8 Global Step: 345610 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:33:20,397-Speed 2624.15 samples/sec Loss 7.7639 LearningRate 0.0340 Epoch: 8 Global Step: 345620 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:33:24,280-Speed 2637.45 samples/sec Loss 7.9116 LearningRate 0.0340 Epoch: 8 Global Step: 345630 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:33:28,181-Speed 2626.37 samples/sec Loss 7.8903 LearningRate 0.0340 Epoch: 8 Global Step: 345640 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:33:32,083-Speed 2624.96 samples/sec Loss 7.8472 LearningRate 0.0340 Epoch: 8 Global Step: 345650 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:33:35,988-Speed 2622.96 samples/sec Loss 8.0819 LearningRate 0.0340 Epoch: 8 Global Step: 345660 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:33:39,889-Speed 2625.62 samples/sec Loss 7.8474 LearningRate 0.0340 Epoch: 8 Global Step: 345670 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:33:43,788-Speed 2626.91 samples/sec Loss 7.8315 LearningRate 0.0340 Epoch: 8 Global Step: 345680 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:33:47,687-Speed 2626.67 samples/sec Loss 7.8676 LearningRate 0.0340 Epoch: 8 Global Step: 345690 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:33:51,579-Speed 2631.62 samples/sec Loss 7.9161 LearningRate 0.0340 Epoch: 8 Global Step: 345700 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:33:55,473-Speed 2630.70 samples/sec Loss 7.8074 LearningRate 0.0340 Epoch: 8 Global Step: 345710 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:33:59,381-Speed 2620.85 samples/sec Loss 7.8625 LearningRate 0.0340 Epoch: 8 Global Step: 345720 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:34:03,271-Speed 2633.50 samples/sec Loss 7.8755 LearningRate 0.0340 Epoch: 8 Global Step: 345730 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:34:07,165-Speed 2630.12 samples/sec Loss 7.8098 LearningRate 0.0340 Epoch: 8 Global Step: 345740 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:34:11,066-Speed 2626.07 samples/sec Loss 7.9368 LearningRate 0.0340 Epoch: 8 Global Step: 345750 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:34:14,958-Speed 2631.28 samples/sec Loss 7.8570 LearningRate 0.0340 Epoch: 8 Global Step: 345760 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:34:18,852-Speed 2630.48 samples/sec Loss 7.7437 LearningRate 0.0340 Epoch: 8 Global Step: 345770 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:34:22,765-Speed 2617.94 samples/sec Loss 7.9110 LearningRate 0.0340 Epoch: 8 Global Step: 345780 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:34:26,660-Speed 2629.41 samples/sec Loss 7.8578 LearningRate 0.0340 Epoch: 8 Global Step: 345790 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:34:30,557-Speed 2627.94 samples/sec Loss 7.8340 LearningRate 0.0340 Epoch: 8 Global Step: 345800 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:34:34,451-Speed 2630.64 samples/sec Loss 7.9311 LearningRate 0.0340 Epoch: 8 Global Step: 345810 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:34:38,348-Speed 2628.18 samples/sec Loss 7.9599 LearningRate 0.0340 Epoch: 8 Global Step: 345820 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:34:42,230-Speed 2639.26 samples/sec Loss 7.8214 LearningRate 0.0340 Epoch: 8 Global Step: 345830 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:34:46,128-Speed 2627.21 samples/sec Loss 7.8989 LearningRate 0.0340 Epoch: 8 Global Step: 345840 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:34:50,037-Speed 2620.51 samples/sec Loss 7.8586 LearningRate 0.0340 Epoch: 8 Global Step: 345850 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:34:53,955-Speed 2614.33 samples/sec Loss 7.9845 LearningRate 0.0340 Epoch: 8 Global Step: 345860 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:34:57,853-Speed 2627.56 samples/sec Loss 7.8309 LearningRate 0.0340 Epoch: 8 Global Step: 345870 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:35:01,747-Speed 2630.42 samples/sec Loss 7.7940 LearningRate 0.0340 Epoch: 8 Global Step: 345880 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:35:05,641-Speed 2630.26 samples/sec Loss 7.9098 LearningRate 0.0340 Epoch: 8 Global Step: 345890 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:35:09,535-Speed 2630.54 samples/sec Loss 7.7786 LearningRate 0.0340 Epoch: 8 Global Step: 345900 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:35:13,429-Speed 2630.56 samples/sec Loss 7.7903 LearningRate 0.0340 Epoch: 8 Global Step: 345910 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:35:17,328-Speed 2626.97 samples/sec Loss 7.8335 LearningRate 0.0340 Epoch: 8 Global Step: 345920 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:35:21,224-Speed 2629.19 samples/sec Loss 7.9155 LearningRate 0.0340 Epoch: 8 Global Step: 345930 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:35:25,123-Speed 2626.99 samples/sec Loss 7.7577 LearningRate 0.0340 Epoch: 8 Global Step: 345940 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:35:29,036-Speed 2617.68 samples/sec Loss 7.7836 LearningRate 0.0340 Epoch: 8 Global Step: 345950 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:35:32,941-Speed 2622.53 samples/sec Loss 7.8626 LearningRate 0.0340 Epoch: 8 Global Step: 345960 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:35:36,844-Speed 2624.46 samples/sec Loss 7.7589 LearningRate 0.0340 Epoch: 8 Global Step: 345970 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:35:40,743-Speed 2627.12 samples/sec Loss 7.8608 LearningRate 0.0340 Epoch: 8 Global Step: 345980 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:35:44,639-Speed 2629.00 samples/sec Loss 7.8895 LearningRate 0.0340 Epoch: 8 Global Step: 345990 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:35:48,533-Speed 2630.61 samples/sec Loss 7.7728 LearningRate 0.0340 Epoch: 8 Global Step: 346000 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:35:52,415-Speed 2637.92 samples/sec Loss 7.9282 LearningRate 0.0340 Epoch: 8 Global Step: 346010 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:35:56,340-Speed 2610.26 samples/sec Loss 7.8753 LearningRate 0.0340 Epoch: 8 Global Step: 346020 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:36:00,244-Speed 2623.91 samples/sec Loss 7.9229 LearningRate 0.0340 Epoch: 8 Global Step: 346030 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:36:04,141-Speed 2627.52 samples/sec Loss 7.8425 LearningRate 0.0340 Epoch: 8 Global Step: 346040 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:36:08,061-Speed 2613.32 samples/sec Loss 7.8254 LearningRate 0.0340 Epoch: 8 Global Step: 346050 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:36:11,853-Speed 2701.20 samples/sec Loss 8.1620 LearningRate 0.0340 Epoch: 8 Global Step: 346060 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:36:15,743-Speed 2633.27 samples/sec Loss 8.3455 LearningRate 0.0340 Epoch: 8 Global Step: 346070 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:36:19,643-Speed 2625.90 samples/sec Loss 7.9921 LearningRate 0.0340 Epoch: 8 Global Step: 346080 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:36:23,534-Speed 2632.10 samples/sec Loss 7.7797 LearningRate 0.0340 Epoch: 8 Global Step: 346090 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:36:27,432-Speed 2628.36 samples/sec Loss 7.8688 LearningRate 0.0340 Epoch: 8 Global Step: 346100 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:36:31,338-Speed 2622.57 samples/sec Loss 7.7901 LearningRate 0.0340 Epoch: 8 Global Step: 346110 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:36:35,228-Speed 2632.66 samples/sec Loss 7.7878 LearningRate 0.0340 Epoch: 8 Global Step: 346120 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:36:39,119-Speed 2632.04 samples/sec Loss 7.8663 LearningRate 0.0340 Epoch: 8 Global Step: 346130 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:36:43,022-Speed 2624.24 samples/sec Loss 7.9035 LearningRate 0.0340 Epoch: 8 Global Step: 346140 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:36:46,913-Speed 2632.18 samples/sec Loss 7.8771 LearningRate 0.0340 Epoch: 8 Global Step: 346150 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:36:50,805-Speed 2631.99 samples/sec Loss 7.8133 LearningRate 0.0340 Epoch: 8 Global Step: 346160 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:36:54,704-Speed 2626.44 samples/sec Loss 7.9799 LearningRate 0.0340 Epoch: 8 Global Step: 346170 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:36:58,587-Speed 2638.52 samples/sec Loss 8.5993 LearningRate 0.0340 Epoch: 8 Global Step: 346180 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:37:02,482-Speed 2629.26 samples/sec Loss 8.5679 LearningRate 0.0340 Epoch: 8 Global Step: 346190 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:37:06,408-Speed 2609.35 samples/sec Loss 8.1419 LearningRate 0.0340 Epoch: 8 Global Step: 346200 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:37:10,300-Speed 2631.05 samples/sec Loss 7.7714 LearningRate 0.0340 Epoch: 8 Global Step: 346210 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:37:14,192-Speed 2631.59 samples/sec Loss 7.8671 LearningRate 0.0339 Epoch: 8 Global Step: 346220 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:37:18,084-Speed 2631.97 samples/sec Loss 7.9723 LearningRate 0.0339 Epoch: 8 Global Step: 346230 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:37:21,982-Speed 2627.72 samples/sec Loss 8.0104 LearningRate 0.0339 Epoch: 8 Global Step: 346240 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:37:25,872-Speed 2633.20 samples/sec Loss 7.8729 LearningRate 0.0339 Epoch: 8 Global Step: 346250 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:37:29,765-Speed 2630.63 samples/sec Loss 7.7986 LearningRate 0.0339 Epoch: 8 Global Step: 346260 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:37:33,664-Speed 2627.15 samples/sec Loss 7.9544 LearningRate 0.0339 Epoch: 8 Global Step: 346270 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:37:37,556-Speed 2630.97 samples/sec Loss 7.9090 LearningRate 0.0339 Epoch: 8 Global Step: 346280 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:37:41,457-Speed 2626.29 samples/sec Loss 7.8002 LearningRate 0.0339 Epoch: 8 Global Step: 346290 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:37:45,348-Speed 2632.59 samples/sec Loss 7.8729 LearningRate 0.0339 Epoch: 8 Global Step: 346300 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:37:49,239-Speed 2632.16 samples/sec Loss 7.9037 LearningRate 0.0339 Epoch: 8 Global Step: 346310 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:37:53,132-Speed 2630.86 samples/sec Loss 7.7975 LearningRate 0.0339 Epoch: 8 Global Step: 346320 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:37:57,024-Speed 2631.66 samples/sec Loss 7.9201 LearningRate 0.0339 Epoch: 8 Global Step: 346330 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:38:00,918-Speed 2630.12 samples/sec Loss 7.9714 LearningRate 0.0339 Epoch: 8 Global Step: 346340 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:38:04,812-Speed 2630.29 samples/sec Loss 7.8520 LearningRate 0.0339 Epoch: 8 Global Step: 346350 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:38:08,706-Speed 2630.06 samples/sec Loss 7.9134 LearningRate 0.0339 Epoch: 8 Global Step: 346360 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:38:12,605-Speed 2627.09 samples/sec Loss 7.8401 LearningRate 0.0339 Epoch: 8 Global Step: 346370 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:38:16,549-Speed 2597.29 samples/sec Loss 7.8340 LearningRate 0.0339 Epoch: 8 Global Step: 346380 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:38:20,597-Speed 2529.82 samples/sec Loss 7.8961 LearningRate 0.0339 Epoch: 8 Global Step: 346390 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:38:24,506-Speed 2620.64 samples/sec Loss 7.8569 LearningRate 0.0339 Epoch: 8 Global Step: 346400 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:38:28,405-Speed 2626.89 samples/sec Loss 7.9219 LearningRate 0.0339 Epoch: 8 Global Step: 346410 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:38:32,301-Speed 2631.96 samples/sec Loss 7.8208 LearningRate 0.0339 Epoch: 8 Global Step: 346420 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:38:36,208-Speed 2621.54 samples/sec Loss 7.7893 LearningRate 0.0339 Epoch: 8 Global Step: 346430 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:38:40,108-Speed 2625.84 samples/sec Loss 7.8158 LearningRate 0.0339 Epoch: 8 Global Step: 346440 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:38:44,003-Speed 2629.77 samples/sec Loss 7.7162 LearningRate 0.0339 Epoch: 8 Global Step: 346450 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:38:47,909-Speed 2622.72 samples/sec Loss 7.9148 LearningRate 0.0339 Epoch: 8 Global Step: 346460 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:38:51,798-Speed 2633.39 samples/sec Loss 7.7047 LearningRate 0.0339 Epoch: 8 Global Step: 346470 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:38:55,690-Speed 2631.80 samples/sec Loss 7.8125 LearningRate 0.0339 Epoch: 8 Global Step: 346480 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:38:59,585-Speed 2629.36 samples/sec Loss 7.8177 LearningRate 0.0339 Epoch: 8 Global Step: 346490 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:39:03,479-Speed 2630.81 samples/sec Loss 7.9200 LearningRate 0.0339 Epoch: 8 Global Step: 346500 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:39:07,386-Speed 2621.34 samples/sec Loss 7.8667 LearningRate 0.0339 Epoch: 8 Global Step: 346510 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:39:11,305-Speed 2613.34 samples/sec Loss 7.8611 LearningRate 0.0339 Epoch: 8 Global Step: 346520 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:39:15,204-Speed 2626.69 samples/sec Loss 7.8773 LearningRate 0.0339 Epoch: 8 Global Step: 346530 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:39:19,128-Speed 2610.26 samples/sec Loss 7.7972 LearningRate 0.0339 Epoch: 8 Global Step: 346540 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:39:23,036-Speed 2621.49 samples/sec Loss 7.8923 LearningRate 0.0339 Epoch: 8 Global Step: 346550 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:39:26,953-Speed 2614.94 samples/sec Loss 7.8099 LearningRate 0.0339 Epoch: 8 Global Step: 346560 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:39:30,860-Speed 2621.84 samples/sec Loss 7.8908 LearningRate 0.0339 Epoch: 8 Global Step: 346570 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:39:34,779-Speed 2613.47 samples/sec Loss 7.8928 LearningRate 0.0339 Epoch: 8 Global Step: 346580 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:39:38,683-Speed 2623.32 samples/sec Loss 7.8398 LearningRate 0.0339 Epoch: 8 Global Step: 346590 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:39:42,632-Speed 2593.27 samples/sec Loss 7.9393 LearningRate 0.0339 Epoch: 8 Global Step: 346600 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:39:46,579-Speed 2595.55 samples/sec Loss 8.7043 LearningRate 0.0339 Epoch: 8 Global Step: 346610 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:39:50,478-Speed 2626.99 samples/sec Loss 8.0856 LearningRate 0.0339 Epoch: 8 Global Step: 346620 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:39:54,390-Speed 2618.08 samples/sec Loss 7.8243 LearningRate 0.0339 Epoch: 8 Global Step: 346630 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:39:58,302-Speed 2618.44 samples/sec Loss 8.0713 LearningRate 0.0339 Epoch: 8 Global Step: 346640 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:40:02,202-Speed 2626.67 samples/sec Loss 7.9989 LearningRate 0.0339 Epoch: 8 Global Step: 346650 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:40:06,099-Speed 2628.12 samples/sec Loss 7.9822 LearningRate 0.0339 Epoch: 8 Global Step: 346660 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:40:10,001-Speed 2624.87 samples/sec Loss 7.9725 LearningRate 0.0339 Epoch: 8 Global Step: 346670 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:40:13,905-Speed 2623.45 samples/sec Loss 7.9705 LearningRate 0.0339 Epoch: 8 Global Step: 346680 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:40:17,806-Speed 2625.73 samples/sec Loss 8.0381 LearningRate 0.0339 Epoch: 8 Global Step: 346690 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:40:21,704-Speed 2627.50 samples/sec Loss 7.7383 LearningRate 0.0339 Epoch: 8 Global Step: 346700 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:40:25,599-Speed 2629.93 samples/sec Loss 7.8012 LearningRate 0.0339 Epoch: 8 Global Step: 346710 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:40:29,496-Speed 2628.48 samples/sec Loss 7.8468 LearningRate 0.0339 Epoch: 8 Global Step: 346720 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:40:33,419-Speed 2610.71 samples/sec Loss 7.8294 LearningRate 0.0339 Epoch: 8 Global Step: 346730 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:40:37,318-Speed 2626.88 samples/sec Loss 7.9634 LearningRate 0.0339 Epoch: 8 Global Step: 346740 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:40:41,223-Speed 2623.08 samples/sec Loss 7.8739 LearningRate 0.0339 Epoch: 8 Global Step: 346750 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:40:45,123-Speed 2626.57 samples/sec Loss 7.9310 LearningRate 0.0339 Epoch: 8 Global Step: 346760 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:40:49,017-Speed 2630.39 samples/sec Loss 7.8538 LearningRate 0.0339 Epoch: 8 Global Step: 346770 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:40:52,907-Speed 2633.45 samples/sec Loss 7.8215 LearningRate 0.0339 Epoch: 8 Global Step: 346780 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:40:56,801-Speed 2629.61 samples/sec Loss 7.7833 LearningRate 0.0339 Epoch: 8 Global Step: 346790 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:41:00,697-Speed 2629.06 samples/sec Loss 7.9218 LearningRate 0.0339 Epoch: 8 Global Step: 346800 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:41:04,590-Speed 2630.60 samples/sec Loss 7.8021 LearningRate 0.0339 Epoch: 8 Global Step: 346810 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:41:08,483-Speed 2632.01 samples/sec Loss 7.8449 LearningRate 0.0339 Epoch: 8 Global Step: 346820 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:41:12,374-Speed 2632.25 samples/sec Loss 7.7623 LearningRate 0.0339 Epoch: 8 Global Step: 346830 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:41:16,265-Speed 2632.18 samples/sec Loss 7.7625 LearningRate 0.0339 Epoch: 8 Global Step: 346840 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:41:20,152-Speed 2634.61 samples/sec Loss 7.9564 LearningRate 0.0339 Epoch: 8 Global Step: 346850 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:41:24,007-Speed 2657.21 samples/sec Loss 8.7556 LearningRate 0.0339 Epoch: 8 Global Step: 346860 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:41:27,902-Speed 2629.67 samples/sec Loss 8.2419 LearningRate 0.0339 Epoch: 8 Global Step: 346870 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:41:31,791-Speed 2633.53 samples/sec Loss 7.8172 LearningRate 0.0339 Epoch: 8 Global Step: 346880 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:41:35,687-Speed 2628.47 samples/sec Loss 7.7757 LearningRate 0.0339 Epoch: 8 Global Step: 346890 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:41:39,583-Speed 2629.31 samples/sec Loss 7.8648 LearningRate 0.0339 Epoch: 8 Global Step: 346900 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:41:43,476-Speed 2631.56 samples/sec Loss 7.9778 LearningRate 0.0339 Epoch: 8 Global Step: 346910 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:41:47,382-Speed 2622.18 samples/sec Loss 7.9691 LearningRate 0.0339 Epoch: 8 Global Step: 346920 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:41:51,275-Speed 2631.11 samples/sec Loss 7.8733 LearningRate 0.0338 Epoch: 8 Global Step: 346930 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:41:55,163-Speed 2634.35 samples/sec Loss 7.8944 LearningRate 0.0338 Epoch: 8 Global Step: 346940 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:41:59,077-Speed 2617.66 samples/sec Loss 7.8950 LearningRate 0.0338 Epoch: 8 Global Step: 346950 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:42:02,970-Speed 2630.67 samples/sec Loss 7.8938 LearningRate 0.0338 Epoch: 8 Global Step: 346960 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:42:06,862-Speed 2631.86 samples/sec Loss 7.9625 LearningRate 0.0338 Epoch: 8 Global Step: 346970 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:42:10,755-Speed 2630.58 samples/sec Loss 7.8155 LearningRate 0.0338 Epoch: 8 Global Step: 346980 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:42:14,652-Speed 2628.89 samples/sec Loss 7.8434 LearningRate 0.0338 Epoch: 8 Global Step: 346990 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:42:18,581-Speed 2607.57 samples/sec Loss 7.8575 LearningRate 0.0338 Epoch: 8 Global Step: 347000 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:42:22,471-Speed 2632.55 samples/sec Loss 7.7798 LearningRate 0.0338 Epoch: 8 Global Step: 347010 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:42:26,361-Speed 2633.82 samples/sec Loss 7.9261 LearningRate 0.0338 Epoch: 8 Global Step: 347020 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:42:30,251-Speed 2632.88 samples/sec Loss 7.8253 LearningRate 0.0338 Epoch: 8 Global Step: 347030 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:42:34,141-Speed 2632.84 samples/sec Loss 7.7397 LearningRate 0.0338 Epoch: 8 Global Step: 347040 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:42:38,031-Speed 2632.62 samples/sec Loss 7.8622 LearningRate 0.0338 Epoch: 8 Global Step: 347050 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:42:41,927-Speed 2629.64 samples/sec Loss 7.7983 LearningRate 0.0338 Epoch: 8 Global Step: 347060 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:42:45,818-Speed 2632.45 samples/sec Loss 7.8627 LearningRate 0.0338 Epoch: 8 Global Step: 347070 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:42:49,724-Speed 2622.39 samples/sec Loss 7.8066 LearningRate 0.0338 Epoch: 8 Global Step: 347080 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:42:53,626-Speed 2624.88 samples/sec Loss 7.8649 LearningRate 0.0338 Epoch: 8 Global Step: 347090 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:42:57,521-Speed 2629.91 samples/sec Loss 7.8863 LearningRate 0.0338 Epoch: 8 Global Step: 347100 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:43:01,412-Speed 2631.98 samples/sec Loss 7.7902 LearningRate 0.0338 Epoch: 8 Global Step: 347110 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:43:05,318-Speed 2622.46 samples/sec Loss 7.7626 LearningRate 0.0338 Epoch: 8 Global Step: 347120 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:43:09,218-Speed 2626.00 samples/sec Loss 7.8572 LearningRate 0.0338 Epoch: 8 Global Step: 347130 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:43:13,116-Speed 2627.90 samples/sec Loss 7.8021 LearningRate 0.0338 Epoch: 8 Global Step: 347140 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:43:17,015-Speed 2626.87 samples/sec Loss 7.8552 LearningRate 0.0338 Epoch: 8 Global Step: 347150 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:43:20,915-Speed 2626.72 samples/sec Loss 7.8629 LearningRate 0.0338 Epoch: 8 Global Step: 347160 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:43:24,809-Speed 2629.97 samples/sec Loss 7.9342 LearningRate 0.0338 Epoch: 8 Global Step: 347170 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:43:28,709-Speed 2626.02 samples/sec Loss 7.7386 LearningRate 0.0338 Epoch: 8 Global Step: 347180 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:43:32,603-Speed 2630.39 samples/sec Loss 7.8637 LearningRate 0.0338 Epoch: 8 Global Step: 347190 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:43:36,503-Speed 2625.85 samples/sec Loss 7.7230 LearningRate 0.0338 Epoch: 8 Global Step: 347200 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:43:40,401-Speed 2627.81 samples/sec Loss 7.8082 LearningRate 0.0338 Epoch: 8 Global Step: 347210 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:43:44,312-Speed 2619.11 samples/sec Loss 7.9271 LearningRate 0.0338 Epoch: 8 Global Step: 347220 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:43:48,208-Speed 2629.78 samples/sec Loss 7.8107 LearningRate 0.0338 Epoch: 8 Global Step: 347230 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:43:52,117-Speed 2619.88 samples/sec Loss 7.9748 LearningRate 0.0338 Epoch: 8 Global Step: 347240 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:43:56,021-Speed 2623.79 samples/sec Loss 7.8444 LearningRate 0.0338 Epoch: 8 Global Step: 347250 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:43:59,929-Speed 2620.73 samples/sec Loss 7.7396 LearningRate 0.0338 Epoch: 8 Global Step: 347260 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:44:03,830-Speed 2626.00 samples/sec Loss 7.8727 LearningRate 0.0338 Epoch: 8 Global Step: 347270 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:44:07,728-Speed 2627.01 samples/sec Loss 7.7569 LearningRate 0.0338 Epoch: 8 Global Step: 347280 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:44:11,629-Speed 2625.67 samples/sec Loss 7.7576 LearningRate 0.0338 Epoch: 8 Global Step: 347290 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:44:15,525-Speed 2628.97 samples/sec Loss 7.7747 LearningRate 0.0338 Epoch: 8 Global Step: 347300 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:44:19,427-Speed 2626.01 samples/sec Loss 7.8096 LearningRate 0.0338 Epoch: 8 Global Step: 347310 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:44:23,336-Speed 2620.35 samples/sec Loss 7.8516 LearningRate 0.0338 Epoch: 8 Global Step: 347320 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:44:27,350-Speed 2551.80 samples/sec Loss 7.9253 LearningRate 0.0338 Epoch: 8 Global Step: 347330 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:44:31,366-Speed 2551.18 samples/sec Loss 7.8507 LearningRate 0.0338 Epoch: 8 Global Step: 347340 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:44:35,266-Speed 2625.93 samples/sec Loss 7.8556 LearningRate 0.0338 Epoch: 8 Global Step: 347350 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:44:39,170-Speed 2623.45 samples/sec Loss 7.8526 LearningRate 0.0338 Epoch: 8 Global Step: 347360 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:44:43,074-Speed 2623.90 samples/sec Loss 7.7584 LearningRate 0.0338 Epoch: 8 Global Step: 347370 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:44:46,969-Speed 2629.48 samples/sec Loss 7.7081 LearningRate 0.0338 Epoch: 8 Global Step: 347380 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:44:50,870-Speed 2625.77 samples/sec Loss 7.7701 LearningRate 0.0338 Epoch: 8 Global Step: 347390 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:44:54,779-Speed 2620.47 samples/sec Loss 7.8911 LearningRate 0.0338 Epoch: 8 Global Step: 347400 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:44:58,700-Speed 2613.23 samples/sec Loss 7.8189 LearningRate 0.0338 Epoch: 8 Global Step: 347410 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:45:02,586-Speed 2635.60 samples/sec Loss 7.8240 LearningRate 0.0338 Epoch: 8 Global Step: 347420 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:45:06,517-Speed 2605.88 samples/sec Loss 7.9164 LearningRate 0.0338 Epoch: 8 Global Step: 347430 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:45:10,421-Speed 2623.60 samples/sec Loss 7.8841 LearningRate 0.0338 Epoch: 8 Global Step: 347440 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:45:14,337-Speed 2615.12 samples/sec Loss 7.8389 LearningRate 0.0338 Epoch: 8 Global Step: 347450 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:45:18,236-Speed 2627.14 samples/sec Loss 7.8348 LearningRate 0.0338 Epoch: 8 Global Step: 347460 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:45:22,135-Speed 2626.92 samples/sec Loss 7.8822 LearningRate 0.0338 Epoch: 8 Global Step: 347470 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:45:26,055-Speed 2612.92 samples/sec Loss 7.8878 LearningRate 0.0338 Epoch: 8 Global Step: 347480 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:45:29,954-Speed 2627.29 samples/sec Loss 7.8028 LearningRate 0.0338 Epoch: 8 Global Step: 347490 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:45:33,848-Speed 2630.51 samples/sec Loss 7.8226 LearningRate 0.0338 Epoch: 8 Global Step: 347500 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:45:37,742-Speed 2630.52 samples/sec Loss 7.8989 LearningRate 0.0338 Epoch: 8 Global Step: 347510 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:45:41,638-Speed 2628.76 samples/sec Loss 7.8563 LearningRate 0.0338 Epoch: 8 Global Step: 347520 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:45:45,552-Speed 2616.64 samples/sec Loss 7.8625 LearningRate 0.0338 Epoch: 8 Global Step: 347530 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:45:49,448-Speed 2628.42 samples/sec Loss 7.7992 LearningRate 0.0338 Epoch: 8 Global Step: 347540 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:45:53,320-Speed 2647.64 samples/sec Loss 7.8554 LearningRate 0.0338 Epoch: 8 Global Step: 347550 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:45:57,200-Speed 2639.59 samples/sec Loss 8.1120 LearningRate 0.0338 Epoch: 8 Global Step: 347560 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:46:01,066-Speed 2650.23 samples/sec Loss 8.7011 LearningRate 0.0338 Epoch: 8 Global Step: 347570 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:46:04,966-Speed 2626.02 samples/sec Loss 7.9231 LearningRate 0.0338 Epoch: 8 Global Step: 347580 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:46:08,856-Speed 2633.05 samples/sec Loss 7.9044 LearningRate 0.0338 Epoch: 8 Global Step: 347590 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:46:12,751-Speed 2629.54 samples/sec Loss 7.7986 LearningRate 0.0338 Epoch: 8 Global Step: 347600 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:46:16,647-Speed 2629.50 samples/sec Loss 7.8794 LearningRate 0.0338 Epoch: 8 Global Step: 347610 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:46:20,541-Speed 2629.87 samples/sec Loss 7.9523 LearningRate 0.0338 Epoch: 8 Global Step: 347620 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:46:24,436-Speed 2629.89 samples/sec Loss 7.8787 LearningRate 0.0338 Epoch: 8 Global Step: 347630 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:46:28,327-Speed 2632.23 samples/sec Loss 7.9582 LearningRate 0.0337 Epoch: 8 Global Step: 347640 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:46:32,229-Speed 2625.19 samples/sec Loss 7.8559 LearningRate 0.0337 Epoch: 8 Global Step: 347650 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:46:36,119-Speed 2632.90 samples/sec Loss 7.8463 LearningRate 0.0337 Epoch: 8 Global Step: 347660 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:46:40,011-Speed 2632.11 samples/sec Loss 7.7859 LearningRate 0.0337 Epoch: 8 Global Step: 347670 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:46:43,901-Speed 2633.14 samples/sec Loss 7.7736 LearningRate 0.0337 Epoch: 8 Global Step: 347680 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:46:47,795-Speed 2630.23 samples/sec Loss 7.7084 LearningRate 0.0337 Epoch: 8 Global Step: 347690 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:46:51,684-Speed 2634.63 samples/sec Loss 7.7953 LearningRate 0.0337 Epoch: 8 Global Step: 347700 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:46:55,584-Speed 2625.57 samples/sec Loss 7.7911 LearningRate 0.0337 Epoch: 8 Global Step: 347710 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:46:59,474-Speed 2633.10 samples/sec Loss 7.8219 LearningRate 0.0337 Epoch: 8 Global Step: 347720 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:47:03,362-Speed 2634.10 samples/sec Loss 7.8346 LearningRate 0.0337 Epoch: 8 Global Step: 347730 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:47:07,256-Speed 2630.80 samples/sec Loss 7.8395 LearningRate 0.0337 Epoch: 8 Global Step: 347740 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:47:11,157-Speed 2625.38 samples/sec Loss 7.9098 LearningRate 0.0337 Epoch: 8 Global Step: 347750 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:47:15,049-Speed 2632.18 samples/sec Loss 7.7062 LearningRate 0.0337 Epoch: 8 Global Step: 347760 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:47:18,943-Speed 2630.47 samples/sec Loss 7.9415 LearningRate 0.0337 Epoch: 8 Global Step: 347770 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:47:22,840-Speed 2628.12 samples/sec Loss 7.9364 LearningRate 0.0337 Epoch: 8 Global Step: 347780 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:47:26,736-Speed 2629.06 samples/sec Loss 7.8348 LearningRate 0.0337 Epoch: 8 Global Step: 347790 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:47:30,626-Speed 2632.92 samples/sec Loss 7.6730 LearningRate 0.0337 Epoch: 8 Global Step: 347800 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:47:34,528-Speed 2624.91 samples/sec Loss 7.9373 LearningRate 0.0337 Epoch: 8 Global Step: 347810 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:47:38,437-Speed 2620.23 samples/sec Loss 7.7944 LearningRate 0.0337 Epoch: 8 Global Step: 347820 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:47:42,331-Speed 2630.53 samples/sec Loss 7.8208 LearningRate 0.0337 Epoch: 8 Global Step: 347830 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:47:46,222-Speed 2632.21 samples/sec Loss 7.7820 LearningRate 0.0337 Epoch: 8 Global Step: 347840 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:47:50,103-Speed 2639.47 samples/sec Loss 7.9396 LearningRate 0.0337 Epoch: 8 Global Step: 347850 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:47:54,017-Speed 2616.66 samples/sec Loss 8.9970 LearningRate 0.0337 Epoch: 8 Global Step: 347860 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:47:57,907-Speed 2633.33 samples/sec Loss 8.2106 LearningRate 0.0337 Epoch: 8 Global Step: 347870 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:48:01,798-Speed 2631.82 samples/sec Loss 7.9116 LearningRate 0.0337 Epoch: 8 Global Step: 347880 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:48:05,690-Speed 2631.77 samples/sec Loss 7.8741 LearningRate 0.0337 Epoch: 8 Global Step: 347890 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:48:09,584-Speed 2629.56 samples/sec Loss 7.7998 LearningRate 0.0337 Epoch: 8 Global Step: 347900 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:48:13,477-Speed 2632.09 samples/sec Loss 7.8627 LearningRate 0.0337 Epoch: 8 Global Step: 347910 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:48:17,374-Speed 2628.79 samples/sec Loss 7.9680 LearningRate 0.0337 Epoch: 8 Global Step: 347920 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:48:21,265-Speed 2632.25 samples/sec Loss 7.8506 LearningRate 0.0337 Epoch: 8 Global Step: 347930 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:48:25,156-Speed 2631.93 samples/sec Loss 7.8740 LearningRate 0.0337 Epoch: 8 Global Step: 347940 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:48:29,050-Speed 2630.59 samples/sec Loss 7.9113 LearningRate 0.0337 Epoch: 8 Global Step: 347950 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:48:32,945-Speed 2629.68 samples/sec Loss 7.7791 LearningRate 0.0337 Epoch: 8 Global Step: 347960 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:48:36,845-Speed 2626.06 samples/sec Loss 7.9154 LearningRate 0.0337 Epoch: 8 Global Step: 347970 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:48:40,769-Speed 2609.94 samples/sec Loss 7.8890 LearningRate 0.0337 Epoch: 8 Global Step: 347980 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:48:44,709-Speed 2599.79 samples/sec Loss 7.8005 LearningRate 0.0337 Epoch: 8 Global Step: 347990 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:48:48,600-Speed 2632.56 samples/sec Loss 7.7203 LearningRate 0.0337 Epoch: 8 Global Step: 348000 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:48:52,495-Speed 2629.97 samples/sec Loss 7.7859 LearningRate 0.0337 Epoch: 8 Global Step: 348010 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:48:56,385-Speed 2632.91 samples/sec Loss 7.8457 LearningRate 0.0337 Epoch: 8 Global Step: 348020 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:49:00,283-Speed 2627.18 samples/sec Loss 7.8042 LearningRate 0.0337 Epoch: 8 Global Step: 348030 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:49:04,203-Speed 2613.05 samples/sec Loss 7.9527 LearningRate 0.0337 Epoch: 8 Global Step: 348040 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:49:08,114-Speed 2618.34 samples/sec Loss 7.8678 LearningRate 0.0337 Epoch: 8 Global Step: 348050 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:49:12,007-Speed 2631.65 samples/sec Loss 7.8690 LearningRate 0.0337 Epoch: 8 Global Step: 348060 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:49:15,831-Speed 2677.73 samples/sec Loss 8.1995 LearningRate 0.0337 Epoch: 8 Global Step: 348070 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:49:19,720-Speed 2634.01 samples/sec Loss 8.3817 LearningRate 0.0337 Epoch: 8 Global Step: 348080 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:49:23,602-Speed 2638.02 samples/sec Loss 8.1735 LearningRate 0.0337 Epoch: 8 Global Step: 348090 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:49:27,491-Speed 2635.15 samples/sec Loss 7.9619 LearningRate 0.0337 Epoch: 8 Global Step: 348100 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:49:31,380-Speed 2633.30 samples/sec Loss 7.8810 LearningRate 0.0337 Epoch: 8 Global Step: 348110 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:49:35,266-Speed 2636.23 samples/sec Loss 7.9036 LearningRate 0.0337 Epoch: 8 Global Step: 348120 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:49:39,159-Speed 2630.56 samples/sec Loss 7.8887 LearningRate 0.0337 Epoch: 8 Global Step: 348130 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:49:43,045-Speed 2635.50 samples/sec Loss 7.8671 LearningRate 0.0337 Epoch: 8 Global Step: 348140 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:49:46,940-Speed 2629.69 samples/sec Loss 7.8971 LearningRate 0.0337 Epoch: 8 Global Step: 348150 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:49:50,830-Speed 2633.01 samples/sec Loss 7.7982 LearningRate 0.0337 Epoch: 8 Global Step: 348160 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 10:49:54,719-Speed 2633.79 samples/sec Loss 7.9520 LearningRate 0.0337 Epoch: 8 Global Step: 348170 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:49:58,615-Speed 2628.79 samples/sec Loss 7.8102 LearningRate 0.0337 Epoch: 8 Global Step: 348180 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:50:02,514-Speed 2627.39 samples/sec Loss 7.7770 LearningRate 0.0337 Epoch: 8 Global Step: 348190 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:50:06,403-Speed 2633.39 samples/sec Loss 7.8721 LearningRate 0.0337 Epoch: 8 Global Step: 348200 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:50:10,293-Speed 2633.07 samples/sec Loss 8.0434 LearningRate 0.0337 Epoch: 8 Global Step: 348210 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:50:14,188-Speed 2629.94 samples/sec Loss 8.1558 LearningRate 0.0337 Epoch: 8 Global Step: 348220 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:50:18,073-Speed 2635.98 samples/sec Loss 8.5843 LearningRate 0.0337 Epoch: 8 Global Step: 348230 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:50:21,964-Speed 2632.29 samples/sec Loss 7.8504 LearningRate 0.0337 Epoch: 8 Global Step: 348240 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:50:25,854-Speed 2632.62 samples/sec Loss 7.7592 LearningRate 0.0337 Epoch: 8 Global Step: 348250 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:50:29,757-Speed 2625.05 samples/sec Loss 7.7970 LearningRate 0.0337 Epoch: 8 Global Step: 348260 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 10:50:33,690-Speed 2604.51 samples/sec Loss 7.8570 LearningRate 0.0337 Epoch: 8 Global Step: 348270 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:50:37,604-Speed 2616.45 samples/sec Loss 7.8382 LearningRate 0.0337 Epoch: 8 Global Step: 348280 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:50:41,497-Speed 2631.76 samples/sec Loss 7.7917 LearningRate 0.0337 Epoch: 8 Global Step: 348290 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:50:45,393-Speed 2628.87 samples/sec Loss 7.8920 LearningRate 0.0337 Epoch: 8 Global Step: 348300 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:50:49,285-Speed 2632.05 samples/sec Loss 7.7839 LearningRate 0.0337 Epoch: 8 Global Step: 348310 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:50:53,174-Speed 2633.64 samples/sec Loss 7.8419 LearningRate 0.0337 Epoch: 8 Global Step: 348320 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:50:57,066-Speed 2631.53 samples/sec Loss 7.8666 LearningRate 0.0337 Epoch: 8 Global Step: 348330 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:51:00,957-Speed 2632.10 samples/sec Loss 7.8926 LearningRate 0.0337 Epoch: 8 Global Step: 348340 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:51:04,867-Speed 2620.11 samples/sec Loss 7.8827 LearningRate 0.0337 Epoch: 8 Global Step: 348350 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:51:08,778-Speed 2619.38 samples/sec Loss 7.7768 LearningRate 0.0336 Epoch: 8 Global Step: 348360 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 10:51:12,674-Speed 2628.69 samples/sec Loss 7.9130 LearningRate 0.0336 Epoch: 8 Global Step: 348370 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:51:16,563-Speed 2634.03 samples/sec Loss 7.9214 LearningRate 0.0336 Epoch: 8 Global Step: 348380 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:51:20,462-Speed 2626.83 samples/sec Loss 7.8593 LearningRate 0.0336 Epoch: 8 Global Step: 348390 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:51:24,354-Speed 2631.73 samples/sec Loss 7.8312 LearningRate 0.0336 Epoch: 8 Global Step: 348400 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:51:28,264-Speed 2619.62 samples/sec Loss 7.7925 LearningRate 0.0336 Epoch: 8 Global Step: 348410 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:51:32,172-Speed 2621.16 samples/sec Loss 7.7330 LearningRate 0.0336 Epoch: 8 Global Step: 348420 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:51:36,070-Speed 2627.19 samples/sec Loss 7.7443 LearningRate 0.0336 Epoch: 8 Global Step: 348430 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:51:39,967-Speed 2628.43 samples/sec Loss 7.7283 LearningRate 0.0336 Epoch: 8 Global Step: 348440 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:51:43,860-Speed 2630.88 samples/sec Loss 7.8823 LearningRate 0.0336 Epoch: 8 Global Step: 348450 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:51:47,754-Speed 2631.28 samples/sec Loss 7.9689 LearningRate 0.0336 Epoch: 8 Global Step: 348460 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:51:51,643-Speed 2632.93 samples/sec Loss 7.8954 LearningRate 0.0336 Epoch: 8 Global Step: 348470 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:51:55,534-Speed 2632.41 samples/sec Loss 7.8529 LearningRate 0.0336 Epoch: 8 Global Step: 348480 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:51:59,432-Speed 2627.49 samples/sec Loss 7.9056 LearningRate 0.0336 Epoch: 8 Global Step: 348490 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:52:03,320-Speed 2634.38 samples/sec Loss 7.8650 LearningRate 0.0336 Epoch: 8 Global Step: 348500 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:52:07,212-Speed 2631.33 samples/sec Loss 7.7556 LearningRate 0.0336 Epoch: 8 Global Step: 348510 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:52:11,105-Speed 2631.49 samples/sec Loss 7.7210 LearningRate 0.0336 Epoch: 8 Global Step: 348520 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:52:15,016-Speed 2618.68 samples/sec Loss 7.8390 LearningRate 0.0336 Epoch: 8 Global Step: 348530 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:52:19,061-Speed 2533.07 samples/sec Loss 7.9422 LearningRate 0.0336 Epoch: 8 Global Step: 348540 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:52:22,956-Speed 2629.39 samples/sec Loss 7.7649 LearningRate 0.0336 Epoch: 8 Global Step: 348550 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:52:26,851-Speed 2629.97 samples/sec Loss 7.8407 LearningRate 0.0336 Epoch: 8 Global Step: 348560 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:52:30,757-Speed 2622.32 samples/sec Loss 7.8860 LearningRate 0.0336 Epoch: 8 Global Step: 348570 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:52:34,647-Speed 2633.28 samples/sec Loss 7.8634 LearningRate 0.0336 Epoch: 8 Global Step: 348580 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:52:38,546-Speed 2627.38 samples/sec Loss 7.8846 LearningRate 0.0336 Epoch: 8 Global Step: 348590 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:52:42,459-Speed 2617.35 samples/sec Loss 7.9010 LearningRate 0.0336 Epoch: 8 Global Step: 348600 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:52:46,366-Speed 2621.40 samples/sec Loss 7.8773 LearningRate 0.0336 Epoch: 8 Global Step: 348610 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:52:50,299-Speed 2604.44 samples/sec Loss 7.9248 LearningRate 0.0336 Epoch: 8 Global Step: 348620 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:52:54,214-Speed 2615.97 samples/sec Loss 7.7236 LearningRate 0.0336 Epoch: 8 Global Step: 348630 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:52:58,136-Speed 2611.97 samples/sec Loss 7.9005 LearningRate 0.0336 Epoch: 8 Global Step: 348640 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:53:02,038-Speed 2624.68 samples/sec Loss 7.8008 LearningRate 0.0336 Epoch: 8 Global Step: 348650 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:53:05,940-Speed 2625.12 samples/sec Loss 7.7773 LearningRate 0.0336 Epoch: 8 Global Step: 348660 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:53:09,947-Speed 2556.13 samples/sec Loss 8.1315 LearningRate 0.0336 Epoch: 8 Global Step: 348670 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:53:13,839-Speed 2632.40 samples/sec Loss 7.8288 LearningRate 0.0336 Epoch: 8 Global Step: 348680 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:53:17,731-Speed 2631.26 samples/sec Loss 7.7842 LearningRate 0.0336 Epoch: 8 Global Step: 348690 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:53:21,653-Speed 2611.57 samples/sec Loss 7.7297 LearningRate 0.0336 Epoch: 8 Global Step: 348700 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:53:25,555-Speed 2625.25 samples/sec Loss 7.7979 LearningRate 0.0336 Epoch: 8 Global Step: 348710 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:53:29,455-Speed 2626.61 samples/sec Loss 7.7795 LearningRate 0.0336 Epoch: 8 Global Step: 348720 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:53:33,345-Speed 2633.09 samples/sec Loss 7.7121 LearningRate 0.0336 Epoch: 8 Global Step: 348730 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:53:37,238-Speed 2630.31 samples/sec Loss 7.7207 LearningRate 0.0336 Epoch: 8 Global Step: 348740 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:53:41,140-Speed 2625.05 samples/sec Loss 7.7806 LearningRate 0.0336 Epoch: 8 Global Step: 348750 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:53:45,040-Speed 2626.57 samples/sec Loss 7.8057 LearningRate 0.0336 Epoch: 8 Global Step: 348760 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:53:48,933-Speed 2631.63 samples/sec Loss 7.7666 LearningRate 0.0336 Epoch: 8 Global Step: 348770 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:53:52,827-Speed 2629.74 samples/sec Loss 7.8212 LearningRate 0.0336 Epoch: 8 Global Step: 348780 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:53:56,723-Speed 2629.15 samples/sec Loss 7.9380 LearningRate 0.0336 Epoch: 8 Global Step: 348790 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:54:00,619-Speed 2628.77 samples/sec Loss 7.8542 LearningRate 0.0336 Epoch: 8 Global Step: 348800 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:54:04,514-Speed 2629.92 samples/sec Loss 7.7355 LearningRate 0.0336 Epoch: 8 Global Step: 348810 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:54:08,408-Speed 2630.11 samples/sec Loss 7.7644 LearningRate 0.0336 Epoch: 8 Global Step: 348820 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:54:12,308-Speed 2626.45 samples/sec Loss 7.8855 LearningRate 0.0336 Epoch: 8 Global Step: 348830 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:54:16,199-Speed 2631.91 samples/sec Loss 7.9265 LearningRate 0.0336 Epoch: 8 Global Step: 348840 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:54:20,092-Speed 2631.64 samples/sec Loss 7.8142 LearningRate 0.0336 Epoch: 8 Global Step: 348850 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:54:23,984-Speed 2631.76 samples/sec Loss 7.9643 LearningRate 0.0336 Epoch: 8 Global Step: 348860 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:54:27,877-Speed 2631.24 samples/sec Loss 7.8819 LearningRate 0.0336 Epoch: 8 Global Step: 348870 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:54:31,771-Speed 2629.85 samples/sec Loss 7.8261 LearningRate 0.0336 Epoch: 8 Global Step: 348880 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:54:35,664-Speed 2630.63 samples/sec Loss 7.6794 LearningRate 0.0336 Epoch: 8 Global Step: 348890 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:54:39,558-Speed 2630.38 samples/sec Loss 7.8070 LearningRate 0.0336 Epoch: 8 Global Step: 348900 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:54:43,452-Speed 2630.27 samples/sec Loss 7.8035 LearningRate 0.0336 Epoch: 8 Global Step: 348910 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:54:47,348-Speed 2628.53 samples/sec Loss 7.9214 LearningRate 0.0336 Epoch: 8 Global Step: 348920 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:54:51,225-Speed 2642.63 samples/sec Loss 7.7224 LearningRate 0.0336 Epoch: 8 Global Step: 348930 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:54:55,118-Speed 2631.09 samples/sec Loss 7.8400 LearningRate 0.0336 Epoch: 8 Global Step: 348940 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:54:59,022-Speed 2623.65 samples/sec Loss 7.9165 LearningRate 0.0336 Epoch: 8 Global Step: 348950 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:55:02,918-Speed 2628.96 samples/sec Loss 7.8681 LearningRate 0.0336 Epoch: 8 Global Step: 348960 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:55:06,806-Speed 2633.91 samples/sec Loss 7.7905 LearningRate 0.0336 Epoch: 8 Global Step: 348970 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:55:10,698-Speed 2631.44 samples/sec Loss 7.9189 LearningRate 0.0336 Epoch: 8 Global Step: 348980 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:55:14,592-Speed 2630.41 samples/sec Loss 7.8248 LearningRate 0.0336 Epoch: 8 Global Step: 348990 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:55:18,485-Speed 2631.10 samples/sec Loss 7.7925 LearningRate 0.0336 Epoch: 8 Global Step: 349000 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:55:22,386-Speed 2634.08 samples/sec Loss 7.7649 LearningRate 0.0336 Epoch: 8 Global Step: 349010 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:55:26,277-Speed 2631.93 samples/sec Loss 7.8179 LearningRate 0.0336 Epoch: 8 Global Step: 349020 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:55:30,171-Speed 2630.99 samples/sec Loss 7.7357 LearningRate 0.0336 Epoch: 8 Global Step: 349030 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:55:34,063-Speed 2631.03 samples/sec Loss 7.7789 LearningRate 0.0336 Epoch: 8 Global Step: 349040 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:55:37,957-Speed 2631.01 samples/sec Loss 7.7905 LearningRate 0.0336 Epoch: 8 Global Step: 349050 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:55:41,850-Speed 2630.74 samples/sec Loss 7.7833 LearningRate 0.0336 Epoch: 8 Global Step: 349060 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:55:45,744-Speed 2630.31 samples/sec Loss 7.8773 LearningRate 0.0335 Epoch: 8 Global Step: 349070 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:55:49,648-Speed 2623.56 samples/sec Loss 7.8653 LearningRate 0.0335 Epoch: 8 Global Step: 349080 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:55:53,530-Speed 2638.39 samples/sec Loss 7.8518 LearningRate 0.0335 Epoch: 8 Global Step: 349090 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:55:57,426-Speed 2629.43 samples/sec Loss 7.7154 LearningRate 0.0335 Epoch: 8 Global Step: 349100 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:56:01,324-Speed 2627.36 samples/sec Loss 7.8857 LearningRate 0.0335 Epoch: 8 Global Step: 349110 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:56:05,222-Speed 2627.53 samples/sec Loss 7.8398 LearningRate 0.0335 Epoch: 8 Global Step: 349120 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:56:09,088-Speed 2649.30 samples/sec Loss 7.7937 LearningRate 0.0335 Epoch: 8 Global Step: 349130 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:56:12,983-Speed 2630.16 samples/sec Loss 7.8720 LearningRate 0.0335 Epoch: 8 Global Step: 349140 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:56:16,876-Speed 2630.40 samples/sec Loss 7.8426 LearningRate 0.0335 Epoch: 8 Global Step: 349150 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:56:20,780-Speed 2623.89 samples/sec Loss 7.8176 LearningRate 0.0335 Epoch: 8 Global Step: 349160 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:56:24,672-Speed 2632.47 samples/sec Loss 7.8551 LearningRate 0.0335 Epoch: 8 Global Step: 349170 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:56:28,568-Speed 2628.42 samples/sec Loss 7.8249 LearningRate 0.0335 Epoch: 8 Global Step: 349180 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:56:32,461-Speed 2631.07 samples/sec Loss 7.6645 LearningRate 0.0335 Epoch: 8 Global Step: 349190 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:56:36,359-Speed 2627.82 samples/sec Loss 7.7793 LearningRate 0.0335 Epoch: 8 Global Step: 349200 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:56:40,254-Speed 2629.40 samples/sec Loss 7.7129 LearningRate 0.0335 Epoch: 8 Global Step: 349210 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:56:44,148-Speed 2630.27 samples/sec Loss 7.9960 LearningRate 0.0335 Epoch: 8 Global Step: 349220 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:56:48,117-Speed 2581.08 samples/sec Loss 7.7435 LearningRate 0.0335 Epoch: 8 Global Step: 349230 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:56:52,012-Speed 2629.65 samples/sec Loss 7.7663 LearningRate 0.0335 Epoch: 8 Global Step: 349240 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:56:55,900-Speed 2634.05 samples/sec Loss 7.7660 LearningRate 0.0335 Epoch: 8 Global Step: 349250 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:56:59,809-Speed 2620.03 samples/sec Loss 7.7462 LearningRate 0.0335 Epoch: 8 Global Step: 349260 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:57:03,709-Speed 2627.18 samples/sec Loss 7.9420 LearningRate 0.0335 Epoch: 8 Global Step: 349270 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:57:07,602-Speed 2630.59 samples/sec Loss 7.8308 LearningRate 0.0335 Epoch: 8 Global Step: 349280 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:57:11,502-Speed 2626.58 samples/sec Loss 7.6911 LearningRate 0.0335 Epoch: 8 Global Step: 349290 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:57:15,403-Speed 2625.98 samples/sec Loss 7.9529 LearningRate 0.0335 Epoch: 8 Global Step: 349300 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:57:19,307-Speed 2623.30 samples/sec Loss 7.7542 LearningRate 0.0335 Epoch: 8 Global Step: 349310 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:57:23,199-Speed 2631.51 samples/sec Loss 7.7487 LearningRate 0.0335 Epoch: 8 Global Step: 349320 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:57:27,096-Speed 2628.60 samples/sec Loss 7.7098 LearningRate 0.0335 Epoch: 8 Global Step: 349330 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:57:30,987-Speed 2632.27 samples/sec Loss 7.7488 LearningRate 0.0335 Epoch: 8 Global Step: 349340 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:57:34,973-Speed 2570.03 samples/sec Loss 8.0394 LearningRate 0.0335 Epoch: 8 Global Step: 349350 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:57:38,862-Speed 2633.35 samples/sec Loss 7.9304 LearningRate 0.0335 Epoch: 8 Global Step: 349360 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:57:42,759-Speed 2628.71 samples/sec Loss 7.7294 LearningRate 0.0335 Epoch: 8 Global Step: 349370 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:57:46,650-Speed 2632.13 samples/sec Loss 7.9278 LearningRate 0.0335 Epoch: 8 Global Step: 349380 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:57:50,553-Speed 2624.59 samples/sec Loss 7.8056 LearningRate 0.0335 Epoch: 8 Global Step: 349390 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:57:54,432-Speed 2640.30 samples/sec Loss 7.8635 LearningRate 0.0335 Epoch: 8 Global Step: 349400 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:57:58,329-Speed 2628.85 samples/sec Loss 7.8809 LearningRate 0.0335 Epoch: 8 Global Step: 349410 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:58:02,226-Speed 2627.90 samples/sec Loss 7.6143 LearningRate 0.0335 Epoch: 8 Global Step: 349420 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:58:06,118-Speed 2631.97 samples/sec Loss 7.9202 LearningRate 0.0335 Epoch: 8 Global Step: 349430 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:58:10,009-Speed 2632.32 samples/sec Loss 7.8632 LearningRate 0.0335 Epoch: 8 Global Step: 349440 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:58:13,900-Speed 2632.69 samples/sec Loss 7.7412 LearningRate 0.0335 Epoch: 8 Global Step: 349450 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:58:17,795-Speed 2630.16 samples/sec Loss 7.6810 LearningRate 0.0335 Epoch: 8 Global Step: 349460 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:58:21,692-Speed 2628.19 samples/sec Loss 7.8639 LearningRate 0.0335 Epoch: 8 Global Step: 349470 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:58:25,591-Speed 2627.08 samples/sec Loss 7.8810 LearningRate 0.0335 Epoch: 8 Global Step: 349480 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:58:29,482-Speed 2632.24 samples/sec Loss 7.8094 LearningRate 0.0335 Epoch: 8 Global Step: 349490 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:58:33,562-Speed 2510.44 samples/sec Loss 7.7452 LearningRate 0.0335 Epoch: 8 Global Step: 349500 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 10:58:37,626-Speed 2520.30 samples/sec Loss 7.6826 LearningRate 0.0335 Epoch: 8 Global Step: 349510 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:58:41,619-Speed 2565.16 samples/sec Loss 7.8241 LearningRate 0.0335 Epoch: 8 Global Step: 349520 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:58:45,513-Speed 2630.41 samples/sec Loss 7.7944 LearningRate 0.0335 Epoch: 8 Global Step: 349530 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:58:49,409-Speed 2629.10 samples/sec Loss 7.8235 LearningRate 0.0335 Epoch: 8 Global Step: 349540 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 10:58:53,268-Speed 2654.42 samples/sec Loss 7.8405 LearningRate 0.0335 Epoch: 8 Global Step: 349550 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:58:57,163-Speed 2629.60 samples/sec Loss 7.8313 LearningRate 0.0335 Epoch: 8 Global Step: 349560 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:59:01,056-Speed 2630.71 samples/sec Loss 7.7952 LearningRate 0.0335 Epoch: 8 Global Step: 349570 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:59:04,949-Speed 2630.91 samples/sec Loss 7.8403 LearningRate 0.0335 Epoch: 8 Global Step: 349580 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:59:08,839-Speed 2633.38 samples/sec Loss 7.7487 LearningRate 0.0335 Epoch: 8 Global Step: 349590 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:59:12,732-Speed 2630.84 samples/sec Loss 7.6833 LearningRate 0.0335 Epoch: 8 Global Step: 349600 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:59:16,624-Speed 2631.87 samples/sec Loss 7.7935 LearningRate 0.0335 Epoch: 8 Global Step: 349610 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:59:20,518-Speed 2630.60 samples/sec Loss 7.7886 LearningRate 0.0335 Epoch: 8 Global Step: 349620 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:59:24,415-Speed 2627.94 samples/sec Loss 7.7561 LearningRate 0.0335 Epoch: 8 Global Step: 349630 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:59:28,323-Speed 2621.08 samples/sec Loss 7.9366 LearningRate 0.0335 Epoch: 8 Global Step: 349640 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 10:59:32,223-Speed 2625.77 samples/sec Loss 7.7891 LearningRate 0.0335 Epoch: 8 Global Step: 349650 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:59:36,132-Speed 2619.96 samples/sec Loss 7.7346 LearningRate 0.0335 Epoch: 8 Global Step: 349660 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:59:40,047-Speed 2616.32 samples/sec Loss 7.7446 LearningRate 0.0335 Epoch: 8 Global Step: 349670 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:59:43,945-Speed 2628.66 samples/sec Loss 7.8603 LearningRate 0.0335 Epoch: 8 Global Step: 349680 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:59:47,842-Speed 2628.04 samples/sec Loss 7.8624 LearningRate 0.0335 Epoch: 8 Global Step: 349690 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:59:51,738-Speed 2629.19 samples/sec Loss 7.7917 LearningRate 0.0335 Epoch: 8 Global Step: 349700 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:59:55,631-Speed 2630.65 samples/sec Loss 7.6744 LearningRate 0.0335 Epoch: 8 Global Step: 349710 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 10:59:59,569-Speed 2601.62 samples/sec Loss 7.8094 LearningRate 0.0335 Epoch: 8 Global Step: 349720 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:00:03,460-Speed 2631.97 samples/sec Loss 7.6741 LearningRate 0.0335 Epoch: 8 Global Step: 349730 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:00:07,351-Speed 2632.25 samples/sec Loss 7.8699 LearningRate 0.0335 Epoch: 8 Global Step: 349740 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:00:11,245-Speed 2630.57 samples/sec Loss 7.7134 LearningRate 0.0335 Epoch: 8 Global Step: 349750 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:00:15,208-Speed 2584.57 samples/sec Loss 7.8161 LearningRate 0.0335 Epoch: 8 Global Step: 349760 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:00:19,122-Speed 2616.90 samples/sec Loss 7.7861 LearningRate 0.0335 Epoch: 8 Global Step: 349770 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:00:23,015-Speed 2631.18 samples/sec Loss 7.7958 LearningRate 0.0335 Epoch: 8 Global Step: 349780 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:00:26,907-Speed 2631.65 samples/sec Loss 7.7846 LearningRate 0.0334 Epoch: 8 Global Step: 349790 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:00:30,801-Speed 2630.62 samples/sec Loss 7.7662 LearningRate 0.0334 Epoch: 8 Global Step: 349800 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:00:34,701-Speed 2626.16 samples/sec Loss 7.8708 LearningRate 0.0334 Epoch: 8 Global Step: 349810 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:00:38,597-Speed 2628.96 samples/sec Loss 7.7171 LearningRate 0.0334 Epoch: 8 Global Step: 349820 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:00:42,491-Speed 2630.05 samples/sec Loss 7.7820 LearningRate 0.0334 Epoch: 8 Global Step: 349830 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:00:46,395-Speed 2623.93 samples/sec Loss 7.7171 LearningRate 0.0334 Epoch: 8 Global Step: 349840 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:00:50,289-Speed 2630.32 samples/sec Loss 7.8522 LearningRate 0.0334 Epoch: 8 Global Step: 349850 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:00:54,189-Speed 2625.84 samples/sec Loss 7.8608 LearningRate 0.0334 Epoch: 8 Global Step: 349860 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:00:58,088-Speed 2627.02 samples/sec Loss 7.8234 LearningRate 0.0334 Epoch: 8 Global Step: 349870 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:01:01,982-Speed 2630.32 samples/sec Loss 7.8138 LearningRate 0.0334 Epoch: 8 Global Step: 349880 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:01:05,882-Speed 2626.69 samples/sec Loss 7.7620 LearningRate 0.0334 Epoch: 8 Global Step: 349890 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:01:09,774-Speed 2631.45 samples/sec Loss 7.7436 LearningRate 0.0334 Epoch: 8 Global Step: 349900 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:01:13,671-Speed 2628.25 samples/sec Loss 7.9242 LearningRate 0.0334 Epoch: 8 Global Step: 349910 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:01:17,568-Speed 2628.16 samples/sec Loss 7.8426 LearningRate 0.0334 Epoch: 8 Global Step: 349920 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:01:21,460-Speed 2632.04 samples/sec Loss 7.8222 LearningRate 0.0334 Epoch: 8 Global Step: 349930 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:01:25,368-Speed 2620.53 samples/sec Loss 7.7712 LearningRate 0.0334 Epoch: 8 Global Step: 349940 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:01:29,271-Speed 2625.19 samples/sec Loss 7.7341 LearningRate 0.0334 Epoch: 8 Global Step: 349950 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:01:33,169-Speed 2627.34 samples/sec Loss 7.8290 LearningRate 0.0334 Epoch: 8 Global Step: 349960 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:01:37,059-Speed 2632.74 samples/sec Loss 7.9281 LearningRate 0.0334 Epoch: 8 Global Step: 349970 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:01:40,947-Speed 2634.83 samples/sec Loss 7.7826 LearningRate 0.0334 Epoch: 8 Global Step: 349980 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:01:44,866-Speed 2613.50 samples/sec Loss 7.8003 LearningRate 0.0334 Epoch: 8 Global Step: 349990 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:01:48,760-Speed 2630.21 samples/sec Loss 7.7117 LearningRate 0.0334 Epoch: 8 Global Step: 350000 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:02:31,897-[lfw][350000]XNorm: 23.307740
Training: 2022-04-14 11:02:31,898-[lfw][350000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-14 11:02:31,898-[lfw][350000]Accuracy-Highest: 0.99783
Training: 2022-04-14 11:03:22,011-[cfp_fp][350000]XNorm: 21.773272
Training: 2022-04-14 11:03:22,012-[cfp_fp][350000]Accuracy-Flip: 0.98300+-0.00689
Training: 2022-04-14 11:03:22,013-[cfp_fp][350000]Accuracy-Highest: 0.98671
Training: 2022-04-14 11:04:05,224-[agedb_30][350000]XNorm: 23.298955
Training: 2022-04-14 11:04:05,225-[agedb_30][350000]Accuracy-Flip: 0.97383+-0.00711
Training: 2022-04-14 11:04:05,226-[agedb_30][350000]Accuracy-Highest: 0.97567
Training: 2022-04-14 11:04:09,078-Speed 72.98 samples/sec Loss 7.8284 LearningRate 0.0334 Epoch: 8 Global Step: 350010 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:04:12,963-Speed 2636.60 samples/sec Loss 7.6572 LearningRate 0.0334 Epoch: 8 Global Step: 350020 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:04:16,846-Speed 2638.19 samples/sec Loss 7.8301 LearningRate 0.0334 Epoch: 8 Global Step: 350030 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:04:20,728-Speed 2638.03 samples/sec Loss 7.8756 LearningRate 0.0334 Epoch: 8 Global Step: 350040 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:04:24,613-Speed 2636.29 samples/sec Loss 7.7250 LearningRate 0.0334 Epoch: 8 Global Step: 350050 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:04:28,503-Speed 2632.66 samples/sec Loss 7.7519 LearningRate 0.0334 Epoch: 8 Global Step: 350060 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:04:32,392-Speed 2634.53 samples/sec Loss 7.8223 LearningRate 0.0334 Epoch: 8 Global Step: 350070 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:04:36,284-Speed 2631.45 samples/sec Loss 7.6980 LearningRate 0.0334 Epoch: 8 Global Step: 350080 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:04:40,179-Speed 2629.83 samples/sec Loss 7.7555 LearningRate 0.0334 Epoch: 8 Global Step: 350090 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:04:44,063-Speed 2637.69 samples/sec Loss 7.7704 LearningRate 0.0334 Epoch: 8 Global Step: 350100 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:04:47,953-Speed 2632.70 samples/sec Loss 7.6910 LearningRate 0.0334 Epoch: 8 Global Step: 350110 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:04:51,849-Speed 2628.82 samples/sec Loss 7.7395 LearningRate 0.0334 Epoch: 8 Global Step: 350120 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:04:55,756-Speed 2622.05 samples/sec Loss 7.8127 LearningRate 0.0334 Epoch: 8 Global Step: 350130 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:04:59,658-Speed 2624.34 samples/sec Loss 7.7396 LearningRate 0.0334 Epoch: 8 Global Step: 350140 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:05:03,485-Speed 2676.69 samples/sec Loss 7.7714 LearningRate 0.0334 Epoch: 8 Global Step: 350150 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:05:07,378-Speed 2630.75 samples/sec Loss 8.7090 LearningRate 0.0334 Epoch: 8 Global Step: 350160 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:05:11,290-Speed 2618.24 samples/sec Loss 7.9132 LearningRate 0.0334 Epoch: 8 Global Step: 350170 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:05:15,190-Speed 2626.09 samples/sec Loss 7.7453 LearningRate 0.0334 Epoch: 8 Global Step: 350180 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:05:19,109-Speed 2615.10 samples/sec Loss 7.9204 LearningRate 0.0334 Epoch: 8 Global Step: 350190 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:05:23,016-Speed 2621.70 samples/sec Loss 7.8686 LearningRate 0.0334 Epoch: 8 Global Step: 350200 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:05:26,915-Speed 2626.86 samples/sec Loss 7.9257 LearningRate 0.0334 Epoch: 8 Global Step: 350210 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:05:30,819-Speed 2622.83 samples/sec Loss 7.7419 LearningRate 0.0334 Epoch: 8 Global Step: 350220 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:05:34,731-Speed 2618.85 samples/sec Loss 7.8369 LearningRate 0.0334 Epoch: 8 Global Step: 350230 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:05:38,637-Speed 2621.72 samples/sec Loss 7.7127 LearningRate 0.0334 Epoch: 8 Global Step: 350240 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:05:42,547-Speed 2620.11 samples/sec Loss 7.8960 LearningRate 0.0334 Epoch: 8 Global Step: 350250 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:05:46,451-Speed 2623.41 samples/sec Loss 7.7500 LearningRate 0.0334 Epoch: 8 Global Step: 350260 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:05:50,448-Speed 2562.51 samples/sec Loss 7.7822 LearningRate 0.0334 Epoch: 8 Global Step: 350270 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:05:54,346-Speed 2627.93 samples/sec Loss 7.8853 LearningRate 0.0334 Epoch: 8 Global Step: 350280 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:05:58,264-Speed 2613.98 samples/sec Loss 7.7824 LearningRate 0.0334 Epoch: 8 Global Step: 350290 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:06:02,180-Speed 2615.85 samples/sec Loss 7.7738 LearningRate 0.0334 Epoch: 8 Global Step: 350300 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:06:06,095-Speed 2616.65 samples/sec Loss 7.7673 LearningRate 0.0334 Epoch: 8 Global Step: 350310 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:06:10,006-Speed 2618.56 samples/sec Loss 7.6850 LearningRate 0.0334 Epoch: 8 Global Step: 350320 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:06:13,913-Speed 2621.78 samples/sec Loss 7.7477 LearningRate 0.0334 Epoch: 8 Global Step: 350330 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:06:17,814-Speed 2625.74 samples/sec Loss 7.8606 LearningRate 0.0334 Epoch: 8 Global Step: 350340 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:06:21,709-Speed 2629.64 samples/sec Loss 7.7507 LearningRate 0.0334 Epoch: 8 Global Step: 350350 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:06:25,618-Speed 2620.16 samples/sec Loss 7.7774 LearningRate 0.0334 Epoch: 8 Global Step: 350360 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:06:29,517-Speed 2626.54 samples/sec Loss 7.8123 LearningRate 0.0334 Epoch: 8 Global Step: 350370 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:06:33,416-Speed 2627.25 samples/sec Loss 7.7165 LearningRate 0.0334 Epoch: 8 Global Step: 350380 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:06:37,312-Speed 2628.65 samples/sec Loss 7.8761 LearningRate 0.0334 Epoch: 8 Global Step: 350390 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:06:41,208-Speed 2629.47 samples/sec Loss 7.8491 LearningRate 0.0334 Epoch: 8 Global Step: 350400 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:06:45,104-Speed 2628.96 samples/sec Loss 7.8132 LearningRate 0.0334 Epoch: 8 Global Step: 350410 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:06:49,018-Speed 2616.86 samples/sec Loss 7.8692 LearningRate 0.0334 Epoch: 8 Global Step: 350420 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:06:52,914-Speed 2628.90 samples/sec Loss 7.6786 LearningRate 0.0334 Epoch: 8 Global Step: 350430 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:06:56,823-Speed 2620.50 samples/sec Loss 7.8376 LearningRate 0.0334 Epoch: 8 Global Step: 350440 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:07:00,729-Speed 2621.77 samples/sec Loss 7.7831 LearningRate 0.0334 Epoch: 8 Global Step: 350450 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:07:04,639-Speed 2619.43 samples/sec Loss 7.6502 LearningRate 0.0334 Epoch: 8 Global Step: 350460 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:07:08,547-Speed 2620.88 samples/sec Loss 7.7500 LearningRate 0.0334 Epoch: 8 Global Step: 350470 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:07:12,454-Speed 2622.05 samples/sec Loss 7.9318 LearningRate 0.0334 Epoch: 8 Global Step: 350480 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:07:16,355-Speed 2625.36 samples/sec Loss 7.8201 LearningRate 0.0334 Epoch: 8 Global Step: 350490 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:07:20,266-Speed 2618.63 samples/sec Loss 7.5957 LearningRate 0.0334 Epoch: 8 Global Step: 350500 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:07:24,168-Speed 2624.90 samples/sec Loss 7.9119 LearningRate 0.0333 Epoch: 8 Global Step: 350510 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:07:28,070-Speed 2624.61 samples/sec Loss 7.7485 LearningRate 0.0333 Epoch: 8 Global Step: 350520 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:07:31,969-Speed 2627.44 samples/sec Loss 7.7725 LearningRate 0.0333 Epoch: 8 Global Step: 350530 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:07:35,865-Speed 2628.83 samples/sec Loss 7.7826 LearningRate 0.0333 Epoch: 8 Global Step: 350540 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:07:39,763-Speed 2627.85 samples/sec Loss 7.8088 LearningRate 0.0333 Epoch: 8 Global Step: 350550 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:07:43,661-Speed 2627.75 samples/sec Loss 7.7132 LearningRate 0.0333 Epoch: 8 Global Step: 350560 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:07:47,537-Speed 2642.64 samples/sec Loss 7.7482 LearningRate 0.0333 Epoch: 8 Global Step: 350570 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:07:51,442-Speed 2622.65 samples/sec Loss 7.8390 LearningRate 0.0333 Epoch: 8 Global Step: 350580 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:07:55,348-Speed 2622.32 samples/sec Loss 7.7882 LearningRate 0.0333 Epoch: 8 Global Step: 350590 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:07:59,242-Speed 2629.99 samples/sec Loss 7.7146 LearningRate 0.0333 Epoch: 8 Global Step: 350600 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:08:03,140-Speed 2627.94 samples/sec Loss 7.7543 LearningRate 0.0333 Epoch: 8 Global Step: 350610 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:08:07,048-Speed 2621.01 samples/sec Loss 7.9110 LearningRate 0.0333 Epoch: 8 Global Step: 350620 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:08:10,943-Speed 2630.06 samples/sec Loss 7.8266 LearningRate 0.0333 Epoch: 8 Global Step: 350630 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:08:14,840-Speed 2627.82 samples/sec Loss 7.7316 LearningRate 0.0333 Epoch: 8 Global Step: 350640 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:08:18,740-Speed 2626.80 samples/sec Loss 7.7160 LearningRate 0.0333 Epoch: 8 Global Step: 350650 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:08:22,640-Speed 2625.97 samples/sec Loss 7.6971 LearningRate 0.0333 Epoch: 8 Global Step: 350660 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:08:26,561-Speed 2612.07 samples/sec Loss 7.7239 LearningRate 0.0333 Epoch: 8 Global Step: 350670 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:08:30,457-Speed 2629.23 samples/sec Loss 7.5918 LearningRate 0.0333 Epoch: 8 Global Step: 350680 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:08:34,353-Speed 2629.15 samples/sec Loss 7.8339 LearningRate 0.0333 Epoch: 8 Global Step: 350690 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:08:38,284-Speed 2605.53 samples/sec Loss 7.7127 LearningRate 0.0333 Epoch: 8 Global Step: 350700 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:08:42,178-Speed 2630.09 samples/sec Loss 7.8334 LearningRate 0.0333 Epoch: 8 Global Step: 350710 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:08:46,091-Speed 2618.17 samples/sec Loss 7.7336 LearningRate 0.0333 Epoch: 8 Global Step: 350720 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:08:49,995-Speed 2623.23 samples/sec Loss 7.6790 LearningRate 0.0333 Epoch: 8 Global Step: 350730 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:08:53,926-Speed 2605.93 samples/sec Loss 7.6994 LearningRate 0.0333 Epoch: 8 Global Step: 350740 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:08:57,867-Speed 2599.27 samples/sec Loss 7.8676 LearningRate 0.0333 Epoch: 8 Global Step: 350750 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:09:01,768-Speed 2625.33 samples/sec Loss 7.8734 LearningRate 0.0333 Epoch: 8 Global Step: 350760 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:09:05,660-Speed 2631.57 samples/sec Loss 7.8518 LearningRate 0.0333 Epoch: 8 Global Step: 350770 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:09:09,545-Speed 2637.11 samples/sec Loss 7.7964 LearningRate 0.0333 Epoch: 8 Global Step: 350780 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:09:13,445-Speed 2625.71 samples/sec Loss 7.8038 LearningRate 0.0333 Epoch: 8 Global Step: 350790 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:09:17,340-Speed 2629.71 samples/sec Loss 7.8734 LearningRate 0.0333 Epoch: 8 Global Step: 350800 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:09:21,242-Speed 2625.66 samples/sec Loss 7.7735 LearningRate 0.0333 Epoch: 8 Global Step: 350810 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:09:25,157-Speed 2615.87 samples/sec Loss 7.7940 LearningRate 0.0333 Epoch: 8 Global Step: 350820 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:09:29,072-Speed 2616.56 samples/sec Loss 7.9704 LearningRate 0.0333 Epoch: 8 Global Step: 350830 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:09:32,972-Speed 2625.98 samples/sec Loss 7.9626 LearningRate 0.0333 Epoch: 8 Global Step: 350840 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:09:36,854-Speed 2638.34 samples/sec Loss 7.8278 LearningRate 0.0333 Epoch: 8 Global Step: 350850 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:09:40,749-Speed 2629.80 samples/sec Loss 7.7911 LearningRate 0.0333 Epoch: 8 Global Step: 350860 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:09:44,669-Speed 2613.22 samples/sec Loss 7.8576 LearningRate 0.0333 Epoch: 8 Global Step: 350870 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:09:48,566-Speed 2627.98 samples/sec Loss 7.7523 LearningRate 0.0333 Epoch: 8 Global Step: 350880 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:09:52,466-Speed 2630.46 samples/sec Loss 7.8274 LearningRate 0.0333 Epoch: 8 Global Step: 350890 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:09:56,368-Speed 2625.47 samples/sec Loss 7.8681 LearningRate 0.0333 Epoch: 8 Global Step: 350900 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:10:00,270-Speed 2624.68 samples/sec Loss 7.6925 LearningRate 0.0333 Epoch: 8 Global Step: 350910 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:10:04,165-Speed 2630.14 samples/sec Loss 7.7920 LearningRate 0.0333 Epoch: 8 Global Step: 350920 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:10:08,061-Speed 2628.79 samples/sec Loss 7.7769 LearningRate 0.0333 Epoch: 8 Global Step: 350930 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:10:11,972-Speed 2618.49 samples/sec Loss 7.7720 LearningRate 0.0333 Epoch: 8 Global Step: 350940 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:10:15,873-Speed 2625.98 samples/sec Loss 7.7756 LearningRate 0.0333 Epoch: 8 Global Step: 350950 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:10:19,775-Speed 2625.21 samples/sec Loss 7.6414 LearningRate 0.0333 Epoch: 8 Global Step: 350960 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:10:23,680-Speed 2622.49 samples/sec Loss 7.8539 LearningRate 0.0333 Epoch: 8 Global Step: 350970 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:10:27,577-Speed 2628.87 samples/sec Loss 7.7015 LearningRate 0.0333 Epoch: 8 Global Step: 350980 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:10:31,487-Speed 2619.21 samples/sec Loss 7.8002 LearningRate 0.0333 Epoch: 8 Global Step: 350990 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:10:35,392-Speed 2623.11 samples/sec Loss 7.7378 LearningRate 0.0333 Epoch: 8 Global Step: 351000 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:10:39,292-Speed 2625.54 samples/sec Loss 7.8077 LearningRate 0.0333 Epoch: 8 Global Step: 351010 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:10:43,198-Speed 2622.82 samples/sec Loss 7.7743 LearningRate 0.0333 Epoch: 8 Global Step: 351020 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:10:47,091-Speed 2630.64 samples/sec Loss 7.8082 LearningRate 0.0333 Epoch: 8 Global Step: 351030 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:10:50,908-Speed 2684.10 samples/sec Loss 8.1792 LearningRate 0.0333 Epoch: 8 Global Step: 351040 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 11:10:54,806-Speed 2627.51 samples/sec Loss 9.3272 LearningRate 0.0333 Epoch: 8 Global Step: 351050 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 11:10:58,693-Speed 2635.19 samples/sec Loss 8.1442 LearningRate 0.0333 Epoch: 8 Global Step: 351060 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 11:11:02,608-Speed 2616.24 samples/sec Loss 7.9191 LearningRate 0.0333 Epoch: 8 Global Step: 351070 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 11:11:06,498-Speed 2632.95 samples/sec Loss 7.8261 LearningRate 0.0333 Epoch: 8 Global Step: 351080 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 11:11:10,396-Speed 2626.76 samples/sec Loss 7.9354 LearningRate 0.0333 Epoch: 8 Global Step: 351090 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 11:11:14,345-Speed 2593.88 samples/sec Loss 7.7428 LearningRate 0.0333 Epoch: 8 Global Step: 351100 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 11:11:18,236-Speed 2632.25 samples/sec Loss 7.8871 LearningRate 0.0333 Epoch: 8 Global Step: 351110 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 11:11:22,127-Speed 2632.46 samples/sec Loss 7.8608 LearningRate 0.0333 Epoch: 8 Global Step: 351120 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 11:11:26,030-Speed 2624.81 samples/sec Loss 7.8095 LearningRate 0.0333 Epoch: 8 Global Step: 351130 Fp16 Grad Scale: 2048 Required: 54 hours
Training: 2022-04-14 11:11:29,929-Speed 2626.77 samples/sec Loss 7.8988 LearningRate 0.0333 Epoch: 8 Global Step: 351140 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 11:11:33,842-Speed 2617.52 samples/sec Loss 7.7470 LearningRate 0.0333 Epoch: 8 Global Step: 351150 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 11:11:37,744-Speed 2624.81 samples/sec Loss 7.8384 LearningRate 0.0333 Epoch: 8 Global Step: 351160 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 11:11:41,637-Speed 2630.48 samples/sec Loss 7.8907 LearningRate 0.0333 Epoch: 8 Global Step: 351170 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 11:11:45,534-Speed 2628.63 samples/sec Loss 7.7596 LearningRate 0.0333 Epoch: 8 Global Step: 351180 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 11:11:49,432-Speed 2627.31 samples/sec Loss 7.7814 LearningRate 0.0333 Epoch: 8 Global Step: 351190 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 11:11:53,339-Speed 2622.57 samples/sec Loss 7.8551 LearningRate 0.0333 Epoch: 8 Global Step: 351200 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 11:11:57,258-Speed 2613.28 samples/sec Loss 7.7208 LearningRate 0.0333 Epoch: 8 Global Step: 351210 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 11:12:01,155-Speed 2628.75 samples/sec Loss 7.8352 LearningRate 0.0333 Epoch: 8 Global Step: 351220 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 11:12:05,053-Speed 2627.08 samples/sec Loss 7.7922 LearningRate 0.0332 Epoch: 8 Global Step: 351230 Fp16 Grad Scale: 4096 Required: 54 hours
Training: 2022-04-14 11:12:08,949-Speed 2629.16 samples/sec Loss 7.7632 LearningRate 0.0332 Epoch: 8 Global Step: 351240 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:12:12,856-Speed 2621.86 samples/sec Loss 7.7654 LearningRate 0.0332 Epoch: 8 Global Step: 351250 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:12:16,749-Speed 2630.50 samples/sec Loss 7.8154 LearningRate 0.0332 Epoch: 8 Global Step: 351260 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:12:20,644-Speed 2630.16 samples/sec Loss 7.7952 LearningRate 0.0332 Epoch: 8 Global Step: 351270 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:12:24,534-Speed 2632.34 samples/sec Loss 7.6619 LearningRate 0.0332 Epoch: 8 Global Step: 351280 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:12:28,432-Speed 2628.11 samples/sec Loss 7.8478 LearningRate 0.0332 Epoch: 8 Global Step: 351290 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:12:32,332-Speed 2626.13 samples/sec Loss 7.8643 LearningRate 0.0332 Epoch: 8 Global Step: 351300 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:12:36,235-Speed 2624.96 samples/sec Loss 7.8296 LearningRate 0.0332 Epoch: 8 Global Step: 351310 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:12:40,141-Speed 2622.30 samples/sec Loss 7.8519 LearningRate 0.0332 Epoch: 8 Global Step: 351320 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:12:44,054-Speed 2617.41 samples/sec Loss 7.9729 LearningRate 0.0332 Epoch: 8 Global Step: 351330 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:12:47,957-Speed 2624.13 samples/sec Loss 7.7841 LearningRate 0.0332 Epoch: 8 Global Step: 351340 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:12:51,855-Speed 2627.87 samples/sec Loss 7.7076 LearningRate 0.0332 Epoch: 8 Global Step: 351350 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:12:55,776-Speed 2612.06 samples/sec Loss 7.7038 LearningRate 0.0332 Epoch: 8 Global Step: 351360 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:12:59,671-Speed 2629.72 samples/sec Loss 7.7503 LearningRate 0.0332 Epoch: 8 Global Step: 351370 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:13:03,574-Speed 2624.20 samples/sec Loss 7.7777 LearningRate 0.0332 Epoch: 8 Global Step: 351380 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:13:07,468-Speed 2630.26 samples/sec Loss 7.8727 LearningRate 0.0332 Epoch: 8 Global Step: 351390 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:13:11,366-Speed 2627.47 samples/sec Loss 7.8006 LearningRate 0.0332 Epoch: 8 Global Step: 351400 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:13:15,275-Speed 2620.22 samples/sec Loss 7.7961 LearningRate 0.0332 Epoch: 8 Global Step: 351410 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:13:19,169-Speed 2630.75 samples/sec Loss 7.6870 LearningRate 0.0332 Epoch: 8 Global Step: 351420 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:13:23,061-Speed 2631.16 samples/sec Loss 7.7741 LearningRate 0.0332 Epoch: 8 Global Step: 351430 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:13:26,965-Speed 2624.69 samples/sec Loss 7.7511 LearningRate 0.0332 Epoch: 8 Global Step: 351440 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:13:30,861-Speed 2628.33 samples/sec Loss 7.7481 LearningRate 0.0332 Epoch: 8 Global Step: 351450 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:13:34,764-Speed 2624.21 samples/sec Loss 7.8239 LearningRate 0.0332 Epoch: 8 Global Step: 351460 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:13:38,684-Speed 2613.18 samples/sec Loss 7.7865 LearningRate 0.0332 Epoch: 8 Global Step: 351470 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:13:42,588-Speed 2623.11 samples/sec Loss 7.7278 LearningRate 0.0332 Epoch: 8 Global Step: 351480 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:13:46,510-Speed 2611.62 samples/sec Loss 7.6071 LearningRate 0.0332 Epoch: 8 Global Step: 351490 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:13:50,410-Speed 2626.51 samples/sec Loss 7.8003 LearningRate 0.0332 Epoch: 8 Global Step: 351500 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:13:54,305-Speed 2629.39 samples/sec Loss 7.8374 LearningRate 0.0332 Epoch: 8 Global Step: 351510 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:13:58,313-Speed 2555.71 samples/sec Loss 7.7770 LearningRate 0.0332 Epoch: 8 Global Step: 351520 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:14:02,245-Speed 2604.57 samples/sec Loss 7.7797 LearningRate 0.0332 Epoch: 8 Global Step: 351530 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:14:06,135-Speed 2633.28 samples/sec Loss 7.8213 LearningRate 0.0332 Epoch: 8 Global Step: 351540 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:14:10,012-Speed 2641.94 samples/sec Loss 7.7219 LearningRate 0.0332 Epoch: 8 Global Step: 351550 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:14:13,903-Speed 2631.81 samples/sec Loss 7.7892 LearningRate 0.0332 Epoch: 8 Global Step: 351560 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:14:17,797-Speed 2630.11 samples/sec Loss 7.8427 LearningRate 0.0332 Epoch: 8 Global Step: 351570 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:14:21,689-Speed 2632.19 samples/sec Loss 7.7352 LearningRate 0.0332 Epoch: 8 Global Step: 351580 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:14:25,583-Speed 2630.39 samples/sec Loss 7.7759 LearningRate 0.0332 Epoch: 8 Global Step: 351590 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:14:29,473-Speed 2632.59 samples/sec Loss 7.8469 LearningRate 0.0332 Epoch: 8 Global Step: 351600 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:14:33,365-Speed 2631.56 samples/sec Loss 7.8256 LearningRate 0.0332 Epoch: 8 Global Step: 351610 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:14:37,260-Speed 2629.47 samples/sec Loss 7.7491 LearningRate 0.0332 Epoch: 8 Global Step: 351620 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:14:41,151-Speed 2632.59 samples/sec Loss 7.6474 LearningRate 0.0332 Epoch: 8 Global Step: 351630 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:14:45,042-Speed 2632.50 samples/sec Loss 7.7477 LearningRate 0.0332 Epoch: 8 Global Step: 351640 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:14:48,938-Speed 2628.80 samples/sec Loss 7.9022 LearningRate 0.0332 Epoch: 8 Global Step: 351650 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:14:52,831-Speed 2630.78 samples/sec Loss 7.7320 LearningRate 0.0332 Epoch: 8 Global Step: 351660 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:14:56,726-Speed 2629.63 samples/sec Loss 7.8005 LearningRate 0.0332 Epoch: 8 Global Step: 351670 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:15:00,617-Speed 2632.31 samples/sec Loss 7.8496 LearningRate 0.0332 Epoch: 8 Global Step: 351680 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:15:04,511-Speed 2629.85 samples/sec Loss 7.7714 LearningRate 0.0332 Epoch: 8 Global Step: 351690 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:15:08,414-Speed 2624.51 samples/sec Loss 7.8272 LearningRate 0.0332 Epoch: 8 Global Step: 351700 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:15:12,310-Speed 2629.26 samples/sec Loss 7.7797 LearningRate 0.0332 Epoch: 8 Global Step: 351710 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:15:16,208-Speed 2627.77 samples/sec Loss 7.7957 LearningRate 0.0332 Epoch: 8 Global Step: 351720 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:15:20,104-Speed 2628.81 samples/sec Loss 7.6623 LearningRate 0.0332 Epoch: 8 Global Step: 351730 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:15:23,991-Speed 2634.99 samples/sec Loss 7.7437 LearningRate 0.0332 Epoch: 8 Global Step: 351740 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:15:27,886-Speed 2629.89 samples/sec Loss 7.8103 LearningRate 0.0332 Epoch: 8 Global Step: 351750 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:15:31,772-Speed 2635.74 samples/sec Loss 7.7353 LearningRate 0.0332 Epoch: 8 Global Step: 351760 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:15:35,690-Speed 2613.94 samples/sec Loss 7.7493 LearningRate 0.0332 Epoch: 8 Global Step: 351770 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:15:39,585-Speed 2629.58 samples/sec Loss 7.7535 LearningRate 0.0332 Epoch: 8 Global Step: 351780 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:15:43,478-Speed 2631.48 samples/sec Loss 7.6010 LearningRate 0.0332 Epoch: 8 Global Step: 351790 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:15:47,373-Speed 2629.25 samples/sec Loss 7.7496 LearningRate 0.0332 Epoch: 8 Global Step: 351800 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:15:51,265-Speed 2632.05 samples/sec Loss 7.6999 LearningRate 0.0332 Epoch: 8 Global Step: 351810 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:15:55,161-Speed 2628.70 samples/sec Loss 7.9025 LearningRate 0.0332 Epoch: 8 Global Step: 351820 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:15:59,055-Speed 2630.81 samples/sec Loss 7.8101 LearningRate 0.0332 Epoch: 8 Global Step: 351830 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:16:02,955-Speed 2625.95 samples/sec Loss 7.7144 LearningRate 0.0332 Epoch: 8 Global Step: 351840 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:16:06,844-Speed 2633.49 samples/sec Loss 7.8209 LearningRate 0.0332 Epoch: 8 Global Step: 351850 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:16:10,745-Speed 2625.13 samples/sec Loss 7.7749 LearningRate 0.0332 Epoch: 8 Global Step: 351860 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:16:14,645-Speed 2626.90 samples/sec Loss 7.6993 LearningRate 0.0332 Epoch: 8 Global Step: 351870 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:16:18,542-Speed 2627.81 samples/sec Loss 7.8736 LearningRate 0.0332 Epoch: 8 Global Step: 351880 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:16:22,450-Speed 2621.32 samples/sec Loss 7.6983 LearningRate 0.0332 Epoch: 8 Global Step: 351890 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:16:26,348-Speed 2627.07 samples/sec Loss 7.6099 LearningRate 0.0332 Epoch: 8 Global Step: 351900 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:16:30,266-Speed 2614.89 samples/sec Loss 7.6132 LearningRate 0.0332 Epoch: 8 Global Step: 351910 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:16:34,168-Speed 2624.54 samples/sec Loss 7.8816 LearningRate 0.0332 Epoch: 8 Global Step: 351920 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:16:38,065-Speed 2628.45 samples/sec Loss 7.7181 LearningRate 0.0332 Epoch: 8 Global Step: 351930 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:16:41,958-Speed 2630.85 samples/sec Loss 7.8355 LearningRate 0.0332 Epoch: 8 Global Step: 351940 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:16:45,853-Speed 2629.46 samples/sec Loss 7.8648 LearningRate 0.0331 Epoch: 8 Global Step: 351950 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:16:49,756-Speed 2623.72 samples/sec Loss 7.7304 LearningRate 0.0331 Epoch: 8 Global Step: 351960 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:16:53,667-Speed 2619.39 samples/sec Loss 7.7022 LearningRate 0.0331 Epoch: 8 Global Step: 351970 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:16:57,548-Speed 2639.09 samples/sec Loss 7.8777 LearningRate 0.0331 Epoch: 8 Global Step: 351980 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:17:01,441-Speed 2630.70 samples/sec Loss 7.8208 LearningRate 0.0331 Epoch: 8 Global Step: 351990 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:17:05,338-Speed 2628.28 samples/sec Loss 7.6680 LearningRate 0.0331 Epoch: 8 Global Step: 352000 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:17:09,237-Speed 2627.28 samples/sec Loss 7.6868 LearningRate 0.0331 Epoch: 8 Global Step: 352010 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:17:13,134-Speed 2628.28 samples/sec Loss 7.7793 LearningRate 0.0331 Epoch: 8 Global Step: 352020 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:17:17,030-Speed 2628.59 samples/sec Loss 7.9275 LearningRate 0.0331 Epoch: 8 Global Step: 352030 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:17:20,933-Speed 2624.74 samples/sec Loss 7.8273 LearningRate 0.0331 Epoch: 8 Global Step: 352040 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:17:24,834-Speed 2625.11 samples/sec Loss 7.6697 LearningRate 0.0331 Epoch: 8 Global Step: 352050 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:17:28,734-Speed 2626.74 samples/sec Loss 7.9582 LearningRate 0.0331 Epoch: 8 Global Step: 352060 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:17:32,630-Speed 2628.50 samples/sec Loss 7.6386 LearningRate 0.0331 Epoch: 8 Global Step: 352070 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:17:36,530-Speed 2625.89 samples/sec Loss 7.8181 LearningRate 0.0331 Epoch: 8 Global Step: 352080 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:17:40,439-Speed 2620.45 samples/sec Loss 7.8020 LearningRate 0.0331 Epoch: 8 Global Step: 352090 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:17:44,338-Speed 2627.12 samples/sec Loss 7.7761 LearningRate 0.0331 Epoch: 8 Global Step: 352100 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:17:48,338-Speed 2560.31 samples/sec Loss 7.7419 LearningRate 0.0331 Epoch: 8 Global Step: 352110 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:17:52,227-Speed 2634.07 samples/sec Loss 7.8176 LearningRate 0.0331 Epoch: 8 Global Step: 352120 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:17:56,127-Speed 2626.30 samples/sec Loss 7.7232 LearningRate 0.0331 Epoch: 8 Global Step: 352130 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:18:00,021-Speed 2630.24 samples/sec Loss 7.6753 LearningRate 0.0331 Epoch: 8 Global Step: 352140 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:18:03,916-Speed 2629.50 samples/sec Loss 7.7315 LearningRate 0.0331 Epoch: 8 Global Step: 352150 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:18:07,818-Speed 2624.41 samples/sec Loss 7.8756 LearningRate 0.0331 Epoch: 8 Global Step: 352160 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:18:11,715-Speed 2628.43 samples/sec Loss 7.7090 LearningRate 0.0331 Epoch: 8 Global Step: 352170 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:18:15,602-Speed 2635.29 samples/sec Loss 7.7268 LearningRate 0.0331 Epoch: 8 Global Step: 352180 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:18:19,503-Speed 2625.73 samples/sec Loss 7.7500 LearningRate 0.0331 Epoch: 8 Global Step: 352190 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:18:23,394-Speed 2632.16 samples/sec Loss 7.6585 LearningRate 0.0331 Epoch: 8 Global Step: 352200 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:18:27,265-Speed 2646.46 samples/sec Loss 7.9038 LearningRate 0.0331 Epoch: 8 Global Step: 352210 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:18:31,159-Speed 2630.04 samples/sec Loss 7.7393 LearningRate 0.0331 Epoch: 8 Global Step: 352220 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:18:35,051-Speed 2631.59 samples/sec Loss 7.6861 LearningRate 0.0331 Epoch: 8 Global Step: 352230 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:18:38,947-Speed 2629.20 samples/sec Loss 7.7712 LearningRate 0.0331 Epoch: 8 Global Step: 352240 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:18:42,840-Speed 2630.84 samples/sec Loss 7.7484 LearningRate 0.0331 Epoch: 8 Global Step: 352250 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:18:46,734-Speed 2629.73 samples/sec Loss 7.8081 LearningRate 0.0331 Epoch: 8 Global Step: 352260 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:18:50,629-Speed 2631.00 samples/sec Loss 7.8091 LearningRate 0.0331 Epoch: 8 Global Step: 352270 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:18:54,520-Speed 2632.71 samples/sec Loss 7.6651 LearningRate 0.0331 Epoch: 8 Global Step: 352280 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:18:58,413-Speed 2630.93 samples/sec Loss 7.7024 LearningRate 0.0331 Epoch: 8 Global Step: 352290 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:19:02,307-Speed 2630.11 samples/sec Loss 7.7941 LearningRate 0.0331 Epoch: 8 Global Step: 352300 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:19:06,206-Speed 2627.33 samples/sec Loss 7.6835 LearningRate 0.0331 Epoch: 8 Global Step: 352310 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:19:10,088-Speed 2638.25 samples/sec Loss 7.7797 LearningRate 0.0331 Epoch: 8 Global Step: 352320 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:19:14,008-Speed 2612.69 samples/sec Loss 7.7088 LearningRate 0.0331 Epoch: 8 Global Step: 352330 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:19:17,902-Speed 2630.44 samples/sec Loss 7.7553 LearningRate 0.0331 Epoch: 8 Global Step: 352340 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:19:21,797-Speed 2630.12 samples/sec Loss 7.7814 LearningRate 0.0331 Epoch: 8 Global Step: 352350 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:19:25,682-Speed 2636.99 samples/sec Loss 7.7713 LearningRate 0.0331 Epoch: 8 Global Step: 352360 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:19:29,568-Speed 2635.72 samples/sec Loss 8.8057 LearningRate 0.0331 Epoch: 8 Global Step: 352370 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:19:33,456-Speed 2634.92 samples/sec Loss 7.9395 LearningRate 0.0331 Epoch: 8 Global Step: 352380 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:19:37,359-Speed 2624.18 samples/sec Loss 7.8710 LearningRate 0.0331 Epoch: 8 Global Step: 352390 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:19:41,248-Speed 2633.72 samples/sec Loss 7.9290 LearningRate 0.0331 Epoch: 8 Global Step: 352400 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:19:45,138-Speed 2633.13 samples/sec Loss 7.8138 LearningRate 0.0331 Epoch: 8 Global Step: 352410 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:19:49,039-Speed 2625.45 samples/sec Loss 7.8193 LearningRate 0.0331 Epoch: 8 Global Step: 352420 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:19:52,931-Speed 2631.84 samples/sec Loss 7.8404 LearningRate 0.0331 Epoch: 8 Global Step: 352430 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:19:56,825-Speed 2630.23 samples/sec Loss 7.7288 LearningRate 0.0331 Epoch: 8 Global Step: 352440 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:20:00,718-Speed 2630.79 samples/sec Loss 7.8433 LearningRate 0.0331 Epoch: 8 Global Step: 352450 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:20:04,609-Speed 2632.72 samples/sec Loss 7.7060 LearningRate 0.0331 Epoch: 8 Global Step: 352460 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:20:08,507-Speed 2628.10 samples/sec Loss 7.9089 LearningRate 0.0331 Epoch: 8 Global Step: 352470 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:20:12,404-Speed 2627.78 samples/sec Loss 7.8366 LearningRate 0.0331 Epoch: 8 Global Step: 352480 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:20:16,306-Speed 2624.90 samples/sec Loss 7.7739 LearningRate 0.0331 Epoch: 8 Global Step: 352490 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:20:20,202-Speed 2629.42 samples/sec Loss 7.8664 LearningRate 0.0331 Epoch: 8 Global Step: 352500 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:20:24,094-Speed 2630.85 samples/sec Loss 7.7047 LearningRate 0.0331 Epoch: 8 Global Step: 352510 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:20:27,998-Speed 2623.56 samples/sec Loss 7.7102 LearningRate 0.0331 Epoch: 8 Global Step: 352520 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:20:31,892-Speed 2631.02 samples/sec Loss 7.8807 LearningRate 0.0331 Epoch: 8 Global Step: 352530 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:20:35,784-Speed 2632.18 samples/sec Loss 7.6440 LearningRate 0.0331 Epoch: 8 Global Step: 352540 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:20:39,676-Speed 2631.35 samples/sec Loss 7.7189 LearningRate 0.0331 Epoch: 8 Global Step: 352550 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:20:43,574-Speed 2627.90 samples/sec Loss 7.6957 LearningRate 0.0331 Epoch: 8 Global Step: 352560 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:20:47,467-Speed 2631.01 samples/sec Loss 7.7733 LearningRate 0.0331 Epoch: 8 Global Step: 352570 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:20:51,359-Speed 2631.91 samples/sec Loss 7.6328 LearningRate 0.0331 Epoch: 8 Global Step: 352580 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:20:55,255-Speed 2629.22 samples/sec Loss 7.7147 LearningRate 0.0331 Epoch: 8 Global Step: 352590 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:20:59,150-Speed 2629.72 samples/sec Loss 7.7002 LearningRate 0.0331 Epoch: 8 Global Step: 352600 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:21:03,094-Speed 2597.32 samples/sec Loss 7.6745 LearningRate 0.0331 Epoch: 8 Global Step: 352610 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:21:06,985-Speed 2632.19 samples/sec Loss 7.7310 LearningRate 0.0331 Epoch: 8 Global Step: 352620 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:21:10,881-Speed 2628.79 samples/sec Loss 7.8366 LearningRate 0.0331 Epoch: 8 Global Step: 352630 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:21:14,786-Speed 2623.20 samples/sec Loss 7.8245 LearningRate 0.0331 Epoch: 8 Global Step: 352640 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:21:18,677-Speed 2631.94 samples/sec Loss 7.9079 LearningRate 0.0331 Epoch: 8 Global Step: 352650 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:21:22,567-Speed 2632.83 samples/sec Loss 7.6749 LearningRate 0.0331 Epoch: 8 Global Step: 352660 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:21:26,459-Speed 2632.34 samples/sec Loss 7.6411 LearningRate 0.0330 Epoch: 8 Global Step: 352670 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:21:30,357-Speed 2627.39 samples/sec Loss 7.7703 LearningRate 0.0330 Epoch: 8 Global Step: 352680 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:21:34,255-Speed 2627.41 samples/sec Loss 7.7941 LearningRate 0.0330 Epoch: 8 Global Step: 352690 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:21:38,153-Speed 2627.92 samples/sec Loss 7.6634 LearningRate 0.0330 Epoch: 8 Global Step: 352700 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:21:42,063-Speed 2619.31 samples/sec Loss 7.6120 LearningRate 0.0330 Epoch: 8 Global Step: 352710 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:21:45,973-Speed 2619.28 samples/sec Loss 7.8337 LearningRate 0.0330 Epoch: 8 Global Step: 352720 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:21:49,875-Speed 2625.03 samples/sec Loss 7.7197 LearningRate 0.0330 Epoch: 8 Global Step: 352730 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:21:53,776-Speed 2625.85 samples/sec Loss 7.6795 LearningRate 0.0330 Epoch: 8 Global Step: 352740 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:21:57,683-Speed 2621.30 samples/sec Loss 7.6591 LearningRate 0.0330 Epoch: 8 Global Step: 352750 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:22:01,587-Speed 2624.40 samples/sec Loss 7.8060 LearningRate 0.0330 Epoch: 8 Global Step: 352760 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:22:05,494-Speed 2621.05 samples/sec Loss 7.7096 LearningRate 0.0330 Epoch: 8 Global Step: 352770 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:22:09,397-Speed 2624.04 samples/sec Loss 7.6915 LearningRate 0.0330 Epoch: 8 Global Step: 352780 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:22:13,266-Speed 2647.41 samples/sec Loss 7.7622 LearningRate 0.0330 Epoch: 8 Global Step: 352790 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:22:17,181-Speed 2615.96 samples/sec Loss 7.7834 LearningRate 0.0330 Epoch: 8 Global Step: 352800 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:22:21,080-Speed 2627.59 samples/sec Loss 7.6632 LearningRate 0.0330 Epoch: 8 Global Step: 352810 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:22:24,978-Speed 2627.43 samples/sec Loss 7.6464 LearningRate 0.0330 Epoch: 8 Global Step: 352820 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:22:28,887-Speed 2620.72 samples/sec Loss 7.8285 LearningRate 0.0330 Epoch: 8 Global Step: 352830 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:22:32,824-Speed 2601.27 samples/sec Loss 7.6549 LearningRate 0.0330 Epoch: 8 Global Step: 352840 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:22:36,716-Speed 2631.76 samples/sec Loss 7.6766 LearningRate 0.0330 Epoch: 8 Global Step: 352850 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:22:40,615-Speed 2626.88 samples/sec Loss 7.7217 LearningRate 0.0330 Epoch: 8 Global Step: 352860 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:22:44,512-Speed 2628.78 samples/sec Loss 7.7299 LearningRate 0.0330 Epoch: 8 Global Step: 352870 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:22:48,442-Speed 2606.46 samples/sec Loss 7.7680 LearningRate 0.0330 Epoch: 8 Global Step: 352880 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:22:52,334-Speed 2631.66 samples/sec Loss 7.6769 LearningRate 0.0330 Epoch: 8 Global Step: 352890 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:22:56,230-Speed 2628.93 samples/sec Loss 7.7476 LearningRate 0.0330 Epoch: 8 Global Step: 352900 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:23:00,132-Speed 2625.26 samples/sec Loss 7.7090 LearningRate 0.0330 Epoch: 8 Global Step: 352910 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:23:04,030-Speed 2627.79 samples/sec Loss 7.7314 LearningRate 0.0330 Epoch: 8 Global Step: 352920 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:23:07,938-Speed 2621.10 samples/sec Loss 7.8016 LearningRate 0.0330 Epoch: 8 Global Step: 352930 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:23:11,829-Speed 2631.98 samples/sec Loss 7.6822 LearningRate 0.0330 Epoch: 8 Global Step: 352940 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:23:15,721-Speed 2632.10 samples/sec Loss 7.7345 LearningRate 0.0330 Epoch: 8 Global Step: 352950 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:23:19,614-Speed 2630.84 samples/sec Loss 7.7682 LearningRate 0.0330 Epoch: 8 Global Step: 352960 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:23:23,513-Speed 2627.27 samples/sec Loss 7.7469 LearningRate 0.0330 Epoch: 8 Global Step: 352970 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:23:27,435-Speed 2611.37 samples/sec Loss 7.6839 LearningRate 0.0330 Epoch: 8 Global Step: 352980 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:23:31,334-Speed 2627.35 samples/sec Loss 7.6865 LearningRate 0.0330 Epoch: 8 Global Step: 352990 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:23:35,255-Speed 2612.25 samples/sec Loss 7.7226 LearningRate 0.0330 Epoch: 8 Global Step: 353000 Fp16 Grad Scale: 262144 Required: 54 hours
Training: 2022-04-14 11:23:39,126-Speed 2645.77 samples/sec Loss 7.8635 LearningRate 0.0330 Epoch: 8 Global Step: 353010 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:23:43,009-Speed 2638.11 samples/sec Loss 7.7344 LearningRate 0.0330 Epoch: 8 Global Step: 353020 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:23:46,906-Speed 2628.33 samples/sec Loss 7.6597 LearningRate 0.0330 Epoch: 8 Global Step: 353030 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:23:50,801-Speed 2629.67 samples/sec Loss 7.7640 LearningRate 0.0330 Epoch: 8 Global Step: 353040 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:23:54,690-Speed 2633.86 samples/sec Loss 7.6786 LearningRate 0.0330 Epoch: 8 Global Step: 353050 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:23:58,583-Speed 2630.66 samples/sec Loss 7.6358 LearningRate 0.0330 Epoch: 8 Global Step: 353060 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:24:02,479-Speed 2628.99 samples/sec Loss 7.7535 LearningRate 0.0330 Epoch: 8 Global Step: 353070 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:24:06,372-Speed 2631.37 samples/sec Loss 7.8139 LearningRate 0.0330 Epoch: 8 Global Step: 353080 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:24:10,264-Speed 2631.70 samples/sec Loss 7.7558 LearningRate 0.0330 Epoch: 8 Global Step: 353090 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:24:14,192-Speed 2607.19 samples/sec Loss 7.9047 LearningRate 0.0330 Epoch: 8 Global Step: 353100 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:24:18,170-Speed 2574.49 samples/sec Loss 7.6999 LearningRate 0.0330 Epoch: 8 Global Step: 353110 Fp16 Grad Scale: 65536 Required: 54 hours
Training: 2022-04-14 11:24:22,068-Speed 2627.88 samples/sec Loss 7.6849 LearningRate 0.0330 Epoch: 8 Global Step: 353120 Fp16 Grad Scale: 131072 Required: 54 hours
Training: 2022-04-14 11:24:25,909-Speed 2666.61 samples/sec Loss 7.8685 LearningRate 0.0330 Epoch: 8 Global Step: 353130 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:24:29,821-Speed 2618.36 samples/sec Loss 9.8280 LearningRate 0.0330 Epoch: 8 Global Step: 353140 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:24:33,718-Speed 2628.31 samples/sec Loss 8.6324 LearningRate 0.0330 Epoch: 8 Global Step: 353150 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:24:37,620-Speed 2625.36 samples/sec Loss 7.9921 LearningRate 0.0330 Epoch: 8 Global Step: 353160 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:24:41,514-Speed 2630.31 samples/sec Loss 7.9325 LearningRate 0.0330 Epoch: 8 Global Step: 353170 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:24:45,405-Speed 2631.71 samples/sec Loss 7.8013 LearningRate 0.0330 Epoch: 8 Global Step: 353180 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:24:49,293-Speed 2634.38 samples/sec Loss 7.7832 LearningRate 0.0330 Epoch: 8 Global Step: 353190 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:24:53,182-Speed 2633.33 samples/sec Loss 7.9124 LearningRate 0.0330 Epoch: 8 Global Step: 353200 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:24:57,072-Speed 2633.70 samples/sec Loss 7.8042 LearningRate 0.0330 Epoch: 8 Global Step: 353210 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:25:00,959-Speed 2634.63 samples/sec Loss 7.7935 LearningRate 0.0330 Epoch: 8 Global Step: 353220 Fp16 Grad Scale: 8192 Required: 54 hours
Training: 2022-04-14 11:25:04,851-Speed 2632.02 samples/sec Loss 7.7468 LearningRate 0.0330 Epoch: 8 Global Step: 353230 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:25:08,741-Speed 2633.25 samples/sec Loss 7.8733 LearningRate 0.0330 Epoch: 8 Global Step: 353240 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:25:12,641-Speed 2625.82 samples/sec Loss 7.6770 LearningRate 0.0330 Epoch: 8 Global Step: 353250 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:25:16,542-Speed 2626.10 samples/sec Loss 7.7854 LearningRate 0.0330 Epoch: 8 Global Step: 353260 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:25:20,431-Speed 2632.94 samples/sec Loss 7.8589 LearningRate 0.0330 Epoch: 8 Global Step: 353270 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:25:24,329-Speed 2628.24 samples/sec Loss 7.7373 LearningRate 0.0330 Epoch: 8 Global Step: 353280 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:25:28,226-Speed 2628.02 samples/sec Loss 7.7127 LearningRate 0.0330 Epoch: 8 Global Step: 353290 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:25:32,124-Speed 2627.46 samples/sec Loss 7.7657 LearningRate 0.0330 Epoch: 8 Global Step: 353300 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:25:36,019-Speed 2629.92 samples/sec Loss 7.8977 LearningRate 0.0330 Epoch: 8 Global Step: 353310 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:25:39,910-Speed 2632.77 samples/sec Loss 7.8147 LearningRate 0.0330 Epoch: 8 Global Step: 353320 Fp16 Grad Scale: 16384 Required: 54 hours
Training: 2022-04-14 11:25:43,804-Speed 2630.19 samples/sec Loss 7.7940 LearningRate 0.0330 Epoch: 8 Global Step: 353330 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:25:47,710-Speed 2622.57 samples/sec Loss 7.6067 LearningRate 0.0330 Epoch: 8 Global Step: 353340 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:25:51,599-Speed 2633.44 samples/sec Loss 7.7015 LearningRate 0.0330 Epoch: 8 Global Step: 353350 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:25:55,491-Speed 2631.88 samples/sec Loss 7.7925 LearningRate 0.0330 Epoch: 8 Global Step: 353360 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:25:59,381-Speed 2632.62 samples/sec Loss 7.6135 LearningRate 0.0330 Epoch: 8 Global Step: 353370 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:26:03,277-Speed 2629.01 samples/sec Loss 7.6437 LearningRate 0.0330 Epoch: 8 Global Step: 353380 Fp16 Grad Scale: 32768 Required: 54 hours
Training: 2022-04-14 11:26:07,171-Speed 2630.33 samples/sec Loss 7.6738 LearningRate 0.0329 Epoch: 8 Global Step: 353390 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:26:11,058-Speed 2635.84 samples/sec Loss 7.7603 LearningRate 0.0329 Epoch: 8 Global Step: 353400 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:26:14,959-Speed 2625.11 samples/sec Loss 7.7742 LearningRate 0.0329 Epoch: 8 Global Step: 353410 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:26:18,865-Speed 2622.16 samples/sec Loss 7.7687 LearningRate 0.0329 Epoch: 8 Global Step: 353420 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:26:22,768-Speed 2623.98 samples/sec Loss 7.8019 LearningRate 0.0329 Epoch: 8 Global Step: 353430 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:26:26,672-Speed 2624.22 samples/sec Loss 7.7580 LearningRate 0.0329 Epoch: 8 Global Step: 353440 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:26:30,590-Speed 2613.56 samples/sec Loss 7.8088 LearningRate 0.0329 Epoch: 8 Global Step: 353450 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:26:34,508-Speed 2614.41 samples/sec Loss 7.6909 LearningRate 0.0329 Epoch: 8 Global Step: 353460 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:26:38,415-Speed 2621.34 samples/sec Loss 7.7840 LearningRate 0.0329 Epoch: 8 Global Step: 353470 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:26:42,311-Speed 2629.72 samples/sec Loss 7.8019 LearningRate 0.0329 Epoch: 8 Global Step: 353480 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:26:46,203-Speed 2631.71 samples/sec Loss 7.7623 LearningRate 0.0329 Epoch: 8 Global Step: 353490 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:26:50,097-Speed 2630.30 samples/sec Loss 7.7458 LearningRate 0.0329 Epoch: 8 Global Step: 353500 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:26:53,995-Speed 2628.08 samples/sec Loss 7.8251 LearningRate 0.0329 Epoch: 8 Global Step: 353510 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:26:57,903-Speed 2620.46 samples/sec Loss 7.7511 LearningRate 0.0329 Epoch: 8 Global Step: 353520 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:27:01,809-Speed 2622.43 samples/sec Loss 7.7435 LearningRate 0.0329 Epoch: 8 Global Step: 353530 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:27:05,713-Speed 2623.45 samples/sec Loss 7.8046 LearningRate 0.0329 Epoch: 8 Global Step: 353540 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:27:09,605-Speed 2632.13 samples/sec Loss 7.7732 LearningRate 0.0329 Epoch: 8 Global Step: 353550 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:27:13,495-Speed 2633.19 samples/sec Loss 7.8197 LearningRate 0.0329 Epoch: 8 Global Step: 353560 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:27:17,428-Speed 2604.05 samples/sec Loss 7.7460 LearningRate 0.0329 Epoch: 8 Global Step: 353570 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:27:21,307-Speed 2640.79 samples/sec Loss 7.8878 LearningRate 0.0329 Epoch: 8 Global Step: 353580 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:27:25,123-Speed 2684.30 samples/sec Loss 8.3791 LearningRate 0.0329 Epoch: 8 Global Step: 353590 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:27:29,028-Speed 2623.42 samples/sec Loss 7.9505 LearningRate 0.0329 Epoch: 8 Global Step: 353600 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:27:32,930-Speed 2624.50 samples/sec Loss 7.6783 LearningRate 0.0329 Epoch: 8 Global Step: 353610 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:27:36,819-Speed 2633.90 samples/sec Loss 7.6657 LearningRate 0.0329 Epoch: 8 Global Step: 353620 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:27:40,729-Speed 2619.55 samples/sec Loss 7.7266 LearningRate 0.0329 Epoch: 8 Global Step: 353630 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:27:44,630-Speed 2626.01 samples/sec Loss 7.8480 LearningRate 0.0329 Epoch: 8 Global Step: 353640 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:27:48,564-Speed 2603.46 samples/sec Loss 7.7626 LearningRate 0.0329 Epoch: 8 Global Step: 353650 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:27:52,634-Speed 2516.86 samples/sec Loss 7.7324 LearningRate 0.0329 Epoch: 8 Global Step: 353660 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:27:56,523-Speed 2633.34 samples/sec Loss 7.7669 LearningRate 0.0329 Epoch: 8 Global Step: 353670 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:28:00,464-Speed 2599.07 samples/sec Loss 7.6657 LearningRate 0.0329 Epoch: 8 Global Step: 353680 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:28:04,440-Speed 2576.22 samples/sec Loss 7.7937 LearningRate 0.0329 Epoch: 8 Global Step: 353690 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:28:08,340-Speed 2626.58 samples/sec Loss 7.7732 LearningRate 0.0329 Epoch: 8 Global Step: 353700 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:28:12,233-Speed 2630.77 samples/sec Loss 7.7560 LearningRate 0.0329 Epoch: 8 Global Step: 353710 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:28:16,129-Speed 2629.06 samples/sec Loss 7.8810 LearningRate 0.0329 Epoch: 8 Global Step: 353720 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:28:20,019-Speed 2633.39 samples/sec Loss 7.6515 LearningRate 0.0329 Epoch: 8 Global Step: 353730 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:28:23,907-Speed 2634.60 samples/sec Loss 7.7915 LearningRate 0.0329 Epoch: 8 Global Step: 353740 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:28:27,796-Speed 2633.01 samples/sec Loss 7.8445 LearningRate 0.0329 Epoch: 8 Global Step: 353750 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:28:31,688-Speed 2632.36 samples/sec Loss 7.7517 LearningRate 0.0329 Epoch: 8 Global Step: 353760 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:28:35,579-Speed 2632.28 samples/sec Loss 7.7213 LearningRate 0.0329 Epoch: 8 Global Step: 353770 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:28:39,474-Speed 2629.83 samples/sec Loss 7.7575 LearningRate 0.0329 Epoch: 8 Global Step: 353780 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:28:43,363-Speed 2633.37 samples/sec Loss 7.7031 LearningRate 0.0329 Epoch: 8 Global Step: 353790 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:28:47,277-Speed 2617.38 samples/sec Loss 7.6645 LearningRate 0.0329 Epoch: 8 Global Step: 353800 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:28:51,182-Speed 2622.74 samples/sec Loss 7.7583 LearningRate 0.0329 Epoch: 8 Global Step: 353810 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:28:55,092-Speed 2619.42 samples/sec Loss 7.5995 LearningRate 0.0329 Epoch: 8 Global Step: 353820 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:28:58,993-Speed 2625.70 samples/sec Loss 7.9105 LearningRate 0.0329 Epoch: 8 Global Step: 353830 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:29:02,901-Speed 2621.70 samples/sec Loss 7.7915 LearningRate 0.0329 Epoch: 8 Global Step: 353840 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:29:06,804-Speed 2623.58 samples/sec Loss 7.7261 LearningRate 0.0329 Epoch: 8 Global Step: 353850 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:29:10,712-Speed 2620.97 samples/sec Loss 7.7490 LearningRate 0.0329 Epoch: 8 Global Step: 353860 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:29:14,612-Speed 2625.90 samples/sec Loss 7.7530 LearningRate 0.0329 Epoch: 8 Global Step: 353870 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:29:18,511-Speed 2627.13 samples/sec Loss 7.7428 LearningRate 0.0329 Epoch: 8 Global Step: 353880 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:29:22,411-Speed 2627.64 samples/sec Loss 7.8082 LearningRate 0.0329 Epoch: 8 Global Step: 353890 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:29:26,307-Speed 2628.95 samples/sec Loss 7.7654 LearningRate 0.0329 Epoch: 8 Global Step: 353900 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:29:30,200-Speed 2631.11 samples/sec Loss 7.7262 LearningRate 0.0329 Epoch: 8 Global Step: 353910 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:29:34,103-Speed 2624.60 samples/sec Loss 7.6751 LearningRate 0.0329 Epoch: 8 Global Step: 353920 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:29:37,996-Speed 2630.95 samples/sec Loss 7.7099 LearningRate 0.0329 Epoch: 8 Global Step: 353930 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:29:41,893-Speed 2628.55 samples/sec Loss 7.7750 LearningRate 0.0329 Epoch: 8 Global Step: 353940 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:29:45,790-Speed 2627.88 samples/sec Loss 7.6540 LearningRate 0.0329 Epoch: 8 Global Step: 353950 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:29:49,690-Speed 2626.87 samples/sec Loss 7.7065 LearningRate 0.0329 Epoch: 8 Global Step: 353960 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:29:53,592-Speed 2625.26 samples/sec Loss 7.8182 LearningRate 0.0329 Epoch: 8 Global Step: 353970 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:29:57,486-Speed 2630.17 samples/sec Loss 7.8049 LearningRate 0.0329 Epoch: 8 Global Step: 353980 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:30:01,375-Speed 2633.46 samples/sec Loss 7.9028 LearningRate 0.0329 Epoch: 8 Global Step: 353990 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:30:05,270-Speed 2629.53 samples/sec Loss 7.6805 LearningRate 0.0329 Epoch: 8 Global Step: 354000 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:30:09,165-Speed 2629.71 samples/sec Loss 7.7037 LearningRate 0.0329 Epoch: 8 Global Step: 354010 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:30:13,074-Speed 2620.66 samples/sec Loss 7.7167 LearningRate 0.0329 Epoch: 8 Global Step: 354020 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:30:16,990-Speed 2615.91 samples/sec Loss 7.6166 LearningRate 0.0329 Epoch: 8 Global Step: 354030 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:30:20,891-Speed 2625.18 samples/sec Loss 7.7741 LearningRate 0.0329 Epoch: 8 Global Step: 354040 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:30:24,783-Speed 2631.56 samples/sec Loss 7.7764 LearningRate 0.0329 Epoch: 8 Global Step: 354050 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:30:28,680-Speed 2627.70 samples/sec Loss 7.6301 LearningRate 0.0329 Epoch: 8 Global Step: 354060 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:30:32,579-Speed 2627.59 samples/sec Loss 7.6690 LearningRate 0.0329 Epoch: 8 Global Step: 354070 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:30:36,474-Speed 2629.48 samples/sec Loss 7.6643 LearningRate 0.0329 Epoch: 8 Global Step: 354080 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:30:40,426-Speed 2591.21 samples/sec Loss 7.7658 LearningRate 0.0329 Epoch: 8 Global Step: 354090 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:30:44,323-Speed 2628.67 samples/sec Loss 7.8065 LearningRate 0.0329 Epoch: 8 Global Step: 354100 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:30:48,229-Speed 2622.29 samples/sec Loss 7.7093 LearningRate 0.0328 Epoch: 8 Global Step: 354110 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:30:52,122-Speed 2631.19 samples/sec Loss 7.6683 LearningRate 0.0328 Epoch: 8 Global Step: 354120 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:30:56,020-Speed 2628.04 samples/sec Loss 7.4942 LearningRate 0.0328 Epoch: 8 Global Step: 354130 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:30:59,913-Speed 2630.62 samples/sec Loss 7.8289 LearningRate 0.0328 Epoch: 8 Global Step: 354140 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:31:03,812-Speed 2626.94 samples/sec Loss 7.7318 LearningRate 0.0328 Epoch: 8 Global Step: 354150 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:31:07,721-Speed 2620.33 samples/sec Loss 7.7792 LearningRate 0.0328 Epoch: 8 Global Step: 354160 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:31:11,614-Speed 2631.45 samples/sec Loss 7.7283 LearningRate 0.0328 Epoch: 8 Global Step: 354170 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:31:15,519-Speed 2622.51 samples/sec Loss 7.6859 LearningRate 0.0328 Epoch: 8 Global Step: 354180 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:31:19,412-Speed 2631.60 samples/sec Loss 7.8472 LearningRate 0.0328 Epoch: 8 Global Step: 354190 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:31:23,309-Speed 2628.22 samples/sec Loss 7.7270 LearningRate 0.0328 Epoch: 8 Global Step: 354200 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:31:27,200-Speed 2632.71 samples/sec Loss 7.7675 LearningRate 0.0328 Epoch: 8 Global Step: 354210 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:31:31,125-Speed 2608.91 samples/sec Loss 7.9092 LearningRate 0.0328 Epoch: 8 Global Step: 354220 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:31:35,001-Speed 2642.93 samples/sec Loss 7.5589 LearningRate 0.0328 Epoch: 8 Global Step: 354230 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:31:38,908-Speed 2621.23 samples/sec Loss 7.7480 LearningRate 0.0328 Epoch: 8 Global Step: 354240 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:31:42,804-Speed 2629.86 samples/sec Loss 7.6476 LearningRate 0.0328 Epoch: 8 Global Step: 354250 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:31:46,719-Speed 2615.96 samples/sec Loss 7.6603 LearningRate 0.0328 Epoch: 8 Global Step: 354260 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:31:50,636-Speed 2615.41 samples/sec Loss 7.7794 LearningRate 0.0328 Epoch: 8 Global Step: 354270 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:31:54,529-Speed 2630.20 samples/sec Loss 8.0824 LearningRate 0.0328 Epoch: 8 Global Step: 354280 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:31:58,386-Speed 2656.18 samples/sec Loss 8.9332 LearningRate 0.0328 Epoch: 8 Global Step: 354290 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:32:02,273-Speed 2634.72 samples/sec Loss 7.9376 LearningRate 0.0328 Epoch: 8 Global Step: 354300 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:32:06,151-Speed 2641.74 samples/sec Loss 8.0626 LearningRate 0.0328 Epoch: 8 Global Step: 354310 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:32:10,046-Speed 2629.72 samples/sec Loss 8.5480 LearningRate 0.0328 Epoch: 8 Global Step: 354320 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:32:13,948-Speed 2625.09 samples/sec Loss 7.6853 LearningRate 0.0328 Epoch: 8 Global Step: 354330 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:32:17,856-Speed 2621.28 samples/sec Loss 7.6114 LearningRate 0.0328 Epoch: 8 Global Step: 354340 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:32:21,754-Speed 2627.63 samples/sec Loss 7.6564 LearningRate 0.0328 Epoch: 8 Global Step: 354350 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:32:25,657-Speed 2624.00 samples/sec Loss 7.7724 LearningRate 0.0328 Epoch: 8 Global Step: 354360 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:32:29,545-Speed 2634.15 samples/sec Loss 7.8676 LearningRate 0.0328 Epoch: 8 Global Step: 354370 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:32:33,438-Speed 2630.92 samples/sec Loss 7.6571 LearningRate 0.0328 Epoch: 8 Global Step: 354380 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:32:37,328-Speed 2633.40 samples/sec Loss 7.7211 LearningRate 0.0328 Epoch: 8 Global Step: 354390 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:32:41,222-Speed 2630.35 samples/sec Loss 7.8398 LearningRate 0.0328 Epoch: 8 Global Step: 354400 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:32:45,117-Speed 2629.42 samples/sec Loss 7.7153 LearningRate 0.0328 Epoch: 8 Global Step: 354410 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:32:49,009-Speed 2632.17 samples/sec Loss 7.8376 LearningRate 0.0328 Epoch: 8 Global Step: 354420 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:32:52,923-Speed 2616.17 samples/sec Loss 7.7752 LearningRate 0.0328 Epoch: 8 Global Step: 354430 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:32:56,815-Speed 2632.78 samples/sec Loss 7.7375 LearningRate 0.0328 Epoch: 8 Global Step: 354440 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:33:00,702-Speed 2634.83 samples/sec Loss 7.6159 LearningRate 0.0328 Epoch: 8 Global Step: 354450 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:33:04,595-Speed 2630.44 samples/sec Loss 7.6407 LearningRate 0.0328 Epoch: 8 Global Step: 354460 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:33:08,482-Speed 2634.89 samples/sec Loss 7.7610 LearningRate 0.0328 Epoch: 8 Global Step: 354470 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:33:12,370-Speed 2634.80 samples/sec Loss 7.6206 LearningRate 0.0328 Epoch: 8 Global Step: 354480 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:33:16,266-Speed 2629.06 samples/sec Loss 7.6805 LearningRate 0.0328 Epoch: 8 Global Step: 354490 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:33:20,158-Speed 2631.69 samples/sec Loss 7.8505 LearningRate 0.0328 Epoch: 8 Global Step: 354500 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:33:24,052-Speed 2630.04 samples/sec Loss 7.6191 LearningRate 0.0328 Epoch: 8 Global Step: 354510 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:33:27,942-Speed 2632.85 samples/sec Loss 7.7241 LearningRate 0.0328 Epoch: 8 Global Step: 354520 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:33:31,840-Speed 2628.09 samples/sec Loss 7.8462 LearningRate 0.0328 Epoch: 8 Global Step: 354530 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:33:35,728-Speed 2634.02 samples/sec Loss 7.7922 LearningRate 0.0328 Epoch: 8 Global Step: 354540 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:33:39,617-Speed 2633.47 samples/sec Loss 7.7391 LearningRate 0.0328 Epoch: 8 Global Step: 354550 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:33:43,509-Speed 2631.64 samples/sec Loss 7.7020 LearningRate 0.0328 Epoch: 8 Global Step: 354560 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:33:47,407-Speed 2628.15 samples/sec Loss 7.8112 LearningRate 0.0328 Epoch: 8 Global Step: 354570 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:33:51,300-Speed 2630.79 samples/sec Loss 7.7775 LearningRate 0.0328 Epoch: 8 Global Step: 354580 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:33:55,224-Speed 2610.53 samples/sec Loss 7.7025 LearningRate 0.0328 Epoch: 8 Global Step: 354590 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:33:59,115-Speed 2632.10 samples/sec Loss 7.6921 LearningRate 0.0328 Epoch: 8 Global Step: 354600 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:34:03,008-Speed 2631.09 samples/sec Loss 7.7798 LearningRate 0.0328 Epoch: 8 Global Step: 354610 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:34:06,902-Speed 2630.49 samples/sec Loss 7.6000 LearningRate 0.0328 Epoch: 8 Global Step: 354620 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:34:10,792-Speed 2632.70 samples/sec Loss 7.6890 LearningRate 0.0328 Epoch: 8 Global Step: 354630 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:34:14,682-Speed 2633.10 samples/sec Loss 7.6665 LearningRate 0.0328 Epoch: 8 Global Step: 354640 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:34:18,576-Speed 2630.52 samples/sec Loss 7.7535 LearningRate 0.0328 Epoch: 8 Global Step: 354650 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:34:22,467-Speed 2632.70 samples/sec Loss 7.6799 LearningRate 0.0328 Epoch: 8 Global Step: 354660 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:34:26,359-Speed 2631.35 samples/sec Loss 7.7479 LearningRate 0.0328 Epoch: 8 Global Step: 354670 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:34:30,263-Speed 2624.12 samples/sec Loss 7.6057 LearningRate 0.0328 Epoch: 8 Global Step: 354680 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:34:34,159-Speed 2629.09 samples/sec Loss 7.7938 LearningRate 0.0328 Epoch: 8 Global Step: 354690 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:34:38,062-Speed 2624.39 samples/sec Loss 7.7384 LearningRate 0.0328 Epoch: 8 Global Step: 354700 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:34:41,960-Speed 2627.27 samples/sec Loss 7.8235 LearningRate 0.0328 Epoch: 8 Global Step: 354710 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:34:45,857-Speed 2628.38 samples/sec Loss 7.6599 LearningRate 0.0328 Epoch: 8 Global Step: 354720 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:34:49,750-Speed 2631.05 samples/sec Loss 7.8379 LearningRate 0.0328 Epoch: 8 Global Step: 354730 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:34:53,646-Speed 2629.07 samples/sec Loss 7.6803 LearningRate 0.0328 Epoch: 8 Global Step: 354740 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:34:57,542-Speed 2629.30 samples/sec Loss 7.6863 LearningRate 0.0328 Epoch: 8 Global Step: 354750 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:35:01,448-Speed 2621.94 samples/sec Loss 7.6908 LearningRate 0.0328 Epoch: 8 Global Step: 354760 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:35:05,342-Speed 2630.21 samples/sec Loss 7.6837 LearningRate 0.0328 Epoch: 8 Global Step: 354770 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:35:09,236-Speed 2630.36 samples/sec Loss 7.7333 LearningRate 0.0328 Epoch: 8 Global Step: 354780 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:35:13,131-Speed 2629.71 samples/sec Loss 7.7955 LearningRate 0.0328 Epoch: 8 Global Step: 354790 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:35:17,048-Speed 2615.25 samples/sec Loss 7.6121 LearningRate 0.0328 Epoch: 8 Global Step: 354800 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:35:20,941-Speed 2630.92 samples/sec Loss 7.7973 LearningRate 0.0328 Epoch: 8 Global Step: 354810 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:35:24,831-Speed 2632.95 samples/sec Loss 7.7231 LearningRate 0.0328 Epoch: 8 Global Step: 354820 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:35:28,731-Speed 2626.19 samples/sec Loss 7.7762 LearningRate 0.0328 Epoch: 8 Global Step: 354830 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:35:32,631-Speed 2627.41 samples/sec Loss 7.6828 LearningRate 0.0327 Epoch: 8 Global Step: 354840 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:35:36,533-Speed 2624.79 samples/sec Loss 7.8244 LearningRate 0.0327 Epoch: 8 Global Step: 354850 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:35:40,448-Speed 2615.95 samples/sec Loss 7.6803 LearningRate 0.0327 Epoch: 8 Global Step: 354860 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:35:44,352-Speed 2623.74 samples/sec Loss 7.7440 LearningRate 0.0327 Epoch: 8 Global Step: 354870 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:35:48,246-Speed 2631.42 samples/sec Loss 7.6486 LearningRate 0.0327 Epoch: 8 Global Step: 354880 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:35:52,149-Speed 2624.16 samples/sec Loss 7.5577 LearningRate 0.0327 Epoch: 8 Global Step: 354890 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:35:56,047-Speed 2627.63 samples/sec Loss 7.6954 LearningRate 0.0327 Epoch: 8 Global Step: 354900 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:35:59,943-Speed 2629.17 samples/sec Loss 7.7071 LearningRate 0.0327 Epoch: 8 Global Step: 354910 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:36:03,839-Speed 2629.12 samples/sec Loss 7.7537 LearningRate 0.0327 Epoch: 8 Global Step: 354920 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:36:07,742-Speed 2624.46 samples/sec Loss 7.7806 LearningRate 0.0327 Epoch: 8 Global Step: 354930 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:36:11,666-Speed 2610.08 samples/sec Loss 7.6816 LearningRate 0.0327 Epoch: 8 Global Step: 354940 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:36:15,565-Speed 2627.05 samples/sec Loss 7.6824 LearningRate 0.0327 Epoch: 8 Global Step: 354950 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:36:19,468-Speed 2623.97 samples/sec Loss 7.6456 LearningRate 0.0327 Epoch: 8 Global Step: 354960 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:36:23,379-Speed 2619.23 samples/sec Loss 7.7913 LearningRate 0.0327 Epoch: 8 Global Step: 354970 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:36:27,287-Speed 2621.10 samples/sec Loss 7.6512 LearningRate 0.0327 Epoch: 8 Global Step: 354980 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:36:31,183-Speed 2629.36 samples/sec Loss 7.7576 LearningRate 0.0327 Epoch: 8 Global Step: 354990 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:36:35,074-Speed 2631.66 samples/sec Loss 7.6935 LearningRate 0.0327 Epoch: 8 Global Step: 355000 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:36:39,015-Speed 2599.50 samples/sec Loss 7.7970 LearningRate 0.0327 Epoch: 8 Global Step: 355010 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:36:42,911-Speed 2629.01 samples/sec Loss 7.7540 LearningRate 0.0327 Epoch: 8 Global Step: 355020 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:36:46,802-Speed 2632.32 samples/sec Loss 7.8591 LearningRate 0.0327 Epoch: 8 Global Step: 355030 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:36:50,693-Speed 2632.66 samples/sec Loss 7.7303 LearningRate 0.0327 Epoch: 8 Global Step: 355040 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:36:54,604-Speed 2618.49 samples/sec Loss 7.7209 LearningRate 0.0327 Epoch: 8 Global Step: 355050 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:36:58,494-Speed 2633.46 samples/sec Loss 7.6565 LearningRate 0.0327 Epoch: 8 Global Step: 355060 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:37:02,396-Speed 2624.59 samples/sec Loss 7.7311 LearningRate 0.0327 Epoch: 8 Global Step: 355070 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:37:06,269-Speed 2645.10 samples/sec Loss 7.7101 LearningRate 0.0327 Epoch: 8 Global Step: 355080 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:37:10,190-Speed 2611.84 samples/sec Loss 7.6967 LearningRate 0.0327 Epoch: 8 Global Step: 355090 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:37:14,104-Speed 2617.41 samples/sec Loss 7.7781 LearningRate 0.0327 Epoch: 8 Global Step: 355100 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:37:17,991-Speed 2634.96 samples/sec Loss 7.6775 LearningRate 0.0327 Epoch: 8 Global Step: 355110 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:37:21,891-Speed 2626.60 samples/sec Loss 7.7598 LearningRate 0.0327 Epoch: 8 Global Step: 355120 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:37:25,784-Speed 2630.46 samples/sec Loss 7.7019 LearningRate 0.0327 Epoch: 8 Global Step: 355130 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:37:29,686-Speed 2625.41 samples/sec Loss 7.6955 LearningRate 0.0327 Epoch: 8 Global Step: 355140 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:37:33,575-Speed 2633.30 samples/sec Loss 7.7059 LearningRate 0.0327 Epoch: 8 Global Step: 355150 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:37:37,511-Speed 2602.41 samples/sec Loss 7.6672 LearningRate 0.0327 Epoch: 8 Global Step: 355160 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:37:41,400-Speed 2633.83 samples/sec Loss 7.6087 LearningRate 0.0327 Epoch: 8 Global Step: 355170 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:37:45,291-Speed 2632.71 samples/sec Loss 7.6245 LearningRate 0.0327 Epoch: 8 Global Step: 355180 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:37:49,184-Speed 2630.83 samples/sec Loss 7.6346 LearningRate 0.0327 Epoch: 8 Global Step: 355190 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:37:53,077-Speed 2631.78 samples/sec Loss 7.7305 LearningRate 0.0327 Epoch: 8 Global Step: 355200 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:37:56,964-Speed 2634.36 samples/sec Loss 7.7388 LearningRate 0.0327 Epoch: 8 Global Step: 355210 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:38:00,856-Speed 2631.61 samples/sec Loss 7.8537 LearningRate 0.0327 Epoch: 8 Global Step: 355220 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:38:04,763-Speed 2621.79 samples/sec Loss 7.6557 LearningRate 0.0327 Epoch: 8 Global Step: 355230 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:38:08,659-Speed 2629.19 samples/sec Loss 7.6075 LearningRate 0.0327 Epoch: 8 Global Step: 355240 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:38:12,560-Speed 2626.05 samples/sec Loss 7.7635 LearningRate 0.0327 Epoch: 8 Global Step: 355250 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:38:16,455-Speed 2629.60 samples/sec Loss 7.8774 LearningRate 0.0327 Epoch: 8 Global Step: 355260 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:38:20,346-Speed 2632.54 samples/sec Loss 7.7731 LearningRate 0.0327 Epoch: 8 Global Step: 355270 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:38:24,243-Speed 2628.50 samples/sec Loss 7.6849 LearningRate 0.0327 Epoch: 8 Global Step: 355280 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:38:28,140-Speed 2627.65 samples/sec Loss 7.6357 LearningRate 0.0327 Epoch: 8 Global Step: 355290 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:38:32,011-Speed 2646.03 samples/sec Loss 7.8063 LearningRate 0.0327 Epoch: 8 Global Step: 355300 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:38:35,908-Speed 2628.14 samples/sec Loss 7.6530 LearningRate 0.0327 Epoch: 8 Global Step: 355310 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:38:39,800-Speed 2631.92 samples/sec Loss 7.7135 LearningRate 0.0327 Epoch: 8 Global Step: 355320 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:38:43,696-Speed 2629.43 samples/sec Loss 7.6680 LearningRate 0.0327 Epoch: 8 Global Step: 355330 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:38:47,588-Speed 2631.66 samples/sec Loss 7.7702 LearningRate 0.0327 Epoch: 8 Global Step: 355340 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:38:51,478-Speed 2633.27 samples/sec Loss 7.6612 LearningRate 0.0327 Epoch: 8 Global Step: 355350 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:38:55,388-Speed 2619.32 samples/sec Loss 7.7439 LearningRate 0.0327 Epoch: 8 Global Step: 355360 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:38:59,285-Speed 2628.60 samples/sec Loss 7.6554 LearningRate 0.0327 Epoch: 8 Global Step: 355370 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:39:03,107-Speed 2679.67 samples/sec Loss 8.2837 LearningRate 0.0327 Epoch: 8 Global Step: 355380 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:39:06,999-Speed 2632.11 samples/sec Loss 8.7650 LearningRate 0.0327 Epoch: 8 Global Step: 355390 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:39:10,889-Speed 2633.09 samples/sec Loss 8.3403 LearningRate 0.0327 Epoch: 8 Global Step: 355400 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:39:14,775-Speed 2635.17 samples/sec Loss 7.9134 LearningRate 0.0327 Epoch: 8 Global Step: 355410 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:39:18,667-Speed 2632.02 samples/sec Loss 7.8366 LearningRate 0.0327 Epoch: 8 Global Step: 355420 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:39:22,562-Speed 2630.01 samples/sec Loss 7.8695 LearningRate 0.0327 Epoch: 8 Global Step: 355430 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:39:26,451-Speed 2634.00 samples/sec Loss 7.6236 LearningRate 0.0327 Epoch: 8 Global Step: 355440 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:39:30,347-Speed 2628.77 samples/sec Loss 7.5666 LearningRate 0.0327 Epoch: 8 Global Step: 355450 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:39:34,229-Speed 2637.71 samples/sec Loss 7.7910 LearningRate 0.0327 Epoch: 8 Global Step: 355460 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:39:38,125-Speed 2629.21 samples/sec Loss 7.6151 LearningRate 0.0327 Epoch: 8 Global Step: 355470 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:39:42,017-Speed 2631.77 samples/sec Loss 7.7587 LearningRate 0.0327 Epoch: 8 Global Step: 355480 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:39:45,915-Speed 2628.10 samples/sec Loss 7.6524 LearningRate 0.0327 Epoch: 8 Global Step: 355490 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:39:49,816-Speed 2625.23 samples/sec Loss 7.7620 LearningRate 0.0327 Epoch: 8 Global Step: 355500 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:39:53,713-Speed 2628.54 samples/sec Loss 7.8042 LearningRate 0.0327 Epoch: 8 Global Step: 355510 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:39:57,604-Speed 2634.62 samples/sec Loss 7.6585 LearningRate 0.0327 Epoch: 8 Global Step: 355520 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:40:01,509-Speed 2622.78 samples/sec Loss 7.6066 LearningRate 0.0327 Epoch: 8 Global Step: 355530 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:40:05,400-Speed 2631.81 samples/sec Loss 7.6866 LearningRate 0.0327 Epoch: 8 Global Step: 355540 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:40:09,294-Speed 2630.15 samples/sec Loss 7.6839 LearningRate 0.0327 Epoch: 8 Global Step: 355550 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:40:13,194-Speed 2627.05 samples/sec Loss 7.8628 LearningRate 0.0326 Epoch: 8 Global Step: 355560 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:40:17,085-Speed 2631.91 samples/sec Loss 7.8250 LearningRate 0.0326 Epoch: 8 Global Step: 355570 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:40:20,990-Speed 2623.48 samples/sec Loss 7.6529 LearningRate 0.0326 Epoch: 8 Global Step: 355580 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:40:24,877-Speed 2634.94 samples/sec Loss 7.7250 LearningRate 0.0326 Epoch: 8 Global Step: 355590 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:40:28,805-Speed 2607.72 samples/sec Loss 7.7868 LearningRate 0.0326 Epoch: 8 Global Step: 355600 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:40:32,703-Speed 2627.60 samples/sec Loss 7.6186 LearningRate 0.0326 Epoch: 8 Global Step: 355610 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:40:36,597-Speed 2630.59 samples/sec Loss 7.7022 LearningRate 0.0326 Epoch: 8 Global Step: 355620 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:40:40,497-Speed 2626.15 samples/sec Loss 7.7959 LearningRate 0.0326 Epoch: 8 Global Step: 355630 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:40:44,419-Speed 2611.48 samples/sec Loss 7.7304 LearningRate 0.0326 Epoch: 8 Global Step: 355640 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:40:48,338-Speed 2614.33 samples/sec Loss 7.7160 LearningRate 0.0326 Epoch: 8 Global Step: 355650 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:40:52,235-Speed 2627.83 samples/sec Loss 7.7442 LearningRate 0.0326 Epoch: 8 Global Step: 355660 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:40:56,126-Speed 2632.56 samples/sec Loss 7.8065 LearningRate 0.0326 Epoch: 8 Global Step: 355670 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:41:00,024-Speed 2627.77 samples/sec Loss 7.5671 LearningRate 0.0326 Epoch: 8 Global Step: 355680 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:41:03,919-Speed 2629.13 samples/sec Loss 7.7023 LearningRate 0.0326 Epoch: 8 Global Step: 355690 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:41:07,819-Speed 2626.31 samples/sec Loss 7.7532 LearningRate 0.0326 Epoch: 8 Global Step: 355700 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:41:11,768-Speed 2593.98 samples/sec Loss 7.6806 LearningRate 0.0326 Epoch: 8 Global Step: 355710 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:41:15,842-Speed 2514.02 samples/sec Loss 7.7160 LearningRate 0.0326 Epoch: 8 Global Step: 355720 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:41:19,779-Speed 2601.58 samples/sec Loss 7.6739 LearningRate 0.0326 Epoch: 8 Global Step: 355730 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:41:23,683-Speed 2624.26 samples/sec Loss 7.6638 LearningRate 0.0326 Epoch: 8 Global Step: 355740 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:41:27,574-Speed 2632.30 samples/sec Loss 7.5338 LearningRate 0.0326 Epoch: 8 Global Step: 355750 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:41:31,469-Speed 2629.85 samples/sec Loss 7.7141 LearningRate 0.0326 Epoch: 8 Global Step: 355760 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:41:35,376-Speed 2621.48 samples/sec Loss 7.6045 LearningRate 0.0326 Epoch: 8 Global Step: 355770 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:41:39,275-Speed 2627.26 samples/sec Loss 7.6628 LearningRate 0.0326 Epoch: 8 Global Step: 355780 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:41:43,170-Speed 2629.78 samples/sec Loss 7.7082 LearningRate 0.0326 Epoch: 8 Global Step: 355790 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:41:47,065-Speed 2628.99 samples/sec Loss 7.6840 LearningRate 0.0326 Epoch: 8 Global Step: 355800 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:41:50,964-Speed 2627.33 samples/sec Loss 7.6820 LearningRate 0.0326 Epoch: 8 Global Step: 355810 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:41:54,862-Speed 2627.62 samples/sec Loss 7.7782 LearningRate 0.0326 Epoch: 8 Global Step: 355820 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:41:58,786-Speed 2610.40 samples/sec Loss 7.6278 LearningRate 0.0326 Epoch: 8 Global Step: 355830 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:42:02,703-Speed 2615.27 samples/sec Loss 7.5896 LearningRate 0.0326 Epoch: 8 Global Step: 355840 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:42:06,616-Speed 2617.03 samples/sec Loss 7.6346 LearningRate 0.0326 Epoch: 8 Global Step: 355850 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:42:10,529-Speed 2617.39 samples/sec Loss 7.7341 LearningRate 0.0326 Epoch: 8 Global Step: 355860 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:42:14,445-Speed 2616.04 samples/sec Loss 7.7878 LearningRate 0.0326 Epoch: 8 Global Step: 355870 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:42:18,361-Speed 2615.98 samples/sec Loss 7.7969 LearningRate 0.0326 Epoch: 8 Global Step: 355880 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:42:22,270-Speed 2620.09 samples/sec Loss 7.7508 LearningRate 0.0326 Epoch: 8 Global Step: 355890 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:42:26,177-Speed 2621.54 samples/sec Loss 7.7563 LearningRate 0.0326 Epoch: 8 Global Step: 355900 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:42:30,092-Speed 2616.01 samples/sec Loss 7.7013 LearningRate 0.0326 Epoch: 8 Global Step: 355910 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:42:33,996-Speed 2624.39 samples/sec Loss 7.7377 LearningRate 0.0326 Epoch: 8 Global Step: 355920 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:42:37,896-Speed 2626.09 samples/sec Loss 7.7082 LearningRate 0.0326 Epoch: 8 Global Step: 355930 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:42:41,801-Speed 2622.76 samples/sec Loss 7.5570 LearningRate 0.0326 Epoch: 8 Global Step: 355940 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:42:45,693-Speed 2631.93 samples/sec Loss 7.7011 LearningRate 0.0326 Epoch: 8 Global Step: 355950 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:42:49,585-Speed 2631.38 samples/sec Loss 7.6783 LearningRate 0.0326 Epoch: 8 Global Step: 355960 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:42:53,479-Speed 2631.06 samples/sec Loss 7.5363 LearningRate 0.0326 Epoch: 8 Global Step: 355970 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:42:57,449-Speed 2579.44 samples/sec Loss 7.6423 LearningRate 0.0326 Epoch: 8 Global Step: 355980 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:43:01,348-Speed 2627.23 samples/sec Loss 7.7565 LearningRate 0.0326 Epoch: 8 Global Step: 355990 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:43:05,243-Speed 2629.11 samples/sec Loss 7.7676 LearningRate 0.0326 Epoch: 8 Global Step: 356000 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:43:09,139-Speed 2629.59 samples/sec Loss 7.6751 LearningRate 0.0326 Epoch: 8 Global Step: 356010 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:43:13,199-Speed 2523.24 samples/sec Loss 7.8128 LearningRate 0.0326 Epoch: 8 Global Step: 356020 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:43:17,159-Speed 2586.16 samples/sec Loss 7.6596 LearningRate 0.0326 Epoch: 8 Global Step: 356030 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:43:21,081-Speed 2612.04 samples/sec Loss 7.7541 LearningRate 0.0326 Epoch: 8 Global Step: 356040 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:43:24,981-Speed 2626.27 samples/sec Loss 7.7722 LearningRate 0.0326 Epoch: 8 Global Step: 356050 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:43:28,883-Speed 2624.77 samples/sec Loss 7.6084 LearningRate 0.0326 Epoch: 8 Global Step: 356060 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:43:32,756-Speed 2644.16 samples/sec Loss 7.8864 LearningRate 0.0326 Epoch: 8 Global Step: 356070 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:43:36,650-Speed 2630.81 samples/sec Loss 7.6738 LearningRate 0.0326 Epoch: 8 Global Step: 356080 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:43:40,542-Speed 2631.19 samples/sec Loss 7.6449 LearningRate 0.0326 Epoch: 8 Global Step: 356090 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:43:44,434-Speed 2632.21 samples/sec Loss 7.8156 LearningRate 0.0326 Epoch: 8 Global Step: 356100 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:43:48,329-Speed 2629.65 samples/sec Loss 7.7003 LearningRate 0.0326 Epoch: 8 Global Step: 356110 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:43:52,222-Speed 2631.59 samples/sec Loss 7.5806 LearningRate 0.0326 Epoch: 8 Global Step: 356120 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:43:56,116-Speed 2630.51 samples/sec Loss 7.7509 LearningRate 0.0326 Epoch: 8 Global Step: 356130 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:44:00,015-Speed 2626.54 samples/sec Loss 7.6865 LearningRate 0.0326 Epoch: 8 Global Step: 356140 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:44:03,893-Speed 2640.79 samples/sec Loss 9.2135 LearningRate 0.0326 Epoch: 8 Global Step: 356150 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:44:07,787-Speed 2631.33 samples/sec Loss 8.4317 LearningRate 0.0326 Epoch: 8 Global Step: 356160 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:44:11,675-Speed 2633.95 samples/sec Loss 7.7725 LearningRate 0.0326 Epoch: 8 Global Step: 356170 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:44:15,596-Speed 2612.59 samples/sec Loss 7.8412 LearningRate 0.0326 Epoch: 8 Global Step: 356180 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:44:19,488-Speed 2631.75 samples/sec Loss 7.7790 LearningRate 0.0326 Epoch: 8 Global Step: 356190 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:44:23,411-Speed 2611.00 samples/sec Loss 7.8121 LearningRate 0.0326 Epoch: 8 Global Step: 356200 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:44:27,305-Speed 2630.31 samples/sec Loss 7.7417 LearningRate 0.0326 Epoch: 8 Global Step: 356210 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:44:31,196-Speed 2632.73 samples/sec Loss 7.6680 LearningRate 0.0326 Epoch: 8 Global Step: 356220 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:44:35,088-Speed 2631.26 samples/sec Loss 7.6373 LearningRate 0.0326 Epoch: 8 Global Step: 356230 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:44:38,979-Speed 2632.85 samples/sec Loss 7.6782 LearningRate 0.0326 Epoch: 8 Global Step: 356240 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:44:42,872-Speed 2631.06 samples/sec Loss 7.7989 LearningRate 0.0326 Epoch: 8 Global Step: 356250 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:44:46,764-Speed 2631.49 samples/sec Loss 7.7374 LearningRate 0.0326 Epoch: 8 Global Step: 356260 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:44:50,675-Speed 2618.96 samples/sec Loss 7.7430 LearningRate 0.0326 Epoch: 8 Global Step: 356270 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:44:54,591-Speed 2615.93 samples/sec Loss 7.7115 LearningRate 0.0326 Epoch: 8 Global Step: 356280 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:44:58,487-Speed 2628.77 samples/sec Loss 7.6895 LearningRate 0.0325 Epoch: 8 Global Step: 356290 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:45:02,377-Speed 2633.15 samples/sec Loss 7.7384 LearningRate 0.0325 Epoch: 8 Global Step: 356300 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:45:06,277-Speed 2626.06 samples/sec Loss 7.6071 LearningRate 0.0325 Epoch: 8 Global Step: 356310 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:45:10,187-Speed 2619.82 samples/sec Loss 7.5413 LearningRate 0.0325 Epoch: 8 Global Step: 356320 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:45:14,083-Speed 2629.17 samples/sec Loss 7.6724 LearningRate 0.0325 Epoch: 8 Global Step: 356330 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:45:17,983-Speed 2626.39 samples/sec Loss 7.7458 LearningRate 0.0325 Epoch: 8 Global Step: 356340 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:45:21,883-Speed 2625.65 samples/sec Loss 7.7035 LearningRate 0.0325 Epoch: 8 Global Step: 356350 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:45:25,776-Speed 2631.39 samples/sec Loss 7.6625 LearningRate 0.0325 Epoch: 8 Global Step: 356360 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:45:29,666-Speed 2633.26 samples/sec Loss 7.7173 LearningRate 0.0325 Epoch: 8 Global Step: 356370 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:45:33,561-Speed 2629.62 samples/sec Loss 7.5518 LearningRate 0.0325 Epoch: 8 Global Step: 356380 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:45:37,473-Speed 2617.77 samples/sec Loss 7.7002 LearningRate 0.0325 Epoch: 8 Global Step: 356390 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:45:41,366-Speed 2631.93 samples/sec Loss 7.6817 LearningRate 0.0325 Epoch: 8 Global Step: 356400 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:45:45,241-Speed 2642.98 samples/sec Loss 7.7068 LearningRate 0.0325 Epoch: 8 Global Step: 356410 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:45:49,131-Speed 2632.97 samples/sec Loss 7.6345 LearningRate 0.0325 Epoch: 8 Global Step: 356420 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:45:53,038-Speed 2621.81 samples/sec Loss 7.6751 LearningRate 0.0325 Epoch: 8 Global Step: 356430 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:45:56,935-Speed 2627.95 samples/sec Loss 7.6969 LearningRate 0.0325 Epoch: 8 Global Step: 356440 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:46:00,828-Speed 2631.02 samples/sec Loss 7.6407 LearningRate 0.0325 Epoch: 8 Global Step: 356450 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:46:04,722-Speed 2630.87 samples/sec Loss 7.6235 LearningRate 0.0325 Epoch: 8 Global Step: 356460 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:46:08,613-Speed 2632.23 samples/sec Loss 7.7432 LearningRate 0.0325 Epoch: 8 Global Step: 356470 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:46:12,520-Speed 2621.34 samples/sec Loss 7.7405 LearningRate 0.0325 Epoch: 8 Global Step: 356480 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:46:16,414-Speed 2630.54 samples/sec Loss 7.8584 LearningRate 0.0325 Epoch: 8 Global Step: 356490 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:46:20,307-Speed 2631.08 samples/sec Loss 7.7918 LearningRate 0.0325 Epoch: 8 Global Step: 356500 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:46:24,202-Speed 2628.94 samples/sec Loss 7.5467 LearningRate 0.0325 Epoch: 8 Global Step: 356510 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:46:28,102-Speed 2626.45 samples/sec Loss 7.6550 LearningRate 0.0325 Epoch: 8 Global Step: 356520 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:46:31,997-Speed 2629.58 samples/sec Loss 7.5610 LearningRate 0.0325 Epoch: 8 Global Step: 356530 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:46:35,894-Speed 2628.50 samples/sec Loss 7.8231 LearningRate 0.0325 Epoch: 8 Global Step: 356540 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:46:39,797-Speed 2624.03 samples/sec Loss 7.7833 LearningRate 0.0325 Epoch: 8 Global Step: 356550 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:46:43,706-Speed 2620.56 samples/sec Loss 7.6605 LearningRate 0.0325 Epoch: 8 Global Step: 356560 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:46:47,609-Speed 2623.96 samples/sec Loss 7.6181 LearningRate 0.0325 Epoch: 8 Global Step: 356570 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:46:51,505-Speed 2629.37 samples/sec Loss 7.5971 LearningRate 0.0325 Epoch: 8 Global Step: 356580 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:46:55,399-Speed 2630.25 samples/sec Loss 7.6609 LearningRate 0.0325 Epoch: 8 Global Step: 356590 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:46:59,289-Speed 2634.28 samples/sec Loss 7.6658 LearningRate 0.0325 Epoch: 8 Global Step: 356600 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:47:03,186-Speed 2627.77 samples/sec Loss 7.6585 LearningRate 0.0325 Epoch: 8 Global Step: 356610 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:47:07,090-Speed 2623.29 samples/sec Loss 7.7348 LearningRate 0.0325 Epoch: 8 Global Step: 356620 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:47:10,982-Speed 2631.84 samples/sec Loss 7.7905 LearningRate 0.0325 Epoch: 8 Global Step: 356630 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:47:14,881-Speed 2627.24 samples/sec Loss 7.6836 LearningRate 0.0325 Epoch: 8 Global Step: 356640 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:47:18,759-Speed 2641.74 samples/sec Loss 7.7589 LearningRate 0.0325 Epoch: 8 Global Step: 356650 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:47:22,653-Speed 2630.26 samples/sec Loss 7.6117 LearningRate 0.0325 Epoch: 8 Global Step: 356660 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:47:26,545-Speed 2631.52 samples/sec Loss 7.7562 LearningRate 0.0325 Epoch: 8 Global Step: 356670 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:47:30,445-Speed 2626.04 samples/sec Loss 7.7302 LearningRate 0.0325 Epoch: 8 Global Step: 356680 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:47:34,351-Speed 2622.35 samples/sec Loss 7.6678 LearningRate 0.0325 Epoch: 8 Global Step: 356690 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:47:38,244-Speed 2630.47 samples/sec Loss 7.7905 LearningRate 0.0325 Epoch: 8 Global Step: 356700 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:47:42,139-Speed 2630.32 samples/sec Loss 7.5942 LearningRate 0.0325 Epoch: 8 Global Step: 356710 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:47:46,028-Speed 2633.41 samples/sec Loss 7.6732 LearningRate 0.0325 Epoch: 8 Global Step: 356720 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:47:49,922-Speed 2630.63 samples/sec Loss 7.5726 LearningRate 0.0325 Epoch: 8 Global Step: 356730 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:47:53,813-Speed 2632.08 samples/sec Loss 7.5734 LearningRate 0.0325 Epoch: 8 Global Step: 356740 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:47:57,737-Speed 2610.93 samples/sec Loss 7.6065 LearningRate 0.0325 Epoch: 8 Global Step: 356750 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:48:01,630-Speed 2630.94 samples/sec Loss 7.6829 LearningRate 0.0325 Epoch: 8 Global Step: 356760 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:48:05,542-Speed 2617.97 samples/sec Loss 7.6552 LearningRate 0.0325 Epoch: 8 Global Step: 356770 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:48:09,470-Speed 2607.37 samples/sec Loss 7.6230 LearningRate 0.0325 Epoch: 8 Global Step: 356780 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:48:13,380-Speed 2620.29 samples/sec Loss 7.5721 LearningRate 0.0325 Epoch: 8 Global Step: 356790 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:48:17,309-Speed 2606.43 samples/sec Loss 7.7514 LearningRate 0.0325 Epoch: 8 Global Step: 356800 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:48:21,209-Speed 2626.25 samples/sec Loss 7.5971 LearningRate 0.0325 Epoch: 8 Global Step: 356810 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:48:25,109-Speed 2626.51 samples/sec Loss 7.6305 LearningRate 0.0325 Epoch: 8 Global Step: 356820 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:48:29,009-Speed 2627.16 samples/sec Loss 7.7463 LearningRate 0.0325 Epoch: 8 Global Step: 356830 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:48:32,906-Speed 2627.60 samples/sec Loss 7.5983 LearningRate 0.0325 Epoch: 8 Global Step: 356840 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:48:36,819-Speed 2617.73 samples/sec Loss 7.6770 LearningRate 0.0325 Epoch: 8 Global Step: 356850 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:48:40,717-Speed 2627.52 samples/sec Loss 7.8191 LearningRate 0.0325 Epoch: 8 Global Step: 356860 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:48:44,608-Speed 2632.82 samples/sec Loss 7.6768 LearningRate 0.0325 Epoch: 8 Global Step: 356870 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:48:48,511-Speed 2623.96 samples/sec Loss 7.7077 LearningRate 0.0325 Epoch: 8 Global Step: 356880 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:48:52,498-Speed 2568.92 samples/sec Loss 7.6959 LearningRate 0.0325 Epoch: 8 Global Step: 356890 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:48:56,421-Speed 2611.43 samples/sec Loss 7.6331 LearningRate 0.0325 Epoch: 8 Global Step: 356900 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:49:00,320-Speed 2626.64 samples/sec Loss 7.5755 LearningRate 0.0325 Epoch: 8 Global Step: 356910 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:49:04,196-Speed 2642.67 samples/sec Loss 7.7904 LearningRate 0.0325 Epoch: 8 Global Step: 356920 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:49:08,094-Speed 2627.74 samples/sec Loss 7.7270 LearningRate 0.0325 Epoch: 8 Global Step: 356930 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:49:11,993-Speed 2627.06 samples/sec Loss 7.6865 LearningRate 0.0325 Epoch: 8 Global Step: 356940 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:49:15,883-Speed 2632.60 samples/sec Loss 7.6958 LearningRate 0.0325 Epoch: 8 Global Step: 356950 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:49:19,790-Speed 2621.83 samples/sec Loss 7.6914 LearningRate 0.0325 Epoch: 8 Global Step: 356960 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:49:23,694-Speed 2624.06 samples/sec Loss 7.6141 LearningRate 0.0325 Epoch: 8 Global Step: 356970 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:49:27,590-Speed 2629.57 samples/sec Loss 7.6795 LearningRate 0.0325 Epoch: 8 Global Step: 356980 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:49:31,498-Speed 2620.21 samples/sec Loss 7.6386 LearningRate 0.0325 Epoch: 8 Global Step: 356990 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:49:35,413-Speed 2616.24 samples/sec Loss 7.6432 LearningRate 0.0325 Epoch: 8 Global Step: 357000 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:49:39,312-Speed 2627.00 samples/sec Loss 7.7518 LearningRate 0.0325 Epoch: 8 Global Step: 357010 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:49:43,204-Speed 2631.93 samples/sec Loss 7.7398 LearningRate 0.0324 Epoch: 8 Global Step: 357020 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:49:47,103-Speed 2627.00 samples/sec Loss 7.6507 LearningRate 0.0324 Epoch: 8 Global Step: 357030 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:49:51,002-Speed 2627.19 samples/sec Loss 7.6526 LearningRate 0.0324 Epoch: 8 Global Step: 357040 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:49:54,890-Speed 2633.72 samples/sec Loss 7.6365 LearningRate 0.0324 Epoch: 8 Global Step: 357050 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:49:58,745-Speed 2657.49 samples/sec Loss 8.4114 LearningRate 0.0324 Epoch: 8 Global Step: 357060 Fp16 Grad Scale: 1024 Required: 53 hours
Training: 2022-04-14 11:50:02,655-Speed 2619.14 samples/sec Loss 9.4159 LearningRate 0.0324 Epoch: 8 Global Step: 357070 Fp16 Grad Scale: 1024 Required: 53 hours
Training: 2022-04-14 11:50:06,547-Speed 2631.69 samples/sec Loss 8.4404 LearningRate 0.0324 Epoch: 8 Global Step: 357080 Fp16 Grad Scale: 1024 Required: 53 hours
Training: 2022-04-14 11:50:10,431-Speed 2636.48 samples/sec Loss 7.9256 LearningRate 0.0324 Epoch: 8 Global Step: 357090 Fp16 Grad Scale: 1024 Required: 53 hours
Training: 2022-04-14 11:50:14,335-Speed 2624.28 samples/sec Loss 7.8087 LearningRate 0.0324 Epoch: 8 Global Step: 357100 Fp16 Grad Scale: 1024 Required: 53 hours
Training: 2022-04-14 11:50:18,224-Speed 2633.76 samples/sec Loss 7.6863 LearningRate 0.0324 Epoch: 8 Global Step: 357110 Fp16 Grad Scale: 1024 Required: 53 hours
Training: 2022-04-14 11:50:22,114-Speed 2632.84 samples/sec Loss 7.7387 LearningRate 0.0324 Epoch: 8 Global Step: 357120 Fp16 Grad Scale: 1024 Required: 53 hours
Training: 2022-04-14 11:50:26,007-Speed 2630.91 samples/sec Loss 7.6710 LearningRate 0.0324 Epoch: 8 Global Step: 357130 Fp16 Grad Scale: 1024 Required: 53 hours
Training: 2022-04-14 11:50:29,892-Speed 2636.73 samples/sec Loss 7.6754 LearningRate 0.0324 Epoch: 8 Global Step: 357140 Fp16 Grad Scale: 1024 Required: 53 hours
Training: 2022-04-14 11:50:33,785-Speed 2631.14 samples/sec Loss 7.7168 LearningRate 0.0324 Epoch: 8 Global Step: 357150 Fp16 Grad Scale: 1024 Required: 53 hours
Training: 2022-04-14 11:50:37,677-Speed 2631.45 samples/sec Loss 7.6490 LearningRate 0.0324 Epoch: 8 Global Step: 357160 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:50:41,562-Speed 2636.11 samples/sec Loss 7.7720 LearningRate 0.0324 Epoch: 8 Global Step: 357170 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:50:45,455-Speed 2630.88 samples/sec Loss 7.7537 LearningRate 0.0324 Epoch: 8 Global Step: 357180 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:50:49,342-Speed 2635.12 samples/sec Loss 7.6456 LearningRate 0.0324 Epoch: 8 Global Step: 357190 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:50:53,236-Speed 2630.14 samples/sec Loss 7.5654 LearningRate 0.0324 Epoch: 8 Global Step: 357200 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:50:57,134-Speed 2628.09 samples/sec Loss 7.6231 LearningRate 0.0324 Epoch: 8 Global Step: 357210 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:51:01,030-Speed 2628.98 samples/sec Loss 7.6185 LearningRate 0.0324 Epoch: 8 Global Step: 357220 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:51:04,927-Speed 2628.56 samples/sec Loss 7.7181 LearningRate 0.0324 Epoch: 8 Global Step: 357230 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:51:08,823-Speed 2628.62 samples/sec Loss 7.7302 LearningRate 0.0324 Epoch: 8 Global Step: 357240 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:51:12,721-Speed 2628.22 samples/sec Loss 7.6188 LearningRate 0.0324 Epoch: 8 Global Step: 357250 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 11:51:16,610-Speed 2633.73 samples/sec Loss 7.7866 LearningRate 0.0324 Epoch: 8 Global Step: 357260 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:51:20,525-Speed 2616.10 samples/sec Loss 7.6598 LearningRate 0.0324 Epoch: 8 Global Step: 357270 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:51:24,415-Speed 2633.21 samples/sec Loss 7.6241 LearningRate 0.0324 Epoch: 8 Global Step: 357280 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:51:28,304-Speed 2634.29 samples/sec Loss 7.6877 LearningRate 0.0324 Epoch: 8 Global Step: 357290 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:51:32,198-Speed 2629.74 samples/sec Loss 7.6424 LearningRate 0.0324 Epoch: 8 Global Step: 357300 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:51:36,089-Speed 2632.25 samples/sec Loss 7.7009 LearningRate 0.0324 Epoch: 8 Global Step: 357310 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:51:39,977-Speed 2634.51 samples/sec Loss 7.6812 LearningRate 0.0324 Epoch: 8 Global Step: 357320 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:51:43,878-Speed 2626.04 samples/sec Loss 7.8029 LearningRate 0.0324 Epoch: 8 Global Step: 357330 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:51:47,764-Speed 2635.83 samples/sec Loss 7.6532 LearningRate 0.0324 Epoch: 8 Global Step: 357340 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:51:51,654-Speed 2632.84 samples/sec Loss 7.8506 LearningRate 0.0324 Epoch: 8 Global Step: 357350 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 11:51:55,542-Speed 2634.04 samples/sec Loss 7.6439 LearningRate 0.0324 Epoch: 8 Global Step: 357360 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:51:59,464-Speed 2612.07 samples/sec Loss 7.7753 LearningRate 0.0324 Epoch: 8 Global Step: 357370 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:52:03,388-Speed 2610.34 samples/sec Loss 7.7084 LearningRate 0.0324 Epoch: 8 Global Step: 357380 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:52:07,465-Speed 2512.12 samples/sec Loss 7.6427 LearningRate 0.0324 Epoch: 8 Global Step: 357390 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:52:11,529-Speed 2520.23 samples/sec Loss 7.6422 LearningRate 0.0324 Epoch: 8 Global Step: 357400 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:52:15,593-Speed 2520.89 samples/sec Loss 7.6046 LearningRate 0.0324 Epoch: 8 Global Step: 357410 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:52:19,595-Speed 2559.22 samples/sec Loss 7.7895 LearningRate 0.0324 Epoch: 8 Global Step: 357420 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:52:23,582-Speed 2569.13 samples/sec Loss 7.7364 LearningRate 0.0324 Epoch: 8 Global Step: 357430 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:52:27,473-Speed 2632.51 samples/sec Loss 7.6526 LearningRate 0.0324 Epoch: 8 Global Step: 357440 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:52:31,365-Speed 2631.75 samples/sec Loss 7.8005 LearningRate 0.0324 Epoch: 8 Global Step: 357450 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 11:52:35,253-Speed 2634.16 samples/sec Loss 7.5113 LearningRate 0.0324 Epoch: 8 Global Step: 357460 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:52:39,164-Speed 2618.97 samples/sec Loss 7.7414 LearningRate 0.0324 Epoch: 8 Global Step: 357470 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:52:43,057-Speed 2631.03 samples/sec Loss 7.6165 LearningRate 0.0324 Epoch: 8 Global Step: 357480 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:52:46,948-Speed 2632.53 samples/sec Loss 7.6351 LearningRate 0.0324 Epoch: 8 Global Step: 357490 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:52:50,841-Speed 2631.67 samples/sec Loss 7.8054 LearningRate 0.0324 Epoch: 8 Global Step: 357500 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:52:54,732-Speed 2631.70 samples/sec Loss 7.7448 LearningRate 0.0324 Epoch: 8 Global Step: 357510 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:52:58,627-Speed 2629.87 samples/sec Loss 7.7769 LearningRate 0.0324 Epoch: 8 Global Step: 357520 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:53:02,524-Speed 2628.80 samples/sec Loss 7.6471 LearningRate 0.0324 Epoch: 8 Global Step: 357530 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:53:06,426-Speed 2624.66 samples/sec Loss 7.7518 LearningRate 0.0324 Epoch: 8 Global Step: 357540 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:53:10,332-Speed 2622.01 samples/sec Loss 7.6907 LearningRate 0.0324 Epoch: 8 Global Step: 357550 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:53:14,226-Speed 2631.14 samples/sec Loss 7.7151 LearningRate 0.0324 Epoch: 8 Global Step: 357560 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:53:18,119-Speed 2630.82 samples/sec Loss 7.7288 LearningRate 0.0324 Epoch: 8 Global Step: 357570 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:53:22,034-Speed 2616.18 samples/sec Loss 7.6792 LearningRate 0.0324 Epoch: 8 Global Step: 357580 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:53:26,107-Speed 2514.75 samples/sec Loss 7.6511 LearningRate 0.0324 Epoch: 8 Global Step: 357590 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:53:30,007-Speed 2626.56 samples/sec Loss 7.7355 LearningRate 0.0324 Epoch: 8 Global Step: 357600 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:53:33,904-Speed 2628.61 samples/sec Loss 7.7932 LearningRate 0.0324 Epoch: 8 Global Step: 357610 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:53:37,809-Speed 2622.65 samples/sec Loss 7.6358 LearningRate 0.0324 Epoch: 8 Global Step: 357620 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:53:41,714-Speed 2622.79 samples/sec Loss 7.6470 LearningRate 0.0324 Epoch: 8 Global Step: 357630 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:53:45,792-Speed 2511.77 samples/sec Loss 7.7434 LearningRate 0.0324 Epoch: 8 Global Step: 357640 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:53:49,806-Speed 2551.66 samples/sec Loss 7.6708 LearningRate 0.0324 Epoch: 8 Global Step: 357650 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:53:53,707-Speed 2625.51 samples/sec Loss 7.7055 LearningRate 0.0324 Epoch: 8 Global Step: 357660 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:53:57,610-Speed 2624.57 samples/sec Loss 7.6078 LearningRate 0.0324 Epoch: 8 Global Step: 357670 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:54:01,494-Speed 2637.17 samples/sec Loss 7.7067 LearningRate 0.0324 Epoch: 8 Global Step: 357680 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:54:05,365-Speed 2646.00 samples/sec Loss 8.1297 LearningRate 0.0324 Epoch: 8 Global Step: 357690 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:54:09,279-Speed 2616.90 samples/sec Loss 8.2875 LearningRate 0.0324 Epoch: 8 Global Step: 357700 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:54:13,188-Speed 2620.54 samples/sec Loss 7.8820 LearningRate 0.0324 Epoch: 8 Global Step: 357710 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:54:17,100-Speed 2618.51 samples/sec Loss 7.7424 LearningRate 0.0324 Epoch: 8 Global Step: 357720 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:54:20,989-Speed 2632.91 samples/sec Loss 7.6995 LearningRate 0.0324 Epoch: 8 Global Step: 357730 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:54:24,882-Speed 2631.64 samples/sec Loss 7.7010 LearningRate 0.0323 Epoch: 8 Global Step: 357740 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:54:28,770-Speed 2634.35 samples/sec Loss 7.6387 LearningRate 0.0323 Epoch: 8 Global Step: 357750 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:54:32,659-Speed 2634.21 samples/sec Loss 7.7460 LearningRate 0.0323 Epoch: 8 Global Step: 357760 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:54:36,559-Speed 2625.85 samples/sec Loss 7.8311 LearningRate 0.0323 Epoch: 8 Global Step: 357770 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:54:40,450-Speed 2632.80 samples/sec Loss 7.5836 LearningRate 0.0323 Epoch: 8 Global Step: 357780 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 11:54:44,345-Speed 2629.72 samples/sec Loss 7.7265 LearningRate 0.0323 Epoch: 8 Global Step: 357790 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:54:48,242-Speed 2628.27 samples/sec Loss 7.7511 LearningRate 0.0323 Epoch: 8 Global Step: 357800 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:54:52,136-Speed 2631.21 samples/sec Loss 7.7803 LearningRate 0.0323 Epoch: 8 Global Step: 357810 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:54:56,026-Speed 2632.45 samples/sec Loss 7.5793 LearningRate 0.0323 Epoch: 8 Global Step: 357820 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:54:59,922-Speed 2629.13 samples/sec Loss 7.7185 LearningRate 0.0323 Epoch: 8 Global Step: 357830 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:55:03,810-Speed 2633.85 samples/sec Loss 7.6213 LearningRate 0.0323 Epoch: 8 Global Step: 357840 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:55:07,707-Speed 2629.25 samples/sec Loss 7.7613 LearningRate 0.0323 Epoch: 8 Global Step: 357850 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:55:11,609-Speed 2624.92 samples/sec Loss 7.5637 LearningRate 0.0323 Epoch: 8 Global Step: 357860 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:55:15,527-Speed 2614.00 samples/sec Loss 7.7085 LearningRate 0.0323 Epoch: 8 Global Step: 357870 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:55:19,446-Speed 2613.80 samples/sec Loss 7.5694 LearningRate 0.0323 Epoch: 8 Global Step: 357880 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 11:55:23,343-Speed 2628.31 samples/sec Loss 7.6870 LearningRate 0.0323 Epoch: 8 Global Step: 357890 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:55:27,263-Speed 2613.63 samples/sec Loss 7.7097 LearningRate 0.0323 Epoch: 8 Global Step: 357900 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:55:31,156-Speed 2630.83 samples/sec Loss 7.6351 LearningRate 0.0323 Epoch: 8 Global Step: 357910 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:55:35,052-Speed 2628.27 samples/sec Loss 7.6474 LearningRate 0.0323 Epoch: 8 Global Step: 357920 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:55:38,951-Speed 2627.03 samples/sec Loss 7.6575 LearningRate 0.0323 Epoch: 8 Global Step: 357930 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:55:42,841-Speed 2633.31 samples/sec Loss 7.6052 LearningRate 0.0323 Epoch: 8 Global Step: 357940 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:55:46,738-Speed 2628.60 samples/sec Loss 7.5448 LearningRate 0.0323 Epoch: 8 Global Step: 357950 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:55:50,641-Speed 2624.23 samples/sec Loss 7.7304 LearningRate 0.0323 Epoch: 8 Global Step: 357960 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:55:54,536-Speed 2629.68 samples/sec Loss 7.6288 LearningRate 0.0323 Epoch: 8 Global Step: 357970 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:55:58,438-Speed 2624.85 samples/sec Loss 7.7273 LearningRate 0.0323 Epoch: 8 Global Step: 357980 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 11:56:02,337-Speed 2627.24 samples/sec Loss 7.6813 LearningRate 0.0323 Epoch: 8 Global Step: 357990 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:56:06,227-Speed 2633.16 samples/sec Loss 7.7140 LearningRate 0.0323 Epoch: 8 Global Step: 358000 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:56:10,150-Speed 2611.08 samples/sec Loss 7.5673 LearningRate 0.0323 Epoch: 8 Global Step: 358010 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:56:14,041-Speed 2632.50 samples/sec Loss 7.6750 LearningRate 0.0323 Epoch: 8 Global Step: 358020 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:56:17,941-Speed 2626.10 samples/sec Loss 7.6162 LearningRate 0.0323 Epoch: 8 Global Step: 358030 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:56:21,838-Speed 2628.91 samples/sec Loss 7.6389 LearningRate 0.0323 Epoch: 8 Global Step: 358040 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:56:25,725-Speed 2634.89 samples/sec Loss 7.6378 LearningRate 0.0323 Epoch: 8 Global Step: 358050 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:56:29,622-Speed 2628.25 samples/sec Loss 7.7943 LearningRate 0.0323 Epoch: 8 Global Step: 358060 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:56:33,529-Speed 2622.01 samples/sec Loss 7.7085 LearningRate 0.0323 Epoch: 8 Global Step: 358070 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:56:37,423-Speed 2629.82 samples/sec Loss 7.4929 LearningRate 0.0323 Epoch: 8 Global Step: 358080 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:56:41,318-Speed 2629.37 samples/sec Loss 7.7002 LearningRate 0.0323 Epoch: 8 Global Step: 358090 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:56:45,209-Speed 2632.46 samples/sec Loss 7.6246 LearningRate 0.0323 Epoch: 8 Global Step: 358100 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:56:49,107-Speed 2628.21 samples/sec Loss 7.6472 LearningRate 0.0323 Epoch: 8 Global Step: 358110 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:56:53,012-Speed 2622.85 samples/sec Loss 7.6397 LearningRate 0.0323 Epoch: 8 Global Step: 358120 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:56:56,910-Speed 2627.98 samples/sec Loss 7.7786 LearningRate 0.0323 Epoch: 8 Global Step: 358130 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:57:00,801-Speed 2632.47 samples/sec Loss 7.7458 LearningRate 0.0323 Epoch: 8 Global Step: 358140 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:57:04,691-Speed 2632.30 samples/sec Loss 7.7370 LearningRate 0.0323 Epoch: 8 Global Step: 358150 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:57:08,582-Speed 2632.23 samples/sec Loss 7.8132 LearningRate 0.0323 Epoch: 8 Global Step: 358160 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:57:12,480-Speed 2627.89 samples/sec Loss 7.7361 LearningRate 0.0323 Epoch: 8 Global Step: 358170 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:57:16,374-Speed 2630.15 samples/sec Loss 7.7315 LearningRate 0.0323 Epoch: 8 Global Step: 358180 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:57:20,293-Speed 2613.71 samples/sec Loss 7.6897 LearningRate 0.0323 Epoch: 8 Global Step: 358190 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:57:24,198-Speed 2622.50 samples/sec Loss 7.6469 LearningRate 0.0323 Epoch: 8 Global Step: 358200 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:57:28,098-Speed 2626.63 samples/sec Loss 7.6419 LearningRate 0.0323 Epoch: 8 Global Step: 358210 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:57:31,985-Speed 2635.18 samples/sec Loss 7.6246 LearningRate 0.0323 Epoch: 8 Global Step: 358220 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:57:35,889-Speed 2623.56 samples/sec Loss 7.6502 LearningRate 0.0323 Epoch: 8 Global Step: 358230 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:57:39,851-Speed 2585.01 samples/sec Loss 7.5121 LearningRate 0.0323 Epoch: 8 Global Step: 358240 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:57:43,755-Speed 2624.14 samples/sec Loss 7.7334 LearningRate 0.0323 Epoch: 8 Global Step: 358250 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:57:47,654-Speed 2626.66 samples/sec Loss 7.6547 LearningRate 0.0323 Epoch: 8 Global Step: 358260 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:57:51,559-Speed 2622.96 samples/sec Loss 7.7590 LearningRate 0.0323 Epoch: 8 Global Step: 358270 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:57:55,461-Speed 2625.25 samples/sec Loss 7.6140 LearningRate 0.0323 Epoch: 8 Global Step: 358280 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:57:59,327-Speed 2649.43 samples/sec Loss 7.7583 LearningRate 0.0323 Epoch: 8 Global Step: 358290 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:58:03,220-Speed 2631.17 samples/sec Loss 7.6870 LearningRate 0.0323 Epoch: 8 Global Step: 358300 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:58:07,135-Speed 2615.74 samples/sec Loss 7.7001 LearningRate 0.0323 Epoch: 8 Global Step: 358310 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:58:11,031-Speed 2629.11 samples/sec Loss 7.6318 LearningRate 0.0323 Epoch: 8 Global Step: 358320 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:58:14,928-Speed 2628.48 samples/sec Loss 7.4964 LearningRate 0.0323 Epoch: 8 Global Step: 358330 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:58:18,822-Speed 2630.21 samples/sec Loss 7.7232 LearningRate 0.0323 Epoch: 8 Global Step: 358340 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:58:22,716-Speed 2630.48 samples/sec Loss 7.6334 LearningRate 0.0323 Epoch: 8 Global Step: 358350 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:58:26,610-Speed 2630.36 samples/sec Loss 7.6281 LearningRate 0.0323 Epoch: 8 Global Step: 358360 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:58:30,502-Speed 2631.56 samples/sec Loss 7.7462 LearningRate 0.0323 Epoch: 8 Global Step: 358370 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:58:34,394-Speed 2632.09 samples/sec Loss 7.7809 LearningRate 0.0323 Epoch: 8 Global Step: 358380 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:58:38,287-Speed 2631.02 samples/sec Loss 7.8529 LearningRate 0.0323 Epoch: 8 Global Step: 358390 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:58:42,177-Speed 2632.53 samples/sec Loss 7.6490 LearningRate 0.0323 Epoch: 8 Global Step: 358400 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:58:46,063-Speed 2635.79 samples/sec Loss 7.6549 LearningRate 0.0323 Epoch: 8 Global Step: 358410 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:58:49,946-Speed 2638.09 samples/sec Loss 7.6823 LearningRate 0.0323 Epoch: 8 Global Step: 358420 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:58:53,838-Speed 2631.46 samples/sec Loss 7.5539 LearningRate 0.0323 Epoch: 8 Global Step: 358430 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:58:57,734-Speed 2629.64 samples/sec Loss 7.7107 LearningRate 0.0323 Epoch: 8 Global Step: 358440 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:59:01,625-Speed 2631.82 samples/sec Loss 7.6626 LearningRate 0.0323 Epoch: 8 Global Step: 358450 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:59:05,531-Speed 2623.05 samples/sec Loss 7.5982 LearningRate 0.0323 Epoch: 8 Global Step: 358460 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:59:09,428-Speed 2627.99 samples/sec Loss 7.7283 LearningRate 0.0322 Epoch: 8 Global Step: 358470 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:59:13,331-Speed 2632.15 samples/sec Loss 7.7189 LearningRate 0.0322 Epoch: 8 Global Step: 358480 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:59:17,217-Speed 2636.04 samples/sec Loss 7.7300 LearningRate 0.0322 Epoch: 8 Global Step: 358490 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:59:21,108-Speed 2632.39 samples/sec Loss 7.7197 LearningRate 0.0322 Epoch: 8 Global Step: 358500 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:59:25,002-Speed 2630.42 samples/sec Loss 7.7148 LearningRate 0.0322 Epoch: 8 Global Step: 358510 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:59:28,895-Speed 2631.00 samples/sec Loss 7.6302 LearningRate 0.0322 Epoch: 8 Global Step: 358520 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:59:32,789-Speed 2630.81 samples/sec Loss 7.8094 LearningRate 0.0322 Epoch: 8 Global Step: 358530 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:59:36,683-Speed 2630.24 samples/sec Loss 7.6966 LearningRate 0.0322 Epoch: 8 Global Step: 358540 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 11:59:40,563-Speed 2639.43 samples/sec Loss 7.6669 LearningRate 0.0322 Epoch: 8 Global Step: 358550 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:59:44,465-Speed 2625.26 samples/sec Loss 7.6526 LearningRate 0.0322 Epoch: 8 Global Step: 358560 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:59:48,363-Speed 2627.81 samples/sec Loss 7.6844 LearningRate 0.0322 Epoch: 8 Global Step: 358570 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:59:52,269-Speed 2622.01 samples/sec Loss 7.6755 LearningRate 0.0322 Epoch: 8 Global Step: 358580 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 11:59:56,187-Speed 2615.06 samples/sec Loss 7.6173 LearningRate 0.0322 Epoch: 8 Global Step: 358590 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:00:00,082-Speed 2629.29 samples/sec Loss 7.6057 LearningRate 0.0322 Epoch: 8 Global Step: 358600 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:00:04,009-Speed 2608.94 samples/sec Loss 7.6444 LearningRate 0.0322 Epoch: 8 Global Step: 358610 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:00:07,905-Speed 2628.77 samples/sec Loss 7.6269 LearningRate 0.0322 Epoch: 8 Global Step: 358620 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:00:11,824-Speed 2613.40 samples/sec Loss 7.6922 LearningRate 0.0322 Epoch: 8 Global Step: 358630 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:00:15,740-Speed 2616.03 samples/sec Loss 7.7236 LearningRate 0.0322 Epoch: 8 Global Step: 358640 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:00:19,634-Speed 2630.40 samples/sec Loss 7.5719 LearningRate 0.0322 Epoch: 8 Global Step: 358650 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:00:23,534-Speed 2626.36 samples/sec Loss 7.6372 LearningRate 0.0322 Epoch: 8 Global Step: 358660 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:00:27,431-Speed 2627.98 samples/sec Loss 7.6934 LearningRate 0.0322 Epoch: 8 Global Step: 358670 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:00:31,323-Speed 2632.58 samples/sec Loss 7.7007 LearningRate 0.0322 Epoch: 8 Global Step: 358680 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:00:35,218-Speed 2629.44 samples/sec Loss 7.6976 LearningRate 0.0322 Epoch: 8 Global Step: 358690 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:00:39,111-Speed 2631.45 samples/sec Loss 7.6416 LearningRate 0.0322 Epoch: 8 Global Step: 358700 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:00:43,002-Speed 2632.15 samples/sec Loss 7.6033 LearningRate 0.0322 Epoch: 8 Global Step: 358710 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:00:46,956-Speed 2590.37 samples/sec Loss 7.6751 LearningRate 0.0322 Epoch: 8 Global Step: 358720 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:00:50,862-Speed 2622.31 samples/sec Loss 7.6638 LearningRate 0.0322 Epoch: 8 Global Step: 358730 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:00:54,798-Speed 2602.87 samples/sec Loss 7.6820 LearningRate 0.0322 Epoch: 8 Global Step: 358740 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:00:58,693-Speed 2629.55 samples/sec Loss 7.5989 LearningRate 0.0322 Epoch: 8 Global Step: 358750 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:01:02,588-Speed 2629.74 samples/sec Loss 7.6651 LearningRate 0.0322 Epoch: 8 Global Step: 358760 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:01:06,482-Speed 2630.34 samples/sec Loss 7.7206 LearningRate 0.0322 Epoch: 8 Global Step: 358770 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:01:10,381-Speed 2627.45 samples/sec Loss 7.6387 LearningRate 0.0322 Epoch: 8 Global Step: 358780 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:01:14,259-Speed 2640.72 samples/sec Loss 7.5365 LearningRate 0.0322 Epoch: 8 Global Step: 358790 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:01:18,154-Speed 2629.50 samples/sec Loss 7.6699 LearningRate 0.0322 Epoch: 8 Global Step: 358800 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:01:22,071-Speed 2615.20 samples/sec Loss 7.6847 LearningRate 0.0322 Epoch: 8 Global Step: 358810 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:01:25,970-Speed 2627.21 samples/sec Loss 7.6130 LearningRate 0.0322 Epoch: 8 Global Step: 358820 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:01:29,867-Speed 2628.62 samples/sec Loss 7.5845 LearningRate 0.0322 Epoch: 8 Global Step: 358830 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:01:33,762-Speed 2629.24 samples/sec Loss 7.5816 LearningRate 0.0322 Epoch: 8 Global Step: 358840 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:01:37,655-Speed 2631.24 samples/sec Loss 7.7316 LearningRate 0.0322 Epoch: 8 Global Step: 358850 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:01:41,549-Speed 2630.51 samples/sec Loss 7.7136 LearningRate 0.0322 Epoch: 8 Global Step: 358860 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:01:45,477-Speed 2607.22 samples/sec Loss 7.7097 LearningRate 0.0322 Epoch: 8 Global Step: 358870 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:01:49,373-Speed 2629.43 samples/sec Loss 7.6304 LearningRate 0.0322 Epoch: 8 Global Step: 358880 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:01:53,313-Speed 2599.59 samples/sec Loss 7.4158 LearningRate 0.0322 Epoch: 8 Global Step: 358890 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:01:57,207-Speed 2631.03 samples/sec Loss 7.5738 LearningRate 0.0322 Epoch: 8 Global Step: 358900 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:02:01,093-Speed 2636.04 samples/sec Loss 7.6343 LearningRate 0.0322 Epoch: 8 Global Step: 358910 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:02:05,109-Speed 2550.17 samples/sec Loss 7.6083 LearningRate 0.0322 Epoch: 8 Global Step: 358920 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:02:09,079-Speed 2579.99 samples/sec Loss 7.7876 LearningRate 0.0322 Epoch: 8 Global Step: 358930 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:02:12,987-Speed 2620.70 samples/sec Loss 7.6071 LearningRate 0.0322 Epoch: 8 Global Step: 358940 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:02:16,898-Speed 2618.90 samples/sec Loss 7.5729 LearningRate 0.0322 Epoch: 8 Global Step: 358950 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:02:20,832-Speed 2603.80 samples/sec Loss 7.6045 LearningRate 0.0322 Epoch: 8 Global Step: 358960 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:02:24,723-Speed 2632.67 samples/sec Loss 7.6887 LearningRate 0.0322 Epoch: 8 Global Step: 358970 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:02:28,620-Speed 2628.16 samples/sec Loss 7.6891 LearningRate 0.0322 Epoch: 8 Global Step: 358980 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:02:32,517-Speed 2628.71 samples/sec Loss 7.6827 LearningRate 0.0322 Epoch: 8 Global Step: 358990 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:02:36,417-Speed 2625.95 samples/sec Loss 7.7087 LearningRate 0.0322 Epoch: 8 Global Step: 359000 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:02:40,308-Speed 2632.43 samples/sec Loss 7.6425 LearningRate 0.0322 Epoch: 8 Global Step: 359010 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:02:44,202-Speed 2630.20 samples/sec Loss 7.6174 LearningRate 0.0322 Epoch: 8 Global Step: 359020 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:02:48,126-Speed 2610.91 samples/sec Loss 7.7366 LearningRate 0.0322 Epoch: 8 Global Step: 359030 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:02:52,018-Speed 2631.81 samples/sec Loss 7.5714 LearningRate 0.0322 Epoch: 8 Global Step: 359040 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:02:55,932-Speed 2616.97 samples/sec Loss 7.4905 LearningRate 0.0322 Epoch: 8 Global Step: 359050 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:02:59,824-Speed 2631.89 samples/sec Loss 7.6085 LearningRate 0.0322 Epoch: 8 Global Step: 359060 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:03:03,740-Speed 2615.32 samples/sec Loss 7.6428 LearningRate 0.0322 Epoch: 8 Global Step: 359070 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:03:07,635-Speed 2629.76 samples/sec Loss 7.4498 LearningRate 0.0322 Epoch: 8 Global Step: 359080 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:03:11,548-Speed 2617.47 samples/sec Loss 7.7818 LearningRate 0.0322 Epoch: 8 Global Step: 359090 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:03:15,441-Speed 2631.10 samples/sec Loss 7.6754 LearningRate 0.0322 Epoch: 8 Global Step: 359100 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:03:19,336-Speed 2630.03 samples/sec Loss 7.7788 LearningRate 0.0322 Epoch: 8 Global Step: 359110 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:03:23,227-Speed 2632.62 samples/sec Loss 7.6039 LearningRate 0.0322 Epoch: 8 Global Step: 359120 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:03:27,121-Speed 2630.25 samples/sec Loss 7.6132 LearningRate 0.0322 Epoch: 8 Global Step: 359130 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:03:30,982-Speed 2653.42 samples/sec Loss 7.6543 LearningRate 0.0322 Epoch: 8 Global Step: 359140 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:03:34,874-Speed 2631.62 samples/sec Loss 7.6654 LearningRate 0.0322 Epoch: 8 Global Step: 359150 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:03:38,781-Speed 2621.81 samples/sec Loss 7.6454 LearningRate 0.0322 Epoch: 8 Global Step: 359160 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:03:42,679-Speed 2627.11 samples/sec Loss 7.6164 LearningRate 0.0322 Epoch: 8 Global Step: 359170 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:03:46,574-Speed 2630.17 samples/sec Loss 7.6616 LearningRate 0.0322 Epoch: 8 Global Step: 359180 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:03:50,463-Speed 2634.05 samples/sec Loss 7.6592 LearningRate 0.0322 Epoch: 8 Global Step: 359190 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:03:54,365-Speed 2625.13 samples/sec Loss 7.5159 LearningRate 0.0322 Epoch: 8 Global Step: 359200 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:03:58,254-Speed 2633.57 samples/sec Loss 7.6098 LearningRate 0.0321 Epoch: 8 Global Step: 359210 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:04:02,145-Speed 2632.37 samples/sec Loss 7.7034 LearningRate 0.0321 Epoch: 8 Global Step: 359220 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:04:06,040-Speed 2629.65 samples/sec Loss 7.6718 LearningRate 0.0321 Epoch: 8 Global Step: 359230 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:04:09,932-Speed 2631.42 samples/sec Loss 7.5931 LearningRate 0.0321 Epoch: 8 Global Step: 359240 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:04:13,828-Speed 2629.20 samples/sec Loss 7.5933 LearningRate 0.0321 Epoch: 8 Global Step: 359250 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:04:17,735-Speed 2621.74 samples/sec Loss 7.6226 LearningRate 0.0321 Epoch: 8 Global Step: 359260 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:04:21,633-Speed 2627.77 samples/sec Loss 7.6970 LearningRate 0.0321 Epoch: 8 Global Step: 359270 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:04:25,527-Speed 2630.75 samples/sec Loss 7.6103 LearningRate 0.0321 Epoch: 8 Global Step: 359280 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:04:29,447-Speed 2613.06 samples/sec Loss 7.5093 LearningRate 0.0321 Epoch: 8 Global Step: 359290 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:04:33,341-Speed 2630.24 samples/sec Loss 7.6104 LearningRate 0.0321 Epoch: 8 Global Step: 359300 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:04:37,203-Speed 2652.33 samples/sec Loss 7.7237 LearningRate 0.0321 Epoch: 8 Global Step: 359310 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:04:41,106-Speed 2624.31 samples/sec Loss 7.7040 LearningRate 0.0321 Epoch: 8 Global Step: 359320 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:04:45,005-Speed 2627.06 samples/sec Loss 7.7471 LearningRate 0.0321 Epoch: 8 Global Step: 359330 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:04:48,907-Speed 2625.31 samples/sec Loss 7.4903 LearningRate 0.0321 Epoch: 8 Global Step: 359340 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:04:52,831-Speed 2610.42 samples/sec Loss 7.5231 LearningRate 0.0321 Epoch: 8 Global Step: 359350 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:04:56,720-Speed 2638.30 samples/sec Loss 7.6345 LearningRate 0.0321 Epoch: 8 Global Step: 359360 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:05:00,619-Speed 2627.39 samples/sec Loss 7.5150 LearningRate 0.0321 Epoch: 8 Global Step: 359370 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:05:04,611-Speed 2565.80 samples/sec Loss 7.6358 LearningRate 0.0321 Epoch: 8 Global Step: 359380 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:05:08,502-Speed 2632.34 samples/sec Loss 7.5823 LearningRate 0.0321 Epoch: 8 Global Step: 359390 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:05:12,407-Speed 2623.26 samples/sec Loss 7.6099 LearningRate 0.0321 Epoch: 8 Global Step: 359400 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:05:16,298-Speed 2632.26 samples/sec Loss 7.7219 LearningRate 0.0321 Epoch: 8 Global Step: 359410 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:05:20,210-Speed 2618.26 samples/sec Loss 7.5968 LearningRate 0.0321 Epoch: 8 Global Step: 359420 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:05:24,101-Speed 2632.13 samples/sec Loss 7.5964 LearningRate 0.0321 Epoch: 8 Global Step: 359430 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:05:27,999-Speed 2628.42 samples/sec Loss 7.6786 LearningRate 0.0321 Epoch: 8 Global Step: 359440 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:05:31,892-Speed 2630.60 samples/sec Loss 7.7053 LearningRate 0.0321 Epoch: 8 Global Step: 359450 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:05:35,783-Speed 2632.64 samples/sec Loss 7.5908 LearningRate 0.0321 Epoch: 8 Global Step: 359460 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:05:39,730-Speed 2595.13 samples/sec Loss 7.5919 LearningRate 0.0321 Epoch: 8 Global Step: 359470 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:05:43,622-Speed 2631.60 samples/sec Loss 7.6476 LearningRate 0.0321 Epoch: 8 Global Step: 359480 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:05:47,514-Speed 2631.84 samples/sec Loss 7.6347 LearningRate 0.0321 Epoch: 8 Global Step: 359490 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:05:51,604-Speed 2503.80 samples/sec Loss 7.6154 LearningRate 0.0321 Epoch: 8 Global Step: 359500 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:05:55,695-Speed 2504.52 samples/sec Loss 7.6848 LearningRate 0.0321 Epoch: 8 Global Step: 359510 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:05:59,793-Speed 2499.02 samples/sec Loss 7.5410 LearningRate 0.0321 Epoch: 8 Global Step: 359520 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:06:03,839-Speed 2531.76 samples/sec Loss 7.5931 LearningRate 0.0321 Epoch: 8 Global Step: 359530 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:06:07,734-Speed 2629.52 samples/sec Loss 7.6508 LearningRate 0.0321 Epoch: 8 Global Step: 359540 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:06:11,630-Speed 2629.06 samples/sec Loss 7.6988 LearningRate 0.0321 Epoch: 8 Global Step: 359550 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:06:15,521-Speed 2632.12 samples/sec Loss 7.6539 LearningRate 0.0321 Epoch: 8 Global Step: 359560 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:06:19,414-Speed 2631.41 samples/sec Loss 7.5458 LearningRate 0.0321 Epoch: 8 Global Step: 359570 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:06:23,307-Speed 2630.76 samples/sec Loss 7.5695 LearningRate 0.0321 Epoch: 8 Global Step: 359580 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:06:27,200-Speed 2631.35 samples/sec Loss 7.6595 LearningRate 0.0321 Epoch: 8 Global Step: 359590 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:06:31,093-Speed 2630.87 samples/sec Loss 7.5764 LearningRate 0.0321 Epoch: 8 Global Step: 359600 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:06:34,991-Speed 2628.36 samples/sec Loss 7.6588 LearningRate 0.0321 Epoch: 8 Global Step: 359610 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:06:38,867-Speed 2642.42 samples/sec Loss 7.7328 LearningRate 0.0321 Epoch: 8 Global Step: 359620 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:06:42,759-Speed 2631.34 samples/sec Loss 7.5577 LearningRate 0.0321 Epoch: 8 Global Step: 359630 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:06:46,652-Speed 2630.98 samples/sec Loss 7.4842 LearningRate 0.0321 Epoch: 8 Global Step: 359640 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:06:50,551-Speed 2626.88 samples/sec Loss 7.6778 LearningRate 0.0321 Epoch: 8 Global Step: 359650 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:06:54,459-Speed 2621.65 samples/sec Loss 7.7164 LearningRate 0.0321 Epoch: 8 Global Step: 359660 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:06:58,350-Speed 2632.35 samples/sec Loss 7.6226 LearningRate 0.0321 Epoch: 8 Global Step: 359670 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:07:02,255-Speed 2622.50 samples/sec Loss 7.7629 LearningRate 0.0321 Epoch: 8 Global Step: 359680 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:07:06,175-Speed 2613.17 samples/sec Loss 7.7089 LearningRate 0.0321 Epoch: 8 Global Step: 359690 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:07:10,088-Speed 2617.74 samples/sec Loss 7.6868 LearningRate 0.0321 Epoch: 8 Global Step: 359700 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:07:14,016-Speed 2607.79 samples/sec Loss 7.5975 LearningRate 0.0321 Epoch: 8 Global Step: 359710 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:07:17,914-Speed 2627.70 samples/sec Loss 7.4885 LearningRate 0.0321 Epoch: 8 Global Step: 359720 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:07:21,835-Speed 2612.30 samples/sec Loss 7.6731 LearningRate 0.0321 Epoch: 8 Global Step: 359730 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:07:25,735-Speed 2626.48 samples/sec Loss 7.6887 LearningRate 0.0321 Epoch: 8 Global Step: 359740 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:07:29,640-Speed 2623.20 samples/sec Loss 7.6309 LearningRate 0.0321 Epoch: 8 Global Step: 359750 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:07:33,545-Speed 2622.93 samples/sec Loss 7.5267 LearningRate 0.0321 Epoch: 8 Global Step: 359760 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:07:37,430-Speed 2635.98 samples/sec Loss 7.5527 LearningRate 0.0321 Epoch: 8 Global Step: 359770 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:07:41,332-Speed 2624.87 samples/sec Loss 7.6194 LearningRate 0.0321 Epoch: 8 Global Step: 359780 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:07:45,225-Speed 2631.31 samples/sec Loss 7.6366 LearningRate 0.0321 Epoch: 8 Global Step: 359790 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:07:49,118-Speed 2631.02 samples/sec Loss 7.6284 LearningRate 0.0321 Epoch: 8 Global Step: 359800 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:07:53,015-Speed 2628.38 samples/sec Loss 7.6364 LearningRate 0.0321 Epoch: 8 Global Step: 359810 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:07:56,893-Speed 2641.08 samples/sec Loss 7.7760 LearningRate 0.0321 Epoch: 8 Global Step: 359820 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:08:00,785-Speed 2632.30 samples/sec Loss 7.5546 LearningRate 0.0321 Epoch: 8 Global Step: 359830 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:08:04,686-Speed 2625.47 samples/sec Loss 7.6737 LearningRate 0.0321 Epoch: 8 Global Step: 359840 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:08:08,609-Speed 2610.92 samples/sec Loss 7.5816 LearningRate 0.0321 Epoch: 8 Global Step: 359850 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:08:12,485-Speed 2641.92 samples/sec Loss 7.5800 LearningRate 0.0321 Epoch: 8 Global Step: 359860 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:08:16,378-Speed 2631.24 samples/sec Loss 7.5720 LearningRate 0.0321 Epoch: 8 Global Step: 359870 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:08:20,272-Speed 2631.11 samples/sec Loss 7.5051 LearningRate 0.0321 Epoch: 8 Global Step: 359880 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:08:24,184-Speed 2617.48 samples/sec Loss 7.6259 LearningRate 0.0321 Epoch: 8 Global Step: 359890 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:08:28,122-Speed 2601.70 samples/sec Loss 7.6642 LearningRate 0.0321 Epoch: 8 Global Step: 359900 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:08:32,006-Speed 2636.82 samples/sec Loss 7.5829 LearningRate 0.0321 Epoch: 8 Global Step: 359910 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:08:35,910-Speed 2623.50 samples/sec Loss 7.6005 LearningRate 0.0321 Epoch: 8 Global Step: 359920 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:08:39,802-Speed 2631.80 samples/sec Loss 7.6776 LearningRate 0.0321 Epoch: 8 Global Step: 359930 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:08:43,725-Speed 2610.73 samples/sec Loss 7.7001 LearningRate 0.0320 Epoch: 8 Global Step: 359940 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:08:47,621-Speed 2629.08 samples/sec Loss 7.5850 LearningRate 0.0320 Epoch: 8 Global Step: 359950 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:08:51,520-Speed 2627.34 samples/sec Loss 7.7052 LearningRate 0.0320 Epoch: 8 Global Step: 359960 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:08:55,411-Speed 2632.53 samples/sec Loss 7.6059 LearningRate 0.0320 Epoch: 8 Global Step: 359970 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:08:59,307-Speed 2629.53 samples/sec Loss 7.7299 LearningRate 0.0320 Epoch: 8 Global Step: 359980 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:09:03,204-Speed 2628.21 samples/sec Loss 7.6315 LearningRate 0.0320 Epoch: 8 Global Step: 359990 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:09:07,099-Speed 2629.51 samples/sec Loss 7.6191 LearningRate 0.0320 Epoch: 8 Global Step: 360000 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:09:49,998-[lfw][360000]XNorm: 23.186249
Training: 2022-04-14 12:09:49,999-[lfw][360000]Accuracy-Flip: 0.99767+-0.00260
Training: 2022-04-14 12:09:49,999-[lfw][360000]Accuracy-Highest: 0.99783
Training: 2022-04-14 12:10:39,924-[cfp_fp][360000]XNorm: 21.406011
Training: 2022-04-14 12:10:39,925-[cfp_fp][360000]Accuracy-Flip: 0.98600+-0.00538
Training: 2022-04-14 12:10:39,926-[cfp_fp][360000]Accuracy-Highest: 0.98671
Training: 2022-04-14 12:11:23,081-[agedb_30][360000]XNorm: 23.462880
Training: 2022-04-14 12:11:23,082-[agedb_30][360000]Accuracy-Flip: 0.97700+-0.00698
Training: 2022-04-14 12:11:23,083-[agedb_30][360000]Accuracy-Highest: 0.97700
Training: 2022-04-14 12:11:26,950-Speed 73.22 samples/sec Loss 7.7173 LearningRate 0.0320 Epoch: 8 Global Step: 360010 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:11:30,822-Speed 2645.23 samples/sec Loss 7.6198 LearningRate 0.0320 Epoch: 8 Global Step: 360020 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:11:34,699-Speed 2642.03 samples/sec Loss 7.6861 LearningRate 0.0320 Epoch: 8 Global Step: 360030 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:11:38,575-Speed 2642.38 samples/sec Loss 7.5721 LearningRate 0.0320 Epoch: 8 Global Step: 360040 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:11:42,464-Speed 2633.98 samples/sec Loss 7.5390 LearningRate 0.0320 Epoch: 8 Global Step: 360050 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:11:46,347-Speed 2637.88 samples/sec Loss 7.5161 LearningRate 0.0320 Epoch: 8 Global Step: 360060 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:11:50,221-Speed 2644.10 samples/sec Loss 7.7719 LearningRate 0.0320 Epoch: 8 Global Step: 360070 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:11:54,101-Speed 2639.92 samples/sec Loss 7.7210 LearningRate 0.0320 Epoch: 8 Global Step: 360080 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:11:57,984-Speed 2637.54 samples/sec Loss 7.6872 LearningRate 0.0320 Epoch: 8 Global Step: 360090 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:12:01,867-Speed 2637.94 samples/sec Loss 7.6933 LearningRate 0.0320 Epoch: 8 Global Step: 360100 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:12:05,781-Speed 2616.75 samples/sec Loss 7.6424 LearningRate 0.0320 Epoch: 8 Global Step: 360110 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:12:09,674-Speed 2631.49 samples/sec Loss 7.7962 LearningRate 0.0320 Epoch: 8 Global Step: 360120 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:12:13,567-Speed 2630.56 samples/sec Loss 7.6215 LearningRate 0.0320 Epoch: 8 Global Step: 360130 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:12:17,536-Speed 2581.05 samples/sec Loss 7.5789 LearningRate 0.0320 Epoch: 8 Global Step: 360140 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:12:21,423-Speed 2635.27 samples/sec Loss 7.6918 LearningRate 0.0320 Epoch: 8 Global Step: 360150 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:12:25,314-Speed 2632.71 samples/sec Loss 7.6304 LearningRate 0.0320 Epoch: 8 Global Step: 360160 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:12:29,215-Speed 2625.51 samples/sec Loss 7.6599 LearningRate 0.0320 Epoch: 8 Global Step: 360170 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:12:33,105-Speed 2633.51 samples/sec Loss 7.7558 LearningRate 0.0320 Epoch: 8 Global Step: 360180 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:12:36,997-Speed 2631.28 samples/sec Loss 7.6271 LearningRate 0.0320 Epoch: 8 Global Step: 360190 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:12:40,888-Speed 2632.23 samples/sec Loss 7.5690 LearningRate 0.0320 Epoch: 8 Global Step: 360200 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:12:44,780-Speed 2632.35 samples/sec Loss 7.6474 LearningRate 0.0320 Epoch: 8 Global Step: 360210 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:12:48,662-Speed 2638.24 samples/sec Loss 7.5521 LearningRate 0.0320 Epoch: 8 Global Step: 360220 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:12:52,459-Speed 2697.92 samples/sec Loss 8.2656 LearningRate 0.0320 Epoch: 8 Global Step: 360230 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 12:12:56,358-Speed 2626.57 samples/sec Loss 9.4664 LearningRate 0.0320 Epoch: 8 Global Step: 360240 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 12:13:00,253-Speed 2629.69 samples/sec Loss 8.0911 LearningRate 0.0320 Epoch: 8 Global Step: 360250 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 12:13:04,139-Speed 2635.95 samples/sec Loss 7.9676 LearningRate 0.0320 Epoch: 8 Global Step: 360260 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 12:13:08,056-Speed 2614.67 samples/sec Loss 7.5958 LearningRate 0.0320 Epoch: 8 Global Step: 360270 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 12:13:11,945-Speed 2633.61 samples/sec Loss 7.6948 LearningRate 0.0320 Epoch: 8 Global Step: 360280 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 12:13:15,857-Speed 2618.46 samples/sec Loss 7.8558 LearningRate 0.0320 Epoch: 8 Global Step: 360290 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 12:13:19,747-Speed 2633.06 samples/sec Loss 7.5955 LearningRate 0.0320 Epoch: 8 Global Step: 360300 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 12:13:23,636-Speed 2634.13 samples/sec Loss 7.6708 LearningRate 0.0320 Epoch: 8 Global Step: 360310 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 12:13:27,522-Speed 2635.71 samples/sec Loss 7.6577 LearningRate 0.0320 Epoch: 8 Global Step: 360320 Fp16 Grad Scale: 2048 Required: 53 hours
Training: 2022-04-14 12:13:31,410-Speed 2634.74 samples/sec Loss 7.6951 LearningRate 0.0320 Epoch: 8 Global Step: 360330 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:13:35,304-Speed 2630.70 samples/sec Loss 7.6706 LearningRate 0.0320 Epoch: 8 Global Step: 360340 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:13:39,191-Speed 2634.41 samples/sec Loss 7.6972 LearningRate 0.0320 Epoch: 8 Global Step: 360350 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:13:43,087-Speed 2629.03 samples/sec Loss 7.5343 LearningRate 0.0320 Epoch: 8 Global Step: 360360 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:13:46,973-Speed 2635.47 samples/sec Loss 7.7674 LearningRate 0.0320 Epoch: 8 Global Step: 360370 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:13:50,870-Speed 2628.77 samples/sec Loss 7.6071 LearningRate 0.0320 Epoch: 8 Global Step: 360380 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:13:54,760-Speed 2633.38 samples/sec Loss 7.6036 LearningRate 0.0320 Epoch: 8 Global Step: 360390 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:13:58,722-Speed 2585.29 samples/sec Loss 7.6981 LearningRate 0.0320 Epoch: 8 Global Step: 360400 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:14:02,613-Speed 2632.31 samples/sec Loss 7.6596 LearningRate 0.0320 Epoch: 8 Global Step: 360410 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:14:06,504-Speed 2631.58 samples/sec Loss 7.8083 LearningRate 0.0320 Epoch: 8 Global Step: 360420 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:14:10,395-Speed 2632.47 samples/sec Loss 7.6519 LearningRate 0.0320 Epoch: 8 Global Step: 360430 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:14:14,282-Speed 2635.25 samples/sec Loss 7.6988 LearningRate 0.0320 Epoch: 8 Global Step: 360440 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:14:18,177-Speed 2630.08 samples/sec Loss 7.6190 LearningRate 0.0320 Epoch: 8 Global Step: 360450 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:14:22,104-Speed 2608.04 samples/sec Loss 7.6952 LearningRate 0.0320 Epoch: 8 Global Step: 360460 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:14:25,997-Speed 2631.17 samples/sec Loss 7.6975 LearningRate 0.0320 Epoch: 8 Global Step: 360470 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:14:29,887-Speed 2633.59 samples/sec Loss 7.4706 LearningRate 0.0320 Epoch: 8 Global Step: 360480 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:14:33,799-Speed 2618.23 samples/sec Loss 7.7487 LearningRate 0.0320 Epoch: 8 Global Step: 360490 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:14:37,684-Speed 2636.03 samples/sec Loss 7.6528 LearningRate 0.0320 Epoch: 8 Global Step: 360500 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:14:41,593-Speed 2620.86 samples/sec Loss 7.5133 LearningRate 0.0320 Epoch: 8 Global Step: 360510 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:14:45,483-Speed 2632.81 samples/sec Loss 7.6963 LearningRate 0.0320 Epoch: 8 Global Step: 360520 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:14:49,372-Speed 2634.06 samples/sec Loss 7.6039 LearningRate 0.0320 Epoch: 8 Global Step: 360530 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:14:53,262-Speed 2633.28 samples/sec Loss 7.7185 LearningRate 0.0320 Epoch: 8 Global Step: 360540 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:14:57,157-Speed 2629.48 samples/sec Loss 7.4770 LearningRate 0.0320 Epoch: 8 Global Step: 360550 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:15:01,053-Speed 2629.27 samples/sec Loss 7.6348 LearningRate 0.0320 Epoch: 8 Global Step: 360560 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:15:04,949-Speed 2628.77 samples/sec Loss 7.5302 LearningRate 0.0320 Epoch: 8 Global Step: 360570 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:15:08,840-Speed 2632.26 samples/sec Loss 7.4513 LearningRate 0.0320 Epoch: 8 Global Step: 360580 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:15:12,729-Speed 2633.50 samples/sec Loss 7.5870 LearningRate 0.0320 Epoch: 8 Global Step: 360590 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:15:16,623-Speed 2630.45 samples/sec Loss 7.5243 LearningRate 0.0320 Epoch: 8 Global Step: 360600 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:15:20,517-Speed 2630.36 samples/sec Loss 7.7309 LearningRate 0.0320 Epoch: 8 Global Step: 360610 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:15:24,408-Speed 2633.11 samples/sec Loss 7.6213 LearningRate 0.0320 Epoch: 8 Global Step: 360620 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:15:28,304-Speed 2628.46 samples/sec Loss 7.6240 LearningRate 0.0320 Epoch: 8 Global Step: 360630 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:15:32,202-Speed 2627.88 samples/sec Loss 7.6015 LearningRate 0.0320 Epoch: 8 Global Step: 360640 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:15:36,102-Speed 2626.02 samples/sec Loss 7.6004 LearningRate 0.0320 Epoch: 8 Global Step: 360650 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:15:39,999-Speed 2628.44 samples/sec Loss 7.6114 LearningRate 0.0320 Epoch: 8 Global Step: 360660 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:15:43,889-Speed 2632.25 samples/sec Loss 7.6854 LearningRate 0.0319 Epoch: 8 Global Step: 360670 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:15:47,783-Speed 2631.49 samples/sec Loss 7.5167 LearningRate 0.0319 Epoch: 8 Global Step: 360680 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:15:51,672-Speed 2634.45 samples/sec Loss 7.6341 LearningRate 0.0319 Epoch: 8 Global Step: 360690 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:15:55,583-Speed 2619.10 samples/sec Loss 7.6713 LearningRate 0.0319 Epoch: 8 Global Step: 360700 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:15:59,486-Speed 2624.41 samples/sec Loss 7.6199 LearningRate 0.0319 Epoch: 8 Global Step: 360710 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:16:03,412-Speed 2608.23 samples/sec Loss 7.7022 LearningRate 0.0319 Epoch: 8 Global Step: 360720 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:16:07,308-Speed 2629.17 samples/sec Loss 7.6179 LearningRate 0.0319 Epoch: 8 Global Step: 360730 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:16:11,199-Speed 2631.74 samples/sec Loss 7.4739 LearningRate 0.0319 Epoch: 8 Global Step: 360740 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:16:15,100-Speed 2626.59 samples/sec Loss 7.5869 LearningRate 0.0319 Epoch: 8 Global Step: 360750 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:16:18,993-Speed 2630.73 samples/sec Loss 7.4563 LearningRate 0.0319 Epoch: 8 Global Step: 360760 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:16:22,902-Speed 2620.05 samples/sec Loss 7.5693 LearningRate 0.0319 Epoch: 8 Global Step: 360770 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:16:26,790-Speed 2634.54 samples/sec Loss 7.5508 LearningRate 0.0319 Epoch: 8 Global Step: 360780 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:16:30,684-Speed 2630.55 samples/sec Loss 7.5436 LearningRate 0.0319 Epoch: 8 Global Step: 360790 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:16:34,588-Speed 2623.69 samples/sec Loss 7.6368 LearningRate 0.0319 Epoch: 8 Global Step: 360800 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:16:38,488-Speed 2625.56 samples/sec Loss 7.6152 LearningRate 0.0319 Epoch: 8 Global Step: 360810 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:16:42,383-Speed 2630.05 samples/sec Loss 7.6720 LearningRate 0.0319 Epoch: 8 Global Step: 360820 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:16:46,280-Speed 2629.31 samples/sec Loss 7.6072 LearningRate 0.0319 Epoch: 8 Global Step: 360830 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:16:50,185-Speed 2622.99 samples/sec Loss 7.6587 LearningRate 0.0319 Epoch: 8 Global Step: 360840 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:16:54,100-Speed 2616.48 samples/sec Loss 7.5131 LearningRate 0.0319 Epoch: 8 Global Step: 360850 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:16:57,996-Speed 2629.28 samples/sec Loss 7.6958 LearningRate 0.0319 Epoch: 8 Global Step: 360860 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:17:01,899-Speed 2624.00 samples/sec Loss 7.5475 LearningRate 0.0319 Epoch: 8 Global Step: 360870 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:17:05,797-Speed 2627.75 samples/sec Loss 7.5489 LearningRate 0.0319 Epoch: 8 Global Step: 360880 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:17:09,697-Speed 2626.40 samples/sec Loss 7.6478 LearningRate 0.0319 Epoch: 8 Global Step: 360890 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:17:13,643-Speed 2595.32 samples/sec Loss 7.6170 LearningRate 0.0319 Epoch: 8 Global Step: 360900 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:17:17,550-Speed 2622.10 samples/sec Loss 7.5051 LearningRate 0.0319 Epoch: 8 Global Step: 360910 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:17:21,443-Speed 2631.43 samples/sec Loss 7.5852 LearningRate 0.0319 Epoch: 8 Global Step: 360920 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:17:25,346-Speed 2624.06 samples/sec Loss 7.3717 LearningRate 0.0319 Epoch: 8 Global Step: 360930 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:17:29,265-Speed 2613.29 samples/sec Loss 7.5893 LearningRate 0.0319 Epoch: 8 Global Step: 360940 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:17:33,177-Speed 2618.37 samples/sec Loss 7.6367 LearningRate 0.0319 Epoch: 8 Global Step: 360950 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:17:37,075-Speed 2627.91 samples/sec Loss 7.5545 LearningRate 0.0319 Epoch: 8 Global Step: 360960 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:17:40,975-Speed 2626.35 samples/sec Loss 7.4993 LearningRate 0.0319 Epoch: 8 Global Step: 360970 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:17:44,880-Speed 2622.55 samples/sec Loss 7.5462 LearningRate 0.0319 Epoch: 8 Global Step: 360980 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:17:48,782-Speed 2625.81 samples/sec Loss 7.6389 LearningRate 0.0319 Epoch: 8 Global Step: 360990 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:17:52,697-Speed 2616.07 samples/sec Loss 7.6670 LearningRate 0.0319 Epoch: 8 Global Step: 361000 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:17:56,597-Speed 2626.47 samples/sec Loss 7.5681 LearningRate 0.0319 Epoch: 8 Global Step: 361010 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:18:00,447-Speed 2660.36 samples/sec Loss 8.2528 LearningRate 0.0319 Epoch: 8 Global Step: 361020 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:18:04,344-Speed 2628.05 samples/sec Loss 8.2373 LearningRate 0.0319 Epoch: 8 Global Step: 361030 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:18:08,235-Speed 2632.06 samples/sec Loss 7.6857 LearningRate 0.0319 Epoch: 8 Global Step: 361040 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:18:12,145-Speed 2619.91 samples/sec Loss 7.6400 LearningRate 0.0319 Epoch: 8 Global Step: 361050 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:18:16,043-Speed 2627.91 samples/sec Loss 7.6704 LearningRate 0.0319 Epoch: 8 Global Step: 361060 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:18:19,937-Speed 2630.60 samples/sec Loss 7.6312 LearningRate 0.0319 Epoch: 8 Global Step: 361070 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:18:23,830-Speed 2630.77 samples/sec Loss 7.5979 LearningRate 0.0319 Epoch: 8 Global Step: 361080 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:18:27,751-Speed 2612.38 samples/sec Loss 7.5909 LearningRate 0.0319 Epoch: 8 Global Step: 361090 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:18:31,665-Speed 2617.30 samples/sec Loss 7.5654 LearningRate 0.0319 Epoch: 8 Global Step: 361100 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:18:35,565-Speed 2625.99 samples/sec Loss 7.5680 LearningRate 0.0319 Epoch: 8 Global Step: 361110 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:18:39,457-Speed 2631.75 samples/sec Loss 7.5924 LearningRate 0.0319 Epoch: 8 Global Step: 361120 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:18:43,349-Speed 2631.07 samples/sec Loss 7.6319 LearningRate 0.0319 Epoch: 8 Global Step: 361130 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:18:47,254-Speed 2623.35 samples/sec Loss 7.7140 LearningRate 0.0319 Epoch: 8 Global Step: 361140 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:18:51,147-Speed 2631.17 samples/sec Loss 7.5710 LearningRate 0.0319 Epoch: 8 Global Step: 361150 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:18:55,079-Speed 2605.15 samples/sec Loss 7.5625 LearningRate 0.0319 Epoch: 8 Global Step: 361160 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:18:58,977-Speed 2627.85 samples/sec Loss 7.7170 LearningRate 0.0319 Epoch: 8 Global Step: 361170 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:19:02,872-Speed 2629.54 samples/sec Loss 7.5435 LearningRate 0.0319 Epoch: 8 Global Step: 361180 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:19:06,769-Speed 2627.71 samples/sec Loss 7.6307 LearningRate 0.0319 Epoch: 8 Global Step: 361190 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:19:10,666-Speed 2628.53 samples/sec Loss 7.7269 LearningRate 0.0319 Epoch: 8 Global Step: 361200 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:19:14,556-Speed 2632.75 samples/sec Loss 7.6746 LearningRate 0.0319 Epoch: 8 Global Step: 361210 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:19:18,455-Speed 2627.12 samples/sec Loss 7.6481 LearningRate 0.0319 Epoch: 8 Global Step: 361220 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:19:22,346-Speed 2632.97 samples/sec Loss 7.5985 LearningRate 0.0319 Epoch: 8 Global Step: 361230 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:19:26,241-Speed 2629.66 samples/sec Loss 7.5983 LearningRate 0.0319 Epoch: 8 Global Step: 361240 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:19:30,133-Speed 2631.86 samples/sec Loss 7.5503 LearningRate 0.0319 Epoch: 8 Global Step: 361250 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:19:34,036-Speed 2624.36 samples/sec Loss 7.6271 LearningRate 0.0319 Epoch: 8 Global Step: 361260 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:19:37,927-Speed 2632.13 samples/sec Loss 7.5797 LearningRate 0.0319 Epoch: 8 Global Step: 361270 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:19:41,840-Speed 2617.76 samples/sec Loss 7.5480 LearningRate 0.0319 Epoch: 8 Global Step: 361280 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:19:45,734-Speed 2630.20 samples/sec Loss 7.6108 LearningRate 0.0319 Epoch: 8 Global Step: 361290 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:19:49,645-Speed 2619.31 samples/sec Loss 7.5362 LearningRate 0.0319 Epoch: 8 Global Step: 361300 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:19:53,540-Speed 2629.45 samples/sec Loss 7.7050 LearningRate 0.0319 Epoch: 8 Global Step: 361310 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:19:57,438-Speed 2627.78 samples/sec Loss 7.4865 LearningRate 0.0319 Epoch: 8 Global Step: 361320 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:20:01,338-Speed 2626.53 samples/sec Loss 7.5937 LearningRate 0.0319 Epoch: 8 Global Step: 361330 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:20:05,247-Speed 2620.44 samples/sec Loss 7.5615 LearningRate 0.0319 Epoch: 8 Global Step: 361340 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:20:09,140-Speed 2630.85 samples/sec Loss 7.6068 LearningRate 0.0319 Epoch: 8 Global Step: 361350 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:20:13,044-Speed 2622.90 samples/sec Loss 7.4928 LearningRate 0.0319 Epoch: 8 Global Step: 361360 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:20:16,922-Speed 2641.67 samples/sec Loss 7.6978 LearningRate 0.0319 Epoch: 8 Global Step: 361370 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:20:20,813-Speed 2632.74 samples/sec Loss 7.6582 LearningRate 0.0319 Epoch: 8 Global Step: 361380 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:20:24,707-Speed 2630.73 samples/sec Loss 7.5909 LearningRate 0.0319 Epoch: 8 Global Step: 361390 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:20:28,616-Speed 2620.33 samples/sec Loss 7.6563 LearningRate 0.0318 Epoch: 8 Global Step: 361400 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:20:32,509-Speed 2631.02 samples/sec Loss 7.5724 LearningRate 0.0318 Epoch: 8 Global Step: 361410 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:20:36,418-Speed 2620.09 samples/sec Loss 7.6032 LearningRate 0.0318 Epoch: 8 Global Step: 361420 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:20:40,313-Speed 2629.39 samples/sec Loss 7.5554 LearningRate 0.0318 Epoch: 8 Global Step: 361430 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:20:44,206-Speed 2631.44 samples/sec Loss 7.6963 LearningRate 0.0318 Epoch: 8 Global Step: 361440 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:20:48,097-Speed 2632.00 samples/sec Loss 7.4981 LearningRate 0.0318 Epoch: 8 Global Step: 361450 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:20:52,000-Speed 2624.62 samples/sec Loss 7.7255 LearningRate 0.0318 Epoch: 8 Global Step: 361460 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:20:55,900-Speed 2626.28 samples/sec Loss 7.7412 LearningRate 0.0318 Epoch: 8 Global Step: 361470 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:20:59,815-Speed 2616.75 samples/sec Loss 7.6979 LearningRate 0.0318 Epoch: 8 Global Step: 361480 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:21:03,711-Speed 2628.61 samples/sec Loss 7.6704 LearningRate 0.0318 Epoch: 8 Global Step: 361490 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:21:07,640-Speed 2607.36 samples/sec Loss 7.7059 LearningRate 0.0318 Epoch: 8 Global Step: 361500 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:21:11,505-Speed 2650.06 samples/sec Loss 7.5106 LearningRate 0.0318 Epoch: 8 Global Step: 361510 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:21:15,414-Speed 2620.05 samples/sec Loss 7.5384 LearningRate 0.0318 Epoch: 8 Global Step: 361520 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:21:19,313-Speed 2626.60 samples/sec Loss 7.6110 LearningRate 0.0318 Epoch: 8 Global Step: 361530 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:21:23,208-Speed 2630.24 samples/sec Loss 7.5747 LearningRate 0.0318 Epoch: 8 Global Step: 361540 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:21:27,105-Speed 2628.46 samples/sec Loss 7.5740 LearningRate 0.0318 Epoch: 8 Global Step: 361550 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:21:31,012-Speed 2621.62 samples/sec Loss 7.6978 LearningRate 0.0318 Epoch: 8 Global Step: 361560 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:21:34,908-Speed 2629.20 samples/sec Loss 7.5454 LearningRate 0.0318 Epoch: 8 Global Step: 361570 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:21:38,798-Speed 2632.67 samples/sec Loss 7.6154 LearningRate 0.0318 Epoch: 8 Global Step: 361580 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:21:42,692-Speed 2630.78 samples/sec Loss 7.5579 LearningRate 0.0318 Epoch: 8 Global Step: 361590 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:21:46,586-Speed 2630.47 samples/sec Loss 7.5841 LearningRate 0.0318 Epoch: 8 Global Step: 361600 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:21:50,484-Speed 2627.92 samples/sec Loss 7.6422 LearningRate 0.0318 Epoch: 8 Global Step: 361610 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:21:54,440-Speed 2589.15 samples/sec Loss 7.5956 LearningRate 0.0318 Epoch: 8 Global Step: 361620 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:21:58,267-Speed 2676.33 samples/sec Loss 8.1411 LearningRate 0.0318 Epoch: 8 Global Step: 361630 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:22:02,151-Speed 2637.93 samples/sec Loss 8.0207 LearningRate 0.0318 Epoch: 8 Global Step: 361640 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:22:06,053-Speed 2625.47 samples/sec Loss 7.5799 LearningRate 0.0318 Epoch: 8 Global Step: 361650 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:22:10,037-Speed 2570.57 samples/sec Loss 7.7418 LearningRate 0.0318 Epoch: 8 Global Step: 361660 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:22:13,931-Speed 2630.04 samples/sec Loss 7.5850 LearningRate 0.0318 Epoch: 8 Global Step: 361670 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:22:17,876-Speed 2596.87 samples/sec Loss 8.7690 LearningRate 0.0318 Epoch: 8 Global Step: 361680 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:22:21,765-Speed 2634.43 samples/sec Loss 8.2293 LearningRate 0.0318 Epoch: 8 Global Step: 361690 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:22:25,691-Speed 2609.04 samples/sec Loss 8.0665 LearningRate 0.0318 Epoch: 8 Global Step: 361700 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:22:29,590-Speed 2627.30 samples/sec Loss 7.8234 LearningRate 0.0318 Epoch: 8 Global Step: 361710 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:22:33,504-Speed 2616.81 samples/sec Loss 7.7638 LearningRate 0.0318 Epoch: 8 Global Step: 361720 Fp16 Grad Scale: 4096 Required: 53 hours
Training: 2022-04-14 12:22:37,409-Speed 2622.77 samples/sec Loss 7.6242 LearningRate 0.0318 Epoch: 8 Global Step: 361730 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:22:41,314-Speed 2623.40 samples/sec Loss 7.7413 LearningRate 0.0318 Epoch: 8 Global Step: 361740 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:22:45,217-Speed 2623.55 samples/sec Loss 7.5829 LearningRate 0.0318 Epoch: 8 Global Step: 361750 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:22:49,135-Speed 2615.22 samples/sec Loss 7.6139 LearningRate 0.0318 Epoch: 8 Global Step: 361760 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:22:53,051-Speed 2615.52 samples/sec Loss 7.7238 LearningRate 0.0318 Epoch: 8 Global Step: 361770 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:22:56,970-Speed 2613.18 samples/sec Loss 7.6364 LearningRate 0.0318 Epoch: 8 Global Step: 361780 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:23:00,865-Speed 2630.04 samples/sec Loss 7.5514 LearningRate 0.0318 Epoch: 8 Global Step: 361790 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:23:04,768-Speed 2623.73 samples/sec Loss 7.6114 LearningRate 0.0318 Epoch: 8 Global Step: 361800 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:23:08,666-Speed 2627.97 samples/sec Loss 7.7079 LearningRate 0.0318 Epoch: 8 Global Step: 361810 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:23:12,572-Speed 2622.20 samples/sec Loss 7.6033 LearningRate 0.0318 Epoch: 8 Global Step: 361820 Fp16 Grad Scale: 8192 Required: 53 hours
Training: 2022-04-14 12:23:16,475-Speed 2624.22 samples/sec Loss 7.7175 LearningRate 0.0318 Epoch: 8 Global Step: 361830 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:23:20,387-Speed 2618.17 samples/sec Loss 7.5791 LearningRate 0.0318 Epoch: 8 Global Step: 361840 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:23:24,314-Speed 2608.50 samples/sec Loss 7.6638 LearningRate 0.0318 Epoch: 8 Global Step: 361850 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:23:28,214-Speed 2625.85 samples/sec Loss 7.7976 LearningRate 0.0318 Epoch: 8 Global Step: 361860 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:23:32,110-Speed 2628.96 samples/sec Loss 7.5975 LearningRate 0.0318 Epoch: 8 Global Step: 361870 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:23:36,018-Speed 2621.26 samples/sec Loss 7.6091 LearningRate 0.0318 Epoch: 8 Global Step: 361880 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:23:40,031-Speed 2551.87 samples/sec Loss 7.7123 LearningRate 0.0318 Epoch: 8 Global Step: 361890 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:23:43,927-Speed 2629.30 samples/sec Loss 7.7523 LearningRate 0.0318 Epoch: 8 Global Step: 361900 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:23:47,857-Speed 2606.22 samples/sec Loss 7.5771 LearningRate 0.0318 Epoch: 8 Global Step: 361910 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:23:51,758-Speed 2625.80 samples/sec Loss 7.5768 LearningRate 0.0318 Epoch: 8 Global Step: 361920 Fp16 Grad Scale: 16384 Required: 53 hours
Training: 2022-04-14 12:23:55,660-Speed 2625.06 samples/sec Loss 7.5707 LearningRate 0.0318 Epoch: 8 Global Step: 361930 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:23:59,567-Speed 2621.20 samples/sec Loss 7.6170 LearningRate 0.0318 Epoch: 8 Global Step: 361940 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:24:03,484-Speed 2615.15 samples/sec Loss 7.6676 LearningRate 0.0318 Epoch: 8 Global Step: 361950 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:24:07,388-Speed 2623.76 samples/sec Loss 7.7035 LearningRate 0.0318 Epoch: 8 Global Step: 361960 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:24:11,281-Speed 2630.89 samples/sec Loss 7.5509 LearningRate 0.0318 Epoch: 8 Global Step: 361970 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:24:15,181-Speed 2626.34 samples/sec Loss 7.5062 LearningRate 0.0318 Epoch: 8 Global Step: 361980 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:24:19,081-Speed 2626.42 samples/sec Loss 7.6106 LearningRate 0.0318 Epoch: 8 Global Step: 361990 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:24:22,976-Speed 2629.50 samples/sec Loss 7.5464 LearningRate 0.0318 Epoch: 8 Global Step: 362000 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:24:26,875-Speed 2627.39 samples/sec Loss 7.6154 LearningRate 0.0318 Epoch: 8 Global Step: 362010 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:24:30,769-Speed 2630.32 samples/sec Loss 7.6386 LearningRate 0.0318 Epoch: 8 Global Step: 362020 Fp16 Grad Scale: 32768 Required: 53 hours
Training: 2022-04-14 12:24:34,677-Speed 2620.47 samples/sec Loss 7.5678 LearningRate 0.0318 Epoch: 8 Global Step: 362030 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:24:38,576-Speed 2627.34 samples/sec Loss 7.6209 LearningRate 0.0318 Epoch: 8 Global Step: 362040 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:24:42,477-Speed 2625.86 samples/sec Loss 7.7724 LearningRate 0.0318 Epoch: 8 Global Step: 362050 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:24:46,387-Speed 2619.42 samples/sec Loss 7.6197 LearningRate 0.0318 Epoch: 8 Global Step: 362060 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:24:50,282-Speed 2629.68 samples/sec Loss 7.4695 LearningRate 0.0318 Epoch: 8 Global Step: 362070 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:24:54,186-Speed 2624.06 samples/sec Loss 7.5402 LearningRate 0.0318 Epoch: 8 Global Step: 362080 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:24:58,090-Speed 2623.56 samples/sec Loss 7.6435 LearningRate 0.0318 Epoch: 8 Global Step: 362090 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:25:01,981-Speed 2632.04 samples/sec Loss 7.5512 LearningRate 0.0318 Epoch: 8 Global Step: 362100 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:25:05,901-Speed 2613.41 samples/sec Loss 7.7217 LearningRate 0.0318 Epoch: 8 Global Step: 362110 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:25:09,795-Speed 2629.79 samples/sec Loss 7.5748 LearningRate 0.0318 Epoch: 8 Global Step: 362120 Fp16 Grad Scale: 65536 Required: 53 hours
Training: 2022-04-14 12:25:13,689-Speed 2630.50 samples/sec Loss 7.6101 LearningRate 0.0318 Epoch: 8 Global Step: 362130 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:25:17,587-Speed 2627.72 samples/sec Loss 7.6678 LearningRate 0.0317 Epoch: 8 Global Step: 362140 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:25:21,544-Speed 2588.72 samples/sec Loss 7.4806 LearningRate 0.0317 Epoch: 8 Global Step: 362150 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:25:25,428-Speed 2637.13 samples/sec Loss 7.5636 LearningRate 0.0317 Epoch: 8 Global Step: 362160 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:25:29,327-Speed 2627.19 samples/sec Loss 7.6954 LearningRate 0.0317 Epoch: 8 Global Step: 362170 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:25:33,223-Speed 2629.15 samples/sec Loss 7.5092 LearningRate 0.0317 Epoch: 8 Global Step: 362180 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:25:37,134-Speed 2618.75 samples/sec Loss 7.6903 LearningRate 0.0317 Epoch: 8 Global Step: 362190 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:25:41,037-Speed 2624.11 samples/sec Loss 7.5484 LearningRate 0.0317 Epoch: 8 Global Step: 362200 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:25:44,940-Speed 2624.28 samples/sec Loss 7.6196 LearningRate 0.0317 Epoch: 8 Global Step: 362210 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:25:48,822-Speed 2637.94 samples/sec Loss 7.6806 LearningRate 0.0317 Epoch: 8 Global Step: 362220 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:25:52,714-Speed 2632.48 samples/sec Loss 7.6777 LearningRate 0.0317 Epoch: 8 Global Step: 362230 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:25:56,770-Speed 2525.06 samples/sec Loss 7.6006 LearningRate 0.0317 Epoch: 8 Global Step: 362240 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:26:00,717-Speed 2595.34 samples/sec Loss 7.5713 LearningRate 0.0317 Epoch: 8 Global Step: 362250 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:26:04,615-Speed 2627.12 samples/sec Loss 7.5597 LearningRate 0.0317 Epoch: 8 Global Step: 362260 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:26:08,505-Speed 2632.62 samples/sec Loss 7.6852 LearningRate 0.0317 Epoch: 8 Global Step: 362270 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:26:12,472-Speed 2582.07 samples/sec Loss 7.7681 LearningRate 0.0317 Epoch: 8 Global Step: 362280 Fp16 Grad Scale: 262144 Required: 53 hours
Training: 2022-04-14 12:26:16,523-Speed 2528.16 samples/sec Loss 7.7481 LearningRate 0.0317 Epoch: 8 Global Step: 362290 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:26:20,590-Speed 2518.16 samples/sec Loss 7.4858 LearningRate 0.0317 Epoch: 8 Global Step: 362300 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:26:24,661-Speed 2516.74 samples/sec Loss 7.4750 LearningRate 0.0317 Epoch: 8 Global Step: 362310 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:26:28,648-Speed 2569.20 samples/sec Loss 7.6046 LearningRate 0.0317 Epoch: 8 Global Step: 362320 Fp16 Grad Scale: 131072 Required: 53 hours
Training: 2022-04-14 12:26:32,564-Speed 2615.27 samples/sec Loss 7.6556 LearningRate 0.0317 Epoch: 8 Global Step: 362330 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:26:36,456-Speed 2631.61 samples/sec Loss 7.5445 LearningRate 0.0317 Epoch: 8 Global Step: 362340 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:26:40,360-Speed 2623.94 samples/sec Loss 7.6460 LearningRate 0.0317 Epoch: 8 Global Step: 362350 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:26:44,260-Speed 2626.33 samples/sec Loss 7.5368 LearningRate 0.0317 Epoch: 8 Global Step: 362360 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:26:48,153-Speed 2630.85 samples/sec Loss 7.7499 LearningRate 0.0317 Epoch: 8 Global Step: 362370 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:26:52,048-Speed 2629.79 samples/sec Loss 7.4846 LearningRate 0.0317 Epoch: 8 Global Step: 362380 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:26:55,952-Speed 2623.69 samples/sec Loss 7.6938 LearningRate 0.0317 Epoch: 8 Global Step: 362390 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:26:59,849-Speed 2628.85 samples/sec Loss 7.7377 LearningRate 0.0317 Epoch: 8 Global Step: 362400 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:27:03,736-Speed 2634.73 samples/sec Loss 7.7187 LearningRate 0.0317 Epoch: 8 Global Step: 362410 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:27:07,639-Speed 2624.30 samples/sec Loss 7.5457 LearningRate 0.0317 Epoch: 8 Global Step: 362420 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:27:11,530-Speed 2632.74 samples/sec Loss 7.7388 LearningRate 0.0317 Epoch: 8 Global Step: 362430 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:27:15,437-Speed 2621.76 samples/sec Loss 7.4396 LearningRate 0.0317 Epoch: 8 Global Step: 362440 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:27:19,340-Speed 2623.80 samples/sec Loss 7.7168 LearningRate 0.0317 Epoch: 8 Global Step: 362450 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:27:23,251-Speed 2619.16 samples/sec Loss 7.6409 LearningRate 0.0317 Epoch: 8 Global Step: 362460 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:27:27,160-Speed 2620.18 samples/sec Loss 7.5485 LearningRate 0.0317 Epoch: 8 Global Step: 362470 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:27:31,060-Speed 2626.72 samples/sec Loss 7.5972 LearningRate 0.0317 Epoch: 8 Global Step: 362480 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:27:34,962-Speed 2625.19 samples/sec Loss 7.6057 LearningRate 0.0317 Epoch: 8 Global Step: 362490 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:27:38,870-Speed 2620.56 samples/sec Loss 7.5215 LearningRate 0.0317 Epoch: 8 Global Step: 362500 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:27:42,766-Speed 2628.78 samples/sec Loss 7.5938 LearningRate 0.0317 Epoch: 8 Global Step: 362510 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:27:46,663-Speed 2628.68 samples/sec Loss 7.5679 LearningRate 0.0317 Epoch: 8 Global Step: 362520 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:27:50,540-Speed 2641.53 samples/sec Loss 7.5690 LearningRate 0.0317 Epoch: 8 Global Step: 362530 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:27:54,437-Speed 2629.23 samples/sec Loss 7.4749 LearningRate 0.0317 Epoch: 8 Global Step: 362540 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:27:58,337-Speed 2625.90 samples/sec Loss 7.4766 LearningRate 0.0317 Epoch: 8 Global Step: 362550 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:28:02,255-Speed 2614.36 samples/sec Loss 7.6118 LearningRate 0.0317 Epoch: 8 Global Step: 362560 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:28:06,155-Speed 2626.57 samples/sec Loss 7.6079 LearningRate 0.0317 Epoch: 8 Global Step: 362570 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:28:10,053-Speed 2627.46 samples/sec Loss 7.5795 LearningRate 0.0317 Epoch: 8 Global Step: 362580 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:28:13,954-Speed 2625.58 samples/sec Loss 7.4668 LearningRate 0.0317 Epoch: 8 Global Step: 362590 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:28:17,854-Speed 2628.92 samples/sec Loss 7.4602 LearningRate 0.0317 Epoch: 8 Global Step: 362600 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:28:21,749-Speed 2629.44 samples/sec Loss 7.6033 LearningRate 0.0317 Epoch: 8 Global Step: 362610 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:28:25,644-Speed 2629.35 samples/sec Loss 7.5687 LearningRate 0.0317 Epoch: 8 Global Step: 362620 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:28:29,538-Speed 2630.74 samples/sec Loss 7.7696 LearningRate 0.0317 Epoch: 8 Global Step: 362630 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:28:33,420-Speed 2638.05 samples/sec Loss 7.5610 LearningRate 0.0317 Epoch: 8 Global Step: 362640 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:28:37,312-Speed 2632.04 samples/sec Loss 7.6043 LearningRate 0.0317 Epoch: 8 Global Step: 362650 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:28:41,213-Speed 2625.85 samples/sec Loss 7.5873 LearningRate 0.0317 Epoch: 8 Global Step: 362660 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:28:45,113-Speed 2626.33 samples/sec Loss 7.6517 LearningRate 0.0317 Epoch: 8 Global Step: 362670 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:28:49,008-Speed 2629.88 samples/sec Loss 7.5060 LearningRate 0.0317 Epoch: 8 Global Step: 362680 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:28:52,930-Speed 2612.32 samples/sec Loss 7.6717 LearningRate 0.0317 Epoch: 8 Global Step: 362690 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:28:56,833-Speed 2624.14 samples/sec Loss 7.5805 LearningRate 0.0317 Epoch: 8 Global Step: 362700 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:29:00,727-Speed 2630.21 samples/sec Loss 7.5732 LearningRate 0.0317 Epoch: 8 Global Step: 362710 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:29:04,623-Speed 2629.27 samples/sec Loss 7.6046 LearningRate 0.0317 Epoch: 8 Global Step: 362720 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:29:08,518-Speed 2629.62 samples/sec Loss 7.5464 LearningRate 0.0317 Epoch: 8 Global Step: 362730 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:29:12,426-Speed 2621.05 samples/sec Loss 7.6009 LearningRate 0.0317 Epoch: 8 Global Step: 362740 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:29:16,304-Speed 2641.40 samples/sec Loss 7.6571 LearningRate 0.0317 Epoch: 8 Global Step: 362750 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:29:20,109-Speed 2692.36 samples/sec Loss 8.5421 LearningRate 0.0317 Epoch: 8 Global Step: 362760 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:29:24,005-Speed 2628.50 samples/sec Loss 8.3322 LearningRate 0.0317 Epoch: 8 Global Step: 362770 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:29:27,892-Speed 2635.18 samples/sec Loss 7.6839 LearningRate 0.0317 Epoch: 8 Global Step: 362780 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:29:31,831-Speed 2600.60 samples/sec Loss 7.7051 LearningRate 0.0317 Epoch: 8 Global Step: 362790 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:29:35,787-Speed 2589.55 samples/sec Loss 7.5280 LearningRate 0.0317 Epoch: 8 Global Step: 362800 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:29:39,685-Speed 2627.29 samples/sec Loss 7.5408 LearningRate 0.0317 Epoch: 8 Global Step: 362810 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:29:43,584-Speed 2627.32 samples/sec Loss 7.4308 LearningRate 0.0317 Epoch: 8 Global Step: 362820 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:29:47,481-Speed 2628.43 samples/sec Loss 7.6738 LearningRate 0.0317 Epoch: 8 Global Step: 362830 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:29:51,370-Speed 2633.95 samples/sec Loss 7.6577 LearningRate 0.0317 Epoch: 8 Global Step: 362840 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:29:55,261-Speed 2632.14 samples/sec Loss 7.5468 LearningRate 0.0317 Epoch: 8 Global Step: 362850 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:29:59,159-Speed 2627.62 samples/sec Loss 7.7167 LearningRate 0.0317 Epoch: 8 Global Step: 362860 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:30:03,062-Speed 2624.67 samples/sec Loss 7.4960 LearningRate 0.0317 Epoch: 8 Global Step: 362870 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:30:06,957-Speed 2629.54 samples/sec Loss 7.5773 LearningRate 0.0316 Epoch: 8 Global Step: 362880 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:30:10,856-Speed 2633.93 samples/sec Loss 7.6887 LearningRate 0.0316 Epoch: 8 Global Step: 362890 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:30:14,749-Speed 2631.30 samples/sec Loss 7.5102 LearningRate 0.0316 Epoch: 8 Global Step: 362900 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:30:18,640-Speed 2632.63 samples/sec Loss 7.6063 LearningRate 0.0316 Epoch: 8 Global Step: 362910 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:30:22,538-Speed 2627.46 samples/sec Loss 7.5013 LearningRate 0.0316 Epoch: 8 Global Step: 362920 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:30:26,426-Speed 2634.78 samples/sec Loss 7.7403 LearningRate 0.0316 Epoch: 8 Global Step: 362930 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:30:30,340-Speed 2616.35 samples/sec Loss 7.6598 LearningRate 0.0316 Epoch: 8 Global Step: 362940 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:30:34,234-Speed 2630.40 samples/sec Loss 7.5326 LearningRate 0.0316 Epoch: 8 Global Step: 362950 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:30:38,125-Speed 2632.66 samples/sec Loss 7.5285 LearningRate 0.0316 Epoch: 8 Global Step: 362960 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:30:42,024-Speed 2627.09 samples/sec Loss 7.5949 LearningRate 0.0316 Epoch: 8 Global Step: 362970 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:30:45,916-Speed 2631.21 samples/sec Loss 7.5566 LearningRate 0.0316 Epoch: 8 Global Step: 362980 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:30:49,810-Speed 2630.75 samples/sec Loss 7.6611 LearningRate 0.0316 Epoch: 8 Global Step: 362990 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:30:53,705-Speed 2629.70 samples/sec Loss 7.6002 LearningRate 0.0316 Epoch: 8 Global Step: 363000 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:30:57,606-Speed 2625.68 samples/sec Loss 7.6883 LearningRate 0.0316 Epoch: 8 Global Step: 363010 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:31:01,498-Speed 2631.50 samples/sec Loss 7.5466 LearningRate 0.0316 Epoch: 8 Global Step: 363020 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:31:05,392-Speed 2630.35 samples/sec Loss 7.5466 LearningRate 0.0316 Epoch: 8 Global Step: 363030 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:31:09,292-Speed 2626.24 samples/sec Loss 7.6790 LearningRate 0.0316 Epoch: 8 Global Step: 363040 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:31:13,236-Speed 2597.30 samples/sec Loss 7.6006 LearningRate 0.0316 Epoch: 8 Global Step: 363050 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:31:17,143-Speed 2621.35 samples/sec Loss 7.5377 LearningRate 0.0316 Epoch: 8 Global Step: 363060 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:31:21,037-Speed 2630.42 samples/sec Loss 7.6914 LearningRate 0.0316 Epoch: 8 Global Step: 363070 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:31:24,930-Speed 2630.98 samples/sec Loss 7.6088 LearningRate 0.0316 Epoch: 8 Global Step: 363080 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:31:28,826-Speed 2629.25 samples/sec Loss 7.4560 LearningRate 0.0316 Epoch: 8 Global Step: 363090 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:31:32,694-Speed 2647.97 samples/sec Loss 8.7917 LearningRate 0.0316 Epoch: 8 Global Step: 363100 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:31:36,586-Speed 2632.05 samples/sec Loss 8.1441 LearningRate 0.0316 Epoch: 8 Global Step: 363110 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:31:40,488-Speed 2624.31 samples/sec Loss 8.1125 LearningRate 0.0316 Epoch: 8 Global Step: 363120 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:31:44,376-Speed 2634.35 samples/sec Loss 7.7996 LearningRate 0.0316 Epoch: 8 Global Step: 363130 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:31:48,272-Speed 2629.05 samples/sec Loss 7.6416 LearningRate 0.0316 Epoch: 8 Global Step: 363140 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:31:52,171-Speed 2626.95 samples/sec Loss 7.6245 LearningRate 0.0316 Epoch: 8 Global Step: 363150 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:31:56,073-Speed 2625.72 samples/sec Loss 7.6626 LearningRate 0.0316 Epoch: 8 Global Step: 363160 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:31:59,966-Speed 2630.40 samples/sec Loss 7.5179 LearningRate 0.0316 Epoch: 8 Global Step: 363170 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:32:03,859-Speed 2630.75 samples/sec Loss 7.5265 LearningRate 0.0316 Epoch: 8 Global Step: 363180 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:32:07,755-Speed 2629.19 samples/sec Loss 7.4938 LearningRate 0.0316 Epoch: 8 Global Step: 363190 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:32:11,650-Speed 2629.77 samples/sec Loss 7.6847 LearningRate 0.0316 Epoch: 8 Global Step: 363200 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:32:15,563-Speed 2617.69 samples/sec Loss 7.4749 LearningRate 0.0316 Epoch: 8 Global Step: 363210 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:32:19,489-Speed 2608.77 samples/sec Loss 7.5664 LearningRate 0.0316 Epoch: 8 Global Step: 363220 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:32:23,384-Speed 2629.81 samples/sec Loss 7.7187 LearningRate 0.0316 Epoch: 8 Global Step: 363230 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:32:27,281-Speed 2628.03 samples/sec Loss 7.5069 LearningRate 0.0316 Epoch: 8 Global Step: 363240 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:32:31,177-Speed 2629.50 samples/sec Loss 8.4893 LearningRate 0.0316 Epoch: 8 Global Step: 363250 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:32:35,068-Speed 2632.62 samples/sec Loss 7.8970 LearningRate 0.0316 Epoch: 8 Global Step: 363260 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:32:38,961-Speed 2630.34 samples/sec Loss 7.5665 LearningRate 0.0316 Epoch: 8 Global Step: 363270 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:32:42,858-Speed 2628.48 samples/sec Loss 7.6403 LearningRate 0.0316 Epoch: 8 Global Step: 363280 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:32:46,767-Speed 2620.12 samples/sec Loss 7.5128 LearningRate 0.0316 Epoch: 8 Global Step: 363290 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:32:50,661-Speed 2630.86 samples/sec Loss 7.5216 LearningRate 0.0316 Epoch: 8 Global Step: 363300 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:32:54,565-Speed 2623.96 samples/sec Loss 7.5121 LearningRate 0.0316 Epoch: 8 Global Step: 363310 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:32:58,450-Speed 2636.16 samples/sec Loss 7.5214 LearningRate 0.0316 Epoch: 8 Global Step: 363320 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:33:02,373-Speed 2610.62 samples/sec Loss 7.5101 LearningRate 0.0316 Epoch: 8 Global Step: 363330 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:33:06,258-Speed 2636.47 samples/sec Loss 7.6122 LearningRate 0.0316 Epoch: 8 Global Step: 363340 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:33:10,150-Speed 2631.66 samples/sec Loss 7.5600 LearningRate 0.0316 Epoch: 8 Global Step: 363350 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:33:14,058-Speed 2620.70 samples/sec Loss 7.5226 LearningRate 0.0316 Epoch: 8 Global Step: 363360 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:33:17,943-Speed 2636.94 samples/sec Loss 7.5624 LearningRate 0.0316 Epoch: 8 Global Step: 363370 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:33:21,838-Speed 2630.03 samples/sec Loss 7.6197 LearningRate 0.0316 Epoch: 8 Global Step: 363380 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:33:25,739-Speed 2625.36 samples/sec Loss 7.6009 LearningRate 0.0316 Epoch: 8 Global Step: 363390 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:33:29,627-Speed 2634.17 samples/sec Loss 7.5549 LearningRate 0.0316 Epoch: 8 Global Step: 363400 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:33:33,532-Speed 2623.47 samples/sec Loss 7.5400 LearningRate 0.0316 Epoch: 8 Global Step: 363410 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:33:37,421-Speed 2633.62 samples/sec Loss 7.6495 LearningRate 0.0316 Epoch: 8 Global Step: 363420 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:33:41,316-Speed 2629.53 samples/sec Loss 7.5901 LearningRate 0.0316 Epoch: 8 Global Step: 363430 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:33:45,228-Speed 2618.50 samples/sec Loss 7.5642 LearningRate 0.0316 Epoch: 8 Global Step: 363440 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:33:49,140-Speed 2618.26 samples/sec Loss 7.5586 LearningRate 0.0316 Epoch: 8 Global Step: 363450 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:33:53,037-Speed 2628.86 samples/sec Loss 7.6006 LearningRate 0.0316 Epoch: 8 Global Step: 363460 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:33:56,929-Speed 2631.49 samples/sec Loss 7.5581 LearningRate 0.0316 Epoch: 8 Global Step: 363470 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:34:00,834-Speed 2622.78 samples/sec Loss 7.6900 LearningRate 0.0316 Epoch: 8 Global Step: 363480 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:34:04,719-Speed 2636.52 samples/sec Loss 7.5239 LearningRate 0.0316 Epoch: 8 Global Step: 363490 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:34:08,621-Speed 2625.41 samples/sec Loss 7.4326 LearningRate 0.0316 Epoch: 8 Global Step: 363500 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:34:12,516-Speed 2629.20 samples/sec Loss 7.6287 LearningRate 0.0316 Epoch: 8 Global Step: 363510 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:34:16,399-Speed 2637.71 samples/sec Loss 7.7497 LearningRate 0.0316 Epoch: 8 Global Step: 363520 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:34:20,241-Speed 2665.83 samples/sec Loss 8.0541 LearningRate 0.0316 Epoch: 8 Global Step: 363530 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:34:24,136-Speed 2630.10 samples/sec Loss 7.7940 LearningRate 0.0316 Epoch: 8 Global Step: 363540 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:34:28,027-Speed 2633.36 samples/sec Loss 7.6522 LearningRate 0.0316 Epoch: 8 Global Step: 363550 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:34:31,918-Speed 2632.31 samples/sec Loss 7.5996 LearningRate 0.0316 Epoch: 8 Global Step: 363560 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:34:35,811-Speed 2631.17 samples/sec Loss 7.5314 LearningRate 0.0316 Epoch: 8 Global Step: 363570 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:34:39,705-Speed 2630.60 samples/sec Loss 7.5482 LearningRate 0.0316 Epoch: 8 Global Step: 363580 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:34:43,594-Speed 2633.50 samples/sec Loss 7.6091 LearningRate 0.0316 Epoch: 8 Global Step: 363590 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:34:47,484-Speed 2632.86 samples/sec Loss 7.5019 LearningRate 0.0316 Epoch: 8 Global Step: 363600 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:34:51,377-Speed 2631.54 samples/sec Loss 7.6446 LearningRate 0.0316 Epoch: 8 Global Step: 363610 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:34:55,265-Speed 2634.34 samples/sec Loss 7.4672 LearningRate 0.0315 Epoch: 8 Global Step: 363620 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:34:59,158-Speed 2630.96 samples/sec Loss 7.6636 LearningRate 0.0315 Epoch: 8 Global Step: 363630 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:35:03,046-Speed 2634.00 samples/sec Loss 7.5351 LearningRate 0.0315 Epoch: 8 Global Step: 363640 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:35:06,946-Speed 2626.22 samples/sec Loss 7.6799 LearningRate 0.0315 Epoch: 8 Global Step: 363650 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:35:10,847-Speed 2625.62 samples/sec Loss 7.5884 LearningRate 0.0315 Epoch: 8 Global Step: 363660 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:35:14,751-Speed 2623.76 samples/sec Loss 7.7447 LearningRate 0.0315 Epoch: 8 Global Step: 363670 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:35:18,643-Speed 2631.65 samples/sec Loss 7.6893 LearningRate 0.0315 Epoch: 8 Global Step: 363680 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:35:22,532-Speed 2633.18 samples/sec Loss 7.6179 LearningRate 0.0315 Epoch: 8 Global Step: 363690 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:35:26,428-Speed 2629.75 samples/sec Loss 7.5853 LearningRate 0.0315 Epoch: 8 Global Step: 363700 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:35:30,321-Speed 2631.21 samples/sec Loss 7.5761 LearningRate 0.0315 Epoch: 8 Global Step: 363710 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:35:34,225-Speed 2623.47 samples/sec Loss 7.6013 LearningRate 0.0315 Epoch: 8 Global Step: 363720 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:35:38,123-Speed 2627.36 samples/sec Loss 7.6044 LearningRate 0.0315 Epoch: 8 Global Step: 363730 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:35:42,023-Speed 2626.29 samples/sec Loss 7.5518 LearningRate 0.0315 Epoch: 8 Global Step: 363740 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:35:45,917-Speed 2630.68 samples/sec Loss 7.5421 LearningRate 0.0315 Epoch: 8 Global Step: 363750 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:35:49,810-Speed 2631.10 samples/sec Loss 7.5815 LearningRate 0.0315 Epoch: 8 Global Step: 363760 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:35:53,708-Speed 2627.71 samples/sec Loss 7.6844 LearningRate 0.0315 Epoch: 8 Global Step: 363770 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:35:57,602-Speed 2630.67 samples/sec Loss 7.6194 LearningRate 0.0315 Epoch: 8 Global Step: 363780 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:36:01,509-Speed 2621.36 samples/sec Loss 7.4828 LearningRate 0.0315 Epoch: 8 Global Step: 363790 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:36:05,408-Speed 2626.64 samples/sec Loss 7.6271 LearningRate 0.0315 Epoch: 8 Global Step: 363800 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:36:09,307-Speed 2626.77 samples/sec Loss 7.6302 LearningRate 0.0315 Epoch: 8 Global Step: 363810 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:36:13,194-Speed 2634.98 samples/sec Loss 7.6527 LearningRate 0.0315 Epoch: 8 Global Step: 363820 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:36:17,087-Speed 2631.57 samples/sec Loss 7.5600 LearningRate 0.0315 Epoch: 8 Global Step: 363830 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:36:20,986-Speed 2626.84 samples/sec Loss 7.5236 LearningRate 0.0315 Epoch: 8 Global Step: 363840 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:36:24,882-Speed 2629.11 samples/sec Loss 7.6016 LearningRate 0.0315 Epoch: 8 Global Step: 363850 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:36:28,774-Speed 2632.14 samples/sec Loss 7.6132 LearningRate 0.0315 Epoch: 8 Global Step: 363860 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:36:32,671-Speed 2627.94 samples/sec Loss 7.5966 LearningRate 0.0315 Epoch: 8 Global Step: 363870 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:36:36,565-Speed 2630.05 samples/sec Loss 7.5688 LearningRate 0.0315 Epoch: 8 Global Step: 363880 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:36:40,470-Speed 2622.98 samples/sec Loss 7.5607 LearningRate 0.0315 Epoch: 8 Global Step: 363890 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:36:44,361-Speed 2632.29 samples/sec Loss 7.5644 LearningRate 0.0315 Epoch: 8 Global Step: 363900 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:36:48,257-Speed 2629.03 samples/sec Loss 7.5363 LearningRate 0.0315 Epoch: 8 Global Step: 363910 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:36:52,149-Speed 2631.65 samples/sec Loss 7.7447 LearningRate 0.0315 Epoch: 8 Global Step: 363920 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:36:56,036-Speed 2635.05 samples/sec Loss 7.5651 LearningRate 0.0315 Epoch: 8 Global Step: 363930 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:36:59,905-Speed 2647.46 samples/sec Loss 7.9156 LearningRate 0.0315 Epoch: 8 Global Step: 363940 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:37:03,795-Speed 2632.63 samples/sec Loss 8.3529 LearningRate 0.0315 Epoch: 8 Global Step: 363950 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:37:07,685-Speed 2633.33 samples/sec Loss 7.7071 LearningRate 0.0315 Epoch: 8 Global Step: 363960 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:37:11,575-Speed 2632.93 samples/sec Loss 7.6699 LearningRate 0.0315 Epoch: 8 Global Step: 363970 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:37:15,472-Speed 2628.33 samples/sec Loss 7.4623 LearningRate 0.0315 Epoch: 8 Global Step: 363980 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:37:19,356-Speed 2637.04 samples/sec Loss 7.5443 LearningRate 0.0315 Epoch: 8 Global Step: 363990 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:37:23,251-Speed 2629.38 samples/sec Loss 7.5935 LearningRate 0.0315 Epoch: 8 Global Step: 364000 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:37:27,141-Speed 2633.53 samples/sec Loss 7.6407 LearningRate 0.0315 Epoch: 8 Global Step: 364010 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:37:31,031-Speed 2633.06 samples/sec Loss 7.5691 LearningRate 0.0315 Epoch: 8 Global Step: 364020 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:37:34,928-Speed 2628.10 samples/sec Loss 7.6766 LearningRate 0.0315 Epoch: 8 Global Step: 364030 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:37:38,820-Speed 2631.05 samples/sec Loss 7.5956 LearningRate 0.0315 Epoch: 8 Global Step: 364040 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:37:42,714-Speed 2630.71 samples/sec Loss 7.5471 LearningRate 0.0315 Epoch: 8 Global Step: 364050 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:37:46,616-Speed 2625.01 samples/sec Loss 7.6707 LearningRate 0.0315 Epoch: 8 Global Step: 364060 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:37:50,513-Speed 2628.44 samples/sec Loss 7.5378 LearningRate 0.0315 Epoch: 8 Global Step: 364070 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:37:54,427-Speed 2616.74 samples/sec Loss 7.5641 LearningRate 0.0315 Epoch: 8 Global Step: 364080 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:37:58,330-Speed 2623.76 samples/sec Loss 7.6798 LearningRate 0.0315 Epoch: 8 Global Step: 364090 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:38:02,387-Speed 2524.40 samples/sec Loss 7.6542 LearningRate 0.0315 Epoch: 8 Global Step: 364100 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:38:06,282-Speed 2630.38 samples/sec Loss 7.5917 LearningRate 0.0315 Epoch: 8 Global Step: 364110 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:38:10,176-Speed 2629.87 samples/sec Loss 7.7024 LearningRate 0.0315 Epoch: 8 Global Step: 364120 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:38:14,071-Speed 2629.68 samples/sec Loss 7.4664 LearningRate 0.0315 Epoch: 8 Global Step: 364130 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:38:17,969-Speed 2627.60 samples/sec Loss 7.4888 LearningRate 0.0315 Epoch: 8 Global Step: 364140 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:38:21,866-Speed 2628.38 samples/sec Loss 7.6173 LearningRate 0.0315 Epoch: 8 Global Step: 364150 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:38:25,776-Speed 2619.92 samples/sec Loss 7.6693 LearningRate 0.0315 Epoch: 8 Global Step: 364160 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:38:29,668-Speed 2631.32 samples/sec Loss 7.6058 LearningRate 0.0315 Epoch: 8 Global Step: 364170 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:38:33,562-Speed 2630.12 samples/sec Loss 7.5995 LearningRate 0.0315 Epoch: 8 Global Step: 364180 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:38:37,456-Speed 2630.08 samples/sec Loss 7.5211 LearningRate 0.0315 Epoch: 8 Global Step: 364190 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:38:41,352-Speed 2629.29 samples/sec Loss 7.4725 LearningRate 0.0315 Epoch: 8 Global Step: 364200 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:38:45,245-Speed 2630.81 samples/sec Loss 7.4886 LearningRate 0.0315 Epoch: 8 Global Step: 364210 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:38:49,140-Speed 2630.02 samples/sec Loss 7.5928 LearningRate 0.0315 Epoch: 8 Global Step: 364220 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:38:53,033-Speed 2630.69 samples/sec Loss 7.6091 LearningRate 0.0315 Epoch: 8 Global Step: 364230 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:38:56,929-Speed 2628.68 samples/sec Loss 7.5403 LearningRate 0.0315 Epoch: 8 Global Step: 364240 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:39:00,837-Speed 2621.38 samples/sec Loss 7.6329 LearningRate 0.0315 Epoch: 8 Global Step: 364250 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:39:04,726-Speed 2633.24 samples/sec Loss 7.6006 LearningRate 0.0315 Epoch: 8 Global Step: 364260 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:39:08,625-Speed 2627.15 samples/sec Loss 7.5019 LearningRate 0.0315 Epoch: 8 Global Step: 364270 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:39:12,523-Speed 2626.97 samples/sec Loss 7.3937 LearningRate 0.0315 Epoch: 8 Global Step: 364280 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:39:16,426-Speed 2624.52 samples/sec Loss 7.7210 LearningRate 0.0315 Epoch: 8 Global Step: 364290 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:39:20,328-Speed 2625.16 samples/sec Loss 7.5852 LearningRate 0.0315 Epoch: 8 Global Step: 364300 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:39:24,220-Speed 2631.51 samples/sec Loss 7.6095 LearningRate 0.0315 Epoch: 8 Global Step: 364310 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:39:28,113-Speed 2631.40 samples/sec Loss 7.6353 LearningRate 0.0315 Epoch: 8 Global Step: 364320 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:39:32,009-Speed 2628.54 samples/sec Loss 7.6265 LearningRate 0.0315 Epoch: 8 Global Step: 364330 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:39:35,887-Speed 2641.54 samples/sec Loss 7.5301 LearningRate 0.0315 Epoch: 8 Global Step: 364340 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:39:39,784-Speed 2627.72 samples/sec Loss 7.5230 LearningRate 0.0314 Epoch: 8 Global Step: 364350 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:39:43,688-Speed 2623.24 samples/sec Loss 7.6471 LearningRate 0.0314 Epoch: 8 Global Step: 364360 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:39:47,589-Speed 2625.96 samples/sec Loss 7.5649 LearningRate 0.0314 Epoch: 8 Global Step: 364370 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:39:51,487-Speed 2627.59 samples/sec Loss 7.5426 LearningRate 0.0314 Epoch: 8 Global Step: 364380 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:39:55,409-Speed 2611.87 samples/sec Loss 7.4566 LearningRate 0.0314 Epoch: 8 Global Step: 364390 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:39:59,309-Speed 2626.05 samples/sec Loss 7.6708 LearningRate 0.0314 Epoch: 8 Global Step: 364400 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:40:03,216-Speed 2621.80 samples/sec Loss 7.4747 LearningRate 0.0314 Epoch: 8 Global Step: 364410 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:40:07,125-Speed 2620.15 samples/sec Loss 7.5527 LearningRate 0.0314 Epoch: 8 Global Step: 364420 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:40:11,026-Speed 2625.36 samples/sec Loss 7.5494 LearningRate 0.0314 Epoch: 8 Global Step: 364430 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:40:14,919-Speed 2630.57 samples/sec Loss 7.4655 LearningRate 0.0314 Epoch: 8 Global Step: 364440 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:40:18,812-Speed 2631.23 samples/sec Loss 7.5466 LearningRate 0.0314 Epoch: 8 Global Step: 364450 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:40:22,706-Speed 2629.95 samples/sec Loss 7.6239 LearningRate 0.0314 Epoch: 8 Global Step: 364460 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:40:26,581-Speed 2642.97 samples/sec Loss 7.7455 LearningRate 0.0314 Epoch: 8 Global Step: 364470 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:40:30,477-Speed 2628.94 samples/sec Loss 7.5832 LearningRate 0.0314 Epoch: 8 Global Step: 364480 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:40:34,371-Speed 2630.63 samples/sec Loss 7.4699 LearningRate 0.0314 Epoch: 8 Global Step: 364490 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:40:38,263-Speed 2631.91 samples/sec Loss 7.5545 LearningRate 0.0314 Epoch: 8 Global Step: 364500 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:40:42,164-Speed 2625.73 samples/sec Loss 7.6300 LearningRate 0.0314 Epoch: 8 Global Step: 364510 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:40:46,063-Speed 2626.96 samples/sec Loss 7.6323 LearningRate 0.0314 Epoch: 8 Global Step: 364520 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:40:49,958-Speed 2629.02 samples/sec Loss 7.5544 LearningRate 0.0314 Epoch: 8 Global Step: 364530 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:40:53,854-Speed 2629.21 samples/sec Loss 7.5823 LearningRate 0.0314 Epoch: 8 Global Step: 364540 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:40:57,752-Speed 2627.80 samples/sec Loss 7.6089 LearningRate 0.0314 Epoch: 8 Global Step: 364550 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:41:01,653-Speed 2625.26 samples/sec Loss 7.5297 LearningRate 0.0314 Epoch: 8 Global Step: 364560 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:41:05,558-Speed 2622.85 samples/sec Loss 7.5416 LearningRate 0.0314 Epoch: 8 Global Step: 364570 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:41:09,453-Speed 2629.36 samples/sec Loss 7.7350 LearningRate 0.0314 Epoch: 8 Global Step: 364580 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:41:13,346-Speed 2631.57 samples/sec Loss 7.7096 LearningRate 0.0314 Epoch: 8 Global Step: 364590 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:41:17,260-Speed 2616.68 samples/sec Loss 7.4256 LearningRate 0.0314 Epoch: 8 Global Step: 364600 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:41:21,163-Speed 2624.68 samples/sec Loss 7.5749 LearningRate 0.0314 Epoch: 8 Global Step: 364610 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:41:25,067-Speed 2623.33 samples/sec Loss 7.6492 LearningRate 0.0314 Epoch: 8 Global Step: 364620 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:41:28,972-Speed 2623.36 samples/sec Loss 7.5326 LearningRate 0.0314 Epoch: 8 Global Step: 364630 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:41:32,877-Speed 2622.24 samples/sec Loss 7.6460 LearningRate 0.0314 Epoch: 8 Global Step: 364640 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:41:36,775-Speed 2627.39 samples/sec Loss 7.6482 LearningRate 0.0314 Epoch: 8 Global Step: 364650 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:41:40,676-Speed 2625.82 samples/sec Loss 7.5697 LearningRate 0.0314 Epoch: 8 Global Step: 364660 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:41:44,585-Speed 2620.56 samples/sec Loss 7.5493 LearningRate 0.0314 Epoch: 8 Global Step: 364670 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:41:48,478-Speed 2630.66 samples/sec Loss 7.4491 LearningRate 0.0314 Epoch: 8 Global Step: 364680 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:41:52,382-Speed 2624.00 samples/sec Loss 7.5186 LearningRate 0.0314 Epoch: 8 Global Step: 364690 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:41:56,285-Speed 2624.15 samples/sec Loss 7.5120 LearningRate 0.0314 Epoch: 8 Global Step: 364700 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:42:00,135-Speed 2659.99 samples/sec Loss 7.5735 LearningRate 0.0314 Epoch: 8 Global Step: 364710 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:42:04,035-Speed 2626.07 samples/sec Loss 7.6642 LearningRate 0.0314 Epoch: 8 Global Step: 364720 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:42:07,934-Speed 2627.32 samples/sec Loss 7.6045 LearningRate 0.0314 Epoch: 8 Global Step: 364730 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:42:11,823-Speed 2633.64 samples/sec Loss 7.6052 LearningRate 0.0314 Epoch: 8 Global Step: 364740 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:42:15,734-Speed 2618.42 samples/sec Loss 8.5925 LearningRate 0.0314 Epoch: 8 Global Step: 364750 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:42:19,649-Speed 2616.47 samples/sec Loss 8.0332 LearningRate 0.0314 Epoch: 8 Global Step: 364760 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:42:23,554-Speed 2622.36 samples/sec Loss 7.8472 LearningRate 0.0314 Epoch: 8 Global Step: 364770 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:42:27,451-Speed 2628.62 samples/sec Loss 7.9295 LearningRate 0.0314 Epoch: 8 Global Step: 364780 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:42:31,342-Speed 2632.44 samples/sec Loss 7.5687 LearningRate 0.0314 Epoch: 8 Global Step: 364790 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:42:35,253-Speed 2619.24 samples/sec Loss 7.6629 LearningRate 0.0314 Epoch: 8 Global Step: 364800 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:42:39,141-Speed 2634.09 samples/sec Loss 7.5570 LearningRate 0.0314 Epoch: 8 Global Step: 364810 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:42:43,039-Speed 2627.39 samples/sec Loss 7.6401 LearningRate 0.0314 Epoch: 8 Global Step: 364820 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:42:46,929-Speed 2632.94 samples/sec Loss 7.6121 LearningRate 0.0314 Epoch: 8 Global Step: 364830 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:42:50,822-Speed 2630.88 samples/sec Loss 7.6820 LearningRate 0.0314 Epoch: 8 Global Step: 364840 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:42:54,713-Speed 2632.52 samples/sec Loss 7.5710 LearningRate 0.0314 Epoch: 8 Global Step: 364850 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:42:58,654-Speed 2598.63 samples/sec Loss 7.6142 LearningRate 0.0314 Epoch: 8 Global Step: 364860 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:43:02,565-Speed 2619.00 samples/sec Loss 7.6604 LearningRate 0.0314 Epoch: 8 Global Step: 364870 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:43:06,465-Speed 2625.95 samples/sec Loss 7.5793 LearningRate 0.0314 Epoch: 8 Global Step: 364880 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:43:10,359-Speed 2630.42 samples/sec Loss 7.5681 LearningRate 0.0314 Epoch: 8 Global Step: 364890 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:43:14,250-Speed 2632.51 samples/sec Loss 7.5149 LearningRate 0.0314 Epoch: 8 Global Step: 364900 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:43:18,146-Speed 2629.25 samples/sec Loss 7.5525 LearningRate 0.0314 Epoch: 8 Global Step: 364910 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:43:22,041-Speed 2629.48 samples/sec Loss 7.5154 LearningRate 0.0314 Epoch: 8 Global Step: 364920 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:43:25,957-Speed 2615.41 samples/sec Loss 7.5880 LearningRate 0.0314 Epoch: 8 Global Step: 364930 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:43:29,851-Speed 2630.32 samples/sec Loss 7.5300 LearningRate 0.0314 Epoch: 8 Global Step: 364940 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:43:33,744-Speed 2630.62 samples/sec Loss 7.5416 LearningRate 0.0314 Epoch: 8 Global Step: 364950 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:43:37,641-Speed 2628.45 samples/sec Loss 7.5581 LearningRate 0.0314 Epoch: 8 Global Step: 364960 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:43:41,535-Speed 2629.75 samples/sec Loss 7.6480 LearningRate 0.0314 Epoch: 8 Global Step: 364970 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:43:45,426-Speed 2632.42 samples/sec Loss 7.4815 LearningRate 0.0314 Epoch: 8 Global Step: 364980 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:43:49,335-Speed 2620.64 samples/sec Loss 7.4437 LearningRate 0.0314 Epoch: 8 Global Step: 364990 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:43:53,227-Speed 2632.24 samples/sec Loss 7.5324 LearningRate 0.0314 Epoch: 8 Global Step: 365000 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:43:57,098-Speed 2645.69 samples/sec Loss 7.9834 LearningRate 0.0314 Epoch: 8 Global Step: 365010 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:44:00,992-Speed 2629.94 samples/sec Loss 7.9409 LearningRate 0.0314 Epoch: 8 Global Step: 365020 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:44:04,879-Speed 2635.13 samples/sec Loss 7.6072 LearningRate 0.0314 Epoch: 8 Global Step: 365030 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:44:08,771-Speed 2631.24 samples/sec Loss 7.4733 LearningRate 0.0314 Epoch: 8 Global Step: 365040 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:44:12,665-Speed 2630.51 samples/sec Loss 7.6131 LearningRate 0.0314 Epoch: 8 Global Step: 365050 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:44:16,561-Speed 2628.80 samples/sec Loss 7.8514 LearningRate 0.0314 Epoch: 8 Global Step: 365060 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:44:20,451-Speed 2633.30 samples/sec Loss 7.5977 LearningRate 0.0314 Epoch: 8 Global Step: 365070 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:44:24,349-Speed 2627.68 samples/sec Loss 7.4287 LearningRate 0.0314 Epoch: 8 Global Step: 365080 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:44:28,240-Speed 2632.57 samples/sec Loss 7.4832 LearningRate 0.0313 Epoch: 8 Global Step: 365090 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:44:32,134-Speed 2630.32 samples/sec Loss 7.4968 LearningRate 0.0313 Epoch: 8 Global Step: 365100 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:44:36,033-Speed 2627.23 samples/sec Loss 7.5064 LearningRate 0.0313 Epoch: 8 Global Step: 365110 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:44:39,926-Speed 2630.74 samples/sec Loss 7.5684 LearningRate 0.0313 Epoch: 8 Global Step: 365120 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:44:43,819-Speed 2630.56 samples/sec Loss 7.6254 LearningRate 0.0313 Epoch: 8 Global Step: 365130 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:44:47,710-Speed 2632.23 samples/sec Loss 7.6328 LearningRate 0.0313 Epoch: 8 Global Step: 365140 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:44:51,601-Speed 2632.30 samples/sec Loss 7.6107 LearningRate 0.0313 Epoch: 8 Global Step: 365150 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:44:55,499-Speed 2627.75 samples/sec Loss 7.5972 LearningRate 0.0313 Epoch: 8 Global Step: 365160 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:44:59,400-Speed 2625.33 samples/sec Loss 7.5754 LearningRate 0.0313 Epoch: 8 Global Step: 365170 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:45:03,291-Speed 2632.63 samples/sec Loss 7.6525 LearningRate 0.0313 Epoch: 8 Global Step: 365180 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:45:07,187-Speed 2629.10 samples/sec Loss 7.5700 LearningRate 0.0313 Epoch: 8 Global Step: 365190 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:45:11,079-Speed 2631.22 samples/sec Loss 7.5349 LearningRate 0.0313 Epoch: 8 Global Step: 365200 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:45:14,983-Speed 2623.51 samples/sec Loss 7.5774 LearningRate 0.0313 Epoch: 8 Global Step: 365210 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:45:18,881-Speed 2627.55 samples/sec Loss 7.6623 LearningRate 0.0313 Epoch: 8 Global Step: 365220 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:45:22,785-Speed 2623.47 samples/sec Loss 7.5630 LearningRate 0.0313 Epoch: 8 Global Step: 365230 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:45:26,679-Speed 2630.47 samples/sec Loss 7.5300 LearningRate 0.0313 Epoch: 8 Global Step: 365240 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:45:30,570-Speed 2632.21 samples/sec Loss 7.6114 LearningRate 0.0313 Epoch: 8 Global Step: 365250 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:45:34,461-Speed 2632.20 samples/sec Loss 7.4788 LearningRate 0.0313 Epoch: 8 Global Step: 365260 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:45:38,369-Speed 2621.24 samples/sec Loss 7.5550 LearningRate 0.0313 Epoch: 8 Global Step: 365270 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:45:42,267-Speed 2627.85 samples/sec Loss 7.5693 LearningRate 0.0313 Epoch: 8 Global Step: 365280 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:45:46,161-Speed 2629.86 samples/sec Loss 7.5864 LearningRate 0.0313 Epoch: 8 Global Step: 365290 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:45:50,053-Speed 2632.22 samples/sec Loss 7.4533 LearningRate 0.0313 Epoch: 8 Global Step: 365300 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:45:53,946-Speed 2630.64 samples/sec Loss 7.5567 LearningRate 0.0313 Epoch: 8 Global Step: 365310 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:45:57,841-Speed 2629.39 samples/sec Loss 7.5728 LearningRate 0.0313 Epoch: 8 Global Step: 365320 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:46:01,738-Speed 2628.70 samples/sec Loss 7.5574 LearningRate 0.0313 Epoch: 8 Global Step: 365330 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:46:05,641-Speed 2623.73 samples/sec Loss 7.5988 LearningRate 0.0313 Epoch: 8 Global Step: 365340 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:46:09,535-Speed 2630.20 samples/sec Loss 7.6604 LearningRate 0.0313 Epoch: 8 Global Step: 365350 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:46:13,430-Speed 2629.57 samples/sec Loss 7.5531 LearningRate 0.0313 Epoch: 8 Global Step: 365360 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:46:17,324-Speed 2630.57 samples/sec Loss 7.4258 LearningRate 0.0313 Epoch: 8 Global Step: 365370 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:46:21,226-Speed 2625.01 samples/sec Loss 7.5228 LearningRate 0.0313 Epoch: 8 Global Step: 365380 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:46:25,117-Speed 2632.37 samples/sec Loss 7.5423 LearningRate 0.0313 Epoch: 8 Global Step: 365390 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:46:29,013-Speed 2629.35 samples/sec Loss 7.6250 LearningRate 0.0313 Epoch: 8 Global Step: 365400 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:46:32,905-Speed 2631.61 samples/sec Loss 7.4882 LearningRate 0.0313 Epoch: 8 Global Step: 365410 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:46:36,797-Speed 2631.23 samples/sec Loss 7.4423 LearningRate 0.0313 Epoch: 8 Global Step: 365420 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:46:40,697-Speed 2626.32 samples/sec Loss 7.5314 LearningRate 0.0313 Epoch: 8 Global Step: 365430 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:46:44,614-Speed 2615.41 samples/sec Loss 7.5946 LearningRate 0.0313 Epoch: 8 Global Step: 365440 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:46:48,507-Speed 2630.29 samples/sec Loss 7.6296 LearningRate 0.0313 Epoch: 8 Global Step: 365450 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:46:52,405-Speed 2627.71 samples/sec Loss 7.5172 LearningRate 0.0313 Epoch: 8 Global Step: 365460 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:46:56,282-Speed 2641.64 samples/sec Loss 7.4477 LearningRate 0.0313 Epoch: 8 Global Step: 365470 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:47:00,198-Speed 2615.99 samples/sec Loss 7.5830 LearningRate 0.0313 Epoch: 8 Global Step: 365480 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:47:04,092-Speed 2630.06 samples/sec Loss 7.6843 LearningRate 0.0313 Epoch: 8 Global Step: 365490 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:47:07,986-Speed 2630.71 samples/sec Loss 7.5952 LearningRate 0.0313 Epoch: 8 Global Step: 365500 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:47:11,880-Speed 2630.16 samples/sec Loss 7.6831 LearningRate 0.0313 Epoch: 8 Global Step: 365510 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:47:15,774-Speed 2630.39 samples/sec Loss 7.5465 LearningRate 0.0313 Epoch: 8 Global Step: 365520 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:47:19,699-Speed 2610.10 samples/sec Loss 7.4826 LearningRate 0.0313 Epoch: 8 Global Step: 365530 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:47:23,601-Speed 2624.65 samples/sec Loss 7.5583 LearningRate 0.0313 Epoch: 8 Global Step: 365540 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:47:27,492-Speed 2632.26 samples/sec Loss 7.4935 LearningRate 0.0313 Epoch: 8 Global Step: 365550 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:47:31,384-Speed 2632.00 samples/sec Loss 7.6590 LearningRate 0.0313 Epoch: 8 Global Step: 365560 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:47:35,274-Speed 2632.61 samples/sec Loss 7.5304 LearningRate 0.0313 Epoch: 8 Global Step: 365570 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:47:39,208-Speed 2603.71 samples/sec Loss 7.5632 LearningRate 0.0313 Epoch: 8 Global Step: 365580 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:47:43,109-Speed 2625.79 samples/sec Loss 7.6976 LearningRate 0.0313 Epoch: 8 Global Step: 365590 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:47:46,998-Speed 2633.33 samples/sec Loss 7.3817 LearningRate 0.0313 Epoch: 8 Global Step: 365600 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:47:50,890-Speed 2631.82 samples/sec Loss 7.5879 LearningRate 0.0313 Epoch: 8 Global Step: 365610 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:47:54,782-Speed 2631.67 samples/sec Loss 7.6014 LearningRate 0.0313 Epoch: 8 Global Step: 365620 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:47:58,673-Speed 2632.02 samples/sec Loss 7.5598 LearningRate 0.0313 Epoch: 8 Global Step: 365630 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:48:02,578-Speed 2622.97 samples/sec Loss 7.5803 LearningRate 0.0313 Epoch: 8 Global Step: 365640 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:48:06,471-Speed 2631.01 samples/sec Loss 7.4091 LearningRate 0.0313 Epoch: 8 Global Step: 365650 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:48:10,372-Speed 2625.24 samples/sec Loss 7.5735 LearningRate 0.0313 Epoch: 8 Global Step: 365660 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:48:14,252-Speed 2640.24 samples/sec Loss 7.6037 LearningRate 0.0313 Epoch: 8 Global Step: 365670 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:48:18,139-Speed 2635.19 samples/sec Loss 7.5424 LearningRate 0.0313 Epoch: 8 Global Step: 365680 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:48:22,043-Speed 2623.70 samples/sec Loss 7.5257 LearningRate 0.0313 Epoch: 8 Global Step: 365690 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:48:25,929-Speed 2635.50 samples/sec Loss 7.7507 LearningRate 0.0313 Epoch: 8 Global Step: 365700 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:48:29,837-Speed 2620.70 samples/sec Loss 7.4545 LearningRate 0.0313 Epoch: 8 Global Step: 365710 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:48:33,741-Speed 2623.84 samples/sec Loss 7.4576 LearningRate 0.0313 Epoch: 8 Global Step: 365720 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:48:37,636-Speed 2628.98 samples/sec Loss 7.5782 LearningRate 0.0313 Epoch: 8 Global Step: 365730 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:48:41,532-Speed 2629.16 samples/sec Loss 7.4344 LearningRate 0.0313 Epoch: 8 Global Step: 365740 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:48:45,438-Speed 2622.07 samples/sec Loss 7.4808 LearningRate 0.0313 Epoch: 8 Global Step: 365750 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:48:49,322-Speed 2636.98 samples/sec Loss 7.5998 LearningRate 0.0313 Epoch: 8 Global Step: 365760 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:48:53,213-Speed 2632.54 samples/sec Loss 7.5756 LearningRate 0.0313 Epoch: 8 Global Step: 365770 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:48:57,108-Speed 2630.57 samples/sec Loss 7.6375 LearningRate 0.0313 Epoch: 8 Global Step: 365780 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:49:01,006-Speed 2627.56 samples/sec Loss 7.5259 LearningRate 0.0313 Epoch: 8 Global Step: 365790 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:49:04,905-Speed 2626.84 samples/sec Loss 7.5002 LearningRate 0.0313 Epoch: 8 Global Step: 365800 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:49:08,795-Speed 2632.79 samples/sec Loss 7.4911 LearningRate 0.0313 Epoch: 8 Global Step: 365810 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:49:12,690-Speed 2629.87 samples/sec Loss 7.6668 LearningRate 0.0313 Epoch: 8 Global Step: 365820 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:49:16,588-Speed 2627.16 samples/sec Loss 7.5413 LearningRate 0.0313 Epoch: 8 Global Step: 365830 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:49:20,504-Speed 2616.06 samples/sec Loss 7.5296 LearningRate 0.0312 Epoch: 8 Global Step: 365840 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:49:24,404-Speed 2626.43 samples/sec Loss 7.6808 LearningRate 0.0312 Epoch: 8 Global Step: 365850 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:49:28,311-Speed 2621.69 samples/sec Loss 7.4991 LearningRate 0.0312 Epoch: 8 Global Step: 365860 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:49:32,219-Speed 2620.67 samples/sec Loss 7.4928 LearningRate 0.0312 Epoch: 8 Global Step: 365870 Fp16 Grad Scale: 524288 Required: 52 hours
Training: 2022-04-14 12:49:36,083-Speed 2650.76 samples/sec Loss 7.5219 LearningRate 0.0312 Epoch: 8 Global Step: 365880 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:49:39,987-Speed 2623.30 samples/sec Loss 7.6098 LearningRate 0.0312 Epoch: 8 Global Step: 365890 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:49:43,885-Speed 2628.62 samples/sec Loss 7.5215 LearningRate 0.0312 Epoch: 8 Global Step: 365900 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:49:47,779-Speed 2630.09 samples/sec Loss 7.4970 LearningRate 0.0312 Epoch: 8 Global Step: 365910 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:49:51,673-Speed 2630.15 samples/sec Loss 7.5158 LearningRate 0.0312 Epoch: 8 Global Step: 365920 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:49:55,568-Speed 2629.60 samples/sec Loss 7.4480 LearningRate 0.0312 Epoch: 8 Global Step: 365930 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:49:59,470-Speed 2625.28 samples/sec Loss 7.5751 LearningRate 0.0312 Epoch: 8 Global Step: 365940 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:50:03,375-Speed 2623.02 samples/sec Loss 7.4616 LearningRate 0.0312 Epoch: 8 Global Step: 365950 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:50:07,284-Speed 2620.00 samples/sec Loss 7.4345 LearningRate 0.0312 Epoch: 8 Global Step: 365960 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:50:11,182-Speed 2627.66 samples/sec Loss 7.4378 LearningRate 0.0312 Epoch: 8 Global Step: 365970 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:50:15,082-Speed 2626.94 samples/sec Loss 7.4917 LearningRate 0.0312 Epoch: 8 Global Step: 365980 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:50:18,988-Speed 2621.75 samples/sec Loss 7.5387 LearningRate 0.0312 Epoch: 8 Global Step: 365990 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:50:22,884-Speed 2628.53 samples/sec Loss 7.6305 LearningRate 0.0312 Epoch: 8 Global Step: 366000 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:50:26,802-Speed 2614.82 samples/sec Loss 7.4802 LearningRate 0.0312 Epoch: 8 Global Step: 366010 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:50:30,728-Speed 2609.13 samples/sec Loss 7.5360 LearningRate 0.0312 Epoch: 8 Global Step: 366020 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:50:34,625-Speed 2628.23 samples/sec Loss 7.5221 LearningRate 0.0312 Epoch: 8 Global Step: 366030 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:50:38,508-Speed 2638.31 samples/sec Loss 7.4645 LearningRate 0.0312 Epoch: 8 Global Step: 366040 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:50:42,405-Speed 2628.05 samples/sec Loss 7.5204 LearningRate 0.0312 Epoch: 8 Global Step: 366050 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:50:46,337-Speed 2604.57 samples/sec Loss 7.6725 LearningRate 0.0312 Epoch: 8 Global Step: 366060 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:50:50,302-Speed 2584.14 samples/sec Loss 7.5065 LearningRate 0.0312 Epoch: 8 Global Step: 366070 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:50:54,200-Speed 2627.30 samples/sec Loss 7.5083 LearningRate 0.0312 Epoch: 8 Global Step: 366080 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:50:58,097-Speed 2628.95 samples/sec Loss 7.5213 LearningRate 0.0312 Epoch: 8 Global Step: 366090 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:51:01,992-Speed 2629.56 samples/sec Loss 7.5156 LearningRate 0.0312 Epoch: 8 Global Step: 366100 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:51:05,894-Speed 2624.67 samples/sec Loss 7.6418 LearningRate 0.0312 Epoch: 8 Global Step: 366110 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:51:09,803-Speed 2620.59 samples/sec Loss 7.4161 LearningRate 0.0312 Epoch: 8 Global Step: 366120 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:51:13,700-Speed 2628.34 samples/sec Loss 7.5939 LearningRate 0.0312 Epoch: 8 Global Step: 366130 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:51:17,644-Speed 2597.54 samples/sec Loss 7.4234 LearningRate 0.0312 Epoch: 8 Global Step: 366140 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:51:21,545-Speed 2625.32 samples/sec Loss 7.5455 LearningRate 0.0312 Epoch: 8 Global Step: 366150 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:51:25,441-Speed 2629.24 samples/sec Loss 7.5666 LearningRate 0.0312 Epoch: 8 Global Step: 366160 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:51:29,340-Speed 2626.81 samples/sec Loss 7.4489 LearningRate 0.0312 Epoch: 8 Global Step: 366170 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:51:33,242-Speed 2625.53 samples/sec Loss 7.5175 LearningRate 0.0312 Epoch: 8 Global Step: 366180 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:51:37,178-Speed 2602.27 samples/sec Loss 7.5630 LearningRate 0.0312 Epoch: 8 Global Step: 366190 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:51:41,065-Speed 2635.15 samples/sec Loss 7.5909 LearningRate 0.0312 Epoch: 8 Global Step: 366200 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:51:44,961-Speed 2628.85 samples/sec Loss 7.5627 LearningRate 0.0312 Epoch: 8 Global Step: 366210 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:51:48,883-Speed 2612.05 samples/sec Loss 7.6283 LearningRate 0.0312 Epoch: 8 Global Step: 366220 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:51:52,778-Speed 2629.23 samples/sec Loss 7.5676 LearningRate 0.0312 Epoch: 8 Global Step: 366230 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:51:56,700-Speed 2611.76 samples/sec Loss 7.5643 LearningRate 0.0312 Epoch: 8 Global Step: 366240 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:52:00,593-Speed 2631.21 samples/sec Loss 7.5636 LearningRate 0.0312 Epoch: 8 Global Step: 366250 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:52:04,490-Speed 2628.94 samples/sec Loss 7.5902 LearningRate 0.0312 Epoch: 8 Global Step: 366260 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:52:08,401-Speed 2618.35 samples/sec Loss 7.4726 LearningRate 0.0312 Epoch: 8 Global Step: 366270 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:52:12,309-Speed 2620.98 samples/sec Loss 7.4687 LearningRate 0.0312 Epoch: 8 Global Step: 366280 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:52:16,211-Speed 2624.87 samples/sec Loss 7.4069 LearningRate 0.0312 Epoch: 8 Global Step: 366290 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 12:52:20,086-Speed 2643.56 samples/sec Loss 7.3602 LearningRate 0.0312 Epoch: 8 Global Step: 366300 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:52:23,989-Speed 2624.73 samples/sec Loss 7.4703 LearningRate 0.0312 Epoch: 8 Global Step: 366310 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:52:27,887-Speed 2627.46 samples/sec Loss 7.5137 LearningRate 0.0312 Epoch: 8 Global Step: 366320 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:52:31,776-Speed 2633.37 samples/sec Loss 7.6301 LearningRate 0.0312 Epoch: 8 Global Step: 366330 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:52:35,649-Speed 2644.51 samples/sec Loss 7.4416 LearningRate 0.0312 Epoch: 8 Global Step: 366340 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:52:39,549-Speed 2626.96 samples/sec Loss 7.5543 LearningRate 0.0312 Epoch: 8 Global Step: 366350 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:52:43,447-Speed 2627.73 samples/sec Loss 7.5310 LearningRate 0.0312 Epoch: 8 Global Step: 366360 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:52:47,344-Speed 2628.69 samples/sec Loss 7.6259 LearningRate 0.0312 Epoch: 8 Global Step: 366370 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:52:51,245-Speed 2624.83 samples/sec Loss 7.5861 LearningRate 0.0312 Epoch: 8 Global Step: 366380 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:52:55,146-Speed 2626.30 samples/sec Loss 7.5325 LearningRate 0.0312 Epoch: 8 Global Step: 366390 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:52:59,057-Speed 2618.65 samples/sec Loss 7.6032 LearningRate 0.0312 Epoch: 8 Global Step: 366400 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:53:02,971-Speed 2616.91 samples/sec Loss 7.5907 LearningRate 0.0312 Epoch: 8 Global Step: 366410 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:53:06,871-Speed 2626.07 samples/sec Loss 7.4614 LearningRate 0.0312 Epoch: 8 Global Step: 366420 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:53:10,850-Speed 2574.78 samples/sec Loss 7.6028 LearningRate 0.0312 Epoch: 8 Global Step: 366430 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:53:14,761-Speed 2618.87 samples/sec Loss 7.6634 LearningRate 0.0312 Epoch: 8 Global Step: 366440 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:53:18,696-Speed 2602.75 samples/sec Loss 7.5267 LearningRate 0.0312 Epoch: 8 Global Step: 366450 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:53:22,624-Speed 2607.86 samples/sec Loss 7.5751 LearningRate 0.0312 Epoch: 8 Global Step: 366460 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:53:26,519-Speed 2629.30 samples/sec Loss 7.5310 LearningRate 0.0312 Epoch: 8 Global Step: 366470 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:53:30,415-Speed 2629.04 samples/sec Loss 7.4937 LearningRate 0.0312 Epoch: 8 Global Step: 366480 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 12:53:34,296-Speed 2639.32 samples/sec Loss 7.4164 LearningRate 0.0312 Epoch: 8 Global Step: 366490 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:53:38,209-Speed 2617.04 samples/sec Loss 7.5250 LearningRate 0.0312 Epoch: 8 Global Step: 366500 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 12:53:42,090-Speed 2639.07 samples/sec Loss 7.4877 LearningRate 0.0312 Epoch: 8 Global Step: 366510 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:53:45,993-Speed 2624.90 samples/sec Loss 7.4077 LearningRate 0.0312 Epoch: 8 Global Step: 366520 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:53:49,896-Speed 2624.42 samples/sec Loss 7.6274 LearningRate 0.0312 Epoch: 8 Global Step: 366530 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:53:53,805-Speed 2620.21 samples/sec Loss 7.5982 LearningRate 0.0312 Epoch: 8 Global Step: 366540 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:53:57,705-Speed 2626.07 samples/sec Loss 7.5370 LearningRate 0.0312 Epoch: 8 Global Step: 366550 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 12:54:01,543-Speed 2668.79 samples/sec Loss 8.3447 LearningRate 0.0312 Epoch: 8 Global Step: 366560 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:54:05,418-Speed 2642.82 samples/sec Loss 8.9709 LearningRate 0.0312 Epoch: 8 Global Step: 366570 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:54:09,331-Speed 2617.83 samples/sec Loss 7.9889 LearningRate 0.0311 Epoch: 8 Global Step: 366580 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:54:13,226-Speed 2629.12 samples/sec Loss 7.6809 LearningRate 0.0311 Epoch: 8 Global Step: 366590 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:54:17,129-Speed 2624.88 samples/sec Loss 7.5438 LearningRate 0.0311 Epoch: 8 Global Step: 366600 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:54:21,025-Speed 2629.15 samples/sec Loss 7.5564 LearningRate 0.0311 Epoch: 8 Global Step: 366610 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:54:24,920-Speed 2629.60 samples/sec Loss 7.4387 LearningRate 0.0311 Epoch: 8 Global Step: 366620 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:54:28,808-Speed 2634.34 samples/sec Loss 7.4352 LearningRate 0.0311 Epoch: 8 Global Step: 366630 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:54:32,700-Speed 2631.61 samples/sec Loss 7.4509 LearningRate 0.0311 Epoch: 8 Global Step: 366640 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:54:36,591-Speed 2632.12 samples/sec Loss 7.4819 LearningRate 0.0311 Epoch: 8 Global Step: 366650 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:54:40,493-Speed 2624.60 samples/sec Loss 7.4940 LearningRate 0.0311 Epoch: 8 Global Step: 366660 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:54:44,408-Speed 2616.44 samples/sec Loss 7.5613 LearningRate 0.0311 Epoch: 8 Global Step: 366670 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:54:48,311-Speed 2624.66 samples/sec Loss 7.5114 LearningRate 0.0311 Epoch: 8 Global Step: 366680 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:54:52,208-Speed 2628.82 samples/sec Loss 7.5424 LearningRate 0.0311 Epoch: 8 Global Step: 366690 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:54:56,120-Speed 2618.06 samples/sec Loss 7.5285 LearningRate 0.0311 Epoch: 8 Global Step: 366700 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:55:00,011-Speed 2632.80 samples/sec Loss 7.5099 LearningRate 0.0311 Epoch: 8 Global Step: 366710 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:55:03,901-Speed 2632.82 samples/sec Loss 7.5616 LearningRate 0.0311 Epoch: 8 Global Step: 366720 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:55:07,794-Speed 2631.48 samples/sec Loss 7.4995 LearningRate 0.0311 Epoch: 8 Global Step: 366730 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:55:11,689-Speed 2629.69 samples/sec Loss 7.3928 LearningRate 0.0311 Epoch: 8 Global Step: 366740 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:55:15,578-Speed 2633.64 samples/sec Loss 7.4926 LearningRate 0.0311 Epoch: 8 Global Step: 366750 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:55:19,470-Speed 2631.52 samples/sec Loss 7.4627 LearningRate 0.0311 Epoch: 8 Global Step: 366760 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:55:23,389-Speed 2614.15 samples/sec Loss 7.5328 LearningRate 0.0311 Epoch: 8 Global Step: 366770 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:55:27,279-Speed 2633.05 samples/sec Loss 7.5276 LearningRate 0.0311 Epoch: 8 Global Step: 366780 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:55:31,171-Speed 2631.85 samples/sec Loss 7.5427 LearningRate 0.0311 Epoch: 8 Global Step: 366790 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:55:35,067-Speed 2628.39 samples/sec Loss 7.4517 LearningRate 0.0311 Epoch: 8 Global Step: 366800 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:55:38,964-Speed 2628.58 samples/sec Loss 7.5654 LearningRate 0.0311 Epoch: 8 Global Step: 366810 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:55:42,868-Speed 2623.99 samples/sec Loss 7.5028 LearningRate 0.0311 Epoch: 8 Global Step: 366820 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:55:46,768-Speed 2625.56 samples/sec Loss 7.4903 LearningRate 0.0311 Epoch: 8 Global Step: 366830 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:55:50,658-Speed 2633.28 samples/sec Loss 7.5534 LearningRate 0.0311 Epoch: 8 Global Step: 366840 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:55:54,549-Speed 2632.05 samples/sec Loss 7.5862 LearningRate 0.0311 Epoch: 8 Global Step: 366850 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:55:58,443-Speed 2630.90 samples/sec Loss 7.4745 LearningRate 0.0311 Epoch: 8 Global Step: 366860 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:56:02,339-Speed 2628.91 samples/sec Loss 7.4769 LearningRate 0.0311 Epoch: 8 Global Step: 366870 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:56:06,237-Speed 2627.87 samples/sec Loss 7.5669 LearningRate 0.0311 Epoch: 8 Global Step: 366880 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:56:10,132-Speed 2629.11 samples/sec Loss 7.5356 LearningRate 0.0311 Epoch: 8 Global Step: 366890 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:56:14,027-Speed 2630.20 samples/sec Loss 7.5211 LearningRate 0.0311 Epoch: 8 Global Step: 366900 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:56:17,923-Speed 2628.85 samples/sec Loss 7.4852 LearningRate 0.0311 Epoch: 8 Global Step: 366910 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:56:21,817-Speed 2629.95 samples/sec Loss 7.6408 LearningRate 0.0311 Epoch: 8 Global Step: 366920 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:56:25,710-Speed 2631.18 samples/sec Loss 7.4446 LearningRate 0.0311 Epoch: 8 Global Step: 366930 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:56:29,602-Speed 2632.18 samples/sec Loss 7.5699 LearningRate 0.0311 Epoch: 8 Global Step: 366940 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:56:33,495-Speed 2631.09 samples/sec Loss 7.5138 LearningRate 0.0311 Epoch: 8 Global Step: 366950 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:56:37,388-Speed 2630.53 samples/sec Loss 7.6881 LearningRate 0.0311 Epoch: 8 Global Step: 366960 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:56:41,283-Speed 2629.58 samples/sec Loss 7.4535 LearningRate 0.0311 Epoch: 8 Global Step: 366970 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:56:45,176-Speed 2630.68 samples/sec Loss 7.4890 LearningRate 0.0311 Epoch: 8 Global Step: 366980 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:56:49,084-Speed 2621.61 samples/sec Loss 7.4983 LearningRate 0.0311 Epoch: 8 Global Step: 366990 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:56:52,984-Speed 2626.17 samples/sec Loss 7.5535 LearningRate 0.0311 Epoch: 8 Global Step: 367000 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:56:56,927-Speed 2597.82 samples/sec Loss 7.4768 LearningRate 0.0311 Epoch: 8 Global Step: 367010 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:57:00,819-Speed 2631.67 samples/sec Loss 7.4226 LearningRate 0.0311 Epoch: 8 Global Step: 367020 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:57:04,710-Speed 2632.66 samples/sec Loss 7.4912 LearningRate 0.0311 Epoch: 8 Global Step: 367030 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:57:08,617-Speed 2621.33 samples/sec Loss 7.4567 LearningRate 0.0311 Epoch: 8 Global Step: 367040 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:57:12,522-Speed 2623.10 samples/sec Loss 7.4972 LearningRate 0.0311 Epoch: 8 Global Step: 367050 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:57:16,362-Speed 2667.50 samples/sec Loss 8.2466 LearningRate 0.0311 Epoch: 8 Global Step: 367060 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:57:20,259-Speed 2628.29 samples/sec Loss 8.3295 LearningRate 0.0311 Epoch: 8 Global Step: 367070 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:57:24,166-Speed 2621.15 samples/sec Loss 8.0505 LearningRate 0.0311 Epoch: 8 Global Step: 367080 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:57:28,074-Speed 2621.73 samples/sec Loss 7.7718 LearningRate 0.0311 Epoch: 8 Global Step: 367090 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:57:31,970-Speed 2628.43 samples/sec Loss 7.8621 LearningRate 0.0311 Epoch: 8 Global Step: 367100 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:57:35,896-Speed 2609.52 samples/sec Loss 7.7887 LearningRate 0.0311 Epoch: 8 Global Step: 367110 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:57:39,792-Speed 2629.04 samples/sec Loss 7.6869 LearningRate 0.0311 Epoch: 8 Global Step: 367120 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:57:43,683-Speed 2632.03 samples/sec Loss 7.6704 LearningRate 0.0311 Epoch: 8 Global Step: 367130 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:57:47,584-Speed 2625.93 samples/sec Loss 7.6511 LearningRate 0.0311 Epoch: 8 Global Step: 367140 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:57:51,481-Speed 2628.55 samples/sec Loss 7.7229 LearningRate 0.0311 Epoch: 8 Global Step: 367150 Fp16 Grad Scale: 1024 Required: 52 hours
Training: 2022-04-14 12:57:55,377-Speed 2629.17 samples/sec Loss 7.6127 LearningRate 0.0311 Epoch: 8 Global Step: 367160 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:57:59,265-Speed 2633.73 samples/sec Loss 7.4656 LearningRate 0.0311 Epoch: 8 Global Step: 367170 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:58:03,166-Speed 2626.59 samples/sec Loss 7.6100 LearningRate 0.0311 Epoch: 8 Global Step: 367180 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:58:07,064-Speed 2627.25 samples/sec Loss 7.7853 LearningRate 0.0311 Epoch: 8 Global Step: 367190 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:58:10,953-Speed 2633.52 samples/sec Loss 7.5293 LearningRate 0.0311 Epoch: 8 Global Step: 367200 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:58:14,849-Speed 2628.69 samples/sec Loss 7.6408 LearningRate 0.0311 Epoch: 8 Global Step: 367210 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:58:18,736-Speed 2635.62 samples/sec Loss 7.5326 LearningRate 0.0311 Epoch: 8 Global Step: 367220 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:58:22,629-Speed 2631.28 samples/sec Loss 7.4627 LearningRate 0.0311 Epoch: 8 Global Step: 367230 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:58:26,516-Speed 2634.80 samples/sec Loss 7.6567 LearningRate 0.0311 Epoch: 8 Global Step: 367240 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:58:30,398-Speed 2639.07 samples/sec Loss 7.4865 LearningRate 0.0311 Epoch: 8 Global Step: 367250 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 12:58:34,289-Speed 2632.25 samples/sec Loss 7.5767 LearningRate 0.0311 Epoch: 8 Global Step: 367260 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:58:38,178-Speed 2633.55 samples/sec Loss 7.5907 LearningRate 0.0311 Epoch: 8 Global Step: 367270 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:58:42,072-Speed 2630.82 samples/sec Loss 7.4787 LearningRate 0.0311 Epoch: 8 Global Step: 367280 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:58:45,960-Speed 2634.28 samples/sec Loss 7.4058 LearningRate 0.0311 Epoch: 8 Global Step: 367290 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:58:49,858-Speed 2627.19 samples/sec Loss 7.6020 LearningRate 0.0311 Epoch: 8 Global Step: 367300 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:58:53,749-Speed 2632.99 samples/sec Loss 7.5255 LearningRate 0.0311 Epoch: 8 Global Step: 367310 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:58:57,648-Speed 2626.42 samples/sec Loss 7.4865 LearningRate 0.0310 Epoch: 8 Global Step: 367320 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:59:01,544-Speed 2629.18 samples/sec Loss 7.5938 LearningRate 0.0310 Epoch: 8 Global Step: 367330 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:59:05,438-Speed 2630.14 samples/sec Loss 7.5673 LearningRate 0.0310 Epoch: 8 Global Step: 367340 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:59:09,335-Speed 2628.53 samples/sec Loss 7.5338 LearningRate 0.0310 Epoch: 8 Global Step: 367350 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 12:59:13,232-Speed 2628.45 samples/sec Loss 7.7589 LearningRate 0.0310 Epoch: 8 Global Step: 367360 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:59:17,124-Speed 2632.02 samples/sec Loss 7.5568 LearningRate 0.0310 Epoch: 8 Global Step: 367370 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:59:21,043-Speed 2614.04 samples/sec Loss 7.6941 LearningRate 0.0310 Epoch: 8 Global Step: 367380 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:59:25,016-Speed 2577.68 samples/sec Loss 7.7162 LearningRate 0.0310 Epoch: 8 Global Step: 367390 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:59:28,923-Speed 2621.42 samples/sec Loss 7.4987 LearningRate 0.0310 Epoch: 8 Global Step: 367400 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:59:32,819-Speed 2629.33 samples/sec Loss 7.4960 LearningRate 0.0310 Epoch: 8 Global Step: 367410 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:59:36,709-Speed 2633.32 samples/sec Loss 7.4596 LearningRate 0.0310 Epoch: 8 Global Step: 367420 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:59:40,615-Speed 2622.09 samples/sec Loss 7.4063 LearningRate 0.0310 Epoch: 8 Global Step: 367430 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:59:44,525-Speed 2619.49 samples/sec Loss 7.4845 LearningRate 0.0310 Epoch: 8 Global Step: 367440 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:59:48,417-Speed 2631.47 samples/sec Loss 7.4082 LearningRate 0.0310 Epoch: 8 Global Step: 367450 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 12:59:52,309-Speed 2632.15 samples/sec Loss 7.6149 LearningRate 0.0310 Epoch: 8 Global Step: 367460 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 12:59:56,207-Speed 2627.67 samples/sec Loss 7.4980 LearningRate 0.0310 Epoch: 8 Global Step: 367470 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:00:00,105-Speed 2627.72 samples/sec Loss 7.4439 LearningRate 0.0310 Epoch: 8 Global Step: 367480 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:00:04,006-Speed 2625.61 samples/sec Loss 7.4747 LearningRate 0.0310 Epoch: 8 Global Step: 367490 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:00:07,904-Speed 2627.62 samples/sec Loss 7.5431 LearningRate 0.0310 Epoch: 8 Global Step: 367500 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:00:11,882-Speed 2574.38 samples/sec Loss 7.5560 LearningRate 0.0310 Epoch: 8 Global Step: 367510 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:00:15,792-Speed 2620.40 samples/sec Loss 7.3591 LearningRate 0.0310 Epoch: 8 Global Step: 367520 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:00:19,689-Speed 2628.22 samples/sec Loss 7.4989 LearningRate 0.0310 Epoch: 8 Global Step: 367530 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:00:23,582-Speed 2631.49 samples/sec Loss 7.4258 LearningRate 0.0310 Epoch: 8 Global Step: 367540 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:00:27,502-Speed 2612.55 samples/sec Loss 7.4614 LearningRate 0.0310 Epoch: 8 Global Step: 367550 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:00:31,389-Speed 2635.39 samples/sec Loss 7.5419 LearningRate 0.0310 Epoch: 8 Global Step: 367560 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:00:35,290-Speed 2625.24 samples/sec Loss 7.5342 LearningRate 0.0310 Epoch: 8 Global Step: 367570 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:00:39,188-Speed 2628.22 samples/sec Loss 7.5647 LearningRate 0.0310 Epoch: 8 Global Step: 367580 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:00:43,082-Speed 2630.27 samples/sec Loss 7.4449 LearningRate 0.0310 Epoch: 8 Global Step: 367590 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:00:46,996-Speed 2616.87 samples/sec Loss 7.5841 LearningRate 0.0310 Epoch: 8 Global Step: 367600 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:00:50,889-Speed 2631.88 samples/sec Loss 7.5635 LearningRate 0.0310 Epoch: 8 Global Step: 367610 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:00:54,781-Speed 2631.21 samples/sec Loss 7.5259 LearningRate 0.0310 Epoch: 8 Global Step: 367620 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:00:58,674-Speed 2631.37 samples/sec Loss 7.5324 LearningRate 0.0310 Epoch: 8 Global Step: 367630 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:01:02,568-Speed 2630.08 samples/sec Loss 7.5306 LearningRate 0.0310 Epoch: 8 Global Step: 367640 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:01:06,473-Speed 2623.33 samples/sec Loss 7.6484 LearningRate 0.0310 Epoch: 8 Global Step: 367650 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:01:10,358-Speed 2635.80 samples/sec Loss 7.9285 LearningRate 0.0310 Epoch: 8 Global Step: 367660 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:01:14,258-Speed 2626.62 samples/sec Loss 7.5972 LearningRate 0.0310 Epoch: 8 Global Step: 367670 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:01:18,169-Speed 2619.06 samples/sec Loss 7.5773 LearningRate 0.0310 Epoch: 8 Global Step: 367680 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:01:22,107-Speed 2601.37 samples/sec Loss 7.4809 LearningRate 0.0310 Epoch: 8 Global Step: 367690 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:01:26,003-Speed 2628.61 samples/sec Loss 7.4608 LearningRate 0.0310 Epoch: 8 Global Step: 367700 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:01:29,904-Speed 2626.36 samples/sec Loss 7.4396 LearningRate 0.0310 Epoch: 8 Global Step: 367710 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:01:33,822-Speed 2613.79 samples/sec Loss 7.4983 LearningRate 0.0310 Epoch: 8 Global Step: 367720 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:01:37,711-Speed 2633.81 samples/sec Loss 7.5616 LearningRate 0.0310 Epoch: 8 Global Step: 367730 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:01:41,601-Speed 2633.28 samples/sec Loss 7.5731 LearningRate 0.0310 Epoch: 8 Global Step: 367740 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:01:45,500-Speed 2626.77 samples/sec Loss 7.5330 LearningRate 0.0310 Epoch: 8 Global Step: 367750 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:01:49,368-Speed 2648.09 samples/sec Loss 7.6023 LearningRate 0.0310 Epoch: 8 Global Step: 367760 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:01:53,245-Speed 2642.24 samples/sec Loss 8.5982 LearningRate 0.0310 Epoch: 8 Global Step: 367770 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:01:57,149-Speed 2624.15 samples/sec Loss 8.1095 LearningRate 0.0310 Epoch: 8 Global Step: 367780 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:02:01,042-Speed 2631.28 samples/sec Loss 7.6581 LearningRate 0.0310 Epoch: 8 Global Step: 367790 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:02:04,947-Speed 2622.41 samples/sec Loss 7.5043 LearningRate 0.0310 Epoch: 8 Global Step: 367800 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:02:08,838-Speed 2632.30 samples/sec Loss 7.4792 LearningRate 0.0310 Epoch: 8 Global Step: 367810 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:02:12,729-Speed 2632.79 samples/sec Loss 7.4754 LearningRate 0.0310 Epoch: 8 Global Step: 367820 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:02:16,618-Speed 2633.59 samples/sec Loss 7.6033 LearningRate 0.0310 Epoch: 8 Global Step: 367830 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:02:20,518-Speed 2626.74 samples/sec Loss 7.7118 LearningRate 0.0310 Epoch: 8 Global Step: 367840 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:02:24,409-Speed 2631.79 samples/sec Loss 7.6120 LearningRate 0.0310 Epoch: 8 Global Step: 367850 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:02:28,307-Speed 2627.61 samples/sec Loss 7.6279 LearningRate 0.0310 Epoch: 8 Global Step: 367860 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:02:32,195-Speed 2635.12 samples/sec Loss 7.5923 LearningRate 0.0310 Epoch: 8 Global Step: 367870 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:02:36,089-Speed 2630.59 samples/sec Loss 7.5029 LearningRate 0.0310 Epoch: 8 Global Step: 367880 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:02:39,994-Speed 2622.85 samples/sec Loss 7.7170 LearningRate 0.0310 Epoch: 8 Global Step: 367890 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:02:43,887-Speed 2630.39 samples/sec Loss 7.4923 LearningRate 0.0310 Epoch: 8 Global Step: 367900 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:02:47,782-Speed 2629.80 samples/sec Loss 7.6296 LearningRate 0.0310 Epoch: 8 Global Step: 367910 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:02:51,677-Speed 2629.72 samples/sec Loss 7.4989 LearningRate 0.0310 Epoch: 8 Global Step: 367920 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:02:55,579-Speed 2625.36 samples/sec Loss 7.6107 LearningRate 0.0310 Epoch: 8 Global Step: 367930 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:02:59,476-Speed 2628.43 samples/sec Loss 7.4838 LearningRate 0.0310 Epoch: 8 Global Step: 367940 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:03:03,391-Speed 2616.17 samples/sec Loss 7.5324 LearningRate 0.0310 Epoch: 8 Global Step: 367950 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:03:07,286-Speed 2629.90 samples/sec Loss 7.5070 LearningRate 0.0310 Epoch: 8 Global Step: 367960 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:03:11,180-Speed 2630.08 samples/sec Loss 7.5061 LearningRate 0.0310 Epoch: 8 Global Step: 367970 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:03:15,076-Speed 2628.84 samples/sec Loss 7.6798 LearningRate 0.0310 Epoch: 8 Global Step: 367980 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:03:18,972-Speed 2629.56 samples/sec Loss 7.5514 LearningRate 0.0310 Epoch: 8 Global Step: 367990 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:03:22,870-Speed 2627.35 samples/sec Loss 7.4959 LearningRate 0.0310 Epoch: 8 Global Step: 368000 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:03:26,765-Speed 2630.32 samples/sec Loss 7.6618 LearningRate 0.0310 Epoch: 8 Global Step: 368010 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:03:30,657-Speed 2631.44 samples/sec Loss 7.4375 LearningRate 0.0310 Epoch: 8 Global Step: 368020 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:03:34,553-Speed 2629.61 samples/sec Loss 7.5667 LearningRate 0.0310 Epoch: 8 Global Step: 368030 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:03:38,452-Speed 2627.19 samples/sec Loss 7.4874 LearningRate 0.0310 Epoch: 8 Global Step: 368040 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:03:42,343-Speed 2632.05 samples/sec Loss 7.5159 LearningRate 0.0310 Epoch: 8 Global Step: 368050 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:03:46,242-Speed 2626.75 samples/sec Loss 7.4115 LearningRate 0.0310 Epoch: 8 Global Step: 368060 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:03:50,131-Speed 2633.95 samples/sec Loss 7.5145 LearningRate 0.0309 Epoch: 8 Global Step: 368070 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:03:54,024-Speed 2630.47 samples/sec Loss 7.7014 LearningRate 0.0309 Epoch: 8 Global Step: 368080 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:03:57,919-Speed 2630.38 samples/sec Loss 7.3939 LearningRate 0.0309 Epoch: 8 Global Step: 368090 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:04:01,818-Speed 2626.32 samples/sec Loss 7.4720 LearningRate 0.0309 Epoch: 8 Global Step: 368100 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:04:05,711-Speed 2631.54 samples/sec Loss 7.4895 LearningRate 0.0309 Epoch: 8 Global Step: 368110 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:04:09,601-Speed 2632.75 samples/sec Loss 7.5428 LearningRate 0.0309 Epoch: 8 Global Step: 368120 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:04:13,488-Speed 2634.68 samples/sec Loss 7.5621 LearningRate 0.0309 Epoch: 8 Global Step: 368130 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:04:17,380-Speed 2631.46 samples/sec Loss 7.6093 LearningRate 0.0309 Epoch: 8 Global Step: 368140 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:04:21,278-Speed 2628.35 samples/sec Loss 7.4868 LearningRate 0.0309 Epoch: 8 Global Step: 368150 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:04:25,171-Speed 2630.44 samples/sec Loss 7.4799 LearningRate 0.0309 Epoch: 8 Global Step: 368160 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:04:29,066-Speed 2629.74 samples/sec Loss 7.4501 LearningRate 0.0309 Epoch: 8 Global Step: 368170 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:04:32,963-Speed 2628.22 samples/sec Loss 7.4921 LearningRate 0.0309 Epoch: 8 Global Step: 368180 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:04:36,857-Speed 2630.98 samples/sec Loss 7.4569 LearningRate 0.0309 Epoch: 8 Global Step: 368190 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:04:40,750-Speed 2630.99 samples/sec Loss 7.5837 LearningRate 0.0309 Epoch: 8 Global Step: 368200 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:04:44,647-Speed 2628.31 samples/sec Loss 7.4709 LearningRate 0.0309 Epoch: 8 Global Step: 368210 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:04:48,560-Speed 2617.30 samples/sec Loss 7.5578 LearningRate 0.0309 Epoch: 8 Global Step: 368220 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:04:52,454-Speed 2630.31 samples/sec Loss 7.5557 LearningRate 0.0309 Epoch: 8 Global Step: 368230 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:04:56,346-Speed 2632.56 samples/sec Loss 7.5333 LearningRate 0.0309 Epoch: 8 Global Step: 368240 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:05:00,253-Speed 2620.99 samples/sec Loss 7.4028 LearningRate 0.0309 Epoch: 8 Global Step: 368250 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:05:04,154-Speed 2626.00 samples/sec Loss 7.4711 LearningRate 0.0309 Epoch: 8 Global Step: 368260 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:05:08,047-Speed 2630.52 samples/sec Loss 7.5606 LearningRate 0.0309 Epoch: 8 Global Step: 368270 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:05:11,946-Speed 2626.95 samples/sec Loss 7.5124 LearningRate 0.0309 Epoch: 8 Global Step: 368280 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:05:15,866-Speed 2613.45 samples/sec Loss 7.5112 LearningRate 0.0309 Epoch: 8 Global Step: 368290 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:05:19,766-Speed 2626.56 samples/sec Loss 7.4575 LearningRate 0.0309 Epoch: 8 Global Step: 368300 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:05:23,665-Speed 2627.00 samples/sec Loss 7.4769 LearningRate 0.0309 Epoch: 8 Global Step: 368310 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:05:27,563-Speed 2627.71 samples/sec Loss 7.5961 LearningRate 0.0309 Epoch: 8 Global Step: 368320 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:05:31,464-Speed 2626.62 samples/sec Loss 7.2831 LearningRate 0.0309 Epoch: 8 Global Step: 368330 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:05:35,365-Speed 2625.44 samples/sec Loss 7.5698 LearningRate 0.0309 Epoch: 8 Global Step: 368340 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:05:39,340-Speed 2576.24 samples/sec Loss 7.5418 LearningRate 0.0309 Epoch: 8 Global Step: 368350 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:05:43,248-Speed 2621.33 samples/sec Loss 7.5690 LearningRate 0.0309 Epoch: 8 Global Step: 368360 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:05:47,132-Speed 2637.54 samples/sec Loss 7.5710 LearningRate 0.0309 Epoch: 8 Global Step: 368370 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:05:51,036-Speed 2626.61 samples/sec Loss 7.4817 LearningRate 0.0309 Epoch: 8 Global Step: 368380 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:05:54,933-Speed 2627.75 samples/sec Loss 7.6603 LearningRate 0.0309 Epoch: 8 Global Step: 368390 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:05:58,840-Speed 2621.96 samples/sec Loss 7.4644 LearningRate 0.0309 Epoch: 8 Global Step: 368400 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:06:02,737-Speed 2628.74 samples/sec Loss 7.4250 LearningRate 0.0309 Epoch: 8 Global Step: 368410 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:06:06,628-Speed 2632.03 samples/sec Loss 7.4924 LearningRate 0.0309 Epoch: 8 Global Step: 368420 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:06:10,545-Speed 2614.62 samples/sec Loss 7.4366 LearningRate 0.0309 Epoch: 8 Global Step: 368430 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:06:14,481-Speed 2602.97 samples/sec Loss 7.5623 LearningRate 0.0309 Epoch: 8 Global Step: 368440 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:06:18,383-Speed 2624.65 samples/sec Loss 7.3994 LearningRate 0.0309 Epoch: 8 Global Step: 368450 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:06:22,303-Speed 2613.24 samples/sec Loss 7.5393 LearningRate 0.0309 Epoch: 8 Global Step: 368460 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:06:26,184-Speed 2639.47 samples/sec Loss 7.5289 LearningRate 0.0309 Epoch: 8 Global Step: 368470 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:06:30,070-Speed 2636.01 samples/sec Loss 7.4058 LearningRate 0.0309 Epoch: 8 Global Step: 368480 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:06:33,962-Speed 2631.33 samples/sec Loss 7.5210 LearningRate 0.0309 Epoch: 8 Global Step: 368490 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:06:37,856-Speed 2630.80 samples/sec Loss 7.4682 LearningRate 0.0309 Epoch: 8 Global Step: 368500 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:06:41,749-Speed 2630.28 samples/sec Loss 7.4427 LearningRate 0.0309 Epoch: 8 Global Step: 368510 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:06:45,647-Speed 2636.14 samples/sec Loss 7.5173 LearningRate 0.0309 Epoch: 8 Global Step: 368520 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:06:49,511-Speed 2650.75 samples/sec Loss 7.5293 LearningRate 0.0309 Epoch: 8 Global Step: 368530 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:06:53,430-Speed 2614.14 samples/sec Loss 7.5477 LearningRate 0.0309 Epoch: 8 Global Step: 368540 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:06:57,326-Speed 2629.05 samples/sec Loss 7.6042 LearningRate 0.0309 Epoch: 8 Global Step: 368550 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:07:01,215-Speed 2633.98 samples/sec Loss 7.5575 LearningRate 0.0309 Epoch: 8 Global Step: 368560 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:07:05,101-Speed 2635.35 samples/sec Loss 7.5740 LearningRate 0.0309 Epoch: 8 Global Step: 368570 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:07:08,996-Speed 2629.97 samples/sec Loss 7.5462 LearningRate 0.0309 Epoch: 8 Global Step: 368580 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:07:12,889-Speed 2630.54 samples/sec Loss 7.5148 LearningRate 0.0309 Epoch: 8 Global Step: 368590 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:07:16,802-Speed 2618.03 samples/sec Loss 7.4317 LearningRate 0.0309 Epoch: 8 Global Step: 368600 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:07:20,702-Speed 2626.31 samples/sec Loss 7.4710 LearningRate 0.0309 Epoch: 8 Global Step: 368610 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:07:24,603-Speed 2625.71 samples/sec Loss 7.5206 LearningRate 0.0309 Epoch: 8 Global Step: 368620 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:07:28,514-Speed 2619.08 samples/sec Loss 7.5201 LearningRate 0.0309 Epoch: 8 Global Step: 368630 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:07:32,419-Speed 2623.18 samples/sec Loss 7.5130 LearningRate 0.0309 Epoch: 8 Global Step: 368640 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:07:36,322-Speed 2624.26 samples/sec Loss 7.4574 LearningRate 0.0309 Epoch: 8 Global Step: 368650 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:07:40,227-Speed 2623.12 samples/sec Loss 7.5600 LearningRate 0.0309 Epoch: 8 Global Step: 368660 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:07:44,160-Speed 2603.99 samples/sec Loss 7.5369 LearningRate 0.0309 Epoch: 8 Global Step: 368670 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:07:48,065-Speed 2622.75 samples/sec Loss 7.3795 LearningRate 0.0309 Epoch: 8 Global Step: 368680 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:07:51,942-Speed 2641.51 samples/sec Loss 8.0174 LearningRate 0.0309 Epoch: 8 Global Step: 368690 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:07:55,831-Speed 2634.32 samples/sec Loss 8.1745 LearningRate 0.0309 Epoch: 8 Global Step: 368700 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:07:59,780-Speed 2594.06 samples/sec Loss 7.8591 LearningRate 0.0309 Epoch: 8 Global Step: 368710 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:08:03,676-Speed 2628.51 samples/sec Loss 7.7702 LearningRate 0.0309 Epoch: 8 Global Step: 368720 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:08:07,589-Speed 2617.04 samples/sec Loss 7.6073 LearningRate 0.0309 Epoch: 8 Global Step: 368730 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:08:11,478-Speed 2634.07 samples/sec Loss 7.4977 LearningRate 0.0309 Epoch: 8 Global Step: 368740 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:08:15,368-Speed 2632.93 samples/sec Loss 7.6754 LearningRate 0.0309 Epoch: 8 Global Step: 368750 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:08:19,272-Speed 2623.36 samples/sec Loss 7.4313 LearningRate 0.0309 Epoch: 8 Global Step: 368760 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:08:23,165-Speed 2630.96 samples/sec Loss 7.4922 LearningRate 0.0309 Epoch: 8 Global Step: 368770 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:08:27,056-Speed 2632.83 samples/sec Loss 7.5403 LearningRate 0.0309 Epoch: 8 Global Step: 368780 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:08:30,949-Speed 2631.04 samples/sec Loss 7.6607 LearningRate 0.0309 Epoch: 8 Global Step: 368790 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:08:34,843-Speed 2630.60 samples/sec Loss 7.5593 LearningRate 0.0309 Epoch: 8 Global Step: 368800 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:08:38,732-Speed 2633.20 samples/sec Loss 7.3803 LearningRate 0.0308 Epoch: 8 Global Step: 368810 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:08:42,629-Speed 2628.87 samples/sec Loss 7.2850 LearningRate 0.0308 Epoch: 8 Global Step: 368820 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:08:46,518-Speed 2632.84 samples/sec Loss 7.5181 LearningRate 0.0308 Epoch: 8 Global Step: 368830 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:08:50,415-Speed 2628.88 samples/sec Loss 7.5053 LearningRate 0.0308 Epoch: 8 Global Step: 368840 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:08:54,314-Speed 2627.38 samples/sec Loss 7.4500 LearningRate 0.0308 Epoch: 8 Global Step: 368850 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:08:58,237-Speed 2611.52 samples/sec Loss 7.5763 LearningRate 0.0308 Epoch: 8 Global Step: 368860 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:09:02,128-Speed 2631.77 samples/sec Loss 7.5542 LearningRate 0.0308 Epoch: 8 Global Step: 368870 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:09:06,019-Speed 2633.10 samples/sec Loss 7.4746 LearningRate 0.0308 Epoch: 8 Global Step: 368880 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:09:09,925-Speed 2622.49 samples/sec Loss 7.4228 LearningRate 0.0308 Epoch: 8 Global Step: 368890 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:09:13,819-Speed 2630.13 samples/sec Loss 7.4538 LearningRate 0.0308 Epoch: 8 Global Step: 368900 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:09:18,667-Speed 2112.39 samples/sec Loss 7.5019 LearningRate 0.0308 Epoch: 8 Global Step: 368910 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:09:22,547-Speed 2640.26 samples/sec Loss 7.4820 LearningRate 0.0308 Epoch: 8 Global Step: 368920 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:09:26,437-Speed 2633.49 samples/sec Loss 7.6976 LearningRate 0.0308 Epoch: 8 Global Step: 368930 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:09:30,341-Speed 2623.36 samples/sec Loss 7.6497 LearningRate 0.0308 Epoch: 8 Global Step: 368940 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:09:34,229-Speed 2634.61 samples/sec Loss 7.4098 LearningRate 0.0308 Epoch: 8 Global Step: 368950 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:09:38,124-Speed 2629.24 samples/sec Loss 7.5462 LearningRate 0.0308 Epoch: 8 Global Step: 368960 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:09:42,017-Speed 2630.95 samples/sec Loss 7.5560 LearningRate 0.0308 Epoch: 8 Global Step: 368970 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:09:45,910-Speed 2630.81 samples/sec Loss 7.3710 LearningRate 0.0308 Epoch: 8 Global Step: 368980 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:09:49,804-Speed 2631.25 samples/sec Loss 7.5579 LearningRate 0.0308 Epoch: 8 Global Step: 368990 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:09:53,698-Speed 2630.22 samples/sec Loss 7.4275 LearningRate 0.0308 Epoch: 8 Global Step: 369000 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:09:57,611-Speed 2617.69 samples/sec Loss 7.4872 LearningRate 0.0308 Epoch: 8 Global Step: 369010 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:10:01,507-Speed 2628.91 samples/sec Loss 7.4855 LearningRate 0.0308 Epoch: 8 Global Step: 369020 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:10:05,399-Speed 2631.86 samples/sec Loss 7.4754 LearningRate 0.0308 Epoch: 8 Global Step: 369030 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:10:09,293-Speed 2630.47 samples/sec Loss 7.4508 LearningRate 0.0308 Epoch: 8 Global Step: 369040 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:10:13,190-Speed 2628.18 samples/sec Loss 7.6703 LearningRate 0.0308 Epoch: 8 Global Step: 369050 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:10:17,093-Speed 2623.73 samples/sec Loss 7.4763 LearningRate 0.0308 Epoch: 8 Global Step: 369060 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:10:20,982-Speed 2634.39 samples/sec Loss 7.4006 LearningRate 0.0308 Epoch: 8 Global Step: 369070 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:10:24,872-Speed 2633.11 samples/sec Loss 7.4177 LearningRate 0.0308 Epoch: 8 Global Step: 369080 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:10:28,765-Speed 2630.83 samples/sec Loss 7.6446 LearningRate 0.0308 Epoch: 8 Global Step: 369090 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:10:32,660-Speed 2629.63 samples/sec Loss 7.4700 LearningRate 0.0308 Epoch: 8 Global Step: 369100 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:10:36,580-Speed 2613.72 samples/sec Loss 7.4919 LearningRate 0.0308 Epoch: 8 Global Step: 369110 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:10:40,476-Speed 2628.20 samples/sec Loss 7.5110 LearningRate 0.0308 Epoch: 8 Global Step: 369120 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:10:44,383-Speed 2622.24 samples/sec Loss 7.5045 LearningRate 0.0308 Epoch: 8 Global Step: 369130 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:10:48,304-Speed 2612.36 samples/sec Loss 7.6227 LearningRate 0.0308 Epoch: 8 Global Step: 369140 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:10:52,202-Speed 2627.91 samples/sec Loss 7.5487 LearningRate 0.0308 Epoch: 8 Global Step: 369150 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:10:56,090-Speed 2634.82 samples/sec Loss 7.4636 LearningRate 0.0308 Epoch: 8 Global Step: 369160 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:10:59,970-Speed 2639.74 samples/sec Loss 8.1882 LearningRate 0.0308 Epoch: 8 Global Step: 369170 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:11:03,860-Speed 2632.55 samples/sec Loss 7.7570 LearningRate 0.0308 Epoch: 8 Global Step: 369180 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:11:07,775-Speed 2616.38 samples/sec Loss 7.8150 LearningRate 0.0308 Epoch: 8 Global Step: 369190 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:11:11,747-Speed 2578.75 samples/sec Loss 7.6765 LearningRate 0.0308 Epoch: 8 Global Step: 369200 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:11:15,650-Speed 2624.53 samples/sec Loss 7.7205 LearningRate 0.0308 Epoch: 8 Global Step: 369210 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:11:19,589-Speed 2600.18 samples/sec Loss 7.5711 LearningRate 0.0308 Epoch: 8 Global Step: 369220 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:11:23,483-Speed 2630.52 samples/sec Loss 7.5279 LearningRate 0.0308 Epoch: 8 Global Step: 369230 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:11:27,382-Speed 2627.15 samples/sec Loss 7.5712 LearningRate 0.0308 Epoch: 8 Global Step: 369240 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:11:31,276-Speed 2630.63 samples/sec Loss 7.6528 LearningRate 0.0308 Epoch: 8 Global Step: 369250 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:11:35,184-Speed 2620.65 samples/sec Loss 7.5730 LearningRate 0.0308 Epoch: 8 Global Step: 369260 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:11:39,078-Speed 2630.38 samples/sec Loss 7.5532 LearningRate 0.0308 Epoch: 8 Global Step: 369270 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:11:42,972-Speed 2629.91 samples/sec Loss 7.4544 LearningRate 0.0308 Epoch: 8 Global Step: 369280 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:11:46,865-Speed 2631.50 samples/sec Loss 7.5696 LearningRate 0.0308 Epoch: 8 Global Step: 369290 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:11:50,763-Speed 2627.43 samples/sec Loss 7.3837 LearningRate 0.0308 Epoch: 8 Global Step: 369300 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:11:54,650-Speed 2636.02 samples/sec Loss 7.4825 LearningRate 0.0308 Epoch: 8 Global Step: 369310 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:11:58,560-Speed 2619.45 samples/sec Loss 7.4829 LearningRate 0.0308 Epoch: 8 Global Step: 369320 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:12:02,459-Speed 2627.13 samples/sec Loss 7.6059 LearningRate 0.0308 Epoch: 8 Global Step: 369330 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:12:06,373-Speed 2616.91 samples/sec Loss 7.7304 LearningRate 0.0308 Epoch: 8 Global Step: 369340 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:12:10,260-Speed 2635.19 samples/sec Loss 7.5978 LearningRate 0.0308 Epoch: 8 Global Step: 369350 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:12:14,162-Speed 2624.58 samples/sec Loss 7.4186 LearningRate 0.0308 Epoch: 8 Global Step: 369360 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:12:18,058-Speed 2628.98 samples/sec Loss 7.4647 LearningRate 0.0308 Epoch: 8 Global Step: 369370 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:12:21,988-Speed 2607.02 samples/sec Loss 7.4426 LearningRate 0.0308 Epoch: 8 Global Step: 369380 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:12:25,914-Speed 2608.81 samples/sec Loss 7.4901 LearningRate 0.0308 Epoch: 8 Global Step: 369390 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:12:29,805-Speed 2632.85 samples/sec Loss 7.4779 LearningRate 0.0308 Epoch: 8 Global Step: 369400 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:12:33,703-Speed 2627.71 samples/sec Loss 7.4884 LearningRate 0.0308 Epoch: 8 Global Step: 369410 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:12:37,606-Speed 2623.77 samples/sec Loss 7.4566 LearningRate 0.0308 Epoch: 8 Global Step: 369420 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:12:41,503-Speed 2628.30 samples/sec Loss 7.4370 LearningRate 0.0308 Epoch: 8 Global Step: 369430 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:12:45,387-Speed 2637.76 samples/sec Loss 7.4889 LearningRate 0.0308 Epoch: 8 Global Step: 369440 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:12:49,297-Speed 2619.05 samples/sec Loss 7.5164 LearningRate 0.0308 Epoch: 8 Global Step: 369450 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:12:53,182-Speed 2636.76 samples/sec Loss 7.4627 LearningRate 0.0308 Epoch: 8 Global Step: 369460 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:12:57,077-Speed 2629.53 samples/sec Loss 7.5692 LearningRate 0.0308 Epoch: 8 Global Step: 369470 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:13:00,984-Speed 2621.98 samples/sec Loss 7.5856 LearningRate 0.0308 Epoch: 8 Global Step: 369480 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:13:04,877-Speed 2630.61 samples/sec Loss 7.4717 LearningRate 0.0308 Epoch: 8 Global Step: 369490 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:13:08,775-Speed 2627.72 samples/sec Loss 7.5355 LearningRate 0.0308 Epoch: 8 Global Step: 369500 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:13:12,652-Speed 2641.50 samples/sec Loss 7.6887 LearningRate 0.0308 Epoch: 8 Global Step: 369510 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:13:16,545-Speed 2631.31 samples/sec Loss 7.5110 LearningRate 0.0308 Epoch: 8 Global Step: 369520 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:13:20,453-Speed 2620.97 samples/sec Loss 7.5136 LearningRate 0.0308 Epoch: 8 Global Step: 369530 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:13:24,362-Speed 2620.11 samples/sec Loss 7.6060 LearningRate 0.0308 Epoch: 8 Global Step: 369540 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:13:28,255-Speed 2630.77 samples/sec Loss 7.4801 LearningRate 0.0308 Epoch: 8 Global Step: 369550 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:13:32,154-Speed 2627.79 samples/sec Loss 7.5810 LearningRate 0.0307 Epoch: 8 Global Step: 369560 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:13:36,048-Speed 2629.88 samples/sec Loss 7.5105 LearningRate 0.0307 Epoch: 8 Global Step: 369570 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:13:39,948-Speed 2626.34 samples/sec Loss 7.5135 LearningRate 0.0307 Epoch: 8 Global Step: 369580 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:13:43,844-Speed 2628.64 samples/sec Loss 7.5993 LearningRate 0.0307 Epoch: 8 Global Step: 369590 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:13:47,757-Speed 2618.19 samples/sec Loss 7.4331 LearningRate 0.0307 Epoch: 8 Global Step: 369600 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:13:51,695-Speed 2601.08 samples/sec Loss 7.4309 LearningRate 0.0307 Epoch: 8 Global Step: 369610 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:13:55,616-Speed 2612.61 samples/sec Loss 7.4641 LearningRate 0.0307 Epoch: 8 Global Step: 369620 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:13:59,666-Speed 2529.06 samples/sec Loss 7.3953 LearningRate 0.0307 Epoch: 8 Global Step: 369630 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:14:03,611-Speed 2596.35 samples/sec Loss 7.5670 LearningRate 0.0307 Epoch: 8 Global Step: 369640 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:14:07,501-Speed 2632.75 samples/sec Loss 7.4295 LearningRate 0.0307 Epoch: 8 Global Step: 369650 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:14:11,399-Speed 2628.10 samples/sec Loss 7.4416 LearningRate 0.0307 Epoch: 8 Global Step: 369660 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:14:15,294-Speed 2629.94 samples/sec Loss 7.4409 LearningRate 0.0307 Epoch: 8 Global Step: 369670 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:14:19,190-Speed 2628.24 samples/sec Loss 7.4619 LearningRate 0.0307 Epoch: 8 Global Step: 369680 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:14:23,085-Speed 2630.15 samples/sec Loss 7.5177 LearningRate 0.0307 Epoch: 8 Global Step: 369690 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:14:26,979-Speed 2630.76 samples/sec Loss 7.4735 LearningRate 0.0307 Epoch: 8 Global Step: 369700 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:14:30,859-Speed 2639.29 samples/sec Loss 7.5236 LearningRate 0.0307 Epoch: 8 Global Step: 369710 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:14:34,756-Speed 2629.26 samples/sec Loss 7.5350 LearningRate 0.0307 Epoch: 8 Global Step: 369720 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:14:38,655-Speed 2627.04 samples/sec Loss 7.5244 LearningRate 0.0307 Epoch: 8 Global Step: 369730 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:14:42,553-Speed 2627.34 samples/sec Loss 7.5032 LearningRate 0.0307 Epoch: 8 Global Step: 369740 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:14:46,452-Speed 2627.28 samples/sec Loss 7.4720 LearningRate 0.0307 Epoch: 8 Global Step: 369750 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:14:50,354-Speed 2624.88 samples/sec Loss 7.4262 LearningRate 0.0307 Epoch: 8 Global Step: 369760 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:14:54,266-Speed 2618.01 samples/sec Loss 7.3975 LearningRate 0.0307 Epoch: 8 Global Step: 369770 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:14:58,169-Speed 2624.75 samples/sec Loss 7.4942 LearningRate 0.0307 Epoch: 8 Global Step: 369780 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:15:02,069-Speed 2626.89 samples/sec Loss 7.5402 LearningRate 0.0307 Epoch: 8 Global Step: 369790 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:15:06,013-Speed 2596.29 samples/sec Loss 7.4815 LearningRate 0.0307 Epoch: 8 Global Step: 369800 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:15:09,924-Speed 2619.60 samples/sec Loss 7.5220 LearningRate 0.0307 Epoch: 8 Global Step: 369810 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:15:13,822-Speed 2627.79 samples/sec Loss 7.4611 LearningRate 0.0307 Epoch: 8 Global Step: 369820 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:15:17,718-Speed 2628.70 samples/sec Loss 7.4649 LearningRate 0.0307 Epoch: 8 Global Step: 369830 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:15:21,622-Speed 2623.17 samples/sec Loss 7.5744 LearningRate 0.0307 Epoch: 8 Global Step: 369840 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:15:25,534-Speed 2618.72 samples/sec Loss 7.6129 LearningRate 0.0307 Epoch: 8 Global Step: 369850 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:15:29,432-Speed 2627.85 samples/sec Loss 7.5953 LearningRate 0.0307 Epoch: 8 Global Step: 369860 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:15:33,365-Speed 2604.53 samples/sec Loss 7.3872 LearningRate 0.0307 Epoch: 8 Global Step: 369870 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:15:37,271-Speed 2622.26 samples/sec Loss 7.4027 LearningRate 0.0307 Epoch: 8 Global Step: 369880 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:15:41,167-Speed 2629.37 samples/sec Loss 7.4674 LearningRate 0.0307 Epoch: 8 Global Step: 369890 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:15:45,068-Speed 2625.80 samples/sec Loss 7.5571 LearningRate 0.0307 Epoch: 8 Global Step: 369900 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:15:48,956-Speed 2633.95 samples/sec Loss 7.6210 LearningRate 0.0307 Epoch: 8 Global Step: 369910 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 13:15:52,854-Speed 2628.00 samples/sec Loss 7.5315 LearningRate 0.0307 Epoch: 8 Global Step: 369920 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 13:15:56,776-Speed 2611.91 samples/sec Loss 7.4592 LearningRate 0.0307 Epoch: 8 Global Step: 369930 Fp16 Grad Scale: 262144 Required: 52 hours
Training: 2022-04-14 13:16:00,651-Speed 2643.12 samples/sec Loss 7.3849 LearningRate 0.0307 Epoch: 8 Global Step: 369940 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:16:04,561-Speed 2619.70 samples/sec Loss 8.2234 LearningRate 0.0307 Epoch: 8 Global Step: 369950 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 13:16:08,455-Speed 2630.45 samples/sec Loss 7.9784 LearningRate 0.0307 Epoch: 8 Global Step: 369960 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 13:16:12,348-Speed 2630.88 samples/sec Loss 7.6410 LearningRate 0.0307 Epoch: 8 Global Step: 369970 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 13:16:16,238-Speed 2633.04 samples/sec Loss 7.4689 LearningRate 0.0307 Epoch: 8 Global Step: 369980 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 13:16:20,138-Speed 2626.59 samples/sec Loss 7.4160 LearningRate 0.0307 Epoch: 8 Global Step: 369990 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 13:16:24,043-Speed 2622.76 samples/sec Loss 7.5794 LearningRate 0.0307 Epoch: 8 Global Step: 370000 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 13:17:06,921-[lfw][370000]XNorm: 23.203387
Training: 2022-04-14 13:17:06,922-[lfw][370000]Accuracy-Flip: 0.99750+-0.00281
Training: 2022-04-14 13:17:06,923-[lfw][370000]Accuracy-Highest: 0.99783
Training: 2022-04-14 13:17:56,979-[cfp_fp][370000]XNorm: 21.421876
Training: 2022-04-14 13:17:56,980-[cfp_fp][370000]Accuracy-Flip: 0.98557+-0.00692
Training: 2022-04-14 13:17:56,981-[cfp_fp][370000]Accuracy-Highest: 0.98671
Training: 2022-04-14 13:18:40,141-[agedb_30][370000]XNorm: 23.100004
Training: 2022-04-14 13:18:40,142-[agedb_30][370000]Accuracy-Flip: 0.97567+-0.00807
Training: 2022-04-14 13:18:40,142-[agedb_30][370000]Accuracy-Highest: 0.97700
Training: 2022-04-14 13:18:44,007-Speed 73.16 samples/sec Loss 7.4218 LearningRate 0.0307 Epoch: 8 Global Step: 370010 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 13:18:47,881-Speed 2644.26 samples/sec Loss 7.5556 LearningRate 0.0307 Epoch: 8 Global Step: 370020 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 13:18:51,759-Speed 2640.86 samples/sec Loss 7.5187 LearningRate 0.0307 Epoch: 8 Global Step: 370030 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 13:18:55,638-Speed 2641.47 samples/sec Loss 7.4571 LearningRate 0.0307 Epoch: 8 Global Step: 370040 Fp16 Grad Scale: 2048 Required: 52 hours
Training: 2022-04-14 13:18:59,524-Speed 2635.70 samples/sec Loss 7.4537 LearningRate 0.0307 Epoch: 8 Global Step: 370050 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:19:03,407-Speed 2637.60 samples/sec Loss 7.4621 LearningRate 0.0307 Epoch: 8 Global Step: 370060 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:19:07,284-Speed 2642.27 samples/sec Loss 7.5877 LearningRate 0.0307 Epoch: 8 Global Step: 370070 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:19:11,170-Speed 2635.88 samples/sec Loss 7.4202 LearningRate 0.0307 Epoch: 8 Global Step: 370080 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:19:15,047-Speed 2641.82 samples/sec Loss 7.4986 LearningRate 0.0307 Epoch: 8 Global Step: 370090 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:19:18,941-Speed 2630.43 samples/sec Loss 7.4485 LearningRate 0.0307 Epoch: 8 Global Step: 370100 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:19:22,825-Speed 2636.94 samples/sec Loss 7.4231 LearningRate 0.0307 Epoch: 8 Global Step: 370110 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:19:26,718-Speed 2631.02 samples/sec Loss 7.4584 LearningRate 0.0307 Epoch: 8 Global Step: 370120 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:19:30,601-Speed 2638.11 samples/sec Loss 7.5596 LearningRate 0.0307 Epoch: 8 Global Step: 370130 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:19:34,498-Speed 2628.00 samples/sec Loss 7.5784 LearningRate 0.0307 Epoch: 8 Global Step: 370140 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:19:38,387-Speed 2633.68 samples/sec Loss 7.4268 LearningRate 0.0307 Epoch: 8 Global Step: 370150 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:19:42,289-Speed 2625.51 samples/sec Loss 7.5094 LearningRate 0.0307 Epoch: 8 Global Step: 370160 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:19:46,180-Speed 2632.11 samples/sec Loss 7.6036 LearningRate 0.0307 Epoch: 8 Global Step: 370170 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:19:50,078-Speed 2627.74 samples/sec Loss 7.5385 LearningRate 0.0307 Epoch: 8 Global Step: 370180 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:19:53,973-Speed 2629.42 samples/sec Loss 7.5478 LearningRate 0.0307 Epoch: 8 Global Step: 370190 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:19:57,873-Speed 2626.50 samples/sec Loss 7.4716 LearningRate 0.0307 Epoch: 8 Global Step: 370200 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:20:02,241-Speed 2344.70 samples/sec Loss 7.5567 LearningRate 0.0307 Epoch: 8 Global Step: 370210 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:20:06,140-Speed 2626.89 samples/sec Loss 7.5481 LearningRate 0.0307 Epoch: 8 Global Step: 370220 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:20:10,035-Speed 2630.34 samples/sec Loss 7.7376 LearningRate 0.0307 Epoch: 8 Global Step: 370230 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:20:13,935-Speed 2626.05 samples/sec Loss 7.4199 LearningRate 0.0307 Epoch: 8 Global Step: 370240 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:20:17,841-Speed 2622.22 samples/sec Loss 7.5154 LearningRate 0.0307 Epoch: 8 Global Step: 370250 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:20:21,741-Speed 2626.43 samples/sec Loss 7.4985 LearningRate 0.0307 Epoch: 8 Global Step: 370260 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:20:25,646-Speed 2622.77 samples/sec Loss 7.5836 LearningRate 0.0307 Epoch: 8 Global Step: 370270 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:20:29,540-Speed 2630.50 samples/sec Loss 7.4015 LearningRate 0.0307 Epoch: 8 Global Step: 370280 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:20:33,436-Speed 2629.07 samples/sec Loss 7.5371 LearningRate 0.0307 Epoch: 8 Global Step: 370290 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:20:37,333-Speed 2628.51 samples/sec Loss 7.6175 LearningRate 0.0307 Epoch: 8 Global Step: 370300 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:20:41,266-Speed 2604.07 samples/sec Loss 7.4506 LearningRate 0.0306 Epoch: 8 Global Step: 370310 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:20:45,158-Speed 2632.20 samples/sec Loss 7.4167 LearningRate 0.0306 Epoch: 8 Global Step: 370320 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:20:49,053-Speed 2629.79 samples/sec Loss 7.4637 LearningRate 0.0306 Epoch: 8 Global Step: 370330 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:20:52,954-Speed 2625.66 samples/sec Loss 7.4474 LearningRate 0.0306 Epoch: 8 Global Step: 370340 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:20:56,850-Speed 2628.92 samples/sec Loss 7.4900 LearningRate 0.0306 Epoch: 8 Global Step: 370350 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:21:00,746-Speed 2628.85 samples/sec Loss 7.4585 LearningRate 0.0306 Epoch: 8 Global Step: 370360 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:21:04,642-Speed 2628.72 samples/sec Loss 7.4569 LearningRate 0.0306 Epoch: 8 Global Step: 370370 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:21:08,535-Speed 2631.15 samples/sec Loss 7.4099 LearningRate 0.0306 Epoch: 8 Global Step: 370380 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:21:12,434-Speed 2627.62 samples/sec Loss 7.6030 LearningRate 0.0306 Epoch: 8 Global Step: 370390 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:21:16,331-Speed 2628.02 samples/sec Loss 7.5625 LearningRate 0.0306 Epoch: 8 Global Step: 370400 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:21:20,228-Speed 2628.76 samples/sec Loss 7.4507 LearningRate 0.0306 Epoch: 8 Global Step: 370410 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:21:24,125-Speed 2627.85 samples/sec Loss 7.3618 LearningRate 0.0306 Epoch: 8 Global Step: 370420 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:21:28,025-Speed 2626.58 samples/sec Loss 7.3830 LearningRate 0.0306 Epoch: 8 Global Step: 370430 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:21:31,942-Speed 2614.71 samples/sec Loss 7.4726 LearningRate 0.0306 Epoch: 8 Global Step: 370440 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:21:35,838-Speed 2629.03 samples/sec Loss 7.5149 LearningRate 0.0306 Epoch: 8 Global Step: 370450 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:21:39,740-Speed 2625.21 samples/sec Loss 7.4811 LearningRate 0.0306 Epoch: 8 Global Step: 370460 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:21:43,634-Speed 2630.79 samples/sec Loss 7.5256 LearningRate 0.0306 Epoch: 8 Global Step: 370470 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:21:47,522-Speed 2634.43 samples/sec Loss 7.5647 LearningRate 0.0306 Epoch: 8 Global Step: 370480 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:21:51,393-Speed 2645.74 samples/sec Loss 7.6463 LearningRate 0.0306 Epoch: 8 Global Step: 370490 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:21:55,300-Speed 2621.58 samples/sec Loss 7.6402 LearningRate 0.0306 Epoch: 8 Global Step: 370500 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:21:59,191-Speed 2632.80 samples/sec Loss 7.5444 LearningRate 0.0306 Epoch: 8 Global Step: 370510 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:22:03,098-Speed 2621.17 samples/sec Loss 7.4839 LearningRate 0.0306 Epoch: 8 Global Step: 370520 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:22:06,998-Speed 2626.22 samples/sec Loss 7.5373 LearningRate 0.0306 Epoch: 8 Global Step: 370530 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:22:10,894-Speed 2629.69 samples/sec Loss 7.5479 LearningRate 0.0306 Epoch: 8 Global Step: 370540 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:22:14,788-Speed 2630.17 samples/sec Loss 7.5295 LearningRate 0.0306 Epoch: 8 Global Step: 370550 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:22:18,681-Speed 2631.02 samples/sec Loss 7.5392 LearningRate 0.0306 Epoch: 8 Global Step: 370560 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:22:22,587-Speed 2622.29 samples/sec Loss 7.4348 LearningRate 0.0306 Epoch: 8 Global Step: 370570 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:22:26,484-Speed 2628.41 samples/sec Loss 7.4805 LearningRate 0.0306 Epoch: 8 Global Step: 370580 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:22:30,389-Speed 2622.92 samples/sec Loss 7.4185 LearningRate 0.0306 Epoch: 8 Global Step: 370590 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:22:34,290-Speed 2625.99 samples/sec Loss 7.3661 LearningRate 0.0306 Epoch: 8 Global Step: 370600 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:22:38,242-Speed 2591.33 samples/sec Loss 7.5782 LearningRate 0.0306 Epoch: 8 Global Step: 370610 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:22:42,139-Speed 2628.76 samples/sec Loss 7.4360 LearningRate 0.0306 Epoch: 8 Global Step: 370620 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:22:46,088-Speed 2593.03 samples/sec Loss 7.3615 LearningRate 0.0306 Epoch: 8 Global Step: 370630 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:22:49,980-Speed 2632.11 samples/sec Loss 7.5273 LearningRate 0.0306 Epoch: 8 Global Step: 370640 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:22:53,861-Speed 2639.49 samples/sec Loss 7.8681 LearningRate 0.0306 Epoch: 8 Global Step: 370650 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:22:57,751-Speed 2632.93 samples/sec Loss 7.5390 LearningRate 0.0306 Epoch: 8 Global Step: 370660 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:23:01,647-Speed 2628.66 samples/sec Loss 7.3902 LearningRate 0.0306 Epoch: 8 Global Step: 370670 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:23:05,546-Speed 2626.62 samples/sec Loss 7.4660 LearningRate 0.0306 Epoch: 8 Global Step: 370680 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:23:09,449-Speed 2624.10 samples/sec Loss 7.5322 LearningRate 0.0306 Epoch: 8 Global Step: 370690 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:23:13,352-Speed 2624.93 samples/sec Loss 7.4509 LearningRate 0.0306 Epoch: 8 Global Step: 370700 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:23:17,252-Speed 2626.10 samples/sec Loss 7.4358 LearningRate 0.0306 Epoch: 8 Global Step: 370710 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:23:21,162-Speed 2619.81 samples/sec Loss 7.5711 LearningRate 0.0306 Epoch: 8 Global Step: 370720 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:23:25,074-Speed 2618.61 samples/sec Loss 7.3562 LearningRate 0.0306 Epoch: 8 Global Step: 370730 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:23:28,970-Speed 2628.97 samples/sec Loss 7.5454 LearningRate 0.0306 Epoch: 8 Global Step: 370740 Fp16 Grad Scale: 4096 Required: 52 hours
Training: 2022-04-14 13:23:32,879-Speed 2620.54 samples/sec Loss 7.4600 LearningRate 0.0306 Epoch: 8 Global Step: 370750 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:23:36,778-Speed 2626.69 samples/sec Loss 7.4424 LearningRate 0.0306 Epoch: 8 Global Step: 370760 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:23:40,669-Speed 2632.34 samples/sec Loss 7.5278 LearningRate 0.0306 Epoch: 8 Global Step: 370770 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:23:44,587-Speed 2613.74 samples/sec Loss 7.5581 LearningRate 0.0306 Epoch: 8 Global Step: 370780 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:23:48,492-Speed 2623.68 samples/sec Loss 7.4031 LearningRate 0.0306 Epoch: 8 Global Step: 370790 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:23:52,399-Speed 2621.69 samples/sec Loss 7.5309 LearningRate 0.0306 Epoch: 8 Global Step: 370800 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:23:56,380-Speed 2573.26 samples/sec Loss 7.4374 LearningRate 0.0306 Epoch: 8 Global Step: 370810 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:24:00,321-Speed 2598.84 samples/sec Loss 7.4228 LearningRate 0.0306 Epoch: 8 Global Step: 370820 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:24:04,246-Speed 2609.55 samples/sec Loss 7.3362 LearningRate 0.0306 Epoch: 8 Global Step: 370830 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:24:08,142-Speed 2628.87 samples/sec Loss 7.4085 LearningRate 0.0306 Epoch: 8 Global Step: 370840 Fp16 Grad Scale: 8192 Required: 52 hours
Training: 2022-04-14 13:24:12,038-Speed 2629.65 samples/sec Loss 7.3889 LearningRate 0.0306 Epoch: 8 Global Step: 370850 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:24:15,989-Speed 2592.04 samples/sec Loss 7.4036 LearningRate 0.0306 Epoch: 8 Global Step: 370860 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:24:19,888-Speed 2626.54 samples/sec Loss 7.5420 LearningRate 0.0306 Epoch: 8 Global Step: 370870 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:24:23,795-Speed 2622.08 samples/sec Loss 7.5416 LearningRate 0.0306 Epoch: 8 Global Step: 370880 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:24:27,694-Speed 2627.23 samples/sec Loss 7.4958 LearningRate 0.0306 Epoch: 8 Global Step: 370890 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:24:31,662-Speed 2581.44 samples/sec Loss 7.3846 LearningRate 0.0306 Epoch: 8 Global Step: 370900 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:24:35,565-Speed 2623.66 samples/sec Loss 7.4732 LearningRate 0.0306 Epoch: 8 Global Step: 370910 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:24:39,476-Speed 2618.89 samples/sec Loss 7.3946 LearningRate 0.0306 Epoch: 8 Global Step: 370920 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:24:43,399-Speed 2610.83 samples/sec Loss 7.5565 LearningRate 0.0306 Epoch: 8 Global Step: 370930 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:24:47,409-Speed 2554.18 samples/sec Loss 8.2785 LearningRate 0.0306 Epoch: 8 Global Step: 370940 Fp16 Grad Scale: 16384 Required: 52 hours
Training: 2022-04-14 13:24:51,307-Speed 2627.58 samples/sec Loss 7.9291 LearningRate 0.0306 Epoch: 8 Global Step: 370950 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:24:55,213-Speed 2622.84 samples/sec Loss 7.8090 LearningRate 0.0306 Epoch: 8 Global Step: 370960 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:24:59,118-Speed 2622.79 samples/sec Loss 7.7956 LearningRate 0.0306 Epoch: 8 Global Step: 370970 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:25:03,021-Speed 2624.04 samples/sec Loss 7.6423 LearningRate 0.0306 Epoch: 8 Global Step: 370980 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:25:06,919-Speed 2627.26 samples/sec Loss 7.5346 LearningRate 0.0306 Epoch: 8 Global Step: 370990 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:25:10,819-Speed 2626.11 samples/sec Loss 7.5100 LearningRate 0.0306 Epoch: 8 Global Step: 371000 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:25:14,718-Speed 2626.88 samples/sec Loss 7.4804 LearningRate 0.0306 Epoch: 8 Global Step: 371010 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:25:18,616-Speed 2627.63 samples/sec Loss 7.6717 LearningRate 0.0306 Epoch: 8 Global Step: 371020 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:25:22,530-Speed 2617.11 samples/sec Loss 7.5169 LearningRate 0.0306 Epoch: 8 Global Step: 371030 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:25:26,428-Speed 2628.08 samples/sec Loss 7.5908 LearningRate 0.0306 Epoch: 8 Global Step: 371040 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:25:30,370-Speed 2597.56 samples/sec Loss 7.4770 LearningRate 0.0306 Epoch: 8 Global Step: 371050 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:25:34,266-Speed 2629.51 samples/sec Loss 7.4119 LearningRate 0.0305 Epoch: 8 Global Step: 371060 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:25:38,161-Speed 2629.01 samples/sec Loss 7.5161 LearningRate 0.0305 Epoch: 8 Global Step: 371070 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:25:42,055-Speed 2630.53 samples/sec Loss 7.4250 LearningRate 0.0305 Epoch: 8 Global Step: 371080 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:25:45,948-Speed 2631.02 samples/sec Loss 7.5319 LearningRate 0.0305 Epoch: 8 Global Step: 371090 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:25:49,846-Speed 2627.13 samples/sec Loss 7.5072 LearningRate 0.0305 Epoch: 8 Global Step: 371100 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:25:53,741-Speed 2629.45 samples/sec Loss 7.4490 LearningRate 0.0305 Epoch: 8 Global Step: 371110 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:25:57,640-Speed 2626.81 samples/sec Loss 7.5528 LearningRate 0.0305 Epoch: 8 Global Step: 371120 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:26:01,541-Speed 2626.08 samples/sec Loss 7.5822 LearningRate 0.0305 Epoch: 8 Global Step: 371130 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:26:05,430-Speed 2633.82 samples/sec Loss 7.3663 LearningRate 0.0305 Epoch: 8 Global Step: 371140 Fp16 Grad Scale: 65536 Required: 52 hours
Training: 2022-04-14 13:26:09,338-Speed 2620.61 samples/sec Loss 7.4186 LearningRate 0.0305 Epoch: 8 Global Step: 371150 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:26:13,245-Speed 2621.22 samples/sec Loss 7.4354 LearningRate 0.0305 Epoch: 8 Global Step: 371160 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:26:17,145-Speed 2626.37 samples/sec Loss 7.4477 LearningRate 0.0305 Epoch: 8 Global Step: 371170 Fp16 Grad Scale: 131072 Required: 52 hours
Training: 2022-04-14 13:26:21,009-Speed 2650.59 samples/sec Loss 7.5077 LearningRate 0.0305 Epoch: 8 Global Step: 371180 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:26:24,945-Speed 2602.35 samples/sec Loss 7.4622 LearningRate 0.0305 Epoch: 8 Global Step: 371190 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:26:28,849-Speed 2623.73 samples/sec Loss 7.4817 LearningRate 0.0305 Epoch: 8 Global Step: 371200 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:26:32,828-Speed 2574.07 samples/sec Loss 7.4568 LearningRate 0.0305 Epoch: 8 Global Step: 371210 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:26:36,719-Speed 2632.03 samples/sec Loss 7.4237 LearningRate 0.0305 Epoch: 8 Global Step: 371220 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:26:40,612-Speed 2631.16 samples/sec Loss 7.3713 LearningRate 0.0305 Epoch: 8 Global Step: 371230 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:26:44,505-Speed 2630.88 samples/sec Loss 7.4893 LearningRate 0.0305 Epoch: 8 Global Step: 371240 Fp16 Grad Scale: 32768 Required: 52 hours
Training: 2022-04-14 13:26:48,399-Speed 2630.50 samples/sec Loss 7.4679 LearningRate 0.0305 Epoch: 8 Global Step: 371250 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:26:52,306-Speed 2621.40 samples/sec Loss 7.4478 LearningRate 0.0305 Epoch: 8 Global Step: 371260 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:26:56,199-Speed 2630.73 samples/sec Loss 7.5628 LearningRate 0.0305 Epoch: 8 Global Step: 371270 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:27:00,096-Speed 2628.22 samples/sec Loss 7.4889 LearningRate 0.0305 Epoch: 8 Global Step: 371280 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:27:04,001-Speed 2623.12 samples/sec Loss 7.4099 LearningRate 0.0305 Epoch: 8 Global Step: 371290 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:27:07,904-Speed 2623.84 samples/sec Loss 7.4285 LearningRate 0.0305 Epoch: 8 Global Step: 371300 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:27:11,810-Speed 2622.59 samples/sec Loss 7.4120 LearningRate 0.0305 Epoch: 8 Global Step: 371310 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:27:15,701-Speed 2632.13 samples/sec Loss 7.5398 LearningRate 0.0305 Epoch: 8 Global Step: 371320 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:27:19,595-Speed 2630.45 samples/sec Loss 7.4601 LearningRate 0.0305 Epoch: 8 Global Step: 371330 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:27:23,502-Speed 2621.76 samples/sec Loss 8.0406 LearningRate 0.0305 Epoch: 8 Global Step: 371340 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:27:27,401-Speed 2626.97 samples/sec Loss 7.7389 LearningRate 0.0305 Epoch: 8 Global Step: 371350 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:27:31,306-Speed 2622.57 samples/sec Loss 7.5940 LearningRate 0.0305 Epoch: 8 Global Step: 371360 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:27:35,198-Speed 2631.32 samples/sec Loss 7.5138 LearningRate 0.0305 Epoch: 8 Global Step: 371370 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:27:39,096-Speed 2627.73 samples/sec Loss 7.5084 LearningRate 0.0305 Epoch: 8 Global Step: 371380 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:27:42,985-Speed 2634.02 samples/sec Loss 7.4373 LearningRate 0.0305 Epoch: 8 Global Step: 371390 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:27:46,880-Speed 2628.97 samples/sec Loss 7.5330 LearningRate 0.0305 Epoch: 8 Global Step: 371400 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:27:50,783-Speed 2624.27 samples/sec Loss 7.5160 LearningRate 0.0305 Epoch: 8 Global Step: 371410 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:27:54,673-Speed 2633.43 samples/sec Loss 7.3530 LearningRate 0.0305 Epoch: 8 Global Step: 371420 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:27:58,577-Speed 2623.58 samples/sec Loss 7.6978 LearningRate 0.0305 Epoch: 8 Global Step: 371430 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:28:02,499-Speed 2611.85 samples/sec Loss 7.4885 LearningRate 0.0305 Epoch: 8 Global Step: 371440 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:28:06,388-Speed 2632.95 samples/sec Loss 7.3741 LearningRate 0.0305 Epoch: 8 Global Step: 371450 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:28:10,315-Speed 2608.30 samples/sec Loss 7.4704 LearningRate 0.0305 Epoch: 8 Global Step: 371460 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:28:14,217-Speed 2624.92 samples/sec Loss 7.4784 LearningRate 0.0305 Epoch: 8 Global Step: 371470 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:28:18,111-Speed 2630.32 samples/sec Loss 7.3890 LearningRate 0.0305 Epoch: 8 Global Step: 371480 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:28:22,003-Speed 2631.48 samples/sec Loss 7.4333 LearningRate 0.0305 Epoch: 8 Global Step: 371490 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:28:25,884-Speed 2639.20 samples/sec Loss 7.5584 LearningRate 0.0305 Epoch: 8 Global Step: 371500 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:28:29,788-Speed 2624.25 samples/sec Loss 7.7910 LearningRate 0.0305 Epoch: 8 Global Step: 371510 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:28:33,690-Speed 2624.51 samples/sec Loss 7.4400 LearningRate 0.0305 Epoch: 8 Global Step: 371520 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:28:37,584-Speed 2630.42 samples/sec Loss 7.4757 LearningRate 0.0305 Epoch: 8 Global Step: 371530 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:28:41,476-Speed 2631.43 samples/sec Loss 7.5290 LearningRate 0.0305 Epoch: 8 Global Step: 371540 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:28:45,387-Speed 2618.80 samples/sec Loss 7.4758 LearningRate 0.0305 Epoch: 8 Global Step: 371550 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:28:49,290-Speed 2624.46 samples/sec Loss 7.5217 LearningRate 0.0305 Epoch: 8 Global Step: 371560 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:28:53,180-Speed 2632.46 samples/sec Loss 7.4857 LearningRate 0.0305 Epoch: 8 Global Step: 371570 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:28:57,070-Speed 2633.21 samples/sec Loss 7.4781 LearningRate 0.0305 Epoch: 8 Global Step: 371580 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:29:00,966-Speed 2628.68 samples/sec Loss 7.3820 LearningRate 0.0305 Epoch: 8 Global Step: 371590 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:29:04,892-Speed 2608.71 samples/sec Loss 7.5765 LearningRate 0.0305 Epoch: 8 Global Step: 371600 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:29:08,790-Speed 2628.00 samples/sec Loss 7.4636 LearningRate 0.0305 Epoch: 8 Global Step: 371610 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:29:12,705-Speed 2616.06 samples/sec Loss 7.4087 LearningRate 0.0305 Epoch: 8 Global Step: 371620 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:29:16,625-Speed 2613.12 samples/sec Loss 7.5011 LearningRate 0.0305 Epoch: 8 Global Step: 371630 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:29:20,547-Speed 2611.07 samples/sec Loss 7.4635 LearningRate 0.0305 Epoch: 8 Global Step: 371640 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:29:24,447-Speed 2626.12 samples/sec Loss 7.5575 LearningRate 0.0305 Epoch: 8 Global Step: 371650 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:29:28,326-Speed 2640.84 samples/sec Loss 7.8063 LearningRate 0.0305 Epoch: 8 Global Step: 371660 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:29:32,232-Speed 2622.20 samples/sec Loss 7.6138 LearningRate 0.0305 Epoch: 8 Global Step: 371670 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:29:36,130-Speed 2627.13 samples/sec Loss 7.3879 LearningRate 0.0305 Epoch: 8 Global Step: 371680 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:29:40,036-Speed 2622.11 samples/sec Loss 7.4122 LearningRate 0.0305 Epoch: 8 Global Step: 371690 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:29:43,940-Speed 2624.01 samples/sec Loss 7.5187 LearningRate 0.0305 Epoch: 8 Global Step: 371700 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:29:47,835-Speed 2629.86 samples/sec Loss 7.4765 LearningRate 0.0305 Epoch: 8 Global Step: 371710 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:29:51,725-Speed 2632.72 samples/sec Loss 7.4744 LearningRate 0.0305 Epoch: 8 Global Step: 371720 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:29:55,632-Speed 2621.51 samples/sec Loss 7.4770 LearningRate 0.0305 Epoch: 8 Global Step: 371730 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:29:59,528-Speed 2628.99 samples/sec Loss 7.5159 LearningRate 0.0305 Epoch: 8 Global Step: 371740 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:30:03,422-Speed 2630.18 samples/sec Loss 7.4800 LearningRate 0.0305 Epoch: 8 Global Step: 371750 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:30:07,315-Speed 2630.97 samples/sec Loss 7.5316 LearningRate 0.0305 Epoch: 8 Global Step: 371760 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:30:11,222-Speed 2621.02 samples/sec Loss 7.4775 LearningRate 0.0305 Epoch: 8 Global Step: 371770 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:30:15,123-Speed 2625.84 samples/sec Loss 7.3881 LearningRate 0.0305 Epoch: 8 Global Step: 371780 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:30:19,018-Speed 2629.25 samples/sec Loss 7.4694 LearningRate 0.0305 Epoch: 8 Global Step: 371790 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:30:22,937-Speed 2614.18 samples/sec Loss 7.4371 LearningRate 0.0305 Epoch: 8 Global Step: 371800 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:30:26,845-Speed 2620.47 samples/sec Loss 7.3503 LearningRate 0.0304 Epoch: 8 Global Step: 371810 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:30:30,756-Speed 2618.77 samples/sec Loss 7.4777 LearningRate 0.0304 Epoch: 8 Global Step: 371820 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:30:34,659-Speed 2624.48 samples/sec Loss 7.4335 LearningRate 0.0304 Epoch: 8 Global Step: 371830 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:30:38,566-Speed 2621.73 samples/sec Loss 7.4328 LearningRate 0.0304 Epoch: 8 Global Step: 371840 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:30:42,464-Speed 2627.17 samples/sec Loss 7.3727 LearningRate 0.0304 Epoch: 8 Global Step: 371850 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:30:46,368-Speed 2623.40 samples/sec Loss 7.4033 LearningRate 0.0304 Epoch: 8 Global Step: 371860 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:30:50,265-Speed 2628.00 samples/sec Loss 7.3365 LearningRate 0.0304 Epoch: 8 Global Step: 371870 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:30:54,158-Speed 2631.21 samples/sec Loss 7.4744 LearningRate 0.0304 Epoch: 8 Global Step: 371880 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:30:58,066-Speed 2620.43 samples/sec Loss 7.4002 LearningRate 0.0304 Epoch: 8 Global Step: 371890 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:31:01,965-Speed 2627.91 samples/sec Loss 7.4368 LearningRate 0.0304 Epoch: 8 Global Step: 371900 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:31:05,874-Speed 2619.74 samples/sec Loss 7.4580 LearningRate 0.0304 Epoch: 8 Global Step: 371910 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:31:09,768-Speed 2630.01 samples/sec Loss 7.5057 LearningRate 0.0304 Epoch: 8 Global Step: 371920 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:31:13,672-Speed 2623.53 samples/sec Loss 7.3824 LearningRate 0.0304 Epoch: 8 Global Step: 371930 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:31:17,572-Speed 2626.27 samples/sec Loss 7.4361 LearningRate 0.0304 Epoch: 8 Global Step: 371940 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:31:21,477-Speed 2622.69 samples/sec Loss 7.3967 LearningRate 0.0304 Epoch: 8 Global Step: 371950 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:31:25,338-Speed 2653.40 samples/sec Loss 7.6403 LearningRate 0.0304 Epoch: 8 Global Step: 371960 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:31:29,236-Speed 2627.37 samples/sec Loss 8.0404 LearningRate 0.0304 Epoch: 8 Global Step: 371970 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:31:33,161-Speed 2609.73 samples/sec Loss 7.5014 LearningRate 0.0304 Epoch: 8 Global Step: 371980 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:31:37,249-Speed 2505.57 samples/sec Loss 7.4509 LearningRate 0.0304 Epoch: 8 Global Step: 371990 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:31:41,211-Speed 2584.63 samples/sec Loss 7.4915 LearningRate 0.0304 Epoch: 8 Global Step: 372000 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:31:45,114-Speed 2624.42 samples/sec Loss 7.5238 LearningRate 0.0304 Epoch: 8 Global Step: 372010 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:31:49,057-Speed 2597.61 samples/sec Loss 7.5337 LearningRate 0.0304 Epoch: 8 Global Step: 372020 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:31:52,960-Speed 2623.94 samples/sec Loss 7.4634 LearningRate 0.0304 Epoch: 8 Global Step: 372030 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:31:56,865-Speed 2622.79 samples/sec Loss 7.3927 LearningRate 0.0304 Epoch: 8 Global Step: 372040 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:32:00,768-Speed 2624.57 samples/sec Loss 7.3767 LearningRate 0.0304 Epoch: 8 Global Step: 372050 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:32:04,655-Speed 2634.99 samples/sec Loss 7.6265 LearningRate 0.0304 Epoch: 8 Global Step: 372060 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:32:08,554-Speed 2626.83 samples/sec Loss 7.5339 LearningRate 0.0304 Epoch: 8 Global Step: 372070 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:32:12,442-Speed 2634.13 samples/sec Loss 7.4738 LearningRate 0.0304 Epoch: 8 Global Step: 372080 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:32:16,351-Speed 2620.32 samples/sec Loss 7.3773 LearningRate 0.0304 Epoch: 8 Global Step: 372090 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:32:20,246-Speed 2629.58 samples/sec Loss 7.5844 LearningRate 0.0304 Epoch: 8 Global Step: 372100 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:32:24,139-Speed 2631.37 samples/sec Loss 7.5040 LearningRate 0.0304 Epoch: 8 Global Step: 372110 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:32:28,039-Speed 2625.83 samples/sec Loss 7.5440 LearningRate 0.0304 Epoch: 8 Global Step: 372120 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:32:31,937-Speed 2627.60 samples/sec Loss 7.5230 LearningRate 0.0304 Epoch: 8 Global Step: 372130 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:32:35,828-Speed 2632.59 samples/sec Loss 7.4215 LearningRate 0.0304 Epoch: 8 Global Step: 372140 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:32:39,741-Speed 2617.25 samples/sec Loss 7.3873 LearningRate 0.0304 Epoch: 8 Global Step: 372150 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:32:43,639-Speed 2627.57 samples/sec Loss 7.4989 LearningRate 0.0304 Epoch: 8 Global Step: 372160 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:32:47,573-Speed 2603.69 samples/sec Loss 7.4757 LearningRate 0.0304 Epoch: 8 Global Step: 372170 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:32:51,470-Speed 2628.10 samples/sec Loss 7.5271 LearningRate 0.0304 Epoch: 8 Global Step: 372180 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:32:55,363-Speed 2631.43 samples/sec Loss 7.4309 LearningRate 0.0304 Epoch: 8 Global Step: 372190 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:32:59,298-Speed 2602.95 samples/sec Loss 7.5763 LearningRate 0.0304 Epoch: 8 Global Step: 372200 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:33:03,185-Speed 2634.79 samples/sec Loss 7.6085 LearningRate 0.0304 Epoch: 8 Global Step: 372210 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:33:07,079-Speed 2630.04 samples/sec Loss 7.3780 LearningRate 0.0304 Epoch: 8 Global Step: 372220 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:33:10,977-Speed 2627.35 samples/sec Loss 7.3959 LearningRate 0.0304 Epoch: 8 Global Step: 372230 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:33:14,866-Speed 2634.23 samples/sec Loss 7.4182 LearningRate 0.0304 Epoch: 8 Global Step: 372240 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:33:18,785-Speed 2613.31 samples/sec Loss 7.5054 LearningRate 0.0304 Epoch: 8 Global Step: 372250 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:33:22,682-Speed 2628.29 samples/sec Loss 7.4724 LearningRate 0.0304 Epoch: 8 Global Step: 372260 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:33:26,572-Speed 2632.62 samples/sec Loss 7.4514 LearningRate 0.0304 Epoch: 8 Global Step: 372270 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:33:30,465-Speed 2631.10 samples/sec Loss 7.4624 LearningRate 0.0304 Epoch: 8 Global Step: 372280 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:33:34,384-Speed 2613.69 samples/sec Loss 7.5139 LearningRate 0.0304 Epoch: 8 Global Step: 372290 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:33:38,290-Speed 2622.51 samples/sec Loss 7.4400 LearningRate 0.0304 Epoch: 8 Global Step: 372300 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:33:42,178-Speed 2634.39 samples/sec Loss 7.4814 LearningRate 0.0304 Epoch: 8 Global Step: 372310 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:33:46,080-Speed 2624.59 samples/sec Loss 7.4456 LearningRate 0.0304 Epoch: 8 Global Step: 372320 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:33:49,961-Speed 2638.94 samples/sec Loss 7.5439 LearningRate 0.0304 Epoch: 8 Global Step: 372330 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:33:53,853-Speed 2631.86 samples/sec Loss 7.7377 LearningRate 0.0304 Epoch: 8 Global Step: 372340 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:33:57,742-Speed 2633.47 samples/sec Loss 7.5904 LearningRate 0.0304 Epoch: 8 Global Step: 372350 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:34:01,637-Speed 2629.74 samples/sec Loss 7.5960 LearningRate 0.0304 Epoch: 8 Global Step: 372360 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:34:05,526-Speed 2633.63 samples/sec Loss 7.4431 LearningRate 0.0304 Epoch: 8 Global Step: 372370 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:34:09,420-Speed 2629.95 samples/sec Loss 7.4841 LearningRate 0.0304 Epoch: 8 Global Step: 372380 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:34:13,313-Speed 2631.46 samples/sec Loss 7.5132 LearningRate 0.0304 Epoch: 8 Global Step: 372390 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:34:17,212-Speed 2626.72 samples/sec Loss 7.3834 LearningRate 0.0304 Epoch: 8 Global Step: 372400 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:34:21,117-Speed 2623.10 samples/sec Loss 7.5136 LearningRate 0.0304 Epoch: 8 Global Step: 372410 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:34:25,013-Speed 2628.86 samples/sec Loss 7.4908 LearningRate 0.0304 Epoch: 8 Global Step: 372420 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:34:28,910-Speed 2627.95 samples/sec Loss 7.4407 LearningRate 0.0304 Epoch: 8 Global Step: 372430 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:34:32,802-Speed 2631.76 samples/sec Loss 7.4390 LearningRate 0.0304 Epoch: 8 Global Step: 372440 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:34:36,704-Speed 2624.38 samples/sec Loss 7.4867 LearningRate 0.0304 Epoch: 8 Global Step: 372450 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:34:40,604-Speed 2626.49 samples/sec Loss 7.6030 LearningRate 0.0304 Epoch: 8 Global Step: 372460 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:34:44,540-Speed 2602.45 samples/sec Loss 7.5575 LearningRate 0.0304 Epoch: 8 Global Step: 372470 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:34:48,430-Speed 2633.17 samples/sec Loss 7.5026 LearningRate 0.0304 Epoch: 8 Global Step: 372480 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:34:52,330-Speed 2626.07 samples/sec Loss 7.4491 LearningRate 0.0304 Epoch: 8 Global Step: 372490 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:34:56,220-Speed 2633.22 samples/sec Loss 7.3626 LearningRate 0.0304 Epoch: 8 Global Step: 372500 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:35:00,113-Speed 2630.82 samples/sec Loss 7.4455 LearningRate 0.0304 Epoch: 8 Global Step: 372510 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:35:04,009-Speed 2629.03 samples/sec Loss 7.4953 LearningRate 0.0304 Epoch: 8 Global Step: 372520 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:35:07,923-Speed 2616.22 samples/sec Loss 7.4185 LearningRate 0.0304 Epoch: 8 Global Step: 372530 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:35:11,810-Speed 2634.84 samples/sec Loss 7.4540 LearningRate 0.0304 Epoch: 8 Global Step: 372540 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:35:15,713-Speed 2624.56 samples/sec Loss 7.5172 LearningRate 0.0304 Epoch: 8 Global Step: 372550 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:35:19,611-Speed 2627.54 samples/sec Loss 7.4529 LearningRate 0.0303 Epoch: 8 Global Step: 372560 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:35:23,498-Speed 2635.36 samples/sec Loss 7.4982 LearningRate 0.0303 Epoch: 8 Global Step: 372570 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:35:27,416-Speed 2614.13 samples/sec Loss 7.3438 LearningRate 0.0303 Epoch: 8 Global Step: 372580 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:35:31,310-Speed 2630.16 samples/sec Loss 7.5610 LearningRate 0.0303 Epoch: 8 Global Step: 372590 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:35:35,203-Speed 2630.96 samples/sec Loss 7.4525 LearningRate 0.0303 Epoch: 8 Global Step: 372600 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:35:39,097-Speed 2630.40 samples/sec Loss 7.4560 LearningRate 0.0303 Epoch: 8 Global Step: 372610 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:35:42,990-Speed 2630.65 samples/sec Loss 7.3108 LearningRate 0.0303 Epoch: 8 Global Step: 372620 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:35:46,891-Speed 2625.20 samples/sec Loss 7.4476 LearningRate 0.0303 Epoch: 8 Global Step: 372630 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:35:50,788-Speed 2628.99 samples/sec Loss 7.5975 LearningRate 0.0303 Epoch: 8 Global Step: 372640 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:35:54,686-Speed 2626.83 samples/sec Loss 7.5601 LearningRate 0.0303 Epoch: 8 Global Step: 372650 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:35:58,583-Speed 2629.08 samples/sec Loss 7.3586 LearningRate 0.0303 Epoch: 8 Global Step: 372660 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:36:02,496-Speed 2617.12 samples/sec Loss 7.6112 LearningRate 0.0303 Epoch: 8 Global Step: 372670 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:36:06,385-Speed 2633.63 samples/sec Loss 7.6459 LearningRate 0.0303 Epoch: 8 Global Step: 372680 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:36:10,287-Speed 2625.07 samples/sec Loss 7.3563 LearningRate 0.0303 Epoch: 8 Global Step: 372690 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:36:14,181-Speed 2631.30 samples/sec Loss 7.5531 LearningRate 0.0303 Epoch: 8 Global Step: 372700 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:36:18,079-Speed 2627.72 samples/sec Loss 7.4365 LearningRate 0.0303 Epoch: 8 Global Step: 372710 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:36:21,971-Speed 2631.11 samples/sec Loss 7.5500 LearningRate 0.0303 Epoch: 8 Global Step: 372720 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:36:25,875-Speed 2623.96 samples/sec Loss 7.5778 LearningRate 0.0303 Epoch: 8 Global Step: 372730 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:36:29,762-Speed 2634.88 samples/sec Loss 7.4442 LearningRate 0.0303 Epoch: 8 Global Step: 372740 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:36:33,655-Speed 2631.41 samples/sec Loss 7.4030 LearningRate 0.0303 Epoch: 8 Global Step: 372750 Fp16 Grad Scale: 262144 Required: 51 hours
Training: 2022-04-14 13:36:37,531-Speed 2642.69 samples/sec Loss 7.4245 LearningRate 0.0303 Epoch: 8 Global Step: 372760 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:36:41,436-Speed 2622.17 samples/sec Loss 7.4253 LearningRate 0.0303 Epoch: 8 Global Step: 372770 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:36:45,349-Speed 2617.87 samples/sec Loss 7.5723 LearningRate 0.0303 Epoch: 8 Global Step: 372780 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:36:49,238-Speed 2633.68 samples/sec Loss 7.3939 LearningRate 0.0303 Epoch: 8 Global Step: 372790 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:36:53,137-Speed 2626.94 samples/sec Loss 7.4871 LearningRate 0.0303 Epoch: 8 Global Step: 372800 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:36:57,025-Speed 2634.83 samples/sec Loss 7.5026 LearningRate 0.0303 Epoch: 8 Global Step: 372810 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:37:00,903-Speed 2641.29 samples/sec Loss 7.7252 LearningRate 0.0303 Epoch: 8 Global Step: 372820 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:37:04,798-Speed 2629.88 samples/sec Loss 7.5417 LearningRate 0.0303 Epoch: 8 Global Step: 372830 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:37:08,693-Speed 2629.37 samples/sec Loss 7.4269 LearningRate 0.0303 Epoch: 8 Global Step: 372840 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:37:12,596-Speed 2624.60 samples/sec Loss 7.4005 LearningRate 0.0303 Epoch: 8 Global Step: 372850 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:37:16,492-Speed 2628.63 samples/sec Loss 7.3288 LearningRate 0.0303 Epoch: 8 Global Step: 372860 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:37:20,387-Speed 2629.51 samples/sec Loss 7.4657 LearningRate 0.0303 Epoch: 8 Global Step: 372870 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:37:24,296-Speed 2621.09 samples/sec Loss 7.4033 LearningRate 0.0303 Epoch: 8 Global Step: 372880 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:37:28,192-Speed 2628.96 samples/sec Loss 7.4591 LearningRate 0.0303 Epoch: 8 Global Step: 372890 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:37:32,180-Speed 2568.15 samples/sec Loss 7.3791 LearningRate 0.0303 Epoch: 8 Global Step: 372900 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:37:36,142-Speed 2585.24 samples/sec Loss 7.3930 LearningRate 0.0303 Epoch: 8 Global Step: 372910 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:37:40,032-Speed 2633.03 samples/sec Loss 7.4531 LearningRate 0.0303 Epoch: 8 Global Step: 372920 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:37:43,929-Speed 2627.99 samples/sec Loss 7.5344 LearningRate 0.0303 Epoch: 8 Global Step: 372930 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:37:47,821-Speed 2632.69 samples/sec Loss 7.5043 LearningRate 0.0303 Epoch: 8 Global Step: 372940 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:37:51,719-Speed 2627.45 samples/sec Loss 7.4612 LearningRate 0.0303 Epoch: 8 Global Step: 372950 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:37:55,650-Speed 2605.89 samples/sec Loss 7.4087 LearningRate 0.0303 Epoch: 8 Global Step: 372960 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:37:59,558-Speed 2620.91 samples/sec Loss 7.4377 LearningRate 0.0303 Epoch: 8 Global Step: 372970 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:38:03,446-Speed 2634.35 samples/sec Loss 7.5648 LearningRate 0.0303 Epoch: 8 Global Step: 372980 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:38:07,345-Speed 2627.14 samples/sec Loss 7.5379 LearningRate 0.0303 Epoch: 8 Global Step: 372990 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:38:11,250-Speed 2622.71 samples/sec Loss 7.5402 LearningRate 0.0303 Epoch: 8 Global Step: 373000 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:38:15,148-Speed 2627.74 samples/sec Loss 7.4115 LearningRate 0.0303 Epoch: 8 Global Step: 373010 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:38:19,055-Speed 2621.79 samples/sec Loss 7.4359 LearningRate 0.0303 Epoch: 8 Global Step: 373020 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:38:22,958-Speed 2624.39 samples/sec Loss 7.4563 LearningRate 0.0303 Epoch: 8 Global Step: 373030 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:38:26,850-Speed 2631.03 samples/sec Loss 7.5334 LearningRate 0.0303 Epoch: 8 Global Step: 373040 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:38:30,713-Speed 2651.59 samples/sec Loss 7.3986 LearningRate 0.0303 Epoch: 8 Global Step: 373050 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:38:34,615-Speed 2625.66 samples/sec Loss 7.4284 LearningRate 0.0303 Epoch: 8 Global Step: 373060 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:38:38,552-Speed 2601.52 samples/sec Loss 7.5169 LearningRate 0.0303 Epoch: 8 Global Step: 373070 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:38:42,457-Speed 2622.81 samples/sec Loss 7.6204 LearningRate 0.0303 Epoch: 8 Global Step: 373080 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:38:46,348-Speed 2632.72 samples/sec Loss 7.4529 LearningRate 0.0303 Epoch: 8 Global Step: 373090 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:38:50,242-Speed 2630.28 samples/sec Loss 7.4858 LearningRate 0.0303 Epoch: 8 Global Step: 373100 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:38:54,163-Speed 2612.48 samples/sec Loss 7.4002 LearningRate 0.0303 Epoch: 8 Global Step: 373110 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:38:58,085-Speed 2611.79 samples/sec Loss 7.4061 LearningRate 0.0303 Epoch: 8 Global Step: 373120 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:39:01,974-Speed 2633.13 samples/sec Loss 7.4264 LearningRate 0.0303 Epoch: 8 Global Step: 373130 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:39:05,865-Speed 2632.07 samples/sec Loss 7.4362 LearningRate 0.0303 Epoch: 8 Global Step: 373140 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:39:09,756-Speed 2632.93 samples/sec Loss 7.4270 LearningRate 0.0303 Epoch: 8 Global Step: 373150 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:39:13,649-Speed 2630.40 samples/sec Loss 7.3783 LearningRate 0.0303 Epoch: 8 Global Step: 373160 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:39:17,535-Speed 2636.22 samples/sec Loss 7.5702 LearningRate 0.0303 Epoch: 8 Global Step: 373170 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:39:21,442-Speed 2621.66 samples/sec Loss 7.9409 LearningRate 0.0303 Epoch: 8 Global Step: 373180 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:39:25,355-Speed 2617.74 samples/sec Loss 7.4198 LearningRate 0.0303 Epoch: 8 Global Step: 373190 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:39:29,241-Speed 2635.33 samples/sec Loss 7.3569 LearningRate 0.0303 Epoch: 8 Global Step: 373200 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:39:33,134-Speed 2631.00 samples/sec Loss 7.3734 LearningRate 0.0303 Epoch: 8 Global Step: 373210 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:39:37,033-Speed 2626.91 samples/sec Loss 7.4176 LearningRate 0.0303 Epoch: 8 Global Step: 373220 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:39:40,926-Speed 2630.66 samples/sec Loss 7.4716 LearningRate 0.0303 Epoch: 8 Global Step: 373230 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:39:44,826-Speed 2626.28 samples/sec Loss 7.4529 LearningRate 0.0303 Epoch: 8 Global Step: 373240 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:39:48,715-Speed 2634.09 samples/sec Loss 7.5218 LearningRate 0.0303 Epoch: 8 Global Step: 373250 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:39:52,664-Speed 2593.94 samples/sec Loss 7.5672 LearningRate 0.0303 Epoch: 8 Global Step: 373260 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:39:56,584-Speed 2612.74 samples/sec Loss 7.5567 LearningRate 0.0303 Epoch: 8 Global Step: 373270 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:40:00,473-Speed 2634.19 samples/sec Loss 7.5259 LearningRate 0.0303 Epoch: 8 Global Step: 373280 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:40:04,375-Speed 2624.96 samples/sec Loss 7.4809 LearningRate 0.0303 Epoch: 8 Global Step: 373290 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:40:08,285-Speed 2619.48 samples/sec Loss 7.6564 LearningRate 0.0303 Epoch: 8 Global Step: 373300 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:40:12,201-Speed 2615.17 samples/sec Loss 7.4061 LearningRate 0.0303 Epoch: 8 Global Step: 373310 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:40:33,756-Speed 475.10 samples/sec Loss 7.3694 LearningRate 0.0302 Epoch: 9 Global Step: 373320 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:40:37,639-Speed 2638.28 samples/sec Loss 7.3945 LearningRate 0.0302 Epoch: 9 Global Step: 373330 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:40:41,618-Speed 2573.93 samples/sec Loss 7.4375 LearningRate 0.0302 Epoch: 9 Global Step: 373340 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:40:45,597-Speed 2574.37 samples/sec Loss 7.4042 LearningRate 0.0302 Epoch: 9 Global Step: 373350 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:40:49,485-Speed 2635.11 samples/sec Loss 7.5298 LearningRate 0.0302 Epoch: 9 Global Step: 373360 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:40:53,373-Speed 2633.90 samples/sec Loss 7.6136 LearningRate 0.0302 Epoch: 9 Global Step: 373370 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:40:57,278-Speed 2623.26 samples/sec Loss 7.4064 LearningRate 0.0302 Epoch: 9 Global Step: 373380 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:41:01,241-Speed 2584.83 samples/sec Loss 7.3080 LearningRate 0.0302 Epoch: 9 Global Step: 373390 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:41:05,135-Speed 2630.09 samples/sec Loss 7.3934 LearningRate 0.0302 Epoch: 9 Global Step: 373400 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:41:09,132-Speed 2562.44 samples/sec Loss 7.3376 LearningRate 0.0302 Epoch: 9 Global Step: 373410 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:41:13,041-Speed 2620.60 samples/sec Loss 7.4863 LearningRate 0.0302 Epoch: 9 Global Step: 373420 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:41:16,899-Speed 2655.11 samples/sec Loss 7.4832 LearningRate 0.0302 Epoch: 9 Global Step: 373430 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:41:20,814-Speed 2616.01 samples/sec Loss 7.5234 LearningRate 0.0302 Epoch: 9 Global Step: 373440 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:41:24,710-Speed 2629.03 samples/sec Loss 7.5608 LearningRate 0.0302 Epoch: 9 Global Step: 373450 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:41:28,611-Speed 2626.02 samples/sec Loss 7.4573 LearningRate 0.0302 Epoch: 9 Global Step: 373460 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:41:32,506-Speed 2629.40 samples/sec Loss 7.4742 LearningRate 0.0302 Epoch: 9 Global Step: 373470 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:41:36,401-Speed 2629.67 samples/sec Loss 7.5138 LearningRate 0.0302 Epoch: 9 Global Step: 373480 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:41:40,307-Speed 2621.94 samples/sec Loss 7.4580 LearningRate 0.0302 Epoch: 9 Global Step: 373490 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:41:44,205-Speed 2632.09 samples/sec Loss 7.5467 LearningRate 0.0302 Epoch: 9 Global Step: 373500 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:41:48,119-Speed 2616.44 samples/sec Loss 7.5418 LearningRate 0.0302 Epoch: 9 Global Step: 373510 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:41:52,039-Speed 2613.80 samples/sec Loss 7.4280 LearningRate 0.0302 Epoch: 9 Global Step: 373520 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:41:55,935-Speed 2628.40 samples/sec Loss 7.3102 LearningRate 0.0302 Epoch: 9 Global Step: 373530 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:41:59,860-Speed 2610.35 samples/sec Loss 7.4031 LearningRate 0.0302 Epoch: 9 Global Step: 373540 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:42:03,754-Speed 2630.34 samples/sec Loss 7.4495 LearningRate 0.0302 Epoch: 9 Global Step: 373550 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:42:07,680-Speed 2608.76 samples/sec Loss 7.5505 LearningRate 0.0302 Epoch: 9 Global Step: 373560 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:42:11,590-Speed 2619.00 samples/sec Loss 7.2894 LearningRate 0.0302 Epoch: 9 Global Step: 373570 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:42:15,498-Speed 2621.31 samples/sec Loss 7.4615 LearningRate 0.0302 Epoch: 9 Global Step: 373580 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:42:19,421-Speed 2611.22 samples/sec Loss 7.4524 LearningRate 0.0302 Epoch: 9 Global Step: 373590 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:42:23,312-Speed 2632.40 samples/sec Loss 7.3752 LearningRate 0.0302 Epoch: 9 Global Step: 373600 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:42:27,210-Speed 2627.69 samples/sec Loss 7.4590 LearningRate 0.0302 Epoch: 9 Global Step: 373610 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:42:31,109-Speed 2627.31 samples/sec Loss 7.3804 LearningRate 0.0302 Epoch: 9 Global Step: 373620 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:42:35,006-Speed 2628.25 samples/sec Loss 7.4317 LearningRate 0.0302 Epoch: 9 Global Step: 373630 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:42:38,913-Speed 2622.00 samples/sec Loss 7.2822 LearningRate 0.0302 Epoch: 9 Global Step: 373640 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:42:42,802-Speed 2632.96 samples/sec Loss 7.3589 LearningRate 0.0302 Epoch: 9 Global Step: 373650 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:42:46,698-Speed 2629.70 samples/sec Loss 7.4248 LearningRate 0.0302 Epoch: 9 Global Step: 373660 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:42:50,593-Speed 2629.30 samples/sec Loss 7.4881 LearningRate 0.0302 Epoch: 9 Global Step: 373670 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:42:54,487-Speed 2630.56 samples/sec Loss 7.3089 LearningRate 0.0302 Epoch: 9 Global Step: 373680 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:42:58,382-Speed 2629.65 samples/sec Loss 7.2710 LearningRate 0.0302 Epoch: 9 Global Step: 373690 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:43:02,277-Speed 2629.33 samples/sec Loss 7.4362 LearningRate 0.0302 Epoch: 9 Global Step: 373700 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:43:06,177-Speed 2625.93 samples/sec Loss 7.4176 LearningRate 0.0302 Epoch: 9 Global Step: 373710 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:43:10,082-Speed 2623.08 samples/sec Loss 7.2875 LearningRate 0.0302 Epoch: 9 Global Step: 373720 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:43:13,987-Speed 2622.92 samples/sec Loss 7.5301 LearningRate 0.0302 Epoch: 9 Global Step: 373730 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:43:17,887-Speed 2626.62 samples/sec Loss 7.3908 LearningRate 0.0302 Epoch: 9 Global Step: 373740 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:43:21,784-Speed 2628.49 samples/sec Loss 7.4588 LearningRate 0.0302 Epoch: 9 Global Step: 373750 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:43:25,762-Speed 2575.21 samples/sec Loss 7.3721 LearningRate 0.0302 Epoch: 9 Global Step: 373760 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:43:29,654-Speed 2631.47 samples/sec Loss 7.4106 LearningRate 0.0302 Epoch: 9 Global Step: 373770 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:43:33,509-Speed 2656.82 samples/sec Loss 7.5137 LearningRate 0.0302 Epoch: 9 Global Step: 373780 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:43:37,412-Speed 2623.62 samples/sec Loss 7.4801 LearningRate 0.0302 Epoch: 9 Global Step: 373790 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:43:41,311-Speed 2627.39 samples/sec Loss 7.3299 LearningRate 0.0302 Epoch: 9 Global Step: 373800 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:43:45,209-Speed 2627.52 samples/sec Loss 7.3562 LearningRate 0.0302 Epoch: 9 Global Step: 373810 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:43:49,105-Speed 2628.87 samples/sec Loss 7.3772 LearningRate 0.0302 Epoch: 9 Global Step: 373820 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:43:53,006-Speed 2626.01 samples/sec Loss 7.4895 LearningRate 0.0302 Epoch: 9 Global Step: 373830 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:43:56,910-Speed 2623.93 samples/sec Loss 7.3557 LearningRate 0.0302 Epoch: 9 Global Step: 373840 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:44:00,810-Speed 2626.18 samples/sec Loss 7.3088 LearningRate 0.0302 Epoch: 9 Global Step: 373850 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:44:04,704-Speed 2629.53 samples/sec Loss 7.4116 LearningRate 0.0302 Epoch: 9 Global Step: 373860 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:44:08,599-Speed 2630.24 samples/sec Loss 7.4271 LearningRate 0.0302 Epoch: 9 Global Step: 373870 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:44:12,493-Speed 2630.09 samples/sec Loss 7.4085 LearningRate 0.0302 Epoch: 9 Global Step: 373880 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:44:16,401-Speed 2620.98 samples/sec Loss 7.5625 LearningRate 0.0302 Epoch: 9 Global Step: 373890 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:44:20,303-Speed 2624.69 samples/sec Loss 7.4500 LearningRate 0.0302 Epoch: 9 Global Step: 373900 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:44:24,194-Speed 2632.79 samples/sec Loss 7.3656 LearningRate 0.0302 Epoch: 9 Global Step: 373910 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:44:28,105-Speed 2618.91 samples/sec Loss 7.3412 LearningRate 0.0302 Epoch: 9 Global Step: 373920 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:44:32,059-Speed 2590.05 samples/sec Loss 7.4831 LearningRate 0.0302 Epoch: 9 Global Step: 373930 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:44:35,918-Speed 2654.52 samples/sec Loss 7.4263 LearningRate 0.0302 Epoch: 9 Global Step: 373940 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:44:39,810-Speed 2631.98 samples/sec Loss 8.3790 LearningRate 0.0302 Epoch: 9 Global Step: 373950 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:44:43,698-Speed 2633.75 samples/sec Loss 7.6278 LearningRate 0.0302 Epoch: 9 Global Step: 373960 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:44:47,582-Speed 2637.53 samples/sec Loss 7.3650 LearningRate 0.0302 Epoch: 9 Global Step: 373970 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:44:51,472-Speed 2633.41 samples/sec Loss 7.4469 LearningRate 0.0302 Epoch: 9 Global Step: 373980 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:44:55,363-Speed 2632.32 samples/sec Loss 7.4616 LearningRate 0.0302 Epoch: 9 Global Step: 373990 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:44:59,258-Speed 2629.40 samples/sec Loss 7.3239 LearningRate 0.0302 Epoch: 9 Global Step: 374000 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:45:03,155-Speed 2628.20 samples/sec Loss 7.4125 LearningRate 0.0302 Epoch: 9 Global Step: 374010 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:45:07,042-Speed 2635.52 samples/sec Loss 7.5239 LearningRate 0.0302 Epoch: 9 Global Step: 374020 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:45:10,933-Speed 2632.48 samples/sec Loss 7.4036 LearningRate 0.0302 Epoch: 9 Global Step: 374030 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:45:14,837-Speed 2623.84 samples/sec Loss 7.3928 LearningRate 0.0302 Epoch: 9 Global Step: 374040 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:45:18,729-Speed 2631.44 samples/sec Loss 7.3568 LearningRate 0.0302 Epoch: 9 Global Step: 374050 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:45:22,624-Speed 2630.63 samples/sec Loss 7.4085 LearningRate 0.0302 Epoch: 9 Global Step: 374060 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:45:26,514-Speed 2632.66 samples/sec Loss 7.4283 LearningRate 0.0301 Epoch: 9 Global Step: 374070 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:45:30,415-Speed 2625.75 samples/sec Loss 7.4839 LearningRate 0.0301 Epoch: 9 Global Step: 374080 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:45:34,327-Speed 2618.38 samples/sec Loss 7.3327 LearningRate 0.0301 Epoch: 9 Global Step: 374090 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:45:38,251-Speed 2610.30 samples/sec Loss 7.3562 LearningRate 0.0301 Epoch: 9 Global Step: 374100 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:45:42,146-Speed 2629.56 samples/sec Loss 7.4828 LearningRate 0.0301 Epoch: 9 Global Step: 374110 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:45:46,214-Speed 2517.78 samples/sec Loss 7.3694 LearningRate 0.0301 Epoch: 9 Global Step: 374120 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:45:50,125-Speed 2619.85 samples/sec Loss 7.4593 LearningRate 0.0301 Epoch: 9 Global Step: 374130 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:45:54,019-Speed 2629.75 samples/sec Loss 7.2693 LearningRate 0.0301 Epoch: 9 Global Step: 374140 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:45:57,928-Speed 2620.56 samples/sec Loss 7.4375 LearningRate 0.0301 Epoch: 9 Global Step: 374150 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:46:01,835-Speed 2621.38 samples/sec Loss 7.3382 LearningRate 0.0301 Epoch: 9 Global Step: 374160 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:46:05,738-Speed 2624.42 samples/sec Loss 7.5469 LearningRate 0.0301 Epoch: 9 Global Step: 374170 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:46:09,637-Speed 2627.05 samples/sec Loss 7.3090 LearningRate 0.0301 Epoch: 9 Global Step: 374180 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:46:13,549-Speed 2617.94 samples/sec Loss 7.2263 LearningRate 0.0301 Epoch: 9 Global Step: 374190 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:46:17,453-Speed 2624.28 samples/sec Loss 7.1738 LearningRate 0.0301 Epoch: 9 Global Step: 374200 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:46:21,348-Speed 2629.67 samples/sec Loss 7.3949 LearningRate 0.0301 Epoch: 9 Global Step: 374210 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:46:25,240-Speed 2631.65 samples/sec Loss 7.4886 LearningRate 0.0301 Epoch: 9 Global Step: 374220 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:46:29,147-Speed 2621.81 samples/sec Loss 7.4113 LearningRate 0.0301 Epoch: 9 Global Step: 374230 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:46:33,049-Speed 2624.76 samples/sec Loss 7.4325 LearningRate 0.0301 Epoch: 9 Global Step: 374240 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:46:36,946-Speed 2628.24 samples/sec Loss 7.3989 LearningRate 0.0301 Epoch: 9 Global Step: 374250 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:46:40,873-Speed 2608.34 samples/sec Loss 7.3477 LearningRate 0.0301 Epoch: 9 Global Step: 374260 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:46:44,898-Speed 2544.67 samples/sec Loss 7.4270 LearningRate 0.0301 Epoch: 9 Global Step: 374270 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:46:48,788-Speed 2632.86 samples/sec Loss 7.4188 LearningRate 0.0301 Epoch: 9 Global Step: 374280 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:46:52,683-Speed 2630.22 samples/sec Loss 7.4149 LearningRate 0.0301 Epoch: 9 Global Step: 374290 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:46:56,589-Speed 2621.62 samples/sec Loss 7.4154 LearningRate 0.0301 Epoch: 9 Global Step: 374300 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:47:00,491-Speed 2625.17 samples/sec Loss 7.4257 LearningRate 0.0301 Epoch: 9 Global Step: 374310 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:47:04,399-Speed 2620.94 samples/sec Loss 7.4275 LearningRate 0.0301 Epoch: 9 Global Step: 374320 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:47:08,299-Speed 2626.25 samples/sec Loss 7.3514 LearningRate 0.0301 Epoch: 9 Global Step: 374330 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:47:12,200-Speed 2625.59 samples/sec Loss 7.4689 LearningRate 0.0301 Epoch: 9 Global Step: 374340 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:47:16,116-Speed 2615.85 samples/sec Loss 7.2877 LearningRate 0.0301 Epoch: 9 Global Step: 374350 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:47:20,020-Speed 2624.10 samples/sec Loss 7.5158 LearningRate 0.0301 Epoch: 9 Global Step: 374360 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:47:23,932-Speed 2617.91 samples/sec Loss 7.2775 LearningRate 0.0301 Epoch: 9 Global Step: 374370 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:47:27,831-Speed 2627.30 samples/sec Loss 7.3571 LearningRate 0.0301 Epoch: 9 Global Step: 374380 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:47:31,691-Speed 2653.34 samples/sec Loss 7.8395 LearningRate 0.0301 Epoch: 9 Global Step: 374390 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:47:35,584-Speed 2630.80 samples/sec Loss 7.8361 LearningRate 0.0301 Epoch: 9 Global Step: 374400 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:47:39,482-Speed 2627.64 samples/sec Loss 7.3236 LearningRate 0.0301 Epoch: 9 Global Step: 374410 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:47:43,396-Speed 2616.88 samples/sec Loss 7.3669 LearningRate 0.0301 Epoch: 9 Global Step: 374420 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:47:47,294-Speed 2627.64 samples/sec Loss 7.4412 LearningRate 0.0301 Epoch: 9 Global Step: 374430 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:47:51,184-Speed 2633.61 samples/sec Loss 7.4261 LearningRate 0.0301 Epoch: 9 Global Step: 374440 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:47:55,073-Speed 2633.61 samples/sec Loss 7.3970 LearningRate 0.0301 Epoch: 9 Global Step: 374450 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:47:58,969-Speed 2628.69 samples/sec Loss 7.3617 LearningRate 0.0301 Epoch: 9 Global Step: 374460 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:48:02,868-Speed 2627.59 samples/sec Loss 7.5678 LearningRate 0.0301 Epoch: 9 Global Step: 374470 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:48:06,767-Speed 2626.76 samples/sec Loss 7.6299 LearningRate 0.0301 Epoch: 9 Global Step: 374480 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:48:10,675-Speed 2620.58 samples/sec Loss 7.5211 LearningRate 0.0301 Epoch: 9 Global Step: 374490 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:48:14,561-Speed 2635.99 samples/sec Loss 7.4017 LearningRate 0.0301 Epoch: 9 Global Step: 374500 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:48:18,463-Speed 2625.03 samples/sec Loss 7.3727 LearningRate 0.0301 Epoch: 9 Global Step: 374510 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:48:22,363-Speed 2626.42 samples/sec Loss 7.5138 LearningRate 0.0301 Epoch: 9 Global Step: 374520 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:48:26,254-Speed 2632.13 samples/sec Loss 7.3012 LearningRate 0.0301 Epoch: 9 Global Step: 374530 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:48:30,153-Speed 2627.63 samples/sec Loss 7.3520 LearningRate 0.0301 Epoch: 9 Global Step: 374540 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:48:34,047-Speed 2629.87 samples/sec Loss 7.2639 LearningRate 0.0301 Epoch: 9 Global Step: 374550 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:48:37,945-Speed 2627.92 samples/sec Loss 7.4496 LearningRate 0.0301 Epoch: 9 Global Step: 374560 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:48:41,842-Speed 2628.56 samples/sec Loss 7.4161 LearningRate 0.0301 Epoch: 9 Global Step: 374570 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:48:45,741-Speed 2627.00 samples/sec Loss 7.3643 LearningRate 0.0301 Epoch: 9 Global Step: 374580 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:48:49,637-Speed 2629.25 samples/sec Loss 7.5423 LearningRate 0.0301 Epoch: 9 Global Step: 374590 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:48:53,523-Speed 2635.38 samples/sec Loss 7.3772 LearningRate 0.0301 Epoch: 9 Global Step: 374600 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:48:57,425-Speed 2624.76 samples/sec Loss 7.3541 LearningRate 0.0301 Epoch: 9 Global Step: 374610 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:49:01,324-Speed 2627.03 samples/sec Loss 7.4441 LearningRate 0.0301 Epoch: 9 Global Step: 374620 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:49:05,224-Speed 2626.64 samples/sec Loss 7.4053 LearningRate 0.0301 Epoch: 9 Global Step: 374630 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:49:09,132-Speed 2620.94 samples/sec Loss 7.3927 LearningRate 0.0301 Epoch: 9 Global Step: 374640 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:49:13,049-Speed 2614.70 samples/sec Loss 7.5037 LearningRate 0.0301 Epoch: 9 Global Step: 374650 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:49:16,943-Speed 2630.29 samples/sec Loss 7.4029 LearningRate 0.0301 Epoch: 9 Global Step: 374660 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:49:20,834-Speed 2632.49 samples/sec Loss 7.4105 LearningRate 0.0301 Epoch: 9 Global Step: 374670 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:49:24,735-Speed 2625.25 samples/sec Loss 7.4889 LearningRate 0.0301 Epoch: 9 Global Step: 374680 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:49:28,634-Speed 2627.35 samples/sec Loss 7.5163 LearningRate 0.0301 Epoch: 9 Global Step: 374690 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:49:32,515-Speed 2638.73 samples/sec Loss 7.5228 LearningRate 0.0301 Epoch: 9 Global Step: 374700 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:49:36,410-Speed 2629.66 samples/sec Loss 7.5321 LearningRate 0.0301 Epoch: 9 Global Step: 374710 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:49:40,306-Speed 2629.25 samples/sec Loss 7.4021 LearningRate 0.0301 Epoch: 9 Global Step: 374720 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:49:44,198-Speed 2631.79 samples/sec Loss 7.4446 LearningRate 0.0301 Epoch: 9 Global Step: 374730 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:49:48,104-Speed 2621.83 samples/sec Loss 7.4940 LearningRate 0.0301 Epoch: 9 Global Step: 374740 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:49:51,994-Speed 2633.31 samples/sec Loss 7.3304 LearningRate 0.0301 Epoch: 9 Global Step: 374750 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:49:55,889-Speed 2629.92 samples/sec Loss 7.4165 LearningRate 0.0301 Epoch: 9 Global Step: 374760 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:49:59,798-Speed 2619.90 samples/sec Loss 7.3708 LearningRate 0.0301 Epoch: 9 Global Step: 374770 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:50:03,688-Speed 2632.99 samples/sec Loss 7.2917 LearningRate 0.0301 Epoch: 9 Global Step: 374780 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:50:07,582-Speed 2630.76 samples/sec Loss 7.4613 LearningRate 0.0301 Epoch: 9 Global Step: 374790 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:50:11,482-Speed 2626.78 samples/sec Loss 7.4664 LearningRate 0.0301 Epoch: 9 Global Step: 374800 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:50:15,373-Speed 2632.24 samples/sec Loss 7.4959 LearningRate 0.0301 Epoch: 9 Global Step: 374810 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:50:19,273-Speed 2625.91 samples/sec Loss 7.3600 LearningRate 0.0301 Epoch: 9 Global Step: 374820 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:50:23,179-Speed 2623.14 samples/sec Loss 7.4406 LearningRate 0.0300 Epoch: 9 Global Step: 374830 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:50:27,079-Speed 2625.77 samples/sec Loss 7.4683 LearningRate 0.0300 Epoch: 9 Global Step: 374840 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:50:30,983-Speed 2623.67 samples/sec Loss 7.4364 LearningRate 0.0300 Epoch: 9 Global Step: 374850 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:50:34,848-Speed 2650.33 samples/sec Loss 7.7852 LearningRate 0.0300 Epoch: 9 Global Step: 374860 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:50:38,716-Speed 2648.40 samples/sec Loss 7.8912 LearningRate 0.0300 Epoch: 9 Global Step: 374870 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:50:42,607-Speed 2632.03 samples/sec Loss 7.5438 LearningRate 0.0300 Epoch: 9 Global Step: 374880 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:50:46,502-Speed 2629.93 samples/sec Loss 7.2349 LearningRate 0.0300 Epoch: 9 Global Step: 374890 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:50:50,401-Speed 2626.55 samples/sec Loss 7.5156 LearningRate 0.0300 Epoch: 9 Global Step: 374900 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:50:54,306-Speed 2623.13 samples/sec Loss 7.3620 LearningRate 0.0300 Epoch: 9 Global Step: 374910 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:50:58,201-Speed 2629.81 samples/sec Loss 7.3685 LearningRate 0.0300 Epoch: 9 Global Step: 374920 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:51:02,100-Speed 2626.63 samples/sec Loss 7.4385 LearningRate 0.0300 Epoch: 9 Global Step: 374930 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:51:06,000-Speed 2626.17 samples/sec Loss 7.2715 LearningRate 0.0300 Epoch: 9 Global Step: 374940 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:51:09,899-Speed 2627.47 samples/sec Loss 7.3971 LearningRate 0.0300 Epoch: 9 Global Step: 374950 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:51:13,805-Speed 2622.15 samples/sec Loss 7.3840 LearningRate 0.0300 Epoch: 9 Global Step: 374960 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 13:51:17,717-Speed 2618.01 samples/sec Loss 7.3769 LearningRate 0.0300 Epoch: 9 Global Step: 374970 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:51:21,614-Speed 2628.32 samples/sec Loss 7.5137 LearningRate 0.0300 Epoch: 9 Global Step: 374980 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:51:25,504-Speed 2632.95 samples/sec Loss 7.3920 LearningRate 0.0300 Epoch: 9 Global Step: 374990 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:51:29,399-Speed 2630.02 samples/sec Loss 7.4898 LearningRate 0.0300 Epoch: 9 Global Step: 375000 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:51:33,296-Speed 2628.31 samples/sec Loss 7.5158 LearningRate 0.0300 Epoch: 9 Global Step: 375010 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:51:37,212-Speed 2614.94 samples/sec Loss 7.3054 LearningRate 0.0300 Epoch: 9 Global Step: 375020 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:51:41,104-Speed 2631.42 samples/sec Loss 7.4192 LearningRate 0.0300 Epoch: 9 Global Step: 375030 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:51:44,998-Speed 2630.41 samples/sec Loss 7.5172 LearningRate 0.0300 Epoch: 9 Global Step: 375040 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:51:48,889-Speed 2632.35 samples/sec Loss 7.4724 LearningRate 0.0300 Epoch: 9 Global Step: 375050 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:51:52,781-Speed 2631.96 samples/sec Loss 7.5040 LearningRate 0.0300 Epoch: 9 Global Step: 375060 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:51:56,677-Speed 2628.59 samples/sec Loss 7.4361 LearningRate 0.0300 Epoch: 9 Global Step: 375070 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:52:00,578-Speed 2626.26 samples/sec Loss 7.2609 LearningRate 0.0300 Epoch: 9 Global Step: 375080 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:52:04,501-Speed 2610.65 samples/sec Loss 7.3696 LearningRate 0.0300 Epoch: 9 Global Step: 375090 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:52:08,401-Speed 2626.11 samples/sec Loss 7.4396 LearningRate 0.0300 Epoch: 9 Global Step: 375100 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:52:12,298-Speed 2628.26 samples/sec Loss 7.4146 LearningRate 0.0300 Epoch: 9 Global Step: 375110 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:52:16,320-Speed 2546.86 samples/sec Loss 7.4130 LearningRate 0.0300 Epoch: 9 Global Step: 375120 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:52:20,258-Speed 2600.83 samples/sec Loss 7.4198 LearningRate 0.0300 Epoch: 9 Global Step: 375130 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:52:24,231-Speed 2577.93 samples/sec Loss 7.4761 LearningRate 0.0300 Epoch: 9 Global Step: 375140 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:52:28,180-Speed 2594.12 samples/sec Loss 7.5133 LearningRate 0.0300 Epoch: 9 Global Step: 375150 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:52:32,081-Speed 2625.54 samples/sec Loss 7.4076 LearningRate 0.0300 Epoch: 9 Global Step: 375160 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:52:35,983-Speed 2624.80 samples/sec Loss 7.2928 LearningRate 0.0300 Epoch: 9 Global Step: 375170 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:52:39,877-Speed 2630.27 samples/sec Loss 7.5476 LearningRate 0.0300 Epoch: 9 Global Step: 375180 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:52:43,832-Speed 2589.84 samples/sec Loss 7.4577 LearningRate 0.0300 Epoch: 9 Global Step: 375190 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:52:47,743-Speed 2619.06 samples/sec Loss 7.3979 LearningRate 0.0300 Epoch: 9 Global Step: 375200 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:52:51,783-Speed 2535.75 samples/sec Loss 7.4485 LearningRate 0.0300 Epoch: 9 Global Step: 375210 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:52:55,688-Speed 2622.67 samples/sec Loss 7.4899 LearningRate 0.0300 Epoch: 9 Global Step: 375220 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:52:59,583-Speed 2629.66 samples/sec Loss 7.3261 LearningRate 0.0300 Epoch: 9 Global Step: 375230 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:53:03,471-Speed 2634.43 samples/sec Loss 7.5307 LearningRate 0.0300 Epoch: 9 Global Step: 375240 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:53:07,379-Speed 2621.06 samples/sec Loss 8.2978 LearningRate 0.0300 Epoch: 9 Global Step: 375250 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:53:11,277-Speed 2627.18 samples/sec Loss 7.6954 LearningRate 0.0300 Epoch: 9 Global Step: 375260 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:53:15,181-Speed 2623.79 samples/sec Loss 7.4547 LearningRate 0.0300 Epoch: 9 Global Step: 375270 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:53:19,089-Speed 2621.25 samples/sec Loss 7.5140 LearningRate 0.0300 Epoch: 9 Global Step: 375280 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:53:23,008-Speed 2614.23 samples/sec Loss 7.4181 LearningRate 0.0300 Epoch: 9 Global Step: 375290 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:53:26,907-Speed 2627.25 samples/sec Loss 7.6964 LearningRate 0.0300 Epoch: 9 Global Step: 375300 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:53:30,799-Speed 2631.91 samples/sec Loss 7.4024 LearningRate 0.0300 Epoch: 9 Global Step: 375310 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:53:34,695-Speed 2628.61 samples/sec Loss 7.3789 LearningRate 0.0300 Epoch: 9 Global Step: 375320 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:53:38,701-Speed 2556.87 samples/sec Loss 7.5098 LearningRate 0.0300 Epoch: 9 Global Step: 375330 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:53:42,600-Speed 2626.90 samples/sec Loss 7.2400 LearningRate 0.0300 Epoch: 9 Global Step: 375340 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:53:46,565-Speed 2583.38 samples/sec Loss 7.2552 LearningRate 0.0300 Epoch: 9 Global Step: 375350 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:53:50,462-Speed 2628.32 samples/sec Loss 7.3591 LearningRate 0.0300 Epoch: 9 Global Step: 375360 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:53:54,364-Speed 2625.56 samples/sec Loss 7.4625 LearningRate 0.0300 Epoch: 9 Global Step: 375370 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:53:58,320-Speed 2594.25 samples/sec Loss 7.4608 LearningRate 0.0300 Epoch: 9 Global Step: 375380 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:54:02,265-Speed 2596.49 samples/sec Loss 7.5268 LearningRate 0.0300 Epoch: 9 Global Step: 375390 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:54:06,219-Speed 2590.06 samples/sec Loss 7.3503 LearningRate 0.0300 Epoch: 9 Global Step: 375400 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:54:10,108-Speed 2633.59 samples/sec Loss 7.4337 LearningRate 0.0300 Epoch: 9 Global Step: 375410 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:54:14,011-Speed 2624.53 samples/sec Loss 7.4847 LearningRate 0.0300 Epoch: 9 Global Step: 375420 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:54:17,909-Speed 2627.48 samples/sec Loss 7.4255 LearningRate 0.0300 Epoch: 9 Global Step: 375430 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:54:21,810-Speed 2626.12 samples/sec Loss 7.4620 LearningRate 0.0300 Epoch: 9 Global Step: 375440 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:54:25,700-Speed 2633.68 samples/sec Loss 7.5854 LearningRate 0.0300 Epoch: 9 Global Step: 375450 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:54:29,591-Speed 2631.85 samples/sec Loss 7.4486 LearningRate 0.0300 Epoch: 9 Global Step: 375460 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:54:33,484-Speed 2630.86 samples/sec Loss 7.3912 LearningRate 0.0300 Epoch: 9 Global Step: 375470 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:54:37,384-Speed 2626.73 samples/sec Loss 7.4811 LearningRate 0.0300 Epoch: 9 Global Step: 375480 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:54:41,294-Speed 2619.83 samples/sec Loss 7.4051 LearningRate 0.0300 Epoch: 9 Global Step: 375490 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:54:45,181-Speed 2634.71 samples/sec Loss 7.4440 LearningRate 0.0300 Epoch: 9 Global Step: 375500 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:54:49,072-Speed 2632.54 samples/sec Loss 7.4349 LearningRate 0.0300 Epoch: 9 Global Step: 375510 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:54:52,970-Speed 2627.71 samples/sec Loss 7.4442 LearningRate 0.0300 Epoch: 9 Global Step: 375520 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:54:56,864-Speed 2630.39 samples/sec Loss 7.3803 LearningRate 0.0300 Epoch: 9 Global Step: 375530 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:55:00,763-Speed 2626.93 samples/sec Loss 7.4193 LearningRate 0.0300 Epoch: 9 Global Step: 375540 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:55:04,661-Speed 2627.87 samples/sec Loss 7.4465 LearningRate 0.0300 Epoch: 9 Global Step: 375550 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:55:08,548-Speed 2635.23 samples/sec Loss 7.3614 LearningRate 0.0300 Epoch: 9 Global Step: 375560 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:55:12,443-Speed 2629.83 samples/sec Loss 7.3313 LearningRate 0.0300 Epoch: 9 Global Step: 375570 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:55:16,339-Speed 2629.19 samples/sec Loss 7.3636 LearningRate 0.0299 Epoch: 9 Global Step: 375580 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:55:20,233-Speed 2630.60 samples/sec Loss 7.3976 LearningRate 0.0299 Epoch: 9 Global Step: 375590 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:55:24,125-Speed 2632.07 samples/sec Loss 7.2711 LearningRate 0.0299 Epoch: 9 Global Step: 375600 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:55:28,029-Speed 2622.78 samples/sec Loss 7.2722 LearningRate 0.0299 Epoch: 9 Global Step: 375610 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:55:31,932-Speed 2625.03 samples/sec Loss 7.4160 LearningRate 0.0299 Epoch: 9 Global Step: 375620 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:55:35,823-Speed 2632.25 samples/sec Loss 7.3355 LearningRate 0.0299 Epoch: 9 Global Step: 375630 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:55:39,794-Speed 2579.19 samples/sec Loss 7.3957 LearningRate 0.0299 Epoch: 9 Global Step: 375640 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:55:43,691-Speed 2628.16 samples/sec Loss 7.3504 LearningRate 0.0299 Epoch: 9 Global Step: 375650 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:55:47,620-Speed 2607.24 samples/sec Loss 7.4745 LearningRate 0.0299 Epoch: 9 Global Step: 375660 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:55:51,522-Speed 2624.79 samples/sec Loss 7.3925 LearningRate 0.0299 Epoch: 9 Global Step: 375670 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:55:55,414-Speed 2631.77 samples/sec Loss 7.3391 LearningRate 0.0299 Epoch: 9 Global Step: 375680 Fp16 Grad Scale: 262144 Required: 51 hours
Training: 2022-04-14 13:55:59,290-Speed 2642.43 samples/sec Loss 7.4044 LearningRate 0.0299 Epoch: 9 Global Step: 375690 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:56:03,189-Speed 2627.33 samples/sec Loss 7.2664 LearningRate 0.0299 Epoch: 9 Global Step: 375700 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:56:07,082-Speed 2630.68 samples/sec Loss 7.4132 LearningRate 0.0299 Epoch: 9 Global Step: 375710 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:56:10,977-Speed 2629.63 samples/sec Loss 7.3239 LearningRate 0.0299 Epoch: 9 Global Step: 375720 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:56:14,893-Speed 2615.49 samples/sec Loss 7.3528 LearningRate 0.0299 Epoch: 9 Global Step: 375730 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:56:18,793-Speed 2626.82 samples/sec Loss 7.4307 LearningRate 0.0299 Epoch: 9 Global Step: 375740 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:56:22,783-Speed 2567.05 samples/sec Loss 7.3271 LearningRate 0.0299 Epoch: 9 Global Step: 375750 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:56:26,706-Speed 2610.66 samples/sec Loss 7.3944 LearningRate 0.0299 Epoch: 9 Global Step: 375760 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:56:30,602-Speed 2629.53 samples/sec Loss 7.4068 LearningRate 0.0299 Epoch: 9 Global Step: 375770 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:56:34,533-Speed 2605.41 samples/sec Loss 7.3392 LearningRate 0.0299 Epoch: 9 Global Step: 375780 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:56:38,355-Speed 2680.26 samples/sec Loss 7.9681 LearningRate 0.0299 Epoch: 9 Global Step: 375790 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:56:42,251-Speed 2629.03 samples/sec Loss 7.6799 LearningRate 0.0299 Epoch: 9 Global Step: 375800 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:56:46,226-Speed 2577.26 samples/sec Loss 7.4617 LearningRate 0.0299 Epoch: 9 Global Step: 375810 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:56:50,122-Speed 2629.00 samples/sec Loss 7.3001 LearningRate 0.0299 Epoch: 9 Global Step: 375820 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:56:54,009-Speed 2634.91 samples/sec Loss 7.4034 LearningRate 0.0299 Epoch: 9 Global Step: 375830 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:56:57,905-Speed 2629.04 samples/sec Loss 7.3275 LearningRate 0.0299 Epoch: 9 Global Step: 375840 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:57:01,792-Speed 2635.23 samples/sec Loss 7.3654 LearningRate 0.0299 Epoch: 9 Global Step: 375850 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:57:05,694-Speed 2625.08 samples/sec Loss 7.4654 LearningRate 0.0299 Epoch: 9 Global Step: 375860 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:57:09,589-Speed 2629.28 samples/sec Loss 7.3762 LearningRate 0.0299 Epoch: 9 Global Step: 375870 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:57:13,578-Speed 2567.83 samples/sec Loss 7.4568 LearningRate 0.0299 Epoch: 9 Global Step: 375880 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:57:17,474-Speed 2629.09 samples/sec Loss 7.4342 LearningRate 0.0299 Epoch: 9 Global Step: 375890 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:57:21,371-Speed 2628.48 samples/sec Loss 7.3200 LearningRate 0.0299 Epoch: 9 Global Step: 375900 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:57:25,262-Speed 2631.94 samples/sec Loss 7.2912 LearningRate 0.0299 Epoch: 9 Global Step: 375910 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:57:29,162-Speed 2626.78 samples/sec Loss 7.3902 LearningRate 0.0299 Epoch: 9 Global Step: 375920 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:57:33,067-Speed 2622.83 samples/sec Loss 7.3633 LearningRate 0.0299 Epoch: 9 Global Step: 375930 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:57:36,973-Speed 2621.95 samples/sec Loss 7.4053 LearningRate 0.0299 Epoch: 9 Global Step: 375940 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:57:40,897-Speed 2609.87 samples/sec Loss 7.4586 LearningRate 0.0299 Epoch: 9 Global Step: 375950 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:57:44,801-Speed 2623.87 samples/sec Loss 7.3667 LearningRate 0.0299 Epoch: 9 Global Step: 375960 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:57:48,687-Speed 2635.63 samples/sec Loss 7.3789 LearningRate 0.0299 Epoch: 9 Global Step: 375970 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:57:52,576-Speed 2633.94 samples/sec Loss 7.3592 LearningRate 0.0299 Epoch: 9 Global Step: 375980 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 13:57:56,468-Speed 2631.47 samples/sec Loss 7.4006 LearningRate 0.0299 Epoch: 9 Global Step: 375990 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:58:00,363-Speed 2629.75 samples/sec Loss 7.3969 LearningRate 0.0299 Epoch: 9 Global Step: 376000 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:58:04,266-Speed 2624.15 samples/sec Loss 7.3147 LearningRate 0.0299 Epoch: 9 Global Step: 376010 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:58:08,166-Speed 2626.51 samples/sec Loss 7.2802 LearningRate 0.0299 Epoch: 9 Global Step: 376020 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:58:12,065-Speed 2626.77 samples/sec Loss 7.3714 LearningRate 0.0299 Epoch: 9 Global Step: 376030 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:58:15,988-Speed 2610.91 samples/sec Loss 7.4145 LearningRate 0.0299 Epoch: 9 Global Step: 376040 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:58:19,903-Speed 2615.76 samples/sec Loss 7.3776 LearningRate 0.0299 Epoch: 9 Global Step: 376050 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:58:23,798-Speed 2629.96 samples/sec Loss 7.2837 LearningRate 0.0299 Epoch: 9 Global Step: 376060 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:58:27,691-Speed 2631.41 samples/sec Loss 7.4031 LearningRate 0.0299 Epoch: 9 Global Step: 376070 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:58:31,586-Speed 2629.43 samples/sec Loss 7.2940 LearningRate 0.0299 Epoch: 9 Global Step: 376080 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:58:35,490-Speed 2623.87 samples/sec Loss 7.4186 LearningRate 0.0299 Epoch: 9 Global Step: 376090 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:58:39,380-Speed 2632.47 samples/sec Loss 7.4818 LearningRate 0.0299 Epoch: 9 Global Step: 376100 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:58:43,278-Speed 2627.47 samples/sec Loss 7.3244 LearningRate 0.0299 Epoch: 9 Global Step: 376110 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:58:47,171-Speed 2630.81 samples/sec Loss 7.3808 LearningRate 0.0299 Epoch: 9 Global Step: 376120 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:58:51,066-Speed 2630.10 samples/sec Loss 7.4461 LearningRate 0.0299 Epoch: 9 Global Step: 376130 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:58:54,973-Speed 2621.23 samples/sec Loss 7.1946 LearningRate 0.0299 Epoch: 9 Global Step: 376140 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 13:58:58,856-Speed 2638.34 samples/sec Loss 7.3858 LearningRate 0.0299 Epoch: 9 Global Step: 376150 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:59:02,746-Speed 2632.87 samples/sec Loss 7.2642 LearningRate 0.0299 Epoch: 9 Global Step: 376160 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:59:06,682-Speed 2603.15 samples/sec Loss 7.4184 LearningRate 0.0299 Epoch: 9 Global Step: 376170 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:59:10,578-Speed 2628.73 samples/sec Loss 7.3082 LearningRate 0.0299 Epoch: 9 Global Step: 376180 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:59:14,487-Speed 2619.85 samples/sec Loss 7.3162 LearningRate 0.0299 Epoch: 9 Global Step: 376190 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:59:18,388-Speed 2626.12 samples/sec Loss 7.2678 LearningRate 0.0299 Epoch: 9 Global Step: 376200 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:59:22,279-Speed 2632.67 samples/sec Loss 7.3488 LearningRate 0.0299 Epoch: 9 Global Step: 376210 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:59:26,209-Speed 2606.35 samples/sec Loss 7.4289 LearningRate 0.0299 Epoch: 9 Global Step: 376220 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:59:30,114-Speed 2623.31 samples/sec Loss 7.4149 LearningRate 0.0299 Epoch: 9 Global Step: 376230 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:59:34,015-Speed 2625.49 samples/sec Loss 7.4112 LearningRate 0.0299 Epoch: 9 Global Step: 376240 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 13:59:37,877-Speed 2651.89 samples/sec Loss 7.8352 LearningRate 0.0299 Epoch: 9 Global Step: 376250 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:59:41,774-Speed 2628.47 samples/sec Loss 8.3635 LearningRate 0.0299 Epoch: 9 Global Step: 376260 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:59:45,669-Speed 2629.57 samples/sec Loss 7.7955 LearningRate 0.0299 Epoch: 9 Global Step: 376270 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:59:49,562-Speed 2631.27 samples/sec Loss 7.7840 LearningRate 0.0299 Epoch: 9 Global Step: 376280 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:59:53,455-Speed 2631.08 samples/sec Loss 7.5972 LearningRate 0.0299 Epoch: 9 Global Step: 376290 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 13:59:57,370-Speed 2616.69 samples/sec Loss 7.5820 LearningRate 0.0299 Epoch: 9 Global Step: 376300 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:00:01,306-Speed 2601.66 samples/sec Loss 7.5361 LearningRate 0.0299 Epoch: 9 Global Step: 376310 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:00:05,204-Speed 2628.34 samples/sec Loss 7.5472 LearningRate 0.0299 Epoch: 9 Global Step: 376320 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:00:09,102-Speed 2627.86 samples/sec Loss 7.4975 LearningRate 0.0299 Epoch: 9 Global Step: 376330 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:00:12,994-Speed 2631.37 samples/sec Loss 7.4827 LearningRate 0.0298 Epoch: 9 Global Step: 376340 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:00:16,889-Speed 2629.36 samples/sec Loss 7.4756 LearningRate 0.0298 Epoch: 9 Global Step: 376350 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:00:20,781-Speed 2632.27 samples/sec Loss 7.4312 LearningRate 0.0298 Epoch: 9 Global Step: 376360 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:00:24,666-Speed 2636.36 samples/sec Loss 7.4398 LearningRate 0.0298 Epoch: 9 Global Step: 376370 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:00:28,588-Speed 2611.16 samples/sec Loss 7.3882 LearningRate 0.0298 Epoch: 9 Global Step: 376380 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:00:32,492-Speed 2624.39 samples/sec Loss 7.4173 LearningRate 0.0298 Epoch: 9 Global Step: 376390 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:00:36,431-Speed 2600.50 samples/sec Loss 7.4765 LearningRate 0.0298 Epoch: 9 Global Step: 376400 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:00:40,333-Speed 2624.58 samples/sec Loss 7.3681 LearningRate 0.0298 Epoch: 9 Global Step: 376410 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:00:44,222-Speed 2633.81 samples/sec Loss 7.3915 LearningRate 0.0298 Epoch: 9 Global Step: 376420 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:00:48,116-Speed 2630.55 samples/sec Loss 7.2647 LearningRate 0.0298 Epoch: 9 Global Step: 376430 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:00:52,015-Speed 2626.73 samples/sec Loss 7.4339 LearningRate 0.0298 Epoch: 9 Global Step: 376440 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:00:55,906-Speed 2632.54 samples/sec Loss 7.3257 LearningRate 0.0298 Epoch: 9 Global Step: 376450 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:00:59,801-Speed 2629.60 samples/sec Loss 7.4363 LearningRate 0.0298 Epoch: 9 Global Step: 376460 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:01:03,692-Speed 2632.08 samples/sec Loss 7.4733 LearningRate 0.0298 Epoch: 9 Global Step: 376470 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:01:07,585-Speed 2631.08 samples/sec Loss 7.3083 LearningRate 0.0298 Epoch: 9 Global Step: 376480 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:01:11,500-Speed 2616.28 samples/sec Loss 7.5941 LearningRate 0.0298 Epoch: 9 Global Step: 376490 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:01:15,404-Speed 2624.01 samples/sec Loss 7.5477 LearningRate 0.0298 Epoch: 9 Global Step: 376500 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:01:19,296-Speed 2631.96 samples/sec Loss 7.4694 LearningRate 0.0298 Epoch: 9 Global Step: 376510 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:01:23,219-Speed 2611.24 samples/sec Loss 7.4581 LearningRate 0.0298 Epoch: 9 Global Step: 376520 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:01:27,114-Speed 2629.64 samples/sec Loss 7.3115 LearningRate 0.0298 Epoch: 9 Global Step: 376530 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:01:31,026-Speed 2619.21 samples/sec Loss 7.3020 LearningRate 0.0298 Epoch: 9 Global Step: 376540 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:01:34,931-Speed 2623.41 samples/sec Loss 7.4560 LearningRate 0.0298 Epoch: 9 Global Step: 376550 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:01:38,821-Speed 2632.40 samples/sec Loss 7.3052 LearningRate 0.0298 Epoch: 9 Global Step: 376560 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:01:42,719-Speed 2627.49 samples/sec Loss 7.4956 LearningRate 0.0298 Epoch: 9 Global Step: 376570 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:01:46,624-Speed 2623.31 samples/sec Loss 7.4297 LearningRate 0.0298 Epoch: 9 Global Step: 376580 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:01:50,630-Speed 2556.93 samples/sec Loss 7.3897 LearningRate 0.0298 Epoch: 9 Global Step: 376590 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:01:54,550-Speed 2613.02 samples/sec Loss 7.3959 LearningRate 0.0298 Epoch: 9 Global Step: 376600 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:01:58,456-Speed 2621.70 samples/sec Loss 7.5520 LearningRate 0.0298 Epoch: 9 Global Step: 376610 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:02:02,394-Speed 2601.24 samples/sec Loss 7.4952 LearningRate 0.0298 Epoch: 9 Global Step: 376620 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:02:06,293-Speed 2627.29 samples/sec Loss 7.3676 LearningRate 0.0298 Epoch: 9 Global Step: 376630 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:02:10,180-Speed 2635.18 samples/sec Loss 7.5975 LearningRate 0.0298 Epoch: 9 Global Step: 376640 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:02:14,108-Speed 2607.32 samples/sec Loss 7.3368 LearningRate 0.0298 Epoch: 9 Global Step: 376650 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:02:18,000-Speed 2631.93 samples/sec Loss 7.3609 LearningRate 0.0298 Epoch: 9 Global Step: 376660 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:02:21,918-Speed 2614.50 samples/sec Loss 7.3481 LearningRate 0.0298 Epoch: 9 Global Step: 376670 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:02:25,831-Speed 2617.29 samples/sec Loss 7.3188 LearningRate 0.0298 Epoch: 9 Global Step: 376680 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:02:29,749-Speed 2614.65 samples/sec Loss 7.3461 LearningRate 0.0298 Epoch: 9 Global Step: 376690 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:02:33,668-Speed 2613.94 samples/sec Loss 7.4050 LearningRate 0.0298 Epoch: 9 Global Step: 376700 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:02:37,606-Speed 2600.46 samples/sec Loss 7.4625 LearningRate 0.0298 Epoch: 9 Global Step: 376710 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:02:41,474-Speed 2647.92 samples/sec Loss 7.6585 LearningRate 0.0298 Epoch: 9 Global Step: 376720 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:02:45,359-Speed 2636.69 samples/sec Loss 7.3578 LearningRate 0.0298 Epoch: 9 Global Step: 376730 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:02:49,254-Speed 2629.46 samples/sec Loss 7.3370 LearningRate 0.0298 Epoch: 9 Global Step: 376740 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:02:53,160-Speed 2622.68 samples/sec Loss 7.4528 LearningRate 0.0298 Epoch: 9 Global Step: 376750 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:02:57,054-Speed 2630.79 samples/sec Loss 7.3462 LearningRate 0.0298 Epoch: 9 Global Step: 376760 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:03:00,954-Speed 2626.90 samples/sec Loss 7.3051 LearningRate 0.0298 Epoch: 9 Global Step: 376770 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:03:04,867-Speed 2617.50 samples/sec Loss 7.3537 LearningRate 0.0298 Epoch: 9 Global Step: 376780 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:03:08,773-Speed 2622.47 samples/sec Loss 7.3508 LearningRate 0.0298 Epoch: 9 Global Step: 376790 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:03:12,672-Speed 2626.97 samples/sec Loss 7.3810 LearningRate 0.0298 Epoch: 9 Global Step: 376800 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:03:16,569-Speed 2627.92 samples/sec Loss 7.2964 LearningRate 0.0298 Epoch: 9 Global Step: 376810 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:03:20,478-Speed 2620.21 samples/sec Loss 7.3212 LearningRate 0.0298 Epoch: 9 Global Step: 376820 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:03:24,377-Speed 2627.45 samples/sec Loss 7.4124 LearningRate 0.0298 Epoch: 9 Global Step: 376830 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:03:28,273-Speed 2628.83 samples/sec Loss 7.3339 LearningRate 0.0298 Epoch: 9 Global Step: 376840 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:03:32,165-Speed 2631.81 samples/sec Loss 7.2761 LearningRate 0.0298 Epoch: 9 Global Step: 376850 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:03:36,091-Speed 2609.10 samples/sec Loss 7.3971 LearningRate 0.0298 Epoch: 9 Global Step: 376860 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:03:39,993-Speed 2624.66 samples/sec Loss 7.3341 LearningRate 0.0298 Epoch: 9 Global Step: 376870 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:03:43,894-Speed 2625.87 samples/sec Loss 7.4604 LearningRate 0.0298 Epoch: 9 Global Step: 376880 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:03:47,762-Speed 2648.21 samples/sec Loss 7.3587 LearningRate 0.0298 Epoch: 9 Global Step: 376890 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:03:51,703-Speed 2599.45 samples/sec Loss 7.8634 LearningRate 0.0298 Epoch: 9 Global Step: 376900 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:03:55,598-Speed 2629.65 samples/sec Loss 7.3092 LearningRate 0.0298 Epoch: 9 Global Step: 376910 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:03:59,527-Speed 2606.74 samples/sec Loss 7.2918 LearningRate 0.0298 Epoch: 9 Global Step: 376920 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:04:03,420-Speed 2631.92 samples/sec Loss 7.2859 LearningRate 0.0298 Epoch: 9 Global Step: 376930 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:04:07,329-Speed 2620.13 samples/sec Loss 7.2254 LearningRate 0.0298 Epoch: 9 Global Step: 376940 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:04:11,217-Speed 2634.19 samples/sec Loss 7.3720 LearningRate 0.0298 Epoch: 9 Global Step: 376950 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:04:15,107-Speed 2633.66 samples/sec Loss 7.3477 LearningRate 0.0298 Epoch: 9 Global Step: 376960 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:04:19,022-Speed 2615.79 samples/sec Loss 7.3497 LearningRate 0.0298 Epoch: 9 Global Step: 376970 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:04:22,948-Speed 2609.41 samples/sec Loss 7.3865 LearningRate 0.0298 Epoch: 9 Global Step: 376980 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:04:26,830-Speed 2638.48 samples/sec Loss 7.2133 LearningRate 0.0298 Epoch: 9 Global Step: 376990 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:04:30,727-Speed 2628.51 samples/sec Loss 7.3106 LearningRate 0.0298 Epoch: 9 Global Step: 377000 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:04:34,624-Speed 2628.68 samples/sec Loss 7.2941 LearningRate 0.0298 Epoch: 9 Global Step: 377010 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:04:38,526-Speed 2625.13 samples/sec Loss 7.4768 LearningRate 0.0298 Epoch: 9 Global Step: 377020 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:04:42,414-Speed 2634.18 samples/sec Loss 7.3756 LearningRate 0.0298 Epoch: 9 Global Step: 377030 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:04:46,309-Speed 2629.72 samples/sec Loss 7.3464 LearningRate 0.0298 Epoch: 9 Global Step: 377040 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:04:50,224-Speed 2616.49 samples/sec Loss 7.5002 LearningRate 0.0298 Epoch: 9 Global Step: 377050 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:04:54,118-Speed 2630.56 samples/sec Loss 7.9728 LearningRate 0.0298 Epoch: 9 Global Step: 377060 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:04:58,041-Speed 2610.75 samples/sec Loss 7.6400 LearningRate 0.0298 Epoch: 9 Global Step: 377070 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:05:01,936-Speed 2630.04 samples/sec Loss 7.3517 LearningRate 0.0298 Epoch: 9 Global Step: 377080 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:05:05,923-Speed 2569.36 samples/sec Loss 7.2481 LearningRate 0.0298 Epoch: 9 Global Step: 377090 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:05:09,818-Speed 2629.65 samples/sec Loss 7.4278 LearningRate 0.0297 Epoch: 9 Global Step: 377100 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:05:13,717-Speed 2627.03 samples/sec Loss 7.6427 LearningRate 0.0297 Epoch: 9 Global Step: 377110 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:05:17,607-Speed 2632.73 samples/sec Loss 7.6178 LearningRate 0.0297 Epoch: 9 Global Step: 377120 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:05:21,495-Speed 2634.78 samples/sec Loss 7.2052 LearningRate 0.0297 Epoch: 9 Global Step: 377130 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:05:25,396-Speed 2625.68 samples/sec Loss 7.2866 LearningRate 0.0297 Epoch: 9 Global Step: 377140 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:05:29,299-Speed 2624.67 samples/sec Loss 7.2974 LearningRate 0.0297 Epoch: 9 Global Step: 377150 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:05:33,189-Speed 2632.92 samples/sec Loss 7.4049 LearningRate 0.0297 Epoch: 9 Global Step: 377160 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:05:37,084-Speed 2629.65 samples/sec Loss 7.3452 LearningRate 0.0297 Epoch: 9 Global Step: 377170 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:05:40,971-Speed 2635.06 samples/sec Loss 7.4369 LearningRate 0.0297 Epoch: 9 Global Step: 377180 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:05:44,862-Speed 2632.61 samples/sec Loss 7.2413 LearningRate 0.0297 Epoch: 9 Global Step: 377190 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:05:48,758-Speed 2628.37 samples/sec Loss 7.2256 LearningRate 0.0297 Epoch: 9 Global Step: 377200 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:05:52,668-Speed 2620.57 samples/sec Loss 7.3741 LearningRate 0.0297 Epoch: 9 Global Step: 377210 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:05:56,561-Speed 2631.00 samples/sec Loss 7.3053 LearningRate 0.0297 Epoch: 9 Global Step: 377220 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:06:00,465-Speed 2623.61 samples/sec Loss 7.3413 LearningRate 0.0297 Epoch: 9 Global Step: 377230 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:06:04,359-Speed 2630.48 samples/sec Loss 7.3174 LearningRate 0.0297 Epoch: 9 Global Step: 377240 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:06:08,277-Speed 2614.38 samples/sec Loss 7.4381 LearningRate 0.0297 Epoch: 9 Global Step: 377250 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:06:12,184-Speed 2621.25 samples/sec Loss 7.3257 LearningRate 0.0297 Epoch: 9 Global Step: 377260 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:06:16,077-Speed 2630.80 samples/sec Loss 7.3738 LearningRate 0.0297 Epoch: 9 Global Step: 377270 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:06:19,969-Speed 2631.88 samples/sec Loss 7.3518 LearningRate 0.0297 Epoch: 9 Global Step: 377280 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:06:23,878-Speed 2620.16 samples/sec Loss 7.3755 LearningRate 0.0297 Epoch: 9 Global Step: 377290 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:06:27,779-Speed 2625.84 samples/sec Loss 7.4162 LearningRate 0.0297 Epoch: 9 Global Step: 377300 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:06:31,670-Speed 2632.79 samples/sec Loss 7.3035 LearningRate 0.0297 Epoch: 9 Global Step: 377310 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:06:35,581-Speed 2618.30 samples/sec Loss 7.2902 LearningRate 0.0297 Epoch: 9 Global Step: 377320 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:06:39,477-Speed 2629.18 samples/sec Loss 7.3405 LearningRate 0.0297 Epoch: 9 Global Step: 377330 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:06:43,366-Speed 2633.86 samples/sec Loss 7.4945 LearningRate 0.0297 Epoch: 9 Global Step: 377340 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:06:47,263-Speed 2628.74 samples/sec Loss 7.3650 LearningRate 0.0297 Epoch: 9 Global Step: 377350 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:06:51,151-Speed 2633.94 samples/sec Loss 7.4149 LearningRate 0.0297 Epoch: 9 Global Step: 377360 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:06:55,050-Speed 2627.16 samples/sec Loss 7.4003 LearningRate 0.0297 Epoch: 9 Global Step: 377370 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:06:58,943-Speed 2630.52 samples/sec Loss 7.4404 LearningRate 0.0297 Epoch: 9 Global Step: 377380 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:07:02,844-Speed 2625.58 samples/sec Loss 7.3894 LearningRate 0.0297 Epoch: 9 Global Step: 377390 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:07:06,749-Speed 2622.60 samples/sec Loss 7.2295 LearningRate 0.0297 Epoch: 9 Global Step: 377400 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:07:10,649-Speed 2627.19 samples/sec Loss 7.2705 LearningRate 0.0297 Epoch: 9 Global Step: 377410 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:07:14,542-Speed 2630.38 samples/sec Loss 7.4299 LearningRate 0.0297 Epoch: 9 Global Step: 377420 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:07:18,451-Speed 2620.58 samples/sec Loss 7.2976 LearningRate 0.0297 Epoch: 9 Global Step: 377430 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:07:22,336-Speed 2636.81 samples/sec Loss 7.3742 LearningRate 0.0297 Epoch: 9 Global Step: 377440 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:07:26,221-Speed 2636.21 samples/sec Loss 7.4258 LearningRate 0.0297 Epoch: 9 Global Step: 377450 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:07:30,134-Speed 2617.68 samples/sec Loss 7.4454 LearningRate 0.0297 Epoch: 9 Global Step: 377460 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:07:34,024-Speed 2633.55 samples/sec Loss 7.4048 LearningRate 0.0297 Epoch: 9 Global Step: 377470 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:07:37,912-Speed 2634.10 samples/sec Loss 7.2095 LearningRate 0.0297 Epoch: 9 Global Step: 377480 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:07:41,789-Speed 2641.84 samples/sec Loss 7.4230 LearningRate 0.0297 Epoch: 9 Global Step: 377490 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:07:45,680-Speed 2632.47 samples/sec Loss 7.5291 LearningRate 0.0297 Epoch: 9 Global Step: 377500 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:07:49,577-Speed 2627.96 samples/sec Loss 7.6707 LearningRate 0.0297 Epoch: 9 Global Step: 377510 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:07:53,470-Speed 2631.73 samples/sec Loss 7.7505 LearningRate 0.0297 Epoch: 9 Global Step: 377520 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:07:57,364-Speed 2629.77 samples/sec Loss 7.5680 LearningRate 0.0297 Epoch: 9 Global Step: 377530 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:08:01,280-Speed 2616.24 samples/sec Loss 7.2773 LearningRate 0.0297 Epoch: 9 Global Step: 377540 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:08:05,296-Speed 2550.06 samples/sec Loss 7.2076 LearningRate 0.0297 Epoch: 9 Global Step: 377550 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:08:09,191-Speed 2629.49 samples/sec Loss 7.3651 LearningRate 0.0297 Epoch: 9 Global Step: 377560 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:08:13,095-Speed 2623.76 samples/sec Loss 7.3520 LearningRate 0.0297 Epoch: 9 Global Step: 377570 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:08:16,988-Speed 2631.11 samples/sec Loss 7.2468 LearningRate 0.0297 Epoch: 9 Global Step: 377580 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:08:20,890-Speed 2624.97 samples/sec Loss 7.2567 LearningRate 0.0297 Epoch: 9 Global Step: 377590 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:08:24,786-Speed 2629.16 samples/sec Loss 7.4646 LearningRate 0.0297 Epoch: 9 Global Step: 377600 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:08:28,694-Speed 2621.14 samples/sec Loss 7.3508 LearningRate 0.0297 Epoch: 9 Global Step: 377610 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:08:32,613-Speed 2613.56 samples/sec Loss 7.4170 LearningRate 0.0297 Epoch: 9 Global Step: 377620 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:08:36,514-Speed 2625.57 samples/sec Loss 7.3656 LearningRate 0.0297 Epoch: 9 Global Step: 377630 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:08:40,417-Speed 2624.24 samples/sec Loss 7.3626 LearningRate 0.0297 Epoch: 9 Global Step: 377640 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:08:44,309-Speed 2631.51 samples/sec Loss 7.2248 LearningRate 0.0297 Epoch: 9 Global Step: 377650 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:08:48,204-Speed 2629.36 samples/sec Loss 7.2787 LearningRate 0.0297 Epoch: 9 Global Step: 377660 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:08:52,134-Speed 2606.84 samples/sec Loss 7.3752 LearningRate 0.0297 Epoch: 9 Global Step: 377670 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:08:56,095-Speed 2585.70 samples/sec Loss 7.4495 LearningRate 0.0297 Epoch: 9 Global Step: 377680 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:08:59,995-Speed 2626.93 samples/sec Loss 7.3285 LearningRate 0.0297 Epoch: 9 Global Step: 377690 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:09:03,882-Speed 2635.30 samples/sec Loss 7.3271 LearningRate 0.0297 Epoch: 9 Global Step: 377700 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:09:07,775-Speed 2630.79 samples/sec Loss 7.3233 LearningRate 0.0297 Epoch: 9 Global Step: 377710 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:09:11,677-Speed 2625.89 samples/sec Loss 7.1908 LearningRate 0.0297 Epoch: 9 Global Step: 377720 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:09:15,567-Speed 2633.56 samples/sec Loss 7.4386 LearningRate 0.0297 Epoch: 9 Global Step: 377730 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:09:19,480-Speed 2617.02 samples/sec Loss 7.3361 LearningRate 0.0297 Epoch: 9 Global Step: 377740 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:09:23,377-Speed 2629.17 samples/sec Loss 7.3033 LearningRate 0.0297 Epoch: 9 Global Step: 377750 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:09:27,270-Speed 2630.78 samples/sec Loss 7.3314 LearningRate 0.0297 Epoch: 9 Global Step: 377760 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:09:31,172-Speed 2625.24 samples/sec Loss 7.5588 LearningRate 0.0297 Epoch: 9 Global Step: 377770 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:09:35,072-Speed 2625.83 samples/sec Loss 7.4283 LearningRate 0.0297 Epoch: 9 Global Step: 377780 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:09:38,960-Speed 2634.48 samples/sec Loss 7.3657 LearningRate 0.0297 Epoch: 9 Global Step: 377790 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:09:42,853-Speed 2630.77 samples/sec Loss 7.3020 LearningRate 0.0297 Epoch: 9 Global Step: 377800 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:09:46,771-Speed 2614.79 samples/sec Loss 7.1737 LearningRate 0.0297 Epoch: 9 Global Step: 377810 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:09:50,676-Speed 2622.09 samples/sec Loss 7.4428 LearningRate 0.0297 Epoch: 9 Global Step: 377820 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:09:54,597-Speed 2612.82 samples/sec Loss 7.3589 LearningRate 0.0297 Epoch: 9 Global Step: 377830 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:09:58,487-Speed 2632.71 samples/sec Loss 7.5153 LearningRate 0.0297 Epoch: 9 Global Step: 377840 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:10:02,393-Speed 2622.68 samples/sec Loss 7.2557 LearningRate 0.0297 Epoch: 9 Global Step: 377850 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:10:06,287-Speed 2630.22 samples/sec Loss 7.3333 LearningRate 0.0296 Epoch: 9 Global Step: 377860 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:10:10,183-Speed 2628.63 samples/sec Loss 7.4009 LearningRate 0.0296 Epoch: 9 Global Step: 377870 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:10:14,098-Speed 2616.79 samples/sec Loss 7.2085 LearningRate 0.0296 Epoch: 9 Global Step: 377880 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:10:18,000-Speed 2625.16 samples/sec Loss 7.2674 LearningRate 0.0296 Epoch: 9 Global Step: 377890 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:10:21,906-Speed 2621.90 samples/sec Loss 7.4413 LearningRate 0.0296 Epoch: 9 Global Step: 377900 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:10:25,802-Speed 2629.02 samples/sec Loss 7.2928 LearningRate 0.0296 Epoch: 9 Global Step: 377910 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:10:29,744-Speed 2598.39 samples/sec Loss 7.2974 LearningRate 0.0296 Epoch: 9 Global Step: 377920 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:10:33,631-Speed 2635.37 samples/sec Loss 7.3949 LearningRate 0.0296 Epoch: 9 Global Step: 377930 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:10:37,533-Speed 2625.37 samples/sec Loss 7.3384 LearningRate 0.0296 Epoch: 9 Global Step: 377940 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:10:41,426-Speed 2630.78 samples/sec Loss 7.3729 LearningRate 0.0296 Epoch: 9 Global Step: 377950 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:10:45,321-Speed 2629.20 samples/sec Loss 7.4048 LearningRate 0.0296 Epoch: 9 Global Step: 377960 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:10:49,205-Speed 2637.56 samples/sec Loss 7.4001 LearningRate 0.0296 Epoch: 9 Global Step: 377970 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:10:53,045-Speed 2667.50 samples/sec Loss 7.8667 LearningRate 0.0296 Epoch: 9 Global Step: 377980 Fp16 Grad Scale: 4096 Required: 51 hours
Training: 2022-04-14 14:10:56,945-Speed 2626.52 samples/sec Loss 7.4698 LearningRate 0.0296 Epoch: 9 Global Step: 377990 Fp16 Grad Scale: 4096 Required: 51 hours
Training: 2022-04-14 14:11:00,840-Speed 2629.63 samples/sec Loss 7.4129 LearningRate 0.0296 Epoch: 9 Global Step: 378000 Fp16 Grad Scale: 4096 Required: 51 hours
Training: 2022-04-14 14:11:04,919-Speed 2510.57 samples/sec Loss 7.3888 LearningRate 0.0296 Epoch: 9 Global Step: 378010 Fp16 Grad Scale: 4096 Required: 51 hours
Training: 2022-04-14 14:11:08,823-Speed 2623.81 samples/sec Loss 7.4232 LearningRate 0.0296 Epoch: 9 Global Step: 378020 Fp16 Grad Scale: 4096 Required: 51 hours
Training: 2022-04-14 14:11:12,708-Speed 2636.89 samples/sec Loss 7.2837 LearningRate 0.0296 Epoch: 9 Global Step: 378030 Fp16 Grad Scale: 4096 Required: 51 hours
Training: 2022-04-14 14:11:16,616-Speed 2620.66 samples/sec Loss 7.3897 LearningRate 0.0296 Epoch: 9 Global Step: 378040 Fp16 Grad Scale: 4096 Required: 51 hours
Training: 2022-04-14 14:11:20,501-Speed 2636.33 samples/sec Loss 7.3698 LearningRate 0.0296 Epoch: 9 Global Step: 378050 Fp16 Grad Scale: 4096 Required: 51 hours
Training: 2022-04-14 14:11:24,390-Speed 2633.62 samples/sec Loss 7.3189 LearningRate 0.0296 Epoch: 9 Global Step: 378060 Fp16 Grad Scale: 4096 Required: 51 hours
Training: 2022-04-14 14:11:28,284-Speed 2630.27 samples/sec Loss 7.3617 LearningRate 0.0296 Epoch: 9 Global Step: 378070 Fp16 Grad Scale: 4096 Required: 51 hours
Training: 2022-04-14 14:11:32,211-Speed 2608.69 samples/sec Loss 7.3013 LearningRate 0.0296 Epoch: 9 Global Step: 378080 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:11:36,099-Speed 2634.60 samples/sec Loss 7.3224 LearningRate 0.0296 Epoch: 9 Global Step: 378090 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:11:39,988-Speed 2633.85 samples/sec Loss 7.2286 LearningRate 0.0296 Epoch: 9 Global Step: 378100 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:11:43,914-Speed 2608.76 samples/sec Loss 7.3679 LearningRate 0.0296 Epoch: 9 Global Step: 378110 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:11:47,804-Speed 2632.91 samples/sec Loss 7.3035 LearningRate 0.0296 Epoch: 9 Global Step: 378120 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:11:51,701-Speed 2628.90 samples/sec Loss 7.3412 LearningRate 0.0296 Epoch: 9 Global Step: 378130 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:11:55,596-Speed 2629.18 samples/sec Loss 7.3468 LearningRate 0.0296 Epoch: 9 Global Step: 378140 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:11:59,496-Speed 2626.35 samples/sec Loss 7.2095 LearningRate 0.0296 Epoch: 9 Global Step: 378150 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:12:03,399-Speed 2624.00 samples/sec Loss 7.3317 LearningRate 0.0296 Epoch: 9 Global Step: 378160 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:12:07,288-Speed 2634.05 samples/sec Loss 7.2183 LearningRate 0.0296 Epoch: 9 Global Step: 378170 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:12:11,188-Speed 2626.72 samples/sec Loss 7.4421 LearningRate 0.0296 Epoch: 9 Global Step: 378180 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:12:15,077-Speed 2633.01 samples/sec Loss 7.4401 LearningRate 0.0296 Epoch: 9 Global Step: 378190 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:12:18,964-Speed 2635.23 samples/sec Loss 7.9709 LearningRate 0.0296 Epoch: 9 Global Step: 378200 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:12:22,853-Speed 2633.75 samples/sec Loss 7.6717 LearningRate 0.0296 Epoch: 9 Global Step: 378210 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:12:26,748-Speed 2629.13 samples/sec Loss 7.4790 LearningRate 0.0296 Epoch: 9 Global Step: 378220 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:12:30,665-Speed 2614.86 samples/sec Loss 7.3413 LearningRate 0.0296 Epoch: 9 Global Step: 378230 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:12:34,553-Speed 2633.85 samples/sec Loss 7.3633 LearningRate 0.0296 Epoch: 9 Global Step: 378240 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:12:38,451-Speed 2628.02 samples/sec Loss 7.3969 LearningRate 0.0296 Epoch: 9 Global Step: 378250 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:12:42,351-Speed 2628.58 samples/sec Loss 7.3529 LearningRate 0.0296 Epoch: 9 Global Step: 378260 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:12:46,240-Speed 2634.14 samples/sec Loss 7.3826 LearningRate 0.0296 Epoch: 9 Global Step: 378270 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:12:50,132-Speed 2631.74 samples/sec Loss 7.3215 LearningRate 0.0296 Epoch: 9 Global Step: 378280 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:12:54,021-Speed 2633.51 samples/sec Loss 7.3339 LearningRate 0.0296 Epoch: 9 Global Step: 378290 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:12:57,922-Speed 2626.08 samples/sec Loss 7.5342 LearningRate 0.0296 Epoch: 9 Global Step: 378300 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:13:01,818-Speed 2628.53 samples/sec Loss 7.3599 LearningRate 0.0296 Epoch: 9 Global Step: 378310 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:13:05,705-Speed 2636.53 samples/sec Loss 7.3861 LearningRate 0.0296 Epoch: 9 Global Step: 378320 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:13:09,596-Speed 2631.96 samples/sec Loss 7.2492 LearningRate 0.0296 Epoch: 9 Global Step: 378330 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:13:13,500-Speed 2623.57 samples/sec Loss 7.4284 LearningRate 0.0296 Epoch: 9 Global Step: 378340 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:13:17,391-Speed 2632.74 samples/sec Loss 7.3678 LearningRate 0.0296 Epoch: 9 Global Step: 378350 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:13:21,293-Speed 2624.23 samples/sec Loss 7.4052 LearningRate 0.0296 Epoch: 9 Global Step: 378360 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:13:25,182-Speed 2634.23 samples/sec Loss 7.3345 LearningRate 0.0296 Epoch: 9 Global Step: 378370 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:13:29,101-Speed 2613.81 samples/sec Loss 7.3181 LearningRate 0.0296 Epoch: 9 Global Step: 378380 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:13:33,013-Speed 2617.70 samples/sec Loss 7.3200 LearningRate 0.0296 Epoch: 9 Global Step: 378390 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:13:36,925-Speed 2618.26 samples/sec Loss 7.3175 LearningRate 0.0296 Epoch: 9 Global Step: 378400 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:13:40,833-Speed 2620.71 samples/sec Loss 7.4175 LearningRate 0.0296 Epoch: 9 Global Step: 378410 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:13:44,736-Speed 2624.62 samples/sec Loss 7.3799 LearningRate 0.0296 Epoch: 9 Global Step: 378420 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:13:48,634-Speed 2627.27 samples/sec Loss 7.5076 LearningRate 0.0296 Epoch: 9 Global Step: 378430 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:13:52,520-Speed 2636.07 samples/sec Loss 7.3156 LearningRate 0.0296 Epoch: 9 Global Step: 378440 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:13:56,416-Speed 2628.71 samples/sec Loss 7.3116 LearningRate 0.0296 Epoch: 9 Global Step: 378450 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:14:00,314-Speed 2627.65 samples/sec Loss 7.3336 LearningRate 0.0296 Epoch: 9 Global Step: 378460 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:14:04,211-Speed 2628.81 samples/sec Loss 7.3877 LearningRate 0.0296 Epoch: 9 Global Step: 378470 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:14:08,073-Speed 2651.92 samples/sec Loss 7.3119 LearningRate 0.0296 Epoch: 9 Global Step: 378480 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:14:11,964-Speed 2632.49 samples/sec Loss 7.7223 LearningRate 0.0296 Epoch: 9 Global Step: 378490 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:14:15,850-Speed 2635.13 samples/sec Loss 7.3405 LearningRate 0.0296 Epoch: 9 Global Step: 378500 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:14:19,756-Speed 2622.33 samples/sec Loss 7.4116 LearningRate 0.0296 Epoch: 9 Global Step: 378510 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:14:23,660-Speed 2623.86 samples/sec Loss 7.2997 LearningRate 0.0296 Epoch: 9 Global Step: 378520 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:14:27,548-Speed 2634.38 samples/sec Loss 7.3348 LearningRate 0.0296 Epoch: 9 Global Step: 378530 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:14:31,437-Speed 2633.61 samples/sec Loss 7.4609 LearningRate 0.0296 Epoch: 9 Global Step: 378540 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:14:35,323-Speed 2635.72 samples/sec Loss 7.6258 LearningRate 0.0296 Epoch: 9 Global Step: 378550 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:14:39,214-Speed 2631.89 samples/sec Loss 7.1818 LearningRate 0.0296 Epoch: 9 Global Step: 378560 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:14:43,105-Speed 2633.02 samples/sec Loss 7.4771 LearningRate 0.0296 Epoch: 9 Global Step: 378570 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:14:46,997-Speed 2631.57 samples/sec Loss 7.3508 LearningRate 0.0296 Epoch: 9 Global Step: 378580 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:14:50,888-Speed 2632.11 samples/sec Loss 7.3324 LearningRate 0.0296 Epoch: 9 Global Step: 378590 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:14:54,790-Speed 2625.02 samples/sec Loss 7.3093 LearningRate 0.0296 Epoch: 9 Global Step: 378600 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:14:58,729-Speed 2600.45 samples/sec Loss 7.2943 LearningRate 0.0296 Epoch: 9 Global Step: 378610 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:15:02,631-Speed 2625.06 samples/sec Loss 7.3121 LearningRate 0.0296 Epoch: 9 Global Step: 378620 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:15:06,528-Speed 2627.92 samples/sec Loss 7.3479 LearningRate 0.0295 Epoch: 9 Global Step: 378630 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:15:10,426-Speed 2627.11 samples/sec Loss 7.2327 LearningRate 0.0295 Epoch: 9 Global Step: 378640 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:15:14,325-Speed 2627.90 samples/sec Loss 7.4225 LearningRate 0.0295 Epoch: 9 Global Step: 378650 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:15:18,212-Speed 2635.21 samples/sec Loss 7.3846 LearningRate 0.0295 Epoch: 9 Global Step: 378660 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:15:22,102-Speed 2632.77 samples/sec Loss 7.4320 LearningRate 0.0295 Epoch: 9 Global Step: 378670 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:15:26,109-Speed 2556.34 samples/sec Loss 7.4027 LearningRate 0.0295 Epoch: 9 Global Step: 378680 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:15:29,999-Speed 2632.53 samples/sec Loss 7.3797 LearningRate 0.0295 Epoch: 9 Global Step: 378690 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:15:33,905-Speed 2622.42 samples/sec Loss 7.3879 LearningRate 0.0295 Epoch: 9 Global Step: 378700 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:15:37,807-Speed 2624.49 samples/sec Loss 7.4056 LearningRate 0.0295 Epoch: 9 Global Step: 378710 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:15:41,700-Speed 2631.21 samples/sec Loss 7.4070 LearningRate 0.0295 Epoch: 9 Global Step: 378720 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:15:45,609-Speed 2620.19 samples/sec Loss 7.3355 LearningRate 0.0295 Epoch: 9 Global Step: 378730 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:15:49,505-Speed 2628.88 samples/sec Loss 7.4385 LearningRate 0.0295 Epoch: 9 Global Step: 378740 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:15:53,392-Speed 2635.47 samples/sec Loss 7.3948 LearningRate 0.0295 Epoch: 9 Global Step: 378750 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:15:57,286-Speed 2630.61 samples/sec Loss 7.3047 LearningRate 0.0295 Epoch: 9 Global Step: 378760 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:16:01,174-Speed 2634.10 samples/sec Loss 7.2092 LearningRate 0.0295 Epoch: 9 Global Step: 378770 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:16:05,057-Speed 2637.87 samples/sec Loss 7.5552 LearningRate 0.0295 Epoch: 9 Global Step: 378780 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:16:08,948-Speed 2631.73 samples/sec Loss 7.7476 LearningRate 0.0295 Epoch: 9 Global Step: 378790 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:16:12,839-Speed 2632.74 samples/sec Loss 7.6696 LearningRate 0.0295 Epoch: 9 Global Step: 378800 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:16:16,740-Speed 2625.53 samples/sec Loss 7.5463 LearningRate 0.0295 Epoch: 9 Global Step: 378810 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:16:20,647-Speed 2621.61 samples/sec Loss 7.6293 LearningRate 0.0295 Epoch: 9 Global Step: 378820 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:16:24,565-Speed 2613.99 samples/sec Loss 7.2922 LearningRate 0.0295 Epoch: 9 Global Step: 378830 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:16:28,450-Speed 2636.31 samples/sec Loss 7.2360 LearningRate 0.0295 Epoch: 9 Global Step: 378840 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:16:32,344-Speed 2630.85 samples/sec Loss 7.4284 LearningRate 0.0295 Epoch: 9 Global Step: 378850 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:16:36,236-Speed 2631.83 samples/sec Loss 7.3268 LearningRate 0.0295 Epoch: 9 Global Step: 378860 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:16:40,126-Speed 2632.50 samples/sec Loss 7.1463 LearningRate 0.0295 Epoch: 9 Global Step: 378870 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:16:44,031-Speed 2622.65 samples/sec Loss 7.3460 LearningRate 0.0295 Epoch: 9 Global Step: 378880 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:16:47,926-Speed 2629.94 samples/sec Loss 7.3296 LearningRate 0.0295 Epoch: 9 Global Step: 378890 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:16:51,830-Speed 2623.68 samples/sec Loss 7.2379 LearningRate 0.0295 Epoch: 9 Global Step: 378900 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:16:55,735-Speed 2623.05 samples/sec Loss 7.3364 LearningRate 0.0295 Epoch: 9 Global Step: 378910 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:16:59,692-Speed 2588.38 samples/sec Loss 7.2631 LearningRate 0.0295 Epoch: 9 Global Step: 378920 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:17:03,617-Speed 2609.76 samples/sec Loss 7.3312 LearningRate 0.0295 Epoch: 9 Global Step: 378930 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:17:07,510-Speed 2631.37 samples/sec Loss 7.3898 LearningRate 0.0295 Epoch: 9 Global Step: 378940 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:17:11,408-Speed 2627.31 samples/sec Loss 7.3329 LearningRate 0.0295 Epoch: 9 Global Step: 378950 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:17:15,304-Speed 2629.02 samples/sec Loss 7.4041 LearningRate 0.0295 Epoch: 9 Global Step: 378960 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:17:19,209-Speed 2623.13 samples/sec Loss 7.4061 LearningRate 0.0295 Epoch: 9 Global Step: 378970 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:17:23,109-Speed 2626.38 samples/sec Loss 7.3334 LearningRate 0.0295 Epoch: 9 Global Step: 378980 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:17:26,998-Speed 2633.54 samples/sec Loss 7.2432 LearningRate 0.0295 Epoch: 9 Global Step: 378990 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:17:30,895-Speed 2628.92 samples/sec Loss 7.3250 LearningRate 0.0295 Epoch: 9 Global Step: 379000 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:17:34,783-Speed 2633.64 samples/sec Loss 7.4265 LearningRate 0.0295 Epoch: 9 Global Step: 379010 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:17:38,679-Speed 2629.30 samples/sec Loss 7.3218 LearningRate 0.0295 Epoch: 9 Global Step: 379020 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:17:42,583-Speed 2623.02 samples/sec Loss 7.3826 LearningRate 0.0295 Epoch: 9 Global Step: 379030 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:17:46,486-Speed 2624.99 samples/sec Loss 7.3485 LearningRate 0.0295 Epoch: 9 Global Step: 379040 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:17:50,382-Speed 2628.64 samples/sec Loss 7.4646 LearningRate 0.0295 Epoch: 9 Global Step: 379050 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:17:54,275-Speed 2631.51 samples/sec Loss 7.2657 LearningRate 0.0295 Epoch: 9 Global Step: 379060 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:17:58,170-Speed 2629.71 samples/sec Loss 7.3014 LearningRate 0.0295 Epoch: 9 Global Step: 379070 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:18:02,076-Speed 2622.26 samples/sec Loss 7.3463 LearningRate 0.0295 Epoch: 9 Global Step: 379080 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:18:06,037-Speed 2585.97 samples/sec Loss 7.3233 LearningRate 0.0295 Epoch: 9 Global Step: 379090 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:18:09,931-Speed 2629.96 samples/sec Loss 7.3382 LearningRate 0.0295 Epoch: 9 Global Step: 379100 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:18:13,817-Speed 2635.73 samples/sec Loss 7.3009 LearningRate 0.0295 Epoch: 9 Global Step: 379110 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:18:17,710-Speed 2631.42 samples/sec Loss 7.3587 LearningRate 0.0295 Epoch: 9 Global Step: 379120 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:18:21,614-Speed 2623.07 samples/sec Loss 7.3685 LearningRate 0.0295 Epoch: 9 Global Step: 379130 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:18:25,505-Speed 2632.43 samples/sec Loss 7.3077 LearningRate 0.0295 Epoch: 9 Global Step: 379140 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:18:29,356-Speed 2659.86 samples/sec Loss 7.4701 LearningRate 0.0295 Epoch: 9 Global Step: 379150 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:18:33,251-Speed 2629.84 samples/sec Loss 7.6581 LearningRate 0.0295 Epoch: 9 Global Step: 379160 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:18:37,137-Speed 2635.32 samples/sec Loss 7.4106 LearningRate 0.0295 Epoch: 9 Global Step: 379170 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:18:41,030-Speed 2631.02 samples/sec Loss 7.3255 LearningRate 0.0295 Epoch: 9 Global Step: 379180 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:18:44,936-Speed 2622.71 samples/sec Loss 7.3531 LearningRate 0.0295 Epoch: 9 Global Step: 379190 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:18:48,826-Speed 2632.58 samples/sec Loss 7.3285 LearningRate 0.0295 Epoch: 9 Global Step: 379200 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:18:52,720-Speed 2630.83 samples/sec Loss 7.2447 LearningRate 0.0295 Epoch: 9 Global Step: 379210 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:18:56,610-Speed 2633.07 samples/sec Loss 7.3521 LearningRate 0.0295 Epoch: 9 Global Step: 379220 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:19:00,509-Speed 2626.77 samples/sec Loss 7.2810 LearningRate 0.0295 Epoch: 9 Global Step: 379230 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:19:04,403-Speed 2629.96 samples/sec Loss 7.3845 LearningRate 0.0295 Epoch: 9 Global Step: 379240 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:19:08,292-Speed 2634.04 samples/sec Loss 7.3429 LearningRate 0.0295 Epoch: 9 Global Step: 379250 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:19:12,194-Speed 2624.58 samples/sec Loss 7.3710 LearningRate 0.0295 Epoch: 9 Global Step: 379260 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:19:16,098-Speed 2623.15 samples/sec Loss 7.3994 LearningRate 0.0295 Epoch: 9 Global Step: 379270 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:19:19,999-Speed 2625.75 samples/sec Loss 7.2143 LearningRate 0.0295 Epoch: 9 Global Step: 379280 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:19:23,900-Speed 2625.60 samples/sec Loss 7.3918 LearningRate 0.0295 Epoch: 9 Global Step: 379290 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:19:27,786-Speed 2635.72 samples/sec Loss 7.3392 LearningRate 0.0295 Epoch: 9 Global Step: 379300 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:19:31,684-Speed 2627.92 samples/sec Loss 7.4472 LearningRate 0.0295 Epoch: 9 Global Step: 379310 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:19:35,569-Speed 2636.71 samples/sec Loss 7.4058 LearningRate 0.0295 Epoch: 9 Global Step: 379320 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:19:39,481-Speed 2617.63 samples/sec Loss 7.2243 LearningRate 0.0295 Epoch: 9 Global Step: 379330 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:19:43,384-Speed 2624.73 samples/sec Loss 7.3791 LearningRate 0.0295 Epoch: 9 Global Step: 379340 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:19:47,284-Speed 2625.73 samples/sec Loss 7.3415 LearningRate 0.0295 Epoch: 9 Global Step: 379350 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:19:51,193-Speed 2620.64 samples/sec Loss 7.2450 LearningRate 0.0295 Epoch: 9 Global Step: 379360 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:19:55,084-Speed 2632.09 samples/sec Loss 7.2938 LearningRate 0.0295 Epoch: 9 Global Step: 379370 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:19:59,001-Speed 2615.32 samples/sec Loss 7.3573 LearningRate 0.0295 Epoch: 9 Global Step: 379380 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:20:02,903-Speed 2625.06 samples/sec Loss 7.3262 LearningRate 0.0294 Epoch: 9 Global Step: 379390 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:20:07,186-Speed 2391.37 samples/sec Loss 7.2970 LearningRate 0.0294 Epoch: 9 Global Step: 379400 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:20:11,081-Speed 2629.86 samples/sec Loss 7.2302 LearningRate 0.0294 Epoch: 9 Global Step: 379410 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:20:14,972-Speed 2632.48 samples/sec Loss 7.5318 LearningRate 0.0294 Epoch: 9 Global Step: 379420 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:20:18,906-Speed 2603.31 samples/sec Loss 7.3332 LearningRate 0.0294 Epoch: 9 Global Step: 379430 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:20:22,803-Speed 2628.28 samples/sec Loss 7.3979 LearningRate 0.0294 Epoch: 9 Global Step: 379440 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:20:26,710-Speed 2621.92 samples/sec Loss 7.4538 LearningRate 0.0294 Epoch: 9 Global Step: 379450 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:20:30,603-Speed 2630.96 samples/sec Loss 7.3731 LearningRate 0.0294 Epoch: 9 Global Step: 379460 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:20:34,505-Speed 2624.97 samples/sec Loss 7.3758 LearningRate 0.0294 Epoch: 9 Global Step: 379470 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:20:38,381-Speed 2642.10 samples/sec Loss 7.2983 LearningRate 0.0294 Epoch: 9 Global Step: 379480 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:20:42,238-Speed 2656.15 samples/sec Loss 7.4074 LearningRate 0.0294 Epoch: 9 Global Step: 379490 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:20:46,135-Speed 2627.88 samples/sec Loss 7.4347 LearningRate 0.0294 Epoch: 9 Global Step: 379500 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:20:50,030-Speed 2629.78 samples/sec Loss 7.3500 LearningRate 0.0294 Epoch: 9 Global Step: 379510 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:20:53,928-Speed 2627.46 samples/sec Loss 7.2618 LearningRate 0.0294 Epoch: 9 Global Step: 379520 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:20:57,815-Speed 2635.39 samples/sec Loss 7.2773 LearningRate 0.0294 Epoch: 9 Global Step: 379530 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:21:01,708-Speed 2631.07 samples/sec Loss 7.3744 LearningRate 0.0294 Epoch: 9 Global Step: 379540 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:21:05,863-Speed 2465.03 samples/sec Loss 7.3961 LearningRate 0.0294 Epoch: 9 Global Step: 379550 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:21:09,756-Speed 2630.41 samples/sec Loss 7.2856 LearningRate 0.0294 Epoch: 9 Global Step: 379560 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:21:13,657-Speed 2625.80 samples/sec Loss 7.3672 LearningRate 0.0294 Epoch: 9 Global Step: 379570 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:21:17,551-Speed 2630.20 samples/sec Loss 7.1307 LearningRate 0.0294 Epoch: 9 Global Step: 379580 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:21:21,451-Speed 2626.94 samples/sec Loss 7.3924 LearningRate 0.0294 Epoch: 9 Global Step: 379590 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:21:25,345-Speed 2629.78 samples/sec Loss 7.3874 LearningRate 0.0294 Epoch: 9 Global Step: 379600 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:21:29,361-Speed 2550.76 samples/sec Loss 7.2338 LearningRate 0.0294 Epoch: 9 Global Step: 379610 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:21:33,301-Speed 2598.99 samples/sec Loss 7.2398 LearningRate 0.0294 Epoch: 9 Global Step: 379620 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:21:37,195-Speed 2630.86 samples/sec Loss 7.2870 LearningRate 0.0294 Epoch: 9 Global Step: 379630 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:21:41,090-Speed 2629.10 samples/sec Loss 7.4450 LearningRate 0.0294 Epoch: 9 Global Step: 379640 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:21:44,984-Speed 2631.22 samples/sec Loss 7.3175 LearningRate 0.0294 Epoch: 9 Global Step: 379650 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:21:48,883-Speed 2626.80 samples/sec Loss 7.3093 LearningRate 0.0294 Epoch: 9 Global Step: 379660 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:21:55,327-Speed 1589.18 samples/sec Loss 7.3101 LearningRate 0.0294 Epoch: 9 Global Step: 379670 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:21:59,225-Speed 2628.28 samples/sec Loss 7.4478 LearningRate 0.0294 Epoch: 9 Global Step: 379680 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:22:03,107-Speed 2637.67 samples/sec Loss 7.3541 LearningRate 0.0294 Epoch: 9 Global Step: 379690 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:22:07,003-Speed 2628.92 samples/sec Loss 7.3906 LearningRate 0.0294 Epoch: 9 Global Step: 379700 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:22:10,891-Speed 2634.46 samples/sec Loss 7.3108 LearningRate 0.0294 Epoch: 9 Global Step: 379710 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:22:14,776-Speed 2637.45 samples/sec Loss 7.2572 LearningRate 0.0294 Epoch: 9 Global Step: 379720 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:22:18,668-Speed 2631.67 samples/sec Loss 7.4243 LearningRate 0.0294 Epoch: 9 Global Step: 379730 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:22:22,569-Speed 2625.76 samples/sec Loss 7.2910 LearningRate 0.0294 Epoch: 9 Global Step: 379740 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:22:26,467-Speed 2627.77 samples/sec Loss 7.2914 LearningRate 0.0294 Epoch: 9 Global Step: 379750 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:22:30,357-Speed 2632.43 samples/sec Loss 7.2333 LearningRate 0.0294 Epoch: 9 Global Step: 379760 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:22:34,458-Speed 2498.09 samples/sec Loss 7.2859 LearningRate 0.0294 Epoch: 9 Global Step: 379770 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:22:38,356-Speed 2627.07 samples/sec Loss 7.2460 LearningRate 0.0294 Epoch: 9 Global Step: 379780 Fp16 Grad Scale: 65536 Required: 51 hours
Training: 2022-04-14 14:22:42,253-Speed 2628.83 samples/sec Loss 7.2830 LearningRate 0.0294 Epoch: 9 Global Step: 379790 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:22:46,164-Speed 2618.37 samples/sec Loss 7.3678 LearningRate 0.0294 Epoch: 9 Global Step: 379800 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:22:50,064-Speed 2626.89 samples/sec Loss 7.1995 LearningRate 0.0294 Epoch: 9 Global Step: 379810 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:22:53,961-Speed 2628.49 samples/sec Loss 7.4021 LearningRate 0.0294 Epoch: 9 Global Step: 379820 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:22:57,859-Speed 2627.12 samples/sec Loss 7.2495 LearningRate 0.0294 Epoch: 9 Global Step: 379830 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:23:01,762-Speed 2624.38 samples/sec Loss 7.2969 LearningRate 0.0294 Epoch: 9 Global Step: 379840 Fp16 Grad Scale: 131072 Required: 51 hours
Training: 2022-04-14 14:23:05,654-Speed 2631.40 samples/sec Loss 7.2905 LearningRate 0.0294 Epoch: 9 Global Step: 379850 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:23:09,558-Speed 2623.80 samples/sec Loss 7.3761 LearningRate 0.0294 Epoch: 9 Global Step: 379860 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:23:13,474-Speed 2615.34 samples/sec Loss 7.3373 LearningRate 0.0294 Epoch: 9 Global Step: 379870 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:23:17,375-Speed 2625.47 samples/sec Loss 7.2922 LearningRate 0.0294 Epoch: 9 Global Step: 379880 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:23:21,273-Speed 2627.39 samples/sec Loss 7.3274 LearningRate 0.0294 Epoch: 9 Global Step: 379890 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:23:25,172-Speed 2627.23 samples/sec Loss 7.2872 LearningRate 0.0294 Epoch: 9 Global Step: 379900 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:23:29,073-Speed 2625.48 samples/sec Loss 7.2463 LearningRate 0.0294 Epoch: 9 Global Step: 379910 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:23:32,974-Speed 2625.93 samples/sec Loss 7.3291 LearningRate 0.0294 Epoch: 9 Global Step: 379920 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:23:36,857-Speed 2637.63 samples/sec Loss 7.3868 LearningRate 0.0294 Epoch: 9 Global Step: 379930 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:23:40,748-Speed 2631.99 samples/sec Loss 7.3962 LearningRate 0.0294 Epoch: 9 Global Step: 379940 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:23:44,649-Speed 2625.97 samples/sec Loss 7.2720 LearningRate 0.0294 Epoch: 9 Global Step: 379950 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:23:48,550-Speed 2625.76 samples/sec Loss 7.3804 LearningRate 0.0294 Epoch: 9 Global Step: 379960 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:23:52,408-Speed 2654.18 samples/sec Loss 7.5996 LearningRate 0.0294 Epoch: 9 Global Step: 379970 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:23:56,310-Speed 2625.43 samples/sec Loss 7.7685 LearningRate 0.0294 Epoch: 9 Global Step: 379980 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:24:00,203-Speed 2631.19 samples/sec Loss 7.3201 LearningRate 0.0294 Epoch: 9 Global Step: 379990 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:24:04,090-Speed 2635.53 samples/sec Loss 7.0410 LearningRate 0.0294 Epoch: 9 Global Step: 380000 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:24:46,722-[lfw][380000]XNorm: 24.351409
Training: 2022-04-14 14:24:46,723-[lfw][380000]Accuracy-Flip: 0.99750+-0.00271
Training: 2022-04-14 14:24:46,724-[lfw][380000]Accuracy-Highest: 0.99783
Training: 2022-04-14 14:25:36,366-[cfp_fp][380000]XNorm: 22.373011
Training: 2022-04-14 14:25:36,367-[cfp_fp][380000]Accuracy-Flip: 0.98757+-0.00495
Training: 2022-04-14 14:25:36,368-[cfp_fp][380000]Accuracy-Highest: 0.98757
Training: 2022-04-14 14:26:19,163-[agedb_30][380000]XNorm: 24.158033
Training: 2022-04-14 14:26:19,164-[agedb_30][380000]Accuracy-Flip: 0.97583+-0.00549
Training: 2022-04-14 14:26:19,165-[agedb_30][380000]Accuracy-Highest: 0.97700
Training: 2022-04-14 14:26:23,044-Speed 73.69 samples/sec Loss 7.2693 LearningRate 0.0294 Epoch: 9 Global Step: 380010 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:26:26,968-Speed 2609.64 samples/sec Loss 7.3517 LearningRate 0.0294 Epoch: 9 Global Step: 380020 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:26:30,847-Speed 2641.01 samples/sec Loss 7.3764 LearningRate 0.0294 Epoch: 9 Global Step: 380030 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:26:34,724-Speed 2642.17 samples/sec Loss 7.3075 LearningRate 0.0294 Epoch: 9 Global Step: 380040 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:26:38,611-Speed 2635.02 samples/sec Loss 7.3524 LearningRate 0.0294 Epoch: 9 Global Step: 380050 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:26:42,487-Speed 2642.55 samples/sec Loss 7.1790 LearningRate 0.0294 Epoch: 9 Global Step: 380060 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-14 14:26:46,383-Speed 2629.81 samples/sec Loss 7.2716 LearningRate 0.0294 Epoch: 9 Global Step: 380070 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:26:50,263-Speed 2639.84 samples/sec Loss 7.2565 LearningRate 0.0294 Epoch: 9 Global Step: 380080 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:26:54,147-Speed 2636.73 samples/sec Loss 7.3138 LearningRate 0.0294 Epoch: 9 Global Step: 380090 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:26:58,035-Speed 2634.96 samples/sec Loss 7.3253 LearningRate 0.0294 Epoch: 9 Global Step: 380100 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:27:01,930-Speed 2629.76 samples/sec Loss 7.3168 LearningRate 0.0294 Epoch: 9 Global Step: 380110 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:27:05,817-Speed 2634.93 samples/sec Loss 7.3079 LearningRate 0.0294 Epoch: 9 Global Step: 380120 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:27:09,714-Speed 2628.36 samples/sec Loss 7.1239 LearningRate 0.0294 Epoch: 9 Global Step: 380130 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:27:13,608-Speed 2630.48 samples/sec Loss 7.3457 LearningRate 0.0294 Epoch: 9 Global Step: 380140 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:27:17,497-Speed 2633.19 samples/sec Loss 7.3100 LearningRate 0.0293 Epoch: 9 Global Step: 380150 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:27:21,389-Speed 2632.47 samples/sec Loss 7.1503 LearningRate 0.0293 Epoch: 9 Global Step: 380160 Fp16 Grad Scale: 16384 Required: 51 hours
Training: 2022-04-14 14:27:25,290-Speed 2624.99 samples/sec Loss 7.2215 LearningRate 0.0293 Epoch: 9 Global Step: 380170 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:27:29,182-Speed 2632.23 samples/sec Loss 7.2819 LearningRate 0.0293 Epoch: 9 Global Step: 380180 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:27:33,084-Speed 2624.65 samples/sec Loss 7.2677 LearningRate 0.0293 Epoch: 9 Global Step: 380190 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:27:36,988-Speed 2624.16 samples/sec Loss 7.2279 LearningRate 0.0293 Epoch: 9 Global Step: 380200 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:27:40,894-Speed 2621.97 samples/sec Loss 7.2779 LearningRate 0.0293 Epoch: 9 Global Step: 380210 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:27:44,790-Speed 2628.86 samples/sec Loss 7.3434 LearningRate 0.0293 Epoch: 9 Global Step: 380220 Fp16 Grad Scale: 32768 Required: 51 hours
Training: 2022-04-14 14:27:48,686-Speed 2628.84 samples/sec Loss 7.1491 LearningRate 0.0293 Epoch: 9 Global Step: 380230 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:27:52,599-Speed 2617.88 samples/sec Loss 7.3214 LearningRate 0.0293 Epoch: 9 Global Step: 380240 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:27:56,498-Speed 2626.76 samples/sec Loss 7.2319 LearningRate 0.0293 Epoch: 9 Global Step: 380250 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:28:00,400-Speed 2625.30 samples/sec Loss 7.2668 LearningRate 0.0293 Epoch: 9 Global Step: 380260 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:28:04,291-Speed 2631.99 samples/sec Loss 7.2834 LearningRate 0.0293 Epoch: 9 Global Step: 380270 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:28:08,202-Speed 2619.52 samples/sec Loss 7.3483 LearningRate 0.0293 Epoch: 9 Global Step: 380280 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:28:12,106-Speed 2623.54 samples/sec Loss 7.3931 LearningRate 0.0293 Epoch: 9 Global Step: 380290 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:28:16,003-Speed 2627.97 samples/sec Loss 7.2274 LearningRate 0.0293 Epoch: 9 Global Step: 380300 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:28:19,919-Speed 2615.65 samples/sec Loss 7.3890 LearningRate 0.0293 Epoch: 9 Global Step: 380310 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:28:23,799-Speed 2639.59 samples/sec Loss 7.4742 LearningRate 0.0293 Epoch: 9 Global Step: 380320 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:28:27,698-Speed 2626.75 samples/sec Loss 7.4423 LearningRate 0.0293 Epoch: 9 Global Step: 380330 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:28:31,593-Speed 2629.80 samples/sec Loss 7.1873 LearningRate 0.0293 Epoch: 9 Global Step: 380340 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:28:35,494-Speed 2625.48 samples/sec Loss 7.1356 LearningRate 0.0293 Epoch: 9 Global Step: 380350 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:28:39,388-Speed 2630.30 samples/sec Loss 7.2059 LearningRate 0.0293 Epoch: 9 Global Step: 380360 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:28:43,286-Speed 2627.69 samples/sec Loss 7.2360 LearningRate 0.0293 Epoch: 9 Global Step: 380370 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:28:47,178-Speed 2631.59 samples/sec Loss 7.2753 LearningRate 0.0293 Epoch: 9 Global Step: 380380 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:28:51,072-Speed 2630.40 samples/sec Loss 7.3362 LearningRate 0.0293 Epoch: 9 Global Step: 380390 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:28:54,971-Speed 2626.98 samples/sec Loss 7.2647 LearningRate 0.0293 Epoch: 9 Global Step: 380400 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:28:58,863-Speed 2631.47 samples/sec Loss 7.3404 LearningRate 0.0293 Epoch: 9 Global Step: 380410 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:29:02,757-Speed 2629.92 samples/sec Loss 7.2897 LearningRate 0.0293 Epoch: 9 Global Step: 380420 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:29:06,656-Speed 2626.86 samples/sec Loss 7.3568 LearningRate 0.0293 Epoch: 9 Global Step: 380430 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:29:10,561-Speed 2622.53 samples/sec Loss 7.3294 LearningRate 0.0293 Epoch: 9 Global Step: 380440 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:29:14,454-Speed 2630.99 samples/sec Loss 7.2679 LearningRate 0.0293 Epoch: 9 Global Step: 380450 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:29:18,356-Speed 2625.25 samples/sec Loss 7.3223 LearningRate 0.0293 Epoch: 9 Global Step: 380460 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:29:22,269-Speed 2617.85 samples/sec Loss 7.3811 LearningRate 0.0293 Epoch: 9 Global Step: 380470 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:29:26,165-Speed 2628.55 samples/sec Loss 7.2693 LearningRate 0.0293 Epoch: 9 Global Step: 380480 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:29:30,065-Speed 2626.63 samples/sec Loss 7.2592 LearningRate 0.0293 Epoch: 9 Global Step: 380490 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:29:33,959-Speed 2630.16 samples/sec Loss 7.3453 LearningRate 0.0293 Epoch: 9 Global Step: 380500 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:29:37,848-Speed 2633.00 samples/sec Loss 7.2950 LearningRate 0.0293 Epoch: 9 Global Step: 380510 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:29:41,748-Speed 2626.50 samples/sec Loss 7.2226 LearningRate 0.0293 Epoch: 9 Global Step: 380520 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:29:45,660-Speed 2618.34 samples/sec Loss 7.2981 LearningRate 0.0293 Epoch: 9 Global Step: 380530 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:29:49,559-Speed 2626.59 samples/sec Loss 7.3338 LearningRate 0.0293 Epoch: 9 Global Step: 380540 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:29:53,455-Speed 2628.84 samples/sec Loss 7.4067 LearningRate 0.0293 Epoch: 9 Global Step: 380550 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:29:57,346-Speed 2632.81 samples/sec Loss 7.2377 LearningRate 0.0293 Epoch: 9 Global Step: 380560 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:30:01,262-Speed 2615.35 samples/sec Loss 7.2325 LearningRate 0.0293 Epoch: 9 Global Step: 380570 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:30:05,170-Speed 2620.82 samples/sec Loss 7.2961 LearningRate 0.0293 Epoch: 9 Global Step: 380580 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:30:09,069-Speed 2626.55 samples/sec Loss 7.4103 LearningRate 0.0293 Epoch: 9 Global Step: 380590 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:30:12,973-Speed 2623.88 samples/sec Loss 7.2808 LearningRate 0.0293 Epoch: 9 Global Step: 380600 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:30:16,875-Speed 2624.69 samples/sec Loss 7.2373 LearningRate 0.0293 Epoch: 9 Global Step: 380610 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:30:20,755-Speed 2639.89 samples/sec Loss 7.3063 LearningRate 0.0293 Epoch: 9 Global Step: 380620 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:30:24,664-Speed 2620.46 samples/sec Loss 7.3339 LearningRate 0.0293 Epoch: 9 Global Step: 380630 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:30:28,564-Speed 2626.13 samples/sec Loss 7.3839 LearningRate 0.0293 Epoch: 9 Global Step: 380640 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:30:32,561-Speed 2562.20 samples/sec Loss 7.4317 LearningRate 0.0293 Epoch: 9 Global Step: 380650 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:30:36,462-Speed 2625.92 samples/sec Loss 7.3450 LearningRate 0.0293 Epoch: 9 Global Step: 380660 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:30:40,360-Speed 2627.03 samples/sec Loss 7.1882 LearningRate 0.0293 Epoch: 9 Global Step: 380670 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:30:44,261-Speed 2626.00 samples/sec Loss 7.2234 LearningRate 0.0293 Epoch: 9 Global Step: 380680 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:30:48,165-Speed 2623.13 samples/sec Loss 7.2114 LearningRate 0.0293 Epoch: 9 Global Step: 380690 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:30:52,069-Speed 2623.41 samples/sec Loss 7.3069 LearningRate 0.0293 Epoch: 9 Global Step: 380700 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:30:55,966-Speed 2628.76 samples/sec Loss 7.3360 LearningRate 0.0293 Epoch: 9 Global Step: 380710 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:30:59,816-Speed 2659.89 samples/sec Loss 7.4761 LearningRate 0.0293 Epoch: 9 Global Step: 380720 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:31:03,727-Speed 2618.95 samples/sec Loss 7.5657 LearningRate 0.0293 Epoch: 9 Global Step: 380730 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:31:07,623-Speed 2628.44 samples/sec Loss 7.2141 LearningRate 0.0293 Epoch: 9 Global Step: 380740 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:31:11,529-Speed 2623.16 samples/sec Loss 7.4339 LearningRate 0.0293 Epoch: 9 Global Step: 380750 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:31:15,420-Speed 2632.12 samples/sec Loss 7.2759 LearningRate 0.0293 Epoch: 9 Global Step: 380760 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:31:19,316-Speed 2629.12 samples/sec Loss 7.3055 LearningRate 0.0293 Epoch: 9 Global Step: 380770 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:31:23,229-Speed 2616.83 samples/sec Loss 7.4357 LearningRate 0.0293 Epoch: 9 Global Step: 380780 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:31:27,131-Speed 2625.60 samples/sec Loss 7.4269 LearningRate 0.0293 Epoch: 9 Global Step: 380790 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:31:31,032-Speed 2625.31 samples/sec Loss 7.3246 LearningRate 0.0293 Epoch: 9 Global Step: 380800 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:31:34,930-Speed 2627.08 samples/sec Loss 7.2714 LearningRate 0.0293 Epoch: 9 Global Step: 380810 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:31:38,856-Speed 2609.29 samples/sec Loss 7.2665 LearningRate 0.0293 Epoch: 9 Global Step: 380820 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:31:42,753-Speed 2628.53 samples/sec Loss 7.4119 LearningRate 0.0293 Epoch: 9 Global Step: 380830 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:31:46,651-Speed 2627.60 samples/sec Loss 7.1262 LearningRate 0.0293 Epoch: 9 Global Step: 380840 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:31:50,555-Speed 2623.75 samples/sec Loss 7.3539 LearningRate 0.0293 Epoch: 9 Global Step: 380850 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:31:54,451-Speed 2629.22 samples/sec Loss 7.2859 LearningRate 0.0293 Epoch: 9 Global Step: 380860 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:31:58,356-Speed 2622.35 samples/sec Loss 7.1987 LearningRate 0.0293 Epoch: 9 Global Step: 380870 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:32:02,259-Speed 2624.61 samples/sec Loss 7.3007 LearningRate 0.0293 Epoch: 9 Global Step: 380880 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:32:06,150-Speed 2631.69 samples/sec Loss 7.3112 LearningRate 0.0293 Epoch: 9 Global Step: 380890 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:32:10,047-Speed 2628.57 samples/sec Loss 7.2827 LearningRate 0.0293 Epoch: 9 Global Step: 380900 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:32:13,938-Speed 2632.46 samples/sec Loss 7.4300 LearningRate 0.0293 Epoch: 9 Global Step: 380910 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:32:17,849-Speed 2618.85 samples/sec Loss 7.2927 LearningRate 0.0292 Epoch: 9 Global Step: 380920 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:32:21,746-Speed 2627.84 samples/sec Loss 7.2857 LearningRate 0.0292 Epoch: 9 Global Step: 380930 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:32:25,647-Speed 2626.43 samples/sec Loss 7.3255 LearningRate 0.0292 Epoch: 9 Global Step: 380940 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:32:29,534-Speed 2634.60 samples/sec Loss 7.3952 LearningRate 0.0292 Epoch: 9 Global Step: 380950 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:32:33,481-Speed 2595.25 samples/sec Loss 7.2867 LearningRate 0.0292 Epoch: 9 Global Step: 380960 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:32:37,383-Speed 2624.97 samples/sec Loss 7.2727 LearningRate 0.0292 Epoch: 9 Global Step: 380970 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:32:41,327-Speed 2596.57 samples/sec Loss 7.4273 LearningRate 0.0292 Epoch: 9 Global Step: 380980 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:32:45,224-Speed 2627.88 samples/sec Loss 7.3503 LearningRate 0.0292 Epoch: 9 Global Step: 380990 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:32:49,119-Speed 2629.81 samples/sec Loss 7.3350 LearningRate 0.0292 Epoch: 9 Global Step: 381000 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:32:53,021-Speed 2625.18 samples/sec Loss 7.3494 LearningRate 0.0292 Epoch: 9 Global Step: 381010 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:32:56,924-Speed 2624.29 samples/sec Loss 7.1961 LearningRate 0.0292 Epoch: 9 Global Step: 381020 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:33:00,816-Speed 2631.87 samples/sec Loss 7.2674 LearningRate 0.0292 Epoch: 9 Global Step: 381030 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:33:04,719-Speed 2624.24 samples/sec Loss 7.2784 LearningRate 0.0292 Epoch: 9 Global Step: 381040 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:33:08,615-Speed 2628.59 samples/sec Loss 7.2292 LearningRate 0.0292 Epoch: 9 Global Step: 381050 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:33:12,511-Speed 2629.07 samples/sec Loss 7.2072 LearningRate 0.0292 Epoch: 9 Global Step: 381060 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:33:16,410-Speed 2626.73 samples/sec Loss 7.2986 LearningRate 0.0292 Epoch: 9 Global Step: 381070 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:33:20,312-Speed 2625.03 samples/sec Loss 7.3952 LearningRate 0.0292 Epoch: 9 Global Step: 381080 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:33:24,204-Speed 2631.61 samples/sec Loss 7.3166 LearningRate 0.0292 Epoch: 9 Global Step: 381090 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:33:28,099-Speed 2629.45 samples/sec Loss 7.2884 LearningRate 0.0292 Epoch: 9 Global Step: 381100 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:33:32,000-Speed 2626.21 samples/sec Loss 7.3216 LearningRate 0.0292 Epoch: 9 Global Step: 381110 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:33:35,887-Speed 2634.51 samples/sec Loss 7.1959 LearningRate 0.0292 Epoch: 9 Global Step: 381120 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:33:39,766-Speed 2640.55 samples/sec Loss 7.2531 LearningRate 0.0292 Epoch: 9 Global Step: 381130 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:33:43,661-Speed 2629.42 samples/sec Loss 7.2486 LearningRate 0.0292 Epoch: 9 Global Step: 381140 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:33:47,552-Speed 2632.56 samples/sec Loss 7.4212 LearningRate 0.0292 Epoch: 9 Global Step: 381150 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:33:51,451-Speed 2627.03 samples/sec Loss 7.3905 LearningRate 0.0292 Epoch: 9 Global Step: 381160 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:33:55,341-Speed 2632.92 samples/sec Loss 7.3462 LearningRate 0.0292 Epoch: 9 Global Step: 381170 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:33:59,239-Speed 2627.86 samples/sec Loss 7.1775 LearningRate 0.0292 Epoch: 9 Global Step: 381180 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:34:03,125-Speed 2635.47 samples/sec Loss 7.3278 LearningRate 0.0292 Epoch: 9 Global Step: 381190 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:34:07,028-Speed 2624.40 samples/sec Loss 7.2689 LearningRate 0.0292 Epoch: 9 Global Step: 381200 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:34:10,924-Speed 2628.45 samples/sec Loss 7.1965 LearningRate 0.0292 Epoch: 9 Global Step: 381210 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:34:14,790-Speed 2649.20 samples/sec Loss 7.2299 LearningRate 0.0292 Epoch: 9 Global Step: 381220 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:34:18,672-Speed 2638.61 samples/sec Loss 7.4905 LearningRate 0.0292 Epoch: 9 Global Step: 381230 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:34:22,576-Speed 2623.74 samples/sec Loss 8.2937 LearningRate 0.0292 Epoch: 9 Global Step: 381240 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:34:26,467-Speed 2632.21 samples/sec Loss 7.6895 LearningRate 0.0292 Epoch: 9 Global Step: 381250 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:34:30,364-Speed 2627.91 samples/sec Loss 7.4142 LearningRate 0.0292 Epoch: 9 Global Step: 381260 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:34:34,262-Speed 2627.94 samples/sec Loss 7.3700 LearningRate 0.0292 Epoch: 9 Global Step: 381270 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:34:38,158-Speed 2629.09 samples/sec Loss 7.2916 LearningRate 0.0292 Epoch: 9 Global Step: 381280 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:34:42,052-Speed 2630.34 samples/sec Loss 7.3248 LearningRate 0.0292 Epoch: 9 Global Step: 381290 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:34:45,953-Speed 2625.64 samples/sec Loss 7.3324 LearningRate 0.0292 Epoch: 9 Global Step: 381300 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:34:49,858-Speed 2622.51 samples/sec Loss 7.2649 LearningRate 0.0292 Epoch: 9 Global Step: 381310 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:34:53,755-Speed 2628.87 samples/sec Loss 7.2873 LearningRate 0.0292 Epoch: 9 Global Step: 381320 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:34:57,650-Speed 2629.23 samples/sec Loss 7.3722 LearningRate 0.0292 Epoch: 9 Global Step: 381330 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:35:01,543-Speed 2630.75 samples/sec Loss 7.3807 LearningRate 0.0292 Epoch: 9 Global Step: 381340 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:35:05,438-Speed 2630.03 samples/sec Loss 7.3452 LearningRate 0.0292 Epoch: 9 Global Step: 381350 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:35:09,339-Speed 2625.11 samples/sec Loss 7.2508 LearningRate 0.0292 Epoch: 9 Global Step: 381360 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:35:13,229-Speed 2633.73 samples/sec Loss 7.2536 LearningRate 0.0292 Epoch: 9 Global Step: 381370 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:35:17,137-Speed 2620.60 samples/sec Loss 7.3158 LearningRate 0.0292 Epoch: 9 Global Step: 381380 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:35:21,040-Speed 2624.07 samples/sec Loss 7.3338 LearningRate 0.0292 Epoch: 9 Global Step: 381390 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:35:24,939-Speed 2626.52 samples/sec Loss 7.1513 LearningRate 0.0292 Epoch: 9 Global Step: 381400 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:35:28,843-Speed 2623.77 samples/sec Loss 7.3231 LearningRate 0.0292 Epoch: 9 Global Step: 381410 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:35:32,728-Speed 2636.48 samples/sec Loss 7.5485 LearningRate 0.0292 Epoch: 9 Global Step: 381420 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:35:36,622-Speed 2630.64 samples/sec Loss 7.6080 LearningRate 0.0292 Epoch: 9 Global Step: 381430 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:35:40,537-Speed 2615.76 samples/sec Loss 7.3854 LearningRate 0.0292 Epoch: 9 Global Step: 381440 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:35:44,439-Speed 2624.82 samples/sec Loss 7.2962 LearningRate 0.0292 Epoch: 9 Global Step: 381450 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:35:48,337-Speed 2628.10 samples/sec Loss 7.4302 LearningRate 0.0292 Epoch: 9 Global Step: 381460 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:35:52,236-Speed 2626.44 samples/sec Loss 7.2710 LearningRate 0.0292 Epoch: 9 Global Step: 381470 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:35:56,133-Speed 2628.72 samples/sec Loss 7.2049 LearningRate 0.0292 Epoch: 9 Global Step: 381480 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:36:00,051-Speed 2614.04 samples/sec Loss 7.3654 LearningRate 0.0292 Epoch: 9 Global Step: 381490 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:36:03,944-Speed 2630.55 samples/sec Loss 7.3982 LearningRate 0.0292 Epoch: 9 Global Step: 381500 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:36:07,845-Speed 2625.60 samples/sec Loss 7.2216 LearningRate 0.0292 Epoch: 9 Global Step: 381510 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:36:11,755-Speed 2620.07 samples/sec Loss 7.3482 LearningRate 0.0292 Epoch: 9 Global Step: 381520 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:36:15,648-Speed 2631.00 samples/sec Loss 7.2362 LearningRate 0.0292 Epoch: 9 Global Step: 381530 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:36:19,545-Speed 2628.43 samples/sec Loss 7.3603 LearningRate 0.0292 Epoch: 9 Global Step: 381540 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:36:23,437-Speed 2630.89 samples/sec Loss 7.2804 LearningRate 0.0292 Epoch: 9 Global Step: 381550 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:36:27,339-Speed 2625.40 samples/sec Loss 7.2985 LearningRate 0.0292 Epoch: 9 Global Step: 381560 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:36:31,240-Speed 2625.54 samples/sec Loss 7.2666 LearningRate 0.0292 Epoch: 9 Global Step: 381570 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:36:35,130-Speed 2632.74 samples/sec Loss 7.3585 LearningRate 0.0292 Epoch: 9 Global Step: 381580 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:36:39,028-Speed 2626.95 samples/sec Loss 7.3227 LearningRate 0.0292 Epoch: 9 Global Step: 381590 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:36:42,926-Speed 2628.08 samples/sec Loss 7.4256 LearningRate 0.0292 Epoch: 9 Global Step: 381600 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:36:46,832-Speed 2621.78 samples/sec Loss 7.3203 LearningRate 0.0292 Epoch: 9 Global Step: 381610 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:36:50,730-Speed 2627.91 samples/sec Loss 7.2659 LearningRate 0.0292 Epoch: 9 Global Step: 381620 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:36:54,617-Speed 2635.37 samples/sec Loss 7.1392 LearningRate 0.0292 Epoch: 9 Global Step: 381630 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:36:58,510-Speed 2630.60 samples/sec Loss 7.1996 LearningRate 0.0292 Epoch: 9 Global Step: 381640 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:37:02,415-Speed 2623.00 samples/sec Loss 7.2296 LearningRate 0.0292 Epoch: 9 Global Step: 381650 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:37:06,304-Speed 2633.70 samples/sec Loss 7.2122 LearningRate 0.0292 Epoch: 9 Global Step: 381660 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:37:10,201-Speed 2627.65 samples/sec Loss 7.2579 LearningRate 0.0292 Epoch: 9 Global Step: 381670 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:37:14,120-Speed 2614.38 samples/sec Loss 7.2795 LearningRate 0.0292 Epoch: 9 Global Step: 381680 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:37:18,026-Speed 2622.29 samples/sec Loss 7.3117 LearningRate 0.0291 Epoch: 9 Global Step: 381690 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:37:21,924-Speed 2627.07 samples/sec Loss 7.3151 LearningRate 0.0291 Epoch: 9 Global Step: 381700 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:37:25,820-Speed 2629.63 samples/sec Loss 7.3792 LearningRate 0.0291 Epoch: 9 Global Step: 381710 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:37:29,717-Speed 2628.19 samples/sec Loss 7.2910 LearningRate 0.0291 Epoch: 9 Global Step: 381720 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:37:33,620-Speed 2624.49 samples/sec Loss 7.2543 LearningRate 0.0291 Epoch: 9 Global Step: 381730 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:37:37,514-Speed 2629.93 samples/sec Loss 7.3106 LearningRate 0.0291 Epoch: 9 Global Step: 381740 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:37:41,409-Speed 2629.80 samples/sec Loss 7.3171 LearningRate 0.0291 Epoch: 9 Global Step: 381750 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:37:45,314-Speed 2622.38 samples/sec Loss 7.2183 LearningRate 0.0291 Epoch: 9 Global Step: 381760 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:37:49,221-Speed 2621.80 samples/sec Loss 7.2364 LearningRate 0.0291 Epoch: 9 Global Step: 381770 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:37:53,109-Speed 2633.88 samples/sec Loss 7.2469 LearningRate 0.0291 Epoch: 9 Global Step: 381780 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:37:57,005-Speed 2629.25 samples/sec Loss 7.3600 LearningRate 0.0291 Epoch: 9 Global Step: 381790 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:38:00,902-Speed 2628.15 samples/sec Loss 7.2876 LearningRate 0.0291 Epoch: 9 Global Step: 381800 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:38:04,796-Speed 2630.64 samples/sec Loss 7.1491 LearningRate 0.0291 Epoch: 9 Global Step: 381810 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:38:08,687-Speed 2632.24 samples/sec Loss 7.2976 LearningRate 0.0291 Epoch: 9 Global Step: 381820 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:38:12,581-Speed 2630.42 samples/sec Loss 7.3078 LearningRate 0.0291 Epoch: 9 Global Step: 381830 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:38:16,518-Speed 2601.24 samples/sec Loss 7.2566 LearningRate 0.0291 Epoch: 9 Global Step: 381840 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:38:20,490-Speed 2578.68 samples/sec Loss 7.2085 LearningRate 0.0291 Epoch: 9 Global Step: 381850 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:38:24,376-Speed 2635.43 samples/sec Loss 7.2602 LearningRate 0.0291 Epoch: 9 Global Step: 381860 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:38:28,288-Speed 2618.15 samples/sec Loss 7.3351 LearningRate 0.0291 Epoch: 9 Global Step: 381870 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:38:32,193-Speed 2623.11 samples/sec Loss 7.2311 LearningRate 0.0291 Epoch: 9 Global Step: 381880 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:38:36,089-Speed 2628.32 samples/sec Loss 7.2137 LearningRate 0.0291 Epoch: 9 Global Step: 381890 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:38:40,007-Speed 2614.60 samples/sec Loss 7.3819 LearningRate 0.0291 Epoch: 9 Global Step: 381900 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:38:43,909-Speed 2624.61 samples/sec Loss 7.2314 LearningRate 0.0291 Epoch: 9 Global Step: 381910 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:38:47,812-Speed 2624.83 samples/sec Loss 7.2190 LearningRate 0.0291 Epoch: 9 Global Step: 381920 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:38:51,893-Speed 2509.53 samples/sec Loss 7.1955 LearningRate 0.0291 Epoch: 9 Global Step: 381930 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:38:55,781-Speed 2634.07 samples/sec Loss 7.2439 LearningRate 0.0291 Epoch: 9 Global Step: 381940 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:38:59,696-Speed 2616.32 samples/sec Loss 7.8477 LearningRate 0.0291 Epoch: 9 Global Step: 381950 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:39:03,614-Speed 2614.11 samples/sec Loss 7.5094 LearningRate 0.0291 Epoch: 9 Global Step: 381960 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:39:07,515-Speed 2625.38 samples/sec Loss 7.2853 LearningRate 0.0291 Epoch: 9 Global Step: 381970 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:39:11,417-Speed 2625.03 samples/sec Loss 7.2722 LearningRate 0.0291 Epoch: 9 Global Step: 381980 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:39:15,308-Speed 2631.71 samples/sec Loss 7.2407 LearningRate 0.0291 Epoch: 9 Global Step: 381990 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:39:19,205-Speed 2629.04 samples/sec Loss 7.2080 LearningRate 0.0291 Epoch: 9 Global Step: 382000 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:39:23,107-Speed 2624.93 samples/sec Loss 7.2654 LearningRate 0.0291 Epoch: 9 Global Step: 382010 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:39:27,003-Speed 2629.10 samples/sec Loss 7.2430 LearningRate 0.0291 Epoch: 9 Global Step: 382020 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:39:30,897-Speed 2629.72 samples/sec Loss 7.2645 LearningRate 0.0291 Epoch: 9 Global Step: 382030 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:39:34,791-Speed 2630.83 samples/sec Loss 7.1310 LearningRate 0.0291 Epoch: 9 Global Step: 382040 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:39:38,690-Speed 2626.69 samples/sec Loss 7.2667 LearningRate 0.0291 Epoch: 9 Global Step: 382050 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:39:42,598-Speed 2620.28 samples/sec Loss 7.2982 LearningRate 0.0291 Epoch: 9 Global Step: 382060 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:39:46,518-Speed 2613.14 samples/sec Loss 7.2589 LearningRate 0.0291 Epoch: 9 Global Step: 382070 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:39:50,417-Speed 2626.24 samples/sec Loss 7.1817 LearningRate 0.0291 Epoch: 9 Global Step: 382080 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:39:54,358-Speed 2599.31 samples/sec Loss 7.2765 LearningRate 0.0291 Epoch: 9 Global Step: 382090 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:39:58,449-Speed 2503.50 samples/sec Loss 7.2348 LearningRate 0.0291 Epoch: 9 Global Step: 382100 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:40:02,449-Speed 2560.88 samples/sec Loss 7.3205 LearningRate 0.0291 Epoch: 9 Global Step: 382110 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:40:06,394-Speed 2596.31 samples/sec Loss 7.2671 LearningRate 0.0291 Epoch: 9 Global Step: 382120 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:40:10,306-Speed 2618.20 samples/sec Loss 7.3382 LearningRate 0.0291 Epoch: 9 Global Step: 382130 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:40:14,201-Speed 2629.19 samples/sec Loss 7.3941 LearningRate 0.0291 Epoch: 9 Global Step: 382140 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:40:18,100-Speed 2626.96 samples/sec Loss 7.5041 LearningRate 0.0291 Epoch: 9 Global Step: 382150 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:40:22,001-Speed 2625.98 samples/sec Loss 7.6680 LearningRate 0.0291 Epoch: 9 Global Step: 382160 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:40:25,901-Speed 2625.74 samples/sec Loss 7.4646 LearningRate 0.0291 Epoch: 9 Global Step: 382170 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:40:29,850-Speed 2594.13 samples/sec Loss 7.3426 LearningRate 0.0291 Epoch: 9 Global Step: 382180 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:40:33,756-Speed 2622.54 samples/sec Loss 7.1874 LearningRate 0.0291 Epoch: 9 Global Step: 382190 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:40:37,768-Speed 2552.77 samples/sec Loss 7.2823 LearningRate 0.0291 Epoch: 9 Global Step: 382200 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:40:41,806-Speed 2536.83 samples/sec Loss 7.3311 LearningRate 0.0291 Epoch: 9 Global Step: 382210 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:40:45,735-Speed 2606.58 samples/sec Loss 7.3615 LearningRate 0.0291 Epoch: 9 Global Step: 382220 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:40:49,717-Speed 2572.13 samples/sec Loss 7.2844 LearningRate 0.0291 Epoch: 9 Global Step: 382230 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:40:53,623-Speed 2622.35 samples/sec Loss 7.4938 LearningRate 0.0291 Epoch: 9 Global Step: 382240 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:40:57,526-Speed 2624.34 samples/sec Loss 7.3855 LearningRate 0.0291 Epoch: 9 Global Step: 382250 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:41:01,421-Speed 2629.26 samples/sec Loss 7.2494 LearningRate 0.0291 Epoch: 9 Global Step: 382260 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:41:05,309-Speed 2634.21 samples/sec Loss 7.1420 LearningRate 0.0291 Epoch: 9 Global Step: 382270 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:41:09,258-Speed 2593.98 samples/sec Loss 7.3399 LearningRate 0.0291 Epoch: 9 Global Step: 382280 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:41:13,157-Speed 2626.80 samples/sec Loss 7.2777 LearningRate 0.0291 Epoch: 9 Global Step: 382290 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:41:17,066-Speed 2620.15 samples/sec Loss 7.2555 LearningRate 0.0291 Epoch: 9 Global Step: 382300 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:41:20,968-Speed 2625.22 samples/sec Loss 7.3646 LearningRate 0.0291 Epoch: 9 Global Step: 382310 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:41:25,096-Speed 2481.13 samples/sec Loss 7.2017 LearningRate 0.0291 Epoch: 9 Global Step: 382320 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:41:28,994-Speed 2627.77 samples/sec Loss 7.2761 LearningRate 0.0291 Epoch: 9 Global Step: 382330 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:41:32,910-Speed 2615.05 samples/sec Loss 7.2673 LearningRate 0.0291 Epoch: 9 Global Step: 382340 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:41:36,802-Speed 2631.66 samples/sec Loss 7.3083 LearningRate 0.0291 Epoch: 9 Global Step: 382350 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:41:40,707-Speed 2622.43 samples/sec Loss 7.2714 LearningRate 0.0291 Epoch: 9 Global Step: 382360 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:41:44,600-Speed 2631.05 samples/sec Loss 7.1499 LearningRate 0.0291 Epoch: 9 Global Step: 382370 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:41:48,501-Speed 2626.37 samples/sec Loss 7.3177 LearningRate 0.0291 Epoch: 9 Global Step: 382380 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:41:52,390-Speed 2633.67 samples/sec Loss 7.2488 LearningRate 0.0291 Epoch: 9 Global Step: 382390 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:41:56,278-Speed 2634.47 samples/sec Loss 7.3252 LearningRate 0.0291 Epoch: 9 Global Step: 382400 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:42:00,176-Speed 2627.42 samples/sec Loss 7.2118 LearningRate 0.0291 Epoch: 9 Global Step: 382410 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:42:04,081-Speed 2622.35 samples/sec Loss 7.2154 LearningRate 0.0291 Epoch: 9 Global Step: 382420 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:42:07,979-Speed 2627.47 samples/sec Loss 7.2016 LearningRate 0.0291 Epoch: 9 Global Step: 382430 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:42:11,864-Speed 2636.26 samples/sec Loss 7.3534 LearningRate 0.0291 Epoch: 9 Global Step: 382440 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:42:15,767-Speed 2624.61 samples/sec Loss 7.2646 LearningRate 0.0291 Epoch: 9 Global Step: 382450 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:42:19,670-Speed 2624.25 samples/sec Loss 7.3675 LearningRate 0.0290 Epoch: 9 Global Step: 382460 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:42:23,557-Speed 2634.85 samples/sec Loss 7.1842 LearningRate 0.0290 Epoch: 9 Global Step: 382470 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:42:27,453-Speed 2629.56 samples/sec Loss 7.2553 LearningRate 0.0290 Epoch: 9 Global Step: 382480 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:42:31,353-Speed 2626.12 samples/sec Loss 7.1649 LearningRate 0.0290 Epoch: 9 Global Step: 382490 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:42:35,247-Speed 2629.84 samples/sec Loss 7.2986 LearningRate 0.0290 Epoch: 9 Global Step: 382500 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:42:39,139-Speed 2631.67 samples/sec Loss 7.1637 LearningRate 0.0290 Epoch: 9 Global Step: 382510 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:42:43,037-Speed 2627.60 samples/sec Loss 7.1309 LearningRate 0.0290 Epoch: 9 Global Step: 382520 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:42:46,945-Speed 2620.59 samples/sec Loss 7.3324 LearningRate 0.0290 Epoch: 9 Global Step: 382530 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:42:50,841-Speed 2629.53 samples/sec Loss 7.3476 LearningRate 0.0290 Epoch: 9 Global Step: 382540 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:42:54,743-Speed 2624.29 samples/sec Loss 7.2153 LearningRate 0.0290 Epoch: 9 Global Step: 382550 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:42:58,641-Speed 2627.98 samples/sec Loss 7.2190 LearningRate 0.0290 Epoch: 9 Global Step: 382560 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:43:02,523-Speed 2638.40 samples/sec Loss 7.3183 LearningRate 0.0290 Epoch: 9 Global Step: 382570 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:43:07,438-Speed 2083.76 samples/sec Loss 7.2252 LearningRate 0.0290 Epoch: 9 Global Step: 382580 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:43:11,334-Speed 2629.36 samples/sec Loss 7.3161 LearningRate 0.0290 Epoch: 9 Global Step: 382590 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:43:15,252-Speed 2614.31 samples/sec Loss 7.2200 LearningRate 0.0290 Epoch: 9 Global Step: 382600 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:43:19,158-Speed 2622.57 samples/sec Loss 7.2004 LearningRate 0.0290 Epoch: 9 Global Step: 382610 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:43:23,053-Speed 2629.43 samples/sec Loss 7.1423 LearningRate 0.0290 Epoch: 9 Global Step: 382620 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:43:26,956-Speed 2624.59 samples/sec Loss 7.3195 LearningRate 0.0290 Epoch: 9 Global Step: 382630 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:43:30,864-Speed 2620.98 samples/sec Loss 7.3201 LearningRate 0.0290 Epoch: 9 Global Step: 382640 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:43:34,793-Speed 2606.70 samples/sec Loss 7.3078 LearningRate 0.0290 Epoch: 9 Global Step: 382650 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:43:38,760-Speed 2582.10 samples/sec Loss 7.1917 LearningRate 0.0290 Epoch: 9 Global Step: 382660 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:43:42,655-Speed 2629.70 samples/sec Loss 7.3805 LearningRate 0.0290 Epoch: 9 Global Step: 382670 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:43:46,561-Speed 2622.31 samples/sec Loss 7.2444 LearningRate 0.0290 Epoch: 9 Global Step: 382680 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:43:50,476-Speed 2616.60 samples/sec Loss 7.1194 LearningRate 0.0290 Epoch: 9 Global Step: 382690 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:43:54,380-Speed 2623.32 samples/sec Loss 7.3731 LearningRate 0.0290 Epoch: 9 Global Step: 382700 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:43:58,282-Speed 2624.86 samples/sec Loss 7.2754 LearningRate 0.0290 Epoch: 9 Global Step: 382710 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:44:02,174-Speed 2632.08 samples/sec Loss 7.6548 LearningRate 0.0290 Epoch: 9 Global Step: 382720 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:44:06,074-Speed 2626.17 samples/sec Loss 7.6230 LearningRate 0.0290 Epoch: 9 Global Step: 382730 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:44:10,021-Speed 2594.85 samples/sec Loss 7.2889 LearningRate 0.0290 Epoch: 9 Global Step: 382740 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:44:13,927-Speed 2622.03 samples/sec Loss 7.2939 LearningRate 0.0290 Epoch: 9 Global Step: 382750 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:44:17,825-Speed 2627.62 samples/sec Loss 7.3349 LearningRate 0.0290 Epoch: 9 Global Step: 382760 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:44:21,756-Speed 2605.44 samples/sec Loss 7.2081 LearningRate 0.0290 Epoch: 9 Global Step: 382770 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:44:25,698-Speed 2598.67 samples/sec Loss 7.3071 LearningRate 0.0290 Epoch: 9 Global Step: 382780 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:44:29,597-Speed 2627.16 samples/sec Loss 7.5249 LearningRate 0.0290 Epoch: 9 Global Step: 382790 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:44:33,605-Speed 2555.66 samples/sec Loss 7.3362 LearningRate 0.0290 Epoch: 9 Global Step: 382800 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:44:37,506-Speed 2624.99 samples/sec Loss 7.3559 LearningRate 0.0290 Epoch: 9 Global Step: 382810 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:44:41,407-Speed 2626.00 samples/sec Loss 7.2748 LearningRate 0.0290 Epoch: 9 Global Step: 382820 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:44:45,303-Speed 2628.80 samples/sec Loss 7.3207 LearningRate 0.0290 Epoch: 9 Global Step: 382830 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:44:49,191-Speed 2634.46 samples/sec Loss 7.3733 LearningRate 0.0290 Epoch: 9 Global Step: 382840 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:44:53,085-Speed 2630.52 samples/sec Loss 7.3076 LearningRate 0.0290 Epoch: 9 Global Step: 382850 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:44:56,987-Speed 2624.55 samples/sec Loss 7.2252 LearningRate 0.0290 Epoch: 9 Global Step: 382860 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:45:00,885-Speed 2628.34 samples/sec Loss 7.2488 LearningRate 0.0290 Epoch: 9 Global Step: 382870 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:45:04,784-Speed 2626.88 samples/sec Loss 7.2864 LearningRate 0.0290 Epoch: 9 Global Step: 382880 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:45:08,700-Speed 2615.76 samples/sec Loss 7.3032 LearningRate 0.0290 Epoch: 9 Global Step: 382890 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:45:12,618-Speed 2613.77 samples/sec Loss 7.1321 LearningRate 0.0290 Epoch: 9 Global Step: 382900 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:45:16,504-Speed 2635.60 samples/sec Loss 7.3772 LearningRate 0.0290 Epoch: 9 Global Step: 382910 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:45:20,405-Speed 2625.36 samples/sec Loss 7.2918 LearningRate 0.0290 Epoch: 9 Global Step: 382920 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:45:24,305-Speed 2626.61 samples/sec Loss 7.2155 LearningRate 0.0290 Epoch: 9 Global Step: 382930 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:45:28,198-Speed 2631.14 samples/sec Loss 7.3712 LearningRate 0.0290 Epoch: 9 Global Step: 382940 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:45:32,104-Speed 2622.88 samples/sec Loss 7.2149 LearningRate 0.0290 Epoch: 9 Global Step: 382950 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:45:36,024-Speed 2612.65 samples/sec Loss 7.2466 LearningRate 0.0290 Epoch: 9 Global Step: 382960 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:45:39,941-Speed 2614.77 samples/sec Loss 7.2692 LearningRate 0.0290 Epoch: 9 Global Step: 382970 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:45:43,831-Speed 2633.12 samples/sec Loss 7.1910 LearningRate 0.0290 Epoch: 9 Global Step: 382980 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:45:47,729-Speed 2627.42 samples/sec Loss 7.2548 LearningRate 0.0290 Epoch: 9 Global Step: 382990 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:45:51,620-Speed 2632.57 samples/sec Loss 7.2474 LearningRate 0.0290 Epoch: 9 Global Step: 383000 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:45:55,517-Speed 2628.29 samples/sec Loss 7.2435 LearningRate 0.0290 Epoch: 9 Global Step: 383010 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:45:59,407-Speed 2633.00 samples/sec Loss 7.2056 LearningRate 0.0290 Epoch: 9 Global Step: 383020 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:46:03,317-Speed 2619.46 samples/sec Loss 7.1142 LearningRate 0.0290 Epoch: 9 Global Step: 383030 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:46:07,242-Speed 2609.37 samples/sec Loss 7.1827 LearningRate 0.0290 Epoch: 9 Global Step: 383040 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:46:11,135-Speed 2630.85 samples/sec Loss 7.2141 LearningRate 0.0290 Epoch: 9 Global Step: 383050 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:46:15,038-Speed 2624.21 samples/sec Loss 7.2654 LearningRate 0.0290 Epoch: 9 Global Step: 383060 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:46:18,863-Speed 2678.11 samples/sec Loss 7.5155 LearningRate 0.0290 Epoch: 9 Global Step: 383070 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:46:22,786-Speed 2610.74 samples/sec Loss 7.2722 LearningRate 0.0290 Epoch: 9 Global Step: 383080 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:46:26,679-Speed 2630.46 samples/sec Loss 7.2946 LearningRate 0.0290 Epoch: 9 Global Step: 383090 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:46:30,570-Speed 2633.09 samples/sec Loss 7.1455 LearningRate 0.0290 Epoch: 9 Global Step: 383100 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:46:34,471-Speed 2626.07 samples/sec Loss 7.1706 LearningRate 0.0290 Epoch: 9 Global Step: 383110 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:46:38,361-Speed 2632.85 samples/sec Loss 7.2529 LearningRate 0.0290 Epoch: 9 Global Step: 383120 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:46:42,253-Speed 2631.28 samples/sec Loss 7.2173 LearningRate 0.0290 Epoch: 9 Global Step: 383130 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:46:46,142-Speed 2633.54 samples/sec Loss 7.2944 LearningRate 0.0290 Epoch: 9 Global Step: 383140 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:46:50,042-Speed 2626.47 samples/sec Loss 7.2431 LearningRate 0.0290 Epoch: 9 Global Step: 383150 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:46:53,944-Speed 2625.19 samples/sec Loss 7.2805 LearningRate 0.0290 Epoch: 9 Global Step: 383160 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 14:46:57,835-Speed 2631.89 samples/sec Loss 7.3382 LearningRate 0.0290 Epoch: 9 Global Step: 383170 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:47:01,736-Speed 2625.35 samples/sec Loss 7.2763 LearningRate 0.0290 Epoch: 9 Global Step: 383180 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:47:05,624-Speed 2634.30 samples/sec Loss 7.4516 LearningRate 0.0290 Epoch: 9 Global Step: 383190 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:47:09,529-Speed 2623.82 samples/sec Loss 7.2859 LearningRate 0.0290 Epoch: 9 Global Step: 383200 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:47:13,451-Speed 2611.23 samples/sec Loss 7.1825 LearningRate 0.0290 Epoch: 9 Global Step: 383210 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:47:17,377-Speed 2608.30 samples/sec Loss 7.2442 LearningRate 0.0290 Epoch: 9 Global Step: 383220 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:47:21,274-Speed 2628.32 samples/sec Loss 7.1417 LearningRate 0.0289 Epoch: 9 Global Step: 383230 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:47:25,170-Speed 2629.14 samples/sec Loss 7.2105 LearningRate 0.0289 Epoch: 9 Global Step: 383240 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:47:29,069-Speed 2626.97 samples/sec Loss 7.2447 LearningRate 0.0289 Epoch: 9 Global Step: 383250 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:47:33,011-Speed 2598.60 samples/sec Loss 7.2591 LearningRate 0.0289 Epoch: 9 Global Step: 383260 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:47:36,913-Speed 2625.06 samples/sec Loss 7.2469 LearningRate 0.0289 Epoch: 9 Global Step: 383270 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:47:40,818-Speed 2622.65 samples/sec Loss 7.2619 LearningRate 0.0289 Epoch: 9 Global Step: 383280 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:47:44,712-Speed 2630.51 samples/sec Loss 7.1968 LearningRate 0.0289 Epoch: 9 Global Step: 383290 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:47:48,607-Speed 2629.39 samples/sec Loss 7.3395 LearningRate 0.0289 Epoch: 9 Global Step: 383300 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:47:52,519-Speed 2618.43 samples/sec Loss 7.1662 LearningRate 0.0289 Epoch: 9 Global Step: 383310 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:47:56,416-Speed 2628.12 samples/sec Loss 7.2528 LearningRate 0.0289 Epoch: 9 Global Step: 383320 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:48:00,306-Speed 2633.07 samples/sec Loss 7.2527 LearningRate 0.0289 Epoch: 9 Global Step: 383330 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:48:04,203-Speed 2628.58 samples/sec Loss 7.2206 LearningRate 0.0289 Epoch: 9 Global Step: 383340 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:48:08,095-Speed 2631.69 samples/sec Loss 7.3494 LearningRate 0.0289 Epoch: 9 Global Step: 383350 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:48:11,995-Speed 2625.94 samples/sec Loss 7.2724 LearningRate 0.0289 Epoch: 9 Global Step: 383360 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:48:15,875-Speed 2639.78 samples/sec Loss 7.5958 LearningRate 0.0289 Epoch: 9 Global Step: 383370 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:48:19,796-Speed 2612.95 samples/sec Loss 7.3840 LearningRate 0.0289 Epoch: 9 Global Step: 383380 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:48:23,700-Speed 2622.96 samples/sec Loss 7.3030 LearningRate 0.0289 Epoch: 9 Global Step: 383390 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:48:27,599-Speed 2627.22 samples/sec Loss 7.2379 LearningRate 0.0289 Epoch: 9 Global Step: 383400 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:48:31,502-Speed 2624.59 samples/sec Loss 7.2186 LearningRate 0.0289 Epoch: 9 Global Step: 383410 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:48:35,392-Speed 2632.95 samples/sec Loss 7.2766 LearningRate 0.0289 Epoch: 9 Global Step: 383420 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:48:39,287-Speed 2629.44 samples/sec Loss 7.1863 LearningRate 0.0289 Epoch: 9 Global Step: 383430 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:48:43,189-Speed 2625.57 samples/sec Loss 7.1937 LearningRate 0.0289 Epoch: 9 Global Step: 383440 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:48:47,093-Speed 2623.30 samples/sec Loss 7.3441 LearningRate 0.0289 Epoch: 9 Global Step: 383450 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:48:51,016-Speed 2611.16 samples/sec Loss 7.2801 LearningRate 0.0289 Epoch: 9 Global Step: 383460 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:48:54,922-Speed 2622.06 samples/sec Loss 7.1539 LearningRate 0.0289 Epoch: 9 Global Step: 383470 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:48:58,849-Speed 2609.03 samples/sec Loss 7.2993 LearningRate 0.0289 Epoch: 9 Global Step: 383480 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:49:02,743-Speed 2629.55 samples/sec Loss 7.1645 LearningRate 0.0289 Epoch: 9 Global Step: 383490 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:49:06,654-Speed 2619.64 samples/sec Loss 7.2330 LearningRate 0.0289 Epoch: 9 Global Step: 383500 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:49:10,553-Speed 2626.43 samples/sec Loss 7.2721 LearningRate 0.0289 Epoch: 9 Global Step: 383510 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:49:14,474-Speed 2612.79 samples/sec Loss 7.1621 LearningRate 0.0289 Epoch: 9 Global Step: 383520 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:49:18,380-Speed 2622.01 samples/sec Loss 7.1604 LearningRate 0.0289 Epoch: 9 Global Step: 383530 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:49:22,287-Speed 2622.02 samples/sec Loss 7.2195 LearningRate 0.0289 Epoch: 9 Global Step: 383540 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:49:26,176-Speed 2633.15 samples/sec Loss 7.3625 LearningRate 0.0289 Epoch: 9 Global Step: 383550 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:49:30,075-Speed 2627.27 samples/sec Loss 7.3287 LearningRate 0.0289 Epoch: 9 Global Step: 383560 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:49:33,979-Speed 2623.98 samples/sec Loss 7.3968 LearningRate 0.0289 Epoch: 9 Global Step: 383570 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:49:37,877-Speed 2627.18 samples/sec Loss 7.3169 LearningRate 0.0289 Epoch: 9 Global Step: 383580 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:49:41,792-Speed 2616.13 samples/sec Loss 7.2265 LearningRate 0.0289 Epoch: 9 Global Step: 383590 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:49:45,700-Speed 2622.18 samples/sec Loss 7.0870 LearningRate 0.0289 Epoch: 9 Global Step: 383600 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:49:49,602-Speed 2624.80 samples/sec Loss 7.3123 LearningRate 0.0289 Epoch: 9 Global Step: 383610 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:49:53,502-Speed 2626.04 samples/sec Loss 7.2640 LearningRate 0.0289 Epoch: 9 Global Step: 383620 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:49:57,511-Speed 2555.56 samples/sec Loss 7.1340 LearningRate 0.0289 Epoch: 9 Global Step: 383630 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:50:01,512-Speed 2559.50 samples/sec Loss 7.3376 LearningRate 0.0289 Epoch: 9 Global Step: 383640 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:50:05,410-Speed 2627.95 samples/sec Loss 7.2897 LearningRate 0.0289 Epoch: 9 Global Step: 383650 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:50:09,305-Speed 2629.10 samples/sec Loss 7.3124 LearningRate 0.0289 Epoch: 9 Global Step: 383660 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:50:13,187-Speed 2638.93 samples/sec Loss 7.1544 LearningRate 0.0289 Epoch: 9 Global Step: 383670 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:50:17,082-Speed 2629.42 samples/sec Loss 7.3642 LearningRate 0.0289 Epoch: 9 Global Step: 383680 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:50:20,992-Speed 2620.20 samples/sec Loss 7.3234 LearningRate 0.0289 Epoch: 9 Global Step: 383690 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:50:24,896-Speed 2624.01 samples/sec Loss 7.2930 LearningRate 0.0289 Epoch: 9 Global Step: 383700 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:50:28,811-Speed 2615.78 samples/sec Loss 7.3506 LearningRate 0.0289 Epoch: 9 Global Step: 383710 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:50:32,712-Speed 2625.71 samples/sec Loss 7.2289 LearningRate 0.0289 Epoch: 9 Global Step: 383720 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:50:36,614-Speed 2625.45 samples/sec Loss 7.2180 LearningRate 0.0289 Epoch: 9 Global Step: 383730 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:50:40,527-Speed 2617.33 samples/sec Loss 7.2231 LearningRate 0.0289 Epoch: 9 Global Step: 383740 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:50:44,417-Speed 2633.02 samples/sec Loss 7.2469 LearningRate 0.0289 Epoch: 9 Global Step: 383750 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:50:48,308-Speed 2632.74 samples/sec Loss 7.2404 LearningRate 0.0289 Epoch: 9 Global Step: 383760 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:50:52,203-Speed 2629.57 samples/sec Loss 7.2504 LearningRate 0.0289 Epoch: 9 Global Step: 383770 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 14:50:56,089-Speed 2635.80 samples/sec Loss 7.2056 LearningRate 0.0289 Epoch: 9 Global Step: 383780 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:50:59,984-Speed 2629.07 samples/sec Loss 7.2001 LearningRate 0.0289 Epoch: 9 Global Step: 383790 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:51:03,879-Speed 2630.23 samples/sec Loss 7.2860 LearningRate 0.0289 Epoch: 9 Global Step: 383800 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:51:07,777-Speed 2627.76 samples/sec Loss 7.3349 LearningRate 0.0289 Epoch: 9 Global Step: 383810 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:51:11,662-Speed 2636.42 samples/sec Loss 7.4179 LearningRate 0.0289 Epoch: 9 Global Step: 383820 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:51:15,562-Speed 2625.94 samples/sec Loss 7.3077 LearningRate 0.0289 Epoch: 9 Global Step: 383830 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:51:19,444-Speed 2639.03 samples/sec Loss 7.6078 LearningRate 0.0289 Epoch: 9 Global Step: 383840 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:51:23,339-Speed 2629.13 samples/sec Loss 7.3694 LearningRate 0.0289 Epoch: 9 Global Step: 383850 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:51:27,217-Speed 2641.17 samples/sec Loss 7.4318 LearningRate 0.0289 Epoch: 9 Global Step: 383860 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:51:31,114-Speed 2628.29 samples/sec Loss 7.4806 LearningRate 0.0289 Epoch: 9 Global Step: 383870 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:51:35,014-Speed 2626.45 samples/sec Loss 7.1989 LearningRate 0.0289 Epoch: 9 Global Step: 383880 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:51:38,906-Speed 2631.74 samples/sec Loss 7.2872 LearningRate 0.0289 Epoch: 9 Global Step: 383890 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:51:42,799-Speed 2631.22 samples/sec Loss 7.4352 LearningRate 0.0289 Epoch: 9 Global Step: 383900 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:51:46,693-Speed 2629.76 samples/sec Loss 7.3393 LearningRate 0.0289 Epoch: 9 Global Step: 383910 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:51:50,675-Speed 2572.54 samples/sec Loss 7.0813 LearningRate 0.0289 Epoch: 9 Global Step: 383920 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:51:54,574-Speed 2627.67 samples/sec Loss 7.2013 LearningRate 0.0289 Epoch: 9 Global Step: 383930 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:51:58,471-Speed 2628.14 samples/sec Loss 7.2085 LearningRate 0.0289 Epoch: 9 Global Step: 383940 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:52:02,368-Speed 2628.43 samples/sec Loss 7.1543 LearningRate 0.0289 Epoch: 9 Global Step: 383950 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 14:52:06,270-Speed 2625.51 samples/sec Loss 7.1049 LearningRate 0.0289 Epoch: 9 Global Step: 383960 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:52:10,165-Speed 2629.22 samples/sec Loss 7.3653 LearningRate 0.0289 Epoch: 9 Global Step: 383970 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:52:14,063-Speed 2627.69 samples/sec Loss 7.2962 LearningRate 0.0289 Epoch: 9 Global Step: 383980 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:52:17,968-Speed 2622.83 samples/sec Loss 7.2715 LearningRate 0.0289 Epoch: 9 Global Step: 383990 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:52:21,879-Speed 2619.26 samples/sec Loss 7.2065 LearningRate 0.0288 Epoch: 9 Global Step: 384000 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:52:26,026-Speed 2470.19 samples/sec Loss 7.3075 LearningRate 0.0288 Epoch: 9 Global Step: 384010 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:52:29,972-Speed 2595.92 samples/sec Loss 7.1102 LearningRate 0.0288 Epoch: 9 Global Step: 384020 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:52:33,864-Speed 2631.90 samples/sec Loss 7.3286 LearningRate 0.0288 Epoch: 9 Global Step: 384030 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:52:37,772-Speed 2620.77 samples/sec Loss 7.1894 LearningRate 0.0288 Epoch: 9 Global Step: 384040 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:52:41,671-Speed 2626.46 samples/sec Loss 7.2408 LearningRate 0.0288 Epoch: 9 Global Step: 384050 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:52:45,626-Speed 2590.25 samples/sec Loss 7.3274 LearningRate 0.0288 Epoch: 9 Global Step: 384060 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:52:49,524-Speed 2627.83 samples/sec Loss 7.2646 LearningRate 0.0288 Epoch: 9 Global Step: 384070 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:52:53,519-Speed 2563.95 samples/sec Loss 7.3414 LearningRate 0.0288 Epoch: 9 Global Step: 384080 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:52:57,418-Speed 2626.95 samples/sec Loss 7.1795 LearningRate 0.0288 Epoch: 9 Global Step: 384090 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:53:01,377-Speed 2587.23 samples/sec Loss 7.2068 LearningRate 0.0288 Epoch: 9 Global Step: 384100 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:53:05,275-Speed 2627.74 samples/sec Loss 7.1159 LearningRate 0.0288 Epoch: 9 Global Step: 384110 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:53:09,180-Speed 2622.96 samples/sec Loss 7.2342 LearningRate 0.0288 Epoch: 9 Global Step: 384120 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:53:13,074-Speed 2630.28 samples/sec Loss 7.1565 LearningRate 0.0288 Epoch: 9 Global Step: 384130 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:53:16,969-Speed 2629.53 samples/sec Loss 7.3229 LearningRate 0.0288 Epoch: 9 Global Step: 384140 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:53:20,859-Speed 2633.15 samples/sec Loss 7.3506 LearningRate 0.0288 Epoch: 9 Global Step: 384150 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:53:24,763-Speed 2623.81 samples/sec Loss 7.1925 LearningRate 0.0288 Epoch: 9 Global Step: 384160 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:53:28,665-Speed 2625.35 samples/sec Loss 7.3010 LearningRate 0.0288 Epoch: 9 Global Step: 384170 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:53:32,561-Speed 2628.96 samples/sec Loss 7.1872 LearningRate 0.0288 Epoch: 9 Global Step: 384180 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:53:36,452-Speed 2631.70 samples/sec Loss 7.3152 LearningRate 0.0288 Epoch: 9 Global Step: 384190 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:53:40,346-Speed 2630.56 samples/sec Loss 7.1093 LearningRate 0.0288 Epoch: 9 Global Step: 384200 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:53:44,248-Speed 2624.81 samples/sec Loss 7.2603 LearningRate 0.0288 Epoch: 9 Global Step: 384210 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:53:48,154-Speed 2622.64 samples/sec Loss 7.2691 LearningRate 0.0288 Epoch: 9 Global Step: 384220 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:53:52,056-Speed 2625.02 samples/sec Loss 7.1615 LearningRate 0.0288 Epoch: 9 Global Step: 384230 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:53:55,997-Speed 2598.70 samples/sec Loss 7.1160 LearningRate 0.0288 Epoch: 9 Global Step: 384240 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:53:59,893-Speed 2629.48 samples/sec Loss 7.3556 LearningRate 0.0288 Epoch: 9 Global Step: 384250 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:54:03,790-Speed 2628.05 samples/sec Loss 7.1507 LearningRate 0.0288 Epoch: 9 Global Step: 384260 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:54:07,689-Speed 2627.24 samples/sec Loss 7.2083 LearningRate 0.0288 Epoch: 9 Global Step: 384270 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:54:11,596-Speed 2621.02 samples/sec Loss 7.1889 LearningRate 0.0288 Epoch: 9 Global Step: 384280 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:54:15,491-Speed 2629.91 samples/sec Loss 7.2626 LearningRate 0.0288 Epoch: 9 Global Step: 384290 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:54:19,414-Speed 2610.46 samples/sec Loss 7.2529 LearningRate 0.0288 Epoch: 9 Global Step: 384300 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:54:23,414-Speed 2561.05 samples/sec Loss 7.2375 LearningRate 0.0288 Epoch: 9 Global Step: 384310 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:54:27,324-Speed 2619.69 samples/sec Loss 7.3642 LearningRate 0.0288 Epoch: 9 Global Step: 384320 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:54:31,215-Speed 2632.38 samples/sec Loss 7.3003 LearningRate 0.0288 Epoch: 9 Global Step: 384330 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:54:35,115-Speed 2626.12 samples/sec Loss 7.1035 LearningRate 0.0288 Epoch: 9 Global Step: 384340 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 14:54:39,010-Speed 2629.34 samples/sec Loss 7.2814 LearningRate 0.0288 Epoch: 9 Global Step: 384350 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:54:42,907-Speed 2628.25 samples/sec Loss 7.2108 LearningRate 0.0288 Epoch: 9 Global Step: 384360 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:54:46,803-Speed 2629.08 samples/sec Loss 7.1188 LearningRate 0.0288 Epoch: 9 Global Step: 384370 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:54:50,697-Speed 2630.38 samples/sec Loss 7.3364 LearningRate 0.0288 Epoch: 9 Global Step: 384380 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:54:54,594-Speed 2628.36 samples/sec Loss 7.2628 LearningRate 0.0288 Epoch: 9 Global Step: 384390 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:54:58,485-Speed 2632.56 samples/sec Loss 7.2243 LearningRate 0.0288 Epoch: 9 Global Step: 384400 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:55:02,381-Speed 2628.95 samples/sec Loss 7.2254 LearningRate 0.0288 Epoch: 9 Global Step: 384410 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:55:06,270-Speed 2633.16 samples/sec Loss 7.2706 LearningRate 0.0288 Epoch: 9 Global Step: 384420 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:55:10,164-Speed 2630.55 samples/sec Loss 7.2864 LearningRate 0.0288 Epoch: 9 Global Step: 384430 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:55:14,057-Speed 2631.24 samples/sec Loss 7.2659 LearningRate 0.0288 Epoch: 9 Global Step: 384440 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:55:17,943-Speed 2635.93 samples/sec Loss 7.1528 LearningRate 0.0288 Epoch: 9 Global Step: 384450 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 14:55:21,830-Speed 2635.38 samples/sec Loss 7.2110 LearningRate 0.0288 Epoch: 9 Global Step: 384460 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:55:25,727-Speed 2628.28 samples/sec Loss 7.0954 LearningRate 0.0288 Epoch: 9 Global Step: 384470 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:55:29,623-Speed 2629.19 samples/sec Loss 7.2179 LearningRate 0.0288 Epoch: 9 Global Step: 384480 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:55:33,554-Speed 2605.78 samples/sec Loss 7.3578 LearningRate 0.0288 Epoch: 9 Global Step: 384490 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:55:37,451-Speed 2627.97 samples/sec Loss 7.3620 LearningRate 0.0288 Epoch: 9 Global Step: 384500 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:55:41,353-Speed 2625.26 samples/sec Loss 7.2919 LearningRate 0.0288 Epoch: 9 Global Step: 384510 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:55:45,252-Speed 2627.04 samples/sec Loss 7.2035 LearningRate 0.0288 Epoch: 9 Global Step: 384520 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:55:49,147-Speed 2629.53 samples/sec Loss 7.1337 LearningRate 0.0288 Epoch: 9 Global Step: 384530 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:55:53,045-Speed 2627.66 samples/sec Loss 7.3717 LearningRate 0.0288 Epoch: 9 Global Step: 384540 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:55:56,936-Speed 2632.30 samples/sec Loss 7.2091 LearningRate 0.0288 Epoch: 9 Global Step: 384550 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:56:00,828-Speed 2631.67 samples/sec Loss 7.1978 LearningRate 0.0288 Epoch: 9 Global Step: 384560 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 14:56:04,710-Speed 2638.39 samples/sec Loss 7.2025 LearningRate 0.0288 Epoch: 9 Global Step: 384570 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:56:08,611-Speed 2626.03 samples/sec Loss 7.0935 LearningRate 0.0288 Epoch: 9 Global Step: 384580 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:56:12,526-Speed 2615.90 samples/sec Loss 7.1762 LearningRate 0.0288 Epoch: 9 Global Step: 384590 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:56:16,422-Speed 2629.26 samples/sec Loss 7.2295 LearningRate 0.0288 Epoch: 9 Global Step: 384600 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:56:20,329-Speed 2621.53 samples/sec Loss 7.1522 LearningRate 0.0288 Epoch: 9 Global Step: 384610 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:56:24,216-Speed 2635.05 samples/sec Loss 7.2398 LearningRate 0.0288 Epoch: 9 Global Step: 384620 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:56:28,115-Speed 2627.43 samples/sec Loss 7.2061 LearningRate 0.0288 Epoch: 9 Global Step: 384630 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:56:32,021-Speed 2621.72 samples/sec Loss 7.2410 LearningRate 0.0288 Epoch: 9 Global Step: 384640 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:56:35,922-Speed 2626.51 samples/sec Loss 7.2486 LearningRate 0.0288 Epoch: 9 Global Step: 384650 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:56:39,816-Speed 2630.43 samples/sec Loss 7.1669 LearningRate 0.0288 Epoch: 9 Global Step: 384660 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:56:43,705-Speed 2633.19 samples/sec Loss 7.2542 LearningRate 0.0288 Epoch: 9 Global Step: 384670 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 14:56:47,608-Speed 2623.97 samples/sec Loss 7.2040 LearningRate 0.0288 Epoch: 9 Global Step: 384680 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 14:56:51,514-Speed 2622.39 samples/sec Loss 7.2102 LearningRate 0.0288 Epoch: 9 Global Step: 384690 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 14:56:55,416-Speed 2625.58 samples/sec Loss 7.1937 LearningRate 0.0288 Epoch: 9 Global Step: 384700 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:56:59,317-Speed 2625.37 samples/sec Loss 7.2500 LearningRate 0.0288 Epoch: 9 Global Step: 384710 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:57:03,214-Speed 2628.37 samples/sec Loss 7.2603 LearningRate 0.0288 Epoch: 9 Global Step: 384720 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:57:07,114-Speed 2626.20 samples/sec Loss 7.1265 LearningRate 0.0288 Epoch: 9 Global Step: 384730 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:57:11,005-Speed 2632.55 samples/sec Loss 7.2091 LearningRate 0.0288 Epoch: 9 Global Step: 384740 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:57:14,909-Speed 2623.10 samples/sec Loss 7.0500 LearningRate 0.0288 Epoch: 9 Global Step: 384750 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:57:18,823-Speed 2617.05 samples/sec Loss 7.1965 LearningRate 0.0288 Epoch: 9 Global Step: 384760 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:57:22,720-Speed 2628.17 samples/sec Loss 7.0831 LearningRate 0.0287 Epoch: 9 Global Step: 384770 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:57:26,618-Speed 2628.07 samples/sec Loss 7.1535 LearningRate 0.0287 Epoch: 9 Global Step: 384780 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:57:30,515-Speed 2628.41 samples/sec Loss 7.2706 LearningRate 0.0287 Epoch: 9 Global Step: 384790 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:57:34,418-Speed 2624.08 samples/sec Loss 7.3214 LearningRate 0.0287 Epoch: 9 Global Step: 384800 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 14:57:38,307-Speed 2633.55 samples/sec Loss 7.1613 LearningRate 0.0287 Epoch: 9 Global Step: 384810 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:57:42,232-Speed 2609.55 samples/sec Loss 7.2296 LearningRate 0.0287 Epoch: 9 Global Step: 384820 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:57:46,141-Speed 2620.09 samples/sec Loss 7.2293 LearningRate 0.0287 Epoch: 9 Global Step: 384830 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:57:50,073-Speed 2605.58 samples/sec Loss 7.3045 LearningRate 0.0287 Epoch: 9 Global Step: 384840 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:57:53,970-Speed 2627.72 samples/sec Loss 7.2298 LearningRate 0.0287 Epoch: 9 Global Step: 384850 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:57:57,859-Speed 2633.85 samples/sec Loss 7.1226 LearningRate 0.0287 Epoch: 9 Global Step: 384860 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:58:01,769-Speed 2620.01 samples/sec Loss 7.2248 LearningRate 0.0287 Epoch: 9 Global Step: 384870 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:58:05,674-Speed 2623.07 samples/sec Loss 7.2732 LearningRate 0.0287 Epoch: 9 Global Step: 384880 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:58:09,586-Speed 2618.46 samples/sec Loss 7.1588 LearningRate 0.0287 Epoch: 9 Global Step: 384890 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:58:13,513-Speed 2608.00 samples/sec Loss 7.0779 LearningRate 0.0287 Epoch: 9 Global Step: 384900 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:58:17,428-Speed 2615.91 samples/sec Loss 7.2434 LearningRate 0.0287 Epoch: 9 Global Step: 384910 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 14:58:21,325-Speed 2628.70 samples/sec Loss 7.1069 LearningRate 0.0287 Epoch: 9 Global Step: 384920 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 14:58:25,218-Speed 2631.23 samples/sec Loss 7.3064 LearningRate 0.0287 Epoch: 9 Global Step: 384930 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:58:29,131-Speed 2617.59 samples/sec Loss 7.2546 LearningRate 0.0287 Epoch: 9 Global Step: 384940 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:58:33,029-Speed 2627.89 samples/sec Loss 7.2615 LearningRate 0.0287 Epoch: 9 Global Step: 384950 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:58:36,923-Speed 2630.32 samples/sec Loss 7.2293 LearningRate 0.0287 Epoch: 9 Global Step: 384960 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:58:40,825-Speed 2625.20 samples/sec Loss 7.2563 LearningRate 0.0287 Epoch: 9 Global Step: 384970 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:58:44,711-Speed 2635.34 samples/sec Loss 7.2227 LearningRate 0.0287 Epoch: 9 Global Step: 384980 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:58:48,621-Speed 2620.25 samples/sec Loss 7.2160 LearningRate 0.0287 Epoch: 9 Global Step: 384990 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:58:52,521-Speed 2625.71 samples/sec Loss 7.2472 LearningRate 0.0287 Epoch: 9 Global Step: 385000 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:58:56,412-Speed 2632.52 samples/sec Loss 7.2273 LearningRate 0.0287 Epoch: 9 Global Step: 385010 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:59:00,308-Speed 2629.13 samples/sec Loss 7.1882 LearningRate 0.0287 Epoch: 9 Global Step: 385020 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:59:04,202-Speed 2630.47 samples/sec Loss 7.2040 LearningRate 0.0287 Epoch: 9 Global Step: 385030 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 14:59:08,094-Speed 2631.21 samples/sec Loss 7.1132 LearningRate 0.0287 Epoch: 9 Global Step: 385040 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 14:59:11,974-Speed 2640.11 samples/sec Loss 7.3426 LearningRate 0.0287 Epoch: 9 Global Step: 385050 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:59:15,871-Speed 2628.35 samples/sec Loss 7.2144 LearningRate 0.0287 Epoch: 9 Global Step: 385060 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:59:19,773-Speed 2624.85 samples/sec Loss 7.1851 LearningRate 0.0287 Epoch: 9 Global Step: 385070 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:59:23,677-Speed 2624.19 samples/sec Loss 7.2135 LearningRate 0.0287 Epoch: 9 Global Step: 385080 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:59:27,572-Speed 2629.65 samples/sec Loss 7.2602 LearningRate 0.0287 Epoch: 9 Global Step: 385090 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:59:31,467-Speed 2629.88 samples/sec Loss 7.3025 LearningRate 0.0287 Epoch: 9 Global Step: 385100 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 14:59:35,333-Speed 2648.92 samples/sec Loss 7.5903 LearningRate 0.0287 Epoch: 9 Global Step: 385110 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:59:39,238-Speed 2622.90 samples/sec Loss 7.6459 LearningRate 0.0287 Epoch: 9 Global Step: 385120 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:59:43,159-Speed 2612.00 samples/sec Loss 7.4114 LearningRate 0.0287 Epoch: 9 Global Step: 385130 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:59:47,054-Speed 2630.47 samples/sec Loss 7.3449 LearningRate 0.0287 Epoch: 9 Global Step: 385140 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:59:50,946-Speed 2631.65 samples/sec Loss 7.3138 LearningRate 0.0287 Epoch: 9 Global Step: 385150 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:59:54,859-Speed 2617.57 samples/sec Loss 7.2395 LearningRate 0.0287 Epoch: 9 Global Step: 385160 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 14:59:58,766-Speed 2621.73 samples/sec Loss 7.2455 LearningRate 0.0287 Epoch: 9 Global Step: 385170 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:00:02,659-Speed 2630.87 samples/sec Loss 7.0810 LearningRate 0.0287 Epoch: 9 Global Step: 385180 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:00:06,554-Speed 2629.69 samples/sec Loss 7.2623 LearningRate 0.0287 Epoch: 9 Global Step: 385190 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:00:10,444-Speed 2632.51 samples/sec Loss 7.1957 LearningRate 0.0287 Epoch: 9 Global Step: 385200 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:00:14,347-Speed 2624.39 samples/sec Loss 7.5936 LearningRate 0.0287 Epoch: 9 Global Step: 385210 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:00:18,255-Speed 2620.63 samples/sec Loss 7.3102 LearningRate 0.0287 Epoch: 9 Global Step: 385220 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:00:22,156-Speed 2625.76 samples/sec Loss 7.2228 LearningRate 0.0287 Epoch: 9 Global Step: 385230 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:00:26,071-Speed 2616.32 samples/sec Loss 7.1171 LearningRate 0.0287 Epoch: 9 Global Step: 385240 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:00:29,981-Speed 2620.07 samples/sec Loss 7.0837 LearningRate 0.0287 Epoch: 9 Global Step: 385250 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:00:33,883-Speed 2624.89 samples/sec Loss 7.1396 LearningRate 0.0287 Epoch: 9 Global Step: 385260 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:00:38,013-Speed 2479.53 samples/sec Loss 7.3325 LearningRate 0.0287 Epoch: 9 Global Step: 385270 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:00:41,913-Speed 2626.43 samples/sec Loss 7.1923 LearningRate 0.0287 Epoch: 9 Global Step: 385280 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:00:45,830-Speed 2615.25 samples/sec Loss 7.1690 LearningRate 0.0287 Epoch: 9 Global Step: 385290 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:00:49,729-Speed 2626.53 samples/sec Loss 7.2308 LearningRate 0.0287 Epoch: 9 Global Step: 385300 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:00:53,642-Speed 2618.37 samples/sec Loss 7.2320 LearningRate 0.0287 Epoch: 9 Global Step: 385310 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:00:57,645-Speed 2558.39 samples/sec Loss 7.4055 LearningRate 0.0287 Epoch: 9 Global Step: 385320 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:01:01,548-Speed 2624.69 samples/sec Loss 7.3799 LearningRate 0.0287 Epoch: 9 Global Step: 385330 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:01:05,443-Speed 2629.79 samples/sec Loss 7.1758 LearningRate 0.0287 Epoch: 9 Global Step: 385340 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:01:09,347-Speed 2623.36 samples/sec Loss 7.2985 LearningRate 0.0287 Epoch: 9 Global Step: 385350 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:01:13,244-Speed 2628.02 samples/sec Loss 7.1574 LearningRate 0.0287 Epoch: 9 Global Step: 385360 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:01:17,139-Speed 2630.35 samples/sec Loss 7.2481 LearningRate 0.0287 Epoch: 9 Global Step: 385370 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:01:21,032-Speed 2631.02 samples/sec Loss 7.0820 LearningRate 0.0287 Epoch: 9 Global Step: 385380 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:01:24,901-Speed 2647.03 samples/sec Loss 7.2507 LearningRate 0.0287 Epoch: 9 Global Step: 385390 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:01:28,795-Speed 2630.42 samples/sec Loss 7.3261 LearningRate 0.0287 Epoch: 9 Global Step: 385400 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:01:32,686-Speed 2632.06 samples/sec Loss 7.2531 LearningRate 0.0287 Epoch: 9 Global Step: 385410 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:01:36,590-Speed 2623.79 samples/sec Loss 7.2568 LearningRate 0.0287 Epoch: 9 Global Step: 385420 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:01:40,585-Speed 2563.65 samples/sec Loss 7.1089 LearningRate 0.0287 Epoch: 9 Global Step: 385430 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:01:45,376-Speed 2137.94 samples/sec Loss 7.1209 LearningRate 0.0287 Epoch: 9 Global Step: 385440 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:01:49,284-Speed 2620.92 samples/sec Loss 7.3102 LearningRate 0.0287 Epoch: 9 Global Step: 385450 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:01:53,191-Speed 2622.90 samples/sec Loss 7.2228 LearningRate 0.0287 Epoch: 9 Global Step: 385460 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:01:57,085-Speed 2630.86 samples/sec Loss 7.1358 LearningRate 0.0287 Epoch: 9 Global Step: 385470 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:02:01,019-Speed 2603.85 samples/sec Loss 7.2611 LearningRate 0.0287 Epoch: 9 Global Step: 385480 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:02:04,906-Speed 2635.11 samples/sec Loss 7.0823 LearningRate 0.0287 Epoch: 9 Global Step: 385490 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:02:08,813-Speed 2621.46 samples/sec Loss 7.2642 LearningRate 0.0287 Epoch: 9 Global Step: 385500 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:02:12,709-Speed 2628.59 samples/sec Loss 7.0798 LearningRate 0.0287 Epoch: 9 Global Step: 385510 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:02:16,607-Speed 2631.62 samples/sec Loss 7.2764 LearningRate 0.0287 Epoch: 9 Global Step: 385520 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:02:20,508-Speed 2625.97 samples/sec Loss 7.2245 LearningRate 0.0287 Epoch: 9 Global Step: 385530 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:02:24,406-Speed 2627.70 samples/sec Loss 7.2367 LearningRate 0.0287 Epoch: 9 Global Step: 385540 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:02:28,323-Speed 2614.85 samples/sec Loss 7.2479 LearningRate 0.0286 Epoch: 9 Global Step: 385550 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:02:32,218-Speed 2630.02 samples/sec Loss 7.1801 LearningRate 0.0286 Epoch: 9 Global Step: 385560 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:02:36,112-Speed 2630.30 samples/sec Loss 7.3332 LearningRate 0.0286 Epoch: 9 Global Step: 385570 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:02:40,014-Speed 2625.19 samples/sec Loss 7.2557 LearningRate 0.0286 Epoch: 9 Global Step: 385580 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:02:43,902-Speed 2633.74 samples/sec Loss 7.1605 LearningRate 0.0286 Epoch: 9 Global Step: 385590 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:02:47,802-Speed 2626.51 samples/sec Loss 7.1884 LearningRate 0.0286 Epoch: 9 Global Step: 385600 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:02:51,695-Speed 2630.84 samples/sec Loss 7.3000 LearningRate 0.0286 Epoch: 9 Global Step: 385610 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:02:55,597-Speed 2624.97 samples/sec Loss 7.2195 LearningRate 0.0286 Epoch: 9 Global Step: 385620 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:02:59,490-Speed 2631.09 samples/sec Loss 7.1897 LearningRate 0.0286 Epoch: 9 Global Step: 385630 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:03:03,383-Speed 2631.18 samples/sec Loss 7.1827 LearningRate 0.0286 Epoch: 9 Global Step: 385640 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:03:07,285-Speed 2625.04 samples/sec Loss 7.1779 LearningRate 0.0286 Epoch: 9 Global Step: 385650 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:03:11,190-Speed 2622.39 samples/sec Loss 7.2670 LearningRate 0.0286 Epoch: 9 Global Step: 385660 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:03:15,093-Speed 2624.18 samples/sec Loss 7.2574 LearningRate 0.0286 Epoch: 9 Global Step: 385670 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:03:19,001-Speed 2621.19 samples/sec Loss 7.1513 LearningRate 0.0286 Epoch: 9 Global Step: 385680 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:03:22,915-Speed 2616.71 samples/sec Loss 7.1504 LearningRate 0.0286 Epoch: 9 Global Step: 385690 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 15:03:26,830-Speed 2616.35 samples/sec Loss 7.2109 LearningRate 0.0286 Epoch: 9 Global Step: 385700 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:03:30,790-Speed 2587.25 samples/sec Loss 7.2080 LearningRate 0.0286 Epoch: 9 Global Step: 385710 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:03:34,697-Speed 2621.80 samples/sec Loss 7.2927 LearningRate 0.0286 Epoch: 9 Global Step: 385720 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:03:38,624-Speed 2607.85 samples/sec Loss 7.3530 LearningRate 0.0286 Epoch: 9 Global Step: 385730 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:03:42,541-Speed 2614.77 samples/sec Loss 7.1587 LearningRate 0.0286 Epoch: 9 Global Step: 385740 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:03:46,452-Speed 2619.13 samples/sec Loss 7.0864 LearningRate 0.0286 Epoch: 9 Global Step: 385750 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:03:50,307-Speed 2656.61 samples/sec Loss 7.2346 LearningRate 0.0286 Epoch: 9 Global Step: 385760 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:03:54,222-Speed 2616.08 samples/sec Loss 7.4078 LearningRate 0.0286 Epoch: 9 Global Step: 385770 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:03:58,136-Speed 2616.89 samples/sec Loss 7.3987 LearningRate 0.0286 Epoch: 9 Global Step: 385780 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:04:02,015-Speed 2640.91 samples/sec Loss 7.1467 LearningRate 0.0286 Epoch: 9 Global Step: 385790 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:04:05,907-Speed 2631.76 samples/sec Loss 7.5962 LearningRate 0.0286 Epoch: 9 Global Step: 385800 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:04:09,846-Speed 2600.34 samples/sec Loss 7.5180 LearningRate 0.0286 Epoch: 9 Global Step: 385810 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:04:13,740-Speed 2630.14 samples/sec Loss 7.1738 LearningRate 0.0286 Epoch: 9 Global Step: 385820 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:04:17,640-Speed 2626.62 samples/sec Loss 7.2267 LearningRate 0.0286 Epoch: 9 Global Step: 385830 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:04:21,541-Speed 2624.88 samples/sec Loss 7.2694 LearningRate 0.0286 Epoch: 9 Global Step: 385840 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:04:25,445-Speed 2624.75 samples/sec Loss 7.2929 LearningRate 0.0286 Epoch: 9 Global Step: 385850 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:04:29,329-Speed 2636.57 samples/sec Loss 7.1273 LearningRate 0.0286 Epoch: 9 Global Step: 385860 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:04:33,228-Speed 2627.05 samples/sec Loss 7.2205 LearningRate 0.0286 Epoch: 9 Global Step: 385870 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:04:37,169-Speed 2599.26 samples/sec Loss 7.1772 LearningRate 0.0286 Epoch: 9 Global Step: 385880 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:04:41,068-Speed 2627.57 samples/sec Loss 7.2529 LearningRate 0.0286 Epoch: 9 Global Step: 385890 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:04:45,003-Speed 2602.98 samples/sec Loss 7.1464 LearningRate 0.0286 Epoch: 9 Global Step: 385900 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:04:48,897-Speed 2630.37 samples/sec Loss 7.2058 LearningRate 0.0286 Epoch: 9 Global Step: 385910 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:04:52,788-Speed 2632.28 samples/sec Loss 7.1766 LearningRate 0.0286 Epoch: 9 Global Step: 385920 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:04:56,685-Speed 2628.53 samples/sec Loss 7.0884 LearningRate 0.0286 Epoch: 9 Global Step: 385930 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:05:00,597-Speed 2618.80 samples/sec Loss 7.1883 LearningRate 0.0286 Epoch: 9 Global Step: 385940 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:05:04,487-Speed 2632.43 samples/sec Loss 7.2188 LearningRate 0.0286 Epoch: 9 Global Step: 385950 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:05:08,391-Speed 2623.80 samples/sec Loss 7.3055 LearningRate 0.0286 Epoch: 9 Global Step: 385960 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:05:12,308-Speed 2614.93 samples/sec Loss 7.2283 LearningRate 0.0286 Epoch: 9 Global Step: 385970 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:05:16,204-Speed 2629.16 samples/sec Loss 7.2299 LearningRate 0.0286 Epoch: 9 Global Step: 385980 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:05:20,099-Speed 2629.56 samples/sec Loss 7.2754 LearningRate 0.0286 Epoch: 9 Global Step: 385990 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:05:24,073-Speed 2577.81 samples/sec Loss 7.2524 LearningRate 0.0286 Epoch: 9 Global Step: 386000 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:05:27,991-Speed 2614.09 samples/sec Loss 7.2093 LearningRate 0.0286 Epoch: 9 Global Step: 386010 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:05:31,900-Speed 2620.76 samples/sec Loss 7.3676 LearningRate 0.0286 Epoch: 9 Global Step: 386020 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:05:35,789-Speed 2633.38 samples/sec Loss 7.1707 LearningRate 0.0286 Epoch: 9 Global Step: 386030 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:05:39,705-Speed 2615.82 samples/sec Loss 7.1792 LearningRate 0.0286 Epoch: 9 Global Step: 386040 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:05:43,603-Speed 2626.83 samples/sec Loss 7.1451 LearningRate 0.0286 Epoch: 9 Global Step: 386050 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:05:47,507-Speed 2624.21 samples/sec Loss 7.2008 LearningRate 0.0286 Epoch: 9 Global Step: 386060 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:05:51,403-Speed 2629.55 samples/sec Loss 7.1593 LearningRate 0.0286 Epoch: 9 Global Step: 386070 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:05:55,305-Speed 2624.97 samples/sec Loss 7.1515 LearningRate 0.0286 Epoch: 9 Global Step: 386080 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:05:59,199-Speed 2629.95 samples/sec Loss 7.2049 LearningRate 0.0286 Epoch: 9 Global Step: 386090 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:06:03,102-Speed 2624.26 samples/sec Loss 7.2491 LearningRate 0.0286 Epoch: 9 Global Step: 386100 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:06:06,996-Speed 2630.42 samples/sec Loss 7.2627 LearningRate 0.0286 Epoch: 9 Global Step: 386110 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:06:10,901-Speed 2622.69 samples/sec Loss 7.1780 LearningRate 0.0286 Epoch: 9 Global Step: 386120 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:06:14,799-Speed 2627.50 samples/sec Loss 7.0863 LearningRate 0.0286 Epoch: 9 Global Step: 386130 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:06:18,708-Speed 2620.52 samples/sec Loss 7.2347 LearningRate 0.0286 Epoch: 9 Global Step: 386140 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:06:22,612-Speed 2623.46 samples/sec Loss 7.1930 LearningRate 0.0286 Epoch: 9 Global Step: 386150 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:06:26,630-Speed 2549.44 samples/sec Loss 7.2910 LearningRate 0.0286 Epoch: 9 Global Step: 386160 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:06:30,721-Speed 2503.53 samples/sec Loss 7.1034 LearningRate 0.0286 Epoch: 9 Global Step: 386170 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:06:34,778-Speed 2524.53 samples/sec Loss 7.2073 LearningRate 0.0286 Epoch: 9 Global Step: 386180 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:06:38,744-Speed 2582.67 samples/sec Loss 7.0626 LearningRate 0.0286 Epoch: 9 Global Step: 386190 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:06:42,655-Speed 2621.50 samples/sec Loss 7.2613 LearningRate 0.0286 Epoch: 9 Global Step: 386200 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:06:46,560-Speed 2623.05 samples/sec Loss 7.3415 LearningRate 0.0286 Epoch: 9 Global Step: 386210 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:06:50,450-Speed 2633.03 samples/sec Loss 7.1758 LearningRate 0.0286 Epoch: 9 Global Step: 386220 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:06:54,352-Speed 2624.90 samples/sec Loss 7.1043 LearningRate 0.0286 Epoch: 9 Global Step: 386230 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:06:58,253-Speed 2625.65 samples/sec Loss 7.0549 LearningRate 0.0286 Epoch: 9 Global Step: 386240 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:07:02,153-Speed 2626.51 samples/sec Loss 7.1975 LearningRate 0.0286 Epoch: 9 Global Step: 386250 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:07:05,996-Speed 2665.22 samples/sec Loss 7.2909 LearningRate 0.0286 Epoch: 9 Global Step: 386260 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:07:09,907-Speed 2618.61 samples/sec Loss 7.6316 LearningRate 0.0286 Epoch: 9 Global Step: 386270 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:07:13,812-Speed 2622.89 samples/sec Loss 7.2771 LearningRate 0.0286 Epoch: 9 Global Step: 386280 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:07:17,730-Speed 2614.46 samples/sec Loss 7.1627 LearningRate 0.0286 Epoch: 9 Global Step: 386290 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:07:21,618-Speed 2635.44 samples/sec Loss 7.1236 LearningRate 0.0286 Epoch: 9 Global Step: 386300 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:07:25,553-Speed 2602.46 samples/sec Loss 7.2287 LearningRate 0.0286 Epoch: 9 Global Step: 386310 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:07:29,499-Speed 2596.38 samples/sec Loss 7.2324 LearningRate 0.0285 Epoch: 9 Global Step: 386320 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:07:33,397-Speed 2627.37 samples/sec Loss 7.2375 LearningRate 0.0285 Epoch: 9 Global Step: 386330 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:07:37,312-Speed 2616.27 samples/sec Loss 7.3328 LearningRate 0.0285 Epoch: 9 Global Step: 386340 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:07:41,234-Speed 2611.30 samples/sec Loss 7.1355 LearningRate 0.0285 Epoch: 9 Global Step: 386350 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:07:45,133-Speed 2626.56 samples/sec Loss 7.1104 LearningRate 0.0285 Epoch: 9 Global Step: 386360 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:07:49,041-Speed 2620.94 samples/sec Loss 7.1889 LearningRate 0.0285 Epoch: 9 Global Step: 386370 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:07:52,947-Speed 2623.32 samples/sec Loss 7.0354 LearningRate 0.0285 Epoch: 9 Global Step: 386380 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:07:56,839-Speed 2631.14 samples/sec Loss 7.2416 LearningRate 0.0285 Epoch: 9 Global Step: 386390 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:08:00,744-Speed 2623.59 samples/sec Loss 7.1973 LearningRate 0.0285 Epoch: 9 Global Step: 386400 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:08:04,651-Speed 2621.34 samples/sec Loss 7.2468 LearningRate 0.0285 Epoch: 9 Global Step: 386410 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:08:08,548-Speed 2627.64 samples/sec Loss 7.2011 LearningRate 0.0285 Epoch: 9 Global Step: 386420 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:08:12,442-Speed 2630.43 samples/sec Loss 7.1673 LearningRate 0.0285 Epoch: 9 Global Step: 386430 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:08:16,339-Speed 2628.35 samples/sec Loss 7.1202 LearningRate 0.0285 Epoch: 9 Global Step: 386440 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:08:20,242-Speed 2624.45 samples/sec Loss 7.1615 LearningRate 0.0285 Epoch: 9 Global Step: 386450 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:08:24,155-Speed 2617.50 samples/sec Loss 7.2265 LearningRate 0.0285 Epoch: 9 Global Step: 386460 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:08:28,054-Speed 2626.65 samples/sec Loss 7.2218 LearningRate 0.0285 Epoch: 9 Global Step: 386470 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:08:31,947-Speed 2631.21 samples/sec Loss 7.0469 LearningRate 0.0285 Epoch: 9 Global Step: 386480 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:08:35,853-Speed 2621.80 samples/sec Loss 7.1445 LearningRate 0.0285 Epoch: 9 Global Step: 386490 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:08:39,749-Speed 2629.35 samples/sec Loss 7.0682 LearningRate 0.0285 Epoch: 9 Global Step: 386500 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:08:43,646-Speed 2628.43 samples/sec Loss 7.2245 LearningRate 0.0285 Epoch: 9 Global Step: 386510 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:08:47,555-Speed 2620.08 samples/sec Loss 7.1681 LearningRate 0.0285 Epoch: 9 Global Step: 386520 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:08:51,452-Speed 2627.97 samples/sec Loss 7.1786 LearningRate 0.0285 Epoch: 9 Global Step: 386530 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:08:55,356-Speed 2623.76 samples/sec Loss 7.1642 LearningRate 0.0285 Epoch: 9 Global Step: 386540 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:08:59,246-Speed 2633.30 samples/sec Loss 7.1217 LearningRate 0.0285 Epoch: 9 Global Step: 386550 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:09:03,141-Speed 2629.18 samples/sec Loss 7.2750 LearningRate 0.0285 Epoch: 9 Global Step: 386560 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:09:07,050-Speed 2620.11 samples/sec Loss 7.1749 LearningRate 0.0285 Epoch: 9 Global Step: 386570 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:09:10,947-Speed 2628.72 samples/sec Loss 7.2848 LearningRate 0.0285 Epoch: 9 Global Step: 386580 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:09:14,849-Speed 2625.35 samples/sec Loss 7.1599 LearningRate 0.0285 Epoch: 9 Global Step: 386590 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:09:18,747-Speed 2627.33 samples/sec Loss 7.1278 LearningRate 0.0285 Epoch: 9 Global Step: 386600 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:09:22,659-Speed 2618.22 samples/sec Loss 7.2596 LearningRate 0.0285 Epoch: 9 Global Step: 386610 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:09:26,574-Speed 2615.90 samples/sec Loss 7.1309 LearningRate 0.0285 Epoch: 9 Global Step: 386620 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:09:30,472-Speed 2628.21 samples/sec Loss 7.2230 LearningRate 0.0285 Epoch: 9 Global Step: 386630 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:09:34,360-Speed 2634.02 samples/sec Loss 7.2374 LearningRate 0.0285 Epoch: 9 Global Step: 386640 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:09:38,254-Speed 2630.65 samples/sec Loss 7.2513 LearningRate 0.0285 Epoch: 9 Global Step: 386650 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:09:42,159-Speed 2622.54 samples/sec Loss 7.0669 LearningRate 0.0285 Epoch: 9 Global Step: 386660 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:09:46,067-Speed 2621.59 samples/sec Loss 7.2011 LearningRate 0.0285 Epoch: 9 Global Step: 386670 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:09:49,958-Speed 2631.84 samples/sec Loss 7.2620 LearningRate 0.0285 Epoch: 9 Global Step: 386680 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:09:53,850-Speed 2631.73 samples/sec Loss 7.2550 LearningRate 0.0285 Epoch: 9 Global Step: 386690 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:09:57,742-Speed 2631.60 samples/sec Loss 7.1330 LearningRate 0.0285 Epoch: 9 Global Step: 386700 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:10:01,673-Speed 2605.56 samples/sec Loss 7.2908 LearningRate 0.0285 Epoch: 9 Global Step: 386710 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:10:05,574-Speed 2626.03 samples/sec Loss 7.2435 LearningRate 0.0285 Epoch: 9 Global Step: 386720 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:10:09,467-Speed 2631.11 samples/sec Loss 7.1214 LearningRate 0.0285 Epoch: 9 Global Step: 386730 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:10:13,417-Speed 2592.84 samples/sec Loss 7.1604 LearningRate 0.0285 Epoch: 9 Global Step: 386740 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:10:17,345-Speed 2607.87 samples/sec Loss 7.2128 LearningRate 0.0285 Epoch: 9 Global Step: 386750 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:10:21,240-Speed 2629.72 samples/sec Loss 7.2060 LearningRate 0.0285 Epoch: 9 Global Step: 386760 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 15:10:25,090-Speed 2660.23 samples/sec Loss 7.3458 LearningRate 0.0285 Epoch: 9 Global Step: 386770 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:10:28,986-Speed 2628.60 samples/sec Loss 7.1842 LearningRate 0.0285 Epoch: 9 Global Step: 386780 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:10:32,877-Speed 2633.80 samples/sec Loss 7.1140 LearningRate 0.0285 Epoch: 9 Global Step: 386790 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:10:36,778-Speed 2625.67 samples/sec Loss 7.2317 LearningRate 0.0285 Epoch: 9 Global Step: 386800 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:10:40,685-Speed 2621.59 samples/sec Loss 7.1509 LearningRate 0.0285 Epoch: 9 Global Step: 386810 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:10:44,589-Speed 2623.66 samples/sec Loss 7.2221 LearningRate 0.0285 Epoch: 9 Global Step: 386820 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:10:48,499-Speed 2619.78 samples/sec Loss 7.2309 LearningRate 0.0285 Epoch: 9 Global Step: 386830 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:10:52,395-Speed 2629.22 samples/sec Loss 7.0884 LearningRate 0.0285 Epoch: 9 Global Step: 386840 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:10:56,300-Speed 2622.92 samples/sec Loss 7.1416 LearningRate 0.0285 Epoch: 9 Global Step: 386850 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:11:00,202-Speed 2624.65 samples/sec Loss 7.2201 LearningRate 0.0285 Epoch: 9 Global Step: 386860 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:11:04,101-Speed 2627.00 samples/sec Loss 7.2717 LearningRate 0.0285 Epoch: 9 Global Step: 386870 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:11:08,009-Speed 2620.80 samples/sec Loss 7.2370 LearningRate 0.0285 Epoch: 9 Global Step: 386880 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:11:11,883-Speed 2643.95 samples/sec Loss 7.2694 LearningRate 0.0285 Epoch: 9 Global Step: 386890 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:11:15,886-Speed 2558.66 samples/sec Loss 7.3043 LearningRate 0.0285 Epoch: 9 Global Step: 386900 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:11:19,775-Speed 2633.94 samples/sec Loss 7.3108 LearningRate 0.0285 Epoch: 9 Global Step: 386910 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:11:23,698-Speed 2610.55 samples/sec Loss 7.3338 LearningRate 0.0285 Epoch: 9 Global Step: 386920 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:11:27,596-Speed 2628.15 samples/sec Loss 7.0603 LearningRate 0.0285 Epoch: 9 Global Step: 386930 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:11:31,487-Speed 2632.59 samples/sec Loss 7.1597 LearningRate 0.0285 Epoch: 9 Global Step: 386940 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:11:35,389-Speed 2624.85 samples/sec Loss 7.1634 LearningRate 0.0285 Epoch: 9 Global Step: 386950 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:11:39,285-Speed 2628.38 samples/sec Loss 7.2286 LearningRate 0.0285 Epoch: 9 Global Step: 386960 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:11:43,178-Speed 2631.27 samples/sec Loss 7.1057 LearningRate 0.0285 Epoch: 9 Global Step: 386970 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:11:47,063-Speed 2637.00 samples/sec Loss 7.1588 LearningRate 0.0285 Epoch: 9 Global Step: 386980 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:11:50,961-Speed 2627.16 samples/sec Loss 7.0711 LearningRate 0.0285 Epoch: 9 Global Step: 386990 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:11:54,947-Speed 2569.90 samples/sec Loss 7.1509 LearningRate 0.0285 Epoch: 9 Global Step: 387000 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:11:58,841-Speed 2630.27 samples/sec Loss 7.0610 LearningRate 0.0285 Epoch: 9 Global Step: 387010 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:12:02,737-Speed 2629.04 samples/sec Loss 7.2797 LearningRate 0.0285 Epoch: 9 Global Step: 387020 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:12:06,630-Speed 2631.27 samples/sec Loss 7.1858 LearningRate 0.0285 Epoch: 9 Global Step: 387030 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:12:10,530-Speed 2625.56 samples/sec Loss 7.0854 LearningRate 0.0285 Epoch: 9 Global Step: 387040 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:12:14,432-Speed 2624.81 samples/sec Loss 7.1967 LearningRate 0.0285 Epoch: 9 Global Step: 387050 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:12:18,319-Speed 2635.22 samples/sec Loss 7.1665 LearningRate 0.0285 Epoch: 9 Global Step: 387060 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:12:22,210-Speed 2632.38 samples/sec Loss 7.0920 LearningRate 0.0285 Epoch: 9 Global Step: 387070 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:12:26,113-Speed 2624.28 samples/sec Loss 7.1518 LearningRate 0.0285 Epoch: 9 Global Step: 387080 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:12:30,016-Speed 2624.22 samples/sec Loss 7.1784 LearningRate 0.0285 Epoch: 9 Global Step: 387090 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:12:33,915-Speed 2626.92 samples/sec Loss 7.1841 LearningRate 0.0284 Epoch: 9 Global Step: 387100 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:12:37,842-Speed 2608.03 samples/sec Loss 7.2609 LearningRate 0.0284 Epoch: 9 Global Step: 387110 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:12:41,730-Speed 2634.75 samples/sec Loss 7.2334 LearningRate 0.0284 Epoch: 9 Global Step: 387120 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:12:45,621-Speed 2632.62 samples/sec Loss 7.0034 LearningRate 0.0284 Epoch: 9 Global Step: 387130 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:12:49,517-Speed 2628.48 samples/sec Loss 7.1415 LearningRate 0.0284 Epoch: 9 Global Step: 387140 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:12:53,420-Speed 2623.98 samples/sec Loss 7.2358 LearningRate 0.0284 Epoch: 9 Global Step: 387150 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:12:57,314-Speed 2630.28 samples/sec Loss 7.1335 LearningRate 0.0284 Epoch: 9 Global Step: 387160 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:13:01,217-Speed 2624.47 samples/sec Loss 7.1235 LearningRate 0.0284 Epoch: 9 Global Step: 387170 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:13:05,131-Speed 2616.48 samples/sec Loss 7.1109 LearningRate 0.0284 Epoch: 9 Global Step: 387180 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:13:09,033-Speed 2624.98 samples/sec Loss 7.2502 LearningRate 0.0284 Epoch: 9 Global Step: 387190 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 15:13:12,922-Speed 2633.24 samples/sec Loss 7.2055 LearningRate 0.0284 Epoch: 9 Global Step: 387200 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 15:13:16,813-Speed 2632.70 samples/sec Loss 7.1959 LearningRate 0.0284 Epoch: 9 Global Step: 387210 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:13:20,720-Speed 2621.61 samples/sec Loss 7.2009 LearningRate 0.0284 Epoch: 9 Global Step: 387220 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:13:24,617-Speed 2628.30 samples/sec Loss 7.1009 LearningRate 0.0284 Epoch: 9 Global Step: 387230 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:13:28,509-Speed 2631.71 samples/sec Loss 7.1989 LearningRate 0.0284 Epoch: 9 Global Step: 387240 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:13:32,409-Speed 2626.27 samples/sec Loss 7.2266 LearningRate 0.0284 Epoch: 9 Global Step: 387250 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:13:36,301-Speed 2631.91 samples/sec Loss 7.1152 LearningRate 0.0284 Epoch: 9 Global Step: 387260 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:13:40,205-Speed 2623.46 samples/sec Loss 7.1173 LearningRate 0.0284 Epoch: 9 Global Step: 387270 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:13:44,125-Speed 2612.40 samples/sec Loss 7.1764 LearningRate 0.0284 Epoch: 9 Global Step: 387280 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:13:48,027-Speed 2625.18 samples/sec Loss 7.2523 LearningRate 0.0284 Epoch: 9 Global Step: 387290 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:13:51,923-Speed 2629.08 samples/sec Loss 7.1535 LearningRate 0.0284 Epoch: 9 Global Step: 387300 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:13:55,820-Speed 2628.29 samples/sec Loss 7.1179 LearningRate 0.0284 Epoch: 9 Global Step: 387310 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 15:13:59,710-Speed 2633.29 samples/sec Loss 7.0717 LearningRate 0.0284 Epoch: 9 Global Step: 387320 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 15:14:03,606-Speed 2628.72 samples/sec Loss 7.1161 LearningRate 0.0284 Epoch: 9 Global Step: 387330 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:14:07,505-Speed 2627.03 samples/sec Loss 7.2520 LearningRate 0.0284 Epoch: 9 Global Step: 387340 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:14:11,414-Speed 2619.66 samples/sec Loss 7.2645 LearningRate 0.0284 Epoch: 9 Global Step: 387350 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:14:15,309-Speed 2629.79 samples/sec Loss 7.1332 LearningRate 0.0284 Epoch: 9 Global Step: 387360 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:14:19,217-Speed 2620.68 samples/sec Loss 7.1321 LearningRate 0.0284 Epoch: 9 Global Step: 387370 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:14:23,112-Speed 2629.83 samples/sec Loss 7.2331 LearningRate 0.0284 Epoch: 9 Global Step: 387380 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:14:27,015-Speed 2624.61 samples/sec Loss 7.1501 LearningRate 0.0284 Epoch: 9 Global Step: 387390 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:14:30,917-Speed 2624.65 samples/sec Loss 7.1951 LearningRate 0.0284 Epoch: 9 Global Step: 387400 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:14:34,813-Speed 2628.68 samples/sec Loss 7.2008 LearningRate 0.0284 Epoch: 9 Global Step: 387410 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:14:38,710-Speed 2628.06 samples/sec Loss 7.1306 LearningRate 0.0284 Epoch: 9 Global Step: 387420 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:14:42,615-Speed 2623.45 samples/sec Loss 7.2323 LearningRate 0.0284 Epoch: 9 Global Step: 387430 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 15:14:46,505-Speed 2633.09 samples/sec Loss 7.0429 LearningRate 0.0284 Epoch: 9 Global Step: 387440 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:14:50,409-Speed 2623.62 samples/sec Loss 7.0919 LearningRate 0.0284 Epoch: 9 Global Step: 387450 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:14:54,301-Speed 2630.94 samples/sec Loss 7.2508 LearningRate 0.0284 Epoch: 9 Global Step: 387460 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:14:58,207-Speed 2622.59 samples/sec Loss 7.0988 LearningRate 0.0284 Epoch: 9 Global Step: 387470 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:15:02,105-Speed 2627.27 samples/sec Loss 7.1913 LearningRate 0.0284 Epoch: 9 Global Step: 387480 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:15:05,980-Speed 2644.05 samples/sec Loss 7.1250 LearningRate 0.0284 Epoch: 9 Global Step: 387490 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:15:09,880-Speed 2626.47 samples/sec Loss 7.1215 LearningRate 0.0284 Epoch: 9 Global Step: 387500 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:15:13,838-Speed 2587.97 samples/sec Loss 7.0751 LearningRate 0.0284 Epoch: 9 Global Step: 387510 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:15:17,746-Speed 2620.60 samples/sec Loss 7.1144 LearningRate 0.0284 Epoch: 9 Global Step: 387520 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:15:21,711-Speed 2583.46 samples/sec Loss 7.1518 LearningRate 0.0284 Epoch: 9 Global Step: 387530 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:15:25,612-Speed 2625.32 samples/sec Loss 7.0433 LearningRate 0.0284 Epoch: 9 Global Step: 387540 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:15:29,522-Speed 2619.66 samples/sec Loss 7.1437 LearningRate 0.0284 Epoch: 9 Global Step: 387550 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:15:35,060-Speed 1849.44 samples/sec Loss 7.1790 LearningRate 0.0284 Epoch: 9 Global Step: 387560 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:15:38,974-Speed 2616.67 samples/sec Loss 7.3011 LearningRate 0.0284 Epoch: 9 Global Step: 387570 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:15:42,870-Speed 2629.05 samples/sec Loss 7.1310 LearningRate 0.0284 Epoch: 9 Global Step: 387580 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:15:46,843-Speed 2578.13 samples/sec Loss 7.1604 LearningRate 0.0284 Epoch: 9 Global Step: 387590 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:15:50,803-Speed 2587.98 samples/sec Loss 7.2714 LearningRate 0.0284 Epoch: 9 Global Step: 387600 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:15:54,721-Speed 2614.11 samples/sec Loss 7.1040 LearningRate 0.0284 Epoch: 9 Global Step: 387610 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:15:58,564-Speed 2664.65 samples/sec Loss 7.5062 LearningRate 0.0284 Epoch: 9 Global Step: 387620 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:16:02,459-Speed 2630.00 samples/sec Loss 7.7021 LearningRate 0.0284 Epoch: 9 Global Step: 387630 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:16:06,368-Speed 2620.18 samples/sec Loss 7.5854 LearningRate 0.0284 Epoch: 9 Global Step: 387640 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:16:10,292-Speed 2610.02 samples/sec Loss 7.4614 LearningRate 0.0284 Epoch: 9 Global Step: 387650 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:16:14,380-Speed 2506.13 samples/sec Loss 7.3552 LearningRate 0.0284 Epoch: 9 Global Step: 387660 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:16:18,278-Speed 2627.85 samples/sec Loss 7.3142 LearningRate 0.0284 Epoch: 9 Global Step: 387670 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:16:22,174-Speed 2629.03 samples/sec Loss 7.3016 LearningRate 0.0284 Epoch: 9 Global Step: 387680 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:16:26,092-Speed 2613.91 samples/sec Loss 7.1433 LearningRate 0.0284 Epoch: 9 Global Step: 387690 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:16:30,135-Speed 2533.45 samples/sec Loss 7.1321 LearningRate 0.0284 Epoch: 9 Global Step: 387700 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:16:34,204-Speed 2517.43 samples/sec Loss 7.0387 LearningRate 0.0284 Epoch: 9 Global Step: 387710 Fp16 Grad Scale: 8192 Required: 50 hours
Training: 2022-04-14 15:16:38,277-Speed 2514.87 samples/sec Loss 7.2792 LearningRate 0.0284 Epoch: 9 Global Step: 387720 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:16:42,375-Speed 2499.42 samples/sec Loss 7.1584 LearningRate 0.0284 Epoch: 9 Global Step: 387730 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:16:46,391-Speed 2550.49 samples/sec Loss 7.3004 LearningRate 0.0284 Epoch: 9 Global Step: 387740 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:16:50,319-Speed 2607.70 samples/sec Loss 7.1156 LearningRate 0.0284 Epoch: 9 Global Step: 387750 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:16:54,239-Speed 2612.75 samples/sec Loss 7.1463 LearningRate 0.0284 Epoch: 9 Global Step: 387760 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:16:58,221-Speed 2572.50 samples/sec Loss 7.3924 LearningRate 0.0284 Epoch: 9 Global Step: 387770 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:17:02,117-Speed 2628.62 samples/sec Loss 7.0659 LearningRate 0.0284 Epoch: 9 Global Step: 387780 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:17:06,020-Speed 2623.98 samples/sec Loss 7.0975 LearningRate 0.0284 Epoch: 9 Global Step: 387790 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:17:09,928-Speed 2620.60 samples/sec Loss 7.1601 LearningRate 0.0284 Epoch: 9 Global Step: 387800 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:17:13,822-Speed 2630.07 samples/sec Loss 7.0606 LearningRate 0.0284 Epoch: 9 Global Step: 387810 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:17:17,731-Speed 2620.53 samples/sec Loss 7.2715 LearningRate 0.0284 Epoch: 9 Global Step: 387820 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:17:21,639-Speed 2621.06 samples/sec Loss 7.2050 LearningRate 0.0284 Epoch: 9 Global Step: 387830 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:17:25,532-Speed 2631.15 samples/sec Loss 7.1758 LearningRate 0.0284 Epoch: 9 Global Step: 387840 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:17:29,431-Speed 2626.74 samples/sec Loss 7.2958 LearningRate 0.0284 Epoch: 9 Global Step: 387850 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:17:33,332-Speed 2625.76 samples/sec Loss 7.2438 LearningRate 0.0284 Epoch: 9 Global Step: 387860 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:17:37,234-Speed 2624.38 samples/sec Loss 7.1355 LearningRate 0.0284 Epoch: 9 Global Step: 387870 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:17:41,121-Speed 2635.31 samples/sec Loss 7.2646 LearningRate 0.0283 Epoch: 9 Global Step: 387880 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:17:45,020-Speed 2627.11 samples/sec Loss 7.2121 LearningRate 0.0283 Epoch: 9 Global Step: 387890 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:17:48,942-Speed 2611.62 samples/sec Loss 7.1160 LearningRate 0.0283 Epoch: 9 Global Step: 387900 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:17:52,837-Speed 2630.10 samples/sec Loss 7.1309 LearningRate 0.0283 Epoch: 9 Global Step: 387910 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:17:56,749-Speed 2618.92 samples/sec Loss 7.1214 LearningRate 0.0283 Epoch: 9 Global Step: 387920 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:18:00,648-Speed 2627.27 samples/sec Loss 7.1485 LearningRate 0.0283 Epoch: 9 Global Step: 387930 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:18:04,558-Speed 2619.43 samples/sec Loss 7.1966 LearningRate 0.0283 Epoch: 9 Global Step: 387940 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:18:08,451-Speed 2630.58 samples/sec Loss 7.3131 LearningRate 0.0283 Epoch: 9 Global Step: 387950 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:18:12,356-Speed 2623.22 samples/sec Loss 7.2258 LearningRate 0.0283 Epoch: 9 Global Step: 387960 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:18:16,264-Speed 2621.02 samples/sec Loss 7.1087 LearningRate 0.0283 Epoch: 9 Global Step: 387970 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:18:20,150-Speed 2635.62 samples/sec Loss 7.1238 LearningRate 0.0283 Epoch: 9 Global Step: 387980 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:18:24,055-Speed 2622.69 samples/sec Loss 7.1126 LearningRate 0.0283 Epoch: 9 Global Step: 387990 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:18:27,950-Speed 2630.66 samples/sec Loss 7.1977 LearningRate 0.0283 Epoch: 9 Global Step: 388000 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:18:31,839-Speed 2633.49 samples/sec Loss 7.2383 LearningRate 0.0283 Epoch: 9 Global Step: 388010 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:18:35,741-Speed 2624.48 samples/sec Loss 7.2748 LearningRate 0.0283 Epoch: 9 Global Step: 388020 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:18:39,637-Speed 2629.31 samples/sec Loss 7.2265 LearningRate 0.0283 Epoch: 9 Global Step: 388030 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:18:43,534-Speed 2628.68 samples/sec Loss 7.0976 LearningRate 0.0283 Epoch: 9 Global Step: 388040 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:18:47,437-Speed 2624.79 samples/sec Loss 7.1924 LearningRate 0.0283 Epoch: 9 Global Step: 388050 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:18:51,328-Speed 2631.85 samples/sec Loss 7.1498 LearningRate 0.0283 Epoch: 9 Global Step: 388060 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:18:55,234-Speed 2622.67 samples/sec Loss 7.0963 LearningRate 0.0283 Epoch: 9 Global Step: 388070 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:18:59,154-Speed 2612.53 samples/sec Loss 7.2230 LearningRate 0.0283 Epoch: 9 Global Step: 388080 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:19:03,043-Speed 2633.94 samples/sec Loss 7.0488 LearningRate 0.0283 Epoch: 9 Global Step: 388090 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:19:06,942-Speed 2626.69 samples/sec Loss 7.1001 LearningRate 0.0283 Epoch: 9 Global Step: 388100 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:19:10,834-Speed 2632.24 samples/sec Loss 7.1744 LearningRate 0.0283 Epoch: 9 Global Step: 388110 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:19:14,747-Speed 2617.57 samples/sec Loss 7.0200 LearningRate 0.0283 Epoch: 9 Global Step: 388120 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:19:18,664-Speed 2614.92 samples/sec Loss 7.1377 LearningRate 0.0283 Epoch: 9 Global Step: 388130 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:19:22,541-Speed 2641.75 samples/sec Loss 7.3154 LearningRate 0.0283 Epoch: 9 Global Step: 388140 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:19:26,438-Speed 2628.50 samples/sec Loss 7.3175 LearningRate 0.0283 Epoch: 9 Global Step: 388150 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:19:30,348-Speed 2619.65 samples/sec Loss 7.1952 LearningRate 0.0283 Epoch: 9 Global Step: 388160 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:19:34,239-Speed 2631.91 samples/sec Loss 7.2116 LearningRate 0.0283 Epoch: 9 Global Step: 388170 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:19:38,139-Speed 2626.09 samples/sec Loss 7.2289 LearningRate 0.0283 Epoch: 9 Global Step: 388180 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:19:42,029-Speed 2633.26 samples/sec Loss 7.1962 LearningRate 0.0283 Epoch: 9 Global Step: 388190 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:19:45,937-Speed 2620.62 samples/sec Loss 7.2410 LearningRate 0.0283 Epoch: 9 Global Step: 388200 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:19:49,834-Speed 2629.11 samples/sec Loss 6.9827 LearningRate 0.0283 Epoch: 9 Global Step: 388210 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:19:53,745-Speed 2618.87 samples/sec Loss 7.1689 LearningRate 0.0283 Epoch: 9 Global Step: 388220 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:19:57,646-Speed 2625.54 samples/sec Loss 7.0654 LearningRate 0.0283 Epoch: 9 Global Step: 388230 Fp16 Grad Scale: 16384 Required: 50 hours
Training: 2022-04-14 15:20:01,561-Speed 2616.03 samples/sec Loss 7.2661 LearningRate 0.0283 Epoch: 9 Global Step: 388240 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:20:05,462-Speed 2625.69 samples/sec Loss 7.1517 LearningRate 0.0283 Epoch: 9 Global Step: 388250 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:20:09,363-Speed 2625.02 samples/sec Loss 7.0688 LearningRate 0.0283 Epoch: 9 Global Step: 388260 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:20:13,371-Speed 2555.80 samples/sec Loss 7.0565 LearningRate 0.0283 Epoch: 9 Global Step: 388270 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:20:17,466-Speed 2501.88 samples/sec Loss 7.1544 LearningRate 0.0283 Epoch: 9 Global Step: 388280 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:20:21,571-Speed 2494.75 samples/sec Loss 7.2020 LearningRate 0.0283 Epoch: 9 Global Step: 388290 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:20:25,579-Speed 2555.52 samples/sec Loss 7.1748 LearningRate 0.0283 Epoch: 9 Global Step: 388300 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:20:29,469-Speed 2633.06 samples/sec Loss 7.0838 LearningRate 0.0283 Epoch: 9 Global Step: 388310 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:20:33,396-Speed 2608.38 samples/sec Loss 7.2098 LearningRate 0.0283 Epoch: 9 Global Step: 388320 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:20:37,295-Speed 2627.30 samples/sec Loss 7.0710 LearningRate 0.0283 Epoch: 9 Global Step: 388330 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:20:41,192-Speed 2628.12 samples/sec Loss 7.1521 LearningRate 0.0283 Epoch: 9 Global Step: 388340 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:20:45,081-Speed 2633.28 samples/sec Loss 7.0725 LearningRate 0.0283 Epoch: 9 Global Step: 388350 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:20:48,985-Speed 2623.50 samples/sec Loss 7.1712 LearningRate 0.0283 Epoch: 9 Global Step: 388360 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:20:52,889-Speed 2623.71 samples/sec Loss 7.1939 LearningRate 0.0283 Epoch: 9 Global Step: 388370 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:20:56,791-Speed 2624.56 samples/sec Loss 7.1765 LearningRate 0.0283 Epoch: 9 Global Step: 388380 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:21:00,710-Speed 2614.02 samples/sec Loss 7.1603 LearningRate 0.0283 Epoch: 9 Global Step: 388390 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:21:04,607-Speed 2628.35 samples/sec Loss 7.0807 LearningRate 0.0283 Epoch: 9 Global Step: 388400 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:21:08,512-Speed 2622.63 samples/sec Loss 7.0894 LearningRate 0.0283 Epoch: 9 Global Step: 388410 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:21:12,420-Speed 2621.18 samples/sec Loss 7.2065 LearningRate 0.0283 Epoch: 9 Global Step: 388420 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:21:16,326-Speed 2622.05 samples/sec Loss 7.1727 LearningRate 0.0283 Epoch: 9 Global Step: 388430 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:21:20,227-Speed 2625.77 samples/sec Loss 7.0743 LearningRate 0.0283 Epoch: 9 Global Step: 388440 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:21:24,125-Speed 2627.01 samples/sec Loss 7.1137 LearningRate 0.0283 Epoch: 9 Global Step: 388450 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:21:28,031-Speed 2622.54 samples/sec Loss 7.2122 LearningRate 0.0283 Epoch: 9 Global Step: 388460 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:21:31,936-Speed 2622.64 samples/sec Loss 7.2137 LearningRate 0.0283 Epoch: 9 Global Step: 388470 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:21:35,831-Speed 2629.69 samples/sec Loss 7.1181 LearningRate 0.0283 Epoch: 9 Global Step: 388480 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:21:39,724-Speed 2630.95 samples/sec Loss 7.2008 LearningRate 0.0283 Epoch: 9 Global Step: 388490 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:21:43,627-Speed 2624.63 samples/sec Loss 7.2559 LearningRate 0.0283 Epoch: 9 Global Step: 388500 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:21:47,519-Speed 2631.62 samples/sec Loss 7.1924 LearningRate 0.0283 Epoch: 9 Global Step: 388510 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:21:51,423-Speed 2623.12 samples/sec Loss 7.2058 LearningRate 0.0283 Epoch: 9 Global Step: 388520 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:21:55,323-Speed 2626.10 samples/sec Loss 7.0107 LearningRate 0.0283 Epoch: 9 Global Step: 388530 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:21:59,222-Speed 2627.20 samples/sec Loss 7.2651 LearningRate 0.0283 Epoch: 9 Global Step: 388540 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 15:22:03,117-Speed 2629.29 samples/sec Loss 7.0469 LearningRate 0.0283 Epoch: 9 Global Step: 388550 Fp16 Grad Scale: 262144 Required: 50 hours
Training: 2022-04-14 15:22:07,018-Speed 2625.79 samples/sec Loss 7.2537 LearningRate 0.0283 Epoch: 9 Global Step: 388560 Fp16 Grad Scale: 131072 Required: 50 hours
Training: 2022-04-14 15:22:10,900-Speed 2638.08 samples/sec Loss 7.3022 LearningRate 0.0283 Epoch: 9 Global Step: 388570 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:22:14,802-Speed 2625.07 samples/sec Loss 7.0930 LearningRate 0.0283 Epoch: 9 Global Step: 388580 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:22:18,779-Speed 2575.55 samples/sec Loss 7.0209 LearningRate 0.0283 Epoch: 9 Global Step: 388590 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:22:22,707-Speed 2607.78 samples/sec Loss 7.2051 LearningRate 0.0283 Epoch: 9 Global Step: 388600 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:22:26,603-Speed 2628.80 samples/sec Loss 7.1557 LearningRate 0.0283 Epoch: 9 Global Step: 388610 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:22:30,495-Speed 2631.24 samples/sec Loss 7.1267 LearningRate 0.0283 Epoch: 9 Global Step: 388620 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:22:34,386-Speed 2632.42 samples/sec Loss 7.0593 LearningRate 0.0283 Epoch: 9 Global Step: 388630 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:22:38,288-Speed 2625.03 samples/sec Loss 7.0606 LearningRate 0.0283 Epoch: 9 Global Step: 388640 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:22:42,186-Speed 2627.55 samples/sec Loss 7.0352 LearningRate 0.0283 Epoch: 9 Global Step: 388650 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:22:46,083-Speed 2627.93 samples/sec Loss 7.3566 LearningRate 0.0282 Epoch: 9 Global Step: 388660 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:22:49,983-Speed 2626.20 samples/sec Loss 7.3392 LearningRate 0.0282 Epoch: 9 Global Step: 388670 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:22:53,887-Speed 2624.05 samples/sec Loss 7.2333 LearningRate 0.0282 Epoch: 9 Global Step: 388680 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:22:57,785-Speed 2627.79 samples/sec Loss 7.2504 LearningRate 0.0282 Epoch: 9 Global Step: 388690 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:23:01,697-Speed 2618.00 samples/sec Loss 7.1925 LearningRate 0.0282 Epoch: 9 Global Step: 388700 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:23:05,593-Speed 2628.93 samples/sec Loss 7.1173 LearningRate 0.0282 Epoch: 9 Global Step: 388710 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:23:09,505-Speed 2617.49 samples/sec Loss 7.1758 LearningRate 0.0282 Epoch: 9 Global Step: 388720 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:23:13,407-Speed 2625.52 samples/sec Loss 7.2381 LearningRate 0.0282 Epoch: 9 Global Step: 388730 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:23:17,318-Speed 2618.81 samples/sec Loss 7.2958 LearningRate 0.0282 Epoch: 9 Global Step: 388740 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:23:21,219-Speed 2625.35 samples/sec Loss 7.2443 LearningRate 0.0282 Epoch: 9 Global Step: 388750 Fp16 Grad Scale: 32768 Required: 50 hours
Training: 2022-04-14 15:23:25,128-Speed 2620.12 samples/sec Loss 7.1453 LearningRate 0.0282 Epoch: 9 Global Step: 388760 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:23:29,042-Speed 2616.84 samples/sec Loss 7.1180 LearningRate 0.0282 Epoch: 9 Global Step: 388770 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:23:32,945-Speed 2624.22 samples/sec Loss 7.2421 LearningRate 0.0282 Epoch: 9 Global Step: 388780 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:23:36,843-Speed 2627.86 samples/sec Loss 7.0581 LearningRate 0.0282 Epoch: 9 Global Step: 388790 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:23:40,781-Speed 2600.68 samples/sec Loss 7.2660 LearningRate 0.0282 Epoch: 9 Global Step: 388800 Fp16 Grad Scale: 65536 Required: 50 hours
Training: 2022-04-14 15:23:44,680-Speed 2626.69 samples/sec Loss 7.1219 LearningRate 0.0282 Epoch: 9 Global Step: 388810 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:23:48,587-Speed 2621.41 samples/sec Loss 7.2633 LearningRate 0.0282 Epoch: 9 Global Step: 388820 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:23:52,600-Speed 2552.71 samples/sec Loss 7.1644 LearningRate 0.0282 Epoch: 9 Global Step: 388830 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:23:56,504-Speed 2623.70 samples/sec Loss 7.1786 LearningRate 0.0282 Epoch: 9 Global Step: 388840 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:24:00,403-Speed 2626.55 samples/sec Loss 7.1054 LearningRate 0.0282 Epoch: 9 Global Step: 388850 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:24:04,322-Speed 2613.33 samples/sec Loss 7.1677 LearningRate 0.0282 Epoch: 9 Global Step: 388860 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:24:08,211-Speed 2633.67 samples/sec Loss 7.1695 LearningRate 0.0282 Epoch: 9 Global Step: 388870 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:24:12,137-Speed 2609.24 samples/sec Loss 7.2821 LearningRate 0.0282 Epoch: 9 Global Step: 388880 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:24:16,201-Speed 2520.25 samples/sec Loss 7.0955 LearningRate 0.0282 Epoch: 9 Global Step: 388890 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:24:20,149-Speed 2594.25 samples/sec Loss 7.0524 LearningRate 0.0282 Epoch: 9 Global Step: 388900 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:24:24,058-Speed 2620.09 samples/sec Loss 6.9864 LearningRate 0.0282 Epoch: 9 Global Step: 388910 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:24:27,991-Speed 2604.36 samples/sec Loss 7.1420 LearningRate 0.0282 Epoch: 9 Global Step: 388920 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:24:31,984-Speed 2564.74 samples/sec Loss 7.1793 LearningRate 0.0282 Epoch: 9 Global Step: 388930 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:24:35,887-Speed 2624.03 samples/sec Loss 7.1951 LearningRate 0.0282 Epoch: 9 Global Step: 388940 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:24:39,786-Speed 2626.89 samples/sec Loss 7.0918 LearningRate 0.0282 Epoch: 9 Global Step: 388950 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:24:43,690-Speed 2623.66 samples/sec Loss 7.1309 LearningRate 0.0282 Epoch: 9 Global Step: 388960 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:24:47,593-Speed 2624.56 samples/sec Loss 7.1309 LearningRate 0.0282 Epoch: 9 Global Step: 388970 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:24:51,495-Speed 2624.75 samples/sec Loss 7.1562 LearningRate 0.0282 Epoch: 9 Global Step: 388980 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:24:55,393-Speed 2628.34 samples/sec Loss 7.0466 LearningRate 0.0282 Epoch: 9 Global Step: 388990 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:24:59,284-Speed 2631.58 samples/sec Loss 7.1362 LearningRate 0.0282 Epoch: 9 Global Step: 389000 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:25:03,187-Speed 2624.09 samples/sec Loss 7.1178 LearningRate 0.0282 Epoch: 9 Global Step: 389010 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:25:07,087-Speed 2626.60 samples/sec Loss 7.1820 LearningRate 0.0282 Epoch: 9 Global Step: 389020 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:25:10,944-Speed 2655.34 samples/sec Loss 7.4257 LearningRate 0.0282 Epoch: 9 Global Step: 389030 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:25:14,839-Speed 2629.67 samples/sec Loss 7.0506 LearningRate 0.0282 Epoch: 9 Global Step: 389040 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:25:18,733-Speed 2630.43 samples/sec Loss 7.0586 LearningRate 0.0282 Epoch: 9 Global Step: 389050 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:25:22,648-Speed 2615.68 samples/sec Loss 7.2384 LearningRate 0.0282 Epoch: 9 Global Step: 389060 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:25:26,542-Speed 2630.69 samples/sec Loss 7.0492 LearningRate 0.0282 Epoch: 9 Global Step: 389070 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:25:30,439-Speed 2628.25 samples/sec Loss 7.0538 LearningRate 0.0282 Epoch: 9 Global Step: 389080 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:25:34,345-Speed 2622.19 samples/sec Loss 7.2579 LearningRate 0.0282 Epoch: 9 Global Step: 389090 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:25:38,246-Speed 2625.35 samples/sec Loss 7.1991 LearningRate 0.0282 Epoch: 9 Global Step: 389100 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:25:42,141-Speed 2629.89 samples/sec Loss 7.0852 LearningRate 0.0282 Epoch: 9 Global Step: 389110 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:25:46,044-Speed 2623.78 samples/sec Loss 7.0859 LearningRate 0.0282 Epoch: 9 Global Step: 389120 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:25:49,946-Speed 2625.00 samples/sec Loss 6.9790 LearningRate 0.0282 Epoch: 9 Global Step: 389130 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:25:53,834-Speed 2634.46 samples/sec Loss 7.2119 LearningRate 0.0282 Epoch: 9 Global Step: 389140 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:25:57,734-Speed 2627.95 samples/sec Loss 7.1399 LearningRate 0.0282 Epoch: 9 Global Step: 389150 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:26:01,664-Speed 2606.34 samples/sec Loss 7.1063 LearningRate 0.0282 Epoch: 9 Global Step: 389160 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:26:05,553-Speed 2633.17 samples/sec Loss 7.0507 LearningRate 0.0282 Epoch: 9 Global Step: 389170 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:26:09,468-Speed 2616.48 samples/sec Loss 7.0323 LearningRate 0.0282 Epoch: 9 Global Step: 389180 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:26:13,372-Speed 2623.90 samples/sec Loss 7.1396 LearningRate 0.0282 Epoch: 9 Global Step: 389190 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:26:17,281-Speed 2619.96 samples/sec Loss 7.0336 LearningRate 0.0282 Epoch: 9 Global Step: 389200 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:26:21,178-Speed 2628.31 samples/sec Loss 7.1700 LearningRate 0.0282 Epoch: 9 Global Step: 389210 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:26:25,084-Speed 2621.73 samples/sec Loss 7.1177 LearningRate 0.0282 Epoch: 9 Global Step: 389220 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:26:28,978-Speed 2630.13 samples/sec Loss 7.2263 LearningRate 0.0282 Epoch: 9 Global Step: 389230 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:26:32,877-Speed 2627.46 samples/sec Loss 7.1517 LearningRate 0.0282 Epoch: 9 Global Step: 389240 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:26:36,785-Speed 2620.57 samples/sec Loss 7.1388 LearningRate 0.0282 Epoch: 9 Global Step: 389250 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:26:40,685-Speed 2626.22 samples/sec Loss 7.0652 LearningRate 0.0282 Epoch: 9 Global Step: 389260 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:26:44,586-Speed 2625.78 samples/sec Loss 7.1090 LearningRate 0.0282 Epoch: 9 Global Step: 389270 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:26:48,483-Speed 2627.65 samples/sec Loss 7.1491 LearningRate 0.0282 Epoch: 9 Global Step: 389280 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:26:52,386-Speed 2624.93 samples/sec Loss 7.2706 LearningRate 0.0282 Epoch: 9 Global Step: 389290 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:26:56,282-Speed 2628.85 samples/sec Loss 7.0876 LearningRate 0.0282 Epoch: 9 Global Step: 389300 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:27:00,181-Speed 2627.07 samples/sec Loss 7.0658 LearningRate 0.0282 Epoch: 9 Global Step: 389310 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:27:04,093-Speed 2617.45 samples/sec Loss 7.0836 LearningRate 0.0282 Epoch: 9 Global Step: 389320 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:27:07,979-Speed 2636.28 samples/sec Loss 7.1676 LearningRate 0.0282 Epoch: 9 Global Step: 389330 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:27:11,876-Speed 2628.26 samples/sec Loss 7.1343 LearningRate 0.0282 Epoch: 9 Global Step: 389340 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:27:15,781-Speed 2622.47 samples/sec Loss 7.0730 LearningRate 0.0282 Epoch: 9 Global Step: 389350 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:27:19,678-Speed 2628.36 samples/sec Loss 7.1896 LearningRate 0.0282 Epoch: 9 Global Step: 389360 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:27:23,567-Speed 2633.30 samples/sec Loss 7.1337 LearningRate 0.0282 Epoch: 9 Global Step: 389370 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:27:27,484-Speed 2615.09 samples/sec Loss 7.1320 LearningRate 0.0282 Epoch: 9 Global Step: 389380 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:27:31,380-Speed 2628.86 samples/sec Loss 7.1035 LearningRate 0.0282 Epoch: 9 Global Step: 389390 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:27:35,258-Speed 2641.60 samples/sec Loss 7.1975 LearningRate 0.0282 Epoch: 9 Global Step: 389400 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:27:39,150-Speed 2631.62 samples/sec Loss 7.1742 LearningRate 0.0282 Epoch: 9 Global Step: 389410 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:27:43,059-Speed 2620.13 samples/sec Loss 7.0498 LearningRate 0.0282 Epoch: 9 Global Step: 389420 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:27:46,971-Speed 2617.75 samples/sec Loss 7.1340 LearningRate 0.0282 Epoch: 9 Global Step: 389430 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:27:50,866-Speed 2630.05 samples/sec Loss 7.1524 LearningRate 0.0281 Epoch: 9 Global Step: 389440 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:27:54,770-Speed 2623.34 samples/sec Loss 7.1858 LearningRate 0.0281 Epoch: 9 Global Step: 389450 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:27:58,673-Speed 2624.21 samples/sec Loss 6.9645 LearningRate 0.0281 Epoch: 9 Global Step: 389460 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:28:02,583-Speed 2619.34 samples/sec Loss 7.1008 LearningRate 0.0281 Epoch: 9 Global Step: 389470 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:28:06,486-Speed 2624.49 samples/sec Loss 7.1246 LearningRate 0.0281 Epoch: 9 Global Step: 389480 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:28:10,391-Speed 2622.93 samples/sec Loss 7.2007 LearningRate 0.0281 Epoch: 9 Global Step: 389490 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:28:14,299-Speed 2620.55 samples/sec Loss 7.1799 LearningRate 0.0281 Epoch: 9 Global Step: 389500 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:28:18,203-Speed 2624.36 samples/sec Loss 7.0502 LearningRate 0.0281 Epoch: 9 Global Step: 389510 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:28:22,089-Speed 2635.03 samples/sec Loss 7.1726 LearningRate 0.0281 Epoch: 9 Global Step: 389520 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:28:25,979-Speed 2633.09 samples/sec Loss 7.1955 LearningRate 0.0281 Epoch: 9 Global Step: 389530 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:28:29,905-Speed 2609.13 samples/sec Loss 7.4683 LearningRate 0.0281 Epoch: 9 Global Step: 389540 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:28:33,835-Speed 2605.70 samples/sec Loss 7.2887 LearningRate 0.0281 Epoch: 9 Global Step: 389550 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:28:37,737-Speed 2625.29 samples/sec Loss 7.1051 LearningRate 0.0281 Epoch: 9 Global Step: 389560 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:28:41,626-Speed 2633.84 samples/sec Loss 7.1488 LearningRate 0.0281 Epoch: 9 Global Step: 389570 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:28:45,552-Speed 2608.53 samples/sec Loss 7.2170 LearningRate 0.0281 Epoch: 9 Global Step: 389580 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:28:49,520-Speed 2581.60 samples/sec Loss 7.2014 LearningRate 0.0281 Epoch: 9 Global Step: 389590 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:28:53,415-Speed 2629.69 samples/sec Loss 7.1498 LearningRate 0.0281 Epoch: 9 Global Step: 389600 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:28:57,309-Speed 2630.53 samples/sec Loss 7.1357 LearningRate 0.0281 Epoch: 9 Global Step: 389610 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:29:01,201-Speed 2630.92 samples/sec Loss 7.2225 LearningRate 0.0281 Epoch: 9 Global Step: 389620 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:29:05,102-Speed 2625.38 samples/sec Loss 7.1199 LearningRate 0.0281 Epoch: 9 Global Step: 389630 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:29:08,997-Speed 2629.90 samples/sec Loss 7.2104 LearningRate 0.0281 Epoch: 9 Global Step: 389640 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:29:12,890-Speed 2630.58 samples/sec Loss 7.0654 LearningRate 0.0281 Epoch: 9 Global Step: 389650 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:29:16,770-Speed 2639.92 samples/sec Loss 7.4945 LearningRate 0.0281 Epoch: 9 Global Step: 389660 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:29:20,666-Speed 2629.47 samples/sec Loss 7.2012 LearningRate 0.0281 Epoch: 9 Global Step: 389670 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:29:24,570-Speed 2623.52 samples/sec Loss 7.0491 LearningRate 0.0281 Epoch: 9 Global Step: 389680 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:29:28,462-Speed 2631.65 samples/sec Loss 7.1873 LearningRate 0.0281 Epoch: 9 Global Step: 389690 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:29:32,366-Speed 2623.31 samples/sec Loss 7.2201 LearningRate 0.0281 Epoch: 9 Global Step: 389700 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:29:36,263-Speed 2628.28 samples/sec Loss 7.1773 LearningRate 0.0281 Epoch: 9 Global Step: 389710 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:29:40,187-Speed 2609.95 samples/sec Loss 7.1129 LearningRate 0.0281 Epoch: 9 Global Step: 389720 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:29:44,102-Speed 2616.22 samples/sec Loss 7.1899 LearningRate 0.0281 Epoch: 9 Global Step: 389730 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:29:47,993-Speed 2632.58 samples/sec Loss 7.1409 LearningRate 0.0281 Epoch: 9 Global Step: 389740 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:29:51,888-Speed 2629.58 samples/sec Loss 7.0966 LearningRate 0.0281 Epoch: 9 Global Step: 389750 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:29:55,782-Speed 2630.23 samples/sec Loss 7.1713 LearningRate 0.0281 Epoch: 9 Global Step: 389760 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:29:59,716-Speed 2604.15 samples/sec Loss 7.1907 LearningRate 0.0281 Epoch: 9 Global Step: 389770 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:30:03,612-Speed 2628.58 samples/sec Loss 7.1129 LearningRate 0.0281 Epoch: 9 Global Step: 389780 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:30:07,511-Speed 2626.77 samples/sec Loss 7.2291 LearningRate 0.0281 Epoch: 9 Global Step: 389790 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:30:11,403-Speed 2631.30 samples/sec Loss 7.1085 LearningRate 0.0281 Epoch: 9 Global Step: 389800 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:30:15,297-Speed 2630.37 samples/sec Loss 7.1554 LearningRate 0.0281 Epoch: 9 Global Step: 389810 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:30:19,203-Speed 2622.37 samples/sec Loss 7.1126 LearningRate 0.0281 Epoch: 9 Global Step: 389820 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:30:23,113-Speed 2619.35 samples/sec Loss 7.1469 LearningRate 0.0281 Epoch: 9 Global Step: 389830 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:30:27,013-Speed 2625.97 samples/sec Loss 7.0892 LearningRate 0.0281 Epoch: 9 Global Step: 389840 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:30:30,924-Speed 2618.91 samples/sec Loss 7.0642 LearningRate 0.0281 Epoch: 9 Global Step: 389850 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:30:34,814-Speed 2633.34 samples/sec Loss 7.1922 LearningRate 0.0281 Epoch: 9 Global Step: 389860 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:30:38,715-Speed 2625.47 samples/sec Loss 7.1500 LearningRate 0.0281 Epoch: 9 Global Step: 389870 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:30:42,590-Speed 2643.27 samples/sec Loss 7.0307 LearningRate 0.0281 Epoch: 9 Global Step: 389880 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:30:46,492-Speed 2624.86 samples/sec Loss 7.1406 LearningRate 0.0281 Epoch: 9 Global Step: 389890 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:30:50,390-Speed 2627.71 samples/sec Loss 7.1630 LearningRate 0.0281 Epoch: 9 Global Step: 389900 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:30:54,296-Speed 2621.78 samples/sec Loss 7.1287 LearningRate 0.0281 Epoch: 9 Global Step: 389910 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:30:58,200-Speed 2623.43 samples/sec Loss 7.0661 LearningRate 0.0281 Epoch: 9 Global Step: 389920 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:31:02,133-Speed 2604.16 samples/sec Loss 7.1192 LearningRate 0.0281 Epoch: 9 Global Step: 389930 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:31:06,041-Speed 2621.13 samples/sec Loss 7.1272 LearningRate 0.0281 Epoch: 9 Global Step: 389940 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:31:09,942-Speed 2625.37 samples/sec Loss 7.0531 LearningRate 0.0281 Epoch: 9 Global Step: 389950 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:31:13,851-Speed 2620.81 samples/sec Loss 7.0682 LearningRate 0.0281 Epoch: 9 Global Step: 389960 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:31:17,750-Speed 2626.68 samples/sec Loss 7.0502 LearningRate 0.0281 Epoch: 9 Global Step: 389970 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:31:21,650-Speed 2625.89 samples/sec Loss 6.9568 LearningRate 0.0281 Epoch: 9 Global Step: 389980 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:31:25,545-Speed 2629.67 samples/sec Loss 7.3275 LearningRate 0.0281 Epoch: 9 Global Step: 389990 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:31:29,445-Speed 2626.30 samples/sec Loss 7.1363 LearningRate 0.0281 Epoch: 9 Global Step: 390000 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:32:12,611-[lfw][390000]XNorm: 23.729612
Training: 2022-04-14 15:32:12,612-[lfw][390000]Accuracy-Flip: 0.99733+-0.00281
Training: 2022-04-14 15:32:12,612-[lfw][390000]Accuracy-Highest: 0.99783
Training: 2022-04-14 15:33:02,708-[cfp_fp][390000]XNorm: 21.771758
Training: 2022-04-14 15:33:02,709-[cfp_fp][390000]Accuracy-Flip: 0.98543+-0.00530
Training: 2022-04-14 15:33:02,710-[cfp_fp][390000]Accuracy-Highest: 0.98757
Training: 2022-04-14 15:33:45,726-[agedb_30][390000]XNorm: 23.669865
Training: 2022-04-14 15:33:45,727-[agedb_30][390000]Accuracy-Flip: 0.97667+-0.00734
Training: 2022-04-14 15:33:45,728-[agedb_30][390000]Accuracy-Highest: 0.97700
Training: 2022-04-14 15:33:49,587-Speed 73.07 samples/sec Loss 7.0529 LearningRate 0.0281 Epoch: 9 Global Step: 390010 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:33:53,438-Speed 2659.49 samples/sec Loss 7.1733 LearningRate 0.0281 Epoch: 9 Global Step: 390020 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:33:57,308-Speed 2646.55 samples/sec Loss 7.1547 LearningRate 0.0281 Epoch: 9 Global Step: 390030 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:34:01,224-Speed 2615.38 samples/sec Loss 7.0370 LearningRate 0.0281 Epoch: 9 Global Step: 390040 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:34:05,107-Speed 2638.38 samples/sec Loss 7.0219 LearningRate 0.0281 Epoch: 9 Global Step: 390050 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:34:08,982-Speed 2642.90 samples/sec Loss 7.1121 LearningRate 0.0281 Epoch: 9 Global Step: 390060 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:34:12,878-Speed 2629.67 samples/sec Loss 7.0967 LearningRate 0.0281 Epoch: 9 Global Step: 390070 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:34:16,762-Speed 2637.39 samples/sec Loss 6.9348 LearningRate 0.0281 Epoch: 9 Global Step: 390080 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:34:20,640-Speed 2641.61 samples/sec Loss 7.2393 LearningRate 0.0281 Epoch: 9 Global Step: 390090 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:34:24,528-Speed 2634.20 samples/sec Loss 7.1057 LearningRate 0.0281 Epoch: 9 Global Step: 390100 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:34:28,416-Speed 2634.65 samples/sec Loss 7.0638 LearningRate 0.0281 Epoch: 9 Global Step: 390110 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:34:32,313-Speed 2628.09 samples/sec Loss 7.1203 LearningRate 0.0281 Epoch: 9 Global Step: 390120 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:34:36,210-Speed 2628.39 samples/sec Loss 7.0933 LearningRate 0.0281 Epoch: 9 Global Step: 390130 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:34:40,144-Speed 2603.47 samples/sec Loss 7.1156 LearningRate 0.0281 Epoch: 9 Global Step: 390140 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:34:44,223-Speed 2511.35 samples/sec Loss 7.1754 LearningRate 0.0281 Epoch: 9 Global Step: 390150 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:34:48,123-Speed 2626.55 samples/sec Loss 7.1518 LearningRate 0.0281 Epoch: 9 Global Step: 390160 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:34:52,019-Speed 2628.95 samples/sec Loss 7.0702 LearningRate 0.0281 Epoch: 9 Global Step: 390170 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:34:55,924-Speed 2622.71 samples/sec Loss 7.1342 LearningRate 0.0281 Epoch: 9 Global Step: 390180 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:34:59,829-Speed 2623.47 samples/sec Loss 7.1767 LearningRate 0.0281 Epoch: 9 Global Step: 390190 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:35:03,943-Speed 2489.13 samples/sec Loss 7.0990 LearningRate 0.0281 Epoch: 9 Global Step: 390200 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:35:07,840-Speed 2628.57 samples/sec Loss 7.0687 LearningRate 0.0281 Epoch: 9 Global Step: 390210 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:35:11,733-Speed 2630.54 samples/sec Loss 6.9807 LearningRate 0.0280 Epoch: 9 Global Step: 390220 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:35:15,624-Speed 2632.51 samples/sec Loss 7.1041 LearningRate 0.0280 Epoch: 9 Global Step: 390230 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:35:19,519-Speed 2629.52 samples/sec Loss 7.1918 LearningRate 0.0280 Epoch: 9 Global Step: 390240 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:35:23,411-Speed 2631.91 samples/sec Loss 7.0519 LearningRate 0.0280 Epoch: 9 Global Step: 390250 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:35:27,311-Speed 2625.70 samples/sec Loss 7.0263 LearningRate 0.0280 Epoch: 9 Global Step: 390260 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:35:31,193-Speed 2639.25 samples/sec Loss 7.0899 LearningRate 0.0280 Epoch: 9 Global Step: 390270 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:35:35,068-Speed 2643.16 samples/sec Loss 7.1417 LearningRate 0.0280 Epoch: 9 Global Step: 390280 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:35:38,966-Speed 2627.22 samples/sec Loss 7.2369 LearningRate 0.0280 Epoch: 9 Global Step: 390290 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:35:42,863-Speed 2628.89 samples/sec Loss 7.1398 LearningRate 0.0280 Epoch: 9 Global Step: 390300 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:35:46,772-Speed 2620.43 samples/sec Loss 7.1879 LearningRate 0.0280 Epoch: 9 Global Step: 390310 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:35:50,680-Speed 2620.58 samples/sec Loss 6.9091 LearningRate 0.0280 Epoch: 9 Global Step: 390320 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:35:54,579-Speed 2627.18 samples/sec Loss 7.0793 LearningRate 0.0280 Epoch: 9 Global Step: 390330 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:35:58,473-Speed 2631.08 samples/sec Loss 7.0915 LearningRate 0.0280 Epoch: 9 Global Step: 390340 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:36:02,403-Speed 2606.10 samples/sec Loss 7.0574 LearningRate 0.0280 Epoch: 9 Global Step: 390350 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:36:06,299-Speed 2629.36 samples/sec Loss 7.2182 LearningRate 0.0280 Epoch: 9 Global Step: 390360 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:36:10,190-Speed 2631.70 samples/sec Loss 7.1605 LearningRate 0.0280 Epoch: 9 Global Step: 390370 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:36:14,088-Speed 2628.08 samples/sec Loss 7.1799 LearningRate 0.0280 Epoch: 9 Global Step: 390380 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:36:17,985-Speed 2628.39 samples/sec Loss 7.2267 LearningRate 0.0280 Epoch: 9 Global Step: 390390 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:36:21,904-Speed 2613.84 samples/sec Loss 7.1954 LearningRate 0.0280 Epoch: 9 Global Step: 390400 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:36:25,801-Speed 2628.94 samples/sec Loss 7.0885 LearningRate 0.0280 Epoch: 9 Global Step: 390410 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:36:29,735-Speed 2603.33 samples/sec Loss 7.1301 LearningRate 0.0280 Epoch: 9 Global Step: 390420 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:36:33,630-Speed 2630.31 samples/sec Loss 7.1024 LearningRate 0.0280 Epoch: 9 Global Step: 390430 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:36:37,522-Speed 2631.47 samples/sec Loss 7.0578 LearningRate 0.0280 Epoch: 9 Global Step: 390440 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:36:41,416-Speed 2630.08 samples/sec Loss 7.1119 LearningRate 0.0280 Epoch: 9 Global Step: 390450 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:36:45,320-Speed 2623.44 samples/sec Loss 7.1081 LearningRate 0.0280 Epoch: 9 Global Step: 390460 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:36:49,219-Speed 2627.58 samples/sec Loss 7.1022 LearningRate 0.0280 Epoch: 9 Global Step: 390470 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:36:53,112-Speed 2630.66 samples/sec Loss 7.1260 LearningRate 0.0280 Epoch: 9 Global Step: 390480 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:36:57,009-Speed 2628.18 samples/sec Loss 7.0118 LearningRate 0.0280 Epoch: 9 Global Step: 390490 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:37:00,907-Speed 2627.75 samples/sec Loss 7.0297 LearningRate 0.0280 Epoch: 9 Global Step: 390500 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:37:04,801-Speed 2630.54 samples/sec Loss 7.1765 LearningRate 0.0280 Epoch: 9 Global Step: 390510 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:37:08,694-Speed 2631.18 samples/sec Loss 7.0565 LearningRate 0.0280 Epoch: 9 Global Step: 390520 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:37:12,592-Speed 2627.64 samples/sec Loss 7.1797 LearningRate 0.0280 Epoch: 9 Global Step: 390530 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:37:16,465-Speed 2644.35 samples/sec Loss 7.1332 LearningRate 0.0280 Epoch: 9 Global Step: 390540 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:37:20,351-Speed 2636.00 samples/sec Loss 7.0452 LearningRate 0.0280 Epoch: 9 Global Step: 390550 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:37:24,247-Speed 2629.61 samples/sec Loss 7.1839 LearningRate 0.0280 Epoch: 9 Global Step: 390560 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:37:28,150-Speed 2624.28 samples/sec Loss 7.1193 LearningRate 0.0280 Epoch: 9 Global Step: 390570 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:37:32,059-Speed 2620.25 samples/sec Loss 7.0871 LearningRate 0.0280 Epoch: 9 Global Step: 390580 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:37:35,961-Speed 2624.79 samples/sec Loss 7.0925 LearningRate 0.0280 Epoch: 9 Global Step: 390590 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:37:39,864-Speed 2624.32 samples/sec Loss 7.0549 LearningRate 0.0280 Epoch: 9 Global Step: 390600 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:37:43,748-Speed 2637.23 samples/sec Loss 7.1269 LearningRate 0.0280 Epoch: 9 Global Step: 390610 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:37:47,649-Speed 2625.27 samples/sec Loss 7.1437 LearningRate 0.0280 Epoch: 9 Global Step: 390620 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:37:51,550-Speed 2625.27 samples/sec Loss 7.1751 LearningRate 0.0280 Epoch: 9 Global Step: 390630 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:37:55,446-Speed 2629.91 samples/sec Loss 7.1517 LearningRate 0.0280 Epoch: 9 Global Step: 390640 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:37:59,337-Speed 2631.79 samples/sec Loss 7.1049 LearningRate 0.0280 Epoch: 9 Global Step: 390650 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:38:03,235-Speed 2628.24 samples/sec Loss 7.0476 LearningRate 0.0280 Epoch: 9 Global Step: 390660 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:38:07,156-Speed 2611.72 samples/sec Loss 7.1090 LearningRate 0.0280 Epoch: 9 Global Step: 390670 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:38:11,070-Speed 2617.63 samples/sec Loss 6.9925 LearningRate 0.0280 Epoch: 9 Global Step: 390680 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:38:15,008-Speed 2600.99 samples/sec Loss 7.1078 LearningRate 0.0280 Epoch: 9 Global Step: 390690 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:38:18,909-Speed 2625.21 samples/sec Loss 7.1651 LearningRate 0.0280 Epoch: 9 Global Step: 390700 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:38:22,809-Speed 2626.24 samples/sec Loss 7.4851 LearningRate 0.0280 Epoch: 9 Global Step: 390710 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:38:26,708-Speed 2627.53 samples/sec Loss 7.1065 LearningRate 0.0280 Epoch: 9 Global Step: 390720 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:38:30,603-Speed 2629.60 samples/sec Loss 7.0447 LearningRate 0.0280 Epoch: 9 Global Step: 390730 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:38:34,508-Speed 2622.50 samples/sec Loss 7.0132 LearningRate 0.0280 Epoch: 9 Global Step: 390740 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:38:38,406-Speed 2628.29 samples/sec Loss 7.1457 LearningRate 0.0280 Epoch: 9 Global Step: 390750 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:38:42,300-Speed 2630.48 samples/sec Loss 7.1103 LearningRate 0.0280 Epoch: 9 Global Step: 390760 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:38:46,198-Speed 2627.62 samples/sec Loss 7.1308 LearningRate 0.0280 Epoch: 9 Global Step: 390770 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:38:50,092-Speed 2629.68 samples/sec Loss 7.2723 LearningRate 0.0280 Epoch: 9 Global Step: 390780 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:38:53,992-Speed 2626.86 samples/sec Loss 7.2314 LearningRate 0.0280 Epoch: 9 Global Step: 390790 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:38:57,888-Speed 2629.35 samples/sec Loss 7.1935 LearningRate 0.0280 Epoch: 9 Global Step: 390800 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:39:01,786-Speed 2627.67 samples/sec Loss 7.0795 LearningRate 0.0280 Epoch: 9 Global Step: 390810 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:39:05,695-Speed 2619.96 samples/sec Loss 7.0766 LearningRate 0.0280 Epoch: 9 Global Step: 390820 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:39:09,598-Speed 2624.83 samples/sec Loss 7.2189 LearningRate 0.0280 Epoch: 9 Global Step: 390830 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:39:13,486-Speed 2633.81 samples/sec Loss 7.0806 LearningRate 0.0280 Epoch: 9 Global Step: 390840 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:39:17,408-Speed 2611.24 samples/sec Loss 7.1240 LearningRate 0.0280 Epoch: 9 Global Step: 390850 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:39:21,303-Speed 2630.08 samples/sec Loss 7.0578 LearningRate 0.0280 Epoch: 9 Global Step: 390860 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:39:25,199-Speed 2629.08 samples/sec Loss 6.9817 LearningRate 0.0280 Epoch: 9 Global Step: 390870 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:39:29,101-Speed 2625.17 samples/sec Loss 7.1844 LearningRate 0.0280 Epoch: 9 Global Step: 390880 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:39:32,995-Speed 2630.03 samples/sec Loss 7.0499 LearningRate 0.0280 Epoch: 9 Global Step: 390890 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:39:36,911-Speed 2615.73 samples/sec Loss 7.1209 LearningRate 0.0280 Epoch: 9 Global Step: 390900 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:39:40,817-Speed 2622.75 samples/sec Loss 7.0549 LearningRate 0.0280 Epoch: 9 Global Step: 390910 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:39:44,722-Speed 2622.51 samples/sec Loss 7.0729 LearningRate 0.0280 Epoch: 9 Global Step: 390920 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:39:48,625-Speed 2624.44 samples/sec Loss 7.1129 LearningRate 0.0280 Epoch: 9 Global Step: 390930 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:39:52,525-Speed 2625.84 samples/sec Loss 7.1000 LearningRate 0.0280 Epoch: 9 Global Step: 390940 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:39:56,424-Speed 2627.95 samples/sec Loss 7.0372 LearningRate 0.0280 Epoch: 9 Global Step: 390950 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:40:00,328-Speed 2623.17 samples/sec Loss 7.2154 LearningRate 0.0280 Epoch: 9 Global Step: 390960 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:40:04,232-Speed 2624.00 samples/sec Loss 7.0448 LearningRate 0.0280 Epoch: 9 Global Step: 390970 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:40:08,151-Speed 2613.07 samples/sec Loss 7.0351 LearningRate 0.0280 Epoch: 9 Global Step: 390980 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:40:12,051-Speed 2626.63 samples/sec Loss 7.1829 LearningRate 0.0280 Epoch: 9 Global Step: 390990 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:40:15,950-Speed 2626.89 samples/sec Loss 7.2556 LearningRate 0.0279 Epoch: 9 Global Step: 391000 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:40:19,876-Speed 2608.66 samples/sec Loss 7.1338 LearningRate 0.0279 Epoch: 9 Global Step: 391010 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:40:23,775-Speed 2627.74 samples/sec Loss 7.0163 LearningRate 0.0279 Epoch: 9 Global Step: 391020 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:40:27,667-Speed 2631.27 samples/sec Loss 7.1863 LearningRate 0.0279 Epoch: 9 Global Step: 391030 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:40:31,597-Speed 2607.39 samples/sec Loss 7.1900 LearningRate 0.0279 Epoch: 9 Global Step: 391040 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:40:35,507-Speed 2619.78 samples/sec Loss 7.0485 LearningRate 0.0279 Epoch: 9 Global Step: 391050 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:40:39,402-Speed 2629.26 samples/sec Loss 7.0197 LearningRate 0.0279 Epoch: 9 Global Step: 391060 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:40:43,315-Speed 2617.61 samples/sec Loss 7.1424 LearningRate 0.0279 Epoch: 9 Global Step: 391070 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:40:47,232-Speed 2615.03 samples/sec Loss 7.0994 LearningRate 0.0279 Epoch: 9 Global Step: 391080 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:40:51,139-Speed 2621.75 samples/sec Loss 6.9736 LearningRate 0.0279 Epoch: 9 Global Step: 391090 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:40:55,046-Speed 2621.75 samples/sec Loss 7.1658 LearningRate 0.0279 Epoch: 9 Global Step: 391100 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 15:40:58,958-Speed 2618.49 samples/sec Loss 7.0603 LearningRate 0.0279 Epoch: 9 Global Step: 391110 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:41:02,857-Speed 2626.83 samples/sec Loss 7.1388 LearningRate 0.0279 Epoch: 9 Global Step: 391120 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:41:06,755-Speed 2627.85 samples/sec Loss 7.1504 LearningRate 0.0279 Epoch: 9 Global Step: 391130 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:41:10,655-Speed 2626.65 samples/sec Loss 7.1212 LearningRate 0.0279 Epoch: 9 Global Step: 391140 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:41:14,552-Speed 2627.86 samples/sec Loss 7.1339 LearningRate 0.0279 Epoch: 9 Global Step: 391150 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:41:18,445-Speed 2631.36 samples/sec Loss 7.0034 LearningRate 0.0279 Epoch: 9 Global Step: 391160 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:41:22,342-Speed 2628.01 samples/sec Loss 7.2327 LearningRate 0.0279 Epoch: 9 Global Step: 391170 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:41:26,237-Speed 2630.09 samples/sec Loss 7.2288 LearningRate 0.0279 Epoch: 9 Global Step: 391180 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:41:30,131-Speed 2630.52 samples/sec Loss 7.1389 LearningRate 0.0279 Epoch: 9 Global Step: 391190 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:41:34,049-Speed 2614.57 samples/sec Loss 7.1894 LearningRate 0.0279 Epoch: 9 Global Step: 391200 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:41:37,958-Speed 2620.18 samples/sec Loss 7.0148 LearningRate 0.0279 Epoch: 9 Global Step: 391210 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:41:41,860-Speed 2624.91 samples/sec Loss 7.0920 LearningRate 0.0279 Epoch: 9 Global Step: 391220 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:41:45,761-Speed 2625.26 samples/sec Loss 7.1586 LearningRate 0.0279 Epoch: 9 Global Step: 391230 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:41:49,664-Speed 2624.39 samples/sec Loss 7.0770 LearningRate 0.0279 Epoch: 9 Global Step: 391240 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:41:53,566-Speed 2624.87 samples/sec Loss 7.0828 LearningRate 0.0279 Epoch: 9 Global Step: 391250 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:41:57,474-Speed 2620.89 samples/sec Loss 7.0234 LearningRate 0.0279 Epoch: 9 Global Step: 391260 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:42:01,379-Speed 2622.83 samples/sec Loss 7.2129 LearningRate 0.0279 Epoch: 9 Global Step: 391270 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:42:05,278-Speed 2626.68 samples/sec Loss 7.0998 LearningRate 0.0279 Epoch: 9 Global Step: 391280 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:42:09,146-Speed 2648.44 samples/sec Loss 7.1782 LearningRate 0.0279 Epoch: 9 Global Step: 391290 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:42:13,045-Speed 2627.09 samples/sec Loss 7.1288 LearningRate 0.0279 Epoch: 9 Global Step: 391300 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:42:16,942-Speed 2627.97 samples/sec Loss 7.1086 LearningRate 0.0279 Epoch: 9 Global Step: 391310 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:42:20,842-Speed 2626.30 samples/sec Loss 7.0754 LearningRate 0.0279 Epoch: 9 Global Step: 391320 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:42:24,779-Speed 2601.85 samples/sec Loss 7.0197 LearningRate 0.0279 Epoch: 9 Global Step: 391330 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:42:28,678-Speed 2627.24 samples/sec Loss 7.0480 LearningRate 0.0279 Epoch: 9 Global Step: 391340 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:42:32,575-Speed 2628.13 samples/sec Loss 7.1418 LearningRate 0.0279 Epoch: 9 Global Step: 391350 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:42:36,476-Speed 2625.84 samples/sec Loss 7.0344 LearningRate 0.0279 Epoch: 9 Global Step: 391360 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:42:40,377-Speed 2625.45 samples/sec Loss 7.0170 LearningRate 0.0279 Epoch: 9 Global Step: 391370 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:42:44,283-Speed 2622.39 samples/sec Loss 7.0076 LearningRate 0.0279 Epoch: 9 Global Step: 391380 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:42:48,379-Speed 2500.95 samples/sec Loss 7.0544 LearningRate 0.0279 Epoch: 9 Global Step: 391390 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:42:52,304-Speed 2609.41 samples/sec Loss 7.1074 LearningRate 0.0279 Epoch: 9 Global Step: 391400 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:42:56,219-Speed 2615.52 samples/sec Loss 7.0310 LearningRate 0.0279 Epoch: 9 Global Step: 391410 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:43:00,114-Speed 2630.47 samples/sec Loss 6.9856 LearningRate 0.0279 Epoch: 9 Global Step: 391420 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:43:04,062-Speed 2594.41 samples/sec Loss 7.1295 LearningRate 0.0279 Epoch: 9 Global Step: 391430 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:43:07,971-Speed 2620.46 samples/sec Loss 7.0880 LearningRate 0.0279 Epoch: 9 Global Step: 391440 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:43:11,863-Speed 2631.93 samples/sec Loss 7.0649 LearningRate 0.0279 Epoch: 9 Global Step: 391450 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:43:15,798-Speed 2603.22 samples/sec Loss 7.0368 LearningRate 0.0279 Epoch: 9 Global Step: 391460 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:43:19,700-Speed 2624.61 samples/sec Loss 7.1468 LearningRate 0.0279 Epoch: 9 Global Step: 391470 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:43:23,623-Speed 2610.97 samples/sec Loss 7.2503 LearningRate 0.0279 Epoch: 9 Global Step: 391480 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:43:27,541-Speed 2614.41 samples/sec Loss 7.1859 LearningRate 0.0279 Epoch: 9 Global Step: 391490 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:43:31,437-Speed 2628.95 samples/sec Loss 7.0397 LearningRate 0.0279 Epoch: 9 Global Step: 391500 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:43:35,337-Speed 2626.11 samples/sec Loss 7.1276 LearningRate 0.0279 Epoch: 9 Global Step: 391510 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:43:39,241-Speed 2624.10 samples/sec Loss 7.0545 LearningRate 0.0279 Epoch: 9 Global Step: 391520 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:43:43,135-Speed 2629.90 samples/sec Loss 7.1107 LearningRate 0.0279 Epoch: 9 Global Step: 391530 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:43:47,048-Speed 2617.79 samples/sec Loss 6.9942 LearningRate 0.0279 Epoch: 9 Global Step: 391540 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:43:50,946-Speed 2628.06 samples/sec Loss 7.1939 LearningRate 0.0279 Epoch: 9 Global Step: 391550 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:43:54,861-Speed 2615.76 samples/sec Loss 7.1078 LearningRate 0.0279 Epoch: 9 Global Step: 391560 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:43:58,800-Speed 2600.40 samples/sec Loss 6.9621 LearningRate 0.0279 Epoch: 9 Global Step: 391570 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:44:02,684-Speed 2636.53 samples/sec Loss 7.2372 LearningRate 0.0279 Epoch: 9 Global Step: 391580 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:44:06,595-Speed 2619.01 samples/sec Loss 7.1048 LearningRate 0.0279 Epoch: 9 Global Step: 391590 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:44:10,515-Speed 2612.82 samples/sec Loss 7.1740 LearningRate 0.0279 Epoch: 9 Global Step: 391600 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:44:14,474-Speed 2587.82 samples/sec Loss 7.0302 LearningRate 0.0279 Epoch: 9 Global Step: 391610 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:44:18,538-Speed 2519.97 samples/sec Loss 7.1801 LearningRate 0.0279 Epoch: 9 Global Step: 391620 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:44:22,441-Speed 2623.96 samples/sec Loss 6.9425 LearningRate 0.0279 Epoch: 9 Global Step: 391630 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:44:26,349-Speed 2621.23 samples/sec Loss 7.0463 LearningRate 0.0279 Epoch: 9 Global Step: 391640 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:44:30,268-Speed 2613.44 samples/sec Loss 7.0494 LearningRate 0.0279 Epoch: 9 Global Step: 391650 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:44:34,177-Speed 2620.08 samples/sec Loss 7.0699 LearningRate 0.0279 Epoch: 9 Global Step: 391660 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:44:38,082-Speed 2622.71 samples/sec Loss 7.1169 LearningRate 0.0279 Epoch: 9 Global Step: 391670 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:44:42,009-Speed 2608.73 samples/sec Loss 7.2851 LearningRate 0.0279 Epoch: 9 Global Step: 391680 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:44:45,921-Speed 2618.50 samples/sec Loss 7.1153 LearningRate 0.0279 Epoch: 9 Global Step: 391690 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:44:49,823-Speed 2625.12 samples/sec Loss 6.9952 LearningRate 0.0279 Epoch: 9 Global Step: 391700 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:44:53,735-Speed 2618.71 samples/sec Loss 7.1441 LearningRate 0.0279 Epoch: 9 Global Step: 391710 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:44:57,647-Speed 2618.06 samples/sec Loss 7.0588 LearningRate 0.0279 Epoch: 9 Global Step: 391720 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:45:01,545-Speed 2627.69 samples/sec Loss 6.9674 LearningRate 0.0279 Epoch: 9 Global Step: 391730 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:45:05,457-Speed 2617.92 samples/sec Loss 7.0515 LearningRate 0.0279 Epoch: 9 Global Step: 391740 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:45:09,565-Speed 2493.79 samples/sec Loss 7.0388 LearningRate 0.0279 Epoch: 9 Global Step: 391750 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:45:13,594-Speed 2542.16 samples/sec Loss 6.9623 LearningRate 0.0279 Epoch: 9 Global Step: 391760 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:45:17,542-Speed 2594.30 samples/sec Loss 7.0292 LearningRate 0.0279 Epoch: 9 Global Step: 391770 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:45:21,474-Speed 2605.27 samples/sec Loss 7.0131 LearningRate 0.0279 Epoch: 9 Global Step: 391780 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 15:45:25,384-Speed 2619.44 samples/sec Loss 7.1069 LearningRate 0.0278 Epoch: 9 Global Step: 391790 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 15:45:29,645-Speed 2403.87 samples/sec Loss 7.0928 LearningRate 0.0278 Epoch: 9 Global Step: 391800 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:45:33,777-Speed 2479.50 samples/sec Loss 7.0758 LearningRate 0.0278 Epoch: 9 Global Step: 391810 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:45:37,726-Speed 2593.89 samples/sec Loss 7.0779 LearningRate 0.0278 Epoch: 9 Global Step: 391820 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:45:41,641-Speed 2615.90 samples/sec Loss 7.0681 LearningRate 0.0278 Epoch: 9 Global Step: 391830 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:45:45,563-Speed 2611.93 samples/sec Loss 7.0406 LearningRate 0.0278 Epoch: 9 Global Step: 391840 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:45:49,465-Speed 2625.04 samples/sec Loss 7.1304 LearningRate 0.0278 Epoch: 9 Global Step: 391850 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:45:53,366-Speed 2625.83 samples/sec Loss 7.1041 LearningRate 0.0278 Epoch: 9 Global Step: 391860 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:45:57,237-Speed 2645.83 samples/sec Loss 7.0985 LearningRate 0.0278 Epoch: 9 Global Step: 391870 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:46:01,148-Speed 2618.81 samples/sec Loss 7.0838 LearningRate 0.0278 Epoch: 9 Global Step: 391880 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:46:05,050-Speed 2625.17 samples/sec Loss 7.2140 LearningRate 0.0278 Epoch: 9 Global Step: 391890 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:46:08,953-Speed 2623.81 samples/sec Loss 7.0338 LearningRate 0.0278 Epoch: 9 Global Step: 391900 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:46:12,846-Speed 2630.97 samples/sec Loss 7.1866 LearningRate 0.0278 Epoch: 9 Global Step: 391910 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:46:16,746-Speed 2626.60 samples/sec Loss 7.0368 LearningRate 0.0278 Epoch: 9 Global Step: 391920 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:46:20,663-Speed 2614.66 samples/sec Loss 6.9828 LearningRate 0.0278 Epoch: 9 Global Step: 391930 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:46:24,560-Speed 2628.92 samples/sec Loss 7.0358 LearningRate 0.0278 Epoch: 9 Global Step: 391940 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:46:28,453-Speed 2630.76 samples/sec Loss 7.0444 LearningRate 0.0278 Epoch: 9 Global Step: 391950 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:46:32,357-Speed 2624.30 samples/sec Loss 7.1320 LearningRate 0.0278 Epoch: 9 Global Step: 391960 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:46:36,258-Speed 2625.85 samples/sec Loss 7.0691 LearningRate 0.0278 Epoch: 9 Global Step: 391970 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:46:40,125-Speed 2648.42 samples/sec Loss 7.1076 LearningRate 0.0278 Epoch: 9 Global Step: 391980 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:46:44,042-Speed 2614.65 samples/sec Loss 7.1191 LearningRate 0.0278 Epoch: 9 Global Step: 391990 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:46:47,959-Speed 2615.09 samples/sec Loss 7.1141 LearningRate 0.0278 Epoch: 9 Global Step: 392000 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:46:51,880-Speed 2612.70 samples/sec Loss 6.9602 LearningRate 0.0278 Epoch: 9 Global Step: 392010 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:46:55,773-Speed 2630.22 samples/sec Loss 6.9487 LearningRate 0.0278 Epoch: 9 Global Step: 392020 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:46:59,704-Speed 2606.45 samples/sec Loss 7.0203 LearningRate 0.0278 Epoch: 9 Global Step: 392030 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:47:03,598-Speed 2630.06 samples/sec Loss 6.9970 LearningRate 0.0278 Epoch: 9 Global Step: 392040 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:47:07,519-Speed 2612.84 samples/sec Loss 6.9270 LearningRate 0.0278 Epoch: 9 Global Step: 392050 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:47:11,419-Speed 2626.48 samples/sec Loss 7.1524 LearningRate 0.0278 Epoch: 9 Global Step: 392060 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:47:15,354-Speed 2603.31 samples/sec Loss 7.0291 LearningRate 0.0278 Epoch: 9 Global Step: 392070 Fp16 Grad Scale: 16384 Required: 49 hours
Training: 2022-04-14 15:47:19,259-Speed 2622.64 samples/sec Loss 7.1185 LearningRate 0.0278 Epoch: 9 Global Step: 392080 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:47:23,173-Speed 2617.27 samples/sec Loss 7.0808 LearningRate 0.0278 Epoch: 9 Global Step: 392090 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:47:27,082-Speed 2620.50 samples/sec Loss 7.0044 LearningRate 0.0278 Epoch: 9 Global Step: 392100 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:47:30,984-Speed 2624.76 samples/sec Loss 7.1024 LearningRate 0.0278 Epoch: 9 Global Step: 392110 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:47:34,899-Speed 2616.12 samples/sec Loss 7.1096 LearningRate 0.0278 Epoch: 9 Global Step: 392120 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:47:38,797-Speed 2627.95 samples/sec Loss 7.0758 LearningRate 0.0278 Epoch: 9 Global Step: 392130 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:47:42,693-Speed 2628.89 samples/sec Loss 7.1696 LearningRate 0.0278 Epoch: 9 Global Step: 392140 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:47:46,599-Speed 2622.92 samples/sec Loss 7.0191 LearningRate 0.0278 Epoch: 9 Global Step: 392150 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:47:50,497-Speed 2627.50 samples/sec Loss 6.9551 LearningRate 0.0278 Epoch: 9 Global Step: 392160 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:47:54,395-Speed 2628.24 samples/sec Loss 6.9564 LearningRate 0.0278 Epoch: 9 Global Step: 392170 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:47:58,294-Speed 2626.65 samples/sec Loss 7.1083 LearningRate 0.0278 Epoch: 9 Global Step: 392180 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:48:02,196-Speed 2626.09 samples/sec Loss 6.9122 LearningRate 0.0278 Epoch: 9 Global Step: 392190 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:48:06,097-Speed 2625.18 samples/sec Loss 7.1188 LearningRate 0.0278 Epoch: 9 Global Step: 392200 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:48:10,025-Speed 2607.36 samples/sec Loss 7.0499 LearningRate 0.0278 Epoch: 9 Global Step: 392210 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:48:13,934-Speed 2620.13 samples/sec Loss 7.0730 LearningRate 0.0278 Epoch: 9 Global Step: 392220 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:48:17,911-Speed 2575.67 samples/sec Loss 7.1152 LearningRate 0.0278 Epoch: 9 Global Step: 392230 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:48:21,857-Speed 2595.99 samples/sec Loss 7.0364 LearningRate 0.0278 Epoch: 9 Global Step: 392240 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:48:25,755-Speed 2627.92 samples/sec Loss 7.2179 LearningRate 0.0278 Epoch: 9 Global Step: 392250 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:48:29,666-Speed 2618.23 samples/sec Loss 7.2557 LearningRate 0.0278 Epoch: 9 Global Step: 392260 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:48:33,579-Speed 2617.86 samples/sec Loss 7.2262 LearningRate 0.0278 Epoch: 9 Global Step: 392270 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:48:37,484-Speed 2622.73 samples/sec Loss 7.1540 LearningRate 0.0278 Epoch: 9 Global Step: 392280 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:48:41,386-Speed 2625.66 samples/sec Loss 7.0718 LearningRate 0.0278 Epoch: 9 Global Step: 392290 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:48:45,284-Speed 2626.93 samples/sec Loss 7.1888 LearningRate 0.0278 Epoch: 9 Global Step: 392300 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:48:49,189-Speed 2623.56 samples/sec Loss 7.0798 LearningRate 0.0278 Epoch: 9 Global Step: 392310 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:48:53,094-Speed 2622.53 samples/sec Loss 6.9878 LearningRate 0.0278 Epoch: 9 Global Step: 392320 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:48:56,998-Speed 2623.64 samples/sec Loss 7.1539 LearningRate 0.0278 Epoch: 9 Global Step: 392330 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:49:00,899-Speed 2625.81 samples/sec Loss 7.0692 LearningRate 0.0278 Epoch: 9 Global Step: 392340 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:49:04,806-Speed 2621.54 samples/sec Loss 7.1807 LearningRate 0.0278 Epoch: 9 Global Step: 392350 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:49:08,710-Speed 2623.14 samples/sec Loss 7.0623 LearningRate 0.0278 Epoch: 9 Global Step: 392360 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:49:12,613-Speed 2624.67 samples/sec Loss 6.9697 LearningRate 0.0278 Epoch: 9 Global Step: 392370 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:49:16,516-Speed 2624.22 samples/sec Loss 7.2091 LearningRate 0.0278 Epoch: 9 Global Step: 392380 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:49:20,415-Speed 2626.92 samples/sec Loss 7.0169 LearningRate 0.0278 Epoch: 9 Global Step: 392390 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:49:24,313-Speed 2627.91 samples/sec Loss 7.0625 LearningRate 0.0278 Epoch: 9 Global Step: 392400 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:49:28,228-Speed 2616.39 samples/sec Loss 7.1719 LearningRate 0.0278 Epoch: 9 Global Step: 392410 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:49:32,127-Speed 2626.76 samples/sec Loss 7.0193 LearningRate 0.0278 Epoch: 9 Global Step: 392420 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:49:36,031-Speed 2623.78 samples/sec Loss 7.0433 LearningRate 0.0278 Epoch: 9 Global Step: 392430 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:49:39,937-Speed 2622.75 samples/sec Loss 7.0867 LearningRate 0.0278 Epoch: 9 Global Step: 392440 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:49:43,843-Speed 2622.01 samples/sec Loss 7.2238 LearningRate 0.0278 Epoch: 9 Global Step: 392450 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:49:47,751-Speed 2621.45 samples/sec Loss 7.0619 LearningRate 0.0278 Epoch: 9 Global Step: 392460 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:49:51,653-Speed 2624.97 samples/sec Loss 7.1258 LearningRate 0.0278 Epoch: 9 Global Step: 392470 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:49:55,545-Speed 2631.72 samples/sec Loss 6.9838 LearningRate 0.0278 Epoch: 9 Global Step: 392480 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:49:59,457-Speed 2618.41 samples/sec Loss 7.0389 LearningRate 0.0278 Epoch: 9 Global Step: 392490 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:50:03,439-Speed 2571.81 samples/sec Loss 7.0782 LearningRate 0.0278 Epoch: 9 Global Step: 392500 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:50:07,338-Speed 2626.79 samples/sec Loss 6.9318 LearningRate 0.0278 Epoch: 9 Global Step: 392510 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:50:11,243-Speed 2623.48 samples/sec Loss 7.1103 LearningRate 0.0278 Epoch: 9 Global Step: 392520 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:50:15,158-Speed 2615.48 samples/sec Loss 7.0633 LearningRate 0.0278 Epoch: 9 Global Step: 392530 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:50:19,062-Speed 2624.55 samples/sec Loss 7.0830 LearningRate 0.0278 Epoch: 9 Global Step: 392540 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:50:22,987-Speed 2609.54 samples/sec Loss 7.1022 LearningRate 0.0278 Epoch: 9 Global Step: 392550 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:50:27,002-Speed 2551.07 samples/sec Loss 7.1001 LearningRate 0.0278 Epoch: 9 Global Step: 392560 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 15:50:30,921-Speed 2614.03 samples/sec Loss 7.0142 LearningRate 0.0278 Epoch: 9 Global Step: 392570 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 15:50:34,824-Speed 2624.04 samples/sec Loss 7.0256 LearningRate 0.0277 Epoch: 9 Global Step: 392580 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 15:50:38,721-Speed 2628.12 samples/sec Loss 7.1286 LearningRate 0.0277 Epoch: 9 Global Step: 392590 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 15:50:42,640-Speed 2613.52 samples/sec Loss 7.0064 LearningRate 0.0277 Epoch: 9 Global Step: 392600 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 15:50:46,524-Speed 2637.71 samples/sec Loss 7.0943 LearningRate 0.0277 Epoch: 9 Global Step: 392610 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:50:50,426-Speed 2624.87 samples/sec Loss 7.1815 LearningRate 0.0277 Epoch: 9 Global Step: 392620 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:50:54,348-Speed 2611.75 samples/sec Loss 7.0655 LearningRate 0.0277 Epoch: 9 Global Step: 392630 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:50:58,255-Speed 2621.44 samples/sec Loss 7.0051 LearningRate 0.0277 Epoch: 9 Global Step: 392640 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:51:02,159-Speed 2623.74 samples/sec Loss 7.0689 LearningRate 0.0277 Epoch: 9 Global Step: 392650 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:51:06,074-Speed 2615.88 samples/sec Loss 7.0784 LearningRate 0.0277 Epoch: 9 Global Step: 392660 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:51:09,981-Speed 2622.18 samples/sec Loss 7.0113 LearningRate 0.0277 Epoch: 9 Global Step: 392670 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:51:13,889-Speed 2620.61 samples/sec Loss 7.1826 LearningRate 0.0277 Epoch: 9 Global Step: 392680 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:51:17,794-Speed 2623.29 samples/sec Loss 7.0975 LearningRate 0.0277 Epoch: 9 Global Step: 392690 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:51:21,697-Speed 2624.41 samples/sec Loss 7.1450 LearningRate 0.0277 Epoch: 9 Global Step: 392700 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:51:25,600-Speed 2624.36 samples/sec Loss 7.1668 LearningRate 0.0277 Epoch: 9 Global Step: 392710 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 15:51:29,474-Speed 2644.43 samples/sec Loss 7.0565 LearningRate 0.0277 Epoch: 9 Global Step: 392720 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:51:33,379-Speed 2622.57 samples/sec Loss 7.1558 LearningRate 0.0277 Epoch: 9 Global Step: 392730 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:51:37,287-Speed 2620.54 samples/sec Loss 7.0198 LearningRate 0.0277 Epoch: 9 Global Step: 392740 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:51:41,195-Speed 2620.92 samples/sec Loss 7.0358 LearningRate 0.0277 Epoch: 9 Global Step: 392750 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:51:45,098-Speed 2624.50 samples/sec Loss 7.0861 LearningRate 0.0277 Epoch: 9 Global Step: 392760 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:51:49,007-Speed 2620.31 samples/sec Loss 7.0692 LearningRate 0.0277 Epoch: 9 Global Step: 392770 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:51:52,900-Speed 2631.37 samples/sec Loss 7.0728 LearningRate 0.0277 Epoch: 9 Global Step: 392780 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:51:56,807-Speed 2621.52 samples/sec Loss 7.0841 LearningRate 0.0277 Epoch: 9 Global Step: 392790 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:52:00,717-Speed 2619.30 samples/sec Loss 7.0643 LearningRate 0.0277 Epoch: 9 Global Step: 392800 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:52:04,656-Speed 2600.38 samples/sec Loss 6.8969 LearningRate 0.0277 Epoch: 9 Global Step: 392810 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:52:08,594-Speed 2600.61 samples/sec Loss 6.9715 LearningRate 0.0277 Epoch: 9 Global Step: 392820 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:52:12,516-Speed 2611.40 samples/sec Loss 7.1156 LearningRate 0.0277 Epoch: 9 Global Step: 392830 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:52:16,426-Speed 2619.98 samples/sec Loss 7.0548 LearningRate 0.0277 Epoch: 9 Global Step: 392840 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:52:20,329-Speed 2624.05 samples/sec Loss 7.1708 LearningRate 0.0277 Epoch: 9 Global Step: 392850 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:52:24,232-Speed 2624.90 samples/sec Loss 7.0906 LearningRate 0.0277 Epoch: 9 Global Step: 392860 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:52:28,133-Speed 2624.98 samples/sec Loss 7.0902 LearningRate 0.0277 Epoch: 9 Global Step: 392870 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:52:32,068-Speed 2603.40 samples/sec Loss 7.0686 LearningRate 0.0277 Epoch: 9 Global Step: 392880 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:52:35,969-Speed 2625.41 samples/sec Loss 7.0754 LearningRate 0.0277 Epoch: 9 Global Step: 392890 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:52:39,889-Speed 2613.22 samples/sec Loss 7.0248 LearningRate 0.0277 Epoch: 9 Global Step: 392900 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:52:43,787-Speed 2627.55 samples/sec Loss 7.0564 LearningRate 0.0277 Epoch: 9 Global Step: 392910 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:52:47,685-Speed 2627.72 samples/sec Loss 7.0701 LearningRate 0.0277 Epoch: 9 Global Step: 392920 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:52:51,585-Speed 2626.33 samples/sec Loss 7.1502 LearningRate 0.0277 Epoch: 9 Global Step: 392930 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:52:55,493-Speed 2621.25 samples/sec Loss 7.0967 LearningRate 0.0277 Epoch: 9 Global Step: 392940 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:52:59,390-Speed 2627.86 samples/sec Loss 6.9481 LearningRate 0.0277 Epoch: 9 Global Step: 392950 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:53:03,284-Speed 2630.66 samples/sec Loss 7.0541 LearningRate 0.0277 Epoch: 9 Global Step: 392960 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:53:07,196-Speed 2618.16 samples/sec Loss 7.1166 LearningRate 0.0277 Epoch: 9 Global Step: 392970 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:53:11,079-Speed 2637.45 samples/sec Loss 7.1821 LearningRate 0.0277 Epoch: 9 Global Step: 392980 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:53:14,979-Speed 2626.83 samples/sec Loss 6.9730 LearningRate 0.0277 Epoch: 9 Global Step: 392990 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:53:18,883-Speed 2623.45 samples/sec Loss 6.9594 LearningRate 0.0277 Epoch: 9 Global Step: 393000 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:53:22,791-Speed 2621.26 samples/sec Loss 7.1059 LearningRate 0.0277 Epoch: 9 Global Step: 393010 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:53:26,691-Speed 2626.31 samples/sec Loss 7.0633 LearningRate 0.0277 Epoch: 9 Global Step: 393020 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:53:30,588-Speed 2628.53 samples/sec Loss 7.1288 LearningRate 0.0277 Epoch: 9 Global Step: 393030 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:53:34,491-Speed 2624.05 samples/sec Loss 7.0007 LearningRate 0.0277 Epoch: 9 Global Step: 393040 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:53:38,401-Speed 2620.01 samples/sec Loss 7.0302 LearningRate 0.0277 Epoch: 9 Global Step: 393050 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:53:42,296-Speed 2629.17 samples/sec Loss 7.1456 LearningRate 0.0277 Epoch: 9 Global Step: 393060 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:53:46,236-Speed 2599.97 samples/sec Loss 6.8742 LearningRate 0.0277 Epoch: 9 Global Step: 393070 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:53:50,144-Speed 2621.35 samples/sec Loss 7.1619 LearningRate 0.0277 Epoch: 9 Global Step: 393080 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:53:54,045-Speed 2625.79 samples/sec Loss 7.1730 LearningRate 0.0277 Epoch: 9 Global Step: 393090 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:53:57,929-Speed 2637.20 samples/sec Loss 7.0910 LearningRate 0.0277 Epoch: 9 Global Step: 393100 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:54:01,829-Speed 2626.22 samples/sec Loss 7.0190 LearningRate 0.0277 Epoch: 9 Global Step: 393110 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:54:05,728-Speed 2626.92 samples/sec Loss 7.0480 LearningRate 0.0277 Epoch: 9 Global Step: 393120 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:54:09,774-Speed 2531.38 samples/sec Loss 7.0593 LearningRate 0.0277 Epoch: 9 Global Step: 393130 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:54:13,676-Speed 2625.14 samples/sec Loss 6.9602 LearningRate 0.0277 Epoch: 9 Global Step: 393140 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:54:17,593-Speed 2615.32 samples/sec Loss 7.1150 LearningRate 0.0277 Epoch: 9 Global Step: 393150 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:54:21,508-Speed 2615.63 samples/sec Loss 7.0663 LearningRate 0.0277 Epoch: 9 Global Step: 393160 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:54:25,416-Speed 2620.72 samples/sec Loss 7.1808 LearningRate 0.0277 Epoch: 9 Global Step: 393170 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:54:29,320-Speed 2624.18 samples/sec Loss 7.0348 LearningRate 0.0277 Epoch: 9 Global Step: 393180 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:54:33,220-Speed 2626.11 samples/sec Loss 7.0934 LearningRate 0.0277 Epoch: 9 Global Step: 393190 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:54:37,121-Speed 2626.09 samples/sec Loss 7.2295 LearningRate 0.0277 Epoch: 9 Global Step: 393200 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:54:41,022-Speed 2625.60 samples/sec Loss 7.1859 LearningRate 0.0277 Epoch: 9 Global Step: 393210 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:54:44,936-Speed 2616.36 samples/sec Loss 6.9960 LearningRate 0.0277 Epoch: 9 Global Step: 393220 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:54:48,845-Speed 2620.77 samples/sec Loss 7.0690 LearningRate 0.0277 Epoch: 9 Global Step: 393230 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:54:52,745-Speed 2626.03 samples/sec Loss 7.0488 LearningRate 0.0277 Epoch: 9 Global Step: 393240 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:54:56,650-Speed 2622.88 samples/sec Loss 7.0517 LearningRate 0.0277 Epoch: 9 Global Step: 393250 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:55:00,551-Speed 2625.49 samples/sec Loss 7.0908 LearningRate 0.0277 Epoch: 9 Global Step: 393260 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:55:04,459-Speed 2621.04 samples/sec Loss 7.1299 LearningRate 0.0277 Epoch: 9 Global Step: 393270 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:55:08,356-Speed 2628.07 samples/sec Loss 6.9785 LearningRate 0.0277 Epoch: 9 Global Step: 393280 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:55:12,270-Speed 2617.08 samples/sec Loss 6.9102 LearningRate 0.0277 Epoch: 9 Global Step: 393290 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:55:16,173-Speed 2624.27 samples/sec Loss 7.0250 LearningRate 0.0277 Epoch: 9 Global Step: 393300 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 15:55:20,077-Speed 2623.46 samples/sec Loss 7.0813 LearningRate 0.0277 Epoch: 9 Global Step: 393310 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 15:55:24,010-Speed 2604.14 samples/sec Loss 7.0259 LearningRate 0.0277 Epoch: 9 Global Step: 393320 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 15:55:27,916-Speed 2622.43 samples/sec Loss 7.0660 LearningRate 0.0277 Epoch: 9 Global Step: 393330 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 15:55:31,800-Speed 2636.88 samples/sec Loss 6.9748 LearningRate 0.0277 Epoch: 9 Global Step: 393340 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:55:35,715-Speed 2616.24 samples/sec Loss 7.0420 LearningRate 0.0277 Epoch: 9 Global Step: 393350 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:55:39,623-Speed 2621.16 samples/sec Loss 7.1480 LearningRate 0.0276 Epoch: 9 Global Step: 393360 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:55:43,524-Speed 2625.75 samples/sec Loss 7.0678 LearningRate 0.0276 Epoch: 9 Global Step: 393370 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:55:47,429-Speed 2623.10 samples/sec Loss 6.9692 LearningRate 0.0276 Epoch: 9 Global Step: 393380 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:55:51,330-Speed 2624.94 samples/sec Loss 7.0428 LearningRate 0.0276 Epoch: 9 Global Step: 393390 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:55:55,258-Speed 2608.10 samples/sec Loss 7.1098 LearningRate 0.0276 Epoch: 9 Global Step: 393400 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:55:59,161-Speed 2624.05 samples/sec Loss 7.0722 LearningRate 0.0276 Epoch: 9 Global Step: 393410 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:56:03,039-Speed 2641.31 samples/sec Loss 6.9827 LearningRate 0.0276 Epoch: 9 Global Step: 393420 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:56:06,945-Speed 2622.28 samples/sec Loss 7.3157 LearningRate 0.0276 Epoch: 9 Global Step: 393430 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:56:10,850-Speed 2622.95 samples/sec Loss 7.0745 LearningRate 0.0276 Epoch: 9 Global Step: 393440 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:56:14,762-Speed 2617.69 samples/sec Loss 7.1351 LearningRate 0.0276 Epoch: 9 Global Step: 393450 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:56:18,705-Speed 2598.08 samples/sec Loss 6.8002 LearningRate 0.0276 Epoch: 9 Global Step: 393460 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:56:22,611-Speed 2621.94 samples/sec Loss 7.0589 LearningRate 0.0276 Epoch: 9 Global Step: 393470 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:56:26,514-Speed 2625.28 samples/sec Loss 7.0575 LearningRate 0.0276 Epoch: 9 Global Step: 393480 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:56:30,414-Speed 2626.25 samples/sec Loss 7.1092 LearningRate 0.0276 Epoch: 9 Global Step: 393490 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:56:34,314-Speed 2625.80 samples/sec Loss 6.9685 LearningRate 0.0276 Epoch: 9 Global Step: 393500 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:56:38,221-Speed 2621.66 samples/sec Loss 7.1493 LearningRate 0.0276 Epoch: 9 Global Step: 393510 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 15:56:42,118-Speed 2628.69 samples/sec Loss 6.9696 LearningRate 0.0276 Epoch: 9 Global Step: 393520 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:56:46,022-Speed 2623.54 samples/sec Loss 7.0282 LearningRate 0.0276 Epoch: 9 Global Step: 393530 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:56:49,925-Speed 2624.50 samples/sec Loss 7.0509 LearningRate 0.0276 Epoch: 9 Global Step: 393540 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:56:53,828-Speed 2624.48 samples/sec Loss 6.9909 LearningRate 0.0276 Epoch: 9 Global Step: 393550 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:56:57,735-Speed 2621.79 samples/sec Loss 6.9371 LearningRate 0.0276 Epoch: 9 Global Step: 393560 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:57:01,635-Speed 2626.31 samples/sec Loss 7.0766 LearningRate 0.0276 Epoch: 9 Global Step: 393570 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:57:05,533-Speed 2627.91 samples/sec Loss 7.1094 LearningRate 0.0276 Epoch: 9 Global Step: 393580 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:57:09,427-Speed 2630.05 samples/sec Loss 7.0388 LearningRate 0.0276 Epoch: 9 Global Step: 393590 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:57:13,334-Speed 2621.87 samples/sec Loss 6.9974 LearningRate 0.0276 Epoch: 9 Global Step: 393600 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:57:17,246-Speed 2618.52 samples/sec Loss 7.0746 LearningRate 0.0276 Epoch: 9 Global Step: 393610 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:57:21,158-Speed 2617.76 samples/sec Loss 7.0969 LearningRate 0.0276 Epoch: 9 Global Step: 393620 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:57:25,059-Speed 2625.88 samples/sec Loss 7.1166 LearningRate 0.0276 Epoch: 9 Global Step: 393630 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:57:28,990-Speed 2606.02 samples/sec Loss 7.0808 LearningRate 0.0276 Epoch: 9 Global Step: 393640 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:57:32,905-Speed 2615.72 samples/sec Loss 7.1045 LearningRate 0.0276 Epoch: 9 Global Step: 393650 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:57:36,808-Speed 2624.35 samples/sec Loss 6.9269 LearningRate 0.0276 Epoch: 9 Global Step: 393660 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:57:40,713-Speed 2623.35 samples/sec Loss 6.9799 LearningRate 0.0276 Epoch: 9 Global Step: 393670 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:57:44,615-Speed 2624.40 samples/sec Loss 7.0459 LearningRate 0.0276 Epoch: 9 Global Step: 393680 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:57:48,520-Speed 2623.19 samples/sec Loss 6.9879 LearningRate 0.0276 Epoch: 9 Global Step: 393690 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:57:52,437-Speed 2615.32 samples/sec Loss 7.0241 LearningRate 0.0276 Epoch: 9 Global Step: 393700 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:57:56,346-Speed 2620.27 samples/sec Loss 7.0513 LearningRate 0.0276 Epoch: 9 Global Step: 393710 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:58:00,252-Speed 2621.93 samples/sec Loss 7.1231 LearningRate 0.0276 Epoch: 9 Global Step: 393720 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:58:04,158-Speed 2622.33 samples/sec Loss 6.9726 LearningRate 0.0276 Epoch: 9 Global Step: 393730 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:58:08,069-Speed 2618.79 samples/sec Loss 6.9464 LearningRate 0.0276 Epoch: 9 Global Step: 393740 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:58:11,971-Speed 2625.56 samples/sec Loss 7.1557 LearningRate 0.0276 Epoch: 9 Global Step: 393750 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:58:15,881-Speed 2619.60 samples/sec Loss 7.0430 LearningRate 0.0276 Epoch: 9 Global Step: 393760 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:58:19,778-Speed 2628.57 samples/sec Loss 6.9930 LearningRate 0.0276 Epoch: 9 Global Step: 393770 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:58:23,683-Speed 2622.98 samples/sec Loss 6.8240 LearningRate 0.0276 Epoch: 9 Global Step: 393780 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:58:27,594-Speed 2619.12 samples/sec Loss 7.0224 LearningRate 0.0276 Epoch: 9 Global Step: 393790 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:58:31,504-Speed 2619.35 samples/sec Loss 6.9794 LearningRate 0.0276 Epoch: 9 Global Step: 393800 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:58:35,408-Speed 2623.27 samples/sec Loss 7.1441 LearningRate 0.0276 Epoch: 9 Global Step: 393810 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:58:39,323-Speed 2616.41 samples/sec Loss 7.0917 LearningRate 0.0276 Epoch: 9 Global Step: 393820 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:58:43,244-Speed 2612.43 samples/sec Loss 7.0672 LearningRate 0.0276 Epoch: 9 Global Step: 393830 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:58:47,142-Speed 2627.63 samples/sec Loss 6.9678 LearningRate 0.0276 Epoch: 9 Global Step: 393840 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:58:51,045-Speed 2624.82 samples/sec Loss 6.9608 LearningRate 0.0276 Epoch: 9 Global Step: 393850 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:58:54,941-Speed 2628.80 samples/sec Loss 7.1068 LearningRate 0.0276 Epoch: 9 Global Step: 393860 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 15:58:58,847-Speed 2622.23 samples/sec Loss 6.9460 LearningRate 0.0276 Epoch: 9 Global Step: 393870 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 15:59:02,730-Speed 2637.77 samples/sec Loss 6.9147 LearningRate 0.0276 Epoch: 9 Global Step: 393880 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:59:06,622-Speed 2631.93 samples/sec Loss 7.1631 LearningRate 0.0276 Epoch: 9 Global Step: 393890 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:59:10,530-Speed 2621.13 samples/sec Loss 7.0716 LearningRate 0.0276 Epoch: 9 Global Step: 393900 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:59:14,456-Speed 2608.82 samples/sec Loss 7.1011 LearningRate 0.0276 Epoch: 9 Global Step: 393910 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:59:18,351-Speed 2629.94 samples/sec Loss 7.0339 LearningRate 0.0276 Epoch: 9 Global Step: 393920 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:59:22,253-Speed 2624.92 samples/sec Loss 6.9946 LearningRate 0.0276 Epoch: 9 Global Step: 393930 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:59:26,274-Speed 2547.44 samples/sec Loss 6.9777 LearningRate 0.0276 Epoch: 9 Global Step: 393940 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:59:30,173-Speed 2626.74 samples/sec Loss 7.1121 LearningRate 0.0276 Epoch: 9 Global Step: 393950 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:59:34,083-Speed 2619.34 samples/sec Loss 6.9674 LearningRate 0.0276 Epoch: 9 Global Step: 393960 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:59:37,978-Speed 2629.57 samples/sec Loss 7.0374 LearningRate 0.0276 Epoch: 9 Global Step: 393970 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:59:41,892-Speed 2616.66 samples/sec Loss 7.0514 LearningRate 0.0276 Epoch: 9 Global Step: 393980 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:59:45,789-Speed 2628.48 samples/sec Loss 7.0774 LearningRate 0.0276 Epoch: 9 Global Step: 393990 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:59:49,688-Speed 2626.88 samples/sec Loss 7.0687 LearningRate 0.0276 Epoch: 9 Global Step: 394000 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 15:59:53,565-Speed 2641.97 samples/sec Loss 6.9671 LearningRate 0.0276 Epoch: 9 Global Step: 394010 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 15:59:57,615-Speed 2529.49 samples/sec Loss 7.0210 LearningRate 0.0276 Epoch: 9 Global Step: 394020 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:00:01,512-Speed 2627.75 samples/sec Loss 7.1393 LearningRate 0.0276 Epoch: 9 Global Step: 394030 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:00:05,417-Speed 2623.00 samples/sec Loss 7.0392 LearningRate 0.0276 Epoch: 9 Global Step: 394040 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:00:09,316-Speed 2626.82 samples/sec Loss 7.0740 LearningRate 0.0276 Epoch: 9 Global Step: 394050 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:00:13,262-Speed 2596.30 samples/sec Loss 7.0473 LearningRate 0.0276 Epoch: 9 Global Step: 394060 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:00:17,177-Speed 2616.25 samples/sec Loss 6.9546 LearningRate 0.0276 Epoch: 9 Global Step: 394070 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:00:21,082-Speed 2622.44 samples/sec Loss 6.9473 LearningRate 0.0276 Epoch: 9 Global Step: 394080 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:00:25,005-Speed 2611.03 samples/sec Loss 6.9936 LearningRate 0.0276 Epoch: 9 Global Step: 394090 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:00:28,904-Speed 2626.86 samples/sec Loss 6.9443 LearningRate 0.0276 Epoch: 9 Global Step: 394100 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:00:32,806-Speed 2625.57 samples/sec Loss 7.0108 LearningRate 0.0276 Epoch: 9 Global Step: 394110 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:00:36,707-Speed 2625.23 samples/sec Loss 7.0266 LearningRate 0.0276 Epoch: 9 Global Step: 394120 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:00:40,607-Speed 2626.15 samples/sec Loss 7.1024 LearningRate 0.0276 Epoch: 9 Global Step: 394130 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:00:44,495-Speed 2635.13 samples/sec Loss 7.1117 LearningRate 0.0276 Epoch: 9 Global Step: 394140 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:00:48,468-Speed 2577.95 samples/sec Loss 7.0440 LearningRate 0.0275 Epoch: 9 Global Step: 394150 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:00:52,373-Speed 2623.13 samples/sec Loss 7.0618 LearningRate 0.0275 Epoch: 9 Global Step: 394160 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:00:56,275-Speed 2624.59 samples/sec Loss 7.1040 LearningRate 0.0275 Epoch: 9 Global Step: 394170 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:01:00,171-Speed 2629.23 samples/sec Loss 7.1194 LearningRate 0.0275 Epoch: 9 Global Step: 394180 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:01:04,076-Speed 2622.45 samples/sec Loss 6.9291 LearningRate 0.0275 Epoch: 9 Global Step: 394190 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:01:07,983-Speed 2621.93 samples/sec Loss 6.9612 LearningRate 0.0275 Epoch: 9 Global Step: 394200 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:01:11,881-Speed 2627.59 samples/sec Loss 7.0443 LearningRate 0.0275 Epoch: 9 Global Step: 394210 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:01:15,783-Speed 2625.27 samples/sec Loss 6.9595 LearningRate 0.0275 Epoch: 9 Global Step: 394220 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:01:19,680-Speed 2628.29 samples/sec Loss 7.0751 LearningRate 0.0275 Epoch: 9 Global Step: 394230 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:01:23,588-Speed 2620.60 samples/sec Loss 7.0611 LearningRate 0.0275 Epoch: 9 Global Step: 394240 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:01:27,496-Speed 2621.69 samples/sec Loss 7.0085 LearningRate 0.0275 Epoch: 9 Global Step: 394250 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:01:31,406-Speed 2619.00 samples/sec Loss 6.8939 LearningRate 0.0275 Epoch: 9 Global Step: 394260 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:01:35,329-Speed 2610.68 samples/sec Loss 7.0596 LearningRate 0.0275 Epoch: 9 Global Step: 394270 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:01:39,235-Speed 2621.90 samples/sec Loss 7.0047 LearningRate 0.0275 Epoch: 9 Global Step: 394280 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:01:43,136-Speed 2626.39 samples/sec Loss 7.0837 LearningRate 0.0275 Epoch: 9 Global Step: 394290 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:01:47,038-Speed 2624.29 samples/sec Loss 7.1049 LearningRate 0.0275 Epoch: 9 Global Step: 394300 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:01:50,965-Speed 2608.27 samples/sec Loss 7.0389 LearningRate 0.0275 Epoch: 9 Global Step: 394310 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:01:54,866-Speed 2625.93 samples/sec Loss 7.0721 LearningRate 0.0275 Epoch: 9 Global Step: 394320 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:01:58,768-Speed 2625.31 samples/sec Loss 7.0704 LearningRate 0.0275 Epoch: 9 Global Step: 394330 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:02:02,670-Speed 2624.79 samples/sec Loss 7.0989 LearningRate 0.0275 Epoch: 9 Global Step: 394340 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:02:06,569-Speed 2626.55 samples/sec Loss 7.2215 LearningRate 0.0275 Epoch: 9 Global Step: 394350 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:02:10,474-Speed 2623.07 samples/sec Loss 6.9758 LearningRate 0.0275 Epoch: 9 Global Step: 394360 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:02:14,371-Speed 2628.10 samples/sec Loss 7.0077 LearningRate 0.0275 Epoch: 9 Global Step: 394370 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:02:18,274-Speed 2624.95 samples/sec Loss 6.9631 LearningRate 0.0275 Epoch: 9 Global Step: 394380 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:02:22,172-Speed 2627.38 samples/sec Loss 6.9714 LearningRate 0.0275 Epoch: 9 Global Step: 394390 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:02:26,067-Speed 2630.29 samples/sec Loss 6.9751 LearningRate 0.0275 Epoch: 9 Global Step: 394400 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:02:29,976-Speed 2620.48 samples/sec Loss 7.0553 LearningRate 0.0275 Epoch: 9 Global Step: 394410 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:02:33,878-Speed 2624.95 samples/sec Loss 6.9846 LearningRate 0.0275 Epoch: 9 Global Step: 394420 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:02:37,774-Speed 2628.49 samples/sec Loss 7.1841 LearningRate 0.0275 Epoch: 9 Global Step: 394430 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:02:41,693-Speed 2613.53 samples/sec Loss 6.9823 LearningRate 0.0275 Epoch: 9 Global Step: 394440 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:02:45,590-Speed 2628.33 samples/sec Loss 7.0362 LearningRate 0.0275 Epoch: 9 Global Step: 394450 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:02:49,496-Speed 2622.62 samples/sec Loss 7.1675 LearningRate 0.0275 Epoch: 9 Global Step: 394460 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:02:53,394-Speed 2627.69 samples/sec Loss 7.0055 LearningRate 0.0275 Epoch: 9 Global Step: 394470 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:02:57,276-Speed 2639.51 samples/sec Loss 7.1672 LearningRate 0.0275 Epoch: 9 Global Step: 394480 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:03:01,174-Speed 2627.30 samples/sec Loss 6.9125 LearningRate 0.0275 Epoch: 9 Global Step: 394490 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:03:05,077-Speed 2623.94 samples/sec Loss 7.2203 LearningRate 0.0275 Epoch: 9 Global Step: 394500 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:03:08,976-Speed 2626.45 samples/sec Loss 6.7988 LearningRate 0.0275 Epoch: 9 Global Step: 394510 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:03:12,892-Speed 2615.94 samples/sec Loss 7.0323 LearningRate 0.0275 Epoch: 9 Global Step: 394520 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:03:16,802-Speed 2620.04 samples/sec Loss 6.9925 LearningRate 0.0275 Epoch: 9 Global Step: 394530 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:03:20,698-Speed 2630.53 samples/sec Loss 7.0466 LearningRate 0.0275 Epoch: 9 Global Step: 394540 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:03:24,612-Speed 2616.81 samples/sec Loss 7.0264 LearningRate 0.0275 Epoch: 9 Global Step: 394550 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:03:28,518-Speed 2622.58 samples/sec Loss 7.0548 LearningRate 0.0275 Epoch: 9 Global Step: 394560 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:03:32,456-Speed 2600.69 samples/sec Loss 7.0563 LearningRate 0.0275 Epoch: 9 Global Step: 394570 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:03:36,340-Speed 2637.11 samples/sec Loss 7.0951 LearningRate 0.0275 Epoch: 9 Global Step: 394580 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:03:40,240-Speed 2626.12 samples/sec Loss 7.1499 LearningRate 0.0275 Epoch: 9 Global Step: 394590 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:03:44,142-Speed 2625.68 samples/sec Loss 7.1001 LearningRate 0.0275 Epoch: 9 Global Step: 394600 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:03:48,042-Speed 2626.61 samples/sec Loss 6.9676 LearningRate 0.0275 Epoch: 9 Global Step: 394610 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:03:51,938-Speed 2628.57 samples/sec Loss 6.9270 LearningRate 0.0275 Epoch: 9 Global Step: 394620 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:03:55,834-Speed 2629.36 samples/sec Loss 6.9145 LearningRate 0.0275 Epoch: 9 Global Step: 394630 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:03:59,711-Speed 2642.22 samples/sec Loss 7.0186 LearningRate 0.0275 Epoch: 9 Global Step: 394640 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 16:04:03,620-Speed 2619.65 samples/sec Loss 6.9982 LearningRate 0.0275 Epoch: 9 Global Step: 394650 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 16:04:07,534-Speed 2616.69 samples/sec Loss 7.0068 LearningRate 0.0275 Epoch: 9 Global Step: 394660 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 16:04:11,626-Speed 2503.71 samples/sec Loss 7.0071 LearningRate 0.0275 Epoch: 9 Global Step: 394670 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 16:04:15,595-Speed 2580.96 samples/sec Loss 7.2205 LearningRate 0.0275 Epoch: 9 Global Step: 394680 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 16:04:19,598-Speed 2558.67 samples/sec Loss 7.1485 LearningRate 0.0275 Epoch: 9 Global Step: 394690 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 16:04:23,498-Speed 2626.28 samples/sec Loss 7.0313 LearningRate 0.0275 Epoch: 9 Global Step: 394700 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 16:04:27,404-Speed 2622.48 samples/sec Loss 7.1428 LearningRate 0.0275 Epoch: 9 Global Step: 394710 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 16:04:31,311-Speed 2621.77 samples/sec Loss 7.0597 LearningRate 0.0275 Epoch: 9 Global Step: 394720 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 16:04:35,210-Speed 2627.08 samples/sec Loss 7.0207 LearningRate 0.0275 Epoch: 9 Global Step: 394730 Fp16 Grad Scale: 32768 Required: 49 hours
Training: 2022-04-14 16:04:39,211-Speed 2560.03 samples/sec Loss 7.1555 LearningRate 0.0275 Epoch: 9 Global Step: 394740 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:04:43,114-Speed 2624.07 samples/sec Loss 6.9729 LearningRate 0.0275 Epoch: 9 Global Step: 394750 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:04:47,032-Speed 2614.34 samples/sec Loss 7.1676 LearningRate 0.0275 Epoch: 9 Global Step: 394760 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:04:50,997-Speed 2584.99 samples/sec Loss 7.2373 LearningRate 0.0275 Epoch: 9 Global Step: 394770 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:04:54,894-Speed 2628.56 samples/sec Loss 7.0809 LearningRate 0.0275 Epoch: 9 Global Step: 394780 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:04:58,797-Speed 2624.60 samples/sec Loss 7.1139 LearningRate 0.0275 Epoch: 9 Global Step: 394790 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:05:02,695-Speed 2627.82 samples/sec Loss 7.0969 LearningRate 0.0275 Epoch: 9 Global Step: 394800 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:05:06,617-Speed 2610.80 samples/sec Loss 7.1396 LearningRate 0.0275 Epoch: 9 Global Step: 394810 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:05:10,545-Speed 2607.34 samples/sec Loss 7.0325 LearningRate 0.0275 Epoch: 9 Global Step: 394820 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:05:14,453-Speed 2621.95 samples/sec Loss 6.9860 LearningRate 0.0275 Epoch: 9 Global Step: 394830 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:05:18,351-Speed 2627.39 samples/sec Loss 7.0051 LearningRate 0.0275 Epoch: 9 Global Step: 394840 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:05:22,246-Speed 2629.53 samples/sec Loss 6.9576 LearningRate 0.0275 Epoch: 9 Global Step: 394850 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:05:26,156-Speed 2619.97 samples/sec Loss 7.0830 LearningRate 0.0275 Epoch: 9 Global Step: 394860 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:05:30,057-Speed 2625.86 samples/sec Loss 6.9121 LearningRate 0.0275 Epoch: 9 Global Step: 394870 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:05:33,955-Speed 2627.49 samples/sec Loss 7.0723 LearningRate 0.0275 Epoch: 9 Global Step: 394880 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:05:37,859-Speed 2623.66 samples/sec Loss 6.9725 LearningRate 0.0275 Epoch: 9 Global Step: 394890 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:05:41,757-Speed 2627.16 samples/sec Loss 6.9040 LearningRate 0.0275 Epoch: 9 Global Step: 394900 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:05:45,652-Speed 2629.95 samples/sec Loss 7.0032 LearningRate 0.0275 Epoch: 9 Global Step: 394910 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:05:49,558-Speed 2622.35 samples/sec Loss 7.0149 LearningRate 0.0275 Epoch: 9 Global Step: 394920 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:05:53,453-Speed 2629.44 samples/sec Loss 7.0021 LearningRate 0.0275 Epoch: 9 Global Step: 394930 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:05:57,360-Speed 2622.22 samples/sec Loss 6.9952 LearningRate 0.0275 Epoch: 9 Global Step: 394940 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 16:06:01,256-Speed 2628.33 samples/sec Loss 7.0931 LearningRate 0.0274 Epoch: 9 Global Step: 394950 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 16:06:05,138-Speed 2638.52 samples/sec Loss 7.0546 LearningRate 0.0274 Epoch: 9 Global Step: 394960 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:06:09,120-Speed 2572.05 samples/sec Loss 7.1501 LearningRate 0.0274 Epoch: 9 Global Step: 394970 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:06:13,047-Speed 2608.80 samples/sec Loss 7.0990 LearningRate 0.0274 Epoch: 9 Global Step: 394980 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:06:16,948-Speed 2625.45 samples/sec Loss 7.0997 LearningRate 0.0274 Epoch: 9 Global Step: 394990 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:06:20,844-Speed 2628.55 samples/sec Loss 7.1139 LearningRate 0.0274 Epoch: 9 Global Step: 395000 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:06:24,778-Speed 2603.59 samples/sec Loss 7.0128 LearningRate 0.0274 Epoch: 9 Global Step: 395010 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:06:28,868-Speed 2505.15 samples/sec Loss 7.0486 LearningRate 0.0274 Epoch: 9 Global Step: 395020 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:06:32,766-Speed 2627.43 samples/sec Loss 7.0819 LearningRate 0.0274 Epoch: 9 Global Step: 395030 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:06:36,680-Speed 2616.34 samples/sec Loss 7.0679 LearningRate 0.0274 Epoch: 9 Global Step: 395040 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:06:40,575-Speed 2630.24 samples/sec Loss 7.1995 LearningRate 0.0274 Epoch: 9 Global Step: 395050 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:06:44,470-Speed 2629.59 samples/sec Loss 6.9623 LearningRate 0.0274 Epoch: 9 Global Step: 395060 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:06:48,369-Speed 2627.31 samples/sec Loss 6.9913 LearningRate 0.0274 Epoch: 9 Global Step: 395070 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:06:52,265-Speed 2629.43 samples/sec Loss 6.9484 LearningRate 0.0274 Epoch: 9 Global Step: 395080 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:06:56,166-Speed 2625.14 samples/sec Loss 7.1153 LearningRate 0.0274 Epoch: 9 Global Step: 395090 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:07:00,254-Speed 2505.56 samples/sec Loss 6.9821 LearningRate 0.0274 Epoch: 9 Global Step: 395100 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:07:04,165-Speed 2618.50 samples/sec Loss 7.0593 LearningRate 0.0274 Epoch: 9 Global Step: 395110 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:07:08,061-Speed 2629.73 samples/sec Loss 7.0362 LearningRate 0.0274 Epoch: 9 Global Step: 395120 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:07:11,948-Speed 2634.67 samples/sec Loss 6.9090 LearningRate 0.0274 Epoch: 9 Global Step: 395130 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:07:15,854-Speed 2622.02 samples/sec Loss 6.9571 LearningRate 0.0274 Epoch: 9 Global Step: 395140 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:07:19,776-Speed 2611.52 samples/sec Loss 6.9460 LearningRate 0.0274 Epoch: 9 Global Step: 395150 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:07:23,696-Speed 2613.29 samples/sec Loss 7.0912 LearningRate 0.0274 Epoch: 9 Global Step: 395160 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:07:27,575-Speed 2640.37 samples/sec Loss 7.1122 LearningRate 0.0274 Epoch: 9 Global Step: 395170 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:07:31,555-Speed 2573.48 samples/sec Loss 7.0404 LearningRate 0.0274 Epoch: 9 Global Step: 395180 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:07:35,452-Speed 2628.03 samples/sec Loss 7.0043 LearningRate 0.0274 Epoch: 9 Global Step: 395190 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:07:39,353-Speed 2625.61 samples/sec Loss 7.0903 LearningRate 0.0274 Epoch: 9 Global Step: 395200 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:07:43,259-Speed 2622.29 samples/sec Loss 7.1902 LearningRate 0.0274 Epoch: 9 Global Step: 395210 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:07:47,161-Speed 2624.52 samples/sec Loss 6.9832 LearningRate 0.0274 Epoch: 9 Global Step: 395220 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:07:51,143-Speed 2572.61 samples/sec Loss 6.9333 LearningRate 0.0274 Epoch: 9 Global Step: 395230 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:07:55,127-Speed 2570.81 samples/sec Loss 6.9537 LearningRate 0.0274 Epoch: 9 Global Step: 395240 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:07:59,046-Speed 2613.84 samples/sec Loss 6.8885 LearningRate 0.0274 Epoch: 9 Global Step: 395250 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:08:02,940-Speed 2630.12 samples/sec Loss 6.9926 LearningRate 0.0274 Epoch: 9 Global Step: 395260 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:08:06,837-Speed 2627.87 samples/sec Loss 7.1411 LearningRate 0.0274 Epoch: 9 Global Step: 395270 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:08:10,735-Speed 2627.36 samples/sec Loss 7.1072 LearningRate 0.0274 Epoch: 9 Global Step: 395280 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:08:14,631-Speed 2629.06 samples/sec Loss 7.0189 LearningRate 0.0274 Epoch: 9 Global Step: 395290 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:08:18,559-Speed 2608.97 samples/sec Loss 6.9936 LearningRate 0.0274 Epoch: 9 Global Step: 395300 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:08:22,456-Speed 2628.13 samples/sec Loss 6.9223 LearningRate 0.0274 Epoch: 9 Global Step: 395310 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:08:26,356-Speed 2627.12 samples/sec Loss 7.1200 LearningRate 0.0274 Epoch: 9 Global Step: 395320 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:08:30,242-Speed 2636.02 samples/sec Loss 7.1370 LearningRate 0.0274 Epoch: 9 Global Step: 395330 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:08:34,140-Speed 2627.27 samples/sec Loss 6.9752 LearningRate 0.0274 Epoch: 9 Global Step: 395340 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:08:38,038-Speed 2627.44 samples/sec Loss 7.0435 LearningRate 0.0274 Epoch: 9 Global Step: 395350 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:08:41,935-Speed 2628.37 samples/sec Loss 6.9406 LearningRate 0.0274 Epoch: 9 Global Step: 395360 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:08:45,836-Speed 2625.74 samples/sec Loss 6.9676 LearningRate 0.0274 Epoch: 9 Global Step: 395370 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:08:49,733-Speed 2628.38 samples/sec Loss 7.0204 LearningRate 0.0274 Epoch: 9 Global Step: 395380 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:08:53,631-Speed 2627.93 samples/sec Loss 6.9798 LearningRate 0.0274 Epoch: 9 Global Step: 395390 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:08:57,540-Speed 2620.08 samples/sec Loss 7.1201 LearningRate 0.0274 Epoch: 9 Global Step: 395400 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:09:01,441-Speed 2625.58 samples/sec Loss 7.0379 LearningRate 0.0274 Epoch: 9 Global Step: 395410 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:09:05,340-Speed 2626.69 samples/sec Loss 6.9860 LearningRate 0.0274 Epoch: 9 Global Step: 395420 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:09:09,236-Speed 2628.55 samples/sec Loss 7.0009 LearningRate 0.0274 Epoch: 9 Global Step: 395430 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:09:13,118-Speed 2638.92 samples/sec Loss 6.9243 LearningRate 0.0274 Epoch: 9 Global Step: 395440 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:09:17,031-Speed 2617.61 samples/sec Loss 6.9399 LearningRate 0.0274 Epoch: 9 Global Step: 395450 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:09:20,929-Speed 2627.65 samples/sec Loss 6.9613 LearningRate 0.0274 Epoch: 9 Global Step: 395460 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:09:24,827-Speed 2627.98 samples/sec Loss 7.0573 LearningRate 0.0274 Epoch: 9 Global Step: 395470 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:09:28,752-Speed 2609.68 samples/sec Loss 7.0007 LearningRate 0.0274 Epoch: 9 Global Step: 395480 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:09:32,664-Speed 2618.09 samples/sec Loss 7.0285 LearningRate 0.0274 Epoch: 9 Global Step: 395490 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:09:36,566-Speed 2624.87 samples/sec Loss 7.0695 LearningRate 0.0274 Epoch: 9 Global Step: 395500 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:09:40,467-Speed 2625.58 samples/sec Loss 6.9925 LearningRate 0.0274 Epoch: 9 Global Step: 395510 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:09:44,361-Speed 2630.66 samples/sec Loss 7.0363 LearningRate 0.0274 Epoch: 9 Global Step: 395520 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:09:48,259-Speed 2628.33 samples/sec Loss 7.0386 LearningRate 0.0274 Epoch: 9 Global Step: 395530 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:09:52,156-Speed 2628.02 samples/sec Loss 7.0028 LearningRate 0.0274 Epoch: 9 Global Step: 395540 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:09:56,051-Speed 2630.36 samples/sec Loss 7.0741 LearningRate 0.0274 Epoch: 9 Global Step: 395550 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:09:59,975-Speed 2609.71 samples/sec Loss 7.0807 LearningRate 0.0274 Epoch: 9 Global Step: 395560 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:10:03,876-Speed 2625.98 samples/sec Loss 6.9469 LearningRate 0.0274 Epoch: 9 Global Step: 395570 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:10:07,775-Speed 2626.32 samples/sec Loss 6.9340 LearningRate 0.0274 Epoch: 9 Global Step: 395580 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:10:11,675-Speed 2626.65 samples/sec Loss 7.0846 LearningRate 0.0274 Epoch: 9 Global Step: 395590 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:10:15,575-Speed 2625.98 samples/sec Loss 7.0558 LearningRate 0.0274 Epoch: 9 Global Step: 395600 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:10:19,474-Speed 2626.94 samples/sec Loss 6.9497 LearningRate 0.0274 Epoch: 9 Global Step: 395610 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:10:23,374-Speed 2626.40 samples/sec Loss 6.8740 LearningRate 0.0274 Epoch: 9 Global Step: 395620 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:10:27,249-Speed 2642.93 samples/sec Loss 6.9859 LearningRate 0.0274 Epoch: 9 Global Step: 395630 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:10:31,160-Speed 2619.31 samples/sec Loss 7.0725 LearningRate 0.0274 Epoch: 9 Global Step: 395640 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:10:35,063-Speed 2624.11 samples/sec Loss 7.0234 LearningRate 0.0274 Epoch: 9 Global Step: 395650 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:10:38,963-Speed 2626.01 samples/sec Loss 7.0939 LearningRate 0.0274 Epoch: 9 Global Step: 395660 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:10:42,861-Speed 2627.75 samples/sec Loss 7.1329 LearningRate 0.0274 Epoch: 9 Global Step: 395670 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:10:46,763-Speed 2625.05 samples/sec Loss 7.0038 LearningRate 0.0274 Epoch: 9 Global Step: 395680 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:10:50,658-Speed 2629.39 samples/sec Loss 7.0757 LearningRate 0.0274 Epoch: 9 Global Step: 395690 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:10:54,554-Speed 2629.16 samples/sec Loss 6.9321 LearningRate 0.0274 Epoch: 9 Global Step: 395700 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:10:58,457-Speed 2624.29 samples/sec Loss 7.1787 LearningRate 0.0274 Epoch: 9 Global Step: 395710 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:11:02,356-Speed 2626.90 samples/sec Loss 7.0504 LearningRate 0.0274 Epoch: 9 Global Step: 395720 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:11:06,259-Speed 2624.14 samples/sec Loss 6.9649 LearningRate 0.0274 Epoch: 9 Global Step: 395730 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:11:10,159-Speed 2626.63 samples/sec Loss 6.8575 LearningRate 0.0273 Epoch: 9 Global Step: 395740 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:11:14,053-Speed 2629.79 samples/sec Loss 6.9892 LearningRate 0.0273 Epoch: 9 Global Step: 395750 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:11:17,935-Speed 2639.10 samples/sec Loss 7.1090 LearningRate 0.0273 Epoch: 9 Global Step: 395760 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:11:21,841-Speed 2621.52 samples/sec Loss 6.9995 LearningRate 0.0273 Epoch: 9 Global Step: 395770 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:11:25,752-Speed 2619.68 samples/sec Loss 7.0709 LearningRate 0.0273 Epoch: 9 Global Step: 395780 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:11:29,669-Speed 2614.24 samples/sec Loss 6.9445 LearningRate 0.0273 Epoch: 9 Global Step: 395790 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:11:33,576-Speed 2622.00 samples/sec Loss 7.1263 LearningRate 0.0273 Epoch: 9 Global Step: 395800 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:11:37,473-Speed 2628.43 samples/sec Loss 7.1003 LearningRate 0.0273 Epoch: 9 Global Step: 395810 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:11:41,391-Speed 2614.24 samples/sec Loss 7.1099 LearningRate 0.0273 Epoch: 9 Global Step: 395820 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:11:45,312-Speed 2612.17 samples/sec Loss 7.0340 LearningRate 0.0273 Epoch: 9 Global Step: 395830 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:11:49,210-Speed 2628.11 samples/sec Loss 6.9711 LearningRate 0.0273 Epoch: 9 Global Step: 395840 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:11:53,104-Speed 2629.64 samples/sec Loss 6.9884 LearningRate 0.0273 Epoch: 9 Global Step: 395850 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:11:57,015-Speed 2619.55 samples/sec Loss 6.9702 LearningRate 0.0273 Epoch: 9 Global Step: 395860 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:12:00,919-Speed 2622.86 samples/sec Loss 7.1906 LearningRate 0.0273 Epoch: 9 Global Step: 395870 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:12:04,828-Speed 2620.64 samples/sec Loss 7.0477 LearningRate 0.0273 Epoch: 9 Global Step: 395880 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:12:08,721-Speed 2631.40 samples/sec Loss 6.9350 LearningRate 0.0273 Epoch: 9 Global Step: 395890 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:12:12,645-Speed 2610.02 samples/sec Loss 7.1542 LearningRate 0.0273 Epoch: 9 Global Step: 395900 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:12:16,576-Speed 2605.58 samples/sec Loss 7.1134 LearningRate 0.0273 Epoch: 9 Global Step: 395910 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:12:20,472-Speed 2628.55 samples/sec Loss 7.0974 LearningRate 0.0273 Epoch: 9 Global Step: 395920 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:12:24,374-Speed 2625.63 samples/sec Loss 7.0009 LearningRate 0.0273 Epoch: 9 Global Step: 395930 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:12:28,310-Speed 2602.19 samples/sec Loss 7.0672 LearningRate 0.0273 Epoch: 9 Global Step: 395940 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:12:32,214-Speed 2623.33 samples/sec Loss 7.0443 LearningRate 0.0273 Epoch: 9 Global Step: 395950 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:12:36,099-Speed 2636.47 samples/sec Loss 7.1131 LearningRate 0.0273 Epoch: 9 Global Step: 395960 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:12:39,975-Speed 2643.16 samples/sec Loss 7.0913 LearningRate 0.0273 Epoch: 9 Global Step: 395970 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:12:43,923-Speed 2594.36 samples/sec Loss 7.0471 LearningRate 0.0273 Epoch: 9 Global Step: 395980 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:12:47,824-Speed 2625.67 samples/sec Loss 7.1199 LearningRate 0.0273 Epoch: 9 Global Step: 395990 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:12:51,721-Speed 2628.58 samples/sec Loss 6.9295 LearningRate 0.0273 Epoch: 9 Global Step: 396000 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:12:55,626-Speed 2622.56 samples/sec Loss 6.9649 LearningRate 0.0273 Epoch: 9 Global Step: 396010 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:12:59,527-Speed 2625.37 samples/sec Loss 7.0466 LearningRate 0.0273 Epoch: 9 Global Step: 396020 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:13:03,425-Speed 2628.06 samples/sec Loss 7.0061 LearningRate 0.0273 Epoch: 9 Global Step: 396030 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:13:07,319-Speed 2630.47 samples/sec Loss 6.9410 LearningRate 0.0273 Epoch: 9 Global Step: 396040 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:13:11,217-Speed 2627.52 samples/sec Loss 7.0842 LearningRate 0.0273 Epoch: 9 Global Step: 396050 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:13:15,120-Speed 2624.04 samples/sec Loss 6.9006 LearningRate 0.0273 Epoch: 9 Global Step: 396060 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:13:19,032-Speed 2619.10 samples/sec Loss 6.9475 LearningRate 0.0273 Epoch: 9 Global Step: 396070 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:13:22,925-Speed 2631.02 samples/sec Loss 6.9466 LearningRate 0.0273 Epoch: 9 Global Step: 396080 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:13:26,820-Speed 2629.87 samples/sec Loss 7.0210 LearningRate 0.0273 Epoch: 9 Global Step: 396090 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:13:30,711-Speed 2631.80 samples/sec Loss 7.0562 LearningRate 0.0273 Epoch: 9 Global Step: 396100 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:13:34,611-Speed 2626.33 samples/sec Loss 7.0061 LearningRate 0.0273 Epoch: 9 Global Step: 396110 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:13:38,505-Speed 2630.36 samples/sec Loss 6.9666 LearningRate 0.0273 Epoch: 9 Global Step: 396120 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:13:42,399-Speed 2630.69 samples/sec Loss 6.9939 LearningRate 0.0273 Epoch: 9 Global Step: 396130 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:13:46,293-Speed 2630.14 samples/sec Loss 7.1314 LearningRate 0.0273 Epoch: 9 Global Step: 396140 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:13:50,200-Speed 2621.49 samples/sec Loss 7.1777 LearningRate 0.0273 Epoch: 9 Global Step: 396150 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:13:54,094-Speed 2630.20 samples/sec Loss 6.8987 LearningRate 0.0273 Epoch: 9 Global Step: 396160 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:13:57,989-Speed 2630.16 samples/sec Loss 7.0502 LearningRate 0.0273 Epoch: 9 Global Step: 396170 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:14:01,883-Speed 2630.42 samples/sec Loss 7.1092 LearningRate 0.0273 Epoch: 9 Global Step: 396180 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:14:05,791-Speed 2620.60 samples/sec Loss 7.1840 LearningRate 0.0273 Epoch: 9 Global Step: 396190 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:14:09,680-Speed 2633.29 samples/sec Loss 7.1115 LearningRate 0.0273 Epoch: 9 Global Step: 396200 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:14:13,599-Speed 2613.96 samples/sec Loss 7.0528 LearningRate 0.0273 Epoch: 9 Global Step: 396210 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:14:17,493-Speed 2629.66 samples/sec Loss 7.0022 LearningRate 0.0273 Epoch: 9 Global Step: 396220 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:14:21,384-Speed 2633.49 samples/sec Loss 6.9587 LearningRate 0.0273 Epoch: 9 Global Step: 396230 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:14:25,278-Speed 2630.29 samples/sec Loss 6.9874 LearningRate 0.0273 Epoch: 9 Global Step: 396240 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:14:29,175-Speed 2628.48 samples/sec Loss 6.9398 LearningRate 0.0273 Epoch: 9 Global Step: 396250 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:14:33,071-Speed 2629.10 samples/sec Loss 7.0111 LearningRate 0.0273 Epoch: 9 Global Step: 396260 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:14:36,969-Speed 2627.29 samples/sec Loss 6.9267 LearningRate 0.0273 Epoch: 9 Global Step: 396270 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:14:40,849-Speed 2639.47 samples/sec Loss 7.0391 LearningRate 0.0273 Epoch: 9 Global Step: 396280 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:14:44,749-Speed 2626.53 samples/sec Loss 7.1371 LearningRate 0.0273 Epoch: 9 Global Step: 396290 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:14:48,645-Speed 2628.70 samples/sec Loss 6.9290 LearningRate 0.0273 Epoch: 9 Global Step: 396300 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:14:52,546-Speed 2625.71 samples/sec Loss 7.0614 LearningRate 0.0273 Epoch: 9 Global Step: 396310 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:14:56,441-Speed 2629.41 samples/sec Loss 6.9076 LearningRate 0.0273 Epoch: 9 Global Step: 396320 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:15:00,342-Speed 2625.95 samples/sec Loss 6.8840 LearningRate 0.0273 Epoch: 9 Global Step: 396330 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:15:04,234-Speed 2631.07 samples/sec Loss 6.9903 LearningRate 0.0273 Epoch: 9 Global Step: 396340 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:15:08,134-Speed 2626.42 samples/sec Loss 6.9575 LearningRate 0.0273 Epoch: 9 Global Step: 396350 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:15:12,031-Speed 2628.29 samples/sec Loss 7.0043 LearningRate 0.0273 Epoch: 9 Global Step: 396360 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:15:15,933-Speed 2624.80 samples/sec Loss 7.1997 LearningRate 0.0273 Epoch: 9 Global Step: 396370 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:15:19,827-Speed 2630.55 samples/sec Loss 6.9589 LearningRate 0.0273 Epoch: 9 Global Step: 396380 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:15:23,722-Speed 2629.04 samples/sec Loss 7.0235 LearningRate 0.0273 Epoch: 9 Global Step: 396390 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:15:27,650-Speed 2608.16 samples/sec Loss 6.8821 LearningRate 0.0273 Epoch: 9 Global Step: 396400 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:15:31,543-Speed 2630.53 samples/sec Loss 7.0311 LearningRate 0.0273 Epoch: 9 Global Step: 396410 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:15:35,437-Speed 2630.04 samples/sec Loss 7.0107 LearningRate 0.0273 Epoch: 9 Global Step: 396420 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:15:39,331-Speed 2630.81 samples/sec Loss 7.0301 LearningRate 0.0273 Epoch: 9 Global Step: 396430 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:15:43,223-Speed 2632.36 samples/sec Loss 6.9728 LearningRate 0.0273 Epoch: 9 Global Step: 396440 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:15:47,118-Speed 2629.28 samples/sec Loss 7.0262 LearningRate 0.0273 Epoch: 9 Global Step: 396450 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:15:51,016-Speed 2627.72 samples/sec Loss 7.0055 LearningRate 0.0273 Epoch: 9 Global Step: 396460 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:15:54,927-Speed 2618.77 samples/sec Loss 7.0105 LearningRate 0.0273 Epoch: 9 Global Step: 396470 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:15:58,811-Speed 2636.92 samples/sec Loss 7.0568 LearningRate 0.0273 Epoch: 9 Global Step: 396480 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:16:02,705-Speed 2630.52 samples/sec Loss 6.8573 LearningRate 0.0273 Epoch: 9 Global Step: 396490 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:16:06,603-Speed 2627.19 samples/sec Loss 7.0069 LearningRate 0.0273 Epoch: 9 Global Step: 396500 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:16:10,500-Speed 2628.42 samples/sec Loss 7.0983 LearningRate 0.0273 Epoch: 9 Global Step: 396510 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:16:14,398-Speed 2627.34 samples/sec Loss 7.0678 LearningRate 0.0273 Epoch: 9 Global Step: 396520 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:16:18,322-Speed 2610.29 samples/sec Loss 6.9242 LearningRate 0.0272 Epoch: 9 Global Step: 396530 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:16:22,224-Speed 2624.67 samples/sec Loss 7.0352 LearningRate 0.0272 Epoch: 9 Global Step: 396540 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:16:26,119-Speed 2629.90 samples/sec Loss 6.9130 LearningRate 0.0272 Epoch: 9 Global Step: 396550 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:16:30,015-Speed 2628.91 samples/sec Loss 6.9822 LearningRate 0.0272 Epoch: 9 Global Step: 396560 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:16:33,903-Speed 2634.31 samples/sec Loss 7.1471 LearningRate 0.0272 Epoch: 9 Global Step: 396570 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:16:37,805-Speed 2624.90 samples/sec Loss 7.0371 LearningRate 0.0272 Epoch: 9 Global Step: 396580 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:16:41,705-Speed 2626.37 samples/sec Loss 6.9478 LearningRate 0.0272 Epoch: 9 Global Step: 396590 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:16:45,610-Speed 2622.13 samples/sec Loss 7.0253 LearningRate 0.0272 Epoch: 9 Global Step: 396600 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:16:49,513-Speed 2625.28 samples/sec Loss 7.0234 LearningRate 0.0272 Epoch: 9 Global Step: 396610 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:16:53,425-Speed 2617.82 samples/sec Loss 6.9998 LearningRate 0.0272 Epoch: 9 Global Step: 396620 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:16:57,326-Speed 2626.26 samples/sec Loss 7.0329 LearningRate 0.0272 Epoch: 9 Global Step: 396630 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:17:01,228-Speed 2624.71 samples/sec Loss 7.0084 LearningRate 0.0272 Epoch: 9 Global Step: 396640 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:17:05,124-Speed 2629.24 samples/sec Loss 6.9638 LearningRate 0.0272 Epoch: 9 Global Step: 396650 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:17:09,022-Speed 2627.84 samples/sec Loss 6.9861 LearningRate 0.0272 Epoch: 9 Global Step: 396660 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:17:12,927-Speed 2622.49 samples/sec Loss 6.9393 LearningRate 0.0272 Epoch: 9 Global Step: 396670 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:17:16,827-Speed 2625.76 samples/sec Loss 7.0119 LearningRate 0.0272 Epoch: 9 Global Step: 396680 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:17:20,747-Speed 2613.35 samples/sec Loss 6.9836 LearningRate 0.0272 Epoch: 9 Global Step: 396690 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:17:24,640-Speed 2631.25 samples/sec Loss 7.0970 LearningRate 0.0272 Epoch: 9 Global Step: 396700 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:17:28,542-Speed 2624.52 samples/sec Loss 6.9620 LearningRate 0.0272 Epoch: 9 Global Step: 396710 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:17:32,454-Speed 2617.98 samples/sec Loss 6.9054 LearningRate 0.0272 Epoch: 9 Global Step: 396720 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:17:36,351-Speed 2628.53 samples/sec Loss 7.0126 LearningRate 0.0272 Epoch: 9 Global Step: 396730 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:17:40,252-Speed 2625.46 samples/sec Loss 7.0725 LearningRate 0.0272 Epoch: 9 Global Step: 396740 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:17:44,152-Speed 2626.65 samples/sec Loss 6.9986 LearningRate 0.0272 Epoch: 9 Global Step: 396750 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:17:48,045-Speed 2630.89 samples/sec Loss 7.1389 LearningRate 0.0272 Epoch: 9 Global Step: 396760 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:17:51,924-Speed 2639.97 samples/sec Loss 7.0701 LearningRate 0.0272 Epoch: 9 Global Step: 396770 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:17:55,817-Speed 2631.33 samples/sec Loss 6.9835 LearningRate 0.0272 Epoch: 9 Global Step: 396780 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:17:59,716-Speed 2626.90 samples/sec Loss 6.9185 LearningRate 0.0272 Epoch: 9 Global Step: 396790 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:18:03,615-Speed 2627.01 samples/sec Loss 6.9109 LearningRate 0.0272 Epoch: 9 Global Step: 396800 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:18:07,528-Speed 2617.12 samples/sec Loss 6.9887 LearningRate 0.0272 Epoch: 9 Global Step: 396810 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:18:11,424-Speed 2628.55 samples/sec Loss 6.8265 LearningRate 0.0272 Epoch: 9 Global Step: 396820 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:18:15,322-Speed 2628.28 samples/sec Loss 6.9954 LearningRate 0.0272 Epoch: 9 Global Step: 396830 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:18:19,221-Speed 2626.89 samples/sec Loss 7.0522 LearningRate 0.0272 Epoch: 9 Global Step: 396840 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:18:23,116-Speed 2629.16 samples/sec Loss 6.9042 LearningRate 0.0272 Epoch: 9 Global Step: 396850 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:18:27,007-Speed 2632.58 samples/sec Loss 7.0830 LearningRate 0.0272 Epoch: 9 Global Step: 396860 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:18:30,893-Speed 2635.94 samples/sec Loss 7.0352 LearningRate 0.0272 Epoch: 9 Global Step: 396870 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 16:18:34,792-Speed 2626.44 samples/sec Loss 6.9889 LearningRate 0.0272 Epoch: 9 Global Step: 396880 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 16:18:38,686-Speed 2630.70 samples/sec Loss 7.0311 LearningRate 0.0272 Epoch: 9 Global Step: 396890 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:18:42,588-Speed 2624.36 samples/sec Loss 7.0718 LearningRate 0.0272 Epoch: 9 Global Step: 396900 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:18:46,486-Speed 2628.15 samples/sec Loss 7.0867 LearningRate 0.0272 Epoch: 9 Global Step: 396910 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:18:50,382-Speed 2628.66 samples/sec Loss 6.9943 LearningRate 0.0272 Epoch: 9 Global Step: 396920 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:18:54,279-Speed 2628.55 samples/sec Loss 6.9996 LearningRate 0.0272 Epoch: 9 Global Step: 396930 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:18:58,181-Speed 2624.86 samples/sec Loss 7.1088 LearningRate 0.0272 Epoch: 9 Global Step: 396940 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:19:02,092-Speed 2618.69 samples/sec Loss 6.9633 LearningRate 0.0272 Epoch: 9 Global Step: 396950 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:19:05,994-Speed 2624.99 samples/sec Loss 6.9923 LearningRate 0.0272 Epoch: 9 Global Step: 396960 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:19:09,953-Speed 2587.04 samples/sec Loss 6.9972 LearningRate 0.0272 Epoch: 9 Global Step: 396970 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:19:13,850-Speed 2627.78 samples/sec Loss 6.9979 LearningRate 0.0272 Epoch: 9 Global Step: 396980 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:19:17,740-Speed 2633.41 samples/sec Loss 6.9114 LearningRate 0.0272 Epoch: 9 Global Step: 396990 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:19:21,634-Speed 2630.30 samples/sec Loss 6.9574 LearningRate 0.0272 Epoch: 9 Global Step: 397000 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:19:25,537-Speed 2623.73 samples/sec Loss 6.9118 LearningRate 0.0272 Epoch: 9 Global Step: 397010 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:19:29,464-Speed 2608.66 samples/sec Loss 6.9382 LearningRate 0.0272 Epoch: 9 Global Step: 397020 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:19:33,353-Speed 2633.55 samples/sec Loss 6.8885 LearningRate 0.0272 Epoch: 9 Global Step: 397030 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:19:37,257-Speed 2623.65 samples/sec Loss 6.9804 LearningRate 0.0272 Epoch: 9 Global Step: 397040 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:19:41,137-Speed 2639.28 samples/sec Loss 7.1183 LearningRate 0.0272 Epoch: 9 Global Step: 397050 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:19:45,095-Speed 2588.29 samples/sec Loss 6.9484 LearningRate 0.0272 Epoch: 9 Global Step: 397060 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:19:48,995-Speed 2625.90 samples/sec Loss 6.9961 LearningRate 0.0272 Epoch: 9 Global Step: 397070 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:19:52,888-Speed 2631.17 samples/sec Loss 7.0696 LearningRate 0.0272 Epoch: 9 Global Step: 397080 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:19:56,786-Speed 2627.37 samples/sec Loss 6.8866 LearningRate 0.0272 Epoch: 9 Global Step: 397090 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:20:00,680-Speed 2630.42 samples/sec Loss 6.9774 LearningRate 0.0272 Epoch: 9 Global Step: 397100 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:20:04,583-Speed 2624.14 samples/sec Loss 7.0390 LearningRate 0.0272 Epoch: 9 Global Step: 397110 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:20:08,482-Speed 2626.89 samples/sec Loss 6.9689 LearningRate 0.0272 Epoch: 9 Global Step: 397120 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:20:12,379-Speed 2628.45 samples/sec Loss 6.8014 LearningRate 0.0272 Epoch: 9 Global Step: 397130 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:20:16,272-Speed 2630.41 samples/sec Loss 6.9489 LearningRate 0.0272 Epoch: 9 Global Step: 397140 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:20:20,170-Speed 2627.63 samples/sec Loss 6.9924 LearningRate 0.0272 Epoch: 9 Global Step: 397150 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:20:24,069-Speed 2626.84 samples/sec Loss 7.0244 LearningRate 0.0272 Epoch: 9 Global Step: 397160 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:20:27,966-Speed 2628.73 samples/sec Loss 7.0092 LearningRate 0.0272 Epoch: 9 Global Step: 397170 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:20:31,871-Speed 2622.45 samples/sec Loss 6.9002 LearningRate 0.0272 Epoch: 9 Global Step: 397180 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:20:35,749-Speed 2640.93 samples/sec Loss 6.9843 LearningRate 0.0272 Epoch: 9 Global Step: 397190 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:20:39,659-Speed 2619.64 samples/sec Loss 6.9518 LearningRate 0.0272 Epoch: 9 Global Step: 397200 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:20:43,554-Speed 2629.59 samples/sec Loss 6.9969 LearningRate 0.0272 Epoch: 9 Global Step: 397210 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:20:47,455-Speed 2625.64 samples/sec Loss 6.9376 LearningRate 0.0272 Epoch: 9 Global Step: 397220 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:20:51,352-Speed 2628.39 samples/sec Loss 7.0016 LearningRate 0.0272 Epoch: 9 Global Step: 397230 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:20:55,253-Speed 2625.41 samples/sec Loss 6.8669 LearningRate 0.0272 Epoch: 9 Global Step: 397240 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:20:59,149-Speed 2629.50 samples/sec Loss 6.9157 LearningRate 0.0272 Epoch: 9 Global Step: 397250 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:21:03,042-Speed 2630.77 samples/sec Loss 7.0380 LearningRate 0.0272 Epoch: 9 Global Step: 397260 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:21:06,935-Speed 2630.52 samples/sec Loss 6.9234 LearningRate 0.0272 Epoch: 9 Global Step: 397270 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:21:10,827-Speed 2631.40 samples/sec Loss 6.9160 LearningRate 0.0272 Epoch: 9 Global Step: 397280 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:21:14,721-Speed 2630.40 samples/sec Loss 7.0379 LearningRate 0.0272 Epoch: 9 Global Step: 397290 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:21:18,621-Speed 2626.10 samples/sec Loss 6.9722 LearningRate 0.0272 Epoch: 9 Global Step: 397300 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:21:22,521-Speed 2626.59 samples/sec Loss 6.9146 LearningRate 0.0272 Epoch: 9 Global Step: 397310 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:21:26,422-Speed 2625.47 samples/sec Loss 7.0080 LearningRate 0.0272 Epoch: 9 Global Step: 397320 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:21:30,319-Speed 2628.96 samples/sec Loss 7.0658 LearningRate 0.0271 Epoch: 9 Global Step: 397330 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:21:34,219-Speed 2625.71 samples/sec Loss 7.0233 LearningRate 0.0271 Epoch: 9 Global Step: 397340 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:21:38,117-Speed 2627.54 samples/sec Loss 7.0416 LearningRate 0.0271 Epoch: 9 Global Step: 397350 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:21:42,009-Speed 2631.35 samples/sec Loss 7.0837 LearningRate 0.0271 Epoch: 9 Global Step: 397360 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:21:45,905-Speed 2628.88 samples/sec Loss 7.0690 LearningRate 0.0271 Epoch: 9 Global Step: 397370 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:21:49,805-Speed 2626.26 samples/sec Loss 6.8675 LearningRate 0.0271 Epoch: 9 Global Step: 397380 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:21:53,704-Speed 2627.23 samples/sec Loss 6.9021 LearningRate 0.0271 Epoch: 9 Global Step: 397390 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 16:21:57,585-Speed 2638.75 samples/sec Loss 7.0246 LearningRate 0.0271 Epoch: 9 Global Step: 397400 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:22:01,482-Speed 2628.53 samples/sec Loss 6.9638 LearningRate 0.0271 Epoch: 9 Global Step: 397410 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:22:05,384-Speed 2624.85 samples/sec Loss 7.1097 LearningRate 0.0271 Epoch: 9 Global Step: 397420 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:22:09,278-Speed 2630.09 samples/sec Loss 6.9651 LearningRate 0.0271 Epoch: 9 Global Step: 397430 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:22:13,172-Speed 2630.97 samples/sec Loss 7.0652 LearningRate 0.0271 Epoch: 9 Global Step: 397440 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:22:17,072-Speed 2625.73 samples/sec Loss 6.9744 LearningRate 0.0271 Epoch: 9 Global Step: 397450 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:22:20,980-Speed 2621.23 samples/sec Loss 6.9759 LearningRate 0.0271 Epoch: 9 Global Step: 397460 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:22:24,907-Speed 2608.18 samples/sec Loss 6.9863 LearningRate 0.0271 Epoch: 9 Global Step: 397470 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:22:28,816-Speed 2620.00 samples/sec Loss 7.0035 LearningRate 0.0271 Epoch: 9 Global Step: 397480 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:22:32,709-Speed 2631.26 samples/sec Loss 7.0582 LearningRate 0.0271 Epoch: 9 Global Step: 397490 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:22:36,607-Speed 2627.42 samples/sec Loss 6.9760 LearningRate 0.0271 Epoch: 9 Global Step: 397500 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 16:22:40,523-Speed 2615.56 samples/sec Loss 6.9214 LearningRate 0.0271 Epoch: 9 Global Step: 397510 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 16:22:44,400-Speed 2641.85 samples/sec Loss 6.9688 LearningRate 0.0271 Epoch: 9 Global Step: 397520 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:22:48,298-Speed 2627.83 samples/sec Loss 6.9589 LearningRate 0.0271 Epoch: 9 Global Step: 397530 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:22:52,196-Speed 2627.79 samples/sec Loss 7.1416 LearningRate 0.0271 Epoch: 9 Global Step: 397540 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:22:56,093-Speed 2628.21 samples/sec Loss 7.0093 LearningRate 0.0271 Epoch: 9 Global Step: 397550 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:22:59,992-Speed 2626.91 samples/sec Loss 6.9875 LearningRate 0.0271 Epoch: 9 Global Step: 397560 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:23:03,900-Speed 2621.06 samples/sec Loss 6.9586 LearningRate 0.0271 Epoch: 9 Global Step: 397570 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:23:07,805-Speed 2622.66 samples/sec Loss 6.9314 LearningRate 0.0271 Epoch: 9 Global Step: 397580 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:23:11,693-Speed 2633.92 samples/sec Loss 6.8670 LearningRate 0.0271 Epoch: 9 Global Step: 397590 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:23:15,595-Speed 2624.57 samples/sec Loss 6.9646 LearningRate 0.0271 Epoch: 9 Global Step: 397600 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:23:19,491-Speed 2629.91 samples/sec Loss 6.9218 LearningRate 0.0271 Epoch: 9 Global Step: 397610 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:23:23,390-Speed 2626.71 samples/sec Loss 6.9193 LearningRate 0.0271 Epoch: 9 Global Step: 397620 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 16:23:27,332-Speed 2598.48 samples/sec Loss 6.9283 LearningRate 0.0271 Epoch: 9 Global Step: 397630 Fp16 Grad Scale: 262144 Required: 49 hours
Training: 2022-04-14 16:23:31,257-Speed 2609.44 samples/sec Loss 6.9906 LearningRate 0.0271 Epoch: 9 Global Step: 397640 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:23:35,155-Speed 2627.61 samples/sec Loss 6.8981 LearningRate 0.0271 Epoch: 9 Global Step: 397650 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:23:39,107-Speed 2591.63 samples/sec Loss 6.9205 LearningRate 0.0271 Epoch: 9 Global Step: 397660 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:23:43,003-Speed 2628.81 samples/sec Loss 6.9718 LearningRate 0.0271 Epoch: 9 Global Step: 397670 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:23:46,908-Speed 2622.74 samples/sec Loss 6.9297 LearningRate 0.0271 Epoch: 9 Global Step: 397680 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:23:50,812-Speed 2623.36 samples/sec Loss 6.9888 LearningRate 0.0271 Epoch: 9 Global Step: 397690 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:23:54,715-Speed 2624.47 samples/sec Loss 6.9604 LearningRate 0.0271 Epoch: 9 Global Step: 397700 Fp16 Grad Scale: 131072 Required: 49 hours
Training: 2022-04-14 16:23:58,597-Speed 2638.37 samples/sec Loss 6.9978 LearningRate 0.0271 Epoch: 9 Global Step: 397710 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:24:02,495-Speed 2627.88 samples/sec Loss 7.0031 LearningRate 0.0271 Epoch: 9 Global Step: 397720 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:24:06,391-Speed 2628.80 samples/sec Loss 6.9494 LearningRate 0.0271 Epoch: 9 Global Step: 397730 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:24:10,295-Speed 2623.71 samples/sec Loss 7.0709 LearningRate 0.0271 Epoch: 9 Global Step: 397740 Fp16 Grad Scale: 65536 Required: 49 hours
Training: 2022-04-14 16:24:14,206-Speed 2618.69 samples/sec Loss 7.0633 LearningRate 0.0271 Epoch: 9 Global Step: 397750 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:24:18,103-Speed 2628.33 samples/sec Loss 6.8993 LearningRate 0.0271 Epoch: 9 Global Step: 397760 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:24:21,998-Speed 2629.46 samples/sec Loss 7.0489 LearningRate 0.0271 Epoch: 9 Global Step: 397770 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:24:25,891-Speed 2631.17 samples/sec Loss 6.8404 LearningRate 0.0271 Epoch: 9 Global Step: 397780 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:24:29,805-Speed 2617.04 samples/sec Loss 6.9839 LearningRate 0.0271 Epoch: 9 Global Step: 397790 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:24:33,699-Speed 2630.55 samples/sec Loss 6.9960 LearningRate 0.0271 Epoch: 9 Global Step: 397800 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:24:37,597-Speed 2626.87 samples/sec Loss 7.0176 LearningRate 0.0271 Epoch: 9 Global Step: 397810 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:24:41,476-Speed 2640.97 samples/sec Loss 6.9545 LearningRate 0.0271 Epoch: 9 Global Step: 397820 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:24:45,368-Speed 2631.61 samples/sec Loss 7.0465 LearningRate 0.0271 Epoch: 9 Global Step: 397830 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:24:49,262-Speed 2630.10 samples/sec Loss 6.9800 LearningRate 0.0271 Epoch: 9 Global Step: 397840 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:24:53,161-Speed 2627.10 samples/sec Loss 7.0607 LearningRate 0.0271 Epoch: 9 Global Step: 397850 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:24:57,069-Speed 2620.93 samples/sec Loss 7.0079 LearningRate 0.0271 Epoch: 9 Global Step: 397860 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:25:01,126-Speed 2524.17 samples/sec Loss 6.8706 LearningRate 0.0271 Epoch: 9 Global Step: 397870 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:25:05,147-Speed 2547.10 samples/sec Loss 6.9733 LearningRate 0.0271 Epoch: 9 Global Step: 397880 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:25:09,044-Speed 2628.76 samples/sec Loss 7.0313 LearningRate 0.0271 Epoch: 9 Global Step: 397890 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:25:12,950-Speed 2621.95 samples/sec Loss 6.9075 LearningRate 0.0271 Epoch: 9 Global Step: 397900 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:25:16,854-Speed 2623.48 samples/sec Loss 6.8946 LearningRate 0.0271 Epoch: 9 Global Step: 397910 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:25:20,764-Speed 2619.83 samples/sec Loss 6.9756 LearningRate 0.0271 Epoch: 9 Global Step: 397920 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:25:24,666-Speed 2624.48 samples/sec Loss 6.9764 LearningRate 0.0271 Epoch: 9 Global Step: 397930 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:25:28,569-Speed 2624.37 samples/sec Loss 6.8709 LearningRate 0.0271 Epoch: 9 Global Step: 397940 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:25:32,483-Speed 2616.54 samples/sec Loss 6.8519 LearningRate 0.0271 Epoch: 9 Global Step: 397950 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:25:36,392-Speed 2620.34 samples/sec Loss 6.9295 LearningRate 0.0271 Epoch: 9 Global Step: 397960 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:25:40,308-Speed 2615.08 samples/sec Loss 6.9311 LearningRate 0.0271 Epoch: 9 Global Step: 397970 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:25:44,211-Speed 2624.54 samples/sec Loss 6.9038 LearningRate 0.0271 Epoch: 9 Global Step: 397980 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:25:48,115-Speed 2623.53 samples/sec Loss 6.9492 LearningRate 0.0271 Epoch: 9 Global Step: 397990 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:25:52,019-Speed 2624.16 samples/sec Loss 6.9809 LearningRate 0.0271 Epoch: 9 Global Step: 398000 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:25:55,915-Speed 2629.01 samples/sec Loss 6.9065 LearningRate 0.0271 Epoch: 9 Global Step: 398010 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:25:59,823-Speed 2620.60 samples/sec Loss 6.8765 LearningRate 0.0271 Epoch: 9 Global Step: 398020 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:26:03,726-Speed 2623.89 samples/sec Loss 7.1012 LearningRate 0.0271 Epoch: 9 Global Step: 398030 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:26:07,626-Speed 2626.17 samples/sec Loss 6.9370 LearningRate 0.0271 Epoch: 9 Global Step: 398040 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:26:11,522-Speed 2629.27 samples/sec Loss 6.8575 LearningRate 0.0271 Epoch: 9 Global Step: 398050 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:26:15,414-Speed 2631.40 samples/sec Loss 6.9226 LearningRate 0.0271 Epoch: 9 Global Step: 398060 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:26:19,320-Speed 2621.94 samples/sec Loss 6.9474 LearningRate 0.0271 Epoch: 9 Global Step: 398070 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:26:23,224-Speed 2624.12 samples/sec Loss 6.9345 LearningRate 0.0271 Epoch: 9 Global Step: 398080 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:26:27,122-Speed 2627.62 samples/sec Loss 6.9236 LearningRate 0.0271 Epoch: 9 Global Step: 398090 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:26:31,029-Speed 2621.77 samples/sec Loss 6.9859 LearningRate 0.0271 Epoch: 9 Global Step: 398100 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:26:34,929-Speed 2626.09 samples/sec Loss 7.0321 LearningRate 0.0271 Epoch: 9 Global Step: 398110 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:26:38,825-Speed 2628.48 samples/sec Loss 6.9437 LearningRate 0.0270 Epoch: 9 Global Step: 398120 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:26:42,722-Speed 2628.10 samples/sec Loss 7.0683 LearningRate 0.0270 Epoch: 9 Global Step: 398130 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:26:46,620-Speed 2627.81 samples/sec Loss 7.0161 LearningRate 0.0270 Epoch: 9 Global Step: 398140 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:26:50,537-Speed 2615.15 samples/sec Loss 6.8072 LearningRate 0.0270 Epoch: 9 Global Step: 398150 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:26:54,416-Speed 2640.73 samples/sec Loss 6.9174 LearningRate 0.0270 Epoch: 9 Global Step: 398160 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:26:58,312-Speed 2628.67 samples/sec Loss 7.0060 LearningRate 0.0270 Epoch: 9 Global Step: 398170 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:27:02,225-Speed 2617.43 samples/sec Loss 7.0382 LearningRate 0.0270 Epoch: 9 Global Step: 398180 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:27:06,119-Speed 2630.73 samples/sec Loss 6.8634 LearningRate 0.0270 Epoch: 9 Global Step: 398190 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:27:10,018-Speed 2627.21 samples/sec Loss 6.9147 LearningRate 0.0270 Epoch: 9 Global Step: 398200 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:27:13,925-Speed 2621.59 samples/sec Loss 6.9675 LearningRate 0.0270 Epoch: 9 Global Step: 398210 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:27:17,825-Speed 2626.17 samples/sec Loss 6.8001 LearningRate 0.0270 Epoch: 9 Global Step: 398220 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:27:21,729-Speed 2623.18 samples/sec Loss 6.9120 LearningRate 0.0270 Epoch: 9 Global Step: 398230 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:27:25,642-Speed 2618.17 samples/sec Loss 6.9880 LearningRate 0.0270 Epoch: 9 Global Step: 398240 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:27:29,552-Speed 2619.54 samples/sec Loss 7.0000 LearningRate 0.0270 Epoch: 9 Global Step: 398250 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:27:33,434-Speed 2638.30 samples/sec Loss 7.0355 LearningRate 0.0270 Epoch: 9 Global Step: 398260 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:27:37,334-Speed 2626.12 samples/sec Loss 6.9700 LearningRate 0.0270 Epoch: 9 Global Step: 398270 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:27:41,231-Speed 2628.04 samples/sec Loss 7.1031 LearningRate 0.0270 Epoch: 9 Global Step: 398280 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:27:45,130-Speed 2627.10 samples/sec Loss 6.9826 LearningRate 0.0270 Epoch: 9 Global Step: 398290 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:27:49,032-Speed 2624.71 samples/sec Loss 6.8230 LearningRate 0.0270 Epoch: 9 Global Step: 398300 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:27:52,927-Speed 2629.70 samples/sec Loss 6.9134 LearningRate 0.0270 Epoch: 9 Global Step: 398310 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:27:56,824-Speed 2628.17 samples/sec Loss 7.0150 LearningRate 0.0270 Epoch: 9 Global Step: 398320 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:28:00,717-Speed 2631.22 samples/sec Loss 6.9222 LearningRate 0.0270 Epoch: 9 Global Step: 398330 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:28:04,615-Speed 2627.38 samples/sec Loss 6.9278 LearningRate 0.0270 Epoch: 9 Global Step: 398340 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:28:08,514-Speed 2627.31 samples/sec Loss 6.9689 LearningRate 0.0270 Epoch: 9 Global Step: 398350 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:28:12,412-Speed 2627.03 samples/sec Loss 6.9059 LearningRate 0.0270 Epoch: 9 Global Step: 398360 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:28:16,307-Speed 2630.08 samples/sec Loss 6.9953 LearningRate 0.0270 Epoch: 9 Global Step: 398370 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:28:20,207-Speed 2626.22 samples/sec Loss 7.1036 LearningRate 0.0270 Epoch: 9 Global Step: 398380 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:28:24,090-Speed 2637.88 samples/sec Loss 6.9656 LearningRate 0.0270 Epoch: 9 Global Step: 398390 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:28:28,003-Speed 2617.69 samples/sec Loss 7.0843 LearningRate 0.0270 Epoch: 9 Global Step: 398400 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:28:31,892-Speed 2633.78 samples/sec Loss 7.0359 LearningRate 0.0270 Epoch: 9 Global Step: 398410 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:28:35,792-Speed 2626.29 samples/sec Loss 6.8342 LearningRate 0.0270 Epoch: 9 Global Step: 398420 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:28:39,688-Speed 2628.93 samples/sec Loss 6.9682 LearningRate 0.0270 Epoch: 9 Global Step: 398430 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:28:43,589-Speed 2626.07 samples/sec Loss 7.0716 LearningRate 0.0270 Epoch: 9 Global Step: 398440 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:28:47,484-Speed 2629.18 samples/sec Loss 6.8836 LearningRate 0.0270 Epoch: 9 Global Step: 398450 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:28:51,379-Speed 2629.92 samples/sec Loss 7.0413 LearningRate 0.0270 Epoch: 9 Global Step: 398460 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:28:55,272-Speed 2630.80 samples/sec Loss 6.9398 LearningRate 0.0270 Epoch: 9 Global Step: 398470 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:28:59,164-Speed 2631.69 samples/sec Loss 7.0328 LearningRate 0.0270 Epoch: 9 Global Step: 398480 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:29:03,063-Speed 2627.36 samples/sec Loss 6.8765 LearningRate 0.0270 Epoch: 9 Global Step: 398490 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:29:06,954-Speed 2632.10 samples/sec Loss 6.7974 LearningRate 0.0270 Epoch: 9 Global Step: 398500 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:29:10,849-Speed 2630.01 samples/sec Loss 7.1945 LearningRate 0.0270 Epoch: 9 Global Step: 398510 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:29:14,744-Speed 2628.97 samples/sec Loss 6.9685 LearningRate 0.0270 Epoch: 9 Global Step: 398520 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:29:18,679-Speed 2603.17 samples/sec Loss 6.9532 LearningRate 0.0270 Epoch: 9 Global Step: 398530 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:29:22,576-Speed 2628.34 samples/sec Loss 7.0278 LearningRate 0.0270 Epoch: 9 Global Step: 398540 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:29:26,477-Speed 2626.35 samples/sec Loss 6.9493 LearningRate 0.0270 Epoch: 9 Global Step: 398550 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:29:30,371-Speed 2630.03 samples/sec Loss 7.0045 LearningRate 0.0270 Epoch: 9 Global Step: 398560 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:29:34,271-Speed 2626.86 samples/sec Loss 6.9320 LearningRate 0.0270 Epoch: 9 Global Step: 398570 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:29:38,168-Speed 2628.18 samples/sec Loss 7.0698 LearningRate 0.0270 Epoch: 9 Global Step: 398580 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:29:42,045-Speed 2641.46 samples/sec Loss 6.9608 LearningRate 0.0270 Epoch: 9 Global Step: 398590 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:29:45,944-Speed 2626.70 samples/sec Loss 7.0397 LearningRate 0.0270 Epoch: 9 Global Step: 398600 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:29:49,850-Speed 2622.82 samples/sec Loss 6.8829 LearningRate 0.0270 Epoch: 9 Global Step: 398610 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:29:53,750-Speed 2626.09 samples/sec Loss 6.9358 LearningRate 0.0270 Epoch: 9 Global Step: 398620 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:29:57,648-Speed 2628.13 samples/sec Loss 6.8859 LearningRate 0.0270 Epoch: 9 Global Step: 398630 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:30:01,549-Speed 2625.37 samples/sec Loss 6.9224 LearningRate 0.0270 Epoch: 9 Global Step: 398640 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:30:05,441-Speed 2632.01 samples/sec Loss 6.9392 LearningRate 0.0270 Epoch: 9 Global Step: 398650 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:30:09,339-Speed 2627.14 samples/sec Loss 6.9387 LearningRate 0.0270 Epoch: 9 Global Step: 398660 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:30:13,216-Speed 2642.09 samples/sec Loss 6.9725 LearningRate 0.0270 Epoch: 9 Global Step: 398670 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:30:17,129-Speed 2616.96 samples/sec Loss 6.9758 LearningRate 0.0270 Epoch: 9 Global Step: 398680 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:30:21,022-Speed 2631.50 samples/sec Loss 7.1484 LearningRate 0.0270 Epoch: 9 Global Step: 398690 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:30:24,926-Speed 2623.72 samples/sec Loss 6.9491 LearningRate 0.0270 Epoch: 9 Global Step: 398700 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:30:28,827-Speed 2625.22 samples/sec Loss 6.9998 LearningRate 0.0270 Epoch: 9 Global Step: 398710 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:30:32,729-Speed 2624.89 samples/sec Loss 6.9689 LearningRate 0.0270 Epoch: 9 Global Step: 398720 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:30:36,620-Speed 2632.38 samples/sec Loss 6.9936 LearningRate 0.0270 Epoch: 9 Global Step: 398730 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:30:40,514-Speed 2630.79 samples/sec Loss 6.9704 LearningRate 0.0270 Epoch: 9 Global Step: 398740 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:30:44,409-Speed 2629.49 samples/sec Loss 6.9822 LearningRate 0.0270 Epoch: 9 Global Step: 398750 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:30:48,310-Speed 2625.89 samples/sec Loss 6.8314 LearningRate 0.0270 Epoch: 9 Global Step: 398760 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:30:52,219-Speed 2620.00 samples/sec Loss 6.9764 LearningRate 0.0270 Epoch: 9 Global Step: 398770 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:30:56,117-Speed 2628.06 samples/sec Loss 6.8979 LearningRate 0.0270 Epoch: 9 Global Step: 398780 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:31:00,014-Speed 2628.28 samples/sec Loss 7.0244 LearningRate 0.0270 Epoch: 9 Global Step: 398790 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:31:03,922-Speed 2620.93 samples/sec Loss 7.0749 LearningRate 0.0270 Epoch: 9 Global Step: 398800 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:31:07,821-Speed 2626.50 samples/sec Loss 7.0099 LearningRate 0.0270 Epoch: 9 Global Step: 398810 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:31:11,720-Speed 2627.51 samples/sec Loss 7.0482 LearningRate 0.0270 Epoch: 9 Global Step: 398820 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:31:15,615-Speed 2629.71 samples/sec Loss 7.0996 LearningRate 0.0270 Epoch: 9 Global Step: 398830 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:31:19,525-Speed 2619.78 samples/sec Loss 7.0570 LearningRate 0.0270 Epoch: 9 Global Step: 398840 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:31:23,434-Speed 2620.31 samples/sec Loss 7.0154 LearningRate 0.0270 Epoch: 9 Global Step: 398850 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:31:27,343-Speed 2619.84 samples/sec Loss 6.9535 LearningRate 0.0270 Epoch: 9 Global Step: 398860 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:31:31,252-Speed 2621.02 samples/sec Loss 7.0158 LearningRate 0.0270 Epoch: 9 Global Step: 398870 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:31:35,146-Speed 2630.23 samples/sec Loss 6.8261 LearningRate 0.0270 Epoch: 9 Global Step: 398880 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:31:39,047-Speed 2625.54 samples/sec Loss 6.9929 LearningRate 0.0270 Epoch: 9 Global Step: 398890 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:31:42,954-Speed 2621.18 samples/sec Loss 6.8703 LearningRate 0.0270 Epoch: 9 Global Step: 398900 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:31:46,849-Speed 2630.14 samples/sec Loss 6.9589 LearningRate 0.0270 Epoch: 9 Global Step: 398910 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:31:50,744-Speed 2630.25 samples/sec Loss 6.9633 LearningRate 0.0269 Epoch: 9 Global Step: 398920 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:31:54,639-Speed 2629.51 samples/sec Loss 6.9538 LearningRate 0.0269 Epoch: 9 Global Step: 398930 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:31:58,536-Speed 2628.99 samples/sec Loss 6.9916 LearningRate 0.0269 Epoch: 9 Global Step: 398940 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:32:02,436-Speed 2625.94 samples/sec Loss 7.0309 LearningRate 0.0269 Epoch: 9 Global Step: 398950 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:32:06,505-Speed 2517.05 samples/sec Loss 6.8817 LearningRate 0.0269 Epoch: 9 Global Step: 398960 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:32:10,464-Speed 2587.59 samples/sec Loss 6.8971 LearningRate 0.0269 Epoch: 9 Global Step: 398970 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:32:14,361-Speed 2628.40 samples/sec Loss 6.8654 LearningRate 0.0269 Epoch: 9 Global Step: 398980 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:32:18,256-Speed 2629.24 samples/sec Loss 7.0198 LearningRate 0.0269 Epoch: 9 Global Step: 398990 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:32:22,182-Speed 2609.18 samples/sec Loss 6.9918 LearningRate 0.0269 Epoch: 9 Global Step: 399000 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:32:26,059-Speed 2641.93 samples/sec Loss 6.9116 LearningRate 0.0269 Epoch: 9 Global Step: 399010 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:32:30,091-Speed 2540.63 samples/sec Loss 6.9402 LearningRate 0.0269 Epoch: 9 Global Step: 399020 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:32:33,986-Speed 2630.08 samples/sec Loss 7.0488 LearningRate 0.0269 Epoch: 9 Global Step: 399030 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:32:37,879-Speed 2630.49 samples/sec Loss 7.0015 LearningRate 0.0269 Epoch: 9 Global Step: 399040 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:32:41,815-Speed 2602.17 samples/sec Loss 6.9784 LearningRate 0.0269 Epoch: 9 Global Step: 399050 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:32:45,714-Speed 2627.13 samples/sec Loss 6.8620 LearningRate 0.0269 Epoch: 9 Global Step: 399060 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:32:49,621-Speed 2621.62 samples/sec Loss 7.0341 LearningRate 0.0269 Epoch: 9 Global Step: 399070 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:32:53,520-Speed 2627.45 samples/sec Loss 7.0379 LearningRate 0.0269 Epoch: 9 Global Step: 399080 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:32:57,419-Speed 2626.45 samples/sec Loss 6.9929 LearningRate 0.0269 Epoch: 9 Global Step: 399090 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:33:01,320-Speed 2626.38 samples/sec Loss 6.9923 LearningRate 0.0269 Epoch: 9 Global Step: 399100 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:33:05,203-Speed 2637.60 samples/sec Loss 6.9354 LearningRate 0.0269 Epoch: 9 Global Step: 399110 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:33:09,105-Speed 2624.54 samples/sec Loss 6.8036 LearningRate 0.0269 Epoch: 9 Global Step: 399120 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:33:13,014-Speed 2619.97 samples/sec Loss 6.9233 LearningRate 0.0269 Epoch: 9 Global Step: 399130 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:33:16,918-Speed 2623.98 samples/sec Loss 7.0334 LearningRate 0.0269 Epoch: 9 Global Step: 399140 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:33:20,817-Speed 2626.67 samples/sec Loss 6.9517 LearningRate 0.0269 Epoch: 9 Global Step: 399150 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:33:24,726-Speed 2620.42 samples/sec Loss 7.0624 LearningRate 0.0269 Epoch: 9 Global Step: 399160 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:33:28,629-Speed 2624.89 samples/sec Loss 6.8876 LearningRate 0.0269 Epoch: 9 Global Step: 399170 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:33:32,550-Speed 2611.71 samples/sec Loss 6.8729 LearningRate 0.0269 Epoch: 9 Global Step: 399180 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:33:36,443-Speed 2631.21 samples/sec Loss 6.8661 LearningRate 0.0269 Epoch: 9 Global Step: 399190 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:33:40,339-Speed 2628.65 samples/sec Loss 6.9740 LearningRate 0.0269 Epoch: 9 Global Step: 399200 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:33:44,284-Speed 2596.20 samples/sec Loss 6.7919 LearningRate 0.0269 Epoch: 9 Global Step: 399210 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:33:48,164-Speed 2640.46 samples/sec Loss 7.0402 LearningRate 0.0269 Epoch: 9 Global Step: 399220 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:33:52,061-Speed 2628.40 samples/sec Loss 7.1000 LearningRate 0.0269 Epoch: 9 Global Step: 399230 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:33:55,961-Speed 2626.29 samples/sec Loss 6.9432 LearningRate 0.0269 Epoch: 9 Global Step: 399240 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:33:59,868-Speed 2621.85 samples/sec Loss 7.1353 LearningRate 0.0269 Epoch: 9 Global Step: 399250 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:34:03,779-Speed 2618.49 samples/sec Loss 6.8572 LearningRate 0.0269 Epoch: 9 Global Step: 399260 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:34:07,673-Speed 2630.51 samples/sec Loss 7.0322 LearningRate 0.0269 Epoch: 9 Global Step: 399270 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:34:11,561-Speed 2634.28 samples/sec Loss 6.8599 LearningRate 0.0269 Epoch: 9 Global Step: 399280 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:34:15,468-Speed 2621.73 samples/sec Loss 6.9030 LearningRate 0.0269 Epoch: 9 Global Step: 399290 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:34:19,401-Speed 2604.08 samples/sec Loss 7.0178 LearningRate 0.0269 Epoch: 9 Global Step: 399300 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:34:23,296-Speed 2629.97 samples/sec Loss 6.9936 LearningRate 0.0269 Epoch: 9 Global Step: 399310 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:34:27,180-Speed 2637.33 samples/sec Loss 6.8928 LearningRate 0.0269 Epoch: 9 Global Step: 399320 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:34:31,077-Speed 2628.33 samples/sec Loss 6.9749 LearningRate 0.0269 Epoch: 9 Global Step: 399330 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:34:34,987-Speed 2619.68 samples/sec Loss 7.1003 LearningRate 0.0269 Epoch: 9 Global Step: 399340 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:34:38,882-Speed 2629.14 samples/sec Loss 6.9120 LearningRate 0.0269 Epoch: 9 Global Step: 399350 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:34:42,781-Speed 2627.67 samples/sec Loss 7.0195 LearningRate 0.0269 Epoch: 9 Global Step: 399360 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:34:46,681-Speed 2626.02 samples/sec Loss 6.9320 LearningRate 0.0269 Epoch: 9 Global Step: 399370 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:34:50,576-Speed 2629.46 samples/sec Loss 6.9707 LearningRate 0.0269 Epoch: 9 Global Step: 399380 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:34:54,506-Speed 2606.08 samples/sec Loss 6.9461 LearningRate 0.0269 Epoch: 9 Global Step: 399390 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:34:58,411-Speed 2623.39 samples/sec Loss 6.8727 LearningRate 0.0269 Epoch: 9 Global Step: 399400 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:35:02,319-Speed 2620.95 samples/sec Loss 7.0013 LearningRate 0.0269 Epoch: 9 Global Step: 399410 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:35:06,211-Speed 2632.39 samples/sec Loss 6.9407 LearningRate 0.0269 Epoch: 9 Global Step: 399420 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:35:10,138-Speed 2608.31 samples/sec Loss 6.9246 LearningRate 0.0269 Epoch: 9 Global Step: 399430 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:35:14,033-Speed 2629.52 samples/sec Loss 7.0121 LearningRate 0.0269 Epoch: 9 Global Step: 399440 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:35:17,912-Speed 2640.80 samples/sec Loss 6.9827 LearningRate 0.0269 Epoch: 9 Global Step: 399450 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:35:21,804-Speed 2631.76 samples/sec Loss 6.9308 LearningRate 0.0269 Epoch: 9 Global Step: 399460 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:35:25,704-Speed 2626.42 samples/sec Loss 6.9528 LearningRate 0.0269 Epoch: 9 Global Step: 399470 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:35:29,577-Speed 2644.92 samples/sec Loss 6.7828 LearningRate 0.0269 Epoch: 9 Global Step: 399480 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:35:33,479-Speed 2625.40 samples/sec Loss 6.9232 LearningRate 0.0269 Epoch: 9 Global Step: 399490 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:35:37,373-Speed 2630.30 samples/sec Loss 6.8677 LearningRate 0.0269 Epoch: 9 Global Step: 399500 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:35:41,267-Speed 2629.85 samples/sec Loss 7.0009 LearningRate 0.0269 Epoch: 9 Global Step: 399510 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:35:45,169-Speed 2624.97 samples/sec Loss 6.8322 LearningRate 0.0269 Epoch: 9 Global Step: 399520 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:35:49,069-Speed 2627.31 samples/sec Loss 7.0566 LearningRate 0.0269 Epoch: 9 Global Step: 399530 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:35:52,977-Speed 2620.53 samples/sec Loss 6.9649 LearningRate 0.0269 Epoch: 9 Global Step: 399540 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:35:56,878-Speed 2625.62 samples/sec Loss 6.8842 LearningRate 0.0269 Epoch: 9 Global Step: 399550 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:36:00,795-Speed 2614.49 samples/sec Loss 7.0300 LearningRate 0.0269 Epoch: 9 Global Step: 399560 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:36:04,700-Speed 2623.69 samples/sec Loss 6.9779 LearningRate 0.0269 Epoch: 9 Global Step: 399570 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:36:08,597-Speed 2628.08 samples/sec Loss 7.0031 LearningRate 0.0269 Epoch: 9 Global Step: 399580 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:36:12,523-Speed 2608.97 samples/sec Loss 6.9954 LearningRate 0.0269 Epoch: 9 Global Step: 399590 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:36:16,425-Speed 2624.61 samples/sec Loss 6.9518 LearningRate 0.0269 Epoch: 9 Global Step: 399600 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:36:20,321-Speed 2629.41 samples/sec Loss 6.9756 LearningRate 0.0269 Epoch: 9 Global Step: 399610 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:36:24,215-Speed 2630.14 samples/sec Loss 6.9306 LearningRate 0.0269 Epoch: 9 Global Step: 399620 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:36:28,111-Speed 2629.06 samples/sec Loss 6.9946 LearningRate 0.0269 Epoch: 9 Global Step: 399630 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:36:32,008-Speed 2628.68 samples/sec Loss 6.9382 LearningRate 0.0269 Epoch: 9 Global Step: 399640 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:36:35,907-Speed 2627.21 samples/sec Loss 6.9647 LearningRate 0.0269 Epoch: 9 Global Step: 399650 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:36:39,840-Speed 2603.91 samples/sec Loss 6.8688 LearningRate 0.0269 Epoch: 9 Global Step: 399660 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:36:43,808-Speed 2581.50 samples/sec Loss 7.0031 LearningRate 0.0269 Epoch: 9 Global Step: 399670 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:36:47,856-Speed 2530.54 samples/sec Loss 6.9784 LearningRate 0.0269 Epoch: 9 Global Step: 399680 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:36:51,897-Speed 2534.83 samples/sec Loss 7.0223 LearningRate 0.0269 Epoch: 9 Global Step: 399690 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:36:55,820-Speed 2610.91 samples/sec Loss 6.8799 LearningRate 0.0269 Epoch: 9 Global Step: 399700 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:36:59,721-Speed 2625.67 samples/sec Loss 6.9756 LearningRate 0.0269 Epoch: 9 Global Step: 399710 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:37:03,655-Speed 2603.52 samples/sec Loss 6.9371 LearningRate 0.0268 Epoch: 9 Global Step: 399720 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:37:07,563-Speed 2620.89 samples/sec Loss 6.9051 LearningRate 0.0268 Epoch: 9 Global Step: 399730 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:37:11,490-Speed 2608.38 samples/sec Loss 6.9668 LearningRate 0.0268 Epoch: 9 Global Step: 399740 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:37:15,484-Speed 2564.25 samples/sec Loss 6.9290 LearningRate 0.0268 Epoch: 9 Global Step: 399750 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:37:19,382-Speed 2627.83 samples/sec Loss 6.9639 LearningRate 0.0268 Epoch: 9 Global Step: 399760 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:37:23,278-Speed 2628.91 samples/sec Loss 6.8519 LearningRate 0.0268 Epoch: 9 Global Step: 399770 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:37:27,176-Speed 2627.96 samples/sec Loss 6.8526 LearningRate 0.0268 Epoch: 9 Global Step: 399780 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:37:31,046-Speed 2646.81 samples/sec Loss 6.8892 LearningRate 0.0268 Epoch: 9 Global Step: 399790 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:37:34,951-Speed 2622.86 samples/sec Loss 6.8986 LearningRate 0.0268 Epoch: 9 Global Step: 399800 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:37:38,854-Speed 2623.81 samples/sec Loss 6.9100 LearningRate 0.0268 Epoch: 9 Global Step: 399810 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:37:42,761-Speed 2621.34 samples/sec Loss 6.8357 LearningRate 0.0268 Epoch: 9 Global Step: 399820 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:37:46,660-Speed 2628.32 samples/sec Loss 6.9182 LearningRate 0.0268 Epoch: 9 Global Step: 399830 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:37:50,555-Speed 2629.31 samples/sec Loss 7.0446 LearningRate 0.0268 Epoch: 9 Global Step: 399840 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:37:54,457-Speed 2625.22 samples/sec Loss 7.0453 LearningRate 0.0268 Epoch: 9 Global Step: 399850 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:37:58,354-Speed 2628.29 samples/sec Loss 6.8828 LearningRate 0.0268 Epoch: 9 Global Step: 399860 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:38:02,251-Speed 2628.61 samples/sec Loss 6.9459 LearningRate 0.0268 Epoch: 9 Global Step: 399870 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:38:06,157-Speed 2621.85 samples/sec Loss 6.9343 LearningRate 0.0268 Epoch: 9 Global Step: 399880 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:38:10,055-Speed 2627.38 samples/sec Loss 6.8800 LearningRate 0.0268 Epoch: 9 Global Step: 399890 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:38:13,934-Speed 2640.53 samples/sec Loss 6.9069 LearningRate 0.0268 Epoch: 9 Global Step: 399900 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:38:17,830-Speed 2629.41 samples/sec Loss 7.0784 LearningRate 0.0268 Epoch: 9 Global Step: 399910 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:38:21,732-Speed 2624.93 samples/sec Loss 6.9707 LearningRate 0.0268 Epoch: 9 Global Step: 399920 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:38:25,634-Speed 2624.90 samples/sec Loss 6.8471 LearningRate 0.0268 Epoch: 9 Global Step: 399930 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:38:29,529-Speed 2629.97 samples/sec Loss 6.9568 LearningRate 0.0268 Epoch: 9 Global Step: 399940 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:38:33,434-Speed 2622.70 samples/sec Loss 6.8514 LearningRate 0.0268 Epoch: 9 Global Step: 399950 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:38:37,342-Speed 2620.54 samples/sec Loss 6.9225 LearningRate 0.0268 Epoch: 9 Global Step: 399960 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:38:41,240-Speed 2627.50 samples/sec Loss 6.9263 LearningRate 0.0268 Epoch: 9 Global Step: 399970 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:38:45,140-Speed 2626.82 samples/sec Loss 6.9109 LearningRate 0.0268 Epoch: 9 Global Step: 399980 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:38:49,061-Speed 2612.46 samples/sec Loss 6.9075 LearningRate 0.0268 Epoch: 9 Global Step: 399990 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:38:53,015-Speed 2590.42 samples/sec Loss 6.9768 LearningRate 0.0268 Epoch: 9 Global Step: 400000 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:39:36,028-[lfw][400000]XNorm: 23.593337
Training: 2022-04-14 16:39:36,029-[lfw][400000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-14 16:39:36,029-[lfw][400000]Accuracy-Highest: 0.99783
Training: 2022-04-14 16:40:25,997-[cfp_fp][400000]XNorm: 21.596733
Training: 2022-04-14 16:40:26,397-[cfp_fp][400000]Accuracy-Flip: 0.98757+-0.00599
Training: 2022-04-14 16:40:26,397-[cfp_fp][400000]Accuracy-Highest: 0.98757
Training: 2022-04-14 16:41:09,421-[agedb_30][400000]XNorm: 23.452849
Training: 2022-04-14 16:41:09,422-[agedb_30][400000]Accuracy-Flip: 0.97600+-0.00735
Training: 2022-04-14 16:41:09,423-[agedb_30][400000]Accuracy-Highest: 0.97700
Training: 2022-04-14 16:41:13,284-Speed 73.00 samples/sec Loss 6.9528 LearningRate 0.0268 Epoch: 9 Global Step: 400010 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:41:17,171-Speed 2635.21 samples/sec Loss 6.9368 LearningRate 0.0268 Epoch: 9 Global Step: 400020 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:41:21,051-Speed 2639.65 samples/sec Loss 6.9735 LearningRate 0.0268 Epoch: 9 Global Step: 400030 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:41:24,930-Speed 2640.35 samples/sec Loss 6.8156 LearningRate 0.0268 Epoch: 9 Global Step: 400040 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:41:28,813-Speed 2637.34 samples/sec Loss 6.8604 LearningRate 0.0268 Epoch: 9 Global Step: 400050 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:41:32,716-Speed 2624.57 samples/sec Loss 6.8941 LearningRate 0.0268 Epoch: 9 Global Step: 400060 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:41:36,622-Speed 2621.73 samples/sec Loss 6.9199 LearningRate 0.0268 Epoch: 9 Global Step: 400070 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:41:40,518-Speed 2628.94 samples/sec Loss 6.7846 LearningRate 0.0268 Epoch: 9 Global Step: 400080 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:41:44,408-Speed 2633.23 samples/sec Loss 7.0424 LearningRate 0.0268 Epoch: 9 Global Step: 400090 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:41:48,297-Speed 2633.95 samples/sec Loss 7.0460 LearningRate 0.0268 Epoch: 9 Global Step: 400100 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:41:52,190-Speed 2631.04 samples/sec Loss 6.8042 LearningRate 0.0268 Epoch: 9 Global Step: 400110 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:41:56,086-Speed 2628.58 samples/sec Loss 6.9918 LearningRate 0.0268 Epoch: 9 Global Step: 400120 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:41:59,977-Speed 2632.18 samples/sec Loss 7.0716 LearningRate 0.0268 Epoch: 9 Global Step: 400130 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:42:03,940-Speed 2584.66 samples/sec Loss 6.9859 LearningRate 0.0268 Epoch: 9 Global Step: 400140 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:42:07,837-Speed 2628.53 samples/sec Loss 7.0308 LearningRate 0.0268 Epoch: 9 Global Step: 400150 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:42:11,727-Speed 2633.04 samples/sec Loss 7.0142 LearningRate 0.0268 Epoch: 9 Global Step: 400160 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:42:15,651-Speed 2610.17 samples/sec Loss 6.8237 LearningRate 0.0268 Epoch: 9 Global Step: 400170 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:42:19,554-Speed 2624.55 samples/sec Loss 6.8652 LearningRate 0.0268 Epoch: 9 Global Step: 400180 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:42:23,460-Speed 2622.33 samples/sec Loss 6.8525 LearningRate 0.0268 Epoch: 9 Global Step: 400190 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:42:27,364-Speed 2623.29 samples/sec Loss 6.8932 LearningRate 0.0268 Epoch: 9 Global Step: 400200 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:42:31,241-Speed 2641.73 samples/sec Loss 6.8991 LearningRate 0.0268 Epoch: 9 Global Step: 400210 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:42:35,142-Speed 2626.14 samples/sec Loss 6.8454 LearningRate 0.0268 Epoch: 9 Global Step: 400220 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:42:39,040-Speed 2627.63 samples/sec Loss 6.9660 LearningRate 0.0268 Epoch: 9 Global Step: 400230 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:42:42,957-Speed 2615.24 samples/sec Loss 7.0036 LearningRate 0.0268 Epoch: 9 Global Step: 400240 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:42:46,856-Speed 2626.71 samples/sec Loss 6.9189 LearningRate 0.0268 Epoch: 9 Global Step: 400250 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:42:50,756-Speed 2626.47 samples/sec Loss 6.9871 LearningRate 0.0268 Epoch: 9 Global Step: 400260 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:42:54,657-Speed 2625.48 samples/sec Loss 6.9538 LearningRate 0.0268 Epoch: 9 Global Step: 400270 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:42:58,548-Speed 2632.28 samples/sec Loss 6.8936 LearningRate 0.0268 Epoch: 9 Global Step: 400280 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:43:02,451-Speed 2624.24 samples/sec Loss 7.0190 LearningRate 0.0268 Epoch: 9 Global Step: 400290 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:43:06,362-Speed 2619.09 samples/sec Loss 6.9575 LearningRate 0.0268 Epoch: 9 Global Step: 400300 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:43:10,261-Speed 2626.86 samples/sec Loss 7.0624 LearningRate 0.0268 Epoch: 9 Global Step: 400310 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:43:14,128-Speed 2649.00 samples/sec Loss 6.8405 LearningRate 0.0268 Epoch: 9 Global Step: 400320 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:43:18,045-Speed 2615.00 samples/sec Loss 6.8063 LearningRate 0.0268 Epoch: 9 Global Step: 400330 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:43:21,939-Speed 2630.46 samples/sec Loss 6.9506 LearningRate 0.0268 Epoch: 9 Global Step: 400340 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:43:25,835-Speed 2628.64 samples/sec Loss 6.9437 LearningRate 0.0268 Epoch: 9 Global Step: 400350 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:43:29,738-Speed 2624.36 samples/sec Loss 6.9404 LearningRate 0.0268 Epoch: 9 Global Step: 400360 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:43:33,640-Speed 2624.57 samples/sec Loss 6.9219 LearningRate 0.0268 Epoch: 9 Global Step: 400370 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:43:37,695-Speed 2526.44 samples/sec Loss 6.8626 LearningRate 0.0268 Epoch: 9 Global Step: 400380 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:43:41,782-Speed 2505.85 samples/sec Loss 7.0048 LearningRate 0.0268 Epoch: 9 Global Step: 400390 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:43:45,713-Speed 2605.67 samples/sec Loss 6.9155 LearningRate 0.0268 Epoch: 9 Global Step: 400400 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:43:49,613-Speed 2626.81 samples/sec Loss 6.8409 LearningRate 0.0268 Epoch: 9 Global Step: 400410 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:43:53,513-Speed 2626.31 samples/sec Loss 6.9290 LearningRate 0.0268 Epoch: 9 Global Step: 400420 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:43:57,415-Speed 2624.70 samples/sec Loss 6.9509 LearningRate 0.0268 Epoch: 9 Global Step: 400430 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:44:01,313-Speed 2627.45 samples/sec Loss 6.9726 LearningRate 0.0268 Epoch: 9 Global Step: 400440 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:44:05,213-Speed 2626.71 samples/sec Loss 6.8815 LearningRate 0.0268 Epoch: 9 Global Step: 400450 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:44:09,134-Speed 2612.10 samples/sec Loss 6.9576 LearningRate 0.0268 Epoch: 9 Global Step: 400460 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:44:13,036-Speed 2624.90 samples/sec Loss 6.9145 LearningRate 0.0268 Epoch: 9 Global Step: 400470 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:44:16,947-Speed 2619.26 samples/sec Loss 6.8865 LearningRate 0.0268 Epoch: 9 Global Step: 400480 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:44:20,850-Speed 2624.28 samples/sec Loss 6.9131 LearningRate 0.0268 Epoch: 9 Global Step: 400490 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:44:24,747-Speed 2628.00 samples/sec Loss 6.9682 LearningRate 0.0268 Epoch: 9 Global Step: 400500 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:44:28,643-Speed 2629.02 samples/sec Loss 6.9748 LearningRate 0.0268 Epoch: 9 Global Step: 400510 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:44:32,542-Speed 2627.31 samples/sec Loss 6.8952 LearningRate 0.0267 Epoch: 9 Global Step: 400520 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:44:36,447-Speed 2622.67 samples/sec Loss 6.8533 LearningRate 0.0267 Epoch: 9 Global Step: 400530 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:44:40,355-Speed 2621.21 samples/sec Loss 6.9403 LearningRate 0.0267 Epoch: 9 Global Step: 400540 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:44:44,253-Speed 2627.55 samples/sec Loss 6.8308 LearningRate 0.0267 Epoch: 9 Global Step: 400550 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:44:48,152-Speed 2627.91 samples/sec Loss 6.9336 LearningRate 0.0267 Epoch: 9 Global Step: 400560 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:44:52,051-Speed 2626.98 samples/sec Loss 6.8960 LearningRate 0.0267 Epoch: 9 Global Step: 400570 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:44:55,957-Speed 2622.49 samples/sec Loss 6.9953 LearningRate 0.0267 Epoch: 9 Global Step: 400580 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:44:59,855-Speed 2627.28 samples/sec Loss 6.9333 LearningRate 0.0267 Epoch: 9 Global Step: 400590 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:45:03,762-Speed 2621.85 samples/sec Loss 6.9507 LearningRate 0.0267 Epoch: 9 Global Step: 400600 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:45:07,660-Speed 2627.24 samples/sec Loss 7.0367 LearningRate 0.0267 Epoch: 9 Global Step: 400610 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:45:11,560-Speed 2626.57 samples/sec Loss 7.0318 LearningRate 0.0267 Epoch: 9 Global Step: 400620 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:45:15,432-Speed 2644.80 samples/sec Loss 6.9243 LearningRate 0.0267 Epoch: 9 Global Step: 400630 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:45:19,392-Speed 2586.97 samples/sec Loss 6.8823 LearningRate 0.0267 Epoch: 9 Global Step: 400640 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:45:23,478-Speed 2506.95 samples/sec Loss 6.9181 LearningRate 0.0267 Epoch: 9 Global Step: 400650 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:45:27,525-Speed 2530.13 samples/sec Loss 6.9042 LearningRate 0.0267 Epoch: 9 Global Step: 400660 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:45:31,533-Speed 2556.17 samples/sec Loss 6.9668 LearningRate 0.0267 Epoch: 9 Global Step: 400670 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:45:35,615-Speed 2509.02 samples/sec Loss 7.0523 LearningRate 0.0267 Epoch: 9 Global Step: 400680 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:45:39,692-Speed 2512.31 samples/sec Loss 6.8582 LearningRate 0.0267 Epoch: 9 Global Step: 400690 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:45:43,735-Speed 2533.43 samples/sec Loss 6.9211 LearningRate 0.0267 Epoch: 9 Global Step: 400700 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:45:47,632-Speed 2628.83 samples/sec Loss 6.8273 LearningRate 0.0267 Epoch: 9 Global Step: 400710 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:45:51,530-Speed 2627.90 samples/sec Loss 6.9293 LearningRate 0.0267 Epoch: 9 Global Step: 400720 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:45:55,412-Speed 2638.21 samples/sec Loss 6.9202 LearningRate 0.0267 Epoch: 9 Global Step: 400730 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:45:59,306-Speed 2630.55 samples/sec Loss 6.9753 LearningRate 0.0267 Epoch: 9 Global Step: 400740 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:46:03,206-Speed 2626.63 samples/sec Loss 6.9071 LearningRate 0.0267 Epoch: 9 Global Step: 400750 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:46:07,108-Speed 2624.93 samples/sec Loss 6.8984 LearningRate 0.0267 Epoch: 9 Global Step: 400760 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:46:11,008-Speed 2626.48 samples/sec Loss 6.8019 LearningRate 0.0267 Epoch: 9 Global Step: 400770 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:46:14,906-Speed 2627.53 samples/sec Loss 6.9847 LearningRate 0.0267 Epoch: 9 Global Step: 400780 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:46:18,788-Speed 2639.12 samples/sec Loss 6.9051 LearningRate 0.0267 Epoch: 9 Global Step: 400790 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:46:22,682-Speed 2630.11 samples/sec Loss 6.8483 LearningRate 0.0267 Epoch: 9 Global Step: 400800 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:46:26,586-Speed 2623.26 samples/sec Loss 6.9315 LearningRate 0.0267 Epoch: 9 Global Step: 400810 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:46:30,488-Speed 2625.16 samples/sec Loss 6.9935 LearningRate 0.0267 Epoch: 9 Global Step: 400820 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:46:34,387-Speed 2626.96 samples/sec Loss 6.8632 LearningRate 0.0267 Epoch: 9 Global Step: 400830 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:46:38,282-Speed 2629.52 samples/sec Loss 6.9387 LearningRate 0.0267 Epoch: 9 Global Step: 400840 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:46:42,206-Speed 2610.11 samples/sec Loss 6.8592 LearningRate 0.0267 Epoch: 9 Global Step: 400850 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:46:46,110-Speed 2624.15 samples/sec Loss 6.8991 LearningRate 0.0267 Epoch: 9 Global Step: 400860 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:46:50,007-Speed 2628.04 samples/sec Loss 6.8899 LearningRate 0.0267 Epoch: 9 Global Step: 400870 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:46:53,919-Speed 2618.55 samples/sec Loss 6.9610 LearningRate 0.0267 Epoch: 9 Global Step: 400880 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:46:57,815-Speed 2629.49 samples/sec Loss 6.8995 LearningRate 0.0267 Epoch: 9 Global Step: 400890 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:47:01,712-Speed 2627.67 samples/sec Loss 6.8563 LearningRate 0.0267 Epoch: 9 Global Step: 400900 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:47:05,631-Speed 2613.93 samples/sec Loss 6.9477 LearningRate 0.0267 Epoch: 9 Global Step: 400910 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:47:09,535-Speed 2623.43 samples/sec Loss 6.8335 LearningRate 0.0267 Epoch: 9 Global Step: 400920 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:47:13,435-Speed 2626.49 samples/sec Loss 6.8792 LearningRate 0.0267 Epoch: 9 Global Step: 400930 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:47:17,339-Speed 2623.44 samples/sec Loss 6.9503 LearningRate 0.0267 Epoch: 9 Global Step: 400940 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:47:21,237-Speed 2628.07 samples/sec Loss 6.9202 LearningRate 0.0267 Epoch: 9 Global Step: 400950 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:47:25,134-Speed 2627.84 samples/sec Loss 6.8287 LearningRate 0.0267 Epoch: 9 Global Step: 400960 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:47:29,047-Speed 2618.04 samples/sec Loss 6.8943 LearningRate 0.0267 Epoch: 9 Global Step: 400970 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:47:32,966-Speed 2612.92 samples/sec Loss 6.9344 LearningRate 0.0267 Epoch: 9 Global Step: 400980 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:47:36,880-Speed 2617.13 samples/sec Loss 7.0155 LearningRate 0.0267 Epoch: 9 Global Step: 400990 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:47:40,797-Speed 2614.66 samples/sec Loss 6.9785 LearningRate 0.0267 Epoch: 9 Global Step: 401000 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:47:44,697-Speed 2626.38 samples/sec Loss 6.8876 LearningRate 0.0267 Epoch: 9 Global Step: 401010 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:47:48,578-Speed 2639.11 samples/sec Loss 6.9535 LearningRate 0.0267 Epoch: 9 Global Step: 401020 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:47:52,546-Speed 2582.50 samples/sec Loss 6.9586 LearningRate 0.0267 Epoch: 9 Global Step: 401030 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:47:56,465-Speed 2613.26 samples/sec Loss 6.8466 LearningRate 0.0267 Epoch: 9 Global Step: 401040 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:48:00,366-Speed 2625.82 samples/sec Loss 6.8466 LearningRate 0.0267 Epoch: 9 Global Step: 401050 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:48:04,261-Speed 2629.78 samples/sec Loss 6.8895 LearningRate 0.0267 Epoch: 9 Global Step: 401060 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:48:08,166-Speed 2622.76 samples/sec Loss 6.8271 LearningRate 0.0267 Epoch: 9 Global Step: 401070 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:48:12,073-Speed 2621.41 samples/sec Loss 6.9235 LearningRate 0.0267 Epoch: 9 Global Step: 401080 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:48:15,968-Speed 2629.69 samples/sec Loss 6.8762 LearningRate 0.0267 Epoch: 9 Global Step: 401090 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:48:19,898-Speed 2606.52 samples/sec Loss 6.8298 LearningRate 0.0267 Epoch: 9 Global Step: 401100 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:48:23,802-Speed 2623.92 samples/sec Loss 6.9161 LearningRate 0.0267 Epoch: 9 Global Step: 401110 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:48:27,715-Speed 2617.90 samples/sec Loss 6.9066 LearningRate 0.0267 Epoch: 9 Global Step: 401120 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:48:31,613-Speed 2627.15 samples/sec Loss 6.9223 LearningRate 0.0267 Epoch: 9 Global Step: 401130 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:48:35,505-Speed 2632.10 samples/sec Loss 6.9384 LearningRate 0.0267 Epoch: 9 Global Step: 401140 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:48:39,408-Speed 2623.92 samples/sec Loss 6.8489 LearningRate 0.0267 Epoch: 9 Global Step: 401150 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:48:43,312-Speed 2624.11 samples/sec Loss 6.9417 LearningRate 0.0267 Epoch: 9 Global Step: 401160 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:48:47,215-Speed 2624.05 samples/sec Loss 6.8335 LearningRate 0.0267 Epoch: 9 Global Step: 401170 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:48:51,123-Speed 2620.89 samples/sec Loss 6.7922 LearningRate 0.0267 Epoch: 9 Global Step: 401180 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:48:55,018-Speed 2629.83 samples/sec Loss 6.9772 LearningRate 0.0267 Epoch: 9 Global Step: 401190 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:48:58,941-Speed 2610.92 samples/sec Loss 6.8586 LearningRate 0.0267 Epoch: 9 Global Step: 401200 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:49:02,850-Speed 2620.23 samples/sec Loss 6.9402 LearningRate 0.0267 Epoch: 9 Global Step: 401210 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:49:06,746-Speed 2628.83 samples/sec Loss 6.8979 LearningRate 0.0267 Epoch: 9 Global Step: 401220 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:49:10,618-Speed 2645.16 samples/sec Loss 6.9566 LearningRate 0.0267 Epoch: 9 Global Step: 401230 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:49:14,510-Speed 2632.09 samples/sec Loss 6.9062 LearningRate 0.0267 Epoch: 9 Global Step: 401240 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:49:18,410-Speed 2626.73 samples/sec Loss 6.9611 LearningRate 0.0267 Epoch: 9 Global Step: 401250 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:49:22,336-Speed 2609.25 samples/sec Loss 6.9372 LearningRate 0.0267 Epoch: 9 Global Step: 401260 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:49:26,239-Speed 2624.79 samples/sec Loss 6.9367 LearningRate 0.0267 Epoch: 9 Global Step: 401270 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:49:30,140-Speed 2625.98 samples/sec Loss 6.8207 LearningRate 0.0267 Epoch: 9 Global Step: 401280 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:49:34,040-Speed 2626.02 samples/sec Loss 6.9125 LearningRate 0.0267 Epoch: 9 Global Step: 401290 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:49:37,949-Speed 2620.52 samples/sec Loss 6.8823 LearningRate 0.0267 Epoch: 9 Global Step: 401300 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:49:41,869-Speed 2612.31 samples/sec Loss 7.0048 LearningRate 0.0267 Epoch: 9 Global Step: 401310 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:49:45,783-Speed 2617.27 samples/sec Loss 6.8990 LearningRate 0.0267 Epoch: 9 Global Step: 401320 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:49:49,685-Speed 2625.81 samples/sec Loss 7.0588 LearningRate 0.0266 Epoch: 9 Global Step: 401330 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:49:53,606-Speed 2612.05 samples/sec Loss 6.8462 LearningRate 0.0266 Epoch: 9 Global Step: 401340 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:49:57,525-Speed 2613.64 samples/sec Loss 6.8960 LearningRate 0.0266 Epoch: 9 Global Step: 401350 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:50:01,418-Speed 2631.23 samples/sec Loss 6.9281 LearningRate 0.0266 Epoch: 9 Global Step: 401360 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:50:05,321-Speed 2624.10 samples/sec Loss 6.9283 LearningRate 0.0266 Epoch: 9 Global Step: 401370 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:50:09,220-Speed 2626.84 samples/sec Loss 6.8685 LearningRate 0.0266 Epoch: 9 Global Step: 401380 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:50:13,153-Speed 2604.56 samples/sec Loss 6.9081 LearningRate 0.0266 Epoch: 9 Global Step: 401390 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:50:17,050-Speed 2628.00 samples/sec Loss 6.8599 LearningRate 0.0266 Epoch: 9 Global Step: 401400 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:50:20,972-Speed 2611.74 samples/sec Loss 6.8683 LearningRate 0.0266 Epoch: 9 Global Step: 401410 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:50:24,884-Speed 2618.28 samples/sec Loss 6.9392 LearningRate 0.0266 Epoch: 9 Global Step: 401420 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:50:28,781-Speed 2628.62 samples/sec Loss 6.9230 LearningRate 0.0266 Epoch: 9 Global Step: 401430 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:50:32,682-Speed 2625.27 samples/sec Loss 7.0972 LearningRate 0.0266 Epoch: 9 Global Step: 401440 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:50:36,601-Speed 2613.71 samples/sec Loss 6.9417 LearningRate 0.0266 Epoch: 9 Global Step: 401450 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:50:40,498-Speed 2628.24 samples/sec Loss 6.8159 LearningRate 0.0266 Epoch: 9 Global Step: 401460 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:50:44,392-Speed 2629.63 samples/sec Loss 6.9773 LearningRate 0.0266 Epoch: 9 Global Step: 401470 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:50:48,297-Speed 2623.64 samples/sec Loss 6.9928 LearningRate 0.0266 Epoch: 9 Global Step: 401480 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:50:52,196-Speed 2626.84 samples/sec Loss 6.9521 LearningRate 0.0266 Epoch: 9 Global Step: 401490 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:50:56,095-Speed 2627.73 samples/sec Loss 6.9394 LearningRate 0.0266 Epoch: 9 Global Step: 401500 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:50:59,999-Speed 2623.27 samples/sec Loss 6.8586 LearningRate 0.0266 Epoch: 9 Global Step: 401510 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:51:03,904-Speed 2623.10 samples/sec Loss 6.8704 LearningRate 0.0266 Epoch: 9 Global Step: 401520 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:51:07,807-Speed 2623.86 samples/sec Loss 7.0018 LearningRate 0.0266 Epoch: 9 Global Step: 401530 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:51:11,704-Speed 2628.23 samples/sec Loss 7.0541 LearningRate 0.0266 Epoch: 9 Global Step: 401540 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:51:15,603-Speed 2626.80 samples/sec Loss 6.9159 LearningRate 0.0266 Epoch: 9 Global Step: 401550 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:51:19,491-Speed 2635.00 samples/sec Loss 7.0230 LearningRate 0.0266 Epoch: 9 Global Step: 401560 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:51:23,384-Speed 2630.56 samples/sec Loss 7.0812 LearningRate 0.0266 Epoch: 9 Global Step: 401570 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:51:27,298-Speed 2617.56 samples/sec Loss 6.9074 LearningRate 0.0266 Epoch: 9 Global Step: 401580 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:51:31,200-Speed 2625.16 samples/sec Loss 6.8948 LearningRate 0.0266 Epoch: 9 Global Step: 401590 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:51:35,106-Speed 2622.55 samples/sec Loss 6.9450 LearningRate 0.0266 Epoch: 9 Global Step: 401600 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:51:39,019-Speed 2617.14 samples/sec Loss 7.0106 LearningRate 0.0266 Epoch: 9 Global Step: 401610 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:51:42,918-Speed 2626.79 samples/sec Loss 7.0127 LearningRate 0.0266 Epoch: 9 Global Step: 401620 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:51:46,820-Speed 2625.20 samples/sec Loss 6.9555 LearningRate 0.0266 Epoch: 9 Global Step: 401630 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:51:50,715-Speed 2629.45 samples/sec Loss 6.7868 LearningRate 0.0266 Epoch: 9 Global Step: 401640 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:51:54,616-Speed 2626.17 samples/sec Loss 6.8917 LearningRate 0.0266 Epoch: 9 Global Step: 401650 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:51:58,512-Speed 2629.04 samples/sec Loss 6.8956 LearningRate 0.0266 Epoch: 9 Global Step: 401660 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:52:02,386-Speed 2643.82 samples/sec Loss 6.8820 LearningRate 0.0266 Epoch: 9 Global Step: 401670 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:52:06,288-Speed 2624.94 samples/sec Loss 6.8492 LearningRate 0.0266 Epoch: 9 Global Step: 401680 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:52:10,186-Speed 2627.65 samples/sec Loss 6.7603 LearningRate 0.0266 Epoch: 9 Global Step: 401690 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:52:14,092-Speed 2622.10 samples/sec Loss 6.8381 LearningRate 0.0266 Epoch: 9 Global Step: 401700 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:52:17,997-Speed 2623.25 samples/sec Loss 7.0171 LearningRate 0.0266 Epoch: 9 Global Step: 401710 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:52:21,907-Speed 2619.12 samples/sec Loss 6.8025 LearningRate 0.0266 Epoch: 9 Global Step: 401720 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:52:25,808-Speed 2626.35 samples/sec Loss 6.7994 LearningRate 0.0266 Epoch: 9 Global Step: 401730 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:52:29,710-Speed 2625.17 samples/sec Loss 6.8699 LearningRate 0.0266 Epoch: 9 Global Step: 401740 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:52:33,650-Speed 2599.63 samples/sec Loss 6.8406 LearningRate 0.0266 Epoch: 9 Global Step: 401750 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:52:37,576-Speed 2608.32 samples/sec Loss 6.9798 LearningRate 0.0266 Epoch: 9 Global Step: 401760 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:52:41,474-Speed 2627.42 samples/sec Loss 6.9223 LearningRate 0.0266 Epoch: 9 Global Step: 401770 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:52:45,374-Speed 2626.77 samples/sec Loss 7.0205 LearningRate 0.0266 Epoch: 9 Global Step: 401780 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:52:49,271-Speed 2627.99 samples/sec Loss 6.9089 LearningRate 0.0266 Epoch: 9 Global Step: 401790 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:52:53,172-Speed 2625.75 samples/sec Loss 6.8771 LearningRate 0.0266 Epoch: 9 Global Step: 401800 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:52:57,068-Speed 2628.82 samples/sec Loss 6.8788 LearningRate 0.0266 Epoch: 9 Global Step: 401810 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:53:00,972-Speed 2623.79 samples/sec Loss 6.8884 LearningRate 0.0266 Epoch: 9 Global Step: 401820 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:53:04,868-Speed 2629.06 samples/sec Loss 6.9104 LearningRate 0.0266 Epoch: 9 Global Step: 401830 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:53:08,765-Speed 2628.34 samples/sec Loss 7.0106 LearningRate 0.0266 Epoch: 9 Global Step: 401840 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:53:12,664-Speed 2626.77 samples/sec Loss 6.9537 LearningRate 0.0266 Epoch: 9 Global Step: 401850 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:53:16,569-Speed 2622.94 samples/sec Loss 7.0084 LearningRate 0.0266 Epoch: 9 Global Step: 401860 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:53:20,468-Speed 2626.95 samples/sec Loss 6.9228 LearningRate 0.0266 Epoch: 9 Global Step: 401870 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:53:24,370-Speed 2624.98 samples/sec Loss 6.9699 LearningRate 0.0266 Epoch: 9 Global Step: 401880 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:53:28,262-Speed 2632.04 samples/sec Loss 6.8843 LearningRate 0.0266 Epoch: 9 Global Step: 401890 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:53:32,160-Speed 2627.64 samples/sec Loss 6.8629 LearningRate 0.0266 Epoch: 9 Global Step: 401900 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:53:36,074-Speed 2616.47 samples/sec Loss 6.9165 LearningRate 0.0266 Epoch: 9 Global Step: 401910 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:53:39,972-Speed 2627.97 samples/sec Loss 6.9695 LearningRate 0.0266 Epoch: 9 Global Step: 401920 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:53:43,864-Speed 2631.44 samples/sec Loss 6.8711 LearningRate 0.0266 Epoch: 9 Global Step: 401930 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:53:47,765-Speed 2625.67 samples/sec Loss 6.8798 LearningRate 0.0266 Epoch: 9 Global Step: 401940 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:53:51,661-Speed 2629.33 samples/sec Loss 6.9318 LearningRate 0.0266 Epoch: 9 Global Step: 401950 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:53:55,558-Speed 2628.56 samples/sec Loss 6.8504 LearningRate 0.0266 Epoch: 9 Global Step: 401960 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:53:59,462-Speed 2623.86 samples/sec Loss 6.8619 LearningRate 0.0266 Epoch: 9 Global Step: 401970 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:54:03,394-Speed 2604.54 samples/sec Loss 6.7394 LearningRate 0.0266 Epoch: 9 Global Step: 401980 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:54:07,274-Speed 2639.76 samples/sec Loss 6.8610 LearningRate 0.0266 Epoch: 9 Global Step: 401990 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:54:11,171-Speed 2628.54 samples/sec Loss 6.8953 LearningRate 0.0266 Epoch: 9 Global Step: 402000 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:54:15,064-Speed 2631.27 samples/sec Loss 6.8445 LearningRate 0.0266 Epoch: 9 Global Step: 402010 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:54:18,986-Speed 2611.12 samples/sec Loss 6.9181 LearningRate 0.0266 Epoch: 9 Global Step: 402020 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:54:22,884-Speed 2628.42 samples/sec Loss 6.7987 LearningRate 0.0266 Epoch: 9 Global Step: 402030 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:54:26,780-Speed 2628.90 samples/sec Loss 6.9104 LearningRate 0.0266 Epoch: 9 Global Step: 402040 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:54:30,689-Speed 2620.42 samples/sec Loss 6.7997 LearningRate 0.0266 Epoch: 9 Global Step: 402050 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:54:34,587-Speed 2627.47 samples/sec Loss 6.8387 LearningRate 0.0266 Epoch: 9 Global Step: 402060 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:54:38,486-Speed 2626.75 samples/sec Loss 6.8277 LearningRate 0.0266 Epoch: 9 Global Step: 402070 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:54:42,383-Speed 2628.65 samples/sec Loss 6.9323 LearningRate 0.0266 Epoch: 9 Global Step: 402080 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:54:46,295-Speed 2618.70 samples/sec Loss 6.9646 LearningRate 0.0266 Epoch: 9 Global Step: 402090 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:54:50,175-Speed 2639.57 samples/sec Loss 6.8257 LearningRate 0.0266 Epoch: 9 Global Step: 402100 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:54:54,049-Speed 2644.09 samples/sec Loss 6.8812 LearningRate 0.0266 Epoch: 9 Global Step: 402110 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:54:57,940-Speed 2632.56 samples/sec Loss 6.9249 LearningRate 0.0266 Epoch: 9 Global Step: 402120 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:55:01,843-Speed 2624.42 samples/sec Loss 6.7559 LearningRate 0.0265 Epoch: 9 Global Step: 402130 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:55:05,744-Speed 2625.25 samples/sec Loss 6.9672 LearningRate 0.0265 Epoch: 9 Global Step: 402140 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:55:09,637-Speed 2630.84 samples/sec Loss 6.8498 LearningRate 0.0265 Epoch: 9 Global Step: 402150 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:55:13,533-Speed 2629.25 samples/sec Loss 6.8759 LearningRate 0.0265 Epoch: 9 Global Step: 402160 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:55:17,429-Speed 2629.30 samples/sec Loss 6.9096 LearningRate 0.0265 Epoch: 9 Global Step: 402170 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:55:21,324-Speed 2629.82 samples/sec Loss 6.8156 LearningRate 0.0265 Epoch: 9 Global Step: 402180 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:55:25,258-Speed 2603.96 samples/sec Loss 6.9342 LearningRate 0.0265 Epoch: 9 Global Step: 402190 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:55:29,153-Speed 2629.45 samples/sec Loss 6.9214 LearningRate 0.0265 Epoch: 9 Global Step: 402200 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:55:33,058-Speed 2623.22 samples/sec Loss 6.9231 LearningRate 0.0265 Epoch: 9 Global Step: 402210 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:55:36,951-Speed 2630.80 samples/sec Loss 6.9396 LearningRate 0.0265 Epoch: 9 Global Step: 402220 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:55:40,855-Speed 2623.62 samples/sec Loss 6.8505 LearningRate 0.0265 Epoch: 9 Global Step: 402230 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:55:44,753-Speed 2627.43 samples/sec Loss 7.0593 LearningRate 0.0265 Epoch: 9 Global Step: 402240 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:55:48,651-Speed 2627.41 samples/sec Loss 6.7800 LearningRate 0.0265 Epoch: 9 Global Step: 402250 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:55:52,567-Speed 2616.71 samples/sec Loss 6.8133 LearningRate 0.0265 Epoch: 9 Global Step: 402260 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:55:56,462-Speed 2629.27 samples/sec Loss 6.9093 LearningRate 0.0265 Epoch: 9 Global Step: 402270 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:56:00,365-Speed 2624.62 samples/sec Loss 6.7682 LearningRate 0.0265 Epoch: 9 Global Step: 402280 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:56:04,263-Speed 2627.92 samples/sec Loss 6.9003 LearningRate 0.0265 Epoch: 9 Global Step: 402290 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:56:08,142-Speed 2640.36 samples/sec Loss 6.9094 LearningRate 0.0265 Epoch: 9 Global Step: 402300 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:56:12,040-Speed 2627.40 samples/sec Loss 6.7954 LearningRate 0.0265 Epoch: 9 Global Step: 402310 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:56:15,984-Speed 2597.54 samples/sec Loss 7.0255 LearningRate 0.0265 Epoch: 9 Global Step: 402320 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:56:19,882-Speed 2628.18 samples/sec Loss 6.8079 LearningRate 0.0265 Epoch: 9 Global Step: 402330 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:56:23,775-Speed 2630.32 samples/sec Loss 6.7628 LearningRate 0.0265 Epoch: 9 Global Step: 402340 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:56:27,682-Speed 2622.70 samples/sec Loss 6.9099 LearningRate 0.0265 Epoch: 9 Global Step: 402350 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:56:31,579-Speed 2628.07 samples/sec Loss 6.8234 LearningRate 0.0265 Epoch: 9 Global Step: 402360 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:56:35,486-Speed 2622.00 samples/sec Loss 6.8969 LearningRate 0.0265 Epoch: 9 Global Step: 402370 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:56:39,408-Speed 2611.54 samples/sec Loss 6.8127 LearningRate 0.0265 Epoch: 9 Global Step: 402380 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:56:43,305-Speed 2628.58 samples/sec Loss 6.9651 LearningRate 0.0265 Epoch: 9 Global Step: 402390 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:56:47,196-Speed 2631.78 samples/sec Loss 6.8430 LearningRate 0.0265 Epoch: 9 Global Step: 402400 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:56:51,094-Speed 2628.05 samples/sec Loss 6.8371 LearningRate 0.0265 Epoch: 9 Global Step: 402410 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:56:54,987-Speed 2630.96 samples/sec Loss 6.8313 LearningRate 0.0265 Epoch: 9 Global Step: 402420 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:56:58,883-Speed 2628.99 samples/sec Loss 6.8997 LearningRate 0.0265 Epoch: 9 Global Step: 402430 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:57:02,780-Speed 2628.60 samples/sec Loss 6.8620 LearningRate 0.0265 Epoch: 9 Global Step: 402440 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:57:06,674-Speed 2630.10 samples/sec Loss 6.9343 LearningRate 0.0265 Epoch: 9 Global Step: 402450 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:57:10,568-Speed 2630.56 samples/sec Loss 6.8978 LearningRate 0.0265 Epoch: 9 Global Step: 402460 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:57:14,466-Speed 2627.93 samples/sec Loss 6.9715 LearningRate 0.0265 Epoch: 9 Global Step: 402470 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:57:18,367-Speed 2625.37 samples/sec Loss 6.8756 LearningRate 0.0265 Epoch: 9 Global Step: 402480 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:57:22,282-Speed 2616.21 samples/sec Loss 6.8331 LearningRate 0.0265 Epoch: 9 Global Step: 402490 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:57:26,175-Speed 2631.07 samples/sec Loss 6.9198 LearningRate 0.0265 Epoch: 9 Global Step: 402500 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 16:57:30,059-Speed 2636.99 samples/sec Loss 6.7880 LearningRate 0.0265 Epoch: 9 Global Step: 402510 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:57:34,064-Speed 2558.01 samples/sec Loss 6.8967 LearningRate 0.0265 Epoch: 9 Global Step: 402520 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:57:37,961-Speed 2628.12 samples/sec Loss 6.8738 LearningRate 0.0265 Epoch: 9 Global Step: 402530 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:57:41,873-Speed 2618.11 samples/sec Loss 6.8208 LearningRate 0.0265 Epoch: 9 Global Step: 402540 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:57:45,768-Speed 2629.87 samples/sec Loss 6.8672 LearningRate 0.0265 Epoch: 9 Global Step: 402550 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:57:49,705-Speed 2601.53 samples/sec Loss 6.9308 LearningRate 0.0265 Epoch: 9 Global Step: 402560 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:57:53,601-Speed 2629.43 samples/sec Loss 6.9229 LearningRate 0.0265 Epoch: 9 Global Step: 402570 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:57:57,496-Speed 2629.74 samples/sec Loss 6.8791 LearningRate 0.0265 Epoch: 9 Global Step: 402580 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:58:01,398-Speed 2624.95 samples/sec Loss 6.9373 LearningRate 0.0265 Epoch: 9 Global Step: 402590 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:58:05,362-Speed 2584.11 samples/sec Loss 6.8750 LearningRate 0.0265 Epoch: 9 Global Step: 402600 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:58:09,269-Speed 2621.30 samples/sec Loss 6.8470 LearningRate 0.0265 Epoch: 9 Global Step: 402610 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:58:13,166-Speed 2628.23 samples/sec Loss 6.8397 LearningRate 0.0265 Epoch: 9 Global Step: 402620 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:58:17,062-Speed 2628.94 samples/sec Loss 6.9422 LearningRate 0.0265 Epoch: 9 Global Step: 402630 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:58:20,959-Speed 2628.64 samples/sec Loss 6.9705 LearningRate 0.0265 Epoch: 9 Global Step: 402640 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:58:24,861-Speed 2625.15 samples/sec Loss 6.8546 LearningRate 0.0265 Epoch: 9 Global Step: 402650 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:58:28,758-Speed 2628.20 samples/sec Loss 7.0045 LearningRate 0.0265 Epoch: 9 Global Step: 402660 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:58:32,675-Speed 2614.82 samples/sec Loss 6.9442 LearningRate 0.0265 Epoch: 9 Global Step: 402670 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:58:36,587-Speed 2618.58 samples/sec Loss 6.8783 LearningRate 0.0265 Epoch: 9 Global Step: 402680 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:58:40,480-Speed 2631.33 samples/sec Loss 6.8566 LearningRate 0.0265 Epoch: 9 Global Step: 402690 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:58:44,378-Speed 2627.54 samples/sec Loss 7.0067 LearningRate 0.0265 Epoch: 9 Global Step: 402700 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:58:48,281-Speed 2623.99 samples/sec Loss 6.9433 LearningRate 0.0265 Epoch: 9 Global Step: 402710 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:58:52,180-Speed 2626.68 samples/sec Loss 6.8934 LearningRate 0.0265 Epoch: 9 Global Step: 402720 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:58:56,080-Speed 2626.33 samples/sec Loss 6.8355 LearningRate 0.0265 Epoch: 9 Global Step: 402730 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:58:59,987-Speed 2621.91 samples/sec Loss 6.9743 LearningRate 0.0265 Epoch: 9 Global Step: 402740 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:59:03,863-Speed 2642.12 samples/sec Loss 6.7845 LearningRate 0.0265 Epoch: 9 Global Step: 402750 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:59:07,768-Speed 2623.12 samples/sec Loss 6.8761 LearningRate 0.0265 Epoch: 9 Global Step: 402760 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:59:11,670-Speed 2625.27 samples/sec Loss 6.9322 LearningRate 0.0265 Epoch: 9 Global Step: 402770 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:59:15,569-Speed 2627.20 samples/sec Loss 6.8808 LearningRate 0.0265 Epoch: 9 Global Step: 402780 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:59:19,466-Speed 2627.96 samples/sec Loss 6.7978 LearningRate 0.0265 Epoch: 9 Global Step: 402790 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:59:23,410-Speed 2596.88 samples/sec Loss 6.8617 LearningRate 0.0265 Epoch: 9 Global Step: 402800 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:59:27,311-Speed 2625.96 samples/sec Loss 6.8752 LearningRate 0.0265 Epoch: 9 Global Step: 402810 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:59:31,218-Speed 2621.77 samples/sec Loss 6.8820 LearningRate 0.0265 Epoch: 9 Global Step: 402820 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:59:35,134-Speed 2615.66 samples/sec Loss 6.8537 LearningRate 0.0265 Epoch: 9 Global Step: 402830 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:59:39,038-Speed 2623.63 samples/sec Loss 6.8474 LearningRate 0.0265 Epoch: 9 Global Step: 402840 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 16:59:42,950-Speed 2618.54 samples/sec Loss 6.8556 LearningRate 0.0265 Epoch: 9 Global Step: 402850 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:59:46,855-Speed 2622.71 samples/sec Loss 6.7935 LearningRate 0.0265 Epoch: 9 Global Step: 402860 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:59:50,763-Speed 2621.60 samples/sec Loss 6.8167 LearningRate 0.0265 Epoch: 9 Global Step: 402870 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:59:54,667-Speed 2623.48 samples/sec Loss 6.8561 LearningRate 0.0265 Epoch: 9 Global Step: 402880 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 16:59:58,545-Speed 2641.68 samples/sec Loss 6.8265 LearningRate 0.0265 Epoch: 9 Global Step: 402890 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:00:02,457-Speed 2617.88 samples/sec Loss 6.8742 LearningRate 0.0265 Epoch: 9 Global Step: 402900 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:00:06,360-Speed 2624.51 samples/sec Loss 6.9103 LearningRate 0.0265 Epoch: 9 Global Step: 402910 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:00:10,279-Speed 2613.29 samples/sec Loss 6.8309 LearningRate 0.0265 Epoch: 9 Global Step: 402920 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:00:14,178-Speed 2627.80 samples/sec Loss 6.8032 LearningRate 0.0265 Epoch: 9 Global Step: 402930 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:00:18,089-Speed 2618.31 samples/sec Loss 6.8997 LearningRate 0.0264 Epoch: 9 Global Step: 402940 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:00:21,984-Speed 2630.06 samples/sec Loss 6.9920 LearningRate 0.0264 Epoch: 9 Global Step: 402950 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:00:25,883-Speed 2627.05 samples/sec Loss 6.9101 LearningRate 0.0264 Epoch: 9 Global Step: 402960 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:00:29,781-Speed 2628.08 samples/sec Loss 6.8726 LearningRate 0.0264 Epoch: 9 Global Step: 402970 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:00:33,680-Speed 2626.78 samples/sec Loss 6.8863 LearningRate 0.0264 Epoch: 9 Global Step: 402980 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:00:37,581-Speed 2625.15 samples/sec Loss 6.8618 LearningRate 0.0264 Epoch: 9 Global Step: 402990 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:00:41,490-Speed 2620.33 samples/sec Loss 6.8348 LearningRate 0.0264 Epoch: 9 Global Step: 403000 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:00:45,400-Speed 2619.58 samples/sec Loss 6.9072 LearningRate 0.0264 Epoch: 9 Global Step: 403010 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:00:49,296-Speed 2629.08 samples/sec Loss 6.9377 LearningRate 0.0264 Epoch: 9 Global Step: 403020 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:00:53,194-Speed 2627.96 samples/sec Loss 6.8728 LearningRate 0.0264 Epoch: 9 Global Step: 403030 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:00:57,101-Speed 2621.75 samples/sec Loss 6.9616 LearningRate 0.0264 Epoch: 9 Global Step: 403040 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:01:01,025-Speed 2610.41 samples/sec Loss 6.8095 LearningRate 0.0264 Epoch: 9 Global Step: 403050 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:01:05,034-Speed 2554.46 samples/sec Loss 6.9043 LearningRate 0.0264 Epoch: 9 Global Step: 403060 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:01:08,940-Speed 2622.23 samples/sec Loss 6.8031 LearningRate 0.0264 Epoch: 9 Global Step: 403070 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:01:12,840-Speed 2626.28 samples/sec Loss 6.7906 LearningRate 0.0264 Epoch: 9 Global Step: 403080 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:01:16,727-Speed 2635.69 samples/sec Loss 6.8624 LearningRate 0.0264 Epoch: 9 Global Step: 403090 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:01:20,634-Speed 2621.93 samples/sec Loss 6.8368 LearningRate 0.0264 Epoch: 9 Global Step: 403100 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:01:24,560-Speed 2608.48 samples/sec Loss 6.8458 LearningRate 0.0264 Epoch: 9 Global Step: 403110 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:01:28,431-Speed 2646.87 samples/sec Loss 6.8750 LearningRate 0.0264 Epoch: 9 Global Step: 403120 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:01:32,327-Speed 2628.91 samples/sec Loss 7.0447 LearningRate 0.0264 Epoch: 9 Global Step: 403130 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:01:36,225-Speed 2627.07 samples/sec Loss 6.8263 LearningRate 0.0264 Epoch: 9 Global Step: 403140 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:01:40,123-Speed 2627.47 samples/sec Loss 6.8940 LearningRate 0.0264 Epoch: 9 Global Step: 403150 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:01:44,018-Speed 2629.83 samples/sec Loss 6.8275 LearningRate 0.0264 Epoch: 9 Global Step: 403160 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:01:47,919-Speed 2625.87 samples/sec Loss 6.8906 LearningRate 0.0264 Epoch: 9 Global Step: 403170 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:01:51,819-Speed 2626.22 samples/sec Loss 6.8025 LearningRate 0.0264 Epoch: 9 Global Step: 403180 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:01:55,717-Speed 2627.29 samples/sec Loss 6.9082 LearningRate 0.0264 Epoch: 9 Global Step: 403190 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:01:59,616-Speed 2627.70 samples/sec Loss 6.8642 LearningRate 0.0264 Epoch: 9 Global Step: 403200 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:02:03,516-Speed 2626.48 samples/sec Loss 6.8912 LearningRate 0.0264 Epoch: 9 Global Step: 403210 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:02:07,412-Speed 2628.57 samples/sec Loss 6.9884 LearningRate 0.0264 Epoch: 9 Global Step: 403220 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:02:11,448-Speed 2537.89 samples/sec Loss 7.0667 LearningRate 0.0264 Epoch: 9 Global Step: 403230 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:02:15,522-Speed 2513.96 samples/sec Loss 6.9347 LearningRate 0.0264 Epoch: 9 Global Step: 403240 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:02:19,434-Speed 2618.26 samples/sec Loss 6.7539 LearningRate 0.0264 Epoch: 9 Global Step: 403250 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:02:23,344-Speed 2619.54 samples/sec Loss 6.9790 LearningRate 0.0264 Epoch: 9 Global Step: 403260 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:02:27,244-Speed 2626.92 samples/sec Loss 6.9372 LearningRate 0.0264 Epoch: 9 Global Step: 403270 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:02:31,146-Speed 2624.89 samples/sec Loss 6.9681 LearningRate 0.0264 Epoch: 9 Global Step: 403280 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:02:35,045-Speed 2626.94 samples/sec Loss 6.7988 LearningRate 0.0264 Epoch: 9 Global Step: 403290 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:02:38,944-Speed 2626.84 samples/sec Loss 6.9112 LearningRate 0.0264 Epoch: 9 Global Step: 403300 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:02:42,841-Speed 2627.78 samples/sec Loss 6.9535 LearningRate 0.0264 Epoch: 9 Global Step: 403310 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:02:46,744-Speed 2624.65 samples/sec Loss 6.8494 LearningRate 0.0264 Epoch: 9 Global Step: 403320 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 17:02:50,617-Speed 2645.06 samples/sec Loss 6.8202 LearningRate 0.0264 Epoch: 9 Global Step: 403330 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:02:54,517-Speed 2626.00 samples/sec Loss 6.9602 LearningRate 0.0264 Epoch: 9 Global Step: 403340 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:02:58,414-Speed 2628.47 samples/sec Loss 6.7707 LearningRate 0.0264 Epoch: 9 Global Step: 403350 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:03:02,312-Speed 2627.08 samples/sec Loss 6.9274 LearningRate 0.0264 Epoch: 9 Global Step: 403360 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:03:06,211-Speed 2627.45 samples/sec Loss 6.9264 LearningRate 0.0264 Epoch: 9 Global Step: 403370 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:03:10,106-Speed 2629.51 samples/sec Loss 6.8894 LearningRate 0.0264 Epoch: 9 Global Step: 403380 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:03:14,003-Speed 2628.21 samples/sec Loss 6.8541 LearningRate 0.0264 Epoch: 9 Global Step: 403390 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:03:17,904-Speed 2625.72 samples/sec Loss 6.9169 LearningRate 0.0264 Epoch: 9 Global Step: 403400 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:03:21,813-Speed 2620.20 samples/sec Loss 6.9173 LearningRate 0.0264 Epoch: 9 Global Step: 403410 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:03:25,710-Speed 2628.29 samples/sec Loss 6.8182 LearningRate 0.0264 Epoch: 9 Global Step: 403420 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:03:29,609-Speed 2627.30 samples/sec Loss 6.7796 LearningRate 0.0264 Epoch: 9 Global Step: 403430 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:03:33,507-Speed 2627.63 samples/sec Loss 6.7867 LearningRate 0.0264 Epoch: 9 Global Step: 403440 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:03:37,408-Speed 2625.41 samples/sec Loss 6.7806 LearningRate 0.0264 Epoch: 9 Global Step: 403450 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:03:41,303-Speed 2629.65 samples/sec Loss 6.8781 LearningRate 0.0264 Epoch: 9 Global Step: 403460 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:03:45,210-Speed 2621.00 samples/sec Loss 6.9026 LearningRate 0.0264 Epoch: 9 Global Step: 403470 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:03:49,106-Speed 2629.54 samples/sec Loss 6.9526 LearningRate 0.0264 Epoch: 9 Global Step: 403480 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:03:53,002-Speed 2629.12 samples/sec Loss 6.8666 LearningRate 0.0264 Epoch: 9 Global Step: 403490 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:03:56,899-Speed 2628.65 samples/sec Loss 6.8480 LearningRate 0.0264 Epoch: 9 Global Step: 403500 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:04:00,798-Speed 2626.68 samples/sec Loss 6.8707 LearningRate 0.0264 Epoch: 9 Global Step: 403510 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:04:04,758-Speed 2586.73 samples/sec Loss 6.9490 LearningRate 0.0264 Epoch: 9 Global Step: 403520 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:04:08,650-Speed 2631.69 samples/sec Loss 6.7507 LearningRate 0.0264 Epoch: 9 Global Step: 403530 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:04:12,556-Speed 2621.90 samples/sec Loss 6.9809 LearningRate 0.0264 Epoch: 9 Global Step: 403540 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:04:16,449-Speed 2631.13 samples/sec Loss 6.8572 LearningRate 0.0264 Epoch: 9 Global Step: 403550 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:04:20,365-Speed 2615.89 samples/sec Loss 6.9890 LearningRate 0.0264 Epoch: 9 Global Step: 403560 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:04:24,265-Speed 2626.12 samples/sec Loss 6.8360 LearningRate 0.0264 Epoch: 9 Global Step: 403570 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:04:28,147-Speed 2638.43 samples/sec Loss 6.9315 LearningRate 0.0264 Epoch: 9 Global Step: 403580 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:04:32,081-Speed 2603.62 samples/sec Loss 6.8964 LearningRate 0.0264 Epoch: 9 Global Step: 403590 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:04:35,978-Speed 2628.92 samples/sec Loss 6.8899 LearningRate 0.0264 Epoch: 9 Global Step: 403600 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:04:39,879-Speed 2625.50 samples/sec Loss 6.8553 LearningRate 0.0264 Epoch: 9 Global Step: 403610 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:04:43,774-Speed 2629.27 samples/sec Loss 6.8740 LearningRate 0.0264 Epoch: 9 Global Step: 403620 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:04:47,675-Speed 2626.14 samples/sec Loss 6.8395 LearningRate 0.0264 Epoch: 9 Global Step: 403630 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:04:51,574-Speed 2626.80 samples/sec Loss 6.9077 LearningRate 0.0264 Epoch: 9 Global Step: 403640 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:04:55,468-Speed 2630.78 samples/sec Loss 6.8466 LearningRate 0.0264 Epoch: 9 Global Step: 403650 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:04:59,366-Speed 2627.32 samples/sec Loss 6.8389 LearningRate 0.0264 Epoch: 9 Global Step: 403660 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:05:03,258-Speed 2631.44 samples/sec Loss 6.9522 LearningRate 0.0264 Epoch: 9 Global Step: 403670 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:05:07,154-Speed 2628.97 samples/sec Loss 6.9260 LearningRate 0.0264 Epoch: 9 Global Step: 403680 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 17:05:11,122-Speed 2581.79 samples/sec Loss 6.7139 LearningRate 0.0264 Epoch: 9 Global Step: 403690 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:05:15,047-Speed 2609.87 samples/sec Loss 6.8303 LearningRate 0.0264 Epoch: 9 Global Step: 403700 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:05:19,046-Speed 2560.99 samples/sec Loss 6.8935 LearningRate 0.0264 Epoch: 9 Global Step: 403710 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:05:22,943-Speed 2628.32 samples/sec Loss 6.8384 LearningRate 0.0264 Epoch: 9 Global Step: 403720 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:05:26,842-Speed 2627.61 samples/sec Loss 6.8663 LearningRate 0.0264 Epoch: 9 Global Step: 403730 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:05:30,722-Speed 2639.60 samples/sec Loss 6.8644 LearningRate 0.0263 Epoch: 9 Global Step: 403740 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:05:34,622-Speed 2625.96 samples/sec Loss 6.8726 LearningRate 0.0263 Epoch: 9 Global Step: 403750 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:05:38,527-Speed 2622.95 samples/sec Loss 6.9161 LearningRate 0.0263 Epoch: 9 Global Step: 403760 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:05:42,431-Speed 2623.18 samples/sec Loss 6.8022 LearningRate 0.0263 Epoch: 9 Global Step: 403770 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:05:46,338-Speed 2621.98 samples/sec Loss 6.8755 LearningRate 0.0263 Epoch: 9 Global Step: 403780 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:05:50,233-Speed 2629.53 samples/sec Loss 6.8810 LearningRate 0.0263 Epoch: 9 Global Step: 403790 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:05:54,128-Speed 2629.84 samples/sec Loss 6.8622 LearningRate 0.0263 Epoch: 9 Global Step: 403800 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:05:58,028-Speed 2626.47 samples/sec Loss 6.9276 LearningRate 0.0263 Epoch: 9 Global Step: 403810 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:06:01,936-Speed 2620.66 samples/sec Loss 6.8656 LearningRate 0.0263 Epoch: 9 Global Step: 403820 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:06:05,831-Speed 2629.40 samples/sec Loss 6.7369 LearningRate 0.0263 Epoch: 9 Global Step: 403830 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:06:09,729-Speed 2627.46 samples/sec Loss 6.8440 LearningRate 0.0263 Epoch: 9 Global Step: 403840 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:06:13,627-Speed 2627.54 samples/sec Loss 6.9481 LearningRate 0.0263 Epoch: 9 Global Step: 403850 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:06:17,526-Speed 2627.30 samples/sec Loss 6.9248 LearningRate 0.0263 Epoch: 9 Global Step: 403860 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:06:21,446-Speed 2613.00 samples/sec Loss 6.9052 LearningRate 0.0263 Epoch: 9 Global Step: 403870 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:06:25,344-Speed 2628.23 samples/sec Loss 6.9588 LearningRate 0.0263 Epoch: 9 Global Step: 403880 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:06:29,244-Speed 2626.36 samples/sec Loss 6.8928 LearningRate 0.0263 Epoch: 9 Global Step: 403890 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:06:33,154-Speed 2619.46 samples/sec Loss 6.8906 LearningRate 0.0263 Epoch: 9 Global Step: 403900 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:06:37,057-Speed 2624.20 samples/sec Loss 6.7542 LearningRate 0.0263 Epoch: 9 Global Step: 403910 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:06:40,957-Speed 2626.43 samples/sec Loss 6.8258 LearningRate 0.0263 Epoch: 9 Global Step: 403920 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:06:44,855-Speed 2627.68 samples/sec Loss 6.8201 LearningRate 0.0263 Epoch: 9 Global Step: 403930 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:06:48,758-Speed 2624.58 samples/sec Loss 6.9351 LearningRate 0.0263 Epoch: 9 Global Step: 403940 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 17:06:52,656-Speed 2628.55 samples/sec Loss 6.8549 LearningRate 0.0263 Epoch: 9 Global Step: 403950 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 17:06:56,540-Speed 2636.55 samples/sec Loss 6.8714 LearningRate 0.0263 Epoch: 9 Global Step: 403960 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:07:00,445-Speed 2623.58 samples/sec Loss 6.9789 LearningRate 0.0263 Epoch: 9 Global Step: 403970 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:07:04,351-Speed 2621.94 samples/sec Loss 6.8440 LearningRate 0.0263 Epoch: 9 Global Step: 403980 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:07:08,251-Speed 2626.57 samples/sec Loss 6.7572 LearningRate 0.0263 Epoch: 9 Global Step: 403990 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:07:12,154-Speed 2624.19 samples/sec Loss 6.8422 LearningRate 0.0263 Epoch: 9 Global Step: 404000 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:07:16,040-Speed 2635.70 samples/sec Loss 6.8129 LearningRate 0.0263 Epoch: 9 Global Step: 404010 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:07:19,950-Speed 2619.77 samples/sec Loss 6.9243 LearningRate 0.0263 Epoch: 9 Global Step: 404020 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:07:23,850-Speed 2627.03 samples/sec Loss 6.8508 LearningRate 0.0263 Epoch: 9 Global Step: 404030 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:07:27,738-Speed 2634.14 samples/sec Loss 6.7770 LearningRate 0.0263 Epoch: 9 Global Step: 404040 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:07:31,642-Speed 2623.98 samples/sec Loss 6.7914 LearningRate 0.0263 Epoch: 9 Global Step: 404050 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:07:35,538-Speed 2628.35 samples/sec Loss 6.8287 LearningRate 0.0263 Epoch: 9 Global Step: 404060 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:07:39,439-Speed 2626.05 samples/sec Loss 6.8158 LearningRate 0.0263 Epoch: 9 Global Step: 404070 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:07:43,336-Speed 2628.08 samples/sec Loss 6.7710 LearningRate 0.0263 Epoch: 9 Global Step: 404080 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:07:47,240-Speed 2624.03 samples/sec Loss 6.8784 LearningRate 0.0263 Epoch: 9 Global Step: 404090 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:07:51,139-Speed 2626.65 samples/sec Loss 6.8080 LearningRate 0.0263 Epoch: 9 Global Step: 404100 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:07:55,046-Speed 2621.42 samples/sec Loss 6.8630 LearningRate 0.0263 Epoch: 9 Global Step: 404110 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:07:58,949-Speed 2624.65 samples/sec Loss 6.7967 LearningRate 0.0263 Epoch: 9 Global Step: 404120 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:08:02,851-Speed 2624.91 samples/sec Loss 6.8560 LearningRate 0.0263 Epoch: 9 Global Step: 404130 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:08:06,760-Speed 2620.03 samples/sec Loss 6.8530 LearningRate 0.0263 Epoch: 9 Global Step: 404140 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:08:10,655-Speed 2629.65 samples/sec Loss 6.7337 LearningRate 0.0263 Epoch: 9 Global Step: 404150 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:08:14,606-Speed 2593.05 samples/sec Loss 6.8676 LearningRate 0.0263 Epoch: 9 Global Step: 404160 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:08:18,474-Speed 2648.17 samples/sec Loss 6.8815 LearningRate 0.0263 Epoch: 9 Global Step: 404170 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:08:22,370-Speed 2629.12 samples/sec Loss 6.8173 LearningRate 0.0263 Epoch: 9 Global Step: 404180 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:08:26,266-Speed 2629.29 samples/sec Loss 6.9371 LearningRate 0.0263 Epoch: 9 Global Step: 404190 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:08:30,173-Speed 2621.71 samples/sec Loss 6.8785 LearningRate 0.0263 Epoch: 9 Global Step: 404200 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:08:34,094-Speed 2612.51 samples/sec Loss 6.8319 LearningRate 0.0263 Epoch: 9 Global Step: 404210 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:08:37,985-Speed 2632.48 samples/sec Loss 6.8854 LearningRate 0.0263 Epoch: 9 Global Step: 404220 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:08:41,889-Speed 2623.05 samples/sec Loss 7.0140 LearningRate 0.0263 Epoch: 9 Global Step: 404230 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:08:45,786-Speed 2628.65 samples/sec Loss 6.8271 LearningRate 0.0263 Epoch: 9 Global Step: 404240 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:08:49,680-Speed 2630.38 samples/sec Loss 6.8112 LearningRate 0.0263 Epoch: 9 Global Step: 404250 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:08:53,581-Speed 2625.75 samples/sec Loss 6.9461 LearningRate 0.0263 Epoch: 9 Global Step: 404260 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:08:57,475-Speed 2630.26 samples/sec Loss 6.8970 LearningRate 0.0263 Epoch: 9 Global Step: 404270 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:09:01,388-Speed 2617.81 samples/sec Loss 6.7254 LearningRate 0.0263 Epoch: 9 Global Step: 404280 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:09:05,301-Speed 2617.99 samples/sec Loss 6.7903 LearningRate 0.0263 Epoch: 9 Global Step: 404290 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:09:09,198-Speed 2627.68 samples/sec Loss 6.9243 LearningRate 0.0263 Epoch: 9 Global Step: 404300 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:09:13,094-Speed 2628.90 samples/sec Loss 6.7431 LearningRate 0.0263 Epoch: 9 Global Step: 404310 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:09:16,993-Speed 2627.16 samples/sec Loss 6.8841 LearningRate 0.0263 Epoch: 9 Global Step: 404320 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:09:20,894-Speed 2626.09 samples/sec Loss 6.9538 LearningRate 0.0263 Epoch: 9 Global Step: 404330 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:09:24,792-Speed 2627.17 samples/sec Loss 6.8671 LearningRate 0.0263 Epoch: 9 Global Step: 404340 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:09:28,680-Speed 2634.91 samples/sec Loss 6.8252 LearningRate 0.0263 Epoch: 9 Global Step: 404350 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:09:32,573-Speed 2630.49 samples/sec Loss 6.9866 LearningRate 0.0263 Epoch: 9 Global Step: 404360 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:09:36,473-Speed 2627.05 samples/sec Loss 7.0046 LearningRate 0.0263 Epoch: 9 Global Step: 404370 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:09:40,377-Speed 2623.48 samples/sec Loss 6.8468 LearningRate 0.0263 Epoch: 9 Global Step: 404380 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:09:44,296-Speed 2613.00 samples/sec Loss 6.9777 LearningRate 0.0263 Epoch: 9 Global Step: 404390 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:09:48,218-Speed 2611.42 samples/sec Loss 6.9190 LearningRate 0.0263 Epoch: 9 Global Step: 404400 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:09:52,129-Speed 2619.82 samples/sec Loss 6.7674 LearningRate 0.0263 Epoch: 9 Global Step: 404410 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:09:56,050-Speed 2612.28 samples/sec Loss 6.8213 LearningRate 0.0263 Epoch: 9 Global Step: 404420 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:09:59,973-Speed 2610.82 samples/sec Loss 6.8945 LearningRate 0.0263 Epoch: 9 Global Step: 404430 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:10:03,876-Speed 2624.45 samples/sec Loss 6.9117 LearningRate 0.0263 Epoch: 9 Global Step: 404440 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:10:07,784-Speed 2620.83 samples/sec Loss 6.8010 LearningRate 0.0263 Epoch: 9 Global Step: 404450 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:10:11,708-Speed 2610.42 samples/sec Loss 6.8129 LearningRate 0.0263 Epoch: 9 Global Step: 404460 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:10:15,621-Speed 2617.75 samples/sec Loss 6.7700 LearningRate 0.0263 Epoch: 9 Global Step: 404470 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:10:19,523-Speed 2624.50 samples/sec Loss 6.8817 LearningRate 0.0263 Epoch: 9 Global Step: 404480 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:10:23,418-Speed 2629.86 samples/sec Loss 6.6957 LearningRate 0.0263 Epoch: 9 Global Step: 404490 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:10:27,329-Speed 2619.56 samples/sec Loss 6.9000 LearningRate 0.0263 Epoch: 9 Global Step: 404500 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:10:31,232-Speed 2623.78 samples/sec Loss 6.9284 LearningRate 0.0263 Epoch: 9 Global Step: 404510 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:10:35,222-Speed 2567.22 samples/sec Loss 6.8376 LearningRate 0.0263 Epoch: 9 Global Step: 404520 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:10:39,119-Speed 2628.24 samples/sec Loss 6.9872 LearningRate 0.0263 Epoch: 9 Global Step: 404530 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:10:43,018-Speed 2627.24 samples/sec Loss 6.7994 LearningRate 0.0263 Epoch: 9 Global Step: 404540 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:10:46,916-Speed 2627.73 samples/sec Loss 6.9292 LearningRate 0.0262 Epoch: 9 Global Step: 404550 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 17:10:50,809-Speed 2630.91 samples/sec Loss 6.8524 LearningRate 0.0262 Epoch: 9 Global Step: 404560 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 17:10:54,683-Speed 2644.11 samples/sec Loss 6.8986 LearningRate 0.0262 Epoch: 9 Global Step: 404570 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:10:58,577-Speed 2630.40 samples/sec Loss 6.7863 LearningRate 0.0262 Epoch: 9 Global Step: 404580 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:11:02,482-Speed 2622.20 samples/sec Loss 6.8071 LearningRate 0.0262 Epoch: 9 Global Step: 404590 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:11:06,377-Speed 2629.46 samples/sec Loss 6.9177 LearningRate 0.0262 Epoch: 9 Global Step: 404600 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:11:10,278-Speed 2626.37 samples/sec Loss 6.7940 LearningRate 0.0262 Epoch: 9 Global Step: 404610 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:11:14,224-Speed 2595.51 samples/sec Loss 6.8021 LearningRate 0.0262 Epoch: 9 Global Step: 404620 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:11:18,123-Speed 2627.38 samples/sec Loss 6.9395 LearningRate 0.0262 Epoch: 9 Global Step: 404630 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:11:22,036-Speed 2617.92 samples/sec Loss 6.8219 LearningRate 0.0262 Epoch: 9 Global Step: 404640 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:11:25,930-Speed 2630.03 samples/sec Loss 6.8757 LearningRate 0.0262 Epoch: 9 Global Step: 404650 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:11:29,831-Speed 2626.40 samples/sec Loss 6.9319 LearningRate 0.0262 Epoch: 9 Global Step: 404660 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:11:33,724-Speed 2630.54 samples/sec Loss 6.7930 LearningRate 0.0262 Epoch: 9 Global Step: 404670 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 17:11:37,607-Speed 2637.71 samples/sec Loss 6.7937 LearningRate 0.0262 Epoch: 9 Global Step: 404680 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:11:41,504-Speed 2628.02 samples/sec Loss 6.8581 LearningRate 0.0262 Epoch: 9 Global Step: 404690 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:11:45,389-Speed 2636.75 samples/sec Loss 7.0308 LearningRate 0.0262 Epoch: 9 Global Step: 404700 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:11:49,304-Speed 2616.51 samples/sec Loss 6.7442 LearningRate 0.0262 Epoch: 9 Global Step: 404710 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:11:53,209-Speed 2623.38 samples/sec Loss 6.8578 LearningRate 0.0262 Epoch: 9 Global Step: 404720 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:11:57,132-Speed 2610.79 samples/sec Loss 6.9268 LearningRate 0.0262 Epoch: 9 Global Step: 404730 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:12:01,134-Speed 2559.15 samples/sec Loss 6.9769 LearningRate 0.0262 Epoch: 9 Global Step: 404740 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:12:05,033-Speed 2627.12 samples/sec Loss 6.9335 LearningRate 0.0262 Epoch: 9 Global Step: 404750 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:12:08,934-Speed 2625.45 samples/sec Loss 6.8983 LearningRate 0.0262 Epoch: 9 Global Step: 404760 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:12:12,831-Speed 2627.92 samples/sec Loss 6.8393 LearningRate 0.0262 Epoch: 9 Global Step: 404770 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:12:16,728-Speed 2628.58 samples/sec Loss 6.7915 LearningRate 0.0262 Epoch: 9 Global Step: 404780 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:12:20,623-Speed 2630.14 samples/sec Loss 6.7294 LearningRate 0.0262 Epoch: 9 Global Step: 404790 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:12:24,523-Speed 2626.51 samples/sec Loss 6.8130 LearningRate 0.0262 Epoch: 9 Global Step: 404800 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:12:28,420-Speed 2628.00 samples/sec Loss 6.8972 LearningRate 0.0262 Epoch: 9 Global Step: 404810 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:12:32,321-Speed 2626.29 samples/sec Loss 6.8560 LearningRate 0.0262 Epoch: 9 Global Step: 404820 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:12:36,219-Speed 2627.21 samples/sec Loss 6.8877 LearningRate 0.0262 Epoch: 9 Global Step: 404830 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:12:40,113-Speed 2629.93 samples/sec Loss 6.7806 LearningRate 0.0262 Epoch: 9 Global Step: 404840 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:12:44,015-Speed 2625.40 samples/sec Loss 6.7986 LearningRate 0.0262 Epoch: 9 Global Step: 404850 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:12:47,917-Speed 2625.24 samples/sec Loss 6.9130 LearningRate 0.0262 Epoch: 9 Global Step: 404860 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:12:51,820-Speed 2624.09 samples/sec Loss 6.8185 LearningRate 0.0262 Epoch: 9 Global Step: 404870 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:12:55,725-Speed 2623.23 samples/sec Loss 6.9179 LearningRate 0.0262 Epoch: 9 Global Step: 404880 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:12:59,650-Speed 2609.50 samples/sec Loss 6.8328 LearningRate 0.0262 Epoch: 9 Global Step: 404890 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:13:03,546-Speed 2628.81 samples/sec Loss 6.9444 LearningRate 0.0262 Epoch: 9 Global Step: 404900 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 17:13:07,428-Speed 2638.20 samples/sec Loss 6.8004 LearningRate 0.0262 Epoch: 9 Global Step: 404910 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:13:11,327-Speed 2626.90 samples/sec Loss 6.8098 LearningRate 0.0262 Epoch: 9 Global Step: 404920 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:13:15,225-Speed 2628.11 samples/sec Loss 6.9086 LearningRate 0.0262 Epoch: 9 Global Step: 404930 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:13:19,134-Speed 2620.00 samples/sec Loss 6.7276 LearningRate 0.0262 Epoch: 9 Global Step: 404940 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:13:23,044-Speed 2619.90 samples/sec Loss 6.9524 LearningRate 0.0262 Epoch: 9 Global Step: 404950 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:13:26,941-Speed 2628.01 samples/sec Loss 6.8718 LearningRate 0.0262 Epoch: 9 Global Step: 404960 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:13:30,900-Speed 2587.76 samples/sec Loss 6.8065 LearningRate 0.0262 Epoch: 9 Global Step: 404970 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:13:34,803-Speed 2623.83 samples/sec Loss 6.8455 LearningRate 0.0262 Epoch: 9 Global Step: 404980 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:13:38,744-Speed 2599.11 samples/sec Loss 6.9616 LearningRate 0.0262 Epoch: 9 Global Step: 404990 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:13:42,644-Speed 2626.19 samples/sec Loss 6.8932 LearningRate 0.0262 Epoch: 9 Global Step: 405000 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:13:46,549-Speed 2623.07 samples/sec Loss 6.8518 LearningRate 0.0262 Epoch: 9 Global Step: 405010 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:13:50,453-Speed 2623.33 samples/sec Loss 6.8395 LearningRate 0.0262 Epoch: 9 Global Step: 405020 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:13:54,349-Speed 2629.18 samples/sec Loss 6.8311 LearningRate 0.0262 Epoch: 9 Global Step: 405030 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:13:58,247-Speed 2627.25 samples/sec Loss 6.8272 LearningRate 0.0262 Epoch: 9 Global Step: 405040 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:14:02,166-Speed 2614.25 samples/sec Loss 6.8057 LearningRate 0.0262 Epoch: 9 Global Step: 405050 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:14:06,068-Speed 2624.51 samples/sec Loss 6.7418 LearningRate 0.0262 Epoch: 9 Global Step: 405060 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:14:09,971-Speed 2624.71 samples/sec Loss 6.8193 LearningRate 0.0262 Epoch: 9 Global Step: 405070 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:14:13,866-Speed 2629.01 samples/sec Loss 6.8205 LearningRate 0.0262 Epoch: 9 Global Step: 405080 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:14:17,765-Speed 2627.55 samples/sec Loss 6.9422 LearningRate 0.0262 Epoch: 9 Global Step: 405090 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:14:21,664-Speed 2626.96 samples/sec Loss 6.8828 LearningRate 0.0262 Epoch: 9 Global Step: 405100 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:14:25,561-Speed 2627.95 samples/sec Loss 6.8515 LearningRate 0.0262 Epoch: 9 Global Step: 405110 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:14:29,460-Speed 2626.82 samples/sec Loss 6.8283 LearningRate 0.0262 Epoch: 9 Global Step: 405120 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:14:33,361-Speed 2625.63 samples/sec Loss 6.8627 LearningRate 0.0262 Epoch: 9 Global Step: 405130 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:14:37,266-Speed 2623.41 samples/sec Loss 6.8060 LearningRate 0.0262 Epoch: 9 Global Step: 405140 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:14:41,174-Speed 2620.72 samples/sec Loss 6.7542 LearningRate 0.0262 Epoch: 9 Global Step: 405150 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:14:45,068-Speed 2629.97 samples/sec Loss 6.8522 LearningRate 0.0262 Epoch: 9 Global Step: 405160 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:14:48,963-Speed 2629.43 samples/sec Loss 6.8363 LearningRate 0.0262 Epoch: 9 Global Step: 405170 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 17:14:52,894-Speed 2605.97 samples/sec Loss 6.8933 LearningRate 0.0262 Epoch: 9 Global Step: 405180 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:14:56,789-Speed 2630.05 samples/sec Loss 6.8206 LearningRate 0.0262 Epoch: 9 Global Step: 405190 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:15:00,698-Speed 2620.08 samples/sec Loss 6.9592 LearningRate 0.0262 Epoch: 9 Global Step: 405200 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:15:04,590-Speed 2632.67 samples/sec Loss 6.8122 LearningRate 0.0262 Epoch: 9 Global Step: 405210 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:15:08,481-Speed 2632.19 samples/sec Loss 6.8608 LearningRate 0.0262 Epoch: 9 Global Step: 405220 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:15:12,382-Speed 2625.37 samples/sec Loss 6.8016 LearningRate 0.0262 Epoch: 9 Global Step: 405230 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:15:16,259-Speed 2641.58 samples/sec Loss 6.9281 LearningRate 0.0262 Epoch: 9 Global Step: 405240 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:15:20,190-Speed 2605.71 samples/sec Loss 6.9399 LearningRate 0.0262 Epoch: 9 Global Step: 405250 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:15:24,089-Speed 2627.12 samples/sec Loss 6.8422 LearningRate 0.0262 Epoch: 9 Global Step: 405260 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:15:27,994-Speed 2622.87 samples/sec Loss 6.9219 LearningRate 0.0262 Epoch: 9 Global Step: 405270 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:15:31,895-Speed 2625.48 samples/sec Loss 6.8533 LearningRate 0.0262 Epoch: 9 Global Step: 405280 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:15:35,797-Speed 2625.59 samples/sec Loss 6.8488 LearningRate 0.0262 Epoch: 9 Global Step: 405290 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:15:39,739-Speed 2598.48 samples/sec Loss 6.8612 LearningRate 0.0262 Epoch: 9 Global Step: 405300 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:15:43,634-Speed 2629.13 samples/sec Loss 6.9075 LearningRate 0.0262 Epoch: 9 Global Step: 405310 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:15:47,537-Speed 2624.21 samples/sec Loss 6.8978 LearningRate 0.0262 Epoch: 9 Global Step: 405320 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:15:51,436-Speed 2627.36 samples/sec Loss 6.9366 LearningRate 0.0262 Epoch: 9 Global Step: 405330 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:15:55,348-Speed 2617.96 samples/sec Loss 6.9162 LearningRate 0.0262 Epoch: 9 Global Step: 405340 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:15:59,287-Speed 2600.51 samples/sec Loss 6.9861 LearningRate 0.0262 Epoch: 9 Global Step: 405350 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:16:03,184-Speed 2628.60 samples/sec Loss 6.8542 LearningRate 0.0261 Epoch: 9 Global Step: 405360 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:16:07,128-Speed 2596.64 samples/sec Loss 6.8967 LearningRate 0.0261 Epoch: 9 Global Step: 405370 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:16:11,008-Speed 2640.39 samples/sec Loss 6.8273 LearningRate 0.0261 Epoch: 9 Global Step: 405380 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:16:14,907-Speed 2626.40 samples/sec Loss 6.9119 LearningRate 0.0261 Epoch: 9 Global Step: 405390 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:16:18,819-Speed 2618.01 samples/sec Loss 6.8703 LearningRate 0.0261 Epoch: 9 Global Step: 405400 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:16:22,723-Speed 2624.14 samples/sec Loss 6.9004 LearningRate 0.0261 Epoch: 9 Global Step: 405410 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:16:26,616-Speed 2630.92 samples/sec Loss 6.7628 LearningRate 0.0261 Epoch: 9 Global Step: 405420 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:16:30,519-Speed 2624.48 samples/sec Loss 6.7016 LearningRate 0.0261 Epoch: 9 Global Step: 405430 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:16:34,412-Speed 2630.87 samples/sec Loss 6.8223 LearningRate 0.0261 Epoch: 9 Global Step: 405440 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:16:38,307-Speed 2629.90 samples/sec Loss 6.8174 LearningRate 0.0261 Epoch: 9 Global Step: 405450 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:16:42,200-Speed 2631.14 samples/sec Loss 6.8620 LearningRate 0.0261 Epoch: 9 Global Step: 405460 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:16:46,108-Speed 2621.16 samples/sec Loss 6.8198 LearningRate 0.0261 Epoch: 9 Global Step: 405470 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:16:50,003-Speed 2629.53 samples/sec Loss 6.7646 LearningRate 0.0261 Epoch: 9 Global Step: 405480 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:16:53,909-Speed 2622.73 samples/sec Loss 6.8763 LearningRate 0.0261 Epoch: 9 Global Step: 405490 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:16:57,804-Speed 2629.50 samples/sec Loss 6.9493 LearningRate 0.0261 Epoch: 9 Global Step: 405500 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:17:01,699-Speed 2629.87 samples/sec Loss 6.8341 LearningRate 0.0261 Epoch: 9 Global Step: 405510 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:17:05,594-Speed 2629.79 samples/sec Loss 6.9353 LearningRate 0.0261 Epoch: 9 Global Step: 405520 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:17:09,488-Speed 2629.98 samples/sec Loss 6.8344 LearningRate 0.0261 Epoch: 9 Global Step: 405530 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:17:13,385-Speed 2628.60 samples/sec Loss 6.8790 LearningRate 0.0261 Epoch: 9 Global Step: 405540 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:17:17,294-Speed 2620.35 samples/sec Loss 6.9944 LearningRate 0.0261 Epoch: 9 Global Step: 405550 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:17:21,195-Speed 2625.42 samples/sec Loss 6.8000 LearningRate 0.0261 Epoch: 9 Global Step: 405560 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:17:25,100-Speed 2623.21 samples/sec Loss 6.8226 LearningRate 0.0261 Epoch: 9 Global Step: 405570 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:17:29,003-Speed 2624.05 samples/sec Loss 6.8085 LearningRate 0.0261 Epoch: 9 Global Step: 405580 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:17:32,884-Speed 2638.97 samples/sec Loss 6.7718 LearningRate 0.0261 Epoch: 9 Global Step: 405590 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:17:36,792-Speed 2621.23 samples/sec Loss 6.7805 LearningRate 0.0261 Epoch: 9 Global Step: 405600 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:17:40,689-Speed 2628.27 samples/sec Loss 6.8188 LearningRate 0.0261 Epoch: 9 Global Step: 405610 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:17:44,585-Speed 2628.82 samples/sec Loss 6.7810 LearningRate 0.0261 Epoch: 9 Global Step: 405620 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:17:48,493-Speed 2621.48 samples/sec Loss 6.8275 LearningRate 0.0261 Epoch: 9 Global Step: 405630 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:17:52,414-Speed 2612.02 samples/sec Loss 6.8443 LearningRate 0.0261 Epoch: 9 Global Step: 405640 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:17:56,330-Speed 2615.46 samples/sec Loss 6.8897 LearningRate 0.0261 Epoch: 9 Global Step: 405650 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:18:00,226-Speed 2629.06 samples/sec Loss 6.8299 LearningRate 0.0261 Epoch: 9 Global Step: 405660 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:18:04,120-Speed 2630.82 samples/sec Loss 6.9231 LearningRate 0.0261 Epoch: 9 Global Step: 405670 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:18:08,021-Speed 2625.04 samples/sec Loss 6.8190 LearningRate 0.0261 Epoch: 9 Global Step: 405680 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:18:11,928-Speed 2621.65 samples/sec Loss 6.8767 LearningRate 0.0261 Epoch: 9 Global Step: 405690 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:18:15,839-Speed 2619.19 samples/sec Loss 6.8347 LearningRate 0.0261 Epoch: 9 Global Step: 405700 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:18:19,739-Speed 2626.62 samples/sec Loss 6.8730 LearningRate 0.0261 Epoch: 9 Global Step: 405710 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:18:23,645-Speed 2622.03 samples/sec Loss 6.8021 LearningRate 0.0261 Epoch: 9 Global Step: 405720 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:18:27,543-Speed 2627.82 samples/sec Loss 6.9316 LearningRate 0.0261 Epoch: 9 Global Step: 405730 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:18:31,439-Speed 2628.61 samples/sec Loss 6.7971 LearningRate 0.0261 Epoch: 9 Global Step: 405740 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:18:35,354-Speed 2616.26 samples/sec Loss 6.9642 LearningRate 0.0261 Epoch: 9 Global Step: 405750 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:18:39,252-Speed 2627.40 samples/sec Loss 6.7888 LearningRate 0.0261 Epoch: 9 Global Step: 405760 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:18:43,132-Speed 2640.01 samples/sec Loss 6.7702 LearningRate 0.0261 Epoch: 9 Global Step: 405770 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:18:47,025-Speed 2631.12 samples/sec Loss 6.8518 LearningRate 0.0261 Epoch: 9 Global Step: 405780 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:18:50,918-Speed 2631.31 samples/sec Loss 6.9087 LearningRate 0.0261 Epoch: 9 Global Step: 405790 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:18:54,818-Speed 2626.44 samples/sec Loss 6.7937 LearningRate 0.0261 Epoch: 9 Global Step: 405800 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:18:58,734-Speed 2615.25 samples/sec Loss 6.8506 LearningRate 0.0261 Epoch: 9 Global Step: 405810 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:19:02,633-Speed 2626.77 samples/sec Loss 6.9186 LearningRate 0.0261 Epoch: 9 Global Step: 405820 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:19:06,527-Speed 2629.98 samples/sec Loss 7.0167 LearningRate 0.0261 Epoch: 9 Global Step: 405830 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:19:10,420-Speed 2631.40 samples/sec Loss 6.7713 LearningRate 0.0261 Epoch: 9 Global Step: 405840 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:19:14,315-Speed 2628.99 samples/sec Loss 6.8623 LearningRate 0.0261 Epoch: 9 Global Step: 405850 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:19:18,210-Speed 2629.87 samples/sec Loss 6.8897 LearningRate 0.0261 Epoch: 9 Global Step: 405860 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:19:22,104-Speed 2630.33 samples/sec Loss 6.8160 LearningRate 0.0261 Epoch: 9 Global Step: 405870 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:19:26,001-Speed 2629.06 samples/sec Loss 6.7670 LearningRate 0.0261 Epoch: 9 Global Step: 405880 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:19:29,898-Speed 2627.60 samples/sec Loss 6.7867 LearningRate 0.0261 Epoch: 9 Global Step: 405890 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:19:33,795-Speed 2628.19 samples/sec Loss 6.7284 LearningRate 0.0261 Epoch: 9 Global Step: 405900 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:19:37,706-Speed 2619.13 samples/sec Loss 6.8361 LearningRate 0.0261 Epoch: 9 Global Step: 405910 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:19:41,606-Speed 2626.41 samples/sec Loss 6.9804 LearningRate 0.0261 Epoch: 9 Global Step: 405920 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:19:45,481-Speed 2643.13 samples/sec Loss 6.8999 LearningRate 0.0261 Epoch: 9 Global Step: 405930 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:19:49,398-Speed 2614.53 samples/sec Loss 6.8834 LearningRate 0.0261 Epoch: 9 Global Step: 405940 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:19:53,306-Speed 2621.03 samples/sec Loss 6.8535 LearningRate 0.0261 Epoch: 9 Global Step: 405950 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:19:57,201-Speed 2629.73 samples/sec Loss 6.8046 LearningRate 0.0261 Epoch: 9 Global Step: 405960 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:20:01,108-Speed 2621.79 samples/sec Loss 6.7431 LearningRate 0.0261 Epoch: 9 Global Step: 405970 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:20:05,005-Speed 2628.22 samples/sec Loss 6.7714 LearningRate 0.0261 Epoch: 9 Global Step: 405980 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:20:08,901-Speed 2628.71 samples/sec Loss 6.7304 LearningRate 0.0261 Epoch: 9 Global Step: 405990 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:20:12,801-Speed 2626.41 samples/sec Loss 6.8636 LearningRate 0.0261 Epoch: 9 Global Step: 406000 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:20:16,697-Speed 2628.66 samples/sec Loss 6.8237 LearningRate 0.0261 Epoch: 9 Global Step: 406010 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:20:20,602-Speed 2622.74 samples/sec Loss 6.7940 LearningRate 0.0261 Epoch: 9 Global Step: 406020 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:20:24,567-Speed 2583.55 samples/sec Loss 6.9471 LearningRate 0.0261 Epoch: 9 Global Step: 406030 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:20:28,470-Speed 2623.73 samples/sec Loss 6.7125 LearningRate 0.0261 Epoch: 9 Global Step: 406040 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:20:32,469-Speed 2561.70 samples/sec Loss 6.8151 LearningRate 0.0261 Epoch: 9 Global Step: 406050 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:20:36,361-Speed 2632.00 samples/sec Loss 6.8060 LearningRate 0.0261 Epoch: 9 Global Step: 406060 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:20:40,269-Speed 2620.42 samples/sec Loss 6.7556 LearningRate 0.0261 Epoch: 9 Global Step: 406070 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:20:44,165-Speed 2629.09 samples/sec Loss 6.7356 LearningRate 0.0261 Epoch: 9 Global Step: 406080 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:20:48,062-Speed 2628.45 samples/sec Loss 6.9834 LearningRate 0.0261 Epoch: 9 Global Step: 406090 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:20:51,964-Speed 2624.68 samples/sec Loss 6.8534 LearningRate 0.0261 Epoch: 9 Global Step: 406100 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:20:55,861-Speed 2628.35 samples/sec Loss 6.7633 LearningRate 0.0261 Epoch: 9 Global Step: 406110 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:20:59,754-Speed 2630.76 samples/sec Loss 6.7827 LearningRate 0.0261 Epoch: 9 Global Step: 406120 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:21:03,632-Speed 2641.67 samples/sec Loss 6.9425 LearningRate 0.0261 Epoch: 9 Global Step: 406130 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:21:07,535-Speed 2623.94 samples/sec Loss 6.9778 LearningRate 0.0261 Epoch: 9 Global Step: 406140 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:21:11,436-Speed 2625.23 samples/sec Loss 6.8856 LearningRate 0.0261 Epoch: 9 Global Step: 406150 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:21:15,324-Speed 2634.37 samples/sec Loss 6.7174 LearningRate 0.0261 Epoch: 9 Global Step: 406160 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:21:19,222-Speed 2627.80 samples/sec Loss 6.8066 LearningRate 0.0260 Epoch: 9 Global Step: 406170 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:21:23,120-Speed 2627.66 samples/sec Loss 6.8181 LearningRate 0.0260 Epoch: 9 Global Step: 406180 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:21:27,018-Speed 2627.33 samples/sec Loss 6.6909 LearningRate 0.0260 Epoch: 9 Global Step: 406190 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:21:30,918-Speed 2626.32 samples/sec Loss 6.8643 LearningRate 0.0260 Epoch: 9 Global Step: 406200 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:21:34,811-Speed 2631.28 samples/sec Loss 6.7414 LearningRate 0.0260 Epoch: 9 Global Step: 406210 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:21:38,709-Speed 2627.07 samples/sec Loss 6.8680 LearningRate 0.0260 Epoch: 9 Global Step: 406220 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:21:42,606-Speed 2628.62 samples/sec Loss 6.7670 LearningRate 0.0260 Epoch: 9 Global Step: 406230 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:21:46,512-Speed 2622.62 samples/sec Loss 6.8578 LearningRate 0.0260 Epoch: 9 Global Step: 406240 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:21:50,408-Speed 2629.01 samples/sec Loss 6.9184 LearningRate 0.0260 Epoch: 9 Global Step: 406250 Fp16 Grad Scale: 65536 Required: 48 hours
Training: 2022-04-14 17:21:54,303-Speed 2629.23 samples/sec Loss 6.9083 LearningRate 0.0260 Epoch: 9 Global Step: 406260 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:21:58,208-Speed 2623.34 samples/sec Loss 6.8656 LearningRate 0.0260 Epoch: 9 Global Step: 406270 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:22:02,109-Speed 2625.74 samples/sec Loss 6.8304 LearningRate 0.0260 Epoch: 9 Global Step: 406280 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:22:06,006-Speed 2628.22 samples/sec Loss 6.7757 LearningRate 0.0260 Epoch: 9 Global Step: 406290 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:22:09,897-Speed 2631.99 samples/sec Loss 6.8997 LearningRate 0.0260 Epoch: 9 Global Step: 406300 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:22:13,794-Speed 2628.60 samples/sec Loss 6.9018 LearningRate 0.0260 Epoch: 9 Global Step: 406310 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:22:17,695-Speed 2625.91 samples/sec Loss 6.8271 LearningRate 0.0260 Epoch: 9 Global Step: 406320 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:22:21,593-Speed 2627.53 samples/sec Loss 6.7784 LearningRate 0.0260 Epoch: 9 Global Step: 406330 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:22:25,490-Speed 2627.92 samples/sec Loss 6.8850 LearningRate 0.0260 Epoch: 9 Global Step: 406340 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:22:29,413-Speed 2611.12 samples/sec Loss 6.7340 LearningRate 0.0260 Epoch: 9 Global Step: 406350 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:22:33,323-Speed 2620.01 samples/sec Loss 6.7459 LearningRate 0.0260 Epoch: 9 Global Step: 406360 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 17:22:37,214-Speed 2632.14 samples/sec Loss 6.8466 LearningRate 0.0260 Epoch: 9 Global Step: 406370 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:22:41,113-Speed 2626.37 samples/sec Loss 6.8819 LearningRate 0.0260 Epoch: 9 Global Step: 406380 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:22:45,012-Speed 2627.31 samples/sec Loss 6.7361 LearningRate 0.0260 Epoch: 9 Global Step: 406390 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:22:48,908-Speed 2629.57 samples/sec Loss 6.8408 LearningRate 0.0260 Epoch: 9 Global Step: 406400 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:22:52,804-Speed 2628.90 samples/sec Loss 6.7555 LearningRate 0.0260 Epoch: 9 Global Step: 406410 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:22:56,711-Speed 2621.08 samples/sec Loss 6.9230 LearningRate 0.0260 Epoch: 9 Global Step: 406420 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:23:00,621-Speed 2619.55 samples/sec Loss 6.6788 LearningRate 0.0260 Epoch: 9 Global Step: 406430 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:23:04,517-Speed 2628.65 samples/sec Loss 6.8723 LearningRate 0.0260 Epoch: 9 Global Step: 406440 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:23:08,420-Speed 2624.85 samples/sec Loss 6.8453 LearningRate 0.0260 Epoch: 9 Global Step: 406450 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:23:12,319-Speed 2627.08 samples/sec Loss 6.8184 LearningRate 0.0260 Epoch: 9 Global Step: 406460 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:23:16,216-Speed 2628.00 samples/sec Loss 6.7268 LearningRate 0.0260 Epoch: 9 Global Step: 406470 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 17:23:20,181-Speed 2583.22 samples/sec Loss 6.8848 LearningRate 0.0260 Epoch: 9 Global Step: 406480 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 17:23:24,089-Speed 2620.98 samples/sec Loss 6.8786 LearningRate 0.0260 Epoch: 9 Global Step: 406490 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 17:23:27,965-Speed 2642.50 samples/sec Loss 6.9462 LearningRate 0.0260 Epoch: 9 Global Step: 406500 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:23:31,868-Speed 2624.13 samples/sec Loss 6.8711 LearningRate 0.0260 Epoch: 9 Global Step: 406510 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:23:35,765-Speed 2628.31 samples/sec Loss 6.8012 LearningRate 0.0260 Epoch: 9 Global Step: 406520 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:23:39,673-Speed 2620.84 samples/sec Loss 6.8410 LearningRate 0.0260 Epoch: 9 Global Step: 406530 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:23:43,565-Speed 2631.84 samples/sec Loss 6.8347 LearningRate 0.0260 Epoch: 9 Global Step: 406540 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:23:47,462-Speed 2628.89 samples/sec Loss 6.7570 LearningRate 0.0260 Epoch: 9 Global Step: 406550 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:23:51,356-Speed 2629.81 samples/sec Loss 6.8345 LearningRate 0.0260 Epoch: 9 Global Step: 406560 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:23:55,251-Speed 2629.68 samples/sec Loss 6.7484 LearningRate 0.0260 Epoch: 9 Global Step: 406570 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:23:59,155-Speed 2624.23 samples/sec Loss 6.7700 LearningRate 0.0260 Epoch: 9 Global Step: 406580 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:24:03,050-Speed 2629.39 samples/sec Loss 7.0227 LearningRate 0.0260 Epoch: 9 Global Step: 406590 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:24:06,963-Speed 2617.36 samples/sec Loss 6.6515 LearningRate 0.0260 Epoch: 9 Global Step: 406600 Fp16 Grad Scale: 262144 Required: 48 hours
Training: 2022-04-14 17:24:10,859-Speed 2629.07 samples/sec Loss 6.8896 LearningRate 0.0260 Epoch: 9 Global Step: 406610 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:24:14,753-Speed 2630.66 samples/sec Loss 6.8486 LearningRate 0.0260 Epoch: 9 Global Step: 406620 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:24:18,647-Speed 2630.63 samples/sec Loss 6.8679 LearningRate 0.0260 Epoch: 9 Global Step: 406630 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:24:22,559-Speed 2617.74 samples/sec Loss 6.8015 LearningRate 0.0260 Epoch: 9 Global Step: 406640 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:24:26,456-Speed 2628.95 samples/sec Loss 6.8266 LearningRate 0.0260 Epoch: 9 Global Step: 406650 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:24:30,347-Speed 2632.14 samples/sec Loss 6.8165 LearningRate 0.0260 Epoch: 9 Global Step: 406660 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:24:34,238-Speed 2632.50 samples/sec Loss 6.9547 LearningRate 0.0260 Epoch: 9 Global Step: 406670 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:24:38,146-Speed 2620.28 samples/sec Loss 6.8689 LearningRate 0.0260 Epoch: 9 Global Step: 406680 Fp16 Grad Scale: 131072 Required: 48 hours
Training: 2022-04-14 17:24:42,072-Speed 2609.06 samples/sec Loss 6.8764 LearningRate 0.0260 Epoch: 9 Global Step: 406690 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:24:45,977-Speed 2622.65 samples/sec Loss 6.7440 LearningRate 0.0260 Epoch: 9 Global Step: 406700 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:24:49,859-Speed 2639.27 samples/sec Loss 6.8472 LearningRate 0.0260 Epoch: 9 Global Step: 406710 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:24:53,774-Speed 2615.84 samples/sec Loss 6.9586 LearningRate 0.0260 Epoch: 9 Global Step: 406720 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:24:57,755-Speed 2573.54 samples/sec Loss 6.8350 LearningRate 0.0260 Epoch: 9 Global Step: 406730 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:25:01,650-Speed 2629.02 samples/sec Loss 6.8535 LearningRate 0.0260 Epoch: 9 Global Step: 406740 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:25:05,548-Speed 2627.78 samples/sec Loss 6.8148 LearningRate 0.0260 Epoch: 9 Global Step: 406750 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:25:09,519-Speed 2579.21 samples/sec Loss 6.7491 LearningRate 0.0260 Epoch: 9 Global Step: 406760 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:25:13,423-Speed 2624.18 samples/sec Loss 6.8557 LearningRate 0.0260 Epoch: 9 Global Step: 406770 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:25:17,320-Speed 2628.26 samples/sec Loss 6.9896 LearningRate 0.0260 Epoch: 9 Global Step: 406780 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:25:21,214-Speed 2630.61 samples/sec Loss 6.8031 LearningRate 0.0260 Epoch: 9 Global Step: 406790 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:25:25,205-Speed 2566.12 samples/sec Loss 6.7788 LearningRate 0.0260 Epoch: 9 Global Step: 406800 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:25:29,105-Speed 2626.39 samples/sec Loss 6.7826 LearningRate 0.0260 Epoch: 9 Global Step: 406810 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 17:25:32,978-Speed 2645.17 samples/sec Loss 6.7711 LearningRate 0.0260 Epoch: 9 Global Step: 406820 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:25:36,877-Speed 2626.78 samples/sec Loss 6.7862 LearningRate 0.0260 Epoch: 9 Global Step: 406830 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:25:40,773-Speed 2628.80 samples/sec Loss 6.9344 LearningRate 0.0260 Epoch: 9 Global Step: 406840 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:25:44,675-Speed 2625.08 samples/sec Loss 6.7006 LearningRate 0.0260 Epoch: 9 Global Step: 406850 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:25:48,568-Speed 2631.29 samples/sec Loss 6.8653 LearningRate 0.0260 Epoch: 9 Global Step: 406860 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:25:52,463-Speed 2629.88 samples/sec Loss 6.9882 LearningRate 0.0260 Epoch: 9 Global Step: 406870 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:25:56,399-Speed 2602.59 samples/sec Loss 6.8469 LearningRate 0.0260 Epoch: 9 Global Step: 406880 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:26:00,319-Speed 2612.54 samples/sec Loss 6.7518 LearningRate 0.0260 Epoch: 9 Global Step: 406890 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:26:04,250-Speed 2605.49 samples/sec Loss 6.8261 LearningRate 0.0260 Epoch: 9 Global Step: 406900 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:26:08,144-Speed 2630.84 samples/sec Loss 6.8225 LearningRate 0.0260 Epoch: 9 Global Step: 406910 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:26:12,038-Speed 2630.29 samples/sec Loss 6.9221 LearningRate 0.0260 Epoch: 9 Global Step: 406920 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:26:15,934-Speed 2628.95 samples/sec Loss 6.7316 LearningRate 0.0260 Epoch: 9 Global Step: 406930 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:26:19,839-Speed 2622.99 samples/sec Loss 6.8816 LearningRate 0.0260 Epoch: 9 Global Step: 406940 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:26:23,746-Speed 2621.44 samples/sec Loss 6.7888 LearningRate 0.0260 Epoch: 9 Global Step: 406950 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:26:27,644-Speed 2627.95 samples/sec Loss 6.7841 LearningRate 0.0260 Epoch: 9 Global Step: 406960 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:26:31,535-Speed 2632.51 samples/sec Loss 6.8306 LearningRate 0.0260 Epoch: 9 Global Step: 406970 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:26:35,430-Speed 2629.11 samples/sec Loss 6.7790 LearningRate 0.0260 Epoch: 9 Global Step: 406980 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:26:39,327-Speed 2628.54 samples/sec Loss 6.8937 LearningRate 0.0259 Epoch: 9 Global Step: 406990 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:26:43,220-Speed 2630.87 samples/sec Loss 6.8116 LearningRate 0.0259 Epoch: 9 Global Step: 407000 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:26:47,123-Speed 2624.61 samples/sec Loss 6.8465 LearningRate 0.0259 Epoch: 9 Global Step: 407010 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:26:51,026-Speed 2624.00 samples/sec Loss 6.7916 LearningRate 0.0259 Epoch: 9 Global Step: 407020 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:26:54,922-Speed 2629.49 samples/sec Loss 6.8977 LearningRate 0.0259 Epoch: 9 Global Step: 407030 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:26:58,820-Speed 2627.99 samples/sec Loss 6.8301 LearningRate 0.0259 Epoch: 9 Global Step: 407040 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:27:02,721-Speed 2625.25 samples/sec Loss 6.8113 LearningRate 0.0259 Epoch: 9 Global Step: 407050 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:27:06,614-Speed 2630.81 samples/sec Loss 6.8300 LearningRate 0.0259 Epoch: 9 Global Step: 407060 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:27:10,517-Speed 2624.10 samples/sec Loss 6.8221 LearningRate 0.0259 Epoch: 9 Global Step: 407070 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:27:14,415-Speed 2627.57 samples/sec Loss 6.7708 LearningRate 0.0259 Epoch: 9 Global Step: 407080 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:27:18,336-Speed 2612.77 samples/sec Loss 6.7719 LearningRate 0.0259 Epoch: 9 Global Step: 407090 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:27:22,237-Speed 2625.25 samples/sec Loss 6.7392 LearningRate 0.0259 Epoch: 9 Global Step: 407100 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:27:26,256-Speed 2548.93 samples/sec Loss 6.6337 LearningRate 0.0259 Epoch: 9 Global Step: 407110 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:27:30,152-Speed 2628.81 samples/sec Loss 6.7644 LearningRate 0.0259 Epoch: 9 Global Step: 407120 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:27:34,037-Speed 2636.70 samples/sec Loss 6.9281 LearningRate 0.0259 Epoch: 9 Global Step: 407130 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:27:37,945-Speed 2621.12 samples/sec Loss 6.7937 LearningRate 0.0259 Epoch: 9 Global Step: 407140 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:27:41,864-Speed 2613.24 samples/sec Loss 6.7238 LearningRate 0.0259 Epoch: 9 Global Step: 407150 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:27:45,761-Speed 2627.99 samples/sec Loss 6.7636 LearningRate 0.0259 Epoch: 9 Global Step: 407160 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:27:49,657-Speed 2629.44 samples/sec Loss 6.9556 LearningRate 0.0259 Epoch: 9 Global Step: 407170 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:27:53,555-Speed 2627.71 samples/sec Loss 6.9082 LearningRate 0.0259 Epoch: 9 Global Step: 407180 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:27:57,454-Speed 2626.96 samples/sec Loss 6.7106 LearningRate 0.0259 Epoch: 9 Global Step: 407190 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:28:01,349-Speed 2630.47 samples/sec Loss 6.7818 LearningRate 0.0259 Epoch: 9 Global Step: 407200 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:28:05,253-Speed 2623.18 samples/sec Loss 6.6907 LearningRate 0.0259 Epoch: 9 Global Step: 407210 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:28:09,159-Speed 2622.05 samples/sec Loss 6.8323 LearningRate 0.0259 Epoch: 9 Global Step: 407220 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:28:13,037-Speed 2640.77 samples/sec Loss 6.7819 LearningRate 0.0259 Epoch: 9 Global Step: 407230 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:28:16,942-Speed 2624.60 samples/sec Loss 6.8443 LearningRate 0.0259 Epoch: 9 Global Step: 407240 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:28:20,841-Speed 2626.11 samples/sec Loss 6.8624 LearningRate 0.0259 Epoch: 9 Global Step: 407250 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:28:24,746-Speed 2623.67 samples/sec Loss 6.7399 LearningRate 0.0259 Epoch: 9 Global Step: 407260 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:28:28,655-Speed 2620.67 samples/sec Loss 6.8328 LearningRate 0.0259 Epoch: 9 Global Step: 407270 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:28:32,555-Speed 2626.19 samples/sec Loss 6.7837 LearningRate 0.0259 Epoch: 9 Global Step: 407280 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:28:36,515-Speed 2586.08 samples/sec Loss 6.7159 LearningRate 0.0259 Epoch: 9 Global Step: 407290 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:28:40,416-Speed 2625.40 samples/sec Loss 6.7894 LearningRate 0.0259 Epoch: 9 Global Step: 407300 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:28:44,318-Speed 2624.96 samples/sec Loss 6.7269 LearningRate 0.0259 Epoch: 9 Global Step: 407310 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:28:48,223-Speed 2623.13 samples/sec Loss 6.7463 LearningRate 0.0259 Epoch: 9 Global Step: 407320 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:28:52,122-Speed 2627.52 samples/sec Loss 6.6910 LearningRate 0.0259 Epoch: 9 Global Step: 407330 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:28:56,024-Speed 2624.39 samples/sec Loss 6.7108 LearningRate 0.0259 Epoch: 9 Global Step: 407340 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:28:59,937-Speed 2617.60 samples/sec Loss 6.7798 LearningRate 0.0259 Epoch: 9 Global Step: 407350 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:29:03,843-Speed 2622.39 samples/sec Loss 6.7750 LearningRate 0.0259 Epoch: 9 Global Step: 407360 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:29:07,735-Speed 2632.00 samples/sec Loss 6.7898 LearningRate 0.0259 Epoch: 9 Global Step: 407370 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:29:11,630-Speed 2629.79 samples/sec Loss 6.7724 LearningRate 0.0259 Epoch: 9 Global Step: 407380 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:29:15,532-Speed 2624.81 samples/sec Loss 6.7641 LearningRate 0.0259 Epoch: 9 Global Step: 407390 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:29:19,425-Speed 2630.69 samples/sec Loss 6.8593 LearningRate 0.0259 Epoch: 9 Global Step: 407400 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:29:23,318-Speed 2631.37 samples/sec Loss 6.8235 LearningRate 0.0259 Epoch: 9 Global Step: 407410 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:29:27,217-Speed 2627.58 samples/sec Loss 6.6802 LearningRate 0.0259 Epoch: 9 Global Step: 407420 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:29:31,108-Speed 2632.51 samples/sec Loss 6.7363 LearningRate 0.0259 Epoch: 9 Global Step: 407430 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:29:35,002-Speed 2630.08 samples/sec Loss 6.8014 LearningRate 0.0259 Epoch: 9 Global Step: 407440 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 17:29:38,889-Speed 2634.63 samples/sec Loss 6.8700 LearningRate 0.0259 Epoch: 9 Global Step: 407450 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:29:42,783-Speed 2630.62 samples/sec Loss 6.8654 LearningRate 0.0259 Epoch: 9 Global Step: 407460 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:29:46,674-Speed 2632.59 samples/sec Loss 6.7963 LearningRate 0.0259 Epoch: 9 Global Step: 407470 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:29:50,579-Speed 2623.02 samples/sec Loss 6.8230 LearningRate 0.0259 Epoch: 9 Global Step: 407480 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:29:54,509-Speed 2605.83 samples/sec Loss 6.8203 LearningRate 0.0259 Epoch: 9 Global Step: 407490 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:29:58,416-Speed 2622.36 samples/sec Loss 6.9719 LearningRate 0.0259 Epoch: 9 Global Step: 407500 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:30:02,318-Speed 2625.06 samples/sec Loss 6.8117 LearningRate 0.0259 Epoch: 9 Global Step: 407510 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:30:06,215-Speed 2627.90 samples/sec Loss 6.8138 LearningRate 0.0259 Epoch: 9 Global Step: 407520 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:30:10,125-Speed 2619.66 samples/sec Loss 6.8548 LearningRate 0.0259 Epoch: 9 Global Step: 407530 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:30:14,021-Speed 2628.72 samples/sec Loss 6.9601 LearningRate 0.0259 Epoch: 9 Global Step: 407540 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:30:17,957-Speed 2602.42 samples/sec Loss 6.8094 LearningRate 0.0259 Epoch: 9 Global Step: 407550 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:30:21,853-Speed 2629.65 samples/sec Loss 6.8494 LearningRate 0.0259 Epoch: 9 Global Step: 407560 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:30:25,746-Speed 2630.52 samples/sec Loss 6.8154 LearningRate 0.0259 Epoch: 9 Global Step: 407570 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:30:29,643-Speed 2628.51 samples/sec Loss 6.8535 LearningRate 0.0259 Epoch: 9 Global Step: 407580 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:30:33,588-Speed 2596.33 samples/sec Loss 6.7681 LearningRate 0.0259 Epoch: 9 Global Step: 407590 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:30:37,723-Speed 2476.59 samples/sec Loss 6.9306 LearningRate 0.0259 Epoch: 9 Global Step: 407600 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:30:41,623-Speed 2626.56 samples/sec Loss 6.6931 LearningRate 0.0259 Epoch: 9 Global Step: 407610 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:30:45,532-Speed 2619.40 samples/sec Loss 6.8676 LearningRate 0.0259 Epoch: 9 Global Step: 407620 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:30:49,434-Speed 2625.61 samples/sec Loss 6.8877 LearningRate 0.0259 Epoch: 9 Global Step: 407630 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:30:53,337-Speed 2624.03 samples/sec Loss 6.7593 LearningRate 0.0259 Epoch: 9 Global Step: 407640 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:30:57,213-Speed 2642.79 samples/sec Loss 6.8342 LearningRate 0.0259 Epoch: 9 Global Step: 407650 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:31:01,110-Speed 2628.00 samples/sec Loss 6.7257 LearningRate 0.0259 Epoch: 9 Global Step: 407660 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:31:05,010-Speed 2625.99 samples/sec Loss 6.8127 LearningRate 0.0259 Epoch: 9 Global Step: 407670 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:31:08,904-Speed 2630.47 samples/sec Loss 6.7990 LearningRate 0.0259 Epoch: 9 Global Step: 407680 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:31:12,797-Speed 2630.73 samples/sec Loss 6.8791 LearningRate 0.0259 Epoch: 9 Global Step: 407690 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:31:16,721-Speed 2610.19 samples/sec Loss 6.8590 LearningRate 0.0259 Epoch: 9 Global Step: 407700 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:31:20,669-Speed 2594.71 samples/sec Loss 6.8743 LearningRate 0.0259 Epoch: 9 Global Step: 407710 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:31:24,569-Speed 2625.97 samples/sec Loss 6.6267 LearningRate 0.0259 Epoch: 9 Global Step: 407720 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:31:28,464-Speed 2630.26 samples/sec Loss 6.8268 LearningRate 0.0259 Epoch: 9 Global Step: 407730 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:31:32,363-Speed 2626.64 samples/sec Loss 6.7795 LearningRate 0.0259 Epoch: 9 Global Step: 407740 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:31:36,259-Speed 2629.07 samples/sec Loss 6.8119 LearningRate 0.0259 Epoch: 9 Global Step: 407750 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 17:31:40,138-Speed 2639.97 samples/sec Loss 6.8188 LearningRate 0.0259 Epoch: 9 Global Step: 407760 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:31:44,013-Speed 2643.54 samples/sec Loss 6.7861 LearningRate 0.0259 Epoch: 9 Global Step: 407770 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:31:47,908-Speed 2629.80 samples/sec Loss 6.8297 LearningRate 0.0259 Epoch: 9 Global Step: 407780 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:31:51,808-Speed 2626.31 samples/sec Loss 6.8069 LearningRate 0.0259 Epoch: 9 Global Step: 407790 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:31:55,704-Speed 2628.78 samples/sec Loss 6.7691 LearningRate 0.0258 Epoch: 9 Global Step: 407800 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:31:59,610-Speed 2622.88 samples/sec Loss 6.9298 LearningRate 0.0258 Epoch: 9 Global Step: 407810 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:32:03,506-Speed 2628.49 samples/sec Loss 6.7795 LearningRate 0.0258 Epoch: 9 Global Step: 407820 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:32:07,407-Speed 2625.86 samples/sec Loss 6.8658 LearningRate 0.0258 Epoch: 9 Global Step: 407830 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:32:11,300-Speed 2630.94 samples/sec Loss 6.7419 LearningRate 0.0258 Epoch: 9 Global Step: 407840 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:32:15,197-Speed 2628.54 samples/sec Loss 6.8447 LearningRate 0.0258 Epoch: 9 Global Step: 407850 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:32:19,089-Speed 2631.71 samples/sec Loss 6.8234 LearningRate 0.0258 Epoch: 9 Global Step: 407860 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:32:22,990-Speed 2625.52 samples/sec Loss 6.8237 LearningRate 0.0258 Epoch: 9 Global Step: 407870 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:32:26,899-Speed 2619.82 samples/sec Loss 6.7592 LearningRate 0.0258 Epoch: 9 Global Step: 407880 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:32:30,804-Speed 2623.19 samples/sec Loss 6.8099 LearningRate 0.0258 Epoch: 9 Global Step: 407890 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:32:34,699-Speed 2629.30 samples/sec Loss 6.7824 LearningRate 0.0258 Epoch: 9 Global Step: 407900 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:32:38,601-Speed 2625.62 samples/sec Loss 6.7843 LearningRate 0.0258 Epoch: 9 Global Step: 407910 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:32:42,498-Speed 2627.92 samples/sec Loss 6.7846 LearningRate 0.0258 Epoch: 9 Global Step: 407920 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:32:46,399-Speed 2625.44 samples/sec Loss 6.7827 LearningRate 0.0258 Epoch: 9 Global Step: 407930 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:32:50,297-Speed 2627.62 samples/sec Loss 6.8296 LearningRate 0.0258 Epoch: 9 Global Step: 407940 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:32:54,248-Speed 2592.84 samples/sec Loss 6.7902 LearningRate 0.0258 Epoch: 9 Global Step: 407950 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:32:58,142-Speed 2629.58 samples/sec Loss 6.9079 LearningRate 0.0258 Epoch: 9 Global Step: 407960 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:33:02,037-Speed 2630.02 samples/sec Loss 6.8079 LearningRate 0.0258 Epoch: 9 Global Step: 407970 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:33:05,933-Speed 2628.41 samples/sec Loss 6.7724 LearningRate 0.0258 Epoch: 9 Global Step: 407980 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:33:09,829-Speed 2629.61 samples/sec Loss 6.7580 LearningRate 0.0258 Epoch: 9 Global Step: 407990 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:33:13,724-Speed 2629.06 samples/sec Loss 6.8083 LearningRate 0.0258 Epoch: 9 Global Step: 408000 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:33:17,622-Speed 2627.73 samples/sec Loss 6.7153 LearningRate 0.0258 Epoch: 9 Global Step: 408010 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:33:21,515-Speed 2630.94 samples/sec Loss 6.8946 LearningRate 0.0258 Epoch: 9 Global Step: 408020 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:33:25,410-Speed 2629.90 samples/sec Loss 6.7497 LearningRate 0.0258 Epoch: 9 Global Step: 408030 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:33:29,308-Speed 2627.82 samples/sec Loss 6.8143 LearningRate 0.0258 Epoch: 9 Global Step: 408040 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:33:33,210-Speed 2624.36 samples/sec Loss 6.7765 LearningRate 0.0258 Epoch: 9 Global Step: 408050 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:33:37,112-Speed 2625.51 samples/sec Loss 6.7908 LearningRate 0.0258 Epoch: 9 Global Step: 408060 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:33:41,013-Speed 2624.89 samples/sec Loss 6.7172 LearningRate 0.0258 Epoch: 9 Global Step: 408070 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:33:44,886-Speed 2644.86 samples/sec Loss 6.8922 LearningRate 0.0258 Epoch: 9 Global Step: 408080 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:33:48,784-Speed 2627.09 samples/sec Loss 6.8027 LearningRate 0.0258 Epoch: 9 Global Step: 408090 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:33:52,691-Speed 2621.96 samples/sec Loss 6.7775 LearningRate 0.0258 Epoch: 9 Global Step: 408100 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:33:56,585-Speed 2630.60 samples/sec Loss 6.8572 LearningRate 0.0258 Epoch: 9 Global Step: 408110 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:34:00,491-Speed 2622.34 samples/sec Loss 6.8094 LearningRate 0.0258 Epoch: 9 Global Step: 408120 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:34:04,393-Speed 2624.76 samples/sec Loss 6.6931 LearningRate 0.0258 Epoch: 9 Global Step: 408130 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:34:08,289-Speed 2629.04 samples/sec Loss 6.7542 LearningRate 0.0258 Epoch: 9 Global Step: 408140 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:34:12,189-Speed 2626.06 samples/sec Loss 6.6963 LearningRate 0.0258 Epoch: 9 Global Step: 408150 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:34:16,099-Speed 2619.50 samples/sec Loss 6.6612 LearningRate 0.0258 Epoch: 9 Global Step: 408160 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:34:19,998-Speed 2626.81 samples/sec Loss 6.8417 LearningRate 0.0258 Epoch: 9 Global Step: 408170 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:34:23,918-Speed 2612.76 samples/sec Loss 6.8445 LearningRate 0.0258 Epoch: 9 Global Step: 408180 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:34:27,821-Speed 2623.77 samples/sec Loss 6.8322 LearningRate 0.0258 Epoch: 9 Global Step: 408190 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:34:31,724-Speed 2624.53 samples/sec Loss 6.7728 LearningRate 0.0258 Epoch: 9 Global Step: 408200 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:34:35,618-Speed 2629.98 samples/sec Loss 6.8231 LearningRate 0.0258 Epoch: 9 Global Step: 408210 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:34:39,518-Speed 2626.39 samples/sec Loss 6.8803 LearningRate 0.0258 Epoch: 9 Global Step: 408220 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:34:43,418-Speed 2626.64 samples/sec Loss 6.7723 LearningRate 0.0258 Epoch: 9 Global Step: 408230 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:34:47,315-Speed 2628.01 samples/sec Loss 6.7695 LearningRate 0.0258 Epoch: 9 Global Step: 408240 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:34:51,195-Speed 2639.73 samples/sec Loss 6.7213 LearningRate 0.0258 Epoch: 9 Global Step: 408250 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:34:55,091-Speed 2629.47 samples/sec Loss 6.7778 LearningRate 0.0258 Epoch: 9 Global Step: 408260 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:34:58,987-Speed 2628.40 samples/sec Loss 6.7975 LearningRate 0.0258 Epoch: 9 Global Step: 408270 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:35:02,882-Speed 2629.36 samples/sec Loss 6.8343 LearningRate 0.0258 Epoch: 9 Global Step: 408280 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:35:06,781-Speed 2627.39 samples/sec Loss 6.7843 LearningRate 0.0258 Epoch: 9 Global Step: 408290 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:35:10,689-Speed 2620.20 samples/sec Loss 6.7108 LearningRate 0.0258 Epoch: 9 Global Step: 408300 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:35:14,591-Speed 2625.87 samples/sec Loss 6.8652 LearningRate 0.0258 Epoch: 9 Global Step: 408310 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:35:18,491-Speed 2626.04 samples/sec Loss 6.8130 LearningRate 0.0258 Epoch: 9 Global Step: 408320 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:35:22,395-Speed 2623.47 samples/sec Loss 6.7174 LearningRate 0.0258 Epoch: 9 Global Step: 408330 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:35:26,293-Speed 2627.22 samples/sec Loss 6.6777 LearningRate 0.0258 Epoch: 9 Global Step: 408340 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:35:30,192-Speed 2627.05 samples/sec Loss 6.7692 LearningRate 0.0258 Epoch: 9 Global Step: 408350 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:35:34,108-Speed 2615.60 samples/sec Loss 6.7491 LearningRate 0.0258 Epoch: 9 Global Step: 408360 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:35:38,016-Speed 2620.78 samples/sec Loss 6.7791 LearningRate 0.0258 Epoch: 9 Global Step: 408370 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:35:41,923-Speed 2621.12 samples/sec Loss 6.8244 LearningRate 0.0258 Epoch: 9 Global Step: 408380 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:35:45,821-Speed 2627.74 samples/sec Loss 6.8390 LearningRate 0.0258 Epoch: 9 Global Step: 408390 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:35:49,752-Speed 2605.49 samples/sec Loss 6.7374 LearningRate 0.0258 Epoch: 9 Global Step: 408400 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:35:53,652-Speed 2626.62 samples/sec Loss 6.7895 LearningRate 0.0258 Epoch: 9 Global Step: 408410 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:35:57,558-Speed 2622.10 samples/sec Loss 6.8240 LearningRate 0.0258 Epoch: 9 Global Step: 408420 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:36:01,452-Speed 2630.38 samples/sec Loss 6.8741 LearningRate 0.0258 Epoch: 9 Global Step: 408430 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:36:05,350-Speed 2627.63 samples/sec Loss 6.7727 LearningRate 0.0258 Epoch: 9 Global Step: 408440 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:36:09,253-Speed 2624.07 samples/sec Loss 6.8837 LearningRate 0.0258 Epoch: 9 Global Step: 408450 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 17:36:13,128-Speed 2643.03 samples/sec Loss 6.7817 LearningRate 0.0258 Epoch: 9 Global Step: 408460 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:36:17,031-Speed 2623.91 samples/sec Loss 6.7474 LearningRate 0.0258 Epoch: 9 Global Step: 408470 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:36:20,924-Speed 2631.59 samples/sec Loss 6.7852 LearningRate 0.0258 Epoch: 9 Global Step: 408480 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:36:24,825-Speed 2625.28 samples/sec Loss 6.7488 LearningRate 0.0258 Epoch: 9 Global Step: 408490 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:36:28,723-Speed 2627.65 samples/sec Loss 6.8283 LearningRate 0.0258 Epoch: 9 Global Step: 408500 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:36:32,627-Speed 2623.68 samples/sec Loss 6.7479 LearningRate 0.0258 Epoch: 9 Global Step: 408510 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:36:36,526-Speed 2626.75 samples/sec Loss 6.8600 LearningRate 0.0258 Epoch: 9 Global Step: 408520 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:36:40,427-Speed 2625.48 samples/sec Loss 6.8002 LearningRate 0.0258 Epoch: 9 Global Step: 408530 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:36:44,325-Speed 2627.66 samples/sec Loss 6.7189 LearningRate 0.0258 Epoch: 9 Global Step: 408540 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:36:48,221-Speed 2629.05 samples/sec Loss 6.7937 LearningRate 0.0258 Epoch: 9 Global Step: 408550 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:36:52,102-Speed 2639.43 samples/sec Loss 6.6785 LearningRate 0.0258 Epoch: 9 Global Step: 408560 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:36:56,014-Speed 2617.96 samples/sec Loss 6.7411 LearningRate 0.0258 Epoch: 9 Global Step: 408570 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:36:59,927-Speed 2617.36 samples/sec Loss 6.7291 LearningRate 0.0258 Epoch: 9 Global Step: 408580 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:37:03,816-Speed 2634.17 samples/sec Loss 6.9530 LearningRate 0.0258 Epoch: 9 Global Step: 408590 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:37:07,708-Speed 2631.39 samples/sec Loss 6.7700 LearningRate 0.0258 Epoch: 9 Global Step: 408600 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:37:11,605-Speed 2627.87 samples/sec Loss 6.7461 LearningRate 0.0258 Epoch: 9 Global Step: 408610 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:37:15,501-Speed 2629.36 samples/sec Loss 6.8490 LearningRate 0.0257 Epoch: 9 Global Step: 408620 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:37:19,405-Speed 2623.78 samples/sec Loss 6.8136 LearningRate 0.0257 Epoch: 9 Global Step: 408630 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:37:23,324-Speed 2613.59 samples/sec Loss 6.7809 LearningRate 0.0257 Epoch: 9 Global Step: 408640 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:37:27,222-Speed 2627.46 samples/sec Loss 6.6919 LearningRate 0.0257 Epoch: 9 Global Step: 408650 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:37:31,126-Speed 2623.68 samples/sec Loss 6.8989 LearningRate 0.0257 Epoch: 9 Global Step: 408660 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:37:35,039-Speed 2617.08 samples/sec Loss 6.7906 LearningRate 0.0257 Epoch: 9 Global Step: 408670 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:37:38,934-Speed 2629.84 samples/sec Loss 6.7579 LearningRate 0.0257 Epoch: 9 Global Step: 408680 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:37:42,827-Speed 2630.59 samples/sec Loss 6.7144 LearningRate 0.0257 Epoch: 9 Global Step: 408690 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:37:46,724-Speed 2628.67 samples/sec Loss 6.7553 LearningRate 0.0257 Epoch: 9 Global Step: 408700 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:37:50,621-Speed 2628.25 samples/sec Loss 6.7519 LearningRate 0.0257 Epoch: 9 Global Step: 408710 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:37:54,519-Speed 2628.05 samples/sec Loss 6.7207 LearningRate 0.0257 Epoch: 9 Global Step: 408720 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:37:58,420-Speed 2625.32 samples/sec Loss 6.7069 LearningRate 0.0257 Epoch: 9 Global Step: 408730 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:38:02,300-Speed 2639.70 samples/sec Loss 6.8104 LearningRate 0.0257 Epoch: 9 Global Step: 408740 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:38:06,198-Speed 2627.47 samples/sec Loss 6.7776 LearningRate 0.0257 Epoch: 9 Global Step: 408750 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:38:10,093-Speed 2629.59 samples/sec Loss 6.7496 LearningRate 0.0257 Epoch: 9 Global Step: 408760 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:38:13,987-Speed 2630.26 samples/sec Loss 6.8564 LearningRate 0.0257 Epoch: 9 Global Step: 408770 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:38:17,883-Speed 2628.65 samples/sec Loss 6.6938 LearningRate 0.0257 Epoch: 9 Global Step: 408780 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:38:21,781-Speed 2627.33 samples/sec Loss 6.6820 LearningRate 0.0257 Epoch: 9 Global Step: 408790 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:38:25,680-Speed 2627.46 samples/sec Loss 6.8285 LearningRate 0.0257 Epoch: 9 Global Step: 408800 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:38:29,576-Speed 2630.60 samples/sec Loss 6.6372 LearningRate 0.0257 Epoch: 9 Global Step: 408810 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:38:33,471-Speed 2629.19 samples/sec Loss 6.7109 LearningRate 0.0257 Epoch: 9 Global Step: 408820 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:38:37,372-Speed 2625.56 samples/sec Loss 6.8812 LearningRate 0.0257 Epoch: 9 Global Step: 408830 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:38:41,269-Speed 2628.18 samples/sec Loss 6.9308 LearningRate 0.0257 Epoch: 9 Global Step: 408840 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:38:45,175-Speed 2622.13 samples/sec Loss 6.8427 LearningRate 0.0257 Epoch: 9 Global Step: 408850 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:38:49,049-Speed 2644.03 samples/sec Loss 6.8291 LearningRate 0.0257 Epoch: 9 Global Step: 408860 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:38:52,949-Speed 2626.13 samples/sec Loss 6.7525 LearningRate 0.0257 Epoch: 9 Global Step: 408870 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:38:56,847-Speed 2627.10 samples/sec Loss 6.6866 LearningRate 0.0257 Epoch: 9 Global Step: 408880 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:39:00,748-Speed 2626.01 samples/sec Loss 6.7734 LearningRate 0.0257 Epoch: 9 Global Step: 408890 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:39:04,647-Speed 2627.28 samples/sec Loss 6.6592 LearningRate 0.0257 Epoch: 9 Global Step: 408900 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:39:08,547-Speed 2626.13 samples/sec Loss 6.8371 LearningRate 0.0257 Epoch: 9 Global Step: 408910 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:39:12,443-Speed 2629.13 samples/sec Loss 6.8211 LearningRate 0.0257 Epoch: 9 Global Step: 408920 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:39:16,343-Speed 2626.37 samples/sec Loss 6.7748 LearningRate 0.0257 Epoch: 9 Global Step: 408930 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:39:20,245-Speed 2624.64 samples/sec Loss 6.6674 LearningRate 0.0257 Epoch: 9 Global Step: 408940 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:39:24,152-Speed 2621.09 samples/sec Loss 6.8401 LearningRate 0.0257 Epoch: 9 Global Step: 408950 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:39:28,051-Speed 2627.44 samples/sec Loss 6.7404 LearningRate 0.0257 Epoch: 9 Global Step: 408960 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:39:31,949-Speed 2627.16 samples/sec Loss 6.8129 LearningRate 0.0257 Epoch: 9 Global Step: 408970 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:39:35,843-Speed 2630.95 samples/sec Loss 6.8098 LearningRate 0.0257 Epoch: 9 Global Step: 408980 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:39:39,748-Speed 2623.21 samples/sec Loss 6.7226 LearningRate 0.0257 Epoch: 9 Global Step: 408990 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:39:43,640-Speed 2631.76 samples/sec Loss 6.7006 LearningRate 0.0257 Epoch: 9 Global Step: 409000 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:39:47,534-Speed 2630.14 samples/sec Loss 6.7943 LearningRate 0.0257 Epoch: 9 Global Step: 409010 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:39:51,407-Speed 2644.36 samples/sec Loss 6.8931 LearningRate 0.0257 Epoch: 9 Global Step: 409020 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:39:55,313-Speed 2622.15 samples/sec Loss 6.8092 LearningRate 0.0257 Epoch: 9 Global Step: 409030 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:39:59,219-Speed 2622.00 samples/sec Loss 6.7806 LearningRate 0.0257 Epoch: 9 Global Step: 409040 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:40:03,116-Speed 2628.09 samples/sec Loss 6.7774 LearningRate 0.0257 Epoch: 9 Global Step: 409050 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:40:07,012-Speed 2630.28 samples/sec Loss 6.8156 LearningRate 0.0257 Epoch: 9 Global Step: 409060 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:40:10,905-Speed 2630.17 samples/sec Loss 6.8496 LearningRate 0.0257 Epoch: 9 Global Step: 409070 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:40:14,812-Speed 2622.84 samples/sec Loss 6.8210 LearningRate 0.0257 Epoch: 9 Global Step: 409080 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:40:18,709-Speed 2628.36 samples/sec Loss 6.7297 LearningRate 0.0257 Epoch: 9 Global Step: 409090 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:40:22,607-Speed 2627.84 samples/sec Loss 6.7688 LearningRate 0.0257 Epoch: 9 Global Step: 409100 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:40:26,552-Speed 2595.90 samples/sec Loss 6.7419 LearningRate 0.0257 Epoch: 9 Global Step: 409110 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:40:30,479-Speed 2608.63 samples/sec Loss 6.7709 LearningRate 0.0257 Epoch: 9 Global Step: 409120 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:40:34,387-Speed 2621.10 samples/sec Loss 6.6692 LearningRate 0.0257 Epoch: 9 Global Step: 409130 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:40:38,280-Speed 2631.33 samples/sec Loss 6.7678 LearningRate 0.0257 Epoch: 9 Global Step: 409140 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:40:42,172-Speed 2631.20 samples/sec Loss 6.7460 LearningRate 0.0257 Epoch: 9 Global Step: 409150 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:40:46,069-Speed 2628.47 samples/sec Loss 6.8192 LearningRate 0.0257 Epoch: 9 Global Step: 409160 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:40:49,971-Speed 2624.79 samples/sec Loss 6.7039 LearningRate 0.0257 Epoch: 9 Global Step: 409170 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:40:53,871-Speed 2626.07 samples/sec Loss 6.9656 LearningRate 0.0257 Epoch: 9 Global Step: 409180 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:40:57,779-Speed 2620.88 samples/sec Loss 6.8234 LearningRate 0.0257 Epoch: 9 Global Step: 409190 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:41:01,682-Speed 2624.47 samples/sec Loss 6.8922 LearningRate 0.0257 Epoch: 9 Global Step: 409200 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:41:05,567-Speed 2636.19 samples/sec Loss 6.7074 LearningRate 0.0257 Epoch: 9 Global Step: 409210 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:41:09,474-Speed 2621.82 samples/sec Loss 6.7683 LearningRate 0.0257 Epoch: 9 Global Step: 409220 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:41:13,371-Speed 2627.73 samples/sec Loss 6.6961 LearningRate 0.0257 Epoch: 9 Global Step: 409230 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:41:17,264-Speed 2631.24 samples/sec Loss 6.8550 LearningRate 0.0257 Epoch: 9 Global Step: 409240 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:41:21,162-Speed 2627.31 samples/sec Loss 6.8153 LearningRate 0.0257 Epoch: 9 Global Step: 409250 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:41:25,067-Speed 2623.11 samples/sec Loss 6.6055 LearningRate 0.0257 Epoch: 9 Global Step: 409260 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:41:28,961-Speed 2630.18 samples/sec Loss 6.9252 LearningRate 0.0257 Epoch: 9 Global Step: 409270 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:41:32,858-Speed 2628.56 samples/sec Loss 6.7548 LearningRate 0.0257 Epoch: 9 Global Step: 409280 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:41:36,842-Speed 2570.96 samples/sec Loss 6.7781 LearningRate 0.0257 Epoch: 9 Global Step: 409290 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:41:40,736-Speed 2629.54 samples/sec Loss 6.7714 LearningRate 0.0257 Epoch: 9 Global Step: 409300 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:41:44,633-Speed 2628.72 samples/sec Loss 6.8466 LearningRate 0.0257 Epoch: 9 Global Step: 409310 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:41:48,527-Speed 2630.13 samples/sec Loss 6.6702 LearningRate 0.0257 Epoch: 9 Global Step: 409320 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:41:52,439-Speed 2618.29 samples/sec Loss 6.6960 LearningRate 0.0257 Epoch: 9 Global Step: 409330 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:41:56,334-Speed 2629.18 samples/sec Loss 6.7535 LearningRate 0.0257 Epoch: 9 Global Step: 409340 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:42:00,233-Speed 2627.25 samples/sec Loss 6.7464 LearningRate 0.0257 Epoch: 9 Global Step: 409350 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:42:04,134-Speed 2625.45 samples/sec Loss 6.7470 LearningRate 0.0257 Epoch: 9 Global Step: 409360 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:42:08,038-Speed 2623.55 samples/sec Loss 6.6759 LearningRate 0.0257 Epoch: 9 Global Step: 409370 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:42:11,935-Speed 2628.28 samples/sec Loss 6.6901 LearningRate 0.0257 Epoch: 9 Global Step: 409380 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:42:15,831-Speed 2629.50 samples/sec Loss 6.6633 LearningRate 0.0257 Epoch: 9 Global Step: 409390 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:42:19,727-Speed 2629.58 samples/sec Loss 6.7740 LearningRate 0.0257 Epoch: 9 Global Step: 409400 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:42:23,622-Speed 2629.16 samples/sec Loss 6.7931 LearningRate 0.0257 Epoch: 9 Global Step: 409410 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 17:42:27,535-Speed 2617.66 samples/sec Loss 6.7997 LearningRate 0.0257 Epoch: 9 Global Step: 409420 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 17:42:31,377-Speed 2665.99 samples/sec Loss 6.5777 LearningRate 0.0257 Epoch: 9 Global Step: 409430 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:42:35,290-Speed 2617.00 samples/sec Loss 6.7329 LearningRate 0.0256 Epoch: 9 Global Step: 409440 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:42:39,201-Speed 2618.89 samples/sec Loss 6.7587 LearningRate 0.0256 Epoch: 9 Global Step: 409450 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:42:43,091-Speed 2633.32 samples/sec Loss 6.6980 LearningRate 0.0256 Epoch: 9 Global Step: 409460 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:42:46,989-Speed 2627.66 samples/sec Loss 6.8052 LearningRate 0.0256 Epoch: 9 Global Step: 409470 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:42:50,883-Speed 2630.66 samples/sec Loss 6.7083 LearningRate 0.0256 Epoch: 9 Global Step: 409480 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:42:54,782-Speed 2626.28 samples/sec Loss 6.7337 LearningRate 0.0256 Epoch: 9 Global Step: 409490 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:42:58,681-Speed 2627.20 samples/sec Loss 6.8198 LearningRate 0.0256 Epoch: 9 Global Step: 409500 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:43:02,585-Speed 2623.44 samples/sec Loss 6.7521 LearningRate 0.0256 Epoch: 9 Global Step: 409510 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:43:06,491-Speed 2622.08 samples/sec Loss 6.8167 LearningRate 0.0256 Epoch: 9 Global Step: 409520 Fp16 Grad Scale: 32768 Required: 47 hours
Training: 2022-04-14 17:43:10,388-Speed 2628.32 samples/sec Loss 6.7226 LearningRate 0.0256 Epoch: 9 Global Step: 409530 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:43:14,285-Speed 2628.08 samples/sec Loss 6.8083 LearningRate 0.0256 Epoch: 9 Global Step: 409540 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:43:18,190-Speed 2622.44 samples/sec Loss 6.8394 LearningRate 0.0256 Epoch: 9 Global Step: 409550 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:43:22,093-Speed 2624.86 samples/sec Loss 6.8592 LearningRate 0.0256 Epoch: 9 Global Step: 409560 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:43:25,989-Speed 2628.99 samples/sec Loss 6.9010 LearningRate 0.0256 Epoch: 9 Global Step: 409570 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:43:29,882-Speed 2631.34 samples/sec Loss 6.7431 LearningRate 0.0256 Epoch: 9 Global Step: 409580 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:43:33,784-Speed 2624.79 samples/sec Loss 6.6812 LearningRate 0.0256 Epoch: 9 Global Step: 409590 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:43:37,681-Speed 2628.13 samples/sec Loss 6.7626 LearningRate 0.0256 Epoch: 9 Global Step: 409600 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:43:41,576-Speed 2628.98 samples/sec Loss 6.8771 LearningRate 0.0256 Epoch: 9 Global Step: 409610 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:43:45,472-Speed 2629.42 samples/sec Loss 6.8589 LearningRate 0.0256 Epoch: 9 Global Step: 409620 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:43:49,365-Speed 2630.57 samples/sec Loss 6.7160 LearningRate 0.0256 Epoch: 9 Global Step: 409630 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:43:53,263-Speed 2627.65 samples/sec Loss 6.7161 LearningRate 0.0256 Epoch: 9 Global Step: 409640 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:43:57,164-Speed 2625.97 samples/sec Loss 6.7441 LearningRate 0.0256 Epoch: 9 Global Step: 409650 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:44:01,056-Speed 2631.73 samples/sec Loss 6.7628 LearningRate 0.0256 Epoch: 9 Global Step: 409660 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:44:04,954-Speed 2627.44 samples/sec Loss 6.8309 LearningRate 0.0256 Epoch: 9 Global Step: 409670 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:44:08,847-Speed 2631.54 samples/sec Loss 6.7581 LearningRate 0.0256 Epoch: 9 Global Step: 409680 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:44:12,740-Speed 2631.00 samples/sec Loss 6.7747 LearningRate 0.0256 Epoch: 9 Global Step: 409690 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:44:16,642-Speed 2624.91 samples/sec Loss 6.7618 LearningRate 0.0256 Epoch: 9 Global Step: 409700 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:44:20,543-Speed 2625.68 samples/sec Loss 6.7141 LearningRate 0.0256 Epoch: 9 Global Step: 409710 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:44:24,443-Speed 2626.29 samples/sec Loss 6.7358 LearningRate 0.0256 Epoch: 9 Global Step: 409720 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:44:28,318-Speed 2642.86 samples/sec Loss 6.7577 LearningRate 0.0256 Epoch: 9 Global Step: 409730 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:44:32,216-Speed 2627.18 samples/sec Loss 6.8465 LearningRate 0.0256 Epoch: 9 Global Step: 409740 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:44:36,111-Speed 2630.22 samples/sec Loss 6.7717 LearningRate 0.0256 Epoch: 9 Global Step: 409750 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:44:40,013-Speed 2624.38 samples/sec Loss 6.7216 LearningRate 0.0256 Epoch: 9 Global Step: 409760 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:44:43,910-Speed 2629.09 samples/sec Loss 6.5616 LearningRate 0.0256 Epoch: 9 Global Step: 409770 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:44:47,815-Speed 2622.40 samples/sec Loss 6.7320 LearningRate 0.0256 Epoch: 9 Global Step: 409780 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:44:51,710-Speed 2629.78 samples/sec Loss 6.8301 LearningRate 0.0256 Epoch: 9 Global Step: 409790 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:44:55,607-Speed 2628.05 samples/sec Loss 6.7313 LearningRate 0.0256 Epoch: 9 Global Step: 409800 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:44:59,522-Speed 2616.62 samples/sec Loss 6.8266 LearningRate 0.0256 Epoch: 9 Global Step: 409810 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:45:03,423-Speed 2625.02 samples/sec Loss 6.6665 LearningRate 0.0256 Epoch: 9 Global Step: 409820 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:45:07,308-Speed 2636.96 samples/sec Loss 6.7994 LearningRate 0.0256 Epoch: 9 Global Step: 409830 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:45:11,201-Speed 2630.50 samples/sec Loss 6.8690 LearningRate 0.0256 Epoch: 9 Global Step: 409840 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:45:15,100-Speed 2627.15 samples/sec Loss 6.7415 LearningRate 0.0256 Epoch: 9 Global Step: 409850 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:45:18,997-Speed 2628.09 samples/sec Loss 6.7156 LearningRate 0.0256 Epoch: 9 Global Step: 409860 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:45:22,902-Speed 2623.30 samples/sec Loss 6.7726 LearningRate 0.0256 Epoch: 9 Global Step: 409870 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:45:26,805-Speed 2624.17 samples/sec Loss 6.7379 LearningRate 0.0256 Epoch: 9 Global Step: 409880 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:45:30,703-Speed 2627.15 samples/sec Loss 6.7083 LearningRate 0.0256 Epoch: 9 Global Step: 409890 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:45:34,599-Speed 2629.08 samples/sec Loss 6.7390 LearningRate 0.0256 Epoch: 9 Global Step: 409900 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:45:38,516-Speed 2615.22 samples/sec Loss 6.7402 LearningRate 0.0256 Epoch: 9 Global Step: 409910 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:45:42,412-Speed 2629.15 samples/sec Loss 6.6981 LearningRate 0.0256 Epoch: 9 Global Step: 409920 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:45:46,306-Speed 2629.98 samples/sec Loss 6.7530 LearningRate 0.0256 Epoch: 9 Global Step: 409930 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 17:45:50,210-Speed 2623.07 samples/sec Loss 6.7691 LearningRate 0.0256 Epoch: 9 Global Step: 409940 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 17:45:54,115-Speed 2623.37 samples/sec Loss 6.9012 LearningRate 0.0256 Epoch: 9 Global Step: 409950 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:45:58,204-Speed 2505.07 samples/sec Loss 6.7149 LearningRate 0.0256 Epoch: 9 Global Step: 409960 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:46:02,186-Speed 2572.02 samples/sec Loss 6.9023 LearningRate 0.0256 Epoch: 9 Global Step: 409970 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:46:06,081-Speed 2629.41 samples/sec Loss 6.7609 LearningRate 0.0256 Epoch: 9 Global Step: 409980 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:46:09,984-Speed 2624.32 samples/sec Loss 6.7807 LearningRate 0.0256 Epoch: 9 Global Step: 409990 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:46:13,881-Speed 2628.33 samples/sec Loss 6.6822 LearningRate 0.0256 Epoch: 9 Global Step: 410000 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:46:56,874-[lfw][410000]XNorm: 23.137810
Training: 2022-04-14 17:46:56,875-[lfw][410000]Accuracy-Flip: 0.99750+-0.00261
Training: 2022-04-14 17:46:56,876-[lfw][410000]Accuracy-Highest: 0.99783
Training: 2022-04-14 17:47:46,575-[cfp_fp][410000]XNorm: 21.697988
Training: 2022-04-14 17:47:46,576-[cfp_fp][410000]Accuracy-Flip: 0.98743+-0.00460
Training: 2022-04-14 17:47:46,577-[cfp_fp][410000]Accuracy-Highest: 0.98757
Training: 2022-04-14 17:48:30,144-[agedb_30][410000]XNorm: 23.512937
Training: 2022-04-14 17:48:30,144-[agedb_30][410000]Accuracy-Flip: 0.97567+-0.00667
Training: 2022-04-14 17:48:30,145-[agedb_30][410000]Accuracy-Highest: 0.97700
Training: 2022-04-14 17:48:34,010-Speed 73.08 samples/sec Loss 6.8507 LearningRate 0.0256 Epoch: 9 Global Step: 410010 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:48:37,879-Speed 2646.71 samples/sec Loss 6.7371 LearningRate 0.0256 Epoch: 9 Global Step: 410020 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:48:41,751-Speed 2645.30 samples/sec Loss 6.7200 LearningRate 0.0256 Epoch: 9 Global Step: 410030 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:48:45,626-Speed 2643.63 samples/sec Loss 6.9014 LearningRate 0.0256 Epoch: 9 Global Step: 410040 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:48:49,483-Speed 2655.29 samples/sec Loss 6.8184 LearningRate 0.0256 Epoch: 9 Global Step: 410050 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:48:53,362-Speed 2640.44 samples/sec Loss 6.7314 LearningRate 0.0256 Epoch: 9 Global Step: 410060 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:48:57,253-Speed 2633.18 samples/sec Loss 6.7465 LearningRate 0.0256 Epoch: 9 Global Step: 410070 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:49:01,133-Speed 2639.97 samples/sec Loss 6.7128 LearningRate 0.0256 Epoch: 9 Global Step: 410080 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:49:05,029-Speed 2628.61 samples/sec Loss 6.6678 LearningRate 0.0256 Epoch: 9 Global Step: 410090 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:49:08,912-Speed 2638.19 samples/sec Loss 6.8084 LearningRate 0.0256 Epoch: 9 Global Step: 410100 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:49:12,802-Speed 2633.03 samples/sec Loss 6.7267 LearningRate 0.0256 Epoch: 9 Global Step: 410110 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:49:16,695-Speed 2631.28 samples/sec Loss 6.7525 LearningRate 0.0256 Epoch: 9 Global Step: 410120 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:49:20,587-Speed 2631.35 samples/sec Loss 6.7245 LearningRate 0.0256 Epoch: 9 Global Step: 410130 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:49:24,476-Speed 2634.13 samples/sec Loss 6.8087 LearningRate 0.0256 Epoch: 9 Global Step: 410140 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:49:28,347-Speed 2645.27 samples/sec Loss 6.8889 LearningRate 0.0256 Epoch: 9 Global Step: 410150 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:49:32,246-Speed 2628.10 samples/sec Loss 6.7882 LearningRate 0.0256 Epoch: 9 Global Step: 410160 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:49:36,139-Speed 2630.40 samples/sec Loss 6.8911 LearningRate 0.0256 Epoch: 9 Global Step: 410170 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:49:40,047-Speed 2621.31 samples/sec Loss 6.7715 LearningRate 0.0256 Epoch: 9 Global Step: 410180 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:49:43,944-Speed 2627.92 samples/sec Loss 6.9588 LearningRate 0.0256 Epoch: 9 Global Step: 410190 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:49:47,838-Speed 2630.50 samples/sec Loss 6.7303 LearningRate 0.0256 Epoch: 9 Global Step: 410200 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:49:51,741-Speed 2624.05 samples/sec Loss 6.7005 LearningRate 0.0256 Epoch: 9 Global Step: 410210 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:49:55,612-Speed 2646.33 samples/sec Loss 6.6639 LearningRate 0.0256 Epoch: 9 Global Step: 410220 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:49:59,504-Speed 2631.12 samples/sec Loss 6.7632 LearningRate 0.0256 Epoch: 9 Global Step: 410230 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:50:03,398-Speed 2630.80 samples/sec Loss 6.7264 LearningRate 0.0256 Epoch: 9 Global Step: 410240 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:50:07,292-Speed 2631.21 samples/sec Loss 6.7399 LearningRate 0.0256 Epoch: 9 Global Step: 410250 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:50:11,202-Speed 2619.20 samples/sec Loss 6.7174 LearningRate 0.0255 Epoch: 9 Global Step: 410260 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:50:15,099-Speed 2628.78 samples/sec Loss 6.8469 LearningRate 0.0255 Epoch: 9 Global Step: 410270 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:50:18,995-Speed 2629.23 samples/sec Loss 6.7187 LearningRate 0.0255 Epoch: 9 Global Step: 410280 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:50:22,891-Speed 2628.36 samples/sec Loss 6.7469 LearningRate 0.0255 Epoch: 9 Global Step: 410290 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:50:26,790-Speed 2626.97 samples/sec Loss 6.6427 LearningRate 0.0255 Epoch: 9 Global Step: 410300 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:50:30,699-Speed 2620.47 samples/sec Loss 6.7555 LearningRate 0.0255 Epoch: 9 Global Step: 410310 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:50:34,722-Speed 2546.18 samples/sec Loss 6.7301 LearningRate 0.0255 Epoch: 9 Global Step: 410320 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:50:38,624-Speed 2625.24 samples/sec Loss 6.6465 LearningRate 0.0255 Epoch: 9 Global Step: 410330 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:50:42,524-Speed 2626.98 samples/sec Loss 6.8398 LearningRate 0.0255 Epoch: 9 Global Step: 410340 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:50:46,424-Speed 2626.51 samples/sec Loss 6.7758 LearningRate 0.0255 Epoch: 9 Global Step: 410350 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:50:50,355-Speed 2605.16 samples/sec Loss 6.6659 LearningRate 0.0255 Epoch: 9 Global Step: 410360 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:50:54,269-Speed 2617.29 samples/sec Loss 6.7666 LearningRate 0.0255 Epoch: 9 Global Step: 410370 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:50:58,169-Speed 2626.07 samples/sec Loss 6.7487 LearningRate 0.0255 Epoch: 9 Global Step: 410380 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:51:02,074-Speed 2623.03 samples/sec Loss 6.7657 LearningRate 0.0255 Epoch: 9 Global Step: 410390 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:51:06,012-Speed 2600.95 samples/sec Loss 6.7846 LearningRate 0.0255 Epoch: 9 Global Step: 410400 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:51:09,918-Speed 2622.38 samples/sec Loss 6.7284 LearningRate 0.0255 Epoch: 9 Global Step: 410410 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:51:13,822-Speed 2623.81 samples/sec Loss 6.7280 LearningRate 0.0255 Epoch: 9 Global Step: 410420 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 17:51:17,707-Speed 2636.91 samples/sec Loss 6.7120 LearningRate 0.0255 Epoch: 9 Global Step: 410430 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:51:21,609-Speed 2624.48 samples/sec Loss 6.7128 LearningRate 0.0255 Epoch: 9 Global Step: 410440 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:51:25,512-Speed 2624.49 samples/sec Loss 6.7397 LearningRate 0.0255 Epoch: 9 Global Step: 410450 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:51:29,416-Speed 2623.33 samples/sec Loss 6.8315 LearningRate 0.0255 Epoch: 9 Global Step: 410460 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:51:33,362-Speed 2595.57 samples/sec Loss 6.5880 LearningRate 0.0255 Epoch: 9 Global Step: 410470 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:51:37,365-Speed 2558.94 samples/sec Loss 6.6807 LearningRate 0.0255 Epoch: 9 Global Step: 410480 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:51:41,343-Speed 2575.21 samples/sec Loss 6.6225 LearningRate 0.0255 Epoch: 9 Global Step: 410490 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:51:45,288-Speed 2596.31 samples/sec Loss 6.7069 LearningRate 0.0255 Epoch: 9 Global Step: 410500 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:51:49,196-Speed 2620.59 samples/sec Loss 6.6549 LearningRate 0.0255 Epoch: 9 Global Step: 410510 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:51:53,097-Speed 2625.41 samples/sec Loss 6.8364 LearningRate 0.0255 Epoch: 9 Global Step: 410520 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:51:56,980-Speed 2638.27 samples/sec Loss 6.7145 LearningRate 0.0255 Epoch: 9 Global Step: 410530 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:52:00,885-Speed 2622.63 samples/sec Loss 6.6973 LearningRate 0.0255 Epoch: 9 Global Step: 410540 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:52:04,785-Speed 2626.81 samples/sec Loss 6.7224 LearningRate 0.0255 Epoch: 9 Global Step: 410550 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:52:08,694-Speed 2619.54 samples/sec Loss 6.6312 LearningRate 0.0255 Epoch: 9 Global Step: 410560 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:52:12,608-Speed 2617.60 samples/sec Loss 6.6001 LearningRate 0.0255 Epoch: 9 Global Step: 410570 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:52:16,509-Speed 2624.97 samples/sec Loss 6.7708 LearningRate 0.0255 Epoch: 9 Global Step: 410580 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:52:20,452-Speed 2597.96 samples/sec Loss 6.6993 LearningRate 0.0255 Epoch: 9 Global Step: 410590 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:52:24,352-Speed 2626.52 samples/sec Loss 6.7935 LearningRate 0.0255 Epoch: 9 Global Step: 410600 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:52:28,262-Speed 2619.88 samples/sec Loss 6.7906 LearningRate 0.0255 Epoch: 9 Global Step: 410610 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:52:32,189-Speed 2607.92 samples/sec Loss 6.7897 LearningRate 0.0255 Epoch: 9 Global Step: 410620 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:52:36,085-Speed 2629.25 samples/sec Loss 6.7612 LearningRate 0.0255 Epoch: 9 Global Step: 410630 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:52:39,990-Speed 2622.53 samples/sec Loss 6.7364 LearningRate 0.0255 Epoch: 9 Global Step: 410640 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:52:43,893-Speed 2624.85 samples/sec Loss 6.7569 LearningRate 0.0255 Epoch: 9 Global Step: 410650 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:52:47,795-Speed 2625.40 samples/sec Loss 6.7275 LearningRate 0.0255 Epoch: 9 Global Step: 410660 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:52:51,702-Speed 2621.09 samples/sec Loss 6.6506 LearningRate 0.0255 Epoch: 9 Global Step: 410670 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:52:55,607-Speed 2623.22 samples/sec Loss 6.7504 LearningRate 0.0255 Epoch: 9 Global Step: 410680 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:52:59,517-Speed 2619.86 samples/sec Loss 6.6850 LearningRate 0.0255 Epoch: 9 Global Step: 410690 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:53:03,434-Speed 2614.60 samples/sec Loss 6.6764 LearningRate 0.0255 Epoch: 9 Global Step: 410700 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:53:07,372-Speed 2601.01 samples/sec Loss 6.7491 LearningRate 0.0255 Epoch: 9 Global Step: 410710 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:53:11,334-Speed 2585.72 samples/sec Loss 6.6892 LearningRate 0.0255 Epoch: 9 Global Step: 410720 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:53:15,332-Speed 2562.46 samples/sec Loss 6.7626 LearningRate 0.0255 Epoch: 9 Global Step: 410730 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 17:53:19,238-Speed 2622.31 samples/sec Loss 6.7760 LearningRate 0.0255 Epoch: 9 Global Step: 410740 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:53:23,151-Speed 2617.68 samples/sec Loss 6.7255 LearningRate 0.0255 Epoch: 9 Global Step: 410750 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:53:27,141-Speed 2566.83 samples/sec Loss 6.8232 LearningRate 0.0255 Epoch: 9 Global Step: 410760 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:53:31,048-Speed 2622.11 samples/sec Loss 6.6733 LearningRate 0.0255 Epoch: 9 Global Step: 410770 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:53:34,961-Speed 2616.99 samples/sec Loss 6.8368 LearningRate 0.0255 Epoch: 9 Global Step: 410780 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:53:38,880-Speed 2613.48 samples/sec Loss 6.6723 LearningRate 0.0255 Epoch: 9 Global Step: 410790 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:53:42,787-Speed 2622.37 samples/sec Loss 6.7909 LearningRate 0.0255 Epoch: 9 Global Step: 410800 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:53:46,679-Speed 2631.65 samples/sec Loss 6.7439 LearningRate 0.0255 Epoch: 9 Global Step: 410810 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:53:50,581-Speed 2625.12 samples/sec Loss 6.6399 LearningRate 0.0255 Epoch: 9 Global Step: 410820 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:53:54,486-Speed 2622.47 samples/sec Loss 6.7125 LearningRate 0.0255 Epoch: 9 Global Step: 410830 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:53:58,388-Speed 2625.35 samples/sec Loss 6.7294 LearningRate 0.0255 Epoch: 9 Global Step: 410840 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:54:02,299-Speed 2619.02 samples/sec Loss 6.7369 LearningRate 0.0255 Epoch: 9 Global Step: 410850 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:54:06,233-Speed 2603.67 samples/sec Loss 6.8138 LearningRate 0.0255 Epoch: 9 Global Step: 410860 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:54:10,151-Speed 2614.20 samples/sec Loss 6.7552 LearningRate 0.0255 Epoch: 9 Global Step: 410870 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:54:14,066-Speed 2616.50 samples/sec Loss 6.6643 LearningRate 0.0255 Epoch: 9 Global Step: 410880 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:54:17,976-Speed 2619.53 samples/sec Loss 6.6348 LearningRate 0.0255 Epoch: 9 Global Step: 410890 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:54:21,889-Speed 2617.42 samples/sec Loss 6.9037 LearningRate 0.0255 Epoch: 9 Global Step: 410900 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:54:25,811-Speed 2612.32 samples/sec Loss 6.7455 LearningRate 0.0255 Epoch: 9 Global Step: 410910 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:54:29,757-Speed 2595.16 samples/sec Loss 6.6681 LearningRate 0.0255 Epoch: 9 Global Step: 410920 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:54:33,693-Speed 2602.89 samples/sec Loss 6.7718 LearningRate 0.0255 Epoch: 9 Global Step: 410930 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:54:37,846-Speed 2465.89 samples/sec Loss 6.7359 LearningRate 0.0255 Epoch: 9 Global Step: 410940 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:54:41,747-Speed 2625.84 samples/sec Loss 6.7536 LearningRate 0.0255 Epoch: 9 Global Step: 410950 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:54:45,653-Speed 2621.91 samples/sec Loss 6.7611 LearningRate 0.0255 Epoch: 9 Global Step: 410960 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:54:49,557-Speed 2623.84 samples/sec Loss 6.7442 LearningRate 0.0255 Epoch: 9 Global Step: 410970 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:54:53,460-Speed 2624.92 samples/sec Loss 6.8712 LearningRate 0.0255 Epoch: 9 Global Step: 410980 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:54:57,363-Speed 2623.89 samples/sec Loss 6.7948 LearningRate 0.0255 Epoch: 9 Global Step: 410990 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:55:01,285-Speed 2612.11 samples/sec Loss 6.7563 LearningRate 0.0255 Epoch: 9 Global Step: 411000 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:55:05,192-Speed 2621.53 samples/sec Loss 6.7447 LearningRate 0.0255 Epoch: 9 Global Step: 411010 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:55:09,108-Speed 2615.48 samples/sec Loss 6.7133 LearningRate 0.0255 Epoch: 9 Global Step: 411020 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:55:13,033-Speed 2609.80 samples/sec Loss 6.6876 LearningRate 0.0255 Epoch: 9 Global Step: 411030 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:55:16,936-Speed 2624.43 samples/sec Loss 6.6541 LearningRate 0.0255 Epoch: 9 Global Step: 411040 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:55:20,839-Speed 2623.90 samples/sec Loss 6.6652 LearningRate 0.0255 Epoch: 9 Global Step: 411050 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:55:24,738-Speed 2627.42 samples/sec Loss 6.6843 LearningRate 0.0255 Epoch: 9 Global Step: 411060 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:55:28,644-Speed 2622.35 samples/sec Loss 6.8288 LearningRate 0.0255 Epoch: 9 Global Step: 411070 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:55:32,550-Speed 2622.13 samples/sec Loss 6.5988 LearningRate 0.0254 Epoch: 9 Global Step: 411080 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:55:36,459-Speed 2620.13 samples/sec Loss 6.6970 LearningRate 0.0254 Epoch: 9 Global Step: 411090 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:55:40,366-Speed 2622.20 samples/sec Loss 6.6641 LearningRate 0.0254 Epoch: 9 Global Step: 411100 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:55:44,269-Speed 2625.25 samples/sec Loss 6.8192 LearningRate 0.0254 Epoch: 9 Global Step: 411110 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:55:48,168-Speed 2627.11 samples/sec Loss 6.6830 LearningRate 0.0254 Epoch: 9 Global Step: 411120 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:55:52,077-Speed 2619.98 samples/sec Loss 6.6504 LearningRate 0.0254 Epoch: 9 Global Step: 411130 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:55:55,969-Speed 2632.20 samples/sec Loss 6.7650 LearningRate 0.0254 Epoch: 9 Global Step: 411140 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:55:59,889-Speed 2612.72 samples/sec Loss 6.7622 LearningRate 0.0254 Epoch: 9 Global Step: 411150 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:56:03,800-Speed 2619.63 samples/sec Loss 6.6894 LearningRate 0.0254 Epoch: 9 Global Step: 411160 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:56:07,703-Speed 2623.78 samples/sec Loss 6.8569 LearningRate 0.0254 Epoch: 9 Global Step: 411170 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:56:11,610-Speed 2621.54 samples/sec Loss 6.6737 LearningRate 0.0254 Epoch: 9 Global Step: 411180 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:56:15,518-Speed 2620.76 samples/sec Loss 6.6191 LearningRate 0.0254 Epoch: 9 Global Step: 411190 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:56:19,431-Speed 2618.46 samples/sec Loss 6.8445 LearningRate 0.0254 Epoch: 9 Global Step: 411200 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:56:23,341-Speed 2619.68 samples/sec Loss 6.7747 LearningRate 0.0254 Epoch: 9 Global Step: 411210 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:56:27,244-Speed 2624.51 samples/sec Loss 6.8299 LearningRate 0.0254 Epoch: 9 Global Step: 411220 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:56:31,146-Speed 2624.58 samples/sec Loss 6.7548 LearningRate 0.0254 Epoch: 9 Global Step: 411230 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:56:35,036-Speed 2633.29 samples/sec Loss 6.7869 LearningRate 0.0254 Epoch: 9 Global Step: 411240 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:56:38,963-Speed 2608.06 samples/sec Loss 6.7762 LearningRate 0.0254 Epoch: 9 Global Step: 411250 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:56:42,902-Speed 2603.01 samples/sec Loss 6.8376 LearningRate 0.0254 Epoch: 9 Global Step: 411260 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:56:46,808-Speed 2621.97 samples/sec Loss 6.8333 LearningRate 0.0254 Epoch: 9 Global Step: 411270 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:56:50,715-Speed 2621.90 samples/sec Loss 6.7707 LearningRate 0.0254 Epoch: 9 Global Step: 411280 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:56:54,623-Speed 2620.81 samples/sec Loss 6.7844 LearningRate 0.0254 Epoch: 9 Global Step: 411290 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:56:58,531-Speed 2621.11 samples/sec Loss 6.8089 LearningRate 0.0254 Epoch: 9 Global Step: 411300 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:57:02,471-Speed 2598.88 samples/sec Loss 6.6909 LearningRate 0.0254 Epoch: 9 Global Step: 411310 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:57:06,377-Speed 2623.03 samples/sec Loss 6.7911 LearningRate 0.0254 Epoch: 9 Global Step: 411320 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:57:10,356-Speed 2574.18 samples/sec Loss 6.7112 LearningRate 0.0254 Epoch: 9 Global Step: 411330 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:57:14,284-Speed 2607.85 samples/sec Loss 6.7218 LearningRate 0.0254 Epoch: 9 Global Step: 411340 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 17:57:18,182-Speed 2628.17 samples/sec Loss 6.7405 LearningRate 0.0254 Epoch: 9 Global Step: 411350 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 17:57:22,077-Speed 2629.37 samples/sec Loss 6.7714 LearningRate 0.0254 Epoch: 9 Global Step: 411360 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:57:25,985-Speed 2621.28 samples/sec Loss 6.7917 LearningRate 0.0254 Epoch: 9 Global Step: 411370 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:57:29,886-Speed 2625.21 samples/sec Loss 6.7267 LearningRate 0.0254 Epoch: 9 Global Step: 411380 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:57:33,794-Speed 2620.70 samples/sec Loss 6.6313 LearningRate 0.0254 Epoch: 9 Global Step: 411390 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:57:37,699-Speed 2623.13 samples/sec Loss 6.7299 LearningRate 0.0254 Epoch: 9 Global Step: 411400 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:57:41,610-Speed 2619.61 samples/sec Loss 6.7374 LearningRate 0.0254 Epoch: 9 Global Step: 411410 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:57:45,522-Speed 2617.49 samples/sec Loss 6.7937 LearningRate 0.0254 Epoch: 9 Global Step: 411420 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:57:49,429-Speed 2622.17 samples/sec Loss 6.6241 LearningRate 0.0254 Epoch: 9 Global Step: 411430 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:57:53,388-Speed 2587.41 samples/sec Loss 6.6853 LearningRate 0.0254 Epoch: 9 Global Step: 411440 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:57:57,292-Speed 2623.38 samples/sec Loss 6.7568 LearningRate 0.0254 Epoch: 9 Global Step: 411450 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:58:01,193-Speed 2625.73 samples/sec Loss 6.8264 LearningRate 0.0254 Epoch: 9 Global Step: 411460 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:58:05,101-Speed 2620.91 samples/sec Loss 6.7102 LearningRate 0.0254 Epoch: 9 Global Step: 411470 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:58:09,006-Speed 2623.11 samples/sec Loss 6.7250 LearningRate 0.0254 Epoch: 9 Global Step: 411480 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:58:12,916-Speed 2619.56 samples/sec Loss 6.7203 LearningRate 0.0254 Epoch: 9 Global Step: 411490 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:58:16,824-Speed 2620.45 samples/sec Loss 6.6990 LearningRate 0.0254 Epoch: 9 Global Step: 411500 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:58:20,748-Speed 2610.85 samples/sec Loss 6.6309 LearningRate 0.0254 Epoch: 9 Global Step: 411510 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:58:24,698-Speed 2592.58 samples/sec Loss 6.8297 LearningRate 0.0254 Epoch: 9 Global Step: 411520 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:58:28,602-Speed 2623.98 samples/sec Loss 6.7019 LearningRate 0.0254 Epoch: 9 Global Step: 411530 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:58:32,510-Speed 2620.82 samples/sec Loss 6.6447 LearningRate 0.0254 Epoch: 9 Global Step: 411540 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:58:36,423-Speed 2617.52 samples/sec Loss 6.6940 LearningRate 0.0254 Epoch: 9 Global Step: 411550 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:58:40,341-Speed 2613.89 samples/sec Loss 6.6333 LearningRate 0.0254 Epoch: 9 Global Step: 411560 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:58:44,244-Speed 2625.14 samples/sec Loss 6.6434 LearningRate 0.0254 Epoch: 9 Global Step: 411570 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:58:48,145-Speed 2625.59 samples/sec Loss 6.8457 LearningRate 0.0254 Epoch: 9 Global Step: 411580 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:58:52,066-Speed 2612.54 samples/sec Loss 6.6998 LearningRate 0.0254 Epoch: 9 Global Step: 411590 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:58:55,964-Speed 2627.71 samples/sec Loss 6.7683 LearningRate 0.0254 Epoch: 9 Global Step: 411600 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:58:59,885-Speed 2612.05 samples/sec Loss 6.8534 LearningRate 0.0254 Epoch: 9 Global Step: 411610 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 17:59:03,780-Speed 2629.31 samples/sec Loss 6.7526 LearningRate 0.0254 Epoch: 9 Global Step: 411620 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 17:59:07,674-Speed 2630.23 samples/sec Loss 6.6544 LearningRate 0.0254 Epoch: 9 Global Step: 411630 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:59:11,582-Speed 2621.54 samples/sec Loss 6.6791 LearningRate 0.0254 Epoch: 9 Global Step: 411640 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:59:15,499-Speed 2614.62 samples/sec Loss 6.7490 LearningRate 0.0254 Epoch: 9 Global Step: 411650 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:59:19,406-Speed 2621.57 samples/sec Loss 6.6926 LearningRate 0.0254 Epoch: 9 Global Step: 411660 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:59:23,310-Speed 2623.11 samples/sec Loss 6.8180 LearningRate 0.0254 Epoch: 9 Global Step: 411670 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:59:27,229-Speed 2613.99 samples/sec Loss 6.7712 LearningRate 0.0254 Epoch: 9 Global Step: 411680 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:59:31,137-Speed 2621.29 samples/sec Loss 6.8055 LearningRate 0.0254 Epoch: 9 Global Step: 411690 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:59:35,063-Speed 2608.65 samples/sec Loss 6.7486 LearningRate 0.0254 Epoch: 9 Global Step: 411700 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:59:38,999-Speed 2602.10 samples/sec Loss 6.7296 LearningRate 0.0254 Epoch: 9 Global Step: 411710 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:59:42,910-Speed 2619.40 samples/sec Loss 6.6944 LearningRate 0.0254 Epoch: 9 Global Step: 411720 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:59:46,819-Speed 2620.01 samples/sec Loss 6.8634 LearningRate 0.0254 Epoch: 9 Global Step: 411730 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 17:59:50,705-Speed 2635.65 samples/sec Loss 6.8445 LearningRate 0.0254 Epoch: 9 Global Step: 411740 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 17:59:54,599-Speed 2630.54 samples/sec Loss 6.7760 LearningRate 0.0254 Epoch: 9 Global Step: 411750 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 17:59:58,506-Speed 2621.93 samples/sec Loss 6.6399 LearningRate 0.0254 Epoch: 9 Global Step: 411760 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:00:02,410-Speed 2623.81 samples/sec Loss 6.6546 LearningRate 0.0254 Epoch: 9 Global Step: 411770 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:00:06,310-Speed 2625.78 samples/sec Loss 6.7103 LearningRate 0.0254 Epoch: 9 Global Step: 411780 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:00:10,249-Speed 2600.45 samples/sec Loss 6.7667 LearningRate 0.0254 Epoch: 9 Global Step: 411790 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:00:14,148-Speed 2626.87 samples/sec Loss 6.7053 LearningRate 0.0254 Epoch: 9 Global Step: 411800 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:00:18,056-Speed 2620.66 samples/sec Loss 6.7414 LearningRate 0.0254 Epoch: 9 Global Step: 411810 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:00:21,963-Speed 2621.46 samples/sec Loss 6.6897 LearningRate 0.0254 Epoch: 9 Global Step: 411820 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:00:25,889-Speed 2609.20 samples/sec Loss 6.7861 LearningRate 0.0254 Epoch: 9 Global Step: 411830 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:00:29,794-Speed 2623.19 samples/sec Loss 6.7127 LearningRate 0.0254 Epoch: 9 Global Step: 411840 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:00:33,698-Speed 2623.57 samples/sec Loss 6.6517 LearningRate 0.0254 Epoch: 9 Global Step: 411850 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:00:37,602-Speed 2623.58 samples/sec Loss 6.6912 LearningRate 0.0254 Epoch: 9 Global Step: 411860 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:00:41,504-Speed 2624.89 samples/sec Loss 6.7557 LearningRate 0.0254 Epoch: 9 Global Step: 411870 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:00:45,406-Speed 2625.13 samples/sec Loss 6.8106 LearningRate 0.0254 Epoch: 9 Global Step: 411880 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:00:49,359-Speed 2592.91 samples/sec Loss 6.7476 LearningRate 0.0254 Epoch: 9 Global Step: 411890 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:00:53,265-Speed 2622.71 samples/sec Loss 6.7915 LearningRate 0.0253 Epoch: 9 Global Step: 411900 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:00:57,178-Speed 2617.88 samples/sec Loss 6.7233 LearningRate 0.0253 Epoch: 9 Global Step: 411910 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:01:01,136-Speed 2588.32 samples/sec Loss 6.7523 LearningRate 0.0253 Epoch: 9 Global Step: 411920 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:01:05,048-Speed 2618.01 samples/sec Loss 6.7073 LearningRate 0.0253 Epoch: 9 Global Step: 411930 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:01:08,968-Speed 2612.73 samples/sec Loss 6.7269 LearningRate 0.0253 Epoch: 9 Global Step: 411940 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:01:12,875-Speed 2621.39 samples/sec Loss 6.6673 LearningRate 0.0253 Epoch: 9 Global Step: 411950 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:01:16,783-Speed 2621.02 samples/sec Loss 6.7558 LearningRate 0.0253 Epoch: 9 Global Step: 411960 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:01:20,733-Speed 2593.37 samples/sec Loss 6.7428 LearningRate 0.0253 Epoch: 9 Global Step: 411970 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:01:24,642-Speed 2620.65 samples/sec Loss 6.7174 LearningRate 0.0253 Epoch: 9 Global Step: 411980 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:01:28,549-Speed 2621.16 samples/sec Loss 6.6189 LearningRate 0.0253 Epoch: 9 Global Step: 411990 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:01:32,468-Speed 2613.91 samples/sec Loss 6.7359 LearningRate 0.0253 Epoch: 9 Global Step: 412000 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:01:36,376-Speed 2621.02 samples/sec Loss 6.7317 LearningRate 0.0253 Epoch: 9 Global Step: 412010 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:01:40,280-Speed 2623.79 samples/sec Loss 6.6956 LearningRate 0.0253 Epoch: 9 Global Step: 412020 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:01:44,183-Speed 2623.60 samples/sec Loss 6.8070 LearningRate 0.0253 Epoch: 9 Global Step: 412030 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:01:48,181-Speed 2562.42 samples/sec Loss 6.7657 LearningRate 0.0253 Epoch: 9 Global Step: 412040 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:01:52,087-Speed 2622.70 samples/sec Loss 6.6833 LearningRate 0.0253 Epoch: 9 Global Step: 412050 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:01:56,008-Speed 2612.10 samples/sec Loss 6.9167 LearningRate 0.0253 Epoch: 9 Global Step: 412060 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:01:59,917-Speed 2620.15 samples/sec Loss 6.8015 LearningRate 0.0253 Epoch: 9 Global Step: 412070 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:02:03,820-Speed 2624.74 samples/sec Loss 6.7822 LearningRate 0.0253 Epoch: 9 Global Step: 412080 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:02:07,726-Speed 2621.76 samples/sec Loss 6.7046 LearningRate 0.0253 Epoch: 9 Global Step: 412090 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:02:11,636-Speed 2619.97 samples/sec Loss 6.6741 LearningRate 0.0253 Epoch: 9 Global Step: 412100 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:02:15,539-Speed 2623.76 samples/sec Loss 6.7822 LearningRate 0.0253 Epoch: 9 Global Step: 412110 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:02:19,442-Speed 2624.96 samples/sec Loss 6.7154 LearningRate 0.0253 Epoch: 9 Global Step: 412120 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:02:23,345-Speed 2623.84 samples/sec Loss 6.8286 LearningRate 0.0253 Epoch: 9 Global Step: 412130 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:02:27,248-Speed 2624.12 samples/sec Loss 6.8025 LearningRate 0.0253 Epoch: 9 Global Step: 412140 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:02:31,151-Speed 2624.49 samples/sec Loss 6.7890 LearningRate 0.0253 Epoch: 9 Global Step: 412150 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:02:35,056-Speed 2622.79 samples/sec Loss 6.7663 LearningRate 0.0253 Epoch: 9 Global Step: 412160 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:02:38,962-Speed 2622.28 samples/sec Loss 6.7481 LearningRate 0.0253 Epoch: 9 Global Step: 412170 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:02:42,870-Speed 2620.71 samples/sec Loss 6.7010 LearningRate 0.0253 Epoch: 9 Global Step: 412180 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:02:46,784-Speed 2617.43 samples/sec Loss 6.7358 LearningRate 0.0253 Epoch: 9 Global Step: 412190 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:02:50,693-Speed 2620.36 samples/sec Loss 6.7395 LearningRate 0.0253 Epoch: 9 Global Step: 412200 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:02:54,597-Speed 2623.65 samples/sec Loss 6.7245 LearningRate 0.0253 Epoch: 9 Global Step: 412210 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:02:58,502-Speed 2622.85 samples/sec Loss 6.7317 LearningRate 0.0253 Epoch: 9 Global Step: 412220 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:03:02,410-Speed 2620.43 samples/sec Loss 6.6811 LearningRate 0.0253 Epoch: 9 Global Step: 412230 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:03:06,334-Speed 2610.14 samples/sec Loss 6.7483 LearningRate 0.0253 Epoch: 9 Global Step: 412240 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:03:10,238-Speed 2624.48 samples/sec Loss 6.7724 LearningRate 0.0253 Epoch: 9 Global Step: 412250 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:03:14,144-Speed 2622.38 samples/sec Loss 6.7083 LearningRate 0.0253 Epoch: 9 Global Step: 412260 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:03:18,044-Speed 2626.09 samples/sec Loss 6.6850 LearningRate 0.0253 Epoch: 9 Global Step: 412270 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:03:21,970-Speed 2609.05 samples/sec Loss 6.6905 LearningRate 0.0253 Epoch: 9 Global Step: 412280 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:03:25,871-Speed 2625.47 samples/sec Loss 6.6021 LearningRate 0.0253 Epoch: 9 Global Step: 412290 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:03:29,780-Speed 2620.69 samples/sec Loss 6.7683 LearningRate 0.0253 Epoch: 9 Global Step: 412300 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:03:33,683-Speed 2624.25 samples/sec Loss 6.6758 LearningRate 0.0253 Epoch: 9 Global Step: 412310 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:03:37,585-Speed 2624.29 samples/sec Loss 6.7213 LearningRate 0.0253 Epoch: 9 Global Step: 412320 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:03:41,487-Speed 2625.48 samples/sec Loss 6.8182 LearningRate 0.0253 Epoch: 9 Global Step: 412330 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:03:45,393-Speed 2622.27 samples/sec Loss 6.6813 LearningRate 0.0253 Epoch: 9 Global Step: 412340 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:03:49,298-Speed 2623.12 samples/sec Loss 6.6290 LearningRate 0.0253 Epoch: 9 Global Step: 412350 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:03:53,203-Speed 2622.79 samples/sec Loss 6.6776 LearningRate 0.0253 Epoch: 9 Global Step: 412360 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:03:57,110-Speed 2622.14 samples/sec Loss 6.7348 LearningRate 0.0253 Epoch: 9 Global Step: 412370 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:04:01,039-Speed 2606.71 samples/sec Loss 6.6142 LearningRate 0.0253 Epoch: 9 Global Step: 412380 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:04:04,959-Speed 2613.05 samples/sec Loss 6.6824 LearningRate 0.0253 Epoch: 9 Global Step: 412390 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:04:08,865-Speed 2622.39 samples/sec Loss 6.6196 LearningRate 0.0253 Epoch: 9 Global Step: 412400 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:04:12,771-Speed 2622.32 samples/sec Loss 6.6470 LearningRate 0.0253 Epoch: 9 Global Step: 412410 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:04:16,672-Speed 2625.51 samples/sec Loss 6.7792 LearningRate 0.0253 Epoch: 9 Global Step: 412420 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:04:20,558-Speed 2635.54 samples/sec Loss 6.6895 LearningRate 0.0253 Epoch: 9 Global Step: 412430 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:04:24,467-Speed 2620.75 samples/sec Loss 6.6099 LearningRate 0.0253 Epoch: 9 Global Step: 412440 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:04:28,383-Speed 2615.97 samples/sec Loss 6.7633 LearningRate 0.0253 Epoch: 9 Global Step: 412450 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:04:32,288-Speed 2622.29 samples/sec Loss 6.7401 LearningRate 0.0253 Epoch: 9 Global Step: 412460 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:04:36,191-Speed 2624.18 samples/sec Loss 6.6650 LearningRate 0.0253 Epoch: 9 Global Step: 412470 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:04:40,098-Speed 2621.60 samples/sec Loss 6.7805 LearningRate 0.0253 Epoch: 9 Global Step: 412480 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:04:43,997-Speed 2627.16 samples/sec Loss 6.7751 LearningRate 0.0253 Epoch: 9 Global Step: 412490 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:04:47,901-Speed 2623.42 samples/sec Loss 6.8531 LearningRate 0.0253 Epoch: 9 Global Step: 412500 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:04:51,832-Speed 2606.25 samples/sec Loss 6.7463 LearningRate 0.0253 Epoch: 9 Global Step: 412510 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:04:55,755-Speed 2611.12 samples/sec Loss 6.6961 LearningRate 0.0253 Epoch: 9 Global Step: 412520 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:04:59,665-Speed 2620.01 samples/sec Loss 6.8572 LearningRate 0.0253 Epoch: 9 Global Step: 412530 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:05:03,578-Speed 2617.14 samples/sec Loss 6.7378 LearningRate 0.0253 Epoch: 9 Global Step: 412540 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:05:07,544-Speed 2582.60 samples/sec Loss 6.7732 LearningRate 0.0253 Epoch: 9 Global Step: 412550 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:05:11,453-Speed 2620.14 samples/sec Loss 6.7328 LearningRate 0.0253 Epoch: 9 Global Step: 412560 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:05:15,357-Speed 2623.64 samples/sec Loss 6.7678 LearningRate 0.0253 Epoch: 9 Global Step: 412570 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:05:19,377-Speed 2548.41 samples/sec Loss 6.6101 LearningRate 0.0253 Epoch: 9 Global Step: 412580 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:05:23,307-Speed 2606.30 samples/sec Loss 6.5974 LearningRate 0.0253 Epoch: 9 Global Step: 412590 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:05:27,209-Speed 2625.11 samples/sec Loss 6.7341 LearningRate 0.0253 Epoch: 9 Global Step: 412600 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:05:31,147-Speed 2601.41 samples/sec Loss 6.7143 LearningRate 0.0253 Epoch: 9 Global Step: 412610 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:05:35,192-Speed 2531.90 samples/sec Loss 6.7935 LearningRate 0.0253 Epoch: 9 Global Step: 412620 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:05:39,135-Speed 2597.62 samples/sec Loss 6.6572 LearningRate 0.0253 Epoch: 9 Global Step: 412630 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 18:05:43,022-Speed 2635.07 samples/sec Loss 6.7849 LearningRate 0.0253 Epoch: 9 Global Step: 412640 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:05:46,932-Speed 2619.23 samples/sec Loss 6.7297 LearningRate 0.0253 Epoch: 9 Global Step: 412650 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:05:50,842-Speed 2620.25 samples/sec Loss 6.6558 LearningRate 0.0253 Epoch: 9 Global Step: 412660 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:05:54,747-Speed 2622.83 samples/sec Loss 6.7159 LearningRate 0.0253 Epoch: 9 Global Step: 412670 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:05:58,648-Speed 2625.69 samples/sec Loss 6.6714 LearningRate 0.0253 Epoch: 9 Global Step: 412680 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:06:02,541-Speed 2630.97 samples/sec Loss 6.6507 LearningRate 0.0253 Epoch: 9 Global Step: 412690 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:06:06,456-Speed 2616.46 samples/sec Loss 6.7068 LearningRate 0.0253 Epoch: 9 Global Step: 412700 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:06:10,355-Speed 2626.42 samples/sec Loss 6.7523 LearningRate 0.0253 Epoch: 9 Global Step: 412710 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:06:14,255-Speed 2626.21 samples/sec Loss 6.7187 LearningRate 0.0253 Epoch: 9 Global Step: 412720 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:06:18,158-Speed 2624.59 samples/sec Loss 6.7684 LearningRate 0.0252 Epoch: 9 Global Step: 412730 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:06:22,064-Speed 2622.34 samples/sec Loss 6.6633 LearningRate 0.0252 Epoch: 9 Global Step: 412740 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:06:25,968-Speed 2623.57 samples/sec Loss 6.6897 LearningRate 0.0252 Epoch: 9 Global Step: 412750 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:06:29,917-Speed 2593.77 samples/sec Loss 6.7300 LearningRate 0.0252 Epoch: 9 Global Step: 412760 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:06:33,820-Speed 2623.80 samples/sec Loss 6.7029 LearningRate 0.0252 Epoch: 9 Global Step: 412770 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:06:37,725-Speed 2622.70 samples/sec Loss 6.7306 LearningRate 0.0252 Epoch: 9 Global Step: 412780 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:06:41,634-Speed 2620.48 samples/sec Loss 6.8247 LearningRate 0.0252 Epoch: 9 Global Step: 412790 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:06:45,537-Speed 2623.72 samples/sec Loss 6.7021 LearningRate 0.0252 Epoch: 9 Global Step: 412800 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:06:49,442-Speed 2623.36 samples/sec Loss 6.7117 LearningRate 0.0252 Epoch: 9 Global Step: 412810 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:06:53,344-Speed 2624.69 samples/sec Loss 6.7419 LearningRate 0.0252 Epoch: 9 Global Step: 412820 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:06:57,243-Speed 2626.90 samples/sec Loss 6.7661 LearningRate 0.0252 Epoch: 9 Global Step: 412830 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:07:01,147-Speed 2623.96 samples/sec Loss 6.6393 LearningRate 0.0252 Epoch: 9 Global Step: 412840 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:07:05,049-Speed 2624.84 samples/sec Loss 6.7364 LearningRate 0.0252 Epoch: 9 Global Step: 412850 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:07:08,951-Speed 2624.45 samples/sec Loss 6.7923 LearningRate 0.0252 Epoch: 9 Global Step: 412860 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:07:12,833-Speed 2638.50 samples/sec Loss 6.8355 LearningRate 0.0252 Epoch: 9 Global Step: 412870 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:07:16,733-Speed 2626.12 samples/sec Loss 6.6916 LearningRate 0.0252 Epoch: 9 Global Step: 412880 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:07:20,642-Speed 2620.53 samples/sec Loss 6.8828 LearningRate 0.0252 Epoch: 9 Global Step: 412890 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:07:24,553-Speed 2620.47 samples/sec Loss 6.6669 LearningRate 0.0252 Epoch: 9 Global Step: 412900 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:07:28,460-Speed 2621.41 samples/sec Loss 6.6708 LearningRate 0.0252 Epoch: 9 Global Step: 412910 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:07:32,366-Speed 2622.59 samples/sec Loss 6.7430 LearningRate 0.0252 Epoch: 9 Global Step: 412920 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:07:36,271-Speed 2623.10 samples/sec Loss 6.6153 LearningRate 0.0252 Epoch: 9 Global Step: 412930 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:07:40,171-Speed 2626.33 samples/sec Loss 6.6562 LearningRate 0.0252 Epoch: 9 Global Step: 412940 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:07:44,071-Speed 2625.87 samples/sec Loss 6.6936 LearningRate 0.0252 Epoch: 9 Global Step: 412950 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:07:47,981-Speed 2619.45 samples/sec Loss 6.7266 LearningRate 0.0252 Epoch: 9 Global Step: 412960 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:07:51,890-Speed 2620.05 samples/sec Loss 6.7240 LearningRate 0.0252 Epoch: 9 Global Step: 412970 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:07:55,788-Speed 2627.67 samples/sec Loss 6.6067 LearningRate 0.0252 Epoch: 9 Global Step: 412980 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:07:59,783-Speed 2563.79 samples/sec Loss 6.8009 LearningRate 0.0252 Epoch: 9 Global Step: 412990 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:08:03,732-Speed 2594.59 samples/sec Loss 6.7324 LearningRate 0.0252 Epoch: 9 Global Step: 413000 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:08:07,636-Speed 2623.57 samples/sec Loss 6.7081 LearningRate 0.0252 Epoch: 9 Global Step: 413010 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:08:11,542-Speed 2621.80 samples/sec Loss 6.8706 LearningRate 0.0252 Epoch: 9 Global Step: 413020 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:08:15,465-Speed 2611.33 samples/sec Loss 6.6386 LearningRate 0.0252 Epoch: 9 Global Step: 413030 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:08:19,349-Speed 2637.25 samples/sec Loss 6.6816 LearningRate 0.0252 Epoch: 9 Global Step: 413040 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:08:23,263-Speed 2616.03 samples/sec Loss 6.6917 LearningRate 0.0252 Epoch: 9 Global Step: 413050 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:08:27,162-Speed 2627.20 samples/sec Loss 6.7144 LearningRate 0.0252 Epoch: 9 Global Step: 413060 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:08:31,074-Speed 2618.27 samples/sec Loss 6.6354 LearningRate 0.0252 Epoch: 9 Global Step: 413070 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:08:34,968-Speed 2629.96 samples/sec Loss 6.6476 LearningRate 0.0252 Epoch: 9 Global Step: 413080 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:08:38,864-Speed 2628.91 samples/sec Loss 6.6501 LearningRate 0.0252 Epoch: 9 Global Step: 413090 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:08:42,783-Speed 2613.07 samples/sec Loss 6.7644 LearningRate 0.0252 Epoch: 9 Global Step: 413100 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:08:46,686-Speed 2624.57 samples/sec Loss 6.7023 LearningRate 0.0252 Epoch: 9 Global Step: 413110 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:08:50,595-Speed 2620.31 samples/sec Loss 6.6550 LearningRate 0.0252 Epoch: 9 Global Step: 413120 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:08:54,507-Speed 2618.80 samples/sec Loss 6.6900 LearningRate 0.0252 Epoch: 9 Global Step: 413130 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:08:58,409-Speed 2624.75 samples/sec Loss 6.6032 LearningRate 0.0252 Epoch: 9 Global Step: 413140 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:09:02,311-Speed 2624.52 samples/sec Loss 6.6027 LearningRate 0.0252 Epoch: 9 Global Step: 413150 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:09:06,213-Speed 2624.64 samples/sec Loss 6.7350 LearningRate 0.0252 Epoch: 9 Global Step: 413160 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:09:10,111-Speed 2627.85 samples/sec Loss 6.6227 LearningRate 0.0252 Epoch: 9 Global Step: 413170 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:09:14,039-Speed 2607.24 samples/sec Loss 6.6361 LearningRate 0.0252 Epoch: 9 Global Step: 413180 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:09:17,967-Speed 2607.66 samples/sec Loss 6.7690 LearningRate 0.0252 Epoch: 9 Global Step: 413190 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:09:21,880-Speed 2617.39 samples/sec Loss 6.7507 LearningRate 0.0252 Epoch: 9 Global Step: 413200 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:09:25,789-Speed 2620.26 samples/sec Loss 6.6622 LearningRate 0.0252 Epoch: 9 Global Step: 413210 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:09:29,699-Speed 2619.59 samples/sec Loss 6.6756 LearningRate 0.0252 Epoch: 9 Global Step: 413220 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:09:33,597-Speed 2627.32 samples/sec Loss 6.7303 LearningRate 0.0252 Epoch: 9 Global Step: 413230 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:09:37,487-Speed 2632.87 samples/sec Loss 6.7211 LearningRate 0.0252 Epoch: 9 Global Step: 413240 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:09:41,392-Speed 2623.04 samples/sec Loss 6.6472 LearningRate 0.0252 Epoch: 9 Global Step: 413250 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:09:45,294-Speed 2624.91 samples/sec Loss 6.6544 LearningRate 0.0252 Epoch: 9 Global Step: 413260 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:09:49,200-Speed 2622.59 samples/sec Loss 6.7455 LearningRate 0.0252 Epoch: 9 Global Step: 413270 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:09:53,105-Speed 2622.68 samples/sec Loss 6.6834 LearningRate 0.0252 Epoch: 9 Global Step: 413280 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:09:57,019-Speed 2616.54 samples/sec Loss 6.7895 LearningRate 0.0252 Epoch: 9 Global Step: 413290 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:10:00,926-Speed 2621.72 samples/sec Loss 6.6455 LearningRate 0.0252 Epoch: 9 Global Step: 413300 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:10:04,831-Speed 2622.67 samples/sec Loss 6.6921 LearningRate 0.0252 Epoch: 9 Global Step: 413310 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:10:08,732-Speed 2625.81 samples/sec Loss 6.6944 LearningRate 0.0252 Epoch: 9 Global Step: 413320 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:10:12,645-Speed 2617.27 samples/sec Loss 6.6984 LearningRate 0.0252 Epoch: 9 Global Step: 413330 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:10:16,534-Speed 2634.08 samples/sec Loss 6.7632 LearningRate 0.0252 Epoch: 9 Global Step: 413340 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:10:20,437-Speed 2624.12 samples/sec Loss 6.6642 LearningRate 0.0252 Epoch: 9 Global Step: 413350 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:10:24,345-Speed 2620.58 samples/sec Loss 6.7622 LearningRate 0.0252 Epoch: 9 Global Step: 413360 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:10:28,247-Speed 2625.24 samples/sec Loss 6.6950 LearningRate 0.0252 Epoch: 9 Global Step: 413370 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:10:32,151-Speed 2623.13 samples/sec Loss 6.7247 LearningRate 0.0252 Epoch: 9 Global Step: 413380 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:10:36,033-Speed 2638.51 samples/sec Loss 6.7372 LearningRate 0.0252 Epoch: 9 Global Step: 413390 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:10:39,932-Speed 2627.23 samples/sec Loss 6.7905 LearningRate 0.0252 Epoch: 9 Global Step: 413400 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:10:43,827-Speed 2629.73 samples/sec Loss 6.8020 LearningRate 0.0252 Epoch: 9 Global Step: 413410 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:10:47,725-Speed 2627.78 samples/sec Loss 6.6949 LearningRate 0.0252 Epoch: 9 Global Step: 413420 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:10:51,638-Speed 2617.07 samples/sec Loss 6.7420 LearningRate 0.0252 Epoch: 9 Global Step: 413430 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:10:55,537-Speed 2627.06 samples/sec Loss 6.6209 LearningRate 0.0252 Epoch: 9 Global Step: 413440 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:10:59,435-Speed 2627.89 samples/sec Loss 6.6876 LearningRate 0.0252 Epoch: 9 Global Step: 413450 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:11:03,331-Speed 2628.59 samples/sec Loss 6.5864 LearningRate 0.0252 Epoch: 9 Global Step: 413460 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:11:07,254-Speed 2610.76 samples/sec Loss 6.7672 LearningRate 0.0252 Epoch: 9 Global Step: 413470 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:11:11,169-Speed 2615.92 samples/sec Loss 6.8403 LearningRate 0.0252 Epoch: 9 Global Step: 413480 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:11:15,062-Speed 2631.17 samples/sec Loss 6.6864 LearningRate 0.0252 Epoch: 9 Global Step: 413490 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:11:18,962-Speed 2626.85 samples/sec Loss 6.6949 LearningRate 0.0252 Epoch: 9 Global Step: 413500 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:11:22,867-Speed 2622.78 samples/sec Loss 6.6864 LearningRate 0.0252 Epoch: 9 Global Step: 413510 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:11:26,767-Speed 2626.48 samples/sec Loss 6.7666 LearningRate 0.0252 Epoch: 9 Global Step: 413520 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:11:30,664-Speed 2627.91 samples/sec Loss 6.6115 LearningRate 0.0252 Epoch: 9 Global Step: 413530 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:11:34,559-Speed 2629.60 samples/sec Loss 6.6454 LearningRate 0.0252 Epoch: 9 Global Step: 413540 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:11:38,460-Speed 2625.80 samples/sec Loss 6.5986 LearningRate 0.0251 Epoch: 9 Global Step: 413550 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:11:42,365-Speed 2622.61 samples/sec Loss 6.7439 LearningRate 0.0251 Epoch: 9 Global Step: 413560 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:11:46,264-Speed 2626.79 samples/sec Loss 6.7798 LearningRate 0.0251 Epoch: 9 Global Step: 413570 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:11:50,160-Speed 2628.99 samples/sec Loss 6.8014 LearningRate 0.0251 Epoch: 9 Global Step: 413580 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:11:54,065-Speed 2622.60 samples/sec Loss 6.7503 LearningRate 0.0251 Epoch: 9 Global Step: 413590 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 18:11:57,953-Speed 2635.14 samples/sec Loss 6.6768 LearningRate 0.0251 Epoch: 9 Global Step: 413600 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:12:01,851-Speed 2627.27 samples/sec Loss 6.6445 LearningRate 0.0251 Epoch: 9 Global Step: 413610 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:12:05,749-Speed 2627.97 samples/sec Loss 6.8428 LearningRate 0.0251 Epoch: 9 Global Step: 413620 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:12:09,650-Speed 2624.89 samples/sec Loss 6.5654 LearningRate 0.0251 Epoch: 9 Global Step: 413630 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:12:13,552-Speed 2625.16 samples/sec Loss 6.6323 LearningRate 0.0251 Epoch: 9 Global Step: 413640 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:12:17,456-Speed 2623.84 samples/sec Loss 6.7542 LearningRate 0.0251 Epoch: 9 Global Step: 413650 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:12:21,360-Speed 2622.93 samples/sec Loss 6.7136 LearningRate 0.0251 Epoch: 9 Global Step: 413660 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:12:25,269-Speed 2620.74 samples/sec Loss 6.7469 LearningRate 0.0251 Epoch: 9 Global Step: 413670 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:12:29,175-Speed 2622.03 samples/sec Loss 6.5907 LearningRate 0.0251 Epoch: 9 Global Step: 413680 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:12:33,084-Speed 2620.09 samples/sec Loss 6.7610 LearningRate 0.0251 Epoch: 9 Global Step: 413690 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:12:36,988-Speed 2623.43 samples/sec Loss 6.6238 LearningRate 0.0251 Epoch: 9 Global Step: 413700 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 18:12:40,864-Speed 2642.89 samples/sec Loss 6.8517 LearningRate 0.0251 Epoch: 9 Global Step: 413710 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:12:44,762-Speed 2627.42 samples/sec Loss 6.7094 LearningRate 0.0251 Epoch: 9 Global Step: 413720 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:12:48,665-Speed 2625.30 samples/sec Loss 6.6643 LearningRate 0.0251 Epoch: 9 Global Step: 413730 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:12:52,569-Speed 2623.19 samples/sec Loss 6.6790 LearningRate 0.0251 Epoch: 9 Global Step: 413740 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:12:56,467-Speed 2627.64 samples/sec Loss 6.7444 LearningRate 0.0251 Epoch: 9 Global Step: 413750 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:13:00,366-Speed 2626.90 samples/sec Loss 6.7288 LearningRate 0.0251 Epoch: 9 Global Step: 413760 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:13:04,269-Speed 2624.36 samples/sec Loss 6.7508 LearningRate 0.0251 Epoch: 9 Global Step: 413770 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:13:08,193-Speed 2610.14 samples/sec Loss 6.6893 LearningRate 0.0251 Epoch: 9 Global Step: 413780 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:13:12,094-Speed 2625.97 samples/sec Loss 6.6517 LearningRate 0.0251 Epoch: 9 Global Step: 413790 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:13:15,989-Speed 2629.39 samples/sec Loss 6.5641 LearningRate 0.0251 Epoch: 9 Global Step: 413800 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:13:19,887-Speed 2627.54 samples/sec Loss 6.5845 LearningRate 0.0251 Epoch: 9 Global Step: 413810 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 18:13:23,745-Speed 2654.65 samples/sec Loss 6.6234 LearningRate 0.0251 Epoch: 9 Global Step: 413820 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:13:27,719-Speed 2577.01 samples/sec Loss 6.5585 LearningRate 0.0251 Epoch: 9 Global Step: 413830 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:13:31,692-Speed 2578.57 samples/sec Loss 6.5711 LearningRate 0.0251 Epoch: 9 Global Step: 413840 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:13:35,586-Speed 2629.76 samples/sec Loss 6.7340 LearningRate 0.0251 Epoch: 9 Global Step: 413850 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:13:39,495-Speed 2620.11 samples/sec Loss 6.7690 LearningRate 0.0251 Epoch: 9 Global Step: 413860 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:13:43,408-Speed 2617.90 samples/sec Loss 6.6212 LearningRate 0.0251 Epoch: 9 Global Step: 413870 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:13:47,307-Speed 2626.74 samples/sec Loss 6.6695 LearningRate 0.0251 Epoch: 9 Global Step: 413880 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:13:51,209-Speed 2625.15 samples/sec Loss 6.7628 LearningRate 0.0251 Epoch: 9 Global Step: 413890 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:13:55,101-Speed 2631.87 samples/sec Loss 6.8098 LearningRate 0.0251 Epoch: 9 Global Step: 413900 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:13:58,997-Speed 2628.93 samples/sec Loss 6.6288 LearningRate 0.0251 Epoch: 9 Global Step: 413910 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:14:02,893-Speed 2628.92 samples/sec Loss 6.7317 LearningRate 0.0251 Epoch: 9 Global Step: 413920 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:14:06,788-Speed 2629.66 samples/sec Loss 6.6745 LearningRate 0.0251 Epoch: 9 Global Step: 413930 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:14:10,698-Speed 2619.73 samples/sec Loss 6.6606 LearningRate 0.0251 Epoch: 9 Global Step: 413940 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:14:14,605-Speed 2621.08 samples/sec Loss 6.6811 LearningRate 0.0251 Epoch: 9 Global Step: 413950 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:14:18,506-Speed 2625.19 samples/sec Loss 6.6437 LearningRate 0.0251 Epoch: 9 Global Step: 413960 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:14:22,384-Speed 2641.08 samples/sec Loss 6.8417 LearningRate 0.0251 Epoch: 9 Global Step: 413970 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:14:26,281-Speed 2628.72 samples/sec Loss 6.5650 LearningRate 0.0251 Epoch: 9 Global Step: 413980 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:14:30,180-Speed 2627.36 samples/sec Loss 6.7767 LearningRate 0.0251 Epoch: 9 Global Step: 413990 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:14:34,080-Speed 2625.79 samples/sec Loss 6.6887 LearningRate 0.0251 Epoch: 9 Global Step: 414000 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:14:37,993-Speed 2617.51 samples/sec Loss 6.6870 LearningRate 0.0251 Epoch: 9 Global Step: 414010 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:14:41,889-Speed 2628.92 samples/sec Loss 6.5180 LearningRate 0.0251 Epoch: 9 Global Step: 414020 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:14:45,790-Speed 2625.50 samples/sec Loss 6.6518 LearningRate 0.0251 Epoch: 9 Global Step: 414030 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:14:49,692-Speed 2624.77 samples/sec Loss 6.7457 LearningRate 0.0251 Epoch: 9 Global Step: 414040 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:14:53,589-Speed 2628.42 samples/sec Loss 6.7420 LearningRate 0.0251 Epoch: 9 Global Step: 414050 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:14:57,502-Speed 2617.36 samples/sec Loss 6.7052 LearningRate 0.0251 Epoch: 9 Global Step: 414060 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:15:01,500-Speed 2561.95 samples/sec Loss 6.6923 LearningRate 0.0251 Epoch: 9 Global Step: 414070 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:15:05,402-Speed 2624.74 samples/sec Loss 6.7383 LearningRate 0.0251 Epoch: 9 Global Step: 414080 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:15:09,307-Speed 2623.39 samples/sec Loss 6.7165 LearningRate 0.0251 Epoch: 9 Global Step: 414090 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:15:13,203-Speed 2628.81 samples/sec Loss 6.6974 LearningRate 0.0251 Epoch: 9 Global Step: 414100 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:15:17,109-Speed 2621.71 samples/sec Loss 6.7904 LearningRate 0.0251 Epoch: 9 Global Step: 414110 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:15:21,002-Speed 2630.97 samples/sec Loss 6.7335 LearningRate 0.0251 Epoch: 9 Global Step: 414120 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:15:24,908-Speed 2622.66 samples/sec Loss 6.7842 LearningRate 0.0251 Epoch: 9 Global Step: 414130 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:15:28,804-Speed 2629.46 samples/sec Loss 6.7567 LearningRate 0.0251 Epoch: 9 Global Step: 414140 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:15:32,699-Speed 2629.12 samples/sec Loss 6.7082 LearningRate 0.0251 Epoch: 9 Global Step: 414150 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:15:36,598-Speed 2626.64 samples/sec Loss 6.7100 LearningRate 0.0251 Epoch: 9 Global Step: 414160 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:15:40,495-Speed 2628.10 samples/sec Loss 6.7293 LearningRate 0.0251 Epoch: 9 Global Step: 414170 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 18:15:44,382-Speed 2635.18 samples/sec Loss 6.7219 LearningRate 0.0251 Epoch: 9 Global Step: 414180 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:15:48,281-Speed 2627.57 samples/sec Loss 6.7135 LearningRate 0.0251 Epoch: 9 Global Step: 414190 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:15:52,196-Speed 2616.09 samples/sec Loss 6.7049 LearningRate 0.0251 Epoch: 9 Global Step: 414200 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:15:56,104-Speed 2620.43 samples/sec Loss 6.5370 LearningRate 0.0251 Epoch: 9 Global Step: 414210 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:16:00,016-Speed 2618.71 samples/sec Loss 6.5930 LearningRate 0.0251 Epoch: 9 Global Step: 414220 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:16:03,918-Speed 2624.76 samples/sec Loss 6.8286 LearningRate 0.0251 Epoch: 9 Global Step: 414230 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:16:07,840-Speed 2611.26 samples/sec Loss 6.7025 LearningRate 0.0251 Epoch: 9 Global Step: 414240 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:16:11,745-Speed 2622.76 samples/sec Loss 6.6693 LearningRate 0.0251 Epoch: 9 Global Step: 414250 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:16:15,664-Speed 2613.40 samples/sec Loss 6.8120 LearningRate 0.0251 Epoch: 9 Global Step: 414260 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:16:19,542-Speed 2641.02 samples/sec Loss 6.6262 LearningRate 0.0251 Epoch: 9 Global Step: 414270 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:16:23,452-Speed 2619.85 samples/sec Loss 6.7175 LearningRate 0.0251 Epoch: 9 Global Step: 414280 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:16:27,354-Speed 2624.65 samples/sec Loss 6.6788 LearningRate 0.0251 Epoch: 9 Global Step: 414290 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:16:31,269-Speed 2616.24 samples/sec Loss 6.6065 LearningRate 0.0251 Epoch: 9 Global Step: 414300 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:16:35,173-Speed 2623.51 samples/sec Loss 6.5991 LearningRate 0.0251 Epoch: 9 Global Step: 414310 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:16:39,073-Speed 2626.18 samples/sec Loss 6.6846 LearningRate 0.0251 Epoch: 9 Global Step: 414320 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:16:42,992-Speed 2613.37 samples/sec Loss 6.5908 LearningRate 0.0251 Epoch: 9 Global Step: 414330 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:16:46,888-Speed 2628.69 samples/sec Loss 6.6741 LearningRate 0.0251 Epoch: 9 Global Step: 414340 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:16:50,814-Speed 2608.93 samples/sec Loss 6.6204 LearningRate 0.0251 Epoch: 9 Global Step: 414350 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:16:54,736-Speed 2611.75 samples/sec Loss 6.5965 LearningRate 0.0251 Epoch: 9 Global Step: 414360 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:16:58,639-Speed 2624.19 samples/sec Loss 6.7445 LearningRate 0.0251 Epoch: 9 Global Step: 414370 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:17:02,555-Speed 2615.71 samples/sec Loss 6.7108 LearningRate 0.0250 Epoch: 9 Global Step: 414380 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:17:06,459-Speed 2623.25 samples/sec Loss 6.6858 LearningRate 0.0250 Epoch: 9 Global Step: 414390 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:17:10,367-Speed 2620.69 samples/sec Loss 6.6747 LearningRate 0.0250 Epoch: 9 Global Step: 414400 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:17:14,276-Speed 2620.46 samples/sec Loss 6.7741 LearningRate 0.0250 Epoch: 9 Global Step: 414410 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:17:18,173-Speed 2628.35 samples/sec Loss 6.6673 LearningRate 0.0250 Epoch: 9 Global Step: 414420 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:17:22,078-Speed 2622.93 samples/sec Loss 6.7688 LearningRate 0.0250 Epoch: 9 Global Step: 414430 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:17:25,960-Speed 2638.20 samples/sec Loss 6.6524 LearningRate 0.0250 Epoch: 9 Global Step: 414440 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:17:29,861-Speed 2625.66 samples/sec Loss 6.6150 LearningRate 0.0250 Epoch: 9 Global Step: 414450 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:17:33,769-Speed 2620.79 samples/sec Loss 6.6135 LearningRate 0.0250 Epoch: 9 Global Step: 414460 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:17:37,674-Speed 2622.53 samples/sec Loss 6.5843 LearningRate 0.0250 Epoch: 9 Global Step: 414470 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:17:41,572-Speed 2627.78 samples/sec Loss 6.6313 LearningRate 0.0250 Epoch: 9 Global Step: 414480 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:17:45,470-Speed 2627.71 samples/sec Loss 6.6590 LearningRate 0.0250 Epoch: 9 Global Step: 414490 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:17:49,369-Speed 2626.90 samples/sec Loss 6.5704 LearningRate 0.0250 Epoch: 9 Global Step: 414500 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:17:53,277-Speed 2621.09 samples/sec Loss 6.6903 LearningRate 0.0250 Epoch: 9 Global Step: 414510 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:17:57,194-Speed 2614.87 samples/sec Loss 6.5944 LearningRate 0.0250 Epoch: 9 Global Step: 414520 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:18:01,084-Speed 2633.09 samples/sec Loss 6.7016 LearningRate 0.0250 Epoch: 9 Global Step: 414530 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:18:05,004-Speed 2612.72 samples/sec Loss 6.8025 LearningRate 0.0250 Epoch: 9 Global Step: 414540 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:18:08,879-Speed 2642.96 samples/sec Loss 6.6949 LearningRate 0.0250 Epoch: 9 Global Step: 414550 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:18:12,777-Speed 2627.61 samples/sec Loss 6.7361 LearningRate 0.0250 Epoch: 9 Global Step: 414560 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:18:16,677-Speed 2626.14 samples/sec Loss 6.7119 LearningRate 0.0250 Epoch: 9 Global Step: 414570 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:18:20,578-Speed 2625.72 samples/sec Loss 6.6936 LearningRate 0.0250 Epoch: 9 Global Step: 414580 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:18:24,478-Speed 2626.15 samples/sec Loss 6.7140 LearningRate 0.0250 Epoch: 9 Global Step: 414590 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:18:28,370-Speed 2631.58 samples/sec Loss 6.7420 LearningRate 0.0250 Epoch: 9 Global Step: 414600 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:18:32,278-Speed 2621.51 samples/sec Loss 6.7499 LearningRate 0.0250 Epoch: 9 Global Step: 414610 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:18:36,173-Speed 2629.46 samples/sec Loss 6.7276 LearningRate 0.0250 Epoch: 9 Global Step: 414620 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:18:40,070-Speed 2627.89 samples/sec Loss 6.6017 LearningRate 0.0250 Epoch: 9 Global Step: 414630 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:18:43,968-Speed 2627.50 samples/sec Loss 6.6959 LearningRate 0.0250 Epoch: 9 Global Step: 414640 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:18:47,865-Speed 2628.34 samples/sec Loss 6.7079 LearningRate 0.0250 Epoch: 9 Global Step: 414650 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:18:51,769-Speed 2623.53 samples/sec Loss 6.8654 LearningRate 0.0250 Epoch: 9 Global Step: 414660 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:18:55,665-Speed 2628.91 samples/sec Loss 6.8122 LearningRate 0.0250 Epoch: 9 Global Step: 414670 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:18:59,561-Speed 2628.79 samples/sec Loss 6.6107 LearningRate 0.0250 Epoch: 9 Global Step: 414680 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:19:03,431-Speed 2646.76 samples/sec Loss 6.7077 LearningRate 0.0250 Epoch: 9 Global Step: 414690 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:19:07,322-Speed 2631.74 samples/sec Loss 6.6601 LearningRate 0.0250 Epoch: 9 Global Step: 414700 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:19:11,222-Speed 2626.79 samples/sec Loss 6.7466 LearningRate 0.0250 Epoch: 9 Global Step: 414710 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:19:15,114-Speed 2631.83 samples/sec Loss 6.6423 LearningRate 0.0250 Epoch: 9 Global Step: 414720 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:19:19,009-Speed 2629.70 samples/sec Loss 6.6232 LearningRate 0.0250 Epoch: 9 Global Step: 414730 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:19:22,929-Speed 2612.22 samples/sec Loss 6.7433 LearningRate 0.0250 Epoch: 9 Global Step: 414740 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:19:26,832-Speed 2624.73 samples/sec Loss 6.7222 LearningRate 0.0250 Epoch: 9 Global Step: 414750 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:19:30,746-Speed 2616.70 samples/sec Loss 6.7550 LearningRate 0.0250 Epoch: 9 Global Step: 414760 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:19:34,640-Speed 2630.44 samples/sec Loss 6.6741 LearningRate 0.0250 Epoch: 9 Global Step: 414770 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:19:38,550-Speed 2619.84 samples/sec Loss 6.7562 LearningRate 0.0250 Epoch: 9 Global Step: 414780 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:19:42,443-Speed 2630.77 samples/sec Loss 6.6415 LearningRate 0.0250 Epoch: 9 Global Step: 414790 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:20:03,911-Speed 477.00 samples/sec Loss 6.6136 LearningRate 0.0250 Epoch: 10 Global Step: 414800 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:20:07,798-Speed 2635.51 samples/sec Loss 6.7046 LearningRate 0.0250 Epoch: 10 Global Step: 414810 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:20:11,659-Speed 2653.07 samples/sec Loss 6.6818 LearningRate 0.0250 Epoch: 10 Global Step: 414820 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:20:15,540-Speed 2638.98 samples/sec Loss 6.8428 LearningRate 0.0250 Epoch: 10 Global Step: 414830 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:20:19,438-Speed 2627.78 samples/sec Loss 6.7264 LearningRate 0.0250 Epoch: 10 Global Step: 414840 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:20:23,322-Speed 2637.23 samples/sec Loss 6.7687 LearningRate 0.0250 Epoch: 10 Global Step: 414850 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:20:27,212-Speed 2632.99 samples/sec Loss 6.6144 LearningRate 0.0250 Epoch: 10 Global Step: 414860 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:20:31,102-Speed 2633.17 samples/sec Loss 6.6380 LearningRate 0.0250 Epoch: 10 Global Step: 414870 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:20:34,995-Speed 2630.86 samples/sec Loss 6.6934 LearningRate 0.0250 Epoch: 10 Global Step: 414880 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:20:38,884-Speed 2634.00 samples/sec Loss 6.7166 LearningRate 0.0250 Epoch: 10 Global Step: 414890 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:20:42,782-Speed 2627.52 samples/sec Loss 6.6576 LearningRate 0.0250 Epoch: 10 Global Step: 414900 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:20:46,723-Speed 2599.12 samples/sec Loss 6.7692 LearningRate 0.0250 Epoch: 10 Global Step: 414910 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:20:50,612-Speed 2633.34 samples/sec Loss 6.7700 LearningRate 0.0250 Epoch: 10 Global Step: 414920 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:20:54,516-Speed 2623.82 samples/sec Loss 6.7246 LearningRate 0.0250 Epoch: 10 Global Step: 414930 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:20:58,410-Speed 2630.25 samples/sec Loss 6.6584 LearningRate 0.0250 Epoch: 10 Global Step: 414940 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:21:02,309-Speed 2626.87 samples/sec Loss 6.7072 LearningRate 0.0250 Epoch: 10 Global Step: 414950 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:21:06,217-Speed 2621.28 samples/sec Loss 6.6743 LearningRate 0.0250 Epoch: 10 Global Step: 414960 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:21:10,122-Speed 2622.62 samples/sec Loss 6.5546 LearningRate 0.0250 Epoch: 10 Global Step: 414970 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:21:14,025-Speed 2624.71 samples/sec Loss 6.6345 LearningRate 0.0250 Epoch: 10 Global Step: 414980 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:21:18,039-Speed 2551.60 samples/sec Loss 6.7359 LearningRate 0.0250 Epoch: 10 Global Step: 414990 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:21:22,167-Speed 2481.24 samples/sec Loss 6.6081 LearningRate 0.0250 Epoch: 10 Global Step: 415000 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:21:26,120-Speed 2591.32 samples/sec Loss 6.6717 LearningRate 0.0250 Epoch: 10 Global Step: 415010 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:21:30,018-Speed 2627.80 samples/sec Loss 6.7200 LearningRate 0.0250 Epoch: 10 Global Step: 415020 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:21:33,919-Speed 2625.01 samples/sec Loss 6.7822 LearningRate 0.0250 Epoch: 10 Global Step: 415030 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:21:37,827-Speed 2621.34 samples/sec Loss 6.7206 LearningRate 0.0250 Epoch: 10 Global Step: 415040 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:21:41,729-Speed 2624.64 samples/sec Loss 6.6701 LearningRate 0.0250 Epoch: 10 Global Step: 415050 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:21:45,630-Speed 2626.52 samples/sec Loss 6.5811 LearningRate 0.0250 Epoch: 10 Global Step: 415060 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:21:49,533-Speed 2624.07 samples/sec Loss 6.7182 LearningRate 0.0250 Epoch: 10 Global Step: 415070 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:21:53,432-Speed 2626.79 samples/sec Loss 6.7163 LearningRate 0.0250 Epoch: 10 Global Step: 415080 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:21:57,328-Speed 2629.17 samples/sec Loss 6.7075 LearningRate 0.0250 Epoch: 10 Global Step: 415090 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:22:01,224-Speed 2628.69 samples/sec Loss 6.6656 LearningRate 0.0250 Epoch: 10 Global Step: 415100 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:22:05,119-Speed 2629.91 samples/sec Loss 6.6575 LearningRate 0.0250 Epoch: 10 Global Step: 415110 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:22:09,019-Speed 2626.73 samples/sec Loss 6.6409 LearningRate 0.0250 Epoch: 10 Global Step: 415120 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:22:12,973-Speed 2590.61 samples/sec Loss 6.8056 LearningRate 0.0250 Epoch: 10 Global Step: 415130 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:22:16,876-Speed 2624.57 samples/sec Loss 6.5898 LearningRate 0.0250 Epoch: 10 Global Step: 415140 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:22:20,784-Speed 2620.91 samples/sec Loss 6.7034 LearningRate 0.0250 Epoch: 10 Global Step: 415150 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:22:24,717-Speed 2603.65 samples/sec Loss 6.7823 LearningRate 0.0250 Epoch: 10 Global Step: 415160 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:22:28,632-Speed 2616.71 samples/sec Loss 6.6345 LearningRate 0.0250 Epoch: 10 Global Step: 415170 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:22:32,528-Speed 2628.78 samples/sec Loss 6.6079 LearningRate 0.0250 Epoch: 10 Global Step: 415180 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:22:36,433-Speed 2623.39 samples/sec Loss 6.5879 LearningRate 0.0250 Epoch: 10 Global Step: 415190 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:22:40,329-Speed 2628.62 samples/sec Loss 6.6128 LearningRate 0.0250 Epoch: 10 Global Step: 415200 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:22:44,228-Speed 2627.47 samples/sec Loss 6.6556 LearningRate 0.0249 Epoch: 10 Global Step: 415210 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 18:22:48,172-Speed 2596.95 samples/sec Loss 6.6111 LearningRate 0.0249 Epoch: 10 Global Step: 415220 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 18:22:52,085-Speed 2617.59 samples/sec Loss 6.5421 LearningRate 0.0249 Epoch: 10 Global Step: 415230 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:22:56,035-Speed 2593.18 samples/sec Loss 6.6298 LearningRate 0.0249 Epoch: 10 Global Step: 415240 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:22:59,914-Speed 2640.30 samples/sec Loss 6.5906 LearningRate 0.0249 Epoch: 10 Global Step: 415250 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:23:03,812-Speed 2627.59 samples/sec Loss 6.7154 LearningRate 0.0249 Epoch: 10 Global Step: 415260 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:23:07,734-Speed 2611.85 samples/sec Loss 6.6078 LearningRate 0.0249 Epoch: 10 Global Step: 415270 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:23:11,627-Speed 2630.89 samples/sec Loss 6.5437 LearningRate 0.0249 Epoch: 10 Global Step: 415280 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:23:15,524-Speed 2628.94 samples/sec Loss 6.6054 LearningRate 0.0249 Epoch: 10 Global Step: 415290 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:23:19,421-Speed 2628.10 samples/sec Loss 6.6635 LearningRate 0.0249 Epoch: 10 Global Step: 415300 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:23:23,315-Speed 2630.27 samples/sec Loss 6.6438 LearningRate 0.0249 Epoch: 10 Global Step: 415310 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:23:27,219-Speed 2623.30 samples/sec Loss 6.6085 LearningRate 0.0249 Epoch: 10 Global Step: 415320 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:23:31,145-Speed 2609.87 samples/sec Loss 6.4879 LearningRate 0.0249 Epoch: 10 Global Step: 415330 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:23:35,049-Speed 2623.05 samples/sec Loss 6.7571 LearningRate 0.0249 Epoch: 10 Global Step: 415340 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:23:38,986-Speed 2602.32 samples/sec Loss 6.6994 LearningRate 0.0249 Epoch: 10 Global Step: 415350 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:23:42,933-Speed 2594.83 samples/sec Loss 6.6087 LearningRate 0.0249 Epoch: 10 Global Step: 415360 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:23:46,856-Speed 2611.20 samples/sec Loss 6.7698 LearningRate 0.0249 Epoch: 10 Global Step: 415370 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:23:50,750-Speed 2630.39 samples/sec Loss 6.6361 LearningRate 0.0249 Epoch: 10 Global Step: 415380 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:23:54,659-Speed 2620.22 samples/sec Loss 6.7704 LearningRate 0.0249 Epoch: 10 Global Step: 415390 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:23:58,562-Speed 2624.13 samples/sec Loss 6.6950 LearningRate 0.0249 Epoch: 10 Global Step: 415400 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:24:02,463-Speed 2625.52 samples/sec Loss 6.7378 LearningRate 0.0249 Epoch: 10 Global Step: 415410 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:24:06,361-Speed 2628.14 samples/sec Loss 6.5895 LearningRate 0.0249 Epoch: 10 Global Step: 415420 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:24:10,259-Speed 2627.51 samples/sec Loss 6.7282 LearningRate 0.0249 Epoch: 10 Global Step: 415430 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:24:14,159-Speed 2626.34 samples/sec Loss 6.5110 LearningRate 0.0249 Epoch: 10 Global Step: 415440 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:24:18,059-Speed 2626.38 samples/sec Loss 6.6229 LearningRate 0.0249 Epoch: 10 Global Step: 415450 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 18:24:21,954-Speed 2629.54 samples/sec Loss 6.5709 LearningRate 0.0249 Epoch: 10 Global Step: 415460 Fp16 Grad Scale: 262144 Required: 47 hours
Training: 2022-04-14 18:24:25,833-Speed 2640.10 samples/sec Loss 6.5086 LearningRate 0.0249 Epoch: 10 Global Step: 415470 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:24:29,728-Speed 2630.59 samples/sec Loss 6.6290 LearningRate 0.0249 Epoch: 10 Global Step: 415480 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:24:33,610-Speed 2638.00 samples/sec Loss 6.6571 LearningRate 0.0249 Epoch: 10 Global Step: 415490 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:24:37,504-Speed 2630.75 samples/sec Loss 6.6028 LearningRate 0.0249 Epoch: 10 Global Step: 415500 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:24:41,410-Speed 2621.99 samples/sec Loss 6.5420 LearningRate 0.0249 Epoch: 10 Global Step: 415510 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:24:45,307-Speed 2628.73 samples/sec Loss 6.6470 LearningRate 0.0249 Epoch: 10 Global Step: 415520 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:24:49,206-Speed 2626.88 samples/sec Loss 6.5743 LearningRate 0.0249 Epoch: 10 Global Step: 415530 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:24:53,118-Speed 2618.80 samples/sec Loss 6.6906 LearningRate 0.0249 Epoch: 10 Global Step: 415540 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:24:57,012-Speed 2630.36 samples/sec Loss 6.7334 LearningRate 0.0249 Epoch: 10 Global Step: 415550 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:25:00,925-Speed 2617.12 samples/sec Loss 6.5558 LearningRate 0.0249 Epoch: 10 Global Step: 415560 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:25:04,824-Speed 2627.52 samples/sec Loss 6.6484 LearningRate 0.0249 Epoch: 10 Global Step: 415570 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:25:08,718-Speed 2630.38 samples/sec Loss 6.6741 LearningRate 0.0249 Epoch: 10 Global Step: 415580 Fp16 Grad Scale: 65536 Required: 47 hours
Training: 2022-04-14 18:25:12,614-Speed 2628.50 samples/sec Loss 6.6756 LearningRate 0.0249 Epoch: 10 Global Step: 415590 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:25:16,633-Speed 2548.52 samples/sec Loss 6.6100 LearningRate 0.0249 Epoch: 10 Global Step: 415600 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:25:20,533-Speed 2626.38 samples/sec Loss 6.6018 LearningRate 0.0249 Epoch: 10 Global Step: 415610 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:25:24,427-Speed 2630.95 samples/sec Loss 6.6890 LearningRate 0.0249 Epoch: 10 Global Step: 415620 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:25:28,331-Speed 2622.94 samples/sec Loss 6.7876 LearningRate 0.0249 Epoch: 10 Global Step: 415630 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:25:32,226-Speed 2630.23 samples/sec Loss 6.6955 LearningRate 0.0249 Epoch: 10 Global Step: 415640 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:25:36,123-Speed 2628.23 samples/sec Loss 6.6654 LearningRate 0.0249 Epoch: 10 Global Step: 415650 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:25:40,074-Speed 2592.50 samples/sec Loss 6.6587 LearningRate 0.0249 Epoch: 10 Global Step: 415660 Fp16 Grad Scale: 131072 Required: 47 hours
Training: 2022-04-14 18:25:43,969-Speed 2629.90 samples/sec Loss 6.6755 LearningRate 0.0249 Epoch: 10 Global Step: 415670 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:25:47,884-Speed 2615.99 samples/sec Loss 6.7907 LearningRate 0.0249 Epoch: 10 Global Step: 415680 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:25:51,790-Speed 2622.36 samples/sec Loss 6.7129 LearningRate 0.0249 Epoch: 10 Global Step: 415690 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:25:55,691-Speed 2626.10 samples/sec Loss 6.6212 LearningRate 0.0249 Epoch: 10 Global Step: 415700 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:25:59,595-Speed 2623.51 samples/sec Loss 6.6619 LearningRate 0.0249 Epoch: 10 Global Step: 415710 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:26:03,499-Speed 2623.14 samples/sec Loss 6.7675 LearningRate 0.0249 Epoch: 10 Global Step: 415720 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:26:07,399-Speed 2626.10 samples/sec Loss 6.6359 LearningRate 0.0249 Epoch: 10 Global Step: 415730 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:26:11,302-Speed 2624.90 samples/sec Loss 6.6521 LearningRate 0.0249 Epoch: 10 Global Step: 415740 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:26:15,195-Speed 2631.02 samples/sec Loss 6.6767 LearningRate 0.0249 Epoch: 10 Global Step: 415750 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:26:19,110-Speed 2616.75 samples/sec Loss 6.6509 LearningRate 0.0249 Epoch: 10 Global Step: 415760 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:26:23,015-Speed 2622.62 samples/sec Loss 6.6780 LearningRate 0.0249 Epoch: 10 Global Step: 415770 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:26:26,941-Speed 2609.10 samples/sec Loss 6.5860 LearningRate 0.0249 Epoch: 10 Global Step: 415780 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:26:30,866-Speed 2609.38 samples/sec Loss 6.7352 LearningRate 0.0249 Epoch: 10 Global Step: 415790 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:26:34,775-Speed 2620.58 samples/sec Loss 6.6400 LearningRate 0.0249 Epoch: 10 Global Step: 415800 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:26:38,688-Speed 2617.46 samples/sec Loss 6.5194 LearningRate 0.0249 Epoch: 10 Global Step: 415810 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:26:42,588-Speed 2626.60 samples/sec Loss 6.6929 LearningRate 0.0249 Epoch: 10 Global Step: 415820 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:26:46,487-Speed 2626.69 samples/sec Loss 6.6300 LearningRate 0.0249 Epoch: 10 Global Step: 415830 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:26:50,376-Speed 2634.15 samples/sec Loss 6.5883 LearningRate 0.0249 Epoch: 10 Global Step: 415840 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:26:54,366-Speed 2567.02 samples/sec Loss 6.6801 LearningRate 0.0249 Epoch: 10 Global Step: 415850 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:26:58,298-Speed 2604.60 samples/sec Loss 6.6002 LearningRate 0.0249 Epoch: 10 Global Step: 415860 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:27:02,214-Speed 2615.64 samples/sec Loss 6.6339 LearningRate 0.0249 Epoch: 10 Global Step: 415870 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:27:06,111-Speed 2628.23 samples/sec Loss 6.7221 LearningRate 0.0249 Epoch: 10 Global Step: 415880 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:27:10,011-Speed 2626.77 samples/sec Loss 6.6368 LearningRate 0.0249 Epoch: 10 Global Step: 415890 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:27:13,910-Speed 2626.92 samples/sec Loss 6.6334 LearningRate 0.0249 Epoch: 10 Global Step: 415900 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:27:17,813-Speed 2623.73 samples/sec Loss 6.6400 LearningRate 0.0249 Epoch: 10 Global Step: 415910 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:27:21,721-Speed 2621.13 samples/sec Loss 6.7128 LearningRate 0.0249 Epoch: 10 Global Step: 415920 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:27:25,623-Speed 2624.88 samples/sec Loss 6.5775 LearningRate 0.0249 Epoch: 10 Global Step: 415930 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:27:29,542-Speed 2613.92 samples/sec Loss 6.7040 LearningRate 0.0249 Epoch: 10 Global Step: 415940 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:27:33,438-Speed 2629.25 samples/sec Loss 6.5469 LearningRate 0.0249 Epoch: 10 Global Step: 415950 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:27:37,345-Speed 2621.26 samples/sec Loss 6.6215 LearningRate 0.0249 Epoch: 10 Global Step: 415960 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:27:41,242-Speed 2628.34 samples/sec Loss 6.5305 LearningRate 0.0249 Epoch: 10 Global Step: 415970 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:27:45,138-Speed 2629.74 samples/sec Loss 6.6872 LearningRate 0.0249 Epoch: 10 Global Step: 415980 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:27:49,033-Speed 2629.21 samples/sec Loss 6.5547 LearningRate 0.0249 Epoch: 10 Global Step: 415990 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:27:52,933-Speed 2626.56 samples/sec Loss 6.5374 LearningRate 0.0249 Epoch: 10 Global Step: 416000 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:27:56,834-Speed 2626.12 samples/sec Loss 6.6368 LearningRate 0.0249 Epoch: 10 Global Step: 416010 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:28:00,713-Speed 2641.11 samples/sec Loss 6.6637 LearningRate 0.0249 Epoch: 10 Global Step: 416020 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:28:04,645-Speed 2604.82 samples/sec Loss 6.6745 LearningRate 0.0249 Epoch: 10 Global Step: 416030 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:28:08,564-Speed 2613.71 samples/sec Loss 6.5566 LearningRate 0.0248 Epoch: 10 Global Step: 416040 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:28:12,464-Speed 2625.83 samples/sec Loss 6.6394 LearningRate 0.0248 Epoch: 10 Global Step: 416050 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:28:16,364-Speed 2626.76 samples/sec Loss 6.5397 LearningRate 0.0248 Epoch: 10 Global Step: 416060 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:28:20,274-Speed 2619.45 samples/sec Loss 6.7181 LearningRate 0.0248 Epoch: 10 Global Step: 416070 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:28:24,183-Speed 2619.92 samples/sec Loss 6.5740 LearningRate 0.0248 Epoch: 10 Global Step: 416080 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:28:28,089-Speed 2622.68 samples/sec Loss 6.7217 LearningRate 0.0248 Epoch: 10 Global Step: 416090 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:28:31,984-Speed 2630.08 samples/sec Loss 6.8088 LearningRate 0.0248 Epoch: 10 Global Step: 416100 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:28:35,880-Speed 2628.62 samples/sec Loss 6.7379 LearningRate 0.0248 Epoch: 10 Global Step: 416110 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:28:39,776-Speed 2628.58 samples/sec Loss 6.7041 LearningRate 0.0248 Epoch: 10 Global Step: 416120 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:28:43,670-Speed 2630.48 samples/sec Loss 6.6069 LearningRate 0.0248 Epoch: 10 Global Step: 416130 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:28:47,569-Speed 2627.39 samples/sec Loss 6.5702 LearningRate 0.0248 Epoch: 10 Global Step: 416140 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:28:51,470-Speed 2625.98 samples/sec Loss 6.6822 LearningRate 0.0248 Epoch: 10 Global Step: 416150 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:28:55,390-Speed 2612.39 samples/sec Loss 6.6728 LearningRate 0.0248 Epoch: 10 Global Step: 416160 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:28:59,298-Speed 2621.63 samples/sec Loss 6.6688 LearningRate 0.0248 Epoch: 10 Global Step: 416170 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:29:03,261-Speed 2584.05 samples/sec Loss 6.7796 LearningRate 0.0248 Epoch: 10 Global Step: 416180 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:29:07,159-Speed 2628.32 samples/sec Loss 6.6708 LearningRate 0.0248 Epoch: 10 Global Step: 416190 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:29:11,056-Speed 2628.21 samples/sec Loss 6.6369 LearningRate 0.0248 Epoch: 10 Global Step: 416200 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:29:14,952-Speed 2629.05 samples/sec Loss 6.6579 LearningRate 0.0248 Epoch: 10 Global Step: 416210 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:29:18,838-Speed 2635.53 samples/sec Loss 6.8030 LearningRate 0.0248 Epoch: 10 Global Step: 416220 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:29:22,748-Speed 2619.66 samples/sec Loss 6.6282 LearningRate 0.0248 Epoch: 10 Global Step: 416230 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:29:26,660-Speed 2617.90 samples/sec Loss 6.5540 LearningRate 0.0248 Epoch: 10 Global Step: 416240 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:29:30,557-Speed 2628.95 samples/sec Loss 6.5378 LearningRate 0.0248 Epoch: 10 Global Step: 416250 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:29:34,441-Speed 2636.57 samples/sec Loss 6.5806 LearningRate 0.0248 Epoch: 10 Global Step: 416260 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:29:38,366-Speed 2609.34 samples/sec Loss 6.5760 LearningRate 0.0248 Epoch: 10 Global Step: 416270 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:29:42,279-Speed 2617.73 samples/sec Loss 6.6190 LearningRate 0.0248 Epoch: 10 Global Step: 416280 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:29:46,199-Speed 2613.07 samples/sec Loss 6.7180 LearningRate 0.0248 Epoch: 10 Global Step: 416290 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:29:50,104-Speed 2622.69 samples/sec Loss 6.5895 LearningRate 0.0248 Epoch: 10 Global Step: 416300 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:29:54,001-Speed 2628.67 samples/sec Loss 6.6854 LearningRate 0.0248 Epoch: 10 Global Step: 416310 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:29:57,920-Speed 2613.34 samples/sec Loss 6.5210 LearningRate 0.0248 Epoch: 10 Global Step: 416320 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:30:01,852-Speed 2605.55 samples/sec Loss 6.5885 LearningRate 0.0248 Epoch: 10 Global Step: 416330 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:30:05,754-Speed 2625.03 samples/sec Loss 6.7807 LearningRate 0.0248 Epoch: 10 Global Step: 416340 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:30:09,825-Speed 2515.85 samples/sec Loss 6.6522 LearningRate 0.0248 Epoch: 10 Global Step: 416350 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:30:13,855-Speed 2541.75 samples/sec Loss 6.6983 LearningRate 0.0248 Epoch: 10 Global Step: 416360 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:30:17,729-Speed 2644.07 samples/sec Loss 6.6636 LearningRate 0.0248 Epoch: 10 Global Step: 416370 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:30:21,624-Speed 2629.49 samples/sec Loss 6.6870 LearningRate 0.0248 Epoch: 10 Global Step: 416380 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:30:25,525-Speed 2626.16 samples/sec Loss 6.7320 LearningRate 0.0248 Epoch: 10 Global Step: 416390 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:30:29,424-Speed 2626.29 samples/sec Loss 6.6857 LearningRate 0.0248 Epoch: 10 Global Step: 416400 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:30:33,321-Speed 2628.31 samples/sec Loss 6.7141 LearningRate 0.0248 Epoch: 10 Global Step: 416410 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:30:37,247-Speed 2609.26 samples/sec Loss 6.4895 LearningRate 0.0248 Epoch: 10 Global Step: 416420 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:30:41,148-Speed 2627.09 samples/sec Loss 6.5241 LearningRate 0.0248 Epoch: 10 Global Step: 416430 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:30:45,041-Speed 2630.72 samples/sec Loss 6.6109 LearningRate 0.0248 Epoch: 10 Global Step: 416440 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:30:48,950-Speed 2620.92 samples/sec Loss 6.7097 LearningRate 0.0248 Epoch: 10 Global Step: 416450 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:30:52,845-Speed 2629.33 samples/sec Loss 6.5920 LearningRate 0.0248 Epoch: 10 Global Step: 416460 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:30:56,738-Speed 2631.28 samples/sec Loss 6.7307 LearningRate 0.0248 Epoch: 10 Global Step: 416470 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:31:00,635-Speed 2628.44 samples/sec Loss 6.6908 LearningRate 0.0248 Epoch: 10 Global Step: 416480 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:31:04,510-Speed 2642.71 samples/sec Loss 6.6524 LearningRate 0.0248 Epoch: 10 Global Step: 416490 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:31:08,405-Speed 2629.93 samples/sec Loss 6.7374 LearningRate 0.0248 Epoch: 10 Global Step: 416500 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:31:12,304-Speed 2626.92 samples/sec Loss 6.6040 LearningRate 0.0248 Epoch: 10 Global Step: 416510 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:31:16,200-Speed 2629.03 samples/sec Loss 6.7025 LearningRate 0.0248 Epoch: 10 Global Step: 416520 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:31:20,097-Speed 2628.00 samples/sec Loss 6.7064 LearningRate 0.0248 Epoch: 10 Global Step: 416530 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:31:23,993-Speed 2630.05 samples/sec Loss 6.5739 LearningRate 0.0248 Epoch: 10 Global Step: 416540 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:31:27,891-Speed 2627.67 samples/sec Loss 6.6303 LearningRate 0.0248 Epoch: 10 Global Step: 416550 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:31:31,783-Speed 2631.89 samples/sec Loss 6.5515 LearningRate 0.0248 Epoch: 10 Global Step: 416560 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:31:35,676-Speed 2630.92 samples/sec Loss 6.5896 LearningRate 0.0248 Epoch: 10 Global Step: 416570 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:31:39,571-Speed 2628.96 samples/sec Loss 6.6604 LearningRate 0.0248 Epoch: 10 Global Step: 416580 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:31:43,479-Speed 2620.88 samples/sec Loss 6.7212 LearningRate 0.0248 Epoch: 10 Global Step: 416590 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:31:47,625-Speed 2470.58 samples/sec Loss 6.6349 LearningRate 0.0248 Epoch: 10 Global Step: 416600 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:31:51,558-Speed 2604.42 samples/sec Loss 6.7308 LearningRate 0.0248 Epoch: 10 Global Step: 416610 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:31:55,461-Speed 2624.55 samples/sec Loss 6.5138 LearningRate 0.0248 Epoch: 10 Global Step: 416620 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:31:59,358-Speed 2628.85 samples/sec Loss 6.7919 LearningRate 0.0248 Epoch: 10 Global Step: 416630 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:32:03,254-Speed 2628.64 samples/sec Loss 6.6916 LearningRate 0.0248 Epoch: 10 Global Step: 416640 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:32:07,158-Speed 2623.18 samples/sec Loss 6.6720 LearningRate 0.0248 Epoch: 10 Global Step: 416650 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:32:11,055-Speed 2628.50 samples/sec Loss 6.5413 LearningRate 0.0248 Epoch: 10 Global Step: 416660 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:32:14,950-Speed 2629.72 samples/sec Loss 6.6608 LearningRate 0.0248 Epoch: 10 Global Step: 416670 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:32:18,856-Speed 2622.01 samples/sec Loss 6.6927 LearningRate 0.0248 Epoch: 10 Global Step: 416680 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:32:22,737-Speed 2639.42 samples/sec Loss 6.5527 LearningRate 0.0248 Epoch: 10 Global Step: 416690 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:32:26,635-Speed 2627.82 samples/sec Loss 6.6748 LearningRate 0.0248 Epoch: 10 Global Step: 416700 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:32:30,528-Speed 2631.25 samples/sec Loss 6.5976 LearningRate 0.0248 Epoch: 10 Global Step: 416710 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:32:34,432-Speed 2623.25 samples/sec Loss 6.6184 LearningRate 0.0248 Epoch: 10 Global Step: 416720 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:32:38,342-Speed 2619.45 samples/sec Loss 6.6496 LearningRate 0.0248 Epoch: 10 Global Step: 416730 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:32:42,219-Speed 2641.68 samples/sec Loss 6.7885 LearningRate 0.0248 Epoch: 10 Global Step: 416740 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:32:46,120-Speed 2626.17 samples/sec Loss 6.6461 LearningRate 0.0248 Epoch: 10 Global Step: 416750 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:32:50,034-Speed 2616.90 samples/sec Loss 6.7103 LearningRate 0.0248 Epoch: 10 Global Step: 416760 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:32:53,970-Speed 2603.08 samples/sec Loss 6.6340 LearningRate 0.0248 Epoch: 10 Global Step: 416770 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:32:57,873-Speed 2624.13 samples/sec Loss 6.6513 LearningRate 0.0248 Epoch: 10 Global Step: 416780 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:33:01,776-Speed 2624.31 samples/sec Loss 6.6188 LearningRate 0.0248 Epoch: 10 Global Step: 416790 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:33:05,702-Speed 2608.50 samples/sec Loss 6.7170 LearningRate 0.0248 Epoch: 10 Global Step: 416800 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:33:09,598-Speed 2628.96 samples/sec Loss 6.6325 LearningRate 0.0248 Epoch: 10 Global Step: 416810 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:33:13,496-Speed 2628.18 samples/sec Loss 6.5555 LearningRate 0.0248 Epoch: 10 Global Step: 416820 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:33:17,391-Speed 2629.22 samples/sec Loss 6.5230 LearningRate 0.0248 Epoch: 10 Global Step: 416830 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:33:21,290-Speed 2626.77 samples/sec Loss 6.6673 LearningRate 0.0248 Epoch: 10 Global Step: 416840 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:33:25,185-Speed 2630.21 samples/sec Loss 6.5767 LearningRate 0.0248 Epoch: 10 Global Step: 416850 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:33:29,083-Speed 2627.63 samples/sec Loss 6.6122 LearningRate 0.0248 Epoch: 10 Global Step: 416860 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:33:32,979-Speed 2629.28 samples/sec Loss 6.7218 LearningRate 0.0247 Epoch: 10 Global Step: 416870 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:33:36,886-Speed 2621.60 samples/sec Loss 6.5563 LearningRate 0.0247 Epoch: 10 Global Step: 416880 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:33:40,783-Speed 2627.62 samples/sec Loss 6.5840 LearningRate 0.0247 Epoch: 10 Global Step: 416890 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:33:44,680-Speed 2628.18 samples/sec Loss 6.7890 LearningRate 0.0247 Epoch: 10 Global Step: 416900 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:33:48,591-Speed 2619.41 samples/sec Loss 6.5926 LearningRate 0.0247 Epoch: 10 Global Step: 416910 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:33:52,511-Speed 2613.50 samples/sec Loss 6.5110 LearningRate 0.0247 Epoch: 10 Global Step: 416920 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:33:56,406-Speed 2628.98 samples/sec Loss 6.6914 LearningRate 0.0247 Epoch: 10 Global Step: 416930 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:34:00,311-Speed 2623.51 samples/sec Loss 6.5837 LearningRate 0.0247 Epoch: 10 Global Step: 416940 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 18:34:04,184-Speed 2644.53 samples/sec Loss 6.6025 LearningRate 0.0247 Epoch: 10 Global Step: 416950 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:34:08,084-Speed 2625.51 samples/sec Loss 6.6507 LearningRate 0.0247 Epoch: 10 Global Step: 416960 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:34:11,981-Speed 2628.49 samples/sec Loss 6.5073 LearningRate 0.0247 Epoch: 10 Global Step: 416970 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:34:15,875-Speed 2630.89 samples/sec Loss 6.5794 LearningRate 0.0247 Epoch: 10 Global Step: 416980 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:34:19,770-Speed 2629.59 samples/sec Loss 6.6053 LearningRate 0.0247 Epoch: 10 Global Step: 416990 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:34:23,662-Speed 2631.15 samples/sec Loss 6.7707 LearningRate 0.0247 Epoch: 10 Global Step: 417000 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:34:27,559-Speed 2628.81 samples/sec Loss 6.5511 LearningRate 0.0247 Epoch: 10 Global Step: 417010 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:34:31,459-Speed 2625.70 samples/sec Loss 6.5897 LearningRate 0.0247 Epoch: 10 Global Step: 417020 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:34:35,358-Speed 2627.09 samples/sec Loss 6.7298 LearningRate 0.0247 Epoch: 10 Global Step: 417030 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:34:39,266-Speed 2621.27 samples/sec Loss 6.7141 LearningRate 0.0247 Epoch: 10 Global Step: 417040 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:34:43,161-Speed 2629.62 samples/sec Loss 6.4701 LearningRate 0.0247 Epoch: 10 Global Step: 417050 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:34:47,052-Speed 2631.77 samples/sec Loss 6.7462 LearningRate 0.0247 Epoch: 10 Global Step: 417060 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:34:50,954-Speed 2625.44 samples/sec Loss 6.6528 LearningRate 0.0247 Epoch: 10 Global Step: 417070 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:34:54,879-Speed 2609.00 samples/sec Loss 6.6112 LearningRate 0.0247 Epoch: 10 Global Step: 417080 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:34:58,858-Speed 2574.57 samples/sec Loss 6.4884 LearningRate 0.0247 Epoch: 10 Global Step: 417090 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:35:02,763-Speed 2623.33 samples/sec Loss 6.6113 LearningRate 0.0247 Epoch: 10 Global Step: 417100 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:35:06,662-Speed 2626.95 samples/sec Loss 6.5154 LearningRate 0.0247 Epoch: 10 Global Step: 417110 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:35:10,583-Speed 2612.22 samples/sec Loss 6.5593 LearningRate 0.0247 Epoch: 10 Global Step: 417120 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:35:14,483-Speed 2626.46 samples/sec Loss 6.6875 LearningRate 0.0247 Epoch: 10 Global Step: 417130 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:35:18,378-Speed 2629.41 samples/sec Loss 6.6284 LearningRate 0.0247 Epoch: 10 Global Step: 417140 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:35:22,275-Speed 2628.29 samples/sec Loss 6.6844 LearningRate 0.0247 Epoch: 10 Global Step: 417150 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 18:35:26,157-Speed 2638.48 samples/sec Loss 6.6413 LearningRate 0.0247 Epoch: 10 Global Step: 417160 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:35:30,157-Speed 2560.94 samples/sec Loss 6.5494 LearningRate 0.0247 Epoch: 10 Global Step: 417170 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:35:34,057-Speed 2626.32 samples/sec Loss 6.7277 LearningRate 0.0247 Epoch: 10 Global Step: 417180 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:35:37,955-Speed 2627.48 samples/sec Loss 6.6407 LearningRate 0.0247 Epoch: 10 Global Step: 417190 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:35:41,854-Speed 2627.19 samples/sec Loss 6.6892 LearningRate 0.0247 Epoch: 10 Global Step: 417200 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:35:45,766-Speed 2618.47 samples/sec Loss 6.7127 LearningRate 0.0247 Epoch: 10 Global Step: 417210 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:35:49,664-Speed 2627.05 samples/sec Loss 6.5685 LearningRate 0.0247 Epoch: 10 Global Step: 417220 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:35:53,558-Speed 2631.07 samples/sec Loss 6.7923 LearningRate 0.0247 Epoch: 10 Global Step: 417230 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:35:57,452-Speed 2630.46 samples/sec Loss 6.5299 LearningRate 0.0247 Epoch: 10 Global Step: 417240 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:36:01,349-Speed 2627.90 samples/sec Loss 6.7662 LearningRate 0.0247 Epoch: 10 Global Step: 417250 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:36:05,233-Speed 2637.90 samples/sec Loss 6.7153 LearningRate 0.0247 Epoch: 10 Global Step: 417260 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:36:09,150-Speed 2614.65 samples/sec Loss 6.6850 LearningRate 0.0247 Epoch: 10 Global Step: 417270 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:36:13,045-Speed 2630.07 samples/sec Loss 6.7318 LearningRate 0.0247 Epoch: 10 Global Step: 417280 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:36:16,955-Speed 2619.18 samples/sec Loss 6.6782 LearningRate 0.0247 Epoch: 10 Global Step: 417290 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:36:21,050-Speed 2501.62 samples/sec Loss 6.6563 LearningRate 0.0247 Epoch: 10 Global Step: 417300 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:36:25,087-Speed 2537.12 samples/sec Loss 6.5767 LearningRate 0.0247 Epoch: 10 Global Step: 417310 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:36:28,984-Speed 2628.34 samples/sec Loss 6.6048 LearningRate 0.0247 Epoch: 10 Global Step: 417320 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:36:32,880-Speed 2629.20 samples/sec Loss 6.7469 LearningRate 0.0247 Epoch: 10 Global Step: 417330 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:36:36,815-Speed 2602.90 samples/sec Loss 6.7540 LearningRate 0.0247 Epoch: 10 Global Step: 417340 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:36:40,867-Speed 2527.90 samples/sec Loss 6.6153 LearningRate 0.0247 Epoch: 10 Global Step: 417350 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:36:44,770-Speed 2624.21 samples/sec Loss 6.7268 LearningRate 0.0247 Epoch: 10 Global Step: 417360 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 18:36:48,634-Speed 2650.73 samples/sec Loss 6.4912 LearningRate 0.0247 Epoch: 10 Global Step: 417370 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:36:52,534-Speed 2626.45 samples/sec Loss 6.5449 LearningRate 0.0247 Epoch: 10 Global Step: 417380 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:36:56,434-Speed 2626.19 samples/sec Loss 6.5476 LearningRate 0.0247 Epoch: 10 Global Step: 417390 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:37:00,345-Speed 2618.90 samples/sec Loss 6.7309 LearningRate 0.0247 Epoch: 10 Global Step: 417400 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:37:04,269-Speed 2609.56 samples/sec Loss 6.6664 LearningRate 0.0247 Epoch: 10 Global Step: 417410 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:37:08,170-Speed 2625.38 samples/sec Loss 6.6325 LearningRate 0.0247 Epoch: 10 Global Step: 417420 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:37:12,070-Speed 2627.08 samples/sec Loss 6.7016 LearningRate 0.0247 Epoch: 10 Global Step: 417430 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:37:15,976-Speed 2622.43 samples/sec Loss 6.5952 LearningRate 0.0247 Epoch: 10 Global Step: 417440 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:37:19,874-Speed 2626.93 samples/sec Loss 6.6002 LearningRate 0.0247 Epoch: 10 Global Step: 417450 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:37:23,776-Speed 2625.16 samples/sec Loss 6.6118 LearningRate 0.0247 Epoch: 10 Global Step: 417460 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:37:27,679-Speed 2624.98 samples/sec Loss 6.5146 LearningRate 0.0247 Epoch: 10 Global Step: 417470 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:37:31,582-Speed 2624.36 samples/sec Loss 6.6390 LearningRate 0.0247 Epoch: 10 Global Step: 417480 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:37:35,496-Speed 2616.87 samples/sec Loss 6.6002 LearningRate 0.0247 Epoch: 10 Global Step: 417490 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:37:39,396-Speed 2625.96 samples/sec Loss 6.7179 LearningRate 0.0247 Epoch: 10 Global Step: 417500 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:37:43,293-Speed 2628.81 samples/sec Loss 6.6642 LearningRate 0.0247 Epoch: 10 Global Step: 417510 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:37:47,191-Speed 2627.57 samples/sec Loss 6.6984 LearningRate 0.0247 Epoch: 10 Global Step: 417520 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:37:51,089-Speed 2627.70 samples/sec Loss 6.7630 LearningRate 0.0247 Epoch: 10 Global Step: 417530 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:37:54,985-Speed 2629.38 samples/sec Loss 6.6479 LearningRate 0.0247 Epoch: 10 Global Step: 417540 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:37:58,881-Speed 2628.34 samples/sec Loss 6.6685 LearningRate 0.0247 Epoch: 10 Global Step: 417550 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:38:02,774-Speed 2631.74 samples/sec Loss 6.4846 LearningRate 0.0247 Epoch: 10 Global Step: 417560 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:38:06,672-Speed 2627.79 samples/sec Loss 6.5841 LearningRate 0.0247 Epoch: 10 Global Step: 417570 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 18:38:10,553-Speed 2639.09 samples/sec Loss 6.6014 LearningRate 0.0247 Epoch: 10 Global Step: 417580 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:38:14,448-Speed 2629.47 samples/sec Loss 6.5487 LearningRate 0.0247 Epoch: 10 Global Step: 417590 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:38:18,370-Speed 2612.21 samples/sec Loss 6.6363 LearningRate 0.0247 Epoch: 10 Global Step: 417600 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:38:22,267-Speed 2629.23 samples/sec Loss 6.6368 LearningRate 0.0247 Epoch: 10 Global Step: 417610 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:38:26,215-Speed 2594.13 samples/sec Loss 6.6602 LearningRate 0.0247 Epoch: 10 Global Step: 417620 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:38:30,118-Speed 2625.09 samples/sec Loss 6.6578 LearningRate 0.0247 Epoch: 10 Global Step: 417630 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:38:34,030-Speed 2618.31 samples/sec Loss 6.4692 LearningRate 0.0247 Epoch: 10 Global Step: 417640 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:38:37,922-Speed 2631.81 samples/sec Loss 6.6362 LearningRate 0.0247 Epoch: 10 Global Step: 417650 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:38:41,798-Speed 2642.03 samples/sec Loss 6.6611 LearningRate 0.0247 Epoch: 10 Global Step: 417660 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:38:45,699-Speed 2632.08 samples/sec Loss 6.6117 LearningRate 0.0247 Epoch: 10 Global Step: 417670 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:38:49,614-Speed 2616.02 samples/sec Loss 6.6174 LearningRate 0.0247 Epoch: 10 Global Step: 417680 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:38:53,506-Speed 2631.98 samples/sec Loss 6.5501 LearningRate 0.0247 Epoch: 10 Global Step: 417690 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:38:57,405-Speed 2627.20 samples/sec Loss 6.6461 LearningRate 0.0247 Epoch: 10 Global Step: 417700 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:39:01,299-Speed 2630.32 samples/sec Loss 6.6264 LearningRate 0.0246 Epoch: 10 Global Step: 417710 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:39:05,235-Speed 2602.42 samples/sec Loss 6.6034 LearningRate 0.0246 Epoch: 10 Global Step: 417720 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:39:09,150-Speed 2615.82 samples/sec Loss 6.6111 LearningRate 0.0246 Epoch: 10 Global Step: 417730 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:39:13,059-Speed 2620.27 samples/sec Loss 6.6126 LearningRate 0.0246 Epoch: 10 Global Step: 417740 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:39:16,951-Speed 2631.36 samples/sec Loss 6.5497 LearningRate 0.0246 Epoch: 10 Global Step: 417750 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:39:20,854-Speed 2624.51 samples/sec Loss 6.5932 LearningRate 0.0246 Epoch: 10 Global Step: 417760 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:39:24,754-Speed 2626.82 samples/sec Loss 6.7463 LearningRate 0.0246 Epoch: 10 Global Step: 417770 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:39:28,655-Speed 2625.22 samples/sec Loss 6.6285 LearningRate 0.0246 Epoch: 10 Global Step: 417780 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:39:32,555-Speed 2626.59 samples/sec Loss 6.6055 LearningRate 0.0246 Epoch: 10 Global Step: 417790 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:39:36,455-Speed 2626.28 samples/sec Loss 6.5143 LearningRate 0.0246 Epoch: 10 Global Step: 417800 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:39:40,349-Speed 2629.71 samples/sec Loss 6.5765 LearningRate 0.0246 Epoch: 10 Global Step: 417810 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:39:44,244-Speed 2630.06 samples/sec Loss 6.6145 LearningRate 0.0246 Epoch: 10 Global Step: 417820 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:39:48,143-Speed 2626.73 samples/sec Loss 6.5930 LearningRate 0.0246 Epoch: 10 Global Step: 417830 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:39:52,040-Speed 2628.81 samples/sec Loss 6.6574 LearningRate 0.0246 Epoch: 10 Global Step: 417840 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:39:55,940-Speed 2626.56 samples/sec Loss 6.6186 LearningRate 0.0246 Epoch: 10 Global Step: 417850 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:39:59,878-Speed 2600.87 samples/sec Loss 6.6750 LearningRate 0.0246 Epoch: 10 Global Step: 417860 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 18:40:03,753-Speed 2643.20 samples/sec Loss 6.5473 LearningRate 0.0246 Epoch: 10 Global Step: 417870 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:40:07,828-Speed 2513.38 samples/sec Loss 6.5759 LearningRate 0.0246 Epoch: 10 Global Step: 417880 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:40:11,728-Speed 2626.14 samples/sec Loss 6.5547 LearningRate 0.0246 Epoch: 10 Global Step: 417890 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:40:15,628-Speed 2626.52 samples/sec Loss 6.5317 LearningRate 0.0246 Epoch: 10 Global Step: 417900 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:40:19,530-Speed 2625.18 samples/sec Loss 6.6016 LearningRate 0.0246 Epoch: 10 Global Step: 417910 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:40:23,425-Speed 2629.51 samples/sec Loss 6.5994 LearningRate 0.0246 Epoch: 10 Global Step: 417920 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:40:27,330-Speed 2623.27 samples/sec Loss 6.6535 LearningRate 0.0246 Epoch: 10 Global Step: 417930 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:40:31,230-Speed 2626.16 samples/sec Loss 6.6388 LearningRate 0.0246 Epoch: 10 Global Step: 417940 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:40:35,130-Speed 2626.27 samples/sec Loss 6.5038 LearningRate 0.0246 Epoch: 10 Global Step: 417950 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:40:39,029-Speed 2626.78 samples/sec Loss 6.7373 LearningRate 0.0246 Epoch: 10 Global Step: 417960 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:40:42,930-Speed 2625.15 samples/sec Loss 6.5828 LearningRate 0.0246 Epoch: 10 Global Step: 417970 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 18:40:46,827-Speed 2628.91 samples/sec Loss 6.6278 LearningRate 0.0246 Epoch: 10 Global Step: 417980 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 18:40:50,721-Speed 2629.63 samples/sec Loss 6.6554 LearningRate 0.0246 Epoch: 10 Global Step: 417990 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 18:40:54,603-Speed 2639.39 samples/sec Loss 6.6416 LearningRate 0.0246 Epoch: 10 Global Step: 418000 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:40:58,502-Speed 2626.93 samples/sec Loss 6.6086 LearningRate 0.0246 Epoch: 10 Global Step: 418010 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:41:02,400-Speed 2627.20 samples/sec Loss 6.5279 LearningRate 0.0246 Epoch: 10 Global Step: 418020 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:41:06,297-Speed 2628.44 samples/sec Loss 6.5591 LearningRate 0.0246 Epoch: 10 Global Step: 418030 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:41:10,198-Speed 2625.21 samples/sec Loss 6.6529 LearningRate 0.0246 Epoch: 10 Global Step: 418040 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:41:14,094-Speed 2628.95 samples/sec Loss 6.7021 LearningRate 0.0246 Epoch: 10 Global Step: 418050 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:41:18,105-Speed 2554.23 samples/sec Loss 6.5246 LearningRate 0.0246 Epoch: 10 Global Step: 418060 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:41:22,006-Speed 2625.29 samples/sec Loss 6.6388 LearningRate 0.0246 Epoch: 10 Global Step: 418070 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:41:25,921-Speed 2616.66 samples/sec Loss 6.6242 LearningRate 0.0246 Epoch: 10 Global Step: 418080 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:41:29,822-Speed 2625.62 samples/sec Loss 6.6560 LearningRate 0.0246 Epoch: 10 Global Step: 418090 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:41:33,719-Speed 2628.30 samples/sec Loss 6.7195 LearningRate 0.0246 Epoch: 10 Global Step: 418100 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:41:37,667-Speed 2594.79 samples/sec Loss 6.6202 LearningRate 0.0246 Epoch: 10 Global Step: 418110 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:41:41,565-Speed 2627.38 samples/sec Loss 6.7434 LearningRate 0.0246 Epoch: 10 Global Step: 418120 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:41:45,482-Speed 2614.67 samples/sec Loss 6.6558 LearningRate 0.0246 Epoch: 10 Global Step: 418130 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:41:49,370-Speed 2634.63 samples/sec Loss 6.7467 LearningRate 0.0246 Epoch: 10 Global Step: 418140 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:41:53,269-Speed 2627.12 samples/sec Loss 6.5751 LearningRate 0.0246 Epoch: 10 Global Step: 418150 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:41:57,163-Speed 2630.02 samples/sec Loss 6.6158 LearningRate 0.0246 Epoch: 10 Global Step: 418160 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:42:01,069-Speed 2622.58 samples/sec Loss 6.5295 LearningRate 0.0246 Epoch: 10 Global Step: 418170 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:42:04,988-Speed 2613.28 samples/sec Loss 6.7719 LearningRate 0.0246 Epoch: 10 Global Step: 418180 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:42:08,887-Speed 2627.22 samples/sec Loss 6.6418 LearningRate 0.0246 Epoch: 10 Global Step: 418190 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:42:12,778-Speed 2632.00 samples/sec Loss 6.6076 LearningRate 0.0246 Epoch: 10 Global Step: 418200 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:42:16,686-Speed 2621.72 samples/sec Loss 6.6566 LearningRate 0.0246 Epoch: 10 Global Step: 418210 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:42:20,578-Speed 2631.01 samples/sec Loss 6.4846 LearningRate 0.0246 Epoch: 10 Global Step: 418220 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:42:24,476-Speed 2627.95 samples/sec Loss 6.6032 LearningRate 0.0246 Epoch: 10 Global Step: 418230 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:42:28,374-Speed 2627.54 samples/sec Loss 6.7676 LearningRate 0.0246 Epoch: 10 Global Step: 418240 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:42:32,272-Speed 2627.80 samples/sec Loss 6.6852 LearningRate 0.0246 Epoch: 10 Global Step: 418250 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:42:36,166-Speed 2630.35 samples/sec Loss 6.6090 LearningRate 0.0246 Epoch: 10 Global Step: 418260 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:42:40,060-Speed 2629.72 samples/sec Loss 6.6147 LearningRate 0.0246 Epoch: 10 Global Step: 418270 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:42:43,932-Speed 2646.07 samples/sec Loss 6.6148 LearningRate 0.0246 Epoch: 10 Global Step: 418280 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:42:47,824-Speed 2631.19 samples/sec Loss 6.6057 LearningRate 0.0246 Epoch: 10 Global Step: 418290 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:42:51,728-Speed 2625.82 samples/sec Loss 6.6163 LearningRate 0.0246 Epoch: 10 Global Step: 418300 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:42:55,625-Speed 2628.10 samples/sec Loss 6.6191 LearningRate 0.0246 Epoch: 10 Global Step: 418310 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:42:59,542-Speed 2614.98 samples/sec Loss 6.5388 LearningRate 0.0246 Epoch: 10 Global Step: 418320 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:43:03,455-Speed 2617.33 samples/sec Loss 6.4827 LearningRate 0.0246 Epoch: 10 Global Step: 418330 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:43:07,356-Speed 2625.65 samples/sec Loss 6.7702 LearningRate 0.0246 Epoch: 10 Global Step: 418340 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:43:11,262-Speed 2622.15 samples/sec Loss 6.6353 LearningRate 0.0246 Epoch: 10 Global Step: 418350 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:43:15,158-Speed 2628.91 samples/sec Loss 6.5665 LearningRate 0.0246 Epoch: 10 Global Step: 418360 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:43:19,076-Speed 2614.41 samples/sec Loss 6.5983 LearningRate 0.0246 Epoch: 10 Global Step: 418370 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:43:23,148-Speed 2515.52 samples/sec Loss 6.7388 LearningRate 0.0246 Epoch: 10 Global Step: 418380 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:43:27,062-Speed 2617.15 samples/sec Loss 6.4427 LearningRate 0.0246 Epoch: 10 Global Step: 418390 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:43:30,958-Speed 2629.15 samples/sec Loss 6.6220 LearningRate 0.0246 Epoch: 10 Global Step: 418400 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:43:34,855-Speed 2628.10 samples/sec Loss 6.5017 LearningRate 0.0246 Epoch: 10 Global Step: 418410 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:43:38,742-Speed 2634.81 samples/sec Loss 6.5855 LearningRate 0.0246 Epoch: 10 Global Step: 418420 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:43:42,677-Speed 2603.32 samples/sec Loss 6.6146 LearningRate 0.0246 Epoch: 10 Global Step: 418430 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:43:46,592-Speed 2616.58 samples/sec Loss 6.6756 LearningRate 0.0246 Epoch: 10 Global Step: 418440 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:43:50,490-Speed 2627.74 samples/sec Loss 6.5787 LearningRate 0.0246 Epoch: 10 Global Step: 418450 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:43:54,428-Speed 2600.93 samples/sec Loss 6.5314 LearningRate 0.0246 Epoch: 10 Global Step: 418460 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:43:58,326-Speed 2627.60 samples/sec Loss 6.6747 LearningRate 0.0246 Epoch: 10 Global Step: 418470 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:44:02,227-Speed 2625.54 samples/sec Loss 6.6464 LearningRate 0.0246 Epoch: 10 Global Step: 418480 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:44:06,133-Speed 2622.40 samples/sec Loss 6.7096 LearningRate 0.0246 Epoch: 10 Global Step: 418490 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:44:10,031-Speed 2627.41 samples/sec Loss 6.5253 LearningRate 0.0246 Epoch: 10 Global Step: 418500 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:44:13,926-Speed 2630.07 samples/sec Loss 6.5804 LearningRate 0.0246 Epoch: 10 Global Step: 418510 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:44:17,820-Speed 2630.81 samples/sec Loss 6.7045 LearningRate 0.0246 Epoch: 10 Global Step: 418520 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:44:21,715-Speed 2629.54 samples/sec Loss 6.5726 LearningRate 0.0246 Epoch: 10 Global Step: 418530 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:44:25,618-Speed 2624.87 samples/sec Loss 6.6791 LearningRate 0.0246 Epoch: 10 Global Step: 418540 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:44:29,525-Speed 2621.42 samples/sec Loss 6.6605 LearningRate 0.0245 Epoch: 10 Global Step: 418550 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:44:33,416-Speed 2631.84 samples/sec Loss 6.6133 LearningRate 0.0245 Epoch: 10 Global Step: 418560 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:44:37,315-Speed 2626.79 samples/sec Loss 6.5184 LearningRate 0.0245 Epoch: 10 Global Step: 418570 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:44:41,216-Speed 2626.03 samples/sec Loss 6.5624 LearningRate 0.0245 Epoch: 10 Global Step: 418580 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:44:45,111-Speed 2630.11 samples/sec Loss 6.5725 LearningRate 0.0245 Epoch: 10 Global Step: 418590 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:44:49,223-Speed 2490.57 samples/sec Loss 6.6012 LearningRate 0.0245 Epoch: 10 Global Step: 418600 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:44:53,139-Speed 2616.32 samples/sec Loss 6.6612 LearningRate 0.0245 Epoch: 10 Global Step: 418610 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:44:57,019-Speed 2640.64 samples/sec Loss 6.5104 LearningRate 0.0245 Epoch: 10 Global Step: 418620 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:45:00,939-Speed 2612.82 samples/sec Loss 6.5837 LearningRate 0.0245 Epoch: 10 Global Step: 418630 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:45:04,838-Speed 2627.25 samples/sec Loss 6.5247 LearningRate 0.0245 Epoch: 10 Global Step: 418640 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:45:08,744-Speed 2622.03 samples/sec Loss 6.5786 LearningRate 0.0245 Epoch: 10 Global Step: 418650 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:45:12,682-Speed 2600.75 samples/sec Loss 6.6093 LearningRate 0.0245 Epoch: 10 Global Step: 418660 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:45:16,575-Speed 2631.02 samples/sec Loss 6.5702 LearningRate 0.0245 Epoch: 10 Global Step: 418670 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:45:20,506-Speed 2605.95 samples/sec Loss 6.5542 LearningRate 0.0245 Epoch: 10 Global Step: 418680 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:45:24,405-Speed 2626.92 samples/sec Loss 6.5148 LearningRate 0.0245 Epoch: 10 Global Step: 418690 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:45:28,302-Speed 2629.51 samples/sec Loss 6.5377 LearningRate 0.0245 Epoch: 10 Global Step: 418700 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:45:32,198-Speed 2629.17 samples/sec Loss 6.6521 LearningRate 0.0245 Epoch: 10 Global Step: 418710 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:45:36,103-Speed 2622.67 samples/sec Loss 6.6598 LearningRate 0.0245 Epoch: 10 Global Step: 418720 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 18:45:39,986-Speed 2637.57 samples/sec Loss 6.6659 LearningRate 0.0245 Epoch: 10 Global Step: 418730 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:45:43,980-Speed 2564.29 samples/sec Loss 6.6719 LearningRate 0.0245 Epoch: 10 Global Step: 418740 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:45:47,863-Speed 2637.98 samples/sec Loss 6.5657 LearningRate 0.0245 Epoch: 10 Global Step: 418750 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:45:51,758-Speed 2630.03 samples/sec Loss 6.6334 LearningRate 0.0245 Epoch: 10 Global Step: 418760 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:45:55,655-Speed 2628.53 samples/sec Loss 6.6699 LearningRate 0.0245 Epoch: 10 Global Step: 418770 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:45:59,550-Speed 2630.03 samples/sec Loss 6.6590 LearningRate 0.0245 Epoch: 10 Global Step: 418780 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:46:03,456-Speed 2621.95 samples/sec Loss 6.5924 LearningRate 0.0245 Epoch: 10 Global Step: 418790 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:46:07,357-Speed 2625.40 samples/sec Loss 6.5597 LearningRate 0.0245 Epoch: 10 Global Step: 418800 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:46:11,304-Speed 2595.41 samples/sec Loss 6.5566 LearningRate 0.0245 Epoch: 10 Global Step: 418810 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:46:15,212-Speed 2621.22 samples/sec Loss 6.6266 LearningRate 0.0245 Epoch: 10 Global Step: 418820 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:46:19,113-Speed 2625.63 samples/sec Loss 6.4099 LearningRate 0.0245 Epoch: 10 Global Step: 418830 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:46:23,007-Speed 2630.94 samples/sec Loss 6.6068 LearningRate 0.0245 Epoch: 10 Global Step: 418840 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:46:26,914-Speed 2621.57 samples/sec Loss 6.6260 LearningRate 0.0245 Epoch: 10 Global Step: 418850 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:46:30,805-Speed 2632.56 samples/sec Loss 6.6356 LearningRate 0.0245 Epoch: 10 Global Step: 418860 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:46:34,698-Speed 2631.30 samples/sec Loss 6.5674 LearningRate 0.0245 Epoch: 10 Global Step: 418870 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:46:38,592-Speed 2630.24 samples/sec Loss 6.6647 LearningRate 0.0245 Epoch: 10 Global Step: 418880 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:46:42,485-Speed 2630.47 samples/sec Loss 6.7363 LearningRate 0.0245 Epoch: 10 Global Step: 418890 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:46:46,377-Speed 2632.15 samples/sec Loss 6.6241 LearningRate 0.0245 Epoch: 10 Global Step: 418900 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:46:50,266-Speed 2633.69 samples/sec Loss 6.7181 LearningRate 0.0245 Epoch: 10 Global Step: 418910 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:46:54,165-Speed 2627.31 samples/sec Loss 6.5513 LearningRate 0.0245 Epoch: 10 Global Step: 418920 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:46:58,077-Speed 2617.64 samples/sec Loss 6.5812 LearningRate 0.0245 Epoch: 10 Global Step: 418930 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:47:01,973-Speed 2629.68 samples/sec Loss 6.6322 LearningRate 0.0245 Epoch: 10 Global Step: 418940 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:47:05,846-Speed 2644.25 samples/sec Loss 6.5395 LearningRate 0.0245 Epoch: 10 Global Step: 418950 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:47:09,769-Speed 2610.95 samples/sec Loss 6.6465 LearningRate 0.0245 Epoch: 10 Global Step: 418960 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:47:13,664-Speed 2629.29 samples/sec Loss 6.5556 LearningRate 0.0245 Epoch: 10 Global Step: 418970 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:47:17,756-Speed 2503.70 samples/sec Loss 6.5794 LearningRate 0.0245 Epoch: 10 Global Step: 418980 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:47:21,648-Speed 2631.63 samples/sec Loss 6.6227 LearningRate 0.0245 Epoch: 10 Global Step: 418990 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:47:25,554-Speed 2622.28 samples/sec Loss 6.5953 LearningRate 0.0245 Epoch: 10 Global Step: 419000 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:47:29,468-Speed 2616.89 samples/sec Loss 6.5592 LearningRate 0.0245 Epoch: 10 Global Step: 419010 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:47:33,373-Speed 2623.31 samples/sec Loss 6.6826 LearningRate 0.0245 Epoch: 10 Global Step: 419020 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:47:37,271-Speed 2627.23 samples/sec Loss 6.6250 LearningRate 0.0245 Epoch: 10 Global Step: 419030 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:47:41,172-Speed 2625.27 samples/sec Loss 6.5621 LearningRate 0.0245 Epoch: 10 Global Step: 419040 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:47:45,048-Speed 2642.84 samples/sec Loss 6.5211 LearningRate 0.0245 Epoch: 10 Global Step: 419050 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:47:48,959-Speed 2619.19 samples/sec Loss 6.5138 LearningRate 0.0245 Epoch: 10 Global Step: 419060 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:47:52,857-Speed 2628.16 samples/sec Loss 6.5490 LearningRate 0.0245 Epoch: 10 Global Step: 419070 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:47:56,799-Speed 2598.40 samples/sec Loss 6.5586 LearningRate 0.0245 Epoch: 10 Global Step: 419080 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:48:00,700-Speed 2625.96 samples/sec Loss 6.5660 LearningRate 0.0245 Epoch: 10 Global Step: 419090 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:48:04,597-Speed 2628.16 samples/sec Loss 6.6471 LearningRate 0.0245 Epoch: 10 Global Step: 419100 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:48:08,488-Speed 2632.05 samples/sec Loss 6.5820 LearningRate 0.0245 Epoch: 10 Global Step: 419110 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:48:12,379-Speed 2632.30 samples/sec Loss 6.5713 LearningRate 0.0245 Epoch: 10 Global Step: 419120 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:48:16,282-Speed 2624.89 samples/sec Loss 6.5720 LearningRate 0.0245 Epoch: 10 Global Step: 419130 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:48:20,184-Speed 2624.90 samples/sec Loss 6.4777 LearningRate 0.0245 Epoch: 10 Global Step: 419140 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:48:24,082-Speed 2627.68 samples/sec Loss 6.5987 LearningRate 0.0245 Epoch: 10 Global Step: 419150 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:48:27,980-Speed 2628.03 samples/sec Loss 6.6288 LearningRate 0.0245 Epoch: 10 Global Step: 419160 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:48:31,882-Speed 2624.85 samples/sec Loss 6.5897 LearningRate 0.0245 Epoch: 10 Global Step: 419170 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:48:35,782-Speed 2626.39 samples/sec Loss 6.5593 LearningRate 0.0245 Epoch: 10 Global Step: 419180 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:48:39,678-Speed 2629.08 samples/sec Loss 6.5234 LearningRate 0.0245 Epoch: 10 Global Step: 419190 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:48:43,571-Speed 2630.70 samples/sec Loss 6.4920 LearningRate 0.0245 Epoch: 10 Global Step: 419200 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:48:47,480-Speed 2620.37 samples/sec Loss 6.6288 LearningRate 0.0245 Epoch: 10 Global Step: 419210 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:48:51,388-Speed 2621.61 samples/sec Loss 6.6132 LearningRate 0.0245 Epoch: 10 Global Step: 419220 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:48:55,304-Speed 2615.46 samples/sec Loss 6.5851 LearningRate 0.0245 Epoch: 10 Global Step: 419230 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:48:59,196-Speed 2631.52 samples/sec Loss 6.6027 LearningRate 0.0245 Epoch: 10 Global Step: 419240 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:49:03,092-Speed 2628.78 samples/sec Loss 6.5625 LearningRate 0.0245 Epoch: 10 Global Step: 419250 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:49:06,991-Speed 2627.64 samples/sec Loss 6.6194 LearningRate 0.0245 Epoch: 10 Global Step: 419260 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:49:10,895-Speed 2623.20 samples/sec Loss 6.5629 LearningRate 0.0245 Epoch: 10 Global Step: 419270 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:49:14,793-Speed 2627.84 samples/sec Loss 6.6236 LearningRate 0.0245 Epoch: 10 Global Step: 419280 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:49:18,702-Speed 2620.32 samples/sec Loss 6.5913 LearningRate 0.0245 Epoch: 10 Global Step: 419290 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:49:22,602-Speed 2626.46 samples/sec Loss 6.5610 LearningRate 0.0245 Epoch: 10 Global Step: 419300 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 18:49:26,510-Speed 2621.58 samples/sec Loss 6.6556 LearningRate 0.0245 Epoch: 10 Global Step: 419310 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 18:49:30,400-Speed 2632.79 samples/sec Loss 6.5551 LearningRate 0.0245 Epoch: 10 Global Step: 419320 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:49:34,308-Speed 2620.77 samples/sec Loss 6.6848 LearningRate 0.0245 Epoch: 10 Global Step: 419330 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:49:38,203-Speed 2629.93 samples/sec Loss 6.5926 LearningRate 0.0245 Epoch: 10 Global Step: 419340 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:49:42,103-Speed 2628.09 samples/sec Loss 6.5479 LearningRate 0.0245 Epoch: 10 Global Step: 419350 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:49:46,002-Speed 2626.49 samples/sec Loss 6.6099 LearningRate 0.0245 Epoch: 10 Global Step: 419360 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:49:49,921-Speed 2613.75 samples/sec Loss 6.6058 LearningRate 0.0245 Epoch: 10 Global Step: 419370 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:49:53,846-Speed 2609.66 samples/sec Loss 6.6570 LearningRate 0.0244 Epoch: 10 Global Step: 419380 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:49:57,751-Speed 2627.77 samples/sec Loss 6.5952 LearningRate 0.0244 Epoch: 10 Global Step: 419390 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:50:01,644-Speed 2630.64 samples/sec Loss 6.5406 LearningRate 0.0244 Epoch: 10 Global Step: 419400 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:50:05,551-Speed 2621.78 samples/sec Loss 6.4914 LearningRate 0.0244 Epoch: 10 Global Step: 419410 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:50:09,429-Speed 2640.59 samples/sec Loss 6.5656 LearningRate 0.0244 Epoch: 10 Global Step: 419420 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:50:13,343-Speed 2618.21 samples/sec Loss 6.5124 LearningRate 0.0244 Epoch: 10 Global Step: 419430 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:50:17,242-Speed 2626.66 samples/sec Loss 6.5278 LearningRate 0.0244 Epoch: 10 Global Step: 419440 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:50:21,141-Speed 2627.58 samples/sec Loss 6.5523 LearningRate 0.0244 Epoch: 10 Global Step: 419450 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:50:25,043-Speed 2625.26 samples/sec Loss 6.7447 LearningRate 0.0244 Epoch: 10 Global Step: 419460 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:50:28,940-Speed 2628.55 samples/sec Loss 6.6282 LearningRate 0.0244 Epoch: 10 Global Step: 419470 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:50:32,839-Speed 2626.91 samples/sec Loss 6.6463 LearningRate 0.0244 Epoch: 10 Global Step: 419480 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:50:36,741-Speed 2624.92 samples/sec Loss 6.7194 LearningRate 0.0244 Epoch: 10 Global Step: 419490 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:50:40,640-Speed 2626.56 samples/sec Loss 6.6751 LearningRate 0.0244 Epoch: 10 Global Step: 419500 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:50:44,615-Speed 2577.89 samples/sec Loss 6.4907 LearningRate 0.0244 Epoch: 10 Global Step: 419510 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:50:48,503-Speed 2634.16 samples/sec Loss 6.5854 LearningRate 0.0244 Epoch: 10 Global Step: 419520 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:50:52,419-Speed 2615.50 samples/sec Loss 6.5552 LearningRate 0.0244 Epoch: 10 Global Step: 419530 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:50:56,409-Speed 2567.52 samples/sec Loss 6.5771 LearningRate 0.0244 Epoch: 10 Global Step: 419540 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:51:00,314-Speed 2622.88 samples/sec Loss 6.6371 LearningRate 0.0244 Epoch: 10 Global Step: 419550 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:51:04,222-Speed 2620.53 samples/sec Loss 6.5806 LearningRate 0.0244 Epoch: 10 Global Step: 419560 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:51:08,126-Speed 2623.60 samples/sec Loss 6.6160 LearningRate 0.0244 Epoch: 10 Global Step: 419570 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:51:12,021-Speed 2630.58 samples/sec Loss 6.5803 LearningRate 0.0244 Epoch: 10 Global Step: 419580 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:51:15,932-Speed 2618.51 samples/sec Loss 6.5221 LearningRate 0.0244 Epoch: 10 Global Step: 419590 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:51:19,825-Speed 2631.27 samples/sec Loss 6.5808 LearningRate 0.0244 Epoch: 10 Global Step: 419600 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:51:23,740-Speed 2616.42 samples/sec Loss 6.5996 LearningRate 0.0244 Epoch: 10 Global Step: 419610 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:51:27,644-Speed 2623.94 samples/sec Loss 6.6829 LearningRate 0.0244 Epoch: 10 Global Step: 419620 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 18:51:31,525-Speed 2639.27 samples/sec Loss 6.5958 LearningRate 0.0244 Epoch: 10 Global Step: 419630 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:51:35,420-Speed 2629.58 samples/sec Loss 6.6345 LearningRate 0.0244 Epoch: 10 Global Step: 419640 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:51:39,322-Speed 2624.72 samples/sec Loss 6.5740 LearningRate 0.0244 Epoch: 10 Global Step: 419650 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:51:43,265-Speed 2597.28 samples/sec Loss 6.5686 LearningRate 0.0244 Epoch: 10 Global Step: 419660 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:51:47,235-Speed 2580.04 samples/sec Loss 6.5009 LearningRate 0.0244 Epoch: 10 Global Step: 419670 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:51:51,321-Speed 2506.98 samples/sec Loss 6.6176 LearningRate 0.0244 Epoch: 10 Global Step: 419680 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:51:55,401-Speed 2510.28 samples/sec Loss 6.5277 LearningRate 0.0244 Epoch: 10 Global Step: 419690 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:51:59,462-Speed 2521.96 samples/sec Loss 6.5478 LearningRate 0.0244 Epoch: 10 Global Step: 419700 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:52:03,359-Speed 2628.76 samples/sec Loss 6.5684 LearningRate 0.0244 Epoch: 10 Global Step: 419710 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:52:07,263-Speed 2623.56 samples/sec Loss 6.6201 LearningRate 0.0244 Epoch: 10 Global Step: 419720 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:52:11,163-Speed 2625.51 samples/sec Loss 6.4589 LearningRate 0.0244 Epoch: 10 Global Step: 419730 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 18:52:15,076-Speed 2618.30 samples/sec Loss 6.6325 LearningRate 0.0244 Epoch: 10 Global Step: 419740 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:52:19,018-Speed 2598.45 samples/sec Loss 6.4897 LearningRate 0.0244 Epoch: 10 Global Step: 419750 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:52:22,937-Speed 2613.96 samples/sec Loss 6.6678 LearningRate 0.0244 Epoch: 10 Global Step: 419760 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:52:26,860-Speed 2610.67 samples/sec Loss 6.6819 LearningRate 0.0244 Epoch: 10 Global Step: 419770 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:52:30,765-Speed 2623.47 samples/sec Loss 6.6575 LearningRate 0.0244 Epoch: 10 Global Step: 419780 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:52:34,661-Speed 2629.06 samples/sec Loss 6.6177 LearningRate 0.0244 Epoch: 10 Global Step: 419790 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:52:38,564-Speed 2623.80 samples/sec Loss 6.5323 LearningRate 0.0244 Epoch: 10 Global Step: 419800 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:52:42,459-Speed 2629.69 samples/sec Loss 6.5604 LearningRate 0.0244 Epoch: 10 Global Step: 419810 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:52:46,359-Speed 2626.21 samples/sec Loss 6.6272 LearningRate 0.0244 Epoch: 10 Global Step: 419820 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:52:50,261-Speed 2624.56 samples/sec Loss 6.5452 LearningRate 0.0244 Epoch: 10 Global Step: 419830 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:52:54,176-Speed 2616.59 samples/sec Loss 6.5735 LearningRate 0.0244 Epoch: 10 Global Step: 419840 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:52:58,079-Speed 2624.25 samples/sec Loss 6.6391 LearningRate 0.0244 Epoch: 10 Global Step: 419850 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:53:02,100-Speed 2547.66 samples/sec Loss 6.6588 LearningRate 0.0244 Epoch: 10 Global Step: 419860 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:53:05,998-Speed 2627.44 samples/sec Loss 6.6368 LearningRate 0.0244 Epoch: 10 Global Step: 419870 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:53:09,900-Speed 2624.97 samples/sec Loss 6.6181 LearningRate 0.0244 Epoch: 10 Global Step: 419880 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:53:13,796-Speed 2628.64 samples/sec Loss 6.6027 LearningRate 0.0244 Epoch: 10 Global Step: 419890 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:53:17,703-Speed 2623.11 samples/sec Loss 6.5725 LearningRate 0.0244 Epoch: 10 Global Step: 419900 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:53:21,603-Speed 2626.08 samples/sec Loss 6.6727 LearningRate 0.0244 Epoch: 10 Global Step: 419910 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:53:25,499-Speed 2629.76 samples/sec Loss 6.5230 LearningRate 0.0244 Epoch: 10 Global Step: 419920 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:53:29,397-Speed 2627.81 samples/sec Loss 6.5886 LearningRate 0.0244 Epoch: 10 Global Step: 419930 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:53:33,317-Speed 2613.12 samples/sec Loss 6.5769 LearningRate 0.0244 Epoch: 10 Global Step: 419940 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:53:37,194-Speed 2641.74 samples/sec Loss 6.5521 LearningRate 0.0244 Epoch: 10 Global Step: 419950 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:53:41,097-Speed 2624.61 samples/sec Loss 6.5680 LearningRate 0.0244 Epoch: 10 Global Step: 419960 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:53:45,000-Speed 2623.80 samples/sec Loss 6.6904 LearningRate 0.0244 Epoch: 10 Global Step: 419970 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:53:48,908-Speed 2621.10 samples/sec Loss 6.4872 LearningRate 0.0244 Epoch: 10 Global Step: 419980 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:53:52,805-Speed 2628.38 samples/sec Loss 6.4816 LearningRate 0.0244 Epoch: 10 Global Step: 419990 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:53:56,702-Speed 2628.73 samples/sec Loss 6.5280 LearningRate 0.0244 Epoch: 10 Global Step: 420000 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:54:39,590-[lfw][420000]XNorm: 23.189145
Training: 2022-04-14 18:54:39,590-[lfw][420000]Accuracy-Flip: 0.99750+-0.00261
Training: 2022-04-14 18:54:39,591-[lfw][420000]Accuracy-Highest: 0.99783
Training: 2022-04-14 18:55:29,226-[cfp_fp][420000]XNorm: 21.710651
Training: 2022-04-14 18:55:29,227-[cfp_fp][420000]Accuracy-Flip: 0.98843+-0.00517
Training: 2022-04-14 18:55:29,228-[cfp_fp][420000]Accuracy-Highest: 0.98843
Training: 2022-04-14 18:56:12,483-[agedb_30][420000]XNorm: 23.156444
Training: 2022-04-14 18:56:12,484-[agedb_30][420000]Accuracy-Flip: 0.97733+-0.00429
Training: 2022-04-14 18:56:12,484-[agedb_30][420000]Accuracy-Highest: 0.97733
Training: 2022-04-14 18:56:16,363-Speed 73.32 samples/sec Loss 6.5663 LearningRate 0.0244 Epoch: 10 Global Step: 420010 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:56:20,416-Speed 2527.20 samples/sec Loss 6.6214 LearningRate 0.0244 Epoch: 10 Global Step: 420020 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:56:24,531-Speed 2488.92 samples/sec Loss 6.5726 LearningRate 0.0244 Epoch: 10 Global Step: 420030 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:56:28,517-Speed 2569.80 samples/sec Loss 6.7303 LearningRate 0.0244 Epoch: 10 Global Step: 420040 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:56:32,392-Speed 2643.20 samples/sec Loss 6.5454 LearningRate 0.0244 Epoch: 10 Global Step: 420050 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:56:36,297-Speed 2622.64 samples/sec Loss 6.6500 LearningRate 0.0244 Epoch: 10 Global Step: 420060 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:56:40,186-Speed 2634.92 samples/sec Loss 6.5349 LearningRate 0.0244 Epoch: 10 Global Step: 420070 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:56:44,236-Speed 2529.30 samples/sec Loss 6.6165 LearningRate 0.0244 Epoch: 10 Global Step: 420080 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:56:48,140-Speed 2623.42 samples/sec Loss 6.5889 LearningRate 0.0244 Epoch: 10 Global Step: 420090 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:56:52,061-Speed 2611.94 samples/sec Loss 6.5852 LearningRate 0.0244 Epoch: 10 Global Step: 420100 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:56:55,978-Speed 2616.80 samples/sec Loss 6.7120 LearningRate 0.0244 Epoch: 10 Global Step: 420110 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:56:59,882-Speed 2623.79 samples/sec Loss 6.5262 LearningRate 0.0244 Epoch: 10 Global Step: 420120 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:57:03,791-Speed 2620.32 samples/sec Loss 6.6043 LearningRate 0.0244 Epoch: 10 Global Step: 420130 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:57:07,693-Speed 2625.04 samples/sec Loss 6.5641 LearningRate 0.0244 Epoch: 10 Global Step: 420140 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:57:11,609-Speed 2615.14 samples/sec Loss 6.5982 LearningRate 0.0244 Epoch: 10 Global Step: 420150 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 18:57:15,524-Speed 2616.59 samples/sec Loss 6.4839 LearningRate 0.0244 Epoch: 10 Global Step: 420160 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:57:19,425-Speed 2625.67 samples/sec Loss 6.6255 LearningRate 0.0244 Epoch: 10 Global Step: 420170 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:57:23,329-Speed 2623.39 samples/sec Loss 6.5437 LearningRate 0.0244 Epoch: 10 Global Step: 420180 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:57:27,238-Speed 2620.53 samples/sec Loss 6.5889 LearningRate 0.0244 Epoch: 10 Global Step: 420190 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:57:31,158-Speed 2613.24 samples/sec Loss 6.5834 LearningRate 0.0244 Epoch: 10 Global Step: 420200 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:57:35,066-Speed 2620.59 samples/sec Loss 6.5946 LearningRate 0.0244 Epoch: 10 Global Step: 420210 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:57:38,960-Speed 2630.04 samples/sec Loss 6.4654 LearningRate 0.0243 Epoch: 10 Global Step: 420220 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:57:42,873-Speed 2617.71 samples/sec Loss 6.6003 LearningRate 0.0243 Epoch: 10 Global Step: 420230 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:57:46,773-Speed 2626.29 samples/sec Loss 6.5167 LearningRate 0.0243 Epoch: 10 Global Step: 420240 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:57:50,755-Speed 2572.32 samples/sec Loss 6.6565 LearningRate 0.0243 Epoch: 10 Global Step: 420250 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:57:54,667-Speed 2618.32 samples/sec Loss 6.5951 LearningRate 0.0243 Epoch: 10 Global Step: 420260 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:57:58,600-Speed 2604.27 samples/sec Loss 6.6246 LearningRate 0.0243 Epoch: 10 Global Step: 420270 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:58:02,524-Speed 2610.47 samples/sec Loss 6.7312 LearningRate 0.0243 Epoch: 10 Global Step: 420280 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:58:06,444-Speed 2612.89 samples/sec Loss 6.5838 LearningRate 0.0243 Epoch: 10 Global Step: 420290 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:58:10,363-Speed 2613.40 samples/sec Loss 6.5225 LearningRate 0.0243 Epoch: 10 Global Step: 420300 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:58:14,263-Speed 2626.11 samples/sec Loss 6.7248 LearningRate 0.0243 Epoch: 10 Global Step: 420310 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:58:18,163-Speed 2626.02 samples/sec Loss 6.6453 LearningRate 0.0243 Epoch: 10 Global Step: 420320 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:58:22,072-Speed 2620.23 samples/sec Loss 6.4056 LearningRate 0.0243 Epoch: 10 Global Step: 420330 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:58:25,979-Speed 2622.33 samples/sec Loss 6.4951 LearningRate 0.0243 Epoch: 10 Global Step: 420340 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:58:29,916-Speed 2601.32 samples/sec Loss 6.5814 LearningRate 0.0243 Epoch: 10 Global Step: 420350 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:58:33,841-Speed 2609.35 samples/sec Loss 6.5488 LearningRate 0.0243 Epoch: 10 Global Step: 420360 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:58:37,762-Speed 2612.82 samples/sec Loss 6.5444 LearningRate 0.0243 Epoch: 10 Global Step: 420370 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:58:41,675-Speed 2617.78 samples/sec Loss 6.5381 LearningRate 0.0243 Epoch: 10 Global Step: 420380 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:58:45,588-Speed 2617.24 samples/sec Loss 6.6701 LearningRate 0.0243 Epoch: 10 Global Step: 420390 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 18:58:49,472-Speed 2637.23 samples/sec Loss 6.6457 LearningRate 0.0243 Epoch: 10 Global Step: 420400 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:58:53,385-Speed 2617.15 samples/sec Loss 6.5216 LearningRate 0.0243 Epoch: 10 Global Step: 420410 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:58:57,302-Speed 2615.34 samples/sec Loss 6.5766 LearningRate 0.0243 Epoch: 10 Global Step: 420420 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:59:01,202-Speed 2625.61 samples/sec Loss 6.5581 LearningRate 0.0243 Epoch: 10 Global Step: 420430 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:59:05,102-Speed 2626.33 samples/sec Loss 6.4596 LearningRate 0.0243 Epoch: 10 Global Step: 420440 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:59:09,000-Speed 2627.81 samples/sec Loss 6.5185 LearningRate 0.0243 Epoch: 10 Global Step: 420450 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:59:12,910-Speed 2619.72 samples/sec Loss 6.5507 LearningRate 0.0243 Epoch: 10 Global Step: 420460 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:59:16,822-Speed 2618.02 samples/sec Loss 6.6196 LearningRate 0.0243 Epoch: 10 Global Step: 420470 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:59:20,740-Speed 2614.58 samples/sec Loss 6.5239 LearningRate 0.0243 Epoch: 10 Global Step: 420480 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:59:24,644-Speed 2623.76 samples/sec Loss 6.5803 LearningRate 0.0243 Epoch: 10 Global Step: 420490 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:59:28,519-Speed 2643.39 samples/sec Loss 6.4610 LearningRate 0.0243 Epoch: 10 Global Step: 420500 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:59:32,417-Speed 2627.49 samples/sec Loss 6.4816 LearningRate 0.0243 Epoch: 10 Global Step: 420510 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:59:36,315-Speed 2627.75 samples/sec Loss 6.5510 LearningRate 0.0243 Epoch: 10 Global Step: 420520 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:59:40,216-Speed 2625.92 samples/sec Loss 6.4721 LearningRate 0.0243 Epoch: 10 Global Step: 420530 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:59:44,119-Speed 2623.62 samples/sec Loss 6.6717 LearningRate 0.0243 Epoch: 10 Global Step: 420540 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:59:48,033-Speed 2617.36 samples/sec Loss 6.5471 LearningRate 0.0243 Epoch: 10 Global Step: 420550 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:59:51,932-Speed 2627.33 samples/sec Loss 6.5444 LearningRate 0.0243 Epoch: 10 Global Step: 420560 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:59:55,862-Speed 2606.10 samples/sec Loss 6.5877 LearningRate 0.0243 Epoch: 10 Global Step: 420570 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 18:59:59,769-Speed 2621.23 samples/sec Loss 6.6381 LearningRate 0.0243 Epoch: 10 Global Step: 420580 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:00:03,684-Speed 2617.08 samples/sec Loss 6.5537 LearningRate 0.0243 Epoch: 10 Global Step: 420590 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:00:07,584-Speed 2625.64 samples/sec Loss 6.6253 LearningRate 0.0243 Epoch: 10 Global Step: 420600 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:00:11,507-Speed 2611.52 samples/sec Loss 6.6168 LearningRate 0.0243 Epoch: 10 Global Step: 420610 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:00:15,406-Speed 2626.59 samples/sec Loss 6.4789 LearningRate 0.0243 Epoch: 10 Global Step: 420620 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:00:19,309-Speed 2624.67 samples/sec Loss 6.4511 LearningRate 0.0243 Epoch: 10 Global Step: 420630 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:00:23,491-Speed 2449.36 samples/sec Loss 6.5979 LearningRate 0.0243 Epoch: 10 Global Step: 420640 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:00:27,391-Speed 2625.95 samples/sec Loss 6.5395 LearningRate 0.0243 Epoch: 10 Global Step: 420650 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:00:31,305-Speed 2616.92 samples/sec Loss 6.5079 LearningRate 0.0243 Epoch: 10 Global Step: 420660 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:00:35,205-Speed 2626.40 samples/sec Loss 6.5336 LearningRate 0.0243 Epoch: 10 Global Step: 420670 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:00:39,108-Speed 2624.53 samples/sec Loss 6.5256 LearningRate 0.0243 Epoch: 10 Global Step: 420680 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:00:42,984-Speed 2642.72 samples/sec Loss 6.6214 LearningRate 0.0243 Epoch: 10 Global Step: 420690 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:00:46,884-Speed 2626.75 samples/sec Loss 6.6154 LearningRate 0.0243 Epoch: 10 Global Step: 420700 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:00:50,785-Speed 2625.25 samples/sec Loss 6.6098 LearningRate 0.0243 Epoch: 10 Global Step: 420710 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:00:54,687-Speed 2625.47 samples/sec Loss 6.5479 LearningRate 0.0243 Epoch: 10 Global Step: 420720 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:00:58,597-Speed 2619.46 samples/sec Loss 6.5242 LearningRate 0.0243 Epoch: 10 Global Step: 420730 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:01:02,497-Speed 2626.24 samples/sec Loss 6.5832 LearningRate 0.0243 Epoch: 10 Global Step: 420740 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:01:06,396-Speed 2626.69 samples/sec Loss 6.4459 LearningRate 0.0243 Epoch: 10 Global Step: 420750 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:01:10,295-Speed 2627.52 samples/sec Loss 6.6386 LearningRate 0.0243 Epoch: 10 Global Step: 420760 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:01:14,193-Speed 2627.30 samples/sec Loss 6.6820 LearningRate 0.0243 Epoch: 10 Global Step: 420770 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:01:18,092-Speed 2626.97 samples/sec Loss 6.5428 LearningRate 0.0243 Epoch: 10 Global Step: 420780 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:01:21,992-Speed 2627.43 samples/sec Loss 6.6263 LearningRate 0.0243 Epoch: 10 Global Step: 420790 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:01:25,894-Speed 2624.68 samples/sec Loss 6.6247 LearningRate 0.0243 Epoch: 10 Global Step: 420800 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:01:29,813-Speed 2613.92 samples/sec Loss 6.6052 LearningRate 0.0243 Epoch: 10 Global Step: 420810 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:01:33,713-Speed 2626.24 samples/sec Loss 6.6498 LearningRate 0.0243 Epoch: 10 Global Step: 420820 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:01:37,612-Speed 2626.78 samples/sec Loss 6.5787 LearningRate 0.0243 Epoch: 10 Global Step: 420830 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:01:41,510-Speed 2627.80 samples/sec Loss 6.5427 LearningRate 0.0243 Epoch: 10 Global Step: 420840 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:01:45,412-Speed 2624.73 samples/sec Loss 6.5557 LearningRate 0.0243 Epoch: 10 Global Step: 420850 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:01:49,320-Speed 2620.71 samples/sec Loss 6.6204 LearningRate 0.0243 Epoch: 10 Global Step: 420860 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:01:53,221-Speed 2626.10 samples/sec Loss 6.5130 LearningRate 0.0243 Epoch: 10 Global Step: 420870 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:01:57,141-Speed 2612.79 samples/sec Loss 6.6064 LearningRate 0.0243 Epoch: 10 Global Step: 420880 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:02:01,031-Speed 2633.67 samples/sec Loss 6.6043 LearningRate 0.0243 Epoch: 10 Global Step: 420890 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:02:04,947-Speed 2615.67 samples/sec Loss 6.4712 LearningRate 0.0243 Epoch: 10 Global Step: 420900 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:02:08,873-Speed 2608.66 samples/sec Loss 6.5197 LearningRate 0.0243 Epoch: 10 Global Step: 420910 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:02:12,799-Speed 2608.83 samples/sec Loss 6.6750 LearningRate 0.0243 Epoch: 10 Global Step: 420920 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:02:16,715-Speed 2615.75 samples/sec Loss 6.5373 LearningRate 0.0243 Epoch: 10 Global Step: 420930 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:02:20,620-Speed 2622.80 samples/sec Loss 6.5713 LearningRate 0.0243 Epoch: 10 Global Step: 420940 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:02:24,518-Speed 2627.66 samples/sec Loss 6.6721 LearningRate 0.0243 Epoch: 10 Global Step: 420950 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:02:28,413-Speed 2629.98 samples/sec Loss 6.6035 LearningRate 0.0243 Epoch: 10 Global Step: 420960 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:02:32,311-Speed 2627.61 samples/sec Loss 6.5655 LearningRate 0.0243 Epoch: 10 Global Step: 420970 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:02:36,216-Speed 2622.55 samples/sec Loss 6.5855 LearningRate 0.0243 Epoch: 10 Global Step: 420980 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:02:40,113-Speed 2628.25 samples/sec Loss 6.4984 LearningRate 0.0243 Epoch: 10 Global Step: 420990 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 19:02:43,981-Speed 2648.54 samples/sec Loss 6.6096 LearningRate 0.0243 Epoch: 10 Global Step: 421000 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:02:47,885-Speed 2623.21 samples/sec Loss 6.5485 LearningRate 0.0243 Epoch: 10 Global Step: 421010 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:02:51,788-Speed 2624.41 samples/sec Loss 6.6115 LearningRate 0.0243 Epoch: 10 Global Step: 421020 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:02:55,686-Speed 2627.60 samples/sec Loss 6.7015 LearningRate 0.0243 Epoch: 10 Global Step: 421030 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:02:59,586-Speed 2626.58 samples/sec Loss 6.5338 LearningRate 0.0243 Epoch: 10 Global Step: 421040 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:03:03,504-Speed 2614.41 samples/sec Loss 6.5896 LearningRate 0.0243 Epoch: 10 Global Step: 421050 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:03:07,413-Speed 2620.21 samples/sec Loss 6.6040 LearningRate 0.0242 Epoch: 10 Global Step: 421060 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:03:11,313-Speed 2626.11 samples/sec Loss 6.7723 LearningRate 0.0242 Epoch: 10 Global Step: 421070 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:03:15,213-Speed 2626.15 samples/sec Loss 6.6080 LearningRate 0.0242 Epoch: 10 Global Step: 421080 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:03:19,143-Speed 2606.35 samples/sec Loss 6.6839 LearningRate 0.0242 Epoch: 10 Global Step: 421090 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:03:23,047-Speed 2624.53 samples/sec Loss 6.5798 LearningRate 0.0242 Epoch: 10 Global Step: 421100 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:03:26,939-Speed 2631.45 samples/sec Loss 6.5282 LearningRate 0.0242 Epoch: 10 Global Step: 421110 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:03:30,834-Speed 2630.31 samples/sec Loss 6.6758 LearningRate 0.0242 Epoch: 10 Global Step: 421120 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:03:34,817-Speed 2571.32 samples/sec Loss 6.5416 LearningRate 0.0242 Epoch: 10 Global Step: 421130 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:03:38,800-Speed 2571.58 samples/sec Loss 6.4254 LearningRate 0.0242 Epoch: 10 Global Step: 421140 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:03:42,701-Speed 2625.08 samples/sec Loss 6.6142 LearningRate 0.0242 Epoch: 10 Global Step: 421150 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:03:46,612-Speed 2619.18 samples/sec Loss 6.5583 LearningRate 0.0242 Epoch: 10 Global Step: 421160 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:03:50,515-Speed 2624.16 samples/sec Loss 6.5205 LearningRate 0.0242 Epoch: 10 Global Step: 421170 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:03:54,426-Speed 2618.75 samples/sec Loss 6.7227 LearningRate 0.0242 Epoch: 10 Global Step: 421180 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:03:58,325-Speed 2627.38 samples/sec Loss 6.5832 LearningRate 0.0242 Epoch: 10 Global Step: 421190 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:04:02,225-Speed 2626.48 samples/sec Loss 6.5008 LearningRate 0.0242 Epoch: 10 Global Step: 421200 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 19:04:06,106-Speed 2639.04 samples/sec Loss 6.5944 LearningRate 0.0242 Epoch: 10 Global Step: 421210 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:04:10,025-Speed 2613.29 samples/sec Loss 6.5534 LearningRate 0.0242 Epoch: 10 Global Step: 421220 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:04:13,925-Speed 2625.94 samples/sec Loss 6.6919 LearningRate 0.0242 Epoch: 10 Global Step: 421230 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:04:17,827-Speed 2625.52 samples/sec Loss 6.6298 LearningRate 0.0242 Epoch: 10 Global Step: 421240 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:04:21,732-Speed 2623.13 samples/sec Loss 6.5686 LearningRate 0.0242 Epoch: 10 Global Step: 421250 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:04:25,654-Speed 2611.97 samples/sec Loss 6.5166 LearningRate 0.0242 Epoch: 10 Global Step: 421260 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:04:29,528-Speed 2644.22 samples/sec Loss 6.5259 LearningRate 0.0242 Epoch: 10 Global Step: 421270 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:04:33,424-Speed 2628.30 samples/sec Loss 6.4926 LearningRate 0.0242 Epoch: 10 Global Step: 421280 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:04:37,320-Speed 2629.41 samples/sec Loss 6.4780 LearningRate 0.0242 Epoch: 10 Global Step: 421290 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:04:41,224-Speed 2623.61 samples/sec Loss 6.5036 LearningRate 0.0242 Epoch: 10 Global Step: 421300 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:04:45,122-Speed 2627.61 samples/sec Loss 6.4533 LearningRate 0.0242 Epoch: 10 Global Step: 421310 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:04:49,034-Speed 2618.54 samples/sec Loss 6.5467 LearningRate 0.0242 Epoch: 10 Global Step: 421320 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:04:52,930-Speed 2628.56 samples/sec Loss 6.6520 LearningRate 0.0242 Epoch: 10 Global Step: 421330 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:04:56,833-Speed 2624.73 samples/sec Loss 6.5036 LearningRate 0.0242 Epoch: 10 Global Step: 421340 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:05:00,739-Speed 2621.55 samples/sec Loss 6.5017 LearningRate 0.0242 Epoch: 10 Global Step: 421350 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:05:04,655-Speed 2615.59 samples/sec Loss 6.5885 LearningRate 0.0242 Epoch: 10 Global Step: 421360 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:05:08,536-Speed 2639.41 samples/sec Loss 6.5464 LearningRate 0.0242 Epoch: 10 Global Step: 421370 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:05:12,439-Speed 2624.54 samples/sec Loss 6.5829 LearningRate 0.0242 Epoch: 10 Global Step: 421380 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:05:16,351-Speed 2618.27 samples/sec Loss 6.5922 LearningRate 0.0242 Epoch: 10 Global Step: 421390 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:05:20,258-Speed 2621.56 samples/sec Loss 6.5247 LearningRate 0.0242 Epoch: 10 Global Step: 421400 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:05:24,154-Speed 2628.91 samples/sec Loss 6.6007 LearningRate 0.0242 Epoch: 10 Global Step: 421410 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:05:28,058-Speed 2623.19 samples/sec Loss 6.6021 LearningRate 0.0242 Epoch: 10 Global Step: 421420 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:05:31,958-Speed 2626.34 samples/sec Loss 6.6070 LearningRate 0.0242 Epoch: 10 Global Step: 421430 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:05:35,859-Speed 2625.90 samples/sec Loss 6.5370 LearningRate 0.0242 Epoch: 10 Global Step: 421440 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:05:39,759-Speed 2626.13 samples/sec Loss 6.4311 LearningRate 0.0242 Epoch: 10 Global Step: 421450 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:05:43,658-Speed 2627.38 samples/sec Loss 6.5125 LearningRate 0.0242 Epoch: 10 Global Step: 421460 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:05:47,572-Speed 2616.75 samples/sec Loss 6.5165 LearningRate 0.0242 Epoch: 10 Global Step: 421470 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:05:51,503-Speed 2606.10 samples/sec Loss 6.4745 LearningRate 0.0242 Epoch: 10 Global Step: 421480 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:05:55,409-Speed 2623.15 samples/sec Loss 6.5595 LearningRate 0.0242 Epoch: 10 Global Step: 421490 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:05:59,312-Speed 2624.43 samples/sec Loss 6.5864 LearningRate 0.0242 Epoch: 10 Global Step: 421500 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:06:03,207-Speed 2629.05 samples/sec Loss 6.5489 LearningRate 0.0242 Epoch: 10 Global Step: 421510 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:06:07,106-Speed 2626.97 samples/sec Loss 6.5676 LearningRate 0.0242 Epoch: 10 Global Step: 421520 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:06:11,004-Speed 2628.20 samples/sec Loss 6.4911 LearningRate 0.0242 Epoch: 10 Global Step: 421530 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:06:14,899-Speed 2629.92 samples/sec Loss 6.7070 LearningRate 0.0242 Epoch: 10 Global Step: 421540 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:06:18,796-Speed 2628.37 samples/sec Loss 6.4474 LearningRate 0.0242 Epoch: 10 Global Step: 421550 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:06:22,698-Speed 2624.33 samples/sec Loss 6.4984 LearningRate 0.0242 Epoch: 10 Global Step: 421560 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:06:26,614-Speed 2615.47 samples/sec Loss 6.5532 LearningRate 0.0242 Epoch: 10 Global Step: 421570 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 19:06:30,524-Speed 2619.31 samples/sec Loss 6.5055 LearningRate 0.0242 Epoch: 10 Global Step: 421580 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 19:06:34,420-Speed 2629.26 samples/sec Loss 6.6114 LearningRate 0.0242 Epoch: 10 Global Step: 421590 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:06:38,326-Speed 2622.39 samples/sec Loss 6.5081 LearningRate 0.0242 Epoch: 10 Global Step: 421600 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:06:42,220-Speed 2630.70 samples/sec Loss 6.4663 LearningRate 0.0242 Epoch: 10 Global Step: 421610 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:06:46,097-Speed 2642.17 samples/sec Loss 6.5950 LearningRate 0.0242 Epoch: 10 Global Step: 421620 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:06:49,998-Speed 2625.61 samples/sec Loss 6.4475 LearningRate 0.0242 Epoch: 10 Global Step: 421630 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:06:53,893-Speed 2629.37 samples/sec Loss 6.4295 LearningRate 0.0242 Epoch: 10 Global Step: 421640 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:06:57,788-Speed 2629.75 samples/sec Loss 6.4516 LearningRate 0.0242 Epoch: 10 Global Step: 421650 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:07:01,699-Speed 2618.42 samples/sec Loss 6.6409 LearningRate 0.0242 Epoch: 10 Global Step: 421660 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:07:05,605-Speed 2622.10 samples/sec Loss 6.5823 LearningRate 0.0242 Epoch: 10 Global Step: 421670 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:07:09,503-Speed 2628.18 samples/sec Loss 6.5089 LearningRate 0.0242 Epoch: 10 Global Step: 421680 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:07:13,421-Speed 2613.68 samples/sec Loss 6.6614 LearningRate 0.0242 Epoch: 10 Global Step: 421690 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:07:17,318-Speed 2628.75 samples/sec Loss 6.3498 LearningRate 0.0242 Epoch: 10 Global Step: 421700 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:07:21,217-Speed 2627.49 samples/sec Loss 6.4938 LearningRate 0.0242 Epoch: 10 Global Step: 421710 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:07:25,137-Speed 2612.93 samples/sec Loss 6.5563 LearningRate 0.0242 Epoch: 10 Global Step: 421720 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:07:29,036-Speed 2626.57 samples/sec Loss 6.4648 LearningRate 0.0242 Epoch: 10 Global Step: 421730 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:07:32,944-Speed 2620.93 samples/sec Loss 6.5229 LearningRate 0.0242 Epoch: 10 Global Step: 421740 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:07:36,853-Speed 2619.96 samples/sec Loss 6.5089 LearningRate 0.0242 Epoch: 10 Global Step: 421750 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:07:40,755-Speed 2624.96 samples/sec Loss 6.6020 LearningRate 0.0242 Epoch: 10 Global Step: 421760 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:07:44,654-Speed 2627.30 samples/sec Loss 6.5735 LearningRate 0.0242 Epoch: 10 Global Step: 421770 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:07:48,553-Speed 2627.17 samples/sec Loss 6.4835 LearningRate 0.0242 Epoch: 10 Global Step: 421780 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:07:52,560-Speed 2556.07 samples/sec Loss 6.5155 LearningRate 0.0242 Epoch: 10 Global Step: 421790 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:07:56,454-Speed 2630.34 samples/sec Loss 6.5172 LearningRate 0.0242 Epoch: 10 Global Step: 421800 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:08:00,353-Speed 2627.65 samples/sec Loss 6.4864 LearningRate 0.0242 Epoch: 10 Global Step: 421810 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:08:04,250-Speed 2627.75 samples/sec Loss 6.6247 LearningRate 0.0242 Epoch: 10 Global Step: 421820 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:08:08,158-Speed 2620.86 samples/sec Loss 6.4807 LearningRate 0.0242 Epoch: 10 Global Step: 421830 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:08:12,059-Speed 2625.62 samples/sec Loss 6.6187 LearningRate 0.0242 Epoch: 10 Global Step: 421840 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:08:15,962-Speed 2624.52 samples/sec Loss 6.5301 LearningRate 0.0242 Epoch: 10 Global Step: 421850 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:08:19,859-Speed 2628.11 samples/sec Loss 6.6390 LearningRate 0.0242 Epoch: 10 Global Step: 421860 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:08:23,765-Speed 2623.42 samples/sec Loss 6.5646 LearningRate 0.0242 Epoch: 10 Global Step: 421870 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:08:27,675-Speed 2619.22 samples/sec Loss 6.5707 LearningRate 0.0242 Epoch: 10 Global Step: 421880 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:08:31,585-Speed 2620.53 samples/sec Loss 6.6874 LearningRate 0.0242 Epoch: 10 Global Step: 421890 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:08:35,480-Speed 2629.52 samples/sec Loss 6.5161 LearningRate 0.0242 Epoch: 10 Global Step: 421900 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:08:39,379-Speed 2627.04 samples/sec Loss 6.5138 LearningRate 0.0241 Epoch: 10 Global Step: 421910 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:08:43,284-Speed 2622.24 samples/sec Loss 6.5859 LearningRate 0.0241 Epoch: 10 Global Step: 421920 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:08:47,178-Speed 2630.69 samples/sec Loss 6.4912 LearningRate 0.0241 Epoch: 10 Global Step: 421930 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:08:51,078-Speed 2626.21 samples/sec Loss 6.5628 LearningRate 0.0241 Epoch: 10 Global Step: 421940 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:08:54,987-Speed 2620.31 samples/sec Loss 6.5701 LearningRate 0.0241 Epoch: 10 Global Step: 421950 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:08:58,879-Speed 2632.23 samples/sec Loss 6.5446 LearningRate 0.0241 Epoch: 10 Global Step: 421960 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:09:02,775-Speed 2628.79 samples/sec Loss 6.5894 LearningRate 0.0241 Epoch: 10 Global Step: 421970 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:09:06,674-Speed 2626.63 samples/sec Loss 6.5206 LearningRate 0.0241 Epoch: 10 Global Step: 421980 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:09:10,581-Speed 2621.99 samples/sec Loss 6.5105 LearningRate 0.0241 Epoch: 10 Global Step: 421990 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:09:14,474-Speed 2631.29 samples/sec Loss 6.5692 LearningRate 0.0241 Epoch: 10 Global Step: 422000 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:09:18,376-Speed 2624.23 samples/sec Loss 6.6146 LearningRate 0.0241 Epoch: 10 Global Step: 422010 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:09:22,297-Speed 2613.18 samples/sec Loss 6.6249 LearningRate 0.0241 Epoch: 10 Global Step: 422020 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:09:26,210-Speed 2617.11 samples/sec Loss 6.5731 LearningRate 0.0241 Epoch: 10 Global Step: 422030 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:09:30,321-Speed 2491.95 samples/sec Loss 6.5313 LearningRate 0.0241 Epoch: 10 Global Step: 422040 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:09:34,214-Speed 2630.98 samples/sec Loss 6.5840 LearningRate 0.0241 Epoch: 10 Global Step: 422050 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:09:38,142-Speed 2607.25 samples/sec Loss 6.6838 LearningRate 0.0241 Epoch: 10 Global Step: 422060 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:09:42,038-Speed 2629.44 samples/sec Loss 6.5746 LearningRate 0.0241 Epoch: 10 Global Step: 422070 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:09:45,932-Speed 2630.72 samples/sec Loss 6.5751 LearningRate 0.0241 Epoch: 10 Global Step: 422080 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:09:49,810-Speed 2640.74 samples/sec Loss 6.4154 LearningRate 0.0241 Epoch: 10 Global Step: 422090 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:09:53,710-Speed 2626.25 samples/sec Loss 6.5361 LearningRate 0.0241 Epoch: 10 Global Step: 422100 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:09:57,618-Speed 2620.72 samples/sec Loss 6.5893 LearningRate 0.0241 Epoch: 10 Global Step: 422110 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:10:01,522-Speed 2624.26 samples/sec Loss 6.5104 LearningRate 0.0241 Epoch: 10 Global Step: 422120 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:10:05,423-Speed 2625.57 samples/sec Loss 6.5686 LearningRate 0.0241 Epoch: 10 Global Step: 422130 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:10:09,341-Speed 2614.39 samples/sec Loss 6.5693 LearningRate 0.0241 Epoch: 10 Global Step: 422140 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:10:13,249-Speed 2620.07 samples/sec Loss 6.6213 LearningRate 0.0241 Epoch: 10 Global Step: 422150 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:10:17,347-Speed 2499.74 samples/sec Loss 6.4355 LearningRate 0.0241 Epoch: 10 Global Step: 422160 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:10:21,348-Speed 2560.21 samples/sec Loss 6.5321 LearningRate 0.0241 Epoch: 10 Global Step: 422170 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:10:25,244-Speed 2628.50 samples/sec Loss 6.4815 LearningRate 0.0241 Epoch: 10 Global Step: 422180 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:10:29,143-Speed 2627.20 samples/sec Loss 6.5168 LearningRate 0.0241 Epoch: 10 Global Step: 422190 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:10:33,172-Speed 2542.25 samples/sec Loss 6.3554 LearningRate 0.0241 Epoch: 10 Global Step: 422200 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:10:37,070-Speed 2627.48 samples/sec Loss 6.4946 LearningRate 0.0241 Epoch: 10 Global Step: 422210 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:10:40,969-Speed 2627.50 samples/sec Loss 6.5387 LearningRate 0.0241 Epoch: 10 Global Step: 422220 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:10:44,867-Speed 2627.16 samples/sec Loss 6.5329 LearningRate 0.0241 Epoch: 10 Global Step: 422230 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:10:48,767-Speed 2626.51 samples/sec Loss 6.5622 LearningRate 0.0241 Epoch: 10 Global Step: 422240 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:10:52,668-Speed 2625.21 samples/sec Loss 6.5025 LearningRate 0.0241 Epoch: 10 Global Step: 422250 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:10:56,552-Speed 2637.66 samples/sec Loss 6.6152 LearningRate 0.0241 Epoch: 10 Global Step: 422260 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:11:00,457-Speed 2622.53 samples/sec Loss 6.4429 LearningRate 0.0241 Epoch: 10 Global Step: 422270 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:11:04,357-Speed 2626.11 samples/sec Loss 6.6294 LearningRate 0.0241 Epoch: 10 Global Step: 422280 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:11:08,258-Speed 2625.93 samples/sec Loss 6.4557 LearningRate 0.0241 Epoch: 10 Global Step: 422290 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:11:12,154-Speed 2628.96 samples/sec Loss 6.4342 LearningRate 0.0241 Epoch: 10 Global Step: 422300 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:11:16,052-Speed 2628.67 samples/sec Loss 6.5969 LearningRate 0.0241 Epoch: 10 Global Step: 422310 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:11:19,955-Speed 2623.65 samples/sec Loss 6.6176 LearningRate 0.0241 Epoch: 10 Global Step: 422320 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:11:23,863-Speed 2621.72 samples/sec Loss 6.4873 LearningRate 0.0241 Epoch: 10 Global Step: 422330 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:11:27,764-Speed 2625.30 samples/sec Loss 6.5074 LearningRate 0.0241 Epoch: 10 Global Step: 422340 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:11:31,661-Speed 2628.08 samples/sec Loss 6.5973 LearningRate 0.0241 Epoch: 10 Global Step: 422350 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:11:35,558-Speed 2628.36 samples/sec Loss 6.5269 LearningRate 0.0241 Epoch: 10 Global Step: 422360 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:11:39,454-Speed 2629.37 samples/sec Loss 6.6221 LearningRate 0.0241 Epoch: 10 Global Step: 422370 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:11:43,369-Speed 2615.99 samples/sec Loss 6.5818 LearningRate 0.0241 Epoch: 10 Global Step: 422380 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:11:47,265-Speed 2629.32 samples/sec Loss 6.4682 LearningRate 0.0241 Epoch: 10 Global Step: 422390 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:11:51,195-Speed 2606.44 samples/sec Loss 6.5417 LearningRate 0.0241 Epoch: 10 Global Step: 422400 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:11:55,089-Speed 2630.28 samples/sec Loss 6.5464 LearningRate 0.0241 Epoch: 10 Global Step: 422410 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:11:58,997-Speed 2620.92 samples/sec Loss 6.5446 LearningRate 0.0241 Epoch: 10 Global Step: 422420 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:12:02,908-Speed 2618.95 samples/sec Loss 6.5696 LearningRate 0.0241 Epoch: 10 Global Step: 422430 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:12:06,807-Speed 2626.51 samples/sec Loss 6.5252 LearningRate 0.0241 Epoch: 10 Global Step: 422440 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:12:10,717-Speed 2619.55 samples/sec Loss 6.5225 LearningRate 0.0241 Epoch: 10 Global Step: 422450 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:12:14,603-Speed 2636.11 samples/sec Loss 6.7238 LearningRate 0.0241 Epoch: 10 Global Step: 422460 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:12:18,500-Speed 2628.77 samples/sec Loss 6.5765 LearningRate 0.0241 Epoch: 10 Global Step: 422470 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:12:22,371-Speed 2646.27 samples/sec Loss 6.4887 LearningRate 0.0241 Epoch: 10 Global Step: 422480 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:12:26,304-Speed 2604.42 samples/sec Loss 6.5326 LearningRate 0.0241 Epoch: 10 Global Step: 422490 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:12:30,215-Speed 2618.37 samples/sec Loss 6.5411 LearningRate 0.0241 Epoch: 10 Global Step: 422500 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:12:34,116-Speed 2626.23 samples/sec Loss 6.5506 LearningRate 0.0241 Epoch: 10 Global Step: 422510 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:12:38,015-Speed 2626.64 samples/sec Loss 6.4540 LearningRate 0.0241 Epoch: 10 Global Step: 422520 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:12:41,913-Speed 2627.73 samples/sec Loss 6.5279 LearningRate 0.0241 Epoch: 10 Global Step: 422530 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:12:45,812-Speed 2627.06 samples/sec Loss 6.5640 LearningRate 0.0241 Epoch: 10 Global Step: 422540 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:12:49,707-Speed 2629.78 samples/sec Loss 6.6413 LearningRate 0.0241 Epoch: 10 Global Step: 422550 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:12:53,618-Speed 2618.70 samples/sec Loss 6.3966 LearningRate 0.0241 Epoch: 10 Global Step: 422560 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:12:57,516-Speed 2628.56 samples/sec Loss 6.4529 LearningRate 0.0241 Epoch: 10 Global Step: 422570 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:13:01,409-Speed 2630.90 samples/sec Loss 6.5476 LearningRate 0.0241 Epoch: 10 Global Step: 422580 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:13:05,307-Speed 2627.41 samples/sec Loss 6.5982 LearningRate 0.0241 Epoch: 10 Global Step: 422590 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:13:09,182-Speed 2642.89 samples/sec Loss 6.5841 LearningRate 0.0241 Epoch: 10 Global Step: 422600 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:13:13,083-Speed 2626.07 samples/sec Loss 6.5261 LearningRate 0.0241 Epoch: 10 Global Step: 422610 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:13:16,982-Speed 2626.91 samples/sec Loss 6.5313 LearningRate 0.0241 Epoch: 10 Global Step: 422620 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:13:20,885-Speed 2624.27 samples/sec Loss 6.4977 LearningRate 0.0241 Epoch: 10 Global Step: 422630 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:13:24,776-Speed 2632.63 samples/sec Loss 6.4018 LearningRate 0.0241 Epoch: 10 Global Step: 422640 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:13:28,673-Speed 2628.60 samples/sec Loss 6.5770 LearningRate 0.0241 Epoch: 10 Global Step: 422650 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:13:32,595-Speed 2611.59 samples/sec Loss 6.5440 LearningRate 0.0241 Epoch: 10 Global Step: 422660 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:13:36,492-Speed 2628.16 samples/sec Loss 6.5682 LearningRate 0.0241 Epoch: 10 Global Step: 422670 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:13:40,384-Speed 2631.74 samples/sec Loss 6.5510 LearningRate 0.0241 Epoch: 10 Global Step: 422680 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:13:44,289-Speed 2622.65 samples/sec Loss 6.5114 LearningRate 0.0241 Epoch: 10 Global Step: 422690 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:13:48,198-Speed 2620.11 samples/sec Loss 6.4895 LearningRate 0.0241 Epoch: 10 Global Step: 422700 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:13:52,096-Speed 2627.55 samples/sec Loss 6.6709 LearningRate 0.0241 Epoch: 10 Global Step: 422710 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:13:55,991-Speed 2630.32 samples/sec Loss 6.4882 LearningRate 0.0241 Epoch: 10 Global Step: 422720 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:13:59,889-Speed 2627.31 samples/sec Loss 6.4860 LearningRate 0.0241 Epoch: 10 Global Step: 422730 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:14:03,790-Speed 2625.89 samples/sec Loss 6.5104 LearningRate 0.0241 Epoch: 10 Global Step: 422740 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:14:07,693-Speed 2624.03 samples/sec Loss 6.5936 LearningRate 0.0240 Epoch: 10 Global Step: 422750 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:14:11,593-Speed 2626.20 samples/sec Loss 6.4597 LearningRate 0.0240 Epoch: 10 Global Step: 422760 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:14:15,504-Speed 2618.66 samples/sec Loss 6.4790 LearningRate 0.0240 Epoch: 10 Global Step: 422770 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:14:19,424-Speed 2612.71 samples/sec Loss 6.6869 LearningRate 0.0240 Epoch: 10 Global Step: 422780 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:14:23,322-Speed 2628.33 samples/sec Loss 6.6548 LearningRate 0.0240 Epoch: 10 Global Step: 422790 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:14:27,368-Speed 2531.42 samples/sec Loss 6.6936 LearningRate 0.0240 Epoch: 10 Global Step: 422800 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 19:14:31,302-Speed 2603.61 samples/sec Loss 6.5711 LearningRate 0.0240 Epoch: 10 Global Step: 422810 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:14:35,202-Speed 2625.95 samples/sec Loss 6.5244 LearningRate 0.0240 Epoch: 10 Global Step: 422820 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:14:39,168-Speed 2582.47 samples/sec Loss 6.5316 LearningRate 0.0240 Epoch: 10 Global Step: 422830 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:14:43,065-Speed 2628.16 samples/sec Loss 6.5162 LearningRate 0.0240 Epoch: 10 Global Step: 422840 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:14:46,963-Speed 2627.95 samples/sec Loss 6.5706 LearningRate 0.0240 Epoch: 10 Global Step: 422850 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:14:50,840-Speed 2641.73 samples/sec Loss 6.5713 LearningRate 0.0240 Epoch: 10 Global Step: 422860 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:14:54,742-Speed 2625.36 samples/sec Loss 6.4563 LearningRate 0.0240 Epoch: 10 Global Step: 422870 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:14:58,638-Speed 2628.85 samples/sec Loss 6.5335 LearningRate 0.0240 Epoch: 10 Global Step: 422880 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:15:02,640-Speed 2559.21 samples/sec Loss 6.6453 LearningRate 0.0240 Epoch: 10 Global Step: 422890 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:15:06,537-Speed 2628.62 samples/sec Loss 6.4555 LearningRate 0.0240 Epoch: 10 Global Step: 422900 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:15:10,433-Speed 2629.13 samples/sec Loss 6.5684 LearningRate 0.0240 Epoch: 10 Global Step: 422910 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:15:14,353-Speed 2612.42 samples/sec Loss 6.5795 LearningRate 0.0240 Epoch: 10 Global Step: 422920 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:15:18,304-Speed 2592.70 samples/sec Loss 6.5894 LearningRate 0.0240 Epoch: 10 Global Step: 422930 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:15:22,297-Speed 2565.26 samples/sec Loss 6.5528 LearningRate 0.0240 Epoch: 10 Global Step: 422940 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:15:26,196-Speed 2627.27 samples/sec Loss 6.5142 LearningRate 0.0240 Epoch: 10 Global Step: 422950 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:15:30,097-Speed 2625.53 samples/sec Loss 6.4752 LearningRate 0.0240 Epoch: 10 Global Step: 422960 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:15:34,000-Speed 2623.85 samples/sec Loss 6.4768 LearningRate 0.0240 Epoch: 10 Global Step: 422970 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:15:37,895-Speed 2629.98 samples/sec Loss 6.5551 LearningRate 0.0240 Epoch: 10 Global Step: 422980 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:15:41,793-Speed 2627.65 samples/sec Loss 6.4883 LearningRate 0.0240 Epoch: 10 Global Step: 422990 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:15:45,930-Speed 2476.18 samples/sec Loss 6.6056 LearningRate 0.0240 Epoch: 10 Global Step: 423000 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:15:49,854-Speed 2609.91 samples/sec Loss 6.6238 LearningRate 0.0240 Epoch: 10 Global Step: 423010 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:15:53,776-Speed 2611.70 samples/sec Loss 6.4693 LearningRate 0.0240 Epoch: 10 Global Step: 423020 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:15:57,676-Speed 2626.02 samples/sec Loss 6.4900 LearningRate 0.0240 Epoch: 10 Global Step: 423030 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:16:01,584-Speed 2621.03 samples/sec Loss 6.5585 LearningRate 0.0240 Epoch: 10 Global Step: 423040 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:16:05,498-Speed 2616.94 samples/sec Loss 6.7279 LearningRate 0.0240 Epoch: 10 Global Step: 423050 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:16:09,389-Speed 2632.09 samples/sec Loss 6.5514 LearningRate 0.0240 Epoch: 10 Global Step: 423060 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:16:13,278-Speed 2633.89 samples/sec Loss 6.5285 LearningRate 0.0240 Epoch: 10 Global Step: 423070 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:16:17,270-Speed 2565.94 samples/sec Loss 6.4977 LearningRate 0.0240 Epoch: 10 Global Step: 423080 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:16:21,175-Speed 2622.86 samples/sec Loss 6.4719 LearningRate 0.0240 Epoch: 10 Global Step: 423090 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:16:25,072-Speed 2628.62 samples/sec Loss 6.4800 LearningRate 0.0240 Epoch: 10 Global Step: 423100 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:16:29,006-Speed 2603.03 samples/sec Loss 6.4824 LearningRate 0.0240 Epoch: 10 Global Step: 423110 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:16:32,915-Speed 2620.68 samples/sec Loss 6.4608 LearningRate 0.0240 Epoch: 10 Global Step: 423120 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:16:36,812-Speed 2628.60 samples/sec Loss 6.4524 LearningRate 0.0240 Epoch: 10 Global Step: 423130 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:16:40,707-Speed 2629.52 samples/sec Loss 6.4731 LearningRate 0.0240 Epoch: 10 Global Step: 423140 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:16:44,628-Speed 2611.89 samples/sec Loss 6.5244 LearningRate 0.0240 Epoch: 10 Global Step: 423150 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:16:48,549-Speed 2612.46 samples/sec Loss 6.5746 LearningRate 0.0240 Epoch: 10 Global Step: 423160 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:16:52,446-Speed 2628.54 samples/sec Loss 6.5444 LearningRate 0.0240 Epoch: 10 Global Step: 423170 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:16:56,344-Speed 2627.98 samples/sec Loss 6.4999 LearningRate 0.0240 Epoch: 10 Global Step: 423180 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:17:00,334-Speed 2566.72 samples/sec Loss 6.5250 LearningRate 0.0240 Epoch: 10 Global Step: 423190 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:17:04,262-Speed 2608.18 samples/sec Loss 6.4159 LearningRate 0.0240 Epoch: 10 Global Step: 423200 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:17:08,161-Speed 2626.91 samples/sec Loss 6.5176 LearningRate 0.0240 Epoch: 10 Global Step: 423210 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:17:12,062-Speed 2625.55 samples/sec Loss 6.5631 LearningRate 0.0240 Epoch: 10 Global Step: 423220 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:17:15,964-Speed 2624.28 samples/sec Loss 6.5915 LearningRate 0.0240 Epoch: 10 Global Step: 423230 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:17:19,861-Speed 2628.91 samples/sec Loss 6.6326 LearningRate 0.0240 Epoch: 10 Global Step: 423240 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:17:23,770-Speed 2620.46 samples/sec Loss 6.5148 LearningRate 0.0240 Epoch: 10 Global Step: 423250 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:17:27,671-Speed 2625.97 samples/sec Loss 6.5745 LearningRate 0.0240 Epoch: 10 Global Step: 423260 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:17:31,588-Speed 2614.28 samples/sec Loss 6.6232 LearningRate 0.0240 Epoch: 10 Global Step: 423270 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:17:35,492-Speed 2623.90 samples/sec Loss 6.5562 LearningRate 0.0240 Epoch: 10 Global Step: 423280 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:17:39,393-Speed 2625.58 samples/sec Loss 6.4989 LearningRate 0.0240 Epoch: 10 Global Step: 423290 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:17:43,292-Speed 2627.05 samples/sec Loss 6.4809 LearningRate 0.0240 Epoch: 10 Global Step: 423300 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 19:17:47,189-Speed 2628.51 samples/sec Loss 6.5519 LearningRate 0.0240 Epoch: 10 Global Step: 423310 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 19:17:51,097-Speed 2620.91 samples/sec Loss 6.5138 LearningRate 0.0240 Epoch: 10 Global Step: 423320 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:17:54,996-Speed 2627.65 samples/sec Loss 6.5935 LearningRate 0.0240 Epoch: 10 Global Step: 423330 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:17:58,891-Speed 2629.48 samples/sec Loss 6.5290 LearningRate 0.0240 Epoch: 10 Global Step: 423340 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:18:02,790-Speed 2626.86 samples/sec Loss 6.5709 LearningRate 0.0240 Epoch: 10 Global Step: 423350 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:18:06,690-Speed 2625.74 samples/sec Loss 6.5903 LearningRate 0.0240 Epoch: 10 Global Step: 423360 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:18:10,597-Speed 2622.02 samples/sec Loss 6.4302 LearningRate 0.0240 Epoch: 10 Global Step: 423370 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:18:14,498-Speed 2625.46 samples/sec Loss 6.5061 LearningRate 0.0240 Epoch: 10 Global Step: 423380 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:18:18,402-Speed 2624.19 samples/sec Loss 6.4613 LearningRate 0.0240 Epoch: 10 Global Step: 423390 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:18:22,326-Speed 2609.57 samples/sec Loss 6.4778 LearningRate 0.0240 Epoch: 10 Global Step: 423400 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:18:26,220-Speed 2631.59 samples/sec Loss 6.6814 LearningRate 0.0240 Epoch: 10 Global Step: 423410 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:18:30,146-Speed 2608.76 samples/sec Loss 6.5140 LearningRate 0.0240 Epoch: 10 Global Step: 423420 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 19:18:34,022-Speed 2642.18 samples/sec Loss 6.6688 LearningRate 0.0240 Epoch: 10 Global Step: 423430 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:18:37,945-Speed 2610.84 samples/sec Loss 6.5296 LearningRate 0.0240 Epoch: 10 Global Step: 423440 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:18:41,855-Speed 2620.25 samples/sec Loss 6.5494 LearningRate 0.0240 Epoch: 10 Global Step: 423450 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:18:45,764-Speed 2620.21 samples/sec Loss 6.5478 LearningRate 0.0240 Epoch: 10 Global Step: 423460 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:18:49,673-Speed 2620.14 samples/sec Loss 6.3881 LearningRate 0.0240 Epoch: 10 Global Step: 423470 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:18:53,566-Speed 2631.03 samples/sec Loss 6.5673 LearningRate 0.0240 Epoch: 10 Global Step: 423480 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:18:57,460-Speed 2630.96 samples/sec Loss 6.4304 LearningRate 0.0240 Epoch: 10 Global Step: 423490 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:19:01,357-Speed 2627.81 samples/sec Loss 6.4544 LearningRate 0.0240 Epoch: 10 Global Step: 423500 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:19:05,358-Speed 2559.99 samples/sec Loss 6.5216 LearningRate 0.0240 Epoch: 10 Global Step: 423510 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:19:09,267-Speed 2619.67 samples/sec Loss 6.4220 LearningRate 0.0240 Epoch: 10 Global Step: 423520 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:19:13,143-Speed 2642.80 samples/sec Loss 6.5680 LearningRate 0.0240 Epoch: 10 Global Step: 423530 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:19:17,042-Speed 2626.96 samples/sec Loss 6.5343 LearningRate 0.0240 Epoch: 10 Global Step: 423540 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:19:20,941-Speed 2627.22 samples/sec Loss 6.5660 LearningRate 0.0240 Epoch: 10 Global Step: 423550 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:19:24,843-Speed 2625.26 samples/sec Loss 6.5744 LearningRate 0.0240 Epoch: 10 Global Step: 423560 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:19:28,742-Speed 2626.89 samples/sec Loss 6.5547 LearningRate 0.0240 Epoch: 10 Global Step: 423570 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:19:32,639-Speed 2628.59 samples/sec Loss 6.5086 LearningRate 0.0240 Epoch: 10 Global Step: 423580 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:19:36,533-Speed 2629.91 samples/sec Loss 6.5418 LearningRate 0.0240 Epoch: 10 Global Step: 423590 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:19:40,427-Speed 2630.05 samples/sec Loss 6.4220 LearningRate 0.0239 Epoch: 10 Global Step: 423600 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:19:44,322-Speed 2629.81 samples/sec Loss 6.4559 LearningRate 0.0239 Epoch: 10 Global Step: 423610 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:19:48,197-Speed 2643.14 samples/sec Loss 6.5791 LearningRate 0.0239 Epoch: 10 Global Step: 423620 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:19:52,138-Speed 2599.21 samples/sec Loss 6.5453 LearningRate 0.0239 Epoch: 10 Global Step: 423630 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:19:56,036-Speed 2627.42 samples/sec Loss 6.5896 LearningRate 0.0239 Epoch: 10 Global Step: 423640 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:19:59,948-Speed 2618.80 samples/sec Loss 6.4750 LearningRate 0.0239 Epoch: 10 Global Step: 423650 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:20:03,848-Speed 2626.36 samples/sec Loss 6.5927 LearningRate 0.0239 Epoch: 10 Global Step: 423660 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:20:07,780-Speed 2604.94 samples/sec Loss 6.6815 LearningRate 0.0239 Epoch: 10 Global Step: 423670 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:20:11,762-Speed 2572.19 samples/sec Loss 6.5962 LearningRate 0.0239 Epoch: 10 Global Step: 423680 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:20:15,858-Speed 2501.04 samples/sec Loss 6.6094 LearningRate 0.0239 Epoch: 10 Global Step: 423690 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:20:19,799-Speed 2598.67 samples/sec Loss 6.6834 LearningRate 0.0239 Epoch: 10 Global Step: 423700 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:20:23,744-Speed 2596.37 samples/sec Loss 6.5202 LearningRate 0.0239 Epoch: 10 Global Step: 423710 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:20:27,795-Speed 2528.15 samples/sec Loss 6.3975 LearningRate 0.0239 Epoch: 10 Global Step: 423720 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:20:31,701-Speed 2622.93 samples/sec Loss 6.6140 LearningRate 0.0239 Epoch: 10 Global Step: 423730 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:20:35,611-Speed 2618.98 samples/sec Loss 6.5852 LearningRate 0.0239 Epoch: 10 Global Step: 423740 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:20:39,553-Speed 2598.53 samples/sec Loss 6.5779 LearningRate 0.0239 Epoch: 10 Global Step: 423750 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:20:43,453-Speed 2626.45 samples/sec Loss 6.4894 LearningRate 0.0239 Epoch: 10 Global Step: 423760 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:20:47,343-Speed 2633.65 samples/sec Loss 6.5676 LearningRate 0.0239 Epoch: 10 Global Step: 423770 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:20:51,257-Speed 2616.84 samples/sec Loss 6.4244 LearningRate 0.0239 Epoch: 10 Global Step: 423780 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:20:55,160-Speed 2625.01 samples/sec Loss 6.4580 LearningRate 0.0239 Epoch: 10 Global Step: 423790 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:20:59,036-Speed 2642.47 samples/sec Loss 6.6427 LearningRate 0.0239 Epoch: 10 Global Step: 423800 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:21:02,933-Speed 2628.89 samples/sec Loss 6.4159 LearningRate 0.0239 Epoch: 10 Global Step: 423810 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:21:06,824-Speed 2631.70 samples/sec Loss 6.5584 LearningRate 0.0239 Epoch: 10 Global Step: 423820 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:21:10,714-Speed 2633.31 samples/sec Loss 6.5476 LearningRate 0.0239 Epoch: 10 Global Step: 423830 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:21:14,607-Speed 2630.78 samples/sec Loss 6.4691 LearningRate 0.0239 Epoch: 10 Global Step: 423840 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:21:18,514-Speed 2622.02 samples/sec Loss 6.4459 LearningRate 0.0239 Epoch: 10 Global Step: 423850 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:21:22,412-Speed 2627.63 samples/sec Loss 6.4645 LearningRate 0.0239 Epoch: 10 Global Step: 423860 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:21:26,318-Speed 2621.88 samples/sec Loss 6.5246 LearningRate 0.0239 Epoch: 10 Global Step: 423870 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:21:30,213-Speed 2630.55 samples/sec Loss 6.4800 LearningRate 0.0239 Epoch: 10 Global Step: 423880 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:21:34,111-Speed 2627.29 samples/sec Loss 6.4435 LearningRate 0.0239 Epoch: 10 Global Step: 423890 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:21:38,008-Speed 2627.97 samples/sec Loss 6.4505 LearningRate 0.0239 Epoch: 10 Global Step: 423900 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:21:41,906-Speed 2627.88 samples/sec Loss 6.5026 LearningRate 0.0239 Epoch: 10 Global Step: 423910 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:21:45,801-Speed 2629.38 samples/sec Loss 6.5005 LearningRate 0.0239 Epoch: 10 Global Step: 423920 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:21:49,709-Speed 2621.44 samples/sec Loss 6.5244 LearningRate 0.0239 Epoch: 10 Global Step: 423930 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:21:53,585-Speed 2642.85 samples/sec Loss 6.4681 LearningRate 0.0239 Epoch: 10 Global Step: 423940 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:21:57,481-Speed 2628.67 samples/sec Loss 6.4879 LearningRate 0.0239 Epoch: 10 Global Step: 423950 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:22:01,383-Speed 2625.77 samples/sec Loss 6.5325 LearningRate 0.0239 Epoch: 10 Global Step: 423960 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:22:05,285-Speed 2624.54 samples/sec Loss 6.5968 LearningRate 0.0239 Epoch: 10 Global Step: 423970 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:22:09,189-Speed 2623.41 samples/sec Loss 6.4452 LearningRate 0.0239 Epoch: 10 Global Step: 423980 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:22:13,083-Speed 2629.90 samples/sec Loss 6.5443 LearningRate 0.0239 Epoch: 10 Global Step: 423990 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:22:16,979-Speed 2629.43 samples/sec Loss 6.5333 LearningRate 0.0239 Epoch: 10 Global Step: 424000 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:22:20,880-Speed 2625.36 samples/sec Loss 6.5327 LearningRate 0.0239 Epoch: 10 Global Step: 424010 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:22:24,786-Speed 2622.52 samples/sec Loss 6.4884 LearningRate 0.0239 Epoch: 10 Global Step: 424020 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:22:28,679-Speed 2631.29 samples/sec Loss 6.5582 LearningRate 0.0239 Epoch: 10 Global Step: 424030 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:22:32,579-Speed 2626.33 samples/sec Loss 6.5246 LearningRate 0.0239 Epoch: 10 Global Step: 424040 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:22:36,487-Speed 2620.67 samples/sec Loss 6.6285 LearningRate 0.0239 Epoch: 10 Global Step: 424050 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:22:40,386-Speed 2626.99 samples/sec Loss 6.5682 LearningRate 0.0239 Epoch: 10 Global Step: 424060 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:22:44,286-Speed 2626.55 samples/sec Loss 6.5286 LearningRate 0.0239 Epoch: 10 Global Step: 424070 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:22:48,192-Speed 2621.61 samples/sec Loss 6.5323 LearningRate 0.0239 Epoch: 10 Global Step: 424080 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:22:52,090-Speed 2628.41 samples/sec Loss 6.5854 LearningRate 0.0239 Epoch: 10 Global Step: 424090 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:22:55,986-Speed 2629.12 samples/sec Loss 6.6079 LearningRate 0.0239 Epoch: 10 Global Step: 424100 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:22:59,879-Speed 2630.40 samples/sec Loss 6.4553 LearningRate 0.0239 Epoch: 10 Global Step: 424110 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:23:03,803-Speed 2610.28 samples/sec Loss 6.4257 LearningRate 0.0239 Epoch: 10 Global Step: 424120 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:23:07,700-Speed 2629.04 samples/sec Loss 6.4358 LearningRate 0.0239 Epoch: 10 Global Step: 424130 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:23:11,614-Speed 2616.85 samples/sec Loss 6.5728 LearningRate 0.0239 Epoch: 10 Global Step: 424140 Fp16 Grad Scale: 262144 Required: 46 hours
Training: 2022-04-14 19:23:15,494-Speed 2639.34 samples/sec Loss 6.5141 LearningRate 0.0239 Epoch: 10 Global Step: 424150 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:23:19,409-Speed 2616.54 samples/sec Loss 6.3956 LearningRate 0.0239 Epoch: 10 Global Step: 424160 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:23:23,321-Speed 2617.77 samples/sec Loss 6.4432 LearningRate 0.0239 Epoch: 10 Global Step: 424170 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:23:27,217-Speed 2629.57 samples/sec Loss 6.4105 LearningRate 0.0239 Epoch: 10 Global Step: 424180 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:23:31,117-Speed 2626.25 samples/sec Loss 6.4650 LearningRate 0.0239 Epoch: 10 Global Step: 424190 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:23:35,028-Speed 2618.99 samples/sec Loss 6.5436 LearningRate 0.0239 Epoch: 10 Global Step: 424200 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:23:38,914-Speed 2635.61 samples/sec Loss 6.5913 LearningRate 0.0239 Epoch: 10 Global Step: 424210 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:23:42,813-Speed 2626.56 samples/sec Loss 6.4330 LearningRate 0.0239 Epoch: 10 Global Step: 424220 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:23:46,723-Speed 2619.38 samples/sec Loss 6.5330 LearningRate 0.0239 Epoch: 10 Global Step: 424230 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:23:50,626-Speed 2624.81 samples/sec Loss 6.4814 LearningRate 0.0239 Epoch: 10 Global Step: 424240 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:23:54,522-Speed 2628.84 samples/sec Loss 6.4576 LearningRate 0.0239 Epoch: 10 Global Step: 424250 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:23:58,425-Speed 2624.44 samples/sec Loss 6.5228 LearningRate 0.0239 Epoch: 10 Global Step: 424260 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:24:02,321-Speed 2628.79 samples/sec Loss 6.5947 LearningRate 0.0239 Epoch: 10 Global Step: 424270 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:24:06,218-Speed 2628.34 samples/sec Loss 6.4045 LearningRate 0.0239 Epoch: 10 Global Step: 424280 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:24:10,120-Speed 2624.57 samples/sec Loss 6.4612 LearningRate 0.0239 Epoch: 10 Global Step: 424290 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:24:14,080-Speed 2586.83 samples/sec Loss 6.4139 LearningRate 0.0239 Epoch: 10 Global Step: 424300 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:24:18,001-Speed 2612.27 samples/sec Loss 6.5611 LearningRate 0.0239 Epoch: 10 Global Step: 424310 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:24:21,911-Speed 2619.65 samples/sec Loss 6.5688 LearningRate 0.0239 Epoch: 10 Global Step: 424320 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:24:25,809-Speed 2627.55 samples/sec Loss 6.5325 LearningRate 0.0239 Epoch: 10 Global Step: 424330 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:24:29,703-Speed 2630.40 samples/sec Loss 6.5684 LearningRate 0.0239 Epoch: 10 Global Step: 424340 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:24:33,601-Speed 2627.48 samples/sec Loss 6.4935 LearningRate 0.0239 Epoch: 10 Global Step: 424350 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:24:37,494-Speed 2630.54 samples/sec Loss 6.4864 LearningRate 0.0239 Epoch: 10 Global Step: 424360 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:24:41,390-Speed 2629.36 samples/sec Loss 6.4648 LearningRate 0.0239 Epoch: 10 Global Step: 424370 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:24:45,301-Speed 2618.98 samples/sec Loss 6.5342 LearningRate 0.0239 Epoch: 10 Global Step: 424380 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:24:49,208-Speed 2621.83 samples/sec Loss 6.4475 LearningRate 0.0239 Epoch: 10 Global Step: 424390 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:24:53,100-Speed 2631.27 samples/sec Loss 6.4037 LearningRate 0.0239 Epoch: 10 Global Step: 424400 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:24:56,977-Speed 2642.36 samples/sec Loss 6.5311 LearningRate 0.0239 Epoch: 10 Global Step: 424410 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:25:00,874-Speed 2628.00 samples/sec Loss 6.4894 LearningRate 0.0239 Epoch: 10 Global Step: 424420 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:25:04,770-Speed 2629.30 samples/sec Loss 6.4591 LearningRate 0.0239 Epoch: 10 Global Step: 424430 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:25:08,710-Speed 2599.00 samples/sec Loss 6.5927 LearningRate 0.0239 Epoch: 10 Global Step: 424440 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:25:12,615-Speed 2623.93 samples/sec Loss 6.5617 LearningRate 0.0238 Epoch: 10 Global Step: 424450 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:25:16,506-Speed 2631.77 samples/sec Loss 6.5793 LearningRate 0.0238 Epoch: 10 Global Step: 424460 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:25:20,407-Speed 2626.02 samples/sec Loss 6.5027 LearningRate 0.0238 Epoch: 10 Global Step: 424470 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:25:24,319-Speed 2618.35 samples/sec Loss 6.4905 LearningRate 0.0238 Epoch: 10 Global Step: 424480 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:25:28,218-Speed 2626.96 samples/sec Loss 6.4222 LearningRate 0.0238 Epoch: 10 Global Step: 424490 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:25:32,111-Speed 2630.79 samples/sec Loss 6.5961 LearningRate 0.0238 Epoch: 10 Global Step: 424500 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:25:36,007-Speed 2628.63 samples/sec Loss 6.4740 LearningRate 0.0238 Epoch: 10 Global Step: 424510 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:25:39,912-Speed 2623.26 samples/sec Loss 6.6429 LearningRate 0.0238 Epoch: 10 Global Step: 424520 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:25:43,846-Speed 2603.30 samples/sec Loss 6.6364 LearningRate 0.0238 Epoch: 10 Global Step: 424530 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:25:47,805-Speed 2587.65 samples/sec Loss 6.6741 LearningRate 0.0238 Epoch: 10 Global Step: 424540 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:25:51,703-Speed 2627.92 samples/sec Loss 6.5839 LearningRate 0.0238 Epoch: 10 Global Step: 424550 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:25:55,603-Speed 2626.41 samples/sec Loss 6.5551 LearningRate 0.0238 Epoch: 10 Global Step: 424560 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:25:59,508-Speed 2622.95 samples/sec Loss 6.5552 LearningRate 0.0238 Epoch: 10 Global Step: 424570 Fp16 Grad Scale: 131072 Required: 46 hours
Training: 2022-04-14 19:26:03,433-Speed 2609.29 samples/sec Loss 6.4583 LearningRate 0.0238 Epoch: 10 Global Step: 424580 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:26:07,327-Speed 2629.78 samples/sec Loss 6.4876 LearningRate 0.0238 Epoch: 10 Global Step: 424590 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:26:11,228-Speed 2626.09 samples/sec Loss 6.5622 LearningRate 0.0238 Epoch: 10 Global Step: 424600 Fp16 Grad Scale: 65536 Required: 46 hours
Training: 2022-04-14 19:26:15,122-Speed 2630.16 samples/sec Loss 6.4161 LearningRate 0.0238 Epoch: 10 Global Step: 424610 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:26:19,020-Speed 2627.70 samples/sec Loss 6.4960 LearningRate 0.0238 Epoch: 10 Global Step: 424620 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:26:22,928-Speed 2620.80 samples/sec Loss 6.5093 LearningRate 0.0238 Epoch: 10 Global Step: 424630 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:26:26,841-Speed 2618.07 samples/sec Loss 6.6292 LearningRate 0.0238 Epoch: 10 Global Step: 424640 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:26:30,753-Speed 2617.93 samples/sec Loss 6.6590 LearningRate 0.0238 Epoch: 10 Global Step: 424650 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:26:34,662-Speed 2619.77 samples/sec Loss 6.4321 LearningRate 0.0238 Epoch: 10 Global Step: 424660 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:26:38,578-Speed 2615.53 samples/sec Loss 6.4358 LearningRate 0.0238 Epoch: 10 Global Step: 424670 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:26:42,478-Speed 2627.07 samples/sec Loss 6.4877 LearningRate 0.0238 Epoch: 10 Global Step: 424680 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:26:46,372-Speed 2630.26 samples/sec Loss 6.5617 LearningRate 0.0238 Epoch: 10 Global Step: 424690 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:26:50,295-Speed 2610.93 samples/sec Loss 6.4418 LearningRate 0.0238 Epoch: 10 Global Step: 424700 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:26:54,201-Speed 2622.27 samples/sec Loss 6.4001 LearningRate 0.0238 Epoch: 10 Global Step: 424710 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:26:58,103-Speed 2625.49 samples/sec Loss 6.5146 LearningRate 0.0238 Epoch: 10 Global Step: 424720 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:27:01,997-Speed 2630.34 samples/sec Loss 6.5337 LearningRate 0.0238 Epoch: 10 Global Step: 424730 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:27:05,894-Speed 2628.67 samples/sec Loss 6.4454 LearningRate 0.0238 Epoch: 10 Global Step: 424740 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:27:09,792-Speed 2627.55 samples/sec Loss 6.3462 LearningRate 0.0238 Epoch: 10 Global Step: 424750 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:27:13,693-Speed 2625.65 samples/sec Loss 6.3753 LearningRate 0.0238 Epoch: 10 Global Step: 424760 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:27:17,592-Speed 2627.52 samples/sec Loss 6.4728 LearningRate 0.0238 Epoch: 10 Global Step: 424770 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:27:21,472-Speed 2639.28 samples/sec Loss 6.5224 LearningRate 0.0238 Epoch: 10 Global Step: 424780 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:27:25,371-Speed 2627.18 samples/sec Loss 6.5257 LearningRate 0.0238 Epoch: 10 Global Step: 424790 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:27:29,272-Speed 2625.61 samples/sec Loss 6.5284 LearningRate 0.0238 Epoch: 10 Global Step: 424800 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:27:33,178-Speed 2622.21 samples/sec Loss 6.4337 LearningRate 0.0238 Epoch: 10 Global Step: 424810 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:27:37,075-Speed 2628.76 samples/sec Loss 6.5014 LearningRate 0.0238 Epoch: 10 Global Step: 424820 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:27:41,051-Speed 2575.91 samples/sec Loss 6.4999 LearningRate 0.0238 Epoch: 10 Global Step: 424830 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:27:44,951-Speed 2625.94 samples/sec Loss 6.4428 LearningRate 0.0238 Epoch: 10 Global Step: 424840 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:27:48,867-Speed 2616.82 samples/sec Loss 6.3948 LearningRate 0.0238 Epoch: 10 Global Step: 424850 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:27:52,774-Speed 2621.36 samples/sec Loss 6.5568 LearningRate 0.0238 Epoch: 10 Global Step: 424860 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:27:56,668-Speed 2630.82 samples/sec Loss 6.5196 LearningRate 0.0238 Epoch: 10 Global Step: 424870 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:28:00,564-Speed 2628.89 samples/sec Loss 6.5545 LearningRate 0.0238 Epoch: 10 Global Step: 424880 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 19:28:04,844-Speed 2393.13 samples/sec Loss 6.4964 LearningRate 0.0238 Epoch: 10 Global Step: 424890 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:28:08,738-Speed 2630.90 samples/sec Loss 6.5306 LearningRate 0.0238 Epoch: 10 Global Step: 424900 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:28:12,635-Speed 2628.51 samples/sec Loss 6.3977 LearningRate 0.0238 Epoch: 10 Global Step: 424910 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:28:16,531-Speed 2628.78 samples/sec Loss 6.3545 LearningRate 0.0238 Epoch: 10 Global Step: 424920 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:28:20,432-Speed 2625.23 samples/sec Loss 6.5201 LearningRate 0.0238 Epoch: 10 Global Step: 424930 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:28:24,329-Speed 2629.12 samples/sec Loss 6.6056 LearningRate 0.0238 Epoch: 10 Global Step: 424940 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:28:28,228-Speed 2627.35 samples/sec Loss 6.4731 LearningRate 0.0238 Epoch: 10 Global Step: 424950 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:28:32,121-Speed 2630.73 samples/sec Loss 6.4911 LearningRate 0.0238 Epoch: 10 Global Step: 424960 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:28:36,040-Speed 2613.21 samples/sec Loss 6.4410 LearningRate 0.0238 Epoch: 10 Global Step: 424970 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:28:39,940-Speed 2626.10 samples/sec Loss 6.4704 LearningRate 0.0238 Epoch: 10 Global Step: 424980 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:28:43,839-Speed 2627.05 samples/sec Loss 6.6485 LearningRate 0.0238 Epoch: 10 Global Step: 424990 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 19:28:47,719-Speed 2639.41 samples/sec Loss 6.5782 LearningRate 0.0238 Epoch: 10 Global Step: 425000 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:28:51,615-Speed 2629.48 samples/sec Loss 6.6227 LearningRate 0.0238 Epoch: 10 Global Step: 425010 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:28:55,520-Speed 2622.80 samples/sec Loss 6.4582 LearningRate 0.0238 Epoch: 10 Global Step: 425020 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:28:59,418-Speed 2627.93 samples/sec Loss 6.4137 LearningRate 0.0238 Epoch: 10 Global Step: 425030 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:29:03,314-Speed 2628.71 samples/sec Loss 6.4339 LearningRate 0.0238 Epoch: 10 Global Step: 425040 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:29:07,208-Speed 2630.20 samples/sec Loss 6.4848 LearningRate 0.0238 Epoch: 10 Global Step: 425050 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:29:11,104-Speed 2629.31 samples/sec Loss 6.5166 LearningRate 0.0238 Epoch: 10 Global Step: 425060 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:29:15,002-Speed 2627.21 samples/sec Loss 6.4302 LearningRate 0.0238 Epoch: 10 Global Step: 425070 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:29:18,908-Speed 2622.24 samples/sec Loss 6.6050 LearningRate 0.0238 Epoch: 10 Global Step: 425080 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:29:22,942-Speed 2538.83 samples/sec Loss 6.5106 LearningRate 0.0238 Epoch: 10 Global Step: 425090 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:29:26,825-Speed 2638.24 samples/sec Loss 6.4628 LearningRate 0.0238 Epoch: 10 Global Step: 425100 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:29:30,703-Speed 2641.26 samples/sec Loss 6.4875 LearningRate 0.0238 Epoch: 10 Global Step: 425110 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:29:34,605-Speed 2624.56 samples/sec Loss 6.5844 LearningRate 0.0238 Epoch: 10 Global Step: 425120 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:29:38,503-Speed 2627.67 samples/sec Loss 6.6492 LearningRate 0.0238 Epoch: 10 Global Step: 425130 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:29:42,403-Speed 2626.05 samples/sec Loss 6.5229 LearningRate 0.0238 Epoch: 10 Global Step: 425140 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:29:46,303-Speed 2626.60 samples/sec Loss 6.5557 LearningRate 0.0238 Epoch: 10 Global Step: 425150 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:29:50,194-Speed 2632.02 samples/sec Loss 6.5461 LearningRate 0.0238 Epoch: 10 Global Step: 425160 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:29:54,090-Speed 2629.30 samples/sec Loss 6.5579 LearningRate 0.0238 Epoch: 10 Global Step: 425170 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:29:57,990-Speed 2625.90 samples/sec Loss 6.5806 LearningRate 0.0238 Epoch: 10 Global Step: 425180 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:30:01,887-Speed 2628.23 samples/sec Loss 6.5657 LearningRate 0.0238 Epoch: 10 Global Step: 425190 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:30:05,785-Speed 2627.86 samples/sec Loss 6.5127 LearningRate 0.0238 Epoch: 10 Global Step: 425200 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:30:09,745-Speed 2586.35 samples/sec Loss 6.6021 LearningRate 0.0238 Epoch: 10 Global Step: 425210 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:30:13,654-Speed 2620.69 samples/sec Loss 6.4957 LearningRate 0.0238 Epoch: 10 Global Step: 425220 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:30:17,560-Speed 2622.09 samples/sec Loss 6.4126 LearningRate 0.0238 Epoch: 10 Global Step: 425230 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:30:21,455-Speed 2629.72 samples/sec Loss 6.4843 LearningRate 0.0238 Epoch: 10 Global Step: 425240 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:30:25,347-Speed 2631.57 samples/sec Loss 6.4038 LearningRate 0.0238 Epoch: 10 Global Step: 425250 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:30:29,347-Speed 2561.61 samples/sec Loss 6.6430 LearningRate 0.0238 Epoch: 10 Global Step: 425260 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:30:33,243-Speed 2628.54 samples/sec Loss 6.5634 LearningRate 0.0238 Epoch: 10 Global Step: 425270 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:30:37,145-Speed 2625.02 samples/sec Loss 6.4773 LearningRate 0.0238 Epoch: 10 Global Step: 425280 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:30:41,056-Speed 2618.94 samples/sec Loss 6.5235 LearningRate 0.0238 Epoch: 10 Global Step: 425290 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:30:44,939-Speed 2637.42 samples/sec Loss 6.5464 LearningRate 0.0237 Epoch: 10 Global Step: 425300 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:30:48,849-Speed 2619.90 samples/sec Loss 6.4472 LearningRate 0.0237 Epoch: 10 Global Step: 425310 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:30:52,754-Speed 2622.65 samples/sec Loss 6.4730 LearningRate 0.0237 Epoch: 10 Global Step: 425320 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:30:56,652-Speed 2628.34 samples/sec Loss 6.4767 LearningRate 0.0237 Epoch: 10 Global Step: 425330 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:31:00,550-Speed 2627.52 samples/sec Loss 6.6191 LearningRate 0.0237 Epoch: 10 Global Step: 425340 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:31:04,454-Speed 2623.12 samples/sec Loss 6.3754 LearningRate 0.0237 Epoch: 10 Global Step: 425350 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:31:08,362-Speed 2621.16 samples/sec Loss 6.5554 LearningRate 0.0237 Epoch: 10 Global Step: 425360 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:31:12,258-Speed 2629.28 samples/sec Loss 6.5660 LearningRate 0.0237 Epoch: 10 Global Step: 425370 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:31:16,157-Speed 2627.06 samples/sec Loss 6.5320 LearningRate 0.0237 Epoch: 10 Global Step: 425380 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:31:20,054-Speed 2628.43 samples/sec Loss 6.5746 LearningRate 0.0237 Epoch: 10 Global Step: 425390 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:31:23,952-Speed 2627.41 samples/sec Loss 6.4324 LearningRate 0.0237 Epoch: 10 Global Step: 425400 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:31:27,849-Speed 2628.78 samples/sec Loss 6.5334 LearningRate 0.0237 Epoch: 10 Global Step: 425410 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:31:31,745-Speed 2628.61 samples/sec Loss 6.4985 LearningRate 0.0237 Epoch: 10 Global Step: 425420 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:31:35,645-Speed 2626.13 samples/sec Loss 6.5417 LearningRate 0.0237 Epoch: 10 Global Step: 425430 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:31:39,550-Speed 2622.65 samples/sec Loss 6.5657 LearningRate 0.0237 Epoch: 10 Global Step: 425440 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:31:43,454-Speed 2624.02 samples/sec Loss 6.4641 LearningRate 0.0237 Epoch: 10 Global Step: 425450 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:31:47,356-Speed 2624.73 samples/sec Loss 6.5045 LearningRate 0.0237 Epoch: 10 Global Step: 425460 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:31:51,248-Speed 2632.08 samples/sec Loss 6.4342 LearningRate 0.0237 Epoch: 10 Global Step: 425470 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:31:55,151-Speed 2624.30 samples/sec Loss 6.4180 LearningRate 0.0237 Epoch: 10 Global Step: 425480 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:31:59,053-Speed 2625.15 samples/sec Loss 6.4611 LearningRate 0.0237 Epoch: 10 Global Step: 425490 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:32:02,948-Speed 2629.55 samples/sec Loss 6.4331 LearningRate 0.0237 Epoch: 10 Global Step: 425500 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:32:06,860-Speed 2618.21 samples/sec Loss 6.5127 LearningRate 0.0237 Epoch: 10 Global Step: 425510 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:32:10,755-Speed 2629.71 samples/sec Loss 6.4486 LearningRate 0.0237 Epoch: 10 Global Step: 425520 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:32:14,652-Speed 2628.54 samples/sec Loss 6.5860 LearningRate 0.0237 Epoch: 10 Global Step: 425530 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:32:18,544-Speed 2631.76 samples/sec Loss 6.4847 LearningRate 0.0237 Epoch: 10 Global Step: 425540 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:32:22,445-Speed 2625.65 samples/sec Loss 6.4973 LearningRate 0.0237 Epoch: 10 Global Step: 425550 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:32:26,349-Speed 2623.79 samples/sec Loss 6.4782 LearningRate 0.0237 Epoch: 10 Global Step: 425560 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:32:30,242-Speed 2630.92 samples/sec Loss 6.5834 LearningRate 0.0237 Epoch: 10 Global Step: 425570 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:32:34,177-Speed 2602.78 samples/sec Loss 6.3511 LearningRate 0.0237 Epoch: 10 Global Step: 425580 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:32:38,095-Speed 2614.71 samples/sec Loss 6.4962 LearningRate 0.0237 Epoch: 10 Global Step: 425590 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:32:41,991-Speed 2629.05 samples/sec Loss 6.4561 LearningRate 0.0237 Epoch: 10 Global Step: 425600 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:32:45,900-Speed 2620.30 samples/sec Loss 6.5099 LearningRate 0.0237 Epoch: 10 Global Step: 425610 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:32:49,797-Speed 2628.27 samples/sec Loss 6.3956 LearningRate 0.0237 Epoch: 10 Global Step: 425620 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:32:53,707-Speed 2619.45 samples/sec Loss 6.4649 LearningRate 0.0237 Epoch: 10 Global Step: 425630 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:32:57,603-Speed 2629.29 samples/sec Loss 6.4519 LearningRate 0.0237 Epoch: 10 Global Step: 425640 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:33:01,500-Speed 2628.56 samples/sec Loss 6.5696 LearningRate 0.0237 Epoch: 10 Global Step: 425650 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:33:05,418-Speed 2613.88 samples/sec Loss 6.5485 LearningRate 0.0237 Epoch: 10 Global Step: 425660 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:33:09,333-Speed 2616.55 samples/sec Loss 6.4720 LearningRate 0.0237 Epoch: 10 Global Step: 425670 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 19:33:13,238-Speed 2623.05 samples/sec Loss 6.6071 LearningRate 0.0237 Epoch: 10 Global Step: 425680 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 19:33:17,125-Speed 2634.50 samples/sec Loss 6.2866 LearningRate 0.0237 Epoch: 10 Global Step: 425690 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:33:21,035-Speed 2619.79 samples/sec Loss 6.4614 LearningRate 0.0237 Epoch: 10 Global Step: 425700 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:33:24,971-Speed 2602.57 samples/sec Loss 6.4394 LearningRate 0.0237 Epoch: 10 Global Step: 425710 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:33:28,871-Speed 2626.64 samples/sec Loss 6.5000 LearningRate 0.0237 Epoch: 10 Global Step: 425720 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:33:32,769-Speed 2627.82 samples/sec Loss 6.4529 LearningRate 0.0237 Epoch: 10 Global Step: 425730 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:33:36,663-Speed 2629.67 samples/sec Loss 6.3852 LearningRate 0.0237 Epoch: 10 Global Step: 425740 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:33:40,557-Speed 2630.35 samples/sec Loss 6.4819 LearningRate 0.0237 Epoch: 10 Global Step: 425750 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:33:44,454-Speed 2628.64 samples/sec Loss 6.4518 LearningRate 0.0237 Epoch: 10 Global Step: 425760 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:33:48,349-Speed 2629.99 samples/sec Loss 6.4292 LearningRate 0.0237 Epoch: 10 Global Step: 425770 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:33:52,231-Speed 2639.01 samples/sec Loss 6.5328 LearningRate 0.0237 Epoch: 10 Global Step: 425780 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:33:56,130-Speed 2626.79 samples/sec Loss 6.4634 LearningRate 0.0237 Epoch: 10 Global Step: 425790 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:34:00,065-Speed 2602.71 samples/sec Loss 6.4921 LearningRate 0.0237 Epoch: 10 Global Step: 425800 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:34:03,969-Speed 2623.72 samples/sec Loss 6.4446 LearningRate 0.0237 Epoch: 10 Global Step: 425810 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:34:07,867-Speed 2627.58 samples/sec Loss 6.5561 LearningRate 0.0237 Epoch: 10 Global Step: 425820 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:34:11,763-Speed 2629.27 samples/sec Loss 6.5298 LearningRate 0.0237 Epoch: 10 Global Step: 425830 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:34:15,659-Speed 2628.96 samples/sec Loss 6.5242 LearningRate 0.0237 Epoch: 10 Global Step: 425840 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:34:19,557-Speed 2627.74 samples/sec Loss 6.5750 LearningRate 0.0237 Epoch: 10 Global Step: 425850 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:34:23,453-Speed 2629.20 samples/sec Loss 6.4260 LearningRate 0.0237 Epoch: 10 Global Step: 425860 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:34:27,359-Speed 2622.63 samples/sec Loss 6.4632 LearningRate 0.0237 Epoch: 10 Global Step: 425870 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:34:31,257-Speed 2627.07 samples/sec Loss 6.5005 LearningRate 0.0237 Epoch: 10 Global Step: 425880 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:34:35,153-Speed 2628.83 samples/sec Loss 6.4762 LearningRate 0.0237 Epoch: 10 Global Step: 425890 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:34:39,048-Speed 2629.71 samples/sec Loss 6.5191 LearningRate 0.0237 Epoch: 10 Global Step: 425900 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:34:42,924-Speed 2642.53 samples/sec Loss 6.5549 LearningRate 0.0237 Epoch: 10 Global Step: 425910 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:34:46,829-Speed 2623.37 samples/sec Loss 6.6267 LearningRate 0.0237 Epoch: 10 Global Step: 425920 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:34:50,741-Speed 2617.86 samples/sec Loss 6.4213 LearningRate 0.0237 Epoch: 10 Global Step: 425930 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:34:54,648-Speed 2621.84 samples/sec Loss 6.5939 LearningRate 0.0237 Epoch: 10 Global Step: 425940 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:34:58,547-Speed 2626.77 samples/sec Loss 6.4248 LearningRate 0.0237 Epoch: 10 Global Step: 425950 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:35:02,446-Speed 2626.86 samples/sec Loss 6.6119 LearningRate 0.0237 Epoch: 10 Global Step: 425960 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:35:06,345-Speed 2626.70 samples/sec Loss 6.5478 LearningRate 0.0237 Epoch: 10 Global Step: 425970 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:35:10,242-Speed 2628.05 samples/sec Loss 6.4127 LearningRate 0.0237 Epoch: 10 Global Step: 425980 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:35:14,140-Speed 2628.45 samples/sec Loss 6.6107 LearningRate 0.0237 Epoch: 10 Global Step: 425990 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:35:18,063-Speed 2610.51 samples/sec Loss 6.5166 LearningRate 0.0237 Epoch: 10 Global Step: 426000 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:35:21,961-Speed 2634.48 samples/sec Loss 6.5037 LearningRate 0.0237 Epoch: 10 Global Step: 426010 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:35:25,852-Speed 2631.93 samples/sec Loss 6.5301 LearningRate 0.0237 Epoch: 10 Global Step: 426020 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:35:29,757-Speed 2623.62 samples/sec Loss 6.3750 LearningRate 0.0237 Epoch: 10 Global Step: 426030 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:35:33,635-Speed 2641.26 samples/sec Loss 6.5386 LearningRate 0.0237 Epoch: 10 Global Step: 426040 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:35:37,533-Speed 2627.24 samples/sec Loss 6.5038 LearningRate 0.0237 Epoch: 10 Global Step: 426050 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:35:41,430-Speed 2628.06 samples/sec Loss 6.2744 LearningRate 0.0237 Epoch: 10 Global Step: 426060 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:35:45,330-Speed 2626.19 samples/sec Loss 6.5298 LearningRate 0.0237 Epoch: 10 Global Step: 426070 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:35:49,224-Speed 2630.34 samples/sec Loss 6.6121 LearningRate 0.0237 Epoch: 10 Global Step: 426080 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:35:53,134-Speed 2619.43 samples/sec Loss 6.4488 LearningRate 0.0237 Epoch: 10 Global Step: 426090 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:35:57,019-Speed 2636.64 samples/sec Loss 6.4007 LearningRate 0.0237 Epoch: 10 Global Step: 426100 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:36:00,977-Speed 2587.70 samples/sec Loss 6.3732 LearningRate 0.0237 Epoch: 10 Global Step: 426110 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:36:04,869-Speed 2631.60 samples/sec Loss 6.5330 LearningRate 0.0237 Epoch: 10 Global Step: 426120 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:36:08,766-Speed 2628.26 samples/sec Loss 6.4101 LearningRate 0.0237 Epoch: 10 Global Step: 426130 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:36:12,660-Speed 2630.31 samples/sec Loss 6.4748 LearningRate 0.0237 Epoch: 10 Global Step: 426140 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:36:16,575-Speed 2616.35 samples/sec Loss 6.4494 LearningRate 0.0236 Epoch: 10 Global Step: 426150 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:36:20,471-Speed 2628.77 samples/sec Loss 6.4750 LearningRate 0.0236 Epoch: 10 Global Step: 426160 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:36:24,373-Speed 2624.73 samples/sec Loss 6.4283 LearningRate 0.0236 Epoch: 10 Global Step: 426170 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:36:28,270-Speed 2628.70 samples/sec Loss 6.6048 LearningRate 0.0236 Epoch: 10 Global Step: 426180 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:36:32,170-Speed 2626.84 samples/sec Loss 6.5621 LearningRate 0.0236 Epoch: 10 Global Step: 426190 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:36:36,070-Speed 2625.50 samples/sec Loss 6.4056 LearningRate 0.0236 Epoch: 10 Global Step: 426200 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:36:39,976-Speed 2622.62 samples/sec Loss 6.4709 LearningRate 0.0236 Epoch: 10 Global Step: 426210 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:36:43,868-Speed 2632.20 samples/sec Loss 6.5538 LearningRate 0.0236 Epoch: 10 Global Step: 426220 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:36:47,761-Speed 2630.76 samples/sec Loss 6.4764 LearningRate 0.0236 Epoch: 10 Global Step: 426230 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:36:51,634-Speed 2644.67 samples/sec Loss 6.5635 LearningRate 0.0236 Epoch: 10 Global Step: 426240 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:36:55,531-Speed 2628.17 samples/sec Loss 6.4960 LearningRate 0.0236 Epoch: 10 Global Step: 426250 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:36:59,441-Speed 2619.88 samples/sec Loss 6.5580 LearningRate 0.0236 Epoch: 10 Global Step: 426260 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:37:03,341-Speed 2626.48 samples/sec Loss 6.4460 LearningRate 0.0236 Epoch: 10 Global Step: 426270 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:37:07,242-Speed 2625.37 samples/sec Loss 6.3942 LearningRate 0.0236 Epoch: 10 Global Step: 426280 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:37:11,139-Speed 2627.99 samples/sec Loss 6.4980 LearningRate 0.0236 Epoch: 10 Global Step: 426290 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:37:15,045-Speed 2622.34 samples/sec Loss 6.4759 LearningRate 0.0236 Epoch: 10 Global Step: 426300 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:37:18,947-Speed 2625.26 samples/sec Loss 6.4348 LearningRate 0.0236 Epoch: 10 Global Step: 426310 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:37:22,850-Speed 2623.75 samples/sec Loss 6.4190 LearningRate 0.0236 Epoch: 10 Global Step: 426320 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:37:26,746-Speed 2629.19 samples/sec Loss 6.4379 LearningRate 0.0236 Epoch: 10 Global Step: 426330 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:37:30,637-Speed 2632.40 samples/sec Loss 6.6078 LearningRate 0.0236 Epoch: 10 Global Step: 426340 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:37:34,528-Speed 2632.65 samples/sec Loss 6.4299 LearningRate 0.0236 Epoch: 10 Global Step: 426350 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:37:38,428-Speed 2626.14 samples/sec Loss 6.3701 LearningRate 0.0236 Epoch: 10 Global Step: 426360 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:37:42,342-Speed 2617.06 samples/sec Loss 6.3204 LearningRate 0.0236 Epoch: 10 Global Step: 426370 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:37:46,432-Speed 2503.67 samples/sec Loss 6.3861 LearningRate 0.0236 Epoch: 10 Global Step: 426380 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:37:50,529-Speed 2500.43 samples/sec Loss 6.4470 LearningRate 0.0236 Epoch: 10 Global Step: 426390 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:37:54,498-Speed 2580.03 samples/sec Loss 6.5002 LearningRate 0.0236 Epoch: 10 Global Step: 426400 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:37:58,421-Speed 2611.65 samples/sec Loss 6.5223 LearningRate 0.0236 Epoch: 10 Global Step: 426410 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:38:02,325-Speed 2623.70 samples/sec Loss 6.4003 LearningRate 0.0236 Epoch: 10 Global Step: 426420 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:38:06,217-Speed 2631.66 samples/sec Loss 6.5831 LearningRate 0.0236 Epoch: 10 Global Step: 426430 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:38:10,114-Speed 2627.95 samples/sec Loss 6.5529 LearningRate 0.0236 Epoch: 10 Global Step: 426440 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:38:14,012-Speed 2628.34 samples/sec Loss 6.5767 LearningRate 0.0236 Epoch: 10 Global Step: 426450 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:38:17,910-Speed 2627.28 samples/sec Loss 6.5419 LearningRate 0.0236 Epoch: 10 Global Step: 426460 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:38:21,807-Speed 2628.18 samples/sec Loss 6.5405 LearningRate 0.0236 Epoch: 10 Global Step: 426470 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:38:25,705-Speed 2627.82 samples/sec Loss 6.4997 LearningRate 0.0236 Epoch: 10 Global Step: 426480 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:38:29,602-Speed 2628.59 samples/sec Loss 6.4550 LearningRate 0.0236 Epoch: 10 Global Step: 426490 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:38:33,514-Speed 2618.23 samples/sec Loss 6.5861 LearningRate 0.0236 Epoch: 10 Global Step: 426500 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:38:37,412-Speed 2627.82 samples/sec Loss 6.5518 LearningRate 0.0236 Epoch: 10 Global Step: 426510 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:38:41,309-Speed 2628.25 samples/sec Loss 6.4245 LearningRate 0.0236 Epoch: 10 Global Step: 426520 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:38:45,237-Speed 2607.18 samples/sec Loss 6.4597 LearningRate 0.0236 Epoch: 10 Global Step: 426530 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:38:49,147-Speed 2619.78 samples/sec Loss 6.4587 LearningRate 0.0236 Epoch: 10 Global Step: 426540 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:38:53,081-Speed 2603.19 samples/sec Loss 6.4665 LearningRate 0.0236 Epoch: 10 Global Step: 426550 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:38:56,999-Speed 2620.34 samples/sec Loss 6.5418 LearningRate 0.0236 Epoch: 10 Global Step: 426560 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:39:00,878-Speed 2640.12 samples/sec Loss 6.5273 LearningRate 0.0236 Epoch: 10 Global Step: 426570 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:39:04,782-Speed 2624.20 samples/sec Loss 6.4078 LearningRate 0.0236 Epoch: 10 Global Step: 426580 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:39:08,689-Speed 2621.24 samples/sec Loss 6.4739 LearningRate 0.0236 Epoch: 10 Global Step: 426590 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:39:12,601-Speed 2618.08 samples/sec Loss 6.4658 LearningRate 0.0236 Epoch: 10 Global Step: 426600 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:39:16,509-Speed 2620.74 samples/sec Loss 6.4588 LearningRate 0.0236 Epoch: 10 Global Step: 426610 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:39:20,409-Speed 2626.50 samples/sec Loss 6.6470 LearningRate 0.0236 Epoch: 10 Global Step: 426620 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:39:24,309-Speed 2626.46 samples/sec Loss 6.4205 LearningRate 0.0236 Epoch: 10 Global Step: 426630 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:39:28,219-Speed 2619.37 samples/sec Loss 6.4936 LearningRate 0.0236 Epoch: 10 Global Step: 426640 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:39:32,129-Speed 2619.80 samples/sec Loss 6.4951 LearningRate 0.0236 Epoch: 10 Global Step: 426650 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:39:36,035-Speed 2622.46 samples/sec Loss 6.5440 LearningRate 0.0236 Epoch: 10 Global Step: 426660 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:39:39,942-Speed 2621.20 samples/sec Loss 6.4300 LearningRate 0.0236 Epoch: 10 Global Step: 426670 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 19:39:43,824-Speed 2638.31 samples/sec Loss 6.4915 LearningRate 0.0236 Epoch: 10 Global Step: 426680 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:39:47,728-Speed 2623.85 samples/sec Loss 6.4257 LearningRate 0.0236 Epoch: 10 Global Step: 426690 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:39:51,621-Speed 2630.57 samples/sec Loss 6.5427 LearningRate 0.0236 Epoch: 10 Global Step: 426700 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:39:55,552-Speed 2606.07 samples/sec Loss 6.5188 LearningRate 0.0236 Epoch: 10 Global Step: 426710 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:39:59,458-Speed 2622.41 samples/sec Loss 6.4128 LearningRate 0.0236 Epoch: 10 Global Step: 426720 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:40:03,337-Speed 2640.58 samples/sec Loss 6.5868 LearningRate 0.0236 Epoch: 10 Global Step: 426730 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:40:07,231-Speed 2629.80 samples/sec Loss 6.5015 LearningRate 0.0236 Epoch: 10 Global Step: 426740 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:40:11,131-Speed 2626.63 samples/sec Loss 6.5097 LearningRate 0.0236 Epoch: 10 Global Step: 426750 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:40:15,021-Speed 2632.92 samples/sec Loss 6.4767 LearningRate 0.0236 Epoch: 10 Global Step: 426760 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:40:18,925-Speed 2623.93 samples/sec Loss 6.4557 LearningRate 0.0236 Epoch: 10 Global Step: 426770 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:40:22,843-Speed 2614.26 samples/sec Loss 6.4154 LearningRate 0.0236 Epoch: 10 Global Step: 426780 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:40:26,818-Speed 2576.32 samples/sec Loss 6.6025 LearningRate 0.0236 Epoch: 10 Global Step: 426790 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:40:30,735-Speed 2614.68 samples/sec Loss 6.4764 LearningRate 0.0236 Epoch: 10 Global Step: 426800 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:40:34,652-Speed 2615.88 samples/sec Loss 6.3571 LearningRate 0.0236 Epoch: 10 Global Step: 426810 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:40:38,647-Speed 2563.33 samples/sec Loss 6.4902 LearningRate 0.0236 Epoch: 10 Global Step: 426820 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:40:42,547-Speed 2626.05 samples/sec Loss 6.5727 LearningRate 0.0236 Epoch: 10 Global Step: 426830 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:40:46,474-Speed 2608.59 samples/sec Loss 6.6140 LearningRate 0.0236 Epoch: 10 Global Step: 426840 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:40:50,383-Speed 2620.22 samples/sec Loss 6.5595 LearningRate 0.0236 Epoch: 10 Global Step: 426850 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:40:54,274-Speed 2632.96 samples/sec Loss 6.4053 LearningRate 0.0236 Epoch: 10 Global Step: 426860 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:40:58,170-Speed 2628.82 samples/sec Loss 6.3399 LearningRate 0.0236 Epoch: 10 Global Step: 426870 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:41:02,072-Speed 2624.27 samples/sec Loss 6.4252 LearningRate 0.0236 Epoch: 10 Global Step: 426880 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:41:05,981-Speed 2620.76 samples/sec Loss 6.4084 LearningRate 0.0236 Epoch: 10 Global Step: 426890 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:41:09,883-Speed 2625.40 samples/sec Loss 6.4612 LearningRate 0.0236 Epoch: 10 Global Step: 426900 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:41:13,782-Speed 2627.03 samples/sec Loss 6.3686 LearningRate 0.0236 Epoch: 10 Global Step: 426910 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:41:17,682-Speed 2626.06 samples/sec Loss 6.4319 LearningRate 0.0236 Epoch: 10 Global Step: 426920 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:41:21,563-Speed 2639.51 samples/sec Loss 6.5738 LearningRate 0.0236 Epoch: 10 Global Step: 426930 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:41:25,459-Speed 2628.94 samples/sec Loss 6.4103 LearningRate 0.0236 Epoch: 10 Global Step: 426940 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:41:29,358-Speed 2627.10 samples/sec Loss 6.4060 LearningRate 0.0236 Epoch: 10 Global Step: 426950 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:41:33,264-Speed 2621.91 samples/sec Loss 6.4016 LearningRate 0.0236 Epoch: 10 Global Step: 426960 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:41:37,173-Speed 2620.45 samples/sec Loss 6.6868 LearningRate 0.0236 Epoch: 10 Global Step: 426970 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:41:41,070-Speed 2628.09 samples/sec Loss 6.4933 LearningRate 0.0236 Epoch: 10 Global Step: 426980 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:41:45,004-Speed 2604.00 samples/sec Loss 6.5237 LearningRate 0.0236 Epoch: 10 Global Step: 426990 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:41:48,899-Speed 2629.58 samples/sec Loss 6.5062 LearningRate 0.0235 Epoch: 10 Global Step: 427000 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:41:52,820-Speed 2612.26 samples/sec Loss 6.5006 LearningRate 0.0235 Epoch: 10 Global Step: 427010 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:41:56,730-Speed 2620.11 samples/sec Loss 6.5173 LearningRate 0.0235 Epoch: 10 Global Step: 427020 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:42:00,627-Speed 2628.11 samples/sec Loss 6.5327 LearningRate 0.0235 Epoch: 10 Global Step: 427030 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 19:42:04,512-Speed 2636.24 samples/sec Loss 6.5057 LearningRate 0.0235 Epoch: 10 Global Step: 427040 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:42:08,413-Speed 2625.24 samples/sec Loss 6.4207 LearningRate 0.0235 Epoch: 10 Global Step: 427050 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:42:12,312-Speed 2626.96 samples/sec Loss 6.4390 LearningRate 0.0235 Epoch: 10 Global Step: 427060 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:42:16,215-Speed 2624.56 samples/sec Loss 6.4683 LearningRate 0.0235 Epoch: 10 Global Step: 427070 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:42:20,130-Speed 2616.17 samples/sec Loss 6.5828 LearningRate 0.0235 Epoch: 10 Global Step: 427080 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:42:24,033-Speed 2623.99 samples/sec Loss 6.5310 LearningRate 0.0235 Epoch: 10 Global Step: 427090 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:42:27,926-Speed 2630.91 samples/sec Loss 6.4477 LearningRate 0.0235 Epoch: 10 Global Step: 427100 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:42:31,824-Speed 2628.40 samples/sec Loss 6.3842 LearningRate 0.0235 Epoch: 10 Global Step: 427110 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:42:35,703-Speed 2640.22 samples/sec Loss 6.5645 LearningRate 0.0235 Epoch: 10 Global Step: 427120 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:42:39,605-Speed 2624.90 samples/sec Loss 6.3740 LearningRate 0.0235 Epoch: 10 Global Step: 427130 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:42:43,502-Speed 2627.99 samples/sec Loss 6.6033 LearningRate 0.0235 Epoch: 10 Global Step: 427140 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:42:47,400-Speed 2627.38 samples/sec Loss 6.4880 LearningRate 0.0235 Epoch: 10 Global Step: 427150 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:42:51,305-Speed 2623.49 samples/sec Loss 6.5606 LearningRate 0.0235 Epoch: 10 Global Step: 427160 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:42:55,238-Speed 2603.80 samples/sec Loss 6.4357 LearningRate 0.0235 Epoch: 10 Global Step: 427170 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:42:59,171-Speed 2604.33 samples/sec Loss 6.4830 LearningRate 0.0235 Epoch: 10 Global Step: 427180 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:43:03,063-Speed 2631.47 samples/sec Loss 6.4768 LearningRate 0.0235 Epoch: 10 Global Step: 427190 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:43:06,960-Speed 2628.24 samples/sec Loss 6.4110 LearningRate 0.0235 Epoch: 10 Global Step: 427200 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:43:10,855-Speed 2629.70 samples/sec Loss 6.3382 LearningRate 0.0235 Epoch: 10 Global Step: 427210 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:43:14,752-Speed 2628.41 samples/sec Loss 6.4816 LearningRate 0.0235 Epoch: 10 Global Step: 427220 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:43:18,651-Speed 2627.05 samples/sec Loss 6.4241 LearningRate 0.0235 Epoch: 10 Global Step: 427230 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:43:22,552-Speed 2626.10 samples/sec Loss 6.4173 LearningRate 0.0235 Epoch: 10 Global Step: 427240 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:43:26,450-Speed 2627.39 samples/sec Loss 6.4151 LearningRate 0.0235 Epoch: 10 Global Step: 427250 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:43:30,353-Speed 2624.01 samples/sec Loss 6.4551 LearningRate 0.0235 Epoch: 10 Global Step: 427260 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:43:34,248-Speed 2629.80 samples/sec Loss 6.4284 LearningRate 0.0235 Epoch: 10 Global Step: 427270 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:43:38,158-Speed 2619.37 samples/sec Loss 6.4463 LearningRate 0.0235 Epoch: 10 Global Step: 427280 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:43:42,072-Speed 2617.36 samples/sec Loss 6.4508 LearningRate 0.0235 Epoch: 10 Global Step: 427290 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:43:45,962-Speed 2632.58 samples/sec Loss 6.4054 LearningRate 0.0235 Epoch: 10 Global Step: 427300 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:43:49,856-Speed 2630.27 samples/sec Loss 6.4075 LearningRate 0.0235 Epoch: 10 Global Step: 427310 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:43:53,756-Speed 2626.68 samples/sec Loss 6.5410 LearningRate 0.0235 Epoch: 10 Global Step: 427320 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 19:43:57,635-Speed 2640.39 samples/sec Loss 6.5043 LearningRate 0.0235 Epoch: 10 Global Step: 427330 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:44:01,534-Speed 2627.04 samples/sec Loss 6.4600 LearningRate 0.0235 Epoch: 10 Global Step: 427340 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:44:05,428-Speed 2630.16 samples/sec Loss 6.5854 LearningRate 0.0235 Epoch: 10 Global Step: 427350 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:44:09,319-Speed 2632.22 samples/sec Loss 6.4929 LearningRate 0.0235 Epoch: 10 Global Step: 427360 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:44:13,214-Speed 2629.51 samples/sec Loss 6.4829 LearningRate 0.0235 Epoch: 10 Global Step: 427370 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:44:17,112-Speed 2627.77 samples/sec Loss 6.4753 LearningRate 0.0235 Epoch: 10 Global Step: 427380 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:44:21,008-Speed 2628.89 samples/sec Loss 6.3793 LearningRate 0.0235 Epoch: 10 Global Step: 427390 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:44:24,914-Speed 2622.03 samples/sec Loss 6.5211 LearningRate 0.0235 Epoch: 10 Global Step: 427400 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:44:28,885-Speed 2579.08 samples/sec Loss 6.4382 LearningRate 0.0235 Epoch: 10 Global Step: 427410 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:44:32,805-Speed 2612.99 samples/sec Loss 6.4271 LearningRate 0.0235 Epoch: 10 Global Step: 427420 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:44:36,702-Speed 2628.57 samples/sec Loss 6.4813 LearningRate 0.0235 Epoch: 10 Global Step: 427430 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:44:40,597-Speed 2629.62 samples/sec Loss 6.4564 LearningRate 0.0235 Epoch: 10 Global Step: 427440 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:44:44,500-Speed 2624.24 samples/sec Loss 6.3122 LearningRate 0.0235 Epoch: 10 Global Step: 427450 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:44:48,470-Speed 2580.26 samples/sec Loss 6.4544 LearningRate 0.0235 Epoch: 10 Global Step: 427460 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:44:52,406-Speed 2602.13 samples/sec Loss 6.5398 LearningRate 0.0235 Epoch: 10 Global Step: 427470 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:44:56,305-Speed 2626.80 samples/sec Loss 6.4528 LearningRate 0.0235 Epoch: 10 Global Step: 427480 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:45:00,202-Speed 2627.91 samples/sec Loss 6.3609 LearningRate 0.0235 Epoch: 10 Global Step: 427490 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:45:04,101-Speed 2627.47 samples/sec Loss 6.5233 LearningRate 0.0235 Epoch: 10 Global Step: 427500 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:45:08,015-Speed 2616.61 samples/sec Loss 6.5548 LearningRate 0.0235 Epoch: 10 Global Step: 427510 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:45:11,910-Speed 2629.42 samples/sec Loss 6.3736 LearningRate 0.0235 Epoch: 10 Global Step: 427520 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:45:15,784-Speed 2643.44 samples/sec Loss 6.3916 LearningRate 0.0235 Epoch: 10 Global Step: 427530 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:45:19,680-Speed 2629.51 samples/sec Loss 6.4644 LearningRate 0.0235 Epoch: 10 Global Step: 427540 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:45:23,576-Speed 2629.08 samples/sec Loss 6.3912 LearningRate 0.0235 Epoch: 10 Global Step: 427550 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:45:27,474-Speed 2627.35 samples/sec Loss 6.4575 LearningRate 0.0235 Epoch: 10 Global Step: 427560 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:45:31,363-Speed 2633.55 samples/sec Loss 6.5694 LearningRate 0.0235 Epoch: 10 Global Step: 427570 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:45:35,241-Speed 2641.39 samples/sec Loss 6.5046 LearningRate 0.0235 Epoch: 10 Global Step: 427580 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:45:39,140-Speed 2626.52 samples/sec Loss 6.4069 LearningRate 0.0235 Epoch: 10 Global Step: 427590 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:45:43,039-Speed 2627.24 samples/sec Loss 6.4593 LearningRate 0.0235 Epoch: 10 Global Step: 427600 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:45:46,945-Speed 2622.11 samples/sec Loss 6.4418 LearningRate 0.0235 Epoch: 10 Global Step: 427610 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:45:50,848-Speed 2623.81 samples/sec Loss 6.4542 LearningRate 0.0235 Epoch: 10 Global Step: 427620 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:45:54,743-Speed 2630.05 samples/sec Loss 6.3532 LearningRate 0.0235 Epoch: 10 Global Step: 427630 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:45:58,660-Speed 2614.72 samples/sec Loss 6.5523 LearningRate 0.0235 Epoch: 10 Global Step: 427640 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:46:02,589-Speed 2607.53 samples/sec Loss 6.5373 LearningRate 0.0235 Epoch: 10 Global Step: 427650 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:46:06,482-Speed 2630.78 samples/sec Loss 6.4690 LearningRate 0.0235 Epoch: 10 Global Step: 427660 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:46:10,382-Speed 2625.99 samples/sec Loss 6.4325 LearningRate 0.0235 Epoch: 10 Global Step: 427670 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:46:14,302-Speed 2612.64 samples/sec Loss 6.4515 LearningRate 0.0235 Epoch: 10 Global Step: 427680 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:46:18,197-Speed 2629.76 samples/sec Loss 6.4643 LearningRate 0.0235 Epoch: 10 Global Step: 427690 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:46:22,118-Speed 2611.98 samples/sec Loss 6.4638 LearningRate 0.0235 Epoch: 10 Global Step: 427700 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:46:26,020-Speed 2624.68 samples/sec Loss 6.3634 LearningRate 0.0235 Epoch: 10 Global Step: 427710 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:46:29,911-Speed 2632.47 samples/sec Loss 6.4839 LearningRate 0.0235 Epoch: 10 Global Step: 427720 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:46:33,801-Speed 2633.25 samples/sec Loss 6.4835 LearningRate 0.0235 Epoch: 10 Global Step: 427730 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:46:37,696-Speed 2629.64 samples/sec Loss 6.3955 LearningRate 0.0235 Epoch: 10 Global Step: 427740 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:46:41,593-Speed 2628.37 samples/sec Loss 6.5061 LearningRate 0.0235 Epoch: 10 Global Step: 427750 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:46:45,469-Speed 2642.16 samples/sec Loss 6.5096 LearningRate 0.0235 Epoch: 10 Global Step: 427760 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 19:46:49,363-Speed 2630.52 samples/sec Loss 6.4116 LearningRate 0.0235 Epoch: 10 Global Step: 427770 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 19:46:53,267-Speed 2623.26 samples/sec Loss 6.3634 LearningRate 0.0235 Epoch: 10 Global Step: 427780 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 19:46:57,166-Speed 2626.64 samples/sec Loss 6.4826 LearningRate 0.0235 Epoch: 10 Global Step: 427790 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 19:47:01,061-Speed 2629.70 samples/sec Loss 6.3533 LearningRate 0.0235 Epoch: 10 Global Step: 427800 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 19:47:04,955-Speed 2629.95 samples/sec Loss 6.4187 LearningRate 0.0235 Epoch: 10 Global Step: 427810 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 19:47:08,869-Speed 2617.81 samples/sec Loss 6.5178 LearningRate 0.0235 Epoch: 10 Global Step: 427820 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 19:47:12,760-Speed 2632.22 samples/sec Loss 6.4155 LearningRate 0.0235 Epoch: 10 Global Step: 427830 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 19:47:16,650-Speed 2632.29 samples/sec Loss 6.3666 LearningRate 0.0235 Epoch: 10 Global Step: 427840 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 19:47:20,562-Speed 2618.83 samples/sec Loss 6.4813 LearningRate 0.0235 Epoch: 10 Global Step: 427850 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 19:47:24,456-Speed 2630.20 samples/sec Loss 6.3590 LearningRate 0.0234 Epoch: 10 Global Step: 427860 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:47:28,352-Speed 2628.50 samples/sec Loss 6.5190 LearningRate 0.0234 Epoch: 10 Global Step: 427870 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:47:32,367-Speed 2551.13 samples/sec Loss 6.4386 LearningRate 0.0234 Epoch: 10 Global Step: 427880 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:47:36,291-Speed 2610.08 samples/sec Loss 6.4845 LearningRate 0.0234 Epoch: 10 Global Step: 427890 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:47:40,207-Speed 2615.48 samples/sec Loss 6.4783 LearningRate 0.0234 Epoch: 10 Global Step: 427900 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:47:44,103-Speed 2628.55 samples/sec Loss 6.4340 LearningRate 0.0234 Epoch: 10 Global Step: 427910 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:47:48,002-Speed 2627.16 samples/sec Loss 6.4109 LearningRate 0.0234 Epoch: 10 Global Step: 427920 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:47:51,901-Speed 2627.15 samples/sec Loss 6.4759 LearningRate 0.0234 Epoch: 10 Global Step: 427930 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:47:55,800-Speed 2626.93 samples/sec Loss 6.3503 LearningRate 0.0234 Epoch: 10 Global Step: 427940 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:47:59,699-Speed 2627.45 samples/sec Loss 6.5041 LearningRate 0.0234 Epoch: 10 Global Step: 427950 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:48:03,600-Speed 2625.18 samples/sec Loss 6.3346 LearningRate 0.0234 Epoch: 10 Global Step: 427960 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:48:07,495-Speed 2629.82 samples/sec Loss 6.4759 LearningRate 0.0234 Epoch: 10 Global Step: 427970 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:48:11,393-Speed 2627.21 samples/sec Loss 6.4546 LearningRate 0.0234 Epoch: 10 Global Step: 427980 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:48:15,290-Speed 2628.58 samples/sec Loss 6.5400 LearningRate 0.0234 Epoch: 10 Global Step: 427990 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:48:19,193-Speed 2623.67 samples/sec Loss 6.4159 LearningRate 0.0234 Epoch: 10 Global Step: 428000 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:48:23,112-Speed 2614.29 samples/sec Loss 6.4321 LearningRate 0.0234 Epoch: 10 Global Step: 428010 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:48:27,015-Speed 2624.21 samples/sec Loss 6.3736 LearningRate 0.0234 Epoch: 10 Global Step: 428020 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:48:30,911-Speed 2629.73 samples/sec Loss 6.5000 LearningRate 0.0234 Epoch: 10 Global Step: 428030 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:48:34,826-Speed 2615.87 samples/sec Loss 6.4901 LearningRate 0.0234 Epoch: 10 Global Step: 428040 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:48:38,722-Speed 2629.08 samples/sec Loss 6.3805 LearningRate 0.0234 Epoch: 10 Global Step: 428050 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:48:42,616-Speed 2629.59 samples/sec Loss 6.4824 LearningRate 0.0234 Epoch: 10 Global Step: 428060 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 19:48:46,490-Speed 2643.91 samples/sec Loss 6.4364 LearningRate 0.0234 Epoch: 10 Global Step: 428070 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:48:50,381-Speed 2632.15 samples/sec Loss 6.4632 LearningRate 0.0234 Epoch: 10 Global Step: 428080 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:48:54,409-Speed 2543.14 samples/sec Loss 6.4213 LearningRate 0.0234 Epoch: 10 Global Step: 428090 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:48:58,456-Speed 2530.69 samples/sec Loss 6.5064 LearningRate 0.0234 Epoch: 10 Global Step: 428100 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:49:02,351-Speed 2629.56 samples/sec Loss 6.4260 LearningRate 0.0234 Epoch: 10 Global Step: 428110 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:49:06,243-Speed 2631.87 samples/sec Loss 6.4922 LearningRate 0.0234 Epoch: 10 Global Step: 428120 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:49:10,135-Speed 2631.30 samples/sec Loss 6.3862 LearningRate 0.0234 Epoch: 10 Global Step: 428130 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:49:14,027-Speed 2633.30 samples/sec Loss 6.3686 LearningRate 0.0234 Epoch: 10 Global Step: 428140 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:49:17,922-Speed 2629.38 samples/sec Loss 6.4605 LearningRate 0.0234 Epoch: 10 Global Step: 428150 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:49:21,817-Speed 2629.74 samples/sec Loss 6.5482 LearningRate 0.0234 Epoch: 10 Global Step: 428160 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:49:25,709-Speed 2631.14 samples/sec Loss 6.5508 LearningRate 0.0234 Epoch: 10 Global Step: 428170 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 19:49:29,586-Speed 2642.47 samples/sec Loss 6.6105 LearningRate 0.0234 Epoch: 10 Global Step: 428180 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:49:33,484-Speed 2627.06 samples/sec Loss 6.4354 LearningRate 0.0234 Epoch: 10 Global Step: 428190 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:49:37,387-Speed 2624.34 samples/sec Loss 6.5367 LearningRate 0.0234 Epoch: 10 Global Step: 428200 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:49:41,281-Speed 2630.37 samples/sec Loss 6.4422 LearningRate 0.0234 Epoch: 10 Global Step: 428210 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:49:45,182-Speed 2625.61 samples/sec Loss 6.4108 LearningRate 0.0234 Epoch: 10 Global Step: 428220 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:49:49,082-Speed 2626.42 samples/sec Loss 6.5748 LearningRate 0.0234 Epoch: 10 Global Step: 428230 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:49:52,978-Speed 2628.86 samples/sec Loss 6.5516 LearningRate 0.0234 Epoch: 10 Global Step: 428240 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:49:56,875-Speed 2628.30 samples/sec Loss 6.4961 LearningRate 0.0234 Epoch: 10 Global Step: 428250 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:50:00,767-Speed 2631.46 samples/sec Loss 6.5090 LearningRate 0.0234 Epoch: 10 Global Step: 428260 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:50:04,863-Speed 2500.97 samples/sec Loss 6.4356 LearningRate 0.0234 Epoch: 10 Global Step: 428270 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:50:08,803-Speed 2599.34 samples/sec Loss 6.3970 LearningRate 0.0234 Epoch: 10 Global Step: 428280 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:50:12,788-Speed 2569.96 samples/sec Loss 6.4145 LearningRate 0.0234 Epoch: 10 Global Step: 428290 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:50:16,687-Speed 2626.55 samples/sec Loss 6.5137 LearningRate 0.0234 Epoch: 10 Global Step: 428300 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:50:20,583-Speed 2629.04 samples/sec Loss 6.4199 LearningRate 0.0234 Epoch: 10 Global Step: 428310 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:50:24,498-Speed 2616.51 samples/sec Loss 6.4560 LearningRate 0.0234 Epoch: 10 Global Step: 428320 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:50:28,400-Speed 2625.25 samples/sec Loss 6.5834 LearningRate 0.0234 Epoch: 10 Global Step: 428330 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:50:32,299-Speed 2626.42 samples/sec Loss 6.4559 LearningRate 0.0234 Epoch: 10 Global Step: 428340 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:50:36,209-Speed 2620.15 samples/sec Loss 6.4116 LearningRate 0.0234 Epoch: 10 Global Step: 428350 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:50:40,260-Speed 2528.01 samples/sec Loss 6.4101 LearningRate 0.0234 Epoch: 10 Global Step: 428360 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:50:44,350-Speed 2503.84 samples/sec Loss 6.3316 LearningRate 0.0234 Epoch: 10 Global Step: 428370 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:50:48,274-Speed 2610.29 samples/sec Loss 6.5350 LearningRate 0.0234 Epoch: 10 Global Step: 428380 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:50:52,170-Speed 2629.15 samples/sec Loss 6.3742 LearningRate 0.0234 Epoch: 10 Global Step: 428390 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:50:56,064-Speed 2630.06 samples/sec Loss 6.3541 LearningRate 0.0234 Epoch: 10 Global Step: 428400 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:50:59,959-Speed 2630.01 samples/sec Loss 6.4275 LearningRate 0.0234 Epoch: 10 Global Step: 428410 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:51:03,862-Speed 2624.87 samples/sec Loss 6.4048 LearningRate 0.0234 Epoch: 10 Global Step: 428420 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:51:07,758-Speed 2628.65 samples/sec Loss 6.4237 LearningRate 0.0234 Epoch: 10 Global Step: 428430 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:51:11,658-Speed 2625.85 samples/sec Loss 6.4194 LearningRate 0.0234 Epoch: 10 Global Step: 428440 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:51:15,554-Speed 2629.39 samples/sec Loss 6.4298 LearningRate 0.0234 Epoch: 10 Global Step: 428450 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:51:19,449-Speed 2629.31 samples/sec Loss 6.4877 LearningRate 0.0234 Epoch: 10 Global Step: 428460 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:51:23,355-Speed 2622.53 samples/sec Loss 6.4764 LearningRate 0.0234 Epoch: 10 Global Step: 428470 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:51:27,247-Speed 2631.47 samples/sec Loss 6.4110 LearningRate 0.0234 Epoch: 10 Global Step: 428480 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:51:31,141-Speed 2630.34 samples/sec Loss 6.4563 LearningRate 0.0234 Epoch: 10 Global Step: 428490 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:51:35,047-Speed 2625.14 samples/sec Loss 6.3522 LearningRate 0.0234 Epoch: 10 Global Step: 428500 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:51:38,951-Speed 2623.84 samples/sec Loss 6.4289 LearningRate 0.0234 Epoch: 10 Global Step: 428510 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:51:42,861-Speed 2618.87 samples/sec Loss 6.4993 LearningRate 0.0234 Epoch: 10 Global Step: 428520 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:51:46,768-Speed 2621.72 samples/sec Loss 6.5020 LearningRate 0.0234 Epoch: 10 Global Step: 428530 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:51:50,673-Speed 2623.23 samples/sec Loss 6.3794 LearningRate 0.0234 Epoch: 10 Global Step: 428540 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:51:54,582-Speed 2619.63 samples/sec Loss 6.5645 LearningRate 0.0234 Epoch: 10 Global Step: 428550 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:51:58,480-Speed 2627.68 samples/sec Loss 6.4701 LearningRate 0.0234 Epoch: 10 Global Step: 428560 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 19:52:02,358-Speed 2640.89 samples/sec Loss 6.4249 LearningRate 0.0234 Epoch: 10 Global Step: 428570 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:52:06,255-Speed 2628.92 samples/sec Loss 6.4735 LearningRate 0.0234 Epoch: 10 Global Step: 428580 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:52:10,140-Speed 2636.29 samples/sec Loss 6.4504 LearningRate 0.0234 Epoch: 10 Global Step: 428590 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:52:14,038-Speed 2627.52 samples/sec Loss 6.3870 LearningRate 0.0234 Epoch: 10 Global Step: 428600 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:52:17,936-Speed 2627.66 samples/sec Loss 6.4656 LearningRate 0.0234 Epoch: 10 Global Step: 428610 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:52:21,833-Speed 2628.00 samples/sec Loss 6.4463 LearningRate 0.0234 Epoch: 10 Global Step: 428620 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:52:25,726-Speed 2630.70 samples/sec Loss 6.4116 LearningRate 0.0234 Epoch: 10 Global Step: 428630 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:52:29,626-Speed 2626.06 samples/sec Loss 6.4812 LearningRate 0.0234 Epoch: 10 Global Step: 428640 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:52:33,520-Speed 2630.13 samples/sec Loss 6.5856 LearningRate 0.0234 Epoch: 10 Global Step: 428650 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:52:37,414-Speed 2630.11 samples/sec Loss 6.4589 LearningRate 0.0234 Epoch: 10 Global Step: 428660 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:52:41,312-Speed 2627.95 samples/sec Loss 6.4161 LearningRate 0.0234 Epoch: 10 Global Step: 428670 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:52:45,211-Speed 2627.12 samples/sec Loss 6.4812 LearningRate 0.0234 Epoch: 10 Global Step: 428680 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:52:49,125-Speed 2616.43 samples/sec Loss 6.3469 LearningRate 0.0234 Epoch: 10 Global Step: 428690 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:52:53,037-Speed 2619.18 samples/sec Loss 6.5887 LearningRate 0.0234 Epoch: 10 Global Step: 428700 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:52:56,943-Speed 2621.69 samples/sec Loss 6.3583 LearningRate 0.0234 Epoch: 10 Global Step: 428710 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:53:00,851-Speed 2620.93 samples/sec Loss 6.4814 LearningRate 0.0233 Epoch: 10 Global Step: 428720 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:53:04,760-Speed 2620.21 samples/sec Loss 6.4804 LearningRate 0.0233 Epoch: 10 Global Step: 428730 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:53:08,657-Speed 2628.23 samples/sec Loss 6.3217 LearningRate 0.0233 Epoch: 10 Global Step: 428740 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:53:12,557-Speed 2626.20 samples/sec Loss 6.4028 LearningRate 0.0233 Epoch: 10 Global Step: 428750 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:53:16,453-Speed 2628.62 samples/sec Loss 6.4491 LearningRate 0.0233 Epoch: 10 Global Step: 428760 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:53:20,353-Speed 2626.64 samples/sec Loss 6.3718 LearningRate 0.0233 Epoch: 10 Global Step: 428770 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:53:24,252-Speed 2626.66 samples/sec Loss 6.4381 LearningRate 0.0233 Epoch: 10 Global Step: 428780 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:53:28,153-Speed 2625.84 samples/sec Loss 6.4947 LearningRate 0.0233 Epoch: 10 Global Step: 428790 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:53:32,055-Speed 2624.70 samples/sec Loss 6.3771 LearningRate 0.0233 Epoch: 10 Global Step: 428800 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:53:35,949-Speed 2630.53 samples/sec Loss 6.4844 LearningRate 0.0233 Epoch: 10 Global Step: 428810 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:53:39,844-Speed 2629.64 samples/sec Loss 6.5233 LearningRate 0.0233 Epoch: 10 Global Step: 428820 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:53:43,748-Speed 2623.76 samples/sec Loss 6.3684 LearningRate 0.0233 Epoch: 10 Global Step: 428830 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:53:47,642-Speed 2630.25 samples/sec Loss 6.5601 LearningRate 0.0233 Epoch: 10 Global Step: 428840 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:53:51,539-Speed 2627.98 samples/sec Loss 6.4343 LearningRate 0.0233 Epoch: 10 Global Step: 428850 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:53:55,442-Speed 2624.04 samples/sec Loss 6.4609 LearningRate 0.0233 Epoch: 10 Global Step: 428860 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:53:59,349-Speed 2622.19 samples/sec Loss 6.4822 LearningRate 0.0233 Epoch: 10 Global Step: 428870 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:54:03,253-Speed 2623.94 samples/sec Loss 6.5302 LearningRate 0.0233 Epoch: 10 Global Step: 428880 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:54:07,227-Speed 2577.09 samples/sec Loss 6.3015 LearningRate 0.0233 Epoch: 10 Global Step: 428890 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 19:54:11,097-Speed 2646.62 samples/sec Loss 6.4787 LearningRate 0.0233 Epoch: 10 Global Step: 428900 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:54:14,992-Speed 2629.59 samples/sec Loss 6.4838 LearningRate 0.0233 Epoch: 10 Global Step: 428910 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:54:18,892-Speed 2626.20 samples/sec Loss 6.4442 LearningRate 0.0233 Epoch: 10 Global Step: 428920 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:54:22,790-Speed 2628.08 samples/sec Loss 6.4189 LearningRate 0.0233 Epoch: 10 Global Step: 428930 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:54:26,690-Speed 2625.80 samples/sec Loss 6.5087 LearningRate 0.0233 Epoch: 10 Global Step: 428940 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:54:30,579-Speed 2633.54 samples/sec Loss 6.4089 LearningRate 0.0233 Epoch: 10 Global Step: 428950 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:54:34,486-Speed 2621.46 samples/sec Loss 6.4434 LearningRate 0.0233 Epoch: 10 Global Step: 428960 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:54:38,396-Speed 2619.50 samples/sec Loss 6.4732 LearningRate 0.0233 Epoch: 10 Global Step: 428970 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:54:42,309-Speed 2617.51 samples/sec Loss 6.5529 LearningRate 0.0233 Epoch: 10 Global Step: 428980 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:54:46,206-Speed 2629.30 samples/sec Loss 6.4895 LearningRate 0.0233 Epoch: 10 Global Step: 428990 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:54:50,109-Speed 2624.16 samples/sec Loss 6.4926 LearningRate 0.0233 Epoch: 10 Global Step: 429000 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:54:54,006-Speed 2628.48 samples/sec Loss 6.4939 LearningRate 0.0233 Epoch: 10 Global Step: 429010 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:54:57,931-Speed 2609.99 samples/sec Loss 6.5533 LearningRate 0.0233 Epoch: 10 Global Step: 429020 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:55:01,831-Speed 2625.75 samples/sec Loss 6.5022 LearningRate 0.0233 Epoch: 10 Global Step: 429030 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:55:05,726-Speed 2629.84 samples/sec Loss 6.3705 LearningRate 0.0233 Epoch: 10 Global Step: 429040 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:55:09,629-Speed 2623.65 samples/sec Loss 6.4674 LearningRate 0.0233 Epoch: 10 Global Step: 429050 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:55:13,518-Speed 2634.04 samples/sec Loss 6.3503 LearningRate 0.0233 Epoch: 10 Global Step: 429060 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:55:17,410-Speed 2632.07 samples/sec Loss 6.3307 LearningRate 0.0233 Epoch: 10 Global Step: 429070 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:55:21,304-Speed 2630.52 samples/sec Loss 6.3233 LearningRate 0.0233 Epoch: 10 Global Step: 429080 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:55:25,198-Speed 2629.61 samples/sec Loss 6.5038 LearningRate 0.0233 Epoch: 10 Global Step: 429090 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:55:29,083-Speed 2637.76 samples/sec Loss 6.3361 LearningRate 0.0233 Epoch: 10 Global Step: 429100 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:55:32,980-Speed 2627.63 samples/sec Loss 6.4649 LearningRate 0.0233 Epoch: 10 Global Step: 429110 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:55:36,882-Speed 2624.64 samples/sec Loss 6.4047 LearningRate 0.0233 Epoch: 10 Global Step: 429120 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:55:40,788-Speed 2622.14 samples/sec Loss 6.4262 LearningRate 0.0233 Epoch: 10 Global Step: 429130 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:55:44,702-Speed 2617.38 samples/sec Loss 6.5365 LearningRate 0.0233 Epoch: 10 Global Step: 429140 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:55:48,606-Speed 2623.10 samples/sec Loss 6.4702 LearningRate 0.0233 Epoch: 10 Global Step: 429150 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:55:52,510-Speed 2624.25 samples/sec Loss 6.5596 LearningRate 0.0233 Epoch: 10 Global Step: 429160 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:55:56,407-Speed 2628.24 samples/sec Loss 6.3977 LearningRate 0.0233 Epoch: 10 Global Step: 429170 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:56:00,303-Speed 2629.12 samples/sec Loss 6.3397 LearningRate 0.0233 Epoch: 10 Global Step: 429180 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:56:04,196-Speed 2630.41 samples/sec Loss 6.4330 LearningRate 0.0233 Epoch: 10 Global Step: 429190 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:56:08,094-Speed 2627.41 samples/sec Loss 6.3421 LearningRate 0.0233 Epoch: 10 Global Step: 429200 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:56:11,990-Speed 2628.92 samples/sec Loss 6.4012 LearningRate 0.0233 Epoch: 10 Global Step: 429210 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:56:15,903-Speed 2617.81 samples/sec Loss 6.3099 LearningRate 0.0233 Epoch: 10 Global Step: 429220 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:56:19,798-Speed 2629.71 samples/sec Loss 6.4007 LearningRate 0.0233 Epoch: 10 Global Step: 429230 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:56:23,677-Speed 2640.04 samples/sec Loss 6.4132 LearningRate 0.0233 Epoch: 10 Global Step: 429240 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:56:27,577-Speed 2626.67 samples/sec Loss 6.5166 LearningRate 0.0233 Epoch: 10 Global Step: 429250 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:56:31,477-Speed 2626.51 samples/sec Loss 6.4172 LearningRate 0.0233 Epoch: 10 Global Step: 429260 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:56:35,370-Speed 2630.97 samples/sec Loss 6.3636 LearningRate 0.0233 Epoch: 10 Global Step: 429270 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:56:39,270-Speed 2625.78 samples/sec Loss 6.3984 LearningRate 0.0233 Epoch: 10 Global Step: 429280 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:56:43,174-Speed 2623.88 samples/sec Loss 6.3689 LearningRate 0.0233 Epoch: 10 Global Step: 429290 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:56:47,091-Speed 2614.33 samples/sec Loss 6.4668 LearningRate 0.0233 Epoch: 10 Global Step: 429300 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:56:50,996-Speed 2623.26 samples/sec Loss 6.3427 LearningRate 0.0233 Epoch: 10 Global Step: 429310 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:56:54,894-Speed 2627.12 samples/sec Loss 6.5078 LearningRate 0.0233 Epoch: 10 Global Step: 429320 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:56:59,102-Speed 2434.25 samples/sec Loss 6.4699 LearningRate 0.0233 Epoch: 10 Global Step: 429330 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:57:02,998-Speed 2628.97 samples/sec Loss 6.5579 LearningRate 0.0233 Epoch: 10 Global Step: 429340 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:57:06,906-Speed 2621.33 samples/sec Loss 6.5220 LearningRate 0.0233 Epoch: 10 Global Step: 429350 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:57:10,801-Speed 2629.39 samples/sec Loss 6.4901 LearningRate 0.0233 Epoch: 10 Global Step: 429360 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:57:14,677-Speed 2641.94 samples/sec Loss 6.4446 LearningRate 0.0233 Epoch: 10 Global Step: 429370 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:57:18,571-Speed 2630.70 samples/sec Loss 6.4462 LearningRate 0.0233 Epoch: 10 Global Step: 429380 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:57:22,464-Speed 2630.28 samples/sec Loss 6.3733 LearningRate 0.0233 Epoch: 10 Global Step: 429390 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:57:26,353-Speed 2633.82 samples/sec Loss 6.3665 LearningRate 0.0233 Epoch: 10 Global Step: 429400 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:57:30,243-Speed 2633.05 samples/sec Loss 6.5127 LearningRate 0.0233 Epoch: 10 Global Step: 429410 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:57:34,134-Speed 2632.29 samples/sec Loss 6.5461 LearningRate 0.0233 Epoch: 10 Global Step: 429420 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:57:38,024-Speed 2633.07 samples/sec Loss 6.5131 LearningRate 0.0233 Epoch: 10 Global Step: 429430 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:57:41,923-Speed 2627.33 samples/sec Loss 6.3318 LearningRate 0.0233 Epoch: 10 Global Step: 429440 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:57:45,816-Speed 2630.91 samples/sec Loss 6.4499 LearningRate 0.0233 Epoch: 10 Global Step: 429450 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:57:49,721-Speed 2623.08 samples/sec Loss 6.4935 LearningRate 0.0233 Epoch: 10 Global Step: 429460 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:57:53,606-Speed 2636.45 samples/sec Loss 6.4372 LearningRate 0.0233 Epoch: 10 Global Step: 429470 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:57:57,517-Speed 2619.76 samples/sec Loss 6.3999 LearningRate 0.0233 Epoch: 10 Global Step: 429480 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:58:01,449-Speed 2604.73 samples/sec Loss 6.4538 LearningRate 0.0233 Epoch: 10 Global Step: 429490 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:58:05,341-Speed 2632.19 samples/sec Loss 6.3539 LearningRate 0.0233 Epoch: 10 Global Step: 429500 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:58:09,253-Speed 2617.74 samples/sec Loss 6.4393 LearningRate 0.0233 Epoch: 10 Global Step: 429510 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:58:13,152-Speed 2626.87 samples/sec Loss 6.5007 LearningRate 0.0233 Epoch: 10 Global Step: 429520 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:58:17,049-Speed 2628.35 samples/sec Loss 6.4676 LearningRate 0.0233 Epoch: 10 Global Step: 429530 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:58:20,945-Speed 2629.36 samples/sec Loss 6.4375 LearningRate 0.0233 Epoch: 10 Global Step: 429540 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:58:24,843-Speed 2627.73 samples/sec Loss 6.4040 LearningRate 0.0233 Epoch: 10 Global Step: 429550 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:58:28,745-Speed 2624.35 samples/sec Loss 6.3673 LearningRate 0.0233 Epoch: 10 Global Step: 429560 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:58:32,669-Speed 2610.07 samples/sec Loss 6.4230 LearningRate 0.0233 Epoch: 10 Global Step: 429570 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:58:36,570-Speed 2626.27 samples/sec Loss 6.3838 LearningRate 0.0232 Epoch: 10 Global Step: 429580 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:58:40,465-Speed 2629.74 samples/sec Loss 6.4199 LearningRate 0.0232 Epoch: 10 Global Step: 429590 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:58:44,360-Speed 2629.57 samples/sec Loss 6.4152 LearningRate 0.0232 Epoch: 10 Global Step: 429600 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:58:48,256-Speed 2629.39 samples/sec Loss 6.3466 LearningRate 0.0232 Epoch: 10 Global Step: 429610 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:58:52,186-Speed 2605.70 samples/sec Loss 6.4016 LearningRate 0.0232 Epoch: 10 Global Step: 429620 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:58:56,092-Speed 2622.46 samples/sec Loss 6.4539 LearningRate 0.0232 Epoch: 10 Global Step: 429630 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:58:59,990-Speed 2627.72 samples/sec Loss 6.5153 LearningRate 0.0232 Epoch: 10 Global Step: 429640 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:59:03,894-Speed 2623.69 samples/sec Loss 6.4143 LearningRate 0.0232 Epoch: 10 Global Step: 429650 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:59:07,793-Speed 2627.05 samples/sec Loss 6.3420 LearningRate 0.0232 Epoch: 10 Global Step: 429660 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:59:11,690-Speed 2628.08 samples/sec Loss 6.3389 LearningRate 0.0232 Epoch: 10 Global Step: 429670 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 19:59:15,568-Speed 2641.46 samples/sec Loss 6.3971 LearningRate 0.0232 Epoch: 10 Global Step: 429680 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 19:59:19,448-Speed 2639.80 samples/sec Loss 6.4166 LearningRate 0.0232 Epoch: 10 Global Step: 429690 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:59:23,355-Speed 2621.53 samples/sec Loss 6.4185 LearningRate 0.0232 Epoch: 10 Global Step: 429700 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:59:27,245-Speed 2632.69 samples/sec Loss 6.4964 LearningRate 0.0232 Epoch: 10 Global Step: 429710 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:59:31,136-Speed 2632.77 samples/sec Loss 6.4274 LearningRate 0.0232 Epoch: 10 Global Step: 429720 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:59:35,034-Speed 2628.14 samples/sec Loss 6.4085 LearningRate 0.0232 Epoch: 10 Global Step: 429730 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:59:38,951-Speed 2614.33 samples/sec Loss 6.4083 LearningRate 0.0232 Epoch: 10 Global Step: 429740 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:59:42,843-Speed 2632.05 samples/sec Loss 6.4948 LearningRate 0.0232 Epoch: 10 Global Step: 429750 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:59:46,738-Speed 2630.00 samples/sec Loss 6.4395 LearningRate 0.0232 Epoch: 10 Global Step: 429760 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:59:50,635-Speed 2627.90 samples/sec Loss 6.4422 LearningRate 0.0232 Epoch: 10 Global Step: 429770 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:59:54,549-Speed 2616.81 samples/sec Loss 6.3672 LearningRate 0.0232 Epoch: 10 Global Step: 429780 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 19:59:58,440-Speed 2632.99 samples/sec Loss 6.5482 LearningRate 0.0232 Epoch: 10 Global Step: 429790 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:00:02,333-Speed 2630.66 samples/sec Loss 6.5221 LearningRate 0.0232 Epoch: 10 Global Step: 429800 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:00:06,230-Speed 2628.44 samples/sec Loss 6.4440 LearningRate 0.0232 Epoch: 10 Global Step: 429810 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:00:10,132-Speed 2624.34 samples/sec Loss 6.3923 LearningRate 0.0232 Epoch: 10 Global Step: 429820 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:00:14,029-Speed 2628.93 samples/sec Loss 6.4182 LearningRate 0.0232 Epoch: 10 Global Step: 429830 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:00:17,915-Speed 2635.14 samples/sec Loss 6.5550 LearningRate 0.0232 Epoch: 10 Global Step: 429840 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:00:21,817-Speed 2625.03 samples/sec Loss 6.3694 LearningRate 0.0232 Epoch: 10 Global Step: 429850 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:00:25,723-Speed 2622.49 samples/sec Loss 6.4759 LearningRate 0.0232 Epoch: 10 Global Step: 429860 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:00:29,618-Speed 2629.44 samples/sec Loss 6.5076 LearningRate 0.0232 Epoch: 10 Global Step: 429870 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:00:33,513-Speed 2629.98 samples/sec Loss 6.3231 LearningRate 0.0232 Epoch: 10 Global Step: 429880 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:00:37,412-Speed 2626.91 samples/sec Loss 6.3597 LearningRate 0.0232 Epoch: 10 Global Step: 429890 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:00:41,311-Speed 2626.65 samples/sec Loss 6.4838 LearningRate 0.0232 Epoch: 10 Global Step: 429900 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:00:45,233-Speed 2611.81 samples/sec Loss 6.4086 LearningRate 0.0232 Epoch: 10 Global Step: 429910 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:00:49,126-Speed 2631.42 samples/sec Loss 6.3555 LearningRate 0.0232 Epoch: 10 Global Step: 429920 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:00:53,032-Speed 2622.85 samples/sec Loss 6.3714 LearningRate 0.0232 Epoch: 10 Global Step: 429930 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:00:56,947-Speed 2616.33 samples/sec Loss 6.5340 LearningRate 0.0232 Epoch: 10 Global Step: 429940 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:01:00,842-Speed 2630.04 samples/sec Loss 6.4656 LearningRate 0.0232 Epoch: 10 Global Step: 429950 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:01:04,737-Speed 2629.56 samples/sec Loss 6.3728 LearningRate 0.0232 Epoch: 10 Global Step: 429960 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:01:08,633-Speed 2628.53 samples/sec Loss 6.4661 LearningRate 0.0232 Epoch: 10 Global Step: 429970 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:01:12,530-Speed 2628.37 samples/sec Loss 6.3788 LearningRate 0.0232 Epoch: 10 Global Step: 429980 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:01:16,430-Speed 2626.63 samples/sec Loss 6.3925 LearningRate 0.0232 Epoch: 10 Global Step: 429990 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:01:20,326-Speed 2628.82 samples/sec Loss 6.4771 LearningRate 0.0232 Epoch: 10 Global Step: 430000 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:02:03,095-[lfw][430000]XNorm: 22.466479
Training: 2022-04-14 20:02:03,096-[lfw][430000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-14 20:02:03,096-[lfw][430000]Accuracy-Highest: 0.99783
Training: 2022-04-14 20:02:52,903-[cfp_fp][430000]XNorm: 20.830325
Training: 2022-04-14 20:02:52,904-[cfp_fp][430000]Accuracy-Flip: 0.98814+-0.00384
Training: 2022-04-14 20:02:52,905-[cfp_fp][430000]Accuracy-Highest: 0.98843
Training: 2022-04-14 20:03:35,703-[agedb_30][430000]XNorm: 22.284805
Training: 2022-04-14 20:03:35,704-[agedb_30][430000]Accuracy-Flip: 0.97767+-0.00549
Training: 2022-04-14 20:03:35,705-[agedb_30][430000]Accuracy-Highest: 0.97767
Training: 2022-04-14 20:03:39,587-Speed 73.53 samples/sec Loss 6.4008 LearningRate 0.0232 Epoch: 10 Global Step: 430010 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:03:43,460-Speed 2644.40 samples/sec Loss 6.4639 LearningRate 0.0232 Epoch: 10 Global Step: 430020 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:03:47,336-Speed 2642.77 samples/sec Loss 6.4191 LearningRate 0.0232 Epoch: 10 Global Step: 430030 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:03:51,224-Speed 2634.55 samples/sec Loss 6.4487 LearningRate 0.0232 Epoch: 10 Global Step: 430040 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 20:03:55,085-Speed 2652.49 samples/sec Loss 6.4077 LearningRate 0.0232 Epoch: 10 Global Step: 430050 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:03:58,976-Speed 2632.15 samples/sec Loss 6.4880 LearningRate 0.0232 Epoch: 10 Global Step: 430060 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:04:02,864-Speed 2635.14 samples/sec Loss 6.3559 LearningRate 0.0232 Epoch: 10 Global Step: 430070 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:04:06,757-Speed 2631.04 samples/sec Loss 6.3997 LearningRate 0.0232 Epoch: 10 Global Step: 430080 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:04:10,661-Speed 2624.03 samples/sec Loss 6.5331 LearningRate 0.0232 Epoch: 10 Global Step: 430090 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:04:14,532-Speed 2645.80 samples/sec Loss 6.4263 LearningRate 0.0232 Epoch: 10 Global Step: 430100 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:04:18,427-Speed 2629.29 samples/sec Loss 6.3941 LearningRate 0.0232 Epoch: 10 Global Step: 430110 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:04:22,323-Speed 2628.72 samples/sec Loss 6.4402 LearningRate 0.0232 Epoch: 10 Global Step: 430120 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:04:26,214-Speed 2632.51 samples/sec Loss 6.4270 LearningRate 0.0232 Epoch: 10 Global Step: 430130 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:04:30,110-Speed 2629.78 samples/sec Loss 6.3570 LearningRate 0.0232 Epoch: 10 Global Step: 430140 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:04:34,037-Speed 2608.19 samples/sec Loss 6.4948 LearningRate 0.0232 Epoch: 10 Global Step: 430150 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:04:37,935-Speed 2628.20 samples/sec Loss 6.4366 LearningRate 0.0232 Epoch: 10 Global Step: 430160 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:04:41,828-Speed 2630.77 samples/sec Loss 6.4051 LearningRate 0.0232 Epoch: 10 Global Step: 430170 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:04:45,738-Speed 2620.18 samples/sec Loss 6.4565 LearningRate 0.0232 Epoch: 10 Global Step: 430180 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:04:49,632-Speed 2630.15 samples/sec Loss 6.4345 LearningRate 0.0232 Epoch: 10 Global Step: 430190 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:04:53,528-Speed 2629.40 samples/sec Loss 6.4187 LearningRate 0.0232 Epoch: 10 Global Step: 430200 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:04:57,424-Speed 2628.82 samples/sec Loss 6.3432 LearningRate 0.0232 Epoch: 10 Global Step: 430210 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:05:01,324-Speed 2626.60 samples/sec Loss 6.4029 LearningRate 0.0232 Epoch: 10 Global Step: 430220 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:05:05,225-Speed 2624.86 samples/sec Loss 6.4388 LearningRate 0.0232 Epoch: 10 Global Step: 430230 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:05:09,123-Speed 2628.21 samples/sec Loss 6.4217 LearningRate 0.0232 Epoch: 10 Global Step: 430240 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:05:13,023-Speed 2626.74 samples/sec Loss 6.4830 LearningRate 0.0232 Epoch: 10 Global Step: 430250 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:05:16,927-Speed 2623.51 samples/sec Loss 6.4006 LearningRate 0.0232 Epoch: 10 Global Step: 430260 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:05:20,828-Speed 2625.32 samples/sec Loss 6.3404 LearningRate 0.0232 Epoch: 10 Global Step: 430270 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:05:24,726-Speed 2627.59 samples/sec Loss 6.3343 LearningRate 0.0232 Epoch: 10 Global Step: 430280 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:05:28,632-Speed 2622.18 samples/sec Loss 6.3155 LearningRate 0.0232 Epoch: 10 Global Step: 430290 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:05:32,515-Speed 2637.92 samples/sec Loss 6.3487 LearningRate 0.0232 Epoch: 10 Global Step: 430300 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:05:36,417-Speed 2624.98 samples/sec Loss 6.4670 LearningRate 0.0232 Epoch: 10 Global Step: 430310 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:05:40,314-Speed 2628.54 samples/sec Loss 6.4798 LearningRate 0.0232 Epoch: 10 Global Step: 430320 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:05:44,221-Speed 2621.19 samples/sec Loss 6.3733 LearningRate 0.0232 Epoch: 10 Global Step: 430330 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:05:48,139-Speed 2615.62 samples/sec Loss 6.4239 LearningRate 0.0232 Epoch: 10 Global Step: 430340 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:05:52,040-Speed 2625.79 samples/sec Loss 6.3330 LearningRate 0.0232 Epoch: 10 Global Step: 430350 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:05:55,939-Speed 2626.66 samples/sec Loss 6.3925 LearningRate 0.0232 Epoch: 10 Global Step: 430360 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:06:00,007-Speed 2517.68 samples/sec Loss 6.3419 LearningRate 0.0232 Epoch: 10 Global Step: 430370 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:06:03,913-Speed 2621.93 samples/sec Loss 6.4577 LearningRate 0.0232 Epoch: 10 Global Step: 430380 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:06:07,808-Speed 2629.87 samples/sec Loss 6.3503 LearningRate 0.0232 Epoch: 10 Global Step: 430390 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:06:11,683-Speed 2643.92 samples/sec Loss 6.3946 LearningRate 0.0232 Epoch: 10 Global Step: 430400 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:06:15,588-Speed 2622.74 samples/sec Loss 6.3535 LearningRate 0.0232 Epoch: 10 Global Step: 430410 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:06:19,490-Speed 2624.75 samples/sec Loss 6.3612 LearningRate 0.0232 Epoch: 10 Global Step: 430420 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:06:23,396-Speed 2622.42 samples/sec Loss 6.3441 LearningRate 0.0232 Epoch: 10 Global Step: 430430 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:06:27,302-Speed 2622.35 samples/sec Loss 6.4000 LearningRate 0.0231 Epoch: 10 Global Step: 430440 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:06:31,200-Speed 2627.18 samples/sec Loss 6.5631 LearningRate 0.0231 Epoch: 10 Global Step: 430450 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:06:35,106-Speed 2622.04 samples/sec Loss 6.3804 LearningRate 0.0231 Epoch: 10 Global Step: 430460 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:06:39,006-Speed 2626.19 samples/sec Loss 6.5087 LearningRate 0.0231 Epoch: 10 Global Step: 430470 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:06:42,905-Speed 2627.74 samples/sec Loss 6.3982 LearningRate 0.0231 Epoch: 10 Global Step: 430480 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:06:46,799-Speed 2630.03 samples/sec Loss 6.4233 LearningRate 0.0231 Epoch: 10 Global Step: 430490 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:06:50,698-Speed 2627.21 samples/sec Loss 6.4799 LearningRate 0.0231 Epoch: 10 Global Step: 430500 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 20:06:54,576-Speed 2641.05 samples/sec Loss 6.3839 LearningRate 0.0231 Epoch: 10 Global Step: 430510 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:06:58,470-Speed 2630.00 samples/sec Loss 6.3839 LearningRate 0.0231 Epoch: 10 Global Step: 430520 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:07:02,363-Speed 2631.40 samples/sec Loss 6.3405 LearningRate 0.0231 Epoch: 10 Global Step: 430530 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:07:06,262-Speed 2626.75 samples/sec Loss 6.4645 LearningRate 0.0231 Epoch: 10 Global Step: 430540 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:07:10,157-Speed 2629.72 samples/sec Loss 6.3774 LearningRate 0.0231 Epoch: 10 Global Step: 430550 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:07:14,053-Speed 2629.58 samples/sec Loss 6.4342 LearningRate 0.0231 Epoch: 10 Global Step: 430560 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:07:17,954-Speed 2626.14 samples/sec Loss 6.4313 LearningRate 0.0231 Epoch: 10 Global Step: 430570 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:07:21,856-Speed 2624.65 samples/sec Loss 6.4226 LearningRate 0.0231 Epoch: 10 Global Step: 430580 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:07:25,754-Speed 2627.78 samples/sec Loss 6.2888 LearningRate 0.0231 Epoch: 10 Global Step: 430590 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:07:29,649-Speed 2629.34 samples/sec Loss 6.4230 LearningRate 0.0231 Epoch: 10 Global Step: 430600 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:07:33,548-Speed 2627.34 samples/sec Loss 6.4322 LearningRate 0.0231 Epoch: 10 Global Step: 430610 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 20:07:37,435-Speed 2635.17 samples/sec Loss 6.3593 LearningRate 0.0231 Epoch: 10 Global Step: 430620 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:07:41,341-Speed 2622.57 samples/sec Loss 6.3849 LearningRate 0.0231 Epoch: 10 Global Step: 430630 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:07:45,236-Speed 2629.18 samples/sec Loss 6.5310 LearningRate 0.0231 Epoch: 10 Global Step: 430640 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:07:49,131-Speed 2629.72 samples/sec Loss 6.4638 LearningRate 0.0231 Epoch: 10 Global Step: 430650 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:07:53,047-Speed 2615.36 samples/sec Loss 6.3349 LearningRate 0.0231 Epoch: 10 Global Step: 430660 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:07:56,951-Speed 2623.57 samples/sec Loss 6.4740 LearningRate 0.0231 Epoch: 10 Global Step: 430670 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:08:00,840-Speed 2633.55 samples/sec Loss 6.3694 LearningRate 0.0231 Epoch: 10 Global Step: 430680 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:08:04,749-Speed 2620.67 samples/sec Loss 6.3466 LearningRate 0.0231 Epoch: 10 Global Step: 430690 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:08:08,648-Speed 2626.58 samples/sec Loss 6.3942 LearningRate 0.0231 Epoch: 10 Global Step: 430700 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:08:12,550-Speed 2625.44 samples/sec Loss 6.3988 LearningRate 0.0231 Epoch: 10 Global Step: 430710 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:08:16,469-Speed 2612.98 samples/sec Loss 6.4584 LearningRate 0.0231 Epoch: 10 Global Step: 430720 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:08:20,382-Speed 2617.81 samples/sec Loss 6.3968 LearningRate 0.0231 Epoch: 10 Global Step: 430730 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:08:24,324-Speed 2598.69 samples/sec Loss 6.3655 LearningRate 0.0231 Epoch: 10 Global Step: 430740 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:08:28,221-Speed 2628.50 samples/sec Loss 6.4328 LearningRate 0.0231 Epoch: 10 Global Step: 430750 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:08:32,170-Speed 2594.03 samples/sec Loss 6.4058 LearningRate 0.0231 Epoch: 10 Global Step: 430760 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:08:36,073-Speed 2624.16 samples/sec Loss 6.3381 LearningRate 0.0231 Epoch: 10 Global Step: 430770 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:08:39,999-Speed 2609.02 samples/sec Loss 6.4387 LearningRate 0.0231 Epoch: 10 Global Step: 430780 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:08:43,922-Speed 2610.79 samples/sec Loss 6.2814 LearningRate 0.0231 Epoch: 10 Global Step: 430790 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:08:47,821-Speed 2627.15 samples/sec Loss 6.3909 LearningRate 0.0231 Epoch: 10 Global Step: 430800 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:08:51,718-Speed 2628.24 samples/sec Loss 6.4119 LearningRate 0.0231 Epoch: 10 Global Step: 430810 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:08:55,601-Speed 2637.96 samples/sec Loss 6.3789 LearningRate 0.0231 Epoch: 10 Global Step: 430820 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:08:59,511-Speed 2619.01 samples/sec Loss 6.4734 LearningRate 0.0231 Epoch: 10 Global Step: 430830 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:09:03,407-Speed 2629.32 samples/sec Loss 6.4161 LearningRate 0.0231 Epoch: 10 Global Step: 430840 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:09:07,303-Speed 2628.90 samples/sec Loss 6.4148 LearningRate 0.0231 Epoch: 10 Global Step: 430850 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:09:11,200-Speed 2628.55 samples/sec Loss 6.4934 LearningRate 0.0231 Epoch: 10 Global Step: 430860 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:09:15,096-Speed 2629.24 samples/sec Loss 6.3579 LearningRate 0.0231 Epoch: 10 Global Step: 430870 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:09:18,990-Speed 2630.62 samples/sec Loss 6.4728 LearningRate 0.0231 Epoch: 10 Global Step: 430880 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:09:22,897-Speed 2621.26 samples/sec Loss 6.4663 LearningRate 0.0231 Epoch: 10 Global Step: 430890 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:09:26,802-Speed 2622.60 samples/sec Loss 6.4236 LearningRate 0.0231 Epoch: 10 Global Step: 430900 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:09:30,711-Speed 2620.44 samples/sec Loss 6.5153 LearningRate 0.0231 Epoch: 10 Global Step: 430910 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:09:34,614-Speed 2624.02 samples/sec Loss 6.4532 LearningRate 0.0231 Epoch: 10 Global Step: 430920 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:09:38,513-Speed 2627.61 samples/sec Loss 6.4874 LearningRate 0.0231 Epoch: 10 Global Step: 430930 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:09:42,417-Speed 2623.17 samples/sec Loss 6.4781 LearningRate 0.0231 Epoch: 10 Global Step: 430940 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:09:46,329-Speed 2618.71 samples/sec Loss 6.4174 LearningRate 0.0231 Epoch: 10 Global Step: 430950 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:09:50,234-Speed 2622.20 samples/sec Loss 6.3945 LearningRate 0.0231 Epoch: 10 Global Step: 430960 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:09:54,106-Speed 2645.23 samples/sec Loss 6.3701 LearningRate 0.0231 Epoch: 10 Global Step: 430970 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:09:58,043-Speed 2601.56 samples/sec Loss 6.3345 LearningRate 0.0231 Epoch: 10 Global Step: 430980 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:10:01,936-Speed 2631.35 samples/sec Loss 6.3778 LearningRate 0.0231 Epoch: 10 Global Step: 430990 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:10:05,841-Speed 2622.78 samples/sec Loss 6.4743 LearningRate 0.0231 Epoch: 10 Global Step: 431000 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:10:09,810-Speed 2580.62 samples/sec Loss 6.4716 LearningRate 0.0231 Epoch: 10 Global Step: 431010 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:10:13,701-Speed 2632.23 samples/sec Loss 6.4132 LearningRate 0.0231 Epoch: 10 Global Step: 431020 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:10:17,636-Speed 2603.20 samples/sec Loss 6.4782 LearningRate 0.0231 Epoch: 10 Global Step: 431030 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:10:21,533-Speed 2628.48 samples/sec Loss 6.3878 LearningRate 0.0231 Epoch: 10 Global Step: 431040 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:10:25,427-Speed 2630.54 samples/sec Loss 6.4674 LearningRate 0.0231 Epoch: 10 Global Step: 431050 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:10:29,320-Speed 2630.45 samples/sec Loss 6.4312 LearningRate 0.0231 Epoch: 10 Global Step: 431060 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:10:33,250-Speed 2607.73 samples/sec Loss 6.3325 LearningRate 0.0231 Epoch: 10 Global Step: 431070 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:10:37,145-Speed 2629.15 samples/sec Loss 6.4520 LearningRate 0.0231 Epoch: 10 Global Step: 431080 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:10:41,061-Speed 2616.04 samples/sec Loss 6.4310 LearningRate 0.0231 Epoch: 10 Global Step: 431090 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:10:44,960-Speed 2627.01 samples/sec Loss 6.4078 LearningRate 0.0231 Epoch: 10 Global Step: 431100 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:10:48,859-Speed 2626.88 samples/sec Loss 6.4991 LearningRate 0.0231 Epoch: 10 Global Step: 431110 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:10:52,757-Speed 2628.29 samples/sec Loss 6.4142 LearningRate 0.0231 Epoch: 10 Global Step: 431120 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:10:56,660-Speed 2624.20 samples/sec Loss 6.2734 LearningRate 0.0231 Epoch: 10 Global Step: 431130 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:11:00,551-Speed 2631.94 samples/sec Loss 6.4160 LearningRate 0.0231 Epoch: 10 Global Step: 431140 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:11:04,449-Speed 2628.27 samples/sec Loss 6.4672 LearningRate 0.0231 Epoch: 10 Global Step: 431150 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:11:08,356-Speed 2621.79 samples/sec Loss 6.3639 LearningRate 0.0231 Epoch: 10 Global Step: 431160 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:11:12,237-Speed 2638.99 samples/sec Loss 6.3771 LearningRate 0.0231 Epoch: 10 Global Step: 431170 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:11:16,133-Speed 2628.67 samples/sec Loss 6.3925 LearningRate 0.0231 Epoch: 10 Global Step: 431180 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:11:20,031-Speed 2627.64 samples/sec Loss 6.5025 LearningRate 0.0231 Epoch: 10 Global Step: 431190 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:11:23,933-Speed 2624.64 samples/sec Loss 6.2939 LearningRate 0.0231 Epoch: 10 Global Step: 431200 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:11:27,825-Speed 2631.60 samples/sec Loss 6.4303 LearningRate 0.0231 Epoch: 10 Global Step: 431210 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:11:31,698-Speed 2645.29 samples/sec Loss 6.3706 LearningRate 0.0231 Epoch: 10 Global Step: 431220 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:11:35,591-Speed 2630.81 samples/sec Loss 6.3729 LearningRate 0.0231 Epoch: 10 Global Step: 431230 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:11:39,484-Speed 2630.88 samples/sec Loss 6.4747 LearningRate 0.0231 Epoch: 10 Global Step: 431240 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:11:43,381-Speed 2628.03 samples/sec Loss 6.5146 LearningRate 0.0231 Epoch: 10 Global Step: 431250 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:11:47,276-Speed 2630.16 samples/sec Loss 6.4020 LearningRate 0.0231 Epoch: 10 Global Step: 431260 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:11:51,166-Speed 2632.91 samples/sec Loss 6.3528 LearningRate 0.0231 Epoch: 10 Global Step: 431270 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:11:55,065-Speed 2626.91 samples/sec Loss 6.4348 LearningRate 0.0231 Epoch: 10 Global Step: 431280 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:11:58,964-Speed 2627.70 samples/sec Loss 6.4539 LearningRate 0.0231 Epoch: 10 Global Step: 431290 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:12:02,862-Speed 2627.48 samples/sec Loss 6.4818 LearningRate 0.0230 Epoch: 10 Global Step: 431300 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:12:06,761-Speed 2626.49 samples/sec Loss 6.4361 LearningRate 0.0230 Epoch: 10 Global Step: 431310 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:12:10,657-Speed 2629.21 samples/sec Loss 6.3888 LearningRate 0.0230 Epoch: 10 Global Step: 431320 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:12:14,593-Speed 2602.21 samples/sec Loss 6.4559 LearningRate 0.0230 Epoch: 10 Global Step: 431330 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:12:18,489-Speed 2629.34 samples/sec Loss 6.3930 LearningRate 0.0230 Epoch: 10 Global Step: 431340 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:12:22,388-Speed 2626.92 samples/sec Loss 6.3974 LearningRate 0.0230 Epoch: 10 Global Step: 431350 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:12:26,286-Speed 2627.76 samples/sec Loss 6.3267 LearningRate 0.0230 Epoch: 10 Global Step: 431360 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:12:30,184-Speed 2627.55 samples/sec Loss 6.4805 LearningRate 0.0230 Epoch: 10 Global Step: 431370 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:12:34,087-Speed 2624.84 samples/sec Loss 6.3999 LearningRate 0.0230 Epoch: 10 Global Step: 431380 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:12:38,005-Speed 2613.84 samples/sec Loss 6.3572 LearningRate 0.0230 Epoch: 10 Global Step: 431390 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:12:41,911-Speed 2622.36 samples/sec Loss 6.4115 LearningRate 0.0230 Epoch: 10 Global Step: 431400 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:12:45,839-Speed 2607.89 samples/sec Loss 6.3242 LearningRate 0.0230 Epoch: 10 Global Step: 431410 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:12:49,754-Speed 2616.21 samples/sec Loss 6.3582 LearningRate 0.0230 Epoch: 10 Global Step: 431420 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:12:53,656-Speed 2624.78 samples/sec Loss 6.3517 LearningRate 0.0230 Epoch: 10 Global Step: 431430 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:12:57,563-Speed 2622.77 samples/sec Loss 6.3649 LearningRate 0.0230 Epoch: 10 Global Step: 431440 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:13:01,465-Speed 2624.84 samples/sec Loss 6.5065 LearningRate 0.0230 Epoch: 10 Global Step: 431450 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:13:05,362-Speed 2627.53 samples/sec Loss 6.4476 LearningRate 0.0230 Epoch: 10 Global Step: 431460 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:13:09,277-Speed 2616.67 samples/sec Loss 6.3048 LearningRate 0.0230 Epoch: 10 Global Step: 431470 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:13:13,175-Speed 2627.23 samples/sec Loss 6.3236 LearningRate 0.0230 Epoch: 10 Global Step: 431480 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:13:17,080-Speed 2622.95 samples/sec Loss 6.3465 LearningRate 0.0230 Epoch: 10 Global Step: 431490 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:13:20,979-Speed 2627.63 samples/sec Loss 6.3502 LearningRate 0.0230 Epoch: 10 Global Step: 431500 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:13:24,882-Speed 2623.94 samples/sec Loss 6.4188 LearningRate 0.0230 Epoch: 10 Global Step: 431510 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:13:28,781-Speed 2626.99 samples/sec Loss 6.2746 LearningRate 0.0230 Epoch: 10 Global Step: 431520 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:13:32,689-Speed 2620.71 samples/sec Loss 6.3673 LearningRate 0.0230 Epoch: 10 Global Step: 431530 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:13:36,589-Speed 2626.00 samples/sec Loss 6.4160 LearningRate 0.0230 Epoch: 10 Global Step: 431540 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:13:40,498-Speed 2619.98 samples/sec Loss 6.3050 LearningRate 0.0230 Epoch: 10 Global Step: 431550 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:13:44,393-Speed 2630.11 samples/sec Loss 6.4522 LearningRate 0.0230 Epoch: 10 Global Step: 431560 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:13:48,293-Speed 2627.07 samples/sec Loss 6.4247 LearningRate 0.0230 Epoch: 10 Global Step: 431570 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:13:52,190-Speed 2627.87 samples/sec Loss 6.5316 LearningRate 0.0230 Epoch: 10 Global Step: 431580 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:13:56,097-Speed 2622.07 samples/sec Loss 6.4405 LearningRate 0.0230 Epoch: 10 Global Step: 431590 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:14:00,014-Speed 2614.51 samples/sec Loss 6.2638 LearningRate 0.0230 Epoch: 10 Global Step: 431600 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:14:03,893-Speed 2640.80 samples/sec Loss 6.4207 LearningRate 0.0230 Epoch: 10 Global Step: 431610 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:14:07,806-Speed 2617.43 samples/sec Loss 6.3973 LearningRate 0.0230 Epoch: 10 Global Step: 431620 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:14:11,737-Speed 2605.46 samples/sec Loss 6.3487 LearningRate 0.0230 Epoch: 10 Global Step: 431630 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:14:15,679-Speed 2598.16 samples/sec Loss 6.3315 LearningRate 0.0230 Epoch: 10 Global Step: 431640 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:14:19,582-Speed 2625.05 samples/sec Loss 6.4049 LearningRate 0.0230 Epoch: 10 Global Step: 431650 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:14:23,667-Speed 2507.27 samples/sec Loss 6.4135 LearningRate 0.0230 Epoch: 10 Global Step: 431660 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:14:27,574-Speed 2621.45 samples/sec Loss 6.3697 LearningRate 0.0230 Epoch: 10 Global Step: 431670 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:14:31,485-Speed 2619.21 samples/sec Loss 6.4889 LearningRate 0.0230 Epoch: 10 Global Step: 431680 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:14:35,385-Speed 2625.76 samples/sec Loss 6.4121 LearningRate 0.0230 Epoch: 10 Global Step: 431690 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:14:39,286-Speed 2626.05 samples/sec Loss 6.4139 LearningRate 0.0230 Epoch: 10 Global Step: 431700 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:14:43,162-Speed 2642.13 samples/sec Loss 6.3477 LearningRate 0.0230 Epoch: 10 Global Step: 431710 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:14:47,070-Speed 2621.46 samples/sec Loss 6.3976 LearningRate 0.0230 Epoch: 10 Global Step: 431720 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:14:50,966-Speed 2628.25 samples/sec Loss 6.5068 LearningRate 0.0230 Epoch: 10 Global Step: 431730 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:14:54,876-Speed 2619.78 samples/sec Loss 6.2875 LearningRate 0.0230 Epoch: 10 Global Step: 431740 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:14:58,777-Speed 2625.43 samples/sec Loss 6.4071 LearningRate 0.0230 Epoch: 10 Global Step: 431750 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:15:02,678-Speed 2626.31 samples/sec Loss 6.3049 LearningRate 0.0230 Epoch: 10 Global Step: 431760 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:15:06,582-Speed 2623.17 samples/sec Loss 6.4356 LearningRate 0.0230 Epoch: 10 Global Step: 431770 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:15:10,482-Speed 2626.93 samples/sec Loss 6.3467 LearningRate 0.0230 Epoch: 10 Global Step: 431780 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:15:14,376-Speed 2630.17 samples/sec Loss 6.4096 LearningRate 0.0230 Epoch: 10 Global Step: 431790 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:15:18,251-Speed 2643.27 samples/sec Loss 6.3307 LearningRate 0.0230 Epoch: 10 Global Step: 431800 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:15:22,125-Speed 2644.26 samples/sec Loss 6.3286 LearningRate 0.0230 Epoch: 10 Global Step: 431810 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 20:15:26,021-Speed 2628.93 samples/sec Loss 6.4399 LearningRate 0.0230 Epoch: 10 Global Step: 431820 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 20:15:29,925-Speed 2623.33 samples/sec Loss 6.2211 LearningRate 0.0230 Epoch: 10 Global Step: 431830 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 20:15:33,820-Speed 2629.96 samples/sec Loss 6.4341 LearningRate 0.0230 Epoch: 10 Global Step: 431840 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 20:15:37,757-Speed 2601.30 samples/sec Loss 6.3831 LearningRate 0.0230 Epoch: 10 Global Step: 431850 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 20:15:41,723-Speed 2582.69 samples/sec Loss 6.3873 LearningRate 0.0230 Epoch: 10 Global Step: 431860 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 20:15:45,617-Speed 2630.18 samples/sec Loss 6.3670 LearningRate 0.0230 Epoch: 10 Global Step: 431870 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 20:15:49,513-Speed 2629.68 samples/sec Loss 6.4342 LearningRate 0.0230 Epoch: 10 Global Step: 431880 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 20:15:53,412-Speed 2626.70 samples/sec Loss 6.4038 LearningRate 0.0230 Epoch: 10 Global Step: 431890 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 20:15:57,308-Speed 2629.39 samples/sec Loss 6.5049 LearningRate 0.0230 Epoch: 10 Global Step: 431900 Fp16 Grad Scale: 32768 Required: 45 hours
Training: 2022-04-14 20:16:01,202-Speed 2630.47 samples/sec Loss 6.4249 LearningRate 0.0230 Epoch: 10 Global Step: 431910 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:16:05,112-Speed 2619.10 samples/sec Loss 6.3570 LearningRate 0.0230 Epoch: 10 Global Step: 431920 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:16:09,043-Speed 2606.23 samples/sec Loss 6.5062 LearningRate 0.0230 Epoch: 10 Global Step: 431930 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:16:12,937-Speed 2630.11 samples/sec Loss 6.4860 LearningRate 0.0230 Epoch: 10 Global Step: 431940 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:16:16,832-Speed 2629.79 samples/sec Loss 6.4093 LearningRate 0.0230 Epoch: 10 Global Step: 431950 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:16:20,728-Speed 2629.25 samples/sec Loss 6.4285 LearningRate 0.0230 Epoch: 10 Global Step: 431960 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:16:24,624-Speed 2628.65 samples/sec Loss 6.3116 LearningRate 0.0230 Epoch: 10 Global Step: 431970 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:16:28,522-Speed 2627.75 samples/sec Loss 6.4613 LearningRate 0.0230 Epoch: 10 Global Step: 431980 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:16:32,414-Speed 2631.49 samples/sec Loss 6.4636 LearningRate 0.0230 Epoch: 10 Global Step: 431990 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:16:36,313-Speed 2626.83 samples/sec Loss 6.3382 LearningRate 0.0230 Epoch: 10 Global Step: 432000 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:16:40,226-Speed 2617.57 samples/sec Loss 6.2830 LearningRate 0.0230 Epoch: 10 Global Step: 432010 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:16:44,131-Speed 2624.02 samples/sec Loss 6.3049 LearningRate 0.0230 Epoch: 10 Global Step: 432020 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:16:48,022-Speed 2632.02 samples/sec Loss 6.3929 LearningRate 0.0230 Epoch: 10 Global Step: 432030 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:16:51,918-Speed 2628.89 samples/sec Loss 6.3783 LearningRate 0.0230 Epoch: 10 Global Step: 432040 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:16:55,810-Speed 2631.56 samples/sec Loss 6.5055 LearningRate 0.0230 Epoch: 10 Global Step: 432050 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:16:59,714-Speed 2623.65 samples/sec Loss 6.3461 LearningRate 0.0230 Epoch: 10 Global Step: 432060 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:17:03,609-Speed 2629.61 samples/sec Loss 6.3521 LearningRate 0.0230 Epoch: 10 Global Step: 432070 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:17:07,505-Speed 2628.57 samples/sec Loss 6.3537 LearningRate 0.0230 Epoch: 10 Global Step: 432080 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:17:11,399-Speed 2630.84 samples/sec Loss 6.5050 LearningRate 0.0230 Epoch: 10 Global Step: 432090 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:17:15,299-Speed 2626.07 samples/sec Loss 6.3402 LearningRate 0.0230 Epoch: 10 Global Step: 432100 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:17:19,178-Speed 2640.62 samples/sec Loss 6.3603 LearningRate 0.0230 Epoch: 10 Global Step: 432110 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:17:23,073-Speed 2629.28 samples/sec Loss 6.3226 LearningRate 0.0230 Epoch: 10 Global Step: 432120 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:17:26,978-Speed 2623.35 samples/sec Loss 6.4173 LearningRate 0.0230 Epoch: 10 Global Step: 432130 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:17:30,877-Speed 2626.63 samples/sec Loss 6.4260 LearningRate 0.0230 Epoch: 10 Global Step: 432140 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:17:34,776-Speed 2626.83 samples/sec Loss 6.3797 LearningRate 0.0230 Epoch: 10 Global Step: 432150 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:17:38,678-Speed 2625.02 samples/sec Loss 6.4473 LearningRate 0.0230 Epoch: 10 Global Step: 432160 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:17:42,600-Speed 2611.77 samples/sec Loss 6.3838 LearningRate 0.0229 Epoch: 10 Global Step: 432170 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:17:46,503-Speed 2624.32 samples/sec Loss 6.3042 LearningRate 0.0229 Epoch: 10 Global Step: 432180 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:17:50,429-Speed 2608.89 samples/sec Loss 6.3794 LearningRate 0.0229 Epoch: 10 Global Step: 432190 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:17:54,323-Speed 2630.70 samples/sec Loss 6.3975 LearningRate 0.0229 Epoch: 10 Global Step: 432200 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:17:58,207-Speed 2637.17 samples/sec Loss 6.4314 LearningRate 0.0229 Epoch: 10 Global Step: 432210 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:18:02,134-Speed 2608.10 samples/sec Loss 6.4562 LearningRate 0.0229 Epoch: 10 Global Step: 432220 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:18:06,091-Speed 2588.56 samples/sec Loss 6.3408 LearningRate 0.0229 Epoch: 10 Global Step: 432230 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:18:10,023-Speed 2604.90 samples/sec Loss 6.3894 LearningRate 0.0229 Epoch: 10 Global Step: 432240 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:18:13,936-Speed 2618.71 samples/sec Loss 6.4571 LearningRate 0.0229 Epoch: 10 Global Step: 432250 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:18:17,912-Speed 2576.12 samples/sec Loss 6.3841 LearningRate 0.0229 Epoch: 10 Global Step: 432260 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:18:21,807-Speed 2629.61 samples/sec Loss 6.3786 LearningRate 0.0229 Epoch: 10 Global Step: 432270 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:18:25,703-Speed 2628.62 samples/sec Loss 6.2843 LearningRate 0.0229 Epoch: 10 Global Step: 432280 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:18:29,617-Speed 2617.78 samples/sec Loss 6.4422 LearningRate 0.0229 Epoch: 10 Global Step: 432290 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:18:33,575-Speed 2587.70 samples/sec Loss 6.3406 LearningRate 0.0229 Epoch: 10 Global Step: 432300 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:18:37,487-Speed 2617.83 samples/sec Loss 6.3212 LearningRate 0.0229 Epoch: 10 Global Step: 432310 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 20:18:41,373-Speed 2635.11 samples/sec Loss 6.4452 LearningRate 0.0229 Epoch: 10 Global Step: 432320 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:18:45,272-Speed 2627.65 samples/sec Loss 6.3545 LearningRate 0.0229 Epoch: 10 Global Step: 432330 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:18:49,180-Speed 2621.35 samples/sec Loss 6.3929 LearningRate 0.0229 Epoch: 10 Global Step: 432340 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:18:53,095-Speed 2616.21 samples/sec Loss 6.4187 LearningRate 0.0229 Epoch: 10 Global Step: 432350 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:18:57,017-Speed 2611.47 samples/sec Loss 6.3313 LearningRate 0.0229 Epoch: 10 Global Step: 432360 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:19:00,919-Speed 2624.88 samples/sec Loss 6.5113 LearningRate 0.0229 Epoch: 10 Global Step: 432370 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:19:04,820-Speed 2625.47 samples/sec Loss 6.4262 LearningRate 0.0229 Epoch: 10 Global Step: 432380 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:19:08,729-Speed 2620.17 samples/sec Loss 6.3952 LearningRate 0.0229 Epoch: 10 Global Step: 432390 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:19:12,633-Speed 2623.06 samples/sec Loss 6.4766 LearningRate 0.0229 Epoch: 10 Global Step: 432400 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:19:16,520-Speed 2634.98 samples/sec Loss 6.3382 LearningRate 0.0229 Epoch: 10 Global Step: 432410 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:19:20,422-Speed 2625.47 samples/sec Loss 6.3050 LearningRate 0.0229 Epoch: 10 Global Step: 432420 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:19:24,321-Speed 2627.10 samples/sec Loss 6.2738 LearningRate 0.0229 Epoch: 10 Global Step: 432430 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:19:28,222-Speed 2625.41 samples/sec Loss 6.3347 LearningRate 0.0229 Epoch: 10 Global Step: 432440 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:19:32,125-Speed 2624.51 samples/sec Loss 6.2681 LearningRate 0.0229 Epoch: 10 Global Step: 432450 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:19:36,023-Speed 2627.32 samples/sec Loss 6.4641 LearningRate 0.0229 Epoch: 10 Global Step: 432460 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:19:39,923-Speed 2626.23 samples/sec Loss 6.4058 LearningRate 0.0229 Epoch: 10 Global Step: 432470 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:19:43,823-Speed 2626.41 samples/sec Loss 6.3124 LearningRate 0.0229 Epoch: 10 Global Step: 432480 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:19:47,727-Speed 2623.02 samples/sec Loss 6.2823 LearningRate 0.0229 Epoch: 10 Global Step: 432490 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:19:51,624-Speed 2628.99 samples/sec Loss 6.4243 LearningRate 0.0229 Epoch: 10 Global Step: 432500 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:19:55,541-Speed 2614.79 samples/sec Loss 6.3100 LearningRate 0.0229 Epoch: 10 Global Step: 432510 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:19:59,436-Speed 2630.13 samples/sec Loss 6.3455 LearningRate 0.0229 Epoch: 10 Global Step: 432520 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:20:03,333-Speed 2627.74 samples/sec Loss 6.4173 LearningRate 0.0229 Epoch: 10 Global Step: 432530 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:20:07,227-Speed 2630.02 samples/sec Loss 6.4259 LearningRate 0.0229 Epoch: 10 Global Step: 432540 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:20:11,132-Speed 2622.75 samples/sec Loss 6.3789 LearningRate 0.0229 Epoch: 10 Global Step: 432550 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:20:15,039-Speed 2621.80 samples/sec Loss 6.3900 LearningRate 0.0229 Epoch: 10 Global Step: 432560 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:20:18,943-Speed 2624.12 samples/sec Loss 6.4357 LearningRate 0.0229 Epoch: 10 Global Step: 432570 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:20:22,849-Speed 2621.79 samples/sec Loss 6.3875 LearningRate 0.0229 Epoch: 10 Global Step: 432580 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:20:26,757-Speed 2620.96 samples/sec Loss 6.4316 LearningRate 0.0229 Epoch: 10 Global Step: 432590 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:20:30,657-Speed 2626.28 samples/sec Loss 6.3941 LearningRate 0.0229 Epoch: 10 Global Step: 432600 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:20:34,579-Speed 2611.59 samples/sec Loss 6.1983 LearningRate 0.0229 Epoch: 10 Global Step: 432610 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:20:38,499-Speed 2612.67 samples/sec Loss 6.4754 LearningRate 0.0229 Epoch: 10 Global Step: 432620 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:20:42,401-Speed 2625.12 samples/sec Loss 6.3673 LearningRate 0.0229 Epoch: 10 Global Step: 432630 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:20:46,332-Speed 2605.79 samples/sec Loss 6.3014 LearningRate 0.0229 Epoch: 10 Global Step: 432640 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:20:50,230-Speed 2627.52 samples/sec Loss 6.3943 LearningRate 0.0229 Epoch: 10 Global Step: 432650 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:20:54,128-Speed 2627.90 samples/sec Loss 6.4621 LearningRate 0.0229 Epoch: 10 Global Step: 432660 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:20:58,022-Speed 2629.88 samples/sec Loss 6.4228 LearningRate 0.0229 Epoch: 10 Global Step: 432670 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:21:01,922-Speed 2626.60 samples/sec Loss 6.4458 LearningRate 0.0229 Epoch: 10 Global Step: 432680 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:21:05,819-Speed 2628.40 samples/sec Loss 6.3438 LearningRate 0.0229 Epoch: 10 Global Step: 432690 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:21:09,722-Speed 2624.35 samples/sec Loss 6.3899 LearningRate 0.0229 Epoch: 10 Global Step: 432700 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:21:13,614-Speed 2631.61 samples/sec Loss 6.4364 LearningRate 0.0229 Epoch: 10 Global Step: 432710 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:21:17,519-Speed 2622.96 samples/sec Loss 6.3752 LearningRate 0.0229 Epoch: 10 Global Step: 432720 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:21:21,419-Speed 2626.42 samples/sec Loss 6.3217 LearningRate 0.0229 Epoch: 10 Global Step: 432730 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:21:25,331-Speed 2618.73 samples/sec Loss 6.3459 LearningRate 0.0229 Epoch: 10 Global Step: 432740 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:21:29,231-Speed 2626.00 samples/sec Loss 6.3603 LearningRate 0.0229 Epoch: 10 Global Step: 432750 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:21:33,126-Speed 2629.29 samples/sec Loss 6.3068 LearningRate 0.0229 Epoch: 10 Global Step: 432760 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:21:37,037-Speed 2618.90 samples/sec Loss 6.3919 LearningRate 0.0229 Epoch: 10 Global Step: 432770 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 20:21:40,910-Speed 2645.11 samples/sec Loss 6.2592 LearningRate 0.0229 Epoch: 10 Global Step: 432780 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:21:45,023-Speed 2490.40 samples/sec Loss 6.3656 LearningRate 0.0229 Epoch: 10 Global Step: 432790 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:21:48,954-Speed 2605.56 samples/sec Loss 6.3156 LearningRate 0.0229 Epoch: 10 Global Step: 432800 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:21:52,868-Speed 2616.65 samples/sec Loss 6.3848 LearningRate 0.0229 Epoch: 10 Global Step: 432810 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:21:56,764-Speed 2629.31 samples/sec Loss 6.3386 LearningRate 0.0229 Epoch: 10 Global Step: 432820 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:22:00,696-Speed 2604.62 samples/sec Loss 6.4065 LearningRate 0.0229 Epoch: 10 Global Step: 432830 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:22:04,637-Speed 2598.77 samples/sec Loss 6.3681 LearningRate 0.0229 Epoch: 10 Global Step: 432840 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:22:08,532-Speed 2629.89 samples/sec Loss 6.4201 LearningRate 0.0229 Epoch: 10 Global Step: 432850 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:22:12,431-Speed 2627.17 samples/sec Loss 6.4020 LearningRate 0.0229 Epoch: 10 Global Step: 432860 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:22:16,325-Speed 2630.99 samples/sec Loss 6.3683 LearningRate 0.0229 Epoch: 10 Global Step: 432870 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:22:20,216-Speed 2632.06 samples/sec Loss 6.2524 LearningRate 0.0229 Epoch: 10 Global Step: 432880 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:22:24,103-Speed 2635.47 samples/sec Loss 6.4781 LearningRate 0.0229 Epoch: 10 Global Step: 432890 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:22:27,997-Speed 2630.40 samples/sec Loss 6.4205 LearningRate 0.0229 Epoch: 10 Global Step: 432900 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:22:31,902-Speed 2622.76 samples/sec Loss 6.3939 LearningRate 0.0229 Epoch: 10 Global Step: 432910 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:22:35,805-Speed 2623.89 samples/sec Loss 6.3588 LearningRate 0.0229 Epoch: 10 Global Step: 432920 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:22:39,700-Speed 2630.28 samples/sec Loss 6.2496 LearningRate 0.0229 Epoch: 10 Global Step: 432930 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:22:43,595-Speed 2629.40 samples/sec Loss 6.3393 LearningRate 0.0229 Epoch: 10 Global Step: 432940 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:22:47,493-Speed 2627.97 samples/sec Loss 6.4980 LearningRate 0.0229 Epoch: 10 Global Step: 432950 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:22:51,392-Speed 2626.80 samples/sec Loss 6.4213 LearningRate 0.0229 Epoch: 10 Global Step: 432960 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:22:55,290-Speed 2628.23 samples/sec Loss 6.3073 LearningRate 0.0229 Epoch: 10 Global Step: 432970 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:22:59,192-Speed 2624.42 samples/sec Loss 6.4432 LearningRate 0.0229 Epoch: 10 Global Step: 432980 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:23:03,100-Speed 2620.93 samples/sec Loss 6.3218 LearningRate 0.0229 Epoch: 10 Global Step: 432990 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:23:06,999-Speed 2627.04 samples/sec Loss 6.3326 LearningRate 0.0229 Epoch: 10 Global Step: 433000 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:23:10,903-Speed 2623.08 samples/sec Loss 6.3663 LearningRate 0.0229 Epoch: 10 Global Step: 433010 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:23:14,798-Speed 2630.13 samples/sec Loss 6.3879 LearningRate 0.0229 Epoch: 10 Global Step: 433020 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:23:18,701-Speed 2623.70 samples/sec Loss 6.3676 LearningRate 0.0228 Epoch: 10 Global Step: 433030 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:23:22,599-Speed 2628.38 samples/sec Loss 6.3324 LearningRate 0.0228 Epoch: 10 Global Step: 433040 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:23:26,499-Speed 2626.31 samples/sec Loss 6.4202 LearningRate 0.0228 Epoch: 10 Global Step: 433050 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:23:30,520-Speed 2547.36 samples/sec Loss 6.2726 LearningRate 0.0228 Epoch: 10 Global Step: 433060 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:23:34,428-Speed 2621.11 samples/sec Loss 6.4644 LearningRate 0.0228 Epoch: 10 Global Step: 433070 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:23:38,334-Speed 2622.07 samples/sec Loss 6.3453 LearningRate 0.0228 Epoch: 10 Global Step: 433080 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:23:42,235-Speed 2625.13 samples/sec Loss 6.4103 LearningRate 0.0228 Epoch: 10 Global Step: 433090 Fp16 Grad Scale: 262144 Required: 45 hours
Training: 2022-04-14 20:23:46,119-Speed 2636.98 samples/sec Loss 6.3518 LearningRate 0.0228 Epoch: 10 Global Step: 433100 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:23:50,026-Speed 2622.06 samples/sec Loss 6.4320 LearningRate 0.0228 Epoch: 10 Global Step: 433110 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:23:53,906-Speed 2639.40 samples/sec Loss 6.4379 LearningRate 0.0228 Epoch: 10 Global Step: 433120 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:23:57,808-Speed 2625.42 samples/sec Loss 6.4695 LearningRate 0.0228 Epoch: 10 Global Step: 433130 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:24:01,706-Speed 2627.56 samples/sec Loss 6.3647 LearningRate 0.0228 Epoch: 10 Global Step: 433140 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:24:05,605-Speed 2627.06 samples/sec Loss 6.4224 LearningRate 0.0228 Epoch: 10 Global Step: 433150 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:24:09,500-Speed 2629.04 samples/sec Loss 6.3298 LearningRate 0.0228 Epoch: 10 Global Step: 433160 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:24:13,407-Speed 2622.19 samples/sec Loss 6.2580 LearningRate 0.0228 Epoch: 10 Global Step: 433170 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:24:17,310-Speed 2623.59 samples/sec Loss 6.3169 LearningRate 0.0228 Epoch: 10 Global Step: 433180 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:24:21,217-Speed 2627.30 samples/sec Loss 6.3307 LearningRate 0.0228 Epoch: 10 Global Step: 433190 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:24:25,209-Speed 2565.87 samples/sec Loss 6.3292 LearningRate 0.0228 Epoch: 10 Global Step: 433200 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:24:29,222-Speed 2553.02 samples/sec Loss 6.3521 LearningRate 0.0228 Epoch: 10 Global Step: 433210 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:24:33,126-Speed 2623.41 samples/sec Loss 6.3925 LearningRate 0.0228 Epoch: 10 Global Step: 433220 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:24:37,021-Speed 2629.07 samples/sec Loss 6.3740 LearningRate 0.0228 Epoch: 10 Global Step: 433230 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:24:40,899-Speed 2641.04 samples/sec Loss 6.4015 LearningRate 0.0228 Epoch: 10 Global Step: 433240 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:24:44,794-Speed 2629.94 samples/sec Loss 6.5139 LearningRate 0.0228 Epoch: 10 Global Step: 433250 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:24:48,693-Speed 2626.58 samples/sec Loss 6.4100 LearningRate 0.0228 Epoch: 10 Global Step: 433260 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:24:52,592-Speed 2627.22 samples/sec Loss 6.4335 LearningRate 0.0228 Epoch: 10 Global Step: 433270 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:24:56,494-Speed 2624.55 samples/sec Loss 6.3002 LearningRate 0.0228 Epoch: 10 Global Step: 433280 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:25:00,387-Speed 2631.33 samples/sec Loss 6.4759 LearningRate 0.0228 Epoch: 10 Global Step: 433290 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:25:04,283-Speed 2629.26 samples/sec Loss 6.3346 LearningRate 0.0228 Epoch: 10 Global Step: 433300 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:25:08,175-Speed 2631.84 samples/sec Loss 6.4630 LearningRate 0.0228 Epoch: 10 Global Step: 433310 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:25:12,067-Speed 2630.92 samples/sec Loss 6.3448 LearningRate 0.0228 Epoch: 10 Global Step: 433320 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:25:15,963-Speed 2629.35 samples/sec Loss 6.3665 LearningRate 0.0228 Epoch: 10 Global Step: 433330 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:25:19,867-Speed 2624.17 samples/sec Loss 6.4744 LearningRate 0.0228 Epoch: 10 Global Step: 433340 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:25:23,760-Speed 2630.15 samples/sec Loss 6.4281 LearningRate 0.0228 Epoch: 10 Global Step: 433350 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:25:27,660-Speed 2626.64 samples/sec Loss 6.2508 LearningRate 0.0228 Epoch: 10 Global Step: 433360 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:25:31,546-Speed 2635.57 samples/sec Loss 6.3414 LearningRate 0.0228 Epoch: 10 Global Step: 433370 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:25:35,454-Speed 2621.53 samples/sec Loss 6.3191 LearningRate 0.0228 Epoch: 10 Global Step: 433380 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:25:39,357-Speed 2623.87 samples/sec Loss 6.4301 LearningRate 0.0228 Epoch: 10 Global Step: 433390 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:25:43,298-Speed 2598.96 samples/sec Loss 6.3722 LearningRate 0.0228 Epoch: 10 Global Step: 433400 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:25:47,201-Speed 2624.73 samples/sec Loss 6.3040 LearningRate 0.0228 Epoch: 10 Global Step: 433410 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:25:51,101-Speed 2626.20 samples/sec Loss 6.3367 LearningRate 0.0228 Epoch: 10 Global Step: 433420 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:25:55,033-Speed 2604.98 samples/sec Loss 6.4110 LearningRate 0.0228 Epoch: 10 Global Step: 433430 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:25:58,934-Speed 2625.94 samples/sec Loss 6.3255 LearningRate 0.0228 Epoch: 10 Global Step: 433440 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:26:02,829-Speed 2629.54 samples/sec Loss 6.4436 LearningRate 0.0228 Epoch: 10 Global Step: 433450 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:26:06,725-Speed 2629.36 samples/sec Loss 6.4140 LearningRate 0.0228 Epoch: 10 Global Step: 433460 Fp16 Grad Scale: 65536 Required: 45 hours
Training: 2022-04-14 20:26:10,630-Speed 2622.59 samples/sec Loss 6.2770 LearningRate 0.0228 Epoch: 10 Global Step: 433470 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:26:14,538-Speed 2620.62 samples/sec Loss 6.3519 LearningRate 0.0228 Epoch: 10 Global Step: 433480 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:26:18,440-Speed 2625.03 samples/sec Loss 6.4310 LearningRate 0.0228 Epoch: 10 Global Step: 433490 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:26:22,336-Speed 2629.45 samples/sec Loss 6.3476 LearningRate 0.0228 Epoch: 10 Global Step: 433500 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:26:26,237-Speed 2625.26 samples/sec Loss 6.3978 LearningRate 0.0228 Epoch: 10 Global Step: 433510 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:26:30,262-Speed 2544.64 samples/sec Loss 6.4041 LearningRate 0.0228 Epoch: 10 Global Step: 433520 Fp16 Grad Scale: 131072 Required: 45 hours
Training: 2022-04-14 20:26:34,185-Speed 2611.45 samples/sec Loss 6.4252 LearningRate 0.0228 Epoch: 10 Global Step: 433530 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:26:38,082-Speed 2628.64 samples/sec Loss 6.3626 LearningRate 0.0228 Epoch: 10 Global Step: 433540 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:26:41,988-Speed 2622.04 samples/sec Loss 6.3776 LearningRate 0.0228 Epoch: 10 Global Step: 433550 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:26:45,884-Speed 2628.83 samples/sec Loss 6.3317 LearningRate 0.0228 Epoch: 10 Global Step: 433560 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:26:49,779-Speed 2629.75 samples/sec Loss 6.3049 LearningRate 0.0228 Epoch: 10 Global Step: 433570 Fp16 Grad Scale: 262144 Required: 44 hours
Training: 2022-04-14 20:26:53,660-Speed 2640.21 samples/sec Loss 6.3529 LearningRate 0.0228 Epoch: 10 Global Step: 433580 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:26:57,554-Speed 2629.69 samples/sec Loss 6.3637 LearningRate 0.0228 Epoch: 10 Global Step: 433590 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:27:01,456-Speed 2624.85 samples/sec Loss 6.3946 LearningRate 0.0228 Epoch: 10 Global Step: 433600 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:27:05,355-Speed 2626.91 samples/sec Loss 6.3489 LearningRate 0.0228 Epoch: 10 Global Step: 433610 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:27:09,232-Speed 2642.41 samples/sec Loss 6.2559 LearningRate 0.0228 Epoch: 10 Global Step: 433620 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:27:13,127-Speed 2629.30 samples/sec Loss 6.3068 LearningRate 0.0228 Epoch: 10 Global Step: 433630 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:27:17,021-Speed 2630.60 samples/sec Loss 6.4615 LearningRate 0.0228 Epoch: 10 Global Step: 433640 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:27:20,918-Speed 2628.44 samples/sec Loss 6.4398 LearningRate 0.0228 Epoch: 10 Global Step: 433650 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:27:24,810-Speed 2632.17 samples/sec Loss 6.3608 LearningRate 0.0228 Epoch: 10 Global Step: 433660 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:27:28,714-Speed 2623.41 samples/sec Loss 6.4227 LearningRate 0.0228 Epoch: 10 Global Step: 433670 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:27:32,623-Speed 2620.07 samples/sec Loss 6.3391 LearningRate 0.0228 Epoch: 10 Global Step: 433680 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:27:36,528-Speed 2622.58 samples/sec Loss 6.4014 LearningRate 0.0228 Epoch: 10 Global Step: 433690 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:27:40,421-Speed 2632.35 samples/sec Loss 6.3830 LearningRate 0.0228 Epoch: 10 Global Step: 433700 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:27:44,314-Speed 2631.43 samples/sec Loss 6.5561 LearningRate 0.0228 Epoch: 10 Global Step: 433710 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:27:48,219-Speed 2622.64 samples/sec Loss 6.4505 LearningRate 0.0228 Epoch: 10 Global Step: 433720 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:27:52,100-Speed 2639.77 samples/sec Loss 6.2189 LearningRate 0.0228 Epoch: 10 Global Step: 433730 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:27:56,008-Speed 2621.12 samples/sec Loss 6.3018 LearningRate 0.0228 Epoch: 10 Global Step: 433740 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:27:59,906-Speed 2626.93 samples/sec Loss 6.3197 LearningRate 0.0228 Epoch: 10 Global Step: 433750 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:28:03,822-Speed 2615.80 samples/sec Loss 6.3922 LearningRate 0.0228 Epoch: 10 Global Step: 433760 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:28:07,721-Speed 2627.08 samples/sec Loss 6.3383 LearningRate 0.0228 Epoch: 10 Global Step: 433770 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:28:11,775-Speed 2526.24 samples/sec Loss 6.3550 LearningRate 0.0228 Epoch: 10 Global Step: 433780 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:28:15,809-Speed 2539.16 samples/sec Loss 6.3505 LearningRate 0.0228 Epoch: 10 Global Step: 433790 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:28:20,026-Speed 2429.03 samples/sec Loss 6.3775 LearningRate 0.0228 Epoch: 10 Global Step: 433800 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:28:23,931-Speed 2623.60 samples/sec Loss 6.4104 LearningRate 0.0228 Epoch: 10 Global Step: 433810 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:28:27,826-Speed 2629.08 samples/sec Loss 6.3948 LearningRate 0.0228 Epoch: 10 Global Step: 433820 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:28:31,733-Speed 2621.37 samples/sec Loss 6.3407 LearningRate 0.0228 Epoch: 10 Global Step: 433830 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:28:35,624-Speed 2632.56 samples/sec Loss 6.3779 LearningRate 0.0228 Epoch: 10 Global Step: 433840 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:28:39,537-Speed 2617.95 samples/sec Loss 6.3603 LearningRate 0.0228 Epoch: 10 Global Step: 433850 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:28:43,416-Speed 2640.07 samples/sec Loss 6.3659 LearningRate 0.0228 Epoch: 10 Global Step: 433860 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:28:47,310-Speed 2630.72 samples/sec Loss 6.3834 LearningRate 0.0228 Epoch: 10 Global Step: 433870 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:28:51,223-Speed 2617.49 samples/sec Loss 6.4645 LearningRate 0.0228 Epoch: 10 Global Step: 433880 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:28:55,117-Speed 2630.61 samples/sec Loss 6.3634 LearningRate 0.0228 Epoch: 10 Global Step: 433890 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:28:59,018-Speed 2625.42 samples/sec Loss 6.2362 LearningRate 0.0227 Epoch: 10 Global Step: 433900 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:29:02,923-Speed 2622.86 samples/sec Loss 6.2399 LearningRate 0.0227 Epoch: 10 Global Step: 433910 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:29:06,826-Speed 2624.22 samples/sec Loss 6.4629 LearningRate 0.0227 Epoch: 10 Global Step: 433920 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:29:10,741-Speed 2616.56 samples/sec Loss 6.3198 LearningRate 0.0227 Epoch: 10 Global Step: 433930 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:29:14,643-Speed 2625.05 samples/sec Loss 6.3740 LearningRate 0.0227 Epoch: 10 Global Step: 433940 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:29:18,542-Speed 2626.77 samples/sec Loss 6.4160 LearningRate 0.0227 Epoch: 10 Global Step: 433950 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:29:22,461-Speed 2614.06 samples/sec Loss 6.2635 LearningRate 0.0227 Epoch: 10 Global Step: 433960 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:29:26,359-Speed 2627.58 samples/sec Loss 6.3364 LearningRate 0.0227 Epoch: 10 Global Step: 433970 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:29:30,253-Speed 2630.26 samples/sec Loss 6.3962 LearningRate 0.0227 Epoch: 10 Global Step: 433980 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:29:34,148-Speed 2629.40 samples/sec Loss 6.3809 LearningRate 0.0227 Epoch: 10 Global Step: 433990 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:29:38,063-Speed 2616.53 samples/sec Loss 6.3565 LearningRate 0.0227 Epoch: 10 Global Step: 434000 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:29:41,963-Speed 2626.10 samples/sec Loss 6.2588 LearningRate 0.0227 Epoch: 10 Global Step: 434010 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:29:45,860-Speed 2628.88 samples/sec Loss 6.4080 LearningRate 0.0227 Epoch: 10 Global Step: 434020 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:29:49,764-Speed 2623.62 samples/sec Loss 6.4376 LearningRate 0.0227 Epoch: 10 Global Step: 434030 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:29:53,673-Speed 2619.93 samples/sec Loss 6.3331 LearningRate 0.0227 Epoch: 10 Global Step: 434040 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:29:57,581-Speed 2621.29 samples/sec Loss 6.3456 LearningRate 0.0227 Epoch: 10 Global Step: 434050 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:30:01,455-Speed 2644.18 samples/sec Loss 6.4205 LearningRate 0.0227 Epoch: 10 Global Step: 434060 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:30:05,395-Speed 2599.13 samples/sec Loss 6.3077 LearningRate 0.0227 Epoch: 10 Global Step: 434070 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:30:09,314-Speed 2613.90 samples/sec Loss 6.3984 LearningRate 0.0227 Epoch: 10 Global Step: 434080 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:30:13,225-Speed 2618.97 samples/sec Loss 6.4098 LearningRate 0.0227 Epoch: 10 Global Step: 434090 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:30:17,130-Speed 2623.16 samples/sec Loss 6.3831 LearningRate 0.0227 Epoch: 10 Global Step: 434100 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:30:21,033-Speed 2624.46 samples/sec Loss 6.3135 LearningRate 0.0227 Epoch: 10 Global Step: 434110 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:30:24,934-Speed 2624.90 samples/sec Loss 6.3377 LearningRate 0.0227 Epoch: 10 Global Step: 434120 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:30:28,840-Speed 2622.68 samples/sec Loss 6.4226 LearningRate 0.0227 Epoch: 10 Global Step: 434130 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:30:32,733-Speed 2631.13 samples/sec Loss 6.4043 LearningRate 0.0227 Epoch: 10 Global Step: 434140 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:30:36,629-Speed 2629.02 samples/sec Loss 6.3412 LearningRate 0.0227 Epoch: 10 Global Step: 434150 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:30:40,533-Speed 2623.16 samples/sec Loss 6.3260 LearningRate 0.0227 Epoch: 10 Global Step: 434160 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:30:44,427-Speed 2630.45 samples/sec Loss 6.3239 LearningRate 0.0227 Epoch: 10 Global Step: 434170 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:30:48,319-Speed 2631.83 samples/sec Loss 6.3927 LearningRate 0.0227 Epoch: 10 Global Step: 434180 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:30:52,219-Speed 2626.91 samples/sec Loss 6.3244 LearningRate 0.0227 Epoch: 10 Global Step: 434190 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:30:56,117-Speed 2626.93 samples/sec Loss 6.3119 LearningRate 0.0227 Epoch: 10 Global Step: 434200 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:31:00,010-Speed 2631.78 samples/sec Loss 6.3251 LearningRate 0.0227 Epoch: 10 Global Step: 434210 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:31:03,903-Speed 2630.90 samples/sec Loss 6.3541 LearningRate 0.0227 Epoch: 10 Global Step: 434220 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:31:07,804-Speed 2625.63 samples/sec Loss 6.3708 LearningRate 0.0227 Epoch: 10 Global Step: 434230 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:31:11,806-Speed 2558.96 samples/sec Loss 6.3556 LearningRate 0.0227 Epoch: 10 Global Step: 434240 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:31:15,698-Speed 2632.20 samples/sec Loss 6.4216 LearningRate 0.0227 Epoch: 10 Global Step: 434250 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:31:19,577-Speed 2640.75 samples/sec Loss 6.3575 LearningRate 0.0227 Epoch: 10 Global Step: 434260 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:31:23,475-Speed 2627.58 samples/sec Loss 6.4343 LearningRate 0.0227 Epoch: 10 Global Step: 434270 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:31:27,375-Speed 2626.43 samples/sec Loss 6.4085 LearningRate 0.0227 Epoch: 10 Global Step: 434280 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:31:31,282-Speed 2621.66 samples/sec Loss 6.3810 LearningRate 0.0227 Epoch: 10 Global Step: 434290 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:31:35,184-Speed 2624.68 samples/sec Loss 6.3712 LearningRate 0.0227 Epoch: 10 Global Step: 434300 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:31:39,090-Speed 2622.29 samples/sec Loss 6.3429 LearningRate 0.0227 Epoch: 10 Global Step: 434310 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:31:43,031-Speed 2599.82 samples/sec Loss 6.4073 LearningRate 0.0227 Epoch: 10 Global Step: 434320 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:31:46,909-Speed 2640.90 samples/sec Loss 6.2227 LearningRate 0.0227 Epoch: 10 Global Step: 434330 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:31:50,803-Speed 2630.82 samples/sec Loss 6.3165 LearningRate 0.0227 Epoch: 10 Global Step: 434340 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:31:54,728-Speed 2608.98 samples/sec Loss 6.4026 LearningRate 0.0227 Epoch: 10 Global Step: 434350 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:31:58,637-Speed 2620.58 samples/sec Loss 6.3331 LearningRate 0.0227 Epoch: 10 Global Step: 434360 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:32:02,536-Speed 2627.27 samples/sec Loss 6.3823 LearningRate 0.0227 Epoch: 10 Global Step: 434370 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:32:06,444-Speed 2620.59 samples/sec Loss 6.4220 LearningRate 0.0227 Epoch: 10 Global Step: 434380 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:32:10,341-Speed 2628.66 samples/sec Loss 6.2962 LearningRate 0.0227 Epoch: 10 Global Step: 434390 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:32:14,245-Speed 2623.69 samples/sec Loss 6.2988 LearningRate 0.0227 Epoch: 10 Global Step: 434400 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:32:18,141-Speed 2628.26 samples/sec Loss 6.3512 LearningRate 0.0227 Epoch: 10 Global Step: 434410 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:32:22,024-Speed 2638.40 samples/sec Loss 6.3947 LearningRate 0.0227 Epoch: 10 Global Step: 434420 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:32:25,921-Speed 2628.26 samples/sec Loss 6.2036 LearningRate 0.0227 Epoch: 10 Global Step: 434430 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:32:29,862-Speed 2599.01 samples/sec Loss 6.3930 LearningRate 0.0227 Epoch: 10 Global Step: 434440 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:32:33,772-Speed 2619.68 samples/sec Loss 6.4032 LearningRate 0.0227 Epoch: 10 Global Step: 434450 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:32:37,668-Speed 2629.01 samples/sec Loss 6.3915 LearningRate 0.0227 Epoch: 10 Global Step: 434460 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:32:41,564-Speed 2629.45 samples/sec Loss 6.3025 LearningRate 0.0227 Epoch: 10 Global Step: 434470 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:32:45,484-Speed 2612.67 samples/sec Loss 6.3251 LearningRate 0.0227 Epoch: 10 Global Step: 434480 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:32:49,378-Speed 2630.36 samples/sec Loss 6.4590 LearningRate 0.0227 Epoch: 10 Global Step: 434490 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:32:53,272-Speed 2631.02 samples/sec Loss 6.3985 LearningRate 0.0227 Epoch: 10 Global Step: 434500 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:32:57,166-Speed 2630.26 samples/sec Loss 6.2635 LearningRate 0.0227 Epoch: 10 Global Step: 434510 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:33:01,061-Speed 2629.63 samples/sec Loss 6.3696 LearningRate 0.0227 Epoch: 10 Global Step: 434520 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:33:04,942-Speed 2638.69 samples/sec Loss 6.4109 LearningRate 0.0227 Epoch: 10 Global Step: 434530 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:33:08,835-Speed 2631.44 samples/sec Loss 6.3371 LearningRate 0.0227 Epoch: 10 Global Step: 434540 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:33:12,732-Speed 2628.41 samples/sec Loss 6.4328 LearningRate 0.0227 Epoch: 10 Global Step: 434550 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:33:16,624-Speed 2631.68 samples/sec Loss 6.3268 LearningRate 0.0227 Epoch: 10 Global Step: 434560 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:33:20,519-Speed 2629.88 samples/sec Loss 6.3801 LearningRate 0.0227 Epoch: 10 Global Step: 434570 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:33:24,442-Speed 2611.17 samples/sec Loss 6.2570 LearningRate 0.0227 Epoch: 10 Global Step: 434580 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:33:28,335-Speed 2630.76 samples/sec Loss 6.4403 LearningRate 0.0227 Epoch: 10 Global Step: 434590 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:33:32,230-Speed 2629.79 samples/sec Loss 6.3132 LearningRate 0.0227 Epoch: 10 Global Step: 434600 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:33:36,127-Speed 2628.00 samples/sec Loss 6.3150 LearningRate 0.0227 Epoch: 10 Global Step: 434610 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:33:40,023-Speed 2629.42 samples/sec Loss 6.4274 LearningRate 0.0227 Epoch: 10 Global Step: 434620 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 20:33:43,918-Speed 2629.93 samples/sec Loss 6.3352 LearningRate 0.0227 Epoch: 10 Global Step: 434630 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:33:47,842-Speed 2610.12 samples/sec Loss 6.3479 LearningRate 0.0227 Epoch: 10 Global Step: 434640 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:33:51,741-Speed 2626.37 samples/sec Loss 6.3440 LearningRate 0.0227 Epoch: 10 Global Step: 434650 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:33:55,636-Speed 2630.67 samples/sec Loss 6.3686 LearningRate 0.0227 Epoch: 10 Global Step: 434660 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:33:59,530-Speed 2629.92 samples/sec Loss 6.4050 LearningRate 0.0227 Epoch: 10 Global Step: 434670 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:34:03,426-Speed 2628.66 samples/sec Loss 6.2285 LearningRate 0.0227 Epoch: 10 Global Step: 434680 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:34:07,319-Speed 2630.65 samples/sec Loss 6.3437 LearningRate 0.0227 Epoch: 10 Global Step: 434690 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:34:11,230-Speed 2619.37 samples/sec Loss 6.3405 LearningRate 0.0227 Epoch: 10 Global Step: 434700 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:34:15,239-Speed 2555.01 samples/sec Loss 6.4205 LearningRate 0.0227 Epoch: 10 Global Step: 434710 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:34:19,202-Speed 2584.37 samples/sec Loss 6.3138 LearningRate 0.0227 Epoch: 10 Global Step: 434720 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:34:23,110-Speed 2620.99 samples/sec Loss 6.4438 LearningRate 0.0227 Epoch: 10 Global Step: 434730 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:34:27,006-Speed 2628.74 samples/sec Loss 6.2468 LearningRate 0.0227 Epoch: 10 Global Step: 434740 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:34:30,906-Speed 2626.30 samples/sec Loss 6.2398 LearningRate 0.0227 Epoch: 10 Global Step: 434750 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:34:34,801-Speed 2629.42 samples/sec Loss 6.3179 LearningRate 0.0227 Epoch: 10 Global Step: 434760 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:34:38,717-Speed 2615.82 samples/sec Loss 6.4506 LearningRate 0.0226 Epoch: 10 Global Step: 434770 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:34:42,617-Speed 2625.64 samples/sec Loss 6.3466 LearningRate 0.0226 Epoch: 10 Global Step: 434780 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:34:46,519-Speed 2625.33 samples/sec Loss 6.3921 LearningRate 0.0226 Epoch: 10 Global Step: 434790 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:34:50,421-Speed 2625.04 samples/sec Loss 6.3525 LearningRate 0.0226 Epoch: 10 Global Step: 434800 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:34:54,328-Speed 2621.68 samples/sec Loss 6.3503 LearningRate 0.0226 Epoch: 10 Global Step: 434810 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:34:58,225-Speed 2628.07 samples/sec Loss 6.2347 LearningRate 0.0226 Epoch: 10 Global Step: 434820 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:35:02,120-Speed 2629.48 samples/sec Loss 6.4903 LearningRate 0.0226 Epoch: 10 Global Step: 434830 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:35:06,017-Speed 2628.28 samples/sec Loss 6.2745 LearningRate 0.0226 Epoch: 10 Global Step: 434840 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:35:10,055-Speed 2536.21 samples/sec Loss 6.2887 LearningRate 0.0226 Epoch: 10 Global Step: 434850 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:35:14,109-Speed 2526.85 samples/sec Loss 6.3343 LearningRate 0.0226 Epoch: 10 Global Step: 434860 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:35:18,017-Speed 2620.18 samples/sec Loss 6.3146 LearningRate 0.0226 Epoch: 10 Global Step: 434870 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:35:21,904-Speed 2635.29 samples/sec Loss 6.2668 LearningRate 0.0226 Epoch: 10 Global Step: 434880 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:35:25,899-Speed 2563.76 samples/sec Loss 6.4016 LearningRate 0.0226 Epoch: 10 Global Step: 434890 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:35:29,797-Speed 2627.80 samples/sec Loss 6.4255 LearningRate 0.0226 Epoch: 10 Global Step: 434900 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:35:33,694-Speed 2628.57 samples/sec Loss 6.2868 LearningRate 0.0226 Epoch: 10 Global Step: 434910 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:35:37,592-Speed 2627.45 samples/sec Loss 6.2379 LearningRate 0.0226 Epoch: 10 Global Step: 434920 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:35:41,492-Speed 2625.63 samples/sec Loss 6.3239 LearningRate 0.0226 Epoch: 10 Global Step: 434930 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:35:45,391-Speed 2627.55 samples/sec Loss 6.2751 LearningRate 0.0226 Epoch: 10 Global Step: 434940 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:35:49,286-Speed 2629.27 samples/sec Loss 6.4410 LearningRate 0.0226 Epoch: 10 Global Step: 434950 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:35:53,183-Speed 2628.39 samples/sec Loss 6.4129 LearningRate 0.0226 Epoch: 10 Global Step: 434960 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:35:57,091-Speed 2620.77 samples/sec Loss 6.3448 LearningRate 0.0226 Epoch: 10 Global Step: 434970 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:36:00,988-Speed 2627.89 samples/sec Loss 6.3709 LearningRate 0.0226 Epoch: 10 Global Step: 434980 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:36:04,882-Speed 2630.78 samples/sec Loss 6.3860 LearningRate 0.0226 Epoch: 10 Global Step: 434990 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:36:08,790-Speed 2620.77 samples/sec Loss 6.3490 LearningRate 0.0226 Epoch: 10 Global Step: 435000 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:36:12,684-Speed 2629.96 samples/sec Loss 6.4025 LearningRate 0.0226 Epoch: 10 Global Step: 435010 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:36:16,587-Speed 2624.39 samples/sec Loss 6.3194 LearningRate 0.0226 Epoch: 10 Global Step: 435020 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:36:20,495-Speed 2621.16 samples/sec Loss 6.2723 LearningRate 0.0226 Epoch: 10 Global Step: 435030 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:36:24,390-Speed 2629.83 samples/sec Loss 6.3612 LearningRate 0.0226 Epoch: 10 Global Step: 435040 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:36:28,281-Speed 2632.08 samples/sec Loss 6.2052 LearningRate 0.0226 Epoch: 10 Global Step: 435050 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:36:32,207-Speed 2609.10 samples/sec Loss 6.4196 LearningRate 0.0226 Epoch: 10 Global Step: 435060 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:36:36,101-Speed 2629.61 samples/sec Loss 6.3076 LearningRate 0.0226 Epoch: 10 Global Step: 435070 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:36:39,985-Speed 2637.52 samples/sec Loss 6.4457 LearningRate 0.0226 Epoch: 10 Global Step: 435080 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:36:43,900-Speed 2616.45 samples/sec Loss 6.3510 LearningRate 0.0226 Epoch: 10 Global Step: 435090 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:36:47,790-Speed 2632.79 samples/sec Loss 6.2802 LearningRate 0.0226 Epoch: 10 Global Step: 435100 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:36:51,686-Speed 2629.23 samples/sec Loss 6.3785 LearningRate 0.0226 Epoch: 10 Global Step: 435110 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:36:55,581-Speed 2629.47 samples/sec Loss 6.3164 LearningRate 0.0226 Epoch: 10 Global Step: 435120 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:36:59,497-Speed 2615.22 samples/sec Loss 6.3071 LearningRate 0.0226 Epoch: 10 Global Step: 435130 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:37:03,454-Speed 2588.28 samples/sec Loss 6.4423 LearningRate 0.0226 Epoch: 10 Global Step: 435140 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:37:07,357-Speed 2624.32 samples/sec Loss 6.3014 LearningRate 0.0226 Epoch: 10 Global Step: 435150 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:37:11,251-Speed 2630.05 samples/sec Loss 6.4760 LearningRate 0.0226 Epoch: 10 Global Step: 435160 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:37:15,144-Speed 2630.78 samples/sec Loss 6.4443 LearningRate 0.0226 Epoch: 10 Global Step: 435170 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:37:19,027-Speed 2638.10 samples/sec Loss 6.4604 LearningRate 0.0226 Epoch: 10 Global Step: 435180 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:37:22,934-Speed 2621.97 samples/sec Loss 6.4404 LearningRate 0.0226 Epoch: 10 Global Step: 435190 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:37:26,830-Speed 2628.56 samples/sec Loss 6.2027 LearningRate 0.0226 Epoch: 10 Global Step: 435200 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:37:30,725-Speed 2629.71 samples/sec Loss 6.3825 LearningRate 0.0226 Epoch: 10 Global Step: 435210 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:37:34,721-Speed 2563.47 samples/sec Loss 6.3385 LearningRate 0.0226 Epoch: 10 Global Step: 435220 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:37:38,807-Speed 2506.56 samples/sec Loss 6.2416 LearningRate 0.0226 Epoch: 10 Global Step: 435230 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:37:42,719-Speed 2618.04 samples/sec Loss 6.3448 LearningRate 0.0226 Epoch: 10 Global Step: 435240 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:37:46,616-Speed 2629.05 samples/sec Loss 6.4572 LearningRate 0.0226 Epoch: 10 Global Step: 435250 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:37:50,514-Speed 2628.38 samples/sec Loss 6.4223 LearningRate 0.0226 Epoch: 10 Global Step: 435260 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:37:54,414-Speed 2625.89 samples/sec Loss 6.3508 LearningRate 0.0226 Epoch: 10 Global Step: 435270 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:37:58,317-Speed 2624.86 samples/sec Loss 6.2668 LearningRate 0.0226 Epoch: 10 Global Step: 435280 Fp16 Grad Scale: 262144 Required: 44 hours
Training: 2022-04-14 20:38:02,202-Speed 2636.12 samples/sec Loss 6.3969 LearningRate 0.0226 Epoch: 10 Global Step: 435290 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:38:06,099-Speed 2628.33 samples/sec Loss 6.4238 LearningRate 0.0226 Epoch: 10 Global Step: 435300 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:38:09,995-Speed 2628.47 samples/sec Loss 6.3315 LearningRate 0.0226 Epoch: 10 Global Step: 435310 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:38:13,895-Speed 2626.26 samples/sec Loss 6.2584 LearningRate 0.0226 Epoch: 10 Global Step: 435320 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:38:17,792-Speed 2628.45 samples/sec Loss 6.2379 LearningRate 0.0226 Epoch: 10 Global Step: 435330 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:38:21,692-Speed 2626.00 samples/sec Loss 6.2845 LearningRate 0.0226 Epoch: 10 Global Step: 435340 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:38:25,568-Speed 2642.75 samples/sec Loss 6.3300 LearningRate 0.0226 Epoch: 10 Global Step: 435350 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:38:29,465-Speed 2628.49 samples/sec Loss 6.2957 LearningRate 0.0226 Epoch: 10 Global Step: 435360 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:38:33,363-Speed 2626.87 samples/sec Loss 6.2432 LearningRate 0.0226 Epoch: 10 Global Step: 435370 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:38:37,256-Speed 2631.73 samples/sec Loss 6.3320 LearningRate 0.0226 Epoch: 10 Global Step: 435380 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:38:41,148-Speed 2631.64 samples/sec Loss 6.2248 LearningRate 0.0226 Epoch: 10 Global Step: 435390 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:38:45,046-Speed 2627.17 samples/sec Loss 6.3874 LearningRate 0.0226 Epoch: 10 Global Step: 435400 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:38:48,949-Speed 2624.44 samples/sec Loss 6.2118 LearningRate 0.0226 Epoch: 10 Global Step: 435410 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:38:52,869-Speed 2612.71 samples/sec Loss 6.3740 LearningRate 0.0226 Epoch: 10 Global Step: 435420 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:38:56,772-Speed 2624.07 samples/sec Loss 6.4534 LearningRate 0.0226 Epoch: 10 Global Step: 435430 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:39:00,674-Speed 2624.87 samples/sec Loss 6.3697 LearningRate 0.0226 Epoch: 10 Global Step: 435440 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:39:04,594-Speed 2613.14 samples/sec Loss 6.4769 LearningRate 0.0226 Epoch: 10 Global Step: 435450 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:39:08,488-Speed 2629.65 samples/sec Loss 6.3293 LearningRate 0.0226 Epoch: 10 Global Step: 435460 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:39:12,389-Speed 2625.61 samples/sec Loss 6.2763 LearningRate 0.0226 Epoch: 10 Global Step: 435470 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:39:16,300-Speed 2624.08 samples/sec Loss 6.3222 LearningRate 0.0226 Epoch: 10 Global Step: 435480 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:39:20,195-Speed 2629.58 samples/sec Loss 6.2738 LearningRate 0.0226 Epoch: 10 Global Step: 435490 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:39:24,093-Speed 2627.82 samples/sec Loss 6.3409 LearningRate 0.0226 Epoch: 10 Global Step: 435500 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:39:27,970-Speed 2641.82 samples/sec Loss 6.2486 LearningRate 0.0226 Epoch: 10 Global Step: 435510 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:39:31,872-Speed 2624.78 samples/sec Loss 6.2688 LearningRate 0.0226 Epoch: 10 Global Step: 435520 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:39:35,772-Speed 2626.04 samples/sec Loss 6.3278 LearningRate 0.0226 Epoch: 10 Global Step: 435530 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:39:39,672-Speed 2626.18 samples/sec Loss 6.4065 LearningRate 0.0226 Epoch: 10 Global Step: 435540 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:39:43,586-Speed 2617.05 samples/sec Loss 6.3267 LearningRate 0.0226 Epoch: 10 Global Step: 435550 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:39:47,482-Speed 2629.03 samples/sec Loss 6.3321 LearningRate 0.0226 Epoch: 10 Global Step: 435560 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:39:51,377-Speed 2629.43 samples/sec Loss 6.3204 LearningRate 0.0226 Epoch: 10 Global Step: 435570 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:39:55,272-Speed 2629.75 samples/sec Loss 6.2784 LearningRate 0.0226 Epoch: 10 Global Step: 435580 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:39:59,168-Speed 2628.70 samples/sec Loss 6.2462 LearningRate 0.0226 Epoch: 10 Global Step: 435590 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:40:03,066-Speed 2627.87 samples/sec Loss 6.3457 LearningRate 0.0226 Epoch: 10 Global Step: 435600 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:40:06,965-Speed 2626.89 samples/sec Loss 6.3026 LearningRate 0.0226 Epoch: 10 Global Step: 435610 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:40:10,863-Speed 2627.49 samples/sec Loss 6.3208 LearningRate 0.0226 Epoch: 10 Global Step: 435620 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:40:14,758-Speed 2629.46 samples/sec Loss 6.2753 LearningRate 0.0226 Epoch: 10 Global Step: 435630 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:40:18,651-Speed 2631.07 samples/sec Loss 6.3041 LearningRate 0.0225 Epoch: 10 Global Step: 435640 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:40:22,546-Speed 2629.74 samples/sec Loss 6.4258 LearningRate 0.0225 Epoch: 10 Global Step: 435650 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:40:26,425-Speed 2640.37 samples/sec Loss 6.2961 LearningRate 0.0225 Epoch: 10 Global Step: 435660 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:40:30,341-Speed 2615.91 samples/sec Loss 6.3082 LearningRate 0.0225 Epoch: 10 Global Step: 435670 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:40:34,249-Speed 2620.77 samples/sec Loss 6.3815 LearningRate 0.0225 Epoch: 10 Global Step: 435680 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:40:38,145-Speed 2628.25 samples/sec Loss 6.3702 LearningRate 0.0225 Epoch: 10 Global Step: 435690 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:40:42,038-Speed 2631.53 samples/sec Loss 6.2909 LearningRate 0.0225 Epoch: 10 Global Step: 435700 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:40:45,935-Speed 2628.56 samples/sec Loss 6.4083 LearningRate 0.0225 Epoch: 10 Global Step: 435710 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:40:49,880-Speed 2595.78 samples/sec Loss 6.3731 LearningRate 0.0225 Epoch: 10 Global Step: 435720 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:40:53,915-Speed 2538.49 samples/sec Loss 6.3982 LearningRate 0.0225 Epoch: 10 Global Step: 435730 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:40:57,813-Speed 2627.85 samples/sec Loss 6.3614 LearningRate 0.0225 Epoch: 10 Global Step: 435740 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:41:01,721-Speed 2620.68 samples/sec Loss 6.3705 LearningRate 0.0225 Epoch: 10 Global Step: 435750 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:41:05,617-Speed 2628.82 samples/sec Loss 6.3261 LearningRate 0.0225 Epoch: 10 Global Step: 435760 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:41:09,516-Speed 2627.38 samples/sec Loss 6.2244 LearningRate 0.0225 Epoch: 10 Global Step: 435770 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:41:13,411-Speed 2629.51 samples/sec Loss 6.3509 LearningRate 0.0225 Epoch: 10 Global Step: 435780 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:41:17,290-Speed 2640.36 samples/sec Loss 6.2364 LearningRate 0.0225 Epoch: 10 Global Step: 435790 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:41:21,189-Speed 2627.10 samples/sec Loss 6.4074 LearningRate 0.0225 Epoch: 10 Global Step: 435800 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:41:25,111-Speed 2611.52 samples/sec Loss 6.2444 LearningRate 0.0225 Epoch: 10 Global Step: 435810 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:41:29,024-Speed 2617.53 samples/sec Loss 6.3916 LearningRate 0.0225 Epoch: 10 Global Step: 435820 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:41:32,926-Speed 2624.51 samples/sec Loss 6.3966 LearningRate 0.0225 Epoch: 10 Global Step: 435830 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:41:36,823-Speed 2628.40 samples/sec Loss 6.3773 LearningRate 0.0225 Epoch: 10 Global Step: 435840 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:41:40,718-Speed 2629.30 samples/sec Loss 6.3805 LearningRate 0.0225 Epoch: 10 Global Step: 435850 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:41:44,614-Speed 2629.10 samples/sec Loss 6.3019 LearningRate 0.0225 Epoch: 10 Global Step: 435860 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:41:48,509-Speed 2629.64 samples/sec Loss 6.2837 LearningRate 0.0225 Epoch: 10 Global Step: 435870 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:41:52,433-Speed 2610.60 samples/sec Loss 6.2611 LearningRate 0.0225 Epoch: 10 Global Step: 435880 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:41:56,331-Speed 2626.99 samples/sec Loss 6.2153 LearningRate 0.0225 Epoch: 10 Global Step: 435890 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:42:00,229-Speed 2627.90 samples/sec Loss 6.3453 LearningRate 0.0225 Epoch: 10 Global Step: 435900 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:42:04,144-Speed 2616.35 samples/sec Loss 6.3042 LearningRate 0.0225 Epoch: 10 Global Step: 435910 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:42:08,044-Speed 2626.02 samples/sec Loss 6.1434 LearningRate 0.0225 Epoch: 10 Global Step: 435920 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:42:11,950-Speed 2621.68 samples/sec Loss 6.2515 LearningRate 0.0225 Epoch: 10 Global Step: 435930 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:42:15,845-Speed 2629.76 samples/sec Loss 6.3617 LearningRate 0.0225 Epoch: 10 Global Step: 435940 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:42:19,777-Speed 2604.60 samples/sec Loss 6.3779 LearningRate 0.0225 Epoch: 10 Global Step: 435950 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:42:23,680-Speed 2624.87 samples/sec Loss 6.2586 LearningRate 0.0225 Epoch: 10 Global Step: 435960 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:42:27,555-Speed 2643.04 samples/sec Loss 6.2350 LearningRate 0.0225 Epoch: 10 Global Step: 435970 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:42:31,450-Speed 2629.75 samples/sec Loss 6.3592 LearningRate 0.0225 Epoch: 10 Global Step: 435980 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:42:35,350-Speed 2626.25 samples/sec Loss 6.3403 LearningRate 0.0225 Epoch: 10 Global Step: 435990 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:42:39,276-Speed 2608.89 samples/sec Loss 6.2830 LearningRate 0.0225 Epoch: 10 Global Step: 436000 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:42:43,176-Speed 2625.55 samples/sec Loss 6.3801 LearningRate 0.0225 Epoch: 10 Global Step: 436010 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:42:47,074-Speed 2628.16 samples/sec Loss 6.3171 LearningRate 0.0225 Epoch: 10 Global Step: 436020 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:42:50,981-Speed 2621.80 samples/sec Loss 6.3362 LearningRate 0.0225 Epoch: 10 Global Step: 436030 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:42:54,878-Speed 2627.65 samples/sec Loss 6.3914 LearningRate 0.0225 Epoch: 10 Global Step: 436040 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:42:58,825-Speed 2595.54 samples/sec Loss 6.3657 LearningRate 0.0225 Epoch: 10 Global Step: 436050 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:43:02,721-Speed 2628.99 samples/sec Loss 6.3723 LearningRate 0.0225 Epoch: 10 Global Step: 436060 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:43:06,618-Speed 2628.13 samples/sec Loss 6.3546 LearningRate 0.0225 Epoch: 10 Global Step: 436070 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:43:10,516-Speed 2627.11 samples/sec Loss 6.2867 LearningRate 0.0225 Epoch: 10 Global Step: 436080 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:43:14,609-Speed 2502.27 samples/sec Loss 6.2844 LearningRate 0.0225 Epoch: 10 Global Step: 436090 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:43:18,700-Speed 2503.64 samples/sec Loss 6.2493 LearningRate 0.0225 Epoch: 10 Global Step: 436100 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:43:22,732-Speed 2540.22 samples/sec Loss 6.2888 LearningRate 0.0225 Epoch: 10 Global Step: 436110 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:43:26,628-Speed 2628.76 samples/sec Loss 6.3285 LearningRate 0.0225 Epoch: 10 Global Step: 436120 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:43:30,526-Speed 2627.98 samples/sec Loss 6.3430 LearningRate 0.0225 Epoch: 10 Global Step: 436130 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:43:34,406-Speed 2639.31 samples/sec Loss 6.2764 LearningRate 0.0225 Epoch: 10 Global Step: 436140 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:43:38,302-Speed 2629.20 samples/sec Loss 6.3724 LearningRate 0.0225 Epoch: 10 Global Step: 436150 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:43:42,205-Speed 2624.41 samples/sec Loss 6.3183 LearningRate 0.0225 Epoch: 10 Global Step: 436160 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:43:46,109-Speed 2623.63 samples/sec Loss 6.3699 LearningRate 0.0225 Epoch: 10 Global Step: 436170 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:43:50,007-Speed 2627.17 samples/sec Loss 6.3283 LearningRate 0.0225 Epoch: 10 Global Step: 436180 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:43:53,904-Speed 2628.96 samples/sec Loss 6.4127 LearningRate 0.0225 Epoch: 10 Global Step: 436190 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:43:57,798-Speed 2629.94 samples/sec Loss 6.4001 LearningRate 0.0225 Epoch: 10 Global Step: 436200 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:44:01,699-Speed 2626.34 samples/sec Loss 6.2999 LearningRate 0.0225 Epoch: 10 Global Step: 436210 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:44:05,602-Speed 2624.01 samples/sec Loss 6.3659 LearningRate 0.0225 Epoch: 10 Global Step: 436220 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:44:09,500-Speed 2626.96 samples/sec Loss 6.1954 LearningRate 0.0225 Epoch: 10 Global Step: 436230 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:44:13,397-Speed 2628.15 samples/sec Loss 6.3017 LearningRate 0.0225 Epoch: 10 Global Step: 436240 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:44:17,293-Speed 2629.50 samples/sec Loss 6.2480 LearningRate 0.0225 Epoch: 10 Global Step: 436250 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:44:21,566-Speed 2397.45 samples/sec Loss 6.2308 LearningRate 0.0225 Epoch: 10 Global Step: 436260 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:44:25,460-Speed 2630.25 samples/sec Loss 6.3225 LearningRate 0.0225 Epoch: 10 Global Step: 436270 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:44:29,359-Speed 2626.69 samples/sec Loss 6.3121 LearningRate 0.0225 Epoch: 10 Global Step: 436280 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:44:33,255-Speed 2628.53 samples/sec Loss 6.3998 LearningRate 0.0225 Epoch: 10 Global Step: 436290 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:44:37,129-Speed 2644.46 samples/sec Loss 6.2320 LearningRate 0.0225 Epoch: 10 Global Step: 436300 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:44:41,024-Speed 2629.81 samples/sec Loss 6.2967 LearningRate 0.0225 Epoch: 10 Global Step: 436310 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:44:44,949-Speed 2609.26 samples/sec Loss 6.3966 LearningRate 0.0225 Epoch: 10 Global Step: 436320 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:44:48,846-Speed 2627.81 samples/sec Loss 6.4316 LearningRate 0.0225 Epoch: 10 Global Step: 436330 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:44:52,761-Speed 2616.49 samples/sec Loss 6.3539 LearningRate 0.0225 Epoch: 10 Global Step: 436340 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:44:56,720-Speed 2587.17 samples/sec Loss 6.3214 LearningRate 0.0225 Epoch: 10 Global Step: 436350 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:45:00,724-Speed 2558.22 samples/sec Loss 6.3442 LearningRate 0.0225 Epoch: 10 Global Step: 436360 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:45:04,631-Speed 2621.39 samples/sec Loss 6.2869 LearningRate 0.0225 Epoch: 10 Global Step: 436370 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:45:08,535-Speed 2623.61 samples/sec Loss 6.2710 LearningRate 0.0225 Epoch: 10 Global Step: 436380 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:45:12,447-Speed 2618.21 samples/sec Loss 6.2048 LearningRate 0.0225 Epoch: 10 Global Step: 436390 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:45:16,347-Speed 2626.85 samples/sec Loss 6.2824 LearningRate 0.0225 Epoch: 10 Global Step: 436400 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:45:20,243-Speed 2628.41 samples/sec Loss 6.2965 LearningRate 0.0225 Epoch: 10 Global Step: 436410 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:45:24,139-Speed 2629.32 samples/sec Loss 6.3396 LearningRate 0.0225 Epoch: 10 Global Step: 436420 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:45:28,042-Speed 2624.60 samples/sec Loss 6.2372 LearningRate 0.0225 Epoch: 10 Global Step: 436430 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:45:31,939-Speed 2628.60 samples/sec Loss 6.3803 LearningRate 0.0225 Epoch: 10 Global Step: 436440 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:45:35,837-Speed 2627.45 samples/sec Loss 6.4515 LearningRate 0.0225 Epoch: 10 Global Step: 436450 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:45:39,715-Speed 2641.37 samples/sec Loss 6.4671 LearningRate 0.0225 Epoch: 10 Global Step: 436460 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:45:43,618-Speed 2623.91 samples/sec Loss 6.4897 LearningRate 0.0225 Epoch: 10 Global Step: 436470 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:45:47,514-Speed 2629.39 samples/sec Loss 6.2310 LearningRate 0.0225 Epoch: 10 Global Step: 436480 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:45:51,497-Speed 2572.32 samples/sec Loss 6.3592 LearningRate 0.0225 Epoch: 10 Global Step: 436490 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:45:55,618-Speed 2485.10 samples/sec Loss 6.4298 LearningRate 0.0225 Epoch: 10 Global Step: 436500 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:45:59,528-Speed 2619.89 samples/sec Loss 6.3418 LearningRate 0.0225 Epoch: 10 Global Step: 436510 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:46:03,445-Speed 2614.55 samples/sec Loss 6.3420 LearningRate 0.0224 Epoch: 10 Global Step: 436520 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:46:07,353-Speed 2620.67 samples/sec Loss 6.4001 LearningRate 0.0224 Epoch: 10 Global Step: 436530 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:46:11,251-Speed 2627.83 samples/sec Loss 6.2728 LearningRate 0.0224 Epoch: 10 Global Step: 436540 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:46:15,150-Speed 2626.78 samples/sec Loss 6.2836 LearningRate 0.0224 Epoch: 10 Global Step: 436550 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:46:19,047-Speed 2628.16 samples/sec Loss 6.3874 LearningRate 0.0224 Epoch: 10 Global Step: 436560 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:46:22,941-Speed 2631.44 samples/sec Loss 6.3704 LearningRate 0.0224 Epoch: 10 Global Step: 436570 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:46:26,836-Speed 2629.59 samples/sec Loss 6.2974 LearningRate 0.0224 Epoch: 10 Global Step: 436580 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:46:30,744-Speed 2620.67 samples/sec Loss 6.3857 LearningRate 0.0224 Epoch: 10 Global Step: 436590 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:46:34,640-Speed 2628.48 samples/sec Loss 6.3771 LearningRate 0.0224 Epoch: 10 Global Step: 436600 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:46:38,535-Speed 2630.09 samples/sec Loss 6.4001 LearningRate 0.0224 Epoch: 10 Global Step: 436610 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:46:42,430-Speed 2629.22 samples/sec Loss 6.3554 LearningRate 0.0224 Epoch: 10 Global Step: 436620 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:46:46,306-Speed 2642.25 samples/sec Loss 6.3407 LearningRate 0.0224 Epoch: 10 Global Step: 436630 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:46:50,201-Speed 2630.02 samples/sec Loss 6.2895 LearningRate 0.0224 Epoch: 10 Global Step: 436640 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:46:54,112-Speed 2619.12 samples/sec Loss 6.4383 LearningRate 0.0224 Epoch: 10 Global Step: 436650 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:46:58,010-Speed 2627.84 samples/sec Loss 6.3451 LearningRate 0.0224 Epoch: 10 Global Step: 436660 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:47:01,939-Speed 2607.53 samples/sec Loss 6.2292 LearningRate 0.0224 Epoch: 10 Global Step: 436670 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:47:05,837-Speed 2627.30 samples/sec Loss 6.3104 LearningRate 0.0224 Epoch: 10 Global Step: 436680 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:47:09,748-Speed 2618.46 samples/sec Loss 6.2207 LearningRate 0.0224 Epoch: 10 Global Step: 436690 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:47:13,648-Speed 2626.14 samples/sec Loss 6.3746 LearningRate 0.0224 Epoch: 10 Global Step: 436700 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:47:17,550-Speed 2625.33 samples/sec Loss 6.4399 LearningRate 0.0224 Epoch: 10 Global Step: 436710 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:47:21,459-Speed 2620.09 samples/sec Loss 6.2375 LearningRate 0.0224 Epoch: 10 Global Step: 436720 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:47:25,409-Speed 2592.83 samples/sec Loss 6.3900 LearningRate 0.0224 Epoch: 10 Global Step: 436730 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:47:29,317-Speed 2621.39 samples/sec Loss 6.2880 LearningRate 0.0224 Epoch: 10 Global Step: 436740 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:47:33,240-Speed 2610.45 samples/sec Loss 6.3070 LearningRate 0.0224 Epoch: 10 Global Step: 436750 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:47:37,137-Speed 2628.40 samples/sec Loss 6.3836 LearningRate 0.0224 Epoch: 10 Global Step: 436760 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:47:41,041-Speed 2623.72 samples/sec Loss 6.3690 LearningRate 0.0224 Epoch: 10 Global Step: 436770 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:47:44,952-Speed 2618.68 samples/sec Loss 6.4048 LearningRate 0.0224 Epoch: 10 Global Step: 436780 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:47:48,846-Speed 2630.20 samples/sec Loss 6.4363 LearningRate 0.0224 Epoch: 10 Global Step: 436790 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:47:52,751-Speed 2622.90 samples/sec Loss 6.2822 LearningRate 0.0224 Epoch: 10 Global Step: 436800 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:47:56,657-Speed 2622.00 samples/sec Loss 6.3171 LearningRate 0.0224 Epoch: 10 Global Step: 436810 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:48:00,555-Speed 2627.54 samples/sec Loss 6.3611 LearningRate 0.0224 Epoch: 10 Global Step: 436820 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:48:04,455-Speed 2626.58 samples/sec Loss 6.2752 LearningRate 0.0224 Epoch: 10 Global Step: 436830 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:48:08,367-Speed 2618.20 samples/sec Loss 6.2809 LearningRate 0.0224 Epoch: 10 Global Step: 436840 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:48:12,269-Speed 2624.92 samples/sec Loss 6.2914 LearningRate 0.0224 Epoch: 10 Global Step: 436850 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:48:16,172-Speed 2624.20 samples/sec Loss 6.2429 LearningRate 0.0224 Epoch: 10 Global Step: 436860 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:48:20,201-Speed 2541.99 samples/sec Loss 6.3066 LearningRate 0.0224 Epoch: 10 Global Step: 436870 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:48:24,124-Speed 2610.94 samples/sec Loss 6.3549 LearningRate 0.0224 Epoch: 10 Global Step: 436880 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:48:28,022-Speed 2627.28 samples/sec Loss 6.3545 LearningRate 0.0224 Epoch: 10 Global Step: 436890 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:48:31,930-Speed 2621.49 samples/sec Loss 6.4014 LearningRate 0.0224 Epoch: 10 Global Step: 436900 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:48:35,826-Speed 2628.35 samples/sec Loss 6.3882 LearningRate 0.0224 Epoch: 10 Global Step: 436910 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:48:39,722-Speed 2629.12 samples/sec Loss 6.3094 LearningRate 0.0224 Epoch: 10 Global Step: 436920 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:48:43,607-Speed 2636.59 samples/sec Loss 6.2897 LearningRate 0.0224 Epoch: 10 Global Step: 436930 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:48:47,510-Speed 2624.51 samples/sec Loss 6.3929 LearningRate 0.0224 Epoch: 10 Global Step: 436940 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:48:51,406-Speed 2628.93 samples/sec Loss 6.3457 LearningRate 0.0224 Epoch: 10 Global Step: 436950 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:48:55,299-Speed 2630.74 samples/sec Loss 6.2440 LearningRate 0.0224 Epoch: 10 Global Step: 436960 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:48:59,209-Speed 2620.18 samples/sec Loss 6.2654 LearningRate 0.0224 Epoch: 10 Global Step: 436970 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:49:03,111-Speed 2624.71 samples/sec Loss 6.3768 LearningRate 0.0224 Epoch: 10 Global Step: 436980 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:49:07,021-Speed 2619.68 samples/sec Loss 6.4152 LearningRate 0.0224 Epoch: 10 Global Step: 436990 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:49:10,913-Speed 2631.10 samples/sec Loss 6.3131 LearningRate 0.0224 Epoch: 10 Global Step: 437000 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:49:14,791-Speed 2641.14 samples/sec Loss 6.3334 LearningRate 0.0224 Epoch: 10 Global Step: 437010 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:49:18,688-Speed 2628.39 samples/sec Loss 6.2828 LearningRate 0.0224 Epoch: 10 Global Step: 437020 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:49:22,601-Speed 2618.22 samples/sec Loss 6.2619 LearningRate 0.0224 Epoch: 10 Global Step: 437030 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:49:26,498-Speed 2627.92 samples/sec Loss 6.3681 LearningRate 0.0224 Epoch: 10 Global Step: 437040 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:49:30,391-Speed 2630.79 samples/sec Loss 6.2496 LearningRate 0.0224 Epoch: 10 Global Step: 437050 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:49:34,287-Speed 2629.07 samples/sec Loss 6.2604 LearningRate 0.0224 Epoch: 10 Global Step: 437060 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:49:38,209-Speed 2611.37 samples/sec Loss 6.4356 LearningRate 0.0224 Epoch: 10 Global Step: 437070 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:49:42,108-Speed 2626.78 samples/sec Loss 6.3203 LearningRate 0.0224 Epoch: 10 Global Step: 437080 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:49:46,003-Speed 2630.52 samples/sec Loss 6.2670 LearningRate 0.0224 Epoch: 10 Global Step: 437090 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:49:49,905-Speed 2624.45 samples/sec Loss 6.3064 LearningRate 0.0224 Epoch: 10 Global Step: 437100 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:49:53,813-Speed 2620.72 samples/sec Loss 6.2043 LearningRate 0.0224 Epoch: 10 Global Step: 437110 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:49:57,713-Speed 2626.48 samples/sec Loss 6.2578 LearningRate 0.0224 Epoch: 10 Global Step: 437120 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:50:01,611-Speed 2628.01 samples/sec Loss 6.4255 LearningRate 0.0224 Epoch: 10 Global Step: 437130 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:50:05,513-Speed 2624.75 samples/sec Loss 6.3628 LearningRate 0.0224 Epoch: 10 Global Step: 437140 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:50:09,425-Speed 2617.90 samples/sec Loss 6.3791 LearningRate 0.0224 Epoch: 10 Global Step: 437150 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:50:13,335-Speed 2619.52 samples/sec Loss 6.2389 LearningRate 0.0224 Epoch: 10 Global Step: 437160 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:50:17,234-Speed 2626.67 samples/sec Loss 6.2738 LearningRate 0.0224 Epoch: 10 Global Step: 437170 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:50:21,131-Speed 2628.37 samples/sec Loss 6.2722 LearningRate 0.0224 Epoch: 10 Global Step: 437180 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:50:25,039-Speed 2620.40 samples/sec Loss 6.3382 LearningRate 0.0224 Epoch: 10 Global Step: 437190 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:50:28,945-Speed 2622.02 samples/sec Loss 6.2474 LearningRate 0.0224 Epoch: 10 Global Step: 437200 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:50:32,826-Speed 2639.40 samples/sec Loss 6.3833 LearningRate 0.0224 Epoch: 10 Global Step: 437210 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:50:36,719-Speed 2631.38 samples/sec Loss 6.2677 LearningRate 0.0224 Epoch: 10 Global Step: 437220 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:50:40,613-Speed 2630.61 samples/sec Loss 6.1946 LearningRate 0.0224 Epoch: 10 Global Step: 437230 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:50:44,510-Speed 2627.56 samples/sec Loss 6.2124 LearningRate 0.0224 Epoch: 10 Global Step: 437240 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:50:48,408-Speed 2627.96 samples/sec Loss 6.2299 LearningRate 0.0224 Epoch: 10 Global Step: 437250 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:50:52,308-Speed 2626.04 samples/sec Loss 6.4169 LearningRate 0.0224 Epoch: 10 Global Step: 437260 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:50:56,211-Speed 2624.60 samples/sec Loss 6.3406 LearningRate 0.0224 Epoch: 10 Global Step: 437270 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:51:00,105-Speed 2629.61 samples/sec Loss 6.4025 LearningRate 0.0224 Epoch: 10 Global Step: 437280 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:51:04,010-Speed 2623.18 samples/sec Loss 6.2494 LearningRate 0.0224 Epoch: 10 Global Step: 437290 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:51:07,908-Speed 2627.37 samples/sec Loss 6.1872 LearningRate 0.0224 Epoch: 10 Global Step: 437300 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:51:11,809-Speed 2625.49 samples/sec Loss 6.2972 LearningRate 0.0224 Epoch: 10 Global Step: 437310 Fp16 Grad Scale: 262144 Required: 44 hours
Training: 2022-04-14 20:51:15,703-Speed 2630.64 samples/sec Loss 6.3720 LearningRate 0.0224 Epoch: 10 Global Step: 437320 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:51:19,595-Speed 2631.44 samples/sec Loss 6.4045 LearningRate 0.0224 Epoch: 10 Global Step: 437330 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:51:23,489-Speed 2630.45 samples/sec Loss 6.4163 LearningRate 0.0224 Epoch: 10 Global Step: 437340 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:51:27,382-Speed 2630.66 samples/sec Loss 6.3298 LearningRate 0.0224 Epoch: 10 Global Step: 437350 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:51:31,274-Speed 2631.69 samples/sec Loss 6.3289 LearningRate 0.0224 Epoch: 10 Global Step: 437360 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:51:35,165-Speed 2632.30 samples/sec Loss 6.2367 LearningRate 0.0224 Epoch: 10 Global Step: 437370 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:51:39,051-Speed 2635.92 samples/sec Loss 6.3615 LearningRate 0.0224 Epoch: 10 Global Step: 437380 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:51:42,947-Speed 2628.47 samples/sec Loss 6.3664 LearningRate 0.0223 Epoch: 10 Global Step: 437390 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:51:46,847-Speed 2626.35 samples/sec Loss 6.3207 LearningRate 0.0223 Epoch: 10 Global Step: 437400 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:51:50,746-Speed 2627.49 samples/sec Loss 6.3430 LearningRate 0.0223 Epoch: 10 Global Step: 437410 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:51:54,657-Speed 2619.07 samples/sec Loss 6.2726 LearningRate 0.0223 Epoch: 10 Global Step: 437420 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:51:58,561-Speed 2622.91 samples/sec Loss 6.2872 LearningRate 0.0223 Epoch: 10 Global Step: 437430 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:52:02,458-Speed 2628.80 samples/sec Loss 6.4406 LearningRate 0.0223 Epoch: 10 Global Step: 437440 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:52:06,357-Speed 2626.30 samples/sec Loss 6.2669 LearningRate 0.0223 Epoch: 10 Global Step: 437450 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:52:10,260-Speed 2624.62 samples/sec Loss 6.4368 LearningRate 0.0223 Epoch: 10 Global Step: 437460 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:52:14,162-Speed 2624.78 samples/sec Loss 6.3447 LearningRate 0.0223 Epoch: 10 Global Step: 437470 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:52:18,067-Speed 2622.91 samples/sec Loss 6.2603 LearningRate 0.0223 Epoch: 10 Global Step: 437480 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:52:21,962-Speed 2629.50 samples/sec Loss 6.2768 LearningRate 0.0223 Epoch: 10 Global Step: 437490 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:52:25,859-Speed 2628.15 samples/sec Loss 6.2501 LearningRate 0.0223 Epoch: 10 Global Step: 437500 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:52:29,760-Speed 2626.29 samples/sec Loss 6.2802 LearningRate 0.0223 Epoch: 10 Global Step: 437510 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:52:33,659-Speed 2626.75 samples/sec Loss 6.2931 LearningRate 0.0223 Epoch: 10 Global Step: 437520 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:52:37,559-Speed 2625.97 samples/sec Loss 6.2845 LearningRate 0.0223 Epoch: 10 Global Step: 437530 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:52:41,461-Speed 2624.98 samples/sec Loss 6.3254 LearningRate 0.0223 Epoch: 10 Global Step: 437540 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:52:45,354-Speed 2636.53 samples/sec Loss 6.3358 LearningRate 0.0223 Epoch: 10 Global Step: 437550 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:52:49,249-Speed 2628.90 samples/sec Loss 6.1884 LearningRate 0.0223 Epoch: 10 Global Step: 437560 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:52:53,152-Speed 2624.72 samples/sec Loss 6.2209 LearningRate 0.0223 Epoch: 10 Global Step: 437570 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:52:57,054-Speed 2624.81 samples/sec Loss 6.3622 LearningRate 0.0223 Epoch: 10 Global Step: 437580 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:53:00,953-Speed 2627.04 samples/sec Loss 6.3989 LearningRate 0.0223 Epoch: 10 Global Step: 437590 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:53:04,855-Speed 2624.68 samples/sec Loss 6.2615 LearningRate 0.0223 Epoch: 10 Global Step: 437600 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:53:08,755-Speed 2626.27 samples/sec Loss 6.2458 LearningRate 0.0223 Epoch: 10 Global Step: 437610 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:53:12,655-Speed 2626.25 samples/sec Loss 6.3319 LearningRate 0.0223 Epoch: 10 Global Step: 437620 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:53:16,555-Speed 2626.16 samples/sec Loss 6.1806 LearningRate 0.0223 Epoch: 10 Global Step: 437630 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:53:20,459-Speed 2623.28 samples/sec Loss 6.2998 LearningRate 0.0223 Epoch: 10 Global Step: 437640 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:53:24,361-Speed 2624.93 samples/sec Loss 6.2091 LearningRate 0.0223 Epoch: 10 Global Step: 437650 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:53:28,259-Speed 2627.69 samples/sec Loss 6.3724 LearningRate 0.0223 Epoch: 10 Global Step: 437660 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:53:32,154-Speed 2629.93 samples/sec Loss 6.3796 LearningRate 0.0223 Epoch: 10 Global Step: 437670 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:53:36,051-Speed 2627.81 samples/sec Loss 6.2903 LearningRate 0.0223 Epoch: 10 Global Step: 437680 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:53:39,954-Speed 2624.20 samples/sec Loss 6.3392 LearningRate 0.0223 Epoch: 10 Global Step: 437690 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:53:43,855-Speed 2625.69 samples/sec Loss 6.2533 LearningRate 0.0223 Epoch: 10 Global Step: 437700 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:53:47,732-Speed 2641.55 samples/sec Loss 6.3824 LearningRate 0.0223 Epoch: 10 Global Step: 437710 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:53:51,632-Speed 2626.48 samples/sec Loss 6.3796 LearningRate 0.0223 Epoch: 10 Global Step: 437720 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:53:55,529-Speed 2628.87 samples/sec Loss 6.2278 LearningRate 0.0223 Epoch: 10 Global Step: 437730 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:53:59,427-Speed 2627.18 samples/sec Loss 6.2245 LearningRate 0.0223 Epoch: 10 Global Step: 437740 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:54:03,330-Speed 2624.19 samples/sec Loss 6.3426 LearningRate 0.0223 Epoch: 10 Global Step: 437750 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:54:07,225-Speed 2629.53 samples/sec Loss 6.2650 LearningRate 0.0223 Epoch: 10 Global Step: 437760 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:54:11,121-Speed 2629.06 samples/sec Loss 6.3270 LearningRate 0.0223 Epoch: 10 Global Step: 437770 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:54:15,024-Speed 2624.01 samples/sec Loss 6.3248 LearningRate 0.0223 Epoch: 10 Global Step: 437780 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:54:18,918-Speed 2630.45 samples/sec Loss 6.2518 LearningRate 0.0223 Epoch: 10 Global Step: 437790 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:54:22,814-Speed 2629.01 samples/sec Loss 6.3875 LearningRate 0.0223 Epoch: 10 Global Step: 437800 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:54:26,713-Speed 2626.41 samples/sec Loss 6.1993 LearningRate 0.0223 Epoch: 10 Global Step: 437810 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:54:30,611-Speed 2628.37 samples/sec Loss 6.3354 LearningRate 0.0223 Epoch: 10 Global Step: 437820 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:54:34,597-Speed 2569.03 samples/sec Loss 6.2937 LearningRate 0.0223 Epoch: 10 Global Step: 437830 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:54:38,493-Speed 2628.79 samples/sec Loss 6.1676 LearningRate 0.0223 Epoch: 10 Global Step: 437840 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:54:42,401-Speed 2621.15 samples/sec Loss 6.3532 LearningRate 0.0223 Epoch: 10 Global Step: 437850 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:54:46,304-Speed 2623.97 samples/sec Loss 6.2736 LearningRate 0.0223 Epoch: 10 Global Step: 437860 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:54:50,213-Speed 2620.06 samples/sec Loss 6.3724 LearningRate 0.0223 Epoch: 10 Global Step: 437870 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:54:54,118-Speed 2623.08 samples/sec Loss 6.3245 LearningRate 0.0223 Epoch: 10 Global Step: 437880 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:54:58,023-Speed 2622.99 samples/sec Loss 6.3258 LearningRate 0.0223 Epoch: 10 Global Step: 437890 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:55:01,905-Speed 2638.46 samples/sec Loss 6.2468 LearningRate 0.0223 Epoch: 10 Global Step: 437900 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:55:05,813-Speed 2621.02 samples/sec Loss 6.2462 LearningRate 0.0223 Epoch: 10 Global Step: 437910 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:55:09,716-Speed 2624.49 samples/sec Loss 6.3854 LearningRate 0.0223 Epoch: 10 Global Step: 437920 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:55:13,612-Speed 2628.69 samples/sec Loss 6.2011 LearningRate 0.0223 Epoch: 10 Global Step: 437930 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:55:17,512-Speed 2626.39 samples/sec Loss 6.2826 LearningRate 0.0223 Epoch: 10 Global Step: 437940 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:55:21,403-Speed 2631.63 samples/sec Loss 6.2486 LearningRate 0.0223 Epoch: 10 Global Step: 437950 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:55:25,297-Speed 2630.53 samples/sec Loss 6.2371 LearningRate 0.0223 Epoch: 10 Global Step: 437960 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:55:29,199-Speed 2624.63 samples/sec Loss 6.2855 LearningRate 0.0223 Epoch: 10 Global Step: 437970 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:55:33,100-Speed 2625.96 samples/sec Loss 6.2950 LearningRate 0.0223 Epoch: 10 Global Step: 437980 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:55:36,997-Speed 2627.89 samples/sec Loss 6.4281 LearningRate 0.0223 Epoch: 10 Global Step: 437990 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:55:40,892-Speed 2629.38 samples/sec Loss 6.4211 LearningRate 0.0223 Epoch: 10 Global Step: 438000 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:55:44,789-Speed 2628.43 samples/sec Loss 6.2792 LearningRate 0.0223 Epoch: 10 Global Step: 438010 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:55:48,689-Speed 2627.06 samples/sec Loss 6.2878 LearningRate 0.0223 Epoch: 10 Global Step: 438020 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:55:52,591-Speed 2624.66 samples/sec Loss 6.2652 LearningRate 0.0223 Epoch: 10 Global Step: 438030 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:55:56,489-Speed 2627.33 samples/sec Loss 6.2773 LearningRate 0.0223 Epoch: 10 Global Step: 438040 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:56:00,386-Speed 2628.88 samples/sec Loss 6.2564 LearningRate 0.0223 Epoch: 10 Global Step: 438050 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:56:04,282-Speed 2628.23 samples/sec Loss 6.4159 LearningRate 0.0223 Epoch: 10 Global Step: 438060 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:56:08,180-Speed 2627.93 samples/sec Loss 6.2553 LearningRate 0.0223 Epoch: 10 Global Step: 438070 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:56:12,060-Speed 2639.11 samples/sec Loss 6.3581 LearningRate 0.0223 Epoch: 10 Global Step: 438080 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:56:15,952-Speed 2631.70 samples/sec Loss 6.2864 LearningRate 0.0223 Epoch: 10 Global Step: 438090 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:56:19,858-Speed 2622.35 samples/sec Loss 6.2399 LearningRate 0.0223 Epoch: 10 Global Step: 438100 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:56:23,753-Speed 2630.02 samples/sec Loss 6.2383 LearningRate 0.0223 Epoch: 10 Global Step: 438110 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:56:27,648-Speed 2629.69 samples/sec Loss 6.2115 LearningRate 0.0223 Epoch: 10 Global Step: 438120 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:56:31,553-Speed 2622.65 samples/sec Loss 6.2938 LearningRate 0.0223 Epoch: 10 Global Step: 438130 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:56:35,447-Speed 2630.21 samples/sec Loss 6.2329 LearningRate 0.0223 Epoch: 10 Global Step: 438140 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:56:39,346-Speed 2626.69 samples/sec Loss 6.1899 LearningRate 0.0223 Epoch: 10 Global Step: 438150 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:56:43,244-Speed 2628.17 samples/sec Loss 6.3737 LearningRate 0.0223 Epoch: 10 Global Step: 438160 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:56:47,150-Speed 2622.15 samples/sec Loss 6.2851 LearningRate 0.0223 Epoch: 10 Global Step: 438170 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:56:51,039-Speed 2633.57 samples/sec Loss 6.3414 LearningRate 0.0223 Epoch: 10 Global Step: 438180 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:56:54,935-Speed 2628.65 samples/sec Loss 6.2024 LearningRate 0.0223 Epoch: 10 Global Step: 438190 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:56:58,843-Speed 2620.49 samples/sec Loss 6.2807 LearningRate 0.0223 Epoch: 10 Global Step: 438200 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:57:02,743-Speed 2626.45 samples/sec Loss 6.2766 LearningRate 0.0223 Epoch: 10 Global Step: 438210 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:57:06,638-Speed 2630.91 samples/sec Loss 6.3081 LearningRate 0.0223 Epoch: 10 Global Step: 438220 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:57:10,537-Speed 2626.32 samples/sec Loss 6.3010 LearningRate 0.0223 Epoch: 10 Global Step: 438230 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:57:14,438-Speed 2625.41 samples/sec Loss 6.3241 LearningRate 0.0223 Epoch: 10 Global Step: 438240 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:57:18,336-Speed 2628.03 samples/sec Loss 6.3622 LearningRate 0.0223 Epoch: 10 Global Step: 438250 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:57:22,243-Speed 2621.53 samples/sec Loss 6.3002 LearningRate 0.0223 Epoch: 10 Global Step: 438260 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:57:26,144-Speed 2625.54 samples/sec Loss 6.2370 LearningRate 0.0222 Epoch: 10 Global Step: 438270 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:57:30,039-Speed 2629.52 samples/sec Loss 6.1974 LearningRate 0.0222 Epoch: 10 Global Step: 438280 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:57:33,938-Speed 2626.50 samples/sec Loss 6.2153 LearningRate 0.0222 Epoch: 10 Global Step: 438290 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:57:37,833-Speed 2629.72 samples/sec Loss 6.3045 LearningRate 0.0222 Epoch: 10 Global Step: 438300 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:57:41,708-Speed 2643.60 samples/sec Loss 6.3704 LearningRate 0.0222 Epoch: 10 Global Step: 438310 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:57:45,604-Speed 2628.54 samples/sec Loss 6.2354 LearningRate 0.0222 Epoch: 10 Global Step: 438320 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:57:49,497-Speed 2631.43 samples/sec Loss 6.1917 LearningRate 0.0222 Epoch: 10 Global Step: 438330 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:57:53,394-Speed 2628.61 samples/sec Loss 6.2083 LearningRate 0.0222 Epoch: 10 Global Step: 438340 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:57:57,294-Speed 2625.91 samples/sec Loss 6.2609 LearningRate 0.0222 Epoch: 10 Global Step: 438350 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:58:01,199-Speed 2623.19 samples/sec Loss 6.3198 LearningRate 0.0222 Epoch: 10 Global Step: 438360 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:58:05,094-Speed 2629.30 samples/sec Loss 6.4078 LearningRate 0.0222 Epoch: 10 Global Step: 438370 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:58:08,992-Speed 2627.55 samples/sec Loss 6.2311 LearningRate 0.0222 Epoch: 10 Global Step: 438380 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:58:12,885-Speed 2630.61 samples/sec Loss 6.2666 LearningRate 0.0222 Epoch: 10 Global Step: 438390 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:58:16,780-Speed 2629.30 samples/sec Loss 6.2743 LearningRate 0.0222 Epoch: 10 Global Step: 438400 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:58:20,678-Speed 2628.46 samples/sec Loss 6.4093 LearningRate 0.0222 Epoch: 10 Global Step: 438410 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:58:24,572-Speed 2630.32 samples/sec Loss 6.2806 LearningRate 0.0222 Epoch: 10 Global Step: 438420 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:58:28,469-Speed 2628.43 samples/sec Loss 6.1766 LearningRate 0.0222 Epoch: 10 Global Step: 438430 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:58:32,361-Speed 2631.12 samples/sec Loss 6.3725 LearningRate 0.0222 Epoch: 10 Global Step: 438440 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:58:36,259-Speed 2627.76 samples/sec Loss 6.3496 LearningRate 0.0222 Epoch: 10 Global Step: 438450 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:58:40,165-Speed 2621.93 samples/sec Loss 6.3364 LearningRate 0.0222 Epoch: 10 Global Step: 438460 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:58:44,093-Speed 2608.48 samples/sec Loss 6.4585 LearningRate 0.0222 Epoch: 10 Global Step: 438470 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:58:47,997-Speed 2623.46 samples/sec Loss 6.2976 LearningRate 0.0222 Epoch: 10 Global Step: 438480 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:58:51,910-Speed 2617.15 samples/sec Loss 6.3333 LearningRate 0.0222 Epoch: 10 Global Step: 438490 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:58:55,809-Speed 2627.50 samples/sec Loss 6.2528 LearningRate 0.0222 Epoch: 10 Global Step: 438500 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:58:59,704-Speed 2629.49 samples/sec Loss 6.3234 LearningRate 0.0222 Epoch: 10 Global Step: 438510 Fp16 Grad Scale: 262144 Required: 44 hours
Training: 2022-04-14 20:59:03,593-Speed 2633.75 samples/sec Loss 6.2349 LearningRate 0.0222 Epoch: 10 Global Step: 438520 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:59:07,471-Speed 2640.51 samples/sec Loss 6.4183 LearningRate 0.0222 Epoch: 10 Global Step: 438530 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:59:11,371-Speed 2626.50 samples/sec Loss 6.2408 LearningRate 0.0222 Epoch: 10 Global Step: 438540 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:59:15,271-Speed 2625.90 samples/sec Loss 6.3368 LearningRate 0.0222 Epoch: 10 Global Step: 438550 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:59:19,178-Speed 2621.93 samples/sec Loss 6.3521 LearningRate 0.0222 Epoch: 10 Global Step: 438560 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:59:23,097-Speed 2613.41 samples/sec Loss 6.2122 LearningRate 0.0222 Epoch: 10 Global Step: 438570 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:59:26,998-Speed 2625.96 samples/sec Loss 6.3428 LearningRate 0.0222 Epoch: 10 Global Step: 438580 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:59:30,898-Speed 2625.48 samples/sec Loss 6.2142 LearningRate 0.0222 Epoch: 10 Global Step: 438590 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:59:34,798-Speed 2626.45 samples/sec Loss 6.2601 LearningRate 0.0222 Epoch: 10 Global Step: 438600 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:59:38,695-Speed 2628.54 samples/sec Loss 6.3275 LearningRate 0.0222 Epoch: 10 Global Step: 438610 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:59:42,602-Speed 2621.53 samples/sec Loss 6.3317 LearningRate 0.0222 Epoch: 10 Global Step: 438620 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 20:59:46,496-Speed 2630.34 samples/sec Loss 6.2610 LearningRate 0.0222 Epoch: 10 Global Step: 438630 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:59:50,393-Speed 2628.56 samples/sec Loss 6.2707 LearningRate 0.0222 Epoch: 10 Global Step: 438640 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:59:54,289-Speed 2628.30 samples/sec Loss 6.2592 LearningRate 0.0222 Epoch: 10 Global Step: 438650 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 20:59:58,187-Speed 2628.25 samples/sec Loss 6.3012 LearningRate 0.0222 Epoch: 10 Global Step: 438660 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:00:02,087-Speed 2626.33 samples/sec Loss 6.3156 LearningRate 0.0222 Epoch: 10 Global Step: 438670 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:00:05,987-Speed 2625.97 samples/sec Loss 6.3186 LearningRate 0.0222 Epoch: 10 Global Step: 438680 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:00:09,899-Speed 2618.35 samples/sec Loss 6.1475 LearningRate 0.0222 Epoch: 10 Global Step: 438690 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:00:13,811-Speed 2618.32 samples/sec Loss 6.3445 LearningRate 0.0222 Epoch: 10 Global Step: 438700 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:00:17,722-Speed 2619.44 samples/sec Loss 6.2939 LearningRate 0.0222 Epoch: 10 Global Step: 438710 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:00:21,607-Speed 2635.79 samples/sec Loss 6.2619 LearningRate 0.0222 Epoch: 10 Global Step: 438720 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:00:25,508-Speed 2626.18 samples/sec Loss 6.1443 LearningRate 0.0222 Epoch: 10 Global Step: 438730 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:00:29,409-Speed 2625.70 samples/sec Loss 6.2739 LearningRate 0.0222 Epoch: 10 Global Step: 438740 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:00:33,321-Speed 2618.24 samples/sec Loss 6.2949 LearningRate 0.0222 Epoch: 10 Global Step: 438750 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:00:37,213-Speed 2631.17 samples/sec Loss 6.4069 LearningRate 0.0222 Epoch: 10 Global Step: 438760 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:00:41,110-Speed 2628.48 samples/sec Loss 6.2378 LearningRate 0.0222 Epoch: 10 Global Step: 438770 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:00:45,005-Speed 2629.63 samples/sec Loss 6.2482 LearningRate 0.0222 Epoch: 10 Global Step: 438780 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:00:48,905-Speed 2626.60 samples/sec Loss 6.2802 LearningRate 0.0222 Epoch: 10 Global Step: 438790 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:00:52,799-Speed 2630.48 samples/sec Loss 6.2776 LearningRate 0.0222 Epoch: 10 Global Step: 438800 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:00:56,693-Speed 2630.07 samples/sec Loss 6.1297 LearningRate 0.0222 Epoch: 10 Global Step: 438810 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:01:00,586-Speed 2630.64 samples/sec Loss 6.3094 LearningRate 0.0222 Epoch: 10 Global Step: 438820 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:01:04,485-Speed 2627.30 samples/sec Loss 6.3217 LearningRate 0.0222 Epoch: 10 Global Step: 438830 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:01:08,391-Speed 2621.66 samples/sec Loss 6.2487 LearningRate 0.0222 Epoch: 10 Global Step: 438840 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:01:12,295-Speed 2623.54 samples/sec Loss 6.2551 LearningRate 0.0222 Epoch: 10 Global Step: 438850 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:01:16,185-Speed 2633.17 samples/sec Loss 6.2591 LearningRate 0.0222 Epoch: 10 Global Step: 438860 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:01:20,096-Speed 2619.05 samples/sec Loss 6.3844 LearningRate 0.0222 Epoch: 10 Global Step: 438870 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:01:23,992-Speed 2630.29 samples/sec Loss 6.2555 LearningRate 0.0222 Epoch: 10 Global Step: 438880 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:01:27,889-Speed 2628.76 samples/sec Loss 6.2916 LearningRate 0.0222 Epoch: 10 Global Step: 438890 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:01:31,785-Speed 2628.57 samples/sec Loss 6.4286 LearningRate 0.0222 Epoch: 10 Global Step: 438900 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:01:35,684-Speed 2626.84 samples/sec Loss 6.3702 LearningRate 0.0222 Epoch: 10 Global Step: 438910 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:01:39,581-Speed 2628.35 samples/sec Loss 6.2623 LearningRate 0.0222 Epoch: 10 Global Step: 438920 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:01:43,478-Speed 2627.81 samples/sec Loss 6.2948 LearningRate 0.0222 Epoch: 10 Global Step: 438930 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:01:47,401-Speed 2611.66 samples/sec Loss 6.2703 LearningRate 0.0222 Epoch: 10 Global Step: 438940 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:01:51,308-Speed 2621.08 samples/sec Loss 6.1687 LearningRate 0.0222 Epoch: 10 Global Step: 438950 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:01:55,214-Speed 2622.27 samples/sec Loss 6.3986 LearningRate 0.0222 Epoch: 10 Global Step: 438960 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:01:59,117-Speed 2624.51 samples/sec Loss 6.2619 LearningRate 0.0222 Epoch: 10 Global Step: 438970 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:02:03,017-Speed 2626.59 samples/sec Loss 6.3389 LearningRate 0.0222 Epoch: 10 Global Step: 438980 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:02:06,923-Speed 2622.24 samples/sec Loss 6.3332 LearningRate 0.0222 Epoch: 10 Global Step: 438990 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:02:10,827-Speed 2623.08 samples/sec Loss 6.2701 LearningRate 0.0222 Epoch: 10 Global Step: 439000 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:02:14,752-Speed 2609.31 samples/sec Loss 6.3767 LearningRate 0.0222 Epoch: 10 Global Step: 439010 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:02:18,643-Speed 2633.48 samples/sec Loss 6.3881 LearningRate 0.0222 Epoch: 10 Global Step: 439020 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:02:22,550-Speed 2621.37 samples/sec Loss 6.3860 LearningRate 0.0222 Epoch: 10 Global Step: 439030 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:02:26,455-Speed 2622.81 samples/sec Loss 6.3443 LearningRate 0.0222 Epoch: 10 Global Step: 439040 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:02:30,361-Speed 2621.88 samples/sec Loss 6.1497 LearningRate 0.0222 Epoch: 10 Global Step: 439050 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:02:34,263-Speed 2624.80 samples/sec Loss 6.2536 LearningRate 0.0222 Epoch: 10 Global Step: 439060 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:02:38,172-Speed 2620.67 samples/sec Loss 6.2902 LearningRate 0.0222 Epoch: 10 Global Step: 439070 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:02:42,077-Speed 2623.07 samples/sec Loss 6.2380 LearningRate 0.0222 Epoch: 10 Global Step: 439080 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:02:46,009-Speed 2604.95 samples/sec Loss 6.1894 LearningRate 0.0222 Epoch: 10 Global Step: 439090 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:02:49,910-Speed 2625.79 samples/sec Loss 6.2180 LearningRate 0.0222 Epoch: 10 Global Step: 439100 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:02:53,817-Speed 2621.02 samples/sec Loss 6.3303 LearningRate 0.0222 Epoch: 10 Global Step: 439110 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:02:57,723-Speed 2622.59 samples/sec Loss 6.2069 LearningRate 0.0222 Epoch: 10 Global Step: 439120 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:03:01,615-Speed 2631.43 samples/sec Loss 6.2362 LearningRate 0.0222 Epoch: 10 Global Step: 439130 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:03:05,514-Speed 2626.27 samples/sec Loss 6.2942 LearningRate 0.0222 Epoch: 10 Global Step: 439140 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:03:09,421-Speed 2621.85 samples/sec Loss 6.2433 LearningRate 0.0221 Epoch: 10 Global Step: 439150 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:03:13,318-Speed 2628.85 samples/sec Loss 6.4118 LearningRate 0.0221 Epoch: 10 Global Step: 439160 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:03:17,202-Speed 2636.78 samples/sec Loss 6.3233 LearningRate 0.0221 Epoch: 10 Global Step: 439170 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:03:21,098-Speed 2629.25 samples/sec Loss 6.1553 LearningRate 0.0221 Epoch: 10 Global Step: 439180 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:03:24,990-Speed 2631.32 samples/sec Loss 6.3902 LearningRate 0.0221 Epoch: 10 Global Step: 439190 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:03:28,888-Speed 2628.08 samples/sec Loss 6.2793 LearningRate 0.0221 Epoch: 10 Global Step: 439200 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:03:32,794-Speed 2621.99 samples/sec Loss 6.3182 LearningRate 0.0221 Epoch: 10 Global Step: 439210 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:03:36,699-Speed 2622.63 samples/sec Loss 6.3014 LearningRate 0.0221 Epoch: 10 Global Step: 439220 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:03:40,609-Speed 2619.02 samples/sec Loss 6.3191 LearningRate 0.0221 Epoch: 10 Global Step: 439230 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:03:44,510-Speed 2626.17 samples/sec Loss 6.2335 LearningRate 0.0221 Epoch: 10 Global Step: 439240 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:03:48,411-Speed 2625.74 samples/sec Loss 6.2984 LearningRate 0.0221 Epoch: 10 Global Step: 439250 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:03:52,308-Speed 2628.24 samples/sec Loss 6.2369 LearningRate 0.0221 Epoch: 10 Global Step: 439260 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:03:56,203-Speed 2629.81 samples/sec Loss 6.2992 LearningRate 0.0221 Epoch: 10 Global Step: 439270 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:04:00,099-Speed 2628.45 samples/sec Loss 6.2712 LearningRate 0.0221 Epoch: 10 Global Step: 439280 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:04:04,001-Speed 2625.33 samples/sec Loss 6.3027 LearningRate 0.0221 Epoch: 10 Global Step: 439290 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:04:07,899-Speed 2626.92 samples/sec Loss 6.2924 LearningRate 0.0221 Epoch: 10 Global Step: 439300 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:04:11,772-Speed 2645.13 samples/sec Loss 6.1527 LearningRate 0.0221 Epoch: 10 Global Step: 439310 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:04:15,678-Speed 2621.84 samples/sec Loss 6.2183 LearningRate 0.0221 Epoch: 10 Global Step: 439320 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:04:19,589-Speed 2618.73 samples/sec Loss 6.2982 LearningRate 0.0221 Epoch: 10 Global Step: 439330 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:04:23,495-Speed 2622.52 samples/sec Loss 6.1981 LearningRate 0.0221 Epoch: 10 Global Step: 439340 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:04:27,389-Speed 2629.94 samples/sec Loss 6.2960 LearningRate 0.0221 Epoch: 10 Global Step: 439350 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:04:31,286-Speed 2628.46 samples/sec Loss 6.2975 LearningRate 0.0221 Epoch: 10 Global Step: 439360 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:04:35,184-Speed 2627.58 samples/sec Loss 6.3313 LearningRate 0.0221 Epoch: 10 Global Step: 439370 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:04:39,082-Speed 2627.36 samples/sec Loss 6.1819 LearningRate 0.0221 Epoch: 10 Global Step: 439380 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:04:42,980-Speed 2627.84 samples/sec Loss 6.2661 LearningRate 0.0221 Epoch: 10 Global Step: 439390 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:04:46,873-Speed 2630.98 samples/sec Loss 6.2338 LearningRate 0.0221 Epoch: 10 Global Step: 439400 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:04:50,790-Speed 2615.11 samples/sec Loss 6.1665 LearningRate 0.0221 Epoch: 10 Global Step: 439410 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:04:54,688-Speed 2627.21 samples/sec Loss 6.2943 LearningRate 0.0221 Epoch: 10 Global Step: 439420 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:04:58,584-Speed 2628.99 samples/sec Loss 6.3450 LearningRate 0.0221 Epoch: 10 Global Step: 439430 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:05:02,482-Speed 2627.15 samples/sec Loss 6.2028 LearningRate 0.0221 Epoch: 10 Global Step: 439440 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:05:06,379-Speed 2628.36 samples/sec Loss 6.2658 LearningRate 0.0221 Epoch: 10 Global Step: 439450 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:05:10,286-Speed 2621.58 samples/sec Loss 6.2533 LearningRate 0.0221 Epoch: 10 Global Step: 439460 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:05:14,157-Speed 2646.65 samples/sec Loss 6.2623 LearningRate 0.0221 Epoch: 10 Global Step: 439470 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:05:18,054-Speed 2628.76 samples/sec Loss 6.2242 LearningRate 0.0221 Epoch: 10 Global Step: 439480 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:05:21,951-Speed 2628.31 samples/sec Loss 6.2698 LearningRate 0.0221 Epoch: 10 Global Step: 439490 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:05:25,843-Speed 2631.71 samples/sec Loss 6.2521 LearningRate 0.0221 Epoch: 10 Global Step: 439500 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:05:29,744-Speed 2625.58 samples/sec Loss 6.2683 LearningRate 0.0221 Epoch: 10 Global Step: 439510 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:05:33,649-Speed 2622.69 samples/sec Loss 6.2790 LearningRate 0.0221 Epoch: 10 Global Step: 439520 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:05:37,551-Speed 2624.23 samples/sec Loss 6.2457 LearningRate 0.0221 Epoch: 10 Global Step: 439530 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:05:41,460-Speed 2620.99 samples/sec Loss 6.0911 LearningRate 0.0221 Epoch: 10 Global Step: 439540 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:05:45,362-Speed 2624.65 samples/sec Loss 6.3067 LearningRate 0.0221 Epoch: 10 Global Step: 439550 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:05:49,260-Speed 2628.06 samples/sec Loss 6.2755 LearningRate 0.0221 Epoch: 10 Global Step: 439560 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:05:53,163-Speed 2624.20 samples/sec Loss 6.1967 LearningRate 0.0221 Epoch: 10 Global Step: 439570 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:05:57,057-Speed 2630.32 samples/sec Loss 6.1567 LearningRate 0.0221 Epoch: 10 Global Step: 439580 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:06:00,932-Speed 2642.94 samples/sec Loss 6.4043 LearningRate 0.0221 Epoch: 10 Global Step: 439590 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:06:04,827-Speed 2629.38 samples/sec Loss 6.1426 LearningRate 0.0221 Epoch: 10 Global Step: 439600 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:06:08,725-Speed 2627.44 samples/sec Loss 6.2339 LearningRate 0.0221 Epoch: 10 Global Step: 439610 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:06:12,632-Speed 2622.45 samples/sec Loss 6.2252 LearningRate 0.0221 Epoch: 10 Global Step: 439620 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:06:16,530-Speed 2627.47 samples/sec Loss 6.1344 LearningRate 0.0221 Epoch: 10 Global Step: 439630 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:06:20,427-Speed 2627.97 samples/sec Loss 6.3353 LearningRate 0.0221 Epoch: 10 Global Step: 439640 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:06:24,326-Speed 2627.50 samples/sec Loss 6.2962 LearningRate 0.0221 Epoch: 10 Global Step: 439650 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:06:28,221-Speed 2629.05 samples/sec Loss 6.1722 LearningRate 0.0221 Epoch: 10 Global Step: 439660 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:06:32,122-Speed 2625.92 samples/sec Loss 6.1627 LearningRate 0.0221 Epoch: 10 Global Step: 439670 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:06:36,023-Speed 2625.26 samples/sec Loss 6.2447 LearningRate 0.0221 Epoch: 10 Global Step: 439680 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:06:39,930-Speed 2621.72 samples/sec Loss 6.2142 LearningRate 0.0221 Epoch: 10 Global Step: 439690 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:06:43,832-Speed 2624.59 samples/sec Loss 6.3037 LearningRate 0.0221 Epoch: 10 Global Step: 439700 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:06:47,731-Speed 2626.87 samples/sec Loss 6.1299 LearningRate 0.0221 Epoch: 10 Global Step: 439710 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:06:51,635-Speed 2623.79 samples/sec Loss 6.2720 LearningRate 0.0221 Epoch: 10 Global Step: 439720 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:06:55,541-Speed 2622.03 samples/sec Loss 6.2822 LearningRate 0.0221 Epoch: 10 Global Step: 439730 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:06:59,448-Speed 2621.42 samples/sec Loss 6.3271 LearningRate 0.0221 Epoch: 10 Global Step: 439740 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:07:03,380-Speed 2605.52 samples/sec Loss 6.2644 LearningRate 0.0221 Epoch: 10 Global Step: 439750 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:07:07,260-Speed 2639.55 samples/sec Loss 6.2971 LearningRate 0.0221 Epoch: 10 Global Step: 439760 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:07:11,157-Speed 2627.94 samples/sec Loss 6.1496 LearningRate 0.0221 Epoch: 10 Global Step: 439770 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:07:15,055-Speed 2628.07 samples/sec Loss 6.2585 LearningRate 0.0221 Epoch: 10 Global Step: 439780 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:07:18,957-Speed 2624.49 samples/sec Loss 6.3417 LearningRate 0.0221 Epoch: 10 Global Step: 439790 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:07:22,850-Speed 2631.07 samples/sec Loss 6.3479 LearningRate 0.0221 Epoch: 10 Global Step: 439800 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:07:26,745-Speed 2629.91 samples/sec Loss 6.3210 LearningRate 0.0221 Epoch: 10 Global Step: 439810 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:07:30,657-Speed 2617.82 samples/sec Loss 6.2410 LearningRate 0.0221 Epoch: 10 Global Step: 439820 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:07:34,553-Speed 2628.73 samples/sec Loss 6.2700 LearningRate 0.0221 Epoch: 10 Global Step: 439830 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:07:38,448-Speed 2629.46 samples/sec Loss 6.1748 LearningRate 0.0221 Epoch: 10 Global Step: 439840 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:07:42,340-Speed 2631.96 samples/sec Loss 6.2808 LearningRate 0.0221 Epoch: 10 Global Step: 439850 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:07:46,241-Speed 2625.75 samples/sec Loss 6.2453 LearningRate 0.0221 Epoch: 10 Global Step: 439860 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:07:50,145-Speed 2623.05 samples/sec Loss 6.1796 LearningRate 0.0221 Epoch: 10 Global Step: 439870 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:07:54,053-Speed 2621.07 samples/sec Loss 6.0885 LearningRate 0.0221 Epoch: 10 Global Step: 439880 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:07:57,943-Speed 2633.31 samples/sec Loss 6.3199 LearningRate 0.0221 Epoch: 10 Global Step: 439890 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:08:01,851-Speed 2621.05 samples/sec Loss 6.2790 LearningRate 0.0221 Epoch: 10 Global Step: 439900 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:08:05,762-Speed 2618.70 samples/sec Loss 6.3188 LearningRate 0.0221 Epoch: 10 Global Step: 439910 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:08:09,660-Speed 2627.18 samples/sec Loss 6.4608 LearningRate 0.0221 Epoch: 10 Global Step: 439920 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:08:13,558-Speed 2627.57 samples/sec Loss 6.2050 LearningRate 0.0221 Epoch: 10 Global Step: 439930 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:08:17,450-Speed 2631.89 samples/sec Loss 6.2104 LearningRate 0.0221 Epoch: 10 Global Step: 439940 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:08:21,368-Speed 2614.11 samples/sec Loss 6.1729 LearningRate 0.0221 Epoch: 10 Global Step: 439950 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:08:25,273-Speed 2623.13 samples/sec Loss 6.2674 LearningRate 0.0221 Epoch: 10 Global Step: 439960 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:08:29,173-Speed 2626.04 samples/sec Loss 6.2629 LearningRate 0.0221 Epoch: 10 Global Step: 439970 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:08:33,067-Speed 2629.82 samples/sec Loss 6.3779 LearningRate 0.0221 Epoch: 10 Global Step: 439980 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:08:36,963-Speed 2628.85 samples/sec Loss 6.3100 LearningRate 0.0221 Epoch: 10 Global Step: 439990 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:08:40,862-Speed 2626.98 samples/sec Loss 6.3238 LearningRate 0.0221 Epoch: 10 Global Step: 440000 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:09:23,743-[lfw][440000]XNorm: 23.537625
Training: 2022-04-14 21:09:23,743-[lfw][440000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-14 21:09:23,744-[lfw][440000]Accuracy-Highest: 0.99783
Training: 2022-04-14 21:10:13,485-[cfp_fp][440000]XNorm: 22.381696
Training: 2022-04-14 21:10:13,485-[cfp_fp][440000]Accuracy-Flip: 0.98671+-0.00616
Training: 2022-04-14 21:10:13,486-[cfp_fp][440000]Accuracy-Highest: 0.98843
Training: 2022-04-14 21:10:56,129-[agedb_30][440000]XNorm: 23.667220
Training: 2022-04-14 21:10:56,129-[agedb_30][440000]Accuracy-Flip: 0.97767+-0.00790
Training: 2022-04-14 21:10:56,130-[agedb_30][440000]Accuracy-Highest: 0.97767
Training: 2022-04-14 21:11:00,002-Speed 73.60 samples/sec Loss 6.2343 LearningRate 0.0221 Epoch: 10 Global Step: 440010 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:11:03,948-Speed 2595.51 samples/sec Loss 6.3596 LearningRate 0.0221 Epoch: 10 Global Step: 440020 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:11:07,843-Speed 2629.59 samples/sec Loss 6.2790 LearningRate 0.0221 Epoch: 10 Global Step: 440030 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:11:11,718-Speed 2643.49 samples/sec Loss 6.3260 LearningRate 0.0220 Epoch: 10 Global Step: 440040 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:11:15,593-Speed 2642.70 samples/sec Loss 6.3500 LearningRate 0.0220 Epoch: 10 Global Step: 440050 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:11:19,471-Speed 2640.75 samples/sec Loss 6.4086 LearningRate 0.0220 Epoch: 10 Global Step: 440060 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:11:23,393-Speed 2612.03 samples/sec Loss 6.2468 LearningRate 0.0220 Epoch: 10 Global Step: 440070 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:11:27,319-Speed 2609.02 samples/sec Loss 6.3152 LearningRate 0.0220 Epoch: 10 Global Step: 440080 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:11:31,205-Speed 2636.09 samples/sec Loss 6.4061 LearningRate 0.0220 Epoch: 10 Global Step: 440090 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:11:35,090-Speed 2636.19 samples/sec Loss 6.2125 LearningRate 0.0220 Epoch: 10 Global Step: 440100 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:11:38,979-Speed 2633.10 samples/sec Loss 6.2174 LearningRate 0.0220 Epoch: 10 Global Step: 440110 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:11:42,876-Speed 2628.16 samples/sec Loss 6.1686 LearningRate 0.0220 Epoch: 10 Global Step: 440120 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:11:46,776-Speed 2626.19 samples/sec Loss 6.3549 LearningRate 0.0220 Epoch: 10 Global Step: 440130 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:11:50,675-Speed 2627.47 samples/sec Loss 6.3522 LearningRate 0.0220 Epoch: 10 Global Step: 440140 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:11:54,581-Speed 2622.19 samples/sec Loss 6.3021 LearningRate 0.0220 Epoch: 10 Global Step: 440150 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:11:58,479-Speed 2627.61 samples/sec Loss 6.4588 LearningRate 0.0220 Epoch: 10 Global Step: 440160 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:12:02,377-Speed 2627.70 samples/sec Loss 6.2182 LearningRate 0.0220 Epoch: 10 Global Step: 440170 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:12:06,284-Speed 2621.25 samples/sec Loss 6.3357 LearningRate 0.0220 Epoch: 10 Global Step: 440180 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:12:10,222-Speed 2600.94 samples/sec Loss 6.3953 LearningRate 0.0220 Epoch: 10 Global Step: 440190 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:12:14,157-Speed 2602.59 samples/sec Loss 6.2853 LearningRate 0.0220 Epoch: 10 Global Step: 440200 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:12:18,148-Speed 2566.34 samples/sec Loss 6.3381 LearningRate 0.0220 Epoch: 10 Global Step: 440210 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:12:22,036-Speed 2634.57 samples/sec Loss 6.3043 LearningRate 0.0220 Epoch: 10 Global Step: 440220 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:12:25,932-Speed 2628.93 samples/sec Loss 6.2578 LearningRate 0.0220 Epoch: 10 Global Step: 440230 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:12:29,833-Speed 2625.39 samples/sec Loss 6.2450 LearningRate 0.0220 Epoch: 10 Global Step: 440240 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:12:33,731-Speed 2627.76 samples/sec Loss 6.2257 LearningRate 0.0220 Epoch: 10 Global Step: 440250 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:12:37,621-Speed 2633.56 samples/sec Loss 6.2790 LearningRate 0.0220 Epoch: 10 Global Step: 440260 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:12:41,517-Speed 2628.80 samples/sec Loss 6.2789 LearningRate 0.0220 Epoch: 10 Global Step: 440270 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:12:45,408-Speed 2632.12 samples/sec Loss 6.3246 LearningRate 0.0220 Epoch: 10 Global Step: 440280 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:12:49,303-Speed 2629.93 samples/sec Loss 6.2043 LearningRate 0.0220 Epoch: 10 Global Step: 440290 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:12:53,204-Speed 2625.39 samples/sec Loss 6.2937 LearningRate 0.0220 Epoch: 10 Global Step: 440300 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:12:57,098-Speed 2630.32 samples/sec Loss 6.2429 LearningRate 0.0220 Epoch: 10 Global Step: 440310 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:13:00,980-Speed 2638.51 samples/sec Loss 6.3234 LearningRate 0.0220 Epoch: 10 Global Step: 440320 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:13:04,899-Speed 2613.17 samples/sec Loss 6.4180 LearningRate 0.0220 Epoch: 10 Global Step: 440330 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:13:08,798-Speed 2627.23 samples/sec Loss 6.1501 LearningRate 0.0220 Epoch: 10 Global Step: 440340 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:13:12,699-Speed 2625.36 samples/sec Loss 6.2876 LearningRate 0.0220 Epoch: 10 Global Step: 440350 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:13:16,601-Speed 2625.04 samples/sec Loss 6.1974 LearningRate 0.0220 Epoch: 10 Global Step: 440360 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:13:20,479-Speed 2640.86 samples/sec Loss 6.2410 LearningRate 0.0220 Epoch: 10 Global Step: 440370 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:13:24,385-Speed 2622.37 samples/sec Loss 6.2753 LearningRate 0.0220 Epoch: 10 Global Step: 440380 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:13:28,282-Speed 2628.18 samples/sec Loss 6.3038 LearningRate 0.0220 Epoch: 10 Global Step: 440390 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:13:32,177-Speed 2629.96 samples/sec Loss 6.2457 LearningRate 0.0220 Epoch: 10 Global Step: 440400 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:13:36,087-Speed 2618.91 samples/sec Loss 6.2532 LearningRate 0.0220 Epoch: 10 Global Step: 440410 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:13:40,076-Speed 2567.73 samples/sec Loss 6.2964 LearningRate 0.0220 Epoch: 10 Global Step: 440420 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:13:43,978-Speed 2625.14 samples/sec Loss 6.4332 LearningRate 0.0220 Epoch: 10 Global Step: 440430 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:13:47,882-Speed 2623.37 samples/sec Loss 6.2338 LearningRate 0.0220 Epoch: 10 Global Step: 440440 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:13:51,774-Speed 2631.51 samples/sec Loss 6.2542 LearningRate 0.0220 Epoch: 10 Global Step: 440450 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:13:55,675-Speed 2625.57 samples/sec Loss 6.2680 LearningRate 0.0220 Epoch: 10 Global Step: 440460 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:13:59,570-Speed 2630.00 samples/sec Loss 6.2640 LearningRate 0.0220 Epoch: 10 Global Step: 440470 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:14:03,464-Speed 2630.45 samples/sec Loss 6.2352 LearningRate 0.0220 Epoch: 10 Global Step: 440480 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:14:07,353-Speed 2633.09 samples/sec Loss 6.2199 LearningRate 0.0220 Epoch: 10 Global Step: 440490 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:14:11,250-Speed 2628.62 samples/sec Loss 6.3016 LearningRate 0.0220 Epoch: 10 Global Step: 440500 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:14:15,130-Speed 2639.71 samples/sec Loss 6.3234 LearningRate 0.0220 Epoch: 10 Global Step: 440510 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:14:19,028-Speed 2627.32 samples/sec Loss 6.2907 LearningRate 0.0220 Epoch: 10 Global Step: 440520 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:14:22,931-Speed 2623.75 samples/sec Loss 6.3081 LearningRate 0.0220 Epoch: 10 Global Step: 440530 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:14:26,820-Speed 2633.74 samples/sec Loss 6.2186 LearningRate 0.0220 Epoch: 10 Global Step: 440540 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:14:30,718-Speed 2627.67 samples/sec Loss 6.2518 LearningRate 0.0220 Epoch: 10 Global Step: 440550 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:14:34,613-Speed 2629.82 samples/sec Loss 6.3268 LearningRate 0.0220 Epoch: 10 Global Step: 440560 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:14:38,506-Speed 2630.84 samples/sec Loss 6.1674 LearningRate 0.0220 Epoch: 10 Global Step: 440570 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:14:42,399-Speed 2631.03 samples/sec Loss 6.2861 LearningRate 0.0220 Epoch: 10 Global Step: 440580 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:14:46,293-Speed 2630.78 samples/sec Loss 6.1539 LearningRate 0.0220 Epoch: 10 Global Step: 440590 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:14:50,183-Speed 2632.68 samples/sec Loss 6.2507 LearningRate 0.0220 Epoch: 10 Global Step: 440600 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:14:54,086-Speed 2624.02 samples/sec Loss 6.2851 LearningRate 0.0220 Epoch: 10 Global Step: 440610 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:14:57,982-Speed 2629.11 samples/sec Loss 6.2335 LearningRate 0.0220 Epoch: 10 Global Step: 440620 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:15:01,873-Speed 2632.44 samples/sec Loss 6.2761 LearningRate 0.0220 Epoch: 10 Global Step: 440630 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:15:05,779-Speed 2621.75 samples/sec Loss 6.3155 LearningRate 0.0220 Epoch: 10 Global Step: 440640 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:15:09,676-Speed 2628.66 samples/sec Loss 6.2316 LearningRate 0.0220 Epoch: 10 Global Step: 440650 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:15:13,572-Speed 2628.73 samples/sec Loss 6.2607 LearningRate 0.0220 Epoch: 10 Global Step: 440660 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:15:17,466-Speed 2630.84 samples/sec Loss 6.3695 LearningRate 0.0220 Epoch: 10 Global Step: 440670 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:15:21,364-Speed 2627.64 samples/sec Loss 6.2916 LearningRate 0.0220 Epoch: 10 Global Step: 440680 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:15:25,254-Speed 2632.45 samples/sec Loss 6.2919 LearningRate 0.0220 Epoch: 10 Global Step: 440690 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:15:29,133-Speed 2640.88 samples/sec Loss 6.2285 LearningRate 0.0220 Epoch: 10 Global Step: 440700 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:15:33,028-Speed 2629.11 samples/sec Loss 6.2045 LearningRate 0.0220 Epoch: 10 Global Step: 440710 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:15:36,930-Speed 2624.90 samples/sec Loss 6.2290 LearningRate 0.0220 Epoch: 10 Global Step: 440720 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:15:40,831-Speed 2625.44 samples/sec Loss 6.3011 LearningRate 0.0220 Epoch: 10 Global Step: 440730 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:15:44,722-Speed 2632.26 samples/sec Loss 6.2203 LearningRate 0.0220 Epoch: 10 Global Step: 440740 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:15:48,619-Speed 2628.59 samples/sec Loss 6.2775 LearningRate 0.0220 Epoch: 10 Global Step: 440750 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:15:52,517-Speed 2627.98 samples/sec Loss 6.4200 LearningRate 0.0220 Epoch: 10 Global Step: 440760 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:15:56,411-Speed 2630.45 samples/sec Loss 6.2815 LearningRate 0.0220 Epoch: 10 Global Step: 440770 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:16:00,319-Speed 2621.49 samples/sec Loss 6.3188 LearningRate 0.0220 Epoch: 10 Global Step: 440780 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:16:04,215-Speed 2628.85 samples/sec Loss 6.2259 LearningRate 0.0220 Epoch: 10 Global Step: 440790 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:16:08,111-Speed 2628.84 samples/sec Loss 6.2525 LearningRate 0.0220 Epoch: 10 Global Step: 440800 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:16:12,011-Speed 2625.92 samples/sec Loss 6.2214 LearningRate 0.0220 Epoch: 10 Global Step: 440810 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:16:15,891-Speed 2640.24 samples/sec Loss 6.1936 LearningRate 0.0220 Epoch: 10 Global Step: 440820 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:16:19,784-Speed 2631.18 samples/sec Loss 6.2478 LearningRate 0.0220 Epoch: 10 Global Step: 440830 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:16:23,680-Speed 2628.83 samples/sec Loss 6.3209 LearningRate 0.0220 Epoch: 10 Global Step: 440840 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:16:27,578-Speed 2627.74 samples/sec Loss 6.2759 LearningRate 0.0220 Epoch: 10 Global Step: 440850 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:16:31,471-Speed 2631.49 samples/sec Loss 6.2269 LearningRate 0.0220 Epoch: 10 Global Step: 440860 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:16:35,366-Speed 2629.68 samples/sec Loss 6.2685 LearningRate 0.0220 Epoch: 10 Global Step: 440870 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:16:39,261-Speed 2629.70 samples/sec Loss 6.1301 LearningRate 0.0220 Epoch: 10 Global Step: 440880 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:16:43,177-Speed 2615.11 samples/sec Loss 6.2100 LearningRate 0.0220 Epoch: 10 Global Step: 440890 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:16:47,077-Speed 2626.32 samples/sec Loss 6.2007 LearningRate 0.0220 Epoch: 10 Global Step: 440900 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:16:50,975-Speed 2627.55 samples/sec Loss 6.2429 LearningRate 0.0220 Epoch: 10 Global Step: 440910 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:16:54,873-Speed 2628.23 samples/sec Loss 6.2743 LearningRate 0.0219 Epoch: 10 Global Step: 440920 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:16:58,815-Speed 2598.14 samples/sec Loss 6.2701 LearningRate 0.0219 Epoch: 10 Global Step: 440930 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:17:02,730-Speed 2617.31 samples/sec Loss 6.2273 LearningRate 0.0219 Epoch: 10 Global Step: 440940 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:17:06,632-Speed 2624.84 samples/sec Loss 6.2093 LearningRate 0.0219 Epoch: 10 Global Step: 440950 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:17:10,641-Speed 2554.89 samples/sec Loss 6.2637 LearningRate 0.0219 Epoch: 10 Global Step: 440960 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:17:14,551-Speed 2619.28 samples/sec Loss 6.2213 LearningRate 0.0219 Epoch: 10 Global Step: 440970 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:17:18,463-Speed 2617.85 samples/sec Loss 6.1904 LearningRate 0.0219 Epoch: 10 Global Step: 440980 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:17:22,359-Speed 2629.29 samples/sec Loss 6.2823 LearningRate 0.0219 Epoch: 10 Global Step: 440990 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:17:26,274-Speed 2616.07 samples/sec Loss 6.2009 LearningRate 0.0219 Epoch: 10 Global Step: 441000 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:17:30,170-Speed 2629.49 samples/sec Loss 6.2590 LearningRate 0.0219 Epoch: 10 Global Step: 441010 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:17:34,048-Speed 2641.11 samples/sec Loss 6.4397 LearningRate 0.0219 Epoch: 10 Global Step: 441020 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:17:37,939-Speed 2632.43 samples/sec Loss 6.1909 LearningRate 0.0219 Epoch: 10 Global Step: 441030 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:17:41,843-Speed 2623.74 samples/sec Loss 6.2035 LearningRate 0.0219 Epoch: 10 Global Step: 441040 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:17:45,743-Speed 2626.21 samples/sec Loss 6.2397 LearningRate 0.0219 Epoch: 10 Global Step: 441050 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:17:49,663-Speed 2612.51 samples/sec Loss 6.3583 LearningRate 0.0219 Epoch: 10 Global Step: 441060 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:17:53,638-Speed 2577.37 samples/sec Loss 6.2683 LearningRate 0.0219 Epoch: 10 Global Step: 441070 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:17:57,536-Speed 2627.11 samples/sec Loss 6.2663 LearningRate 0.0219 Epoch: 10 Global Step: 441080 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:18:01,447-Speed 2619.05 samples/sec Loss 6.2112 LearningRate 0.0219 Epoch: 10 Global Step: 441090 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:18:05,356-Speed 2620.13 samples/sec Loss 6.2311 LearningRate 0.0219 Epoch: 10 Global Step: 441100 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:18:09,262-Speed 2622.63 samples/sec Loss 6.2990 LearningRate 0.0219 Epoch: 10 Global Step: 441110 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:18:13,167-Speed 2623.18 samples/sec Loss 6.2336 LearningRate 0.0219 Epoch: 10 Global Step: 441120 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:18:17,066-Speed 2627.10 samples/sec Loss 6.3075 LearningRate 0.0219 Epoch: 10 Global Step: 441130 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:18:20,945-Speed 2640.43 samples/sec Loss 6.2206 LearningRate 0.0219 Epoch: 10 Global Step: 441140 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:18:24,849-Speed 2623.64 samples/sec Loss 6.2941 LearningRate 0.0219 Epoch: 10 Global Step: 441150 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:18:28,775-Speed 2608.40 samples/sec Loss 6.2170 LearningRate 0.0219 Epoch: 10 Global Step: 441160 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:18:32,674-Speed 2627.97 samples/sec Loss 6.3462 LearningRate 0.0219 Epoch: 10 Global Step: 441170 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:18:36,567-Speed 2630.69 samples/sec Loss 6.2613 LearningRate 0.0219 Epoch: 10 Global Step: 441180 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:18:40,492-Speed 2609.80 samples/sec Loss 6.2418 LearningRate 0.0219 Epoch: 10 Global Step: 441190 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:18:44,399-Speed 2621.47 samples/sec Loss 6.1984 LearningRate 0.0219 Epoch: 10 Global Step: 441200 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:18:48,302-Speed 2624.34 samples/sec Loss 6.2550 LearningRate 0.0219 Epoch: 10 Global Step: 441210 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:18:52,216-Speed 2617.18 samples/sec Loss 6.1778 LearningRate 0.0219 Epoch: 10 Global Step: 441220 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:18:56,132-Speed 2615.13 samples/sec Loss 6.3367 LearningRate 0.0219 Epoch: 10 Global Step: 441230 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:19:00,052-Speed 2613.08 samples/sec Loss 6.2780 LearningRate 0.0219 Epoch: 10 Global Step: 441240 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:19:04,041-Speed 2567.69 samples/sec Loss 6.2565 LearningRate 0.0219 Epoch: 10 Global Step: 441250 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:19:08,045-Speed 2558.31 samples/sec Loss 6.1913 LearningRate 0.0219 Epoch: 10 Global Step: 441260 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:19:11,942-Speed 2627.90 samples/sec Loss 6.3344 LearningRate 0.0219 Epoch: 10 Global Step: 441270 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:19:15,848-Speed 2622.98 samples/sec Loss 6.2304 LearningRate 0.0219 Epoch: 10 Global Step: 441280 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:19:19,727-Speed 2640.30 samples/sec Loss 6.2353 LearningRate 0.0219 Epoch: 10 Global Step: 441290 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:19:23,623-Speed 2628.99 samples/sec Loss 6.1735 LearningRate 0.0219 Epoch: 10 Global Step: 441300 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:19:27,517-Speed 2629.86 samples/sec Loss 6.2267 LearningRate 0.0219 Epoch: 10 Global Step: 441310 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:19:31,417-Speed 2627.04 samples/sec Loss 6.2669 LearningRate 0.0219 Epoch: 10 Global Step: 441320 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:19:35,314-Speed 2628.05 samples/sec Loss 6.0785 LearningRate 0.0219 Epoch: 10 Global Step: 441330 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:19:39,226-Speed 2618.89 samples/sec Loss 6.1950 LearningRate 0.0219 Epoch: 10 Global Step: 441340 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:19:43,142-Speed 2615.81 samples/sec Loss 6.3127 LearningRate 0.0219 Epoch: 10 Global Step: 441350 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:19:47,040-Speed 2628.20 samples/sec Loss 6.1676 LearningRate 0.0219 Epoch: 10 Global Step: 441360 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:19:50,937-Speed 2628.04 samples/sec Loss 6.2051 LearningRate 0.0219 Epoch: 10 Global Step: 441370 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:19:54,835-Speed 2627.02 samples/sec Loss 6.3441 LearningRate 0.0219 Epoch: 10 Global Step: 441380 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:19:58,736-Speed 2625.44 samples/sec Loss 6.1924 LearningRate 0.0219 Epoch: 10 Global Step: 441390 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:20:02,637-Speed 2625.90 samples/sec Loss 6.1282 LearningRate 0.0219 Epoch: 10 Global Step: 441400 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:20:06,547-Speed 2620.43 samples/sec Loss 6.3088 LearningRate 0.0219 Epoch: 10 Global Step: 441410 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:20:10,444-Speed 2627.91 samples/sec Loss 6.3560 LearningRate 0.0219 Epoch: 10 Global Step: 441420 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:20:14,351-Speed 2621.96 samples/sec Loss 6.3312 LearningRate 0.0219 Epoch: 10 Global Step: 441430 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:20:18,253-Speed 2624.58 samples/sec Loss 6.2441 LearningRate 0.0219 Epoch: 10 Global Step: 441440 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:20:22,161-Speed 2621.31 samples/sec Loss 6.3071 LearningRate 0.0219 Epoch: 10 Global Step: 441450 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:20:26,050-Speed 2633.01 samples/sec Loss 6.3212 LearningRate 0.0219 Epoch: 10 Global Step: 441460 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:20:29,951-Speed 2626.35 samples/sec Loss 6.3147 LearningRate 0.0219 Epoch: 10 Global Step: 441470 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:20:33,843-Speed 2631.58 samples/sec Loss 6.2791 LearningRate 0.0219 Epoch: 10 Global Step: 441480 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:20:37,816-Speed 2578.55 samples/sec Loss 6.3031 LearningRate 0.0219 Epoch: 10 Global Step: 441490 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:20:41,720-Speed 2623.97 samples/sec Loss 6.3801 LearningRate 0.0219 Epoch: 10 Global Step: 441500 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:20:45,635-Speed 2615.82 samples/sec Loss 6.3100 LearningRate 0.0219 Epoch: 10 Global Step: 441510 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:20:49,549-Speed 2617.22 samples/sec Loss 6.2327 LearningRate 0.0219 Epoch: 10 Global Step: 441520 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:20:53,459-Speed 2619.82 samples/sec Loss 6.2127 LearningRate 0.0219 Epoch: 10 Global Step: 441530 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:20:57,356-Speed 2628.20 samples/sec Loss 6.3113 LearningRate 0.0219 Epoch: 10 Global Step: 441540 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:21:01,256-Speed 2626.52 samples/sec Loss 6.2530 LearningRate 0.0219 Epoch: 10 Global Step: 441550 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:21:05,159-Speed 2624.14 samples/sec Loss 6.2108 LearningRate 0.0219 Epoch: 10 Global Step: 441560 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:21:09,075-Speed 2615.30 samples/sec Loss 6.2290 LearningRate 0.0219 Epoch: 10 Global Step: 441570 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:21:12,962-Speed 2635.47 samples/sec Loss 6.2130 LearningRate 0.0219 Epoch: 10 Global Step: 441580 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:21:16,864-Speed 2625.07 samples/sec Loss 6.2103 LearningRate 0.0219 Epoch: 10 Global Step: 441590 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:21:20,771-Speed 2621.86 samples/sec Loss 6.1871 LearningRate 0.0219 Epoch: 10 Global Step: 441600 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:21:24,857-Speed 2506.61 samples/sec Loss 6.3187 LearningRate 0.0219 Epoch: 10 Global Step: 441610 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:21:28,762-Speed 2623.03 samples/sec Loss 6.2107 LearningRate 0.0219 Epoch: 10 Global Step: 441620 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:21:32,735-Speed 2578.50 samples/sec Loss 6.2256 LearningRate 0.0219 Epoch: 10 Global Step: 441630 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:21:36,632-Speed 2628.20 samples/sec Loss 6.2675 LearningRate 0.0219 Epoch: 10 Global Step: 441640 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:21:40,531-Speed 2626.64 samples/sec Loss 6.2294 LearningRate 0.0219 Epoch: 10 Global Step: 441650 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:21:44,444-Speed 2617.38 samples/sec Loss 6.2496 LearningRate 0.0219 Epoch: 10 Global Step: 441660 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:21:48,344-Speed 2625.91 samples/sec Loss 6.2397 LearningRate 0.0219 Epoch: 10 Global Step: 441670 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:21:52,246-Speed 2625.76 samples/sec Loss 6.2250 LearningRate 0.0219 Epoch: 10 Global Step: 441680 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:21:56,147-Speed 2625.14 samples/sec Loss 6.1442 LearningRate 0.0219 Epoch: 10 Global Step: 441690 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:22:00,050-Speed 2624.76 samples/sec Loss 6.1971 LearningRate 0.0219 Epoch: 10 Global Step: 441700 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:22:03,963-Speed 2617.10 samples/sec Loss 6.2099 LearningRate 0.0219 Epoch: 10 Global Step: 441710 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:22:07,862-Speed 2626.92 samples/sec Loss 6.2119 LearningRate 0.0219 Epoch: 10 Global Step: 441720 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:22:11,763-Speed 2625.86 samples/sec Loss 6.1845 LearningRate 0.0219 Epoch: 10 Global Step: 441730 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:22:15,665-Speed 2624.88 samples/sec Loss 6.2021 LearningRate 0.0219 Epoch: 10 Global Step: 441740 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:22:19,570-Speed 2623.34 samples/sec Loss 6.2884 LearningRate 0.0219 Epoch: 10 Global Step: 441750 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:22:23,496-Speed 2608.85 samples/sec Loss 6.2205 LearningRate 0.0219 Epoch: 10 Global Step: 441760 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:22:27,434-Speed 2600.81 samples/sec Loss 6.1613 LearningRate 0.0219 Epoch: 10 Global Step: 441770 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:22:31,333-Speed 2627.01 samples/sec Loss 6.2692 LearningRate 0.0219 Epoch: 10 Global Step: 441780 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:22:35,285-Speed 2591.79 samples/sec Loss 6.1918 LearningRate 0.0219 Epoch: 10 Global Step: 441790 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:22:39,174-Speed 2634.10 samples/sec Loss 6.2663 LearningRate 0.0219 Epoch: 10 Global Step: 441800 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 21:22:43,090-Speed 2615.09 samples/sec Loss 6.2525 LearningRate 0.0218 Epoch: 10 Global Step: 441810 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 21:22:46,987-Speed 2628.90 samples/sec Loss 6.1154 LearningRate 0.0218 Epoch: 10 Global Step: 441820 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 21:22:50,890-Speed 2623.91 samples/sec Loss 6.1648 LearningRate 0.0218 Epoch: 10 Global Step: 441830 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 21:22:54,791-Speed 2625.59 samples/sec Loss 6.2966 LearningRate 0.0218 Epoch: 10 Global Step: 441840 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 21:22:58,692-Speed 2625.59 samples/sec Loss 6.2905 LearningRate 0.0218 Epoch: 10 Global Step: 441850 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 21:23:02,592-Speed 2626.56 samples/sec Loss 6.2940 LearningRate 0.0218 Epoch: 10 Global Step: 441860 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 21:23:06,552-Speed 2586.15 samples/sec Loss 6.1502 LearningRate 0.0218 Epoch: 10 Global Step: 441870 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 21:23:10,450-Speed 2627.42 samples/sec Loss 6.3086 LearningRate 0.0218 Epoch: 10 Global Step: 441880 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 21:23:14,354-Speed 2623.58 samples/sec Loss 6.2423 LearningRate 0.0218 Epoch: 10 Global Step: 441890 Fp16 Grad Scale: 32768 Required: 44 hours
Training: 2022-04-14 21:23:18,254-Speed 2626.47 samples/sec Loss 6.1922 LearningRate 0.0218 Epoch: 10 Global Step: 441900 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:23:22,151-Speed 2628.34 samples/sec Loss 6.2035 LearningRate 0.0218 Epoch: 10 Global Step: 441910 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:23:26,046-Speed 2629.29 samples/sec Loss 6.2472 LearningRate 0.0218 Epoch: 10 Global Step: 441920 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:23:29,946-Speed 2626.53 samples/sec Loss 6.1723 LearningRate 0.0218 Epoch: 10 Global Step: 441930 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:23:33,850-Speed 2623.22 samples/sec Loss 6.1634 LearningRate 0.0218 Epoch: 10 Global Step: 441940 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:23:37,747-Speed 2628.40 samples/sec Loss 6.3230 LearningRate 0.0218 Epoch: 10 Global Step: 441950 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:23:41,643-Speed 2628.60 samples/sec Loss 6.1892 LearningRate 0.0218 Epoch: 10 Global Step: 441960 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:23:45,539-Speed 2629.67 samples/sec Loss 6.2960 LearningRate 0.0218 Epoch: 10 Global Step: 441970 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:23:49,433-Speed 2630.00 samples/sec Loss 6.2793 LearningRate 0.0218 Epoch: 10 Global Step: 441980 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:23:53,331-Speed 2627.84 samples/sec Loss 6.2833 LearningRate 0.0218 Epoch: 10 Global Step: 441990 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:23:57,233-Speed 2624.42 samples/sec Loss 6.1722 LearningRate 0.0218 Epoch: 10 Global Step: 442000 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:24:01,133-Speed 2626.81 samples/sec Loss 6.3447 LearningRate 0.0218 Epoch: 10 Global Step: 442010 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:24:05,036-Speed 2624.02 samples/sec Loss 6.3101 LearningRate 0.0218 Epoch: 10 Global Step: 442020 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:24:08,943-Speed 2621.30 samples/sec Loss 6.2773 LearningRate 0.0218 Epoch: 10 Global Step: 442030 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:24:12,841-Speed 2627.16 samples/sec Loss 6.2351 LearningRate 0.0218 Epoch: 10 Global Step: 442040 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:24:16,740-Speed 2627.44 samples/sec Loss 6.2864 LearningRate 0.0218 Epoch: 10 Global Step: 442050 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:24:20,644-Speed 2623.91 samples/sec Loss 6.3061 LearningRate 0.0218 Epoch: 10 Global Step: 442060 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:24:24,686-Speed 2533.74 samples/sec Loss 6.2538 LearningRate 0.0218 Epoch: 10 Global Step: 442070 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:24:28,570-Speed 2637.08 samples/sec Loss 6.1844 LearningRate 0.0218 Epoch: 10 Global Step: 442080 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:24:32,469-Speed 2627.56 samples/sec Loss 6.2651 LearningRate 0.0218 Epoch: 10 Global Step: 442090 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:24:36,375-Speed 2621.79 samples/sec Loss 6.3103 LearningRate 0.0218 Epoch: 10 Global Step: 442100 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:24:40,295-Speed 2613.49 samples/sec Loss 6.3422 LearningRate 0.0218 Epoch: 10 Global Step: 442110 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:24:44,215-Speed 2613.32 samples/sec Loss 6.2210 LearningRate 0.0218 Epoch: 10 Global Step: 442120 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:24:48,126-Speed 2619.00 samples/sec Loss 6.1806 LearningRate 0.0218 Epoch: 10 Global Step: 442130 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:24:52,036-Speed 2619.54 samples/sec Loss 6.1476 LearningRate 0.0218 Epoch: 10 Global Step: 442140 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:24:55,936-Speed 2625.73 samples/sec Loss 6.2377 LearningRate 0.0218 Epoch: 10 Global Step: 442150 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:24:59,983-Speed 2531.06 samples/sec Loss 6.2372 LearningRate 0.0218 Epoch: 10 Global Step: 442160 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:25:03,878-Speed 2630.13 samples/sec Loss 6.1814 LearningRate 0.0218 Epoch: 10 Global Step: 442170 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:25:07,819-Speed 2598.73 samples/sec Loss 6.2516 LearningRate 0.0218 Epoch: 10 Global Step: 442180 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:25:11,779-Speed 2586.48 samples/sec Loss 6.3377 LearningRate 0.0218 Epoch: 10 Global Step: 442190 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:25:15,691-Speed 2618.45 samples/sec Loss 6.1358 LearningRate 0.0218 Epoch: 10 Global Step: 442200 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:25:19,594-Speed 2624.12 samples/sec Loss 6.2530 LearningRate 0.0218 Epoch: 10 Global Step: 442210 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:25:23,502-Speed 2621.42 samples/sec Loss 6.1874 LearningRate 0.0218 Epoch: 10 Global Step: 442220 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:25:27,381-Speed 2639.77 samples/sec Loss 6.2244 LearningRate 0.0218 Epoch: 10 Global Step: 442230 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:25:31,335-Speed 2590.97 samples/sec Loss 6.2539 LearningRate 0.0218 Epoch: 10 Global Step: 442240 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:25:35,248-Speed 2617.34 samples/sec Loss 6.1837 LearningRate 0.0218 Epoch: 10 Global Step: 442250 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:25:39,255-Speed 2556.58 samples/sec Loss 6.2391 LearningRate 0.0218 Epoch: 10 Global Step: 442260 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:25:43,361-Speed 2494.42 samples/sec Loss 6.1569 LearningRate 0.0218 Epoch: 10 Global Step: 442270 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:25:47,460-Speed 2498.93 samples/sec Loss 6.2564 LearningRate 0.0218 Epoch: 10 Global Step: 442280 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:25:51,550-Speed 2504.41 samples/sec Loss 6.3308 LearningRate 0.0218 Epoch: 10 Global Step: 442290 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:25:55,576-Speed 2543.97 samples/sec Loss 6.2042 LearningRate 0.0218 Epoch: 10 Global Step: 442300 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:25:59,484-Speed 2621.01 samples/sec Loss 6.1858 LearningRate 0.0218 Epoch: 10 Global Step: 442310 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:26:03,392-Speed 2620.71 samples/sec Loss 6.2266 LearningRate 0.0218 Epoch: 10 Global Step: 442320 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:26:07,277-Speed 2636.56 samples/sec Loss 6.1989 LearningRate 0.0218 Epoch: 10 Global Step: 442330 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:26:11,188-Speed 2619.16 samples/sec Loss 6.2657 LearningRate 0.0218 Epoch: 10 Global Step: 442340 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:26:15,130-Speed 2598.17 samples/sec Loss 6.2189 LearningRate 0.0218 Epoch: 10 Global Step: 442350 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:26:19,035-Speed 2623.54 samples/sec Loss 6.1795 LearningRate 0.0218 Epoch: 10 Global Step: 442360 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:26:22,934-Speed 2626.49 samples/sec Loss 6.2671 LearningRate 0.0218 Epoch: 10 Global Step: 442370 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:26:26,835-Speed 2625.35 samples/sec Loss 6.2718 LearningRate 0.0218 Epoch: 10 Global Step: 442380 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:26:30,748-Speed 2618.24 samples/sec Loss 6.2943 LearningRate 0.0218 Epoch: 10 Global Step: 442390 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:26:34,656-Speed 2620.74 samples/sec Loss 6.2392 LearningRate 0.0218 Epoch: 10 Global Step: 442400 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:26:38,605-Speed 2593.90 samples/sec Loss 6.1552 LearningRate 0.0218 Epoch: 10 Global Step: 442410 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:26:42,530-Speed 2610.20 samples/sec Loss 6.2929 LearningRate 0.0218 Epoch: 10 Global Step: 442420 Fp16 Grad Scale: 65536 Required: 44 hours
Training: 2022-04-14 21:26:46,431-Speed 2625.54 samples/sec Loss 6.2836 LearningRate 0.0218 Epoch: 10 Global Step: 442430 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:26:50,360-Speed 2607.51 samples/sec Loss 6.2151 LearningRate 0.0218 Epoch: 10 Global Step: 442440 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:26:54,271-Speed 2618.50 samples/sec Loss 6.2767 LearningRate 0.0218 Epoch: 10 Global Step: 442450 Fp16 Grad Scale: 131072 Required: 44 hours
Training: 2022-04-14 21:26:58,171-Speed 2626.56 samples/sec Loss 6.2305 LearningRate 0.0218 Epoch: 10 Global Step: 442460 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:27:02,094-Speed 2610.67 samples/sec Loss 6.1895 LearningRate 0.0218 Epoch: 10 Global Step: 442470 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:27:06,043-Speed 2594.38 samples/sec Loss 6.1453 LearningRate 0.0218 Epoch: 10 Global Step: 442480 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:27:09,941-Speed 2627.43 samples/sec Loss 6.1889 LearningRate 0.0218 Epoch: 10 Global Step: 442490 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:27:13,851-Speed 2619.95 samples/sec Loss 6.2298 LearningRate 0.0218 Epoch: 10 Global Step: 442500 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:27:17,765-Speed 2617.06 samples/sec Loss 6.2402 LearningRate 0.0218 Epoch: 10 Global Step: 442510 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:27:21,665-Speed 2626.44 samples/sec Loss 6.1280 LearningRate 0.0218 Epoch: 10 Global Step: 442520 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:27:25,551-Speed 2635.98 samples/sec Loss 6.3395 LearningRate 0.0218 Epoch: 10 Global Step: 442530 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:27:29,452-Speed 2625.03 samples/sec Loss 6.2046 LearningRate 0.0218 Epoch: 10 Global Step: 442540 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:27:33,361-Speed 2620.17 samples/sec Loss 6.2011 LearningRate 0.0218 Epoch: 10 Global Step: 442550 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:27:37,267-Speed 2622.06 samples/sec Loss 6.1671 LearningRate 0.0218 Epoch: 10 Global Step: 442560 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:27:41,164-Speed 2628.84 samples/sec Loss 6.2963 LearningRate 0.0218 Epoch: 10 Global Step: 442570 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:27:45,075-Speed 2618.68 samples/sec Loss 6.1754 LearningRate 0.0218 Epoch: 10 Global Step: 442580 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:27:48,977-Speed 2625.04 samples/sec Loss 6.2418 LearningRate 0.0218 Epoch: 10 Global Step: 442590 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:27:52,877-Speed 2626.09 samples/sec Loss 6.3017 LearningRate 0.0218 Epoch: 10 Global Step: 442600 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:27:56,895-Speed 2549.50 samples/sec Loss 6.1553 LearningRate 0.0218 Epoch: 10 Global Step: 442610 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:28:00,798-Speed 2624.70 samples/sec Loss 6.1472 LearningRate 0.0218 Epoch: 10 Global Step: 442620 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:28:04,695-Speed 2627.86 samples/sec Loss 6.2664 LearningRate 0.0218 Epoch: 10 Global Step: 442630 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:28:08,595-Speed 2626.15 samples/sec Loss 6.1681 LearningRate 0.0218 Epoch: 10 Global Step: 442640 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:28:12,529-Speed 2603.90 samples/sec Loss 6.2586 LearningRate 0.0218 Epoch: 10 Global Step: 442650 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:28:16,427-Speed 2628.06 samples/sec Loss 6.1780 LearningRate 0.0218 Epoch: 10 Global Step: 442660 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:28:20,314-Speed 2634.97 samples/sec Loss 6.1964 LearningRate 0.0218 Epoch: 10 Global Step: 442670 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:28:24,216-Speed 2625.05 samples/sec Loss 6.3566 LearningRate 0.0218 Epoch: 10 Global Step: 442680 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:28:28,124-Speed 2621.40 samples/sec Loss 6.3092 LearningRate 0.0217 Epoch: 10 Global Step: 442690 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:28:32,028-Speed 2623.42 samples/sec Loss 6.2373 LearningRate 0.0217 Epoch: 10 Global Step: 442700 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:28:35,928-Speed 2625.81 samples/sec Loss 6.2029 LearningRate 0.0217 Epoch: 10 Global Step: 442710 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:28:39,833-Speed 2623.04 samples/sec Loss 6.2616 LearningRate 0.0217 Epoch: 10 Global Step: 442720 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:28:43,738-Speed 2622.96 samples/sec Loss 6.2573 LearningRate 0.0217 Epoch: 10 Global Step: 442730 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:28:47,652-Speed 2617.21 samples/sec Loss 6.2666 LearningRate 0.0217 Epoch: 10 Global Step: 442740 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:28:51,557-Speed 2622.87 samples/sec Loss 6.2475 LearningRate 0.0217 Epoch: 10 Global Step: 442750 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:28:55,454-Speed 2628.17 samples/sec Loss 6.3104 LearningRate 0.0217 Epoch: 10 Global Step: 442760 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:28:59,361-Speed 2621.78 samples/sec Loss 6.2826 LearningRate 0.0217 Epoch: 10 Global Step: 442770 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:29:03,280-Speed 2613.25 samples/sec Loss 6.3587 LearningRate 0.0217 Epoch: 10 Global Step: 442780 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:29:07,184-Speed 2623.35 samples/sec Loss 6.2900 LearningRate 0.0217 Epoch: 10 Global Step: 442790 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:29:11,096-Speed 2618.26 samples/sec Loss 6.1552 LearningRate 0.0217 Epoch: 10 Global Step: 442800 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:29:14,996-Speed 2626.42 samples/sec Loss 6.1568 LearningRate 0.0217 Epoch: 10 Global Step: 442810 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:29:18,896-Speed 2626.74 samples/sec Loss 6.2008 LearningRate 0.0217 Epoch: 10 Global Step: 442820 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:29:22,797-Speed 2625.09 samples/sec Loss 6.2135 LearningRate 0.0217 Epoch: 10 Global Step: 442830 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:29:26,704-Speed 2622.16 samples/sec Loss 6.2099 LearningRate 0.0217 Epoch: 10 Global Step: 442840 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:29:30,608-Speed 2622.92 samples/sec Loss 6.1658 LearningRate 0.0217 Epoch: 10 Global Step: 442850 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:29:34,516-Speed 2621.22 samples/sec Loss 6.1935 LearningRate 0.0217 Epoch: 10 Global Step: 442860 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:29:38,424-Speed 2620.52 samples/sec Loss 6.2621 LearningRate 0.0217 Epoch: 10 Global Step: 442870 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:29:42,334-Speed 2619.45 samples/sec Loss 6.2944 LearningRate 0.0217 Epoch: 10 Global Step: 442880 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:29:46,244-Speed 2619.41 samples/sec Loss 6.2868 LearningRate 0.0217 Epoch: 10 Global Step: 442890 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:29:50,169-Speed 2610.51 samples/sec Loss 6.2989 LearningRate 0.0217 Epoch: 10 Global Step: 442900 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:29:54,071-Speed 2624.67 samples/sec Loss 6.1168 LearningRate 0.0217 Epoch: 10 Global Step: 442910 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:29:57,968-Speed 2628.43 samples/sec Loss 6.1832 LearningRate 0.0217 Epoch: 10 Global Step: 442920 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:30:01,871-Speed 2623.98 samples/sec Loss 6.2360 LearningRate 0.0217 Epoch: 10 Global Step: 442930 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:30:05,773-Speed 2624.85 samples/sec Loss 6.1774 LearningRate 0.0217 Epoch: 10 Global Step: 442940 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:30:09,677-Speed 2623.44 samples/sec Loss 6.2064 LearningRate 0.0217 Epoch: 10 Global Step: 442950 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:30:13,583-Speed 2622.67 samples/sec Loss 6.2062 LearningRate 0.0217 Epoch: 10 Global Step: 442960 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:30:17,463-Speed 2639.94 samples/sec Loss 6.1624 LearningRate 0.0217 Epoch: 10 Global Step: 442970 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:30:21,389-Speed 2608.73 samples/sec Loss 6.1397 LearningRate 0.0217 Epoch: 10 Global Step: 442980 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:30:25,305-Speed 2615.87 samples/sec Loss 6.1531 LearningRate 0.0217 Epoch: 10 Global Step: 442990 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:30:29,220-Speed 2616.85 samples/sec Loss 6.2916 LearningRate 0.0217 Epoch: 10 Global Step: 443000 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:30:33,119-Speed 2626.92 samples/sec Loss 6.1821 LearningRate 0.0217 Epoch: 10 Global Step: 443010 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:30:36,996-Speed 2641.26 samples/sec Loss 6.2661 LearningRate 0.0217 Epoch: 10 Global Step: 443020 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:30:40,899-Speed 2624.30 samples/sec Loss 6.1978 LearningRate 0.0217 Epoch: 10 Global Step: 443030 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:30:44,804-Speed 2623.37 samples/sec Loss 6.3801 LearningRate 0.0217 Epoch: 10 Global Step: 443040 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:30:48,710-Speed 2622.61 samples/sec Loss 6.2644 LearningRate 0.0217 Epoch: 10 Global Step: 443050 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:30:52,622-Speed 2618.96 samples/sec Loss 6.2692 LearningRate 0.0217 Epoch: 10 Global Step: 443060 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:30:56,522-Speed 2626.05 samples/sec Loss 6.0722 LearningRate 0.0217 Epoch: 10 Global Step: 443070 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:31:00,422-Speed 2626.16 samples/sec Loss 6.1741 LearningRate 0.0217 Epoch: 10 Global Step: 443080 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:31:04,324-Speed 2624.68 samples/sec Loss 6.2648 LearningRate 0.0217 Epoch: 10 Global Step: 443090 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:31:08,225-Speed 2625.80 samples/sec Loss 6.1283 LearningRate 0.0217 Epoch: 10 Global Step: 443100 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:31:12,131-Speed 2622.35 samples/sec Loss 6.3358 LearningRate 0.0217 Epoch: 10 Global Step: 443110 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:31:16,036-Speed 2623.49 samples/sec Loss 6.3871 LearningRate 0.0217 Epoch: 10 Global Step: 443120 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:31:19,945-Speed 2620.25 samples/sec Loss 6.2881 LearningRate 0.0217 Epoch: 10 Global Step: 443130 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:31:23,851-Speed 2622.39 samples/sec Loss 6.2446 LearningRate 0.0217 Epoch: 10 Global Step: 443140 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:31:27,764-Speed 2617.42 samples/sec Loss 6.2562 LearningRate 0.0217 Epoch: 10 Global Step: 443150 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:31:31,680-Speed 2615.17 samples/sec Loss 6.1975 LearningRate 0.0217 Epoch: 10 Global Step: 443160 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:31:35,584-Speed 2623.48 samples/sec Loss 6.3259 LearningRate 0.0217 Epoch: 10 Global Step: 443170 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:31:39,484-Speed 2625.94 samples/sec Loss 6.2789 LearningRate 0.0217 Epoch: 10 Global Step: 443180 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:31:43,383-Speed 2627.69 samples/sec Loss 6.2906 LearningRate 0.0217 Epoch: 10 Global Step: 443190 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:31:47,269-Speed 2635.83 samples/sec Loss 6.3322 LearningRate 0.0217 Epoch: 10 Global Step: 443200 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:31:51,170-Speed 2625.46 samples/sec Loss 6.1937 LearningRate 0.0217 Epoch: 10 Global Step: 443210 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:31:55,072-Speed 2624.80 samples/sec Loss 6.2747 LearningRate 0.0217 Epoch: 10 Global Step: 443220 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:31:58,984-Speed 2618.37 samples/sec Loss 6.1886 LearningRate 0.0217 Epoch: 10 Global Step: 443230 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:32:02,886-Speed 2624.86 samples/sec Loss 6.1641 LearningRate 0.0217 Epoch: 10 Global Step: 443240 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:32:06,799-Speed 2617.68 samples/sec Loss 6.1850 LearningRate 0.0217 Epoch: 10 Global Step: 443250 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:32:10,702-Speed 2624.52 samples/sec Loss 6.1766 LearningRate 0.0217 Epoch: 10 Global Step: 443260 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:32:14,628-Speed 2608.87 samples/sec Loss 6.0168 LearningRate 0.0217 Epoch: 10 Global Step: 443270 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:32:18,532-Speed 2623.50 samples/sec Loss 6.2710 LearningRate 0.0217 Epoch: 10 Global Step: 443280 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:32:22,439-Speed 2621.78 samples/sec Loss 6.3020 LearningRate 0.0217 Epoch: 10 Global Step: 443290 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:32:26,349-Speed 2619.75 samples/sec Loss 6.1649 LearningRate 0.0217 Epoch: 10 Global Step: 443300 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:32:30,277-Speed 2607.30 samples/sec Loss 6.2611 LearningRate 0.0217 Epoch: 10 Global Step: 443310 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:32:34,194-Speed 2614.20 samples/sec Loss 6.2063 LearningRate 0.0217 Epoch: 10 Global Step: 443320 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:32:38,095-Speed 2626.24 samples/sec Loss 6.1554 LearningRate 0.0217 Epoch: 10 Global Step: 443330 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:32:41,996-Speed 2625.74 samples/sec Loss 6.2583 LearningRate 0.0217 Epoch: 10 Global Step: 443340 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:32:45,893-Speed 2628.45 samples/sec Loss 6.2782 LearningRate 0.0217 Epoch: 10 Global Step: 443350 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:32:49,844-Speed 2592.36 samples/sec Loss 6.1190 LearningRate 0.0217 Epoch: 10 Global Step: 443360 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:32:53,751-Speed 2622.01 samples/sec Loss 6.3327 LearningRate 0.0217 Epoch: 10 Global Step: 443370 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:32:57,691-Speed 2599.79 samples/sec Loss 6.1038 LearningRate 0.0217 Epoch: 10 Global Step: 443380 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:33:01,616-Speed 2609.80 samples/sec Loss 6.2435 LearningRate 0.0217 Epoch: 10 Global Step: 443390 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:33:05,528-Speed 2618.09 samples/sec Loss 6.2428 LearningRate 0.0217 Epoch: 10 Global Step: 443400 Fp16 Grad Scale: 262144 Required: 43 hours
Training: 2022-04-14 21:33:09,424-Speed 2629.09 samples/sec Loss 6.1294 LearningRate 0.0217 Epoch: 10 Global Step: 443410 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:33:13,331-Speed 2622.01 samples/sec Loss 6.1936 LearningRate 0.0217 Epoch: 10 Global Step: 443420 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:33:17,380-Speed 2529.53 samples/sec Loss 6.2707 LearningRate 0.0217 Epoch: 10 Global Step: 443430 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:33:21,319-Speed 2600.40 samples/sec Loss 6.1567 LearningRate 0.0217 Epoch: 10 Global Step: 443440 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:33:25,233-Speed 2617.22 samples/sec Loss 6.2302 LearningRate 0.0217 Epoch: 10 Global Step: 443450 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:33:29,138-Speed 2623.33 samples/sec Loss 6.1815 LearningRate 0.0217 Epoch: 10 Global Step: 443460 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:33:33,059-Speed 2611.87 samples/sec Loss 6.2042 LearningRate 0.0217 Epoch: 10 Global Step: 443470 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:33:36,969-Speed 2619.28 samples/sec Loss 6.1718 LearningRate 0.0217 Epoch: 10 Global Step: 443480 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:33:40,911-Speed 2597.94 samples/sec Loss 6.1819 LearningRate 0.0217 Epoch: 10 Global Step: 443490 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:33:44,813-Speed 2625.16 samples/sec Loss 6.1998 LearningRate 0.0217 Epoch: 10 Global Step: 443500 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:33:48,716-Speed 2624.91 samples/sec Loss 6.0901 LearningRate 0.0217 Epoch: 10 Global Step: 443510 Fp16 Grad Scale: 262144 Required: 43 hours
Training: 2022-04-14 21:33:52,587-Speed 2645.71 samples/sec Loss 6.2134 LearningRate 0.0217 Epoch: 10 Global Step: 443520 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:33:56,491-Speed 2623.74 samples/sec Loss 6.1575 LearningRate 0.0217 Epoch: 10 Global Step: 443530 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:34:00,405-Speed 2616.85 samples/sec Loss 6.1207 LearningRate 0.0217 Epoch: 10 Global Step: 443540 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:34:04,310-Speed 2622.49 samples/sec Loss 6.1592 LearningRate 0.0217 Epoch: 10 Global Step: 443550 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:34:08,216-Speed 2622.01 samples/sec Loss 6.2549 LearningRate 0.0217 Epoch: 10 Global Step: 443560 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:34:12,122-Speed 2622.56 samples/sec Loss 5.9988 LearningRate 0.0217 Epoch: 10 Global Step: 443570 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:34:16,026-Speed 2623.27 samples/sec Loss 6.2390 LearningRate 0.0217 Epoch: 10 Global Step: 443580 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:34:20,002-Speed 2576.35 samples/sec Loss 6.2414 LearningRate 0.0216 Epoch: 10 Global Step: 443590 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:34:23,965-Speed 2584.27 samples/sec Loss 6.2581 LearningRate 0.0216 Epoch: 10 Global Step: 443600 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:34:27,867-Speed 2625.33 samples/sec Loss 6.1707 LearningRate 0.0216 Epoch: 10 Global Step: 443610 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:34:31,775-Speed 2620.75 samples/sec Loss 6.1285 LearningRate 0.0216 Epoch: 10 Global Step: 443620 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:34:35,680-Speed 2622.68 samples/sec Loss 6.1347 LearningRate 0.0216 Epoch: 10 Global Step: 443630 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:34:39,583-Speed 2623.99 samples/sec Loss 6.2805 LearningRate 0.0216 Epoch: 10 Global Step: 443640 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:34:43,485-Speed 2625.86 samples/sec Loss 6.2486 LearningRate 0.0216 Epoch: 10 Global Step: 443650 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:34:47,404-Speed 2613.33 samples/sec Loss 6.1463 LearningRate 0.0216 Epoch: 10 Global Step: 443660 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:34:51,457-Speed 2527.05 samples/sec Loss 6.3717 LearningRate 0.0216 Epoch: 10 Global Step: 443670 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:34:55,371-Speed 2617.14 samples/sec Loss 6.2339 LearningRate 0.0216 Epoch: 10 Global Step: 443680 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:34:59,273-Speed 2625.43 samples/sec Loss 6.2556 LearningRate 0.0216 Epoch: 10 Global Step: 443690 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:35:03,212-Speed 2599.98 samples/sec Loss 6.1084 LearningRate 0.0216 Epoch: 10 Global Step: 443700 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:35:07,090-Speed 2641.25 samples/sec Loss 6.0297 LearningRate 0.0216 Epoch: 10 Global Step: 443710 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:35:10,990-Speed 2626.00 samples/sec Loss 6.1183 LearningRate 0.0216 Epoch: 10 Global Step: 443720 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:35:14,889-Speed 2626.73 samples/sec Loss 6.2072 LearningRate 0.0216 Epoch: 10 Global Step: 443730 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:35:18,805-Speed 2616.20 samples/sec Loss 6.2134 LearningRate 0.0216 Epoch: 10 Global Step: 443740 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:35:22,703-Speed 2627.37 samples/sec Loss 6.1941 LearningRate 0.0216 Epoch: 10 Global Step: 443750 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:35:26,603-Speed 2626.83 samples/sec Loss 6.2660 LearningRate 0.0216 Epoch: 10 Global Step: 443760 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:35:30,501-Speed 2627.64 samples/sec Loss 6.1234 LearningRate 0.0216 Epoch: 10 Global Step: 443770 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:35:34,398-Speed 2628.02 samples/sec Loss 6.1876 LearningRate 0.0216 Epoch: 10 Global Step: 443780 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:35:38,302-Speed 2623.77 samples/sec Loss 6.1357 LearningRate 0.0216 Epoch: 10 Global Step: 443790 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:35:42,200-Speed 2627.79 samples/sec Loss 6.2326 LearningRate 0.0216 Epoch: 10 Global Step: 443800 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:35:46,111-Speed 2618.75 samples/sec Loss 6.2187 LearningRate 0.0216 Epoch: 10 Global Step: 443810 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:35:50,026-Speed 2616.69 samples/sec Loss 6.2239 LearningRate 0.0216 Epoch: 10 Global Step: 443820 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:35:53,967-Speed 2598.67 samples/sec Loss 6.2676 LearningRate 0.0216 Epoch: 10 Global Step: 443830 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:35:58,062-Speed 2501.94 samples/sec Loss 6.1349 LearningRate 0.0216 Epoch: 10 Global Step: 443840 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:36:01,962-Speed 2626.18 samples/sec Loss 6.0763 LearningRate 0.0216 Epoch: 10 Global Step: 443850 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:36:05,862-Speed 2625.93 samples/sec Loss 6.0397 LearningRate 0.0216 Epoch: 10 Global Step: 443860 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:36:09,762-Speed 2626.48 samples/sec Loss 6.2248 LearningRate 0.0216 Epoch: 10 Global Step: 443870 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:36:13,670-Speed 2620.62 samples/sec Loss 6.2397 LearningRate 0.0216 Epoch: 10 Global Step: 443880 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:36:17,572-Speed 2624.96 samples/sec Loss 6.1135 LearningRate 0.0216 Epoch: 10 Global Step: 443890 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:36:21,484-Speed 2618.15 samples/sec Loss 6.1284 LearningRate 0.0216 Epoch: 10 Global Step: 443900 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:36:25,394-Speed 2619.44 samples/sec Loss 6.1813 LearningRate 0.0216 Epoch: 10 Global Step: 443910 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:36:29,335-Speed 2599.66 samples/sec Loss 6.1610 LearningRate 0.0216 Epoch: 10 Global Step: 443920 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:36:33,293-Speed 2587.90 samples/sec Loss 6.2188 LearningRate 0.0216 Epoch: 10 Global Step: 443930 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:36:37,196-Speed 2623.93 samples/sec Loss 6.2222 LearningRate 0.0216 Epoch: 10 Global Step: 443940 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:36:41,114-Speed 2614.14 samples/sec Loss 6.1625 LearningRate 0.0216 Epoch: 10 Global Step: 443950 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:36:45,087-Speed 2578.06 samples/sec Loss 6.1089 LearningRate 0.0216 Epoch: 10 Global Step: 443960 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:36:49,024-Speed 2601.89 samples/sec Loss 6.1653 LearningRate 0.0216 Epoch: 10 Global Step: 443970 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:36:52,928-Speed 2623.96 samples/sec Loss 6.2642 LearningRate 0.0216 Epoch: 10 Global Step: 443980 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:36:56,834-Speed 2622.02 samples/sec Loss 6.1206 LearningRate 0.0216 Epoch: 10 Global Step: 443990 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:37:00,752-Speed 2614.47 samples/sec Loss 6.1512 LearningRate 0.0216 Epoch: 10 Global Step: 444000 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:37:04,723-Speed 2578.68 samples/sec Loss 6.0974 LearningRate 0.0216 Epoch: 10 Global Step: 444010 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:37:08,630-Speed 2622.17 samples/sec Loss 6.2652 LearningRate 0.0216 Epoch: 10 Global Step: 444020 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:37:12,548-Speed 2614.09 samples/sec Loss 6.2765 LearningRate 0.0216 Epoch: 10 Global Step: 444030 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:37:16,448-Speed 2626.15 samples/sec Loss 6.3965 LearningRate 0.0216 Epoch: 10 Global Step: 444040 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:37:20,356-Speed 2621.20 samples/sec Loss 6.1840 LearningRate 0.0216 Epoch: 10 Global Step: 444050 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:37:24,262-Speed 2622.52 samples/sec Loss 6.2730 LearningRate 0.0216 Epoch: 10 Global Step: 444060 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:37:28,175-Speed 2617.26 samples/sec Loss 6.1877 LearningRate 0.0216 Epoch: 10 Global Step: 444070 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:37:32,093-Speed 2614.03 samples/sec Loss 6.2499 LearningRate 0.0216 Epoch: 10 Global Step: 444080 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:37:35,996-Speed 2623.91 samples/sec Loss 6.1881 LearningRate 0.0216 Epoch: 10 Global Step: 444090 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:37:39,945-Speed 2593.99 samples/sec Loss 6.1596 LearningRate 0.0216 Epoch: 10 Global Step: 444100 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:37:43,831-Speed 2636.27 samples/sec Loss 6.2810 LearningRate 0.0216 Epoch: 10 Global Step: 444110 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:37:47,762-Speed 2605.53 samples/sec Loss 6.2282 LearningRate 0.0216 Epoch: 10 Global Step: 444120 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:37:51,670-Speed 2621.12 samples/sec Loss 6.1713 LearningRate 0.0216 Epoch: 10 Global Step: 444130 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:37:55,577-Speed 2621.75 samples/sec Loss 6.2775 LearningRate 0.0216 Epoch: 10 Global Step: 444140 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:37:59,484-Speed 2621.62 samples/sec Loss 6.2052 LearningRate 0.0216 Epoch: 10 Global Step: 444150 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:38:03,504-Speed 2547.70 samples/sec Loss 6.1945 LearningRate 0.0216 Epoch: 10 Global Step: 444160 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:38:07,481-Speed 2575.30 samples/sec Loss 6.1755 LearningRate 0.0216 Epoch: 10 Global Step: 444170 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:38:11,596-Speed 2489.32 samples/sec Loss 6.2827 LearningRate 0.0216 Epoch: 10 Global Step: 444180 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:38:15,504-Speed 2621.47 samples/sec Loss 6.1781 LearningRate 0.0216 Epoch: 10 Global Step: 444190 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:38:19,410-Speed 2622.35 samples/sec Loss 6.2590 LearningRate 0.0216 Epoch: 10 Global Step: 444200 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:38:23,302-Speed 2631.63 samples/sec Loss 6.0587 LearningRate 0.0216 Epoch: 10 Global Step: 444210 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:38:27,210-Speed 2620.79 samples/sec Loss 6.2748 LearningRate 0.0216 Epoch: 10 Global Step: 444220 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:38:31,129-Speed 2613.00 samples/sec Loss 6.2406 LearningRate 0.0216 Epoch: 10 Global Step: 444230 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:38:35,044-Speed 2616.61 samples/sec Loss 6.2057 LearningRate 0.0216 Epoch: 10 Global Step: 444240 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:38:38,947-Speed 2624.02 samples/sec Loss 6.1972 LearningRate 0.0216 Epoch: 10 Global Step: 444250 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:38:42,890-Speed 2597.92 samples/sec Loss 6.2737 LearningRate 0.0216 Epoch: 10 Global Step: 444260 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:38:46,996-Speed 2494.47 samples/sec Loss 6.1776 LearningRate 0.0216 Epoch: 10 Global Step: 444270 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:38:51,069-Speed 2514.73 samples/sec Loss 6.3341 LearningRate 0.0216 Epoch: 10 Global Step: 444280 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:38:55,035-Speed 2582.81 samples/sec Loss 6.3376 LearningRate 0.0216 Epoch: 10 Global Step: 444290 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:38:58,935-Speed 2626.58 samples/sec Loss 6.2608 LearningRate 0.0216 Epoch: 10 Global Step: 444300 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:39:02,835-Speed 2626.02 samples/sec Loss 6.1664 LearningRate 0.0216 Epoch: 10 Global Step: 444310 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:39:06,738-Speed 2624.34 samples/sec Loss 6.1427 LearningRate 0.0216 Epoch: 10 Global Step: 444320 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:39:10,650-Speed 2617.88 samples/sec Loss 6.1191 LearningRate 0.0216 Epoch: 10 Global Step: 444330 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:39:14,551-Speed 2626.43 samples/sec Loss 6.1214 LearningRate 0.0216 Epoch: 10 Global Step: 444340 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:39:18,451-Speed 2626.42 samples/sec Loss 6.1316 LearningRate 0.0216 Epoch: 10 Global Step: 444350 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:39:22,353-Speed 2624.38 samples/sec Loss 6.1151 LearningRate 0.0216 Epoch: 10 Global Step: 444360 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:39:26,257-Speed 2623.91 samples/sec Loss 6.2526 LearningRate 0.0216 Epoch: 10 Global Step: 444370 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:39:30,158-Speed 2625.99 samples/sec Loss 6.1480 LearningRate 0.0216 Epoch: 10 Global Step: 444380 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:39:34,061-Speed 2623.91 samples/sec Loss 6.2300 LearningRate 0.0216 Epoch: 10 Global Step: 444390 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:39:37,968-Speed 2621.80 samples/sec Loss 6.2183 LearningRate 0.0216 Epoch: 10 Global Step: 444400 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:39:41,868-Speed 2626.52 samples/sec Loss 6.2500 LearningRate 0.0216 Epoch: 10 Global Step: 444410 Fp16 Grad Scale: 262144 Required: 43 hours
Training: 2022-04-14 21:39:45,766-Speed 2627.34 samples/sec Loss 6.1995 LearningRate 0.0216 Epoch: 10 Global Step: 444420 Fp16 Grad Scale: 262144 Required: 43 hours
Training: 2022-04-14 21:39:49,691-Speed 2609.78 samples/sec Loss 6.0802 LearningRate 0.0216 Epoch: 10 Global Step: 444430 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:39:53,590-Speed 2627.01 samples/sec Loss 6.1771 LearningRate 0.0216 Epoch: 10 Global Step: 444440 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:39:57,487-Speed 2628.77 samples/sec Loss 6.2122 LearningRate 0.0216 Epoch: 10 Global Step: 444450 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:40:01,362-Speed 2642.59 samples/sec Loss 6.2879 LearningRate 0.0216 Epoch: 10 Global Step: 444460 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:40:05,265-Speed 2624.55 samples/sec Loss 6.2589 LearningRate 0.0216 Epoch: 10 Global Step: 444470 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:40:09,178-Speed 2617.27 samples/sec Loss 6.1461 LearningRate 0.0215 Epoch: 10 Global Step: 444480 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:40:13,082-Speed 2624.11 samples/sec Loss 6.2286 LearningRate 0.0215 Epoch: 10 Global Step: 444490 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:40:16,994-Speed 2618.04 samples/sec Loss 6.1780 LearningRate 0.0215 Epoch: 10 Global Step: 444500 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:40:20,894-Speed 2626.46 samples/sec Loss 6.1744 LearningRate 0.0215 Epoch: 10 Global Step: 444510 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:40:24,799-Speed 2623.29 samples/sec Loss 6.2660 LearningRate 0.0215 Epoch: 10 Global Step: 444520 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:40:28,705-Speed 2621.91 samples/sec Loss 6.2576 LearningRate 0.0215 Epoch: 10 Global Step: 444530 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:40:32,612-Speed 2621.65 samples/sec Loss 6.2915 LearningRate 0.0215 Epoch: 10 Global Step: 444540 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:40:36,533-Speed 2611.90 samples/sec Loss 6.1329 LearningRate 0.0215 Epoch: 10 Global Step: 444550 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:40:40,444-Speed 2619.15 samples/sec Loss 6.2283 LearningRate 0.0215 Epoch: 10 Global Step: 444560 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:40:44,357-Speed 2618.40 samples/sec Loss 6.1363 LearningRate 0.0215 Epoch: 10 Global Step: 444570 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:40:48,265-Speed 2620.69 samples/sec Loss 6.1977 LearningRate 0.0215 Epoch: 10 Global Step: 444580 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:40:52,167-Speed 2625.32 samples/sec Loss 6.1737 LearningRate 0.0215 Epoch: 10 Global Step: 444590 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:40:56,079-Speed 2617.87 samples/sec Loss 6.2708 LearningRate 0.0215 Epoch: 10 Global Step: 444600 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:40:59,991-Speed 2619.04 samples/sec Loss 6.1756 LearningRate 0.0215 Epoch: 10 Global Step: 444610 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:41:03,894-Speed 2623.78 samples/sec Loss 6.1277 LearningRate 0.0215 Epoch: 10 Global Step: 444620 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:41:07,823-Speed 2606.61 samples/sec Loss 6.1541 LearningRate 0.0215 Epoch: 10 Global Step: 444630 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:41:11,729-Speed 2622.04 samples/sec Loss 6.3054 LearningRate 0.0215 Epoch: 10 Global Step: 444640 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:41:15,632-Speed 2624.28 samples/sec Loss 6.1334 LearningRate 0.0215 Epoch: 10 Global Step: 444650 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:41:19,531-Speed 2627.10 samples/sec Loss 6.1757 LearningRate 0.0215 Epoch: 10 Global Step: 444660 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:41:23,435-Speed 2623.64 samples/sec Loss 6.1132 LearningRate 0.0215 Epoch: 10 Global Step: 444670 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:41:27,341-Speed 2622.43 samples/sec Loss 6.2071 LearningRate 0.0215 Epoch: 10 Global Step: 444680 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:41:31,238-Speed 2628.60 samples/sec Loss 6.2342 LearningRate 0.0215 Epoch: 10 Global Step: 444690 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:41:35,134-Speed 2628.36 samples/sec Loss 6.2399 LearningRate 0.0215 Epoch: 10 Global Step: 444700 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:41:39,034-Speed 2626.51 samples/sec Loss 6.1407 LearningRate 0.0215 Epoch: 10 Global Step: 444710 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:41:42,930-Speed 2628.56 samples/sec Loss 6.1503 LearningRate 0.0215 Epoch: 10 Global Step: 444720 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:41:46,827-Speed 2628.55 samples/sec Loss 6.1981 LearningRate 0.0215 Epoch: 10 Global Step: 444730 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:41:50,725-Speed 2627.89 samples/sec Loss 6.1966 LearningRate 0.0215 Epoch: 10 Global Step: 444740 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:41:54,623-Speed 2627.98 samples/sec Loss 6.2024 LearningRate 0.0215 Epoch: 10 Global Step: 444750 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:41:58,529-Speed 2621.84 samples/sec Loss 6.1531 LearningRate 0.0215 Epoch: 10 Global Step: 444760 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:42:02,629-Speed 2498.90 samples/sec Loss 6.2196 LearningRate 0.0215 Epoch: 10 Global Step: 444770 Fp16 Grad Scale: 262144 Required: 43 hours
Training: 2022-04-14 21:42:06,547-Speed 2614.16 samples/sec Loss 6.1598 LearningRate 0.0215 Epoch: 10 Global Step: 444780 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:42:10,466-Speed 2613.55 samples/sec Loss 6.2074 LearningRate 0.0215 Epoch: 10 Global Step: 444790 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:42:14,360-Speed 2630.50 samples/sec Loss 6.2175 LearningRate 0.0215 Epoch: 10 Global Step: 444800 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:42:18,260-Speed 2626.77 samples/sec Loss 6.2519 LearningRate 0.0215 Epoch: 10 Global Step: 444810 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:42:22,166-Speed 2621.64 samples/sec Loss 6.2667 LearningRate 0.0215 Epoch: 10 Global Step: 444820 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:42:26,067-Speed 2625.32 samples/sec Loss 6.2488 LearningRate 0.0215 Epoch: 10 Global Step: 444830 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:42:29,967-Speed 2627.08 samples/sec Loss 6.1777 LearningRate 0.0215 Epoch: 10 Global Step: 444840 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:42:33,916-Speed 2594.02 samples/sec Loss 6.1872 LearningRate 0.0215 Epoch: 10 Global Step: 444850 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:42:37,811-Speed 2629.30 samples/sec Loss 6.1634 LearningRate 0.0215 Epoch: 10 Global Step: 444860 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:42:41,721-Speed 2619.65 samples/sec Loss 6.1557 LearningRate 0.0215 Epoch: 10 Global Step: 444870 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:42:45,641-Speed 2612.84 samples/sec Loss 6.2923 LearningRate 0.0215 Epoch: 10 Global Step: 444880 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:42:49,535-Speed 2630.56 samples/sec Loss 6.1419 LearningRate 0.0215 Epoch: 10 Global Step: 444890 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:42:53,441-Speed 2622.60 samples/sec Loss 6.2459 LearningRate 0.0215 Epoch: 10 Global Step: 444900 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:42:57,317-Speed 2642.33 samples/sec Loss 6.3260 LearningRate 0.0215 Epoch: 10 Global Step: 444910 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:43:01,216-Speed 2626.86 samples/sec Loss 6.2122 LearningRate 0.0215 Epoch: 10 Global Step: 444920 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:43:05,120-Speed 2623.08 samples/sec Loss 6.3010 LearningRate 0.0215 Epoch: 10 Global Step: 444930 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:43:09,079-Speed 2587.38 samples/sec Loss 6.1853 LearningRate 0.0215 Epoch: 10 Global Step: 444940 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:43:13,172-Speed 2502.68 samples/sec Loss 6.1244 LearningRate 0.0215 Epoch: 10 Global Step: 444950 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:43:17,110-Speed 2600.72 samples/sec Loss 6.2879 LearningRate 0.0215 Epoch: 10 Global Step: 444960 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:43:21,004-Speed 2630.85 samples/sec Loss 6.1935 LearningRate 0.0215 Epoch: 10 Global Step: 444970 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:43:24,904-Speed 2625.95 samples/sec Loss 6.3598 LearningRate 0.0215 Epoch: 10 Global Step: 444980 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:43:28,812-Speed 2620.98 samples/sec Loss 6.1939 LearningRate 0.0215 Epoch: 10 Global Step: 444990 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:43:32,710-Speed 2627.82 samples/sec Loss 6.1865 LearningRate 0.0215 Epoch: 10 Global Step: 445000 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:43:36,617-Speed 2621.61 samples/sec Loss 6.1984 LearningRate 0.0215 Epoch: 10 Global Step: 445010 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:43:40,514-Speed 2628.04 samples/sec Loss 6.1984 LearningRate 0.0215 Epoch: 10 Global Step: 445020 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:43:44,411-Speed 2628.94 samples/sec Loss 6.2517 LearningRate 0.0215 Epoch: 10 Global Step: 445030 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:43:48,311-Speed 2626.25 samples/sec Loss 6.2154 LearningRate 0.0215 Epoch: 10 Global Step: 445040 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:43:52,206-Speed 2630.05 samples/sec Loss 6.0990 LearningRate 0.0215 Epoch: 10 Global Step: 445050 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:43:56,127-Speed 2611.60 samples/sec Loss 6.1454 LearningRate 0.0215 Epoch: 10 Global Step: 445060 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:44:00,026-Speed 2627.46 samples/sec Loss 6.2317 LearningRate 0.0215 Epoch: 10 Global Step: 445070 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:44:03,924-Speed 2627.25 samples/sec Loss 6.2336 LearningRate 0.0215 Epoch: 10 Global Step: 445080 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:44:07,804-Speed 2640.05 samples/sec Loss 6.2047 LearningRate 0.0215 Epoch: 10 Global Step: 445090 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:44:11,700-Speed 2629.03 samples/sec Loss 6.1857 LearningRate 0.0215 Epoch: 10 Global Step: 445100 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:44:15,599-Speed 2627.00 samples/sec Loss 6.1701 LearningRate 0.0215 Epoch: 10 Global Step: 445110 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:44:19,504-Speed 2623.47 samples/sec Loss 6.1713 LearningRate 0.0215 Epoch: 10 Global Step: 445120 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:44:23,398-Speed 2629.87 samples/sec Loss 6.1499 LearningRate 0.0215 Epoch: 10 Global Step: 445130 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:44:27,305-Speed 2621.97 samples/sec Loss 6.1475 LearningRate 0.0215 Epoch: 10 Global Step: 445140 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:44:31,213-Speed 2620.41 samples/sec Loss 6.1309 LearningRate 0.0215 Epoch: 10 Global Step: 445150 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:44:35,106-Speed 2631.52 samples/sec Loss 6.1717 LearningRate 0.0215 Epoch: 10 Global Step: 445160 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:44:38,998-Speed 2631.26 samples/sec Loss 6.1763 LearningRate 0.0215 Epoch: 10 Global Step: 445170 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:44:42,904-Speed 2622.85 samples/sec Loss 6.1240 LearningRate 0.0215 Epoch: 10 Global Step: 445180 Fp16 Grad Scale: 32768 Required: 43 hours
Training: 2022-04-14 21:44:46,804-Speed 2625.77 samples/sec Loss 6.3023 LearningRate 0.0215 Epoch: 10 Global Step: 445190 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:44:50,712-Speed 2621.45 samples/sec Loss 6.1623 LearningRate 0.0215 Epoch: 10 Global Step: 445200 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:44:54,617-Speed 2622.62 samples/sec Loss 6.2072 LearningRate 0.0215 Epoch: 10 Global Step: 445210 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:44:58,541-Speed 2610.55 samples/sec Loss 6.0940 LearningRate 0.0215 Epoch: 10 Global Step: 445220 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:45:02,442-Speed 2625.84 samples/sec Loss 6.2095 LearningRate 0.0215 Epoch: 10 Global Step: 445230 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:45:06,338-Speed 2628.73 samples/sec Loss 6.1290 LearningRate 0.0215 Epoch: 10 Global Step: 445240 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:45:10,238-Speed 2626.20 samples/sec Loss 6.2321 LearningRate 0.0215 Epoch: 10 Global Step: 445250 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:45:14,155-Speed 2615.30 samples/sec Loss 6.2308 LearningRate 0.0215 Epoch: 10 Global Step: 445260 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:45:18,064-Speed 2620.16 samples/sec Loss 6.1790 LearningRate 0.0215 Epoch: 10 Global Step: 445270 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:45:21,964-Speed 2626.72 samples/sec Loss 6.1172 LearningRate 0.0215 Epoch: 10 Global Step: 445280 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:45:25,869-Speed 2622.63 samples/sec Loss 6.1877 LearningRate 0.0215 Epoch: 10 Global Step: 445290 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:45:29,769-Speed 2626.30 samples/sec Loss 6.2610 LearningRate 0.0215 Epoch: 10 Global Step: 445300 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:45:33,647-Speed 2641.23 samples/sec Loss 6.1569 LearningRate 0.0215 Epoch: 10 Global Step: 445310 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:45:37,571-Speed 2610.17 samples/sec Loss 6.1981 LearningRate 0.0215 Epoch: 10 Global Step: 445320 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:45:41,471-Speed 2626.39 samples/sec Loss 6.2043 LearningRate 0.0215 Epoch: 10 Global Step: 445330 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:45:45,369-Speed 2628.36 samples/sec Loss 6.1436 LearningRate 0.0215 Epoch: 10 Global Step: 445340 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:45:49,269-Speed 2626.23 samples/sec Loss 6.1559 LearningRate 0.0215 Epoch: 10 Global Step: 445350 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:45:53,168-Speed 2626.96 samples/sec Loss 6.1487 LearningRate 0.0215 Epoch: 10 Global Step: 445360 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:45:57,061-Speed 2630.50 samples/sec Loss 6.2024 LearningRate 0.0214 Epoch: 10 Global Step: 445370 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:46:00,958-Speed 2628.55 samples/sec Loss 6.1656 LearningRate 0.0214 Epoch: 10 Global Step: 445380 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:46:04,856-Speed 2627.50 samples/sec Loss 6.1226 LearningRate 0.0214 Epoch: 10 Global Step: 445390 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:46:08,756-Speed 2626.13 samples/sec Loss 6.2083 LearningRate 0.0214 Epoch: 10 Global Step: 445400 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:46:12,652-Speed 2629.38 samples/sec Loss 6.1648 LearningRate 0.0214 Epoch: 10 Global Step: 445410 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:46:16,551-Speed 2626.65 samples/sec Loss 6.1429 LearningRate 0.0214 Epoch: 10 Global Step: 445420 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:46:20,444-Speed 2631.28 samples/sec Loss 6.0724 LearningRate 0.0214 Epoch: 10 Global Step: 445430 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:46:24,392-Speed 2594.13 samples/sec Loss 6.0471 LearningRate 0.0214 Epoch: 10 Global Step: 445440 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:46:28,295-Speed 2624.61 samples/sec Loss 6.2152 LearningRate 0.0214 Epoch: 10 Global Step: 445450 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:46:32,194-Speed 2626.99 samples/sec Loss 6.2364 LearningRate 0.0214 Epoch: 10 Global Step: 445460 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:46:36,094-Speed 2626.09 samples/sec Loss 6.1536 LearningRate 0.0214 Epoch: 10 Global Step: 445470 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:46:39,989-Speed 2629.60 samples/sec Loss 6.1739 LearningRate 0.0214 Epoch: 10 Global Step: 445480 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:46:43,884-Speed 2629.56 samples/sec Loss 6.1448 LearningRate 0.0214 Epoch: 10 Global Step: 445490 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:46:47,784-Speed 2626.22 samples/sec Loss 6.1497 LearningRate 0.0214 Epoch: 10 Global Step: 445500 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:46:51,677-Speed 2631.23 samples/sec Loss 6.1058 LearningRate 0.0214 Epoch: 10 Global Step: 445510 Fp16 Grad Scale: 262144 Required: 43 hours
Training: 2022-04-14 21:46:55,562-Speed 2636.15 samples/sec Loss 6.1830 LearningRate 0.0214 Epoch: 10 Global Step: 445520 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:46:59,464-Speed 2625.05 samples/sec Loss 6.1534 LearningRate 0.0214 Epoch: 10 Global Step: 445530 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:47:03,365-Speed 2626.20 samples/sec Loss 6.1069 LearningRate 0.0214 Epoch: 10 Global Step: 445540 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:47:07,273-Speed 2620.81 samples/sec Loss 6.2280 LearningRate 0.0214 Epoch: 10 Global Step: 445550 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:47:11,168-Speed 2628.80 samples/sec Loss 6.2204 LearningRate 0.0214 Epoch: 10 Global Step: 445560 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:47:15,065-Speed 2628.95 samples/sec Loss 6.3008 LearningRate 0.0214 Epoch: 10 Global Step: 445570 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:47:18,960-Speed 2629.89 samples/sec Loss 6.2181 LearningRate 0.0214 Epoch: 10 Global Step: 445580 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:47:22,880-Speed 2612.66 samples/sec Loss 6.1665 LearningRate 0.0214 Epoch: 10 Global Step: 445590 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:47:26,773-Speed 2630.91 samples/sec Loss 6.1979 LearningRate 0.0214 Epoch: 10 Global Step: 445600 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:47:30,676-Speed 2624.77 samples/sec Loss 6.1600 LearningRate 0.0214 Epoch: 10 Global Step: 445610 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:47:34,592-Speed 2615.05 samples/sec Loss 6.1842 LearningRate 0.0214 Epoch: 10 Global Step: 445620 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:47:38,493-Speed 2625.64 samples/sec Loss 6.2095 LearningRate 0.0214 Epoch: 10 Global Step: 445630 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:47:42,605-Speed 2491.35 samples/sec Loss 6.1070 LearningRate 0.0214 Epoch: 10 Global Step: 445640 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:47:46,569-Speed 2583.96 samples/sec Loss 6.2159 LearningRate 0.0214 Epoch: 10 Global Step: 445650 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:47:50,465-Speed 2629.24 samples/sec Loss 6.1919 LearningRate 0.0214 Epoch: 10 Global Step: 445660 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:47:54,361-Speed 2628.74 samples/sec Loss 6.1568 LearningRate 0.0214 Epoch: 10 Global Step: 445670 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:47:58,283-Speed 2611.55 samples/sec Loss 6.2037 LearningRate 0.0214 Epoch: 10 Global Step: 445680 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:48:02,184-Speed 2625.67 samples/sec Loss 6.1001 LearningRate 0.0214 Epoch: 10 Global Step: 445690 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:48:06,109-Speed 2609.68 samples/sec Loss 6.1698 LearningRate 0.0214 Epoch: 10 Global Step: 445700 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:48:10,001-Speed 2631.72 samples/sec Loss 6.2408 LearningRate 0.0214 Epoch: 10 Global Step: 445710 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:48:13,898-Speed 2628.83 samples/sec Loss 6.1505 LearningRate 0.0214 Epoch: 10 Global Step: 445720 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:48:17,807-Speed 2620.05 samples/sec Loss 6.1083 LearningRate 0.0214 Epoch: 10 Global Step: 445730 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:48:21,709-Speed 2624.80 samples/sec Loss 6.3723 LearningRate 0.0214 Epoch: 10 Global Step: 445740 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:48:25,591-Speed 2639.16 samples/sec Loss 6.1664 LearningRate 0.0214 Epoch: 10 Global Step: 445750 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:48:29,511-Speed 2613.40 samples/sec Loss 6.1269 LearningRate 0.0214 Epoch: 10 Global Step: 445760 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:48:33,414-Speed 2623.61 samples/sec Loss 6.0922 LearningRate 0.0214 Epoch: 10 Global Step: 445770 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:48:37,348-Speed 2603.44 samples/sec Loss 6.1919 LearningRate 0.0214 Epoch: 10 Global Step: 445780 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:48:41,249-Speed 2625.63 samples/sec Loss 6.1148 LearningRate 0.0214 Epoch: 10 Global Step: 445790 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:48:45,150-Speed 2626.10 samples/sec Loss 6.1978 LearningRate 0.0214 Epoch: 10 Global Step: 445800 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:48:49,047-Speed 2628.06 samples/sec Loss 6.3134 LearningRate 0.0214 Epoch: 10 Global Step: 445810 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:48:52,945-Speed 2628.23 samples/sec Loss 6.2115 LearningRate 0.0214 Epoch: 10 Global Step: 445820 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:48:56,840-Speed 2629.59 samples/sec Loss 6.1567 LearningRate 0.0214 Epoch: 10 Global Step: 445830 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:49:00,742-Speed 2624.66 samples/sec Loss 6.0891 LearningRate 0.0214 Epoch: 10 Global Step: 445840 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:49:04,642-Speed 2626.16 samples/sec Loss 6.1536 LearningRate 0.0214 Epoch: 10 Global Step: 445850 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:49:08,545-Speed 2624.66 samples/sec Loss 6.2194 LearningRate 0.0214 Epoch: 10 Global Step: 445860 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:49:12,447-Speed 2625.10 samples/sec Loss 6.2649 LearningRate 0.0214 Epoch: 10 Global Step: 445870 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:49:16,343-Speed 2628.81 samples/sec Loss 6.2311 LearningRate 0.0214 Epoch: 10 Global Step: 445880 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:49:20,245-Speed 2624.71 samples/sec Loss 6.2171 LearningRate 0.0214 Epoch: 10 Global Step: 445890 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:49:24,149-Speed 2623.33 samples/sec Loss 6.2201 LearningRate 0.0214 Epoch: 10 Global Step: 445900 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:49:28,049-Speed 2626.94 samples/sec Loss 6.2024 LearningRate 0.0214 Epoch: 10 Global Step: 445910 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:49:31,946-Speed 2627.92 samples/sec Loss 6.2359 LearningRate 0.0214 Epoch: 10 Global Step: 445920 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:49:35,846-Speed 2626.48 samples/sec Loss 6.2394 LearningRate 0.0214 Epoch: 10 Global Step: 445930 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:49:39,744-Speed 2627.05 samples/sec Loss 6.1828 LearningRate 0.0214 Epoch: 10 Global Step: 445940 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:49:43,639-Speed 2630.17 samples/sec Loss 6.1534 LearningRate 0.0214 Epoch: 10 Global Step: 445950 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:49:47,560-Speed 2612.67 samples/sec Loss 6.1575 LearningRate 0.0214 Epoch: 10 Global Step: 445960 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:49:51,449-Speed 2633.69 samples/sec Loss 6.1866 LearningRate 0.0214 Epoch: 10 Global Step: 445970 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:49:55,346-Speed 2628.97 samples/sec Loss 5.9872 LearningRate 0.0214 Epoch: 10 Global Step: 445980 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:49:59,243-Speed 2628.35 samples/sec Loss 6.1829 LearningRate 0.0214 Epoch: 10 Global Step: 445990 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:50:03,189-Speed 2595.55 samples/sec Loss 6.1556 LearningRate 0.0214 Epoch: 10 Global Step: 446000 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:50:07,137-Speed 2594.85 samples/sec Loss 6.2052 LearningRate 0.0214 Epoch: 10 Global Step: 446010 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:50:11,037-Speed 2625.94 samples/sec Loss 6.2777 LearningRate 0.0214 Epoch: 10 Global Step: 446020 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:50:14,932-Speed 2629.92 samples/sec Loss 6.2424 LearningRate 0.0214 Epoch: 10 Global Step: 446030 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:50:18,833-Speed 2625.48 samples/sec Loss 6.0999 LearningRate 0.0214 Epoch: 10 Global Step: 446040 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:50:22,753-Speed 2613.59 samples/sec Loss 6.0786 LearningRate 0.0214 Epoch: 10 Global Step: 446050 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:50:26,651-Speed 2627.54 samples/sec Loss 6.2076 LearningRate 0.0214 Epoch: 10 Global Step: 446060 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:50:30,545-Speed 2630.61 samples/sec Loss 6.1517 LearningRate 0.0214 Epoch: 10 Global Step: 446070 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:50:34,415-Speed 2645.96 samples/sec Loss 6.2502 LearningRate 0.0214 Epoch: 10 Global Step: 446080 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:50:38,310-Speed 2630.03 samples/sec Loss 6.2981 LearningRate 0.0214 Epoch: 10 Global Step: 446090 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:50:42,203-Speed 2630.92 samples/sec Loss 6.1278 LearningRate 0.0214 Epoch: 10 Global Step: 446100 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:50:46,101-Speed 2627.83 samples/sec Loss 6.2101 LearningRate 0.0214 Epoch: 10 Global Step: 446110 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:50:50,014-Speed 2617.47 samples/sec Loss 6.0871 LearningRate 0.0214 Epoch: 10 Global Step: 446120 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:50:53,947-Speed 2604.08 samples/sec Loss 6.2350 LearningRate 0.0214 Epoch: 10 Global Step: 446130 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:50:57,846-Speed 2627.41 samples/sec Loss 6.1836 LearningRate 0.0214 Epoch: 10 Global Step: 446140 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:51:01,745-Speed 2627.20 samples/sec Loss 6.1845 LearningRate 0.0214 Epoch: 10 Global Step: 446150 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:51:05,655-Speed 2619.19 samples/sec Loss 6.2144 LearningRate 0.0214 Epoch: 10 Global Step: 446160 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:51:09,552-Speed 2628.36 samples/sec Loss 6.2288 LearningRate 0.0214 Epoch: 10 Global Step: 446170 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:51:13,440-Speed 2634.44 samples/sec Loss 6.1459 LearningRate 0.0214 Epoch: 10 Global Step: 446180 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:51:17,361-Speed 2612.37 samples/sec Loss 6.1586 LearningRate 0.0214 Epoch: 10 Global Step: 446190 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:51:21,279-Speed 2614.18 samples/sec Loss 6.1395 LearningRate 0.0214 Epoch: 10 Global Step: 446200 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:51:25,177-Speed 2627.28 samples/sec Loss 6.1712 LearningRate 0.0214 Epoch: 10 Global Step: 446210 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:51:29,076-Speed 2627.71 samples/sec Loss 6.0898 LearningRate 0.0214 Epoch: 10 Global Step: 446220 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:51:32,971-Speed 2629.66 samples/sec Loss 6.1631 LearningRate 0.0214 Epoch: 10 Global Step: 446230 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:51:36,872-Speed 2625.54 samples/sec Loss 6.2160 LearningRate 0.0214 Epoch: 10 Global Step: 446240 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:51:40,772-Speed 2626.22 samples/sec Loss 6.2993 LearningRate 0.0214 Epoch: 10 Global Step: 446250 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:51:44,683-Speed 2620.99 samples/sec Loss 6.2238 LearningRate 0.0214 Epoch: 10 Global Step: 446260 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:51:48,600-Speed 2615.06 samples/sec Loss 6.2157 LearningRate 0.0213 Epoch: 10 Global Step: 446270 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:51:52,490-Speed 2633.41 samples/sec Loss 6.1616 LearningRate 0.0213 Epoch: 10 Global Step: 446280 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:51:56,386-Speed 2629.20 samples/sec Loss 6.0972 LearningRate 0.0213 Epoch: 10 Global Step: 446290 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:52:00,285-Speed 2626.76 samples/sec Loss 6.1876 LearningRate 0.0213 Epoch: 10 Global Step: 446300 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:52:04,197-Speed 2618.11 samples/sec Loss 6.1331 LearningRate 0.0213 Epoch: 10 Global Step: 446310 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:52:08,095-Speed 2627.33 samples/sec Loss 6.1010 LearningRate 0.0213 Epoch: 10 Global Step: 446320 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:52:11,990-Speed 2629.93 samples/sec Loss 6.0161 LearningRate 0.0213 Epoch: 10 Global Step: 446330 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:52:15,886-Speed 2628.96 samples/sec Loss 6.1116 LearningRate 0.0213 Epoch: 10 Global Step: 446340 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:52:19,786-Speed 2626.79 samples/sec Loss 6.1025 LearningRate 0.0213 Epoch: 10 Global Step: 446350 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:52:23,680-Speed 2630.29 samples/sec Loss 6.1865 LearningRate 0.0213 Epoch: 10 Global Step: 446360 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:52:27,574-Speed 2630.43 samples/sec Loss 6.1276 LearningRate 0.0213 Epoch: 10 Global Step: 446370 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:52:31,470-Speed 2628.97 samples/sec Loss 6.1694 LearningRate 0.0213 Epoch: 10 Global Step: 446380 Fp16 Grad Scale: 262144 Required: 43 hours
Training: 2022-04-14 21:52:35,354-Speed 2637.10 samples/sec Loss 6.1528 LearningRate 0.0213 Epoch: 10 Global Step: 446390 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:52:39,250-Speed 2628.86 samples/sec Loss 6.1374 LearningRate 0.0213 Epoch: 10 Global Step: 446400 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:52:43,149-Speed 2627.50 samples/sec Loss 6.2802 LearningRate 0.0213 Epoch: 10 Global Step: 446410 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:52:47,045-Speed 2628.71 samples/sec Loss 6.0964 LearningRate 0.0213 Epoch: 10 Global Step: 446420 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:52:50,941-Speed 2629.43 samples/sec Loss 6.2471 LearningRate 0.0213 Epoch: 10 Global Step: 446430 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:52:54,844-Speed 2623.78 samples/sec Loss 6.2182 LearningRate 0.0213 Epoch: 10 Global Step: 446440 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:52:58,746-Speed 2625.83 samples/sec Loss 6.0779 LearningRate 0.0213 Epoch: 10 Global Step: 446450 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:53:02,640-Speed 2630.02 samples/sec Loss 6.1222 LearningRate 0.0213 Epoch: 10 Global Step: 446460 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:53:06,535-Speed 2629.14 samples/sec Loss 6.1199 LearningRate 0.0213 Epoch: 10 Global Step: 446470 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:53:10,441-Speed 2622.73 samples/sec Loss 6.1716 LearningRate 0.0213 Epoch: 10 Global Step: 446480 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:53:14,323-Speed 2638.66 samples/sec Loss 5.9964 LearningRate 0.0213 Epoch: 10 Global Step: 446490 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:53:18,215-Speed 2631.52 samples/sec Loss 6.2176 LearningRate 0.0213 Epoch: 10 Global Step: 446500 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:53:22,161-Speed 2595.74 samples/sec Loss 6.1089 LearningRate 0.0213 Epoch: 10 Global Step: 446510 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:53:26,060-Speed 2627.47 samples/sec Loss 6.0902 LearningRate 0.0213 Epoch: 10 Global Step: 446520 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:53:29,990-Speed 2606.53 samples/sec Loss 6.2556 LearningRate 0.0213 Epoch: 10 Global Step: 446530 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:53:33,887-Speed 2628.08 samples/sec Loss 6.0810 LearningRate 0.0213 Epoch: 10 Global Step: 446540 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:53:37,790-Speed 2624.35 samples/sec Loss 6.3218 LearningRate 0.0213 Epoch: 10 Global Step: 446550 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:53:41,687-Speed 2628.20 samples/sec Loss 6.3294 LearningRate 0.0213 Epoch: 10 Global Step: 446560 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:53:45,582-Speed 2630.02 samples/sec Loss 6.1230 LearningRate 0.0213 Epoch: 10 Global Step: 446570 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:53:49,480-Speed 2627.45 samples/sec Loss 6.1205 LearningRate 0.0213 Epoch: 10 Global Step: 446580 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:53:53,390-Speed 2619.82 samples/sec Loss 6.0927 LearningRate 0.0213 Epoch: 10 Global Step: 446590 Fp16 Grad Scale: 262144 Required: 43 hours
Training: 2022-04-14 21:53:57,276-Speed 2635.23 samples/sec Loss 6.1415 LearningRate 0.0213 Epoch: 10 Global Step: 446600 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:54:01,171-Speed 2630.09 samples/sec Loss 6.1077 LearningRate 0.0213 Epoch: 10 Global Step: 446610 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:54:05,063-Speed 2631.78 samples/sec Loss 6.2628 LearningRate 0.0213 Epoch: 10 Global Step: 446620 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:54:08,959-Speed 2628.80 samples/sec Loss 6.1936 LearningRate 0.0213 Epoch: 10 Global Step: 446630 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:54:12,856-Speed 2628.57 samples/sec Loss 6.1455 LearningRate 0.0213 Epoch: 10 Global Step: 446640 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:54:16,755-Speed 2626.57 samples/sec Loss 6.2573 LearningRate 0.0213 Epoch: 10 Global Step: 446650 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:54:20,661-Speed 2622.22 samples/sec Loss 6.1038 LearningRate 0.0213 Epoch: 10 Global Step: 446660 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:54:24,549-Speed 2634.04 samples/sec Loss 6.1972 LearningRate 0.0213 Epoch: 10 Global Step: 446670 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:54:28,445-Speed 2629.17 samples/sec Loss 6.1404 LearningRate 0.0213 Epoch: 10 Global Step: 446680 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:54:32,346-Speed 2625.53 samples/sec Loss 6.2269 LearningRate 0.0213 Epoch: 10 Global Step: 446690 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:54:36,244-Speed 2627.92 samples/sec Loss 6.1989 LearningRate 0.0213 Epoch: 10 Global Step: 446700 Fp16 Grad Scale: 262144 Required: 43 hours
Training: 2022-04-14 21:54:40,120-Speed 2642.71 samples/sec Loss 6.2312 LearningRate 0.0213 Epoch: 10 Global Step: 446710 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:54:44,018-Speed 2627.37 samples/sec Loss 6.0282 LearningRate 0.0213 Epoch: 10 Global Step: 446720 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:54:47,923-Speed 2623.38 samples/sec Loss 6.1508 LearningRate 0.0213 Epoch: 10 Global Step: 446730 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:54:51,824-Speed 2625.64 samples/sec Loss 6.1395 LearningRate 0.0213 Epoch: 10 Global Step: 446740 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:54:55,722-Speed 2627.78 samples/sec Loss 6.1733 LearningRate 0.0213 Epoch: 10 Global Step: 446750 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:54:59,619-Speed 2628.04 samples/sec Loss 6.1318 LearningRate 0.0213 Epoch: 10 Global Step: 446760 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:55:03,529-Speed 2619.79 samples/sec Loss 6.2256 LearningRate 0.0213 Epoch: 10 Global Step: 446770 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:55:07,428-Speed 2627.16 samples/sec Loss 6.1736 LearningRate 0.0213 Epoch: 10 Global Step: 446780 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:55:11,328-Speed 2626.47 samples/sec Loss 6.1418 LearningRate 0.0213 Epoch: 10 Global Step: 446790 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:55:15,221-Speed 2630.97 samples/sec Loss 6.1064 LearningRate 0.0213 Epoch: 10 Global Step: 446800 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:55:19,116-Speed 2629.65 samples/sec Loss 6.0722 LearningRate 0.0213 Epoch: 10 Global Step: 446810 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:55:23,022-Speed 2622.32 samples/sec Loss 6.2172 LearningRate 0.0213 Epoch: 10 Global Step: 446820 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:55:26,933-Speed 2618.92 samples/sec Loss 6.1796 LearningRate 0.0213 Epoch: 10 Global Step: 446830 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:55:30,963-Speed 2541.66 samples/sec Loss 6.1150 LearningRate 0.0213 Epoch: 10 Global Step: 446840 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:55:34,873-Speed 2619.79 samples/sec Loss 6.1578 LearningRate 0.0213 Epoch: 10 Global Step: 446850 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:55:38,770-Speed 2628.16 samples/sec Loss 6.1696 LearningRate 0.0213 Epoch: 10 Global Step: 446860 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:55:42,781-Speed 2553.65 samples/sec Loss 6.1849 LearningRate 0.0213 Epoch: 10 Global Step: 446870 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:55:46,679-Speed 2627.73 samples/sec Loss 6.2810 LearningRate 0.0213 Epoch: 10 Global Step: 446880 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:55:50,565-Speed 2635.74 samples/sec Loss 6.1965 LearningRate 0.0213 Epoch: 10 Global Step: 446890 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:55:54,462-Speed 2628.62 samples/sec Loss 6.1927 LearningRate 0.0213 Epoch: 10 Global Step: 446900 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:55:58,358-Speed 2628.87 samples/sec Loss 6.0909 LearningRate 0.0213 Epoch: 10 Global Step: 446910 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:56:02,261-Speed 2624.45 samples/sec Loss 6.1538 LearningRate 0.0213 Epoch: 10 Global Step: 446920 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:56:06,181-Speed 2612.28 samples/sec Loss 6.0958 LearningRate 0.0213 Epoch: 10 Global Step: 446930 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:56:10,124-Speed 2597.55 samples/sec Loss 6.0255 LearningRate 0.0213 Epoch: 10 Global Step: 446940 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:56:14,085-Speed 2586.49 samples/sec Loss 6.1464 LearningRate 0.0213 Epoch: 10 Global Step: 446950 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:56:17,982-Speed 2628.21 samples/sec Loss 6.2030 LearningRate 0.0213 Epoch: 10 Global Step: 446960 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:56:21,885-Speed 2624.36 samples/sec Loss 6.1878 LearningRate 0.0213 Epoch: 10 Global Step: 446970 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:56:25,790-Speed 2622.58 samples/sec Loss 6.1187 LearningRate 0.0213 Epoch: 10 Global Step: 446980 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:56:29,698-Speed 2627.58 samples/sec Loss 6.1457 LearningRate 0.0213 Epoch: 10 Global Step: 446990 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:56:33,601-Speed 2624.68 samples/sec Loss 6.1076 LearningRate 0.0213 Epoch: 10 Global Step: 447000 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:56:37,496-Speed 2629.19 samples/sec Loss 6.2448 LearningRate 0.0213 Epoch: 10 Global Step: 447010 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:56:41,388-Speed 2631.62 samples/sec Loss 6.0282 LearningRate 0.0213 Epoch: 10 Global Step: 447020 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:56:45,290-Speed 2625.18 samples/sec Loss 6.1864 LearningRate 0.0213 Epoch: 10 Global Step: 447030 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:56:49,199-Speed 2620.13 samples/sec Loss 6.1514 LearningRate 0.0213 Epoch: 10 Global Step: 447040 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:56:53,096-Speed 2628.86 samples/sec Loss 6.2301 LearningRate 0.0213 Epoch: 10 Global Step: 447050 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:56:56,968-Speed 2645.52 samples/sec Loss 6.3153 LearningRate 0.0213 Epoch: 10 Global Step: 447060 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:57:00,867-Speed 2626.75 samples/sec Loss 6.1864 LearningRate 0.0213 Epoch: 10 Global Step: 447070 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:57:04,771-Speed 2623.58 samples/sec Loss 6.2375 LearningRate 0.0213 Epoch: 10 Global Step: 447080 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:57:08,683-Speed 2618.08 samples/sec Loss 6.1534 LearningRate 0.0213 Epoch: 10 Global Step: 447090 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:57:12,584-Speed 2625.14 samples/sec Loss 6.1922 LearningRate 0.0213 Epoch: 10 Global Step: 447100 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:57:16,478-Speed 2630.99 samples/sec Loss 6.1491 LearningRate 0.0213 Epoch: 10 Global Step: 447110 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:57:20,384-Speed 2622.44 samples/sec Loss 6.2418 LearningRate 0.0213 Epoch: 10 Global Step: 447120 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:57:24,305-Speed 2611.89 samples/sec Loss 6.1160 LearningRate 0.0213 Epoch: 10 Global Step: 447130 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:57:28,237-Speed 2606.16 samples/sec Loss 6.1734 LearningRate 0.0213 Epoch: 10 Global Step: 447140 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:57:32,140-Speed 2624.16 samples/sec Loss 6.1308 LearningRate 0.0213 Epoch: 10 Global Step: 447150 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:57:36,089-Speed 2593.40 samples/sec Loss 6.2334 LearningRate 0.0213 Epoch: 10 Global Step: 447160 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:57:39,993-Speed 2623.70 samples/sec Loss 6.1445 LearningRate 0.0212 Epoch: 10 Global Step: 447170 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:57:43,897-Speed 2623.55 samples/sec Loss 6.2966 LearningRate 0.0212 Epoch: 10 Global Step: 447180 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:57:47,774-Speed 2641.75 samples/sec Loss 6.2205 LearningRate 0.0212 Epoch: 10 Global Step: 447190 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:57:51,670-Speed 2629.98 samples/sec Loss 6.1008 LearningRate 0.0212 Epoch: 10 Global Step: 447200 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:57:55,564-Speed 2629.67 samples/sec Loss 6.0312 LearningRate 0.0212 Epoch: 10 Global Step: 447210 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:57:59,477-Speed 2617.75 samples/sec Loss 6.1871 LearningRate 0.0212 Epoch: 10 Global Step: 447220 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:58:03,399-Speed 2611.41 samples/sec Loss 6.0828 LearningRate 0.0212 Epoch: 10 Global Step: 447230 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:58:07,297-Speed 2627.70 samples/sec Loss 6.0962 LearningRate 0.0212 Epoch: 10 Global Step: 447240 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:58:11,202-Speed 2622.93 samples/sec Loss 6.1927 LearningRate 0.0212 Epoch: 10 Global Step: 447250 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:58:15,096-Speed 2630.60 samples/sec Loss 6.0996 LearningRate 0.0212 Epoch: 10 Global Step: 447260 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:58:18,992-Speed 2629.12 samples/sec Loss 6.1789 LearningRate 0.0212 Epoch: 10 Global Step: 447270 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:58:22,897-Speed 2622.63 samples/sec Loss 6.2451 LearningRate 0.0212 Epoch: 10 Global Step: 447280 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:58:26,797-Speed 2626.39 samples/sec Loss 6.1890 LearningRate 0.0212 Epoch: 10 Global Step: 447290 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:58:30,692-Speed 2629.63 samples/sec Loss 6.0990 LearningRate 0.0212 Epoch: 10 Global Step: 447300 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:58:34,588-Speed 2629.05 samples/sec Loss 6.1252 LearningRate 0.0212 Epoch: 10 Global Step: 447310 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:58:38,489-Speed 2625.50 samples/sec Loss 5.9831 LearningRate 0.0212 Epoch: 10 Global Step: 447320 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:58:42,394-Speed 2622.70 samples/sec Loss 6.2779 LearningRate 0.0212 Epoch: 10 Global Step: 447330 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:58:46,294-Speed 2626.51 samples/sec Loss 6.1591 LearningRate 0.0212 Epoch: 10 Global Step: 447340 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:58:50,189-Speed 2629.43 samples/sec Loss 6.0828 LearningRate 0.0212 Epoch: 10 Global Step: 447350 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:58:54,087-Speed 2628.03 samples/sec Loss 6.1326 LearningRate 0.0212 Epoch: 10 Global Step: 447360 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:58:57,983-Speed 2629.37 samples/sec Loss 6.2138 LearningRate 0.0212 Epoch: 10 Global Step: 447370 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:59:01,969-Speed 2569.15 samples/sec Loss 6.2053 LearningRate 0.0212 Epoch: 10 Global Step: 447380 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:59:05,887-Speed 2614.49 samples/sec Loss 6.1830 LearningRate 0.0212 Epoch: 10 Global Step: 447390 Fp16 Grad Scale: 262144 Required: 43 hours
Training: 2022-04-14 21:59:09,738-Speed 2659.56 samples/sec Loss 6.1740 LearningRate 0.0212 Epoch: 10 Global Step: 447400 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:59:13,636-Speed 2627.88 samples/sec Loss 6.1068 LearningRate 0.0212 Epoch: 10 Global Step: 447410 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:59:17,536-Speed 2625.81 samples/sec Loss 6.1054 LearningRate 0.0212 Epoch: 10 Global Step: 447420 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:59:21,448-Speed 2618.80 samples/sec Loss 6.0488 LearningRate 0.0212 Epoch: 10 Global Step: 447430 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:59:25,343-Speed 2630.14 samples/sec Loss 6.1444 LearningRate 0.0212 Epoch: 10 Global Step: 447440 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:59:29,237-Speed 2629.94 samples/sec Loss 6.1466 LearningRate 0.0212 Epoch: 10 Global Step: 447450 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:59:33,137-Speed 2626.48 samples/sec Loss 6.1086 LearningRate 0.0212 Epoch: 10 Global Step: 447460 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:59:37,035-Speed 2627.77 samples/sec Loss 6.2796 LearningRate 0.0212 Epoch: 10 Global Step: 447470 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:59:40,933-Speed 2627.09 samples/sec Loss 6.2150 LearningRate 0.0212 Epoch: 10 Global Step: 447480 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:59:44,829-Speed 2629.36 samples/sec Loss 6.3322 LearningRate 0.0212 Epoch: 10 Global Step: 447490 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 21:59:48,910-Speed 2509.71 samples/sec Loss 6.1975 LearningRate 0.0212 Epoch: 10 Global Step: 447500 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:59:52,958-Speed 2530.60 samples/sec Loss 6.2265 LearningRate 0.0212 Epoch: 10 Global Step: 447510 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 21:59:56,850-Speed 2631.79 samples/sec Loss 6.1334 LearningRate 0.0212 Epoch: 10 Global Step: 447520 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:00:00,754-Speed 2623.37 samples/sec Loss 6.0342 LearningRate 0.0212 Epoch: 10 Global Step: 447530 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:00:04,659-Speed 2622.68 samples/sec Loss 6.0945 LearningRate 0.0212 Epoch: 10 Global Step: 447540 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:00:08,552-Speed 2630.81 samples/sec Loss 6.1565 LearningRate 0.0212 Epoch: 10 Global Step: 447550 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:00:12,447-Speed 2629.59 samples/sec Loss 6.0949 LearningRate 0.0212 Epoch: 10 Global Step: 447560 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:00:16,342-Speed 2629.73 samples/sec Loss 6.1889 LearningRate 0.0212 Epoch: 10 Global Step: 447570 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:00:20,237-Speed 2629.70 samples/sec Loss 6.1889 LearningRate 0.0212 Epoch: 10 Global Step: 447580 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:00:24,149-Speed 2618.02 samples/sec Loss 6.2200 LearningRate 0.0212 Epoch: 10 Global Step: 447590 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:00:28,038-Speed 2633.63 samples/sec Loss 6.0937 LearningRate 0.0212 Epoch: 10 Global Step: 447600 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:00:31,932-Speed 2631.37 samples/sec Loss 6.1814 LearningRate 0.0212 Epoch: 10 Global Step: 447610 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:00:35,829-Speed 2629.11 samples/sec Loss 6.1798 LearningRate 0.0212 Epoch: 10 Global Step: 447620 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:00:39,713-Speed 2636.77 samples/sec Loss 6.1377 LearningRate 0.0212 Epoch: 10 Global Step: 447630 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:00:43,625-Speed 2618.50 samples/sec Loss 6.1166 LearningRate 0.0212 Epoch: 10 Global Step: 447640 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:00:47,525-Speed 2626.34 samples/sec Loss 6.1289 LearningRate 0.0212 Epoch: 10 Global Step: 447650 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:00:51,446-Speed 2612.32 samples/sec Loss 6.3227 LearningRate 0.0212 Epoch: 10 Global Step: 447660 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:00:55,351-Speed 2623.38 samples/sec Loss 6.1674 LearningRate 0.0212 Epoch: 10 Global Step: 447670 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:00:59,272-Speed 2612.16 samples/sec Loss 6.1318 LearningRate 0.0212 Epoch: 10 Global Step: 447680 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:01:03,172-Speed 2626.75 samples/sec Loss 6.0981 LearningRate 0.0212 Epoch: 10 Global Step: 447690 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:01:07,075-Speed 2623.60 samples/sec Loss 6.2647 LearningRate 0.0212 Epoch: 10 Global Step: 447700 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:01:10,972-Speed 2628.87 samples/sec Loss 6.1506 LearningRate 0.0212 Epoch: 10 Global Step: 447710 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:01:14,878-Speed 2622.05 samples/sec Loss 6.2308 LearningRate 0.0212 Epoch: 10 Global Step: 447720 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:01:18,779-Speed 2625.13 samples/sec Loss 6.2178 LearningRate 0.0212 Epoch: 10 Global Step: 447730 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:01:22,676-Speed 2629.04 samples/sec Loss 6.2305 LearningRate 0.0212 Epoch: 10 Global Step: 447740 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:01:26,571-Speed 2629.75 samples/sec Loss 6.0364 LearningRate 0.0212 Epoch: 10 Global Step: 447750 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:01:30,479-Speed 2621.02 samples/sec Loss 6.1874 LearningRate 0.0212 Epoch: 10 Global Step: 447760 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:01:34,360-Speed 2638.90 samples/sec Loss 6.1182 LearningRate 0.0212 Epoch: 10 Global Step: 447770 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:01:38,255-Speed 2629.53 samples/sec Loss 6.2151 LearningRate 0.0212 Epoch: 10 Global Step: 447780 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:01:42,169-Speed 2616.22 samples/sec Loss 6.1676 LearningRate 0.0212 Epoch: 10 Global Step: 447790 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:01:46,082-Speed 2618.04 samples/sec Loss 6.1846 LearningRate 0.0212 Epoch: 10 Global Step: 447800 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:01:49,980-Speed 2627.60 samples/sec Loss 6.0476 LearningRate 0.0212 Epoch: 10 Global Step: 447810 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:01:53,897-Speed 2615.61 samples/sec Loss 6.2153 LearningRate 0.0212 Epoch: 10 Global Step: 447820 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:01:57,792-Speed 2629.52 samples/sec Loss 6.0965 LearningRate 0.0212 Epoch: 10 Global Step: 447830 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:02:01,738-Speed 2596.19 samples/sec Loss 6.1621 LearningRate 0.0212 Epoch: 10 Global Step: 447840 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:02:05,633-Speed 2629.54 samples/sec Loss 6.1497 LearningRate 0.0212 Epoch: 10 Global Step: 447850 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:02:09,568-Speed 2602.68 samples/sec Loss 6.2454 LearningRate 0.0212 Epoch: 10 Global Step: 447860 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:02:13,461-Speed 2631.16 samples/sec Loss 6.1486 LearningRate 0.0212 Epoch: 10 Global Step: 447870 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:02:17,360-Speed 2627.02 samples/sec Loss 6.1799 LearningRate 0.0212 Epoch: 10 Global Step: 447880 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:02:21,257-Speed 2628.56 samples/sec Loss 6.1020 LearningRate 0.0212 Epoch: 10 Global Step: 447890 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:02:25,158-Speed 2625.00 samples/sec Loss 6.1506 LearningRate 0.0212 Epoch: 10 Global Step: 447900 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:02:29,054-Speed 2630.05 samples/sec Loss 6.2176 LearningRate 0.0212 Epoch: 10 Global Step: 447910 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:02:32,950-Speed 2628.87 samples/sec Loss 6.0663 LearningRate 0.0212 Epoch: 10 Global Step: 447920 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:02:36,850-Speed 2625.58 samples/sec Loss 6.1758 LearningRate 0.0212 Epoch: 10 Global Step: 447930 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:02:40,748-Speed 2627.99 samples/sec Loss 6.0218 LearningRate 0.0212 Epoch: 10 Global Step: 447940 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:02:44,651-Speed 2624.42 samples/sec Loss 6.0969 LearningRate 0.0212 Epoch: 10 Global Step: 447950 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:02:48,553-Speed 2624.80 samples/sec Loss 6.1210 LearningRate 0.0212 Epoch: 10 Global Step: 447960 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:02:52,464-Speed 2619.36 samples/sec Loss 6.0838 LearningRate 0.0212 Epoch: 10 Global Step: 447970 Fp16 Grad Scale: 262144 Required: 43 hours
Training: 2022-04-14 22:02:56,359-Speed 2629.34 samples/sec Loss 6.1267 LearningRate 0.0212 Epoch: 10 Global Step: 447980 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:03:00,269-Speed 2619.70 samples/sec Loss 6.2332 LearningRate 0.0212 Epoch: 10 Global Step: 447990 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:03:04,199-Speed 2606.64 samples/sec Loss 6.0617 LearningRate 0.0212 Epoch: 10 Global Step: 448000 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:03:08,121-Speed 2610.86 samples/sec Loss 6.1764 LearningRate 0.0212 Epoch: 10 Global Step: 448010 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:03:12,022-Speed 2626.22 samples/sec Loss 6.1443 LearningRate 0.0212 Epoch: 10 Global Step: 448020 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:03:15,921-Speed 2627.12 samples/sec Loss 6.1300 LearningRate 0.0212 Epoch: 10 Global Step: 448030 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:03:19,826-Speed 2622.94 samples/sec Loss 6.1077 LearningRate 0.0212 Epoch: 10 Global Step: 448040 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:03:23,729-Speed 2624.30 samples/sec Loss 6.1565 LearningRate 0.0212 Epoch: 10 Global Step: 448050 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:03:27,625-Speed 2628.62 samples/sec Loss 6.1952 LearningRate 0.0212 Epoch: 10 Global Step: 448060 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:03:31,616-Speed 2566.58 samples/sec Loss 6.1345 LearningRate 0.0211 Epoch: 10 Global Step: 448070 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:03:35,606-Speed 2567.05 samples/sec Loss 6.1549 LearningRate 0.0211 Epoch: 10 Global Step: 448080 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:03:39,523-Speed 2615.12 samples/sec Loss 6.0354 LearningRate 0.0211 Epoch: 10 Global Step: 448090 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:03:43,428-Speed 2622.76 samples/sec Loss 6.1940 LearningRate 0.0211 Epoch: 10 Global Step: 448100 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:03:47,325-Speed 2628.56 samples/sec Loss 6.2218 LearningRate 0.0211 Epoch: 10 Global Step: 448110 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:03:51,219-Speed 2630.59 samples/sec Loss 6.1310 LearningRate 0.0211 Epoch: 10 Global Step: 448120 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:03:55,115-Speed 2628.99 samples/sec Loss 6.1306 LearningRate 0.0211 Epoch: 10 Global Step: 448130 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:03:59,017-Speed 2625.65 samples/sec Loss 6.1490 LearningRate 0.0211 Epoch: 10 Global Step: 448140 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:04:02,935-Speed 2613.72 samples/sec Loss 6.1884 LearningRate 0.0211 Epoch: 10 Global Step: 448150 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:04:06,840-Speed 2623.03 samples/sec Loss 6.2318 LearningRate 0.0211 Epoch: 10 Global Step: 448160 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:04:10,740-Speed 2626.39 samples/sec Loss 6.1406 LearningRate 0.0211 Epoch: 10 Global Step: 448170 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:04:14,640-Speed 2626.05 samples/sec Loss 6.0738 LearningRate 0.0211 Epoch: 10 Global Step: 448180 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:04:18,643-Speed 2559.04 samples/sec Loss 6.1540 LearningRate 0.0211 Epoch: 10 Global Step: 448190 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:04:22,540-Speed 2628.15 samples/sec Loss 6.0157 LearningRate 0.0211 Epoch: 10 Global Step: 448200 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:04:26,443-Speed 2625.14 samples/sec Loss 6.1330 LearningRate 0.0211 Epoch: 10 Global Step: 448210 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:04:30,338-Speed 2629.60 samples/sec Loss 6.0949 LearningRate 0.0211 Epoch: 10 Global Step: 448220 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:04:34,237-Speed 2626.77 samples/sec Loss 6.1839 LearningRate 0.0211 Epoch: 10 Global Step: 448230 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:04:38,113-Speed 2642.21 samples/sec Loss 6.2369 LearningRate 0.0211 Epoch: 10 Global Step: 448240 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:04:42,015-Speed 2625.09 samples/sec Loss 6.1106 LearningRate 0.0211 Epoch: 10 Global Step: 448250 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:04:45,915-Speed 2626.17 samples/sec Loss 6.2154 LearningRate 0.0211 Epoch: 10 Global Step: 448260 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:04:49,811-Speed 2629.18 samples/sec Loss 6.0911 LearningRate 0.0211 Epoch: 10 Global Step: 448270 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:04:53,703-Speed 2631.35 samples/sec Loss 6.1506 LearningRate 0.0211 Epoch: 10 Global Step: 448280 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:04:57,597-Speed 2630.52 samples/sec Loss 6.1593 LearningRate 0.0211 Epoch: 10 Global Step: 448290 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:05:01,492-Speed 2630.09 samples/sec Loss 6.1858 LearningRate 0.0211 Epoch: 10 Global Step: 448300 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:05:05,383-Speed 2632.14 samples/sec Loss 6.1578 LearningRate 0.0211 Epoch: 10 Global Step: 448310 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:05:09,274-Speed 2632.32 samples/sec Loss 6.1946 LearningRate 0.0211 Epoch: 10 Global Step: 448320 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:05:13,169-Speed 2629.90 samples/sec Loss 6.1663 LearningRate 0.0211 Epoch: 10 Global Step: 448330 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:05:17,074-Speed 2622.31 samples/sec Loss 6.1291 LearningRate 0.0211 Epoch: 10 Global Step: 448340 Fp16 Grad Scale: 262144 Required: 43 hours
Training: 2022-04-14 22:05:20,951-Speed 2642.28 samples/sec Loss 6.1600 LearningRate 0.0211 Epoch: 10 Global Step: 448350 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:05:24,850-Speed 2626.87 samples/sec Loss 6.1204 LearningRate 0.0211 Epoch: 10 Global Step: 448360 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:05:28,748-Speed 2628.53 samples/sec Loss 6.1050 LearningRate 0.0211 Epoch: 10 Global Step: 448370 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:05:32,643-Speed 2628.97 samples/sec Loss 6.2325 LearningRate 0.0211 Epoch: 10 Global Step: 448380 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:05:36,534-Speed 2632.73 samples/sec Loss 5.9927 LearningRate 0.0211 Epoch: 10 Global Step: 448390 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:05:40,426-Speed 2631.43 samples/sec Loss 6.0963 LearningRate 0.0211 Epoch: 10 Global Step: 448400 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:05:44,320-Speed 2630.07 samples/sec Loss 6.2191 LearningRate 0.0211 Epoch: 10 Global Step: 448410 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:05:48,213-Speed 2630.70 samples/sec Loss 6.0894 LearningRate 0.0211 Epoch: 10 Global Step: 448420 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:05:52,116-Speed 2624.96 samples/sec Loss 6.1907 LearningRate 0.0211 Epoch: 10 Global Step: 448430 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:05:56,016-Speed 2626.56 samples/sec Loss 6.2026 LearningRate 0.0211 Epoch: 10 Global Step: 448440 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:05:59,900-Speed 2636.92 samples/sec Loss 6.2214 LearningRate 0.0211 Epoch: 10 Global Step: 448450 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:06:03,802-Speed 2625.27 samples/sec Loss 6.0636 LearningRate 0.0211 Epoch: 10 Global Step: 448460 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:06:07,703-Speed 2625.38 samples/sec Loss 6.1555 LearningRate 0.0211 Epoch: 10 Global Step: 448470 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:06:11,601-Speed 2627.74 samples/sec Loss 6.1747 LearningRate 0.0211 Epoch: 10 Global Step: 448480 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:06:15,493-Speed 2631.47 samples/sec Loss 6.0965 LearningRate 0.0211 Epoch: 10 Global Step: 448490 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:06:19,362-Speed 2647.69 samples/sec Loss 6.0821 LearningRate 0.0211 Epoch: 10 Global Step: 448500 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:06:23,253-Speed 2632.21 samples/sec Loss 6.0803 LearningRate 0.0211 Epoch: 10 Global Step: 448510 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:06:27,145-Speed 2632.00 samples/sec Loss 6.1537 LearningRate 0.0211 Epoch: 10 Global Step: 448520 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:06:31,044-Speed 2626.84 samples/sec Loss 6.1995 LearningRate 0.0211 Epoch: 10 Global Step: 448530 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:06:34,942-Speed 2627.30 samples/sec Loss 6.1048 LearningRate 0.0211 Epoch: 10 Global Step: 448540 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:06:38,840-Speed 2627.66 samples/sec Loss 6.1372 LearningRate 0.0211 Epoch: 10 Global Step: 448550 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:06:42,734-Speed 2630.56 samples/sec Loss 6.0801 LearningRate 0.0211 Epoch: 10 Global Step: 448560 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:06:46,628-Speed 2630.33 samples/sec Loss 6.1840 LearningRate 0.0211 Epoch: 10 Global Step: 448570 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:06:50,522-Speed 2630.47 samples/sec Loss 6.1730 LearningRate 0.0211 Epoch: 10 Global Step: 448580 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:06:54,423-Speed 2626.00 samples/sec Loss 5.9872 LearningRate 0.0211 Epoch: 10 Global Step: 448590 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:06:58,317-Speed 2630.11 samples/sec Loss 5.9542 LearningRate 0.0211 Epoch: 10 Global Step: 448600 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:07:02,211-Speed 2630.19 samples/sec Loss 6.2262 LearningRate 0.0211 Epoch: 10 Global Step: 448610 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:07:06,126-Speed 2615.66 samples/sec Loss 6.0939 LearningRate 0.0211 Epoch: 10 Global Step: 448620 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:07:10,041-Speed 2616.83 samples/sec Loss 6.1425 LearningRate 0.0211 Epoch: 10 Global Step: 448630 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:07:13,934-Speed 2630.93 samples/sec Loss 6.1531 LearningRate 0.0211 Epoch: 10 Global Step: 448640 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:07:17,813-Speed 2640.73 samples/sec Loss 6.1913 LearningRate 0.0211 Epoch: 10 Global Step: 448650 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:07:21,721-Speed 2621.66 samples/sec Loss 6.0586 LearningRate 0.0211 Epoch: 10 Global Step: 448660 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:07:25,617-Speed 2628.86 samples/sec Loss 6.1063 LearningRate 0.0211 Epoch: 10 Global Step: 448670 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:07:29,513-Speed 2629.03 samples/sec Loss 6.1356 LearningRate 0.0211 Epoch: 10 Global Step: 448680 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:07:33,408-Speed 2629.49 samples/sec Loss 6.1833 LearningRate 0.0211 Epoch: 10 Global Step: 448690 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:07:37,310-Speed 2624.75 samples/sec Loss 6.0051 LearningRate 0.0211 Epoch: 10 Global Step: 448700 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:07:41,221-Speed 2618.76 samples/sec Loss 6.2235 LearningRate 0.0211 Epoch: 10 Global Step: 448710 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:07:45,117-Speed 2629.41 samples/sec Loss 6.1948 LearningRate 0.0211 Epoch: 10 Global Step: 448720 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:07:49,015-Speed 2627.57 samples/sec Loss 6.0262 LearningRate 0.0211 Epoch: 10 Global Step: 448730 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:07:52,914-Speed 2627.69 samples/sec Loss 6.0735 LearningRate 0.0211 Epoch: 10 Global Step: 448740 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:07:56,810-Speed 2628.76 samples/sec Loss 6.1126 LearningRate 0.0211 Epoch: 10 Global Step: 448750 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:08:00,713-Speed 2624.94 samples/sec Loss 6.1988 LearningRate 0.0211 Epoch: 10 Global Step: 448760 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:08:04,644-Speed 2604.90 samples/sec Loss 6.0405 LearningRate 0.0211 Epoch: 10 Global Step: 448770 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:08:08,536-Speed 2632.01 samples/sec Loss 6.2356 LearningRate 0.0211 Epoch: 10 Global Step: 448780 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:08:12,567-Speed 2540.63 samples/sec Loss 6.1194 LearningRate 0.0211 Epoch: 10 Global Step: 448790 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:08:16,496-Speed 2607.49 samples/sec Loss 6.1375 LearningRate 0.0211 Epoch: 10 Global Step: 448800 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:08:20,392-Speed 2628.52 samples/sec Loss 6.0875 LearningRate 0.0211 Epoch: 10 Global Step: 448810 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:08:24,306-Speed 2617.03 samples/sec Loss 6.1927 LearningRate 0.0211 Epoch: 10 Global Step: 448820 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:08:28,200-Speed 2630.75 samples/sec Loss 6.1326 LearningRate 0.0211 Epoch: 10 Global Step: 448830 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:08:32,094-Speed 2630.30 samples/sec Loss 5.9879 LearningRate 0.0211 Epoch: 10 Global Step: 448840 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:08:35,989-Speed 2629.78 samples/sec Loss 6.1702 LearningRate 0.0211 Epoch: 10 Global Step: 448850 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:08:39,897-Speed 2620.87 samples/sec Loss 6.1594 LearningRate 0.0211 Epoch: 10 Global Step: 448860 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:08:43,818-Speed 2611.87 samples/sec Loss 6.1199 LearningRate 0.0211 Epoch: 10 Global Step: 448870 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:08:47,725-Speed 2622.15 samples/sec Loss 6.1331 LearningRate 0.0211 Epoch: 10 Global Step: 448880 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:08:51,625-Speed 2626.89 samples/sec Loss 6.2526 LearningRate 0.0211 Epoch: 10 Global Step: 448890 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:08:55,517-Speed 2631.52 samples/sec Loss 6.1082 LearningRate 0.0211 Epoch: 10 Global Step: 448900 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:08:59,412-Speed 2629.84 samples/sec Loss 6.0667 LearningRate 0.0211 Epoch: 10 Global Step: 448910 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:09:03,309-Speed 2628.22 samples/sec Loss 6.0617 LearningRate 0.0211 Epoch: 10 Global Step: 448920 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:09:07,206-Speed 2628.48 samples/sec Loss 6.0573 LearningRate 0.0211 Epoch: 10 Global Step: 448930 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:09:11,111-Speed 2622.17 samples/sec Loss 6.0971 LearningRate 0.0211 Epoch: 10 Global Step: 448940 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:09:15,005-Speed 2630.69 samples/sec Loss 6.2355 LearningRate 0.0211 Epoch: 10 Global Step: 448950 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:09:18,904-Speed 2626.91 samples/sec Loss 6.0790 LearningRate 0.0211 Epoch: 10 Global Step: 448960 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:09:22,806-Speed 2625.57 samples/sec Loss 6.1573 LearningRate 0.0210 Epoch: 10 Global Step: 448970 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:09:26,697-Speed 2632.40 samples/sec Loss 6.1713 LearningRate 0.0210 Epoch: 10 Global Step: 448980 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:09:30,589-Speed 2631.23 samples/sec Loss 6.0961 LearningRate 0.0210 Epoch: 10 Global Step: 448990 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:09:34,486-Speed 2628.25 samples/sec Loss 6.0464 LearningRate 0.0210 Epoch: 10 Global Step: 449000 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:09:38,380-Speed 2630.29 samples/sec Loss 6.1059 LearningRate 0.0210 Epoch: 10 Global Step: 449010 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:09:42,262-Speed 2638.01 samples/sec Loss 6.1128 LearningRate 0.0210 Epoch: 10 Global Step: 449020 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:09:46,158-Speed 2629.49 samples/sec Loss 6.3200 LearningRate 0.0210 Epoch: 10 Global Step: 449030 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:09:50,030-Speed 2645.38 samples/sec Loss 6.0952 LearningRate 0.0210 Epoch: 10 Global Step: 449040 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:09:53,927-Speed 2628.51 samples/sec Loss 6.0845 LearningRate 0.0210 Epoch: 10 Global Step: 449050 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:09:57,828-Speed 2625.85 samples/sec Loss 6.2402 LearningRate 0.0210 Epoch: 10 Global Step: 449060 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:10:01,726-Speed 2627.49 samples/sec Loss 6.1080 LearningRate 0.0210 Epoch: 10 Global Step: 449070 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:10:05,648-Speed 2611.23 samples/sec Loss 6.1561 LearningRate 0.0210 Epoch: 10 Global Step: 449080 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:10:09,545-Speed 2628.35 samples/sec Loss 6.1148 LearningRate 0.0210 Epoch: 10 Global Step: 449090 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:10:13,440-Speed 2629.71 samples/sec Loss 6.1250 LearningRate 0.0210 Epoch: 10 Global Step: 449100 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:10:17,346-Speed 2622.32 samples/sec Loss 6.0658 LearningRate 0.0210 Epoch: 10 Global Step: 449110 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:10:21,246-Speed 2626.95 samples/sec Loss 6.0742 LearningRate 0.0210 Epoch: 10 Global Step: 449120 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:10:25,144-Speed 2627.37 samples/sec Loss 6.1244 LearningRate 0.0210 Epoch: 10 Global Step: 449130 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:10:29,045-Speed 2625.72 samples/sec Loss 6.1215 LearningRate 0.0210 Epoch: 10 Global Step: 449140 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:10:32,919-Speed 2644.00 samples/sec Loss 6.2065 LearningRate 0.0210 Epoch: 10 Global Step: 449150 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:10:36,814-Speed 2629.68 samples/sec Loss 6.1525 LearningRate 0.0210 Epoch: 10 Global Step: 449160 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:10:40,719-Speed 2623.06 samples/sec Loss 6.1514 LearningRate 0.0210 Epoch: 10 Global Step: 449170 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:10:44,624-Speed 2622.54 samples/sec Loss 6.1197 LearningRate 0.0210 Epoch: 10 Global Step: 449180 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:10:48,519-Speed 2629.45 samples/sec Loss 6.1414 LearningRate 0.0210 Epoch: 10 Global Step: 449190 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:10:52,416-Speed 2629.21 samples/sec Loss 6.1065 LearningRate 0.0210 Epoch: 10 Global Step: 449200 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:10:56,312-Speed 2628.64 samples/sec Loss 6.0821 LearningRate 0.0210 Epoch: 10 Global Step: 449210 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:11:00,214-Speed 2624.83 samples/sec Loss 6.1457 LearningRate 0.0210 Epoch: 10 Global Step: 449220 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:11:04,115-Speed 2625.66 samples/sec Loss 6.1217 LearningRate 0.0210 Epoch: 10 Global Step: 449230 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:11:08,031-Speed 2615.39 samples/sec Loss 6.1424 LearningRate 0.0210 Epoch: 10 Global Step: 449240 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:11:11,939-Speed 2620.96 samples/sec Loss 6.2129 LearningRate 0.0210 Epoch: 10 Global Step: 449250 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:11:15,834-Speed 2629.24 samples/sec Loss 6.1247 LearningRate 0.0210 Epoch: 10 Global Step: 449260 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:11:19,717-Speed 2638.34 samples/sec Loss 6.2079 LearningRate 0.0210 Epoch: 10 Global Step: 449270 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:11:23,618-Speed 2625.83 samples/sec Loss 6.1577 LearningRate 0.0210 Epoch: 10 Global Step: 449280 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:11:27,514-Speed 2629.01 samples/sec Loss 6.1810 LearningRate 0.0210 Epoch: 10 Global Step: 449290 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:11:31,418-Speed 2623.65 samples/sec Loss 6.1713 LearningRate 0.0210 Epoch: 10 Global Step: 449300 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:11:35,315-Speed 2628.36 samples/sec Loss 6.1953 LearningRate 0.0210 Epoch: 10 Global Step: 449310 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:11:39,209-Speed 2630.18 samples/sec Loss 6.1049 LearningRate 0.0210 Epoch: 10 Global Step: 449320 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:11:43,109-Speed 2625.75 samples/sec Loss 6.1678 LearningRate 0.0210 Epoch: 10 Global Step: 449330 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:11:47,002-Speed 2630.99 samples/sec Loss 6.1363 LearningRate 0.0210 Epoch: 10 Global Step: 449340 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:11:50,897-Speed 2630.02 samples/sec Loss 6.0200 LearningRate 0.0210 Epoch: 10 Global Step: 449350 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:11:54,801-Speed 2623.42 samples/sec Loss 6.1104 LearningRate 0.0210 Epoch: 10 Global Step: 449360 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:11:58,692-Speed 2632.65 samples/sec Loss 6.1440 LearningRate 0.0210 Epoch: 10 Global Step: 449370 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:12:02,583-Speed 2632.26 samples/sec Loss 6.0984 LearningRate 0.0210 Epoch: 10 Global Step: 449380 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:12:06,479-Speed 2629.20 samples/sec Loss 6.1313 LearningRate 0.0210 Epoch: 10 Global Step: 449390 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:12:10,381-Speed 2625.06 samples/sec Loss 6.0431 LearningRate 0.0210 Epoch: 10 Global Step: 449400 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:12:14,288-Speed 2621.01 samples/sec Loss 6.0930 LearningRate 0.0210 Epoch: 10 Global Step: 449410 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:12:18,214-Speed 2608.61 samples/sec Loss 6.1559 LearningRate 0.0210 Epoch: 10 Global Step: 449420 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:12:22,121-Speed 2622.15 samples/sec Loss 6.0840 LearningRate 0.0210 Epoch: 10 Global Step: 449430 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:12:26,031-Speed 2619.27 samples/sec Loss 6.0967 LearningRate 0.0210 Epoch: 10 Global Step: 449440 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:12:29,928-Speed 2628.29 samples/sec Loss 6.0961 LearningRate 0.0210 Epoch: 10 Global Step: 449450 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:12:33,824-Speed 2628.75 samples/sec Loss 6.1385 LearningRate 0.0210 Epoch: 10 Global Step: 449460 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:12:37,700-Speed 2643.06 samples/sec Loss 6.1654 LearningRate 0.0210 Epoch: 10 Global Step: 449470 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:12:41,603-Speed 2624.45 samples/sec Loss 6.0077 LearningRate 0.0210 Epoch: 10 Global Step: 449480 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:12:45,501-Speed 2627.16 samples/sec Loss 6.1611 LearningRate 0.0210 Epoch: 10 Global Step: 449490 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:12:49,396-Speed 2630.05 samples/sec Loss 6.0446 LearningRate 0.0210 Epoch: 10 Global Step: 449500 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:12:53,294-Speed 2627.14 samples/sec Loss 6.1017 LearningRate 0.0210 Epoch: 10 Global Step: 449510 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:12:57,193-Speed 2627.20 samples/sec Loss 6.1312 LearningRate 0.0210 Epoch: 10 Global Step: 449520 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:13:01,087-Speed 2629.95 samples/sec Loss 6.1724 LearningRate 0.0210 Epoch: 10 Global Step: 449530 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:13:05,041-Speed 2590.49 samples/sec Loss 6.1155 LearningRate 0.0210 Epoch: 10 Global Step: 449540 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:13:08,944-Speed 2624.28 samples/sec Loss 6.0880 LearningRate 0.0210 Epoch: 10 Global Step: 449550 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:13:12,844-Speed 2626.91 samples/sec Loss 6.1014 LearningRate 0.0210 Epoch: 10 Global Step: 449560 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:13:16,752-Speed 2620.59 samples/sec Loss 6.1312 LearningRate 0.0210 Epoch: 10 Global Step: 449570 Fp16 Grad Scale: 262144 Required: 43 hours
Training: 2022-04-14 22:13:20,642-Speed 2633.33 samples/sec Loss 6.0850 LearningRate 0.0210 Epoch: 10 Global Step: 449580 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:13:24,549-Speed 2621.13 samples/sec Loss 6.1252 LearningRate 0.0210 Epoch: 10 Global Step: 449590 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:13:28,454-Speed 2623.12 samples/sec Loss 6.1950 LearningRate 0.0210 Epoch: 10 Global Step: 449600 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:13:32,352-Speed 2627.40 samples/sec Loss 6.0518 LearningRate 0.0210 Epoch: 10 Global Step: 449610 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:13:36,350-Speed 2562.55 samples/sec Loss 6.1254 LearningRate 0.0210 Epoch: 10 Global Step: 449620 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:13:40,448-Speed 2498.95 samples/sec Loss 6.0595 LearningRate 0.0210 Epoch: 10 Global Step: 449630 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:13:44,361-Speed 2617.76 samples/sec Loss 6.0449 LearningRate 0.0210 Epoch: 10 Global Step: 449640 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:13:48,265-Speed 2623.71 samples/sec Loss 5.9505 LearningRate 0.0210 Epoch: 10 Global Step: 449650 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:13:52,176-Speed 2619.34 samples/sec Loss 6.0297 LearningRate 0.0210 Epoch: 10 Global Step: 449660 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:13:56,042-Speed 2648.93 samples/sec Loss 6.0552 LearningRate 0.0210 Epoch: 10 Global Step: 449670 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:13:59,945-Speed 2624.56 samples/sec Loss 6.1810 LearningRate 0.0210 Epoch: 10 Global Step: 449680 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:14:03,861-Speed 2615.59 samples/sec Loss 6.2948 LearningRate 0.0210 Epoch: 10 Global Step: 449690 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:14:07,759-Speed 2628.01 samples/sec Loss 6.1719 LearningRate 0.0210 Epoch: 10 Global Step: 449700 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:14:11,666-Speed 2621.38 samples/sec Loss 6.1439 LearningRate 0.0210 Epoch: 10 Global Step: 449710 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:14:15,608-Speed 2598.64 samples/sec Loss 6.0370 LearningRate 0.0210 Epoch: 10 Global Step: 449720 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:14:19,508-Speed 2625.74 samples/sec Loss 6.0434 LearningRate 0.0210 Epoch: 10 Global Step: 449730 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:14:23,420-Speed 2618.65 samples/sec Loss 6.0708 LearningRate 0.0210 Epoch: 10 Global Step: 449740 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:14:27,316-Speed 2628.94 samples/sec Loss 6.2001 LearningRate 0.0210 Epoch: 10 Global Step: 449750 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:14:31,208-Speed 2632.43 samples/sec Loss 6.1280 LearningRate 0.0210 Epoch: 10 Global Step: 449760 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:14:35,101-Speed 2630.48 samples/sec Loss 6.0792 LearningRate 0.0210 Epoch: 10 Global Step: 449770 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:14:39,022-Speed 2611.87 samples/sec Loss 6.1865 LearningRate 0.0210 Epoch: 10 Global Step: 449780 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:14:42,917-Speed 2629.99 samples/sec Loss 6.1839 LearningRate 0.0210 Epoch: 10 Global Step: 449790 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:14:46,811-Speed 2630.83 samples/sec Loss 6.1760 LearningRate 0.0210 Epoch: 10 Global Step: 449800 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:14:50,716-Speed 2622.75 samples/sec Loss 6.0267 LearningRate 0.0210 Epoch: 10 Global Step: 449810 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:14:54,610-Speed 2630.36 samples/sec Loss 6.1250 LearningRate 0.0210 Epoch: 10 Global Step: 449820 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:14:58,528-Speed 2614.16 samples/sec Loss 6.1210 LearningRate 0.0210 Epoch: 10 Global Step: 449830 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:15:02,402-Speed 2643.97 samples/sec Loss 6.1797 LearningRate 0.0210 Epoch: 10 Global Step: 449840 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:15:06,303-Speed 2625.87 samples/sec Loss 6.1886 LearningRate 0.0210 Epoch: 10 Global Step: 449850 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:15:10,196-Speed 2631.06 samples/sec Loss 6.2616 LearningRate 0.0210 Epoch: 10 Global Step: 449860 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:15:14,088-Speed 2631.25 samples/sec Loss 6.1386 LearningRate 0.0210 Epoch: 10 Global Step: 449870 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:15:17,986-Speed 2627.80 samples/sec Loss 6.1241 LearningRate 0.0209 Epoch: 10 Global Step: 449880 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:15:21,903-Speed 2615.28 samples/sec Loss 6.0478 LearningRate 0.0209 Epoch: 10 Global Step: 449890 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:15:25,797-Speed 2630.49 samples/sec Loss 6.1101 LearningRate 0.0209 Epoch: 10 Global Step: 449900 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:15:29,691-Speed 2630.43 samples/sec Loss 6.1764 LearningRate 0.0209 Epoch: 10 Global Step: 449910 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:15:33,588-Speed 2627.94 samples/sec Loss 6.0925 LearningRate 0.0209 Epoch: 10 Global Step: 449920 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:15:37,486-Speed 2627.81 samples/sec Loss 6.0129 LearningRate 0.0209 Epoch: 10 Global Step: 449930 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:15:41,385-Speed 2626.60 samples/sec Loss 6.0928 LearningRate 0.0209 Epoch: 10 Global Step: 449940 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:15:45,288-Speed 2624.72 samples/sec Loss 6.1307 LearningRate 0.0209 Epoch: 10 Global Step: 449950 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:15:49,181-Speed 2631.17 samples/sec Loss 5.9953 LearningRate 0.0209 Epoch: 10 Global Step: 449960 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:15:53,076-Speed 2630.37 samples/sec Loss 6.1911 LearningRate 0.0209 Epoch: 10 Global Step: 449970 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:15:57,001-Speed 2609.45 samples/sec Loss 6.1274 LearningRate 0.0209 Epoch: 10 Global Step: 449980 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:16:00,896-Speed 2629.72 samples/sec Loss 6.1016 LearningRate 0.0209 Epoch: 10 Global Step: 449990 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:16:04,797-Speed 2625.78 samples/sec Loss 6.0232 LearningRate 0.0209 Epoch: 10 Global Step: 450000 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:16:47,873-[lfw][450000]XNorm: 24.080963
Training: 2022-04-14 22:16:47,874-[lfw][450000]Accuracy-Flip: 0.99767+-0.00249
Training: 2022-04-14 22:16:47,874-[lfw][450000]Accuracy-Highest: 0.99783
Training: 2022-04-14 22:17:37,696-[cfp_fp][450000]XNorm: 22.422806
Training: 2022-04-14 22:17:37,697-[cfp_fp][450000]Accuracy-Flip: 0.98743+-0.00498
Training: 2022-04-14 22:17:37,698-[cfp_fp][450000]Accuracy-Highest: 0.98843
Training: 2022-04-14 22:18:20,934-[agedb_30][450000]XNorm: 24.076111
Training: 2022-04-14 22:18:20,935-[agedb_30][450000]Accuracy-Flip: 0.97817+-0.00669
Training: 2022-04-14 22:18:20,936-[agedb_30][450000]Accuracy-Highest: 0.97817
Training: 2022-04-14 22:18:24,828-Speed 73.13 samples/sec Loss 6.1232 LearningRate 0.0209 Epoch: 10 Global Step: 450010 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:18:28,706-Speed 2641.20 samples/sec Loss 6.1375 LearningRate 0.0209 Epoch: 10 Global Step: 450020 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:18:32,609-Speed 2624.28 samples/sec Loss 6.1276 LearningRate 0.0209 Epoch: 10 Global Step: 450030 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:18:36,476-Speed 2649.23 samples/sec Loss 6.0966 LearningRate 0.0209 Epoch: 10 Global Step: 450040 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:18:40,356-Speed 2639.52 samples/sec Loss 6.1304 LearningRate 0.0209 Epoch: 10 Global Step: 450050 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:18:44,265-Speed 2620.25 samples/sec Loss 6.0108 LearningRate 0.0209 Epoch: 10 Global Step: 450060 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:18:48,153-Speed 2634.86 samples/sec Loss 6.1389 LearningRate 0.0209 Epoch: 10 Global Step: 450070 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:18:52,048-Speed 2629.22 samples/sec Loss 6.1296 LearningRate 0.0209 Epoch: 10 Global Step: 450080 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:18:55,950-Speed 2625.40 samples/sec Loss 6.0237 LearningRate 0.0209 Epoch: 10 Global Step: 450090 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:18:59,856-Speed 2622.22 samples/sec Loss 6.2045 LearningRate 0.0209 Epoch: 10 Global Step: 450100 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:19:03,752-Speed 2629.50 samples/sec Loss 5.9696 LearningRate 0.0209 Epoch: 10 Global Step: 450110 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:19:07,668-Speed 2616.10 samples/sec Loss 6.0590 LearningRate 0.0209 Epoch: 10 Global Step: 450120 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:19:11,557-Speed 2633.26 samples/sec Loss 6.0895 LearningRate 0.0209 Epoch: 10 Global Step: 450130 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:19:15,454-Speed 2628.15 samples/sec Loss 6.1643 LearningRate 0.0209 Epoch: 10 Global Step: 450140 Fp16 Grad Scale: 262144 Required: 43 hours
Training: 2022-04-14 22:19:19,343-Speed 2633.59 samples/sec Loss 6.0731 LearningRate 0.0209 Epoch: 10 Global Step: 450150 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:19:23,252-Speed 2620.92 samples/sec Loss 6.0185 LearningRate 0.0209 Epoch: 10 Global Step: 450160 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:19:27,152-Speed 2625.88 samples/sec Loss 6.1590 LearningRate 0.0209 Epoch: 10 Global Step: 450170 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:19:31,047-Speed 2629.71 samples/sec Loss 6.0852 LearningRate 0.0209 Epoch: 10 Global Step: 450180 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:19:34,970-Speed 2611.15 samples/sec Loss 6.1078 LearningRate 0.0209 Epoch: 10 Global Step: 450190 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:19:38,866-Speed 2629.20 samples/sec Loss 6.0949 LearningRate 0.0209 Epoch: 10 Global Step: 450200 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:19:42,732-Speed 2649.23 samples/sec Loss 6.0098 LearningRate 0.0209 Epoch: 10 Global Step: 450210 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:19:46,633-Speed 2625.39 samples/sec Loss 6.0777 LearningRate 0.0209 Epoch: 10 Global Step: 450220 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:19:50,542-Speed 2620.41 samples/sec Loss 6.1309 LearningRate 0.0209 Epoch: 10 Global Step: 450230 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:19:54,461-Speed 2613.71 samples/sec Loss 6.0673 LearningRate 0.0209 Epoch: 10 Global Step: 450240 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:19:58,365-Speed 2624.62 samples/sec Loss 6.1242 LearningRate 0.0209 Epoch: 10 Global Step: 450250 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:20:02,280-Speed 2616.44 samples/sec Loss 6.1872 LearningRate 0.0209 Epoch: 10 Global Step: 450260 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:20:06,204-Speed 2610.51 samples/sec Loss 6.1132 LearningRate 0.0209 Epoch: 10 Global Step: 450270 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:20:10,100-Speed 2628.29 samples/sec Loss 6.1355 LearningRate 0.0209 Epoch: 10 Global Step: 450280 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:20:13,996-Speed 2628.84 samples/sec Loss 5.9458 LearningRate 0.0209 Epoch: 10 Global Step: 450290 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:20:17,895-Speed 2626.92 samples/sec Loss 6.0181 LearningRate 0.0209 Epoch: 10 Global Step: 450300 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:20:21,814-Speed 2613.73 samples/sec Loss 6.0876 LearningRate 0.0209 Epoch: 10 Global Step: 450310 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:20:25,709-Speed 2629.80 samples/sec Loss 6.0749 LearningRate 0.0209 Epoch: 10 Global Step: 450320 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:20:29,604-Speed 2629.86 samples/sec Loss 5.9949 LearningRate 0.0209 Epoch: 10 Global Step: 450330 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:20:33,479-Speed 2643.53 samples/sec Loss 6.0896 LearningRate 0.0209 Epoch: 10 Global Step: 450340 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:20:37,375-Speed 2628.62 samples/sec Loss 6.0458 LearningRate 0.0209 Epoch: 10 Global Step: 450350 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:20:41,268-Speed 2630.74 samples/sec Loss 6.0888 LearningRate 0.0209 Epoch: 10 Global Step: 450360 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:20:45,210-Speed 2598.69 samples/sec Loss 6.0961 LearningRate 0.0209 Epoch: 10 Global Step: 450370 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:20:49,107-Speed 2628.22 samples/sec Loss 6.0582 LearningRate 0.0209 Epoch: 10 Global Step: 450380 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:20:53,005-Speed 2628.07 samples/sec Loss 6.2090 LearningRate 0.0209 Epoch: 10 Global Step: 450390 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:20:56,901-Speed 2628.64 samples/sec Loss 6.1977 LearningRate 0.0209 Epoch: 10 Global Step: 450400 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:21:00,795-Speed 2630.66 samples/sec Loss 6.0510 LearningRate 0.0209 Epoch: 10 Global Step: 450410 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:21:04,693-Speed 2627.54 samples/sec Loss 6.0852 LearningRate 0.0209 Epoch: 10 Global Step: 450420 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:21:08,606-Speed 2617.72 samples/sec Loss 6.2970 LearningRate 0.0209 Epoch: 10 Global Step: 450430 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:21:12,502-Speed 2628.42 samples/sec Loss 6.0415 LearningRate 0.0209 Epoch: 10 Global Step: 450440 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:21:16,405-Speed 2624.63 samples/sec Loss 6.0668 LearningRate 0.0209 Epoch: 10 Global Step: 450450 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:21:20,297-Speed 2631.96 samples/sec Loss 6.1349 LearningRate 0.0209 Epoch: 10 Global Step: 450460 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:21:24,197-Speed 2626.01 samples/sec Loss 6.1039 LearningRate 0.0209 Epoch: 10 Global Step: 450470 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:21:28,106-Speed 2620.30 samples/sec Loss 5.9765 LearningRate 0.0209 Epoch: 10 Global Step: 450480 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:21:32,002-Speed 2628.89 samples/sec Loss 6.1628 LearningRate 0.0209 Epoch: 10 Global Step: 450490 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:21:35,905-Speed 2623.89 samples/sec Loss 6.1460 LearningRate 0.0209 Epoch: 10 Global Step: 450500 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:21:39,803-Speed 2627.51 samples/sec Loss 6.0318 LearningRate 0.0209 Epoch: 10 Global Step: 450510 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:21:43,711-Speed 2621.22 samples/sec Loss 6.1650 LearningRate 0.0209 Epoch: 10 Global Step: 450520 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:21:47,602-Speed 2632.37 samples/sec Loss 6.1242 LearningRate 0.0209 Epoch: 10 Global Step: 450530 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:21:51,481-Speed 2640.90 samples/sec Loss 6.1742 LearningRate 0.0209 Epoch: 10 Global Step: 450540 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:21:55,362-Speed 2639.60 samples/sec Loss 5.9646 LearningRate 0.0209 Epoch: 10 Global Step: 450550 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:21:59,255-Speed 2631.06 samples/sec Loss 5.9563 LearningRate 0.0209 Epoch: 10 Global Step: 450560 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:22:03,151-Speed 2628.53 samples/sec Loss 6.0138 LearningRate 0.0209 Epoch: 10 Global Step: 450570 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:22:07,099-Speed 2594.45 samples/sec Loss 6.2093 LearningRate 0.0209 Epoch: 10 Global Step: 450580 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:22:11,003-Speed 2623.87 samples/sec Loss 6.1080 LearningRate 0.0209 Epoch: 10 Global Step: 450590 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:22:14,910-Speed 2621.91 samples/sec Loss 6.0760 LearningRate 0.0209 Epoch: 10 Global Step: 450600 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:22:18,806-Speed 2628.71 samples/sec Loss 6.1836 LearningRate 0.0209 Epoch: 10 Global Step: 450610 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:22:22,709-Speed 2623.96 samples/sec Loss 5.9944 LearningRate 0.0209 Epoch: 10 Global Step: 450620 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:22:26,615-Speed 2623.00 samples/sec Loss 6.0552 LearningRate 0.0209 Epoch: 10 Global Step: 450630 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:22:30,512-Speed 2628.35 samples/sec Loss 6.2391 LearningRate 0.0209 Epoch: 10 Global Step: 450640 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:22:34,409-Speed 2628.34 samples/sec Loss 6.1307 LearningRate 0.0209 Epoch: 10 Global Step: 450650 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:22:38,309-Speed 2625.78 samples/sec Loss 6.0997 LearningRate 0.0209 Epoch: 10 Global Step: 450660 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:22:42,192-Speed 2638.03 samples/sec Loss 6.1186 LearningRate 0.0209 Epoch: 10 Global Step: 450670 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:22:46,101-Speed 2619.91 samples/sec Loss 6.1304 LearningRate 0.0209 Epoch: 10 Global Step: 450680 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:22:50,010-Speed 2620.27 samples/sec Loss 6.0714 LearningRate 0.0209 Epoch: 10 Global Step: 450690 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:22:53,911-Speed 2625.75 samples/sec Loss 6.1745 LearningRate 0.0209 Epoch: 10 Global Step: 450700 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:22:57,820-Speed 2620.34 samples/sec Loss 6.0704 LearningRate 0.0209 Epoch: 10 Global Step: 450710 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:23:01,725-Speed 2623.20 samples/sec Loss 6.1537 LearningRate 0.0209 Epoch: 10 Global Step: 450720 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:23:05,703-Speed 2574.42 samples/sec Loss 6.0615 LearningRate 0.0209 Epoch: 10 Global Step: 450730 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:23:09,803-Speed 2498.40 samples/sec Loss 6.0351 LearningRate 0.0209 Epoch: 10 Global Step: 450740 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:23:13,889-Speed 2506.83 samples/sec Loss 6.1132 LearningRate 0.0209 Epoch: 10 Global Step: 450750 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:23:17,979-Speed 2504.06 samples/sec Loss 6.0437 LearningRate 0.0209 Epoch: 10 Global Step: 450760 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:23:21,994-Speed 2551.32 samples/sec Loss 6.1393 LearningRate 0.0209 Epoch: 10 Global Step: 450770 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:23:25,869-Speed 2643.16 samples/sec Loss 6.1230 LearningRate 0.0208 Epoch: 10 Global Step: 450780 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:23:29,759-Speed 2633.22 samples/sec Loss 6.0464 LearningRate 0.0208 Epoch: 10 Global Step: 450790 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:23:33,655-Speed 2628.67 samples/sec Loss 6.1381 LearningRate 0.0208 Epoch: 10 Global Step: 450800 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:23:37,551-Speed 2629.30 samples/sec Loss 6.0310 LearningRate 0.0208 Epoch: 10 Global Step: 450810 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:23:41,440-Speed 2633.57 samples/sec Loss 6.0904 LearningRate 0.0208 Epoch: 10 Global Step: 450820 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:23:45,333-Speed 2631.21 samples/sec Loss 6.1498 LearningRate 0.0208 Epoch: 10 Global Step: 450830 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:23:49,237-Speed 2623.28 samples/sec Loss 6.2185 LearningRate 0.0208 Epoch: 10 Global Step: 450840 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:23:53,140-Speed 2624.30 samples/sec Loss 5.9734 LearningRate 0.0208 Epoch: 10 Global Step: 450850 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:23:57,033-Speed 2631.11 samples/sec Loss 6.1705 LearningRate 0.0208 Epoch: 10 Global Step: 450860 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:24:00,929-Speed 2629.34 samples/sec Loss 6.1410 LearningRate 0.0208 Epoch: 10 Global Step: 450870 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:24:04,821-Speed 2631.37 samples/sec Loss 6.0611 LearningRate 0.0208 Epoch: 10 Global Step: 450880 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:24:08,732-Speed 2618.79 samples/sec Loss 5.9917 LearningRate 0.0208 Epoch: 10 Global Step: 450890 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:24:12,637-Speed 2623.08 samples/sec Loss 6.1028 LearningRate 0.0208 Epoch: 10 Global Step: 450900 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:24:16,530-Speed 2630.93 samples/sec Loss 6.0952 LearningRate 0.0208 Epoch: 10 Global Step: 450910 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:24:20,429-Speed 2626.51 samples/sec Loss 6.1412 LearningRate 0.0208 Epoch: 10 Global Step: 450920 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:24:24,329-Speed 2626.95 samples/sec Loss 5.9490 LearningRate 0.0208 Epoch: 10 Global Step: 450930 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:24:28,222-Speed 2630.66 samples/sec Loss 6.0620 LearningRate 0.0208 Epoch: 10 Global Step: 450940 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:24:32,129-Speed 2621.95 samples/sec Loss 6.1228 LearningRate 0.0208 Epoch: 10 Global Step: 450950 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:24:36,023-Speed 2630.46 samples/sec Loss 6.1120 LearningRate 0.0208 Epoch: 10 Global Step: 450960 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:24:39,918-Speed 2629.63 samples/sec Loss 6.1070 LearningRate 0.0208 Epoch: 10 Global Step: 450970 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:24:43,826-Speed 2620.32 samples/sec Loss 6.1451 LearningRate 0.0208 Epoch: 10 Global Step: 450980 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:24:47,721-Speed 2629.91 samples/sec Loss 6.0900 LearningRate 0.0208 Epoch: 10 Global Step: 450990 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:24:51,624-Speed 2624.17 samples/sec Loss 6.0767 LearningRate 0.0208 Epoch: 10 Global Step: 451000 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:24:55,517-Speed 2631.71 samples/sec Loss 5.9874 LearningRate 0.0208 Epoch: 10 Global Step: 451010 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:24:59,412-Speed 2629.39 samples/sec Loss 6.2664 LearningRate 0.0208 Epoch: 10 Global Step: 451020 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:25:03,308-Speed 2628.95 samples/sec Loss 5.9423 LearningRate 0.0208 Epoch: 10 Global Step: 451030 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:25:07,213-Speed 2622.52 samples/sec Loss 6.1269 LearningRate 0.0208 Epoch: 10 Global Step: 451040 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:25:11,117-Speed 2623.51 samples/sec Loss 6.1047 LearningRate 0.0208 Epoch: 10 Global Step: 451050 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:25:15,026-Speed 2620.46 samples/sec Loss 6.1095 LearningRate 0.0208 Epoch: 10 Global Step: 451060 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:25:18,924-Speed 2627.77 samples/sec Loss 6.0113 LearningRate 0.0208 Epoch: 10 Global Step: 451070 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:25:22,830-Speed 2621.93 samples/sec Loss 6.0828 LearningRate 0.0208 Epoch: 10 Global Step: 451080 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:25:26,728-Speed 2628.23 samples/sec Loss 6.0895 LearningRate 0.0208 Epoch: 10 Global Step: 451090 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:25:30,626-Speed 2627.42 samples/sec Loss 6.1004 LearningRate 0.0208 Epoch: 10 Global Step: 451100 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:25:34,522-Speed 2628.97 samples/sec Loss 6.0578 LearningRate 0.0208 Epoch: 10 Global Step: 451110 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:25:38,419-Speed 2628.16 samples/sec Loss 6.0538 LearningRate 0.0208 Epoch: 10 Global Step: 451120 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:25:42,295-Speed 2642.22 samples/sec Loss 6.0238 LearningRate 0.0208 Epoch: 10 Global Step: 451130 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:25:46,191-Speed 2628.92 samples/sec Loss 6.2146 LearningRate 0.0208 Epoch: 10 Global Step: 451140 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:25:50,086-Speed 2629.68 samples/sec Loss 6.0443 LearningRate 0.0208 Epoch: 10 Global Step: 451150 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:25:53,987-Speed 2626.08 samples/sec Loss 6.0475 LearningRate 0.0208 Epoch: 10 Global Step: 451160 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:25:57,892-Speed 2622.72 samples/sec Loss 6.0911 LearningRate 0.0208 Epoch: 10 Global Step: 451170 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:26:01,795-Speed 2624.88 samples/sec Loss 6.0993 LearningRate 0.0208 Epoch: 10 Global Step: 451180 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:26:05,694-Speed 2626.28 samples/sec Loss 6.0358 LearningRate 0.0208 Epoch: 10 Global Step: 451190 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:26:09,587-Speed 2631.00 samples/sec Loss 6.0916 LearningRate 0.0208 Epoch: 10 Global Step: 451200 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:26:13,481-Speed 2630.18 samples/sec Loss 5.9977 LearningRate 0.0208 Epoch: 10 Global Step: 451210 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:26:17,380-Speed 2627.18 samples/sec Loss 6.1272 LearningRate 0.0208 Epoch: 10 Global Step: 451220 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:26:21,336-Speed 2589.10 samples/sec Loss 6.0936 LearningRate 0.0208 Epoch: 10 Global Step: 451230 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:26:25,233-Speed 2628.67 samples/sec Loss 6.1519 LearningRate 0.0208 Epoch: 10 Global Step: 451240 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:26:29,129-Speed 2629.34 samples/sec Loss 6.0822 LearningRate 0.0208 Epoch: 10 Global Step: 451250 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:26:33,023-Speed 2630.38 samples/sec Loss 5.9950 LearningRate 0.0208 Epoch: 10 Global Step: 451260 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:26:36,922-Speed 2626.28 samples/sec Loss 6.0807 LearningRate 0.0208 Epoch: 10 Global Step: 451270 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:26:40,815-Speed 2631.10 samples/sec Loss 6.0521 LearningRate 0.0208 Epoch: 10 Global Step: 451280 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:26:44,714-Speed 2626.94 samples/sec Loss 6.1481 LearningRate 0.0208 Epoch: 10 Global Step: 451290 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:26:48,612-Speed 2628.05 samples/sec Loss 6.1205 LearningRate 0.0208 Epoch: 10 Global Step: 451300 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:26:52,508-Speed 2629.18 samples/sec Loss 6.1456 LearningRate 0.0208 Epoch: 10 Global Step: 451310 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:26:56,402-Speed 2630.50 samples/sec Loss 6.0484 LearningRate 0.0208 Epoch: 10 Global Step: 451320 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:27:00,296-Speed 2630.18 samples/sec Loss 6.1479 LearningRate 0.0208 Epoch: 10 Global Step: 451330 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:27:04,195-Speed 2626.37 samples/sec Loss 6.1586 LearningRate 0.0208 Epoch: 10 Global Step: 451340 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:27:08,095-Speed 2626.99 samples/sec Loss 6.0146 LearningRate 0.0208 Epoch: 10 Global Step: 451350 Fp16 Grad Scale: 65536 Required: 43 hours
Training: 2022-04-14 22:27:11,991-Speed 2628.92 samples/sec Loss 6.1862 LearningRate 0.0208 Epoch: 10 Global Step: 451360 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:27:15,888-Speed 2628.46 samples/sec Loss 6.1000 LearningRate 0.0208 Epoch: 10 Global Step: 451370 Fp16 Grad Scale: 131072 Required: 43 hours
Training: 2022-04-14 22:27:19,778-Speed 2633.00 samples/sec Loss 6.1872 LearningRate 0.0208 Epoch: 10 Global Step: 451380 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:27:23,687-Speed 2619.61 samples/sec Loss 6.0026 LearningRate 0.0208 Epoch: 10 Global Step: 451390 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:27:27,581-Speed 2631.02 samples/sec Loss 6.1495 LearningRate 0.0208 Epoch: 10 Global Step: 451400 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:27:31,494-Speed 2617.75 samples/sec Loss 6.0650 LearningRate 0.0208 Epoch: 10 Global Step: 451410 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:27:35,395-Speed 2625.44 samples/sec Loss 6.1702 LearningRate 0.0208 Epoch: 10 Global Step: 451420 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:27:39,290-Speed 2629.10 samples/sec Loss 6.1061 LearningRate 0.0208 Epoch: 10 Global Step: 451430 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:27:43,196-Speed 2622.52 samples/sec Loss 6.1711 LearningRate 0.0208 Epoch: 10 Global Step: 451440 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:27:47,095-Speed 2627.02 samples/sec Loss 6.1079 LearningRate 0.0208 Epoch: 10 Global Step: 451450 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:27:50,994-Speed 2627.52 samples/sec Loss 6.1710 LearningRate 0.0208 Epoch: 10 Global Step: 451460 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:27:54,890-Speed 2629.45 samples/sec Loss 6.0641 LearningRate 0.0208 Epoch: 10 Global Step: 451470 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:27:58,806-Speed 2615.54 samples/sec Loss 6.0878 LearningRate 0.0208 Epoch: 10 Global Step: 451480 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:28:02,711-Speed 2622.69 samples/sec Loss 6.1427 LearningRate 0.0208 Epoch: 10 Global Step: 451490 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:28:06,613-Speed 2625.04 samples/sec Loss 6.1945 LearningRate 0.0208 Epoch: 10 Global Step: 451500 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:28:10,554-Speed 2598.94 samples/sec Loss 6.1098 LearningRate 0.0208 Epoch: 10 Global Step: 451510 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:28:14,448-Speed 2630.51 samples/sec Loss 6.1546 LearningRate 0.0208 Epoch: 10 Global Step: 451520 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:28:18,330-Speed 2638.51 samples/sec Loss 6.0991 LearningRate 0.0208 Epoch: 10 Global Step: 451530 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:28:22,230-Speed 2626.53 samples/sec Loss 6.1257 LearningRate 0.0208 Epoch: 10 Global Step: 451540 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:28:26,155-Speed 2608.87 samples/sec Loss 6.1001 LearningRate 0.0208 Epoch: 10 Global Step: 451550 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:28:30,120-Speed 2583.99 samples/sec Loss 6.0224 LearningRate 0.0208 Epoch: 10 Global Step: 451560 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:28:34,013-Speed 2630.66 samples/sec Loss 6.0725 LearningRate 0.0208 Epoch: 10 Global Step: 451570 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:28:37,921-Speed 2621.18 samples/sec Loss 6.1436 LearningRate 0.0208 Epoch: 10 Global Step: 451580 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:28:41,813-Speed 2631.57 samples/sec Loss 6.0670 LearningRate 0.0208 Epoch: 10 Global Step: 451590 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:28:45,708-Speed 2629.96 samples/sec Loss 6.0412 LearningRate 0.0208 Epoch: 10 Global Step: 451600 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:28:49,625-Speed 2615.30 samples/sec Loss 5.9650 LearningRate 0.0208 Epoch: 10 Global Step: 451610 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:28:53,527-Speed 2625.03 samples/sec Loss 6.0302 LearningRate 0.0208 Epoch: 10 Global Step: 451620 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:28:57,424-Speed 2628.56 samples/sec Loss 6.0570 LearningRate 0.0208 Epoch: 10 Global Step: 451630 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:29:01,329-Speed 2622.64 samples/sec Loss 6.1357 LearningRate 0.0208 Epoch: 10 Global Step: 451640 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:29:05,225-Speed 2628.90 samples/sec Loss 6.0656 LearningRate 0.0208 Epoch: 10 Global Step: 451650 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:29:09,116-Speed 2632.33 samples/sec Loss 5.9760 LearningRate 0.0208 Epoch: 10 Global Step: 451660 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:29:13,007-Speed 2632.55 samples/sec Loss 6.0925 LearningRate 0.0208 Epoch: 10 Global Step: 451670 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:29:16,911-Speed 2623.08 samples/sec Loss 6.1055 LearningRate 0.0208 Epoch: 10 Global Step: 451680 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:29:20,804-Speed 2631.74 samples/sec Loss 6.0831 LearningRate 0.0207 Epoch: 10 Global Step: 451690 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:29:24,703-Speed 2626.54 samples/sec Loss 6.1294 LearningRate 0.0207 Epoch: 10 Global Step: 451700 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:29:28,635-Speed 2605.67 samples/sec Loss 6.1047 LearningRate 0.0207 Epoch: 10 Global Step: 451710 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:29:32,528-Speed 2630.31 samples/sec Loss 6.0330 LearningRate 0.0207 Epoch: 10 Global Step: 451720 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:29:36,399-Speed 2645.77 samples/sec Loss 6.0685 LearningRate 0.0207 Epoch: 10 Global Step: 451730 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:29:40,296-Speed 2628.09 samples/sec Loss 6.1512 LearningRate 0.0207 Epoch: 10 Global Step: 451740 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:29:44,192-Speed 2629.15 samples/sec Loss 6.0049 LearningRate 0.0207 Epoch: 10 Global Step: 451750 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:29:48,089-Speed 2628.11 samples/sec Loss 5.9852 LearningRate 0.0207 Epoch: 10 Global Step: 451760 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:29:51,994-Speed 2623.45 samples/sec Loss 6.0268 LearningRate 0.0207 Epoch: 10 Global Step: 451770 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:29:55,894-Speed 2625.92 samples/sec Loss 6.0763 LearningRate 0.0207 Epoch: 10 Global Step: 451780 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:29:59,791-Speed 2629.10 samples/sec Loss 6.0996 LearningRate 0.0207 Epoch: 10 Global Step: 451790 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:30:03,687-Speed 2628.58 samples/sec Loss 5.9392 LearningRate 0.0207 Epoch: 10 Global Step: 451800 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:30:07,598-Speed 2618.96 samples/sec Loss 6.1094 LearningRate 0.0207 Epoch: 10 Global Step: 451810 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:30:11,491-Speed 2630.71 samples/sec Loss 6.0119 LearningRate 0.0207 Epoch: 10 Global Step: 451820 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:30:15,346-Speed 2656.72 samples/sec Loss 6.0620 LearningRate 0.0207 Epoch: 10 Global Step: 451830 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:30:19,247-Speed 2626.45 samples/sec Loss 6.0614 LearningRate 0.0207 Epoch: 10 Global Step: 451840 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:30:23,141-Speed 2629.95 samples/sec Loss 6.0282 LearningRate 0.0207 Epoch: 10 Global Step: 451850 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:30:27,039-Speed 2628.47 samples/sec Loss 6.0657 LearningRate 0.0207 Epoch: 10 Global Step: 451860 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:30:31,125-Speed 2506.85 samples/sec Loss 6.0601 LearningRate 0.0207 Epoch: 10 Global Step: 451870 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:30:35,194-Speed 2517.71 samples/sec Loss 6.1748 LearningRate 0.0207 Epoch: 10 Global Step: 451880 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:30:39,088-Speed 2629.59 samples/sec Loss 6.1031 LearningRate 0.0207 Epoch: 10 Global Step: 451890 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:30:42,982-Speed 2630.35 samples/sec Loss 6.0660 LearningRate 0.0207 Epoch: 10 Global Step: 451900 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:30:46,876-Speed 2630.03 samples/sec Loss 6.1195 LearningRate 0.0207 Epoch: 10 Global Step: 451910 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:30:50,777-Speed 2626.17 samples/sec Loss 6.1476 LearningRate 0.0207 Epoch: 10 Global Step: 451920 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:30:54,668-Speed 2632.58 samples/sec Loss 6.1348 LearningRate 0.0207 Epoch: 10 Global Step: 451930 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:30:58,561-Speed 2631.29 samples/sec Loss 6.0626 LearningRate 0.0207 Epoch: 10 Global Step: 451940 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:31:02,460-Speed 2627.32 samples/sec Loss 5.9509 LearningRate 0.0207 Epoch: 10 Global Step: 451950 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:31:06,353-Speed 2631.16 samples/sec Loss 6.0474 LearningRate 0.0207 Epoch: 10 Global Step: 451960 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:31:10,255-Speed 2624.52 samples/sec Loss 6.1015 LearningRate 0.0207 Epoch: 10 Global Step: 451970 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:31:14,152-Speed 2627.90 samples/sec Loss 6.1033 LearningRate 0.0207 Epoch: 10 Global Step: 451980 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:31:18,047-Speed 2630.36 samples/sec Loss 6.0497 LearningRate 0.0207 Epoch: 10 Global Step: 451990 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:31:21,918-Speed 2646.04 samples/sec Loss 6.0064 LearningRate 0.0207 Epoch: 10 Global Step: 452000 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:31:25,822-Speed 2623.92 samples/sec Loss 5.9862 LearningRate 0.0207 Epoch: 10 Global Step: 452010 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:31:29,726-Speed 2623.35 samples/sec Loss 6.1307 LearningRate 0.0207 Epoch: 10 Global Step: 452020 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:31:33,628-Speed 2624.63 samples/sec Loss 6.0045 LearningRate 0.0207 Epoch: 10 Global Step: 452030 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:31:37,524-Speed 2629.27 samples/sec Loss 6.1028 LearningRate 0.0207 Epoch: 10 Global Step: 452040 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:31:41,442-Speed 2613.90 samples/sec Loss 5.9924 LearningRate 0.0207 Epoch: 10 Global Step: 452050 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:31:45,343-Speed 2625.56 samples/sec Loss 6.1531 LearningRate 0.0207 Epoch: 10 Global Step: 452060 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:31:49,242-Speed 2626.84 samples/sec Loss 6.0185 LearningRate 0.0207 Epoch: 10 Global Step: 452070 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:31:53,142-Speed 2626.01 samples/sec Loss 6.2047 LearningRate 0.0207 Epoch: 10 Global Step: 452080 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:31:57,035-Speed 2631.42 samples/sec Loss 6.0374 LearningRate 0.0207 Epoch: 10 Global Step: 452090 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:32:00,927-Speed 2631.91 samples/sec Loss 5.9557 LearningRate 0.0207 Epoch: 10 Global Step: 452100 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:32:04,821-Speed 2630.13 samples/sec Loss 6.1196 LearningRate 0.0207 Epoch: 10 Global Step: 452110 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:32:08,724-Speed 2624.50 samples/sec Loss 6.1340 LearningRate 0.0207 Epoch: 10 Global Step: 452120 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:32:12,617-Speed 2630.52 samples/sec Loss 6.0594 LearningRate 0.0207 Epoch: 10 Global Step: 452130 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:32:16,516-Speed 2627.01 samples/sec Loss 6.0921 LearningRate 0.0207 Epoch: 10 Global Step: 452140 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:32:20,522-Speed 2556.38 samples/sec Loss 6.0823 LearningRate 0.0207 Epoch: 10 Global Step: 452150 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:32:24,418-Speed 2629.73 samples/sec Loss 6.0862 LearningRate 0.0207 Epoch: 10 Global Step: 452160 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:32:28,316-Speed 2627.17 samples/sec Loss 6.1431 LearningRate 0.0207 Epoch: 10 Global Step: 452170 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:32:32,212-Speed 2628.75 samples/sec Loss 6.0604 LearningRate 0.0207 Epoch: 10 Global Step: 452180 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:32:36,083-Speed 2646.14 samples/sec Loss 6.1851 LearningRate 0.0207 Epoch: 10 Global Step: 452190 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:32:39,979-Speed 2629.54 samples/sec Loss 6.0540 LearningRate 0.0207 Epoch: 10 Global Step: 452200 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:32:43,877-Speed 2627.22 samples/sec Loss 6.0669 LearningRate 0.0207 Epoch: 10 Global Step: 452210 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:32:47,771-Speed 2630.58 samples/sec Loss 6.0616 LearningRate 0.0207 Epoch: 10 Global Step: 452220 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:32:51,672-Speed 2625.42 samples/sec Loss 6.0861 LearningRate 0.0207 Epoch: 10 Global Step: 452230 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:32:55,575-Speed 2624.63 samples/sec Loss 6.0701 LearningRate 0.0207 Epoch: 10 Global Step: 452240 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:32:59,478-Speed 2624.08 samples/sec Loss 6.1803 LearningRate 0.0207 Epoch: 10 Global Step: 452250 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:33:03,385-Speed 2621.30 samples/sec Loss 6.0574 LearningRate 0.0207 Epoch: 10 Global Step: 452260 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:33:07,281-Speed 2628.81 samples/sec Loss 5.9940 LearningRate 0.0207 Epoch: 10 Global Step: 452270 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:33:11,183-Speed 2625.35 samples/sec Loss 6.1547 LearningRate 0.0207 Epoch: 10 Global Step: 452280 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:33:15,078-Speed 2629.64 samples/sec Loss 6.0713 LearningRate 0.0207 Epoch: 10 Global Step: 452290 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:33:18,979-Speed 2625.40 samples/sec Loss 6.0430 LearningRate 0.0207 Epoch: 10 Global Step: 452300 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:33:22,885-Speed 2623.45 samples/sec Loss 6.1196 LearningRate 0.0207 Epoch: 10 Global Step: 452310 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:33:26,787-Speed 2624.50 samples/sec Loss 6.0951 LearningRate 0.0207 Epoch: 10 Global Step: 452320 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:33:30,688-Speed 2625.40 samples/sec Loss 5.9745 LearningRate 0.0207 Epoch: 10 Global Step: 452330 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:33:34,601-Speed 2617.50 samples/sec Loss 5.9964 LearningRate 0.0207 Epoch: 10 Global Step: 452340 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:33:38,488-Speed 2635.31 samples/sec Loss 6.0801 LearningRate 0.0207 Epoch: 10 Global Step: 452350 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:33:42,384-Speed 2628.42 samples/sec Loss 6.1396 LearningRate 0.0207 Epoch: 10 Global Step: 452360 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:33:46,280-Speed 2629.63 samples/sec Loss 6.0756 LearningRate 0.0207 Epoch: 10 Global Step: 452370 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:33:50,189-Speed 2621.03 samples/sec Loss 6.0458 LearningRate 0.0207 Epoch: 10 Global Step: 452380 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:33:54,086-Speed 2628.07 samples/sec Loss 5.9992 LearningRate 0.0207 Epoch: 10 Global Step: 452390 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:33:58,006-Speed 2613.61 samples/sec Loss 6.0003 LearningRate 0.0207 Epoch: 10 Global Step: 452400 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:34:01,902-Speed 2628.35 samples/sec Loss 6.0671 LearningRate 0.0207 Epoch: 10 Global Step: 452410 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:34:05,797-Speed 2629.82 samples/sec Loss 6.0143 LearningRate 0.0207 Epoch: 10 Global Step: 452420 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:34:09,704-Speed 2621.39 samples/sec Loss 5.9651 LearningRate 0.0207 Epoch: 10 Global Step: 452430 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:34:13,597-Speed 2631.15 samples/sec Loss 6.1847 LearningRate 0.0207 Epoch: 10 Global Step: 452440 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:34:17,494-Speed 2628.00 samples/sec Loss 6.1602 LearningRate 0.0207 Epoch: 10 Global Step: 452450 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:34:21,393-Speed 2627.73 samples/sec Loss 6.0950 LearningRate 0.0207 Epoch: 10 Global Step: 452460 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:34:25,276-Speed 2638.00 samples/sec Loss 6.1084 LearningRate 0.0207 Epoch: 10 Global Step: 452470 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:34:29,224-Speed 2594.81 samples/sec Loss 6.0420 LearningRate 0.0207 Epoch: 10 Global Step: 452480 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:34:33,131-Speed 2621.30 samples/sec Loss 6.0573 LearningRate 0.0207 Epoch: 10 Global Step: 452490 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:34:37,024-Speed 2630.79 samples/sec Loss 6.2141 LearningRate 0.0207 Epoch: 10 Global Step: 452500 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:34:40,926-Speed 2624.80 samples/sec Loss 6.0143 LearningRate 0.0207 Epoch: 10 Global Step: 452510 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:34:44,821-Speed 2629.47 samples/sec Loss 5.9001 LearningRate 0.0207 Epoch: 10 Global Step: 452520 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:34:48,719-Speed 2627.91 samples/sec Loss 5.9946 LearningRate 0.0207 Epoch: 10 Global Step: 452530 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:34:52,621-Speed 2624.59 samples/sec Loss 6.0182 LearningRate 0.0207 Epoch: 10 Global Step: 452540 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:34:56,574-Speed 2590.95 samples/sec Loss 6.0681 LearningRate 0.0207 Epoch: 10 Global Step: 452550 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:35:00,661-Speed 2506.32 samples/sec Loss 6.0187 LearningRate 0.0207 Epoch: 10 Global Step: 452560 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:35:04,679-Speed 2549.23 samples/sec Loss 6.0521 LearningRate 0.0207 Epoch: 10 Global Step: 452570 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:35:08,555-Speed 2642.43 samples/sec Loss 6.0558 LearningRate 0.0207 Epoch: 10 Global Step: 452580 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:35:12,461-Speed 2622.11 samples/sec Loss 6.1193 LearningRate 0.0207 Epoch: 10 Global Step: 452590 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:35:16,355-Speed 2630.55 samples/sec Loss 6.2588 LearningRate 0.0207 Epoch: 10 Global Step: 452600 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:35:20,260-Speed 2623.39 samples/sec Loss 6.0409 LearningRate 0.0206 Epoch: 10 Global Step: 452610 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:35:24,158-Speed 2627.40 samples/sec Loss 6.0553 LearningRate 0.0206 Epoch: 10 Global Step: 452620 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:35:28,053-Speed 2629.38 samples/sec Loss 6.1535 LearningRate 0.0206 Epoch: 10 Global Step: 452630 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:35:31,954-Speed 2625.82 samples/sec Loss 6.0241 LearningRate 0.0206 Epoch: 10 Global Step: 452640 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:35:35,858-Speed 2623.12 samples/sec Loss 6.1096 LearningRate 0.0206 Epoch: 10 Global Step: 452650 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:35:39,771-Speed 2618.16 samples/sec Loss 6.1922 LearningRate 0.0206 Epoch: 10 Global Step: 452660 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:35:43,671-Speed 2626.65 samples/sec Loss 6.0720 LearningRate 0.0206 Epoch: 10 Global Step: 452670 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:35:47,566-Speed 2629.37 samples/sec Loss 6.0091 LearningRate 0.0206 Epoch: 10 Global Step: 452680 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:35:51,451-Speed 2636.69 samples/sec Loss 6.0892 LearningRate 0.0206 Epoch: 10 Global Step: 452690 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:35:55,341-Speed 2632.63 samples/sec Loss 6.0961 LearningRate 0.0206 Epoch: 10 Global Step: 452700 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:35:59,322-Speed 2573.02 samples/sec Loss 5.9631 LearningRate 0.0206 Epoch: 10 Global Step: 452710 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:36:03,222-Speed 2626.61 samples/sec Loss 6.1407 LearningRate 0.0206 Epoch: 10 Global Step: 452720 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:36:07,120-Speed 2627.43 samples/sec Loss 6.0408 LearningRate 0.0206 Epoch: 10 Global Step: 452730 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:36:11,021-Speed 2625.83 samples/sec Loss 6.0799 LearningRate 0.0206 Epoch: 10 Global Step: 452740 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:36:14,921-Speed 2626.03 samples/sec Loss 6.0389 LearningRate 0.0206 Epoch: 10 Global Step: 452750 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:36:18,830-Speed 2620.26 samples/sec Loss 6.2048 LearningRate 0.0206 Epoch: 10 Global Step: 452760 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:36:22,723-Speed 2630.83 samples/sec Loss 6.0784 LearningRate 0.0206 Epoch: 10 Global Step: 452770 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:36:26,627-Speed 2624.05 samples/sec Loss 6.0707 LearningRate 0.0206 Epoch: 10 Global Step: 452780 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:36:30,522-Speed 2629.13 samples/sec Loss 6.0012 LearningRate 0.0206 Epoch: 10 Global Step: 452790 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:36:34,428-Speed 2622.71 samples/sec Loss 5.9394 LearningRate 0.0206 Epoch: 10 Global Step: 452800 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:36:38,327-Speed 2626.98 samples/sec Loss 6.0811 LearningRate 0.0206 Epoch: 10 Global Step: 452810 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:36:42,250-Speed 2610.87 samples/sec Loss 6.0260 LearningRate 0.0206 Epoch: 10 Global Step: 452820 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:36:46,150-Speed 2625.67 samples/sec Loss 6.0579 LearningRate 0.0206 Epoch: 10 Global Step: 452830 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:36:50,064-Speed 2617.05 samples/sec Loss 6.1417 LearningRate 0.0206 Epoch: 10 Global Step: 452840 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:36:53,965-Speed 2625.64 samples/sec Loss 6.0708 LearningRate 0.0206 Epoch: 10 Global Step: 452850 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:36:57,866-Speed 2625.59 samples/sec Loss 6.0319 LearningRate 0.0206 Epoch: 10 Global Step: 452860 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:37:01,762-Speed 2629.09 samples/sec Loss 6.1588 LearningRate 0.0206 Epoch: 10 Global Step: 452870 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:37:05,743-Speed 2573.19 samples/sec Loss 6.0549 LearningRate 0.0206 Epoch: 10 Global Step: 452880 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:37:09,648-Speed 2623.01 samples/sec Loss 6.1477 LearningRate 0.0206 Epoch: 10 Global Step: 452890 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:37:13,543-Speed 2629.26 samples/sec Loss 6.1556 LearningRate 0.0206 Epoch: 10 Global Step: 452900 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:37:17,447-Speed 2623.57 samples/sec Loss 5.9963 LearningRate 0.0206 Epoch: 10 Global Step: 452910 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:37:21,346-Speed 2627.41 samples/sec Loss 5.9948 LearningRate 0.0206 Epoch: 10 Global Step: 452920 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:37:25,473-Speed 2482.03 samples/sec Loss 6.1307 LearningRate 0.0206 Epoch: 10 Global Step: 452930 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:37:29,432-Speed 2586.87 samples/sec Loss 6.1029 LearningRate 0.0206 Epoch: 10 Global Step: 452940 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:37:33,325-Speed 2631.15 samples/sec Loss 5.9885 LearningRate 0.0206 Epoch: 10 Global Step: 452950 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:37:37,223-Speed 2627.46 samples/sec Loss 6.0468 LearningRate 0.0206 Epoch: 10 Global Step: 452960 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:37:41,102-Speed 2640.68 samples/sec Loss 6.1209 LearningRate 0.0206 Epoch: 10 Global Step: 452970 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:37:44,995-Speed 2630.60 samples/sec Loss 5.9749 LearningRate 0.0206 Epoch: 10 Global Step: 452980 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:37:48,893-Speed 2628.11 samples/sec Loss 6.0674 LearningRate 0.0206 Epoch: 10 Global Step: 452990 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:37:52,791-Speed 2627.26 samples/sec Loss 5.9722 LearningRate 0.0206 Epoch: 10 Global Step: 453000 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:37:56,693-Speed 2625.20 samples/sec Loss 6.0324 LearningRate 0.0206 Epoch: 10 Global Step: 453010 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:38:00,606-Speed 2617.58 samples/sec Loss 6.0540 LearningRate 0.0206 Epoch: 10 Global Step: 453020 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:38:04,508-Speed 2624.93 samples/sec Loss 6.1343 LearningRate 0.0206 Epoch: 10 Global Step: 453030 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:38:08,403-Speed 2629.33 samples/sec Loss 6.0490 LearningRate 0.0206 Epoch: 10 Global Step: 453040 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:38:12,302-Speed 2627.00 samples/sec Loss 6.0338 LearningRate 0.0206 Epoch: 10 Global Step: 453050 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:38:16,203-Speed 2625.28 samples/sec Loss 5.9461 LearningRate 0.0206 Epoch: 10 Global Step: 453060 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:38:20,102-Speed 2627.45 samples/sec Loss 5.9503 LearningRate 0.0206 Epoch: 10 Global Step: 453070 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:38:24,005-Speed 2624.18 samples/sec Loss 6.0360 LearningRate 0.0206 Epoch: 10 Global Step: 453080 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:38:27,920-Speed 2615.70 samples/sec Loss 6.0919 LearningRate 0.0206 Epoch: 10 Global Step: 453090 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:38:31,818-Speed 2627.89 samples/sec Loss 6.0663 LearningRate 0.0206 Epoch: 10 Global Step: 453100 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:38:35,717-Speed 2626.83 samples/sec Loss 6.1668 LearningRate 0.0206 Epoch: 10 Global Step: 453110 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:38:39,606-Speed 2633.71 samples/sec Loss 6.1080 LearningRate 0.0206 Epoch: 10 Global Step: 453120 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:38:43,505-Speed 2626.72 samples/sec Loss 6.1259 LearningRate 0.0206 Epoch: 10 Global Step: 453130 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:38:47,400-Speed 2629.76 samples/sec Loss 5.9317 LearningRate 0.0206 Epoch: 10 Global Step: 453140 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:38:51,298-Speed 2627.65 samples/sec Loss 6.0899 LearningRate 0.0206 Epoch: 10 Global Step: 453150 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:38:55,196-Speed 2628.44 samples/sec Loss 6.0407 LearningRate 0.0206 Epoch: 10 Global Step: 453160 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:38:59,068-Speed 2644.67 samples/sec Loss 6.1164 LearningRate 0.0206 Epoch: 10 Global Step: 453170 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:39:02,966-Speed 2627.41 samples/sec Loss 6.1113 LearningRate 0.0206 Epoch: 10 Global Step: 453180 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:39:06,862-Speed 2629.09 samples/sec Loss 6.0255 LearningRate 0.0206 Epoch: 10 Global Step: 453190 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:39:10,767-Speed 2623.08 samples/sec Loss 6.0277 LearningRate 0.0206 Epoch: 10 Global Step: 453200 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:39:14,662-Speed 2629.51 samples/sec Loss 6.1743 LearningRate 0.0206 Epoch: 10 Global Step: 453210 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:39:18,551-Speed 2633.01 samples/sec Loss 6.0797 LearningRate 0.0206 Epoch: 10 Global Step: 453220 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:39:22,449-Speed 2628.02 samples/sec Loss 6.0535 LearningRate 0.0206 Epoch: 10 Global Step: 453230 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:39:26,346-Speed 2628.47 samples/sec Loss 5.9822 LearningRate 0.0206 Epoch: 10 Global Step: 453240 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:39:30,248-Speed 2624.87 samples/sec Loss 6.1749 LearningRate 0.0206 Epoch: 10 Global Step: 453250 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:39:34,151-Speed 2624.05 samples/sec Loss 6.1036 LearningRate 0.0206 Epoch: 10 Global Step: 453260 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:39:38,052-Speed 2625.94 samples/sec Loss 6.0167 LearningRate 0.0206 Epoch: 10 Global Step: 453270 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:39:41,952-Speed 2626.27 samples/sec Loss 6.2011 LearningRate 0.0206 Epoch: 10 Global Step: 453280 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:39:45,844-Speed 2631.77 samples/sec Loss 6.0704 LearningRate 0.0206 Epoch: 10 Global Step: 453290 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:39:49,742-Speed 2627.67 samples/sec Loss 5.9916 LearningRate 0.0206 Epoch: 10 Global Step: 453300 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:39:53,743-Speed 2560.38 samples/sec Loss 6.1376 LearningRate 0.0206 Epoch: 10 Global Step: 453310 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:39:57,726-Speed 2571.29 samples/sec Loss 6.1126 LearningRate 0.0206 Epoch: 10 Global Step: 453320 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:40:01,731-Speed 2558.04 samples/sec Loss 6.0098 LearningRate 0.0206 Epoch: 10 Global Step: 453330 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:40:05,648-Speed 2614.60 samples/sec Loss 6.0658 LearningRate 0.0206 Epoch: 10 Global Step: 453340 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:40:09,555-Speed 2621.29 samples/sec Loss 6.0342 LearningRate 0.0206 Epoch: 10 Global Step: 453350 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:40:13,462-Speed 2621.39 samples/sec Loss 6.1243 LearningRate 0.0206 Epoch: 10 Global Step: 453360 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:40:17,360-Speed 2628.07 samples/sec Loss 6.0251 LearningRate 0.0206 Epoch: 10 Global Step: 453370 Fp16 Grad Scale: 262144 Required: 42 hours
Training: 2022-04-14 22:40:21,246-Speed 2635.84 samples/sec Loss 6.0662 LearningRate 0.0206 Epoch: 10 Global Step: 453380 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:40:25,144-Speed 2627.31 samples/sec Loss 5.9891 LearningRate 0.0206 Epoch: 10 Global Step: 453390 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:40:29,038-Speed 2630.85 samples/sec Loss 6.1043 LearningRate 0.0206 Epoch: 10 Global Step: 453400 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:40:32,938-Speed 2626.14 samples/sec Loss 6.0587 LearningRate 0.0206 Epoch: 10 Global Step: 453410 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:40:36,862-Speed 2610.08 samples/sec Loss 5.9364 LearningRate 0.0206 Epoch: 10 Global Step: 453420 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:40:40,781-Speed 2613.64 samples/sec Loss 6.1802 LearningRate 0.0206 Epoch: 10 Global Step: 453430 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:40:44,700-Speed 2614.03 samples/sec Loss 6.1083 LearningRate 0.0206 Epoch: 10 Global Step: 453440 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:40:48,636-Speed 2602.03 samples/sec Loss 6.0467 LearningRate 0.0206 Epoch: 10 Global Step: 453450 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:40:52,541-Speed 2623.37 samples/sec Loss 6.0660 LearningRate 0.0206 Epoch: 10 Global Step: 453460 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:40:56,457-Speed 2615.10 samples/sec Loss 6.0570 LearningRate 0.0206 Epoch: 10 Global Step: 453470 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:41:00,328-Speed 2646.37 samples/sec Loss 6.1450 LearningRate 0.0206 Epoch: 10 Global Step: 453480 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:41:04,233-Speed 2623.48 samples/sec Loss 6.0874 LearningRate 0.0206 Epoch: 10 Global Step: 453490 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:41:08,135-Speed 2624.23 samples/sec Loss 6.0470 LearningRate 0.0206 Epoch: 10 Global Step: 453500 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:41:12,035-Speed 2626.55 samples/sec Loss 6.0360 LearningRate 0.0206 Epoch: 10 Global Step: 453510 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:41:15,932-Speed 2628.48 samples/sec Loss 6.0989 LearningRate 0.0205 Epoch: 10 Global Step: 453520 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:41:19,830-Speed 2627.53 samples/sec Loss 5.9228 LearningRate 0.0205 Epoch: 10 Global Step: 453530 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:41:23,745-Speed 2617.00 samples/sec Loss 6.0620 LearningRate 0.0205 Epoch: 10 Global Step: 453540 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:41:27,656-Speed 2618.80 samples/sec Loss 6.0723 LearningRate 0.0205 Epoch: 10 Global Step: 453550 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:41:31,551-Speed 2629.77 samples/sec Loss 6.1159 LearningRate 0.0205 Epoch: 10 Global Step: 453560 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:41:35,447-Speed 2629.25 samples/sec Loss 6.0585 LearningRate 0.0205 Epoch: 10 Global Step: 453570 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:41:39,340-Speed 2630.87 samples/sec Loss 6.2087 LearningRate 0.0205 Epoch: 10 Global Step: 453580 Fp16 Grad Scale: 262144 Required: 42 hours
Training: 2022-04-14 22:41:43,216-Speed 2642.60 samples/sec Loss 6.0382 LearningRate 0.0205 Epoch: 10 Global Step: 453590 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:41:47,114-Speed 2627.30 samples/sec Loss 6.0325 LearningRate 0.0205 Epoch: 10 Global Step: 453600 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:41:51,016-Speed 2625.41 samples/sec Loss 6.0653 LearningRate 0.0205 Epoch: 10 Global Step: 453610 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:41:54,909-Speed 2630.66 samples/sec Loss 6.0172 LearningRate 0.0205 Epoch: 10 Global Step: 453620 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:41:58,820-Speed 2619.42 samples/sec Loss 6.1673 LearningRate 0.0205 Epoch: 10 Global Step: 453630 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:42:02,720-Speed 2626.11 samples/sec Loss 6.0548 LearningRate 0.0205 Epoch: 10 Global Step: 453640 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:42:06,653-Speed 2604.33 samples/sec Loss 5.9136 LearningRate 0.0205 Epoch: 10 Global Step: 453650 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:42:10,549-Speed 2628.80 samples/sec Loss 6.0377 LearningRate 0.0205 Epoch: 10 Global Step: 453660 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:42:14,449-Speed 2627.61 samples/sec Loss 5.9907 LearningRate 0.0205 Epoch: 10 Global Step: 453670 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:42:18,327-Speed 2640.44 samples/sec Loss 6.0647 LearningRate 0.0205 Epoch: 10 Global Step: 453680 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:42:22,236-Speed 2620.03 samples/sec Loss 6.0854 LearningRate 0.0205 Epoch: 10 Global Step: 453690 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:42:26,144-Speed 2621.56 samples/sec Loss 6.0816 LearningRate 0.0205 Epoch: 10 Global Step: 453700 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:42:30,053-Speed 2620.57 samples/sec Loss 6.1141 LearningRate 0.0205 Epoch: 10 Global Step: 453710 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:42:34,333-Speed 2393.32 samples/sec Loss 6.1767 LearningRate 0.0205 Epoch: 10 Global Step: 453720 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:42:38,232-Speed 2626.74 samples/sec Loss 6.0429 LearningRate 0.0205 Epoch: 10 Global Step: 453730 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:42:42,126-Speed 2630.53 samples/sec Loss 6.0681 LearningRate 0.0205 Epoch: 10 Global Step: 453740 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:42:46,021-Speed 2629.55 samples/sec Loss 5.9524 LearningRate 0.0205 Epoch: 10 Global Step: 453750 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:42:49,923-Speed 2624.57 samples/sec Loss 6.0416 LearningRate 0.0205 Epoch: 10 Global Step: 453760 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:42:53,830-Speed 2621.29 samples/sec Loss 5.9919 LearningRate 0.0205 Epoch: 10 Global Step: 453770 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:42:57,909-Speed 2511.65 samples/sec Loss 5.8560 LearningRate 0.0205 Epoch: 10 Global Step: 453780 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:43:01,800-Speed 2632.76 samples/sec Loss 6.0261 LearningRate 0.0205 Epoch: 10 Global Step: 453790 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:43:05,688-Speed 2634.13 samples/sec Loss 6.0085 LearningRate 0.0205 Epoch: 10 Global Step: 453800 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:43:09,652-Speed 2584.39 samples/sec Loss 6.1024 LearningRate 0.0205 Epoch: 10 Global Step: 453810 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:43:13,617-Speed 2583.20 samples/sec Loss 6.1217 LearningRate 0.0205 Epoch: 10 Global Step: 453820 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:43:17,520-Speed 2623.76 samples/sec Loss 6.0823 LearningRate 0.0205 Epoch: 10 Global Step: 453830 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:43:21,416-Speed 2629.88 samples/sec Loss 6.0225 LearningRate 0.0205 Epoch: 10 Global Step: 453840 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:43:25,342-Speed 2609.09 samples/sec Loss 6.1015 LearningRate 0.0205 Epoch: 10 Global Step: 453850 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:43:29,226-Speed 2636.93 samples/sec Loss 5.8909 LearningRate 0.0205 Epoch: 10 Global Step: 453860 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:43:33,122-Speed 2629.37 samples/sec Loss 6.0710 LearningRate 0.0205 Epoch: 10 Global Step: 453870 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:43:37,037-Speed 2616.33 samples/sec Loss 6.0401 LearningRate 0.0205 Epoch: 10 Global Step: 453880 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:43:40,943-Speed 2622.51 samples/sec Loss 5.9971 LearningRate 0.0205 Epoch: 10 Global Step: 453890 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:43:44,907-Speed 2583.74 samples/sec Loss 6.0413 LearningRate 0.0205 Epoch: 10 Global Step: 453900 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:43:48,921-Speed 2551.92 samples/sec Loss 6.0717 LearningRate 0.0205 Epoch: 10 Global Step: 453910 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:43:52,820-Speed 2626.82 samples/sec Loss 6.0820 LearningRate 0.0205 Epoch: 10 Global Step: 453920 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:43:56,724-Speed 2624.03 samples/sec Loss 6.0949 LearningRate 0.0205 Epoch: 10 Global Step: 453930 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:44:00,622-Speed 2627.23 samples/sec Loss 6.2100 LearningRate 0.0205 Epoch: 10 Global Step: 453940 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:44:04,523-Speed 2626.18 samples/sec Loss 6.0508 LearningRate 0.0205 Epoch: 10 Global Step: 453950 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:44:08,524-Speed 2560.02 samples/sec Loss 6.1240 LearningRate 0.0205 Epoch: 10 Global Step: 453960 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:44:12,415-Speed 2632.39 samples/sec Loss 6.0446 LearningRate 0.0205 Epoch: 10 Global Step: 453970 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:44:16,308-Speed 2630.83 samples/sec Loss 6.0917 LearningRate 0.0205 Epoch: 10 Global Step: 453980 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:44:20,204-Speed 2629.25 samples/sec Loss 6.0167 LearningRate 0.0205 Epoch: 10 Global Step: 453990 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:44:24,098-Speed 2630.19 samples/sec Loss 6.0931 LearningRate 0.0205 Epoch: 10 Global Step: 454000 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:44:27,976-Speed 2640.72 samples/sec Loss 5.9561 LearningRate 0.0205 Epoch: 10 Global Step: 454010 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:44:31,873-Speed 2629.37 samples/sec Loss 6.1111 LearningRate 0.0205 Epoch: 10 Global Step: 454020 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:44:35,770-Speed 2629.26 samples/sec Loss 6.1123 LearningRate 0.0205 Epoch: 10 Global Step: 454030 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:44:39,675-Speed 2622.94 samples/sec Loss 6.0737 LearningRate 0.0205 Epoch: 10 Global Step: 454040 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:44:43,571-Speed 2628.97 samples/sec Loss 6.1229 LearningRate 0.0205 Epoch: 10 Global Step: 454050 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:44:47,535-Speed 2583.72 samples/sec Loss 6.1759 LearningRate 0.0205 Epoch: 10 Global Step: 454060 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:44:51,433-Speed 2627.54 samples/sec Loss 5.9649 LearningRate 0.0205 Epoch: 10 Global Step: 454070 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:44:55,345-Speed 2618.87 samples/sec Loss 6.1555 LearningRate 0.0205 Epoch: 10 Global Step: 454080 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:44:59,243-Speed 2627.35 samples/sec Loss 5.9645 LearningRate 0.0205 Epoch: 10 Global Step: 454090 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:45:03,167-Speed 2610.36 samples/sec Loss 6.0494 LearningRate 0.0205 Epoch: 10 Global Step: 454100 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:45:07,142-Speed 2576.34 samples/sec Loss 6.0388 LearningRate 0.0205 Epoch: 10 Global Step: 454110 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:45:11,041-Speed 2626.90 samples/sec Loss 6.1378 LearningRate 0.0205 Epoch: 10 Global Step: 454120 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:45:14,936-Speed 2629.87 samples/sec Loss 5.9034 LearningRate 0.0205 Epoch: 10 Global Step: 454130 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:45:18,835-Speed 2626.65 samples/sec Loss 5.9955 LearningRate 0.0205 Epoch: 10 Global Step: 454140 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:45:22,741-Speed 2622.22 samples/sec Loss 6.0236 LearningRate 0.0205 Epoch: 10 Global Step: 454150 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:45:26,634-Speed 2630.85 samples/sec Loss 6.0117 LearningRate 0.0205 Epoch: 10 Global Step: 454160 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:45:30,536-Speed 2625.32 samples/sec Loss 6.0357 LearningRate 0.0205 Epoch: 10 Global Step: 454170 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:45:34,433-Speed 2628.45 samples/sec Loss 6.0974 LearningRate 0.0205 Epoch: 10 Global Step: 454180 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:45:38,334-Speed 2625.29 samples/sec Loss 5.9580 LearningRate 0.0205 Epoch: 10 Global Step: 454190 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:45:42,243-Speed 2620.77 samples/sec Loss 6.0858 LearningRate 0.0205 Epoch: 10 Global Step: 454200 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:45:46,144-Speed 2626.79 samples/sec Loss 6.1392 LearningRate 0.0205 Epoch: 10 Global Step: 454210 Fp16 Grad Scale: 262144 Required: 42 hours
Training: 2022-04-14 22:45:50,041-Speed 2627.98 samples/sec Loss 6.1480 LearningRate 0.0205 Epoch: 10 Global Step: 454220 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:45:53,950-Speed 2620.08 samples/sec Loss 5.9168 LearningRate 0.0205 Epoch: 10 Global Step: 454230 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:45:57,842-Speed 2631.53 samples/sec Loss 6.1481 LearningRate 0.0205 Epoch: 10 Global Step: 454240 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:46:01,744-Speed 2625.77 samples/sec Loss 5.9970 LearningRate 0.0205 Epoch: 10 Global Step: 454250 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:46:05,620-Speed 2642.16 samples/sec Loss 6.0519 LearningRate 0.0205 Epoch: 10 Global Step: 454260 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:46:09,542-Speed 2611.36 samples/sec Loss 6.0658 LearningRate 0.0205 Epoch: 10 Global Step: 454270 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:46:13,445-Speed 2624.68 samples/sec Loss 6.1246 LearningRate 0.0205 Epoch: 10 Global Step: 454280 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:46:17,337-Speed 2631.72 samples/sec Loss 6.1095 LearningRate 0.0205 Epoch: 10 Global Step: 454290 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:46:21,233-Speed 2629.37 samples/sec Loss 6.0415 LearningRate 0.0205 Epoch: 10 Global Step: 454300 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:46:25,127-Speed 2630.21 samples/sec Loss 6.0683 LearningRate 0.0205 Epoch: 10 Global Step: 454310 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:46:29,029-Speed 2625.13 samples/sec Loss 5.9817 LearningRate 0.0205 Epoch: 10 Global Step: 454320 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:46:32,931-Speed 2624.32 samples/sec Loss 6.0623 LearningRate 0.0205 Epoch: 10 Global Step: 454330 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:46:36,833-Speed 2625.41 samples/sec Loss 6.0812 LearningRate 0.0205 Epoch: 10 Global Step: 454340 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:46:40,730-Speed 2627.91 samples/sec Loss 6.0436 LearningRate 0.0205 Epoch: 10 Global Step: 454350 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:46:44,625-Speed 2629.59 samples/sec Loss 5.9923 LearningRate 0.0205 Epoch: 10 Global Step: 454360 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:46:48,528-Speed 2624.28 samples/sec Loss 6.0886 LearningRate 0.0205 Epoch: 10 Global Step: 454370 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:46:52,425-Speed 2628.39 samples/sec Loss 6.0091 LearningRate 0.0205 Epoch: 10 Global Step: 454380 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:46:56,324-Speed 2627.01 samples/sec Loss 6.0433 LearningRate 0.0205 Epoch: 10 Global Step: 454390 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:47:00,236-Speed 2618.03 samples/sec Loss 6.0802 LearningRate 0.0205 Epoch: 10 Global Step: 454400 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:47:04,135-Speed 2626.94 samples/sec Loss 6.0314 LearningRate 0.0205 Epoch: 10 Global Step: 454410 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:47:08,031-Speed 2629.09 samples/sec Loss 5.9681 LearningRate 0.0205 Epoch: 10 Global Step: 454420 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:47:11,926-Speed 2629.46 samples/sec Loss 6.0545 LearningRate 0.0205 Epoch: 10 Global Step: 454430 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:47:15,828-Speed 2625.28 samples/sec Loss 6.0044 LearningRate 0.0204 Epoch: 10 Global Step: 454440 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:47:19,730-Speed 2624.82 samples/sec Loss 6.1487 LearningRate 0.0204 Epoch: 10 Global Step: 454450 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:47:23,613-Speed 2637.79 samples/sec Loss 6.0617 LearningRate 0.0204 Epoch: 10 Global Step: 454460 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:47:27,511-Speed 2627.43 samples/sec Loss 5.9281 LearningRate 0.0204 Epoch: 10 Global Step: 454470 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:47:31,415-Speed 2623.70 samples/sec Loss 6.0209 LearningRate 0.0204 Epoch: 10 Global Step: 454480 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:47:35,308-Speed 2630.23 samples/sec Loss 6.1761 LearningRate 0.0204 Epoch: 10 Global Step: 454490 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:47:39,206-Speed 2627.65 samples/sec Loss 5.9426 LearningRate 0.0204 Epoch: 10 Global Step: 454500 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:47:43,103-Speed 2628.53 samples/sec Loss 6.0974 LearningRate 0.0204 Epoch: 10 Global Step: 454510 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:47:46,987-Speed 2637.81 samples/sec Loss 6.0004 LearningRate 0.0204 Epoch: 10 Global Step: 454520 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:47:50,898-Speed 2619.39 samples/sec Loss 5.9537 LearningRate 0.0204 Epoch: 10 Global Step: 454530 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:47:54,795-Speed 2627.95 samples/sec Loss 5.9826 LearningRate 0.0204 Epoch: 10 Global Step: 454540 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:47:58,695-Speed 2626.33 samples/sec Loss 6.1431 LearningRate 0.0204 Epoch: 10 Global Step: 454550 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:48:02,596-Speed 2625.02 samples/sec Loss 5.9606 LearningRate 0.0204 Epoch: 10 Global Step: 454560 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:48:06,497-Speed 2625.48 samples/sec Loss 6.0389 LearningRate 0.0204 Epoch: 10 Global Step: 454570 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:48:10,417-Speed 2612.66 samples/sec Loss 6.0254 LearningRate 0.0204 Epoch: 10 Global Step: 454580 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:48:14,413-Speed 2563.66 samples/sec Loss 5.9647 LearningRate 0.0204 Epoch: 10 Global Step: 454590 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:48:18,311-Speed 2627.59 samples/sec Loss 6.0846 LearningRate 0.0204 Epoch: 10 Global Step: 454600 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:48:22,204-Speed 2631.20 samples/sec Loss 6.0846 LearningRate 0.0204 Epoch: 10 Global Step: 454610 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:48:26,101-Speed 2628.30 samples/sec Loss 6.0297 LearningRate 0.0204 Epoch: 10 Global Step: 454620 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:48:30,005-Speed 2623.66 samples/sec Loss 6.0605 LearningRate 0.0204 Epoch: 10 Global Step: 454630 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:48:33,905-Speed 2626.14 samples/sec Loss 6.0606 LearningRate 0.0204 Epoch: 10 Global Step: 454640 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:48:37,799-Speed 2630.31 samples/sec Loss 6.0178 LearningRate 0.0204 Epoch: 10 Global Step: 454650 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:48:41,695-Speed 2628.62 samples/sec Loss 6.1934 LearningRate 0.0204 Epoch: 10 Global Step: 454660 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:48:45,591-Speed 2628.86 samples/sec Loss 6.0983 LearningRate 0.0204 Epoch: 10 Global Step: 454670 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:48:49,492-Speed 2625.65 samples/sec Loss 5.8929 LearningRate 0.0204 Epoch: 10 Global Step: 454680 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:48:53,392-Speed 2625.99 samples/sec Loss 5.9524 LearningRate 0.0204 Epoch: 10 Global Step: 454690 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:48:57,290-Speed 2627.92 samples/sec Loss 6.0510 LearningRate 0.0204 Epoch: 10 Global Step: 454700 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:49:01,187-Speed 2628.19 samples/sec Loss 6.0769 LearningRate 0.0204 Epoch: 10 Global Step: 454710 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:49:05,088-Speed 2626.35 samples/sec Loss 6.0104 LearningRate 0.0204 Epoch: 10 Global Step: 454720 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:49:09,009-Speed 2611.89 samples/sec Loss 6.1171 LearningRate 0.0204 Epoch: 10 Global Step: 454730 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:49:12,904-Speed 2629.55 samples/sec Loss 6.0530 LearningRate 0.0204 Epoch: 10 Global Step: 454740 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:49:16,800-Speed 2628.82 samples/sec Loss 6.0159 LearningRate 0.0204 Epoch: 10 Global Step: 454750 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:49:20,711-Speed 2618.68 samples/sec Loss 6.0060 LearningRate 0.0204 Epoch: 10 Global Step: 454760 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:49:24,604-Speed 2630.56 samples/sec Loss 6.0860 LearningRate 0.0204 Epoch: 10 Global Step: 454770 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:49:28,500-Speed 2630.53 samples/sec Loss 6.0387 LearningRate 0.0204 Epoch: 10 Global Step: 454780 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:49:32,406-Speed 2621.77 samples/sec Loss 6.0060 LearningRate 0.0204 Epoch: 10 Global Step: 454790 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:49:36,305-Speed 2627.15 samples/sec Loss 6.2234 LearningRate 0.0204 Epoch: 10 Global Step: 454800 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:49:40,203-Speed 2627.82 samples/sec Loss 6.0827 LearningRate 0.0204 Epoch: 10 Global Step: 454810 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:49:44,099-Speed 2629.07 samples/sec Loss 5.8837 LearningRate 0.0204 Epoch: 10 Global Step: 454820 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:49:47,992-Speed 2630.70 samples/sec Loss 5.9468 LearningRate 0.0204 Epoch: 10 Global Step: 454830 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:49:51,887-Speed 2629.57 samples/sec Loss 6.0980 LearningRate 0.0204 Epoch: 10 Global Step: 454840 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:49:55,762-Speed 2642.81 samples/sec Loss 5.9408 LearningRate 0.0204 Epoch: 10 Global Step: 454850 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:49:59,654-Speed 2632.19 samples/sec Loss 6.0165 LearningRate 0.0204 Epoch: 10 Global Step: 454860 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:50:03,559-Speed 2622.22 samples/sec Loss 6.0216 LearningRate 0.0204 Epoch: 10 Global Step: 454870 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:50:07,457-Speed 2628.24 samples/sec Loss 6.0746 LearningRate 0.0204 Epoch: 10 Global Step: 454880 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:50:11,358-Speed 2625.50 samples/sec Loss 6.0811 LearningRate 0.0204 Epoch: 10 Global Step: 454890 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:50:15,257-Speed 2626.85 samples/sec Loss 6.0342 LearningRate 0.0204 Epoch: 10 Global Step: 454900 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:50:19,162-Speed 2622.78 samples/sec Loss 5.9632 LearningRate 0.0204 Epoch: 10 Global Step: 454910 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:50:23,053-Speed 2632.08 samples/sec Loss 5.9940 LearningRate 0.0204 Epoch: 10 Global Step: 454920 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:50:26,950-Speed 2632.14 samples/sec Loss 6.0183 LearningRate 0.0204 Epoch: 10 Global Step: 454930 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:50:30,846-Speed 2629.13 samples/sec Loss 6.1982 LearningRate 0.0204 Epoch: 10 Global Step: 454940 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:50:34,739-Speed 2630.71 samples/sec Loss 5.9743 LearningRate 0.0204 Epoch: 10 Global Step: 454950 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:50:38,641-Speed 2625.02 samples/sec Loss 5.9837 LearningRate 0.0204 Epoch: 10 Global Step: 454960 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:50:42,554-Speed 2617.59 samples/sec Loss 6.0990 LearningRate 0.0204 Epoch: 10 Global Step: 454970 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:50:46,449-Speed 2629.79 samples/sec Loss 6.1228 LearningRate 0.0204 Epoch: 10 Global Step: 454980 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:50:50,344-Speed 2628.86 samples/sec Loss 6.0436 LearningRate 0.0204 Epoch: 10 Global Step: 454990 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:50:54,243-Speed 2627.72 samples/sec Loss 6.0217 LearningRate 0.0204 Epoch: 10 Global Step: 455000 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:50:58,139-Speed 2628.81 samples/sec Loss 6.0969 LearningRate 0.0204 Epoch: 10 Global Step: 455010 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:51:02,032-Speed 2631.19 samples/sec Loss 6.0052 LearningRate 0.0204 Epoch: 10 Global Step: 455020 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:51:05,925-Speed 2630.67 samples/sec Loss 5.9014 LearningRate 0.0204 Epoch: 10 Global Step: 455030 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:51:09,801-Speed 2642.45 samples/sec Loss 5.9688 LearningRate 0.0204 Epoch: 10 Global Step: 455040 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:51:13,713-Speed 2618.18 samples/sec Loss 5.8956 LearningRate 0.0204 Epoch: 10 Global Step: 455050 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:51:17,620-Speed 2621.37 samples/sec Loss 6.0354 LearningRate 0.0204 Epoch: 10 Global Step: 455060 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:51:21,518-Speed 2628.07 samples/sec Loss 6.0739 LearningRate 0.0204 Epoch: 10 Global Step: 455070 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:51:25,403-Speed 2635.71 samples/sec Loss 6.1343 LearningRate 0.0204 Epoch: 10 Global Step: 455080 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:51:29,297-Speed 2631.35 samples/sec Loss 6.0815 LearningRate 0.0204 Epoch: 10 Global Step: 455090 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:51:33,192-Speed 2629.10 samples/sec Loss 5.8924 LearningRate 0.0204 Epoch: 10 Global Step: 455100 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:51:37,091-Speed 2627.07 samples/sec Loss 6.0411 LearningRate 0.0204 Epoch: 10 Global Step: 455110 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:51:40,983-Speed 2631.48 samples/sec Loss 5.9404 LearningRate 0.0204 Epoch: 10 Global Step: 455120 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:51:44,877-Speed 2630.29 samples/sec Loss 5.9575 LearningRate 0.0204 Epoch: 10 Global Step: 455130 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:51:48,778-Speed 2625.38 samples/sec Loss 6.1197 LearningRate 0.0204 Epoch: 10 Global Step: 455140 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:51:52,678-Speed 2626.67 samples/sec Loss 6.0069 LearningRate 0.0204 Epoch: 10 Global Step: 455150 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:51:56,575-Speed 2627.63 samples/sec Loss 6.0131 LearningRate 0.0204 Epoch: 10 Global Step: 455160 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:52:00,474-Speed 2626.96 samples/sec Loss 6.0812 LearningRate 0.0204 Epoch: 10 Global Step: 455170 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:52:04,371-Speed 2628.40 samples/sec Loss 5.9805 LearningRate 0.0204 Epoch: 10 Global Step: 455180 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:52:08,265-Speed 2630.14 samples/sec Loss 6.1361 LearningRate 0.0204 Epoch: 10 Global Step: 455190 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:52:12,162-Speed 2628.62 samples/sec Loss 6.0591 LearningRate 0.0204 Epoch: 10 Global Step: 455200 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:52:16,054-Speed 2631.38 samples/sec Loss 5.9633 LearningRate 0.0204 Epoch: 10 Global Step: 455210 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:52:19,950-Speed 2629.24 samples/sec Loss 5.9919 LearningRate 0.0204 Epoch: 10 Global Step: 455220 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:52:23,855-Speed 2622.86 samples/sec Loss 5.9892 LearningRate 0.0204 Epoch: 10 Global Step: 455230 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:52:27,757-Speed 2624.31 samples/sec Loss 6.2046 LearningRate 0.0204 Epoch: 10 Global Step: 455240 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:52:31,664-Speed 2621.64 samples/sec Loss 6.0391 LearningRate 0.0204 Epoch: 10 Global Step: 455250 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:52:35,562-Speed 2627.54 samples/sec Loss 6.0419 LearningRate 0.0204 Epoch: 10 Global Step: 455260 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:52:39,471-Speed 2620.54 samples/sec Loss 6.0176 LearningRate 0.0204 Epoch: 10 Global Step: 455270 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:52:43,391-Speed 2612.31 samples/sec Loss 6.0820 LearningRate 0.0204 Epoch: 10 Global Step: 455280 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:52:47,299-Speed 2621.55 samples/sec Loss 5.9968 LearningRate 0.0204 Epoch: 10 Global Step: 455290 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:52:51,193-Speed 2630.48 samples/sec Loss 6.0353 LearningRate 0.0204 Epoch: 10 Global Step: 455300 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:52:55,095-Speed 2624.68 samples/sec Loss 6.2159 LearningRate 0.0204 Epoch: 10 Global Step: 455310 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:52:58,992-Speed 2629.05 samples/sec Loss 5.9791 LearningRate 0.0204 Epoch: 10 Global Step: 455320 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:53:02,888-Speed 2628.50 samples/sec Loss 6.0799 LearningRate 0.0204 Epoch: 10 Global Step: 455330 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:53:06,785-Speed 2628.29 samples/sec Loss 6.0457 LearningRate 0.0204 Epoch: 10 Global Step: 455340 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:53:10,708-Speed 2610.37 samples/sec Loss 6.0148 LearningRate 0.0203 Epoch: 10 Global Step: 455350 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:53:14,622-Speed 2617.28 samples/sec Loss 5.9514 LearningRate 0.0203 Epoch: 10 Global Step: 455360 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:53:18,519-Speed 2628.14 samples/sec Loss 5.9999 LearningRate 0.0203 Epoch: 10 Global Step: 455370 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:53:22,412-Speed 2631.02 samples/sec Loss 6.0424 LearningRate 0.0203 Epoch: 10 Global Step: 455380 Fp16 Grad Scale: 262144 Required: 42 hours
Training: 2022-04-14 22:53:26,290-Speed 2641.34 samples/sec Loss 6.0222 LearningRate 0.0203 Epoch: 10 Global Step: 455390 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:53:30,189-Speed 2626.97 samples/sec Loss 5.9990 LearningRate 0.0203 Epoch: 10 Global Step: 455400 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:53:34,091-Speed 2624.77 samples/sec Loss 5.9340 LearningRate 0.0203 Epoch: 10 Global Step: 455410 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:53:37,988-Speed 2627.91 samples/sec Loss 5.9897 LearningRate 0.0203 Epoch: 10 Global Step: 455420 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:53:41,889-Speed 2625.48 samples/sec Loss 6.0160 LearningRate 0.0203 Epoch: 10 Global Step: 455430 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:53:45,793-Speed 2623.86 samples/sec Loss 5.9687 LearningRate 0.0203 Epoch: 10 Global Step: 455440 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:53:49,712-Speed 2613.10 samples/sec Loss 6.0043 LearningRate 0.0203 Epoch: 10 Global Step: 455450 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:53:53,609-Speed 2628.03 samples/sec Loss 5.8876 LearningRate 0.0203 Epoch: 10 Global Step: 455460 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:53:57,508-Speed 2627.39 samples/sec Loss 6.0552 LearningRate 0.0203 Epoch: 10 Global Step: 455470 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:54:01,409-Speed 2625.68 samples/sec Loss 5.9547 LearningRate 0.0203 Epoch: 10 Global Step: 455480 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:54:05,312-Speed 2624.00 samples/sec Loss 5.9146 LearningRate 0.0203 Epoch: 10 Global Step: 455490 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:54:09,209-Speed 2628.47 samples/sec Loss 5.9733 LearningRate 0.0203 Epoch: 10 Global Step: 455500 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:54:13,129-Speed 2612.56 samples/sec Loss 6.0324 LearningRate 0.0203 Epoch: 10 Global Step: 455510 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:54:17,025-Speed 2628.97 samples/sec Loss 5.9406 LearningRate 0.0203 Epoch: 10 Global Step: 455520 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:54:20,923-Speed 2627.75 samples/sec Loss 6.0150 LearningRate 0.0203 Epoch: 10 Global Step: 455530 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:54:24,826-Speed 2623.79 samples/sec Loss 6.0803 LearningRate 0.0203 Epoch: 10 Global Step: 455540 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:54:28,733-Speed 2621.94 samples/sec Loss 6.0045 LearningRate 0.0203 Epoch: 10 Global Step: 455550 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:54:32,629-Speed 2628.80 samples/sec Loss 5.9902 LearningRate 0.0203 Epoch: 10 Global Step: 455560 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:54:36,525-Speed 2629.54 samples/sec Loss 6.0443 LearningRate 0.0203 Epoch: 10 Global Step: 455570 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:54:40,512-Speed 2569.18 samples/sec Loss 5.9367 LearningRate 0.0203 Epoch: 10 Global Step: 455580 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:54:44,414-Speed 2624.78 samples/sec Loss 5.9945 LearningRate 0.0203 Epoch: 10 Global Step: 455590 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:54:48,315-Speed 2625.05 samples/sec Loss 5.9471 LearningRate 0.0203 Epoch: 10 Global Step: 455600 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:54:52,232-Speed 2614.91 samples/sec Loss 6.1096 LearningRate 0.0203 Epoch: 10 Global Step: 455610 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:54:56,131-Speed 2627.16 samples/sec Loss 6.0336 LearningRate 0.0203 Epoch: 10 Global Step: 455620 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:55:00,031-Speed 2626.19 samples/sec Loss 6.0315 LearningRate 0.0203 Epoch: 10 Global Step: 455630 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:55:03,932-Speed 2625.61 samples/sec Loss 5.9789 LearningRate 0.0203 Epoch: 10 Global Step: 455640 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:55:07,832-Speed 2626.30 samples/sec Loss 5.9513 LearningRate 0.0203 Epoch: 10 Global Step: 455650 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:55:11,734-Speed 2624.33 samples/sec Loss 6.0151 LearningRate 0.0203 Epoch: 10 Global Step: 455660 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:55:15,631-Speed 2628.52 samples/sec Loss 6.1014 LearningRate 0.0203 Epoch: 10 Global Step: 455670 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:55:19,504-Speed 2644.81 samples/sec Loss 6.0555 LearningRate 0.0203 Epoch: 10 Global Step: 455680 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:55:23,401-Speed 2628.17 samples/sec Loss 6.0519 LearningRate 0.0203 Epoch: 10 Global Step: 455690 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:55:27,306-Speed 2622.81 samples/sec Loss 6.1273 LearningRate 0.0203 Epoch: 10 Global Step: 455700 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:55:31,203-Speed 2628.11 samples/sec Loss 6.1746 LearningRate 0.0203 Epoch: 10 Global Step: 455710 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:55:35,099-Speed 2628.78 samples/sec Loss 6.1184 LearningRate 0.0203 Epoch: 10 Global Step: 455720 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:55:38,999-Speed 2626.59 samples/sec Loss 6.2090 LearningRate 0.0203 Epoch: 10 Global Step: 455730 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:55:42,898-Speed 2626.34 samples/sec Loss 6.0027 LearningRate 0.0203 Epoch: 10 Global Step: 455740 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:55:46,810-Speed 2618.40 samples/sec Loss 6.0325 LearningRate 0.0203 Epoch: 10 Global Step: 455750 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:55:50,719-Speed 2620.05 samples/sec Loss 6.0695 LearningRate 0.0203 Epoch: 10 Global Step: 455760 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:55:54,614-Speed 2629.73 samples/sec Loss 6.0822 LearningRate 0.0203 Epoch: 10 Global Step: 455770 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:55:58,514-Speed 2626.55 samples/sec Loss 6.0474 LearningRate 0.0203 Epoch: 10 Global Step: 455780 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:56:02,414-Speed 2626.17 samples/sec Loss 6.0085 LearningRate 0.0203 Epoch: 10 Global Step: 455790 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:56:06,313-Speed 2627.95 samples/sec Loss 6.1114 LearningRate 0.0203 Epoch: 10 Global Step: 455800 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:56:10,205-Speed 2630.95 samples/sec Loss 6.0815 LearningRate 0.0203 Epoch: 10 Global Step: 455810 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:56:14,100-Speed 2629.46 samples/sec Loss 5.8891 LearningRate 0.0203 Epoch: 10 Global Step: 455820 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:56:17,998-Speed 2627.43 samples/sec Loss 6.0387 LearningRate 0.0203 Epoch: 10 Global Step: 455830 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:56:21,897-Speed 2627.43 samples/sec Loss 6.0157 LearningRate 0.0203 Epoch: 10 Global Step: 455840 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:56:25,756-Speed 2653.66 samples/sec Loss 6.0152 LearningRate 0.0203 Epoch: 10 Global Step: 455850 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:56:29,722-Speed 2582.92 samples/sec Loss 5.9282 LearningRate 0.0203 Epoch: 10 Global Step: 455860 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:56:33,619-Speed 2628.08 samples/sec Loss 6.1435 LearningRate 0.0203 Epoch: 10 Global Step: 455870 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:56:37,517-Speed 2627.68 samples/sec Loss 6.0573 LearningRate 0.0203 Epoch: 10 Global Step: 455880 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:56:41,415-Speed 2627.67 samples/sec Loss 6.0053 LearningRate 0.0203 Epoch: 10 Global Step: 455890 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:56:45,311-Speed 2628.92 samples/sec Loss 5.9804 LearningRate 0.0203 Epoch: 10 Global Step: 455900 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:56:49,207-Speed 2631.60 samples/sec Loss 6.1774 LearningRate 0.0203 Epoch: 10 Global Step: 455910 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:56:53,104-Speed 2628.49 samples/sec Loss 6.0495 LearningRate 0.0203 Epoch: 10 Global Step: 455920 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:57:03,019-Speed 2508.32 samples/sec Loss 6.0788 LearningRate 0.0203 Epoch: 10 Global Step: 455930 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:57:06,982-Speed 2584.67 samples/sec Loss 5.9290 LearningRate 0.0203 Epoch: 10 Global Step: 455940 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:57:10,898-Speed 2626.90 samples/sec Loss 5.9479 LearningRate 0.0203 Epoch: 10 Global Step: 455950 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:57:14,769-Speed 2645.47 samples/sec Loss 6.0142 LearningRate 0.0203 Epoch: 10 Global Step: 455960 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:57:18,662-Speed 2630.69 samples/sec Loss 6.0572 LearningRate 0.0203 Epoch: 10 Global Step: 455970 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:57:22,668-Speed 2639.25 samples/sec Loss 6.0936 LearningRate 0.0203 Epoch: 10 Global Step: 455980 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:57:26,564-Speed 2629.30 samples/sec Loss 6.0478 LearningRate 0.0203 Epoch: 10 Global Step: 455990 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:57:30,455-Speed 2635.27 samples/sec Loss 6.0964 LearningRate 0.0203 Epoch: 10 Global Step: 456000 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:57:34,364-Speed 2620.22 samples/sec Loss 6.0617 LearningRate 0.0203 Epoch: 10 Global Step: 456010 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:57:38,257-Speed 2630.98 samples/sec Loss 6.1009 LearningRate 0.0203 Epoch: 10 Global Step: 456020 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:57:42,147-Speed 2632.81 samples/sec Loss 6.0555 LearningRate 0.0203 Epoch: 10 Global Step: 456030 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:57:46,102-Speed 2635.81 samples/sec Loss 5.9896 LearningRate 0.0203 Epoch: 10 Global Step: 456040 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:57:50,002-Speed 2626.27 samples/sec Loss 6.0256 LearningRate 0.0203 Epoch: 10 Global Step: 456050 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 22:57:53,904-Speed 2624.70 samples/sec Loss 5.9797 LearningRate 0.0203 Epoch: 10 Global Step: 456060 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:57:57,798-Speed 2630.23 samples/sec Loss 6.1211 LearningRate 0.0203 Epoch: 10 Global Step: 456070 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:58:01,709-Speed 2618.94 samples/sec Loss 6.0682 LearningRate 0.0203 Epoch: 10 Global Step: 456080 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:58:05,604-Speed 2630.25 samples/sec Loss 6.0126 LearningRate 0.0203 Epoch: 10 Global Step: 456090 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:58:09,500-Speed 2628.78 samples/sec Loss 5.9555 LearningRate 0.0203 Epoch: 10 Global Step: 456100 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:58:13,395-Speed 2629.43 samples/sec Loss 5.9514 LearningRate 0.0203 Epoch: 10 Global Step: 456110 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:58:17,299-Speed 2623.02 samples/sec Loss 6.0722 LearningRate 0.0203 Epoch: 10 Global Step: 456120 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:58:21,210-Speed 2618.98 samples/sec Loss 6.0541 LearningRate 0.0203 Epoch: 10 Global Step: 456130 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:58:25,110-Speed 2626.26 samples/sec Loss 5.9461 LearningRate 0.0203 Epoch: 10 Global Step: 456140 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:58:29,014-Speed 2623.89 samples/sec Loss 6.1117 LearningRate 0.0203 Epoch: 10 Global Step: 456150 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:58:32,889-Speed 2643.16 samples/sec Loss 6.1666 LearningRate 0.0203 Epoch: 10 Global Step: 456160 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:58:36,789-Speed 2625.81 samples/sec Loss 6.0816 LearningRate 0.0203 Epoch: 10 Global Step: 456170 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:58:40,689-Speed 2626.38 samples/sec Loss 6.0379 LearningRate 0.0203 Epoch: 10 Global Step: 456180 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:58:44,591-Speed 2624.79 samples/sec Loss 6.0290 LearningRate 0.0203 Epoch: 10 Global Step: 456190 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:58:48,482-Speed 2632.47 samples/sec Loss 6.0489 LearningRate 0.0203 Epoch: 10 Global Step: 456200 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:58:52,375-Speed 2631.22 samples/sec Loss 6.0261 LearningRate 0.0203 Epoch: 10 Global Step: 456210 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:58:56,271-Speed 2628.63 samples/sec Loss 5.9371 LearningRate 0.0203 Epoch: 10 Global Step: 456220 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:59:00,163-Speed 2632.19 samples/sec Loss 6.0669 LearningRate 0.0203 Epoch: 10 Global Step: 456230 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:59:04,204-Speed 2533.98 samples/sec Loss 6.0271 LearningRate 0.0203 Epoch: 10 Global Step: 456240 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:59:08,123-Speed 2613.59 samples/sec Loss 6.0618 LearningRate 0.0203 Epoch: 10 Global Step: 456250 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:59:12,061-Speed 2601.22 samples/sec Loss 6.0928 LearningRate 0.0203 Epoch: 10 Global Step: 456260 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:59:15,954-Speed 2630.82 samples/sec Loss 6.0236 LearningRate 0.0202 Epoch: 10 Global Step: 456270 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 22:59:39,420-Speed 436.38 samples/sec Loss 5.9283 LearningRate 0.0202 Epoch: 11 Global Step: 456280 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:59:43,303-Speed 2638.74 samples/sec Loss 6.0626 LearningRate 0.0202 Epoch: 11 Global Step: 456290 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:59:47,186-Speed 2637.72 samples/sec Loss 6.0348 LearningRate 0.0202 Epoch: 11 Global Step: 456300 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:59:51,069-Speed 2637.67 samples/sec Loss 5.9788 LearningRate 0.0202 Epoch: 11 Global Step: 456310 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:59:55,067-Speed 2561.70 samples/sec Loss 6.0722 LearningRate 0.0202 Epoch: 11 Global Step: 456320 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 22:59:58,932-Speed 2650.95 samples/sec Loss 6.1032 LearningRate 0.0202 Epoch: 11 Global Step: 456330 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:00:02,827-Speed 2629.30 samples/sec Loss 5.9930 LearningRate 0.0202 Epoch: 11 Global Step: 456340 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:00:06,720-Speed 2631.72 samples/sec Loss 5.8894 LearningRate 0.0202 Epoch: 11 Global Step: 456350 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:00:10,613-Speed 2630.82 samples/sec Loss 6.0047 LearningRate 0.0202 Epoch: 11 Global Step: 456360 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:00:14,519-Speed 2621.82 samples/sec Loss 5.9909 LearningRate 0.0202 Epoch: 11 Global Step: 456370 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:00:18,421-Speed 2624.77 samples/sec Loss 5.9796 LearningRate 0.0202 Epoch: 11 Global Step: 456380 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:00:22,321-Speed 2627.12 samples/sec Loss 6.0224 LearningRate 0.0202 Epoch: 11 Global Step: 456390 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:00:26,223-Speed 2624.88 samples/sec Loss 6.0880 LearningRate 0.0202 Epoch: 11 Global Step: 456400 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:00:30,125-Speed 2625.05 samples/sec Loss 6.0596 LearningRate 0.0202 Epoch: 11 Global Step: 456410 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:00:34,034-Speed 2619.93 samples/sec Loss 6.0133 LearningRate 0.0202 Epoch: 11 Global Step: 456420 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:00:37,929-Speed 2630.04 samples/sec Loss 6.0222 LearningRate 0.0202 Epoch: 11 Global Step: 456430 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:00:41,844-Speed 2616.06 samples/sec Loss 5.9754 LearningRate 0.0202 Epoch: 11 Global Step: 456440 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:00:45,749-Speed 2622.98 samples/sec Loss 6.0089 LearningRate 0.0202 Epoch: 11 Global Step: 456450 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:00:49,673-Speed 2609.69 samples/sec Loss 6.0357 LearningRate 0.0202 Epoch: 11 Global Step: 456460 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:00:53,569-Speed 2629.85 samples/sec Loss 5.8604 LearningRate 0.0202 Epoch: 11 Global Step: 456470 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:00:57,465-Speed 2628.73 samples/sec Loss 6.0043 LearningRate 0.0202 Epoch: 11 Global Step: 456480 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:01:01,359-Speed 2630.18 samples/sec Loss 6.0675 LearningRate 0.0202 Epoch: 11 Global Step: 456490 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:01:05,256-Speed 2627.97 samples/sec Loss 5.9824 LearningRate 0.0202 Epoch: 11 Global Step: 456500 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:01:09,150-Speed 2630.12 samples/sec Loss 5.9993 LearningRate 0.0202 Epoch: 11 Global Step: 456510 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:01:13,050-Speed 2627.16 samples/sec Loss 5.9853 LearningRate 0.0202 Epoch: 11 Global Step: 456520 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:01:16,942-Speed 2631.65 samples/sec Loss 6.0576 LearningRate 0.0202 Epoch: 11 Global Step: 456530 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:01:20,841-Speed 2626.61 samples/sec Loss 5.9705 LearningRate 0.0202 Epoch: 11 Global Step: 456540 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:01:24,742-Speed 2626.13 samples/sec Loss 5.9531 LearningRate 0.0202 Epoch: 11 Global Step: 456550 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:01:28,637-Speed 2629.57 samples/sec Loss 5.9995 LearningRate 0.0202 Epoch: 11 Global Step: 456560 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:01:32,540-Speed 2624.11 samples/sec Loss 5.9758 LearningRate 0.0202 Epoch: 11 Global Step: 456570 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:01:36,435-Speed 2629.84 samples/sec Loss 5.9857 LearningRate 0.0202 Epoch: 11 Global Step: 456580 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:01:40,319-Speed 2637.17 samples/sec Loss 6.0545 LearningRate 0.0202 Epoch: 11 Global Step: 456590 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:01:44,214-Speed 2629.91 samples/sec Loss 5.9946 LearningRate 0.0202 Epoch: 11 Global Step: 456600 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:01:48,111-Speed 2628.34 samples/sec Loss 5.9440 LearningRate 0.0202 Epoch: 11 Global Step: 456610 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:01:52,019-Speed 2620.80 samples/sec Loss 5.9927 LearningRate 0.0202 Epoch: 11 Global Step: 456620 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:01:55,928-Speed 2620.74 samples/sec Loss 6.1051 LearningRate 0.0202 Epoch: 11 Global Step: 456630 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:01:59,823-Speed 2629.56 samples/sec Loss 5.9650 LearningRate 0.0202 Epoch: 11 Global Step: 456640 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:02:03,721-Speed 2627.33 samples/sec Loss 6.0124 LearningRate 0.0202 Epoch: 11 Global Step: 456650 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:02:07,621-Speed 2625.93 samples/sec Loss 6.0092 LearningRate 0.0202 Epoch: 11 Global Step: 456660 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:02:11,525-Speed 2623.97 samples/sec Loss 5.9810 LearningRate 0.0202 Epoch: 11 Global Step: 456670 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:02:15,426-Speed 2625.54 samples/sec Loss 5.9340 LearningRate 0.0202 Epoch: 11 Global Step: 456680 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:02:19,323-Speed 2628.85 samples/sec Loss 6.0976 LearningRate 0.0202 Epoch: 11 Global Step: 456690 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:02:23,218-Speed 2629.63 samples/sec Loss 5.9665 LearningRate 0.0202 Epoch: 11 Global Step: 456700 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:02:27,117-Speed 2627.09 samples/sec Loss 6.0142 LearningRate 0.0202 Epoch: 11 Global Step: 456710 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:02:31,011-Speed 2630.09 samples/sec Loss 5.9778 LearningRate 0.0202 Epoch: 11 Global Step: 456720 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:02:34,919-Speed 2621.06 samples/sec Loss 6.0008 LearningRate 0.0202 Epoch: 11 Global Step: 456730 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:02:38,813-Speed 2630.12 samples/sec Loss 5.9360 LearningRate 0.0202 Epoch: 11 Global Step: 456740 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:02:42,714-Speed 2625.24 samples/sec Loss 5.9918 LearningRate 0.0202 Epoch: 11 Global Step: 456750 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:02:46,612-Speed 2627.98 samples/sec Loss 5.8914 LearningRate 0.0202 Epoch: 11 Global Step: 456760 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:02:50,521-Speed 2620.60 samples/sec Loss 5.9768 LearningRate 0.0202 Epoch: 11 Global Step: 456770 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:02:54,413-Speed 2631.78 samples/sec Loss 6.0188 LearningRate 0.0202 Epoch: 11 Global Step: 456780 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:02:58,298-Speed 2635.95 samples/sec Loss 6.0288 LearningRate 0.0202 Epoch: 11 Global Step: 456790 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:03:02,198-Speed 2626.64 samples/sec Loss 5.9288 LearningRate 0.0202 Epoch: 11 Global Step: 456800 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:03:06,095-Speed 2628.24 samples/sec Loss 6.0841 LearningRate 0.0202 Epoch: 11 Global Step: 456810 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:03:09,997-Speed 2624.41 samples/sec Loss 6.0066 LearningRate 0.0202 Epoch: 11 Global Step: 456820 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:03:13,897-Speed 2626.59 samples/sec Loss 6.0088 LearningRate 0.0202 Epoch: 11 Global Step: 456830 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:03:17,792-Speed 2629.47 samples/sec Loss 5.9338 LearningRate 0.0202 Epoch: 11 Global Step: 456840 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:03:21,685-Speed 2630.74 samples/sec Loss 5.9162 LearningRate 0.0202 Epoch: 11 Global Step: 456850 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:03:25,579-Speed 2631.41 samples/sec Loss 6.0506 LearningRate 0.0202 Epoch: 11 Global Step: 456860 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:03:29,475-Speed 2628.53 samples/sec Loss 5.9802 LearningRate 0.0202 Epoch: 11 Global Step: 456870 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:03:33,466-Speed 2566.05 samples/sec Loss 5.9588 LearningRate 0.0202 Epoch: 11 Global Step: 456880 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:03:37,336-Speed 2646.76 samples/sec Loss 5.9173 LearningRate 0.0202 Epoch: 11 Global Step: 456890 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:03:41,233-Speed 2628.34 samples/sec Loss 6.0510 LearningRate 0.0202 Epoch: 11 Global Step: 456900 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:03:45,137-Speed 2623.48 samples/sec Loss 6.1268 LearningRate 0.0202 Epoch: 11 Global Step: 456910 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:03:49,036-Speed 2627.13 samples/sec Loss 6.0058 LearningRate 0.0202 Epoch: 11 Global Step: 456920 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:03:52,927-Speed 2632.26 samples/sec Loss 5.9993 LearningRate 0.0202 Epoch: 11 Global Step: 456930 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:03:56,823-Speed 2629.88 samples/sec Loss 5.9431 LearningRate 0.0202 Epoch: 11 Global Step: 456940 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:04:00,729-Speed 2621.79 samples/sec Loss 5.9985 LearningRate 0.0202 Epoch: 11 Global Step: 456950 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:04:04,628-Speed 2626.83 samples/sec Loss 6.0476 LearningRate 0.0202 Epoch: 11 Global Step: 456960 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:04:08,534-Speed 2621.98 samples/sec Loss 6.0034 LearningRate 0.0202 Epoch: 11 Global Step: 456970 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:04:12,445-Speed 2619.49 samples/sec Loss 5.9810 LearningRate 0.0202 Epoch: 11 Global Step: 456980 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:04:16,351-Speed 2622.50 samples/sec Loss 6.0014 LearningRate 0.0202 Epoch: 11 Global Step: 456990 Fp16 Grad Scale: 262144 Required: 42 hours
Training: 2022-04-14 23:04:20,256-Speed 2622.78 samples/sec Loss 6.0898 LearningRate 0.0202 Epoch: 11 Global Step: 457000 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:04:24,183-Speed 2608.40 samples/sec Loss 5.9468 LearningRate 0.0202 Epoch: 11 Global Step: 457010 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:04:28,084-Speed 2626.05 samples/sec Loss 5.8669 LearningRate 0.0202 Epoch: 11 Global Step: 457020 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:04:31,977-Speed 2630.43 samples/sec Loss 5.9709 LearningRate 0.0202 Epoch: 11 Global Step: 457030 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:04:35,875-Speed 2627.52 samples/sec Loss 6.0662 LearningRate 0.0202 Epoch: 11 Global Step: 457040 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:04:39,767-Speed 2631.74 samples/sec Loss 5.8902 LearningRate 0.0202 Epoch: 11 Global Step: 457050 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:04:43,668-Speed 2625.94 samples/sec Loss 6.0553 LearningRate 0.0202 Epoch: 11 Global Step: 457060 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:04:47,570-Speed 2625.36 samples/sec Loss 6.0438 LearningRate 0.0202 Epoch: 11 Global Step: 457070 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:04:51,472-Speed 2624.34 samples/sec Loss 6.0743 LearningRate 0.0202 Epoch: 11 Global Step: 457080 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:04:55,380-Speed 2621.50 samples/sec Loss 6.0205 LearningRate 0.0202 Epoch: 11 Global Step: 457090 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:04:59,276-Speed 2629.01 samples/sec Loss 6.0850 LearningRate 0.0202 Epoch: 11 Global Step: 457100 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:05:03,172-Speed 2628.26 samples/sec Loss 5.9405 LearningRate 0.0202 Epoch: 11 Global Step: 457110 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:05:07,069-Speed 2628.16 samples/sec Loss 5.9650 LearningRate 0.0202 Epoch: 11 Global Step: 457120 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:05:10,979-Speed 2619.90 samples/sec Loss 5.9675 LearningRate 0.0202 Epoch: 11 Global Step: 457130 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:05:14,914-Speed 2603.55 samples/sec Loss 5.9943 LearningRate 0.0202 Epoch: 11 Global Step: 457140 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:05:18,809-Speed 2629.51 samples/sec Loss 5.9624 LearningRate 0.0202 Epoch: 11 Global Step: 457150 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:05:22,733-Speed 2610.32 samples/sec Loss 5.9671 LearningRate 0.0202 Epoch: 11 Global Step: 457160 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:05:26,630-Speed 2628.31 samples/sec Loss 5.9233 LearningRate 0.0202 Epoch: 11 Global Step: 457170 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:05:30,507-Speed 2642.34 samples/sec Loss 5.8244 LearningRate 0.0202 Epoch: 11 Global Step: 457180 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:05:34,407-Speed 2626.05 samples/sec Loss 5.9624 LearningRate 0.0202 Epoch: 11 Global Step: 457190 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:05:38,307-Speed 2625.64 samples/sec Loss 5.9397 LearningRate 0.0201 Epoch: 11 Global Step: 457200 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:05:42,203-Speed 2629.42 samples/sec Loss 6.0456 LearningRate 0.0201 Epoch: 11 Global Step: 457210 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:05:46,098-Speed 2629.77 samples/sec Loss 6.0761 LearningRate 0.0201 Epoch: 11 Global Step: 457220 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:05:49,993-Speed 2629.69 samples/sec Loss 6.0189 LearningRate 0.0201 Epoch: 11 Global Step: 457230 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:05:53,898-Speed 2622.71 samples/sec Loss 6.0257 LearningRate 0.0201 Epoch: 11 Global Step: 457240 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:05:57,802-Speed 2624.48 samples/sec Loss 6.1108 LearningRate 0.0201 Epoch: 11 Global Step: 457250 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:06:01,696-Speed 2629.74 samples/sec Loss 6.1562 LearningRate 0.0201 Epoch: 11 Global Step: 457260 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:06:05,594-Speed 2627.68 samples/sec Loss 6.0543 LearningRate 0.0201 Epoch: 11 Global Step: 457270 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:06:09,504-Speed 2619.23 samples/sec Loss 5.9805 LearningRate 0.0201 Epoch: 11 Global Step: 457280 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:06:13,414-Speed 2620.47 samples/sec Loss 5.9685 LearningRate 0.0201 Epoch: 11 Global Step: 457290 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:06:17,318-Speed 2623.16 samples/sec Loss 6.1003 LearningRate 0.0201 Epoch: 11 Global Step: 457300 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:06:21,225-Speed 2621.32 samples/sec Loss 6.0095 LearningRate 0.0201 Epoch: 11 Global Step: 457310 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:06:25,122-Speed 2628.87 samples/sec Loss 5.9873 LearningRate 0.0201 Epoch: 11 Global Step: 457320 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:06:29,014-Speed 2632.24 samples/sec Loss 6.0854 LearningRate 0.0201 Epoch: 11 Global Step: 457330 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:06:32,923-Speed 2620.99 samples/sec Loss 5.9989 LearningRate 0.0201 Epoch: 11 Global Step: 457340 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:06:36,817-Speed 2630.51 samples/sec Loss 6.0526 LearningRate 0.0201 Epoch: 11 Global Step: 457350 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:06:40,785-Speed 2580.58 samples/sec Loss 5.9603 LearningRate 0.0201 Epoch: 11 Global Step: 457360 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:06:44,675-Speed 2633.53 samples/sec Loss 6.0738 LearningRate 0.0201 Epoch: 11 Global Step: 457370 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:06:48,569-Speed 2630.37 samples/sec Loss 5.9953 LearningRate 0.0201 Epoch: 11 Global Step: 457380 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:06:52,465-Speed 2629.30 samples/sec Loss 6.1515 LearningRate 0.0201 Epoch: 11 Global Step: 457390 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:06:56,392-Speed 2607.87 samples/sec Loss 5.8269 LearningRate 0.0201 Epoch: 11 Global Step: 457400 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:07:00,284-Speed 2631.62 samples/sec Loss 5.9538 LearningRate 0.0201 Epoch: 11 Global Step: 457410 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:07:04,179-Speed 2629.75 samples/sec Loss 5.9638 LearningRate 0.0201 Epoch: 11 Global Step: 457420 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:07:08,075-Speed 2629.48 samples/sec Loss 5.9272 LearningRate 0.0201 Epoch: 11 Global Step: 457430 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:07:11,967-Speed 2631.48 samples/sec Loss 6.0113 LearningRate 0.0201 Epoch: 11 Global Step: 457440 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:07:15,857-Speed 2632.50 samples/sec Loss 5.9095 LearningRate 0.0201 Epoch: 11 Global Step: 457450 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:07:19,750-Speed 2631.20 samples/sec Loss 6.0320 LearningRate 0.0201 Epoch: 11 Global Step: 457460 Fp16 Grad Scale: 32768 Required: 42 hours
Training: 2022-04-14 23:07:23,645-Speed 2629.55 samples/sec Loss 6.0681 LearningRate 0.0201 Epoch: 11 Global Step: 457470 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:07:27,537-Speed 2631.97 samples/sec Loss 5.9410 LearningRate 0.0201 Epoch: 11 Global Step: 457480 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:07:31,444-Speed 2621.73 samples/sec Loss 5.9431 LearningRate 0.0201 Epoch: 11 Global Step: 457490 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:07:35,338-Speed 2629.65 samples/sec Loss 6.1345 LearningRate 0.0201 Epoch: 11 Global Step: 457500 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:07:39,235-Speed 2628.28 samples/sec Loss 5.9460 LearningRate 0.0201 Epoch: 11 Global Step: 457510 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:07:43,146-Speed 2619.20 samples/sec Loss 5.9271 LearningRate 0.0201 Epoch: 11 Global Step: 457520 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:07:47,042-Speed 2628.93 samples/sec Loss 6.0123 LearningRate 0.0201 Epoch: 11 Global Step: 457530 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:07:50,936-Speed 2630.63 samples/sec Loss 6.0310 LearningRate 0.0201 Epoch: 11 Global Step: 457540 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:07:54,845-Speed 2620.28 samples/sec Loss 6.1241 LearningRate 0.0201 Epoch: 11 Global Step: 457550 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:07:58,782-Speed 2601.66 samples/sec Loss 6.1075 LearningRate 0.0201 Epoch: 11 Global Step: 457560 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:08:02,692-Speed 2619.65 samples/sec Loss 6.0830 LearningRate 0.0201 Epoch: 11 Global Step: 457570 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:08:06,588-Speed 2628.45 samples/sec Loss 5.9119 LearningRate 0.0201 Epoch: 11 Global Step: 457580 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:08:10,486-Speed 2627.43 samples/sec Loss 5.9765 LearningRate 0.0201 Epoch: 11 Global Step: 457590 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:08:14,378-Speed 2632.89 samples/sec Loss 6.0495 LearningRate 0.0201 Epoch: 11 Global Step: 457600 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:08:18,273-Speed 2630.52 samples/sec Loss 5.9255 LearningRate 0.0201 Epoch: 11 Global Step: 457610 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:08:22,166-Speed 2630.76 samples/sec Loss 6.1220 LearningRate 0.0201 Epoch: 11 Global Step: 457620 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:08:26,036-Speed 2646.72 samples/sec Loss 5.9868 LearningRate 0.0201 Epoch: 11 Global Step: 457630 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:08:29,932-Speed 2628.71 samples/sec Loss 6.0705 LearningRate 0.0201 Epoch: 11 Global Step: 457640 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:08:33,836-Speed 2623.78 samples/sec Loss 5.8704 LearningRate 0.0201 Epoch: 11 Global Step: 457650 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:08:37,731-Speed 2629.57 samples/sec Loss 5.8791 LearningRate 0.0201 Epoch: 11 Global Step: 457660 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:08:41,634-Speed 2623.97 samples/sec Loss 5.9630 LearningRate 0.0201 Epoch: 11 Global Step: 457670 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:08:45,530-Speed 2628.71 samples/sec Loss 5.8483 LearningRate 0.0201 Epoch: 11 Global Step: 457680 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:08:49,435-Speed 2623.36 samples/sec Loss 5.8383 LearningRate 0.0201 Epoch: 11 Global Step: 457690 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:08:53,346-Speed 2618.70 samples/sec Loss 6.1267 LearningRate 0.0201 Epoch: 11 Global Step: 457700 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:08:57,268-Speed 2611.87 samples/sec Loss 6.0072 LearningRate 0.0201 Epoch: 11 Global Step: 457710 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:09:01,165-Speed 2628.05 samples/sec Loss 6.0337 LearningRate 0.0201 Epoch: 11 Global Step: 457720 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:09:05,061-Speed 2629.28 samples/sec Loss 5.9353 LearningRate 0.0201 Epoch: 11 Global Step: 457730 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:09:08,980-Speed 2613.45 samples/sec Loss 5.9856 LearningRate 0.0201 Epoch: 11 Global Step: 457740 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:09:12,900-Speed 2612.51 samples/sec Loss 6.0036 LearningRate 0.0201 Epoch: 11 Global Step: 457750 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:09:16,793-Speed 2631.00 samples/sec Loss 5.9396 LearningRate 0.0201 Epoch: 11 Global Step: 457760 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:09:20,691-Speed 2628.16 samples/sec Loss 6.0771 LearningRate 0.0201 Epoch: 11 Global Step: 457770 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:09:24,583-Speed 2631.79 samples/sec Loss 6.1015 LearningRate 0.0201 Epoch: 11 Global Step: 457780 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:09:28,487-Speed 2623.57 samples/sec Loss 6.0265 LearningRate 0.0201 Epoch: 11 Global Step: 457790 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:09:32,382-Speed 2630.10 samples/sec Loss 6.0333 LearningRate 0.0201 Epoch: 11 Global Step: 457800 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:09:36,274-Speed 2631.02 samples/sec Loss 5.9711 LearningRate 0.0201 Epoch: 11 Global Step: 457810 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:09:40,172-Speed 2627.47 samples/sec Loss 5.9157 LearningRate 0.0201 Epoch: 11 Global Step: 457820 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:09:44,052-Speed 2640.03 samples/sec Loss 5.9305 LearningRate 0.0201 Epoch: 11 Global Step: 457830 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:09:47,929-Speed 2642.31 samples/sec Loss 6.0957 LearningRate 0.0201 Epoch: 11 Global Step: 457840 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:09:51,823-Speed 2630.17 samples/sec Loss 5.9576 LearningRate 0.0201 Epoch: 11 Global Step: 457850 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:09:55,748-Speed 2610.01 samples/sec Loss 5.9418 LearningRate 0.0201 Epoch: 11 Global Step: 457860 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:09:59,656-Speed 2620.42 samples/sec Loss 5.9501 LearningRate 0.0201 Epoch: 11 Global Step: 457870 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:10:03,550-Speed 2630.15 samples/sec Loss 6.0194 LearningRate 0.0201 Epoch: 11 Global Step: 457880 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:10:07,451-Speed 2625.34 samples/sec Loss 6.0431 LearningRate 0.0201 Epoch: 11 Global Step: 457890 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:10:11,380-Speed 2607.64 samples/sec Loss 5.9215 LearningRate 0.0201 Epoch: 11 Global Step: 457900 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:10:15,288-Speed 2620.61 samples/sec Loss 5.9789 LearningRate 0.0201 Epoch: 11 Global Step: 457910 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:10:19,191-Speed 2624.07 samples/sec Loss 5.9759 LearningRate 0.0201 Epoch: 11 Global Step: 457920 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:10:23,113-Speed 2611.48 samples/sec Loss 6.0079 LearningRate 0.0201 Epoch: 11 Global Step: 457930 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:10:27,011-Speed 2628.33 samples/sec Loss 6.0018 LearningRate 0.0201 Epoch: 11 Global Step: 457940 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:10:30,884-Speed 2644.19 samples/sec Loss 5.9616 LearningRate 0.0201 Epoch: 11 Global Step: 457950 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:10:34,786-Speed 2625.08 samples/sec Loss 5.9188 LearningRate 0.0201 Epoch: 11 Global Step: 457960 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:10:38,680-Speed 2629.64 samples/sec Loss 6.0769 LearningRate 0.0201 Epoch: 11 Global Step: 457970 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:10:42,629-Speed 2594.17 samples/sec Loss 5.9974 LearningRate 0.0201 Epoch: 11 Global Step: 457980 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:10:46,528-Speed 2627.46 samples/sec Loss 5.9384 LearningRate 0.0201 Epoch: 11 Global Step: 457990 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:10:50,529-Speed 2560.01 samples/sec Loss 5.7499 LearningRate 0.0201 Epoch: 11 Global Step: 458000 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:10:54,426-Speed 2628.50 samples/sec Loss 5.9517 LearningRate 0.0201 Epoch: 11 Global Step: 458010 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:10:58,321-Speed 2629.63 samples/sec Loss 5.9299 LearningRate 0.0201 Epoch: 11 Global Step: 458020 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:11:02,229-Speed 2621.09 samples/sec Loss 6.0460 LearningRate 0.0201 Epoch: 11 Global Step: 458030 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:11:06,124-Speed 2629.42 samples/sec Loss 5.9437 LearningRate 0.0201 Epoch: 11 Global Step: 458040 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:11:10,024-Speed 2625.65 samples/sec Loss 5.9049 LearningRate 0.0201 Epoch: 11 Global Step: 458050 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:11:13,943-Speed 2614.51 samples/sec Loss 6.0422 LearningRate 0.0201 Epoch: 11 Global Step: 458060 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:11:17,836-Speed 2630.79 samples/sec Loss 6.0531 LearningRate 0.0201 Epoch: 11 Global Step: 458070 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:11:21,730-Speed 2630.42 samples/sec Loss 6.0139 LearningRate 0.0201 Epoch: 11 Global Step: 458080 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:11:25,625-Speed 2629.70 samples/sec Loss 5.9708 LearningRate 0.0201 Epoch: 11 Global Step: 458090 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:11:29,533-Speed 2620.71 samples/sec Loss 5.9516 LearningRate 0.0201 Epoch: 11 Global Step: 458100 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:11:33,437-Speed 2623.69 samples/sec Loss 5.9562 LearningRate 0.0201 Epoch: 11 Global Step: 458110 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:11:37,354-Speed 2614.75 samples/sec Loss 6.0325 LearningRate 0.0200 Epoch: 11 Global Step: 458120 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:11:41,256-Speed 2624.40 samples/sec Loss 6.0931 LearningRate 0.0200 Epoch: 11 Global Step: 458130 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:11:45,155-Speed 2627.01 samples/sec Loss 6.0456 LearningRate 0.0200 Epoch: 11 Global Step: 458140 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:11:49,022-Speed 2648.89 samples/sec Loss 6.0353 LearningRate 0.0200 Epoch: 11 Global Step: 458150 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:11:52,922-Speed 2626.39 samples/sec Loss 6.0223 LearningRate 0.0200 Epoch: 11 Global Step: 458160 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:11:56,919-Speed 2562.79 samples/sec Loss 6.0303 LearningRate 0.0200 Epoch: 11 Global Step: 458170 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:12:00,837-Speed 2613.82 samples/sec Loss 5.9033 LearningRate 0.0200 Epoch: 11 Global Step: 458180 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:12:04,743-Speed 2622.13 samples/sec Loss 6.0091 LearningRate 0.0200 Epoch: 11 Global Step: 458190 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:12:08,649-Speed 2621.91 samples/sec Loss 6.0060 LearningRate 0.0200 Epoch: 11 Global Step: 458200 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:12:12,562-Speed 2618.01 samples/sec Loss 5.9398 LearningRate 0.0200 Epoch: 11 Global Step: 458210 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:12:16,460-Speed 2627.29 samples/sec Loss 5.9492 LearningRate 0.0200 Epoch: 11 Global Step: 458220 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:12:20,374-Speed 2616.66 samples/sec Loss 6.0306 LearningRate 0.0200 Epoch: 11 Global Step: 458230 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:12:24,278-Speed 2624.34 samples/sec Loss 6.0933 LearningRate 0.0200 Epoch: 11 Global Step: 458240 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:12:28,174-Speed 2629.28 samples/sec Loss 5.9849 LearningRate 0.0200 Epoch: 11 Global Step: 458250 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:12:32,099-Speed 2609.08 samples/sec Loss 6.0091 LearningRate 0.0200 Epoch: 11 Global Step: 458260 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:12:36,005-Speed 2622.14 samples/sec Loss 6.0346 LearningRate 0.0200 Epoch: 11 Global Step: 458270 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:12:39,904-Speed 2627.45 samples/sec Loss 6.0577 LearningRate 0.0200 Epoch: 11 Global Step: 458280 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:12:43,801-Speed 2628.16 samples/sec Loss 5.9912 LearningRate 0.0200 Epoch: 11 Global Step: 458290 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:12:47,701-Speed 2625.98 samples/sec Loss 5.8894 LearningRate 0.0200 Epoch: 11 Global Step: 458300 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:12:51,605-Speed 2624.20 samples/sec Loss 5.9655 LearningRate 0.0200 Epoch: 11 Global Step: 458310 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:12:55,500-Speed 2630.04 samples/sec Loss 6.0191 LearningRate 0.0200 Epoch: 11 Global Step: 458320 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:12:59,431-Speed 2605.02 samples/sec Loss 5.9913 LearningRate 0.0200 Epoch: 11 Global Step: 458330 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:13:03,329-Speed 2627.81 samples/sec Loss 5.9974 LearningRate 0.0200 Epoch: 11 Global Step: 458340 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:13:07,228-Speed 2627.27 samples/sec Loss 6.0256 LearningRate 0.0200 Epoch: 11 Global Step: 458350 Fp16 Grad Scale: 262144 Required: 42 hours
Training: 2022-04-14 23:13:11,108-Speed 2639.39 samples/sec Loss 6.0047 LearningRate 0.0200 Epoch: 11 Global Step: 458360 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:13:15,004-Speed 2629.44 samples/sec Loss 5.9852 LearningRate 0.0200 Epoch: 11 Global Step: 458370 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:13:18,905-Speed 2626.24 samples/sec Loss 5.9733 LearningRate 0.0200 Epoch: 11 Global Step: 458380 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:13:22,904-Speed 2561.10 samples/sec Loss 5.9613 LearningRate 0.0200 Epoch: 11 Global Step: 458390 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:13:26,796-Speed 2631.84 samples/sec Loss 6.0592 LearningRate 0.0200 Epoch: 11 Global Step: 458400 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:13:30,689-Speed 2631.19 samples/sec Loss 5.9639 LearningRate 0.0200 Epoch: 11 Global Step: 458410 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:13:35,619-Speed 2077.48 samples/sec Loss 5.9899 LearningRate 0.0200 Epoch: 11 Global Step: 458420 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:13:39,516-Speed 2628.50 samples/sec Loss 5.8574 LearningRate 0.0200 Epoch: 11 Global Step: 458430 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:13:43,429-Speed 2617.00 samples/sec Loss 5.9974 LearningRate 0.0200 Epoch: 11 Global Step: 458440 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:13:47,336-Speed 2621.72 samples/sec Loss 5.9906 LearningRate 0.0200 Epoch: 11 Global Step: 458450 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:13:51,230-Speed 2630.38 samples/sec Loss 5.9061 LearningRate 0.0200 Epoch: 11 Global Step: 458460 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:13:55,310-Speed 2510.79 samples/sec Loss 5.9854 LearningRate 0.0200 Epoch: 11 Global Step: 458470 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:13:59,208-Speed 2627.34 samples/sec Loss 5.9493 LearningRate 0.0200 Epoch: 11 Global Step: 458480 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:14:03,170-Speed 2584.93 samples/sec Loss 6.0044 LearningRate 0.0200 Epoch: 11 Global Step: 458490 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:14:07,069-Speed 2626.76 samples/sec Loss 5.8235 LearningRate 0.0200 Epoch: 11 Global Step: 458500 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:14:10,971-Speed 2625.15 samples/sec Loss 5.9084 LearningRate 0.0200 Epoch: 11 Global Step: 458510 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:14:14,872-Speed 2625.37 samples/sec Loss 5.9621 LearningRate 0.0200 Epoch: 11 Global Step: 458520 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:14:18,778-Speed 2622.52 samples/sec Loss 5.9395 LearningRate 0.0200 Epoch: 11 Global Step: 458530 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:14:22,679-Speed 2625.68 samples/sec Loss 5.9949 LearningRate 0.0200 Epoch: 11 Global Step: 458540 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:14:26,552-Speed 2644.77 samples/sec Loss 5.9680 LearningRate 0.0200 Epoch: 11 Global Step: 458550 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:14:30,448-Speed 2629.16 samples/sec Loss 5.8514 LearningRate 0.0200 Epoch: 11 Global Step: 458560 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:14:34,343-Speed 2629.18 samples/sec Loss 6.0411 LearningRate 0.0200 Epoch: 11 Global Step: 458570 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:14:38,242-Speed 2627.06 samples/sec Loss 6.0956 LearningRate 0.0200 Epoch: 11 Global Step: 458580 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:14:42,138-Speed 2628.66 samples/sec Loss 6.0976 LearningRate 0.0200 Epoch: 11 Global Step: 458590 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:14:46,047-Speed 2620.29 samples/sec Loss 6.0671 LearningRate 0.0200 Epoch: 11 Global Step: 458600 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:14:49,946-Speed 2626.99 samples/sec Loss 6.0010 LearningRate 0.0200 Epoch: 11 Global Step: 458610 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:14:53,855-Speed 2620.27 samples/sec Loss 5.9927 LearningRate 0.0200 Epoch: 11 Global Step: 458620 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:14:57,749-Speed 2630.68 samples/sec Loss 6.0124 LearningRate 0.0200 Epoch: 11 Global Step: 458630 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:15:01,644-Speed 2629.78 samples/sec Loss 6.0330 LearningRate 0.0200 Epoch: 11 Global Step: 458640 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:15:05,544-Speed 2626.38 samples/sec Loss 5.9872 LearningRate 0.0200 Epoch: 11 Global Step: 458650 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:15:09,436-Speed 2631.20 samples/sec Loss 5.8920 LearningRate 0.0200 Epoch: 11 Global Step: 458660 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:15:13,327-Speed 2632.46 samples/sec Loss 5.9161 LearningRate 0.0200 Epoch: 11 Global Step: 458670 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:15:17,236-Speed 2620.28 samples/sec Loss 6.0450 LearningRate 0.0200 Epoch: 11 Global Step: 458680 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:15:21,144-Speed 2621.35 samples/sec Loss 5.8806 LearningRate 0.0200 Epoch: 11 Global Step: 458690 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:15:25,039-Speed 2629.30 samples/sec Loss 5.9944 LearningRate 0.0200 Epoch: 11 Global Step: 458700 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:15:28,915-Speed 2642.62 samples/sec Loss 5.9933 LearningRate 0.0200 Epoch: 11 Global Step: 458710 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:15:32,816-Speed 2626.19 samples/sec Loss 5.9691 LearningRate 0.0200 Epoch: 11 Global Step: 458720 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:15:36,711-Speed 2629.69 samples/sec Loss 5.9197 LearningRate 0.0200 Epoch: 11 Global Step: 458730 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:15:40,608-Speed 2628.05 samples/sec Loss 5.9956 LearningRate 0.0200 Epoch: 11 Global Step: 458740 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:15:44,504-Speed 2628.86 samples/sec Loss 5.8907 LearningRate 0.0200 Epoch: 11 Global Step: 458750 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:15:48,402-Speed 2627.84 samples/sec Loss 5.9528 LearningRate 0.0200 Epoch: 11 Global Step: 458760 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:15:52,309-Speed 2621.74 samples/sec Loss 5.9800 LearningRate 0.0200 Epoch: 11 Global Step: 458770 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:15:56,214-Speed 2622.96 samples/sec Loss 6.0004 LearningRate 0.0200 Epoch: 11 Global Step: 458780 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:16:00,113-Speed 2626.74 samples/sec Loss 6.0734 LearningRate 0.0200 Epoch: 11 Global Step: 458790 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:16:04,014-Speed 2625.49 samples/sec Loss 5.9337 LearningRate 0.0200 Epoch: 11 Global Step: 458800 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:16:07,928-Speed 2616.81 samples/sec Loss 5.9191 LearningRate 0.0200 Epoch: 11 Global Step: 458810 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:16:11,937-Speed 2554.55 samples/sec Loss 6.0155 LearningRate 0.0200 Epoch: 11 Global Step: 458820 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:16:15,949-Speed 2552.98 samples/sec Loss 5.9046 LearningRate 0.0200 Epoch: 11 Global Step: 458830 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:16:19,846-Speed 2628.51 samples/sec Loss 5.9702 LearningRate 0.0200 Epoch: 11 Global Step: 458840 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:16:23,754-Speed 2620.99 samples/sec Loss 5.9516 LearningRate 0.0200 Epoch: 11 Global Step: 458850 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:16:27,659-Speed 2623.20 samples/sec Loss 5.9985 LearningRate 0.0200 Epoch: 11 Global Step: 458860 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:16:31,581-Speed 2611.49 samples/sec Loss 5.9516 LearningRate 0.0200 Epoch: 11 Global Step: 458870 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:16:35,479-Speed 2627.33 samples/sec Loss 6.0005 LearningRate 0.0200 Epoch: 11 Global Step: 458880 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:16:39,375-Speed 2628.87 samples/sec Loss 5.9127 LearningRate 0.0200 Epoch: 11 Global Step: 458890 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:16:43,270-Speed 2629.25 samples/sec Loss 5.9443 LearningRate 0.0200 Epoch: 11 Global Step: 458900 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:16:47,163-Speed 2630.95 samples/sec Loss 6.0653 LearningRate 0.0200 Epoch: 11 Global Step: 458910 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:16:51,059-Speed 2628.75 samples/sec Loss 5.9510 LearningRate 0.0200 Epoch: 11 Global Step: 458920 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:16:54,959-Speed 2626.59 samples/sec Loss 6.0666 LearningRate 0.0200 Epoch: 11 Global Step: 458930 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:16:58,879-Speed 2613.17 samples/sec Loss 5.9292 LearningRate 0.0200 Epoch: 11 Global Step: 458940 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:17:02,780-Speed 2625.33 samples/sec Loss 6.0349 LearningRate 0.0200 Epoch: 11 Global Step: 458950 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:17:06,689-Speed 2620.29 samples/sec Loss 6.0081 LearningRate 0.0200 Epoch: 11 Global Step: 458960 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:17:10,590-Speed 2625.12 samples/sec Loss 6.0495 LearningRate 0.0200 Epoch: 11 Global Step: 458970 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:17:14,498-Speed 2620.75 samples/sec Loss 6.0053 LearningRate 0.0200 Epoch: 11 Global Step: 458980 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:17:18,398-Speed 2626.38 samples/sec Loss 5.9486 LearningRate 0.0200 Epoch: 11 Global Step: 458990 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:17:22,281-Speed 2637.59 samples/sec Loss 5.9232 LearningRate 0.0200 Epoch: 11 Global Step: 459000 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:17:26,192-Speed 2619.14 samples/sec Loss 5.9608 LearningRate 0.0200 Epoch: 11 Global Step: 459010 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:17:30,093-Speed 2625.60 samples/sec Loss 5.9658 LearningRate 0.0200 Epoch: 11 Global Step: 459020 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:17:33,997-Speed 2623.74 samples/sec Loss 5.9623 LearningRate 0.0200 Epoch: 11 Global Step: 459030 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:17:37,901-Speed 2623.83 samples/sec Loss 5.9557 LearningRate 0.0200 Epoch: 11 Global Step: 459040 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:17:41,797-Speed 2628.79 samples/sec Loss 5.9679 LearningRate 0.0199 Epoch: 11 Global Step: 459050 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:17:45,700-Speed 2624.20 samples/sec Loss 5.8973 LearningRate 0.0199 Epoch: 11 Global Step: 459060 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:17:49,593-Speed 2630.57 samples/sec Loss 6.1125 LearningRate 0.0199 Epoch: 11 Global Step: 459070 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:17:53,500-Speed 2621.76 samples/sec Loss 6.0232 LearningRate 0.0199 Epoch: 11 Global Step: 459080 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:17:57,415-Speed 2615.86 samples/sec Loss 5.9246 LearningRate 0.0199 Epoch: 11 Global Step: 459090 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:18:01,313-Speed 2627.74 samples/sec Loss 5.8198 LearningRate 0.0199 Epoch: 11 Global Step: 459100 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:18:05,189-Speed 2642.96 samples/sec Loss 5.8829 LearningRate 0.0199 Epoch: 11 Global Step: 459110 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:18:09,084-Speed 2628.88 samples/sec Loss 6.0467 LearningRate 0.0199 Epoch: 11 Global Step: 459120 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:18:12,994-Speed 2619.62 samples/sec Loss 5.9550 LearningRate 0.0199 Epoch: 11 Global Step: 459130 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:18:16,897-Speed 2624.67 samples/sec Loss 5.9734 LearningRate 0.0199 Epoch: 11 Global Step: 459140 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:18:20,818-Speed 2611.89 samples/sec Loss 5.9673 LearningRate 0.0199 Epoch: 11 Global Step: 459150 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:18:24,714-Speed 2628.91 samples/sec Loss 5.9877 LearningRate 0.0199 Epoch: 11 Global Step: 459160 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:18:28,638-Speed 2610.60 samples/sec Loss 6.0014 LearningRate 0.0199 Epoch: 11 Global Step: 459170 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:18:32,559-Speed 2611.52 samples/sec Loss 6.0190 LearningRate 0.0199 Epoch: 11 Global Step: 459180 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:18:36,467-Speed 2621.06 samples/sec Loss 6.0225 LearningRate 0.0199 Epoch: 11 Global Step: 459190 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:18:40,369-Speed 2624.67 samples/sec Loss 5.9300 LearningRate 0.0199 Epoch: 11 Global Step: 459200 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:18:44,271-Speed 2625.28 samples/sec Loss 5.9391 LearningRate 0.0199 Epoch: 11 Global Step: 459210 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:18:48,168-Speed 2628.04 samples/sec Loss 5.8871 LearningRate 0.0199 Epoch: 11 Global Step: 459220 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:18:52,071-Speed 2624.07 samples/sec Loss 5.9985 LearningRate 0.0199 Epoch: 11 Global Step: 459230 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:18:55,973-Speed 2625.17 samples/sec Loss 5.9965 LearningRate 0.0199 Epoch: 11 Global Step: 459240 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:18:59,899-Speed 2609.30 samples/sec Loss 6.0364 LearningRate 0.0199 Epoch: 11 Global Step: 459250 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:19:03,797-Speed 2626.96 samples/sec Loss 5.9013 LearningRate 0.0199 Epoch: 11 Global Step: 459260 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:19:07,697-Speed 2626.38 samples/sec Loss 6.0158 LearningRate 0.0199 Epoch: 11 Global Step: 459270 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:19:11,601-Speed 2623.55 samples/sec Loss 6.0168 LearningRate 0.0199 Epoch: 11 Global Step: 459280 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:19:15,478-Speed 2642.29 samples/sec Loss 5.9253 LearningRate 0.0199 Epoch: 11 Global Step: 459290 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:19:19,378-Speed 2626.17 samples/sec Loss 5.8230 LearningRate 0.0199 Epoch: 11 Global Step: 459300 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:19:23,277-Speed 2626.39 samples/sec Loss 5.9822 LearningRate 0.0199 Epoch: 11 Global Step: 459310 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:19:27,178-Speed 2625.77 samples/sec Loss 6.0278 LearningRate 0.0199 Epoch: 11 Global Step: 459320 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:19:31,076-Speed 2628.09 samples/sec Loss 5.9590 LearningRate 0.0199 Epoch: 11 Global Step: 459330 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:19:34,974-Speed 2627.68 samples/sec Loss 5.9742 LearningRate 0.0199 Epoch: 11 Global Step: 459340 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:19:38,871-Speed 2627.87 samples/sec Loss 5.9845 LearningRate 0.0199 Epoch: 11 Global Step: 459350 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:19:42,774-Speed 2624.09 samples/sec Loss 6.0294 LearningRate 0.0199 Epoch: 11 Global Step: 459360 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:19:46,699-Speed 2609.84 samples/sec Loss 5.9516 LearningRate 0.0199 Epoch: 11 Global Step: 459370 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:19:50,598-Speed 2626.82 samples/sec Loss 6.0809 LearningRate 0.0199 Epoch: 11 Global Step: 459380 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:19:54,499-Speed 2625.44 samples/sec Loss 5.9631 LearningRate 0.0199 Epoch: 11 Global Step: 459390 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:19:58,403-Speed 2623.70 samples/sec Loss 6.0778 LearningRate 0.0199 Epoch: 11 Global Step: 459400 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:20:02,296-Speed 2630.40 samples/sec Loss 5.9133 LearningRate 0.0199 Epoch: 11 Global Step: 459410 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:20:06,200-Speed 2623.97 samples/sec Loss 6.0015 LearningRate 0.0199 Epoch: 11 Global Step: 459420 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:20:10,097-Speed 2628.06 samples/sec Loss 5.8816 LearningRate 0.0199 Epoch: 11 Global Step: 459430 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:20:13,995-Speed 2628.34 samples/sec Loss 5.8660 LearningRate 0.0199 Epoch: 11 Global Step: 459440 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:20:17,902-Speed 2621.35 samples/sec Loss 5.9463 LearningRate 0.0199 Epoch: 11 Global Step: 459450 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:20:21,797-Speed 2629.30 samples/sec Loss 5.9421 LearningRate 0.0199 Epoch: 11 Global Step: 459460 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:20:25,702-Speed 2622.60 samples/sec Loss 5.9995 LearningRate 0.0199 Epoch: 11 Global Step: 459470 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:20:29,607-Speed 2623.48 samples/sec Loss 5.8971 LearningRate 0.0199 Epoch: 11 Global Step: 459480 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:20:33,504-Speed 2628.04 samples/sec Loss 5.8812 LearningRate 0.0199 Epoch: 11 Global Step: 459490 Fp16 Grad Scale: 262144 Required: 42 hours
Training: 2022-04-14 23:20:37,571-Speed 2518.00 samples/sec Loss 5.9473 LearningRate 0.0199 Epoch: 11 Global Step: 459500 Fp16 Grad Scale: 262144 Required: 42 hours
Training: 2022-04-14 23:20:41,624-Speed 2527.40 samples/sec Loss 6.0013 LearningRate 0.0199 Epoch: 11 Global Step: 459510 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:20:45,617-Speed 2565.72 samples/sec Loss 5.9621 LearningRate 0.0199 Epoch: 11 Global Step: 459520 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:20:49,517-Speed 2626.60 samples/sec Loss 6.1051 LearningRate 0.0199 Epoch: 11 Global Step: 459530 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:20:53,419-Speed 2624.80 samples/sec Loss 5.8524 LearningRate 0.0199 Epoch: 11 Global Step: 459540 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:20:57,348-Speed 2607.25 samples/sec Loss 6.0001 LearningRate 0.0199 Epoch: 11 Global Step: 459550 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:21:01,258-Speed 2619.30 samples/sec Loss 5.9600 LearningRate 0.0199 Epoch: 11 Global Step: 459560 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:21:05,163-Speed 2623.02 samples/sec Loss 5.9415 LearningRate 0.0199 Epoch: 11 Global Step: 459570 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:21:09,063-Speed 2626.10 samples/sec Loss 5.8829 LearningRate 0.0199 Epoch: 11 Global Step: 459580 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:21:12,971-Speed 2620.76 samples/sec Loss 5.8114 LearningRate 0.0199 Epoch: 11 Global Step: 459590 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:21:16,878-Speed 2621.25 samples/sec Loss 6.0039 LearningRate 0.0199 Epoch: 11 Global Step: 459600 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:21:20,779-Speed 2626.15 samples/sec Loss 6.0167 LearningRate 0.0199 Epoch: 11 Global Step: 459610 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:21:24,675-Speed 2628.65 samples/sec Loss 5.9178 LearningRate 0.0199 Epoch: 11 Global Step: 459620 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:21:28,577-Speed 2625.31 samples/sec Loss 5.9917 LearningRate 0.0199 Epoch: 11 Global Step: 459630 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:21:32,474-Speed 2627.85 samples/sec Loss 5.9326 LearningRate 0.0199 Epoch: 11 Global Step: 459640 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:21:36,374-Speed 2626.56 samples/sec Loss 5.8887 LearningRate 0.0199 Epoch: 11 Global Step: 459650 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:21:40,270-Speed 2628.52 samples/sec Loss 6.1694 LearningRate 0.0199 Epoch: 11 Global Step: 459660 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:21:44,168-Speed 2628.01 samples/sec Loss 6.0761 LearningRate 0.0199 Epoch: 11 Global Step: 459670 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:21:48,067-Speed 2626.70 samples/sec Loss 6.0070 LearningRate 0.0199 Epoch: 11 Global Step: 459680 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:21:51,968-Speed 2625.45 samples/sec Loss 6.0083 LearningRate 0.0199 Epoch: 11 Global Step: 459690 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:21:55,862-Speed 2630.10 samples/sec Loss 6.0053 LearningRate 0.0199 Epoch: 11 Global Step: 459700 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:21:59,764-Speed 2625.27 samples/sec Loss 6.0074 LearningRate 0.0199 Epoch: 11 Global Step: 459710 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:22:03,653-Speed 2633.55 samples/sec Loss 5.9541 LearningRate 0.0199 Epoch: 11 Global Step: 459720 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:22:07,558-Speed 2623.20 samples/sec Loss 5.9281 LearningRate 0.0199 Epoch: 11 Global Step: 459730 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:22:11,462-Speed 2622.97 samples/sec Loss 6.0169 LearningRate 0.0199 Epoch: 11 Global Step: 459740 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:22:15,369-Speed 2621.50 samples/sec Loss 5.9742 LearningRate 0.0199 Epoch: 11 Global Step: 459750 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:22:19,271-Speed 2625.41 samples/sec Loss 5.9431 LearningRate 0.0199 Epoch: 11 Global Step: 459760 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:22:23,177-Speed 2621.76 samples/sec Loss 5.9914 LearningRate 0.0199 Epoch: 11 Global Step: 459770 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:22:27,083-Speed 2622.22 samples/sec Loss 5.9809 LearningRate 0.0199 Epoch: 11 Global Step: 459780 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:22:30,958-Speed 2643.50 samples/sec Loss 6.1228 LearningRate 0.0199 Epoch: 11 Global Step: 459790 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:22:34,858-Speed 2626.45 samples/sec Loss 5.9098 LearningRate 0.0199 Epoch: 11 Global Step: 459800 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:22:38,763-Speed 2622.54 samples/sec Loss 5.9785 LearningRate 0.0199 Epoch: 11 Global Step: 459810 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:22:42,664-Speed 2625.43 samples/sec Loss 5.8791 LearningRate 0.0199 Epoch: 11 Global Step: 459820 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:22:46,561-Speed 2628.28 samples/sec Loss 5.8242 LearningRate 0.0199 Epoch: 11 Global Step: 459830 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:22:50,461-Speed 2626.17 samples/sec Loss 5.9464 LearningRate 0.0199 Epoch: 11 Global Step: 459840 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:22:54,359-Speed 2627.82 samples/sec Loss 5.9611 LearningRate 0.0199 Epoch: 11 Global Step: 459850 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:22:58,257-Speed 2627.31 samples/sec Loss 5.9237 LearningRate 0.0199 Epoch: 11 Global Step: 459860 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:23:02,156-Speed 2627.28 samples/sec Loss 5.9810 LearningRate 0.0199 Epoch: 11 Global Step: 459870 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:23:06,056-Speed 2626.23 samples/sec Loss 5.9546 LearningRate 0.0199 Epoch: 11 Global Step: 459880 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:23:09,955-Speed 2626.41 samples/sec Loss 5.9392 LearningRate 0.0199 Epoch: 11 Global Step: 459890 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:23:13,849-Speed 2630.39 samples/sec Loss 6.0102 LearningRate 0.0199 Epoch: 11 Global Step: 459900 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:23:17,754-Speed 2623.53 samples/sec Loss 5.9153 LearningRate 0.0199 Epoch: 11 Global Step: 459910 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:23:21,673-Speed 2613.07 samples/sec Loss 5.8901 LearningRate 0.0199 Epoch: 11 Global Step: 459920 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:23:25,580-Speed 2622.67 samples/sec Loss 5.9388 LearningRate 0.0199 Epoch: 11 Global Step: 459930 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:23:29,471-Speed 2631.99 samples/sec Loss 6.0213 LearningRate 0.0199 Epoch: 11 Global Step: 459940 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:23:33,365-Speed 2630.48 samples/sec Loss 5.8549 LearningRate 0.0199 Epoch: 11 Global Step: 459950 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:23:37,285-Speed 2612.32 samples/sec Loss 5.9601 LearningRate 0.0199 Epoch: 11 Global Step: 459960 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:23:41,181-Speed 2628.74 samples/sec Loss 6.0283 LearningRate 0.0199 Epoch: 11 Global Step: 459970 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:23:45,082-Speed 2625.94 samples/sec Loss 5.9250 LearningRate 0.0198 Epoch: 11 Global Step: 459980 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:23:48,972-Speed 2633.01 samples/sec Loss 5.9942 LearningRate 0.0198 Epoch: 11 Global Step: 459990 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:23:52,869-Speed 2628.56 samples/sec Loss 5.8300 LearningRate 0.0198 Epoch: 11 Global Step: 460000 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:24:35,690-[lfw][460000]XNorm: 23.337887
Training: 2022-04-14 23:24:35,691-[lfw][460000]Accuracy-Flip: 0.99783+-0.00289
Training: 2022-04-14 23:24:35,692-[lfw][460000]Accuracy-Highest: 0.99783
Training: 2022-04-14 23:25:25,868-[cfp_fp][460000]XNorm: 21.993573
Training: 2022-04-14 23:25:25,869-[cfp_fp][460000]Accuracy-Flip: 0.98843+-0.00391
Training: 2022-04-14 23:25:25,869-[cfp_fp][460000]Accuracy-Highest: 0.98843
Training: 2022-04-14 23:26:08,311-[agedb_30][460000]XNorm: 23.593340
Training: 2022-04-14 23:26:08,312-[agedb_30][460000]Accuracy-Flip: 0.97717+-0.00711
Training: 2022-04-14 23:26:08,312-[agedb_30][460000]Accuracy-Highest: 0.97817
Training: 2022-04-14 23:26:12,187-Speed 73.50 samples/sec Loss 5.9777 LearningRate 0.0198 Epoch: 11 Global Step: 460010 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:26:16,059-Speed 2644.85 samples/sec Loss 5.8247 LearningRate 0.0198 Epoch: 11 Global Step: 460020 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:26:19,955-Speed 2629.27 samples/sec Loss 5.8848 LearningRate 0.0198 Epoch: 11 Global Step: 460030 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:26:23,883-Speed 2607.36 samples/sec Loss 6.0856 LearningRate 0.0198 Epoch: 11 Global Step: 460040 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:26:27,767-Speed 2637.22 samples/sec Loss 6.0475 LearningRate 0.0198 Epoch: 11 Global Step: 460050 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:26:31,658-Speed 2632.21 samples/sec Loss 5.8778 LearningRate 0.0198 Epoch: 11 Global Step: 460060 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:26:35,527-Speed 2646.88 samples/sec Loss 6.0048 LearningRate 0.0198 Epoch: 11 Global Step: 460070 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:26:39,421-Speed 2630.63 samples/sec Loss 5.8898 LearningRate 0.0198 Epoch: 11 Global Step: 460080 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:26:43,312-Speed 2632.30 samples/sec Loss 5.8244 LearningRate 0.0198 Epoch: 11 Global Step: 460090 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:26:47,214-Speed 2625.28 samples/sec Loss 5.9381 LearningRate 0.0198 Epoch: 11 Global Step: 460100 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:26:51,113-Speed 2627.07 samples/sec Loss 5.9107 LearningRate 0.0198 Epoch: 11 Global Step: 460110 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:26:55,014-Speed 2625.54 samples/sec Loss 5.9769 LearningRate 0.0198 Epoch: 11 Global Step: 460120 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:26:58,910-Speed 2628.73 samples/sec Loss 5.9966 LearningRate 0.0198 Epoch: 11 Global Step: 460130 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:27:02,812-Speed 2625.09 samples/sec Loss 5.8871 LearningRate 0.0198 Epoch: 11 Global Step: 460140 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:27:06,709-Speed 2628.10 samples/sec Loss 5.9303 LearningRate 0.0198 Epoch: 11 Global Step: 460150 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:27:10,626-Speed 2614.83 samples/sec Loss 6.0066 LearningRate 0.0198 Epoch: 11 Global Step: 460160 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:27:14,540-Speed 2616.93 samples/sec Loss 5.8873 LearningRate 0.0198 Epoch: 11 Global Step: 460170 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:27:18,441-Speed 2625.72 samples/sec Loss 5.9095 LearningRate 0.0198 Epoch: 11 Global Step: 460180 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:27:22,341-Speed 2626.29 samples/sec Loss 5.8994 LearningRate 0.0198 Epoch: 11 Global Step: 460190 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:27:26,235-Speed 2630.14 samples/sec Loss 5.8389 LearningRate 0.0198 Epoch: 11 Global Step: 460200 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:27:30,155-Speed 2613.11 samples/sec Loss 6.0217 LearningRate 0.0198 Epoch: 11 Global Step: 460210 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:27:34,047-Speed 2630.90 samples/sec Loss 5.9545 LearningRate 0.0198 Epoch: 11 Global Step: 460220 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:27:37,943-Speed 2629.29 samples/sec Loss 5.8418 LearningRate 0.0198 Epoch: 11 Global Step: 460230 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:27:41,845-Speed 2624.87 samples/sec Loss 5.9390 LearningRate 0.0198 Epoch: 11 Global Step: 460240 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:27:45,734-Speed 2633.44 samples/sec Loss 6.0291 LearningRate 0.0198 Epoch: 11 Global Step: 460250 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:27:49,643-Speed 2620.57 samples/sec Loss 5.9401 LearningRate 0.0198 Epoch: 11 Global Step: 460260 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:27:53,593-Speed 2593.15 samples/sec Loss 5.9689 LearningRate 0.0198 Epoch: 11 Global Step: 460270 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:27:57,673-Speed 2510.12 samples/sec Loss 6.0217 LearningRate 0.0198 Epoch: 11 Global Step: 460280 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:28:01,758-Speed 2507.55 samples/sec Loss 5.8974 LearningRate 0.0198 Epoch: 11 Global Step: 460290 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:28:05,822-Speed 2520.16 samples/sec Loss 5.9463 LearningRate 0.0198 Epoch: 11 Global Step: 460300 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:28:09,719-Speed 2628.06 samples/sec Loss 5.9069 LearningRate 0.0198 Epoch: 11 Global Step: 460310 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:28:13,655-Speed 2602.04 samples/sec Loss 5.9360 LearningRate 0.0198 Epoch: 11 Global Step: 460320 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:28:17,552-Speed 2628.92 samples/sec Loss 5.9102 LearningRate 0.0198 Epoch: 11 Global Step: 460330 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:28:21,445-Speed 2630.83 samples/sec Loss 6.0327 LearningRate 0.0198 Epoch: 11 Global Step: 460340 Fp16 Grad Scale: 131072 Required: 42 hours
Training: 2022-04-14 23:28:25,347-Speed 2624.33 samples/sec Loss 5.9500 LearningRate 0.0198 Epoch: 11 Global Step: 460350 Fp16 Grad Scale: 65536 Required: 42 hours
Training: 2022-04-14 23:28:29,243-Speed 2629.16 samples/sec Loss 5.9525 LearningRate 0.0198 Epoch: 11 Global Step: 460360 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:28:33,172-Speed 2607.01 samples/sec Loss 5.9571 LearningRate 0.0198 Epoch: 11 Global Step: 460370 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:28:37,248-Speed 2512.64 samples/sec Loss 5.9976 LearningRate 0.0198 Epoch: 11 Global Step: 460380 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:28:41,254-Speed 2563.06 samples/sec Loss 6.0768 LearningRate 0.0198 Epoch: 11 Global Step: 460390 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:28:45,172-Speed 2613.79 samples/sec Loss 5.9831 LearningRate 0.0198 Epoch: 11 Global Step: 460400 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:28:49,079-Speed 2621.93 samples/sec Loss 5.9004 LearningRate 0.0198 Epoch: 11 Global Step: 460410 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:28:52,981-Speed 2625.17 samples/sec Loss 6.0043 LearningRate 0.0198 Epoch: 11 Global Step: 460420 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:28:56,876-Speed 2629.28 samples/sec Loss 5.9665 LearningRate 0.0198 Epoch: 11 Global Step: 460430 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:29:00,771-Speed 2629.23 samples/sec Loss 5.8894 LearningRate 0.0198 Epoch: 11 Global Step: 460440 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:29:04,667-Speed 2629.03 samples/sec Loss 5.8979 LearningRate 0.0198 Epoch: 11 Global Step: 460450 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:29:08,560-Speed 2631.12 samples/sec Loss 5.9814 LearningRate 0.0198 Epoch: 11 Global Step: 460460 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:29:12,460-Speed 2626.55 samples/sec Loss 5.9548 LearningRate 0.0198 Epoch: 11 Global Step: 460470 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:29:16,354-Speed 2630.19 samples/sec Loss 5.9542 LearningRate 0.0198 Epoch: 11 Global Step: 460480 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:29:20,252-Speed 2627.65 samples/sec Loss 6.0454 LearningRate 0.0198 Epoch: 11 Global Step: 460490 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:29:24,149-Speed 2628.37 samples/sec Loss 6.0396 LearningRate 0.0198 Epoch: 11 Global Step: 460500 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:29:28,045-Speed 2628.79 samples/sec Loss 5.9623 LearningRate 0.0198 Epoch: 11 Global Step: 460510 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:29:31,936-Speed 2631.81 samples/sec Loss 5.9373 LearningRate 0.0198 Epoch: 11 Global Step: 460520 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:29:35,835-Speed 2627.12 samples/sec Loss 5.9309 LearningRate 0.0198 Epoch: 11 Global Step: 460530 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:29:39,734-Speed 2626.99 samples/sec Loss 5.8887 LearningRate 0.0198 Epoch: 11 Global Step: 460540 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:29:43,603-Speed 2647.25 samples/sec Loss 5.9134 LearningRate 0.0198 Epoch: 11 Global Step: 460550 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:29:47,502-Speed 2627.13 samples/sec Loss 5.9425 LearningRate 0.0198 Epoch: 11 Global Step: 460560 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:29:51,395-Speed 2630.83 samples/sec Loss 6.1320 LearningRate 0.0198 Epoch: 11 Global Step: 460570 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:29:55,290-Speed 2629.43 samples/sec Loss 5.8773 LearningRate 0.0198 Epoch: 11 Global Step: 460580 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:29:59,192-Speed 2625.18 samples/sec Loss 5.8891 LearningRate 0.0198 Epoch: 11 Global Step: 460590 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:30:03,114-Speed 2611.19 samples/sec Loss 5.9842 LearningRate 0.0198 Epoch: 11 Global Step: 460600 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:30:07,009-Speed 2629.90 samples/sec Loss 5.9835 LearningRate 0.0198 Epoch: 11 Global Step: 460610 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:30:10,909-Speed 2626.05 samples/sec Loss 6.0147 LearningRate 0.0198 Epoch: 11 Global Step: 460620 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:30:14,804-Speed 2629.80 samples/sec Loss 5.8086 LearningRate 0.0198 Epoch: 11 Global Step: 460630 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:30:18,699-Speed 2629.69 samples/sec Loss 5.9297 LearningRate 0.0198 Epoch: 11 Global Step: 460640 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:30:22,569-Speed 2646.36 samples/sec Loss 5.9024 LearningRate 0.0198 Epoch: 11 Global Step: 460650 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:30:26,465-Speed 2628.78 samples/sec Loss 5.9829 LearningRate 0.0198 Epoch: 11 Global Step: 460660 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:30:30,372-Speed 2621.89 samples/sec Loss 6.0298 LearningRate 0.0198 Epoch: 11 Global Step: 460670 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:30:34,274-Speed 2624.96 samples/sec Loss 5.8606 LearningRate 0.0198 Epoch: 11 Global Step: 460680 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:30:38,170-Speed 2628.49 samples/sec Loss 5.9850 LearningRate 0.0198 Epoch: 11 Global Step: 460690 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:30:42,063-Speed 2630.96 samples/sec Loss 5.9780 LearningRate 0.0198 Epoch: 11 Global Step: 460700 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:30:45,954-Speed 2632.65 samples/sec Loss 6.0282 LearningRate 0.0198 Epoch: 11 Global Step: 460710 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:30:49,855-Speed 2625.44 samples/sec Loss 5.7882 LearningRate 0.0198 Epoch: 11 Global Step: 460720 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:30:53,760-Speed 2622.77 samples/sec Loss 5.9366 LearningRate 0.0198 Epoch: 11 Global Step: 460730 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:30:57,662-Speed 2625.15 samples/sec Loss 6.0086 LearningRate 0.0198 Epoch: 11 Global Step: 460740 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:31:01,573-Speed 2618.63 samples/sec Loss 5.8989 LearningRate 0.0198 Epoch: 11 Global Step: 460750 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:31:05,472-Speed 2627.16 samples/sec Loss 5.9254 LearningRate 0.0198 Epoch: 11 Global Step: 460760 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:31:09,367-Speed 2629.92 samples/sec Loss 5.8872 LearningRate 0.0198 Epoch: 11 Global Step: 460770 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:31:13,264-Speed 2627.85 samples/sec Loss 5.8815 LearningRate 0.0198 Epoch: 11 Global Step: 460780 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:31:17,159-Speed 2629.41 samples/sec Loss 5.9480 LearningRate 0.0198 Epoch: 11 Global Step: 460790 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:31:21,069-Speed 2620.01 samples/sec Loss 5.9693 LearningRate 0.0198 Epoch: 11 Global Step: 460800 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:31:24,965-Speed 2628.95 samples/sec Loss 5.9813 LearningRate 0.0198 Epoch: 11 Global Step: 460810 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:31:28,862-Speed 2628.38 samples/sec Loss 5.8703 LearningRate 0.0198 Epoch: 11 Global Step: 460820 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:31:32,765-Speed 2623.74 samples/sec Loss 6.0856 LearningRate 0.0198 Epoch: 11 Global Step: 460830 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:31:36,664-Speed 2626.54 samples/sec Loss 6.0098 LearningRate 0.0198 Epoch: 11 Global Step: 460840 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:31:40,561-Speed 2628.82 samples/sec Loss 5.9578 LearningRate 0.0198 Epoch: 11 Global Step: 460850 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:31:44,452-Speed 2632.29 samples/sec Loss 5.8506 LearningRate 0.0198 Epoch: 11 Global Step: 460860 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:31:48,343-Speed 2632.38 samples/sec Loss 5.8542 LearningRate 0.0198 Epoch: 11 Global Step: 460870 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:31:52,236-Speed 2631.25 samples/sec Loss 5.9582 LearningRate 0.0198 Epoch: 11 Global Step: 460880 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:31:56,128-Speed 2631.45 samples/sec Loss 6.0517 LearningRate 0.0198 Epoch: 11 Global Step: 460890 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:32:00,026-Speed 2627.90 samples/sec Loss 5.9105 LearningRate 0.0198 Epoch: 11 Global Step: 460900 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:32:03,919-Speed 2630.28 samples/sec Loss 5.9269 LearningRate 0.0197 Epoch: 11 Global Step: 460910 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:32:07,823-Speed 2623.81 samples/sec Loss 5.9453 LearningRate 0.0197 Epoch: 11 Global Step: 460920 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:32:11,717-Speed 2629.87 samples/sec Loss 5.9604 LearningRate 0.0197 Epoch: 11 Global Step: 460930 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:32:15,638-Speed 2612.37 samples/sec Loss 5.9185 LearningRate 0.0197 Epoch: 11 Global Step: 460940 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:32:19,517-Speed 2641.24 samples/sec Loss 5.9250 LearningRate 0.0197 Epoch: 11 Global Step: 460950 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:32:23,422-Speed 2622.38 samples/sec Loss 5.9558 LearningRate 0.0197 Epoch: 11 Global Step: 460960 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:32:27,317-Speed 2630.10 samples/sec Loss 5.9265 LearningRate 0.0197 Epoch: 11 Global Step: 460970 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:32:31,208-Speed 2632.12 samples/sec Loss 5.9220 LearningRate 0.0197 Epoch: 11 Global Step: 460980 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:32:35,097-Speed 2633.53 samples/sec Loss 5.9478 LearningRate 0.0197 Epoch: 11 Global Step: 460990 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:32:38,968-Speed 2645.72 samples/sec Loss 5.9751 LearningRate 0.0197 Epoch: 11 Global Step: 461000 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:32:42,871-Speed 2624.48 samples/sec Loss 5.9011 LearningRate 0.0197 Epoch: 11 Global Step: 461010 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:32:46,767-Speed 2628.40 samples/sec Loss 5.9699 LearningRate 0.0197 Epoch: 11 Global Step: 461020 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:32:50,668-Speed 2626.14 samples/sec Loss 5.9249 LearningRate 0.0197 Epoch: 11 Global Step: 461030 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:32:54,563-Speed 2629.07 samples/sec Loss 5.9903 LearningRate 0.0197 Epoch: 11 Global Step: 461040 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:32:58,460-Speed 2628.94 samples/sec Loss 5.8599 LearningRate 0.0197 Epoch: 11 Global Step: 461050 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:33:02,350-Speed 2632.52 samples/sec Loss 5.7924 LearningRate 0.0197 Epoch: 11 Global Step: 461060 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:33:06,245-Speed 2629.91 samples/sec Loss 5.9508 LearningRate 0.0197 Epoch: 11 Global Step: 461070 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:33:10,147-Speed 2624.70 samples/sec Loss 5.9603 LearningRate 0.0197 Epoch: 11 Global Step: 461080 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:33:14,069-Speed 2611.58 samples/sec Loss 5.8163 LearningRate 0.0197 Epoch: 11 Global Step: 461090 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:33:17,964-Speed 2629.49 samples/sec Loss 5.8611 LearningRate 0.0197 Epoch: 11 Global Step: 461100 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:33:21,848-Speed 2636.81 samples/sec Loss 5.9613 LearningRate 0.0197 Epoch: 11 Global Step: 461110 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:33:25,754-Speed 2622.96 samples/sec Loss 6.0704 LearningRate 0.0197 Epoch: 11 Global Step: 461120 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:33:29,648-Speed 2630.42 samples/sec Loss 5.8855 LearningRate 0.0197 Epoch: 11 Global Step: 461130 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:33:33,542-Speed 2630.12 samples/sec Loss 5.9174 LearningRate 0.0197 Epoch: 11 Global Step: 461140 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:33:37,437-Speed 2629.14 samples/sec Loss 5.9593 LearningRate 0.0197 Epoch: 11 Global Step: 461150 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:33:41,351-Speed 2617.67 samples/sec Loss 5.8743 LearningRate 0.0197 Epoch: 11 Global Step: 461160 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:33:45,266-Speed 2615.70 samples/sec Loss 5.8641 LearningRate 0.0197 Epoch: 11 Global Step: 461170 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:33:49,180-Speed 2617.85 samples/sec Loss 5.9145 LearningRate 0.0197 Epoch: 11 Global Step: 461180 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:33:53,078-Speed 2626.90 samples/sec Loss 5.9743 LearningRate 0.0197 Epoch: 11 Global Step: 461190 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:33:56,972-Speed 2630.38 samples/sec Loss 5.9841 LearningRate 0.0197 Epoch: 11 Global Step: 461200 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:34:00,868-Speed 2629.44 samples/sec Loss 5.8880 LearningRate 0.0197 Epoch: 11 Global Step: 461210 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:34:04,762-Speed 2629.65 samples/sec Loss 5.9015 LearningRate 0.0197 Epoch: 11 Global Step: 461220 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:34:08,660-Speed 2627.31 samples/sec Loss 5.9604 LearningRate 0.0197 Epoch: 11 Global Step: 461230 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:34:12,554-Speed 2631.09 samples/sec Loss 5.8706 LearningRate 0.0197 Epoch: 11 Global Step: 461240 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:34:16,452-Speed 2627.28 samples/sec Loss 5.9358 LearningRate 0.0197 Epoch: 11 Global Step: 461250 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:34:20,360-Speed 2621.37 samples/sec Loss 5.9177 LearningRate 0.0197 Epoch: 11 Global Step: 461260 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:34:24,276-Speed 2615.35 samples/sec Loss 5.9353 LearningRate 0.0197 Epoch: 11 Global Step: 461270 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:34:28,147-Speed 2646.33 samples/sec Loss 5.8463 LearningRate 0.0197 Epoch: 11 Global Step: 461280 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:34:32,041-Speed 2630.30 samples/sec Loss 5.9975 LearningRate 0.0197 Epoch: 11 Global Step: 461290 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:34:35,942-Speed 2625.58 samples/sec Loss 5.8356 LearningRate 0.0197 Epoch: 11 Global Step: 461300 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:34:39,853-Speed 2618.70 samples/sec Loss 5.9005 LearningRate 0.0197 Epoch: 11 Global Step: 461310 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:34:43,748-Speed 2629.99 samples/sec Loss 5.9697 LearningRate 0.0197 Epoch: 11 Global Step: 461320 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:34:47,639-Speed 2632.33 samples/sec Loss 5.9839 LearningRate 0.0197 Epoch: 11 Global Step: 461330 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:34:51,532-Speed 2630.74 samples/sec Loss 5.9198 LearningRate 0.0197 Epoch: 11 Global Step: 461340 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:34:55,450-Speed 2613.98 samples/sec Loss 5.9253 LearningRate 0.0197 Epoch: 11 Global Step: 461350 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:34:59,350-Speed 2626.71 samples/sec Loss 5.7891 LearningRate 0.0197 Epoch: 11 Global Step: 461360 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:35:03,249-Speed 2626.91 samples/sec Loss 5.9037 LearningRate 0.0197 Epoch: 11 Global Step: 461370 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:35:07,145-Speed 2629.06 samples/sec Loss 6.0204 LearningRate 0.0197 Epoch: 11 Global Step: 461380 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:35:11,046-Speed 2625.24 samples/sec Loss 5.8962 LearningRate 0.0197 Epoch: 11 Global Step: 461390 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:35:14,956-Speed 2619.83 samples/sec Loss 5.9133 LearningRate 0.0197 Epoch: 11 Global Step: 461400 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:35:18,833-Speed 2641.74 samples/sec Loss 5.9239 LearningRate 0.0197 Epoch: 11 Global Step: 461410 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:35:22,726-Speed 2630.51 samples/sec Loss 5.8939 LearningRate 0.0197 Epoch: 11 Global Step: 461420 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:35:26,632-Speed 2622.93 samples/sec Loss 5.8798 LearningRate 0.0197 Epoch: 11 Global Step: 461430 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:35:30,570-Speed 2600.85 samples/sec Loss 5.9869 LearningRate 0.0197 Epoch: 11 Global Step: 461440 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:35:34,836-Speed 2402.09 samples/sec Loss 6.0217 LearningRate 0.0197 Epoch: 11 Global Step: 461450 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:35:38,736-Speed 2625.97 samples/sec Loss 5.8537 LearningRate 0.0197 Epoch: 11 Global Step: 461460 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:35:42,648-Speed 2618.27 samples/sec Loss 5.9506 LearningRate 0.0197 Epoch: 11 Global Step: 461470 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:35:46,541-Speed 2630.94 samples/sec Loss 5.8315 LearningRate 0.0197 Epoch: 11 Global Step: 461480 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:35:50,442-Speed 2625.79 samples/sec Loss 5.9906 LearningRate 0.0197 Epoch: 11 Global Step: 461490 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:35:54,493-Speed 2528.55 samples/sec Loss 5.8740 LearningRate 0.0197 Epoch: 11 Global Step: 461500 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:35:58,432-Speed 2600.72 samples/sec Loss 6.0161 LearningRate 0.0197 Epoch: 11 Global Step: 461510 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:36:02,444-Speed 2552.75 samples/sec Loss 5.9049 LearningRate 0.0197 Epoch: 11 Global Step: 461520 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:36:06,457-Speed 2552.73 samples/sec Loss 6.0216 LearningRate 0.0197 Epoch: 11 Global Step: 461530 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:36:10,359-Speed 2624.25 samples/sec Loss 5.8614 LearningRate 0.0197 Epoch: 11 Global Step: 461540 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:36:14,396-Speed 2537.96 samples/sec Loss 5.9807 LearningRate 0.0197 Epoch: 11 Global Step: 461550 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:36:18,317-Speed 2611.83 samples/sec Loss 5.8920 LearningRate 0.0197 Epoch: 11 Global Step: 461560 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:36:22,214-Speed 2628.51 samples/sec Loss 5.8616 LearningRate 0.0197 Epoch: 11 Global Step: 461570 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:36:26,114-Speed 2626.09 samples/sec Loss 5.9027 LearningRate 0.0197 Epoch: 11 Global Step: 461580 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:36:30,015-Speed 2625.52 samples/sec Loss 5.9140 LearningRate 0.0197 Epoch: 11 Global Step: 461590 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:36:33,920-Speed 2622.49 samples/sec Loss 5.8229 LearningRate 0.0197 Epoch: 11 Global Step: 461600 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:36:37,821-Speed 2625.53 samples/sec Loss 5.9580 LearningRate 0.0197 Epoch: 11 Global Step: 461610 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:36:41,723-Speed 2624.87 samples/sec Loss 5.9623 LearningRate 0.0197 Epoch: 11 Global Step: 461620 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:36:45,621-Speed 2628.42 samples/sec Loss 5.9407 LearningRate 0.0197 Epoch: 11 Global Step: 461630 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:36:49,516-Speed 2629.45 samples/sec Loss 5.9008 LearningRate 0.0197 Epoch: 11 Global Step: 461640 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:36:53,428-Speed 2618.67 samples/sec Loss 5.8088 LearningRate 0.0197 Epoch: 11 Global Step: 461650 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:36:57,425-Speed 2562.59 samples/sec Loss 5.8479 LearningRate 0.0197 Epoch: 11 Global Step: 461660 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:37:01,319-Speed 2630.23 samples/sec Loss 5.9547 LearningRate 0.0197 Epoch: 11 Global Step: 461670 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:37:05,252-Speed 2603.73 samples/sec Loss 5.8733 LearningRate 0.0197 Epoch: 11 Global Step: 461680 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:37:09,154-Speed 2625.27 samples/sec Loss 5.9365 LearningRate 0.0197 Epoch: 11 Global Step: 461690 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:37:13,065-Speed 2619.51 samples/sec Loss 5.9955 LearningRate 0.0197 Epoch: 11 Global Step: 461700 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:37:16,960-Speed 2629.34 samples/sec Loss 5.9381 LearningRate 0.0197 Epoch: 11 Global Step: 461710 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:37:20,861-Speed 2625.71 samples/sec Loss 5.9803 LearningRate 0.0197 Epoch: 11 Global Step: 461720 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:37:24,769-Speed 2620.74 samples/sec Loss 5.8743 LearningRate 0.0197 Epoch: 11 Global Step: 461730 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:37:28,650-Speed 2639.68 samples/sec Loss 5.8323 LearningRate 0.0197 Epoch: 11 Global Step: 461740 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:37:32,542-Speed 2631.39 samples/sec Loss 5.9727 LearningRate 0.0197 Epoch: 11 Global Step: 461750 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:37:36,435-Speed 2631.04 samples/sec Loss 6.0574 LearningRate 0.0197 Epoch: 11 Global Step: 461760 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:37:40,331-Speed 2628.77 samples/sec Loss 5.9585 LearningRate 0.0197 Epoch: 11 Global Step: 461770 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:37:44,224-Speed 2631.62 samples/sec Loss 6.0069 LearningRate 0.0197 Epoch: 11 Global Step: 461780 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:37:48,117-Speed 2630.65 samples/sec Loss 5.8709 LearningRate 0.0197 Epoch: 11 Global Step: 461790 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:37:52,008-Speed 2632.31 samples/sec Loss 6.0385 LearningRate 0.0197 Epoch: 11 Global Step: 461800 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:37:55,921-Speed 2618.00 samples/sec Loss 5.9156 LearningRate 0.0197 Epoch: 11 Global Step: 461810 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:37:59,814-Speed 2630.55 samples/sec Loss 5.9600 LearningRate 0.0197 Epoch: 11 Global Step: 461820 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:38:03,709-Speed 2629.30 samples/sec Loss 5.9184 LearningRate 0.0197 Epoch: 11 Global Step: 461830 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:38:07,628-Speed 2614.12 samples/sec Loss 5.9418 LearningRate 0.0197 Epoch: 11 Global Step: 461840 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:38:11,522-Speed 2630.11 samples/sec Loss 5.9942 LearningRate 0.0196 Epoch: 11 Global Step: 461850 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:38:15,400-Speed 2641.33 samples/sec Loss 5.8555 LearningRate 0.0196 Epoch: 11 Global Step: 461860 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:38:19,309-Speed 2620.66 samples/sec Loss 5.8587 LearningRate 0.0196 Epoch: 11 Global Step: 461870 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:38:23,205-Speed 2628.70 samples/sec Loss 5.9262 LearningRate 0.0196 Epoch: 11 Global Step: 461880 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:38:27,097-Speed 2631.72 samples/sec Loss 5.8928 LearningRate 0.0196 Epoch: 11 Global Step: 461890 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:38:31,002-Speed 2623.10 samples/sec Loss 6.0075 LearningRate 0.0196 Epoch: 11 Global Step: 461900 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:38:34,905-Speed 2624.16 samples/sec Loss 5.9267 LearningRate 0.0196 Epoch: 11 Global Step: 461910 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:38:38,803-Speed 2627.38 samples/sec Loss 5.9527 LearningRate 0.0196 Epoch: 11 Global Step: 461920 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:38:42,692-Speed 2634.28 samples/sec Loss 5.9548 LearningRate 0.0196 Epoch: 11 Global Step: 461930 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:38:46,611-Speed 2613.50 samples/sec Loss 5.8598 LearningRate 0.0196 Epoch: 11 Global Step: 461940 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:38:50,508-Speed 2628.32 samples/sec Loss 5.8339 LearningRate 0.0196 Epoch: 11 Global Step: 461950 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:38:54,401-Speed 2630.47 samples/sec Loss 5.9018 LearningRate 0.0196 Epoch: 11 Global Step: 461960 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:38:58,310-Speed 2620.54 samples/sec Loss 6.0068 LearningRate 0.0196 Epoch: 11 Global Step: 461970 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:39:02,244-Speed 2603.01 samples/sec Loss 5.8926 LearningRate 0.0196 Epoch: 11 Global Step: 461980 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:39:06,142-Speed 2627.99 samples/sec Loss 5.8432 LearningRate 0.0196 Epoch: 11 Global Step: 461990 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:39:10,051-Speed 2619.66 samples/sec Loss 5.8361 LearningRate 0.0196 Epoch: 11 Global Step: 462000 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:39:13,950-Speed 2627.67 samples/sec Loss 5.9144 LearningRate 0.0196 Epoch: 11 Global Step: 462010 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:39:17,848-Speed 2627.19 samples/sec Loss 5.8293 LearningRate 0.0196 Epoch: 11 Global Step: 462020 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:39:21,749-Speed 2626.01 samples/sec Loss 5.8825 LearningRate 0.0196 Epoch: 11 Global Step: 462030 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:39:25,649-Speed 2626.27 samples/sec Loss 5.9157 LearningRate 0.0196 Epoch: 11 Global Step: 462040 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:39:29,547-Speed 2627.32 samples/sec Loss 5.9216 LearningRate 0.0196 Epoch: 11 Global Step: 462050 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:39:33,422-Speed 2642.78 samples/sec Loss 5.9725 LearningRate 0.0196 Epoch: 11 Global Step: 462060 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:39:37,323-Speed 2625.41 samples/sec Loss 5.9800 LearningRate 0.0196 Epoch: 11 Global Step: 462070 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:39:41,220-Speed 2628.19 samples/sec Loss 5.9629 LearningRate 0.0196 Epoch: 11 Global Step: 462080 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:39:45,114-Speed 2630.79 samples/sec Loss 5.9376 LearningRate 0.0196 Epoch: 11 Global Step: 462090 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:39:49,014-Speed 2625.85 samples/sec Loss 5.8511 LearningRate 0.0196 Epoch: 11 Global Step: 462100 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:39:52,904-Speed 2633.32 samples/sec Loss 5.9184 LearningRate 0.0196 Epoch: 11 Global Step: 462110 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:39:56,805-Speed 2626.18 samples/sec Loss 5.9049 LearningRate 0.0196 Epoch: 11 Global Step: 462120 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:40:00,697-Speed 2631.67 samples/sec Loss 5.9701 LearningRate 0.0196 Epoch: 11 Global Step: 462130 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:40:04,596-Speed 2626.47 samples/sec Loss 5.9023 LearningRate 0.0196 Epoch: 11 Global Step: 462140 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:40:08,492-Speed 2629.23 samples/sec Loss 6.0208 LearningRate 0.0196 Epoch: 11 Global Step: 462150 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:40:12,391-Speed 2626.80 samples/sec Loss 5.9052 LearningRate 0.0196 Epoch: 11 Global Step: 462160 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:40:16,287-Speed 2628.57 samples/sec Loss 5.8250 LearningRate 0.0196 Epoch: 11 Global Step: 462170 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:40:20,185-Speed 2627.72 samples/sec Loss 5.9352 LearningRate 0.0196 Epoch: 11 Global Step: 462180 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:40:24,078-Speed 2631.25 samples/sec Loss 5.9245 LearningRate 0.0196 Epoch: 11 Global Step: 462190 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:40:27,988-Speed 2619.83 samples/sec Loss 5.8474 LearningRate 0.0196 Epoch: 11 Global Step: 462200 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:40:31,890-Speed 2625.15 samples/sec Loss 6.0346 LearningRate 0.0196 Epoch: 11 Global Step: 462210 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:40:35,789-Speed 2626.55 samples/sec Loss 5.9772 LearningRate 0.0196 Epoch: 11 Global Step: 462220 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:40:39,748-Speed 2587.21 samples/sec Loss 5.9024 LearningRate 0.0196 Epoch: 11 Global Step: 462230 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:40:43,645-Speed 2627.91 samples/sec Loss 6.0089 LearningRate 0.0196 Epoch: 11 Global Step: 462240 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:40:47,544-Speed 2627.36 samples/sec Loss 5.9355 LearningRate 0.0196 Epoch: 11 Global Step: 462250 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:40:51,444-Speed 2625.94 samples/sec Loss 5.8516 LearningRate 0.0196 Epoch: 11 Global Step: 462260 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:40:55,342-Speed 2627.17 samples/sec Loss 5.8390 LearningRate 0.0196 Epoch: 11 Global Step: 462270 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:40:59,249-Speed 2621.76 samples/sec Loss 6.0435 LearningRate 0.0196 Epoch: 11 Global Step: 462280 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:41:03,148-Speed 2626.56 samples/sec Loss 5.8527 LearningRate 0.0196 Epoch: 11 Global Step: 462290 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:41:07,043-Speed 2629.97 samples/sec Loss 5.9003 LearningRate 0.0196 Epoch: 11 Global Step: 462300 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:41:10,939-Speed 2629.00 samples/sec Loss 5.9553 LearningRate 0.0196 Epoch: 11 Global Step: 462310 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:41:14,832-Speed 2630.79 samples/sec Loss 5.8994 LearningRate 0.0196 Epoch: 11 Global Step: 462320 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:41:18,723-Speed 2632.79 samples/sec Loss 5.9788 LearningRate 0.0196 Epoch: 11 Global Step: 462330 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:41:22,624-Speed 2625.52 samples/sec Loss 5.9276 LearningRate 0.0196 Epoch: 11 Global Step: 462340 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:41:26,532-Speed 2620.36 samples/sec Loss 6.0051 LearningRate 0.0196 Epoch: 11 Global Step: 462350 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:41:30,600-Speed 2517.67 samples/sec Loss 5.9566 LearningRate 0.0196 Epoch: 11 Global Step: 462360 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:41:34,495-Speed 2629.48 samples/sec Loss 5.9291 LearningRate 0.0196 Epoch: 11 Global Step: 462370 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:41:38,402-Speed 2621.38 samples/sec Loss 5.8923 LearningRate 0.0196 Epoch: 11 Global Step: 462380 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:41:42,309-Speed 2622.13 samples/sec Loss 5.8377 LearningRate 0.0196 Epoch: 11 Global Step: 462390 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:41:46,206-Speed 2628.23 samples/sec Loss 5.8709 LearningRate 0.0196 Epoch: 11 Global Step: 462400 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:41:50,108-Speed 2624.91 samples/sec Loss 5.8713 LearningRate 0.0196 Epoch: 11 Global Step: 462410 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:41:54,010-Speed 2624.67 samples/sec Loss 5.9216 LearningRate 0.0196 Epoch: 11 Global Step: 462420 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:41:57,888-Speed 2640.88 samples/sec Loss 6.0051 LearningRate 0.0196 Epoch: 11 Global Step: 462430 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:42:01,783-Speed 2629.96 samples/sec Loss 5.8461 LearningRate 0.0196 Epoch: 11 Global Step: 462440 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:42:05,694-Speed 2618.36 samples/sec Loss 5.9081 LearningRate 0.0196 Epoch: 11 Global Step: 462450 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:42:09,594-Speed 2626.71 samples/sec Loss 5.9250 LearningRate 0.0196 Epoch: 11 Global Step: 462460 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:42:13,499-Speed 2622.46 samples/sec Loss 5.8896 LearningRate 0.0196 Epoch: 11 Global Step: 462470 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:42:17,399-Speed 2627.08 samples/sec Loss 6.1239 LearningRate 0.0196 Epoch: 11 Global Step: 462480 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:42:21,297-Speed 2627.18 samples/sec Loss 5.7696 LearningRate 0.0196 Epoch: 11 Global Step: 462490 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:42:25,201-Speed 2623.77 samples/sec Loss 5.9714 LearningRate 0.0196 Epoch: 11 Global Step: 462500 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:42:29,095-Speed 2630.23 samples/sec Loss 5.8396 LearningRate 0.0196 Epoch: 11 Global Step: 462510 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:42:32,999-Speed 2624.19 samples/sec Loss 5.9959 LearningRate 0.0196 Epoch: 11 Global Step: 462520 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:42:36,906-Speed 2621.13 samples/sec Loss 6.0174 LearningRate 0.0196 Epoch: 11 Global Step: 462530 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:42:40,811-Speed 2622.90 samples/sec Loss 5.8787 LearningRate 0.0196 Epoch: 11 Global Step: 462540 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:42:44,707-Speed 2629.19 samples/sec Loss 5.8779 LearningRate 0.0196 Epoch: 11 Global Step: 462550 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:42:48,615-Speed 2620.90 samples/sec Loss 5.8854 LearningRate 0.0196 Epoch: 11 Global Step: 462560 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:42:52,521-Speed 2622.37 samples/sec Loss 5.8468 LearningRate 0.0196 Epoch: 11 Global Step: 462570 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:42:56,417-Speed 2628.51 samples/sec Loss 5.8092 LearningRate 0.0196 Epoch: 11 Global Step: 462580 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:43:00,315-Speed 2627.84 samples/sec Loss 5.8540 LearningRate 0.0196 Epoch: 11 Global Step: 462590 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:43:04,200-Speed 2636.56 samples/sec Loss 5.9761 LearningRate 0.0196 Epoch: 11 Global Step: 462600 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:43:08,094-Speed 2629.82 samples/sec Loss 5.9869 LearningRate 0.0196 Epoch: 11 Global Step: 462610 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:43:11,991-Speed 2628.31 samples/sec Loss 5.8866 LearningRate 0.0196 Epoch: 11 Global Step: 462620 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:43:15,890-Speed 2627.07 samples/sec Loss 5.9897 LearningRate 0.0196 Epoch: 11 Global Step: 462630 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:43:19,785-Speed 2629.73 samples/sec Loss 5.8953 LearningRate 0.0196 Epoch: 11 Global Step: 462640 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:43:23,683-Speed 2626.88 samples/sec Loss 5.9191 LearningRate 0.0196 Epoch: 11 Global Step: 462650 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:43:27,580-Speed 2628.83 samples/sec Loss 5.9336 LearningRate 0.0196 Epoch: 11 Global Step: 462660 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:43:31,508-Speed 2607.34 samples/sec Loss 5.9009 LearningRate 0.0196 Epoch: 11 Global Step: 462670 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:43:35,416-Speed 2620.53 samples/sec Loss 5.9324 LearningRate 0.0196 Epoch: 11 Global Step: 462680 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:43:39,315-Speed 2627.08 samples/sec Loss 5.8767 LearningRate 0.0196 Epoch: 11 Global Step: 462690 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:43:43,303-Speed 2568.23 samples/sec Loss 5.9085 LearningRate 0.0196 Epoch: 11 Global Step: 462700 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:43:47,210-Speed 2621.78 samples/sec Loss 5.9263 LearningRate 0.0196 Epoch: 11 Global Step: 462710 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:43:51,099-Speed 2633.67 samples/sec Loss 5.9271 LearningRate 0.0196 Epoch: 11 Global Step: 462720 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:43:54,997-Speed 2627.14 samples/sec Loss 5.8628 LearningRate 0.0196 Epoch: 11 Global Step: 462730 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:43:58,902-Speed 2623.23 samples/sec Loss 5.9567 LearningRate 0.0196 Epoch: 11 Global Step: 462740 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:44:02,811-Speed 2620.37 samples/sec Loss 5.9464 LearningRate 0.0196 Epoch: 11 Global Step: 462750 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:44:06,704-Speed 2630.28 samples/sec Loss 5.8470 LearningRate 0.0196 Epoch: 11 Global Step: 462760 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:44:10,614-Speed 2619.90 samples/sec Loss 5.8763 LearningRate 0.0196 Epoch: 11 Global Step: 462770 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:44:14,570-Speed 2589.19 samples/sec Loss 5.8372 LearningRate 0.0195 Epoch: 11 Global Step: 462780 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:44:18,496-Speed 2609.12 samples/sec Loss 5.8504 LearningRate 0.0195 Epoch: 11 Global Step: 462790 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:44:22,394-Speed 2627.15 samples/sec Loss 5.8769 LearningRate 0.0195 Epoch: 11 Global Step: 462800 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:44:26,311-Speed 2614.74 samples/sec Loss 5.8631 LearningRate 0.0195 Epoch: 11 Global Step: 462810 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:44:30,243-Speed 2605.09 samples/sec Loss 5.8958 LearningRate 0.0195 Epoch: 11 Global Step: 462820 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:44:34,161-Speed 2613.82 samples/sec Loss 5.8834 LearningRate 0.0195 Epoch: 11 Global Step: 462830 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:44:38,070-Speed 2620.21 samples/sec Loss 6.0092 LearningRate 0.0195 Epoch: 11 Global Step: 462840 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:44:41,949-Speed 2640.51 samples/sec Loss 5.9581 LearningRate 0.0195 Epoch: 11 Global Step: 462850 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:44:45,840-Speed 2631.71 samples/sec Loss 5.8995 LearningRate 0.0195 Epoch: 11 Global Step: 462860 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:44:49,768-Speed 2608.09 samples/sec Loss 5.9354 LearningRate 0.0195 Epoch: 11 Global Step: 462870 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:44:53,664-Speed 2629.12 samples/sec Loss 5.8791 LearningRate 0.0195 Epoch: 11 Global Step: 462880 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:44:57,556-Speed 2631.83 samples/sec Loss 5.8994 LearningRate 0.0195 Epoch: 11 Global Step: 462890 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:45:01,448-Speed 2630.98 samples/sec Loss 5.7071 LearningRate 0.0195 Epoch: 11 Global Step: 462900 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:45:05,346-Speed 2627.66 samples/sec Loss 5.9045 LearningRate 0.0195 Epoch: 11 Global Step: 462910 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:45:09,257-Speed 2618.71 samples/sec Loss 5.7912 LearningRate 0.0195 Epoch: 11 Global Step: 462920 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:45:13,152-Speed 2629.63 samples/sec Loss 5.9345 LearningRate 0.0195 Epoch: 11 Global Step: 462930 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:45:17,051-Speed 2626.66 samples/sec Loss 5.9734 LearningRate 0.0195 Epoch: 11 Global Step: 462940 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:45:20,947-Speed 2629.65 samples/sec Loss 5.8827 LearningRate 0.0195 Epoch: 11 Global Step: 462950 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:45:24,852-Speed 2622.53 samples/sec Loss 5.8041 LearningRate 0.0195 Epoch: 11 Global Step: 462960 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:45:28,748-Speed 2629.27 samples/sec Loss 5.9984 LearningRate 0.0195 Epoch: 11 Global Step: 462970 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:45:32,645-Speed 2628.11 samples/sec Loss 5.9221 LearningRate 0.0195 Epoch: 11 Global Step: 462980 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:45:36,550-Speed 2622.80 samples/sec Loss 5.9013 LearningRate 0.0195 Epoch: 11 Global Step: 462990 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:45:40,456-Speed 2621.89 samples/sec Loss 6.0196 LearningRate 0.0195 Epoch: 11 Global Step: 463000 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:45:44,356-Speed 2626.79 samples/sec Loss 5.8925 LearningRate 0.0195 Epoch: 11 Global Step: 463010 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:45:48,247-Speed 2631.90 samples/sec Loss 5.8890 LearningRate 0.0195 Epoch: 11 Global Step: 463020 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:45:52,122-Speed 2643.17 samples/sec Loss 5.8291 LearningRate 0.0195 Epoch: 11 Global Step: 463030 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:45:56,030-Speed 2620.51 samples/sec Loss 5.8336 LearningRate 0.0195 Epoch: 11 Global Step: 463040 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:45:59,925-Speed 2629.84 samples/sec Loss 5.7747 LearningRate 0.0195 Epoch: 11 Global Step: 463050 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:46:03,832-Speed 2621.78 samples/sec Loss 5.8484 LearningRate 0.0195 Epoch: 11 Global Step: 463060 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:46:07,819-Speed 2569.13 samples/sec Loss 5.9761 LearningRate 0.0195 Epoch: 11 Global Step: 463070 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:46:11,900-Speed 2509.58 samples/sec Loss 5.8044 LearningRate 0.0195 Epoch: 11 Global Step: 463080 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:46:15,944-Speed 2532.46 samples/sec Loss 5.8535 LearningRate 0.0195 Epoch: 11 Global Step: 463090 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:46:19,842-Speed 2628.06 samples/sec Loss 5.9342 LearningRate 0.0195 Epoch: 11 Global Step: 463100 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:46:23,746-Speed 2623.40 samples/sec Loss 5.9003 LearningRate 0.0195 Epoch: 11 Global Step: 463110 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:46:27,639-Speed 2630.77 samples/sec Loss 5.8973 LearningRate 0.0195 Epoch: 11 Global Step: 463120 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:46:31,546-Speed 2621.49 samples/sec Loss 5.9023 LearningRate 0.0195 Epoch: 11 Global Step: 463130 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:46:35,457-Speed 2618.81 samples/sec Loss 5.9545 LearningRate 0.0195 Epoch: 11 Global Step: 463140 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:46:39,359-Speed 2625.36 samples/sec Loss 5.9407 LearningRate 0.0195 Epoch: 11 Global Step: 463150 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:46:43,254-Speed 2629.67 samples/sec Loss 5.8842 LearningRate 0.0195 Epoch: 11 Global Step: 463160 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:46:47,151-Speed 2628.49 samples/sec Loss 5.9270 LearningRate 0.0195 Epoch: 11 Global Step: 463170 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:46:51,050-Speed 2626.62 samples/sec Loss 5.9900 LearningRate 0.0195 Epoch: 11 Global Step: 463180 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:46:54,944-Speed 2630.86 samples/sec Loss 5.9322 LearningRate 0.0195 Epoch: 11 Global Step: 463190 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:46:58,843-Speed 2626.65 samples/sec Loss 5.8586 LearningRate 0.0195 Epoch: 11 Global Step: 463200 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:47:02,773-Speed 2605.93 samples/sec Loss 5.9486 LearningRate 0.0195 Epoch: 11 Global Step: 463210 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:47:06,672-Speed 2626.65 samples/sec Loss 5.9125 LearningRate 0.0195 Epoch: 11 Global Step: 463220 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:47:10,607-Speed 2602.76 samples/sec Loss 5.8836 LearningRate 0.0195 Epoch: 11 Global Step: 463230 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:47:14,513-Speed 2622.67 samples/sec Loss 5.8443 LearningRate 0.0195 Epoch: 11 Global Step: 463240 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:47:18,411-Speed 2627.47 samples/sec Loss 5.8944 LearningRate 0.0195 Epoch: 11 Global Step: 463250 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:47:22,324-Speed 2617.85 samples/sec Loss 5.8764 LearningRate 0.0195 Epoch: 11 Global Step: 463260 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:47:26,220-Speed 2629.08 samples/sec Loss 5.8946 LearningRate 0.0195 Epoch: 11 Global Step: 463270 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:47:30,116-Speed 2628.43 samples/sec Loss 5.8324 LearningRate 0.0195 Epoch: 11 Global Step: 463280 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:47:34,012-Speed 2629.07 samples/sec Loss 5.8517 LearningRate 0.0195 Epoch: 11 Global Step: 463290 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:47:37,951-Speed 2600.43 samples/sec Loss 5.9307 LearningRate 0.0195 Epoch: 11 Global Step: 463300 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:47:41,850-Speed 2626.53 samples/sec Loss 5.9042 LearningRate 0.0195 Epoch: 11 Global Step: 463310 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:47:45,771-Speed 2613.03 samples/sec Loss 5.9003 LearningRate 0.0195 Epoch: 11 Global Step: 463320 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:47:49,680-Speed 2619.56 samples/sec Loss 5.8403 LearningRate 0.0195 Epoch: 11 Global Step: 463330 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:47:53,598-Speed 2614.45 samples/sec Loss 5.8883 LearningRate 0.0195 Epoch: 11 Global Step: 463340 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:47:57,497-Speed 2627.40 samples/sec Loss 5.9206 LearningRate 0.0195 Epoch: 11 Global Step: 463350 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:48:01,395-Speed 2627.62 samples/sec Loss 5.9591 LearningRate 0.0195 Epoch: 11 Global Step: 463360 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:48:05,294-Speed 2626.32 samples/sec Loss 5.9993 LearningRate 0.0195 Epoch: 11 Global Step: 463370 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:48:09,238-Speed 2597.72 samples/sec Loss 5.9686 LearningRate 0.0195 Epoch: 11 Global Step: 463380 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:48:13,134-Speed 2629.02 samples/sec Loss 6.0411 LearningRate 0.0195 Epoch: 11 Global Step: 463390 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:48:17,029-Speed 2629.35 samples/sec Loss 5.8094 LearningRate 0.0195 Epoch: 11 Global Step: 463400 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:48:20,928-Speed 2627.19 samples/sec Loss 5.9763 LearningRate 0.0195 Epoch: 11 Global Step: 463410 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:48:24,825-Speed 2628.57 samples/sec Loss 5.8682 LearningRate 0.0195 Epoch: 11 Global Step: 463420 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:48:28,722-Speed 2628.25 samples/sec Loss 5.8903 LearningRate 0.0195 Epoch: 11 Global Step: 463430 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:48:32,617-Speed 2629.27 samples/sec Loss 5.9091 LearningRate 0.0195 Epoch: 11 Global Step: 463440 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:48:36,530-Speed 2617.34 samples/sec Loss 5.8556 LearningRate 0.0195 Epoch: 11 Global Step: 463450 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:48:40,429-Speed 2627.57 samples/sec Loss 5.9089 LearningRate 0.0195 Epoch: 11 Global Step: 463460 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:48:44,327-Speed 2628.02 samples/sec Loss 5.9739 LearningRate 0.0195 Epoch: 11 Global Step: 463470 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:48:48,217-Speed 2632.91 samples/sec Loss 5.9126 LearningRate 0.0195 Epoch: 11 Global Step: 463480 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:48:52,114-Speed 2628.72 samples/sec Loss 5.7536 LearningRate 0.0195 Epoch: 11 Global Step: 463490 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:48:56,006-Speed 2631.31 samples/sec Loss 5.7606 LearningRate 0.0195 Epoch: 11 Global Step: 463500 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:48:59,901-Speed 2629.02 samples/sec Loss 5.9351 LearningRate 0.0195 Epoch: 11 Global Step: 463510 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:49:03,797-Speed 2629.17 samples/sec Loss 5.8401 LearningRate 0.0195 Epoch: 11 Global Step: 463520 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:49:07,695-Speed 2627.63 samples/sec Loss 5.9618 LearningRate 0.0195 Epoch: 11 Global Step: 463530 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:49:11,596-Speed 2625.77 samples/sec Loss 5.9183 LearningRate 0.0195 Epoch: 11 Global Step: 463540 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:49:15,493-Speed 2628.35 samples/sec Loss 5.9384 LearningRate 0.0195 Epoch: 11 Global Step: 463550 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:49:19,391-Speed 2627.99 samples/sec Loss 5.9218 LearningRate 0.0195 Epoch: 11 Global Step: 463560 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:49:23,289-Speed 2627.20 samples/sec Loss 6.0049 LearningRate 0.0195 Epoch: 11 Global Step: 463570 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:49:27,192-Speed 2624.75 samples/sec Loss 5.9497 LearningRate 0.0195 Epoch: 11 Global Step: 463580 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:49:31,087-Speed 2629.40 samples/sec Loss 5.9196 LearningRate 0.0195 Epoch: 11 Global Step: 463590 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:49:34,987-Speed 2626.11 samples/sec Loss 5.8782 LearningRate 0.0195 Epoch: 11 Global Step: 463600 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:49:38,881-Speed 2629.98 samples/sec Loss 5.8473 LearningRate 0.0195 Epoch: 11 Global Step: 463610 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:49:42,764-Speed 2638.38 samples/sec Loss 5.9439 LearningRate 0.0195 Epoch: 11 Global Step: 463620 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:49:46,657-Speed 2631.31 samples/sec Loss 5.8613 LearningRate 0.0195 Epoch: 11 Global Step: 463630 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:49:50,550-Speed 2631.57 samples/sec Loss 5.9220 LearningRate 0.0195 Epoch: 11 Global Step: 463640 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:49:54,459-Speed 2619.57 samples/sec Loss 5.8675 LearningRate 0.0195 Epoch: 11 Global Step: 463650 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:49:58,357-Speed 2628.47 samples/sec Loss 6.0114 LearningRate 0.0195 Epoch: 11 Global Step: 463660 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:50:02,236-Speed 2640.48 samples/sec Loss 5.8442 LearningRate 0.0195 Epoch: 11 Global Step: 463670 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:50:06,132-Speed 2628.79 samples/sec Loss 5.8295 LearningRate 0.0195 Epoch: 11 Global Step: 463680 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:50:10,030-Speed 2626.92 samples/sec Loss 5.8948 LearningRate 0.0195 Epoch: 11 Global Step: 463690 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:50:13,923-Speed 2631.15 samples/sec Loss 5.8156 LearningRate 0.0195 Epoch: 11 Global Step: 463700 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:50:17,827-Speed 2624.20 samples/sec Loss 5.9570 LearningRate 0.0195 Epoch: 11 Global Step: 463710 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:50:21,700-Speed 2644.85 samples/sec Loss 5.9202 LearningRate 0.0194 Epoch: 11 Global Step: 463720 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:50:25,593-Speed 2630.73 samples/sec Loss 6.0038 LearningRate 0.0194 Epoch: 11 Global Step: 463730 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:50:29,495-Speed 2625.88 samples/sec Loss 5.8436 LearningRate 0.0194 Epoch: 11 Global Step: 463740 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:50:33,408-Speed 2616.84 samples/sec Loss 5.8631 LearningRate 0.0194 Epoch: 11 Global Step: 463750 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:50:37,317-Speed 2620.37 samples/sec Loss 5.8936 LearningRate 0.0194 Epoch: 11 Global Step: 463760 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:50:41,217-Speed 2625.99 samples/sec Loss 5.8314 LearningRate 0.0194 Epoch: 11 Global Step: 463770 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:50:45,120-Speed 2625.08 samples/sec Loss 5.9038 LearningRate 0.0194 Epoch: 11 Global Step: 463780 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:50:49,022-Speed 2624.26 samples/sec Loss 5.9874 LearningRate 0.0194 Epoch: 11 Global Step: 463790 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:50:52,933-Speed 2619.17 samples/sec Loss 5.9862 LearningRate 0.0194 Epoch: 11 Global Step: 463800 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:50:56,840-Speed 2622.05 samples/sec Loss 5.7647 LearningRate 0.0194 Epoch: 11 Global Step: 463810 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:51:00,733-Speed 2630.98 samples/sec Loss 5.9611 LearningRate 0.0194 Epoch: 11 Global Step: 463820 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:51:04,643-Speed 2619.46 samples/sec Loss 5.8482 LearningRate 0.0194 Epoch: 11 Global Step: 463830 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:51:08,647-Speed 2558.16 samples/sec Loss 5.8370 LearningRate 0.0194 Epoch: 11 Global Step: 463840 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:51:12,680-Speed 2539.45 samples/sec Loss 5.9984 LearningRate 0.0194 Epoch: 11 Global Step: 463850 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:51:16,579-Speed 2626.74 samples/sec Loss 5.6758 LearningRate 0.0194 Epoch: 11 Global Step: 463860 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:51:20,479-Speed 2626.95 samples/sec Loss 5.9567 LearningRate 0.0194 Epoch: 11 Global Step: 463870 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:51:24,380-Speed 2625.43 samples/sec Loss 5.8163 LearningRate 0.0194 Epoch: 11 Global Step: 463880 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:51:28,283-Speed 2625.96 samples/sec Loss 5.7364 LearningRate 0.0194 Epoch: 11 Global Step: 463890 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:51:32,181-Speed 2627.55 samples/sec Loss 5.8370 LearningRate 0.0194 Epoch: 11 Global Step: 463900 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:51:36,268-Speed 2505.65 samples/sec Loss 5.9062 LearningRate 0.0194 Epoch: 11 Global Step: 463910 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:51:40,356-Speed 2505.48 samples/sec Loss 5.8421 LearningRate 0.0194 Epoch: 11 Global Step: 463920 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:51:44,466-Speed 2492.42 samples/sec Loss 5.9053 LearningRate 0.0194 Epoch: 11 Global Step: 463930 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:51:48,562-Speed 2500.65 samples/sec Loss 5.9214 LearningRate 0.0194 Epoch: 11 Global Step: 463940 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:51:52,582-Speed 2547.95 samples/sec Loss 5.7989 LearningRate 0.0194 Epoch: 11 Global Step: 463950 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:51:56,474-Speed 2632.17 samples/sec Loss 5.9088 LearningRate 0.0194 Epoch: 11 Global Step: 463960 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:52:00,367-Speed 2631.27 samples/sec Loss 5.8445 LearningRate 0.0194 Epoch: 11 Global Step: 463970 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:52:04,269-Speed 2624.13 samples/sec Loss 5.9014 LearningRate 0.0194 Epoch: 11 Global Step: 463980 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:52:08,163-Speed 2630.72 samples/sec Loss 5.7389 LearningRate 0.0194 Epoch: 11 Global Step: 463990 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:52:12,062-Speed 2626.74 samples/sec Loss 6.0084 LearningRate 0.0194 Epoch: 11 Global Step: 464000 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:52:15,975-Speed 2617.63 samples/sec Loss 5.9422 LearningRate 0.0194 Epoch: 11 Global Step: 464010 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:52:19,853-Speed 2641.83 samples/sec Loss 5.8291 LearningRate 0.0194 Epoch: 11 Global Step: 464020 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:52:23,751-Speed 2627.03 samples/sec Loss 5.8887 LearningRate 0.0194 Epoch: 11 Global Step: 464030 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:52:27,650-Speed 2626.76 samples/sec Loss 5.8983 LearningRate 0.0194 Epoch: 11 Global Step: 464040 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:52:31,548-Speed 2627.47 samples/sec Loss 5.8937 LearningRate 0.0194 Epoch: 11 Global Step: 464050 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:52:35,461-Speed 2618.32 samples/sec Loss 5.9594 LearningRate 0.0194 Epoch: 11 Global Step: 464060 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:52:39,355-Speed 2630.15 samples/sec Loss 5.8158 LearningRate 0.0194 Epoch: 11 Global Step: 464070 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:52:43,249-Speed 2630.09 samples/sec Loss 5.7933 LearningRate 0.0194 Epoch: 11 Global Step: 464080 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:52:47,351-Speed 2497.18 samples/sec Loss 5.9463 LearningRate 0.0194 Epoch: 11 Global Step: 464090 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:52:51,290-Speed 2600.21 samples/sec Loss 5.9510 LearningRate 0.0194 Epoch: 11 Global Step: 464100 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:52:55,213-Speed 2611.19 samples/sec Loss 5.9042 LearningRate 0.0194 Epoch: 11 Global Step: 464110 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:52:59,117-Speed 2623.52 samples/sec Loss 5.9158 LearningRate 0.0194 Epoch: 11 Global Step: 464120 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:53:03,015-Speed 2627.98 samples/sec Loss 5.8563 LearningRate 0.0194 Epoch: 11 Global Step: 464130 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:53:06,909-Speed 2630.03 samples/sec Loss 5.8467 LearningRate 0.0194 Epoch: 11 Global Step: 464140 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:53:10,816-Speed 2622.06 samples/sec Loss 5.7826 LearningRate 0.0194 Epoch: 11 Global Step: 464150 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:53:14,710-Speed 2630.16 samples/sec Loss 5.8728 LearningRate 0.0194 Epoch: 11 Global Step: 464160 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:53:18,606-Speed 2628.81 samples/sec Loss 5.9162 LearningRate 0.0194 Epoch: 11 Global Step: 464170 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:53:22,505-Speed 2627.22 samples/sec Loss 5.9538 LearningRate 0.0194 Epoch: 11 Global Step: 464180 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:53:26,401-Speed 2629.01 samples/sec Loss 5.8814 LearningRate 0.0194 Epoch: 11 Global Step: 464190 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:53:30,291-Speed 2632.99 samples/sec Loss 5.8309 LearningRate 0.0194 Epoch: 11 Global Step: 464200 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:53:34,188-Speed 2628.49 samples/sec Loss 5.8613 LearningRate 0.0194 Epoch: 11 Global Step: 464210 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:53:38,157-Speed 2580.67 samples/sec Loss 5.8193 LearningRate 0.0194 Epoch: 11 Global Step: 464220 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:53:42,069-Speed 2618.03 samples/sec Loss 6.0135 LearningRate 0.0194 Epoch: 11 Global Step: 464230 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:53:45,969-Speed 2626.35 samples/sec Loss 5.9264 LearningRate 0.0194 Epoch: 11 Global Step: 464240 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:53:49,846-Speed 2642.27 samples/sec Loss 5.9127 LearningRate 0.0194 Epoch: 11 Global Step: 464250 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:53:53,740-Speed 2630.34 samples/sec Loss 5.8683 LearningRate 0.0194 Epoch: 11 Global Step: 464260 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:53:57,639-Speed 2626.94 samples/sec Loss 5.9841 LearningRate 0.0194 Epoch: 11 Global Step: 464270 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:54:01,544-Speed 2622.78 samples/sec Loss 5.9124 LearningRate 0.0194 Epoch: 11 Global Step: 464280 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:54:05,446-Speed 2624.77 samples/sec Loss 5.8264 LearningRate 0.0194 Epoch: 11 Global Step: 464290 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:54:09,343-Speed 2628.56 samples/sec Loss 5.8633 LearningRate 0.0194 Epoch: 11 Global Step: 464300 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:54:13,241-Speed 2627.78 samples/sec Loss 5.8139 LearningRate 0.0194 Epoch: 11 Global Step: 464310 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:54:17,150-Speed 2620.42 samples/sec Loss 6.0218 LearningRate 0.0194 Epoch: 11 Global Step: 464320 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:54:21,046-Speed 2628.65 samples/sec Loss 5.9854 LearningRate 0.0194 Epoch: 11 Global Step: 464330 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:54:24,958-Speed 2618.94 samples/sec Loss 5.9412 LearningRate 0.0194 Epoch: 11 Global Step: 464340 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:54:28,832-Speed 2643.60 samples/sec Loss 5.9346 LearningRate 0.0194 Epoch: 11 Global Step: 464350 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:54:32,733-Speed 2625.71 samples/sec Loss 5.8150 LearningRate 0.0194 Epoch: 11 Global Step: 464360 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:54:36,630-Speed 2628.20 samples/sec Loss 5.8272 LearningRate 0.0194 Epoch: 11 Global Step: 464370 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:54:40,525-Speed 2629.90 samples/sec Loss 5.8440 LearningRate 0.0194 Epoch: 11 Global Step: 464380 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:54:44,428-Speed 2623.98 samples/sec Loss 5.8381 LearningRate 0.0194 Epoch: 11 Global Step: 464390 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:54:48,322-Speed 2630.97 samples/sec Loss 5.9307 LearningRate 0.0194 Epoch: 11 Global Step: 464400 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:54:52,222-Speed 2625.69 samples/sec Loss 5.9025 LearningRate 0.0194 Epoch: 11 Global Step: 464410 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:54:56,115-Speed 2631.39 samples/sec Loss 5.8399 LearningRate 0.0194 Epoch: 11 Global Step: 464420 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:55:00,022-Speed 2621.44 samples/sec Loss 5.7867 LearningRate 0.0194 Epoch: 11 Global Step: 464430 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:55:03,931-Speed 2619.90 samples/sec Loss 5.8762 LearningRate 0.0194 Epoch: 11 Global Step: 464440 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:55:07,829-Speed 2628.05 samples/sec Loss 6.0031 LearningRate 0.0194 Epoch: 11 Global Step: 464450 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:55:11,723-Speed 2630.47 samples/sec Loss 5.9595 LearningRate 0.0194 Epoch: 11 Global Step: 464460 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:55:15,616-Speed 2631.27 samples/sec Loss 5.8583 LearningRate 0.0194 Epoch: 11 Global Step: 464470 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:55:19,511-Speed 2629.88 samples/sec Loss 5.8711 LearningRate 0.0194 Epoch: 11 Global Step: 464480 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:55:23,447-Speed 2602.35 samples/sec Loss 5.9422 LearningRate 0.0194 Epoch: 11 Global Step: 464490 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:55:27,341-Speed 2630.51 samples/sec Loss 5.9084 LearningRate 0.0194 Epoch: 11 Global Step: 464500 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:55:31,240-Speed 2627.00 samples/sec Loss 6.0143 LearningRate 0.0194 Epoch: 11 Global Step: 464510 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:55:35,149-Speed 2619.92 samples/sec Loss 5.9406 LearningRate 0.0194 Epoch: 11 Global Step: 464520 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:55:39,048-Speed 2627.45 samples/sec Loss 5.9102 LearningRate 0.0194 Epoch: 11 Global Step: 464530 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:55:42,946-Speed 2627.44 samples/sec Loss 5.8141 LearningRate 0.0194 Epoch: 11 Global Step: 464540 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:55:46,849-Speed 2624.12 samples/sec Loss 5.9613 LearningRate 0.0194 Epoch: 11 Global Step: 464550 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:55:50,754-Speed 2623.14 samples/sec Loss 5.9711 LearningRate 0.0194 Epoch: 11 Global Step: 464560 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:55:54,647-Speed 2630.72 samples/sec Loss 5.8618 LearningRate 0.0194 Epoch: 11 Global Step: 464570 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:55:58,540-Speed 2631.00 samples/sec Loss 5.8655 LearningRate 0.0194 Epoch: 11 Global Step: 464580 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:56:02,450-Speed 2619.78 samples/sec Loss 5.8667 LearningRate 0.0194 Epoch: 11 Global Step: 464590 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:56:06,344-Speed 2629.96 samples/sec Loss 5.9328 LearningRate 0.0194 Epoch: 11 Global Step: 464600 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:56:10,239-Speed 2629.90 samples/sec Loss 5.8584 LearningRate 0.0194 Epoch: 11 Global Step: 464610 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:56:14,139-Speed 2626.46 samples/sec Loss 5.8004 LearningRate 0.0194 Epoch: 11 Global Step: 464620 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:56:18,048-Speed 2619.89 samples/sec Loss 5.9426 LearningRate 0.0194 Epoch: 11 Global Step: 464630 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:56:21,943-Speed 2629.90 samples/sec Loss 5.9271 LearningRate 0.0194 Epoch: 11 Global Step: 464640 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:56:25,824-Speed 2638.94 samples/sec Loss 5.8794 LearningRate 0.0194 Epoch: 11 Global Step: 464650 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:56:29,717-Speed 2630.73 samples/sec Loss 5.8550 LearningRate 0.0193 Epoch: 11 Global Step: 464660 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:56:33,617-Speed 2626.63 samples/sec Loss 5.9286 LearningRate 0.0193 Epoch: 11 Global Step: 464670 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:56:37,519-Speed 2625.20 samples/sec Loss 5.8255 LearningRate 0.0193 Epoch: 11 Global Step: 464680 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:56:41,423-Speed 2623.18 samples/sec Loss 5.7912 LearningRate 0.0193 Epoch: 11 Global Step: 464690 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:56:45,302-Speed 2640.37 samples/sec Loss 5.8215 LearningRate 0.0193 Epoch: 11 Global Step: 464700 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:56:49,198-Speed 2629.38 samples/sec Loss 5.9216 LearningRate 0.0193 Epoch: 11 Global Step: 464710 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:56:53,091-Speed 2631.12 samples/sec Loss 5.9168 LearningRate 0.0193 Epoch: 11 Global Step: 464720 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:56:56,988-Speed 2628.55 samples/sec Loss 5.9322 LearningRate 0.0193 Epoch: 11 Global Step: 464730 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:57:00,930-Speed 2598.63 samples/sec Loss 5.9346 LearningRate 0.0193 Epoch: 11 Global Step: 464740 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:57:04,825-Speed 2629.74 samples/sec Loss 5.8323 LearningRate 0.0193 Epoch: 11 Global Step: 464750 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:57:08,725-Speed 2625.59 samples/sec Loss 5.8351 LearningRate 0.0193 Epoch: 11 Global Step: 464760 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:57:12,621-Speed 2628.77 samples/sec Loss 5.7727 LearningRate 0.0193 Epoch: 11 Global Step: 464770 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:57:16,525-Speed 2624.66 samples/sec Loss 5.9337 LearningRate 0.0193 Epoch: 11 Global Step: 464780 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:57:20,439-Speed 2616.61 samples/sec Loss 5.8110 LearningRate 0.0193 Epoch: 11 Global Step: 464790 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:57:24,341-Speed 2624.66 samples/sec Loss 5.9713 LearningRate 0.0193 Epoch: 11 Global Step: 464800 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:57:28,239-Speed 2628.19 samples/sec Loss 5.8731 LearningRate 0.0193 Epoch: 11 Global Step: 464810 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:57:32,168-Speed 2606.95 samples/sec Loss 5.9049 LearningRate 0.0193 Epoch: 11 Global Step: 464820 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:57:36,084-Speed 2615.14 samples/sec Loss 5.9167 LearningRate 0.0193 Epoch: 11 Global Step: 464830 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:57:39,982-Speed 2628.18 samples/sec Loss 5.8946 LearningRate 0.0193 Epoch: 11 Global Step: 464840 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:57:43,878-Speed 2629.46 samples/sec Loss 5.8332 LearningRate 0.0193 Epoch: 11 Global Step: 464850 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:57:47,762-Speed 2636.36 samples/sec Loss 5.8308 LearningRate 0.0193 Epoch: 11 Global Step: 464860 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:57:51,658-Speed 2629.26 samples/sec Loss 5.8375 LearningRate 0.0193 Epoch: 11 Global Step: 464870 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:57:55,556-Speed 2627.23 samples/sec Loss 5.7989 LearningRate 0.0193 Epoch: 11 Global Step: 464880 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:57:59,451-Speed 2630.32 samples/sec Loss 5.9440 LearningRate 0.0193 Epoch: 11 Global Step: 464890 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:58:03,352-Speed 2625.59 samples/sec Loss 5.9054 LearningRate 0.0193 Epoch: 11 Global Step: 464900 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:58:07,270-Speed 2613.88 samples/sec Loss 5.8589 LearningRate 0.0193 Epoch: 11 Global Step: 464910 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:58:11,156-Speed 2635.77 samples/sec Loss 5.8478 LearningRate 0.0193 Epoch: 11 Global Step: 464920 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:58:15,047-Speed 2632.93 samples/sec Loss 5.7868 LearningRate 0.0193 Epoch: 11 Global Step: 464930 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:58:18,948-Speed 2625.31 samples/sec Loss 5.8694 LearningRate 0.0193 Epoch: 11 Global Step: 464940 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:58:22,853-Speed 2622.58 samples/sec Loss 5.8205 LearningRate 0.0193 Epoch: 11 Global Step: 464950 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:58:26,757-Speed 2623.86 samples/sec Loss 5.8867 LearningRate 0.0193 Epoch: 11 Global Step: 464960 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:58:30,656-Speed 2627.79 samples/sec Loss 5.8916 LearningRate 0.0193 Epoch: 11 Global Step: 464970 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:58:34,552-Speed 2628.43 samples/sec Loss 5.9223 LearningRate 0.0193 Epoch: 11 Global Step: 464980 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:58:38,448-Speed 2628.62 samples/sec Loss 5.8233 LearningRate 0.0193 Epoch: 11 Global Step: 464990 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:58:42,349-Speed 2625.74 samples/sec Loss 5.8680 LearningRate 0.0193 Epoch: 11 Global Step: 465000 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:58:46,241-Speed 2632.44 samples/sec Loss 5.8277 LearningRate 0.0193 Epoch: 11 Global Step: 465010 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-14 23:58:50,137-Speed 2628.93 samples/sec Loss 5.9249 LearningRate 0.0193 Epoch: 11 Global Step: 465020 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:58:54,033-Speed 2628.62 samples/sec Loss 5.8386 LearningRate 0.0193 Epoch: 11 Global Step: 465030 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:58:57,948-Speed 2616.35 samples/sec Loss 5.9048 LearningRate 0.0193 Epoch: 11 Global Step: 465040 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:59:01,845-Speed 2628.17 samples/sec Loss 5.7596 LearningRate 0.0193 Epoch: 11 Global Step: 465050 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:59:05,743-Speed 2627.83 samples/sec Loss 5.9459 LearningRate 0.0193 Epoch: 11 Global Step: 465060 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:59:09,635-Speed 2631.66 samples/sec Loss 5.8753 LearningRate 0.0193 Epoch: 11 Global Step: 465070 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:59:13,530-Speed 2629.47 samples/sec Loss 5.9007 LearningRate 0.0193 Epoch: 11 Global Step: 465080 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:59:17,430-Speed 2626.89 samples/sec Loss 5.8956 LearningRate 0.0193 Epoch: 11 Global Step: 465090 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:59:21,327-Speed 2628.13 samples/sec Loss 5.7961 LearningRate 0.0193 Epoch: 11 Global Step: 465100 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:59:25,322-Speed 2563.62 samples/sec Loss 5.8735 LearningRate 0.0193 Epoch: 11 Global Step: 465110 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-14 23:59:29,236-Speed 2616.99 samples/sec Loss 5.8749 LearningRate 0.0193 Epoch: 11 Global Step: 465120 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:59:33,139-Speed 2623.80 samples/sec Loss 5.8693 LearningRate 0.0193 Epoch: 11 Global Step: 465130 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:59:37,045-Speed 2622.83 samples/sec Loss 5.8816 LearningRate 0.0193 Epoch: 11 Global Step: 465140 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:59:40,951-Speed 2622.16 samples/sec Loss 5.8571 LearningRate 0.0193 Epoch: 11 Global Step: 465150 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:59:44,850-Speed 2626.60 samples/sec Loss 5.8549 LearningRate 0.0193 Epoch: 11 Global Step: 465160 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:59:48,756-Speed 2622.16 samples/sec Loss 5.8646 LearningRate 0.0193 Epoch: 11 Global Step: 465170 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:59:52,669-Speed 2617.25 samples/sec Loss 5.9474 LearningRate 0.0193 Epoch: 11 Global Step: 465180 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-14 23:59:56,580-Speed 2619.07 samples/sec Loss 5.9267 LearningRate 0.0193 Epoch: 11 Global Step: 465190 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:00:00,479-Speed 2627.10 samples/sec Loss 5.9563 LearningRate 0.0193 Epoch: 11 Global Step: 465200 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:00:04,387-Speed 2620.61 samples/sec Loss 5.8386 LearningRate 0.0193 Epoch: 11 Global Step: 465210 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:00:08,265-Speed 2641.20 samples/sec Loss 5.7986 LearningRate 0.0193 Epoch: 11 Global Step: 465220 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:00:12,175-Speed 2619.93 samples/sec Loss 5.9103 LearningRate 0.0193 Epoch: 11 Global Step: 465230 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:00:16,075-Speed 2627.05 samples/sec Loss 5.9014 LearningRate 0.0193 Epoch: 11 Global Step: 465240 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:00:19,953-Speed 2641.03 samples/sec Loss 5.9080 LearningRate 0.0193 Epoch: 11 Global Step: 465250 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:00:23,847-Speed 2630.68 samples/sec Loss 5.8941 LearningRate 0.0193 Epoch: 11 Global Step: 465260 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:00:27,741-Speed 2629.90 samples/sec Loss 5.8410 LearningRate 0.0193 Epoch: 11 Global Step: 465270 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:00:31,644-Speed 2624.16 samples/sec Loss 5.8693 LearningRate 0.0193 Epoch: 11 Global Step: 465280 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:00:35,545-Speed 2625.60 samples/sec Loss 5.8883 LearningRate 0.0193 Epoch: 11 Global Step: 465290 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:00:39,438-Speed 2630.81 samples/sec Loss 5.8958 LearningRate 0.0193 Epoch: 11 Global Step: 465300 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:00:43,337-Speed 2627.22 samples/sec Loss 5.8979 LearningRate 0.0193 Epoch: 11 Global Step: 465310 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:00:47,232-Speed 2629.68 samples/sec Loss 5.8278 LearningRate 0.0193 Epoch: 11 Global Step: 465320 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:00:51,142-Speed 2619.67 samples/sec Loss 5.8613 LearningRate 0.0193 Epoch: 11 Global Step: 465330 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:00:55,039-Speed 2628.91 samples/sec Loss 5.9250 LearningRate 0.0193 Epoch: 11 Global Step: 465340 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:00:58,937-Speed 2627.07 samples/sec Loss 5.8124 LearningRate 0.0193 Epoch: 11 Global Step: 465350 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:01:02,833-Speed 2628.89 samples/sec Loss 5.7842 LearningRate 0.0193 Epoch: 11 Global Step: 465360 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:01:06,734-Speed 2625.86 samples/sec Loss 5.8258 LearningRate 0.0193 Epoch: 11 Global Step: 465370 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:01:10,614-Speed 2640.30 samples/sec Loss 5.9470 LearningRate 0.0193 Epoch: 11 Global Step: 465380 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:01:14,510-Speed 2628.54 samples/sec Loss 5.8168 LearningRate 0.0193 Epoch: 11 Global Step: 465390 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:01:18,434-Speed 2610.01 samples/sec Loss 5.8918 LearningRate 0.0193 Epoch: 11 Global Step: 465400 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:01:22,335-Speed 2626.07 samples/sec Loss 5.8414 LearningRate 0.0193 Epoch: 11 Global Step: 465410 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:01:26,229-Speed 2630.21 samples/sec Loss 5.7874 LearningRate 0.0193 Epoch: 11 Global Step: 465420 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:01:30,122-Speed 2631.31 samples/sec Loss 5.8621 LearningRate 0.0193 Epoch: 11 Global Step: 465430 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:01:34,019-Speed 2628.00 samples/sec Loss 5.8394 LearningRate 0.0193 Epoch: 11 Global Step: 465440 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:01:37,917-Speed 2627.12 samples/sec Loss 5.8206 LearningRate 0.0193 Epoch: 11 Global Step: 465450 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:01:41,813-Speed 2629.26 samples/sec Loss 5.9372 LearningRate 0.0193 Epoch: 11 Global Step: 465460 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:01:45,709-Speed 2629.63 samples/sec Loss 5.8425 LearningRate 0.0193 Epoch: 11 Global Step: 465470 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:01:49,608-Speed 2626.86 samples/sec Loss 5.8788 LearningRate 0.0193 Epoch: 11 Global Step: 465480 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:01:53,518-Speed 2619.73 samples/sec Loss 5.7907 LearningRate 0.0193 Epoch: 11 Global Step: 465490 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:01:57,434-Speed 2615.49 samples/sec Loss 5.7940 LearningRate 0.0193 Epoch: 11 Global Step: 465500 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:02:01,332-Speed 2628.43 samples/sec Loss 5.8060 LearningRate 0.0193 Epoch: 11 Global Step: 465510 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:02:05,247-Speed 2616.07 samples/sec Loss 5.8710 LearningRate 0.0193 Epoch: 11 Global Step: 465520 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:02:09,342-Speed 2500.77 samples/sec Loss 5.7640 LearningRate 0.0193 Epoch: 11 Global Step: 465530 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:02:13,356-Speed 2551.60 samples/sec Loss 5.7999 LearningRate 0.0193 Epoch: 11 Global Step: 465540 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:02:17,254-Speed 2628.27 samples/sec Loss 5.8919 LearningRate 0.0193 Epoch: 11 Global Step: 465550 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:02:21,158-Speed 2624.06 samples/sec Loss 5.9388 LearningRate 0.0193 Epoch: 11 Global Step: 465560 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:02:25,052-Speed 2629.61 samples/sec Loss 5.8127 LearningRate 0.0193 Epoch: 11 Global Step: 465570 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:02:28,927-Speed 2643.49 samples/sec Loss 5.8860 LearningRate 0.0193 Epoch: 11 Global Step: 465580 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:02:32,920-Speed 2564.74 samples/sec Loss 5.7858 LearningRate 0.0193 Epoch: 11 Global Step: 465590 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:02:36,836-Speed 2615.52 samples/sec Loss 5.8758 LearningRate 0.0193 Epoch: 11 Global Step: 465600 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:02:40,743-Speed 2621.86 samples/sec Loss 5.8856 LearningRate 0.0192 Epoch: 11 Global Step: 465610 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:02:44,638-Speed 2629.58 samples/sec Loss 5.9108 LearningRate 0.0192 Epoch: 11 Global Step: 465620 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:02:48,543-Speed 2623.33 samples/sec Loss 5.9027 LearningRate 0.0192 Epoch: 11 Global Step: 465630 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:02:52,440-Speed 2628.13 samples/sec Loss 5.9591 LearningRate 0.0192 Epoch: 11 Global Step: 465640 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:02:56,338-Speed 2627.29 samples/sec Loss 5.7317 LearningRate 0.0192 Epoch: 11 Global Step: 465650 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:03:00,278-Speed 2599.78 samples/sec Loss 5.9258 LearningRate 0.0192 Epoch: 11 Global Step: 465660 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:03:04,291-Speed 2552.75 samples/sec Loss 5.7907 LearningRate 0.0192 Epoch: 11 Global Step: 465670 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:03:08,267-Speed 2576.18 samples/sec Loss 5.7463 LearningRate 0.0192 Epoch: 11 Global Step: 465680 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:03:12,160-Speed 2631.02 samples/sec Loss 5.8283 LearningRate 0.0192 Epoch: 11 Global Step: 465690 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:03:16,068-Speed 2621.21 samples/sec Loss 5.8142 LearningRate 0.0192 Epoch: 11 Global Step: 465700 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:03:19,963-Speed 2629.40 samples/sec Loss 5.8976 LearningRate 0.0192 Epoch: 11 Global Step: 465710 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:03:23,918-Speed 2590.00 samples/sec Loss 5.7960 LearningRate 0.0192 Epoch: 11 Global Step: 465720 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:03:27,817-Speed 2627.06 samples/sec Loss 5.8588 LearningRate 0.0192 Epoch: 11 Global Step: 465730 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:03:31,710-Speed 2631.12 samples/sec Loss 5.7568 LearningRate 0.0192 Epoch: 11 Global Step: 465740 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:03:35,605-Speed 2629.57 samples/sec Loss 5.9215 LearningRate 0.0192 Epoch: 11 Global Step: 465750 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:03:39,499-Speed 2629.94 samples/sec Loss 5.8608 LearningRate 0.0192 Epoch: 11 Global Step: 465760 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:03:43,395-Speed 2628.83 samples/sec Loss 6.0430 LearningRate 0.0192 Epoch: 11 Global Step: 465770 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:03:47,270-Speed 2643.95 samples/sec Loss 5.8602 LearningRate 0.0192 Epoch: 11 Global Step: 465780 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:03:51,174-Speed 2624.39 samples/sec Loss 5.8316 LearningRate 0.0192 Epoch: 11 Global Step: 465790 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:03:55,104-Speed 2606.07 samples/sec Loss 5.7807 LearningRate 0.0192 Epoch: 11 Global Step: 465800 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:03:58,997-Speed 2631.58 samples/sec Loss 5.9315 LearningRate 0.0192 Epoch: 11 Global Step: 465810 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:04:02,894-Speed 2628.10 samples/sec Loss 5.7883 LearningRate 0.0192 Epoch: 11 Global Step: 465820 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:04:06,804-Speed 2619.64 samples/sec Loss 5.8485 LearningRate 0.0192 Epoch: 11 Global Step: 465830 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:04:10,700-Speed 2628.98 samples/sec Loss 5.8715 LearningRate 0.0192 Epoch: 11 Global Step: 465840 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:04:14,593-Speed 2631.33 samples/sec Loss 5.7634 LearningRate 0.0192 Epoch: 11 Global Step: 465850 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:04:18,488-Speed 2629.92 samples/sec Loss 5.8902 LearningRate 0.0192 Epoch: 11 Global Step: 465860 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:04:22,386-Speed 2627.77 samples/sec Loss 5.7713 LearningRate 0.0192 Epoch: 11 Global Step: 465870 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:04:26,286-Speed 2626.32 samples/sec Loss 6.0045 LearningRate 0.0192 Epoch: 11 Global Step: 465880 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:04:30,185-Speed 2627.00 samples/sec Loss 5.8093 LearningRate 0.0192 Epoch: 11 Global Step: 465890 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:04:34,080-Speed 2629.39 samples/sec Loss 5.9084 LearningRate 0.0192 Epoch: 11 Global Step: 465900 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:04:37,976-Speed 2629.00 samples/sec Loss 5.9129 LearningRate 0.0192 Epoch: 11 Global Step: 465910 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:04:41,867-Speed 2632.25 samples/sec Loss 5.9294 LearningRate 0.0192 Epoch: 11 Global Step: 465920 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:04:45,758-Speed 2632.49 samples/sec Loss 5.9119 LearningRate 0.0192 Epoch: 11 Global Step: 465930 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:04:49,668-Speed 2619.34 samples/sec Loss 5.8328 LearningRate 0.0192 Epoch: 11 Global Step: 465940 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:04:53,563-Speed 2630.20 samples/sec Loss 5.8349 LearningRate 0.0192 Epoch: 11 Global Step: 465950 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:04:57,463-Speed 2626.16 samples/sec Loss 5.8940 LearningRate 0.0192 Epoch: 11 Global Step: 465960 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:05:01,357-Speed 2630.25 samples/sec Loss 5.9531 LearningRate 0.0192 Epoch: 11 Global Step: 465970 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:05:05,250-Speed 2630.57 samples/sec Loss 5.8880 LearningRate 0.0192 Epoch: 11 Global Step: 465980 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:05:09,148-Speed 2628.01 samples/sec Loss 5.9143 LearningRate 0.0192 Epoch: 11 Global Step: 465990 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:05:13,048-Speed 2626.53 samples/sec Loss 5.9247 LearningRate 0.0192 Epoch: 11 Global Step: 466000 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:05:16,949-Speed 2625.64 samples/sec Loss 5.9199 LearningRate 0.0192 Epoch: 11 Global Step: 466010 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:05:20,824-Speed 2642.92 samples/sec Loss 5.7840 LearningRate 0.0192 Epoch: 11 Global Step: 466020 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:05:24,719-Speed 2630.13 samples/sec Loss 5.9327 LearningRate 0.0192 Epoch: 11 Global Step: 466030 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:05:28,614-Speed 2628.93 samples/sec Loss 5.7530 LearningRate 0.0192 Epoch: 11 Global Step: 466040 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:05:32,516-Speed 2624.67 samples/sec Loss 5.8366 LearningRate 0.0192 Epoch: 11 Global Step: 466050 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:05:36,413-Speed 2628.19 samples/sec Loss 5.9125 LearningRate 0.0192 Epoch: 11 Global Step: 466060 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:05:40,309-Speed 2629.84 samples/sec Loss 5.8104 LearningRate 0.0192 Epoch: 11 Global Step: 466070 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:05:44,212-Speed 2623.87 samples/sec Loss 5.7952 LearningRate 0.0192 Epoch: 11 Global Step: 466080 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:05:48,112-Speed 2626.91 samples/sec Loss 5.9779 LearningRate 0.0192 Epoch: 11 Global Step: 466090 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:05:52,019-Speed 2621.20 samples/sec Loss 5.7656 LearningRate 0.0192 Epoch: 11 Global Step: 466100 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:05:55,961-Speed 2598.92 samples/sec Loss 5.9494 LearningRate 0.0192 Epoch: 11 Global Step: 466110 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:05:59,852-Speed 2632.51 samples/sec Loss 5.9495 LearningRate 0.0192 Epoch: 11 Global Step: 466120 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:06:03,754-Speed 2624.78 samples/sec Loss 5.8447 LearningRate 0.0192 Epoch: 11 Global Step: 466130 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:06:07,659-Speed 2622.94 samples/sec Loss 5.8602 LearningRate 0.0192 Epoch: 11 Global Step: 466140 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:06:11,569-Speed 2619.84 samples/sec Loss 5.8673 LearningRate 0.0192 Epoch: 11 Global Step: 466150 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:06:15,465-Speed 2629.49 samples/sec Loss 5.9072 LearningRate 0.0192 Epoch: 11 Global Step: 466160 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:06:19,362-Speed 2628.57 samples/sec Loss 5.8074 LearningRate 0.0192 Epoch: 11 Global Step: 466170 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:06:23,236-Speed 2643.56 samples/sec Loss 5.9139 LearningRate 0.0192 Epoch: 11 Global Step: 466180 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:06:27,174-Speed 2601.23 samples/sec Loss 5.8930 LearningRate 0.0192 Epoch: 11 Global Step: 466190 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:06:31,073-Speed 2626.88 samples/sec Loss 5.8975 LearningRate 0.0192 Epoch: 11 Global Step: 466200 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:06:34,977-Speed 2623.50 samples/sec Loss 5.7210 LearningRate 0.0192 Epoch: 11 Global Step: 466210 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:06:38,876-Speed 2626.92 samples/sec Loss 5.8311 LearningRate 0.0192 Epoch: 11 Global Step: 466220 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:06:42,773-Speed 2628.44 samples/sec Loss 5.9492 LearningRate 0.0192 Epoch: 11 Global Step: 466230 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:06:46,668-Speed 2630.28 samples/sec Loss 5.9372 LearningRate 0.0192 Epoch: 11 Global Step: 466240 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:06:50,562-Speed 2630.49 samples/sec Loss 5.8831 LearningRate 0.0192 Epoch: 11 Global Step: 466250 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:06:54,455-Speed 2630.83 samples/sec Loss 5.8237 LearningRate 0.0192 Epoch: 11 Global Step: 466260 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:06:58,353-Speed 2627.69 samples/sec Loss 5.8554 LearningRate 0.0192 Epoch: 11 Global Step: 466270 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:07:02,266-Speed 2617.03 samples/sec Loss 5.8354 LearningRate 0.0192 Epoch: 11 Global Step: 466280 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:07:06,162-Speed 2629.59 samples/sec Loss 5.8738 LearningRate 0.0192 Epoch: 11 Global Step: 466290 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:07:10,061-Speed 2627.21 samples/sec Loss 5.7231 LearningRate 0.0192 Epoch: 11 Global Step: 466300 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:07:13,961-Speed 2626.09 samples/sec Loss 5.8410 LearningRate 0.0192 Epoch: 11 Global Step: 466310 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:07:17,897-Speed 2602.70 samples/sec Loss 5.8775 LearningRate 0.0192 Epoch: 11 Global Step: 466320 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:07:21,793-Speed 2628.73 samples/sec Loss 5.8307 LearningRate 0.0192 Epoch: 11 Global Step: 466330 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:07:25,690-Speed 2628.60 samples/sec Loss 5.7845 LearningRate 0.0192 Epoch: 11 Global Step: 466340 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:07:29,586-Speed 2629.06 samples/sec Loss 5.9363 LearningRate 0.0192 Epoch: 11 Global Step: 466350 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:07:33,483-Speed 2628.00 samples/sec Loss 5.7873 LearningRate 0.0192 Epoch: 11 Global Step: 466360 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:07:37,376-Speed 2630.96 samples/sec Loss 5.7985 LearningRate 0.0192 Epoch: 11 Global Step: 466370 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:07:41,321-Speed 2596.25 samples/sec Loss 5.9274 LearningRate 0.0192 Epoch: 11 Global Step: 466380 Fp16 Grad Scale: 262144 Required: 41 hours
Training: 2022-04-15 00:07:45,193-Speed 2646.04 samples/sec Loss 5.8088 LearningRate 0.0192 Epoch: 11 Global Step: 466390 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:07:49,111-Speed 2614.34 samples/sec Loss 5.8653 LearningRate 0.0192 Epoch: 11 Global Step: 466400 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:07:53,037-Speed 2608.84 samples/sec Loss 5.8567 LearningRate 0.0192 Epoch: 11 Global Step: 466410 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:07:56,933-Speed 2628.84 samples/sec Loss 5.7722 LearningRate 0.0192 Epoch: 11 Global Step: 466420 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:08:00,929-Speed 2563.88 samples/sec Loss 5.8226 LearningRate 0.0192 Epoch: 11 Global Step: 466430 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:08:04,825-Speed 2628.72 samples/sec Loss 5.8936 LearningRate 0.0192 Epoch: 11 Global Step: 466440 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:08:08,737-Speed 2618.60 samples/sec Loss 5.8259 LearningRate 0.0192 Epoch: 11 Global Step: 466450 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:08:12,631-Speed 2630.08 samples/sec Loss 5.8639 LearningRate 0.0192 Epoch: 11 Global Step: 466460 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:08:16,546-Speed 2616.68 samples/sec Loss 5.9458 LearningRate 0.0192 Epoch: 11 Global Step: 466470 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:08:20,445-Speed 2626.82 samples/sec Loss 5.8157 LearningRate 0.0192 Epoch: 11 Global Step: 466480 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:08:24,341-Speed 2628.93 samples/sec Loss 5.8959 LearningRate 0.0192 Epoch: 11 Global Step: 466490 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:08:28,223-Speed 2638.21 samples/sec Loss 5.8585 LearningRate 0.0192 Epoch: 11 Global Step: 466500 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:08:32,126-Speed 2624.55 samples/sec Loss 5.8325 LearningRate 0.0192 Epoch: 11 Global Step: 466510 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:08:36,029-Speed 2624.27 samples/sec Loss 5.9517 LearningRate 0.0192 Epoch: 11 Global Step: 466520 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:08:39,923-Speed 2630.62 samples/sec Loss 5.8751 LearningRate 0.0192 Epoch: 11 Global Step: 466530 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:08:43,815-Speed 2631.39 samples/sec Loss 5.9555 LearningRate 0.0192 Epoch: 11 Global Step: 466540 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:08:47,712-Speed 2628.22 samples/sec Loss 5.8323 LearningRate 0.0191 Epoch: 11 Global Step: 466550 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:08:51,612-Speed 2626.88 samples/sec Loss 5.7874 LearningRate 0.0191 Epoch: 11 Global Step: 466560 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:08:55,506-Speed 2630.01 samples/sec Loss 5.8350 LearningRate 0.0191 Epoch: 11 Global Step: 466570 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:08:59,405-Speed 2627.51 samples/sec Loss 5.7600 LearningRate 0.0191 Epoch: 11 Global Step: 466580 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:09:03,312-Speed 2620.94 samples/sec Loss 5.8858 LearningRate 0.0191 Epoch: 11 Global Step: 466590 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:09:07,237-Speed 2609.82 samples/sec Loss 5.8818 LearningRate 0.0191 Epoch: 11 Global Step: 466600 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:09:11,146-Speed 2619.74 samples/sec Loss 5.7480 LearningRate 0.0191 Epoch: 11 Global Step: 466610 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:09:15,041-Speed 2630.06 samples/sec Loss 5.8889 LearningRate 0.0191 Epoch: 11 Global Step: 466620 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:09:18,940-Speed 2627.19 samples/sec Loss 5.8104 LearningRate 0.0191 Epoch: 11 Global Step: 466630 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:09:22,835-Speed 2629.64 samples/sec Loss 5.8442 LearningRate 0.0191 Epoch: 11 Global Step: 466640 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:09:26,734-Speed 2626.78 samples/sec Loss 5.8551 LearningRate 0.0191 Epoch: 11 Global Step: 466650 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:09:30,638-Speed 2623.51 samples/sec Loss 5.8422 LearningRate 0.0191 Epoch: 11 Global Step: 466660 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:09:34,536-Speed 2627.55 samples/sec Loss 5.8501 LearningRate 0.0191 Epoch: 11 Global Step: 466670 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:09:38,560-Speed 2545.51 samples/sec Loss 5.9064 LearningRate 0.0191 Epoch: 11 Global Step: 466680 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:09:42,457-Speed 2627.97 samples/sec Loss 5.8215 LearningRate 0.0191 Epoch: 11 Global Step: 466690 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:09:46,358-Speed 2625.59 samples/sec Loss 5.7196 LearningRate 0.0191 Epoch: 11 Global Step: 466700 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:09:50,256-Speed 2627.58 samples/sec Loss 5.8721 LearningRate 0.0191 Epoch: 11 Global Step: 466710 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:09:54,164-Speed 2621.09 samples/sec Loss 5.7524 LearningRate 0.0191 Epoch: 11 Global Step: 466720 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:09:58,056-Speed 2632.15 samples/sec Loss 5.9093 LearningRate 0.0191 Epoch: 11 Global Step: 466730 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:10:01,952-Speed 2629.05 samples/sec Loss 5.8155 LearningRate 0.0191 Epoch: 11 Global Step: 466740 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:10:05,852-Speed 2625.98 samples/sec Loss 5.9161 LearningRate 0.0191 Epoch: 11 Global Step: 466750 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:10:09,745-Speed 2630.36 samples/sec Loss 5.9135 LearningRate 0.0191 Epoch: 11 Global Step: 466760 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:10:13,642-Speed 2628.20 samples/sec Loss 5.9158 LearningRate 0.0191 Epoch: 11 Global Step: 466770 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:10:17,543-Speed 2625.93 samples/sec Loss 5.7465 LearningRate 0.0191 Epoch: 11 Global Step: 466780 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:10:21,450-Speed 2622.04 samples/sec Loss 5.8435 LearningRate 0.0191 Epoch: 11 Global Step: 466790 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:10:25,357-Speed 2621.96 samples/sec Loss 5.7765 LearningRate 0.0191 Epoch: 11 Global Step: 466800 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:10:29,279-Speed 2611.33 samples/sec Loss 5.8342 LearningRate 0.0191 Epoch: 11 Global Step: 466810 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:10:33,186-Speed 2621.69 samples/sec Loss 5.7721 LearningRate 0.0191 Epoch: 11 Global Step: 466820 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:10:37,084-Speed 2628.18 samples/sec Loss 5.7901 LearningRate 0.0191 Epoch: 11 Global Step: 466830 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:10:40,980-Speed 2628.37 samples/sec Loss 5.8331 LearningRate 0.0191 Epoch: 11 Global Step: 466840 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:10:44,881-Speed 2625.52 samples/sec Loss 5.8790 LearningRate 0.0191 Epoch: 11 Global Step: 466850 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:10:48,791-Speed 2619.99 samples/sec Loss 5.9010 LearningRate 0.0191 Epoch: 11 Global Step: 466860 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:10:52,676-Speed 2636.51 samples/sec Loss 5.8459 LearningRate 0.0191 Epoch: 11 Global Step: 466870 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:10:56,610-Speed 2603.83 samples/sec Loss 5.8165 LearningRate 0.0191 Epoch: 11 Global Step: 466880 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:11:00,574-Speed 2583.92 samples/sec Loss 5.7102 LearningRate 0.0191 Epoch: 11 Global Step: 466890 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:11:04,507-Speed 2604.66 samples/sec Loss 5.7783 LearningRate 0.0191 Epoch: 11 Global Step: 466900 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:11:08,432-Speed 2609.13 samples/sec Loss 5.8975 LearningRate 0.0191 Epoch: 11 Global Step: 466910 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:11:12,336-Speed 2624.18 samples/sec Loss 5.8236 LearningRate 0.0191 Epoch: 11 Global Step: 466920 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:11:16,236-Speed 2626.47 samples/sec Loss 5.9227 LearningRate 0.0191 Epoch: 11 Global Step: 466930 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:11:20,135-Speed 2626.36 samples/sec Loss 5.8964 LearningRate 0.0191 Epoch: 11 Global Step: 466940 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:11:24,031-Speed 2629.47 samples/sec Loss 5.8457 LearningRate 0.0191 Epoch: 11 Global Step: 466950 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:11:27,928-Speed 2628.11 samples/sec Loss 5.9073 LearningRate 0.0191 Epoch: 11 Global Step: 466960 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:11:31,829-Speed 2625.90 samples/sec Loss 5.8082 LearningRate 0.0191 Epoch: 11 Global Step: 466970 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:11:35,730-Speed 2625.69 samples/sec Loss 5.8042 LearningRate 0.0191 Epoch: 11 Global Step: 466980 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:11:39,635-Speed 2622.78 samples/sec Loss 5.9025 LearningRate 0.0191 Epoch: 11 Global Step: 466990 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:11:43,531-Speed 2628.24 samples/sec Loss 5.8620 LearningRate 0.0191 Epoch: 11 Global Step: 467000 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:11:47,436-Speed 2623.83 samples/sec Loss 5.9437 LearningRate 0.0191 Epoch: 11 Global Step: 467010 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:11:51,333-Speed 2628.14 samples/sec Loss 5.8933 LearningRate 0.0191 Epoch: 11 Global Step: 467020 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:11:55,207-Speed 2643.96 samples/sec Loss 5.7980 LearningRate 0.0191 Epoch: 11 Global Step: 467030 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:11:59,137-Speed 2606.35 samples/sec Loss 5.7398 LearningRate 0.0191 Epoch: 11 Global Step: 467040 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:12:03,031-Speed 2630.74 samples/sec Loss 5.7730 LearningRate 0.0191 Epoch: 11 Global Step: 467050 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:12:06,929-Speed 2627.73 samples/sec Loss 5.8239 LearningRate 0.0191 Epoch: 11 Global Step: 467060 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:12:10,809-Speed 2639.26 samples/sec Loss 5.7849 LearningRate 0.0191 Epoch: 11 Global Step: 467070 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-15 00:12:14,711-Speed 2624.92 samples/sec Loss 5.8281 LearningRate 0.0191 Epoch: 11 Global Step: 467080 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-15 00:12:18,608-Speed 2628.08 samples/sec Loss 5.7673 LearningRate 0.0191 Epoch: 11 Global Step: 467090 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-15 00:12:22,507-Speed 2627.92 samples/sec Loss 5.7975 LearningRate 0.0191 Epoch: 11 Global Step: 467100 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-15 00:12:26,409-Speed 2624.78 samples/sec Loss 5.7942 LearningRate 0.0191 Epoch: 11 Global Step: 467110 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-15 00:12:30,329-Speed 2612.91 samples/sec Loss 5.8174 LearningRate 0.0191 Epoch: 11 Global Step: 467120 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-15 00:12:34,234-Speed 2623.23 samples/sec Loss 5.8414 LearningRate 0.0191 Epoch: 11 Global Step: 467130 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-15 00:12:38,132-Speed 2627.09 samples/sec Loss 5.8890 LearningRate 0.0191 Epoch: 11 Global Step: 467140 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-15 00:12:42,028-Speed 2628.67 samples/sec Loss 5.7255 LearningRate 0.0191 Epoch: 11 Global Step: 467150 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-15 00:12:45,946-Speed 2615.01 samples/sec Loss 5.8926 LearningRate 0.0191 Epoch: 11 Global Step: 467160 Fp16 Grad Scale: 32768 Required: 41 hours
Training: 2022-04-15 00:12:49,838-Speed 2631.32 samples/sec Loss 5.8032 LearningRate 0.0191 Epoch: 11 Global Step: 467170 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:12:53,734-Speed 2629.27 samples/sec Loss 5.8293 LearningRate 0.0191 Epoch: 11 Global Step: 467180 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:12:57,634-Speed 2626.63 samples/sec Loss 5.7905 LearningRate 0.0191 Epoch: 11 Global Step: 467190 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:13:01,532-Speed 2627.72 samples/sec Loss 5.9071 LearningRate 0.0191 Epoch: 11 Global Step: 467200 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:13:05,433-Speed 2625.59 samples/sec Loss 5.8653 LearningRate 0.0191 Epoch: 11 Global Step: 467210 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:13:09,326-Speed 2630.90 samples/sec Loss 5.7949 LearningRate 0.0191 Epoch: 11 Global Step: 467220 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:13:13,231-Speed 2622.35 samples/sec Loss 5.9010 LearningRate 0.0191 Epoch: 11 Global Step: 467230 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:13:17,128-Speed 2628.86 samples/sec Loss 5.8600 LearningRate 0.0191 Epoch: 11 Global Step: 467240 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:13:21,022-Speed 2630.53 samples/sec Loss 5.8431 LearningRate 0.0191 Epoch: 11 Global Step: 467250 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:13:24,918-Speed 2628.87 samples/sec Loss 5.8507 LearningRate 0.0191 Epoch: 11 Global Step: 467260 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:13:28,824-Speed 2621.98 samples/sec Loss 5.7872 LearningRate 0.0191 Epoch: 11 Global Step: 467270 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:13:32,704-Speed 2640.38 samples/sec Loss 5.9425 LearningRate 0.0191 Epoch: 11 Global Step: 467280 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:13:36,602-Speed 2627.96 samples/sec Loss 5.9062 LearningRate 0.0191 Epoch: 11 Global Step: 467290 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:13:40,504-Speed 2624.53 samples/sec Loss 5.8765 LearningRate 0.0191 Epoch: 11 Global Step: 467300 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:13:44,403-Speed 2627.25 samples/sec Loss 5.8116 LearningRate 0.0191 Epoch: 11 Global Step: 467310 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:13:48,295-Speed 2631.86 samples/sec Loss 5.8355 LearningRate 0.0191 Epoch: 11 Global Step: 467320 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:13:52,201-Speed 2622.37 samples/sec Loss 5.8353 LearningRate 0.0191 Epoch: 11 Global Step: 467330 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:13:56,114-Speed 2617.55 samples/sec Loss 5.8490 LearningRate 0.0191 Epoch: 11 Global Step: 467340 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:14:00,017-Speed 2624.56 samples/sec Loss 5.8159 LearningRate 0.0191 Epoch: 11 Global Step: 467350 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:14:03,950-Speed 2604.37 samples/sec Loss 5.8367 LearningRate 0.0191 Epoch: 11 Global Step: 467360 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:14:07,976-Speed 2543.91 samples/sec Loss 5.6802 LearningRate 0.0191 Epoch: 11 Global Step: 467370 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:14:11,880-Speed 2623.36 samples/sec Loss 6.0063 LearningRate 0.0191 Epoch: 11 Global Step: 467380 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:14:15,779-Speed 2627.42 samples/sec Loss 5.9389 LearningRate 0.0191 Epoch: 11 Global Step: 467390 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:14:19,681-Speed 2624.89 samples/sec Loss 5.9283 LearningRate 0.0191 Epoch: 11 Global Step: 467400 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:14:23,598-Speed 2616.00 samples/sec Loss 5.8862 LearningRate 0.0191 Epoch: 11 Global Step: 467410 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:14:27,485-Speed 2635.00 samples/sec Loss 5.8586 LearningRate 0.0191 Epoch: 11 Global Step: 467420 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:14:31,390-Speed 2628.71 samples/sec Loss 5.7269 LearningRate 0.0191 Epoch: 11 Global Step: 467430 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:14:35,288-Speed 2627.45 samples/sec Loss 5.9046 LearningRate 0.0191 Epoch: 11 Global Step: 467440 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:14:39,197-Speed 2620.59 samples/sec Loss 5.9341 LearningRate 0.0191 Epoch: 11 Global Step: 467450 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:14:43,093-Speed 2628.77 samples/sec Loss 5.8814 LearningRate 0.0191 Epoch: 11 Global Step: 467460 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:14:46,987-Speed 2630.83 samples/sec Loss 5.8645 LearningRate 0.0191 Epoch: 11 Global Step: 467470 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:14:50,883-Speed 2628.87 samples/sec Loss 5.8718 LearningRate 0.0191 Epoch: 11 Global Step: 467480 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:14:54,776-Speed 2630.79 samples/sec Loss 5.8471 LearningRate 0.0191 Epoch: 11 Global Step: 467490 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:14:58,682-Speed 2622.09 samples/sec Loss 5.8865 LearningRate 0.0190 Epoch: 11 Global Step: 467500 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:15:02,582-Speed 2627.20 samples/sec Loss 5.8720 LearningRate 0.0190 Epoch: 11 Global Step: 467510 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:15:06,486-Speed 2623.58 samples/sec Loss 5.7486 LearningRate 0.0190 Epoch: 11 Global Step: 467520 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:15:10,391-Speed 2622.16 samples/sec Loss 5.6953 LearningRate 0.0190 Epoch: 11 Global Step: 467530 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:15:14,288-Speed 2628.80 samples/sec Loss 5.8953 LearningRate 0.0190 Epoch: 11 Global Step: 467540 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:15:18,192-Speed 2623.68 samples/sec Loss 5.7564 LearningRate 0.0190 Epoch: 11 Global Step: 467550 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:15:22,089-Speed 2628.73 samples/sec Loss 5.7423 LearningRate 0.0190 Epoch: 11 Global Step: 467560 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:15:26,132-Speed 2532.79 samples/sec Loss 5.7197 LearningRate 0.0190 Epoch: 11 Global Step: 467570 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:15:30,224-Speed 2503.21 samples/sec Loss 5.8054 LearningRate 0.0190 Epoch: 11 Global Step: 467580 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:15:34,315-Speed 2503.98 samples/sec Loss 5.8523 LearningRate 0.0190 Epoch: 11 Global Step: 467590 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:15:38,322-Speed 2556.07 samples/sec Loss 5.8189 LearningRate 0.0190 Epoch: 11 Global Step: 467600 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:15:42,221-Speed 2627.05 samples/sec Loss 5.7652 LearningRate 0.0190 Epoch: 11 Global Step: 467610 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:15:46,121-Speed 2626.26 samples/sec Loss 5.9184 LearningRate 0.0190 Epoch: 11 Global Step: 467620 Fp16 Grad Scale: 262144 Required: 41 hours
Training: 2022-04-15 00:15:50,011-Speed 2632.64 samples/sec Loss 5.7712 LearningRate 0.0190 Epoch: 11 Global Step: 467630 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:15:53,908-Speed 2628.92 samples/sec Loss 5.8436 LearningRate 0.0190 Epoch: 11 Global Step: 467640 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:15:57,806-Speed 2627.78 samples/sec Loss 5.8528 LearningRate 0.0190 Epoch: 11 Global Step: 467650 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:16:01,730-Speed 2610.35 samples/sec Loss 5.8906 LearningRate 0.0190 Epoch: 11 Global Step: 467660 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:16:05,622-Speed 2631.61 samples/sec Loss 5.8172 LearningRate 0.0190 Epoch: 11 Global Step: 467670 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:16:09,534-Speed 2618.01 samples/sec Loss 6.0142 LearningRate 0.0190 Epoch: 11 Global Step: 467680 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:16:13,426-Speed 2631.97 samples/sec Loss 5.9419 LearningRate 0.0190 Epoch: 11 Global Step: 467690 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:16:17,319-Speed 2630.66 samples/sec Loss 5.7693 LearningRate 0.0190 Epoch: 11 Global Step: 467700 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:16:21,199-Speed 2640.46 samples/sec Loss 5.8907 LearningRate 0.0190 Epoch: 11 Global Step: 467710 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:16:25,095-Speed 2628.48 samples/sec Loss 5.8676 LearningRate 0.0190 Epoch: 11 Global Step: 467720 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:16:29,010-Speed 2616.16 samples/sec Loss 5.8646 LearningRate 0.0190 Epoch: 11 Global Step: 467730 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:16:32,921-Speed 2619.42 samples/sec Loss 5.7729 LearningRate 0.0190 Epoch: 11 Global Step: 467740 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:16:36,916-Speed 2564.23 samples/sec Loss 5.7778 LearningRate 0.0190 Epoch: 11 Global Step: 467750 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:16:40,837-Speed 2611.93 samples/sec Loss 5.7647 LearningRate 0.0190 Epoch: 11 Global Step: 467760 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:16:44,748-Speed 2619.13 samples/sec Loss 5.9165 LearningRate 0.0190 Epoch: 11 Global Step: 467770 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:16:48,659-Speed 2618.98 samples/sec Loss 5.7971 LearningRate 0.0190 Epoch: 11 Global Step: 467780 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:16:52,561-Speed 2625.13 samples/sec Loss 5.9007 LearningRate 0.0190 Epoch: 11 Global Step: 467790 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:16:56,461-Speed 2625.99 samples/sec Loss 5.7988 LearningRate 0.0190 Epoch: 11 Global Step: 467800 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:17:00,372-Speed 2618.91 samples/sec Loss 5.7874 LearningRate 0.0190 Epoch: 11 Global Step: 467810 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:17:04,272-Speed 2626.69 samples/sec Loss 5.7772 LearningRate 0.0190 Epoch: 11 Global Step: 467820 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:17:08,152-Speed 2639.87 samples/sec Loss 5.9270 LearningRate 0.0190 Epoch: 11 Global Step: 467830 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:17:12,049-Speed 2628.47 samples/sec Loss 5.8260 LearningRate 0.0190 Epoch: 11 Global Step: 467840 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:17:15,947-Speed 2627.46 samples/sec Loss 5.7141 LearningRate 0.0190 Epoch: 11 Global Step: 467850 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:17:19,877-Speed 2605.99 samples/sec Loss 5.7980 LearningRate 0.0190 Epoch: 11 Global Step: 467860 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:17:23,771-Speed 2630.60 samples/sec Loss 5.7797 LearningRate 0.0190 Epoch: 11 Global Step: 467870 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:17:27,670-Speed 2627.21 samples/sec Loss 5.8985 LearningRate 0.0190 Epoch: 11 Global Step: 467880 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:17:31,570-Speed 2626.56 samples/sec Loss 5.9989 LearningRate 0.0190 Epoch: 11 Global Step: 467890 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:17:35,469-Speed 2626.70 samples/sec Loss 5.7329 LearningRate 0.0190 Epoch: 11 Global Step: 467900 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:17:39,366-Speed 2628.24 samples/sec Loss 5.8413 LearningRate 0.0190 Epoch: 11 Global Step: 467910 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:17:43,262-Speed 2629.34 samples/sec Loss 5.8440 LearningRate 0.0190 Epoch: 11 Global Step: 467920 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:17:47,172-Speed 2619.98 samples/sec Loss 5.8205 LearningRate 0.0190 Epoch: 11 Global Step: 467930 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:17:51,105-Speed 2604.43 samples/sec Loss 5.8049 LearningRate 0.0190 Epoch: 11 Global Step: 467940 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:17:55,010-Speed 2623.35 samples/sec Loss 5.7948 LearningRate 0.0190 Epoch: 11 Global Step: 467950 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:17:58,906-Speed 2628.86 samples/sec Loss 5.8399 LearningRate 0.0190 Epoch: 11 Global Step: 467960 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:18:02,808-Speed 2624.75 samples/sec Loss 5.8238 LearningRate 0.0190 Epoch: 11 Global Step: 467970 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:18:06,716-Speed 2621.07 samples/sec Loss 5.6903 LearningRate 0.0190 Epoch: 11 Global Step: 467980 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:18:10,597-Speed 2639.73 samples/sec Loss 5.8985 LearningRate 0.0190 Epoch: 11 Global Step: 467990 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:18:14,499-Speed 2624.25 samples/sec Loss 5.8853 LearningRate 0.0190 Epoch: 11 Global Step: 468000 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:18:18,415-Speed 2623.63 samples/sec Loss 5.8157 LearningRate 0.0190 Epoch: 11 Global Step: 468010 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:18:22,308-Speed 2631.17 samples/sec Loss 5.9097 LearningRate 0.0190 Epoch: 11 Global Step: 468020 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:18:26,201-Speed 2631.07 samples/sec Loss 5.8836 LearningRate 0.0190 Epoch: 11 Global Step: 468030 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:18:30,133-Speed 2604.87 samples/sec Loss 5.9548 LearningRate 0.0190 Epoch: 11 Global Step: 468040 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:18:34,031-Speed 2627.46 samples/sec Loss 5.8135 LearningRate 0.0190 Epoch: 11 Global Step: 468050 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:18:37,931-Speed 2625.94 samples/sec Loss 5.7416 LearningRate 0.0190 Epoch: 11 Global Step: 468060 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:18:41,825-Speed 2631.05 samples/sec Loss 5.8184 LearningRate 0.0190 Epoch: 11 Global Step: 468070 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:18:45,718-Speed 2630.21 samples/sec Loss 5.8433 LearningRate 0.0190 Epoch: 11 Global Step: 468080 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:18:49,631-Speed 2618.84 samples/sec Loss 5.8451 LearningRate 0.0190 Epoch: 11 Global Step: 468090 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:18:53,521-Speed 2633.34 samples/sec Loss 5.8064 LearningRate 0.0190 Epoch: 11 Global Step: 468100 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:18:57,415-Speed 2630.24 samples/sec Loss 5.8510 LearningRate 0.0190 Epoch: 11 Global Step: 468110 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:19:01,313-Speed 2627.65 samples/sec Loss 5.8623 LearningRate 0.0190 Epoch: 11 Global Step: 468120 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:19:05,206-Speed 2630.92 samples/sec Loss 5.8770 LearningRate 0.0190 Epoch: 11 Global Step: 468130 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:19:09,083-Speed 2641.64 samples/sec Loss 5.8609 LearningRate 0.0190 Epoch: 11 Global Step: 468140 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:19:13,000-Speed 2615.30 samples/sec Loss 5.8764 LearningRate 0.0190 Epoch: 11 Global Step: 468150 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:19:16,902-Speed 2625.02 samples/sec Loss 5.8488 LearningRate 0.0190 Epoch: 11 Global Step: 468160 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:19:20,796-Speed 2630.21 samples/sec Loss 5.9433 LearningRate 0.0190 Epoch: 11 Global Step: 468170 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:19:24,695-Speed 2626.78 samples/sec Loss 5.7983 LearningRate 0.0190 Epoch: 11 Global Step: 468180 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:19:28,607-Speed 2618.46 samples/sec Loss 5.8245 LearningRate 0.0190 Epoch: 11 Global Step: 468190 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:19:32,504-Speed 2628.63 samples/sec Loss 5.7852 LearningRate 0.0190 Epoch: 11 Global Step: 468200 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:19:36,397-Speed 2631.12 samples/sec Loss 5.9405 LearningRate 0.0190 Epoch: 11 Global Step: 468210 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:19:40,290-Speed 2630.90 samples/sec Loss 5.7769 LearningRate 0.0190 Epoch: 11 Global Step: 468220 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:19:44,186-Speed 2628.81 samples/sec Loss 5.8399 LearningRate 0.0190 Epoch: 11 Global Step: 468230 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:19:48,080-Speed 2630.81 samples/sec Loss 5.9140 LearningRate 0.0190 Epoch: 11 Global Step: 468240 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:19:51,972-Speed 2631.34 samples/sec Loss 5.8378 LearningRate 0.0190 Epoch: 11 Global Step: 468250 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:19:55,881-Speed 2620.69 samples/sec Loss 5.8504 LearningRate 0.0190 Epoch: 11 Global Step: 468260 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:19:59,776-Speed 2629.18 samples/sec Loss 5.7454 LearningRate 0.0190 Epoch: 11 Global Step: 468270 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:20:03,683-Speed 2621.63 samples/sec Loss 5.8673 LearningRate 0.0190 Epoch: 11 Global Step: 468280 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:20:07,560-Speed 2642.23 samples/sec Loss 5.8383 LearningRate 0.0190 Epoch: 11 Global Step: 468290 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:20:11,466-Speed 2622.47 samples/sec Loss 5.8914 LearningRate 0.0190 Epoch: 11 Global Step: 468300 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:20:15,357-Speed 2632.05 samples/sec Loss 5.7761 LearningRate 0.0190 Epoch: 11 Global Step: 468310 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:20:19,279-Speed 2610.93 samples/sec Loss 5.8778 LearningRate 0.0190 Epoch: 11 Global Step: 468320 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:20:23,182-Speed 2625.14 samples/sec Loss 5.9042 LearningRate 0.0190 Epoch: 11 Global Step: 468330 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:20:27,079-Speed 2628.26 samples/sec Loss 5.8280 LearningRate 0.0190 Epoch: 11 Global Step: 468340 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:20:30,973-Speed 2630.10 samples/sec Loss 5.8608 LearningRate 0.0190 Epoch: 11 Global Step: 468350 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:20:34,869-Speed 2629.21 samples/sec Loss 5.8524 LearningRate 0.0190 Epoch: 11 Global Step: 468360 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:20:38,769-Speed 2625.59 samples/sec Loss 5.7433 LearningRate 0.0190 Epoch: 11 Global Step: 468370 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:20:42,661-Speed 2631.98 samples/sec Loss 5.8970 LearningRate 0.0190 Epoch: 11 Global Step: 468380 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:20:46,552-Speed 2632.45 samples/sec Loss 5.8902 LearningRate 0.0190 Epoch: 11 Global Step: 468390 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:20:50,449-Speed 2628.68 samples/sec Loss 5.8022 LearningRate 0.0190 Epoch: 11 Global Step: 468400 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:20:54,369-Speed 2612.98 samples/sec Loss 5.7925 LearningRate 0.0190 Epoch: 11 Global Step: 468410 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:20:58,268-Speed 2626.52 samples/sec Loss 5.6866 LearningRate 0.0190 Epoch: 11 Global Step: 468420 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:21:02,171-Speed 2624.74 samples/sec Loss 5.8891 LearningRate 0.0190 Epoch: 11 Global Step: 468430 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:21:06,064-Speed 2630.93 samples/sec Loss 5.7430 LearningRate 0.0190 Epoch: 11 Global Step: 468440 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:21:09,971-Speed 2621.55 samples/sec Loss 5.8506 LearningRate 0.0190 Epoch: 11 Global Step: 468450 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:21:13,865-Speed 2629.56 samples/sec Loss 5.8302 LearningRate 0.0189 Epoch: 11 Global Step: 468460 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:21:17,769-Speed 2624.68 samples/sec Loss 5.8864 LearningRate 0.0189 Epoch: 11 Global Step: 468470 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:21:21,680-Speed 2619.01 samples/sec Loss 5.7689 LearningRate 0.0189 Epoch: 11 Global Step: 468480 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:21:25,576-Speed 2629.47 samples/sec Loss 5.8749 LearningRate 0.0189 Epoch: 11 Global Step: 468490 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:21:29,471-Speed 2628.88 samples/sec Loss 5.9249 LearningRate 0.0189 Epoch: 11 Global Step: 468500 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:21:33,393-Speed 2612.40 samples/sec Loss 6.0254 LearningRate 0.0189 Epoch: 11 Global Step: 468510 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:21:37,298-Speed 2622.73 samples/sec Loss 5.7566 LearningRate 0.0189 Epoch: 11 Global Step: 468520 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:21:41,210-Speed 2618.16 samples/sec Loss 5.7634 LearningRate 0.0189 Epoch: 11 Global Step: 468530 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:21:45,114-Speed 2623.84 samples/sec Loss 5.8613 LearningRate 0.0189 Epoch: 11 Global Step: 468540 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:21:49,008-Speed 2630.69 samples/sec Loss 5.8847 LearningRate 0.0189 Epoch: 11 Global Step: 468550 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:21:52,907-Speed 2626.64 samples/sec Loss 5.7963 LearningRate 0.0189 Epoch: 11 Global Step: 468560 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:21:56,785-Speed 2641.52 samples/sec Loss 5.8660 LearningRate 0.0189 Epoch: 11 Global Step: 468570 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:22:00,701-Speed 2615.47 samples/sec Loss 5.9214 LearningRate 0.0189 Epoch: 11 Global Step: 468580 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:22:04,592-Speed 2632.07 samples/sec Loss 5.8149 LearningRate 0.0189 Epoch: 11 Global Step: 468590 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:22:08,489-Speed 2628.21 samples/sec Loss 5.8363 LearningRate 0.0189 Epoch: 11 Global Step: 468600 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:22:12,383-Speed 2630.61 samples/sec Loss 5.7838 LearningRate 0.0189 Epoch: 11 Global Step: 468610 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:22:16,297-Speed 2616.65 samples/sec Loss 5.6974 LearningRate 0.0189 Epoch: 11 Global Step: 468620 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:22:20,195-Speed 2627.78 samples/sec Loss 5.8875 LearningRate 0.0189 Epoch: 11 Global Step: 468630 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:22:24,100-Speed 2623.13 samples/sec Loss 5.7568 LearningRate 0.0189 Epoch: 11 Global Step: 468640 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:22:27,995-Speed 2630.11 samples/sec Loss 5.7853 LearningRate 0.0189 Epoch: 11 Global Step: 468650 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:22:31,955-Speed 2586.12 samples/sec Loss 5.7385 LearningRate 0.0189 Epoch: 11 Global Step: 468660 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:22:35,880-Speed 2609.85 samples/sec Loss 5.9293 LearningRate 0.0189 Epoch: 11 Global Step: 468670 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:22:39,764-Speed 2636.94 samples/sec Loss 5.8162 LearningRate 0.0189 Epoch: 11 Global Step: 468680 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:22:43,666-Speed 2625.50 samples/sec Loss 5.7827 LearningRate 0.0189 Epoch: 11 Global Step: 468690 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:22:47,588-Speed 2611.04 samples/sec Loss 5.7049 LearningRate 0.0189 Epoch: 11 Global Step: 468700 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:22:51,602-Speed 2551.73 samples/sec Loss 5.7487 LearningRate 0.0189 Epoch: 11 Global Step: 468710 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:22:55,565-Speed 2585.17 samples/sec Loss 5.9347 LearningRate 0.0189 Epoch: 11 Global Step: 468720 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:22:59,469-Speed 2623.31 samples/sec Loss 5.8744 LearningRate 0.0189 Epoch: 11 Global Step: 468730 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:23:03,377-Speed 2620.91 samples/sec Loss 5.8334 LearningRate 0.0189 Epoch: 11 Global Step: 468740 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:23:07,290-Speed 2617.59 samples/sec Loss 5.8554 LearningRate 0.0189 Epoch: 11 Global Step: 468750 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:23:11,200-Speed 2619.98 samples/sec Loss 5.7545 LearningRate 0.0189 Epoch: 11 Global Step: 468760 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:23:15,139-Speed 2599.94 samples/sec Loss 5.8070 LearningRate 0.0189 Epoch: 11 Global Step: 468770 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:23:19,035-Speed 2629.22 samples/sec Loss 5.8421 LearningRate 0.0189 Epoch: 11 Global Step: 468780 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:23:22,934-Speed 2627.02 samples/sec Loss 5.7414 LearningRate 0.0189 Epoch: 11 Global Step: 468790 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:23:26,833-Speed 2627.37 samples/sec Loss 5.8028 LearningRate 0.0189 Epoch: 11 Global Step: 468800 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:23:30,731-Speed 2627.54 samples/sec Loss 5.9112 LearningRate 0.0189 Epoch: 11 Global Step: 468810 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:23:34,630-Speed 2626.59 samples/sec Loss 5.8799 LearningRate 0.0189 Epoch: 11 Global Step: 468820 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:23:38,527-Speed 2628.18 samples/sec Loss 5.7823 LearningRate 0.0189 Epoch: 11 Global Step: 468830 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:23:42,439-Speed 2623.64 samples/sec Loss 5.7999 LearningRate 0.0189 Epoch: 11 Global Step: 468840 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:23:46,339-Speed 2626.68 samples/sec Loss 5.7970 LearningRate 0.0189 Epoch: 11 Global Step: 468850 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:23:50,244-Speed 2622.84 samples/sec Loss 5.8318 LearningRate 0.0189 Epoch: 11 Global Step: 468860 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:23:54,154-Speed 2619.34 samples/sec Loss 5.7359 LearningRate 0.0189 Epoch: 11 Global Step: 468870 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:23:58,038-Speed 2637.42 samples/sec Loss 5.8771 LearningRate 0.0189 Epoch: 11 Global Step: 468880 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:24:01,936-Speed 2627.04 samples/sec Loss 5.8474 LearningRate 0.0189 Epoch: 11 Global Step: 468890 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:24:05,834-Speed 2627.59 samples/sec Loss 5.7815 LearningRate 0.0189 Epoch: 11 Global Step: 468900 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:24:09,739-Speed 2622.86 samples/sec Loss 5.9196 LearningRate 0.0189 Epoch: 11 Global Step: 468910 Fp16 Grad Scale: 131072 Required: 41 hours
Training: 2022-04-15 00:24:13,626-Speed 2635.50 samples/sec Loss 5.7891 LearningRate 0.0189 Epoch: 11 Global Step: 468920 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:24:17,535-Speed 2620.10 samples/sec Loss 5.7592 LearningRate 0.0189 Epoch: 11 Global Step: 468930 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:24:21,460-Speed 2612.45 samples/sec Loss 5.8624 LearningRate 0.0189 Epoch: 11 Global Step: 468940 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:24:25,354-Speed 2629.72 samples/sec Loss 5.8273 LearningRate 0.0189 Epoch: 11 Global Step: 468950 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:24:29,260-Speed 2622.45 samples/sec Loss 5.8129 LearningRate 0.0189 Epoch: 11 Global Step: 468960 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:24:33,166-Speed 2622.54 samples/sec Loss 5.7334 LearningRate 0.0189 Epoch: 11 Global Step: 468970 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:24:37,065-Speed 2626.56 samples/sec Loss 5.7718 LearningRate 0.0189 Epoch: 11 Global Step: 468980 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:24:40,981-Speed 2615.66 samples/sec Loss 5.8988 LearningRate 0.0189 Epoch: 11 Global Step: 468990 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:24:44,879-Speed 2627.15 samples/sec Loss 5.8001 LearningRate 0.0189 Epoch: 11 Global Step: 469000 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:24:48,778-Speed 2627.40 samples/sec Loss 5.8401 LearningRate 0.0189 Epoch: 11 Global Step: 469010 Fp16 Grad Scale: 65536 Required: 41 hours
Training: 2022-04-15 00:24:52,681-Speed 2624.26 samples/sec Loss 5.8198 LearningRate 0.0189 Epoch: 11 Global Step: 469020 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:24:56,586-Speed 2623.08 samples/sec Loss 5.7718 LearningRate 0.0189 Epoch: 11 Global Step: 469030 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:25:00,473-Speed 2634.23 samples/sec Loss 5.8644 LearningRate 0.0189 Epoch: 11 Global Step: 469040 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:25:04,371-Speed 2627.69 samples/sec Loss 5.8195 LearningRate 0.0189 Epoch: 11 Global Step: 469050 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:25:08,268-Speed 2628.36 samples/sec Loss 5.7621 LearningRate 0.0189 Epoch: 11 Global Step: 469060 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:25:12,165-Speed 2628.50 samples/sec Loss 5.8912 LearningRate 0.0189 Epoch: 11 Global Step: 469070 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:25:16,065-Speed 2626.39 samples/sec Loss 5.7697 LearningRate 0.0189 Epoch: 11 Global Step: 469080 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:25:19,981-Speed 2615.34 samples/sec Loss 5.7441 LearningRate 0.0189 Epoch: 11 Global Step: 469090 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:25:23,881-Speed 2626.48 samples/sec Loss 5.7759 LearningRate 0.0189 Epoch: 11 Global Step: 469100 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:25:27,780-Speed 2626.45 samples/sec Loss 5.8641 LearningRate 0.0189 Epoch: 11 Global Step: 469110 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:25:31,678-Speed 2627.49 samples/sec Loss 5.7999 LearningRate 0.0189 Epoch: 11 Global Step: 469120 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:25:35,586-Speed 2621.18 samples/sec Loss 5.7588 LearningRate 0.0189 Epoch: 11 Global Step: 469130 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:25:39,489-Speed 2623.56 samples/sec Loss 5.8535 LearningRate 0.0189 Epoch: 11 Global Step: 469140 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:25:43,390-Speed 2626.33 samples/sec Loss 5.8720 LearningRate 0.0189 Epoch: 11 Global Step: 469150 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:25:47,290-Speed 2626.56 samples/sec Loss 5.8206 LearningRate 0.0189 Epoch: 11 Global Step: 469160 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:25:51,186-Speed 2629.20 samples/sec Loss 5.9205 LearningRate 0.0189 Epoch: 11 Global Step: 469170 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:25:55,095-Speed 2619.89 samples/sec Loss 5.7836 LearningRate 0.0189 Epoch: 11 Global Step: 469180 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:25:58,996-Speed 2625.75 samples/sec Loss 5.7131 LearningRate 0.0189 Epoch: 11 Global Step: 469190 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:26:02,897-Speed 2625.61 samples/sec Loss 5.7806 LearningRate 0.0189 Epoch: 11 Global Step: 469200 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:26:06,843-Speed 2595.41 samples/sec Loss 5.8567 LearningRate 0.0189 Epoch: 11 Global Step: 469210 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:26:10,750-Speed 2621.22 samples/sec Loss 5.8090 LearningRate 0.0189 Epoch: 11 Global Step: 469220 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:26:14,642-Speed 2632.14 samples/sec Loss 5.8634 LearningRate 0.0189 Epoch: 11 Global Step: 469230 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:26:18,525-Speed 2637.42 samples/sec Loss 5.7996 LearningRate 0.0189 Epoch: 11 Global Step: 469240 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:26:22,425-Speed 2626.39 samples/sec Loss 5.8999 LearningRate 0.0189 Epoch: 11 Global Step: 469250 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:26:26,325-Speed 2626.51 samples/sec Loss 5.8350 LearningRate 0.0189 Epoch: 11 Global Step: 469260 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:26:30,224-Speed 2627.27 samples/sec Loss 5.8471 LearningRate 0.0189 Epoch: 11 Global Step: 469270 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:26:34,132-Speed 2620.91 samples/sec Loss 5.8052 LearningRate 0.0189 Epoch: 11 Global Step: 469280 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:26:38,029-Speed 2628.22 samples/sec Loss 5.7806 LearningRate 0.0189 Epoch: 11 Global Step: 469290 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:26:41,933-Speed 2623.26 samples/sec Loss 5.8597 LearningRate 0.0189 Epoch: 11 Global Step: 469300 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:26:45,829-Speed 2628.31 samples/sec Loss 5.7974 LearningRate 0.0189 Epoch: 11 Global Step: 469310 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:26:49,724-Speed 2630.13 samples/sec Loss 5.8162 LearningRate 0.0189 Epoch: 11 Global Step: 469320 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:26:53,629-Speed 2622.94 samples/sec Loss 5.8349 LearningRate 0.0189 Epoch: 11 Global Step: 469330 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:26:57,530-Speed 2625.30 samples/sec Loss 5.9106 LearningRate 0.0189 Epoch: 11 Global Step: 469340 Fp16 Grad Scale: 262144 Required: 40 hours
Training: 2022-04-15 00:27:01,413-Speed 2638.11 samples/sec Loss 5.7605 LearningRate 0.0189 Epoch: 11 Global Step: 469350 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:27:05,292-Speed 2640.46 samples/sec Loss 5.7759 LearningRate 0.0189 Epoch: 11 Global Step: 469360 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:27:09,211-Speed 2613.75 samples/sec Loss 5.8192 LearningRate 0.0189 Epoch: 11 Global Step: 469370 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:27:13,109-Speed 2626.91 samples/sec Loss 5.8717 LearningRate 0.0189 Epoch: 11 Global Step: 469380 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:27:17,014-Speed 2623.03 samples/sec Loss 5.7022 LearningRate 0.0189 Epoch: 11 Global Step: 469390 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:27:20,919-Speed 2622.89 samples/sec Loss 5.8349 LearningRate 0.0189 Epoch: 11 Global Step: 469400 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:27:24,816-Speed 2628.19 samples/sec Loss 5.8721 LearningRate 0.0188 Epoch: 11 Global Step: 469410 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:27:28,717-Speed 2625.59 samples/sec Loss 5.7081 LearningRate 0.0188 Epoch: 11 Global Step: 469420 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:27:32,625-Speed 2620.80 samples/sec Loss 5.8689 LearningRate 0.0188 Epoch: 11 Global Step: 469430 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:27:36,578-Speed 2591.33 samples/sec Loss 5.6978 LearningRate 0.0188 Epoch: 11 Global Step: 469440 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:27:40,491-Speed 2617.43 samples/sec Loss 5.7591 LearningRate 0.0188 Epoch: 11 Global Step: 469450 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:27:44,388-Speed 2628.44 samples/sec Loss 5.8346 LearningRate 0.0188 Epoch: 11 Global Step: 469460 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:27:48,298-Speed 2619.69 samples/sec Loss 5.7717 LearningRate 0.0188 Epoch: 11 Global Step: 469470 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:27:52,183-Speed 2635.67 samples/sec Loss 5.7557 LearningRate 0.0188 Epoch: 11 Global Step: 469480 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:27:56,082-Speed 2627.31 samples/sec Loss 5.8925 LearningRate 0.0188 Epoch: 11 Global Step: 469490 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:27:59,987-Speed 2622.49 samples/sec Loss 5.7905 LearningRate 0.0188 Epoch: 11 Global Step: 469500 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:28:03,894-Speed 2621.93 samples/sec Loss 5.8097 LearningRate 0.0188 Epoch: 11 Global Step: 469510 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:28:07,796-Speed 2624.82 samples/sec Loss 5.8035 LearningRate 0.0188 Epoch: 11 Global Step: 469520 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:28:11,702-Speed 2621.90 samples/sec Loss 5.8200 LearningRate 0.0188 Epoch: 11 Global Step: 469530 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:28:15,605-Speed 2623.90 samples/sec Loss 5.7930 LearningRate 0.0188 Epoch: 11 Global Step: 469540 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:28:19,504-Speed 2627.59 samples/sec Loss 5.7997 LearningRate 0.0188 Epoch: 11 Global Step: 469550 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:28:23,409-Speed 2623.04 samples/sec Loss 5.7795 LearningRate 0.0188 Epoch: 11 Global Step: 469560 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:28:27,318-Speed 2620.02 samples/sec Loss 5.7862 LearningRate 0.0188 Epoch: 11 Global Step: 469570 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:28:31,212-Speed 2629.88 samples/sec Loss 5.7744 LearningRate 0.0188 Epoch: 11 Global Step: 469580 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:28:35,088-Speed 2642.63 samples/sec Loss 5.8015 LearningRate 0.0188 Epoch: 11 Global Step: 469590 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:28:38,986-Speed 2628.82 samples/sec Loss 5.8349 LearningRate 0.0188 Epoch: 11 Global Step: 469600 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:28:42,893-Speed 2621.25 samples/sec Loss 5.8460 LearningRate 0.0188 Epoch: 11 Global Step: 469610 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:28:46,798-Speed 2622.69 samples/sec Loss 5.8543 LearningRate 0.0188 Epoch: 11 Global Step: 469620 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:28:50,701-Speed 2624.49 samples/sec Loss 5.7912 LearningRate 0.0188 Epoch: 11 Global Step: 469630 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:28:54,599-Speed 2627.59 samples/sec Loss 5.8334 LearningRate 0.0188 Epoch: 11 Global Step: 469640 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:28:58,493-Speed 2630.52 samples/sec Loss 5.8416 LearningRate 0.0188 Epoch: 11 Global Step: 469650 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:29:02,398-Speed 2622.61 samples/sec Loss 5.7691 LearningRate 0.0188 Epoch: 11 Global Step: 469660 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:29:06,299-Speed 2625.44 samples/sec Loss 5.8576 LearningRate 0.0188 Epoch: 11 Global Step: 469670 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:29:10,196-Speed 2627.85 samples/sec Loss 5.8550 LearningRate 0.0188 Epoch: 11 Global Step: 469680 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:29:14,137-Speed 2599.59 samples/sec Loss 5.6610 LearningRate 0.0188 Epoch: 11 Global Step: 469690 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:29:18,047-Speed 2620.02 samples/sec Loss 5.9204 LearningRate 0.0188 Epoch: 11 Global Step: 469700 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:29:21,945-Speed 2627.67 samples/sec Loss 5.7428 LearningRate 0.0188 Epoch: 11 Global Step: 469710 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:29:25,839-Speed 2630.20 samples/sec Loss 5.7993 LearningRate 0.0188 Epoch: 11 Global Step: 469720 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:29:29,722-Speed 2638.01 samples/sec Loss 5.8757 LearningRate 0.0188 Epoch: 11 Global Step: 469730 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:29:33,623-Speed 2625.77 samples/sec Loss 5.8017 LearningRate 0.0188 Epoch: 11 Global Step: 469740 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:29:37,518-Speed 2629.53 samples/sec Loss 5.7650 LearningRate 0.0188 Epoch: 11 Global Step: 469750 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:29:41,422-Speed 2623.44 samples/sec Loss 5.7184 LearningRate 0.0188 Epoch: 11 Global Step: 469760 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:29:45,314-Speed 2632.09 samples/sec Loss 5.7610 LearningRate 0.0188 Epoch: 11 Global Step: 469770 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:29:49,216-Speed 2624.67 samples/sec Loss 5.8103 LearningRate 0.0188 Epoch: 11 Global Step: 469780 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:29:53,118-Speed 2625.18 samples/sec Loss 5.8054 LearningRate 0.0188 Epoch: 11 Global Step: 469790 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:29:57,028-Speed 2619.46 samples/sec Loss 5.7706 LearningRate 0.0188 Epoch: 11 Global Step: 469800 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:30:01,010-Speed 2572.49 samples/sec Loss 5.8092 LearningRate 0.0188 Epoch: 11 Global Step: 469810 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:30:04,931-Speed 2611.94 samples/sec Loss 5.7659 LearningRate 0.0188 Epoch: 11 Global Step: 469820 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:30:08,829-Speed 2627.27 samples/sec Loss 5.7843 LearningRate 0.0188 Epoch: 11 Global Step: 469830 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:30:12,775-Speed 2596.45 samples/sec Loss 5.8320 LearningRate 0.0188 Epoch: 11 Global Step: 469840 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:30:16,653-Speed 2641.32 samples/sec Loss 5.8045 LearningRate 0.0188 Epoch: 11 Global Step: 469850 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:30:20,577-Speed 2610.31 samples/sec Loss 5.8504 LearningRate 0.0188 Epoch: 11 Global Step: 469860 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:30:24,473-Speed 2628.97 samples/sec Loss 5.8446 LearningRate 0.0188 Epoch: 11 Global Step: 469870 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:30:28,365-Speed 2631.64 samples/sec Loss 5.6886 LearningRate 0.0188 Epoch: 11 Global Step: 469880 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:30:32,258-Speed 2631.26 samples/sec Loss 5.6750 LearningRate 0.0188 Epoch: 11 Global Step: 469890 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:30:36,152-Speed 2629.85 samples/sec Loss 5.8325 LearningRate 0.0188 Epoch: 11 Global Step: 469900 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:30:40,047-Speed 2629.53 samples/sec Loss 5.7491 LearningRate 0.0188 Epoch: 11 Global Step: 469910 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:30:43,940-Speed 2630.94 samples/sec Loss 5.8067 LearningRate 0.0188 Epoch: 11 Global Step: 469920 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:30:47,838-Speed 2627.68 samples/sec Loss 5.7446 LearningRate 0.0188 Epoch: 11 Global Step: 469930 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:30:51,739-Speed 2626.19 samples/sec Loss 5.7638 LearningRate 0.0188 Epoch: 11 Global Step: 469940 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:30:55,642-Speed 2623.91 samples/sec Loss 5.7095 LearningRate 0.0188 Epoch: 11 Global Step: 469950 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:30:59,541-Speed 2627.29 samples/sec Loss 5.6627 LearningRate 0.0188 Epoch: 11 Global Step: 469960 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:31:03,417-Speed 2642.56 samples/sec Loss 5.8886 LearningRate 0.0188 Epoch: 11 Global Step: 469970 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:31:07,312-Speed 2629.36 samples/sec Loss 5.8121 LearningRate 0.0188 Epoch: 11 Global Step: 469980 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:31:11,210-Speed 2627.06 samples/sec Loss 5.7486 LearningRate 0.0188 Epoch: 11 Global Step: 469990 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:31:15,106-Speed 2630.00 samples/sec Loss 5.8653 LearningRate 0.0188 Epoch: 11 Global Step: 470000 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:31:58,722-[lfw][470000]XNorm: 23.571234
Training: 2022-04-15 00:31:58,723-[lfw][470000]Accuracy-Flip: 0.99733+-0.00281
Training: 2022-04-15 00:31:58,724-[lfw][470000]Accuracy-Highest: 0.99783
Training: 2022-04-15 00:32:48,891-[cfp_fp][470000]XNorm: 21.783026
Training: 2022-04-15 00:32:48,891-[cfp_fp][470000]Accuracy-Flip: 0.98786+-0.00617
Training: 2022-04-15 00:32:48,892-[cfp_fp][470000]Accuracy-Highest: 0.98843
Training: 2022-04-15 00:33:31,960-[agedb_30][470000]XNorm: 23.653521
Training: 2022-04-15 00:33:31,961-[agedb_30][470000]Accuracy-Flip: 0.97750+-0.00720
Training: 2022-04-15 00:33:31,962-[agedb_30][470000]Accuracy-Highest: 0.97817
Training: 2022-04-15 00:33:35,845-Speed 72.76 samples/sec Loss 5.7909 LearningRate 0.0188 Epoch: 11 Global Step: 470010 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:33:39,715-Speed 2647.15 samples/sec Loss 5.7153 LearningRate 0.0188 Epoch: 11 Global Step: 470020 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:33:43,587-Speed 2645.82 samples/sec Loss 5.8462 LearningRate 0.0188 Epoch: 11 Global Step: 470030 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:33:47,473-Speed 2635.71 samples/sec Loss 5.7933 LearningRate 0.0188 Epoch: 11 Global Step: 470040 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:33:51,360-Speed 2634.83 samples/sec Loss 5.7875 LearningRate 0.0188 Epoch: 11 Global Step: 470050 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:33:55,239-Speed 2641.16 samples/sec Loss 5.6908 LearningRate 0.0188 Epoch: 11 Global Step: 470060 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:33:59,153-Speed 2617.45 samples/sec Loss 5.7146 LearningRate 0.0188 Epoch: 11 Global Step: 470070 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:34:03,049-Speed 2628.94 samples/sec Loss 5.7852 LearningRate 0.0188 Epoch: 11 Global Step: 470080 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:34:06,964-Speed 2616.52 samples/sec Loss 5.7576 LearningRate 0.0188 Epoch: 11 Global Step: 470090 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:34:10,848-Speed 2637.24 samples/sec Loss 5.7133 LearningRate 0.0188 Epoch: 11 Global Step: 470100 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:34:14,766-Speed 2613.80 samples/sec Loss 5.7082 LearningRate 0.0188 Epoch: 11 Global Step: 470110 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:34:18,667-Speed 2625.19 samples/sec Loss 5.7105 LearningRate 0.0188 Epoch: 11 Global Step: 470120 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:34:22,569-Speed 2625.66 samples/sec Loss 5.7130 LearningRate 0.0188 Epoch: 11 Global Step: 470130 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:34:26,433-Speed 2650.61 samples/sec Loss 5.8893 LearningRate 0.0188 Epoch: 11 Global Step: 470140 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:34:30,337-Speed 2623.76 samples/sec Loss 5.7082 LearningRate 0.0188 Epoch: 11 Global Step: 470150 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:34:34,228-Speed 2632.50 samples/sec Loss 5.8540 LearningRate 0.0188 Epoch: 11 Global Step: 470160 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:34:38,118-Speed 2633.33 samples/sec Loss 5.9184 LearningRate 0.0188 Epoch: 11 Global Step: 470170 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:34:42,016-Speed 2627.23 samples/sec Loss 5.9036 LearningRate 0.0188 Epoch: 11 Global Step: 470180 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:34:45,932-Speed 2615.93 samples/sec Loss 5.8447 LearningRate 0.0188 Epoch: 11 Global Step: 470190 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:34:49,834-Speed 2624.75 samples/sec Loss 5.8140 LearningRate 0.0188 Epoch: 11 Global Step: 470200 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:34:53,729-Speed 2630.17 samples/sec Loss 5.7316 LearningRate 0.0188 Epoch: 11 Global Step: 470210 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:34:57,694-Speed 2583.23 samples/sec Loss 5.9059 LearningRate 0.0188 Epoch: 11 Global Step: 470220 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:35:01,591-Speed 2629.01 samples/sec Loss 5.8376 LearningRate 0.0188 Epoch: 11 Global Step: 470230 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:35:05,536-Speed 2601.54 samples/sec Loss 5.8435 LearningRate 0.0188 Epoch: 11 Global Step: 470240 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:35:09,430-Speed 2630.88 samples/sec Loss 5.7662 LearningRate 0.0188 Epoch: 11 Global Step: 470250 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:35:13,300-Speed 2645.93 samples/sec Loss 5.7362 LearningRate 0.0188 Epoch: 11 Global Step: 470260 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:35:17,201-Speed 2625.89 samples/sec Loss 5.8045 LearningRate 0.0188 Epoch: 11 Global Step: 470270 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:35:21,099-Speed 2628.23 samples/sec Loss 5.8701 LearningRate 0.0188 Epoch: 11 Global Step: 470280 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:35:25,003-Speed 2623.53 samples/sec Loss 5.8085 LearningRate 0.0188 Epoch: 11 Global Step: 470290 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:35:28,934-Speed 2605.64 samples/sec Loss 5.7305 LearningRate 0.0188 Epoch: 11 Global Step: 470300 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:35:32,838-Speed 2623.54 samples/sec Loss 5.8141 LearningRate 0.0188 Epoch: 11 Global Step: 470310 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:35:36,738-Speed 2626.71 samples/sec Loss 5.8205 LearningRate 0.0188 Epoch: 11 Global Step: 470320 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:35:40,633-Speed 2629.66 samples/sec Loss 5.8213 LearningRate 0.0188 Epoch: 11 Global Step: 470330 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:35:44,532-Speed 2626.49 samples/sec Loss 5.8579 LearningRate 0.0188 Epoch: 11 Global Step: 470340 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:35:48,429-Speed 2628.72 samples/sec Loss 5.7562 LearningRate 0.0188 Epoch: 11 Global Step: 470350 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:35:52,324-Speed 2629.64 samples/sec Loss 5.7748 LearningRate 0.0188 Epoch: 11 Global Step: 470360 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:35:56,192-Speed 2647.68 samples/sec Loss 5.8034 LearningRate 0.0187 Epoch: 11 Global Step: 470370 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:36:00,084-Speed 2632.24 samples/sec Loss 5.7965 LearningRate 0.0187 Epoch: 11 Global Step: 470380 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:36:03,979-Speed 2629.60 samples/sec Loss 5.8061 LearningRate 0.0187 Epoch: 11 Global Step: 470390 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:36:07,867-Speed 2634.75 samples/sec Loss 5.8233 LearningRate 0.0187 Epoch: 11 Global Step: 470400 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:36:11,760-Speed 2630.88 samples/sec Loss 5.7773 LearningRate 0.0187 Epoch: 11 Global Step: 470410 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:36:15,655-Speed 2629.24 samples/sec Loss 5.8311 LearningRate 0.0187 Epoch: 11 Global Step: 470420 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:36:19,591-Speed 2602.12 samples/sec Loss 5.7613 LearningRate 0.0187 Epoch: 11 Global Step: 470430 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:36:23,482-Speed 2632.76 samples/sec Loss 5.6981 LearningRate 0.0187 Epoch: 11 Global Step: 470440 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:36:27,377-Speed 2630.01 samples/sec Loss 5.8510 LearningRate 0.0187 Epoch: 11 Global Step: 470450 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:36:31,281-Speed 2623.50 samples/sec Loss 5.8350 LearningRate 0.0187 Epoch: 11 Global Step: 470460 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:36:35,188-Speed 2621.46 samples/sec Loss 5.8191 LearningRate 0.0187 Epoch: 11 Global Step: 470470 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:36:39,093-Speed 2623.40 samples/sec Loss 5.7758 LearningRate 0.0187 Epoch: 11 Global Step: 470480 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:36:43,000-Speed 2621.44 samples/sec Loss 5.7272 LearningRate 0.0187 Epoch: 11 Global Step: 470490 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:36:46,898-Speed 2627.49 samples/sec Loss 5.8304 LearningRate 0.0187 Epoch: 11 Global Step: 470500 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:36:50,775-Speed 2642.04 samples/sec Loss 5.7452 LearningRate 0.0187 Epoch: 11 Global Step: 470510 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:36:54,674-Speed 2626.58 samples/sec Loss 5.9169 LearningRate 0.0187 Epoch: 11 Global Step: 470520 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:36:58,605-Speed 2606.12 samples/sec Loss 5.7375 LearningRate 0.0187 Epoch: 11 Global Step: 470530 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:37:02,503-Speed 2627.94 samples/sec Loss 5.8557 LearningRate 0.0187 Epoch: 11 Global Step: 470540 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:37:06,416-Speed 2617.73 samples/sec Loss 5.7527 LearningRate 0.0187 Epoch: 11 Global Step: 470550 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:37:10,316-Speed 2626.14 samples/sec Loss 5.8481 LearningRate 0.0187 Epoch: 11 Global Step: 470560 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:37:14,215-Speed 2627.09 samples/sec Loss 5.8721 LearningRate 0.0187 Epoch: 11 Global Step: 470570 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:37:18,149-Speed 2603.24 samples/sec Loss 5.7957 LearningRate 0.0187 Epoch: 11 Global Step: 470580 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:37:22,088-Speed 2600.91 samples/sec Loss 5.8943 LearningRate 0.0187 Epoch: 11 Global Step: 470590 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:37:26,009-Speed 2612.14 samples/sec Loss 5.8342 LearningRate 0.0187 Epoch: 11 Global Step: 470600 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:37:29,920-Speed 2619.46 samples/sec Loss 5.8777 LearningRate 0.0187 Epoch: 11 Global Step: 470610 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:37:33,816-Speed 2629.05 samples/sec Loss 5.8103 LearningRate 0.0187 Epoch: 11 Global Step: 470620 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:37:37,695-Speed 2640.60 samples/sec Loss 5.7515 LearningRate 0.0187 Epoch: 11 Global Step: 470630 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:37:41,641-Speed 2595.56 samples/sec Loss 5.7218 LearningRate 0.0187 Epoch: 11 Global Step: 470640 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:37:45,541-Speed 2626.28 samples/sec Loss 5.7206 LearningRate 0.0187 Epoch: 11 Global Step: 470650 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:37:49,468-Speed 2608.48 samples/sec Loss 5.8645 LearningRate 0.0187 Epoch: 11 Global Step: 470660 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:37:53,363-Speed 2629.68 samples/sec Loss 5.8214 LearningRate 0.0187 Epoch: 11 Global Step: 470670 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:37:57,256-Speed 2630.99 samples/sec Loss 5.7946 LearningRate 0.0187 Epoch: 11 Global Step: 470680 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:38:01,152-Speed 2629.60 samples/sec Loss 5.7021 LearningRate 0.0187 Epoch: 11 Global Step: 470690 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:38:05,049-Speed 2627.71 samples/sec Loss 5.7674 LearningRate 0.0187 Epoch: 11 Global Step: 470700 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:38:08,963-Speed 2617.33 samples/sec Loss 5.8250 LearningRate 0.0187 Epoch: 11 Global Step: 470710 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:38:12,858-Speed 2629.33 samples/sec Loss 5.8489 LearningRate 0.0187 Epoch: 11 Global Step: 470720 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:38:16,750-Speed 2631.84 samples/sec Loss 5.7652 LearningRate 0.0187 Epoch: 11 Global Step: 470730 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:38:20,643-Speed 2630.71 samples/sec Loss 5.7407 LearningRate 0.0187 Epoch: 11 Global Step: 470740 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:38:24,535-Speed 2631.39 samples/sec Loss 5.8404 LearningRate 0.0187 Epoch: 11 Global Step: 470750 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:38:28,426-Speed 2632.76 samples/sec Loss 5.8447 LearningRate 0.0187 Epoch: 11 Global Step: 470760 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:38:32,323-Speed 2628.56 samples/sec Loss 5.7291 LearningRate 0.0187 Epoch: 11 Global Step: 470770 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:38:36,217-Speed 2630.03 samples/sec Loss 5.8002 LearningRate 0.0187 Epoch: 11 Global Step: 470780 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:38:40,097-Speed 2639.72 samples/sec Loss 5.7480 LearningRate 0.0187 Epoch: 11 Global Step: 470790 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:38:44,025-Speed 2607.43 samples/sec Loss 5.8628 LearningRate 0.0187 Epoch: 11 Global Step: 470800 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:38:47,923-Speed 2628.20 samples/sec Loss 5.7057 LearningRate 0.0187 Epoch: 11 Global Step: 470810 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:38:51,819-Speed 2628.82 samples/sec Loss 5.9126 LearningRate 0.0187 Epoch: 11 Global Step: 470820 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:38:55,711-Speed 2631.59 samples/sec Loss 5.7347 LearningRate 0.0187 Epoch: 11 Global Step: 470830 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:38:59,607-Speed 2629.08 samples/sec Loss 5.8270 LearningRate 0.0187 Epoch: 11 Global Step: 470840 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:39:03,525-Speed 2614.80 samples/sec Loss 5.8588 LearningRate 0.0187 Epoch: 11 Global Step: 470850 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:39:07,419-Speed 2630.23 samples/sec Loss 5.8187 LearningRate 0.0187 Epoch: 11 Global Step: 470860 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:39:11,318-Speed 2626.92 samples/sec Loss 5.7033 LearningRate 0.0187 Epoch: 11 Global Step: 470870 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:39:15,217-Speed 2627.46 samples/sec Loss 5.9110 LearningRate 0.0187 Epoch: 11 Global Step: 470880 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:39:19,119-Speed 2624.81 samples/sec Loss 5.7206 LearningRate 0.0187 Epoch: 11 Global Step: 470890 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:39:23,031-Speed 2618.49 samples/sec Loss 5.7898 LearningRate 0.0187 Epoch: 11 Global Step: 470900 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:39:26,925-Speed 2630.51 samples/sec Loss 5.8265 LearningRate 0.0187 Epoch: 11 Global Step: 470910 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:39:30,819-Speed 2629.60 samples/sec Loss 5.8256 LearningRate 0.0187 Epoch: 11 Global Step: 470920 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:39:34,723-Speed 2623.48 samples/sec Loss 5.7945 LearningRate 0.0187 Epoch: 11 Global Step: 470930 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:39:38,623-Speed 2626.61 samples/sec Loss 5.8412 LearningRate 0.0187 Epoch: 11 Global Step: 470940 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:39:42,493-Speed 2646.74 samples/sec Loss 5.7888 LearningRate 0.0187 Epoch: 11 Global Step: 470950 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:39:46,383-Speed 2633.29 samples/sec Loss 5.8050 LearningRate 0.0187 Epoch: 11 Global Step: 470960 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:39:50,290-Speed 2621.32 samples/sec Loss 5.7931 LearningRate 0.0187 Epoch: 11 Global Step: 470970 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:39:54,188-Speed 2627.12 samples/sec Loss 5.8263 LearningRate 0.0187 Epoch: 11 Global Step: 470980 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:39:58,088-Speed 2626.41 samples/sec Loss 5.7869 LearningRate 0.0187 Epoch: 11 Global Step: 470990 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:40:01,981-Speed 2631.11 samples/sec Loss 5.7808 LearningRate 0.0187 Epoch: 11 Global Step: 471000 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:40:05,886-Speed 2622.73 samples/sec Loss 5.8193 LearningRate 0.0187 Epoch: 11 Global Step: 471010 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:40:09,784-Speed 2627.31 samples/sec Loss 5.7426 LearningRate 0.0187 Epoch: 11 Global Step: 471020 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:40:13,680-Speed 2629.20 samples/sec Loss 5.8271 LearningRate 0.0187 Epoch: 11 Global Step: 471030 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:40:17,591-Speed 2619.15 samples/sec Loss 5.7715 LearningRate 0.0187 Epoch: 11 Global Step: 471040 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:40:21,488-Speed 2628.56 samples/sec Loss 5.7929 LearningRate 0.0187 Epoch: 11 Global Step: 471050 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:40:25,393-Speed 2622.27 samples/sec Loss 5.6498 LearningRate 0.0187 Epoch: 11 Global Step: 471060 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:40:29,287-Speed 2630.40 samples/sec Loss 5.8589 LearningRate 0.0187 Epoch: 11 Global Step: 471070 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:40:33,187-Speed 2626.42 samples/sec Loss 5.8191 LearningRate 0.0187 Epoch: 11 Global Step: 471080 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:40:37,088-Speed 2625.30 samples/sec Loss 5.8537 LearningRate 0.0187 Epoch: 11 Global Step: 471090 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:40:40,980-Speed 2631.44 samples/sec Loss 5.7775 LearningRate 0.0187 Epoch: 11 Global Step: 471100 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:40:44,903-Speed 2611.15 samples/sec Loss 5.7208 LearningRate 0.0187 Epoch: 11 Global Step: 471110 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:40:48,828-Speed 2609.29 samples/sec Loss 5.7718 LearningRate 0.0187 Epoch: 11 Global Step: 471120 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:40:52,732-Speed 2624.25 samples/sec Loss 5.7723 LearningRate 0.0187 Epoch: 11 Global Step: 471130 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:40:56,626-Speed 2630.12 samples/sec Loss 5.7358 LearningRate 0.0187 Epoch: 11 Global Step: 471140 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:41:00,509-Speed 2637.38 samples/sec Loss 5.8573 LearningRate 0.0187 Epoch: 11 Global Step: 471150 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:41:04,421-Speed 2618.05 samples/sec Loss 5.6513 LearningRate 0.0187 Epoch: 11 Global Step: 471160 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:41:08,326-Speed 2623.30 samples/sec Loss 5.6735 LearningRate 0.0187 Epoch: 11 Global Step: 471170 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:41:12,223-Speed 2627.86 samples/sec Loss 5.8220 LearningRate 0.0187 Epoch: 11 Global Step: 471180 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:41:16,100-Speed 2641.44 samples/sec Loss 5.7528 LearningRate 0.0187 Epoch: 11 Global Step: 471190 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:41:20,088-Speed 2568.42 samples/sec Loss 5.8249 LearningRate 0.0187 Epoch: 11 Global Step: 471200 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:41:23,991-Speed 2624.05 samples/sec Loss 5.6949 LearningRate 0.0187 Epoch: 11 Global Step: 471210 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:41:27,888-Speed 2628.47 samples/sec Loss 5.8525 LearningRate 0.0187 Epoch: 11 Global Step: 471220 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:41:31,782-Speed 2630.51 samples/sec Loss 5.9149 LearningRate 0.0187 Epoch: 11 Global Step: 471230 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:41:35,680-Speed 2627.64 samples/sec Loss 5.7454 LearningRate 0.0187 Epoch: 11 Global Step: 471240 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:41:39,576-Speed 2629.24 samples/sec Loss 5.6754 LearningRate 0.0187 Epoch: 11 Global Step: 471250 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:41:43,474-Speed 2627.52 samples/sec Loss 5.6764 LearningRate 0.0187 Epoch: 11 Global Step: 471260 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:41:47,371-Speed 2628.08 samples/sec Loss 5.7659 LearningRate 0.0187 Epoch: 11 Global Step: 471270 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:41:51,266-Speed 2629.88 samples/sec Loss 5.7763 LearningRate 0.0187 Epoch: 11 Global Step: 471280 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:41:55,166-Speed 2625.68 samples/sec Loss 5.8241 LearningRate 0.0187 Epoch: 11 Global Step: 471290 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:41:59,075-Speed 2620.38 samples/sec Loss 5.7338 LearningRate 0.0187 Epoch: 11 Global Step: 471300 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:42:02,983-Speed 2628.10 samples/sec Loss 5.8107 LearningRate 0.0187 Epoch: 11 Global Step: 471310 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:42:06,891-Speed 2621.20 samples/sec Loss 5.7798 LearningRate 0.0187 Epoch: 11 Global Step: 471320 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:42:10,791-Speed 2626.26 samples/sec Loss 5.7301 LearningRate 0.0186 Epoch: 11 Global Step: 471330 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:42:14,660-Speed 2647.46 samples/sec Loss 5.8171 LearningRate 0.0186 Epoch: 11 Global Step: 471340 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:42:18,550-Speed 2632.85 samples/sec Loss 5.8682 LearningRate 0.0186 Epoch: 11 Global Step: 471350 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:42:22,448-Speed 2627.72 samples/sec Loss 5.7629 LearningRate 0.0186 Epoch: 11 Global Step: 471360 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:42:26,346-Speed 2627.63 samples/sec Loss 5.7548 LearningRate 0.0186 Epoch: 11 Global Step: 471370 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:42:30,246-Speed 2626.05 samples/sec Loss 5.7625 LearningRate 0.0186 Epoch: 11 Global Step: 471380 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:42:34,168-Speed 2611.58 samples/sec Loss 5.7422 LearningRate 0.0186 Epoch: 11 Global Step: 471390 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:42:38,080-Speed 2617.90 samples/sec Loss 5.7820 LearningRate 0.0186 Epoch: 11 Global Step: 471400 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:42:41,974-Speed 2630.45 samples/sec Loss 5.8317 LearningRate 0.0186 Epoch: 11 Global Step: 471410 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:42:45,867-Speed 2631.26 samples/sec Loss 5.8588 LearningRate 0.0186 Epoch: 11 Global Step: 471420 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:42:49,761-Speed 2630.43 samples/sec Loss 5.7617 LearningRate 0.0186 Epoch: 11 Global Step: 471430 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:42:53,657-Speed 2628.61 samples/sec Loss 5.6840 LearningRate 0.0186 Epoch: 11 Global Step: 471440 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:42:57,549-Speed 2631.87 samples/sec Loss 5.7235 LearningRate 0.0186 Epoch: 11 Global Step: 471450 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:43:01,446-Speed 2627.97 samples/sec Loss 5.8358 LearningRate 0.0186 Epoch: 11 Global Step: 471460 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:43:05,324-Speed 2641.14 samples/sec Loss 5.7624 LearningRate 0.0186 Epoch: 11 Global Step: 471470 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:43:09,234-Speed 2619.51 samples/sec Loss 5.7010 LearningRate 0.0186 Epoch: 11 Global Step: 471480 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:43:13,130-Speed 2629.47 samples/sec Loss 5.8047 LearningRate 0.0186 Epoch: 11 Global Step: 471490 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:43:17,029-Speed 2626.88 samples/sec Loss 5.7783 LearningRate 0.0186 Epoch: 11 Global Step: 471500 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:43:20,928-Speed 2627.24 samples/sec Loss 5.8377 LearningRate 0.0186 Epoch: 11 Global Step: 471510 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:43:24,863-Speed 2602.34 samples/sec Loss 5.8335 LearningRate 0.0186 Epoch: 11 Global Step: 471520 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:43:28,755-Speed 2632.38 samples/sec Loss 5.6835 LearningRate 0.0186 Epoch: 11 Global Step: 471530 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:43:32,650-Speed 2629.77 samples/sec Loss 5.7121 LearningRate 0.0186 Epoch: 11 Global Step: 471540 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:43:36,547-Speed 2627.79 samples/sec Loss 5.6951 LearningRate 0.0186 Epoch: 11 Global Step: 471550 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:43:40,441-Speed 2630.24 samples/sec Loss 5.7133 LearningRate 0.0186 Epoch: 11 Global Step: 471560 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:43:44,335-Speed 2631.20 samples/sec Loss 5.7417 LearningRate 0.0186 Epoch: 11 Global Step: 471570 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:43:48,231-Speed 2629.04 samples/sec Loss 5.7276 LearningRate 0.0186 Epoch: 11 Global Step: 471580 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:43:52,114-Speed 2637.31 samples/sec Loss 5.7730 LearningRate 0.0186 Epoch: 11 Global Step: 471590 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:43:56,008-Speed 2630.86 samples/sec Loss 5.8480 LearningRate 0.0186 Epoch: 11 Global Step: 471600 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:43:59,912-Speed 2623.51 samples/sec Loss 5.7351 LearningRate 0.0186 Epoch: 11 Global Step: 471610 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:44:03,825-Speed 2617.19 samples/sec Loss 5.7795 LearningRate 0.0186 Epoch: 11 Global Step: 471620 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:44:07,732-Speed 2621.90 samples/sec Loss 5.7055 LearningRate 0.0186 Epoch: 11 Global Step: 471630 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:44:11,674-Speed 2598.45 samples/sec Loss 5.8706 LearningRate 0.0186 Epoch: 11 Global Step: 471640 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:44:15,655-Speed 2572.89 samples/sec Loss 5.8145 LearningRate 0.0186 Epoch: 11 Global Step: 471650 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:44:19,553-Speed 2627.50 samples/sec Loss 5.8489 LearningRate 0.0186 Epoch: 11 Global Step: 471660 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:44:23,455-Speed 2626.01 samples/sec Loss 5.8148 LearningRate 0.0186 Epoch: 11 Global Step: 471670 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:44:27,348-Speed 2631.10 samples/sec Loss 5.7732 LearningRate 0.0186 Epoch: 11 Global Step: 471680 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:44:31,242-Speed 2630.01 samples/sec Loss 5.7894 LearningRate 0.0186 Epoch: 11 Global Step: 471690 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:44:35,137-Speed 2629.60 samples/sec Loss 5.7180 LearningRate 0.0186 Epoch: 11 Global Step: 471700 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:44:39,061-Speed 2610.35 samples/sec Loss 5.8348 LearningRate 0.0186 Epoch: 11 Global Step: 471710 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:44:42,959-Speed 2627.92 samples/sec Loss 5.7306 LearningRate 0.0186 Epoch: 11 Global Step: 471720 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:44:46,906-Speed 2595.09 samples/sec Loss 5.7741 LearningRate 0.0186 Epoch: 11 Global Step: 471730 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:44:50,818-Speed 2618.43 samples/sec Loss 5.6723 LearningRate 0.0186 Epoch: 11 Global Step: 471740 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:44:54,715-Speed 2628.29 samples/sec Loss 5.6341 LearningRate 0.0186 Epoch: 11 Global Step: 471750 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:44:58,632-Speed 2615.25 samples/sec Loss 5.7421 LearningRate 0.0186 Epoch: 11 Global Step: 471760 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:45:02,540-Speed 2620.58 samples/sec Loss 5.7681 LearningRate 0.0186 Epoch: 11 Global Step: 471770 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:45:06,439-Speed 2627.04 samples/sec Loss 5.7180 LearningRate 0.0186 Epoch: 11 Global Step: 471780 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:45:10,355-Speed 2615.46 samples/sec Loss 5.8153 LearningRate 0.0186 Epoch: 11 Global Step: 471790 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:45:14,269-Speed 2616.71 samples/sec Loss 5.7241 LearningRate 0.0186 Epoch: 11 Global Step: 471800 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:45:18,162-Speed 2630.99 samples/sec Loss 5.6661 LearningRate 0.0186 Epoch: 11 Global Step: 471810 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:45:22,123-Speed 2586.51 samples/sec Loss 5.8484 LearningRate 0.0186 Epoch: 11 Global Step: 471820 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:45:26,022-Speed 2626.80 samples/sec Loss 5.7559 LearningRate 0.0186 Epoch: 11 Global Step: 471830 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:45:29,932-Speed 2619.37 samples/sec Loss 5.7939 LearningRate 0.0186 Epoch: 11 Global Step: 471840 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:45:33,830-Speed 2627.79 samples/sec Loss 5.7949 LearningRate 0.0186 Epoch: 11 Global Step: 471850 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:45:37,748-Speed 2614.42 samples/sec Loss 5.7854 LearningRate 0.0186 Epoch: 11 Global Step: 471860 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:45:41,647-Speed 2626.58 samples/sec Loss 5.7882 LearningRate 0.0186 Epoch: 11 Global Step: 471870 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:45:45,547-Speed 2627.11 samples/sec Loss 5.6469 LearningRate 0.0186 Epoch: 11 Global Step: 471880 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:45:49,442-Speed 2629.77 samples/sec Loss 5.7145 LearningRate 0.0186 Epoch: 11 Global Step: 471890 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:45:53,340-Speed 2628.37 samples/sec Loss 5.7172 LearningRate 0.0186 Epoch: 11 Global Step: 471900 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:45:57,214-Speed 2643.74 samples/sec Loss 5.7874 LearningRate 0.0186 Epoch: 11 Global Step: 471910 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:46:01,110-Speed 2629.23 samples/sec Loss 5.7900 LearningRate 0.0186 Epoch: 11 Global Step: 471920 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:46:05,002-Speed 2631.44 samples/sec Loss 5.7832 LearningRate 0.0186 Epoch: 11 Global Step: 471930 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:46:08,895-Speed 2632.10 samples/sec Loss 5.7117 LearningRate 0.0186 Epoch: 11 Global Step: 471940 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:46:12,793-Speed 2627.71 samples/sec Loss 5.7754 LearningRate 0.0186 Epoch: 11 Global Step: 471950 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:46:16,688-Speed 2629.69 samples/sec Loss 5.6696 LearningRate 0.0186 Epoch: 11 Global Step: 471960 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:46:20,588-Speed 2626.17 samples/sec Loss 5.8059 LearningRate 0.0186 Epoch: 11 Global Step: 471970 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:46:24,483-Speed 2629.94 samples/sec Loss 5.7363 LearningRate 0.0186 Epoch: 11 Global Step: 471980 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:46:28,374-Speed 2632.11 samples/sec Loss 5.8496 LearningRate 0.0186 Epoch: 11 Global Step: 471990 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:46:32,244-Speed 2646.32 samples/sec Loss 5.8370 LearningRate 0.0186 Epoch: 11 Global Step: 472000 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:46:36,151-Speed 2621.63 samples/sec Loss 5.7601 LearningRate 0.0186 Epoch: 11 Global Step: 472010 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:46:40,081-Speed 2606.62 samples/sec Loss 5.9388 LearningRate 0.0186 Epoch: 11 Global Step: 472020 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:46:43,978-Speed 2628.47 samples/sec Loss 5.6813 LearningRate 0.0186 Epoch: 11 Global Step: 472030 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:46:47,875-Speed 2628.70 samples/sec Loss 5.7236 LearningRate 0.0186 Epoch: 11 Global Step: 472040 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:46:51,780-Speed 2623.03 samples/sec Loss 5.7295 LearningRate 0.0186 Epoch: 11 Global Step: 472050 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:46:55,681-Speed 2625.54 samples/sec Loss 5.8198 LearningRate 0.0186 Epoch: 11 Global Step: 472060 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:46:59,622-Speed 2598.71 samples/sec Loss 5.7485 LearningRate 0.0186 Epoch: 11 Global Step: 472070 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:47:03,518-Speed 2629.28 samples/sec Loss 5.8389 LearningRate 0.0186 Epoch: 11 Global Step: 472080 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:47:07,446-Speed 2607.71 samples/sec Loss 5.6527 LearningRate 0.0186 Epoch: 11 Global Step: 472090 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:47:11,343-Speed 2628.38 samples/sec Loss 5.7400 LearningRate 0.0186 Epoch: 11 Global Step: 472100 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:47:15,241-Speed 2627.07 samples/sec Loss 5.7021 LearningRate 0.0186 Epoch: 11 Global Step: 472110 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:47:19,144-Speed 2625.05 samples/sec Loss 5.7643 LearningRate 0.0186 Epoch: 11 Global Step: 472120 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:47:23,038-Speed 2629.82 samples/sec Loss 5.8283 LearningRate 0.0186 Epoch: 11 Global Step: 472130 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:47:26,932-Speed 2630.74 samples/sec Loss 5.8021 LearningRate 0.0186 Epoch: 11 Global Step: 472140 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:47:30,825-Speed 2631.04 samples/sec Loss 5.6945 LearningRate 0.0186 Epoch: 11 Global Step: 472150 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:47:34,722-Speed 2628.44 samples/sec Loss 5.6962 LearningRate 0.0186 Epoch: 11 Global Step: 472160 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:47:38,620-Speed 2627.35 samples/sec Loss 5.7174 LearningRate 0.0186 Epoch: 11 Global Step: 472170 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:47:42,521-Speed 2626.37 samples/sec Loss 5.7837 LearningRate 0.0186 Epoch: 11 Global Step: 472180 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:47:46,401-Speed 2639.18 samples/sec Loss 5.8364 LearningRate 0.0186 Epoch: 11 Global Step: 472190 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:47:50,304-Speed 2624.78 samples/sec Loss 5.6361 LearningRate 0.0186 Epoch: 11 Global Step: 472200 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:47:54,213-Speed 2619.69 samples/sec Loss 5.8134 LearningRate 0.0186 Epoch: 11 Global Step: 472210 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:47:58,111-Speed 2628.06 samples/sec Loss 5.7331 LearningRate 0.0186 Epoch: 11 Global Step: 472220 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:48:02,004-Speed 2630.93 samples/sec Loss 5.9344 LearningRate 0.0186 Epoch: 11 Global Step: 472230 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:48:05,899-Speed 2629.72 samples/sec Loss 5.8180 LearningRate 0.0186 Epoch: 11 Global Step: 472240 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:48:09,808-Speed 2619.99 samples/sec Loss 5.7241 LearningRate 0.0186 Epoch: 11 Global Step: 472250 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:48:13,712-Speed 2624.22 samples/sec Loss 5.8318 LearningRate 0.0186 Epoch: 11 Global Step: 472260 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:48:17,625-Speed 2617.23 samples/sec Loss 5.7190 LearningRate 0.0186 Epoch: 11 Global Step: 472270 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:48:21,575-Speed 2593.70 samples/sec Loss 5.7364 LearningRate 0.0186 Epoch: 11 Global Step: 472280 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:48:25,476-Speed 2625.77 samples/sec Loss 5.7604 LearningRate 0.0185 Epoch: 11 Global Step: 472290 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:48:29,383-Speed 2622.05 samples/sec Loss 5.7686 LearningRate 0.0185 Epoch: 11 Global Step: 472300 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:48:33,270-Speed 2634.45 samples/sec Loss 5.6565 LearningRate 0.0185 Epoch: 11 Global Step: 472310 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:48:37,192-Speed 2611.23 samples/sec Loss 5.7005 LearningRate 0.0185 Epoch: 11 Global Step: 472320 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:48:41,095-Speed 2624.72 samples/sec Loss 5.7576 LearningRate 0.0185 Epoch: 11 Global Step: 472330 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:48:44,988-Speed 2631.65 samples/sec Loss 5.7785 LearningRate 0.0185 Epoch: 11 Global Step: 472340 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:48:48,887-Speed 2626.83 samples/sec Loss 5.7198 LearningRate 0.0185 Epoch: 11 Global Step: 472350 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:48:52,794-Speed 2621.74 samples/sec Loss 5.7456 LearningRate 0.0185 Epoch: 11 Global Step: 472360 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:48:56,719-Speed 2609.37 samples/sec Loss 5.7714 LearningRate 0.0185 Epoch: 11 Global Step: 472370 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:49:00,614-Speed 2629.76 samples/sec Loss 5.7699 LearningRate 0.0185 Epoch: 11 Global Step: 472380 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:49:04,517-Speed 2624.03 samples/sec Loss 5.8010 LearningRate 0.0185 Epoch: 11 Global Step: 472390 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:49:08,430-Speed 2617.68 samples/sec Loss 5.9077 LearningRate 0.0185 Epoch: 11 Global Step: 472400 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:49:12,332-Speed 2625.22 samples/sec Loss 5.8261 LearningRate 0.0185 Epoch: 11 Global Step: 472410 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:49:16,240-Speed 2620.19 samples/sec Loss 5.7743 LearningRate 0.0185 Epoch: 11 Global Step: 472420 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:49:20,147-Speed 2621.93 samples/sec Loss 5.7196 LearningRate 0.0185 Epoch: 11 Global Step: 472430 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:49:24,053-Speed 2622.37 samples/sec Loss 5.7176 LearningRate 0.0185 Epoch: 11 Global Step: 472440 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:49:27,953-Speed 2626.11 samples/sec Loss 5.8087 LearningRate 0.0185 Epoch: 11 Global Step: 472450 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:49:31,849-Speed 2628.66 samples/sec Loss 5.7716 LearningRate 0.0185 Epoch: 11 Global Step: 472460 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:49:35,743-Speed 2630.77 samples/sec Loss 5.8117 LearningRate 0.0185 Epoch: 11 Global Step: 472470 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:49:39,657-Speed 2616.84 samples/sec Loss 5.7836 LearningRate 0.0185 Epoch: 11 Global Step: 472480 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:49:43,596-Speed 2600.13 samples/sec Loss 5.7500 LearningRate 0.0185 Epoch: 11 Global Step: 472490 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:49:47,508-Speed 2618.26 samples/sec Loss 5.8277 LearningRate 0.0185 Epoch: 11 Global Step: 472500 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:49:51,384-Speed 2642.67 samples/sec Loss 5.7275 LearningRate 0.0185 Epoch: 11 Global Step: 472510 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:49:55,278-Speed 2629.61 samples/sec Loss 5.7763 LearningRate 0.0185 Epoch: 11 Global Step: 472520 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:49:59,174-Speed 2629.43 samples/sec Loss 5.7648 LearningRate 0.0185 Epoch: 11 Global Step: 472530 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:50:03,070-Speed 2628.79 samples/sec Loss 5.7055 LearningRate 0.0185 Epoch: 11 Global Step: 472540 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:50:06,965-Speed 2629.34 samples/sec Loss 5.8050 LearningRate 0.0185 Epoch: 11 Global Step: 472550 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:50:10,862-Speed 2628.16 samples/sec Loss 5.8111 LearningRate 0.0185 Epoch: 11 Global Step: 472560 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:50:14,756-Speed 2630.79 samples/sec Loss 5.6841 LearningRate 0.0185 Epoch: 11 Global Step: 472570 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:50:18,655-Speed 2626.97 samples/sec Loss 5.7443 LearningRate 0.0185 Epoch: 11 Global Step: 472580 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:50:22,556-Speed 2625.58 samples/sec Loss 5.7173 LearningRate 0.0185 Epoch: 11 Global Step: 472590 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:50:26,428-Speed 2644.80 samples/sec Loss 5.7073 LearningRate 0.0185 Epoch: 11 Global Step: 472600 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:50:30,322-Speed 2630.36 samples/sec Loss 5.7129 LearningRate 0.0185 Epoch: 11 Global Step: 472610 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:50:34,220-Speed 2627.78 samples/sec Loss 5.8000 LearningRate 0.0185 Epoch: 11 Global Step: 472620 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:50:38,118-Speed 2627.33 samples/sec Loss 5.7820 LearningRate 0.0185 Epoch: 11 Global Step: 472630 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:50:42,013-Speed 2629.68 samples/sec Loss 5.7747 LearningRate 0.0185 Epoch: 11 Global Step: 472640 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:50:45,908-Speed 2629.46 samples/sec Loss 5.7727 LearningRate 0.0185 Epoch: 11 Global Step: 472650 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:50:49,815-Speed 2621.95 samples/sec Loss 5.7847 LearningRate 0.0185 Epoch: 11 Global Step: 472660 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:50:53,712-Speed 2628.34 samples/sec Loss 5.7553 LearningRate 0.0185 Epoch: 11 Global Step: 472670 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:50:57,626-Speed 2616.55 samples/sec Loss 5.7554 LearningRate 0.0185 Epoch: 11 Global Step: 472680 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:51:01,523-Speed 2628.28 samples/sec Loss 5.7333 LearningRate 0.0185 Epoch: 11 Global Step: 472690 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:51:05,425-Speed 2625.66 samples/sec Loss 5.7870 LearningRate 0.0185 Epoch: 11 Global Step: 472700 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:51:09,296-Speed 2645.88 samples/sec Loss 5.8218 LearningRate 0.0185 Epoch: 11 Global Step: 472710 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:51:13,196-Speed 2626.16 samples/sec Loss 5.6646 LearningRate 0.0185 Epoch: 11 Global Step: 472720 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:51:17,102-Speed 2621.93 samples/sec Loss 5.6499 LearningRate 0.0185 Epoch: 11 Global Step: 472730 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:51:20,996-Speed 2630.22 samples/sec Loss 5.7705 LearningRate 0.0185 Epoch: 11 Global Step: 472740 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:51:24,894-Speed 2627.38 samples/sec Loss 5.8804 LearningRate 0.0185 Epoch: 11 Global Step: 472750 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:51:28,791-Speed 2628.42 samples/sec Loss 5.7373 LearningRate 0.0185 Epoch: 11 Global Step: 472760 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:51:32,686-Speed 2630.20 samples/sec Loss 5.7661 LearningRate 0.0185 Epoch: 11 Global Step: 472770 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:51:36,581-Speed 2629.43 samples/sec Loss 5.8005 LearningRate 0.0185 Epoch: 11 Global Step: 472780 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:51:40,479-Speed 2627.99 samples/sec Loss 5.7639 LearningRate 0.0185 Epoch: 11 Global Step: 472790 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:51:44,375-Speed 2628.80 samples/sec Loss 5.6965 LearningRate 0.0185 Epoch: 11 Global Step: 472800 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:51:48,242-Speed 2648.28 samples/sec Loss 5.7496 LearningRate 0.0185 Epoch: 11 Global Step: 472810 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:51:52,145-Speed 2624.53 samples/sec Loss 5.7593 LearningRate 0.0185 Epoch: 11 Global Step: 472820 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:51:56,043-Speed 2626.87 samples/sec Loss 5.7504 LearningRate 0.0185 Epoch: 11 Global Step: 472830 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:51:59,955-Speed 2618.46 samples/sec Loss 5.6813 LearningRate 0.0185 Epoch: 11 Global Step: 472840 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:52:03,855-Speed 2625.97 samples/sec Loss 5.5505 LearningRate 0.0185 Epoch: 11 Global Step: 472850 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:52:07,754-Speed 2627.08 samples/sec Loss 5.8792 LearningRate 0.0185 Epoch: 11 Global Step: 472860 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:52:11,659-Speed 2623.14 samples/sec Loss 5.6960 LearningRate 0.0185 Epoch: 11 Global Step: 472870 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:52:15,567-Speed 2620.67 samples/sec Loss 5.7448 LearningRate 0.0185 Epoch: 11 Global Step: 472880 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:52:19,463-Speed 2629.07 samples/sec Loss 5.7692 LearningRate 0.0185 Epoch: 11 Global Step: 472890 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:52:23,357-Speed 2630.60 samples/sec Loss 5.7535 LearningRate 0.0185 Epoch: 11 Global Step: 472900 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:52:27,268-Speed 2618.29 samples/sec Loss 5.8340 LearningRate 0.0185 Epoch: 11 Global Step: 472910 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:52:31,170-Speed 2624.75 samples/sec Loss 5.7151 LearningRate 0.0185 Epoch: 11 Global Step: 472920 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:52:35,069-Speed 2627.49 samples/sec Loss 5.7589 LearningRate 0.0185 Epoch: 11 Global Step: 472930 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:52:38,970-Speed 2625.63 samples/sec Loss 5.7620 LearningRate 0.0185 Epoch: 11 Global Step: 472940 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:52:42,863-Speed 2630.92 samples/sec Loss 5.7966 LearningRate 0.0185 Epoch: 11 Global Step: 472950 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:52:46,756-Speed 2630.94 samples/sec Loss 5.8070 LearningRate 0.0185 Epoch: 11 Global Step: 472960 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:52:50,649-Speed 2631.00 samples/sec Loss 5.6922 LearningRate 0.0185 Epoch: 11 Global Step: 472970 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:52:54,555-Speed 2622.38 samples/sec Loss 5.7162 LearningRate 0.0185 Epoch: 11 Global Step: 472980 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:52:58,452-Speed 2628.35 samples/sec Loss 5.7617 LearningRate 0.0185 Epoch: 11 Global Step: 472990 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:53:02,350-Speed 2627.39 samples/sec Loss 5.6266 LearningRate 0.0185 Epoch: 11 Global Step: 473000 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:53:06,246-Speed 2628.86 samples/sec Loss 5.7666 LearningRate 0.0185 Epoch: 11 Global Step: 473010 Fp16 Grad Scale: 262144 Required: 40 hours
Training: 2022-04-15 00:53:10,126-Speed 2639.81 samples/sec Loss 5.7952 LearningRate 0.0185 Epoch: 11 Global Step: 473020 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:53:14,032-Speed 2622.00 samples/sec Loss 5.7205 LearningRate 0.0185 Epoch: 11 Global Step: 473030 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:53:17,908-Speed 2642.93 samples/sec Loss 5.7032 LearningRate 0.0185 Epoch: 11 Global Step: 473040 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:53:21,802-Speed 2630.25 samples/sec Loss 5.7311 LearningRate 0.0185 Epoch: 11 Global Step: 473050 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:53:25,701-Speed 2627.12 samples/sec Loss 5.7759 LearningRate 0.0185 Epoch: 11 Global Step: 473060 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:53:29,598-Speed 2628.59 samples/sec Loss 5.7601 LearningRate 0.0185 Epoch: 11 Global Step: 473070 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:53:33,503-Speed 2622.60 samples/sec Loss 5.7247 LearningRate 0.0185 Epoch: 11 Global Step: 473080 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:53:37,398-Speed 2629.20 samples/sec Loss 5.7774 LearningRate 0.0185 Epoch: 11 Global Step: 473090 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:53:41,295-Speed 2628.04 samples/sec Loss 5.6692 LearningRate 0.0185 Epoch: 11 Global Step: 473100 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:53:45,195-Speed 2626.52 samples/sec Loss 5.6858 LearningRate 0.0185 Epoch: 11 Global Step: 473110 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:53:49,105-Speed 2619.71 samples/sec Loss 5.7694 LearningRate 0.0185 Epoch: 11 Global Step: 473120 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:53:52,997-Speed 2631.50 samples/sec Loss 5.7131 LearningRate 0.0185 Epoch: 11 Global Step: 473130 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 00:53:56,905-Speed 2621.12 samples/sec Loss 5.7602 LearningRate 0.0185 Epoch: 11 Global Step: 473140 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 00:54:00,807-Speed 2624.55 samples/sec Loss 5.8276 LearningRate 0.0185 Epoch: 11 Global Step: 473150 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 00:54:04,747-Speed 2599.78 samples/sec Loss 5.6739 LearningRate 0.0185 Epoch: 11 Global Step: 473160 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 00:54:08,644-Speed 2628.28 samples/sec Loss 5.7404 LearningRate 0.0185 Epoch: 11 Global Step: 473170 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 00:54:12,567-Speed 2611.31 samples/sec Loss 5.7973 LearningRate 0.0185 Epoch: 11 Global Step: 473180 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 00:54:16,538-Speed 2578.74 samples/sec Loss 5.7395 LearningRate 0.0185 Epoch: 11 Global Step: 473190 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 00:54:20,431-Speed 2631.22 samples/sec Loss 5.6329 LearningRate 0.0185 Epoch: 11 Global Step: 473200 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 00:54:24,327-Speed 2629.01 samples/sec Loss 5.7599 LearningRate 0.0185 Epoch: 11 Global Step: 473210 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 00:54:28,231-Speed 2623.50 samples/sec Loss 5.8151 LearningRate 0.0185 Epoch: 11 Global Step: 473220 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 00:54:32,127-Speed 2628.55 samples/sec Loss 5.7315 LearningRate 0.0185 Epoch: 11 Global Step: 473230 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:54:36,022-Speed 2629.43 samples/sec Loss 5.6918 LearningRate 0.0185 Epoch: 11 Global Step: 473240 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:54:39,922-Speed 2626.03 samples/sec Loss 5.7713 LearningRate 0.0184 Epoch: 11 Global Step: 473250 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:54:43,845-Speed 2611.65 samples/sec Loss 5.7427 LearningRate 0.0184 Epoch: 11 Global Step: 473260 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:54:47,740-Speed 2629.39 samples/sec Loss 5.6435 LearningRate 0.0184 Epoch: 11 Global Step: 473270 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:54:51,643-Speed 2624.85 samples/sec Loss 5.8448 LearningRate 0.0184 Epoch: 11 Global Step: 473280 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:54:55,537-Speed 2629.63 samples/sec Loss 5.6776 LearningRate 0.0184 Epoch: 11 Global Step: 473290 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:54:59,430-Speed 2631.47 samples/sec Loss 5.7497 LearningRate 0.0184 Epoch: 11 Global Step: 473300 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:55:03,331-Speed 2625.55 samples/sec Loss 5.7217 LearningRate 0.0184 Epoch: 11 Global Step: 473310 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:55:07,243-Speed 2617.87 samples/sec Loss 5.7661 LearningRate 0.0184 Epoch: 11 Global Step: 473320 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:55:11,163-Speed 2612.89 samples/sec Loss 5.5565 LearningRate 0.0184 Epoch: 11 Global Step: 473330 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:55:15,060-Speed 2628.31 samples/sec Loss 5.6658 LearningRate 0.0184 Epoch: 11 Global Step: 473340 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:55:18,954-Speed 2630.08 samples/sec Loss 5.6290 LearningRate 0.0184 Epoch: 11 Global Step: 473350 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:55:22,824-Speed 2647.14 samples/sec Loss 5.6910 LearningRate 0.0184 Epoch: 11 Global Step: 473360 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:55:26,720-Speed 2628.50 samples/sec Loss 5.7984 LearningRate 0.0184 Epoch: 11 Global Step: 473370 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:55:30,618-Speed 2628.21 samples/sec Loss 5.8857 LearningRate 0.0184 Epoch: 11 Global Step: 473380 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:55:34,515-Speed 2627.90 samples/sec Loss 5.7198 LearningRate 0.0184 Epoch: 11 Global Step: 473390 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:55:38,412-Speed 2628.13 samples/sec Loss 5.8170 LearningRate 0.0184 Epoch: 11 Global Step: 473400 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:55:42,305-Speed 2630.98 samples/sec Loss 5.7049 LearningRate 0.0184 Epoch: 11 Global Step: 473410 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:55:46,212-Speed 2621.30 samples/sec Loss 5.7848 LearningRate 0.0184 Epoch: 11 Global Step: 473420 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:55:50,108-Speed 2629.11 samples/sec Loss 5.7946 LearningRate 0.0184 Epoch: 11 Global Step: 473430 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:55:54,009-Speed 2625.24 samples/sec Loss 5.6862 LearningRate 0.0184 Epoch: 11 Global Step: 473440 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:55:57,945-Speed 2602.58 samples/sec Loss 5.7112 LearningRate 0.0184 Epoch: 11 Global Step: 473450 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:56:01,886-Speed 2599.17 samples/sec Loss 5.7228 LearningRate 0.0184 Epoch: 11 Global Step: 473460 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:56:05,777-Speed 2631.56 samples/sec Loss 5.8079 LearningRate 0.0184 Epoch: 11 Global Step: 473470 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:56:09,690-Speed 2617.96 samples/sec Loss 5.8166 LearningRate 0.0184 Epoch: 11 Global Step: 473480 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:56:13,582-Speed 2631.66 samples/sec Loss 5.8798 LearningRate 0.0184 Epoch: 11 Global Step: 473490 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:56:17,485-Speed 2623.59 samples/sec Loss 5.7135 LearningRate 0.0184 Epoch: 11 Global Step: 473500 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:56:21,393-Speed 2621.09 samples/sec Loss 5.7863 LearningRate 0.0184 Epoch: 11 Global Step: 473510 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:56:25,302-Speed 2620.26 samples/sec Loss 5.7941 LearningRate 0.0184 Epoch: 11 Global Step: 473520 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:56:29,193-Speed 2632.49 samples/sec Loss 5.7247 LearningRate 0.0184 Epoch: 11 Global Step: 473530 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:56:33,097-Speed 2623.72 samples/sec Loss 5.7693 LearningRate 0.0184 Epoch: 11 Global Step: 473540 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:56:36,990-Speed 2630.99 samples/sec Loss 5.6818 LearningRate 0.0184 Epoch: 11 Global Step: 473550 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:56:40,875-Speed 2636.77 samples/sec Loss 5.6918 LearningRate 0.0184 Epoch: 11 Global Step: 473560 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:56:44,772-Speed 2627.90 samples/sec Loss 5.7148 LearningRate 0.0184 Epoch: 11 Global Step: 473570 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:56:48,645-Speed 2644.94 samples/sec Loss 5.6602 LearningRate 0.0184 Epoch: 11 Global Step: 473580 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:56:52,558-Speed 2617.32 samples/sec Loss 5.7727 LearningRate 0.0184 Epoch: 11 Global Step: 473590 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:56:56,458-Speed 2626.17 samples/sec Loss 5.8229 LearningRate 0.0184 Epoch: 11 Global Step: 473600 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:57:00,356-Speed 2627.12 samples/sec Loss 5.7129 LearningRate 0.0184 Epoch: 11 Global Step: 473610 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:57:04,278-Speed 2611.82 samples/sec Loss 5.7108 LearningRate 0.0184 Epoch: 11 Global Step: 473620 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:57:08,180-Speed 2624.87 samples/sec Loss 5.7155 LearningRate 0.0184 Epoch: 11 Global Step: 473630 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:57:12,072-Speed 2632.01 samples/sec Loss 5.6978 LearningRate 0.0184 Epoch: 11 Global Step: 473640 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:57:15,982-Speed 2619.70 samples/sec Loss 5.8473 LearningRate 0.0184 Epoch: 11 Global Step: 473650 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:57:19,885-Speed 2623.67 samples/sec Loss 5.7627 LearningRate 0.0184 Epoch: 11 Global Step: 473660 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:57:23,782-Speed 2628.44 samples/sec Loss 5.7838 LearningRate 0.0184 Epoch: 11 Global Step: 473670 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:57:27,679-Speed 2628.01 samples/sec Loss 5.8275 LearningRate 0.0184 Epoch: 11 Global Step: 473680 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:57:31,585-Speed 2622.66 samples/sec Loss 5.7156 LearningRate 0.0184 Epoch: 11 Global Step: 473690 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:57:35,528-Speed 2597.08 samples/sec Loss 5.6338 LearningRate 0.0184 Epoch: 11 Global Step: 473700 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:57:39,440-Speed 2618.71 samples/sec Loss 5.5604 LearningRate 0.0184 Epoch: 11 Global Step: 473710 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:57:43,335-Speed 2629.01 samples/sec Loss 5.7593 LearningRate 0.0184 Epoch: 11 Global Step: 473720 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:57:47,232-Speed 2628.63 samples/sec Loss 5.6885 LearningRate 0.0184 Epoch: 11 Global Step: 473730 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:57:51,134-Speed 2625.31 samples/sec Loss 5.7414 LearningRate 0.0184 Epoch: 11 Global Step: 473740 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:57:55,025-Speed 2632.17 samples/sec Loss 5.7508 LearningRate 0.0184 Epoch: 11 Global Step: 473750 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:57:58,919-Speed 2630.28 samples/sec Loss 5.6901 LearningRate 0.0184 Epoch: 11 Global Step: 473760 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:58:02,815-Speed 2628.34 samples/sec Loss 5.7106 LearningRate 0.0184 Epoch: 11 Global Step: 473770 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:58:06,691-Speed 2643.02 samples/sec Loss 5.7326 LearningRate 0.0184 Epoch: 11 Global Step: 473780 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:58:10,598-Speed 2621.76 samples/sec Loss 5.7431 LearningRate 0.0184 Epoch: 11 Global Step: 473790 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:58:14,505-Speed 2622.15 samples/sec Loss 5.7647 LearningRate 0.0184 Epoch: 11 Global Step: 473800 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:58:18,407-Speed 2624.64 samples/sec Loss 5.7275 LearningRate 0.0184 Epoch: 11 Global Step: 473810 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:58:22,311-Speed 2623.40 samples/sec Loss 5.7583 LearningRate 0.0184 Epoch: 11 Global Step: 473820 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:58:26,203-Speed 2631.77 samples/sec Loss 5.7812 LearningRate 0.0184 Epoch: 11 Global Step: 473830 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:58:30,102-Speed 2627.33 samples/sec Loss 5.7831 LearningRate 0.0184 Epoch: 11 Global Step: 473840 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:58:34,037-Speed 2602.74 samples/sec Loss 5.8192 LearningRate 0.0184 Epoch: 11 Global Step: 473850 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:58:37,934-Speed 2628.21 samples/sec Loss 5.6950 LearningRate 0.0184 Epoch: 11 Global Step: 473860 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:58:41,830-Speed 2628.41 samples/sec Loss 5.7460 LearningRate 0.0184 Epoch: 11 Global Step: 473870 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:58:45,737-Speed 2621.76 samples/sec Loss 5.7140 LearningRate 0.0184 Epoch: 11 Global Step: 473880 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:58:49,631-Speed 2630.39 samples/sec Loss 5.8171 LearningRate 0.0184 Epoch: 11 Global Step: 473890 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:58:53,541-Speed 2619.37 samples/sec Loss 5.7663 LearningRate 0.0184 Epoch: 11 Global Step: 473900 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:58:57,437-Speed 2628.70 samples/sec Loss 5.7380 LearningRate 0.0184 Epoch: 11 Global Step: 473910 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:59:01,340-Speed 2624.26 samples/sec Loss 5.6883 LearningRate 0.0184 Epoch: 11 Global Step: 473920 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:59:05,236-Speed 2628.87 samples/sec Loss 5.5937 LearningRate 0.0184 Epoch: 11 Global Step: 473930 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:59:09,127-Speed 2632.59 samples/sec Loss 5.6194 LearningRate 0.0184 Epoch: 11 Global Step: 473940 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 00:59:13,033-Speed 2622.28 samples/sec Loss 5.8899 LearningRate 0.0184 Epoch: 11 Global Step: 473950 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:59:16,931-Speed 2627.56 samples/sec Loss 5.7171 LearningRate 0.0184 Epoch: 11 Global Step: 473960 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:59:20,825-Speed 2630.36 samples/sec Loss 5.7640 LearningRate 0.0184 Epoch: 11 Global Step: 473970 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:59:24,714-Speed 2633.11 samples/sec Loss 5.8836 LearningRate 0.0184 Epoch: 11 Global Step: 473980 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:59:28,610-Speed 2629.64 samples/sec Loss 5.7366 LearningRate 0.0184 Epoch: 11 Global Step: 473990 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:59:32,524-Speed 2616.65 samples/sec Loss 5.6740 LearningRate 0.0184 Epoch: 11 Global Step: 474000 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:59:36,418-Speed 2629.51 samples/sec Loss 5.7326 LearningRate 0.0184 Epoch: 11 Global Step: 474010 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:59:40,311-Speed 2631.20 samples/sec Loss 5.7985 LearningRate 0.0184 Epoch: 11 Global Step: 474020 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:59:44,210-Speed 2627.48 samples/sec Loss 5.7719 LearningRate 0.0184 Epoch: 11 Global Step: 474030 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:59:48,107-Speed 2628.03 samples/sec Loss 5.6277 LearningRate 0.0184 Epoch: 11 Global Step: 474040 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:59:51,983-Speed 2642.70 samples/sec Loss 5.6780 LearningRate 0.0184 Epoch: 11 Global Step: 474050 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:59:55,884-Speed 2625.53 samples/sec Loss 5.7055 LearningRate 0.0184 Epoch: 11 Global Step: 474060 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 00:59:59,793-Speed 2620.75 samples/sec Loss 5.7568 LearningRate 0.0184 Epoch: 11 Global Step: 474070 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:00:03,704-Speed 2619.03 samples/sec Loss 5.7128 LearningRate 0.0184 Epoch: 11 Global Step: 474080 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:00:07,586-Speed 2638.07 samples/sec Loss 5.7283 LearningRate 0.0184 Epoch: 11 Global Step: 474090 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:00:11,489-Speed 2623.74 samples/sec Loss 5.7051 LearningRate 0.0184 Epoch: 11 Global Step: 474100 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:00:15,385-Speed 2629.54 samples/sec Loss 5.7421 LearningRate 0.0184 Epoch: 11 Global Step: 474110 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:00:19,282-Speed 2628.53 samples/sec Loss 5.8808 LearningRate 0.0184 Epoch: 11 Global Step: 474120 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:00:23,189-Speed 2621.79 samples/sec Loss 5.6694 LearningRate 0.0184 Epoch: 11 Global Step: 474130 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:00:27,136-Speed 2594.99 samples/sec Loss 5.7610 LearningRate 0.0184 Epoch: 11 Global Step: 474140 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:00:31,040-Speed 2623.52 samples/sec Loss 5.7031 LearningRate 0.0184 Epoch: 11 Global Step: 474150 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:00:34,937-Speed 2628.13 samples/sec Loss 5.8995 LearningRate 0.0184 Epoch: 11 Global Step: 474160 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:00:38,840-Speed 2623.72 samples/sec Loss 5.6600 LearningRate 0.0184 Epoch: 11 Global Step: 474170 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:00:42,741-Speed 2625.54 samples/sec Loss 5.6928 LearningRate 0.0184 Epoch: 11 Global Step: 474180 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:00:46,642-Speed 2625.47 samples/sec Loss 5.7305 LearningRate 0.0184 Epoch: 11 Global Step: 474190 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:00:50,540-Speed 2627.92 samples/sec Loss 5.7656 LearningRate 0.0184 Epoch: 11 Global Step: 474200 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:00:54,425-Speed 2636.50 samples/sec Loss 5.7248 LearningRate 0.0184 Epoch: 11 Global Step: 474210 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:00:58,321-Speed 2629.10 samples/sec Loss 5.8208 LearningRate 0.0183 Epoch: 11 Global Step: 474220 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:01:02,216-Speed 2629.57 samples/sec Loss 5.7581 LearningRate 0.0183 Epoch: 11 Global Step: 474230 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:01:06,113-Speed 2628.23 samples/sec Loss 5.6660 LearningRate 0.0183 Epoch: 11 Global Step: 474240 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:01:10,020-Speed 2621.34 samples/sec Loss 5.7317 LearningRate 0.0183 Epoch: 11 Global Step: 474250 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:01:13,913-Speed 2631.21 samples/sec Loss 5.6724 LearningRate 0.0183 Epoch: 11 Global Step: 474260 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:01:17,818-Speed 2623.24 samples/sec Loss 5.6523 LearningRate 0.0183 Epoch: 11 Global Step: 474270 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:01:21,764-Speed 2595.32 samples/sec Loss 5.7850 LearningRate 0.0183 Epoch: 11 Global Step: 474280 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:01:25,723-Speed 2587.09 samples/sec Loss 5.7329 LearningRate 0.0183 Epoch: 11 Global Step: 474290 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:01:29,621-Speed 2633.42 samples/sec Loss 5.7816 LearningRate 0.0183 Epoch: 11 Global Step: 474300 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:01:33,517-Speed 2629.22 samples/sec Loss 5.6956 LearningRate 0.0183 Epoch: 11 Global Step: 474310 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:01:37,484-Speed 2581.91 samples/sec Loss 5.6416 LearningRate 0.0183 Epoch: 11 Global Step: 474320 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:01:41,466-Speed 2571.65 samples/sec Loss 5.7233 LearningRate 0.0183 Epoch: 11 Global Step: 474330 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:01:45,362-Speed 2628.92 samples/sec Loss 5.7621 LearningRate 0.0183 Epoch: 11 Global Step: 474340 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:01:49,238-Speed 2642.71 samples/sec Loss 5.6939 LearningRate 0.0183 Epoch: 11 Global Step: 474350 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:01:53,132-Speed 2630.71 samples/sec Loss 5.7545 LearningRate 0.0183 Epoch: 11 Global Step: 474360 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:01:57,030-Speed 2627.23 samples/sec Loss 5.7022 LearningRate 0.0183 Epoch: 11 Global Step: 474370 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:02:00,923-Speed 2630.90 samples/sec Loss 5.8043 LearningRate 0.0183 Epoch: 11 Global Step: 474380 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:02:04,801-Speed 2641.34 samples/sec Loss 5.6154 LearningRate 0.0183 Epoch: 11 Global Step: 474390 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:02:08,701-Speed 2625.74 samples/sec Loss 5.7094 LearningRate 0.0183 Epoch: 11 Global Step: 474400 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:02:12,630-Speed 2606.91 samples/sec Loss 6.0120 LearningRate 0.0183 Epoch: 11 Global Step: 474410 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:02:16,571-Speed 2599.84 samples/sec Loss 5.7428 LearningRate 0.0183 Epoch: 11 Global Step: 474420 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:02:20,489-Speed 2614.34 samples/sec Loss 5.8098 LearningRate 0.0183 Epoch: 11 Global Step: 474430 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:02:24,385-Speed 2628.27 samples/sec Loss 5.6431 LearningRate 0.0183 Epoch: 11 Global Step: 474440 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:02:28,286-Speed 2625.44 samples/sec Loss 5.7169 LearningRate 0.0183 Epoch: 11 Global Step: 474450 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:02:32,191-Speed 2623.21 samples/sec Loss 5.7571 LearningRate 0.0183 Epoch: 11 Global Step: 474460 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:02:36,092-Speed 2625.78 samples/sec Loss 5.5909 LearningRate 0.0183 Epoch: 11 Global Step: 474470 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:02:39,990-Speed 2627.50 samples/sec Loss 5.6795 LearningRate 0.0183 Epoch: 11 Global Step: 474480 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:02:43,890-Speed 2626.35 samples/sec Loss 5.7496 LearningRate 0.0183 Epoch: 11 Global Step: 474490 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:02:47,789-Speed 2626.92 samples/sec Loss 5.6960 LearningRate 0.0183 Epoch: 11 Global Step: 474500 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:02:51,683-Speed 2630.41 samples/sec Loss 5.6525 LearningRate 0.0183 Epoch: 11 Global Step: 474510 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:02:55,575-Speed 2632.16 samples/sec Loss 5.6763 LearningRate 0.0183 Epoch: 11 Global Step: 474520 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:02:59,466-Speed 2632.22 samples/sec Loss 5.8074 LearningRate 0.0183 Epoch: 11 Global Step: 474530 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:03:03,362-Speed 2628.35 samples/sec Loss 5.7501 LearningRate 0.0183 Epoch: 11 Global Step: 474540 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:03:07,254-Speed 2631.90 samples/sec Loss 5.8105 LearningRate 0.0183 Epoch: 11 Global Step: 474550 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:03:11,144-Speed 2632.70 samples/sec Loss 5.7691 LearningRate 0.0183 Epoch: 11 Global Step: 474560 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:03:15,040-Speed 2629.49 samples/sec Loss 5.7555 LearningRate 0.0183 Epoch: 11 Global Step: 474570 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:03:18,941-Speed 2625.39 samples/sec Loss 5.7793 LearningRate 0.0183 Epoch: 11 Global Step: 474580 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:03:22,823-Speed 2638.32 samples/sec Loss 5.7545 LearningRate 0.0183 Epoch: 11 Global Step: 474590 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:03:26,717-Speed 2630.51 samples/sec Loss 5.7698 LearningRate 0.0183 Epoch: 11 Global Step: 474600 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:03:30,611-Speed 2630.40 samples/sec Loss 5.7083 LearningRate 0.0183 Epoch: 11 Global Step: 474610 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:03:34,505-Speed 2629.75 samples/sec Loss 5.6289 LearningRate 0.0183 Epoch: 11 Global Step: 474620 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:03:38,398-Speed 2631.18 samples/sec Loss 5.7194 LearningRate 0.0183 Epoch: 11 Global Step: 474630 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:03:42,270-Speed 2645.15 samples/sec Loss 5.7129 LearningRate 0.0183 Epoch: 11 Global Step: 474640 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:03:46,166-Speed 2629.18 samples/sec Loss 5.7387 LearningRate 0.0183 Epoch: 11 Global Step: 474650 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:03:50,057-Speed 2632.67 samples/sec Loss 5.6833 LearningRate 0.0183 Epoch: 11 Global Step: 474660 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:03:53,951-Speed 2630.46 samples/sec Loss 5.8437 LearningRate 0.0183 Epoch: 11 Global Step: 474670 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:03:57,851-Speed 2625.89 samples/sec Loss 5.7555 LearningRate 0.0183 Epoch: 11 Global Step: 474680 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:04:01,747-Speed 2628.61 samples/sec Loss 5.6080 LearningRate 0.0183 Epoch: 11 Global Step: 474690 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:04:05,647-Speed 2626.52 samples/sec Loss 5.5309 LearningRate 0.0183 Epoch: 11 Global Step: 474700 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:04:09,539-Speed 2631.82 samples/sec Loss 5.7133 LearningRate 0.0183 Epoch: 11 Global Step: 474710 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:04:13,435-Speed 2628.71 samples/sec Loss 5.7060 LearningRate 0.0183 Epoch: 11 Global Step: 474720 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:04:17,332-Speed 2628.55 samples/sec Loss 5.6716 LearningRate 0.0183 Epoch: 11 Global Step: 474730 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:04:21,328-Speed 2563.07 samples/sec Loss 5.7634 LearningRate 0.0183 Epoch: 11 Global Step: 474740 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:04:25,222-Speed 2630.53 samples/sec Loss 5.7387 LearningRate 0.0183 Epoch: 11 Global Step: 474750 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:04:29,120-Speed 2627.55 samples/sec Loss 5.7002 LearningRate 0.0183 Epoch: 11 Global Step: 474760 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:04:33,023-Speed 2624.23 samples/sec Loss 5.7820 LearningRate 0.0183 Epoch: 11 Global Step: 474770 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:04:36,925-Speed 2624.43 samples/sec Loss 5.7790 LearningRate 0.0183 Epoch: 11 Global Step: 474780 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:04:40,841-Speed 2616.03 samples/sec Loss 5.7888 LearningRate 0.0183 Epoch: 11 Global Step: 474790 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:04:44,739-Speed 2627.26 samples/sec Loss 5.7431 LearningRate 0.0183 Epoch: 11 Global Step: 474800 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:04:48,643-Speed 2626.79 samples/sec Loss 5.5903 LearningRate 0.0183 Epoch: 11 Global Step: 474810 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:04:52,545-Speed 2624.57 samples/sec Loss 5.8243 LearningRate 0.0183 Epoch: 11 Global Step: 474820 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:04:56,441-Speed 2629.44 samples/sec Loss 5.6740 LearningRate 0.0183 Epoch: 11 Global Step: 474830 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:05:00,346-Speed 2622.32 samples/sec Loss 5.6031 LearningRate 0.0183 Epoch: 11 Global Step: 474840 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:05:04,240-Speed 2630.72 samples/sec Loss 5.7730 LearningRate 0.0183 Epoch: 11 Global Step: 474850 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:05:08,110-Speed 2646.46 samples/sec Loss 5.7620 LearningRate 0.0183 Epoch: 11 Global Step: 474860 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:05:12,005-Speed 2629.31 samples/sec Loss 5.7440 LearningRate 0.0183 Epoch: 11 Global Step: 474870 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:05:15,902-Speed 2628.59 samples/sec Loss 5.7140 LearningRate 0.0183 Epoch: 11 Global Step: 474880 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:05:19,816-Speed 2616.53 samples/sec Loss 5.7523 LearningRate 0.0183 Epoch: 11 Global Step: 474890 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:05:23,728-Speed 2618.52 samples/sec Loss 5.7026 LearningRate 0.0183 Epoch: 11 Global Step: 474900 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:05:27,623-Speed 2629.75 samples/sec Loss 5.5824 LearningRate 0.0183 Epoch: 11 Global Step: 474910 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:05:31,524-Speed 2625.82 samples/sec Loss 5.7068 LearningRate 0.0183 Epoch: 11 Global Step: 474920 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:05:35,423-Speed 2626.82 samples/sec Loss 5.7276 LearningRate 0.0183 Epoch: 11 Global Step: 474930 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:05:39,324-Speed 2625.00 samples/sec Loss 5.7054 LearningRate 0.0183 Epoch: 11 Global Step: 474940 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:05:43,220-Speed 2628.74 samples/sec Loss 5.7361 LearningRate 0.0183 Epoch: 11 Global Step: 474950 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:05:47,117-Speed 2628.53 samples/sec Loss 5.7385 LearningRate 0.0183 Epoch: 11 Global Step: 474960 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:05:51,001-Speed 2636.90 samples/sec Loss 5.8009 LearningRate 0.0183 Epoch: 11 Global Step: 474970 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:05:54,900-Speed 2626.62 samples/sec Loss 5.7311 LearningRate 0.0183 Epoch: 11 Global Step: 474980 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:05:58,831-Speed 2605.60 samples/sec Loss 5.7917 LearningRate 0.0183 Epoch: 11 Global Step: 474990 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:06:02,740-Speed 2620.82 samples/sec Loss 5.6428 LearningRate 0.0183 Epoch: 11 Global Step: 475000 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:06:06,645-Speed 2622.99 samples/sec Loss 5.6893 LearningRate 0.0183 Epoch: 11 Global Step: 475010 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:06:10,542-Speed 2627.57 samples/sec Loss 5.7728 LearningRate 0.0183 Epoch: 11 Global Step: 475020 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:06:14,447-Speed 2623.85 samples/sec Loss 5.7101 LearningRate 0.0183 Epoch: 11 Global Step: 475030 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:06:18,359-Speed 2617.91 samples/sec Loss 5.8106 LearningRate 0.0183 Epoch: 11 Global Step: 475040 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:06:22,286-Speed 2608.70 samples/sec Loss 5.7885 LearningRate 0.0183 Epoch: 11 Global Step: 475050 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:06:26,188-Speed 2624.71 samples/sec Loss 5.6848 LearningRate 0.0183 Epoch: 11 Global Step: 475060 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:06:30,097-Speed 2620.70 samples/sec Loss 5.6835 LearningRate 0.0183 Epoch: 11 Global Step: 475070 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:06:33,998-Speed 2625.36 samples/sec Loss 5.6958 LearningRate 0.0183 Epoch: 11 Global Step: 475080 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:06:37,898-Speed 2625.92 samples/sec Loss 5.6619 LearningRate 0.0183 Epoch: 11 Global Step: 475090 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:06:41,812-Speed 2617.38 samples/sec Loss 5.6288 LearningRate 0.0183 Epoch: 11 Global Step: 475100 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:06:45,717-Speed 2623.03 samples/sec Loss 5.6047 LearningRate 0.0183 Epoch: 11 Global Step: 475110 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:06:49,615-Speed 2627.59 samples/sec Loss 5.6755 LearningRate 0.0183 Epoch: 11 Global Step: 475120 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:06:53,536-Speed 2612.27 samples/sec Loss 5.6773 LearningRate 0.0183 Epoch: 11 Global Step: 475130 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:06:57,434-Speed 2627.63 samples/sec Loss 5.7668 LearningRate 0.0183 Epoch: 11 Global Step: 475140 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:07:01,348-Speed 2616.79 samples/sec Loss 5.7464 LearningRate 0.0183 Epoch: 11 Global Step: 475150 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:07:05,245-Speed 2628.36 samples/sec Loss 5.7140 LearningRate 0.0183 Epoch: 11 Global Step: 475160 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:07:09,132-Speed 2635.05 samples/sec Loss 5.7485 LearningRate 0.0183 Epoch: 11 Global Step: 475170 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:07:13,026-Speed 2629.71 samples/sec Loss 5.6199 LearningRate 0.0183 Epoch: 11 Global Step: 475180 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:07:16,922-Speed 2629.19 samples/sec Loss 5.7454 LearningRate 0.0182 Epoch: 11 Global Step: 475190 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:07:20,817-Speed 2630.25 samples/sec Loss 5.7122 LearningRate 0.0182 Epoch: 11 Global Step: 475200 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:07:24,716-Speed 2626.63 samples/sec Loss 5.6703 LearningRate 0.0182 Epoch: 11 Global Step: 475210 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:07:28,599-Speed 2637.89 samples/sec Loss 5.6331 LearningRate 0.0182 Epoch: 11 Global Step: 475220 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:07:32,492-Speed 2631.05 samples/sec Loss 5.6535 LearningRate 0.0182 Epoch: 11 Global Step: 475230 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:07:36,386-Speed 2630.39 samples/sec Loss 5.6878 LearningRate 0.0182 Epoch: 11 Global Step: 475240 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:07:40,280-Speed 2630.03 samples/sec Loss 5.6959 LearningRate 0.0182 Epoch: 11 Global Step: 475250 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:07:44,185-Speed 2622.79 samples/sec Loss 5.7053 LearningRate 0.0182 Epoch: 11 Global Step: 475260 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:07:48,094-Speed 2620.23 samples/sec Loss 5.6449 LearningRate 0.0182 Epoch: 11 Global Step: 475270 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:07:51,991-Speed 2628.45 samples/sec Loss 5.7149 LearningRate 0.0182 Epoch: 11 Global Step: 475280 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:07:55,889-Speed 2627.55 samples/sec Loss 5.6557 LearningRate 0.0182 Epoch: 11 Global Step: 475290 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:07:59,758-Speed 2647.70 samples/sec Loss 5.7850 LearningRate 0.0182 Epoch: 11 Global Step: 475300 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:08:03,661-Speed 2624.83 samples/sec Loss 5.6548 LearningRate 0.0182 Epoch: 11 Global Step: 475310 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:08:07,555-Speed 2630.28 samples/sec Loss 5.7689 LearningRate 0.0182 Epoch: 11 Global Step: 475320 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:08:11,449-Speed 2629.97 samples/sec Loss 5.7161 LearningRate 0.0182 Epoch: 11 Global Step: 475330 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:08:15,346-Speed 2628.65 samples/sec Loss 5.7458 LearningRate 0.0182 Epoch: 11 Global Step: 475340 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:08:19,239-Speed 2630.25 samples/sec Loss 5.7320 LearningRate 0.0182 Epoch: 11 Global Step: 475350 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:08:23,222-Speed 2571.95 samples/sec Loss 5.7175 LearningRate 0.0182 Epoch: 11 Global Step: 475360 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:08:27,136-Speed 2616.72 samples/sec Loss 5.7460 LearningRate 0.0182 Epoch: 11 Global Step: 475370 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:08:31,030-Speed 2630.76 samples/sec Loss 5.6897 LearningRate 0.0182 Epoch: 11 Global Step: 475380 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:08:34,930-Speed 2626.63 samples/sec Loss 5.6861 LearningRate 0.0182 Epoch: 11 Global Step: 475390 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:08:38,854-Speed 2609.66 samples/sec Loss 5.6943 LearningRate 0.0182 Epoch: 11 Global Step: 475400 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:08:42,747-Speed 2631.25 samples/sec Loss 5.7211 LearningRate 0.0182 Epoch: 11 Global Step: 475410 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:08:46,639-Speed 2631.55 samples/sec Loss 5.6229 LearningRate 0.0182 Epoch: 11 Global Step: 475420 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:08:50,540-Speed 2625.73 samples/sec Loss 5.6665 LearningRate 0.0182 Epoch: 11 Global Step: 475430 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:08:54,449-Speed 2620.49 samples/sec Loss 5.7999 LearningRate 0.0182 Epoch: 11 Global Step: 475440 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:08:58,361-Speed 2618.37 samples/sec Loss 5.8156 LearningRate 0.0182 Epoch: 11 Global Step: 475450 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:09:02,262-Speed 2624.84 samples/sec Loss 5.7120 LearningRate 0.0182 Epoch: 11 Global Step: 475460 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:09:06,164-Speed 2625.54 samples/sec Loss 5.7334 LearningRate 0.0182 Epoch: 11 Global Step: 475470 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:09:10,058-Speed 2630.41 samples/sec Loss 5.7086 LearningRate 0.0182 Epoch: 11 Global Step: 475480 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:09:13,958-Speed 2625.82 samples/sec Loss 5.7060 LearningRate 0.0182 Epoch: 11 Global Step: 475490 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:09:17,853-Speed 2629.89 samples/sec Loss 5.6616 LearningRate 0.0182 Epoch: 11 Global Step: 475500 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:09:21,759-Speed 2622.66 samples/sec Loss 5.6638 LearningRate 0.0182 Epoch: 11 Global Step: 475510 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:09:25,661-Speed 2627.85 samples/sec Loss 5.7171 LearningRate 0.0182 Epoch: 11 Global Step: 475520 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:09:29,556-Speed 2629.39 samples/sec Loss 5.7309 LearningRate 0.0182 Epoch: 11 Global Step: 475530 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:09:33,450-Speed 2630.69 samples/sec Loss 5.6862 LearningRate 0.0182 Epoch: 11 Global Step: 475540 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:09:37,346-Speed 2628.58 samples/sec Loss 5.7750 LearningRate 0.0182 Epoch: 11 Global Step: 475550 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:09:41,243-Speed 2628.93 samples/sec Loss 5.7725 LearningRate 0.0182 Epoch: 11 Global Step: 475560 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:09:45,137-Speed 2630.45 samples/sec Loss 5.6845 LearningRate 0.0182 Epoch: 11 Global Step: 475570 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:09:49,032-Speed 2629.79 samples/sec Loss 5.6784 LearningRate 0.0182 Epoch: 11 Global Step: 475580 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:09:52,924-Speed 2631.39 samples/sec Loss 5.7272 LearningRate 0.0182 Epoch: 11 Global Step: 475590 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:09:56,820-Speed 2629.27 samples/sec Loss 5.7776 LearningRate 0.0182 Epoch: 11 Global Step: 475600 Fp16 Grad Scale: 262144 Required: 40 hours
Training: 2022-04-15 01:10:00,693-Speed 2644.06 samples/sec Loss 5.7056 LearningRate 0.0182 Epoch: 11 Global Step: 475610 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:10:04,590-Speed 2628.09 samples/sec Loss 5.7139 LearningRate 0.0182 Epoch: 11 Global Step: 475620 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:10:08,484-Speed 2630.12 samples/sec Loss 5.6695 LearningRate 0.0182 Epoch: 11 Global Step: 475630 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:10:12,377-Speed 2632.05 samples/sec Loss 5.6789 LearningRate 0.0182 Epoch: 11 Global Step: 475640 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:10:16,278-Speed 2626.18 samples/sec Loss 5.7928 LearningRate 0.0182 Epoch: 11 Global Step: 475650 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:10:20,155-Speed 2641.40 samples/sec Loss 5.6791 LearningRate 0.0182 Epoch: 11 Global Step: 475660 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:10:24,051-Speed 2629.33 samples/sec Loss 5.7703 LearningRate 0.0182 Epoch: 11 Global Step: 475670 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:10:27,948-Speed 2628.48 samples/sec Loss 5.7323 LearningRate 0.0182 Epoch: 11 Global Step: 475680 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:10:31,852-Speed 2622.80 samples/sec Loss 5.7500 LearningRate 0.0182 Epoch: 11 Global Step: 475690 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:10:35,745-Speed 2631.06 samples/sec Loss 5.6477 LearningRate 0.0182 Epoch: 11 Global Step: 475700 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:10:39,657-Speed 2618.70 samples/sec Loss 5.7855 LearningRate 0.0182 Epoch: 11 Global Step: 475710 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:10:43,553-Speed 2629.37 samples/sec Loss 5.7247 LearningRate 0.0182 Epoch: 11 Global Step: 475720 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:10:47,453-Speed 2626.59 samples/sec Loss 5.7495 LearningRate 0.0182 Epoch: 11 Global Step: 475730 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:10:51,339-Speed 2635.69 samples/sec Loss 5.5916 LearningRate 0.0182 Epoch: 11 Global Step: 475740 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:10:55,230-Speed 2633.01 samples/sec Loss 5.7061 LearningRate 0.0182 Epoch: 11 Global Step: 475750 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:10:59,135-Speed 2622.29 samples/sec Loss 5.8818 LearningRate 0.0182 Epoch: 11 Global Step: 475760 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:11:03,029-Speed 2630.13 samples/sec Loss 5.8116 LearningRate 0.0182 Epoch: 11 Global Step: 475770 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:11:06,920-Speed 2632.20 samples/sec Loss 5.6217 LearningRate 0.0182 Epoch: 11 Global Step: 475780 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:11:10,814-Speed 2631.32 samples/sec Loss 5.6717 LearningRate 0.0182 Epoch: 11 Global Step: 475790 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:11:14,716-Speed 2624.28 samples/sec Loss 5.5949 LearningRate 0.0182 Epoch: 11 Global Step: 475800 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:11:18,641-Speed 2609.77 samples/sec Loss 5.6457 LearningRate 0.0182 Epoch: 11 Global Step: 475810 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:11:22,542-Speed 2626.04 samples/sec Loss 5.6973 LearningRate 0.0182 Epoch: 11 Global Step: 475820 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:11:26,445-Speed 2624.27 samples/sec Loss 5.7797 LearningRate 0.0182 Epoch: 11 Global Step: 475830 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:11:30,337-Speed 2631.58 samples/sec Loss 5.6496 LearningRate 0.0182 Epoch: 11 Global Step: 475840 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:11:34,244-Speed 2621.40 samples/sec Loss 5.6946 LearningRate 0.0182 Epoch: 11 Global Step: 475850 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:11:38,130-Speed 2635.79 samples/sec Loss 5.7374 LearningRate 0.0182 Epoch: 11 Global Step: 475860 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:11:42,022-Speed 2631.71 samples/sec Loss 5.7115 LearningRate 0.0182 Epoch: 11 Global Step: 475870 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:11:45,916-Speed 2630.21 samples/sec Loss 5.5607 LearningRate 0.0182 Epoch: 11 Global Step: 475880 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:11:49,810-Speed 2630.91 samples/sec Loss 5.7868 LearningRate 0.0182 Epoch: 11 Global Step: 475890 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:11:53,701-Speed 2632.86 samples/sec Loss 5.7247 LearningRate 0.0182 Epoch: 11 Global Step: 475900 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:11:57,598-Speed 2627.90 samples/sec Loss 5.8083 LearningRate 0.0182 Epoch: 11 Global Step: 475910 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:12:01,530-Speed 2605.37 samples/sec Loss 5.7976 LearningRate 0.0182 Epoch: 11 Global Step: 475920 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:12:05,436-Speed 2622.21 samples/sec Loss 5.7248 LearningRate 0.0182 Epoch: 11 Global Step: 475930 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:12:09,459-Speed 2546.22 samples/sec Loss 5.6332 LearningRate 0.0182 Epoch: 11 Global Step: 475940 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:12:13,388-Speed 2606.35 samples/sec Loss 5.7785 LearningRate 0.0182 Epoch: 11 Global Step: 475950 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:12:17,283-Speed 2630.33 samples/sec Loss 5.7184 LearningRate 0.0182 Epoch: 11 Global Step: 475960 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:12:21,193-Speed 2620.99 samples/sec Loss 5.7368 LearningRate 0.0182 Epoch: 11 Global Step: 475970 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:12:25,186-Speed 2565.04 samples/sec Loss 5.6993 LearningRate 0.0182 Epoch: 11 Global Step: 475980 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:12:29,087-Speed 2626.04 samples/sec Loss 5.7637 LearningRate 0.0182 Epoch: 11 Global Step: 475990 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:12:32,998-Speed 2618.62 samples/sec Loss 5.6876 LearningRate 0.0182 Epoch: 11 Global Step: 476000 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:12:36,909-Speed 2618.57 samples/sec Loss 5.7717 LearningRate 0.0182 Epoch: 11 Global Step: 476010 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:12:40,811-Speed 2624.72 samples/sec Loss 5.6529 LearningRate 0.0182 Epoch: 11 Global Step: 476020 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:12:44,720-Speed 2620.65 samples/sec Loss 5.6963 LearningRate 0.0182 Epoch: 11 Global Step: 476030 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:12:48,622-Speed 2624.99 samples/sec Loss 5.6534 LearningRate 0.0182 Epoch: 11 Global Step: 476040 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:12:52,524-Speed 2624.90 samples/sec Loss 5.7627 LearningRate 0.0182 Epoch: 11 Global Step: 476050 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:12:56,458-Speed 2603.41 samples/sec Loss 5.6930 LearningRate 0.0182 Epoch: 11 Global Step: 476060 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:13:00,352-Speed 2630.67 samples/sec Loss 5.6748 LearningRate 0.0182 Epoch: 11 Global Step: 476070 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:13:04,254-Speed 2624.41 samples/sec Loss 5.7106 LearningRate 0.0182 Epoch: 11 Global Step: 476080 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:13:08,147-Speed 2631.45 samples/sec Loss 5.8067 LearningRate 0.0182 Epoch: 11 Global Step: 476090 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:13:12,042-Speed 2629.21 samples/sec Loss 5.5525 LearningRate 0.0182 Epoch: 11 Global Step: 476100 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:13:15,939-Speed 2628.61 samples/sec Loss 5.6537 LearningRate 0.0182 Epoch: 11 Global Step: 476110 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:13:19,808-Speed 2647.20 samples/sec Loss 5.6680 LearningRate 0.0182 Epoch: 11 Global Step: 476120 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:13:23,727-Speed 2613.85 samples/sec Loss 5.7368 LearningRate 0.0182 Epoch: 11 Global Step: 476130 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:13:27,621-Speed 2630.13 samples/sec Loss 5.7477 LearningRate 0.0182 Epoch: 11 Global Step: 476140 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:13:31,536-Speed 2616.48 samples/sec Loss 5.6795 LearningRate 0.0182 Epoch: 11 Global Step: 476150 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:13:35,492-Speed 2589.14 samples/sec Loss 5.6467 LearningRate 0.0181 Epoch: 11 Global Step: 476160 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:13:39,583-Speed 2503.79 samples/sec Loss 5.8316 LearningRate 0.0181 Epoch: 11 Global Step: 476170 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:13:43,678-Speed 2501.21 samples/sec Loss 5.6868 LearningRate 0.0181 Epoch: 11 Global Step: 476180 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:13:47,605-Speed 2608.16 samples/sec Loss 5.7210 LearningRate 0.0181 Epoch: 11 Global Step: 476190 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:13:51,527-Speed 2611.67 samples/sec Loss 5.7470 LearningRate 0.0181 Epoch: 11 Global Step: 476200 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:13:55,620-Speed 2502.32 samples/sec Loss 5.7311 LearningRate 0.0181 Epoch: 11 Global Step: 476210 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:13:59,558-Speed 2601.37 samples/sec Loss 5.6412 LearningRate 0.0181 Epoch: 11 Global Step: 476220 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:14:03,443-Speed 2636.61 samples/sec Loss 5.7345 LearningRate 0.0181 Epoch: 11 Global Step: 476230 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:14:07,347-Speed 2623.22 samples/sec Loss 5.6109 LearningRate 0.0181 Epoch: 11 Global Step: 476240 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:14:11,222-Speed 2642.69 samples/sec Loss 5.7211 LearningRate 0.0181 Epoch: 11 Global Step: 476250 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:14:15,118-Speed 2629.53 samples/sec Loss 5.7002 LearningRate 0.0181 Epoch: 11 Global Step: 476260 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:14:19,010-Speed 2631.87 samples/sec Loss 5.7442 LearningRate 0.0181 Epoch: 11 Global Step: 476270 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:14:22,939-Speed 2606.43 samples/sec Loss 5.6381 LearningRate 0.0181 Epoch: 11 Global Step: 476280 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:14:26,853-Speed 2617.73 samples/sec Loss 5.6395 LearningRate 0.0181 Epoch: 11 Global Step: 476290 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:14:30,749-Speed 2629.27 samples/sec Loss 5.6094 LearningRate 0.0181 Epoch: 11 Global Step: 476300 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:14:34,644-Speed 2629.09 samples/sec Loss 5.6163 LearningRate 0.0181 Epoch: 11 Global Step: 476310 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:14:38,562-Speed 2614.63 samples/sec Loss 5.6605 LearningRate 0.0181 Epoch: 11 Global Step: 476320 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:14:42,463-Speed 2625.79 samples/sec Loss 5.7626 LearningRate 0.0181 Epoch: 11 Global Step: 476330 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:14:46,355-Speed 2631.78 samples/sec Loss 5.6895 LearningRate 0.0181 Epoch: 11 Global Step: 476340 Fp16 Grad Scale: 32768 Required: 40 hours
Training: 2022-04-15 01:14:50,248-Speed 2631.49 samples/sec Loss 5.7806 LearningRate 0.0181 Epoch: 11 Global Step: 476350 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:14:54,142-Speed 2629.83 samples/sec Loss 5.7097 LearningRate 0.0181 Epoch: 11 Global Step: 476360 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:14:58,063-Speed 2612.57 samples/sec Loss 5.7054 LearningRate 0.0181 Epoch: 11 Global Step: 476370 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:15:02,306-Speed 2414.13 samples/sec Loss 5.6145 LearningRate 0.0181 Epoch: 11 Global Step: 476380 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:15:06,202-Speed 2628.75 samples/sec Loss 5.7575 LearningRate 0.0181 Epoch: 11 Global Step: 476390 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:15:10,101-Speed 2626.84 samples/sec Loss 5.7143 LearningRate 0.0181 Epoch: 11 Global Step: 476400 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:15:13,998-Speed 2628.19 samples/sec Loss 5.6695 LearningRate 0.0181 Epoch: 11 Global Step: 476410 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:15:17,899-Speed 2626.17 samples/sec Loss 5.6219 LearningRate 0.0181 Epoch: 11 Global Step: 476420 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:15:21,824-Speed 2609.22 samples/sec Loss 5.6533 LearningRate 0.0181 Epoch: 11 Global Step: 476430 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:15:25,720-Speed 2629.35 samples/sec Loss 5.6845 LearningRate 0.0181 Epoch: 11 Global Step: 476440 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:15:29,625-Speed 2623.01 samples/sec Loss 5.7242 LearningRate 0.0181 Epoch: 11 Global Step: 476450 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:15:33,527-Speed 2624.90 samples/sec Loss 5.6762 LearningRate 0.0181 Epoch: 11 Global Step: 476460 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:15:37,431-Speed 2623.26 samples/sec Loss 5.7707 LearningRate 0.0181 Epoch: 11 Global Step: 476470 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:15:41,317-Speed 2635.94 samples/sec Loss 5.6462 LearningRate 0.0181 Epoch: 11 Global Step: 476480 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:15:45,219-Speed 2625.11 samples/sec Loss 5.6840 LearningRate 0.0181 Epoch: 11 Global Step: 476490 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:15:49,127-Speed 2621.06 samples/sec Loss 5.6738 LearningRate 0.0181 Epoch: 11 Global Step: 476500 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:15:53,043-Speed 2615.19 samples/sec Loss 5.6810 LearningRate 0.0181 Epoch: 11 Global Step: 476510 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:15:56,951-Speed 2621.50 samples/sec Loss 5.6953 LearningRate 0.0181 Epoch: 11 Global Step: 476520 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:16:00,851-Speed 2626.40 samples/sec Loss 5.6658 LearningRate 0.0181 Epoch: 11 Global Step: 476530 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:16:04,751-Speed 2626.30 samples/sec Loss 5.6042 LearningRate 0.0181 Epoch: 11 Global Step: 476540 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:16:08,649-Speed 2627.21 samples/sec Loss 5.6475 LearningRate 0.0181 Epoch: 11 Global Step: 476550 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:16:12,547-Speed 2627.56 samples/sec Loss 5.7028 LearningRate 0.0181 Epoch: 11 Global Step: 476560 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:16:16,488-Speed 2599.88 samples/sec Loss 5.6095 LearningRate 0.0181 Epoch: 11 Global Step: 476570 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:16:20,389-Speed 2625.87 samples/sec Loss 5.7051 LearningRate 0.0181 Epoch: 11 Global Step: 476580 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:16:24,297-Speed 2620.82 samples/sec Loss 5.6624 LearningRate 0.0181 Epoch: 11 Global Step: 476590 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:16:28,177-Speed 2639.78 samples/sec Loss 5.6518 LearningRate 0.0181 Epoch: 11 Global Step: 476600 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:16:32,084-Speed 2621.37 samples/sec Loss 5.7255 LearningRate 0.0181 Epoch: 11 Global Step: 476610 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:16:35,992-Speed 2621.20 samples/sec Loss 5.7802 LearningRate 0.0181 Epoch: 11 Global Step: 476620 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:16:39,893-Speed 2626.31 samples/sec Loss 5.6205 LearningRate 0.0181 Epoch: 11 Global Step: 476630 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:16:43,798-Speed 2622.16 samples/sec Loss 5.7026 LearningRate 0.0181 Epoch: 11 Global Step: 476640 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:16:47,700-Speed 2625.22 samples/sec Loss 5.6911 LearningRate 0.0181 Epoch: 11 Global Step: 476650 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:16:51,599-Speed 2627.16 samples/sec Loss 5.6803 LearningRate 0.0181 Epoch: 11 Global Step: 476660 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:16:55,500-Speed 2626.31 samples/sec Loss 5.7288 LearningRate 0.0181 Epoch: 11 Global Step: 476670 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:16:59,415-Speed 2615.46 samples/sec Loss 5.7486 LearningRate 0.0181 Epoch: 11 Global Step: 476680 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:17:03,325-Speed 2619.99 samples/sec Loss 5.7389 LearningRate 0.0181 Epoch: 11 Global Step: 476690 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:17:07,226-Speed 2625.54 samples/sec Loss 5.7130 LearningRate 0.0181 Epoch: 11 Global Step: 476700 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:17:11,151-Speed 2610.05 samples/sec Loss 5.6578 LearningRate 0.0181 Epoch: 11 Global Step: 476710 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:17:15,048-Speed 2627.78 samples/sec Loss 5.6268 LearningRate 0.0181 Epoch: 11 Global Step: 476720 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:17:18,964-Speed 2615.73 samples/sec Loss 5.6216 LearningRate 0.0181 Epoch: 11 Global Step: 476730 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:17:22,863-Speed 2627.81 samples/sec Loss 5.6898 LearningRate 0.0181 Epoch: 11 Global Step: 476740 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:17:26,759-Speed 2628.96 samples/sec Loss 5.6771 LearningRate 0.0181 Epoch: 11 Global Step: 476750 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:17:30,654-Speed 2629.92 samples/sec Loss 5.7195 LearningRate 0.0181 Epoch: 11 Global Step: 476760 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:17:34,553-Speed 2626.72 samples/sec Loss 5.7253 LearningRate 0.0181 Epoch: 11 Global Step: 476770 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:17:38,451-Speed 2627.17 samples/sec Loss 5.6893 LearningRate 0.0181 Epoch: 11 Global Step: 476780 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:17:42,348-Speed 2628.42 samples/sec Loss 5.8009 LearningRate 0.0181 Epoch: 11 Global Step: 476790 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:17:46,239-Speed 2632.12 samples/sec Loss 5.7236 LearningRate 0.0181 Epoch: 11 Global Step: 476800 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:17:50,142-Speed 2624.55 samples/sec Loss 5.5737 LearningRate 0.0181 Epoch: 11 Global Step: 476810 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:17:54,040-Speed 2627.87 samples/sec Loss 5.4651 LearningRate 0.0181 Epoch: 11 Global Step: 476820 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:17:57,940-Speed 2626.25 samples/sec Loss 5.7440 LearningRate 0.0181 Epoch: 11 Global Step: 476830 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:18:01,835-Speed 2629.73 samples/sec Loss 5.7010 LearningRate 0.0181 Epoch: 11 Global Step: 476840 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:18:05,731-Speed 2629.35 samples/sec Loss 5.6379 LearningRate 0.0181 Epoch: 11 Global Step: 476850 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:18:09,600-Speed 2646.83 samples/sec Loss 5.6266 LearningRate 0.0181 Epoch: 11 Global Step: 476860 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:18:13,549-Speed 2593.85 samples/sec Loss 5.6174 LearningRate 0.0181 Epoch: 11 Global Step: 476870 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:18:17,447-Speed 2628.54 samples/sec Loss 5.7671 LearningRate 0.0181 Epoch: 11 Global Step: 476880 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:18:21,345-Speed 2627.66 samples/sec Loss 5.6712 LearningRate 0.0181 Epoch: 11 Global Step: 476890 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:18:25,244-Speed 2626.59 samples/sec Loss 5.6813 LearningRate 0.0181 Epoch: 11 Global Step: 476900 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:18:29,137-Speed 2631.96 samples/sec Loss 5.6922 LearningRate 0.0181 Epoch: 11 Global Step: 476910 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:18:33,029-Speed 2631.46 samples/sec Loss 5.6250 LearningRate 0.0181 Epoch: 11 Global Step: 476920 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:18:36,924-Speed 2629.34 samples/sec Loss 5.6770 LearningRate 0.0181 Epoch: 11 Global Step: 476930 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:18:40,817-Speed 2630.89 samples/sec Loss 5.7732 LearningRate 0.0181 Epoch: 11 Global Step: 476940 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:18:44,718-Speed 2625.73 samples/sec Loss 5.7239 LearningRate 0.0181 Epoch: 11 Global Step: 476950 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:18:48,626-Speed 2621.02 samples/sec Loss 5.5999 LearningRate 0.0181 Epoch: 11 Global Step: 476960 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:18:52,518-Speed 2631.82 samples/sec Loss 5.6484 LearningRate 0.0181 Epoch: 11 Global Step: 476970 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:18:56,417-Speed 2626.82 samples/sec Loss 5.8050 LearningRate 0.0181 Epoch: 11 Global Step: 476980 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:19:00,292-Speed 2644.95 samples/sec Loss 5.6860 LearningRate 0.0181 Epoch: 11 Global Step: 476990 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:19:04,189-Speed 2628.19 samples/sec Loss 5.7625 LearningRate 0.0181 Epoch: 11 Global Step: 477000 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:19:08,083-Speed 2629.99 samples/sec Loss 5.6194 LearningRate 0.0181 Epoch: 11 Global Step: 477010 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:19:11,979-Speed 2628.71 samples/sec Loss 5.6669 LearningRate 0.0181 Epoch: 11 Global Step: 477020 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:19:15,875-Speed 2629.05 samples/sec Loss 5.7692 LearningRate 0.0181 Epoch: 11 Global Step: 477030 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:19:19,767-Speed 2632.05 samples/sec Loss 5.7281 LearningRate 0.0181 Epoch: 11 Global Step: 477040 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:19:23,670-Speed 2624.37 samples/sec Loss 5.7049 LearningRate 0.0181 Epoch: 11 Global Step: 477050 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:19:27,562-Speed 2631.15 samples/sec Loss 5.7047 LearningRate 0.0181 Epoch: 11 Global Step: 477060 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:19:31,460-Speed 2628.23 samples/sec Loss 5.5937 LearningRate 0.0181 Epoch: 11 Global Step: 477070 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:19:35,354-Speed 2629.92 samples/sec Loss 5.5780 LearningRate 0.0181 Epoch: 11 Global Step: 477080 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:19:39,248-Speed 2630.55 samples/sec Loss 5.6351 LearningRate 0.0181 Epoch: 11 Global Step: 477090 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:19:43,138-Speed 2632.69 samples/sec Loss 5.7953 LearningRate 0.0181 Epoch: 11 Global Step: 477100 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:19:47,031-Speed 2631.20 samples/sec Loss 5.6648 LearningRate 0.0181 Epoch: 11 Global Step: 477110 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:19:50,929-Speed 2627.47 samples/sec Loss 5.6671 LearningRate 0.0181 Epoch: 11 Global Step: 477120 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:19:54,828-Speed 2627.21 samples/sec Loss 5.6475 LearningRate 0.0181 Epoch: 11 Global Step: 477130 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:19:58,721-Speed 2631.10 samples/sec Loss 5.6614 LearningRate 0.0180 Epoch: 11 Global Step: 477140 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:20:02,641-Speed 2613.19 samples/sec Loss 5.6553 LearningRate 0.0180 Epoch: 11 Global Step: 477150 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:20:06,543-Speed 2624.70 samples/sec Loss 5.7086 LearningRate 0.0180 Epoch: 11 Global Step: 477160 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:20:10,534-Speed 2566.20 samples/sec Loss 5.5617 LearningRate 0.0180 Epoch: 11 Global Step: 477170 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:20:14,438-Speed 2623.77 samples/sec Loss 5.6749 LearningRate 0.0180 Epoch: 11 Global Step: 477180 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:20:18,333-Speed 2630.10 samples/sec Loss 5.6633 LearningRate 0.0180 Epoch: 11 Global Step: 477190 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:20:22,225-Speed 2631.56 samples/sec Loss 5.7255 LearningRate 0.0180 Epoch: 11 Global Step: 477200 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:20:26,147-Speed 2611.91 samples/sec Loss 5.6330 LearningRate 0.0180 Epoch: 11 Global Step: 477210 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:20:30,038-Speed 2631.98 samples/sec Loss 5.7220 LearningRate 0.0180 Epoch: 11 Global Step: 477220 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:20:33,939-Speed 2625.29 samples/sec Loss 5.7260 LearningRate 0.0180 Epoch: 11 Global Step: 477230 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:20:37,831-Speed 2631.83 samples/sec Loss 5.7203 LearningRate 0.0180 Epoch: 11 Global Step: 477240 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:20:41,721-Speed 2633.31 samples/sec Loss 5.7600 LearningRate 0.0180 Epoch: 11 Global Step: 477250 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:20:45,603-Speed 2638.10 samples/sec Loss 5.7065 LearningRate 0.0180 Epoch: 11 Global Step: 477260 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:20:49,500-Speed 2629.30 samples/sec Loss 5.6675 LearningRate 0.0180 Epoch: 11 Global Step: 477270 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:20:53,396-Speed 2628.90 samples/sec Loss 5.7740 LearningRate 0.0180 Epoch: 11 Global Step: 477280 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:20:57,295-Speed 2626.59 samples/sec Loss 5.5729 LearningRate 0.0180 Epoch: 11 Global Step: 477290 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:21:01,198-Speed 2624.37 samples/sec Loss 5.7631 LearningRate 0.0180 Epoch: 11 Global Step: 477300 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:21:05,102-Speed 2623.25 samples/sec Loss 5.7183 LearningRate 0.0180 Epoch: 11 Global Step: 477310 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:21:08,998-Speed 2629.05 samples/sec Loss 5.6735 LearningRate 0.0180 Epoch: 11 Global Step: 477320 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:21:12,904-Speed 2622.36 samples/sec Loss 5.6260 LearningRate 0.0180 Epoch: 11 Global Step: 477330 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:21:16,802-Speed 2627.98 samples/sec Loss 5.6715 LearningRate 0.0180 Epoch: 11 Global Step: 477340 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:21:20,707-Speed 2622.86 samples/sec Loss 5.6519 LearningRate 0.0180 Epoch: 11 Global Step: 477350 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:21:24,606-Speed 2627.70 samples/sec Loss 5.7006 LearningRate 0.0180 Epoch: 11 Global Step: 477360 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:21:28,507-Speed 2625.50 samples/sec Loss 5.6363 LearningRate 0.0180 Epoch: 11 Global Step: 477370 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:21:32,399-Speed 2631.91 samples/sec Loss 5.6275 LearningRate 0.0180 Epoch: 11 Global Step: 477380 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:21:36,301-Speed 2624.69 samples/sec Loss 5.7420 LearningRate 0.0180 Epoch: 11 Global Step: 477390 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:21:40,199-Speed 2627.51 samples/sec Loss 5.7081 LearningRate 0.0180 Epoch: 11 Global Step: 477400 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:21:44,097-Speed 2627.76 samples/sec Loss 5.7085 LearningRate 0.0180 Epoch: 11 Global Step: 477410 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:21:47,990-Speed 2630.84 samples/sec Loss 5.7096 LearningRate 0.0180 Epoch: 11 Global Step: 477420 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:21:51,926-Speed 2602.89 samples/sec Loss 5.6091 LearningRate 0.0180 Epoch: 11 Global Step: 477430 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:21:56,010-Speed 2508.06 samples/sec Loss 5.6043 LearningRate 0.0180 Epoch: 11 Global Step: 477440 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:22:00,091-Speed 2509.68 samples/sec Loss 5.6240 LearningRate 0.0180 Epoch: 11 Global Step: 477450 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:22:04,080-Speed 2568.18 samples/sec Loss 5.7324 LearningRate 0.0180 Epoch: 11 Global Step: 477460 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:22:07,976-Speed 2628.89 samples/sec Loss 5.6808 LearningRate 0.0180 Epoch: 11 Global Step: 477470 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:22:11,876-Speed 2625.88 samples/sec Loss 5.7030 LearningRate 0.0180 Epoch: 11 Global Step: 477480 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:22:15,769-Speed 2631.42 samples/sec Loss 5.5855 LearningRate 0.0180 Epoch: 11 Global Step: 477490 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:22:19,672-Speed 2624.29 samples/sec Loss 5.6474 LearningRate 0.0180 Epoch: 11 Global Step: 477500 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:22:23,552-Speed 2640.08 samples/sec Loss 5.7034 LearningRate 0.0180 Epoch: 11 Global Step: 477510 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:22:27,503-Speed 2592.46 samples/sec Loss 5.6393 LearningRate 0.0180 Epoch: 11 Global Step: 477520 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:22:31,396-Speed 2631.70 samples/sec Loss 5.6082 LearningRate 0.0180 Epoch: 11 Global Step: 477530 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:22:35,315-Speed 2612.86 samples/sec Loss 5.6425 LearningRate 0.0180 Epoch: 11 Global Step: 477540 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:22:39,218-Speed 2624.82 samples/sec Loss 5.7196 LearningRate 0.0180 Epoch: 11 Global Step: 477550 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:22:43,111-Speed 2630.56 samples/sec Loss 5.7397 LearningRate 0.0180 Epoch: 11 Global Step: 477560 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:22:47,016-Speed 2623.49 samples/sec Loss 5.5454 LearningRate 0.0180 Epoch: 11 Global Step: 477570 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:22:50,953-Speed 2601.17 samples/sec Loss 5.6443 LearningRate 0.0180 Epoch: 11 Global Step: 477580 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:22:54,850-Speed 2628.42 samples/sec Loss 5.6731 LearningRate 0.0180 Epoch: 11 Global Step: 477590 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:22:58,761-Speed 2619.39 samples/sec Loss 5.6115 LearningRate 0.0180 Epoch: 11 Global Step: 477600 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:23:02,658-Speed 2628.84 samples/sec Loss 5.7019 LearningRate 0.0180 Epoch: 11 Global Step: 477610 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:23:06,559-Speed 2625.36 samples/sec Loss 5.6631 LearningRate 0.0180 Epoch: 11 Global Step: 477620 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:23:10,572-Speed 2551.75 samples/sec Loss 5.6191 LearningRate 0.0180 Epoch: 11 Global Step: 477630 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:23:14,468-Speed 2629.15 samples/sec Loss 5.7456 LearningRate 0.0180 Epoch: 11 Global Step: 477640 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:23:18,395-Speed 2608.26 samples/sec Loss 5.6618 LearningRate 0.0180 Epoch: 11 Global Step: 477650 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:23:22,295-Speed 2626.48 samples/sec Loss 5.6190 LearningRate 0.0180 Epoch: 11 Global Step: 477660 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:23:26,199-Speed 2624.17 samples/sec Loss 5.6782 LearningRate 0.0180 Epoch: 11 Global Step: 477670 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:23:30,138-Speed 2600.53 samples/sec Loss 5.7582 LearningRate 0.0180 Epoch: 11 Global Step: 477680 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:23:34,075-Speed 2601.05 samples/sec Loss 5.6380 LearningRate 0.0180 Epoch: 11 Global Step: 477690 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:23:38,168-Speed 2502.36 samples/sec Loss 5.7508 LearningRate 0.0180 Epoch: 11 Global Step: 477700 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:23:42,267-Speed 2498.42 samples/sec Loss 5.7012 LearningRate 0.0180 Epoch: 11 Global Step: 477710 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:23:46,205-Speed 2601.65 samples/sec Loss 5.6423 LearningRate 0.0180 Epoch: 11 Global Step: 477720 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:23:50,105-Speed 2626.07 samples/sec Loss 5.6835 LearningRate 0.0180 Epoch: 11 Global Step: 477730 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:23:54,003-Speed 2627.82 samples/sec Loss 5.7191 LearningRate 0.0180 Epoch: 11 Global Step: 477740 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:23:57,901-Speed 2627.61 samples/sec Loss 5.5935 LearningRate 0.0180 Epoch: 11 Global Step: 477750 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:24:01,811-Speed 2619.51 samples/sec Loss 5.6735 LearningRate 0.0180 Epoch: 11 Global Step: 477760 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:24:05,702-Speed 2632.64 samples/sec Loss 5.5738 LearningRate 0.0180 Epoch: 11 Global Step: 477770 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:24:09,596-Speed 2630.39 samples/sec Loss 5.6820 LearningRate 0.0180 Epoch: 11 Global Step: 477780 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:24:13,466-Speed 2646.57 samples/sec Loss 5.6203 LearningRate 0.0180 Epoch: 11 Global Step: 477790 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:24:17,364-Speed 2627.48 samples/sec Loss 5.7528 LearningRate 0.0180 Epoch: 11 Global Step: 477800 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:24:21,269-Speed 2622.50 samples/sec Loss 5.6594 LearningRate 0.0180 Epoch: 11 Global Step: 477810 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:24:25,164-Speed 2629.57 samples/sec Loss 5.6368 LearningRate 0.0180 Epoch: 11 Global Step: 477820 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:24:29,089-Speed 2610.37 samples/sec Loss 5.7219 LearningRate 0.0180 Epoch: 11 Global Step: 477830 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:24:32,990-Speed 2625.16 samples/sec Loss 5.7196 LearningRate 0.0180 Epoch: 11 Global Step: 477840 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:24:36,894-Speed 2624.14 samples/sec Loss 5.6264 LearningRate 0.0180 Epoch: 11 Global Step: 477850 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:24:40,870-Speed 2575.98 samples/sec Loss 5.7657 LearningRate 0.0180 Epoch: 11 Global Step: 477860 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:24:44,763-Speed 2631.17 samples/sec Loss 5.6715 LearningRate 0.0180 Epoch: 11 Global Step: 477870 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:24:48,659-Speed 2628.57 samples/sec Loss 5.6209 LearningRate 0.0180 Epoch: 11 Global Step: 477880 Fp16 Grad Scale: 65536 Required: 40 hours
Training: 2022-04-15 01:24:52,553-Speed 2630.96 samples/sec Loss 5.6746 LearningRate 0.0180 Epoch: 11 Global Step: 477890 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:24:56,466-Speed 2617.84 samples/sec Loss 5.6256 LearningRate 0.0180 Epoch: 11 Global Step: 477900 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:25:00,370-Speed 2623.22 samples/sec Loss 5.6973 LearningRate 0.0180 Epoch: 11 Global Step: 477910 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:25:04,269-Speed 2627.32 samples/sec Loss 5.6481 LearningRate 0.0180 Epoch: 11 Global Step: 477920 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:25:08,163-Speed 2630.91 samples/sec Loss 5.7032 LearningRate 0.0180 Epoch: 11 Global Step: 477930 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:25:12,062-Speed 2626.77 samples/sec Loss 5.6927 LearningRate 0.0180 Epoch: 11 Global Step: 477940 Fp16 Grad Scale: 131072 Required: 40 hours
Training: 2022-04-15 01:25:15,957-Speed 2629.21 samples/sec Loss 5.6081 LearningRate 0.0180 Epoch: 11 Global Step: 477950 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:25:19,855-Speed 2627.70 samples/sec Loss 5.6870 LearningRate 0.0180 Epoch: 11 Global Step: 477960 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:25:23,766-Speed 2618.72 samples/sec Loss 5.5540 LearningRate 0.0180 Epoch: 11 Global Step: 477970 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:25:27,664-Speed 2627.82 samples/sec Loss 5.5452 LearningRate 0.0180 Epoch: 11 Global Step: 477980 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:25:31,568-Speed 2623.00 samples/sec Loss 5.6067 LearningRate 0.0180 Epoch: 11 Global Step: 477990 Fp16 Grad Scale: 262144 Required: 39 hours
Training: 2022-04-15 01:25:35,465-Speed 2628.55 samples/sec Loss 5.6587 LearningRate 0.0180 Epoch: 11 Global Step: 478000 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:25:39,362-Speed 2628.65 samples/sec Loss 5.6199 LearningRate 0.0180 Epoch: 11 Global Step: 478010 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:25:43,268-Speed 2621.78 samples/sec Loss 5.6750 LearningRate 0.0180 Epoch: 11 Global Step: 478020 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:25:47,168-Speed 2626.67 samples/sec Loss 5.5907 LearningRate 0.0180 Epoch: 11 Global Step: 478030 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:25:51,058-Speed 2633.14 samples/sec Loss 5.7191 LearningRate 0.0180 Epoch: 11 Global Step: 478040 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:25:54,947-Speed 2633.33 samples/sec Loss 5.6951 LearningRate 0.0180 Epoch: 11 Global Step: 478050 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:25:58,841-Speed 2630.39 samples/sec Loss 5.6704 LearningRate 0.0180 Epoch: 11 Global Step: 478060 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:26:02,737-Speed 2629.03 samples/sec Loss 5.7856 LearningRate 0.0180 Epoch: 11 Global Step: 478070 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:26:06,629-Speed 2631.43 samples/sec Loss 5.7475 LearningRate 0.0180 Epoch: 11 Global Step: 478080 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:26:10,501-Speed 2645.03 samples/sec Loss 5.8039 LearningRate 0.0180 Epoch: 11 Global Step: 478090 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:26:14,399-Speed 2627.94 samples/sec Loss 5.6569 LearningRate 0.0180 Epoch: 11 Global Step: 478100 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:26:18,292-Speed 2631.20 samples/sec Loss 5.7410 LearningRate 0.0179 Epoch: 11 Global Step: 478110 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:26:22,163-Speed 2646.12 samples/sec Loss 5.6335 LearningRate 0.0179 Epoch: 11 Global Step: 478120 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:26:26,055-Speed 2631.58 samples/sec Loss 5.6637 LearningRate 0.0179 Epoch: 11 Global Step: 478130 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:26:29,954-Speed 2626.29 samples/sec Loss 5.6749 LearningRate 0.0179 Epoch: 11 Global Step: 478140 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:26:33,851-Speed 2628.41 samples/sec Loss 5.5740 LearningRate 0.0179 Epoch: 11 Global Step: 478150 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:26:37,745-Speed 2630.65 samples/sec Loss 5.6597 LearningRate 0.0179 Epoch: 11 Global Step: 478160 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:26:41,636-Speed 2632.38 samples/sec Loss 5.6600 LearningRate 0.0179 Epoch: 11 Global Step: 478170 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:26:45,527-Speed 2633.11 samples/sec Loss 5.7014 LearningRate 0.0179 Epoch: 11 Global Step: 478180 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:26:49,425-Speed 2627.16 samples/sec Loss 5.6738 LearningRate 0.0179 Epoch: 11 Global Step: 478190 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:26:53,328-Speed 2624.91 samples/sec Loss 5.6351 LearningRate 0.0179 Epoch: 11 Global Step: 478200 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:26:57,246-Speed 2613.69 samples/sec Loss 5.7583 LearningRate 0.0179 Epoch: 11 Global Step: 478210 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:27:01,140-Speed 2630.54 samples/sec Loss 5.7030 LearningRate 0.0179 Epoch: 11 Global Step: 478220 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:27:05,036-Speed 2629.11 samples/sec Loss 5.8118 LearningRate 0.0179 Epoch: 11 Global Step: 478230 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:27:08,934-Speed 2628.27 samples/sec Loss 5.6794 LearningRate 0.0179 Epoch: 11 Global Step: 478240 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:27:12,834-Speed 2626.80 samples/sec Loss 5.5924 LearningRate 0.0179 Epoch: 11 Global Step: 478250 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:27:16,736-Speed 2624.81 samples/sec Loss 5.7317 LearningRate 0.0179 Epoch: 11 Global Step: 478260 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:27:20,633-Speed 2628.75 samples/sec Loss 5.6877 LearningRate 0.0179 Epoch: 11 Global Step: 478270 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:27:24,537-Speed 2623.88 samples/sec Loss 5.7907 LearningRate 0.0179 Epoch: 11 Global Step: 478280 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:27:28,435-Speed 2628.41 samples/sec Loss 5.6118 LearningRate 0.0179 Epoch: 11 Global Step: 478290 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:27:32,331-Speed 2628.68 samples/sec Loss 5.7161 LearningRate 0.0179 Epoch: 11 Global Step: 478300 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:27:36,247-Speed 2615.18 samples/sec Loss 5.7446 LearningRate 0.0179 Epoch: 11 Global Step: 478310 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:27:40,143-Speed 2628.64 samples/sec Loss 5.5929 LearningRate 0.0179 Epoch: 11 Global Step: 478320 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:27:44,046-Speed 2625.21 samples/sec Loss 5.6387 LearningRate 0.0179 Epoch: 11 Global Step: 478330 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:27:47,951-Speed 2623.45 samples/sec Loss 5.6139 LearningRate 0.0179 Epoch: 11 Global Step: 478340 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:27:51,853-Speed 2624.95 samples/sec Loss 5.6622 LearningRate 0.0179 Epoch: 11 Global Step: 478350 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:27:55,758-Speed 2623.73 samples/sec Loss 5.6234 LearningRate 0.0179 Epoch: 11 Global Step: 478360 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:27:59,653-Speed 2629.43 samples/sec Loss 5.6884 LearningRate 0.0179 Epoch: 11 Global Step: 478370 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:28:03,563-Speed 2619.35 samples/sec Loss 5.6893 LearningRate 0.0179 Epoch: 11 Global Step: 478380 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:28:07,471-Speed 2620.93 samples/sec Loss 5.6255 LearningRate 0.0179 Epoch: 11 Global Step: 478390 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:28:11,380-Speed 2620.77 samples/sec Loss 5.5456 LearningRate 0.0179 Epoch: 11 Global Step: 478400 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:28:15,278-Speed 2627.74 samples/sec Loss 5.7806 LearningRate 0.0179 Epoch: 11 Global Step: 478410 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:28:19,157-Speed 2640.33 samples/sec Loss 5.6678 LearningRate 0.0179 Epoch: 11 Global Step: 478420 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:28:23,057-Speed 2626.55 samples/sec Loss 5.7083 LearningRate 0.0179 Epoch: 11 Global Step: 478430 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:28:27,040-Speed 2571.95 samples/sec Loss 5.6367 LearningRate 0.0179 Epoch: 11 Global Step: 478440 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:28:30,945-Speed 2622.99 samples/sec Loss 5.5859 LearningRate 0.0179 Epoch: 11 Global Step: 478450 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:28:34,847-Speed 2625.34 samples/sec Loss 5.5947 LearningRate 0.0179 Epoch: 11 Global Step: 478460 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:28:38,763-Speed 2615.23 samples/sec Loss 5.7672 LearningRate 0.0179 Epoch: 11 Global Step: 478470 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:28:42,657-Speed 2631.07 samples/sec Loss 5.6875 LearningRate 0.0179 Epoch: 11 Global Step: 478480 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:28:46,554-Speed 2627.97 samples/sec Loss 5.7477 LearningRate 0.0179 Epoch: 11 Global Step: 478490 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:28:50,449-Speed 2630.31 samples/sec Loss 5.7165 LearningRate 0.0179 Epoch: 11 Global Step: 478500 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:28:54,345-Speed 2629.40 samples/sec Loss 5.6722 LearningRate 0.0179 Epoch: 11 Global Step: 478510 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:28:58,220-Speed 2643.84 samples/sec Loss 5.7194 LearningRate 0.0179 Epoch: 11 Global Step: 478520 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:29:02,151-Speed 2605.05 samples/sec Loss 5.6336 LearningRate 0.0179 Epoch: 11 Global Step: 478530 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:29:06,051-Speed 2626.37 samples/sec Loss 5.6862 LearningRate 0.0179 Epoch: 11 Global Step: 478540 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:29:09,956-Speed 2622.72 samples/sec Loss 5.7057 LearningRate 0.0179 Epoch: 11 Global Step: 478550 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:29:13,854-Speed 2628.48 samples/sec Loss 5.6223 LearningRate 0.0179 Epoch: 11 Global Step: 478560 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:29:17,747-Speed 2631.01 samples/sec Loss 5.8138 LearningRate 0.0179 Epoch: 11 Global Step: 478570 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:29:21,644-Speed 2628.72 samples/sec Loss 5.5706 LearningRate 0.0179 Epoch: 11 Global Step: 478580 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:29:25,552-Speed 2621.56 samples/sec Loss 5.6926 LearningRate 0.0179 Epoch: 11 Global Step: 478590 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:29:29,452-Speed 2626.04 samples/sec Loss 5.6470 LearningRate 0.0179 Epoch: 11 Global Step: 478600 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:29:33,358-Speed 2622.56 samples/sec Loss 5.6777 LearningRate 0.0179 Epoch: 11 Global Step: 478610 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:29:37,253-Speed 2628.85 samples/sec Loss 5.5772 LearningRate 0.0179 Epoch: 11 Global Step: 478620 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:29:41,149-Speed 2629.60 samples/sec Loss 5.6345 LearningRate 0.0179 Epoch: 11 Global Step: 478630 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:29:45,040-Speed 2631.81 samples/sec Loss 5.6741 LearningRate 0.0179 Epoch: 11 Global Step: 478640 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:29:48,933-Speed 2631.60 samples/sec Loss 5.7427 LearningRate 0.0179 Epoch: 11 Global Step: 478650 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:29:52,838-Speed 2622.74 samples/sec Loss 5.6807 LearningRate 0.0179 Epoch: 11 Global Step: 478660 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:29:56,733-Speed 2630.35 samples/sec Loss 5.6603 LearningRate 0.0179 Epoch: 11 Global Step: 478670 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:30:00,635-Speed 2624.84 samples/sec Loss 5.6184 LearningRate 0.0179 Epoch: 11 Global Step: 478680 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:30:04,508-Speed 2643.92 samples/sec Loss 5.7207 LearningRate 0.0179 Epoch: 11 Global Step: 478690 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:30:08,410-Speed 2625.15 samples/sec Loss 5.6353 LearningRate 0.0179 Epoch: 11 Global Step: 478700 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:30:12,334-Speed 2611.13 samples/sec Loss 5.7205 LearningRate 0.0179 Epoch: 11 Global Step: 478710 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:30:16,246-Speed 2618.03 samples/sec Loss 5.6703 LearningRate 0.0179 Epoch: 11 Global Step: 478720 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:30:20,156-Speed 2619.44 samples/sec Loss 5.7316 LearningRate 0.0179 Epoch: 11 Global Step: 478730 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:30:24,053-Speed 2628.75 samples/sec Loss 5.6683 LearningRate 0.0179 Epoch: 11 Global Step: 478740 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:30:27,949-Speed 2628.87 samples/sec Loss 5.6465 LearningRate 0.0179 Epoch: 11 Global Step: 478750 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:30:31,844-Speed 2630.21 samples/sec Loss 5.5951 LearningRate 0.0179 Epoch: 11 Global Step: 478760 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:30:35,737-Speed 2630.74 samples/sec Loss 5.6914 LearningRate 0.0179 Epoch: 11 Global Step: 478770 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:30:39,638-Speed 2625.06 samples/sec Loss 5.6617 LearningRate 0.0179 Epoch: 11 Global Step: 478780 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:30:43,543-Speed 2623.06 samples/sec Loss 5.6102 LearningRate 0.0179 Epoch: 11 Global Step: 478790 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:30:47,441-Speed 2628.35 samples/sec Loss 5.6299 LearningRate 0.0179 Epoch: 11 Global Step: 478800 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:30:51,337-Speed 2628.39 samples/sec Loss 5.6763 LearningRate 0.0179 Epoch: 11 Global Step: 478810 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:30:55,253-Speed 2616.22 samples/sec Loss 5.7523 LearningRate 0.0179 Epoch: 11 Global Step: 478820 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:30:59,150-Speed 2628.13 samples/sec Loss 5.6421 LearningRate 0.0179 Epoch: 11 Global Step: 478830 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:31:03,059-Speed 2621.12 samples/sec Loss 5.6354 LearningRate 0.0179 Epoch: 11 Global Step: 478840 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:31:06,960-Speed 2625.19 samples/sec Loss 5.5903 LearningRate 0.0179 Epoch: 11 Global Step: 478850 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:31:10,856-Speed 2628.77 samples/sec Loss 5.6602 LearningRate 0.0179 Epoch: 11 Global Step: 478860 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:31:14,753-Speed 2628.46 samples/sec Loss 5.7647 LearningRate 0.0179 Epoch: 11 Global Step: 478870 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:31:18,660-Speed 2621.97 samples/sec Loss 5.6423 LearningRate 0.0179 Epoch: 11 Global Step: 478880 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:31:22,562-Speed 2625.12 samples/sec Loss 5.6622 LearningRate 0.0179 Epoch: 11 Global Step: 478890 Fp16 Grad Scale: 262144 Required: 39 hours
Training: 2022-04-15 01:31:26,440-Speed 2641.01 samples/sec Loss 5.6594 LearningRate 0.0179 Epoch: 11 Global Step: 478900 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:31:30,335-Speed 2630.14 samples/sec Loss 5.6273 LearningRate 0.0179 Epoch: 11 Global Step: 478910 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:31:34,231-Speed 2628.41 samples/sec Loss 5.6457 LearningRate 0.0179 Epoch: 11 Global Step: 478920 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:31:38,130-Speed 2626.99 samples/sec Loss 5.7345 LearningRate 0.0179 Epoch: 11 Global Step: 478930 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:31:42,037-Speed 2621.09 samples/sec Loss 5.6030 LearningRate 0.0179 Epoch: 11 Global Step: 478940 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:31:45,934-Speed 2628.88 samples/sec Loss 5.6886 LearningRate 0.0179 Epoch: 11 Global Step: 478950 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:31:49,809-Speed 2643.55 samples/sec Loss 5.7845 LearningRate 0.0179 Epoch: 11 Global Step: 478960 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:31:53,726-Speed 2615.46 samples/sec Loss 5.6687 LearningRate 0.0179 Epoch: 11 Global Step: 478970 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:31:57,640-Speed 2616.17 samples/sec Loss 5.7004 LearningRate 0.0179 Epoch: 11 Global Step: 478980 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:32:01,537-Speed 2628.49 samples/sec Loss 5.6456 LearningRate 0.0179 Epoch: 11 Global Step: 478990 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:32:05,505-Speed 2581.48 samples/sec Loss 5.7624 LearningRate 0.0179 Epoch: 11 Global Step: 479000 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:32:09,402-Speed 2627.76 samples/sec Loss 5.5528 LearningRate 0.0179 Epoch: 11 Global Step: 479010 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:32:13,319-Speed 2614.89 samples/sec Loss 5.5664 LearningRate 0.0179 Epoch: 11 Global Step: 479020 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:32:17,230-Speed 2619.88 samples/sec Loss 5.5644 LearningRate 0.0179 Epoch: 11 Global Step: 479030 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:32:21,130-Speed 2625.87 samples/sec Loss 5.6501 LearningRate 0.0179 Epoch: 11 Global Step: 479040 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:32:25,033-Speed 2624.65 samples/sec Loss 5.6804 LearningRate 0.0179 Epoch: 11 Global Step: 479050 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:32:28,937-Speed 2624.09 samples/sec Loss 5.6574 LearningRate 0.0179 Epoch: 11 Global Step: 479060 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:32:32,832-Speed 2629.64 samples/sec Loss 5.5336 LearningRate 0.0179 Epoch: 11 Global Step: 479070 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:32:36,727-Speed 2629.17 samples/sec Loss 5.6237 LearningRate 0.0179 Epoch: 11 Global Step: 479080 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:32:40,623-Speed 2628.91 samples/sec Loss 5.5867 LearningRate 0.0178 Epoch: 11 Global Step: 479090 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:32:44,525-Speed 2625.08 samples/sec Loss 5.6912 LearningRate 0.0178 Epoch: 11 Global Step: 479100 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:32:48,423-Speed 2627.88 samples/sec Loss 5.6615 LearningRate 0.0178 Epoch: 11 Global Step: 479110 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:32:52,331-Speed 2620.99 samples/sec Loss 5.5921 LearningRate 0.0178 Epoch: 11 Global Step: 479120 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:32:56,233-Speed 2624.58 samples/sec Loss 5.6395 LearningRate 0.0178 Epoch: 11 Global Step: 479130 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:33:00,136-Speed 2624.32 samples/sec Loss 5.6541 LearningRate 0.0178 Epoch: 11 Global Step: 479140 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:33:04,028-Speed 2631.30 samples/sec Loss 5.6253 LearningRate 0.0178 Epoch: 11 Global Step: 479150 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:33:07,906-Speed 2642.10 samples/sec Loss 5.6909 LearningRate 0.0178 Epoch: 11 Global Step: 479160 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:33:11,801-Speed 2628.96 samples/sec Loss 5.7226 LearningRate 0.0178 Epoch: 11 Global Step: 479170 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:33:15,700-Speed 2626.77 samples/sec Loss 5.6409 LearningRate 0.0178 Epoch: 11 Global Step: 479180 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:33:19,603-Speed 2624.76 samples/sec Loss 5.6862 LearningRate 0.0178 Epoch: 11 Global Step: 479190 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:33:23,493-Speed 2633.29 samples/sec Loss 5.6398 LearningRate 0.0178 Epoch: 11 Global Step: 479200 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:33:27,386-Speed 2631.25 samples/sec Loss 5.6330 LearningRate 0.0178 Epoch: 11 Global Step: 479210 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:33:31,324-Speed 2600.85 samples/sec Loss 5.6983 LearningRate 0.0178 Epoch: 11 Global Step: 479220 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:33:35,264-Speed 2599.33 samples/sec Loss 5.5752 LearningRate 0.0178 Epoch: 11 Global Step: 479230 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:33:39,295-Speed 2541.11 samples/sec Loss 5.6883 LearningRate 0.0178 Epoch: 11 Global Step: 479240 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:33:43,191-Speed 2629.61 samples/sec Loss 5.6614 LearningRate 0.0178 Epoch: 11 Global Step: 479250 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:33:47,107-Speed 2614.77 samples/sec Loss 5.6687 LearningRate 0.0178 Epoch: 11 Global Step: 479260 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:33:51,019-Speed 2619.13 samples/sec Loss 5.5811 LearningRate 0.0178 Epoch: 11 Global Step: 479270 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:33:54,914-Speed 2629.85 samples/sec Loss 5.6902 LearningRate 0.0178 Epoch: 11 Global Step: 479280 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:33:58,828-Speed 2616.48 samples/sec Loss 5.6124 LearningRate 0.0178 Epoch: 11 Global Step: 479290 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:34:02,730-Speed 2624.97 samples/sec Loss 5.5876 LearningRate 0.0178 Epoch: 11 Global Step: 479300 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:34:06,602-Speed 2645.40 samples/sec Loss 5.6890 LearningRate 0.0178 Epoch: 11 Global Step: 479310 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:34:10,504-Speed 2625.30 samples/sec Loss 5.6586 LearningRate 0.0178 Epoch: 11 Global Step: 479320 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:34:14,459-Speed 2589.72 samples/sec Loss 5.6416 LearningRate 0.0178 Epoch: 11 Global Step: 479330 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:34:18,366-Speed 2621.44 samples/sec Loss 5.7359 LearningRate 0.0178 Epoch: 11 Global Step: 479340 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:34:22,264-Speed 2628.03 samples/sec Loss 5.6809 LearningRate 0.0178 Epoch: 11 Global Step: 479350 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:34:26,204-Speed 2599.73 samples/sec Loss 5.6087 LearningRate 0.0178 Epoch: 11 Global Step: 479360 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:34:30,100-Speed 2629.33 samples/sec Loss 5.6439 LearningRate 0.0178 Epoch: 11 Global Step: 479370 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:34:33,999-Speed 2626.80 samples/sec Loss 5.6327 LearningRate 0.0178 Epoch: 11 Global Step: 479380 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:34:37,893-Speed 2629.92 samples/sec Loss 5.7470 LearningRate 0.0178 Epoch: 11 Global Step: 479390 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:34:41,788-Speed 2630.09 samples/sec Loss 5.6040 LearningRate 0.0178 Epoch: 11 Global Step: 479400 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:34:45,688-Speed 2625.86 samples/sec Loss 5.6631 LearningRate 0.0178 Epoch: 11 Global Step: 479410 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:34:49,597-Speed 2620.42 samples/sec Loss 5.6165 LearningRate 0.0178 Epoch: 11 Global Step: 479420 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:34:53,507-Speed 2619.48 samples/sec Loss 5.6782 LearningRate 0.0178 Epoch: 11 Global Step: 479430 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:34:57,408-Speed 2626.42 samples/sec Loss 5.5738 LearningRate 0.0178 Epoch: 11 Global Step: 479440 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:35:01,307-Speed 2626.91 samples/sec Loss 5.6656 LearningRate 0.0178 Epoch: 11 Global Step: 479450 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:35:05,216-Speed 2620.30 samples/sec Loss 5.6711 LearningRate 0.0178 Epoch: 11 Global Step: 479460 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:35:09,116-Speed 2626.18 samples/sec Loss 5.5825 LearningRate 0.0178 Epoch: 11 Global Step: 479470 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:35:13,045-Speed 2607.52 samples/sec Loss 5.5566 LearningRate 0.0178 Epoch: 11 Global Step: 479480 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:35:16,941-Speed 2628.89 samples/sec Loss 5.6350 LearningRate 0.0178 Epoch: 11 Global Step: 479490 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:35:20,863-Speed 2611.76 samples/sec Loss 5.5954 LearningRate 0.0178 Epoch: 11 Global Step: 479500 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:35:24,738-Speed 2643.10 samples/sec Loss 5.6551 LearningRate 0.0178 Epoch: 11 Global Step: 479510 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:35:28,651-Speed 2617.97 samples/sec Loss 5.6338 LearningRate 0.0178 Epoch: 11 Global Step: 479520 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:35:32,521-Speed 2646.57 samples/sec Loss 5.7349 LearningRate 0.0178 Epoch: 11 Global Step: 479530 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:35:36,418-Speed 2628.13 samples/sec Loss 5.6835 LearningRate 0.0178 Epoch: 11 Global Step: 479540 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:35:40,321-Speed 2624.43 samples/sec Loss 5.6595 LearningRate 0.0178 Epoch: 11 Global Step: 479550 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:35:44,215-Speed 2630.18 samples/sec Loss 5.6923 LearningRate 0.0178 Epoch: 11 Global Step: 479560 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:35:48,115-Speed 2627.08 samples/sec Loss 5.5506 LearningRate 0.0178 Epoch: 11 Global Step: 479570 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:35:52,012-Speed 2628.47 samples/sec Loss 5.6153 LearningRate 0.0178 Epoch: 11 Global Step: 479580 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:35:55,935-Speed 2610.76 samples/sec Loss 5.6161 LearningRate 0.0178 Epoch: 11 Global Step: 479590 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:35:59,832-Speed 2628.30 samples/sec Loss 5.5959 LearningRate 0.0178 Epoch: 11 Global Step: 479600 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:36:03,724-Speed 2631.99 samples/sec Loss 5.7068 LearningRate 0.0178 Epoch: 11 Global Step: 479610 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:36:07,619-Speed 2629.66 samples/sec Loss 5.7292 LearningRate 0.0178 Epoch: 11 Global Step: 479620 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:36:11,520-Speed 2626.11 samples/sec Loss 5.6958 LearningRate 0.0178 Epoch: 11 Global Step: 479630 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:36:15,418-Speed 2627.28 samples/sec Loss 5.5091 LearningRate 0.0178 Epoch: 11 Global Step: 479640 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:36:19,315-Speed 2628.81 samples/sec Loss 5.6588 LearningRate 0.0178 Epoch: 11 Global Step: 479650 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:36:23,221-Speed 2622.50 samples/sec Loss 5.5703 LearningRate 0.0178 Epoch: 11 Global Step: 479660 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:36:27,144-Speed 2610.57 samples/sec Loss 5.6337 LearningRate 0.0178 Epoch: 11 Global Step: 479670 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:36:31,041-Speed 2628.65 samples/sec Loss 5.6776 LearningRate 0.0178 Epoch: 11 Global Step: 479680 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:36:34,948-Speed 2621.20 samples/sec Loss 5.6528 LearningRate 0.0178 Epoch: 11 Global Step: 479690 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:36:38,845-Speed 2628.57 samples/sec Loss 5.5926 LearningRate 0.0178 Epoch: 11 Global Step: 479700 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:36:42,718-Speed 2644.28 samples/sec Loss 5.7273 LearningRate 0.0178 Epoch: 11 Global Step: 479710 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:36:46,588-Speed 2647.07 samples/sec Loss 5.6771 LearningRate 0.0178 Epoch: 11 Global Step: 479720 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:36:50,488-Speed 2626.28 samples/sec Loss 5.5895 LearningRate 0.0178 Epoch: 11 Global Step: 479730 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:36:54,580-Speed 2503.31 samples/sec Loss 5.4972 LearningRate 0.0178 Epoch: 11 Global Step: 479740 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:36:58,659-Speed 2510.99 samples/sec Loss 5.6338 LearningRate 0.0178 Epoch: 11 Global Step: 479750 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:37:02,749-Speed 2505.69 samples/sec Loss 5.6542 LearningRate 0.0178 Epoch: 11 Global Step: 479760 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:37:06,640-Speed 2632.18 samples/sec Loss 5.5903 LearningRate 0.0178 Epoch: 11 Global Step: 479770 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:37:10,539-Speed 2626.53 samples/sec Loss 5.7011 LearningRate 0.0178 Epoch: 11 Global Step: 479780 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:37:14,465-Speed 2608.92 samples/sec Loss 5.6228 LearningRate 0.0178 Epoch: 11 Global Step: 479790 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:37:18,364-Speed 2627.71 samples/sec Loss 5.6237 LearningRate 0.0178 Epoch: 11 Global Step: 479800 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:37:22,302-Speed 2600.72 samples/sec Loss 5.6420 LearningRate 0.0178 Epoch: 11 Global Step: 479810 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:37:26,252-Speed 2593.38 samples/sec Loss 5.5395 LearningRate 0.0178 Epoch: 11 Global Step: 479820 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:37:30,147-Speed 2630.20 samples/sec Loss 5.6331 LearningRate 0.0178 Epoch: 11 Global Step: 479830 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:37:34,046-Speed 2627.37 samples/sec Loss 5.6838 LearningRate 0.0178 Epoch: 11 Global Step: 479840 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:37:37,954-Speed 2620.87 samples/sec Loss 5.5336 LearningRate 0.0178 Epoch: 11 Global Step: 479850 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:37:41,847-Speed 2631.07 samples/sec Loss 5.6864 LearningRate 0.0178 Epoch: 11 Global Step: 479860 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:37:45,740-Speed 2630.84 samples/sec Loss 5.4816 LearningRate 0.0178 Epoch: 11 Global Step: 479870 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:37:49,646-Speed 2622.84 samples/sec Loss 5.6327 LearningRate 0.0178 Epoch: 11 Global Step: 479880 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:37:53,546-Speed 2626.92 samples/sec Loss 5.6435 LearningRate 0.0178 Epoch: 11 Global Step: 479890 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:37:57,449-Speed 2623.87 samples/sec Loss 5.6561 LearningRate 0.0178 Epoch: 11 Global Step: 479900 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:38:01,346-Speed 2629.31 samples/sec Loss 5.6588 LearningRate 0.0178 Epoch: 11 Global Step: 479910 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:38:05,252-Speed 2622.10 samples/sec Loss 5.5899 LearningRate 0.0178 Epoch: 11 Global Step: 479920 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:38:09,141-Speed 2633.26 samples/sec Loss 5.7734 LearningRate 0.0178 Epoch: 11 Global Step: 479930 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:38:13,039-Speed 2627.52 samples/sec Loss 5.5526 LearningRate 0.0178 Epoch: 11 Global Step: 479940 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:38:16,941-Speed 2625.50 samples/sec Loss 5.6944 LearningRate 0.0178 Epoch: 11 Global Step: 479950 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:38:20,845-Speed 2623.28 samples/sec Loss 5.6005 LearningRate 0.0178 Epoch: 11 Global Step: 479960 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:38:24,744-Speed 2627.26 samples/sec Loss 5.5385 LearningRate 0.0178 Epoch: 11 Global Step: 479970 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:38:28,682-Speed 2602.54 samples/sec Loss 5.6290 LearningRate 0.0178 Epoch: 11 Global Step: 479980 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:38:32,596-Speed 2616.62 samples/sec Loss 5.6120 LearningRate 0.0178 Epoch: 11 Global Step: 479990 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:38:36,495-Speed 2626.84 samples/sec Loss 5.6175 LearningRate 0.0178 Epoch: 11 Global Step: 480000 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:39:19,805-[lfw][480000]XNorm: 23.695356
Training: 2022-04-15 01:39:19,806-[lfw][480000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-15 01:39:19,807-[lfw][480000]Accuracy-Highest: 0.99800
Training: 2022-04-15 01:40:10,104-[cfp_fp][480000]XNorm: 21.673020
Training: 2022-04-15 01:40:10,105-[cfp_fp][480000]Accuracy-Flip: 0.98871+-0.00509
Training: 2022-04-15 01:40:10,106-[cfp_fp][480000]Accuracy-Highest: 0.98871
Training: 2022-04-15 01:40:53,437-[agedb_30][480000]XNorm: 23.584162
Training: 2022-04-15 01:40:53,438-[agedb_30][480000]Accuracy-Flip: 0.97783+-0.00646
Training: 2022-04-15 01:40:53,439-[agedb_30][480000]Accuracy-Highest: 0.97817
Training: 2022-04-15 01:40:57,322-Speed 72.71 samples/sec Loss 5.6357 LearningRate 0.0178 Epoch: 11 Global Step: 480010 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:41:01,201-Speed 2640.86 samples/sec Loss 5.6220 LearningRate 0.0178 Epoch: 11 Global Step: 480020 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:41:05,120-Speed 2614.03 samples/sec Loss 5.6867 LearningRate 0.0178 Epoch: 11 Global Step: 480030 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:41:08,998-Speed 2641.04 samples/sec Loss 5.5934 LearningRate 0.0178 Epoch: 11 Global Step: 480040 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:41:12,863-Speed 2650.03 samples/sec Loss 5.5848 LearningRate 0.0178 Epoch: 11 Global Step: 480050 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:41:16,756-Speed 2630.81 samples/sec Loss 5.6194 LearningRate 0.0178 Epoch: 11 Global Step: 480060 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:41:20,647-Speed 2633.58 samples/sec Loss 5.5887 LearningRate 0.0178 Epoch: 11 Global Step: 480070 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:41:24,548-Speed 2626.02 samples/sec Loss 5.6471 LearningRate 0.0177 Epoch: 11 Global Step: 480080 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:41:28,655-Speed 2494.53 samples/sec Loss 5.6188 LearningRate 0.0177 Epoch: 11 Global Step: 480090 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:41:32,545-Speed 2632.82 samples/sec Loss 5.6034 LearningRate 0.0177 Epoch: 11 Global Step: 480100 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:41:36,437-Speed 2631.79 samples/sec Loss 5.6306 LearningRate 0.0177 Epoch: 11 Global Step: 480110 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:41:40,333-Speed 2628.99 samples/sec Loss 5.7347 LearningRate 0.0177 Epoch: 11 Global Step: 480120 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:41:44,226-Speed 2631.64 samples/sec Loss 5.6312 LearningRate 0.0177 Epoch: 11 Global Step: 480130 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:41:48,121-Speed 2629.70 samples/sec Loss 5.5611 LearningRate 0.0177 Epoch: 11 Global Step: 480140 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:41:52,003-Speed 2638.72 samples/sec Loss 5.5058 LearningRate 0.0177 Epoch: 11 Global Step: 480150 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:41:55,898-Speed 2629.69 samples/sec Loss 5.6475 LearningRate 0.0177 Epoch: 11 Global Step: 480160 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:41:59,812-Speed 2617.21 samples/sec Loss 5.6446 LearningRate 0.0177 Epoch: 11 Global Step: 480170 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:42:03,732-Speed 2612.61 samples/sec Loss 5.6060 LearningRate 0.0177 Epoch: 11 Global Step: 480180 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:42:07,649-Speed 2614.92 samples/sec Loss 5.6572 LearningRate 0.0177 Epoch: 11 Global Step: 480190 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:42:11,556-Speed 2621.88 samples/sec Loss 5.6809 LearningRate 0.0177 Epoch: 11 Global Step: 480200 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:42:15,448-Speed 2631.78 samples/sec Loss 5.6580 LearningRate 0.0177 Epoch: 11 Global Step: 480210 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:42:19,346-Speed 2627.82 samples/sec Loss 5.7900 LearningRate 0.0177 Epoch: 11 Global Step: 480220 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:42:23,282-Speed 2602.19 samples/sec Loss 5.6393 LearningRate 0.0177 Epoch: 11 Global Step: 480230 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:42:27,178-Speed 2628.59 samples/sec Loss 5.6503 LearningRate 0.0177 Epoch: 11 Global Step: 480240 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:42:31,076-Speed 2627.60 samples/sec Loss 5.5864 LearningRate 0.0177 Epoch: 11 Global Step: 480250 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:42:34,977-Speed 2626.24 samples/sec Loss 5.6533 LearningRate 0.0177 Epoch: 11 Global Step: 480260 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:42:38,872-Speed 2629.98 samples/sec Loss 5.6155 LearningRate 0.0177 Epoch: 11 Global Step: 480270 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:42:42,765-Speed 2630.52 samples/sec Loss 5.4998 LearningRate 0.0177 Epoch: 11 Global Step: 480280 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:42:46,661-Speed 2629.37 samples/sec Loss 5.5599 LearningRate 0.0177 Epoch: 11 Global Step: 480290 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:42:50,564-Speed 2623.59 samples/sec Loss 5.6517 LearningRate 0.0177 Epoch: 11 Global Step: 480300 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:42:54,462-Speed 2627.59 samples/sec Loss 5.6121 LearningRate 0.0177 Epoch: 11 Global Step: 480310 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:42:58,364-Speed 2625.22 samples/sec Loss 5.7226 LearningRate 0.0177 Epoch: 11 Global Step: 480320 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:43:02,261-Speed 2628.27 samples/sec Loss 5.6286 LearningRate 0.0177 Epoch: 11 Global Step: 480330 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:43:06,153-Speed 2631.93 samples/sec Loss 5.5457 LearningRate 0.0177 Epoch: 11 Global Step: 480340 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:43:10,153-Speed 2560.70 samples/sec Loss 5.6011 LearningRate 0.0177 Epoch: 11 Global Step: 480350 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:43:14,246-Speed 2502.16 samples/sec Loss 5.6106 LearningRate 0.0177 Epoch: 11 Global Step: 480360 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:43:18,177-Speed 2605.77 samples/sec Loss 5.5501 LearningRate 0.0177 Epoch: 11 Global Step: 480370 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:43:22,072-Speed 2629.95 samples/sec Loss 5.5944 LearningRate 0.0177 Epoch: 11 Global Step: 480380 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:43:25,965-Speed 2630.59 samples/sec Loss 5.7030 LearningRate 0.0177 Epoch: 11 Global Step: 480390 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:43:29,859-Speed 2630.30 samples/sec Loss 5.7544 LearningRate 0.0177 Epoch: 11 Global Step: 480400 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:43:33,816-Speed 2588.62 samples/sec Loss 5.6543 LearningRate 0.0177 Epoch: 11 Global Step: 480410 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:43:37,756-Speed 2599.57 samples/sec Loss 5.5997 LearningRate 0.0177 Epoch: 11 Global Step: 480420 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:43:41,660-Speed 2623.84 samples/sec Loss 5.6053 LearningRate 0.0177 Epoch: 11 Global Step: 480430 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:43:45,555-Speed 2629.92 samples/sec Loss 5.6411 LearningRate 0.0177 Epoch: 11 Global Step: 480440 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:43:49,454-Speed 2627.55 samples/sec Loss 5.7055 LearningRate 0.0177 Epoch: 11 Global Step: 480450 Fp16 Grad Scale: 262144 Required: 39 hours
Training: 2022-04-15 01:43:53,345-Speed 2631.65 samples/sec Loss 5.6944 LearningRate 0.0177 Epoch: 11 Global Step: 480460 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:43:57,226-Speed 2639.56 samples/sec Loss 5.6340 LearningRate 0.0177 Epoch: 11 Global Step: 480470 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:44:01,131-Speed 2622.52 samples/sec Loss 5.6894 LearningRate 0.0177 Epoch: 11 Global Step: 480480 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:44:05,027-Speed 2629.45 samples/sec Loss 5.5944 LearningRate 0.0177 Epoch: 11 Global Step: 480490 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:44:08,923-Speed 2629.40 samples/sec Loss 5.7053 LearningRate 0.0177 Epoch: 11 Global Step: 480500 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:44:12,820-Speed 2628.43 samples/sec Loss 5.6126 LearningRate 0.0177 Epoch: 11 Global Step: 480510 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:44:16,716-Speed 2628.53 samples/sec Loss 5.6175 LearningRate 0.0177 Epoch: 11 Global Step: 480520 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:44:20,623-Speed 2621.63 samples/sec Loss 5.5687 LearningRate 0.0177 Epoch: 11 Global Step: 480530 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:44:24,521-Speed 2627.77 samples/sec Loss 5.6907 LearningRate 0.0177 Epoch: 11 Global Step: 480540 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:44:28,418-Speed 2627.83 samples/sec Loss 5.5089 LearningRate 0.0177 Epoch: 11 Global Step: 480550 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:44:32,313-Speed 2629.72 samples/sec Loss 5.6299 LearningRate 0.0177 Epoch: 11 Global Step: 480560 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:44:36,209-Speed 2629.53 samples/sec Loss 5.6738 LearningRate 0.0177 Epoch: 11 Global Step: 480570 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:44:40,108-Speed 2627.51 samples/sec Loss 5.6292 LearningRate 0.0177 Epoch: 11 Global Step: 480580 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:44:44,004-Speed 2629.08 samples/sec Loss 5.5814 LearningRate 0.0177 Epoch: 11 Global Step: 480590 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:44:47,906-Speed 2624.44 samples/sec Loss 5.5119 LearningRate 0.0177 Epoch: 11 Global Step: 480600 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:44:51,781-Speed 2643.76 samples/sec Loss 5.6546 LearningRate 0.0177 Epoch: 11 Global Step: 480610 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:44:55,680-Speed 2626.28 samples/sec Loss 5.6354 LearningRate 0.0177 Epoch: 11 Global Step: 480620 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:44:59,579-Speed 2627.11 samples/sec Loss 5.6344 LearningRate 0.0177 Epoch: 11 Global Step: 480630 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:45:03,475-Speed 2628.97 samples/sec Loss 5.6942 LearningRate 0.0177 Epoch: 11 Global Step: 480640 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:45:07,372-Speed 2628.53 samples/sec Loss 5.6799 LearningRate 0.0177 Epoch: 11 Global Step: 480650 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:45:11,270-Speed 2627.44 samples/sec Loss 5.6442 LearningRate 0.0177 Epoch: 11 Global Step: 480660 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:45:15,164-Speed 2630.26 samples/sec Loss 5.5991 LearningRate 0.0177 Epoch: 11 Global Step: 480670 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:45:19,067-Speed 2624.45 samples/sec Loss 5.6512 LearningRate 0.0177 Epoch: 11 Global Step: 480680 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:45:22,960-Speed 2631.18 samples/sec Loss 5.6910 LearningRate 0.0177 Epoch: 11 Global Step: 480690 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:45:26,855-Speed 2629.71 samples/sec Loss 5.6368 LearningRate 0.0177 Epoch: 11 Global Step: 480700 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:45:30,752-Speed 2627.92 samples/sec Loss 5.6473 LearningRate 0.0177 Epoch: 11 Global Step: 480710 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:45:34,647-Speed 2629.84 samples/sec Loss 5.5895 LearningRate 0.0177 Epoch: 11 Global Step: 480720 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:45:38,545-Speed 2627.50 samples/sec Loss 5.7025 LearningRate 0.0177 Epoch: 11 Global Step: 480730 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:45:42,439-Speed 2629.89 samples/sec Loss 5.6382 LearningRate 0.0177 Epoch: 11 Global Step: 480740 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:45:46,434-Speed 2564.13 samples/sec Loss 5.6211 LearningRate 0.0177 Epoch: 11 Global Step: 480750 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:45:50,352-Speed 2614.67 samples/sec Loss 5.6250 LearningRate 0.0177 Epoch: 11 Global Step: 480760 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:45:54,318-Speed 2582.12 samples/sec Loss 5.6441 LearningRate 0.0177 Epoch: 11 Global Step: 480770 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:45:58,223-Speed 2622.90 samples/sec Loss 5.6297 LearningRate 0.0177 Epoch: 11 Global Step: 480780 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:46:02,123-Speed 2626.19 samples/sec Loss 5.6626 LearningRate 0.0177 Epoch: 11 Global Step: 480790 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:46:06,025-Speed 2625.14 samples/sec Loss 5.5710 LearningRate 0.0177 Epoch: 11 Global Step: 480800 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:46:09,896-Speed 2645.66 samples/sec Loss 5.5725 LearningRate 0.0177 Epoch: 11 Global Step: 480810 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:46:13,809-Speed 2617.85 samples/sec Loss 5.5085 LearningRate 0.0177 Epoch: 11 Global Step: 480820 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:46:17,716-Speed 2620.87 samples/sec Loss 5.5886 LearningRate 0.0177 Epoch: 11 Global Step: 480830 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:46:21,612-Speed 2629.15 samples/sec Loss 5.5902 LearningRate 0.0177 Epoch: 11 Global Step: 480840 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:46:25,508-Speed 2629.54 samples/sec Loss 5.6183 LearningRate 0.0177 Epoch: 11 Global Step: 480850 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:46:29,407-Speed 2627.22 samples/sec Loss 5.5859 LearningRate 0.0177 Epoch: 11 Global Step: 480860 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:46:33,310-Speed 2624.11 samples/sec Loss 5.5472 LearningRate 0.0177 Epoch: 11 Global Step: 480870 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:46:37,184-Speed 2643.69 samples/sec Loss 5.5759 LearningRate 0.0177 Epoch: 11 Global Step: 480880 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:46:41,082-Speed 2627.37 samples/sec Loss 5.5325 LearningRate 0.0177 Epoch: 11 Global Step: 480890 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:46:44,977-Speed 2629.46 samples/sec Loss 5.7653 LearningRate 0.0177 Epoch: 11 Global Step: 480900 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:46:48,874-Speed 2628.31 samples/sec Loss 5.5540 LearningRate 0.0177 Epoch: 11 Global Step: 480910 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:46:52,776-Speed 2625.41 samples/sec Loss 5.5510 LearningRate 0.0177 Epoch: 11 Global Step: 480920 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:46:56,668-Speed 2631.03 samples/sec Loss 5.5981 LearningRate 0.0177 Epoch: 11 Global Step: 480930 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:47:00,567-Speed 2627.10 samples/sec Loss 5.5211 LearningRate 0.0177 Epoch: 11 Global Step: 480940 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:47:04,466-Speed 2627.50 samples/sec Loss 5.6030 LearningRate 0.0177 Epoch: 11 Global Step: 480950 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:47:08,368-Speed 2624.99 samples/sec Loss 5.5325 LearningRate 0.0177 Epoch: 11 Global Step: 480960 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:47:12,272-Speed 2623.49 samples/sec Loss 5.6298 LearningRate 0.0177 Epoch: 11 Global Step: 480970 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:47:16,166-Speed 2630.11 samples/sec Loss 5.4718 LearningRate 0.0177 Epoch: 11 Global Step: 480980 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:47:20,060-Speed 2630.26 samples/sec Loss 5.5389 LearningRate 0.0177 Epoch: 11 Global Step: 480990 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:47:23,958-Speed 2627.66 samples/sec Loss 5.6495 LearningRate 0.0177 Epoch: 11 Global Step: 481000 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:47:27,858-Speed 2626.18 samples/sec Loss 5.5880 LearningRate 0.0177 Epoch: 11 Global Step: 481010 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:47:31,762-Speed 2623.24 samples/sec Loss 5.6579 LearningRate 0.0177 Epoch: 11 Global Step: 481020 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:47:35,661-Speed 2626.92 samples/sec Loss 5.6158 LearningRate 0.0177 Epoch: 11 Global Step: 481030 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:47:39,555-Speed 2630.48 samples/sec Loss 5.6175 LearningRate 0.0177 Epoch: 11 Global Step: 481040 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:47:43,449-Speed 2630.29 samples/sec Loss 5.5596 LearningRate 0.0177 Epoch: 11 Global Step: 481050 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:47:47,347-Speed 2628.15 samples/sec Loss 5.7112 LearningRate 0.0176 Epoch: 11 Global Step: 481060 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:47:51,241-Speed 2630.00 samples/sec Loss 5.6982 LearningRate 0.0176 Epoch: 11 Global Step: 481070 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:47:55,316-Speed 2513.53 samples/sec Loss 5.6389 LearningRate 0.0176 Epoch: 11 Global Step: 481080 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:47:59,390-Speed 2514.53 samples/sec Loss 5.5971 LearningRate 0.0176 Epoch: 11 Global Step: 481090 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:48:03,400-Speed 2553.78 samples/sec Loss 5.6454 LearningRate 0.0176 Epoch: 11 Global Step: 481100 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:48:07,304-Speed 2623.21 samples/sec Loss 5.6942 LearningRate 0.0176 Epoch: 11 Global Step: 481110 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:48:11,202-Speed 2627.61 samples/sec Loss 5.5544 LearningRate 0.0176 Epoch: 11 Global Step: 481120 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:48:15,099-Speed 2628.50 samples/sec Loss 5.7402 LearningRate 0.0176 Epoch: 11 Global Step: 481130 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:48:19,014-Speed 2616.19 samples/sec Loss 5.6342 LearningRate 0.0176 Epoch: 11 Global Step: 481140 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:48:22,899-Speed 2636.78 samples/sec Loss 5.5986 LearningRate 0.0176 Epoch: 11 Global Step: 481150 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:48:26,844-Speed 2596.31 samples/sec Loss 5.6054 LearningRate 0.0176 Epoch: 11 Global Step: 481160 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:48:30,835-Speed 2566.51 samples/sec Loss 5.5938 LearningRate 0.0176 Epoch: 11 Global Step: 481170 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:48:34,735-Speed 2626.48 samples/sec Loss 5.6175 LearningRate 0.0176 Epoch: 11 Global Step: 481180 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:48:38,625-Speed 2632.66 samples/sec Loss 5.6657 LearningRate 0.0176 Epoch: 11 Global Step: 481190 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:48:42,526-Speed 2626.00 samples/sec Loss 5.5979 LearningRate 0.0176 Epoch: 11 Global Step: 481200 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:48:46,427-Speed 2625.17 samples/sec Loss 5.6198 LearningRate 0.0176 Epoch: 11 Global Step: 481210 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:48:50,321-Speed 2631.12 samples/sec Loss 5.5184 LearningRate 0.0176 Epoch: 11 Global Step: 481220 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:48:54,223-Speed 2624.45 samples/sec Loss 5.6829 LearningRate 0.0176 Epoch: 11 Global Step: 481230 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:48:58,119-Speed 2629.12 samples/sec Loss 5.6362 LearningRate 0.0176 Epoch: 11 Global Step: 481240 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:49:02,069-Speed 2593.25 samples/sec Loss 5.5744 LearningRate 0.0176 Epoch: 11 Global Step: 481250 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:49:05,966-Speed 2628.34 samples/sec Loss 5.4616 LearningRate 0.0176 Epoch: 11 Global Step: 481260 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:49:09,870-Speed 2623.63 samples/sec Loss 5.5642 LearningRate 0.0176 Epoch: 11 Global Step: 481270 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:49:13,776-Speed 2622.51 samples/sec Loss 5.6004 LearningRate 0.0176 Epoch: 11 Global Step: 481280 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:49:17,699-Speed 2611.38 samples/sec Loss 5.6287 LearningRate 0.0176 Epoch: 11 Global Step: 481290 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:49:21,595-Speed 2628.97 samples/sec Loss 5.6470 LearningRate 0.0176 Epoch: 11 Global Step: 481300 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:49:25,487-Speed 2631.93 samples/sec Loss 5.6451 LearningRate 0.0176 Epoch: 11 Global Step: 481310 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:49:29,379-Speed 2631.42 samples/sec Loss 5.4520 LearningRate 0.0176 Epoch: 11 Global Step: 481320 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:49:33,275-Speed 2629.19 samples/sec Loss 5.6610 LearningRate 0.0176 Epoch: 11 Global Step: 481330 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:49:37,176-Speed 2626.16 samples/sec Loss 5.5598 LearningRate 0.0176 Epoch: 11 Global Step: 481340 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:49:41,075-Speed 2627.38 samples/sec Loss 5.5069 LearningRate 0.0176 Epoch: 11 Global Step: 481350 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:49:44,975-Speed 2625.71 samples/sec Loss 5.5985 LearningRate 0.0176 Epoch: 11 Global Step: 481360 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:49:48,899-Speed 2610.49 samples/sec Loss 5.5444 LearningRate 0.0176 Epoch: 11 Global Step: 481370 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:49:52,794-Speed 2629.88 samples/sec Loss 5.6861 LearningRate 0.0176 Epoch: 11 Global Step: 481380 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:49:56,688-Speed 2630.43 samples/sec Loss 5.7032 LearningRate 0.0176 Epoch: 11 Global Step: 481390 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:50:00,584-Speed 2628.78 samples/sec Loss 5.7111 LearningRate 0.0176 Epoch: 11 Global Step: 481400 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:50:04,485-Speed 2625.43 samples/sec Loss 5.6161 LearningRate 0.0176 Epoch: 11 Global Step: 481410 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:50:08,382-Speed 2628.45 samples/sec Loss 5.6183 LearningRate 0.0176 Epoch: 11 Global Step: 481420 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:50:12,287-Speed 2623.38 samples/sec Loss 5.6859 LearningRate 0.0176 Epoch: 11 Global Step: 481430 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:50:16,185-Speed 2627.27 samples/sec Loss 5.6152 LearningRate 0.0176 Epoch: 11 Global Step: 481440 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:50:20,068-Speed 2638.88 samples/sec Loss 5.5472 LearningRate 0.0176 Epoch: 11 Global Step: 481450 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:50:23,972-Speed 2623.59 samples/sec Loss 5.7433 LearningRate 0.0176 Epoch: 11 Global Step: 481460 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:50:27,873-Speed 2625.09 samples/sec Loss 5.6105 LearningRate 0.0176 Epoch: 11 Global Step: 481470 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:50:31,752-Speed 2640.47 samples/sec Loss 5.5894 LearningRate 0.0176 Epoch: 11 Global Step: 481480 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:50:35,674-Speed 2611.70 samples/sec Loss 5.6967 LearningRate 0.0176 Epoch: 11 Global Step: 481490 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:50:39,576-Speed 2624.78 samples/sec Loss 5.6296 LearningRate 0.0176 Epoch: 11 Global Step: 481500 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:50:43,494-Speed 2614.96 samples/sec Loss 5.6362 LearningRate 0.0176 Epoch: 11 Global Step: 481510 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:50:47,391-Speed 2628.69 samples/sec Loss 5.5243 LearningRate 0.0176 Epoch: 11 Global Step: 481520 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:50:51,292-Speed 2624.93 samples/sec Loss 5.5814 LearningRate 0.0176 Epoch: 11 Global Step: 481530 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:50:55,186-Speed 2630.86 samples/sec Loss 5.6353 LearningRate 0.0176 Epoch: 11 Global Step: 481540 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:50:59,082-Speed 2628.76 samples/sec Loss 5.5651 LearningRate 0.0176 Epoch: 11 Global Step: 481550 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:51:02,977-Speed 2629.82 samples/sec Loss 5.6792 LearningRate 0.0176 Epoch: 11 Global Step: 481560 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:51:06,941-Speed 2583.16 samples/sec Loss 5.6114 LearningRate 0.0176 Epoch: 11 Global Step: 481570 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:51:10,847-Speed 2623.06 samples/sec Loss 5.6613 LearningRate 0.0176 Epoch: 11 Global Step: 481580 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:51:14,747-Speed 2626.03 samples/sec Loss 5.6280 LearningRate 0.0176 Epoch: 11 Global Step: 481590 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:51:18,643-Speed 2629.66 samples/sec Loss 5.5632 LearningRate 0.0176 Epoch: 11 Global Step: 481600 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:51:22,538-Speed 2629.55 samples/sec Loss 5.5891 LearningRate 0.0176 Epoch: 11 Global Step: 481610 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:51:26,415-Speed 2641.75 samples/sec Loss 5.7833 LearningRate 0.0176 Epoch: 11 Global Step: 481620 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:51:30,317-Speed 2626.08 samples/sec Loss 5.6497 LearningRate 0.0176 Epoch: 11 Global Step: 481630 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:51:34,215-Speed 2627.47 samples/sec Loss 5.6793 LearningRate 0.0176 Epoch: 11 Global Step: 481640 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:51:38,117-Speed 2624.52 samples/sec Loss 5.6749 LearningRate 0.0176 Epoch: 11 Global Step: 481650 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:51:42,010-Speed 2630.68 samples/sec Loss 5.5883 LearningRate 0.0176 Epoch: 11 Global Step: 481660 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:51:45,902-Speed 2632.26 samples/sec Loss 5.6000 LearningRate 0.0176 Epoch: 11 Global Step: 481670 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:51:49,801-Speed 2627.01 samples/sec Loss 5.6531 LearningRate 0.0176 Epoch: 11 Global Step: 481680 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:51:53,709-Speed 2621.15 samples/sec Loss 5.6053 LearningRate 0.0176 Epoch: 11 Global Step: 481690 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:51:57,618-Speed 2620.45 samples/sec Loss 5.6726 LearningRate 0.0176 Epoch: 11 Global Step: 481700 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:52:01,518-Speed 2626.47 samples/sec Loss 5.6677 LearningRate 0.0176 Epoch: 11 Global Step: 481710 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:52:05,423-Speed 2622.73 samples/sec Loss 5.5967 LearningRate 0.0176 Epoch: 11 Global Step: 481720 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:52:09,338-Speed 2616.19 samples/sec Loss 5.6358 LearningRate 0.0176 Epoch: 11 Global Step: 481730 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:52:13,251-Speed 2617.84 samples/sec Loss 5.5994 LearningRate 0.0176 Epoch: 11 Global Step: 481740 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:52:17,146-Speed 2629.55 samples/sec Loss 5.6576 LearningRate 0.0176 Epoch: 11 Global Step: 481750 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:52:21,050-Speed 2624.38 samples/sec Loss 5.6371 LearningRate 0.0176 Epoch: 11 Global Step: 481760 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:52:24,968-Speed 2613.97 samples/sec Loss 5.5570 LearningRate 0.0176 Epoch: 11 Global Step: 481770 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:52:28,866-Speed 2628.28 samples/sec Loss 5.4844 LearningRate 0.0176 Epoch: 11 Global Step: 481780 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:52:32,773-Speed 2621.38 samples/sec Loss 5.6342 LearningRate 0.0176 Epoch: 11 Global Step: 481790 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:52:36,689-Speed 2615.77 samples/sec Loss 5.6436 LearningRate 0.0176 Epoch: 11 Global Step: 481800 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:52:40,589-Speed 2626.44 samples/sec Loss 5.6354 LearningRate 0.0176 Epoch: 11 Global Step: 481810 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:52:44,489-Speed 2626.73 samples/sec Loss 5.5659 LearningRate 0.0176 Epoch: 11 Global Step: 481820 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:52:48,385-Speed 2628.63 samples/sec Loss 5.6300 LearningRate 0.0176 Epoch: 11 Global Step: 481830 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:52:52,275-Speed 2633.25 samples/sec Loss 5.6124 LearningRate 0.0176 Epoch: 11 Global Step: 481840 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:52:56,188-Speed 2617.11 samples/sec Loss 5.6215 LearningRate 0.0176 Epoch: 11 Global Step: 481850 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:53:00,089-Speed 2625.81 samples/sec Loss 5.5952 LearningRate 0.0176 Epoch: 11 Global Step: 481860 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:53:03,990-Speed 2625.55 samples/sec Loss 5.6793 LearningRate 0.0176 Epoch: 11 Global Step: 481870 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:53:07,889-Speed 2627.26 samples/sec Loss 5.6480 LearningRate 0.0176 Epoch: 11 Global Step: 481880 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:53:11,784-Speed 2629.43 samples/sec Loss 5.5698 LearningRate 0.0176 Epoch: 11 Global Step: 481890 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:53:15,677-Speed 2630.98 samples/sec Loss 5.5982 LearningRate 0.0176 Epoch: 11 Global Step: 481900 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:53:19,587-Speed 2619.68 samples/sec Loss 5.5264 LearningRate 0.0176 Epoch: 11 Global Step: 481910 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:53:23,479-Speed 2631.67 samples/sec Loss 5.5049 LearningRate 0.0176 Epoch: 11 Global Step: 481920 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:53:27,372-Speed 2630.64 samples/sec Loss 5.5658 LearningRate 0.0176 Epoch: 11 Global Step: 481930 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:53:31,245-Speed 2644.74 samples/sec Loss 5.6941 LearningRate 0.0176 Epoch: 11 Global Step: 481940 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:53:35,140-Speed 2629.95 samples/sec Loss 5.5728 LearningRate 0.0176 Epoch: 11 Global Step: 481950 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:53:39,040-Speed 2626.02 samples/sec Loss 5.7393 LearningRate 0.0176 Epoch: 11 Global Step: 481960 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:53:42,952-Speed 2618.27 samples/sec Loss 5.5713 LearningRate 0.0176 Epoch: 11 Global Step: 481970 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:53:46,850-Speed 2627.81 samples/sec Loss 5.5820 LearningRate 0.0176 Epoch: 11 Global Step: 481980 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:53:50,746-Speed 2628.98 samples/sec Loss 5.6500 LearningRate 0.0176 Epoch: 11 Global Step: 481990 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:53:54,646-Speed 2626.05 samples/sec Loss 5.5966 LearningRate 0.0176 Epoch: 11 Global Step: 482000 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:53:58,542-Speed 2628.67 samples/sec Loss 5.5991 LearningRate 0.0176 Epoch: 11 Global Step: 482010 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:54:02,444-Speed 2624.76 samples/sec Loss 5.6328 LearningRate 0.0176 Epoch: 11 Global Step: 482020 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:54:06,345-Speed 2625.42 samples/sec Loss 5.6541 LearningRate 0.0176 Epoch: 11 Global Step: 482030 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:54:10,244-Speed 2627.30 samples/sec Loss 5.6489 LearningRate 0.0176 Epoch: 11 Global Step: 482040 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:54:14,147-Speed 2624.51 samples/sec Loss 5.6101 LearningRate 0.0175 Epoch: 11 Global Step: 482050 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:54:18,046-Speed 2627.04 samples/sec Loss 5.5586 LearningRate 0.0175 Epoch: 11 Global Step: 482060 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:54:21,943-Speed 2628.12 samples/sec Loss 5.6199 LearningRate 0.0175 Epoch: 11 Global Step: 482070 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:54:25,855-Speed 2618.18 samples/sec Loss 5.6657 LearningRate 0.0175 Epoch: 11 Global Step: 482080 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:54:29,751-Speed 2629.10 samples/sec Loss 5.6245 LearningRate 0.0175 Epoch: 11 Global Step: 482090 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:54:33,660-Speed 2620.02 samples/sec Loss 5.5983 LearningRate 0.0175 Epoch: 11 Global Step: 482100 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:54:37,560-Speed 2626.38 samples/sec Loss 5.5157 LearningRate 0.0175 Epoch: 11 Global Step: 482110 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:54:41,459-Speed 2627.08 samples/sec Loss 5.5702 LearningRate 0.0175 Epoch: 11 Global Step: 482120 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:54:45,376-Speed 2615.39 samples/sec Loss 5.5813 LearningRate 0.0175 Epoch: 11 Global Step: 482130 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:54:49,278-Speed 2624.43 samples/sec Loss 5.5426 LearningRate 0.0175 Epoch: 11 Global Step: 482140 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:54:53,188-Speed 2620.06 samples/sec Loss 5.6190 LearningRate 0.0175 Epoch: 11 Global Step: 482150 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:54:57,086-Speed 2627.18 samples/sec Loss 5.6569 LearningRate 0.0175 Epoch: 11 Global Step: 482160 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:55:00,985-Speed 2626.65 samples/sec Loss 5.6390 LearningRate 0.0175 Epoch: 11 Global Step: 482170 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:55:04,894-Speed 2620.47 samples/sec Loss 5.6574 LearningRate 0.0175 Epoch: 11 Global Step: 482180 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:55:08,794-Speed 2626.35 samples/sec Loss 5.6386 LearningRate 0.0175 Epoch: 11 Global Step: 482190 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:55:12,699-Speed 2622.52 samples/sec Loss 5.5294 LearningRate 0.0175 Epoch: 11 Global Step: 482200 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:55:16,598-Speed 2627.21 samples/sec Loss 5.6603 LearningRate 0.0175 Epoch: 11 Global Step: 482210 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:55:20,500-Speed 2625.34 samples/sec Loss 5.6740 LearningRate 0.0175 Epoch: 11 Global Step: 482220 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:55:24,397-Speed 2628.27 samples/sec Loss 5.5843 LearningRate 0.0175 Epoch: 11 Global Step: 482230 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:55:28,295-Speed 2627.50 samples/sec Loss 5.5970 LearningRate 0.0175 Epoch: 11 Global Step: 482240 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:55:32,172-Speed 2642.01 samples/sec Loss 5.5854 LearningRate 0.0175 Epoch: 11 Global Step: 482250 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:55:36,069-Speed 2627.52 samples/sec Loss 5.5269 LearningRate 0.0175 Epoch: 11 Global Step: 482260 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:55:39,945-Speed 2642.71 samples/sec Loss 5.6583 LearningRate 0.0175 Epoch: 11 Global Step: 482270 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:55:43,851-Speed 2622.53 samples/sec Loss 5.6299 LearningRate 0.0175 Epoch: 11 Global Step: 482280 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:55:47,751-Speed 2626.29 samples/sec Loss 5.6014 LearningRate 0.0175 Epoch: 11 Global Step: 482290 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:55:51,672-Speed 2611.74 samples/sec Loss 5.6029 LearningRate 0.0175 Epoch: 11 Global Step: 482300 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:55:55,576-Speed 2623.81 samples/sec Loss 5.5850 LearningRate 0.0175 Epoch: 11 Global Step: 482310 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:55:59,474-Speed 2627.79 samples/sec Loss 5.6781 LearningRate 0.0175 Epoch: 11 Global Step: 482320 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:56:03,367-Speed 2630.82 samples/sec Loss 5.5731 LearningRate 0.0175 Epoch: 11 Global Step: 482330 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:56:07,269-Speed 2624.95 samples/sec Loss 5.4986 LearningRate 0.0175 Epoch: 11 Global Step: 482340 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:56:11,161-Speed 2631.58 samples/sec Loss 5.5408 LearningRate 0.0175 Epoch: 11 Global Step: 482350 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:56:15,057-Speed 2630.04 samples/sec Loss 5.5527 LearningRate 0.0175 Epoch: 11 Global Step: 482360 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:56:18,970-Speed 2617.40 samples/sec Loss 5.5836 LearningRate 0.0175 Epoch: 11 Global Step: 482370 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:56:22,870-Speed 2628.47 samples/sec Loss 5.6106 LearningRate 0.0175 Epoch: 11 Global Step: 482380 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:56:26,765-Speed 2629.42 samples/sec Loss 5.6591 LearningRate 0.0175 Epoch: 11 Global Step: 482390 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:56:30,656-Speed 2632.58 samples/sec Loss 5.4983 LearningRate 0.0175 Epoch: 11 Global Step: 482400 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:56:34,549-Speed 2630.40 samples/sec Loss 5.6141 LearningRate 0.0175 Epoch: 11 Global Step: 482410 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:56:38,454-Speed 2623.42 samples/sec Loss 5.7152 LearningRate 0.0175 Epoch: 11 Global Step: 482420 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:56:42,360-Speed 2622.20 samples/sec Loss 5.6970 LearningRate 0.0175 Epoch: 11 Global Step: 482430 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:56:46,243-Speed 2637.85 samples/sec Loss 5.5728 LearningRate 0.0175 Epoch: 11 Global Step: 482440 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:56:50,139-Speed 2629.12 samples/sec Loss 5.6785 LearningRate 0.0175 Epoch: 11 Global Step: 482450 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:56:54,030-Speed 2631.93 samples/sec Loss 5.5893 LearningRate 0.0175 Epoch: 11 Global Step: 482460 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:56:57,927-Speed 2628.32 samples/sec Loss 5.6658 LearningRate 0.0175 Epoch: 11 Global Step: 482470 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:57:01,828-Speed 2625.51 samples/sec Loss 5.5688 LearningRate 0.0175 Epoch: 11 Global Step: 482480 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:57:05,747-Speed 2613.64 samples/sec Loss 5.6376 LearningRate 0.0175 Epoch: 11 Global Step: 482490 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:57:09,655-Speed 2620.97 samples/sec Loss 5.6874 LearningRate 0.0175 Epoch: 11 Global Step: 482500 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:57:13,550-Speed 2629.08 samples/sec Loss 5.5869 LearningRate 0.0175 Epoch: 11 Global Step: 482510 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:57:17,441-Speed 2632.40 samples/sec Loss 5.5306 LearningRate 0.0175 Epoch: 11 Global Step: 482520 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:57:21,334-Speed 2631.16 samples/sec Loss 5.5928 LearningRate 0.0175 Epoch: 11 Global Step: 482530 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:57:25,235-Speed 2625.83 samples/sec Loss 5.5314 LearningRate 0.0175 Epoch: 11 Global Step: 482540 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:57:29,144-Speed 2620.46 samples/sec Loss 5.6754 LearningRate 0.0175 Epoch: 11 Global Step: 482550 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:57:33,036-Speed 2631.43 samples/sec Loss 5.4417 LearningRate 0.0175 Epoch: 11 Global Step: 482560 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:57:36,942-Speed 2621.79 samples/sec Loss 5.6547 LearningRate 0.0175 Epoch: 11 Global Step: 482570 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:57:40,844-Speed 2625.02 samples/sec Loss 5.5135 LearningRate 0.0175 Epoch: 11 Global Step: 482580 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:57:44,751-Speed 2621.67 samples/sec Loss 5.6332 LearningRate 0.0175 Epoch: 11 Global Step: 482590 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:57:48,642-Speed 2632.02 samples/sec Loss 5.5470 LearningRate 0.0175 Epoch: 11 Global Step: 482600 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:57:52,540-Speed 2628.33 samples/sec Loss 5.5883 LearningRate 0.0175 Epoch: 11 Global Step: 482610 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:57:56,426-Speed 2635.31 samples/sec Loss 5.7168 LearningRate 0.0175 Epoch: 11 Global Step: 482620 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:58:00,384-Speed 2588.16 samples/sec Loss 5.5755 LearningRate 0.0175 Epoch: 11 Global Step: 482630 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:58:04,280-Speed 2628.77 samples/sec Loss 5.5551 LearningRate 0.0175 Epoch: 11 Global Step: 482640 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:58:08,176-Speed 2629.28 samples/sec Loss 5.5583 LearningRate 0.0175 Epoch: 11 Global Step: 482650 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:58:12,080-Speed 2623.38 samples/sec Loss 5.7197 LearningRate 0.0175 Epoch: 11 Global Step: 482660 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:58:15,979-Speed 2626.80 samples/sec Loss 5.6220 LearningRate 0.0175 Epoch: 11 Global Step: 482670 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:58:19,875-Speed 2628.97 samples/sec Loss 5.6605 LearningRate 0.0175 Epoch: 11 Global Step: 482680 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:58:23,769-Speed 2630.05 samples/sec Loss 5.5686 LearningRate 0.0175 Epoch: 11 Global Step: 482690 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:58:27,666-Speed 2628.61 samples/sec Loss 5.5458 LearningRate 0.0175 Epoch: 11 Global Step: 482700 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:58:31,568-Speed 2625.52 samples/sec Loss 5.5700 LearningRate 0.0175 Epoch: 11 Global Step: 482710 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:58:35,467-Speed 2626.84 samples/sec Loss 5.5596 LearningRate 0.0175 Epoch: 11 Global Step: 482720 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:58:39,390-Speed 2610.62 samples/sec Loss 5.5459 LearningRate 0.0175 Epoch: 11 Global Step: 482730 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:58:43,291-Speed 2625.22 samples/sec Loss 5.5176 LearningRate 0.0175 Epoch: 11 Global Step: 482740 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:58:47,202-Speed 2618.91 samples/sec Loss 5.5275 LearningRate 0.0175 Epoch: 11 Global Step: 482750 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:58:51,095-Speed 2631.19 samples/sec Loss 5.5285 LearningRate 0.0175 Epoch: 11 Global Step: 482760 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:58:55,009-Speed 2616.46 samples/sec Loss 5.6333 LearningRate 0.0175 Epoch: 11 Global Step: 482770 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:58:58,918-Speed 2620.28 samples/sec Loss 5.5626 LearningRate 0.0175 Epoch: 11 Global Step: 482780 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 01:59:02,799-Speed 2639.34 samples/sec Loss 5.6429 LearningRate 0.0175 Epoch: 11 Global Step: 482790 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:59:06,702-Speed 2623.65 samples/sec Loss 5.6022 LearningRate 0.0175 Epoch: 11 Global Step: 482800 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:59:10,601-Speed 2627.23 samples/sec Loss 5.5646 LearningRate 0.0175 Epoch: 11 Global Step: 482810 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:59:14,482-Speed 2639.64 samples/sec Loss 5.6400 LearningRate 0.0175 Epoch: 11 Global Step: 482820 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:59:18,381-Speed 2626.70 samples/sec Loss 5.5667 LearningRate 0.0175 Epoch: 11 Global Step: 482830 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:59:22,278-Speed 2628.24 samples/sec Loss 5.6620 LearningRate 0.0175 Epoch: 11 Global Step: 482840 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:59:26,172-Speed 2630.08 samples/sec Loss 5.7447 LearningRate 0.0175 Epoch: 11 Global Step: 482850 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:59:30,094-Speed 2611.72 samples/sec Loss 5.6610 LearningRate 0.0175 Epoch: 11 Global Step: 482860 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:59:33,993-Speed 2626.61 samples/sec Loss 5.4706 LearningRate 0.0175 Epoch: 11 Global Step: 482870 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:59:37,883-Speed 2633.44 samples/sec Loss 5.6417 LearningRate 0.0175 Epoch: 11 Global Step: 482880 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:59:41,792-Speed 2619.50 samples/sec Loss 5.6196 LearningRate 0.0175 Epoch: 11 Global Step: 482890 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:59:45,684-Speed 2632.05 samples/sec Loss 5.5789 LearningRate 0.0175 Epoch: 11 Global Step: 482900 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:59:49,582-Speed 2627.82 samples/sec Loss 5.5926 LearningRate 0.0175 Epoch: 11 Global Step: 482910 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 01:59:53,478-Speed 2629.29 samples/sec Loss 5.7569 LearningRate 0.0175 Epoch: 11 Global Step: 482920 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 01:59:57,376-Speed 2627.15 samples/sec Loss 5.5851 LearningRate 0.0175 Epoch: 11 Global Step: 482930 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:00:01,273-Speed 2629.22 samples/sec Loss 5.5947 LearningRate 0.0175 Epoch: 11 Global Step: 482940 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:00:05,171-Speed 2627.16 samples/sec Loss 5.5005 LearningRate 0.0175 Epoch: 11 Global Step: 482950 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:00:09,072-Speed 2625.68 samples/sec Loss 5.5083 LearningRate 0.0175 Epoch: 11 Global Step: 482960 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:00:12,982-Speed 2619.10 samples/sec Loss 5.5184 LearningRate 0.0175 Epoch: 11 Global Step: 482970 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:00:16,880-Speed 2628.38 samples/sec Loss 5.7139 LearningRate 0.0175 Epoch: 11 Global Step: 482980 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:00:20,780-Speed 2626.48 samples/sec Loss 5.5571 LearningRate 0.0175 Epoch: 11 Global Step: 482990 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:00:24,677-Speed 2628.16 samples/sec Loss 5.6844 LearningRate 0.0175 Epoch: 11 Global Step: 483000 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:00:28,576-Speed 2627.33 samples/sec Loss 5.5772 LearningRate 0.0175 Epoch: 11 Global Step: 483010 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:00:32,479-Speed 2623.90 samples/sec Loss 5.5322 LearningRate 0.0175 Epoch: 11 Global Step: 483020 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:00:36,377-Speed 2628.83 samples/sec Loss 5.5886 LearningRate 0.0175 Epoch: 11 Global Step: 483030 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:00:40,271-Speed 2630.04 samples/sec Loss 5.5621 LearningRate 0.0174 Epoch: 11 Global Step: 483040 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:00:44,170-Speed 2627.06 samples/sec Loss 5.5845 LearningRate 0.0174 Epoch: 11 Global Step: 483050 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:00:48,064-Speed 2630.37 samples/sec Loss 5.6206 LearningRate 0.0174 Epoch: 11 Global Step: 483060 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:00:51,958-Speed 2630.30 samples/sec Loss 5.5485 LearningRate 0.0174 Epoch: 11 Global Step: 483070 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:00:55,846-Speed 2634.49 samples/sec Loss 5.5911 LearningRate 0.0174 Epoch: 11 Global Step: 483080 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:00:59,739-Speed 2630.95 samples/sec Loss 5.6154 LearningRate 0.0174 Epoch: 11 Global Step: 483090 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:01:03,632-Speed 2630.39 samples/sec Loss 5.6365 LearningRate 0.0174 Epoch: 11 Global Step: 483100 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:01:07,529-Speed 2628.91 samples/sec Loss 5.5172 LearningRate 0.0174 Epoch: 11 Global Step: 483110 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:01:11,421-Speed 2632.09 samples/sec Loss 5.5846 LearningRate 0.0174 Epoch: 11 Global Step: 483120 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:01:15,318-Speed 2628.09 samples/sec Loss 5.5760 LearningRate 0.0174 Epoch: 11 Global Step: 483130 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:01:19,215-Speed 2628.21 samples/sec Loss 5.6449 LearningRate 0.0174 Epoch: 11 Global Step: 483140 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:01:23,114-Speed 2627.03 samples/sec Loss 5.5396 LearningRate 0.0174 Epoch: 11 Global Step: 483150 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:01:27,008-Speed 2629.97 samples/sec Loss 5.6251 LearningRate 0.0174 Epoch: 11 Global Step: 483160 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:01:30,908-Speed 2626.23 samples/sec Loss 5.5296 LearningRate 0.0174 Epoch: 11 Global Step: 483170 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:01:34,803-Speed 2629.37 samples/sec Loss 5.5998 LearningRate 0.0174 Epoch: 11 Global Step: 483180 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:01:38,699-Speed 2629.34 samples/sec Loss 5.5258 LearningRate 0.0174 Epoch: 11 Global Step: 483190 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:01:42,599-Speed 2626.23 samples/sec Loss 5.5144 LearningRate 0.0174 Epoch: 11 Global Step: 483200 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:01:46,491-Speed 2632.32 samples/sec Loss 5.5949 LearningRate 0.0174 Epoch: 11 Global Step: 483210 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:01:50,398-Speed 2621.74 samples/sec Loss 5.5447 LearningRate 0.0174 Epoch: 11 Global Step: 483220 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:01:54,294-Speed 2629.17 samples/sec Loss 5.5302 LearningRate 0.0174 Epoch: 11 Global Step: 483230 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:01:58,179-Speed 2635.78 samples/sec Loss 5.4659 LearningRate 0.0174 Epoch: 11 Global Step: 483240 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:02:02,107-Speed 2607.34 samples/sec Loss 5.7172 LearningRate 0.0174 Epoch: 11 Global Step: 483250 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:02:05,999-Speed 2631.85 samples/sec Loss 5.5473 LearningRate 0.0174 Epoch: 11 Global Step: 483260 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:02:09,894-Speed 2632.34 samples/sec Loss 5.5860 LearningRate 0.0174 Epoch: 11 Global Step: 483270 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:02:13,798-Speed 2623.55 samples/sec Loss 5.5987 LearningRate 0.0174 Epoch: 11 Global Step: 483280 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:02:17,693-Speed 2630.20 samples/sec Loss 5.5561 LearningRate 0.0174 Epoch: 11 Global Step: 483290 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:02:21,603-Speed 2619.00 samples/sec Loss 5.7133 LearningRate 0.0174 Epoch: 11 Global Step: 483300 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:02:25,508-Speed 2623.75 samples/sec Loss 5.5477 LearningRate 0.0174 Epoch: 11 Global Step: 483310 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:02:29,406-Speed 2627.77 samples/sec Loss 5.7530 LearningRate 0.0174 Epoch: 11 Global Step: 483320 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:02:33,305-Speed 2626.66 samples/sec Loss 5.6575 LearningRate 0.0174 Epoch: 11 Global Step: 483330 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:02:37,200-Speed 2629.16 samples/sec Loss 5.4649 LearningRate 0.0174 Epoch: 11 Global Step: 483340 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:02:41,099-Speed 2627.37 samples/sec Loss 5.6915 LearningRate 0.0174 Epoch: 11 Global Step: 483350 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:02:44,981-Speed 2637.89 samples/sec Loss 5.6990 LearningRate 0.0174 Epoch: 11 Global Step: 483360 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:02:48,877-Speed 2629.86 samples/sec Loss 5.6024 LearningRate 0.0174 Epoch: 11 Global Step: 483370 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:02:52,771-Speed 2630.47 samples/sec Loss 5.6520 LearningRate 0.0174 Epoch: 11 Global Step: 483380 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:02:56,726-Speed 2589.72 samples/sec Loss 5.5829 LearningRate 0.0174 Epoch: 11 Global Step: 483390 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:03:00,622-Speed 2629.44 samples/sec Loss 5.6171 LearningRate 0.0174 Epoch: 11 Global Step: 483400 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:03:04,695-Speed 2514.66 samples/sec Loss 5.6805 LearningRate 0.0174 Epoch: 11 Global Step: 483410 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:03:08,689-Speed 2564.23 samples/sec Loss 5.6385 LearningRate 0.0174 Epoch: 11 Global Step: 483420 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:03:12,605-Speed 2615.24 samples/sec Loss 5.5783 LearningRate 0.0174 Epoch: 11 Global Step: 483430 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:03:16,521-Speed 2616.17 samples/sec Loss 5.5366 LearningRate 0.0174 Epoch: 11 Global Step: 483440 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:03:20,425-Speed 2623.81 samples/sec Loss 5.5240 LearningRate 0.0174 Epoch: 11 Global Step: 483450 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:03:24,328-Speed 2624.62 samples/sec Loss 5.6686 LearningRate 0.0174 Epoch: 11 Global Step: 483460 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:03:28,256-Speed 2607.92 samples/sec Loss 5.5353 LearningRate 0.0174 Epoch: 11 Global Step: 483470 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:03:32,165-Speed 2620.27 samples/sec Loss 5.6891 LearningRate 0.0174 Epoch: 11 Global Step: 483480 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:03:36,058-Speed 2631.12 samples/sec Loss 5.5279 LearningRate 0.0174 Epoch: 11 Global Step: 483490 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:03:39,935-Speed 2641.87 samples/sec Loss 5.6769 LearningRate 0.0174 Epoch: 11 Global Step: 483500 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:03:43,841-Speed 2621.93 samples/sec Loss 5.5816 LearningRate 0.0174 Epoch: 11 Global Step: 483510 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:03:47,745-Speed 2623.20 samples/sec Loss 5.5258 LearningRate 0.0174 Epoch: 11 Global Step: 483520 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:03:51,651-Speed 2622.76 samples/sec Loss 5.5887 LearningRate 0.0174 Epoch: 11 Global Step: 483530 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:03:55,647-Speed 2563.45 samples/sec Loss 5.5527 LearningRate 0.0174 Epoch: 11 Global Step: 483540 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:03:59,561-Speed 2617.44 samples/sec Loss 5.5761 LearningRate 0.0174 Epoch: 11 Global Step: 483550 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:04:03,460-Speed 2626.88 samples/sec Loss 5.4629 LearningRate 0.0174 Epoch: 11 Global Step: 483560 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:04:07,365-Speed 2622.82 samples/sec Loss 5.5701 LearningRate 0.0174 Epoch: 11 Global Step: 483570 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:04:11,284-Speed 2612.83 samples/sec Loss 5.5129 LearningRate 0.0174 Epoch: 11 Global Step: 483580 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:04:15,186-Speed 2625.43 samples/sec Loss 5.6161 LearningRate 0.0174 Epoch: 11 Global Step: 483590 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:04:19,081-Speed 2629.73 samples/sec Loss 5.4851 LearningRate 0.0174 Epoch: 11 Global Step: 483600 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:04:22,976-Speed 2630.35 samples/sec Loss 5.5620 LearningRate 0.0174 Epoch: 11 Global Step: 483610 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:04:26,883-Speed 2621.72 samples/sec Loss 5.5305 LearningRate 0.0174 Epoch: 11 Global Step: 483620 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:04:30,780-Speed 2628.56 samples/sec Loss 5.6391 LearningRate 0.0174 Epoch: 11 Global Step: 483630 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:04:34,680-Speed 2626.45 samples/sec Loss 5.6264 LearningRate 0.0174 Epoch: 11 Global Step: 483640 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:04:38,577-Speed 2627.80 samples/sec Loss 5.5046 LearningRate 0.0174 Epoch: 11 Global Step: 483650 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:04:42,479-Speed 2625.07 samples/sec Loss 5.6915 LearningRate 0.0174 Epoch: 11 Global Step: 483660 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:04:46,375-Speed 2629.39 samples/sec Loss 5.5659 LearningRate 0.0174 Epoch: 11 Global Step: 483670 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:04:50,272-Speed 2627.77 samples/sec Loss 5.5577 LearningRate 0.0174 Epoch: 11 Global Step: 483680 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:04:54,162-Speed 2633.21 samples/sec Loss 5.5457 LearningRate 0.0174 Epoch: 11 Global Step: 483690 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:04:58,060-Speed 2627.47 samples/sec Loss 5.6422 LearningRate 0.0174 Epoch: 11 Global Step: 483700 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:05:01,951-Speed 2632.97 samples/sec Loss 5.5893 LearningRate 0.0174 Epoch: 11 Global Step: 483710 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:05:05,851-Speed 2625.63 samples/sec Loss 5.5498 LearningRate 0.0174 Epoch: 11 Global Step: 483720 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:05:09,748-Speed 2628.76 samples/sec Loss 5.5420 LearningRate 0.0174 Epoch: 11 Global Step: 483730 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:05:13,646-Speed 2627.22 samples/sec Loss 5.4873 LearningRate 0.0174 Epoch: 11 Global Step: 483740 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:05:17,544-Speed 2628.08 samples/sec Loss 5.5812 LearningRate 0.0174 Epoch: 11 Global Step: 483750 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:05:21,443-Speed 2627.35 samples/sec Loss 5.5473 LearningRate 0.0174 Epoch: 11 Global Step: 483760 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:05:25,337-Speed 2630.05 samples/sec Loss 5.4919 LearningRate 0.0174 Epoch: 11 Global Step: 483770 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:05:29,255-Speed 2614.36 samples/sec Loss 5.6444 LearningRate 0.0174 Epoch: 11 Global Step: 483780 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:05:33,159-Speed 2623.62 samples/sec Loss 5.6274 LearningRate 0.0174 Epoch: 11 Global Step: 483790 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:05:37,056-Speed 2627.85 samples/sec Loss 5.6185 LearningRate 0.0174 Epoch: 11 Global Step: 483800 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:05:40,957-Speed 2625.48 samples/sec Loss 5.4954 LearningRate 0.0174 Epoch: 11 Global Step: 483810 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:05:44,865-Speed 2621.73 samples/sec Loss 5.5423 LearningRate 0.0174 Epoch: 11 Global Step: 483820 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:05:48,757-Speed 2631.30 samples/sec Loss 5.6169 LearningRate 0.0174 Epoch: 11 Global Step: 483830 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:05:52,654-Speed 2629.16 samples/sec Loss 5.7493 LearningRate 0.0174 Epoch: 11 Global Step: 483840 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:05:56,550-Speed 2628.38 samples/sec Loss 5.6155 LearningRate 0.0174 Epoch: 11 Global Step: 483850 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:06:00,443-Speed 2631.25 samples/sec Loss 5.4960 LearningRate 0.0174 Epoch: 11 Global Step: 483860 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:06:04,364-Speed 2612.07 samples/sec Loss 5.5951 LearningRate 0.0174 Epoch: 11 Global Step: 483870 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:06:08,279-Speed 2615.98 samples/sec Loss 5.5427 LearningRate 0.0174 Epoch: 11 Global Step: 483880 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:06:12,184-Speed 2622.82 samples/sec Loss 5.6428 LearningRate 0.0174 Epoch: 11 Global Step: 483890 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:06:16,083-Speed 2627.91 samples/sec Loss 5.4852 LearningRate 0.0174 Epoch: 11 Global Step: 483900 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:06:19,979-Speed 2628.51 samples/sec Loss 5.4744 LearningRate 0.0174 Epoch: 11 Global Step: 483910 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:06:23,876-Speed 2628.58 samples/sec Loss 5.5750 LearningRate 0.0174 Epoch: 11 Global Step: 483920 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:06:27,778-Speed 2624.89 samples/sec Loss 5.5492 LearningRate 0.0174 Epoch: 11 Global Step: 483930 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:06:31,673-Speed 2630.03 samples/sec Loss 5.4794 LearningRate 0.0174 Epoch: 11 Global Step: 483940 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:06:35,571-Speed 2626.98 samples/sec Loss 5.6613 LearningRate 0.0174 Epoch: 11 Global Step: 483950 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:06:39,462-Speed 2632.48 samples/sec Loss 5.5825 LearningRate 0.0174 Epoch: 11 Global Step: 483960 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:06:43,358-Speed 2628.65 samples/sec Loss 5.4497 LearningRate 0.0174 Epoch: 11 Global Step: 483970 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:06:47,259-Speed 2625.79 samples/sec Loss 5.5463 LearningRate 0.0174 Epoch: 11 Global Step: 483980 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:06:51,157-Speed 2627.60 samples/sec Loss 5.6814 LearningRate 0.0174 Epoch: 11 Global Step: 483990 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:06:55,055-Speed 2627.87 samples/sec Loss 5.6038 LearningRate 0.0174 Epoch: 11 Global Step: 484000 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:06:58,967-Speed 2618.60 samples/sec Loss 5.6120 LearningRate 0.0174 Epoch: 11 Global Step: 484010 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:07:02,862-Speed 2629.46 samples/sec Loss 5.7320 LearningRate 0.0174 Epoch: 11 Global Step: 484020 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:07:06,770-Speed 2620.61 samples/sec Loss 5.7210 LearningRate 0.0174 Epoch: 11 Global Step: 484030 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:07:10,714-Speed 2597.08 samples/sec Loss 5.5967 LearningRate 0.0173 Epoch: 11 Global Step: 484040 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:07:14,611-Speed 2628.76 samples/sec Loss 5.5529 LearningRate 0.0173 Epoch: 11 Global Step: 484050 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:07:18,504-Speed 2631.02 samples/sec Loss 5.4859 LearningRate 0.0173 Epoch: 11 Global Step: 484060 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:07:22,401-Speed 2628.04 samples/sec Loss 5.5303 LearningRate 0.0173 Epoch: 11 Global Step: 484070 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:07:26,303-Speed 2625.67 samples/sec Loss 5.5457 LearningRate 0.0173 Epoch: 11 Global Step: 484080 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:07:30,209-Speed 2621.88 samples/sec Loss 5.6506 LearningRate 0.0173 Epoch: 11 Global Step: 484090 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:07:34,114-Speed 2622.99 samples/sec Loss 5.6073 LearningRate 0.0173 Epoch: 11 Global Step: 484100 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:07:38,029-Speed 2616.03 samples/sec Loss 5.5810 LearningRate 0.0173 Epoch: 11 Global Step: 484110 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:07:42,154-Speed 2483.38 samples/sec Loss 5.5907 LearningRate 0.0173 Epoch: 11 Global Step: 484120 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:07:46,052-Speed 2628.10 samples/sec Loss 5.5447 LearningRate 0.0173 Epoch: 11 Global Step: 484130 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:07:49,956-Speed 2623.33 samples/sec Loss 5.5845 LearningRate 0.0173 Epoch: 11 Global Step: 484140 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:07:53,858-Speed 2624.96 samples/sec Loss 5.5408 LearningRate 0.0173 Epoch: 11 Global Step: 484150 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:07:57,738-Speed 2639.79 samples/sec Loss 5.6334 LearningRate 0.0173 Epoch: 11 Global Step: 484160 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:08:01,639-Speed 2625.37 samples/sec Loss 5.5570 LearningRate 0.0173 Epoch: 11 Global Step: 484170 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:08:05,552-Speed 2617.43 samples/sec Loss 5.5789 LearningRate 0.0173 Epoch: 11 Global Step: 484180 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:08:09,464-Speed 2618.45 samples/sec Loss 5.5646 LearningRate 0.0173 Epoch: 11 Global Step: 484190 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:08:13,364-Speed 2626.32 samples/sec Loss 5.6423 LearningRate 0.0173 Epoch: 11 Global Step: 484200 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:08:17,265-Speed 2625.79 samples/sec Loss 5.7169 LearningRate 0.0173 Epoch: 11 Global Step: 484210 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:08:21,169-Speed 2623.71 samples/sec Loss 5.6165 LearningRate 0.0173 Epoch: 11 Global Step: 484220 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:08:25,071-Speed 2624.89 samples/sec Loss 5.5100 LearningRate 0.0173 Epoch: 11 Global Step: 484230 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:08:28,966-Speed 2630.02 samples/sec Loss 5.6040 LearningRate 0.0173 Epoch: 11 Global Step: 484240 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:08:32,859-Speed 2630.94 samples/sec Loss 5.5260 LearningRate 0.0173 Epoch: 11 Global Step: 484250 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:08:36,761-Speed 2624.79 samples/sec Loss 5.6173 LearningRate 0.0173 Epoch: 11 Global Step: 484260 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:08:40,689-Speed 2607.80 samples/sec Loss 5.5700 LearningRate 0.0173 Epoch: 11 Global Step: 484270 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:08:44,593-Speed 2623.76 samples/sec Loss 5.5973 LearningRate 0.0173 Epoch: 11 Global Step: 484280 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:08:48,488-Speed 2630.33 samples/sec Loss 5.5044 LearningRate 0.0173 Epoch: 11 Global Step: 484290 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:08:52,383-Speed 2629.29 samples/sec Loss 5.5686 LearningRate 0.0173 Epoch: 11 Global Step: 484300 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:08:56,278-Speed 2629.90 samples/sec Loss 5.5380 LearningRate 0.0173 Epoch: 11 Global Step: 484310 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:09:00,219-Speed 2598.87 samples/sec Loss 5.5194 LearningRate 0.0173 Epoch: 11 Global Step: 484320 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:09:04,130-Speed 2618.72 samples/sec Loss 5.5378 LearningRate 0.0173 Epoch: 11 Global Step: 484330 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:09:08,027-Speed 2628.41 samples/sec Loss 5.5602 LearningRate 0.0173 Epoch: 11 Global Step: 484340 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:09:11,931-Speed 2629.50 samples/sec Loss 5.5661 LearningRate 0.0173 Epoch: 11 Global Step: 484350 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:09:15,831-Speed 2626.72 samples/sec Loss 5.5790 LearningRate 0.0173 Epoch: 11 Global Step: 484360 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:09:19,738-Speed 2621.43 samples/sec Loss 5.5998 LearningRate 0.0173 Epoch: 11 Global Step: 484370 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:09:23,636-Speed 2627.92 samples/sec Loss 5.6468 LearningRate 0.0173 Epoch: 11 Global Step: 484380 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:09:27,513-Speed 2641.36 samples/sec Loss 5.5545 LearningRate 0.0173 Epoch: 11 Global Step: 484390 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:09:31,412-Speed 2627.18 samples/sec Loss 5.5038 LearningRate 0.0173 Epoch: 11 Global Step: 484400 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:09:35,308-Speed 2629.34 samples/sec Loss 5.6070 LearningRate 0.0173 Epoch: 11 Global Step: 484410 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:09:39,204-Speed 2628.70 samples/sec Loss 5.5931 LearningRate 0.0173 Epoch: 11 Global Step: 484420 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:09:43,115-Speed 2618.44 samples/sec Loss 5.6486 LearningRate 0.0173 Epoch: 11 Global Step: 484430 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:09:47,020-Speed 2623.79 samples/sec Loss 5.5438 LearningRate 0.0173 Epoch: 11 Global Step: 484440 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:09:50,953-Speed 2603.83 samples/sec Loss 5.6197 LearningRate 0.0173 Epoch: 11 Global Step: 484450 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:09:54,852-Speed 2627.15 samples/sec Loss 5.6527 LearningRate 0.0173 Epoch: 11 Global Step: 484460 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:09:58,893-Speed 2534.81 samples/sec Loss 5.5394 LearningRate 0.0173 Epoch: 11 Global Step: 484470 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:10:02,791-Speed 2628.03 samples/sec Loss 5.5200 LearningRate 0.0173 Epoch: 11 Global Step: 484480 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:10:06,691-Speed 2626.23 samples/sec Loss 5.5376 LearningRate 0.0173 Epoch: 11 Global Step: 484490 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:10:10,590-Speed 2626.45 samples/sec Loss 5.4694 LearningRate 0.0173 Epoch: 11 Global Step: 484500 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:10:14,489-Speed 2626.84 samples/sec Loss 5.5773 LearningRate 0.0173 Epoch: 11 Global Step: 484510 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:10:18,386-Speed 2628.31 samples/sec Loss 5.5420 LearningRate 0.0173 Epoch: 11 Global Step: 484520 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:10:22,298-Speed 2619.22 samples/sec Loss 5.5029 LearningRate 0.0173 Epoch: 11 Global Step: 484530 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:10:26,199-Speed 2625.33 samples/sec Loss 5.5248 LearningRate 0.0173 Epoch: 11 Global Step: 484540 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:10:30,134-Speed 2603.35 samples/sec Loss 5.5133 LearningRate 0.0173 Epoch: 11 Global Step: 484550 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:10:34,058-Speed 2610.15 samples/sec Loss 5.6150 LearningRate 0.0173 Epoch: 11 Global Step: 484560 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:10:37,954-Speed 2629.03 samples/sec Loss 5.4175 LearningRate 0.0173 Epoch: 11 Global Step: 484570 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:10:41,848-Speed 2629.75 samples/sec Loss 5.6016 LearningRate 0.0173 Epoch: 11 Global Step: 484580 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:10:45,727-Speed 2640.72 samples/sec Loss 5.5251 LearningRate 0.0173 Epoch: 11 Global Step: 484590 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:10:49,625-Speed 2627.69 samples/sec Loss 5.5925 LearningRate 0.0173 Epoch: 11 Global Step: 484600 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:10:53,526-Speed 2625.36 samples/sec Loss 5.5628 LearningRate 0.0173 Epoch: 11 Global Step: 484610 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:10:57,469-Speed 2597.79 samples/sec Loss 5.4843 LearningRate 0.0173 Epoch: 11 Global Step: 484620 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:11:01,609-Speed 2474.57 samples/sec Loss 5.5153 LearningRate 0.0173 Epoch: 11 Global Step: 484630 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:11:05,551-Speed 2598.13 samples/sec Loss 5.6228 LearningRate 0.0173 Epoch: 11 Global Step: 484640 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:11:09,415-Speed 2650.50 samples/sec Loss 5.5646 LearningRate 0.0173 Epoch: 11 Global Step: 484650 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:11:13,314-Speed 2627.12 samples/sec Loss 5.5618 LearningRate 0.0173 Epoch: 11 Global Step: 484660 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:11:17,224-Speed 2619.69 samples/sec Loss 5.6345 LearningRate 0.0173 Epoch: 11 Global Step: 484670 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:11:21,124-Speed 2625.97 samples/sec Loss 5.5461 LearningRate 0.0173 Epoch: 11 Global Step: 484680 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:11:25,019-Speed 2629.93 samples/sec Loss 5.4549 LearningRate 0.0173 Epoch: 11 Global Step: 484690 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:11:28,922-Speed 2624.31 samples/sec Loss 5.6048 LearningRate 0.0173 Epoch: 11 Global Step: 484700 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:11:32,822-Speed 2626.60 samples/sec Loss 5.4807 LearningRate 0.0173 Epoch: 11 Global Step: 484710 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:11:36,735-Speed 2617.12 samples/sec Loss 5.4505 LearningRate 0.0173 Epoch: 11 Global Step: 484720 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:11:40,631-Speed 2628.73 samples/sec Loss 5.6669 LearningRate 0.0173 Epoch: 11 Global Step: 484730 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:11:44,560-Speed 2606.78 samples/sec Loss 5.6289 LearningRate 0.0173 Epoch: 11 Global Step: 484740 Fp16 Grad Scale: 32768 Required: 39 hours
Training: 2022-04-15 02:11:48,452-Speed 2631.44 samples/sec Loss 5.5243 LearningRate 0.0173 Epoch: 11 Global Step: 484750 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:11:52,351-Speed 2627.36 samples/sec Loss 5.5556 LearningRate 0.0173 Epoch: 11 Global Step: 484760 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:11:56,255-Speed 2623.07 samples/sec Loss 5.4772 LearningRate 0.0173 Epoch: 11 Global Step: 484770 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:12:00,149-Speed 2630.41 samples/sec Loss 5.5806 LearningRate 0.0173 Epoch: 11 Global Step: 484780 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:12:04,057-Speed 2621.00 samples/sec Loss 5.5723 LearningRate 0.0173 Epoch: 11 Global Step: 484790 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:12:07,954-Speed 2628.16 samples/sec Loss 5.6440 LearningRate 0.0173 Epoch: 11 Global Step: 484800 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:12:11,847-Speed 2631.12 samples/sec Loss 5.5703 LearningRate 0.0173 Epoch: 11 Global Step: 484810 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:12:15,735-Speed 2634.67 samples/sec Loss 5.5570 LearningRate 0.0173 Epoch: 11 Global Step: 484820 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:12:19,628-Speed 2630.85 samples/sec Loss 5.5649 LearningRate 0.0173 Epoch: 11 Global Step: 484830 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:12:23,521-Speed 2630.76 samples/sec Loss 5.5556 LearningRate 0.0173 Epoch: 11 Global Step: 484840 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:12:27,426-Speed 2622.71 samples/sec Loss 5.5928 LearningRate 0.0173 Epoch: 11 Global Step: 484850 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:12:31,324-Speed 2628.00 samples/sec Loss 5.6091 LearningRate 0.0173 Epoch: 11 Global Step: 484860 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:12:35,243-Speed 2613.39 samples/sec Loss 5.5039 LearningRate 0.0173 Epoch: 11 Global Step: 484870 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:12:39,145-Speed 2624.52 samples/sec Loss 5.6023 LearningRate 0.0173 Epoch: 11 Global Step: 484880 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:12:43,045-Speed 2626.57 samples/sec Loss 5.5469 LearningRate 0.0173 Epoch: 11 Global Step: 484890 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:12:46,952-Speed 2621.67 samples/sec Loss 5.5597 LearningRate 0.0173 Epoch: 11 Global Step: 484900 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:12:50,854-Speed 2624.84 samples/sec Loss 5.5538 LearningRate 0.0173 Epoch: 11 Global Step: 484910 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:12:54,771-Speed 2615.30 samples/sec Loss 5.5272 LearningRate 0.0173 Epoch: 11 Global Step: 484920 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:12:58,670-Speed 2626.85 samples/sec Loss 5.6951 LearningRate 0.0173 Epoch: 11 Global Step: 484930 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:13:02,564-Speed 2630.05 samples/sec Loss 5.5028 LearningRate 0.0173 Epoch: 11 Global Step: 484940 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:13:06,469-Speed 2622.51 samples/sec Loss 5.5793 LearningRate 0.0173 Epoch: 11 Global Step: 484950 Fp16 Grad Scale: 262144 Required: 39 hours
Training: 2022-04-15 02:13:10,325-Speed 2656.35 samples/sec Loss 5.5395 LearningRate 0.0173 Epoch: 11 Global Step: 484960 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:13:14,231-Speed 2621.92 samples/sec Loss 5.4800 LearningRate 0.0173 Epoch: 11 Global Step: 484970 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:13:18,130-Speed 2627.45 samples/sec Loss 5.5478 LearningRate 0.0173 Epoch: 11 Global Step: 484980 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:13:22,030-Speed 2625.77 samples/sec Loss 5.5384 LearningRate 0.0173 Epoch: 11 Global Step: 484990 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:13:25,926-Speed 2629.31 samples/sec Loss 5.4965 LearningRate 0.0173 Epoch: 11 Global Step: 485000 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:13:29,827-Speed 2625.61 samples/sec Loss 5.5984 LearningRate 0.0173 Epoch: 11 Global Step: 485010 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:13:33,749-Speed 2611.43 samples/sec Loss 5.5167 LearningRate 0.0173 Epoch: 11 Global Step: 485020 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:13:37,664-Speed 2616.43 samples/sec Loss 5.6119 LearningRate 0.0172 Epoch: 11 Global Step: 485030 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:13:41,575-Speed 2619.14 samples/sec Loss 5.6071 LearningRate 0.0172 Epoch: 11 Global Step: 485040 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:13:45,481-Speed 2621.92 samples/sec Loss 5.6279 LearningRate 0.0172 Epoch: 11 Global Step: 485050 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:13:49,389-Speed 2621.15 samples/sec Loss 5.5803 LearningRate 0.0172 Epoch: 11 Global Step: 485060 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:13:53,297-Speed 2620.19 samples/sec Loss 5.4447 LearningRate 0.0172 Epoch: 11 Global Step: 485070 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:13:57,201-Speed 2623.93 samples/sec Loss 5.6030 LearningRate 0.0172 Epoch: 11 Global Step: 485080 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:14:01,098-Speed 2627.98 samples/sec Loss 5.6164 LearningRate 0.0172 Epoch: 11 Global Step: 485090 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:14:04,997-Speed 2627.37 samples/sec Loss 5.4277 LearningRate 0.0172 Epoch: 11 Global Step: 485100 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:14:08,887-Speed 2633.06 samples/sec Loss 5.4560 LearningRate 0.0172 Epoch: 11 Global Step: 485110 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:14:12,762-Speed 2642.76 samples/sec Loss 5.6300 LearningRate 0.0172 Epoch: 11 Global Step: 485120 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:14:16,668-Speed 2622.78 samples/sec Loss 5.5506 LearningRate 0.0172 Epoch: 11 Global Step: 485130 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:14:20,564-Speed 2628.49 samples/sec Loss 5.5314 LearningRate 0.0172 Epoch: 11 Global Step: 485140 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:14:24,462-Speed 2627.87 samples/sec Loss 5.5899 LearningRate 0.0172 Epoch: 11 Global Step: 485150 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:14:28,370-Speed 2620.67 samples/sec Loss 5.5797 LearningRate 0.0172 Epoch: 11 Global Step: 485160 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:14:32,270-Speed 2626.15 samples/sec Loss 5.4745 LearningRate 0.0172 Epoch: 11 Global Step: 485170 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:14:36,161-Speed 2632.17 samples/sec Loss 5.5717 LearningRate 0.0172 Epoch: 11 Global Step: 485180 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:14:40,055-Speed 2630.38 samples/sec Loss 5.5640 LearningRate 0.0172 Epoch: 11 Global Step: 485190 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:14:43,951-Speed 2629.36 samples/sec Loss 5.5557 LearningRate 0.0172 Epoch: 11 Global Step: 485200 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:14:47,855-Speed 2623.82 samples/sec Loss 5.5844 LearningRate 0.0172 Epoch: 11 Global Step: 485210 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:14:51,755-Speed 2626.31 samples/sec Loss 5.7642 LearningRate 0.0172 Epoch: 11 Global Step: 485220 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:14:55,653-Speed 2627.13 samples/sec Loss 5.5463 LearningRate 0.0172 Epoch: 11 Global Step: 485230 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:14:59,569-Speed 2615.42 samples/sec Loss 5.4636 LearningRate 0.0172 Epoch: 11 Global Step: 485240 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:15:03,478-Speed 2620.50 samples/sec Loss 5.5322 LearningRate 0.0172 Epoch: 11 Global Step: 485250 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:15:07,374-Speed 2628.87 samples/sec Loss 5.4825 LearningRate 0.0172 Epoch: 11 Global Step: 485260 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:15:11,250-Speed 2642.26 samples/sec Loss 5.5435 LearningRate 0.0172 Epoch: 11 Global Step: 485270 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:15:15,149-Speed 2626.77 samples/sec Loss 5.5719 LearningRate 0.0172 Epoch: 11 Global Step: 485280 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:15:19,063-Speed 2616.56 samples/sec Loss 5.6025 LearningRate 0.0172 Epoch: 11 Global Step: 485290 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:15:22,970-Speed 2622.46 samples/sec Loss 5.5726 LearningRate 0.0172 Epoch: 11 Global Step: 485300 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:15:26,865-Speed 2629.68 samples/sec Loss 5.4342 LearningRate 0.0172 Epoch: 11 Global Step: 485310 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:15:30,779-Speed 2616.98 samples/sec Loss 5.5621 LearningRate 0.0172 Epoch: 11 Global Step: 485320 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:15:34,678-Speed 2626.66 samples/sec Loss 5.5445 LearningRate 0.0172 Epoch: 11 Global Step: 485330 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:15:38,575-Speed 2628.16 samples/sec Loss 5.5277 LearningRate 0.0172 Epoch: 11 Global Step: 485340 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:15:42,471-Speed 2628.87 samples/sec Loss 5.4481 LearningRate 0.0172 Epoch: 11 Global Step: 485350 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:15:46,366-Speed 2630.05 samples/sec Loss 5.5033 LearningRate 0.0172 Epoch: 11 Global Step: 485360 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:15:50,267-Speed 2625.01 samples/sec Loss 5.5433 LearningRate 0.0172 Epoch: 11 Global Step: 485370 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:15:54,145-Speed 2641.32 samples/sec Loss 5.5649 LearningRate 0.0172 Epoch: 11 Global Step: 485380 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:15:58,042-Speed 2628.18 samples/sec Loss 5.4401 LearningRate 0.0172 Epoch: 11 Global Step: 485390 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:16:01,937-Speed 2630.97 samples/sec Loss 5.5005 LearningRate 0.0172 Epoch: 11 Global Step: 485400 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:16:05,832-Speed 2629.86 samples/sec Loss 5.5367 LearningRate 0.0172 Epoch: 11 Global Step: 485410 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:16:09,727-Speed 2629.41 samples/sec Loss 5.5709 LearningRate 0.0172 Epoch: 11 Global Step: 485420 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:16:13,626-Speed 2626.63 samples/sec Loss 5.4845 LearningRate 0.0172 Epoch: 11 Global Step: 485430 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:16:17,519-Speed 2631.14 samples/sec Loss 5.6775 LearningRate 0.0172 Epoch: 11 Global Step: 485440 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:16:21,420-Speed 2625.39 samples/sec Loss 5.5071 LearningRate 0.0172 Epoch: 11 Global Step: 485450 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:16:25,328-Speed 2620.98 samples/sec Loss 5.5737 LearningRate 0.0172 Epoch: 11 Global Step: 485460 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:16:29,223-Speed 2629.72 samples/sec Loss 5.5306 LearningRate 0.0172 Epoch: 11 Global Step: 485470 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:16:33,118-Speed 2629.64 samples/sec Loss 5.5501 LearningRate 0.0172 Epoch: 11 Global Step: 485480 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:16:37,015-Speed 2628.24 samples/sec Loss 5.6289 LearningRate 0.0172 Epoch: 11 Global Step: 485490 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:16:40,910-Speed 2629.64 samples/sec Loss 5.5087 LearningRate 0.0172 Epoch: 11 Global Step: 485500 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:16:44,807-Speed 2628.39 samples/sec Loss 5.5517 LearningRate 0.0172 Epoch: 11 Global Step: 485510 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:16:48,818-Speed 2553.50 samples/sec Loss 5.5633 LearningRate 0.0172 Epoch: 11 Global Step: 485520 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:16:52,693-Speed 2643.85 samples/sec Loss 5.5406 LearningRate 0.0172 Epoch: 11 Global Step: 485530 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:16:56,589-Speed 2628.81 samples/sec Loss 5.5790 LearningRate 0.0172 Epoch: 11 Global Step: 485540 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:17:00,484-Speed 2629.49 samples/sec Loss 5.5254 LearningRate 0.0172 Epoch: 11 Global Step: 485550 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:17:04,395-Speed 2619.06 samples/sec Loss 5.4895 LearningRate 0.0172 Epoch: 11 Global Step: 485560 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:17:08,304-Speed 2620.17 samples/sec Loss 5.4367 LearningRate 0.0172 Epoch: 11 Global Step: 485570 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:17:12,203-Speed 2626.67 samples/sec Loss 5.6074 LearningRate 0.0172 Epoch: 11 Global Step: 485580 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:17:16,107-Speed 2623.77 samples/sec Loss 5.5740 LearningRate 0.0172 Epoch: 11 Global Step: 485590 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:17:20,017-Speed 2619.89 samples/sec Loss 5.5254 LearningRate 0.0172 Epoch: 11 Global Step: 485600 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:17:23,932-Speed 2616.17 samples/sec Loss 5.5022 LearningRate 0.0172 Epoch: 11 Global Step: 485610 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:17:27,874-Speed 2598.33 samples/sec Loss 5.5553 LearningRate 0.0172 Epoch: 11 Global Step: 485620 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:17:31,772-Speed 2627.33 samples/sec Loss 5.6919 LearningRate 0.0172 Epoch: 11 Global Step: 485630 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:17:35,671-Speed 2627.03 samples/sec Loss 5.5771 LearningRate 0.0172 Epoch: 11 Global Step: 485640 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:17:39,573-Speed 2624.44 samples/sec Loss 5.5645 LearningRate 0.0172 Epoch: 11 Global Step: 485650 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:17:43,477-Speed 2623.34 samples/sec Loss 5.5139 LearningRate 0.0172 Epoch: 11 Global Step: 485660 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:17:47,376-Speed 2627.14 samples/sec Loss 5.5155 LearningRate 0.0172 Epoch: 11 Global Step: 485670 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:17:51,250-Speed 2644.24 samples/sec Loss 5.5939 LearningRate 0.0172 Epoch: 11 Global Step: 485680 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:17:55,146-Speed 2629.35 samples/sec Loss 5.6066 LearningRate 0.0172 Epoch: 11 Global Step: 485690 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:17:59,042-Speed 2628.78 samples/sec Loss 5.4145 LearningRate 0.0172 Epoch: 11 Global Step: 485700 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:18:02,936-Speed 2630.65 samples/sec Loss 5.5495 LearningRate 0.0172 Epoch: 11 Global Step: 485710 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:18:06,836-Speed 2625.86 samples/sec Loss 5.5310 LearningRate 0.0172 Epoch: 11 Global Step: 485720 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:18:10,737-Speed 2625.46 samples/sec Loss 5.6260 LearningRate 0.0172 Epoch: 11 Global Step: 485730 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:18:14,642-Speed 2622.95 samples/sec Loss 5.4652 LearningRate 0.0172 Epoch: 11 Global Step: 485740 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:18:18,554-Speed 2618.09 samples/sec Loss 5.5761 LearningRate 0.0172 Epoch: 11 Global Step: 485750 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:18:22,454-Speed 2626.40 samples/sec Loss 5.4125 LearningRate 0.0172 Epoch: 11 Global Step: 485760 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:18:26,352-Speed 2627.28 samples/sec Loss 5.5658 LearningRate 0.0172 Epoch: 11 Global Step: 485770 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:18:30,263-Speed 2618.98 samples/sec Loss 5.5287 LearningRate 0.0172 Epoch: 11 Global Step: 485780 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:18:34,157-Speed 2630.38 samples/sec Loss 5.5668 LearningRate 0.0172 Epoch: 11 Global Step: 485790 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:18:38,051-Speed 2630.30 samples/sec Loss 5.5053 LearningRate 0.0172 Epoch: 11 Global Step: 485800 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:18:41,952-Speed 2625.80 samples/sec Loss 5.6139 LearningRate 0.0172 Epoch: 11 Global Step: 485810 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:18:45,846-Speed 2630.46 samples/sec Loss 5.4958 LearningRate 0.0172 Epoch: 11 Global Step: 485820 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:18:49,744-Speed 2626.97 samples/sec Loss 5.5856 LearningRate 0.0172 Epoch: 11 Global Step: 485830 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:18:53,644-Speed 2626.86 samples/sec Loss 5.6357 LearningRate 0.0172 Epoch: 11 Global Step: 485840 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:18:57,526-Speed 2638.19 samples/sec Loss 5.5030 LearningRate 0.0172 Epoch: 11 Global Step: 485850 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:19:01,423-Speed 2628.42 samples/sec Loss 5.6171 LearningRate 0.0172 Epoch: 11 Global Step: 485860 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:19:05,319-Speed 2628.52 samples/sec Loss 5.6163 LearningRate 0.0172 Epoch: 11 Global Step: 485870 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:19:09,222-Speed 2624.29 samples/sec Loss 5.5741 LearningRate 0.0172 Epoch: 11 Global Step: 485880 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:19:13,195-Speed 2578.15 samples/sec Loss 5.3772 LearningRate 0.0172 Epoch: 11 Global Step: 485890 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:19:17,129-Speed 2603.89 samples/sec Loss 5.5592 LearningRate 0.0172 Epoch: 11 Global Step: 485900 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:19:21,073-Speed 2597.08 samples/sec Loss 5.6095 LearningRate 0.0172 Epoch: 11 Global Step: 485910 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:19:24,986-Speed 2620.65 samples/sec Loss 5.5355 LearningRate 0.0172 Epoch: 11 Global Step: 485920 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:19:28,878-Speed 2631.13 samples/sec Loss 5.6329 LearningRate 0.0172 Epoch: 11 Global Step: 485930 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:19:32,774-Speed 2628.93 samples/sec Loss 5.4929 LearningRate 0.0172 Epoch: 11 Global Step: 485940 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:19:36,651-Speed 2641.83 samples/sec Loss 5.5111 LearningRate 0.0172 Epoch: 11 Global Step: 485950 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:19:40,542-Speed 2632.84 samples/sec Loss 5.6122 LearningRate 0.0172 Epoch: 11 Global Step: 485960 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:19:44,436-Speed 2630.48 samples/sec Loss 5.5707 LearningRate 0.0172 Epoch: 11 Global Step: 485970 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:19:48,333-Speed 2628.51 samples/sec Loss 5.5487 LearningRate 0.0172 Epoch: 11 Global Step: 485980 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:19:52,246-Speed 2617.29 samples/sec Loss 5.4704 LearningRate 0.0172 Epoch: 11 Global Step: 485990 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:19:56,140-Speed 2630.22 samples/sec Loss 5.6343 LearningRate 0.0172 Epoch: 11 Global Step: 486000 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:20:00,038-Speed 2627.80 samples/sec Loss 5.4192 LearningRate 0.0172 Epoch: 11 Global Step: 486010 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:20:03,944-Speed 2621.72 samples/sec Loss 5.6220 LearningRate 0.0172 Epoch: 11 Global Step: 486020 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:20:07,873-Speed 2606.58 samples/sec Loss 5.5003 LearningRate 0.0171 Epoch: 11 Global Step: 486030 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:20:11,770-Speed 2628.94 samples/sec Loss 5.4878 LearningRate 0.0171 Epoch: 11 Global Step: 486040 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:20:15,666-Speed 2628.78 samples/sec Loss 5.6857 LearningRate 0.0171 Epoch: 11 Global Step: 486050 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:20:19,555-Speed 2633.57 samples/sec Loss 5.5865 LearningRate 0.0171 Epoch: 11 Global Step: 486060 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:20:23,456-Speed 2625.85 samples/sec Loss 5.6723 LearningRate 0.0171 Epoch: 11 Global Step: 486070 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:20:27,354-Speed 2627.90 samples/sec Loss 5.5648 LearningRate 0.0171 Epoch: 11 Global Step: 486080 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:20:31,300-Speed 2595.50 samples/sec Loss 5.5353 LearningRate 0.0171 Epoch: 11 Global Step: 486090 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:20:35,198-Speed 2627.28 samples/sec Loss 5.3913 LearningRate 0.0171 Epoch: 11 Global Step: 486100 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:20:39,095-Speed 2628.02 samples/sec Loss 5.6218 LearningRate 0.0171 Epoch: 11 Global Step: 486110 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:20:43,003-Speed 2621.24 samples/sec Loss 5.5101 LearningRate 0.0171 Epoch: 11 Global Step: 486120 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:20:46,901-Speed 2627.10 samples/sec Loss 5.5462 LearningRate 0.0171 Epoch: 11 Global Step: 486130 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:20:50,806-Speed 2623.22 samples/sec Loss 5.5725 LearningRate 0.0171 Epoch: 11 Global Step: 486140 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:20:54,711-Speed 2622.79 samples/sec Loss 5.5589 LearningRate 0.0171 Epoch: 11 Global Step: 486150 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:20:58,617-Speed 2622.61 samples/sec Loss 5.6098 LearningRate 0.0171 Epoch: 11 Global Step: 486160 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:21:02,612-Speed 2563.69 samples/sec Loss 5.5144 LearningRate 0.0171 Epoch: 11 Global Step: 486170 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:21:06,707-Speed 2501.03 samples/sec Loss 5.5735 LearningRate 0.0171 Epoch: 11 Global Step: 486180 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:21:10,807-Speed 2498.50 samples/sec Loss 5.6774 LearningRate 0.0171 Epoch: 11 Global Step: 486190 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:21:14,839-Speed 2540.28 samples/sec Loss 5.4937 LearningRate 0.0171 Epoch: 11 Global Step: 486200 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:21:18,738-Speed 2626.77 samples/sec Loss 5.5902 LearningRate 0.0171 Epoch: 11 Global Step: 486210 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:21:22,633-Speed 2629.85 samples/sec Loss 5.5538 LearningRate 0.0171 Epoch: 11 Global Step: 486220 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:21:26,540-Speed 2621.28 samples/sec Loss 5.4810 LearningRate 0.0171 Epoch: 11 Global Step: 486230 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:21:30,457-Speed 2614.67 samples/sec Loss 5.5785 LearningRate 0.0171 Epoch: 11 Global Step: 486240 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:21:34,360-Speed 2624.75 samples/sec Loss 5.4676 LearningRate 0.0171 Epoch: 11 Global Step: 486250 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:21:38,256-Speed 2628.79 samples/sec Loss 5.5159 LearningRate 0.0171 Epoch: 11 Global Step: 486260 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:21:42,154-Speed 2627.35 samples/sec Loss 5.4812 LearningRate 0.0171 Epoch: 11 Global Step: 486270 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:21:46,055-Speed 2625.69 samples/sec Loss 5.5254 LearningRate 0.0171 Epoch: 11 Global Step: 486280 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:21:49,951-Speed 2629.52 samples/sec Loss 5.6266 LearningRate 0.0171 Epoch: 11 Global Step: 486290 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:21:53,844-Speed 2630.51 samples/sec Loss 5.5650 LearningRate 0.0171 Epoch: 11 Global Step: 486300 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:21:57,739-Speed 2629.63 samples/sec Loss 5.4991 LearningRate 0.0171 Epoch: 11 Global Step: 486310 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:22:01,640-Speed 2625.45 samples/sec Loss 5.5860 LearningRate 0.0171 Epoch: 11 Global Step: 486320 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:22:05,536-Speed 2628.78 samples/sec Loss 5.5203 LearningRate 0.0171 Epoch: 11 Global Step: 486330 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:22:09,428-Speed 2631.58 samples/sec Loss 5.4496 LearningRate 0.0171 Epoch: 11 Global Step: 486340 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:22:13,303-Speed 2643.92 samples/sec Loss 5.4125 LearningRate 0.0171 Epoch: 11 Global Step: 486350 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:22:17,197-Speed 2630.15 samples/sec Loss 5.6098 LearningRate 0.0171 Epoch: 11 Global Step: 486360 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:22:21,102-Speed 2622.94 samples/sec Loss 5.5881 LearningRate 0.0171 Epoch: 11 Global Step: 486370 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:22:25,024-Speed 2611.58 samples/sec Loss 5.4256 LearningRate 0.0171 Epoch: 11 Global Step: 486380 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:22:28,924-Speed 2625.80 samples/sec Loss 5.6090 LearningRate 0.0171 Epoch: 11 Global Step: 486390 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:22:32,818-Speed 2630.37 samples/sec Loss 5.4878 LearningRate 0.0171 Epoch: 11 Global Step: 486400 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:22:36,718-Speed 2626.16 samples/sec Loss 5.5698 LearningRate 0.0171 Epoch: 11 Global Step: 486410 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:22:40,612-Speed 2630.60 samples/sec Loss 5.6038 LearningRate 0.0171 Epoch: 11 Global Step: 486420 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:22:44,513-Speed 2625.09 samples/sec Loss 5.4879 LearningRate 0.0171 Epoch: 11 Global Step: 486430 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:22:48,412-Speed 2627.04 samples/sec Loss 5.5105 LearningRate 0.0171 Epoch: 11 Global Step: 486440 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:22:52,313-Speed 2625.63 samples/sec Loss 5.5538 LearningRate 0.0171 Epoch: 11 Global Step: 486450 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:22:56,202-Speed 2633.91 samples/sec Loss 5.5520 LearningRate 0.0171 Epoch: 11 Global Step: 486460 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:23:00,095-Speed 2630.95 samples/sec Loss 5.4354 LearningRate 0.0171 Epoch: 11 Global Step: 486470 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:23:03,989-Speed 2631.22 samples/sec Loss 5.5321 LearningRate 0.0171 Epoch: 11 Global Step: 486480 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:23:07,894-Speed 2623.14 samples/sec Loss 5.5176 LearningRate 0.0171 Epoch: 11 Global Step: 486490 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:23:11,792-Speed 2627.00 samples/sec Loss 5.5945 LearningRate 0.0171 Epoch: 11 Global Step: 486500 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:23:15,691-Speed 2627.29 samples/sec Loss 5.5260 LearningRate 0.0171 Epoch: 11 Global Step: 486510 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:23:19,592-Speed 2624.98 samples/sec Loss 5.5639 LearningRate 0.0171 Epoch: 11 Global Step: 486520 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:23:23,487-Speed 2630.10 samples/sec Loss 5.5620 LearningRate 0.0171 Epoch: 11 Global Step: 486530 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:23:27,383-Speed 2629.27 samples/sec Loss 5.5888 LearningRate 0.0171 Epoch: 11 Global Step: 486540 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:23:31,255-Speed 2645.12 samples/sec Loss 5.6177 LearningRate 0.0171 Epoch: 11 Global Step: 486550 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:23:35,153-Speed 2627.40 samples/sec Loss 5.5467 LearningRate 0.0171 Epoch: 11 Global Step: 486560 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:23:39,041-Speed 2634.73 samples/sec Loss 5.5226 LearningRate 0.0171 Epoch: 11 Global Step: 486570 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:23:42,933-Speed 2631.44 samples/sec Loss 5.6282 LearningRate 0.0171 Epoch: 11 Global Step: 486580 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:23:46,831-Speed 2627.67 samples/sec Loss 5.6458 LearningRate 0.0171 Epoch: 11 Global Step: 486590 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:23:50,727-Speed 2628.79 samples/sec Loss 5.6592 LearningRate 0.0171 Epoch: 11 Global Step: 486600 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:23:54,676-Speed 2593.43 samples/sec Loss 5.5710 LearningRate 0.0171 Epoch: 11 Global Step: 486610 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:23:58,604-Speed 2607.28 samples/sec Loss 5.5870 LearningRate 0.0171 Epoch: 11 Global Step: 486620 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:24:02,508-Speed 2623.71 samples/sec Loss 5.5360 LearningRate 0.0171 Epoch: 11 Global Step: 486630 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:24:06,404-Speed 2629.47 samples/sec Loss 5.4740 LearningRate 0.0171 Epoch: 11 Global Step: 486640 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:24:10,304-Speed 2626.20 samples/sec Loss 5.4810 LearningRate 0.0171 Epoch: 11 Global Step: 486650 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:24:14,200-Speed 2628.77 samples/sec Loss 5.4633 LearningRate 0.0171 Epoch: 11 Global Step: 486660 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:24:18,104-Speed 2623.99 samples/sec Loss 5.4981 LearningRate 0.0171 Epoch: 11 Global Step: 486670 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:24:21,985-Speed 2638.59 samples/sec Loss 5.5280 LearningRate 0.0171 Epoch: 11 Global Step: 486680 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:24:25,882-Speed 2628.11 samples/sec Loss 5.5881 LearningRate 0.0171 Epoch: 11 Global Step: 486690 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:24:29,790-Speed 2621.06 samples/sec Loss 5.5202 LearningRate 0.0171 Epoch: 11 Global Step: 486700 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:24:33,694-Speed 2623.54 samples/sec Loss 5.5818 LearningRate 0.0171 Epoch: 11 Global Step: 486710 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:24:37,600-Speed 2622.39 samples/sec Loss 5.4898 LearningRate 0.0171 Epoch: 11 Global Step: 486720 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:24:41,499-Speed 2626.70 samples/sec Loss 5.4045 LearningRate 0.0171 Epoch: 11 Global Step: 486730 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:24:45,391-Speed 2631.91 samples/sec Loss 5.6020 LearningRate 0.0171 Epoch: 11 Global Step: 486740 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:24:49,284-Speed 2631.04 samples/sec Loss 5.6245 LearningRate 0.0171 Epoch: 11 Global Step: 486750 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:24:53,177-Speed 2630.95 samples/sec Loss 5.5633 LearningRate 0.0171 Epoch: 11 Global Step: 486760 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:24:57,094-Speed 2614.78 samples/sec Loss 5.6340 LearningRate 0.0171 Epoch: 11 Global Step: 486770 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:25:01,012-Speed 2614.22 samples/sec Loss 5.6080 LearningRate 0.0171 Epoch: 11 Global Step: 486780 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:25:04,932-Speed 2612.82 samples/sec Loss 5.4753 LearningRate 0.0171 Epoch: 11 Global Step: 486790 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:25:08,837-Speed 2622.84 samples/sec Loss 5.5135 LearningRate 0.0171 Epoch: 11 Global Step: 486800 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:25:12,744-Speed 2621.62 samples/sec Loss 5.5005 LearningRate 0.0171 Epoch: 11 Global Step: 486810 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:25:16,652-Speed 2621.03 samples/sec Loss 5.5991 LearningRate 0.0171 Epoch: 11 Global Step: 486820 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:25:20,550-Speed 2627.42 samples/sec Loss 5.5478 LearningRate 0.0171 Epoch: 11 Global Step: 486830 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:25:24,450-Speed 2626.47 samples/sec Loss 5.6358 LearningRate 0.0171 Epoch: 11 Global Step: 486840 Fp16 Grad Scale: 131072 Required: 39 hours
Training: 2022-04-15 02:25:28,329-Speed 2640.91 samples/sec Loss 5.4087 LearningRate 0.0171 Epoch: 11 Global Step: 486850 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:25:32,234-Speed 2622.62 samples/sec Loss 5.6123 LearningRate 0.0171 Epoch: 11 Global Step: 486860 Fp16 Grad Scale: 65536 Required: 39 hours
Training: 2022-04-15 02:25:36,125-Speed 2631.97 samples/sec Loss 5.4062 LearningRate 0.0171 Epoch: 11 Global Step: 486870 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:25:40,018-Speed 2631.28 samples/sec Loss 5.6409 LearningRate 0.0171 Epoch: 11 Global Step: 486880 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:25:43,912-Speed 2630.17 samples/sec Loss 5.5022 LearningRate 0.0171 Epoch: 11 Global Step: 486890 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:25:47,809-Speed 2628.57 samples/sec Loss 5.5584 LearningRate 0.0171 Epoch: 11 Global Step: 486900 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:25:51,707-Speed 2627.45 samples/sec Loss 5.5293 LearningRate 0.0171 Epoch: 11 Global Step: 486910 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:25:55,601-Speed 2630.77 samples/sec Loss 5.4781 LearningRate 0.0171 Epoch: 11 Global Step: 486920 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:25:59,504-Speed 2624.11 samples/sec Loss 5.5595 LearningRate 0.0171 Epoch: 11 Global Step: 486930 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:26:03,400-Speed 2628.69 samples/sec Loss 5.5181 LearningRate 0.0171 Epoch: 11 Global Step: 486940 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:26:07,298-Speed 2627.65 samples/sec Loss 5.5294 LearningRate 0.0171 Epoch: 11 Global Step: 486950 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:26:11,192-Speed 2630.21 samples/sec Loss 5.5517 LearningRate 0.0171 Epoch: 11 Global Step: 486960 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:26:15,090-Speed 2628.02 samples/sec Loss 5.5861 LearningRate 0.0171 Epoch: 11 Global Step: 486970 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:26:18,985-Speed 2629.28 samples/sec Loss 5.4338 LearningRate 0.0171 Epoch: 11 Global Step: 486980 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:26:22,873-Speed 2634.44 samples/sec Loss 5.5404 LearningRate 0.0171 Epoch: 11 Global Step: 486990 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:26:26,780-Speed 2622.36 samples/sec Loss 5.6567 LearningRate 0.0171 Epoch: 11 Global Step: 487000 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:26:30,685-Speed 2622.59 samples/sec Loss 5.4864 LearningRate 0.0171 Epoch: 11 Global Step: 487010 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:26:34,587-Speed 2624.56 samples/sec Loss 5.5651 LearningRate 0.0171 Epoch: 11 Global Step: 487020 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:26:38,477-Speed 2632.93 samples/sec Loss 5.5427 LearningRate 0.0171 Epoch: 11 Global Step: 487030 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:26:42,372-Speed 2630.34 samples/sec Loss 5.4804 LearningRate 0.0170 Epoch: 11 Global Step: 487040 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:26:46,273-Speed 2625.52 samples/sec Loss 5.4780 LearningRate 0.0170 Epoch: 11 Global Step: 487050 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:26:50,234-Speed 2585.95 samples/sec Loss 5.5564 LearningRate 0.0170 Epoch: 11 Global Step: 487060 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:26:54,154-Speed 2612.60 samples/sec Loss 5.4689 LearningRate 0.0170 Epoch: 11 Global Step: 487070 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:26:58,048-Speed 2630.80 samples/sec Loss 5.5169 LearningRate 0.0170 Epoch: 11 Global Step: 487080 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:27:01,943-Speed 2629.50 samples/sec Loss 5.5554 LearningRate 0.0170 Epoch: 11 Global Step: 487090 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:27:05,852-Speed 2620.58 samples/sec Loss 5.5361 LearningRate 0.0170 Epoch: 11 Global Step: 487100 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:27:09,754-Speed 2624.23 samples/sec Loss 5.5633 LearningRate 0.0170 Epoch: 11 Global Step: 487110 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:27:13,651-Speed 2628.62 samples/sec Loss 5.5657 LearningRate 0.0170 Epoch: 11 Global Step: 487120 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:27:17,552-Speed 2625.14 samples/sec Loss 5.5822 LearningRate 0.0170 Epoch: 11 Global Step: 487130 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:27:21,451-Speed 2628.00 samples/sec Loss 5.4692 LearningRate 0.0170 Epoch: 11 Global Step: 487140 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:27:25,360-Speed 2620.23 samples/sec Loss 5.4176 LearningRate 0.0170 Epoch: 11 Global Step: 487150 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:27:29,283-Speed 2610.76 samples/sec Loss 5.4754 LearningRate 0.0170 Epoch: 11 Global Step: 487160 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:27:33,211-Speed 2607.49 samples/sec Loss 5.5055 LearningRate 0.0170 Epoch: 11 Global Step: 487170 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:27:37,103-Speed 2631.15 samples/sec Loss 5.6573 LearningRate 0.0170 Epoch: 11 Global Step: 487180 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:27:40,997-Speed 2630.61 samples/sec Loss 5.5295 LearningRate 0.0170 Epoch: 11 Global Step: 487190 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:27:44,901-Speed 2623.39 samples/sec Loss 5.5526 LearningRate 0.0170 Epoch: 11 Global Step: 487200 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:27:48,800-Speed 2627.02 samples/sec Loss 5.4530 LearningRate 0.0170 Epoch: 11 Global Step: 487210 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:27:52,701-Speed 2625.35 samples/sec Loss 5.5713 LearningRate 0.0170 Epoch: 11 Global Step: 487220 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:27:56,604-Speed 2624.74 samples/sec Loss 5.5090 LearningRate 0.0170 Epoch: 11 Global Step: 487230 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:28:00,495-Speed 2631.86 samples/sec Loss 5.7270 LearningRate 0.0170 Epoch: 11 Global Step: 487240 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:28:04,388-Speed 2631.05 samples/sec Loss 5.4363 LearningRate 0.0170 Epoch: 11 Global Step: 487250 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:28:08,258-Speed 2646.50 samples/sec Loss 5.5467 LearningRate 0.0170 Epoch: 11 Global Step: 487260 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:28:12,152-Speed 2630.12 samples/sec Loss 5.5809 LearningRate 0.0170 Epoch: 11 Global Step: 487270 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:28:16,046-Speed 2630.63 samples/sec Loss 5.5768 LearningRate 0.0170 Epoch: 11 Global Step: 487280 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:28:19,952-Speed 2622.03 samples/sec Loss 5.4872 LearningRate 0.0170 Epoch: 11 Global Step: 487290 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:28:23,849-Speed 2627.96 samples/sec Loss 5.5382 LearningRate 0.0170 Epoch: 11 Global Step: 487300 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:28:27,756-Speed 2621.30 samples/sec Loss 5.6023 LearningRate 0.0170 Epoch: 11 Global Step: 487310 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:28:31,653-Speed 2629.33 samples/sec Loss 5.5336 LearningRate 0.0170 Epoch: 11 Global Step: 487320 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:28:35,550-Speed 2628.69 samples/sec Loss 5.4095 LearningRate 0.0170 Epoch: 11 Global Step: 487330 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:28:39,454-Speed 2623.71 samples/sec Loss 5.3789 LearningRate 0.0170 Epoch: 11 Global Step: 487340 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:28:43,353-Speed 2626.29 samples/sec Loss 5.4829 LearningRate 0.0170 Epoch: 11 Global Step: 487350 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:28:47,246-Speed 2631.41 samples/sec Loss 5.4747 LearningRate 0.0170 Epoch: 11 Global Step: 487360 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:28:51,150-Speed 2623.47 samples/sec Loss 5.5045 LearningRate 0.0170 Epoch: 11 Global Step: 487370 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:28:55,050-Speed 2626.78 samples/sec Loss 5.5284 LearningRate 0.0170 Epoch: 11 Global Step: 487380 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:28:58,948-Speed 2627.25 samples/sec Loss 5.4752 LearningRate 0.0170 Epoch: 11 Global Step: 487390 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:29:02,842-Speed 2630.70 samples/sec Loss 5.4788 LearningRate 0.0170 Epoch: 11 Global Step: 487400 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:29:06,743-Speed 2625.36 samples/sec Loss 5.5243 LearningRate 0.0170 Epoch: 11 Global Step: 487410 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:29:10,646-Speed 2624.25 samples/sec Loss 5.4249 LearningRate 0.0170 Epoch: 11 Global Step: 487420 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:29:14,544-Speed 2627.79 samples/sec Loss 5.4650 LearningRate 0.0170 Epoch: 11 Global Step: 487430 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:29:18,453-Speed 2620.34 samples/sec Loss 5.5719 LearningRate 0.0170 Epoch: 11 Global Step: 487440 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:29:22,349-Speed 2628.67 samples/sec Loss 5.5544 LearningRate 0.0170 Epoch: 11 Global Step: 487450 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:29:26,229-Speed 2639.88 samples/sec Loss 5.5789 LearningRate 0.0170 Epoch: 11 Global Step: 487460 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:29:30,124-Speed 2629.77 samples/sec Loss 5.5495 LearningRate 0.0170 Epoch: 11 Global Step: 487470 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:29:34,019-Speed 2631.64 samples/sec Loss 5.4173 LearningRate 0.0170 Epoch: 11 Global Step: 487480 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:29:37,926-Speed 2621.38 samples/sec Loss 5.5035 LearningRate 0.0170 Epoch: 11 Global Step: 487490 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:29:41,811-Speed 2636.04 samples/sec Loss 5.5787 LearningRate 0.0170 Epoch: 11 Global Step: 487500 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:29:45,714-Speed 2624.02 samples/sec Loss 5.4949 LearningRate 0.0170 Epoch: 11 Global Step: 487510 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:29:49,627-Speed 2618.17 samples/sec Loss 5.5651 LearningRate 0.0170 Epoch: 11 Global Step: 487520 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:29:53,528-Speed 2626.19 samples/sec Loss 5.5545 LearningRate 0.0170 Epoch: 11 Global Step: 487530 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:29:57,421-Speed 2630.30 samples/sec Loss 5.4822 LearningRate 0.0170 Epoch: 11 Global Step: 487540 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:30:01,324-Speed 2624.56 samples/sec Loss 5.4814 LearningRate 0.0170 Epoch: 11 Global Step: 487550 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:30:05,225-Speed 2625.54 samples/sec Loss 5.4530 LearningRate 0.0170 Epoch: 11 Global Step: 487560 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:30:09,129-Speed 2623.52 samples/sec Loss 5.4256 LearningRate 0.0170 Epoch: 11 Global Step: 487570 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:30:13,026-Speed 2628.24 samples/sec Loss 5.5013 LearningRate 0.0170 Epoch: 11 Global Step: 487580 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:30:16,932-Speed 2622.14 samples/sec Loss 5.3979 LearningRate 0.0170 Epoch: 11 Global Step: 487590 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:30:20,832-Speed 2626.14 samples/sec Loss 5.4478 LearningRate 0.0170 Epoch: 11 Global Step: 487600 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:30:24,731-Speed 2627.12 samples/sec Loss 5.5318 LearningRate 0.0170 Epoch: 11 Global Step: 487610 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:30:28,628-Speed 2628.34 samples/sec Loss 5.5108 LearningRate 0.0170 Epoch: 11 Global Step: 487620 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:30:32,526-Speed 2627.92 samples/sec Loss 5.4312 LearningRate 0.0170 Epoch: 11 Global Step: 487630 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:30:36,424-Speed 2627.31 samples/sec Loss 5.6507 LearningRate 0.0170 Epoch: 11 Global Step: 487640 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:30:40,331-Speed 2621.27 samples/sec Loss 5.3474 LearningRate 0.0170 Epoch: 11 Global Step: 487650 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:30:44,232-Speed 2626.21 samples/sec Loss 5.5119 LearningRate 0.0170 Epoch: 11 Global Step: 487660 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:30:48,127-Speed 2629.67 samples/sec Loss 5.4824 LearningRate 0.0170 Epoch: 11 Global Step: 487670 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:30:52,007-Speed 2642.56 samples/sec Loss 5.5479 LearningRate 0.0170 Epoch: 11 Global Step: 487680 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:30:55,902-Speed 2629.55 samples/sec Loss 5.5426 LearningRate 0.0170 Epoch: 11 Global Step: 487690 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:30:59,805-Speed 2624.20 samples/sec Loss 5.4909 LearningRate 0.0170 Epoch: 11 Global Step: 487700 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:31:03,702-Speed 2628.13 samples/sec Loss 5.4258 LearningRate 0.0170 Epoch: 11 Global Step: 487710 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:31:07,604-Speed 2625.36 samples/sec Loss 5.4747 LearningRate 0.0170 Epoch: 11 Global Step: 487720 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:31:11,505-Speed 2625.63 samples/sec Loss 5.4530 LearningRate 0.0170 Epoch: 11 Global Step: 487730 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:31:15,399-Speed 2629.76 samples/sec Loss 5.5132 LearningRate 0.0170 Epoch: 11 Global Step: 487740 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:31:19,299-Speed 2626.80 samples/sec Loss 5.5852 LearningRate 0.0170 Epoch: 11 Global Step: 487750 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:31:23,194-Speed 2629.58 samples/sec Loss 5.4814 LearningRate 0.0170 Epoch: 11 Global Step: 487760 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:31:27,088-Speed 2630.45 samples/sec Loss 5.4824 LearningRate 0.0170 Epoch: 11 Global Step: 487770 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:31:30,980-Speed 2631.31 samples/sec Loss 5.5818 LearningRate 0.0170 Epoch: 11 Global Step: 487780 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:31:34,874-Speed 2630.20 samples/sec Loss 5.4723 LearningRate 0.0170 Epoch: 11 Global Step: 487790 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:31:38,747-Speed 2644.37 samples/sec Loss 5.4888 LearningRate 0.0170 Epoch: 11 Global Step: 487800 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:31:42,654-Speed 2621.30 samples/sec Loss 5.4062 LearningRate 0.0170 Epoch: 11 Global Step: 487810 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:31:46,550-Speed 2629.13 samples/sec Loss 5.5550 LearningRate 0.0170 Epoch: 11 Global Step: 487820 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:31:50,455-Speed 2623.22 samples/sec Loss 5.5733 LearningRate 0.0170 Epoch: 11 Global Step: 487830 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:31:54,351-Speed 2629.15 samples/sec Loss 5.6516 LearningRate 0.0170 Epoch: 11 Global Step: 487840 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:31:58,249-Speed 2627.70 samples/sec Loss 5.6044 LearningRate 0.0170 Epoch: 11 Global Step: 487850 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:32:02,147-Speed 2627.49 samples/sec Loss 5.4892 LearningRate 0.0170 Epoch: 11 Global Step: 487860 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:32:06,046-Speed 2627.09 samples/sec Loss 5.5790 LearningRate 0.0170 Epoch: 11 Global Step: 487870 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:32:09,952-Speed 2621.46 samples/sec Loss 5.5382 LearningRate 0.0170 Epoch: 11 Global Step: 487880 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:32:13,859-Speed 2621.89 samples/sec Loss 5.4898 LearningRate 0.0170 Epoch: 11 Global Step: 487890 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:32:17,756-Speed 2628.20 samples/sec Loss 5.4102 LearningRate 0.0170 Epoch: 11 Global Step: 487900 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:32:21,657-Speed 2625.97 samples/sec Loss 5.4854 LearningRate 0.0170 Epoch: 11 Global Step: 487910 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:32:25,557-Speed 2626.52 samples/sec Loss 5.5722 LearningRate 0.0170 Epoch: 11 Global Step: 487920 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:32:29,435-Speed 2641.14 samples/sec Loss 5.5067 LearningRate 0.0170 Epoch: 11 Global Step: 487930 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:32:33,340-Speed 2622.94 samples/sec Loss 5.4239 LearningRate 0.0170 Epoch: 11 Global Step: 487940 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:32:37,240-Speed 2625.91 samples/sec Loss 5.4636 LearningRate 0.0170 Epoch: 11 Global Step: 487950 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:32:41,138-Speed 2627.55 samples/sec Loss 5.6092 LearningRate 0.0170 Epoch: 11 Global Step: 487960 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:32:45,042-Speed 2624.53 samples/sec Loss 5.5859 LearningRate 0.0170 Epoch: 11 Global Step: 487970 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:32:48,945-Speed 2624.53 samples/sec Loss 5.6080 LearningRate 0.0170 Epoch: 11 Global Step: 487980 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:32:52,848-Speed 2624.16 samples/sec Loss 5.4282 LearningRate 0.0170 Epoch: 11 Global Step: 487990 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:32:56,756-Speed 2621.05 samples/sec Loss 5.4920 LearningRate 0.0170 Epoch: 11 Global Step: 488000 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:33:00,662-Speed 2622.10 samples/sec Loss 5.4411 LearningRate 0.0170 Epoch: 11 Global Step: 488010 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:33:04,567-Speed 2623.19 samples/sec Loss 5.4987 LearningRate 0.0170 Epoch: 11 Global Step: 488020 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:33:08,463-Speed 2628.40 samples/sec Loss 5.5162 LearningRate 0.0170 Epoch: 11 Global Step: 488030 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:33:12,357-Speed 2631.29 samples/sec Loss 5.4638 LearningRate 0.0169 Epoch: 11 Global Step: 488040 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:33:16,255-Speed 2627.45 samples/sec Loss 5.4510 LearningRate 0.0169 Epoch: 11 Global Step: 488050 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:33:20,156-Speed 2626.16 samples/sec Loss 5.6024 LearningRate 0.0169 Epoch: 11 Global Step: 488060 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:33:24,053-Speed 2628.19 samples/sec Loss 5.5363 LearningRate 0.0169 Epoch: 11 Global Step: 488070 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:33:27,953-Speed 2626.15 samples/sec Loss 5.4573 LearningRate 0.0169 Epoch: 11 Global Step: 488080 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:33:31,854-Speed 2625.17 samples/sec Loss 5.5814 LearningRate 0.0169 Epoch: 11 Global Step: 488090 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:33:35,747-Speed 2631.24 samples/sec Loss 5.4760 LearningRate 0.0169 Epoch: 11 Global Step: 488100 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:33:39,640-Speed 2631.01 samples/sec Loss 5.4394 LearningRate 0.0169 Epoch: 11 Global Step: 488110 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:33:43,542-Speed 2625.11 samples/sec Loss 5.5530 LearningRate 0.0169 Epoch: 11 Global Step: 488120 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:33:47,415-Speed 2644.39 samples/sec Loss 5.4834 LearningRate 0.0169 Epoch: 11 Global Step: 488130 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:33:51,309-Speed 2630.83 samples/sec Loss 5.4917 LearningRate 0.0169 Epoch: 11 Global Step: 488140 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:33:55,203-Speed 2629.67 samples/sec Loss 5.4726 LearningRate 0.0169 Epoch: 11 Global Step: 488150 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:33:59,099-Speed 2629.15 samples/sec Loss 5.5148 LearningRate 0.0169 Epoch: 11 Global Step: 488160 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:34:02,986-Speed 2635.44 samples/sec Loss 5.5083 LearningRate 0.0169 Epoch: 11 Global Step: 488170 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:34:06,888-Speed 2624.66 samples/sec Loss 5.4502 LearningRate 0.0169 Epoch: 11 Global Step: 488180 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:34:10,784-Speed 2628.32 samples/sec Loss 5.4838 LearningRate 0.0169 Epoch: 11 Global Step: 488190 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:34:14,682-Speed 2628.11 samples/sec Loss 5.4425 LearningRate 0.0169 Epoch: 11 Global Step: 488200 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:34:18,582-Speed 2626.48 samples/sec Loss 5.5892 LearningRate 0.0169 Epoch: 11 Global Step: 488210 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:34:22,478-Speed 2629.03 samples/sec Loss 5.4633 LearningRate 0.0169 Epoch: 11 Global Step: 488220 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:34:26,376-Speed 2628.18 samples/sec Loss 5.5943 LearningRate 0.0169 Epoch: 11 Global Step: 488230 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:34:30,270-Speed 2629.84 samples/sec Loss 5.5583 LearningRate 0.0169 Epoch: 11 Global Step: 488240 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:34:34,183-Speed 2617.68 samples/sec Loss 5.5218 LearningRate 0.0169 Epoch: 11 Global Step: 488250 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:34:38,080-Speed 2628.36 samples/sec Loss 5.4728 LearningRate 0.0169 Epoch: 11 Global Step: 488260 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:34:41,977-Speed 2628.04 samples/sec Loss 5.5828 LearningRate 0.0169 Epoch: 11 Global Step: 488270 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:34:45,875-Speed 2627.43 samples/sec Loss 5.4623 LearningRate 0.0169 Epoch: 11 Global Step: 488280 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:34:49,774-Speed 2627.95 samples/sec Loss 5.4645 LearningRate 0.0169 Epoch: 11 Global Step: 488290 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:34:53,671-Speed 2628.47 samples/sec Loss 5.5053 LearningRate 0.0169 Epoch: 11 Global Step: 488300 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:34:57,565-Speed 2630.12 samples/sec Loss 5.5708 LearningRate 0.0169 Epoch: 11 Global Step: 488310 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:35:01,459-Speed 2630.82 samples/sec Loss 5.5114 LearningRate 0.0169 Epoch: 11 Global Step: 488320 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:35:05,374-Speed 2616.08 samples/sec Loss 5.5576 LearningRate 0.0169 Epoch: 11 Global Step: 488330 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:35:09,270-Speed 2628.39 samples/sec Loss 5.5036 LearningRate 0.0169 Epoch: 11 Global Step: 488340 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:35:13,151-Speed 2638.87 samples/sec Loss 5.5395 LearningRate 0.0169 Epoch: 11 Global Step: 488350 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:35:17,049-Speed 2628.26 samples/sec Loss 5.5235 LearningRate 0.0169 Epoch: 11 Global Step: 488360 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:35:20,945-Speed 2628.82 samples/sec Loss 5.5231 LearningRate 0.0169 Epoch: 11 Global Step: 488370 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:35:24,850-Speed 2623.61 samples/sec Loss 5.5069 LearningRate 0.0169 Epoch: 11 Global Step: 488380 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:35:28,746-Speed 2628.87 samples/sec Loss 5.5629 LearningRate 0.0169 Epoch: 11 Global Step: 488390 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:35:32,644-Speed 2628.18 samples/sec Loss 5.4769 LearningRate 0.0169 Epoch: 11 Global Step: 488400 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:35:36,569-Speed 2609.26 samples/sec Loss 5.5376 LearningRate 0.0169 Epoch: 11 Global Step: 488410 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:35:40,464-Speed 2629.94 samples/sec Loss 5.4597 LearningRate 0.0169 Epoch: 11 Global Step: 488420 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:35:44,357-Speed 2630.43 samples/sec Loss 5.4975 LearningRate 0.0169 Epoch: 11 Global Step: 488430 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:35:48,255-Speed 2628.56 samples/sec Loss 5.5251 LearningRate 0.0169 Epoch: 11 Global Step: 488440 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:35:52,173-Speed 2614.15 samples/sec Loss 5.5499 LearningRate 0.0169 Epoch: 11 Global Step: 488450 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:35:56,084-Speed 2619.26 samples/sec Loss 5.5377 LearningRate 0.0169 Epoch: 11 Global Step: 488460 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:35:59,992-Speed 2620.74 samples/sec Loss 5.4976 LearningRate 0.0169 Epoch: 11 Global Step: 488470 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:36:03,887-Speed 2630.44 samples/sec Loss 5.5198 LearningRate 0.0169 Epoch: 11 Global Step: 488480 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:36:07,780-Speed 2630.53 samples/sec Loss 5.5034 LearningRate 0.0169 Epoch: 11 Global Step: 488490 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:36:11,675-Speed 2629.24 samples/sec Loss 5.4421 LearningRate 0.0169 Epoch: 11 Global Step: 488500 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:36:15,577-Speed 2624.98 samples/sec Loss 5.6153 LearningRate 0.0169 Epoch: 11 Global Step: 488510 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:36:19,480-Speed 2624.50 samples/sec Loss 5.5905 LearningRate 0.0169 Epoch: 11 Global Step: 488520 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:36:23,375-Speed 2629.46 samples/sec Loss 5.5419 LearningRate 0.0169 Epoch: 11 Global Step: 488530 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:36:27,279-Speed 2631.10 samples/sec Loss 5.4632 LearningRate 0.0169 Epoch: 11 Global Step: 488540 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:36:31,155-Speed 2642.63 samples/sec Loss 5.5732 LearningRate 0.0169 Epoch: 11 Global Step: 488550 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:36:35,055-Speed 2628.62 samples/sec Loss 5.4974 LearningRate 0.0169 Epoch: 11 Global Step: 488560 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:36:38,949-Speed 2630.42 samples/sec Loss 5.5153 LearningRate 0.0169 Epoch: 11 Global Step: 488570 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:36:42,845-Speed 2628.68 samples/sec Loss 5.4507 LearningRate 0.0169 Epoch: 11 Global Step: 488580 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:36:46,742-Speed 2628.36 samples/sec Loss 5.5034 LearningRate 0.0169 Epoch: 11 Global Step: 488590 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:36:50,633-Speed 2632.10 samples/sec Loss 5.5048 LearningRate 0.0169 Epoch: 11 Global Step: 488600 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:36:54,540-Speed 2622.52 samples/sec Loss 5.4537 LearningRate 0.0169 Epoch: 11 Global Step: 488610 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:36:58,433-Speed 2630.86 samples/sec Loss 5.5768 LearningRate 0.0169 Epoch: 11 Global Step: 488620 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:37:02,330-Speed 2628.09 samples/sec Loss 5.5047 LearningRate 0.0169 Epoch: 11 Global Step: 488630 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:37:06,225-Speed 2629.31 samples/sec Loss 5.5563 LearningRate 0.0169 Epoch: 11 Global Step: 488640 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:37:10,118-Speed 2631.65 samples/sec Loss 5.5145 LearningRate 0.0169 Epoch: 11 Global Step: 488650 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:37:14,013-Speed 2630.03 samples/sec Loss 5.4363 LearningRate 0.0169 Epoch: 11 Global Step: 488660 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:37:17,910-Speed 2627.93 samples/sec Loss 5.4772 LearningRate 0.0169 Epoch: 11 Global Step: 488670 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:37:21,786-Speed 2642.95 samples/sec Loss 5.5731 LearningRate 0.0169 Epoch: 11 Global Step: 488680 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:37:25,693-Speed 2621.44 samples/sec Loss 5.5382 LearningRate 0.0169 Epoch: 11 Global Step: 488690 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:37:29,588-Speed 2630.14 samples/sec Loss 5.4993 LearningRate 0.0169 Epoch: 11 Global Step: 488700 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:37:33,491-Speed 2623.85 samples/sec Loss 5.4832 LearningRate 0.0169 Epoch: 11 Global Step: 488710 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:37:37,390-Speed 2627.02 samples/sec Loss 5.5377 LearningRate 0.0169 Epoch: 11 Global Step: 488720 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:37:41,300-Speed 2619.64 samples/sec Loss 5.5396 LearningRate 0.0169 Epoch: 11 Global Step: 488730 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:37:45,200-Speed 2626.68 samples/sec Loss 5.4572 LearningRate 0.0169 Epoch: 11 Global Step: 488740 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:37:49,107-Speed 2621.60 samples/sec Loss 5.5600 LearningRate 0.0169 Epoch: 11 Global Step: 488750 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:37:53,021-Speed 2616.34 samples/sec Loss 5.5058 LearningRate 0.0169 Epoch: 11 Global Step: 488760 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:37:56,916-Speed 2630.48 samples/sec Loss 5.4999 LearningRate 0.0169 Epoch: 11 Global Step: 488770 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:38:00,812-Speed 2629.13 samples/sec Loss 5.5822 LearningRate 0.0169 Epoch: 11 Global Step: 488780 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:38:04,686-Speed 2643.60 samples/sec Loss 5.5355 LearningRate 0.0169 Epoch: 11 Global Step: 488790 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:38:08,577-Speed 2632.19 samples/sec Loss 5.4815 LearningRate 0.0169 Epoch: 11 Global Step: 488800 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:38:12,471-Speed 2630.43 samples/sec Loss 5.4491 LearningRate 0.0169 Epoch: 11 Global Step: 488810 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:38:16,369-Speed 2628.32 samples/sec Loss 5.4751 LearningRate 0.0169 Epoch: 11 Global Step: 488820 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:38:20,263-Speed 2630.17 samples/sec Loss 5.4945 LearningRate 0.0169 Epoch: 11 Global Step: 488830 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:38:24,161-Speed 2627.72 samples/sec Loss 5.4313 LearningRate 0.0169 Epoch: 11 Global Step: 488840 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:38:28,074-Speed 2623.95 samples/sec Loss 5.5724 LearningRate 0.0169 Epoch: 11 Global Step: 488850 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:38:31,975-Speed 2625.60 samples/sec Loss 5.5953 LearningRate 0.0169 Epoch: 11 Global Step: 488860 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:38:35,877-Speed 2624.98 samples/sec Loss 5.5008 LearningRate 0.0169 Epoch: 11 Global Step: 488870 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:38:39,778-Speed 2625.49 samples/sec Loss 5.4548 LearningRate 0.0169 Epoch: 11 Global Step: 488880 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:38:43,652-Speed 2644.72 samples/sec Loss 5.5511 LearningRate 0.0169 Epoch: 11 Global Step: 488890 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:38:47,553-Speed 2625.09 samples/sec Loss 5.4598 LearningRate 0.0169 Epoch: 11 Global Step: 488900 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:38:51,492-Speed 2601.28 samples/sec Loss 5.5643 LearningRate 0.0169 Epoch: 11 Global Step: 488910 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:38:55,382-Speed 2632.64 samples/sec Loss 5.5379 LearningRate 0.0169 Epoch: 11 Global Step: 488920 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:38:59,278-Speed 2629.67 samples/sec Loss 5.5195 LearningRate 0.0169 Epoch: 11 Global Step: 488930 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:39:03,172-Speed 2630.13 samples/sec Loss 5.5684 LearningRate 0.0169 Epoch: 11 Global Step: 488940 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:39:07,068-Speed 2629.09 samples/sec Loss 5.5975 LearningRate 0.0169 Epoch: 11 Global Step: 488950 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:39:10,956-Speed 2634.04 samples/sec Loss 5.6526 LearningRate 0.0169 Epoch: 11 Global Step: 488960 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:39:14,871-Speed 2616.44 samples/sec Loss 5.5021 LearningRate 0.0169 Epoch: 11 Global Step: 488970 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:39:18,769-Speed 2627.22 samples/sec Loss 5.4446 LearningRate 0.0169 Epoch: 11 Global Step: 488980 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:39:22,675-Speed 2623.25 samples/sec Loss 5.4753 LearningRate 0.0169 Epoch: 11 Global Step: 488990 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:39:26,566-Speed 2631.77 samples/sec Loss 5.4650 LearningRate 0.0169 Epoch: 11 Global Step: 489000 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:39:30,468-Speed 2625.48 samples/sec Loss 5.5158 LearningRate 0.0169 Epoch: 11 Global Step: 489010 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:39:34,372-Speed 2623.47 samples/sec Loss 5.4408 LearningRate 0.0169 Epoch: 11 Global Step: 489020 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:39:38,277-Speed 2622.65 samples/sec Loss 5.5694 LearningRate 0.0169 Epoch: 11 Global Step: 489030 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:39:42,180-Speed 2624.61 samples/sec Loss 5.5603 LearningRate 0.0169 Epoch: 11 Global Step: 489040 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:39:46,077-Speed 2627.80 samples/sec Loss 5.5132 LearningRate 0.0168 Epoch: 11 Global Step: 489050 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:39:49,980-Speed 2624.73 samples/sec Loss 5.5354 LearningRate 0.0168 Epoch: 11 Global Step: 489060 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:39:53,872-Speed 2631.54 samples/sec Loss 5.4920 LearningRate 0.0168 Epoch: 11 Global Step: 489070 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:39:57,770-Speed 2627.93 samples/sec Loss 5.5187 LearningRate 0.0168 Epoch: 11 Global Step: 489080 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:40:01,700-Speed 2606.29 samples/sec Loss 5.3959 LearningRate 0.0168 Epoch: 11 Global Step: 489090 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:40:05,596-Speed 2629.12 samples/sec Loss 5.5027 LearningRate 0.0168 Epoch: 11 Global Step: 489100 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:40:09,497-Speed 2626.55 samples/sec Loss 5.4914 LearningRate 0.0168 Epoch: 11 Global Step: 489110 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:40:13,399-Speed 2625.36 samples/sec Loss 5.5306 LearningRate 0.0168 Epoch: 11 Global Step: 489120 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:40:17,295-Speed 2628.54 samples/sec Loss 5.5219 LearningRate 0.0168 Epoch: 11 Global Step: 489130 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:40:21,190-Speed 2629.89 samples/sec Loss 5.5210 LearningRate 0.0168 Epoch: 11 Global Step: 489140 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:40:25,082-Speed 2631.66 samples/sec Loss 5.4892 LearningRate 0.0168 Epoch: 11 Global Step: 489150 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:40:28,978-Speed 2629.30 samples/sec Loss 5.3985 LearningRate 0.0168 Epoch: 11 Global Step: 489160 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:40:32,874-Speed 2629.02 samples/sec Loss 5.4273 LearningRate 0.0168 Epoch: 11 Global Step: 489170 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:40:36,759-Speed 2636.62 samples/sec Loss 5.5435 LearningRate 0.0168 Epoch: 11 Global Step: 489180 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:40:40,658-Speed 2626.22 samples/sec Loss 5.5085 LearningRate 0.0168 Epoch: 11 Global Step: 489190 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:40:44,556-Speed 2627.91 samples/sec Loss 5.4944 LearningRate 0.0168 Epoch: 11 Global Step: 489200 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:40:48,460-Speed 2623.83 samples/sec Loss 5.4080 LearningRate 0.0168 Epoch: 11 Global Step: 489210 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:40:52,357-Speed 2628.64 samples/sec Loss 5.4267 LearningRate 0.0168 Epoch: 11 Global Step: 489220 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:40:56,251-Speed 2630.26 samples/sec Loss 5.4375 LearningRate 0.0168 Epoch: 11 Global Step: 489230 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:41:00,148-Speed 2628.57 samples/sec Loss 5.5445 LearningRate 0.0168 Epoch: 11 Global Step: 489240 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:41:04,045-Speed 2628.75 samples/sec Loss 5.4511 LearningRate 0.0168 Epoch: 11 Global Step: 489250 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:41:07,946-Speed 2625.19 samples/sec Loss 5.4406 LearningRate 0.0168 Epoch: 11 Global Step: 489260 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:41:11,842-Speed 2628.70 samples/sec Loss 5.4616 LearningRate 0.0168 Epoch: 11 Global Step: 489270 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:41:15,736-Speed 2630.43 samples/sec Loss 5.3993 LearningRate 0.0168 Epoch: 11 Global Step: 489280 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:41:19,682-Speed 2596.68 samples/sec Loss 5.5016 LearningRate 0.0168 Epoch: 11 Global Step: 489290 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:41:23,700-Speed 2549.14 samples/sec Loss 5.5345 LearningRate 0.0168 Epoch: 11 Global Step: 489300 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:41:27,577-Speed 2641.82 samples/sec Loss 5.5304 LearningRate 0.0168 Epoch: 11 Global Step: 489310 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:41:31,485-Speed 2621.29 samples/sec Loss 5.4402 LearningRate 0.0168 Epoch: 11 Global Step: 489320 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:41:35,382-Speed 2628.20 samples/sec Loss 5.3448 LearningRate 0.0168 Epoch: 11 Global Step: 489330 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:41:39,286-Speed 2623.34 samples/sec Loss 5.4632 LearningRate 0.0168 Epoch: 11 Global Step: 489340 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:41:43,195-Speed 2620.17 samples/sec Loss 5.5735 LearningRate 0.0168 Epoch: 11 Global Step: 489350 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:41:47,093-Speed 2627.69 samples/sec Loss 5.5068 LearningRate 0.0168 Epoch: 11 Global Step: 489360 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:41:50,991-Speed 2627.91 samples/sec Loss 5.5275 LearningRate 0.0168 Epoch: 11 Global Step: 489370 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:41:54,889-Speed 2628.03 samples/sec Loss 5.4928 LearningRate 0.0168 Epoch: 11 Global Step: 489380 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:41:58,784-Speed 2629.38 samples/sec Loss 5.5431 LearningRate 0.0168 Epoch: 11 Global Step: 489390 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:42:02,652-Speed 2648.15 samples/sec Loss 5.4973 LearningRate 0.0168 Epoch: 11 Global Step: 489400 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:42:06,554-Speed 2625.02 samples/sec Loss 5.5108 LearningRate 0.0168 Epoch: 11 Global Step: 489410 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:42:10,464-Speed 2619.55 samples/sec Loss 5.5529 LearningRate 0.0168 Epoch: 11 Global Step: 489420 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:42:14,382-Speed 2613.68 samples/sec Loss 5.3750 LearningRate 0.0168 Epoch: 11 Global Step: 489430 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:42:18,280-Speed 2628.29 samples/sec Loss 5.5056 LearningRate 0.0168 Epoch: 11 Global Step: 489440 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:42:22,177-Speed 2628.58 samples/sec Loss 5.5008 LearningRate 0.0168 Epoch: 11 Global Step: 489450 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:42:26,070-Speed 2631.00 samples/sec Loss 5.4733 LearningRate 0.0168 Epoch: 11 Global Step: 489460 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:42:29,964-Speed 2630.72 samples/sec Loss 5.4527 LearningRate 0.0168 Epoch: 11 Global Step: 489470 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:42:33,859-Speed 2629.46 samples/sec Loss 5.5028 LearningRate 0.0168 Epoch: 11 Global Step: 489480 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:42:37,758-Speed 2627.28 samples/sec Loss 5.5016 LearningRate 0.0168 Epoch: 11 Global Step: 489490 Fp16 Grad Scale: 32768 Required: 38 hours
Training: 2022-04-15 02:42:41,656-Speed 2627.22 samples/sec Loss 5.3653 LearningRate 0.0168 Epoch: 11 Global Step: 489500 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:42:45,552-Speed 2629.10 samples/sec Loss 5.5006 LearningRate 0.0168 Epoch: 11 Global Step: 489510 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:42:49,454-Speed 2624.41 samples/sec Loss 5.5288 LearningRate 0.0168 Epoch: 11 Global Step: 489520 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:42:53,355-Speed 2626.21 samples/sec Loss 5.5835 LearningRate 0.0168 Epoch: 11 Global Step: 489530 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:42:57,266-Speed 2618.30 samples/sec Loss 5.6440 LearningRate 0.0168 Epoch: 11 Global Step: 489540 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:43:01,169-Speed 2624.53 samples/sec Loss 5.5526 LearningRate 0.0168 Epoch: 11 Global Step: 489550 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:43:05,066-Speed 2627.99 samples/sec Loss 5.6142 LearningRate 0.0168 Epoch: 11 Global Step: 489560 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:43:08,972-Speed 2622.69 samples/sec Loss 5.5327 LearningRate 0.0168 Epoch: 11 Global Step: 489570 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:43:12,870-Speed 2627.85 samples/sec Loss 5.5029 LearningRate 0.0168 Epoch: 11 Global Step: 489580 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:43:16,771-Speed 2625.07 samples/sec Loss 5.5369 LearningRate 0.0168 Epoch: 11 Global Step: 489590 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:43:20,675-Speed 2623.50 samples/sec Loss 5.4305 LearningRate 0.0168 Epoch: 11 Global Step: 489600 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:43:24,580-Speed 2622.74 samples/sec Loss 5.5656 LearningRate 0.0168 Epoch: 11 Global Step: 489610 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:43:28,486-Speed 2622.65 samples/sec Loss 5.4077 LearningRate 0.0168 Epoch: 11 Global Step: 489620 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:43:32,400-Speed 2617.00 samples/sec Loss 5.4781 LearningRate 0.0168 Epoch: 11 Global Step: 489630 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:43:36,303-Speed 2623.61 samples/sec Loss 5.4942 LearningRate 0.0168 Epoch: 11 Global Step: 489640 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:43:40,188-Speed 2636.48 samples/sec Loss 5.5693 LearningRate 0.0168 Epoch: 11 Global Step: 489650 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:43:44,091-Speed 2633.09 samples/sec Loss 5.5880 LearningRate 0.0168 Epoch: 11 Global Step: 489660 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:43:47,989-Speed 2627.23 samples/sec Loss 5.3621 LearningRate 0.0168 Epoch: 11 Global Step: 489670 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:43:51,885-Speed 2629.47 samples/sec Loss 5.5500 LearningRate 0.0168 Epoch: 11 Global Step: 489680 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:43:55,776-Speed 2631.81 samples/sec Loss 5.6047 LearningRate 0.0168 Epoch: 11 Global Step: 489690 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:43:59,675-Speed 2627.34 samples/sec Loss 5.4925 LearningRate 0.0168 Epoch: 11 Global Step: 489700 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:44:03,582-Speed 2620.83 samples/sec Loss 5.5143 LearningRate 0.0168 Epoch: 11 Global Step: 489710 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:44:07,491-Speed 2620.61 samples/sec Loss 5.5377 LearningRate 0.0168 Epoch: 11 Global Step: 489720 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:44:11,389-Speed 2627.42 samples/sec Loss 5.5293 LearningRate 0.0168 Epoch: 11 Global Step: 489730 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:44:15,285-Speed 2629.09 samples/sec Loss 5.5634 LearningRate 0.0168 Epoch: 11 Global Step: 489740 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:44:19,180-Speed 2629.53 samples/sec Loss 5.4850 LearningRate 0.0168 Epoch: 11 Global Step: 489750 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:44:23,077-Speed 2628.67 samples/sec Loss 5.5709 LearningRate 0.0168 Epoch: 11 Global Step: 489760 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:44:26,967-Speed 2632.69 samples/sec Loss 5.3993 LearningRate 0.0168 Epoch: 11 Global Step: 489770 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:44:30,859-Speed 2631.60 samples/sec Loss 5.5918 LearningRate 0.0168 Epoch: 11 Global Step: 489780 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:44:34,751-Speed 2631.56 samples/sec Loss 5.4358 LearningRate 0.0168 Epoch: 11 Global Step: 489790 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:44:38,646-Speed 2629.81 samples/sec Loss 5.4403 LearningRate 0.0168 Epoch: 11 Global Step: 489800 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:44:42,548-Speed 2624.57 samples/sec Loss 5.4689 LearningRate 0.0168 Epoch: 11 Global Step: 489810 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:44:46,445-Speed 2628.70 samples/sec Loss 5.4023 LearningRate 0.0168 Epoch: 11 Global Step: 489820 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:44:50,337-Speed 2631.99 samples/sec Loss 5.6068 LearningRate 0.0168 Epoch: 11 Global Step: 489830 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:44:54,231-Speed 2629.99 samples/sec Loss 5.5041 LearningRate 0.0168 Epoch: 11 Global Step: 489840 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:44:58,100-Speed 2647.32 samples/sec Loss 5.4487 LearningRate 0.0168 Epoch: 11 Global Step: 489850 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:45:01,970-Speed 2647.09 samples/sec Loss 5.4336 LearningRate 0.0168 Epoch: 11 Global Step: 489860 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:45:05,863-Speed 2630.61 samples/sec Loss 5.4891 LearningRate 0.0168 Epoch: 11 Global Step: 489870 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:45:09,760-Speed 2627.96 samples/sec Loss 5.4193 LearningRate 0.0168 Epoch: 11 Global Step: 489880 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:45:13,662-Speed 2625.41 samples/sec Loss 5.4922 LearningRate 0.0168 Epoch: 11 Global Step: 489890 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:45:17,577-Speed 2615.99 samples/sec Loss 5.4629 LearningRate 0.0168 Epoch: 11 Global Step: 489900 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:45:21,476-Speed 2626.85 samples/sec Loss 5.4301 LearningRate 0.0168 Epoch: 11 Global Step: 489910 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:45:25,380-Speed 2623.13 samples/sec Loss 5.5504 LearningRate 0.0168 Epoch: 11 Global Step: 489920 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:45:29,274-Speed 2630.67 samples/sec Loss 5.3835 LearningRate 0.0168 Epoch: 11 Global Step: 489930 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:45:33,169-Speed 2629.49 samples/sec Loss 5.4835 LearningRate 0.0168 Epoch: 11 Global Step: 489940 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:45:37,075-Speed 2622.32 samples/sec Loss 5.4617 LearningRate 0.0168 Epoch: 11 Global Step: 489950 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:45:40,972-Speed 2628.07 samples/sec Loss 5.3919 LearningRate 0.0168 Epoch: 11 Global Step: 489960 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:45:44,870-Speed 2627.79 samples/sec Loss 5.5527 LearningRate 0.0168 Epoch: 11 Global Step: 489970 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:45:48,767-Speed 2628.36 samples/sec Loss 5.4004 LearningRate 0.0168 Epoch: 11 Global Step: 489980 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:45:52,665-Speed 2627.75 samples/sec Loss 5.4822 LearningRate 0.0168 Epoch: 11 Global Step: 489990 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:45:56,557-Speed 2631.85 samples/sec Loss 5.5311 LearningRate 0.0168 Epoch: 11 Global Step: 490000 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:46:39,240-[lfw][490000]XNorm: 23.497655
Training: 2022-04-15 02:46:39,241-[lfw][490000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-15 02:46:39,241-[lfw][490000]Accuracy-Highest: 0.99800
Training: 2022-04-15 02:47:28,918-[cfp_fp][490000]XNorm: 22.108973
Training: 2022-04-15 02:47:28,919-[cfp_fp][490000]Accuracy-Flip: 0.98971+-0.00541
Training: 2022-04-15 02:47:28,919-[cfp_fp][490000]Accuracy-Highest: 0.98971
Training: 2022-04-15 02:48:11,543-[agedb_30][490000]XNorm: 23.511800
Training: 2022-04-15 02:48:11,544-[agedb_30][490000]Accuracy-Flip: 0.97950+-0.00742
Training: 2022-04-15 02:48:11,544-[agedb_30][490000]Accuracy-Highest: 0.97950
Training: 2022-04-15 02:48:15,385-Speed 73.76 samples/sec Loss 5.4498 LearningRate 0.0168 Epoch: 11 Global Step: 490010 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:48:19,253-Speed 2647.78 samples/sec Loss 5.5507 LearningRate 0.0168 Epoch: 11 Global Step: 490020 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:48:23,121-Speed 2648.68 samples/sec Loss 5.4385 LearningRate 0.0168 Epoch: 11 Global Step: 490030 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:48:26,992-Speed 2645.77 samples/sec Loss 5.4587 LearningRate 0.0168 Epoch: 11 Global Step: 490040 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:48:30,872-Speed 2639.39 samples/sec Loss 5.5087 LearningRate 0.0168 Epoch: 11 Global Step: 490050 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:48:34,770-Speed 2627.42 samples/sec Loss 5.5025 LearningRate 0.0167 Epoch: 11 Global Step: 490060 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:48:38,648-Speed 2641.70 samples/sec Loss 5.4614 LearningRate 0.0167 Epoch: 11 Global Step: 490070 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:48:42,526-Speed 2641.15 samples/sec Loss 5.5429 LearningRate 0.0167 Epoch: 11 Global Step: 490080 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:48:46,405-Speed 2639.83 samples/sec Loss 5.3692 LearningRate 0.0167 Epoch: 11 Global Step: 490090 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:48:50,286-Speed 2639.69 samples/sec Loss 5.4657 LearningRate 0.0167 Epoch: 11 Global Step: 490100 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:48:54,174-Speed 2633.94 samples/sec Loss 5.3593 LearningRate 0.0167 Epoch: 11 Global Step: 490110 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:48:58,056-Speed 2638.50 samples/sec Loss 5.4743 LearningRate 0.0167 Epoch: 11 Global Step: 490120 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:49:01,951-Speed 2629.49 samples/sec Loss 5.4683 LearningRate 0.0167 Epoch: 11 Global Step: 490130 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:49:05,843-Speed 2631.62 samples/sec Loss 5.5154 LearningRate 0.0167 Epoch: 11 Global Step: 490140 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:49:09,738-Speed 2629.54 samples/sec Loss 5.4741 LearningRate 0.0167 Epoch: 11 Global Step: 490150 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:49:13,633-Speed 2629.87 samples/sec Loss 5.4397 LearningRate 0.0167 Epoch: 11 Global Step: 490160 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:49:17,554-Speed 2612.52 samples/sec Loss 5.4352 LearningRate 0.0167 Epoch: 11 Global Step: 490170 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:49:21,451-Speed 2628.10 samples/sec Loss 5.4378 LearningRate 0.0167 Epoch: 11 Global Step: 490180 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:49:25,351-Speed 2625.97 samples/sec Loss 5.4283 LearningRate 0.0167 Epoch: 11 Global Step: 490190 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:49:29,247-Speed 2629.56 samples/sec Loss 5.4482 LearningRate 0.0167 Epoch: 11 Global Step: 490200 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:49:33,121-Speed 2643.44 samples/sec Loss 5.5271 LearningRate 0.0167 Epoch: 11 Global Step: 490210 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:49:37,014-Speed 2630.74 samples/sec Loss 5.5498 LearningRate 0.0167 Epoch: 11 Global Step: 490220 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:49:40,926-Speed 2618.29 samples/sec Loss 5.4800 LearningRate 0.0167 Epoch: 11 Global Step: 490230 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:49:44,829-Speed 2624.22 samples/sec Loss 5.4813 LearningRate 0.0167 Epoch: 11 Global Step: 490240 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:49:48,725-Speed 2629.51 samples/sec Loss 5.4229 LearningRate 0.0167 Epoch: 11 Global Step: 490250 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:49:52,606-Speed 2638.97 samples/sec Loss 5.5635 LearningRate 0.0167 Epoch: 11 Global Step: 490260 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:49:56,500-Speed 2630.68 samples/sec Loss 5.5053 LearningRate 0.0167 Epoch: 11 Global Step: 490270 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:50:00,398-Speed 2627.50 samples/sec Loss 5.4501 LearningRate 0.0167 Epoch: 11 Global Step: 490280 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:50:04,294-Speed 2629.21 samples/sec Loss 5.3811 LearningRate 0.0167 Epoch: 11 Global Step: 490290 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:50:08,290-Speed 2562.93 samples/sec Loss 5.5164 LearningRate 0.0167 Epoch: 11 Global Step: 490300 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:50:12,200-Speed 2619.31 samples/sec Loss 5.4541 LearningRate 0.0167 Epoch: 11 Global Step: 490310 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:50:16,097-Speed 2628.07 samples/sec Loss 5.3748 LearningRate 0.0167 Epoch: 11 Global Step: 490320 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:50:20,005-Speed 2621.52 samples/sec Loss 5.4024 LearningRate 0.0167 Epoch: 11 Global Step: 490330 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:50:23,911-Speed 2621.98 samples/sec Loss 5.4359 LearningRate 0.0167 Epoch: 11 Global Step: 490340 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:50:27,812-Speed 2625.84 samples/sec Loss 5.4761 LearningRate 0.0167 Epoch: 11 Global Step: 490350 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:50:31,714-Speed 2624.93 samples/sec Loss 5.3597 LearningRate 0.0167 Epoch: 11 Global Step: 490360 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:50:35,612-Speed 2628.06 samples/sec Loss 5.4855 LearningRate 0.0167 Epoch: 11 Global Step: 490370 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:50:39,508-Speed 2628.62 samples/sec Loss 5.3709 LearningRate 0.0167 Epoch: 11 Global Step: 490380 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:50:43,414-Speed 2622.54 samples/sec Loss 5.3757 LearningRate 0.0167 Epoch: 11 Global Step: 490390 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:50:47,353-Speed 2599.88 samples/sec Loss 5.4189 LearningRate 0.0167 Epoch: 11 Global Step: 490400 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:50:51,229-Speed 2642.64 samples/sec Loss 5.4919 LearningRate 0.0167 Epoch: 11 Global Step: 490410 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:50:55,125-Speed 2628.81 samples/sec Loss 5.4441 LearningRate 0.0167 Epoch: 11 Global Step: 490420 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:50:59,033-Speed 2620.74 samples/sec Loss 5.5311 LearningRate 0.0167 Epoch: 11 Global Step: 490430 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:51:02,943-Speed 2619.50 samples/sec Loss 5.5701 LearningRate 0.0167 Epoch: 11 Global Step: 490440 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:51:06,841-Speed 2627.76 samples/sec Loss 5.4093 LearningRate 0.0167 Epoch: 11 Global Step: 490450 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:51:10,740-Speed 2626.51 samples/sec Loss 5.5078 LearningRate 0.0167 Epoch: 11 Global Step: 490460 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:51:14,634-Speed 2630.84 samples/sec Loss 5.5883 LearningRate 0.0167 Epoch: 11 Global Step: 490470 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:51:18,537-Speed 2624.05 samples/sec Loss 5.4496 LearningRate 0.0167 Epoch: 11 Global Step: 490480 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:51:22,432-Speed 2630.14 samples/sec Loss 5.5562 LearningRate 0.0167 Epoch: 11 Global Step: 490490 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:51:26,327-Speed 2629.40 samples/sec Loss 5.4908 LearningRate 0.0167 Epoch: 11 Global Step: 490500 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:51:30,223-Speed 2629.21 samples/sec Loss 5.4573 LearningRate 0.0167 Epoch: 11 Global Step: 490510 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:51:34,124-Speed 2624.85 samples/sec Loss 5.4911 LearningRate 0.0167 Epoch: 11 Global Step: 490520 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:51:38,034-Speed 2619.61 samples/sec Loss 5.4627 LearningRate 0.0167 Epoch: 11 Global Step: 490530 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:51:41,930-Speed 2628.64 samples/sec Loss 5.4918 LearningRate 0.0167 Epoch: 11 Global Step: 490540 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:51:45,832-Speed 2624.77 samples/sec Loss 5.5326 LearningRate 0.0167 Epoch: 11 Global Step: 490550 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:51:49,701-Speed 2647.61 samples/sec Loss 5.4401 LearningRate 0.0167 Epoch: 11 Global Step: 490560 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:51:53,601-Speed 2626.36 samples/sec Loss 5.4532 LearningRate 0.0167 Epoch: 11 Global Step: 490570 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:51:57,500-Speed 2627.00 samples/sec Loss 5.4736 LearningRate 0.0167 Epoch: 11 Global Step: 490580 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:52:01,445-Speed 2596.29 samples/sec Loss 5.6485 LearningRate 0.0167 Epoch: 11 Global Step: 490590 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:52:05,342-Speed 2628.07 samples/sec Loss 5.4794 LearningRate 0.0167 Epoch: 11 Global Step: 490600 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:52:09,240-Speed 2627.52 samples/sec Loss 5.4463 LearningRate 0.0167 Epoch: 11 Global Step: 490610 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:52:13,149-Speed 2625.08 samples/sec Loss 5.4665 LearningRate 0.0167 Epoch: 11 Global Step: 490620 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:52:17,046-Speed 2628.11 samples/sec Loss 5.4158 LearningRate 0.0167 Epoch: 11 Global Step: 490630 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:52:20,949-Speed 2624.53 samples/sec Loss 5.5440 LearningRate 0.0167 Epoch: 11 Global Step: 490640 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:52:24,881-Speed 2604.22 samples/sec Loss 5.4445 LearningRate 0.0167 Epoch: 11 Global Step: 490650 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:52:28,794-Speed 2618.61 samples/sec Loss 5.3983 LearningRate 0.0167 Epoch: 11 Global Step: 490660 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:52:32,688-Speed 2630.03 samples/sec Loss 5.4863 LearningRate 0.0167 Epoch: 11 Global Step: 490670 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:52:36,582-Speed 2629.88 samples/sec Loss 5.3903 LearningRate 0.0167 Epoch: 11 Global Step: 490680 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:52:40,557-Speed 2577.05 samples/sec Loss 5.5125 LearningRate 0.0167 Epoch: 11 Global Step: 490690 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:52:44,473-Speed 2615.29 samples/sec Loss 5.5033 LearningRate 0.0167 Epoch: 11 Global Step: 490700 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:52:48,556-Speed 2508.45 samples/sec Loss 5.4382 LearningRate 0.0167 Epoch: 11 Global Step: 490710 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:52:52,463-Speed 2621.99 samples/sec Loss 5.4907 LearningRate 0.0167 Epoch: 11 Global Step: 490720 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:52:56,367-Speed 2623.52 samples/sec Loss 5.4270 LearningRate 0.0167 Epoch: 11 Global Step: 490730 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:53:00,265-Speed 2627.16 samples/sec Loss 5.5042 LearningRate 0.0167 Epoch: 11 Global Step: 490740 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:53:04,184-Speed 2613.89 samples/sec Loss 5.3622 LearningRate 0.0167 Epoch: 11 Global Step: 490750 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:53:08,083-Speed 2626.82 samples/sec Loss 5.5183 LearningRate 0.0167 Epoch: 11 Global Step: 490760 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:53:11,985-Speed 2624.63 samples/sec Loss 5.4361 LearningRate 0.0167 Epoch: 11 Global Step: 490770 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:53:15,936-Speed 2593.03 samples/sec Loss 5.3997 LearningRate 0.0167 Epoch: 11 Global Step: 490780 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:53:19,854-Speed 2614.39 samples/sec Loss 5.4578 LearningRate 0.0167 Epoch: 11 Global Step: 490790 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:53:23,763-Speed 2619.93 samples/sec Loss 5.5275 LearningRate 0.0167 Epoch: 11 Global Step: 490800 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:53:27,641-Speed 2641.09 samples/sec Loss 5.5829 LearningRate 0.0167 Epoch: 11 Global Step: 490810 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:53:31,548-Speed 2621.41 samples/sec Loss 5.3856 LearningRate 0.0167 Epoch: 11 Global Step: 490820 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:53:35,465-Speed 2614.92 samples/sec Loss 5.4387 LearningRate 0.0167 Epoch: 11 Global Step: 490830 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:53:39,367-Speed 2624.65 samples/sec Loss 5.4330 LearningRate 0.0167 Epoch: 11 Global Step: 490840 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:53:43,394-Speed 2543.40 samples/sec Loss 5.4926 LearningRate 0.0167 Epoch: 11 Global Step: 490850 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:53:47,506-Speed 2491.53 samples/sec Loss 5.4963 LearningRate 0.0167 Epoch: 11 Global Step: 490860 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:53:51,485-Speed 2574.39 samples/sec Loss 5.4882 LearningRate 0.0167 Epoch: 11 Global Step: 490870 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:53:55,388-Speed 2623.84 samples/sec Loss 5.4872 LearningRate 0.0167 Epoch: 11 Global Step: 490880 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:53:59,400-Speed 2553.22 samples/sec Loss 5.3676 LearningRate 0.0167 Epoch: 11 Global Step: 490890 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:54:03,303-Speed 2624.15 samples/sec Loss 5.4733 LearningRate 0.0167 Epoch: 11 Global Step: 490900 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:54:07,201-Speed 2627.30 samples/sec Loss 5.5036 LearningRate 0.0167 Epoch: 11 Global Step: 490910 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:54:11,102-Speed 2625.03 samples/sec Loss 5.5024 LearningRate 0.0167 Epoch: 11 Global Step: 490920 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:54:15,009-Speed 2621.75 samples/sec Loss 5.4528 LearningRate 0.0167 Epoch: 11 Global Step: 490930 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:54:18,908-Speed 2627.22 samples/sec Loss 5.4526 LearningRate 0.0167 Epoch: 11 Global Step: 490940 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:54:22,786-Speed 2640.93 samples/sec Loss 5.3206 LearningRate 0.0167 Epoch: 11 Global Step: 490950 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:54:26,696-Speed 2620.39 samples/sec Loss 5.4868 LearningRate 0.0167 Epoch: 11 Global Step: 490960 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:54:30,634-Speed 2600.64 samples/sec Loss 5.4629 LearningRate 0.0167 Epoch: 11 Global Step: 490970 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:54:34,668-Speed 2538.90 samples/sec Loss 5.4542 LearningRate 0.0167 Epoch: 11 Global Step: 490980 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:54:38,576-Speed 2621.08 samples/sec Loss 5.3737 LearningRate 0.0167 Epoch: 11 Global Step: 490990 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:54:42,483-Speed 2620.93 samples/sec Loss 5.4516 LearningRate 0.0167 Epoch: 11 Global Step: 491000 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:54:46,385-Speed 2624.75 samples/sec Loss 5.4587 LearningRate 0.0167 Epoch: 11 Global Step: 491010 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:54:50,295-Speed 2620.16 samples/sec Loss 5.4796 LearningRate 0.0167 Epoch: 11 Global Step: 491020 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:54:54,207-Speed 2618.47 samples/sec Loss 5.5110 LearningRate 0.0167 Epoch: 11 Global Step: 491030 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:54:58,126-Speed 2613.36 samples/sec Loss 5.4874 LearningRate 0.0167 Epoch: 11 Global Step: 491040 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:55:02,033-Speed 2622.00 samples/sec Loss 5.4620 LearningRate 0.0167 Epoch: 11 Global Step: 491050 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:55:05,914-Speed 2638.97 samples/sec Loss 5.5901 LearningRate 0.0167 Epoch: 11 Global Step: 491060 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:55:09,820-Speed 2621.97 samples/sec Loss 5.4014 LearningRate 0.0167 Epoch: 11 Global Step: 491070 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:55:13,726-Speed 2622.27 samples/sec Loss 5.4278 LearningRate 0.0166 Epoch: 11 Global Step: 491080 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:55:17,638-Speed 2618.17 samples/sec Loss 5.5871 LearningRate 0.0166 Epoch: 11 Global Step: 491090 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:55:21,550-Speed 2618.45 samples/sec Loss 5.4458 LearningRate 0.0166 Epoch: 11 Global Step: 491100 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:55:25,452-Speed 2624.96 samples/sec Loss 5.3785 LearningRate 0.0166 Epoch: 11 Global Step: 491110 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:55:29,355-Speed 2624.23 samples/sec Loss 5.5165 LearningRate 0.0166 Epoch: 11 Global Step: 491120 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:55:33,262-Speed 2621.72 samples/sec Loss 5.4733 LearningRate 0.0166 Epoch: 11 Global Step: 491130 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:55:37,173-Speed 2618.53 samples/sec Loss 5.5457 LearningRate 0.0166 Epoch: 11 Global Step: 491140 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:55:41,086-Speed 2617.70 samples/sec Loss 5.4737 LearningRate 0.0166 Epoch: 11 Global Step: 491150 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:55:44,995-Speed 2620.27 samples/sec Loss 5.5046 LearningRate 0.0166 Epoch: 11 Global Step: 491160 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:55:48,908-Speed 2617.40 samples/sec Loss 5.4546 LearningRate 0.0166 Epoch: 11 Global Step: 491170 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:55:52,816-Speed 2620.91 samples/sec Loss 5.4628 LearningRate 0.0166 Epoch: 11 Global Step: 491180 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:55:56,737-Speed 2612.29 samples/sec Loss 5.5445 LearningRate 0.0166 Epoch: 11 Global Step: 491190 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:56:00,625-Speed 2633.96 samples/sec Loss 5.4489 LearningRate 0.0166 Epoch: 11 Global Step: 491200 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:56:04,532-Speed 2621.73 samples/sec Loss 5.4190 LearningRate 0.0166 Epoch: 11 Global Step: 491210 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:56:08,446-Speed 2617.13 samples/sec Loss 5.4428 LearningRate 0.0166 Epoch: 11 Global Step: 491220 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:56:12,384-Speed 2600.59 samples/sec Loss 5.3645 LearningRate 0.0166 Epoch: 11 Global Step: 491230 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:56:16,299-Speed 2615.90 samples/sec Loss 5.3484 LearningRate 0.0166 Epoch: 11 Global Step: 491240 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:56:20,213-Speed 2616.96 samples/sec Loss 5.5132 LearningRate 0.0166 Epoch: 11 Global Step: 491250 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:56:24,124-Speed 2619.42 samples/sec Loss 5.5400 LearningRate 0.0166 Epoch: 11 Global Step: 491260 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:56:28,031-Speed 2621.24 samples/sec Loss 5.5236 LearningRate 0.0166 Epoch: 11 Global Step: 491270 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:56:31,939-Speed 2620.75 samples/sec Loss 5.4195 LearningRate 0.0166 Epoch: 11 Global Step: 491280 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:56:35,844-Speed 2622.88 samples/sec Loss 5.3418 LearningRate 0.0166 Epoch: 11 Global Step: 491290 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:56:39,753-Speed 2620.23 samples/sec Loss 5.4297 LearningRate 0.0166 Epoch: 11 Global Step: 491300 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:56:43,674-Speed 2612.08 samples/sec Loss 5.3838 LearningRate 0.0166 Epoch: 11 Global Step: 491310 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:56:47,589-Speed 2616.69 samples/sec Loss 5.3715 LearningRate 0.0166 Epoch: 11 Global Step: 491320 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:56:51,517-Speed 2606.99 samples/sec Loss 5.4538 LearningRate 0.0166 Epoch: 11 Global Step: 491330 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:56:55,432-Speed 2616.96 samples/sec Loss 5.4749 LearningRate 0.0166 Epoch: 11 Global Step: 491340 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:56:59,337-Speed 2623.32 samples/sec Loss 5.4433 LearningRate 0.0166 Epoch: 11 Global Step: 491350 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:57:03,244-Speed 2621.38 samples/sec Loss 5.4119 LearningRate 0.0166 Epoch: 11 Global Step: 491360 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:57:07,156-Speed 2618.01 samples/sec Loss 5.4735 LearningRate 0.0166 Epoch: 11 Global Step: 491370 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:57:11,068-Speed 2618.13 samples/sec Loss 5.4652 LearningRate 0.0166 Epoch: 11 Global Step: 491380 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:57:14,983-Speed 2616.45 samples/sec Loss 5.4785 LearningRate 0.0166 Epoch: 11 Global Step: 491390 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:57:18,892-Speed 2620.04 samples/sec Loss 5.5494 LearningRate 0.0166 Epoch: 11 Global Step: 491400 Fp16 Grad Scale: 262144 Required: 38 hours
Training: 2022-04-15 02:57:22,755-Speed 2651.37 samples/sec Loss 5.4294 LearningRate 0.0166 Epoch: 11 Global Step: 491410 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:57:26,682-Speed 2608.13 samples/sec Loss 5.4574 LearningRate 0.0166 Epoch: 11 Global Step: 491420 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:57:30,590-Speed 2621.41 samples/sec Loss 5.4361 LearningRate 0.0166 Epoch: 11 Global Step: 491430 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:57:34,504-Speed 2616.27 samples/sec Loss 5.4724 LearningRate 0.0166 Epoch: 11 Global Step: 491440 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:57:38,423-Speed 2613.72 samples/sec Loss 5.4664 LearningRate 0.0166 Epoch: 11 Global Step: 491450 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:57:42,333-Speed 2619.83 samples/sec Loss 5.4652 LearningRate 0.0166 Epoch: 11 Global Step: 491460 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:57:46,239-Speed 2622.01 samples/sec Loss 5.3552 LearningRate 0.0166 Epoch: 11 Global Step: 491470 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:57:50,152-Speed 2618.21 samples/sec Loss 5.4295 LearningRate 0.0166 Epoch: 11 Global Step: 491480 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:57:54,064-Speed 2617.66 samples/sec Loss 5.5125 LearningRate 0.0166 Epoch: 11 Global Step: 491490 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:57:57,986-Speed 2611.68 samples/sec Loss 5.4714 LearningRate 0.0166 Epoch: 11 Global Step: 491500 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:58:01,897-Speed 2619.02 samples/sec Loss 5.4017 LearningRate 0.0166 Epoch: 11 Global Step: 491510 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:58:05,802-Speed 2622.65 samples/sec Loss 5.3283 LearningRate 0.0166 Epoch: 11 Global Step: 491520 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:58:09,707-Speed 2622.54 samples/sec Loss 5.4196 LearningRate 0.0166 Epoch: 11 Global Step: 491530 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:58:13,616-Speed 2620.71 samples/sec Loss 5.4407 LearningRate 0.0166 Epoch: 11 Global Step: 491540 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:58:17,521-Speed 2623.19 samples/sec Loss 5.3846 LearningRate 0.0166 Epoch: 11 Global Step: 491550 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:58:21,429-Speed 2620.65 samples/sec Loss 5.4742 LearningRate 0.0166 Epoch: 11 Global Step: 491560 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:58:25,350-Speed 2612.35 samples/sec Loss 5.3737 LearningRate 0.0166 Epoch: 11 Global Step: 491570 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:58:29,265-Speed 2616.30 samples/sec Loss 5.4699 LearningRate 0.0166 Epoch: 11 Global Step: 491580 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:58:33,536-Speed 2397.95 samples/sec Loss 5.4720 LearningRate 0.0166 Epoch: 11 Global Step: 491590 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:58:37,458-Speed 2611.14 samples/sec Loss 5.3608 LearningRate 0.0166 Epoch: 11 Global Step: 491600 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:58:41,350-Speed 2631.97 samples/sec Loss 5.4718 LearningRate 0.0166 Epoch: 11 Global Step: 491610 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:58:45,258-Speed 2620.45 samples/sec Loss 5.3512 LearningRate 0.0166 Epoch: 11 Global Step: 491620 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:58:49,172-Speed 2617.07 samples/sec Loss 5.4729 LearningRate 0.0166 Epoch: 11 Global Step: 491630 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:58:53,093-Speed 2612.42 samples/sec Loss 5.5796 LearningRate 0.0166 Epoch: 11 Global Step: 491640 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:58:57,001-Speed 2621.04 samples/sec Loss 5.4583 LearningRate 0.0166 Epoch: 11 Global Step: 491650 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:59:00,919-Speed 2614.07 samples/sec Loss 5.3598 LearningRate 0.0166 Epoch: 11 Global Step: 491660 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:59:04,825-Speed 2622.63 samples/sec Loss 5.4201 LearningRate 0.0166 Epoch: 11 Global Step: 491670 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:59:08,732-Speed 2621.27 samples/sec Loss 5.3848 LearningRate 0.0166 Epoch: 11 Global Step: 491680 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:59:12,637-Speed 2622.65 samples/sec Loss 5.5084 LearningRate 0.0166 Epoch: 11 Global Step: 491690 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:59:16,550-Speed 2617.03 samples/sec Loss 5.4723 LearningRate 0.0166 Epoch: 11 Global Step: 491700 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 02:59:20,455-Speed 2622.98 samples/sec Loss 5.4420 LearningRate 0.0166 Epoch: 11 Global Step: 491710 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:59:24,358-Speed 2624.16 samples/sec Loss 5.3746 LearningRate 0.0166 Epoch: 11 Global Step: 491720 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:59:28,272-Speed 2617.47 samples/sec Loss 5.3695 LearningRate 0.0166 Epoch: 11 Global Step: 491730 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:59:32,183-Speed 2618.81 samples/sec Loss 5.5044 LearningRate 0.0166 Epoch: 11 Global Step: 491740 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:59:36,091-Speed 2620.64 samples/sec Loss 5.4945 LearningRate 0.0166 Epoch: 11 Global Step: 491750 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:59:39,998-Speed 2621.50 samples/sec Loss 5.3770 LearningRate 0.0166 Epoch: 11 Global Step: 491760 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:59:44,000-Speed 2559.65 samples/sec Loss 5.5361 LearningRate 0.0166 Epoch: 11 Global Step: 491770 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:59:47,937-Speed 2601.03 samples/sec Loss 5.4440 LearningRate 0.0166 Epoch: 11 Global Step: 491780 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:59:51,845-Speed 2620.97 samples/sec Loss 5.5387 LearningRate 0.0166 Epoch: 11 Global Step: 491790 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:59:55,765-Speed 2613.27 samples/sec Loss 5.4985 LearningRate 0.0166 Epoch: 11 Global Step: 491800 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 02:59:59,648-Speed 2637.82 samples/sec Loss 5.4257 LearningRate 0.0166 Epoch: 11 Global Step: 491810 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:00:03,630-Speed 2571.78 samples/sec Loss 5.5264 LearningRate 0.0166 Epoch: 11 Global Step: 491820 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:00:07,540-Speed 2620.09 samples/sec Loss 5.4149 LearningRate 0.0166 Epoch: 11 Global Step: 491830 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:00:11,508-Speed 2580.77 samples/sec Loss 5.4069 LearningRate 0.0166 Epoch: 11 Global Step: 491840 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:00:15,428-Speed 2613.38 samples/sec Loss 5.3598 LearningRate 0.0166 Epoch: 11 Global Step: 491850 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:00:19,337-Speed 2619.93 samples/sec Loss 5.4648 LearningRate 0.0166 Epoch: 11 Global Step: 491860 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:00:23,246-Speed 2620.34 samples/sec Loss 5.5325 LearningRate 0.0166 Epoch: 11 Global Step: 491870 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:00:27,151-Speed 2622.87 samples/sec Loss 5.5259 LearningRate 0.0166 Epoch: 11 Global Step: 491880 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:00:31,051-Speed 2626.14 samples/sec Loss 5.4790 LearningRate 0.0166 Epoch: 11 Global Step: 491890 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:00:34,959-Speed 2621.24 samples/sec Loss 5.4919 LearningRate 0.0166 Epoch: 11 Global Step: 491900 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:00:38,866-Speed 2620.77 samples/sec Loss 5.4970 LearningRate 0.0166 Epoch: 11 Global Step: 491910 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:00:42,775-Speed 2620.61 samples/sec Loss 5.4726 LearningRate 0.0166 Epoch: 11 Global Step: 491920 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:00:46,688-Speed 2617.32 samples/sec Loss 5.4224 LearningRate 0.0166 Epoch: 11 Global Step: 491930 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:00:50,600-Speed 2618.59 samples/sec Loss 5.3505 LearningRate 0.0166 Epoch: 11 Global Step: 491940 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:00:54,513-Speed 2617.53 samples/sec Loss 5.4002 LearningRate 0.0166 Epoch: 11 Global Step: 491950 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:00:58,424-Speed 2618.75 samples/sec Loss 5.3418 LearningRate 0.0166 Epoch: 11 Global Step: 491960 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:01:02,336-Speed 2618.01 samples/sec Loss 5.3973 LearningRate 0.0166 Epoch: 11 Global Step: 491970 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:01:06,243-Speed 2622.01 samples/sec Loss 5.5259 LearningRate 0.0166 Epoch: 11 Global Step: 491980 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:01:10,157-Speed 2616.11 samples/sec Loss 5.5174 LearningRate 0.0166 Epoch: 11 Global Step: 491990 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:01:14,074-Speed 2615.22 samples/sec Loss 5.4288 LearningRate 0.0166 Epoch: 11 Global Step: 492000 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:01:18,008-Speed 2602.86 samples/sec Loss 5.5479 LearningRate 0.0166 Epoch: 11 Global Step: 492010 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:01:21,928-Speed 2612.92 samples/sec Loss 5.3780 LearningRate 0.0166 Epoch: 11 Global Step: 492020 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:01:25,842-Speed 2617.09 samples/sec Loss 5.4061 LearningRate 0.0166 Epoch: 11 Global Step: 492030 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:01:29,761-Speed 2614.02 samples/sec Loss 5.4117 LearningRate 0.0166 Epoch: 11 Global Step: 492040 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:01:33,659-Speed 2627.56 samples/sec Loss 5.5177 LearningRate 0.0166 Epoch: 11 Global Step: 492050 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:01:37,569-Speed 2620.35 samples/sec Loss 5.5736 LearningRate 0.0166 Epoch: 11 Global Step: 492060 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:01:41,470-Speed 2625.23 samples/sec Loss 5.3872 LearningRate 0.0166 Epoch: 11 Global Step: 492070 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:01:45,377-Speed 2621.61 samples/sec Loss 5.4065 LearningRate 0.0166 Epoch: 11 Global Step: 492080 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:01:49,282-Speed 2622.92 samples/sec Loss 5.4923 LearningRate 0.0166 Epoch: 11 Global Step: 492090 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:01:53,206-Speed 2610.00 samples/sec Loss 5.4925 LearningRate 0.0165 Epoch: 11 Global Step: 492100 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:01:57,113-Speed 2621.67 samples/sec Loss 5.4248 LearningRate 0.0165 Epoch: 11 Global Step: 492110 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:02:01,021-Speed 2620.97 samples/sec Loss 5.5341 LearningRate 0.0165 Epoch: 11 Global Step: 492120 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:02:04,945-Speed 2610.62 samples/sec Loss 5.5198 LearningRate 0.0165 Epoch: 11 Global Step: 492130 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:02:08,860-Speed 2616.06 samples/sec Loss 5.4412 LearningRate 0.0165 Epoch: 11 Global Step: 492140 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:02:12,762-Speed 2624.52 samples/sec Loss 5.4764 LearningRate 0.0165 Epoch: 11 Global Step: 492150 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:02:16,669-Speed 2621.51 samples/sec Loss 5.4378 LearningRate 0.0165 Epoch: 11 Global Step: 492160 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:02:20,573-Speed 2624.22 samples/sec Loss 5.3282 LearningRate 0.0165 Epoch: 11 Global Step: 492170 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:02:24,477-Speed 2622.99 samples/sec Loss 5.4584 LearningRate 0.0165 Epoch: 11 Global Step: 492180 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:02:28,379-Speed 2624.78 samples/sec Loss 5.4042 LearningRate 0.0165 Epoch: 11 Global Step: 492190 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:02:32,281-Speed 2625.03 samples/sec Loss 5.4301 LearningRate 0.0165 Epoch: 11 Global Step: 492200 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:02:36,186-Speed 2622.85 samples/sec Loss 5.3563 LearningRate 0.0165 Epoch: 11 Global Step: 492210 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:02:40,098-Speed 2618.09 samples/sec Loss 5.4232 LearningRate 0.0165 Epoch: 11 Global Step: 492220 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:02:44,008-Speed 2619.99 samples/sec Loss 5.5734 LearningRate 0.0165 Epoch: 11 Global Step: 492230 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:02:47,912-Speed 2623.66 samples/sec Loss 5.4019 LearningRate 0.0165 Epoch: 11 Global Step: 492240 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:02:51,816-Speed 2623.35 samples/sec Loss 5.4083 LearningRate 0.0165 Epoch: 11 Global Step: 492250 Fp16 Grad Scale: 262144 Required: 38 hours
Training: 2022-04-15 03:02:55,699-Speed 2638.01 samples/sec Loss 5.3755 LearningRate 0.0165 Epoch: 11 Global Step: 492260 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:02:59,580-Speed 2639.50 samples/sec Loss 5.3891 LearningRate 0.0165 Epoch: 11 Global Step: 492270 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:03:03,485-Speed 2622.79 samples/sec Loss 5.3561 LearningRate 0.0165 Epoch: 11 Global Step: 492280 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:03:07,391-Speed 2621.41 samples/sec Loss 5.5201 LearningRate 0.0165 Epoch: 11 Global Step: 492290 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:03:11,291-Speed 2627.05 samples/sec Loss 5.4345 LearningRate 0.0165 Epoch: 11 Global Step: 492300 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:03:15,194-Speed 2624.17 samples/sec Loss 5.4916 LearningRate 0.0165 Epoch: 11 Global Step: 492310 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:03:19,093-Speed 2627.00 samples/sec Loss 5.3359 LearningRate 0.0165 Epoch: 11 Global Step: 492320 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:03:22,997-Speed 2624.44 samples/sec Loss 5.3729 LearningRate 0.0165 Epoch: 11 Global Step: 492330 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:03:26,904-Speed 2621.02 samples/sec Loss 5.4757 LearningRate 0.0165 Epoch: 11 Global Step: 492340 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:03:30,820-Speed 2616.17 samples/sec Loss 5.5016 LearningRate 0.0165 Epoch: 11 Global Step: 492350 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:03:34,726-Speed 2621.82 samples/sec Loss 5.4653 LearningRate 0.0165 Epoch: 11 Global Step: 492360 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:03:38,631-Speed 2622.83 samples/sec Loss 5.4537 LearningRate 0.0165 Epoch: 11 Global Step: 492370 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:03:42,512-Speed 2638.76 samples/sec Loss 5.4399 LearningRate 0.0165 Epoch: 11 Global Step: 492380 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:03:46,424-Speed 2618.13 samples/sec Loss 5.3776 LearningRate 0.0165 Epoch: 11 Global Step: 492390 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:03:50,330-Speed 2623.01 samples/sec Loss 5.4727 LearningRate 0.0165 Epoch: 11 Global Step: 492400 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:03:54,237-Speed 2620.93 samples/sec Loss 5.4522 LearningRate 0.0165 Epoch: 11 Global Step: 492410 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:03:58,147-Speed 2620.09 samples/sec Loss 5.4956 LearningRate 0.0165 Epoch: 11 Global Step: 492420 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:04:02,054-Speed 2621.58 samples/sec Loss 5.4613 LearningRate 0.0165 Epoch: 11 Global Step: 492430 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:04:05,953-Speed 2627.07 samples/sec Loss 5.5278 LearningRate 0.0165 Epoch: 11 Global Step: 492440 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:04:09,873-Speed 2612.30 samples/sec Loss 5.3903 LearningRate 0.0165 Epoch: 11 Global Step: 492450 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:04:13,778-Speed 2622.98 samples/sec Loss 5.4093 LearningRate 0.0165 Epoch: 11 Global Step: 492460 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:04:17,680-Speed 2624.98 samples/sec Loss 5.6398 LearningRate 0.0165 Epoch: 11 Global Step: 492470 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:04:21,589-Speed 2620.21 samples/sec Loss 5.4084 LearningRate 0.0165 Epoch: 11 Global Step: 492480 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:04:25,492-Speed 2624.30 samples/sec Loss 5.3550 LearningRate 0.0165 Epoch: 11 Global Step: 492490 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:04:29,398-Speed 2622.07 samples/sec Loss 5.4278 LearningRate 0.0165 Epoch: 11 Global Step: 492500 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:04:33,308-Speed 2619.44 samples/sec Loss 5.4328 LearningRate 0.0165 Epoch: 11 Global Step: 492510 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:04:37,206-Speed 2627.72 samples/sec Loss 5.3812 LearningRate 0.0165 Epoch: 11 Global Step: 492520 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:04:41,115-Speed 2619.92 samples/sec Loss 5.3270 LearningRate 0.0165 Epoch: 11 Global Step: 492530 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:04:45,003-Speed 2634.68 samples/sec Loss 5.5391 LearningRate 0.0165 Epoch: 11 Global Step: 492540 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:04:48,910-Speed 2621.35 samples/sec Loss 5.5115 LearningRate 0.0165 Epoch: 11 Global Step: 492550 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:04:52,824-Speed 2617.50 samples/sec Loss 5.4679 LearningRate 0.0165 Epoch: 11 Global Step: 492560 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:04:56,733-Speed 2619.91 samples/sec Loss 5.4689 LearningRate 0.0165 Epoch: 11 Global Step: 492570 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:05:00,635-Speed 2624.83 samples/sec Loss 5.4097 LearningRate 0.0165 Epoch: 11 Global Step: 492580 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:05:04,547-Speed 2618.14 samples/sec Loss 5.4681 LearningRate 0.0165 Epoch: 11 Global Step: 492590 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:05:08,452-Speed 2622.63 samples/sec Loss 5.4373 LearningRate 0.0165 Epoch: 11 Global Step: 492600 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:05:12,358-Speed 2622.81 samples/sec Loss 5.4224 LearningRate 0.0165 Epoch: 11 Global Step: 492610 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:05:16,261-Speed 2623.98 samples/sec Loss 5.3629 LearningRate 0.0165 Epoch: 11 Global Step: 492620 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:05:20,168-Speed 2621.75 samples/sec Loss 5.2737 LearningRate 0.0165 Epoch: 11 Global Step: 492630 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:05:24,076-Speed 2621.27 samples/sec Loss 5.3559 LearningRate 0.0165 Epoch: 11 Global Step: 492640 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:05:27,954-Speed 2641.19 samples/sec Loss 5.5204 LearningRate 0.0165 Epoch: 11 Global Step: 492650 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:05:31,871-Speed 2614.15 samples/sec Loss 5.3855 LearningRate 0.0165 Epoch: 11 Global Step: 492660 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:05:35,773-Speed 2625.11 samples/sec Loss 5.4316 LearningRate 0.0165 Epoch: 11 Global Step: 492670 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:05:39,684-Speed 2618.87 samples/sec Loss 5.4595 LearningRate 0.0165 Epoch: 11 Global Step: 492680 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:05:43,585-Speed 2625.59 samples/sec Loss 5.4792 LearningRate 0.0165 Epoch: 11 Global Step: 492690 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:05:47,486-Speed 2625.38 samples/sec Loss 5.4507 LearningRate 0.0165 Epoch: 11 Global Step: 492700 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:05:51,389-Speed 2624.82 samples/sec Loss 5.3835 LearningRate 0.0165 Epoch: 11 Global Step: 492710 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:05:55,286-Speed 2627.77 samples/sec Loss 5.3643 LearningRate 0.0165 Epoch: 11 Global Step: 492720 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:05:59,189-Speed 2624.59 samples/sec Loss 5.4902 LearningRate 0.0165 Epoch: 11 Global Step: 492730 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:06:03,088-Speed 2626.76 samples/sec Loss 5.5067 LearningRate 0.0165 Epoch: 11 Global Step: 492740 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:06:06,997-Speed 2619.90 samples/sec Loss 5.3587 LearningRate 0.0165 Epoch: 11 Global Step: 492750 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:06:10,898-Speed 2625.63 samples/sec Loss 5.4554 LearningRate 0.0165 Epoch: 11 Global Step: 492760 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:06:14,795-Speed 2628.31 samples/sec Loss 5.4833 LearningRate 0.0165 Epoch: 11 Global Step: 492770 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:06:18,694-Speed 2627.06 samples/sec Loss 5.4262 LearningRate 0.0165 Epoch: 11 Global Step: 492780 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:06:22,600-Speed 2621.86 samples/sec Loss 5.3454 LearningRate 0.0165 Epoch: 11 Global Step: 492790 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:06:26,499-Speed 2627.35 samples/sec Loss 5.4106 LearningRate 0.0165 Epoch: 11 Global Step: 492800 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:06:30,410-Speed 2618.76 samples/sec Loss 5.5276 LearningRate 0.0165 Epoch: 11 Global Step: 492810 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:06:34,323-Speed 2617.68 samples/sec Loss 5.4554 LearningRate 0.0165 Epoch: 11 Global Step: 492820 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:06:38,229-Speed 2622.49 samples/sec Loss 5.4668 LearningRate 0.0165 Epoch: 11 Global Step: 492830 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:06:42,137-Speed 2620.65 samples/sec Loss 5.4586 LearningRate 0.0165 Epoch: 11 Global Step: 492840 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:06:46,018-Speed 2638.62 samples/sec Loss 5.3893 LearningRate 0.0165 Epoch: 11 Global Step: 492850 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:06:49,934-Speed 2616.04 samples/sec Loss 5.4858 LearningRate 0.0165 Epoch: 11 Global Step: 492860 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:06:53,835-Speed 2625.28 samples/sec Loss 5.4932 LearningRate 0.0165 Epoch: 11 Global Step: 492870 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:06:57,732-Speed 2628.74 samples/sec Loss 5.5966 LearningRate 0.0165 Epoch: 11 Global Step: 492880 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:07:01,643-Speed 2618.35 samples/sec Loss 5.4133 LearningRate 0.0165 Epoch: 11 Global Step: 492890 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:07:05,663-Speed 2548.65 samples/sec Loss 5.4260 LearningRate 0.0165 Epoch: 11 Global Step: 492900 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:07:09,585-Speed 2611.57 samples/sec Loss 5.5069 LearningRate 0.0165 Epoch: 11 Global Step: 492910 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:07:13,484-Speed 2626.63 samples/sec Loss 5.4660 LearningRate 0.0165 Epoch: 11 Global Step: 492920 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:07:17,368-Speed 2636.91 samples/sec Loss 5.4593 LearningRate 0.0165 Epoch: 11 Global Step: 492930 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:07:21,269-Speed 2625.76 samples/sec Loss 5.4368 LearningRate 0.0165 Epoch: 11 Global Step: 492940 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:07:25,177-Speed 2620.55 samples/sec Loss 5.3721 LearningRate 0.0165 Epoch: 11 Global Step: 492950 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:07:29,076-Speed 2627.01 samples/sec Loss 5.5232 LearningRate 0.0165 Epoch: 11 Global Step: 492960 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:07:32,983-Speed 2621.58 samples/sec Loss 5.4851 LearningRate 0.0165 Epoch: 11 Global Step: 492970 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:07:36,903-Speed 2613.22 samples/sec Loss 5.4374 LearningRate 0.0165 Epoch: 11 Global Step: 492980 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:07:40,800-Speed 2628.05 samples/sec Loss 5.4141 LearningRate 0.0165 Epoch: 11 Global Step: 492990 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:07:44,704-Speed 2623.61 samples/sec Loss 5.4496 LearningRate 0.0165 Epoch: 11 Global Step: 493000 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:07:48,603-Speed 2626.57 samples/sec Loss 5.3781 LearningRate 0.0165 Epoch: 11 Global Step: 493010 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:07:52,510-Speed 2621.32 samples/sec Loss 5.3683 LearningRate 0.0165 Epoch: 11 Global Step: 493020 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:07:56,411-Speed 2626.10 samples/sec Loss 5.4755 LearningRate 0.0165 Epoch: 11 Global Step: 493030 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:08:00,312-Speed 2625.28 samples/sec Loss 5.5836 LearningRate 0.0165 Epoch: 11 Global Step: 493040 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:08:04,222-Speed 2619.74 samples/sec Loss 5.4591 LearningRate 0.0165 Epoch: 11 Global Step: 493050 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:08:08,118-Speed 2628.59 samples/sec Loss 5.4481 LearningRate 0.0165 Epoch: 11 Global Step: 493060 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:08:12,016-Speed 2627.55 samples/sec Loss 5.4819 LearningRate 0.0165 Epoch: 11 Global Step: 493070 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:08:15,922-Speed 2621.97 samples/sec Loss 5.4403 LearningRate 0.0165 Epoch: 11 Global Step: 493080 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:08:19,823-Speed 2625.93 samples/sec Loss 5.2873 LearningRate 0.0165 Epoch: 11 Global Step: 493090 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:08:23,721-Speed 2627.84 samples/sec Loss 5.4416 LearningRate 0.0165 Epoch: 11 Global Step: 493100 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:08:27,617-Speed 2628.75 samples/sec Loss 5.3992 LearningRate 0.0165 Epoch: 11 Global Step: 493110 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:08:31,515-Speed 2627.87 samples/sec Loss 5.4861 LearningRate 0.0164 Epoch: 11 Global Step: 493120 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:08:35,394-Speed 2640.75 samples/sec Loss 5.3134 LearningRate 0.0164 Epoch: 11 Global Step: 493130 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:08:39,304-Speed 2619.73 samples/sec Loss 5.4630 LearningRate 0.0164 Epoch: 11 Global Step: 493140 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:08:43,204-Speed 2626.13 samples/sec Loss 5.4514 LearningRate 0.0164 Epoch: 11 Global Step: 493150 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:08:47,104-Speed 2625.98 samples/sec Loss 5.4178 LearningRate 0.0164 Epoch: 11 Global Step: 493160 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:08:51,008-Speed 2623.41 samples/sec Loss 5.5231 LearningRate 0.0164 Epoch: 11 Global Step: 493170 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:08:54,924-Speed 2615.87 samples/sec Loss 5.4235 LearningRate 0.0164 Epoch: 11 Global Step: 493180 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:08:58,817-Speed 2631.22 samples/sec Loss 5.5021 LearningRate 0.0164 Epoch: 11 Global Step: 493190 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:09:02,722-Speed 2623.18 samples/sec Loss 5.4490 LearningRate 0.0164 Epoch: 11 Global Step: 493200 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:09:06,621-Speed 2626.27 samples/sec Loss 5.3832 LearningRate 0.0164 Epoch: 11 Global Step: 493210 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:09:10,522-Speed 2625.71 samples/sec Loss 5.3985 LearningRate 0.0164 Epoch: 11 Global Step: 493220 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:09:14,400-Speed 2641.06 samples/sec Loss 5.5047 LearningRate 0.0164 Epoch: 11 Global Step: 493230 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:09:18,302-Speed 2624.84 samples/sec Loss 5.4414 LearningRate 0.0164 Epoch: 11 Global Step: 493240 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:09:22,205-Speed 2624.42 samples/sec Loss 5.4014 LearningRate 0.0164 Epoch: 11 Global Step: 493250 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:09:26,107-Speed 2624.67 samples/sec Loss 5.4227 LearningRate 0.0164 Epoch: 11 Global Step: 493260 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:09:30,014-Speed 2622.06 samples/sec Loss 5.4048 LearningRate 0.0164 Epoch: 11 Global Step: 493270 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:09:33,916-Speed 2624.43 samples/sec Loss 5.4209 LearningRate 0.0164 Epoch: 11 Global Step: 493280 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:09:37,812-Speed 2629.39 samples/sec Loss 5.4110 LearningRate 0.0164 Epoch: 11 Global Step: 493290 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:09:41,723-Speed 2618.61 samples/sec Loss 5.4710 LearningRate 0.0164 Epoch: 11 Global Step: 493300 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:09:45,686-Speed 2585.27 samples/sec Loss 5.3170 LearningRate 0.0164 Epoch: 11 Global Step: 493310 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:09:49,619-Speed 2603.72 samples/sec Loss 5.4202 LearningRate 0.0164 Epoch: 11 Global Step: 493320 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:09:53,522-Speed 2624.71 samples/sec Loss 5.3365 LearningRate 0.0164 Epoch: 11 Global Step: 493330 Fp16 Grad Scale: 262144 Required: 38 hours
Training: 2022-04-15 03:09:57,406-Speed 2636.35 samples/sec Loss 5.3602 LearningRate 0.0164 Epoch: 11 Global Step: 493340 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:10:01,312-Speed 2622.39 samples/sec Loss 5.4216 LearningRate 0.0164 Epoch: 11 Global Step: 493350 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:10:05,197-Speed 2636.27 samples/sec Loss 5.4151 LearningRate 0.0164 Epoch: 11 Global Step: 493360 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:10:09,105-Speed 2621.22 samples/sec Loss 5.3309 LearningRate 0.0164 Epoch: 11 Global Step: 493370 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:10:13,004-Speed 2626.42 samples/sec Loss 5.4079 LearningRate 0.0164 Epoch: 11 Global Step: 493380 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:10:16,901-Speed 2628.63 samples/sec Loss 5.4104 LearningRate 0.0164 Epoch: 11 Global Step: 493390 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:10:20,802-Speed 2625.90 samples/sec Loss 5.3856 LearningRate 0.0164 Epoch: 11 Global Step: 493400 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:10:24,699-Speed 2628.27 samples/sec Loss 5.4890 LearningRate 0.0164 Epoch: 11 Global Step: 493410 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:10:28,598-Speed 2626.96 samples/sec Loss 5.3748 LearningRate 0.0164 Epoch: 11 Global Step: 493420 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:10:32,495-Speed 2627.90 samples/sec Loss 5.4602 LearningRate 0.0164 Epoch: 11 Global Step: 493430 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:10:36,415-Speed 2613.10 samples/sec Loss 5.3466 LearningRate 0.0164 Epoch: 11 Global Step: 493440 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:10:40,323-Speed 2620.52 samples/sec Loss 5.4057 LearningRate 0.0164 Epoch: 11 Global Step: 493450 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:10:44,241-Speed 2622.45 samples/sec Loss 5.4936 LearningRate 0.0164 Epoch: 11 Global Step: 493460 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:10:48,138-Speed 2628.62 samples/sec Loss 5.4179 LearningRate 0.0164 Epoch: 11 Global Step: 493470 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:10:52,032-Speed 2630.40 samples/sec Loss 5.3773 LearningRate 0.0164 Epoch: 11 Global Step: 493480 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:10:55,928-Speed 2628.64 samples/sec Loss 5.5296 LearningRate 0.0164 Epoch: 11 Global Step: 493490 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:10:59,832-Speed 2623.97 samples/sec Loss 5.4187 LearningRate 0.0164 Epoch: 11 Global Step: 493500 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:11:03,727-Speed 2629.80 samples/sec Loss 5.4606 LearningRate 0.0164 Epoch: 11 Global Step: 493510 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:11:07,622-Speed 2628.98 samples/sec Loss 5.4364 LearningRate 0.0164 Epoch: 11 Global Step: 493520 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:11:11,521-Speed 2626.99 samples/sec Loss 5.4093 LearningRate 0.0164 Epoch: 11 Global Step: 493530 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:11:15,422-Speed 2625.79 samples/sec Loss 5.5791 LearningRate 0.0164 Epoch: 11 Global Step: 493540 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:11:19,333-Speed 2618.84 samples/sec Loss 5.3980 LearningRate 0.0164 Epoch: 11 Global Step: 493550 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:11:23,212-Speed 2643.08 samples/sec Loss 5.4118 LearningRate 0.0164 Epoch: 11 Global Step: 493560 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:11:27,110-Speed 2627.93 samples/sec Loss 5.3804 LearningRate 0.0164 Epoch: 11 Global Step: 493570 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:11:31,009-Speed 2627.11 samples/sec Loss 5.4490 LearningRate 0.0164 Epoch: 11 Global Step: 493580 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:11:34,906-Speed 2627.96 samples/sec Loss 5.3603 LearningRate 0.0164 Epoch: 11 Global Step: 493590 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:11:38,804-Speed 2628.08 samples/sec Loss 5.5101 LearningRate 0.0164 Epoch: 11 Global Step: 493600 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:11:42,712-Speed 2620.24 samples/sec Loss 5.4446 LearningRate 0.0164 Epoch: 11 Global Step: 493610 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:11:46,607-Speed 2629.95 samples/sec Loss 5.3699 LearningRate 0.0164 Epoch: 11 Global Step: 493620 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:11:50,503-Speed 2628.80 samples/sec Loss 5.5643 LearningRate 0.0164 Epoch: 11 Global Step: 493630 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:11:54,374-Speed 2645.97 samples/sec Loss 5.3516 LearningRate 0.0164 Epoch: 11 Global Step: 493640 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:11:58,277-Speed 2624.06 samples/sec Loss 5.4625 LearningRate 0.0164 Epoch: 11 Global Step: 493650 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:12:02,196-Speed 2614.03 samples/sec Loss 5.4351 LearningRate 0.0164 Epoch: 11 Global Step: 493660 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:12:06,098-Speed 2624.78 samples/sec Loss 5.3539 LearningRate 0.0164 Epoch: 11 Global Step: 493670 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:12:09,995-Speed 2627.87 samples/sec Loss 5.4522 LearningRate 0.0164 Epoch: 11 Global Step: 493680 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:12:13,902-Speed 2621.85 samples/sec Loss 5.3708 LearningRate 0.0164 Epoch: 11 Global Step: 493690 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:12:17,804-Speed 2624.69 samples/sec Loss 5.3493 LearningRate 0.0164 Epoch: 11 Global Step: 493700 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:12:21,700-Speed 2629.26 samples/sec Loss 5.4212 LearningRate 0.0164 Epoch: 11 Global Step: 493710 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:12:25,634-Speed 2603.24 samples/sec Loss 5.4550 LearningRate 0.0164 Epoch: 11 Global Step: 493720 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:12:29,534-Speed 2626.69 samples/sec Loss 5.4339 LearningRate 0.0164 Epoch: 11 Global Step: 493730 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:12:33,503-Speed 2580.37 samples/sec Loss 5.5034 LearningRate 0.0164 Epoch: 11 Global Step: 493740 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:12:37,437-Speed 2603.49 samples/sec Loss 5.3481 LearningRate 0.0164 Epoch: 11 Global Step: 493750 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:12:41,342-Speed 2622.73 samples/sec Loss 5.4643 LearningRate 0.0164 Epoch: 11 Global Step: 493760 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:12:45,259-Speed 2614.88 samples/sec Loss 5.3349 LearningRate 0.0164 Epoch: 11 Global Step: 493770 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:12:49,165-Speed 2622.03 samples/sec Loss 5.4261 LearningRate 0.0164 Epoch: 11 Global Step: 493780 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:12:53,073-Speed 2620.68 samples/sec Loss 5.5870 LearningRate 0.0164 Epoch: 11 Global Step: 493790 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:12:56,979-Speed 2622.56 samples/sec Loss 5.4069 LearningRate 0.0164 Epoch: 11 Global Step: 493800 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:13:01,389-Speed 2322.46 samples/sec Loss 5.4000 LearningRate 0.0164 Epoch: 11 Global Step: 493810 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:13:05,283-Speed 2630.33 samples/sec Loss 5.3850 LearningRate 0.0164 Epoch: 11 Global Step: 493820 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:13:09,183-Speed 2626.53 samples/sec Loss 5.5240 LearningRate 0.0164 Epoch: 11 Global Step: 493830 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:13:13,069-Speed 2635.49 samples/sec Loss 5.4209 LearningRate 0.0164 Epoch: 11 Global Step: 493840 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:13:16,965-Speed 2628.90 samples/sec Loss 5.3254 LearningRate 0.0164 Epoch: 11 Global Step: 493850 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:13:20,873-Speed 2620.92 samples/sec Loss 5.4747 LearningRate 0.0164 Epoch: 11 Global Step: 493860 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:13:24,766-Speed 2630.58 samples/sec Loss 5.3987 LearningRate 0.0164 Epoch: 11 Global Step: 493870 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:13:28,667-Speed 2625.86 samples/sec Loss 5.4833 LearningRate 0.0164 Epoch: 11 Global Step: 493880 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:13:32,546-Speed 2640.91 samples/sec Loss 5.4330 LearningRate 0.0164 Epoch: 11 Global Step: 493890 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:13:36,504-Speed 2587.54 samples/sec Loss 5.4510 LearningRate 0.0164 Epoch: 11 Global Step: 493900 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:13:40,412-Speed 2620.71 samples/sec Loss 5.3671 LearningRate 0.0164 Epoch: 11 Global Step: 493910 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:13:44,312-Speed 2626.30 samples/sec Loss 5.3234 LearningRate 0.0164 Epoch: 11 Global Step: 493920 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:13:48,222-Speed 2619.48 samples/sec Loss 5.4363 LearningRate 0.0164 Epoch: 11 Global Step: 493930 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:13:52,114-Speed 2631.78 samples/sec Loss 5.4739 LearningRate 0.0164 Epoch: 11 Global Step: 493940 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:13:56,004-Speed 2632.62 samples/sec Loss 5.3302 LearningRate 0.0164 Epoch: 11 Global Step: 493950 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:13:59,900-Speed 2629.12 samples/sec Loss 5.3723 LearningRate 0.0164 Epoch: 11 Global Step: 493960 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:14:03,794-Speed 2630.48 samples/sec Loss 5.3305 LearningRate 0.0164 Epoch: 11 Global Step: 493970 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:14:07,688-Speed 2630.00 samples/sec Loss 5.4541 LearningRate 0.0164 Epoch: 11 Global Step: 493980 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:14:11,584-Speed 2629.57 samples/sec Loss 5.4174 LearningRate 0.0164 Epoch: 11 Global Step: 493990 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:14:15,486-Speed 2624.19 samples/sec Loss 5.4359 LearningRate 0.0164 Epoch: 11 Global Step: 494000 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:14:19,393-Speed 2621.92 samples/sec Loss 5.4535 LearningRate 0.0164 Epoch: 11 Global Step: 494010 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:14:23,292-Speed 2626.82 samples/sec Loss 5.3800 LearningRate 0.0164 Epoch: 11 Global Step: 494020 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:14:27,200-Speed 2621.22 samples/sec Loss 5.5235 LearningRate 0.0164 Epoch: 11 Global Step: 494030 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:14:31,249-Speed 2529.33 samples/sec Loss 5.3666 LearningRate 0.0164 Epoch: 11 Global Step: 494040 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:14:35,358-Speed 2492.80 samples/sec Loss 5.3353 LearningRate 0.0164 Epoch: 11 Global Step: 494050 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:14:39,307-Speed 2593.45 samples/sec Loss 5.4453 LearningRate 0.0164 Epoch: 11 Global Step: 494060 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:14:43,206-Speed 2626.56 samples/sec Loss 5.4045 LearningRate 0.0164 Epoch: 11 Global Step: 494070 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:14:47,107-Speed 2626.22 samples/sec Loss 5.3571 LearningRate 0.0164 Epoch: 11 Global Step: 494080 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:14:50,977-Speed 2646.47 samples/sec Loss 5.4267 LearningRate 0.0164 Epoch: 11 Global Step: 494090 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:14:54,874-Speed 2628.77 samples/sec Loss 5.5005 LearningRate 0.0164 Epoch: 11 Global Step: 494100 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:14:58,774-Speed 2626.08 samples/sec Loss 5.4700 LearningRate 0.0164 Epoch: 11 Global Step: 494110 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:15:02,681-Speed 2621.57 samples/sec Loss 5.3837 LearningRate 0.0164 Epoch: 11 Global Step: 494120 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:15:06,588-Speed 2621.03 samples/sec Loss 5.3330 LearningRate 0.0164 Epoch: 11 Global Step: 494130 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:15:10,485-Speed 2628.51 samples/sec Loss 5.5332 LearningRate 0.0163 Epoch: 11 Global Step: 494140 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:15:14,382-Speed 2627.89 samples/sec Loss 5.3489 LearningRate 0.0163 Epoch: 11 Global Step: 494150 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:15:18,283-Speed 2625.85 samples/sec Loss 5.4557 LearningRate 0.0163 Epoch: 11 Global Step: 494160 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:15:22,199-Speed 2615.72 samples/sec Loss 5.3789 LearningRate 0.0163 Epoch: 11 Global Step: 494170 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:15:26,101-Speed 2624.76 samples/sec Loss 5.4439 LearningRate 0.0163 Epoch: 11 Global Step: 494180 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:15:29,987-Speed 2636.24 samples/sec Loss 5.3066 LearningRate 0.0163 Epoch: 11 Global Step: 494190 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:15:33,889-Speed 2624.68 samples/sec Loss 5.3483 LearningRate 0.0163 Epoch: 11 Global Step: 494200 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:15:37,781-Speed 2631.77 samples/sec Loss 5.3494 LearningRate 0.0163 Epoch: 11 Global Step: 494210 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:15:41,679-Speed 2627.39 samples/sec Loss 5.4348 LearningRate 0.0163 Epoch: 11 Global Step: 494220 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:15:45,582-Speed 2624.56 samples/sec Loss 5.4744 LearningRate 0.0163 Epoch: 11 Global Step: 494230 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:15:49,475-Speed 2630.78 samples/sec Loss 5.5058 LearningRate 0.0163 Epoch: 11 Global Step: 494240 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:15:53,379-Speed 2623.55 samples/sec Loss 5.4095 LearningRate 0.0163 Epoch: 11 Global Step: 494250 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:15:57,275-Speed 2629.16 samples/sec Loss 5.4226 LearningRate 0.0163 Epoch: 11 Global Step: 494260 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:16:01,168-Speed 2630.76 samples/sec Loss 5.3679 LearningRate 0.0163 Epoch: 11 Global Step: 494270 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:16:05,049-Speed 2639.33 samples/sec Loss 5.3375 LearningRate 0.0163 Epoch: 11 Global Step: 494280 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:16:08,953-Speed 2623.59 samples/sec Loss 5.5241 LearningRate 0.0163 Epoch: 11 Global Step: 494290 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:16:12,854-Speed 2625.20 samples/sec Loss 5.5259 LearningRate 0.0163 Epoch: 11 Global Step: 494300 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:16:16,758-Speed 2624.26 samples/sec Loss 5.4847 LearningRate 0.0163 Epoch: 11 Global Step: 494310 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:16:20,650-Speed 2631.52 samples/sec Loss 5.4210 LearningRate 0.0163 Epoch: 11 Global Step: 494320 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:16:24,553-Speed 2624.09 samples/sec Loss 5.4462 LearningRate 0.0163 Epoch: 11 Global Step: 494330 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:16:28,453-Speed 2626.17 samples/sec Loss 5.3664 LearningRate 0.0163 Epoch: 11 Global Step: 494340 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:16:32,354-Speed 2626.13 samples/sec Loss 5.4366 LearningRate 0.0163 Epoch: 11 Global Step: 494350 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:16:36,270-Speed 2614.95 samples/sec Loss 5.3394 LearningRate 0.0163 Epoch: 11 Global Step: 494360 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:16:40,173-Speed 2624.18 samples/sec Loss 5.5136 LearningRate 0.0163 Epoch: 11 Global Step: 494370 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:16:44,089-Speed 2616.28 samples/sec Loss 5.4038 LearningRate 0.0163 Epoch: 11 Global Step: 494380 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:16:47,988-Speed 2627.19 samples/sec Loss 5.4045 LearningRate 0.0163 Epoch: 11 Global Step: 494390 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:16:51,899-Speed 2618.63 samples/sec Loss 5.3709 LearningRate 0.0163 Epoch: 11 Global Step: 494400 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:16:55,805-Speed 2622.03 samples/sec Loss 5.3521 LearningRate 0.0163 Epoch: 11 Global Step: 494410 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:16:59,683-Speed 2641.20 samples/sec Loss 5.4176 LearningRate 0.0163 Epoch: 11 Global Step: 494420 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:17:03,577-Speed 2630.62 samples/sec Loss 5.3550 LearningRate 0.0163 Epoch: 11 Global Step: 494430 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:17:07,495-Speed 2613.77 samples/sec Loss 5.4673 LearningRate 0.0163 Epoch: 11 Global Step: 494440 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:17:11,390-Speed 2629.30 samples/sec Loss 5.4455 LearningRate 0.0163 Epoch: 11 Global Step: 494450 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:17:15,291-Speed 2626.18 samples/sec Loss 5.3912 LearningRate 0.0163 Epoch: 11 Global Step: 494460 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:17:19,216-Speed 2609.69 samples/sec Loss 5.3885 LearningRate 0.0163 Epoch: 11 Global Step: 494470 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:17:23,115-Speed 2626.83 samples/sec Loss 5.4535 LearningRate 0.0163 Epoch: 11 Global Step: 494480 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:17:27,012-Speed 2628.58 samples/sec Loss 5.3498 LearningRate 0.0163 Epoch: 11 Global Step: 494490 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:17:30,908-Speed 2628.77 samples/sec Loss 5.3438 LearningRate 0.0163 Epoch: 11 Global Step: 494500 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:17:34,807-Speed 2626.86 samples/sec Loss 5.2601 LearningRate 0.0163 Epoch: 11 Global Step: 494510 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:17:38,717-Speed 2618.95 samples/sec Loss 5.4875 LearningRate 0.0163 Epoch: 11 Global Step: 494520 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:17:42,628-Speed 2618.62 samples/sec Loss 5.3953 LearningRate 0.0163 Epoch: 11 Global Step: 494530 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:17:46,538-Speed 2620.21 samples/sec Loss 5.4871 LearningRate 0.0163 Epoch: 11 Global Step: 494540 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:17:50,412-Speed 2643.54 samples/sec Loss 5.3276 LearningRate 0.0163 Epoch: 11 Global Step: 494550 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:17:54,311-Speed 2627.00 samples/sec Loss 5.3588 LearningRate 0.0163 Epoch: 11 Global Step: 494560 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:17:58,210-Speed 2626.94 samples/sec Loss 5.4477 LearningRate 0.0163 Epoch: 11 Global Step: 494570 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:18:02,104-Speed 2631.14 samples/sec Loss 5.3832 LearningRate 0.0163 Epoch: 11 Global Step: 494580 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:18:06,000-Speed 2628.53 samples/sec Loss 5.4300 LearningRate 0.0163 Epoch: 11 Global Step: 494590 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:18:09,904-Speed 2623.64 samples/sec Loss 5.3210 LearningRate 0.0163 Epoch: 11 Global Step: 494600 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:18:13,815-Speed 2618.74 samples/sec Loss 5.4182 LearningRate 0.0163 Epoch: 11 Global Step: 494610 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:18:17,743-Speed 2607.98 samples/sec Loss 5.3681 LearningRate 0.0163 Epoch: 11 Global Step: 494620 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:18:21,839-Speed 2500.36 samples/sec Loss 5.2787 LearningRate 0.0163 Epoch: 11 Global Step: 494630 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:18:25,784-Speed 2595.88 samples/sec Loss 5.3785 LearningRate 0.0163 Epoch: 11 Global Step: 494640 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:18:29,690-Speed 2622.75 samples/sec Loss 5.4000 LearningRate 0.0163 Epoch: 11 Global Step: 494650 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:18:33,594-Speed 2623.45 samples/sec Loss 5.5049 LearningRate 0.0163 Epoch: 11 Global Step: 494660 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:18:37,493-Speed 2627.08 samples/sec Loss 5.3868 LearningRate 0.0163 Epoch: 11 Global Step: 494670 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:18:41,390-Speed 2628.59 samples/sec Loss 5.2919 LearningRate 0.0163 Epoch: 11 Global Step: 494680 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:18:45,286-Speed 2628.91 samples/sec Loss 5.5713 LearningRate 0.0163 Epoch: 11 Global Step: 494690 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:18:49,184-Speed 2627.41 samples/sec Loss 5.4112 LearningRate 0.0163 Epoch: 11 Global Step: 494700 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:18:53,078-Speed 2630.46 samples/sec Loss 5.3823 LearningRate 0.0163 Epoch: 11 Global Step: 494710 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:18:56,978-Speed 2626.41 samples/sec Loss 5.4086 LearningRate 0.0163 Epoch: 11 Global Step: 494720 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:19:00,884-Speed 2622.02 samples/sec Loss 5.4314 LearningRate 0.0163 Epoch: 11 Global Step: 494730 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:19:04,808-Speed 2610.12 samples/sec Loss 5.3361 LearningRate 0.0163 Epoch: 11 Global Step: 494740 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:19:08,686-Speed 2641.26 samples/sec Loss 5.4693 LearningRate 0.0163 Epoch: 11 Global Step: 494750 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:19:12,660-Speed 2577.14 samples/sec Loss 5.4213 LearningRate 0.0163 Epoch: 11 Global Step: 494760 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:19:16,556-Speed 2628.90 samples/sec Loss 5.4027 LearningRate 0.0163 Epoch: 11 Global Step: 494770 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:19:20,480-Speed 2610.81 samples/sec Loss 5.3928 LearningRate 0.0163 Epoch: 11 Global Step: 494780 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:19:24,375-Speed 2629.40 samples/sec Loss 5.4195 LearningRate 0.0163 Epoch: 11 Global Step: 494790 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:19:28,273-Speed 2628.18 samples/sec Loss 5.3958 LearningRate 0.0163 Epoch: 11 Global Step: 494800 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:19:32,146-Speed 2644.42 samples/sec Loss 5.4088 LearningRate 0.0163 Epoch: 11 Global Step: 494810 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:19:36,044-Speed 2627.96 samples/sec Loss 5.3629 LearningRate 0.0163 Epoch: 11 Global Step: 494820 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:19:39,946-Speed 2624.59 samples/sec Loss 5.4637 LearningRate 0.0163 Epoch: 11 Global Step: 494830 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:19:43,846-Speed 2626.18 samples/sec Loss 5.4904 LearningRate 0.0163 Epoch: 11 Global Step: 494840 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:19:47,742-Speed 2628.41 samples/sec Loss 5.4971 LearningRate 0.0163 Epoch: 11 Global Step: 494850 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:19:51,649-Speed 2622.23 samples/sec Loss 5.4220 LearningRate 0.0163 Epoch: 11 Global Step: 494860 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:19:55,551-Speed 2625.18 samples/sec Loss 5.5046 LearningRate 0.0163 Epoch: 11 Global Step: 494870 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:19:59,450-Speed 2627.08 samples/sec Loss 5.3757 LearningRate 0.0163 Epoch: 11 Global Step: 494880 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:20:03,346-Speed 2628.95 samples/sec Loss 5.4776 LearningRate 0.0163 Epoch: 11 Global Step: 494890 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:20:07,246-Speed 2626.74 samples/sec Loss 5.4824 LearningRate 0.0163 Epoch: 11 Global Step: 494900 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:20:11,152-Speed 2622.07 samples/sec Loss 5.5730 LearningRate 0.0163 Epoch: 11 Global Step: 494910 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:20:15,055-Speed 2624.00 samples/sec Loss 5.4990 LearningRate 0.0163 Epoch: 11 Global Step: 494920 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:20:18,956-Speed 2625.35 samples/sec Loss 5.5048 LearningRate 0.0163 Epoch: 11 Global Step: 494930 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:20:22,856-Speed 2625.98 samples/sec Loss 5.3638 LearningRate 0.0163 Epoch: 11 Global Step: 494940 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:20:26,757-Speed 2626.22 samples/sec Loss 5.3616 LearningRate 0.0163 Epoch: 11 Global Step: 494950 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:20:30,649-Speed 2631.28 samples/sec Loss 5.4646 LearningRate 0.0163 Epoch: 11 Global Step: 494960 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:20:34,545-Speed 2629.00 samples/sec Loss 5.4050 LearningRate 0.0163 Epoch: 11 Global Step: 494970 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:20:38,443-Speed 2628.13 samples/sec Loss 5.3387 LearningRate 0.0163 Epoch: 11 Global Step: 494980 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:20:42,337-Speed 2630.44 samples/sec Loss 5.3990 LearningRate 0.0163 Epoch: 11 Global Step: 494990 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:20:46,229-Speed 2631.60 samples/sec Loss 5.3979 LearningRate 0.0163 Epoch: 11 Global Step: 495000 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:20:50,146-Speed 2614.79 samples/sec Loss 5.3098 LearningRate 0.0163 Epoch: 11 Global Step: 495010 Fp16 Grad Scale: 262144 Required: 38 hours
Training: 2022-04-15 03:20:54,026-Speed 2639.57 samples/sec Loss 5.3470 LearningRate 0.0163 Epoch: 11 Global Step: 495020 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:20:57,936-Speed 2619.99 samples/sec Loss 5.3414 LearningRate 0.0163 Epoch: 11 Global Step: 495030 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:21:01,927-Speed 2566.01 samples/sec Loss 5.4374 LearningRate 0.0163 Epoch: 11 Global Step: 495040 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:21:05,918-Speed 2566.24 samples/sec Loss 5.4010 LearningRate 0.0163 Epoch: 11 Global Step: 495050 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:21:09,811-Speed 2630.79 samples/sec Loss 5.4164 LearningRate 0.0163 Epoch: 11 Global Step: 495060 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:21:13,802-Speed 2566.48 samples/sec Loss 5.4978 LearningRate 0.0163 Epoch: 11 Global Step: 495070 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:21:17,714-Speed 2618.51 samples/sec Loss 5.3296 LearningRate 0.0163 Epoch: 11 Global Step: 495080 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:21:21,611-Speed 2627.78 samples/sec Loss 5.4085 LearningRate 0.0163 Epoch: 11 Global Step: 495090 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:21:25,513-Speed 2625.49 samples/sec Loss 5.4197 LearningRate 0.0163 Epoch: 11 Global Step: 495100 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:21:29,409-Speed 2628.69 samples/sec Loss 5.4774 LearningRate 0.0163 Epoch: 11 Global Step: 495110 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:21:33,283-Speed 2643.96 samples/sec Loss 5.3709 LearningRate 0.0163 Epoch: 11 Global Step: 495120 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:21:37,200-Speed 2614.47 samples/sec Loss 5.3947 LearningRate 0.0163 Epoch: 11 Global Step: 495130 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:21:41,095-Speed 2629.60 samples/sec Loss 5.4270 LearningRate 0.0163 Epoch: 11 Global Step: 495140 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:21:44,992-Speed 2628.13 samples/sec Loss 5.3438 LearningRate 0.0163 Epoch: 11 Global Step: 495150 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:21:48,888-Speed 2629.55 samples/sec Loss 5.3337 LearningRate 0.0163 Epoch: 11 Global Step: 495160 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:21:52,789-Speed 2625.98 samples/sec Loss 5.3423 LearningRate 0.0162 Epoch: 11 Global Step: 495170 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:21:56,692-Speed 2623.81 samples/sec Loss 5.3955 LearningRate 0.0162 Epoch: 11 Global Step: 495180 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:22:00,599-Speed 2621.84 samples/sec Loss 5.3795 LearningRate 0.0162 Epoch: 11 Global Step: 495190 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:22:04,495-Speed 2628.61 samples/sec Loss 5.3691 LearningRate 0.0162 Epoch: 11 Global Step: 495200 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:22:08,399-Speed 2623.85 samples/sec Loss 5.3029 LearningRate 0.0162 Epoch: 11 Global Step: 495210 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:22:12,312-Speed 2617.08 samples/sec Loss 5.4414 LearningRate 0.0162 Epoch: 11 Global Step: 495220 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:22:16,215-Speed 2624.53 samples/sec Loss 5.3794 LearningRate 0.0162 Epoch: 11 Global Step: 495230 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:22:20,117-Speed 2624.95 samples/sec Loss 5.3210 LearningRate 0.0162 Epoch: 11 Global Step: 495240 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:22:24,012-Speed 2629.89 samples/sec Loss 5.3859 LearningRate 0.0162 Epoch: 11 Global Step: 495250 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:22:27,933-Speed 2612.61 samples/sec Loss 5.2910 LearningRate 0.0162 Epoch: 11 Global Step: 495260 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:22:31,830-Speed 2628.18 samples/sec Loss 5.2938 LearningRate 0.0162 Epoch: 11 Global Step: 495270 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:22:35,791-Speed 2585.44 samples/sec Loss 5.4137 LearningRate 0.0162 Epoch: 11 Global Step: 495280 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:22:39,685-Speed 2630.54 samples/sec Loss 5.4036 LearningRate 0.0162 Epoch: 11 Global Step: 495290 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:22:43,581-Speed 2629.26 samples/sec Loss 5.3444 LearningRate 0.0162 Epoch: 11 Global Step: 495300 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:22:47,481-Speed 2626.64 samples/sec Loss 5.3190 LearningRate 0.0162 Epoch: 11 Global Step: 495310 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:22:51,379-Speed 2627.19 samples/sec Loss 5.2974 LearningRate 0.0162 Epoch: 11 Global Step: 495320 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:22:55,285-Speed 2629.04 samples/sec Loss 5.3867 LearningRate 0.0162 Epoch: 11 Global Step: 495330 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:22:59,183-Speed 2627.28 samples/sec Loss 5.4667 LearningRate 0.0162 Epoch: 11 Global Step: 495340 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:23:03,088-Speed 2623.06 samples/sec Loss 5.4201 LearningRate 0.0162 Epoch: 11 Global Step: 495350 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:23:07,009-Speed 2612.49 samples/sec Loss 5.2836 LearningRate 0.0162 Epoch: 11 Global Step: 495360 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:23:10,923-Speed 2616.60 samples/sec Loss 5.3100 LearningRate 0.0162 Epoch: 11 Global Step: 495370 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:23:14,803-Speed 2639.78 samples/sec Loss 5.4809 LearningRate 0.0162 Epoch: 11 Global Step: 495380 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:23:18,806-Speed 2559.23 samples/sec Loss 5.4653 LearningRate 0.0162 Epoch: 11 Global Step: 495390 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:23:22,704-Speed 2627.64 samples/sec Loss 5.3140 LearningRate 0.0162 Epoch: 11 Global Step: 495400 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:23:26,615-Speed 2619.57 samples/sec Loss 5.3708 LearningRate 0.0162 Epoch: 11 Global Step: 495410 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:23:30,509-Speed 2630.40 samples/sec Loss 5.3876 LearningRate 0.0162 Epoch: 11 Global Step: 495420 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:23:34,408-Speed 2626.26 samples/sec Loss 5.4327 LearningRate 0.0162 Epoch: 11 Global Step: 495430 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:23:38,304-Speed 2629.57 samples/sec Loss 5.3996 LearningRate 0.0162 Epoch: 11 Global Step: 495440 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:23:42,213-Speed 2619.94 samples/sec Loss 5.4249 LearningRate 0.0162 Epoch: 11 Global Step: 495450 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:23:46,109-Speed 2629.12 samples/sec Loss 5.4569 LearningRate 0.0162 Epoch: 11 Global Step: 495460 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:23:50,006-Speed 2628.17 samples/sec Loss 5.4435 LearningRate 0.0162 Epoch: 11 Global Step: 495470 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:23:53,933-Speed 2608.64 samples/sec Loss 5.3153 LearningRate 0.0162 Epoch: 11 Global Step: 495480 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:23:57,907-Speed 2577.83 samples/sec Loss 5.4369 LearningRate 0.0162 Epoch: 11 Global Step: 495490 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:24:01,820-Speed 2617.31 samples/sec Loss 5.2637 LearningRate 0.0162 Epoch: 11 Global Step: 495500 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:24:05,715-Speed 2629.80 samples/sec Loss 5.4604 LearningRate 0.0162 Epoch: 11 Global Step: 495510 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:24:09,620-Speed 2622.72 samples/sec Loss 5.4839 LearningRate 0.0162 Epoch: 11 Global Step: 495520 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:24:13,519-Speed 2627.18 samples/sec Loss 5.4195 LearningRate 0.0162 Epoch: 11 Global Step: 495530 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:24:17,423-Speed 2623.48 samples/sec Loss 5.3204 LearningRate 0.0162 Epoch: 11 Global Step: 495540 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:24:21,338-Speed 2616.06 samples/sec Loss 5.3872 LearningRate 0.0162 Epoch: 11 Global Step: 495550 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:24:25,246-Speed 2620.88 samples/sec Loss 5.4299 LearningRate 0.0162 Epoch: 11 Global Step: 495560 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:24:29,164-Speed 2615.14 samples/sec Loss 5.4836 LearningRate 0.0162 Epoch: 11 Global Step: 495570 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:24:33,051-Speed 2635.07 samples/sec Loss 5.3831 LearningRate 0.0162 Epoch: 11 Global Step: 495580 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:24:36,954-Speed 2624.23 samples/sec Loss 5.3503 LearningRate 0.0162 Epoch: 11 Global Step: 495590 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:24:40,864-Speed 2619.10 samples/sec Loss 5.4696 LearningRate 0.0162 Epoch: 11 Global Step: 495600 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:24:44,767-Speed 2624.06 samples/sec Loss 5.3711 LearningRate 0.0162 Epoch: 11 Global Step: 495610 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:24:48,668-Speed 2625.55 samples/sec Loss 5.4314 LearningRate 0.0162 Epoch: 11 Global Step: 495620 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:24:52,582-Speed 2617.38 samples/sec Loss 5.4473 LearningRate 0.0162 Epoch: 11 Global Step: 495630 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:24:56,478-Speed 2628.43 samples/sec Loss 5.4206 LearningRate 0.0162 Epoch: 11 Global Step: 495640 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:25:00,376-Speed 2628.20 samples/sec Loss 5.3914 LearningRate 0.0162 Epoch: 11 Global Step: 495650 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:25:04,271-Speed 2629.53 samples/sec Loss 5.3974 LearningRate 0.0162 Epoch: 11 Global Step: 495660 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:25:08,205-Speed 2604.24 samples/sec Loss 5.4993 LearningRate 0.0162 Epoch: 11 Global Step: 495670 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:25:12,081-Speed 2642.41 samples/sec Loss 5.5371 LearningRate 0.0162 Epoch: 11 Global Step: 495680 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:25:15,997-Speed 2615.20 samples/sec Loss 5.3779 LearningRate 0.0162 Epoch: 11 Global Step: 495690 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:25:19,902-Speed 2622.75 samples/sec Loss 5.4892 LearningRate 0.0162 Epoch: 11 Global Step: 495700 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:25:23,839-Speed 2601.66 samples/sec Loss 5.3199 LearningRate 0.0162 Epoch: 11 Global Step: 495710 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:25:27,741-Speed 2625.84 samples/sec Loss 5.4320 LearningRate 0.0162 Epoch: 11 Global Step: 495720 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:25:31,683-Speed 2597.77 samples/sec Loss 5.2834 LearningRate 0.0162 Epoch: 11 Global Step: 495730 Fp16 Grad Scale: 131072 Required: 38 hours
Training: 2022-04-15 03:25:35,551-Speed 2649.03 samples/sec Loss 5.4804 LearningRate 0.0162 Epoch: 11 Global Step: 495740 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:25:39,470-Speed 2613.26 samples/sec Loss 5.4371 LearningRate 0.0162 Epoch: 11 Global Step: 495750 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:25:43,370-Speed 2626.53 samples/sec Loss 5.3920 LearningRate 0.0162 Epoch: 11 Global Step: 495760 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:25:47,278-Speed 2620.53 samples/sec Loss 5.4116 LearningRate 0.0162 Epoch: 11 Global Step: 495770 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:25:51,179-Speed 2626.31 samples/sec Loss 5.4419 LearningRate 0.0162 Epoch: 11 Global Step: 495780 Fp16 Grad Scale: 65536 Required: 38 hours
Training: 2022-04-15 03:25:55,102-Speed 2610.45 samples/sec Loss 5.4122 LearningRate 0.0162 Epoch: 11 Global Step: 495790 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:25:59,017-Speed 2616.71 samples/sec Loss 5.3847 LearningRate 0.0162 Epoch: 11 Global Step: 495800 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:26:02,913-Speed 2629.24 samples/sec Loss 5.3194 LearningRate 0.0162 Epoch: 11 Global Step: 495810 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:26:06,829-Speed 2615.46 samples/sec Loss 5.4011 LearningRate 0.0162 Epoch: 11 Global Step: 495820 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:26:10,724-Speed 2629.42 samples/sec Loss 5.4150 LearningRate 0.0162 Epoch: 11 Global Step: 495830 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:26:14,637-Speed 2618.08 samples/sec Loss 5.4582 LearningRate 0.0162 Epoch: 11 Global Step: 495840 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:26:18,536-Speed 2626.31 samples/sec Loss 5.4786 LearningRate 0.0162 Epoch: 11 Global Step: 495850 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:26:22,414-Speed 2641.51 samples/sec Loss 5.4388 LearningRate 0.0162 Epoch: 11 Global Step: 495860 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:26:26,306-Speed 2631.52 samples/sec Loss 5.3460 LearningRate 0.0162 Epoch: 11 Global Step: 495870 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:26:30,216-Speed 2619.45 samples/sec Loss 5.2711 LearningRate 0.0162 Epoch: 11 Global Step: 495880 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:26:34,117-Speed 2626.26 samples/sec Loss 5.4409 LearningRate 0.0162 Epoch: 11 Global Step: 495890 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:26:38,046-Speed 2607.06 samples/sec Loss 5.4456 LearningRate 0.0162 Epoch: 11 Global Step: 495900 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:26:41,948-Speed 2624.70 samples/sec Loss 5.5611 LearningRate 0.0162 Epoch: 11 Global Step: 495910 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:26:45,840-Speed 2631.94 samples/sec Loss 5.4234 LearningRate 0.0162 Epoch: 11 Global Step: 495920 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:26:49,733-Speed 2630.92 samples/sec Loss 5.2915 LearningRate 0.0162 Epoch: 11 Global Step: 495930 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:26:53,640-Speed 2621.88 samples/sec Loss 5.5029 LearningRate 0.0162 Epoch: 11 Global Step: 495940 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:26:57,543-Speed 2623.77 samples/sec Loss 5.3845 LearningRate 0.0162 Epoch: 11 Global Step: 495950 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:27:01,450-Speed 2621.85 samples/sec Loss 5.3116 LearningRate 0.0162 Epoch: 11 Global Step: 495960 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:27:05,338-Speed 2633.98 samples/sec Loss 5.3895 LearningRate 0.0162 Epoch: 11 Global Step: 495970 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:27:09,230-Speed 2631.70 samples/sec Loss 5.3960 LearningRate 0.0162 Epoch: 11 Global Step: 495980 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:27:13,120-Speed 2633.51 samples/sec Loss 5.4059 LearningRate 0.0162 Epoch: 11 Global Step: 495990 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:27:17,016-Speed 2628.89 samples/sec Loss 5.4670 LearningRate 0.0162 Epoch: 11 Global Step: 496000 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:27:20,910-Speed 2630.75 samples/sec Loss 5.4658 LearningRate 0.0162 Epoch: 11 Global Step: 496010 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:27:24,831-Speed 2611.56 samples/sec Loss 5.3306 LearningRate 0.0162 Epoch: 11 Global Step: 496020 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:27:28,726-Speed 2629.90 samples/sec Loss 5.4488 LearningRate 0.0162 Epoch: 11 Global Step: 496030 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:27:32,637-Speed 2618.54 samples/sec Loss 5.3215 LearningRate 0.0162 Epoch: 11 Global Step: 496040 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:27:36,542-Speed 2623.05 samples/sec Loss 5.3634 LearningRate 0.0162 Epoch: 11 Global Step: 496050 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:27:40,458-Speed 2615.29 samples/sec Loss 5.3674 LearningRate 0.0162 Epoch: 11 Global Step: 496060 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:27:44,361-Speed 2623.88 samples/sec Loss 5.3770 LearningRate 0.0162 Epoch: 11 Global Step: 496070 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:27:48,265-Speed 2624.16 samples/sec Loss 5.3522 LearningRate 0.0162 Epoch: 11 Global Step: 496080 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:27:52,168-Speed 2624.70 samples/sec Loss 5.4413 LearningRate 0.0162 Epoch: 11 Global Step: 496090 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:27:56,064-Speed 2629.62 samples/sec Loss 5.4230 LearningRate 0.0162 Epoch: 11 Global Step: 496100 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:27:59,963-Speed 2626.47 samples/sec Loss 5.3053 LearningRate 0.0162 Epoch: 11 Global Step: 496110 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:28:03,864-Speed 2625.80 samples/sec Loss 5.4497 LearningRate 0.0162 Epoch: 11 Global Step: 496120 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:28:07,760-Speed 2628.35 samples/sec Loss 5.4506 LearningRate 0.0162 Epoch: 11 Global Step: 496130 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:28:11,653-Speed 2631.47 samples/sec Loss 5.3508 LearningRate 0.0162 Epoch: 11 Global Step: 496140 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:28:15,549-Speed 2628.46 samples/sec Loss 5.3957 LearningRate 0.0162 Epoch: 11 Global Step: 496150 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:28:19,467-Speed 2614.51 samples/sec Loss 5.3995 LearningRate 0.0162 Epoch: 11 Global Step: 496160 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:28:23,386-Speed 2613.55 samples/sec Loss 5.3330 LearningRate 0.0162 Epoch: 11 Global Step: 496170 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:28:27,284-Speed 2628.09 samples/sec Loss 5.3894 LearningRate 0.0162 Epoch: 11 Global Step: 496180 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:28:31,184-Speed 2626.00 samples/sec Loss 5.3134 LearningRate 0.0162 Epoch: 11 Global Step: 496190 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:28:35,078-Speed 2630.39 samples/sec Loss 5.2502 LearningRate 0.0161 Epoch: 11 Global Step: 496200 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:28:38,975-Speed 2628.01 samples/sec Loss 5.3856 LearningRate 0.0161 Epoch: 11 Global Step: 496210 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:28:42,873-Speed 2628.16 samples/sec Loss 5.3728 LearningRate 0.0161 Epoch: 11 Global Step: 496220 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:28:46,792-Speed 2613.60 samples/sec Loss 5.3956 LearningRate 0.0161 Epoch: 11 Global Step: 496230 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:28:50,693-Speed 2625.68 samples/sec Loss 5.3371 LearningRate 0.0161 Epoch: 11 Global Step: 496240 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:28:54,600-Speed 2622.10 samples/sec Loss 5.3989 LearningRate 0.0161 Epoch: 11 Global Step: 496250 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:28:58,489-Speed 2633.33 samples/sec Loss 5.3293 LearningRate 0.0161 Epoch: 11 Global Step: 496260 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:29:02,411-Speed 2611.59 samples/sec Loss 5.3861 LearningRate 0.0161 Epoch: 11 Global Step: 496270 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:29:06,318-Speed 2621.38 samples/sec Loss 5.3678 LearningRate 0.0161 Epoch: 11 Global Step: 496280 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:29:10,216-Speed 2628.19 samples/sec Loss 5.3004 LearningRate 0.0161 Epoch: 11 Global Step: 496290 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:29:14,132-Speed 2615.48 samples/sec Loss 5.4349 LearningRate 0.0161 Epoch: 11 Global Step: 496300 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:29:18,034-Speed 2625.08 samples/sec Loss 5.3798 LearningRate 0.0161 Epoch: 11 Global Step: 496310 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:29:21,920-Speed 2636.16 samples/sec Loss 5.4228 LearningRate 0.0161 Epoch: 11 Global Step: 496320 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:29:25,838-Speed 2614.32 samples/sec Loss 5.3042 LearningRate 0.0161 Epoch: 11 Global Step: 496330 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:29:29,748-Speed 2619.86 samples/sec Loss 5.4220 LearningRate 0.0161 Epoch: 11 Global Step: 496340 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:29:33,642-Speed 2630.14 samples/sec Loss 5.3629 LearningRate 0.0161 Epoch: 11 Global Step: 496350 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:29:37,568-Speed 2608.24 samples/sec Loss 5.2647 LearningRate 0.0161 Epoch: 11 Global Step: 496360 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:29:41,482-Speed 2618.08 samples/sec Loss 5.3965 LearningRate 0.0161 Epoch: 11 Global Step: 496370 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:29:45,396-Speed 2617.12 samples/sec Loss 5.4643 LearningRate 0.0161 Epoch: 11 Global Step: 496380 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:29:49,290-Speed 2630.11 samples/sec Loss 5.3342 LearningRate 0.0161 Epoch: 11 Global Step: 496390 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:29:53,183-Speed 2631.34 samples/sec Loss 5.3537 LearningRate 0.0161 Epoch: 11 Global Step: 496400 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:29:57,083-Speed 2626.22 samples/sec Loss 5.3991 LearningRate 0.0161 Epoch: 11 Global Step: 496410 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:30:00,961-Speed 2641.05 samples/sec Loss 5.2606 LearningRate 0.0161 Epoch: 11 Global Step: 496420 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:30:04,854-Speed 2631.07 samples/sec Loss 5.3632 LearningRate 0.0161 Epoch: 11 Global Step: 496430 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:30:08,743-Speed 2633.66 samples/sec Loss 5.3111 LearningRate 0.0161 Epoch: 11 Global Step: 496440 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:30:12,636-Speed 2631.53 samples/sec Loss 5.4664 LearningRate 0.0161 Epoch: 11 Global Step: 496450 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:30:16,528-Speed 2631.73 samples/sec Loss 5.4536 LearningRate 0.0161 Epoch: 11 Global Step: 496460 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:30:20,421-Speed 2630.80 samples/sec Loss 5.3356 LearningRate 0.0161 Epoch: 11 Global Step: 496470 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:30:24,325-Speed 2625.01 samples/sec Loss 5.4816 LearningRate 0.0161 Epoch: 11 Global Step: 496480 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:30:28,216-Speed 2632.59 samples/sec Loss 5.3660 LearningRate 0.0161 Epoch: 11 Global Step: 496490 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:30:32,113-Speed 2627.74 samples/sec Loss 5.4298 LearningRate 0.0161 Epoch: 11 Global Step: 496500 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:30:36,009-Speed 2629.67 samples/sec Loss 5.4173 LearningRate 0.0161 Epoch: 11 Global Step: 496510 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:30:39,905-Speed 2629.15 samples/sec Loss 5.4543 LearningRate 0.0161 Epoch: 11 Global Step: 496520 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:30:43,822-Speed 2614.82 samples/sec Loss 5.3068 LearningRate 0.0161 Epoch: 11 Global Step: 496530 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:30:47,727-Speed 2623.14 samples/sec Loss 5.3398 LearningRate 0.0161 Epoch: 11 Global Step: 496540 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:30:51,617-Speed 2632.95 samples/sec Loss 5.3837 LearningRate 0.0161 Epoch: 11 Global Step: 496550 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:30:55,515-Speed 2627.61 samples/sec Loss 5.4691 LearningRate 0.0161 Epoch: 11 Global Step: 496560 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:30:59,397-Speed 2639.78 samples/sec Loss 5.2983 LearningRate 0.0161 Epoch: 11 Global Step: 496570 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:31:03,289-Speed 2631.31 samples/sec Loss 5.3992 LearningRate 0.0161 Epoch: 11 Global Step: 496580 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:31:07,191-Speed 2625.45 samples/sec Loss 5.3991 LearningRate 0.0161 Epoch: 11 Global Step: 496590 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:31:11,097-Speed 2622.18 samples/sec Loss 5.4002 LearningRate 0.0161 Epoch: 11 Global Step: 496600 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:31:14,997-Speed 2625.79 samples/sec Loss 5.3746 LearningRate 0.0161 Epoch: 11 Global Step: 496610 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:31:18,949-Speed 2591.67 samples/sec Loss 5.4174 LearningRate 0.0161 Epoch: 11 Global Step: 496620 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:31:22,845-Speed 2629.34 samples/sec Loss 5.3420 LearningRate 0.0161 Epoch: 11 Global Step: 496630 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:31:26,762-Speed 2615.06 samples/sec Loss 5.3972 LearningRate 0.0161 Epoch: 11 Global Step: 496640 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:31:30,655-Speed 2630.59 samples/sec Loss 5.2885 LearningRate 0.0161 Epoch: 11 Global Step: 496650 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:31:34,554-Speed 2627.15 samples/sec Loss 5.2476 LearningRate 0.0161 Epoch: 11 Global Step: 496660 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:31:38,449-Speed 2629.83 samples/sec Loss 5.4630 LearningRate 0.0161 Epoch: 11 Global Step: 496670 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:31:42,340-Speed 2632.81 samples/sec Loss 5.4495 LearningRate 0.0161 Epoch: 11 Global Step: 496680 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:31:46,232-Speed 2631.28 samples/sec Loss 5.4104 LearningRate 0.0161 Epoch: 11 Global Step: 496690 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:31:50,130-Speed 2627.15 samples/sec Loss 5.3744 LearningRate 0.0161 Epoch: 11 Global Step: 496700 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:31:54,032-Speed 2625.23 samples/sec Loss 5.4864 LearningRate 0.0161 Epoch: 11 Global Step: 496710 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:31:57,910-Speed 2641.69 samples/sec Loss 5.5087 LearningRate 0.0161 Epoch: 11 Global Step: 496720 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:32:01,810-Speed 2626.27 samples/sec Loss 5.2800 LearningRate 0.0161 Epoch: 11 Global Step: 496730 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:32:05,707-Speed 2628.66 samples/sec Loss 5.3941 LearningRate 0.0161 Epoch: 11 Global Step: 496740 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:32:09,637-Speed 2606.06 samples/sec Loss 5.4796 LearningRate 0.0161 Epoch: 11 Global Step: 496750 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:32:13,535-Speed 2627.53 samples/sec Loss 5.2480 LearningRate 0.0161 Epoch: 11 Global Step: 496760 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:32:17,435-Speed 2626.71 samples/sec Loss 5.3403 LearningRate 0.0161 Epoch: 11 Global Step: 496770 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:32:21,332-Speed 2630.51 samples/sec Loss 5.2947 LearningRate 0.0161 Epoch: 11 Global Step: 496780 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:32:25,224-Speed 2631.22 samples/sec Loss 5.3268 LearningRate 0.0161 Epoch: 11 Global Step: 496790 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:32:29,122-Speed 2628.03 samples/sec Loss 5.4071 LearningRate 0.0161 Epoch: 11 Global Step: 496800 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:32:33,019-Speed 2628.06 samples/sec Loss 5.4734 LearningRate 0.0161 Epoch: 11 Global Step: 496810 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:32:36,914-Speed 2630.08 samples/sec Loss 5.4098 LearningRate 0.0161 Epoch: 11 Global Step: 496820 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:32:40,811-Speed 2628.22 samples/sec Loss 5.3394 LearningRate 0.0161 Epoch: 11 Global Step: 496830 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:32:44,751-Speed 2599.83 samples/sec Loss 5.3409 LearningRate 0.0161 Epoch: 11 Global Step: 496840 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:32:48,668-Speed 2614.72 samples/sec Loss 5.2672 LearningRate 0.0161 Epoch: 11 Global Step: 496850 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:32:52,577-Speed 2620.36 samples/sec Loss 5.3790 LearningRate 0.0161 Epoch: 11 Global Step: 496860 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:32:56,476-Speed 2626.83 samples/sec Loss 5.3539 LearningRate 0.0161 Epoch: 11 Global Step: 496870 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:33:00,370-Speed 2630.38 samples/sec Loss 5.3072 LearningRate 0.0161 Epoch: 11 Global Step: 496880 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:33:04,277-Speed 2621.71 samples/sec Loss 5.3635 LearningRate 0.0161 Epoch: 11 Global Step: 496890 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:33:08,172-Speed 2630.12 samples/sec Loss 5.3581 LearningRate 0.0161 Epoch: 11 Global Step: 496900 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:33:12,064-Speed 2631.50 samples/sec Loss 5.4317 LearningRate 0.0161 Epoch: 11 Global Step: 496910 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:33:15,938-Speed 2643.96 samples/sec Loss 5.3521 LearningRate 0.0161 Epoch: 11 Global Step: 496920 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:33:19,845-Speed 2622.04 samples/sec Loss 5.3570 LearningRate 0.0161 Epoch: 11 Global Step: 496930 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:33:23,738-Speed 2630.79 samples/sec Loss 5.2760 LearningRate 0.0161 Epoch: 11 Global Step: 496940 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:33:27,637-Speed 2626.79 samples/sec Loss 5.3147 LearningRate 0.0161 Epoch: 11 Global Step: 496950 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:33:31,534-Speed 2628.62 samples/sec Loss 5.4243 LearningRate 0.0161 Epoch: 11 Global Step: 496960 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:33:35,407-Speed 2644.62 samples/sec Loss 5.4060 LearningRate 0.0161 Epoch: 11 Global Step: 496970 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:33:39,305-Speed 2627.16 samples/sec Loss 5.3330 LearningRate 0.0161 Epoch: 11 Global Step: 496980 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:33:43,202-Speed 2628.80 samples/sec Loss 5.3830 LearningRate 0.0161 Epoch: 11 Global Step: 496990 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:33:47,102-Speed 2626.28 samples/sec Loss 5.4882 LearningRate 0.0161 Epoch: 11 Global Step: 497000 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:33:51,007-Speed 2623.19 samples/sec Loss 5.4503 LearningRate 0.0161 Epoch: 11 Global Step: 497010 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:33:54,902-Speed 2629.57 samples/sec Loss 5.3069 LearningRate 0.0161 Epoch: 11 Global Step: 497020 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:33:58,818-Speed 2616.48 samples/sec Loss 5.3900 LearningRate 0.0161 Epoch: 11 Global Step: 497030 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:34:02,737-Speed 2613.23 samples/sec Loss 5.4118 LearningRate 0.0161 Epoch: 11 Global Step: 497040 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:34:06,633-Speed 2629.35 samples/sec Loss 5.2966 LearningRate 0.0161 Epoch: 11 Global Step: 497050 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:34:10,529-Speed 2628.88 samples/sec Loss 5.3809 LearningRate 0.0161 Epoch: 11 Global Step: 497060 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:34:14,425-Speed 2629.52 samples/sec Loss 5.3830 LearningRate 0.0161 Epoch: 11 Global Step: 497070 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:34:18,333-Speed 2621.31 samples/sec Loss 5.4053 LearningRate 0.0161 Epoch: 11 Global Step: 497080 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:34:22,227-Speed 2630.05 samples/sec Loss 5.3159 LearningRate 0.0161 Epoch: 11 Global Step: 497090 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:34:26,127-Speed 2627.27 samples/sec Loss 5.3945 LearningRate 0.0161 Epoch: 11 Global Step: 497100 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:34:30,044-Speed 2614.96 samples/sec Loss 5.3073 LearningRate 0.0161 Epoch: 11 Global Step: 497110 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:34:33,956-Speed 2618.43 samples/sec Loss 5.4003 LearningRate 0.0161 Epoch: 11 Global Step: 497120 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:34:37,968-Speed 2552.72 samples/sec Loss 5.3688 LearningRate 0.0161 Epoch: 11 Global Step: 497130 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:34:42,061-Speed 2502.78 samples/sec Loss 5.3449 LearningRate 0.0161 Epoch: 11 Global Step: 497140 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:34:46,156-Speed 2501.07 samples/sec Loss 5.3631 LearningRate 0.0161 Epoch: 11 Global Step: 497150 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:34:50,063-Speed 2622.08 samples/sec Loss 5.3752 LearningRate 0.0161 Epoch: 11 Global Step: 497160 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:34:53,959-Speed 2629.01 samples/sec Loss 5.4163 LearningRate 0.0161 Epoch: 11 Global Step: 497170 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:34:57,855-Speed 2629.30 samples/sec Loss 5.3107 LearningRate 0.0161 Epoch: 11 Global Step: 497180 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:35:01,750-Speed 2629.96 samples/sec Loss 5.3728 LearningRate 0.0161 Epoch: 11 Global Step: 497190 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:35:05,645-Speed 2629.15 samples/sec Loss 5.4128 LearningRate 0.0161 Epoch: 11 Global Step: 497200 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:35:09,550-Speed 2622.79 samples/sec Loss 5.3963 LearningRate 0.0161 Epoch: 11 Global Step: 497210 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:35:13,459-Speed 2620.81 samples/sec Loss 5.4702 LearningRate 0.0161 Epoch: 11 Global Step: 497220 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:35:17,355-Speed 2628.93 samples/sec Loss 5.2847 LearningRate 0.0160 Epoch: 11 Global Step: 497230 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:35:21,250-Speed 2630.00 samples/sec Loss 5.3402 LearningRate 0.0160 Epoch: 11 Global Step: 497240 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:35:25,154-Speed 2623.59 samples/sec Loss 5.4060 LearningRate 0.0160 Epoch: 11 Global Step: 497250 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:35:29,050-Speed 2629.73 samples/sec Loss 5.3935 LearningRate 0.0160 Epoch: 11 Global Step: 497260 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:35:33,037-Speed 2569.08 samples/sec Loss 5.1841 LearningRate 0.0160 Epoch: 11 Global Step: 497270 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:35:36,938-Speed 2625.43 samples/sec Loss 5.3888 LearningRate 0.0160 Epoch: 11 Global Step: 497280 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:35:40,837-Speed 2627.13 samples/sec Loss 5.4054 LearningRate 0.0160 Epoch: 11 Global Step: 497290 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:35:44,718-Speed 2639.84 samples/sec Loss 5.3671 LearningRate 0.0160 Epoch: 11 Global Step: 497300 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:35:48,613-Speed 2629.45 samples/sec Loss 5.3908 LearningRate 0.0160 Epoch: 11 Global Step: 497310 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:35:52,517-Speed 2623.52 samples/sec Loss 5.5381 LearningRate 0.0160 Epoch: 11 Global Step: 497320 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:35:56,427-Speed 2619.76 samples/sec Loss 5.3124 LearningRate 0.0160 Epoch: 11 Global Step: 497330 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:36:00,350-Speed 2610.74 samples/sec Loss 5.4081 LearningRate 0.0160 Epoch: 11 Global Step: 497340 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:36:04,253-Speed 2624.70 samples/sec Loss 5.3247 LearningRate 0.0160 Epoch: 11 Global Step: 497350 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:36:08,152-Speed 2626.89 samples/sec Loss 5.4551 LearningRate 0.0160 Epoch: 11 Global Step: 497360 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:36:12,093-Speed 2598.65 samples/sec Loss 5.3166 LearningRate 0.0160 Epoch: 11 Global Step: 497370 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:36:15,988-Speed 2630.11 samples/sec Loss 5.4808 LearningRate 0.0160 Epoch: 11 Global Step: 497380 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:36:19,887-Speed 2626.74 samples/sec Loss 5.4047 LearningRate 0.0160 Epoch: 11 Global Step: 497390 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:36:23,873-Speed 2570.26 samples/sec Loss 5.3428 LearningRate 0.0160 Epoch: 11 Global Step: 497400 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:36:27,767-Speed 2630.02 samples/sec Loss 5.4875 LearningRate 0.0160 Epoch: 11 Global Step: 497410 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:36:31,690-Speed 2611.08 samples/sec Loss 5.4134 LearningRate 0.0160 Epoch: 11 Global Step: 497420 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:36:35,638-Speed 2594.62 samples/sec Loss 5.3209 LearningRate 0.0160 Epoch: 11 Global Step: 497430 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:36:39,537-Speed 2627.45 samples/sec Loss 5.3739 LearningRate 0.0160 Epoch: 11 Global Step: 497440 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:36:43,440-Speed 2624.20 samples/sec Loss 5.4131 LearningRate 0.0160 Epoch: 11 Global Step: 497450 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:36:47,347-Speed 2621.44 samples/sec Loss 5.3682 LearningRate 0.0160 Epoch: 11 Global Step: 497460 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:36:51,243-Speed 2628.41 samples/sec Loss 5.3813 LearningRate 0.0160 Epoch: 11 Global Step: 497470 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:36:55,136-Speed 2630.96 samples/sec Loss 5.3965 LearningRate 0.0160 Epoch: 11 Global Step: 497480 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:36:59,033-Speed 2629.15 samples/sec Loss 5.3628 LearningRate 0.0160 Epoch: 11 Global Step: 497490 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:37:02,908-Speed 2642.64 samples/sec Loss 5.4324 LearningRate 0.0160 Epoch: 11 Global Step: 497500 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:37:06,803-Speed 2629.82 samples/sec Loss 5.3542 LearningRate 0.0160 Epoch: 11 Global Step: 497510 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:37:10,704-Speed 2625.62 samples/sec Loss 5.3702 LearningRate 0.0160 Epoch: 11 Global Step: 497520 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:37:14,602-Speed 2627.63 samples/sec Loss 5.3821 LearningRate 0.0160 Epoch: 11 Global Step: 497530 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:37:18,506-Speed 2623.13 samples/sec Loss 5.3860 LearningRate 0.0160 Epoch: 11 Global Step: 497540 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:37:22,404-Speed 2627.96 samples/sec Loss 5.3911 LearningRate 0.0160 Epoch: 11 Global Step: 497550 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:37:26,294-Speed 2633.26 samples/sec Loss 5.3828 LearningRate 0.0160 Epoch: 11 Global Step: 497560 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:37:30,218-Speed 2610.24 samples/sec Loss 5.2587 LearningRate 0.0160 Epoch: 11 Global Step: 497570 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:37:34,112-Speed 2630.72 samples/sec Loss 5.3717 LearningRate 0.0160 Epoch: 11 Global Step: 497580 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:37:38,006-Speed 2630.02 samples/sec Loss 5.3860 LearningRate 0.0160 Epoch: 11 Global Step: 497590 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:37:41,885-Speed 2640.78 samples/sec Loss 5.3798 LearningRate 0.0160 Epoch: 11 Global Step: 497600 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:37:45,799-Speed 2616.19 samples/sec Loss 5.4253 LearningRate 0.0160 Epoch: 11 Global Step: 497610 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:37:49,698-Speed 2627.32 samples/sec Loss 5.3404 LearningRate 0.0160 Epoch: 11 Global Step: 497620 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:37:53,585-Speed 2635.10 samples/sec Loss 5.3727 LearningRate 0.0160 Epoch: 11 Global Step: 497630 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:37:57,513-Speed 2607.21 samples/sec Loss 5.3260 LearningRate 0.0160 Epoch: 11 Global Step: 497640 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:38:01,408-Speed 2629.71 samples/sec Loss 5.3501 LearningRate 0.0160 Epoch: 11 Global Step: 497650 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:38:05,306-Speed 2628.11 samples/sec Loss 5.3638 LearningRate 0.0160 Epoch: 11 Global Step: 497660 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:38:09,202-Speed 2628.96 samples/sec Loss 5.4628 LearningRate 0.0160 Epoch: 11 Global Step: 497670 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:38:13,099-Speed 2628.01 samples/sec Loss 5.3558 LearningRate 0.0160 Epoch: 11 Global Step: 497680 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:38:16,994-Speed 2629.49 samples/sec Loss 5.3797 LearningRate 0.0160 Epoch: 11 Global Step: 497690 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:38:20,899-Speed 2622.85 samples/sec Loss 5.3952 LearningRate 0.0160 Epoch: 11 Global Step: 497700 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:38:24,796-Speed 2628.88 samples/sec Loss 5.4578 LearningRate 0.0160 Epoch: 11 Global Step: 497710 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:38:28,696-Speed 2627.13 samples/sec Loss 5.3798 LearningRate 0.0160 Epoch: 11 Global Step: 497720 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:38:32,595-Speed 2626.24 samples/sec Loss 5.3177 LearningRate 0.0160 Epoch: 11 Global Step: 497730 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:38:36,491-Speed 2629.01 samples/sec Loss 5.3411 LearningRate 0.0160 Epoch: 11 Global Step: 497740 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:38:57,656-Speed 483.85 samples/sec Loss 5.4467 LearningRate 0.0160 Epoch: 12 Global Step: 497750 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:39:01,540-Speed 2637.87 samples/sec Loss 5.4073 LearningRate 0.0160 Epoch: 12 Global Step: 497760 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:39:05,426-Speed 2635.70 samples/sec Loss 5.3457 LearningRate 0.0160 Epoch: 12 Global Step: 497770 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:39:09,315-Speed 2634.17 samples/sec Loss 5.3947 LearningRate 0.0160 Epoch: 12 Global Step: 497780 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:39:13,209-Speed 2630.17 samples/sec Loss 5.3839 LearningRate 0.0160 Epoch: 12 Global Step: 497790 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:39:17,135-Speed 2609.56 samples/sec Loss 5.3524 LearningRate 0.0160 Epoch: 12 Global Step: 497800 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:39:21,029-Speed 2630.01 samples/sec Loss 5.3134 LearningRate 0.0160 Epoch: 12 Global Step: 497810 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:39:24,927-Speed 2628.29 samples/sec Loss 5.2976 LearningRate 0.0160 Epoch: 12 Global Step: 497820 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:39:28,823-Speed 2629.05 samples/sec Loss 5.3161 LearningRate 0.0160 Epoch: 12 Global Step: 497830 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:39:32,722-Speed 2626.79 samples/sec Loss 5.3004 LearningRate 0.0160 Epoch: 12 Global Step: 497840 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:39:36,621-Speed 2626.65 samples/sec Loss 5.3595 LearningRate 0.0160 Epoch: 12 Global Step: 497850 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:39:40,519-Speed 2627.54 samples/sec Loss 5.3078 LearningRate 0.0160 Epoch: 12 Global Step: 497860 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:39:44,425-Speed 2622.37 samples/sec Loss 5.3237 LearningRate 0.0160 Epoch: 12 Global Step: 497870 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:39:48,324-Speed 2626.73 samples/sec Loss 5.3184 LearningRate 0.0160 Epoch: 12 Global Step: 497880 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:39:52,229-Speed 2623.42 samples/sec Loss 5.4178 LearningRate 0.0160 Epoch: 12 Global Step: 497890 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:39:56,129-Speed 2626.38 samples/sec Loss 5.3031 LearningRate 0.0160 Epoch: 12 Global Step: 497900 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:40:00,032-Speed 2624.88 samples/sec Loss 5.3847 LearningRate 0.0160 Epoch: 12 Global Step: 497910 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:40:03,937-Speed 2622.54 samples/sec Loss 5.3713 LearningRate 0.0160 Epoch: 12 Global Step: 497920 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:40:07,838-Speed 2626.02 samples/sec Loss 5.3509 LearningRate 0.0160 Epoch: 12 Global Step: 497930 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:40:11,715-Speed 2641.59 samples/sec Loss 5.3225 LearningRate 0.0160 Epoch: 12 Global Step: 497940 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:40:15,615-Speed 2625.61 samples/sec Loss 5.3189 LearningRate 0.0160 Epoch: 12 Global Step: 497950 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:40:19,516-Speed 2625.78 samples/sec Loss 5.3564 LearningRate 0.0160 Epoch: 12 Global Step: 497960 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:40:23,412-Speed 2629.21 samples/sec Loss 5.4004 LearningRate 0.0160 Epoch: 12 Global Step: 497970 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:40:27,312-Speed 2626.60 samples/sec Loss 5.4554 LearningRate 0.0160 Epoch: 12 Global Step: 497980 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:40:31,211-Speed 2626.79 samples/sec Loss 5.4230 LearningRate 0.0160 Epoch: 12 Global Step: 497990 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:40:35,106-Speed 2629.22 samples/sec Loss 5.2969 LearningRate 0.0160 Epoch: 12 Global Step: 498000 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:40:39,008-Speed 2624.88 samples/sec Loss 5.3382 LearningRate 0.0160 Epoch: 12 Global Step: 498010 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:40:42,904-Speed 2629.50 samples/sec Loss 5.4547 LearningRate 0.0160 Epoch: 12 Global Step: 498020 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:40:46,814-Speed 2619.32 samples/sec Loss 5.4167 LearningRate 0.0160 Epoch: 12 Global Step: 498030 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:40:50,711-Speed 2628.89 samples/sec Loss 5.4746 LearningRate 0.0160 Epoch: 12 Global Step: 498040 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:40:54,608-Speed 2628.40 samples/sec Loss 5.2940 LearningRate 0.0160 Epoch: 12 Global Step: 498050 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:40:58,505-Speed 2628.76 samples/sec Loss 5.2615 LearningRate 0.0160 Epoch: 12 Global Step: 498060 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:41:02,407-Speed 2625.25 samples/sec Loss 5.4027 LearningRate 0.0160 Epoch: 12 Global Step: 498070 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:41:06,313-Speed 2622.20 samples/sec Loss 5.3781 LearningRate 0.0160 Epoch: 12 Global Step: 498080 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:41:10,215-Speed 2624.66 samples/sec Loss 5.3980 LearningRate 0.0160 Epoch: 12 Global Step: 498090 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:41:14,127-Speed 2618.50 samples/sec Loss 5.3660 LearningRate 0.0160 Epoch: 12 Global Step: 498100 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:41:18,028-Speed 2625.11 samples/sec Loss 5.3836 LearningRate 0.0160 Epoch: 12 Global Step: 498110 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:41:21,909-Speed 2639.93 samples/sec Loss 5.2542 LearningRate 0.0160 Epoch: 12 Global Step: 498120 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:41:25,825-Speed 2615.44 samples/sec Loss 5.3282 LearningRate 0.0160 Epoch: 12 Global Step: 498130 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:41:29,724-Speed 2627.06 samples/sec Loss 5.3424 LearningRate 0.0160 Epoch: 12 Global Step: 498140 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:41:33,619-Speed 2629.14 samples/sec Loss 5.2192 LearningRate 0.0160 Epoch: 12 Global Step: 498150 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:41:37,523-Speed 2623.57 samples/sec Loss 5.3583 LearningRate 0.0160 Epoch: 12 Global Step: 498160 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:41:41,420-Speed 2627.82 samples/sec Loss 5.3552 LearningRate 0.0160 Epoch: 12 Global Step: 498170 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:41:45,315-Speed 2629.79 samples/sec Loss 5.3467 LearningRate 0.0160 Epoch: 12 Global Step: 498180 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:41:49,217-Speed 2625.03 samples/sec Loss 5.3667 LearningRate 0.0160 Epoch: 12 Global Step: 498190 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:41:53,116-Speed 2627.03 samples/sec Loss 5.3706 LearningRate 0.0160 Epoch: 12 Global Step: 498200 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:41:57,011-Speed 2630.44 samples/sec Loss 5.2925 LearningRate 0.0160 Epoch: 12 Global Step: 498210 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:42:00,909-Speed 2627.71 samples/sec Loss 5.3389 LearningRate 0.0160 Epoch: 12 Global Step: 498220 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:42:04,835-Speed 2608.75 samples/sec Loss 5.2739 LearningRate 0.0160 Epoch: 12 Global Step: 498230 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:42:08,744-Speed 2619.62 samples/sec Loss 5.3711 LearningRate 0.0160 Epoch: 12 Global Step: 498240 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:42:12,650-Speed 2622.55 samples/sec Loss 5.2709 LearningRate 0.0160 Epoch: 12 Global Step: 498250 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:42:16,547-Speed 2628.06 samples/sec Loss 5.3037 LearningRate 0.0160 Epoch: 12 Global Step: 498260 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:42:20,457-Speed 2620.04 samples/sec Loss 5.2933 LearningRate 0.0159 Epoch: 12 Global Step: 498270 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:42:24,418-Speed 2585.46 samples/sec Loss 5.3058 LearningRate 0.0159 Epoch: 12 Global Step: 498280 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:42:28,446-Speed 2542.60 samples/sec Loss 5.2487 LearningRate 0.0159 Epoch: 12 Global Step: 498290 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:42:32,352-Speed 2622.95 samples/sec Loss 5.4399 LearningRate 0.0159 Epoch: 12 Global Step: 498300 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:42:36,253-Speed 2625.30 samples/sec Loss 5.2930 LearningRate 0.0159 Epoch: 12 Global Step: 498310 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:42:40,136-Speed 2637.83 samples/sec Loss 5.3762 LearningRate 0.0159 Epoch: 12 Global Step: 498320 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:42:44,048-Speed 2618.69 samples/sec Loss 5.4075 LearningRate 0.0159 Epoch: 12 Global Step: 498330 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:42:47,957-Speed 2619.58 samples/sec Loss 5.3238 LearningRate 0.0159 Epoch: 12 Global Step: 498340 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:42:51,854-Speed 2628.35 samples/sec Loss 5.3643 LearningRate 0.0159 Epoch: 12 Global Step: 498350 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:42:55,755-Speed 2625.49 samples/sec Loss 5.2693 LearningRate 0.0159 Epoch: 12 Global Step: 498360 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:42:59,629-Speed 2644.01 samples/sec Loss 5.3334 LearningRate 0.0159 Epoch: 12 Global Step: 498370 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:43:03,528-Speed 2626.64 samples/sec Loss 5.3624 LearningRate 0.0159 Epoch: 12 Global Step: 498380 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:43:07,426-Speed 2628.26 samples/sec Loss 5.2648 LearningRate 0.0159 Epoch: 12 Global Step: 498390 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:43:11,323-Speed 2628.21 samples/sec Loss 5.1948 LearningRate 0.0159 Epoch: 12 Global Step: 498400 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:43:15,218-Speed 2629.69 samples/sec Loss 5.2908 LearningRate 0.0159 Epoch: 12 Global Step: 498410 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:43:19,194-Speed 2575.87 samples/sec Loss 5.3265 LearningRate 0.0159 Epoch: 12 Global Step: 498420 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:43:23,096-Speed 2625.31 samples/sec Loss 5.4585 LearningRate 0.0159 Epoch: 12 Global Step: 498430 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:43:26,998-Speed 2624.79 samples/sec Loss 5.3714 LearningRate 0.0159 Epoch: 12 Global Step: 498440 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:43:30,894-Speed 2628.38 samples/sec Loss 5.2431 LearningRate 0.0159 Epoch: 12 Global Step: 498450 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:43:34,796-Speed 2625.17 samples/sec Loss 5.4096 LearningRate 0.0159 Epoch: 12 Global Step: 498460 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:43:38,717-Speed 2612.28 samples/sec Loss 5.2887 LearningRate 0.0159 Epoch: 12 Global Step: 498470 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:43:42,617-Speed 2625.87 samples/sec Loss 5.4221 LearningRate 0.0159 Epoch: 12 Global Step: 498480 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:43:46,514-Speed 2628.47 samples/sec Loss 5.4061 LearningRate 0.0159 Epoch: 12 Global Step: 498490 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:43:50,415-Speed 2625.55 samples/sec Loss 5.1827 LearningRate 0.0159 Epoch: 12 Global Step: 498500 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:43:54,313-Speed 2628.06 samples/sec Loss 5.2694 LearningRate 0.0159 Epoch: 12 Global Step: 498510 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:43:58,214-Speed 2625.18 samples/sec Loss 5.3008 LearningRate 0.0159 Epoch: 12 Global Step: 498520 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:44:02,113-Speed 2627.36 samples/sec Loss 5.3744 LearningRate 0.0159 Epoch: 12 Global Step: 498530 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:44:06,009-Speed 2628.52 samples/sec Loss 5.3260 LearningRate 0.0159 Epoch: 12 Global Step: 498540 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:44:09,908-Speed 2626.88 samples/sec Loss 5.3162 LearningRate 0.0159 Epoch: 12 Global Step: 498550 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:44:13,819-Speed 2618.74 samples/sec Loss 5.3172 LearningRate 0.0159 Epoch: 12 Global Step: 498560 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:44:17,695-Speed 2642.88 samples/sec Loss 5.3740 LearningRate 0.0159 Epoch: 12 Global Step: 498570 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:44:21,591-Speed 2629.04 samples/sec Loss 5.4633 LearningRate 0.0159 Epoch: 12 Global Step: 498580 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:44:25,491-Speed 2626.61 samples/sec Loss 5.3582 LearningRate 0.0159 Epoch: 12 Global Step: 498590 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:44:29,388-Speed 2628.13 samples/sec Loss 5.3418 LearningRate 0.0159 Epoch: 12 Global Step: 498600 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:44:33,291-Speed 2624.20 samples/sec Loss 5.2913 LearningRate 0.0159 Epoch: 12 Global Step: 498610 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:44:37,190-Speed 2626.51 samples/sec Loss 5.4546 LearningRate 0.0159 Epoch: 12 Global Step: 498620 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:44:41,065-Speed 2643.40 samples/sec Loss 5.4459 LearningRate 0.0159 Epoch: 12 Global Step: 498630 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:44:44,974-Speed 2620.48 samples/sec Loss 5.4271 LearningRate 0.0159 Epoch: 12 Global Step: 498640 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:44:48,874-Speed 2625.77 samples/sec Loss 5.3102 LearningRate 0.0159 Epoch: 12 Global Step: 498650 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:44:52,770-Speed 2629.88 samples/sec Loss 5.3189 LearningRate 0.0159 Epoch: 12 Global Step: 498660 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:44:56,665-Speed 2629.69 samples/sec Loss 5.4048 LearningRate 0.0159 Epoch: 12 Global Step: 498670 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:45:00,560-Speed 2629.61 samples/sec Loss 5.4805 LearningRate 0.0159 Epoch: 12 Global Step: 498680 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:45:04,470-Speed 2619.00 samples/sec Loss 5.3045 LearningRate 0.0159 Epoch: 12 Global Step: 498690 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:45:08,378-Speed 2621.20 samples/sec Loss 5.3140 LearningRate 0.0159 Epoch: 12 Global Step: 498700 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:45:12,325-Speed 2594.75 samples/sec Loss 5.2529 LearningRate 0.0159 Epoch: 12 Global Step: 498710 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:45:16,226-Speed 2625.33 samples/sec Loss 5.3381 LearningRate 0.0159 Epoch: 12 Global Step: 498720 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:45:20,138-Speed 2618.75 samples/sec Loss 5.3556 LearningRate 0.0159 Epoch: 12 Global Step: 498730 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:45:24,050-Speed 2617.64 samples/sec Loss 5.3424 LearningRate 0.0159 Epoch: 12 Global Step: 498740 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:45:27,946-Speed 2629.67 samples/sec Loss 5.4428 LearningRate 0.0159 Epoch: 12 Global Step: 498750 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:45:31,844-Speed 2627.40 samples/sec Loss 5.4170 LearningRate 0.0159 Epoch: 12 Global Step: 498760 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:45:35,743-Speed 2626.98 samples/sec Loss 5.3409 LearningRate 0.0159 Epoch: 12 Global Step: 498770 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:45:39,641-Speed 2627.34 samples/sec Loss 5.3274 LearningRate 0.0159 Epoch: 12 Global Step: 498780 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:45:43,538-Speed 2628.34 samples/sec Loss 5.2713 LearningRate 0.0159 Epoch: 12 Global Step: 498790 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:45:47,436-Speed 2627.43 samples/sec Loss 5.3723 LearningRate 0.0159 Epoch: 12 Global Step: 498800 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:45:51,334-Speed 2627.93 samples/sec Loss 5.4322 LearningRate 0.0159 Epoch: 12 Global Step: 498810 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:45:55,235-Speed 2625.31 samples/sec Loss 5.3109 LearningRate 0.0159 Epoch: 12 Global Step: 498820 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:45:59,116-Speed 2639.19 samples/sec Loss 5.2711 LearningRate 0.0159 Epoch: 12 Global Step: 498830 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:46:03,012-Speed 2629.04 samples/sec Loss 5.3615 LearningRate 0.0159 Epoch: 12 Global Step: 498840 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:46:06,923-Speed 2618.53 samples/sec Loss 5.2216 LearningRate 0.0159 Epoch: 12 Global Step: 498850 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:46:10,820-Speed 2628.20 samples/sec Loss 5.3153 LearningRate 0.0159 Epoch: 12 Global Step: 498860 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:46:14,701-Speed 2640.09 samples/sec Loss 5.3015 LearningRate 0.0159 Epoch: 12 Global Step: 498870 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:46:18,597-Speed 2629.34 samples/sec Loss 5.3868 LearningRate 0.0159 Epoch: 12 Global Step: 498880 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:46:22,499-Speed 2624.68 samples/sec Loss 5.2417 LearningRate 0.0159 Epoch: 12 Global Step: 498890 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:46:26,395-Speed 2628.88 samples/sec Loss 5.4504 LearningRate 0.0159 Epoch: 12 Global Step: 498900 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:46:30,291-Speed 2629.44 samples/sec Loss 5.3224 LearningRate 0.0159 Epoch: 12 Global Step: 498910 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:46:34,187-Speed 2628.57 samples/sec Loss 5.3016 LearningRate 0.0159 Epoch: 12 Global Step: 498920 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:46:38,079-Speed 2631.87 samples/sec Loss 5.2120 LearningRate 0.0159 Epoch: 12 Global Step: 498930 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:46:41,980-Speed 2625.47 samples/sec Loss 5.2740 LearningRate 0.0159 Epoch: 12 Global Step: 498940 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:46:45,880-Speed 2625.81 samples/sec Loss 5.3002 LearningRate 0.0159 Epoch: 12 Global Step: 498950 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:46:49,799-Speed 2614.24 samples/sec Loss 5.2410 LearningRate 0.0159 Epoch: 12 Global Step: 498960 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:46:53,672-Speed 2644.30 samples/sec Loss 5.2726 LearningRate 0.0159 Epoch: 12 Global Step: 498970 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:46:57,566-Speed 2631.08 samples/sec Loss 5.3937 LearningRate 0.0159 Epoch: 12 Global Step: 498980 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:47:01,461-Speed 2628.92 samples/sec Loss 5.3191 LearningRate 0.0159 Epoch: 12 Global Step: 498990 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:47:05,360-Speed 2627.00 samples/sec Loss 5.2998 LearningRate 0.0159 Epoch: 12 Global Step: 499000 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:47:09,254-Speed 2630.23 samples/sec Loss 5.3490 LearningRate 0.0159 Epoch: 12 Global Step: 499010 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:47:13,146-Speed 2631.38 samples/sec Loss 5.3374 LearningRate 0.0159 Epoch: 12 Global Step: 499020 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:47:17,049-Speed 2624.47 samples/sec Loss 5.3735 LearningRate 0.0159 Epoch: 12 Global Step: 499030 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:47:20,942-Speed 2630.65 samples/sec Loss 5.4016 LearningRate 0.0159 Epoch: 12 Global Step: 499040 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:47:24,834-Speed 2631.58 samples/sec Loss 5.3590 LearningRate 0.0159 Epoch: 12 Global Step: 499050 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:47:28,708-Speed 2644.42 samples/sec Loss 5.3195 LearningRate 0.0159 Epoch: 12 Global Step: 499060 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 03:47:32,611-Speed 2624.70 samples/sec Loss 5.2758 LearningRate 0.0159 Epoch: 12 Global Step: 499070 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 03:47:36,506-Speed 2629.44 samples/sec Loss 5.4518 LearningRate 0.0159 Epoch: 12 Global Step: 499080 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 03:47:40,417-Speed 2618.57 samples/sec Loss 5.2740 LearningRate 0.0159 Epoch: 12 Global Step: 499090 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 03:47:44,315-Speed 2627.54 samples/sec Loss 5.3367 LearningRate 0.0159 Epoch: 12 Global Step: 499100 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 03:47:48,214-Speed 2627.01 samples/sec Loss 5.3480 LearningRate 0.0159 Epoch: 12 Global Step: 499110 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 03:47:52,116-Speed 2625.03 samples/sec Loss 5.2683 LearningRate 0.0159 Epoch: 12 Global Step: 499120 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 03:47:56,015-Speed 2627.36 samples/sec Loss 5.4309 LearningRate 0.0159 Epoch: 12 Global Step: 499130 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 03:47:59,913-Speed 2628.02 samples/sec Loss 5.2501 LearningRate 0.0159 Epoch: 12 Global Step: 499140 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 03:48:03,819-Speed 2622.24 samples/sec Loss 5.2400 LearningRate 0.0159 Epoch: 12 Global Step: 499150 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 03:48:07,722-Speed 2624.14 samples/sec Loss 5.4322 LearningRate 0.0159 Epoch: 12 Global Step: 499160 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:48:11,621-Speed 2626.72 samples/sec Loss 5.3753 LearningRate 0.0159 Epoch: 12 Global Step: 499170 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:48:15,529-Speed 2621.53 samples/sec Loss 5.3798 LearningRate 0.0159 Epoch: 12 Global Step: 499180 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:48:19,436-Speed 2621.85 samples/sec Loss 5.3713 LearningRate 0.0159 Epoch: 12 Global Step: 499190 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:48:23,332-Speed 2628.76 samples/sec Loss 5.2775 LearningRate 0.0159 Epoch: 12 Global Step: 499200 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:48:27,230-Speed 2627.20 samples/sec Loss 5.2985 LearningRate 0.0159 Epoch: 12 Global Step: 499210 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:48:31,128-Speed 2628.51 samples/sec Loss 5.3325 LearningRate 0.0159 Epoch: 12 Global Step: 499220 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:48:35,025-Speed 2628.74 samples/sec Loss 5.3088 LearningRate 0.0159 Epoch: 12 Global Step: 499230 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:48:38,922-Speed 2627.83 samples/sec Loss 5.4977 LearningRate 0.0159 Epoch: 12 Global Step: 499240 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:48:42,818-Speed 2628.60 samples/sec Loss 5.4297 LearningRate 0.0159 Epoch: 12 Global Step: 499250 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:48:46,717-Speed 2628.14 samples/sec Loss 5.2396 LearningRate 0.0159 Epoch: 12 Global Step: 499260 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:48:50,610-Speed 2630.39 samples/sec Loss 5.3921 LearningRate 0.0159 Epoch: 12 Global Step: 499270 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:48:54,525-Speed 2616.84 samples/sec Loss 5.3866 LearningRate 0.0159 Epoch: 12 Global Step: 499280 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:48:58,438-Speed 2617.39 samples/sec Loss 5.3564 LearningRate 0.0159 Epoch: 12 Global Step: 499290 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:49:02,338-Speed 2626.55 samples/sec Loss 5.3167 LearningRate 0.0159 Epoch: 12 Global Step: 499300 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:49:06,233-Speed 2629.51 samples/sec Loss 5.4037 LearningRate 0.0158 Epoch: 12 Global Step: 499310 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:49:10,135-Speed 2624.70 samples/sec Loss 5.3811 LearningRate 0.0158 Epoch: 12 Global Step: 499320 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:49:14,010-Speed 2643.47 samples/sec Loss 5.2465 LearningRate 0.0158 Epoch: 12 Global Step: 499330 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:49:17,908-Speed 2628.36 samples/sec Loss 5.3632 LearningRate 0.0158 Epoch: 12 Global Step: 499340 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:49:21,806-Speed 2627.59 samples/sec Loss 5.3509 LearningRate 0.0158 Epoch: 12 Global Step: 499350 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:49:25,714-Speed 2621.46 samples/sec Loss 5.3674 LearningRate 0.0158 Epoch: 12 Global Step: 499360 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:49:29,609-Speed 2629.49 samples/sec Loss 5.2723 LearningRate 0.0158 Epoch: 12 Global Step: 499370 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:49:33,509-Speed 2625.52 samples/sec Loss 5.3685 LearningRate 0.0158 Epoch: 12 Global Step: 499380 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:49:37,415-Speed 2622.15 samples/sec Loss 5.2581 LearningRate 0.0158 Epoch: 12 Global Step: 499390 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:49:41,312-Speed 2629.08 samples/sec Loss 5.4028 LearningRate 0.0158 Epoch: 12 Global Step: 499400 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:49:45,203-Speed 2632.21 samples/sec Loss 5.3917 LearningRate 0.0158 Epoch: 12 Global Step: 499410 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:49:49,101-Speed 2627.85 samples/sec Loss 5.3047 LearningRate 0.0158 Epoch: 12 Global Step: 499420 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:49:52,998-Speed 2628.37 samples/sec Loss 5.4078 LearningRate 0.0158 Epoch: 12 Global Step: 499430 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:49:56,872-Speed 2643.60 samples/sec Loss 5.2901 LearningRate 0.0158 Epoch: 12 Global Step: 499440 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:50:00,766-Speed 2630.20 samples/sec Loss 5.3884 LearningRate 0.0158 Epoch: 12 Global Step: 499450 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:50:04,675-Speed 2620.51 samples/sec Loss 5.3829 LearningRate 0.0158 Epoch: 12 Global Step: 499460 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:50:08,591-Speed 2615.85 samples/sec Loss 5.4026 LearningRate 0.0158 Epoch: 12 Global Step: 499470 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:50:12,493-Speed 2624.83 samples/sec Loss 5.3397 LearningRate 0.0158 Epoch: 12 Global Step: 499480 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:50:16,393-Speed 2626.76 samples/sec Loss 5.3942 LearningRate 0.0158 Epoch: 12 Global Step: 499490 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:50:20,329-Speed 2602.08 samples/sec Loss 5.3366 LearningRate 0.0158 Epoch: 12 Global Step: 499500 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:50:24,240-Speed 2619.09 samples/sec Loss 5.2172 LearningRate 0.0158 Epoch: 12 Global Step: 499510 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:50:28,138-Speed 2627.43 samples/sec Loss 5.2966 LearningRate 0.0158 Epoch: 12 Global Step: 499520 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:50:32,086-Speed 2594.34 samples/sec Loss 5.3132 LearningRate 0.0158 Epoch: 12 Global Step: 499530 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:50:35,998-Speed 2618.23 samples/sec Loss 5.3767 LearningRate 0.0158 Epoch: 12 Global Step: 499540 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:50:39,899-Speed 2625.48 samples/sec Loss 5.4346 LearningRate 0.0158 Epoch: 12 Global Step: 499550 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:50:43,779-Speed 2640.33 samples/sec Loss 5.3831 LearningRate 0.0158 Epoch: 12 Global Step: 499560 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:50:47,684-Speed 2622.97 samples/sec Loss 5.3872 LearningRate 0.0158 Epoch: 12 Global Step: 499570 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:50:51,581-Speed 2628.38 samples/sec Loss 5.3511 LearningRate 0.0158 Epoch: 12 Global Step: 499580 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:50:55,529-Speed 2594.66 samples/sec Loss 5.3492 LearningRate 0.0158 Epoch: 12 Global Step: 499590 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:50:59,438-Speed 2619.93 samples/sec Loss 5.4394 LearningRate 0.0158 Epoch: 12 Global Step: 499600 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:51:03,337-Speed 2627.28 samples/sec Loss 5.2793 LearningRate 0.0158 Epoch: 12 Global Step: 499610 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:51:07,261-Speed 2610.18 samples/sec Loss 5.3797 LearningRate 0.0158 Epoch: 12 Global Step: 499620 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:51:11,180-Speed 2614.66 samples/sec Loss 5.2495 LearningRate 0.0158 Epoch: 12 Global Step: 499630 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:51:15,077-Speed 2629.27 samples/sec Loss 5.3402 LearningRate 0.0158 Epoch: 12 Global Step: 499640 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:51:18,980-Speed 2623.79 samples/sec Loss 5.2661 LearningRate 0.0158 Epoch: 12 Global Step: 499650 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:51:22,901-Speed 2612.79 samples/sec Loss 5.3248 LearningRate 0.0158 Epoch: 12 Global Step: 499660 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:51:26,776-Speed 2643.06 samples/sec Loss 5.2387 LearningRate 0.0158 Epoch: 12 Global Step: 499670 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:51:30,697-Speed 2612.47 samples/sec Loss 5.3852 LearningRate 0.0158 Epoch: 12 Global Step: 499680 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:51:34,594-Speed 2628.14 samples/sec Loss 5.3271 LearningRate 0.0158 Epoch: 12 Global Step: 499690 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:51:38,489-Speed 2629.62 samples/sec Loss 5.3309 LearningRate 0.0158 Epoch: 12 Global Step: 499700 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:51:42,396-Speed 2621.31 samples/sec Loss 5.2525 LearningRate 0.0158 Epoch: 12 Global Step: 499710 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:51:46,294-Speed 2628.01 samples/sec Loss 5.3165 LearningRate 0.0158 Epoch: 12 Global Step: 499720 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:51:50,189-Speed 2630.03 samples/sec Loss 5.3544 LearningRate 0.0158 Epoch: 12 Global Step: 499730 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:51:54,086-Speed 2628.21 samples/sec Loss 5.3123 LearningRate 0.0158 Epoch: 12 Global Step: 499740 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:51:58,015-Speed 2607.43 samples/sec Loss 5.3207 LearningRate 0.0158 Epoch: 12 Global Step: 499750 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:52:01,917-Speed 2624.59 samples/sec Loss 5.2818 LearningRate 0.0158 Epoch: 12 Global Step: 499760 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:52:05,813-Speed 2629.40 samples/sec Loss 5.3838 LearningRate 0.0158 Epoch: 12 Global Step: 499770 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:52:09,707-Speed 2630.17 samples/sec Loss 5.2791 LearningRate 0.0158 Epoch: 12 Global Step: 499780 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:52:13,580-Speed 2644.83 samples/sec Loss 5.3362 LearningRate 0.0158 Epoch: 12 Global Step: 499790 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:52:17,573-Speed 2565.02 samples/sec Loss 5.2696 LearningRate 0.0158 Epoch: 12 Global Step: 499800 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:52:21,470-Speed 2628.42 samples/sec Loss 5.3434 LearningRate 0.0158 Epoch: 12 Global Step: 499810 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:52:25,367-Speed 2628.13 samples/sec Loss 5.2490 LearningRate 0.0158 Epoch: 12 Global Step: 499820 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:52:29,271-Speed 2624.33 samples/sec Loss 5.3653 LearningRate 0.0158 Epoch: 12 Global Step: 499830 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:52:33,178-Speed 2621.38 samples/sec Loss 5.3515 LearningRate 0.0158 Epoch: 12 Global Step: 499840 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:52:37,075-Speed 2627.80 samples/sec Loss 5.4284 LearningRate 0.0158 Epoch: 12 Global Step: 499850 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:52:40,968-Speed 2631.02 samples/sec Loss 5.4141 LearningRate 0.0158 Epoch: 12 Global Step: 499860 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:52:44,860-Speed 2632.35 samples/sec Loss 5.3096 LearningRate 0.0158 Epoch: 12 Global Step: 499870 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:52:48,758-Speed 2627.51 samples/sec Loss 5.3091 LearningRate 0.0158 Epoch: 12 Global Step: 499880 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:52:52,745-Speed 2569.37 samples/sec Loss 5.3481 LearningRate 0.0158 Epoch: 12 Global Step: 499890 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:52:56,643-Speed 2627.14 samples/sec Loss 5.2132 LearningRate 0.0158 Epoch: 12 Global Step: 499900 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:53:00,536-Speed 2631.54 samples/sec Loss 5.2619 LearningRate 0.0158 Epoch: 12 Global Step: 499910 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:53:04,430-Speed 2630.44 samples/sec Loss 5.3511 LearningRate 0.0158 Epoch: 12 Global Step: 499920 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:53:08,325-Speed 2629.77 samples/sec Loss 5.4226 LearningRate 0.0158 Epoch: 12 Global Step: 499930 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:53:12,251-Speed 2608.50 samples/sec Loss 5.2875 LearningRate 0.0158 Epoch: 12 Global Step: 499940 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:53:16,151-Speed 2626.48 samples/sec Loss 5.3128 LearningRate 0.0158 Epoch: 12 Global Step: 499950 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:53:20,046-Speed 2630.27 samples/sec Loss 5.2772 LearningRate 0.0158 Epoch: 12 Global Step: 499960 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:53:23,958-Speed 2618.67 samples/sec Loss 5.3100 LearningRate 0.0158 Epoch: 12 Global Step: 499970 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:53:27,857-Speed 2626.83 samples/sec Loss 5.2590 LearningRate 0.0158 Epoch: 12 Global Step: 499980 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:53:31,774-Speed 2615.14 samples/sec Loss 5.3633 LearningRate 0.0158 Epoch: 12 Global Step: 499990 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:53:35,647-Speed 2644.52 samples/sec Loss 5.3178 LearningRate 0.0158 Epoch: 12 Global Step: 500000 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:54:18,662-[lfw][500000]XNorm: 22.839151
Training: 2022-04-15 03:54:18,663-[lfw][500000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-15 03:54:18,664-[lfw][500000]Accuracy-Highest: 0.99800
Training: 2022-04-15 03:55:08,750-[cfp_fp][500000]XNorm: 21.312026
Training: 2022-04-15 03:55:08,751-[cfp_fp][500000]Accuracy-Flip: 0.98929+-0.00439
Training: 2022-04-15 03:55:08,751-[cfp_fp][500000]Accuracy-Highest: 0.98971
Training: 2022-04-15 03:55:51,692-[agedb_30][500000]XNorm: 22.884774
Training: 2022-04-15 03:55:51,693-[agedb_30][500000]Accuracy-Flip: 0.97667+-0.00703
Training: 2022-04-15 03:55:51,694-[agedb_30][500000]Accuracy-Highest: 0.97950
Training: 2022-04-15 03:55:55,561-Speed 73.19 samples/sec Loss 5.2997 LearningRate 0.0158 Epoch: 12 Global Step: 500010 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:55:59,417-Speed 2656.28 samples/sec Loss 5.2730 LearningRate 0.0158 Epoch: 12 Global Step: 500020 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:56:03,280-Speed 2651.29 samples/sec Loss 5.3116 LearningRate 0.0158 Epoch: 12 Global Step: 500030 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:56:07,150-Speed 2646.50 samples/sec Loss 5.2819 LearningRate 0.0158 Epoch: 12 Global Step: 500040 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:56:11,026-Speed 2642.34 samples/sec Loss 5.3360 LearningRate 0.0158 Epoch: 12 Global Step: 500050 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:56:14,912-Speed 2635.66 samples/sec Loss 5.2410 LearningRate 0.0158 Epoch: 12 Global Step: 500060 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:56:18,793-Speed 2639.48 samples/sec Loss 5.3157 LearningRate 0.0158 Epoch: 12 Global Step: 500070 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:56:22,685-Speed 2631.34 samples/sec Loss 5.4727 LearningRate 0.0158 Epoch: 12 Global Step: 500080 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:56:26,569-Speed 2637.73 samples/sec Loss 5.4389 LearningRate 0.0158 Epoch: 12 Global Step: 500090 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:56:30,445-Speed 2642.59 samples/sec Loss 5.3291 LearningRate 0.0158 Epoch: 12 Global Step: 500100 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:56:34,362-Speed 2615.06 samples/sec Loss 5.2641 LearningRate 0.0158 Epoch: 12 Global Step: 500110 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:56:38,244-Speed 2638.86 samples/sec Loss 5.3202 LearningRate 0.0158 Epoch: 12 Global Step: 500120 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:56:42,126-Speed 2637.96 samples/sec Loss 5.3024 LearningRate 0.0158 Epoch: 12 Global Step: 500130 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:56:46,017-Speed 2632.37 samples/sec Loss 5.3095 LearningRate 0.0158 Epoch: 12 Global Step: 500140 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:56:49,906-Speed 2634.07 samples/sec Loss 5.2356 LearningRate 0.0158 Epoch: 12 Global Step: 500150 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:56:53,792-Speed 2636.07 samples/sec Loss 5.2247 LearningRate 0.0158 Epoch: 12 Global Step: 500160 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:56:57,682-Speed 2633.30 samples/sec Loss 5.2517 LearningRate 0.0158 Epoch: 12 Global Step: 500170 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:57:01,567-Speed 2636.31 samples/sec Loss 5.3180 LearningRate 0.0158 Epoch: 12 Global Step: 500180 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:57:05,455-Speed 2634.95 samples/sec Loss 5.3614 LearningRate 0.0158 Epoch: 12 Global Step: 500190 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:57:09,327-Speed 2644.98 samples/sec Loss 5.1794 LearningRate 0.0158 Epoch: 12 Global Step: 500200 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:57:13,210-Speed 2637.35 samples/sec Loss 5.3161 LearningRate 0.0158 Epoch: 12 Global Step: 500210 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:57:17,092-Speed 2638.02 samples/sec Loss 5.3050 LearningRate 0.0158 Epoch: 12 Global Step: 500220 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:57:20,976-Speed 2637.93 samples/sec Loss 5.3325 LearningRate 0.0158 Epoch: 12 Global Step: 500230 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:57:24,866-Speed 2633.24 samples/sec Loss 5.4303 LearningRate 0.0158 Epoch: 12 Global Step: 500240 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:57:28,752-Speed 2636.17 samples/sec Loss 5.4056 LearningRate 0.0158 Epoch: 12 Global Step: 500250 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:57:32,636-Speed 2637.23 samples/sec Loss 5.3294 LearningRate 0.0158 Epoch: 12 Global Step: 500260 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:57:36,534-Speed 2627.33 samples/sec Loss 5.3587 LearningRate 0.0158 Epoch: 12 Global Step: 500270 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:57:40,397-Speed 2651.43 samples/sec Loss 5.3903 LearningRate 0.0158 Epoch: 12 Global Step: 500280 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:57:44,290-Speed 2631.08 samples/sec Loss 5.2877 LearningRate 0.0158 Epoch: 12 Global Step: 500290 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:57:48,184-Speed 2630.18 samples/sec Loss 5.3112 LearningRate 0.0158 Epoch: 12 Global Step: 500300 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:57:52,079-Speed 2630.01 samples/sec Loss 5.2989 LearningRate 0.0158 Epoch: 12 Global Step: 500310 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:57:55,969-Speed 2633.35 samples/sec Loss 5.2523 LearningRate 0.0158 Epoch: 12 Global Step: 500320 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:57:59,851-Speed 2638.17 samples/sec Loss 5.2699 LearningRate 0.0158 Epoch: 12 Global Step: 500330 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:58:03,737-Speed 2636.13 samples/sec Loss 5.3075 LearningRate 0.0158 Epoch: 12 Global Step: 500340 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:58:07,615-Speed 2641.58 samples/sec Loss 5.2958 LearningRate 0.0158 Epoch: 12 Global Step: 500350 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:58:11,503-Speed 2634.10 samples/sec Loss 5.3510 LearningRate 0.0157 Epoch: 12 Global Step: 500360 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:58:15,397-Speed 2629.95 samples/sec Loss 5.4227 LearningRate 0.0157 Epoch: 12 Global Step: 500370 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:58:19,282-Speed 2636.36 samples/sec Loss 5.2197 LearningRate 0.0157 Epoch: 12 Global Step: 500380 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:58:23,170-Speed 2634.68 samples/sec Loss 5.3274 LearningRate 0.0157 Epoch: 12 Global Step: 500390 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:58:27,054-Speed 2637.31 samples/sec Loss 5.3312 LearningRate 0.0157 Epoch: 12 Global Step: 500400 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:58:30,938-Speed 2637.26 samples/sec Loss 5.3782 LearningRate 0.0157 Epoch: 12 Global Step: 500410 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:58:34,819-Speed 2639.15 samples/sec Loss 5.4777 LearningRate 0.0157 Epoch: 12 Global Step: 500420 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:58:38,702-Speed 2637.87 samples/sec Loss 5.2947 LearningRate 0.0157 Epoch: 12 Global Step: 500430 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:58:42,606-Speed 2623.59 samples/sec Loss 5.1893 LearningRate 0.0157 Epoch: 12 Global Step: 500440 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:58:46,487-Speed 2638.74 samples/sec Loss 5.3129 LearningRate 0.0157 Epoch: 12 Global Step: 500450 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:58:50,371-Speed 2637.16 samples/sec Loss 5.3563 LearningRate 0.0157 Epoch: 12 Global Step: 500460 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:58:54,257-Speed 2635.70 samples/sec Loss 5.4487 LearningRate 0.0157 Epoch: 12 Global Step: 500470 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:58:58,122-Speed 2650.33 samples/sec Loss 5.3764 LearningRate 0.0157 Epoch: 12 Global Step: 500480 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:59:02,011-Speed 2634.03 samples/sec Loss 5.2939 LearningRate 0.0157 Epoch: 12 Global Step: 500490 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:59:05,902-Speed 2632.32 samples/sec Loss 5.2867 LearningRate 0.0157 Epoch: 12 Global Step: 500500 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:59:09,791-Speed 2634.36 samples/sec Loss 5.3584 LearningRate 0.0157 Epoch: 12 Global Step: 500510 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:59:13,696-Speed 2622.74 samples/sec Loss 5.4066 LearningRate 0.0157 Epoch: 12 Global Step: 500520 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:59:17,584-Speed 2633.99 samples/sec Loss 5.3075 LearningRate 0.0157 Epoch: 12 Global Step: 500530 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:59:21,477-Speed 2630.45 samples/sec Loss 5.3499 LearningRate 0.0157 Epoch: 12 Global Step: 500540 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:59:25,366-Speed 2634.46 samples/sec Loss 5.3413 LearningRate 0.0157 Epoch: 12 Global Step: 500550 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:59:29,302-Speed 2601.97 samples/sec Loss 5.3733 LearningRate 0.0157 Epoch: 12 Global Step: 500560 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:59:33,192-Speed 2633.76 samples/sec Loss 5.3681 LearningRate 0.0157 Epoch: 12 Global Step: 500570 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:59:37,060-Speed 2647.30 samples/sec Loss 5.3395 LearningRate 0.0157 Epoch: 12 Global Step: 500580 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:59:40,946-Speed 2636.09 samples/sec Loss 5.3564 LearningRate 0.0157 Epoch: 12 Global Step: 500590 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 03:59:44,813-Speed 2648.31 samples/sec Loss 5.1928 LearningRate 0.0157 Epoch: 12 Global Step: 500600 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:59:48,709-Speed 2629.42 samples/sec Loss 5.3175 LearningRate 0.0157 Epoch: 12 Global Step: 500610 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:59:52,592-Speed 2637.00 samples/sec Loss 5.2819 LearningRate 0.0157 Epoch: 12 Global Step: 500620 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 03:59:56,478-Speed 2636.33 samples/sec Loss 5.2359 LearningRate 0.0157 Epoch: 12 Global Step: 500630 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:00:00,365-Speed 2634.95 samples/sec Loss 5.4633 LearningRate 0.0157 Epoch: 12 Global Step: 500640 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:00:04,230-Speed 2650.47 samples/sec Loss 5.3422 LearningRate 0.0157 Epoch: 12 Global Step: 500650 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:00:08,126-Speed 2628.83 samples/sec Loss 5.3741 LearningRate 0.0157 Epoch: 12 Global Step: 500660 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:00:12,019-Speed 2631.31 samples/sec Loss 5.3288 LearningRate 0.0157 Epoch: 12 Global Step: 500670 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:00:15,909-Speed 2633.24 samples/sec Loss 5.3815 LearningRate 0.0157 Epoch: 12 Global Step: 500680 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:00:19,798-Speed 2633.52 samples/sec Loss 5.3969 LearningRate 0.0157 Epoch: 12 Global Step: 500690 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:00:23,686-Speed 2633.90 samples/sec Loss 5.2874 LearningRate 0.0157 Epoch: 12 Global Step: 500700 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:00:27,581-Speed 2630.67 samples/sec Loss 5.2440 LearningRate 0.0157 Epoch: 12 Global Step: 500710 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:00:31,469-Speed 2634.26 samples/sec Loss 5.2633 LearningRate 0.0157 Epoch: 12 Global Step: 500720 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:00:35,370-Speed 2625.72 samples/sec Loss 5.2969 LearningRate 0.0157 Epoch: 12 Global Step: 500730 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:00:39,277-Speed 2621.22 samples/sec Loss 5.3091 LearningRate 0.0157 Epoch: 12 Global Step: 500740 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:00:43,167-Speed 2633.35 samples/sec Loss 5.2683 LearningRate 0.0157 Epoch: 12 Global Step: 500750 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:00:47,055-Speed 2634.86 samples/sec Loss 5.3373 LearningRate 0.0157 Epoch: 12 Global Step: 500760 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:00:50,944-Speed 2633.49 samples/sec Loss 5.2952 LearningRate 0.0157 Epoch: 12 Global Step: 500770 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:00:54,835-Speed 2633.03 samples/sec Loss 5.3656 LearningRate 0.0157 Epoch: 12 Global Step: 500780 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:00:58,725-Speed 2632.61 samples/sec Loss 5.2837 LearningRate 0.0157 Epoch: 12 Global Step: 500790 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:01:02,616-Speed 2632.94 samples/sec Loss 5.2974 LearningRate 0.0157 Epoch: 12 Global Step: 500800 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:01:06,514-Speed 2627.61 samples/sec Loss 5.2625 LearningRate 0.0157 Epoch: 12 Global Step: 500810 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:01:10,405-Speed 2632.27 samples/sec Loss 5.2539 LearningRate 0.0157 Epoch: 12 Global Step: 500820 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:01:14,289-Speed 2636.72 samples/sec Loss 5.3147 LearningRate 0.0157 Epoch: 12 Global Step: 500830 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:01:18,207-Speed 2615.44 samples/sec Loss 5.2453 LearningRate 0.0157 Epoch: 12 Global Step: 500840 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:01:22,084-Speed 2642.19 samples/sec Loss 5.3717 LearningRate 0.0157 Epoch: 12 Global Step: 500850 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:01:25,997-Speed 2617.54 samples/sec Loss 5.3517 LearningRate 0.0157 Epoch: 12 Global Step: 500860 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:01:29,882-Speed 2637.21 samples/sec Loss 5.3341 LearningRate 0.0157 Epoch: 12 Global Step: 500870 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:01:33,774-Speed 2631.31 samples/sec Loss 5.3256 LearningRate 0.0157 Epoch: 12 Global Step: 500880 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:01:37,662-Speed 2634.48 samples/sec Loss 5.3899 LearningRate 0.0157 Epoch: 12 Global Step: 500890 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:01:41,548-Speed 2635.76 samples/sec Loss 5.3375 LearningRate 0.0157 Epoch: 12 Global Step: 500900 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:01:45,435-Speed 2635.15 samples/sec Loss 5.3709 LearningRate 0.0157 Epoch: 12 Global Step: 500910 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:01:49,321-Speed 2635.30 samples/sec Loss 5.2225 LearningRate 0.0157 Epoch: 12 Global Step: 500920 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:01:53,214-Speed 2631.86 samples/sec Loss 5.3492 LearningRate 0.0157 Epoch: 12 Global Step: 500930 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:01:57,109-Speed 2629.69 samples/sec Loss 5.4262 LearningRate 0.0157 Epoch: 12 Global Step: 500940 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:02:01,005-Speed 2629.12 samples/sec Loss 5.3673 LearningRate 0.0157 Epoch: 12 Global Step: 500950 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:02:04,898-Speed 2630.89 samples/sec Loss 5.2725 LearningRate 0.0157 Epoch: 12 Global Step: 500960 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:02:08,790-Speed 2631.22 samples/sec Loss 5.3418 LearningRate 0.0157 Epoch: 12 Global Step: 500970 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:02:12,677-Speed 2635.26 samples/sec Loss 5.3292 LearningRate 0.0157 Epoch: 12 Global Step: 500980 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:02:16,563-Speed 2635.75 samples/sec Loss 5.3116 LearningRate 0.0157 Epoch: 12 Global Step: 500990 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:02:20,451-Speed 2634.33 samples/sec Loss 5.2727 LearningRate 0.0157 Epoch: 12 Global Step: 501000 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:02:24,345-Speed 2630.76 samples/sec Loss 5.2368 LearningRate 0.0157 Epoch: 12 Global Step: 501010 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:02:28,222-Speed 2642.25 samples/sec Loss 5.2876 LearningRate 0.0157 Epoch: 12 Global Step: 501020 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:02:32,107-Speed 2636.44 samples/sec Loss 5.3374 LearningRate 0.0157 Epoch: 12 Global Step: 501030 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:02:35,996-Speed 2633.62 samples/sec Loss 5.2590 LearningRate 0.0157 Epoch: 12 Global Step: 501040 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:02:39,892-Speed 2628.86 samples/sec Loss 5.2593 LearningRate 0.0157 Epoch: 12 Global Step: 501050 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:02:43,782-Speed 2632.75 samples/sec Loss 5.2425 LearningRate 0.0157 Epoch: 12 Global Step: 501060 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:02:47,677-Speed 2630.10 samples/sec Loss 5.3454 LearningRate 0.0157 Epoch: 12 Global Step: 501070 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:02:51,568-Speed 2633.18 samples/sec Loss 5.3507 LearningRate 0.0157 Epoch: 12 Global Step: 501080 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:02:55,460-Speed 2631.05 samples/sec Loss 5.2937 LearningRate 0.0157 Epoch: 12 Global Step: 501090 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:02:59,353-Speed 2631.34 samples/sec Loss 5.2994 LearningRate 0.0157 Epoch: 12 Global Step: 501100 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:03:03,225-Speed 2645.31 samples/sec Loss 5.3605 LearningRate 0.0157 Epoch: 12 Global Step: 501110 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:03:07,125-Speed 2625.87 samples/sec Loss 5.2139 LearningRate 0.0157 Epoch: 12 Global Step: 501120 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:03:11,018-Speed 2631.13 samples/sec Loss 5.3539 LearningRate 0.0157 Epoch: 12 Global Step: 501130 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:03:14,906-Speed 2634.55 samples/sec Loss 5.2868 LearningRate 0.0157 Epoch: 12 Global Step: 501140 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:03:18,794-Speed 2634.51 samples/sec Loss 5.3339 LearningRate 0.0157 Epoch: 12 Global Step: 501150 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:03:22,678-Speed 2637.36 samples/sec Loss 5.2841 LearningRate 0.0157 Epoch: 12 Global Step: 501160 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:03:26,577-Speed 2627.07 samples/sec Loss 5.1903 LearningRate 0.0157 Epoch: 12 Global Step: 501170 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:03:30,471-Speed 2630.39 samples/sec Loss 5.2319 LearningRate 0.0157 Epoch: 12 Global Step: 501180 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:03:34,361-Speed 2633.11 samples/sec Loss 5.2958 LearningRate 0.0157 Epoch: 12 Global Step: 501190 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:03:38,249-Speed 2634.43 samples/sec Loss 5.3716 LearningRate 0.0157 Epoch: 12 Global Step: 501200 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:03:42,176-Speed 2607.70 samples/sec Loss 5.3065 LearningRate 0.0157 Epoch: 12 Global Step: 501210 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:03:46,070-Speed 2630.60 samples/sec Loss 5.3091 LearningRate 0.0157 Epoch: 12 Global Step: 501220 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:03:49,960-Speed 2633.57 samples/sec Loss 5.3616 LearningRate 0.0157 Epoch: 12 Global Step: 501230 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:03:53,850-Speed 2632.79 samples/sec Loss 5.3104 LearningRate 0.0157 Epoch: 12 Global Step: 501240 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:03:57,744-Speed 2630.39 samples/sec Loss 5.3533 LearningRate 0.0157 Epoch: 12 Global Step: 501250 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:04:01,633-Speed 2633.79 samples/sec Loss 5.2407 LearningRate 0.0157 Epoch: 12 Global Step: 501260 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:04:05,534-Speed 2625.54 samples/sec Loss 5.3264 LearningRate 0.0157 Epoch: 12 Global Step: 501270 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:04:09,422-Speed 2634.02 samples/sec Loss 5.2004 LearningRate 0.0157 Epoch: 12 Global Step: 501280 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:04:13,314-Speed 2631.53 samples/sec Loss 5.2898 LearningRate 0.0157 Epoch: 12 Global Step: 501290 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:04:17,208-Speed 2630.88 samples/sec Loss 5.2956 LearningRate 0.0157 Epoch: 12 Global Step: 501300 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:04:21,101-Speed 2630.78 samples/sec Loss 5.2310 LearningRate 0.0157 Epoch: 12 Global Step: 501310 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:04:24,991-Speed 2632.91 samples/sec Loss 5.2901 LearningRate 0.0157 Epoch: 12 Global Step: 501320 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:04:28,864-Speed 2645.17 samples/sec Loss 5.2940 LearningRate 0.0157 Epoch: 12 Global Step: 501330 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:04:32,758-Speed 2629.84 samples/sec Loss 5.3373 LearningRate 0.0157 Epoch: 12 Global Step: 501340 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:04:36,652-Speed 2630.54 samples/sec Loss 5.2947 LearningRate 0.0157 Epoch: 12 Global Step: 501350 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:04:40,546-Speed 2630.44 samples/sec Loss 5.4085 LearningRate 0.0157 Epoch: 12 Global Step: 501360 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:04:44,437-Speed 2632.26 samples/sec Loss 5.4412 LearningRate 0.0157 Epoch: 12 Global Step: 501370 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:04:48,326-Speed 2633.94 samples/sec Loss 5.2942 LearningRate 0.0157 Epoch: 12 Global Step: 501380 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:04:52,231-Speed 2622.48 samples/sec Loss 5.3164 LearningRate 0.0157 Epoch: 12 Global Step: 501390 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:04:56,126-Speed 2629.86 samples/sec Loss 5.3278 LearningRate 0.0156 Epoch: 12 Global Step: 501400 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:05:00,074-Speed 2594.78 samples/sec Loss 5.2206 LearningRate 0.0156 Epoch: 12 Global Step: 501410 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:05:04,097-Speed 2545.96 samples/sec Loss 5.2853 LearningRate 0.0156 Epoch: 12 Global Step: 501420 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:05:07,994-Speed 2628.28 samples/sec Loss 5.2977 LearningRate 0.0156 Epoch: 12 Global Step: 501430 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:05:11,911-Speed 2615.09 samples/sec Loss 5.2326 LearningRate 0.0156 Epoch: 12 Global Step: 501440 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:05:15,806-Speed 2629.40 samples/sec Loss 5.3863 LearningRate 0.0156 Epoch: 12 Global Step: 501450 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:05:19,695-Speed 2633.81 samples/sec Loss 5.2627 LearningRate 0.0156 Epoch: 12 Global Step: 501460 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:05:23,586-Speed 2632.92 samples/sec Loss 5.2927 LearningRate 0.0156 Epoch: 12 Global Step: 501470 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:05:27,478-Speed 2631.70 samples/sec Loss 5.2171 LearningRate 0.0156 Epoch: 12 Global Step: 501480 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:05:31,379-Speed 2625.05 samples/sec Loss 5.3188 LearningRate 0.0156 Epoch: 12 Global Step: 501490 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:05:35,267-Speed 2634.92 samples/sec Loss 5.2096 LearningRate 0.0156 Epoch: 12 Global Step: 501500 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:05:39,171-Speed 2623.15 samples/sec Loss 5.3546 LearningRate 0.0156 Epoch: 12 Global Step: 501510 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:05:43,065-Speed 2630.26 samples/sec Loss 5.2759 LearningRate 0.0156 Epoch: 12 Global Step: 501520 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:05:46,965-Speed 2626.74 samples/sec Loss 5.3359 LearningRate 0.0156 Epoch: 12 Global Step: 501530 Fp16 Grad Scale: 262144 Required: 37 hours
Training: 2022-04-15 04:05:50,842-Speed 2642.02 samples/sec Loss 5.3267 LearningRate 0.0156 Epoch: 12 Global Step: 501540 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:05:54,733-Speed 2631.72 samples/sec Loss 5.3003 LearningRate 0.0156 Epoch: 12 Global Step: 501550 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:05:58,623-Speed 2634.08 samples/sec Loss 5.3652 LearningRate 0.0156 Epoch: 12 Global Step: 501560 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:06:02,512-Speed 2633.43 samples/sec Loss 5.3161 LearningRate 0.0156 Epoch: 12 Global Step: 501570 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:06:06,411-Speed 2627.12 samples/sec Loss 5.3236 LearningRate 0.0156 Epoch: 12 Global Step: 501580 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:06:10,307-Speed 2628.58 samples/sec Loss 5.2771 LearningRate 0.0156 Epoch: 12 Global Step: 501590 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:06:14,187-Speed 2640.01 samples/sec Loss 5.3022 LearningRate 0.0156 Epoch: 12 Global Step: 501600 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:06:18,079-Speed 2631.88 samples/sec Loss 5.3383 LearningRate 0.0156 Epoch: 12 Global Step: 501610 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:06:21,966-Speed 2635.63 samples/sec Loss 5.2039 LearningRate 0.0156 Epoch: 12 Global Step: 501620 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:06:25,865-Speed 2627.19 samples/sec Loss 5.2983 LearningRate 0.0156 Epoch: 12 Global Step: 501630 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:06:29,752-Speed 2634.39 samples/sec Loss 5.1992 LearningRate 0.0156 Epoch: 12 Global Step: 501640 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:06:33,641-Speed 2634.00 samples/sec Loss 5.2916 LearningRate 0.0156 Epoch: 12 Global Step: 501650 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:06:37,537-Speed 2629.46 samples/sec Loss 5.3119 LearningRate 0.0156 Epoch: 12 Global Step: 501660 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:06:41,425-Speed 2634.27 samples/sec Loss 5.3362 LearningRate 0.0156 Epoch: 12 Global Step: 501670 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:06:45,314-Speed 2633.34 samples/sec Loss 5.4193 LearningRate 0.0156 Epoch: 12 Global Step: 501680 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:06:49,204-Speed 2633.26 samples/sec Loss 5.2218 LearningRate 0.0156 Epoch: 12 Global Step: 501690 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:06:53,082-Speed 2640.98 samples/sec Loss 5.3799 LearningRate 0.0156 Epoch: 12 Global Step: 501700 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:06:56,981-Speed 2627.53 samples/sec Loss 5.3080 LearningRate 0.0156 Epoch: 12 Global Step: 501710 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:07:00,918-Speed 2601.17 samples/sec Loss 5.2286 LearningRate 0.0156 Epoch: 12 Global Step: 501720 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:07:04,830-Speed 2618.33 samples/sec Loss 5.2084 LearningRate 0.0156 Epoch: 12 Global Step: 501730 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:07:08,727-Speed 2628.72 samples/sec Loss 5.2847 LearningRate 0.0156 Epoch: 12 Global Step: 501740 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:07:12,619-Speed 2631.66 samples/sec Loss 5.2366 LearningRate 0.0156 Epoch: 12 Global Step: 501750 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:07:16,517-Speed 2627.28 samples/sec Loss 5.4110 LearningRate 0.0156 Epoch: 12 Global Step: 501760 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:07:20,428-Speed 2618.93 samples/sec Loss 5.3651 LearningRate 0.0156 Epoch: 12 Global Step: 501770 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:07:24,316-Speed 2635.19 samples/sec Loss 5.3710 LearningRate 0.0156 Epoch: 12 Global Step: 501780 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:07:28,213-Speed 2628.35 samples/sec Loss 5.3044 LearningRate 0.0156 Epoch: 12 Global Step: 501790 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:07:32,102-Speed 2633.67 samples/sec Loss 5.2616 LearningRate 0.0156 Epoch: 12 Global Step: 501800 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:07:35,991-Speed 2633.91 samples/sec Loss 5.2514 LearningRate 0.0156 Epoch: 12 Global Step: 501810 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:07:39,879-Speed 2633.94 samples/sec Loss 5.2834 LearningRate 0.0156 Epoch: 12 Global Step: 501820 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:07:43,772-Speed 2631.14 samples/sec Loss 5.3072 LearningRate 0.0156 Epoch: 12 Global Step: 501830 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:07:47,644-Speed 2645.45 samples/sec Loss 5.2448 LearningRate 0.0156 Epoch: 12 Global Step: 501840 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:07:51,535-Speed 2632.27 samples/sec Loss 5.4061 LearningRate 0.0156 Epoch: 12 Global Step: 501850 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:07:55,428-Speed 2631.00 samples/sec Loss 5.3083 LearningRate 0.0156 Epoch: 12 Global Step: 501860 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:07:59,323-Speed 2629.52 samples/sec Loss 5.2660 LearningRate 0.0156 Epoch: 12 Global Step: 501870 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:08:03,216-Speed 2631.34 samples/sec Loss 5.2572 LearningRate 0.0156 Epoch: 12 Global Step: 501880 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:08:07,108-Speed 2631.61 samples/sec Loss 5.3039 LearningRate 0.0156 Epoch: 12 Global Step: 501890 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:08:10,999-Speed 2632.44 samples/sec Loss 5.2406 LearningRate 0.0156 Epoch: 12 Global Step: 501900 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:08:14,889-Speed 2632.45 samples/sec Loss 5.3488 LearningRate 0.0156 Epoch: 12 Global Step: 501910 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:08:18,789-Speed 2627.21 samples/sec Loss 5.2815 LearningRate 0.0156 Epoch: 12 Global Step: 501920 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:08:22,683-Speed 2630.60 samples/sec Loss 5.2237 LearningRate 0.0156 Epoch: 12 Global Step: 501930 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:08:26,565-Speed 2638.12 samples/sec Loss 5.2169 LearningRate 0.0156 Epoch: 12 Global Step: 501940 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:08:30,486-Speed 2612.57 samples/sec Loss 5.2529 LearningRate 0.0156 Epoch: 12 Global Step: 501950 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:08:34,379-Speed 2631.26 samples/sec Loss 5.2800 LearningRate 0.0156 Epoch: 12 Global Step: 501960 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:08:38,269-Speed 2633.84 samples/sec Loss 5.2567 LearningRate 0.0156 Epoch: 12 Global Step: 501970 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:08:42,159-Speed 2633.17 samples/sec Loss 5.3647 LearningRate 0.0156 Epoch: 12 Global Step: 501980 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:08:46,051-Speed 2631.29 samples/sec Loss 5.2249 LearningRate 0.0156 Epoch: 12 Global Step: 501990 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:08:49,947-Speed 2629.05 samples/sec Loss 5.3392 LearningRate 0.0156 Epoch: 12 Global Step: 502000 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:08:53,839-Speed 2631.67 samples/sec Loss 5.3783 LearningRate 0.0156 Epoch: 12 Global Step: 502010 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:08:57,728-Speed 2634.28 samples/sec Loss 5.3285 LearningRate 0.0156 Epoch: 12 Global Step: 502020 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:09:01,619-Speed 2632.68 samples/sec Loss 5.3186 LearningRate 0.0156 Epoch: 12 Global Step: 502030 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:09:05,512-Speed 2631.06 samples/sec Loss 5.2927 LearningRate 0.0156 Epoch: 12 Global Step: 502040 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:09:09,406-Speed 2629.68 samples/sec Loss 5.2452 LearningRate 0.0156 Epoch: 12 Global Step: 502050 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:09:13,297-Speed 2632.55 samples/sec Loss 5.3479 LearningRate 0.0156 Epoch: 12 Global Step: 502060 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:09:17,187-Speed 2633.08 samples/sec Loss 5.2677 LearningRate 0.0156 Epoch: 12 Global Step: 502070 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:09:21,087-Speed 2626.19 samples/sec Loss 5.2787 LearningRate 0.0156 Epoch: 12 Global Step: 502080 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:09:24,979-Speed 2632.07 samples/sec Loss 5.4216 LearningRate 0.0156 Epoch: 12 Global Step: 502090 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:09:28,869-Speed 2632.96 samples/sec Loss 5.3633 LearningRate 0.0156 Epoch: 12 Global Step: 502100 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:09:32,738-Speed 2647.69 samples/sec Loss 5.3127 LearningRate 0.0156 Epoch: 12 Global Step: 502110 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:09:36,630-Speed 2631.42 samples/sec Loss 5.3006 LearningRate 0.0156 Epoch: 12 Global Step: 502120 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:09:40,525-Speed 2629.25 samples/sec Loss 5.2173 LearningRate 0.0156 Epoch: 12 Global Step: 502130 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:09:44,412-Speed 2634.84 samples/sec Loss 5.1935 LearningRate 0.0156 Epoch: 12 Global Step: 502140 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:09:48,303-Speed 2632.33 samples/sec Loss 5.3534 LearningRate 0.0156 Epoch: 12 Global Step: 502150 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:09:52,197-Speed 2630.54 samples/sec Loss 5.2682 LearningRate 0.0156 Epoch: 12 Global Step: 502160 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:09:56,085-Speed 2633.99 samples/sec Loss 5.3094 LearningRate 0.0156 Epoch: 12 Global Step: 502170 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:09:59,984-Speed 2627.34 samples/sec Loss 5.3033 LearningRate 0.0156 Epoch: 12 Global Step: 502180 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:10:03,880-Speed 2629.10 samples/sec Loss 5.3655 LearningRate 0.0156 Epoch: 12 Global Step: 502190 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:10:07,768-Speed 2634.14 samples/sec Loss 5.3614 LearningRate 0.0156 Epoch: 12 Global Step: 502200 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:10:11,662-Speed 2630.55 samples/sec Loss 5.3312 LearningRate 0.0156 Epoch: 12 Global Step: 502210 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:10:15,549-Speed 2634.93 samples/sec Loss 5.2661 LearningRate 0.0156 Epoch: 12 Global Step: 502220 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:10:19,447-Speed 2627.47 samples/sec Loss 5.3029 LearningRate 0.0156 Epoch: 12 Global Step: 502230 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:10:23,337-Speed 2633.05 samples/sec Loss 5.2639 LearningRate 0.0156 Epoch: 12 Global Step: 502240 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:10:27,226-Speed 2633.93 samples/sec Loss 5.3464 LearningRate 0.0156 Epoch: 12 Global Step: 502250 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:10:31,126-Speed 2625.98 samples/sec Loss 5.2130 LearningRate 0.0156 Epoch: 12 Global Step: 502260 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:10:35,016-Speed 2633.27 samples/sec Loss 5.3557 LearningRate 0.0156 Epoch: 12 Global Step: 502270 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:10:38,909-Speed 2631.36 samples/sec Loss 5.3030 LearningRate 0.0156 Epoch: 12 Global Step: 502280 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:10:42,822-Speed 2617.42 samples/sec Loss 5.2919 LearningRate 0.0156 Epoch: 12 Global Step: 502290 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:10:46,713-Speed 2632.11 samples/sec Loss 5.3554 LearningRate 0.0156 Epoch: 12 Global Step: 502300 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:10:50,608-Speed 2629.92 samples/sec Loss 5.2450 LearningRate 0.0156 Epoch: 12 Global Step: 502310 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:10:54,502-Speed 2630.09 samples/sec Loss 5.2935 LearningRate 0.0156 Epoch: 12 Global Step: 502320 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:10:58,392-Speed 2633.21 samples/sec Loss 5.3703 LearningRate 0.0156 Epoch: 12 Global Step: 502330 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:11:02,287-Speed 2629.74 samples/sec Loss 5.2440 LearningRate 0.0156 Epoch: 12 Global Step: 502340 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:11:06,179-Speed 2631.59 samples/sec Loss 5.2466 LearningRate 0.0156 Epoch: 12 Global Step: 502350 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:11:10,053-Speed 2643.88 samples/sec Loss 5.1657 LearningRate 0.0156 Epoch: 12 Global Step: 502360 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:11:13,949-Speed 2628.68 samples/sec Loss 5.1621 LearningRate 0.0156 Epoch: 12 Global Step: 502370 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:11:17,859-Speed 2619.61 samples/sec Loss 5.2985 LearningRate 0.0156 Epoch: 12 Global Step: 502380 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:11:21,751-Speed 2631.58 samples/sec Loss 5.3127 LearningRate 0.0156 Epoch: 12 Global Step: 502390 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:11:25,646-Speed 2629.75 samples/sec Loss 5.3251 LearningRate 0.0156 Epoch: 12 Global Step: 502400 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:11:29,540-Speed 2630.22 samples/sec Loss 5.2749 LearningRate 0.0156 Epoch: 12 Global Step: 502410 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:11:33,486-Speed 2596.10 samples/sec Loss 5.1911 LearningRate 0.0156 Epoch: 12 Global Step: 502420 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:11:37,378-Speed 2631.59 samples/sec Loss 5.2946 LearningRate 0.0156 Epoch: 12 Global Step: 502430 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:11:41,289-Speed 2618.86 samples/sec Loss 5.3673 LearningRate 0.0156 Epoch: 12 Global Step: 502440 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:11:45,184-Speed 2629.40 samples/sec Loss 5.3008 LearningRate 0.0155 Epoch: 12 Global Step: 502450 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:11:49,092-Speed 2620.52 samples/sec Loss 5.2590 LearningRate 0.0155 Epoch: 12 Global Step: 502460 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:11:52,981-Speed 2633.61 samples/sec Loss 5.1441 LearningRate 0.0155 Epoch: 12 Global Step: 502470 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:11:56,874-Speed 2631.82 samples/sec Loss 5.3387 LearningRate 0.0155 Epoch: 12 Global Step: 502480 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:12:00,769-Speed 2629.37 samples/sec Loss 5.1537 LearningRate 0.0155 Epoch: 12 Global Step: 502490 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:12:04,702-Speed 2604.55 samples/sec Loss 5.3070 LearningRate 0.0155 Epoch: 12 Global Step: 502500 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:12:08,589-Speed 2634.68 samples/sec Loss 5.3456 LearningRate 0.0155 Epoch: 12 Global Step: 502510 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:12:12,476-Speed 2635.88 samples/sec Loss 5.2979 LearningRate 0.0155 Epoch: 12 Global Step: 502520 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:12:16,366-Speed 2632.85 samples/sec Loss 5.3280 LearningRate 0.0155 Epoch: 12 Global Step: 502530 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:12:20,255-Speed 2633.13 samples/sec Loss 5.2043 LearningRate 0.0155 Epoch: 12 Global Step: 502540 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:12:24,144-Speed 2634.05 samples/sec Loss 5.4120 LearningRate 0.0155 Epoch: 12 Global Step: 502550 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:12:28,015-Speed 2645.95 samples/sec Loss 5.2749 LearningRate 0.0155 Epoch: 12 Global Step: 502560 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:12:31,916-Speed 2625.30 samples/sec Loss 5.1682 LearningRate 0.0155 Epoch: 12 Global Step: 502570 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:12:35,810-Speed 2630.16 samples/sec Loss 5.3617 LearningRate 0.0155 Epoch: 12 Global Step: 502580 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:12:39,700-Speed 2633.00 samples/sec Loss 5.2655 LearningRate 0.0155 Epoch: 12 Global Step: 502590 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:12:43,592-Speed 2632.02 samples/sec Loss 5.3254 LearningRate 0.0155 Epoch: 12 Global Step: 502600 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:12:47,483-Speed 2632.15 samples/sec Loss 5.3877 LearningRate 0.0155 Epoch: 12 Global Step: 502610 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:12:51,375-Speed 2632.13 samples/sec Loss 5.1801 LearningRate 0.0155 Epoch: 12 Global Step: 502620 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:12:55,301-Speed 2608.14 samples/sec Loss 5.4002 LearningRate 0.0155 Epoch: 12 Global Step: 502630 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:12:59,197-Speed 2629.07 samples/sec Loss 5.2121 LearningRate 0.0155 Epoch: 12 Global Step: 502640 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:13:03,099-Speed 2624.38 samples/sec Loss 5.3047 LearningRate 0.0155 Epoch: 12 Global Step: 502650 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:13:06,992-Speed 2632.10 samples/sec Loss 5.2549 LearningRate 0.0155 Epoch: 12 Global Step: 502660 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:13:10,880-Speed 2633.76 samples/sec Loss 5.2234 LearningRate 0.0155 Epoch: 12 Global Step: 502670 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:13:14,769-Speed 2633.86 samples/sec Loss 5.1994 LearningRate 0.0155 Epoch: 12 Global Step: 502680 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:13:18,696-Speed 2608.93 samples/sec Loss 5.2900 LearningRate 0.0155 Epoch: 12 Global Step: 502690 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:13:22,653-Speed 2587.83 samples/sec Loss 5.2388 LearningRate 0.0155 Epoch: 12 Global Step: 502700 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:13:26,567-Speed 2617.05 samples/sec Loss 5.2745 LearningRate 0.0155 Epoch: 12 Global Step: 502710 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:13:30,536-Speed 2580.48 samples/sec Loss 5.2359 LearningRate 0.0155 Epoch: 12 Global Step: 502720 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:13:34,510-Speed 2576.93 samples/sec Loss 5.3168 LearningRate 0.0155 Epoch: 12 Global Step: 502730 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:13:38,407-Speed 2628.53 samples/sec Loss 5.2795 LearningRate 0.0155 Epoch: 12 Global Step: 502740 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:13:42,309-Speed 2625.04 samples/sec Loss 5.2148 LearningRate 0.0155 Epoch: 12 Global Step: 502750 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:13:46,184-Speed 2642.85 samples/sec Loss 5.2767 LearningRate 0.0155 Epoch: 12 Global Step: 502760 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:13:50,076-Speed 2631.80 samples/sec Loss 5.2649 LearningRate 0.0155 Epoch: 12 Global Step: 502770 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:13:53,971-Speed 2630.11 samples/sec Loss 5.3012 LearningRate 0.0155 Epoch: 12 Global Step: 502780 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:13:57,860-Speed 2633.65 samples/sec Loss 5.1555 LearningRate 0.0155 Epoch: 12 Global Step: 502790 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:14:01,755-Speed 2629.18 samples/sec Loss 5.2750 LearningRate 0.0155 Epoch: 12 Global Step: 502800 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:14:05,658-Speed 2624.13 samples/sec Loss 5.3650 LearningRate 0.0155 Epoch: 12 Global Step: 502810 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:14:09,549-Speed 2632.53 samples/sec Loss 5.2768 LearningRate 0.0155 Epoch: 12 Global Step: 502820 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:14:13,454-Speed 2622.69 samples/sec Loss 5.1807 LearningRate 0.0155 Epoch: 12 Global Step: 502830 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:14:17,348-Speed 2630.45 samples/sec Loss 5.2756 LearningRate 0.0155 Epoch: 12 Global Step: 502840 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:14:21,239-Speed 2632.79 samples/sec Loss 5.2431 LearningRate 0.0155 Epoch: 12 Global Step: 502850 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:14:25,107-Speed 2647.62 samples/sec Loss 5.3157 LearningRate 0.0155 Epoch: 12 Global Step: 502860 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:14:29,003-Speed 2629.53 samples/sec Loss 5.2372 LearningRate 0.0155 Epoch: 12 Global Step: 502870 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:14:32,894-Speed 2632.06 samples/sec Loss 5.3152 LearningRate 0.0155 Epoch: 12 Global Step: 502880 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:14:36,792-Speed 2627.46 samples/sec Loss 5.3014 LearningRate 0.0155 Epoch: 12 Global Step: 502890 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:14:40,685-Speed 2630.98 samples/sec Loss 5.2530 LearningRate 0.0155 Epoch: 12 Global Step: 502900 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:14:44,563-Speed 2640.48 samples/sec Loss 5.2920 LearningRate 0.0155 Epoch: 12 Global Step: 502910 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:14:48,466-Speed 2624.77 samples/sec Loss 5.3862 LearningRate 0.0155 Epoch: 12 Global Step: 502920 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:14:52,362-Speed 2628.83 samples/sec Loss 5.2798 LearningRate 0.0155 Epoch: 12 Global Step: 502930 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:14:56,259-Speed 2628.55 samples/sec Loss 5.3757 LearningRate 0.0155 Epoch: 12 Global Step: 502940 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:15:00,165-Speed 2621.59 samples/sec Loss 5.2220 LearningRate 0.0155 Epoch: 12 Global Step: 502950 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:15:04,058-Speed 2631.02 samples/sec Loss 5.3256 LearningRate 0.0155 Epoch: 12 Global Step: 502960 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:15:08,057-Speed 2561.28 samples/sec Loss 5.2243 LearningRate 0.0155 Epoch: 12 Global Step: 502970 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:15:11,956-Speed 2627.63 samples/sec Loss 5.2434 LearningRate 0.0155 Epoch: 12 Global Step: 502980 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:15:15,849-Speed 2631.12 samples/sec Loss 5.3094 LearningRate 0.0155 Epoch: 12 Global Step: 502990 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:15:19,751-Speed 2624.95 samples/sec Loss 5.2949 LearningRate 0.0155 Epoch: 12 Global Step: 503000 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:15:23,652-Speed 2625.29 samples/sec Loss 5.2500 LearningRate 0.0155 Epoch: 12 Global Step: 503010 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:15:27,554-Speed 2625.11 samples/sec Loss 5.3211 LearningRate 0.0155 Epoch: 12 Global Step: 503020 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:15:31,435-Speed 2638.85 samples/sec Loss 5.2215 LearningRate 0.0155 Epoch: 12 Global Step: 503030 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:15:35,329-Speed 2630.26 samples/sec Loss 5.3595 LearningRate 0.0155 Epoch: 12 Global Step: 503040 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:15:39,220-Speed 2632.31 samples/sec Loss 5.2763 LearningRate 0.0155 Epoch: 12 Global Step: 503050 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:15:43,129-Speed 2620.61 samples/sec Loss 5.1932 LearningRate 0.0155 Epoch: 12 Global Step: 503060 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:15:47,022-Speed 2630.83 samples/sec Loss 5.2549 LearningRate 0.0155 Epoch: 12 Global Step: 503070 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:15:50,922-Speed 2626.48 samples/sec Loss 5.3089 LearningRate 0.0155 Epoch: 12 Global Step: 503080 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:15:54,824-Speed 2624.57 samples/sec Loss 5.3039 LearningRate 0.0155 Epoch: 12 Global Step: 503090 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:15:58,720-Speed 2629.04 samples/sec Loss 5.2798 LearningRate 0.0155 Epoch: 12 Global Step: 503100 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:16:02,618-Speed 2627.73 samples/sec Loss 5.2991 LearningRate 0.0155 Epoch: 12 Global Step: 503110 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:16:06,516-Speed 2627.39 samples/sec Loss 5.2900 LearningRate 0.0155 Epoch: 12 Global Step: 503120 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:16:10,414-Speed 2627.28 samples/sec Loss 5.2702 LearningRate 0.0155 Epoch: 12 Global Step: 503130 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:16:14,311-Speed 2628.61 samples/sec Loss 5.1694 LearningRate 0.0155 Epoch: 12 Global Step: 503140 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:16:18,205-Speed 2630.22 samples/sec Loss 5.2400 LearningRate 0.0155 Epoch: 12 Global Step: 503150 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:16:22,081-Speed 2642.59 samples/sec Loss 5.3457 LearningRate 0.0155 Epoch: 12 Global Step: 503160 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:16:25,970-Speed 2633.76 samples/sec Loss 5.2480 LearningRate 0.0155 Epoch: 12 Global Step: 503170 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:16:29,863-Speed 2631.13 samples/sec Loss 5.2018 LearningRate 0.0155 Epoch: 12 Global Step: 503180 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:16:33,758-Speed 2629.83 samples/sec Loss 5.2442 LearningRate 0.0155 Epoch: 12 Global Step: 503190 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:16:37,654-Speed 2628.77 samples/sec Loss 5.3074 LearningRate 0.0155 Epoch: 12 Global Step: 503200 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:16:41,555-Speed 2625.53 samples/sec Loss 5.2921 LearningRate 0.0155 Epoch: 12 Global Step: 503210 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:16:45,452-Speed 2627.97 samples/sec Loss 5.2419 LearningRate 0.0155 Epoch: 12 Global Step: 503220 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:16:49,349-Speed 2628.13 samples/sec Loss 5.2257 LearningRate 0.0155 Epoch: 12 Global Step: 503230 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:16:53,239-Speed 2633.33 samples/sec Loss 5.1788 LearningRate 0.0155 Epoch: 12 Global Step: 503240 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:16:57,132-Speed 2630.92 samples/sec Loss 5.3427 LearningRate 0.0155 Epoch: 12 Global Step: 503250 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:17:01,027-Speed 2629.61 samples/sec Loss 5.2311 LearningRate 0.0155 Epoch: 12 Global Step: 503260 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:17:04,918-Speed 2632.35 samples/sec Loss 5.3008 LearningRate 0.0155 Epoch: 12 Global Step: 503270 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:17:08,809-Speed 2632.40 samples/sec Loss 5.3541 LearningRate 0.0155 Epoch: 12 Global Step: 503280 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:17:12,705-Speed 2628.96 samples/sec Loss 5.3695 LearningRate 0.0155 Epoch: 12 Global Step: 503290 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:17:16,671-Speed 2582.32 samples/sec Loss 5.2091 LearningRate 0.0155 Epoch: 12 Global Step: 503300 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:17:20,571-Speed 2626.30 samples/sec Loss 5.2565 LearningRate 0.0155 Epoch: 12 Global Step: 503310 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:17:24,462-Speed 2632.84 samples/sec Loss 5.3542 LearningRate 0.0155 Epoch: 12 Global Step: 503320 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:17:28,355-Speed 2631.02 samples/sec Loss 5.2204 LearningRate 0.0155 Epoch: 12 Global Step: 503330 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:17:32,246-Speed 2632.60 samples/sec Loss 5.3202 LearningRate 0.0155 Epoch: 12 Global Step: 503340 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:17:36,137-Speed 2632.83 samples/sec Loss 5.1875 LearningRate 0.0155 Epoch: 12 Global Step: 503350 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:17:40,010-Speed 2644.60 samples/sec Loss 5.3399 LearningRate 0.0155 Epoch: 12 Global Step: 503360 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:17:43,906-Speed 2628.72 samples/sec Loss 5.2849 LearningRate 0.0155 Epoch: 12 Global Step: 503370 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:17:47,801-Speed 2629.49 samples/sec Loss 5.3048 LearningRate 0.0155 Epoch: 12 Global Step: 503380 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:17:51,670-Speed 2647.41 samples/sec Loss 5.3475 LearningRate 0.0155 Epoch: 12 Global Step: 503390 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:17:55,565-Speed 2629.13 samples/sec Loss 5.2006 LearningRate 0.0155 Epoch: 12 Global Step: 503400 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:17:59,464-Speed 2627.75 samples/sec Loss 5.1884 LearningRate 0.0155 Epoch: 12 Global Step: 503410 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:18:03,378-Speed 2616.88 samples/sec Loss 5.1713 LearningRate 0.0155 Epoch: 12 Global Step: 503420 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:18:07,276-Speed 2628.05 samples/sec Loss 5.2337 LearningRate 0.0155 Epoch: 12 Global Step: 503430 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:18:11,174-Speed 2627.75 samples/sec Loss 5.2949 LearningRate 0.0155 Epoch: 12 Global Step: 503440 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:18:15,072-Speed 2627.66 samples/sec Loss 5.2300 LearningRate 0.0155 Epoch: 12 Global Step: 503450 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:18:18,966-Speed 2630.03 samples/sec Loss 5.2785 LearningRate 0.0155 Epoch: 12 Global Step: 503460 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:18:22,862-Speed 2629.23 samples/sec Loss 5.3055 LearningRate 0.0155 Epoch: 12 Global Step: 503470 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:18:26,761-Speed 2626.92 samples/sec Loss 5.3105 LearningRate 0.0155 Epoch: 12 Global Step: 503480 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:18:30,654-Speed 2630.77 samples/sec Loss 5.2598 LearningRate 0.0155 Epoch: 12 Global Step: 503490 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:18:34,564-Speed 2623.48 samples/sec Loss 5.3769 LearningRate 0.0155 Epoch: 12 Global Step: 503500 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:18:38,460-Speed 2629.70 samples/sec Loss 5.2854 LearningRate 0.0154 Epoch: 12 Global Step: 503510 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:18:42,371-Speed 2618.53 samples/sec Loss 5.2613 LearningRate 0.0154 Epoch: 12 Global Step: 503520 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:18:46,268-Speed 2628.45 samples/sec Loss 5.3976 LearningRate 0.0154 Epoch: 12 Global Step: 503530 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:18:50,163-Speed 2629.91 samples/sec Loss 5.3156 LearningRate 0.0154 Epoch: 12 Global Step: 503540 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:18:54,073-Speed 2620.07 samples/sec Loss 5.2815 LearningRate 0.0154 Epoch: 12 Global Step: 503550 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:18:57,971-Speed 2627.67 samples/sec Loss 5.3440 LearningRate 0.0154 Epoch: 12 Global Step: 503560 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:19:01,864-Speed 2630.89 samples/sec Loss 5.2678 LearningRate 0.0154 Epoch: 12 Global Step: 503570 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:19:05,758-Speed 2629.76 samples/sec Loss 5.2951 LearningRate 0.0154 Epoch: 12 Global Step: 503580 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:19:09,651-Speed 2631.72 samples/sec Loss 5.2982 LearningRate 0.0154 Epoch: 12 Global Step: 503590 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:19:13,560-Speed 2629.18 samples/sec Loss 5.2839 LearningRate 0.0154 Epoch: 12 Global Step: 503600 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:19:17,458-Speed 2627.83 samples/sec Loss 5.2996 LearningRate 0.0154 Epoch: 12 Global Step: 503610 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:19:21,383-Speed 2609.32 samples/sec Loss 5.1941 LearningRate 0.0154 Epoch: 12 Global Step: 503620 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:19:25,286-Speed 2624.47 samples/sec Loss 5.3269 LearningRate 0.0154 Epoch: 12 Global Step: 503630 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:19:29,227-Speed 2599.35 samples/sec Loss 5.3243 LearningRate 0.0154 Epoch: 12 Global Step: 503640 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:19:33,137-Speed 2619.12 samples/sec Loss 5.2740 LearningRate 0.0154 Epoch: 12 Global Step: 503650 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:19:37,034-Speed 2628.57 samples/sec Loss 5.2628 LearningRate 0.0154 Epoch: 12 Global Step: 503660 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:19:40,931-Speed 2627.96 samples/sec Loss 5.2752 LearningRate 0.0154 Epoch: 12 Global Step: 503670 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:19:44,834-Speed 2624.38 samples/sec Loss 5.2772 LearningRate 0.0154 Epoch: 12 Global Step: 503680 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:19:48,711-Speed 2643.03 samples/sec Loss 5.3539 LearningRate 0.0154 Epoch: 12 Global Step: 503690 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:19:52,614-Speed 2623.93 samples/sec Loss 5.2477 LearningRate 0.0154 Epoch: 12 Global Step: 503700 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:19:56,505-Speed 2632.24 samples/sec Loss 5.2841 LearningRate 0.0154 Epoch: 12 Global Step: 503710 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:20:00,394-Speed 2633.84 samples/sec Loss 5.2460 LearningRate 0.0154 Epoch: 12 Global Step: 503720 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:20:04,289-Speed 2629.14 samples/sec Loss 5.1877 LearningRate 0.0154 Epoch: 12 Global Step: 503730 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:20:08,200-Speed 2618.95 samples/sec Loss 5.2963 LearningRate 0.0154 Epoch: 12 Global Step: 503740 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:20:12,099-Speed 2627.67 samples/sec Loss 5.1948 LearningRate 0.0154 Epoch: 12 Global Step: 503750 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:20:15,996-Speed 2627.89 samples/sec Loss 5.2423 LearningRate 0.0154 Epoch: 12 Global Step: 503760 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:20:19,924-Speed 2607.73 samples/sec Loss 5.2356 LearningRate 0.0154 Epoch: 12 Global Step: 503770 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:20:23,818-Speed 2630.83 samples/sec Loss 5.2494 LearningRate 0.0154 Epoch: 12 Global Step: 503780 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:20:27,688-Speed 2646.97 samples/sec Loss 5.1637 LearningRate 0.0154 Epoch: 12 Global Step: 503790 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:20:31,585-Speed 2627.86 samples/sec Loss 5.2418 LearningRate 0.0154 Epoch: 12 Global Step: 503800 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:20:35,483-Speed 2627.38 samples/sec Loss 5.2228 LearningRate 0.0154 Epoch: 12 Global Step: 503810 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:20:39,386-Speed 2623.94 samples/sec Loss 5.2678 LearningRate 0.0154 Epoch: 12 Global Step: 503820 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:20:43,278-Speed 2632.77 samples/sec Loss 5.3951 LearningRate 0.0154 Epoch: 12 Global Step: 503830 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:20:47,172-Speed 2630.02 samples/sec Loss 5.2227 LearningRate 0.0154 Epoch: 12 Global Step: 503840 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:20:51,069-Speed 2628.58 samples/sec Loss 5.3964 LearningRate 0.0154 Epoch: 12 Global Step: 503850 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:20:54,964-Speed 2629.63 samples/sec Loss 5.3905 LearningRate 0.0154 Epoch: 12 Global Step: 503860 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:20:58,871-Speed 2621.80 samples/sec Loss 5.2707 LearningRate 0.0154 Epoch: 12 Global Step: 503870 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:21:02,768-Speed 2628.10 samples/sec Loss 5.2956 LearningRate 0.0154 Epoch: 12 Global Step: 503880 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:21:06,673-Speed 2622.55 samples/sec Loss 5.2347 LearningRate 0.0154 Epoch: 12 Global Step: 503890 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:21:10,570-Speed 2628.23 samples/sec Loss 5.3307 LearningRate 0.0154 Epoch: 12 Global Step: 503900 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:21:14,467-Speed 2628.65 samples/sec Loss 5.3247 LearningRate 0.0154 Epoch: 12 Global Step: 503910 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:21:18,375-Speed 2621.26 samples/sec Loss 5.2307 LearningRate 0.0154 Epoch: 12 Global Step: 503920 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:21:22,270-Speed 2629.18 samples/sec Loss 5.2624 LearningRate 0.0154 Epoch: 12 Global Step: 503930 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:21:26,174-Speed 2624.24 samples/sec Loss 5.2454 LearningRate 0.0154 Epoch: 12 Global Step: 503940 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:21:30,067-Speed 2630.75 samples/sec Loss 5.2233 LearningRate 0.0154 Epoch: 12 Global Step: 503950 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:21:33,964-Speed 2628.10 samples/sec Loss 5.3361 LearningRate 0.0154 Epoch: 12 Global Step: 503960 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:21:37,860-Speed 2628.80 samples/sec Loss 5.2042 LearningRate 0.0154 Epoch: 12 Global Step: 503970 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:21:41,753-Speed 2630.53 samples/sec Loss 5.2902 LearningRate 0.0154 Epoch: 12 Global Step: 503980 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:21:45,627-Speed 2643.87 samples/sec Loss 5.2485 LearningRate 0.0154 Epoch: 12 Global Step: 503990 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:21:49,514-Speed 2635.66 samples/sec Loss 5.3431 LearningRate 0.0154 Epoch: 12 Global Step: 504000 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:21:53,412-Speed 2627.58 samples/sec Loss 5.2843 LearningRate 0.0154 Epoch: 12 Global Step: 504010 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:21:57,321-Speed 2620.37 samples/sec Loss 5.1351 LearningRate 0.0154 Epoch: 12 Global Step: 504020 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:22:01,212-Speed 2632.18 samples/sec Loss 5.3173 LearningRate 0.0154 Epoch: 12 Global Step: 504030 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:22:05,106-Speed 2630.10 samples/sec Loss 5.3736 LearningRate 0.0154 Epoch: 12 Global Step: 504040 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:22:09,000-Speed 2630.34 samples/sec Loss 5.3016 LearningRate 0.0154 Epoch: 12 Global Step: 504050 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:22:12,894-Speed 2630.09 samples/sec Loss 5.2878 LearningRate 0.0154 Epoch: 12 Global Step: 504060 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:22:16,789-Speed 2629.87 samples/sec Loss 5.2704 LearningRate 0.0154 Epoch: 12 Global Step: 504070 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:22:20,708-Speed 2613.31 samples/sec Loss 5.1989 LearningRate 0.0154 Epoch: 12 Global Step: 504080 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:22:24,618-Speed 2619.87 samples/sec Loss 5.1796 LearningRate 0.0154 Epoch: 12 Global Step: 504090 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:22:28,511-Speed 2630.83 samples/sec Loss 5.2248 LearningRate 0.0154 Epoch: 12 Global Step: 504100 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:22:32,387-Speed 2643.00 samples/sec Loss 5.2687 LearningRate 0.0154 Epoch: 12 Global Step: 504110 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:22:36,282-Speed 2629.38 samples/sec Loss 5.2788 LearningRate 0.0154 Epoch: 12 Global Step: 504120 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:22:40,178-Speed 2629.17 samples/sec Loss 5.3257 LearningRate 0.0154 Epoch: 12 Global Step: 504130 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:22:44,072-Speed 2630.07 samples/sec Loss 5.2670 LearningRate 0.0154 Epoch: 12 Global Step: 504140 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:22:47,971-Speed 2626.81 samples/sec Loss 5.2284 LearningRate 0.0154 Epoch: 12 Global Step: 504150 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:22:51,878-Speed 2621.38 samples/sec Loss 5.2765 LearningRate 0.0154 Epoch: 12 Global Step: 504160 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:22:55,796-Speed 2614.19 samples/sec Loss 5.3107 LearningRate 0.0154 Epoch: 12 Global Step: 504170 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:22:59,701-Speed 2623.03 samples/sec Loss 5.3224 LearningRate 0.0154 Epoch: 12 Global Step: 504180 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:23:03,599-Speed 2627.82 samples/sec Loss 5.2345 LearningRate 0.0154 Epoch: 12 Global Step: 504190 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:23:07,502-Speed 2624.50 samples/sec Loss 5.2770 LearningRate 0.0154 Epoch: 12 Global Step: 504200 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:23:11,396-Speed 2630.35 samples/sec Loss 5.2232 LearningRate 0.0154 Epoch: 12 Global Step: 504210 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:23:15,298-Speed 2624.89 samples/sec Loss 5.2199 LearningRate 0.0154 Epoch: 12 Global Step: 504220 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:23:19,199-Speed 2625.79 samples/sec Loss 5.2515 LearningRate 0.0154 Epoch: 12 Global Step: 504230 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:23:23,094-Speed 2629.04 samples/sec Loss 5.2270 LearningRate 0.0154 Epoch: 12 Global Step: 504240 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:23:26,993-Speed 2626.99 samples/sec Loss 5.1770 LearningRate 0.0154 Epoch: 12 Global Step: 504250 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:23:30,864-Speed 2645.93 samples/sec Loss 5.2335 LearningRate 0.0154 Epoch: 12 Global Step: 504260 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:23:34,758-Speed 2630.78 samples/sec Loss 5.2509 LearningRate 0.0154 Epoch: 12 Global Step: 504270 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:23:38,662-Speed 2623.10 samples/sec Loss 5.2138 LearningRate 0.0154 Epoch: 12 Global Step: 504280 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:23:42,580-Speed 2615.47 samples/sec Loss 5.2094 LearningRate 0.0154 Epoch: 12 Global Step: 504290 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:23:46,553-Speed 2577.87 samples/sec Loss 5.2567 LearningRate 0.0154 Epoch: 12 Global Step: 504300 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:23:50,451-Speed 2627.31 samples/sec Loss 5.2193 LearningRate 0.0154 Epoch: 12 Global Step: 504310 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:23:54,355-Speed 2623.50 samples/sec Loss 5.2684 LearningRate 0.0154 Epoch: 12 Global Step: 504320 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:23:58,252-Speed 2628.24 samples/sec Loss 5.2302 LearningRate 0.0154 Epoch: 12 Global Step: 504330 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:24:02,159-Speed 2621.55 samples/sec Loss 5.2976 LearningRate 0.0154 Epoch: 12 Global Step: 504340 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:24:06,060-Speed 2625.64 samples/sec Loss 5.2345 LearningRate 0.0154 Epoch: 12 Global Step: 504350 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:24:09,965-Speed 2623.09 samples/sec Loss 5.3239 LearningRate 0.0154 Epoch: 12 Global Step: 504360 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:24:13,863-Speed 2627.61 samples/sec Loss 5.2742 LearningRate 0.0154 Epoch: 12 Global Step: 504370 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:24:17,777-Speed 2616.69 samples/sec Loss 5.2693 LearningRate 0.0154 Epoch: 12 Global Step: 504380 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:24:21,688-Speed 2619.43 samples/sec Loss 5.2256 LearningRate 0.0154 Epoch: 12 Global Step: 504390 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:24:25,565-Speed 2641.83 samples/sec Loss 5.3691 LearningRate 0.0154 Epoch: 12 Global Step: 504400 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:24:29,463-Speed 2627.20 samples/sec Loss 5.2873 LearningRate 0.0154 Epoch: 12 Global Step: 504410 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:24:33,355-Speed 2631.93 samples/sec Loss 5.3021 LearningRate 0.0154 Epoch: 12 Global Step: 504420 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:24:37,268-Speed 2617.61 samples/sec Loss 5.2480 LearningRate 0.0154 Epoch: 12 Global Step: 504430 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:24:41,167-Speed 2626.56 samples/sec Loss 5.2311 LearningRate 0.0154 Epoch: 12 Global Step: 504440 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:24:45,069-Speed 2625.28 samples/sec Loss 5.2844 LearningRate 0.0154 Epoch: 12 Global Step: 504450 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:24:48,968-Speed 2626.47 samples/sec Loss 5.2590 LearningRate 0.0154 Epoch: 12 Global Step: 504460 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:24:52,858-Speed 2633.11 samples/sec Loss 5.2705 LearningRate 0.0154 Epoch: 12 Global Step: 504470 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:24:56,747-Speed 2633.53 samples/sec Loss 5.2442 LearningRate 0.0154 Epoch: 12 Global Step: 504480 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:25:00,636-Speed 2633.59 samples/sec Loss 5.2460 LearningRate 0.0154 Epoch: 12 Global Step: 504490 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:25:04,539-Speed 2624.75 samples/sec Loss 5.1916 LearningRate 0.0154 Epoch: 12 Global Step: 504500 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:25:08,431-Speed 2631.33 samples/sec Loss 5.2777 LearningRate 0.0154 Epoch: 12 Global Step: 504510 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:25:12,327-Speed 2628.75 samples/sec Loss 5.1688 LearningRate 0.0154 Epoch: 12 Global Step: 504520 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:25:16,223-Speed 2629.23 samples/sec Loss 5.1669 LearningRate 0.0154 Epoch: 12 Global Step: 504530 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:25:20,114-Speed 2632.50 samples/sec Loss 5.1302 LearningRate 0.0154 Epoch: 12 Global Step: 504540 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:25:24,021-Speed 2621.42 samples/sec Loss 5.1763 LearningRate 0.0154 Epoch: 12 Global Step: 504550 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:25:27,916-Speed 2629.41 samples/sec Loss 5.1636 LearningRate 0.0153 Epoch: 12 Global Step: 504560 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:25:31,813-Speed 2628.10 samples/sec Loss 5.2430 LearningRate 0.0153 Epoch: 12 Global Step: 504570 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:25:35,799-Speed 2569.60 samples/sec Loss 5.2217 LearningRate 0.0153 Epoch: 12 Global Step: 504580 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:25:39,694-Speed 2629.73 samples/sec Loss 5.2311 LearningRate 0.0153 Epoch: 12 Global Step: 504590 Fp16 Grad Scale: 131072 Required: 37 hours
Training: 2022-04-15 04:25:43,660-Speed 2587.34 samples/sec Loss 5.2074 LearningRate 0.0153 Epoch: 12 Global Step: 504600 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:25:47,747-Speed 2506.13 samples/sec Loss 5.2032 LearningRate 0.0153 Epoch: 12 Global Step: 504610 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:25:51,841-Speed 2502.22 samples/sec Loss 5.2511 LearningRate 0.0153 Epoch: 12 Global Step: 504620 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:25:55,794-Speed 2590.89 samples/sec Loss 5.2766 LearningRate 0.0153 Epoch: 12 Global Step: 504630 Fp16 Grad Scale: 65536 Required: 37 hours
Training: 2022-04-15 04:25:59,671-Speed 2641.46 samples/sec Loss 5.2923 LearningRate 0.0153 Epoch: 12 Global Step: 504640 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:26:03,569-Speed 2627.57 samples/sec Loss 5.3459 LearningRate 0.0153 Epoch: 12 Global Step: 504650 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:26:07,476-Speed 2621.33 samples/sec Loss 5.1381 LearningRate 0.0153 Epoch: 12 Global Step: 504660 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:26:11,374-Speed 2627.92 samples/sec Loss 5.2781 LearningRate 0.0153 Epoch: 12 Global Step: 504670 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:26:15,272-Speed 2627.45 samples/sec Loss 5.1294 LearningRate 0.0153 Epoch: 12 Global Step: 504680 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:26:19,170-Speed 2627.85 samples/sec Loss 5.3053 LearningRate 0.0153 Epoch: 12 Global Step: 504690 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:26:23,069-Speed 2626.87 samples/sec Loss 5.1205 LearningRate 0.0153 Epoch: 12 Global Step: 504700 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:26:26,964-Speed 2630.06 samples/sec Loss 5.2255 LearningRate 0.0153 Epoch: 12 Global Step: 504710 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:26:30,855-Speed 2632.50 samples/sec Loss 5.2935 LearningRate 0.0153 Epoch: 12 Global Step: 504720 Fp16 Grad Scale: 32768 Required: 37 hours
Training: 2022-04-15 04:26:34,747-Speed 2631.20 samples/sec Loss 5.0982 LearningRate 0.0153 Epoch: 12 Global Step: 504730 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:26:38,639-Speed 2631.35 samples/sec Loss 5.2547 LearningRate 0.0153 Epoch: 12 Global Step: 504740 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:26:42,532-Speed 2631.31 samples/sec Loss 5.2575 LearningRate 0.0153 Epoch: 12 Global Step: 504750 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:26:46,425-Speed 2630.62 samples/sec Loss 5.3285 LearningRate 0.0153 Epoch: 12 Global Step: 504760 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:26:50,318-Speed 2631.21 samples/sec Loss 5.4366 LearningRate 0.0153 Epoch: 12 Global Step: 504770 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:26:54,210-Speed 2631.48 samples/sec Loss 5.1632 LearningRate 0.0153 Epoch: 12 Global Step: 504780 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:26:58,101-Speed 2633.04 samples/sec Loss 5.1768 LearningRate 0.0153 Epoch: 12 Global Step: 504790 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:27:01,997-Speed 2628.87 samples/sec Loss 5.2565 LearningRate 0.0153 Epoch: 12 Global Step: 504800 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:27:05,894-Speed 2628.34 samples/sec Loss 5.2847 LearningRate 0.0153 Epoch: 12 Global Step: 504810 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:27:09,794-Speed 2625.73 samples/sec Loss 5.2565 LearningRate 0.0153 Epoch: 12 Global Step: 504820 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:27:13,689-Speed 2630.62 samples/sec Loss 5.1299 LearningRate 0.0153 Epoch: 12 Global Step: 504830 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:27:17,606-Speed 2614.68 samples/sec Loss 5.3382 LearningRate 0.0153 Epoch: 12 Global Step: 504840 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:27:21,504-Speed 2627.57 samples/sec Loss 5.2804 LearningRate 0.0153 Epoch: 12 Global Step: 504850 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:27:25,557-Speed 2527.05 samples/sec Loss 5.2233 LearningRate 0.0153 Epoch: 12 Global Step: 504860 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:27:29,471-Speed 2617.06 samples/sec Loss 5.1419 LearningRate 0.0153 Epoch: 12 Global Step: 504870 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:27:33,349-Speed 2641.01 samples/sec Loss 5.2663 LearningRate 0.0153 Epoch: 12 Global Step: 504880 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:27:37,342-Speed 2564.80 samples/sec Loss 5.3150 LearningRate 0.0153 Epoch: 12 Global Step: 504890 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:27:41,241-Speed 2626.96 samples/sec Loss 5.2369 LearningRate 0.0153 Epoch: 12 Global Step: 504900 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:27:45,140-Speed 2627.47 samples/sec Loss 5.2214 LearningRate 0.0153 Epoch: 12 Global Step: 504910 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:27:49,043-Speed 2623.85 samples/sec Loss 5.2687 LearningRate 0.0153 Epoch: 12 Global Step: 504920 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:27:52,951-Speed 2621.27 samples/sec Loss 5.2236 LearningRate 0.0153 Epoch: 12 Global Step: 504930 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:27:56,852-Speed 2625.74 samples/sec Loss 5.2309 LearningRate 0.0153 Epoch: 12 Global Step: 504940 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:28:00,745-Speed 2630.84 samples/sec Loss 5.0182 LearningRate 0.0153 Epoch: 12 Global Step: 504950 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:28:04,645-Speed 2626.17 samples/sec Loss 5.1565 LearningRate 0.0153 Epoch: 12 Global Step: 504960 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:28:08,544-Speed 2626.72 samples/sec Loss 5.2568 LearningRate 0.0153 Epoch: 12 Global Step: 504970 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:28:12,440-Speed 2629.14 samples/sec Loss 5.2275 LearningRate 0.0153 Epoch: 12 Global Step: 504980 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:28:16,330-Speed 2633.37 samples/sec Loss 5.3118 LearningRate 0.0153 Epoch: 12 Global Step: 504990 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:28:20,225-Speed 2630.24 samples/sec Loss 5.2653 LearningRate 0.0153 Epoch: 12 Global Step: 505000 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:28:24,118-Speed 2631.70 samples/sec Loss 5.1642 LearningRate 0.0153 Epoch: 12 Global Step: 505010 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:28:28,012-Speed 2630.26 samples/sec Loss 5.1998 LearningRate 0.0153 Epoch: 12 Global Step: 505020 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:28:31,919-Speed 2621.18 samples/sec Loss 5.2177 LearningRate 0.0153 Epoch: 12 Global Step: 505030 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:28:35,829-Speed 2619.84 samples/sec Loss 5.2822 LearningRate 0.0153 Epoch: 12 Global Step: 505040 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:28:39,730-Speed 2626.14 samples/sec Loss 5.2717 LearningRate 0.0153 Epoch: 12 Global Step: 505050 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:28:43,636-Speed 2622.74 samples/sec Loss 5.3102 LearningRate 0.0153 Epoch: 12 Global Step: 505060 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:28:47,534-Speed 2628.06 samples/sec Loss 5.2753 LearningRate 0.0153 Epoch: 12 Global Step: 505070 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:28:51,430-Speed 2628.93 samples/sec Loss 5.3157 LearningRate 0.0153 Epoch: 12 Global Step: 505080 Fp16 Grad Scale: 262144 Required: 36 hours
Training: 2022-04-15 04:28:55,308-Speed 2641.32 samples/sec Loss 5.2752 LearningRate 0.0153 Epoch: 12 Global Step: 505090 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:28:59,203-Speed 2629.86 samples/sec Loss 5.2541 LearningRate 0.0153 Epoch: 12 Global Step: 505100 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:29:03,101-Speed 2628.08 samples/sec Loss 5.2787 LearningRate 0.0153 Epoch: 12 Global Step: 505110 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:29:07,021-Speed 2612.69 samples/sec Loss 5.2041 LearningRate 0.0153 Epoch: 12 Global Step: 505120 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:29:10,919-Speed 2627.31 samples/sec Loss 5.2674 LearningRate 0.0153 Epoch: 12 Global Step: 505130 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:29:14,818-Speed 2627.63 samples/sec Loss 5.2587 LearningRate 0.0153 Epoch: 12 Global Step: 505140 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:29:18,717-Speed 2627.15 samples/sec Loss 5.1914 LearningRate 0.0153 Epoch: 12 Global Step: 505150 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:29:22,613-Speed 2628.43 samples/sec Loss 5.1997 LearningRate 0.0153 Epoch: 12 Global Step: 505160 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:29:26,505-Speed 2631.58 samples/sec Loss 5.2302 LearningRate 0.0153 Epoch: 12 Global Step: 505170 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:29:30,403-Speed 2627.77 samples/sec Loss 5.1479 LearningRate 0.0153 Epoch: 12 Global Step: 505180 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:29:34,279-Speed 2642.63 samples/sec Loss 5.2157 LearningRate 0.0153 Epoch: 12 Global Step: 505190 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:29:38,138-Speed 2654.42 samples/sec Loss 5.2356 LearningRate 0.0153 Epoch: 12 Global Step: 505200 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:29:42,033-Speed 2630.11 samples/sec Loss 5.2459 LearningRate 0.0153 Epoch: 12 Global Step: 505210 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:29:45,929-Speed 2629.02 samples/sec Loss 5.1902 LearningRate 0.0153 Epoch: 12 Global Step: 505220 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:29:49,844-Speed 2615.82 samples/sec Loss 5.1932 LearningRate 0.0153 Epoch: 12 Global Step: 505230 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:29:53,767-Speed 2610.93 samples/sec Loss 5.1994 LearningRate 0.0153 Epoch: 12 Global Step: 505240 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:29:57,663-Speed 2629.80 samples/sec Loss 5.3487 LearningRate 0.0153 Epoch: 12 Global Step: 505250 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:30:01,606-Speed 2597.44 samples/sec Loss 5.2214 LearningRate 0.0153 Epoch: 12 Global Step: 505260 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:30:05,502-Speed 2629.12 samples/sec Loss 5.2799 LearningRate 0.0153 Epoch: 12 Global Step: 505270 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:30:09,452-Speed 2592.78 samples/sec Loss 5.2761 LearningRate 0.0153 Epoch: 12 Global Step: 505280 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:30:13,353-Speed 2626.40 samples/sec Loss 5.2272 LearningRate 0.0153 Epoch: 12 Global Step: 505290 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:30:17,250-Speed 2628.11 samples/sec Loss 5.1771 LearningRate 0.0153 Epoch: 12 Global Step: 505300 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:30:21,151-Speed 2625.21 samples/sec Loss 5.2610 LearningRate 0.0153 Epoch: 12 Global Step: 505310 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:30:25,047-Speed 2629.26 samples/sec Loss 5.2095 LearningRate 0.0153 Epoch: 12 Global Step: 505320 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:30:28,942-Speed 2630.20 samples/sec Loss 5.2747 LearningRate 0.0153 Epoch: 12 Global Step: 505330 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:30:32,838-Speed 2628.48 samples/sec Loss 5.1806 LearningRate 0.0153 Epoch: 12 Global Step: 505340 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:30:36,713-Speed 2642.98 samples/sec Loss 5.2640 LearningRate 0.0153 Epoch: 12 Global Step: 505350 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:30:40,620-Speed 2630.64 samples/sec Loss 5.2174 LearningRate 0.0153 Epoch: 12 Global Step: 505360 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:30:44,535-Speed 2616.41 samples/sec Loss 5.1667 LearningRate 0.0153 Epoch: 12 Global Step: 505370 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:30:48,459-Speed 2610.06 samples/sec Loss 5.1875 LearningRate 0.0153 Epoch: 12 Global Step: 505380 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:30:52,362-Speed 2624.86 samples/sec Loss 5.2427 LearningRate 0.0153 Epoch: 12 Global Step: 505390 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:30:56,264-Speed 2624.31 samples/sec Loss 5.2296 LearningRate 0.0153 Epoch: 12 Global Step: 505400 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:31:00,163-Speed 2627.48 samples/sec Loss 5.2692 LearningRate 0.0153 Epoch: 12 Global Step: 505410 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:31:04,056-Speed 2630.81 samples/sec Loss 5.1854 LearningRate 0.0153 Epoch: 12 Global Step: 505420 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:31:07,952-Speed 2629.62 samples/sec Loss 5.2084 LearningRate 0.0153 Epoch: 12 Global Step: 505430 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:31:11,847-Speed 2629.43 samples/sec Loss 5.2481 LearningRate 0.0153 Epoch: 12 Global Step: 505440 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:31:15,742-Speed 2629.86 samples/sec Loss 5.2121 LearningRate 0.0153 Epoch: 12 Global Step: 505450 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:31:19,651-Speed 2620.54 samples/sec Loss 5.1585 LearningRate 0.0153 Epoch: 12 Global Step: 505460 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:31:23,561-Speed 2619.79 samples/sec Loss 5.3079 LearningRate 0.0153 Epoch: 12 Global Step: 505470 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:31:27,482-Speed 2612.71 samples/sec Loss 5.3541 LearningRate 0.0153 Epoch: 12 Global Step: 505480 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:31:31,380-Speed 2627.31 samples/sec Loss 5.2971 LearningRate 0.0153 Epoch: 12 Global Step: 505490 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:31:35,392-Speed 2553.00 samples/sec Loss 5.2154 LearningRate 0.0153 Epoch: 12 Global Step: 505500 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:31:39,300-Speed 2620.74 samples/sec Loss 5.2121 LearningRate 0.0153 Epoch: 12 Global Step: 505510 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:31:43,198-Speed 2628.41 samples/sec Loss 5.2664 LearningRate 0.0153 Epoch: 12 Global Step: 505520 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:31:47,108-Speed 2619.39 samples/sec Loss 5.2887 LearningRate 0.0153 Epoch: 12 Global Step: 505530 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:31:51,005-Speed 2628.71 samples/sec Loss 5.3047 LearningRate 0.0153 Epoch: 12 Global Step: 505540 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:31:54,903-Speed 2628.10 samples/sec Loss 5.3225 LearningRate 0.0153 Epoch: 12 Global Step: 505550 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:31:58,802-Speed 2627.36 samples/sec Loss 5.2050 LearningRate 0.0153 Epoch: 12 Global Step: 505560 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:32:02,703-Speed 2625.10 samples/sec Loss 5.2794 LearningRate 0.0153 Epoch: 12 Global Step: 505570 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:32:06,601-Speed 2627.92 samples/sec Loss 5.2650 LearningRate 0.0153 Epoch: 12 Global Step: 505580 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:32:10,488-Speed 2634.93 samples/sec Loss 5.2540 LearningRate 0.0153 Epoch: 12 Global Step: 505590 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:32:14,394-Speed 2622.89 samples/sec Loss 5.3149 LearningRate 0.0153 Epoch: 12 Global Step: 505600 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:32:18,292-Speed 2627.78 samples/sec Loss 5.1920 LearningRate 0.0153 Epoch: 12 Global Step: 505610 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:32:22,217-Speed 2610.02 samples/sec Loss 5.2546 LearningRate 0.0152 Epoch: 12 Global Step: 505620 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:32:26,122-Speed 2622.50 samples/sec Loss 5.1907 LearningRate 0.0152 Epoch: 12 Global Step: 505630 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:32:30,051-Speed 2607.65 samples/sec Loss 5.3320 LearningRate 0.0152 Epoch: 12 Global Step: 505640 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:32:33,942-Speed 2632.15 samples/sec Loss 5.2071 LearningRate 0.0152 Epoch: 12 Global Step: 505650 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:32:37,836-Speed 2630.14 samples/sec Loss 5.2064 LearningRate 0.0152 Epoch: 12 Global Step: 505660 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:32:41,732-Speed 2628.59 samples/sec Loss 5.2381 LearningRate 0.0152 Epoch: 12 Global Step: 505670 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:32:45,626-Speed 2633.64 samples/sec Loss 5.2383 LearningRate 0.0152 Epoch: 12 Global Step: 505680 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:32:49,520-Speed 2631.83 samples/sec Loss 5.2876 LearningRate 0.0152 Epoch: 12 Global Step: 505690 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:32:53,438-Speed 2614.58 samples/sec Loss 5.2026 LearningRate 0.0152 Epoch: 12 Global Step: 505700 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:32:57,351-Speed 2617.59 samples/sec Loss 5.2776 LearningRate 0.0152 Epoch: 12 Global Step: 505710 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:33:01,252-Speed 2626.17 samples/sec Loss 5.2335 LearningRate 0.0152 Epoch: 12 Global Step: 505720 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:33:05,199-Speed 2594.32 samples/sec Loss 5.2187 LearningRate 0.0152 Epoch: 12 Global Step: 505730 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:33:09,095-Speed 2629.16 samples/sec Loss 5.1715 LearningRate 0.0152 Epoch: 12 Global Step: 505740 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:33:12,994-Speed 2627.20 samples/sec Loss 5.2332 LearningRate 0.0152 Epoch: 12 Global Step: 505750 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:33:16,890-Speed 2629.38 samples/sec Loss 5.2494 LearningRate 0.0152 Epoch: 12 Global Step: 505760 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:33:20,783-Speed 2630.98 samples/sec Loss 5.2477 LearningRate 0.0152 Epoch: 12 Global Step: 505770 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:33:24,690-Speed 2621.91 samples/sec Loss 5.3436 LearningRate 0.0152 Epoch: 12 Global Step: 505780 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:33:28,585-Speed 2630.11 samples/sec Loss 5.1702 LearningRate 0.0152 Epoch: 12 Global Step: 505790 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:33:32,480-Speed 2629.83 samples/sec Loss 5.2662 LearningRate 0.0152 Epoch: 12 Global Step: 505800 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:33:36,375-Speed 2629.55 samples/sec Loss 5.2902 LearningRate 0.0152 Epoch: 12 Global Step: 505810 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:33:40,305-Speed 2606.08 samples/sec Loss 5.1604 LearningRate 0.0152 Epoch: 12 Global Step: 505820 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:33:44,220-Speed 2616.23 samples/sec Loss 5.3167 LearningRate 0.0152 Epoch: 12 Global Step: 505830 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:33:48,141-Speed 2612.71 samples/sec Loss 5.1127 LearningRate 0.0152 Epoch: 12 Global Step: 505840 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:33:52,032-Speed 2632.05 samples/sec Loss 5.2418 LearningRate 0.0152 Epoch: 12 Global Step: 505850 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:33:55,931-Speed 2626.96 samples/sec Loss 5.2206 LearningRate 0.0152 Epoch: 12 Global Step: 505860 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:33:59,829-Speed 2627.94 samples/sec Loss 5.2920 LearningRate 0.0152 Epoch: 12 Global Step: 505870 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:34:03,726-Speed 2627.94 samples/sec Loss 5.2856 LearningRate 0.0152 Epoch: 12 Global Step: 505880 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:34:07,595-Speed 2647.76 samples/sec Loss 5.2460 LearningRate 0.0152 Epoch: 12 Global Step: 505890 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:34:11,489-Speed 2630.38 samples/sec Loss 5.1202 LearningRate 0.0152 Epoch: 12 Global Step: 505900 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:34:15,385-Speed 2629.08 samples/sec Loss 5.2432 LearningRate 0.0152 Epoch: 12 Global Step: 505910 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:34:19,280-Speed 2629.51 samples/sec Loss 5.1759 LearningRate 0.0152 Epoch: 12 Global Step: 505920 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:34:23,175-Speed 2629.46 samples/sec Loss 5.2562 LearningRate 0.0152 Epoch: 12 Global Step: 505930 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:34:27,078-Speed 2623.55 samples/sec Loss 5.3507 LearningRate 0.0152 Epoch: 12 Global Step: 505940 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:34:30,983-Speed 2623.18 samples/sec Loss 5.2151 LearningRate 0.0152 Epoch: 12 Global Step: 505950 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:34:34,879-Speed 2629.29 samples/sec Loss 5.3045 LearningRate 0.0152 Epoch: 12 Global Step: 505960 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:34:38,782-Speed 2624.52 samples/sec Loss 5.2133 LearningRate 0.0152 Epoch: 12 Global Step: 505970 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:34:42,679-Speed 2628.87 samples/sec Loss 5.1919 LearningRate 0.0152 Epoch: 12 Global Step: 505980 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:34:46,574-Speed 2629.02 samples/sec Loss 5.2329 LearningRate 0.0152 Epoch: 12 Global Step: 505990 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:34:50,445-Speed 2646.30 samples/sec Loss 5.2225 LearningRate 0.0152 Epoch: 12 Global Step: 506000 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:34:54,346-Speed 2625.34 samples/sec Loss 5.1881 LearningRate 0.0152 Epoch: 12 Global Step: 506010 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:34:58,240-Speed 2630.37 samples/sec Loss 5.2383 LearningRate 0.0152 Epoch: 12 Global Step: 506020 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:35:02,149-Speed 2619.43 samples/sec Loss 5.2191 LearningRate 0.0152 Epoch: 12 Global Step: 506030 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:35:06,049-Speed 2626.83 samples/sec Loss 5.1355 LearningRate 0.0152 Epoch: 12 Global Step: 506040 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:35:09,943-Speed 2630.25 samples/sec Loss 5.2224 LearningRate 0.0152 Epoch: 12 Global Step: 506050 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:35:13,838-Speed 2630.00 samples/sec Loss 5.3063 LearningRate 0.0152 Epoch: 12 Global Step: 506060 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:35:17,733-Speed 2629.96 samples/sec Loss 5.2003 LearningRate 0.0152 Epoch: 12 Global Step: 506070 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:35:21,639-Speed 2622.07 samples/sec Loss 5.2284 LearningRate 0.0152 Epoch: 12 Global Step: 506080 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:35:25,535-Speed 2628.61 samples/sec Loss 5.2285 LearningRate 0.0152 Epoch: 12 Global Step: 506090 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:35:29,422-Speed 2635.07 samples/sec Loss 5.2927 LearningRate 0.0152 Epoch: 12 Global Step: 506100 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:35:33,314-Speed 2631.49 samples/sec Loss 5.2454 LearningRate 0.0152 Epoch: 12 Global Step: 506110 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:35:37,221-Speed 2621.97 samples/sec Loss 5.1162 LearningRate 0.0152 Epoch: 12 Global Step: 506120 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:35:41,118-Speed 2627.90 samples/sec Loss 5.2234 LearningRate 0.0152 Epoch: 12 Global Step: 506130 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:35:45,016-Speed 2627.61 samples/sec Loss 5.2447 LearningRate 0.0152 Epoch: 12 Global Step: 506140 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:35:48,909-Speed 2631.47 samples/sec Loss 5.1657 LearningRate 0.0152 Epoch: 12 Global Step: 506150 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:35:52,816-Speed 2621.71 samples/sec Loss 5.2206 LearningRate 0.0152 Epoch: 12 Global Step: 506160 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:35:56,716-Speed 2625.98 samples/sec Loss 5.2075 LearningRate 0.0152 Epoch: 12 Global Step: 506170 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:36:00,616-Speed 2626.11 samples/sec Loss 5.1994 LearningRate 0.0152 Epoch: 12 Global Step: 506180 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:36:04,525-Speed 2620.33 samples/sec Loss 5.2672 LearningRate 0.0152 Epoch: 12 Global Step: 506190 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:36:08,435-Speed 2619.24 samples/sec Loss 5.1941 LearningRate 0.0152 Epoch: 12 Global Step: 506200 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:36:12,340-Speed 2623.57 samples/sec Loss 5.3077 LearningRate 0.0152 Epoch: 12 Global Step: 506210 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:36:16,247-Speed 2620.87 samples/sec Loss 5.2239 LearningRate 0.0152 Epoch: 12 Global Step: 506220 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:36:20,153-Speed 2622.58 samples/sec Loss 5.1846 LearningRate 0.0152 Epoch: 12 Global Step: 506230 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:36:24,054-Speed 2625.18 samples/sec Loss 5.2754 LearningRate 0.0152 Epoch: 12 Global Step: 506240 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:36:27,950-Speed 2628.89 samples/sec Loss 5.1979 LearningRate 0.0152 Epoch: 12 Global Step: 506250 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:36:31,845-Speed 2629.82 samples/sec Loss 5.2748 LearningRate 0.0152 Epoch: 12 Global Step: 506260 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:36:35,738-Speed 2631.61 samples/sec Loss 5.2169 LearningRate 0.0152 Epoch: 12 Global Step: 506270 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:36:39,637-Speed 2627.27 samples/sec Loss 5.2065 LearningRate 0.0152 Epoch: 12 Global Step: 506280 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:36:43,532-Speed 2629.55 samples/sec Loss 5.2113 LearningRate 0.0152 Epoch: 12 Global Step: 506290 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:36:47,419-Speed 2635.68 samples/sec Loss 5.1461 LearningRate 0.0152 Epoch: 12 Global Step: 506300 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:36:51,324-Speed 2622.63 samples/sec Loss 5.2259 LearningRate 0.0152 Epoch: 12 Global Step: 506310 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:36:55,219-Speed 2629.83 samples/sec Loss 5.2721 LearningRate 0.0152 Epoch: 12 Global Step: 506320 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:36:59,121-Speed 2624.47 samples/sec Loss 5.1825 LearningRate 0.0152 Epoch: 12 Global Step: 506330 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:37:03,032-Speed 2618.65 samples/sec Loss 5.1714 LearningRate 0.0152 Epoch: 12 Global Step: 506340 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:37:06,929-Speed 2628.63 samples/sec Loss 5.2060 LearningRate 0.0152 Epoch: 12 Global Step: 506350 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:37:10,830-Speed 2626.19 samples/sec Loss 5.3052 LearningRate 0.0152 Epoch: 12 Global Step: 506360 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:37:14,719-Speed 2633.41 samples/sec Loss 5.2193 LearningRate 0.0152 Epoch: 12 Global Step: 506370 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:37:18,629-Speed 2619.51 samples/sec Loss 5.2777 LearningRate 0.0152 Epoch: 12 Global Step: 506380 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:37:22,557-Speed 2607.78 samples/sec Loss 5.2823 LearningRate 0.0152 Epoch: 12 Global Step: 506390 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:37:26,468-Speed 2618.96 samples/sec Loss 5.3134 LearningRate 0.0152 Epoch: 12 Global Step: 506400 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:37:30,363-Speed 2629.63 samples/sec Loss 5.1615 LearningRate 0.0152 Epoch: 12 Global Step: 506410 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:37:34,276-Speed 2617.18 samples/sec Loss 5.2499 LearningRate 0.0152 Epoch: 12 Global Step: 506420 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:37:38,184-Speed 2620.64 samples/sec Loss 5.3783 LearningRate 0.0152 Epoch: 12 Global Step: 506430 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:37:42,078-Speed 2630.06 samples/sec Loss 5.2744 LearningRate 0.0152 Epoch: 12 Global Step: 506440 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:37:45,976-Speed 2628.61 samples/sec Loss 5.1946 LearningRate 0.0152 Epoch: 12 Global Step: 506450 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:37:49,871-Speed 2629.58 samples/sec Loss 5.2457 LearningRate 0.0152 Epoch: 12 Global Step: 506460 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:37:53,770-Speed 2626.97 samples/sec Loss 5.2527 LearningRate 0.0152 Epoch: 12 Global Step: 506470 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:37:57,673-Speed 2626.54 samples/sec Loss 5.3108 LearningRate 0.0152 Epoch: 12 Global Step: 506480 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:38:01,568-Speed 2629.27 samples/sec Loss 5.2572 LearningRate 0.0152 Epoch: 12 Global Step: 506490 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:38:05,465-Speed 2627.92 samples/sec Loss 5.2146 LearningRate 0.0152 Epoch: 12 Global Step: 506500 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:38:09,361-Speed 2629.34 samples/sec Loss 5.2208 LearningRate 0.0152 Epoch: 12 Global Step: 506510 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:38:13,262-Speed 2625.45 samples/sec Loss 5.2410 LearningRate 0.0152 Epoch: 12 Global Step: 506520 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:38:17,161-Speed 2626.80 samples/sec Loss 5.2872 LearningRate 0.0152 Epoch: 12 Global Step: 506530 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:38:21,055-Speed 2630.51 samples/sec Loss 5.1286 LearningRate 0.0152 Epoch: 12 Global Step: 506540 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:38:24,950-Speed 2629.60 samples/sec Loss 5.2147 LearningRate 0.0152 Epoch: 12 Global Step: 506550 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:38:28,850-Speed 2626.55 samples/sec Loss 5.2606 LearningRate 0.0152 Epoch: 12 Global Step: 506560 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:38:32,770-Speed 2612.70 samples/sec Loss 5.1395 LearningRate 0.0152 Epoch: 12 Global Step: 506570 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:38:36,683-Speed 2617.17 samples/sec Loss 5.1754 LearningRate 0.0152 Epoch: 12 Global Step: 506580 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:38:40,559-Speed 2642.72 samples/sec Loss 5.1680 LearningRate 0.0152 Epoch: 12 Global Step: 506590 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:38:44,456-Speed 2628.63 samples/sec Loss 5.2027 LearningRate 0.0152 Epoch: 12 Global Step: 506600 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:38:48,353-Speed 2627.83 samples/sec Loss 5.2441 LearningRate 0.0152 Epoch: 12 Global Step: 506610 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:38:52,273-Speed 2613.24 samples/sec Loss 5.2380 LearningRate 0.0152 Epoch: 12 Global Step: 506620 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:38:56,171-Speed 2627.53 samples/sec Loss 5.2402 LearningRate 0.0152 Epoch: 12 Global Step: 506630 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:39:00,070-Speed 2626.69 samples/sec Loss 5.2122 LearningRate 0.0152 Epoch: 12 Global Step: 506640 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:39:03,967-Speed 2628.80 samples/sec Loss 5.2437 LearningRate 0.0152 Epoch: 12 Global Step: 506650 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:39:07,860-Speed 2630.39 samples/sec Loss 5.2455 LearningRate 0.0152 Epoch: 12 Global Step: 506660 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:39:11,754-Speed 2630.61 samples/sec Loss 5.2809 LearningRate 0.0152 Epoch: 12 Global Step: 506670 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:39:15,650-Speed 2628.99 samples/sec Loss 5.2750 LearningRate 0.0152 Epoch: 12 Global Step: 506680 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:39:19,543-Speed 2630.51 samples/sec Loss 5.1582 LearningRate 0.0151 Epoch: 12 Global Step: 506690 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:39:23,440-Speed 2628.40 samples/sec Loss 5.1801 LearningRate 0.0151 Epoch: 12 Global Step: 506700 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:39:27,338-Speed 2627.83 samples/sec Loss 5.2093 LearningRate 0.0151 Epoch: 12 Global Step: 506710 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:39:31,242-Speed 2623.83 samples/sec Loss 5.1306 LearningRate 0.0151 Epoch: 12 Global Step: 506720 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:39:35,133-Speed 2632.13 samples/sec Loss 5.2315 LearningRate 0.0151 Epoch: 12 Global Step: 506730 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:39:39,032-Speed 2626.26 samples/sec Loss 5.3317 LearningRate 0.0151 Epoch: 12 Global Step: 506740 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:39:42,941-Speed 2620.17 samples/sec Loss 5.2613 LearningRate 0.0151 Epoch: 12 Global Step: 506750 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:39:46,853-Speed 2618.90 samples/sec Loss 5.2376 LearningRate 0.0151 Epoch: 12 Global Step: 506760 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:39:50,756-Speed 2624.21 samples/sec Loss 5.1989 LearningRate 0.0151 Epoch: 12 Global Step: 506770 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:39:54,657-Speed 2625.31 samples/sec Loss 5.1269 LearningRate 0.0151 Epoch: 12 Global Step: 506780 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:39:58,531-Speed 2644.26 samples/sec Loss 5.3681 LearningRate 0.0151 Epoch: 12 Global Step: 506790 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:40:02,434-Speed 2624.39 samples/sec Loss 5.1209 LearningRate 0.0151 Epoch: 12 Global Step: 506800 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:40:06,339-Speed 2622.41 samples/sec Loss 5.2289 LearningRate 0.0151 Epoch: 12 Global Step: 506810 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:40:10,236-Speed 2628.47 samples/sec Loss 5.1188 LearningRate 0.0151 Epoch: 12 Global Step: 506820 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:40:14,131-Speed 2629.64 samples/sec Loss 5.2270 LearningRate 0.0151 Epoch: 12 Global Step: 506830 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:40:18,041-Speed 2619.51 samples/sec Loss 5.1440 LearningRate 0.0151 Epoch: 12 Global Step: 506840 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:40:21,973-Speed 2604.98 samples/sec Loss 5.2405 LearningRate 0.0151 Epoch: 12 Global Step: 506850 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:40:25,866-Speed 2631.23 samples/sec Loss 5.1845 LearningRate 0.0151 Epoch: 12 Global Step: 506860 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:40:29,764-Speed 2627.34 samples/sec Loss 5.2771 LearningRate 0.0151 Epoch: 12 Global Step: 506870 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:40:33,655-Speed 2632.57 samples/sec Loss 5.1615 LearningRate 0.0151 Epoch: 12 Global Step: 506880 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:40:37,546-Speed 2631.94 samples/sec Loss 5.1936 LearningRate 0.0151 Epoch: 12 Global Step: 506890 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:40:41,440-Speed 2630.26 samples/sec Loss 5.1915 LearningRate 0.0151 Epoch: 12 Global Step: 506900 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:40:45,336-Speed 2629.27 samples/sec Loss 5.2346 LearningRate 0.0151 Epoch: 12 Global Step: 506910 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:40:49,236-Speed 2626.50 samples/sec Loss 5.1981 LearningRate 0.0151 Epoch: 12 Global Step: 506920 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:40:53,151-Speed 2615.87 samples/sec Loss 5.2229 LearningRate 0.0151 Epoch: 12 Global Step: 506930 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:40:57,041-Speed 2632.96 samples/sec Loss 5.1960 LearningRate 0.0151 Epoch: 12 Global Step: 506940 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:41:00,948-Speed 2621.90 samples/sec Loss 5.1520 LearningRate 0.0151 Epoch: 12 Global Step: 506950 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:41:04,847-Speed 2626.67 samples/sec Loss 5.2055 LearningRate 0.0151 Epoch: 12 Global Step: 506960 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:41:08,759-Speed 2618.61 samples/sec Loss 5.2334 LearningRate 0.0151 Epoch: 12 Global Step: 506970 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:41:12,654-Speed 2629.23 samples/sec Loss 5.2434 LearningRate 0.0151 Epoch: 12 Global Step: 506980 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:41:16,528-Speed 2643.94 samples/sec Loss 5.2055 LearningRate 0.0151 Epoch: 12 Global Step: 506990 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:41:20,434-Speed 2623.18 samples/sec Loss 5.2247 LearningRate 0.0151 Epoch: 12 Global Step: 507000 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:41:24,333-Speed 2626.84 samples/sec Loss 5.2004 LearningRate 0.0151 Epoch: 12 Global Step: 507010 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:41:28,230-Speed 2627.94 samples/sec Loss 5.2553 LearningRate 0.0151 Epoch: 12 Global Step: 507020 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:41:32,162-Speed 2604.93 samples/sec Loss 5.1535 LearningRate 0.0151 Epoch: 12 Global Step: 507030 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:41:36,153-Speed 2566.33 samples/sec Loss 5.1768 LearningRate 0.0151 Epoch: 12 Global Step: 507040 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:41:40,049-Speed 2629.33 samples/sec Loss 5.1483 LearningRate 0.0151 Epoch: 12 Global Step: 507050 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:41:43,957-Speed 2621.15 samples/sec Loss 5.2212 LearningRate 0.0151 Epoch: 12 Global Step: 507060 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:41:47,887-Speed 2605.91 samples/sec Loss 5.1449 LearningRate 0.0151 Epoch: 12 Global Step: 507070 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:41:51,783-Speed 2629.90 samples/sec Loss 5.2571 LearningRate 0.0151 Epoch: 12 Global Step: 507080 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:41:55,680-Speed 2628.16 samples/sec Loss 5.1600 LearningRate 0.0151 Epoch: 12 Global Step: 507090 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:41:59,586-Speed 2621.56 samples/sec Loss 5.1871 LearningRate 0.0151 Epoch: 12 Global Step: 507100 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:42:03,479-Speed 2631.01 samples/sec Loss 5.1720 LearningRate 0.0151 Epoch: 12 Global Step: 507110 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:42:07,408-Speed 2607.39 samples/sec Loss 5.2162 LearningRate 0.0151 Epoch: 12 Global Step: 507120 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:42:11,353-Speed 2596.64 samples/sec Loss 5.1592 LearningRate 0.0151 Epoch: 12 Global Step: 507130 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:42:15,243-Speed 2632.29 samples/sec Loss 5.1597 LearningRate 0.0151 Epoch: 12 Global Step: 507140 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:42:19,158-Speed 2617.01 samples/sec Loss 5.2103 LearningRate 0.0151 Epoch: 12 Global Step: 507150 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:42:23,057-Speed 2626.11 samples/sec Loss 5.2256 LearningRate 0.0151 Epoch: 12 Global Step: 507160 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:42:26,954-Speed 2628.99 samples/sec Loss 5.2376 LearningRate 0.0151 Epoch: 12 Global Step: 507170 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:42:30,845-Speed 2632.25 samples/sec Loss 5.1728 LearningRate 0.0151 Epoch: 12 Global Step: 507180 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:42:34,750-Speed 2623.08 samples/sec Loss 5.1793 LearningRate 0.0151 Epoch: 12 Global Step: 507190 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:42:38,649-Speed 2626.90 samples/sec Loss 5.2947 LearningRate 0.0151 Epoch: 12 Global Step: 507200 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:42:42,545-Speed 2628.45 samples/sec Loss 5.3773 LearningRate 0.0151 Epoch: 12 Global Step: 507210 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:42:46,456-Speed 2619.45 samples/sec Loss 5.1795 LearningRate 0.0151 Epoch: 12 Global Step: 507220 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:42:50,330-Speed 2643.62 samples/sec Loss 5.1931 LearningRate 0.0151 Epoch: 12 Global Step: 507230 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:42:54,237-Speed 2621.49 samples/sec Loss 5.2939 LearningRate 0.0151 Epoch: 12 Global Step: 507240 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:42:58,152-Speed 2616.61 samples/sec Loss 5.1681 LearningRate 0.0151 Epoch: 12 Global Step: 507250 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:43:02,054-Speed 2624.41 samples/sec Loss 5.2206 LearningRate 0.0151 Epoch: 12 Global Step: 507260 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:43:05,950-Speed 2629.23 samples/sec Loss 5.2301 LearningRate 0.0151 Epoch: 12 Global Step: 507270 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:43:09,842-Speed 2631.96 samples/sec Loss 5.2214 LearningRate 0.0151 Epoch: 12 Global Step: 507280 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:43:13,738-Speed 2628.58 samples/sec Loss 5.2606 LearningRate 0.0151 Epoch: 12 Global Step: 507290 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:43:17,636-Speed 2627.57 samples/sec Loss 5.1187 LearningRate 0.0151 Epoch: 12 Global Step: 507300 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:43:21,537-Speed 2625.54 samples/sec Loss 5.2003 LearningRate 0.0151 Epoch: 12 Global Step: 507310 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:43:25,441-Speed 2623.73 samples/sec Loss 5.2510 LearningRate 0.0151 Epoch: 12 Global Step: 507320 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:43:29,339-Speed 2627.88 samples/sec Loss 5.0912 LearningRate 0.0151 Epoch: 12 Global Step: 507330 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:43:33,233-Speed 2630.31 samples/sec Loss 5.1106 LearningRate 0.0151 Epoch: 12 Global Step: 507340 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:43:37,128-Speed 2629.84 samples/sec Loss 5.2668 LearningRate 0.0151 Epoch: 12 Global Step: 507350 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:43:41,020-Speed 2631.56 samples/sec Loss 5.2403 LearningRate 0.0151 Epoch: 12 Global Step: 507360 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:43:44,957-Speed 2601.60 samples/sec Loss 5.1763 LearningRate 0.0151 Epoch: 12 Global Step: 507370 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:43:48,870-Speed 2617.71 samples/sec Loss 5.2402 LearningRate 0.0151 Epoch: 12 Global Step: 507380 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:43:52,789-Speed 2613.99 samples/sec Loss 5.2057 LearningRate 0.0151 Epoch: 12 Global Step: 507390 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:43:56,680-Speed 2632.13 samples/sec Loss 5.2572 LearningRate 0.0151 Epoch: 12 Global Step: 507400 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:44:00,573-Speed 2631.59 samples/sec Loss 5.1663 LearningRate 0.0151 Epoch: 12 Global Step: 507410 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:44:04,473-Speed 2626.65 samples/sec Loss 5.2241 LearningRate 0.0151 Epoch: 12 Global Step: 507420 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:44:08,385-Speed 2618.21 samples/sec Loss 5.2492 LearningRate 0.0151 Epoch: 12 Global Step: 507430 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:44:12,292-Speed 2621.46 samples/sec Loss 5.2853 LearningRate 0.0151 Epoch: 12 Global Step: 507440 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:44:16,187-Speed 2630.06 samples/sec Loss 5.1898 LearningRate 0.0151 Epoch: 12 Global Step: 507450 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:44:20,102-Speed 2615.64 samples/sec Loss 5.1621 LearningRate 0.0151 Epoch: 12 Global Step: 507460 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:44:24,011-Speed 2620.73 samples/sec Loss 5.2876 LearningRate 0.0151 Epoch: 12 Global Step: 507470 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:44:27,921-Speed 2619.42 samples/sec Loss 5.2551 LearningRate 0.0151 Epoch: 12 Global Step: 507480 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:44:31,995-Speed 2514.25 samples/sec Loss 5.2644 LearningRate 0.0151 Epoch: 12 Global Step: 507490 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:44:35,893-Speed 2627.46 samples/sec Loss 5.1579 LearningRate 0.0151 Epoch: 12 Global Step: 507500 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:44:39,788-Speed 2629.71 samples/sec Loss 5.2428 LearningRate 0.0151 Epoch: 12 Global Step: 507510 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:44:43,708-Speed 2612.54 samples/sec Loss 5.2470 LearningRate 0.0151 Epoch: 12 Global Step: 507520 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:44:47,618-Speed 2619.11 samples/sec Loss 5.1091 LearningRate 0.0151 Epoch: 12 Global Step: 507530 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:44:51,531-Speed 2629.72 samples/sec Loss 5.2434 LearningRate 0.0151 Epoch: 12 Global Step: 507540 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:44:55,442-Speed 2618.76 samples/sec Loss 5.2234 LearningRate 0.0151 Epoch: 12 Global Step: 507550 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:44:59,342-Speed 2626.61 samples/sec Loss 5.2668 LearningRate 0.0151 Epoch: 12 Global Step: 507560 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:45:03,240-Speed 2627.44 samples/sec Loss 5.1987 LearningRate 0.0151 Epoch: 12 Global Step: 507570 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:45:07,135-Speed 2629.62 samples/sec Loss 5.1178 LearningRate 0.0151 Epoch: 12 Global Step: 507580 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:45:11,035-Speed 2626.31 samples/sec Loss 5.1384 LearningRate 0.0151 Epoch: 12 Global Step: 507590 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:45:14,928-Speed 2631.17 samples/sec Loss 5.2070 LearningRate 0.0151 Epoch: 12 Global Step: 507600 Fp16 Grad Scale: 262144 Required: 36 hours
Training: 2022-04-15 04:45:18,815-Speed 2634.61 samples/sec Loss 5.3264 LearningRate 0.0151 Epoch: 12 Global Step: 507610 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:45:22,714-Speed 2627.34 samples/sec Loss 5.1999 LearningRate 0.0151 Epoch: 12 Global Step: 507620 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:45:26,582-Speed 2648.36 samples/sec Loss 5.1899 LearningRate 0.0151 Epoch: 12 Global Step: 507630 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:45:30,483-Speed 2625.81 samples/sec Loss 5.1804 LearningRate 0.0151 Epoch: 12 Global Step: 507640 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:45:34,374-Speed 2631.93 samples/sec Loss 5.1055 LearningRate 0.0151 Epoch: 12 Global Step: 507650 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:45:38,266-Speed 2631.90 samples/sec Loss 5.2152 LearningRate 0.0151 Epoch: 12 Global Step: 507660 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:45:42,175-Speed 2620.24 samples/sec Loss 5.1897 LearningRate 0.0151 Epoch: 12 Global Step: 507670 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:45:46,074-Speed 2627.73 samples/sec Loss 5.1786 LearningRate 0.0151 Epoch: 12 Global Step: 507680 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:45:49,968-Speed 2629.58 samples/sec Loss 5.2390 LearningRate 0.0151 Epoch: 12 Global Step: 507690 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:45:53,879-Speed 2619.29 samples/sec Loss 5.2049 LearningRate 0.0151 Epoch: 12 Global Step: 507700 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:45:57,785-Speed 2622.51 samples/sec Loss 5.3351 LearningRate 0.0151 Epoch: 12 Global Step: 507710 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:46:01,681-Speed 2629.04 samples/sec Loss 5.2811 LearningRate 0.0151 Epoch: 12 Global Step: 507720 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:46:05,586-Speed 2622.93 samples/sec Loss 5.1898 LearningRate 0.0151 Epoch: 12 Global Step: 507730 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:46:09,492-Speed 2622.33 samples/sec Loss 5.1911 LearningRate 0.0151 Epoch: 12 Global Step: 507740 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:46:13,402-Speed 2619.17 samples/sec Loss 5.1678 LearningRate 0.0151 Epoch: 12 Global Step: 507750 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:46:17,297-Speed 2629.77 samples/sec Loss 5.2610 LearningRate 0.0150 Epoch: 12 Global Step: 507760 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:46:21,222-Speed 2610.54 samples/sec Loss 5.1020 LearningRate 0.0150 Epoch: 12 Global Step: 507770 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:46:25,125-Speed 2624.21 samples/sec Loss 5.1993 LearningRate 0.0150 Epoch: 12 Global Step: 507780 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:46:29,025-Speed 2625.69 samples/sec Loss 5.2060 LearningRate 0.0150 Epoch: 12 Global Step: 507790 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:46:32,962-Speed 2601.65 samples/sec Loss 5.2344 LearningRate 0.0150 Epoch: 12 Global Step: 507800 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:46:36,857-Speed 2629.98 samples/sec Loss 5.2073 LearningRate 0.0150 Epoch: 12 Global Step: 507810 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:46:40,748-Speed 2632.96 samples/sec Loss 5.2859 LearningRate 0.0150 Epoch: 12 Global Step: 507820 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:46:44,619-Speed 2645.67 samples/sec Loss 5.2762 LearningRate 0.0150 Epoch: 12 Global Step: 507830 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:46:48,616-Speed 2563.30 samples/sec Loss 5.2612 LearningRate 0.0150 Epoch: 12 Global Step: 507840 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:46:52,511-Speed 2629.53 samples/sec Loss 5.1431 LearningRate 0.0150 Epoch: 12 Global Step: 507850 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:46:56,406-Speed 2630.07 samples/sec Loss 5.2628 LearningRate 0.0150 Epoch: 12 Global Step: 507860 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:47:00,313-Speed 2621.79 samples/sec Loss 5.1182 LearningRate 0.0150 Epoch: 12 Global Step: 507870 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:47:04,185-Speed 2645.05 samples/sec Loss 5.1887 LearningRate 0.0150 Epoch: 12 Global Step: 507880 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:47:08,083-Speed 2627.73 samples/sec Loss 5.1641 LearningRate 0.0150 Epoch: 12 Global Step: 507890 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:47:12,182-Speed 2498.70 samples/sec Loss 5.2118 LearningRate 0.0150 Epoch: 12 Global Step: 507900 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:47:16,079-Speed 2628.63 samples/sec Loss 5.2025 LearningRate 0.0150 Epoch: 12 Global Step: 507910 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:47:19,982-Speed 2624.48 samples/sec Loss 5.1827 LearningRate 0.0150 Epoch: 12 Global Step: 507920 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:47:23,879-Speed 2628.46 samples/sec Loss 5.2032 LearningRate 0.0150 Epoch: 12 Global Step: 507930 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:47:27,779-Speed 2626.44 samples/sec Loss 5.1613 LearningRate 0.0150 Epoch: 12 Global Step: 507940 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:47:31,699-Speed 2612.51 samples/sec Loss 5.2382 LearningRate 0.0150 Epoch: 12 Global Step: 507950 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:47:35,680-Speed 2572.90 samples/sec Loss 5.2327 LearningRate 0.0150 Epoch: 12 Global Step: 507960 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:47:39,593-Speed 2617.73 samples/sec Loss 5.2488 LearningRate 0.0150 Epoch: 12 Global Step: 507970 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:47:43,485-Speed 2631.18 samples/sec Loss 5.1589 LearningRate 0.0150 Epoch: 12 Global Step: 507980 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:47:47,382-Speed 2628.93 samples/sec Loss 5.1650 LearningRate 0.0150 Epoch: 12 Global Step: 507990 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:47:51,284-Speed 2625.08 samples/sec Loss 5.1629 LearningRate 0.0150 Epoch: 12 Global Step: 508000 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:47:55,179-Speed 2629.96 samples/sec Loss 5.2390 LearningRate 0.0150 Epoch: 12 Global Step: 508010 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:47:59,078-Speed 2626.79 samples/sec Loss 5.1883 LearningRate 0.0150 Epoch: 12 Global Step: 508020 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:48:02,973-Speed 2629.18 samples/sec Loss 5.2351 LearningRate 0.0150 Epoch: 12 Global Step: 508030 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:48:06,854-Speed 2639.10 samples/sec Loss 5.2378 LearningRate 0.0150 Epoch: 12 Global Step: 508040 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:48:10,751-Speed 2628.33 samples/sec Loss 5.1241 LearningRate 0.0150 Epoch: 12 Global Step: 508050 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:48:14,654-Speed 2624.44 samples/sec Loss 5.1671 LearningRate 0.0150 Epoch: 12 Global Step: 508060 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:48:18,555-Speed 2625.45 samples/sec Loss 5.1758 LearningRate 0.0150 Epoch: 12 Global Step: 508070 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:48:22,458-Speed 2624.68 samples/sec Loss 5.1729 LearningRate 0.0150 Epoch: 12 Global Step: 508080 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:48:26,347-Speed 2634.02 samples/sec Loss 5.1927 LearningRate 0.0150 Epoch: 12 Global Step: 508090 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:48:30,243-Speed 2628.57 samples/sec Loss 5.2363 LearningRate 0.0150 Epoch: 12 Global Step: 508100 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:48:34,138-Speed 2629.82 samples/sec Loss 5.1841 LearningRate 0.0150 Epoch: 12 Global Step: 508110 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:48:38,034-Speed 2628.38 samples/sec Loss 5.2348 LearningRate 0.0150 Epoch: 12 Global Step: 508120 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:48:41,935-Speed 2625.74 samples/sec Loss 5.2246 LearningRate 0.0150 Epoch: 12 Global Step: 508130 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:48:45,829-Speed 2631.04 samples/sec Loss 5.2710 LearningRate 0.0150 Epoch: 12 Global Step: 508140 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:48:49,700-Speed 2646.10 samples/sec Loss 5.1325 LearningRate 0.0150 Epoch: 12 Global Step: 508150 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:48:53,597-Speed 2628.62 samples/sec Loss 5.1018 LearningRate 0.0150 Epoch: 12 Global Step: 508160 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:48:57,528-Speed 2605.27 samples/sec Loss 5.1858 LearningRate 0.0150 Epoch: 12 Global Step: 508170 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:49:01,422-Speed 2630.63 samples/sec Loss 5.2099 LearningRate 0.0150 Epoch: 12 Global Step: 508180 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:49:05,320-Speed 2627.27 samples/sec Loss 5.1814 LearningRate 0.0150 Epoch: 12 Global Step: 508190 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:49:09,218-Speed 2628.10 samples/sec Loss 5.2406 LearningRate 0.0150 Epoch: 12 Global Step: 508200 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:49:13,108-Speed 2632.56 samples/sec Loss 5.1304 LearningRate 0.0150 Epoch: 12 Global Step: 508210 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:49:17,002-Speed 2630.72 samples/sec Loss 5.1992 LearningRate 0.0150 Epoch: 12 Global Step: 508220 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:49:20,898-Speed 2629.46 samples/sec Loss 5.2181 LearningRate 0.0150 Epoch: 12 Global Step: 508230 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:49:24,798-Speed 2625.98 samples/sec Loss 5.1426 LearningRate 0.0150 Epoch: 12 Global Step: 508240 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:49:28,715-Speed 2615.34 samples/sec Loss 5.2213 LearningRate 0.0150 Epoch: 12 Global Step: 508250 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:49:32,630-Speed 2616.16 samples/sec Loss 5.0917 LearningRate 0.0150 Epoch: 12 Global Step: 508260 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:49:36,525-Speed 2629.48 samples/sec Loss 5.1513 LearningRate 0.0150 Epoch: 12 Global Step: 508270 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:49:40,430-Speed 2622.51 samples/sec Loss 5.2781 LearningRate 0.0150 Epoch: 12 Global Step: 508280 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:49:44,327-Speed 2628.40 samples/sec Loss 5.3016 LearningRate 0.0150 Epoch: 12 Global Step: 508290 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:49:48,232-Speed 2622.80 samples/sec Loss 5.1891 LearningRate 0.0150 Epoch: 12 Global Step: 508300 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:49:52,109-Speed 2642.57 samples/sec Loss 5.2412 LearningRate 0.0150 Epoch: 12 Global Step: 508310 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:49:56,016-Speed 2621.68 samples/sec Loss 5.2472 LearningRate 0.0150 Epoch: 12 Global Step: 508320 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:49:59,931-Speed 2616.21 samples/sec Loss 5.1855 LearningRate 0.0150 Epoch: 12 Global Step: 508330 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:50:03,831-Speed 2626.04 samples/sec Loss 5.1433 LearningRate 0.0150 Epoch: 12 Global Step: 508340 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:50:07,732-Speed 2625.75 samples/sec Loss 5.1473 LearningRate 0.0150 Epoch: 12 Global Step: 508350 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:50:11,636-Speed 2623.47 samples/sec Loss 5.2597 LearningRate 0.0150 Epoch: 12 Global Step: 508360 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:50:15,568-Speed 2605.32 samples/sec Loss 5.1620 LearningRate 0.0150 Epoch: 12 Global Step: 508370 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:50:19,462-Speed 2629.77 samples/sec Loss 5.1716 LearningRate 0.0150 Epoch: 12 Global Step: 508380 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:50:23,376-Speed 2617.75 samples/sec Loss 5.2282 LearningRate 0.0150 Epoch: 12 Global Step: 508390 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:50:27,274-Speed 2627.42 samples/sec Loss 5.1587 LearningRate 0.0150 Epoch: 12 Global Step: 508400 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:50:31,174-Speed 2626.65 samples/sec Loss 5.1973 LearningRate 0.0150 Epoch: 12 Global Step: 508410 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:50:35,069-Speed 2629.62 samples/sec Loss 5.1931 LearningRate 0.0150 Epoch: 12 Global Step: 508420 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:50:38,970-Speed 2625.13 samples/sec Loss 5.1116 LearningRate 0.0150 Epoch: 12 Global Step: 508430 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:50:42,887-Speed 2615.19 samples/sec Loss 5.2125 LearningRate 0.0150 Epoch: 12 Global Step: 508440 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:50:46,799-Speed 2617.97 samples/sec Loss 5.0916 LearningRate 0.0150 Epoch: 12 Global Step: 508450 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:50:50,702-Speed 2624.68 samples/sec Loss 5.1376 LearningRate 0.0150 Epoch: 12 Global Step: 508460 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:50:54,593-Speed 2632.46 samples/sec Loss 5.1471 LearningRate 0.0150 Epoch: 12 Global Step: 508470 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:50:58,494-Speed 2625.27 samples/sec Loss 5.1393 LearningRate 0.0150 Epoch: 12 Global Step: 508480 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:51:02,421-Speed 2608.60 samples/sec Loss 5.1698 LearningRate 0.0150 Epoch: 12 Global Step: 508490 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:51:06,319-Speed 2628.18 samples/sec Loss 5.1833 LearningRate 0.0150 Epoch: 12 Global Step: 508500 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:51:10,204-Speed 2635.81 samples/sec Loss 5.2635 LearningRate 0.0150 Epoch: 12 Global Step: 508510 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:51:14,104-Speed 2626.58 samples/sec Loss 5.1074 LearningRate 0.0150 Epoch: 12 Global Step: 508520 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:51:18,003-Speed 2627.22 samples/sec Loss 5.2384 LearningRate 0.0150 Epoch: 12 Global Step: 508530 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:51:21,902-Speed 2627.13 samples/sec Loss 5.0871 LearningRate 0.0150 Epoch: 12 Global Step: 508540 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:51:25,799-Speed 2628.69 samples/sec Loss 5.2511 LearningRate 0.0150 Epoch: 12 Global Step: 508550 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:51:29,697-Speed 2627.42 samples/sec Loss 5.0821 LearningRate 0.0150 Epoch: 12 Global Step: 508560 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:51:33,582-Speed 2636.68 samples/sec Loss 5.1558 LearningRate 0.0150 Epoch: 12 Global Step: 508570 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:51:37,479-Speed 2628.15 samples/sec Loss 5.2626 LearningRate 0.0150 Epoch: 12 Global Step: 508580 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:51:41,377-Speed 2627.30 samples/sec Loss 5.2810 LearningRate 0.0150 Epoch: 12 Global Step: 508590 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:51:45,273-Speed 2629.68 samples/sec Loss 5.2269 LearningRate 0.0150 Epoch: 12 Global Step: 508600 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:51:49,168-Speed 2629.67 samples/sec Loss 5.3203 LearningRate 0.0150 Epoch: 12 Global Step: 508610 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:51:53,069-Speed 2625.42 samples/sec Loss 5.2102 LearningRate 0.0150 Epoch: 12 Global Step: 508620 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:51:56,964-Speed 2629.84 samples/sec Loss 5.1538 LearningRate 0.0150 Epoch: 12 Global Step: 508630 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:52:00,867-Speed 2624.33 samples/sec Loss 5.2220 LearningRate 0.0150 Epoch: 12 Global Step: 508640 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:52:04,766-Speed 2626.86 samples/sec Loss 5.0919 LearningRate 0.0150 Epoch: 12 Global Step: 508650 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:52:08,673-Speed 2621.44 samples/sec Loss 5.2049 LearningRate 0.0150 Epoch: 12 Global Step: 508660 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:52:12,552-Speed 2640.56 samples/sec Loss 5.1837 LearningRate 0.0150 Epoch: 12 Global Step: 508670 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:52:16,455-Speed 2626.83 samples/sec Loss 5.1637 LearningRate 0.0150 Epoch: 12 Global Step: 508680 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:52:20,366-Speed 2618.72 samples/sec Loss 5.1641 LearningRate 0.0150 Epoch: 12 Global Step: 508690 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:52:24,259-Speed 2630.78 samples/sec Loss 5.0964 LearningRate 0.0150 Epoch: 12 Global Step: 508700 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:52:28,156-Speed 2628.14 samples/sec Loss 5.1859 LearningRate 0.0150 Epoch: 12 Global Step: 508710 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:52:32,061-Speed 2623.39 samples/sec Loss 5.1337 LearningRate 0.0150 Epoch: 12 Global Step: 508720 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:52:35,959-Speed 2627.41 samples/sec Loss 5.2507 LearningRate 0.0150 Epoch: 12 Global Step: 508730 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:52:39,859-Speed 2625.64 samples/sec Loss 5.2592 LearningRate 0.0150 Epoch: 12 Global Step: 508740 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:52:43,840-Speed 2572.90 samples/sec Loss 5.1371 LearningRate 0.0150 Epoch: 12 Global Step: 508750 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:52:47,746-Speed 2630.23 samples/sec Loss 5.0258 LearningRate 0.0150 Epoch: 12 Global Step: 508760 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:52:51,640-Speed 2630.53 samples/sec Loss 5.2093 LearningRate 0.0150 Epoch: 12 Global Step: 508770 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:52:55,517-Speed 2641.73 samples/sec Loss 5.2133 LearningRate 0.0150 Epoch: 12 Global Step: 508780 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:52:59,416-Speed 2627.25 samples/sec Loss 5.2113 LearningRate 0.0150 Epoch: 12 Global Step: 508790 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:53:03,334-Speed 2613.72 samples/sec Loss 5.2690 LearningRate 0.0150 Epoch: 12 Global Step: 508800 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:53:07,229-Speed 2629.52 samples/sec Loss 5.1317 LearningRate 0.0150 Epoch: 12 Global Step: 508810 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:53:11,127-Speed 2628.04 samples/sec Loss 5.2213 LearningRate 0.0150 Epoch: 12 Global Step: 508820 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:53:15,027-Speed 2625.91 samples/sec Loss 5.1429 LearningRate 0.0149 Epoch: 12 Global Step: 508830 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:53:18,934-Speed 2621.63 samples/sec Loss 5.2030 LearningRate 0.0149 Epoch: 12 Global Step: 508840 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:53:22,827-Speed 2631.03 samples/sec Loss 5.2437 LearningRate 0.0149 Epoch: 12 Global Step: 508850 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:53:26,728-Speed 2625.89 samples/sec Loss 5.1691 LearningRate 0.0149 Epoch: 12 Global Step: 508860 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:53:30,625-Speed 2628.45 samples/sec Loss 5.1507 LearningRate 0.0149 Epoch: 12 Global Step: 508870 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:53:34,526-Speed 2625.04 samples/sec Loss 5.2066 LearningRate 0.0149 Epoch: 12 Global Step: 508880 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:53:38,397-Speed 2646.21 samples/sec Loss 5.1656 LearningRate 0.0149 Epoch: 12 Global Step: 508890 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:53:42,295-Speed 2628.02 samples/sec Loss 5.2027 LearningRate 0.0149 Epoch: 12 Global Step: 508900 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:53:46,189-Speed 2630.32 samples/sec Loss 5.1548 LearningRate 0.0149 Epoch: 12 Global Step: 508910 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:53:50,085-Speed 2629.62 samples/sec Loss 5.1569 LearningRate 0.0149 Epoch: 12 Global Step: 508920 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:53:53,981-Speed 2628.82 samples/sec Loss 5.1003 LearningRate 0.0149 Epoch: 12 Global Step: 508930 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:53:57,878-Speed 2628.91 samples/sec Loss 5.2774 LearningRate 0.0149 Epoch: 12 Global Step: 508940 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:54:01,773-Speed 2629.48 samples/sec Loss 5.2546 LearningRate 0.0149 Epoch: 12 Global Step: 508950 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:54:05,676-Speed 2624.21 samples/sec Loss 5.2146 LearningRate 0.0149 Epoch: 12 Global Step: 508960 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:54:09,573-Speed 2628.38 samples/sec Loss 5.2270 LearningRate 0.0149 Epoch: 12 Global Step: 508970 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:54:13,468-Speed 2629.11 samples/sec Loss 5.2157 LearningRate 0.0149 Epoch: 12 Global Step: 508980 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:54:17,366-Speed 2628.71 samples/sec Loss 5.2275 LearningRate 0.0149 Epoch: 12 Global Step: 508990 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:54:21,268-Speed 2624.76 samples/sec Loss 5.1281 LearningRate 0.0149 Epoch: 12 Global Step: 509000 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:54:25,212-Speed 2598.21 samples/sec Loss 5.2079 LearningRate 0.0149 Epoch: 12 Global Step: 509010 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:54:29,104-Speed 2631.38 samples/sec Loss 5.1050 LearningRate 0.0149 Epoch: 12 Global Step: 509020 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:54:32,999-Speed 2629.46 samples/sec Loss 5.1273 LearningRate 0.0149 Epoch: 12 Global Step: 509030 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:54:36,922-Speed 2611.19 samples/sec Loss 5.3345 LearningRate 0.0149 Epoch: 12 Global Step: 509040 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:54:40,837-Speed 2616.83 samples/sec Loss 5.2131 LearningRate 0.0149 Epoch: 12 Global Step: 509050 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:54:44,714-Speed 2641.40 samples/sec Loss 5.1493 LearningRate 0.0149 Epoch: 12 Global Step: 509060 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:54:48,745-Speed 2541.26 samples/sec Loss 5.1911 LearningRate 0.0149 Epoch: 12 Global Step: 509070 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:54:52,641-Speed 2628.57 samples/sec Loss 5.1199 LearningRate 0.0149 Epoch: 12 Global Step: 509080 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:54:56,549-Speed 2621.73 samples/sec Loss 5.2700 LearningRate 0.0149 Epoch: 12 Global Step: 509090 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:55:00,445-Speed 2629.06 samples/sec Loss 5.2764 LearningRate 0.0149 Epoch: 12 Global Step: 509100 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:55:04,340-Speed 2629.29 samples/sec Loss 5.1441 LearningRate 0.0149 Epoch: 12 Global Step: 509110 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:55:08,231-Speed 2632.37 samples/sec Loss 5.2650 LearningRate 0.0149 Epoch: 12 Global Step: 509120 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:55:12,126-Speed 2629.73 samples/sec Loss 5.1823 LearningRate 0.0149 Epoch: 12 Global Step: 509130 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:55:16,043-Speed 2614.86 samples/sec Loss 5.1619 LearningRate 0.0149 Epoch: 12 Global Step: 509140 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:55:19,943-Speed 2626.33 samples/sec Loss 5.1013 LearningRate 0.0149 Epoch: 12 Global Step: 509150 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:55:23,853-Speed 2619.78 samples/sec Loss 5.1690 LearningRate 0.0149 Epoch: 12 Global Step: 509160 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:55:27,747-Speed 2630.61 samples/sec Loss 5.1505 LearningRate 0.0149 Epoch: 12 Global Step: 509170 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:55:31,665-Speed 2614.49 samples/sec Loss 5.2061 LearningRate 0.0149 Epoch: 12 Global Step: 509180 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:55:35,559-Speed 2630.34 samples/sec Loss 5.1068 LearningRate 0.0149 Epoch: 12 Global Step: 509190 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:55:39,440-Speed 2638.77 samples/sec Loss 5.2091 LearningRate 0.0149 Epoch: 12 Global Step: 509200 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:55:43,340-Speed 2626.29 samples/sec Loss 5.2444 LearningRate 0.0149 Epoch: 12 Global Step: 509210 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:55:47,235-Speed 2630.03 samples/sec Loss 5.2373 LearningRate 0.0149 Epoch: 12 Global Step: 509220 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:55:51,136-Speed 2625.67 samples/sec Loss 5.2324 LearningRate 0.0149 Epoch: 12 Global Step: 509230 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:55:55,040-Speed 2623.49 samples/sec Loss 5.2312 LearningRate 0.0149 Epoch: 12 Global Step: 509240 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:55:58,984-Speed 2597.38 samples/sec Loss 5.1332 LearningRate 0.0149 Epoch: 12 Global Step: 509250 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:56:02,875-Speed 2632.79 samples/sec Loss 5.1364 LearningRate 0.0149 Epoch: 12 Global Step: 509260 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:56:06,778-Speed 2624.15 samples/sec Loss 5.1483 LearningRate 0.0149 Epoch: 12 Global Step: 509270 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:56:10,677-Speed 2626.64 samples/sec Loss 5.1937 LearningRate 0.0149 Epoch: 12 Global Step: 509280 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:56:14,688-Speed 2553.23 samples/sec Loss 5.1754 LearningRate 0.0149 Epoch: 12 Global Step: 509290 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 04:56:18,596-Speed 2620.85 samples/sec Loss 5.1765 LearningRate 0.0149 Epoch: 12 Global Step: 509300 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:56:22,508-Speed 2618.95 samples/sec Loss 5.1159 LearningRate 0.0149 Epoch: 12 Global Step: 509310 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:56:26,434-Speed 2608.61 samples/sec Loss 5.0923 LearningRate 0.0149 Epoch: 12 Global Step: 509320 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:56:30,337-Speed 2624.83 samples/sec Loss 5.2484 LearningRate 0.0149 Epoch: 12 Global Step: 509330 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:56:34,232-Speed 2629.97 samples/sec Loss 5.1782 LearningRate 0.0149 Epoch: 12 Global Step: 509340 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:56:38,251-Speed 2548.01 samples/sec Loss 5.1497 LearningRate 0.0149 Epoch: 12 Global Step: 509350 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:56:42,160-Speed 2620.39 samples/sec Loss 5.2195 LearningRate 0.0149 Epoch: 12 Global Step: 509360 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:56:46,057-Speed 2628.38 samples/sec Loss 5.1216 LearningRate 0.0149 Epoch: 12 Global Step: 509370 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:56:49,962-Speed 2622.85 samples/sec Loss 5.0829 LearningRate 0.0149 Epoch: 12 Global Step: 509380 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:56:53,891-Speed 2607.27 samples/sec Loss 5.1704 LearningRate 0.0149 Epoch: 12 Global Step: 509390 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:56:57,785-Speed 2630.30 samples/sec Loss 5.1968 LearningRate 0.0149 Epoch: 12 Global Step: 509400 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:57:01,697-Speed 2618.56 samples/sec Loss 5.1052 LearningRate 0.0149 Epoch: 12 Global Step: 509410 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:57:05,591-Speed 2630.55 samples/sec Loss 5.1346 LearningRate 0.0149 Epoch: 12 Global Step: 509420 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:57:09,490-Speed 2626.80 samples/sec Loss 5.2206 LearningRate 0.0149 Epoch: 12 Global Step: 509430 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:57:13,502-Speed 2553.12 samples/sec Loss 5.2005 LearningRate 0.0149 Epoch: 12 Global Step: 509440 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:57:17,582-Speed 2510.37 samples/sec Loss 5.1773 LearningRate 0.0149 Epoch: 12 Global Step: 509450 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:57:21,560-Speed 2574.45 samples/sec Loss 5.1816 LearningRate 0.0149 Epoch: 12 Global Step: 509460 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:57:25,537-Speed 2575.94 samples/sec Loss 5.1411 LearningRate 0.0149 Epoch: 12 Global Step: 509470 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:57:29,441-Speed 2623.87 samples/sec Loss 5.0933 LearningRate 0.0149 Epoch: 12 Global Step: 509480 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:57:33,384-Speed 2597.66 samples/sec Loss 5.2667 LearningRate 0.0149 Epoch: 12 Global Step: 509490 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:57:37,270-Speed 2636.44 samples/sec Loss 5.2840 LearningRate 0.0149 Epoch: 12 Global Step: 509500 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:57:41,165-Speed 2629.60 samples/sec Loss 5.3079 LearningRate 0.0149 Epoch: 12 Global Step: 509510 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:57:45,082-Speed 2614.28 samples/sec Loss 5.2344 LearningRate 0.0149 Epoch: 12 Global Step: 509520 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:57:48,979-Speed 2628.54 samples/sec Loss 5.1151 LearningRate 0.0149 Epoch: 12 Global Step: 509530 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:57:52,876-Speed 2628.37 samples/sec Loss 5.1560 LearningRate 0.0149 Epoch: 12 Global Step: 509540 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:57:56,771-Speed 2629.51 samples/sec Loss 5.2370 LearningRate 0.0149 Epoch: 12 Global Step: 509550 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:58:00,678-Speed 2621.96 samples/sec Loss 5.0913 LearningRate 0.0149 Epoch: 12 Global Step: 509560 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:58:04,571-Speed 2630.89 samples/sec Loss 5.2236 LearningRate 0.0149 Epoch: 12 Global Step: 509570 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:58:08,473-Speed 2624.76 samples/sec Loss 5.1852 LearningRate 0.0149 Epoch: 12 Global Step: 509580 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:58:12,379-Speed 2622.28 samples/sec Loss 5.2391 LearningRate 0.0149 Epoch: 12 Global Step: 509590 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:58:16,312-Speed 2605.11 samples/sec Loss 5.1570 LearningRate 0.0149 Epoch: 12 Global Step: 509600 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:58:20,250-Speed 2600.89 samples/sec Loss 5.2165 LearningRate 0.0149 Epoch: 12 Global Step: 509610 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:58:24,127-Speed 2641.87 samples/sec Loss 5.1573 LearningRate 0.0149 Epoch: 12 Global Step: 509620 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:58:28,028-Speed 2625.90 samples/sec Loss 5.2090 LearningRate 0.0149 Epoch: 12 Global Step: 509630 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:58:31,930-Speed 2625.88 samples/sec Loss 5.1122 LearningRate 0.0149 Epoch: 12 Global Step: 509640 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:58:35,831-Speed 2625.15 samples/sec Loss 5.1702 LearningRate 0.0149 Epoch: 12 Global Step: 509650 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:58:39,728-Speed 2628.32 samples/sec Loss 5.2070 LearningRate 0.0149 Epoch: 12 Global Step: 509660 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:58:43,626-Speed 2628.35 samples/sec Loss 5.2070 LearningRate 0.0149 Epoch: 12 Global Step: 509670 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:58:47,518-Speed 2631.85 samples/sec Loss 5.2796 LearningRate 0.0149 Epoch: 12 Global Step: 509680 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:58:51,414-Speed 2628.63 samples/sec Loss 5.2002 LearningRate 0.0149 Epoch: 12 Global Step: 509690 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:58:55,311-Speed 2629.12 samples/sec Loss 5.1606 LearningRate 0.0149 Epoch: 12 Global Step: 509700 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:58:59,206-Speed 2629.24 samples/sec Loss 5.2589 LearningRate 0.0149 Epoch: 12 Global Step: 509710 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:59:03,084-Speed 2640.96 samples/sec Loss 5.1690 LearningRate 0.0149 Epoch: 12 Global Step: 509720 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:59:07,006-Speed 2612.11 samples/sec Loss 5.1727 LearningRate 0.0149 Epoch: 12 Global Step: 509730 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:59:10,955-Speed 2593.70 samples/sec Loss 5.1123 LearningRate 0.0149 Epoch: 12 Global Step: 509740 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:59:14,965-Speed 2554.28 samples/sec Loss 5.2197 LearningRate 0.0149 Epoch: 12 Global Step: 509750 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:59:18,904-Speed 2600.61 samples/sec Loss 5.2135 LearningRate 0.0149 Epoch: 12 Global Step: 509760 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:59:22,811-Speed 2621.23 samples/sec Loss 5.1435 LearningRate 0.0149 Epoch: 12 Global Step: 509770 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:59:26,709-Speed 2628.23 samples/sec Loss 5.2169 LearningRate 0.0149 Epoch: 12 Global Step: 509780 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:59:30,606-Speed 2628.13 samples/sec Loss 5.1042 LearningRate 0.0149 Epoch: 12 Global Step: 509790 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:59:34,536-Speed 2606.14 samples/sec Loss 5.1953 LearningRate 0.0149 Epoch: 12 Global Step: 509800 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:59:38,429-Speed 2631.08 samples/sec Loss 5.2176 LearningRate 0.0149 Epoch: 12 Global Step: 509810 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 04:59:42,327-Speed 2628.86 samples/sec Loss 5.1903 LearningRate 0.0149 Epoch: 12 Global Step: 509820 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:59:46,239-Speed 2618.21 samples/sec Loss 5.2607 LearningRate 0.0149 Epoch: 12 Global Step: 509830 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:59:50,142-Speed 2624.55 samples/sec Loss 5.1528 LearningRate 0.0149 Epoch: 12 Global Step: 509840 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:59:54,038-Speed 2628.33 samples/sec Loss 5.0803 LearningRate 0.0149 Epoch: 12 Global Step: 509850 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 04:59:57,934-Speed 2629.87 samples/sec Loss 5.2531 LearningRate 0.0149 Epoch: 12 Global Step: 509860 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:00:01,831-Speed 2628.05 samples/sec Loss 5.1076 LearningRate 0.0149 Epoch: 12 Global Step: 509870 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:00:05,734-Speed 2624.06 samples/sec Loss 5.2742 LearningRate 0.0149 Epoch: 12 Global Step: 509880 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:00:09,632-Speed 2627.81 samples/sec Loss 5.2622 LearningRate 0.0149 Epoch: 12 Global Step: 509890 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:00:13,518-Speed 2635.65 samples/sec Loss 5.2156 LearningRate 0.0148 Epoch: 12 Global Step: 509900 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:00:17,431-Speed 2618.71 samples/sec Loss 5.3167 LearningRate 0.0148 Epoch: 12 Global Step: 509910 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:00:21,332-Speed 2625.10 samples/sec Loss 5.2192 LearningRate 0.0148 Epoch: 12 Global Step: 509920 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:00:25,231-Speed 2627.58 samples/sec Loss 5.1568 LearningRate 0.0148 Epoch: 12 Global Step: 509930 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:00:29,125-Speed 2629.59 samples/sec Loss 5.1473 LearningRate 0.0148 Epoch: 12 Global Step: 509940 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:00:33,022-Speed 2628.43 samples/sec Loss 5.0747 LearningRate 0.0148 Epoch: 12 Global Step: 509950 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:00:36,931-Speed 2620.58 samples/sec Loss 5.1523 LearningRate 0.0148 Epoch: 12 Global Step: 509960 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:00:40,832-Speed 2625.66 samples/sec Loss 5.1190 LearningRate 0.0148 Epoch: 12 Global Step: 509970 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:00:44,738-Speed 2621.98 samples/sec Loss 5.2441 LearningRate 0.0148 Epoch: 12 Global Step: 509980 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:00:48,645-Speed 2621.81 samples/sec Loss 5.1908 LearningRate 0.0148 Epoch: 12 Global Step: 509990 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:00:52,551-Speed 2622.15 samples/sec Loss 5.1976 LearningRate 0.0148 Epoch: 12 Global Step: 510000 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:01:36,800-[lfw][510000]XNorm: 23.380763
Training: 2022-04-15 05:01:36,801-[lfw][510000]Accuracy-Flip: 0.99800+-0.00267
Training: 2022-04-15 05:01:36,802-[lfw][510000]Accuracy-Highest: 0.99800
Training: 2022-04-15 05:02:26,562-[cfp_fp][510000]XNorm: 21.950113
Training: 2022-04-15 05:02:26,563-[cfp_fp][510000]Accuracy-Flip: 0.99043+-0.00443
Training: 2022-04-15 05:02:26,564-[cfp_fp][510000]Accuracy-Highest: 0.99043
Training: 2022-04-15 05:03:09,839-[agedb_30][510000]XNorm: 23.299798
Training: 2022-04-15 05:03:09,840-[agedb_30][510000]Accuracy-Flip: 0.97767+-0.00688
Training: 2022-04-15 05:03:09,840-[agedb_30][510000]Accuracy-Highest: 0.97950
Training: 2022-04-15 05:03:13,720-Speed 72.54 samples/sec Loss 5.1982 LearningRate 0.0148 Epoch: 12 Global Step: 510010 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:03:17,582-Speed 2652.66 samples/sec Loss 5.1285 LearningRate 0.0148 Epoch: 12 Global Step: 510020 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:03:21,484-Speed 2624.98 samples/sec Loss 5.2669 LearningRate 0.0148 Epoch: 12 Global Step: 510030 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:03:25,353-Speed 2647.80 samples/sec Loss 5.2115 LearningRate 0.0148 Epoch: 12 Global Step: 510040 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:03:29,232-Speed 2640.18 samples/sec Loss 5.1798 LearningRate 0.0148 Epoch: 12 Global Step: 510050 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:03:33,087-Speed 2656.85 samples/sec Loss 5.0540 LearningRate 0.0148 Epoch: 12 Global Step: 510060 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:03:37,083-Speed 2564.20 samples/sec Loss 5.2652 LearningRate 0.0148 Epoch: 12 Global Step: 510070 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:03:41,023-Speed 2599.85 samples/sec Loss 5.1344 LearningRate 0.0148 Epoch: 12 Global Step: 510080 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:03:44,907-Speed 2637.84 samples/sec Loss 5.2528 LearningRate 0.0148 Epoch: 12 Global Step: 510090 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:03:48,825-Speed 2614.01 samples/sec Loss 5.1544 LearningRate 0.0148 Epoch: 12 Global Step: 510100 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:03:52,746-Speed 2612.23 samples/sec Loss 5.0851 LearningRate 0.0148 Epoch: 12 Global Step: 510110 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:03:56,651-Speed 2622.52 samples/sec Loss 5.2054 LearningRate 0.0148 Epoch: 12 Global Step: 510120 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:04:00,547-Speed 2629.50 samples/sec Loss 5.1183 LearningRate 0.0148 Epoch: 12 Global Step: 510130 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:04:04,451-Speed 2623.89 samples/sec Loss 5.1922 LearningRate 0.0148 Epoch: 12 Global Step: 510140 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:04:08,346-Speed 2629.50 samples/sec Loss 5.1056 LearningRate 0.0148 Epoch: 12 Global Step: 510150 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:04:12,234-Speed 2635.61 samples/sec Loss 5.0640 LearningRate 0.0148 Epoch: 12 Global Step: 510160 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:04:16,159-Speed 2609.23 samples/sec Loss 5.1390 LearningRate 0.0148 Epoch: 12 Global Step: 510170 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:04:20,033-Speed 2644.45 samples/sec Loss 5.1363 LearningRate 0.0148 Epoch: 12 Global Step: 510180 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:04:23,921-Speed 2634.15 samples/sec Loss 5.2032 LearningRate 0.0148 Epoch: 12 Global Step: 510190 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:04:27,848-Speed 2608.57 samples/sec Loss 5.3004 LearningRate 0.0148 Epoch: 12 Global Step: 510200 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:04:31,749-Speed 2625.19 samples/sec Loss 5.0599 LearningRate 0.0148 Epoch: 12 Global Step: 510210 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:04:35,638-Speed 2634.44 samples/sec Loss 5.1735 LearningRate 0.0148 Epoch: 12 Global Step: 510220 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:04:39,532-Speed 2630.33 samples/sec Loss 5.1240 LearningRate 0.0148 Epoch: 12 Global Step: 510230 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:04:43,439-Speed 2621.67 samples/sec Loss 5.1745 LearningRate 0.0148 Epoch: 12 Global Step: 510240 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:04:47,325-Speed 2636.16 samples/sec Loss 5.1277 LearningRate 0.0148 Epoch: 12 Global Step: 510250 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:04:51,215-Speed 2633.13 samples/sec Loss 5.2136 LearningRate 0.0148 Epoch: 12 Global Step: 510260 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:04:55,104-Speed 2634.43 samples/sec Loss 5.2105 LearningRate 0.0148 Epoch: 12 Global Step: 510270 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:04:58,995-Speed 2632.47 samples/sec Loss 5.1707 LearningRate 0.0148 Epoch: 12 Global Step: 510280 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:05:02,887-Speed 2631.33 samples/sec Loss 5.1338 LearningRate 0.0148 Epoch: 12 Global Step: 510290 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:05:06,797-Speed 2619.56 samples/sec Loss 5.1590 LearningRate 0.0148 Epoch: 12 Global Step: 510300 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:05:10,689-Speed 2631.84 samples/sec Loss 5.2020 LearningRate 0.0148 Epoch: 12 Global Step: 510310 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:05:14,561-Speed 2645.62 samples/sec Loss 5.1559 LearningRate 0.0148 Epoch: 12 Global Step: 510320 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:05:18,479-Speed 2613.71 samples/sec Loss 5.2502 LearningRate 0.0148 Epoch: 12 Global Step: 510330 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:05:22,378-Speed 2627.30 samples/sec Loss 5.0095 LearningRate 0.0148 Epoch: 12 Global Step: 510340 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:05:26,269-Speed 2632.67 samples/sec Loss 5.1711 LearningRate 0.0148 Epoch: 12 Global Step: 510350 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:05:30,196-Speed 2608.70 samples/sec Loss 5.1826 LearningRate 0.0148 Epoch: 12 Global Step: 510360 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:05:34,093-Speed 2628.29 samples/sec Loss 5.2244 LearningRate 0.0148 Epoch: 12 Global Step: 510370 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:05:38,023-Speed 2605.69 samples/sec Loss 5.2132 LearningRate 0.0148 Epoch: 12 Global Step: 510380 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:05:41,922-Speed 2627.13 samples/sec Loss 5.0908 LearningRate 0.0148 Epoch: 12 Global Step: 510390 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:05:45,819-Speed 2628.52 samples/sec Loss 5.1662 LearningRate 0.0148 Epoch: 12 Global Step: 510400 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:05:49,718-Speed 2627.19 samples/sec Loss 5.2488 LearningRate 0.0148 Epoch: 12 Global Step: 510410 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:05:53,625-Speed 2621.04 samples/sec Loss 5.2278 LearningRate 0.0148 Epoch: 12 Global Step: 510420 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:05:57,541-Speed 2616.48 samples/sec Loss 5.1752 LearningRate 0.0148 Epoch: 12 Global Step: 510430 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:06:01,409-Speed 2647.47 samples/sec Loss 5.0491 LearningRate 0.0148 Epoch: 12 Global Step: 510440 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:06:05,306-Speed 2628.65 samples/sec Loss 5.0105 LearningRate 0.0148 Epoch: 12 Global Step: 510450 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:06:09,201-Speed 2629.38 samples/sec Loss 5.1058 LearningRate 0.0148 Epoch: 12 Global Step: 510460 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:06:13,106-Speed 2623.69 samples/sec Loss 5.1111 LearningRate 0.0148 Epoch: 12 Global Step: 510470 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:06:17,006-Speed 2625.59 samples/sec Loss 5.0998 LearningRate 0.0148 Epoch: 12 Global Step: 510480 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:06:20,971-Speed 2583.41 samples/sec Loss 5.2174 LearningRate 0.0148 Epoch: 12 Global Step: 510490 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:06:24,913-Speed 2598.71 samples/sec Loss 5.0926 LearningRate 0.0148 Epoch: 12 Global Step: 510500 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:06:28,829-Speed 2616.46 samples/sec Loss 5.1522 LearningRate 0.0148 Epoch: 12 Global Step: 510510 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:06:32,720-Speed 2632.27 samples/sec Loss 5.2328 LearningRate 0.0148 Epoch: 12 Global Step: 510520 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:06:36,611-Speed 2632.82 samples/sec Loss 5.1348 LearningRate 0.0148 Epoch: 12 Global Step: 510530 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:06:40,511-Speed 2625.82 samples/sec Loss 5.2554 LearningRate 0.0148 Epoch: 12 Global Step: 510540 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:06:44,412-Speed 2626.34 samples/sec Loss 5.2241 LearningRate 0.0148 Epoch: 12 Global Step: 510550 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:06:48,287-Speed 2642.84 samples/sec Loss 5.1225 LearningRate 0.0148 Epoch: 12 Global Step: 510560 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:06:52,178-Speed 2632.71 samples/sec Loss 5.1774 LearningRate 0.0148 Epoch: 12 Global Step: 510570 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:06:56,090-Speed 2617.89 samples/sec Loss 5.1460 LearningRate 0.0148 Epoch: 12 Global Step: 510580 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:06:59,979-Speed 2634.02 samples/sec Loss 5.1308 LearningRate 0.0148 Epoch: 12 Global Step: 510590 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:07:03,878-Speed 2627.06 samples/sec Loss 5.1703 LearningRate 0.0148 Epoch: 12 Global Step: 510600 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:07:07,769-Speed 2632.47 samples/sec Loss 5.0965 LearningRate 0.0148 Epoch: 12 Global Step: 510610 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:07:11,670-Speed 2625.38 samples/sec Loss 5.1639 LearningRate 0.0148 Epoch: 12 Global Step: 510620 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:07:15,573-Speed 2624.67 samples/sec Loss 5.2017 LearningRate 0.0148 Epoch: 12 Global Step: 510630 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:07:19,496-Speed 2611.41 samples/sec Loss 5.1579 LearningRate 0.0148 Epoch: 12 Global Step: 510640 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:07:23,392-Speed 2629.14 samples/sec Loss 5.1515 LearningRate 0.0148 Epoch: 12 Global Step: 510650 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:07:27,325-Speed 2604.72 samples/sec Loss 5.2098 LearningRate 0.0148 Epoch: 12 Global Step: 510660 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:07:31,200-Speed 2642.86 samples/sec Loss 5.2006 LearningRate 0.0148 Epoch: 12 Global Step: 510670 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:07:35,125-Speed 2610.05 samples/sec Loss 5.1165 LearningRate 0.0148 Epoch: 12 Global Step: 510680 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:07:39,020-Speed 2629.37 samples/sec Loss 5.0857 LearningRate 0.0148 Epoch: 12 Global Step: 510690 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:07:42,916-Speed 2629.40 samples/sec Loss 5.1256 LearningRate 0.0148 Epoch: 12 Global Step: 510700 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:07:46,822-Speed 2621.46 samples/sec Loss 5.0574 LearningRate 0.0148 Epoch: 12 Global Step: 510710 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:07:50,718-Speed 2630.08 samples/sec Loss 5.1655 LearningRate 0.0148 Epoch: 12 Global Step: 510720 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:07:54,614-Speed 2628.96 samples/sec Loss 5.1536 LearningRate 0.0148 Epoch: 12 Global Step: 510730 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:07:58,513-Speed 2626.62 samples/sec Loss 5.1714 LearningRate 0.0148 Epoch: 12 Global Step: 510740 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:08:02,428-Speed 2616.30 samples/sec Loss 5.1965 LearningRate 0.0148 Epoch: 12 Global Step: 510750 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:08:06,325-Speed 2627.85 samples/sec Loss 5.1565 LearningRate 0.0148 Epoch: 12 Global Step: 510760 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:08:10,223-Speed 2627.43 samples/sec Loss 5.1158 LearningRate 0.0148 Epoch: 12 Global Step: 510770 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:08:14,097-Speed 2644.50 samples/sec Loss 5.0397 LearningRate 0.0148 Epoch: 12 Global Step: 510780 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:08:18,003-Speed 2621.87 samples/sec Loss 5.2114 LearningRate 0.0148 Epoch: 12 Global Step: 510790 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:08:21,902-Speed 2626.98 samples/sec Loss 5.0751 LearningRate 0.0148 Epoch: 12 Global Step: 510800 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:08:25,797-Speed 2630.39 samples/sec Loss 5.2336 LearningRate 0.0148 Epoch: 12 Global Step: 510810 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:08:29,693-Speed 2628.63 samples/sec Loss 5.1502 LearningRate 0.0148 Epoch: 12 Global Step: 510820 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:08:33,590-Speed 2628.25 samples/sec Loss 5.1905 LearningRate 0.0148 Epoch: 12 Global Step: 510830 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:08:37,492-Speed 2625.35 samples/sec Loss 5.1206 LearningRate 0.0148 Epoch: 12 Global Step: 510840 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:08:41,403-Speed 2618.31 samples/sec Loss 5.2369 LearningRate 0.0148 Epoch: 12 Global Step: 510850 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:08:45,300-Speed 2628.38 samples/sec Loss 5.1316 LearningRate 0.0148 Epoch: 12 Global Step: 510860 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:08:49,311-Speed 2553.50 samples/sec Loss 5.1990 LearningRate 0.0148 Epoch: 12 Global Step: 510870 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:08:53,213-Speed 2625.47 samples/sec Loss 5.1106 LearningRate 0.0148 Epoch: 12 Global Step: 510880 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:08:57,114-Speed 2625.77 samples/sec Loss 5.2154 LearningRate 0.0148 Epoch: 12 Global Step: 510890 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:09:01,006-Speed 2631.29 samples/sec Loss 5.1062 LearningRate 0.0148 Epoch: 12 Global Step: 510900 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:09:04,912-Speed 2623.10 samples/sec Loss 5.1570 LearningRate 0.0148 Epoch: 12 Global Step: 510910 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:09:08,787-Speed 2642.99 samples/sec Loss 5.1817 LearningRate 0.0148 Epoch: 12 Global Step: 510920 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:09:12,684-Speed 2627.74 samples/sec Loss 5.2135 LearningRate 0.0148 Epoch: 12 Global Step: 510930 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:09:16,578-Speed 2630.21 samples/sec Loss 5.0550 LearningRate 0.0148 Epoch: 12 Global Step: 510940 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:09:20,472-Speed 2630.59 samples/sec Loss 5.1007 LearningRate 0.0148 Epoch: 12 Global Step: 510950 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:09:24,367-Speed 2629.64 samples/sec Loss 5.0767 LearningRate 0.0148 Epoch: 12 Global Step: 510960 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:09:28,270-Speed 2624.45 samples/sec Loss 5.1285 LearningRate 0.0148 Epoch: 12 Global Step: 510970 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:09:32,178-Speed 2621.00 samples/sec Loss 5.1507 LearningRate 0.0147 Epoch: 12 Global Step: 510980 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:09:36,077-Speed 2626.90 samples/sec Loss 5.1825 LearningRate 0.0147 Epoch: 12 Global Step: 510990 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:09:39,973-Speed 2628.82 samples/sec Loss 5.1137 LearningRate 0.0147 Epoch: 12 Global Step: 511000 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:09:43,921-Speed 2594.61 samples/sec Loss 5.1778 LearningRate 0.0147 Epoch: 12 Global Step: 511010 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:09:47,875-Speed 2590.31 samples/sec Loss 5.0960 LearningRate 0.0147 Epoch: 12 Global Step: 511020 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:09:51,756-Speed 2639.39 samples/sec Loss 5.2544 LearningRate 0.0147 Epoch: 12 Global Step: 511030 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:09:55,652-Speed 2628.49 samples/sec Loss 5.1548 LearningRate 0.0147 Epoch: 12 Global Step: 511040 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:09:59,560-Speed 2621.08 samples/sec Loss 5.0872 LearningRate 0.0147 Epoch: 12 Global Step: 511050 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:10:03,458-Speed 2627.95 samples/sec Loss 5.1336 LearningRate 0.0147 Epoch: 12 Global Step: 511060 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:10:07,356-Speed 2627.93 samples/sec Loss 5.2491 LearningRate 0.0147 Epoch: 12 Global Step: 511070 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:10:11,227-Speed 2645.49 samples/sec Loss 5.1743 LearningRate 0.0147 Epoch: 12 Global Step: 511080 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:10:15,123-Speed 2629.19 samples/sec Loss 5.0947 LearningRate 0.0147 Epoch: 12 Global Step: 511090 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:10:19,018-Speed 2629.75 samples/sec Loss 5.1066 LearningRate 0.0147 Epoch: 12 Global Step: 511100 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:10:22,916-Speed 2627.25 samples/sec Loss 5.1007 LearningRate 0.0147 Epoch: 12 Global Step: 511110 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:10:26,812-Speed 2629.09 samples/sec Loss 5.0531 LearningRate 0.0147 Epoch: 12 Global Step: 511120 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:10:30,708-Speed 2629.14 samples/sec Loss 5.2530 LearningRate 0.0147 Epoch: 12 Global Step: 511130 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:10:34,617-Speed 2620.91 samples/sec Loss 5.1596 LearningRate 0.0147 Epoch: 12 Global Step: 511140 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:10:38,543-Speed 2608.39 samples/sec Loss 5.1331 LearningRate 0.0147 Epoch: 12 Global Step: 511150 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:10:42,454-Speed 2619.16 samples/sec Loss 5.0385 LearningRate 0.0147 Epoch: 12 Global Step: 511160 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:10:46,357-Speed 2624.40 samples/sec Loss 5.0957 LearningRate 0.0147 Epoch: 12 Global Step: 511170 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:10:50,259-Speed 2624.72 samples/sec Loss 5.0860 LearningRate 0.0147 Epoch: 12 Global Step: 511180 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:10:54,176-Speed 2615.63 samples/sec Loss 5.1446 LearningRate 0.0147 Epoch: 12 Global Step: 511190 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:10:58,074-Speed 2627.88 samples/sec Loss 5.2070 LearningRate 0.0147 Epoch: 12 Global Step: 511200 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:11:01,986-Speed 2618.04 samples/sec Loss 5.1088 LearningRate 0.0147 Epoch: 12 Global Step: 511210 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:11:05,883-Speed 2627.97 samples/sec Loss 5.1857 LearningRate 0.0147 Epoch: 12 Global Step: 511220 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:11:09,795-Speed 2618.32 samples/sec Loss 5.1138 LearningRate 0.0147 Epoch: 12 Global Step: 511230 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:11:13,696-Speed 2626.09 samples/sec Loss 5.0740 LearningRate 0.0147 Epoch: 12 Global Step: 511240 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:11:17,603-Speed 2621.37 samples/sec Loss 5.1312 LearningRate 0.0147 Epoch: 12 Global Step: 511250 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:11:21,500-Speed 2628.29 samples/sec Loss 5.1656 LearningRate 0.0147 Epoch: 12 Global Step: 511260 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:11:25,412-Speed 2617.83 samples/sec Loss 5.1607 LearningRate 0.0147 Epoch: 12 Global Step: 511270 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:11:29,286-Speed 2644.92 samples/sec Loss 5.2067 LearningRate 0.0147 Epoch: 12 Global Step: 511280 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:11:33,191-Speed 2622.41 samples/sec Loss 5.0940 LearningRate 0.0147 Epoch: 12 Global Step: 511290 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:11:37,086-Speed 2629.56 samples/sec Loss 5.1785 LearningRate 0.0147 Epoch: 12 Global Step: 511300 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:11:40,989-Speed 2624.08 samples/sec Loss 5.0053 LearningRate 0.0147 Epoch: 12 Global Step: 511310 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:11:44,896-Speed 2621.74 samples/sec Loss 5.1371 LearningRate 0.0147 Epoch: 12 Global Step: 511320 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:11:48,820-Speed 2610.19 samples/sec Loss 5.2158 LearningRate 0.0147 Epoch: 12 Global Step: 511330 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:11:52,741-Speed 2612.53 samples/sec Loss 5.1309 LearningRate 0.0147 Epoch: 12 Global Step: 511340 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:11:56,648-Speed 2621.21 samples/sec Loss 5.0745 LearningRate 0.0147 Epoch: 12 Global Step: 511350 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:12:00,546-Speed 2627.89 samples/sec Loss 5.1696 LearningRate 0.0147 Epoch: 12 Global Step: 511360 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:12:04,441-Speed 2629.70 samples/sec Loss 5.0485 LearningRate 0.0147 Epoch: 12 Global Step: 511370 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:12:08,336-Speed 2629.81 samples/sec Loss 5.0856 LearningRate 0.0147 Epoch: 12 Global Step: 511380 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:12:12,236-Speed 2626.31 samples/sec Loss 5.1673 LearningRate 0.0147 Epoch: 12 Global Step: 511390 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:12:16,194-Speed 2587.89 samples/sec Loss 5.1439 LearningRate 0.0147 Epoch: 12 Global Step: 511400 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:12:20,089-Speed 2629.73 samples/sec Loss 5.1352 LearningRate 0.0147 Epoch: 12 Global Step: 511410 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:12:23,990-Speed 2625.74 samples/sec Loss 5.1822 LearningRate 0.0147 Epoch: 12 Global Step: 511420 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:12:27,892-Speed 2625.03 samples/sec Loss 5.1736 LearningRate 0.0147 Epoch: 12 Global Step: 511430 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:12:31,849-Speed 2588.56 samples/sec Loss 5.1133 LearningRate 0.0147 Epoch: 12 Global Step: 511440 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:12:35,741-Speed 2632.02 samples/sec Loss 5.1532 LearningRate 0.0147 Epoch: 12 Global Step: 511450 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:12:39,622-Speed 2638.63 samples/sec Loss 5.2675 LearningRate 0.0147 Epoch: 12 Global Step: 511460 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:12:43,533-Speed 2619.53 samples/sec Loss 5.0645 LearningRate 0.0147 Epoch: 12 Global Step: 511470 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:12:47,443-Speed 2619.27 samples/sec Loss 5.3095 LearningRate 0.0147 Epoch: 12 Global Step: 511480 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:12:51,348-Speed 2622.94 samples/sec Loss 5.1157 LearningRate 0.0147 Epoch: 12 Global Step: 511490 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:12:55,245-Speed 2628.72 samples/sec Loss 5.1415 LearningRate 0.0147 Epoch: 12 Global Step: 511500 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:12:59,141-Speed 2629.23 samples/sec Loss 5.1823 LearningRate 0.0147 Epoch: 12 Global Step: 511510 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:13:03,036-Speed 2629.27 samples/sec Loss 5.1698 LearningRate 0.0147 Epoch: 12 Global Step: 511520 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:13:06,940-Speed 2623.46 samples/sec Loss 5.1328 LearningRate 0.0147 Epoch: 12 Global Step: 511530 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:13:10,843-Speed 2624.11 samples/sec Loss 5.1621 LearningRate 0.0147 Epoch: 12 Global Step: 511540 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:13:14,751-Speed 2621.06 samples/sec Loss 5.1185 LearningRate 0.0147 Epoch: 12 Global Step: 511550 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:13:18,647-Speed 2628.77 samples/sec Loss 5.1408 LearningRate 0.0147 Epoch: 12 Global Step: 511560 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:13:22,554-Speed 2621.88 samples/sec Loss 5.1366 LearningRate 0.0147 Epoch: 12 Global Step: 511570 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:13:26,454-Speed 2626.67 samples/sec Loss 5.0487 LearningRate 0.0147 Epoch: 12 Global Step: 511580 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:13:30,361-Speed 2621.65 samples/sec Loss 5.0590 LearningRate 0.0147 Epoch: 12 Global Step: 511590 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:13:34,262-Speed 2626.16 samples/sec Loss 5.1309 LearningRate 0.0147 Epoch: 12 Global Step: 511600 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:13:38,159-Speed 2627.89 samples/sec Loss 5.1755 LearningRate 0.0147 Epoch: 12 Global Step: 511610 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:13:42,037-Speed 2640.80 samples/sec Loss 5.1965 LearningRate 0.0147 Epoch: 12 Global Step: 511620 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:13:45,935-Speed 2627.98 samples/sec Loss 5.1029 LearningRate 0.0147 Epoch: 12 Global Step: 511630 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:13:49,873-Speed 2614.66 samples/sec Loss 5.1068 LearningRate 0.0147 Epoch: 12 Global Step: 511640 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:13:53,772-Speed 2626.70 samples/sec Loss 5.2060 LearningRate 0.0147 Epoch: 12 Global Step: 511650 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:13:57,734-Speed 2597.26 samples/sec Loss 5.1872 LearningRate 0.0147 Epoch: 12 Global Step: 511660 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:14:01,635-Speed 2625.92 samples/sec Loss 5.1390 LearningRate 0.0147 Epoch: 12 Global Step: 511670 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:14:05,541-Speed 2621.88 samples/sec Loss 5.0365 LearningRate 0.0147 Epoch: 12 Global Step: 511680 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:14:09,437-Speed 2628.45 samples/sec Loss 5.2693 LearningRate 0.0147 Epoch: 12 Global Step: 511690 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:14:13,834-Speed 2628.66 samples/sec Loss 5.2483 LearningRate 0.0147 Epoch: 12 Global Step: 511700 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:14:17,743-Speed 2620.03 samples/sec Loss 5.2465 LearningRate 0.0147 Epoch: 12 Global Step: 511710 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:14:22,133-Speed 2624.26 samples/sec Loss 5.1241 LearningRate 0.0147 Epoch: 12 Global Step: 511720 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:14:26,596-Speed 2589.65 samples/sec Loss 5.1039 LearningRate 0.0147 Epoch: 12 Global Step: 511730 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:14:30,893-Speed 2615.65 samples/sec Loss 5.1618 LearningRate 0.0147 Epoch: 12 Global Step: 511740 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:14:34,820-Speed 2608.33 samples/sec Loss 5.1233 LearningRate 0.0147 Epoch: 12 Global Step: 511750 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:14:38,797-Speed 2622.49 samples/sec Loss 5.1464 LearningRate 0.0147 Epoch: 12 Global Step: 511760 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:14:42,701-Speed 2623.70 samples/sec Loss 5.0474 LearningRate 0.0147 Epoch: 12 Global Step: 511770 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:14:46,629-Speed 2607.49 samples/sec Loss 5.0529 LearningRate 0.0147 Epoch: 12 Global Step: 511780 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:14:50,529-Speed 2625.95 samples/sec Loss 5.2171 LearningRate 0.0147 Epoch: 12 Global Step: 511790 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:14:54,433-Speed 2624.34 samples/sec Loss 5.1297 LearningRate 0.0147 Epoch: 12 Global Step: 511800 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:14:58,348-Speed 2616.20 samples/sec Loss 5.1920 LearningRate 0.0147 Epoch: 12 Global Step: 511810 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:15:02,220-Speed 2645.24 samples/sec Loss 5.0838 LearningRate 0.0147 Epoch: 12 Global Step: 511820 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:15:06,131-Speed 2618.64 samples/sec Loss 5.2511 LearningRate 0.0147 Epoch: 12 Global Step: 511830 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:15:10,035-Speed 2624.17 samples/sec Loss 5.1806 LearningRate 0.0147 Epoch: 12 Global Step: 511840 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:15:14,032-Speed 2562.22 samples/sec Loss 5.1319 LearningRate 0.0147 Epoch: 12 Global Step: 511850 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:15:18,055-Speed 2546.04 samples/sec Loss 5.1742 LearningRate 0.0147 Epoch: 12 Global Step: 511860 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:15:22,052-Speed 2562.44 samples/sec Loss 5.1578 LearningRate 0.0147 Epoch: 12 Global Step: 511870 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:15:26,152-Speed 2498.66 samples/sec Loss 5.1165 LearningRate 0.0147 Epoch: 12 Global Step: 511880 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:15:30,239-Speed 2506.63 samples/sec Loss 5.2018 LearningRate 0.0147 Epoch: 12 Global Step: 511890 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:15:34,186-Speed 2594.69 samples/sec Loss 5.1237 LearningRate 0.0147 Epoch: 12 Global Step: 511900 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:15:38,096-Speed 2620.07 samples/sec Loss 5.1133 LearningRate 0.0147 Epoch: 12 Global Step: 511910 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:15:41,995-Speed 2626.80 samples/sec Loss 5.0925 LearningRate 0.0147 Epoch: 12 Global Step: 511920 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:15:45,918-Speed 2610.72 samples/sec Loss 5.1415 LearningRate 0.0147 Epoch: 12 Global Step: 511930 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:15:49,870-Speed 2592.07 samples/sec Loss 5.1681 LearningRate 0.0147 Epoch: 12 Global Step: 511940 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:15:53,784-Speed 2617.04 samples/sec Loss 5.2498 LearningRate 0.0147 Epoch: 12 Global Step: 511950 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:15:57,710-Speed 2608.50 samples/sec Loss 5.1003 LearningRate 0.0147 Epoch: 12 Global Step: 511960 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:16:01,614-Speed 2623.70 samples/sec Loss 5.1930 LearningRate 0.0147 Epoch: 12 Global Step: 511970 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:16:05,514-Speed 2626.15 samples/sec Loss 5.2134 LearningRate 0.0147 Epoch: 12 Global Step: 511980 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:16:09,434-Speed 2613.33 samples/sec Loss 5.1883 LearningRate 0.0147 Epoch: 12 Global Step: 511990 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:16:13,320-Speed 2635.95 samples/sec Loss 5.0833 LearningRate 0.0147 Epoch: 12 Global Step: 512000 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:16:17,228-Speed 2620.94 samples/sec Loss 5.1078 LearningRate 0.0147 Epoch: 12 Global Step: 512010 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:16:21,134-Speed 2622.17 samples/sec Loss 5.1930 LearningRate 0.0147 Epoch: 12 Global Step: 512020 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:16:25,040-Speed 2622.56 samples/sec Loss 5.1164 LearningRate 0.0147 Epoch: 12 Global Step: 512030 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:16:28,957-Speed 2614.55 samples/sec Loss 5.0166 LearningRate 0.0147 Epoch: 12 Global Step: 512040 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:16:32,875-Speed 2613.67 samples/sec Loss 5.0696 LearningRate 0.0147 Epoch: 12 Global Step: 512050 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:16:36,782-Speed 2622.08 samples/sec Loss 5.0705 LearningRate 0.0146 Epoch: 12 Global Step: 512060 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:16:40,681-Speed 2627.27 samples/sec Loss 5.1386 LearningRate 0.0146 Epoch: 12 Global Step: 512070 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:16:44,584-Speed 2623.91 samples/sec Loss 5.0709 LearningRate 0.0146 Epoch: 12 Global Step: 512080 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:16:48,515-Speed 2606.38 samples/sec Loss 5.0901 LearningRate 0.0146 Epoch: 12 Global Step: 512090 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:16:52,411-Speed 2628.94 samples/sec Loss 5.1066 LearningRate 0.0146 Epoch: 12 Global Step: 512100 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:16:56,291-Speed 2640.25 samples/sec Loss 5.2346 LearningRate 0.0146 Epoch: 12 Global Step: 512110 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:17:00,211-Speed 2612.70 samples/sec Loss 5.1356 LearningRate 0.0146 Epoch: 12 Global Step: 512120 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:17:04,115-Speed 2623.51 samples/sec Loss 5.0945 LearningRate 0.0146 Epoch: 12 Global Step: 512130 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:17:08,017-Speed 2624.54 samples/sec Loss 5.1498 LearningRate 0.0146 Epoch: 12 Global Step: 512140 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:17:11,930-Speed 2617.68 samples/sec Loss 5.1840 LearningRate 0.0146 Epoch: 12 Global Step: 512150 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:17:15,836-Speed 2622.81 samples/sec Loss 5.2147 LearningRate 0.0146 Epoch: 12 Global Step: 512160 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:17:19,738-Speed 2624.77 samples/sec Loss 5.2443 LearningRate 0.0146 Epoch: 12 Global Step: 512170 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:17:23,663-Speed 2609.26 samples/sec Loss 5.2181 LearningRate 0.0146 Epoch: 12 Global Step: 512180 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:17:27,567-Speed 2623.88 samples/sec Loss 5.0886 LearningRate 0.0146 Epoch: 12 Global Step: 512190 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:17:31,616-Speed 2529.82 samples/sec Loss 5.2108 LearningRate 0.0146 Epoch: 12 Global Step: 512200 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:17:35,523-Speed 2621.45 samples/sec Loss 5.1111 LearningRate 0.0146 Epoch: 12 Global Step: 512210 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:17:39,425-Speed 2624.97 samples/sec Loss 5.1886 LearningRate 0.0146 Epoch: 12 Global Step: 512220 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:17:43,327-Speed 2624.93 samples/sec Loss 5.0924 LearningRate 0.0146 Epoch: 12 Global Step: 512230 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:17:47,213-Speed 2635.42 samples/sec Loss 5.1180 LearningRate 0.0146 Epoch: 12 Global Step: 512240 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:17:51,114-Speed 2625.42 samples/sec Loss 5.2748 LearningRate 0.0146 Epoch: 12 Global Step: 512250 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:17:55,048-Speed 2603.96 samples/sec Loss 5.0590 LearningRate 0.0146 Epoch: 12 Global Step: 512260 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:17:58,949-Speed 2625.65 samples/sec Loss 5.1837 LearningRate 0.0146 Epoch: 12 Global Step: 512270 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:18:02,854-Speed 2622.65 samples/sec Loss 5.1412 LearningRate 0.0146 Epoch: 12 Global Step: 512280 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:18:06,756-Speed 2624.65 samples/sec Loss 5.0760 LearningRate 0.0146 Epoch: 12 Global Step: 512290 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:18:10,656-Speed 2625.96 samples/sec Loss 5.1221 LearningRate 0.0146 Epoch: 12 Global Step: 512300 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:18:14,557-Speed 2626.44 samples/sec Loss 5.0862 LearningRate 0.0146 Epoch: 12 Global Step: 512310 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:18:18,457-Speed 2626.11 samples/sec Loss 5.1486 LearningRate 0.0146 Epoch: 12 Global Step: 512320 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:18:22,360-Speed 2624.32 samples/sec Loss 5.0470 LearningRate 0.0146 Epoch: 12 Global Step: 512330 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:18:26,265-Speed 2622.82 samples/sec Loss 5.0791 LearningRate 0.0146 Epoch: 12 Global Step: 512340 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:18:30,169-Speed 2624.20 samples/sec Loss 5.1669 LearningRate 0.0146 Epoch: 12 Global Step: 512350 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:18:34,069-Speed 2625.87 samples/sec Loss 5.1457 LearningRate 0.0146 Epoch: 12 Global Step: 512360 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:18:37,973-Speed 2622.99 samples/sec Loss 5.0190 LearningRate 0.0146 Epoch: 12 Global Step: 512370 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:18:41,876-Speed 2624.42 samples/sec Loss 5.1970 LearningRate 0.0146 Epoch: 12 Global Step: 512380 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:18:45,784-Speed 2621.48 samples/sec Loss 5.1031 LearningRate 0.0146 Epoch: 12 Global Step: 512390 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:18:49,689-Speed 2623.01 samples/sec Loss 5.0515 LearningRate 0.0146 Epoch: 12 Global Step: 512400 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:18:53,587-Speed 2627.54 samples/sec Loss 5.1731 LearningRate 0.0146 Epoch: 12 Global Step: 512410 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:18:57,490-Speed 2624.21 samples/sec Loss 5.1367 LearningRate 0.0146 Epoch: 12 Global Step: 512420 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:19:01,391-Speed 2625.37 samples/sec Loss 5.0912 LearningRate 0.0146 Epoch: 12 Global Step: 512430 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:19:05,294-Speed 2624.17 samples/sec Loss 5.2004 LearningRate 0.0146 Epoch: 12 Global Step: 512440 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:19:09,213-Speed 2613.68 samples/sec Loss 5.1106 LearningRate 0.0146 Epoch: 12 Global Step: 512450 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:19:13,097-Speed 2636.76 samples/sec Loss 5.0932 LearningRate 0.0146 Epoch: 12 Global Step: 512460 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:19:17,015-Speed 2614.56 samples/sec Loss 5.0340 LearningRate 0.0146 Epoch: 12 Global Step: 512470 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:19:20,931-Speed 2615.80 samples/sec Loss 5.1450 LearningRate 0.0146 Epoch: 12 Global Step: 512480 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:19:24,844-Speed 2617.70 samples/sec Loss 5.1097 LearningRate 0.0146 Epoch: 12 Global Step: 512490 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:19:28,774-Speed 2606.21 samples/sec Loss 5.1209 LearningRate 0.0146 Epoch: 12 Global Step: 512500 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:19:32,730-Speed 2589.23 samples/sec Loss 5.1677 LearningRate 0.0146 Epoch: 12 Global Step: 512510 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:19:36,644-Speed 2617.17 samples/sec Loss 5.1772 LearningRate 0.0146 Epoch: 12 Global Step: 512520 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:19:40,567-Speed 2610.47 samples/sec Loss 5.0794 LearningRate 0.0146 Epoch: 12 Global Step: 512530 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:19:44,475-Speed 2621.12 samples/sec Loss 5.1816 LearningRate 0.0146 Epoch: 12 Global Step: 512540 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:19:48,390-Speed 2616.38 samples/sec Loss 5.1003 LearningRate 0.0146 Epoch: 12 Global Step: 512550 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:19:52,303-Speed 2617.54 samples/sec Loss 5.1877 LearningRate 0.0146 Epoch: 12 Global Step: 512560 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:19:56,226-Speed 2611.53 samples/sec Loss 5.1823 LearningRate 0.0146 Epoch: 12 Global Step: 512570 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:20:00,133-Speed 2621.56 samples/sec Loss 5.1856 LearningRate 0.0146 Epoch: 12 Global Step: 512580 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:20:04,038-Speed 2622.56 samples/sec Loss 5.0398 LearningRate 0.0146 Epoch: 12 Global Step: 512590 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:20:07,950-Speed 2618.35 samples/sec Loss 5.1792 LearningRate 0.0146 Epoch: 12 Global Step: 512600 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:20:11,850-Speed 2626.15 samples/sec Loss 5.1648 LearningRate 0.0146 Epoch: 12 Global Step: 512610 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:20:15,751-Speed 2626.00 samples/sec Loss 5.0206 LearningRate 0.0146 Epoch: 12 Global Step: 512620 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:20:19,638-Speed 2634.77 samples/sec Loss 5.0943 LearningRate 0.0146 Epoch: 12 Global Step: 512630 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:20:23,539-Speed 2625.46 samples/sec Loss 5.1173 LearningRate 0.0146 Epoch: 12 Global Step: 512640 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:20:27,447-Speed 2620.85 samples/sec Loss 5.2027 LearningRate 0.0146 Epoch: 12 Global Step: 512650 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:20:31,389-Speed 2598.80 samples/sec Loss 5.1345 LearningRate 0.0146 Epoch: 12 Global Step: 512660 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:20:35,388-Speed 2561.70 samples/sec Loss 5.1216 LearningRate 0.0146 Epoch: 12 Global Step: 512670 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:20:39,293-Speed 2622.65 samples/sec Loss 5.0590 LearningRate 0.0146 Epoch: 12 Global Step: 512680 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:20:43,204-Speed 2619.67 samples/sec Loss 5.1791 LearningRate 0.0146 Epoch: 12 Global Step: 512690 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:20:47,114-Speed 2619.32 samples/sec Loss 5.1451 LearningRate 0.0146 Epoch: 12 Global Step: 512700 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:20:51,018-Speed 2623.50 samples/sec Loss 5.1262 LearningRate 0.0146 Epoch: 12 Global Step: 512710 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:20:54,926-Speed 2620.87 samples/sec Loss 5.2708 LearningRate 0.0146 Epoch: 12 Global Step: 512720 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:20:58,831-Speed 2623.32 samples/sec Loss 5.1204 LearningRate 0.0146 Epoch: 12 Global Step: 512730 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:21:02,744-Speed 2617.73 samples/sec Loss 5.1834 LearningRate 0.0146 Epoch: 12 Global Step: 512740 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:21:06,657-Speed 2617.54 samples/sec Loss 5.1195 LearningRate 0.0146 Epoch: 12 Global Step: 512750 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:21:10,564-Speed 2621.53 samples/sec Loss 4.9299 LearningRate 0.0146 Epoch: 12 Global Step: 512760 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:21:14,447-Speed 2638.44 samples/sec Loss 5.0886 LearningRate 0.0146 Epoch: 12 Global Step: 512770 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:21:18,349-Speed 2624.90 samples/sec Loss 5.0951 LearningRate 0.0146 Epoch: 12 Global Step: 512780 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:21:22,250-Speed 2625.71 samples/sec Loss 5.1849 LearningRate 0.0146 Epoch: 12 Global Step: 512790 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:21:26,145-Speed 2629.30 samples/sec Loss 5.0942 LearningRate 0.0146 Epoch: 12 Global Step: 512800 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:21:30,053-Speed 2621.46 samples/sec Loss 5.1258 LearningRate 0.0146 Epoch: 12 Global Step: 512810 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:21:33,980-Speed 2607.70 samples/sec Loss 5.1938 LearningRate 0.0146 Epoch: 12 Global Step: 512820 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:21:37,890-Speed 2625.26 samples/sec Loss 5.0692 LearningRate 0.0146 Epoch: 12 Global Step: 512830 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:21:41,792-Speed 2625.48 samples/sec Loss 5.0771 LearningRate 0.0146 Epoch: 12 Global Step: 512840 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:21:45,692-Speed 2626.99 samples/sec Loss 5.1723 LearningRate 0.0146 Epoch: 12 Global Step: 512850 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:21:49,609-Speed 2614.26 samples/sec Loss 5.0552 LearningRate 0.0146 Epoch: 12 Global Step: 512860 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:21:53,517-Speed 2620.74 samples/sec Loss 5.0905 LearningRate 0.0146 Epoch: 12 Global Step: 512870 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:21:57,412-Speed 2629.63 samples/sec Loss 5.1097 LearningRate 0.0146 Epoch: 12 Global Step: 512880 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:22:01,332-Speed 2614.08 samples/sec Loss 5.0490 LearningRate 0.0146 Epoch: 12 Global Step: 512890 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:22:05,239-Speed 2621.67 samples/sec Loss 5.1861 LearningRate 0.0146 Epoch: 12 Global Step: 512900 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:22:09,140-Speed 2625.93 samples/sec Loss 5.1669 LearningRate 0.0146 Epoch: 12 Global Step: 512910 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:22:13,044-Speed 2623.58 samples/sec Loss 5.0699 LearningRate 0.0146 Epoch: 12 Global Step: 512920 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:22:16,961-Speed 2615.34 samples/sec Loss 5.0978 LearningRate 0.0146 Epoch: 12 Global Step: 512930 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:22:20,873-Speed 2618.35 samples/sec Loss 5.1967 LearningRate 0.0146 Epoch: 12 Global Step: 512940 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:22:24,790-Speed 2614.62 samples/sec Loss 5.0210 LearningRate 0.0146 Epoch: 12 Global Step: 512950 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:22:28,740-Speed 2593.00 samples/sec Loss 5.0490 LearningRate 0.0146 Epoch: 12 Global Step: 512960 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:22:32,662-Speed 2611.04 samples/sec Loss 5.0491 LearningRate 0.0146 Epoch: 12 Global Step: 512970 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:22:36,567-Speed 2623.51 samples/sec Loss 5.1246 LearningRate 0.0146 Epoch: 12 Global Step: 512980 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:22:40,466-Speed 2627.80 samples/sec Loss 5.1451 LearningRate 0.0146 Epoch: 12 Global Step: 512990 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:22:44,366-Speed 2625.85 samples/sec Loss 5.0888 LearningRate 0.0146 Epoch: 12 Global Step: 513000 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:22:48,269-Speed 2624.21 samples/sec Loss 5.1491 LearningRate 0.0146 Epoch: 12 Global Step: 513010 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:22:52,169-Speed 2626.71 samples/sec Loss 5.0953 LearningRate 0.0146 Epoch: 12 Global Step: 513020 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:22:56,070-Speed 2625.66 samples/sec Loss 5.1491 LearningRate 0.0146 Epoch: 12 Global Step: 513030 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:23:00,000-Speed 2606.00 samples/sec Loss 5.0223 LearningRate 0.0146 Epoch: 12 Global Step: 513040 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:23:03,903-Speed 2624.26 samples/sec Loss 5.1310 LearningRate 0.0146 Epoch: 12 Global Step: 513050 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:23:07,779-Speed 2642.82 samples/sec Loss 5.1267 LearningRate 0.0146 Epoch: 12 Global Step: 513060 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:23:11,684-Speed 2622.78 samples/sec Loss 5.1616 LearningRate 0.0146 Epoch: 12 Global Step: 513070 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:23:15,948-Speed 2402.16 samples/sec Loss 5.1236 LearningRate 0.0146 Epoch: 12 Global Step: 513080 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:23:19,852-Speed 2623.39 samples/sec Loss 5.0730 LearningRate 0.0146 Epoch: 12 Global Step: 513090 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:23:23,756-Speed 2623.76 samples/sec Loss 5.1332 LearningRate 0.0146 Epoch: 12 Global Step: 513100 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:23:27,658-Speed 2625.16 samples/sec Loss 5.1509 LearningRate 0.0146 Epoch: 12 Global Step: 513110 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:23:31,564-Speed 2622.79 samples/sec Loss 5.1502 LearningRate 0.0146 Epoch: 12 Global Step: 513120 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:23:35,470-Speed 2622.10 samples/sec Loss 5.1192 LearningRate 0.0146 Epoch: 12 Global Step: 513130 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:23:39,378-Speed 2620.96 samples/sec Loss 5.1745 LearningRate 0.0146 Epoch: 12 Global Step: 513140 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:23:43,290-Speed 2617.96 samples/sec Loss 5.1265 LearningRate 0.0145 Epoch: 12 Global Step: 513150 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:23:47,190-Speed 2626.16 samples/sec Loss 5.1587 LearningRate 0.0145 Epoch: 12 Global Step: 513160 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:23:51,089-Speed 2626.95 samples/sec Loss 5.0746 LearningRate 0.0145 Epoch: 12 Global Step: 513170 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:23:54,993-Speed 2623.54 samples/sec Loss 5.2163 LearningRate 0.0145 Epoch: 12 Global Step: 513180 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:23:58,914-Speed 2617.44 samples/sec Loss 5.1929 LearningRate 0.0145 Epoch: 12 Global Step: 513190 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:24:02,799-Speed 2636.14 samples/sec Loss 5.1628 LearningRate 0.0145 Epoch: 12 Global Step: 513200 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:24:06,702-Speed 2624.40 samples/sec Loss 5.1267 LearningRate 0.0145 Epoch: 12 Global Step: 513210 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:24:10,628-Speed 2608.54 samples/sec Loss 5.0901 LearningRate 0.0145 Epoch: 12 Global Step: 513220 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:24:14,555-Speed 2608.64 samples/sec Loss 5.0287 LearningRate 0.0145 Epoch: 12 Global Step: 513230 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:24:18,470-Speed 2615.82 samples/sec Loss 5.1239 LearningRate 0.0145 Epoch: 12 Global Step: 513240 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:24:22,380-Speed 2619.85 samples/sec Loss 5.1168 LearningRate 0.0145 Epoch: 12 Global Step: 513250 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:24:26,282-Speed 2624.57 samples/sec Loss 4.9921 LearningRate 0.0145 Epoch: 12 Global Step: 513260 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:24:30,159-Speed 2641.97 samples/sec Loss 5.2187 LearningRate 0.0145 Epoch: 12 Global Step: 513270 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:24:34,061-Speed 2625.29 samples/sec Loss 5.1013 LearningRate 0.0145 Epoch: 12 Global Step: 513280 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:24:37,959-Speed 2627.49 samples/sec Loss 5.1986 LearningRate 0.0145 Epoch: 12 Global Step: 513290 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:24:41,864-Speed 2622.86 samples/sec Loss 5.1399 LearningRate 0.0145 Epoch: 12 Global Step: 513300 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:24:45,772-Speed 2620.68 samples/sec Loss 5.1767 LearningRate 0.0145 Epoch: 12 Global Step: 513310 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:24:49,681-Speed 2620.60 samples/sec Loss 5.1494 LearningRate 0.0145 Epoch: 12 Global Step: 513320 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:24:53,589-Speed 2620.96 samples/sec Loss 5.1434 LearningRate 0.0145 Epoch: 12 Global Step: 513330 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:24:57,494-Speed 2622.31 samples/sec Loss 5.1449 LearningRate 0.0145 Epoch: 12 Global Step: 513340 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:25:01,398-Speed 2623.84 samples/sec Loss 5.1020 LearningRate 0.0145 Epoch: 12 Global Step: 513350 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:25:05,299-Speed 2625.31 samples/sec Loss 5.0797 LearningRate 0.0145 Epoch: 12 Global Step: 513360 Fp16 Grad Scale: 32768 Required: 36 hours
Training: 2022-04-15 05:25:09,201-Speed 2625.10 samples/sec Loss 5.0296 LearningRate 0.0145 Epoch: 12 Global Step: 513370 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:25:13,102-Speed 2625.46 samples/sec Loss 5.0910 LearningRate 0.0145 Epoch: 12 Global Step: 513380 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:25:17,006-Speed 2624.25 samples/sec Loss 5.1390 LearningRate 0.0145 Epoch: 12 Global Step: 513390 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:25:20,908-Speed 2627.12 samples/sec Loss 5.1322 LearningRate 0.0145 Epoch: 12 Global Step: 513400 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:25:24,811-Speed 2624.10 samples/sec Loss 5.0075 LearningRate 0.0145 Epoch: 12 Global Step: 513410 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:25:28,723-Speed 2618.36 samples/sec Loss 5.0085 LearningRate 0.0145 Epoch: 12 Global Step: 513420 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:25:32,643-Speed 2612.79 samples/sec Loss 5.1374 LearningRate 0.0145 Epoch: 12 Global Step: 513430 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:25:36,544-Speed 2625.31 samples/sec Loss 5.2815 LearningRate 0.0145 Epoch: 12 Global Step: 513440 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:25:40,444-Speed 2625.98 samples/sec Loss 5.1890 LearningRate 0.0145 Epoch: 12 Global Step: 513450 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:25:44,345-Speed 2626.93 samples/sec Loss 5.0443 LearningRate 0.0145 Epoch: 12 Global Step: 513460 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:25:48,247-Speed 2624.96 samples/sec Loss 5.0866 LearningRate 0.0145 Epoch: 12 Global Step: 513470 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:25:52,145-Speed 2627.18 samples/sec Loss 5.0369 LearningRate 0.0145 Epoch: 12 Global Step: 513480 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:25:56,047-Speed 2624.76 samples/sec Loss 5.1485 LearningRate 0.0145 Epoch: 12 Global Step: 513490 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:25:59,948-Speed 2626.35 samples/sec Loss 5.1560 LearningRate 0.0145 Epoch: 12 Global Step: 513500 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:26:03,849-Speed 2625.00 samples/sec Loss 5.1231 LearningRate 0.0145 Epoch: 12 Global Step: 513510 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:26:07,751-Speed 2625.11 samples/sec Loss 5.1724 LearningRate 0.0145 Epoch: 12 Global Step: 513520 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:26:11,656-Speed 2622.82 samples/sec Loss 5.1555 LearningRate 0.0145 Epoch: 12 Global Step: 513530 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:26:15,564-Speed 2620.64 samples/sec Loss 5.0237 LearningRate 0.0145 Epoch: 12 Global Step: 513540 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:26:19,467-Speed 2624.33 samples/sec Loss 5.1212 LearningRate 0.0145 Epoch: 12 Global Step: 513550 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:26:23,373-Speed 2622.14 samples/sec Loss 5.0420 LearningRate 0.0145 Epoch: 12 Global Step: 513560 Fp16 Grad Scale: 131072 Required: 36 hours
Training: 2022-04-15 05:26:27,231-Speed 2655.64 samples/sec Loss 5.0401 LearningRate 0.0145 Epoch: 12 Global Step: 513570 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:26:31,136-Speed 2622.67 samples/sec Loss 5.1891 LearningRate 0.0145 Epoch: 12 Global Step: 513580 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:26:35,060-Speed 2610.37 samples/sec Loss 5.1328 LearningRate 0.0145 Epoch: 12 Global Step: 513590 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:26:38,960-Speed 2626.00 samples/sec Loss 5.1843 LearningRate 0.0145 Epoch: 12 Global Step: 513600 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:26:42,872-Speed 2618.08 samples/sec Loss 5.1138 LearningRate 0.0145 Epoch: 12 Global Step: 513610 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:26:46,772-Speed 2626.73 samples/sec Loss 5.1647 LearningRate 0.0145 Epoch: 12 Global Step: 513620 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:26:50,673-Speed 2625.57 samples/sec Loss 4.9885 LearningRate 0.0145 Epoch: 12 Global Step: 513630 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:26:54,580-Speed 2621.44 samples/sec Loss 5.0849 LearningRate 0.0145 Epoch: 12 Global Step: 513640 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:26:58,482-Speed 2624.22 samples/sec Loss 5.0111 LearningRate 0.0145 Epoch: 12 Global Step: 513650 Fp16 Grad Scale: 65536 Required: 36 hours
Training: 2022-04-15 05:27:02,381-Speed 2627.53 samples/sec Loss 5.0314 LearningRate 0.0145 Epoch: 12 Global Step: 513660 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:27:06,280-Speed 2627.67 samples/sec Loss 5.1301 LearningRate 0.0145 Epoch: 12 Global Step: 513670 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:27:10,180-Speed 2626.36 samples/sec Loss 5.1238 LearningRate 0.0145 Epoch: 12 Global Step: 513680 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:27:14,088-Speed 2621.07 samples/sec Loss 5.0693 LearningRate 0.0145 Epoch: 12 Global Step: 513690 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:27:17,989-Speed 2624.94 samples/sec Loss 5.0783 LearningRate 0.0145 Epoch: 12 Global Step: 513700 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:27:21,891-Speed 2624.93 samples/sec Loss 5.1628 LearningRate 0.0145 Epoch: 12 Global Step: 513710 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:27:25,794-Speed 2624.12 samples/sec Loss 5.0640 LearningRate 0.0145 Epoch: 12 Global Step: 513720 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:27:29,697-Speed 2624.03 samples/sec Loss 5.0051 LearningRate 0.0145 Epoch: 12 Global Step: 513730 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:27:33,598-Speed 2625.50 samples/sec Loss 5.0957 LearningRate 0.0145 Epoch: 12 Global Step: 513740 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:27:37,518-Speed 2613.37 samples/sec Loss 5.2434 LearningRate 0.0145 Epoch: 12 Global Step: 513750 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:27:41,424-Speed 2622.19 samples/sec Loss 5.2651 LearningRate 0.0145 Epoch: 12 Global Step: 513760 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:27:45,327-Speed 2624.32 samples/sec Loss 5.1686 LearningRate 0.0145 Epoch: 12 Global Step: 513770 Fp16 Grad Scale: 262144 Required: 35 hours
Training: 2022-04-15 05:27:49,211-Speed 2637.06 samples/sec Loss 5.1188 LearningRate 0.0145 Epoch: 12 Global Step: 513780 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:27:53,114-Speed 2624.24 samples/sec Loss 5.1756 LearningRate 0.0145 Epoch: 12 Global Step: 513790 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:27:57,013-Speed 2626.68 samples/sec Loss 5.1816 LearningRate 0.0145 Epoch: 12 Global Step: 513800 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:28:00,923-Speed 2627.71 samples/sec Loss 5.0820 LearningRate 0.0145 Epoch: 12 Global Step: 513810 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:28:04,806-Speed 2637.96 samples/sec Loss 4.9793 LearningRate 0.0145 Epoch: 12 Global Step: 513820 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:28:08,704-Speed 2627.41 samples/sec Loss 5.1255 LearningRate 0.0145 Epoch: 12 Global Step: 513830 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:28:12,670-Speed 2582.16 samples/sec Loss 5.1826 LearningRate 0.0145 Epoch: 12 Global Step: 513840 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:28:16,577-Speed 2622.27 samples/sec Loss 5.1092 LearningRate 0.0145 Epoch: 12 Global Step: 513850 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:28:20,476-Speed 2626.94 samples/sec Loss 5.0948 LearningRate 0.0145 Epoch: 12 Global Step: 513860 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:28:24,380-Speed 2623.51 samples/sec Loss 5.0824 LearningRate 0.0145 Epoch: 12 Global Step: 513870 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:28:28,289-Speed 2619.99 samples/sec Loss 5.0955 LearningRate 0.0145 Epoch: 12 Global Step: 513880 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:28:32,197-Speed 2620.83 samples/sec Loss 5.2076 LearningRate 0.0145 Epoch: 12 Global Step: 513890 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:28:36,101-Speed 2623.79 samples/sec Loss 5.1026 LearningRate 0.0145 Epoch: 12 Global Step: 513900 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:28:40,008-Speed 2621.40 samples/sec Loss 5.1006 LearningRate 0.0145 Epoch: 12 Global Step: 513910 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:28:43,916-Speed 2621.04 samples/sec Loss 5.0090 LearningRate 0.0145 Epoch: 12 Global Step: 513920 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:28:47,820-Speed 2623.38 samples/sec Loss 5.1145 LearningRate 0.0145 Epoch: 12 Global Step: 513930 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:28:51,721-Speed 2625.66 samples/sec Loss 5.0957 LearningRate 0.0145 Epoch: 12 Global Step: 513940 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:28:55,598-Speed 2642.12 samples/sec Loss 5.1063 LearningRate 0.0145 Epoch: 12 Global Step: 513950 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:28:59,495-Speed 2628.60 samples/sec Loss 5.0890 LearningRate 0.0145 Epoch: 12 Global Step: 513960 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:29:03,394-Speed 2626.25 samples/sec Loss 5.0459 LearningRate 0.0145 Epoch: 12 Global Step: 513970 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:29:07,298-Speed 2623.80 samples/sec Loss 5.0318 LearningRate 0.0145 Epoch: 12 Global Step: 513980 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:29:11,201-Speed 2623.90 samples/sec Loss 5.1886 LearningRate 0.0145 Epoch: 12 Global Step: 513990 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:29:15,102-Speed 2625.41 samples/sec Loss 5.0793 LearningRate 0.0145 Epoch: 12 Global Step: 514000 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:29:19,013-Speed 2619.40 samples/sec Loss 5.1856 LearningRate 0.0145 Epoch: 12 Global Step: 514010 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:29:22,911-Speed 2627.59 samples/sec Loss 5.0294 LearningRate 0.0145 Epoch: 12 Global Step: 514020 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:29:26,810-Speed 2626.51 samples/sec Loss 5.1708 LearningRate 0.0145 Epoch: 12 Global Step: 514030 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:29:30,713-Speed 2624.81 samples/sec Loss 5.1172 LearningRate 0.0145 Epoch: 12 Global Step: 514040 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:29:34,615-Speed 2624.87 samples/sec Loss 5.0698 LearningRate 0.0145 Epoch: 12 Global Step: 514050 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:29:38,515-Speed 2626.04 samples/sec Loss 5.0910 LearningRate 0.0145 Epoch: 12 Global Step: 514060 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:29:42,424-Speed 2620.67 samples/sec Loss 5.1717 LearningRate 0.0145 Epoch: 12 Global Step: 514070 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:29:46,331-Speed 2621.63 samples/sec Loss 5.1185 LearningRate 0.0145 Epoch: 12 Global Step: 514080 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:29:50,220-Speed 2633.40 samples/sec Loss 5.0731 LearningRate 0.0145 Epoch: 12 Global Step: 514090 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:29:54,121-Speed 2625.77 samples/sec Loss 5.1994 LearningRate 0.0145 Epoch: 12 Global Step: 514100 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:29:58,019-Speed 2627.27 samples/sec Loss 5.0588 LearningRate 0.0145 Epoch: 12 Global Step: 514110 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:30:02,017-Speed 2561.80 samples/sec Loss 5.1043 LearningRate 0.0145 Epoch: 12 Global Step: 514120 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:30:05,919-Speed 2625.00 samples/sec Loss 5.2301 LearningRate 0.0145 Epoch: 12 Global Step: 514130 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:30:09,821-Speed 2624.79 samples/sec Loss 5.1161 LearningRate 0.0145 Epoch: 12 Global Step: 514140 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:30:13,722-Speed 2625.52 samples/sec Loss 5.1097 LearningRate 0.0145 Epoch: 12 Global Step: 514150 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:30:17,626-Speed 2623.69 samples/sec Loss 5.1251 LearningRate 0.0145 Epoch: 12 Global Step: 514160 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:30:21,525-Speed 2627.00 samples/sec Loss 5.0752 LearningRate 0.0145 Epoch: 12 Global Step: 514170 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:30:25,421-Speed 2628.93 samples/sec Loss 5.1231 LearningRate 0.0145 Epoch: 12 Global Step: 514180 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:30:29,321-Speed 2627.33 samples/sec Loss 5.0219 LearningRate 0.0145 Epoch: 12 Global Step: 514190 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:30:33,231-Speed 2618.85 samples/sec Loss 5.0923 LearningRate 0.0145 Epoch: 12 Global Step: 514200 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:30:37,134-Speed 2624.14 samples/sec Loss 5.1871 LearningRate 0.0145 Epoch: 12 Global Step: 514210 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:30:41,030-Speed 2629.23 samples/sec Loss 5.0919 LearningRate 0.0145 Epoch: 12 Global Step: 514220 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:30:44,936-Speed 2622.18 samples/sec Loss 5.1117 LearningRate 0.0145 Epoch: 12 Global Step: 514230 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:30:48,857-Speed 2612.31 samples/sec Loss 5.0931 LearningRate 0.0144 Epoch: 12 Global Step: 514240 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:30:52,760-Speed 2624.31 samples/sec Loss 4.9891 LearningRate 0.0144 Epoch: 12 Global Step: 514250 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:30:56,739-Speed 2574.34 samples/sec Loss 5.0893 LearningRate 0.0144 Epoch: 12 Global Step: 514260 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:31:00,836-Speed 2499.90 samples/sec Loss 5.0545 LearningRate 0.0144 Epoch: 12 Global Step: 514270 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:31:04,760-Speed 2609.82 samples/sec Loss 5.1187 LearningRate 0.0144 Epoch: 12 Global Step: 514280 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:31:08,662-Speed 2625.28 samples/sec Loss 5.0467 LearningRate 0.0144 Epoch: 12 Global Step: 514290 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:31:12,561-Speed 2626.77 samples/sec Loss 5.1615 LearningRate 0.0144 Epoch: 12 Global Step: 514300 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:31:16,464-Speed 2624.04 samples/sec Loss 5.0692 LearningRate 0.0144 Epoch: 12 Global Step: 514310 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:31:20,362-Speed 2627.43 samples/sec Loss 5.1794 LearningRate 0.0144 Epoch: 12 Global Step: 514320 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:31:24,257-Speed 2630.11 samples/sec Loss 5.1411 LearningRate 0.0144 Epoch: 12 Global Step: 514330 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:31:28,157-Speed 2626.05 samples/sec Loss 5.0337 LearningRate 0.0144 Epoch: 12 Global Step: 514340 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:31:32,031-Speed 2644.60 samples/sec Loss 5.0342 LearningRate 0.0144 Epoch: 12 Global Step: 514350 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:31:35,956-Speed 2608.85 samples/sec Loss 5.1736 LearningRate 0.0144 Epoch: 12 Global Step: 514360 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:31:39,863-Speed 2621.92 samples/sec Loss 5.1845 LearningRate 0.0144 Epoch: 12 Global Step: 514370 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:31:43,764-Speed 2624.88 samples/sec Loss 5.0700 LearningRate 0.0144 Epoch: 12 Global Step: 514380 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:31:47,659-Speed 2630.31 samples/sec Loss 5.0972 LearningRate 0.0144 Epoch: 12 Global Step: 514390 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:31:51,583-Speed 2610.19 samples/sec Loss 5.1155 LearningRate 0.0144 Epoch: 12 Global Step: 514400 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:31:55,486-Speed 2623.95 samples/sec Loss 5.0553 LearningRate 0.0144 Epoch: 12 Global Step: 514410 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:31:59,390-Speed 2623.49 samples/sec Loss 5.1088 LearningRate 0.0144 Epoch: 12 Global Step: 514420 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:32:03,289-Speed 2626.96 samples/sec Loss 5.0457 LearningRate 0.0144 Epoch: 12 Global Step: 514430 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:32:07,198-Speed 2620.38 samples/sec Loss 5.0747 LearningRate 0.0144 Epoch: 12 Global Step: 514440 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:32:11,097-Speed 2627.09 samples/sec Loss 4.9994 LearningRate 0.0144 Epoch: 12 Global Step: 514450 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:32:14,992-Speed 2629.36 samples/sec Loss 5.1037 LearningRate 0.0144 Epoch: 12 Global Step: 514460 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:32:18,889-Speed 2628.41 samples/sec Loss 5.1740 LearningRate 0.0144 Epoch: 12 Global Step: 514470 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:32:22,761-Speed 2644.71 samples/sec Loss 5.0165 LearningRate 0.0144 Epoch: 12 Global Step: 514480 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:32:26,661-Speed 2626.55 samples/sec Loss 5.0448 LearningRate 0.0144 Epoch: 12 Global Step: 514490 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:32:30,559-Speed 2627.32 samples/sec Loss 5.0268 LearningRate 0.0144 Epoch: 12 Global Step: 514500 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:32:34,461-Speed 2624.94 samples/sec Loss 5.1733 LearningRate 0.0144 Epoch: 12 Global Step: 514510 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:32:38,359-Speed 2627.65 samples/sec Loss 5.1006 LearningRate 0.0144 Epoch: 12 Global Step: 514520 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:32:42,267-Speed 2621.56 samples/sec Loss 5.0921 LearningRate 0.0144 Epoch: 12 Global Step: 514530 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:32:46,166-Speed 2627.24 samples/sec Loss 4.9295 LearningRate 0.0144 Epoch: 12 Global Step: 514540 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:32:50,082-Speed 2615.42 samples/sec Loss 5.0618 LearningRate 0.0144 Epoch: 12 Global Step: 514550 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:32:53,983-Speed 2625.38 samples/sec Loss 5.1600 LearningRate 0.0144 Epoch: 12 Global Step: 514560 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:32:57,891-Speed 2620.77 samples/sec Loss 5.1425 LearningRate 0.0144 Epoch: 12 Global Step: 514570 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:33:01,793-Speed 2625.23 samples/sec Loss 5.0572 LearningRate 0.0144 Epoch: 12 Global Step: 514580 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:33:05,672-Speed 2640.66 samples/sec Loss 5.0653 LearningRate 0.0144 Epoch: 12 Global Step: 514590 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:33:09,577-Speed 2622.69 samples/sec Loss 5.0869 LearningRate 0.0144 Epoch: 12 Global Step: 514600 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:33:13,456-Speed 2640.38 samples/sec Loss 5.1467 LearningRate 0.0144 Epoch: 12 Global Step: 514610 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:33:17,355-Speed 2626.74 samples/sec Loss 5.0649 LearningRate 0.0144 Epoch: 12 Global Step: 514620 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:33:21,261-Speed 2623.02 samples/sec Loss 5.1150 LearningRate 0.0144 Epoch: 12 Global Step: 514630 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:33:25,157-Speed 2628.64 samples/sec Loss 5.1874 LearningRate 0.0144 Epoch: 12 Global Step: 514640 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:33:29,053-Speed 2628.67 samples/sec Loss 5.0654 LearningRate 0.0144 Epoch: 12 Global Step: 514650 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:33:33,032-Speed 2573.92 samples/sec Loss 5.0884 LearningRate 0.0144 Epoch: 12 Global Step: 514660 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:33:36,930-Speed 2628.23 samples/sec Loss 5.1233 LearningRate 0.0144 Epoch: 12 Global Step: 514670 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:33:40,829-Speed 2626.24 samples/sec Loss 5.0178 LearningRate 0.0144 Epoch: 12 Global Step: 514680 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:33:44,735-Speed 2622.97 samples/sec Loss 4.9983 LearningRate 0.0144 Epoch: 12 Global Step: 514690 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:33:48,643-Speed 2620.22 samples/sec Loss 5.0432 LearningRate 0.0144 Epoch: 12 Global Step: 514700 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:33:52,538-Speed 2631.02 samples/sec Loss 5.0833 LearningRate 0.0144 Epoch: 12 Global Step: 514710 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:33:56,432-Speed 2630.12 samples/sec Loss 5.1512 LearningRate 0.0144 Epoch: 12 Global Step: 514720 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:34:00,330-Speed 2627.64 samples/sec Loss 5.0331 LearningRate 0.0144 Epoch: 12 Global Step: 514730 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:34:04,238-Speed 2620.66 samples/sec Loss 5.1550 LearningRate 0.0144 Epoch: 12 Global Step: 514740 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:34:08,138-Speed 2626.83 samples/sec Loss 5.1543 LearningRate 0.0144 Epoch: 12 Global Step: 514750 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:34:12,035-Speed 2628.44 samples/sec Loss 5.0990 LearningRate 0.0144 Epoch: 12 Global Step: 514760 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:34:15,934-Speed 2626.67 samples/sec Loss 5.1354 LearningRate 0.0144 Epoch: 12 Global Step: 514770 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:34:19,836-Speed 2625.13 samples/sec Loss 5.1826 LearningRate 0.0144 Epoch: 12 Global Step: 514780 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:34:23,746-Speed 2619.19 samples/sec Loss 5.1565 LearningRate 0.0144 Epoch: 12 Global Step: 514790 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:34:27,651-Speed 2622.97 samples/sec Loss 4.9358 LearningRate 0.0144 Epoch: 12 Global Step: 514800 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:34:31,551-Speed 2626.60 samples/sec Loss 5.1209 LearningRate 0.0144 Epoch: 12 Global Step: 514810 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:34:35,446-Speed 2629.67 samples/sec Loss 5.0518 LearningRate 0.0144 Epoch: 12 Global Step: 514820 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:34:39,342-Speed 2628.62 samples/sec Loss 5.0834 LearningRate 0.0144 Epoch: 12 Global Step: 514830 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:34:43,243-Speed 2625.79 samples/sec Loss 5.1016 LearningRate 0.0144 Epoch: 12 Global Step: 514840 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:34:47,148-Speed 2623.39 samples/sec Loss 5.0484 LearningRate 0.0144 Epoch: 12 Global Step: 514850 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:34:51,054-Speed 2621.65 samples/sec Loss 5.1495 LearningRate 0.0144 Epoch: 12 Global Step: 514860 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:34:54,970-Speed 2618.92 samples/sec Loss 5.0783 LearningRate 0.0144 Epoch: 12 Global Step: 514870 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:34:58,869-Speed 2626.97 samples/sec Loss 5.0728 LearningRate 0.0144 Epoch: 12 Global Step: 514880 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:35:02,778-Speed 2619.85 samples/sec Loss 5.0773 LearningRate 0.0144 Epoch: 12 Global Step: 514890 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:35:06,672-Speed 2630.14 samples/sec Loss 5.1861 LearningRate 0.0144 Epoch: 12 Global Step: 514900 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:35:10,552-Speed 2640.10 samples/sec Loss 5.0749 LearningRate 0.0144 Epoch: 12 Global Step: 514910 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:35:14,450-Speed 2627.65 samples/sec Loss 5.0921 LearningRate 0.0144 Epoch: 12 Global Step: 514920 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:35:18,346-Speed 2629.36 samples/sec Loss 4.9843 LearningRate 0.0144 Epoch: 12 Global Step: 514930 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:35:22,242-Speed 2628.78 samples/sec Loss 5.1471 LearningRate 0.0144 Epoch: 12 Global Step: 514940 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:35:26,140-Speed 2627.70 samples/sec Loss 5.0557 LearningRate 0.0144 Epoch: 12 Global Step: 514950 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:35:30,041-Speed 2625.55 samples/sec Loss 5.1962 LearningRate 0.0144 Epoch: 12 Global Step: 514960 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:35:33,928-Speed 2635.01 samples/sec Loss 5.0968 LearningRate 0.0144 Epoch: 12 Global Step: 514970 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:35:37,836-Speed 2620.67 samples/sec Loss 5.1454 LearningRate 0.0144 Epoch: 12 Global Step: 514980 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:35:41,736-Speed 2625.69 samples/sec Loss 5.1466 LearningRate 0.0144 Epoch: 12 Global Step: 514990 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:35:45,624-Speed 2634.81 samples/sec Loss 5.0428 LearningRate 0.0144 Epoch: 12 Global Step: 515000 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:35:49,524-Speed 2626.38 samples/sec Loss 5.0757 LearningRate 0.0144 Epoch: 12 Global Step: 515010 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:35:53,437-Speed 2617.62 samples/sec Loss 5.2200 LearningRate 0.0144 Epoch: 12 Global Step: 515020 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:35:57,349-Speed 2618.52 samples/sec Loss 5.0356 LearningRate 0.0144 Epoch: 12 Global Step: 515030 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:36:01,247-Speed 2627.84 samples/sec Loss 5.0143 LearningRate 0.0144 Epoch: 12 Global Step: 515040 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:36:05,147-Speed 2625.98 samples/sec Loss 5.1874 LearningRate 0.0144 Epoch: 12 Global Step: 515050 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:36:09,052-Speed 2622.86 samples/sec Loss 5.1000 LearningRate 0.0144 Epoch: 12 Global Step: 515060 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:36:12,948-Speed 2628.59 samples/sec Loss 5.1293 LearningRate 0.0144 Epoch: 12 Global Step: 515070 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:36:16,842-Speed 2630.53 samples/sec Loss 5.0291 LearningRate 0.0144 Epoch: 12 Global Step: 515080 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:36:20,741-Speed 2626.45 samples/sec Loss 5.1234 LearningRate 0.0144 Epoch: 12 Global Step: 515090 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:36:24,646-Speed 2623.46 samples/sec Loss 5.2065 LearningRate 0.0144 Epoch: 12 Global Step: 515100 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:36:28,552-Speed 2622.65 samples/sec Loss 5.0289 LearningRate 0.0144 Epoch: 12 Global Step: 515110 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:36:32,457-Speed 2622.90 samples/sec Loss 5.1002 LearningRate 0.0144 Epoch: 12 Global Step: 515120 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:36:36,365-Speed 2620.56 samples/sec Loss 5.0457 LearningRate 0.0144 Epoch: 12 Global Step: 515130 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:36:40,261-Speed 2628.87 samples/sec Loss 5.0532 LearningRate 0.0144 Epoch: 12 Global Step: 515140 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:36:44,159-Speed 2627.47 samples/sec Loss 5.1044 LearningRate 0.0144 Epoch: 12 Global Step: 515150 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:36:48,060-Speed 2625.58 samples/sec Loss 5.0598 LearningRate 0.0144 Epoch: 12 Global Step: 515160 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:36:51,959-Speed 2630.83 samples/sec Loss 5.0337 LearningRate 0.0144 Epoch: 12 Global Step: 515170 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:36:55,857-Speed 2627.78 samples/sec Loss 5.1139 LearningRate 0.0144 Epoch: 12 Global Step: 515180 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:36:59,756-Speed 2626.80 samples/sec Loss 5.0196 LearningRate 0.0144 Epoch: 12 Global Step: 515190 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:37:03,658-Speed 2625.19 samples/sec Loss 5.1677 LearningRate 0.0144 Epoch: 12 Global Step: 515200 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:37:07,569-Speed 2618.67 samples/sec Loss 5.0904 LearningRate 0.0144 Epoch: 12 Global Step: 515210 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:37:11,450-Speed 2639.28 samples/sec Loss 4.9786 LearningRate 0.0144 Epoch: 12 Global Step: 515220 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:37:15,354-Speed 2623.65 samples/sec Loss 5.1090 LearningRate 0.0144 Epoch: 12 Global Step: 515230 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:37:19,262-Speed 2620.77 samples/sec Loss 5.0966 LearningRate 0.0144 Epoch: 12 Global Step: 515240 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:37:23,159-Speed 2628.52 samples/sec Loss 5.0826 LearningRate 0.0144 Epoch: 12 Global Step: 515250 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:37:27,051-Speed 2631.58 samples/sec Loss 5.0602 LearningRate 0.0144 Epoch: 12 Global Step: 515260 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:37:30,954-Speed 2623.91 samples/sec Loss 5.0570 LearningRate 0.0144 Epoch: 12 Global Step: 515270 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:37:34,878-Speed 2609.92 samples/sec Loss 4.9964 LearningRate 0.0144 Epoch: 12 Global Step: 515280 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:37:38,776-Speed 2628.08 samples/sec Loss 5.1931 LearningRate 0.0144 Epoch: 12 Global Step: 515290 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:37:42,673-Speed 2627.92 samples/sec Loss 4.9929 LearningRate 0.0144 Epoch: 12 Global Step: 515300 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:37:46,571-Speed 2628.35 samples/sec Loss 5.0594 LearningRate 0.0144 Epoch: 12 Global Step: 515310 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:37:50,465-Speed 2629.55 samples/sec Loss 5.1221 LearningRate 0.0144 Epoch: 12 Global Step: 515320 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:37:54,340-Speed 2643.35 samples/sec Loss 5.0537 LearningRate 0.0143 Epoch: 12 Global Step: 515330 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:37:58,238-Speed 2627.69 samples/sec Loss 5.0341 LearningRate 0.0143 Epoch: 12 Global Step: 515340 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:38:02,150-Speed 2618.05 samples/sec Loss 5.1207 LearningRate 0.0143 Epoch: 12 Global Step: 515350 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:38:06,062-Speed 2618.43 samples/sec Loss 5.0627 LearningRate 0.0143 Epoch: 12 Global Step: 515360 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:38:09,957-Speed 2629.41 samples/sec Loss 5.0534 LearningRate 0.0143 Epoch: 12 Global Step: 515370 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:38:13,851-Speed 2630.56 samples/sec Loss 5.0948 LearningRate 0.0143 Epoch: 12 Global Step: 515380 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:38:17,749-Speed 2627.87 samples/sec Loss 5.0669 LearningRate 0.0143 Epoch: 12 Global Step: 515390 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:38:21,647-Speed 2627.61 samples/sec Loss 5.0974 LearningRate 0.0143 Epoch: 12 Global Step: 515400 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:38:25,543-Speed 2628.86 samples/sec Loss 4.9970 LearningRate 0.0143 Epoch: 12 Global Step: 515410 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:38:29,452-Speed 2620.08 samples/sec Loss 5.0738 LearningRate 0.0143 Epoch: 12 Global Step: 515420 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:38:33,347-Speed 2629.66 samples/sec Loss 5.1258 LearningRate 0.0143 Epoch: 12 Global Step: 515430 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:38:37,250-Speed 2624.30 samples/sec Loss 5.0417 LearningRate 0.0143 Epoch: 12 Global Step: 515440 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:38:41,157-Speed 2621.09 samples/sec Loss 5.0350 LearningRate 0.0143 Epoch: 12 Global Step: 515450 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:38:45,057-Speed 2626.81 samples/sec Loss 5.1659 LearningRate 0.0143 Epoch: 12 Global Step: 515460 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:38:48,954-Speed 2627.92 samples/sec Loss 5.0340 LearningRate 0.0143 Epoch: 12 Global Step: 515470 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:38:52,854-Speed 2626.45 samples/sec Loss 5.0365 LearningRate 0.0143 Epoch: 12 Global Step: 515480 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:38:56,728-Speed 2644.32 samples/sec Loss 5.0623 LearningRate 0.0143 Epoch: 12 Global Step: 515490 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:39:00,623-Speed 2629.50 samples/sec Loss 5.1370 LearningRate 0.0143 Epoch: 12 Global Step: 515500 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:39:04,519-Speed 2628.87 samples/sec Loss 5.1215 LearningRate 0.0143 Epoch: 12 Global Step: 515510 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:39:08,415-Speed 2629.16 samples/sec Loss 5.0330 LearningRate 0.0143 Epoch: 12 Global Step: 515520 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:39:12,313-Speed 2627.45 samples/sec Loss 5.1373 LearningRate 0.0143 Epoch: 12 Global Step: 515530 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:39:16,224-Speed 2619.03 samples/sec Loss 5.1149 LearningRate 0.0143 Epoch: 12 Global Step: 515540 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:39:20,121-Speed 2627.97 samples/sec Loss 5.0451 LearningRate 0.0143 Epoch: 12 Global Step: 515550 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:39:24,018-Speed 2628.01 samples/sec Loss 5.0632 LearningRate 0.0143 Epoch: 12 Global Step: 515560 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:39:27,916-Speed 2627.49 samples/sec Loss 5.1433 LearningRate 0.0143 Epoch: 12 Global Step: 515570 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:39:31,814-Speed 2627.94 samples/sec Loss 5.1168 LearningRate 0.0143 Epoch: 12 Global Step: 515580 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:39:35,707-Speed 2631.49 samples/sec Loss 5.0272 LearningRate 0.0143 Epoch: 12 Global Step: 515590 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:39:39,599-Speed 2631.16 samples/sec Loss 5.2078 LearningRate 0.0143 Epoch: 12 Global Step: 515600 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:39:43,497-Speed 2627.72 samples/sec Loss 5.0747 LearningRate 0.0143 Epoch: 12 Global Step: 515610 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:39:47,409-Speed 2618.67 samples/sec Loss 5.0670 LearningRate 0.0143 Epoch: 12 Global Step: 515620 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:39:51,306-Speed 2627.98 samples/sec Loss 5.1508 LearningRate 0.0143 Epoch: 12 Global Step: 515630 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:39:55,200-Speed 2630.31 samples/sec Loss 5.1745 LearningRate 0.0143 Epoch: 12 Global Step: 515640 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:39:59,084-Speed 2636.91 samples/sec Loss 5.1670 LearningRate 0.0143 Epoch: 12 Global Step: 515650 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:40:02,976-Speed 2631.46 samples/sec Loss 5.1272 LearningRate 0.0143 Epoch: 12 Global Step: 515660 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:40:06,871-Speed 2629.68 samples/sec Loss 5.0198 LearningRate 0.0143 Epoch: 12 Global Step: 515670 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:40:10,769-Speed 2627.89 samples/sec Loss 5.1785 LearningRate 0.0143 Epoch: 12 Global Step: 515680 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:40:14,665-Speed 2628.84 samples/sec Loss 4.9395 LearningRate 0.0143 Epoch: 12 Global Step: 515690 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:40:18,570-Speed 2622.93 samples/sec Loss 5.1031 LearningRate 0.0143 Epoch: 12 Global Step: 515700 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:40:22,472-Speed 2625.42 samples/sec Loss 5.0242 LearningRate 0.0143 Epoch: 12 Global Step: 515710 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:40:26,370-Speed 2627.21 samples/sec Loss 5.1053 LearningRate 0.0143 Epoch: 12 Global Step: 515720 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:40:30,265-Speed 2629.22 samples/sec Loss 5.0669 LearningRate 0.0143 Epoch: 12 Global Step: 515730 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:40:34,163-Speed 2627.53 samples/sec Loss 5.0610 LearningRate 0.0143 Epoch: 12 Global Step: 515740 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:40:38,057-Speed 2630.25 samples/sec Loss 5.1048 LearningRate 0.0143 Epoch: 12 Global Step: 515750 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:40:41,952-Speed 2629.49 samples/sec Loss 5.1167 LearningRate 0.0143 Epoch: 12 Global Step: 515760 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:40:45,852-Speed 2626.69 samples/sec Loss 5.0418 LearningRate 0.0143 Epoch: 12 Global Step: 515770 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:40:49,731-Speed 2640.54 samples/sec Loss 5.0438 LearningRate 0.0143 Epoch: 12 Global Step: 515780 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:40:53,633-Speed 2625.27 samples/sec Loss 5.1603 LearningRate 0.0143 Epoch: 12 Global Step: 515790 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:40:57,530-Speed 2628.33 samples/sec Loss 4.9919 LearningRate 0.0143 Epoch: 12 Global Step: 515800 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:41:01,423-Speed 2630.37 samples/sec Loss 4.9890 LearningRate 0.0143 Epoch: 12 Global Step: 515810 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:41:05,326-Speed 2623.99 samples/sec Loss 5.0404 LearningRate 0.0143 Epoch: 12 Global Step: 515820 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:41:09,223-Speed 2628.30 samples/sec Loss 5.0922 LearningRate 0.0143 Epoch: 12 Global Step: 515830 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:41:13,129-Speed 2622.59 samples/sec Loss 5.1362 LearningRate 0.0143 Epoch: 12 Global Step: 515840 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:41:17,028-Speed 2626.96 samples/sec Loss 5.0603 LearningRate 0.0143 Epoch: 12 Global Step: 515850 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:41:20,930-Speed 2625.08 samples/sec Loss 5.0087 LearningRate 0.0143 Epoch: 12 Global Step: 515860 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:41:24,827-Speed 2627.76 samples/sec Loss 5.1100 LearningRate 0.0143 Epoch: 12 Global Step: 515870 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:41:28,729-Speed 2625.60 samples/sec Loss 5.1334 LearningRate 0.0143 Epoch: 12 Global Step: 515880 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:41:32,629-Speed 2626.19 samples/sec Loss 5.0008 LearningRate 0.0143 Epoch: 12 Global Step: 515890 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:41:36,506-Speed 2641.84 samples/sec Loss 5.1164 LearningRate 0.0143 Epoch: 12 Global Step: 515900 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:41:40,404-Speed 2627.36 samples/sec Loss 5.0106 LearningRate 0.0143 Epoch: 12 Global Step: 515910 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:41:44,305-Speed 2625.52 samples/sec Loss 5.1360 LearningRate 0.0143 Epoch: 12 Global Step: 515920 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:41:48,225-Speed 2612.97 samples/sec Loss 5.0296 LearningRate 0.0143 Epoch: 12 Global Step: 515930 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:41:52,185-Speed 2586.61 samples/sec Loss 5.1131 LearningRate 0.0143 Epoch: 12 Global Step: 515940 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:41:56,091-Speed 2622.41 samples/sec Loss 5.0359 LearningRate 0.0143 Epoch: 12 Global Step: 515950 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:42:00,082-Speed 2566.25 samples/sec Loss 5.0505 LearningRate 0.0143 Epoch: 12 Global Step: 515960 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:42:03,983-Speed 2625.73 samples/sec Loss 5.0278 LearningRate 0.0143 Epoch: 12 Global Step: 515970 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:42:07,877-Speed 2629.96 samples/sec Loss 5.0464 LearningRate 0.0143 Epoch: 12 Global Step: 515980 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:42:11,782-Speed 2623.57 samples/sec Loss 5.0316 LearningRate 0.0143 Epoch: 12 Global Step: 515990 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:42:15,683-Speed 2625.35 samples/sec Loss 5.0250 LearningRate 0.0143 Epoch: 12 Global Step: 516000 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:42:19,580-Speed 2628.46 samples/sec Loss 5.1506 LearningRate 0.0143 Epoch: 12 Global Step: 516010 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:42:23,476-Speed 2628.31 samples/sec Loss 5.0325 LearningRate 0.0143 Epoch: 12 Global Step: 516020 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:42:27,358-Speed 2638.79 samples/sec Loss 5.0748 LearningRate 0.0143 Epoch: 12 Global Step: 516030 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:42:31,255-Speed 2628.44 samples/sec Loss 5.0528 LearningRate 0.0143 Epoch: 12 Global Step: 516040 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:42:35,163-Speed 2620.22 samples/sec Loss 5.0597 LearningRate 0.0143 Epoch: 12 Global Step: 516050 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:42:39,066-Speed 2624.64 samples/sec Loss 5.1078 LearningRate 0.0143 Epoch: 12 Global Step: 516060 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:42:42,967-Speed 2625.96 samples/sec Loss 5.1552 LearningRate 0.0143 Epoch: 12 Global Step: 516070 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:42:46,874-Speed 2621.30 samples/sec Loss 5.1056 LearningRate 0.0143 Epoch: 12 Global Step: 516080 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:42:50,913-Speed 2535.86 samples/sec Loss 5.1319 LearningRate 0.0143 Epoch: 12 Global Step: 516090 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:42:54,834-Speed 2612.30 samples/sec Loss 5.0540 LearningRate 0.0143 Epoch: 12 Global Step: 516100 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:42:58,729-Speed 2629.91 samples/sec Loss 4.9941 LearningRate 0.0143 Epoch: 12 Global Step: 516110 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:43:02,633-Speed 2623.55 samples/sec Loss 5.1500 LearningRate 0.0143 Epoch: 12 Global Step: 516120 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:43:06,508-Speed 2642.67 samples/sec Loss 4.9917 LearningRate 0.0143 Epoch: 12 Global Step: 516130 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:43:10,406-Speed 2627.36 samples/sec Loss 5.0487 LearningRate 0.0143 Epoch: 12 Global Step: 516140 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:43:14,305-Speed 2627.36 samples/sec Loss 5.1185 LearningRate 0.0143 Epoch: 12 Global Step: 516150 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:43:18,215-Speed 2620.14 samples/sec Loss 5.0502 LearningRate 0.0143 Epoch: 12 Global Step: 516160 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:43:22,113-Speed 2627.33 samples/sec Loss 5.1233 LearningRate 0.0143 Epoch: 12 Global Step: 516170 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:43:26,010-Speed 2628.32 samples/sec Loss 5.1123 LearningRate 0.0143 Epoch: 12 Global Step: 516180 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:43:29,913-Speed 2624.31 samples/sec Loss 5.1487 LearningRate 0.0143 Epoch: 12 Global Step: 516190 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:43:33,806-Speed 2631.05 samples/sec Loss 5.0620 LearningRate 0.0143 Epoch: 12 Global Step: 516200 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:43:37,701-Speed 2629.40 samples/sec Loss 4.9923 LearningRate 0.0143 Epoch: 12 Global Step: 516210 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:43:41,596-Speed 2629.40 samples/sec Loss 5.0722 LearningRate 0.0143 Epoch: 12 Global Step: 516220 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:43:45,497-Speed 2625.43 samples/sec Loss 5.1521 LearningRate 0.0143 Epoch: 12 Global Step: 516230 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:43:49,400-Speed 2624.44 samples/sec Loss 5.0304 LearningRate 0.0143 Epoch: 12 Global Step: 516240 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:43:53,301-Speed 2625.99 samples/sec Loss 5.1001 LearningRate 0.0143 Epoch: 12 Global Step: 516250 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:43:57,196-Speed 2629.69 samples/sec Loss 5.0297 LearningRate 0.0143 Epoch: 12 Global Step: 516260 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:44:01,095-Speed 2627.07 samples/sec Loss 5.1096 LearningRate 0.0143 Epoch: 12 Global Step: 516270 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:44:04,994-Speed 2626.53 samples/sec Loss 5.0476 LearningRate 0.0143 Epoch: 12 Global Step: 516280 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:44:08,893-Speed 2626.90 samples/sec Loss 5.1229 LearningRate 0.0143 Epoch: 12 Global Step: 516290 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:44:12,804-Speed 2619.30 samples/sec Loss 5.0398 LearningRate 0.0143 Epoch: 12 Global Step: 516300 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:44:16,710-Speed 2621.83 samples/sec Loss 5.0143 LearningRate 0.0143 Epoch: 12 Global Step: 516310 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:44:20,612-Speed 2624.54 samples/sec Loss 5.0329 LearningRate 0.0143 Epoch: 12 Global Step: 516320 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:44:24,510-Speed 2627.57 samples/sec Loss 5.1192 LearningRate 0.0143 Epoch: 12 Global Step: 516330 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:44:28,399-Speed 2633.83 samples/sec Loss 5.1541 LearningRate 0.0143 Epoch: 12 Global Step: 516340 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:44:32,295-Speed 2629.22 samples/sec Loss 4.9671 LearningRate 0.0143 Epoch: 12 Global Step: 516350 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:44:36,194-Speed 2626.97 samples/sec Loss 5.0962 LearningRate 0.0143 Epoch: 12 Global Step: 516360 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:44:40,087-Speed 2631.01 samples/sec Loss 5.0356 LearningRate 0.0143 Epoch: 12 Global Step: 516370 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:44:43,986-Speed 2626.67 samples/sec Loss 4.9916 LearningRate 0.0143 Epoch: 12 Global Step: 516380 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:44:47,882-Speed 2628.62 samples/sec Loss 5.1054 LearningRate 0.0143 Epoch: 12 Global Step: 516390 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:44:51,778-Speed 2629.52 samples/sec Loss 5.0590 LearningRate 0.0143 Epoch: 12 Global Step: 516400 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:44:55,681-Speed 2624.16 samples/sec Loss 5.0045 LearningRate 0.0143 Epoch: 12 Global Step: 516410 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:44:59,576-Speed 2629.43 samples/sec Loss 5.0765 LearningRate 0.0143 Epoch: 12 Global Step: 516420 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:45:03,472-Speed 2628.79 samples/sec Loss 5.1226 LearningRate 0.0142 Epoch: 12 Global Step: 516430 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:45:07,364-Speed 2631.39 samples/sec Loss 5.0895 LearningRate 0.0142 Epoch: 12 Global Step: 516440 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:45:11,262-Speed 2627.86 samples/sec Loss 5.1363 LearningRate 0.0142 Epoch: 12 Global Step: 516450 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:45:15,161-Speed 2627.21 samples/sec Loss 5.1416 LearningRate 0.0142 Epoch: 12 Global Step: 516460 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:45:19,075-Speed 2616.84 samples/sec Loss 5.1153 LearningRate 0.0142 Epoch: 12 Global Step: 516470 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:45:22,976-Speed 2625.60 samples/sec Loss 5.0173 LearningRate 0.0142 Epoch: 12 Global Step: 516480 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:45:26,850-Speed 2643.78 samples/sec Loss 5.0105 LearningRate 0.0142 Epoch: 12 Global Step: 516490 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:45:30,731-Speed 2639.29 samples/sec Loss 5.1270 LearningRate 0.0142 Epoch: 12 Global Step: 516500 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:45:34,642-Speed 2618.22 samples/sec Loss 5.1943 LearningRate 0.0142 Epoch: 12 Global Step: 516510 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:45:38,541-Speed 2627.23 samples/sec Loss 5.1085 LearningRate 0.0142 Epoch: 12 Global Step: 516520 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:45:42,438-Speed 2628.31 samples/sec Loss 5.0851 LearningRate 0.0142 Epoch: 12 Global Step: 516530 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:45:46,336-Speed 2627.49 samples/sec Loss 5.0145 LearningRate 0.0142 Epoch: 12 Global Step: 516540 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:45:50,237-Speed 2626.10 samples/sec Loss 4.9979 LearningRate 0.0142 Epoch: 12 Global Step: 516550 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:45:54,131-Speed 2630.72 samples/sec Loss 5.0245 LearningRate 0.0142 Epoch: 12 Global Step: 516560 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:45:58,027-Speed 2628.31 samples/sec Loss 5.1499 LearningRate 0.0142 Epoch: 12 Global Step: 516570 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:46:01,921-Speed 2630.25 samples/sec Loss 5.0337 LearningRate 0.0142 Epoch: 12 Global Step: 516580 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:46:05,816-Speed 2629.66 samples/sec Loss 4.9040 LearningRate 0.0142 Epoch: 12 Global Step: 516590 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:46:09,709-Speed 2630.64 samples/sec Loss 4.9651 LearningRate 0.0142 Epoch: 12 Global Step: 516600 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:46:13,615-Speed 2622.36 samples/sec Loss 4.9987 LearningRate 0.0142 Epoch: 12 Global Step: 516610 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:46:17,512-Speed 2628.23 samples/sec Loss 5.0679 LearningRate 0.0142 Epoch: 12 Global Step: 516620 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:46:21,409-Speed 2628.07 samples/sec Loss 5.1061 LearningRate 0.0142 Epoch: 12 Global Step: 516630 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:46:25,306-Speed 2628.67 samples/sec Loss 5.0102 LearningRate 0.0142 Epoch: 12 Global Step: 516640 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:46:29,204-Speed 2627.19 samples/sec Loss 5.1458 LearningRate 0.0142 Epoch: 12 Global Step: 516650 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:46:33,097-Speed 2631.34 samples/sec Loss 4.9375 LearningRate 0.0142 Epoch: 12 Global Step: 516660 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:46:36,991-Speed 2630.16 samples/sec Loss 5.0421 LearningRate 0.0142 Epoch: 12 Global Step: 516670 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:46:40,889-Speed 2627.62 samples/sec Loss 5.1295 LearningRate 0.0142 Epoch: 12 Global Step: 516680 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:46:44,792-Speed 2624.74 samples/sec Loss 5.1237 LearningRate 0.0142 Epoch: 12 Global Step: 516690 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:46:48,652-Speed 2653.01 samples/sec Loss 5.0478 LearningRate 0.0142 Epoch: 12 Global Step: 516700 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:46:52,545-Speed 2631.39 samples/sec Loss 4.9905 LearningRate 0.0142 Epoch: 12 Global Step: 516710 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:46:56,439-Speed 2629.88 samples/sec Loss 5.1160 LearningRate 0.0142 Epoch: 12 Global Step: 516720 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:47:00,334-Speed 2629.69 samples/sec Loss 5.1196 LearningRate 0.0142 Epoch: 12 Global Step: 516730 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:47:04,241-Speed 2621.19 samples/sec Loss 5.0584 LearningRate 0.0142 Epoch: 12 Global Step: 516740 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:47:08,148-Speed 2622.19 samples/sec Loss 4.9514 LearningRate 0.0142 Epoch: 12 Global Step: 516750 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:47:12,053-Speed 2623.02 samples/sec Loss 5.1432 LearningRate 0.0142 Epoch: 12 Global Step: 516760 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:47:15,949-Speed 2628.49 samples/sec Loss 5.0667 LearningRate 0.0142 Epoch: 12 Global Step: 516770 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:47:19,854-Speed 2623.13 samples/sec Loss 5.1232 LearningRate 0.0142 Epoch: 12 Global Step: 516780 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:47:23,747-Speed 2631.08 samples/sec Loss 5.0703 LearningRate 0.0142 Epoch: 12 Global Step: 516790 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:47:27,642-Speed 2629.79 samples/sec Loss 5.0713 LearningRate 0.0142 Epoch: 12 Global Step: 516800 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:47:31,539-Speed 2628.28 samples/sec Loss 5.1640 LearningRate 0.0142 Epoch: 12 Global Step: 516810 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:47:35,435-Speed 2628.24 samples/sec Loss 4.9982 LearningRate 0.0142 Epoch: 12 Global Step: 516820 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:47:39,336-Speed 2626.02 samples/sec Loss 5.1181 LearningRate 0.0142 Epoch: 12 Global Step: 516830 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:47:43,232-Speed 2629.50 samples/sec Loss 5.0118 LearningRate 0.0142 Epoch: 12 Global Step: 516840 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:47:47,205-Speed 2577.46 samples/sec Loss 5.0287 LearningRate 0.0142 Epoch: 12 Global Step: 516850 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:47:51,286-Speed 2510.09 samples/sec Loss 5.1018 LearningRate 0.0142 Epoch: 12 Global Step: 516860 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:47:55,275-Speed 2567.67 samples/sec Loss 5.0365 LearningRate 0.0142 Epoch: 12 Global Step: 516870 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:47:59,171-Speed 2629.21 samples/sec Loss 5.0311 LearningRate 0.0142 Epoch: 12 Global Step: 516880 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:48:03,095-Speed 2609.65 samples/sec Loss 5.0806 LearningRate 0.0142 Epoch: 12 Global Step: 516890 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:48:06,989-Speed 2630.28 samples/sec Loss 5.0748 LearningRate 0.0142 Epoch: 12 Global Step: 516900 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:48:10,885-Speed 2628.58 samples/sec Loss 5.0898 LearningRate 0.0142 Epoch: 12 Global Step: 516910 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:48:14,780-Speed 2630.72 samples/sec Loss 4.9548 LearningRate 0.0142 Epoch: 12 Global Step: 516920 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:48:18,677-Speed 2628.44 samples/sec Loss 5.0419 LearningRate 0.0142 Epoch: 12 Global Step: 516930 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:48:22,547-Speed 2646.71 samples/sec Loss 4.9911 LearningRate 0.0142 Epoch: 12 Global Step: 516940 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:48:26,445-Speed 2627.75 samples/sec Loss 5.0578 LearningRate 0.0142 Epoch: 12 Global Step: 516950 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:48:30,340-Speed 2629.65 samples/sec Loss 5.0814 LearningRate 0.0142 Epoch: 12 Global Step: 516960 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:48:34,246-Speed 2621.99 samples/sec Loss 5.1334 LearningRate 0.0142 Epoch: 12 Global Step: 516970 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:48:38,133-Speed 2634.57 samples/sec Loss 5.0452 LearningRate 0.0142 Epoch: 12 Global Step: 516980 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:48:42,046-Speed 2618.08 samples/sec Loss 5.0773 LearningRate 0.0142 Epoch: 12 Global Step: 516990 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:48:45,948-Speed 2624.76 samples/sec Loss 5.0238 LearningRate 0.0142 Epoch: 12 Global Step: 517000 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:48:49,847-Speed 2627.21 samples/sec Loss 5.0210 LearningRate 0.0142 Epoch: 12 Global Step: 517010 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:48:53,749-Speed 2625.19 samples/sec Loss 5.2308 LearningRate 0.0142 Epoch: 12 Global Step: 517020 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:48:57,646-Speed 2628.34 samples/sec Loss 5.0670 LearningRate 0.0142 Epoch: 12 Global Step: 517030 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:49:01,545-Speed 2626.68 samples/sec Loss 5.0454 LearningRate 0.0142 Epoch: 12 Global Step: 517040 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:49:05,443-Speed 2627.52 samples/sec Loss 4.9468 LearningRate 0.0142 Epoch: 12 Global Step: 517050 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:49:09,339-Speed 2629.22 samples/sec Loss 4.9360 LearningRate 0.0142 Epoch: 12 Global Step: 517060 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:49:13,235-Speed 2629.76 samples/sec Loss 5.0643 LearningRate 0.0142 Epoch: 12 Global Step: 517070 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 05:49:17,138-Speed 2624.15 samples/sec Loss 5.0721 LearningRate 0.0142 Epoch: 12 Global Step: 517080 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:49:21,037-Speed 2627.32 samples/sec Loss 5.0901 LearningRate 0.0142 Epoch: 12 Global Step: 517090 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:49:24,956-Speed 2612.96 samples/sec Loss 5.0208 LearningRate 0.0142 Epoch: 12 Global Step: 517100 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:49:28,850-Speed 2631.33 samples/sec Loss 5.0687 LearningRate 0.0142 Epoch: 12 Global Step: 517110 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:49:32,743-Speed 2630.60 samples/sec Loss 5.0817 LearningRate 0.0142 Epoch: 12 Global Step: 517120 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:49:36,660-Speed 2615.73 samples/sec Loss 5.1076 LearningRate 0.0142 Epoch: 12 Global Step: 517130 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:49:40,556-Speed 2629.10 samples/sec Loss 5.0083 LearningRate 0.0142 Epoch: 12 Global Step: 517140 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:49:44,458-Speed 2625.24 samples/sec Loss 4.9198 LearningRate 0.0142 Epoch: 12 Global Step: 517150 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:49:48,359-Speed 2625.52 samples/sec Loss 5.1837 LearningRate 0.0142 Epoch: 12 Global Step: 517160 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:49:52,259-Speed 2626.33 samples/sec Loss 5.1356 LearningRate 0.0142 Epoch: 12 Global Step: 517170 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:49:56,153-Speed 2630.47 samples/sec Loss 5.1792 LearningRate 0.0142 Epoch: 12 Global Step: 517180 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:50:00,054-Speed 2625.53 samples/sec Loss 4.9828 LearningRate 0.0142 Epoch: 12 Global Step: 517190 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:50:03,947-Speed 2630.97 samples/sec Loss 5.0208 LearningRate 0.0142 Epoch: 12 Global Step: 517200 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:50:07,862-Speed 2615.87 samples/sec Loss 5.1352 LearningRate 0.0142 Epoch: 12 Global Step: 517210 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:50:11,769-Speed 2621.99 samples/sec Loss 4.9089 LearningRate 0.0142 Epoch: 12 Global Step: 517220 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:50:15,667-Speed 2627.50 samples/sec Loss 5.1388 LearningRate 0.0142 Epoch: 12 Global Step: 517230 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:50:19,570-Speed 2624.39 samples/sec Loss 4.9678 LearningRate 0.0142 Epoch: 12 Global Step: 517240 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:50:23,473-Speed 2625.06 samples/sec Loss 4.9945 LearningRate 0.0142 Epoch: 12 Global Step: 517250 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:50:27,374-Speed 2625.34 samples/sec Loss 5.0467 LearningRate 0.0142 Epoch: 12 Global Step: 517260 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:50:31,272-Speed 2627.59 samples/sec Loss 5.0433 LearningRate 0.0142 Epoch: 12 Global Step: 517270 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:50:35,172-Speed 2625.78 samples/sec Loss 5.0429 LearningRate 0.0142 Epoch: 12 Global Step: 517280 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:50:39,065-Speed 2631.45 samples/sec Loss 5.0778 LearningRate 0.0142 Epoch: 12 Global Step: 517290 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:50:42,941-Speed 2642.19 samples/sec Loss 5.1003 LearningRate 0.0142 Epoch: 12 Global Step: 517300 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:50:46,849-Speed 2621.47 samples/sec Loss 4.9965 LearningRate 0.0142 Epoch: 12 Global Step: 517310 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:50:50,740-Speed 2632.03 samples/sec Loss 4.9889 LearningRate 0.0142 Epoch: 12 Global Step: 517320 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:50:54,633-Speed 2631.52 samples/sec Loss 4.9996 LearningRate 0.0142 Epoch: 12 Global Step: 517330 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:50:58,563-Speed 2605.87 samples/sec Loss 5.0669 LearningRate 0.0142 Epoch: 12 Global Step: 517340 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:51:02,461-Speed 2627.59 samples/sec Loss 5.0887 LearningRate 0.0142 Epoch: 12 Global Step: 517350 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:51:06,355-Speed 2630.59 samples/sec Loss 5.0519 LearningRate 0.0142 Epoch: 12 Global Step: 517360 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:51:10,252-Speed 2627.74 samples/sec Loss 5.1512 LearningRate 0.0142 Epoch: 12 Global Step: 517370 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:51:14,154-Speed 2625.86 samples/sec Loss 4.9768 LearningRate 0.0142 Epoch: 12 Global Step: 517380 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:51:18,057-Speed 2624.43 samples/sec Loss 5.0781 LearningRate 0.0142 Epoch: 12 Global Step: 517390 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:51:21,960-Speed 2623.85 samples/sec Loss 4.9879 LearningRate 0.0142 Epoch: 12 Global Step: 517400 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:51:25,856-Speed 2629.61 samples/sec Loss 5.1995 LearningRate 0.0142 Epoch: 12 Global Step: 517410 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:51:29,749-Speed 2631.09 samples/sec Loss 5.0385 LearningRate 0.0142 Epoch: 12 Global Step: 517420 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:51:33,644-Speed 2629.10 samples/sec Loss 5.0578 LearningRate 0.0142 Epoch: 12 Global Step: 517430 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:51:37,566-Speed 2611.98 samples/sec Loss 5.0167 LearningRate 0.0142 Epoch: 12 Global Step: 517440 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:51:41,462-Speed 2629.27 samples/sec Loss 5.0951 LearningRate 0.0142 Epoch: 12 Global Step: 517450 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:51:45,377-Speed 2616.59 samples/sec Loss 5.0807 LearningRate 0.0142 Epoch: 12 Global Step: 517460 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:51:49,273-Speed 2629.14 samples/sec Loss 5.0034 LearningRate 0.0142 Epoch: 12 Global Step: 517470 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:51:53,173-Speed 2626.57 samples/sec Loss 5.1087 LearningRate 0.0142 Epoch: 12 Global Step: 517480 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:51:57,073-Speed 2626.24 samples/sec Loss 5.1232 LearningRate 0.0142 Epoch: 12 Global Step: 517490 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:52:00,965-Speed 2631.51 samples/sec Loss 5.0403 LearningRate 0.0142 Epoch: 12 Global Step: 517500 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:52:04,864-Speed 2626.64 samples/sec Loss 5.1676 LearningRate 0.0142 Epoch: 12 Global Step: 517510 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:52:08,761-Speed 2628.39 samples/sec Loss 4.9780 LearningRate 0.0142 Epoch: 12 Global Step: 517520 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:52:12,662-Speed 2625.54 samples/sec Loss 5.0988 LearningRate 0.0141 Epoch: 12 Global Step: 517530 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:52:16,618-Speed 2590.08 samples/sec Loss 5.0797 LearningRate 0.0141 Epoch: 12 Global Step: 517540 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:52:20,517-Speed 2626.95 samples/sec Loss 5.0720 LearningRate 0.0141 Epoch: 12 Global Step: 517550 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:52:24,440-Speed 2611.18 samples/sec Loss 5.0970 LearningRate 0.0141 Epoch: 12 Global Step: 517560 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:52:28,322-Speed 2638.30 samples/sec Loss 4.9994 LearningRate 0.0141 Epoch: 12 Global Step: 517570 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:52:32,233-Speed 2618.98 samples/sec Loss 5.0142 LearningRate 0.0141 Epoch: 12 Global Step: 517580 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:52:36,136-Speed 2623.88 samples/sec Loss 5.0905 LearningRate 0.0141 Epoch: 12 Global Step: 517590 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:52:40,039-Speed 2626.56 samples/sec Loss 5.0627 LearningRate 0.0141 Epoch: 12 Global Step: 517600 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:52:43,937-Speed 2628.37 samples/sec Loss 5.0452 LearningRate 0.0141 Epoch: 12 Global Step: 517610 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:52:47,834-Speed 2628.10 samples/sec Loss 5.0427 LearningRate 0.0141 Epoch: 12 Global Step: 517620 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:52:51,734-Speed 2626.57 samples/sec Loss 5.1064 LearningRate 0.0141 Epoch: 12 Global Step: 517630 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:52:55,632-Speed 2627.28 samples/sec Loss 4.9619 LearningRate 0.0141 Epoch: 12 Global Step: 517640 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:52:59,561-Speed 2606.82 samples/sec Loss 5.0709 LearningRate 0.0141 Epoch: 12 Global Step: 517650 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:53:03,465-Speed 2624.28 samples/sec Loss 5.0421 LearningRate 0.0141 Epoch: 12 Global Step: 517660 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:53:07,345-Speed 2640.79 samples/sec Loss 4.9998 LearningRate 0.0141 Epoch: 12 Global Step: 517670 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:53:11,244-Speed 2626.81 samples/sec Loss 5.0108 LearningRate 0.0141 Epoch: 12 Global Step: 517680 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:53:15,293-Speed 2529.93 samples/sec Loss 5.0650 LearningRate 0.0141 Epoch: 12 Global Step: 517690 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:53:19,196-Speed 2624.85 samples/sec Loss 4.9562 LearningRate 0.0141 Epoch: 12 Global Step: 517700 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:53:23,094-Speed 2627.77 samples/sec Loss 5.0685 LearningRate 0.0141 Epoch: 12 Global Step: 517710 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:53:26,992-Speed 2627.71 samples/sec Loss 5.0695 LearningRate 0.0141 Epoch: 12 Global Step: 517720 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:53:30,888-Speed 2628.94 samples/sec Loss 5.0437 LearningRate 0.0141 Epoch: 12 Global Step: 517730 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:53:34,785-Speed 2628.26 samples/sec Loss 5.0949 LearningRate 0.0141 Epoch: 12 Global Step: 517740 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:53:38,681-Speed 2629.22 samples/sec Loss 4.9772 LearningRate 0.0141 Epoch: 12 Global Step: 517750 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:53:42,579-Speed 2627.56 samples/sec Loss 5.0153 LearningRate 0.0141 Epoch: 12 Global Step: 517760 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:53:46,481-Speed 2624.76 samples/sec Loss 5.0427 LearningRate 0.0141 Epoch: 12 Global Step: 517770 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:53:50,413-Speed 2605.76 samples/sec Loss 5.0494 LearningRate 0.0141 Epoch: 12 Global Step: 517780 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:53:54,313-Speed 2626.03 samples/sec Loss 5.1536 LearningRate 0.0141 Epoch: 12 Global Step: 517790 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:53:58,231-Speed 2614.53 samples/sec Loss 4.9795 LearningRate 0.0141 Epoch: 12 Global Step: 517800 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:54:02,157-Speed 2608.69 samples/sec Loss 5.0474 LearningRate 0.0141 Epoch: 12 Global Step: 517810 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:54:06,062-Speed 2623.05 samples/sec Loss 5.0243 LearningRate 0.0141 Epoch: 12 Global Step: 517820 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:54:09,979-Speed 2614.89 samples/sec Loss 5.0178 LearningRate 0.0141 Epoch: 12 Global Step: 517830 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:54:13,885-Speed 2622.12 samples/sec Loss 5.0014 LearningRate 0.0141 Epoch: 12 Global Step: 517840 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:54:17,790-Speed 2622.95 samples/sec Loss 5.1414 LearningRate 0.0141 Epoch: 12 Global Step: 517850 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:54:21,695-Speed 2623.26 samples/sec Loss 5.1410 LearningRate 0.0141 Epoch: 12 Global Step: 517860 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:54:25,600-Speed 2623.22 samples/sec Loss 4.9969 LearningRate 0.0141 Epoch: 12 Global Step: 517870 Fp16 Grad Scale: 262144 Required: 35 hours
Training: 2022-04-15 05:54:29,467-Speed 2648.61 samples/sec Loss 5.0477 LearningRate 0.0141 Epoch: 12 Global Step: 517880 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:54:33,370-Speed 2624.21 samples/sec Loss 5.0014 LearningRate 0.0141 Epoch: 12 Global Step: 517890 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:54:37,273-Speed 2623.92 samples/sec Loss 5.0958 LearningRate 0.0141 Epoch: 12 Global Step: 517900 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:54:41,193-Speed 2612.66 samples/sec Loss 5.0633 LearningRate 0.0141 Epoch: 12 Global Step: 517910 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:54:45,093-Speed 2626.49 samples/sec Loss 4.9420 LearningRate 0.0141 Epoch: 12 Global Step: 517920 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:54:48,991-Speed 2628.49 samples/sec Loss 4.9790 LearningRate 0.0141 Epoch: 12 Global Step: 517930 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:54:52,888-Speed 2627.90 samples/sec Loss 5.0204 LearningRate 0.0141 Epoch: 12 Global Step: 517940 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:54:56,790-Speed 2625.49 samples/sec Loss 5.0206 LearningRate 0.0141 Epoch: 12 Global Step: 517950 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:55:00,695-Speed 2622.93 samples/sec Loss 5.0069 LearningRate 0.0141 Epoch: 12 Global Step: 517960 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:55:04,597-Speed 2624.52 samples/sec Loss 5.1378 LearningRate 0.0141 Epoch: 12 Global Step: 517970 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:55:08,495-Speed 2627.72 samples/sec Loss 5.0153 LearningRate 0.0141 Epoch: 12 Global Step: 517980 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:55:12,390-Speed 2629.39 samples/sec Loss 5.0282 LearningRate 0.0141 Epoch: 12 Global Step: 517990 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:55:16,271-Speed 2638.79 samples/sec Loss 5.0231 LearningRate 0.0141 Epoch: 12 Global Step: 518000 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:55:20,173-Speed 2625.64 samples/sec Loss 5.0053 LearningRate 0.0141 Epoch: 12 Global Step: 518010 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:55:24,074-Speed 2625.64 samples/sec Loss 4.9912 LearningRate 0.0141 Epoch: 12 Global Step: 518020 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:55:27,975-Speed 2625.88 samples/sec Loss 5.0007 LearningRate 0.0141 Epoch: 12 Global Step: 518030 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:55:31,891-Speed 2615.06 samples/sec Loss 4.9931 LearningRate 0.0141 Epoch: 12 Global Step: 518040 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:55:35,795-Speed 2623.87 samples/sec Loss 4.9595 LearningRate 0.0141 Epoch: 12 Global Step: 518050 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:55:39,721-Speed 2609.20 samples/sec Loss 5.0655 LearningRate 0.0141 Epoch: 12 Global Step: 518060 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:55:43,621-Speed 2626.05 samples/sec Loss 5.1157 LearningRate 0.0141 Epoch: 12 Global Step: 518070 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:55:47,532-Speed 2618.54 samples/sec Loss 4.9893 LearningRate 0.0141 Epoch: 12 Global Step: 518080 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:55:51,431-Speed 2627.61 samples/sec Loss 5.0870 LearningRate 0.0141 Epoch: 12 Global Step: 518090 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:55:55,335-Speed 2623.40 samples/sec Loss 5.0481 LearningRate 0.0141 Epoch: 12 Global Step: 518100 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:55:59,235-Speed 2627.01 samples/sec Loss 5.0365 LearningRate 0.0141 Epoch: 12 Global Step: 518110 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:56:03,141-Speed 2621.51 samples/sec Loss 4.9507 LearningRate 0.0141 Epoch: 12 Global Step: 518120 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:56:07,048-Speed 2621.88 samples/sec Loss 4.9725 LearningRate 0.0141 Epoch: 12 Global Step: 518130 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:56:10,949-Speed 2625.47 samples/sec Loss 5.0582 LearningRate 0.0141 Epoch: 12 Global Step: 518140 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:56:14,848-Speed 2627.14 samples/sec Loss 5.1404 LearningRate 0.0141 Epoch: 12 Global Step: 518150 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:56:18,754-Speed 2622.39 samples/sec Loss 5.0666 LearningRate 0.0141 Epoch: 12 Global Step: 518160 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:56:22,665-Speed 2618.35 samples/sec Loss 5.0614 LearningRate 0.0141 Epoch: 12 Global Step: 518170 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:56:26,565-Speed 2626.61 samples/sec Loss 5.0488 LearningRate 0.0141 Epoch: 12 Global Step: 518180 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:56:30,466-Speed 2626.17 samples/sec Loss 5.0282 LearningRate 0.0141 Epoch: 12 Global Step: 518190 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:56:34,332-Speed 2649.60 samples/sec Loss 4.9776 LearningRate 0.0141 Epoch: 12 Global Step: 518200 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:56:38,234-Speed 2624.65 samples/sec Loss 4.9796 LearningRate 0.0141 Epoch: 12 Global Step: 518210 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:56:42,130-Speed 2629.21 samples/sec Loss 4.9847 LearningRate 0.0141 Epoch: 12 Global Step: 518220 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:56:46,036-Speed 2622.55 samples/sec Loss 4.9714 LearningRate 0.0141 Epoch: 12 Global Step: 518230 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:56:49,943-Speed 2621.20 samples/sec Loss 5.0989 LearningRate 0.0141 Epoch: 12 Global Step: 518240 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:56:53,852-Speed 2620.20 samples/sec Loss 5.0640 LearningRate 0.0141 Epoch: 12 Global Step: 518250 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:56:57,752-Speed 2626.32 samples/sec Loss 4.9857 LearningRate 0.0141 Epoch: 12 Global Step: 518260 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:57:01,647-Speed 2629.65 samples/sec Loss 5.1069 LearningRate 0.0141 Epoch: 12 Global Step: 518270 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:57:05,541-Speed 2630.08 samples/sec Loss 4.9377 LearningRate 0.0141 Epoch: 12 Global Step: 518280 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:57:09,440-Speed 2627.21 samples/sec Loss 5.0384 LearningRate 0.0141 Epoch: 12 Global Step: 518290 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:57:13,338-Speed 2627.59 samples/sec Loss 5.0212 LearningRate 0.0141 Epoch: 12 Global Step: 518300 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:57:17,241-Speed 2624.21 samples/sec Loss 5.0396 LearningRate 0.0141 Epoch: 12 Global Step: 518310 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:57:21,138-Speed 2628.50 samples/sec Loss 5.0664 LearningRate 0.0141 Epoch: 12 Global Step: 518320 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:57:25,042-Speed 2623.33 samples/sec Loss 4.9779 LearningRate 0.0141 Epoch: 12 Global Step: 518330 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:57:28,943-Speed 2625.69 samples/sec Loss 4.9845 LearningRate 0.0141 Epoch: 12 Global Step: 518340 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:57:32,843-Speed 2626.19 samples/sec Loss 5.0081 LearningRate 0.0141 Epoch: 12 Global Step: 518350 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:57:36,756-Speed 2617.80 samples/sec Loss 5.0793 LearningRate 0.0141 Epoch: 12 Global Step: 518360 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:57:40,659-Speed 2624.31 samples/sec Loss 5.0383 LearningRate 0.0141 Epoch: 12 Global Step: 518370 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:57:44,571-Speed 2618.00 samples/sec Loss 5.1608 LearningRate 0.0141 Epoch: 12 Global Step: 518380 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:57:48,447-Speed 2642.68 samples/sec Loss 4.9505 LearningRate 0.0141 Epoch: 12 Global Step: 518390 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:57:52,343-Speed 2629.35 samples/sec Loss 5.0802 LearningRate 0.0141 Epoch: 12 Global Step: 518400 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:57:56,241-Speed 2627.29 samples/sec Loss 5.1001 LearningRate 0.0141 Epoch: 12 Global Step: 518410 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:58:00,137-Speed 2629.37 samples/sec Loss 5.0800 LearningRate 0.0141 Epoch: 12 Global Step: 518420 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:58:04,044-Speed 2620.96 samples/sec Loss 5.1575 LearningRate 0.0141 Epoch: 12 Global Step: 518430 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:58:07,945-Speed 2626.15 samples/sec Loss 4.9678 LearningRate 0.0141 Epoch: 12 Global Step: 518440 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:58:11,839-Speed 2630.12 samples/sec Loss 5.0054 LearningRate 0.0141 Epoch: 12 Global Step: 518450 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:58:15,736-Speed 2628.78 samples/sec Loss 5.0516 LearningRate 0.0141 Epoch: 12 Global Step: 518460 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:58:19,636-Speed 2626.24 samples/sec Loss 5.0411 LearningRate 0.0141 Epoch: 12 Global Step: 518470 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:58:23,534-Speed 2627.29 samples/sec Loss 4.9729 LearningRate 0.0141 Epoch: 12 Global Step: 518480 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:58:27,441-Speed 2621.94 samples/sec Loss 5.1118 LearningRate 0.0141 Epoch: 12 Global Step: 518490 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:58:31,386-Speed 2596.53 samples/sec Loss 5.0786 LearningRate 0.0141 Epoch: 12 Global Step: 518500 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:58:35,285-Speed 2627.15 samples/sec Loss 5.0000 LearningRate 0.0141 Epoch: 12 Global Step: 518510 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:58:39,219-Speed 2603.03 samples/sec Loss 5.0660 LearningRate 0.0141 Epoch: 12 Global Step: 518520 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:58:43,128-Speed 2620.66 samples/sec Loss 5.0437 LearningRate 0.0141 Epoch: 12 Global Step: 518530 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:58:47,025-Speed 2628.36 samples/sec Loss 5.0625 LearningRate 0.0141 Epoch: 12 Global Step: 518540 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:58:50,911-Speed 2635.93 samples/sec Loss 5.0230 LearningRate 0.0141 Epoch: 12 Global Step: 518550 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:58:54,803-Speed 2631.26 samples/sec Loss 4.9747 LearningRate 0.0141 Epoch: 12 Global Step: 518560 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:58:58,703-Speed 2627.03 samples/sec Loss 5.0871 LearningRate 0.0141 Epoch: 12 Global Step: 518570 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:59:02,601-Speed 2627.26 samples/sec Loss 4.9472 LearningRate 0.0141 Epoch: 12 Global Step: 518580 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:59:06,496-Speed 2629.86 samples/sec Loss 5.0287 LearningRate 0.0141 Epoch: 12 Global Step: 518590 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:59:10,414-Speed 2613.86 samples/sec Loss 5.0868 LearningRate 0.0141 Epoch: 12 Global Step: 518600 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:59:14,311-Speed 2628.45 samples/sec Loss 5.0119 LearningRate 0.0141 Epoch: 12 Global Step: 518610 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:59:18,203-Speed 2631.48 samples/sec Loss 5.1205 LearningRate 0.0141 Epoch: 12 Global Step: 518620 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:59:22,159-Speed 2589.86 samples/sec Loss 5.0716 LearningRate 0.0140 Epoch: 12 Global Step: 518630 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:59:26,056-Speed 2628.40 samples/sec Loss 4.8936 LearningRate 0.0140 Epoch: 12 Global Step: 518640 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 05:59:29,974-Speed 2614.59 samples/sec Loss 4.9968 LearningRate 0.0140 Epoch: 12 Global Step: 518650 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:59:33,876-Speed 2624.46 samples/sec Loss 5.0958 LearningRate 0.0140 Epoch: 12 Global Step: 518660 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:59:37,873-Speed 2562.62 samples/sec Loss 4.9900 LearningRate 0.0140 Epoch: 12 Global Step: 518670 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:59:41,894-Speed 2547.26 samples/sec Loss 4.9079 LearningRate 0.0140 Epoch: 12 Global Step: 518680 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:59:45,789-Speed 2630.30 samples/sec Loss 5.0255 LearningRate 0.0140 Epoch: 12 Global Step: 518690 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:59:49,689-Speed 2626.16 samples/sec Loss 4.9557 LearningRate 0.0140 Epoch: 12 Global Step: 518700 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:59:53,602-Speed 2617.81 samples/sec Loss 5.0451 LearningRate 0.0140 Epoch: 12 Global Step: 518710 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 05:59:57,502-Speed 2626.33 samples/sec Loss 5.0480 LearningRate 0.0140 Epoch: 12 Global Step: 518720 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:00:01,403-Speed 2626.13 samples/sec Loss 4.9959 LearningRate 0.0140 Epoch: 12 Global Step: 518730 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:00:05,297-Speed 2629.79 samples/sec Loss 5.0654 LearningRate 0.0140 Epoch: 12 Global Step: 518740 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:00:09,182-Speed 2635.93 samples/sec Loss 5.0163 LearningRate 0.0140 Epoch: 12 Global Step: 518750 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:00:13,085-Speed 2624.77 samples/sec Loss 4.9921 LearningRate 0.0140 Epoch: 12 Global Step: 518760 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:00:17,030-Speed 2596.31 samples/sec Loss 5.0121 LearningRate 0.0140 Epoch: 12 Global Step: 518770 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:00:20,942-Speed 2618.60 samples/sec Loss 5.0379 LearningRate 0.0140 Epoch: 12 Global Step: 518780 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:00:24,840-Speed 2627.17 samples/sec Loss 5.0445 LearningRate 0.0140 Epoch: 12 Global Step: 518790 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:00:28,719-Speed 2640.84 samples/sec Loss 4.9436 LearningRate 0.0140 Epoch: 12 Global Step: 518800 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:00:32,614-Speed 2629.69 samples/sec Loss 5.0406 LearningRate 0.0140 Epoch: 12 Global Step: 518810 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:00:36,561-Speed 2595.07 samples/sec Loss 4.9750 LearningRate 0.0140 Epoch: 12 Global Step: 518820 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:00:40,457-Speed 2629.02 samples/sec Loss 5.1454 LearningRate 0.0140 Epoch: 12 Global Step: 518830 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:00:44,352-Speed 2629.59 samples/sec Loss 5.0585 LearningRate 0.0140 Epoch: 12 Global Step: 518840 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:00:48,257-Speed 2622.79 samples/sec Loss 4.9423 LearningRate 0.0140 Epoch: 12 Global Step: 518850 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:00:52,171-Speed 2617.19 samples/sec Loss 5.1069 LearningRate 0.0140 Epoch: 12 Global Step: 518860 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:00:56,068-Speed 2627.91 samples/sec Loss 5.0729 LearningRate 0.0140 Epoch: 12 Global Step: 518870 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:00:59,989-Speed 2613.11 samples/sec Loss 5.0941 LearningRate 0.0140 Epoch: 12 Global Step: 518880 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:01:03,995-Speed 2556.32 samples/sec Loss 5.0914 LearningRate 0.0140 Epoch: 12 Global Step: 518890 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:01:07,989-Speed 2564.52 samples/sec Loss 4.9329 LearningRate 0.0140 Epoch: 12 Global Step: 518900 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:01:11,895-Speed 2622.18 samples/sec Loss 5.0869 LearningRate 0.0140 Epoch: 12 Global Step: 518910 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:01:15,900-Speed 2557.44 samples/sec Loss 4.9508 LearningRate 0.0140 Epoch: 12 Global Step: 518920 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:01:19,802-Speed 2625.33 samples/sec Loss 4.9437 LearningRate 0.0140 Epoch: 12 Global Step: 518930 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:01:23,721-Speed 2613.86 samples/sec Loss 4.9865 LearningRate 0.0140 Epoch: 12 Global Step: 518940 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:01:27,618-Speed 2628.45 samples/sec Loss 5.0700 LearningRate 0.0140 Epoch: 12 Global Step: 518950 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:01:31,583-Speed 2583.08 samples/sec Loss 5.1003 LearningRate 0.0140 Epoch: 12 Global Step: 518960 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:01:35,481-Speed 2627.72 samples/sec Loss 4.9746 LearningRate 0.0140 Epoch: 12 Global Step: 518970 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:01:39,396-Speed 2615.87 samples/sec Loss 4.9659 LearningRate 0.0140 Epoch: 12 Global Step: 518980 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:01:43,324-Speed 2608.25 samples/sec Loss 5.0373 LearningRate 0.0140 Epoch: 12 Global Step: 518990 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:01:47,223-Speed 2626.89 samples/sec Loss 5.1391 LearningRate 0.0140 Epoch: 12 Global Step: 519000 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:01:51,120-Speed 2628.16 samples/sec Loss 5.1477 LearningRate 0.0140 Epoch: 12 Global Step: 519010 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:01:55,050-Speed 2606.32 samples/sec Loss 5.0214 LearningRate 0.0140 Epoch: 12 Global Step: 519020 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:01:58,951-Speed 2626.09 samples/sec Loss 5.1124 LearningRate 0.0140 Epoch: 12 Global Step: 519030 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:02:02,876-Speed 2609.40 samples/sec Loss 4.9385 LearningRate 0.0140 Epoch: 12 Global Step: 519040 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:02:06,778-Speed 2624.55 samples/sec Loss 5.0152 LearningRate 0.0140 Epoch: 12 Global Step: 519050 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:02:10,682-Speed 2623.88 samples/sec Loss 4.9666 LearningRate 0.0140 Epoch: 12 Global Step: 519060 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:02:14,625-Speed 2597.96 samples/sec Loss 4.9952 LearningRate 0.0140 Epoch: 12 Global Step: 519070 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:02:18,550-Speed 2609.66 samples/sec Loss 4.9444 LearningRate 0.0140 Epoch: 12 Global Step: 519080 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:02:22,435-Speed 2636.32 samples/sec Loss 4.9736 LearningRate 0.0140 Epoch: 12 Global Step: 519090 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:02:26,341-Speed 2621.86 samples/sec Loss 5.0484 LearningRate 0.0140 Epoch: 12 Global Step: 519100 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:02:30,244-Speed 2624.68 samples/sec Loss 5.0216 LearningRate 0.0140 Epoch: 12 Global Step: 519110 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:02:34,149-Speed 2622.67 samples/sec Loss 5.1967 LearningRate 0.0140 Epoch: 12 Global Step: 519120 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:02:38,055-Speed 2622.05 samples/sec Loss 5.0478 LearningRate 0.0140 Epoch: 12 Global Step: 519130 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:02:41,978-Speed 2610.93 samples/sec Loss 4.9684 LearningRate 0.0140 Epoch: 12 Global Step: 519140 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:02:45,873-Speed 2630.03 samples/sec Loss 5.0792 LearningRate 0.0140 Epoch: 12 Global Step: 519150 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:02:49,770-Speed 2628.38 samples/sec Loss 5.0865 LearningRate 0.0140 Epoch: 12 Global Step: 519160 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:02:53,666-Speed 2629.48 samples/sec Loss 5.0763 LearningRate 0.0140 Epoch: 12 Global Step: 519170 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:02:57,564-Speed 2627.45 samples/sec Loss 5.0089 LearningRate 0.0140 Epoch: 12 Global Step: 519180 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:03:01,438-Speed 2643.65 samples/sec Loss 4.9789 LearningRate 0.0140 Epoch: 12 Global Step: 519190 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:03:05,343-Speed 2622.23 samples/sec Loss 4.9577 LearningRate 0.0140 Epoch: 12 Global Step: 519200 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:03:09,253-Speed 2619.87 samples/sec Loss 4.9626 LearningRate 0.0140 Epoch: 12 Global Step: 519210 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:03:13,130-Speed 2642.14 samples/sec Loss 5.0054 LearningRate 0.0140 Epoch: 12 Global Step: 519220 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:03:17,034-Speed 2623.07 samples/sec Loss 5.0689 LearningRate 0.0140 Epoch: 12 Global Step: 519230 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:03:20,936-Speed 2625.65 samples/sec Loss 5.0363 LearningRate 0.0140 Epoch: 12 Global Step: 519240 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:03:24,838-Speed 2624.71 samples/sec Loss 5.0430 LearningRate 0.0140 Epoch: 12 Global Step: 519250 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:03:28,744-Speed 2622.59 samples/sec Loss 4.9474 LearningRate 0.0140 Epoch: 12 Global Step: 519260 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:03:32,649-Speed 2622.74 samples/sec Loss 5.0884 LearningRate 0.0140 Epoch: 12 Global Step: 519270 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:03:36,550-Speed 2625.29 samples/sec Loss 4.9605 LearningRate 0.0140 Epoch: 12 Global Step: 519280 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:03:40,448-Speed 2627.41 samples/sec Loss 4.9722 LearningRate 0.0140 Epoch: 12 Global Step: 519290 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:03:44,347-Speed 2627.50 samples/sec Loss 4.9347 LearningRate 0.0140 Epoch: 12 Global Step: 519300 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:03:48,247-Speed 2626.13 samples/sec Loss 5.0822 LearningRate 0.0140 Epoch: 12 Global Step: 519310 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:03:52,151-Speed 2623.86 samples/sec Loss 5.0108 LearningRate 0.0140 Epoch: 12 Global Step: 519320 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:03:56,049-Speed 2627.73 samples/sec Loss 5.0759 LearningRate 0.0140 Epoch: 12 Global Step: 519330 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:03:59,950-Speed 2626.73 samples/sec Loss 5.0415 LearningRate 0.0140 Epoch: 12 Global Step: 519340 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:04:03,860-Speed 2618.93 samples/sec Loss 5.0189 LearningRate 0.0140 Epoch: 12 Global Step: 519350 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:04:07,781-Speed 2612.65 samples/sec Loss 5.0446 LearningRate 0.0140 Epoch: 12 Global Step: 519360 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:04:11,681-Speed 2625.98 samples/sec Loss 5.0063 LearningRate 0.0140 Epoch: 12 Global Step: 519370 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:04:15,582-Speed 2626.25 samples/sec Loss 4.9969 LearningRate 0.0140 Epoch: 12 Global Step: 519380 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:04:19,485-Speed 2624.08 samples/sec Loss 4.9966 LearningRate 0.0140 Epoch: 12 Global Step: 519390 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:04:23,403-Speed 2613.78 samples/sec Loss 4.9537 LearningRate 0.0140 Epoch: 12 Global Step: 519400 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:04:27,302-Speed 2627.88 samples/sec Loss 5.0182 LearningRate 0.0140 Epoch: 12 Global Step: 519410 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:04:31,213-Speed 2618.87 samples/sec Loss 5.0844 LearningRate 0.0140 Epoch: 12 Global Step: 519420 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:04:35,118-Speed 2622.76 samples/sec Loss 5.0175 LearningRate 0.0140 Epoch: 12 Global Step: 519430 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:04:38,997-Speed 2640.62 samples/sec Loss 4.9878 LearningRate 0.0140 Epoch: 12 Global Step: 519440 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:04:42,895-Speed 2627.85 samples/sec Loss 4.9984 LearningRate 0.0140 Epoch: 12 Global Step: 519450 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:04:46,792-Speed 2628.04 samples/sec Loss 5.1019 LearningRate 0.0140 Epoch: 12 Global Step: 519460 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:04:50,680-Speed 2634.47 samples/sec Loss 5.0745 LearningRate 0.0140 Epoch: 12 Global Step: 519470 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:04:54,604-Speed 2610.39 samples/sec Loss 5.0526 LearningRate 0.0140 Epoch: 12 Global Step: 519480 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:04:58,607-Speed 2558.89 samples/sec Loss 4.9293 LearningRate 0.0140 Epoch: 12 Global Step: 519490 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:05:02,510-Speed 2624.55 samples/sec Loss 5.0312 LearningRate 0.0140 Epoch: 12 Global Step: 519500 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:05:06,413-Speed 2623.57 samples/sec Loss 5.0298 LearningRate 0.0140 Epoch: 12 Global Step: 519510 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:05:10,308-Speed 2629.90 samples/sec Loss 5.0378 LearningRate 0.0140 Epoch: 12 Global Step: 519520 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:05:14,343-Speed 2538.38 samples/sec Loss 4.9691 LearningRate 0.0140 Epoch: 12 Global Step: 519530 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:05:18,313-Speed 2580.09 samples/sec Loss 5.0452 LearningRate 0.0140 Epoch: 12 Global Step: 519540 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:05:22,207-Speed 2630.34 samples/sec Loss 5.0522 LearningRate 0.0140 Epoch: 12 Global Step: 519550 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:05:26,111-Speed 2623.36 samples/sec Loss 5.0900 LearningRate 0.0140 Epoch: 12 Global Step: 519560 Fp16 Grad Scale: 32768 Required: 35 hours
Training: 2022-04-15 06:05:30,009-Speed 2627.29 samples/sec Loss 5.0501 LearningRate 0.0140 Epoch: 12 Global Step: 519570 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:05:33,915-Speed 2622.77 samples/sec Loss 5.0767 LearningRate 0.0140 Epoch: 12 Global Step: 519580 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:05:37,843-Speed 2607.09 samples/sec Loss 5.0361 LearningRate 0.0140 Epoch: 12 Global Step: 519590 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:05:41,749-Speed 2622.52 samples/sec Loss 4.9384 LearningRate 0.0140 Epoch: 12 Global Step: 519600 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:05:45,647-Speed 2627.39 samples/sec Loss 5.1730 LearningRate 0.0140 Epoch: 12 Global Step: 519610 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:05:49,542-Speed 2630.10 samples/sec Loss 5.0547 LearningRate 0.0140 Epoch: 12 Global Step: 519620 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:05:53,439-Speed 2628.35 samples/sec Loss 4.9475 LearningRate 0.0140 Epoch: 12 Global Step: 519630 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:05:57,341-Speed 2624.69 samples/sec Loss 5.0476 LearningRate 0.0140 Epoch: 12 Global Step: 519640 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:06:01,240-Speed 2627.10 samples/sec Loss 4.9785 LearningRate 0.0140 Epoch: 12 Global Step: 519650 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:06:05,151-Speed 2618.26 samples/sec Loss 5.0661 LearningRate 0.0140 Epoch: 12 Global Step: 519660 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:06:09,052-Speed 2625.96 samples/sec Loss 5.0010 LearningRate 0.0140 Epoch: 12 Global Step: 519670 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:06:12,950-Speed 2627.82 samples/sec Loss 5.0050 LearningRate 0.0140 Epoch: 12 Global Step: 519680 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:06:16,832-Speed 2638.64 samples/sec Loss 5.0127 LearningRate 0.0140 Epoch: 12 Global Step: 519690 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:06:20,750-Speed 2614.08 samples/sec Loss 4.9523 LearningRate 0.0140 Epoch: 12 Global Step: 519700 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:06:24,678-Speed 2607.91 samples/sec Loss 5.0346 LearningRate 0.0140 Epoch: 12 Global Step: 519710 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:06:28,617-Speed 2600.67 samples/sec Loss 5.1055 LearningRate 0.0140 Epoch: 12 Global Step: 519720 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:06:32,550-Speed 2603.96 samples/sec Loss 4.9721 LearningRate 0.0140 Epoch: 12 Global Step: 519730 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:06:36,465-Speed 2616.60 samples/sec Loss 5.1207 LearningRate 0.0139 Epoch: 12 Global Step: 519740 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:06:40,363-Speed 2627.36 samples/sec Loss 4.9549 LearningRate 0.0139 Epoch: 12 Global Step: 519750 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:06:44,263-Speed 2626.89 samples/sec Loss 5.0465 LearningRate 0.0139 Epoch: 12 Global Step: 519760 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:06:48,164-Speed 2625.25 samples/sec Loss 5.0863 LearningRate 0.0139 Epoch: 12 Global Step: 519770 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:06:52,075-Speed 2619.31 samples/sec Loss 5.1000 LearningRate 0.0139 Epoch: 12 Global Step: 519780 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:06:55,977-Speed 2624.70 samples/sec Loss 5.1422 LearningRate 0.0139 Epoch: 12 Global Step: 519790 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:06:59,873-Speed 2629.87 samples/sec Loss 4.9124 LearningRate 0.0139 Epoch: 12 Global Step: 519800 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:07:03,789-Speed 2615.27 samples/sec Loss 5.1043 LearningRate 0.0139 Epoch: 12 Global Step: 519810 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:07:07,687-Speed 2628.12 samples/sec Loss 5.0649 LearningRate 0.0139 Epoch: 12 Global Step: 519820 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:07:11,585-Speed 2627.30 samples/sec Loss 4.9394 LearningRate 0.0139 Epoch: 12 Global Step: 519830 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:07:15,481-Speed 2628.80 samples/sec Loss 5.0170 LearningRate 0.0139 Epoch: 12 Global Step: 519840 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:07:19,359-Speed 2640.99 samples/sec Loss 4.9532 LearningRate 0.0139 Epoch: 12 Global Step: 519850 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:07:23,254-Speed 2630.22 samples/sec Loss 5.0432 LearningRate 0.0139 Epoch: 12 Global Step: 519860 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:07:27,164-Speed 2619.64 samples/sec Loss 5.1204 LearningRate 0.0139 Epoch: 12 Global Step: 519870 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:07:31,060-Speed 2629.60 samples/sec Loss 4.9013 LearningRate 0.0139 Epoch: 12 Global Step: 519880 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:07:34,953-Speed 2630.70 samples/sec Loss 4.9972 LearningRate 0.0139 Epoch: 12 Global Step: 519890 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:07:38,869-Speed 2615.34 samples/sec Loss 4.9773 LearningRate 0.0139 Epoch: 12 Global Step: 519900 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:07:42,764-Speed 2629.55 samples/sec Loss 4.9937 LearningRate 0.0139 Epoch: 12 Global Step: 519910 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:07:46,798-Speed 2538.76 samples/sec Loss 5.1379 LearningRate 0.0139 Epoch: 12 Global Step: 519920 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:07:50,691-Speed 2631.39 samples/sec Loss 5.0146 LearningRate 0.0139 Epoch: 12 Global Step: 519930 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:07:54,590-Speed 2626.95 samples/sec Loss 4.9790 LearningRate 0.0139 Epoch: 12 Global Step: 519940 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:07:58,486-Speed 2629.69 samples/sec Loss 5.0272 LearningRate 0.0139 Epoch: 12 Global Step: 519950 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:08:02,359-Speed 2644.66 samples/sec Loss 4.9056 LearningRate 0.0139 Epoch: 12 Global Step: 519960 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:08:06,255-Speed 2628.68 samples/sec Loss 4.9802 LearningRate 0.0139 Epoch: 12 Global Step: 519970 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:08:10,149-Speed 2630.04 samples/sec Loss 5.0199 LearningRate 0.0139 Epoch: 12 Global Step: 519980 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:08:14,046-Speed 2628.27 samples/sec Loss 5.0110 LearningRate 0.0139 Epoch: 12 Global Step: 519990 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:08:17,944-Speed 2627.93 samples/sec Loss 4.9788 LearningRate 0.0139 Epoch: 12 Global Step: 520000 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:09:01,338-[lfw][520000]XNorm: 23.355590
Training: 2022-04-15 06:09:01,339-[lfw][520000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-15 06:09:01,340-[lfw][520000]Accuracy-Highest: 0.99800
Training: 2022-04-15 06:09:51,848-[cfp_fp][520000]XNorm: 22.104677
Training: 2022-04-15 06:09:51,849-[cfp_fp][520000]Accuracy-Flip: 0.99057+-0.00457
Training: 2022-04-15 06:09:51,850-[cfp_fp][520000]Accuracy-Highest: 0.99057
Training: 2022-04-15 06:10:35,325-[agedb_30][520000]XNorm: 23.607143
Training: 2022-04-15 06:10:35,325-[agedb_30][520000]Accuracy-Flip: 0.98083+-0.00534
Training: 2022-04-15 06:10:35,326-[agedb_30][520000]Accuracy-Highest: 0.98083
Training: 2022-04-15 06:10:39,216-Speed 72.49 samples/sec Loss 4.9896 LearningRate 0.0139 Epoch: 12 Global Step: 520010 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:10:43,092-Speed 2642.37 samples/sec Loss 5.1415 LearningRate 0.0139 Epoch: 12 Global Step: 520020 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:10:46,973-Speed 2639.10 samples/sec Loss 5.0566 LearningRate 0.0139 Epoch: 12 Global Step: 520030 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:10:50,853-Speed 2639.97 samples/sec Loss 5.0630 LearningRate 0.0139 Epoch: 12 Global Step: 520040 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:10:54,736-Speed 2637.81 samples/sec Loss 5.0163 LearningRate 0.0139 Epoch: 12 Global Step: 520050 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:10:58,621-Speed 2637.24 samples/sec Loss 5.0351 LearningRate 0.0139 Epoch: 12 Global Step: 520060 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:11:02,510-Speed 2634.82 samples/sec Loss 5.0040 LearningRate 0.0139 Epoch: 12 Global Step: 520070 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:11:06,397-Speed 2634.91 samples/sec Loss 5.0106 LearningRate 0.0139 Epoch: 12 Global Step: 520080 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:11:10,414-Speed 2549.70 samples/sec Loss 5.0946 LearningRate 0.0139 Epoch: 12 Global Step: 520090 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:11:14,425-Speed 2554.58 samples/sec Loss 4.9358 LearningRate 0.0139 Epoch: 12 Global Step: 520100 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:11:18,327-Speed 2624.72 samples/sec Loss 4.9818 LearningRate 0.0139 Epoch: 12 Global Step: 520110 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:11:22,230-Speed 2624.11 samples/sec Loss 4.9241 LearningRate 0.0139 Epoch: 12 Global Step: 520120 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:11:26,127-Speed 2627.93 samples/sec Loss 5.0003 LearningRate 0.0139 Epoch: 12 Global Step: 520130 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:11:30,025-Speed 2628.86 samples/sec Loss 5.0169 LearningRate 0.0139 Epoch: 12 Global Step: 520140 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:11:33,922-Speed 2628.62 samples/sec Loss 5.0255 LearningRate 0.0139 Epoch: 12 Global Step: 520150 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:11:37,795-Speed 2644.58 samples/sec Loss 4.9261 LearningRate 0.0139 Epoch: 12 Global Step: 520160 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:11:41,698-Speed 2624.40 samples/sec Loss 5.0169 LearningRate 0.0139 Epoch: 12 Global Step: 520170 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:11:45,602-Speed 2623.47 samples/sec Loss 4.9614 LearningRate 0.0139 Epoch: 12 Global Step: 520180 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:11:49,507-Speed 2622.75 samples/sec Loss 4.9708 LearningRate 0.0139 Epoch: 12 Global Step: 520190 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:11:53,411-Speed 2623.82 samples/sec Loss 4.9410 LearningRate 0.0139 Epoch: 12 Global Step: 520200 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:11:57,316-Speed 2623.00 samples/sec Loss 4.9978 LearningRate 0.0139 Epoch: 12 Global Step: 520210 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:12:01,288-Speed 2578.61 samples/sec Loss 5.1431 LearningRate 0.0139 Epoch: 12 Global Step: 520220 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:12:05,202-Speed 2616.83 samples/sec Loss 4.9808 LearningRate 0.0139 Epoch: 12 Global Step: 520230 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:12:09,106-Speed 2624.42 samples/sec Loss 5.0454 LearningRate 0.0139 Epoch: 12 Global Step: 520240 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:12:13,051-Speed 2596.63 samples/sec Loss 5.1032 LearningRate 0.0139 Epoch: 12 Global Step: 520250 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:12:16,949-Speed 2627.67 samples/sec Loss 4.9646 LearningRate 0.0139 Epoch: 12 Global Step: 520260 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:12:20,881-Speed 2604.97 samples/sec Loss 5.0175 LearningRate 0.0139 Epoch: 12 Global Step: 520270 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:12:24,782-Speed 2625.84 samples/sec Loss 5.0293 LearningRate 0.0139 Epoch: 12 Global Step: 520280 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:12:28,678-Speed 2628.92 samples/sec Loss 5.0560 LearningRate 0.0139 Epoch: 12 Global Step: 520290 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:12:32,578-Speed 2627.12 samples/sec Loss 5.0830 LearningRate 0.0139 Epoch: 12 Global Step: 520300 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:12:36,474-Speed 2628.97 samples/sec Loss 5.0339 LearningRate 0.0139 Epoch: 12 Global Step: 520310 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:12:40,376-Speed 2625.10 samples/sec Loss 5.0285 LearningRate 0.0139 Epoch: 12 Global Step: 520320 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:12:44,277-Speed 2625.43 samples/sec Loss 5.1355 LearningRate 0.0139 Epoch: 12 Global Step: 520330 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:12:48,172-Speed 2629.82 samples/sec Loss 5.0373 LearningRate 0.0139 Epoch: 12 Global Step: 520340 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:12:52,072-Speed 2626.03 samples/sec Loss 5.0171 LearningRate 0.0139 Epoch: 12 Global Step: 520350 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:12:55,944-Speed 2645.22 samples/sec Loss 5.0399 LearningRate 0.0139 Epoch: 12 Global Step: 520360 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:12:59,861-Speed 2614.95 samples/sec Loss 5.0539 LearningRate 0.0139 Epoch: 12 Global Step: 520370 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:13:03,766-Speed 2623.10 samples/sec Loss 4.9219 LearningRate 0.0139 Epoch: 12 Global Step: 520380 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:13:07,664-Speed 2627.62 samples/sec Loss 5.0764 LearningRate 0.0139 Epoch: 12 Global Step: 520390 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:13:11,559-Speed 2629.52 samples/sec Loss 5.0360 LearningRate 0.0139 Epoch: 12 Global Step: 520400 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:13:15,459-Speed 2626.81 samples/sec Loss 4.9224 LearningRate 0.0139 Epoch: 12 Global Step: 520410 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:13:19,357-Speed 2627.52 samples/sec Loss 5.0898 LearningRate 0.0139 Epoch: 12 Global Step: 520420 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:13:23,261-Speed 2623.47 samples/sec Loss 5.0804 LearningRate 0.0139 Epoch: 12 Global Step: 520430 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:13:27,161-Speed 2626.03 samples/sec Loss 4.9889 LearningRate 0.0139 Epoch: 12 Global Step: 520440 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:13:31,077-Speed 2615.66 samples/sec Loss 4.9984 LearningRate 0.0139 Epoch: 12 Global Step: 520450 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:13:34,974-Speed 2628.54 samples/sec Loss 4.9416 LearningRate 0.0139 Epoch: 12 Global Step: 520460 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:13:38,846-Speed 2645.50 samples/sec Loss 5.0224 LearningRate 0.0139 Epoch: 12 Global Step: 520470 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:13:42,742-Speed 2628.64 samples/sec Loss 5.1201 LearningRate 0.0139 Epoch: 12 Global Step: 520480 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:13:46,641-Speed 2627.03 samples/sec Loss 4.9863 LearningRate 0.0139 Epoch: 12 Global Step: 520490 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:13:50,539-Speed 2627.83 samples/sec Loss 4.9696 LearningRate 0.0139 Epoch: 12 Global Step: 520500 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:13:54,440-Speed 2625.73 samples/sec Loss 4.8840 LearningRate 0.0139 Epoch: 12 Global Step: 520510 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:13:58,341-Speed 2626.25 samples/sec Loss 5.1308 LearningRate 0.0139 Epoch: 12 Global Step: 520520 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:14:02,262-Speed 2612.08 samples/sec Loss 4.9233 LearningRate 0.0139 Epoch: 12 Global Step: 520530 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:14:06,168-Speed 2622.32 samples/sec Loss 4.9481 LearningRate 0.0139 Epoch: 12 Global Step: 520540 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:14:10,080-Speed 2617.81 samples/sec Loss 5.0344 LearningRate 0.0139 Epoch: 12 Global Step: 520550 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:14:13,986-Speed 2622.95 samples/sec Loss 4.9592 LearningRate 0.0139 Epoch: 12 Global Step: 520560 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:14:17,892-Speed 2622.18 samples/sec Loss 4.9648 LearningRate 0.0139 Epoch: 12 Global Step: 520570 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:14:21,792-Speed 2626.15 samples/sec Loss 5.0611 LearningRate 0.0139 Epoch: 12 Global Step: 520580 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:14:25,691-Speed 2626.23 samples/sec Loss 4.9341 LearningRate 0.0139 Epoch: 12 Global Step: 520590 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:14:29,629-Speed 2601.15 samples/sec Loss 4.9702 LearningRate 0.0139 Epoch: 12 Global Step: 520600 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:14:33,529-Speed 2626.05 samples/sec Loss 4.9677 LearningRate 0.0139 Epoch: 12 Global Step: 520610 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:14:37,429-Speed 2628.39 samples/sec Loss 4.9786 LearningRate 0.0139 Epoch: 12 Global Step: 520620 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:14:41,362-Speed 2603.69 samples/sec Loss 5.0035 LearningRate 0.0139 Epoch: 12 Global Step: 520630 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:14:45,234-Speed 2645.45 samples/sec Loss 5.1219 LearningRate 0.0139 Epoch: 12 Global Step: 520640 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:14:49,135-Speed 2625.54 samples/sec Loss 4.9714 LearningRate 0.0139 Epoch: 12 Global Step: 520650 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:14:53,036-Speed 2626.12 samples/sec Loss 5.0319 LearningRate 0.0139 Epoch: 12 Global Step: 520660 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:14:56,936-Speed 2626.27 samples/sec Loss 4.9672 LearningRate 0.0139 Epoch: 12 Global Step: 520670 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:15:00,921-Speed 2569.84 samples/sec Loss 4.9464 LearningRate 0.0139 Epoch: 12 Global Step: 520680 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:15:04,847-Speed 2609.28 samples/sec Loss 4.9857 LearningRate 0.0139 Epoch: 12 Global Step: 520690 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:15:08,747-Speed 2626.96 samples/sec Loss 4.9815 LearningRate 0.0139 Epoch: 12 Global Step: 520700 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:15:12,653-Speed 2622.85 samples/sec Loss 4.9874 LearningRate 0.0139 Epoch: 12 Global Step: 520710 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:15:16,550-Speed 2628.47 samples/sec Loss 5.0115 LearningRate 0.0139 Epoch: 12 Global Step: 520720 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:15:20,446-Speed 2629.44 samples/sec Loss 4.9840 LearningRate 0.0139 Epoch: 12 Global Step: 520730 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:15:24,342-Speed 2628.65 samples/sec Loss 4.9720 LearningRate 0.0139 Epoch: 12 Global Step: 520740 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:15:28,216-Speed 2644.74 samples/sec Loss 4.9252 LearningRate 0.0139 Epoch: 12 Global Step: 520750 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:15:32,113-Speed 2627.99 samples/sec Loss 5.0201 LearningRate 0.0139 Epoch: 12 Global Step: 520760 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:15:36,012-Speed 2627.07 samples/sec Loss 5.1497 LearningRate 0.0139 Epoch: 12 Global Step: 520770 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:15:39,910-Speed 2627.39 samples/sec Loss 5.0178 LearningRate 0.0139 Epoch: 12 Global Step: 520780 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:15:43,809-Speed 2627.26 samples/sec Loss 5.0264 LearningRate 0.0139 Epoch: 12 Global Step: 520790 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:15:47,805-Speed 2563.29 samples/sec Loss 5.0287 LearningRate 0.0139 Epoch: 12 Global Step: 520800 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:15:51,717-Speed 2618.20 samples/sec Loss 5.0597 LearningRate 0.0139 Epoch: 12 Global Step: 520810 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:15:55,637-Speed 2613.74 samples/sec Loss 4.9533 LearningRate 0.0139 Epoch: 12 Global Step: 520820 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:15:59,536-Speed 2626.52 samples/sec Loss 5.0332 LearningRate 0.0139 Epoch: 12 Global Step: 520830 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:16:03,438-Speed 2624.93 samples/sec Loss 5.0167 LearningRate 0.0139 Epoch: 12 Global Step: 520840 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:16:07,345-Speed 2621.04 samples/sec Loss 4.9966 LearningRate 0.0138 Epoch: 12 Global Step: 520850 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:16:11,237-Speed 2632.23 samples/sec Loss 5.0092 LearningRate 0.0138 Epoch: 12 Global Step: 520860 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:16:15,138-Speed 2625.44 samples/sec Loss 4.9690 LearningRate 0.0138 Epoch: 12 Global Step: 520870 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:16:19,094-Speed 2589.80 samples/sec Loss 5.0372 LearningRate 0.0138 Epoch: 12 Global Step: 520880 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:16:22,988-Speed 2630.54 samples/sec Loss 4.9239 LearningRate 0.0138 Epoch: 12 Global Step: 520890 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:16:26,901-Speed 2617.50 samples/sec Loss 4.8928 LearningRate 0.0138 Epoch: 12 Global Step: 520900 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:16:30,804-Speed 2623.88 samples/sec Loss 4.9594 LearningRate 0.0138 Epoch: 12 Global Step: 520910 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:16:34,707-Speed 2624.41 samples/sec Loss 5.0305 LearningRate 0.0138 Epoch: 12 Global Step: 520920 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:16:38,605-Speed 2627.48 samples/sec Loss 4.9708 LearningRate 0.0138 Epoch: 12 Global Step: 520930 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:16:42,503-Speed 2628.21 samples/sec Loss 4.9704 LearningRate 0.0138 Epoch: 12 Global Step: 520940 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:16:46,399-Speed 2628.99 samples/sec Loss 4.9351 LearningRate 0.0138 Epoch: 12 Global Step: 520950 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:16:50,300-Speed 2625.30 samples/sec Loss 4.9711 LearningRate 0.0138 Epoch: 12 Global Step: 520960 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:16:54,198-Speed 2627.69 samples/sec Loss 4.9924 LearningRate 0.0138 Epoch: 12 Global Step: 520970 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:16:58,091-Speed 2631.30 samples/sec Loss 5.0014 LearningRate 0.0138 Epoch: 12 Global Step: 520980 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:17:01,992-Speed 2625.57 samples/sec Loss 4.9166 LearningRate 0.0138 Epoch: 12 Global Step: 520990 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:17:05,866-Speed 2643.57 samples/sec Loss 4.9928 LearningRate 0.0138 Epoch: 12 Global Step: 521000 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:17:09,797-Speed 2605.41 samples/sec Loss 5.0258 LearningRate 0.0138 Epoch: 12 Global Step: 521010 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:17:13,697-Speed 2626.41 samples/sec Loss 4.9043 LearningRate 0.0138 Epoch: 12 Global Step: 521020 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:17:17,598-Speed 2625.67 samples/sec Loss 4.9820 LearningRate 0.0138 Epoch: 12 Global Step: 521030 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:17:21,497-Speed 2627.25 samples/sec Loss 5.0478 LearningRate 0.0138 Epoch: 12 Global Step: 521040 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:17:25,395-Speed 2628.11 samples/sec Loss 4.9409 LearningRate 0.0138 Epoch: 12 Global Step: 521050 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:17:29,292-Speed 2628.20 samples/sec Loss 5.0975 LearningRate 0.0138 Epoch: 12 Global Step: 521060 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:17:33,188-Speed 2629.13 samples/sec Loss 4.9684 LearningRate 0.0138 Epoch: 12 Global Step: 521070 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:17:37,087-Speed 2626.33 samples/sec Loss 4.9988 LearningRate 0.0138 Epoch: 12 Global Step: 521080 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:17:40,984-Speed 2628.09 samples/sec Loss 4.9689 LearningRate 0.0138 Epoch: 12 Global Step: 521090 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:17:44,882-Speed 2627.83 samples/sec Loss 4.9440 LearningRate 0.0138 Epoch: 12 Global Step: 521100 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:17:48,791-Speed 2620.78 samples/sec Loss 4.9928 LearningRate 0.0138 Epoch: 12 Global Step: 521110 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:17:52,691-Speed 2625.70 samples/sec Loss 4.9345 LearningRate 0.0138 Epoch: 12 Global Step: 521120 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:17:56,592-Speed 2625.96 samples/sec Loss 5.0380 LearningRate 0.0138 Epoch: 12 Global Step: 521130 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:18:00,478-Speed 2635.85 samples/sec Loss 4.9941 LearningRate 0.0138 Epoch: 12 Global Step: 521140 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:18:04,377-Speed 2626.36 samples/sec Loss 4.9904 LearningRate 0.0138 Epoch: 12 Global Step: 521150 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:18:08,275-Speed 2627.63 samples/sec Loss 5.0102 LearningRate 0.0138 Epoch: 12 Global Step: 521160 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:18:12,174-Speed 2627.68 samples/sec Loss 5.0040 LearningRate 0.0138 Epoch: 12 Global Step: 521170 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:18:16,067-Speed 2630.93 samples/sec Loss 4.9228 LearningRate 0.0138 Epoch: 12 Global Step: 521180 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:18:19,990-Speed 2610.95 samples/sec Loss 5.1751 LearningRate 0.0138 Epoch: 12 Global Step: 521190 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:18:23,886-Speed 2629.17 samples/sec Loss 5.0011 LearningRate 0.0138 Epoch: 12 Global Step: 521200 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:18:27,782-Speed 2628.87 samples/sec Loss 4.9459 LearningRate 0.0138 Epoch: 12 Global Step: 521210 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:18:31,680-Speed 2627.54 samples/sec Loss 4.8379 LearningRate 0.0138 Epoch: 12 Global Step: 521220 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:18:35,586-Speed 2622.86 samples/sec Loss 5.0003 LearningRate 0.0138 Epoch: 12 Global Step: 521230 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:18:39,480-Speed 2629.89 samples/sec Loss 5.0183 LearningRate 0.0138 Epoch: 12 Global Step: 521240 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:18:43,385-Speed 2623.30 samples/sec Loss 4.9802 LearningRate 0.0138 Epoch: 12 Global Step: 521250 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:18:47,274-Speed 2633.92 samples/sec Loss 5.0106 LearningRate 0.0138 Epoch: 12 Global Step: 521260 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:18:51,176-Speed 2624.72 samples/sec Loss 4.9538 LearningRate 0.0138 Epoch: 12 Global Step: 521270 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:18:55,092-Speed 2615.32 samples/sec Loss 4.9377 LearningRate 0.0138 Epoch: 12 Global Step: 521280 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:18:58,996-Speed 2623.82 samples/sec Loss 4.9746 LearningRate 0.0138 Epoch: 12 Global Step: 521290 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:19:02,913-Speed 2615.36 samples/sec Loss 4.9387 LearningRate 0.0138 Epoch: 12 Global Step: 521300 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:19:06,903-Speed 2566.92 samples/sec Loss 4.9835 LearningRate 0.0138 Epoch: 12 Global Step: 521310 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:19:10,816-Speed 2617.98 samples/sec Loss 4.9005 LearningRate 0.0138 Epoch: 12 Global Step: 521320 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:19:14,718-Speed 2624.75 samples/sec Loss 4.9205 LearningRate 0.0138 Epoch: 12 Global Step: 521330 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:19:18,621-Speed 2624.21 samples/sec Loss 4.9134 LearningRate 0.0138 Epoch: 12 Global Step: 521340 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:19:22,520-Speed 2626.63 samples/sec Loss 5.0373 LearningRate 0.0138 Epoch: 12 Global Step: 521350 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:19:26,468-Speed 2594.55 samples/sec Loss 4.9085 LearningRate 0.0138 Epoch: 12 Global Step: 521360 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:19:30,360-Speed 2631.65 samples/sec Loss 4.9921 LearningRate 0.0138 Epoch: 12 Global Step: 521370 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:19:34,263-Speed 2624.86 samples/sec Loss 4.9943 LearningRate 0.0138 Epoch: 12 Global Step: 521380 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:19:38,172-Speed 2619.87 samples/sec Loss 4.9912 LearningRate 0.0138 Epoch: 12 Global Step: 521390 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:19:42,084-Speed 2618.65 samples/sec Loss 5.0068 LearningRate 0.0138 Epoch: 12 Global Step: 521400 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:19:46,026-Speed 2597.79 samples/sec Loss 4.9586 LearningRate 0.0138 Epoch: 12 Global Step: 521410 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:19:49,939-Speed 2618.22 samples/sec Loss 4.8691 LearningRate 0.0138 Epoch: 12 Global Step: 521420 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:19:53,842-Speed 2624.24 samples/sec Loss 5.0029 LearningRate 0.0138 Epoch: 12 Global Step: 521430 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:19:57,742-Speed 2626.14 samples/sec Loss 5.0262 LearningRate 0.0138 Epoch: 12 Global Step: 521440 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:20:01,652-Speed 2619.76 samples/sec Loss 5.0220 LearningRate 0.0138 Epoch: 12 Global Step: 521450 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:20:05,700-Speed 2530.25 samples/sec Loss 4.8731 LearningRate 0.0138 Epoch: 12 Global Step: 521460 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:20:09,714-Speed 2551.87 samples/sec Loss 5.0006 LearningRate 0.0138 Epoch: 12 Global Step: 521470 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:20:13,611-Speed 2628.72 samples/sec Loss 4.9582 LearningRate 0.0138 Epoch: 12 Global Step: 521480 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:20:17,509-Speed 2627.02 samples/sec Loss 4.9448 LearningRate 0.0138 Epoch: 12 Global Step: 521490 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:20:21,418-Speed 2620.46 samples/sec Loss 4.9744 LearningRate 0.0138 Epoch: 12 Global Step: 521500 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:20:25,311-Speed 2630.78 samples/sec Loss 4.9922 LearningRate 0.0138 Epoch: 12 Global Step: 521510 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:20:29,210-Speed 2627.18 samples/sec Loss 5.0168 LearningRate 0.0138 Epoch: 12 Global Step: 521520 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:20:33,090-Speed 2640.55 samples/sec Loss 5.0209 LearningRate 0.0138 Epoch: 12 Global Step: 521530 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:20:36,990-Speed 2625.87 samples/sec Loss 4.9192 LearningRate 0.0138 Epoch: 12 Global Step: 521540 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:20:40,892-Speed 2625.07 samples/sec Loss 5.0466 LearningRate 0.0138 Epoch: 12 Global Step: 521550 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:20:44,801-Speed 2621.03 samples/sec Loss 5.0060 LearningRate 0.0138 Epoch: 12 Global Step: 521560 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:20:48,707-Speed 2622.01 samples/sec Loss 5.0365 LearningRate 0.0138 Epoch: 12 Global Step: 521570 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:20:52,608-Speed 2625.83 samples/sec Loss 4.9617 LearningRate 0.0138 Epoch: 12 Global Step: 521580 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:20:56,524-Speed 2615.75 samples/sec Loss 5.0466 LearningRate 0.0138 Epoch: 12 Global Step: 521590 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:21:00,425-Speed 2625.42 samples/sec Loss 5.0559 LearningRate 0.0138 Epoch: 12 Global Step: 521600 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:21:04,343-Speed 2613.96 samples/sec Loss 4.8999 LearningRate 0.0138 Epoch: 12 Global Step: 521610 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:21:08,249-Speed 2622.56 samples/sec Loss 5.0890 LearningRate 0.0138 Epoch: 12 Global Step: 521620 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:21:12,152-Speed 2624.49 samples/sec Loss 4.9968 LearningRate 0.0138 Epoch: 12 Global Step: 521630 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:21:16,046-Speed 2629.94 samples/sec Loss 5.0261 LearningRate 0.0138 Epoch: 12 Global Step: 521640 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:21:19,963-Speed 2615.53 samples/sec Loss 5.0124 LearningRate 0.0138 Epoch: 12 Global Step: 521650 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:21:23,861-Speed 2627.51 samples/sec Loss 4.9853 LearningRate 0.0138 Epoch: 12 Global Step: 521660 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:21:27,769-Speed 2621.51 samples/sec Loss 5.0073 LearningRate 0.0138 Epoch: 12 Global Step: 521670 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:21:31,664-Speed 2629.05 samples/sec Loss 4.9376 LearningRate 0.0138 Epoch: 12 Global Step: 521680 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:21:35,582-Speed 2614.79 samples/sec Loss 5.0010 LearningRate 0.0138 Epoch: 12 Global Step: 521690 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:21:39,463-Speed 2638.69 samples/sec Loss 4.9488 LearningRate 0.0138 Epoch: 12 Global Step: 521700 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:21:43,362-Speed 2627.52 samples/sec Loss 5.0598 LearningRate 0.0138 Epoch: 12 Global Step: 521710 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:21:47,266-Speed 2623.56 samples/sec Loss 4.9709 LearningRate 0.0138 Epoch: 12 Global Step: 521720 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:21:51,176-Speed 2620.35 samples/sec Loss 5.0302 LearningRate 0.0138 Epoch: 12 Global Step: 521730 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:21:55,092-Speed 2615.56 samples/sec Loss 4.9492 LearningRate 0.0138 Epoch: 12 Global Step: 521740 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:21:58,989-Speed 2628.33 samples/sec Loss 4.9944 LearningRate 0.0138 Epoch: 12 Global Step: 521750 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:22:02,886-Speed 2628.53 samples/sec Loss 5.0113 LearningRate 0.0138 Epoch: 12 Global Step: 521760 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:22:06,794-Speed 2620.64 samples/sec Loss 4.9711 LearningRate 0.0138 Epoch: 12 Global Step: 521770 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:22:10,693-Speed 2626.80 samples/sec Loss 5.0367 LearningRate 0.0138 Epoch: 12 Global Step: 521780 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:22:14,639-Speed 2596.41 samples/sec Loss 5.0508 LearningRate 0.0138 Epoch: 12 Global Step: 521790 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:22:18,536-Speed 2628.18 samples/sec Loss 5.0083 LearningRate 0.0138 Epoch: 12 Global Step: 521800 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:22:22,448-Speed 2618.15 samples/sec Loss 4.9936 LearningRate 0.0138 Epoch: 12 Global Step: 521810 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:22:26,325-Speed 2642.43 samples/sec Loss 4.9472 LearningRate 0.0138 Epoch: 12 Global Step: 521820 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:22:30,227-Speed 2624.78 samples/sec Loss 4.9437 LearningRate 0.0138 Epoch: 12 Global Step: 521830 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:22:34,128-Speed 2626.05 samples/sec Loss 5.0318 LearningRate 0.0138 Epoch: 12 Global Step: 521840 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:22:38,028-Speed 2625.53 samples/sec Loss 5.0831 LearningRate 0.0138 Epoch: 12 Global Step: 521850 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:22:41,931-Speed 2625.26 samples/sec Loss 4.9444 LearningRate 0.0138 Epoch: 12 Global Step: 521860 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:22:45,832-Speed 2625.34 samples/sec Loss 4.9662 LearningRate 0.0138 Epoch: 12 Global Step: 521870 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:22:49,731-Speed 2627.59 samples/sec Loss 5.0007 LearningRate 0.0138 Epoch: 12 Global Step: 521880 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:22:53,629-Speed 2627.30 samples/sec Loss 4.9850 LearningRate 0.0138 Epoch: 12 Global Step: 521890 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:22:57,526-Speed 2629.07 samples/sec Loss 4.9395 LearningRate 0.0138 Epoch: 12 Global Step: 521900 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:23:01,427-Speed 2625.54 samples/sec Loss 5.1562 LearningRate 0.0138 Epoch: 12 Global Step: 521910 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:23:05,334-Speed 2621.49 samples/sec Loss 5.0132 LearningRate 0.0138 Epoch: 12 Global Step: 521920 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:23:09,237-Speed 2624.08 samples/sec Loss 4.9990 LearningRate 0.0138 Epoch: 12 Global Step: 521930 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:23:13,134-Speed 2628.56 samples/sec Loss 5.0465 LearningRate 0.0138 Epoch: 12 Global Step: 521940 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:23:17,032-Speed 2627.80 samples/sec Loss 4.8250 LearningRate 0.0138 Epoch: 12 Global Step: 521950 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:23:20,926-Speed 2630.95 samples/sec Loss 4.9237 LearningRate 0.0138 Epoch: 12 Global Step: 521960 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:23:24,804-Speed 2641.15 samples/sec Loss 4.9714 LearningRate 0.0137 Epoch: 12 Global Step: 521970 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:23:28,714-Speed 2619.73 samples/sec Loss 4.9117 LearningRate 0.0137 Epoch: 12 Global Step: 521980 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:23:32,621-Speed 2621.42 samples/sec Loss 4.9772 LearningRate 0.0137 Epoch: 12 Global Step: 521990 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:23:36,525-Speed 2623.74 samples/sec Loss 4.9728 LearningRate 0.0137 Epoch: 12 Global Step: 522000 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:23:40,469-Speed 2596.61 samples/sec Loss 5.0856 LearningRate 0.0137 Epoch: 12 Global Step: 522010 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:23:44,364-Speed 2630.46 samples/sec Loss 4.9645 LearningRate 0.0137 Epoch: 12 Global Step: 522020 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:23:48,256-Speed 2630.95 samples/sec Loss 4.9672 LearningRate 0.0137 Epoch: 12 Global Step: 522030 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:23:52,153-Speed 2628.74 samples/sec Loss 5.0913 LearningRate 0.0137 Epoch: 12 Global Step: 522040 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:23:56,049-Speed 2628.98 samples/sec Loss 4.9253 LearningRate 0.0137 Epoch: 12 Global Step: 522050 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:23:59,948-Speed 2627.42 samples/sec Loss 5.0310 LearningRate 0.0137 Epoch: 12 Global Step: 522060 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:24:03,842-Speed 2630.01 samples/sec Loss 4.8804 LearningRate 0.0137 Epoch: 12 Global Step: 522070 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:24:07,750-Speed 2620.88 samples/sec Loss 5.0139 LearningRate 0.0137 Epoch: 12 Global Step: 522080 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:24:11,652-Speed 2624.75 samples/sec Loss 5.1034 LearningRate 0.0137 Epoch: 12 Global Step: 522090 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:24:15,559-Speed 2621.82 samples/sec Loss 4.9877 LearningRate 0.0137 Epoch: 12 Global Step: 522100 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:24:19,460-Speed 2625.99 samples/sec Loss 4.9765 LearningRate 0.0137 Epoch: 12 Global Step: 522110 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:24:23,361-Speed 2625.50 samples/sec Loss 4.9984 LearningRate 0.0137 Epoch: 12 Global Step: 522120 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:24:27,262-Speed 2624.88 samples/sec Loss 4.8994 LearningRate 0.0137 Epoch: 12 Global Step: 522130 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:24:31,168-Speed 2622.11 samples/sec Loss 5.0091 LearningRate 0.0137 Epoch: 12 Global Step: 522140 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:24:35,074-Speed 2622.82 samples/sec Loss 5.0190 LearningRate 0.0137 Epoch: 12 Global Step: 522150 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:24:38,970-Speed 2629.38 samples/sec Loss 4.9467 LearningRate 0.0137 Epoch: 12 Global Step: 522160 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:24:42,867-Speed 2628.47 samples/sec Loss 4.9319 LearningRate 0.0137 Epoch: 12 Global Step: 522170 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:24:46,764-Speed 2628.51 samples/sec Loss 4.9560 LearningRate 0.0137 Epoch: 12 Global Step: 522180 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:24:50,660-Speed 2628.99 samples/sec Loss 4.9726 LearningRate 0.0137 Epoch: 12 Global Step: 522190 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:24:54,606-Speed 2595.18 samples/sec Loss 5.0377 LearningRate 0.0137 Epoch: 12 Global Step: 522200 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:24:58,499-Speed 2630.77 samples/sec Loss 5.0071 LearningRate 0.0137 Epoch: 12 Global Step: 522210 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:25:02,412-Speed 2617.55 samples/sec Loss 4.9020 LearningRate 0.0137 Epoch: 12 Global Step: 522220 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:25:06,314-Speed 2625.44 samples/sec Loss 4.9040 LearningRate 0.0137 Epoch: 12 Global Step: 522230 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:25:10,189-Speed 2643.63 samples/sec Loss 4.9132 LearningRate 0.0137 Epoch: 12 Global Step: 522240 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:25:14,084-Speed 2629.18 samples/sec Loss 5.0793 LearningRate 0.0137 Epoch: 12 Global Step: 522250 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:25:17,986-Speed 2625.58 samples/sec Loss 4.9755 LearningRate 0.0137 Epoch: 12 Global Step: 522260 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:25:21,888-Speed 2624.18 samples/sec Loss 4.9893 LearningRate 0.0137 Epoch: 12 Global Step: 522270 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:25:25,786-Speed 2628.02 samples/sec Loss 4.9940 LearningRate 0.0137 Epoch: 12 Global Step: 522280 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:25:29,685-Speed 2626.28 samples/sec Loss 5.0209 LearningRate 0.0137 Epoch: 12 Global Step: 522290 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:25:33,585-Speed 2626.95 samples/sec Loss 4.8631 LearningRate 0.0137 Epoch: 12 Global Step: 522300 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:25:37,482-Speed 2627.64 samples/sec Loss 4.9506 LearningRate 0.0137 Epoch: 12 Global Step: 522310 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:25:41,379-Speed 2628.50 samples/sec Loss 5.0252 LearningRate 0.0137 Epoch: 12 Global Step: 522320 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:25:45,280-Speed 2625.92 samples/sec Loss 4.9686 LearningRate 0.0137 Epoch: 12 Global Step: 522330 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:25:49,180-Speed 2630.45 samples/sec Loss 5.0238 LearningRate 0.0137 Epoch: 12 Global Step: 522340 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:25:53,051-Speed 2645.55 samples/sec Loss 4.9019 LearningRate 0.0137 Epoch: 12 Global Step: 522350 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:25:56,949-Speed 2627.17 samples/sec Loss 5.0480 LearningRate 0.0137 Epoch: 12 Global Step: 522360 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:26:00,842-Speed 2630.92 samples/sec Loss 4.9141 LearningRate 0.0137 Epoch: 12 Global Step: 522370 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:26:04,737-Speed 2629.95 samples/sec Loss 4.9360 LearningRate 0.0137 Epoch: 12 Global Step: 522380 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:26:08,635-Speed 2628.28 samples/sec Loss 4.9378 LearningRate 0.0137 Epoch: 12 Global Step: 522390 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:26:12,529-Speed 2629.87 samples/sec Loss 4.9167 LearningRate 0.0137 Epoch: 12 Global Step: 522400 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:26:16,422-Speed 2631.54 samples/sec Loss 4.9951 LearningRate 0.0137 Epoch: 12 Global Step: 522410 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:26:20,315-Speed 2630.83 samples/sec Loss 4.9818 LearningRate 0.0137 Epoch: 12 Global Step: 522420 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:26:24,211-Speed 2629.11 samples/sec Loss 4.9744 LearningRate 0.0137 Epoch: 12 Global Step: 522430 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:26:28,101-Speed 2632.86 samples/sec Loss 4.9908 LearningRate 0.0137 Epoch: 12 Global Step: 522440 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-04-15 06:26:31,995-Speed 2630.67 samples/sec Loss 4.9069 LearningRate 0.0137 Epoch: 12 Global Step: 522450 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:26:35,889-Speed 2629.98 samples/sec Loss 4.9767 LearningRate 0.0137 Epoch: 12 Global Step: 522460 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:26:39,785-Speed 2628.79 samples/sec Loss 5.0789 LearningRate 0.0137 Epoch: 12 Global Step: 522470 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:26:43,679-Speed 2631.65 samples/sec Loss 5.0251 LearningRate 0.0137 Epoch: 12 Global Step: 522480 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:26:47,578-Speed 2626.69 samples/sec Loss 4.9557 LearningRate 0.0137 Epoch: 12 Global Step: 522490 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:26:51,476-Speed 2627.67 samples/sec Loss 4.9737 LearningRate 0.0137 Epoch: 12 Global Step: 522500 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:26:55,397-Speed 2612.40 samples/sec Loss 4.9312 LearningRate 0.0137 Epoch: 12 Global Step: 522510 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:26:59,294-Speed 2627.85 samples/sec Loss 5.0026 LearningRate 0.0137 Epoch: 12 Global Step: 522520 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:27:03,189-Speed 2629.62 samples/sec Loss 4.9860 LearningRate 0.0137 Epoch: 12 Global Step: 522530 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:27:07,091-Speed 2625.32 samples/sec Loss 4.8960 LearningRate 0.0137 Epoch: 12 Global Step: 522540 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:27:10,961-Speed 2646.21 samples/sec Loss 5.0272 LearningRate 0.0137 Epoch: 12 Global Step: 522550 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:27:14,873-Speed 2618.54 samples/sec Loss 5.0502 LearningRate 0.0137 Epoch: 12 Global Step: 522560 Fp16 Grad Scale: 131072 Required: 35 hours
Training: 2022-04-15 06:27:18,769-Speed 2629.35 samples/sec Loss 4.8761 LearningRate 0.0137 Epoch: 12 Global Step: 522570 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:27:22,664-Speed 2629.73 samples/sec Loss 4.9519 LearningRate 0.0137 Epoch: 12 Global Step: 522580 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:27:26,560-Speed 2628.77 samples/sec Loss 4.9156 LearningRate 0.0137 Epoch: 12 Global Step: 522590 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:27:30,467-Speed 2621.68 samples/sec Loss 5.0481 LearningRate 0.0137 Epoch: 12 Global Step: 522600 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:27:34,365-Speed 2627.30 samples/sec Loss 4.9199 LearningRate 0.0137 Epoch: 12 Global Step: 522610 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:27:38,269-Speed 2623.20 samples/sec Loss 4.9501 LearningRate 0.0137 Epoch: 12 Global Step: 522620 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:27:42,174-Speed 2623.56 samples/sec Loss 4.9939 LearningRate 0.0137 Epoch: 12 Global Step: 522630 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:27:46,060-Speed 2635.14 samples/sec Loss 4.9141 LearningRate 0.0137 Epoch: 12 Global Step: 522640 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:27:49,960-Speed 2627.23 samples/sec Loss 5.0566 LearningRate 0.0137 Epoch: 12 Global Step: 522650 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:27:53,860-Speed 2626.37 samples/sec Loss 4.9478 LearningRate 0.0137 Epoch: 12 Global Step: 522660 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:27:57,772-Speed 2618.01 samples/sec Loss 4.8816 LearningRate 0.0137 Epoch: 12 Global Step: 522670 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:28:01,675-Speed 2624.31 samples/sec Loss 4.9795 LearningRate 0.0137 Epoch: 12 Global Step: 522680 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:28:05,574-Speed 2626.32 samples/sec Loss 4.9839 LearningRate 0.0137 Epoch: 12 Global Step: 522690 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:28:09,468-Speed 2630.54 samples/sec Loss 4.9874 LearningRate 0.0137 Epoch: 12 Global Step: 522700 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:28:13,364-Speed 2629.19 samples/sec Loss 4.9874 LearningRate 0.0137 Epoch: 12 Global Step: 522710 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:28:17,260-Speed 2628.74 samples/sec Loss 4.9212 LearningRate 0.0137 Epoch: 12 Global Step: 522720 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:28:21,157-Speed 2628.12 samples/sec Loss 4.9615 LearningRate 0.0137 Epoch: 12 Global Step: 522730 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:28:25,058-Speed 2625.77 samples/sec Loss 4.8467 LearningRate 0.0137 Epoch: 12 Global Step: 522740 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:28:28,970-Speed 2618.96 samples/sec Loss 4.8711 LearningRate 0.0137 Epoch: 12 Global Step: 522750 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:28:32,862-Speed 2631.40 samples/sec Loss 5.0405 LearningRate 0.0137 Epoch: 12 Global Step: 522760 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:28:36,758-Speed 2628.38 samples/sec Loss 4.8831 LearningRate 0.0137 Epoch: 12 Global Step: 522770 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:28:40,657-Speed 2626.84 samples/sec Loss 5.0045 LearningRate 0.0137 Epoch: 12 Global Step: 522780 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:28:44,552-Speed 2630.19 samples/sec Loss 4.9713 LearningRate 0.0137 Epoch: 12 Global Step: 522790 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:28:48,449-Speed 2628.16 samples/sec Loss 4.9856 LearningRate 0.0137 Epoch: 12 Global Step: 522800 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:28:52,452-Speed 2559.02 samples/sec Loss 4.9230 LearningRate 0.0137 Epoch: 12 Global Step: 522810 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:28:56,353-Speed 2625.82 samples/sec Loss 4.9587 LearningRate 0.0137 Epoch: 12 Global Step: 522820 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:29:00,247-Speed 2631.43 samples/sec Loss 4.9237 LearningRate 0.0137 Epoch: 12 Global Step: 522830 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:29:04,141-Speed 2630.16 samples/sec Loss 5.0307 LearningRate 0.0137 Epoch: 12 Global Step: 522840 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:29:08,035-Speed 2629.86 samples/sec Loss 4.8518 LearningRate 0.0137 Epoch: 12 Global Step: 522850 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:29:11,940-Speed 2622.59 samples/sec Loss 4.9369 LearningRate 0.0137 Epoch: 12 Global Step: 522860 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:29:15,838-Speed 2627.78 samples/sec Loss 4.8586 LearningRate 0.0137 Epoch: 12 Global Step: 522870 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:29:19,740-Speed 2625.58 samples/sec Loss 5.0168 LearningRate 0.0137 Epoch: 12 Global Step: 522880 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:29:23,638-Speed 2627.03 samples/sec Loss 4.9629 LearningRate 0.0137 Epoch: 12 Global Step: 522890 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:29:27,562-Speed 2610.82 samples/sec Loss 4.9984 LearningRate 0.0137 Epoch: 12 Global Step: 522900 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:29:31,520-Speed 2587.16 samples/sec Loss 4.9173 LearningRate 0.0137 Epoch: 12 Global Step: 522910 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:29:35,419-Speed 2627.38 samples/sec Loss 5.0091 LearningRate 0.0137 Epoch: 12 Global Step: 522920 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:29:39,323-Speed 2622.89 samples/sec Loss 5.0167 LearningRate 0.0137 Epoch: 12 Global Step: 522930 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:29:43,265-Speed 2598.70 samples/sec Loss 4.9981 LearningRate 0.0137 Epoch: 12 Global Step: 522940 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:29:47,207-Speed 2598.29 samples/sec Loss 4.9107 LearningRate 0.0137 Epoch: 12 Global Step: 522950 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:29:51,105-Speed 2628.40 samples/sec Loss 4.9178 LearningRate 0.0137 Epoch: 12 Global Step: 522960 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:29:55,069-Speed 2583.22 samples/sec Loss 5.0427 LearningRate 0.0137 Epoch: 12 Global Step: 522970 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:29:59,058-Speed 2568.09 samples/sec Loss 4.9765 LearningRate 0.0137 Epoch: 12 Global Step: 522980 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:30:02,955-Speed 2628.71 samples/sec Loss 5.0281 LearningRate 0.0137 Epoch: 12 Global Step: 522990 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:30:06,853-Speed 2626.94 samples/sec Loss 4.9598 LearningRate 0.0137 Epoch: 12 Global Step: 523000 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:30:10,769-Speed 2615.89 samples/sec Loss 5.0785 LearningRate 0.0137 Epoch: 12 Global Step: 523010 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:30:14,667-Speed 2627.41 samples/sec Loss 4.9980 LearningRate 0.0137 Epoch: 12 Global Step: 523020 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:30:18,570-Speed 2624.52 samples/sec Loss 5.0646 LearningRate 0.0137 Epoch: 12 Global Step: 523030 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:30:22,466-Speed 2628.82 samples/sec Loss 4.9658 LearningRate 0.0137 Epoch: 12 Global Step: 523040 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:30:26,363-Speed 2627.91 samples/sec Loss 4.9591 LearningRate 0.0137 Epoch: 12 Global Step: 523050 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:30:30,267-Speed 2624.29 samples/sec Loss 4.9363 LearningRate 0.0137 Epoch: 12 Global Step: 523060 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:30:34,212-Speed 2596.50 samples/sec Loss 4.9989 LearningRate 0.0137 Epoch: 12 Global Step: 523070 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:30:38,109-Speed 2628.41 samples/sec Loss 5.0229 LearningRate 0.0137 Epoch: 12 Global Step: 523080 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:30:42,005-Speed 2629.36 samples/sec Loss 4.8846 LearningRate 0.0136 Epoch: 12 Global Step: 523090 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:30:45,880-Speed 2642.87 samples/sec Loss 4.8748 LearningRate 0.0136 Epoch: 12 Global Step: 523100 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:30:49,765-Speed 2636.98 samples/sec Loss 5.0545 LearningRate 0.0136 Epoch: 12 Global Step: 523110 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:30:53,670-Speed 2622.58 samples/sec Loss 5.0215 LearningRate 0.0136 Epoch: 12 Global Step: 523120 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:30:57,577-Speed 2622.01 samples/sec Loss 5.0074 LearningRate 0.0136 Epoch: 12 Global Step: 523130 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:31:01,468-Speed 2631.76 samples/sec Loss 5.0732 LearningRate 0.0136 Epoch: 12 Global Step: 523140 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:31:05,360-Speed 2632.02 samples/sec Loss 4.9683 LearningRate 0.0136 Epoch: 12 Global Step: 523150 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:31:09,255-Speed 2629.79 samples/sec Loss 5.0402 LearningRate 0.0136 Epoch: 12 Global Step: 523160 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:31:13,175-Speed 2613.13 samples/sec Loss 4.9476 LearningRate 0.0136 Epoch: 12 Global Step: 523170 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:31:17,080-Speed 2622.98 samples/sec Loss 4.8381 LearningRate 0.0136 Epoch: 12 Global Step: 523180 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:31:20,972-Speed 2632.00 samples/sec Loss 4.9108 LearningRate 0.0136 Epoch: 12 Global Step: 523190 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:31:24,878-Speed 2622.11 samples/sec Loss 5.0128 LearningRate 0.0136 Epoch: 12 Global Step: 523200 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:31:28,781-Speed 2623.91 samples/sec Loss 4.8696 LearningRate 0.0136 Epoch: 12 Global Step: 523210 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:31:32,660-Speed 2640.50 samples/sec Loss 4.9820 LearningRate 0.0136 Epoch: 12 Global Step: 523220 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:31:36,556-Speed 2628.83 samples/sec Loss 4.8407 LearningRate 0.0136 Epoch: 12 Global Step: 523230 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:31:40,463-Speed 2621.53 samples/sec Loss 4.9999 LearningRate 0.0136 Epoch: 12 Global Step: 523240 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:31:44,356-Speed 2630.81 samples/sec Loss 5.0623 LearningRate 0.0136 Epoch: 12 Global Step: 523250 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:31:48,250-Speed 2630.89 samples/sec Loss 4.9055 LearningRate 0.0136 Epoch: 12 Global Step: 523260 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:31:52,151-Speed 2625.80 samples/sec Loss 5.0145 LearningRate 0.0136 Epoch: 12 Global Step: 523270 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:31:56,044-Speed 2630.84 samples/sec Loss 4.8774 LearningRate 0.0136 Epoch: 12 Global Step: 523280 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:31:59,939-Speed 2629.69 samples/sec Loss 5.0049 LearningRate 0.0136 Epoch: 12 Global Step: 523290 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:32:03,835-Speed 2629.01 samples/sec Loss 5.0875 LearningRate 0.0136 Epoch: 12 Global Step: 523300 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:32:07,728-Speed 2630.38 samples/sec Loss 4.8858 LearningRate 0.0136 Epoch: 12 Global Step: 523310 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:32:11,629-Speed 2626.28 samples/sec Loss 4.9864 LearningRate 0.0136 Epoch: 12 Global Step: 523320 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:32:15,527-Speed 2627.55 samples/sec Loss 4.9568 LearningRate 0.0136 Epoch: 12 Global Step: 523330 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:32:19,429-Speed 2624.67 samples/sec Loss 4.9311 LearningRate 0.0136 Epoch: 12 Global Step: 523340 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:32:23,328-Speed 2627.08 samples/sec Loss 4.8370 LearningRate 0.0136 Epoch: 12 Global Step: 523350 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:32:27,228-Speed 2626.73 samples/sec Loss 5.0074 LearningRate 0.0136 Epoch: 12 Global Step: 523360 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:32:31,124-Speed 2628.75 samples/sec Loss 4.9331 LearningRate 0.0136 Epoch: 12 Global Step: 523370 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:32:35,020-Speed 2629.84 samples/sec Loss 4.9023 LearningRate 0.0136 Epoch: 12 Global Step: 523380 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:32:38,916-Speed 2628.41 samples/sec Loss 5.0277 LearningRate 0.0136 Epoch: 12 Global Step: 523390 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:32:42,814-Speed 2627.93 samples/sec Loss 5.0114 LearningRate 0.0136 Epoch: 12 Global Step: 523400 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:32:46,741-Speed 2608.48 samples/sec Loss 5.0110 LearningRate 0.0136 Epoch: 12 Global Step: 523410 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:32:50,634-Speed 2630.81 samples/sec Loss 5.0109 LearningRate 0.0136 Epoch: 12 Global Step: 523420 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:32:54,533-Speed 2627.22 samples/sec Loss 4.8904 LearningRate 0.0136 Epoch: 12 Global Step: 523430 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:32:58,433-Speed 2626.14 samples/sec Loss 4.9689 LearningRate 0.0136 Epoch: 12 Global Step: 523440 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:33:02,319-Speed 2636.15 samples/sec Loss 4.9029 LearningRate 0.0136 Epoch: 12 Global Step: 523450 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:33:06,211-Speed 2631.28 samples/sec Loss 5.1053 LearningRate 0.0136 Epoch: 12 Global Step: 523460 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:33:10,128-Speed 2615.12 samples/sec Loss 4.9067 LearningRate 0.0136 Epoch: 12 Global Step: 523470 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:33:14,025-Speed 2628.63 samples/sec Loss 4.9219 LearningRate 0.0136 Epoch: 12 Global Step: 523480 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:33:17,902-Speed 2641.83 samples/sec Loss 4.9477 LearningRate 0.0136 Epoch: 12 Global Step: 523490 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:33:21,827-Speed 2609.76 samples/sec Loss 4.8144 LearningRate 0.0136 Epoch: 12 Global Step: 523500 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:33:25,727-Speed 2626.22 samples/sec Loss 4.9741 LearningRate 0.0136 Epoch: 12 Global Step: 523510 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:33:29,628-Speed 2625.95 samples/sec Loss 4.9361 LearningRate 0.0136 Epoch: 12 Global Step: 523520 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:33:33,519-Speed 2632.14 samples/sec Loss 4.9644 LearningRate 0.0136 Epoch: 12 Global Step: 523530 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:33:37,413-Speed 2630.43 samples/sec Loss 4.9634 LearningRate 0.0136 Epoch: 12 Global Step: 523540 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:33:41,307-Speed 2629.74 samples/sec Loss 4.9664 LearningRate 0.0136 Epoch: 12 Global Step: 523550 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:33:45,199-Speed 2632.43 samples/sec Loss 5.0021 LearningRate 0.0136 Epoch: 12 Global Step: 523560 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:33:49,092-Speed 2630.94 samples/sec Loss 5.0069 LearningRate 0.0136 Epoch: 12 Global Step: 523570 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:33:52,997-Speed 2622.87 samples/sec Loss 5.0482 LearningRate 0.0136 Epoch: 12 Global Step: 523580 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:33:56,944-Speed 2594.66 samples/sec Loss 4.9194 LearningRate 0.0136 Epoch: 12 Global Step: 523590 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:34:00,844-Speed 2627.36 samples/sec Loss 4.9081 LearningRate 0.0136 Epoch: 12 Global Step: 523600 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:34:04,739-Speed 2629.45 samples/sec Loss 5.0584 LearningRate 0.0136 Epoch: 12 Global Step: 523610 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:34:08,636-Speed 2627.94 samples/sec Loss 4.8841 LearningRate 0.0136 Epoch: 12 Global Step: 523620 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:34:12,529-Speed 2630.66 samples/sec Loss 5.0066 LearningRate 0.0136 Epoch: 12 Global Step: 523630 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:34:16,424-Speed 2629.57 samples/sec Loss 4.9640 LearningRate 0.0136 Epoch: 12 Global Step: 523640 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:34:20,320-Speed 2629.89 samples/sec Loss 4.9481 LearningRate 0.0136 Epoch: 12 Global Step: 523650 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:34:24,230-Speed 2619.46 samples/sec Loss 4.8539 LearningRate 0.0136 Epoch: 12 Global Step: 523660 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:34:28,227-Speed 2562.20 samples/sec Loss 4.8975 LearningRate 0.0136 Epoch: 12 Global Step: 523670 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:34:32,122-Speed 2629.79 samples/sec Loss 4.8827 LearningRate 0.0136 Epoch: 12 Global Step: 523680 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:34:35,996-Speed 2644.00 samples/sec Loss 4.8949 LearningRate 0.0136 Epoch: 12 Global Step: 523690 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:34:39,889-Speed 2631.05 samples/sec Loss 4.9920 LearningRate 0.0136 Epoch: 12 Global Step: 523700 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:34:43,803-Speed 2617.33 samples/sec Loss 5.0544 LearningRate 0.0136 Epoch: 12 Global Step: 523710 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:34:47,696-Speed 2631.11 samples/sec Loss 4.9545 LearningRate 0.0136 Epoch: 12 Global Step: 523720 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:34:51,595-Speed 2627.15 samples/sec Loss 4.8944 LearningRate 0.0136 Epoch: 12 Global Step: 523730 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:34:55,488-Speed 2630.55 samples/sec Loss 4.8985 LearningRate 0.0136 Epoch: 12 Global Step: 523740 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:34:59,397-Speed 2620.65 samples/sec Loss 4.9250 LearningRate 0.0136 Epoch: 12 Global Step: 523750 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:35:03,288-Speed 2632.22 samples/sec Loss 4.9908 LearningRate 0.0136 Epoch: 12 Global Step: 523760 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:35:07,178-Speed 2632.55 samples/sec Loss 4.9962 LearningRate 0.0136 Epoch: 12 Global Step: 523770 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:35:11,080-Speed 2625.00 samples/sec Loss 5.0100 LearningRate 0.0136 Epoch: 12 Global Step: 523780 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:35:14,975-Speed 2629.66 samples/sec Loss 4.9530 LearningRate 0.0136 Epoch: 12 Global Step: 523790 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:35:18,851-Speed 2642.17 samples/sec Loss 4.8111 LearningRate 0.0136 Epoch: 12 Global Step: 523800 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:35:22,751-Speed 2627.21 samples/sec Loss 4.9558 LearningRate 0.0136 Epoch: 12 Global Step: 523810 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:35:26,649-Speed 2627.81 samples/sec Loss 4.9238 LearningRate 0.0136 Epoch: 12 Global Step: 523820 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:35:30,543-Speed 2630.08 samples/sec Loss 4.9625 LearningRate 0.0136 Epoch: 12 Global Step: 523830 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:35:34,439-Speed 2628.80 samples/sec Loss 5.0232 LearningRate 0.0136 Epoch: 12 Global Step: 523840 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:35:38,334-Speed 2629.95 samples/sec Loss 4.8707 LearningRate 0.0136 Epoch: 12 Global Step: 523850 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:35:42,232-Speed 2627.38 samples/sec Loss 5.0208 LearningRate 0.0136 Epoch: 12 Global Step: 523860 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:35:46,125-Speed 2631.15 samples/sec Loss 4.8253 LearningRate 0.0136 Epoch: 12 Global Step: 523870 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:35:50,020-Speed 2629.54 samples/sec Loss 4.9617 LearningRate 0.0136 Epoch: 12 Global Step: 523880 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:35:53,920-Speed 2625.99 samples/sec Loss 5.1934 LearningRate 0.0136 Epoch: 12 Global Step: 523890 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:35:57,811-Speed 2632.86 samples/sec Loss 4.8805 LearningRate 0.0136 Epoch: 12 Global Step: 523900 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:36:01,702-Speed 2632.32 samples/sec Loss 4.9360 LearningRate 0.0136 Epoch: 12 Global Step: 523910 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:36:05,601-Speed 2627.24 samples/sec Loss 4.9445 LearningRate 0.0136 Epoch: 12 Global Step: 523920 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:36:09,503-Speed 2624.48 samples/sec Loss 4.9052 LearningRate 0.0136 Epoch: 12 Global Step: 523930 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:36:13,372-Speed 2648.02 samples/sec Loss 5.0146 LearningRate 0.0136 Epoch: 12 Global Step: 523940 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:36:17,264-Speed 2632.00 samples/sec Loss 4.8632 LearningRate 0.0136 Epoch: 12 Global Step: 523950 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:36:21,154-Speed 2632.84 samples/sec Loss 4.8621 LearningRate 0.0136 Epoch: 12 Global Step: 523960 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:36:25,066-Speed 2618.97 samples/sec Loss 5.0333 LearningRate 0.0136 Epoch: 12 Global Step: 523970 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:36:28,967-Speed 2624.95 samples/sec Loss 5.0559 LearningRate 0.0136 Epoch: 12 Global Step: 523980 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:36:32,866-Speed 2626.68 samples/sec Loss 4.9037 LearningRate 0.0136 Epoch: 12 Global Step: 523990 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:36:36,762-Speed 2629.05 samples/sec Loss 4.9123 LearningRate 0.0136 Epoch: 12 Global Step: 524000 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:36:40,655-Speed 2631.42 samples/sec Loss 4.9940 LearningRate 0.0136 Epoch: 12 Global Step: 524010 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:36:44,546-Speed 2632.45 samples/sec Loss 5.0024 LearningRate 0.0136 Epoch: 12 Global Step: 524020 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:36:48,443-Speed 2628.24 samples/sec Loss 5.0095 LearningRate 0.0136 Epoch: 12 Global Step: 524030 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:36:52,337-Speed 2630.84 samples/sec Loss 4.9462 LearningRate 0.0136 Epoch: 12 Global Step: 524040 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:36:56,230-Speed 2630.68 samples/sec Loss 5.0668 LearningRate 0.0136 Epoch: 12 Global Step: 524050 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:37:00,130-Speed 2625.78 samples/sec Loss 4.8515 LearningRate 0.0136 Epoch: 12 Global Step: 524060 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:37:04,036-Speed 2622.25 samples/sec Loss 4.9744 LearningRate 0.0136 Epoch: 12 Global Step: 524070 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:37:07,915-Speed 2641.19 samples/sec Loss 4.8464 LearningRate 0.0136 Epoch: 12 Global Step: 524080 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:37:11,814-Speed 2626.51 samples/sec Loss 5.0482 LearningRate 0.0136 Epoch: 12 Global Step: 524090 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:37:15,717-Speed 2624.64 samples/sec Loss 5.0694 LearningRate 0.0136 Epoch: 12 Global Step: 524100 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:37:19,631-Speed 2616.84 samples/sec Loss 4.9032 LearningRate 0.0136 Epoch: 12 Global Step: 524110 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:37:23,525-Speed 2630.39 samples/sec Loss 4.9730 LearningRate 0.0136 Epoch: 12 Global Step: 524120 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:37:27,420-Speed 2629.34 samples/sec Loss 4.9897 LearningRate 0.0136 Epoch: 12 Global Step: 524130 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:37:31,313-Speed 2630.95 samples/sec Loss 4.9497 LearningRate 0.0136 Epoch: 12 Global Step: 524140 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:37:35,231-Speed 2613.89 samples/sec Loss 4.9483 LearningRate 0.0136 Epoch: 12 Global Step: 524150 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:37:39,159-Speed 2608.33 samples/sec Loss 4.9874 LearningRate 0.0136 Epoch: 12 Global Step: 524160 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:37:43,056-Speed 2628.67 samples/sec Loss 4.8869 LearningRate 0.0136 Epoch: 12 Global Step: 524170 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:37:46,947-Speed 2632.00 samples/sec Loss 4.8885 LearningRate 0.0136 Epoch: 12 Global Step: 524180 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:37:50,842-Speed 2630.05 samples/sec Loss 4.9553 LearningRate 0.0136 Epoch: 12 Global Step: 524190 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:37:54,724-Speed 2638.85 samples/sec Loss 4.9334 LearningRate 0.0136 Epoch: 12 Global Step: 524200 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:37:58,622-Speed 2628.09 samples/sec Loss 4.9813 LearningRate 0.0135 Epoch: 12 Global Step: 524210 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:38:02,517-Speed 2628.81 samples/sec Loss 4.9612 LearningRate 0.0135 Epoch: 12 Global Step: 524220 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:38:06,414-Speed 2628.43 samples/sec Loss 4.8444 LearningRate 0.0135 Epoch: 12 Global Step: 524230 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:38:10,308-Speed 2630.83 samples/sec Loss 4.9755 LearningRate 0.0135 Epoch: 12 Global Step: 524240 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:38:14,204-Speed 2629.13 samples/sec Loss 4.9979 LearningRate 0.0135 Epoch: 12 Global Step: 524250 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:38:18,101-Speed 2628.66 samples/sec Loss 5.0293 LearningRate 0.0135 Epoch: 12 Global Step: 524260 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:38:21,995-Speed 2630.46 samples/sec Loss 4.8871 LearningRate 0.0135 Epoch: 12 Global Step: 524270 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:38:25,890-Speed 2629.88 samples/sec Loss 5.0419 LearningRate 0.0135 Epoch: 12 Global Step: 524280 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:38:29,784-Speed 2630.31 samples/sec Loss 4.9168 LearningRate 0.0135 Epoch: 12 Global Step: 524290 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:38:33,689-Speed 2622.87 samples/sec Loss 4.9702 LearningRate 0.0135 Epoch: 12 Global Step: 524300 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:38:37,596-Speed 2621.02 samples/sec Loss 4.9481 LearningRate 0.0135 Epoch: 12 Global Step: 524310 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:38:41,496-Speed 2627.07 samples/sec Loss 4.9383 LearningRate 0.0135 Epoch: 12 Global Step: 524320 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:38:45,372-Speed 2643.14 samples/sec Loss 4.9339 LearningRate 0.0135 Epoch: 12 Global Step: 524330 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:38:49,294-Speed 2611.07 samples/sec Loss 4.9814 LearningRate 0.0135 Epoch: 12 Global Step: 524340 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:38:53,191-Speed 2628.60 samples/sec Loss 4.9208 LearningRate 0.0135 Epoch: 12 Global Step: 524350 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:38:57,090-Speed 2626.77 samples/sec Loss 4.9777 LearningRate 0.0135 Epoch: 12 Global Step: 524360 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:39:01,006-Speed 2615.74 samples/sec Loss 5.0354 LearningRate 0.0135 Epoch: 12 Global Step: 524370 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:39:04,882-Speed 2642.49 samples/sec Loss 4.8597 LearningRate 0.0135 Epoch: 12 Global Step: 524380 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:39:08,835-Speed 2591.32 samples/sec Loss 4.9767 LearningRate 0.0135 Epoch: 12 Global Step: 524390 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:39:12,749-Speed 2617.12 samples/sec Loss 4.9362 LearningRate 0.0135 Epoch: 12 Global Step: 524400 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:39:16,652-Speed 2624.23 samples/sec Loss 5.0407 LearningRate 0.0135 Epoch: 12 Global Step: 524410 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:39:20,558-Speed 2622.79 samples/sec Loss 4.9280 LearningRate 0.0135 Epoch: 12 Global Step: 524420 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:39:24,453-Speed 2629.50 samples/sec Loss 4.8918 LearningRate 0.0135 Epoch: 12 Global Step: 524430 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:39:28,353-Speed 2627.50 samples/sec Loss 4.9581 LearningRate 0.0135 Epoch: 12 Global Step: 524440 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:39:32,270-Speed 2614.35 samples/sec Loss 4.8799 LearningRate 0.0135 Epoch: 12 Global Step: 524450 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:39:36,166-Speed 2629.04 samples/sec Loss 4.8890 LearningRate 0.0135 Epoch: 12 Global Step: 524460 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:39:40,071-Speed 2622.98 samples/sec Loss 5.0513 LearningRate 0.0135 Epoch: 12 Global Step: 524470 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:39:43,971-Speed 2626.55 samples/sec Loss 4.9588 LearningRate 0.0135 Epoch: 12 Global Step: 524480 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:39:47,874-Speed 2623.98 samples/sec Loss 4.9357 LearningRate 0.0135 Epoch: 12 Global Step: 524490 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:39:51,772-Speed 2627.93 samples/sec Loss 4.9516 LearningRate 0.0135 Epoch: 12 Global Step: 524500 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:39:55,675-Speed 2624.15 samples/sec Loss 5.0165 LearningRate 0.0135 Epoch: 12 Global Step: 524510 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:39:59,591-Speed 2615.89 samples/sec Loss 4.9143 LearningRate 0.0135 Epoch: 12 Global Step: 524520 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:40:03,503-Speed 2618.22 samples/sec Loss 4.9497 LearningRate 0.0135 Epoch: 12 Global Step: 524530 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:40:07,513-Speed 2554.28 samples/sec Loss 4.9215 LearningRate 0.0135 Epoch: 12 Global Step: 524540 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:40:11,435-Speed 2611.28 samples/sec Loss 4.8371 LearningRate 0.0135 Epoch: 12 Global Step: 524550 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:40:15,333-Speed 2627.63 samples/sec Loss 4.9466 LearningRate 0.0135 Epoch: 12 Global Step: 524560 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:40:19,231-Speed 2628.10 samples/sec Loss 4.9607 LearningRate 0.0135 Epoch: 12 Global Step: 524570 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:40:23,136-Speed 2623.37 samples/sec Loss 4.9514 LearningRate 0.0135 Epoch: 12 Global Step: 524580 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:40:27,012-Speed 2642.35 samples/sec Loss 4.8776 LearningRate 0.0135 Epoch: 12 Global Step: 524590 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:40:30,907-Speed 2629.83 samples/sec Loss 4.9186 LearningRate 0.0135 Epoch: 12 Global Step: 524600 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:40:34,801-Speed 2629.71 samples/sec Loss 4.9543 LearningRate 0.0135 Epoch: 12 Global Step: 524610 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:40:38,696-Speed 2629.65 samples/sec Loss 4.9757 LearningRate 0.0135 Epoch: 12 Global Step: 524620 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:40:42,605-Speed 2620.43 samples/sec Loss 4.9445 LearningRate 0.0135 Epoch: 12 Global Step: 524630 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:40:46,496-Speed 2632.28 samples/sec Loss 4.9476 LearningRate 0.0135 Epoch: 12 Global Step: 524640 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:40:50,405-Speed 2620.54 samples/sec Loss 4.9423 LearningRate 0.0135 Epoch: 12 Global Step: 524650 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:40:54,298-Speed 2630.89 samples/sec Loss 4.9004 LearningRate 0.0135 Epoch: 12 Global Step: 524660 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:40:58,172-Speed 2644.38 samples/sec Loss 4.9266 LearningRate 0.0135 Epoch: 12 Global Step: 524670 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:41:02,088-Speed 2615.49 samples/sec Loss 4.9258 LearningRate 0.0135 Epoch: 12 Global Step: 524680 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:41:06,122-Speed 2538.57 samples/sec Loss 5.0402 LearningRate 0.0135 Epoch: 12 Global Step: 524690 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:41:10,024-Speed 2624.85 samples/sec Loss 4.9132 LearningRate 0.0135 Epoch: 12 Global Step: 524700 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:41:13,919-Speed 2629.68 samples/sec Loss 4.9553 LearningRate 0.0135 Epoch: 12 Global Step: 524710 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:41:17,816-Speed 2628.77 samples/sec Loss 4.9500 LearningRate 0.0135 Epoch: 12 Global Step: 524720 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:41:21,709-Speed 2631.15 samples/sec Loss 4.9635 LearningRate 0.0135 Epoch: 12 Global Step: 524730 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:41:25,614-Speed 2622.88 samples/sec Loss 5.0277 LearningRate 0.0135 Epoch: 12 Global Step: 524740 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:41:29,509-Speed 2630.07 samples/sec Loss 4.9162 LearningRate 0.0135 Epoch: 12 Global Step: 524750 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:41:33,406-Speed 2628.33 samples/sec Loss 4.9146 LearningRate 0.0135 Epoch: 12 Global Step: 524760 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:41:37,306-Speed 2626.08 samples/sec Loss 4.8850 LearningRate 0.0135 Epoch: 12 Global Step: 524770 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:41:41,203-Speed 2628.58 samples/sec Loss 4.8731 LearningRate 0.0135 Epoch: 12 Global Step: 524780 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:41:45,097-Speed 2630.26 samples/sec Loss 4.9480 LearningRate 0.0135 Epoch: 12 Global Step: 524790 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:41:49,004-Speed 2621.40 samples/sec Loss 4.8255 LearningRate 0.0135 Epoch: 12 Global Step: 524800 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:41:52,904-Speed 2626.61 samples/sec Loss 4.9747 LearningRate 0.0135 Epoch: 12 Global Step: 524810 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:41:56,798-Speed 2630.20 samples/sec Loss 4.8774 LearningRate 0.0135 Epoch: 12 Global Step: 524820 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:42:00,694-Speed 2629.48 samples/sec Loss 4.8443 LearningRate 0.0135 Epoch: 12 Global Step: 524830 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:42:04,596-Speed 2625.10 samples/sec Loss 5.0055 LearningRate 0.0135 Epoch: 12 Global Step: 524840 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:42:08,497-Speed 2625.43 samples/sec Loss 4.9267 LearningRate 0.0135 Epoch: 12 Global Step: 524850 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:42:12,396-Speed 2627.03 samples/sec Loss 4.8433 LearningRate 0.0135 Epoch: 12 Global Step: 524860 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:42:16,307-Speed 2619.50 samples/sec Loss 4.9611 LearningRate 0.0135 Epoch: 12 Global Step: 524870 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:42:20,181-Speed 2644.00 samples/sec Loss 4.8854 LearningRate 0.0135 Epoch: 12 Global Step: 524880 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:42:24,151-Speed 2580.46 samples/sec Loss 4.9169 LearningRate 0.0135 Epoch: 12 Global Step: 524890 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:42:28,049-Speed 2627.13 samples/sec Loss 4.8994 LearningRate 0.0135 Epoch: 12 Global Step: 524900 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:42:31,946-Speed 2628.90 samples/sec Loss 4.9025 LearningRate 0.0135 Epoch: 12 Global Step: 524910 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:42:35,841-Speed 2629.74 samples/sec Loss 4.9469 LearningRate 0.0135 Epoch: 12 Global Step: 524920 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:42:39,741-Speed 2626.44 samples/sec Loss 5.0166 LearningRate 0.0135 Epoch: 12 Global Step: 524930 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:42:43,636-Speed 2629.65 samples/sec Loss 4.9855 LearningRate 0.0135 Epoch: 12 Global Step: 524940 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:42:47,531-Speed 2629.91 samples/sec Loss 4.9961 LearningRate 0.0135 Epoch: 12 Global Step: 524950 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:42:51,433-Speed 2624.97 samples/sec Loss 4.9899 LearningRate 0.0135 Epoch: 12 Global Step: 524960 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:42:55,329-Speed 2629.26 samples/sec Loss 4.9050 LearningRate 0.0135 Epoch: 12 Global Step: 524970 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:42:59,229-Speed 2626.20 samples/sec Loss 4.8609 LearningRate 0.0135 Epoch: 12 Global Step: 524980 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:43:03,122-Speed 2630.99 samples/sec Loss 4.9756 LearningRate 0.0135 Epoch: 12 Global Step: 524990 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:43:07,015-Speed 2630.79 samples/sec Loss 4.9939 LearningRate 0.0135 Epoch: 12 Global Step: 525000 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:43:10,906-Speed 2633.00 samples/sec Loss 4.9470 LearningRate 0.0135 Epoch: 12 Global Step: 525010 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:43:14,799-Speed 2630.87 samples/sec Loss 4.8756 LearningRate 0.0135 Epoch: 12 Global Step: 525020 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:43:18,707-Speed 2621.33 samples/sec Loss 4.8242 LearningRate 0.0135 Epoch: 12 Global Step: 525030 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:43:22,604-Speed 2628.01 samples/sec Loss 4.8971 LearningRate 0.0135 Epoch: 12 Global Step: 525040 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:43:26,497-Speed 2631.60 samples/sec Loss 4.9310 LearningRate 0.0135 Epoch: 12 Global Step: 525050 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:43:30,394-Speed 2627.91 samples/sec Loss 4.9334 LearningRate 0.0135 Epoch: 12 Global Step: 525060 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:43:34,292-Speed 2627.82 samples/sec Loss 4.8911 LearningRate 0.0135 Epoch: 12 Global Step: 525070 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:43:38,169-Speed 2641.70 samples/sec Loss 4.8898 LearningRate 0.0135 Epoch: 12 Global Step: 525080 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:43:42,063-Speed 2630.41 samples/sec Loss 4.9361 LearningRate 0.0135 Epoch: 12 Global Step: 525090 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:43:45,958-Speed 2630.02 samples/sec Loss 4.9748 LearningRate 0.0135 Epoch: 12 Global Step: 525100 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:43:49,850-Speed 2631.22 samples/sec Loss 5.0238 LearningRate 0.0135 Epoch: 12 Global Step: 525110 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:43:53,742-Speed 2632.04 samples/sec Loss 4.8774 LearningRate 0.0135 Epoch: 12 Global Step: 525120 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:43:57,635-Speed 2630.71 samples/sec Loss 4.9982 LearningRate 0.0135 Epoch: 12 Global Step: 525130 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:44:01,528-Speed 2630.80 samples/sec Loss 5.0072 LearningRate 0.0135 Epoch: 12 Global Step: 525140 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:44:05,420-Speed 2631.99 samples/sec Loss 4.8485 LearningRate 0.0135 Epoch: 12 Global Step: 525150 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:44:09,319-Speed 2627.39 samples/sec Loss 5.1038 LearningRate 0.0135 Epoch: 12 Global Step: 525160 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:44:13,213-Speed 2630.12 samples/sec Loss 4.8742 LearningRate 0.0135 Epoch: 12 Global Step: 525170 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:44:17,087-Speed 2644.18 samples/sec Loss 4.9116 LearningRate 0.0135 Epoch: 12 Global Step: 525180 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:44:20,981-Speed 2630.05 samples/sec Loss 4.9169 LearningRate 0.0135 Epoch: 12 Global Step: 525190 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:44:24,877-Speed 2629.68 samples/sec Loss 4.8885 LearningRate 0.0135 Epoch: 12 Global Step: 525200 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:44:28,753-Speed 2642.92 samples/sec Loss 4.9342 LearningRate 0.0135 Epoch: 12 Global Step: 525210 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:44:32,643-Speed 2632.29 samples/sec Loss 4.9427 LearningRate 0.0135 Epoch: 12 Global Step: 525220 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:44:36,544-Speed 2626.17 samples/sec Loss 4.9536 LearningRate 0.0135 Epoch: 12 Global Step: 525230 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:44:40,444-Speed 2626.56 samples/sec Loss 4.9181 LearningRate 0.0135 Epoch: 12 Global Step: 525240 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:44:44,337-Speed 2630.44 samples/sec Loss 4.9841 LearningRate 0.0135 Epoch: 12 Global Step: 525250 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:44:48,236-Speed 2627.60 samples/sec Loss 4.9706 LearningRate 0.0135 Epoch: 12 Global Step: 525260 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:44:52,135-Speed 2627.02 samples/sec Loss 4.8448 LearningRate 0.0135 Epoch: 12 Global Step: 525270 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:44:56,027-Speed 2631.65 samples/sec Loss 4.9294 LearningRate 0.0135 Epoch: 12 Global Step: 525280 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:44:59,936-Speed 2619.97 samples/sec Loss 4.9650 LearningRate 0.0135 Epoch: 12 Global Step: 525290 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:45:03,841-Speed 2623.15 samples/sec Loss 4.9417 LearningRate 0.0135 Epoch: 12 Global Step: 525300 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:45:07,747-Speed 2622.28 samples/sec Loss 4.9673 LearningRate 0.0135 Epoch: 12 Global Step: 525310 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:45:11,636-Speed 2633.98 samples/sec Loss 4.8770 LearningRate 0.0135 Epoch: 12 Global Step: 525320 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:45:15,528-Speed 2632.38 samples/sec Loss 4.9210 LearningRate 0.0135 Epoch: 12 Global Step: 525330 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:45:19,425-Speed 2628.28 samples/sec Loss 4.8994 LearningRate 0.0134 Epoch: 12 Global Step: 525340 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:45:23,324-Speed 2627.10 samples/sec Loss 4.8337 LearningRate 0.0134 Epoch: 12 Global Step: 525350 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:45:27,214-Speed 2633.50 samples/sec Loss 5.0044 LearningRate 0.0134 Epoch: 12 Global Step: 525360 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:45:31,107-Speed 2631.13 samples/sec Loss 4.9754 LearningRate 0.0134 Epoch: 12 Global Step: 525370 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:45:35,001-Speed 2630.47 samples/sec Loss 4.8734 LearningRate 0.0134 Epoch: 12 Global Step: 525380 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:45:38,891-Speed 2632.78 samples/sec Loss 4.9166 LearningRate 0.0134 Epoch: 12 Global Step: 525390 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:45:42,795-Speed 2623.27 samples/sec Loss 5.1323 LearningRate 0.0134 Epoch: 12 Global Step: 525400 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:45:46,894-Speed 2499.52 samples/sec Loss 4.7621 LearningRate 0.0134 Epoch: 12 Global Step: 525410 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:45:50,786-Speed 2631.69 samples/sec Loss 4.9857 LearningRate 0.0134 Epoch: 12 Global Step: 525420 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:45:54,709-Speed 2611.11 samples/sec Loss 4.9204 LearningRate 0.0134 Epoch: 12 Global Step: 525430 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:45:58,590-Speed 2639.38 samples/sec Loss 4.9372 LearningRate 0.0134 Epoch: 12 Global Step: 525440 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:46:02,483-Speed 2630.73 samples/sec Loss 4.9302 LearningRate 0.0134 Epoch: 12 Global Step: 525450 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:46:06,383-Speed 2626.76 samples/sec Loss 4.8934 LearningRate 0.0134 Epoch: 12 Global Step: 525460 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:46:10,297-Speed 2616.35 samples/sec Loss 4.9461 LearningRate 0.0134 Epoch: 12 Global Step: 525470 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:46:14,247-Speed 2593.70 samples/sec Loss 4.9641 LearningRate 0.0134 Epoch: 12 Global Step: 525480 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:46:18,150-Speed 2623.85 samples/sec Loss 4.8883 LearningRate 0.0134 Epoch: 12 Global Step: 525490 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:46:22,068-Speed 2614.84 samples/sec Loss 4.8814 LearningRate 0.0134 Epoch: 12 Global Step: 525500 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:46:25,966-Speed 2627.84 samples/sec Loss 4.9933 LearningRate 0.0134 Epoch: 12 Global Step: 525510 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:46:29,857-Speed 2632.35 samples/sec Loss 4.9624 LearningRate 0.0134 Epoch: 12 Global Step: 525520 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:46:33,755-Speed 2627.20 samples/sec Loss 5.0188 LearningRate 0.0134 Epoch: 12 Global Step: 525530 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:46:37,648-Speed 2630.83 samples/sec Loss 4.9582 LearningRate 0.0134 Epoch: 12 Global Step: 525540 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:46:41,526-Speed 2641.34 samples/sec Loss 4.9705 LearningRate 0.0134 Epoch: 12 Global Step: 525550 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:46:45,429-Speed 2624.48 samples/sec Loss 5.0200 LearningRate 0.0134 Epoch: 12 Global Step: 525560 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:46:49,325-Speed 2629.00 samples/sec Loss 4.8264 LearningRate 0.0134 Epoch: 12 Global Step: 525570 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:46:53,232-Speed 2622.37 samples/sec Loss 4.9016 LearningRate 0.0134 Epoch: 12 Global Step: 525580 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:46:57,132-Speed 2625.91 samples/sec Loss 4.7520 LearningRate 0.0134 Epoch: 12 Global Step: 525590 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:47:01,031-Speed 2627.03 samples/sec Loss 5.0318 LearningRate 0.0134 Epoch: 12 Global Step: 525600 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:47:04,935-Speed 2623.28 samples/sec Loss 5.0350 LearningRate 0.0134 Epoch: 12 Global Step: 525610 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:47:08,840-Speed 2623.00 samples/sec Loss 4.9745 LearningRate 0.0134 Epoch: 12 Global Step: 525620 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:47:12,749-Speed 2619.86 samples/sec Loss 5.0631 LearningRate 0.0134 Epoch: 12 Global Step: 525630 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:47:16,645-Speed 2629.35 samples/sec Loss 4.9355 LearningRate 0.0134 Epoch: 12 Global Step: 525640 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:47:20,544-Speed 2627.20 samples/sec Loss 4.9542 LearningRate 0.0134 Epoch: 12 Global Step: 525650 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:47:24,443-Speed 2627.16 samples/sec Loss 4.9001 LearningRate 0.0134 Epoch: 12 Global Step: 525660 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:47:28,339-Speed 2628.50 samples/sec Loss 4.9466 LearningRate 0.0134 Epoch: 12 Global Step: 525670 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:47:32,238-Speed 2627.63 samples/sec Loss 4.8586 LearningRate 0.0134 Epoch: 12 Global Step: 525680 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:47:36,117-Speed 2640.47 samples/sec Loss 4.8957 LearningRate 0.0134 Epoch: 12 Global Step: 525690 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:47:40,018-Speed 2625.69 samples/sec Loss 4.9574 LearningRate 0.0134 Epoch: 12 Global Step: 525700 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:47:43,912-Speed 2629.80 samples/sec Loss 4.9372 LearningRate 0.0134 Epoch: 12 Global Step: 525710 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:47:47,807-Speed 2629.96 samples/sec Loss 4.7825 LearningRate 0.0134 Epoch: 12 Global Step: 525720 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:47:51,728-Speed 2612.86 samples/sec Loss 4.9815 LearningRate 0.0134 Epoch: 12 Global Step: 525730 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:47:55,672-Speed 2596.38 samples/sec Loss 4.9283 LearningRate 0.0134 Epoch: 12 Global Step: 525740 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:47:59,581-Speed 2620.59 samples/sec Loss 4.9493 LearningRate 0.0134 Epoch: 12 Global Step: 525750 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:48:03,482-Speed 2625.54 samples/sec Loss 4.8112 LearningRate 0.0134 Epoch: 12 Global Step: 525760 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:48:07,381-Speed 2627.24 samples/sec Loss 5.0408 LearningRate 0.0134 Epoch: 12 Global Step: 525770 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:48:11,279-Speed 2627.20 samples/sec Loss 4.9633 LearningRate 0.0134 Epoch: 12 Global Step: 525780 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:48:15,175-Speed 2629.58 samples/sec Loss 4.9145 LearningRate 0.0134 Epoch: 12 Global Step: 525790 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:48:19,079-Speed 2622.92 samples/sec Loss 4.9167 LearningRate 0.0134 Epoch: 12 Global Step: 525800 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:48:22,948-Speed 2647.54 samples/sec Loss 4.8120 LearningRate 0.0134 Epoch: 12 Global Step: 525810 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:48:26,854-Speed 2622.28 samples/sec Loss 5.0238 LearningRate 0.0134 Epoch: 12 Global Step: 525820 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:48:30,749-Speed 2629.60 samples/sec Loss 4.9026 LearningRate 0.0134 Epoch: 12 Global Step: 525830 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:48:34,653-Speed 2623.96 samples/sec Loss 4.8427 LearningRate 0.0134 Epoch: 12 Global Step: 525840 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:48:38,548-Speed 2629.33 samples/sec Loss 4.9755 LearningRate 0.0134 Epoch: 12 Global Step: 525850 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:48:42,445-Speed 2628.77 samples/sec Loss 5.0147 LearningRate 0.0134 Epoch: 12 Global Step: 525860 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:48:46,339-Speed 2630.37 samples/sec Loss 4.8928 LearningRate 0.0134 Epoch: 12 Global Step: 525870 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:48:50,234-Speed 2629.76 samples/sec Loss 4.9655 LearningRate 0.0134 Epoch: 12 Global Step: 525880 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:48:54,157-Speed 2611.10 samples/sec Loss 4.9367 LearningRate 0.0134 Epoch: 12 Global Step: 525890 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:48:58,136-Speed 2574.14 samples/sec Loss 4.9338 LearningRate 0.0134 Epoch: 12 Global Step: 525900 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:49:02,226-Speed 2503.74 samples/sec Loss 4.8626 LearningRate 0.0134 Epoch: 12 Global Step: 525910 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:49:06,323-Speed 2499.83 samples/sec Loss 4.9629 LearningRate 0.0134 Epoch: 12 Global Step: 525920 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:49:10,393-Speed 2517.01 samples/sec Loss 4.8987 LearningRate 0.0134 Epoch: 12 Global Step: 525930 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:49:14,289-Speed 2628.66 samples/sec Loss 4.9077 LearningRate 0.0134 Epoch: 12 Global Step: 525940 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:49:18,198-Speed 2621.11 samples/sec Loss 4.8946 LearningRate 0.0134 Epoch: 12 Global Step: 525950 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:49:22,072-Speed 2643.33 samples/sec Loss 4.9930 LearningRate 0.0134 Epoch: 12 Global Step: 525960 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:49:25,969-Speed 2628.79 samples/sec Loss 4.7715 LearningRate 0.0134 Epoch: 12 Global Step: 525970 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:49:29,866-Speed 2628.34 samples/sec Loss 4.8069 LearningRate 0.0134 Epoch: 12 Global Step: 525980 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:49:33,759-Speed 2630.48 samples/sec Loss 4.9693 LearningRate 0.0134 Epoch: 12 Global Step: 525990 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:49:37,650-Speed 2632.44 samples/sec Loss 4.8637 LearningRate 0.0134 Epoch: 12 Global Step: 526000 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:49:41,550-Speed 2626.60 samples/sec Loss 4.9969 LearningRate 0.0134 Epoch: 12 Global Step: 526010 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:49:45,452-Speed 2625.33 samples/sec Loss 4.9261 LearningRate 0.0134 Epoch: 12 Global Step: 526020 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:49:49,393-Speed 2598.74 samples/sec Loss 4.9259 LearningRate 0.0134 Epoch: 12 Global Step: 526030 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:49:53,487-Speed 2501.61 samples/sec Loss 4.9409 LearningRate 0.0134 Epoch: 12 Global Step: 526040 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:49:57,393-Speed 2622.42 samples/sec Loss 4.8222 LearningRate 0.0134 Epoch: 12 Global Step: 526050 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:50:01,306-Speed 2617.89 samples/sec Loss 5.0603 LearningRate 0.0134 Epoch: 12 Global Step: 526060 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:50:05,213-Speed 2621.15 samples/sec Loss 4.8979 LearningRate 0.0134 Epoch: 12 Global Step: 526070 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:50:09,122-Speed 2620.31 samples/sec Loss 4.9350 LearningRate 0.0134 Epoch: 12 Global Step: 526080 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:50:13,020-Speed 2627.76 samples/sec Loss 4.8968 LearningRate 0.0134 Epoch: 12 Global Step: 526090 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:50:16,898-Speed 2641.12 samples/sec Loss 4.9587 LearningRate 0.0134 Epoch: 12 Global Step: 526100 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:50:20,936-Speed 2536.26 samples/sec Loss 4.9940 LearningRate 0.0134 Epoch: 12 Global Step: 526110 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:50:25,032-Speed 2501.10 samples/sec Loss 4.9863 LearningRate 0.0134 Epoch: 12 Global Step: 526120 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:50:29,128-Speed 2500.36 samples/sec Loss 4.9857 LearningRate 0.0134 Epoch: 12 Global Step: 526130 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:50:33,202-Speed 2514.59 samples/sec Loss 4.9506 LearningRate 0.0134 Epoch: 12 Global Step: 526140 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:50:37,140-Speed 2601.05 samples/sec Loss 4.8734 LearningRate 0.0134 Epoch: 12 Global Step: 526150 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:50:41,035-Speed 2630.02 samples/sec Loss 4.8957 LearningRate 0.0134 Epoch: 12 Global Step: 526160 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:50:45,037-Speed 2559.25 samples/sec Loss 4.8572 LearningRate 0.0134 Epoch: 12 Global Step: 526170 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:50:48,929-Speed 2631.31 samples/sec Loss 4.8256 LearningRate 0.0134 Epoch: 12 Global Step: 526180 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:50:52,825-Speed 2629.59 samples/sec Loss 4.9081 LearningRate 0.0134 Epoch: 12 Global Step: 526190 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:50:56,697-Speed 2645.39 samples/sec Loss 4.8777 LearningRate 0.0134 Epoch: 12 Global Step: 526200 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:51:00,599-Speed 2625.02 samples/sec Loss 4.9050 LearningRate 0.0134 Epoch: 12 Global Step: 526210 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:51:04,495-Speed 2628.46 samples/sec Loss 4.9014 LearningRate 0.0134 Epoch: 12 Global Step: 526220 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:51:08,412-Speed 2614.58 samples/sec Loss 4.9155 LearningRate 0.0134 Epoch: 12 Global Step: 526230 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:51:12,309-Speed 2628.88 samples/sec Loss 4.9084 LearningRate 0.0134 Epoch: 12 Global Step: 526240 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:51:16,203-Speed 2630.74 samples/sec Loss 4.8647 LearningRate 0.0134 Epoch: 12 Global Step: 526250 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:51:20,106-Speed 2624.15 samples/sec Loss 4.8933 LearningRate 0.0134 Epoch: 12 Global Step: 526260 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:51:24,009-Speed 2624.83 samples/sec Loss 4.9073 LearningRate 0.0134 Epoch: 12 Global Step: 526270 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:51:27,908-Speed 2626.80 samples/sec Loss 4.9369 LearningRate 0.0134 Epoch: 12 Global Step: 526280 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:51:31,805-Speed 2628.27 samples/sec Loss 4.9113 LearningRate 0.0134 Epoch: 12 Global Step: 526290 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:51:35,714-Speed 2620.26 samples/sec Loss 4.9802 LearningRate 0.0134 Epoch: 12 Global Step: 526300 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:51:39,635-Speed 2611.89 samples/sec Loss 4.8822 LearningRate 0.0134 Epoch: 12 Global Step: 526310 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:51:43,555-Speed 2612.75 samples/sec Loss 4.8030 LearningRate 0.0134 Epoch: 12 Global Step: 526320 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:51:47,470-Speed 2616.52 samples/sec Loss 4.9425 LearningRate 0.0134 Epoch: 12 Global Step: 526330 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:51:51,359-Speed 2634.49 samples/sec Loss 4.9084 LearningRate 0.0134 Epoch: 12 Global Step: 526340 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:51:55,253-Speed 2629.99 samples/sec Loss 4.8896 LearningRate 0.0134 Epoch: 12 Global Step: 526350 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:51:59,194-Speed 2598.79 samples/sec Loss 4.9515 LearningRate 0.0134 Epoch: 12 Global Step: 526360 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:52:03,128-Speed 2603.82 samples/sec Loss 4.8727 LearningRate 0.0134 Epoch: 12 Global Step: 526370 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:52:07,028-Speed 2626.42 samples/sec Loss 5.0312 LearningRate 0.0134 Epoch: 12 Global Step: 526380 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:52:10,922-Speed 2630.01 samples/sec Loss 5.0091 LearningRate 0.0134 Epoch: 12 Global Step: 526390 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:52:14,830-Speed 2621.08 samples/sec Loss 4.8533 LearningRate 0.0134 Epoch: 12 Global Step: 526400 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:52:18,736-Speed 2622.84 samples/sec Loss 4.9082 LearningRate 0.0134 Epoch: 12 Global Step: 526410 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:52:22,636-Speed 2626.32 samples/sec Loss 4.8116 LearningRate 0.0134 Epoch: 12 Global Step: 526420 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:52:26,538-Speed 2625.08 samples/sec Loss 4.7829 LearningRate 0.0134 Epoch: 12 Global Step: 526430 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:52:30,431-Speed 2630.77 samples/sec Loss 4.9347 LearningRate 0.0134 Epoch: 12 Global Step: 526440 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:52:34,334-Speed 2624.03 samples/sec Loss 4.9123 LearningRate 0.0134 Epoch: 12 Global Step: 526450 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:52:38,233-Speed 2627.04 samples/sec Loss 4.9590 LearningRate 0.0134 Epoch: 12 Global Step: 526460 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:52:42,106-Speed 2644.61 samples/sec Loss 4.9223 LearningRate 0.0134 Epoch: 12 Global Step: 526470 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:52:46,016-Speed 2620.35 samples/sec Loss 4.8560 LearningRate 0.0133 Epoch: 12 Global Step: 526480 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:52:49,913-Speed 2628.44 samples/sec Loss 4.8504 LearningRate 0.0133 Epoch: 12 Global Step: 526490 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:52:53,810-Speed 2628.76 samples/sec Loss 4.8729 LearningRate 0.0133 Epoch: 12 Global Step: 526500 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:52:57,722-Speed 2617.93 samples/sec Loss 4.9138 LearningRate 0.0133 Epoch: 12 Global Step: 526510 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:53:01,621-Speed 2627.14 samples/sec Loss 4.9184 LearningRate 0.0133 Epoch: 12 Global Step: 526520 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:53:05,558-Speed 2601.53 samples/sec Loss 4.8712 LearningRate 0.0133 Epoch: 12 Global Step: 526530 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:53:09,479-Speed 2612.33 samples/sec Loss 4.8706 LearningRate 0.0133 Epoch: 12 Global Step: 526540 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:53:13,371-Speed 2631.53 samples/sec Loss 4.9198 LearningRate 0.0133 Epoch: 12 Global Step: 526550 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:53:17,268-Speed 2628.99 samples/sec Loss 4.9441 LearningRate 0.0133 Epoch: 12 Global Step: 526560 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:53:21,169-Speed 2625.91 samples/sec Loss 4.8578 LearningRate 0.0133 Epoch: 12 Global Step: 526570 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:53:25,073-Speed 2623.71 samples/sec Loss 4.8887 LearningRate 0.0133 Epoch: 12 Global Step: 526580 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:53:28,995-Speed 2611.54 samples/sec Loss 4.8898 LearningRate 0.0133 Epoch: 12 Global Step: 526590 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:53:32,892-Speed 2628.47 samples/sec Loss 4.8793 LearningRate 0.0133 Epoch: 12 Global Step: 526600 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:53:36,787-Speed 2629.77 samples/sec Loss 4.8944 LearningRate 0.0133 Epoch: 12 Global Step: 526610 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:53:40,687-Speed 2625.81 samples/sec Loss 4.7930 LearningRate 0.0133 Epoch: 12 Global Step: 526620 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:53:44,593-Speed 2622.35 samples/sec Loss 4.8934 LearningRate 0.0133 Epoch: 12 Global Step: 526630 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:53:48,541-Speed 2594.45 samples/sec Loss 4.8749 LearningRate 0.0133 Epoch: 12 Global Step: 526640 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:53:52,461-Speed 2613.48 samples/sec Loss 4.9832 LearningRate 0.0133 Epoch: 12 Global Step: 526650 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:53:56,355-Speed 2630.36 samples/sec Loss 4.9307 LearningRate 0.0133 Epoch: 12 Global Step: 526660 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:54:00,252-Speed 2628.20 samples/sec Loss 4.8449 LearningRate 0.0133 Epoch: 12 Global Step: 526670 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:54:04,165-Speed 2617.67 samples/sec Loss 4.9213 LearningRate 0.0133 Epoch: 12 Global Step: 526680 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:54:08,070-Speed 2622.63 samples/sec Loss 4.9108 LearningRate 0.0133 Epoch: 12 Global Step: 526690 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:54:11,978-Speed 2620.98 samples/sec Loss 4.9114 LearningRate 0.0133 Epoch: 12 Global Step: 526700 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:54:16,012-Speed 2539.19 samples/sec Loss 4.9282 LearningRate 0.0133 Epoch: 12 Global Step: 526710 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:54:19,910-Speed 2627.49 samples/sec Loss 4.9430 LearningRate 0.0133 Epoch: 12 Global Step: 526720 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:54:23,907-Speed 2562.99 samples/sec Loss 4.9833 LearningRate 0.0133 Epoch: 12 Global Step: 526730 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:54:27,800-Speed 2631.32 samples/sec Loss 4.8426 LearningRate 0.0133 Epoch: 12 Global Step: 526740 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:54:31,692-Speed 2632.22 samples/sec Loss 4.8487 LearningRate 0.0133 Epoch: 12 Global Step: 526750 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:54:35,589-Speed 2628.15 samples/sec Loss 4.8666 LearningRate 0.0133 Epoch: 12 Global Step: 526760 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:54:39,485-Speed 2628.55 samples/sec Loss 4.9050 LearningRate 0.0133 Epoch: 12 Global Step: 526770 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:54:43,384-Speed 2627.17 samples/sec Loss 5.0139 LearningRate 0.0133 Epoch: 12 Global Step: 526780 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:54:47,291-Speed 2621.27 samples/sec Loss 4.8596 LearningRate 0.0133 Epoch: 12 Global Step: 526790 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:54:51,191-Speed 2626.11 samples/sec Loss 4.8546 LearningRate 0.0133 Epoch: 12 Global Step: 526800 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:54:55,090-Speed 2627.52 samples/sec Loss 4.8625 LearningRate 0.0133 Epoch: 12 Global Step: 526810 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:54:59,007-Speed 2615.02 samples/sec Loss 4.9187 LearningRate 0.0133 Epoch: 12 Global Step: 526820 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:55:02,905-Speed 2627.46 samples/sec Loss 4.7995 LearningRate 0.0133 Epoch: 12 Global Step: 526830 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:55:06,818-Speed 2617.74 samples/sec Loss 4.8295 LearningRate 0.0133 Epoch: 12 Global Step: 526840 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:55:10,714-Speed 2628.67 samples/sec Loss 4.9714 LearningRate 0.0133 Epoch: 12 Global Step: 526850 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:55:14,621-Speed 2621.28 samples/sec Loss 4.8838 LearningRate 0.0133 Epoch: 12 Global Step: 526860 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:55:18,496-Speed 2643.04 samples/sec Loss 5.0033 LearningRate 0.0133 Epoch: 12 Global Step: 526870 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:55:22,392-Speed 2629.66 samples/sec Loss 4.8291 LearningRate 0.0133 Epoch: 12 Global Step: 526880 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:55:26,291-Speed 2626.62 samples/sec Loss 4.9408 LearningRate 0.0133 Epoch: 12 Global Step: 526890 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:55:30,194-Speed 2625.00 samples/sec Loss 4.8385 LearningRate 0.0133 Epoch: 12 Global Step: 526900 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:55:34,089-Speed 2629.56 samples/sec Loss 4.8852 LearningRate 0.0133 Epoch: 12 Global Step: 526910 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:55:37,981-Speed 2631.55 samples/sec Loss 4.8205 LearningRate 0.0133 Epoch: 12 Global Step: 526920 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:55:41,874-Speed 2631.20 samples/sec Loss 4.9329 LearningRate 0.0133 Epoch: 12 Global Step: 526930 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:55:45,774-Speed 2626.24 samples/sec Loss 4.8582 LearningRate 0.0133 Epoch: 12 Global Step: 526940 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:55:49,682-Speed 2620.75 samples/sec Loss 4.9278 LearningRate 0.0133 Epoch: 12 Global Step: 526950 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:55:53,578-Speed 2628.94 samples/sec Loss 4.8795 LearningRate 0.0133 Epoch: 12 Global Step: 526960 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:55:57,473-Speed 2629.56 samples/sec Loss 4.8295 LearningRate 0.0133 Epoch: 12 Global Step: 526970 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:56:01,370-Speed 2628.60 samples/sec Loss 5.0605 LearningRate 0.0133 Epoch: 12 Global Step: 526980 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:56:05,268-Speed 2627.55 samples/sec Loss 4.8757 LearningRate 0.0133 Epoch: 12 Global Step: 526990 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:56:09,140-Speed 2645.47 samples/sec Loss 4.8310 LearningRate 0.0133 Epoch: 12 Global Step: 527000 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:56:13,038-Speed 2627.40 samples/sec Loss 4.8326 LearningRate 0.0133 Epoch: 12 Global Step: 527010 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:56:16,936-Speed 2628.05 samples/sec Loss 4.7874 LearningRate 0.0133 Epoch: 12 Global Step: 527020 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:56:20,862-Speed 2608.63 samples/sec Loss 4.9131 LearningRate 0.0133 Epoch: 12 Global Step: 527030 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:56:24,763-Speed 2625.49 samples/sec Loss 4.9543 LearningRate 0.0133 Epoch: 12 Global Step: 527040 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:56:28,658-Speed 2629.42 samples/sec Loss 4.8494 LearningRate 0.0133 Epoch: 12 Global Step: 527050 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:56:32,589-Speed 2605.88 samples/sec Loss 4.8594 LearningRate 0.0133 Epoch: 12 Global Step: 527060 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:56:36,482-Speed 2630.90 samples/sec Loss 4.8395 LearningRate 0.0133 Epoch: 12 Global Step: 527070 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:56:40,385-Speed 2624.66 samples/sec Loss 4.8993 LearningRate 0.0133 Epoch: 12 Global Step: 527080 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:56:44,282-Speed 2627.96 samples/sec Loss 4.8697 LearningRate 0.0133 Epoch: 12 Global Step: 527090 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:56:48,176-Speed 2631.38 samples/sec Loss 4.8588 LearningRate 0.0133 Epoch: 12 Global Step: 527100 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:56:52,115-Speed 2601.22 samples/sec Loss 4.9455 LearningRate 0.0133 Epoch: 12 Global Step: 527110 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:56:56,006-Speed 2632.00 samples/sec Loss 4.8769 LearningRate 0.0133 Epoch: 12 Global Step: 527120 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:56:59,899-Speed 2631.50 samples/sec Loss 4.8399 LearningRate 0.0133 Epoch: 12 Global Step: 527130 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:57:03,797-Speed 2627.52 samples/sec Loss 4.9791 LearningRate 0.0133 Epoch: 12 Global Step: 527140 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:57:07,699-Speed 2624.82 samples/sec Loss 4.8691 LearningRate 0.0133 Epoch: 12 Global Step: 527150 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:57:11,591-Speed 2631.44 samples/sec Loss 4.7797 LearningRate 0.0133 Epoch: 12 Global Step: 527160 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:57:15,486-Speed 2630.23 samples/sec Loss 4.9249 LearningRate 0.0133 Epoch: 12 Global Step: 527170 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:57:19,382-Speed 2628.74 samples/sec Loss 4.9744 LearningRate 0.0133 Epoch: 12 Global Step: 527180 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:57:23,288-Speed 2622.13 samples/sec Loss 4.9443 LearningRate 0.0133 Epoch: 12 Global Step: 527190 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:57:27,167-Speed 2640.38 samples/sec Loss 4.9223 LearningRate 0.0133 Epoch: 12 Global Step: 527200 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:57:31,062-Speed 2630.57 samples/sec Loss 4.7938 LearningRate 0.0133 Epoch: 12 Global Step: 527210 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:57:34,959-Speed 2628.13 samples/sec Loss 4.8240 LearningRate 0.0133 Epoch: 12 Global Step: 527220 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:57:38,854-Speed 2629.01 samples/sec Loss 4.8980 LearningRate 0.0133 Epoch: 12 Global Step: 527230 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:57:42,752-Speed 2627.78 samples/sec Loss 4.9442 LearningRate 0.0133 Epoch: 12 Global Step: 527240 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:57:46,643-Speed 2632.45 samples/sec Loss 4.9455 LearningRate 0.0133 Epoch: 12 Global Step: 527250 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:57:50,535-Speed 2631.87 samples/sec Loss 4.9465 LearningRate 0.0133 Epoch: 12 Global Step: 527260 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:57:54,431-Speed 2628.93 samples/sec Loss 4.8945 LearningRate 0.0133 Epoch: 12 Global Step: 527270 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:57:58,326-Speed 2629.74 samples/sec Loss 4.8437 LearningRate 0.0133 Epoch: 12 Global Step: 527280 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:58:02,226-Speed 2626.79 samples/sec Loss 5.0036 LearningRate 0.0133 Epoch: 12 Global Step: 527290 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:58:06,118-Speed 2631.45 samples/sec Loss 4.9192 LearningRate 0.0133 Epoch: 12 Global Step: 527300 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:58:10,029-Speed 2618.64 samples/sec Loss 4.9733 LearningRate 0.0133 Epoch: 12 Global Step: 527310 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:58:13,923-Speed 2630.45 samples/sec Loss 5.0104 LearningRate 0.0133 Epoch: 12 Global Step: 527320 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:58:17,796-Speed 2644.40 samples/sec Loss 4.9416 LearningRate 0.0133 Epoch: 12 Global Step: 527330 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:58:21,688-Speed 2632.18 samples/sec Loss 4.7916 LearningRate 0.0133 Epoch: 12 Global Step: 527340 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:58:25,592-Speed 2622.99 samples/sec Loss 5.0089 LearningRate 0.0133 Epoch: 12 Global Step: 527350 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:58:29,493-Speed 2625.56 samples/sec Loss 4.8278 LearningRate 0.0133 Epoch: 12 Global Step: 527360 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:58:33,388-Speed 2629.64 samples/sec Loss 4.8632 LearningRate 0.0133 Epoch: 12 Global Step: 527370 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:58:37,284-Speed 2629.11 samples/sec Loss 4.9593 LearningRate 0.0133 Epoch: 12 Global Step: 527380 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:58:41,179-Speed 2629.81 samples/sec Loss 5.0089 LearningRate 0.0133 Epoch: 12 Global Step: 527390 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:58:45,075-Speed 2629.44 samples/sec Loss 4.9131 LearningRate 0.0133 Epoch: 12 Global Step: 527400 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:58:48,966-Speed 2631.71 samples/sec Loss 4.9155 LearningRate 0.0133 Epoch: 12 Global Step: 527410 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:58:52,853-Speed 2635.66 samples/sec Loss 4.7582 LearningRate 0.0133 Epoch: 12 Global Step: 527420 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:58:56,740-Speed 2634.45 samples/sec Loss 4.8866 LearningRate 0.0133 Epoch: 12 Global Step: 527430 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:59:00,642-Speed 2625.54 samples/sec Loss 4.9583 LearningRate 0.0133 Epoch: 12 Global Step: 527440 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:59:04,545-Speed 2624.13 samples/sec Loss 4.9464 LearningRate 0.0133 Epoch: 12 Global Step: 527450 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:59:08,445-Speed 2626.12 samples/sec Loss 4.8551 LearningRate 0.0133 Epoch: 12 Global Step: 527460 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:59:12,349-Speed 2623.66 samples/sec Loss 4.9781 LearningRate 0.0133 Epoch: 12 Global Step: 527470 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:59:16,252-Speed 2624.87 samples/sec Loss 4.9250 LearningRate 0.0133 Epoch: 12 Global Step: 527480 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:59:20,147-Speed 2629.20 samples/sec Loss 4.8891 LearningRate 0.0133 Epoch: 12 Global Step: 527490 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:59:24,047-Speed 2626.70 samples/sec Loss 4.8323 LearningRate 0.0133 Epoch: 12 Global Step: 527500 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:59:27,949-Speed 2624.85 samples/sec Loss 4.9276 LearningRate 0.0133 Epoch: 12 Global Step: 527510 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:59:31,842-Speed 2631.17 samples/sec Loss 4.8892 LearningRate 0.0133 Epoch: 12 Global Step: 527520 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:59:35,740-Speed 2627.81 samples/sec Loss 4.8418 LearningRate 0.0133 Epoch: 12 Global Step: 527530 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 06:59:39,617-Speed 2641.39 samples/sec Loss 4.9030 LearningRate 0.0133 Epoch: 12 Global Step: 527540 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 06:59:43,514-Speed 2628.39 samples/sec Loss 4.9226 LearningRate 0.0133 Epoch: 12 Global Step: 527550 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:59:47,419-Speed 2623.41 samples/sec Loss 4.8973 LearningRate 0.0133 Epoch: 12 Global Step: 527560 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:59:51,348-Speed 2607.03 samples/sec Loss 4.8826 LearningRate 0.0133 Epoch: 12 Global Step: 527570 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:59:55,255-Speed 2621.73 samples/sec Loss 4.8068 LearningRate 0.0133 Epoch: 12 Global Step: 527580 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 06:59:59,206-Speed 2591.98 samples/sec Loss 4.9704 LearningRate 0.0133 Epoch: 12 Global Step: 527590 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:00:03,115-Speed 2620.56 samples/sec Loss 4.8530 LearningRate 0.0133 Epoch: 12 Global Step: 527600 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:00:07,009-Speed 2630.14 samples/sec Loss 4.8928 LearningRate 0.0132 Epoch: 12 Global Step: 527610 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:00:10,909-Speed 2626.82 samples/sec Loss 4.7962 LearningRate 0.0132 Epoch: 12 Global Step: 527620 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:00:14,809-Speed 2625.54 samples/sec Loss 4.8440 LearningRate 0.0132 Epoch: 12 Global Step: 527630 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:00:18,707-Speed 2628.80 samples/sec Loss 4.7862 LearningRate 0.0132 Epoch: 12 Global Step: 527640 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:00:22,603-Speed 2629.01 samples/sec Loss 4.7940 LearningRate 0.0132 Epoch: 12 Global Step: 527650 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:00:26,502-Speed 2626.75 samples/sec Loss 4.9243 LearningRate 0.0132 Epoch: 12 Global Step: 527660 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:00:30,399-Speed 2628.35 samples/sec Loss 4.8683 LearningRate 0.0132 Epoch: 12 Global Step: 527670 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:00:34,292-Speed 2630.76 samples/sec Loss 4.9506 LearningRate 0.0132 Epoch: 12 Global Step: 527680 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:00:38,186-Speed 2630.89 samples/sec Loss 4.8988 LearningRate 0.0132 Epoch: 12 Global Step: 527690 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:00:42,086-Speed 2626.58 samples/sec Loss 4.7615 LearningRate 0.0132 Epoch: 12 Global Step: 527700 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:00:45,983-Speed 2628.19 samples/sec Loss 4.8749 LearningRate 0.0132 Epoch: 12 Global Step: 527710 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:00:49,896-Speed 2618.14 samples/sec Loss 4.8351 LearningRate 0.0132 Epoch: 12 Global Step: 527720 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:00:53,796-Speed 2626.47 samples/sec Loss 4.8862 LearningRate 0.0132 Epoch: 12 Global Step: 527730 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:00:57,701-Speed 2623.17 samples/sec Loss 4.9439 LearningRate 0.0132 Epoch: 12 Global Step: 527740 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:01:01,603-Speed 2624.71 samples/sec Loss 5.0139 LearningRate 0.0132 Epoch: 12 Global Step: 527750 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:01:05,477-Speed 2643.66 samples/sec Loss 4.9824 LearningRate 0.0132 Epoch: 12 Global Step: 527760 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:01:09,379-Speed 2625.24 samples/sec Loss 4.8690 LearningRate 0.0132 Epoch: 12 Global Step: 527770 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:01:13,279-Speed 2626.50 samples/sec Loss 5.0306 LearningRate 0.0132 Epoch: 12 Global Step: 527780 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:01:17,182-Speed 2624.14 samples/sec Loss 4.7596 LearningRate 0.0132 Epoch: 12 Global Step: 527790 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:01:21,081-Speed 2627.50 samples/sec Loss 5.0046 LearningRate 0.0132 Epoch: 12 Global Step: 527800 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:01:24,986-Speed 2623.83 samples/sec Loss 4.8611 LearningRate 0.0132 Epoch: 12 Global Step: 527810 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:01:28,890-Speed 2623.10 samples/sec Loss 4.8813 LearningRate 0.0132 Epoch: 12 Global Step: 527820 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:01:32,786-Speed 2629.03 samples/sec Loss 4.9040 LearningRate 0.0132 Epoch: 12 Global Step: 527830 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:01:36,693-Speed 2621.30 samples/sec Loss 4.9139 LearningRate 0.0132 Epoch: 12 Global Step: 527840 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:01:40,606-Speed 2617.69 samples/sec Loss 4.8794 LearningRate 0.0132 Epoch: 12 Global Step: 527850 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:01:44,533-Speed 2608.43 samples/sec Loss 4.8701 LearningRate 0.0132 Epoch: 12 Global Step: 527860 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:01:48,452-Speed 2613.56 samples/sec Loss 4.9255 LearningRate 0.0132 Epoch: 12 Global Step: 527870 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:01:52,359-Speed 2621.36 samples/sec Loss 4.8420 LearningRate 0.0132 Epoch: 12 Global Step: 527880 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:01:56,260-Speed 2626.74 samples/sec Loss 5.0033 LearningRate 0.0132 Epoch: 12 Global Step: 527890 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:02:00,170-Speed 2619.65 samples/sec Loss 4.9260 LearningRate 0.0132 Epoch: 12 Global Step: 527900 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:02:04,098-Speed 2607.98 samples/sec Loss 4.9415 LearningRate 0.0132 Epoch: 12 Global Step: 527910 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:02:08,029-Speed 2604.92 samples/sec Loss 4.8598 LearningRate 0.0132 Epoch: 12 Global Step: 527920 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:02:11,931-Speed 2625.34 samples/sec Loss 4.9661 LearningRate 0.0132 Epoch: 12 Global Step: 527930 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:02:15,828-Speed 2628.49 samples/sec Loss 4.8694 LearningRate 0.0132 Epoch: 12 Global Step: 527940 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:02:19,744-Speed 2616.25 samples/sec Loss 4.9040 LearningRate 0.0132 Epoch: 12 Global Step: 527950 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:02:23,643-Speed 2626.83 samples/sec Loss 4.9445 LearningRate 0.0132 Epoch: 12 Global Step: 527960 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:02:27,540-Speed 2629.19 samples/sec Loss 4.9089 LearningRate 0.0132 Epoch: 12 Global Step: 527970 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:02:31,430-Speed 2632.36 samples/sec Loss 4.8776 LearningRate 0.0132 Epoch: 12 Global Step: 527980 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:02:35,328-Speed 2627.81 samples/sec Loss 4.9718 LearningRate 0.0132 Epoch: 12 Global Step: 527990 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:02:39,250-Speed 2611.56 samples/sec Loss 4.8715 LearningRate 0.0132 Epoch: 12 Global Step: 528000 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:02:43,146-Speed 2629.24 samples/sec Loss 4.8553 LearningRate 0.0132 Epoch: 12 Global Step: 528010 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:02:47,038-Speed 2631.71 samples/sec Loss 4.7703 LearningRate 0.0132 Epoch: 12 Global Step: 528020 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:02:50,933-Speed 2629.48 samples/sec Loss 4.8313 LearningRate 0.0132 Epoch: 12 Global Step: 528030 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:02:54,825-Speed 2632.24 samples/sec Loss 4.8813 LearningRate 0.0132 Epoch: 12 Global Step: 528040 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:02:58,741-Speed 2615.53 samples/sec Loss 4.8684 LearningRate 0.0132 Epoch: 12 Global Step: 528050 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:03:02,636-Speed 2629.89 samples/sec Loss 4.8494 LearningRate 0.0132 Epoch: 12 Global Step: 528060 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:03:06,531-Speed 2629.70 samples/sec Loss 4.8560 LearningRate 0.0132 Epoch: 12 Global Step: 528070 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:03:10,429-Speed 2627.84 samples/sec Loss 4.8397 LearningRate 0.0132 Epoch: 12 Global Step: 528080 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:03:14,425-Speed 2563.49 samples/sec Loss 4.9184 LearningRate 0.0132 Epoch: 12 Global Step: 528090 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:03:18,322-Speed 2628.67 samples/sec Loss 4.8351 LearningRate 0.0132 Epoch: 12 Global Step: 528100 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:03:22,226-Speed 2623.66 samples/sec Loss 4.8455 LearningRate 0.0132 Epoch: 12 Global Step: 528110 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:03:26,128-Speed 2625.17 samples/sec Loss 4.9227 LearningRate 0.0132 Epoch: 12 Global Step: 528120 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:03:30,031-Speed 2624.26 samples/sec Loss 4.7968 LearningRate 0.0132 Epoch: 12 Global Step: 528130 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:03:33,939-Speed 2620.84 samples/sec Loss 4.9002 LearningRate 0.0132 Epoch: 12 Global Step: 528140 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:03:37,851-Speed 2618.22 samples/sec Loss 4.8998 LearningRate 0.0132 Epoch: 12 Global Step: 528150 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:03:41,755-Speed 2624.22 samples/sec Loss 4.9753 LearningRate 0.0132 Epoch: 12 Global Step: 528160 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:03:45,657-Speed 2625.12 samples/sec Loss 4.8522 LearningRate 0.0132 Epoch: 12 Global Step: 528170 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:03:49,553-Speed 2629.06 samples/sec Loss 4.7916 LearningRate 0.0132 Epoch: 12 Global Step: 528180 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:03:53,464-Speed 2618.48 samples/sec Loss 4.8712 LearningRate 0.0132 Epoch: 12 Global Step: 528190 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:03:57,371-Speed 2625.11 samples/sec Loss 4.9671 LearningRate 0.0132 Epoch: 12 Global Step: 528200 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:04:01,266-Speed 2629.31 samples/sec Loss 4.9791 LearningRate 0.0132 Epoch: 12 Global Step: 528210 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:04:05,184-Speed 2614.27 samples/sec Loss 4.9700 LearningRate 0.0132 Epoch: 12 Global Step: 528220 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:04:09,089-Speed 2623.33 samples/sec Loss 4.9628 LearningRate 0.0132 Epoch: 12 Global Step: 528230 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:04:13,005-Speed 2615.38 samples/sec Loss 4.8528 LearningRate 0.0132 Epoch: 12 Global Step: 528240 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:04:16,934-Speed 2606.76 samples/sec Loss 4.9347 LearningRate 0.0132 Epoch: 12 Global Step: 528250 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:04:20,810-Speed 2642.60 samples/sec Loss 4.8928 LearningRate 0.0132 Epoch: 12 Global Step: 528260 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:04:24,753-Speed 2598.73 samples/sec Loss 4.9505 LearningRate 0.0132 Epoch: 12 Global Step: 528270 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:04:28,681-Speed 2607.44 samples/sec Loss 4.9386 LearningRate 0.0132 Epoch: 12 Global Step: 528280 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:04:32,690-Speed 2554.64 samples/sec Loss 4.8936 LearningRate 0.0132 Epoch: 12 Global Step: 528290 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:04:36,586-Speed 2629.09 samples/sec Loss 4.9073 LearningRate 0.0132 Epoch: 12 Global Step: 528300 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:04:40,489-Speed 2624.44 samples/sec Loss 4.8744 LearningRate 0.0132 Epoch: 12 Global Step: 528310 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:04:44,383-Speed 2630.21 samples/sec Loss 4.9598 LearningRate 0.0132 Epoch: 12 Global Step: 528320 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:04:48,285-Speed 2625.51 samples/sec Loss 4.8399 LearningRate 0.0132 Epoch: 12 Global Step: 528330 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:04:52,204-Speed 2612.78 samples/sec Loss 4.9837 LearningRate 0.0132 Epoch: 12 Global Step: 528340 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:04:56,106-Speed 2625.58 samples/sec Loss 4.8884 LearningRate 0.0132 Epoch: 12 Global Step: 528350 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:05:00,024-Speed 2614.50 samples/sec Loss 4.9037 LearningRate 0.0132 Epoch: 12 Global Step: 528360 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:05:03,917-Speed 2631.67 samples/sec Loss 4.8196 LearningRate 0.0132 Epoch: 12 Global Step: 528370 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:05:07,818-Speed 2625.36 samples/sec Loss 4.9324 LearningRate 0.0132 Epoch: 12 Global Step: 528380 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:05:11,713-Speed 2629.79 samples/sec Loss 4.9079 LearningRate 0.0132 Epoch: 12 Global Step: 528390 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:05:15,604-Speed 2631.76 samples/sec Loss 4.8957 LearningRate 0.0132 Epoch: 12 Global Step: 528400 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:05:19,479-Speed 2643.24 samples/sec Loss 4.8354 LearningRate 0.0132 Epoch: 12 Global Step: 528410 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:05:23,376-Speed 2629.13 samples/sec Loss 4.8081 LearningRate 0.0132 Epoch: 12 Global Step: 528420 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:05:27,278-Speed 2624.86 samples/sec Loss 4.9085 LearningRate 0.0132 Epoch: 12 Global Step: 528430 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:05:31,262-Speed 2570.92 samples/sec Loss 4.7871 LearningRate 0.0132 Epoch: 12 Global Step: 528440 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:05:35,157-Speed 2629.95 samples/sec Loss 4.8143 LearningRate 0.0132 Epoch: 12 Global Step: 528450 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:05:39,054-Speed 2627.52 samples/sec Loss 4.9055 LearningRate 0.0132 Epoch: 12 Global Step: 528460 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:05:42,951-Speed 2628.43 samples/sec Loss 4.9026 LearningRate 0.0132 Epoch: 12 Global Step: 528470 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:05:46,858-Speed 2621.72 samples/sec Loss 4.8283 LearningRate 0.0132 Epoch: 12 Global Step: 528480 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:05:50,759-Speed 2625.58 samples/sec Loss 4.9000 LearningRate 0.0132 Epoch: 12 Global Step: 528490 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:05:54,660-Speed 2626.08 samples/sec Loss 4.8555 LearningRate 0.0132 Epoch: 12 Global Step: 528500 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:05:58,578-Speed 2614.52 samples/sec Loss 4.8532 LearningRate 0.0132 Epoch: 12 Global Step: 528510 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:06:02,454-Speed 2643.07 samples/sec Loss 4.8067 LearningRate 0.0132 Epoch: 12 Global Step: 528520 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:06:06,362-Speed 2620.68 samples/sec Loss 4.9763 LearningRate 0.0132 Epoch: 12 Global Step: 528530 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:06:10,262-Speed 2626.38 samples/sec Loss 4.8146 LearningRate 0.0132 Epoch: 12 Global Step: 528540 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:06:14,155-Speed 2631.50 samples/sec Loss 5.0051 LearningRate 0.0132 Epoch: 12 Global Step: 528550 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:06:18,050-Speed 2629.51 samples/sec Loss 4.8089 LearningRate 0.0132 Epoch: 12 Global Step: 528560 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:06:21,957-Speed 2621.66 samples/sec Loss 4.8616 LearningRate 0.0132 Epoch: 12 Global Step: 528570 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:06:25,852-Speed 2629.62 samples/sec Loss 4.7309 LearningRate 0.0132 Epoch: 12 Global Step: 528580 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:06:29,767-Speed 2615.65 samples/sec Loss 4.9348 LearningRate 0.0132 Epoch: 12 Global Step: 528590 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:06:33,662-Speed 2630.10 samples/sec Loss 4.9156 LearningRate 0.0132 Epoch: 12 Global Step: 528600 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:06:37,556-Speed 2631.42 samples/sec Loss 4.8797 LearningRate 0.0132 Epoch: 12 Global Step: 528610 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:06:41,456-Speed 2625.89 samples/sec Loss 4.8777 LearningRate 0.0132 Epoch: 12 Global Step: 528620 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:06:45,337-Speed 2639.91 samples/sec Loss 4.8553 LearningRate 0.0132 Epoch: 12 Global Step: 528630 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:06:49,237-Speed 2625.92 samples/sec Loss 4.8154 LearningRate 0.0132 Epoch: 12 Global Step: 528640 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:06:53,168-Speed 2605.83 samples/sec Loss 4.8725 LearningRate 0.0132 Epoch: 12 Global Step: 528650 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:06:57,060-Speed 2631.77 samples/sec Loss 4.8117 LearningRate 0.0132 Epoch: 12 Global Step: 528660 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:07:00,965-Speed 2622.70 samples/sec Loss 4.8979 LearningRate 0.0132 Epoch: 12 Global Step: 528670 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:07:04,859-Speed 2630.36 samples/sec Loss 4.9063 LearningRate 0.0132 Epoch: 12 Global Step: 528680 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:07:08,754-Speed 2629.93 samples/sec Loss 4.9197 LearningRate 0.0132 Epoch: 12 Global Step: 528690 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:07:12,662-Speed 2621.37 samples/sec Loss 4.8955 LearningRate 0.0132 Epoch: 12 Global Step: 528700 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:07:16,563-Speed 2625.03 samples/sec Loss 4.8684 LearningRate 0.0132 Epoch: 12 Global Step: 528710 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:07:20,465-Speed 2625.81 samples/sec Loss 4.8126 LearningRate 0.0132 Epoch: 12 Global Step: 528720 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:07:24,359-Speed 2630.27 samples/sec Loss 4.8826 LearningRate 0.0132 Epoch: 12 Global Step: 528730 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:07:28,233-Speed 2644.09 samples/sec Loss 4.8890 LearningRate 0.0132 Epoch: 12 Global Step: 528740 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:07:32,130-Speed 2628.06 samples/sec Loss 4.8862 LearningRate 0.0131 Epoch: 12 Global Step: 528750 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:07:36,021-Speed 2632.67 samples/sec Loss 4.8438 LearningRate 0.0131 Epoch: 12 Global Step: 528760 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:07:39,920-Speed 2626.59 samples/sec Loss 4.8453 LearningRate 0.0131 Epoch: 12 Global Step: 528770 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:07:43,825-Speed 2623.45 samples/sec Loss 4.9203 LearningRate 0.0131 Epoch: 12 Global Step: 528780 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:07:47,721-Speed 2629.16 samples/sec Loss 4.8666 LearningRate 0.0131 Epoch: 12 Global Step: 528790 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:07:51,604-Speed 2637.20 samples/sec Loss 4.9664 LearningRate 0.0131 Epoch: 12 Global Step: 528800 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:07:55,498-Speed 2631.88 samples/sec Loss 4.8862 LearningRate 0.0131 Epoch: 12 Global Step: 528810 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:07:59,392-Speed 2629.93 samples/sec Loss 4.8203 LearningRate 0.0131 Epoch: 12 Global Step: 528820 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:08:03,289-Speed 2628.53 samples/sec Loss 4.8335 LearningRate 0.0131 Epoch: 12 Global Step: 528830 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:08:07,186-Speed 2627.78 samples/sec Loss 4.8451 LearningRate 0.0131 Epoch: 12 Global Step: 528840 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:08:11,084-Speed 2628.24 samples/sec Loss 4.9412 LearningRate 0.0131 Epoch: 12 Global Step: 528850 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:08:14,979-Speed 2629.16 samples/sec Loss 4.9767 LearningRate 0.0131 Epoch: 12 Global Step: 528860 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:08:18,879-Speed 2626.61 samples/sec Loss 4.8912 LearningRate 0.0131 Epoch: 12 Global Step: 528870 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:08:22,773-Speed 2630.00 samples/sec Loss 4.8312 LearningRate 0.0131 Epoch: 12 Global Step: 528880 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:08:26,668-Speed 2630.46 samples/sec Loss 4.8306 LearningRate 0.0131 Epoch: 12 Global Step: 528890 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:08:30,563-Speed 2629.75 samples/sec Loss 4.8635 LearningRate 0.0131 Epoch: 12 Global Step: 528900 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:08:34,558-Speed 2563.39 samples/sec Loss 4.8770 LearningRate 0.0131 Epoch: 12 Global Step: 528910 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:08:38,469-Speed 2618.43 samples/sec Loss 4.8513 LearningRate 0.0131 Epoch: 12 Global Step: 528920 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:08:42,373-Speed 2623.78 samples/sec Loss 4.8749 LearningRate 0.0131 Epoch: 12 Global Step: 528930 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:08:46,284-Speed 2619.25 samples/sec Loss 4.8370 LearningRate 0.0131 Epoch: 12 Global Step: 528940 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:08:50,182-Speed 2627.49 samples/sec Loss 4.9427 LearningRate 0.0131 Epoch: 12 Global Step: 528950 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:08:54,079-Speed 2628.29 samples/sec Loss 4.9141 LearningRate 0.0131 Epoch: 12 Global Step: 528960 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:08:57,978-Speed 2627.19 samples/sec Loss 4.7979 LearningRate 0.0131 Epoch: 12 Global Step: 528970 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:09:01,883-Speed 2623.37 samples/sec Loss 4.7811 LearningRate 0.0131 Epoch: 12 Global Step: 528980 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:09:05,820-Speed 2601.43 samples/sec Loss 4.9218 LearningRate 0.0131 Epoch: 12 Global Step: 528990 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:09:09,717-Speed 2628.08 samples/sec Loss 4.8907 LearningRate 0.0131 Epoch: 12 Global Step: 529000 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:09:13,619-Speed 2625.61 samples/sec Loss 4.7502 LearningRate 0.0131 Epoch: 12 Global Step: 529010 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:09:17,516-Speed 2627.68 samples/sec Loss 4.8535 LearningRate 0.0131 Epoch: 12 Global Step: 529020 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:09:21,392-Speed 2643.16 samples/sec Loss 4.9053 LearningRate 0.0131 Epoch: 12 Global Step: 529030 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:09:25,311-Speed 2612.87 samples/sec Loss 4.8204 LearningRate 0.0131 Epoch: 12 Global Step: 529040 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:09:29,213-Speed 2625.43 samples/sec Loss 4.7835 LearningRate 0.0131 Epoch: 12 Global Step: 529050 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:09:33,114-Speed 2625.75 samples/sec Loss 4.8893 LearningRate 0.0131 Epoch: 12 Global Step: 529060 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:09:37,006-Speed 2631.70 samples/sec Loss 4.9389 LearningRate 0.0131 Epoch: 12 Global Step: 529070 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:09:40,899-Speed 2630.48 samples/sec Loss 4.7853 LearningRate 0.0131 Epoch: 12 Global Step: 529080 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:09:44,817-Speed 2614.06 samples/sec Loss 4.7999 LearningRate 0.0131 Epoch: 12 Global Step: 529090 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:09:48,710-Speed 2631.54 samples/sec Loss 4.7838 LearningRate 0.0131 Epoch: 12 Global Step: 529100 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:09:52,606-Speed 2628.31 samples/sec Loss 5.0161 LearningRate 0.0131 Epoch: 12 Global Step: 529110 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:09:56,497-Speed 2632.58 samples/sec Loss 4.8469 LearningRate 0.0131 Epoch: 12 Global Step: 529120 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:10:00,369-Speed 2645.87 samples/sec Loss 4.8274 LearningRate 0.0131 Epoch: 12 Global Step: 529130 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:10:04,263-Speed 2630.08 samples/sec Loss 4.8765 LearningRate 0.0131 Epoch: 12 Global Step: 529140 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:10:08,164-Speed 2625.73 samples/sec Loss 4.8926 LearningRate 0.0131 Epoch: 12 Global Step: 529150 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:10:12,061-Speed 2629.68 samples/sec Loss 4.8132 LearningRate 0.0131 Epoch: 12 Global Step: 529160 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:10:15,954-Speed 2630.64 samples/sec Loss 4.7996 LearningRate 0.0131 Epoch: 12 Global Step: 529170 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:10:19,848-Speed 2630.23 samples/sec Loss 4.8656 LearningRate 0.0131 Epoch: 12 Global Step: 529180 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:10:23,753-Speed 2623.17 samples/sec Loss 4.9411 LearningRate 0.0131 Epoch: 12 Global Step: 529190 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:10:27,663-Speed 2619.28 samples/sec Loss 4.8187 LearningRate 0.0131 Epoch: 12 Global Step: 529200 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:10:31,566-Speed 2624.11 samples/sec Loss 4.7909 LearningRate 0.0131 Epoch: 12 Global Step: 529210 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:10:35,473-Speed 2621.70 samples/sec Loss 4.8791 LearningRate 0.0131 Epoch: 12 Global Step: 529220 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:10:39,347-Speed 2643.48 samples/sec Loss 4.8630 LearningRate 0.0131 Epoch: 12 Global Step: 529230 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:10:43,265-Speed 2614.64 samples/sec Loss 4.8622 LearningRate 0.0131 Epoch: 12 Global Step: 529240 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:10:47,168-Speed 2624.79 samples/sec Loss 4.8761 LearningRate 0.0131 Epoch: 12 Global Step: 529250 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:10:51,072-Speed 2623.24 samples/sec Loss 4.8735 LearningRate 0.0131 Epoch: 12 Global Step: 529260 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:10:54,979-Speed 2622.18 samples/sec Loss 4.9936 LearningRate 0.0131 Epoch: 12 Global Step: 529270 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:10:58,894-Speed 2615.86 samples/sec Loss 4.7241 LearningRate 0.0131 Epoch: 12 Global Step: 529280 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:11:02,792-Speed 2628.78 samples/sec Loss 4.8421 LearningRate 0.0131 Epoch: 12 Global Step: 529290 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:11:06,689-Speed 2628.07 samples/sec Loss 4.8877 LearningRate 0.0131 Epoch: 12 Global Step: 529300 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:11:10,594-Speed 2622.53 samples/sec Loss 4.8660 LearningRate 0.0131 Epoch: 12 Global Step: 529310 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:11:14,498-Speed 2623.69 samples/sec Loss 4.7125 LearningRate 0.0131 Epoch: 12 Global Step: 529320 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:11:18,414-Speed 2616.35 samples/sec Loss 4.9899 LearningRate 0.0131 Epoch: 12 Global Step: 529330 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:11:22,310-Speed 2629.28 samples/sec Loss 4.8488 LearningRate 0.0131 Epoch: 12 Global Step: 529340 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:11:26,211-Speed 2625.53 samples/sec Loss 4.7979 LearningRate 0.0131 Epoch: 12 Global Step: 529350 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:11:30,108-Speed 2628.46 samples/sec Loss 4.7627 LearningRate 0.0131 Epoch: 12 Global Step: 529360 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:11:34,000-Speed 2631.36 samples/sec Loss 4.9497 LearningRate 0.0131 Epoch: 12 Global Step: 529370 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:11:37,894-Speed 2630.11 samples/sec Loss 4.8982 LearningRate 0.0131 Epoch: 12 Global Step: 529380 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:11:41,792-Speed 2627.42 samples/sec Loss 4.8192 LearningRate 0.0131 Epoch: 12 Global Step: 529390 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:11:45,684-Speed 2631.45 samples/sec Loss 4.9377 LearningRate 0.0131 Epoch: 12 Global Step: 529400 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:11:49,641-Speed 2589.46 samples/sec Loss 4.8957 LearningRate 0.0131 Epoch: 12 Global Step: 529410 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:11:53,532-Speed 2632.41 samples/sec Loss 4.8734 LearningRate 0.0131 Epoch: 12 Global Step: 529420 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:11:57,431-Speed 2627.19 samples/sec Loss 4.8405 LearningRate 0.0131 Epoch: 12 Global Step: 529430 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:12:01,322-Speed 2632.70 samples/sec Loss 4.8926 LearningRate 0.0131 Epoch: 12 Global Step: 529440 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:12:05,235-Speed 2617.47 samples/sec Loss 4.8945 LearningRate 0.0131 Epoch: 12 Global Step: 529450 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:12:09,161-Speed 2608.57 samples/sec Loss 4.8612 LearningRate 0.0131 Epoch: 12 Global Step: 529460 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:12:13,059-Speed 2627.47 samples/sec Loss 4.8863 LearningRate 0.0131 Epoch: 12 Global Step: 529470 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:12:16,969-Speed 2619.40 samples/sec Loss 4.8598 LearningRate 0.0131 Epoch: 12 Global Step: 529480 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:12:20,867-Speed 2628.06 samples/sec Loss 4.7854 LearningRate 0.0131 Epoch: 12 Global Step: 529490 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:12:24,801-Speed 2604.09 samples/sec Loss 4.8455 LearningRate 0.0131 Epoch: 12 Global Step: 529500 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:12:28,694-Speed 2631.00 samples/sec Loss 4.8969 LearningRate 0.0131 Epoch: 12 Global Step: 529510 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:12:32,590-Speed 2628.45 samples/sec Loss 4.8379 LearningRate 0.0131 Epoch: 12 Global Step: 529520 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:12:36,506-Speed 2615.56 samples/sec Loss 4.8850 LearningRate 0.0131 Epoch: 12 Global Step: 529530 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:12:40,413-Speed 2621.24 samples/sec Loss 4.9656 LearningRate 0.0131 Epoch: 12 Global Step: 529540 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:12:44,288-Speed 2643.50 samples/sec Loss 4.8242 LearningRate 0.0131 Epoch: 12 Global Step: 529550 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:12:48,192-Speed 2623.67 samples/sec Loss 4.9141 LearningRate 0.0131 Epoch: 12 Global Step: 529560 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:12:52,090-Speed 2628.46 samples/sec Loss 4.8927 LearningRate 0.0131 Epoch: 12 Global Step: 529570 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:12:55,985-Speed 2629.42 samples/sec Loss 4.8265 LearningRate 0.0131 Epoch: 12 Global Step: 529580 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:12:59,898-Speed 2617.64 samples/sec Loss 4.8629 LearningRate 0.0131 Epoch: 12 Global Step: 529590 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:13:03,939-Speed 2534.73 samples/sec Loss 4.9239 LearningRate 0.0131 Epoch: 12 Global Step: 529600 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:13:07,964-Speed 2544.97 samples/sec Loss 4.9314 LearningRate 0.0131 Epoch: 12 Global Step: 529610 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:13:11,858-Speed 2629.64 samples/sec Loss 4.8113 LearningRate 0.0131 Epoch: 12 Global Step: 529620 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:13:15,754-Speed 2629.19 samples/sec Loss 4.8926 LearningRate 0.0131 Epoch: 12 Global Step: 529630 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:13:19,658-Speed 2623.88 samples/sec Loss 4.9354 LearningRate 0.0131 Epoch: 12 Global Step: 529640 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:13:23,554-Speed 2628.54 samples/sec Loss 4.8885 LearningRate 0.0131 Epoch: 12 Global Step: 529650 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:13:27,450-Speed 2629.61 samples/sec Loss 4.8625 LearningRate 0.0131 Epoch: 12 Global Step: 529660 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:13:31,325-Speed 2643.06 samples/sec Loss 4.8657 LearningRate 0.0131 Epoch: 12 Global Step: 529670 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:13:35,222-Speed 2628.29 samples/sec Loss 4.8727 LearningRate 0.0131 Epoch: 12 Global Step: 529680 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:13:39,116-Speed 2630.05 samples/sec Loss 4.8578 LearningRate 0.0131 Epoch: 12 Global Step: 529690 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:13:43,044-Speed 2608.07 samples/sec Loss 4.7514 LearningRate 0.0131 Epoch: 12 Global Step: 529700 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:13:47,101-Speed 2524.34 samples/sec Loss 4.8983 LearningRate 0.0131 Epoch: 12 Global Step: 529710 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:13:51,142-Speed 2534.73 samples/sec Loss 4.9128 LearningRate 0.0131 Epoch: 12 Global Step: 529720 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:13:55,046-Speed 2624.18 samples/sec Loss 4.8777 LearningRate 0.0131 Epoch: 12 Global Step: 529730 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:13:58,944-Speed 2628.10 samples/sec Loss 4.8828 LearningRate 0.0131 Epoch: 12 Global Step: 529740 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:14:02,843-Speed 2626.66 samples/sec Loss 4.8371 LearningRate 0.0131 Epoch: 12 Global Step: 529750 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:14:06,745-Speed 2624.26 samples/sec Loss 4.8455 LearningRate 0.0131 Epoch: 12 Global Step: 529760 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:14:10,641-Speed 2629.33 samples/sec Loss 4.9096 LearningRate 0.0131 Epoch: 12 Global Step: 529770 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:14:14,540-Speed 2627.48 samples/sec Loss 4.7966 LearningRate 0.0131 Epoch: 12 Global Step: 529780 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:14:18,452-Speed 2618.52 samples/sec Loss 4.8627 LearningRate 0.0131 Epoch: 12 Global Step: 529790 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:14:22,355-Speed 2623.96 samples/sec Loss 4.9222 LearningRate 0.0131 Epoch: 12 Global Step: 529800 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:14:26,239-Speed 2637.82 samples/sec Loss 4.8514 LearningRate 0.0131 Epoch: 12 Global Step: 529810 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:14:30,134-Speed 2629.64 samples/sec Loss 4.8795 LearningRate 0.0131 Epoch: 12 Global Step: 529820 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:14:34,040-Speed 2622.05 samples/sec Loss 4.8288 LearningRate 0.0131 Epoch: 12 Global Step: 529830 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:14:37,941-Speed 2625.57 samples/sec Loss 4.9267 LearningRate 0.0131 Epoch: 12 Global Step: 529840 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:14:41,844-Speed 2624.75 samples/sec Loss 4.8606 LearningRate 0.0131 Epoch: 12 Global Step: 529850 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:14:45,746-Speed 2624.56 samples/sec Loss 4.8451 LearningRate 0.0131 Epoch: 12 Global Step: 529860 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:14:49,651-Speed 2623.04 samples/sec Loss 4.8578 LearningRate 0.0131 Epoch: 12 Global Step: 529870 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:14:53,559-Speed 2621.06 samples/sec Loss 4.7763 LearningRate 0.0131 Epoch: 12 Global Step: 529880 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:14:57,458-Speed 2627.19 samples/sec Loss 4.8886 LearningRate 0.0131 Epoch: 12 Global Step: 529890 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:15:01,357-Speed 2627.17 samples/sec Loss 4.9290 LearningRate 0.0130 Epoch: 12 Global Step: 529900 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:15:05,258-Speed 2624.93 samples/sec Loss 4.8731 LearningRate 0.0130 Epoch: 12 Global Step: 529910 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:15:09,155-Speed 2628.51 samples/sec Loss 4.9457 LearningRate 0.0130 Epoch: 12 Global Step: 529920 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:15:13,075-Speed 2613.22 samples/sec Loss 4.8004 LearningRate 0.0130 Epoch: 12 Global Step: 529930 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:15:16,967-Speed 2631.41 samples/sec Loss 4.9108 LearningRate 0.0130 Epoch: 12 Global Step: 529940 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:15:20,863-Speed 2629.35 samples/sec Loss 4.8811 LearningRate 0.0130 Epoch: 12 Global Step: 529950 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:15:24,757-Speed 2630.54 samples/sec Loss 4.9135 LearningRate 0.0130 Epoch: 12 Global Step: 529960 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:15:28,653-Speed 2629.36 samples/sec Loss 4.8960 LearningRate 0.0130 Epoch: 12 Global Step: 529970 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:15:32,571-Speed 2613.43 samples/sec Loss 4.8263 LearningRate 0.0130 Epoch: 12 Global Step: 529980 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:15:36,470-Speed 2627.41 samples/sec Loss 4.9374 LearningRate 0.0130 Epoch: 12 Global Step: 529990 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:15:40,365-Speed 2629.75 samples/sec Loss 4.7252 LearningRate 0.0130 Epoch: 12 Global Step: 530000 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:16:23,425-[lfw][530000]XNorm: 23.368143
Training: 2022-04-15 07:16:23,426-[lfw][530000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-15 07:16:23,426-[lfw][530000]Accuracy-Highest: 0.99800
Training: 2022-04-15 07:17:13,246-[cfp_fp][530000]XNorm: 21.864775
Training: 2022-04-15 07:17:13,247-[cfp_fp][530000]Accuracy-Flip: 0.99086+-0.00453
Training: 2022-04-15 07:17:13,248-[cfp_fp][530000]Accuracy-Highest: 0.99086
Training: 2022-04-15 07:17:56,131-[agedb_30][530000]XNorm: 23.350406
Training: 2022-04-15 07:17:56,132-[agedb_30][530000]Accuracy-Flip: 0.97850+-0.00689
Training: 2022-04-15 07:17:56,133-[agedb_30][530000]Accuracy-Highest: 0.98083
Training: 2022-04-15 07:18:00,029-Speed 73.32 samples/sec Loss 4.7652 LearningRate 0.0130 Epoch: 12 Global Step: 530010 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:18:03,905-Speed 2642.41 samples/sec Loss 4.7585 LearningRate 0.0130 Epoch: 12 Global Step: 530020 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:18:07,782-Speed 2642.01 samples/sec Loss 4.8720 LearningRate 0.0130 Epoch: 12 Global Step: 530030 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:18:11,664-Speed 2638.78 samples/sec Loss 4.8995 LearningRate 0.0130 Epoch: 12 Global Step: 530040 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:18:15,555-Speed 2631.85 samples/sec Loss 4.9189 LearningRate 0.0130 Epoch: 12 Global Step: 530050 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:18:19,438-Speed 2638.00 samples/sec Loss 4.8507 LearningRate 0.0130 Epoch: 12 Global Step: 530060 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:18:23,322-Speed 2636.97 samples/sec Loss 4.9030 LearningRate 0.0130 Epoch: 12 Global Step: 530070 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:18:27,219-Speed 2628.26 samples/sec Loss 4.8991 LearningRate 0.0130 Epoch: 12 Global Step: 530080 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:18:31,104-Speed 2636.30 samples/sec Loss 4.9084 LearningRate 0.0130 Epoch: 12 Global Step: 530090 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:18:34,996-Speed 2631.53 samples/sec Loss 4.8767 LearningRate 0.0130 Epoch: 12 Global Step: 530100 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:18:38,959-Speed 2585.20 samples/sec Loss 4.8553 LearningRate 0.0130 Epoch: 12 Global Step: 530110 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:18:42,855-Speed 2628.49 samples/sec Loss 4.9029 LearningRate 0.0130 Epoch: 12 Global Step: 530120 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:18:46,748-Speed 2631.24 samples/sec Loss 4.8126 LearningRate 0.0130 Epoch: 12 Global Step: 530130 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:18:50,643-Speed 2629.24 samples/sec Loss 4.8522 LearningRate 0.0130 Epoch: 12 Global Step: 530140 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:18:54,535-Speed 2631.98 samples/sec Loss 4.8828 LearningRate 0.0130 Epoch: 12 Global Step: 530150 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:18:58,430-Speed 2629.38 samples/sec Loss 4.7781 LearningRate 0.0130 Epoch: 12 Global Step: 530160 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:19:02,335-Speed 2622.94 samples/sec Loss 4.9990 LearningRate 0.0130 Epoch: 12 Global Step: 530170 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:19:06,223-Speed 2634.02 samples/sec Loss 4.8572 LearningRate 0.0130 Epoch: 12 Global Step: 530180 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:19:10,121-Speed 2627.95 samples/sec Loss 4.9102 LearningRate 0.0130 Epoch: 12 Global Step: 530190 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:19:14,019-Speed 2628.19 samples/sec Loss 4.7569 LearningRate 0.0130 Epoch: 12 Global Step: 530200 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:19:17,932-Speed 2617.34 samples/sec Loss 4.9126 LearningRate 0.0130 Epoch: 12 Global Step: 530210 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:19:21,837-Speed 2622.40 samples/sec Loss 4.8315 LearningRate 0.0130 Epoch: 12 Global Step: 530220 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:19:25,745-Speed 2621.61 samples/sec Loss 4.8407 LearningRate 0.0130 Epoch: 12 Global Step: 530230 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:19:29,821-Speed 2512.51 samples/sec Loss 4.8887 LearningRate 0.0130 Epoch: 12 Global Step: 530240 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:19:33,730-Speed 2620.54 samples/sec Loss 4.8996 LearningRate 0.0130 Epoch: 12 Global Step: 530250 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:19:37,629-Speed 2626.44 samples/sec Loss 4.9516 LearningRate 0.0130 Epoch: 12 Global Step: 530260 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:19:41,526-Speed 2628.96 samples/sec Loss 4.8366 LearningRate 0.0130 Epoch: 12 Global Step: 530270 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:19:45,450-Speed 2609.51 samples/sec Loss 4.8869 LearningRate 0.0130 Epoch: 12 Global Step: 530280 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:19:49,353-Speed 2624.46 samples/sec Loss 4.8913 LearningRate 0.0130 Epoch: 12 Global Step: 530290 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:19:53,248-Speed 2629.97 samples/sec Loss 4.8334 LearningRate 0.0130 Epoch: 12 Global Step: 530300 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:19:57,157-Speed 2619.91 samples/sec Loss 4.8417 LearningRate 0.0130 Epoch: 12 Global Step: 530310 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:20:01,060-Speed 2625.00 samples/sec Loss 4.8288 LearningRate 0.0130 Epoch: 12 Global Step: 530320 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:20:04,958-Speed 2627.31 samples/sec Loss 4.9042 LearningRate 0.0130 Epoch: 12 Global Step: 530330 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:20:08,857-Speed 2626.69 samples/sec Loss 4.9920 LearningRate 0.0130 Epoch: 12 Global Step: 530340 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:20:12,758-Speed 2625.89 samples/sec Loss 4.9119 LearningRate 0.0130 Epoch: 12 Global Step: 530350 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:20:16,651-Speed 2630.73 samples/sec Loss 4.7902 LearningRate 0.0130 Epoch: 12 Global Step: 530360 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:20:20,551-Speed 2626.17 samples/sec Loss 4.8502 LearningRate 0.0130 Epoch: 12 Global Step: 530370 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:20:24,456-Speed 2622.73 samples/sec Loss 4.8188 LearningRate 0.0130 Epoch: 12 Global Step: 530380 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:20:28,433-Speed 2575.66 samples/sec Loss 4.8161 LearningRate 0.0130 Epoch: 12 Global Step: 530390 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:20:32,308-Speed 2643.83 samples/sec Loss 4.8929 LearningRate 0.0130 Epoch: 12 Global Step: 530400 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:20:36,207-Speed 2627.07 samples/sec Loss 4.8829 LearningRate 0.0130 Epoch: 12 Global Step: 530410 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:20:40,115-Speed 2620.37 samples/sec Loss 4.7969 LearningRate 0.0130 Epoch: 12 Global Step: 530420 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:20:44,012-Speed 2628.07 samples/sec Loss 4.8121 LearningRate 0.0130 Epoch: 12 Global Step: 530430 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:20:47,906-Speed 2630.55 samples/sec Loss 4.8792 LearningRate 0.0130 Epoch: 12 Global Step: 530440 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:20:52,245-Speed 2360.26 samples/sec Loss 4.8200 LearningRate 0.0130 Epoch: 12 Global Step: 530450 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:20:56,174-Speed 2607.40 samples/sec Loss 4.9344 LearningRate 0.0130 Epoch: 12 Global Step: 530460 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:21:00,073-Speed 2627.25 samples/sec Loss 4.7754 LearningRate 0.0130 Epoch: 12 Global Step: 530470 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:21:03,968-Speed 2629.26 samples/sec Loss 4.7534 LearningRate 0.0130 Epoch: 12 Global Step: 530480 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:21:07,918-Speed 2593.00 samples/sec Loss 4.9251 LearningRate 0.0130 Epoch: 12 Global Step: 530490 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:21:11,834-Speed 2615.37 samples/sec Loss 4.8144 LearningRate 0.0130 Epoch: 12 Global Step: 530500 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:21:15,735-Speed 2625.63 samples/sec Loss 4.7724 LearningRate 0.0130 Epoch: 12 Global Step: 530510 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:21:19,630-Speed 2629.91 samples/sec Loss 4.7791 LearningRate 0.0130 Epoch: 12 Global Step: 530520 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:21:23,533-Speed 2624.64 samples/sec Loss 4.8626 LearningRate 0.0130 Epoch: 12 Global Step: 530530 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:21:27,455-Speed 2611.02 samples/sec Loss 4.8869 LearningRate 0.0130 Epoch: 12 Global Step: 530540 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:21:31,349-Speed 2630.32 samples/sec Loss 4.9373 LearningRate 0.0130 Epoch: 12 Global Step: 530550 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:21:35,244-Speed 2629.93 samples/sec Loss 4.9065 LearningRate 0.0130 Epoch: 12 Global Step: 530560 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:21:39,143-Speed 2627.34 samples/sec Loss 4.7314 LearningRate 0.0130 Epoch: 12 Global Step: 530570 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:21:43,044-Speed 2624.82 samples/sec Loss 4.9021 LearningRate 0.0130 Epoch: 12 Global Step: 530580 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:21:46,938-Speed 2630.82 samples/sec Loss 4.8006 LearningRate 0.0130 Epoch: 12 Global Step: 530590 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-04-15 07:21:50,829-Speed 2632.43 samples/sec Loss 4.7804 LearningRate 0.0130 Epoch: 12 Global Step: 530600 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:21:54,724-Speed 2629.73 samples/sec Loss 4.8322 LearningRate 0.0130 Epoch: 12 Global Step: 530610 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:21:58,625-Speed 2625.47 samples/sec Loss 4.8940 LearningRate 0.0130 Epoch: 12 Global Step: 530620 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:22:02,516-Speed 2632.01 samples/sec Loss 4.9125 LearningRate 0.0130 Epoch: 12 Global Step: 530630 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:22:06,412-Speed 2629.32 samples/sec Loss 4.8441 LearningRate 0.0130 Epoch: 12 Global Step: 530640 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:22:10,305-Speed 2631.30 samples/sec Loss 4.7478 LearningRate 0.0130 Epoch: 12 Global Step: 530650 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:22:14,207-Speed 2624.38 samples/sec Loss 4.7454 LearningRate 0.0130 Epoch: 12 Global Step: 530660 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:22:18,102-Speed 2629.92 samples/sec Loss 4.9169 LearningRate 0.0130 Epoch: 12 Global Step: 530670 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:22:22,058-Speed 2589.25 samples/sec Loss 4.7807 LearningRate 0.0130 Epoch: 12 Global Step: 530680 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:22:26,017-Speed 2586.79 samples/sec Loss 4.8529 LearningRate 0.0130 Epoch: 12 Global Step: 530690 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:22:29,909-Speed 2631.43 samples/sec Loss 4.8455 LearningRate 0.0130 Epoch: 12 Global Step: 530700 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:22:33,808-Speed 2627.10 samples/sec Loss 4.8696 LearningRate 0.0130 Epoch: 12 Global Step: 530710 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:22:37,780-Speed 2578.56 samples/sec Loss 4.8510 LearningRate 0.0130 Epoch: 12 Global Step: 530720 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:22:41,909-Speed 2480.34 samples/sec Loss 4.9285 LearningRate 0.0130 Epoch: 12 Global Step: 530730 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:22:45,854-Speed 2597.23 samples/sec Loss 4.8388 LearningRate 0.0130 Epoch: 12 Global Step: 530740 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:22:49,753-Speed 2627.04 samples/sec Loss 4.8465 LearningRate 0.0130 Epoch: 12 Global Step: 530750 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:22:53,660-Speed 2621.79 samples/sec Loss 4.8765 LearningRate 0.0130 Epoch: 12 Global Step: 530760 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:22:57,540-Speed 2639.53 samples/sec Loss 4.8515 LearningRate 0.0130 Epoch: 12 Global Step: 530770 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:23:01,447-Speed 2622.13 samples/sec Loss 4.9187 LearningRate 0.0130 Epoch: 12 Global Step: 530780 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:23:05,345-Speed 2627.46 samples/sec Loss 4.9026 LearningRate 0.0130 Epoch: 12 Global Step: 530790 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:23:09,248-Speed 2623.70 samples/sec Loss 4.8708 LearningRate 0.0130 Epoch: 12 Global Step: 530800 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:23:13,149-Speed 2625.51 samples/sec Loss 4.8186 LearningRate 0.0130 Epoch: 12 Global Step: 530810 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:23:17,042-Speed 2630.91 samples/sec Loss 4.7647 LearningRate 0.0130 Epoch: 12 Global Step: 530820 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:23:20,941-Speed 2627.02 samples/sec Loss 4.8478 LearningRate 0.0130 Epoch: 12 Global Step: 530830 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:23:24,837-Speed 2629.85 samples/sec Loss 4.7932 LearningRate 0.0130 Epoch: 12 Global Step: 530840 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:23:28,735-Speed 2627.14 samples/sec Loss 4.9834 LearningRate 0.0130 Epoch: 12 Global Step: 530850 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:23:32,638-Speed 2625.31 samples/sec Loss 4.8055 LearningRate 0.0130 Epoch: 12 Global Step: 530860 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:23:36,534-Speed 2628.46 samples/sec Loss 4.7879 LearningRate 0.0130 Epoch: 12 Global Step: 530870 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:23:40,432-Speed 2627.58 samples/sec Loss 4.8619 LearningRate 0.0130 Epoch: 12 Global Step: 530880 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:23:44,314-Speed 2637.94 samples/sec Loss 4.8777 LearningRate 0.0130 Epoch: 12 Global Step: 530890 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:23:48,246-Speed 2605.21 samples/sec Loss 4.7531 LearningRate 0.0130 Epoch: 12 Global Step: 530900 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:23:52,181-Speed 2603.06 samples/sec Loss 4.8373 LearningRate 0.0130 Epoch: 12 Global Step: 530910 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:23:56,074-Speed 2630.69 samples/sec Loss 4.8094 LearningRate 0.0130 Epoch: 12 Global Step: 530920 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:23:59,968-Speed 2630.43 samples/sec Loss 4.9165 LearningRate 0.0130 Epoch: 12 Global Step: 530930 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:24:03,859-Speed 2632.67 samples/sec Loss 4.7669 LearningRate 0.0130 Epoch: 12 Global Step: 530940 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:24:07,752-Speed 2630.79 samples/sec Loss 4.8474 LearningRate 0.0130 Epoch: 12 Global Step: 530950 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:24:11,646-Speed 2629.71 samples/sec Loss 4.8479 LearningRate 0.0130 Epoch: 12 Global Step: 530960 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:24:15,552-Speed 2622.47 samples/sec Loss 4.9280 LearningRate 0.0130 Epoch: 12 Global Step: 530970 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:24:19,450-Speed 2627.31 samples/sec Loss 4.8586 LearningRate 0.0130 Epoch: 12 Global Step: 530980 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:24:23,357-Speed 2621.48 samples/sec Loss 4.9148 LearningRate 0.0130 Epoch: 12 Global Step: 530990 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:24:27,250-Speed 2630.98 samples/sec Loss 4.7992 LearningRate 0.0130 Epoch: 12 Global Step: 531000 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:24:31,146-Speed 2629.01 samples/sec Loss 4.8875 LearningRate 0.0130 Epoch: 12 Global Step: 531010 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:24:35,039-Speed 2631.29 samples/sec Loss 4.8146 LearningRate 0.0130 Epoch: 12 Global Step: 531020 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:24:38,921-Speed 2638.17 samples/sec Loss 4.8659 LearningRate 0.0130 Epoch: 12 Global Step: 531030 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:24:42,825-Speed 2623.92 samples/sec Loss 4.8659 LearningRate 0.0130 Epoch: 12 Global Step: 531040 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:24:46,725-Speed 2626.11 samples/sec Loss 4.8436 LearningRate 0.0129 Epoch: 12 Global Step: 531050 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:24:50,635-Speed 2619.63 samples/sec Loss 4.8312 LearningRate 0.0129 Epoch: 12 Global Step: 531060 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:24:54,532-Speed 2628.15 samples/sec Loss 4.9184 LearningRate 0.0129 Epoch: 12 Global Step: 531070 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:24:58,427-Speed 2630.23 samples/sec Loss 4.9508 LearningRate 0.0129 Epoch: 12 Global Step: 531080 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:25:02,366-Speed 2600.05 samples/sec Loss 4.8500 LearningRate 0.0129 Epoch: 12 Global Step: 531090 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:25:06,271-Speed 2622.73 samples/sec Loss 4.7687 LearningRate 0.0129 Epoch: 12 Global Step: 531100 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:25:10,170-Speed 2627.14 samples/sec Loss 4.9012 LearningRate 0.0129 Epoch: 12 Global Step: 531110 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:25:14,063-Speed 2630.63 samples/sec Loss 4.8913 LearningRate 0.0129 Epoch: 12 Global Step: 531120 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:25:17,977-Speed 2617.38 samples/sec Loss 4.9441 LearningRate 0.0129 Epoch: 12 Global Step: 531130 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:25:21,845-Speed 2647.88 samples/sec Loss 4.8211 LearningRate 0.0129 Epoch: 12 Global Step: 531140 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:25:25,744-Speed 2626.55 samples/sec Loss 4.7793 LearningRate 0.0129 Epoch: 12 Global Step: 531150 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:25:29,639-Speed 2630.22 samples/sec Loss 4.8651 LearningRate 0.0129 Epoch: 12 Global Step: 531160 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:25:33,536-Speed 2627.61 samples/sec Loss 4.8495 LearningRate 0.0129 Epoch: 12 Global Step: 531170 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:25:37,449-Speed 2618.09 samples/sec Loss 4.8441 LearningRate 0.0129 Epoch: 12 Global Step: 531180 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:25:41,348-Speed 2626.47 samples/sec Loss 4.7537 LearningRate 0.0129 Epoch: 12 Global Step: 531190 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:25:45,243-Speed 2630.03 samples/sec Loss 4.9098 LearningRate 0.0129 Epoch: 12 Global Step: 531200 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:25:49,135-Speed 2631.03 samples/sec Loss 4.7469 LearningRate 0.0129 Epoch: 12 Global Step: 531210 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:25:53,029-Speed 2630.59 samples/sec Loss 4.9243 LearningRate 0.0129 Epoch: 12 Global Step: 531220 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:25:56,924-Speed 2630.44 samples/sec Loss 4.8985 LearningRate 0.0129 Epoch: 12 Global Step: 531230 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:26:00,796-Speed 2645.03 samples/sec Loss 4.8954 LearningRate 0.0129 Epoch: 12 Global Step: 531240 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:26:04,691-Speed 2629.37 samples/sec Loss 4.9285 LearningRate 0.0129 Epoch: 12 Global Step: 531250 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:26:08,588-Speed 2628.00 samples/sec Loss 4.8786 LearningRate 0.0129 Epoch: 12 Global Step: 531260 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:26:12,492-Speed 2623.91 samples/sec Loss 4.9101 LearningRate 0.0129 Epoch: 12 Global Step: 531270 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:26:16,394-Speed 2624.31 samples/sec Loss 4.8520 LearningRate 0.0129 Epoch: 12 Global Step: 531280 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:26:20,302-Speed 2621.15 samples/sec Loss 4.8083 LearningRate 0.0129 Epoch: 12 Global Step: 531290 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:26:24,307-Speed 2557.23 samples/sec Loss 4.8521 LearningRate 0.0129 Epoch: 12 Global Step: 531300 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:26:28,210-Speed 2624.61 samples/sec Loss 4.8236 LearningRate 0.0129 Epoch: 12 Global Step: 531310 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:26:32,113-Speed 2624.71 samples/sec Loss 4.8579 LearningRate 0.0129 Epoch: 12 Global Step: 531320 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:26:36,009-Speed 2628.43 samples/sec Loss 4.8624 LearningRate 0.0129 Epoch: 12 Global Step: 531330 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:26:39,903-Speed 2630.67 samples/sec Loss 4.8524 LearningRate 0.0129 Epoch: 12 Global Step: 531340 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:26:43,795-Speed 2631.12 samples/sec Loss 4.8641 LearningRate 0.0129 Epoch: 12 Global Step: 531350 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:26:47,690-Speed 2630.14 samples/sec Loss 4.8468 LearningRate 0.0129 Epoch: 12 Global Step: 531360 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:26:51,564-Speed 2643.69 samples/sec Loss 4.8286 LearningRate 0.0129 Epoch: 12 Global Step: 531370 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:26:55,472-Speed 2620.58 samples/sec Loss 4.8304 LearningRate 0.0129 Epoch: 12 Global Step: 531380 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:26:59,411-Speed 2600.25 samples/sec Loss 4.8333 LearningRate 0.0129 Epoch: 12 Global Step: 531390 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:27:03,303-Speed 2631.88 samples/sec Loss 4.7404 LearningRate 0.0129 Epoch: 12 Global Step: 531400 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:27:07,205-Speed 2625.01 samples/sec Loss 4.7776 LearningRate 0.0129 Epoch: 12 Global Step: 531410 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:27:11,118-Speed 2617.72 samples/sec Loss 4.8442 LearningRate 0.0129 Epoch: 12 Global Step: 531420 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:27:15,017-Speed 2626.70 samples/sec Loss 4.7795 LearningRate 0.0129 Epoch: 12 Global Step: 531430 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:27:18,921-Speed 2623.12 samples/sec Loss 4.7742 LearningRate 0.0129 Epoch: 12 Global Step: 531440 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:27:22,856-Speed 2603.15 samples/sec Loss 4.9008 LearningRate 0.0129 Epoch: 12 Global Step: 531450 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:27:26,961-Speed 2494.94 samples/sec Loss 4.9172 LearningRate 0.0129 Epoch: 12 Global Step: 531460 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-04-15 07:27:31,061-Speed 2498.05 samples/sec Loss 4.9154 LearningRate 0.0129 Epoch: 12 Global Step: 531470 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:27:34,996-Speed 2603.00 samples/sec Loss 4.7699 LearningRate 0.0129 Epoch: 12 Global Step: 531480 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-04-15 07:27:38,905-Speed 2620.05 samples/sec Loss 4.8534 LearningRate 0.0129 Epoch: 12 Global Step: 531490 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:27:42,911-Speed 2557.12 samples/sec Loss 4.7087 LearningRate 0.0129 Epoch: 12 Global Step: 531500 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:27:46,848-Speed 2601.29 samples/sec Loss 4.6922 LearningRate 0.0129 Epoch: 12 Global Step: 531510 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:27:50,738-Speed 2633.15 samples/sec Loss 4.8972 LearningRate 0.0129 Epoch: 12 Global Step: 531520 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:27:54,632-Speed 2630.40 samples/sec Loss 4.7096 LearningRate 0.0129 Epoch: 12 Global Step: 531530 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:27:58,526-Speed 2630.45 samples/sec Loss 4.8044 LearningRate 0.0129 Epoch: 12 Global Step: 531540 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:28:02,437-Speed 2619.03 samples/sec Loss 4.8003 LearningRate 0.0129 Epoch: 12 Global Step: 531550 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:28:06,333-Speed 2628.27 samples/sec Loss 4.9083 LearningRate 0.0129 Epoch: 12 Global Step: 531560 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:28:10,211-Speed 2641.48 samples/sec Loss 4.8128 LearningRate 0.0129 Epoch: 12 Global Step: 531570 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:28:14,106-Speed 2629.36 samples/sec Loss 4.8722 LearningRate 0.0129 Epoch: 12 Global Step: 531580 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:28:18,015-Speed 2620.67 samples/sec Loss 4.8651 LearningRate 0.0129 Epoch: 12 Global Step: 531590 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:28:21,916-Speed 2626.18 samples/sec Loss 4.8696 LearningRate 0.0129 Epoch: 12 Global Step: 531600 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:28:25,819-Speed 2624.20 samples/sec Loss 4.8187 LearningRate 0.0129 Epoch: 12 Global Step: 531610 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:28:29,721-Speed 2624.93 samples/sec Loss 4.8601 LearningRate 0.0129 Epoch: 12 Global Step: 531620 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:28:33,619-Speed 2627.60 samples/sec Loss 4.7843 LearningRate 0.0129 Epoch: 12 Global Step: 531630 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:28:37,507-Speed 2634.27 samples/sec Loss 4.8660 LearningRate 0.0129 Epoch: 12 Global Step: 531640 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:28:41,412-Speed 2622.24 samples/sec Loss 4.8481 LearningRate 0.0129 Epoch: 12 Global Step: 531650 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:28:45,319-Speed 2622.23 samples/sec Loss 4.8286 LearningRate 0.0129 Epoch: 12 Global Step: 531660 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:28:49,195-Speed 2642.14 samples/sec Loss 4.8500 LearningRate 0.0129 Epoch: 12 Global Step: 531670 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:28:53,093-Speed 2627.78 samples/sec Loss 4.8026 LearningRate 0.0129 Epoch: 12 Global Step: 531680 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:28:56,987-Speed 2630.55 samples/sec Loss 4.7306 LearningRate 0.0129 Epoch: 12 Global Step: 531690 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:29:00,891-Speed 2623.35 samples/sec Loss 4.8431 LearningRate 0.0129 Epoch: 12 Global Step: 531700 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:29:04,774-Speed 2638.18 samples/sec Loss 4.8751 LearningRate 0.0129 Epoch: 12 Global Step: 531710 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:29:08,669-Speed 2628.98 samples/sec Loss 4.8546 LearningRate 0.0129 Epoch: 12 Global Step: 531720 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:29:12,569-Speed 2626.02 samples/sec Loss 4.8089 LearningRate 0.0129 Epoch: 12 Global Step: 531730 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:29:16,475-Speed 2622.96 samples/sec Loss 4.9025 LearningRate 0.0129 Epoch: 12 Global Step: 531740 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:29:20,375-Speed 2625.71 samples/sec Loss 4.8346 LearningRate 0.0129 Epoch: 12 Global Step: 531750 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:29:24,290-Speed 2616.83 samples/sec Loss 4.7644 LearningRate 0.0129 Epoch: 12 Global Step: 531760 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:29:28,187-Speed 2627.61 samples/sec Loss 4.8131 LearningRate 0.0129 Epoch: 12 Global Step: 531770 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:29:32,085-Speed 2628.06 samples/sec Loss 4.7827 LearningRate 0.0129 Epoch: 12 Global Step: 531780 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:29:35,985-Speed 2626.65 samples/sec Loss 4.8806 LearningRate 0.0129 Epoch: 12 Global Step: 531790 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:29:39,883-Speed 2627.79 samples/sec Loss 4.9098 LearningRate 0.0129 Epoch: 12 Global Step: 531800 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:29:43,776-Speed 2630.24 samples/sec Loss 4.8268 LearningRate 0.0129 Epoch: 12 Global Step: 531810 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:29:47,670-Speed 2630.61 samples/sec Loss 4.7916 LearningRate 0.0129 Epoch: 12 Global Step: 531820 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:29:51,565-Speed 2629.83 samples/sec Loss 4.8965 LearningRate 0.0129 Epoch: 12 Global Step: 531830 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:29:55,441-Speed 2642.67 samples/sec Loss 4.7718 LearningRate 0.0129 Epoch: 12 Global Step: 531840 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:29:59,335-Speed 2629.99 samples/sec Loss 4.8087 LearningRate 0.0129 Epoch: 12 Global Step: 531850 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:30:03,263-Speed 2607.81 samples/sec Loss 4.7189 LearningRate 0.0129 Epoch: 12 Global Step: 531860 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:30:07,157-Speed 2629.74 samples/sec Loss 4.7964 LearningRate 0.0129 Epoch: 12 Global Step: 531870 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:30:11,061-Speed 2623.86 samples/sec Loss 4.8272 LearningRate 0.0129 Epoch: 12 Global Step: 531880 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:30:14,967-Speed 2622.39 samples/sec Loss 4.8275 LearningRate 0.0129 Epoch: 12 Global Step: 531890 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:30:18,864-Speed 2628.14 samples/sec Loss 4.7332 LearningRate 0.0129 Epoch: 12 Global Step: 531900 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:30:22,756-Speed 2631.30 samples/sec Loss 4.8159 LearningRate 0.0129 Epoch: 12 Global Step: 531910 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:30:26,657-Speed 2625.65 samples/sec Loss 4.8748 LearningRate 0.0129 Epoch: 12 Global Step: 531920 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:30:30,559-Speed 2624.95 samples/sec Loss 4.8155 LearningRate 0.0129 Epoch: 12 Global Step: 531930 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:30:34,457-Speed 2627.48 samples/sec Loss 4.7888 LearningRate 0.0129 Epoch: 12 Global Step: 531940 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:30:38,357-Speed 2626.85 samples/sec Loss 4.8549 LearningRate 0.0129 Epoch: 12 Global Step: 531950 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:30:42,254-Speed 2628.11 samples/sec Loss 4.5832 LearningRate 0.0129 Epoch: 12 Global Step: 531960 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:30:46,131-Speed 2641.37 samples/sec Loss 4.8474 LearningRate 0.0129 Epoch: 12 Global Step: 531970 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:30:50,022-Speed 2632.49 samples/sec Loss 4.9758 LearningRate 0.0129 Epoch: 12 Global Step: 531980 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:30:53,917-Speed 2629.91 samples/sec Loss 4.7991 LearningRate 0.0129 Epoch: 12 Global Step: 531990 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:30:57,812-Speed 2629.44 samples/sec Loss 4.8136 LearningRate 0.0129 Epoch: 12 Global Step: 532000 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:31:01,703-Speed 2632.00 samples/sec Loss 4.7908 LearningRate 0.0129 Epoch: 12 Global Step: 532010 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:31:05,601-Speed 2627.90 samples/sec Loss 4.9121 LearningRate 0.0129 Epoch: 12 Global Step: 532020 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:31:09,494-Speed 2630.49 samples/sec Loss 4.6855 LearningRate 0.0129 Epoch: 12 Global Step: 532030 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:31:13,395-Speed 2625.42 samples/sec Loss 4.7368 LearningRate 0.0129 Epoch: 12 Global Step: 532040 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:31:17,295-Speed 2626.40 samples/sec Loss 4.8584 LearningRate 0.0129 Epoch: 12 Global Step: 532050 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:31:21,187-Speed 2631.89 samples/sec Loss 4.8436 LearningRate 0.0129 Epoch: 12 Global Step: 532060 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:31:25,062-Speed 2643.31 samples/sec Loss 4.8385 LearningRate 0.0129 Epoch: 12 Global Step: 532070 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:31:28,954-Speed 2631.92 samples/sec Loss 4.8552 LearningRate 0.0129 Epoch: 12 Global Step: 532080 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:31:32,848-Speed 2630.51 samples/sec Loss 4.7915 LearningRate 0.0129 Epoch: 12 Global Step: 532090 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:31:36,747-Speed 2626.82 samples/sec Loss 4.8018 LearningRate 0.0129 Epoch: 12 Global Step: 532100 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:31:40,645-Speed 2627.06 samples/sec Loss 4.7228 LearningRate 0.0129 Epoch: 12 Global Step: 532110 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:31:44,544-Speed 2627.09 samples/sec Loss 4.8127 LearningRate 0.0129 Epoch: 12 Global Step: 532120 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:31:48,457-Speed 2617.38 samples/sec Loss 4.8395 LearningRate 0.0129 Epoch: 12 Global Step: 532130 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:31:52,354-Speed 2628.45 samples/sec Loss 4.8813 LearningRate 0.0129 Epoch: 12 Global Step: 532140 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:31:56,251-Speed 2628.04 samples/sec Loss 4.9200 LearningRate 0.0129 Epoch: 12 Global Step: 532150 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:32:00,146-Speed 2629.78 samples/sec Loss 4.8608 LearningRate 0.0129 Epoch: 12 Global Step: 532160 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:32:04,046-Speed 2625.98 samples/sec Loss 4.8044 LearningRate 0.0129 Epoch: 12 Global Step: 532170 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:32:07,948-Speed 2625.97 samples/sec Loss 4.9153 LearningRate 0.0129 Epoch: 12 Global Step: 532180 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:32:11,864-Speed 2615.16 samples/sec Loss 4.7113 LearningRate 0.0129 Epoch: 12 Global Step: 532190 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:32:15,743-Speed 2640.22 samples/sec Loss 4.8435 LearningRate 0.0129 Epoch: 12 Global Step: 532200 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:32:19,650-Speed 2621.44 samples/sec Loss 4.7992 LearningRate 0.0128 Epoch: 12 Global Step: 532210 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:32:23,542-Speed 2631.72 samples/sec Loss 4.8328 LearningRate 0.0128 Epoch: 12 Global Step: 532220 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:32:27,413-Speed 2646.26 samples/sec Loss 4.7926 LearningRate 0.0128 Epoch: 12 Global Step: 532230 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:32:31,308-Speed 2629.34 samples/sec Loss 4.9539 LearningRate 0.0128 Epoch: 12 Global Step: 532240 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:32:35,203-Speed 2629.38 samples/sec Loss 4.8471 LearningRate 0.0128 Epoch: 12 Global Step: 532250 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:32:39,100-Speed 2628.39 samples/sec Loss 4.8312 LearningRate 0.0128 Epoch: 12 Global Step: 532260 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:32:42,994-Speed 2630.52 samples/sec Loss 4.8378 LearningRate 0.0128 Epoch: 12 Global Step: 532270 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:32:46,906-Speed 2618.55 samples/sec Loss 4.8293 LearningRate 0.0128 Epoch: 12 Global Step: 532280 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:32:50,803-Speed 2628.02 samples/sec Loss 4.8176 LearningRate 0.0128 Epoch: 12 Global Step: 532290 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:32:54,700-Speed 2628.13 samples/sec Loss 4.8020 LearningRate 0.0128 Epoch: 12 Global Step: 532300 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:32:58,595-Speed 2629.58 samples/sec Loss 4.9197 LearningRate 0.0128 Epoch: 12 Global Step: 532310 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:33:02,583-Speed 2568.44 samples/sec Loss 4.8747 LearningRate 0.0128 Epoch: 12 Global Step: 532320 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:33:06,485-Speed 2625.16 samples/sec Loss 4.6863 LearningRate 0.0128 Epoch: 12 Global Step: 532330 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:33:10,558-Speed 2514.32 samples/sec Loss 4.8647 LearningRate 0.0128 Epoch: 12 Global Step: 532340 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:33:14,545-Speed 2568.87 samples/sec Loss 4.7885 LearningRate 0.0128 Epoch: 12 Global Step: 532350 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:33:18,456-Speed 2618.82 samples/sec Loss 4.8398 LearningRate 0.0128 Epoch: 12 Global Step: 532360 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:33:22,349-Speed 2631.58 samples/sec Loss 4.7748 LearningRate 0.0128 Epoch: 12 Global Step: 532370 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:33:26,241-Speed 2631.82 samples/sec Loss 4.8857 LearningRate 0.0128 Epoch: 12 Global Step: 532380 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:33:30,136-Speed 2629.34 samples/sec Loss 4.8600 LearningRate 0.0128 Epoch: 12 Global Step: 532390 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:33:34,029-Speed 2630.79 samples/sec Loss 4.8884 LearningRate 0.0128 Epoch: 12 Global Step: 532400 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:33:37,927-Speed 2627.09 samples/sec Loss 4.8619 LearningRate 0.0128 Epoch: 12 Global Step: 532410 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:33:41,999-Speed 2515.67 samples/sec Loss 4.7207 LearningRate 0.0128 Epoch: 12 Global Step: 532420 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:33:46,095-Speed 2500.57 samples/sec Loss 4.7862 LearningRate 0.0128 Epoch: 12 Global Step: 532430 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:33:50,190-Speed 2501.97 samples/sec Loss 4.8944 LearningRate 0.0128 Epoch: 12 Global Step: 532440 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:33:54,248-Speed 2523.86 samples/sec Loss 4.8498 LearningRate 0.0128 Epoch: 12 Global Step: 532450 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:33:58,300-Speed 2527.84 samples/sec Loss 4.8800 LearningRate 0.0128 Epoch: 12 Global Step: 532460 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:34:02,203-Speed 2624.04 samples/sec Loss 4.9001 LearningRate 0.0128 Epoch: 12 Global Step: 532470 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:34:06,098-Speed 2630.13 samples/sec Loss 4.8245 LearningRate 0.0128 Epoch: 12 Global Step: 532480 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:34:10,011-Speed 2617.29 samples/sec Loss 4.6908 LearningRate 0.0128 Epoch: 12 Global Step: 532490 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:34:13,918-Speed 2621.67 samples/sec Loss 4.8012 LearningRate 0.0128 Epoch: 12 Global Step: 532500 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:34:17,814-Speed 2628.76 samples/sec Loss 4.7679 LearningRate 0.0128 Epoch: 12 Global Step: 532510 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:34:21,712-Speed 2627.39 samples/sec Loss 4.8061 LearningRate 0.0128 Epoch: 12 Global Step: 532520 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:34:25,606-Speed 2630.79 samples/sec Loss 4.8569 LearningRate 0.0128 Epoch: 12 Global Step: 532530 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:34:29,500-Speed 2630.17 samples/sec Loss 4.7567 LearningRate 0.0128 Epoch: 12 Global Step: 532540 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:34:33,405-Speed 2622.55 samples/sec Loss 4.8514 LearningRate 0.0128 Epoch: 12 Global Step: 532550 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:34:37,308-Speed 2623.91 samples/sec Loss 4.7184 LearningRate 0.0128 Epoch: 12 Global Step: 532560 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:34:41,201-Speed 2631.82 samples/sec Loss 4.6982 LearningRate 0.0128 Epoch: 12 Global Step: 532570 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:34:45,102-Speed 2625.66 samples/sec Loss 4.7501 LearningRate 0.0128 Epoch: 12 Global Step: 532580 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:34:48,994-Speed 2631.63 samples/sec Loss 4.7903 LearningRate 0.0128 Epoch: 12 Global Step: 532590 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:34:52,864-Speed 2646.19 samples/sec Loss 4.8948 LearningRate 0.0128 Epoch: 12 Global Step: 532600 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:34:56,740-Speed 2643.34 samples/sec Loss 4.7843 LearningRate 0.0128 Epoch: 12 Global Step: 532610 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:35:00,641-Speed 2625.71 samples/sec Loss 4.8146 LearningRate 0.0128 Epoch: 12 Global Step: 532620 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:35:04,537-Speed 2628.37 samples/sec Loss 4.7964 LearningRate 0.0128 Epoch: 12 Global Step: 532630 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:35:08,431-Speed 2630.14 samples/sec Loss 4.8830 LearningRate 0.0128 Epoch: 12 Global Step: 532640 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:35:12,335-Speed 2624.05 samples/sec Loss 4.8356 LearningRate 0.0128 Epoch: 12 Global Step: 532650 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:35:16,232-Speed 2628.30 samples/sec Loss 4.8672 LearningRate 0.0128 Epoch: 12 Global Step: 532660 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:35:20,125-Speed 2631.11 samples/sec Loss 4.8094 LearningRate 0.0128 Epoch: 12 Global Step: 532670 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:35:24,022-Speed 2628.67 samples/sec Loss 4.7460 LearningRate 0.0128 Epoch: 12 Global Step: 532680 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:35:27,916-Speed 2630.91 samples/sec Loss 4.8002 LearningRate 0.0128 Epoch: 12 Global Step: 532690 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:35:31,811-Speed 2629.14 samples/sec Loss 4.6564 LearningRate 0.0128 Epoch: 12 Global Step: 532700 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:35:35,703-Speed 2631.47 samples/sec Loss 4.8590 LearningRate 0.0128 Epoch: 12 Global Step: 532710 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:35:39,597-Speed 2630.58 samples/sec Loss 4.7138 LearningRate 0.0128 Epoch: 12 Global Step: 532720 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:35:43,489-Speed 2631.25 samples/sec Loss 4.8759 LearningRate 0.0128 Epoch: 12 Global Step: 532730 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:35:47,388-Speed 2627.46 samples/sec Loss 4.8992 LearningRate 0.0128 Epoch: 12 Global Step: 532740 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:35:51,282-Speed 2629.81 samples/sec Loss 4.7975 LearningRate 0.0128 Epoch: 12 Global Step: 532750 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:35:55,178-Speed 2629.39 samples/sec Loss 4.9103 LearningRate 0.0128 Epoch: 12 Global Step: 532760 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:35:59,070-Speed 2631.66 samples/sec Loss 4.8518 LearningRate 0.0128 Epoch: 12 Global Step: 532770 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:36:02,963-Speed 2630.85 samples/sec Loss 4.7182 LearningRate 0.0128 Epoch: 12 Global Step: 532780 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:36:06,869-Speed 2622.19 samples/sec Loss 4.8904 LearningRate 0.0128 Epoch: 12 Global Step: 532790 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:36:10,765-Speed 2628.69 samples/sec Loss 4.7897 LearningRate 0.0128 Epoch: 12 Global Step: 532800 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:36:14,660-Speed 2629.75 samples/sec Loss 4.8502 LearningRate 0.0128 Epoch: 12 Global Step: 532810 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:36:18,556-Speed 2629.37 samples/sec Loss 4.8161 LearningRate 0.0128 Epoch: 12 Global Step: 532820 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:36:22,457-Speed 2625.32 samples/sec Loss 4.7541 LearningRate 0.0128 Epoch: 12 Global Step: 532830 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:36:26,328-Speed 2646.05 samples/sec Loss 4.7644 LearningRate 0.0128 Epoch: 12 Global Step: 532840 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:36:30,241-Speed 2617.68 samples/sec Loss 4.6742 LearningRate 0.0128 Epoch: 12 Global Step: 532850 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:36:34,136-Speed 2629.17 samples/sec Loss 4.7891 LearningRate 0.0128 Epoch: 12 Global Step: 532860 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:36:38,093-Speed 2589.01 samples/sec Loss 4.7949 LearningRate 0.0128 Epoch: 12 Global Step: 532870 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:36:41,987-Speed 2630.14 samples/sec Loss 4.8510 LearningRate 0.0128 Epoch: 12 Global Step: 532880 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:36:45,877-Speed 2632.94 samples/sec Loss 4.8245 LearningRate 0.0128 Epoch: 12 Global Step: 532890 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:36:49,776-Speed 2627.08 samples/sec Loss 4.8039 LearningRate 0.0128 Epoch: 12 Global Step: 532900 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:36:53,676-Speed 2626.11 samples/sec Loss 4.7719 LearningRate 0.0128 Epoch: 12 Global Step: 532910 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:36:57,573-Speed 2628.31 samples/sec Loss 4.7792 LearningRate 0.0128 Epoch: 12 Global Step: 532920 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:37:01,453-Speed 2639.60 samples/sec Loss 4.8408 LearningRate 0.0128 Epoch: 12 Global Step: 532930 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:37:05,343-Speed 2633.17 samples/sec Loss 4.8335 LearningRate 0.0128 Epoch: 12 Global Step: 532940 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:37:09,243-Speed 2625.56 samples/sec Loss 4.7833 LearningRate 0.0128 Epoch: 12 Global Step: 532950 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:37:13,144-Speed 2625.63 samples/sec Loss 4.9153 LearningRate 0.0128 Epoch: 12 Global Step: 532960 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:37:17,048-Speed 2623.99 samples/sec Loss 4.7171 LearningRate 0.0128 Epoch: 12 Global Step: 532970 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:37:20,951-Speed 2624.53 samples/sec Loss 4.7073 LearningRate 0.0128 Epoch: 12 Global Step: 532980 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:37:24,858-Speed 2621.50 samples/sec Loss 4.7699 LearningRate 0.0128 Epoch: 12 Global Step: 532990 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:37:28,757-Speed 2626.72 samples/sec Loss 4.8658 LearningRate 0.0128 Epoch: 12 Global Step: 533000 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:37:32,661-Speed 2623.72 samples/sec Loss 4.8270 LearningRate 0.0128 Epoch: 12 Global Step: 533010 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:37:36,560-Speed 2626.87 samples/sec Loss 4.9293 LearningRate 0.0128 Epoch: 12 Global Step: 533020 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:37:40,466-Speed 2622.11 samples/sec Loss 4.8120 LearningRate 0.0128 Epoch: 12 Global Step: 533030 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:37:44,367-Speed 2625.41 samples/sec Loss 4.8410 LearningRate 0.0128 Epoch: 12 Global Step: 533040 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:37:48,264-Speed 2627.99 samples/sec Loss 4.7764 LearningRate 0.0128 Epoch: 12 Global Step: 533050 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:37:52,163-Speed 2627.42 samples/sec Loss 4.7852 LearningRate 0.0128 Epoch: 12 Global Step: 533060 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:37:56,059-Speed 2628.72 samples/sec Loss 4.7564 LearningRate 0.0128 Epoch: 12 Global Step: 533070 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:37:59,962-Speed 2624.11 samples/sec Loss 4.7774 LearningRate 0.0128 Epoch: 12 Global Step: 533080 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:38:03,874-Speed 2618.67 samples/sec Loss 4.8348 LearningRate 0.0128 Epoch: 12 Global Step: 533090 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:38:07,771-Speed 2628.21 samples/sec Loss 4.8770 LearningRate 0.0128 Epoch: 12 Global Step: 533100 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:38:11,667-Speed 2628.77 samples/sec Loss 4.8305 LearningRate 0.0128 Epoch: 12 Global Step: 533110 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:38:15,563-Speed 2628.68 samples/sec Loss 4.8561 LearningRate 0.0128 Epoch: 12 Global Step: 533120 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:38:19,463-Speed 2626.38 samples/sec Loss 4.8196 LearningRate 0.0128 Epoch: 12 Global Step: 533130 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:38:23,365-Speed 2624.92 samples/sec Loss 4.7475 LearningRate 0.0128 Epoch: 12 Global Step: 533140 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:38:27,251-Speed 2635.92 samples/sec Loss 4.7822 LearningRate 0.0128 Epoch: 12 Global Step: 533150 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:38:31,170-Speed 2613.35 samples/sec Loss 4.8102 LearningRate 0.0128 Epoch: 12 Global Step: 533160 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:38:35,074-Speed 2623.62 samples/sec Loss 4.8156 LearningRate 0.0128 Epoch: 12 Global Step: 533170 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:38:38,974-Speed 2625.89 samples/sec Loss 4.8395 LearningRate 0.0128 Epoch: 12 Global Step: 533180 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:38:42,868-Speed 2630.65 samples/sec Loss 4.6814 LearningRate 0.0128 Epoch: 12 Global Step: 533190 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:38:46,740-Speed 2644.98 samples/sec Loss 4.6900 LearningRate 0.0128 Epoch: 12 Global Step: 533200 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:38:50,730-Speed 2567.80 samples/sec Loss 4.8153 LearningRate 0.0128 Epoch: 12 Global Step: 533210 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:38:54,642-Speed 2617.68 samples/sec Loss 4.7610 LearningRate 0.0128 Epoch: 12 Global Step: 533220 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:38:58,550-Speed 2621.05 samples/sec Loss 4.7749 LearningRate 0.0128 Epoch: 12 Global Step: 533230 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:39:02,444-Speed 2630.01 samples/sec Loss 4.8468 LearningRate 0.0128 Epoch: 12 Global Step: 533240 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:39:06,337-Speed 2631.73 samples/sec Loss 4.8215 LearningRate 0.0128 Epoch: 12 Global Step: 533250 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:39:10,230-Speed 2630.89 samples/sec Loss 4.8719 LearningRate 0.0128 Epoch: 12 Global Step: 533260 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:39:14,130-Speed 2626.27 samples/sec Loss 4.8132 LearningRate 0.0128 Epoch: 12 Global Step: 533270 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:39:18,027-Speed 2628.62 samples/sec Loss 4.7653 LearningRate 0.0128 Epoch: 12 Global Step: 533280 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:39:21,924-Speed 2627.65 samples/sec Loss 4.7679 LearningRate 0.0128 Epoch: 12 Global Step: 533290 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:39:25,821-Speed 2628.69 samples/sec Loss 4.7627 LearningRate 0.0128 Epoch: 12 Global Step: 533300 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:39:29,723-Speed 2624.50 samples/sec Loss 4.8024 LearningRate 0.0128 Epoch: 12 Global Step: 533310 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:39:33,617-Speed 2630.54 samples/sec Loss 4.8420 LearningRate 0.0128 Epoch: 12 Global Step: 533320 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:39:37,514-Speed 2628.29 samples/sec Loss 4.8627 LearningRate 0.0128 Epoch: 12 Global Step: 533330 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:39:41,387-Speed 2644.18 samples/sec Loss 4.8188 LearningRate 0.0128 Epoch: 12 Global Step: 533340 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:39:45,283-Speed 2629.08 samples/sec Loss 4.7840 LearningRate 0.0128 Epoch: 12 Global Step: 533350 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:39:49,172-Speed 2634.46 samples/sec Loss 4.7835 LearningRate 0.0128 Epoch: 12 Global Step: 533360 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:39:53,064-Speed 2631.15 samples/sec Loss 4.8688 LearningRate 0.0127 Epoch: 12 Global Step: 533370 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:39:56,974-Speed 2619.69 samples/sec Loss 4.7467 LearningRate 0.0127 Epoch: 12 Global Step: 533380 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:40:00,870-Speed 2629.20 samples/sec Loss 4.8033 LearningRate 0.0127 Epoch: 12 Global Step: 533390 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:40:04,766-Speed 2628.58 samples/sec Loss 4.7824 LearningRate 0.0127 Epoch: 12 Global Step: 533400 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:40:08,670-Speed 2623.15 samples/sec Loss 4.7674 LearningRate 0.0127 Epoch: 12 Global Step: 533410 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:40:12,575-Speed 2624.19 samples/sec Loss 4.8480 LearningRate 0.0127 Epoch: 12 Global Step: 533420 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:40:16,473-Speed 2627.11 samples/sec Loss 4.7978 LearningRate 0.0127 Epoch: 12 Global Step: 533430 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:40:20,376-Speed 2624.79 samples/sec Loss 4.7675 LearningRate 0.0127 Epoch: 12 Global Step: 533440 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:40:24,290-Speed 2616.64 samples/sec Loss 4.7363 LearningRate 0.0127 Epoch: 12 Global Step: 533450 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:40:28,259-Speed 2581.15 samples/sec Loss 4.7346 LearningRate 0.0127 Epoch: 12 Global Step: 533460 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:40:32,158-Speed 2626.63 samples/sec Loss 4.7947 LearningRate 0.0127 Epoch: 12 Global Step: 533470 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:40:36,062-Speed 2623.56 samples/sec Loss 4.8036 LearningRate 0.0127 Epoch: 12 Global Step: 533480 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:40:39,959-Speed 2628.20 samples/sec Loss 4.8713 LearningRate 0.0127 Epoch: 12 Global Step: 533490 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:40:43,866-Speed 2621.30 samples/sec Loss 4.7489 LearningRate 0.0127 Epoch: 12 Global Step: 533500 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:40:47,761-Speed 2629.69 samples/sec Loss 4.8045 LearningRate 0.0127 Epoch: 12 Global Step: 533510 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:40:51,655-Speed 2630.32 samples/sec Loss 4.9303 LearningRate 0.0127 Epoch: 12 Global Step: 533520 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:40:55,551-Speed 2628.98 samples/sec Loss 4.7776 LearningRate 0.0127 Epoch: 12 Global Step: 533530 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:40:59,454-Speed 2623.97 samples/sec Loss 4.9811 LearningRate 0.0127 Epoch: 12 Global Step: 533540 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:41:03,350-Speed 2629.47 samples/sec Loss 4.6740 LearningRate 0.0127 Epoch: 12 Global Step: 533550 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:41:07,246-Speed 2628.61 samples/sec Loss 4.7773 LearningRate 0.0127 Epoch: 12 Global Step: 533560 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:41:11,144-Speed 2628.32 samples/sec Loss 4.7669 LearningRate 0.0127 Epoch: 12 Global Step: 533570 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:41:15,047-Speed 2623.97 samples/sec Loss 4.8035 LearningRate 0.0127 Epoch: 12 Global Step: 533580 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:41:18,938-Speed 2632.14 samples/sec Loss 4.7691 LearningRate 0.0127 Epoch: 12 Global Step: 533590 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:41:22,835-Speed 2628.49 samples/sec Loss 4.7712 LearningRate 0.0127 Epoch: 12 Global Step: 533600 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:41:26,731-Speed 2628.69 samples/sec Loss 4.6771 LearningRate 0.0127 Epoch: 12 Global Step: 533610 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:41:30,625-Speed 2630.40 samples/sec Loss 4.7867 LearningRate 0.0127 Epoch: 12 Global Step: 533620 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:41:34,522-Speed 2627.84 samples/sec Loss 4.8602 LearningRate 0.0127 Epoch: 12 Global Step: 533630 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:41:38,402-Speed 2640.26 samples/sec Loss 4.8584 LearningRate 0.0127 Epoch: 12 Global Step: 533640 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:41:42,300-Speed 2627.99 samples/sec Loss 4.8251 LearningRate 0.0127 Epoch: 12 Global Step: 533650 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:41:46,197-Speed 2627.90 samples/sec Loss 4.7993 LearningRate 0.0127 Epoch: 12 Global Step: 533660 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:41:50,102-Speed 2622.72 samples/sec Loss 4.7809 LearningRate 0.0127 Epoch: 12 Global Step: 533670 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:41:53,974-Speed 2645.13 samples/sec Loss 4.7390 LearningRate 0.0127 Epoch: 12 Global Step: 533680 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:41:57,866-Speed 2631.81 samples/sec Loss 4.8032 LearningRate 0.0127 Epoch: 12 Global Step: 533690 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:42:01,772-Speed 2622.53 samples/sec Loss 4.8602 LearningRate 0.0127 Epoch: 12 Global Step: 533700 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:42:05,666-Speed 2629.59 samples/sec Loss 4.8449 LearningRate 0.0127 Epoch: 12 Global Step: 533710 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:42:09,562-Speed 2629.23 samples/sec Loss 4.8055 LearningRate 0.0127 Epoch: 12 Global Step: 533720 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:42:13,455-Speed 2630.84 samples/sec Loss 4.7893 LearningRate 0.0127 Epoch: 12 Global Step: 533730 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:42:17,360-Speed 2623.17 samples/sec Loss 4.7218 LearningRate 0.0127 Epoch: 12 Global Step: 533740 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:42:21,251-Speed 2631.94 samples/sec Loss 4.8145 LearningRate 0.0127 Epoch: 12 Global Step: 533750 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:42:25,142-Speed 2632.99 samples/sec Loss 4.7490 LearningRate 0.0127 Epoch: 12 Global Step: 533760 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:42:29,043-Speed 2625.00 samples/sec Loss 4.7429 LearningRate 0.0127 Epoch: 12 Global Step: 533770 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:42:32,942-Speed 2627.23 samples/sec Loss 4.7105 LearningRate 0.0127 Epoch: 12 Global Step: 533780 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:42:36,867-Speed 2609.55 samples/sec Loss 4.8220 LearningRate 0.0127 Epoch: 12 Global Step: 533790 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:42:40,745-Speed 2641.08 samples/sec Loss 4.8148 LearningRate 0.0127 Epoch: 12 Global Step: 533800 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:42:44,641-Speed 2628.87 samples/sec Loss 4.7248 LearningRate 0.0127 Epoch: 12 Global Step: 533810 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:42:48,522-Speed 2640.91 samples/sec Loss 4.7734 LearningRate 0.0127 Epoch: 12 Global Step: 533820 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:42:52,417-Speed 2629.21 samples/sec Loss 4.7705 LearningRate 0.0127 Epoch: 12 Global Step: 533830 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:42:56,354-Speed 2602.21 samples/sec Loss 4.7062 LearningRate 0.0127 Epoch: 12 Global Step: 533840 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:43:00,343-Speed 2567.11 samples/sec Loss 4.7571 LearningRate 0.0127 Epoch: 12 Global Step: 533850 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:43:04,245-Speed 2625.37 samples/sec Loss 4.8439 LearningRate 0.0127 Epoch: 12 Global Step: 533860 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:43:08,140-Speed 2629.47 samples/sec Loss 4.7726 LearningRate 0.0127 Epoch: 12 Global Step: 533870 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:43:12,042-Speed 2624.64 samples/sec Loss 4.7554 LearningRate 0.0127 Epoch: 12 Global Step: 533880 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:43:15,944-Speed 2625.04 samples/sec Loss 4.8146 LearningRate 0.0127 Epoch: 12 Global Step: 533890 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:43:19,841-Speed 2628.11 samples/sec Loss 4.7835 LearningRate 0.0127 Epoch: 12 Global Step: 533900 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:43:23,745-Speed 2624.07 samples/sec Loss 4.7769 LearningRate 0.0127 Epoch: 12 Global Step: 533910 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:43:27,643-Speed 2627.53 samples/sec Loss 4.8546 LearningRate 0.0127 Epoch: 12 Global Step: 533920 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:43:31,531-Speed 2636.23 samples/sec Loss 4.7779 LearningRate 0.0127 Epoch: 12 Global Step: 533930 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:43:35,423-Speed 2631.65 samples/sec Loss 4.7363 LearningRate 0.0127 Epoch: 12 Global Step: 533940 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:43:39,318-Speed 2629.73 samples/sec Loss 4.7721 LearningRate 0.0127 Epoch: 12 Global Step: 533950 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:43:43,212-Speed 2629.58 samples/sec Loss 4.7928 LearningRate 0.0127 Epoch: 12 Global Step: 533960 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:43:47,107-Speed 2630.01 samples/sec Loss 4.8177 LearningRate 0.0127 Epoch: 12 Global Step: 533970 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:43:51,003-Speed 2628.50 samples/sec Loss 4.8247 LearningRate 0.0127 Epoch: 12 Global Step: 533980 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:43:54,899-Speed 2629.21 samples/sec Loss 4.7940 LearningRate 0.0127 Epoch: 12 Global Step: 533990 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:43:58,826-Speed 2607.98 samples/sec Loss 4.7014 LearningRate 0.0127 Epoch: 12 Global Step: 534000 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:44:02,728-Speed 2625.40 samples/sec Loss 4.7423 LearningRate 0.0127 Epoch: 12 Global Step: 534010 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:44:06,654-Speed 2608.93 samples/sec Loss 4.8812 LearningRate 0.0127 Epoch: 12 Global Step: 534020 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:44:10,546-Speed 2631.43 samples/sec Loss 4.7155 LearningRate 0.0127 Epoch: 12 Global Step: 534030 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:44:14,442-Speed 2629.23 samples/sec Loss 4.7421 LearningRate 0.0127 Epoch: 12 Global Step: 534040 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:44:18,340-Speed 2627.48 samples/sec Loss 4.7599 LearningRate 0.0127 Epoch: 12 Global Step: 534050 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:44:22,210-Speed 2646.63 samples/sec Loss 4.7812 LearningRate 0.0127 Epoch: 12 Global Step: 534060 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:44:26,103-Speed 2630.83 samples/sec Loss 4.8598 LearningRate 0.0127 Epoch: 12 Global Step: 534070 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:44:29,997-Speed 2630.32 samples/sec Loss 4.9078 LearningRate 0.0127 Epoch: 12 Global Step: 534080 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:44:33,910-Speed 2617.79 samples/sec Loss 4.8779 LearningRate 0.0127 Epoch: 12 Global Step: 534090 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:44:37,806-Speed 2628.53 samples/sec Loss 4.8079 LearningRate 0.0127 Epoch: 12 Global Step: 534100 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:44:41,703-Speed 2628.32 samples/sec Loss 4.7754 LearningRate 0.0127 Epoch: 12 Global Step: 534110 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:44:45,604-Speed 2625.93 samples/sec Loss 4.8152 LearningRate 0.0127 Epoch: 12 Global Step: 534120 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:44:49,503-Speed 2627.17 samples/sec Loss 4.8005 LearningRate 0.0127 Epoch: 12 Global Step: 534130 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:44:53,400-Speed 2629.13 samples/sec Loss 4.7009 LearningRate 0.0127 Epoch: 12 Global Step: 534140 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:44:57,297-Speed 2627.85 samples/sec Loss 4.9027 LearningRate 0.0127 Epoch: 12 Global Step: 534150 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:45:01,191-Speed 2629.86 samples/sec Loss 4.8078 LearningRate 0.0127 Epoch: 12 Global Step: 534160 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:45:05,093-Speed 2625.09 samples/sec Loss 4.8120 LearningRate 0.0127 Epoch: 12 Global Step: 534170 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:45:08,994-Speed 2625.29 samples/sec Loss 4.7356 LearningRate 0.0127 Epoch: 12 Global Step: 534180 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:45:12,873-Speed 2640.29 samples/sec Loss 4.8267 LearningRate 0.0127 Epoch: 12 Global Step: 534190 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:45:16,767-Speed 2630.34 samples/sec Loss 4.8491 LearningRate 0.0127 Epoch: 12 Global Step: 534200 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:45:20,661-Speed 2630.91 samples/sec Loss 4.7782 LearningRate 0.0127 Epoch: 12 Global Step: 534210 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:45:24,554-Speed 2630.69 samples/sec Loss 4.7787 LearningRate 0.0127 Epoch: 12 Global Step: 534220 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:45:28,447-Speed 2631.26 samples/sec Loss 4.7700 LearningRate 0.0127 Epoch: 12 Global Step: 534230 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:45:32,347-Speed 2626.11 samples/sec Loss 4.8553 LearningRate 0.0127 Epoch: 12 Global Step: 534240 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:45:36,241-Speed 2630.41 samples/sec Loss 4.7309 LearningRate 0.0127 Epoch: 12 Global Step: 534250 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:45:40,132-Speed 2631.73 samples/sec Loss 4.7748 LearningRate 0.0127 Epoch: 12 Global Step: 534260 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:45:44,031-Speed 2627.41 samples/sec Loss 4.8456 LearningRate 0.0127 Epoch: 12 Global Step: 534270 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:45:47,932-Speed 2625.78 samples/sec Loss 4.8230 LearningRate 0.0127 Epoch: 12 Global Step: 534280 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:45:51,829-Speed 2628.22 samples/sec Loss 4.7645 LearningRate 0.0127 Epoch: 12 Global Step: 534290 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:45:55,708-Speed 2639.95 samples/sec Loss 4.8025 LearningRate 0.0127 Epoch: 12 Global Step: 534300 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:45:59,608-Speed 2627.91 samples/sec Loss 4.8119 LearningRate 0.0127 Epoch: 12 Global Step: 534310 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:46:03,508-Speed 2625.90 samples/sec Loss 4.7713 LearningRate 0.0127 Epoch: 12 Global Step: 534320 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:46:07,569-Speed 2521.97 samples/sec Loss 4.7494 LearningRate 0.0127 Epoch: 12 Global Step: 534330 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:46:11,568-Speed 2561.36 samples/sec Loss 4.8298 LearningRate 0.0127 Epoch: 12 Global Step: 534340 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:46:15,470-Speed 2624.88 samples/sec Loss 4.7478 LearningRate 0.0127 Epoch: 12 Global Step: 534350 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:46:19,373-Speed 2624.57 samples/sec Loss 4.7612 LearningRate 0.0127 Epoch: 12 Global Step: 534360 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:46:23,274-Speed 2625.13 samples/sec Loss 4.6874 LearningRate 0.0127 Epoch: 12 Global Step: 534370 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:46:27,175-Speed 2626.01 samples/sec Loss 4.7505 LearningRate 0.0127 Epoch: 12 Global Step: 534380 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:46:31,084-Speed 2620.02 samples/sec Loss 4.8301 LearningRate 0.0127 Epoch: 12 Global Step: 534390 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:46:34,987-Speed 2624.03 samples/sec Loss 4.8997 LearningRate 0.0127 Epoch: 12 Global Step: 534400 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:46:38,886-Speed 2626.98 samples/sec Loss 4.7998 LearningRate 0.0127 Epoch: 12 Global Step: 534410 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:46:42,765-Speed 2640.42 samples/sec Loss 4.7524 LearningRate 0.0127 Epoch: 12 Global Step: 534420 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:46:46,676-Speed 2619.12 samples/sec Loss 4.7693 LearningRate 0.0127 Epoch: 12 Global Step: 534430 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:46:50,573-Speed 2628.11 samples/sec Loss 4.8638 LearningRate 0.0127 Epoch: 12 Global Step: 534440 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:46:54,467-Speed 2630.35 samples/sec Loss 4.7317 LearningRate 0.0127 Epoch: 12 Global Step: 534450 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:46:58,367-Speed 2626.55 samples/sec Loss 4.7573 LearningRate 0.0127 Epoch: 12 Global Step: 534460 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:47:02,270-Speed 2623.77 samples/sec Loss 4.8145 LearningRate 0.0127 Epoch: 12 Global Step: 534470 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:47:06,181-Speed 2618.86 samples/sec Loss 4.8427 LearningRate 0.0127 Epoch: 12 Global Step: 534480 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:47:10,067-Speed 2635.85 samples/sec Loss 4.7934 LearningRate 0.0127 Epoch: 12 Global Step: 534490 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:47:13,976-Speed 2619.76 samples/sec Loss 4.8744 LearningRate 0.0127 Epoch: 12 Global Step: 534500 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:47:17,900-Speed 2610.65 samples/sec Loss 4.8212 LearningRate 0.0127 Epoch: 12 Global Step: 534510 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:47:21,797-Speed 2628.70 samples/sec Loss 4.8478 LearningRate 0.0127 Epoch: 12 Global Step: 534520 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:47:25,698-Speed 2625.29 samples/sec Loss 4.7322 LearningRate 0.0126 Epoch: 12 Global Step: 534530 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:47:29,666-Speed 2581.73 samples/sec Loss 4.7672 LearningRate 0.0126 Epoch: 12 Global Step: 534540 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:47:33,572-Speed 2621.66 samples/sec Loss 4.7412 LearningRate 0.0126 Epoch: 12 Global Step: 534550 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:47:37,479-Speed 2621.14 samples/sec Loss 4.8157 LearningRate 0.0126 Epoch: 12 Global Step: 534560 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:47:41,399-Speed 2613.34 samples/sec Loss 4.6703 LearningRate 0.0126 Epoch: 12 Global Step: 534570 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:47:45,334-Speed 2602.86 samples/sec Loss 4.7627 LearningRate 0.0126 Epoch: 12 Global Step: 534580 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:47:49,248-Speed 2616.67 samples/sec Loss 4.8609 LearningRate 0.0126 Epoch: 12 Global Step: 534590 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:47:53,158-Speed 2619.38 samples/sec Loss 4.8261 LearningRate 0.0126 Epoch: 12 Global Step: 534600 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:47:57,062-Speed 2624.36 samples/sec Loss 4.8385 LearningRate 0.0126 Epoch: 12 Global Step: 534610 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:48:00,962-Speed 2626.09 samples/sec Loss 4.8796 LearningRate 0.0126 Epoch: 12 Global Step: 534620 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:48:04,858-Speed 2628.56 samples/sec Loss 4.8751 LearningRate 0.0126 Epoch: 12 Global Step: 534630 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:48:08,769-Speed 2619.03 samples/sec Loss 4.7492 LearningRate 0.0126 Epoch: 12 Global Step: 534640 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:48:12,667-Speed 2628.09 samples/sec Loss 4.7216 LearningRate 0.0126 Epoch: 12 Global Step: 534650 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:48:16,563-Speed 2629.28 samples/sec Loss 4.8025 LearningRate 0.0126 Epoch: 12 Global Step: 534660 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:48:20,458-Speed 2628.96 samples/sec Loss 4.7816 LearningRate 0.0126 Epoch: 12 Global Step: 534670 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:48:24,361-Speed 2625.14 samples/sec Loss 4.8042 LearningRate 0.0126 Epoch: 12 Global Step: 534680 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:48:28,266-Speed 2622.48 samples/sec Loss 4.7612 LearningRate 0.0126 Epoch: 12 Global Step: 534690 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:48:32,168-Speed 2625.05 samples/sec Loss 4.6951 LearningRate 0.0126 Epoch: 12 Global Step: 534700 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:48:36,087-Speed 2613.67 samples/sec Loss 4.7575 LearningRate 0.0126 Epoch: 12 Global Step: 534710 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:48:40,017-Speed 2606.02 samples/sec Loss 4.7665 LearningRate 0.0126 Epoch: 12 Global Step: 534720 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:48:43,919-Speed 2624.71 samples/sec Loss 4.7600 LearningRate 0.0126 Epoch: 12 Global Step: 534730 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:48:47,816-Speed 2628.57 samples/sec Loss 4.7341 LearningRate 0.0126 Epoch: 12 Global Step: 534740 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:48:51,715-Speed 2626.63 samples/sec Loss 4.8144 LearningRate 0.0126 Epoch: 12 Global Step: 534750 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:48:55,587-Speed 2645.95 samples/sec Loss 4.7679 LearningRate 0.0126 Epoch: 12 Global Step: 534760 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:48:59,486-Speed 2626.56 samples/sec Loss 4.7434 LearningRate 0.0126 Epoch: 12 Global Step: 534770 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:49:03,392-Speed 2622.13 samples/sec Loss 4.8520 LearningRate 0.0126 Epoch: 12 Global Step: 534780 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:49:07,288-Speed 2629.17 samples/sec Loss 4.7402 LearningRate 0.0126 Epoch: 12 Global Step: 534790 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:49:11,182-Speed 2630.19 samples/sec Loss 4.7875 LearningRate 0.0126 Epoch: 12 Global Step: 534800 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:49:15,080-Speed 2627.80 samples/sec Loss 4.7656 LearningRate 0.0126 Epoch: 12 Global Step: 534810 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:49:18,972-Speed 2631.75 samples/sec Loss 4.7075 LearningRate 0.0126 Epoch: 12 Global Step: 534820 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:49:22,873-Speed 2625.61 samples/sec Loss 4.6722 LearningRate 0.0126 Epoch: 12 Global Step: 534830 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:49:26,774-Speed 2625.74 samples/sec Loss 4.7747 LearningRate 0.0126 Epoch: 12 Global Step: 534840 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:49:30,690-Speed 2615.15 samples/sec Loss 4.7638 LearningRate 0.0126 Epoch: 12 Global Step: 534850 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:49:34,603-Speed 2618.07 samples/sec Loss 4.7475 LearningRate 0.0126 Epoch: 12 Global Step: 534860 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:49:38,502-Speed 2626.47 samples/sec Loss 4.8053 LearningRate 0.0126 Epoch: 12 Global Step: 534870 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:49:42,378-Speed 2642.21 samples/sec Loss 4.7778 LearningRate 0.0126 Epoch: 12 Global Step: 534880 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:49:46,272-Speed 2630.18 samples/sec Loss 4.7910 LearningRate 0.0126 Epoch: 12 Global Step: 534890 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:49:50,165-Speed 2631.36 samples/sec Loss 4.7842 LearningRate 0.0126 Epoch: 12 Global Step: 534900 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:49:54,059-Speed 2630.65 samples/sec Loss 4.7840 LearningRate 0.0126 Epoch: 12 Global Step: 534910 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:49:57,963-Speed 2623.36 samples/sec Loss 4.8334 LearningRate 0.0126 Epoch: 12 Global Step: 534920 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:50:01,866-Speed 2624.47 samples/sec Loss 4.7300 LearningRate 0.0126 Epoch: 12 Global Step: 534930 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:50:05,762-Speed 2629.16 samples/sec Loss 4.8225 LearningRate 0.0126 Epoch: 12 Global Step: 534940 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:50:09,656-Speed 2630.17 samples/sec Loss 4.8253 LearningRate 0.0126 Epoch: 12 Global Step: 534950 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:50:13,550-Speed 2630.25 samples/sec Loss 4.7748 LearningRate 0.0126 Epoch: 12 Global Step: 534960 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:50:17,449-Speed 2627.15 samples/sec Loss 4.8253 LearningRate 0.0126 Epoch: 12 Global Step: 534970 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:50:21,341-Speed 2630.98 samples/sec Loss 4.8126 LearningRate 0.0126 Epoch: 12 Global Step: 534980 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:50:25,247-Speed 2622.64 samples/sec Loss 4.7976 LearningRate 0.0126 Epoch: 12 Global Step: 534990 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:50:29,147-Speed 2626.96 samples/sec Loss 4.6565 LearningRate 0.0126 Epoch: 12 Global Step: 535000 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:50:33,040-Speed 2630.37 samples/sec Loss 4.8058 LearningRate 0.0126 Epoch: 12 Global Step: 535010 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:50:36,950-Speed 2619.72 samples/sec Loss 4.8343 LearningRate 0.0126 Epoch: 12 Global Step: 535020 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:50:40,852-Speed 2624.44 samples/sec Loss 4.8002 LearningRate 0.0126 Epoch: 12 Global Step: 535030 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:50:44,749-Speed 2628.97 samples/sec Loss 4.7888 LearningRate 0.0126 Epoch: 12 Global Step: 535040 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:50:48,655-Speed 2622.19 samples/sec Loss 4.7450 LearningRate 0.0126 Epoch: 12 Global Step: 535050 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:50:52,659-Speed 2557.90 samples/sec Loss 4.8221 LearningRate 0.0126 Epoch: 12 Global Step: 535060 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:50:56,573-Speed 2616.65 samples/sec Loss 4.7011 LearningRate 0.0126 Epoch: 12 Global Step: 535070 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:51:00,456-Speed 2637.74 samples/sec Loss 4.7849 LearningRate 0.0126 Epoch: 12 Global Step: 535080 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:51:04,353-Speed 2628.07 samples/sec Loss 4.8352 LearningRate 0.0126 Epoch: 12 Global Step: 535090 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:51:08,225-Speed 2645.41 samples/sec Loss 4.8072 LearningRate 0.0126 Epoch: 12 Global Step: 535100 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:51:12,118-Speed 2631.37 samples/sec Loss 4.7544 LearningRate 0.0126 Epoch: 12 Global Step: 535110 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:51:16,014-Speed 2628.68 samples/sec Loss 4.6717 LearningRate 0.0126 Epoch: 12 Global Step: 535120 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:51:19,907-Speed 2631.27 samples/sec Loss 4.7096 LearningRate 0.0126 Epoch: 12 Global Step: 535130 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:51:23,798-Speed 2631.69 samples/sec Loss 4.7804 LearningRate 0.0126 Epoch: 12 Global Step: 535140 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:51:27,703-Speed 2623.61 samples/sec Loss 4.7778 LearningRate 0.0126 Epoch: 12 Global Step: 535150 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:51:31,604-Speed 2625.11 samples/sec Loss 4.7922 LearningRate 0.0126 Epoch: 12 Global Step: 535160 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:51:35,495-Speed 2631.98 samples/sec Loss 4.7407 LearningRate 0.0126 Epoch: 12 Global Step: 535170 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:51:39,390-Speed 2629.36 samples/sec Loss 4.8291 LearningRate 0.0126 Epoch: 12 Global Step: 535180 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:51:43,556-Speed 2459.02 samples/sec Loss 4.7756 LearningRate 0.0126 Epoch: 12 Global Step: 535190 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:51:47,463-Speed 2621.99 samples/sec Loss 4.8015 LearningRate 0.0126 Epoch: 12 Global Step: 535200 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:51:51,368-Speed 2622.47 samples/sec Loss 4.8590 LearningRate 0.0126 Epoch: 12 Global Step: 535210 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:51:55,267-Speed 2627.09 samples/sec Loss 4.8746 LearningRate 0.0126 Epoch: 12 Global Step: 535220 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:51:59,139-Speed 2645.40 samples/sec Loss 4.8291 LearningRate 0.0126 Epoch: 12 Global Step: 535230 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:52:03,031-Speed 2631.78 samples/sec Loss 4.7970 LearningRate 0.0126 Epoch: 12 Global Step: 535240 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:52:06,929-Speed 2626.92 samples/sec Loss 4.7746 LearningRate 0.0126 Epoch: 12 Global Step: 535250 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:52:10,828-Speed 2627.06 samples/sec Loss 4.9271 LearningRate 0.0126 Epoch: 12 Global Step: 535260 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:52:14,719-Speed 2632.09 samples/sec Loss 4.7289 LearningRate 0.0126 Epoch: 12 Global Step: 535270 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:52:18,626-Speed 2622.02 samples/sec Loss 4.8017 LearningRate 0.0126 Epoch: 12 Global Step: 535280 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:52:22,528-Speed 2625.41 samples/sec Loss 4.7757 LearningRate 0.0126 Epoch: 12 Global Step: 535290 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:52:26,422-Speed 2629.85 samples/sec Loss 4.8400 LearningRate 0.0126 Epoch: 12 Global Step: 535300 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:52:30,361-Speed 2600.48 samples/sec Loss 4.8032 LearningRate 0.0126 Epoch: 12 Global Step: 535310 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:52:34,257-Speed 2628.76 samples/sec Loss 4.8188 LearningRate 0.0126 Epoch: 12 Global Step: 535320 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:52:38,152-Speed 2629.93 samples/sec Loss 4.6974 LearningRate 0.0126 Epoch: 12 Global Step: 535330 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:52:42,021-Speed 2646.68 samples/sec Loss 4.6973 LearningRate 0.0126 Epoch: 12 Global Step: 535340 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:52:45,919-Speed 2627.92 samples/sec Loss 4.7225 LearningRate 0.0126 Epoch: 12 Global Step: 535350 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:52:49,833-Speed 2616.47 samples/sec Loss 4.8137 LearningRate 0.0126 Epoch: 12 Global Step: 535360 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:52:53,738-Speed 2623.15 samples/sec Loss 4.7494 LearningRate 0.0126 Epoch: 12 Global Step: 535370 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:52:57,646-Speed 2621.35 samples/sec Loss 4.7154 LearningRate 0.0126 Epoch: 12 Global Step: 535380 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:53:01,542-Speed 2628.85 samples/sec Loss 4.7753 LearningRate 0.0126 Epoch: 12 Global Step: 535390 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:53:05,438-Speed 2628.92 samples/sec Loss 4.8265 LearningRate 0.0126 Epoch: 12 Global Step: 535400 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:53:09,335-Speed 2627.99 samples/sec Loss 4.7497 LearningRate 0.0126 Epoch: 12 Global Step: 535410 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:53:13,244-Speed 2620.04 samples/sec Loss 4.7505 LearningRate 0.0126 Epoch: 12 Global Step: 535420 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:53:17,139-Speed 2629.57 samples/sec Loss 4.7530 LearningRate 0.0126 Epoch: 12 Global Step: 535430 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:53:21,011-Speed 2645.57 samples/sec Loss 4.7286 LearningRate 0.0126 Epoch: 12 Global Step: 535440 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:53:24,906-Speed 2629.51 samples/sec Loss 4.7856 LearningRate 0.0126 Epoch: 12 Global Step: 535450 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:53:28,819-Speed 2617.83 samples/sec Loss 4.7023 LearningRate 0.0126 Epoch: 12 Global Step: 535460 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:53:32,720-Speed 2625.38 samples/sec Loss 4.8666 LearningRate 0.0126 Epoch: 12 Global Step: 535470 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:53:36,613-Speed 2630.73 samples/sec Loss 4.7112 LearningRate 0.0126 Epoch: 12 Global Step: 535480 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:53:40,511-Speed 2628.10 samples/sec Loss 4.8605 LearningRate 0.0126 Epoch: 12 Global Step: 535490 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:53:44,398-Speed 2635.71 samples/sec Loss 4.7380 LearningRate 0.0126 Epoch: 12 Global Step: 535500 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:53:48,294-Speed 2628.79 samples/sec Loss 4.7364 LearningRate 0.0126 Epoch: 12 Global Step: 535510 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:53:52,190-Speed 2629.15 samples/sec Loss 4.7177 LearningRate 0.0126 Epoch: 12 Global Step: 535520 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:53:56,082-Speed 2631.36 samples/sec Loss 4.8118 LearningRate 0.0126 Epoch: 12 Global Step: 535530 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:53:59,992-Speed 2620.10 samples/sec Loss 4.7611 LearningRate 0.0126 Epoch: 12 Global Step: 535540 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:54:03,904-Speed 2618.20 samples/sec Loss 4.7286 LearningRate 0.0126 Epoch: 12 Global Step: 535550 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:54:07,802-Speed 2627.32 samples/sec Loss 4.6777 LearningRate 0.0126 Epoch: 12 Global Step: 535560 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:54:11,702-Speed 2626.33 samples/sec Loss 4.7757 LearningRate 0.0126 Epoch: 12 Global Step: 535570 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:54:15,605-Speed 2624.96 samples/sec Loss 4.7356 LearningRate 0.0126 Epoch: 12 Global Step: 535580 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:54:19,504-Speed 2627.25 samples/sec Loss 4.7255 LearningRate 0.0126 Epoch: 12 Global Step: 535590 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 07:54:23,412-Speed 2620.62 samples/sec Loss 4.7616 LearningRate 0.0126 Epoch: 12 Global Step: 535600 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:54:27,318-Speed 2622.69 samples/sec Loss 4.7378 LearningRate 0.0126 Epoch: 12 Global Step: 535610 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:54:31,219-Speed 2624.85 samples/sec Loss 4.7767 LearningRate 0.0126 Epoch: 12 Global Step: 535620 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:54:35,128-Speed 2620.66 samples/sec Loss 4.8475 LearningRate 0.0126 Epoch: 12 Global Step: 535630 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:54:39,031-Speed 2624.21 samples/sec Loss 4.7851 LearningRate 0.0126 Epoch: 12 Global Step: 535640 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:54:42,925-Speed 2630.22 samples/sec Loss 4.7660 LearningRate 0.0126 Epoch: 12 Global Step: 535650 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:54:46,819-Speed 2630.23 samples/sec Loss 4.7354 LearningRate 0.0126 Epoch: 12 Global Step: 535660 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:54:50,719-Speed 2626.54 samples/sec Loss 4.8303 LearningRate 0.0126 Epoch: 12 Global Step: 535670 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:54:54,613-Speed 2630.92 samples/sec Loss 4.7895 LearningRate 0.0126 Epoch: 12 Global Step: 535680 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:54:58,519-Speed 2621.54 samples/sec Loss 4.6005 LearningRate 0.0126 Epoch: 12 Global Step: 535690 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:55:02,412-Speed 2631.47 samples/sec Loss 4.7424 LearningRate 0.0125 Epoch: 12 Global Step: 535700 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:55:06,308-Speed 2628.68 samples/sec Loss 4.7597 LearningRate 0.0125 Epoch: 12 Global Step: 535710 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:55:10,206-Speed 2627.46 samples/sec Loss 4.8931 LearningRate 0.0125 Epoch: 12 Global Step: 535720 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:55:14,104-Speed 2627.51 samples/sec Loss 4.7262 LearningRate 0.0125 Epoch: 12 Global Step: 535730 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:55:18,019-Speed 2616.23 samples/sec Loss 4.7511 LearningRate 0.0125 Epoch: 12 Global Step: 535740 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:55:21,905-Speed 2635.45 samples/sec Loss 4.7522 LearningRate 0.0125 Epoch: 12 Global Step: 535750 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:55:25,798-Speed 2631.40 samples/sec Loss 4.8155 LearningRate 0.0125 Epoch: 12 Global Step: 535760 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:55:29,695-Speed 2628.26 samples/sec Loss 4.7841 LearningRate 0.0125 Epoch: 12 Global Step: 535770 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:55:33,591-Speed 2629.29 samples/sec Loss 4.9095 LearningRate 0.0125 Epoch: 12 Global Step: 535780 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:55:37,491-Speed 2626.52 samples/sec Loss 4.7752 LearningRate 0.0125 Epoch: 12 Global Step: 535790 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:55:41,387-Speed 2628.30 samples/sec Loss 4.7823 LearningRate 0.0125 Epoch: 12 Global Step: 535800 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:55:45,282-Speed 2629.61 samples/sec Loss 4.9024 LearningRate 0.0125 Epoch: 12 Global Step: 535810 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:55:49,180-Speed 2628.20 samples/sec Loss 4.7103 LearningRate 0.0125 Epoch: 12 Global Step: 535820 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:55:53,079-Speed 2626.22 samples/sec Loss 4.7978 LearningRate 0.0125 Epoch: 12 Global Step: 535830 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:55:57,065-Speed 2570.22 samples/sec Loss 4.8045 LearningRate 0.0125 Epoch: 12 Global Step: 535840 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:56:01,165-Speed 2497.97 samples/sec Loss 4.7685 LearningRate 0.0125 Epoch: 12 Global Step: 535850 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:56:05,138-Speed 2577.62 samples/sec Loss 4.7217 LearningRate 0.0125 Epoch: 12 Global Step: 535860 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:56:09,044-Speed 2622.69 samples/sec Loss 4.7164 LearningRate 0.0125 Epoch: 12 Global Step: 535870 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:56:12,945-Speed 2625.31 samples/sec Loss 4.7777 LearningRate 0.0125 Epoch: 12 Global Step: 535880 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:56:16,870-Speed 2609.57 samples/sec Loss 4.8616 LearningRate 0.0125 Epoch: 12 Global Step: 535890 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:56:20,813-Speed 2597.87 samples/sec Loss 4.7644 LearningRate 0.0125 Epoch: 12 Global Step: 535900 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:56:24,756-Speed 2597.80 samples/sec Loss 4.7590 LearningRate 0.0125 Epoch: 12 Global Step: 535910 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:56:28,659-Speed 2623.86 samples/sec Loss 4.7830 LearningRate 0.0125 Epoch: 12 Global Step: 535920 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:56:32,560-Speed 2625.62 samples/sec Loss 4.7965 LearningRate 0.0125 Epoch: 12 Global Step: 535930 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:56:36,457-Speed 2628.28 samples/sec Loss 4.7616 LearningRate 0.0125 Epoch: 12 Global Step: 535940 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:56:40,335-Speed 2640.75 samples/sec Loss 4.6420 LearningRate 0.0125 Epoch: 12 Global Step: 535950 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:56:44,278-Speed 2598.26 samples/sec Loss 4.7538 LearningRate 0.0125 Epoch: 12 Global Step: 535960 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:56:48,244-Speed 2582.68 samples/sec Loss 4.6863 LearningRate 0.0125 Epoch: 12 Global Step: 535970 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:56:52,138-Speed 2630.68 samples/sec Loss 4.6751 LearningRate 0.0125 Epoch: 12 Global Step: 535980 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:56:56,031-Speed 2630.62 samples/sec Loss 4.6811 LearningRate 0.0125 Epoch: 12 Global Step: 535990 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:56:59,926-Speed 2629.30 samples/sec Loss 4.6203 LearningRate 0.0125 Epoch: 12 Global Step: 536000 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:57:03,834-Speed 2620.95 samples/sec Loss 4.7097 LearningRate 0.0125 Epoch: 12 Global Step: 536010 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:57:07,745-Speed 2618.96 samples/sec Loss 4.7748 LearningRate 0.0125 Epoch: 12 Global Step: 536020 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:57:11,645-Speed 2625.90 samples/sec Loss 4.7266 LearningRate 0.0125 Epoch: 12 Global Step: 536030 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:57:15,542-Speed 2629.11 samples/sec Loss 4.7974 LearningRate 0.0125 Epoch: 12 Global Step: 536040 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:57:19,464-Speed 2611.90 samples/sec Loss 4.7820 LearningRate 0.0125 Epoch: 12 Global Step: 536050 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:57:23,360-Speed 2628.82 samples/sec Loss 4.7729 LearningRate 0.0125 Epoch: 12 Global Step: 536060 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:57:27,286-Speed 2608.97 samples/sec Loss 4.7902 LearningRate 0.0125 Epoch: 12 Global Step: 536070 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:57:31,204-Speed 2614.27 samples/sec Loss 4.8143 LearningRate 0.0125 Epoch: 12 Global Step: 536080 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:57:35,113-Speed 2620.74 samples/sec Loss 4.6591 LearningRate 0.0125 Epoch: 12 Global Step: 536090 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:57:39,017-Speed 2623.26 samples/sec Loss 4.7949 LearningRate 0.0125 Epoch: 12 Global Step: 536100 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:57:42,912-Speed 2630.03 samples/sec Loss 4.7901 LearningRate 0.0125 Epoch: 12 Global Step: 536110 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:57:46,806-Speed 2629.74 samples/sec Loss 4.6971 LearningRate 0.0125 Epoch: 12 Global Step: 536120 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:57:50,707-Speed 2626.26 samples/sec Loss 4.7649 LearningRate 0.0125 Epoch: 12 Global Step: 536130 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:57:54,603-Speed 2628.70 samples/sec Loss 4.7559 LearningRate 0.0125 Epoch: 12 Global Step: 536140 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:57:58,501-Speed 2628.39 samples/sec Loss 4.8018 LearningRate 0.0125 Epoch: 12 Global Step: 536150 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:58:02,396-Speed 2629.53 samples/sec Loss 4.7974 LearningRate 0.0125 Epoch: 12 Global Step: 536160 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:58:06,289-Speed 2630.51 samples/sec Loss 4.8331 LearningRate 0.0125 Epoch: 12 Global Step: 536170 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:58:10,160-Speed 2645.67 samples/sec Loss 4.7428 LearningRate 0.0125 Epoch: 12 Global Step: 536180 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:58:14,056-Speed 2629.42 samples/sec Loss 4.7134 LearningRate 0.0125 Epoch: 12 Global Step: 536190 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:58:17,960-Speed 2622.90 samples/sec Loss 4.8368 LearningRate 0.0125 Epoch: 12 Global Step: 536200 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:58:21,857-Speed 2628.76 samples/sec Loss 4.8100 LearningRate 0.0125 Epoch: 12 Global Step: 536210 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:58:25,755-Speed 2627.58 samples/sec Loss 4.8760 LearningRate 0.0125 Epoch: 12 Global Step: 536220 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:58:29,663-Speed 2621.51 samples/sec Loss 4.8858 LearningRate 0.0125 Epoch: 12 Global Step: 536230 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:58:33,564-Speed 2625.33 samples/sec Loss 4.7885 LearningRate 0.0125 Epoch: 12 Global Step: 536240 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:58:37,475-Speed 2618.84 samples/sec Loss 4.7540 LearningRate 0.0125 Epoch: 12 Global Step: 536250 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:58:41,368-Speed 2631.33 samples/sec Loss 4.7449 LearningRate 0.0125 Epoch: 12 Global Step: 536260 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:58:45,262-Speed 2630.27 samples/sec Loss 4.7651 LearningRate 0.0125 Epoch: 12 Global Step: 536270 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:58:49,168-Speed 2622.14 samples/sec Loss 4.7027 LearningRate 0.0125 Epoch: 12 Global Step: 536280 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:58:53,085-Speed 2615.41 samples/sec Loss 4.6714 LearningRate 0.0125 Epoch: 12 Global Step: 536290 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:58:56,980-Speed 2629.84 samples/sec Loss 4.8586 LearningRate 0.0125 Epoch: 12 Global Step: 536300 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:59:00,850-Speed 2646.50 samples/sec Loss 4.7063 LearningRate 0.0125 Epoch: 12 Global Step: 536310 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:59:04,771-Speed 2611.83 samples/sec Loss 4.7960 LearningRate 0.0125 Epoch: 12 Global Step: 536320 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:59:08,672-Speed 2625.44 samples/sec Loss 4.7492 LearningRate 0.0125 Epoch: 12 Global Step: 536330 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:59:12,575-Speed 2624.46 samples/sec Loss 4.7146 LearningRate 0.0125 Epoch: 12 Global Step: 536340 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:59:16,470-Speed 2629.56 samples/sec Loss 4.7547 LearningRate 0.0125 Epoch: 12 Global Step: 536350 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:59:20,383-Speed 2617.69 samples/sec Loss 4.7956 LearningRate 0.0125 Epoch: 12 Global Step: 536360 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:59:24,285-Speed 2624.64 samples/sec Loss 4.7521 LearningRate 0.0125 Epoch: 12 Global Step: 536370 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:59:28,203-Speed 2614.83 samples/sec Loss 4.8029 LearningRate 0.0125 Epoch: 12 Global Step: 536380 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:59:32,150-Speed 2595.25 samples/sec Loss 4.7855 LearningRate 0.0125 Epoch: 12 Global Step: 536390 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:59:36,147-Speed 2562.43 samples/sec Loss 4.7705 LearningRate 0.0125 Epoch: 12 Global Step: 536400 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:59:40,057-Speed 2619.28 samples/sec Loss 4.8349 LearningRate 0.0125 Epoch: 12 Global Step: 536410 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:59:43,954-Speed 2628.34 samples/sec Loss 4.8185 LearningRate 0.0125 Epoch: 12 Global Step: 536420 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 07:59:47,832-Speed 2641.21 samples/sec Loss 4.7703 LearningRate 0.0125 Epoch: 12 Global Step: 536430 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:59:51,882-Speed 2529.24 samples/sec Loss 4.8328 LearningRate 0.0125 Epoch: 12 Global Step: 536440 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 07:59:55,965-Speed 2508.33 samples/sec Loss 4.7721 LearningRate 0.0125 Epoch: 12 Global Step: 536450 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:00:00,032-Speed 2518.63 samples/sec Loss 4.7718 LearningRate 0.0125 Epoch: 12 Global Step: 536460 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:00:03,935-Speed 2623.84 samples/sec Loss 4.7635 LearningRate 0.0125 Epoch: 12 Global Step: 536470 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:00:07,838-Speed 2624.70 samples/sec Loss 4.8187 LearningRate 0.0125 Epoch: 12 Global Step: 536480 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:00:11,739-Speed 2625.20 samples/sec Loss 4.7989 LearningRate 0.0125 Epoch: 12 Global Step: 536490 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:00:15,692-Speed 2591.52 samples/sec Loss 4.8222 LearningRate 0.0125 Epoch: 12 Global Step: 536500 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:00:19,587-Speed 2629.66 samples/sec Loss 4.6355 LearningRate 0.0125 Epoch: 12 Global Step: 536510 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:00:23,483-Speed 2629.20 samples/sec Loss 4.7452 LearningRate 0.0125 Epoch: 12 Global Step: 536520 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:00:27,378-Speed 2629.82 samples/sec Loss 4.7563 LearningRate 0.0125 Epoch: 12 Global Step: 536530 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:00:31,255-Speed 2642.20 samples/sec Loss 4.7663 LearningRate 0.0125 Epoch: 12 Global Step: 536540 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:00:35,148-Speed 2630.95 samples/sec Loss 4.8005 LearningRate 0.0125 Epoch: 12 Global Step: 536550 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:00:39,043-Speed 2629.37 samples/sec Loss 4.7689 LearningRate 0.0125 Epoch: 12 Global Step: 536560 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:00:42,936-Speed 2630.96 samples/sec Loss 4.7586 LearningRate 0.0125 Epoch: 12 Global Step: 536570 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:00:46,835-Speed 2627.14 samples/sec Loss 4.8048 LearningRate 0.0125 Epoch: 12 Global Step: 536580 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:00:50,741-Speed 2622.59 samples/sec Loss 4.6975 LearningRate 0.0125 Epoch: 12 Global Step: 536590 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:00:54,656-Speed 2616.85 samples/sec Loss 4.6839 LearningRate 0.0125 Epoch: 12 Global Step: 536600 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:00:58,552-Speed 2628.65 samples/sec Loss 4.8262 LearningRate 0.0125 Epoch: 12 Global Step: 536610 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:01:02,453-Speed 2626.08 samples/sec Loss 4.8583 LearningRate 0.0125 Epoch: 12 Global Step: 536620 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:01:06,345-Speed 2631.76 samples/sec Loss 4.7259 LearningRate 0.0125 Epoch: 12 Global Step: 536630 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:01:10,218-Speed 2644.31 samples/sec Loss 4.7022 LearningRate 0.0125 Epoch: 12 Global Step: 536640 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:01:14,113-Speed 2629.54 samples/sec Loss 4.6780 LearningRate 0.0125 Epoch: 12 Global Step: 536650 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:01:18,021-Speed 2621.82 samples/sec Loss 4.7600 LearningRate 0.0125 Epoch: 12 Global Step: 536660 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:01:21,914-Speed 2631.06 samples/sec Loss 4.6855 LearningRate 0.0125 Epoch: 12 Global Step: 536670 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:01:25,826-Speed 2618.27 samples/sec Loss 4.8269 LearningRate 0.0125 Epoch: 12 Global Step: 536680 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:01:29,718-Speed 2631.96 samples/sec Loss 4.7781 LearningRate 0.0125 Epoch: 12 Global Step: 536690 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:01:33,650-Speed 2604.93 samples/sec Loss 4.6970 LearningRate 0.0125 Epoch: 12 Global Step: 536700 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:01:37,553-Speed 2624.39 samples/sec Loss 4.7319 LearningRate 0.0125 Epoch: 12 Global Step: 536710 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:01:41,448-Speed 2629.59 samples/sec Loss 4.6792 LearningRate 0.0125 Epoch: 12 Global Step: 536720 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:01:45,338-Speed 2633.24 samples/sec Loss 4.6136 LearningRate 0.0125 Epoch: 12 Global Step: 536730 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:01:49,259-Speed 2612.62 samples/sec Loss 4.7391 LearningRate 0.0125 Epoch: 12 Global Step: 536740 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:01:53,158-Speed 2626.84 samples/sec Loss 4.8927 LearningRate 0.0125 Epoch: 12 Global Step: 536750 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:01:57,042-Speed 2637.60 samples/sec Loss 4.7670 LearningRate 0.0125 Epoch: 12 Global Step: 536760 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:02:00,943-Speed 2625.68 samples/sec Loss 4.7760 LearningRate 0.0125 Epoch: 12 Global Step: 536770 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:02:04,838-Speed 2630.10 samples/sec Loss 4.7646 LearningRate 0.0125 Epoch: 12 Global Step: 536780 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:02:08,732-Speed 2630.13 samples/sec Loss 4.7727 LearningRate 0.0125 Epoch: 12 Global Step: 536790 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:02:12,632-Speed 2625.77 samples/sec Loss 4.7140 LearningRate 0.0125 Epoch: 12 Global Step: 536800 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:02:16,527-Speed 2629.57 samples/sec Loss 4.7413 LearningRate 0.0125 Epoch: 12 Global Step: 536810 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:02:20,428-Speed 2625.96 samples/sec Loss 4.7119 LearningRate 0.0125 Epoch: 12 Global Step: 536820 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:02:24,329-Speed 2625.54 samples/sec Loss 4.7393 LearningRate 0.0125 Epoch: 12 Global Step: 536830 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:02:28,247-Speed 2614.23 samples/sec Loss 4.6323 LearningRate 0.0125 Epoch: 12 Global Step: 536840 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:02:32,147-Speed 2625.95 samples/sec Loss 4.7208 LearningRate 0.0125 Epoch: 12 Global Step: 536850 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:02:36,062-Speed 2616.34 samples/sec Loss 4.7474 LearningRate 0.0125 Epoch: 12 Global Step: 536860 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:02:39,966-Speed 2623.95 samples/sec Loss 4.6553 LearningRate 0.0124 Epoch: 12 Global Step: 536870 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:02:43,864-Speed 2627.09 samples/sec Loss 4.7621 LearningRate 0.0124 Epoch: 12 Global Step: 536880 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:02:47,773-Speed 2621.24 samples/sec Loss 4.7229 LearningRate 0.0124 Epoch: 12 Global Step: 536890 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:02:51,671-Speed 2627.21 samples/sec Loss 4.7338 LearningRate 0.0124 Epoch: 12 Global Step: 536900 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:02:55,569-Speed 2628.24 samples/sec Loss 4.7319 LearningRate 0.0124 Epoch: 12 Global Step: 536910 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:02:59,464-Speed 2629.47 samples/sec Loss 4.7654 LearningRate 0.0124 Epoch: 12 Global Step: 536920 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:03:03,380-Speed 2615.31 samples/sec Loss 4.8072 LearningRate 0.0124 Epoch: 12 Global Step: 536930 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:03:07,326-Speed 2595.33 samples/sec Loss 4.6411 LearningRate 0.0124 Epoch: 12 Global Step: 536940 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:03:11,316-Speed 2568.37 samples/sec Loss 4.7834 LearningRate 0.0124 Epoch: 12 Global Step: 536950 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:03:15,186-Speed 2646.63 samples/sec Loss 4.8387 LearningRate 0.0124 Epoch: 12 Global Step: 536960 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:03:19,102-Speed 2615.44 samples/sec Loss 4.8021 LearningRate 0.0124 Epoch: 12 Global Step: 536970 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:03:23,014-Speed 2618.76 samples/sec Loss 4.7803 LearningRate 0.0124 Epoch: 12 Global Step: 536980 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:03:26,908-Speed 2630.32 samples/sec Loss 4.7810 LearningRate 0.0124 Epoch: 12 Global Step: 536990 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:03:30,814-Speed 2622.17 samples/sec Loss 4.7616 LearningRate 0.0124 Epoch: 12 Global Step: 537000 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:03:34,695-Speed 2639.41 samples/sec Loss 4.6803 LearningRate 0.0124 Epoch: 12 Global Step: 537010 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:03:38,592-Speed 2628.24 samples/sec Loss 4.7034 LearningRate 0.0124 Epoch: 12 Global Step: 537020 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:03:42,485-Speed 2630.67 samples/sec Loss 4.8750 LearningRate 0.0124 Epoch: 12 Global Step: 537030 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:03:46,399-Speed 2617.57 samples/sec Loss 4.7252 LearningRate 0.0124 Epoch: 12 Global Step: 537040 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:03:50,297-Speed 2627.66 samples/sec Loss 4.7842 LearningRate 0.0124 Epoch: 12 Global Step: 537050 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:03:54,201-Speed 2624.05 samples/sec Loss 4.8222 LearningRate 0.0124 Epoch: 12 Global Step: 537060 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:03:58,109-Speed 2620.85 samples/sec Loss 4.6946 LearningRate 0.0124 Epoch: 12 Global Step: 537070 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:04:02,009-Speed 2626.56 samples/sec Loss 4.8079 LearningRate 0.0124 Epoch: 12 Global Step: 537080 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:04:05,903-Speed 2630.37 samples/sec Loss 4.7977 LearningRate 0.0124 Epoch: 12 Global Step: 537090 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:04:09,804-Speed 2625.28 samples/sec Loss 4.8120 LearningRate 0.0124 Epoch: 12 Global Step: 537100 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:04:13,703-Speed 2627.16 samples/sec Loss 4.7965 LearningRate 0.0124 Epoch: 12 Global Step: 537110 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:04:17,606-Speed 2623.84 samples/sec Loss 4.8502 LearningRate 0.0124 Epoch: 12 Global Step: 537120 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:04:21,498-Speed 2632.29 samples/sec Loss 4.7747 LearningRate 0.0124 Epoch: 12 Global Step: 537130 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:04:25,394-Speed 2628.65 samples/sec Loss 4.8665 LearningRate 0.0124 Epoch: 12 Global Step: 537140 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:04:29,291-Speed 2628.78 samples/sec Loss 4.7999 LearningRate 0.0124 Epoch: 12 Global Step: 537150 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:04:33,187-Speed 2629.15 samples/sec Loss 4.7888 LearningRate 0.0124 Epoch: 12 Global Step: 537160 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:04:37,094-Speed 2621.35 samples/sec Loss 4.6714 LearningRate 0.0124 Epoch: 12 Global Step: 537170 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:04:40,990-Speed 2628.69 samples/sec Loss 4.7357 LearningRate 0.0124 Epoch: 12 Global Step: 537180 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:04:44,888-Speed 2627.58 samples/sec Loss 4.7655 LearningRate 0.0124 Epoch: 12 Global Step: 537190 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:04:48,790-Speed 2624.95 samples/sec Loss 4.7251 LearningRate 0.0124 Epoch: 12 Global Step: 537200 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:04:52,687-Speed 2628.73 samples/sec Loss 4.7330 LearningRate 0.0124 Epoch: 12 Global Step: 537210 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:04:56,583-Speed 2628.94 samples/sec Loss 4.6427 LearningRate 0.0124 Epoch: 12 Global Step: 537220 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:05:00,536-Speed 2591.70 samples/sec Loss 4.7344 LearningRate 0.0124 Epoch: 12 Global Step: 537230 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:05:04,428-Speed 2631.26 samples/sec Loss 4.7513 LearningRate 0.0124 Epoch: 12 Global Step: 537240 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:05:08,329-Speed 2625.58 samples/sec Loss 4.6873 LearningRate 0.0124 Epoch: 12 Global Step: 537250 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:05:12,236-Speed 2621.32 samples/sec Loss 4.9007 LearningRate 0.0124 Epoch: 12 Global Step: 537260 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:05:16,140-Speed 2624.45 samples/sec Loss 4.7607 LearningRate 0.0124 Epoch: 12 Global Step: 537270 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:05:20,036-Speed 2629.09 samples/sec Loss 4.7992 LearningRate 0.0124 Epoch: 12 Global Step: 537280 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:05:23,913-Speed 2642.08 samples/sec Loss 4.7878 LearningRate 0.0124 Epoch: 12 Global Step: 537290 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:05:27,806-Speed 2630.99 samples/sec Loss 4.6741 LearningRate 0.0124 Epoch: 12 Global Step: 537300 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:05:31,701-Speed 2629.63 samples/sec Loss 4.7644 LearningRate 0.0124 Epoch: 12 Global Step: 537310 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:05:35,609-Speed 2621.36 samples/sec Loss 4.7494 LearningRate 0.0124 Epoch: 12 Global Step: 537320 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:05:39,506-Speed 2627.91 samples/sec Loss 4.7284 LearningRate 0.0124 Epoch: 12 Global Step: 537330 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:05:43,410-Speed 2624.26 samples/sec Loss 4.8236 LearningRate 0.0124 Epoch: 12 Global Step: 537340 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:05:47,354-Speed 2597.14 samples/sec Loss 4.7677 LearningRate 0.0124 Epoch: 12 Global Step: 537350 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:05:51,287-Speed 2603.89 samples/sec Loss 4.8109 LearningRate 0.0124 Epoch: 12 Global Step: 537360 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:05:55,185-Speed 2628.10 samples/sec Loss 4.6589 LearningRate 0.0124 Epoch: 12 Global Step: 537370 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:05:59,080-Speed 2630.00 samples/sec Loss 4.6584 LearningRate 0.0124 Epoch: 12 Global Step: 537380 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:06:02,995-Speed 2616.18 samples/sec Loss 4.6901 LearningRate 0.0124 Epoch: 12 Global Step: 537390 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:06:06,914-Speed 2612.97 samples/sec Loss 4.7910 LearningRate 0.0124 Epoch: 12 Global Step: 537400 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:06:10,818-Speed 2624.13 samples/sec Loss 4.7748 LearningRate 0.0124 Epoch: 12 Global Step: 537410 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:06:14,720-Speed 2625.41 samples/sec Loss 4.5919 LearningRate 0.0124 Epoch: 12 Global Step: 537420 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:06:18,617-Speed 2627.63 samples/sec Loss 4.8329 LearningRate 0.0124 Epoch: 12 Global Step: 537430 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:06:22,519-Speed 2625.16 samples/sec Loss 4.7728 LearningRate 0.0124 Epoch: 12 Global Step: 537440 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:06:26,423-Speed 2624.21 samples/sec Loss 4.7259 LearningRate 0.0124 Epoch: 12 Global Step: 537450 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:06:30,300-Speed 2641.77 samples/sec Loss 4.6802 LearningRate 0.0124 Epoch: 12 Global Step: 537460 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:06:34,222-Speed 2611.55 samples/sec Loss 4.7319 LearningRate 0.0124 Epoch: 12 Global Step: 537470 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:06:38,133-Speed 2619.54 samples/sec Loss 4.7282 LearningRate 0.0124 Epoch: 12 Global Step: 537480 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:06:42,033-Speed 2626.46 samples/sec Loss 4.8488 LearningRate 0.0124 Epoch: 12 Global Step: 537490 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:06:45,935-Speed 2624.55 samples/sec Loss 4.7340 LearningRate 0.0124 Epoch: 12 Global Step: 537500 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:06:49,835-Speed 2625.95 samples/sec Loss 4.6642 LearningRate 0.0124 Epoch: 12 Global Step: 537510 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:06:53,740-Speed 2623.41 samples/sec Loss 4.6825 LearningRate 0.0124 Epoch: 12 Global Step: 537520 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:06:57,665-Speed 2609.43 samples/sec Loss 4.7674 LearningRate 0.0124 Epoch: 12 Global Step: 537530 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:07:01,581-Speed 2616.36 samples/sec Loss 4.7598 LearningRate 0.0124 Epoch: 12 Global Step: 537540 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:07:05,508-Speed 2608.43 samples/sec Loss 4.7346 LearningRate 0.0124 Epoch: 12 Global Step: 537550 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:07:09,419-Speed 2618.78 samples/sec Loss 4.7737 LearningRate 0.0124 Epoch: 12 Global Step: 537560 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:07:13,330-Speed 2619.34 samples/sec Loss 4.7280 LearningRate 0.0124 Epoch: 12 Global Step: 537570 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:07:17,237-Speed 2621.39 samples/sec Loss 4.6864 LearningRate 0.0124 Epoch: 12 Global Step: 537580 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:07:21,136-Speed 2627.04 samples/sec Loss 4.8741 LearningRate 0.0124 Epoch: 12 Global Step: 537590 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:07:25,057-Speed 2612.42 samples/sec Loss 4.7708 LearningRate 0.0124 Epoch: 12 Global Step: 537600 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:07:28,967-Speed 2619.73 samples/sec Loss 4.7989 LearningRate 0.0124 Epoch: 12 Global Step: 537610 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:07:32,874-Speed 2621.63 samples/sec Loss 4.7128 LearningRate 0.0124 Epoch: 12 Global Step: 537620 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:07:36,895-Speed 2547.92 samples/sec Loss 4.8079 LearningRate 0.0124 Epoch: 12 Global Step: 537630 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:07:40,794-Speed 2627.19 samples/sec Loss 4.7586 LearningRate 0.0124 Epoch: 12 Global Step: 537640 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:07:44,702-Speed 2620.38 samples/sec Loss 4.7584 LearningRate 0.0124 Epoch: 12 Global Step: 537650 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:07:48,615-Speed 2617.60 samples/sec Loss 4.6319 LearningRate 0.0124 Epoch: 12 Global Step: 537660 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:07:52,513-Speed 2627.74 samples/sec Loss 4.6427 LearningRate 0.0124 Epoch: 12 Global Step: 537670 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:07:56,415-Speed 2625.36 samples/sec Loss 4.7279 LearningRate 0.0124 Epoch: 12 Global Step: 537680 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:08:00,312-Speed 2628.55 samples/sec Loss 4.6504 LearningRate 0.0124 Epoch: 12 Global Step: 537690 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:08:04,240-Speed 2607.58 samples/sec Loss 4.6799 LearningRate 0.0124 Epoch: 12 Global Step: 537700 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:08:08,156-Speed 2616.74 samples/sec Loss 4.7348 LearningRate 0.0124 Epoch: 12 Global Step: 537710 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:08:12,060-Speed 2623.15 samples/sec Loss 4.7287 LearningRate 0.0124 Epoch: 12 Global Step: 537720 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:08:15,961-Speed 2625.40 samples/sec Loss 4.7839 LearningRate 0.0124 Epoch: 12 Global Step: 537730 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:08:19,860-Speed 2627.01 samples/sec Loss 4.6062 LearningRate 0.0124 Epoch: 12 Global Step: 537740 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:08:23,913-Speed 2528.00 samples/sec Loss 4.7708 LearningRate 0.0124 Epoch: 12 Global Step: 537750 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:08:27,869-Speed 2588.65 samples/sec Loss 4.7097 LearningRate 0.0124 Epoch: 12 Global Step: 537760 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:08:31,774-Speed 2623.21 samples/sec Loss 4.7728 LearningRate 0.0124 Epoch: 12 Global Step: 537770 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:08:35,671-Speed 2628.39 samples/sec Loss 4.6764 LearningRate 0.0124 Epoch: 12 Global Step: 537780 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:08:39,598-Speed 2608.36 samples/sec Loss 4.7604 LearningRate 0.0124 Epoch: 12 Global Step: 537790 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:08:43,513-Speed 2616.43 samples/sec Loss 4.7951 LearningRate 0.0124 Epoch: 12 Global Step: 537800 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:08:47,389-Speed 2642.76 samples/sec Loss 4.7023 LearningRate 0.0124 Epoch: 12 Global Step: 537810 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:08:51,281-Speed 2631.12 samples/sec Loss 4.6970 LearningRate 0.0124 Epoch: 12 Global Step: 537820 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:08:55,177-Speed 2629.77 samples/sec Loss 4.7771 LearningRate 0.0124 Epoch: 12 Global Step: 537830 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:08:59,072-Speed 2629.35 samples/sec Loss 4.7149 LearningRate 0.0124 Epoch: 12 Global Step: 537840 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:09:02,975-Speed 2624.10 samples/sec Loss 4.6675 LearningRate 0.0124 Epoch: 12 Global Step: 537850 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:09:06,870-Speed 2629.26 samples/sec Loss 4.7473 LearningRate 0.0124 Epoch: 12 Global Step: 537860 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:09:10,766-Speed 2629.26 samples/sec Loss 4.6964 LearningRate 0.0124 Epoch: 12 Global Step: 537870 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:09:14,775-Speed 2555.19 samples/sec Loss 4.6163 LearningRate 0.0124 Epoch: 12 Global Step: 537880 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:09:18,678-Speed 2624.21 samples/sec Loss 4.6750 LearningRate 0.0124 Epoch: 12 Global Step: 537890 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:09:22,571-Speed 2631.15 samples/sec Loss 4.7145 LearningRate 0.0124 Epoch: 12 Global Step: 537900 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:09:26,463-Speed 2631.89 samples/sec Loss 4.7423 LearningRate 0.0124 Epoch: 12 Global Step: 537910 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:09:30,336-Speed 2644.48 samples/sec Loss 4.7502 LearningRate 0.0124 Epoch: 12 Global Step: 537920 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:09:34,234-Speed 2627.93 samples/sec Loss 4.6974 LearningRate 0.0124 Epoch: 12 Global Step: 537930 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:09:38,141-Speed 2621.48 samples/sec Loss 4.7306 LearningRate 0.0124 Epoch: 12 Global Step: 537940 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:09:42,042-Speed 2625.54 samples/sec Loss 4.6906 LearningRate 0.0124 Epoch: 12 Global Step: 537950 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:09:45,938-Speed 2628.93 samples/sec Loss 4.6845 LearningRate 0.0124 Epoch: 12 Global Step: 537960 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:09:49,833-Speed 2629.44 samples/sec Loss 4.6285 LearningRate 0.0124 Epoch: 12 Global Step: 537970 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:09:53,726-Speed 2631.57 samples/sec Loss 4.7235 LearningRate 0.0124 Epoch: 12 Global Step: 537980 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:09:57,712-Speed 2569.26 samples/sec Loss 4.6921 LearningRate 0.0124 Epoch: 12 Global Step: 537990 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:10:01,606-Speed 2630.78 samples/sec Loss 4.6785 LearningRate 0.0124 Epoch: 12 Global Step: 538000 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:10:05,500-Speed 2630.15 samples/sec Loss 4.8112 LearningRate 0.0124 Epoch: 12 Global Step: 538010 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:10:09,391-Speed 2632.36 samples/sec Loss 4.6811 LearningRate 0.0124 Epoch: 12 Global Step: 538020 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:10:13,282-Speed 2632.22 samples/sec Loss 4.6772 LearningRate 0.0124 Epoch: 12 Global Step: 538030 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:10:17,183-Speed 2626.13 samples/sec Loss 4.7035 LearningRate 0.0124 Epoch: 12 Global Step: 538040 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:10:21,089-Speed 2622.48 samples/sec Loss 4.8658 LearningRate 0.0123 Epoch: 12 Global Step: 538050 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:10:24,998-Speed 2620.49 samples/sec Loss 4.6160 LearningRate 0.0123 Epoch: 12 Global Step: 538060 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:10:28,898-Speed 2626.35 samples/sec Loss 4.9229 LearningRate 0.0123 Epoch: 12 Global Step: 538070 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:10:32,798-Speed 2626.12 samples/sec Loss 4.6714 LearningRate 0.0123 Epoch: 12 Global Step: 538080 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:10:36,695-Speed 2628.25 samples/sec Loss 4.6741 LearningRate 0.0123 Epoch: 12 Global Step: 538090 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:10:40,594-Speed 2626.89 samples/sec Loss 4.6891 LearningRate 0.0123 Epoch: 12 Global Step: 538100 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:10:44,507-Speed 2617.84 samples/sec Loss 4.6353 LearningRate 0.0123 Epoch: 12 Global Step: 538110 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:10:48,410-Speed 2624.10 samples/sec Loss 4.7309 LearningRate 0.0123 Epoch: 12 Global Step: 538120 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:10:52,327-Speed 2615.83 samples/sec Loss 4.7364 LearningRate 0.0123 Epoch: 12 Global Step: 538130 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:10:56,211-Speed 2636.76 samples/sec Loss 4.7358 LearningRate 0.0123 Epoch: 12 Global Step: 538140 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:11:00,197-Speed 2569.52 samples/sec Loss 4.7963 LearningRate 0.0123 Epoch: 12 Global Step: 538150 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:11:04,101-Speed 2623.56 samples/sec Loss 4.7939 LearningRate 0.0123 Epoch: 12 Global Step: 538160 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:11:07,993-Speed 2631.61 samples/sec Loss 4.7377 LearningRate 0.0123 Epoch: 12 Global Step: 538170 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:11:11,887-Speed 2630.86 samples/sec Loss 4.7390 LearningRate 0.0123 Epoch: 12 Global Step: 538180 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:11:15,785-Speed 2629.43 samples/sec Loss 4.7622 LearningRate 0.0123 Epoch: 12 Global Step: 538190 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:11:19,693-Speed 2620.13 samples/sec Loss 4.6948 LearningRate 0.0123 Epoch: 12 Global Step: 538200 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:11:23,631-Speed 2601.79 samples/sec Loss 4.7755 LearningRate 0.0123 Epoch: 12 Global Step: 538210 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:11:27,529-Speed 2627.43 samples/sec Loss 4.7431 LearningRate 0.0123 Epoch: 12 Global Step: 538220 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:11:31,425-Speed 2629.53 samples/sec Loss 4.7374 LearningRate 0.0123 Epoch: 12 Global Step: 538230 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:11:35,341-Speed 2615.24 samples/sec Loss 4.6905 LearningRate 0.0123 Epoch: 12 Global Step: 538240 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:11:39,235-Speed 2630.34 samples/sec Loss 4.7231 LearningRate 0.0123 Epoch: 12 Global Step: 538250 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:11:43,130-Speed 2629.63 samples/sec Loss 4.6770 LearningRate 0.0123 Epoch: 12 Global Step: 538260 Fp16 Grad Scale: 32768 Required: 33 hours
Training: 2022-04-15 08:11:47,023-Speed 2631.52 samples/sec Loss 4.6346 LearningRate 0.0123 Epoch: 12 Global Step: 538270 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:11:50,958-Speed 2603.53 samples/sec Loss 4.7209 LearningRate 0.0123 Epoch: 12 Global Step: 538280 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:11:54,860-Speed 2624.38 samples/sec Loss 4.7859 LearningRate 0.0123 Epoch: 12 Global Step: 538290 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:11:58,789-Speed 2607.48 samples/sec Loss 4.7554 LearningRate 0.0123 Epoch: 12 Global Step: 538300 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:12:02,681-Speed 2631.36 samples/sec Loss 4.7968 LearningRate 0.0123 Epoch: 12 Global Step: 538310 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:12:06,576-Speed 2630.14 samples/sec Loss 4.7234 LearningRate 0.0123 Epoch: 12 Global Step: 538320 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:12:10,474-Speed 2627.31 samples/sec Loss 4.6774 LearningRate 0.0123 Epoch: 12 Global Step: 538330 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:12:14,406-Speed 2604.87 samples/sec Loss 4.7233 LearningRate 0.0123 Epoch: 12 Global Step: 538340 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:12:18,309-Speed 2624.84 samples/sec Loss 4.7570 LearningRate 0.0123 Epoch: 12 Global Step: 538350 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:12:22,208-Speed 2629.50 samples/sec Loss 4.7003 LearningRate 0.0123 Epoch: 12 Global Step: 538360 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:12:26,091-Speed 2638.56 samples/sec Loss 4.6953 LearningRate 0.0123 Epoch: 12 Global Step: 538370 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:12:29,986-Speed 2629.35 samples/sec Loss 4.7484 LearningRate 0.0123 Epoch: 12 Global Step: 538380 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:12:33,886-Speed 2627.06 samples/sec Loss 4.7405 LearningRate 0.0123 Epoch: 12 Global Step: 538390 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:12:37,784-Speed 2628.02 samples/sec Loss 4.6670 LearningRate 0.0123 Epoch: 12 Global Step: 538400 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:12:41,681-Speed 2628.49 samples/sec Loss 4.6144 LearningRate 0.0123 Epoch: 12 Global Step: 538410 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:12:45,593-Speed 2617.44 samples/sec Loss 4.7054 LearningRate 0.0123 Epoch: 12 Global Step: 538420 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:12:49,489-Speed 2628.96 samples/sec Loss 4.7314 LearningRate 0.0123 Epoch: 12 Global Step: 538430 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:12:53,388-Speed 2627.25 samples/sec Loss 4.6657 LearningRate 0.0123 Epoch: 12 Global Step: 538440 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:12:57,288-Speed 2626.26 samples/sec Loss 4.7503 LearningRate 0.0123 Epoch: 12 Global Step: 538450 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:13:01,189-Speed 2625.98 samples/sec Loss 4.7465 LearningRate 0.0123 Epoch: 12 Global Step: 538460 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:13:05,099-Speed 2619.19 samples/sec Loss 4.6886 LearningRate 0.0123 Epoch: 12 Global Step: 538470 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:13:08,981-Speed 2639.78 samples/sec Loss 4.7599 LearningRate 0.0123 Epoch: 12 Global Step: 538480 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:13:12,877-Speed 2628.77 samples/sec Loss 4.7519 LearningRate 0.0123 Epoch: 12 Global Step: 538490 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:13:16,780-Speed 2624.06 samples/sec Loss 4.7332 LearningRate 0.0123 Epoch: 12 Global Step: 538500 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:13:20,685-Speed 2623.03 samples/sec Loss 4.7527 LearningRate 0.0123 Epoch: 12 Global Step: 538510 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:13:24,583-Speed 2627.45 samples/sec Loss 4.8139 LearningRate 0.0123 Epoch: 12 Global Step: 538520 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:13:28,486-Speed 2624.22 samples/sec Loss 4.7168 LearningRate 0.0123 Epoch: 12 Global Step: 538530 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:13:32,388-Speed 2624.86 samples/sec Loss 4.6780 LearningRate 0.0123 Epoch: 12 Global Step: 538540 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:13:36,289-Speed 2625.67 samples/sec Loss 4.7085 LearningRate 0.0123 Epoch: 12 Global Step: 538550 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:13:40,188-Speed 2627.02 samples/sec Loss 4.7346 LearningRate 0.0123 Epoch: 12 Global Step: 538560 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:13:44,083-Speed 2629.07 samples/sec Loss 4.8157 LearningRate 0.0123 Epoch: 12 Global Step: 538570 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:13:47,977-Speed 2630.11 samples/sec Loss 4.7506 LearningRate 0.0123 Epoch: 12 Global Step: 538580 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:13:51,873-Speed 2629.04 samples/sec Loss 4.7545 LearningRate 0.0123 Epoch: 12 Global Step: 538590 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:13:55,773-Speed 2626.79 samples/sec Loss 4.7732 LearningRate 0.0123 Epoch: 12 Global Step: 538600 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:13:59,684-Speed 2618.78 samples/sec Loss 4.7342 LearningRate 0.0123 Epoch: 12 Global Step: 538610 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:14:03,581-Speed 2628.62 samples/sec Loss 4.7277 LearningRate 0.0123 Epoch: 12 Global Step: 538620 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:14:07,484-Speed 2624.26 samples/sec Loss 4.7747 LearningRate 0.0123 Epoch: 12 Global Step: 538630 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:14:11,392-Speed 2620.84 samples/sec Loss 4.7717 LearningRate 0.0123 Epoch: 12 Global Step: 538640 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:14:15,302-Speed 2619.56 samples/sec Loss 4.7908 LearningRate 0.0123 Epoch: 12 Global Step: 538650 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:14:19,219-Speed 2614.33 samples/sec Loss 4.7054 LearningRate 0.0123 Epoch: 12 Global Step: 538660 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:14:23,132-Speed 2617.20 samples/sec Loss 4.7679 LearningRate 0.0123 Epoch: 12 Global Step: 538670 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:14:27,051-Speed 2614.19 samples/sec Loss 4.7295 LearningRate 0.0123 Epoch: 12 Global Step: 538680 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:14:30,948-Speed 2627.89 samples/sec Loss 4.7186 LearningRate 0.0123 Epoch: 12 Global Step: 538690 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:14:34,847-Speed 2627.15 samples/sec Loss 4.6661 LearningRate 0.0123 Epoch: 12 Global Step: 538700 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:14:38,749-Speed 2625.39 samples/sec Loss 4.7542 LearningRate 0.0123 Epoch: 12 Global Step: 538710 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:14:42,645-Speed 2629.02 samples/sec Loss 4.7735 LearningRate 0.0123 Epoch: 12 Global Step: 538720 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:14:46,546-Speed 2625.65 samples/sec Loss 4.6812 LearningRate 0.0123 Epoch: 12 Global Step: 538730 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:14:50,500-Speed 2589.98 samples/sec Loss 4.7904 LearningRate 0.0123 Epoch: 12 Global Step: 538740 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:14:54,400-Speed 2626.67 samples/sec Loss 4.7646 LearningRate 0.0123 Epoch: 12 Global Step: 538750 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:14:58,297-Speed 2627.90 samples/sec Loss 4.7624 LearningRate 0.0123 Epoch: 12 Global Step: 538760 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:15:02,236-Speed 2600.49 samples/sec Loss 4.7533 LearningRate 0.0123 Epoch: 12 Global Step: 538770 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:15:06,133-Speed 2628.15 samples/sec Loss 4.7456 LearningRate 0.0123 Epoch: 12 Global Step: 538780 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:15:10,026-Speed 2631.31 samples/sec Loss 4.7157 LearningRate 0.0123 Epoch: 12 Global Step: 538790 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:15:13,923-Speed 2627.77 samples/sec Loss 4.6304 LearningRate 0.0123 Epoch: 12 Global Step: 538800 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:15:17,824-Speed 2626.11 samples/sec Loss 4.6514 LearningRate 0.0123 Epoch: 12 Global Step: 538810 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:15:21,727-Speed 2624.33 samples/sec Loss 4.7276 LearningRate 0.0123 Epoch: 12 Global Step: 538820 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:15:25,602-Speed 2643.16 samples/sec Loss 4.7099 LearningRate 0.0123 Epoch: 12 Global Step: 538830 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:15:29,509-Speed 2621.87 samples/sec Loss 4.7447 LearningRate 0.0123 Epoch: 12 Global Step: 538840 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:15:33,426-Speed 2614.47 samples/sec Loss 4.7612 LearningRate 0.0123 Epoch: 12 Global Step: 538850 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:15:37,329-Speed 2624.09 samples/sec Loss 4.7025 LearningRate 0.0123 Epoch: 12 Global Step: 538860 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:15:41,236-Speed 2621.12 samples/sec Loss 4.7450 LearningRate 0.0123 Epoch: 12 Global Step: 538870 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:15:45,131-Speed 2629.76 samples/sec Loss 4.6741 LearningRate 0.0123 Epoch: 12 Global Step: 538880 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:15:49,027-Speed 2629.27 samples/sec Loss 4.7283 LearningRate 0.0123 Epoch: 12 Global Step: 538890 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:15:52,929-Speed 2624.88 samples/sec Loss 4.7231 LearningRate 0.0123 Epoch: 12 Global Step: 538900 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:15:56,826-Speed 2628.54 samples/sec Loss 4.6412 LearningRate 0.0123 Epoch: 12 Global Step: 538910 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:16:00,722-Speed 2628.86 samples/sec Loss 4.7631 LearningRate 0.0123 Epoch: 12 Global Step: 538920 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:16:04,660-Speed 2600.66 samples/sec Loss 4.7849 LearningRate 0.0123 Epoch: 12 Global Step: 538930 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:16:08,556-Speed 2629.19 samples/sec Loss 4.7026 LearningRate 0.0123 Epoch: 12 Global Step: 538940 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:16:12,448-Speed 2631.51 samples/sec Loss 4.7360 LearningRate 0.0123 Epoch: 12 Global Step: 538950 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:16:16,345-Speed 2628.04 samples/sec Loss 4.7842 LearningRate 0.0123 Epoch: 12 Global Step: 538960 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:16:20,241-Speed 2629.22 samples/sec Loss 4.6564 LearningRate 0.0123 Epoch: 12 Global Step: 538970 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:16:24,114-Speed 2644.34 samples/sec Loss 4.7836 LearningRate 0.0123 Epoch: 12 Global Step: 538980 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:16:28,011-Speed 2628.71 samples/sec Loss 4.7422 LearningRate 0.0123 Epoch: 12 Global Step: 538990 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:16:31,908-Speed 2628.54 samples/sec Loss 4.7790 LearningRate 0.0123 Epoch: 12 Global Step: 539000 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:16:35,804-Speed 2628.73 samples/sec Loss 4.7853 LearningRate 0.0123 Epoch: 12 Global Step: 539010 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:16:39,700-Speed 2628.55 samples/sec Loss 4.7716 LearningRate 0.0123 Epoch: 12 Global Step: 539020 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:16:43,599-Speed 2626.83 samples/sec Loss 4.7216 LearningRate 0.0123 Epoch: 12 Global Step: 539030 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:16:47,501-Speed 2625.46 samples/sec Loss 4.7835 LearningRate 0.0123 Epoch: 12 Global Step: 539040 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:16:51,395-Speed 2630.13 samples/sec Loss 4.7045 LearningRate 0.0123 Epoch: 12 Global Step: 539050 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:16:55,300-Speed 2622.73 samples/sec Loss 4.5857 LearningRate 0.0123 Epoch: 12 Global Step: 539060 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:16:59,202-Speed 2625.52 samples/sec Loss 4.6983 LearningRate 0.0123 Epoch: 12 Global Step: 539070 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:17:03,098-Speed 2628.58 samples/sec Loss 4.7311 LearningRate 0.0123 Epoch: 12 Global Step: 539080 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:17:06,998-Speed 2626.20 samples/sec Loss 4.8203 LearningRate 0.0123 Epoch: 12 Global Step: 539090 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:17:10,895-Speed 2628.15 samples/sec Loss 4.7000 LearningRate 0.0123 Epoch: 12 Global Step: 539100 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:17:14,772-Speed 2642.58 samples/sec Loss 4.6610 LearningRate 0.0123 Epoch: 12 Global Step: 539110 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:17:18,674-Speed 2624.18 samples/sec Loss 4.7045 LearningRate 0.0123 Epoch: 12 Global Step: 539120 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:17:22,573-Speed 2627.39 samples/sec Loss 4.7425 LearningRate 0.0123 Epoch: 12 Global Step: 539130 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:17:26,470-Speed 2628.19 samples/sec Loss 4.7401 LearningRate 0.0123 Epoch: 12 Global Step: 539140 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:17:30,369-Speed 2627.11 samples/sec Loss 4.7179 LearningRate 0.0123 Epoch: 12 Global Step: 539150 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:17:34,268-Speed 2627.00 samples/sec Loss 4.7911 LearningRate 0.0123 Epoch: 12 Global Step: 539160 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:17:38,164-Speed 2628.74 samples/sec Loss 4.7737 LearningRate 0.0123 Epoch: 12 Global Step: 539170 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:17:42,060-Speed 2629.15 samples/sec Loss 4.8351 LearningRate 0.0123 Epoch: 12 Global Step: 539180 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:17:45,954-Speed 2630.61 samples/sec Loss 4.8690 LearningRate 0.0123 Epoch: 12 Global Step: 539190 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:17:49,848-Speed 2629.67 samples/sec Loss 4.6936 LearningRate 0.0123 Epoch: 12 Global Step: 539200 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:17:53,740-Speed 2632.20 samples/sec Loss 4.7707 LearningRate 0.0123 Epoch: 12 Global Step: 539210 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:17:57,633-Speed 2630.83 samples/sec Loss 4.7296 LearningRate 0.0123 Epoch: 12 Global Step: 539220 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:18:19,113-Speed 476.74 samples/sec Loss 4.7703 LearningRate 0.0122 Epoch: 13 Global Step: 539230 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:18:22,973-Speed 2654.25 samples/sec Loss 4.7500 LearningRate 0.0122 Epoch: 13 Global Step: 539240 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:18:26,867-Speed 2630.27 samples/sec Loss 4.7773 LearningRate 0.0122 Epoch: 13 Global Step: 539250 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:18:30,769-Speed 2624.94 samples/sec Loss 4.8075 LearningRate 0.0122 Epoch: 13 Global Step: 539260 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:18:34,658-Speed 2634.21 samples/sec Loss 4.7186 LearningRate 0.0122 Epoch: 13 Global Step: 539270 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:18:38,553-Speed 2629.58 samples/sec Loss 4.7342 LearningRate 0.0122 Epoch: 13 Global Step: 539280 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:18:42,449-Speed 2628.70 samples/sec Loss 4.7190 LearningRate 0.0122 Epoch: 13 Global Step: 539290 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:18:46,350-Speed 2626.31 samples/sec Loss 4.7301 LearningRate 0.0122 Epoch: 13 Global Step: 539300 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:18:50,252-Speed 2624.32 samples/sec Loss 4.6973 LearningRate 0.0122 Epoch: 13 Global Step: 539310 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:18:54,181-Speed 2607.33 samples/sec Loss 4.6979 LearningRate 0.0122 Epoch: 13 Global Step: 539320 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:18:58,080-Speed 2626.87 samples/sec Loss 4.6166 LearningRate 0.0122 Epoch: 13 Global Step: 539330 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:19:01,980-Speed 2626.81 samples/sec Loss 4.7125 LearningRate 0.0122 Epoch: 13 Global Step: 539340 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:19:05,859-Speed 2640.20 samples/sec Loss 4.7187 LearningRate 0.0122 Epoch: 13 Global Step: 539350 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:19:09,783-Speed 2610.46 samples/sec Loss 4.6808 LearningRate 0.0122 Epoch: 13 Global Step: 539360 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:19:13,680-Speed 2627.99 samples/sec Loss 4.6697 LearningRate 0.0122 Epoch: 13 Global Step: 539370 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:19:17,598-Speed 2614.67 samples/sec Loss 4.6173 LearningRate 0.0122 Epoch: 13 Global Step: 539380 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:19:21,496-Speed 2627.81 samples/sec Loss 4.8507 LearningRate 0.0122 Epoch: 13 Global Step: 539390 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:19:25,396-Speed 2626.18 samples/sec Loss 4.6801 LearningRate 0.0122 Epoch: 13 Global Step: 539400 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:19:29,325-Speed 2607.19 samples/sec Loss 4.7122 LearningRate 0.0122 Epoch: 13 Global Step: 539410 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:19:33,229-Speed 2623.95 samples/sec Loss 4.7245 LearningRate 0.0122 Epoch: 13 Global Step: 539420 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:19:37,129-Speed 2625.76 samples/sec Loss 4.7654 LearningRate 0.0122 Epoch: 13 Global Step: 539430 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:19:41,029-Speed 2626.36 samples/sec Loss 4.6609 LearningRate 0.0122 Epoch: 13 Global Step: 539440 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:19:44,959-Speed 2606.93 samples/sec Loss 4.7407 LearningRate 0.0122 Epoch: 13 Global Step: 539450 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:19:48,862-Speed 2623.90 samples/sec Loss 4.7325 LearningRate 0.0122 Epoch: 13 Global Step: 539460 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:19:52,791-Speed 2607.19 samples/sec Loss 4.6795 LearningRate 0.0122 Epoch: 13 Global Step: 539470 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:19:56,686-Speed 2630.18 samples/sec Loss 4.7473 LearningRate 0.0122 Epoch: 13 Global Step: 539480 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:20:00,586-Speed 2626.73 samples/sec Loss 4.7234 LearningRate 0.0122 Epoch: 13 Global Step: 539490 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:20:04,487-Speed 2625.14 samples/sec Loss 4.6194 LearningRate 0.0122 Epoch: 13 Global Step: 539500 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:20:08,383-Speed 2628.99 samples/sec Loss 4.6483 LearningRate 0.0122 Epoch: 13 Global Step: 539510 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:20:12,275-Speed 2631.68 samples/sec Loss 4.6652 LearningRate 0.0122 Epoch: 13 Global Step: 539520 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:20:16,177-Speed 2625.23 samples/sec Loss 4.7311 LearningRate 0.0122 Epoch: 13 Global Step: 539530 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:20:20,072-Speed 2629.39 samples/sec Loss 4.6404 LearningRate 0.0122 Epoch: 13 Global Step: 539540 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:20:23,983-Speed 2618.93 samples/sec Loss 4.7051 LearningRate 0.0122 Epoch: 13 Global Step: 539550 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:20:27,888-Speed 2622.80 samples/sec Loss 4.7686 LearningRate 0.0122 Epoch: 13 Global Step: 539560 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:20:31,787-Speed 2627.79 samples/sec Loss 4.7738 LearningRate 0.0122 Epoch: 13 Global Step: 539570 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:20:35,679-Speed 2631.55 samples/sec Loss 4.6848 LearningRate 0.0122 Epoch: 13 Global Step: 539580 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:20:39,583-Speed 2623.28 samples/sec Loss 4.7793 LearningRate 0.0122 Epoch: 13 Global Step: 539590 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:20:43,490-Speed 2621.74 samples/sec Loss 4.7407 LearningRate 0.0122 Epoch: 13 Global Step: 539600 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:20:47,385-Speed 2629.53 samples/sec Loss 4.6551 LearningRate 0.0122 Epoch: 13 Global Step: 539610 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:20:51,279-Speed 2630.47 samples/sec Loss 4.6717 LearningRate 0.0122 Epoch: 13 Global Step: 539620 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:20:55,176-Speed 2628.87 samples/sec Loss 4.7373 LearningRate 0.0122 Epoch: 13 Global Step: 539630 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:20:59,073-Speed 2627.87 samples/sec Loss 4.7437 LearningRate 0.0122 Epoch: 13 Global Step: 539640 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:21:02,971-Speed 2629.04 samples/sec Loss 4.5599 LearningRate 0.0122 Epoch: 13 Global Step: 539650 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:21:06,877-Speed 2621.51 samples/sec Loss 4.6794 LearningRate 0.0122 Epoch: 13 Global Step: 539660 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:21:10,779-Speed 2625.33 samples/sec Loss 4.6557 LearningRate 0.0122 Epoch: 13 Global Step: 539670 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:21:14,683-Speed 2623.47 samples/sec Loss 4.7406 LearningRate 0.0122 Epoch: 13 Global Step: 539680 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:21:18,556-Speed 2644.24 samples/sec Loss 4.7776 LearningRate 0.0122 Epoch: 13 Global Step: 539690 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:21:22,461-Speed 2623.31 samples/sec Loss 4.6803 LearningRate 0.0122 Epoch: 13 Global Step: 539700 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:21:26,371-Speed 2619.45 samples/sec Loss 4.6972 LearningRate 0.0122 Epoch: 13 Global Step: 539710 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:21:30,272-Speed 2626.23 samples/sec Loss 4.6774 LearningRate 0.0122 Epoch: 13 Global Step: 539720 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:21:34,171-Speed 2626.71 samples/sec Loss 4.6149 LearningRate 0.0122 Epoch: 13 Global Step: 539730 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:21:38,080-Speed 2619.72 samples/sec Loss 4.7018 LearningRate 0.0122 Epoch: 13 Global Step: 539740 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:21:41,977-Speed 2628.69 samples/sec Loss 4.6532 LearningRate 0.0122 Epoch: 13 Global Step: 539750 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:21:45,872-Speed 2630.58 samples/sec Loss 4.6450 LearningRate 0.0122 Epoch: 13 Global Step: 539760 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:21:49,767-Speed 2629.43 samples/sec Loss 4.7802 LearningRate 0.0122 Epoch: 13 Global Step: 539770 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:21:53,691-Speed 2610.79 samples/sec Loss 4.6431 LearningRate 0.0122 Epoch: 13 Global Step: 539780 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:21:57,561-Speed 2646.62 samples/sec Loss 4.5910 LearningRate 0.0122 Epoch: 13 Global Step: 539790 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:22:01,458-Speed 2628.89 samples/sec Loss 4.7147 LearningRate 0.0122 Epoch: 13 Global Step: 539800 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:22:05,375-Speed 2614.63 samples/sec Loss 4.6412 LearningRate 0.0122 Epoch: 13 Global Step: 539810 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:22:09,273-Speed 2627.37 samples/sec Loss 4.6482 LearningRate 0.0122 Epoch: 13 Global Step: 539820 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:22:13,170-Speed 2628.38 samples/sec Loss 4.7330 LearningRate 0.0122 Epoch: 13 Global Step: 539830 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:22:17,065-Speed 2630.59 samples/sec Loss 4.7910 LearningRate 0.0122 Epoch: 13 Global Step: 539840 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:22:20,967-Speed 2624.49 samples/sec Loss 4.5896 LearningRate 0.0122 Epoch: 13 Global Step: 539850 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:22:24,864-Speed 2628.36 samples/sec Loss 4.7403 LearningRate 0.0122 Epoch: 13 Global Step: 539860 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:22:28,770-Speed 2622.38 samples/sec Loss 4.7220 LearningRate 0.0122 Epoch: 13 Global Step: 539870 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:22:32,676-Speed 2622.79 samples/sec Loss 4.7391 LearningRate 0.0122 Epoch: 13 Global Step: 539880 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:22:36,570-Speed 2629.94 samples/sec Loss 4.7044 LearningRate 0.0122 Epoch: 13 Global Step: 539890 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:22:40,467-Speed 2628.37 samples/sec Loss 4.7056 LearningRate 0.0122 Epoch: 13 Global Step: 539900 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:22:44,366-Speed 2626.72 samples/sec Loss 4.7331 LearningRate 0.0122 Epoch: 13 Global Step: 539910 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:22:48,260-Speed 2630.68 samples/sec Loss 4.6811 LearningRate 0.0122 Epoch: 13 Global Step: 539920 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:22:52,155-Speed 2630.71 samples/sec Loss 4.7016 LearningRate 0.0122 Epoch: 13 Global Step: 539930 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:22:56,049-Speed 2630.27 samples/sec Loss 4.6914 LearningRate 0.0122 Epoch: 13 Global Step: 539940 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:22:59,963-Speed 2616.73 samples/sec Loss 4.6933 LearningRate 0.0122 Epoch: 13 Global Step: 539950 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:23:03,871-Speed 2621.21 samples/sec Loss 4.6264 LearningRate 0.0122 Epoch: 13 Global Step: 539960 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:23:07,774-Speed 2623.97 samples/sec Loss 4.6354 LearningRate 0.0122 Epoch: 13 Global Step: 539970 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:23:11,728-Speed 2590.44 samples/sec Loss 4.7110 LearningRate 0.0122 Epoch: 13 Global Step: 539980 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:23:15,637-Speed 2620.52 samples/sec Loss 4.6226 LearningRate 0.0122 Epoch: 13 Global Step: 539990 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:23:19,542-Speed 2623.10 samples/sec Loss 4.6408 LearningRate 0.0122 Epoch: 13 Global Step: 540000 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:24:02,982-[lfw][540000]XNorm: 22.376632
Training: 2022-04-15 08:24:02,983-[lfw][540000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-15 08:24:02,983-[lfw][540000]Accuracy-Highest: 0.99800
Training: 2022-04-15 08:24:53,381-[cfp_fp][540000]XNorm: 20.796383
Training: 2022-04-15 08:24:53,382-[cfp_fp][540000]Accuracy-Flip: 0.98971+-0.00423
Training: 2022-04-15 08:24:53,383-[cfp_fp][540000]Accuracy-Highest: 0.99086
Training: 2022-04-15 08:25:36,817-[agedb_30][540000]XNorm: 22.333698
Training: 2022-04-15 08:25:36,818-[agedb_30][540000]Accuracy-Flip: 0.97917+-0.00720
Training: 2022-04-15 08:25:36,819-[agedb_30][540000]Accuracy-Highest: 0.98083
Training: 2022-04-15 08:25:40,713-Speed 72.54 samples/sec Loss 4.6632 LearningRate 0.0122 Epoch: 13 Global Step: 540010 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:25:44,755-Speed 2533.87 samples/sec Loss 4.7313 LearningRate 0.0122 Epoch: 13 Global Step: 540020 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:25:48,639-Speed 2637.78 samples/sec Loss 4.7169 LearningRate 0.0122 Epoch: 13 Global Step: 540030 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:25:52,528-Speed 2633.30 samples/sec Loss 4.7077 LearningRate 0.0122 Epoch: 13 Global Step: 540040 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:25:56,403-Speed 2643.31 samples/sec Loss 4.6123 LearningRate 0.0122 Epoch: 13 Global Step: 540050 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:26:00,282-Speed 2640.73 samples/sec Loss 4.6621 LearningRate 0.0122 Epoch: 13 Global Step: 540060 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:26:04,161-Speed 2640.90 samples/sec Loss 4.7379 LearningRate 0.0122 Epoch: 13 Global Step: 540070 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:26:08,074-Speed 2617.57 samples/sec Loss 4.6431 LearningRate 0.0122 Epoch: 13 Global Step: 540080 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:26:11,964-Speed 2633.44 samples/sec Loss 4.6485 LearningRate 0.0122 Epoch: 13 Global Step: 540090 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:26:15,860-Speed 2629.71 samples/sec Loss 4.7362 LearningRate 0.0122 Epoch: 13 Global Step: 540100 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:26:19,741-Speed 2639.12 samples/sec Loss 4.7054 LearningRate 0.0122 Epoch: 13 Global Step: 540110 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:26:23,634-Speed 2630.35 samples/sec Loss 4.7243 LearningRate 0.0122 Epoch: 13 Global Step: 540120 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:26:27,533-Speed 2627.18 samples/sec Loss 4.6247 LearningRate 0.0122 Epoch: 13 Global Step: 540130 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:26:31,432-Speed 2627.33 samples/sec Loss 4.6666 LearningRate 0.0122 Epoch: 13 Global Step: 540140 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:26:35,307-Speed 2643.66 samples/sec Loss 4.6581 LearningRate 0.0122 Epoch: 13 Global Step: 540150 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:26:39,229-Speed 2611.23 samples/sec Loss 4.7735 LearningRate 0.0122 Epoch: 13 Global Step: 540160 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:26:43,122-Speed 2631.33 samples/sec Loss 4.6889 LearningRate 0.0122 Epoch: 13 Global Step: 540170 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:26:47,017-Speed 2629.73 samples/sec Loss 4.7480 LearningRate 0.0122 Epoch: 13 Global Step: 540180 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:26:50,912-Speed 2629.83 samples/sec Loss 4.7325 LearningRate 0.0122 Epoch: 13 Global Step: 540190 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:26:54,809-Speed 2627.72 samples/sec Loss 4.6339 LearningRate 0.0122 Epoch: 13 Global Step: 540200 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:26:58,718-Speed 2620.82 samples/sec Loss 4.6713 LearningRate 0.0122 Epoch: 13 Global Step: 540210 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:27:02,607-Speed 2634.02 samples/sec Loss 4.6846 LearningRate 0.0122 Epoch: 13 Global Step: 540220 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:27:06,505-Speed 2627.55 samples/sec Loss 4.6574 LearningRate 0.0122 Epoch: 13 Global Step: 540230 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:27:10,410-Speed 2623.02 samples/sec Loss 4.7081 LearningRate 0.0122 Epoch: 13 Global Step: 540240 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:27:14,301-Speed 2632.50 samples/sec Loss 4.7201 LearningRate 0.0122 Epoch: 13 Global Step: 540250 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:27:18,195-Speed 2629.65 samples/sec Loss 4.6269 LearningRate 0.0122 Epoch: 13 Global Step: 540260 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:27:22,084-Speed 2634.16 samples/sec Loss 4.6658 LearningRate 0.0122 Epoch: 13 Global Step: 540270 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:27:26,002-Speed 2613.95 samples/sec Loss 4.7204 LearningRate 0.0122 Epoch: 13 Global Step: 540280 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:27:29,902-Speed 2626.94 samples/sec Loss 4.7006 LearningRate 0.0122 Epoch: 13 Global Step: 540290 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:27:33,796-Speed 2630.53 samples/sec Loss 4.6949 LearningRate 0.0122 Epoch: 13 Global Step: 540300 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:27:37,690-Speed 2629.91 samples/sec Loss 4.7273 LearningRate 0.0122 Epoch: 13 Global Step: 540310 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:27:41,584-Speed 2630.62 samples/sec Loss 4.7323 LearningRate 0.0122 Epoch: 13 Global Step: 540320 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:27:45,477-Speed 2631.17 samples/sec Loss 4.7172 LearningRate 0.0122 Epoch: 13 Global Step: 540330 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:27:49,372-Speed 2629.97 samples/sec Loss 4.7196 LearningRate 0.0122 Epoch: 13 Global Step: 540340 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:27:53,270-Speed 2627.45 samples/sec Loss 4.7666 LearningRate 0.0122 Epoch: 13 Global Step: 540350 Fp16 Grad Scale: 262144 Required: 33 hours
Training: 2022-04-15 08:27:57,278-Speed 2555.57 samples/sec Loss 4.6359 LearningRate 0.0122 Epoch: 13 Global Step: 540360 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:28:01,368-Speed 2504.23 samples/sec Loss 4.7516 LearningRate 0.0122 Epoch: 13 Global Step: 540370 Fp16 Grad Scale: 131072 Required: 33 hours
Training: 2022-04-15 08:28:05,257-Speed 2634.27 samples/sec Loss 4.6824 LearningRate 0.0122 Epoch: 13 Global Step: 540380 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:28:09,154-Speed 2627.93 samples/sec Loss 4.6229 LearningRate 0.0122 Epoch: 13 Global Step: 540390 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:28:13,051-Speed 2628.87 samples/sec Loss 4.6510 LearningRate 0.0122 Epoch: 13 Global Step: 540400 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:28:16,949-Speed 2627.57 samples/sec Loss 4.7831 LearningRate 0.0122 Epoch: 13 Global Step: 540410 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:28:20,837-Speed 2634.37 samples/sec Loss 4.6664 LearningRate 0.0121 Epoch: 13 Global Step: 540420 Fp16 Grad Scale: 65536 Required: 33 hours
Training: 2022-04-15 08:28:24,734-Speed 2628.30 samples/sec Loss 4.7675 LearningRate 0.0121 Epoch: 13 Global Step: 540430 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:28:28,629-Speed 2630.06 samples/sec Loss 4.6516 LearningRate 0.0121 Epoch: 13 Global Step: 540440 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:28:32,565-Speed 2602.06 samples/sec Loss 4.7528 LearningRate 0.0121 Epoch: 13 Global Step: 540450 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:28:36,466-Speed 2626.23 samples/sec Loss 4.8771 LearningRate 0.0121 Epoch: 13 Global Step: 540460 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:28:40,366-Speed 2626.66 samples/sec Loss 4.6180 LearningRate 0.0121 Epoch: 13 Global Step: 540470 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:28:44,292-Speed 2608.61 samples/sec Loss 4.7323 LearningRate 0.0121 Epoch: 13 Global Step: 540480 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:28:48,198-Speed 2622.27 samples/sec Loss 4.7319 LearningRate 0.0121 Epoch: 13 Global Step: 540490 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:28:52,089-Speed 2632.34 samples/sec Loss 4.6472 LearningRate 0.0121 Epoch: 13 Global Step: 540500 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:28:55,982-Speed 2630.90 samples/sec Loss 4.6805 LearningRate 0.0121 Epoch: 13 Global Step: 540510 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:28:59,875-Speed 2631.09 samples/sec Loss 4.6873 LearningRate 0.0121 Epoch: 13 Global Step: 540520 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:29:03,773-Speed 2627.74 samples/sec Loss 4.7390 LearningRate 0.0121 Epoch: 13 Global Step: 540530 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:29:07,664-Speed 2632.99 samples/sec Loss 4.6923 LearningRate 0.0121 Epoch: 13 Global Step: 540540 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:29:11,554-Speed 2632.61 samples/sec Loss 4.7008 LearningRate 0.0121 Epoch: 13 Global Step: 540550 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:29:15,448-Speed 2630.52 samples/sec Loss 4.6192 LearningRate 0.0121 Epoch: 13 Global Step: 540560 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:29:19,383-Speed 2602.62 samples/sec Loss 4.6934 LearningRate 0.0121 Epoch: 13 Global Step: 540570 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:29:23,251-Speed 2648.09 samples/sec Loss 4.7480 LearningRate 0.0121 Epoch: 13 Global Step: 540580 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:29:27,141-Speed 2633.09 samples/sec Loss 4.6506 LearningRate 0.0121 Epoch: 13 Global Step: 540590 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:29:31,066-Speed 2609.89 samples/sec Loss 4.7559 LearningRate 0.0121 Epoch: 13 Global Step: 540600 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:29:34,961-Speed 2629.78 samples/sec Loss 4.7045 LearningRate 0.0121 Epoch: 13 Global Step: 540610 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:29:38,852-Speed 2632.77 samples/sec Loss 4.6807 LearningRate 0.0121 Epoch: 13 Global Step: 540620 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:29:42,750-Speed 2627.63 samples/sec Loss 4.5710 LearningRate 0.0121 Epoch: 13 Global Step: 540630 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:29:46,648-Speed 2628.47 samples/sec Loss 4.6556 LearningRate 0.0121 Epoch: 13 Global Step: 540640 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:29:50,509-Speed 2652.10 samples/sec Loss 4.7326 LearningRate 0.0121 Epoch: 13 Global Step: 540650 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:29:54,422-Speed 2617.72 samples/sec Loss 4.8145 LearningRate 0.0121 Epoch: 13 Global Step: 540660 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:29:58,316-Speed 2630.02 samples/sec Loss 4.5972 LearningRate 0.0121 Epoch: 13 Global Step: 540670 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:30:02,212-Speed 2629.64 samples/sec Loss 4.6780 LearningRate 0.0121 Epoch: 13 Global Step: 540680 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:30:06,108-Speed 2628.59 samples/sec Loss 4.7834 LearningRate 0.0121 Epoch: 13 Global Step: 540690 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:30:09,998-Speed 2633.16 samples/sec Loss 4.5576 LearningRate 0.0121 Epoch: 13 Global Step: 540700 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:30:13,888-Speed 2632.93 samples/sec Loss 4.7727 LearningRate 0.0121 Epoch: 13 Global Step: 540710 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:30:17,789-Speed 2626.21 samples/sec Loss 4.6840 LearningRate 0.0121 Epoch: 13 Global Step: 540720 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:30:21,721-Speed 2604.29 samples/sec Loss 4.7291 LearningRate 0.0121 Epoch: 13 Global Step: 540730 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:30:25,800-Speed 2511.33 samples/sec Loss 4.6791 LearningRate 0.0121 Epoch: 13 Global Step: 540740 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:30:29,887-Speed 2505.72 samples/sec Loss 4.7322 LearningRate 0.0121 Epoch: 13 Global Step: 540750 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:30:33,978-Speed 2504.06 samples/sec Loss 4.7373 LearningRate 0.0121 Epoch: 13 Global Step: 540760 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:30:38,064-Speed 2507.14 samples/sec Loss 4.6750 LearningRate 0.0121 Epoch: 13 Global Step: 540770 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:30:42,104-Speed 2535.26 samples/sec Loss 4.6430 LearningRate 0.0121 Epoch: 13 Global Step: 540780 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:30:45,998-Speed 2630.43 samples/sec Loss 4.6758 LearningRate 0.0121 Epoch: 13 Global Step: 540790 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:30:49,891-Speed 2631.23 samples/sec Loss 4.6367 LearningRate 0.0121 Epoch: 13 Global Step: 540800 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:30:53,785-Speed 2630.58 samples/sec Loss 4.7259 LearningRate 0.0121 Epoch: 13 Global Step: 540810 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:30:57,691-Speed 2621.88 samples/sec Loss 4.6077 LearningRate 0.0121 Epoch: 13 Global Step: 540820 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:31:01,588-Speed 2628.57 samples/sec Loss 4.7406 LearningRate 0.0121 Epoch: 13 Global Step: 540830 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:31:05,483-Speed 2629.62 samples/sec Loss 4.8139 LearningRate 0.0121 Epoch: 13 Global Step: 540840 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:31:09,384-Speed 2625.99 samples/sec Loss 4.7669 LearningRate 0.0121 Epoch: 13 Global Step: 540850 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:31:13,280-Speed 2629.41 samples/sec Loss 4.6048 LearningRate 0.0121 Epoch: 13 Global Step: 540860 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:31:17,173-Speed 2630.91 samples/sec Loss 4.7511 LearningRate 0.0121 Epoch: 13 Global Step: 540870 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:31:21,065-Speed 2632.15 samples/sec Loss 4.7333 LearningRate 0.0121 Epoch: 13 Global Step: 540880 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:31:24,967-Speed 2624.78 samples/sec Loss 4.7068 LearningRate 0.0121 Epoch: 13 Global Step: 540890 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:31:28,862-Speed 2629.43 samples/sec Loss 4.6526 LearningRate 0.0121 Epoch: 13 Global Step: 540900 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:31:32,776-Speed 2617.07 samples/sec Loss 4.7219 LearningRate 0.0121 Epoch: 13 Global Step: 540910 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:31:36,674-Speed 2626.99 samples/sec Loss 4.6760 LearningRate 0.0121 Epoch: 13 Global Step: 540920 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:31:40,576-Speed 2625.60 samples/sec Loss 4.6180 LearningRate 0.0121 Epoch: 13 Global Step: 540930 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:31:44,472-Speed 2629.38 samples/sec Loss 4.6863 LearningRate 0.0121 Epoch: 13 Global Step: 540940 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:31:48,371-Speed 2627.47 samples/sec Loss 4.7232 LearningRate 0.0121 Epoch: 13 Global Step: 540950 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:31:52,416-Speed 2532.23 samples/sec Loss 4.7622 LearningRate 0.0121 Epoch: 13 Global Step: 540960 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:31:56,309-Speed 2631.66 samples/sec Loss 4.6414 LearningRate 0.0121 Epoch: 13 Global Step: 540970 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:32:00,219-Speed 2619.70 samples/sec Loss 4.7202 LearningRate 0.0121 Epoch: 13 Global Step: 540980 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:32:04,114-Speed 2630.02 samples/sec Loss 4.6193 LearningRate 0.0121 Epoch: 13 Global Step: 540990 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:32:08,012-Speed 2627.52 samples/sec Loss 4.7341 LearningRate 0.0121 Epoch: 13 Global Step: 541000 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:32:11,925-Speed 2617.87 samples/sec Loss 4.7051 LearningRate 0.0121 Epoch: 13 Global Step: 541010 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:32:15,816-Speed 2631.86 samples/sec Loss 4.6737 LearningRate 0.0121 Epoch: 13 Global Step: 541020 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:32:19,713-Speed 2628.70 samples/sec Loss 4.6602 LearningRate 0.0121 Epoch: 13 Global Step: 541030 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:32:23,635-Speed 2611.50 samples/sec Loss 4.6970 LearningRate 0.0121 Epoch: 13 Global Step: 541040 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:32:27,533-Speed 2628.15 samples/sec Loss 4.7259 LearningRate 0.0121 Epoch: 13 Global Step: 541050 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:32:31,425-Speed 2631.43 samples/sec Loss 4.5699 LearningRate 0.0121 Epoch: 13 Global Step: 541060 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:32:35,321-Speed 2628.95 samples/sec Loss 4.5676 LearningRate 0.0121 Epoch: 13 Global Step: 541070 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:32:39,235-Speed 2616.94 samples/sec Loss 4.7252 LearningRate 0.0121 Epoch: 13 Global Step: 541080 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:32:43,127-Speed 2631.91 samples/sec Loss 4.6663 LearningRate 0.0121 Epoch: 13 Global Step: 541090 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:32:47,022-Speed 2629.48 samples/sec Loss 4.6884 LearningRate 0.0121 Epoch: 13 Global Step: 541100 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:32:50,899-Speed 2642.01 samples/sec Loss 4.7183 LearningRate 0.0121 Epoch: 13 Global Step: 541110 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:32:54,785-Speed 2635.80 samples/sec Loss 4.7934 LearningRate 0.0121 Epoch: 13 Global Step: 541120 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:32:58,685-Speed 2626.64 samples/sec Loss 4.5808 LearningRate 0.0121 Epoch: 13 Global Step: 541130 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:33:02,600-Speed 2616.03 samples/sec Loss 4.7119 LearningRate 0.0121 Epoch: 13 Global Step: 541140 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:33:06,700-Speed 2498.24 samples/sec Loss 4.5894 LearningRate 0.0121 Epoch: 13 Global Step: 541150 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:33:10,751-Speed 2528.37 samples/sec Loss 4.7195 LearningRate 0.0121 Epoch: 13 Global Step: 541160 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:33:14,646-Speed 2630.06 samples/sec Loss 4.6560 LearningRate 0.0121 Epoch: 13 Global Step: 541170 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:33:18,542-Speed 2629.52 samples/sec Loss 4.7124 LearningRate 0.0121 Epoch: 13 Global Step: 541180 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:33:22,451-Speed 2620.01 samples/sec Loss 4.7440 LearningRate 0.0121 Epoch: 13 Global Step: 541190 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:33:26,349-Speed 2628.40 samples/sec Loss 4.7238 LearningRate 0.0121 Epoch: 13 Global Step: 541200 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:33:30,263-Speed 2616.94 samples/sec Loss 4.6563 LearningRate 0.0121 Epoch: 13 Global Step: 541210 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:33:34,174-Speed 2618.55 samples/sec Loss 4.6852 LearningRate 0.0121 Epoch: 13 Global Step: 541220 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:33:38,071-Speed 2628.14 samples/sec Loss 4.6468 LearningRate 0.0121 Epoch: 13 Global Step: 541230 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:33:41,945-Speed 2644.37 samples/sec Loss 4.6059 LearningRate 0.0121 Epoch: 13 Global Step: 541240 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:33:45,840-Speed 2629.99 samples/sec Loss 4.6654 LearningRate 0.0121 Epoch: 13 Global Step: 541250 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:33:49,738-Speed 2627.67 samples/sec Loss 4.7670 LearningRate 0.0121 Epoch: 13 Global Step: 541260 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:33:53,635-Speed 2627.96 samples/sec Loss 4.7542 LearningRate 0.0121 Epoch: 13 Global Step: 541270 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:33:57,534-Speed 2627.54 samples/sec Loss 4.6544 LearningRate 0.0121 Epoch: 13 Global Step: 541280 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:34:01,430-Speed 2628.59 samples/sec Loss 4.7192 LearningRate 0.0121 Epoch: 13 Global Step: 541290 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:34:05,328-Speed 2627.28 samples/sec Loss 4.7351 LearningRate 0.0121 Epoch: 13 Global Step: 541300 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:34:09,227-Speed 2627.05 samples/sec Loss 4.5740 LearningRate 0.0121 Epoch: 13 Global Step: 541310 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:34:13,121-Speed 2630.54 samples/sec Loss 4.7077 LearningRate 0.0121 Epoch: 13 Global Step: 541320 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:34:17,022-Speed 2625.16 samples/sec Loss 4.6999 LearningRate 0.0121 Epoch: 13 Global Step: 541330 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:34:20,923-Speed 2625.89 samples/sec Loss 4.7171 LearningRate 0.0121 Epoch: 13 Global Step: 541340 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:34:24,825-Speed 2624.47 samples/sec Loss 4.6186 LearningRate 0.0121 Epoch: 13 Global Step: 541350 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:34:28,722-Speed 2628.89 samples/sec Loss 4.6395 LearningRate 0.0121 Epoch: 13 Global Step: 541360 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:34:32,623-Speed 2625.14 samples/sec Loss 4.7077 LearningRate 0.0121 Epoch: 13 Global Step: 541370 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:34:36,517-Speed 2630.69 samples/sec Loss 4.7149 LearningRate 0.0121 Epoch: 13 Global Step: 541380 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:34:40,435-Speed 2614.26 samples/sec Loss 4.6506 LearningRate 0.0121 Epoch: 13 Global Step: 541390 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:34:44,310-Speed 2643.47 samples/sec Loss 4.6925 LearningRate 0.0121 Epoch: 13 Global Step: 541400 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:34:48,224-Speed 2616.54 samples/sec Loss 4.6978 LearningRate 0.0121 Epoch: 13 Global Step: 541410 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:34:52,121-Speed 2627.94 samples/sec Loss 4.6665 LearningRate 0.0121 Epoch: 13 Global Step: 541420 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:34:56,019-Speed 2628.00 samples/sec Loss 4.7744 LearningRate 0.0121 Epoch: 13 Global Step: 541430 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:34:59,912-Speed 2630.67 samples/sec Loss 4.6867 LearningRate 0.0121 Epoch: 13 Global Step: 541440 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:35:03,837-Speed 2609.68 samples/sec Loss 4.7260 LearningRate 0.0121 Epoch: 13 Global Step: 541450 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:35:07,746-Speed 2620.53 samples/sec Loss 4.8217 LearningRate 0.0121 Epoch: 13 Global Step: 541460 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:35:11,646-Speed 2626.18 samples/sec Loss 4.6264 LearningRate 0.0121 Epoch: 13 Global Step: 541470 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:35:15,556-Speed 2619.45 samples/sec Loss 4.6894 LearningRate 0.0121 Epoch: 13 Global Step: 541480 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:35:19,453-Speed 2628.81 samples/sec Loss 4.7969 LearningRate 0.0121 Epoch: 13 Global Step: 541490 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:35:23,347-Speed 2629.84 samples/sec Loss 4.6469 LearningRate 0.0121 Epoch: 13 Global Step: 541500 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:35:27,253-Speed 2622.67 samples/sec Loss 4.6323 LearningRate 0.0121 Epoch: 13 Global Step: 541510 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:35:31,150-Speed 2628.05 samples/sec Loss 4.7355 LearningRate 0.0121 Epoch: 13 Global Step: 541520 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:35:35,046-Speed 2629.79 samples/sec Loss 4.6669 LearningRate 0.0121 Epoch: 13 Global Step: 541530 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:35:38,939-Speed 2630.67 samples/sec Loss 4.6310 LearningRate 0.0121 Epoch: 13 Global Step: 541540 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:35:42,817-Speed 2641.81 samples/sec Loss 4.6515 LearningRate 0.0121 Epoch: 13 Global Step: 541550 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:35:46,711-Speed 2629.84 samples/sec Loss 4.7007 LearningRate 0.0121 Epoch: 13 Global Step: 541560 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:35:50,609-Speed 2627.89 samples/sec Loss 4.6562 LearningRate 0.0121 Epoch: 13 Global Step: 541570 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:35:54,524-Speed 2616.01 samples/sec Loss 4.7426 LearningRate 0.0121 Epoch: 13 Global Step: 541580 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:35:58,420-Speed 2628.97 samples/sec Loss 4.6389 LearningRate 0.0121 Epoch: 13 Global Step: 541590 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:36:02,323-Speed 2624.12 samples/sec Loss 4.6656 LearningRate 0.0121 Epoch: 13 Global Step: 541600 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:36:06,217-Speed 2631.11 samples/sec Loss 4.5574 LearningRate 0.0120 Epoch: 13 Global Step: 541610 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:36:10,112-Speed 2628.93 samples/sec Loss 4.6617 LearningRate 0.0120 Epoch: 13 Global Step: 541620 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:36:14,011-Speed 2627.60 samples/sec Loss 4.8342 LearningRate 0.0120 Epoch: 13 Global Step: 541630 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:36:17,916-Speed 2622.32 samples/sec Loss 4.7435 LearningRate 0.0120 Epoch: 13 Global Step: 541640 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:36:21,822-Speed 2622.34 samples/sec Loss 4.7244 LearningRate 0.0120 Epoch: 13 Global Step: 541650 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:36:25,717-Speed 2629.56 samples/sec Loss 4.6700 LearningRate 0.0120 Epoch: 13 Global Step: 541660 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:36:29,619-Speed 2625.60 samples/sec Loss 4.7202 LearningRate 0.0120 Epoch: 13 Global Step: 541670 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:36:33,519-Speed 2626.33 samples/sec Loss 4.7011 LearningRate 0.0120 Epoch: 13 Global Step: 541680 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:36:37,424-Speed 2622.87 samples/sec Loss 4.6821 LearningRate 0.0120 Epoch: 13 Global Step: 541690 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:36:41,325-Speed 2625.49 samples/sec Loss 4.6304 LearningRate 0.0120 Epoch: 13 Global Step: 541700 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:36:45,222-Speed 2628.56 samples/sec Loss 4.6464 LearningRate 0.0120 Epoch: 13 Global Step: 541710 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:36:49,134-Speed 2617.60 samples/sec Loss 4.7204 LearningRate 0.0120 Epoch: 13 Global Step: 541720 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:36:53,028-Speed 2630.52 samples/sec Loss 4.6087 LearningRate 0.0120 Epoch: 13 Global Step: 541730 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:36:56,909-Speed 2639.35 samples/sec Loss 4.6350 LearningRate 0.0120 Epoch: 13 Global Step: 541740 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:37:00,811-Speed 2624.78 samples/sec Loss 4.6416 LearningRate 0.0120 Epoch: 13 Global Step: 541750 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:37:04,710-Speed 2627.41 samples/sec Loss 4.6596 LearningRate 0.0120 Epoch: 13 Global Step: 541760 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:37:08,614-Speed 2623.56 samples/sec Loss 4.6769 LearningRate 0.0120 Epoch: 13 Global Step: 541770 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:37:12,510-Speed 2628.80 samples/sec Loss 4.7161 LearningRate 0.0120 Epoch: 13 Global Step: 541780 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:37:16,409-Speed 2626.51 samples/sec Loss 4.6755 LearningRate 0.0120 Epoch: 13 Global Step: 541790 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:37:20,310-Speed 2626.36 samples/sec Loss 4.6474 LearningRate 0.0120 Epoch: 13 Global Step: 541800 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:37:24,207-Speed 2627.89 samples/sec Loss 4.6538 LearningRate 0.0120 Epoch: 13 Global Step: 541810 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:37:28,112-Speed 2623.22 samples/sec Loss 4.7510 LearningRate 0.0120 Epoch: 13 Global Step: 541820 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:37:32,008-Speed 2628.79 samples/sec Loss 4.6874 LearningRate 0.0120 Epoch: 13 Global Step: 541830 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:37:35,914-Speed 2622.53 samples/sec Loss 4.6704 LearningRate 0.0120 Epoch: 13 Global Step: 541840 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:37:39,790-Speed 2642.33 samples/sec Loss 4.6103 LearningRate 0.0120 Epoch: 13 Global Step: 541850 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:37:43,689-Speed 2627.43 samples/sec Loss 4.5808 LearningRate 0.0120 Epoch: 13 Global Step: 541860 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:37:47,596-Speed 2621.20 samples/sec Loss 4.7018 LearningRate 0.0120 Epoch: 13 Global Step: 541870 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:37:51,498-Speed 2625.22 samples/sec Loss 4.6212 LearningRate 0.0120 Epoch: 13 Global Step: 541880 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:37:55,410-Speed 2618.32 samples/sec Loss 4.6735 LearningRate 0.0120 Epoch: 13 Global Step: 541890 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:37:59,310-Speed 2626.82 samples/sec Loss 4.5511 LearningRate 0.0120 Epoch: 13 Global Step: 541900 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:38:03,215-Speed 2622.34 samples/sec Loss 4.6031 LearningRate 0.0120 Epoch: 13 Global Step: 541910 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:38:07,126-Speed 2619.10 samples/sec Loss 4.7167 LearningRate 0.0120 Epoch: 13 Global Step: 541920 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:38:11,023-Speed 2628.02 samples/sec Loss 4.6469 LearningRate 0.0120 Epoch: 13 Global Step: 541930 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:38:14,942-Speed 2613.43 samples/sec Loss 4.6035 LearningRate 0.0120 Epoch: 13 Global Step: 541940 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:38:18,849-Speed 2622.49 samples/sec Loss 4.6216 LearningRate 0.0120 Epoch: 13 Global Step: 541950 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:38:22,750-Speed 2625.05 samples/sec Loss 4.7040 LearningRate 0.0120 Epoch: 13 Global Step: 541960 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:38:26,647-Speed 2629.20 samples/sec Loss 4.6561 LearningRate 0.0120 Epoch: 13 Global Step: 541970 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:38:30,544-Speed 2627.90 samples/sec Loss 4.6822 LearningRate 0.0120 Epoch: 13 Global Step: 541980 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:38:34,439-Speed 2629.27 samples/sec Loss 4.6198 LearningRate 0.0120 Epoch: 13 Global Step: 541990 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:38:38,340-Speed 2625.35 samples/sec Loss 4.6716 LearningRate 0.0120 Epoch: 13 Global Step: 542000 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:38:42,239-Speed 2627.63 samples/sec Loss 4.6237 LearningRate 0.0120 Epoch: 13 Global Step: 542010 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:38:46,144-Speed 2622.24 samples/sec Loss 4.7090 LearningRate 0.0120 Epoch: 13 Global Step: 542020 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:38:50,044-Speed 2626.75 samples/sec Loss 4.7814 LearningRate 0.0120 Epoch: 13 Global Step: 542030 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:38:53,944-Speed 2625.83 samples/sec Loss 4.7160 LearningRate 0.0120 Epoch: 13 Global Step: 542040 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:38:57,838-Speed 2630.80 samples/sec Loss 4.6571 LearningRate 0.0120 Epoch: 13 Global Step: 542050 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:39:01,736-Speed 2627.51 samples/sec Loss 4.6631 LearningRate 0.0120 Epoch: 13 Global Step: 542060 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:39:05,589-Speed 2658.11 samples/sec Loss 4.5835 LearningRate 0.0120 Epoch: 13 Global Step: 542070 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:39:09,483-Speed 2630.71 samples/sec Loss 4.6901 LearningRate 0.0120 Epoch: 13 Global Step: 542080 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:39:13,379-Speed 2628.76 samples/sec Loss 4.7210 LearningRate 0.0120 Epoch: 13 Global Step: 542090 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:39:17,285-Speed 2622.06 samples/sec Loss 4.6789 LearningRate 0.0120 Epoch: 13 Global Step: 542100 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:39:21,205-Speed 2613.78 samples/sec Loss 4.6383 LearningRate 0.0120 Epoch: 13 Global Step: 542110 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:39:25,102-Speed 2628.08 samples/sec Loss 4.6782 LearningRate 0.0120 Epoch: 13 Global Step: 542120 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:39:29,013-Speed 2619.42 samples/sec Loss 4.5444 LearningRate 0.0120 Epoch: 13 Global Step: 542130 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:39:32,906-Speed 2630.84 samples/sec Loss 4.6579 LearningRate 0.0120 Epoch: 13 Global Step: 542140 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:39:36,804-Speed 2627.48 samples/sec Loss 4.6321 LearningRate 0.0120 Epoch: 13 Global Step: 542150 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:39:40,700-Speed 2628.65 samples/sec Loss 4.6424 LearningRate 0.0120 Epoch: 13 Global Step: 542160 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:39:44,598-Speed 2628.04 samples/sec Loss 4.5915 LearningRate 0.0120 Epoch: 13 Global Step: 542170 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:39:48,496-Speed 2628.01 samples/sec Loss 4.5824 LearningRate 0.0120 Epoch: 13 Global Step: 542180 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:39:52,404-Speed 2620.72 samples/sec Loss 4.7403 LearningRate 0.0120 Epoch: 13 Global Step: 542190 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:39:56,310-Speed 2622.80 samples/sec Loss 4.7242 LearningRate 0.0120 Epoch: 13 Global Step: 542200 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:40:00,214-Speed 2623.47 samples/sec Loss 4.5033 LearningRate 0.0120 Epoch: 13 Global Step: 542210 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:40:04,118-Speed 2623.39 samples/sec Loss 4.6556 LearningRate 0.0120 Epoch: 13 Global Step: 542220 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:40:08,022-Speed 2623.09 samples/sec Loss 4.7563 LearningRate 0.0120 Epoch: 13 Global Step: 542230 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:40:11,947-Speed 2609.95 samples/sec Loss 4.6501 LearningRate 0.0120 Epoch: 13 Global Step: 542240 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:40:15,849-Speed 2625.18 samples/sec Loss 4.6754 LearningRate 0.0120 Epoch: 13 Global Step: 542250 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:40:19,750-Speed 2625.92 samples/sec Loss 4.7492 LearningRate 0.0120 Epoch: 13 Global Step: 542260 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:40:23,645-Speed 2629.72 samples/sec Loss 4.6211 LearningRate 0.0120 Epoch: 13 Global Step: 542270 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:40:27,547-Speed 2624.94 samples/sec Loss 4.6022 LearningRate 0.0120 Epoch: 13 Global Step: 542280 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:40:31,440-Speed 2631.23 samples/sec Loss 4.6979 LearningRate 0.0120 Epoch: 13 Global Step: 542290 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:40:35,337-Speed 2628.27 samples/sec Loss 4.6393 LearningRate 0.0120 Epoch: 13 Global Step: 542300 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:40:39,235-Speed 2627.18 samples/sec Loss 4.6516 LearningRate 0.0120 Epoch: 13 Global Step: 542310 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:40:43,139-Speed 2623.94 samples/sec Loss 4.7634 LearningRate 0.0120 Epoch: 13 Global Step: 542320 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:40:47,037-Speed 2627.58 samples/sec Loss 4.7394 LearningRate 0.0120 Epoch: 13 Global Step: 542330 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:40:50,939-Speed 2624.92 samples/sec Loss 4.6773 LearningRate 0.0120 Epoch: 13 Global Step: 542340 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:40:54,834-Speed 2629.36 samples/sec Loss 4.7038 LearningRate 0.0120 Epoch: 13 Global Step: 542350 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:40:58,733-Speed 2627.51 samples/sec Loss 4.6470 LearningRate 0.0120 Epoch: 13 Global Step: 542360 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:41:02,648-Speed 2615.96 samples/sec Loss 4.6525 LearningRate 0.0120 Epoch: 13 Global Step: 542370 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:41:06,549-Speed 2626.17 samples/sec Loss 4.6647 LearningRate 0.0120 Epoch: 13 Global Step: 542380 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:41:10,462-Speed 2617.82 samples/sec Loss 4.7314 LearningRate 0.0120 Epoch: 13 Global Step: 542390 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:41:14,359-Speed 2628.26 samples/sec Loss 4.6648 LearningRate 0.0120 Epoch: 13 Global Step: 542400 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:41:18,257-Speed 2627.68 samples/sec Loss 4.6397 LearningRate 0.0120 Epoch: 13 Global Step: 542410 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:41:22,159-Speed 2624.69 samples/sec Loss 4.5985 LearningRate 0.0120 Epoch: 13 Global Step: 542420 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:41:26,057-Speed 2627.91 samples/sec Loss 4.7079 LearningRate 0.0120 Epoch: 13 Global Step: 542430 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:41:29,957-Speed 2626.40 samples/sec Loss 4.5636 LearningRate 0.0120 Epoch: 13 Global Step: 542440 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:41:33,870-Speed 2617.95 samples/sec Loss 4.7277 LearningRate 0.0120 Epoch: 13 Global Step: 542450 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:41:37,769-Speed 2626.53 samples/sec Loss 4.7517 LearningRate 0.0120 Epoch: 13 Global Step: 542460 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:41:41,666-Speed 2628.67 samples/sec Loss 4.6746 LearningRate 0.0120 Epoch: 13 Global Step: 542470 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:41:45,545-Speed 2640.41 samples/sec Loss 4.6328 LearningRate 0.0120 Epoch: 13 Global Step: 542480 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:41:49,456-Speed 2618.89 samples/sec Loss 4.7906 LearningRate 0.0120 Epoch: 13 Global Step: 542490 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:41:53,353-Speed 2627.85 samples/sec Loss 4.7150 LearningRate 0.0120 Epoch: 13 Global Step: 542500 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:41:57,255-Speed 2625.52 samples/sec Loss 4.7072 LearningRate 0.0120 Epoch: 13 Global Step: 542510 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:42:01,154-Speed 2626.20 samples/sec Loss 4.5481 LearningRate 0.0120 Epoch: 13 Global Step: 542520 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:42:05,052-Speed 2628.50 samples/sec Loss 4.5640 LearningRate 0.0120 Epoch: 13 Global Step: 542530 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:42:08,959-Speed 2621.58 samples/sec Loss 4.6180 LearningRate 0.0120 Epoch: 13 Global Step: 542540 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:42:12,836-Speed 2642.11 samples/sec Loss 4.7081 LearningRate 0.0120 Epoch: 13 Global Step: 542550 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:42:16,765-Speed 2606.71 samples/sec Loss 4.5927 LearningRate 0.0120 Epoch: 13 Global Step: 542560 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:42:20,664-Speed 2627.24 samples/sec Loss 4.6505 LearningRate 0.0120 Epoch: 13 Global Step: 542570 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:42:24,566-Speed 2624.61 samples/sec Loss 4.6250 LearningRate 0.0120 Epoch: 13 Global Step: 542580 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:42:28,463-Speed 2628.79 samples/sec Loss 4.6564 LearningRate 0.0120 Epoch: 13 Global Step: 542590 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:42:32,371-Speed 2620.93 samples/sec Loss 4.6463 LearningRate 0.0120 Epoch: 13 Global Step: 542600 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:42:36,269-Speed 2627.90 samples/sec Loss 4.5692 LearningRate 0.0120 Epoch: 13 Global Step: 542610 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:42:40,166-Speed 2628.23 samples/sec Loss 4.6011 LearningRate 0.0120 Epoch: 13 Global Step: 542620 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:42:44,071-Speed 2623.67 samples/sec Loss 4.5579 LearningRate 0.0120 Epoch: 13 Global Step: 542630 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:42:47,972-Speed 2625.72 samples/sec Loss 4.6280 LearningRate 0.0120 Epoch: 13 Global Step: 542640 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:42:51,891-Speed 2613.66 samples/sec Loss 4.6452 LearningRate 0.0120 Epoch: 13 Global Step: 542650 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:42:55,829-Speed 2600.79 samples/sec Loss 4.7456 LearningRate 0.0120 Epoch: 13 Global Step: 542660 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:42:59,729-Speed 2626.51 samples/sec Loss 4.6723 LearningRate 0.0120 Epoch: 13 Global Step: 542670 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:43:03,638-Speed 2619.55 samples/sec Loss 4.6571 LearningRate 0.0120 Epoch: 13 Global Step: 542680 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:43:07,536-Speed 2628.37 samples/sec Loss 4.6213 LearningRate 0.0120 Epoch: 13 Global Step: 542690 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:43:11,433-Speed 2628.15 samples/sec Loss 4.6122 LearningRate 0.0120 Epoch: 13 Global Step: 542700 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:43:15,333-Speed 2626.60 samples/sec Loss 4.6586 LearningRate 0.0120 Epoch: 13 Global Step: 542710 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:43:19,326-Speed 2565.43 samples/sec Loss 4.6774 LearningRate 0.0120 Epoch: 13 Global Step: 542720 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:43:23,225-Speed 2626.66 samples/sec Loss 4.6050 LearningRate 0.0120 Epoch: 13 Global Step: 542730 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:43:27,119-Speed 2630.77 samples/sec Loss 4.6707 LearningRate 0.0120 Epoch: 13 Global Step: 542740 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:43:30,995-Speed 2642.13 samples/sec Loss 4.7586 LearningRate 0.0120 Epoch: 13 Global Step: 542750 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:43:34,895-Speed 2626.32 samples/sec Loss 4.6284 LearningRate 0.0120 Epoch: 13 Global Step: 542760 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:43:38,796-Speed 2625.64 samples/sec Loss 4.6197 LearningRate 0.0120 Epoch: 13 Global Step: 542770 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:43:42,696-Speed 2626.77 samples/sec Loss 4.6562 LearningRate 0.0120 Epoch: 13 Global Step: 542780 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:43:46,591-Speed 2629.64 samples/sec Loss 4.7735 LearningRate 0.0120 Epoch: 13 Global Step: 542790 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:43:50,490-Speed 2627.30 samples/sec Loss 4.7639 LearningRate 0.0120 Epoch: 13 Global Step: 542800 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:43:54,389-Speed 2626.22 samples/sec Loss 4.7929 LearningRate 0.0119 Epoch: 13 Global Step: 542810 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:43:58,299-Speed 2619.67 samples/sec Loss 4.6814 LearningRate 0.0119 Epoch: 13 Global Step: 542820 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:44:02,198-Speed 2626.97 samples/sec Loss 4.6170 LearningRate 0.0119 Epoch: 13 Global Step: 542830 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:44:06,094-Speed 2628.99 samples/sec Loss 4.7345 LearningRate 0.0119 Epoch: 13 Global Step: 542840 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:44:09,976-Speed 2637.93 samples/sec Loss 4.6064 LearningRate 0.0119 Epoch: 13 Global Step: 542850 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:44:13,875-Speed 2628.72 samples/sec Loss 4.6031 LearningRate 0.0119 Epoch: 13 Global Step: 542860 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:44:17,775-Speed 2625.70 samples/sec Loss 4.5865 LearningRate 0.0119 Epoch: 13 Global Step: 542870 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:44:21,678-Speed 2624.34 samples/sec Loss 4.5801 LearningRate 0.0119 Epoch: 13 Global Step: 542880 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:44:25,582-Speed 2623.88 samples/sec Loss 4.5805 LearningRate 0.0119 Epoch: 13 Global Step: 542890 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:44:29,579-Speed 2563.60 samples/sec Loss 4.6604 LearningRate 0.0119 Epoch: 13 Global Step: 542900 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:44:33,487-Speed 2620.59 samples/sec Loss 4.7226 LearningRate 0.0119 Epoch: 13 Global Step: 542910 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:44:37,386-Speed 2626.66 samples/sec Loss 4.6671 LearningRate 0.0119 Epoch: 13 Global Step: 542920 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:44:41,286-Speed 2626.45 samples/sec Loss 4.6863 LearningRate 0.0119 Epoch: 13 Global Step: 542930 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:44:45,188-Speed 2625.54 samples/sec Loss 4.6917 LearningRate 0.0119 Epoch: 13 Global Step: 542940 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:44:49,090-Speed 2624.93 samples/sec Loss 4.5522 LearningRate 0.0119 Epoch: 13 Global Step: 542950 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:44:52,974-Speed 2637.30 samples/sec Loss 4.7131 LearningRate 0.0119 Epoch: 13 Global Step: 542960 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:44:56,874-Speed 2626.40 samples/sec Loss 4.6521 LearningRate 0.0119 Epoch: 13 Global Step: 542970 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:45:00,769-Speed 2629.62 samples/sec Loss 4.7011 LearningRate 0.0119 Epoch: 13 Global Step: 542980 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:45:04,696-Speed 2608.60 samples/sec Loss 4.6328 LearningRate 0.0119 Epoch: 13 Global Step: 542990 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:45:08,636-Speed 2599.28 samples/sec Loss 4.6523 LearningRate 0.0119 Epoch: 13 Global Step: 543000 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:45:12,534-Speed 2627.74 samples/sec Loss 4.6724 LearningRate 0.0119 Epoch: 13 Global Step: 543010 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:45:16,434-Speed 2626.25 samples/sec Loss 4.6345 LearningRate 0.0119 Epoch: 13 Global Step: 543020 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:45:20,332-Speed 2627.68 samples/sec Loss 4.7820 LearningRate 0.0119 Epoch: 13 Global Step: 543030 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:45:24,311-Speed 2574.18 samples/sec Loss 4.6246 LearningRate 0.0119 Epoch: 13 Global Step: 543040 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:45:28,213-Speed 2625.25 samples/sec Loss 4.6254 LearningRate 0.0119 Epoch: 13 Global Step: 543050 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:45:32,102-Speed 2633.76 samples/sec Loss 4.6632 LearningRate 0.0119 Epoch: 13 Global Step: 543060 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:45:36,018-Speed 2615.10 samples/sec Loss 4.6482 LearningRate 0.0119 Epoch: 13 Global Step: 543070 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:45:39,927-Speed 2620.94 samples/sec Loss 4.6011 LearningRate 0.0119 Epoch: 13 Global Step: 543080 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:45:43,829-Speed 2624.98 samples/sec Loss 4.6155 LearningRate 0.0119 Epoch: 13 Global Step: 543090 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:45:47,789-Speed 2586.59 samples/sec Loss 4.7003 LearningRate 0.0119 Epoch: 13 Global Step: 543100 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:45:51,696-Speed 2621.54 samples/sec Loss 4.5610 LearningRate 0.0119 Epoch: 13 Global Step: 543110 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:45:55,598-Speed 2624.84 samples/sec Loss 4.7040 LearningRate 0.0119 Epoch: 13 Global Step: 543120 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:45:59,498-Speed 2626.59 samples/sec Loss 4.6660 LearningRate 0.0119 Epoch: 13 Global Step: 543130 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:46:03,396-Speed 2627.36 samples/sec Loss 4.6854 LearningRate 0.0119 Epoch: 13 Global Step: 543140 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:46:07,313-Speed 2614.75 samples/sec Loss 4.6359 LearningRate 0.0119 Epoch: 13 Global Step: 543150 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:46:11,214-Speed 2625.11 samples/sec Loss 4.7387 LearningRate 0.0119 Epoch: 13 Global Step: 543160 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:46:15,126-Speed 2618.93 samples/sec Loss 4.6290 LearningRate 0.0119 Epoch: 13 Global Step: 543170 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:46:19,116-Speed 2566.90 samples/sec Loss 4.6614 LearningRate 0.0119 Epoch: 13 Global Step: 543180 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:46:22,999-Speed 2637.79 samples/sec Loss 4.6095 LearningRate 0.0119 Epoch: 13 Global Step: 543190 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:46:26,908-Speed 2620.28 samples/sec Loss 4.6533 LearningRate 0.0119 Epoch: 13 Global Step: 543200 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:46:30,824-Speed 2616.02 samples/sec Loss 4.6652 LearningRate 0.0119 Epoch: 13 Global Step: 543210 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:46:34,729-Speed 2623.52 samples/sec Loss 4.5769 LearningRate 0.0119 Epoch: 13 Global Step: 543220 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:46:38,636-Speed 2621.53 samples/sec Loss 4.6226 LearningRate 0.0119 Epoch: 13 Global Step: 543230 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:46:42,546-Speed 2619.53 samples/sec Loss 4.6005 LearningRate 0.0119 Epoch: 13 Global Step: 543240 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:46:46,451-Speed 2622.39 samples/sec Loss 4.7053 LearningRate 0.0119 Epoch: 13 Global Step: 543250 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:46:50,354-Speed 2625.02 samples/sec Loss 4.7178 LearningRate 0.0119 Epoch: 13 Global Step: 543260 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:46:54,265-Speed 2618.39 samples/sec Loss 4.6768 LearningRate 0.0119 Epoch: 13 Global Step: 543270 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:46:58,168-Speed 2624.32 samples/sec Loss 4.6732 LearningRate 0.0119 Epoch: 13 Global Step: 543280 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:47:02,043-Speed 2643.47 samples/sec Loss 4.6746 LearningRate 0.0119 Epoch: 13 Global Step: 543290 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:47:05,988-Speed 2596.41 samples/sec Loss 4.6518 LearningRate 0.0119 Epoch: 13 Global Step: 543300 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:47:09,912-Speed 2611.06 samples/sec Loss 4.6647 LearningRate 0.0119 Epoch: 13 Global Step: 543310 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:47:13,814-Speed 2624.23 samples/sec Loss 4.6868 LearningRate 0.0119 Epoch: 13 Global Step: 543320 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:47:17,733-Speed 2613.79 samples/sec Loss 4.6531 LearningRate 0.0119 Epoch: 13 Global Step: 543330 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:47:21,637-Speed 2624.23 samples/sec Loss 4.6636 LearningRate 0.0119 Epoch: 13 Global Step: 543340 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:47:25,534-Speed 2628.10 samples/sec Loss 4.6534 LearningRate 0.0119 Epoch: 13 Global Step: 543350 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:47:29,414-Speed 2639.67 samples/sec Loss 4.6522 LearningRate 0.0119 Epoch: 13 Global Step: 543360 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:47:33,317-Speed 2623.94 samples/sec Loss 4.6623 LearningRate 0.0119 Epoch: 13 Global Step: 543370 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:47:37,226-Speed 2620.27 samples/sec Loss 4.5156 LearningRate 0.0119 Epoch: 13 Global Step: 543380 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:47:41,142-Speed 2616.03 samples/sec Loss 4.6763 LearningRate 0.0119 Epoch: 13 Global Step: 543390 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:47:45,050-Speed 2620.76 samples/sec Loss 4.6904 LearningRate 0.0119 Epoch: 13 Global Step: 543400 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:47:48,967-Speed 2615.70 samples/sec Loss 4.6743 LearningRate 0.0119 Epoch: 13 Global Step: 543410 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:47:52,889-Speed 2611.35 samples/sec Loss 4.7187 LearningRate 0.0119 Epoch: 13 Global Step: 543420 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:47:56,802-Speed 2617.57 samples/sec Loss 4.6128 LearningRate 0.0119 Epoch: 13 Global Step: 543430 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:48:00,710-Speed 2620.11 samples/sec Loss 4.6198 LearningRate 0.0119 Epoch: 13 Global Step: 543440 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:48:04,653-Speed 2598.60 samples/sec Loss 4.5729 LearningRate 0.0119 Epoch: 13 Global Step: 543450 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:48:08,574-Speed 2612.21 samples/sec Loss 4.6634 LearningRate 0.0119 Epoch: 13 Global Step: 543460 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:48:12,495-Speed 2612.35 samples/sec Loss 4.6192 LearningRate 0.0119 Epoch: 13 Global Step: 543470 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:48:16,391-Speed 2628.61 samples/sec Loss 4.6671 LearningRate 0.0119 Epoch: 13 Global Step: 543480 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:48:20,291-Speed 2626.98 samples/sec Loss 4.6542 LearningRate 0.0119 Epoch: 13 Global Step: 543490 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:48:24,190-Speed 2626.76 samples/sec Loss 4.5860 LearningRate 0.0119 Epoch: 13 Global Step: 543500 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:48:28,091-Speed 2626.14 samples/sec Loss 4.6681 LearningRate 0.0119 Epoch: 13 Global Step: 543510 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:48:31,989-Speed 2626.93 samples/sec Loss 4.6829 LearningRate 0.0119 Epoch: 13 Global Step: 543520 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:48:36,076-Speed 2506.43 samples/sec Loss 4.6634 LearningRate 0.0119 Epoch: 13 Global Step: 543530 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:48:39,972-Speed 2629.48 samples/sec Loss 4.5951 LearningRate 0.0119 Epoch: 13 Global Step: 543540 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:48:43,871-Speed 2627.26 samples/sec Loss 4.7297 LearningRate 0.0119 Epoch: 13 Global Step: 543550 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:48:47,772-Speed 2626.03 samples/sec Loss 4.5705 LearningRate 0.0119 Epoch: 13 Global Step: 543560 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:48:51,680-Speed 2620.38 samples/sec Loss 4.6049 LearningRate 0.0119 Epoch: 13 Global Step: 543570 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:48:55,583-Speed 2624.44 samples/sec Loss 4.6450 LearningRate 0.0119 Epoch: 13 Global Step: 543580 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:48:59,488-Speed 2623.32 samples/sec Loss 4.6817 LearningRate 0.0119 Epoch: 13 Global Step: 543590 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:49:03,386-Speed 2627.54 samples/sec Loss 4.7004 LearningRate 0.0119 Epoch: 13 Global Step: 543600 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:49:07,288-Speed 2624.84 samples/sec Loss 4.6153 LearningRate 0.0119 Epoch: 13 Global Step: 543610 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:49:11,163-Speed 2643.53 samples/sec Loss 4.6962 LearningRate 0.0119 Epoch: 13 Global Step: 543620 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:49:15,060-Speed 2628.53 samples/sec Loss 4.6416 LearningRate 0.0119 Epoch: 13 Global Step: 543630 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:49:18,972-Speed 2617.55 samples/sec Loss 4.7496 LearningRate 0.0119 Epoch: 13 Global Step: 543640 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:49:22,875-Speed 2624.70 samples/sec Loss 4.6133 LearningRate 0.0119 Epoch: 13 Global Step: 543650 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:49:26,784-Speed 2620.24 samples/sec Loss 4.5834 LearningRate 0.0119 Epoch: 13 Global Step: 543660 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:49:30,696-Speed 2618.77 samples/sec Loss 4.5446 LearningRate 0.0119 Epoch: 13 Global Step: 543670 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:49:34,641-Speed 2595.89 samples/sec Loss 4.6895 LearningRate 0.0119 Epoch: 13 Global Step: 543680 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:49:38,546-Speed 2622.91 samples/sec Loss 4.7550 LearningRate 0.0119 Epoch: 13 Global Step: 543690 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:49:42,452-Speed 2622.21 samples/sec Loss 4.6553 LearningRate 0.0119 Epoch: 13 Global Step: 543700 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:49:46,356-Speed 2623.06 samples/sec Loss 4.6058 LearningRate 0.0119 Epoch: 13 Global Step: 543710 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 08:49:50,256-Speed 2626.34 samples/sec Loss 4.6187 LearningRate 0.0119 Epoch: 13 Global Step: 543720 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:49:54,160-Speed 2623.82 samples/sec Loss 4.6355 LearningRate 0.0119 Epoch: 13 Global Step: 543730 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:49:58,061-Speed 2626.10 samples/sec Loss 4.6198 LearningRate 0.0119 Epoch: 13 Global Step: 543740 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:50:01,967-Speed 2622.05 samples/sec Loss 4.6346 LearningRate 0.0119 Epoch: 13 Global Step: 543750 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:50:05,877-Speed 2619.56 samples/sec Loss 4.5670 LearningRate 0.0119 Epoch: 13 Global Step: 543760 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:50:09,784-Speed 2621.78 samples/sec Loss 4.5368 LearningRate 0.0119 Epoch: 13 Global Step: 543770 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:50:13,694-Speed 2619.97 samples/sec Loss 4.6032 LearningRate 0.0119 Epoch: 13 Global Step: 543780 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:50:17,607-Speed 2616.97 samples/sec Loss 4.7880 LearningRate 0.0119 Epoch: 13 Global Step: 543790 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:50:21,521-Speed 2617.52 samples/sec Loss 4.6450 LearningRate 0.0119 Epoch: 13 Global Step: 543800 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:50:25,443-Speed 2610.79 samples/sec Loss 4.6949 LearningRate 0.0119 Epoch: 13 Global Step: 543810 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:50:29,348-Speed 2624.18 samples/sec Loss 4.6542 LearningRate 0.0119 Epoch: 13 Global Step: 543820 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:50:33,235-Speed 2634.58 samples/sec Loss 4.6472 LearningRate 0.0119 Epoch: 13 Global Step: 543830 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:50:37,141-Speed 2622.35 samples/sec Loss 4.6363 LearningRate 0.0119 Epoch: 13 Global Step: 543840 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:50:41,056-Speed 2615.59 samples/sec Loss 4.6417 LearningRate 0.0119 Epoch: 13 Global Step: 543850 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:50:44,969-Speed 2618.38 samples/sec Loss 4.7093 LearningRate 0.0119 Epoch: 13 Global Step: 543860 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:50:48,871-Speed 2624.61 samples/sec Loss 4.5731 LearningRate 0.0119 Epoch: 13 Global Step: 543870 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:50:52,773-Speed 2625.05 samples/sec Loss 4.6558 LearningRate 0.0119 Epoch: 13 Global Step: 543880 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:50:56,676-Speed 2623.81 samples/sec Loss 4.6733 LearningRate 0.0119 Epoch: 13 Global Step: 543890 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:51:00,580-Speed 2624.53 samples/sec Loss 4.5930 LearningRate 0.0119 Epoch: 13 Global Step: 543900 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:51:04,484-Speed 2623.74 samples/sec Loss 4.6067 LearningRate 0.0119 Epoch: 13 Global Step: 543910 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:51:08,389-Speed 2622.13 samples/sec Loss 4.6618 LearningRate 0.0119 Epoch: 13 Global Step: 543920 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:51:12,274-Speed 2636.25 samples/sec Loss 4.6878 LearningRate 0.0119 Epoch: 13 Global Step: 543930 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:51:16,172-Speed 2628.24 samples/sec Loss 4.5266 LearningRate 0.0119 Epoch: 13 Global Step: 543940 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:51:20,075-Speed 2624.05 samples/sec Loss 4.5689 LearningRate 0.0119 Epoch: 13 Global Step: 543950 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:51:23,979-Speed 2623.83 samples/sec Loss 4.6681 LearningRate 0.0119 Epoch: 13 Global Step: 543960 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:51:27,899-Speed 2622.26 samples/sec Loss 4.6499 LearningRate 0.0119 Epoch: 13 Global Step: 543970 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:51:31,796-Speed 2628.40 samples/sec Loss 4.5571 LearningRate 0.0119 Epoch: 13 Global Step: 543980 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:51:35,699-Speed 2624.15 samples/sec Loss 4.6442 LearningRate 0.0119 Epoch: 13 Global Step: 543990 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:51:39,599-Speed 2626.44 samples/sec Loss 4.6294 LearningRate 0.0119 Epoch: 13 Global Step: 544000 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:51:43,507-Speed 2621.16 samples/sec Loss 4.6495 LearningRate 0.0118 Epoch: 13 Global Step: 544010 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:51:47,406-Speed 2626.63 samples/sec Loss 4.6115 LearningRate 0.0118 Epoch: 13 Global Step: 544020 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:51:51,305-Speed 2626.99 samples/sec Loss 4.6443 LearningRate 0.0118 Epoch: 13 Global Step: 544030 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:51:55,234-Speed 2606.98 samples/sec Loss 4.5241 LearningRate 0.0118 Epoch: 13 Global Step: 544040 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:51:59,137-Speed 2624.41 samples/sec Loss 4.5907 LearningRate 0.0118 Epoch: 13 Global Step: 544050 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:52:03,089-Speed 2592.02 samples/sec Loss 4.5750 LearningRate 0.0118 Epoch: 13 Global Step: 544060 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:52:07,188-Speed 2498.39 samples/sec Loss 4.5650 LearningRate 0.0118 Epoch: 13 Global Step: 544070 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:52:11,139-Speed 2592.78 samples/sec Loss 4.6799 LearningRate 0.0118 Epoch: 13 Global Step: 544080 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:52:15,022-Speed 2638.51 samples/sec Loss 4.6389 LearningRate 0.0118 Epoch: 13 Global Step: 544090 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:52:19,746-Speed 2167.99 samples/sec Loss 4.5867 LearningRate 0.0118 Epoch: 13 Global Step: 544100 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:52:23,850-Speed 2495.31 samples/sec Loss 4.6507 LearningRate 0.0118 Epoch: 13 Global Step: 544110 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:52:27,828-Speed 2574.34 samples/sec Loss 4.6367 LearningRate 0.0118 Epoch: 13 Global Step: 544120 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:52:31,747-Speed 2613.87 samples/sec Loss 4.7386 LearningRate 0.0118 Epoch: 13 Global Step: 544130 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:52:35,648-Speed 2626.48 samples/sec Loss 4.6358 LearningRate 0.0118 Epoch: 13 Global Step: 544140 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:52:39,545-Speed 2628.07 samples/sec Loss 4.6889 LearningRate 0.0118 Epoch: 13 Global Step: 544150 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:52:43,447-Speed 2624.95 samples/sec Loss 4.6527 LearningRate 0.0118 Epoch: 13 Global Step: 544160 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:52:47,368-Speed 2612.05 samples/sec Loss 4.6298 LearningRate 0.0118 Epoch: 13 Global Step: 544170 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:52:51,275-Speed 2621.60 samples/sec Loss 4.7185 LearningRate 0.0118 Epoch: 13 Global Step: 544180 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:52:55,215-Speed 2600.68 samples/sec Loss 4.6791 LearningRate 0.0118 Epoch: 13 Global Step: 544190 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:52:59,113-Speed 2627.20 samples/sec Loss 4.6929 LearningRate 0.0118 Epoch: 13 Global Step: 544200 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:53:03,030-Speed 2615.15 samples/sec Loss 4.6153 LearningRate 0.0118 Epoch: 13 Global Step: 544210 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:53:06,930-Speed 2626.34 samples/sec Loss 4.6686 LearningRate 0.0118 Epoch: 13 Global Step: 544220 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:53:10,830-Speed 2626.58 samples/sec Loss 4.6422 LearningRate 0.0118 Epoch: 13 Global Step: 544230 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:53:14,729-Speed 2627.53 samples/sec Loss 4.7197 LearningRate 0.0118 Epoch: 13 Global Step: 544240 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:53:18,632-Speed 2623.64 samples/sec Loss 4.5935 LearningRate 0.0118 Epoch: 13 Global Step: 544250 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:53:22,531-Speed 2627.27 samples/sec Loss 4.6778 LearningRate 0.0118 Epoch: 13 Global Step: 544260 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:53:26,442-Speed 2619.33 samples/sec Loss 4.6740 LearningRate 0.0118 Epoch: 13 Global Step: 544270 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:53:30,721-Speed 2393.51 samples/sec Loss 4.6183 LearningRate 0.0118 Epoch: 13 Global Step: 544280 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:53:34,621-Speed 2627.03 samples/sec Loss 4.5934 LearningRate 0.0118 Epoch: 13 Global Step: 544290 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:53:38,531-Speed 2618.99 samples/sec Loss 4.5789 LearningRate 0.0118 Epoch: 13 Global Step: 544300 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:53:42,428-Speed 2628.50 samples/sec Loss 4.6028 LearningRate 0.0118 Epoch: 13 Global Step: 544310 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:53:46,362-Speed 2603.53 samples/sec Loss 4.6464 LearningRate 0.0118 Epoch: 13 Global Step: 544320 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:53:50,260-Speed 2627.92 samples/sec Loss 4.6532 LearningRate 0.0118 Epoch: 13 Global Step: 544330 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:53:54,161-Speed 2625.77 samples/sec Loss 4.5185 LearningRate 0.0118 Epoch: 13 Global Step: 544340 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:53:58,041-Speed 2640.06 samples/sec Loss 4.6706 LearningRate 0.0118 Epoch: 13 Global Step: 544350 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:54:01,945-Speed 2623.17 samples/sec Loss 4.5848 LearningRate 0.0118 Epoch: 13 Global Step: 544360 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:54:05,851-Speed 2622.66 samples/sec Loss 4.5999 LearningRate 0.0118 Epoch: 13 Global Step: 544370 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:54:09,753-Speed 2625.39 samples/sec Loss 4.5907 LearningRate 0.0118 Epoch: 13 Global Step: 544380 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:54:13,655-Speed 2624.44 samples/sec Loss 4.7051 LearningRate 0.0118 Epoch: 13 Global Step: 544390 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:54:17,559-Speed 2624.28 samples/sec Loss 4.6654 LearningRate 0.0118 Epoch: 13 Global Step: 544400 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:54:21,467-Speed 2620.49 samples/sec Loss 4.6212 LearningRate 0.0118 Epoch: 13 Global Step: 544410 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:54:25,415-Speed 2594.29 samples/sec Loss 4.7221 LearningRate 0.0118 Epoch: 13 Global Step: 544420 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:54:29,316-Speed 2625.73 samples/sec Loss 4.6511 LearningRate 0.0118 Epoch: 13 Global Step: 544430 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:54:33,219-Speed 2624.59 samples/sec Loss 4.5958 LearningRate 0.0118 Epoch: 13 Global Step: 544440 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:54:37,117-Speed 2627.05 samples/sec Loss 4.7539 LearningRate 0.0118 Epoch: 13 Global Step: 544450 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:54:40,990-Speed 2645.05 samples/sec Loss 4.5154 LearningRate 0.0118 Epoch: 13 Global Step: 544460 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:54:44,897-Speed 2622.07 samples/sec Loss 4.6102 LearningRate 0.0118 Epoch: 13 Global Step: 544470 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:54:48,792-Speed 2629.02 samples/sec Loss 4.7220 LearningRate 0.0118 Epoch: 13 Global Step: 544480 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:54:52,699-Speed 2622.46 samples/sec Loss 4.5109 LearningRate 0.0118 Epoch: 13 Global Step: 544490 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:54:56,601-Speed 2624.57 samples/sec Loss 4.6033 LearningRate 0.0118 Epoch: 13 Global Step: 544500 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:55:00,495-Speed 2630.47 samples/sec Loss 4.6149 LearningRate 0.0118 Epoch: 13 Global Step: 544510 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:55:04,428-Speed 2603.77 samples/sec Loss 4.6578 LearningRate 0.0118 Epoch: 13 Global Step: 544520 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:55:08,333-Speed 2623.80 samples/sec Loss 4.5789 LearningRate 0.0118 Epoch: 13 Global Step: 544530 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:55:12,233-Speed 2626.27 samples/sec Loss 4.6411 LearningRate 0.0118 Epoch: 13 Global Step: 544540 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:55:16,135-Speed 2624.94 samples/sec Loss 4.6525 LearningRate 0.0118 Epoch: 13 Global Step: 544550 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:55:20,036-Speed 2626.00 samples/sec Loss 4.6690 LearningRate 0.0118 Epoch: 13 Global Step: 544560 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:55:23,948-Speed 2618.44 samples/sec Loss 4.5356 LearningRate 0.0118 Epoch: 13 Global Step: 544570 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:55:27,849-Speed 2625.93 samples/sec Loss 4.6561 LearningRate 0.0118 Epoch: 13 Global Step: 544580 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:55:31,754-Speed 2623.00 samples/sec Loss 4.7091 LearningRate 0.0118 Epoch: 13 Global Step: 544590 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:55:35,678-Speed 2610.07 samples/sec Loss 4.6905 LearningRate 0.0118 Epoch: 13 Global Step: 544600 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:55:39,597-Speed 2613.48 samples/sec Loss 4.5167 LearningRate 0.0118 Epoch: 13 Global Step: 544610 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:55:43,501-Speed 2623.76 samples/sec Loss 4.5961 LearningRate 0.0118 Epoch: 13 Global Step: 544620 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:55:47,398-Speed 2628.56 samples/sec Loss 4.6093 LearningRate 0.0118 Epoch: 13 Global Step: 544630 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:55:51,307-Speed 2620.43 samples/sec Loss 4.5762 LearningRate 0.0118 Epoch: 13 Global Step: 544640 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:55:55,207-Speed 2626.35 samples/sec Loss 4.6394 LearningRate 0.0118 Epoch: 13 Global Step: 544650 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:55:59,116-Speed 2620.50 samples/sec Loss 4.7132 LearningRate 0.0118 Epoch: 13 Global Step: 544660 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:56:03,017-Speed 2625.19 samples/sec Loss 4.7096 LearningRate 0.0118 Epoch: 13 Global Step: 544670 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:56:06,915-Speed 2627.77 samples/sec Loss 4.6037 LearningRate 0.0118 Epoch: 13 Global Step: 544680 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:56:10,909-Speed 2564.53 samples/sec Loss 4.5690 LearningRate 0.0118 Epoch: 13 Global Step: 544690 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:56:14,836-Speed 2608.20 samples/sec Loss 4.6216 LearningRate 0.0118 Epoch: 13 Global Step: 544700 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:56:18,713-Speed 2642.04 samples/sec Loss 4.5627 LearningRate 0.0118 Epoch: 13 Global Step: 544710 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:56:22,616-Speed 2624.64 samples/sec Loss 4.7287 LearningRate 0.0118 Epoch: 13 Global Step: 544720 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:56:26,522-Speed 2622.57 samples/sec Loss 4.5469 LearningRate 0.0118 Epoch: 13 Global Step: 544730 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:56:30,421-Speed 2626.97 samples/sec Loss 4.6398 LearningRate 0.0118 Epoch: 13 Global Step: 544740 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:56:34,319-Speed 2627.82 samples/sec Loss 4.6294 LearningRate 0.0118 Epoch: 13 Global Step: 544750 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:56:38,217-Speed 2627.15 samples/sec Loss 4.7063 LearningRate 0.0118 Epoch: 13 Global Step: 544760 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:56:42,120-Speed 2624.62 samples/sec Loss 4.5498 LearningRate 0.0118 Epoch: 13 Global Step: 544770 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:56:46,029-Speed 2620.37 samples/sec Loss 4.6890 LearningRate 0.0118 Epoch: 13 Global Step: 544780 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:56:49,928-Speed 2627.12 samples/sec Loss 4.5353 LearningRate 0.0118 Epoch: 13 Global Step: 544790 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:56:53,859-Speed 2605.65 samples/sec Loss 4.5999 LearningRate 0.0118 Epoch: 13 Global Step: 544800 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:56:57,753-Speed 2630.49 samples/sec Loss 4.5757 LearningRate 0.0118 Epoch: 13 Global Step: 544810 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:57:01,650-Speed 2629.04 samples/sec Loss 4.5955 LearningRate 0.0118 Epoch: 13 Global Step: 544820 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:57:05,547-Speed 2627.74 samples/sec Loss 4.7098 LearningRate 0.0118 Epoch: 13 Global Step: 544830 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:57:09,464-Speed 2615.24 samples/sec Loss 4.5700 LearningRate 0.0118 Epoch: 13 Global Step: 544840 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:57:13,361-Speed 2628.00 samples/sec Loss 4.7116 LearningRate 0.0118 Epoch: 13 Global Step: 544850 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:57:17,260-Speed 2627.21 samples/sec Loss 4.6511 LearningRate 0.0118 Epoch: 13 Global Step: 544860 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:57:21,165-Speed 2622.98 samples/sec Loss 4.6941 LearningRate 0.0118 Epoch: 13 Global Step: 544870 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:57:25,066-Speed 2625.41 samples/sec Loss 4.6444 LearningRate 0.0118 Epoch: 13 Global Step: 544880 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:57:28,964-Speed 2627.97 samples/sec Loss 4.6108 LearningRate 0.0118 Epoch: 13 Global Step: 544890 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:57:32,870-Speed 2622.29 samples/sec Loss 4.7097 LearningRate 0.0118 Epoch: 13 Global Step: 544900 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:57:36,768-Speed 2628.01 samples/sec Loss 4.6631 LearningRate 0.0118 Epoch: 13 Global Step: 544910 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:57:40,663-Speed 2629.25 samples/sec Loss 4.5921 LearningRate 0.0118 Epoch: 13 Global Step: 544920 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:57:44,590-Speed 2608.27 samples/sec Loss 4.6227 LearningRate 0.0118 Epoch: 13 Global Step: 544930 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:57:48,493-Speed 2624.56 samples/sec Loss 4.6498 LearningRate 0.0118 Epoch: 13 Global Step: 544940 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:57:52,397-Speed 2623.80 samples/sec Loss 4.5838 LearningRate 0.0118 Epoch: 13 Global Step: 544950 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:57:56,298-Speed 2625.72 samples/sec Loss 4.6172 LearningRate 0.0118 Epoch: 13 Global Step: 544960 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:58:00,173-Speed 2644.02 samples/sec Loss 4.6140 LearningRate 0.0118 Epoch: 13 Global Step: 544970 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:58:04,069-Speed 2628.72 samples/sec Loss 4.5742 LearningRate 0.0118 Epoch: 13 Global Step: 544980 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:58:07,966-Speed 2628.09 samples/sec Loss 4.6364 LearningRate 0.0118 Epoch: 13 Global Step: 544990 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:58:11,884-Speed 2614.44 samples/sec Loss 4.6710 LearningRate 0.0118 Epoch: 13 Global Step: 545000 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:58:15,778-Speed 2630.37 samples/sec Loss 4.5950 LearningRate 0.0118 Epoch: 13 Global Step: 545010 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:58:19,681-Speed 2624.25 samples/sec Loss 4.5970 LearningRate 0.0118 Epoch: 13 Global Step: 545020 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:58:23,588-Speed 2621.40 samples/sec Loss 4.6246 LearningRate 0.0118 Epoch: 13 Global Step: 545030 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:58:27,486-Speed 2627.41 samples/sec Loss 4.6794 LearningRate 0.0118 Epoch: 13 Global Step: 545040 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:58:31,381-Speed 2630.19 samples/sec Loss 4.7090 LearningRate 0.0118 Epoch: 13 Global Step: 545050 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:58:35,280-Speed 2627.63 samples/sec Loss 4.6906 LearningRate 0.0118 Epoch: 13 Global Step: 545060 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:58:39,181-Speed 2625.16 samples/sec Loss 4.6310 LearningRate 0.0118 Epoch: 13 Global Step: 545070 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:58:43,086-Speed 2622.64 samples/sec Loss 4.5342 LearningRate 0.0118 Epoch: 13 Global Step: 545080 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:58:46,989-Speed 2625.11 samples/sec Loss 4.5607 LearningRate 0.0118 Epoch: 13 Global Step: 545090 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:58:50,893-Speed 2623.63 samples/sec Loss 4.6673 LearningRate 0.0118 Epoch: 13 Global Step: 545100 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 08:58:54,772-Speed 2640.33 samples/sec Loss 4.6906 LearningRate 0.0118 Epoch: 13 Global Step: 545110 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:58:58,680-Speed 2620.67 samples/sec Loss 4.5478 LearningRate 0.0118 Epoch: 13 Global Step: 545120 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:59:02,589-Speed 2620.27 samples/sec Loss 4.5871 LearningRate 0.0118 Epoch: 13 Global Step: 545130 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:59:06,496-Speed 2621.89 samples/sec Loss 4.6648 LearningRate 0.0118 Epoch: 13 Global Step: 545140 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:59:10,398-Speed 2624.84 samples/sec Loss 4.7057 LearningRate 0.0118 Epoch: 13 Global Step: 545150 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:59:14,294-Speed 2628.95 samples/sec Loss 4.6918 LearningRate 0.0118 Epoch: 13 Global Step: 545160 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:59:18,192-Speed 2628.25 samples/sec Loss 4.5902 LearningRate 0.0118 Epoch: 13 Global Step: 545170 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:59:22,085-Speed 2630.90 samples/sec Loss 4.7425 LearningRate 0.0118 Epoch: 13 Global Step: 545180 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:59:25,991-Speed 2622.32 samples/sec Loss 4.5458 LearningRate 0.0118 Epoch: 13 Global Step: 545190 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:59:29,887-Speed 2629.63 samples/sec Loss 4.6453 LearningRate 0.0118 Epoch: 13 Global Step: 545200 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:59:33,754-Speed 2648.48 samples/sec Loss 4.6137 LearningRate 0.0118 Epoch: 13 Global Step: 545210 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:59:37,676-Speed 2611.11 samples/sec Loss 4.5367 LearningRate 0.0117 Epoch: 13 Global Step: 545220 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:59:41,569-Speed 2631.24 samples/sec Loss 4.6733 LearningRate 0.0117 Epoch: 13 Global Step: 545230 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:59:45,467-Speed 2628.33 samples/sec Loss 4.5308 LearningRate 0.0117 Epoch: 13 Global Step: 545240 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:59:49,360-Speed 2631.28 samples/sec Loss 4.6605 LearningRate 0.0117 Epoch: 13 Global Step: 545250 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:59:53,257-Speed 2628.49 samples/sec Loss 4.7361 LearningRate 0.0117 Epoch: 13 Global Step: 545260 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 08:59:57,191-Speed 2603.39 samples/sec Loss 4.6356 LearningRate 0.0117 Epoch: 13 Global Step: 545270 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:00:01,095-Speed 2623.12 samples/sec Loss 4.6028 LearningRate 0.0117 Epoch: 13 Global Step: 545280 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:00:04,995-Speed 2626.84 samples/sec Loss 4.6604 LearningRate 0.0117 Epoch: 13 Global Step: 545290 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:00:08,892-Speed 2628.11 samples/sec Loss 4.5508 LearningRate 0.0117 Epoch: 13 Global Step: 545300 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:00:12,786-Speed 2630.39 samples/sec Loss 4.6543 LearningRate 0.0117 Epoch: 13 Global Step: 545310 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:00:16,716-Speed 2606.44 samples/sec Loss 4.5790 LearningRate 0.0117 Epoch: 13 Global Step: 545320 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:00:20,616-Speed 2626.35 samples/sec Loss 4.6388 LearningRate 0.0117 Epoch: 13 Global Step: 545330 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:00:24,514-Speed 2627.95 samples/sec Loss 4.5208 LearningRate 0.0117 Epoch: 13 Global Step: 545340 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:00:28,412-Speed 2627.37 samples/sec Loss 4.7338 LearningRate 0.0117 Epoch: 13 Global Step: 545350 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:00:32,311-Speed 2626.75 samples/sec Loss 4.5935 LearningRate 0.0117 Epoch: 13 Global Step: 545360 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:00:36,209-Speed 2627.42 samples/sec Loss 4.5662 LearningRate 0.0117 Epoch: 13 Global Step: 545370 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:00:40,113-Speed 2624.59 samples/sec Loss 4.5873 LearningRate 0.0117 Epoch: 13 Global Step: 545380 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:00:44,009-Speed 2628.25 samples/sec Loss 4.5642 LearningRate 0.0117 Epoch: 13 Global Step: 545390 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:00:47,925-Speed 2615.77 samples/sec Loss 4.6019 LearningRate 0.0117 Epoch: 13 Global Step: 545400 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:00:51,824-Speed 2627.21 samples/sec Loss 4.5831 LearningRate 0.0117 Epoch: 13 Global Step: 545410 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:00:55,719-Speed 2630.34 samples/sec Loss 4.6003 LearningRate 0.0117 Epoch: 13 Global Step: 545420 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:00:59,621-Speed 2624.27 samples/sec Loss 4.5479 LearningRate 0.0117 Epoch: 13 Global Step: 545430 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:01:03,519-Speed 2628.14 samples/sec Loss 4.6072 LearningRate 0.0117 Epoch: 13 Global Step: 545440 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:01:07,420-Speed 2625.71 samples/sec Loss 4.6444 LearningRate 0.0117 Epoch: 13 Global Step: 545450 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:01:11,372-Speed 2591.64 samples/sec Loss 4.6336 LearningRate 0.0117 Epoch: 13 Global Step: 545460 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:01:15,277-Speed 2622.94 samples/sec Loss 4.5434 LearningRate 0.0117 Epoch: 13 Global Step: 545470 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:01:19,177-Speed 2626.59 samples/sec Loss 4.7676 LearningRate 0.0117 Epoch: 13 Global Step: 545480 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:01:23,082-Speed 2623.00 samples/sec Loss 4.5335 LearningRate 0.0117 Epoch: 13 Global Step: 545490 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:01:26,977-Speed 2629.44 samples/sec Loss 4.6700 LearningRate 0.0117 Epoch: 13 Global Step: 545500 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:01:30,872-Speed 2629.96 samples/sec Loss 4.6397 LearningRate 0.0117 Epoch: 13 Global Step: 545510 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:01:34,749-Speed 2642.15 samples/sec Loss 4.6282 LearningRate 0.0117 Epoch: 13 Global Step: 545520 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:01:38,643-Speed 2630.03 samples/sec Loss 4.6088 LearningRate 0.0117 Epoch: 13 Global Step: 545530 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:01:42,536-Speed 2630.46 samples/sec Loss 4.6354 LearningRate 0.0117 Epoch: 13 Global Step: 545540 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:01:46,432-Speed 2629.61 samples/sec Loss 4.6605 LearningRate 0.0117 Epoch: 13 Global Step: 545550 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:01:50,327-Speed 2629.57 samples/sec Loss 4.6864 LearningRate 0.0117 Epoch: 13 Global Step: 545560 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:01:54,227-Speed 2626.90 samples/sec Loss 4.5725 LearningRate 0.0117 Epoch: 13 Global Step: 545570 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:01:58,119-Speed 2631.56 samples/sec Loss 4.6712 LearningRate 0.0117 Epoch: 13 Global Step: 545580 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:02:02,068-Speed 2594.06 samples/sec Loss 4.6819 LearningRate 0.0117 Epoch: 13 Global Step: 545590 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:02:05,959-Speed 2632.33 samples/sec Loss 4.6198 LearningRate 0.0117 Epoch: 13 Global Step: 545600 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:02:09,856-Speed 2628.09 samples/sec Loss 4.6081 LearningRate 0.0117 Epoch: 13 Global Step: 545610 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:02:13,752-Speed 2628.84 samples/sec Loss 4.5916 LearningRate 0.0117 Epoch: 13 Global Step: 545620 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:02:17,647-Speed 2629.78 samples/sec Loss 4.5745 LearningRate 0.0117 Epoch: 13 Global Step: 545630 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:02:21,589-Speed 2598.92 samples/sec Loss 4.6196 LearningRate 0.0117 Epoch: 13 Global Step: 545640 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:02:25,485-Speed 2628.74 samples/sec Loss 4.6721 LearningRate 0.0117 Epoch: 13 Global Step: 545650 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:02:29,422-Speed 2601.68 samples/sec Loss 4.5649 LearningRate 0.0117 Epoch: 13 Global Step: 545660 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:02:33,447-Speed 2544.94 samples/sec Loss 4.6561 LearningRate 0.0117 Epoch: 13 Global Step: 545670 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:02:37,434-Speed 2568.84 samples/sec Loss 4.6819 LearningRate 0.0117 Epoch: 13 Global Step: 545680 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:02:41,328-Speed 2629.81 samples/sec Loss 4.4997 LearningRate 0.0117 Epoch: 13 Global Step: 545690 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:02:45,225-Speed 2628.46 samples/sec Loss 4.6764 LearningRate 0.0117 Epoch: 13 Global Step: 545700 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:02:49,141-Speed 2615.85 samples/sec Loss 4.7015 LearningRate 0.0117 Epoch: 13 Global Step: 545710 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:02:53,043-Speed 2625.10 samples/sec Loss 4.6141 LearningRate 0.0117 Epoch: 13 Global Step: 545720 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:02:56,935-Speed 2631.44 samples/sec Loss 4.5758 LearningRate 0.0117 Epoch: 13 Global Step: 545730 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:03:00,828-Speed 2631.38 samples/sec Loss 4.6248 LearningRate 0.0117 Epoch: 13 Global Step: 545740 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:03:04,735-Speed 2621.49 samples/sec Loss 4.5089 LearningRate 0.0117 Epoch: 13 Global Step: 545750 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:03:08,619-Speed 2637.09 samples/sec Loss 4.5219 LearningRate 0.0117 Epoch: 13 Global Step: 545760 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:03:12,517-Speed 2627.73 samples/sec Loss 4.5072 LearningRate 0.0117 Epoch: 13 Global Step: 545770 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:03:16,428-Speed 2619.14 samples/sec Loss 4.6837 LearningRate 0.0117 Epoch: 13 Global Step: 545780 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:03:20,327-Speed 2626.60 samples/sec Loss 4.6647 LearningRate 0.0117 Epoch: 13 Global Step: 545790 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:03:24,240-Speed 2617.79 samples/sec Loss 4.5412 LearningRate 0.0117 Epoch: 13 Global Step: 545800 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:03:28,140-Speed 2626.23 samples/sec Loss 4.6208 LearningRate 0.0117 Epoch: 13 Global Step: 545810 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:03:32,037-Speed 2628.45 samples/sec Loss 4.6194 LearningRate 0.0117 Epoch: 13 Global Step: 545820 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:03:35,933-Speed 2629.10 samples/sec Loss 4.5059 LearningRate 0.0117 Epoch: 13 Global Step: 545830 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:03:39,825-Speed 2631.73 samples/sec Loss 4.5676 LearningRate 0.0117 Epoch: 13 Global Step: 545840 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:03:43,727-Speed 2624.15 samples/sec Loss 4.4885 LearningRate 0.0117 Epoch: 13 Global Step: 545850 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:03:47,619-Speed 2631.77 samples/sec Loss 4.6105 LearningRate 0.0117 Epoch: 13 Global Step: 545860 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:03:51,515-Speed 2629.82 samples/sec Loss 4.6109 LearningRate 0.0117 Epoch: 13 Global Step: 545870 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:03:55,417-Speed 2624.61 samples/sec Loss 4.5642 LearningRate 0.0117 Epoch: 13 Global Step: 545880 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:03:59,289-Speed 2646.41 samples/sec Loss 4.6406 LearningRate 0.0117 Epoch: 13 Global Step: 545890 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:04:03,184-Speed 2629.45 samples/sec Loss 4.6924 LearningRate 0.0117 Epoch: 13 Global Step: 545900 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:04:07,085-Speed 2625.72 samples/sec Loss 4.5713 LearningRate 0.0117 Epoch: 13 Global Step: 545910 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:04:10,998-Speed 2617.23 samples/sec Loss 4.6911 LearningRate 0.0117 Epoch: 13 Global Step: 545920 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:04:14,893-Speed 2630.02 samples/sec Loss 4.5612 LearningRate 0.0117 Epoch: 13 Global Step: 545930 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:04:18,793-Speed 2626.05 samples/sec Loss 4.4990 LearningRate 0.0117 Epoch: 13 Global Step: 545940 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:04:22,690-Speed 2628.60 samples/sec Loss 4.6288 LearningRate 0.0117 Epoch: 13 Global Step: 545950 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:04:26,582-Speed 2631.69 samples/sec Loss 4.5480 LearningRate 0.0117 Epoch: 13 Global Step: 545960 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:04:30,488-Speed 2622.54 samples/sec Loss 4.4934 LearningRate 0.0117 Epoch: 13 Global Step: 545970 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:04:34,398-Speed 2619.46 samples/sec Loss 4.5983 LearningRate 0.0117 Epoch: 13 Global Step: 545980 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:04:38,297-Speed 2626.35 samples/sec Loss 4.6542 LearningRate 0.0117 Epoch: 13 Global Step: 545990 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:04:42,197-Speed 2626.70 samples/sec Loss 4.6232 LearningRate 0.0117 Epoch: 13 Global Step: 546000 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:04:46,104-Speed 2622.31 samples/sec Loss 4.6486 LearningRate 0.0117 Epoch: 13 Global Step: 546010 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:04:49,979-Speed 2642.69 samples/sec Loss 4.6286 LearningRate 0.0117 Epoch: 13 Global Step: 546020 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:04:53,881-Speed 2625.70 samples/sec Loss 4.5709 LearningRate 0.0117 Epoch: 13 Global Step: 546030 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:04:57,784-Speed 2624.64 samples/sec Loss 4.5652 LearningRate 0.0117 Epoch: 13 Global Step: 546040 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:05:01,815-Speed 2540.67 samples/sec Loss 4.5315 LearningRate 0.0117 Epoch: 13 Global Step: 546050 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:05:05,710-Speed 2629.40 samples/sec Loss 4.6165 LearningRate 0.0117 Epoch: 13 Global Step: 546060 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:05:09,611-Speed 2626.15 samples/sec Loss 4.5482 LearningRate 0.0117 Epoch: 13 Global Step: 546070 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:05:13,522-Speed 2618.81 samples/sec Loss 4.5212 LearningRate 0.0117 Epoch: 13 Global Step: 546080 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:05:17,470-Speed 2594.58 samples/sec Loss 4.6228 LearningRate 0.0117 Epoch: 13 Global Step: 546090 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:05:21,373-Speed 2624.22 samples/sec Loss 4.5474 LearningRate 0.0117 Epoch: 13 Global Step: 546100 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:05:25,312-Speed 2600.68 samples/sec Loss 4.5547 LearningRate 0.0117 Epoch: 13 Global Step: 546110 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:05:29,212-Speed 2626.14 samples/sec Loss 4.5991 LearningRate 0.0117 Epoch: 13 Global Step: 546120 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:05:33,085-Speed 2644.65 samples/sec Loss 4.6092 LearningRate 0.0117 Epoch: 13 Global Step: 546130 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:05:36,991-Speed 2622.07 samples/sec Loss 4.5652 LearningRate 0.0117 Epoch: 13 Global Step: 546140 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:05:40,969-Speed 2574.80 samples/sec Loss 4.5748 LearningRate 0.0117 Epoch: 13 Global Step: 546150 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:05:44,875-Speed 2622.55 samples/sec Loss 4.5221 LearningRate 0.0117 Epoch: 13 Global Step: 546160 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:05:48,779-Speed 2623.78 samples/sec Loss 4.6005 LearningRate 0.0117 Epoch: 13 Global Step: 546170 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:05:52,676-Speed 2628.42 samples/sec Loss 4.5739 LearningRate 0.0117 Epoch: 13 Global Step: 546180 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:05:56,599-Speed 2611.18 samples/sec Loss 4.5694 LearningRate 0.0117 Epoch: 13 Global Step: 546190 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:06:00,494-Speed 2629.81 samples/sec Loss 4.6893 LearningRate 0.0117 Epoch: 13 Global Step: 546200 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:06:04,389-Speed 2629.17 samples/sec Loss 4.6946 LearningRate 0.0117 Epoch: 13 Global Step: 546210 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:06:08,295-Speed 2622.10 samples/sec Loss 4.5658 LearningRate 0.0117 Epoch: 13 Global Step: 546220 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:06:12,175-Speed 2640.23 samples/sec Loss 4.5384 LearningRate 0.0117 Epoch: 13 Global Step: 546230 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:06:16,069-Speed 2631.48 samples/sec Loss 4.5991 LearningRate 0.0117 Epoch: 13 Global Step: 546240 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:06:19,962-Speed 2630.78 samples/sec Loss 4.6411 LearningRate 0.0117 Epoch: 13 Global Step: 546250 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:06:24,013-Speed 2528.73 samples/sec Loss 4.6701 LearningRate 0.0117 Epoch: 13 Global Step: 546260 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:06:27,908-Speed 2629.41 samples/sec Loss 4.6108 LearningRate 0.0117 Epoch: 13 Global Step: 546270 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:06:31,800-Speed 2631.97 samples/sec Loss 4.5600 LearningRate 0.0117 Epoch: 13 Global Step: 546280 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:06:35,696-Speed 2628.32 samples/sec Loss 4.5790 LearningRate 0.0117 Epoch: 13 Global Step: 546290 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:06:39,641-Speed 2596.55 samples/sec Loss 4.6258 LearningRate 0.0117 Epoch: 13 Global Step: 546300 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:06:43,540-Speed 2627.08 samples/sec Loss 4.6463 LearningRate 0.0117 Epoch: 13 Global Step: 546310 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:06:47,555-Speed 2551.62 samples/sec Loss 4.5968 LearningRate 0.0117 Epoch: 13 Global Step: 546320 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:06:51,452-Speed 2628.45 samples/sec Loss 4.4680 LearningRate 0.0117 Epoch: 13 Global Step: 546330 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:06:55,394-Speed 2598.63 samples/sec Loss 4.5637 LearningRate 0.0117 Epoch: 13 Global Step: 546340 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:06:59,306-Speed 2618.19 samples/sec Loss 4.6086 LearningRate 0.0117 Epoch: 13 Global Step: 546350 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:07:03,181-Speed 2643.27 samples/sec Loss 4.6022 LearningRate 0.0117 Epoch: 13 Global Step: 546360 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:07:07,086-Speed 2622.99 samples/sec Loss 4.5926 LearningRate 0.0117 Epoch: 13 Global Step: 546370 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:07:10,980-Speed 2630.70 samples/sec Loss 4.5815 LearningRate 0.0117 Epoch: 13 Global Step: 546380 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:07:14,890-Speed 2619.90 samples/sec Loss 4.5855 LearningRate 0.0117 Epoch: 13 Global Step: 546390 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:07:18,786-Speed 2629.04 samples/sec Loss 4.6281 LearningRate 0.0117 Epoch: 13 Global Step: 546400 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:07:22,681-Speed 2629.89 samples/sec Loss 4.6632 LearningRate 0.0117 Epoch: 13 Global Step: 546410 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:07:26,586-Speed 2622.67 samples/sec Loss 4.5661 LearningRate 0.0117 Epoch: 13 Global Step: 546420 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:07:30,482-Speed 2629.70 samples/sec Loss 4.6492 LearningRate 0.0116 Epoch: 13 Global Step: 546430 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:07:34,379-Speed 2628.38 samples/sec Loss 4.6622 LearningRate 0.0116 Epoch: 13 Global Step: 546440 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:07:38,274-Speed 2629.78 samples/sec Loss 4.4997 LearningRate 0.0116 Epoch: 13 Global Step: 546450 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:07:42,195-Speed 2611.72 samples/sec Loss 4.5280 LearningRate 0.0116 Epoch: 13 Global Step: 546460 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:07:46,096-Speed 2625.86 samples/sec Loss 4.6387 LearningRate 0.0116 Epoch: 13 Global Step: 546470 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:07:49,992-Speed 2629.45 samples/sec Loss 4.6361 LearningRate 0.0116 Epoch: 13 Global Step: 546480 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:07:53,893-Speed 2625.57 samples/sec Loss 4.5911 LearningRate 0.0116 Epoch: 13 Global Step: 546490 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:07:57,804-Speed 2618.60 samples/sec Loss 4.7285 LearningRate 0.0116 Epoch: 13 Global Step: 546500 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:08:01,700-Speed 2629.63 samples/sec Loss 4.6159 LearningRate 0.0116 Epoch: 13 Global Step: 546510 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:08:05,598-Speed 2627.32 samples/sec Loss 4.6051 LearningRate 0.0116 Epoch: 13 Global Step: 546520 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:08:09,496-Speed 2627.61 samples/sec Loss 4.5381 LearningRate 0.0116 Epoch: 13 Global Step: 546530 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:08:13,392-Speed 2628.87 samples/sec Loss 4.7298 LearningRate 0.0116 Epoch: 13 Global Step: 546540 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:08:17,269-Speed 2642.53 samples/sec Loss 4.6227 LearningRate 0.0116 Epoch: 13 Global Step: 546550 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:08:21,164-Speed 2630.25 samples/sec Loss 4.5518 LearningRate 0.0116 Epoch: 13 Global Step: 546560 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:08:25,056-Speed 2631.36 samples/sec Loss 4.7289 LearningRate 0.0116 Epoch: 13 Global Step: 546570 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:08:28,949-Speed 2631.18 samples/sec Loss 4.6688 LearningRate 0.0116 Epoch: 13 Global Step: 546580 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:08:32,846-Speed 2628.32 samples/sec Loss 4.5605 LearningRate 0.0116 Epoch: 13 Global Step: 546590 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:08:36,756-Speed 2619.48 samples/sec Loss 4.4509 LearningRate 0.0116 Epoch: 13 Global Step: 546600 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:08:40,652-Speed 2628.89 samples/sec Loss 4.6732 LearningRate 0.0116 Epoch: 13 Global Step: 546610 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:08:44,556-Speed 2624.10 samples/sec Loss 4.5438 LearningRate 0.0116 Epoch: 13 Global Step: 546620 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:08:48,452-Speed 2628.81 samples/sec Loss 4.6086 LearningRate 0.0116 Epoch: 13 Global Step: 546630 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:08:52,345-Speed 2631.12 samples/sec Loss 4.5572 LearningRate 0.0116 Epoch: 13 Global Step: 546640 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:08:56,259-Speed 2616.99 samples/sec Loss 4.6285 LearningRate 0.0116 Epoch: 13 Global Step: 546650 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:09:00,158-Speed 2627.33 samples/sec Loss 4.5970 LearningRate 0.0116 Epoch: 13 Global Step: 546660 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:09:04,060-Speed 2625.07 samples/sec Loss 4.6443 LearningRate 0.0116 Epoch: 13 Global Step: 546670 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:09:07,955-Speed 2629.76 samples/sec Loss 4.6375 LearningRate 0.0116 Epoch: 13 Global Step: 546680 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:09:11,854-Speed 2626.68 samples/sec Loss 4.5846 LearningRate 0.0116 Epoch: 13 Global Step: 546690 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:09:15,751-Speed 2627.99 samples/sec Loss 4.6491 LearningRate 0.0116 Epoch: 13 Global Step: 546700 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:09:19,661-Speed 2619.42 samples/sec Loss 4.6257 LearningRate 0.0116 Epoch: 13 Global Step: 546710 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:09:23,558-Speed 2629.45 samples/sec Loss 4.5736 LearningRate 0.0116 Epoch: 13 Global Step: 546720 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:09:27,453-Speed 2629.44 samples/sec Loss 4.6877 LearningRate 0.0116 Epoch: 13 Global Step: 546730 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:09:31,347-Speed 2630.24 samples/sec Loss 4.6857 LearningRate 0.0116 Epoch: 13 Global Step: 546740 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:09:35,231-Speed 2637.29 samples/sec Loss 4.6842 LearningRate 0.0116 Epoch: 13 Global Step: 546750 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:09:39,127-Speed 2628.51 samples/sec Loss 4.5774 LearningRate 0.0116 Epoch: 13 Global Step: 546760 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:09:43,023-Speed 2629.39 samples/sec Loss 4.6367 LearningRate 0.0116 Epoch: 13 Global Step: 546770 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:09:47,121-Speed 2499.24 samples/sec Loss 4.5319 LearningRate 0.0116 Epoch: 13 Global Step: 546780 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:09:51,025-Speed 2623.59 samples/sec Loss 4.5362 LearningRate 0.0116 Epoch: 13 Global Step: 546790 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:09:54,917-Speed 2631.48 samples/sec Loss 4.5510 LearningRate 0.0116 Epoch: 13 Global Step: 546800 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:09:58,824-Speed 2621.97 samples/sec Loss 4.4843 LearningRate 0.0116 Epoch: 13 Global Step: 546810 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:10:02,698-Speed 2644.53 samples/sec Loss 4.6282 LearningRate 0.0116 Epoch: 13 Global Step: 546820 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:10:06,594-Speed 2628.34 samples/sec Loss 4.5222 LearningRate 0.0116 Epoch: 13 Global Step: 546830 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:10:10,504-Speed 2619.63 samples/sec Loss 4.5557 LearningRate 0.0116 Epoch: 13 Global Step: 546840 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:10:14,402-Speed 2627.11 samples/sec Loss 4.6160 LearningRate 0.0116 Epoch: 13 Global Step: 546850 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:10:18,299-Speed 2628.00 samples/sec Loss 4.5508 LearningRate 0.0116 Epoch: 13 Global Step: 546860 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:10:22,193-Speed 2630.86 samples/sec Loss 4.4508 LearningRate 0.0116 Epoch: 13 Global Step: 546870 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:10:26,086-Speed 2630.88 samples/sec Loss 4.4596 LearningRate 0.0116 Epoch: 13 Global Step: 546880 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:10:29,996-Speed 2620.05 samples/sec Loss 4.6016 LearningRate 0.0116 Epoch: 13 Global Step: 546890 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:10:33,895-Speed 2626.30 samples/sec Loss 4.5794 LearningRate 0.0116 Epoch: 13 Global Step: 546900 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:10:37,792-Speed 2628.60 samples/sec Loss 4.5426 LearningRate 0.0116 Epoch: 13 Global Step: 546910 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:10:41,779-Speed 2569.54 samples/sec Loss 4.6035 LearningRate 0.0116 Epoch: 13 Global Step: 546920 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:10:45,683-Speed 2623.59 samples/sec Loss 4.6237 LearningRate 0.0116 Epoch: 13 Global Step: 546930 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:10:49,591-Speed 2621.13 samples/sec Loss 4.5717 LearningRate 0.0116 Epoch: 13 Global Step: 546940 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:10:53,496-Speed 2622.52 samples/sec Loss 4.6295 LearningRate 0.0116 Epoch: 13 Global Step: 546950 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:10:57,414-Speed 2615.06 samples/sec Loss 4.6000 LearningRate 0.0116 Epoch: 13 Global Step: 546960 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:11:03,135-Speed 1790.01 samples/sec Loss 4.5571 LearningRate 0.0116 Epoch: 13 Global Step: 546970 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:11:07,052-Speed 2614.57 samples/sec Loss 4.5718 LearningRate 0.0116 Epoch: 13 Global Step: 546980 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:11:10,931-Speed 2641.02 samples/sec Loss 4.5469 LearningRate 0.0116 Epoch: 13 Global Step: 546990 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:11:14,828-Speed 2628.31 samples/sec Loss 4.6704 LearningRate 0.0116 Epoch: 13 Global Step: 547000 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:11:18,748-Speed 2613.10 samples/sec Loss 4.5744 LearningRate 0.0116 Epoch: 13 Global Step: 547010 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:11:22,644-Speed 2629.37 samples/sec Loss 4.5684 LearningRate 0.0116 Epoch: 13 Global Step: 547020 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:11:26,539-Speed 2630.17 samples/sec Loss 4.4913 LearningRate 0.0116 Epoch: 13 Global Step: 547030 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:11:30,446-Speed 2621.34 samples/sec Loss 4.5259 LearningRate 0.0116 Epoch: 13 Global Step: 547040 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:11:34,357-Speed 2618.18 samples/sec Loss 4.7054 LearningRate 0.0116 Epoch: 13 Global Step: 547050 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:11:38,250-Speed 2630.80 samples/sec Loss 4.5818 LearningRate 0.0116 Epoch: 13 Global Step: 547060 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:11:42,145-Speed 2630.30 samples/sec Loss 4.5984 LearningRate 0.0116 Epoch: 13 Global Step: 547070 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:11:46,039-Speed 2630.80 samples/sec Loss 4.7015 LearningRate 0.0116 Epoch: 13 Global Step: 547080 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:11:49,988-Speed 2593.71 samples/sec Loss 4.5837 LearningRate 0.0116 Epoch: 13 Global Step: 547090 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:11:53,862-Speed 2643.91 samples/sec Loss 4.5670 LearningRate 0.0116 Epoch: 13 Global Step: 547100 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:11:57,791-Speed 2607.64 samples/sec Loss 4.6161 LearningRate 0.0116 Epoch: 13 Global Step: 547110 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:12:01,715-Speed 2609.82 samples/sec Loss 4.6874 LearningRate 0.0116 Epoch: 13 Global Step: 547120 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:12:05,612-Speed 2628.55 samples/sec Loss 4.5820 LearningRate 0.0116 Epoch: 13 Global Step: 547130 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:12:09,501-Speed 2633.67 samples/sec Loss 4.6177 LearningRate 0.0116 Epoch: 13 Global Step: 547140 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:12:13,392-Speed 2633.08 samples/sec Loss 4.6000 LearningRate 0.0116 Epoch: 13 Global Step: 547150 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:12:17,292-Speed 2626.24 samples/sec Loss 4.5974 LearningRate 0.0116 Epoch: 13 Global Step: 547160 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:12:21,194-Speed 2624.95 samples/sec Loss 4.6062 LearningRate 0.0116 Epoch: 13 Global Step: 547170 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:12:25,095-Speed 2626.43 samples/sec Loss 4.6971 LearningRate 0.0116 Epoch: 13 Global Step: 547180 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:12:28,995-Speed 2626.13 samples/sec Loss 4.5409 LearningRate 0.0116 Epoch: 13 Global Step: 547190 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:12:32,912-Speed 2614.21 samples/sec Loss 4.5473 LearningRate 0.0116 Epoch: 13 Global Step: 547200 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:12:36,810-Speed 2627.87 samples/sec Loss 4.5014 LearningRate 0.0116 Epoch: 13 Global Step: 547210 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:12:40,688-Speed 2641.58 samples/sec Loss 4.5341 LearningRate 0.0116 Epoch: 13 Global Step: 547220 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:12:44,623-Speed 2603.12 samples/sec Loss 4.5181 LearningRate 0.0116 Epoch: 13 Global Step: 547230 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:12:48,515-Speed 2631.73 samples/sec Loss 4.4708 LearningRate 0.0116 Epoch: 13 Global Step: 547240 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:12:52,410-Speed 2629.68 samples/sec Loss 4.5608 LearningRate 0.0116 Epoch: 13 Global Step: 547250 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:12:56,318-Speed 2622.02 samples/sec Loss 4.5834 LearningRate 0.0116 Epoch: 13 Global Step: 547260 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:13:00,214-Speed 2628.87 samples/sec Loss 4.6322 LearningRate 0.0116 Epoch: 13 Global Step: 547270 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:13:04,140-Speed 2608.79 samples/sec Loss 4.5990 LearningRate 0.0116 Epoch: 13 Global Step: 547280 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:13:08,040-Speed 2626.39 samples/sec Loss 4.6653 LearningRate 0.0116 Epoch: 13 Global Step: 547290 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:13:11,933-Speed 2631.18 samples/sec Loss 4.6517 LearningRate 0.0116 Epoch: 13 Global Step: 547300 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:13:15,826-Speed 2630.88 samples/sec Loss 4.6405 LearningRate 0.0116 Epoch: 13 Global Step: 547310 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:13:19,735-Speed 2619.75 samples/sec Loss 4.5431 LearningRate 0.0116 Epoch: 13 Global Step: 547320 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:13:23,630-Speed 2630.40 samples/sec Loss 4.6592 LearningRate 0.0116 Epoch: 13 Global Step: 547330 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:13:27,521-Speed 2632.21 samples/sec Loss 4.6458 LearningRate 0.0116 Epoch: 13 Global Step: 547340 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:13:31,463-Speed 2598.87 samples/sec Loss 4.5948 LearningRate 0.0116 Epoch: 13 Global Step: 547350 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:13:35,359-Speed 2629.22 samples/sec Loss 4.5950 LearningRate 0.0116 Epoch: 13 Global Step: 547360 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:13:39,252-Speed 2630.60 samples/sec Loss 4.5130 LearningRate 0.0116 Epoch: 13 Global Step: 547370 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:13:43,153-Speed 2625.89 samples/sec Loss 4.5675 LearningRate 0.0116 Epoch: 13 Global Step: 547380 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:13:47,056-Speed 2624.52 samples/sec Loss 4.5278 LearningRate 0.0116 Epoch: 13 Global Step: 547390 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:13:50,957-Speed 2625.66 samples/sec Loss 4.5467 LearningRate 0.0116 Epoch: 13 Global Step: 547400 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:13:54,849-Speed 2632.14 samples/sec Loss 4.6134 LearningRate 0.0116 Epoch: 13 Global Step: 547410 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:13:58,742-Speed 2630.87 samples/sec Loss 4.5248 LearningRate 0.0116 Epoch: 13 Global Step: 547420 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:14:02,638-Speed 2629.39 samples/sec Loss 4.5668 LearningRate 0.0116 Epoch: 13 Global Step: 547430 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:14:06,544-Speed 2621.88 samples/sec Loss 4.5827 LearningRate 0.0116 Epoch: 13 Global Step: 547440 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:14:10,436-Speed 2631.60 samples/sec Loss 4.5920 LearningRate 0.0116 Epoch: 13 Global Step: 547450 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:14:14,482-Speed 2531.36 samples/sec Loss 4.5857 LearningRate 0.0116 Epoch: 13 Global Step: 547460 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:14:18,573-Speed 2504.32 samples/sec Loss 4.5354 LearningRate 0.0116 Epoch: 13 Global Step: 547470 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:14:22,533-Speed 2586.72 samples/sec Loss 4.5373 LearningRate 0.0116 Epoch: 13 Global Step: 547480 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:14:26,431-Speed 2627.61 samples/sec Loss 4.5532 LearningRate 0.0116 Epoch: 13 Global Step: 547490 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:14:30,422-Speed 2566.41 samples/sec Loss 4.5754 LearningRate 0.0116 Epoch: 13 Global Step: 547500 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:14:34,324-Speed 2624.55 samples/sec Loss 4.5561 LearningRate 0.0116 Epoch: 13 Global Step: 547510 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:14:38,225-Speed 2625.37 samples/sec Loss 4.6422 LearningRate 0.0116 Epoch: 13 Global Step: 547520 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:14:42,140-Speed 2615.93 samples/sec Loss 4.6258 LearningRate 0.0116 Epoch: 13 Global Step: 547530 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:14:46,050-Speed 2620.24 samples/sec Loss 4.6894 LearningRate 0.0116 Epoch: 13 Global Step: 547540 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:14:49,945-Speed 2629.31 samples/sec Loss 4.6292 LearningRate 0.0116 Epoch: 13 Global Step: 547550 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:14:53,839-Speed 2631.39 samples/sec Loss 4.5614 LearningRate 0.0116 Epoch: 13 Global Step: 547560 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:14:57,738-Speed 2626.31 samples/sec Loss 4.5684 LearningRate 0.0116 Epoch: 13 Global Step: 547570 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:15:01,665-Speed 2608.69 samples/sec Loss 4.6103 LearningRate 0.0116 Epoch: 13 Global Step: 547580 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:15:05,533-Speed 2647.71 samples/sec Loss 4.4412 LearningRate 0.0116 Epoch: 13 Global Step: 547590 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:15:09,423-Speed 2632.99 samples/sec Loss 4.6097 LearningRate 0.0116 Epoch: 13 Global Step: 547600 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:15:13,345-Speed 2611.56 samples/sec Loss 4.5868 LearningRate 0.0116 Epoch: 13 Global Step: 547610 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:15:17,242-Speed 2628.63 samples/sec Loss 4.5617 LearningRate 0.0116 Epoch: 13 Global Step: 547620 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:15:21,132-Speed 2633.60 samples/sec Loss 4.5179 LearningRate 0.0116 Epoch: 13 Global Step: 547630 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:15:25,025-Speed 2631.60 samples/sec Loss 4.7084 LearningRate 0.0116 Epoch: 13 Global Step: 547640 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:15:28,944-Speed 2613.40 samples/sec Loss 4.6045 LearningRate 0.0115 Epoch: 13 Global Step: 547650 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:15:32,835-Speed 2632.87 samples/sec Loss 4.5462 LearningRate 0.0115 Epoch: 13 Global Step: 547660 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:15:36,729-Speed 2629.74 samples/sec Loss 4.5956 LearningRate 0.0115 Epoch: 13 Global Step: 547670 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:15:40,645-Speed 2615.63 samples/sec Loss 4.5196 LearningRate 0.0115 Epoch: 13 Global Step: 547680 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:15:44,539-Speed 2630.40 samples/sec Loss 4.5387 LearningRate 0.0115 Epoch: 13 Global Step: 547690 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:15:48,452-Speed 2617.86 samples/sec Loss 4.5945 LearningRate 0.0115 Epoch: 13 Global Step: 547700 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:15:52,345-Speed 2631.05 samples/sec Loss 4.4982 LearningRate 0.0115 Epoch: 13 Global Step: 547710 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:15:56,260-Speed 2616.05 samples/sec Loss 4.5618 LearningRate 0.0115 Epoch: 13 Global Step: 547720 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:16:00,159-Speed 2627.50 samples/sec Loss 4.6125 LearningRate 0.0115 Epoch: 13 Global Step: 547730 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:16:04,030-Speed 2646.21 samples/sec Loss 4.5834 LearningRate 0.0115 Epoch: 13 Global Step: 547740 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:16:07,927-Speed 2628.28 samples/sec Loss 4.5075 LearningRate 0.0115 Epoch: 13 Global Step: 547750 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:16:11,823-Speed 2628.27 samples/sec Loss 4.7005 LearningRate 0.0115 Epoch: 13 Global Step: 547760 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:16:15,701-Speed 2641.67 samples/sec Loss 4.5889 LearningRate 0.0115 Epoch: 13 Global Step: 547770 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:16:19,806-Speed 2495.38 samples/sec Loss 4.6531 LearningRate 0.0115 Epoch: 13 Global Step: 547780 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:16:23,760-Speed 2590.20 samples/sec Loss 4.5847 LearningRate 0.0115 Epoch: 13 Global Step: 547790 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:16:27,651-Speed 2632.92 samples/sec Loss 4.5295 LearningRate 0.0115 Epoch: 13 Global Step: 547800 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:16:31,548-Speed 2628.39 samples/sec Loss 4.6007 LearningRate 0.0115 Epoch: 13 Global Step: 547810 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:16:35,445-Speed 2627.86 samples/sec Loss 4.6203 LearningRate 0.0115 Epoch: 13 Global Step: 547820 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:16:39,336-Speed 2632.78 samples/sec Loss 4.5901 LearningRate 0.0115 Epoch: 13 Global Step: 547830 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:16:43,237-Speed 2625.85 samples/sec Loss 4.5084 LearningRate 0.0115 Epoch: 13 Global Step: 547840 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:16:47,131-Speed 2630.88 samples/sec Loss 4.5596 LearningRate 0.0115 Epoch: 13 Global Step: 547850 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:16:51,027-Speed 2628.50 samples/sec Loss 4.5618 LearningRate 0.0115 Epoch: 13 Global Step: 547860 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:16:54,928-Speed 2626.30 samples/sec Loss 4.5542 LearningRate 0.0115 Epoch: 13 Global Step: 547870 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:16:58,822-Speed 2630.46 samples/sec Loss 4.6506 LearningRate 0.0115 Epoch: 13 Global Step: 547880 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:17:02,714-Speed 2630.99 samples/sec Loss 4.6301 LearningRate 0.0115 Epoch: 13 Global Step: 547890 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:17:06,608-Speed 2630.15 samples/sec Loss 4.5725 LearningRate 0.0115 Epoch: 13 Global Step: 547900 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:17:10,516-Speed 2621.33 samples/sec Loss 4.6140 LearningRate 0.0115 Epoch: 13 Global Step: 547910 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:17:14,410-Speed 2629.93 samples/sec Loss 4.5452 LearningRate 0.0115 Epoch: 13 Global Step: 547920 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:17:18,306-Speed 2630.57 samples/sec Loss 4.5061 LearningRate 0.0115 Epoch: 13 Global Step: 547930 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:17:22,199-Speed 2630.61 samples/sec Loss 4.5522 LearningRate 0.0115 Epoch: 13 Global Step: 547940 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:17:26,093-Speed 2630.65 samples/sec Loss 4.5649 LearningRate 0.0115 Epoch: 13 Global Step: 547950 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:17:29,986-Speed 2630.72 samples/sec Loss 4.5519 LearningRate 0.0115 Epoch: 13 Global Step: 547960 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:17:33,879-Speed 2631.06 samples/sec Loss 4.5737 LearningRate 0.0115 Epoch: 13 Global Step: 547970 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:17:37,770-Speed 2632.42 samples/sec Loss 4.5374 LearningRate 0.0115 Epoch: 13 Global Step: 547980 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:17:41,670-Speed 2626.33 samples/sec Loss 4.5665 LearningRate 0.0115 Epoch: 13 Global Step: 547990 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:17:45,564-Speed 2630.17 samples/sec Loss 4.5317 LearningRate 0.0115 Epoch: 13 Global Step: 548000 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:17:49,458-Speed 2631.10 samples/sec Loss 4.6311 LearningRate 0.0115 Epoch: 13 Global Step: 548010 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:17:53,358-Speed 2625.89 samples/sec Loss 4.6233 LearningRate 0.0115 Epoch: 13 Global Step: 548020 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:17:57,253-Speed 2630.09 samples/sec Loss 4.5561 LearningRate 0.0115 Epoch: 13 Global Step: 548030 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:18:01,136-Speed 2638.11 samples/sec Loss 4.6029 LearningRate 0.0115 Epoch: 13 Global Step: 548040 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:18:05,036-Speed 2626.23 samples/sec Loss 4.5810 LearningRate 0.0115 Epoch: 13 Global Step: 548050 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:18:08,933-Speed 2627.80 samples/sec Loss 4.6038 LearningRate 0.0115 Epoch: 13 Global Step: 548060 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:18:12,906-Speed 2578.81 samples/sec Loss 4.5129 LearningRate 0.0115 Epoch: 13 Global Step: 548070 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:18:16,883-Speed 2575.00 samples/sec Loss 4.6650 LearningRate 0.0115 Epoch: 13 Global Step: 548080 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:18:20,799-Speed 2616.07 samples/sec Loss 4.5707 LearningRate 0.0115 Epoch: 13 Global Step: 548090 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:18:24,702-Speed 2624.32 samples/sec Loss 4.6478 LearningRate 0.0115 Epoch: 13 Global Step: 548100 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:18:28,627-Speed 2609.77 samples/sec Loss 4.5190 LearningRate 0.0115 Epoch: 13 Global Step: 548110 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:18:32,563-Speed 2602.23 samples/sec Loss 4.5549 LearningRate 0.0115 Epoch: 13 Global Step: 548120 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:18:36,589-Speed 2543.97 samples/sec Loss 4.4995 LearningRate 0.0115 Epoch: 13 Global Step: 548130 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:18:40,460-Speed 2646.28 samples/sec Loss 4.5357 LearningRate 0.0115 Epoch: 13 Global Step: 548140 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:18:44,367-Speed 2620.76 samples/sec Loss 4.6216 LearningRate 0.0115 Epoch: 13 Global Step: 548150 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:18:48,272-Speed 2623.46 samples/sec Loss 4.5543 LearningRate 0.0115 Epoch: 13 Global Step: 548160 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:18:52,179-Speed 2621.92 samples/sec Loss 4.5744 LearningRate 0.0115 Epoch: 13 Global Step: 548170 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:18:56,081-Speed 2624.95 samples/sec Loss 4.5583 LearningRate 0.0115 Epoch: 13 Global Step: 548180 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:18:59,984-Speed 2623.96 samples/sec Loss 4.6100 LearningRate 0.0115 Epoch: 13 Global Step: 548190 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:19:03,896-Speed 2618.63 samples/sec Loss 4.5186 LearningRate 0.0115 Epoch: 13 Global Step: 548200 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:19:07,867-Speed 2578.97 samples/sec Loss 4.5674 LearningRate 0.0115 Epoch: 13 Global Step: 548210 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:19:11,759-Speed 2631.57 samples/sec Loss 4.5267 LearningRate 0.0115 Epoch: 13 Global Step: 548220 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:19:15,658-Speed 2627.23 samples/sec Loss 4.5868 LearningRate 0.0115 Epoch: 13 Global Step: 548230 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:19:19,558-Speed 2626.28 samples/sec Loss 4.6178 LearningRate 0.0115 Epoch: 13 Global Step: 548240 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:19:23,458-Speed 2626.60 samples/sec Loss 4.5556 LearningRate 0.0115 Epoch: 13 Global Step: 548250 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:19:27,357-Speed 2627.45 samples/sec Loss 4.5879 LearningRate 0.0115 Epoch: 13 Global Step: 548260 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:19:31,258-Speed 2625.09 samples/sec Loss 4.6140 LearningRate 0.0115 Epoch: 13 Global Step: 548270 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:19:35,163-Speed 2623.08 samples/sec Loss 4.5782 LearningRate 0.0115 Epoch: 13 Global Step: 548280 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:19:39,064-Speed 2625.18 samples/sec Loss 4.5603 LearningRate 0.0115 Epoch: 13 Global Step: 548290 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:19:42,958-Speed 2630.41 samples/sec Loss 4.6403 LearningRate 0.0115 Epoch: 13 Global Step: 548300 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:19:46,853-Speed 2629.76 samples/sec Loss 4.5730 LearningRate 0.0115 Epoch: 13 Global Step: 548310 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:19:50,753-Speed 2626.20 samples/sec Loss 4.4807 LearningRate 0.0115 Epoch: 13 Global Step: 548320 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:19:54,650-Speed 2628.36 samples/sec Loss 4.5297 LearningRate 0.0115 Epoch: 13 Global Step: 548330 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:19:58,560-Speed 2620.15 samples/sec Loss 4.5974 LearningRate 0.0115 Epoch: 13 Global Step: 548340 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:20:02,463-Speed 2624.42 samples/sec Loss 4.5361 LearningRate 0.0115 Epoch: 13 Global Step: 548350 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:20:06,364-Speed 2625.46 samples/sec Loss 4.5814 LearningRate 0.0115 Epoch: 13 Global Step: 548360 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:20:10,256-Speed 2631.21 samples/sec Loss 4.5915 LearningRate 0.0115 Epoch: 13 Global Step: 548370 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:20:14,152-Speed 2629.42 samples/sec Loss 4.5495 LearningRate 0.0115 Epoch: 13 Global Step: 548380 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:20:18,056-Speed 2623.90 samples/sec Loss 4.5228 LearningRate 0.0115 Epoch: 13 Global Step: 548390 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:20:21,953-Speed 2628.73 samples/sec Loss 4.5787 LearningRate 0.0115 Epoch: 13 Global Step: 548400 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:20:25,847-Speed 2630.37 samples/sec Loss 4.4884 LearningRate 0.0115 Epoch: 13 Global Step: 548410 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:20:29,745-Speed 2627.62 samples/sec Loss 4.5742 LearningRate 0.0115 Epoch: 13 Global Step: 548420 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:20:33,628-Speed 2638.13 samples/sec Loss 4.5012 LearningRate 0.0115 Epoch: 13 Global Step: 548430 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:20:37,526-Speed 2627.50 samples/sec Loss 4.5726 LearningRate 0.0115 Epoch: 13 Global Step: 548440 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:20:41,424-Speed 2627.53 samples/sec Loss 4.4518 LearningRate 0.0115 Epoch: 13 Global Step: 548450 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:20:45,330-Speed 2621.92 samples/sec Loss 4.5366 LearningRate 0.0115 Epoch: 13 Global Step: 548460 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:20:49,236-Speed 2622.18 samples/sec Loss 4.5527 LearningRate 0.0115 Epoch: 13 Global Step: 548470 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:20:53,135-Speed 2627.51 samples/sec Loss 4.5125 LearningRate 0.0115 Epoch: 13 Global Step: 548480 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:20:57,032-Speed 2628.79 samples/sec Loss 4.5479 LearningRate 0.0115 Epoch: 13 Global Step: 548490 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:21:00,928-Speed 2629.47 samples/sec Loss 4.5569 LearningRate 0.0115 Epoch: 13 Global Step: 548500 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:21:04,826-Speed 2627.23 samples/sec Loss 4.5501 LearningRate 0.0115 Epoch: 13 Global Step: 548510 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:21:08,737-Speed 2619.13 samples/sec Loss 4.4794 LearningRate 0.0115 Epoch: 13 Global Step: 548520 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:21:12,631-Speed 2630.01 samples/sec Loss 4.5713 LearningRate 0.0115 Epoch: 13 Global Step: 548530 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:21:16,533-Speed 2625.23 samples/sec Loss 4.5744 LearningRate 0.0115 Epoch: 13 Global Step: 548540 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:21:20,428-Speed 2629.31 samples/sec Loss 4.6683 LearningRate 0.0115 Epoch: 13 Global Step: 548550 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:21:24,327-Speed 2627.75 samples/sec Loss 4.6426 LearningRate 0.0115 Epoch: 13 Global Step: 548560 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:21:28,220-Speed 2630.81 samples/sec Loss 4.5708 LearningRate 0.0115 Epoch: 13 Global Step: 548570 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:21:32,112-Speed 2631.70 samples/sec Loss 4.6595 LearningRate 0.0115 Epoch: 13 Global Step: 548580 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:21:36,005-Speed 2631.42 samples/sec Loss 4.4422 LearningRate 0.0115 Epoch: 13 Global Step: 548590 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:21:39,876-Speed 2645.48 samples/sec Loss 4.5838 LearningRate 0.0115 Epoch: 13 Global Step: 548600 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:21:43,750-Speed 2643.86 samples/sec Loss 4.5581 LearningRate 0.0115 Epoch: 13 Global Step: 548610 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:21:47,660-Speed 2619.98 samples/sec Loss 4.4956 LearningRate 0.0115 Epoch: 13 Global Step: 548620 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:21:51,570-Speed 2619.57 samples/sec Loss 4.6605 LearningRate 0.0115 Epoch: 13 Global Step: 548630 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:21:55,463-Speed 2631.38 samples/sec Loss 4.4886 LearningRate 0.0115 Epoch: 13 Global Step: 548640 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:21:59,362-Speed 2626.59 samples/sec Loss 4.5491 LearningRate 0.0115 Epoch: 13 Global Step: 548650 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:22:03,258-Speed 2629.61 samples/sec Loss 4.5724 LearningRate 0.0115 Epoch: 13 Global Step: 548660 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:22:07,156-Speed 2627.96 samples/sec Loss 4.5127 LearningRate 0.0115 Epoch: 13 Global Step: 548670 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:22:11,051-Speed 2629.49 samples/sec Loss 4.5297 LearningRate 0.0115 Epoch: 13 Global Step: 548680 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:22:15,080-Speed 2541.64 samples/sec Loss 4.5555 LearningRate 0.0115 Epoch: 13 Global Step: 548690 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:22:18,970-Speed 2633.60 samples/sec Loss 4.4596 LearningRate 0.0115 Epoch: 13 Global Step: 548700 Fp16 Grad Scale: 32768 Required: 32 hours
Training: 2022-04-15 09:22:22,863-Speed 2631.35 samples/sec Loss 4.5708 LearningRate 0.0115 Epoch: 13 Global Step: 548710 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:22:26,769-Speed 2622.34 samples/sec Loss 4.6493 LearningRate 0.0115 Epoch: 13 Global Step: 548720 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:22:30,675-Speed 2622.29 samples/sec Loss 4.5300 LearningRate 0.0115 Epoch: 13 Global Step: 548730 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:22:34,571-Speed 2628.65 samples/sec Loss 4.6124 LearningRate 0.0115 Epoch: 13 Global Step: 548740 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:22:38,467-Speed 2629.87 samples/sec Loss 4.5153 LearningRate 0.0115 Epoch: 13 Global Step: 548750 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:22:42,364-Speed 2627.82 samples/sec Loss 4.5389 LearningRate 0.0115 Epoch: 13 Global Step: 548760 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:22:46,263-Speed 2627.35 samples/sec Loss 4.4703 LearningRate 0.0115 Epoch: 13 Global Step: 548770 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:22:50,159-Speed 2628.95 samples/sec Loss 4.5303 LearningRate 0.0115 Epoch: 13 Global Step: 548780 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:22:54,060-Speed 2626.68 samples/sec Loss 4.4939 LearningRate 0.0115 Epoch: 13 Global Step: 548790 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:22:57,958-Speed 2627.82 samples/sec Loss 4.4669 LearningRate 0.0115 Epoch: 13 Global Step: 548800 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:23:01,851-Speed 2631.29 samples/sec Loss 4.5802 LearningRate 0.0115 Epoch: 13 Global Step: 548810 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:23:05,729-Speed 2641.17 samples/sec Loss 4.5837 LearningRate 0.0115 Epoch: 13 Global Step: 548820 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:23:09,629-Speed 2626.47 samples/sec Loss 4.5131 LearningRate 0.0115 Epoch: 13 Global Step: 548830 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:23:13,519-Speed 2633.02 samples/sec Loss 4.6204 LearningRate 0.0115 Epoch: 13 Global Step: 548840 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:23:17,411-Speed 2631.46 samples/sec Loss 4.5321 LearningRate 0.0115 Epoch: 13 Global Step: 548850 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:23:21,311-Speed 2627.01 samples/sec Loss 4.5181 LearningRate 0.0115 Epoch: 13 Global Step: 548860 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:23:25,206-Speed 2629.69 samples/sec Loss 4.5799 LearningRate 0.0114 Epoch: 13 Global Step: 548870 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:23:29,098-Speed 2632.96 samples/sec Loss 4.6164 LearningRate 0.0114 Epoch: 13 Global Step: 548880 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:23:32,995-Speed 2628.36 samples/sec Loss 4.5356 LearningRate 0.0114 Epoch: 13 Global Step: 548890 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:23:36,892-Speed 2627.84 samples/sec Loss 4.5458 LearningRate 0.0114 Epoch: 13 Global Step: 548900 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:23:40,783-Speed 2631.98 samples/sec Loss 4.6241 LearningRate 0.0114 Epoch: 13 Global Step: 548910 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:23:44,684-Speed 2625.92 samples/sec Loss 4.5031 LearningRate 0.0114 Epoch: 13 Global Step: 548920 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:23:48,635-Speed 2592.78 samples/sec Loss 4.5326 LearningRate 0.0114 Epoch: 13 Global Step: 548930 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:23:52,661-Speed 2543.81 samples/sec Loss 4.5625 LearningRate 0.0114 Epoch: 13 Global Step: 548940 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:23:56,637-Speed 2576.85 samples/sec Loss 4.5518 LearningRate 0.0114 Epoch: 13 Global Step: 548950 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:24:00,527-Speed 2632.80 samples/sec Loss 4.6477 LearningRate 0.0114 Epoch: 13 Global Step: 548960 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:24:04,433-Speed 2622.57 samples/sec Loss 4.6187 LearningRate 0.0114 Epoch: 13 Global Step: 548970 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:24:08,319-Speed 2635.59 samples/sec Loss 4.5314 LearningRate 0.0114 Epoch: 13 Global Step: 548980 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:24:12,212-Speed 2631.22 samples/sec Loss 4.5784 LearningRate 0.0114 Epoch: 13 Global Step: 548990 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:24:16,105-Speed 2630.81 samples/sec Loss 4.7314 LearningRate 0.0114 Epoch: 13 Global Step: 549000 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:24:20,013-Speed 2621.28 samples/sec Loss 4.6069 LearningRate 0.0114 Epoch: 13 Global Step: 549010 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:24:23,908-Speed 2629.24 samples/sec Loss 4.5969 LearningRate 0.0114 Epoch: 13 Global Step: 549020 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:24:27,802-Speed 2631.78 samples/sec Loss 4.5708 LearningRate 0.0114 Epoch: 13 Global Step: 549030 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:24:31,768-Speed 2582.73 samples/sec Loss 4.5821 LearningRate 0.0114 Epoch: 13 Global Step: 549040 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:24:35,666-Speed 2627.14 samples/sec Loss 4.5582 LearningRate 0.0114 Epoch: 13 Global Step: 549050 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:24:39,577-Speed 2619.11 samples/sec Loss 4.5234 LearningRate 0.0114 Epoch: 13 Global Step: 549060 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:24:43,550-Speed 2578.60 samples/sec Loss 4.5329 LearningRate 0.0114 Epoch: 13 Global Step: 549070 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:24:47,693-Speed 2471.78 samples/sec Loss 4.5437 LearningRate 0.0114 Epoch: 13 Global Step: 549080 Fp16 Grad Scale: 131072 Required: 32 hours
Training: 2022-04-15 09:24:51,587-Speed 2630.77 samples/sec Loss 4.5071 LearningRate 0.0114 Epoch: 13 Global Step: 549090 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:24:55,479-Speed 2631.58 samples/sec Loss 4.6481 LearningRate 0.0114 Epoch: 13 Global Step: 549100 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:24:59,385-Speed 2623.17 samples/sec Loss 4.6158 LearningRate 0.0114 Epoch: 13 Global Step: 549110 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:25:03,297-Speed 2617.86 samples/sec Loss 4.5894 LearningRate 0.0114 Epoch: 13 Global Step: 549120 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:25:07,196-Speed 2626.92 samples/sec Loss 4.4393 LearningRate 0.0114 Epoch: 13 Global Step: 549130 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:25:11,165-Speed 2580.36 samples/sec Loss 4.5138 LearningRate 0.0114 Epoch: 13 Global Step: 549140 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:25:15,079-Speed 2617.15 samples/sec Loss 4.5070 LearningRate 0.0114 Epoch: 13 Global Step: 549150 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:25:18,978-Speed 2627.52 samples/sec Loss 4.5895 LearningRate 0.0114 Epoch: 13 Global Step: 549160 Fp16 Grad Scale: 65536 Required: 32 hours
Training: 2022-04-15 09:25:22,875-Speed 2628.26 samples/sec Loss 4.5770 LearningRate 0.0114 Epoch: 13 Global Step: 549170 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:25:26,776-Speed 2625.35 samples/sec Loss 4.6142 LearningRate 0.0114 Epoch: 13 Global Step: 549180 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:25:30,684-Speed 2621.10 samples/sec Loss 4.5250 LearningRate 0.0114 Epoch: 13 Global Step: 549190 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:25:34,590-Speed 2622.22 samples/sec Loss 4.5887 LearningRate 0.0114 Epoch: 13 Global Step: 549200 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:25:38,497-Speed 2621.58 samples/sec Loss 4.4481 LearningRate 0.0114 Epoch: 13 Global Step: 549210 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:25:42,549-Speed 2528.11 samples/sec Loss 4.6054 LearningRate 0.0114 Epoch: 13 Global Step: 549220 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:25:46,451-Speed 2624.12 samples/sec Loss 4.5126 LearningRate 0.0114 Epoch: 13 Global Step: 549230 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:25:50,358-Speed 2622.21 samples/sec Loss 4.5668 LearningRate 0.0114 Epoch: 13 Global Step: 549240 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:25:54,279-Speed 2612.00 samples/sec Loss 4.5684 LearningRate 0.0114 Epoch: 13 Global Step: 549250 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:25:58,193-Speed 2617.18 samples/sec Loss 4.6187 LearningRate 0.0114 Epoch: 13 Global Step: 549260 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:26:02,084-Speed 2632.26 samples/sec Loss 4.4928 LearningRate 0.0114 Epoch: 13 Global Step: 549270 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:26:05,980-Speed 2629.44 samples/sec Loss 4.4926 LearningRate 0.0114 Epoch: 13 Global Step: 549280 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:26:09,880-Speed 2625.83 samples/sec Loss 4.6451 LearningRate 0.0114 Epoch: 13 Global Step: 549290 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:26:13,776-Speed 2629.08 samples/sec Loss 4.6611 LearningRate 0.0114 Epoch: 13 Global Step: 549300 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:26:17,670-Speed 2630.99 samples/sec Loss 4.5098 LearningRate 0.0114 Epoch: 13 Global Step: 549310 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:26:21,538-Speed 2647.76 samples/sec Loss 4.5170 LearningRate 0.0114 Epoch: 13 Global Step: 549320 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:26:25,429-Speed 2632.88 samples/sec Loss 4.5879 LearningRate 0.0114 Epoch: 13 Global Step: 549330 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:26:29,322-Speed 2630.54 samples/sec Loss 4.5702 LearningRate 0.0114 Epoch: 13 Global Step: 549340 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:26:33,223-Speed 2626.04 samples/sec Loss 4.5434 LearningRate 0.0114 Epoch: 13 Global Step: 549350 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:26:37,153-Speed 2605.92 samples/sec Loss 4.5882 LearningRate 0.0114 Epoch: 13 Global Step: 549360 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:26:41,045-Speed 2632.08 samples/sec Loss 4.5162 LearningRate 0.0114 Epoch: 13 Global Step: 549370 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:26:44,938-Speed 2631.26 samples/sec Loss 4.5899 LearningRate 0.0114 Epoch: 13 Global Step: 549380 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:26:48,830-Speed 2631.78 samples/sec Loss 4.4946 LearningRate 0.0114 Epoch: 13 Global Step: 549390 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:26:52,724-Speed 2629.97 samples/sec Loss 4.5241 LearningRate 0.0114 Epoch: 13 Global Step: 549400 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:26:56,637-Speed 2617.99 samples/sec Loss 4.5546 LearningRate 0.0114 Epoch: 13 Global Step: 549410 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:27:00,531-Speed 2630.54 samples/sec Loss 4.5002 LearningRate 0.0114 Epoch: 13 Global Step: 549420 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:27:04,424-Speed 2631.41 samples/sec Loss 4.5062 LearningRate 0.0114 Epoch: 13 Global Step: 549430 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:27:08,316-Speed 2631.37 samples/sec Loss 4.5068 LearningRate 0.0114 Epoch: 13 Global Step: 549440 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:27:12,217-Speed 2625.81 samples/sec Loss 4.5308 LearningRate 0.0114 Epoch: 13 Global Step: 549450 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:27:16,106-Speed 2633.19 samples/sec Loss 4.6409 LearningRate 0.0114 Epoch: 13 Global Step: 549460 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:27:20,012-Speed 2622.21 samples/sec Loss 4.6306 LearningRate 0.0114 Epoch: 13 Global Step: 549470 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:27:23,917-Speed 2623.24 samples/sec Loss 4.5814 LearningRate 0.0114 Epoch: 13 Global Step: 549480 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:27:27,814-Speed 2628.80 samples/sec Loss 4.6424 LearningRate 0.0114 Epoch: 13 Global Step: 549490 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:27:31,711-Speed 2628.07 samples/sec Loss 4.4670 LearningRate 0.0114 Epoch: 13 Global Step: 549500 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:27:35,623-Speed 2618.41 samples/sec Loss 4.6011 LearningRate 0.0114 Epoch: 13 Global Step: 549510 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:27:39,531-Speed 2620.60 samples/sec Loss 4.4637 LearningRate 0.0114 Epoch: 13 Global Step: 549520 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:27:43,427-Speed 2629.50 samples/sec Loss 4.5986 LearningRate 0.0114 Epoch: 13 Global Step: 549530 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:27:47,321-Speed 2630.61 samples/sec Loss 4.5092 LearningRate 0.0114 Epoch: 13 Global Step: 549540 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:27:51,212-Speed 2631.76 samples/sec Loss 4.5478 LearningRate 0.0114 Epoch: 13 Global Step: 549550 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:27:55,113-Speed 2626.17 samples/sec Loss 4.4888 LearningRate 0.0114 Epoch: 13 Global Step: 549560 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:27:59,013-Speed 2626.12 samples/sec Loss 4.5548 LearningRate 0.0114 Epoch: 13 Global Step: 549570 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:28:02,924-Speed 2619.19 samples/sec Loss 4.4965 LearningRate 0.0114 Epoch: 13 Global Step: 549580 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:28:06,827-Speed 2624.23 samples/sec Loss 4.6026 LearningRate 0.0114 Epoch: 13 Global Step: 549590 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:28:10,722-Speed 2629.68 samples/sec Loss 4.5019 LearningRate 0.0114 Epoch: 13 Global Step: 549600 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:28:14,617-Speed 2629.08 samples/sec Loss 4.5761 LearningRate 0.0114 Epoch: 13 Global Step: 549610 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:28:18,498-Speed 2639.76 samples/sec Loss 4.6651 LearningRate 0.0114 Epoch: 13 Global Step: 549620 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:28:22,394-Speed 2629.01 samples/sec Loss 4.5909 LearningRate 0.0114 Epoch: 13 Global Step: 549630 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:28:26,376-Speed 2571.94 samples/sec Loss 4.5730 LearningRate 0.0114 Epoch: 13 Global Step: 549640 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:28:30,249-Speed 2644.26 samples/sec Loss 4.5019 LearningRate 0.0114 Epoch: 13 Global Step: 549650 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:28:34,142-Speed 2632.04 samples/sec Loss 4.5343 LearningRate 0.0114 Epoch: 13 Global Step: 549660 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:28:38,041-Speed 2627.31 samples/sec Loss 4.4885 LearningRate 0.0114 Epoch: 13 Global Step: 549670 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:28:41,937-Speed 2628.63 samples/sec Loss 4.5199 LearningRate 0.0114 Epoch: 13 Global Step: 549680 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:28:45,837-Speed 2626.47 samples/sec Loss 4.6210 LearningRate 0.0114 Epoch: 13 Global Step: 549690 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:28:49,737-Speed 2626.21 samples/sec Loss 4.6058 LearningRate 0.0114 Epoch: 13 Global Step: 549700 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:28:53,633-Speed 2629.35 samples/sec Loss 4.4854 LearningRate 0.0114 Epoch: 13 Global Step: 549710 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:28:57,542-Speed 2620.04 samples/sec Loss 4.5351 LearningRate 0.0114 Epoch: 13 Global Step: 549720 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:29:01,437-Speed 2629.98 samples/sec Loss 4.6226 LearningRate 0.0114 Epoch: 13 Global Step: 549730 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:29:05,339-Speed 2625.02 samples/sec Loss 4.5932 LearningRate 0.0114 Epoch: 13 Global Step: 549740 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:29:09,235-Speed 2629.19 samples/sec Loss 4.5315 LearningRate 0.0114 Epoch: 13 Global Step: 549750 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:29:13,141-Speed 2622.98 samples/sec Loss 4.5463 LearningRate 0.0114 Epoch: 13 Global Step: 549760 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:29:17,012-Speed 2645.59 samples/sec Loss 4.5443 LearningRate 0.0114 Epoch: 13 Global Step: 549770 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:29:20,932-Speed 2613.60 samples/sec Loss 4.5001 LearningRate 0.0114 Epoch: 13 Global Step: 549780 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:29:24,826-Speed 2630.12 samples/sec Loss 4.6065 LearningRate 0.0114 Epoch: 13 Global Step: 549790 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:29:28,726-Speed 2626.33 samples/sec Loss 4.5859 LearningRate 0.0114 Epoch: 13 Global Step: 549800 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:29:32,621-Speed 2629.55 samples/sec Loss 4.5082 LearningRate 0.0114 Epoch: 13 Global Step: 549810 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:29:36,514-Speed 2631.64 samples/sec Loss 4.5553 LearningRate 0.0114 Epoch: 13 Global Step: 549820 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:29:40,429-Speed 2615.74 samples/sec Loss 4.5938 LearningRate 0.0114 Epoch: 13 Global Step: 549830 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:29:44,337-Speed 2621.20 samples/sec Loss 4.6579 LearningRate 0.0114 Epoch: 13 Global Step: 549840 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:29:48,208-Speed 2646.15 samples/sec Loss 4.5233 LearningRate 0.0114 Epoch: 13 Global Step: 549850 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:29:52,103-Speed 2629.75 samples/sec Loss 4.5159 LearningRate 0.0114 Epoch: 13 Global Step: 549860 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:29:55,996-Speed 2631.04 samples/sec Loss 4.4611 LearningRate 0.0114 Epoch: 13 Global Step: 549870 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:29:59,890-Speed 2630.23 samples/sec Loss 4.5345 LearningRate 0.0114 Epoch: 13 Global Step: 549880 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:30:03,787-Speed 2628.20 samples/sec Loss 4.5579 LearningRate 0.0114 Epoch: 13 Global Step: 549890 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:30:07,756-Speed 2580.81 samples/sec Loss 4.5255 LearningRate 0.0114 Epoch: 13 Global Step: 549900 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:30:11,660-Speed 2623.54 samples/sec Loss 4.5333 LearningRate 0.0114 Epoch: 13 Global Step: 549910 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:30:15,560-Speed 2626.29 samples/sec Loss 4.4390 LearningRate 0.0114 Epoch: 13 Global Step: 549920 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:30:19,457-Speed 2629.04 samples/sec Loss 4.4909 LearningRate 0.0114 Epoch: 13 Global Step: 549930 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:30:23,357-Speed 2625.76 samples/sec Loss 4.5241 LearningRate 0.0114 Epoch: 13 Global Step: 549940 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:30:27,247-Speed 2633.50 samples/sec Loss 4.5596 LearningRate 0.0114 Epoch: 13 Global Step: 549950 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:30:31,140-Speed 2631.21 samples/sec Loss 4.6035 LearningRate 0.0114 Epoch: 13 Global Step: 549960 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:30:35,044-Speed 2622.97 samples/sec Loss 4.5768 LearningRate 0.0114 Epoch: 13 Global Step: 549970 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:30:38,962-Speed 2614.46 samples/sec Loss 4.4665 LearningRate 0.0114 Epoch: 13 Global Step: 549980 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:30:42,875-Speed 2617.84 samples/sec Loss 4.5623 LearningRate 0.0114 Epoch: 13 Global Step: 549990 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:30:46,773-Speed 2627.44 samples/sec Loss 4.4588 LearningRate 0.0114 Epoch: 13 Global Step: 550000 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:31:30,044-[lfw][550000]XNorm: 22.308013
Training: 2022-04-15 09:31:30,045-[lfw][550000]Accuracy-Flip: 0.99767+-0.00281
Training: 2022-04-15 09:31:30,045-[lfw][550000]Accuracy-Highest: 0.99800
Training: 2022-04-15 09:32:20,260-[cfp_fp][550000]XNorm: 20.862175
Training: 2022-04-15 09:32:20,261-[cfp_fp][550000]Accuracy-Flip: 0.99071+-0.00483
Training: 2022-04-15 09:32:20,263-[cfp_fp][550000]Accuracy-Highest: 0.99086
Training: 2022-04-15 09:33:03,531-[agedb_30][550000]XNorm: 22.551556
Training: 2022-04-15 09:33:03,532-[agedb_30][550000]Accuracy-Flip: 0.97850+-0.00762
Training: 2022-04-15 09:33:03,532-[agedb_30][550000]Accuracy-Highest: 0.98083
Training: 2022-04-15 09:33:07,402-Speed 72.82 samples/sec Loss 4.4630 LearningRate 0.0114 Epoch: 13 Global Step: 550010 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:33:11,304-Speed 2633.78 samples/sec Loss 4.6039 LearningRate 0.0114 Epoch: 13 Global Step: 550020 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:33:15,241-Speed 2601.63 samples/sec Loss 4.6053 LearningRate 0.0114 Epoch: 13 Global Step: 550030 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:33:19,149-Speed 2621.01 samples/sec Loss 4.5317 LearningRate 0.0114 Epoch: 13 Global Step: 550040 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:33:22,996-Speed 2662.75 samples/sec Loss 4.5080 LearningRate 0.0114 Epoch: 13 Global Step: 550050 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:33:26,867-Speed 2645.71 samples/sec Loss 4.4993 LearningRate 0.0114 Epoch: 13 Global Step: 550060 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:33:30,748-Speed 2639.28 samples/sec Loss 4.5862 LearningRate 0.0114 Epoch: 13 Global Step: 550070 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:33:34,628-Speed 2640.11 samples/sec Loss 4.6174 LearningRate 0.0114 Epoch: 13 Global Step: 550080 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:33:38,571-Speed 2598.99 samples/sec Loss 4.5927 LearningRate 0.0114 Epoch: 13 Global Step: 550090 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:33:42,645-Speed 2513.97 samples/sec Loss 4.5261 LearningRate 0.0113 Epoch: 13 Global Step: 550100 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:33:46,653-Speed 2556.09 samples/sec Loss 4.5855 LearningRate 0.0113 Epoch: 13 Global Step: 550110 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:33:50,530-Speed 2642.39 samples/sec Loss 4.4566 LearningRate 0.0113 Epoch: 13 Global Step: 550120 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:33:54,409-Speed 2640.02 samples/sec Loss 4.4354 LearningRate 0.0113 Epoch: 13 Global Step: 550130 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:33:58,335-Speed 2609.46 samples/sec Loss 4.5951 LearningRate 0.0113 Epoch: 13 Global Step: 550140 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:34:02,230-Speed 2629.89 samples/sec Loss 4.5657 LearningRate 0.0113 Epoch: 13 Global Step: 550150 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:34:06,113-Speed 2638.56 samples/sec Loss 4.5609 LearningRate 0.0113 Epoch: 13 Global Step: 550160 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:34:09,992-Speed 2641.01 samples/sec Loss 4.6082 LearningRate 0.0113 Epoch: 13 Global Step: 550170 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:34:13,884-Speed 2631.78 samples/sec Loss 4.5367 LearningRate 0.0113 Epoch: 13 Global Step: 550180 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:34:17,771-Speed 2634.69 samples/sec Loss 4.5682 LearningRate 0.0113 Epoch: 13 Global Step: 550190 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:34:21,668-Speed 2628.59 samples/sec Loss 4.5381 LearningRate 0.0113 Epoch: 13 Global Step: 550200 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:34:25,553-Speed 2636.07 samples/sec Loss 4.5576 LearningRate 0.0113 Epoch: 13 Global Step: 550210 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:34:29,437-Speed 2637.45 samples/sec Loss 4.6566 LearningRate 0.0113 Epoch: 13 Global Step: 550220 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:34:33,323-Speed 2635.97 samples/sec Loss 4.4618 LearningRate 0.0113 Epoch: 13 Global Step: 550230 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:34:37,209-Speed 2635.89 samples/sec Loss 4.5766 LearningRate 0.0113 Epoch: 13 Global Step: 550240 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:34:41,095-Speed 2636.03 samples/sec Loss 4.5413 LearningRate 0.0113 Epoch: 13 Global Step: 550250 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:34:45,056-Speed 2585.81 samples/sec Loss 4.5459 LearningRate 0.0113 Epoch: 13 Global Step: 550260 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:34:48,941-Speed 2636.07 samples/sec Loss 4.5150 LearningRate 0.0113 Epoch: 13 Global Step: 550270 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:34:52,824-Speed 2637.93 samples/sec Loss 4.4779 LearningRate 0.0113 Epoch: 13 Global Step: 550280 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:34:56,712-Speed 2634.25 samples/sec Loss 4.5118 LearningRate 0.0113 Epoch: 13 Global Step: 550290 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:35:00,601-Speed 2634.13 samples/sec Loss 4.5939 LearningRate 0.0113 Epoch: 13 Global Step: 550300 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:35:04,490-Speed 2633.82 samples/sec Loss 4.6125 LearningRate 0.0113 Epoch: 13 Global Step: 550310 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:35:08,399-Speed 2620.63 samples/sec Loss 4.4293 LearningRate 0.0113 Epoch: 13 Global Step: 550320 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:35:12,285-Speed 2635.51 samples/sec Loss 4.5844 LearningRate 0.0113 Epoch: 13 Global Step: 550330 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:35:16,269-Speed 2570.77 samples/sec Loss 4.6051 LearningRate 0.0113 Epoch: 13 Global Step: 550340 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:35:20,332-Speed 2520.83 samples/sec Loss 4.5188 LearningRate 0.0113 Epoch: 13 Global Step: 550350 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:35:24,301-Speed 2580.90 samples/sec Loss 4.3770 LearningRate 0.0113 Epoch: 13 Global Step: 550360 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:35:28,189-Speed 2634.92 samples/sec Loss 4.6249 LearningRate 0.0113 Epoch: 13 Global Step: 550370 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:35:32,086-Speed 2628.70 samples/sec Loss 4.4695 LearningRate 0.0113 Epoch: 13 Global Step: 550380 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:35:35,968-Speed 2638.54 samples/sec Loss 4.5129 LearningRate 0.0113 Epoch: 13 Global Step: 550390 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:35:39,862-Speed 2630.19 samples/sec Loss 4.4624 LearningRate 0.0113 Epoch: 13 Global Step: 550400 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:35:43,746-Speed 2636.84 samples/sec Loss 4.4917 LearningRate 0.0113 Epoch: 13 Global Step: 550410 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:35:47,636-Speed 2633.13 samples/sec Loss 4.6036 LearningRate 0.0113 Epoch: 13 Global Step: 550420 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:35:51,536-Speed 2626.78 samples/sec Loss 4.5259 LearningRate 0.0113 Epoch: 13 Global Step: 550430 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:35:55,428-Speed 2631.63 samples/sec Loss 4.5368 LearningRate 0.0113 Epoch: 13 Global Step: 550440 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:35:59,318-Speed 2633.56 samples/sec Loss 4.5907 LearningRate 0.0113 Epoch: 13 Global Step: 550450 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:36:03,213-Speed 2629.63 samples/sec Loss 4.4847 LearningRate 0.0113 Epoch: 13 Global Step: 550460 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:36:07,103-Speed 2638.07 samples/sec Loss 4.4955 LearningRate 0.0113 Epoch: 13 Global Step: 550470 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:36:10,962-Speed 2653.94 samples/sec Loss 4.5022 LearningRate 0.0113 Epoch: 13 Global Step: 550480 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:36:15,038-Speed 2512.92 samples/sec Loss 4.5539 LearningRate 0.0113 Epoch: 13 Global Step: 550490 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:36:18,921-Speed 2637.81 samples/sec Loss 4.5367 LearningRate 0.0113 Epoch: 13 Global Step: 550500 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:36:22,807-Speed 2635.99 samples/sec Loss 4.5598 LearningRate 0.0113 Epoch: 13 Global Step: 550510 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:36:26,694-Speed 2634.72 samples/sec Loss 4.6121 LearningRate 0.0113 Epoch: 13 Global Step: 550520 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:36:30,579-Speed 2637.51 samples/sec Loss 4.6273 LearningRate 0.0113 Epoch: 13 Global Step: 550530 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:36:34,476-Speed 2628.13 samples/sec Loss 4.4915 LearningRate 0.0113 Epoch: 13 Global Step: 550540 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:36:38,335-Speed 2654.57 samples/sec Loss 4.5832 LearningRate 0.0113 Epoch: 13 Global Step: 550550 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:36:42,227-Speed 2631.67 samples/sec Loss 4.5976 LearningRate 0.0113 Epoch: 13 Global Step: 550560 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:36:46,113-Speed 2635.55 samples/sec Loss 4.5603 LearningRate 0.0113 Epoch: 13 Global Step: 550570 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:36:49,996-Speed 2637.74 samples/sec Loss 4.4799 LearningRate 0.0113 Epoch: 13 Global Step: 550580 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:36:53,881-Speed 2636.91 samples/sec Loss 4.4793 LearningRate 0.0113 Epoch: 13 Global Step: 550590 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:36:57,770-Speed 2633.97 samples/sec Loss 4.4744 LearningRate 0.0113 Epoch: 13 Global Step: 550600 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:37:01,667-Speed 2628.34 samples/sec Loss 4.6310 LearningRate 0.0113 Epoch: 13 Global Step: 550610 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:37:05,559-Speed 2631.89 samples/sec Loss 4.5442 LearningRate 0.0113 Epoch: 13 Global Step: 550620 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:37:09,440-Speed 2638.52 samples/sec Loss 4.5522 LearningRate 0.0113 Epoch: 13 Global Step: 550630 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:37:13,328-Speed 2634.76 samples/sec Loss 4.5238 LearningRate 0.0113 Epoch: 13 Global Step: 550640 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:37:17,227-Speed 2626.80 samples/sec Loss 4.5130 LearningRate 0.0113 Epoch: 13 Global Step: 550650 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:37:21,115-Speed 2635.04 samples/sec Loss 4.5381 LearningRate 0.0113 Epoch: 13 Global Step: 550660 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:37:25,002-Speed 2634.72 samples/sec Loss 4.5520 LearningRate 0.0113 Epoch: 13 Global Step: 550670 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:37:28,900-Speed 2627.36 samples/sec Loss 4.5810 LearningRate 0.0113 Epoch: 13 Global Step: 550680 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:37:32,796-Speed 2628.81 samples/sec Loss 4.4741 LearningRate 0.0113 Epoch: 13 Global Step: 550690 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:37:36,696-Speed 2627.04 samples/sec Loss 4.6323 LearningRate 0.0113 Epoch: 13 Global Step: 550700 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:37:40,571-Speed 2643.06 samples/sec Loss 4.5552 LearningRate 0.0113 Epoch: 13 Global Step: 550710 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:37:44,463-Speed 2631.43 samples/sec Loss 4.5811 LearningRate 0.0113 Epoch: 13 Global Step: 550720 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:37:48,352-Speed 2633.82 samples/sec Loss 4.4352 LearningRate 0.0113 Epoch: 13 Global Step: 550730 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:37:52,239-Speed 2635.18 samples/sec Loss 4.5380 LearningRate 0.0113 Epoch: 13 Global Step: 550740 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:37:56,122-Speed 2637.35 samples/sec Loss 4.4353 LearningRate 0.0113 Epoch: 13 Global Step: 550750 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:38:00,013-Speed 2633.27 samples/sec Loss 4.4065 LearningRate 0.0113 Epoch: 13 Global Step: 550760 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:38:03,898-Speed 2636.59 samples/sec Loss 4.4738 LearningRate 0.0113 Epoch: 13 Global Step: 550770 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:38:07,788-Speed 2633.26 samples/sec Loss 4.5380 LearningRate 0.0113 Epoch: 13 Global Step: 550780 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:38:11,675-Speed 2635.00 samples/sec Loss 4.5671 LearningRate 0.0113 Epoch: 13 Global Step: 550790 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:38:15,569-Speed 2630.38 samples/sec Loss 4.6002 LearningRate 0.0113 Epoch: 13 Global Step: 550800 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:38:19,452-Speed 2637.46 samples/sec Loss 4.5464 LearningRate 0.0113 Epoch: 13 Global Step: 550810 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:38:23,361-Speed 2620.58 samples/sec Loss 4.5423 LearningRate 0.0113 Epoch: 13 Global Step: 550820 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:38:27,244-Speed 2637.76 samples/sec Loss 4.4915 LearningRate 0.0113 Epoch: 13 Global Step: 550830 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:38:31,126-Speed 2638.21 samples/sec Loss 4.5240 LearningRate 0.0113 Epoch: 13 Global Step: 550840 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:38:35,009-Speed 2637.34 samples/sec Loss 4.5385 LearningRate 0.0113 Epoch: 13 Global Step: 550850 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:38:38,892-Speed 2638.81 samples/sec Loss 4.4860 LearningRate 0.0113 Epoch: 13 Global Step: 550860 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:38:42,775-Speed 2637.18 samples/sec Loss 4.5955 LearningRate 0.0113 Epoch: 13 Global Step: 550870 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:38:46,665-Speed 2633.93 samples/sec Loss 4.5491 LearningRate 0.0113 Epoch: 13 Global Step: 550880 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:38:50,548-Speed 2637.29 samples/sec Loss 4.5463 LearningRate 0.0113 Epoch: 13 Global Step: 550890 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:38:54,430-Speed 2638.04 samples/sec Loss 4.5698 LearningRate 0.0113 Epoch: 13 Global Step: 550900 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:38:58,322-Speed 2631.95 samples/sec Loss 4.4243 LearningRate 0.0113 Epoch: 13 Global Step: 550910 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:39:02,183-Speed 2653.09 samples/sec Loss 4.4930 LearningRate 0.0113 Epoch: 13 Global Step: 550920 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:39:06,085-Speed 2625.05 samples/sec Loss 4.5544 LearningRate 0.0113 Epoch: 13 Global Step: 550930 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:39:09,968-Speed 2637.28 samples/sec Loss 4.5033 LearningRate 0.0113 Epoch: 13 Global Step: 550940 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:39:13,862-Speed 2630.91 samples/sec Loss 4.5718 LearningRate 0.0113 Epoch: 13 Global Step: 550950 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:39:17,752-Speed 2633.51 samples/sec Loss 4.5083 LearningRate 0.0113 Epoch: 13 Global Step: 550960 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:39:21,675-Speed 2610.73 samples/sec Loss 4.6012 LearningRate 0.0113 Epoch: 13 Global Step: 550970 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:39:25,569-Speed 2629.84 samples/sec Loss 4.4946 LearningRate 0.0113 Epoch: 13 Global Step: 550980 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:39:29,473-Speed 2623.53 samples/sec Loss 4.5513 LearningRate 0.0113 Epoch: 13 Global Step: 550990 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:39:33,362-Speed 2634.09 samples/sec Loss 4.4951 LearningRate 0.0113 Epoch: 13 Global Step: 551000 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:39:37,254-Speed 2631.62 samples/sec Loss 4.5869 LearningRate 0.0113 Epoch: 13 Global Step: 551010 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:39:41,230-Speed 2576.23 samples/sec Loss 4.5124 LearningRate 0.0113 Epoch: 13 Global Step: 551020 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:39:45,310-Speed 2510.29 samples/sec Loss 4.4623 LearningRate 0.0113 Epoch: 13 Global Step: 551030 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:39:49,394-Speed 2508.29 samples/sec Loss 4.5500 LearningRate 0.0113 Epoch: 13 Global Step: 551040 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:39:53,475-Speed 2510.06 samples/sec Loss 4.4662 LearningRate 0.0113 Epoch: 13 Global Step: 551050 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:39:57,518-Speed 2532.92 samples/sec Loss 4.5634 LearningRate 0.0113 Epoch: 13 Global Step: 551060 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:40:01,497-Speed 2575.01 samples/sec Loss 4.6174 LearningRate 0.0113 Epoch: 13 Global Step: 551070 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:40:05,384-Speed 2634.63 samples/sec Loss 4.4628 LearningRate 0.0113 Epoch: 13 Global Step: 551080 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:40:09,269-Speed 2636.44 samples/sec Loss 4.5502 LearningRate 0.0113 Epoch: 13 Global Step: 551090 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:40:13,181-Speed 2618.54 samples/sec Loss 4.5904 LearningRate 0.0113 Epoch: 13 Global Step: 551100 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:40:17,071-Speed 2633.59 samples/sec Loss 4.5772 LearningRate 0.0113 Epoch: 13 Global Step: 551110 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:40:20,964-Speed 2630.84 samples/sec Loss 4.4989 LearningRate 0.0113 Epoch: 13 Global Step: 551120 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:40:24,858-Speed 2630.03 samples/sec Loss 4.5668 LearningRate 0.0113 Epoch: 13 Global Step: 551130 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:40:28,741-Speed 2637.74 samples/sec Loss 4.5421 LearningRate 0.0113 Epoch: 13 Global Step: 551140 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:40:32,625-Speed 2637.19 samples/sec Loss 4.4885 LearningRate 0.0113 Epoch: 13 Global Step: 551150 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:40:36,511-Speed 2635.87 samples/sec Loss 4.5101 LearningRate 0.0113 Epoch: 13 Global Step: 551160 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:40:40,398-Speed 2634.60 samples/sec Loss 4.4823 LearningRate 0.0113 Epoch: 13 Global Step: 551170 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:40:44,284-Speed 2636.41 samples/sec Loss 4.4853 LearningRate 0.0113 Epoch: 13 Global Step: 551180 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:40:48,179-Speed 2629.57 samples/sec Loss 4.5149 LearningRate 0.0113 Epoch: 13 Global Step: 551190 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:40:52,066-Speed 2635.69 samples/sec Loss 4.4617 LearningRate 0.0113 Epoch: 13 Global Step: 551200 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:40:55,973-Speed 2621.25 samples/sec Loss 4.5847 LearningRate 0.0113 Epoch: 13 Global Step: 551210 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:40:59,846-Speed 2644.39 samples/sec Loss 4.5725 LearningRate 0.0113 Epoch: 13 Global Step: 551220 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:41:03,735-Speed 2633.96 samples/sec Loss 4.5224 LearningRate 0.0113 Epoch: 13 Global Step: 551230 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:41:07,624-Speed 2634.12 samples/sec Loss 4.4283 LearningRate 0.0113 Epoch: 13 Global Step: 551240 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:41:11,515-Speed 2632.10 samples/sec Loss 4.4871 LearningRate 0.0113 Epoch: 13 Global Step: 551250 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:41:15,424-Speed 2620.62 samples/sec Loss 4.5996 LearningRate 0.0113 Epoch: 13 Global Step: 551260 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:41:19,318-Speed 2630.33 samples/sec Loss 4.5954 LearningRate 0.0113 Epoch: 13 Global Step: 551270 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:41:23,215-Speed 2628.90 samples/sec Loss 4.5037 LearningRate 0.0113 Epoch: 13 Global Step: 551280 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:41:27,102-Speed 2634.84 samples/sec Loss 4.5139 LearningRate 0.0113 Epoch: 13 Global Step: 551290 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:41:31,003-Speed 2625.91 samples/sec Loss 4.6073 LearningRate 0.0113 Epoch: 13 Global Step: 551300 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:41:34,905-Speed 2624.73 samples/sec Loss 4.4714 LearningRate 0.0113 Epoch: 13 Global Step: 551310 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:41:38,798-Speed 2630.81 samples/sec Loss 4.4685 LearningRate 0.0113 Epoch: 13 Global Step: 551320 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:41:42,664-Speed 2650.42 samples/sec Loss 4.5645 LearningRate 0.0113 Epoch: 13 Global Step: 551330 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:41:46,553-Speed 2633.79 samples/sec Loss 4.5987 LearningRate 0.0112 Epoch: 13 Global Step: 551340 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:41:50,442-Speed 2634.04 samples/sec Loss 4.6280 LearningRate 0.0112 Epoch: 13 Global Step: 551350 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:41:54,331-Speed 2633.51 samples/sec Loss 4.5056 LearningRate 0.0112 Epoch: 13 Global Step: 551360 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:41:58,218-Speed 2635.24 samples/sec Loss 4.5585 LearningRate 0.0112 Epoch: 13 Global Step: 551370 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:42:02,110-Speed 2632.18 samples/sec Loss 4.4598 LearningRate 0.0112 Epoch: 13 Global Step: 551380 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:42:06,001-Speed 2632.00 samples/sec Loss 4.4489 LearningRate 0.0112 Epoch: 13 Global Step: 551390 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:42:09,900-Speed 2626.95 samples/sec Loss 4.5131 LearningRate 0.0112 Epoch: 13 Global Step: 551400 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:42:13,918-Speed 2549.35 samples/sec Loss 4.5358 LearningRate 0.0112 Epoch: 13 Global Step: 551410 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:42:17,824-Speed 2622.52 samples/sec Loss 4.4265 LearningRate 0.0112 Epoch: 13 Global Step: 551420 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:42:21,817-Speed 2565.11 samples/sec Loss 4.6353 LearningRate 0.0112 Epoch: 13 Global Step: 551430 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:42:25,728-Speed 2619.28 samples/sec Loss 4.5256 LearningRate 0.0112 Epoch: 13 Global Step: 551440 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:42:29,606-Speed 2642.35 samples/sec Loss 4.4616 LearningRate 0.0112 Epoch: 13 Global Step: 551450 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:42:33,514-Speed 2621.07 samples/sec Loss 4.4183 LearningRate 0.0112 Epoch: 13 Global Step: 551460 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:42:37,415-Speed 2625.48 samples/sec Loss 4.5544 LearningRate 0.0112 Epoch: 13 Global Step: 551470 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:42:41,306-Speed 2632.52 samples/sec Loss 4.5123 LearningRate 0.0112 Epoch: 13 Global Step: 551480 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:42:45,201-Speed 2629.37 samples/sec Loss 4.4471 LearningRate 0.0112 Epoch: 13 Global Step: 551490 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:42:49,108-Speed 2621.68 samples/sec Loss 4.5138 LearningRate 0.0112 Epoch: 13 Global Step: 551500 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:42:53,015-Speed 2621.96 samples/sec Loss 4.5024 LearningRate 0.0112 Epoch: 13 Global Step: 551510 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:42:56,925-Speed 2620.35 samples/sec Loss 4.6287 LearningRate 0.0112 Epoch: 13 Global Step: 551520 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:43:00,832-Speed 2621.58 samples/sec Loss 4.5713 LearningRate 0.0112 Epoch: 13 Global Step: 551530 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:43:04,720-Speed 2634.39 samples/sec Loss 4.5678 LearningRate 0.0112 Epoch: 13 Global Step: 551540 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:43:08,610-Speed 2632.95 samples/sec Loss 4.5670 LearningRate 0.0112 Epoch: 13 Global Step: 551550 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:43:12,513-Speed 2624.26 samples/sec Loss 4.6525 LearningRate 0.0112 Epoch: 13 Global Step: 551560 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:43:16,403-Speed 2633.06 samples/sec Loss 4.5312 LearningRate 0.0112 Epoch: 13 Global Step: 551570 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:43:20,308-Speed 2623.11 samples/sec Loss 4.4854 LearningRate 0.0112 Epoch: 13 Global Step: 551580 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:43:24,232-Speed 2610.30 samples/sec Loss 4.3781 LearningRate 0.0112 Epoch: 13 Global Step: 551590 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:43:28,127-Speed 2629.64 samples/sec Loss 4.6162 LearningRate 0.0112 Epoch: 13 Global Step: 551600 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:43:32,014-Speed 2634.73 samples/sec Loss 4.5610 LearningRate 0.0112 Epoch: 13 Global Step: 551610 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:43:35,905-Speed 2632.16 samples/sec Loss 4.5135 LearningRate 0.0112 Epoch: 13 Global Step: 551620 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:43:39,799-Speed 2630.07 samples/sec Loss 4.5578 LearningRate 0.0112 Epoch: 13 Global Step: 551630 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:43:43,699-Speed 2627.02 samples/sec Loss 4.4929 LearningRate 0.0112 Epoch: 13 Global Step: 551640 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:43:47,594-Speed 2629.72 samples/sec Loss 4.5970 LearningRate 0.0112 Epoch: 13 Global Step: 551650 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:43:51,460-Speed 2649.36 samples/sec Loss 4.5177 LearningRate 0.0112 Epoch: 13 Global Step: 551660 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:43:55,348-Speed 2634.21 samples/sec Loss 4.5107 LearningRate 0.0112 Epoch: 13 Global Step: 551670 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:43:59,260-Speed 2618.31 samples/sec Loss 4.6349 LearningRate 0.0112 Epoch: 13 Global Step: 551680 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:44:03,154-Speed 2630.16 samples/sec Loss 4.5015 LearningRate 0.0112 Epoch: 13 Global Step: 551690 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:44:07,067-Speed 2617.50 samples/sec Loss 4.5453 LearningRate 0.0112 Epoch: 13 Global Step: 551700 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:44:10,966-Speed 2626.78 samples/sec Loss 4.4546 LearningRate 0.0112 Epoch: 13 Global Step: 551710 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:44:14,906-Speed 2599.53 samples/sec Loss 4.4838 LearningRate 0.0112 Epoch: 13 Global Step: 551720 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:44:18,798-Speed 2632.55 samples/sec Loss 4.5549 LearningRate 0.0112 Epoch: 13 Global Step: 551730 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:44:22,689-Speed 2632.38 samples/sec Loss 4.5436 LearningRate 0.0112 Epoch: 13 Global Step: 551740 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:44:26,577-Speed 2634.38 samples/sec Loss 4.5645 LearningRate 0.0112 Epoch: 13 Global Step: 551750 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:44:30,471-Speed 2629.71 samples/sec Loss 4.5401 LearningRate 0.0112 Epoch: 13 Global Step: 551760 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:44:34,370-Speed 2627.48 samples/sec Loss 4.4857 LearningRate 0.0112 Epoch: 13 Global Step: 551770 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:44:38,268-Speed 2627.22 samples/sec Loss 4.5661 LearningRate 0.0112 Epoch: 13 Global Step: 551780 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:44:42,154-Speed 2635.49 samples/sec Loss 4.5459 LearningRate 0.0112 Epoch: 13 Global Step: 551790 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:44:46,052-Speed 2628.44 samples/sec Loss 4.5355 LearningRate 0.0112 Epoch: 13 Global Step: 551800 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:44:49,948-Speed 2629.04 samples/sec Loss 4.5318 LearningRate 0.0112 Epoch: 13 Global Step: 551810 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:44:53,856-Speed 2620.87 samples/sec Loss 4.4368 LearningRate 0.0112 Epoch: 13 Global Step: 551820 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:44:57,752-Speed 2629.46 samples/sec Loss 4.4476 LearningRate 0.0112 Epoch: 13 Global Step: 551830 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:45:01,647-Speed 2629.42 samples/sec Loss 4.5616 LearningRate 0.0112 Epoch: 13 Global Step: 551840 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:45:05,551-Speed 2622.91 samples/sec Loss 4.5214 LearningRate 0.0112 Epoch: 13 Global Step: 551850 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:45:09,443-Speed 2632.12 samples/sec Loss 4.5429 LearningRate 0.0112 Epoch: 13 Global Step: 551860 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:45:13,362-Speed 2613.27 samples/sec Loss 4.5698 LearningRate 0.0112 Epoch: 13 Global Step: 551870 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:45:17,254-Speed 2631.73 samples/sec Loss 4.5522 LearningRate 0.0112 Epoch: 13 Global Step: 551880 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:45:21,150-Speed 2629.32 samples/sec Loss 4.4494 LearningRate 0.0112 Epoch: 13 Global Step: 551890 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:45:25,021-Speed 2645.98 samples/sec Loss 4.5249 LearningRate 0.0112 Epoch: 13 Global Step: 551900 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:45:28,916-Speed 2629.13 samples/sec Loss 4.4459 LearningRate 0.0112 Epoch: 13 Global Step: 551910 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:45:32,812-Speed 2629.14 samples/sec Loss 4.5604 LearningRate 0.0112 Epoch: 13 Global Step: 551920 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:45:36,683-Speed 2645.56 samples/sec Loss 4.5303 LearningRate 0.0112 Epoch: 13 Global Step: 551930 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:45:40,657-Speed 2577.60 samples/sec Loss 4.4674 LearningRate 0.0112 Epoch: 13 Global Step: 551940 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:45:44,554-Speed 2628.36 samples/sec Loss 4.4160 LearningRate 0.0112 Epoch: 13 Global Step: 551950 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:45:48,444-Speed 2633.13 samples/sec Loss 4.5706 LearningRate 0.0112 Epoch: 13 Global Step: 551960 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:45:52,341-Speed 2628.34 samples/sec Loss 4.5508 LearningRate 0.0112 Epoch: 13 Global Step: 551970 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:45:56,241-Speed 2626.28 samples/sec Loss 4.4436 LearningRate 0.0112 Epoch: 13 Global Step: 551980 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:46:00,142-Speed 2625.48 samples/sec Loss 4.4487 LearningRate 0.0112 Epoch: 13 Global Step: 551990 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:46:04,035-Speed 2630.53 samples/sec Loss 4.4790 LearningRate 0.0112 Epoch: 13 Global Step: 552000 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:46:07,932-Speed 2628.32 samples/sec Loss 4.5334 LearningRate 0.0112 Epoch: 13 Global Step: 552010 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:46:11,829-Speed 2628.68 samples/sec Loss 4.4643 LearningRate 0.0112 Epoch: 13 Global Step: 552020 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:46:15,722-Speed 2631.36 samples/sec Loss 4.4521 LearningRate 0.0112 Epoch: 13 Global Step: 552030 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:46:19,615-Speed 2630.42 samples/sec Loss 4.4931 LearningRate 0.0112 Epoch: 13 Global Step: 552040 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:46:23,561-Speed 2596.46 samples/sec Loss 4.5254 LearningRate 0.0112 Epoch: 13 Global Step: 552050 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:46:27,471-Speed 2618.76 samples/sec Loss 4.5332 LearningRate 0.0112 Epoch: 13 Global Step: 552060 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:46:31,367-Speed 2629.44 samples/sec Loss 4.4812 LearningRate 0.0112 Epoch: 13 Global Step: 552070 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:46:35,261-Speed 2629.78 samples/sec Loss 4.6360 LearningRate 0.0112 Epoch: 13 Global Step: 552080 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:46:39,165-Speed 2624.04 samples/sec Loss 4.4334 LearningRate 0.0112 Epoch: 13 Global Step: 552090 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:46:43,056-Speed 2631.93 samples/sec Loss 4.4635 LearningRate 0.0112 Epoch: 13 Global Step: 552100 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:46:46,955-Speed 2627.29 samples/sec Loss 4.5493 LearningRate 0.0112 Epoch: 13 Global Step: 552110 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:46:50,849-Speed 2630.25 samples/sec Loss 4.5913 LearningRate 0.0112 Epoch: 13 Global Step: 552120 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:46:54,743-Speed 2630.37 samples/sec Loss 4.5461 LearningRate 0.0112 Epoch: 13 Global Step: 552130 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:46:58,639-Speed 2629.03 samples/sec Loss 4.4652 LearningRate 0.0112 Epoch: 13 Global Step: 552140 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:47:02,516-Speed 2641.83 samples/sec Loss 4.5952 LearningRate 0.0112 Epoch: 13 Global Step: 552150 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:47:06,429-Speed 2617.24 samples/sec Loss 4.4784 LearningRate 0.0112 Epoch: 13 Global Step: 552160 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:47:10,329-Speed 2626.27 samples/sec Loss 4.4794 LearningRate 0.0112 Epoch: 13 Global Step: 552170 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:47:14,225-Speed 2629.46 samples/sec Loss 4.4625 LearningRate 0.0112 Epoch: 13 Global Step: 552180 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:47:18,120-Speed 2629.49 samples/sec Loss 4.5552 LearningRate 0.0112 Epoch: 13 Global Step: 552190 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:47:22,015-Speed 2629.88 samples/sec Loss 4.4428 LearningRate 0.0112 Epoch: 13 Global Step: 552200 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:47:25,914-Speed 2626.51 samples/sec Loss 4.4648 LearningRate 0.0112 Epoch: 13 Global Step: 552210 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:47:29,820-Speed 2622.46 samples/sec Loss 4.5098 LearningRate 0.0112 Epoch: 13 Global Step: 552220 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:47:33,713-Speed 2631.04 samples/sec Loss 4.4437 LearningRate 0.0112 Epoch: 13 Global Step: 552230 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:47:37,611-Speed 2627.32 samples/sec Loss 4.5080 LearningRate 0.0112 Epoch: 13 Global Step: 552240 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:47:41,506-Speed 2629.45 samples/sec Loss 4.5365 LearningRate 0.0112 Epoch: 13 Global Step: 552250 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:47:45,401-Speed 2629.83 samples/sec Loss 4.5267 LearningRate 0.0112 Epoch: 13 Global Step: 552260 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:47:49,385-Speed 2570.94 samples/sec Loss 4.5875 LearningRate 0.0112 Epoch: 13 Global Step: 552270 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:47:53,252-Speed 2648.59 samples/sec Loss 4.5001 LearningRate 0.0112 Epoch: 13 Global Step: 552280 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:47:57,143-Speed 2632.65 samples/sec Loss 4.4417 LearningRate 0.0112 Epoch: 13 Global Step: 552290 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:48:01,050-Speed 2621.28 samples/sec Loss 4.5389 LearningRate 0.0112 Epoch: 13 Global Step: 552300 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:48:04,943-Speed 2631.01 samples/sec Loss 4.4308 LearningRate 0.0112 Epoch: 13 Global Step: 552310 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:48:08,849-Speed 2621.84 samples/sec Loss 4.5063 LearningRate 0.0112 Epoch: 13 Global Step: 552320 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:48:12,763-Speed 2617.82 samples/sec Loss 4.5176 LearningRate 0.0112 Epoch: 13 Global Step: 552330 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:48:16,655-Speed 2631.45 samples/sec Loss 4.5874 LearningRate 0.0112 Epoch: 13 Global Step: 552340 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:48:20,549-Speed 2630.64 samples/sec Loss 4.4764 LearningRate 0.0112 Epoch: 13 Global Step: 552350 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:48:24,443-Speed 2630.11 samples/sec Loss 4.4828 LearningRate 0.0112 Epoch: 13 Global Step: 552360 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:48:28,338-Speed 2629.75 samples/sec Loss 4.5537 LearningRate 0.0112 Epoch: 13 Global Step: 552370 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:48:32,228-Speed 2632.73 samples/sec Loss 4.4839 LearningRate 0.0112 Epoch: 13 Global Step: 552380 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:48:36,123-Speed 2629.02 samples/sec Loss 4.4152 LearningRate 0.0112 Epoch: 13 Global Step: 552390 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:48:39,993-Speed 2647.00 samples/sec Loss 4.5112 LearningRate 0.0112 Epoch: 13 Global Step: 552400 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:48:43,900-Speed 2621.50 samples/sec Loss 4.4615 LearningRate 0.0112 Epoch: 13 Global Step: 552410 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:48:47,798-Speed 2627.40 samples/sec Loss 4.4460 LearningRate 0.0112 Epoch: 13 Global Step: 552420 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:48:51,689-Speed 2632.49 samples/sec Loss 4.4929 LearningRate 0.0112 Epoch: 13 Global Step: 552430 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:48:55,583-Speed 2630.52 samples/sec Loss 4.5504 LearningRate 0.0112 Epoch: 13 Global Step: 552440 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:48:59,576-Speed 2565.32 samples/sec Loss 4.4150 LearningRate 0.0112 Epoch: 13 Global Step: 552450 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:49:03,473-Speed 2628.31 samples/sec Loss 4.4530 LearningRate 0.0112 Epoch: 13 Global Step: 552460 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:49:07,366-Speed 2630.45 samples/sec Loss 4.4797 LearningRate 0.0112 Epoch: 13 Global Step: 552470 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:49:11,263-Speed 2628.48 samples/sec Loss 4.5070 LearningRate 0.0112 Epoch: 13 Global Step: 552480 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:49:15,154-Speed 2632.06 samples/sec Loss 4.5042 LearningRate 0.0112 Epoch: 13 Global Step: 552490 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:49:19,035-Speed 2639.60 samples/sec Loss 4.5861 LearningRate 0.0112 Epoch: 13 Global Step: 552500 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:49:22,940-Speed 2622.63 samples/sec Loss 4.5439 LearningRate 0.0112 Epoch: 13 Global Step: 552510 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:49:26,843-Speed 2624.48 samples/sec Loss 4.5219 LearningRate 0.0112 Epoch: 13 Global Step: 552520 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:49:30,751-Speed 2620.97 samples/sec Loss 4.5233 LearningRate 0.0112 Epoch: 13 Global Step: 552530 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:49:34,646-Speed 2629.70 samples/sec Loss 4.4453 LearningRate 0.0112 Epoch: 13 Global Step: 552540 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:49:38,554-Speed 2620.67 samples/sec Loss 4.4581 LearningRate 0.0112 Epoch: 13 Global Step: 552550 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:49:42,453-Speed 2627.18 samples/sec Loss 4.5655 LearningRate 0.0112 Epoch: 13 Global Step: 552560 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:49:46,368-Speed 2615.51 samples/sec Loss 4.5185 LearningRate 0.0111 Epoch: 13 Global Step: 552570 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:49:50,269-Speed 2626.21 samples/sec Loss 4.5093 LearningRate 0.0111 Epoch: 13 Global Step: 552580 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:49:54,163-Speed 2629.97 samples/sec Loss 4.4619 LearningRate 0.0111 Epoch: 13 Global Step: 552590 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:49:58,075-Speed 2618.02 samples/sec Loss 4.4148 LearningRate 0.0111 Epoch: 13 Global Step: 552600 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:50:01,970-Speed 2629.63 samples/sec Loss 4.5052 LearningRate 0.0111 Epoch: 13 Global Step: 552610 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:50:05,859-Speed 2633.77 samples/sec Loss 4.4667 LearningRate 0.0111 Epoch: 13 Global Step: 552620 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:50:09,753-Speed 2630.30 samples/sec Loss 4.6015 LearningRate 0.0111 Epoch: 13 Global Step: 552630 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:50:13,622-Speed 2647.35 samples/sec Loss 4.4598 LearningRate 0.0111 Epoch: 13 Global Step: 552640 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:50:17,509-Speed 2634.90 samples/sec Loss 4.5189 LearningRate 0.0111 Epoch: 13 Global Step: 552650 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:50:21,399-Speed 2633.04 samples/sec Loss 4.4164 LearningRate 0.0111 Epoch: 13 Global Step: 552660 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:50:25,297-Speed 2627.75 samples/sec Loss 4.5578 LearningRate 0.0111 Epoch: 13 Global Step: 552670 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:50:29,198-Speed 2626.01 samples/sec Loss 4.5607 LearningRate 0.0111 Epoch: 13 Global Step: 552680 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:50:33,087-Speed 2633.09 samples/sec Loss 4.5059 LearningRate 0.0111 Epoch: 13 Global Step: 552690 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:50:36,977-Speed 2632.78 samples/sec Loss 4.4692 LearningRate 0.0111 Epoch: 13 Global Step: 552700 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:50:40,881-Speed 2624.04 samples/sec Loss 4.5369 LearningRate 0.0111 Epoch: 13 Global Step: 552710 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:50:44,780-Speed 2627.22 samples/sec Loss 4.4793 LearningRate 0.0111 Epoch: 13 Global Step: 552720 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:50:48,670-Speed 2633.12 samples/sec Loss 4.4654 LearningRate 0.0111 Epoch: 13 Global Step: 552730 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:50:52,565-Speed 2630.01 samples/sec Loss 4.4833 LearningRate 0.0111 Epoch: 13 Global Step: 552740 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:50:56,460-Speed 2629.45 samples/sec Loss 4.5060 LearningRate 0.0111 Epoch: 13 Global Step: 552750 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:51:00,334-Speed 2643.18 samples/sec Loss 4.5311 LearningRate 0.0111 Epoch: 13 Global Step: 552760 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:51:04,235-Speed 2625.59 samples/sec Loss 4.5378 LearningRate 0.0111 Epoch: 13 Global Step: 552770 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:51:08,132-Speed 2628.24 samples/sec Loss 4.5342 LearningRate 0.0111 Epoch: 13 Global Step: 552780 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:51:12,022-Speed 2632.83 samples/sec Loss 4.4374 LearningRate 0.0111 Epoch: 13 Global Step: 552790 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:51:15,913-Speed 2633.04 samples/sec Loss 4.5363 LearningRate 0.0111 Epoch: 13 Global Step: 552800 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:51:19,805-Speed 2631.05 samples/sec Loss 4.5294 LearningRate 0.0111 Epoch: 13 Global Step: 552810 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:51:23,699-Speed 2630.65 samples/sec Loss 4.4595 LearningRate 0.0111 Epoch: 13 Global Step: 552820 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:51:27,594-Speed 2630.13 samples/sec Loss 4.4406 LearningRate 0.0111 Epoch: 13 Global Step: 552830 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:51:31,503-Speed 2619.96 samples/sec Loss 4.5405 LearningRate 0.0111 Epoch: 13 Global Step: 552840 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:51:35,404-Speed 2625.12 samples/sec Loss 4.4637 LearningRate 0.0111 Epoch: 13 Global Step: 552850 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:51:39,299-Speed 2629.75 samples/sec Loss 4.4274 LearningRate 0.0111 Epoch: 13 Global Step: 552860 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:51:43,171-Speed 2645.37 samples/sec Loss 4.4970 LearningRate 0.0111 Epoch: 13 Global Step: 552870 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:51:47,075-Speed 2623.51 samples/sec Loss 4.4390 LearningRate 0.0111 Epoch: 13 Global Step: 552880 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:51:50,997-Speed 2611.14 samples/sec Loss 4.4238 LearningRate 0.0111 Epoch: 13 Global Step: 552890 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:51:54,901-Speed 2624.18 samples/sec Loss 4.5725 LearningRate 0.0111 Epoch: 13 Global Step: 552900 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:51:58,796-Speed 2629.55 samples/sec Loss 4.5178 LearningRate 0.0111 Epoch: 13 Global Step: 552910 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:52:02,691-Speed 2629.37 samples/sec Loss 4.5225 LearningRate 0.0111 Epoch: 13 Global Step: 552920 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:52:06,583-Speed 2631.95 samples/sec Loss 4.4917 LearningRate 0.0111 Epoch: 13 Global Step: 552930 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:52:10,474-Speed 2632.42 samples/sec Loss 4.5029 LearningRate 0.0111 Epoch: 13 Global Step: 552940 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:52:14,368-Speed 2629.92 samples/sec Loss 4.4922 LearningRate 0.0111 Epoch: 13 Global Step: 552950 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:52:18,263-Speed 2630.07 samples/sec Loss 4.5039 LearningRate 0.0111 Epoch: 13 Global Step: 552960 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:52:22,156-Speed 2630.75 samples/sec Loss 4.4871 LearningRate 0.0111 Epoch: 13 Global Step: 552970 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:52:26,053-Speed 2628.19 samples/sec Loss 4.4859 LearningRate 0.0111 Epoch: 13 Global Step: 552980 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:52:29,957-Speed 2623.68 samples/sec Loss 4.4420 LearningRate 0.0111 Epoch: 13 Global Step: 552990 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:52:33,846-Speed 2633.14 samples/sec Loss 4.5887 LearningRate 0.0111 Epoch: 13 Global Step: 553000 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:52:37,739-Speed 2631.02 samples/sec Loss 4.4331 LearningRate 0.0111 Epoch: 13 Global Step: 553010 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:52:41,612-Speed 2644.97 samples/sec Loss 4.4803 LearningRate 0.0111 Epoch: 13 Global Step: 553020 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:52:45,503-Speed 2632.55 samples/sec Loss 4.5154 LearningRate 0.0111 Epoch: 13 Global Step: 553030 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:52:49,395-Speed 2631.34 samples/sec Loss 4.4863 LearningRate 0.0111 Epoch: 13 Global Step: 553040 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:52:53,288-Speed 2630.95 samples/sec Loss 4.5392 LearningRate 0.0111 Epoch: 13 Global Step: 553050 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:52:57,252-Speed 2584.15 samples/sec Loss 4.6212 LearningRate 0.0111 Epoch: 13 Global Step: 553060 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:53:01,151-Speed 2626.79 samples/sec Loss 4.4743 LearningRate 0.0111 Epoch: 13 Global Step: 553070 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:53:05,052-Speed 2624.97 samples/sec Loss 4.4955 LearningRate 0.0111 Epoch: 13 Global Step: 553080 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:53:08,945-Speed 2631.26 samples/sec Loss 4.4738 LearningRate 0.0111 Epoch: 13 Global Step: 553090 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:53:12,842-Speed 2628.13 samples/sec Loss 4.4478 LearningRate 0.0111 Epoch: 13 Global Step: 553100 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:53:16,734-Speed 2631.97 samples/sec Loss 4.5169 LearningRate 0.0111 Epoch: 13 Global Step: 553110 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:53:20,639-Speed 2623.27 samples/sec Loss 4.5005 LearningRate 0.0111 Epoch: 13 Global Step: 553120 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:53:24,543-Speed 2623.14 samples/sec Loss 4.4777 LearningRate 0.0111 Epoch: 13 Global Step: 553130 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:53:28,421-Speed 2641.18 samples/sec Loss 4.6034 LearningRate 0.0111 Epoch: 13 Global Step: 553140 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:53:32,322-Speed 2625.62 samples/sec Loss 4.5297 LearningRate 0.0111 Epoch: 13 Global Step: 553150 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:53:36,222-Speed 2626.47 samples/sec Loss 4.4656 LearningRate 0.0111 Epoch: 13 Global Step: 553160 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:53:40,120-Speed 2627.42 samples/sec Loss 4.4188 LearningRate 0.0111 Epoch: 13 Global Step: 553170 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:53:44,021-Speed 2625.58 samples/sec Loss 4.4382 LearningRate 0.0111 Epoch: 13 Global Step: 553180 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:53:47,931-Speed 2619.47 samples/sec Loss 4.5167 LearningRate 0.0111 Epoch: 13 Global Step: 553190 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:53:51,833-Speed 2624.68 samples/sec Loss 4.5294 LearningRate 0.0111 Epoch: 13 Global Step: 553200 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:53:55,736-Speed 2624.46 samples/sec Loss 4.4467 LearningRate 0.0111 Epoch: 13 Global Step: 553210 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:53:59,630-Speed 2630.84 samples/sec Loss 4.4752 LearningRate 0.0111 Epoch: 13 Global Step: 553220 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:54:03,522-Speed 2631.35 samples/sec Loss 4.4935 LearningRate 0.0111 Epoch: 13 Global Step: 553230 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:54:07,423-Speed 2625.08 samples/sec Loss 4.4286 LearningRate 0.0111 Epoch: 13 Global Step: 553240 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:54:11,303-Speed 2639.73 samples/sec Loss 4.4681 LearningRate 0.0111 Epoch: 13 Global Step: 553250 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:54:15,215-Speed 2626.44 samples/sec Loss 4.5184 LearningRate 0.0111 Epoch: 13 Global Step: 553260 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:54:19,110-Speed 2629.60 samples/sec Loss 4.6145 LearningRate 0.0111 Epoch: 13 Global Step: 553270 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:54:23,008-Speed 2627.88 samples/sec Loss 4.5401 LearningRate 0.0111 Epoch: 13 Global Step: 553280 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:54:26,917-Speed 2620.25 samples/sec Loss 4.5045 LearningRate 0.0111 Epoch: 13 Global Step: 553290 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:54:30,816-Speed 2626.84 samples/sec Loss 4.5637 LearningRate 0.0111 Epoch: 13 Global Step: 553300 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:54:34,871-Speed 2526.28 samples/sec Loss 4.5145 LearningRate 0.0111 Epoch: 13 Global Step: 553310 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:54:38,920-Speed 2529.33 samples/sec Loss 4.3969 LearningRate 0.0111 Epoch: 13 Global Step: 553320 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:54:42,813-Speed 2631.51 samples/sec Loss 4.5030 LearningRate 0.0111 Epoch: 13 Global Step: 553330 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:54:46,707-Speed 2629.60 samples/sec Loss 4.3945 LearningRate 0.0111 Epoch: 13 Global Step: 553340 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:54:50,581-Speed 2644.60 samples/sec Loss 4.4435 LearningRate 0.0111 Epoch: 13 Global Step: 553350 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:54:54,457-Speed 2641.89 samples/sec Loss 4.5173 LearningRate 0.0111 Epoch: 13 Global Step: 553360 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:54:58,352-Speed 2630.21 samples/sec Loss 4.5031 LearningRate 0.0111 Epoch: 13 Global Step: 553370 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:55:02,260-Speed 2620.44 samples/sec Loss 4.4471 LearningRate 0.0111 Epoch: 13 Global Step: 553380 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:55:06,171-Speed 2618.36 samples/sec Loss 4.3828 LearningRate 0.0111 Epoch: 13 Global Step: 553390 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:55:10,063-Speed 2631.94 samples/sec Loss 4.3984 LearningRate 0.0111 Epoch: 13 Global Step: 553400 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:55:13,960-Speed 2628.10 samples/sec Loss 4.4128 LearningRate 0.0111 Epoch: 13 Global Step: 553410 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:55:17,855-Speed 2629.93 samples/sec Loss 4.5349 LearningRate 0.0111 Epoch: 13 Global Step: 553420 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:55:21,753-Speed 2628.29 samples/sec Loss 4.4501 LearningRate 0.0111 Epoch: 13 Global Step: 553430 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:55:25,650-Speed 2628.04 samples/sec Loss 4.5588 LearningRate 0.0111 Epoch: 13 Global Step: 553440 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:55:29,545-Speed 2629.70 samples/sec Loss 4.4183 LearningRate 0.0111 Epoch: 13 Global Step: 553450 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:55:33,439-Speed 2629.71 samples/sec Loss 4.5669 LearningRate 0.0111 Epoch: 13 Global Step: 553460 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:55:37,334-Speed 2629.48 samples/sec Loss 4.4489 LearningRate 0.0111 Epoch: 13 Global Step: 553470 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:55:41,225-Speed 2632.28 samples/sec Loss 4.5162 LearningRate 0.0111 Epoch: 13 Global Step: 553480 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:55:45,128-Speed 2624.15 samples/sec Loss 4.4915 LearningRate 0.0111 Epoch: 13 Global Step: 553490 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:55:49,023-Speed 2629.85 samples/sec Loss 4.6169 LearningRate 0.0111 Epoch: 13 Global Step: 553500 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:55:52,914-Speed 2632.27 samples/sec Loss 4.5404 LearningRate 0.0111 Epoch: 13 Global Step: 553510 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:55:56,789-Speed 2643.27 samples/sec Loss 4.4647 LearningRate 0.0111 Epoch: 13 Global Step: 553520 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:56:00,703-Speed 2617.14 samples/sec Loss 4.5086 LearningRate 0.0111 Epoch: 13 Global Step: 553530 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:56:04,602-Speed 2627.06 samples/sec Loss 4.4158 LearningRate 0.0111 Epoch: 13 Global Step: 553540 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:56:08,502-Speed 2625.92 samples/sec Loss 4.3793 LearningRate 0.0111 Epoch: 13 Global Step: 553550 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:56:12,403-Speed 2625.55 samples/sec Loss 4.4956 LearningRate 0.0111 Epoch: 13 Global Step: 553560 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:56:16,302-Speed 2627.13 samples/sec Loss 4.4902 LearningRate 0.0111 Epoch: 13 Global Step: 553570 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:56:20,201-Speed 2627.07 samples/sec Loss 4.5487 LearningRate 0.0111 Epoch: 13 Global Step: 553580 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:56:24,097-Speed 2629.06 samples/sec Loss 4.5557 LearningRate 0.0111 Epoch: 13 Global Step: 553590 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:56:27,993-Speed 2629.00 samples/sec Loss 4.4843 LearningRate 0.0111 Epoch: 13 Global Step: 553600 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:56:31,887-Speed 2629.98 samples/sec Loss 4.4384 LearningRate 0.0111 Epoch: 13 Global Step: 553610 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:56:35,780-Speed 2631.07 samples/sec Loss 4.5531 LearningRate 0.0111 Epoch: 13 Global Step: 553620 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:56:39,674-Speed 2630.45 samples/sec Loss 4.4129 LearningRate 0.0111 Epoch: 13 Global Step: 553630 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:56:43,573-Speed 2627.19 samples/sec Loss 4.5130 LearningRate 0.0111 Epoch: 13 Global Step: 553640 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:56:47,466-Speed 2630.96 samples/sec Loss 4.4912 LearningRate 0.0111 Epoch: 13 Global Step: 553650 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:56:51,361-Speed 2629.30 samples/sec Loss 4.5228 LearningRate 0.0111 Epoch: 13 Global Step: 553660 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:56:55,258-Speed 2628.59 samples/sec Loss 4.5097 LearningRate 0.0111 Epoch: 13 Global Step: 553670 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:56:59,161-Speed 2624.11 samples/sec Loss 4.4621 LearningRate 0.0111 Epoch: 13 Global Step: 553680 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:57:03,061-Speed 2626.23 samples/sec Loss 4.5168 LearningRate 0.0111 Epoch: 13 Global Step: 553690 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:57:06,974-Speed 2617.15 samples/sec Loss 4.4128 LearningRate 0.0111 Epoch: 13 Global Step: 553700 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:57:10,880-Speed 2622.60 samples/sec Loss 4.4274 LearningRate 0.0111 Epoch: 13 Global Step: 553710 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:57:14,783-Speed 2624.95 samples/sec Loss 4.4757 LearningRate 0.0111 Epoch: 13 Global Step: 553720 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:57:18,674-Speed 2631.94 samples/sec Loss 4.5404 LearningRate 0.0111 Epoch: 13 Global Step: 553730 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:57:22,565-Speed 2633.35 samples/sec Loss 4.5491 LearningRate 0.0111 Epoch: 13 Global Step: 553740 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:57:26,459-Speed 2629.48 samples/sec Loss 4.5166 LearningRate 0.0111 Epoch: 13 Global Step: 553750 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:57:30,366-Speed 2621.77 samples/sec Loss 4.4982 LearningRate 0.0111 Epoch: 13 Global Step: 553760 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:57:34,237-Speed 2645.46 samples/sec Loss 4.4956 LearningRate 0.0111 Epoch: 13 Global Step: 553770 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:57:38,131-Speed 2631.45 samples/sec Loss 4.4338 LearningRate 0.0111 Epoch: 13 Global Step: 553780 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:57:42,030-Speed 2626.50 samples/sec Loss 4.5279 LearningRate 0.0111 Epoch: 13 Global Step: 553790 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:57:45,923-Speed 2631.62 samples/sec Loss 4.5405 LearningRate 0.0111 Epoch: 13 Global Step: 553800 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:57:49,820-Speed 2628.14 samples/sec Loss 4.4923 LearningRate 0.0111 Epoch: 13 Global Step: 553810 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:57:53,716-Speed 2629.50 samples/sec Loss 4.4219 LearningRate 0.0110 Epoch: 13 Global Step: 553820 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:57:57,610-Speed 2629.96 samples/sec Loss 4.5204 LearningRate 0.0110 Epoch: 13 Global Step: 553830 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:58:01,505-Speed 2629.64 samples/sec Loss 4.4487 LearningRate 0.0110 Epoch: 13 Global Step: 553840 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:58:05,400-Speed 2629.10 samples/sec Loss 4.4787 LearningRate 0.0110 Epoch: 13 Global Step: 553850 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:58:09,298-Speed 2627.95 samples/sec Loss 4.4623 LearningRate 0.0110 Epoch: 13 Global Step: 553860 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:58:13,196-Speed 2627.10 samples/sec Loss 4.4783 LearningRate 0.0110 Epoch: 13 Global Step: 553870 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:58:17,100-Speed 2624.18 samples/sec Loss 4.5224 LearningRate 0.0110 Epoch: 13 Global Step: 553880 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:58:20,995-Speed 2629.58 samples/sec Loss 4.4350 LearningRate 0.0110 Epoch: 13 Global Step: 553890 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:58:24,894-Speed 2626.77 samples/sec Loss 4.5070 LearningRate 0.0110 Epoch: 13 Global Step: 553900 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:58:28,794-Speed 2626.79 samples/sec Loss 4.5011 LearningRate 0.0110 Epoch: 13 Global Step: 553910 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:58:32,698-Speed 2623.43 samples/sec Loss 4.4012 LearningRate 0.0110 Epoch: 13 Global Step: 553920 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:58:36,597-Speed 2626.61 samples/sec Loss 4.5119 LearningRate 0.0110 Epoch: 13 Global Step: 553930 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:58:40,500-Speed 2624.09 samples/sec Loss 4.4800 LearningRate 0.0110 Epoch: 13 Global Step: 553940 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:58:44,398-Speed 2627.37 samples/sec Loss 4.4670 LearningRate 0.0110 Epoch: 13 Global Step: 553950 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:58:48,298-Speed 2626.80 samples/sec Loss 4.4942 LearningRate 0.0110 Epoch: 13 Global Step: 553960 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:58:52,203-Speed 2622.78 samples/sec Loss 4.5148 LearningRate 0.0110 Epoch: 13 Global Step: 553970 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:58:56,096-Speed 2630.74 samples/sec Loss 4.5049 LearningRate 0.0110 Epoch: 13 Global Step: 553980 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:58:59,990-Speed 2630.16 samples/sec Loss 4.3552 LearningRate 0.0110 Epoch: 13 Global Step: 553990 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:59:03,888-Speed 2627.94 samples/sec Loss 4.4642 LearningRate 0.0110 Epoch: 13 Global Step: 554000 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 09:59:07,764-Speed 2642.49 samples/sec Loss 4.5314 LearningRate 0.0110 Epoch: 13 Global Step: 554010 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:59:11,657-Speed 2631.14 samples/sec Loss 4.5900 LearningRate 0.0110 Epoch: 13 Global Step: 554020 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:59:15,552-Speed 2630.14 samples/sec Loss 4.4933 LearningRate 0.0110 Epoch: 13 Global Step: 554030 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:59:19,448-Speed 2628.85 samples/sec Loss 4.4507 LearningRate 0.0110 Epoch: 13 Global Step: 554040 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 09:59:23,328-Speed 2639.43 samples/sec Loss 4.5348 LearningRate 0.0110 Epoch: 13 Global Step: 554050 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:59:27,239-Speed 2618.88 samples/sec Loss 4.4677 LearningRate 0.0110 Epoch: 13 Global Step: 554060 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:59:31,145-Speed 2622.39 samples/sec Loss 4.5141 LearningRate 0.0110 Epoch: 13 Global Step: 554070 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:59:35,036-Speed 2632.28 samples/sec Loss 4.4591 LearningRate 0.0110 Epoch: 13 Global Step: 554080 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:59:38,937-Speed 2625.47 samples/sec Loss 4.5621 LearningRate 0.0110 Epoch: 13 Global Step: 554090 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:59:42,858-Speed 2612.62 samples/sec Loss 4.4852 LearningRate 0.0110 Epoch: 13 Global Step: 554100 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:59:46,801-Speed 2597.30 samples/sec Loss 4.4707 LearningRate 0.0110 Epoch: 13 Global Step: 554110 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:59:50,694-Speed 2631.65 samples/sec Loss 4.4438 LearningRate 0.0110 Epoch: 13 Global Step: 554120 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:59:54,590-Speed 2628.78 samples/sec Loss 4.5161 LearningRate 0.0110 Epoch: 13 Global Step: 554130 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 09:59:58,490-Speed 2626.44 samples/sec Loss 4.3607 LearningRate 0.0110 Epoch: 13 Global Step: 554140 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:00:02,388-Speed 2627.42 samples/sec Loss 4.4536 LearningRate 0.0110 Epoch: 13 Global Step: 554150 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:00:06,285-Speed 2627.51 samples/sec Loss 4.4859 LearningRate 0.0110 Epoch: 13 Global Step: 554160 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:00:10,179-Speed 2630.65 samples/sec Loss 4.5367 LearningRate 0.0110 Epoch: 13 Global Step: 554170 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:00:14,080-Speed 2626.03 samples/sec Loss 4.4826 LearningRate 0.0110 Epoch: 13 Global Step: 554180 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:00:17,993-Speed 2617.42 samples/sec Loss 4.4526 LearningRate 0.0110 Epoch: 13 Global Step: 554190 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:00:21,885-Speed 2632.40 samples/sec Loss 4.4026 LearningRate 0.0110 Epoch: 13 Global Step: 554200 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:00:25,785-Speed 2626.02 samples/sec Loss 4.5014 LearningRate 0.0110 Epoch: 13 Global Step: 554210 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:00:29,684-Speed 2626.69 samples/sec Loss 4.4912 LearningRate 0.0110 Epoch: 13 Global Step: 554220 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:00:33,588-Speed 2623.72 samples/sec Loss 4.3458 LearningRate 0.0110 Epoch: 13 Global Step: 554230 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:00:37,465-Speed 2641.43 samples/sec Loss 4.4648 LearningRate 0.0110 Epoch: 13 Global Step: 554240 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:00:41,363-Speed 2627.88 samples/sec Loss 4.5491 LearningRate 0.0110 Epoch: 13 Global Step: 554250 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:00:45,258-Speed 2629.67 samples/sec Loss 4.5546 LearningRate 0.0110 Epoch: 13 Global Step: 554260 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:00:49,149-Speed 2631.78 samples/sec Loss 4.4444 LearningRate 0.0110 Epoch: 13 Global Step: 554270 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:00:53,051-Speed 2625.39 samples/sec Loss 4.4949 LearningRate 0.0110 Epoch: 13 Global Step: 554280 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:00:56,951-Speed 2626.45 samples/sec Loss 4.4940 LearningRate 0.0110 Epoch: 13 Global Step: 554290 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:01:00,850-Speed 2627.09 samples/sec Loss 4.3870 LearningRate 0.0110 Epoch: 13 Global Step: 554300 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:01:04,791-Speed 2598.43 samples/sec Loss 4.4023 LearningRate 0.0110 Epoch: 13 Global Step: 554310 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:01:08,694-Speed 2624.08 samples/sec Loss 4.5472 LearningRate 0.0110 Epoch: 13 Global Step: 554320 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:01:12,600-Speed 2622.09 samples/sec Loss 4.4924 LearningRate 0.0110 Epoch: 13 Global Step: 554330 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:01:16,499-Speed 2627.01 samples/sec Loss 4.4283 LearningRate 0.0110 Epoch: 13 Global Step: 554340 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:01:20,402-Speed 2624.66 samples/sec Loss 4.5443 LearningRate 0.0110 Epoch: 13 Global Step: 554350 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:01:24,304-Speed 2624.89 samples/sec Loss 4.5859 LearningRate 0.0110 Epoch: 13 Global Step: 554360 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:01:28,202-Speed 2627.83 samples/sec Loss 4.5136 LearningRate 0.0110 Epoch: 13 Global Step: 554370 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:01:32,116-Speed 2616.95 samples/sec Loss 4.5051 LearningRate 0.0110 Epoch: 13 Global Step: 554380 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:01:36,046-Speed 2605.90 samples/sec Loss 4.4197 LearningRate 0.0110 Epoch: 13 Global Step: 554390 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:01:39,949-Speed 2624.06 samples/sec Loss 4.5817 LearningRate 0.0110 Epoch: 13 Global Step: 554400 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:01:43,854-Speed 2622.88 samples/sec Loss 4.4660 LearningRate 0.0110 Epoch: 13 Global Step: 554410 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:01:47,753-Speed 2627.19 samples/sec Loss 4.4222 LearningRate 0.0110 Epoch: 13 Global Step: 554420 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:01:51,648-Speed 2629.42 samples/sec Loss 4.4208 LearningRate 0.0110 Epoch: 13 Global Step: 554430 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:01:55,546-Speed 2627.52 samples/sec Loss 4.4300 LearningRate 0.0110 Epoch: 13 Global Step: 554440 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:01:59,447-Speed 2626.15 samples/sec Loss 4.4669 LearningRate 0.0110 Epoch: 13 Global Step: 554450 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:02:03,343-Speed 2628.28 samples/sec Loss 4.5376 LearningRate 0.0110 Epoch: 13 Global Step: 554460 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:02:07,237-Speed 2630.53 samples/sec Loss 4.5139 LearningRate 0.0110 Epoch: 13 Global Step: 554470 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:02:11,112-Speed 2643.09 samples/sec Loss 4.3448 LearningRate 0.0110 Epoch: 13 Global Step: 554480 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:02:15,011-Speed 2627.15 samples/sec Loss 4.4740 LearningRate 0.0110 Epoch: 13 Global Step: 554490 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:02:18,917-Speed 2622.07 samples/sec Loss 4.4814 LearningRate 0.0110 Epoch: 13 Global Step: 554500 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:02:22,813-Speed 2628.83 samples/sec Loss 4.4344 LearningRate 0.0110 Epoch: 13 Global Step: 554510 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:02:26,746-Speed 2604.11 samples/sec Loss 4.5433 LearningRate 0.0110 Epoch: 13 Global Step: 554520 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:02:30,642-Speed 2629.69 samples/sec Loss 4.3600 LearningRate 0.0110 Epoch: 13 Global Step: 554530 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:02:34,546-Speed 2623.05 samples/sec Loss 4.5990 LearningRate 0.0110 Epoch: 13 Global Step: 554540 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:02:38,453-Speed 2621.15 samples/sec Loss 4.4243 LearningRate 0.0110 Epoch: 13 Global Step: 554550 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:02:42,365-Speed 2618.24 samples/sec Loss 4.5051 LearningRate 0.0110 Epoch: 13 Global Step: 554560 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:02:46,263-Speed 2627.85 samples/sec Loss 4.4007 LearningRate 0.0110 Epoch: 13 Global Step: 554570 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:02:50,140-Speed 2641.89 samples/sec Loss 4.4813 LearningRate 0.0110 Epoch: 13 Global Step: 554580 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:02:54,035-Speed 2630.02 samples/sec Loss 4.3414 LearningRate 0.0110 Epoch: 13 Global Step: 554590 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:02:57,929-Speed 2629.92 samples/sec Loss 4.4498 LearningRate 0.0110 Epoch: 13 Global Step: 554600 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:03:01,829-Speed 2626.38 samples/sec Loss 4.4690 LearningRate 0.0110 Epoch: 13 Global Step: 554610 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:03:05,730-Speed 2625.62 samples/sec Loss 4.4151 LearningRate 0.0110 Epoch: 13 Global Step: 554620 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:03:09,629-Speed 2627.00 samples/sec Loss 4.5537 LearningRate 0.0110 Epoch: 13 Global Step: 554630 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:03:13,530-Speed 2625.23 samples/sec Loss 4.4888 LearningRate 0.0110 Epoch: 13 Global Step: 554640 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:03:17,427-Speed 2628.40 samples/sec Loss 4.4126 LearningRate 0.0110 Epoch: 13 Global Step: 554650 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:03:21,327-Speed 2625.91 samples/sec Loss 4.4754 LearningRate 0.0110 Epoch: 13 Global Step: 554660 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:03:25,222-Speed 2630.00 samples/sec Loss 4.4695 LearningRate 0.0110 Epoch: 13 Global Step: 554670 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:03:29,163-Speed 2599.34 samples/sec Loss 4.5195 LearningRate 0.0110 Epoch: 13 Global Step: 554680 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:03:33,042-Speed 2640.12 samples/sec Loss 4.4248 LearningRate 0.0110 Epoch: 13 Global Step: 554690 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:03:36,937-Speed 2629.38 samples/sec Loss 4.4659 LearningRate 0.0110 Epoch: 13 Global Step: 554700 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:03:40,831-Speed 2630.49 samples/sec Loss 4.5041 LearningRate 0.0110 Epoch: 13 Global Step: 554710 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:03:44,725-Speed 2630.48 samples/sec Loss 4.4380 LearningRate 0.0110 Epoch: 13 Global Step: 554720 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:03:48,633-Speed 2620.71 samples/sec Loss 4.4565 LearningRate 0.0110 Epoch: 13 Global Step: 554730 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:03:52,531-Speed 2627.34 samples/sec Loss 4.5188 LearningRate 0.0110 Epoch: 13 Global Step: 554740 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:03:56,425-Speed 2630.80 samples/sec Loss 4.4740 LearningRate 0.0110 Epoch: 13 Global Step: 554750 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:04:00,322-Speed 2628.28 samples/sec Loss 4.4477 LearningRate 0.0110 Epoch: 13 Global Step: 554760 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:04:04,222-Speed 2625.93 samples/sec Loss 4.4817 LearningRate 0.0110 Epoch: 13 Global Step: 554770 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:04:08,120-Speed 2627.95 samples/sec Loss 4.4471 LearningRate 0.0110 Epoch: 13 Global Step: 554780 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:04:12,019-Speed 2627.21 samples/sec Loss 4.3901 LearningRate 0.0110 Epoch: 13 Global Step: 554790 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:04:15,918-Speed 2627.10 samples/sec Loss 4.3889 LearningRate 0.0110 Epoch: 13 Global Step: 554800 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:04:19,813-Speed 2629.13 samples/sec Loss 4.3927 LearningRate 0.0110 Epoch: 13 Global Step: 554810 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:04:23,751-Speed 2600.98 samples/sec Loss 4.4370 LearningRate 0.0110 Epoch: 13 Global Step: 554820 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:04:27,690-Speed 2600.37 samples/sec Loss 4.5691 LearningRate 0.0110 Epoch: 13 Global Step: 554830 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:04:31,616-Speed 2608.47 samples/sec Loss 4.4316 LearningRate 0.0110 Epoch: 13 Global Step: 554840 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:04:35,508-Speed 2631.59 samples/sec Loss 4.4669 LearningRate 0.0110 Epoch: 13 Global Step: 554850 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:04:39,434-Speed 2609.23 samples/sec Loss 4.4591 LearningRate 0.0110 Epoch: 13 Global Step: 554860 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:04:43,328-Speed 2630.05 samples/sec Loss 4.4926 LearningRate 0.0110 Epoch: 13 Global Step: 554870 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:04:47,220-Speed 2632.23 samples/sec Loss 4.4840 LearningRate 0.0110 Epoch: 13 Global Step: 554880 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:04:51,111-Speed 2632.06 samples/sec Loss 4.3954 LearningRate 0.0110 Epoch: 13 Global Step: 554890 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:04:55,013-Speed 2624.93 samples/sec Loss 4.4834 LearningRate 0.0110 Epoch: 13 Global Step: 554900 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:04:58,909-Speed 2628.75 samples/sec Loss 4.4331 LearningRate 0.0110 Epoch: 13 Global Step: 554910 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:05:02,797-Speed 2634.20 samples/sec Loss 4.4846 LearningRate 0.0110 Epoch: 13 Global Step: 554920 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:05:06,691-Speed 2630.02 samples/sec Loss 4.4313 LearningRate 0.0110 Epoch: 13 Global Step: 554930 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:05:10,598-Speed 2621.84 samples/sec Loss 4.3228 LearningRate 0.0110 Epoch: 13 Global Step: 554940 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:05:14,544-Speed 2595.96 samples/sec Loss 4.4563 LearningRate 0.0110 Epoch: 13 Global Step: 554950 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:05:18,447-Speed 2623.63 samples/sec Loss 4.5070 LearningRate 0.0110 Epoch: 13 Global Step: 554960 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:05:22,342-Speed 2630.22 samples/sec Loss 4.4467 LearningRate 0.0110 Epoch: 13 Global Step: 554970 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:05:26,238-Speed 2629.04 samples/sec Loss 4.4633 LearningRate 0.0110 Epoch: 13 Global Step: 554980 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:05:30,128-Speed 2633.12 samples/sec Loss 4.5417 LearningRate 0.0110 Epoch: 13 Global Step: 554990 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:05:34,021-Speed 2630.57 samples/sec Loss 4.5250 LearningRate 0.0110 Epoch: 13 Global Step: 555000 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:05:37,913-Speed 2631.49 samples/sec Loss 4.5062 LearningRate 0.0110 Epoch: 13 Global Step: 555010 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:05:41,805-Speed 2631.77 samples/sec Loss 4.5323 LearningRate 0.0110 Epoch: 13 Global Step: 555020 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:05:45,697-Speed 2631.48 samples/sec Loss 4.4691 LearningRate 0.0110 Epoch: 13 Global Step: 555030 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:05:49,598-Speed 2625.86 samples/sec Loss 4.4608 LearningRate 0.0110 Epoch: 13 Global Step: 555040 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:05:53,471-Speed 2644.77 samples/sec Loss 4.5097 LearningRate 0.0110 Epoch: 13 Global Step: 555050 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:05:57,375-Speed 2623.44 samples/sec Loss 4.5657 LearningRate 0.0110 Epoch: 13 Global Step: 555060 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:06:01,267-Speed 2631.92 samples/sec Loss 4.4046 LearningRate 0.0109 Epoch: 13 Global Step: 555070 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:06:05,170-Speed 2624.06 samples/sec Loss 4.4237 LearningRate 0.0109 Epoch: 13 Global Step: 555080 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:06:09,087-Speed 2615.12 samples/sec Loss 4.4164 LearningRate 0.0109 Epoch: 13 Global Step: 555090 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:06:12,979-Speed 2630.98 samples/sec Loss 4.4268 LearningRate 0.0109 Epoch: 13 Global Step: 555100 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:06:16,875-Speed 2629.46 samples/sec Loss 4.4450 LearningRate 0.0109 Epoch: 13 Global Step: 555110 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:06:20,774-Speed 2626.77 samples/sec Loss 4.4697 LearningRate 0.0109 Epoch: 13 Global Step: 555120 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:06:24,666-Speed 2631.36 samples/sec Loss 4.4361 LearningRate 0.0109 Epoch: 13 Global Step: 555130 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:06:28,561-Speed 2629.81 samples/sec Loss 4.5182 LearningRate 0.0109 Epoch: 13 Global Step: 555140 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:06:32,454-Speed 2631.23 samples/sec Loss 4.4459 LearningRate 0.0109 Epoch: 13 Global Step: 555150 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:06:36,349-Speed 2629.27 samples/sec Loss 4.4028 LearningRate 0.0109 Epoch: 13 Global Step: 555160 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:06:40,250-Speed 2625.97 samples/sec Loss 4.5445 LearningRate 0.0109 Epoch: 13 Global Step: 555170 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:06:44,119-Speed 2647.33 samples/sec Loss 4.4631 LearningRate 0.0109 Epoch: 13 Global Step: 555180 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:06:48,016-Speed 2628.25 samples/sec Loss 4.4339 LearningRate 0.0109 Epoch: 13 Global Step: 555190 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:06:51,909-Speed 2631.64 samples/sec Loss 4.3872 LearningRate 0.0109 Epoch: 13 Global Step: 555200 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:06:55,806-Speed 2628.31 samples/sec Loss 4.4013 LearningRate 0.0109 Epoch: 13 Global Step: 555210 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:06:59,715-Speed 2621.05 samples/sec Loss 4.3599 LearningRate 0.0109 Epoch: 13 Global Step: 555220 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:07:03,615-Speed 2625.82 samples/sec Loss 4.4856 LearningRate 0.0109 Epoch: 13 Global Step: 555230 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:07:07,519-Speed 2623.10 samples/sec Loss 4.4892 LearningRate 0.0109 Epoch: 13 Global Step: 555240 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:07:11,434-Speed 2615.96 samples/sec Loss 4.4785 LearningRate 0.0109 Epoch: 13 Global Step: 555250 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:07:15,336-Speed 2625.53 samples/sec Loss 4.4965 LearningRate 0.0109 Epoch: 13 Global Step: 555260 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:07:19,235-Speed 2627.18 samples/sec Loss 4.4718 LearningRate 0.0109 Epoch: 13 Global Step: 555270 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:07:23,128-Speed 2631.57 samples/sec Loss 4.4745 LearningRate 0.0109 Epoch: 13 Global Step: 555280 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:07:27,028-Speed 2625.90 samples/sec Loss 4.5094 LearningRate 0.0109 Epoch: 13 Global Step: 555290 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:07:30,926-Speed 2628.05 samples/sec Loss 4.5203 LearningRate 0.0109 Epoch: 13 Global Step: 555300 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:07:34,800-Speed 2643.76 samples/sec Loss 4.4405 LearningRate 0.0109 Epoch: 13 Global Step: 555310 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:07:38,693-Speed 2630.75 samples/sec Loss 4.3398 LearningRate 0.0109 Epoch: 13 Global Step: 555320 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:07:42,591-Speed 2627.78 samples/sec Loss 4.4091 LearningRate 0.0109 Epoch: 13 Global Step: 555330 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:07:46,509-Speed 2614.37 samples/sec Loss 4.5140 LearningRate 0.0109 Epoch: 13 Global Step: 555340 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:07:50,403-Speed 2630.06 samples/sec Loss 4.5361 LearningRate 0.0109 Epoch: 13 Global Step: 555350 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:07:54,301-Speed 2627.72 samples/sec Loss 4.4969 LearningRate 0.0109 Epoch: 13 Global Step: 555360 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:07:58,202-Speed 2625.75 samples/sec Loss 4.4371 LearningRate 0.0109 Epoch: 13 Global Step: 555370 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:08:02,099-Speed 2628.40 samples/sec Loss 4.5255 LearningRate 0.0109 Epoch: 13 Global Step: 555380 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:08:05,996-Speed 2628.50 samples/sec Loss 4.4647 LearningRate 0.0109 Epoch: 13 Global Step: 555390 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:08:09,888-Speed 2631.37 samples/sec Loss 4.4160 LearningRate 0.0109 Epoch: 13 Global Step: 555400 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:08:13,785-Speed 2628.18 samples/sec Loss 4.5791 LearningRate 0.0109 Epoch: 13 Global Step: 555410 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:08:17,680-Speed 2629.29 samples/sec Loss 4.4705 LearningRate 0.0109 Epoch: 13 Global Step: 555420 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:08:21,591-Speed 2619.14 samples/sec Loss 4.5126 LearningRate 0.0109 Epoch: 13 Global Step: 555430 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:08:25,482-Speed 2631.76 samples/sec Loss 4.4841 LearningRate 0.0109 Epoch: 13 Global Step: 555440 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:08:29,350-Speed 2648.05 samples/sec Loss 4.4135 LearningRate 0.0109 Epoch: 13 Global Step: 555450 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:08:33,241-Speed 2631.96 samples/sec Loss 4.5151 LearningRate 0.0109 Epoch: 13 Global Step: 555460 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:08:37,132-Speed 2633.17 samples/sec Loss 4.3778 LearningRate 0.0109 Epoch: 13 Global Step: 555470 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:08:41,026-Speed 2630.30 samples/sec Loss 4.5484 LearningRate 0.0109 Epoch: 13 Global Step: 555480 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:08:44,919-Speed 2630.99 samples/sec Loss 4.4103 LearningRate 0.0109 Epoch: 13 Global Step: 555490 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:08:48,817-Speed 2627.95 samples/sec Loss 4.4409 LearningRate 0.0109 Epoch: 13 Global Step: 555500 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:08:52,715-Speed 2626.93 samples/sec Loss 4.4119 LearningRate 0.0109 Epoch: 13 Global Step: 555510 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:08:56,607-Speed 2631.74 samples/sec Loss 4.5091 LearningRate 0.0109 Epoch: 13 Global Step: 555520 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:09:00,502-Speed 2630.08 samples/sec Loss 4.4950 LearningRate 0.0109 Epoch: 13 Global Step: 555530 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:09:04,400-Speed 2627.30 samples/sec Loss 4.4180 LearningRate 0.0109 Epoch: 13 Global Step: 555540 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:09:08,295-Speed 2629.55 samples/sec Loss 4.3826 LearningRate 0.0109 Epoch: 13 Global Step: 555550 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:09:12,191-Speed 2629.08 samples/sec Loss 4.4502 LearningRate 0.0109 Epoch: 13 Global Step: 555560 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:09:16,083-Speed 2632.10 samples/sec Loss 4.4721 LearningRate 0.0109 Epoch: 13 Global Step: 555570 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:09:19,980-Speed 2628.14 samples/sec Loss 4.3667 LearningRate 0.0109 Epoch: 13 Global Step: 555580 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:09:23,869-Speed 2634.33 samples/sec Loss 4.4163 LearningRate 0.0109 Epoch: 13 Global Step: 555590 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:09:27,775-Speed 2621.86 samples/sec Loss 4.4640 LearningRate 0.0109 Epoch: 13 Global Step: 555600 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:09:31,678-Speed 2624.18 samples/sec Loss 4.4352 LearningRate 0.0109 Epoch: 13 Global Step: 555610 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:09:35,577-Speed 2627.02 samples/sec Loss 4.5120 LearningRate 0.0109 Epoch: 13 Global Step: 555620 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:09:39,601-Speed 2544.94 samples/sec Loss 4.5073 LearningRate 0.0109 Epoch: 13 Global Step: 555630 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:09:43,680-Speed 2511.29 samples/sec Loss 4.4006 LearningRate 0.0109 Epoch: 13 Global Step: 555640 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:09:47,760-Speed 2509.97 samples/sec Loss 4.4822 LearningRate 0.0109 Epoch: 13 Global Step: 555650 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:09:51,814-Speed 2526.81 samples/sec Loss 4.5636 LearningRate 0.0109 Epoch: 13 Global Step: 555660 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:09:55,719-Speed 2622.89 samples/sec Loss 4.3664 LearningRate 0.0109 Epoch: 13 Global Step: 555670 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:09:59,630-Speed 2618.65 samples/sec Loss 4.3381 LearningRate 0.0109 Epoch: 13 Global Step: 555680 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:10:03,522-Speed 2631.50 samples/sec Loss 4.4176 LearningRate 0.0109 Epoch: 13 Global Step: 555690 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:10:07,426-Speed 2623.63 samples/sec Loss 4.3663 LearningRate 0.0109 Epoch: 13 Global Step: 555700 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:10:11,303-Speed 2642.10 samples/sec Loss 4.5179 LearningRate 0.0109 Epoch: 13 Global Step: 555710 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:10:15,204-Speed 2625.28 samples/sec Loss 4.4446 LearningRate 0.0109 Epoch: 13 Global Step: 555720 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:10:19,112-Speed 2621.11 samples/sec Loss 4.4726 LearningRate 0.0109 Epoch: 13 Global Step: 555730 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:10:23,007-Speed 2629.85 samples/sec Loss 4.5154 LearningRate 0.0109 Epoch: 13 Global Step: 555740 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:10:26,915-Speed 2620.41 samples/sec Loss 4.3924 LearningRate 0.0109 Epoch: 13 Global Step: 555750 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:10:30,821-Speed 2623.01 samples/sec Loss 4.4076 LearningRate 0.0109 Epoch: 13 Global Step: 555760 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:10:34,723-Speed 2624.60 samples/sec Loss 4.5244 LearningRate 0.0109 Epoch: 13 Global Step: 555770 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:10:38,622-Speed 2626.86 samples/sec Loss 4.4328 LearningRate 0.0109 Epoch: 13 Global Step: 555780 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:10:42,539-Speed 2614.72 samples/sec Loss 4.4621 LearningRate 0.0109 Epoch: 13 Global Step: 555790 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:10:46,439-Speed 2626.33 samples/sec Loss 4.4682 LearningRate 0.0109 Epoch: 13 Global Step: 555800 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:10:50,322-Speed 2637.77 samples/sec Loss 4.4362 LearningRate 0.0109 Epoch: 13 Global Step: 555810 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:10:54,255-Speed 2604.60 samples/sec Loss 4.3279 LearningRate 0.0109 Epoch: 13 Global Step: 555820 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:10:58,158-Speed 2624.11 samples/sec Loss 4.4309 LearningRate 0.0109 Epoch: 13 Global Step: 555830 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:11:02,053-Speed 2629.37 samples/sec Loss 4.3982 LearningRate 0.0109 Epoch: 13 Global Step: 555840 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:11:05,954-Speed 2625.88 samples/sec Loss 4.4992 LearningRate 0.0109 Epoch: 13 Global Step: 555850 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:11:09,852-Speed 2627.30 samples/sec Loss 4.4595 LearningRate 0.0109 Epoch: 13 Global Step: 555860 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:11:13,744-Speed 2631.65 samples/sec Loss 4.4604 LearningRate 0.0109 Epoch: 13 Global Step: 555870 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:11:17,657-Speed 2617.93 samples/sec Loss 4.4276 LearningRate 0.0109 Epoch: 13 Global Step: 555880 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:11:21,556-Speed 2627.09 samples/sec Loss 4.4244 LearningRate 0.0109 Epoch: 13 Global Step: 555890 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:11:25,448-Speed 2631.15 samples/sec Loss 4.4596 LearningRate 0.0109 Epoch: 13 Global Step: 555900 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:11:29,348-Speed 2626.23 samples/sec Loss 4.3924 LearningRate 0.0109 Epoch: 13 Global Step: 555910 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:11:33,231-Speed 2637.79 samples/sec Loss 4.4411 LearningRate 0.0109 Epoch: 13 Global Step: 555920 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:11:37,133-Speed 2625.10 samples/sec Loss 4.4295 LearningRate 0.0109 Epoch: 13 Global Step: 555930 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:11:41,034-Speed 2625.43 samples/sec Loss 4.3375 LearningRate 0.0109 Epoch: 13 Global Step: 555940 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:11:44,931-Speed 2628.65 samples/sec Loss 4.4198 LearningRate 0.0109 Epoch: 13 Global Step: 555950 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:11:48,832-Speed 2625.41 samples/sec Loss 4.4240 LearningRate 0.0109 Epoch: 13 Global Step: 555960 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:11:52,729-Speed 2629.18 samples/sec Loss 4.4288 LearningRate 0.0109 Epoch: 13 Global Step: 555970 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:11:56,629-Speed 2626.15 samples/sec Loss 4.6208 LearningRate 0.0109 Epoch: 13 Global Step: 555980 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:12:00,526-Speed 2628.33 samples/sec Loss 4.4084 LearningRate 0.0109 Epoch: 13 Global Step: 555990 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:12:04,422-Speed 2628.68 samples/sec Loss 4.3900 LearningRate 0.0109 Epoch: 13 Global Step: 556000 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:12:08,326-Speed 2623.48 samples/sec Loss 4.4507 LearningRate 0.0109 Epoch: 13 Global Step: 556010 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:12:12,202-Speed 2642.15 samples/sec Loss 4.4816 LearningRate 0.0109 Epoch: 13 Global Step: 556020 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:12:16,096-Speed 2631.29 samples/sec Loss 4.5744 LearningRate 0.0109 Epoch: 13 Global Step: 556030 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:12:19,992-Speed 2628.76 samples/sec Loss 4.4863 LearningRate 0.0109 Epoch: 13 Global Step: 556040 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:12:23,885-Speed 2631.49 samples/sec Loss 4.3539 LearningRate 0.0109 Epoch: 13 Global Step: 556050 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:12:27,785-Speed 2626.19 samples/sec Loss 4.5118 LearningRate 0.0109 Epoch: 13 Global Step: 556060 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:12:31,680-Speed 2629.60 samples/sec Loss 4.4808 LearningRate 0.0109 Epoch: 13 Global Step: 556070 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:12:35,575-Speed 2629.29 samples/sec Loss 4.5059 LearningRate 0.0109 Epoch: 13 Global Step: 556080 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:12:39,473-Speed 2627.96 samples/sec Loss 4.4648 LearningRate 0.0109 Epoch: 13 Global Step: 556090 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:12:43,368-Speed 2629.39 samples/sec Loss 4.3757 LearningRate 0.0109 Epoch: 13 Global Step: 556100 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:12:47,277-Speed 2620.03 samples/sec Loss 4.4208 LearningRate 0.0109 Epoch: 13 Global Step: 556110 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:12:51,187-Speed 2619.73 samples/sec Loss 4.4437 LearningRate 0.0109 Epoch: 13 Global Step: 556120 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:12:55,091-Speed 2623.10 samples/sec Loss 4.3668 LearningRate 0.0109 Epoch: 13 Global Step: 556130 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:12:58,986-Speed 2630.37 samples/sec Loss 4.4483 LearningRate 0.0109 Epoch: 13 Global Step: 556140 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:13:02,857-Speed 2646.15 samples/sec Loss 4.4996 LearningRate 0.0109 Epoch: 13 Global Step: 556150 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:13:06,750-Speed 2630.46 samples/sec Loss 4.4765 LearningRate 0.0109 Epoch: 13 Global Step: 556160 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:13:10,646-Speed 2629.10 samples/sec Loss 4.3926 LearningRate 0.0109 Epoch: 13 Global Step: 556170 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:13:14,550-Speed 2623.11 samples/sec Loss 4.4880 LearningRate 0.0109 Epoch: 13 Global Step: 556180 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:13:18,483-Speed 2604.51 samples/sec Loss 4.4288 LearningRate 0.0109 Epoch: 13 Global Step: 556190 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:13:22,385-Speed 2624.95 samples/sec Loss 4.4146 LearningRate 0.0109 Epoch: 13 Global Step: 556200 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:13:26,293-Speed 2620.38 samples/sec Loss 4.4382 LearningRate 0.0109 Epoch: 13 Global Step: 556210 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:13:30,195-Speed 2625.07 samples/sec Loss 4.5142 LearningRate 0.0109 Epoch: 13 Global Step: 556220 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:13:34,085-Speed 2633.44 samples/sec Loss 4.4844 LearningRate 0.0109 Epoch: 13 Global Step: 556230 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:13:37,977-Speed 2631.88 samples/sec Loss 4.3829 LearningRate 0.0109 Epoch: 13 Global Step: 556240 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:13:41,872-Speed 2628.91 samples/sec Loss 4.4401 LearningRate 0.0109 Epoch: 13 Global Step: 556250 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:13:45,767-Speed 2630.45 samples/sec Loss 4.4835 LearningRate 0.0109 Epoch: 13 Global Step: 556260 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:13:49,649-Speed 2637.87 samples/sec Loss 4.4087 LearningRate 0.0109 Epoch: 13 Global Step: 556270 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:13:53,544-Speed 2629.88 samples/sec Loss 4.3842 LearningRate 0.0109 Epoch: 13 Global Step: 556280 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:13:57,447-Speed 2624.12 samples/sec Loss 4.3648 LearningRate 0.0109 Epoch: 13 Global Step: 556290 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:14:01,324-Speed 2641.77 samples/sec Loss 4.4686 LearningRate 0.0109 Epoch: 13 Global Step: 556300 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:14:05,224-Speed 2626.19 samples/sec Loss 4.3617 LearningRate 0.0109 Epoch: 13 Global Step: 556310 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:14:09,121-Speed 2628.46 samples/sec Loss 4.4882 LearningRate 0.0109 Epoch: 13 Global Step: 556320 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:14:13,019-Speed 2627.58 samples/sec Loss 4.4266 LearningRate 0.0108 Epoch: 13 Global Step: 556330 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:14:16,928-Speed 2621.07 samples/sec Loss 4.5101 LearningRate 0.0108 Epoch: 13 Global Step: 556340 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:14:20,823-Speed 2629.38 samples/sec Loss 4.4574 LearningRate 0.0108 Epoch: 13 Global Step: 556350 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:14:24,717-Speed 2630.30 samples/sec Loss 4.4955 LearningRate 0.0108 Epoch: 13 Global Step: 556360 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:14:28,610-Speed 2630.87 samples/sec Loss 4.4568 LearningRate 0.0108 Epoch: 13 Global Step: 556370 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:14:32,504-Speed 2630.16 samples/sec Loss 4.4670 LearningRate 0.0108 Epoch: 13 Global Step: 556380 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:14:36,395-Speed 2632.71 samples/sec Loss 4.4196 LearningRate 0.0108 Epoch: 13 Global Step: 556390 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:14:40,315-Speed 2612.81 samples/sec Loss 4.4619 LearningRate 0.0108 Epoch: 13 Global Step: 556400 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:14:44,214-Speed 2626.97 samples/sec Loss 4.4813 LearningRate 0.0108 Epoch: 13 Global Step: 556410 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:14:48,119-Speed 2623.37 samples/sec Loss 4.4350 LearningRate 0.0108 Epoch: 13 Global Step: 556420 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:14:52,027-Speed 2620.19 samples/sec Loss 4.3918 LearningRate 0.0108 Epoch: 13 Global Step: 556430 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:14:55,924-Speed 2628.52 samples/sec Loss 4.5422 LearningRate 0.0108 Epoch: 13 Global Step: 556440 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:14:59,827-Speed 2624.51 samples/sec Loss 4.4580 LearningRate 0.0108 Epoch: 13 Global Step: 556450 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:15:03,731-Speed 2623.35 samples/sec Loss 4.5546 LearningRate 0.0108 Epoch: 13 Global Step: 556460 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:15:07,641-Speed 2618.86 samples/sec Loss 4.4152 LearningRate 0.0108 Epoch: 13 Global Step: 556470 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:15:11,545-Speed 2623.92 samples/sec Loss 4.3991 LearningRate 0.0108 Epoch: 13 Global Step: 556480 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:15:15,486-Speed 2598.59 samples/sec Loss 4.4988 LearningRate 0.0108 Epoch: 13 Global Step: 556490 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:15:19,406-Speed 2612.75 samples/sec Loss 4.3603 LearningRate 0.0108 Epoch: 13 Global Step: 556500 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:15:23,292-Speed 2636.44 samples/sec Loss 4.3868 LearningRate 0.0108 Epoch: 13 Global Step: 556510 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:15:27,186-Speed 2630.15 samples/sec Loss 4.4226 LearningRate 0.0108 Epoch: 13 Global Step: 556520 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:15:31,094-Speed 2621.16 samples/sec Loss 4.4219 LearningRate 0.0108 Epoch: 13 Global Step: 556530 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:15:34,991-Speed 2628.15 samples/sec Loss 4.4528 LearningRate 0.0108 Epoch: 13 Global Step: 556540 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:15:38,885-Speed 2630.27 samples/sec Loss 4.4621 LearningRate 0.0108 Epoch: 13 Global Step: 556550 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:15:42,778-Speed 2630.26 samples/sec Loss 4.4012 LearningRate 0.0108 Epoch: 13 Global Step: 556560 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:15:46,686-Speed 2621.55 samples/sec Loss 4.3846 LearningRate 0.0108 Epoch: 13 Global Step: 556570 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:15:50,588-Speed 2624.34 samples/sec Loss 4.4075 LearningRate 0.0108 Epoch: 13 Global Step: 556580 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:15:54,492-Speed 2624.41 samples/sec Loss 4.4364 LearningRate 0.0108 Epoch: 13 Global Step: 556590 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:15:58,388-Speed 2628.66 samples/sec Loss 4.4768 LearningRate 0.0108 Epoch: 13 Global Step: 556600 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:16:02,294-Speed 2622.35 samples/sec Loss 4.4987 LearningRate 0.0108 Epoch: 13 Global Step: 556610 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:16:06,196-Speed 2624.91 samples/sec Loss 4.3692 LearningRate 0.0108 Epoch: 13 Global Step: 556620 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:16:10,094-Speed 2627.56 samples/sec Loss 4.3235 LearningRate 0.0108 Epoch: 13 Global Step: 556630 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:16:13,993-Speed 2626.66 samples/sec Loss 4.5049 LearningRate 0.0108 Epoch: 13 Global Step: 556640 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:16:17,883-Speed 2633.30 samples/sec Loss 4.4299 LearningRate 0.0108 Epoch: 13 Global Step: 556650 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:16:21,779-Speed 2629.06 samples/sec Loss 4.4009 LearningRate 0.0108 Epoch: 13 Global Step: 556660 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:16:25,675-Speed 2628.91 samples/sec Loss 4.3584 LearningRate 0.0108 Epoch: 13 Global Step: 556670 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:16:29,581-Speed 2622.48 samples/sec Loss 4.3954 LearningRate 0.0108 Epoch: 13 Global Step: 556680 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:16:33,475-Speed 2629.75 samples/sec Loss 4.3963 LearningRate 0.0108 Epoch: 13 Global Step: 556690 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:16:37,347-Speed 2645.48 samples/sec Loss 4.4006 LearningRate 0.0108 Epoch: 13 Global Step: 556700 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:16:41,241-Speed 2630.37 samples/sec Loss 4.4695 LearningRate 0.0108 Epoch: 13 Global Step: 556710 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:16:45,134-Speed 2631.21 samples/sec Loss 4.5187 LearningRate 0.0108 Epoch: 13 Global Step: 556720 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:16:49,038-Speed 2623.77 samples/sec Loss 4.4827 LearningRate 0.0108 Epoch: 13 Global Step: 556730 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:16:52,936-Speed 2627.41 samples/sec Loss 4.4873 LearningRate 0.0108 Epoch: 13 Global Step: 556740 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:16:56,828-Speed 2631.20 samples/sec Loss 4.3793 LearningRate 0.0108 Epoch: 13 Global Step: 556750 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:17:00,723-Speed 2630.48 samples/sec Loss 4.5026 LearningRate 0.0108 Epoch: 13 Global Step: 556760 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:17:04,626-Speed 2623.43 samples/sec Loss 4.4503 LearningRate 0.0108 Epoch: 13 Global Step: 556770 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:17:08,531-Speed 2623.21 samples/sec Loss 4.4437 LearningRate 0.0108 Epoch: 13 Global Step: 556780 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:17:12,428-Speed 2627.73 samples/sec Loss 4.5283 LearningRate 0.0108 Epoch: 13 Global Step: 556790 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:17:16,334-Speed 2626.47 samples/sec Loss 4.5094 LearningRate 0.0108 Epoch: 13 Global Step: 556800 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:17:20,232-Speed 2627.62 samples/sec Loss 4.4290 LearningRate 0.0108 Epoch: 13 Global Step: 556810 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:17:24,126-Speed 2630.32 samples/sec Loss 4.4295 LearningRate 0.0108 Epoch: 13 Global Step: 556820 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:17:28,002-Speed 2642.26 samples/sec Loss 4.3812 LearningRate 0.0108 Epoch: 13 Global Step: 556830 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:17:31,902-Speed 2626.87 samples/sec Loss 4.3611 LearningRate 0.0108 Epoch: 13 Global Step: 556840 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:17:35,796-Speed 2629.54 samples/sec Loss 4.4313 LearningRate 0.0108 Epoch: 13 Global Step: 556850 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:17:39,694-Speed 2627.48 samples/sec Loss 4.4145 LearningRate 0.0108 Epoch: 13 Global Step: 556860 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:17:43,594-Speed 2626.44 samples/sec Loss 4.3393 LearningRate 0.0108 Epoch: 13 Global Step: 556870 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:17:47,500-Speed 2622.12 samples/sec Loss 4.5609 LearningRate 0.0108 Epoch: 13 Global Step: 556880 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:17:51,400-Speed 2626.40 samples/sec Loss 4.4174 LearningRate 0.0108 Epoch: 13 Global Step: 556890 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:17:55,295-Speed 2629.52 samples/sec Loss 4.4611 LearningRate 0.0108 Epoch: 13 Global Step: 556900 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:17:59,192-Speed 2628.33 samples/sec Loss 4.4196 LearningRate 0.0108 Epoch: 13 Global Step: 556910 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:18:03,085-Speed 2631.72 samples/sec Loss 4.4606 LearningRate 0.0108 Epoch: 13 Global Step: 556920 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:18:06,966-Speed 2638.91 samples/sec Loss 4.3625 LearningRate 0.0108 Epoch: 13 Global Step: 556930 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:18:10,863-Speed 2627.79 samples/sec Loss 4.4294 LearningRate 0.0108 Epoch: 13 Global Step: 556940 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:18:14,757-Speed 2630.21 samples/sec Loss 4.4046 LearningRate 0.0108 Epoch: 13 Global Step: 556950 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:18:18,667-Speed 2619.37 samples/sec Loss 4.3689 LearningRate 0.0108 Epoch: 13 Global Step: 556960 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:18:22,585-Speed 2614.28 samples/sec Loss 4.3948 LearningRate 0.0108 Epoch: 13 Global Step: 556970 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:18:26,480-Speed 2629.44 samples/sec Loss 4.4222 LearningRate 0.0108 Epoch: 13 Global Step: 556980 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:18:30,386-Speed 2622.71 samples/sec Loss 4.3833 LearningRate 0.0108 Epoch: 13 Global Step: 556990 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:18:34,278-Speed 2631.37 samples/sec Loss 4.4756 LearningRate 0.0108 Epoch: 13 Global Step: 557000 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:18:38,175-Speed 2628.24 samples/sec Loss 4.4245 LearningRate 0.0108 Epoch: 13 Global Step: 557010 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:18:42,066-Speed 2632.27 samples/sec Loss 4.4042 LearningRate 0.0108 Epoch: 13 Global Step: 557020 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:18:45,958-Speed 2631.75 samples/sec Loss 4.3801 LearningRate 0.0108 Epoch: 13 Global Step: 557030 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:18:49,925-Speed 2581.68 samples/sec Loss 4.5228 LearningRate 0.0108 Epoch: 13 Global Step: 557040 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:18:53,833-Speed 2621.21 samples/sec Loss 4.3818 LearningRate 0.0108 Epoch: 13 Global Step: 557050 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:18:57,725-Speed 2631.82 samples/sec Loss 4.3397 LearningRate 0.0108 Epoch: 13 Global Step: 557060 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:19:01,626-Speed 2625.58 samples/sec Loss 4.4193 LearningRate 0.0108 Epoch: 13 Global Step: 557070 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:19:05,530-Speed 2623.61 samples/sec Loss 4.4865 LearningRate 0.0108 Epoch: 13 Global Step: 557080 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:19:09,421-Speed 2631.65 samples/sec Loss 4.3494 LearningRate 0.0108 Epoch: 13 Global Step: 557090 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:19:13,318-Speed 2628.43 samples/sec Loss 4.5777 LearningRate 0.0108 Epoch: 13 Global Step: 557100 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:19:17,213-Speed 2630.23 samples/sec Loss 4.4007 LearningRate 0.0108 Epoch: 13 Global Step: 557110 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:19:21,107-Speed 2629.94 samples/sec Loss 4.3540 LearningRate 0.0108 Epoch: 13 Global Step: 557120 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:19:25,003-Speed 2629.53 samples/sec Loss 4.3866 LearningRate 0.0108 Epoch: 13 Global Step: 557130 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:19:28,875-Speed 2644.67 samples/sec Loss 4.3826 LearningRate 0.0108 Epoch: 13 Global Step: 557140 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:19:32,774-Speed 2626.97 samples/sec Loss 4.4143 LearningRate 0.0108 Epoch: 13 Global Step: 557150 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:19:36,667-Speed 2631.01 samples/sec Loss 4.4353 LearningRate 0.0108 Epoch: 13 Global Step: 557160 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:19:40,566-Speed 2626.98 samples/sec Loss 4.4212 LearningRate 0.0108 Epoch: 13 Global Step: 557170 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:19:44,469-Speed 2624.11 samples/sec Loss 4.4922 LearningRate 0.0108 Epoch: 13 Global Step: 557180 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:19:48,385-Speed 2615.70 samples/sec Loss 4.3556 LearningRate 0.0108 Epoch: 13 Global Step: 557190 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:19:52,290-Speed 2623.22 samples/sec Loss 4.4887 LearningRate 0.0108 Epoch: 13 Global Step: 557200 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:19:56,189-Speed 2626.37 samples/sec Loss 4.4376 LearningRate 0.0108 Epoch: 13 Global Step: 557210 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:20:00,085-Speed 2629.73 samples/sec Loss 4.4017 LearningRate 0.0108 Epoch: 13 Global Step: 557220 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:20:03,977-Speed 2631.36 samples/sec Loss 4.4298 LearningRate 0.0108 Epoch: 13 Global Step: 557230 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:20:07,868-Speed 2632.17 samples/sec Loss 4.3262 LearningRate 0.0108 Epoch: 13 Global Step: 557240 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:20:11,744-Speed 2642.49 samples/sec Loss 4.4051 LearningRate 0.0108 Epoch: 13 Global Step: 557250 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:20:15,635-Speed 2632.30 samples/sec Loss 4.4541 LearningRate 0.0108 Epoch: 13 Global Step: 557260 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:20:19,532-Speed 2627.96 samples/sec Loss 4.4521 LearningRate 0.0108 Epoch: 13 Global Step: 557270 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:20:23,427-Speed 2629.66 samples/sec Loss 4.3498 LearningRate 0.0108 Epoch: 13 Global Step: 557280 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:20:27,313-Speed 2636.33 samples/sec Loss 4.4513 LearningRate 0.0108 Epoch: 13 Global Step: 557290 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:20:31,213-Speed 2625.63 samples/sec Loss 4.4381 LearningRate 0.0108 Epoch: 13 Global Step: 557300 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:20:35,113-Speed 2626.86 samples/sec Loss 4.3915 LearningRate 0.0108 Epoch: 13 Global Step: 557310 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:20:39,013-Speed 2626.08 samples/sec Loss 4.4288 LearningRate 0.0108 Epoch: 13 Global Step: 557320 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:20:42,910-Speed 2628.23 samples/sec Loss 4.4496 LearningRate 0.0108 Epoch: 13 Global Step: 557330 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:20:46,804-Speed 2630.34 samples/sec Loss 4.4980 LearningRate 0.0108 Epoch: 13 Global Step: 557340 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:20:50,699-Speed 2629.45 samples/sec Loss 4.4751 LearningRate 0.0108 Epoch: 13 Global Step: 557350 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:20:54,595-Speed 2629.70 samples/sec Loss 4.4049 LearningRate 0.0108 Epoch: 13 Global Step: 557360 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:20:58,486-Speed 2631.56 samples/sec Loss 4.4174 LearningRate 0.0108 Epoch: 13 Global Step: 557370 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:21:02,379-Speed 2631.11 samples/sec Loss 4.4909 LearningRate 0.0108 Epoch: 13 Global Step: 557380 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:21:06,276-Speed 2628.42 samples/sec Loss 4.3963 LearningRate 0.0108 Epoch: 13 Global Step: 557390 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:21:10,169-Speed 2630.98 samples/sec Loss 4.3835 LearningRate 0.0108 Epoch: 13 Global Step: 557400 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:21:14,063-Speed 2630.39 samples/sec Loss 4.4014 LearningRate 0.0108 Epoch: 13 Global Step: 557410 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:21:17,956-Speed 2631.26 samples/sec Loss 4.4953 LearningRate 0.0108 Epoch: 13 Global Step: 557420 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:21:21,863-Speed 2621.68 samples/sec Loss 4.3167 LearningRate 0.0108 Epoch: 13 Global Step: 557430 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:21:25,761-Speed 2627.36 samples/sec Loss 4.3256 LearningRate 0.0108 Epoch: 13 Global Step: 557440 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:21:29,658-Speed 2628.26 samples/sec Loss 4.4311 LearningRate 0.0108 Epoch: 13 Global Step: 557450 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:21:33,560-Speed 2624.76 samples/sec Loss 4.3982 LearningRate 0.0108 Epoch: 13 Global Step: 557460 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:21:37,457-Speed 2628.19 samples/sec Loss 4.4508 LearningRate 0.0108 Epoch: 13 Global Step: 557470 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:21:41,355-Speed 2627.50 samples/sec Loss 4.4842 LearningRate 0.0108 Epoch: 13 Global Step: 557480 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:21:45,237-Speed 2638.87 samples/sec Loss 4.4707 LearningRate 0.0108 Epoch: 13 Global Step: 557490 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:21:49,133-Speed 2628.88 samples/sec Loss 4.3849 LearningRate 0.0108 Epoch: 13 Global Step: 557500 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:21:53,029-Speed 2629.14 samples/sec Loss 4.4240 LearningRate 0.0108 Epoch: 13 Global Step: 557510 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:21:56,926-Speed 2627.95 samples/sec Loss 4.4431 LearningRate 0.0108 Epoch: 13 Global Step: 557520 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:22:00,820-Speed 2630.93 samples/sec Loss 4.5097 LearningRate 0.0108 Epoch: 13 Global Step: 557530 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:22:04,721-Speed 2624.99 samples/sec Loss 4.4780 LearningRate 0.0108 Epoch: 13 Global Step: 557540 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:22:08,711-Speed 2567.06 samples/sec Loss 4.4452 LearningRate 0.0108 Epoch: 13 Global Step: 557550 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:22:12,787-Speed 2512.70 samples/sec Loss 4.2918 LearningRate 0.0108 Epoch: 13 Global Step: 557560 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:22:16,758-Speed 2579.40 samples/sec Loss 4.3004 LearningRate 0.0108 Epoch: 13 Global Step: 557570 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:22:20,665-Speed 2621.11 samples/sec Loss 4.4104 LearningRate 0.0108 Epoch: 13 Global Step: 557580 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:22:24,562-Speed 2629.12 samples/sec Loss 4.4409 LearningRate 0.0107 Epoch: 13 Global Step: 557590 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:22:28,458-Speed 2629.15 samples/sec Loss 4.4985 LearningRate 0.0107 Epoch: 13 Global Step: 557600 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:22:32,352-Speed 2630.75 samples/sec Loss 4.4389 LearningRate 0.0107 Epoch: 13 Global Step: 557610 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:22:36,229-Speed 2641.77 samples/sec Loss 4.4364 LearningRate 0.0107 Epoch: 13 Global Step: 557620 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:22:40,122-Speed 2631.00 samples/sec Loss 4.4346 LearningRate 0.0107 Epoch: 13 Global Step: 557630 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:22:44,019-Speed 2627.81 samples/sec Loss 4.3498 LearningRate 0.0107 Epoch: 13 Global Step: 557640 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:22:47,956-Speed 2601.77 samples/sec Loss 4.4523 LearningRate 0.0107 Epoch: 13 Global Step: 557650 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:22:51,853-Speed 2628.36 samples/sec Loss 4.4830 LearningRate 0.0107 Epoch: 13 Global Step: 557660 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:22:55,749-Speed 2629.78 samples/sec Loss 4.4840 LearningRate 0.0107 Epoch: 13 Global Step: 557670 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:22:59,646-Speed 2628.50 samples/sec Loss 4.5224 LearningRate 0.0107 Epoch: 13 Global Step: 557680 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:23:03,547-Speed 2626.09 samples/sec Loss 4.3526 LearningRate 0.0107 Epoch: 13 Global Step: 557690 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:23:07,442-Speed 2628.95 samples/sec Loss 4.3323 LearningRate 0.0107 Epoch: 13 Global Step: 557700 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:23:11,344-Speed 2625.19 samples/sec Loss 4.3568 LearningRate 0.0107 Epoch: 13 Global Step: 557710 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:23:15,249-Speed 2623.01 samples/sec Loss 4.3719 LearningRate 0.0107 Epoch: 13 Global Step: 557720 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:23:19,159-Speed 2619.29 samples/sec Loss 4.4420 LearningRate 0.0107 Epoch: 13 Global Step: 557730 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:23:23,052-Speed 2632.41 samples/sec Loss 4.5323 LearningRate 0.0107 Epoch: 13 Global Step: 557740 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:23:26,945-Speed 2630.52 samples/sec Loss 4.4829 LearningRate 0.0107 Epoch: 13 Global Step: 557750 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:23:30,887-Speed 2599.39 samples/sec Loss 4.4618 LearningRate 0.0107 Epoch: 13 Global Step: 557760 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:23:34,802-Speed 2616.19 samples/sec Loss 4.4433 LearningRate 0.0107 Epoch: 13 Global Step: 557770 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:23:38,688-Speed 2635.53 samples/sec Loss 4.4495 LearningRate 0.0107 Epoch: 13 Global Step: 557780 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:23:42,583-Speed 2629.25 samples/sec Loss 4.4661 LearningRate 0.0107 Epoch: 13 Global Step: 557790 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:23:46,479-Speed 2629.73 samples/sec Loss 4.4233 LearningRate 0.0107 Epoch: 13 Global Step: 557800 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:23:50,373-Speed 2629.84 samples/sec Loss 4.3825 LearningRate 0.0107 Epoch: 13 Global Step: 557810 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:23:54,293-Speed 2613.21 samples/sec Loss 4.3629 LearningRate 0.0107 Epoch: 13 Global Step: 557820 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:23:58,198-Speed 2622.99 samples/sec Loss 4.3514 LearningRate 0.0107 Epoch: 13 Global Step: 557830 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:24:02,125-Speed 2608.63 samples/sec Loss 4.4025 LearningRate 0.0107 Epoch: 13 Global Step: 557840 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:24:06,020-Speed 2629.66 samples/sec Loss 4.3454 LearningRate 0.0107 Epoch: 13 Global Step: 557850 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:24:09,917-Speed 2628.81 samples/sec Loss 4.4840 LearningRate 0.0107 Epoch: 13 Global Step: 557860 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:24:13,808-Speed 2631.88 samples/sec Loss 4.4233 LearningRate 0.0107 Epoch: 13 Global Step: 557870 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:24:17,701-Speed 2631.64 samples/sec Loss 4.5307 LearningRate 0.0107 Epoch: 13 Global Step: 557880 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:24:21,594-Speed 2630.61 samples/sec Loss 4.4194 LearningRate 0.0107 Epoch: 13 Global Step: 557890 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:24:25,486-Speed 2631.49 samples/sec Loss 4.4690 LearningRate 0.0107 Epoch: 13 Global Step: 557900 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:24:29,381-Speed 2629.85 samples/sec Loss 4.4059 LearningRate 0.0107 Epoch: 13 Global Step: 557910 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:24:33,275-Speed 2630.56 samples/sec Loss 4.3354 LearningRate 0.0107 Epoch: 13 Global Step: 557920 Fp16 Grad Scale: 131072 Required: 31 hours
Training: 2022-04-15 10:24:37,148-Speed 2644.76 samples/sec Loss 4.4650 LearningRate 0.0107 Epoch: 13 Global Step: 557930 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:24:41,045-Speed 2628.49 samples/sec Loss 4.4705 LearningRate 0.0107 Epoch: 13 Global Step: 557940 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:24:44,916-Speed 2645.68 samples/sec Loss 4.3880 LearningRate 0.0107 Epoch: 13 Global Step: 557950 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:24:48,809-Speed 2630.62 samples/sec Loss 4.4396 LearningRate 0.0107 Epoch: 13 Global Step: 557960 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:24:52,708-Speed 2627.55 samples/sec Loss 4.3379 LearningRate 0.0107 Epoch: 13 Global Step: 557970 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:24:56,600-Speed 2631.03 samples/sec Loss 4.4179 LearningRate 0.0107 Epoch: 13 Global Step: 557980 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:25:00,492-Speed 2631.80 samples/sec Loss 4.4075 LearningRate 0.0107 Epoch: 13 Global Step: 557990 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:25:04,390-Speed 2627.65 samples/sec Loss 4.3170 LearningRate 0.0107 Epoch: 13 Global Step: 558000 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:25:08,295-Speed 2623.38 samples/sec Loss 4.3987 LearningRate 0.0107 Epoch: 13 Global Step: 558010 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:25:12,188-Speed 2630.81 samples/sec Loss 4.4419 LearningRate 0.0107 Epoch: 13 Global Step: 558020 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:25:16,094-Speed 2622.09 samples/sec Loss 4.4447 LearningRate 0.0107 Epoch: 13 Global Step: 558030 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:25:19,988-Speed 2630.87 samples/sec Loss 4.4569 LearningRate 0.0107 Epoch: 13 Global Step: 558040 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-15 10:25:23,881-Speed 2630.94 samples/sec Loss 4.3701 LearningRate 0.0107 Epoch: 13 Global Step: 558050 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:25:27,782-Speed 2626.10 samples/sec Loss 4.5688 LearningRate 0.0107 Epoch: 13 Global Step: 558060 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:25:31,703-Speed 2612.21 samples/sec Loss 4.3900 LearningRate 0.0107 Epoch: 13 Global Step: 558070 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:25:35,602-Speed 2626.87 samples/sec Loss 4.3271 LearningRate 0.0107 Epoch: 13 Global Step: 558080 Fp16 Grad Scale: 65536 Required: 31 hours
Training: 2022-04-15 10:25:39,497-Speed 2629.69 samples/sec Loss 4.4411 LearningRate 0.0107 Epoch: 13 Global Step: 558090 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:25:43,376-Speed 2640.27 samples/sec Loss 4.4158 LearningRate 0.0107 Epoch: 13 Global Step: 558100 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:25:47,278-Speed 2624.99 samples/sec Loss 4.4036 LearningRate 0.0107 Epoch: 13 Global Step: 558110 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:25:51,172-Speed 2630.65 samples/sec Loss 4.4347 LearningRate 0.0107 Epoch: 13 Global Step: 558120 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:25:55,096-Speed 2610.39 samples/sec Loss 4.3629 LearningRate 0.0107 Epoch: 13 Global Step: 558130 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:25:58,996-Speed 2627.25 samples/sec Loss 4.3030 LearningRate 0.0107 Epoch: 13 Global Step: 558140 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:26:02,922-Speed 2608.87 samples/sec Loss 4.3978 LearningRate 0.0107 Epoch: 13 Global Step: 558150 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:26:06,814-Speed 2631.97 samples/sec Loss 4.4046 LearningRate 0.0107 Epoch: 13 Global Step: 558160 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:26:10,713-Speed 2627.04 samples/sec Loss 4.3991 LearningRate 0.0107 Epoch: 13 Global Step: 558170 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:26:14,611-Speed 2627.81 samples/sec Loss 4.3734 LearningRate 0.0107 Epoch: 13 Global Step: 558180 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:26:18,508-Speed 2628.20 samples/sec Loss 4.3759 LearningRate 0.0107 Epoch: 13 Global Step: 558190 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:26:22,438-Speed 2606.12 samples/sec Loss 4.3536 LearningRate 0.0107 Epoch: 13 Global Step: 558200 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:26:26,318-Speed 2640.27 samples/sec Loss 4.5187 LearningRate 0.0107 Epoch: 13 Global Step: 558210 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:26:30,225-Speed 2621.48 samples/sec Loss 4.5099 LearningRate 0.0107 Epoch: 13 Global Step: 558220 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:26:34,138-Speed 2617.32 samples/sec Loss 4.3736 LearningRate 0.0107 Epoch: 13 Global Step: 558230 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:26:38,035-Speed 2628.77 samples/sec Loss 4.4574 LearningRate 0.0107 Epoch: 13 Global Step: 558240 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:26:41,937-Speed 2625.26 samples/sec Loss 4.3342 LearningRate 0.0107 Epoch: 13 Global Step: 558250 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:26:45,837-Speed 2626.37 samples/sec Loss 4.3055 LearningRate 0.0107 Epoch: 13 Global Step: 558260 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:26:49,756-Speed 2613.51 samples/sec Loss 4.3536 LearningRate 0.0107 Epoch: 13 Global Step: 558270 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:26:53,648-Speed 2632.93 samples/sec Loss 4.3692 LearningRate 0.0107 Epoch: 13 Global Step: 558280 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:26:57,543-Speed 2629.16 samples/sec Loss 4.4595 LearningRate 0.0107 Epoch: 13 Global Step: 558290 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:27:01,440-Speed 2627.98 samples/sec Loss 4.4683 LearningRate 0.0107 Epoch: 13 Global Step: 558300 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:27:05,341-Speed 2625.57 samples/sec Loss 4.3090 LearningRate 0.0107 Epoch: 13 Global Step: 558310 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:27:09,237-Speed 2629.81 samples/sec Loss 4.4274 LearningRate 0.0107 Epoch: 13 Global Step: 558320 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:27:13,134-Speed 2628.48 samples/sec Loss 4.4591 LearningRate 0.0107 Epoch: 13 Global Step: 558330 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:27:17,024-Speed 2632.64 samples/sec Loss 4.3779 LearningRate 0.0107 Epoch: 13 Global Step: 558340 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:27:20,923-Speed 2627.58 samples/sec Loss 4.4032 LearningRate 0.0107 Epoch: 13 Global Step: 558350 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:27:24,820-Speed 2628.58 samples/sec Loss 4.4354 LearningRate 0.0107 Epoch: 13 Global Step: 558360 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:27:28,719-Speed 2627.38 samples/sec Loss 4.4560 LearningRate 0.0107 Epoch: 13 Global Step: 558370 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:27:32,740-Speed 2546.87 samples/sec Loss 4.4551 LearningRate 0.0107 Epoch: 13 Global Step: 558380 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:27:36,650-Speed 2619.33 samples/sec Loss 4.3476 LearningRate 0.0107 Epoch: 13 Global Step: 558390 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:27:40,558-Speed 2621.33 samples/sec Loss 4.3853 LearningRate 0.0107 Epoch: 13 Global Step: 558400 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:27:44,464-Speed 2622.88 samples/sec Loss 4.4358 LearningRate 0.0107 Epoch: 13 Global Step: 558410 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:27:48,390-Speed 2609.09 samples/sec Loss 4.4601 LearningRate 0.0107 Epoch: 13 Global Step: 558420 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:27:52,288-Speed 2627.35 samples/sec Loss 4.3175 LearningRate 0.0107 Epoch: 13 Global Step: 558430 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:27:56,207-Speed 2614.18 samples/sec Loss 4.3118 LearningRate 0.0107 Epoch: 13 Global Step: 558440 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:28:00,122-Speed 2615.80 samples/sec Loss 4.4520 LearningRate 0.0107 Epoch: 13 Global Step: 558450 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:28:04,001-Speed 2640.49 samples/sec Loss 4.4309 LearningRate 0.0107 Epoch: 13 Global Step: 558460 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:28:07,900-Speed 2626.64 samples/sec Loss 4.4072 LearningRate 0.0107 Epoch: 13 Global Step: 558470 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:28:11,800-Speed 2626.70 samples/sec Loss 4.4297 LearningRate 0.0107 Epoch: 13 Global Step: 558480 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:28:15,696-Speed 2629.15 samples/sec Loss 4.4619 LearningRate 0.0107 Epoch: 13 Global Step: 558490 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:28:19,591-Speed 2629.83 samples/sec Loss 4.4148 LearningRate 0.0107 Epoch: 13 Global Step: 558500 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:28:23,492-Speed 2625.66 samples/sec Loss 4.3542 LearningRate 0.0107 Epoch: 13 Global Step: 558510 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:28:27,427-Speed 2603.22 samples/sec Loss 4.3383 LearningRate 0.0107 Epoch: 13 Global Step: 558520 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:28:31,362-Speed 2603.07 samples/sec Loss 4.4497 LearningRate 0.0107 Epoch: 13 Global Step: 558530 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:28:35,258-Speed 2628.56 samples/sec Loss 4.3405 LearningRate 0.0107 Epoch: 13 Global Step: 558540 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:28:39,155-Speed 2628.50 samples/sec Loss 4.4749 LearningRate 0.0107 Epoch: 13 Global Step: 558550 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:28:43,053-Speed 2628.43 samples/sec Loss 4.3705 LearningRate 0.0107 Epoch: 13 Global Step: 558560 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:28:46,954-Speed 2625.55 samples/sec Loss 4.4589 LearningRate 0.0107 Epoch: 13 Global Step: 558570 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:28:50,851-Speed 2628.61 samples/sec Loss 4.4627 LearningRate 0.0107 Epoch: 13 Global Step: 558580 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:28:54,750-Speed 2627.12 samples/sec Loss 4.4210 LearningRate 0.0107 Epoch: 13 Global Step: 558590 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:28:58,658-Speed 2621.41 samples/sec Loss 4.4281 LearningRate 0.0107 Epoch: 13 Global Step: 558600 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:29:02,653-Speed 2563.95 samples/sec Loss 4.4026 LearningRate 0.0107 Epoch: 13 Global Step: 558610 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:29:06,567-Speed 2616.55 samples/sec Loss 4.3633 LearningRate 0.0107 Epoch: 13 Global Step: 558620 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:29:10,459-Speed 2631.81 samples/sec Loss 4.4258 LearningRate 0.0107 Epoch: 13 Global Step: 558630 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:29:14,351-Speed 2631.99 samples/sec Loss 4.4271 LearningRate 0.0107 Epoch: 13 Global Step: 558640 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:29:18,247-Speed 2629.44 samples/sec Loss 4.4075 LearningRate 0.0107 Epoch: 13 Global Step: 558650 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:29:22,139-Speed 2632.25 samples/sec Loss 4.3884 LearningRate 0.0107 Epoch: 13 Global Step: 558660 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:29:26,005-Speed 2649.67 samples/sec Loss 4.5391 LearningRate 0.0107 Epoch: 13 Global Step: 558670 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:29:29,896-Speed 2632.05 samples/sec Loss 4.4656 LearningRate 0.0107 Epoch: 13 Global Step: 558680 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:29:33,793-Speed 2628.04 samples/sec Loss 4.4625 LearningRate 0.0107 Epoch: 13 Global Step: 558690 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:29:37,690-Speed 2628.24 samples/sec Loss 4.4019 LearningRate 0.0107 Epoch: 13 Global Step: 558700 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:29:41,584-Speed 2630.67 samples/sec Loss 4.3991 LearningRate 0.0107 Epoch: 13 Global Step: 558710 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:29:45,479-Speed 2630.20 samples/sec Loss 4.3396 LearningRate 0.0107 Epoch: 13 Global Step: 558720 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:29:49,351-Speed 2645.52 samples/sec Loss 4.4527 LearningRate 0.0107 Epoch: 13 Global Step: 558730 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:29:53,260-Speed 2619.87 samples/sec Loss 4.4348 LearningRate 0.0107 Epoch: 13 Global Step: 558740 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:29:57,152-Speed 2632.35 samples/sec Loss 4.5384 LearningRate 0.0107 Epoch: 13 Global Step: 558750 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:30:01,069-Speed 2614.92 samples/sec Loss 4.2806 LearningRate 0.0107 Epoch: 13 Global Step: 558760 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:30:04,981-Speed 2617.62 samples/sec Loss 4.4743 LearningRate 0.0107 Epoch: 13 Global Step: 558770 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:30:08,879-Speed 2627.45 samples/sec Loss 4.4103 LearningRate 0.0107 Epoch: 13 Global Step: 558780 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:30:12,778-Speed 2627.78 samples/sec Loss 4.3133 LearningRate 0.0107 Epoch: 13 Global Step: 558790 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:30:16,675-Speed 2628.48 samples/sec Loss 4.4033 LearningRate 0.0107 Epoch: 13 Global Step: 558800 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:30:20,594-Speed 2612.85 samples/sec Loss 4.3777 LearningRate 0.0107 Epoch: 13 Global Step: 558810 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:30:24,504-Speed 2620.37 samples/sec Loss 4.4944 LearningRate 0.0107 Epoch: 13 Global Step: 558820 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:30:28,403-Speed 2626.72 samples/sec Loss 4.4700 LearningRate 0.0107 Epoch: 13 Global Step: 558830 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:30:32,338-Speed 2603.32 samples/sec Loss 4.4152 LearningRate 0.0107 Epoch: 13 Global Step: 558840 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:30:36,229-Speed 2632.74 samples/sec Loss 4.4266 LearningRate 0.0107 Epoch: 13 Global Step: 558850 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:30:40,126-Speed 2627.86 samples/sec Loss 4.3158 LearningRate 0.0106 Epoch: 13 Global Step: 558860 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:30:44,033-Speed 2621.86 samples/sec Loss 4.3255 LearningRate 0.0106 Epoch: 13 Global Step: 558870 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:30:47,926-Speed 2631.73 samples/sec Loss 4.3337 LearningRate 0.0106 Epoch: 13 Global Step: 558880 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:30:51,819-Speed 2631.41 samples/sec Loss 4.3572 LearningRate 0.0106 Epoch: 13 Global Step: 558890 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:30:55,780-Speed 2585.52 samples/sec Loss 4.4268 LearningRate 0.0106 Epoch: 13 Global Step: 558900 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:30:59,675-Speed 2630.01 samples/sec Loss 4.3510 LearningRate 0.0106 Epoch: 13 Global Step: 558910 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:31:03,582-Speed 2621.83 samples/sec Loss 4.3791 LearningRate 0.0106 Epoch: 13 Global Step: 558920 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:31:07,469-Speed 2634.66 samples/sec Loss 4.4118 LearningRate 0.0106 Epoch: 13 Global Step: 558930 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:31:11,364-Speed 2629.77 samples/sec Loss 4.4082 LearningRate 0.0106 Epoch: 13 Global Step: 558940 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:31:15,260-Speed 2628.43 samples/sec Loss 4.3680 LearningRate 0.0106 Epoch: 13 Global Step: 558950 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:31:19,155-Speed 2630.33 samples/sec Loss 4.5066 LearningRate 0.0106 Epoch: 13 Global Step: 558960 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:31:23,048-Speed 2631.77 samples/sec Loss 4.4116 LearningRate 0.0106 Epoch: 13 Global Step: 558970 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:31:26,939-Speed 2632.14 samples/sec Loss 4.3165 LearningRate 0.0106 Epoch: 13 Global Step: 558980 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:31:30,835-Speed 2629.05 samples/sec Loss 4.3560 LearningRate 0.0106 Epoch: 13 Global Step: 558990 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:31:34,739-Speed 2623.77 samples/sec Loss 4.4469 LearningRate 0.0106 Epoch: 13 Global Step: 559000 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:31:38,636-Speed 2627.97 samples/sec Loss 4.3450 LearningRate 0.0106 Epoch: 13 Global Step: 559010 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:31:42,542-Speed 2622.36 samples/sec Loss 4.4436 LearningRate 0.0106 Epoch: 13 Global Step: 559020 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:31:46,455-Speed 2618.06 samples/sec Loss 4.4233 LearningRate 0.0106 Epoch: 13 Global Step: 559030 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:31:50,359-Speed 2623.28 samples/sec Loss 4.3555 LearningRate 0.0106 Epoch: 13 Global Step: 559040 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:31:54,292-Speed 2604.82 samples/sec Loss 4.3909 LearningRate 0.0106 Epoch: 13 Global Step: 559050 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:31:58,187-Speed 2629.56 samples/sec Loss 4.3781 LearningRate 0.0106 Epoch: 13 Global Step: 559060 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:32:02,059-Speed 2645.65 samples/sec Loss 4.5332 LearningRate 0.0106 Epoch: 13 Global Step: 559070 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:32:05,962-Speed 2624.34 samples/sec Loss 4.5256 LearningRate 0.0106 Epoch: 13 Global Step: 559080 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:32:09,901-Speed 2600.09 samples/sec Loss 4.4551 LearningRate 0.0106 Epoch: 13 Global Step: 559090 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:32:13,798-Speed 2627.96 samples/sec Loss 4.4952 LearningRate 0.0106 Epoch: 13 Global Step: 559100 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:32:17,693-Speed 2630.28 samples/sec Loss 4.3982 LearningRate 0.0106 Epoch: 13 Global Step: 559110 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:32:21,602-Speed 2619.80 samples/sec Loss 4.4323 LearningRate 0.0106 Epoch: 13 Global Step: 559120 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:32:25,505-Speed 2624.70 samples/sec Loss 4.4802 LearningRate 0.0106 Epoch: 13 Global Step: 559130 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:32:29,404-Speed 2630.37 samples/sec Loss 4.4819 LearningRate 0.0106 Epoch: 13 Global Step: 559140 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:32:33,298-Speed 2630.91 samples/sec Loss 4.3563 LearningRate 0.0106 Epoch: 13 Global Step: 559150 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:32:37,190-Speed 2631.80 samples/sec Loss 4.3924 LearningRate 0.0106 Epoch: 13 Global Step: 559160 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:32:41,067-Speed 2641.28 samples/sec Loss 4.3738 LearningRate 0.0106 Epoch: 13 Global Step: 559170 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:32:44,962-Speed 2630.18 samples/sec Loss 4.3522 LearningRate 0.0106 Epoch: 13 Global Step: 559180 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:32:48,854-Speed 2631.49 samples/sec Loss 4.4670 LearningRate 0.0106 Epoch: 13 Global Step: 559190 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:32:52,759-Speed 2623.10 samples/sec Loss 4.4145 LearningRate 0.0106 Epoch: 13 Global Step: 559200 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:32:56,672-Speed 2617.48 samples/sec Loss 4.4068 LearningRate 0.0106 Epoch: 13 Global Step: 559210 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:33:00,600-Speed 2607.30 samples/sec Loss 4.3574 LearningRate 0.0106 Epoch: 13 Global Step: 559220 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:33:04,514-Speed 2616.59 samples/sec Loss 4.3923 LearningRate 0.0106 Epoch: 13 Global Step: 559230 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:33:08,417-Speed 2624.84 samples/sec Loss 4.3725 LearningRate 0.0106 Epoch: 13 Global Step: 559240 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:33:12,373-Speed 2589.35 samples/sec Loss 4.3537 LearningRate 0.0106 Epoch: 13 Global Step: 559250 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:33:16,271-Speed 2627.17 samples/sec Loss 4.4160 LearningRate 0.0106 Epoch: 13 Global Step: 559260 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:33:20,152-Speed 2639.24 samples/sec Loss 4.4850 LearningRate 0.0106 Epoch: 13 Global Step: 559270 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:33:24,065-Speed 2618.08 samples/sec Loss 4.4155 LearningRate 0.0106 Epoch: 13 Global Step: 559280 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:33:27,978-Speed 2617.70 samples/sec Loss 4.3142 LearningRate 0.0106 Epoch: 13 Global Step: 559290 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:33:31,884-Speed 2622.18 samples/sec Loss 4.3532 LearningRate 0.0106 Epoch: 13 Global Step: 559300 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:33:35,787-Speed 2623.94 samples/sec Loss 4.4223 LearningRate 0.0106 Epoch: 13 Global Step: 559310 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:33:39,684-Speed 2628.52 samples/sec Loss 4.3469 LearningRate 0.0106 Epoch: 13 Global Step: 559320 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:33:43,585-Speed 2625.48 samples/sec Loss 4.3964 LearningRate 0.0106 Epoch: 13 Global Step: 559330 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:33:47,500-Speed 2616.59 samples/sec Loss 4.4444 LearningRate 0.0106 Epoch: 13 Global Step: 559340 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:33:51,399-Speed 2627.67 samples/sec Loss 4.4485 LearningRate 0.0106 Epoch: 13 Global Step: 559350 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:33:55,292-Speed 2630.84 samples/sec Loss 4.3642 LearningRate 0.0106 Epoch: 13 Global Step: 559360 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:33:59,207-Speed 2616.06 samples/sec Loss 4.3455 LearningRate 0.0106 Epoch: 13 Global Step: 559370 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:34:03,106-Speed 2627.54 samples/sec Loss 4.4441 LearningRate 0.0106 Epoch: 13 Global Step: 559380 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:34:07,020-Speed 2616.81 samples/sec Loss 4.4072 LearningRate 0.0106 Epoch: 13 Global Step: 559390 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:34:10,918-Speed 2627.38 samples/sec Loss 4.2770 LearningRate 0.0106 Epoch: 13 Global Step: 559400 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:34:14,818-Speed 2626.87 samples/sec Loss 4.4008 LearningRate 0.0106 Epoch: 13 Global Step: 559410 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:34:18,714-Speed 2630.43 samples/sec Loss 4.4930 LearningRate 0.0106 Epoch: 13 Global Step: 559420 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:34:22,592-Speed 2640.52 samples/sec Loss 4.3557 LearningRate 0.0106 Epoch: 13 Global Step: 559430 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:34:26,470-Speed 2641.80 samples/sec Loss 4.4621 LearningRate 0.0106 Epoch: 13 Global Step: 559440 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:34:30,361-Speed 2632.28 samples/sec Loss 4.3987 LearningRate 0.0106 Epoch: 13 Global Step: 559450 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:34:34,255-Speed 2630.43 samples/sec Loss 4.4177 LearningRate 0.0106 Epoch: 13 Global Step: 559460 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:34:38,157-Speed 2624.74 samples/sec Loss 4.3901 LearningRate 0.0106 Epoch: 13 Global Step: 559470 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:34:42,066-Speed 2620.61 samples/sec Loss 4.3454 LearningRate 0.0106 Epoch: 13 Global Step: 559480 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:34:45,960-Speed 2630.52 samples/sec Loss 4.4776 LearningRate 0.0106 Epoch: 13 Global Step: 559490 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:34:49,895-Speed 2602.83 samples/sec Loss 4.3207 LearningRate 0.0106 Epoch: 13 Global Step: 559500 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:34:53,793-Speed 2627.40 samples/sec Loss 4.5067 LearningRate 0.0106 Epoch: 13 Global Step: 559510 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:34:57,684-Speed 2632.78 samples/sec Loss 4.2880 LearningRate 0.0106 Epoch: 13 Global Step: 559520 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:35:01,576-Speed 2631.84 samples/sec Loss 4.3297 LearningRate 0.0106 Epoch: 13 Global Step: 559530 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:35:05,488-Speed 2617.51 samples/sec Loss 4.3459 LearningRate 0.0106 Epoch: 13 Global Step: 559540 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:35:09,385-Speed 2628.37 samples/sec Loss 4.3446 LearningRate 0.0106 Epoch: 13 Global Step: 559550 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:35:13,290-Speed 2623.12 samples/sec Loss 4.4338 LearningRate 0.0106 Epoch: 13 Global Step: 559560 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:35:17,180-Speed 2632.54 samples/sec Loss 4.3950 LearningRate 0.0106 Epoch: 13 Global Step: 559570 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:35:21,073-Speed 2631.51 samples/sec Loss 4.4430 LearningRate 0.0106 Epoch: 13 Global Step: 559580 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:35:24,967-Speed 2630.53 samples/sec Loss 4.3626 LearningRate 0.0106 Epoch: 13 Global Step: 559590 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:35:28,881-Speed 2617.02 samples/sec Loss 4.3927 LearningRate 0.0106 Epoch: 13 Global Step: 559600 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:35:32,798-Speed 2614.78 samples/sec Loss 4.4233 LearningRate 0.0106 Epoch: 13 Global Step: 559610 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:35:36,693-Speed 2629.27 samples/sec Loss 4.4332 LearningRate 0.0106 Epoch: 13 Global Step: 559620 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:35:40,592-Speed 2626.58 samples/sec Loss 4.3608 LearningRate 0.0106 Epoch: 13 Global Step: 559630 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:35:44,452-Speed 2653.63 samples/sec Loss 4.4027 LearningRate 0.0106 Epoch: 13 Global Step: 559640 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:35:48,365-Speed 2617.98 samples/sec Loss 4.3464 LearningRate 0.0106 Epoch: 13 Global Step: 559650 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:35:52,275-Speed 2619.26 samples/sec Loss 4.3786 LearningRate 0.0106 Epoch: 13 Global Step: 559660 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:35:56,181-Speed 2622.16 samples/sec Loss 4.3572 LearningRate 0.0106 Epoch: 13 Global Step: 559670 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:36:00,088-Speed 2621.92 samples/sec Loss 4.3961 LearningRate 0.0106 Epoch: 13 Global Step: 559680 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:36:03,996-Speed 2620.47 samples/sec Loss 4.4272 LearningRate 0.0106 Epoch: 13 Global Step: 559690 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:36:07,909-Speed 2617.43 samples/sec Loss 4.3597 LearningRate 0.0106 Epoch: 13 Global Step: 559700 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:36:11,816-Speed 2622.00 samples/sec Loss 4.4384 LearningRate 0.0106 Epoch: 13 Global Step: 559710 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:36:15,716-Speed 2626.22 samples/sec Loss 4.3835 LearningRate 0.0106 Epoch: 13 Global Step: 559720 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:36:19,632-Speed 2615.35 samples/sec Loss 4.3492 LearningRate 0.0106 Epoch: 13 Global Step: 559730 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:36:23,542-Speed 2619.95 samples/sec Loss 4.4683 LearningRate 0.0106 Epoch: 13 Global Step: 559740 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:36:27,446-Speed 2623.44 samples/sec Loss 4.4535 LearningRate 0.0106 Epoch: 13 Global Step: 559750 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:36:31,342-Speed 2628.85 samples/sec Loss 4.3753 LearningRate 0.0106 Epoch: 13 Global Step: 559760 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:36:35,240-Speed 2627.86 samples/sec Loss 4.3797 LearningRate 0.0106 Epoch: 13 Global Step: 559770 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:36:39,144-Speed 2623.95 samples/sec Loss 4.3264 LearningRate 0.0106 Epoch: 13 Global Step: 559780 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:36:43,042-Speed 2627.16 samples/sec Loss 4.3605 LearningRate 0.0106 Epoch: 13 Global Step: 559790 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:36:46,940-Speed 2628.07 samples/sec Loss 4.4697 LearningRate 0.0106 Epoch: 13 Global Step: 559800 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:36:50,834-Speed 2630.03 samples/sec Loss 4.3862 LearningRate 0.0106 Epoch: 13 Global Step: 559810 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:36:54,730-Speed 2628.93 samples/sec Loss 4.3786 LearningRate 0.0106 Epoch: 13 Global Step: 559820 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:36:58,625-Speed 2629.28 samples/sec Loss 4.3901 LearningRate 0.0106 Epoch: 13 Global Step: 559830 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:37:02,517-Speed 2632.91 samples/sec Loss 4.3672 LearningRate 0.0106 Epoch: 13 Global Step: 559840 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:37:06,417-Speed 2625.88 samples/sec Loss 4.3911 LearningRate 0.0106 Epoch: 13 Global Step: 559850 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:37:10,313-Speed 2629.30 samples/sec Loss 4.3336 LearningRate 0.0106 Epoch: 13 Global Step: 559860 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:37:14,188-Speed 2643.00 samples/sec Loss 4.4121 LearningRate 0.0106 Epoch: 13 Global Step: 559870 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:37:18,085-Speed 2629.21 samples/sec Loss 4.3533 LearningRate 0.0106 Epoch: 13 Global Step: 559880 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:37:21,982-Speed 2627.87 samples/sec Loss 4.3745 LearningRate 0.0106 Epoch: 13 Global Step: 559890 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:37:25,876-Speed 2630.58 samples/sec Loss 4.2610 LearningRate 0.0106 Epoch: 13 Global Step: 559900 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:37:29,783-Speed 2621.48 samples/sec Loss 4.3212 LearningRate 0.0106 Epoch: 13 Global Step: 559910 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:37:33,683-Speed 2626.44 samples/sec Loss 4.3650 LearningRate 0.0106 Epoch: 13 Global Step: 559920 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:37:37,588-Speed 2622.58 samples/sec Loss 4.3276 LearningRate 0.0106 Epoch: 13 Global Step: 559930 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:37:41,491-Speed 2624.27 samples/sec Loss 4.3723 LearningRate 0.0106 Epoch: 13 Global Step: 559940 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:37:45,390-Speed 2626.85 samples/sec Loss 4.4041 LearningRate 0.0106 Epoch: 13 Global Step: 559950 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:37:49,302-Speed 2618.70 samples/sec Loss 4.4409 LearningRate 0.0106 Epoch: 13 Global Step: 559960 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:37:53,194-Speed 2631.26 samples/sec Loss 4.3892 LearningRate 0.0106 Epoch: 13 Global Step: 559970 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:37:57,071-Speed 2642.38 samples/sec Loss 4.3588 LearningRate 0.0106 Epoch: 13 Global Step: 559980 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:38:00,978-Speed 2621.19 samples/sec Loss 4.3907 LearningRate 0.0106 Epoch: 13 Global Step: 559990 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:38:04,883-Speed 2623.09 samples/sec Loss 4.3959 LearningRate 0.0106 Epoch: 13 Global Step: 560000 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:38:48,319-[lfw][560000]XNorm: 22.792994
Training: 2022-04-15 10:38:48,320-[lfw][560000]Accuracy-Flip: 0.99800+-0.00296
Training: 2022-04-15 10:38:48,321-[lfw][560000]Accuracy-Highest: 0.99800
Training: 2022-04-15 10:39:38,317-[cfp_fp][560000]XNorm: 21.402369
Training: 2022-04-15 10:39:38,318-[cfp_fp][560000]Accuracy-Flip: 0.99029+-0.00366
Training: 2022-04-15 10:39:38,319-[cfp_fp][560000]Accuracy-Highest: 0.99086
Training: 2022-04-15 10:40:21,246-[agedb_30][560000]XNorm: 22.867380
Training: 2022-04-15 10:40:21,247-[agedb_30][560000]Accuracy-Flip: 0.98067+-0.00578
Training: 2022-04-15 10:40:21,247-[agedb_30][560000]Accuracy-Highest: 0.98083
Training: 2022-04-15 10:40:25,142-Speed 73.01 samples/sec Loss 4.3806 LearningRate 0.0106 Epoch: 13 Global Step: 560010 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:40:29,018-Speed 2642.30 samples/sec Loss 4.4120 LearningRate 0.0106 Epoch: 13 Global Step: 560020 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:40:32,900-Speed 2639.25 samples/sec Loss 4.4262 LearningRate 0.0106 Epoch: 13 Global Step: 560030 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:40:36,926-Speed 2543.87 samples/sec Loss 4.4426 LearningRate 0.0106 Epoch: 13 Global Step: 560040 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:40:40,918-Speed 2565.76 samples/sec Loss 4.3727 LearningRate 0.0106 Epoch: 13 Global Step: 560050 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:40:44,811-Speed 2630.68 samples/sec Loss 4.4476 LearningRate 0.0106 Epoch: 13 Global Step: 560060 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:40:48,701-Speed 2633.14 samples/sec Loss 4.3551 LearningRate 0.0106 Epoch: 13 Global Step: 560070 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:40:52,595-Speed 2630.39 samples/sec Loss 4.3802 LearningRate 0.0106 Epoch: 13 Global Step: 560080 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:40:56,468-Speed 2644.70 samples/sec Loss 4.3114 LearningRate 0.0106 Epoch: 13 Global Step: 560090 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:41:00,371-Speed 2624.68 samples/sec Loss 4.3172 LearningRate 0.0106 Epoch: 13 Global Step: 560100 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:41:04,529-Speed 2462.96 samples/sec Loss 4.4004 LearningRate 0.0106 Epoch: 13 Global Step: 560110 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:41:08,420-Speed 2632.13 samples/sec Loss 4.3991 LearningRate 0.0106 Epoch: 13 Global Step: 560120 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:41:12,297-Speed 2642.78 samples/sec Loss 4.3498 LearningRate 0.0105 Epoch: 13 Global Step: 560130 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:41:16,187-Speed 2632.79 samples/sec Loss 4.3744 LearningRate 0.0105 Epoch: 13 Global Step: 560140 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:41:20,089-Speed 2624.67 samples/sec Loss 4.4216 LearningRate 0.0105 Epoch: 13 Global Step: 560150 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:41:23,992-Speed 2624.72 samples/sec Loss 4.2975 LearningRate 0.0105 Epoch: 13 Global Step: 560160 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:41:27,890-Speed 2627.30 samples/sec Loss 4.4379 LearningRate 0.0105 Epoch: 13 Global Step: 560170 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:41:31,791-Speed 2626.14 samples/sec Loss 4.4587 LearningRate 0.0105 Epoch: 13 Global Step: 560180 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:41:35,730-Speed 2600.05 samples/sec Loss 4.2979 LearningRate 0.0105 Epoch: 13 Global Step: 560190 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:41:39,624-Speed 2630.68 samples/sec Loss 4.3314 LearningRate 0.0105 Epoch: 13 Global Step: 560200 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:41:43,528-Speed 2623.73 samples/sec Loss 4.3485 LearningRate 0.0105 Epoch: 13 Global Step: 560210 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:41:47,431-Speed 2624.09 samples/sec Loss 4.3937 LearningRate 0.0105 Epoch: 13 Global Step: 560220 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:41:51,327-Speed 2629.62 samples/sec Loss 4.4509 LearningRate 0.0105 Epoch: 13 Global Step: 560230 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:41:55,232-Speed 2623.45 samples/sec Loss 4.4452 LearningRate 0.0105 Epoch: 13 Global Step: 560240 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:41:59,132-Speed 2626.19 samples/sec Loss 4.3959 LearningRate 0.0105 Epoch: 13 Global Step: 560250 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:42:03,100-Speed 2581.15 samples/sec Loss 4.3951 LearningRate 0.0105 Epoch: 13 Global Step: 560260 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:42:06,998-Speed 2628.35 samples/sec Loss 4.3548 LearningRate 0.0105 Epoch: 13 Global Step: 560270 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:42:10,944-Speed 2595.67 samples/sec Loss 4.3933 LearningRate 0.0105 Epoch: 13 Global Step: 560280 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:42:14,846-Speed 2624.95 samples/sec Loss 4.3959 LearningRate 0.0105 Epoch: 13 Global Step: 560290 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:42:18,746-Speed 2625.76 samples/sec Loss 4.3506 LearningRate 0.0105 Epoch: 13 Global Step: 560300 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:42:22,646-Speed 2626.84 samples/sec Loss 4.3371 LearningRate 0.0105 Epoch: 13 Global Step: 560310 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:42:26,540-Speed 2630.57 samples/sec Loss 4.4803 LearningRate 0.0105 Epoch: 13 Global Step: 560320 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:42:30,417-Speed 2641.49 samples/sec Loss 4.3862 LearningRate 0.0105 Epoch: 13 Global Step: 560330 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:42:34,314-Speed 2628.99 samples/sec Loss 4.3273 LearningRate 0.0105 Epoch: 13 Global Step: 560340 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:42:38,208-Speed 2630.42 samples/sec Loss 4.3161 LearningRate 0.0105 Epoch: 13 Global Step: 560350 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:42:42,100-Speed 2631.34 samples/sec Loss 4.3211 LearningRate 0.0105 Epoch: 13 Global Step: 560360 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:42:45,994-Speed 2630.83 samples/sec Loss 4.2553 LearningRate 0.0105 Epoch: 13 Global Step: 560370 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:42:49,907-Speed 2624.76 samples/sec Loss 4.4052 LearningRate 0.0105 Epoch: 13 Global Step: 560380 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:42:53,802-Speed 2629.86 samples/sec Loss 4.3659 LearningRate 0.0105 Epoch: 13 Global Step: 560390 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:42:57,695-Speed 2630.92 samples/sec Loss 4.3169 LearningRate 0.0105 Epoch: 13 Global Step: 560400 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:43:01,591-Speed 2629.00 samples/sec Loss 4.2626 LearningRate 0.0105 Epoch: 13 Global Step: 560410 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:43:05,485-Speed 2630.32 samples/sec Loss 4.4193 LearningRate 0.0105 Epoch: 13 Global Step: 560420 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:43:09,360-Speed 2643.01 samples/sec Loss 4.3904 LearningRate 0.0105 Epoch: 13 Global Step: 560430 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:43:13,263-Speed 2624.13 samples/sec Loss 4.4562 LearningRate 0.0105 Epoch: 13 Global Step: 560440 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:43:17,156-Speed 2632.10 samples/sec Loss 4.4285 LearningRate 0.0105 Epoch: 13 Global Step: 560450 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:43:21,066-Speed 2619.37 samples/sec Loss 4.4137 LearningRate 0.0105 Epoch: 13 Global Step: 560460 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:43:24,963-Speed 2628.20 samples/sec Loss 4.4314 LearningRate 0.0105 Epoch: 13 Global Step: 560470 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:43:28,853-Speed 2633.04 samples/sec Loss 4.3364 LearningRate 0.0105 Epoch: 13 Global Step: 560480 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:43:32,746-Speed 2631.40 samples/sec Loss 4.4704 LearningRate 0.0105 Epoch: 13 Global Step: 560490 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:43:36,639-Speed 2630.90 samples/sec Loss 4.3429 LearningRate 0.0105 Epoch: 13 Global Step: 560500 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:43:40,529-Speed 2632.91 samples/sec Loss 4.3908 LearningRate 0.0105 Epoch: 13 Global Step: 560510 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:43:44,422-Speed 2631.49 samples/sec Loss 4.4165 LearningRate 0.0105 Epoch: 13 Global Step: 560520 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:43:48,328-Speed 2622.44 samples/sec Loss 4.2859 LearningRate 0.0105 Epoch: 13 Global Step: 560530 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:43:52,218-Speed 2633.22 samples/sec Loss 4.4660 LearningRate 0.0105 Epoch: 13 Global Step: 560540 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:43:56,088-Speed 2646.78 samples/sec Loss 4.3852 LearningRate 0.0105 Epoch: 13 Global Step: 560550 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:43:59,986-Speed 2627.54 samples/sec Loss 4.3001 LearningRate 0.0105 Epoch: 13 Global Step: 560560 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:44:03,885-Speed 2626.84 samples/sec Loss 4.3226 LearningRate 0.0105 Epoch: 13 Global Step: 560570 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:44:07,785-Speed 2626.90 samples/sec Loss 4.4806 LearningRate 0.0105 Epoch: 13 Global Step: 560580 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:44:11,683-Speed 2627.45 samples/sec Loss 4.4506 LearningRate 0.0105 Epoch: 13 Global Step: 560590 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:44:15,581-Speed 2627.81 samples/sec Loss 4.3158 LearningRate 0.0105 Epoch: 13 Global Step: 560600 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:44:19,479-Speed 2627.45 samples/sec Loss 4.3699 LearningRate 0.0105 Epoch: 13 Global Step: 560610 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:44:23,374-Speed 2630.10 samples/sec Loss 4.4010 LearningRate 0.0105 Epoch: 13 Global Step: 560620 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:44:27,267-Speed 2630.89 samples/sec Loss 4.3548 LearningRate 0.0105 Epoch: 13 Global Step: 560630 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:44:31,185-Speed 2613.98 samples/sec Loss 4.3395 LearningRate 0.0105 Epoch: 13 Global Step: 560640 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:44:35,118-Speed 2604.11 samples/sec Loss 4.3072 LearningRate 0.0105 Epoch: 13 Global Step: 560650 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:44:39,022-Speed 2623.97 samples/sec Loss 4.3875 LearningRate 0.0105 Epoch: 13 Global Step: 560660 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:44:42,919-Speed 2628.41 samples/sec Loss 4.4301 LearningRate 0.0105 Epoch: 13 Global Step: 560670 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:44:46,810-Speed 2632.63 samples/sec Loss 4.3525 LearningRate 0.0105 Epoch: 13 Global Step: 560680 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:44:50,702-Speed 2631.19 samples/sec Loss 4.3411 LearningRate 0.0105 Epoch: 13 Global Step: 560690 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:44:54,597-Speed 2630.30 samples/sec Loss 4.3589 LearningRate 0.0105 Epoch: 13 Global Step: 560700 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:44:58,498-Speed 2625.65 samples/sec Loss 4.2847 LearningRate 0.0105 Epoch: 13 Global Step: 560710 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:45:02,405-Speed 2621.88 samples/sec Loss 4.3556 LearningRate 0.0105 Epoch: 13 Global Step: 560720 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:45:06,297-Speed 2631.24 samples/sec Loss 4.3257 LearningRate 0.0105 Epoch: 13 Global Step: 560730 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:45:10,238-Speed 2599.45 samples/sec Loss 4.3473 LearningRate 0.0105 Epoch: 13 Global Step: 560740 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:45:14,136-Speed 2627.75 samples/sec Loss 4.2734 LearningRate 0.0105 Epoch: 13 Global Step: 560750 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:45:18,039-Speed 2624.47 samples/sec Loss 4.3420 LearningRate 0.0105 Epoch: 13 Global Step: 560760 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:45:21,935-Speed 2628.95 samples/sec Loss 4.3877 LearningRate 0.0105 Epoch: 13 Global Step: 560770 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:45:25,830-Speed 2629.94 samples/sec Loss 4.2796 LearningRate 0.0105 Epoch: 13 Global Step: 560780 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:45:29,724-Speed 2631.04 samples/sec Loss 4.4366 LearningRate 0.0105 Epoch: 13 Global Step: 560790 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:45:33,614-Speed 2632.93 samples/sec Loss 4.4230 LearningRate 0.0105 Epoch: 13 Global Step: 560800 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:45:37,511-Speed 2627.61 samples/sec Loss 4.3713 LearningRate 0.0105 Epoch: 13 Global Step: 560810 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:45:41,439-Speed 2607.41 samples/sec Loss 4.3173 LearningRate 0.0105 Epoch: 13 Global Step: 560820 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:45:45,338-Speed 2627.10 samples/sec Loss 4.4405 LearningRate 0.0105 Epoch: 13 Global Step: 560830 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:45:49,232-Speed 2631.03 samples/sec Loss 4.4179 LearningRate 0.0105 Epoch: 13 Global Step: 560840 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:45:53,127-Speed 2629.80 samples/sec Loss 4.3992 LearningRate 0.0105 Epoch: 13 Global Step: 560850 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:45:57,021-Speed 2630.57 samples/sec Loss 4.5392 LearningRate 0.0105 Epoch: 13 Global Step: 560860 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:46:00,920-Speed 2626.54 samples/sec Loss 4.3344 LearningRate 0.0105 Epoch: 13 Global Step: 560870 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:46:04,842-Speed 2611.81 samples/sec Loss 4.3948 LearningRate 0.0105 Epoch: 13 Global Step: 560880 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:46:08,739-Speed 2627.88 samples/sec Loss 4.2995 LearningRate 0.0105 Epoch: 13 Global Step: 560890 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:46:12,676-Speed 2602.32 samples/sec Loss 4.3387 LearningRate 0.0105 Epoch: 13 Global Step: 560900 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:46:16,551-Speed 2643.13 samples/sec Loss 4.3692 LearningRate 0.0105 Epoch: 13 Global Step: 560910 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:46:20,472-Speed 2612.55 samples/sec Loss 4.4178 LearningRate 0.0105 Epoch: 13 Global Step: 560920 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:46:24,370-Speed 2627.38 samples/sec Loss 4.3875 LearningRate 0.0105 Epoch: 13 Global Step: 560930 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:46:28,304-Speed 2604.17 samples/sec Loss 4.3282 LearningRate 0.0105 Epoch: 13 Global Step: 560940 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:46:32,221-Speed 2614.41 samples/sec Loss 4.3170 LearningRate 0.0105 Epoch: 13 Global Step: 560950 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:46:36,126-Speed 2622.56 samples/sec Loss 4.3933 LearningRate 0.0105 Epoch: 13 Global Step: 560960 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:46:40,030-Speed 2623.70 samples/sec Loss 4.3792 LearningRate 0.0105 Epoch: 13 Global Step: 560970 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:46:43,985-Speed 2591.66 samples/sec Loss 4.3849 LearningRate 0.0105 Epoch: 13 Global Step: 560980 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:46:47,877-Speed 2631.54 samples/sec Loss 4.3881 LearningRate 0.0105 Epoch: 13 Global Step: 560990 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:46:51,769-Speed 2631.60 samples/sec Loss 4.4298 LearningRate 0.0105 Epoch: 13 Global Step: 561000 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:46:55,662-Speed 2630.68 samples/sec Loss 4.4045 LearningRate 0.0105 Epoch: 13 Global Step: 561010 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:46:59,557-Speed 2630.37 samples/sec Loss 4.4039 LearningRate 0.0105 Epoch: 13 Global Step: 561020 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:47:03,455-Speed 2627.90 samples/sec Loss 4.3222 LearningRate 0.0105 Epoch: 13 Global Step: 561030 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:47:07,349-Speed 2630.31 samples/sec Loss 4.4076 LearningRate 0.0105 Epoch: 13 Global Step: 561040 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:47:11,249-Speed 2626.21 samples/sec Loss 4.4618 LearningRate 0.0105 Epoch: 13 Global Step: 561050 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:47:15,121-Speed 2645.53 samples/sec Loss 4.3596 LearningRate 0.0105 Epoch: 13 Global Step: 561060 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:47:19,011-Speed 2633.17 samples/sec Loss 4.4240 LearningRate 0.0105 Epoch: 13 Global Step: 561070 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:47:22,906-Speed 2629.67 samples/sec Loss 4.3579 LearningRate 0.0105 Epoch: 13 Global Step: 561080 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:47:26,862-Speed 2589.62 samples/sec Loss 4.4185 LearningRate 0.0105 Epoch: 13 Global Step: 561090 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:47:30,760-Speed 2627.44 samples/sec Loss 4.3858 LearningRate 0.0105 Epoch: 13 Global Step: 561100 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:47:34,657-Speed 2627.97 samples/sec Loss 4.3817 LearningRate 0.0105 Epoch: 13 Global Step: 561110 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:47:38,552-Speed 2629.79 samples/sec Loss 4.4888 LearningRate 0.0105 Epoch: 13 Global Step: 561120 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:47:42,449-Speed 2628.92 samples/sec Loss 4.2800 LearningRate 0.0105 Epoch: 13 Global Step: 561130 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:47:46,357-Speed 2620.53 samples/sec Loss 4.4031 LearningRate 0.0105 Epoch: 13 Global Step: 561140 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:47:50,301-Speed 2597.20 samples/sec Loss 4.4475 LearningRate 0.0105 Epoch: 13 Global Step: 561150 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:47:54,200-Speed 2627.22 samples/sec Loss 4.3725 LearningRate 0.0105 Epoch: 13 Global Step: 561160 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:47:58,069-Speed 2647.76 samples/sec Loss 4.3750 LearningRate 0.0105 Epoch: 13 Global Step: 561170 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:48:01,964-Speed 2629.32 samples/sec Loss 4.3654 LearningRate 0.0105 Epoch: 13 Global Step: 561180 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:48:05,869-Speed 2622.78 samples/sec Loss 4.3647 LearningRate 0.0105 Epoch: 13 Global Step: 561190 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:48:09,766-Speed 2628.65 samples/sec Loss 4.4040 LearningRate 0.0105 Epoch: 13 Global Step: 561200 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:48:13,664-Speed 2627.45 samples/sec Loss 4.3656 LearningRate 0.0105 Epoch: 13 Global Step: 561210 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:48:17,560-Speed 2629.14 samples/sec Loss 4.2732 LearningRate 0.0105 Epoch: 13 Global Step: 561220 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:48:21,455-Speed 2629.46 samples/sec Loss 4.2855 LearningRate 0.0105 Epoch: 13 Global Step: 561230 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:48:25,359-Speed 2623.77 samples/sec Loss 4.3538 LearningRate 0.0105 Epoch: 13 Global Step: 561240 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:48:29,254-Speed 2630.28 samples/sec Loss 4.4314 LearningRate 0.0105 Epoch: 13 Global Step: 561250 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:48:33,157-Speed 2624.34 samples/sec Loss 4.3842 LearningRate 0.0105 Epoch: 13 Global Step: 561260 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:48:37,032-Speed 2643.04 samples/sec Loss 4.4075 LearningRate 0.0105 Epoch: 13 Global Step: 561270 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:48:40,932-Speed 2626.03 samples/sec Loss 4.4334 LearningRate 0.0105 Epoch: 13 Global Step: 561280 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:48:44,840-Speed 2621.00 samples/sec Loss 4.3375 LearningRate 0.0105 Epoch: 13 Global Step: 561290 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:48:48,727-Speed 2634.76 samples/sec Loss 4.3273 LearningRate 0.0105 Epoch: 13 Global Step: 561300 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:48:52,627-Speed 2626.91 samples/sec Loss 4.3338 LearningRate 0.0105 Epoch: 13 Global Step: 561310 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:48:56,524-Speed 2628.60 samples/sec Loss 4.4859 LearningRate 0.0105 Epoch: 13 Global Step: 561320 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:49:00,446-Speed 2611.03 samples/sec Loss 4.4077 LearningRate 0.0105 Epoch: 13 Global Step: 561330 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:49:04,340-Speed 2630.85 samples/sec Loss 4.4138 LearningRate 0.0105 Epoch: 13 Global Step: 561340 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:49:08,231-Speed 2632.44 samples/sec Loss 4.3571 LearningRate 0.0105 Epoch: 13 Global Step: 561350 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:49:12,122-Speed 2632.42 samples/sec Loss 4.3720 LearningRate 0.0105 Epoch: 13 Global Step: 561360 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:49:16,031-Speed 2620.89 samples/sec Loss 4.2795 LearningRate 0.0105 Epoch: 13 Global Step: 561370 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:49:19,920-Speed 2633.49 samples/sec Loss 4.4187 LearningRate 0.0105 Epoch: 13 Global Step: 561380 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:49:23,815-Speed 2630.11 samples/sec Loss 4.3010 LearningRate 0.0105 Epoch: 13 Global Step: 561390 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:49:27,686-Speed 2645.91 samples/sec Loss 4.3362 LearningRate 0.0105 Epoch: 13 Global Step: 561400 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:49:31,580-Speed 2630.31 samples/sec Loss 4.3052 LearningRate 0.0104 Epoch: 13 Global Step: 561410 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:49:35,491-Speed 2618.58 samples/sec Loss 4.3848 LearningRate 0.0104 Epoch: 13 Global Step: 561420 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:49:39,390-Speed 2627.70 samples/sec Loss 4.3774 LearningRate 0.0104 Epoch: 13 Global Step: 561430 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:49:43,295-Speed 2622.73 samples/sec Loss 4.3665 LearningRate 0.0104 Epoch: 13 Global Step: 561440 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:49:47,197-Speed 2625.16 samples/sec Loss 4.4085 LearningRate 0.0104 Epoch: 13 Global Step: 561450 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:49:51,108-Speed 2618.60 samples/sec Loss 4.4499 LearningRate 0.0104 Epoch: 13 Global Step: 561460 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:49:55,004-Speed 2629.67 samples/sec Loss 4.3319 LearningRate 0.0104 Epoch: 13 Global Step: 561470 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:49:58,898-Speed 2630.09 samples/sec Loss 4.3688 LearningRate 0.0104 Epoch: 13 Global Step: 561480 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:50:02,798-Speed 2626.20 samples/sec Loss 4.4095 LearningRate 0.0104 Epoch: 13 Global Step: 561490 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:50:06,702-Speed 2623.48 samples/sec Loss 4.3062 LearningRate 0.0104 Epoch: 13 Global Step: 561500 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:50:10,693-Speed 2566.31 samples/sec Loss 4.4035 LearningRate 0.0104 Epoch: 13 Global Step: 561510 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:50:14,571-Speed 2641.24 samples/sec Loss 4.2677 LearningRate 0.0104 Epoch: 13 Global Step: 561520 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:50:18,493-Speed 2611.27 samples/sec Loss 4.4483 LearningRate 0.0104 Epoch: 13 Global Step: 561530 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:50:22,422-Speed 2607.06 samples/sec Loss 4.4049 LearningRate 0.0104 Epoch: 13 Global Step: 561540 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:50:26,319-Speed 2628.35 samples/sec Loss 4.3740 LearningRate 0.0104 Epoch: 13 Global Step: 561550 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:50:30,217-Speed 2627.68 samples/sec Loss 4.2795 LearningRate 0.0104 Epoch: 13 Global Step: 561560 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:50:34,116-Speed 2626.99 samples/sec Loss 4.3139 LearningRate 0.0104 Epoch: 13 Global Step: 561570 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:50:38,017-Speed 2625.59 samples/sec Loss 4.3433 LearningRate 0.0104 Epoch: 13 Global Step: 561580 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:50:41,920-Speed 2624.03 samples/sec Loss 4.3304 LearningRate 0.0104 Epoch: 13 Global Step: 561590 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:50:45,822-Speed 2625.69 samples/sec Loss 4.4277 LearningRate 0.0104 Epoch: 13 Global Step: 561600 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:50:49,724-Speed 2624.60 samples/sec Loss 4.3579 LearningRate 0.0104 Epoch: 13 Global Step: 561610 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:50:53,625-Speed 2625.80 samples/sec Loss 4.4302 LearningRate 0.0104 Epoch: 13 Global Step: 561620 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:50:57,498-Speed 2644.20 samples/sec Loss 4.2696 LearningRate 0.0104 Epoch: 13 Global Step: 561630 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:51:01,400-Speed 2625.25 samples/sec Loss 4.3170 LearningRate 0.0104 Epoch: 13 Global Step: 561640 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:51:05,311-Speed 2618.91 samples/sec Loss 4.3619 LearningRate 0.0104 Epoch: 13 Global Step: 561650 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:51:09,216-Speed 2622.92 samples/sec Loss 4.4409 LearningRate 0.0104 Epoch: 13 Global Step: 561660 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:51:13,135-Speed 2613.10 samples/sec Loss 4.3389 LearningRate 0.0104 Epoch: 13 Global Step: 561670 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:51:17,075-Speed 2599.48 samples/sec Loss 4.3834 LearningRate 0.0104 Epoch: 13 Global Step: 561680 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:51:20,993-Speed 2614.51 samples/sec Loss 4.2471 LearningRate 0.0104 Epoch: 13 Global Step: 561690 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:51:24,896-Speed 2624.38 samples/sec Loss 4.3419 LearningRate 0.0104 Epoch: 13 Global Step: 561700 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:51:28,802-Speed 2622.21 samples/sec Loss 4.2913 LearningRate 0.0104 Epoch: 13 Global Step: 561710 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:51:32,724-Speed 2611.41 samples/sec Loss 4.3128 LearningRate 0.0104 Epoch: 13 Global Step: 561720 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:51:36,625-Speed 2625.43 samples/sec Loss 4.3539 LearningRate 0.0104 Epoch: 13 Global Step: 561730 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:51:40,526-Speed 2625.81 samples/sec Loss 4.4403 LearningRate 0.0104 Epoch: 13 Global Step: 561740 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:51:44,414-Speed 2634.58 samples/sec Loss 4.3002 LearningRate 0.0104 Epoch: 13 Global Step: 561750 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:51:48,317-Speed 2623.68 samples/sec Loss 4.2568 LearningRate 0.0104 Epoch: 13 Global Step: 561760 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:51:52,210-Speed 2631.19 samples/sec Loss 4.3393 LearningRate 0.0104 Epoch: 13 Global Step: 561770 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:51:56,111-Speed 2625.70 samples/sec Loss 4.3423 LearningRate 0.0104 Epoch: 13 Global Step: 561780 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:52:00,022-Speed 2618.90 samples/sec Loss 4.4183 LearningRate 0.0104 Epoch: 13 Global Step: 561790 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:52:03,906-Speed 2636.99 samples/sec Loss 4.3397 LearningRate 0.0104 Epoch: 13 Global Step: 561800 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:52:07,902-Speed 2563.18 samples/sec Loss 4.3113 LearningRate 0.0104 Epoch: 13 Global Step: 561810 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:52:11,799-Speed 2628.21 samples/sec Loss 4.3474 LearningRate 0.0104 Epoch: 13 Global Step: 561820 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:52:15,697-Speed 2627.32 samples/sec Loss 4.3339 LearningRate 0.0104 Epoch: 13 Global Step: 561830 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:52:19,595-Speed 2628.32 samples/sec Loss 4.3943 LearningRate 0.0104 Epoch: 13 Global Step: 561840 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:52:23,513-Speed 2614.22 samples/sec Loss 4.4338 LearningRate 0.0104 Epoch: 13 Global Step: 561850 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:52:27,414-Speed 2625.89 samples/sec Loss 4.4172 LearningRate 0.0104 Epoch: 13 Global Step: 561860 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:52:31,311-Speed 2628.44 samples/sec Loss 4.4977 LearningRate 0.0104 Epoch: 13 Global Step: 561870 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:52:35,208-Speed 2628.31 samples/sec Loss 4.3441 LearningRate 0.0104 Epoch: 13 Global Step: 561880 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:52:39,118-Speed 2619.46 samples/sec Loss 4.3612 LearningRate 0.0104 Epoch: 13 Global Step: 561890 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:52:43,019-Speed 2625.65 samples/sec Loss 4.3800 LearningRate 0.0104 Epoch: 13 Global Step: 561900 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:52:46,926-Speed 2621.11 samples/sec Loss 4.3336 LearningRate 0.0104 Epoch: 13 Global Step: 561910 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:52:50,799-Speed 2645.17 samples/sec Loss 4.3515 LearningRate 0.0104 Epoch: 13 Global Step: 561920 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:52:54,707-Speed 2620.91 samples/sec Loss 4.3201 LearningRate 0.0104 Epoch: 13 Global Step: 561930 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:52:58,602-Speed 2629.55 samples/sec Loss 4.3262 LearningRate 0.0104 Epoch: 13 Global Step: 561940 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:53:02,505-Speed 2624.09 samples/sec Loss 4.3622 LearningRate 0.0104 Epoch: 13 Global Step: 561950 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:53:06,412-Speed 2621.39 samples/sec Loss 4.3315 LearningRate 0.0104 Epoch: 13 Global Step: 561960 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:53:10,311-Speed 2627.00 samples/sec Loss 4.3735 LearningRate 0.0104 Epoch: 13 Global Step: 561970 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:53:14,204-Speed 2631.35 samples/sec Loss 4.4785 LearningRate 0.0104 Epoch: 13 Global Step: 561980 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:53:18,107-Speed 2624.03 samples/sec Loss 4.4696 LearningRate 0.0104 Epoch: 13 Global Step: 561990 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:53:22,010-Speed 2624.30 samples/sec Loss 4.3638 LearningRate 0.0104 Epoch: 13 Global Step: 562000 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:53:25,902-Speed 2631.82 samples/sec Loss 4.2929 LearningRate 0.0104 Epoch: 13 Global Step: 562010 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:53:29,802-Speed 2626.39 samples/sec Loss 4.4084 LearningRate 0.0104 Epoch: 13 Global Step: 562020 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:53:33,692-Speed 2633.43 samples/sec Loss 4.3744 LearningRate 0.0104 Epoch: 13 Global Step: 562030 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:53:37,599-Speed 2621.48 samples/sec Loss 4.4928 LearningRate 0.0104 Epoch: 13 Global Step: 562040 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:53:41,498-Speed 2626.61 samples/sec Loss 4.3653 LearningRate 0.0104 Epoch: 13 Global Step: 562050 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:53:45,397-Speed 2626.96 samples/sec Loss 4.3860 LearningRate 0.0104 Epoch: 13 Global Step: 562060 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:53:49,290-Speed 2631.39 samples/sec Loss 4.3627 LearningRate 0.0104 Epoch: 13 Global Step: 562070 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:53:53,186-Speed 2629.37 samples/sec Loss 4.3842 LearningRate 0.0104 Epoch: 13 Global Step: 562080 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:53:57,081-Speed 2630.01 samples/sec Loss 4.3805 LearningRate 0.0104 Epoch: 13 Global Step: 562090 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:54:00,974-Speed 2631.25 samples/sec Loss 4.3088 LearningRate 0.0104 Epoch: 13 Global Step: 562100 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:54:04,921-Speed 2594.78 samples/sec Loss 4.3255 LearningRate 0.0104 Epoch: 13 Global Step: 562110 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:54:08,823-Speed 2625.04 samples/sec Loss 4.4057 LearningRate 0.0104 Epoch: 13 Global Step: 562120 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:54:12,797-Speed 2577.29 samples/sec Loss 4.3903 LearningRate 0.0104 Epoch: 13 Global Step: 562130 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:54:16,692-Speed 2629.39 samples/sec Loss 4.3259 LearningRate 0.0104 Epoch: 13 Global Step: 562140 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:54:20,587-Speed 2629.82 samples/sec Loss 4.2305 LearningRate 0.0104 Epoch: 13 Global Step: 562150 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:54:24,484-Speed 2629.04 samples/sec Loss 4.3273 LearningRate 0.0104 Epoch: 13 Global Step: 562160 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:54:28,384-Speed 2626.29 samples/sec Loss 4.3878 LearningRate 0.0104 Epoch: 13 Global Step: 562170 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:54:32,303-Speed 2613.33 samples/sec Loss 4.3403 LearningRate 0.0104 Epoch: 13 Global Step: 562180 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:54:36,195-Speed 2631.79 samples/sec Loss 4.4004 LearningRate 0.0104 Epoch: 13 Global Step: 562190 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:54:40,095-Speed 2626.75 samples/sec Loss 4.3695 LearningRate 0.0104 Epoch: 13 Global Step: 562200 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:54:43,990-Speed 2629.67 samples/sec Loss 4.3653 LearningRate 0.0104 Epoch: 13 Global Step: 562210 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:54:47,882-Speed 2631.73 samples/sec Loss 4.2573 LearningRate 0.0104 Epoch: 13 Global Step: 562220 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:54:51,775-Speed 2630.58 samples/sec Loss 4.2541 LearningRate 0.0104 Epoch: 13 Global Step: 562230 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:54:55,677-Speed 2625.52 samples/sec Loss 4.3768 LearningRate 0.0104 Epoch: 13 Global Step: 562240 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:54:59,553-Speed 2642.54 samples/sec Loss 4.3923 LearningRate 0.0104 Epoch: 13 Global Step: 562250 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:55:03,444-Speed 2632.55 samples/sec Loss 4.3934 LearningRate 0.0104 Epoch: 13 Global Step: 562260 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:55:07,332-Speed 2633.91 samples/sec Loss 4.3653 LearningRate 0.0104 Epoch: 13 Global Step: 562270 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:55:11,227-Speed 2630.27 samples/sec Loss 4.3786 LearningRate 0.0104 Epoch: 13 Global Step: 562280 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:55:15,121-Speed 2630.18 samples/sec Loss 4.3606 LearningRate 0.0104 Epoch: 13 Global Step: 562290 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:55:19,018-Speed 2628.19 samples/sec Loss 4.4508 LearningRate 0.0104 Epoch: 13 Global Step: 562300 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:55:22,913-Speed 2630.16 samples/sec Loss 4.3880 LearningRate 0.0104 Epoch: 13 Global Step: 562310 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:55:26,810-Speed 2628.01 samples/sec Loss 4.2532 LearningRate 0.0104 Epoch: 13 Global Step: 562320 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:55:30,705-Speed 2629.87 samples/sec Loss 4.3576 LearningRate 0.0104 Epoch: 13 Global Step: 562330 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:55:34,600-Speed 2629.34 samples/sec Loss 4.4209 LearningRate 0.0104 Epoch: 13 Global Step: 562340 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:55:38,473-Speed 2645.08 samples/sec Loss 4.2914 LearningRate 0.0104 Epoch: 13 Global Step: 562350 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:55:42,366-Speed 2631.06 samples/sec Loss 4.3409 LearningRate 0.0104 Epoch: 13 Global Step: 562360 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:55:46,258-Speed 2632.20 samples/sec Loss 4.3176 LearningRate 0.0104 Epoch: 13 Global Step: 562370 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:55:50,152-Speed 2630.08 samples/sec Loss 4.2722 LearningRate 0.0104 Epoch: 13 Global Step: 562380 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:55:54,047-Speed 2630.03 samples/sec Loss 4.3019 LearningRate 0.0104 Epoch: 13 Global Step: 562390 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:55:57,946-Speed 2627.43 samples/sec Loss 4.3135 LearningRate 0.0104 Epoch: 13 Global Step: 562400 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:56:01,853-Speed 2620.80 samples/sec Loss 4.3726 LearningRate 0.0104 Epoch: 13 Global Step: 562410 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:56:05,749-Speed 2629.50 samples/sec Loss 4.3014 LearningRate 0.0104 Epoch: 13 Global Step: 562420 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:56:09,639-Speed 2634.10 samples/sec Loss 4.2727 LearningRate 0.0104 Epoch: 13 Global Step: 562430 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:56:13,536-Speed 2628.68 samples/sec Loss 4.3533 LearningRate 0.0104 Epoch: 13 Global Step: 562440 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:56:17,430-Speed 2630.21 samples/sec Loss 4.2960 LearningRate 0.0104 Epoch: 13 Global Step: 562450 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:56:21,337-Speed 2621.10 samples/sec Loss 4.4761 LearningRate 0.0104 Epoch: 13 Global Step: 562460 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:56:25,236-Speed 2628.16 samples/sec Loss 4.4329 LearningRate 0.0104 Epoch: 13 Global Step: 562470 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:56:29,131-Speed 2629.33 samples/sec Loss 4.4047 LearningRate 0.0104 Epoch: 13 Global Step: 562480 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:56:33,024-Speed 2630.97 samples/sec Loss 4.2638 LearningRate 0.0104 Epoch: 13 Global Step: 562490 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:56:36,900-Speed 2642.00 samples/sec Loss 4.3779 LearningRate 0.0104 Epoch: 13 Global Step: 562500 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:56:40,794-Speed 2631.00 samples/sec Loss 4.3373 LearningRate 0.0104 Epoch: 13 Global Step: 562510 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:56:44,692-Speed 2627.52 samples/sec Loss 4.3418 LearningRate 0.0104 Epoch: 13 Global Step: 562520 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:56:48,600-Speed 2620.96 samples/sec Loss 4.3049 LearningRate 0.0104 Epoch: 13 Global Step: 562530 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:56:52,496-Speed 2628.89 samples/sec Loss 4.2307 LearningRate 0.0104 Epoch: 13 Global Step: 562540 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:56:56,415-Speed 2614.57 samples/sec Loss 4.3289 LearningRate 0.0104 Epoch: 13 Global Step: 562550 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:57:00,300-Speed 2635.77 samples/sec Loss 4.3720 LearningRate 0.0104 Epoch: 13 Global Step: 562560 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:57:04,204-Speed 2623.67 samples/sec Loss 4.2796 LearningRate 0.0104 Epoch: 13 Global Step: 562570 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:57:08,097-Speed 2631.25 samples/sec Loss 4.3312 LearningRate 0.0104 Epoch: 13 Global Step: 562580 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:57:12,003-Speed 2622.04 samples/sec Loss 4.3226 LearningRate 0.0104 Epoch: 13 Global Step: 562590 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:57:15,908-Speed 2623.15 samples/sec Loss 4.3542 LearningRate 0.0104 Epoch: 13 Global Step: 562600 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:57:19,838-Speed 2606.39 samples/sec Loss 4.3489 LearningRate 0.0104 Epoch: 13 Global Step: 562610 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:57:23,734-Speed 2629.35 samples/sec Loss 4.3283 LearningRate 0.0104 Epoch: 13 Global Step: 562620 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:57:27,631-Speed 2628.74 samples/sec Loss 4.3500 LearningRate 0.0104 Epoch: 13 Global Step: 562630 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:57:31,545-Speed 2616.48 samples/sec Loss 4.2853 LearningRate 0.0104 Epoch: 13 Global Step: 562640 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:57:35,464-Speed 2613.34 samples/sec Loss 4.3635 LearningRate 0.0104 Epoch: 13 Global Step: 562650 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 10:57:39,364-Speed 2626.20 samples/sec Loss 4.3566 LearningRate 0.0104 Epoch: 13 Global Step: 562660 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:57:43,259-Speed 2630.45 samples/sec Loss 4.1869 LearningRate 0.0104 Epoch: 13 Global Step: 562670 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:57:47,155-Speed 2628.27 samples/sec Loss 4.3978 LearningRate 0.0104 Epoch: 13 Global Step: 562680 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:57:51,072-Speed 2615.53 samples/sec Loss 4.2817 LearningRate 0.0104 Epoch: 13 Global Step: 562690 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:57:54,971-Speed 2627.05 samples/sec Loss 4.3410 LearningRate 0.0103 Epoch: 13 Global Step: 562700 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:57:58,879-Speed 2621.60 samples/sec Loss 4.2615 LearningRate 0.0103 Epoch: 13 Global Step: 562710 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:58:02,777-Speed 2626.92 samples/sec Loss 4.3634 LearningRate 0.0103 Epoch: 13 Global Step: 562720 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:58:06,694-Speed 2615.20 samples/sec Loss 4.3864 LearningRate 0.0103 Epoch: 13 Global Step: 562730 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:58:10,596-Speed 2624.57 samples/sec Loss 4.3190 LearningRate 0.0103 Epoch: 13 Global Step: 562740 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:58:14,605-Speed 2555.38 samples/sec Loss 4.3650 LearningRate 0.0103 Epoch: 13 Global Step: 562750 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:58:18,498-Speed 2630.94 samples/sec Loss 4.3378 LearningRate 0.0103 Epoch: 13 Global Step: 562760 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:58:22,373-Speed 2644.01 samples/sec Loss 4.2844 LearningRate 0.0103 Epoch: 13 Global Step: 562770 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:58:26,282-Speed 2619.97 samples/sec Loss 4.1813 LearningRate 0.0103 Epoch: 13 Global Step: 562780 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:58:30,178-Speed 2629.65 samples/sec Loss 4.3122 LearningRate 0.0103 Epoch: 13 Global Step: 562790 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:58:34,080-Speed 2624.49 samples/sec Loss 4.3236 LearningRate 0.0103 Epoch: 13 Global Step: 562800 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:58:37,980-Speed 2626.11 samples/sec Loss 4.4464 LearningRate 0.0103 Epoch: 13 Global Step: 562810 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:58:41,879-Speed 2626.90 samples/sec Loss 4.3236 LearningRate 0.0103 Epoch: 13 Global Step: 562820 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:58:45,774-Speed 2629.61 samples/sec Loss 4.3447 LearningRate 0.0103 Epoch: 13 Global Step: 562830 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:58:49,669-Speed 2629.41 samples/sec Loss 4.4712 LearningRate 0.0103 Epoch: 13 Global Step: 562840 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:58:53,577-Speed 2620.92 samples/sec Loss 4.3342 LearningRate 0.0103 Epoch: 13 Global Step: 562850 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:58:57,471-Speed 2630.71 samples/sec Loss 4.2928 LearningRate 0.0103 Epoch: 13 Global Step: 562860 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:59:01,362-Speed 2632.78 samples/sec Loss 4.2944 LearningRate 0.0103 Epoch: 13 Global Step: 562870 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:59:05,266-Speed 2623.60 samples/sec Loss 4.3622 LearningRate 0.0103 Epoch: 13 Global Step: 562880 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:59:09,158-Speed 2631.38 samples/sec Loss 4.3602 LearningRate 0.0103 Epoch: 13 Global Step: 562890 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:59:13,058-Speed 2626.13 samples/sec Loss 4.2780 LearningRate 0.0103 Epoch: 13 Global Step: 562900 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:59:16,949-Speed 2632.20 samples/sec Loss 4.3810 LearningRate 0.0103 Epoch: 13 Global Step: 562910 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:59:20,842-Speed 2631.12 samples/sec Loss 4.3555 LearningRate 0.0103 Epoch: 13 Global Step: 562920 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:59:24,735-Speed 2630.83 samples/sec Loss 4.3441 LearningRate 0.0103 Epoch: 13 Global Step: 562930 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:59:28,632-Speed 2628.22 samples/sec Loss 4.3494 LearningRate 0.0103 Epoch: 13 Global Step: 562940 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:59:32,526-Speed 2630.06 samples/sec Loss 4.3740 LearningRate 0.0103 Epoch: 13 Global Step: 562950 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:59:36,420-Speed 2630.39 samples/sec Loss 4.2470 LearningRate 0.0103 Epoch: 13 Global Step: 562960 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:59:40,312-Speed 2631.87 samples/sec Loss 4.4493 LearningRate 0.0103 Epoch: 13 Global Step: 562970 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:59:44,211-Speed 2627.30 samples/sec Loss 4.4026 LearningRate 0.0103 Epoch: 13 Global Step: 562980 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:59:48,114-Speed 2624.24 samples/sec Loss 4.2949 LearningRate 0.0103 Epoch: 13 Global Step: 562990 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 10:59:51,986-Speed 2645.30 samples/sec Loss 4.3275 LearningRate 0.0103 Epoch: 13 Global Step: 563000 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:59:55,879-Speed 2630.70 samples/sec Loss 4.3175 LearningRate 0.0103 Epoch: 13 Global Step: 563010 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 10:59:59,772-Speed 2631.15 samples/sec Loss 4.3446 LearningRate 0.0103 Epoch: 13 Global Step: 563020 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:00:03,664-Speed 2632.14 samples/sec Loss 4.3478 LearningRate 0.0103 Epoch: 13 Global Step: 563030 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:00:07,566-Speed 2624.99 samples/sec Loss 4.3831 LearningRate 0.0103 Epoch: 13 Global Step: 563040 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:00:11,497-Speed 2604.91 samples/sec Loss 4.3428 LearningRate 0.0103 Epoch: 13 Global Step: 563050 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:00:15,391-Speed 2630.87 samples/sec Loss 4.3415 LearningRate 0.0103 Epoch: 13 Global Step: 563060 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:00:19,290-Speed 2627.29 samples/sec Loss 4.3069 LearningRate 0.0103 Epoch: 13 Global Step: 563070 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:00:23,162-Speed 2645.00 samples/sec Loss 4.3501 LearningRate 0.0103 Epoch: 13 Global Step: 563080 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:00:27,107-Speed 2596.40 samples/sec Loss 4.3446 LearningRate 0.0103 Epoch: 13 Global Step: 563090 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:00:31,011-Speed 2623.97 samples/sec Loss 4.2992 LearningRate 0.0103 Epoch: 13 Global Step: 563100 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:00:34,902-Speed 2632.20 samples/sec Loss 4.3498 LearningRate 0.0103 Epoch: 13 Global Step: 563110 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:00:38,810-Speed 2621.18 samples/sec Loss 4.2948 LearningRate 0.0103 Epoch: 13 Global Step: 563120 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:00:42,727-Speed 2614.60 samples/sec Loss 4.3644 LearningRate 0.0103 Epoch: 13 Global Step: 563130 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:00:46,624-Speed 2629.00 samples/sec Loss 4.3493 LearningRate 0.0103 Epoch: 13 Global Step: 563140 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:00:50,516-Speed 2631.55 samples/sec Loss 4.3192 LearningRate 0.0103 Epoch: 13 Global Step: 563150 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:00:54,421-Speed 2622.93 samples/sec Loss 4.3475 LearningRate 0.0103 Epoch: 13 Global Step: 563160 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:00:58,316-Speed 2629.16 samples/sec Loss 4.3148 LearningRate 0.0103 Epoch: 13 Global Step: 563170 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:01:02,215-Speed 2626.75 samples/sec Loss 4.2117 LearningRate 0.0103 Epoch: 13 Global Step: 563180 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:01:06,111-Speed 2629.56 samples/sec Loss 4.2864 LearningRate 0.0103 Epoch: 13 Global Step: 563190 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:01:10,013-Speed 2625.09 samples/sec Loss 4.2975 LearningRate 0.0103 Epoch: 13 Global Step: 563200 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:01:13,911-Speed 2627.70 samples/sec Loss 4.3744 LearningRate 0.0103 Epoch: 13 Global Step: 563210 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:01:17,826-Speed 2616.20 samples/sec Loss 4.2651 LearningRate 0.0103 Epoch: 13 Global Step: 563220 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:01:21,721-Speed 2629.83 samples/sec Loss 4.2751 LearningRate 0.0103 Epoch: 13 Global Step: 563230 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:01:25,614-Speed 2631.18 samples/sec Loss 4.3995 LearningRate 0.0103 Epoch: 13 Global Step: 563240 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:01:29,507-Speed 2630.84 samples/sec Loss 4.4318 LearningRate 0.0103 Epoch: 13 Global Step: 563250 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:01:33,398-Speed 2632.14 samples/sec Loss 4.2798 LearningRate 0.0103 Epoch: 13 Global Step: 563260 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:01:37,293-Speed 2629.99 samples/sec Loss 4.3313 LearningRate 0.0103 Epoch: 13 Global Step: 563270 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:01:41,196-Speed 2624.31 samples/sec Loss 4.4093 LearningRate 0.0103 Epoch: 13 Global Step: 563280 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:01:45,088-Speed 2631.48 samples/sec Loss 4.2298 LearningRate 0.0103 Epoch: 13 Global Step: 563290 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:01:48,983-Speed 2630.16 samples/sec Loss 4.2473 LearningRate 0.0103 Epoch: 13 Global Step: 563300 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:01:52,892-Speed 2619.80 samples/sec Loss 4.2795 LearningRate 0.0103 Epoch: 13 Global Step: 563310 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:01:56,767-Speed 2644.19 samples/sec Loss 4.3485 LearningRate 0.0103 Epoch: 13 Global Step: 563320 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:02:00,665-Speed 2627.40 samples/sec Loss 4.3684 LearningRate 0.0103 Epoch: 13 Global Step: 563330 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:02:04,575-Speed 2619.07 samples/sec Loss 4.3269 LearningRate 0.0103 Epoch: 13 Global Step: 563340 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:02:08,497-Speed 2611.95 samples/sec Loss 4.3594 LearningRate 0.0103 Epoch: 13 Global Step: 563350 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:02:12,393-Speed 2628.84 samples/sec Loss 4.2940 LearningRate 0.0103 Epoch: 13 Global Step: 563360 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:02:16,289-Speed 2629.26 samples/sec Loss 4.2482 LearningRate 0.0103 Epoch: 13 Global Step: 563370 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:02:20,194-Speed 2623.22 samples/sec Loss 4.2444 LearningRate 0.0103 Epoch: 13 Global Step: 563380 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:02:24,090-Speed 2628.69 samples/sec Loss 4.3969 LearningRate 0.0103 Epoch: 13 Global Step: 563390 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:02:27,989-Speed 2627.20 samples/sec Loss 4.3441 LearningRate 0.0103 Epoch: 13 Global Step: 563400 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:02:31,890-Speed 2625.78 samples/sec Loss 4.3016 LearningRate 0.0103 Epoch: 13 Global Step: 563410 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:02:35,786-Speed 2628.89 samples/sec Loss 4.3586 LearningRate 0.0103 Epoch: 13 Global Step: 563420 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:02:39,682-Speed 2628.75 samples/sec Loss 4.4287 LearningRate 0.0103 Epoch: 13 Global Step: 563430 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:02:43,580-Speed 2627.83 samples/sec Loss 4.3475 LearningRate 0.0103 Epoch: 13 Global Step: 563440 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:02:47,484-Speed 2623.43 samples/sec Loss 4.2565 LearningRate 0.0103 Epoch: 13 Global Step: 563450 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:02:51,381-Speed 2628.30 samples/sec Loss 4.3720 LearningRate 0.0103 Epoch: 13 Global Step: 563460 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:02:55,292-Speed 2619.36 samples/sec Loss 4.3657 LearningRate 0.0103 Epoch: 13 Global Step: 563470 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:02:59,196-Speed 2622.87 samples/sec Loss 4.3533 LearningRate 0.0103 Epoch: 13 Global Step: 563480 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:03:03,081-Speed 2636.46 samples/sec Loss 4.1988 LearningRate 0.0103 Epoch: 13 Global Step: 563490 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:03:06,978-Speed 2628.79 samples/sec Loss 4.4002 LearningRate 0.0103 Epoch: 13 Global Step: 563500 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:03:10,876-Speed 2627.40 samples/sec Loss 4.4451 LearningRate 0.0103 Epoch: 13 Global Step: 563510 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:03:14,768-Speed 2631.43 samples/sec Loss 4.2624 LearningRate 0.0103 Epoch: 13 Global Step: 563520 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:03:18,672-Speed 2623.66 samples/sec Loss 4.3724 LearningRate 0.0103 Epoch: 13 Global Step: 563530 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:03:22,565-Speed 2631.46 samples/sec Loss 4.3839 LearningRate 0.0103 Epoch: 13 Global Step: 563540 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:03:26,456-Speed 2632.02 samples/sec Loss 4.3304 LearningRate 0.0103 Epoch: 13 Global Step: 563550 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:03:30,353-Speed 2629.18 samples/sec Loss 4.4022 LearningRate 0.0103 Epoch: 13 Global Step: 563560 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:03:34,273-Speed 2612.67 samples/sec Loss 4.2843 LearningRate 0.0103 Epoch: 13 Global Step: 563570 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:03:38,172-Speed 2626.78 samples/sec Loss 4.3492 LearningRate 0.0103 Epoch: 13 Global Step: 563580 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:03:42,066-Speed 2630.07 samples/sec Loss 4.3384 LearningRate 0.0103 Epoch: 13 Global Step: 563590 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:03:45,957-Speed 2632.79 samples/sec Loss 4.3243 LearningRate 0.0103 Epoch: 13 Global Step: 563600 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:03:49,859-Speed 2625.46 samples/sec Loss 4.4290 LearningRate 0.0103 Epoch: 13 Global Step: 563610 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:03:53,760-Speed 2625.39 samples/sec Loss 4.3773 LearningRate 0.0103 Epoch: 13 Global Step: 563620 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:03:57,664-Speed 2623.96 samples/sec Loss 4.3659 LearningRate 0.0103 Epoch: 13 Global Step: 563630 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:04:01,568-Speed 2624.03 samples/sec Loss 4.3152 LearningRate 0.0103 Epoch: 13 Global Step: 563640 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:04:05,460-Speed 2631.36 samples/sec Loss 4.2929 LearningRate 0.0103 Epoch: 13 Global Step: 563650 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:04:09,363-Speed 2624.48 samples/sec Loss 4.3496 LearningRate 0.0103 Epoch: 13 Global Step: 563660 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:04:13,259-Speed 2628.48 samples/sec Loss 4.3312 LearningRate 0.0103 Epoch: 13 Global Step: 563670 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:04:17,155-Speed 2628.94 samples/sec Loss 4.2950 LearningRate 0.0103 Epoch: 13 Global Step: 563680 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:04:21,051-Speed 2629.44 samples/sec Loss 4.3228 LearningRate 0.0103 Epoch: 13 Global Step: 563690 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:04:24,930-Speed 2640.37 samples/sec Loss 4.2226 LearningRate 0.0103 Epoch: 13 Global Step: 563700 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:04:28,829-Speed 2627.65 samples/sec Loss 4.3849 LearningRate 0.0103 Epoch: 13 Global Step: 563710 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:04:32,729-Speed 2626.32 samples/sec Loss 4.2938 LearningRate 0.0103 Epoch: 13 Global Step: 563720 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:04:36,634-Speed 2622.16 samples/sec Loss 4.3623 LearningRate 0.0103 Epoch: 13 Global Step: 563730 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:04:40,530-Speed 2628.89 samples/sec Loss 4.3075 LearningRate 0.0103 Epoch: 13 Global Step: 563740 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:04:44,468-Speed 2601.61 samples/sec Loss 4.3243 LearningRate 0.0103 Epoch: 13 Global Step: 563750 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:04:48,386-Speed 2614.31 samples/sec Loss 4.3456 LearningRate 0.0103 Epoch: 13 Global Step: 563760 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:04:52,297-Speed 2619.19 samples/sec Loss 4.2850 LearningRate 0.0103 Epoch: 13 Global Step: 563770 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:04:56,200-Speed 2624.00 samples/sec Loss 4.3289 LearningRate 0.0103 Epoch: 13 Global Step: 563780 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:05:00,093-Speed 2631.58 samples/sec Loss 4.3575 LearningRate 0.0103 Epoch: 13 Global Step: 563790 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:05:03,988-Speed 2629.41 samples/sec Loss 4.2954 LearningRate 0.0103 Epoch: 13 Global Step: 563800 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:05:07,867-Speed 2640.27 samples/sec Loss 4.2664 LearningRate 0.0103 Epoch: 13 Global Step: 563810 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:05:11,755-Speed 2634.15 samples/sec Loss 4.2576 LearningRate 0.0103 Epoch: 13 Global Step: 563820 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:05:15,659-Speed 2624.41 samples/sec Loss 4.2553 LearningRate 0.0103 Epoch: 13 Global Step: 563830 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:05:19,553-Speed 2629.76 samples/sec Loss 4.2614 LearningRate 0.0103 Epoch: 13 Global Step: 563840 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:05:23,452-Speed 2627.70 samples/sec Loss 4.3395 LearningRate 0.0103 Epoch: 13 Global Step: 563850 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:05:27,347-Speed 2629.06 samples/sec Loss 4.2966 LearningRate 0.0103 Epoch: 13 Global Step: 563860 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:05:31,262-Speed 2617.06 samples/sec Loss 4.2949 LearningRate 0.0103 Epoch: 13 Global Step: 563870 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:05:35,166-Speed 2623.55 samples/sec Loss 4.3017 LearningRate 0.0103 Epoch: 13 Global Step: 563880 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:05:39,235-Speed 2517.34 samples/sec Loss 4.2814 LearningRate 0.0103 Epoch: 13 Global Step: 563890 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:05:43,129-Speed 2629.83 samples/sec Loss 4.2891 LearningRate 0.0103 Epoch: 13 Global Step: 563900 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:05:47,023-Speed 2630.75 samples/sec Loss 4.3359 LearningRate 0.0103 Epoch: 13 Global Step: 563910 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:05:50,923-Speed 2626.74 samples/sec Loss 4.2565 LearningRate 0.0103 Epoch: 13 Global Step: 563920 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:05:54,833-Speed 2619.26 samples/sec Loss 4.4418 LearningRate 0.0103 Epoch: 13 Global Step: 563930 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:05:58,710-Speed 2642.03 samples/sec Loss 4.2399 LearningRate 0.0103 Epoch: 13 Global Step: 563940 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:06:02,609-Speed 2627.08 samples/sec Loss 4.3809 LearningRate 0.0103 Epoch: 13 Global Step: 563950 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:06:06,504-Speed 2629.75 samples/sec Loss 4.2986 LearningRate 0.0103 Epoch: 13 Global Step: 563960 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:06:10,411-Speed 2621.31 samples/sec Loss 4.3065 LearningRate 0.0103 Epoch: 13 Global Step: 563970 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:06:14,305-Speed 2630.42 samples/sec Loss 4.2456 LearningRate 0.0103 Epoch: 13 Global Step: 563980 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:06:18,362-Speed 2524.69 samples/sec Loss 4.3384 LearningRate 0.0102 Epoch: 13 Global Step: 563990 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:06:22,358-Speed 2563.30 samples/sec Loss 4.3124 LearningRate 0.0102 Epoch: 13 Global Step: 564000 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:06:26,259-Speed 2626.14 samples/sec Loss 4.3278 LearningRate 0.0102 Epoch: 13 Global Step: 564010 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:06:30,159-Speed 2626.08 samples/sec Loss 4.3169 LearningRate 0.0102 Epoch: 13 Global Step: 564020 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:06:34,062-Speed 2624.21 samples/sec Loss 4.3487 LearningRate 0.0102 Epoch: 13 Global Step: 564030 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:06:37,933-Speed 2646.14 samples/sec Loss 4.2756 LearningRate 0.0102 Epoch: 13 Global Step: 564040 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:06:41,827-Speed 2630.38 samples/sec Loss 4.1330 LearningRate 0.0102 Epoch: 13 Global Step: 564050 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:06:45,723-Speed 2628.85 samples/sec Loss 4.2795 LearningRate 0.0102 Epoch: 13 Global Step: 564060 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:06:49,639-Speed 2615.37 samples/sec Loss 4.3041 LearningRate 0.0102 Epoch: 13 Global Step: 564070 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:06:53,542-Speed 2624.74 samples/sec Loss 4.3571 LearningRate 0.0102 Epoch: 13 Global Step: 564080 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:06:57,432-Speed 2633.05 samples/sec Loss 4.3827 LearningRate 0.0102 Epoch: 13 Global Step: 564090 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:07:01,327-Speed 2630.12 samples/sec Loss 4.3668 LearningRate 0.0102 Epoch: 13 Global Step: 564100 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:07:05,229-Speed 2624.45 samples/sec Loss 4.3278 LearningRate 0.0102 Epoch: 13 Global Step: 564110 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:07:09,123-Speed 2630.52 samples/sec Loss 4.2089 LearningRate 0.0102 Epoch: 13 Global Step: 564120 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:07:13,014-Speed 2632.35 samples/sec Loss 4.1960 LearningRate 0.0102 Epoch: 13 Global Step: 564130 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:07:16,930-Speed 2615.73 samples/sec Loss 4.3694 LearningRate 0.0102 Epoch: 13 Global Step: 564140 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:07:20,826-Speed 2629.01 samples/sec Loss 4.2923 LearningRate 0.0102 Epoch: 13 Global Step: 564150 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:07:24,735-Speed 2620.73 samples/sec Loss 4.2406 LearningRate 0.0102 Epoch: 13 Global Step: 564160 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:07:28,637-Speed 2624.65 samples/sec Loss 4.2695 LearningRate 0.0102 Epoch: 13 Global Step: 564170 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:07:32,541-Speed 2623.48 samples/sec Loss 4.3156 LearningRate 0.0102 Epoch: 13 Global Step: 564180 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:07:36,442-Speed 2625.82 samples/sec Loss 4.3424 LearningRate 0.0102 Epoch: 13 Global Step: 564190 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:07:40,338-Speed 2628.86 samples/sec Loss 4.3592 LearningRate 0.0102 Epoch: 13 Global Step: 564200 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:07:44,256-Speed 2614.46 samples/sec Loss 4.3107 LearningRate 0.0102 Epoch: 13 Global Step: 564210 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:07:48,168-Speed 2618.70 samples/sec Loss 4.3411 LearningRate 0.0102 Epoch: 13 Global Step: 564220 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:07:52,066-Speed 2627.50 samples/sec Loss 4.2997 LearningRate 0.0102 Epoch: 13 Global Step: 564230 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:07:55,960-Speed 2630.79 samples/sec Loss 4.3975 LearningRate 0.0102 Epoch: 13 Global Step: 564240 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:07:59,856-Speed 2628.58 samples/sec Loss 4.3163 LearningRate 0.0102 Epoch: 13 Global Step: 564250 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:08:03,766-Speed 2619.47 samples/sec Loss 4.2198 LearningRate 0.0102 Epoch: 13 Global Step: 564260 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:08:07,663-Speed 2628.50 samples/sec Loss 4.3322 LearningRate 0.0102 Epoch: 13 Global Step: 564270 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:08:11,555-Speed 2631.48 samples/sec Loss 4.2941 LearningRate 0.0102 Epoch: 13 Global Step: 564280 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:08:15,453-Speed 2627.93 samples/sec Loss 4.3164 LearningRate 0.0102 Epoch: 13 Global Step: 564290 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:08:19,324-Speed 2646.73 samples/sec Loss 4.2525 LearningRate 0.0102 Epoch: 13 Global Step: 564300 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:08:23,223-Speed 2626.58 samples/sec Loss 4.3774 LearningRate 0.0102 Epoch: 13 Global Step: 564310 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:08:27,717-Speed 2280.03 samples/sec Loss 4.3719 LearningRate 0.0102 Epoch: 13 Global Step: 564320 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:08:31,623-Speed 2622.56 samples/sec Loss 4.3640 LearningRate 0.0102 Epoch: 13 Global Step: 564330 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:08:35,538-Speed 2615.62 samples/sec Loss 4.3822 LearningRate 0.0102 Epoch: 13 Global Step: 564340 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:08:39,442-Speed 2624.29 samples/sec Loss 4.3682 LearningRate 0.0102 Epoch: 13 Global Step: 564350 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:08:43,347-Speed 2622.87 samples/sec Loss 4.3123 LearningRate 0.0102 Epoch: 13 Global Step: 564360 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:08:47,247-Speed 2627.31 samples/sec Loss 4.3064 LearningRate 0.0102 Epoch: 13 Global Step: 564370 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:08:51,142-Speed 2629.02 samples/sec Loss 4.2986 LearningRate 0.0102 Epoch: 13 Global Step: 564380 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:08:55,054-Speed 2619.17 samples/sec Loss 4.2828 LearningRate 0.0102 Epoch: 13 Global Step: 564390 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:08:58,964-Speed 2619.60 samples/sec Loss 4.1983 LearningRate 0.0102 Epoch: 13 Global Step: 564400 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:09:02,875-Speed 2618.79 samples/sec Loss 4.3022 LearningRate 0.0102 Epoch: 13 Global Step: 564410 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:09:06,780-Speed 2622.83 samples/sec Loss 4.2546 LearningRate 0.0102 Epoch: 13 Global Step: 564420 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:09:10,675-Speed 2629.81 samples/sec Loss 4.3908 LearningRate 0.0102 Epoch: 13 Global Step: 564430 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:09:14,572-Speed 2628.66 samples/sec Loss 4.3185 LearningRate 0.0102 Epoch: 13 Global Step: 564440 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:09:18,486-Speed 2616.48 samples/sec Loss 4.3040 LearningRate 0.0102 Epoch: 13 Global Step: 564450 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:09:22,388-Speed 2625.22 samples/sec Loss 4.3193 LearningRate 0.0102 Epoch: 13 Global Step: 564460 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:09:26,281-Speed 2630.60 samples/sec Loss 4.3563 LearningRate 0.0102 Epoch: 13 Global Step: 564470 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:09:30,175-Speed 2630.81 samples/sec Loss 4.2531 LearningRate 0.0102 Epoch: 13 Global Step: 564480 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:09:34,067-Speed 2631.73 samples/sec Loss 4.3009 LearningRate 0.0102 Epoch: 13 Global Step: 564490 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:09:37,943-Speed 2642.59 samples/sec Loss 4.3297 LearningRate 0.0102 Epoch: 13 Global Step: 564500 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:09:41,822-Speed 2639.94 samples/sec Loss 4.3496 LearningRate 0.0102 Epoch: 13 Global Step: 564510 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:09:45,713-Speed 2632.38 samples/sec Loss 4.3363 LearningRate 0.0102 Epoch: 13 Global Step: 564520 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:09:49,730-Speed 2549.86 samples/sec Loss 4.3099 LearningRate 0.0102 Epoch: 13 Global Step: 564530 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:09:53,783-Speed 2527.79 samples/sec Loss 4.2529 LearningRate 0.0102 Epoch: 13 Global Step: 564540 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:09:57,675-Speed 2631.72 samples/sec Loss 4.3270 LearningRate 0.0102 Epoch: 13 Global Step: 564550 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:10:01,568-Speed 2631.19 samples/sec Loss 4.3606 LearningRate 0.0102 Epoch: 13 Global Step: 564560 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:10:05,460-Speed 2631.70 samples/sec Loss 4.3588 LearningRate 0.0102 Epoch: 13 Global Step: 564570 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:10:09,355-Speed 2629.75 samples/sec Loss 4.2800 LearningRate 0.0102 Epoch: 13 Global Step: 564580 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:10:13,247-Speed 2631.45 samples/sec Loss 4.3459 LearningRate 0.0102 Epoch: 13 Global Step: 564590 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:10:17,143-Speed 2629.47 samples/sec Loss 4.2763 LearningRate 0.0102 Epoch: 13 Global Step: 564600 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:10:21,075-Speed 2605.83 samples/sec Loss 4.2667 LearningRate 0.0102 Epoch: 13 Global Step: 564610 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:10:24,978-Speed 2624.32 samples/sec Loss 4.4146 LearningRate 0.0102 Epoch: 13 Global Step: 564620 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:10:28,878-Speed 2626.51 samples/sec Loss 4.3057 LearningRate 0.0102 Epoch: 13 Global Step: 564630 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:10:32,780-Speed 2624.93 samples/sec Loss 4.3345 LearningRate 0.0102 Epoch: 13 Global Step: 564640 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:10:36,677-Speed 2628.37 samples/sec Loss 4.3504 LearningRate 0.0102 Epoch: 13 Global Step: 564650 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:10:40,565-Speed 2634.31 samples/sec Loss 4.3471 LearningRate 0.0102 Epoch: 13 Global Step: 564660 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:10:44,477-Speed 2618.50 samples/sec Loss 4.3005 LearningRate 0.0102 Epoch: 13 Global Step: 564670 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:10:48,404-Speed 2607.86 samples/sec Loss 4.3462 LearningRate 0.0102 Epoch: 13 Global Step: 564680 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:10:52,311-Speed 2622.13 samples/sec Loss 4.3886 LearningRate 0.0102 Epoch: 13 Global Step: 564690 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:10:56,210-Speed 2626.74 samples/sec Loss 4.2843 LearningRate 0.0102 Epoch: 13 Global Step: 564700 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:11:00,117-Speed 2621.88 samples/sec Loss 4.3562 LearningRate 0.0102 Epoch: 13 Global Step: 564710 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:11:04,012-Speed 2629.62 samples/sec Loss 4.2670 LearningRate 0.0102 Epoch: 13 Global Step: 564720 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:11:07,907-Speed 2629.59 samples/sec Loss 4.2681 LearningRate 0.0102 Epoch: 13 Global Step: 564730 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:11:11,799-Speed 2631.46 samples/sec Loss 4.3748 LearningRate 0.0102 Epoch: 13 Global Step: 564740 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:11:15,697-Speed 2628.97 samples/sec Loss 4.3363 LearningRate 0.0102 Epoch: 13 Global Step: 564750 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:11:19,608-Speed 2618.39 samples/sec Loss 4.3516 LearningRate 0.0102 Epoch: 13 Global Step: 564760 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:11:23,503-Speed 2630.28 samples/sec Loss 4.3424 LearningRate 0.0102 Epoch: 13 Global Step: 564770 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:11:27,398-Speed 2629.75 samples/sec Loss 4.3175 LearningRate 0.0102 Epoch: 13 Global Step: 564780 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:11:31,295-Speed 2628.26 samples/sec Loss 4.3054 LearningRate 0.0102 Epoch: 13 Global Step: 564790 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:11:35,191-Speed 2629.02 samples/sec Loss 4.2454 LearningRate 0.0102 Epoch: 13 Global Step: 564800 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:11:39,084-Speed 2630.97 samples/sec Loss 4.3406 LearningRate 0.0102 Epoch: 13 Global Step: 564810 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:11:43,078-Speed 2564.47 samples/sec Loss 4.2455 LearningRate 0.0102 Epoch: 13 Global Step: 564820 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:11:46,982-Speed 2624.48 samples/sec Loss 4.3397 LearningRate 0.0102 Epoch: 13 Global Step: 564830 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:11:50,877-Speed 2629.57 samples/sec Loss 4.3439 LearningRate 0.0102 Epoch: 13 Global Step: 564840 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:11:54,774-Speed 2628.46 samples/sec Loss 4.2891 LearningRate 0.0102 Epoch: 13 Global Step: 564850 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:11:58,664-Speed 2632.81 samples/sec Loss 4.3404 LearningRate 0.0102 Epoch: 13 Global Step: 564860 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:12:02,565-Speed 2625.81 samples/sec Loss 4.2973 LearningRate 0.0102 Epoch: 13 Global Step: 564870 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:12:06,441-Speed 2642.67 samples/sec Loss 4.3527 LearningRate 0.0102 Epoch: 13 Global Step: 564880 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:12:10,333-Speed 2631.25 samples/sec Loss 4.3681 LearningRate 0.0102 Epoch: 13 Global Step: 564890 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:12:14,236-Speed 2623.99 samples/sec Loss 4.3408 LearningRate 0.0102 Epoch: 13 Global Step: 564900 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:12:18,131-Speed 2630.13 samples/sec Loss 4.3315 LearningRate 0.0102 Epoch: 13 Global Step: 564910 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:12:22,040-Speed 2620.44 samples/sec Loss 4.2721 LearningRate 0.0102 Epoch: 13 Global Step: 564920 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:12:25,934-Speed 2630.51 samples/sec Loss 4.3870 LearningRate 0.0102 Epoch: 13 Global Step: 564930 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:12:29,831-Speed 2628.22 samples/sec Loss 4.4157 LearningRate 0.0102 Epoch: 13 Global Step: 564940 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:12:33,723-Speed 2631.72 samples/sec Loss 4.2609 LearningRate 0.0102 Epoch: 13 Global Step: 564950 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:12:37,643-Speed 2613.07 samples/sec Loss 4.3149 LearningRate 0.0102 Epoch: 13 Global Step: 564960 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:12:41,535-Speed 2631.56 samples/sec Loss 4.2193 LearningRate 0.0102 Epoch: 13 Global Step: 564970 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:12:45,422-Speed 2635.77 samples/sec Loss 4.3377 LearningRate 0.0102 Epoch: 13 Global Step: 564980 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:12:49,350-Speed 2607.48 samples/sec Loss 4.3063 LearningRate 0.0102 Epoch: 13 Global Step: 564990 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:12:53,296-Speed 2595.60 samples/sec Loss 4.3369 LearningRate 0.0102 Epoch: 13 Global Step: 565000 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:12:57,188-Speed 2632.23 samples/sec Loss 4.3259 LearningRate 0.0102 Epoch: 13 Global Step: 565010 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:13:01,083-Speed 2629.30 samples/sec Loss 4.2533 LearningRate 0.0102 Epoch: 13 Global Step: 565020 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:13:05,019-Speed 2602.03 samples/sec Loss 4.3359 LearningRate 0.0102 Epoch: 13 Global Step: 565030 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:13:08,918-Speed 2627.17 samples/sec Loss 4.2975 LearningRate 0.0102 Epoch: 13 Global Step: 565040 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:13:12,820-Speed 2625.32 samples/sec Loss 4.4208 LearningRate 0.0102 Epoch: 13 Global Step: 565050 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:13:16,725-Speed 2623.52 samples/sec Loss 4.3067 LearningRate 0.0102 Epoch: 13 Global Step: 565060 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:13:20,622-Speed 2627.83 samples/sec Loss 4.3004 LearningRate 0.0102 Epoch: 13 Global Step: 565070 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:13:24,641-Speed 2548.46 samples/sec Loss 4.4005 LearningRate 0.0102 Epoch: 13 Global Step: 565080 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:13:28,555-Speed 2616.77 samples/sec Loss 4.1912 LearningRate 0.0102 Epoch: 13 Global Step: 565090 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:13:32,447-Speed 2632.19 samples/sec Loss 4.3298 LearningRate 0.0102 Epoch: 13 Global Step: 565100 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:13:36,347-Speed 2626.44 samples/sec Loss 4.2866 LearningRate 0.0102 Epoch: 13 Global Step: 565110 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:13:40,241-Speed 2630.19 samples/sec Loss 4.2663 LearningRate 0.0102 Epoch: 13 Global Step: 565120 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:13:44,146-Speed 2622.74 samples/sec Loss 4.3575 LearningRate 0.0102 Epoch: 13 Global Step: 565130 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:13:48,041-Speed 2629.83 samples/sec Loss 4.2981 LearningRate 0.0102 Epoch: 13 Global Step: 565140 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:13:51,963-Speed 2611.70 samples/sec Loss 4.3604 LearningRate 0.0102 Epoch: 13 Global Step: 565150 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:13:55,864-Speed 2625.93 samples/sec Loss 4.2656 LearningRate 0.0102 Epoch: 13 Global Step: 565160 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:13:59,760-Speed 2629.86 samples/sec Loss 4.2219 LearningRate 0.0102 Epoch: 13 Global Step: 565170 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:14:03,659-Speed 2626.38 samples/sec Loss 4.2653 LearningRate 0.0102 Epoch: 13 Global Step: 565180 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:14:07,558-Speed 2627.58 samples/sec Loss 4.2814 LearningRate 0.0102 Epoch: 13 Global Step: 565190 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:14:11,450-Speed 2631.51 samples/sec Loss 4.2812 LearningRate 0.0102 Epoch: 13 Global Step: 565200 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:14:15,342-Speed 2631.59 samples/sec Loss 4.2943 LearningRate 0.0102 Epoch: 13 Global Step: 565210 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:14:19,259-Speed 2614.99 samples/sec Loss 4.3390 LearningRate 0.0102 Epoch: 13 Global Step: 565220 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:14:23,175-Speed 2615.74 samples/sec Loss 4.2696 LearningRate 0.0102 Epoch: 13 Global Step: 565230 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:14:27,044-Speed 2647.74 samples/sec Loss 4.3893 LearningRate 0.0102 Epoch: 13 Global Step: 565240 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:14:30,935-Speed 2632.68 samples/sec Loss 4.3573 LearningRate 0.0102 Epoch: 13 Global Step: 565250 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:14:34,837-Speed 2624.41 samples/sec Loss 4.3319 LearningRate 0.0102 Epoch: 13 Global Step: 565260 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:14:38,738-Speed 2625.87 samples/sec Loss 4.2621 LearningRate 0.0102 Epoch: 13 Global Step: 565270 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:14:42,632-Speed 2629.68 samples/sec Loss 4.2296 LearningRate 0.0102 Epoch: 13 Global Step: 565280 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:14:46,556-Speed 2610.68 samples/sec Loss 4.3094 LearningRate 0.0101 Epoch: 13 Global Step: 565290 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:14:50,471-Speed 2615.73 samples/sec Loss 4.3884 LearningRate 0.0101 Epoch: 13 Global Step: 565300 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:14:54,421-Speed 2593.52 samples/sec Loss 4.2952 LearningRate 0.0101 Epoch: 13 Global Step: 565310 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:14:58,316-Speed 2629.99 samples/sec Loss 4.3108 LearningRate 0.0101 Epoch: 13 Global Step: 565320 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:15:02,211-Speed 2629.69 samples/sec Loss 4.3518 LearningRate 0.0101 Epoch: 13 Global Step: 565330 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:15:06,080-Speed 2647.47 samples/sec Loss 4.2412 LearningRate 0.0101 Epoch: 13 Global Step: 565340 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:15:09,975-Speed 2629.20 samples/sec Loss 4.4082 LearningRate 0.0101 Epoch: 13 Global Step: 565350 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:15:13,873-Speed 2628.14 samples/sec Loss 4.2684 LearningRate 0.0101 Epoch: 13 Global Step: 565360 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:15:17,766-Speed 2631.27 samples/sec Loss 4.3147 LearningRate 0.0101 Epoch: 13 Global Step: 565370 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:15:21,655-Speed 2634.10 samples/sec Loss 4.3487 LearningRate 0.0101 Epoch: 13 Global Step: 565380 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:15:25,553-Speed 2627.54 samples/sec Loss 4.3297 LearningRate 0.0101 Epoch: 13 Global Step: 565390 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:15:29,454-Speed 2626.18 samples/sec Loss 4.2597 LearningRate 0.0101 Epoch: 13 Global Step: 565400 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:15:33,348-Speed 2629.74 samples/sec Loss 4.2476 LearningRate 0.0101 Epoch: 13 Global Step: 565410 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:15:37,256-Speed 2621.18 samples/sec Loss 4.2925 LearningRate 0.0101 Epoch: 13 Global Step: 565420 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:15:41,149-Speed 2631.27 samples/sec Loss 4.2268 LearningRate 0.0101 Epoch: 13 Global Step: 565430 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:15:45,044-Speed 2630.13 samples/sec Loss 4.3283 LearningRate 0.0101 Epoch: 13 Global Step: 565440 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:15:48,960-Speed 2614.93 samples/sec Loss 4.2379 LearningRate 0.0101 Epoch: 13 Global Step: 565450 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:15:52,870-Speed 2620.27 samples/sec Loss 4.2349 LearningRate 0.0101 Epoch: 13 Global Step: 565460 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:15:56,768-Speed 2627.28 samples/sec Loss 4.3461 LearningRate 0.0101 Epoch: 13 Global Step: 565470 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:16:00,663-Speed 2631.04 samples/sec Loss 4.3973 LearningRate 0.0101 Epoch: 13 Global Step: 565480 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:16:04,567-Speed 2623.68 samples/sec Loss 4.2227 LearningRate 0.0101 Epoch: 13 Global Step: 565490 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:16:08,467-Speed 2625.51 samples/sec Loss 4.2866 LearningRate 0.0101 Epoch: 13 Global Step: 565500 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:16:12,359-Speed 2631.86 samples/sec Loss 4.3572 LearningRate 0.0101 Epoch: 13 Global Step: 565510 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:16:16,261-Speed 2624.90 samples/sec Loss 4.3460 LearningRate 0.0101 Epoch: 13 Global Step: 565520 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:16:20,148-Speed 2634.89 samples/sec Loss 4.3323 LearningRate 0.0101 Epoch: 13 Global Step: 565530 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:16:24,040-Speed 2632.14 samples/sec Loss 4.3195 LearningRate 0.0101 Epoch: 13 Global Step: 565540 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:16:27,937-Speed 2628.23 samples/sec Loss 4.2953 LearningRate 0.0101 Epoch: 13 Global Step: 565550 Fp16 Grad Scale: 131072 Required: 30 hours
Training: 2022-04-15 11:16:31,811-Speed 2643.78 samples/sec Loss 4.2705 LearningRate 0.0101 Epoch: 13 Global Step: 565560 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:16:35,714-Speed 2624.86 samples/sec Loss 4.2729 LearningRate 0.0101 Epoch: 13 Global Step: 565570 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:16:39,616-Speed 2624.73 samples/sec Loss 4.3243 LearningRate 0.0101 Epoch: 13 Global Step: 565580 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:16:43,511-Speed 2629.29 samples/sec Loss 4.4126 LearningRate 0.0101 Epoch: 13 Global Step: 565590 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:16:47,417-Speed 2621.77 samples/sec Loss 4.2806 LearningRate 0.0101 Epoch: 13 Global Step: 565600 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:16:51,317-Speed 2626.60 samples/sec Loss 4.3364 LearningRate 0.0101 Epoch: 13 Global Step: 565610 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:16:55,224-Speed 2621.25 samples/sec Loss 4.2642 LearningRate 0.0101 Epoch: 13 Global Step: 565620 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:16:59,122-Speed 2627.91 samples/sec Loss 4.4428 LearningRate 0.0101 Epoch: 13 Global Step: 565630 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:17:03,027-Speed 2623.39 samples/sec Loss 4.1667 LearningRate 0.0101 Epoch: 13 Global Step: 565640 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:17:06,920-Speed 2630.85 samples/sec Loss 4.2585 LearningRate 0.0101 Epoch: 13 Global Step: 565650 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:17:10,792-Speed 2645.54 samples/sec Loss 4.1995 LearningRate 0.0101 Epoch: 13 Global Step: 565660 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:17:14,688-Speed 2628.94 samples/sec Loss 4.2819 LearningRate 0.0101 Epoch: 13 Global Step: 565670 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:17:18,583-Speed 2629.20 samples/sec Loss 4.3261 LearningRate 0.0101 Epoch: 13 Global Step: 565680 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:17:22,494-Speed 2618.95 samples/sec Loss 4.2984 LearningRate 0.0101 Epoch: 13 Global Step: 565690 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:17:26,385-Speed 2632.69 samples/sec Loss 4.2492 LearningRate 0.0101 Epoch: 13 Global Step: 565700 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:17:30,279-Speed 2629.96 samples/sec Loss 4.2845 LearningRate 0.0101 Epoch: 13 Global Step: 565710 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:17:34,177-Speed 2627.85 samples/sec Loss 4.3160 LearningRate 0.0101 Epoch: 13 Global Step: 565720 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:17:38,070-Speed 2631.21 samples/sec Loss 4.3338 LearningRate 0.0101 Epoch: 13 Global Step: 565730 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:17:41,965-Speed 2629.49 samples/sec Loss 4.2339 LearningRate 0.0101 Epoch: 13 Global Step: 565740 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:17:45,837-Speed 2645.08 samples/sec Loss 4.2345 LearningRate 0.0101 Epoch: 13 Global Step: 565750 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:17:49,733-Speed 2628.61 samples/sec Loss 4.3654 LearningRate 0.0101 Epoch: 13 Global Step: 565760 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:17:53,627-Speed 2632.91 samples/sec Loss 4.2913 LearningRate 0.0101 Epoch: 13 Global Step: 565770 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:17:57,522-Speed 2629.99 samples/sec Loss 4.2233 LearningRate 0.0101 Epoch: 13 Global Step: 565780 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:18:01,416-Speed 2629.98 samples/sec Loss 4.2787 LearningRate 0.0101 Epoch: 13 Global Step: 565790 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:18:05,337-Speed 2612.13 samples/sec Loss 4.3394 LearningRate 0.0101 Epoch: 13 Global Step: 565800 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:18:09,354-Speed 2549.54 samples/sec Loss 4.3872 LearningRate 0.0101 Epoch: 13 Global Step: 565810 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:18:13,245-Speed 2632.55 samples/sec Loss 4.2250 LearningRate 0.0101 Epoch: 13 Global Step: 565820 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:18:17,148-Speed 2623.95 samples/sec Loss 4.2686 LearningRate 0.0101 Epoch: 13 Global Step: 565830 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:18:21,041-Speed 2630.86 samples/sec Loss 4.3683 LearningRate 0.0101 Epoch: 13 Global Step: 565840 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:18:24,914-Speed 2644.69 samples/sec Loss 4.3358 LearningRate 0.0101 Epoch: 13 Global Step: 565850 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:18:28,803-Speed 2633.51 samples/sec Loss 4.3736 LearningRate 0.0101 Epoch: 13 Global Step: 565860 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:18:32,711-Speed 2621.09 samples/sec Loss 4.2444 LearningRate 0.0101 Epoch: 13 Global Step: 565870 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:18:36,610-Speed 2627.08 samples/sec Loss 4.2475 LearningRate 0.0101 Epoch: 13 Global Step: 565880 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:18:40,509-Speed 2626.43 samples/sec Loss 4.2848 LearningRate 0.0101 Epoch: 13 Global Step: 565890 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:18:44,407-Speed 2627.45 samples/sec Loss 4.3565 LearningRate 0.0101 Epoch: 13 Global Step: 565900 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:18:48,308-Speed 2626.09 samples/sec Loss 4.2420 LearningRate 0.0101 Epoch: 13 Global Step: 565910 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:18:52,206-Speed 2627.59 samples/sec Loss 4.2943 LearningRate 0.0101 Epoch: 13 Global Step: 565920 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:18:56,098-Speed 2631.48 samples/sec Loss 4.2585 LearningRate 0.0101 Epoch: 13 Global Step: 565930 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:19:00,023-Speed 2609.54 samples/sec Loss 4.1934 LearningRate 0.0101 Epoch: 13 Global Step: 565940 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-15 11:19:04,059-Speed 2537.78 samples/sec Loss 4.2395 LearningRate 0.0101 Epoch: 13 Global Step: 565950 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:19:07,959-Speed 2626.10 samples/sec Loss 4.2455 LearningRate 0.0101 Epoch: 13 Global Step: 565960 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:19:11,857-Speed 2628.11 samples/sec Loss 4.3085 LearningRate 0.0101 Epoch: 13 Global Step: 565970 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:19:15,757-Speed 2625.99 samples/sec Loss 4.3369 LearningRate 0.0101 Epoch: 13 Global Step: 565980 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:19:19,652-Speed 2629.13 samples/sec Loss 4.3578 LearningRate 0.0101 Epoch: 13 Global Step: 565990 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:19:23,548-Speed 2629.32 samples/sec Loss 4.3462 LearningRate 0.0101 Epoch: 13 Global Step: 566000 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-04-15 11:19:27,476-Speed 2607.27 samples/sec Loss 4.2846 LearningRate 0.0101 Epoch: 13 Global Step: 566010 Fp16 Grad Scale: 65536 Required: 30 hours
Training: 2022-